Used as a Document Store in DocArray, Weaviate speeds up Document processing and retrieval in the cloud.

A closer look at DocArray and Weaviate

DocArray: Data structure for unstructured data

DocArray is an extensible data structure ideal for deep learning tasks. It is primarily used for the transfer of nested and unstructured data, including text, image, audio, video, 3D Mesh, and more.

Compared to other data structures:

Legend: ✅ full support, ✔ partial support, ❌ no support

With DocArray, deep learning engineers can efficiently process, embed, search, recommend, store, and transfer data with the help of the Pythonic API.

Weaviate: Open source vector search engine

Weaviate is an open source vector search engine that stores both objects and vectors. Weaviate combines vector search with structured filtering to create a robust, fault-tolerant search engine.
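To make "vector search combined with structured filtering" concrete, here is a minimal brute-force sketch in plain Python. It is not how Weaviate is implemented internally and does not use the Weaviate API; the function names (`cosine`, `filtered_search`), the object layout, and the toy data are all assumptions made for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def filtered_search(objects, query_vector, where, limit=1):
    """Keep objects whose properties pass `where`, then rank them by similarity."""
    candidates = [o for o in objects if where(o["properties"])]
    candidates.sort(key=lambda o: cosine(o["vector"], query_vector), reverse=True)
    return candidates[:limit]


# Hypothetical toy data: each object stores properties alongside its vector.
objects = [
    {"properties": {"title": "red shoes", "in_stock": True}, "vector": [0.9, 0.1]},
    {"properties": {"title": "crimson boots", "in_stock": False}, "vector": [1.0, 0.0]},
    {"properties": {"title": "blue hat", "in_stock": True}, "vector": [0.0, 1.0]},
]

# The structured filter removes "crimson boots" even though its vector is the closest.
hits = filtered_search(objects, [1.0, 0.0], where=lambda p: p["in_stock"], limit=1)
print(hits[0]["properties"]["title"])
```

The point of the sketch is the order of operations: the structured condition narrows the candidate set, and vector similarity ranks what remains. A real engine replaces the linear scan with an index.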

Weaviate also provides the Weaviate Cloud Service, an out-of-the-box cloud storage infrastructure.

Jina + Weaviate =?

💥 What sparks fly when Jina and Weaviate collide?

There are two ways to create cloud storage instances using Weaviate:

  • Start a Weaviate instance locally

  • Create a Weaviate Cloud Service instance

1. Start the Weaviate instance locally

To use Weaviate as the storage backend, you need to start a new Weaviate instance. You can do this by creating a docker-compose.yml file as follows:

---
version: '3.4'
services:
  weaviate:
    command:
      - --host
      - 0.0.0.0
      - --port
      - '8080'
      - --scheme
      - http
    image: semitechnologies/weaviate:1.11.0
    ports:
      - "8080:8080"
    restart: on-failure:0
    environment:
      QUERY_DEFAULTS_LIMIT: 25
      AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED: 'true'
      PERSISTENCE_DATA_PATH: '/var/lib/weaviate'
      DEFAULT_VECTORIZER_MODULE: 'none'
      ENABLE_MODULES: ''
      CLUSTER_HOSTNAME: 'node1'
...

Once the file is created, run docker-compose up in the same directory to start the instance.
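Before connecting DocArray, it can help to confirm that the local instance is actually accepting requests. The sketch below polls Weaviate's readiness endpoint (`/v1/.well-known/ready`) using only the standard library. The helper names `readiness_url` and `is_ready` are hypothetical, and the host and port are assumed to match the compose file above.

```python
import urllib.error
import urllib.request


def readiness_url(host="localhost", port=8080, scheme="http"):
    """Build the URL of Weaviate's readiness probe for the given address."""
    return f"{scheme}://{host}:{port}/v1/.well-known/ready"


def is_ready(host="localhost", port=8080, timeout=2.0):
    """Return True if the instance answers the readiness probe with HTTP 200."""
    try:
        with urllib.request.urlopen(readiness_url(host, port), timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or a non-2xx response: treat as not ready.
        return False


if __name__ == "__main__":
    print("Weaviate ready:", is_ready())
```

If this prints `False` right after starting the container, waiting a few seconds and trying again is usually enough, since the server needs a moment to come up.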

2. Create a Weaviate Cloud Service instance

You can create Weaviate instances for free with the Weaviate Cloud Service.

To register and create a new instance, visit the Weaviate Cloud Service website.

A video walkthrough shows how to create a Weaviate instance.

Introductory tutorial demo

In this tutorial, you will learn how to:

  • Create a Weaviate local instance to store the Document

  • Create a simple text search system

1. Start the Weaviate service and create a DocumentArray instance

from docarray import DocumentArray

# Connect to the local Weaviate instance; Documents are stored under the name "Persisted"
da = DocumentArray(
    storage="weaviate", config={"name": "Persisted", "host": "localhost", "port": 8080}
)

2. Index Documents

from docarray import Document

# Add three text Documents to the Weaviate-backed DocumentArray
da.extend(
    [
        Document(text="Persist Documents with Weaviate."),
        Document(text="And enjoy fast nearest neighbor search."),
        Document(text="All while using DocArray API."),
    ]
)

3. Use a BERT model to generate embeddings

from transformers import AutoModel, AutoTokenizer

# Load the pretrained BERT tokenizer and model
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

# Turn a batch of Documents into padded, truncated PyTorch tensors
def collate_fn(da):
    return tokenizer(da.texts, return_tensors="pt", truncation=True, padding=True)

# Embed every Document in da with the model
da.embed(model, collate_fn=collate_fn)

4. Query the indexed Documents

results = da.find(
    DocumentArray([Document(text="How to persist Documents")]).embed(
        model, collate_fn=collate_fn
    ),
    limit=1,
)
print(results[0].text)

Output: Persist Documents with Weaviate.

Two powerful tools combine to build an H&M image search system

Integrating DocArray and Weaviate makes it much easier to build a system that searches for images.

See the GitHub repo for the full example.

What sparks will fly between DocArray and the vector database Qdrant? Find out next time!

Related links:

GitHub Repo

DocArray Documentation

Jina’s Learning Bootcamp

Weaviate’s Documentation