Image Vector Similarity Retrieval Service (5): Based on Milvus

Overview

To support the “search by image” scenario (retrieving visually similar images), this post builds an image search system on Milvus vector indexing and the VGG16 image feature extraction model.
Open source: github.com/thirtyonele…

Retrieval scenario

Inference: images are read and the model generates their feature vectors
Feature storage: the feature vectors are stored in Milvus
Retrieval: online, real-time vector search
The specific process is as follows:
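The three stages above (inference, feature storage, retrieval) can be sketched end to end in plain Python. This is a toy illustration, not the project's code: `fake_extract` is a hypothetical placeholder for VGG16 feature extraction (it derives a deterministic random unit vector from the file name), and `ToyVectorStore` is a hypothetical in-memory stand-in for a Milvus collection.

```python
import math
import random

DIM = 512  # matches the 'dimension': 512 used for the Milvus collection in this post


def fake_extract(image_path: str) -> list[float]:
    """Hypothetical stand-in for VGG16 inference: one unit-length vector per image."""
    rng = random.Random(image_path)  # string seed, so the same path gives the same vector
    vec = [rng.gauss(0.0, 1.0) for _ in range(DIM)]
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


class ToyVectorStore:
    """Hypothetical in-memory stand-in for a Milvus collection (feature storage)."""

    def __init__(self) -> None:
        self.vectors: dict[int, list[float]] = {}
        self.names: dict[int, str] = {}

    def insert(self, name: str, vector: list[float]) -> int:
        new_id = len(self.vectors)
        self.vectors[new_id] = vector
        self.names[new_id] = name
        return new_id

    def search(self, query: list[float], top_k: int) -> list[tuple[str, float]]:
        """Exact inner-product search over every stored vector (retrieval)."""
        scored = [(sum(q * v for q, v in zip(query, vec)), i)
                  for i, vec in self.vectors.items()]
        scored.sort(reverse=True)  # larger inner product = more similar
        return [(self.names[i], score) for score, i in scored[:top_k]]


store = ToyVectorStore()
for path in ["cat_1.jpg", "dog_1.jpg", "car_1.jpg"]:
    store.insert(path, fake_extract(path))

results = store.search(fake_extract("dog_1.jpg"), top_k=2)
print(results)  # the query image itself comes first, with a score of about 1.0
```

In the real system, the extractor is the VGG16 model and the store is a Milvus collection; only the shape of the flow carries over.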

Milvus server installation

Installation guide: milvus.io/cn/docs/mil…
Download the configuration file:

  mkdir -p milvus/conf && cd milvus/conf
  wget https://raw.githubusercontent.com/milvus-io/milvus/0.10.6/core/conf/demo/server_config.yaml

Start the service

  docker run -d --name milvus_cpu_0.11.0 \
    -p 19530:19530 \
    -p 19121:19121 \
    -v <ROOT_DIR>/milvus/db:/var/lib/milvus/db \
    -v <ROOT_DIR>/milvus/conf:/var/lib/milvus/conf \
    -v <ROOT_DIR>/milvus/logs:/var/lib/milvus/logs \
    -v <ROOT_DIR>/milvus/wal:/var/lib/milvus/wal \
    milvusdb/milvus:0.10.6-cpu-d022221-64ddc2
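After the container starts, a quick way to confirm that the published ports are reachable is a plain TCP connection attempt. This is a hedged sketch using only the standard library; `port_is_open` is a hypothetical helper, and it only checks that something is listening, not that Milvus is healthy.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False


if __name__ == "__main__":
    # 19530 (used by the Python SDK) and 19121 are the ports published by `docker run`.
    for port in (19530, 19121):
        print(port, "open" if port_is_open("127.0.0.1", port) else "closed")
```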

Building the Milvus vector index

The vectors stored in the h5py (HDF5) index file are loaded to build the Milvus collection
The similarity metric is inner product: MetricType.IP

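  import h5py
  from milvus import IndexType, MetricType  # Milvus Python SDK (pymilvus) compatible with Milvus 0.10.x

  # Method body of the indexing class: assumes self.client is a connected Milvus client
  # and self.index_name is the collection name.

  # 1. Load feature vectors ('dataset_1') and image names ('dataset_2') from the HDF5 file
  h5f = h5py.File(index_dir, 'r')
  self.retrieval_db = h5f['dataset_1'][:]
  self.retrieval_name = h5f['dataset_2'][:]
  h5f.close()

  # 2. Drop the collection if it already exists (list_collections() returns (status, names)), then recreate it
  if self.index_name in self.client.list_collections()[1]:
      self.client.drop_collection(collection_name=self.index_name)
  self.client.create_collection({'collection_name': self.index_name,
                                 'dimension': 512,  # must match the feature vector length
                                 'index_file_size': 1024,
                                 'metric_type': MetricType.IP})

  # 3. Insert all vectors and map each returned Milvus ID to its image name
  self.id_dict = {}
  status, ids = self.client.insert(collection_name=self.index_name,
                                   records=[i.tolist() for i in self.retrieval_db])
  for i, val in enumerate(self.retrieval_name):
      self.id_dict[ids[i]] = str(val)

  # 4. Build the index; FLAT is exact brute-force search, so 'nlist' (an IVF parameter) is not used
  self.client.create_index(self.index_name, IndexType.FLAT, {'nlist': 16384})
  # pprint(self.client.get_collection_info(self.index_name))
  print("************* Done milvus indexing, Indexed {} documents *************".format(len(self.retrieval_db)))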
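Inner product (MetricType.IP) behaves like cosine similarity when the feature vectors are L2-normalized. This post does not show whether the project's VGG16 extractor normalizes its output, so treat that as an assumption to verify in the source. A small pure-Python check of the relationship:

```python
import math


def l2_normalize(v: list[float]) -> list[float]:
    """Scale v to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def inner_product(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a: list[float], b: list[float]) -> float:
    return inner_product(a, b) / math.sqrt(inner_product(a, a) * inner_product(b, b))


a, b = [3.0, 4.0], [4.0, 3.0]
print(cosine_similarity(a, b))                           # 0.96
print(inner_product(l2_normalize(a), l2_normalize(b)))   # same value, up to floating-point rounding
print(inner_product(a, b))                               # 24.0: raw IP also depends on vector length
```

Without normalization, longer vectors get higher IP scores regardless of direction, which is usually not what image similarity search wants.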

Milvus retrieval implementation

Consistent with the metric chosen when the collection was created, retrieval uses inner-product (dot product) similarity:

  # search() returns (status, results); results holds one list of top-k hits per query vector
  _, vectors = self.client.search(collection_name=self.index_name,
                                  query_records=[query_vector],
                                  top_k=search_size,
                                  params={'nprobe': 16})
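The `params={'nprobe': 16}` search argument is a parameter of IVF-family indexes. With the FLAT index created earlier, every vector is scanned, so it should have no effect. To illustrate what `nlist` and `nprobe` control, here is a toy pure-Python IVF index. It is a simplification, not Milvus's implementation: real IVF indexes train centroids with k-means, while this sketch simply uses the first `nlist` vectors as centroids.

```python
import heapq
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


class ToyIVFIndex:
    """Simplified IVF: each vector goes into the bucket of its best-scoring centroid."""

    def __init__(self, vectors, nlist):
        self.vectors = vectors
        self.centroids = vectors[:nlist]  # simplification: no k-means training
        self.buckets = [[] for _ in range(nlist)]
        for vid, vec in enumerate(vectors):
            best = max(range(nlist), key=lambda c: dot(vec, self.centroids[c]))
            self.buckets[best].append(vid)

    def search(self, query, top_k, nprobe):
        # Scan only the nprobe buckets whose centroids score highest for the query.
        probed = heapq.nlargest(nprobe, range(len(self.centroids)),
                                key=lambda c: dot(query, self.centroids[c]))
        candidates = [vid for c in probed for vid in self.buckets[c]]
        return heapq.nlargest(top_k, candidates,
                              key=lambda vid: dot(query, self.vectors[vid]))


def flat_search(vectors, query, top_k):
    """Exact (FLAT) search: score every stored vector."""
    return heapq.nlargest(top_k, range(len(vectors)),
                          key=lambda vid: dot(query, vectors[vid]))


rng = random.Random(0)
data = [[rng.uniform(-1, 1) for _ in range(8)] for _ in range(200)]
query = [rng.uniform(-1, 1) for _ in range(8)]
index = ToyIVFIndex(data, nlist=16)

print(flat_search(data, query, top_k=5))
print(index.search(query, top_k=5, nprobe=2))   # scans fewer vectors, may miss true neighbours
print(index.search(query, top_k=5, nprobe=16))  # probes every bucket, so it matches flat_search
```

Raising `nprobe` trades speed for recall. When `nprobe` equals `nlist`, the search degenerates into the exhaustive FLAT scan.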

To use Euclidean distance instead, create the collection with MetricType.L2.
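For unit-length vectors, switching from IP to L2 changes the scores but not the ranking, because ||a - b||^2 = 2 - 2(a · b). With L2, smaller is better; with IP, larger is better. The sketch below checks this on synthetic random unit vectors (illustrative data only):

```python
import math
import random


def unit(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def ip(a, b):
    return sum(x * y for x, y in zip(a, b))


def l2_squared(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


rng = random.Random(42)
db = [unit([rng.gauss(0, 1) for _ in range(16)]) for _ in range(50)]
q = unit([rng.gauss(0, 1) for _ in range(16)])

by_ip = sorted(range(len(db)), key=lambda i: -ip(q, db[i]))          # larger IP first
by_l2 = sorted(range(len(db)), key=lambda i: l2_squared(q, db[i]))   # smaller L2 first

# For unit vectors: ||q - d||^2 == 2 - 2 * (q . d)
assert all(abs(l2_squared(q, d) - (2 - 2 * ip(q, d))) < 1e-9 for d in db)
print(by_ip[:5] == by_l2[:5])  # True
```

If the vectors are not normalized, the two metrics can rank results differently, so the choice then matters.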

How to run

Download the project source code: github.com/thirtyonele…
Operation 1: Build the base index

  python index.py
  # --train_data: path to the training image folder, default '<ROOT_DIR>/data/train'
  # --index_file: path where the index file is saved, default '<ROOT_DIR>/index/train.h5'
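Before features can be extracted, `index.py` has to enumerate the images under `--train_data`. The project's actual logic is not shown in this post. The following is a hypothetical sketch of that step using only the standard library; the helper name `list_images` and the accepted extensions are assumptions.

```python
import os

IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".bmp"}  # assumed set of accepted image extensions


def list_images(train_dir: str) -> list[str]:
    """Recursively collect image paths under train_dir, sorted so IDs are stable across runs."""
    paths = []
    for root, _dirs, files in os.walk(train_dir):
        for name in files:
            if os.path.splitext(name)[1].lower() in IMAGE_EXTS:
                paths.append(os.path.join(root, name))
    return sorted(paths)
```

Sorting the paths keeps the order of inserted vectors, and therefore the Milvus ID to image-name mapping, reproducible between runs.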

Operation 2: Run a similarity search

  python retrieval.py --engine=milvus
  # --test_data: path to the query image, default '<ROOT_DIR>/data/test/001_accordion_image_0001.jpg'
  # --index_file: path to the .h5 index file built in Operation 1
  # --db_name: ES or Milvus index name, default 'image_retrieval'
  # --engine: search engine type, default 'numpy'; options are numpy, Faiss, ES, and Milvus
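The options listed above map naturally onto `argparse`. This is a hypothetical reconstruction for illustration only; the real `retrieval.py` may define them differently. The lowercase choice spellings `numpy`, `faiss`, `es` and `milvus` are assumptions based on `--engine=milvus`. `ROOT_DIR` is assumed to be the current working directory, and `--index_file` is left without a default here.

```python
import argparse
import os

ROOT_DIR = os.getcwd()  # assumption: the script is run from the project root


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Search for images similar to a query image.")
    parser.add_argument("--test_data",
                        default=os.path.join(ROOT_DIR, "data", "test", "001_accordion_image_0001.jpg"),
                        help="path to the query image")
    parser.add_argument("--index_file", help="path to the .h5 index file built by index.py")
    parser.add_argument("--db_name", default="image_retrieval",
                        help="ES or Milvus index name")
    parser.add_argument("--engine", default="numpy",
                        choices=["numpy", "faiss", "es", "milvus"],
                        help="search engine used for retrieval")
    return parser


args = build_parser().parse_args(["--engine=milvus"])
print(args.engine, args.db_name)  # milvus image_retrieval
```

Restricting `--engine` with `choices` makes a typo fail at startup instead of deep inside the engine dispatch.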

Conclusion

Database-style management (collections and indexes) is easy to understand
Usage is similar to ES, but Milvus performs better
Milvus currently supports only vector retrieval, not scalar fields, so if scalar filtering is needed we have to maintain our own business database alongside it
The Milvus community plans to keep improving distributed support, which should make large-index scenarios easier to handle
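One common way to combine vector search with a separate business database is post-filtering. The idea is to over-fetch candidates from the vector engine, then keep only those whose scalar attributes match. The sketch below is a hypothetical illustration: the `hits` list stands in for Milvus search results as `(id, score)` pairs, an in-memory SQLite table plays the business database, and the `images` table, its `category` column and the `filter_hits` helper are made up for the example.

```python
import sqlite3


def filter_hits(conn, hits, category, top_k):
    """Keep the best `top_k` hits whose image has the given category.

    hits: (milvus_id, score) pairs, already sorted best-first by the vector engine.
    """
    ids = [vid for vid, _ in hits]
    if not ids:
        return []
    placeholders = ",".join("?" * len(ids))
    rows = conn.execute(
        f"SELECT id FROM images WHERE category = ? AND id IN ({placeholders})",
        [category, *ids],
    )
    allowed = {row[0] for row in rows}
    return [(vid, score) for vid, score in hits if vid in allowed][:top_k]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE images (id INTEGER PRIMARY KEY, name TEXT, category TEXT)")
conn.executemany("INSERT INTO images VALUES (?, ?, ?)", [
    (101, "accordion_1.jpg", "instrument"),
    (102, "cat_1.jpg", "animal"),
    (103, "accordion_2.jpg", "instrument"),
    (104, "dog_1.jpg", "animal"),
])

# Over-fetch from the vector engine (here 4 hits for top_k=2), then filter by scalar attribute.
hits = [(102, 0.93), (101, 0.91), (104, 0.88), (103, 0.80)]
print(filter_hits(conn, hits, "instrument", top_k=2))  # [(101, 0.91), (103, 0.8)]
```

If too few hits survive the filter, the over-fetch factor has to be raised, which is the main cost of keeping scalar data outside the vector engine.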

That’s all!

