This article introduces some core concepts of Solr and Solr Cloud. They tend to be abstract and hard to grasp at first. My advice: skim this article quickly without digging too deep, then move on through the rest of the series. The concepts will become clearer as you build up hands-on experience.

Solr

Solr is a high-performance, Java-based full-text search server built on Apache Lucene. It is easy to install and configure, ships with an HTTP-based admin interface, and exposes REST-style HTTP APIs that clients written in any language can call.
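Because the APIs are plain HTTP, a search is just a GET request. The minimal sketch below builds a URL for Solr's standard /select search handler using only the Python standard library. The host and port (http://localhost:8983/solr, Solr's default port), the collection name techproducts, and the helper names build_select_url and search are assumptions for illustration. Point them at your own installation.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_select_url(base_url: str, collection: str, query: str, rows: int = 10) -> str:
    """Build a URL for the /select handler of one collection (hypothetical helper)."""
    params = {"q": query, "rows": rows, "wt": "json"}  # query, page size, JSON response
    return f"{base_url.rstrip('/')}/{collection}/select?{urlencode(params)}"


def search(url: str) -> dict:
    """Send the request and decode the JSON body (needs a running Solr server)."""
    with urlopen(url) as resp:
        return json.load(resp)


url = build_select_url("http://localhost:8983/solr", "techproducts", "name:ipod")
print(url)  # → http://localhost:8983/solr/techproducts/select?q=name%3Aipod&rows=10&wt=json
```

With a server running, search(url)["response"]["docs"] would hold the matching documents. The same URL can also be pasted into a browser, which is a handy way to experiment before writing client code.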




The way Solr works

Solr Cloud

Solr Cloud is Solr's distributed cluster mode, designed for high performance, high availability, and fault tolerance. In Solr Cloud, the data of a collection is split into partitions called shards, which can be spread across multiple physical machines. Each shard is stored as one or more replicas, and this redundancy provides both scalability and fault tolerance. The cluster relies on Apache ZooKeeper (a single server or an ensemble) to manage the cluster state and to make sure every indexing and search request is routed to the right nodes.
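To make routing concrete, here is a minimal sketch of hash-based document routing, loosely modeled on Solr's default compositeId router. The 32-bit hash space is split into one contiguous range per shard, and each document goes to the shard whose range contains the hash of its id. This is a simplification. Solr itself uses MurmurHash3 and keeps the shard ranges in the cluster state stored in ZooKeeper. The sketch substitutes Python's zlib.crc32 as the hash and uses hypothetical helpers named shard_ranges and route.

```python
import zlib

HASH_SPACE = 2 ** 32  # size of the 32-bit hash space


def shard_ranges(num_shards: int) -> list:
    """Split the hash space into num_shards contiguous, (nearly) equal ranges."""
    step = HASH_SPACE // num_shards
    ranges = []
    for i in range(num_shards):
        start = i * step
        # The last shard absorbs any remainder so the whole space is covered.
        end = HASH_SPACE - 1 if i == num_shards - 1 else start + step - 1
        ranges.append((f"shard{i + 1}", start, end))
    return ranges


def route(doc_id: str, ranges: list) -> str:
    """Return the shard whose hash range contains the hash of doc_id."""
    h = zlib.crc32(doc_id.encode("utf-8"))  # stand-in for Solr's MurmurHash3
    for name, start, end in ranges:
        if start <= h <= end:
            return name
    raise ValueError("hash ranges do not cover the hash space")


ranges = shard_ranges(3)
for doc_id in ["book-1", "book-2", "book-3"]:
    print(doc_id, "->", route(doc_id, ranges))
```

Because the hash depends only on the id, indexing a document again with the same id always lands on the same shard. This is what lets the new version replace the old one there.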




How Solr Cloud works

Some concepts in Solr Cloud

  • Core: A Solr instance hosts one or more cores. Each core is a self-contained index that handles indexing and queries independently and has its own configuration files. In Solr Cloud, a core usually holds one replica of a shard and does not serve the collection on its own.
  • Config Set: the set of configuration files a Solr core needs in order to run. Each config set has a name and must contain at least solrconfig.xml and a schema file (schema.xml, or managed-schema when the managed schema is used). Other files may also be required, depending on how these two files are configured. In Solr Cloud, config sets are stored in ZooKeeper and can be uploaded, listed, and deleted through the Configsets API.
  • Collection: a complete logical index in Solr Cloud. It is split into one or more shards, and each shard is stored as one or more replicas (cores).
  • Leader: the replica of a shard that wins the leader election. Each shard can have multiple replicas, and an election among them decides which one acts as leader. Elections can happen at any time, but in practice they are usually triggered only when a Solr instance fails. When a document is indexed, Solr Cloud forwards it to the leader of the shard the document belongs to, and the leader distributes it to the other replicas of that shard.
  • Replica: a copy of a shard, hosted as a core on one of the nodes.
  • Shard: a logical partition of a collection that holds a subset of its documents.
  • ZooKeeper: in Solr Cloud, ZooKeeper provides centralized coordination. It tracks node status, supports leader election, holds the routing information for each collection, and stores the configuration files.
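The relationships in the list above (a collection made of shards, each shard made of replicas, one leader per shard) can be sketched as a small in-memory model. This is an illustration, not how Solr is implemented. The class names, the election_seq field, and the core names are hypothetical. Leader election is reduced to "the live replica with the lowest sequence number wins", which echoes the ZooKeeper-based election idea without any of its details.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """One copy of a shard, hosted as a core on a node."""
    core_name: str
    node: str
    election_seq: int            # hypothetical: order in which the replica joined the election
    live: bool = True
    docs: dict = field(default_factory=dict)


@dataclass
class Shard:
    name: str
    replicas: list

    def leader(self) -> Replica:
        """Elect the live replica with the lowest election sequence number."""
        candidates = [r for r in self.replicas if r.live]
        if not candidates:
            raise RuntimeError(f"{self.name} has no live replica")
        return min(candidates, key=lambda r: r.election_seq)

    def index(self, doc_id: str, doc: dict) -> str:
        """Send a document to the leader, which forwards it to the other live replicas."""
        leader = self.leader()
        leader.docs[doc_id] = doc
        for r in self.replicas:
            if r is not leader and r.live:
                r.docs[doc_id] = doc
        return leader.core_name


@dataclass
class Collection:
    name: str
    config_set: str              # name of a config set stored in ZooKeeper
    shards: list


shard1 = Shard("shard1", [
    Replica("books_shard1_replica_n1", "node-a", election_seq=0),
    Replica("books_shard1_replica_n2", "node-b", election_seq=1),
])
books = Collection("books", "books_conf", [shard1])

print(shard1.index("1", {"title": "first"}))   # → books_shard1_replica_n1
shard1.replicas[0].live = False                # simulate node-a failing
print(shard1.index("2", {"title": "second"}))  # → books_shard1_replica_n2
```

Failing node-a does not lose the shard. The remaining replica already holds document "1" and takes over as leader for new updates.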

Some concept diagrams for Solr Cloud




The Collection logic diagram




Complete relationship diagram of SolrCloud and Collection




Solr Cloud index creation process diagram




Solr Cloud query index process diagram
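The query process follows a scatter-gather pattern. The node that receives the query sends it to one replica of each shard, then merges the partial results by score. The sketch below is a simplification, assuming each shard returns (doc_id, score) pairs. Real Solr also runs a second phase to fetch the stored fields of the winning documents, which is omitted here. The functions pick_replicas and distributed_query are hypothetical.

```python
import heapq
import random


def pick_replicas(shards: dict) -> dict:
    """Choose one replica per shard to receive the sub-query."""
    return {shard: random.choice(replicas) for shard, replicas in shards.items()}


def distributed_query(shard_results: dict, rows: int) -> list:
    """Merge per-shard hits into one global top-`rows` list, highest score first.

    shard_results maps a shard name to the (doc_id, score) pairs returned by
    the replica that was queried for that shard.
    """
    all_hits = (
        (score, doc_id, shard)
        for shard, hits in shard_results.items()
        for doc_id, score in hits
    )
    top = heapq.nlargest(rows, all_hits)
    return [(doc_id, score, shard) for score, doc_id, shard in top]


targets = pick_replicas({"shard1": ["node-a", "node-b"], "shard2": ["node-c"]})
merged = distributed_query(
    {"shard1": [("a", 2.0), ("b", 0.5)], "shard2": [("c", 1.5), ("d", 3.0)]},
    rows=3,
)
print(targets, merged)
```

Each shard only needs to return its own top `rows` hits, because no document outside a shard's local top `rows` can make it into the global top `rows`.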

Next article: Scaffolding & Configuration