Getting Started with Apache Solr (Beginner's Tour)

Author: Hu Haichao | Blog source: https://blog.csdn.net/u011936…

This article covers the basics of getting started with Solr. Read it step by step for a clear, practical introduction. In this Apache Solr tutorial for beginners, we will discuss how to install the latest version of Apache Solr and show you how to configure it. We will also show you how to index data using one of Solr's sample data files.

Apache Solr can index data from a variety of sources, including databases, PDF files, XML files, CSV files, and more. In this example, we will look at how to index data from a CSV file. The preferred environment for this example is Windows. Before starting the Solr installation, make sure you have the JDK installed and JAVA_HOME configured correctly.

1. Why choose Apache Solr

Apache Solr is a powerful search server that exposes a REST-like API. Solr is built on Lucene, which provides powerful matching capabilities such as phrases, wildcards, joins, and grouping across many data types. In SolrCloud mode it uses Apache ZooKeeper for cluster coordination, which helps it handle high traffic. Apache Solr offers a wide variety of features; the main ones are listed below.

1. Advanced full-text search capabilities.
2. Standards-based open interfaces: XML, JSON, and HTTP.
3. Highly scalable and fault tolerant.
4. Support for both schema and schemaless configuration.
5. Faceted search and filtering.
6. Support for major languages such as English, German, Chinese, Japanese, French, and many more.
7. Rich document parsing.

2. Install Apache Solr

To begin, download the latest version of Apache Solr from the following location: http://lucene.apache.org/solr… At the time of this writing, the stable version available is 5.0.0. Solr changed considerably between 4.x and 5.0.0, so if you have a different version of Solr, download a 5.x release to follow this example. Once the Solr ZIP file is downloaded, unzip it into a folder. The extracted folder looks like the following.

The bin folder contains the scripts to start and stop the server. The example folder contains several sample files. We'll use one of these to illustrate how Solr indexes data. The server folder contains the logs folder, to which all of Solr's logs are written. Checking these logs for errors is helpful during indexing. The solr folder under the server folder holds the different cores or collections. The configuration and data for each core or collection are stored in its own folder.

Apache Solr comes with a built-in Jetty server. Before we start it, we must verify that JAVA_HOME is configured. We can start the server using the command-line scripts. Go to Solr's bin directory and type the following command at the command prompt:

solr start

This will start the Solr server on the default port, 8983. Now we can open the following URL in a browser to verify that our Solr instance is running. The details of Solr's administration tool are beyond the scope of this example.

http://localhost:8983/solr/

Solr administrative console

3. Configure Apache Solr

In this section, we will show you how to configure a core/collection for a Solr instance and how to define its fields. Apache Solr comes with an option called schemaless mode. This option lets users build an effective schema without manually editing the schema file. In this example, however, we will use schema configuration in order to understand the internals of Solr.

3.1 Establishing Core

When the Solr server is started in standalone mode, a configuration is called a core; when it is started in SolrCloud mode, it is called a collection. In this example, we'll talk about the standalone server and cores. We will discuss SolrCloud later. First, we need to create a core to index the data. The solr create command has the following options:

-c <name> - The name of the core or collection to create (required).
-d <confdir> - The configuration directory; useful in SolrCloud mode.
-n <configName> - The configuration name. This defaults to the name of the core or collection.
-p <port> - The port of the local Solr instance to send the create command to. By default, the script tries to detect the port by looking for a running Solr instance.
-s <shards> - The number of shards to split a collection into. The default is 1.
-rf <replicas> - The number of copies of each document in the collection. The default is 1.

In this example, we will use the -c parameter for the core name and the -d parameter for the configuration directory. For all other parameters we use the default settings.

Now navigate to the solr-5.0.0\bin folder in the command window and issue the following command.

solr create -c jcg -d basic_configs

We can see the following output in the command window.

Creating new core 'jcg' using command:
http://localhost:8983/solr/admin/cores?action=CREATE&name=jcg&instanceDir=jcg

{
  "responseHeader":{
    "status":0,
    "QTime":663},
  "core":"jcg"}
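A status of 0 in responseHeader indicates that the request succeeded. As a minimal sketch (not part of the original tutorial), the response could be checked from a script like this. The helper name core_created is hypothetical, and the response text is the one shown above with plain ASCII quotes.

```python
import json

# Response text copied from the command output above.
raw = '{"responseHeader": {"status": 0, "QTime": 663}, "core": "jcg"}'


def core_created(response_text):
    """Return the core name if Solr reported status 0, otherwise None."""
    body = json.loads(response_text)
    if body.get("responseHeader", {}).get("status") == 0:
        return body.get("core")
    return None


print(core_created(raw))
```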

Now navigate to the following URL. The jcg core now appears in the core selector, and you can also see the core statistics.

http://localhost:8983/solr

Solr’s core JCG

3.2 Modify the schema.xml file

We need to modify the schema.xml file in the server\solr\jcg\conf folder to include our fields. For indexing, we'll use "books.csv", one of the sample files that ships with the Solr installation. The file is located in the solr-5.0.0\example\exampledocs folder. Now navigate to the server\solr directory. You will see that a folder named jcg has been created. Its subfolders conf and data hold the core configuration and the indexed data, respectively. Now edit the schema.xml file in the server\solr\jcg\conf folder and add the following content after the uniqueKey element.

schema.xml

<uniqueKey>id</uniqueKey>
<!-- Fields added for books.csv load -->
<field name="cat" type="text_general" indexed="true" stored="true"/>
<field name="name" type="text_general" indexed="true" stored="true"/>
<field name="price" type="tdouble" indexed="true" stored="true"/>
<field name="inStock" type="boolean" indexed="true" stored="true"/>
<field name="author" type="text_general" indexed="true" stored="true"/>

We have set the attribute indexed to true. This specifies that the field is used for indexing and that records can be retrieved by querying it. Setting it to false means the field is only stored and cannot be queried. Notice also that the other attribute, stored, is set to true. This specifies that the field is saved and can be returned in the output. Setting it to false makes the field indexed only, so it cannot be returned in the output.

The entries above cover the fields present in the "books.csv" file. The first field in the CSV file, "id", is taken care of automatically by the uniqueKey element of schema.xml. Notice that we have skipped the fields series_t, sequence_i, and genre_s without making any entries for them. Even so, when we run the indexing, all of these fields are indexed without any problem. To see why, look at the dynamicField section of the schema.xml file.

schema.xml

<dynamicField name="*_i" type="int" indexed="true" stored="true"/>

<dynamicField name="*_is" type="ints" indexed="true" stored="true"/>

<dynamicField name="*_s" type="string" indexed="true" stored="true" />

<dynamicField name="*_ss" type="strings" indexed="true" stored="true"/>

<dynamicField name="*_l" type="long" indexed="true" stored="true"/>

<dynamicField name="*_ls" type="longs" indexed="true" stored="true"/>

<dynamicField name="*_t" type="text_general" indexed="true" stored="true"/>

<dynamicField name="*_txt" type="text_general" indexed="true" stored="true"/>

<dynamicField name="*_b" type="boolean" indexed="true" stored="true"/>

<dynamicField name="*_bs" type="booleans" indexed="true" stored="true"/>

<dynamicField name="*_f" type="float" indexed="true" stored="true"/>

<dynamicField name="*_fs" type="floats" indexed="true" stored="true"/>

<dynamicField name="*_d" type="double" indexed="true" stored="true"/>

<dynamicField name="*_ds" type="doubles" indexed="true" stored="true"/>
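To illustrate how a field name such as series_t ends up with a type even though it has no explicit entry, here is a rough Python sketch of the lookup. It is a simplified model, not Solr's actual implementation: resolve_type is a hypothetical helper, only three of the patterns above are included, and the first matching pattern wins.

```python
from fnmatch import fnmatchcase

# Explicit fields from section 3.2 and a subset of the dynamicField patterns above.
EXPLICIT = {
    "cat": "text_general",
    "name": "text_general",
    "price": "tdouble",
    "inStock": "boolean",
    "author": "text_general",
}
DYNAMIC = [("*_i", "int"), ("*_s", "string"), ("*_t", "text_general")]


def resolve_type(field):
    """Return the type for a field: explicit entry first, then a dynamic pattern."""
    if field in EXPLICIT:
        return EXPLICIT[field]
    for pattern, field_type in DYNAMIC:
        if fnmatchcase(field, pattern):
            return field_type
    return None  # no explicit entry and no matching pattern


for column in ("series_t", "sequence_i", "genre_s"):
    print(column, "->", resolve_type(column))
```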

Now that we have changed the configuration, we must stop and restart the server. To do this, issue the following command from the bin directory at the command line.

solr stop -all

The server will now stop. Next, start the server again by running the following command from the bin directory.

solr start

4. Index data

Apache Solr comes with a standalone Java program called SimplePostTool. This program is packaged as a JAR and can be found under example\exampledocs in the installation directory. Navigate to the example\exampledocs folder at the command line and type the following command. You'll see the list of options the tool accepts.

java -jar post.jar -h

In general, the usage format is as follows:

Usage: java [SystemProperties] -jar post.jar [-h|-] [<file|folder|url|arg> [<file|folder|url|arg>...]]

As we said earlier, we will index the data in the "books.csv" file. Navigate to solr-5.0.0\example\exampledocs at the command prompt and issue the following command.

java -Dtype=text/csv -Durl=http://localhost:8983/solr/jcg/update -jar post.jar books.csv

The SystemProperties used here are:

-Dtype - The type of the data file.
-Durl - The URL of the jcg core.

The file “books.csv” will now be indexed and the command prompt will show the following output.

SimplePostTool version 5.0.0
Posting files to [base] url http://localhost:8983/solr/jcg/update using content-type text/csv…
POSTing file books.csv to [base]
1 files indexed.
COMMITting Solr index changes to http://localhost:8983/solr/jcg/update…
Time spent: 0:00:00.647
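In essence, the tool sends an HTTP POST with the file contents to the update URL. The following Python sketch builds a comparable request with the standard library. It is an illustration under assumptions, not a replacement for post.jar: build_csv_update and send are hypothetical helpers, the CSV row is made-up sample data, and commit=true is added so Solr commits right after the upload.

```python
from urllib.request import Request, urlopen

UPDATE_URL = "http://localhost:8983/solr/jcg/update"

# Made-up sample row for illustration; the real data lives in books.csv.
SAMPLE_CSV = (
    b"id,cat,name,price,inStock,author\n"
    b"demo-1,book,Example Title,5.99,true,Jane Doe\n"
)


def build_csv_update(csv_bytes, url=UPDATE_URL):
    """Build (but do not send) a POST request carrying CSV data."""
    return Request(
        url + "?commit=true",  # ask Solr to commit after the upload
        data=csv_bytes,
        headers={"Content-Type": "text/csv"},
        method="POST",
    )


def send(request):
    """Send the request; this needs a running Solr instance."""
    with urlopen(request) as response:
        return response.status


request = build_csv_update(SAMPLE_CSV)
print(request.get_method(), request.full_url)
```

Calling send(request) only works while the Solr server from section 2 is running, so it is left out of the example run.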

Now navigate to the following URL and select the jcg core.

http://localhost:8983/solr

Take a closer look at the statistics section for the jcg core. The Num Docs parameter shows the number of rows indexed.

5. Access indexed documents

Apache Solr provides a REST-based API to access the data, and also provides different parameters to retrieve the data. We’ll show you some scenario-based queries.

5.1 Search by name

We will retrieve the details of a book by its name, using the following syntax. The "q" parameter in the URL is the query. Open the following URL in your browser.

http://localhost:8983/solr/jcg/select?q=name:"A Clash of Kings"

The output will be as shown below.

By name SOLR
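When the query is issued from a program instead of the browser address bar, the spaces, colon, and quotes in the q value need to be URL-encoded. A small sketch of that follows; phrase_query_url is a hypothetical helper, and it assumes the phrase itself contains no double quotes.

```python
from urllib.parse import urlencode

SELECT_URL = "http://localhost:8983/solr/jcg/select"


def phrase_query_url(field, phrase, base=SELECT_URL):
    """Build a select URL for an exact-phrase query on one field."""
    q = '{}:"{}"'.format(field, phrase)
    return base + "?" + urlencode({"q": q})


url = phrase_query_url("name", "A Clash of Kings")
print(url)
```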

5.2 First letter search

Now we will show you how to search for records when we know only the starting letter or word and don't remember the full title. We can retrieve the results with the following query.

http://localhost:8983/solr/jcg/select?q=name:"A"

The output will list all the books starting with the letter A.

The first letter of Solr

5.3 Search using wildcards

Solr supports wildcard search. The query below shows how to retrieve all the books that contain "of" in the title.

http://localhost:8983/solr/jcg/select?q=name:"*of"

Wildcard search for Solr

5.4 Search using conditions

Solr supports searching with conditions. We can set a condition on a query with the "fq" parameter. The query below shows how to find the books priced under $6.

http://localhost:8983/solr/jcg/select?q=*&fq=price:[0 TO 6]

The output will list only the books that cost less than $6.

Solr search criteria
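The same filter can be assembled in code. The sketch below is one possible way to do it: price_filter_url is a hypothetical helper, and it uses *:* as the match-all query, which is the conventional form.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

SELECT_URL = "http://localhost:8983/solr/jcg/select"


def price_filter_url(low, high, base=SELECT_URL):
    """Build a select URL that matches everything, filtered to a price range."""
    params = {
        "q": "*:*",  # match all documents
        "fq": "price:[{} TO {}]".format(low, high),  # inclusive range filter
    }
    return base + "?" + urlencode(params)


url = price_filter_url(0, 6)
print(url)
print(parse_qs(urlsplit(url).query))
```

Square brackets make the range inclusive at both ends, so a book priced at exactly 6 would also match. Writing the filter as price:[0 TO 6} excludes the upper bound.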

6. Solr’s client API

There are several client APIs available for connecting to a Solr server. Some of the most widely used are listed below.

SolRuby -- To connect from Ruby
SolPHP -- To connect from PHP
PySolr -- To connect from Python
SolPerl -- To connect from Perl
SolrJ -- To connect from Java
SolrSharp -- To connect from C#

In addition, Solr provides a REST-based API that JavaScript can use directly.
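Any language that can parse JSON can consume that API without a dedicated client. As a rough sketch in Python, the snippet below pulls the hit count and book names out of a select response requested with wt=json. The sample text is made up for illustration and only mirrors the usual response layout (response, numFound, docs); summarize is a hypothetical helper.

```python
import json

# Made-up sample response for illustration; real responses come from /select.
sample = """
{"responseHeader": {"status": 0, "QTime": 1},
 "response": {"numFound": 2, "start": 0, "docs": [
   {"id": "demo-1", "name": "Example Title One", "price": 4.5},
   {"id": "demo-2", "name": "Example Title Two", "price": 5.25}]}}
"""


def summarize(response_text):
    """Return the hit count and the list of names from a select response."""
    body = json.loads(response_text)["response"]
    return body["numFound"], [doc["name"] for doc in body["docs"]]


count, names = summarize(sample)
print(count, names)
```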

PS: Because the article is long, some paragraphs were translated with translation software and then revised by hand, which should not affect understanding of the content. Readers who have time can also read the English version. Reference: https://examples.javacodegeek…


