CDH + HBase Indexer + Solr: Creating a secondary index for HBase data

[TOC]

0. Preliminary notes

Several names in this article are user-defined. For convenience, the paths and names below are the ones used in the author's installation environment.

  • Collection name: humanCollection
  • The configuration file path is /root/human
  • ZooKeeper is installed on the following machines: master, slave1, slave2, slave3, slave4
  • CDH version: 5.15.0
  • HBase version: 1.2.0
  • Lily HBase Indexer version: 1.5
  • Solr version: 4.10.3
1. Create the HBase table, add data, and make sure REPLICATION is enabled on the table (REPLICATION_SCOPE => 1 enables replication, 0 disables it; the default is 0). A concrete sketch follows the commands below.
  • Create a table

create 'table_name', {NAME => 'column_family_name', REPLICATION_SCOPE => 1}, {NUMREGIONS => 5, SPLITALGO => 'HexStringSplit'}

  • If the table already exists:
disable 'table_name'
alter 'table_name', {NAME => 'column_family_name', REPLICATION_SCOPE => 1}
enable 'table_name'
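
As a concrete sketch for this article's environment (the table/column-family name human and the columns features_type and createTime are assumptions taken from the morphline mappings in step 8, not prescribed by HBase):

create 'human', {NAME => 'human', REPLICATION_SCOPE => 1}
put 'human', 'row-0001', 'human:features_type', 'face'
put 'human', 'row-0001', 'human:createTime', '2018-12-10 09:30:00'
describe 'human'   # verify REPLICATION_SCOPE => '1' on the column family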
2. Enable replication in HBase (in Cloudera Manager, search for "replication" in the HBase service configuration and check Enable Replication)

3. Prepare a Chinese word-segmentation package (if needed)

1. Download the Chinese word-segmentation package matching your Solr and CDH versions (here Solr is 4.10.3 and CDH is 5.15.0):

lucene-analyzers-smartcn-4.10.3-cdh5.15.0.jar

2. Distribute the jar to the Solr- and YARN-related directories on every machine in the cluster

root@master:~# cp lucene-analyzers-smartcn-4.10.3-cdh5.15.0.jar /opt/cloudera/parcels/CDH/lib/hadoop-yarn
root@master:~# cp lucene-analyzers-smartcn-4.10.3-cdh5.15.0.jar /opt/cloudera/parcels/CDH/lib/solr/webapps/solr/WEB-INF/lib
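
To distribute the jar to every node, a small loop helps (a sketch assuming the host names from section 0 and passwordless SSH; the target paths are the two from the cp commands above):

for h in master slave1 slave2 slave3 slave4; do
  scp lucene-analyzers-smartcn-4.10.3-cdh5.15.0.jar root@$h:/opt/cloudera/parcels/CDH/lib/hadoop-yarn/
  scp lucene-analyzers-smartcn-4.10.3-cdh5.15.0.jar root@$h:/opt/cloudera/parcels/CDH/lib/solr/webapps/solr/WEB-INF/lib/
done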

3. Restart the affected services for the change to take effect

4. Generate the SolrCloud instance-directory configuration files

solrctl --zk master:2181,slave1:2181,slave2:2181,slave3:2181,slave4:2181/solr instancedir --generate /root/human

** A conf folder will be generated in /root/human

5. Modify the schema.xml file in the conf directory. The field names correspond to the columns stored in HBase; pay special attention to time/date types.

Details on configuration in schema.xml

<?xml version="1.0" encoding="UTF-8" ?>
<schema name="example" version="1.5">
<fields>
	<!-- The name of each field corresponds to the outputField property in the morphlines.conf file that we will modify later, so it can be regarded as a value taken from HBase; it is therefore recommended to combine the table name and column family in the field name. The type used must be defined under fieldType. id, _version_, and text are required. -->
    
    <!-- Required fields -->
	<field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false" /> 
   <field name="_root_" type="string" indexed="true" stored="false"/>
   <field name="name" type="text_general" indexed="true" stored="true"/>
   <field name="content" type="text_general" indexed="false" stored="true" multiValued="true"/>
   <field name="text" type="text_general" indexed="true" stored="false" multiValued="true"/>
   <field name="_version_" type="long" indexed="true" stored="true"/>
    
    <!-- Database fields -->
    <field name="your_field_name" type="text_ch" indexed="true" stored="true" required="false" multiValued="false"/> <!-- indexed: whether to create an index; stored: whether to store the raw value (set to false if you do not need the field value returned) -->
</fields>
<!-- The unique key, like a primary key; it can also be generated automatically by Solr -->
<uniqueKey>id</uniqueKey>
<types>
  <!-- DateField was removed in Solr 5.0 and replaced with TrieDateField -->
  <fieldType name="date_range" class="solr.DateField"/>
  <fieldType name="string" class="solr.StrField" sortMissingLast="true" />

    <fieldType name="boolean" class="solr.BoolField" sortMissingLast="true"/>
    <fieldType name="int" class="solr.TrieIntField" precisionStep="0" positionIncrementGap="0"/>
    <fieldType name="float" class="solr.TrieFloatField" precisionStep="0" positionIncrementGap="0"/>
    <fieldType name="long" class="solr.TrieLongField" precisionStep="0" positionIncrementGap="0"/>
    <fieldType name="double" class="solr.TrieDoubleField" precisionStep="0" positionIncrementGap="0"/>

    <fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>
    <fieldType name="text_en" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.StopFilterFactory"
                ignoreCase="true"
                words="lang/stopwords_en.txt"
                />
        <filter class="solr.LowerCaseFilterFactory"/>
	<filter class="solr.EnglishPossessiveFilterFactory"/>
        <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
        <filter class="solr.PorterStemFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
        <filter class="solr.StopFilterFactory"
                ignoreCase="true"
                words="lang/stopwords_en.txt"
                />
        <filter class="solr.LowerCaseFilterFactory"/>
	<filter class="solr.EnglishPossessiveFilterFactory"/>
        <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
        <filter class="solr.PorterStemFilterFactory"/>
      </analyzer>
    </fieldType>
    <fieldType name="text_en_splitting" class="solr.TextField" positionIncrementGap="100" autoGeneratePhraseQueries="true">
      <analyzer type="index">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.StopFilterFactory"
                ignoreCase="true"
                words="lang/stopwords_en.txt"
                />
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
        <filter class="solr.PorterStemFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
        <filter class="solr.StopFilterFactory"
                ignoreCase="true"
                words="lang/stopwords_en.txt"
                />
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
        <filter class="solr.PorterStemFilterFactory"/>
      </analyzer>
    </fieldType>
    <fieldType name="text_ch" class="solr.TextField" positionIncrementGap="100">  
    <analyzer type="index"> 
      	<!-- solr.SmartChineseSentenceTokenizerFactory belongs to lucene-analyzers-smartcn-4.10.3-cdh5.15.0.jar; the full class names are org.apache.lucene.analysis.cn.smart.SmartChineseSentenceTokenizerFactory and org.apache.lucene.analysis.cn.smart.SmartChineseWordTokenFilterFactory. Solr 6.0 removed these classes in favor of HMMChineseTokenizerFactory, configured as follows:
           <analyzer>
             <tokenizer class="solr.HMMChineseTokenizerFactory"/>
             <filter class="solr.StopFilterFactory" words="org/apache/lucene/analysis/cn/smart/stopwords.txt"/>
             <filter class="solr.LowerCaseFilterFactory"/>
           </analyzer> -->
      <tokenizer class="solr.SmartChineseSentenceTokenizerFactory"/>  
      <filter class="solr.SmartChineseWordTokenFilterFactory"/>  
    </analyzer>  
  </fieldType>
</types>
</schema>

Optional: modify the solrconfig.xml file in the conf directory to enable hard auto-commit. This costs some performance; configure as required.

<autoCommit>
       <maxTime>${solr.autoCommit.maxTime:60000}</maxTime>
       <openSearcher>true</openSearcher>
 </autoCommit>

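A common companion setting, shown only as a hedged sketch (the 15-second value is illustrative): soft auto-commit makes new documents searchable sooner than the 60-second hard commit above, without the full flush cost.

<autoSoftCommit>
       <maxTime>${solr.autoSoftCommit.maxTime:15000}</maxTime>
 </autoSoftCommit>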
  • If you later modify the configuration file schema.xml under /root/human/conf, you need to upload it again and reload the collection:
solrctl --zk master:2181,slave1:2181,slave2:2181,slave3:2181/solr instancedir --update humanCollection /root/human/conf
solrctl --zk master:2181,slave1:2181,slave2:2181,slave3:2181/solr collection --reload humanCollection
6. Initialize the Collection instance and upload the configuration file to ZooKeeper

solrctl --zk master:2181,slave1:2181,slave2:2181,slave3:2181,slave4:2181/solr instancedir --create humanCollection /root/human/conf

Check the nodes from the ZK client: ls /solr/configs/humanCollection shows the solrconfig.xml and schema.xml configuration files under the node, and humanCollection appears under ls /solr/collections:

root@master:~# cd /opt/cloudera/parcels/CDH/lib/zookeeper/bin
root@master:/opt/cloudera/parcels/CDH/lib/zookeeper/bin# ./zkCli.sh
...
[zk: localhost:2181(CONNECTED) 0] ls /solr/configs/humanCollection
[mapping-FoldToASCII.txt, currency.xml, protwords.txt, scripts.conf, synonyms.txt, stopwords.txt, _schema_analysis_synonyms_english.json, velocity, admin-extra.html, solrconfig.xml.secure, update-script.js, _schema_analysis_stopwords_english.json, solrconfig.xml, admin-extra.menu-top.html, elevate.xml, schema.xml, clustering, spellings.txt, xslt, mapping-ISOLatin1Accent.txt, _rest_managed.json, lang, admin-extra.menu-bottom.html]
[zk: localhost:2181(CONNECTED) 1] ls /solr/collections
7. Create a collection
  • With the default parameters

solrctl collection --create humanCollection

  • If you want to spread the data across nodes for storage and retrieval, create multiple shards by passing extra parameters:

solrctl --zk master:2181,slave1:2181,slave2:2181,slave3:2181,slave4:2181/solr collection --create humanCollection -s 4 -r 1 -m 10

Where:

-s sets the number of shards to 4, for a SolrCloud of four machines

-r sets the replication factor to 1, i.e. one replica

-m sets maxShardsPerNode, the maximum number of shards per node (the default is 1)

-c specifies the name of the configuration under the solr/configs node in ZK

-a allows replicas to be added automatically (autoAddReplicas)

** When creating a collection, shard, replication, and maxShardsPerNode must be set according to the actual cluster, otherwise an error is reported. The relevant quantities are numShards, replicationFactor, and liveSolrNode (the number of live Solr nodes). A normal SolrCloud cluster does not place multiple replicas of the same shard on the same node, so when maxShardsPerNode = 1 and numShards * replicationFactor > liveSolrNode, an error is reported. The following condition must therefore be met: numShards * replicationFactor <= liveSolrNode * maxShardsPerNode, that is, s * r <= liveSolrNode * m.
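
For example, assuming liveSolrNode = 4 (the four-machine SolrCloud mentioned above), the command above gives s * r = 4 * 1 = 4 <= 4 * 10 = 40, so creation succeeds; with -m left at its default of 1 and -r 2, s * r = 4 * 2 = 8 > 4 * 1 = 4 and creation would fail.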

  • View the collection created:

solrctl --zk master:2181,slave1:2181,slave2:2181,slave3:2181,slave4:2181/solr collection --list

If Solr has not been restarted since the Chinese word-segmentation jar was deployed, you must restart it now; otherwise the Chinese analyzer causes an error like the following when the collection is created:

<?xml version="1.0" encoding="utf-8"?>
<response> 
  <lst name="responseHeader"> 
    <int name="status">0</int>  
    <int name="QTime">1255</int> 
  </lst>  
  <lst name="failure"> 
    <str>org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException:Error CREATEing SolrCore 'humanCollection_shard1_replica1': Unable to create core [humanCollection_shard1_replica1] Caused by: solr.SmartChineseSentenceTokenizerFactory</str>  
    <str>org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException:Error CREATEing SolrCore 'humanCollection_shard3_replica1': Unable to create core [humanCollection_shard3_replica1] Caused by: solr.SmartChineseSentenceTokenizerFactory</str>  
    <str>org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException:Error CREATEing SolrCore 'humanCollection_shard4_replica1': Unable to create core [humanCollection_shard4_replica1] Caused by: solr.SmartChineseSentenceTokenizerFactory</str>  
    <str>org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException:Error CREATEing SolrCore 'humanCollection_shard2_replica1': Unable to create core [humanCollection_shard2_replica1] Caused by: solr.SmartChineseSentenceTokenizerFactory</str> 
  </lst> 
</response>
8. Create the Morphline configuration file

Open the CM management page (http://IP:7180/) and select Key-Value Store Indexer, then Configuration, then Morphlines File.

SOLR_LOCATOR : {
  collection : humanCollection
  zkHost : "$ZK_HOST" 
}
morphlines : [
{
id : morphlineOfHuman 								
importCommands : ["org.kitesdk.morphline.**", "com.ngdata.**"]	
commands : [                    
  {
    extractHBaseCells {
      mappings : [ 
        {
          inputColumn : "human:features_type"	
          outputField : "features_type" 
          type : string  
          source : value 
        },
        { 
		  inputColumn : "human:createTime" 
          outputField : "createTime" 
          type : string  
          source : value 
        }
      ]
    }
  }
  {
	convertTimestamp {
	  field : createTime
	  inputFormats : ["yyyy-MM-dd HH:mm:ss"]
	  inputTimezone : Asia/Shanghai
	  outputFormat : "yyyy-MM-dd'T'HH:mm:ss.SSS'Z'"
	  outputTimezone : Asia/Shanghai
	}
  }
  { logDebug { format : "output record: {}", args : ["@{}"] } }
]
}
]

  • id: the identifier of this morphline within the file.
  • importCommands: the command packages to import.
  • extractHBaseCells: the command that reads HBase cells and writes them into a SolrInputDocument object; it must contain zero or more mappings command objects.
  • mappings: the field mappings, used to specify HBase column qualifiers.
  • inputColumn: the HBase column to feed into Solr. The value contains the column family and the column qualifier separated by ':'; the qualifier may also be the wildcard '*' (see the sketch after this list).
  • outputField: the name of the field in the record output by the morphline; it must match a field name in Solr's schema.xml.
  • type: the data type used when decoding the HBase value. HBase stores everything as byte[], while Solr indexes text, so a method is needed to convert byte[] to the actual data type; that is what the type parameter does. Supported types are byte, int, long, string, boolean, float, double, short, and bigdecimal. You can also supply a custom type by implementing the com.ngdata.hbaseindexer.parse.ByteArrayValueMapper interface.
  • source: which part of the HBase KeyValue is used as the index input. The options are value and qualifier: value uses the HBase column value, qualifier uses the column qualifier.
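
For instance, a hedged sketch (not part of this article's setup) of a wildcard mapping that indexes every qualifier of the human column family into dynamically named fields:

extractHBaseCells {
  mappings : [
    {
      # '*' matches every qualifier in the family; the matched qualifier
      # replaces the '*' in outputField (requires a matching dynamic field in schema.xml)
      inputColumn : "human:*"
      outputField : "human_*"
      type : string
      source : value
    }
  ]
}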

9. Create the Lily HBase Indexer configuration

Create a morphline-hbase-mapper-humanCollection.xml file in /root/human; each collection has its own morphline-hbase-mapper-<collection>.xml file. The morphlineId must not be the same as the HBase table name.

<?xml version="1.0" encoding="UTF-8"?>
<!-- table: the name of the HBase table to index; mapper: the class that reads the specified morphline configuration, fixed as MorphlineResultToSolrMapper -->
<indexer table="the table name in HBase" mapper="com.ngdata.hbaseindexer.morphline.MorphlineResultToSolrMapper" mapping-type="row">
<!-- The param named morphlineFile specifies the path of the morphline configuration file: an absolute path, or a path relative to the local directory. If morphlines.conf is managed by Cloudera Manager, just write morphlines.conf -->
<param name="morphlineFile" value="morphlines.conf"/>
<!-- The value of morphlineId is the id of the morphline defined in the morphlines.conf managed by the Key-Value Store Indexer. morphlineId must not be the same as the HBase table name -->
<param name="morphlineId" value="morphlineOfHuman"/>
</indexer>
10. Register the Lily HBase Indexer Configuration and Lily HBase Indexer Service

When the contents of the Lily HBase Indexer configuration XML file are satisfactory, register it with the Lily HBase Indexer Service: this uploads the configuration XML file to ZooKeeper for the given SolrCloud collection.

hbase-indexer add-indexer \
--name humanIndexer \
--indexer-conf /root/human/morphline-hbase-mapper-humanCollection.xml \
--connection-param solr.zk=master:2181,slave1:2181,slave2:2181,slave3:2181/solr \
--connection-param solr.collection=humanCollection \
--zookeeper master:2181,slave1:2181,slave2:2181,slave3:2181

Then run hbase-indexer list-indexers --zookeeper master:2181,slave1:2181,slave2:2181,slave3:2181,slave4:2181 to check whether the indexer was added successfully.

root@master:/opt/cloudera/parcels/CDH/lib/zookeeper/bin# hbase-indexer list-indexers --zookeeper master:2181,slave1:2181,slave2:2181,slave3:2181,slave4:2181
Number of indexes: 1

humanCollectionIndexer
  + Lifecycle state: ACTIVE
  + Incremental indexing state: SUBSCRIBE_AND_CONSUME
  + Batch indexing state: INACTIVE
  + SEP subscription ID: Indexer_humanCollectionIndexer
  + SEP subscription timestamp: 2018-12-10T09:36:29.514+08:00
  + Connection type: solr
  + Connection params:
    + solr.zk = master:2181,slave1:2181,slave2:2181,slave3:2181,slave4:2181/solr
    + solr.collection = humanCollection
  + Indexer config:
      270 bytes, use -dump to see content
  + Indexer component factory: com.ngdata.hbaseindexer.conf.DefaultIndexerComponentFactory
  + Additional batch index CLI arguments:
      (none)
  + Default additional batch index CLI arguments:
      (none)
  + Processes
    + 4 running processes
    + 0 failed processes

Problems:

www.cnblogs.com/husky/p/sol…

  • (1) If creating an indexer fails because an indexer with the same name already exists, delete the old one first: hbase-indexer delete-indexer --name $IndexerName --zookeeper master:2181,slave1:2181,slave2:2181,slave3:2181,slave4:2181
  • (2) Use hbase-indexer list-indexers --zookeeper master:2181,slave1:2181,slave2:2181,slave3:2181,slave4:2181 to check whether the indexer was created successfully

A common cause of failure is configuring only one ZooKeeper node.

Incorrect example:

hbase-indexer add-indexer \
--name bqjrIndexer \
--indexer-conf $HOME/hbase-indexer/bqjr/morphline-hbase-mapper.xml \
--connection-param solr.zk=bqbpm2.bqjr.cn:2181/solr \
--connection-param solr.collection=bqjr \
--zookeeper bqbpm2.bqjr.cn:2181

Correct example:

hbase-indexer add-indexer \
--name bqjrIndexer \
--indexer-conf $HOME/hbase-indexer/bqjr/morphline-hbase-mapper.xml \
--connection-param solr.zk=bqbps1.bqjr.cn:2181,bqbpm1.bqjr.cn:2181,bqbpm2.bqjr.cn:2181/solr \
--connection-param solr.collection=bqjr \
--zookeeper bqbps1.bqjr.cn:2181,bqbpm1.bqjr.cn:2181,bqbpm2.bqjr.cn:2181
  • (3) If the indexer needs to be rebuilt, delete it as shown below. If it cannot be deleted this way, manually remove the node information in ZK (ls /ngdata/hbaseindexer; see the sketch after the command below).
hbase-indexer delete-indexer -n smsdayIndexer --zookeeper nn1.hadoop:2181
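A hedged sketch of the manual ZK cleanup (the exact child path varies; inspect with ls before removing anything, and treat the node name as a placeholder):

root@master:/opt/cloudera/parcels/CDH/lib/zookeeper/bin# ./zkCli.sh -server master:2181
[zk: master:2181(CONNECTED) 0] ls /ngdata/hbaseindexer
[zk: master:2181(CONNECTED) 1] rmr /ngdata/hbaseindexer/<node-for-your-indexer>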
11. Synchronize data

Add data to HBase and go to the Solr admin page: http://IP:8983/solr (on CDH the port may also be 8984 or 8985)

Enter HBase_Indexer_Test_cf1_name:xiaogang in the q (query) box to see the rowkeys fetched from HBase
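
The same query can also be issued against the Solr HTTP API; a hedged sketch using this article's collection and the features_type field (host and value are illustrative):

curl 'http://master:8983/solr/humanCollection/select?q=features_type:face&wt=json&indent=true'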

12. Synchronize indexes in batches

Looking carefully at step 11, we find a problem: only data inserted after the indexer was registered gets indexed. What about the data already stored in HBase?

The batch job must be run with a morphlines.conf file available. Run find / | grep 'morphlines.conf$' to locate one; usually we choose the file belonging to the most recent process.

If you do not have morphlines.conf, create a new morphlines.conf file and add the contents of the morphline file configured in CM:

morphlines : [
{
id : morphlineOfHuman
importCommands : ["org.kitesdk.morphline.**", "com.ngdata.**"]	
commands : [                    
  {
    extractHBaseCells { 
      mappings : [ 
        {
          inputColumn : "human:features_type"
          outputField : "features_type"
          type : string
          source : value
        }
        { 
		  inputColumn : "human:createTime" 
          outputField : "createTime" 
          type : string  
          source : value 
        }
      ]
    }
  }
  {
	convertTimestamp {
	  field : createTime
	  inputFormats : ["yyyy-MM-dd HH:mm:ss"]
	  inputTimezone : Asia/Shanghai
	  outputFormat : "yyyy-MM-dd'T'HH:mm:ss.SSS'Z'"
	  outputTimezone : Asia/Shanghai
	}
  }
  { logDebug { format : "output record: {}", args : ["@{}"] } }
]
}
]

Then, execute

hadoop --config /etc/hadoop/conf.cloudera.yarn \
jar /opt/cloudera/parcels/CDH/lib/hbase-solr/tools/hbase-indexer-mr-*-job.jar \
-D 'mapred.child.java.opts=-Xmx1024m' \
--conf /etc/hbase/conf/hbase-site.xml \
--log4j /opt/cloudera/parcels/CDH/share/doc/search*/examples/solr-nrt/log4j.properties \
--hbase-indexer-file /root/human/morphline-hbase-mapper-humanCollection.xml \
--morphline-file /root/human/morphlines.conf \
--verbose \
--go-live \
--zk-host master:2181,slave1:2181,slave2:2181,slave3:2181/solr \
--collection humanCollection \
--libjars /home/hadoop/package/lucene-analyzers-smartcn-4.10.3-cdh5.15.0.jar

The --libjars line is needed only when using the Chinese word-segmentation analyzer; omit it otherwise.

Reference: Lily HBase Batch Indexing for Cloudera Search

Problems:

  • The morphlines.conf file cannot be found. You must pass the full path of both the morphlines.conf file and the morphline-hbase-mapper.xml file. To locate morphlines.conf: find / | grep 'morphlines.conf$'

Typically we choose the one belonging to the latest process, and either copy it, e.g.

/opt/cm-5.7.0/run/cloudera-scm-agent/process/1386-ks_indexer-HBASE_INDEXER/morphlines.conf

or pass it directly:

--morphline-file /opt/cm-5.7.0/run/cloudera-scm-agent/process/1501-ks_indexer-HBASE_INDEXER/morphlines.conf

Execute the following command

hadoop --config /etc/hadoop/conf \
jar /opt/cloudera/parcels/CDH/lib/hbase-solr/tools/hbase-indexer-mr-1.5-cdh5.7.0-job.jar \
--conf /etc/hbase/conf/hbase-site.xml \
--hbase-indexer-file $HOME/hbase-indexer/bqjr/morphline-hbase-mapper.xml \
--morphline-file /opt/cm-5.7.0/run/cloudera-scm-agent/process/1629-ks_indexer-HBASE_INDEXER/morphlines.conf \
--zk-host bqbpm1.bqjr.cn:2181,bqbps1.bqjr.cn:2181,bqbpm2.bqjr.cn:2181/solr \
--collection bqjr \
--go-live
  • solrconfig.xml cannot be found

Add the --reducers 0 option:

hadoop --config /etc/hadoop/conf \
jar /opt/cloudera/parcels/CDH/lib/hbase-solr/tools/hbase-indexer-mr-job.jar \
--conf /etc/hbase/conf/hbase-site.xml \
--hbase-indexer-file $HOME/hbase-indexer/bqjr/morphline-hbase-mapper.xml \
--morphline-file /opt/cm-5.7.0/run/cloudera-scm-agent/process/1501-ks_indexer-HBASE_INDEXER/morphlines.conf \
--zk-host bqbpm2.bqjr.cn:2181/solr \
--collection bqjr \
--reducers 0 \
--go-live
13. Set up multiple indexers

Each HBase table gets its own Solr collection index. Each index corresponds to a Lily HBase Indexer configuration file (morphline-hbase-mapper.xml) and a morphline configuration; all the morphlines live in the single morphlines.conf managed by the CDH Key-Value Store Indexer console and are distinguished by id. How do we associate each indexer with its collection? Define one morphline per table, as below, and reference each by its morphlineId (a sketch of the corresponding mapper file follows the configuration):

SOLR_LOCATOR :{
    # ZooKeeper ensemble
    zkHost :"$ZK_HOST"
}
morphlines :[
	{
	id : XDGL_ACCT_FEE_Map
	importCommands :["org.kitesdk.**","com.ngdata.**"]
	commands :[
        {
        extractHBaseCells {
        mappings :[
			{
        	inputColumn :"cf1:ETL_IN_DT"
        	outputField :"XDGL_ACCT_FEE_cf1_ETL_IN_DT"
        	type :string
        	source : value
			}
		]
		}
	}
	{ logDebug { format :"output record: {}", args :["@{}"]}}
	]
},
{
    id : XDGL_ACCT_PAYMENT_LOG_Map
    importCommands :["org.kitesdk.**","com.ngdata.**"]
    commands :[
		{
        extractHBaseCells {
        mappings :[
			{
            inputColumn :"cf1:ETL_IN_DT"
            outputField :"XDGL_ACCT_PAYMENT_LOG_cf1_ETL_IN_DT"
            type :string
            source : value
			}
		]
		}
	}
	{ logDebug { format :"output record: {}", args :["@{}"]}}
	]
}
]
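Each table then gets its own mapper file pointing at its morphlineId, while the target collection is chosen when registering the indexer (the --connection-param solr.collection argument from step 10). A hedged sketch of the mapper file for the first table above:

<indexer table="XDGL_ACCT_FEE" mapper="com.ngdata.hbaseindexer.morphline.MorphlineResultToSolrMapper" mapping-type="row">
  <param name="morphlineFile" value="morphlines.conf"/>
  <param name="morphlineId" value="XDGL_ACCT_FEE_Map"/>
</indexer>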
14. Additional commands

solrctl

solrctl instancedir --list
solrctl collection --list
  • Update the collection configuration
	solrctl instancedir --update User $HOME/hbase-indexer/User
	solrctl collection --reload User
  • Delete instancedir
	solrctl --zk master:2181,slave1:2181,slave2:2181,slave3:2181,slave4:2181/solr instancedir --delete humanCollection
  • Delete the collection
	solrctl --zk master:2181,slave1:2181,slave2:2181,slave3:2181,slave4:2181/solr collection --delete humanCollection
  • Delete all doc from collection
	solrctl collection --deletedocs User
  • Delete the User configuration directory
	rm -rf $HOME/hbase-indexer/User

hbase-indexer

  • If morphline-hbase-mapper.xml is changed, the indexer must be updated
	hbase-indexer update-indexer -n userIndexer
  • Remove the index
	hbase-indexer delete-indexer -n userIndexer --zookeeper master:2181,slave:2181
  • View index
	hbase-indexer list-indexers
[hadoop@db1 lib]$ solrctl --help
usage: /opt/cloudera/parcels/CDH-5.5.1-1.cdh5.5.1.p0.11/bin/../lib/solr/bin/solrctl.sh [options] command [command-arg] [command [command-arg]] ...

Optional parameters:
  --solr     specifies SolrCloud's web API; required if the command is run on a node outside the SolrCloud cluster
  --zk       specifies the solr directory of the ZK cluster
  --help     prints the help information
  --quiet    runs in silent mode

Available commands:
  init [--force]      initializes the configuration
  instancedir         maintains the physical directory; optional parameters:
      --generate path
      --create name path
      --update name path
      --get name path
      --delete name
      --list
  collection          maintains collections; optional parameters:
      --create name -s <numShards>
          [-a  create collection with autoAddReplicas=true]
          [-c <collection.configName>]
          [-r <replicationFactor>]
          [-m <maxShardsPerNode>]
          [-n <createNodeSet>]
      --delete name       deletes a collection
      --reload name       reloads a collection
      --stat name         outputs SolrCloud-specific run-time information for a collection
      --list              lists all collections registered in SolrCloud
      --deletedocs name   purges all indexed documents from a collection
  core                maintains cores; optional parameters:
      --create name [-p name=value]
      --reload name       reloads a core
      --unload name       unloads a core
      --status name       prints status of a core
  cluster             maintains cluster configuration information; optional parameters:
      --get-solrxml file
      --put-solrxml file
15. Problems
  • (1) The HBase Indexer exits on its own after running for a while

There are many possible causes. One is the mapping-file mismatch described earlier; another is a memory overflow.

Check the error log; if it is a memory-overflow problem, the memory setting needs to be adjusted.

Solution: in Configuration, under Resource Management, change the stack size (bytes) from 50 MB to 1 GB.

  • (2) Data synchronized by HBaseIndexer is inconsistent with Solr

The first cause: a Spark synchronization job and the HBase Indexer run at the same time while the data is constantly updated; clearing data during the batch insert deletes the data that the HBase Indexer had already inserted.

The second: add read-row="never" to the <indexer> element so the indexer does not re-read rows from HBase (see the sketch below).

See Stack Overflow for details.
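
A hedged sketch of where the attribute goes in morphline-hbase-mapper.xml:

<indexer table="human" mapper="com.ngdata.hbaseindexer.morphline.MorphlineResultToSolrMapper"
         mapping-type="row" read-row="never">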

16. Reference materials:

1.Solr official website documents

2.Solr Wiki

3.Solr Wiki DateRangeField

4. How to use Lily HBase Indexer to index HBase data in Solr

5. Using Solr in CDH to implement an HBase secondary index

6. Query HBase+Solr secondary index in real time

7. How to use Solr in CDH to establish full-text index of JSON data in HDFS

8. Configuring HBase secondary indexes based on Solr

9.HBase secondary index creation solutions (such as Solr+ HBase solution)

10. Hbase secondary index based on Solr

11. Solr and I (V): a detailed explanation of the configuration in schema.xml

12. Solr and I (VI): configuring the Chinese word analyzer IK in Solr 6.0