Maintenance of the Lucene.NET open-source project for .NET has been slow. There are a great many packages to port, so the maintainers can only make gradual progress. A few articles online describe modified versions of 3.0.3 ported to .NET Core, but there are very few of them, and any reference helps. So I am writing up today's trip into the pitfall, for everyone's reference.

1. Introduction to Lucene

As usual, this is where I would paste a long description. I'll skip the 8,000 words and leave them to your imagination.

In short, Lucene is the Apache Software Foundation's open-source full-text search library, known for its powerful search capabilities. Lucene.NET is the port for the .NET platform and currently supports both .NET Framework and .NET Core.

The latest version is 4.8.0. Other versions are not available because of how the port is being done: the current port targets 4.8, which is basically enough for full-text search.

So cool: I feel like if I master Lucene, I master search engines!

Below is an excerpt from the project's own notes. The porting work is already very hard and I don't want to trouble the maintainers. However, today's big pitfall is inseparable from it, as you'll see later on.

ICU4J is Lucene's largest dependency. Various alternatives were tried:

ICU4NET and icu-dotnet. But we ran into several issues:

  1. lack of support for 32/64-bit
  2. lack of support for the .NET Standard platform
  3. missing features, and problems encountered in trying to implement them
  4. lack of thread safety

We finally completed a direct port of about 40% of ICU4J's functionality to support Lucene.NET. The project is called ICU4N and is developed in a separate GitHub repository. Several pressing issues there still need to be resolved before Lucene.NET can go into production.

2. SmartChineseAnalyzer

When I first started with Lucene, I used the PanGu word segmenter. A few years on, Lucene now ships a good Chinese analyzer of its own, SmartChineseAnalyzer, whose segmentation dictionary can be customized. A few other recommended segmenters:

  1. PanGu segmenter (can be used directly)
  2. Jieba

3. An example of building an index

In 4.8, index storage is managed through Directory classes. Here MMapDirectory, which reads index files through memory-mapped I/O, is used to build the index.

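using System;
using System.IO;
using System.Text;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Cn.Smart;
using Lucene.Net.Analysis.Core;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.Store;
using Lucene.Net.Util;

// Memory-mapped index directory with OS-level (native) file locking.
// (MMapDirectory.Open is the inherited static FSDirectory.Open, which picks the
// implementation itself; the constructor guarantees an MMapDirectory.)
var dir = new MMapDirectory(new DirectoryInfo(@"E:\lucene"), new NativeFSLockFactory());
//var dir = new RAMDirectory();  // in-memory alternative for quick tests
var t = "cn";
Analyzer analyzer;
switch (t)
{
    case "std":
        analyzer = new StandardAnalyzer(LuceneVersion.LUCENE_48);
        break;
    case "cn":
        // true = use the built-in Chinese stop-word list
        analyzer = new SmartChineseAnalyzer(LuceneVersion.LUCENE_48, true);
        break;
    case "ws":
        analyzer = new WhitespaceAnalyzer(LuceneVersion.LUCENE_48);
        break;
    default:
        throw new NotImplementedException();
}

// Clear a stale lock (e.g. left by a crashed process) BEFORE opening the writer;
// doing it afterwards would forcibly remove the lock held by our own writer.
if (IndexWriter.IsLocked(dir))
{
    IndexWriter.Unlock(dir);  // forcibly unlock
}

// Prepare data
IndexWriterConfig iwc = new IndexWriterConfig(LuceneVersion.LUCENE_48, analyzer);
// Buffered documents are flushed to a new segment in the Directory once they use
// more than RAMBufferSizeMB of memory...
iwc.RAMBufferSizeMB = 32;
// ...or once MaxBufferedDocs documents are buffered, whichever happens first.
iwc.MaxBufferedDocs = 32;
iwc.MergePolicy = new TieredMergePolicy();
iwc.OpenMode = OpenMode.CREATE_OR_APPEND;
IndexWriter writer = new IndexWriter(dir, iwc);

//writer.Commit();

Document doc = new Document();

// Stable key: with Guid.NewGuid() every run would add a new, never-replaced document
string id = "1";

// StringField: indexed as a single token, not analyzed
Field pathField = new StringField("id", id, Field.Store.YES);
doc.Add(pathField);

// TextField: indexed and tokenized by the analyzer.
// GetContent() is the author's own helper returning the text to index (not shown).
Field contentField = new TextField("bb", GetContent(), Field.Store.YES);
doc.Add(contentField);

Field dblField = new DoubleField("cc", 1000.12d, Field.Store.YES);
doc.Add(dblField);

string s = "adfadfasfwerewre";
Field binaryField = new StoredField("bin", new BytesRef(Encoding.UTF8.GetBytes(s)));
doc.Add(binaryField);

// UpdateDocument deletes every document whose "id" term equals id, then adds doc.
// Calling AddDocument(doc) as well would index the same document a second time.
writer.UpdateDocument(new Term("id", id), doc);

// Optional: Commit() below flushes buffered documents anyway
writer.Flush(triggerMerge: false, applyAllDeletes: false);
// Commit is comparatively slow (it syncs the index files to disk), so choose when to call it
writer.Commit();
writer.Dispose();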
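UpdateDocument and AddDocument differ in a way that is easy to miss. As I understand the Lucene API, UpdateDocument(term, doc) deletes every document containing term and then adds doc, while AddDocument(doc) only adds. Calling both with the same document therefore leaves two copies. The toy in-memory model below is not Lucene code, and the ToyIndex class is hypothetical. It mimics just those two semantics so the effect is easy to see:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical toy model of IndexWriter's add/update semantics; not part of Lucene.NET.
public sealed class ToyIndex
{
    // Each "document" is a set of field name -> value pairs.
    private readonly List<Dictionary<string, string>> _docs = new List<Dictionary<string, string>>();

    public int Count => _docs.Count;

    // Like AddDocument: append unconditionally, even if an identical document exists.
    public void AddDocument(Dictionary<string, string> doc) => _docs.Add(doc);

    // Like UpdateDocument(term, doc): delete all docs whose field equals the term value, then add.
    public void UpdateDocument(string field, string value, Dictionary<string, string> doc)
    {
        _docs.RemoveAll(d => d.TryGetValue(field, out var v) && v == value);
        _docs.Add(doc);
    }

    // Number of documents whose field equals the given value.
    public int CountWhere(string field, string value) =>
        _docs.Count(d => d.TryGetValue(field, out var v) && v == value);
}
```

Updating twice with the same key leaves one document. A following AddDocument with the same document brings the count to two.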

4. The pitfall

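Running this code throws a StackOverflowException. The stack trace shows that it is raised inside writer.UpdateDocument(), when SmartChineseAnalyzer's HMMChineseTokenizer is first initialized: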

   at ICU4N.Globalization.CultureInfoExtensions.ToUCultureInfo(System.Globalization.CultureInfo)
   at ICU4N.Globalization.UCultureInfo.GetCurrentCulture()
   at ICU4N.Globalization.UCultureInfo.get_CurrentCulture()
   at ICU4N.Impl.ICUResourceBundle.GetBundleInstance(System.String, System.String, System.Reflection.Assembly, ICU4N.Impl.OpenType)
   at ICU4N.Impl.ICUResourceBundle.GetBundleInstance(System.String, System.String, System.Reflection.Assembly, Boolean)
   at ICU4N.Util.UResourceBundle+<>c__DisplayClass25_0.<GetRootType>b__0(System.String)
   at System.Collections.Concurrent.ConcurrentDictionary`2[[System.__Canon, System.Private.CoreLib, Version=5.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e],[ICU4N.Util.UResourceBundle+RootType, ICU4N, Version=60.0.0.0, Culture=neutral, PublicKeyToken=efb17c8e4f0e291b]].GetOrAdd(System.__Canon, System.Func`2<System.__Canon,RootType>)
   at ICU4N.Util.UResourceBundle.GetRootType(System.String, System.Reflection.Assembly)
   at ICU4N.Util.UResourceBundle.InstantiateBundle(System.String, System.String, System.Reflection.Assembly, Boolean)
   at ICU4N.Util.UResourceBundle.GetBundleInstance(System.String, System.String, System.Reflection.Assembly, Boolean)
   at ICU4N.Util.UResourceBundle.GetBundleInstance(System.String, System.String)
   at ICU4N.Impl.ICUResourceBundle.GetBundle(ICU4N.Impl.ICUResourceBundleReader, System.String, System.String, System.Reflection.Assembly)
   at ICU4N.Impl.ICUResourceBundle.CreateBundle(System.String, System.String, System.Reflection.Assembly)
   at ICU4N.Impl.ICUResourceBundle+<>c__DisplayClass59_0.<InstantiateBundle>b__0(System.String)
   at ICU4N.Impl.SoftCache`2+<>c__DisplayClass1_0[[System.__Canon, System.Private.CoreLib, Version=5.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e],[System.__Canon, System.Private.CoreLib, Version=5.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e]].<GetOrCreate>b__0(System.__Canon)
   at System.Collections.Concurrent.ConcurrentDictionary`2[[System.__Canon, System.Private.CoreLib, Version=5.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e],[System.__Canon, System.Private.CoreLib, Version=5.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e]].GetOrAdd(System.__Canon, System.Func`2<System.__Canon,System.__Canon>)
   at ICU4N.Impl.SoftCache`2[[System.__Canon, System.Private.CoreLib, Version=5.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e],[System.__Canon, System.Private.CoreLib, Version=5.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e]].GetOrCreate(System.__Canon, System.Func`2<System.__Canon,System.__Canon>)
   at ICU4N.Impl.ICUResourceBundle.InstantiateBundle(System.String, System.String, System.String, System.Reflection.Assembly, ICU4N.Impl.OpenType)
   at ICU4N.Impl.ICUResourceBundle.GetBundleInstance(System.String, System.String, System.String, System.Reflection.Assembly, ICU4N.Impl.OpenType)
   at ICU4N.Globalization.UCultureInfo+DotNetLocaleHelper.GetDefaultCalendar(System.String)
   at ICU4N.Globalization.UCultureInfo+DotNetLocaleHelper.ToUCultureInfo(System.Globalization.CultureInfo)
   at ICU4N.Globalization.CultureInfoExtensions+<>c.<ToUCultureInfo>b__1_0(System.Globalization.CultureInfo)
   at J2N.Collections.Concurrent.Add2Info`2[[System.__Canon, System.Private.CoreLib, Version=5.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e],[System.__Canon, System.Private.CoreLib, Version=5.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e]].CreateValue(System.__Canon, System.__Canon ByRef)
   at J2N.Collections.Concurrent.LurchTable`2[[System.__Canon, System.Private.CoreLib, Version=5.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e],[System.__Canon, System.Private.CoreLib, Version=5.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e]].InternalInsert[[J2N.Collections.Concurrent.Add2Info`2[[System.__Canon, System.Private.CoreLib, Version=5.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e],[System.__Canon, System.Private.CoreLib, Version=5.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e]], J2N, Version=2.0.0.0, Culture=neutral, PublicKeyToken=f39447d697a969af]](Int32, System.__Canon, Int32 ByRef, J2N.Collections.Concurrent.Add2Info`2<System.__Canon,System.__Canon> ByRef)
   at J2N.Collections.Concurrent.LurchTable`2[[System.__Canon, System.Private.CoreLib, Version=5.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e],[System.__Canon, System.Private.CoreLib, Version=5.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e]].Insert[[J2N.Collections.Concurrent.Add2Info`2[[System.__Canon, System.Private.CoreLib, Version=5.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e],[System.__Canon, System.Private.CoreLib, Version=5.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e]], J2N, Version=2.0.0.0, Culture=neutral, PublicKeyToken=f39447d697a969af]](System.__Canon, J2N.Collections.Concurrent.Add2Info`2<System.__Canon,System.__Canon> ByRef)
   at J2N.Collections.Concurrent.LurchTable`2[[System.__Canon, System.Private.CoreLib, Version=5.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e],[System.__Canon, System.Private.CoreLib, Version=5.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e]].GetOrAdd(System.__Canon, System.Func`2<System.__Canon,System.__Canon>)
   ... (the frames above repeat until the stack overflows) ...
   at ICU4N.Globalization.CultureInfoExtensions.ToUCultureInfo(System.Globalization.CultureInfo)
   at ICU4N.Globalization.UCultureInfo.GetCurrentCulture()
   at ICU4N.Globalization.UCultureInfo.get_CurrentCulture()
   at ICU4N.Impl.ICUResourceBundle.GetBundleInstance(System.String, System.String, System.Reflection.Assembly, ICU4N.Impl.OpenType)
   at ICU4N.Impl.ICUResourceBundle.GetBundleInstance(System.String, ICU4N.Globalization.UCultureInfo, System.Reflection.Assembly, ICU4N.Impl.OpenType)
   at ICU4N.Impl.ICUResourceBundle.GetBundleInstance(System.String, ICU4N.Globalization.UCultureInfo, ICU4N.Impl.OpenType)
   at ICU4N.Text.BreakIteratorFactory.CreateBreakInstance(ICU4N.Globalization.UCultureInfo, Int32)
   at ICU4N.Text.BreakIteratorFactory.CreateBreakIterator(ICU4N.Globalization.UCultureInfo, Int32)
   at ICU4N.Text.BreakIterator.GetBreakInstance(ICU4N.Globalization.UCultureInfo, Int32)
   at ICU4N.Text.BreakIterator.GetSentenceInstance(System.Globalization.CultureInfo)
   at Lucene.Net.Analysis.Cn.Smart.HMMChineseTokenizer..cctor()
   at Lucene.Net.Analysis.Cn.Smart.HMMChineseTokenizer..ctor(AttributeFactory, System.IO.TextReader)
   at Lucene.Net.Analysis.Cn.Smart.HMMChineseTokenizer..ctor(System.IO.TextReader)
   at Lucene.Net.Analysis.Cn.Smart.SmartChineseAnalyzer.CreateComponents(System.String, System.IO.TextReader)
   at Lucene.Net.Analysis.Analyzer.GetTokenStream(System.String, System.IO.TextReader)
   at Lucene.Net.Documents.Field.GetTokenStream(Lucene.Net.Analysis.Analyzer)
   at Lucene.Net.Index.DocInverterPerField.ProcessFields(Lucene.Net.Index.IIndexableField[], Int32)
   at Lucene.Net.Index.DocFieldProcessor.ProcessDocument(Builder)
   at Lucene.Net.Index.DocumentsWriterPerThread.UpdateDocument(System.Collections.Generic.IEnumerable`1<Lucene.Net.Index.IIndexableField>, Lucene.Net.Analysis.Analyzer, Lucene.Net.Index.Term)
   at Lucene.Net.Index.DocumentsWriter.UpdateDocument(System.Collections.Generic.IEnumerable`1<Lucene.Net.Index.IIndexableField>, Lucene.Net.Analysis.Analyzer, Lucene.Net.Index.Term)
   at Lucene.Net.Index.IndexWriter.UpdateDocument(Lucene.Net.Index.Term, System.Collections.Generic.IEnumerable`1<Lucene.Net.Index.IIndexableField>, Lucene.Net.Analysis.Analyzer)
   at Lucene.Net.Index.IndexWriter.UpdateDocument(Lucene.Net.Index.Term, System.Collections.Generic.IEnumerable`1<Lucene.Net.Index.IIndexableField>)

At first, I thought the .NET Core environment simply wasn't set to Chinese, because the exception involves a lot of internationalization functions. So I checked the current culture with the following code:

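using System;
using System.Globalization;
using System.Text;
using System.Threading;

// Registers legacy code-page encodings such as GBK (code page 936); not needed for the culture check itself
Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
CultureInfo culture1 = CultureInfo.CurrentCulture;
CultureInfo culture2 = Thread.CurrentThread.CurrentCulture;
Console.WriteLine("The current culture is {0}", culture1.Name);
Console.WriteLine("The two CultureInfo objects are equal: {0}",
                  culture1 == culture2);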

It printed zh-CN as expected, yet the exception still occurred. Why? The culprit turned out to be the ICU4N library.

Searching the ICU4N repository on GitHub, I finally found the issue: "Getting UCultureInfo.CurrentCulture will throw a StackOverflowException if the current culture is any of the following: zh-CN, zh-HK, zh-MO, zh-SG, zh-TW."
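The issue lists the affected cultures explicitly, so the culture workaround can be applied only when it is needed. The sketch below does that check. The helper name Icu4nIssue29 is my own. The culture list is copied from the issue text, so it may be out of date for newer ICU4N releases, and only those exact names are checked.

```csharp
using System;
using System.Globalization;

// Hypothetical helper; the culture list is copied from the text of ICU4N issue #29.
public static class Icu4nIssue29
{
    private static readonly string[] AffectedCultureNames = { "zh-CN", "zh-HK", "zh-MO", "zh-SG", "zh-TW" };

    // True if the name matches (ignoring case) one of the cultures listed in the issue.
    public static bool IsAffected(string cultureName) =>
        Array.Exists(AffectedCultureNames,
            name => string.Equals(name, cultureName, StringComparison.OrdinalIgnoreCase));

    // Convenience overload for a CultureInfo such as CultureInfo.CurrentCulture.
    public static bool IsAffected(CultureInfo culture) => IsAffected(culture.Name);
}
```

For example, wrap the workaround in `if (Icu4nIssue29.IsAffected(CultureInfo.CurrentCulture)) { ... }` so machines with other cultures are left alone.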

If you have time, you could send a pull request and help fix the bug. Now that we know the cause, let's work around it first.

Add the following line before the indexing code, press F5, and everything works.

//https://github.com/NightOwl888/ICU4N/issues/29
Thread.CurrentThread.CurrentCulture = CultureInfo.GetCultureInfo("en-us");
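Setting Thread.CurrentThread.CurrentCulture affects only the current thread. If indexing runs on other threads, such as thread-pool work in a web application, those threads may still report zh-CN. The sketch below offers two hypothetical helpers and assumes en-US is acceptable for the rest of your code:

- SetProcessCulture also sets CultureInfo.DefaultThreadCurrentCulture, which applies to every thread that has not set its own culture.
- RunWithCulture switches the culture only while a block of code runs, then restores it.

I have not verified whether ICU4N keeps working after the culture is switched back, so the process-wide setting is the safer choice.

```csharp
using System;
using System.Globalization;

// Hypothetical helpers around the standard CultureInfo properties.
public static class CultureScope
{
    // Default culture for threads that have not set their own, plus the current thread's culture.
    public static void SetProcessCulture(CultureInfo culture)
    {
        CultureInfo.DefaultThreadCurrentCulture = culture;
        CultureInfo.DefaultThreadCurrentUICulture = culture;
        CultureInfo.CurrentCulture = culture;
        CultureInfo.CurrentUICulture = culture;
    }

    // Runs `action` with the current thread's culture temporarily set to `culture`,
    // restoring the previous cultures afterwards, even if `action` throws.
    public static T RunWithCulture<T>(CultureInfo culture, Func<T> action)
    {
        var originalCulture = CultureInfo.CurrentCulture;
        var originalUICulture = CultureInfo.CurrentUICulture;
        try
        {
            CultureInfo.CurrentCulture = culture;
            CultureInfo.CurrentUICulture = culture;
            return action();
        }
        finally
        {
            CultureInfo.CurrentCulture = originalCulture;
            CultureInfo.CurrentUICulture = originalUICulture;
        }
    }
}
```

For example, call `CultureScope.SetProcessCulture(CultureInfo.GetCultureInfo("en-US"));` once at startup, before the first document is analyzed.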

One pitfall down!

5. Search results

Searching went relatively smoothly. Here is the code:

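using System;
using System.Text;
using Lucene.Net.Index;
using Lucene.Net.QueryParsers.Classic;
using Lucene.Net.Search;
using Lucene.Net.Util;

// Search the "bb" field (dir and analyzer come from the indexing code above)
var a = "bb";
IndexReader reader = DirectoryReader.Open(dir);
IndexSearcher searcher = new IndexSearcher(reader);
QueryParser parser = new QueryParser(LuceneVersion.LUCENE_48, a, analyzer);
BooleanQuery mp = new BooleanQuery();
// A raw TermQuery is not analyzed, so new TermQuery(new Term(a, "Clothes")) only matches if
// "Clothes" is exactly an indexed token; parsing runs the text through the same analyzer.
mp.Add(parser.Parse("Clothes"), Occur.MUST);
// Query the top 10 results
var r = searcher.Search(mp, 10);

// Check that there is at least one hit before reading the first result
if (r.ScoreDocs.Length > 0)
{
    var b = r.ScoreDocs[0];
    var docRst = reader.Document(b.Doc);
    var f = docRst.GetBinaryValue("bin");
    // A BytesRef is a slice: decode only the Length bytes starting at Offset
    var fdoc = Encoding.UTF8.GetString(f.Bytes, f.Offset, f.Length);
    Console.WriteLine(fdoc);
}

reader.Dispose();
Console.WriteLine("Hello World!");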
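When decoding the BytesRef returned by GetBinaryValue, it is safer to pass Offset and Length to Encoding.UTF8.GetString. A BytesRef is a view (Bytes, Offset, Length) over a byte array that may be larger than the value itself, and Lucene does not promise that Offset is 0 or that Length equals Bytes.Length. The standard-library sketch below uses ArraySegment<byte> as a stand-in for BytesRef, with made-up padding bytes, to show that decoding the whole backing array can return extra characters:

```csharp
using System;
using System.Text;

// Illustration only: ArraySegment<byte> stands in for Lucene's BytesRef.
public static class SliceDecodeDemo
{
    // Builds a backing array with padding bytes around the UTF-8 payload,
    // similar to a BytesRef that points into a larger shared buffer.
    public static ArraySegment<byte> MakeSlice(string value)
    {
        byte[] payload = Encoding.UTF8.GetBytes(value);
        byte[] backing = new byte[payload.Length + 4];
        backing[0] = (byte)'X';
        backing[1] = (byte)'X';
        Buffer.BlockCopy(payload, 0, backing, 2, payload.Length);
        backing[backing.Length - 2] = (byte)'Y';
        backing[backing.Length - 1] = (byte)'Y';
        return new ArraySegment<byte>(backing, 2, payload.Length);
    }

    // Correct: decode only the bytes covered by the view.
    public static string DecodeSlice(ArraySegment<byte> slice) =>
        Encoding.UTF8.GetString(slice.Array, slice.Offset, slice.Count);

    // Wrong: decodes the whole backing array, including bytes outside the view.
    public static string DecodeWholeArray(ArraySegment<byte> slice) =>
        Encoding.UTF8.GetString(slice.Array);
}
```

With the value "adfadfasfwerewre", DecodeSlice returns the original string, while DecodeWholeArray returns it wrapped in the padding characters.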

6. Summary

That completes the simple example. If you need full-text search, you can build on it from here.