Preface

Get started with Django REST Framework

In the previous article we saw that MySQL's FULLTEXT index can be used directly for full-text retrieval in a project. Today we continue the discussion with Whoosh, used inside a Django project.

The code in this article builds on the project from Django REST Framework (1)

Extending the project

django-haystack is a third-party Django search app that supports multiple search backends such as Solr, Elasticsearch, Whoosh, and Xapian. Combined with jieba, a well-known Chinese word segmentation library, it can provide an effective full-text search system.
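
No matter which backend you pick, haystack exposes the same query API, so the engine can be swapped later largely without changing application code. As a taste of what we are building towards, here is a minimal sketch (not part of the project code below) of querying an index through haystack's SearchQuerySet:

from haystack.query import SearchQuerySet

# Returns SearchResult objects whose indexed content matches the query;
# stored index fields such as title are available as attributes.
results = SearchQuerySet().filter(content="tears")
for result in results:
    print(result.title, result.score)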

Configure haystack

Since this project is based on the Django REST Framework, we configure haystack through drf-haystack.

1. Install dependencies

pipenv install  drf-haystack  whoosh jieba

2. Configure the project

  • Create the file article/models.py

from django.db import models


class Article(models.Model):
    creator = models.CharField(max_length=50, null=True, blank=True)
    tag = models.CharField(max_length=50, null=True, blank=True)
    title = models.CharField(max_length=50, null=True, blank=True)
    content = models.TextField()

  • Create the article/search_indexes.py file

from .models import Article
from haystack import indexes


class ArticleIndex(indexes.SearchIndex, indexes.Indexable):
    text = indexes.CharField(document=True, use_template=True)
    title = indexes.CharField(model_attr="title")
    content = indexes.CharField(model_attr="content")
    tag = indexes.CharField(model_attr="tag")
    creator = indexes.CharField(model_attr="creator")
    id = indexes.CharField(model_attr="pk")
    autocomplete = indexes.EdgeNgramField()

    @staticmethod
    def prepare_autocomplete(obj):
        return "".join((
            obj.title,
        ))

    def get_model(self):
        return Article

    def index_queryset(self, using=None):
        return self.get_model().objects.all()
  • Create the article/serializers.py file

from rest_framework import serializers
from drf_haystack.serializers import HaystackSerializer

from .search_indexes import ArticleIndex
from .models import Article


class ArticleSerializer(serializers.ModelSerializer):
    class Meta:
        model = Article
        fields = '__all__'


class ArticleHaystackSerializer(HaystackSerializer):

    def update(self, instance, validated_data):
        pass

    def create(self, validated_data):
        pass

    class Meta:
        index_classes = [ArticleIndex]

        fields = ['title', 'creator', 'content', 'tag']

  • Create the file article/urls.py (how these routes are mounted in the project-level urls.py is sketched after this list)
from django.conf.urls import url, include
from rest_framework import routers
from . import views


router = routers.DefaultRouter()
# router.register('article', views.ArticleViewSet)
router.register("article/search", views.ArticleSearchView, basename='article-search')

urlpatterns = [
    url(r'^', include(router.urls)),

]

  • Create the article/views.py file


from .models import Article
from rest_framework import viewsets
from .serializers import ArticleSerializer, ArticleHaystackSerializer
from drf_haystack.viewsets import HaystackViewSet


class ArticleSearchView(HaystackViewSet):

    index_models = [Article]

    serializer_class = ArticleHaystackSerializer


class ArticleViewSet(viewsets.ModelViewSet):
    """API endpoint that allows articles to be viewed or edited."""
    queryset = Article.objects.all()
    serializer_class = ArticleSerializer
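
The curl requests later in this article hit paths like /api/article/search/, which means the article routes need to be mounted under an /api/ prefix in the project-level urls.py. That file is not shown here, so the following is only a minimal sketch of the assumed wiring (the demo package name is taken from the project tree at the end of the article):

# demo/urls.py -- assumed wiring, not shown in the original project
from django.conf.urls import url, include

urlpatterns = [
    url(r'^api/', include('article.urls')),
]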

Add the haystack configuration to the project's settings.py

# Specify the search engine
HAYSTACK_CONNECTIONS = {
    'default': {
        'ENGINE': 'haystack.backends.whoosh_backend.WhooshEngine',
        'PATH': os.path.join(BASE_DIR, 'whoosh_index'),
    }
}
# Specify how search results are paginated (10 per page here; the default is 20)
HAYSTACK_SEARCH_RESULTS_PER_PAGE = 10
# When the database changes, the index is updated automatically -- very convenient
HAYSTACK_SIGNAL_PROCESSOR = 'haystack.signals.RealtimeSignalProcessor'

Add the apps to INSTALLED_APPS

INSTALLED_APPS = [
    ...
    'haystack',
    'article'
]

3. Prepare the data

  • Generate the database tables
# Generate the migration files
python manage.py makemigrations

# Apply the migrations
python manage.py migrate
  • Initialize data
INSERT INTO article_article (creator, tag, title, content)
VALUES ('admin', 'Modern Poetry', 'If', 'I''ll never think of you again in this life except on some night wetted with tears if you will'),
	('admin', 'Modern Poetry', 'Love', 'One day the signpost changes I hope you take it easy one day the pier breaks I hope you cross one day the beams fall I hope you stay strong one day expectations wither I hope you understand'),
	('admin', 'Modern Poetry', 'Far and Near', 'You look at me and you look at the clouds and I think you look at me very far away and you look at the clouds very close'),
	('admin', 'Modern Poetry', 'Fragment', 'You stand on the bridge and look at the scenery, and the sightseer looks at you from upstairs. The moon adorns your window, and you adorn someone else''s dream.'),
	('admin', 'Modern Poetry', 'Soliloquy', 'I pour out my thoughts to you like a statue of stone silence should not be if silence is your sorrow you know it hurts the most');
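
Note that the raw SQL above writes straight to the table and therefore bypasses Django's post_save signals, so the realtime index updates configured earlier will not see these rows; that is one reason the index is rebuilt from scratch in the next step. As an equivalent alternative (which the signal processor would pick up), the same rows could be created through the ORM in the Django shell, for example (text abbreviated):

# python manage.py shell -- an equivalent way to seed one row via the ORM
from article.models import Article

Article.objects.create(
    creator="admin",
    tag="Modern Poetry",
    title="If",
    content="I'll never think of you again in this life ...",
)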

4. Build indexes

# Create the new template file
templates/search/indexes/article/article_text.txt

{{ object.title }}
{{ object.tag }}
{{ object.content }}
{{ object.creator }}

# Create the index

$ python manage.py rebuild_index
WARNING: This will irreparably remove EVERYTHING from your search index in connection 'default'.
Your choices after this are to restore from backups or rebuild via the `rebuild_index` command.
Are you sure you wish to continue? [y/N] y
Removing all documents from your index because you said so.
All documents removed.
Indexing 5 articles


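Because HAYSTACK_SIGNAL_PROCESSOR is set to RealtimeSignalProcessor, anything saved through the ORM from now on is indexed immediately, without running rebuild_index again. An optional sanity check from the Django shell (the article created here is throwaway data, not part of the seed set):

# python manage.py shell
from article.models import Article
from haystack.query import SearchQuerySet

print(SearchQuerySet().count())   # 5: the seeded poems
obj = Article.objects.create(creator="admin", tag="Modern Poetry",
                             title="Example", content="a freshly indexed line")
print(SearchQuerySet().count())   # 6: indexed on save, no rebuild needed
obj.delete()                      # deleting also removes it from the index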

5. Verify the results

Use curl to verify the results; multi-condition queries are supported. The requests authenticate with HTTP basic auth as admin:admin, which assumes a superuser with those credentials was created beforehand (for example with python manage.py createsuperuser).


$ curl -H 'Accept: application/json; indent=4' -u admin:admin http://127.0.0.1:8000/api/article/search/\?content__contains\=tears
{
    "count": 1,
    "next": null,
    "previous": null,
    "results": [
        {
            "title": "If",
            "content": "I'll never think of you again in this life except on some night wetted with tears if you will",
            "tag": "Modern Poetry",
            "creator": "admin"
        }
    ]
}

$ curl -H 'Accept: application/json; indent=4' -u admin:admin http://127.0.0.1:8000/api/article/search/\?content__contains\=tears\&title\=If
{
    "count": 1,
    "next": null,
    "previous": null,
    "results": [
        {
            "title": "If",
            "content": "I'll never think of you again in this life except on some night wetted with tears if you will",
            "tag": "Modern Poetry",
            "creator": "admin"
        }
    ]
}

Change the word segmentation tool

Since the default analyzer does not fully support Chinese, we can switch to the jieba tokenizer.

1. Modify the Whoosh backend file

The original whoosh_backend.py can be found under site-packages in the current Python environment.

# 1. Copy 'haystack/backends/whoosh_backend.py' into the current ./article directory

# 2. Search the copied file for StemmingAnalyzer and change every occurrence to ChineseAnalyzer
#    (note: find and modify the existing line rather than adding a new one),
#    and add the import below at the top of the file
from jieba.analyse import ChineseAnalyzer

schema_fields[field_class.index_fieldname] = TEXT(stored=True, analyzer=ChineseAnalyzer(), field_boost=field_class.boost, sortable=True)
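
The ChineseAnalyzer is jieba's analyzer for Whoosh, so the tokens it produces are simply jieba's word segmentation. A quick standalone check of how jieba splits a Chinese sentence (the exact tokens depend on the jieba version and dictionary):

# Standalone check of jieba segmentation, independent of haystack and Whoosh
import jieba

print(jieba.lcut("你站在桥上看风景"))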

2. Modify the settings file

# Change only the HAYSTACK_CONNECTIONS engine
HAYSTACK_CONNECTIONS = {
    'default': {
        'ENGINE': 'article.whoosh_backend.WhooshEngine',
        'PATH': os.path.join(BASE_DIR, 'whoosh_index'),
    }
}

3. Rebuild the index

$ python manage.py rebuild_index
WARNING: This will irreparably remove EVERYTHING from your search index in connection 'default'.
Your choices after this are to restore from backups or rebuild via the `rebuild_index` command.
Are you sure you wish to continue? [y/N] y
Removing all documents from your index because you said so.
All documents removed.
Indexing 5 articles
Building prefix dict from the default dictionary ...
Loading model from cache /var/folders/st/b16fyn3s57x_5vszjl599njw0000gn/T/jieba.cache
Loading model cost 0.764 seconds.
Prefix dict has been built successfully.

4. Verify the results

The search now returns the expected results.


$ curl -H 'Accept: application/json; indent=4' -u admin:admin http://127.0.0.1:8000/api/article/search/\?content__contains\=scenery
{
    "count": 1,
    "next": null,
    "previous": null,
    "results": [
        {
            "title": "Fragment",
            "content": "You stand on the bridge and look at the scenery, and the sightseer looks at you from upstairs. The moon adorns your window, and you adorn someone else's dream.",
            "tag": "Modern Poetry",
            "creator": "admin"
        }
    ]
}

Project directory

Finally, here is the overall layout of the project.

$ tree | grep -v pyc
.
├── Pipfile
├── Pipfile.lock
├── article
│   ├── migrations
│   │   ├── 0001_initial.py
│   │   └── __init__.py
│   ├── models.py
│   ├── search_indexes.py
│   ├── serializers.py
│   ├── urls.py
│   ├── views.py
│   └── whoosh_backend.py
├── demo
│   ├── __init__.py
│   ├── asgi.py
│   ├── serializers.py
│   ├── settings.py
│   ├── urls.py
│   ├── views.py
│   └── wsgi.py
├── manage.py
├── templates
│   └── search
│       └── indexes
│           └── article
│               └── article_text.txt
└── whoosh_index
    ├── MAIN_WRITELOCK
    ├── main_ox1iJ98muwsyw2qv.seg
    └── _main_1.toc

Conclusion

So far we have seen two different implementations of full-text search. For a simple project, MySQL alone can solve the problem, with no restrictions on the language or framework you use. If your project happens to be built with Django, Whoosh + jieba is also a good choice.

References

  • Haystack
  • Whoosh