NodeJS handles natural language processing quite well. This time I ran a small experiment, so let’s take a look.

Because I still have a passion for natural language processing, I have recently been rereading my graduation thesis (its topic is natural language processing) in my spare time. I then added some related processing to my new blog, mainly covering the following aspects:

  1. For each article, output multiple content labels based on the title and content, so that readers can quickly understand what it is about.

  2. Automatically classify articles according to their content, which provides a basis for article clustering and text content analysis.

  3. Calculate similarity based on the article title, user-defined tags, and the tags obtained by artificial intelligence.

  4. When a user reads an article, recommend related articles based on the results of the similarity calculation.

The figure below shows the automatically generated content labels:

The blog system

Operating environment: Centos9 + Docker

Development language: NodeJS

Database: MariaDB

Development framework: EggJS + Nunjucks (template engine)

This is also my first time building a blog with back-end rendering. With an Ajax website, SEO is really hard to do well…

This is the first time I have used EggJS, a framework built on the “onion model”. I really like it, whether for the elegant ES7 handling of asynchronous JavaScript, the classic MVC structure, or the framework’s plug-in mechanism. It’s really awesome, and I highly recommend it to anyone who loves NodeJS.

Recommendation system

Recommendation systems are something we run into all the time in the software and websites we use: information feeds such as the Baidu feed, Headlines, and QQ hotspot; e-commerce sites such as Alibaba and Jingdong; Douyin; and many, many more.

A good recommendation system can bring in more revenue, while a bad one often gets ridiculed. I saw on Maimai that the CTO of a company was recommended an Android engineer position by Maimai itself, which drew plenty of ridicule. Robin Li teased his employees the other day because an important piece of tech news never showed up in his feed. This kind of thing happens a lot.

In my opinion, a good recommendation system should be more “understanding”. If I bought a mobile phone a month ago, I hope it pushes mobile phone accessories to me rather than another mobile phone, because at that point I am far more likely to buy accessories than a phone. At present, many recommendation systems analyze and push content based on user profiles, various event-tracking points, and user operation data. I think it would be worth adding sentiment analysis on top of this; with an extra dimension, more accurate data could perhaps be obtained.

Having said all that, I think there are still a lot of bottlenecks 😐. AI today is much like the mobile Internet many years ago: it is on the rise, and there’s so much more we can do.

Let’s move on to today’s real topic…

Here are some details of the article recommendation system I built this time:

The right part of the figure shows the results of our article recommendation system. We use different colors to mark how strongly each recommended article correlates with the article currently being browsed. The darker the color, the higher the correlation, the higher the confidence, and the greater the weight.
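The ranking and coloring described above could be sketched roughly as follows. This is only an illustration, not the blog's actual code: the function name `rankRecommendations`, the `{ title, score }` shape, the assumption that scores lie between 0 and 1, and the specific hue and lightness range are all made up for the example.

```javascript
// Hypothetical helper: rank candidate articles by similarity score and attach
// a display color. `candidates` is an assumed array of { title, score } objects,
// where score is a similarity value between 0 and 1.
function rankRecommendations(candidates, topN = 5) {
  return candidates
    .slice() // copy so the caller's array is not mutated
    .sort((a, b) => b.score - a.score) // highest similarity first
    .slice(0, topN)
    .map((item) => {
      const clamped = Math.min(1, Math.max(0, item.score));
      // Lightness goes from 85% (weak match) down to 35% (strong match),
      // so a higher score produces a darker color.
      const lightness = Math.round(85 - clamped * 50);
      return { ...item, color: `hsl(210, 70%, ${lightness}%)` };
    });
}

// Usage with made-up scores
const recommended = rankRecommendations(
  [
    { title: 'Article A', score: 0.2 },
    { title: 'Article B', score: 0.9 },
    { title: 'Article C', score: 0.5 },
  ],
  2
);
console.log(recommended.map((item) => `${item.title}: ${item.color}`));
```

Mapping the score to lightness rather than to a fixed set of color classes keeps the "darker means more relevant" rule continuous, though a real template might prefer a few discrete levels.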

This recommendation system mainly relies on the third point mentioned above: similarity calculation. The mathematical model used is the vector space model, which converts unstructured text data into vector form and so lays a good mathematical foundation for subsequent processing.

The vector space model helps us transform each document into a multidimensional vector:

$$C_i = (W_{1i}, W_{2i}, \dots, W_{ti})$$

Here, the component W1i represents the proportion of the first word in the document Ci, W2i represents the proportion of the second word in the document Ci, and so on, up to Wti, which represents the proportion of the t-th word in the document Ci.
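To make the vector form more concrete, here is a minimal sketch of turning documents into vectors of word proportions. It assumes the documents are already split into words (for Chinese text a word segmenter would be needed first, which is not shown), and the helper names `buildVocabulary` and `toVector` as well as the sample words are hypothetical, not taken from the blog's code.

```javascript
// Assumption: each document is already an array of words (tokens).

// Collect the shared vocabulary: one vector dimension per distinct word.
function buildVocabulary(tokenizedDocs) {
  return [...new Set(tokenizedDocs.flat())].sort();
}

// Turn one tokenized document into a vector of word proportions,
// i.e. (occurrences of the word) / (total words in the document).
function toVector(tokens, vocabulary) {
  const counts = new Map();
  for (const token of tokens) {
    counts.set(token, (counts.get(token) || 0) + 1);
  }
  const total = tokens.length || 1; // avoid dividing by zero for empty documents
  return vocabulary.map((word) => (counts.get(word) || 0) / total);
}

// Usage with two made-up documents
const docs = [
  ['node', 'nlp', 'node', 'blog'],
  ['nlp', 'vector', 'model', 'nlp'],
];
const vocabulary = buildVocabulary(docs);
const vectors = docs.map((tokens) => toVector(tokens, vocabulary));
console.log(vocabulary); // [ 'blog', 'model', 'nlp', 'node', 'vector' ]
console.log(vectors[0]); // [ 0.25, 0, 0.25, 0.5, 0 ]
console.log(vectors[1]); // [ 0, 0.25, 0.5, 0, 0.25 ]
```

Plain proportions are used here because that is how the weights are described above; other weighting schemes such as TF-IDF are common alternatives.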

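Then, to measure the similarity of two articles Ci and Cj, we can calculate the cosine of the angle between their corresponding vectors:

$$\cos\theta = \frac{\sum_{k=1}^{t} W_{ki} W_{kj}}{\sqrt{\sum_{k=1}^{t} W_{ki}^{2}}\,\sqrt{\sum_{k=1}^{t} W_{kj}^{2}}}$$

The closer the cosine of two documents is to 1, the more similar the two documents are.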
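As a small worked example, take two three-word vectors. The numbers are made up for illustration and are not from the blog's data. The cosine is the dot product of the two vectors divided by the product of their lengths:

```latex
C_1 = (1, 1, 0), \qquad C_2 = (1, 0, 1)

\cos\theta
  = \frac{1\cdot 1 + 1\cdot 0 + 0\cdot 1}
         {\sqrt{1^2 + 1^2 + 0^2}\,\sqrt{1^2 + 0^2 + 1^2}}
  = \frac{1}{\sqrt{2}\,\sqrt{2}}
  = 0.5
```

The two documents share one of their two words, so the result falls between 0 (no words in common) and 1 (identical word proportions).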

The key code to calculate the similarity is given below: