Content operations refers to work such as planning, editing, publishing, optimizing and marketing that centers on content. It is concentrated mainly in the Internet, media and other content-oriented industries. By content production mode, content operations can be divided into UGC, PGC and OGC.

(1) User-generated content (UGC). This was the main content production mode in the era of forums, post bars and microblogs. The content is mainly produced by the users of the platform, and the operator itself produces no substantive content. These users are generally non-professional “writers” who spontaneously create content around shared interests and hobbies.

(2) Professionally generated content (PGC). Like UGC, PGC is also produced by users, but the users here are mainly people with professional backgrounds and qualifications, including industry leaders, domain experts and book authors, who can usually produce high-quality professional content. Many knowledge-oriented sites now take this form, such as Zhihu and personal WeChat public accounts.

(3) Occupationally generated content (OGC). OGC is comparable to PGC in how specialized the content is, but in OGC content production is an “occupation”, so earning income from producing content is the defining feature of this type. OGC is typically represented by news websites and media outlets, which attract high-quality “writers” by paying for contributions and sharing revenue. In addition to inviting outside experts to contribute, these sites also employ many professional content producers of their own.

The data-driven operation analysis models in this section are the sentiment analysis model, the search optimization model, the article keyword model, the topic model and the spam detection model.

1. Sentiment analysis model

Sentiment analysis is the analysis of emotional tendency. It identifies the views, attitudes, emotions, positions and other subjective feelings that specific subjects express toward an object or its attributes. The result is usually one of positive, neutral or negative.

Application scenarios of sentiment analysis:

  • Competitive intelligence: capture specific information about competitors from the user’s perspective.
  • Public opinion monitoring: monitor and forecast information about your own website, content, products, services, brand, image and so on, in order to understand the current state and future trend of influential, strongly slanted opinions.
  • Customer orientation analysis: analyze whether customers are positive or negative toward the enterprise, which helps build a comprehensive picture of how customers see the enterprise.
  • Topic supervision: track the topic concentration, main content and topic evolution of all users under a specific topic.
  • Word-of-mouth analysis: understand how users perceive various aspects of the enterprise, with particular attention to the word of mouth spread by opinion leaders who have wide reach.

Common methods of sentiment analysis: besides approaches based on non-negative matrix factorization and genetic algorithms, supervised learning algorithms such as naive Bayes, k-nearest neighbors and support vector machines are the most commonly used. The basic procedure for sentiment analysis with a classification method is as follows:

  • Step 1: Text preprocessing, including removing invalid tags, converting encodings, splitting documents, basic error correction, removing whitespace, unifying case, removing punctuation, removing stop words, retaining special characters, etc.
  • Step 2: Word segmentation. Chinese text requires a dedicated word segmentation model.
  • Step 3: Text vectorization, which converts the text into a vector space model representation.
  • Step 4: Feature extraction, which reduces the massive sparse feature set through methods such as feature selection and data transformation.
  • Step 5: Classification modeling and evaluation: select a specific classification model, train it, evaluate its performance and analyze the conclusions.
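The steps above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not a production pipeline. The stop-word list, the four training sentences and the class name NaiveBayesSentiment are made up for the example, and the sketch skips Step 4 (feature selection) and any held-out evaluation, which a real project would need.

```python
import math
import re
from collections import Counter, defaultdict

# Tiny illustrative stop-word list (an assumption, not a standard list).
STOP_WORDS = {"the", "a", "is", "it", "this", "and", "i", "to", "of"}


def preprocess(text):
    """Steps 1-2: lowercase, strip punctuation, tokenize, drop stop words."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


class NaiveBayesSentiment:
    """Steps 3 and 5: bag-of-words counts plus multinomial naive Bayes
    with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.label_counts = Counter(labels)      # label -> number of documents
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = preprocess(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        # Words never seen in training carry no information and are ignored.
        tokens = [t for t in preprocess(text) if t in self.vocab]
        total_docs = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, doc_count in self.label_counts.items():
            total_words = sum(self.word_counts[label].values())
            score = math.log(doc_count / total_docs)  # log prior
            for t in tokens:
                smoothed = self.word_counts[label][t] + 1
                score += math.log(smoothed / (total_words + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical labeled training data.
train_texts = [
    "I love this product, it is great",
    "Great service and friendly support",
    "Terrible quality, I hate it",
    "Awful experience and bad support",
]
train_labels = ["positive", "positive", "negative", "negative"]

model = NaiveBayesSentiment().fit(train_texts, train_labels)
prediction_good = model.predict("great product, love the service")
prediction_bad = model.predict("bad quality, terrible")
```

For Chinese text, the whitespace-style tokenization in preprocess would have to be replaced by a word segmentation model, as Step 2 notes.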

2. Search optimization model

Users frequently search for certain terms, and these keywords carry the users’ potential intent. For example, when a user searches for “heat analysis” in a search engine, related search terms may include: spatial heat analysis, keyword heat analysis, audio heat analysis, hot word analysis, ten methods of keyword heat analysis, online game heat list, etc.

The search optimization model helps users find content of potential interest more quickly. During a search it can drive query association, related-result prompts and follow-up search suggestions.

The methods commonly used for the search optimization model are association rule models, such as Apriori and FP-growth.
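As a rough illustration of how an association model can produce related-search suggestions, the sketch below runs a small level-wise Apriori over search sessions and ranks co-occurring queries by confidence. The session data, the thresholds and the helper name suggest are assumptions made up for the example. A real system would work on far larger logs and would likely use an optimized library.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support} for itemsets meeting min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)
    # Level 1 candidates are the individual items.
    candidates = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while candidates:
        # Count the share of transactions that contain each candidate.
        level = {}
        for c in candidates:
            support = sum(1 for t in transactions if c <= t) / n
            if support >= min_support:
                level[c] = support
        frequent.update(level)
        # Join step: merge frequent k-itemsets into (k+1)-item candidates.
        k += 1
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-item subset must itself be frequent.
        candidates = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent


def suggest(query, frequent, min_confidence):
    """Rank queries that co-occur with `query` by confidence P(other | query)."""
    base = frequent.get(frozenset([query]))
    if not base:
        return []
    rules = []
    for itemset, support in frequent.items():
        if len(itemset) == 2 and query in itemset:
            (other,) = itemset - {query}
            confidence = support / base
            if confidence >= min_confidence:
                rules.append((other, round(confidence, 2)))
    return sorted(rules, key=lambda r: (-r[1], r[0]))


# Hypothetical search sessions: each list holds the queries of one user session.
sessions = [
    ["heat analysis", "keyword heat analysis", "hot word analysis"],
    ["heat analysis", "keyword heat analysis"],
    ["heat analysis", "spatial heat analysis"],
    ["heat analysis", "keyword heat analysis", "spatial heat analysis"],
    ["online game heat list"],
]

frequent = apriori(sessions, min_support=0.3)
suggestions = suggest("heat analysis", frequent, min_confidence=0.3)
```

With this toy data, "keyword heat analysis" follows "heat analysis" in three of its four sessions, so it ranks ahead of "spatial heat analysis", which follows it in two.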

3. Article keyword model

Keyword extraction pulls out of a text the words most relevant to its content. The results are often used in document retrieval, article tagging, text clustering, text classification, keyword summaries and similar tasks.

A keyword model generates short, indicative information about a document and presents its main content or core keywords to users, who can then decide whether to read the original. This saves browsing time and makes key information easier to surface.

Application scenarios of article keyword extraction: generating tags, content summaries and meta information for posts, news, information, comments, Q&A, etc.

Common methods of keyword extraction: the main keywords of a text are obtained through word frequency statistics or the TF-IDF model.
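A minimal sketch of TF-IDF keyword extraction follows. It assumes one common variant of the weighting, tf = count / document length and idf = log(N / df), with no smoothing. Other variants exist and give different scores. The three sample documents and the function name tfidf_keywords are made up for the example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_keywords(docs, doc_index, top_n=3):
    """Return the top_n words of docs[doc_index] ranked by TF-IDF."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: the number of documents that contain each word.
    df = Counter(word for tokens in tokenized for word in set(tokens))
    tf = Counter(tokenized[doc_index])
    total = len(tokenized[doc_index])
    scores = {
        word: (count / total) * math.log(n_docs / df[word])
        for word, count in tf.items()
    }
    # Sort by descending score, breaking ties alphabetically.
    return sorted(scores, key=lambda w: (-scores[w], w))[:top_n]


# Hypothetical mini-corpus.
docs = [
    "the topic model finds the hidden topic in the text",
    "the spam filter blocks the spam comment",
    "the keyword model extracts the keyword from the article",
]

top_keyword_doc0 = tfidf_keywords(docs, doc_index=0, top_n=1)
top_keyword_doc1 = tfidf_keywords(docs, doc_index=1, top_n=1)
```

Note that "the" is the most frequent word in every document, yet its score is zero because it appears in all of them. This is the reason TF-IDF is usually preferred to raw word frequency.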

4. Topic model

A topic model is a modeling method that extracts the implicit topics in a text. Statistically, a topic is a probability distribution over the words of a vocabulary, and it represents a central idea or core concept expressed by a text (article, speech, sentence). For example, when we think of IBM, we may think of ThinkPad. When we think of Bill Gates, we often think of Windows. IBM and ThinkPad, and Bill Gates and Windows, are related concepts within their respective topics.

A topic model is a powerful tool for mining the hidden information behind language, and an important part of semantic mining, natural language understanding, text parsing and analysis, and information retrieval:

  • It can measure the semantic similarity between documents, which makes it an important component of text clustering, classification, sentiment analysis, document similarity and other applications.
  • It can help resolve polysemy and support more accurate part-of-speech tagging.
  • It can reduce the noise in a text and extract topic keywords accurately.

The topic model overcomes the shortcomings of traditional document similarity calculation in information retrieval, and it can automatically discover the semantic topics that link words in massive data. It applies to scenarios built around topics, such as search engines, sentiment analysis, public opinion monitoring, personalized recommendation and social analysis. After stop words are removed, the results of a topic model can also be displayed as a tag cloud.

Common topic models include:

  • Latent Dirichlet Allocation (LDA) model.
  • Probabilistic Latent Semantic Analysis (pLSA).
  • Other derivative models based on LDA, such as Twitter LDA, TimeUserLDA, ATM, Image-LDA, MaxEnt-LDA, etc.
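To make the idea of a topic as a word distribution concrete, the following is a minimal sketch of LDA fitted with collapsed Gibbs sampling, written with the standard library only. The toy documents, the hyperparameters alpha and beta, the iteration count and the function name lda_gibbs are all assumptions chosen for illustration. Real corpora are normally handled with an established library rather than code like this.

```python
import random


def lda_gibbs(docs, n_topics, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA on already tokenized documents.

    Returns (vocab, theta, phi): theta[d][k] is the topic distribution of
    document d, and phi[k][w] is the word distribution of topic k.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)

    doc_topic = [[0] * n_topics for _ in docs]       # n(d, k)
    topic_word = [[0] * V for _ in range(n_topics)]  # n(k, w)
    topic_total = [0] * n_topics                     # n(k)
    assignments = []

    # Give every token a random initial topic.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_doc.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(z_doc)

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                wid, k = word_id[w], assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][wid] -= 1
                topic_total[k] -= 1
                # p(z = t | everything else) is proportional to
                # (n(d,t) + alpha) * (n(t,w) + beta) / (n(t) + V * beta)
                weights = [
                    (doc_topic[d][t] + alpha) * (topic_word[t][wid] + beta)
                    / (topic_total[t] + V * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][wid] += 1
                topic_total[k] += 1

    # Smoothed estimates of the two families of distributions.
    theta = [
        [(c + alpha) / (len(doc) + n_topics * alpha) for c in counts]
        for doc, counts in zip(docs, doc_topic)
    ]
    phi = [
        [(c + beta) / (topic_total[k] + V * beta) for c in topic_word[k]]
        for k in range(n_topics)
    ]
    return vocab, theta, phi


# Hypothetical toy corpus built around the IBM/ThinkPad and Gates/Windows example.
docs = [
    "ibm thinkpad laptop ibm thinkpad".split(),
    "thinkpad laptop ibm laptop".split(),
    "gates windows microsoft windows gates".split(),
    "microsoft windows gates microsoft".split(),
]
vocab, theta, phi = lda_gibbs(docs, n_topics=2)
```

Each row of phi is one topic expressed as a probability distribution over the vocabulary, and each row of theta describes how strongly one document draws on each topic. Because the sampler is stochastic, the topics found depend on the seed and the number of iterations.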

5. Spam detection model

The spam detection model is a classification application, mainly used to detect whether a specific object contains spam. It is an important part of website content management.

Common spam detection applications include:

  • Filter spam out of e-mail.
  • Filter messages with malicious content out of in-site messages.
  • Filter abusive comments out of comment sections.
  • Identify negative topics in user posts.

The spam detection model can be implemented as a classification model. Common methods are naive Bayes, matrix transformation, k-nearest neighbors, support vector machines, neural networks, etc.

In addition to supervised learning based on labeled training sets, you can also use unsupervised methods for spam detection, such as:

  • Content similarity: compute the similarity between a new comment and the existing spam. If the similarity exceeds a certain threshold, the new comment is identified as spam. This assumes that a relatively complete spam collection exists and is kept up to date.
  • Filtering on fixed information, such as a fixed IP address, the presence of a specific keyword or URL, or a specific source domain. This is rule-based filtering rather than an algorithmic application.
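Both approaches in the list above can be sketched together. The example below is only an illustration: the cosine similarity over raw word counts, the 0.8 threshold, the sample spam collection and the function name is_spam are assumptions, and a real system would tune the threshold and the text representation on its own data.

```python
import math
import re
from collections import Counter


def vectorize(text):
    """Bag-of-words vector: lowercase alphanumeric tokens with their counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two Counter vectors (0.0 if either is empty)."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def is_spam(comment, known_spam, threshold=0.8, blocked_keywords=()):
    """Flag a comment that matches a fixed rule or resembles known spam."""
    lowered = comment.lower()
    # Rule-based check on fixed information (keywords, URL fragments, ...).
    if any(keyword in lowered for keyword in blocked_keywords):
        return True
    # Similarity check against the maintained spam collection.
    vec = vectorize(comment)
    return any(
        cosine_similarity(vec, vectorize(spam)) >= threshold
        for spam in known_spam
    )


# Hypothetical spam collection and incoming comments.
known_spam = [
    "Win a free phone now, click here",
    "Cheap loans approved instantly",
]

near_duplicate = is_spam("win a free phone now click here today", known_spam)
normal_comment = is_spam("Thanks for the detailed article on topic models", known_spam)
rule_hit = is_spam(
    "Visit http://example.com for details",
    known_spam,
    blocked_keywords=("http://",),
)
```

The similarity check catches lightly edited copies of known spam, while the keyword rule catches content that matches a fixed pattern even when nothing similar is in the collection yet.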

Spam detection is not limited to text. It can also cover other types of content, such as video, images and audio.