Before diving into zero-shot learning, we already have a certain understanding of machine learning and deep learning. At this point we should approach it with a few questions in mind:

1. Why does zero-shot learning exist?

2. What are the main applications of zero-shot learning? Can it be applied to network security?

3. What techniques and ideas give zero-shot learning its advantages?

4. What are the disadvantages of ZSL, and what are its open research problems?

To quote a story I have told before: suppose Xiao Ming and his father went to the zoo and saw a horse. His father told him that it was a horse. Then they saw a tiger, and his father told him:

"Look, this animal with stripes is a tiger." Finally, he took him to see the panda and said, "Look, the panda is black and white." Then his father set Xiao Ming a task: find an animal in the zoo that he had never seen before, called a zebra, and gave him some information about it: "The zebra has the outline of a horse, stripes on its body like a tiger, and it is black and white like a panda." In the end, Xiao Ming found the zebra in the zoo by following his father's hints.

The above example illustrates a human reasoning process: using past knowledge (descriptions of horses, tigers, pandas, and the zebra) to mentally infer the appearance of a new object, and thereby identify it.

ZSL aims to imitate this human reasoning process, giving computers the ability to recognize new things.

Deep learning is very popular now, and purely supervised learning has achieved impressive results on many tasks. Its limitation, however, is that it usually requires plenty of samples to train a good enough model, and a classifier trained on cats and dogs can classify cats and dogs but cannot recognize other species. Such a model clearly does not meet our expectations for artificial intelligence; we hope the machine can be like Xiao Ming, able to recognize new categories through reasoning.

Typical scenarios include medical imaging and the identification of endangered species.

1. Introduction to zero-shot learning

Abstracting the reasoning above: we infer the category of a newly appearing object by adding auxiliary information to known information. In this inference, the known information (horse, tiger, panda) is the training set; the auxiliary information (horse-like outline, black stripes, black-and-white coloring) is the semantic information that links the training set and the test set; and the object to be inferred (zebra) is the test set. Classes known before training are the seen classes; classes never encountered during training are the unseen classes. Let X be the data, Y the labels, S the seen classes, U the unseen classes, Tr the training-set categories, and Te the test-set categories. Zero-shot learning is then defined as ZSL: X → Y_U, that is, features are extracted by training on seen-class data, auxiliary knowledge is embedded, and the unseen classes are finally predicted. Here Te and Tr do not intersect: Tr = S and Te = U. It is worth noting that if a sample from a training-set category appears at test time, it cannot be predicted in this setting.
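Written out formally, a minimal formalization of the definition above in this section's notation:

```latex
% Training data come only from the seen classes S
\mathcal{D}_{tr} = \{(x_i, y_i)\}_{i=1}^{N}, \qquad y_i \in Y_{Tr} = S
% ZSL learns a predictor whose target label set is the unseen classes U,
% disjoint from the training label set
f_{ZSL} : X \rightarrow Y_{Te} = U, \qquad Y_{Tr} \cap Y_{Te} = \varnothing
```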

Zero-shot learning is a special kind of supervised learning, because the known knowledge it depends on is still labeled data.

| Model | Relationship between training-set and test-set categories | Relationship to seen and unseen classes |
| --- | --- | --- |
| Zero-shot learning | No overlap (test categories are not contained in the training categories) | Training-set categories are seen; test-set categories are unseen |
| Traditional supervised learning | Equal | Training-set and test-set categories are both seen |
| Generalized zero-shot learning | Test categories contain the training categories | Training-set categories are seen; test-set categories include both seen and unseen |

Zero-shot learning originally developed as a variant of transfer learning. The main difference between zero-shot learning and other forms of transfer learning is that the training sample set and the test sample set do not intersect. With continuous development in recent years, zero-shot learning has gradually separated from transfer learning and become an independent research direction within machine learning. Compared with existing classification methods, zero-shot learning has the following three advantages:

1) For specific classes for which no sample set has yet been established (such as newly identified biological species, endangered species, or newly designed industrial products), zero-shot learning can still recognize and classify these objects, which both meets practical needs and reduces labor and economic costs.

2) The core mechanism of zero-shot learning has much in common with the way humans learn. In-depth research on zero-shot learning can therefore provide strong support to the field of human cognitive science.

3) Zero-shot learning and deep learning are not contradictory; they can be combined organically, learn from each other, and develop together, so as to better meet the needs of future object recognition.

Structure of zero-shot learning:

The basic idea of zero-shot learning is to train a model on the training samples together with their corresponding auxiliary information (such as text descriptions or attribute features); in the testing phase, the information learned during training is supplemented with the auxiliary information of the test classes, so that the model can successfully classify the test-set samples.

[Figure: the structure of zero-shot learning, in which training and test classes are mapped into a shared feature subspace]

In the training stage, the auxiliary information provides a reversible mapping between class labels and the feature subspace, Ytr = g(Str) and Str = g⁻¹(Ytr), which determines the feature representation Str corresponding to each class label. The correspondence between Str and Xtr is then used to train a mapping function from Xtr to the feature subspace. After the mapping function f(·) has been obtained in the training stage, Xte is mapped into the same feature subspace by f(·) in the test stage, and its feature representation is estimated as Ŝ = f(Xte). In addition, the auxiliary information of Yte is passed through the reversible mapping to obtain Ste; Ŝ is then compared with Ste for similarity, and the test-class feature representation most similar to Ŝ identifies, through Ste, the class label yte, which becomes the estimated label of the test sample.

In the figure, Xtr denotes the training-class input sample set, Ytr the training-class label set, Xte the test-class input sample set, and Yte the test-class label set; the dotted box denotes the feature subspace shared by the training classes and the test classes. The feature subspace contains the feature code of every category in both the test classes and the training classes. The bidirectional arrow on the right denotes the bidirectional mapping between class labels and feature codes, which is given by the auxiliary information that zero-shot learning must provide. Currently, three kinds of auxiliary information are commonly used in zero-shot learning: attribute descriptions, text descriptions, and class hierarchy relationships.

In the training stage, Xtr and Str are used to train the mapping from image space to the feature subspace; the dotted arrow indicates that this mapping is learned during training. In the test stage, the test samples Xte are passed through the trained mapping into the feature subspace, and the resulting estimate Ŝ is compared with the feature codes Ste of Yte in the subspace to determine the estimated class labels of Xte. The essence of zero-shot learning is to improve, by various means, the generalization ability of the trained model until it is strong enough to recognize test-class samples it has never seen and determine their class labels. To transfer the trained model to unseen test samples, both the training samples and the test samples must come with auxiliary information, and a representation model of that auxiliary information is learned during training. During testing, the auxiliary-information model learned in training and the auxiliary information of the test samples are used together to predict the test samples' class labels. The key to realizing zero-shot learning is therefore to provide the model with sufficient, effective auxiliary information and to make the model use it efficiently.
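As a concrete illustration of this structure, here is a minimal sketch in Python, assuming visual features have already been extracted (for example by a CNN) and using ridge regression as the mapping f(·) and cosine similarity for the comparison; both choices are common in the literature but are assumptions here, not the only option:

```python
import numpy as np

def train_mapping(X_tr, S_tr, lam=1.0):
    """Learn f: image space -> feature subspace via ridge regression, X_tr @ W ~ S_tr."""
    d = X_tr.shape[1]
    # Closed form: W = (X^T X + lam*I)^{-1} X^T S
    return np.linalg.solve(X_tr.T @ X_tr + lam * np.eye(d), X_tr.T @ S_tr)

def zsl_predict(X_te, W, S_te, y_te):
    """Map test samples into the subspace and assign the most similar class code.

    S_te: one feature code per unseen class (rows); y_te: the matching labels.
    """
    S_hat = X_te @ W                                    # estimated representations
    S_hat = S_hat / np.linalg.norm(S_hat, axis=1, keepdims=True)
    P = S_te / np.linalg.norm(S_te, axis=1, keepdims=True)
    return y_te[np.argmax(S_hat @ P.T, axis=1)]         # nearest class code wins
```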

Zero-shot classification:

[Figure: zero-shot classification, omitted]

2. Key issues

By definition, zero-shot learning is a special type of supervised learning. Besides the overfitting problems inherent in traditional supervised learning, it faces four key problems: domain shift, hubness, generalized zero-shot learning, and the semantic gap.

2.1 Domain shift problem

The visual appearance of the same attribute can differ greatly across domains. When a mapping trained on seen classes is applied to predict unseen classes, the seen and unseen classes belong to different domains; if the seen and unseen classes are not closely related, the visual features of the same attribute may vary widely between domains, and without any adaptation to the unseen classes a domain shift problem arises. For example, in real life a tiger's tail is visually very different from a rabbit's tail, as shown in the figure. When we predict the tiger category, the tail attribute is supplied in the auxiliary information, but the appearance of the tail learned from rabbits does not match the actual one.

At present, scholars have proposed three main solutions. The first is to include unseen-class data in the training process, that is, to build a transductive model. The second is to impose constraints or extra information on the training data, that is, to build an inductive model. The third is to generate pseudo samples and feed them into the testing process, that is, to build a generative model.

The essence of the third approach is to transform zero-shot learning into traditional supervised learning.

Of course, all of the above solutions assume that the data distributions of the seen and unseen classes are consistent at the sample level.

2.2 Hubness problem

One point becomes the nearest neighbor of most other points. In the projection from the original space to the target space, a certain point may end up as the nearest neighbor of a large fraction of the points (a "hub"). For example, zero-shot learning models typically classify with the k-nearest-neighbor (KNN) algorithm; if a single point is the nearest neighbor of several or even dozens of nodes, the prediction becomes ambiguous and the model performs poorly.

There are two main solutions. The first is to use a ridge regression model to build a mapping from the low-dimensional space to the high-dimensional space; in computer vision this means mapping from the semantic space back to the visual space, which is also known as reverse mapping. The second is to use a generative model to produce pseudo samples and add them to the test process.
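A minimal sketch of the reverse-mapping remedy, under the same assumptions as the earlier sketch: the ridge regression now runs from the semantic space to the visual space, so the nearest-neighbor search happens in the visual space, where hubs are less severe:

```python
import numpy as np

def train_reverse_mapping(S_tr, X_tr, lam=1.0):
    """Learn g: semantic space -> visual space, S_tr @ V ~ X_tr (reverse mapping)."""
    k = S_tr.shape[1]
    return np.linalg.solve(S_tr.T @ S_tr + lam * np.eye(k), S_tr.T @ X_tr)

def predict_reverse(X_te, V, S_te, y_te):
    """Project unseen-class codes into visual space, then nearest-neighbor there."""
    prototypes = S_te @ V                              # one visual prototype per class
    d2 = ((X_te[:, None, :] - prototypes[None, :, :]) ** 2).sum(axis=2)
    return y_te[np.argmin(d2, axis=1)]
```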

2.3 Generalized zero-shot learning

Standard zero-shot learning requires the training-set and test-set categories to be mutually exclusive: the test set and the training set do not intersect, the seen classes equal the training categories, and the unseen classes equal the test categories. This means that if a test-phase sample comes from a training (seen) class, it cannot be predicted. That assumption is unrealistic in real life.

There are two main solutions. The first is to use a classifier to separate seen-class and unseen-class data in the test set: seen-class data are classified directly by the classifier, while unseen-class data are predicted with the help of auxiliary information. The second is to use a generative model to generate unseen-class samples and then train a classifier on the generated samples together with the seen-class samples, transforming generalized zero-shot learning into traditional supervised learning.
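A minimal sketch of the first (gating) strategy; the confidence threshold and the two underlying predictors are illustrative assumptions:

```python
import numpy as np

def gzsl_predict(x, seen_clf, zsl_predict_fn, threshold=0.5):
    """Route a test sample: confident seen-class predictions stay with the
    ordinary classifier, everything else falls back to the zero-shot predictor."""
    x = x.reshape(1, -1)
    proba = seen_clf.predict_proba(x).max()     # any sklearn-style classifier
    if proba >= threshold:                      # looks like a seen class
        return seen_clf.predict(x)[0]
    return zsl_predict_fn(x)                    # unseen class: use auxiliary info
```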

2.4 Semantic gap

The semantic space and the visual space have different manifold structures, so there is a gap in the mapping between them. The common route by which zero-shot learning predicts unseen-class data is to construct the relationship between images and semantics.

At present, the main solution proposed by scholars is to map the visual features extracted from the image space and the semantic information extracted from the semantic space into a common space and align them there.
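One concrete way to realize this alignment is canonical correlation analysis (CCA); a minimal sketch with scikit-learn, using random arrays in place of real features:

```python
import numpy as np
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(0)
X_tr = rng.normal(size=(200, 512))   # visual features (e.g., CNN outputs)
S_tr = rng.normal(size=(200, 85))    # per-sample semantic/attribute vectors

cca = CCA(n_components=32)           # dimensionality of the common space
cca.fit(X_tr, S_tr)                  # learn both projections jointly
X_c, S_c = cca.transform(X_tr, S_tr) # aligned representations in the common space
```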

2.5 Common datasets

The datasets commonly used in zero-shot learning are introduced below by application type: text, image, and video.

Text:

LASER multilingual dataset, WordNet English lexical database, ConceptNet commonsense knowledge base

Image:

AwA (animals, attribute-annotated), CUB (birds, fine-grained), aPY (mixed categories), SUN (scenes, fine-grained), ImageNet (large-scale)

Video:

UCF101 and ActivityNet (human actions), CCV and USAA (social activities)

3. Development of zero-shot models

The classical models from the three development stages of zero-shot learning are introduced here, providing theoretical support for the application system constructed later. The three stages are: first, attribute-based zero-shot learning; second, embedding-based zero-shot learning; third, zero-shot learning based on generative models.

3.1 Attribute-based zero-shot learning

Attributes are a kind of semantic information. Attribute-based methods are the pioneering work of zero-shot learning and the foundation of its subsequent development.

1. The DAP model

DAP (Direct Attribute Prediction) works in two steps. First, a Support Vector Machine (SVM) is trained to map seen-class data to the shared attributes, learning one attribute classifier per attribute; the attribute layer is the space shared between the seen and unseen classes. Second, the Bayes formula is used to predict the attributes of unseen-class samples, and the unseen category is then inferred from the relationship between unseen classes and attributes.
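A minimal sketch of these two steps, assuming binary class-attribute matrices A_seen and A_unseen (one row per class, one column per attribute); the attribute prior term of the original DAP formulation is omitted for brevity:

```python
import numpy as np
from sklearn.svm import SVC

def train_attribute_classifiers(X_tr, y_tr, A_seen):
    """Step 1: one probabilistic SVM per attribute; each sample inherits the
    attribute labels of its (seen) class via the class-attribute matrix.
    Assumes every attribute takes both values among the seen classes."""
    return [SVC(probability=True).fit(X_tr, A_seen[y_tr, m])
            for m in range(A_seen.shape[1])]

def dap_predict(X_te, clfs, A_unseen):
    """Step 2: combine per-attribute posteriors with a naive-Bayes product
    to score each unseen class, then pick the best one."""
    P = np.column_stack([c.predict_proba(X_te)[:, 1] for c in clfs])  # p(a_m=1|x)
    scores = np.stack([np.where(a == 1, P, 1 - P).prod(axis=1) for a in A_unseen])
    return scores.argmax(axis=0)   # row index into A_unseen = predicted class
```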

By using attributes, the DAP model successfully predicts categories that have no training data, and with high accuracy. But DAP has three obvious drawbacks. First, for newly added seen-class data the attribute classifiers must be retrained, so the model cannot be optimized and improved incrementally. Second, it is difficult to use auxiliary information other than attributes, such as the network-structured data of WordNet. Third, because attributes serve as the middle layer, the model is optimal at predicting attributes, but not necessarily optimal at predicting categories.

2. The IAP model

IAP (Indirect Attribute Prediction) also works in two steps. First, a support vector machine (SVM) is trained as a multi-class classifier over the seen classes, learning one category classifier per seen class. Second, the Bayes formula converts a sample's seen-class posteriors into attribute probabilities, and the category to which an unseen-class sample belongs is then inferred through the class-attribute relationship.
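The corresponding sketch of IAP's first step: only the attribute-probability computation changes, and the resulting matrix can then be scored exactly as in the DAP sketch above (logistic regression stands in for a probabilistic SVM here):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def iap_attribute_probs(X_tr, y_tr, X_te, A_seen):
    """p(attribute | x) = sum_k p(attribute | seen class k) * p(seen class k | x).
    Assumes y_tr uses labels 0..K-1 so the columns of predict_proba line up
    with the rows of the class-attribute matrix A_seen."""
    clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    class_post = clf.predict_proba(X_te)   # (n_samples, K) seen-class posteriors
    return class_post @ A_seen             # (n_samples, n_attributes)
```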

Like the DAP model, the IAP model successfully predicts categories for which there is no data, and it is more flexible and simpler than DAP: when new classes must be trained, the IAP model has a smaller training-time cost. However, in experiments the IAP model did not perform as well as DAP.

3.2 Embedding-based zero-shot learning

With the continuous development of machine learning, computer vision has gradually become a focus for researchers. Attribute-based zero-shot learning is far from satisfying the needs of image processing and suffers from many problems, so embedding-based zero-shot learning was proposed, which couples semantic information closely with image information. The main approaches include embedding image information into the semantic space, embedding semantic information into the image space, and embedding both semantic and image information into a common space.

The functions commonly used to embed image information into the semantic space include linear functions, bilinear functions, and nonlinear functions; common loss functions include ranking loss and squared loss.

1. ESZSL

ESZSL has two stages: a training stage and an inference stage, and it learns a bilinear compatibility function between features and attributes. In the training stage, the mapping between the feature space and the attribute space is established from the training samples and the feature matrix. In the inference stage, the descriptions (signatures) of the test classes are combined with the learned feature-to-attribute mapping to obtain the final prediction model, yielding for each category a mapping from image space to semantic space. It is worth noting that each stage can be written in a single line of code without calling other functions, so implementing zero-shot learning this way is very simple. ESZSL also defines a corresponding regularizer and a squared loss function to optimize the model.
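A sketch of ESZSL's closed-form solution as it is usually presented (Romera-Paredes and Torr, 2015), with X the training features, Y one-hot labels coded in {-1, 1}, and S_sig the attribute signatures of the seen classes:

```python
import numpy as np

def eszsl_train(X, Y, S_sig, gamma=1.0, lam=1.0):
    """One line per stage: V = (X^T X + g*I)^{-1} X^T Y S (S^T S + l*I)^{-1}."""
    d, a = X.shape[1], S_sig.shape[1]
    A = X.T @ X + gamma * np.eye(d)            # regularized feature Gram matrix
    B = S_sig.T @ S_sig + lam * np.eye(a)      # regularized signature Gram matrix
    return np.linalg.solve(A, X.T @ Y @ S_sig) @ np.linalg.inv(B)

def eszsl_predict(X_te, V, S_unseen):
    """Score test samples against unseen-class signatures with the bilinear form."""
    return (X_te @ V @ S_unseen.T).argmax(axis=1)
```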

3.3 Zero-shot learning based on generative models

In zero-shot learning, embedding semantic information into the image space is usually done with generative models. Given the visual and semantic information of the seen classes, unseen-class samples are generated through the semantic coherence between seen and unseen classes, so that zero-shot learning becomes traditional supervised learning and generative models are used to the fullest.

1. SAE

SAE combines zero-shot learning with an autoencoder (AE). The SAE model treats the semantic space as the hidden layer: the encoder maps seen-class image information into the semantic space, and the decoder, exploiting the semantic coherence between seen and unseen classes, generates unseen-class images, thereby turning zero-shot learning into traditional supervised learning.

The premise of the SAE model is that the mapping matrix from image information to the semantic space is the transpose of the matrix used to generate images from the semantic space, and a penalty constraint is added: the product of the image-to-semantic embedding matrix and the seen-image representations must equal the hidden-layer representation. This forces the encoded image to retain as much of the original image's information as possible.
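Under this tied-weight premise, the SAE training objective min_W ||X - W^T S||^2 + lambda * ||W X - S||^2 reduces to a Sylvester equation; a sketch following Kodirov et al. (2017), with one column per sample:

```python
import numpy as np
from scipy.linalg import solve_sylvester

def sae_train(X, S, lam=500.0):
    """Solve A W + W B = C for the encoder W (semantic = W @ X).
    X: (d, N) visual features, S: (k, N) semantic codes, one column per sample."""
    A = S @ S.T                     # (k, k)
    B = lam * (X @ X.T)             # (d, d)
    C = (1.0 + lam) * (S @ X.T)     # (k, d)
    return solve_sylvester(A, B, C) # W: (k, d); decoder is W.T (tied weights)
```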

The SAE model is not only simple but also effective, and it can be applied to generalized zero-shot learning to mitigate the domain shift problem. However, the embedding function between semantic and image information used by SAE is too simple and fixed, so it cannot generate high-quality images or accurately predict unseen-class samples.

4. Applications of zero-shot learning

Zero-shot learning is applied along three dimensions. The first dimension is words: zero-shot techniques are used to process words and are applied in many fields. The second dimension is images: the textual information produced in the first dimension can be embedded into the visual space as semantic information, promoting the application of zero-shot learning in image processing. The third dimension is video: each frame of a video can be treated as an image, so a video can be segmented into images and handled with the methods of the second dimension, extending zero-shot learning to video.

1. Word: dialogue system, machine translation, text classification

2. Image: image retrieval, object recognition, semantic segmentation

3. Video: human action recognition, super-resolution

As the performance of zero-shot learning methods improves, their use in practical scenarios is gradually increasing.

1) Computer vision. The biggest application of zero-shot learning lies in images and video. Zero-shot learning can not only complete classification tasks and solve fine-grained classification problems such as birds and flowers, but is also used for image segmentation, image retrieval, and domain adaptation. It is likewise used for video-related problems, where it can recognize videos with unknown action labels or unknown emotion labels. In addition, zero-shot learning is used for action localization, event narration, and description (caption) generation.

2) Natural language processing. In recent years, zero-shot learning has also gained a foothold in natural language processing. In the study of rare and low-resource languages, it helps construct bilingual dictionaries. In machine translation, it enables zero-shot translation for language pairs that lack parallel corpora. It is also used in spoken language understanding and semantic utterance classification, as well as for web entity extraction, fine-grained named-entity typing, cross-language document retrieval, relation extraction, and other natural-language-processing problems.

3) Others. Beyond the above areas, zero-shot learning can identify human activities with the help of sensors; in computational biology, it can analyze the composition of molecular compounds; and in security and privacy, it can help with transmitter identification.

5. Intrusion detection based on zero-shot learning

1. Attack semantic knowledge base construction

Zero-shot learning completes the recognition of unseen classes through the effective transfer of "knowledge" from seen classes to unseen classes. Knowledge in zero-shot learning can be divided into three levels: primary knowledge, abstract knowledge, and external knowledge. If we want to transfer "knowledge" from seen-class attacks to unseen-class attacks, we need knowledge that corresponds to the sample features. Therefore, we choose to obtain the semantic knowledge corresponding to attack samples from external text descriptions of the attack types.

We use a public dataset, NSL-KDD, which provides only data and labels and no detailed descriptions of the attack categories. We therefore collect semantic knowledge for the dataset's attack categories ourselves and convert it into machine-readable semantic embedding vectors, forming our attack semantic knowledge base. (The semantic descriptions of all attack categories are collected by summarizing articles from Wikipedia, Baidu Encyclopedia, and security websites.) NLP techniques then transform the text into word vectors to generate the attack semantic knowledge base.
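A minimal sketch of this step; the attack descriptions below are illustrative placeholders rather than the real collected texts, and the pretrained embedding model is one of several options available through gensim:

```python
import numpy as np
import gensim.downloader

# Pretrained word vectors; "glove-wiki-gigaword-100" is one available choice.
wv = gensim.downloader.load("glove-wiki-gigaword-100")

# Placeholder descriptions (the real ones would be summarized from Wikipedia,
# Baidu Encyclopedia, and security websites).
attack_descriptions = {
    "neptune": "syn flood denial of service attack exhausting connection queues",
    "smurf": "icmp echo flood denial of service using broadcast amplification",
    "nmap": "network probe scanning hosts and ports for reconnaissance",
}

def embed(text):
    """Average the word vectors of a description into one semantic vector."""
    vecs = [wv[w] for w in text.split() if w in wv]
    return np.mean(vecs, axis=0)

semantic_kb = {name: embed(desc) for name, desc in attack_descriptions.items()}
```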

2. Autoencoder

The seen-class training set and its corresponding semantic embedding vectors are fed into the encoder, which maps them into the semantic embedding layer. Our goal is to update the parameters of the encoder, the decoder, and the regressor so that the discriminative embedding layer extracts more representative embeddings, and the decoder can generate unseen-class pseudo samples that carry sufficient semantic information and discriminative features.
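A sketch of the encoder/decoder just described, in PyTorch; the layer sizes, the simple concatenation-based conditioning, and the noise-driven generation step are all illustrative assumptions:

```python
import torch
import torch.nn as nn

class SemanticAE(nn.Module):
    """Encoder maps (sample, semantic vector) pairs to a discriminative embedding;
    the decoder, conditioned on the semantics, reconstructs or generates samples."""
    def __init__(self, x_dim, s_dim, h_dim=64):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(x_dim + s_dim, h_dim), nn.ReLU())
        self.decoder = nn.Sequential(nn.Linear(h_dim + s_dim, x_dim))

    def forward(self, x, s):
        h = self.encoder(torch.cat([x, s], dim=1))      # discriminative embedding
        x_rec = self.decoder(torch.cat([h, s], dim=1))  # semantics-conditioned output
        return h, x_rec

@torch.no_grad()
def generate_pseudo_samples(model, s_unseen, n, h_dim=64):
    """After training on seen attacks (e.g., with an MSE reconstruction loss),
    feed unseen-attack semantics, with noise in place of the embedding,
    to the decoder to obtain pseudo samples."""
    s = s_unseen.repeat(n, 1)      # (n, s_dim): same semantics for every sample
    h = torch.randn(n, h_dim)      # noise stands in for the embedding
    return model.decoder(torch.cat([h, s], dim=1))
```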

3. The classifier

Once pseudo samples of an unseen-class attack have been generated, we can use those samples and their corresponding labels to train a supervised classifier for the unseen-class attack.
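A final sketch, assuming the arrays produced by the previous steps (X_pseudo and y_pseudo from the decoder, X_seen and y_seen from the real NSL-KDD training data); the choice of random forest is an assumption:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# X_pseudo, y_pseudo: generated unseen-attack samples and their labels;
# X_seen, y_seen: real seen-class training data. All assumed from earlier steps.
clf = RandomForestClassifier(n_estimators=200, random_state=0)
clf.fit(np.vstack([X_seen, X_pseudo]), np.concatenate([y_seen, y_pseudo]))
y_pred = clf.predict(X_test)   # test traffic, now including unseen attack types
```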