The topic of this article is "Starting from Few-Shot Learning, Heading Toward the Sea of Stars." It is divided into five parts:

  • Few-shot learning and its importance

  • Three classic scenarios of few-shot learning

  • Application areas of few-shot learning

  • Definitions and challenges of few-shot learning

  • PaddleFSL helps you do few-shot learning

Yaqing Wang received her PhD from the Department of Computer Science and Engineering at the Hong Kong University of Science and Technology in 2019, supervised by Professors Lionel M. Ni and James T. Kwok. Her research direction is machine learning, with a focus on few-shot learning.

WAVE SUMMIT+2021 Deep Learning Developer Summit

"Technological Innovation, Female Power" Forum

Since starting her PhD, she has published many papers at top venues including ICML, NeurIPS, TheWebConf, EMNLP, and TIP. Her few-shot learning survey was the most cited paper in ACM Computing Surveys during 2019-2021 and is an ESI Highly Cited Paper this year.

In addition, the few-shot learning tool she leads the development of has earned 1.1K+ stars on GitHub. If you are interested, you can check it out at github.com/tata1661/FS…

Since joining Baidu, Yaqing Wang has been deeply engaged in few-shot learning, mainly studying how to quickly generalize to new tasks that contain only a small amount of annotated data.

Figure 1

Few-shot learning and its importance

Her work tackles few-shot learning from three perspectives:

  • First, dig into the relevant theoretical foundations, such as meta-learning and graph learning.
  • Second, consider how to land practical applications at Baidu, such as new drug discovery, text classification, intent recognition, cold-start recommendation, and gesture recognition.
  • Finally, help everyone get started with few-shot learning quickly, enable rapid prototyping of few-shot methods, and build a general-purpose few-shot learning tool. The tool is based on PaddlePaddle; it provides easy-to-use, stable implementations of classic few-shot learning methods and currently covers classic applications in CV and NLP.

To talk about few-shot learning, we first need to talk about deep learning. Since 2015 there have been numerous breakthroughs in deep learning, such as AlphaGo beating the human Go champion and, since ResNet, models labeling large-scale datasets such as ImageNet with lower error than human annotators. But the success of these deep learning models requires a lot of annotated data and high-performance computing equipment.

AlphaGo, for example, was trained on a database of some 30 million positions from human expert games and then improved further through extensive self-play. ResNet was trained on ImageNet, one of the rare large-scale datasets containing millions of annotated images. In most scenarios, however, it is hard to satisfy both conditions, "a large amount of annotated data" and "high-performance computing equipment", and that is exactly why few-shot learning is needed.

Figure 2

Three classic scenarios of few-shot learning

Let us start with the three classic scenarios of few-shot learning.

1. Making artificial intelligence more human-like, with the ability to generalize from a single example. Take the leftmost picture in Figure 3 as an example: shown one unicycle, even a child can easily pick out the unicycles from a pile of pictures, and can still recognize one when it is tilted, turned upside down, or drawn with thicker poles and bigger wheels.

In addition, shown a unicycle, a bicycle, and a motorcycle, human children can easily see what these different vehicles have in common, for example that they all have wheels and handlebars. Current artificial intelligence still lacks this ability to generalize from a few examples. Few-shot learning has therefore long been a focus of academic research, with the goal of narrowing the gap between artificial intelligence and human intelligence.

Figure 3

2. A key scenario for few-shot learning is reducing the cost of data collection, annotation, processing, and computation. Today many developers face huge amounts of unlabeled data that contain a lot of noise, which makes it very difficult to actually mine knowledge and information from them.

In general, you have to hire crowdsourcing annotators to label the data for you. But annotation takes a long time and many rounds of iteration between the two sides, and the quality of the final data still depends on the subjective judgment of the annotators.

Therefore, if we can apply few-shot learning, we can greatly reduce the cost of data collection and annotation: by collecting a very small dataset that contains only a few high-quality labeled samples, you can train a model for classification or regression.

3. Handling rare cases, such as situations involving danger, privacy, or ethics. A classic scenario is new drug discovery, where the hope is to find, among millions of compounds, those that match the desired properties, for example lower toxicity or better water solubility.

But drug discovery is itself a time-consuming process: it can take a decade or so and a great deal of money to run the necessary experiments and trials, and in the end the number of compounds that actually make it into the lab is very small. This makes drug discovery a few-shot learning problem. (Figure 3)

Application areas of few-shot learning

Because the few-shot setting is so common, few-shot learning now appears in virtually every industry and field. It appeared first in CV, or computer vision, in tasks such as image classification, object recognition, and image segmentation.

Later, it also appeared in NLP, for example in classic relation extraction and NER tasks. Recently, with the advent of pre-trained models, everyone wants to take advantage of them: these models are generally trained on large corpora and therefore carry rich semantic information and prior knowledge.

How to adapt these models to new tasks that contain only a small amount of annotated data, whether by fine-tuning or by building prompt templates, has been a focus of recent NLP research.
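As a minimal sketch of the template idea (not part of the talk; it uses HuggingFace Transformers, the bert-base-uncased checkpoint, and a made-up two-class label set purely for illustration), classification is recast as filling in a [MASK] token inside a hand-written prompt:

```python
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

# Hypothetical label set: each class is scored via a single label word.
label_words = {"sports": "sports", "business": "business"}

def classify(sentence: str) -> str:
    # Wrap the input in a cloze-style template; the model fills in the [MASK].
    prompt = f"{sentence} This text is about [MASK]."
    inputs = tokenizer(prompt, return_tensors="pt")
    mask_pos = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero().item()
    with torch.no_grad():
        logits = model(**inputs).logits[0, mask_pos]  # scores over the vocabulary
    # Score each class by the logit of its label word at the masked position.
    scores = {c: logits[tokenizer.convert_tokens_to_ids(w)].item()
              for c, w in label_words.items()}
    return max(scores, key=scores.get)

print(classify("The team won the championship after a dramatic final."))
```

The idea is that the prior knowledge in the pre-trained model does most of the work, so only a few labeled examples are needed to choose label words or lightly fine-tune.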

Beyond NLP, there are also knowledge graphs, where questions such as how to handle newly emerging entities and relations can be addressed through few-shot learning.

Figure 4

And then there are drug discovery and robotics, as mentioned earlier. For example, teaching a robot dog to take two steps to the left, or showing it just one or two gestures so that it knows what you want it to do, can all be done with few-shot learning.

Definitions and challenges of few-shot learning

The following is a more rigorous definition of few-shot learning, based on Professor Tom Mitchell's classic 1997 definition of machine learning.

What is machine learning? A computer program is said to learn from experience E with respect to a class of tasks T and a performance measure P if its performance on tasks in T, as measured by P, improves with experience E.

Few-shot learning is a kind of machine learning. What makes it special is that experience E contains only very few supervision signals; the most common supervision signal is the sample's label.

Figure 5

Ideally, learning should reduce the model's expected risk, so that whatever samples arrive in the future, it can predict them well. But the underlying joint distribution of inputs and labels is generally unknown, so the expected risk has to be estimated.

In machine learning, we therefore generally optimize the empirical risk instead, which is computed over the I samples in the training set. If the training set contains only a small amount of annotated data, that is, if I is very small, the empirical risk minimizer becomes a very unreliable estimate, which is what makes learning from few samples so difficult.
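As a minimal sketch of the two quantities (the notation h for the hypothesis, ℓ for the loss, and p(x, y) for the unknown joint distribution is assumed here, not taken from the slides):

$$
R(h) = \mathbb{E}_{(x,y)\sim p(x,y)}\big[\ell(h(x), y)\big],
\qquad
R_I(h) = \frac{1}{I}\sum_{i=1}^{I}\ell\big(h(x_i), y_i\big)
$$

When I is large, minimizing the empirical risk R_I(h) is a reliable proxy for minimizing the expected risk R(h); when I is only a handful, the empirical minimizer can sit far from the expected-risk minimizer.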

However, this is not unsolvable: the remedy is to combine the label information in experience E with prior knowledge. For example, the pre-trained NLP models mentioned above are one form of prior knowledge that can make learning task T feasible. There are generally three angles:

  1. Use prior knowledge to generate more labeled samples for training.
  2. Use prior knowledge to constrain the hypothesis space of the model.
  3. Use prior knowledge to design a more economical search strategy: in the hypothesis space, this big H, where should the search start, in which direction should it move, and at what speed? Good choices make the final search strategy more economical and effective, so that very good results can be obtained with only a few samples (see the sketch after this list).
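As a toy sketch of the third angle (not from the talk; the 1-D regression task family, the hyperparameters, and the Reptile-style update are all illustrative assumptions), prior knowledge from many related tasks is distilled into an initialization from which a new few-shot task can be fitted in a handful of gradient steps:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_task():
    # A task is y = a*x + b with task-specific a, b drawn from an assumed family.
    a, b = rng.uniform(0.5, 2.0), rng.uniform(0.0, 1.0)
    def make_batch(n):
        x = rng.uniform(-1.0, 1.0, size=n)
        return x, a * x + b
    return make_batch

def sgd_steps(w, make_batch, steps=5, lr=0.1, n=5):
    # Adapt linear weights w = [slope, intercept] with a few SGD steps on one task.
    for _ in range(steps):
        x, y = make_batch(n)
        pred = w[0] * x + w[1]
        grad = np.array([np.mean(2 * (pred - y) * x), np.mean(2 * (pred - y))])
        w = w - lr * grad
    return w

w0 = np.zeros(2)                       # shared initialization (the "prior knowledge")
for _ in range(2000):                  # outer loop over many related training tasks
    task = sample_task()
    w_adapted = sgd_steps(w0, task)
    w0 = w0 + 0.1 * (w_adapted - w0)   # Reptile-style update: nudge w0 toward the adapted weights

# A brand-new task with only 5 labeled samples: starting from w0, a few steps suffice.
w_few_shot = sgd_steps(w0, sample_task(), steps=5, n=5)
print("meta-learned start:", w0, " few-shot fitted weights:", w_few_shot)
```

Methods such as MAML and ANIL, which PaddleFSL implements, refine this idea with explicitly meta-learned initializations and inner/outer optimization loops.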

These methods are surveyed and organized in detail in her few-shot learning review, which has been the most cited paper in ACM Computing Surveys over the past two years and is an ESI Highly Cited Paper this year.

PaddleFSL helps you do few-shot learning

I have just introduced the general approaches to few-shot learning. Here is how to do few-shot learning with PaddleFSL.

Figure 6

PaddleFSL is a few-shot learning toolkit based on PaddlePaddle. It provides simple, easy-to-use, and stable implementations of classic few-shot learning methods, and supports extension with new few-shot learning methods.

In addition, unified dataset processing is provided to make comparing model performance easier, along with very detailed comments so that you can easily customize new datasets. It currently covers classic few-shot CV and NLP applications, and is expanding into new areas on top of the thriving PaddlePaddle ecosystem.

From the overall framework of PaddleFSL shown here, you can see that tasks such as image classification, relation extraction, and general natural language processing are supported, together with the classic datasets involved in these three tasks.

To handle different applications, we also provide different feature extractors.

For example, CNNs are used to extract image features, and all the pre-trained models available in PaddleNLP are supported for text. The model library also provides classic few-shot learning methods. Because PaddleFSL is built on top of PaddlePaddle, it also supports cross-platform deployment.
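As an illustration of how one such classic method works (a generic sketch using plain PaddlePaddle ops, not PaddleFSL's actual API; the episode sizes and random "features" are made up), a prototypical network classifies each query by its distance to the mean embedding of each class's support samples:

```python
import paddle

def protonet_logits(support_feats, support_labels, query_feats, n_way):
    # support_feats: [N*K, D] support embeddings, support_labels: [N*K],
    # query_feats: [Q, D] query embeddings from any feature extractor.
    prototypes = []
    for c in range(n_way):
        idx = paddle.nonzero(support_labels == c).flatten()
        # Class prototype = mean embedding of that class's support samples.
        prototypes.append(paddle.index_select(support_feats, idx).mean(axis=0))
    prototypes = paddle.stack(prototypes)                      # [n_way, D]
    # Negative squared Euclidean distance to each prototype is used as the logit.
    diff = query_feats.unsqueeze(1) - prototypes.unsqueeze(0)  # [Q, n_way, D]
    return -paddle.sum(diff * diff, axis=-1)                   # [Q, n_way]

# Toy 2-way 3-shot episode with random 8-dimensional "features".
support = paddle.randn([6, 8])
labels = paddle.to_tensor([0, 0, 0, 1, 1, 1])
queries = paddle.randn([4, 8])
print(paddle.argmax(protonet_logits(support, labels, queries, n_way=2), axis=1))
```

In practice the random tensors would be replaced by features from one of the backbones above, a CNN for images or a PaddleNLP pre-trained model for text.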

Reproduction results for few-shot image classification are also reported: with ProtoNet, RelationNet, MAML, and ANIL on the two classic datasets Omniglot and mini-ImageNet, PaddleFSL reproduces results that are better than, or at least comparable to, those reported in the original papers.

To summarize: since joining Baidu Research, Yaqing Wang has mainly worked on few-shot learning. On the theoretical side, papers have been accepted by ACM Computing Surveys and WWW. On the applied side, the few-shot new drug discovery work was accepted as a Spotlight paper at NeurIPS 2021 this year, and the few-shot short text classification work was accepted by EMNLP as a long paper. Work on intent recognition and cold start is in progress and currently under review.

In addition, the work on few-shot gesture recognition was supported by the National Natural Science Foundation of China. Last but not least, PaddleFSL now has over 1,100 stars on GitHub, and the accompanying article has over 10,000 views.

Taking this opportunity, I hope that students interested in few-shot learning will scan the QR code below to learn more and join in cutting-edge research and practice together.

Figure 7