Compiled by Tyler Folkman McGL

You’ve seen the reports: deep learning is the best thing since sliced bread. It promises to solve your most complex problems with only a fraction of the data those problems usually demand. The only problem is that you don’t work at Google or Facebook, and data is scarce. So what can you do? Can you still harness the power of deep learning, or are you simply out of luck? Let’s take a look at how deep learning can be leveraged with limited data, and why I think this could be one of the most exciting areas for future research.

Start Simple

Before we discuss approaches to deep learning with limited data, step back from neural networks and build a simple benchmark. It usually doesn’t take long to try a few traditional models, such as random forests. This will help you gauge any potential gains from deep learning and give you insight into the trade-offs between deep learning and more traditional approaches for your problem.
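A quick baseline might look like the sketch below. It uses scikit-learn’s random forest with cross-validation; the variables X and y are assumptions standing in for whatever feature matrix and labels you actually have.

    # A minimal baseline sketch with scikit-learn; X and y are assumed to be
    # your feature matrix and label vector (e.g. NumPy arrays).
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import cross_val_score

    baseline = RandomForestClassifier(n_estimators=200, random_state=42)
    scores = cross_val_score(baseline, X, y, cv=5)  # 5-fold cross-validation
    print(f"Baseline accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")

Whatever deep learning model you build later has to beat this number to justify its extra complexity.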

Get More Data

It sounds ridiculous, but have you really considered collecting more data? When I make this suggestion to companies, I’m surprised at how often they look at me like I’m crazy. Yes, collecting more data costs time and money, but it is often your best bet. For example, maybe you’re trying to classify rare bird species and have very limited data. You can almost certainly solve this problem more easily simply by labeling more data. Not sure how much data you need to collect? Try plotting a learning curve to see how model performance changes as you add data.
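One way to sketch such a learning curve with scikit-learn is shown below: train the baseline on increasing fractions of the data and watch the validation score. X and y again stand in for your own dataset.

    # Sketch of a learning curve: validation accuracy vs. training set size.
    import numpy as np
    import matplotlib.pyplot as plt
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import learning_curve

    sizes, train_scores, val_scores = learning_curve(
        RandomForestClassifier(n_estimators=200, random_state=42),
        X, y, cv=5, train_sizes=np.linspace(0.1, 1.0, 8),
    )
    plt.plot(sizes, val_scores.mean(axis=1), marker="o", label="validation score")
    plt.xlabel("training examples")
    plt.ylabel("accuracy")
    plt.legend()
    plt.show()

If the curve is still climbing at the right-hand edge, collecting or labeling more data will probably pay off; if it has flattened, more data alone may not help much.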

Fine-tuning

Okay. Suppose you now have a simple baseline model, and collecting more data is either impossible or too expensive. The most tried-and-true approach at this point is to take a pre-trained model and fine-tune it for your problem.

The basic idea of fine-tuning is to take a neural network that has been trained on a very large dataset, preferably one somewhat similar to your domain, and then fine-tune that pre-trained network on your smaller dataset.

For image classification problems, the go-to dataset is the classic ImageNet. It contains millions of images across many different categories and is useful for many types of image problems. It even includes animals, so it may help with classifying rare birds.
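A minimal fine-tuning sketch with PyTorch and torchvision might look like the following: load an ImageNet-pretrained ResNet-18, freeze the backbone, and swap in a new classification head. The variable num_bird_species is a placeholder for however many classes your own problem has.

    # Sketch: fine-tune an ImageNet-pretrained ResNet-18 on a small dataset.
    import torch.nn as nn
    from torchvision import models

    model = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
    for param in model.parameters():
        param.requires_grad = False  # freeze the pretrained backbone

    num_bird_species = 20  # placeholder: your own number of classes
    model.fc = nn.Linear(model.fc.in_features, num_bird_species)  # new trainable head

You would then train only the new head on your small dataset, and optionally unfreeze a few of the top layers once the head has converged.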

Data Augmentation

If you can’t get more data and fine-tuning a model pre-trained on a large dataset isn’t an option, data augmentation is usually your next best choice. It can also be used in conjunction with fine-tuning.

The idea behind data augmentation is simple: change the input data in ways that do not change the output label.

For example, if you have a picture of a cat and rotate the image, it’s still a cat; that is a valid augmentation. On the other hand, if you have a picture of a road and want to predict the appropriate steering angle (as in a self-driving car), rotating the image changes the correct steering angle. In that case, the augmentation is not appropriate.

Data augmentation is most commonly used for image classification problems.
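With images this is usually only a few lines of code. The sketch below uses torchvision transforms as one common option, applied as the training-time transform of your dataset.

    # Sketch of label-preserving image augmentation with torchvision.
    from torchvision import transforms

    augment = transforms.Compose([
        transforms.RandomHorizontalFlip(),              # a flipped cat is still a cat
        transforms.RandomRotation(degrees=15),          # small rotations
        transforms.ColorJitter(brightness=0.2, contrast=0.2),
        transforms.ToTensor(),
    ])

Each epoch then sees slightly different versions of the same underlying images, which effectively enlarges the training set.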

You can often think of creative ways to apply data augmentation in other areas (such as NLP), and people are also experimenting with GANs to generate new data. If you are interested in the GAN approach, see DADA (Deep Adversarial Data Augmentation).

Cosine Loss

A recent paper, “Deep Learning on Small Datasets without Pre-Training using Cosine Loss,” studied exactly this setting. The authors found that switching the loss function from cross-entropy loss to cosine loss improved accuracy on small classification datasets by as much as 30%.
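The core idea is easy to sketch: instead of cross-entropy, minimize one minus the cosine similarity between the network output and the one-hot target vector. The PyTorch snippet below is my own rough sketch of that idea, not the authors’ exact implementation.

    # Sketch of a cosine loss: 1 - cosine similarity to the one-hot target.
    import torch
    import torch.nn.functional as F

    def cosine_loss(outputs, targets, num_classes):
        one_hot = F.one_hot(targets, num_classes).float()
        return (1.0 - F.cosine_similarity(outputs, one_hot, dim=1)).mean()

    outputs = torch.randn(4, 10, requires_grad=True)   # batch of 4, 10 classes
    targets = torch.tensor([3, 1, 7, 0])
    loss = cosine_loss(outputs, targets, num_classes=10)
    loss.backward()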

In the paper’s plots you can see how performance varies with the number of samples per class: fine-tuning is very valuable for some small datasets (CUB) and much less so for others (CIFAR-100).

Deeper

In the NeurIPS paper “Modern Neural Networks Generalize on Small Data Sets,” the authors view deep neural networks as ensembles. Specifically, “the final layers may provide an ensemble mechanism rather than each layer presenting an ever-increasing hierarchy of features.”

My takeaway from this is that for small data, make sure your network is deep enough to take advantage of this ensemble effect.

Autoencoders

Some success has been achieved by using stacked autoencoders to pre-train networks with more suitable initial weights. This can help you avoid poor local optima and other pitfalls of bad initialization. That said, Andrej Karpathy advises against getting too excited about unsupervised pre-training.

The basic idea of an autoencoder is to build a neural network that predicts its own input.
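A minimal PyTorch sketch of that idea is below; the layer sizes are illustrative assumptions, and the trained encoder would then be reused to initialize your supervised network.

    # Sketch of an autoencoder: the network reconstructs its own input.
    import torch.nn as nn

    class AutoEncoder(nn.Module):
        def __init__(self, input_dim=784):
            super().__init__()
            self.encoder = nn.Sequential(
                nn.Linear(input_dim, 256), nn.ReLU(),
                nn.Linear(256, 64), nn.ReLU(),
            )
            self.decoder = nn.Sequential(
                nn.Linear(64, 256), nn.ReLU(),
                nn.Linear(256, input_dim),
            )

        def forward(self, x):
            return self.decoder(self.encoder(x))

    # Train with a reconstruction loss, e.g. nn.MSELoss()(model(x), x),
    # then reuse model.encoder as the first layers of your classifier.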

Prior Knowledge

Last but not least, try to find ways to integrate domain-specific knowledge to guide the learning process. For example, in the paper “Human-level Concept Learning Through Probabilistic Program Induction,” the authors construct a model through probabilistic program induction that builds concepts from parts, using prior knowledge about how those parts combine. This surpassed the deep learning methods of the time and reached human-level performance.

You can also apply domain knowledge to limit the inputs to the network, reducing dimensionality or allowing a smaller network architecture.

I listed this option last because integrating prior knowledge is challenging and usually the most time-consuming.

Making Small Cool Again

Hopefully, this article has given you some ideas on how to leverage deep learning techniques when data is limited. In my experience, this issue isn’t discussed as much as it should be, yet it matters a great deal.

Many such problems exist today: data is very limited, and getting more is expensive or impossible, for example when studying rare diseases or educational outcomes. Finding ways to apply our best techniques, like deep learning, to these problems is very exciting!

Source: towardsdatascience.com/how-to-use-…