Reprinted from: the first survey on domain generalization (168 papers), IJCAI 2021 | 2021-05-11 | Author: Wang Jindong

Editor’s note: In recent years, domain generalization in transfer learning has attracted increasing attention and found use in many fields. At IJCAI 2021, Wang Jindong, a researcher at Microsoft Research Asia, published the first survey paper in this area, summarizing the current state of domain generalization and its future directions.


Domain Generalization (DG) has become a very popular research direction in recent years. The task is to learn, from several training datasets (domains) with different data distributions, a model that generalizes well to unseen test domains.

Generalizing to Unseen Domains: A Survey on Domain Generalization is the first survey article on this topic. It covers 168 papers in total, 90 of which are directly related to domain generalization. The paper gives a detailed account of the domain generalization problem, covering its definition, theoretical analysis, a taxonomy of methods, datasets and applications, and future research directions.

The condensed version of this paper has been accepted by IJCAI 2021.

Link to article:

arXiv: arxiv.org/abs/2103.03…

PDF: arxiv.org/pdf/2103.03…

Affiliations: Microsoft Research Asia; Central University of Finance and Economics

Problem definition

The biggest difference between domain generalization and Domain Adaptation (DA) is that DA can access both source-domain and target-domain data during training (in unsupervised DA, only unlabeled target-domain data is available). In domain generalization, only data from several source domains is available for training, and the test data cannot be accessed at all. Domain generalization is therefore a more challenging but also more practical setting than domain adaptation: after all, we would all like sufficiently general machine learning models that can be trained once and applied everywhere.

For example, in the figure below, domain adaptation assumes that both the training set and the test set are accessible during training, while domain generalization only has the training set.

Figure 1: Example of domain generalization on the PACS dataset. The training data consist of sketches, cartoons, and art paintings. The goal of domain generalization is to learn a model that performs well on the unknown target domain.

A schematic diagram of the domain generalization problem, together with its formal definition, is shown below:

Figure 2: Schematic diagram of domain generalization
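Since the figure is not reproduced here, the standard formal definition can be written out. This is a reconstruction in the notation commonly used for this problem, not a verbatim copy of the figure:

```latex
% Given M labeled source domains with pairwise-distinct joint distributions:
%   S_train = { S^i = {(x_j^i, y_j^i)}_{j=1}^{n_i} }_{i=1}^{M},
%   with  P_{XY}^i \ne P_{XY}^j  for  i \ne j,
% the goal is a predictive function h that minimizes error on an
% unseen test domain whose distribution differs from every source:
\min_{h} \; \mathbb{E}_{(x,y)\sim P_{XY}^{\mathrm{test}}}
  \big[ \ell\big( h(x),\, y \big) \big],
\qquad P_{XY}^{\mathrm{test}} \ne P_{XY}^{i} \ \ \forall i \in \{1,\dots,M\}
```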

Domain generalization is closely related to domain adaptation, and it also shares similarities and differences with multi-task learning, transfer learning, meta-learning, lifelong learning, and other paradigms. We summarize the differences in the table below.

Table 1: Domain generalization versus other related learning paradigms

Theory

Starting from the theory of domain adaptation, we analyze the factors that govern cross-domain learning, such as the H-divergence and the HΔH-divergence, and then transition to the domain generalization problem to analyze what determines a model's ability to generalize to new domains. We summarize the important theoretical results on domain generalization, pointing out theoretical directions for future research.
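For readers without the survey at hand, the classical domain adaptation bound that this kind of analysis starts from can be stated as follows. This is a standard result (due to Ben-David et al.), written in common notation rather than quoted from the survey:

```latex
% Target error is bounded by source error, the HΔH-divergence between
% the two distributions, and the error of the best joint hypothesis:
\epsilon_T(h) \;\le\; \epsilon_S(h)
  \;+\; \tfrac{1}{2}\, d_{\mathcal{H}\Delta\mathcal{H}}(\mathcal{D}_S, \mathcal{D}_T)
  \;+\; \lambda,
\qquad
\lambda = \min_{h' \in \mathcal{H}} \big[ \epsilon_S(h') + \epsilon_T(h') \big]
```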

Please refer to part 3 of the original article for detailed results.

Methods

Domain generalization methods are the core of the survey. We divide existing methods into three categories: data manipulation, representation learning, and learning strategies, as shown in the figure below.

Figure 3: Classification of domain generalization methods

Specifically:

  • Data manipulation refers to enriching the training data by augmenting and transforming it. This category includes data augmentation and data generation.
  • Representation learning refers to learning domain-invariant representations so that the model adapts well to different domains. Invariant feature learning comprises four strands: kernel methods, explicit feature alignment, domain-adversarial training, and Invariant Risk Minimization (IRM). Feature disentanglement shares the goal of domain-invariant feature learning but learns in a different way, so we introduce it as a separate category.
  • Learning strategies refer to bringing mature learning paradigms from machine learning into multi-domain training to make the model more generalizable. This part mainly covers ensemble-learning-based and meta-learning-based methods. We also introduce other approaches, such as applying self-supervised learning to domain generalization.
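As a minimal illustrative sketch of the data-manipulation category, one popular trick is to mix examples drawn from two different source domains (inter-domain Mixup). The function name and signature below are my own, not from the survey:

```python
import numpy as np

def mixup_domains(x_a, y_a, x_b, y_b, alpha=0.2, rng=None):
    """Convexly combine inputs (and soft labels) from two different
    source domains -- a common data-augmentation strategy in DG."""
    rng = rng if rng is not None else np.random.default_rng(0)
    lam = rng.beta(alpha, alpha)          # mixing coefficient in [0, 1]
    x = lam * x_a + (1.0 - lam) * x_b     # mixed input
    y = lam * y_a + (1.0 - lam) * y_b     # mixed (soft) label
    return x, y, lam
```

The mixed pairs interpolate between domains, so the model never sees samples drawn from a single source distribution only.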

In this paper, we introduce and summarize each method in detail.
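Among the invariant-feature-learning objectives, IRM has a particularly compact form. Below is a minimal sketch for the simplest case per the IRMv1 formulation: squared loss, a scalar dummy classifier fixed at w = 1, and one-dimensional features. The function names are mine, and the analytic gradient replaces automatic differentiation for brevity:

```python
import numpy as np

def irm_penalty(phi, y):
    """IRMv1 gradient penalty for one environment:
    || d/dw mean((w*phi - y)^2) at w=1 ||^2, computed analytically."""
    grad = np.mean(2.0 * (phi - y) * phi)   # derivative of the risk at w = 1
    return grad ** 2

def irm_objective(envs, lam=1.0):
    """Sum of per-environment risks plus lam-weighted invariance penalties.
    `envs` is a list of (phi, y) array pairs, one pair per training domain."""
    risk = sum(np.mean((phi - y) ** 2) for phi, y in envs)
    penalty = sum(irm_penalty(phi, y) for phi, y in envs)
    return risk + lam * penalty
```

A representation phi whose optimal classifier is the same (here, w = 1) in every environment drives all penalties to zero, which is exactly the invariance IRM seeks.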

Applications and datasets

Domain generalization has been widely applied in many fields. Most existing work focuses on designing better domain generalization methods and is therefore usually evaluated on image classification benchmarks. Beyond that, domain generalization has also been applied to person re-identification, semantic segmentation, street view recognition, video understanding, and other mainstream computer vision tasks.

In particular, domain generalization is widely used in health care, such as Parkinson’s disease recognition, tissue segmentation, chest X-ray recognition, and tremor detection.

In natural language processing, domain generalization has been used for sentiment analysis, semantic parsing, web page classification, and other applications.

Domain generalization is also widely used in reinforcement learning, automatic control, fault detection, speech detection, physics, brain-computer interface and other fields.

The following table lists the standard datasets popular in domain generalization research.

Table 2: Eight mainstream datasets for domain generalization
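These benchmarks are typically evaluated with a leave-one-domain-out protocol: each domain in turn is held out as the unseen test domain. A minimal sketch (domain names follow PACS; the helper function is hypothetical):

```python
def leave_one_domain_out(domains):
    """Yield (training_domains, held_out_domain) splits: each domain in
    turn becomes the unseen test domain, the rest form the training set."""
    for held_out in domains:
        yield [d for d in domains if d != held_out], held_out

# PACS has four domains: photo, art painting, cartoon, and sketch.
pacs = ["photo", "art_painting", "cartoon", "sketch"]
for train, test in leave_one_domain_out(pacs):
    print(train, "->", test)
```

Reported numbers are usually the accuracy on each held-out domain plus their average over all splits.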

Future challenges

We see the following open directions for domain generalization:

  • Continual domain generalization: a deployed system should be able to generalize and adapt continually, whereas current methods are applied once, offline.
  • Domain generalization to new categories: current work assumes that all domains share the same label space; future work needs to extend to differing or even novel categories.
  • Interpretable domain generalization: although disentanglement-based methods have made progress on interpretability, the other broad classes of methods remain weak in this respect; their interpretability needs further study.
  • Large-scale pre-training and domain generalization: large-scale pre-training (e.g., BERT) has become mainstream, so a natural question is how domain generalization methods can further improve the generalization ability of these pre-trained models across different problems.
  • Evaluation of domain generalization: some work has shown empirically that existing domain generalization methods are not significantly better than empirical risk minimization, but those studies use the simplest classification tasks. We believe domain generalization shows its full value in specific settings, such as person re-identification. Future work should identify application scenarios better suited to the domain generalization problem.