TOWARDS FASTER AND BETTER FEDERATED LEARNING: A FEATURE FUSION APPROACH

Abstract

This paper proposes a feature fusion approach, FedFusion, that accelerates convergence and improves the performance of federated learning.

Introduction

Many smart devices today rely on pre-trained models, which makes their on-device inference less personalized and less flexible. At the same time, these devices generate large amounts of useful private data that could improve model personalization. Federated learning, a distributed training paradigm that trains models directly on the device, addresses this problem: algorithms such as FedAvg effectively mitigate the privacy risks of exchanging raw data. However, later studies showed that federated learning still suffers from issues such as high computation and communication cost and limited model accuracy.

This paper proposes FedFusion, a feature fusion federated learning algorithm that integrates the features of the global and local models. Its three main contributions are:

  1. A feature fusion mechanism for federated learning.
  2. An effective and personalized way to fuse the local and global models.
  3. Experiments showing that the model outperforms the baselines in accuracy and generalization while reducing communication traffic by more than 60%.

Related Work

The main related work is FedAvg, the representative federated learning algorithm: a server broadcasts the global model, each client trains it locally on its own data, and the server averages the returned weights (weighted by local data size) to form the next global model.
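As a reference point, here is a minimal sketch of FedAvg's server-side aggregation step, assuming each client returns its locally trained weights together with its sample count (function and variable names are illustrative, not from the paper):

```python
import torch

def fedavg_aggregate(client_states, client_sizes):
    """Server-side FedAvg step: average the clients' model weights,
    weighted by each client's number of local training samples."""
    total = float(sum(client_sizes))
    global_state = {}
    for key in client_states[0]:
        global_state[key] = sum(
            state[key].float() * (n / total)
            for state, n in zip(client_states, client_sizes)
        )
    return global_state
```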

Methods

The method consists of two parts: the feature fusion modules and the FedFusion algorithm.

Feature Fusion Modules

In the figure, the blue features are two-channel feature maps extracted by the local model, and the gray features are two-channel feature maps extracted by the global model. The figure shows three feature fusion modes: conv, multi, and single.

Conv:

$$\mathcal{F}_{\text{conv}}(f_g, f_l) = w \ast [f_g, f_l]$$

where $w$ is a learnable weight matrix of shape $2C \times C$. Concretely, the global and local feature maps are concatenated along the channel dimension and then convolved with $w$.

Multi:

$$\mathcal{F}_{\text{multi}}(f_g, f_l) = \lambda_g \otimes f_g + \lambda_l \otimes f_l$$

The multiplication operation is a channel-wise weighted sum of the local and global feature maps, where $\lambda_g$ and $\lambda_l$ are learnable weight vectors with one entry per channel and $\otimes$ denotes channel-wise scaling.

Single:

$$\mathcal{F}_{\text{single}}(f_g, f_l) = \lambda f_g + (1 - \lambda) f_l$$

The addition operation is a weighted sum of the local and global feature maps using a single learnable scalar weight $\lambda$.
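The following PyTorch sketch implements the three fusion modes as described above; the module split, shapes, and initialization values are illustrative assumptions rather than the authors' exact implementation:

```python
import torch
import torch.nn as nn

class FeatureFusion(nn.Module):
    """Fuses global and local feature maps of shape (N, C, H, W)
    using one of the three modes described above."""
    def __init__(self, channels, mode="multi"):
        super().__init__()
        self.mode = mode
        if mode == "conv":
            # learnable 2C -> C weights applied to the concatenated maps
            self.fuse = nn.Conv2d(2 * channels, channels, kernel_size=1)
        elif mode == "multi":
            # one learnable weight per channel for each branch
            self.lam_g = nn.Parameter(torch.full((1, channels, 1, 1), 0.5))
            self.lam_l = nn.Parameter(torch.full((1, channels, 1, 1), 0.5))
        elif mode == "single":
            # a single learnable scalar weight
            self.lam = nn.Parameter(torch.tensor(0.5))

    def forward(self, f_global, f_local):
        if self.mode == "conv":
            return self.fuse(torch.cat([f_global, f_local], dim=1))
        if self.mode == "multi":
            return self.lam_g * f_global + self.lam_l * f_local
        return self.lam * f_global + (1 - self.lam) * f_local
```

For example, `FeatureFusion(64, mode="multi")` fuses two 64-channel feature maps; because `lam_g` and `lam_l` hold one weight per channel, training can steer individual channels toward the global or local branch.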

FedFusion

In each communication round, the client keeps the previous round's global model as a frozen feature extractor; its features are fused with the local model's features (via one of the modules above) during this round's local training.
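A hedged sketch of one client update under this scheme (the `features()`/`classify()` split of the model is a hypothetical interface introduced here for illustration, not the paper's API):

```python
import copy
import torch

def local_train(local_model, global_state, fusion, loader, epochs=1, lr=0.01):
    """One FedFusion-style client update: the previous round's global
    model is kept as a frozen feature extractor, and its features are
    fused with the local model's features before the classifier head."""
    global_model = copy.deepcopy(local_model)
    global_model.load_state_dict(global_state)
    global_model.eval()  # frozen: only provides features

    params = list(local_model.parameters()) + list(fusion.parameters())
    opt = torch.optim.SGD(params, lr=lr)
    loss_fn = torch.nn.CrossEntropyLoss()

    for _ in range(epochs):
        for x, y in loader:
            with torch.no_grad():
                f_g = global_model.features(x)  # hypothetical feature method
            f_l = local_model.features(x)
            logits = local_model.classify(fusion(f_g, f_l))
            loss = loss_fn(logits, y)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return local_model.state_dict(), fusion.state_dict()
```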

Experiments

Experimental Setup

Datasets: MNIST, CIFAR-10

Data partitioning methods:

  1. Artificial non-IID partition: each node contains only two classes (see the sketch after this list).
  2. User-specific non-IID partition: each node contains similar classes but with different distributions, similar to multi-task learning.
  3. IID partition.
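A minimal sketch of the first scheme, assuming integer class labels as in MNIST/CIFAR-10 (the names and sampling procedure are illustrative):

```python
import numpy as np

def two_class_partition(labels, num_nodes, classes_per_node=2, seed=0):
    """Artificial non-IID split: each node receives samples drawn from
    only `classes_per_node` label classes."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    classes = np.unique(labels)
    node_indices = []
    for _ in range(num_nodes):
        picked = rng.choice(classes, size=classes_per_node, replace=False)
        idx = np.flatnonzero(np.isin(labels, picked))
        node_indices.append(rng.permutation(idx))
    return node_indices
```

A disjoint variant would sort the data by label and deal out fixed-size shards to the nodes, as in the original FedAvg experiments.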

Artificial Non-IID Partition

Results are reported for two random artificial non-IID samplings. The multi fusion mode performs best; the conv mode converges slightly faster but its final accuracy falls short of multi; the no-fusion baseline (none) and the single mode perform only moderately.

The multi operation lets the model select the feature-map channels that are most useful for the local data and fuse them. The single operation, by contrast, uses one scalar and therefore cannot select specific channels of the feature map.

User-Specific Non-IID Partition

In terms of accuracy, FedFusion outperforms FedAvg, and the conv mode converges faster and reaches higher accuracy.

The figure above compares the communication traffic of FedAvg and FedFusion. According to the results, the conv fusion mode works best in the user-specific non-IID scenario: here the data categories are similar across nodes but their distributions differ, and conv integrates the feature maps of the local and global models more strongly, i.e., it transfers knowledge of the data distributions across nodes.

As for generalization, when new nodes join, FedFusion needs only just over 60 local epochs to fit the local data, i.e., it provides a better initialization than the other methods.

IID Partition

The authors argue that the IID setting must also be evaluated: a strategy whose effectiveness does not hold even under an IID distribution would be questionable.

The multi and conv fusion modes achieve better accuracy with minimal communication cost, and their final converged accuracy shows a clear improvement over the other methods.

A summary of the three fusion methods:

The multi operation makes the selection between local and global feature maps more flexible and more interpretable: each entry of the weight vector is the weight of the corresponding channel of the feature map. When the data categories differ sharply across nodes, multi can pick out the most effective feature maps for fusion. The conv operation is better at integrating the knowledge of the global and local models, so conv fusion is preferable when the nodes hold similar categories with different distributions. Single fusion brings only a slight improvement.

Conclusion

Fusing feature maps reduces communication traffic, improves model accuracy, and improves generalization to newly joined nodes.