Technical editor | SegmentFault 思否 (Beijing) | WeChat public account: SegmentFault

PaddleFL is an open-source federated learning framework based on PaddlePaddle. Researchers can easily replicate and compare different federated learning algorithms with PaddleFL. Developers can also benefit from PaddleFL, because it makes it easy to deploy federated learning systems on large distributed clusters.

PaddleFL provides many federated learning strategies and their applications in computer vision, natural language processing, recommendation algorithms, and more. In addition, PaddleFL will also provide applications of traditional machine learning training strategies, such as multi-task learning and transfer learning, in federated learning settings. Thanks to PaddlePaddle's large-scale distributed training capability and Kubernetes' elastic scheduling of training tasks, PaddleFL can be easily deployed on a full stack of open-source software.

Federated learning

Today, data is becoming more expensive, and it is difficult to share raw data across organizations. Federated learning aims to solve the problems of data isolation and secure sharing of knowledge derived from data among organizations. The concept of federated learning was introduced by researchers at Google.

PaddleFL overview

In PaddleFL, horizontal and vertical federated learning strategies are implemented according to this classification. PaddleFL will also provide example applications in areas such as natural language processing, computer vision, and recommendation algorithms.

Federated learning strategy

  • Vertical federated learning: logistic regression with PrivC, neural networks with third-party PrivC [5]
  • Horizontal federated learning: federated averaging, differential privacy
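At the core of horizontal federated learning is the federated averaging (FedAvg) update: the server combines model weights returned by the workers, weighted by how many local samples each worker trained on. A minimal sketch of that aggregation step, with illustrative names (not PaddleFL's actual API):

```python
import numpy as np

def fed_avg(worker_weights, sample_counts):
    """Average per-layer weights across workers, weighted by local sample count.

    worker_weights: list of per-worker weight lists (one ndarray per layer).
    sample_counts:  number of local training samples at each worker.
    """
    total = sum(sample_counts)
    n_layers = len(worker_weights[0])
    averaged = []
    for layer in range(n_layers):
        acc = np.zeros_like(worker_weights[0][layer], dtype=float)
        for weights, n in zip(worker_weights, sample_counts):
            acc += weights[layer] * (n / total)
        averaged.append(acc)
    return averaged
```

In a real deployment, the server would run this after each update cycle and broadcast the averaged weights back to the workers; a differential-privacy variant would additionally clip and add noise to each worker's update before averaging.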

Training strategy

  • Multi-task learning
  • Transfer learning
  • Active learning

PaddleFL framework design

In PaddleFL, the components used to define a federated learning task and a federated learning training job are as follows:

Compile time

  • FL-Strategy: Users can use FL-Strategy to define federated learning strategies, such as FedAvg [1].
  • User-Defined-Program: A PaddlePaddle program that defines the machine learning model structure and training strategy, such as multi-task learning.
  • Distributed-Config: In federated learning, the system is deployed in a distributed environment; the distributed training configuration defines the distributed training node information.
  • FL-Job-Generator: Given an FL-Strategy, a User-Defined-Program, and a Distributed-Config, the FL-Job-Generator generates FL-Jobs for the federated parameter server side and the worker side. The FL-Jobs are distributed to the organizations and the federated parameter servers for joint training.
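The compile-time flow above can be sketched as a simple pipeline: a strategy, a user program, and a distributed configuration go in, and per-role jobs come out. The class and function names below are illustrative assumptions, not PaddleFL's real API:

```python
from dataclasses import dataclass, field

@dataclass
class FLStrategy:
    name: str = "fed_avg"            # e.g. FedAvg [1]

@dataclass
class DistributedConfig:
    server_endpoints: list = field(default_factory=list)
    worker_num: int = 2

@dataclass
class FLJob:
    role: str                        # "server" or "worker"
    strategy: FLStrategy
    config: DistributedConfig

def generate_fl_jobs(strategy, user_program, dist_config):
    """Split one user-defined program into server-side and worker-side jobs."""
    server_jobs = [FLJob("server", strategy, dist_config)
                   for _ in dist_config.server_endpoints]
    worker_jobs = [FLJob("worker", strategy, dist_config)
                   for _ in range((dist_config.worker_num))]
    return server_jobs, worker_jobs
```

Each generated job would then be shipped to its organization or parameter server, which runs only its own side of the program during joint training.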

Runtime

  • FL-Server: A federated parameter server, running in the cloud or in a third-party cluster.
  • FL-Worker: Each organization participating in federated learning has one or more workers that communicate with the federated parameter server.
  • FL-Scheduler: Schedules workers during training and decides which workers may participate in training before each update cycle.

Installation guide and quick start

Refer to Quick Start.

Easy deployment with Kubernetes

kubectl apply -f ./paddle_fl/examples/k8s_deployment/master.yaml

Refer to the K8S deployment example

You can also refer to the K8S cluster application and kubectl installation guides to configure your own K8S cluster.

Performance test

GRU4Rec [9] introduces a recurrent neural network model for session-based recommendation. The PaddlePaddle implementation of GRU4Rec is available at https://github.com/PaddlePaddle/models/tree/develop/PaddleRec/gru4rec. For an example of training a GRU4Rec model with federated learning, refer to GRU4Rec in Federated Learning.
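The recurrent unit that GRU4Rec builds on is the GRU cell, which mixes the previous hidden state with a candidate state through update and reset gates. A minimal single-step sketch in NumPy, with illustrative weight names and shapes:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x, h, Wz, Uz, Wr, Ur, Wh, Uh):
    """One GRU step: x is the current input, h the previous hidden state."""
    z = sigmoid(x @ Wz + h @ Uz)               # update gate
    r = sigmoid(x @ Wr + h @ Ur)               # reset gate
    h_tilde = np.tanh(x @ Wh + (r * h) @ Uh)   # candidate hidden state
    return (1 - z) * h + z * h_tilde           # interpolate old and new state
```

In session-based recommendation, x would be the embedding of the item the user just clicked, and the hidden state summarizes the session so far; a softmax layer over h then scores candidate next items.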

Project address: https://github.com/PaddlePadd…