Vertical federated learning: modeling scenarios

Fintech – Risk management for Small and Micro Business Credit

Pain points


The ideal is rich, but the reality is thin



Banks want detailed, comprehensive information about enterprises and their controlling persons

But in practice, banks usually have only central bank credit reports



As a result, banks lack a comprehensive understanding of their customers, and the data they do have is severely skewed


Federated learning-based solution

The bank partners with a billing company

The bank's central bank credit rating labels and attributes are modeled jointly with the billing company's invoice amounts from the last 3/6 months to predict overdue repayment (delinquency)
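Joint modeling first requires the bank and the billing company to find the enterprises they both serve without handing over their full customer lists. The following is a minimal sketch, assuming both sides use the same enterprise ID format and share a secret key agreed out of band (`blind_ids`, `align`, the key and the IDs are all hypothetical). Each party exchanges only keyed SHA-256 digests, and the intersection is computed on digests. This is the simplest form of the hashing approach listed among FATE's protocols later in this article. It is not a full private set intersection (PSI) protocol, since anyone holding the key can test guessed IDs, so production systems usually rely on stronger PSI protocols.

```python
import hashlib
import hmac


def blind_ids(ids, key: bytes):
    """Map keyed SHA-256 digests back to raw IDs; only the digests leave the party."""
    return {hmac.new(key, i.encode("utf-8"), hashlib.sha256).hexdigest(): i for i in ids}


def align(bank_ids, biller_ids, key: bytes):
    """Return, from the bank's side, the IDs that both parties hold.

    In a real exchange the billing company would send only its digest set;
    here both sides are simulated in one process.
    """
    bank_map = blind_ids(bank_ids, key)
    biller_digests = set(blind_ids(biller_ids, key))  # what the biller would send
    shared = bank_map.keys() & biller_digests
    return sorted(bank_map[d] for d in shared)


if __name__ == "__main__":
    key = b"agreed-out-of-band"  # hypothetical shared secret
    bank = ["E001", "E002", "E003", "E004"]
    biller = ["E002", "E004", "E005"]
    print(align(bank, biller, key))  # ['E002', 'E004']
```

Only the aligned enterprises then take part in joint training; the unmatched IDs never leave either party in raw form.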


Insurtech – Personalized pricing

Pain points

  • The insurance company's ideal: rich data

    • Accurate, personalized user profiles (hundreds of dimensions)
    • Comprehensive data coverage
  • The insurance company's reality: sparse data

    • Lack of comprehensive understanding of customers
    • The distribution of data is heavily skewed

Personalized insurance pricing based on vertical federated learning

Attributes such as age, occupation, car rental records and other labels, held by different parties, are modeled jointly with federated learning to predict the probability of risk and decide whether to underwrite
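At prediction time in a vertical setup, each party can score its own features and share only that partial result. The following is a minimal sketch, assuming a logistic-regression model whose weights are already split by party; the function names, the 0.5 threshold and the accept/decline rule are hypothetical. In this plaintext version, whoever sums the partial scores can see them; real deployments protect that step cryptographically.

```python
import math


def partial_score(weights, features):
    """Each party computes w·x over its own features; raw features stay local."""
    return sum(w * x for w, x in zip(weights, features))


def risk_probability(partial_scores, bias=0.0):
    """Combine the parties' partial scores into a logistic-regression risk probability."""
    z = bias + sum(partial_scores)
    return 1.0 / (1.0 + math.exp(-z))


def underwrite(prob, threshold=0.5):
    """Hypothetical rule: accept the risk only if the predicted probability is below threshold."""
    return prob < threshold


# Insurer's own features and a partner's features (age, occupation, rentals...), toy values.
insurer = partial_score([1.0], [0.5])
partner = partial_score([-1.0, 0.5], [1.0, 1.0])
p = risk_probability([insurer, partner])
print(p, underwrite(p))  # 0.5 False
```

The probability could also feed a pricing rule rather than a yes/no decision; the threshold here only illustrates the "decide whether to underwrite" step.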


Horizontal federated learning: modeling scenarios

WeBank (Weizhong) and a partner bank jointly build an anti-money laundering (AML) model, aiming to improve on their existing AML models


Setup

  • Y indicates whether a sample involves money laundering
  • The partner bank and WeBank each hold their own (X, Y)
  • Neither party exposes its (X, Y)

Problems of traditional modeling methods

Neither WeBank nor the partner bank has enough samples on its own


Expected results

  • A joint model is built while protecting privacy
  • The federated model outperforms models built on either party's data alone

Horizontal federated learning

Characteristics

  • All participants share the same features (including labels)
    • Viewed the traditional way, as a database table, the data is partitioned horizontally by rows (samples)
    • Every row contains the same features

  • Participants do not need to exchange data with each other
  • Typical algorithm: FedAvg (Federated Averaging)
  • Well suited to deep learning (deep neural networks)
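FedAvg has each participant train locally, starting from the current global model, and then averages the resulting parameters weighted by each participant's sample count. The following is a minimal sketch on a one-parameter linear model y ≈ w·x, assuming plaintext updates and synchronous rounds; the function names and toy data are hypothetical. Every client holds the same feature and label but different rows, which is the horizontal setting described above.

```python
def local_sgd(w, data, lr=0.01, epochs=5):
    """One client's local training: plain SGD on squared error for y ≈ w * x."""
    for _ in range(epochs):
        for x, y in data:
            grad = 2 * (w * x - y) * x
            w -= lr * grad
    return w


def fedavg_round(w_global, client_datasets, lr=0.01, epochs=5):
    """One FedAvg round: local training from w_global, then a sample-weighted average."""
    updates = [(local_sgd(w_global, d, lr, epochs), len(d)) for d in client_datasets]
    total = sum(n for _, n in updates)
    return sum(w * n for w, n in updates) / total


if __name__ == "__main__":
    # Toy data drawn from y = 3x, split by rows across two participants.
    client_a = [(1.0, 3.0), (2.0, 6.0)]
    client_b = [(1.0, 3.0), (3.0, 9.0), (0.5, 1.5)]
    w = 0.0
    for _ in range(20):
        w = fedavg_round(w, [client_a, client_b])
    print(round(w, 4))  # 3.0
```

Only the locally trained parameter and the sample count leave each client; the raw (x, y) rows stay where they are.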


Horizontal federated application scenarios

Security field


Computer vision services deployed at multiple sites: pedestrian detection, travel detection, area detection, equipment anomaly detection, helmet detection, flame detection, smoke detection


Pain points

  • Few labeled samples
  • Data is scattered, and centralized management is expensive
  • Model updates and feedback are fragmented and delayed

Federated Learning Solutions

  • Online model update and feedback
  • Centralized data upload is not required
  • Data stays protected, with strong privacy
  • Higher accuracy than models trained on local data alone
  • Creates a network effect that lowers the cost of long-tail applications and improves the overall profit margin of the vision business

Horizontal federated learning for healthcare big data

Pain points

  • Medical data is highly private, and data custodians strictly control how patient data is managed and used
  • Data is scattered, and no single institution has enough samples

Federated learning is a natural fit for medical big data

  • A secure data-sharing mechanism that effectively protects user privacy
  • Securely connects disparate data sources to build data models
  • Secure federated modeling is nearly lossless compared with centralized modeling

Multi-institution joint stroke prediction

A stroke probability prediction model built with federated learning

  • Three Grade A tertiary hospitals and two smaller hospitals
  • Patients' hospitalization process data and vital signs data

Results

  • The federated joint model outperforms a model trained independently on any single hospital's data
  • The federated model performs nearly as well as a model trained on centrally pooled data

Samples from each hospital


AUC of each hospital's independently trained model, of a model trained on all data pooled centrally, and of the federated model, compared side by side
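AUC is the probability that a model scores a randomly chosen positive example above a randomly chosen negative one, so the per-hospital, centralized and federated models can be compared on it regardless of how each was trained. The following is a minimal pairwise sketch (quadratic in the number of samples, ties counted as one half); the function name and inputs are hypothetical toy values, not the hospital data.

```python
def auc(labels, scores):
    """AUC as P(score of random positive > score of random negative), ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


print(auc([1, 0, 1, 0], [0.9, 0.1, 0.4, 0.5]))  # 0.75
```

Here three of the four positive/negative pairs are ranked correctly (0.4 falls below the negative scored 0.5), giving 0.75.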



Epoch: one complete pass of training over all the data in the training set

Related concepts:

Batch: a small subset of the training samples, used to perform one backpropagation update of the model weights

Iteration: one update of the model parameters using a single batch of data
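The three terms are related by simple arithmetic: with N training samples and batch size B, one epoch takes ⌈N/B⌉ iterations. The following is a minimal sketch; the function names are hypothetical, and it assumes the last, smaller batch is kept rather than dropped.

```python
import math


def batches(n_samples, batch_size):
    """Yield (start, end) index ranges covering the training set; the last batch may be smaller."""
    for start in range(0, n_samples, batch_size):
        yield start, min(start + batch_size, n_samples)


def count_iterations(n_samples, batch_size, epochs):
    """Each batch triggers one parameter update (one iteration); an epoch visits every batch once."""
    per_epoch = math.ceil(n_samples / batch_size)
    return per_epoch * epochs


print(list(batches(5, 2)))            # [(0, 2), (2, 4), (4, 5)]
print(count_iterations(1000, 64, 10))  # 160
```

For example, 1000 samples with batch size 64 give 16 iterations per epoch, so 10 epochs perform 160 parameter updates.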


FATE

  • An industrial-grade federated learning system
  • Helps multiple organizations use data and build joint models while complying with data security requirements and government regulations

Design principles

  • Supports a variety of mainstream algorithms (machine learning, deep learning, transfer learning), providing high-performance federated learning mechanisms
  • Supports multiple secure multi-party computation protocols, such as homomorphic encryption, secret sharing and hashing
  • Provides a user-friendly cross-domain interaction management solution that addresses information security auditing in federated learning
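Additively homomorphic encryption lets a party add encrypted values, and multiply them by plaintext constants, without decrypting them; this is how encrypted gradients or partial results can be aggregated in federated learning. The following is a toy sketch of textbook Paillier, one common additively homomorphic scheme. The class name is hypothetical, the tiny hard-coded primes make it completely insecure, and it is not FATE's implementation. It needs Python 3.8+ for `pow(x, -1, n)`, and it only handles non-negative integers below n.

```python
import math
import random


class ToyPaillier:
    """Textbook Paillier with tiny primes: insecure, for illustrating additive homomorphism only."""

    def __init__(self, p=293, q=433):
        self.n = p * q
        self.n2 = self.n * self.n
        self.g = self.n + 1                                      # common simplification g = n + 1
        self.lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
        self.mu = pow(self.lam, -1, self.n)                      # valid because g = n + 1

    def encrypt(self, m):
        if not 0 <= m < self.n:
            raise ValueError("plaintext must be in [0, n)")
        while True:                                              # random r coprime with n
            r = random.randrange(1, self.n)
            if math.gcd(r, self.n) == 1:
                break
        return (pow(self.g, m, self.n2) * pow(r, self.n, self.n2)) % self.n2

    def decrypt(self, c):
        x = pow(c, self.lam, self.n2)
        return ((x - 1) // self.n) * self.mu % self.n

    def add(self, c1, c2):
        """E(m1) * E(m2) mod n^2 decrypts to m1 + m2 (mod n)."""
        return (c1 * c2) % self.n2

    def mul_const(self, c, k):
        """E(m)^k mod n^2 decrypts to k * m (mod n)."""
        return pow(c, k, self.n2)


he = ToyPaillier()
a, b = he.encrypt(17), he.encrypt(25)
print(he.decrypt(he.add(a, b)))        # 42
print(he.decrypt(he.mul_const(a, 3)))  # 51
```

Whoever holds only the ciphertexts can compute `add` and `mul_const`; only the key holder can read the result.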

The technical architecture


Federated ML

Federated learning algorithms: Federated Feature Engineering, Federated Statistics, Federated LR, GBDT, DNN
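As a generic illustration of federated statistics (not FATE's own component), parties holding rows of the same feature can compute a global mean and variance by sharing only a count, a sum and a sum of squares. The following is a minimal sketch with hypothetical names; the summaries still reveal some information and would typically be combined through secure aggregation, and the sum-of-squares formula can lose precision for large values.

```python
from dataclasses import dataclass


@dataclass
class LocalSummary:
    count: int
    total: float
    total_sq: float


def summarize(values):
    """Computed locally by each party; only these three numbers are shared."""
    return LocalSummary(len(values), sum(values), sum(v * v for v in values))


def federated_mean_var(summaries):
    """Combine the parties' summaries into the global mean and population variance."""
    n = sum(s.count for s in summaries)
    mean = sum(s.total for s in summaries) / n
    var = sum(s.total_sq for s in summaries) / n - mean * mean
    return mean, var


print(federated_mean_var([summarize([1, 2, 3]), summarize([4, 5])]))  # (3.0, 2.0)
```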


FATE-Board

Federated modeling visualization:

A. Visualization of the federated modeling task lifecycle

B. Visualization of federated models and evaluation reports


FATE-Flow

End-to-end scheduling of federated modeling pipelines

A. Multi-task scheduling for federated modeling

B. Fault tolerance and automatic error recovery
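Automatic error recovery in a scheduler often starts with retrying failed tasks. The following is a minimal sketch of retry with exponential backoff; `run_with_retry` and its parameters are hypothetical and do not describe how FATE-Flow actually recovers tasks. The `sleep` function is injectable so the behavior can be tested without waiting.

```python
import time


def run_with_retry(task, max_retries=3, base_delay=1.0, sleep=time.sleep):
    """Call task(); on an exception, wait base_delay * 2**attempt and retry, up to max_retries times."""
    attempt = 0
    while True:
        try:
            return task()
        except Exception:
            if attempt >= max_retries:
                raise  # give up and surface the last error
            sleep(base_delay * 2 ** attempt)
            attempt += 1
```

A task that fails twice and then succeeds is retried after waits of 1.0 and 2.0 seconds with the defaults.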


FATE-Serving

Online inference service for production environments

A. Online model prediction

B. Online model management


FATE-Cloud Manager

Infrastructure for building and managing a data collaboration network

Multi-party alliance management


KubeFATE

Manages FATE workloads using cloud-native technology

Enables rapid deployment of FATE on Kubernetes (K8s)


End-to-end federated modeling pipeline scheduling and management

Federated learning pipelines defined as DAGs

  • Multi-party asymmetric pipeline DAGs
  • A general JSON-format DAG DSL and a DSL parser
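A JSON DAG DSL boils down to components plus their dependencies; a parser validates the references and derives an execution order. The following is a minimal sketch using a deliberately simplified schema (a flat `input` list per component). The module names are modeled loosely on FATE components, but the schema, `parse_dsl` and `execution_order` are hypothetical and not FATE's actual DSL format. It relies on `graphlib`, available since Python 3.9.

```python
import json
from graphlib import TopologicalSorter

PIPELINE_DSL = """
{
  "components": {
    "reader_0":       {"module": "Reader",       "input": []},
    "intersection_0": {"module": "Intersection", "input": ["reader_0"]},
    "hetero_lr_0":    {"module": "HeteroLR",     "input": ["intersection_0"]},
    "evaluation_0":   {"module": "Evaluation",   "input": ["hetero_lr_0"]}
  }
}
"""


def parse_dsl(text):
    """Parse the simplified DSL into {component: set of upstream components}, checking references."""
    components = json.loads(text)["components"]
    graph = {}
    for name, spec in components.items():
        deps = set(spec.get("input", []))
        unknown = deps - components.keys()
        if unknown:
            raise ValueError(f"{name} depends on undefined components: {sorted(unknown)}")
        graph[name] = deps
    return graph


def execution_order(graph):
    """Topologically sort the DAG; graphlib raises CycleError if the pipeline has a cycle."""
    return list(TopologicalSorter(graph).static_order())


print(execution_order(parse_dsl(PIPELINE_DSL)))
# ['reader_0', 'intersection_0', 'hetero_lr_0', 'evaluation_0']
```

A scheduler can dispatch components in this order, or run independent branches in parallel when the DAG is not a simple chain.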

Cooperative scheduling of federated tasks

  • Multi-task queues
  • Task distribution
  • State synchronization and other cooperative scheduling mechanisms
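State synchronization means turning every party's task status into one job-level view. The following is a minimal sketch with hypothetical names and a hypothetical aggregation rule; the party keys in the example use FATE's role names (guest, host, arbiter) only as labels.

```python
from enum import Enum


class Status(Enum):
    WAITING = "waiting"
    RUNNING = "running"
    SUCCESS = "success"
    FAILED = "failed"


def federated_status(party_statuses):
    """Derive one job-level status from every party's reported task status.

    Hypothetical rule: any failure fails the job, all successes complete it;
    otherwise the job is running if any party has started, else still waiting.
    """
    statuses = list(party_statuses.values())
    if not statuses:
        return Status.WAITING
    if any(s is Status.FAILED for s in statuses):
        return Status.FAILED
    if all(s is Status.SUCCESS for s in statuses):
        return Status.SUCCESS
    if any(s in (Status.RUNNING, Status.SUCCESS) for s in statuses):
        return Status.RUNNING
    return Status.WAITING


print(federated_status({"guest": Status.SUCCESS, "host": Status.RUNNING}))  # Status.RUNNING
```

Failing fast on any party's failure lets the scheduler stop the remaining parties instead of leaving them waiting on a peer that will never respond.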

Federated model management

  • Storage and retrieval, consistency, versioning and release management of federated models

Federated task lifecycle management

  • Starting and stopping tasks across parties; status checks

Real-time tracking of federated task inputs and outputs

  • Data, models, custom metrics and logs are recorded and stored in real time

Federated modeling pipeline scheduling and management


FATE-Serving: high-performance federated online inference service

Helps customers address complex model deployment and inefficient manual scaling of resources

  • High performance: based on the gRPC protocol, with batched federated requests and multi-level caching of federated participants' model results
  • High availability: stateless design with graceful degradation on failure
  • High elasticity: dynamic loading of models and data-processing apps
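Caching participants' results avoids repeating a remote federated call for the same model and sample. The following is a minimal sketch of one in-process cache level (LRU eviction plus a time-to-live). `TTLCache`, `host_score` and `fetch_remote` are hypothetical; in a multi-level design, a second, shared level (for example a distributed cache) would sit behind this one. The sketch treats `None` as a miss, so it assumes scores are never `None`.

```python
import time
from collections import OrderedDict


class TTLCache:
    """In-process LRU cache whose entries expire ttl seconds after insertion (first cache level)."""

    def __init__(self, capacity=1024, ttl=60.0, clock=time.monotonic):
        self.capacity, self.ttl, self.clock = capacity, ttl, clock
        self._data = OrderedDict()  # key -> (expiry_time, value)

    def get(self, key):
        item = self._data.get(key)
        if item is None:
            return None
        expiry, value = item
        if self.clock() >= expiry:
            del self._data[key]  # expired: drop and report a miss
            return None
        self._data.move_to_end(key)  # mark as most recently used
        return value

    def put(self, key, value):
        self._data[key] = (self.clock() + self.ttl, value)
        self._data.move_to_end(key)
        while len(self._data) > self.capacity:
            self._data.popitem(last=False)  # evict the least recently used entry


def host_score(cache, model_id, sample_id, fetch_remote):
    """Return the host party's partial score, calling fetch_remote (e.g. an RPC) only on a miss."""
    key = (model_id, sample_id)
    score = cache.get(key)
    if score is None:
        score = fetch_remote(model_id, sample_id)
        cache.put(key, score)
    return score
```

The injectable `clock` makes expiry testable; in production the default monotonic clock avoids problems when the wall clock is adjusted.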

Architecture diagram