These notes contain my reading of the survey “Learning-Based Model Predictive Control: Toward Safe Learning in Control”, my understanding of the articles cited in its Section 5, and a brief overview and analysis of Safe Reinforcement Learning.

Introduction to different ideas and methods of Safe Reinforcement Learning (taking the 2021 TAC paper as an example)

The survey “Learning-Based Model Predictive Control: Toward Safe Learning in Control” covers three topics: 1. learning the system dynamics; 2. learning the controller design; 3. MPC for safe learning. The first two are routine operations. For the third, combined with the popular field of Safe Reinforcement Learning, the survey lists several lines of work. It is mainly by researchers from a lab at ETH Zurich and cites a great deal of the lab's own work.

Generally speaking, learning-based control algorithms such as reinforcement learning have made great progress on high-dimensional control problems. However, because of physical constraints on the system, most of this work cannot guarantee safety, especially during iterative learning. To address this problem, the 2011 article “Guaranteed Safe Online Learning of a Bounded System” proposed a safety framework that uses a model-based controller when necessary and a learning-based controller otherwise to optimize the loss function.
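
A minimal sketch of that switching logic (all function names here are hypothetical placeholders, not the paper's implementation):

```python
# Least-restrictive supervision: let the learning-based controller act
# while the state is certified safe, and fall back to the model-based
# safe controller otherwise.

def choose_action(x, learned_policy, safe_policy, is_inside_safe_set):
    """Only intervene when safety is at risk."""
    if is_inside_safe_set(x):
        return learned_policy(x)   # free to explore / optimize the loss
    return safe_policy(x)          # model-based controller keeps x safe
```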

MPC can be combined with the safety filter proposed in “A Predictive Safety Filter for Learning-Based Control of Constrained Systems” to transform a safety-critical dynamic system into a safe system, on top of which various learning-based controllers can be used directly. The traditional MPC algorithm considers the constraints and the cost function at the same time, i.e., it trades them off against each other. Especially for stochastic systems, if the constraints are enforced the cost may be high, and if the constraints are relaxed the cost decreases. Therefore, constraint satisfaction and cost optimization should be treated separately: the MPC algorithm is used to satisfy the constraints, and a learning-based method is used to optimize performance. MPC controllers are usually only rough approximations of the true stochastic optimal control problem; to address this, learning-based controllers such as stochastic policy search and approximate dynamic programming can be considered.
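
For reference, the generic finite-horizon MPC problem that couples cost and constraints in one program has the standard textbook form (not a formula from the cited paper):

$$
\min_{u_0,\dots,u_{N-1}} \; \sum_{k=0}^{N-1} \ell(x_k, u_k) + V_f(x_N)
\quad \text{s.t.} \quad x_{k+1} = f(x_k, u_k), \;\; x_k \in \mathcal{X}, \;\; u_k \in \mathcal{U}
$$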

2011 Guaranteed Safe Online Learning of a Bounded System

Many learning-based control methods, represented by reinforcement learning, have made great progress and can deal with complex, high-dimensional problems. However, most of these methods cannot guarantee safety under physical constraints, especially during the learning iterations. To solve this problem, safety frameworks emerged from control theory; this article appears to be the first to address the safe-learning problem.

Machine learning methods are widely used in autonomous robots. However, many results only describe the performance of the system in terms of accuracy and convergence rate, and little theoretical analysis exists to guarantee stability and robustness. As a result, many machine learning algorithms are limited to scenarios with low safety requirements. To address this, the paper uses reachability analysis, a technique for computing regions of the state space known as reachable sets. Any point in such a set is guaranteed to remain safe for some time into the future, despite disturbances. The article shows how reachability analysis combined with machine learning algorithms can be applied to a real scenario: a flying robot with a camera and a limited field of view trying to learn the dynamics of a ground vehicle.

The proposed algorithm is based on the reachability analysis method.
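
As a much-simplified illustration of the reachable-set idea (not the computation used in the paper), here is an interval-propagation sketch for a toy 1-D system with bounded disturbance; the system and all bounds are invented for illustration:

```python
import numpy as np

# Toy system: x_{k+1} = x_k + dt * (u_k + d_k) with |d_k| <= d_max.
# Propagate an interval [lo, hi] guaranteed to contain the true state.

def reachable_interval(lo, hi, u, d_max, dt, steps):
    """Forward-propagate worst-case bounds on the state."""
    for _ in range(steps):
        lo = lo + dt * (u - d_max)   # most negative disturbance
        hi = hi + dt * (u + d_max)   # most positive disturbance
    return lo, hi

# A state is certified safe for `steps` steps if the whole reachable
# interval stays inside the safe set [-1, 1].
lo, hi = reachable_interval(lo=0.0, hi=0.1, u=0.5, d_max=0.2, dt=0.1, steps=10)
print(lo, hi, -1.0 <= lo and hi <= 1.0)
```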

2017: Safe Reinforcement Learning Via Online Shielding

This paper puts forward two modes to ensure safety, distinguished by where the shield acts. In the first mode, the shield acts before the learning agent computes its action and provides it with a list of safe actions. In the second mode, the shield verifies whether the action is safe after the learning agent has computed it, and then corrects unsafe actions.
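
A minimal sketch of the two modes (all names here are hypothetical placeholders, not the paper's implementation):

```python
# Mode 1: the shield restricts the agent's choices up front.
def preemptive_shield(x, all_actions, is_safe):
    """Hand the agent only the list of actions that are safe in state x."""
    return [a for a in all_actions if is_safe(x, a)]

# Mode 2: the shield checks the agent's action after the fact.
def post_posed_shield(x, proposed_action, is_safe, fallback_action):
    """Keep the proposed action if safe, otherwise substitute a safe one."""
    if is_safe(x, proposed_action):
        return proposed_action
    return fallback_action(x)
```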

Safe reinforcement learning is defined in this paper as learning an optimal controller while logical safety conditions are satisfied during both training and execution.

As an example:

For a hot water storage tank, learn an energy-efficient controller that keeps the water above a specified temperature. There is a correlation between energy consumption and water level, but the relationship is unknown. Water can flow into and out of the tank, and the tank's volume is limited. This can be expressed with the following logical constraints:
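
The notes do not reproduce the original formulas; based on the description below, they plausibly take the following LTL-style form (a reconstruction, with $\ell_k$ the water level and $o_k$ the valve-open signal at step $k$; not the paper's exact notation):

$$
\square\,(\ell_k \ge \ell_{\min}), \qquad
\square\,(\ell_k \le \ell_{\max}), \qquad
\square\,(\neg o_{k-1} \wedge o_k \rightarrow o_{k+1}), \qquad
\square\,(o_{k-1} \wedge \neg o_k \rightarrow \neg o_{k+1})
$$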

The first and second constraints are physical constraints on the water level; the third and fourth require the valve to remain in its new state for two consecutive time steps after it is opened or closed.

The Shield method is constructed mainly through formal verification.

Safe Exploration of Nonlinear Systems: A Predictive Safety Filter for Reinforcement Learning

2018: An Online Approach to Active Set Invariance

Traditional methods of ensuring system safety require offline computation of a viable set, which is difficult and resource-intensive. This paper designs an optimal backup strategy for linear systems in combination with MPC. Safety is defined as follows: “a system is safe if it never leaves the safety set.”

A natural question is: “What is the best backup control law?” This article considers using MPC to optimize the control law over a future time horizon.

Algorithm: For a given linear system

1. According to the system dynamics, an ellipsoidal set is computed using optimal control.

2. Use MPC to solve an approximate optimization problem.
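
A minimal sketch of step 1 for a toy double integrator (the matrices and the level $c$ are invented for illustration; in practice $c$ must be chosen so that the state and input constraints hold inside the set):

```python
import numpy as np
from scipy.linalg import solve_discrete_are

# Solve the LQR problem and use its value function x' P x to define an
# ellipsoidal set {x : x' P x <= c}, which is invariant under the LQR
# feedback u = -K x for the nominal linear system.
A = np.array([[1.0, 0.1],
              [0.0, 1.0]])
B = np.array([[0.005],
              [0.1]])
Q = np.eye(2)
R = np.eye(1)

P = solve_discrete_are(A, B, Q, R)                  # Riccati solution
K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)   # LQR gain

c = 1.0  # level set chosen so constraints hold inside the ellipsoid
x = np.array([0.3, -0.2])
print("inside ellipsoid:", x @ P @ x <= c)
```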

2019: Probabilistic Model Predictive Safety Certification for Learning-Based Control

This paper proposes Probabilistic Model Predictive Safety Certification (PMPSC), which can be combined with any RL algorithm and provides provable safety guarantees. The safety argument connects the current system state to a terminal set of states via a stochastic tube, a formulation that allows recursive feasibility even under unbounded disturbances. The design step uses Bayesian inference together with recent results on probabilistic invariant sets. Finally, a numerical vehicle simulation shows that RL algorithms can be combined with the proposed framework to obtain safety certificates.
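
In this probabilistic setting, hard constraints are typically replaced by chance constraints of roughly the following standard form (notation mine, not copied from the paper):

$$
\Pr(x_k \in \mathcal{X}) \ge p_x, \qquad \Pr(u_k \in \mathcal{U}) \ge p_u, \qquad \forall k \ge 0
$$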

A Predictive Safety Filter for Learning-Based Control of Constrained Nonlinear Systems

This article points out that most RL algorithms do not consider state and input constraints, and proposes a solution: a predictive safety filter that transforms a constrained dynamic system into an unconstrained safe system, on top of which any RL algorithm can then be used. The predictive filter receives a proposed control input and, based on the current state, determines whether that input can be safely applied to the real system; otherwise, it is modified to satisfy the safety conditions. Safety is maintained by continuously updating the safety policy, mainly via the MPC algorithm, using a data-driven system model while accounting for independent uncertainties in state and input.
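
The core of such a filter is an MPC-like program that, at every step, minimally modifies the proposed input while guaranteeing a feasible safe backup trajectory. In a generic form consistent with the description above:

$$
\begin{aligned}
\min_{u_0,\dots,u_{N-1}} \quad & \| u_0 - u_L(x) \|^2 \\
\text{s.t.} \quad & x_0 = x, \quad x_{k+1} = f(x_k, u_k), \\
& x_k \in \mathcal{X}, \quad u_k \in \mathcal{U}, \quad x_N \in \mathcal{S}_f
\end{aligned}
$$

where $u_L(x)$ is the input proposed by the learning algorithm and $\mathcal{S}_f$ is a safe terminal set.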

Learning-based MPC algorithms attempt to combine the advantages of the two, as described in the 2020 review. However, designing such an algorithm is challenging, usually conservative, and requires a lot of expert knowledge. At the same time, this approach is limited to model-based methods. In addition, at each time step a finite-horizon optimal control problem must be solved to approximate the (possibly infinite-horizon) control problem.

Concept: this paper proposes the Predictive Safety Filter (PSF), a variant of Model Predictive Control. It transforms a highly nonlinear, safety-critical dynamic system into a safe system, after which any RL algorithm can be applied without needing to provide any safety certificate. Compared with the MPC algorithm, the PSF only verifies that the output of the RL controller is safe and, if not, modifies it as little as possible so that the system remains safe at all future times. This means the PSF only needs to ensure that the system is safe; it does not need to control it toward a specific objective function. The problem then turns into finding a safety filter instead of finding a desired controller; finding such a controller would require reasoning about objective functions and constraints together, which is complicated.

Contribution:

  1. For system dynamics given as a probabilistic model, a predictive safety filter is proposed based on the concepts of NMPC. It searches for a safe backup trajectory and, to guarantee safety at all future times, modifies the RL input according to this backup-trajectory search. MPC is often better suited than non-optimizing designs, such as Lyapunov-function or sliding-mode controllers, for solving an approximate optimal control problem online.
  2. The PSF algorithm provides an implicit representation of safe state and input pairs, approximating the maximal admissible sets of states and inputs using the MPC method over a finite time horizon.
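
As a concrete illustration, here is a minimal predictive-safety-filter step for a toy linear system using cvxpy (the matrices, horizon, constraints, and terminal set are all invented for illustration; the paper handles nonlinear probabilistic models):

```python
import cvxpy as cp
import numpy as np

# Find the input closest to the RL proposal u_rl that still admits a
# feasible backup trajectory satisfying all constraints.
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.005], [0.1]])
N = 20                      # filter horizon
x0 = np.array([0.5, 0.0])   # current state
u_rl = np.array([2.0])      # input proposed by the RL agent

x = cp.Variable((2, N + 1))
u = cp.Variable((1, N))
constraints = [x[:, 0] == x0]
for k in range(N):
    constraints += [x[:, k + 1] == A @ x[:, k] + B @ u[:, k],
                    cp.abs(u[:, k]) <= 1.0,        # input constraint
                    cp.abs(x[0, k + 1]) <= 1.0]    # state constraint
constraints += [x[:, N] == 0]  # crude terminal safe set: the origin

# Objective: modify the RL input as little as possible.
prob = cp.Problem(cp.Minimize(cp.sum_squares(u[:, 0] - u_rl)), constraints)
prob.solve()
u_safe = u[:, 0].value  # applied to the system instead of u_rl
print(u_safe)
```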

Introducing the simulation environments (the problems studied) and determining the problem to research

The most important step in finding a research direction is finding the object to be studied. After reading many articles, I have sorted out the simulation environments and system formulations they use.

“Guaranteed Safe Online Learning of a Bounded System”, a 2011 conference paper by Jeremy Gillula

This article considers the problem of a drone and a ground vehicle: a drone is used to track a target on the ground.

The UAV can be regarded as an observer with a fixed sampling interval.

The system dynamics of the ground vehicle are unknown; the aerial observer is controlled so as to track the ground vehicle while always keeping it within the observer's field of view.

A Probabilistic Approach to Model Predictive Control, CDC, 2013:

The article considers a double integrator describing a particle moving in a two-dimensional plane. It is a linear system, and random disturbances are also considered. The system state is 4-dimensional, consisting of the position and velocity along the two planar coordinates. The control objective is to drive the system to a state near the equilibrium.
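
Written out, a standard discrete-time double integrator of this kind is (generic textbook form with sampling time $T_s$; the paper's exact parameters may differ):

$$
x_{k+1} = \begin{bmatrix} I_2 & T_s I_2 \\ 0 & I_2 \end{bmatrix} x_k
+ \begin{bmatrix} \tfrac{T_s^2}{2} I_2 \\ T_s I_2 \end{bmatrix} u_k + w_k,
\qquad x_k = \begin{bmatrix} p_k \\ v_k \end{bmatrix} \in \mathbb{R}^4
$$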

System constraints are also considered, namely state constraints and input constraints.

“A Lyapunov-based Approach to Safe Reinforcement Learning”, posted on arXiv in 2018

This paper simulates a stochastic 2D grid-world motion-planning problem. An agent (a robot car) starts from a safe region, and its goal is to reach a given target location. At any moment the agent can move in four directions, and due to sensor and controller noise there is some probability that it instead moves randomly to a neighboring state. Fuel consumption is taken into account: one unit of fuel is consumed at each stage, and the payoff for reaching the target is 1000, so the agent should reach the target in as few steps as possible. On the way from the initial point to the target there are a certain number of obstacles that, for safety reasons, the agent must avoid. The objective of the agent is therefore to reach the target as quickly as possible while the number of obstacles encountered does not exceed a specific value.

A 25×25 grid world with 625 states is used.
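
A minimal sketch of this environment, assuming a simple slip model and the costs described above (details such as the slip probability are my assumptions, not the paper's):

```python
import numpy as np

rng = np.random.default_rng(0)
N = 25                                   # 25 x 25 grid, 625 states
ACTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}

def step(pos, action, obstacles, goal, slip_prob=0.1):
    """One transition: move as commanded, or slip to a random direction."""
    if rng.random() < slip_prob:
        action = rng.choice(list(ACTIONS))        # sensor/actuator noise
    dr, dc = ACTIONS[action]
    r = min(max(pos[0] + dr, 0), N - 1)
    c = min(max(pos[1] + dc, 0), N - 1)
    reward = -1.0                                 # fuel cost per stage
    if (r, c) == goal:
        reward += 1000.0                          # payoff for reaching target
    constraint_cost = 1.0 if (r, c) in obstacles else 0.0
    return (r, c), reward, constraint_cost

# Safe-RL objective: maximize reward while the cumulative constraint_cost
# (obstacle encounters) stays below a given threshold.
```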

CDC 2018 An Online Approach to Active Set Invariance

“Probabilistic Model Predictive Safety Certification for Learning-Based Control”, published in TAC in 2021

This paper considers an autonomous driving scenario in which an autonomous vehicle tracks a given trajectory, subject to road constraints (i.e., constraints on the system state) as well as input constraints.

“A Predictive Safety Filter for Learning-Based Control of Constrained Nonlinear Systems”, posted on arXiv in 2021

The first simulation in this article considers the classical pendulum swing-up control problem: starting from the downward position under input constraints, with the angle around the upward position additionally limited to a safe range. The transition model is obtained by linear Bayesian regression, considering Gaussian measurement noise.
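
A minimal sketch of linear Bayesian regression for such a transition model, assuming a Gaussian prior over the weights and known noise variance (the feature map and shapes are invented for illustration):

```python
import numpy as np

# Learn W in x_{k+1} ~ N(W' phi(x_k, u_k), sigma^2 I) with Gaussian prior.

def blr_posterior(Phi, Y, sigma2=0.01, prior_var=1.0):
    """Closed-form Gaussian posterior over the weights.

    Phi: (n_samples, n_features) feature matrix phi(x_k, u_k)
    Y:   (n_samples, n_outputs) observed next states x_{k+1}
    """
    d = Phi.shape[1]
    # Standard conjugate update: posterior precision and mean.
    S_inv = np.eye(d) / prior_var + Phi.T @ Phi / sigma2
    S = np.linalg.inv(S_inv)
    M = S @ Phi.T @ Y / sigma2   # (n_features, n_outputs) posterior mean
    return M, S

# Prediction at a new feature vector phi: mean M.T @ phi, with variance
# phi' S phi + sigma2 per output dimension -- the model uncertainty a
# safety filter can take into account.
```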

The second numerical example is the AscTec Hummingbird drone, a UAV environment simulated using the Bullet Physics SDK.

A two-layer control structure is used: a PD controller at the bottom layer and the proposed controller at the top layer. The state is 10-dimensional and the input is 3-dimensional.

The final control objective is to fly the UAV from a given initial position to a landing position, a given point in three-dimensional coordinates. Bayesian regression with a Gaussian prior and Gaussian noise is used.

Wabersich’s 2018 CDC paper “Linear Model Predictive Safety Certification for Learning-Based Control”

This paper considers a simple control problem in which part of the system dynamics is unknown. An approximate model with errors is obtained from data, and the LQR design method is applied to this approximate model. The final control objective is to regulate the system state to a given target point.
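
A minimal sketch of this pipeline, least-squares identification followed by LQR on the identified model (the toy system and all numbers are invented for illustration):

```python
import numpy as np
from scipy.linalg import solve_discrete_are

rng = np.random.default_rng(0)
A_true = np.array([[1.0, 0.1], [0.0, 1.0]])
B_true = np.array([[0.0], [0.1]])

# Collect transitions (x_k, u_k, x_{k+1}) under random excitation.
X, U, Xn = [], [], []
x = np.zeros(2)
for _ in range(200):
    u = rng.normal(size=1)
    xn = A_true @ x + B_true @ u + 0.01 * rng.normal(size=2)  # process noise
    X.append(x); U.append(u); Xn.append(xn)
    x = xn

Z = np.hstack([np.array(X), np.array(U)])        # regressors [x_k, u_k]
Theta, *_ = np.linalg.lstsq(Z, np.array(Xn), rcond=None)
A_hat, B_hat = Theta.T[:, :2], Theta.T[:, 2:]    # approximate model

# LQR design on the identified (error-prone) model.
P = solve_discrete_are(A_hat, B_hat, np.eye(2), np.eye(1))
K = np.linalg.solve(np.eye(1) + B_hat.T @ P @ B_hat, B_hat.T @ P @ A_hat)
# Control law u = -K x regulates the state toward the target (origin here).
```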

Yang Yongliang published an article entitled “Safe Reinforcement Learning with Static and Dynamic Event Generators” in IEEE Transactions on Neural Networks and Learning Systems in 2020.
