This is the 7th day of my participation in the August More Text Challenge

Flink-BroadcastState

Concept

Use Broadcast State when you need to broadcast low-throughput events, such as configurations and rules, to all downstream tasks.

Broadcast State is a new feature introduced in Flink 1.5.

Each downstream task receives these configurations and rules, saves them as Broadcast State, and applies them when processing the elements of another data stream.

Scenarios

  1. Dynamically updating calculation rules: if an event stream must be evaluated against the latest rules, the rules can be broadcast to the downstream tasks as Broadcast State.

  2. Adding extra fields in real time: if basic user information must be attached to an event stream in real time, the user information can be broadcast to the downstream tasks as Broadcast State.
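
The first scenario can be sketched in plain Java, without the Flink API. This is only a toy model of a single downstream task: the class RuleTask and its methods onRule and onEvent are hypothetical names, and the HashMap stands in for the Broadcast State.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of ONE downstream task; plain Java, not the Flink API.
class RuleTask {
    // Stand-in for the Broadcast State: metric name -> threshold.
    private final Map<String, Integer> rules = new HashMap<>();
    private final List<String> alerts = new ArrayList<>();

    // Called for every element of the low-throughput rule stream.
    void onRule(String metric, int threshold) {
        rules.put(metric, threshold);
    }

    // Called for every element of the event stream; reads the latest rules.
    void onEvent(String metric, int value) {
        Integer threshold = rules.get(metric);
        if (threshold != null && value > threshold) {
            alerts.add(metric + "=" + value);
        }
    }

    List<String> alerts() {
        return alerts;
    }

    public static void main(String[] args) {
        RuleTask task = new RuleTask();
        task.onRule("cpu", 80);
        task.onEvent("cpu", 85); // above 80, so an alert is recorded
        task.onRule("cpu", 90);  // the rule is updated while the job runs
        task.onEvent("cpu", 85); // below 90, so no alert
        System.out.println(task.alerts()); // prints [cpu=85]
    }
}
```

The same event value produces different results before and after the rule update, which is the point of keeping the rules in state instead of hard-coding them.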

API

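  1. First, create a keyed or non-keyed DataStream.

  2. Then create a BroadcastStream by calling broadcast() with a MapStateDescriptor on the stream that carries the configurations or rules.

  3. Finally, connect the DataStream to the BroadcastStream with connect() and handle both inputs in process().

  4. With this wiring, the Broadcast State is broadcast to every task downstream of the DataStream.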
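
The effect of step 4 can be simulated in plain Java. The sketch below is not the Flink API; BroadcastFanOut, broadcast and process are hypothetical names, and routing by key hash is an assumption made only for illustration. It shows that a broadcast element reaches every parallel task, while a regular element reaches exactly one task and is processed against that task's local copy of the state.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java simulation of the wiring; not the Flink API.
class BroadcastFanOut {
    // One map per parallel task: each task keeps its own copy of the state.
    final List<Map<String, String>> taskStates = new ArrayList<>();

    BroadcastFanOut(int parallelism) {
        for (int i = 0; i < parallelism; i++) {
            taskStates.add(new HashMap<>());
        }
    }

    // A broadcast element is delivered to EVERY task.
    void broadcast(String key, String value) {
        for (Map<String, String> state : taskStates) {
            state.put(key, value);
        }
    }

    // A regular element goes to exactly ONE task (chosen here by key hash)
    // and is processed against that task's local copy.
    String process(String userId) {
        int task = Math.floorMod(userId.hashCode(), taskStates.size());
        String mode = taskStates.get(task).getOrDefault("mode", "unset");
        return "task" + task + ":" + userId + ":" + mode;
    }

    public static void main(String[] args) {
        BroadcastFanOut job = new BroadcastFanOut(3);
        job.broadcast("mode", "strict");
        // Whichever task handles each user, it sees mode=strict.
        System.out.println(job.process("alice"));
        System.out.println(job.process("bob"));
    }
}
```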

Keyed Stream

If the DataStream is a keyed stream, then after connecting it to the BroadcastStream, the function passed to process() must be an implementation of KeyedBroadcastProcessFunction.

The code looks like this:

public abstract class KeyedBroadcastProcessFunction<KS, IN1, IN2, OUT> extends BaseBroadcastProcessFunction {

    public abstract void processElement(final IN1 value, final ReadOnlyContext ctx, final Collector<OUT> out) throws Exception;

    public abstract void processBroadcastElement(final IN2 value, final Context ctx, final Collector<OUT> out) throws Exception;

}

The meanings of the parameters in the above generics are described as follows:

  • KS: the type of the key used when keyBy is called to partition the upstream, non-broadcast stream.

  • IN1: the type of the data records in the non-broadcast data stream.

  • IN2: the type of the data records in the broadcast stream.

  • OUT: the type of the data records emitted by the processElement() and processBroadcastElement() methods of KeyedBroadcastProcessFunction.

Non-Keyed Stream

If the DataStream is a non-keyed stream, then after connecting it to the BroadcastStream, the function passed to process() must be an implementation of BroadcastProcessFunction.

The code looks like this:

public abstract class BroadcastProcessFunction<IN1, IN2, OUT> extends BaseBroadcastProcessFunction {

    public abstract void processElement(final IN1 value, final ReadOnlyContext ctx, final Collector<OUT> out) throws Exception;

    public abstract void processBroadcastElement(final IN2 value, final Context ctx, final Collector<OUT> out) throws Exception;

}

The type parameters have the same meaning as the last three type parameters of KeyedBroadcastProcessFunction. Because keyBy is not called to partition the original stream, the KS parameter is not needed.
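
In both classes, processElement receives a ReadOnlyContext while processBroadcastElement receives a Context. The difference can be modeled in plain Java as a read-only view versus a mutable map. The sketch below is a stand-in, not the Flink API: TwoSidedFunction is a hypothetical class, and its method names only mirror the Flink callbacks. It also illustrates the user-enrichment scenario.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Plain-Java stand-in for the two callbacks; not the Flink classes.
class TwoSidedFunction {
    // Stand-in for the Broadcast State: user id -> city.
    private final Map<String, String> broadcastState = new HashMap<>();

    // Broadcast side: works on the mutable map, so it may update the state.
    void processBroadcastElement(String userId, String city) {
        broadcastState.put(userId, city);
    }

    // Non-broadcast side: works on a read-only view and only looks things up.
    String processElement(String userId) {
        Map<String, String> readOnly = Collections.unmodifiableMap(broadcastState);
        return userId + "@" + readOnly.getOrDefault(userId, "unknown");
    }

    // Returns true if a write through the read-only view is rejected.
    boolean writeFromElementSideFails() {
        try {
            Collections.unmodifiableMap(broadcastState).put("x", "y");
            return false;
        } catch (UnsupportedOperationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        TwoSidedFunction fn = new TwoSidedFunction();
        System.out.println(fn.processElement("u1")); // prints u1@unknown
        fn.processBroadcastElement("u1", "Berlin");
        System.out.println(fn.processElement("u1")); // prints u1@Berlin
        System.out.println(fn.writeFromElementSideFails()); // prints true
    }
}
```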

Key points

  1. Broadcast State is a map, that is, key-value state, and is described by a MapStateDescriptor.

  2. Broadcast State can only be modified on the broadcast side, that is, in the processBroadcastElement method of BroadcastProcessFunction or KeyedBroadcastProcessFunction. On the non-broadcast side, that is, in the processElement method, it is read-only.

  3. The order in which broadcast elements arrive may differ from task to task, so be careful with any processing that depends on arrival order.

  4. At checkpoint time, every task checkpoints its own copy of the Broadcast State.

  5. Broadcast State is kept in memory at runtime and cannot be stored in the RocksDB state backend.
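
Point 3 can be illustrated with a small plain-Java sketch. It assumes, purely for illustration, that each rule carries a version number; the OrderDemo and Rule classes are hypothetical and not part of Flink. If two tasks see the same updates in a different order, a "last arrival wins" update leaves them with different state, while an update that keeps the highest version does not depend on arrival order.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class OrderDemo {
    // Hypothetical versioned rule, used only for this illustration.
    static final class Rule {
        final String name;
        final int version;
        final int threshold;

        Rule(String name, int version, int threshold) {
            this.name = name;
            this.version = version;
            this.threshold = threshold;
        }
    }

    // Order-dependent: whichever rule arrives last overwrites the others.
    static Map<String, Rule> lastArrivalWins(List<Rule> arrivals) {
        Map<String, Rule> state = new HashMap<>();
        for (Rule r : arrivals) {
            state.put(r.name, r);
        }
        return state;
    }

    // Order-independent: the highest version is kept, whatever the order.
    static Map<String, Rule> highestVersionWins(List<Rule> arrivals) {
        Map<String, Rule> state = new HashMap<>();
        for (Rule r : arrivals) {
            state.merge(r.name, r, (old, neu) -> neu.version > old.version ? neu : old);
        }
        return state;
    }

    public static void main(String[] args) {
        Rule v1 = new Rule("cpu", 1, 80);
        Rule v2 = new Rule("cpu", 2, 90);
        List<Rule> taskA = List.of(v1, v2); // task A sees v1, then v2
        List<Rule> taskB = List.of(v2, v1); // task B sees v2, then v1

        // prints 90 vs 80: the two tasks disagree
        System.out.println(lastArrivalWins(taskA).get("cpu").threshold
                + " vs " + lastArrivalWins(taskB).get("cpu").threshold);
        // prints 90 vs 90: the two tasks agree
        System.out.println(highestVersionWins(taskA).get("cpu").threshold
                + " vs " + highestVersionWins(taskB).get("cpu").threshold);
    }
}
```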