Sentinel: Traffic Defense for Distributed Systems

Sentinel is a flow control component for distributed service architectures. It takes traffic as the entry point and helps developers protect the stability of microservices along several dimensions: rate limiting, traffic shaping, circuit breaking and degradation, system load protection, and hotspot protection.

How Sentinel works

1. Architecture

ProcessorSlotChain is the core skeleton: different slots are linked in sequence (chain-of-responsibility pattern) to combine different functions (rate limiting, degradation, system protection). The slot chain has two parts: statistics collection and rule checking. The system creates one SlotChain for each resource.
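The chain-of-responsibility idea can be sketched with plain JDK classes. This is a minimal model rather than Sentinel's implementation: the class names (SketchSlot, SketchStatisticSlot, SketchRuleSlot, SketchSlotChain) and the rule "reject the resource named blocked" are assumptions made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model of a slot chain; all names here are hypothetical, not Sentinel's classes.
abstract class SketchSlot {
    private SketchSlot next;

    void setNext(SketchSlot next) { this.next = next; }

    // Each slot does its own work, then decides whether to pass the call on.
    abstract void entry(String resource, List<String> trace);

    // Hand the call to the next slot in the chain, if there is one.
    protected void fireEntry(String resource, List<String> trace) {
        if (next != null) {
            next.entry(resource, trace);
        }
    }
}

// Statistics part: records that the resource was seen, then continues.
class SketchStatisticSlot extends SketchSlot {
    @Override
    void entry(String resource, List<String> trace) {
        trace.add("statistic:" + resource);
        fireEntry(resource, trace);
    }
}

// Rule-checking part: rejects a resource named "blocked", otherwise continues.
class SketchRuleSlot extends SketchSlot {
    @Override
    void entry(String resource, List<String> trace) {
        if ("blocked".equals(resource)) {
            throw new IllegalStateException("rule check failed for " + resource);
        }
        trace.add("rule:" + resource);
        fireEntry(resource, trace);
    }
}

// The chain keeps a head (first) and a tail (end) pointer, like a singly linked list.
class SketchSlotChain {
    private final SketchSlot first = new SketchSlot() {
        @Override
        void entry(String resource, List<String> trace) {
            fireEntry(resource, trace);
        }
    };
    private SketchSlot end = first;

    void addLast(SketchSlot slot) {
        end.setNext(slot);
        end = slot;
    }

    // Run the chain for one resource and return the slots that were visited.
    List<String> entry(String resource) {
        List<String> trace = new ArrayList<>();
        first.entry(resource, trace);
        return trace;
    }
}
```

Adding a new function then means adding one more slot with addLast; the existing slots do not need to change.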

2. SPI mechanism

The execution order of the slots in Sentinel's slot chain is fixed by default, but it is not absolute. Sentinel exposes ProcessorSlot as an SPI interface, which makes the SlotChain extensible: users can add custom slots and arrange the order between them.

Code implementation

Extend AbstractLinkedProcessorSlot and set the position in the chain with @Spi(order = ...):

    @Spi(order = Constants.ORDER_FLOW_SLOT)
    public class FlowSlot extends AbstractLinkedProcessorSlot<DefaultNode> {
        // ...
    }
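How an order value on an annotation can decide the position of a slot is sketched below. The annotation SketchOrder, the slot classes, and the order values 100, 150, and 200 are all hypothetical; the sketch only assumes that smaller values run earlier and does not reproduce Sentinel's actual loader.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-in for an order-carrying annotation such as @Spi(order = ...).
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface SketchOrder {
    int value();
}

interface SketchProcessorSlot {
    String name();
}

@SketchOrder(100)
class SketchFlowSlot implements SketchProcessorSlot {
    public String name() { return "flow"; }
}

@SketchOrder(200)
class SketchDegradeSlot implements SketchProcessorSlot {
    public String name() { return "degrade"; }
}

// A user-defined slot placed between the two by picking an order value in between.
@SketchOrder(150)
class SketchCustomSlot implements SketchProcessorSlot {
    public String name() { return "custom"; }
}

final class SketchSlotSorter {
    // Read the order from the annotation; slots without one go last.
    static int orderOf(SketchProcessorSlot slot) {
        SketchOrder order = slot.getClass().getAnnotation(SketchOrder.class);
        return order == null ? Integer.MAX_VALUE : order.value();
    }

    // Return slot names sorted by ascending order value (smaller runs earlier).
    static List<String> sortedNames(List<SketchProcessorSlot> slots) {
        List<SketchProcessorSlot> copy = new ArrayList<>(slots);
        copy.sort(Comparator.comparingInt(SketchSlotSorter::orderOf));
        List<String> names = new ArrayList<>();
        for (SketchProcessorSlot slot : copy) {
            names.add(slot.name());
        }
        return names;
    }
}
```

Regardless of the order in which the slots are discovered, sorting by the annotation value yields flow, custom, degrade.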

3. Slots

Workflow: Sentinel's main workflow lives in the SphU.entry method. Through chained calls, a request passes through slots that build the call tree, store cluster node statistics, log exceptions, collect real-time statistics, and apply load protection, authority checks, flow control, and circuit breaking.

Call chain:

META-INF/services/com.alibaba.csp.sentinel.slotchain.ProcessorSlot

NodeSelectorSlot >>> ClusterBuilderSlot >>> LogSlot >>> StatisticSlot >>> ParamFlowSlot >>> SystemSlot >>> AuthoritySlot >>> FlowSlot >>> DegradeSlot

  • NodeSelectorSlot collects the call paths of resources and stores them in a tree structure, so that rate limiting and degradation can be applied per call path.
  • ClusterBuilderSlot stores a resource's statistics and caller information, such as RT, QPS, and thread count. This information is the basis for multi-dimensional rate limiting and degradation.
  • StatisticSlot records and aggregates runtime metrics along different dimensions.
  • ParamFlowSlot (hotspot flow control) applies flow control based on a resource's hotspot parameters.
  • SystemSlot (system rules) controls total inbound traffic based on system state such as load1. (Global traffic control for the current service.)
  • AuthoritySlot (authorization rules) applies blacklist and whitelist control based on the configured lists and the call origin. (Authorizes specific applications to access a resource.)
  • FlowSlot (flow control rules) controls traffic according to the preset flow rules and the statistics collected by the preceding slots. (Per-resource flow control.)
  • DegradeSlot (circuit breaking rules) performs circuit breaking and degradation based on statistics and preset rules. (Degradation for resource calls.)

4. Node

A tree structure

Class relationships

Entry (resource): holds the resource name, curNode (the current statistics node), and originNode (the origin statistics node). Its constructor updates the call chain, linking the new Entry into the call chain of the incoming Context.

Context: every resource operation must belong to a Context, which is passed along through a ThreadLocal. If none is specified, a default Context named sentinel_default_context is created. One Context lifecycle can cover multiple resource operations. The last resource in the lifecycle cleans up the Context in exit(), which marks the end of the Context lifecycle.
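The lifecycle described above (create a default context on first use, clean it up when the last resource exits) can be modelled with a ThreadLocal. This is only a sketch: SketchContext and its methods are hypothetical names, and the stack of resource names stands in for the real chain of Entry objects.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Simplified model of a per-thread context; names are hypothetical, not Sentinel's API.
final class SketchContext {
    static final String DEFAULT_NAME = "sentinel_default_context";
    private static final ThreadLocal<SketchContext> HOLDER = new ThreadLocal<>();

    final String name;
    // Resources currently entered in this context, innermost on top.
    private final Deque<String> entries = new ArrayDeque<>();

    private SketchContext(String name) { this.name = name; }

    // Return the current thread's context, or null if none is active.
    static SketchContext current() { return HOLDER.get(); }

    // Enter a resource; create the default context when the thread has none.
    static SketchContext enter(String resource) {
        SketchContext ctx = HOLDER.get();
        if (ctx == null) {
            ctx = new SketchContext(DEFAULT_NAME);
            HOLDER.set(ctx);
        }
        ctx.entries.push(resource);
        return ctx;
    }

    // Exit the innermost resource; the last exit clears the thread-local context.
    static void exit() {
        SketchContext ctx = HOLDER.get();
        if (ctx == null) {
            return;
        }
        ctx.entries.pop();
        if (ctx.entries.isEmpty()) {
            HOLDER.remove();
        }
    }

    // Number of resources currently entered in this context.
    int depth() { return entries.size(); }
}
```

Two nested enter calls on the same thread share one context, and the context disappears only after the second exit.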

Node | Description | Dimension | Created when | Note
ROOT | Root of the invocation tree | One per application | System startup |
EntranceNode | Entry node; the entry point for all call data of one Context (one request) | context | ContextUtil.enter | context
DefaultNode | Link node; collects data for a resource on a given call path | resource * context | In NodeSelectorSlot, created per context | set as curNode on the context
ClusterNode | Cluster node; collects global data for each resource | resource | In ClusterBuilderSlot, created per resourceName | set as clusterNode on the DefaultNode
StatisticNode | Statistics node; holds the second-level and minute-level sliding windows | resource * origin | The origin node is created per origin | set as originNode on the current Entry

Core source code

In Sentinel, each Entry (resource) has its own slot chain, which implements the overall traffic control.

Core classes:

  • SphU – Sentinel's static call entry point
  • CtSph – the actual call entry point
  • Context – the resource context. The same resource can be contained in different contexts
  • CtEntry – represents the actual resource
  • DefaultProcessorSlotChain – the default slot chain implementation
  • ProcessorSlot and subclasses – the different slot implementations

1. SentinelResourceAspect – entry point

Spring AOP: AspectJ pointcut (annotation-based example)

    @Aspect
    public class SentinelResourceAspect extends AbstractSentinelAspectSupport {

        // Pointcut: methods annotated with @SentinelResource
        @Pointcut("@annotation(com.alibaba.csp.sentinel.annotation.SentinelResource)")
        public void sentinelResourceAnnotationPointcut() {
        }

        // Around advice
        @Around("sentinelResourceAnnotationPointcut()")
        public Object invokeResourceWithSentinel(ProceedingJoinPoint pjp) throws Throwable {
            // ...
            String resourceName = getResourceName(annotation.value(), originMethod);
            EntryType entryType = annotation.entryType();
            int resourceType = annotation.resourceType();
            Entry entry = null;
            try {
                entry = SphU.entry(resourceName, resourceType, entryType, pjp.getArgs());
                // Call the target method
                return pjp.proceed();
            } catch (BlockException ex) {
                return handleBlockException(pjp, annotation, ex);
            } catch (Throwable ex) {
                // No fallback function can handle the exception, so throw it out.
                throw ex;
            } finally {
                if (entry != null) {
                    // The guarded call has finished: exit the resource
                    entry.exit(1, pjp.getArgs());
                }
            }
        }
    }
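The shape of this around advice (enter, run the target, handle a block, always exit) can be shown without Spring or Sentinel on the classpath. The sketch below is an assumption-laden model: SketchGuard, Gate, and BlockedException are hypothetical names standing in for the aspect, SphU, and BlockException.

```java
import java.util.function.Supplier;

// Simplified model of the guard pattern; every name here is hypothetical.
final class SketchGuard {
    // Signals that the gate refused the call (plays the role of a block exception).
    static class BlockedException extends Exception {
        BlockedException(String resource) {
            super("blocked: " + resource);
        }
    }

    // Minimal gate: enter may refuse the call, exit releases what enter acquired.
    interface Gate {
        void enter(String resource) throws BlockedException;
        void exit(String resource);
    }

    // Run target under the gate; use blockHandler when the gate refuses the call.
    static <T> T call(Gate gate, String resource, Supplier<T> target, Supplier<T> blockHandler) {
        boolean entered = false;
        try {
            gate.enter(resource);
            entered = true;
            return target.get();
        } catch (BlockedException ex) {
            return blockHandler.get();
        } finally {
            // Exit only if enter succeeded, mirroring the null check on the entry.
            if (entered) {
                gate.exit(resource);
            }
        }
    }
}
```

The finally block is what keeps the statistics balanced: a call that entered the gate always leaves it, whether the target returned normally or threw.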

2. Call entry

2.1 SphU – Static call entry

The call entry does five main things:

  • 1. Wrap the resource name and traffic type.
  • 2. Get the context from the current thread. If no context has been created yet, create one with the context name sentinel_default_context and an empty origin ("").
  • 3. Look up the rule-checking call chain. The configured rules are checked layer by layer; as soon as one rule fails, the chain ends early and throws the exception that corresponds to that rule.
  • 4. Create a traffic entry that holds the information about this call and set it as the context's curEntry.
  • 5. Start executing the rule-checking call chain.
    public static Entry entry(String name, int resourceType, EntryType trafficType, Object[] args)
        throws BlockException {
        // name: resource name; resourceType: resource type;
        // trafficType: whether the traffic is inbound or outbound (system rules apply only to inbound traffic);
        // args: parameters; batchCount: 1 by default (one request)
        return Env.sph.entryWithType(name, resourceType, trafficType, 1, args);
    }

2.2 CtSph – Actual call entry

    private Entry entryWithPriority(ResourceWrapper resourceWrapper, int count, boolean prioritized, Object... args)
        throws BlockException {
        // [Key point] Get the Context held by the current thread (ThreadLocal)
        Context context = ContextUtil.getContext();
        if (context instanceof NullContext) {
            // The number of contexts exceeds the threshold: return an entry without rule checking
            return new CtEntry(resourceWrapper, null, context);
        }
        if (context == null) {
            // Create the context with the default name (sentinel_default_context) and put it in the ThreadLocal
            context = InternalContextUtil.internalEnter(Constants.CONTEXT_DEFAULT_NAME);
        }
        // Global switch is off: return an entry without rule checking
        if (!Constants.ON) {
            return new CtEntry(resourceWrapper, null, context);
        }
        // Look up the rule-checking call chain
        ProcessorSlot<Object> chain = lookProcessChain(resourceWrapper);
        if (chain == null) {
            return new CtEntry(resourceWrapper, null, context);
        }
        Entry e = new CtEntry(resourceWrapper, chain, context);
        try {
            // Start rule checking
            chain.entry(context, resourceWrapper, null, count, prioritized, args);
        } catch (BlockException e1) {
            // Blocked: exit the entry and rethrow
            e.exit(count, args);
            throw e1;
        } catch (Throwable e1) {
            RecordLog.info("Sentinel unexpected exception", e1);
        }
        return e;
    }

3. Context – resource context

The same resource can be contained in different contexts. The context is used to collect resource invocation statistics such as QPS and RT.

    protected static Context trueEnter(String name, String origin) {
        // Try to get the context from the ThreadLocal
        Context context = contextHolder.get();
        if (context == null) {
            // Try the cache: key = context name, value = EntranceNode
            Map<String, DefaultNode> localCacheNameMap = contextNameNodeMap;
            // One EntranceNode per context name
            DefaultNode node = localCacheNameMap.get(name);
            if (node == null) {
                // The number of context names is limited (2000)
                if (localCacheNameMap.size() > Constants.MAX_CONTEXT_NAME_SIZE) {
                    setNullContext();
                    return NULL_CONTEXT;
                } else {
                    LOCK.lock();
                    try {
                        // Double check under the lock
                        node = contextNameNodeMap.get(name);
                        if (node == null) {
                            if (contextNameNodeMap.size() > Constants.MAX_CONTEXT_NAME_SIZE) {
                                setNullContext();
                                return NULL_CONTEXT;
                            } else {
                                // Create the EntranceNode
                                node = new EntranceNode(new StringResourceWrapper(name, EntryType.IN), null);
                                // Add entrance node: attach the new node to ROOT
                                Constants.ROOT.addChild(node);
                                // Write the new node to the cache map. Copy-on-write keeps iteration
                                // stable: readers of the shared map never see a partially updated map.
                                Map<String, DefaultNode> newMap = new HashMap<>(contextNameNodeMap.size() + 1);
                                newMap.putAll(contextNameNodeMap);
                                newMap.put(name, node);
                                contextNameNodeMap = newMap;
                            }
                        }
                    } finally {
                        LOCK.unlock();
                    }
                }
            }
            // Create the context for this entrance node
            context = new Context(node, name);
            // Initialize the context origin
            context.setOrigin(origin);
            // Write the context to the ThreadLocal
            contextHolder.set(context);
        }
        return context;
    }

4. DefaultProcessorSlotChain – default slot chain implementation

Singly linked list: by default one node is created, and two pointers (first and end) both point to it.

    AbstractLinkedProcessorSlot<?> first = new AbstractLinkedProcessorSlot<Object>() {
        @Override
        public void entry(Context context, ResourceWrapper resourceWrapper, Object t, int count,
                          boolean prioritized, Object... args) throws Throwable {
            super.fireEntry(context, resourceWrapper, t, count, prioritized, args);
        }

        @Override
        public void exit(Context context, ResourceWrapper resourceWrapper, int count, Object... args) {
            super.fireExit(context, resourceWrapper, count, args);
        }
    };
    AbstractLinkedProcessorSlot<?> end = first;

    @Override
    public void entry(Context context, ResourceWrapper resourceWrapper, Object t, int count,
                      boolean prioritized, Object... args) throws Throwable {
        // Move from first to the next node
        first.transformEntry(context, resourceWrapper, t, count, prioritized, args);
    }

    @Override
    public void addLast(AbstractLinkedProcessorSlot<?> protocolProcessor) {
        // The next node of end becomes the new node
        end.setNext(protocolProcessor);
        // end now points to the new node
        end = protocolProcessor;
    }

5. ProcessorSlot and subclasses – different slot implementations

META-INF/services/com.alibaba.csp.sentinel.slotchain.ProcessorSlot

Obtaining the SlotChain: the chain is looked up in a cache keyed by resource and created on first use.

    // CtSph
    ProcessorSlot<Object> lookProcessChain(ResourceWrapper resourceWrapper) {
        // Get the SlotChain from the cache (key = resource, value = ProcessorSlotChain)
        ProcessorSlotChain chain = chainMap.get(resourceWrapper);
        if (chain == null) {
            // Not cached: create the chain and store it in the cache
            synchronized (LOCK) {
                chain = chainMap.get(resourceWrapper);
                if (chain == null) {
                    // Entry size limit: once the cache holds the maximum number of chains,
                    // no new chain is created
                    if (chainMap.size() >= Constants.MAX_SLOT_CHAIN_SIZE) {
                        return null;
                    }
                    // [Key point] Create a new chain
                    chain = SlotChainProvider.newSlotChain();
                    Map<ResourceWrapper, ProcessorSlotChain> newMap = new HashMap<ResourceWrapper, ProcessorSlotChain>(
                        chainMap.size() + 1);
                    newMap.putAll(chainMap);
                    newMap.put(resourceWrapper, chain);
                    chainMap = newMap;
                }
            }
        }
        return chain;
    }
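The caching pattern used here (and in the context lookup) combines double-checked locking with a copy-on-write map. A minimal, self-contained sketch of that pattern follows; SketchChainCache, its string keys, and its constructor parameters are hypothetical and only illustrate the technique.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Simplified model of a per-resource cache; names and the limit are assumptions.
final class SketchChainCache<V> {
    private final int maxSize;
    private final Supplier<V> factory;
    private final Object lock = new Object();
    // Readers see either the old map or the new map, never a half-updated one.
    private volatile Map<String, V> cache = new HashMap<>();

    SketchChainCache(int maxSize, Supplier<V> factory) {
        this.maxSize = maxSize;
        this.factory = factory;
    }

    // Return the cached value for the key, creating it once; null when the cache is full.
    V lookup(String key) {
        V value = cache.get(key);          // first check, lock-free
        if (value == null) {
            synchronized (lock) {
                value = cache.get(key);    // second check, under the lock
                if (value == null) {
                    if (cache.size() >= maxSize) {
                        return null;       // the caller then skips rule checking
                    }
                    value = factory.get();
                    Map<String, V> copy = new HashMap<>(cache.size() + 1);
                    copy.putAll(cache);
                    copy.put(key, value);
                    cache = copy;          // publish the new map in one step
                }
            }
        }
        return value;
    }

    // Number of cached entries.
    int size() { return cache.size(); }
}
```

The trade-off is that every insertion copies the whole map, which is acceptable when insertions are rare (one per new resource) and reads are on the hot path.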

5.1 NodeSelectorSlot

Collects the call paths of resources and stores them in a tree structure, so that rate limiting and degradation can be applied per call path.

A DefaultNode is created for each context.

    @Override
    public void entry(Context context, ResourceWrapper resourceWrapper, Object obj, int count,
                      boolean prioritized, Object... args) throws Throwable {
        // Get the DefaultNode from the cache
        DefaultNode node = map.get(context.getName());
        if (node == null) {
            synchronized (this) {
                node = map.get(context.getName());
                if (node == null) {
                    // Create the DefaultNode
                    node = new DefaultNode(resourceWrapper, null);
                    HashMap<String, DefaultNode> cacheMap = new HashMap<String, DefaultNode>(map.size());
                    cacheMap.putAll(map);
                    cacheMap.put(context.getName(), node);
                    map = cacheMap;
                    // Add the new node to the invocation tree
                    ((DefaultNode) context.getLastNode()).addChild(node);
                }
            }
        }
        context.setCurNode(node);
        // Trigger the next node
        fireEntry(context, resourceWrapper, node, count, prioritized, args);
    }

5.2 StatisticSlot

Records and aggregates runtime metrics along different dimensions.

Note: StatisticSlot first calls all subsequent slots in the SlotChain to complete rule checking, and only then updates the statistics.

    @Override
    public void entry(Context context, ResourceWrapper resourceWrapper, DefaultNode node, int count,
                      boolean prioritized, Object... args) throws Throwable {
        try {
            // Pass the call on: all subsequent slots in the SlotChain run their rule checks.
            // (An exception such as BlockException may be thrown during execution.)
            fireEntry(context, resourceWrapper, node, count, prioritized, args);
            // All rules passed: add the thread count and the passed-request count (QPS)
            // to the DefaultNode. The sliding window is involved here.
            node.increaseThreadNum();
            node.addPassRequest(count);
            // ...
        }
        // ...
    }
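The sliding window behind calls such as addPassRequest can be pictured as a small ring of time buckets. The sketch below is a simplified model and not Sentinel's own window implementation: SketchSlidingWindow, the bucket layout, and passing the time in as a parameter are all assumptions chosen to keep the example deterministic.

```java
import java.util.Arrays;

// Simplified sliding-window counter; all names are hypothetical.
final class SketchSlidingWindow {
    private final int bucketCount;
    private final long bucketMs;
    private final long[] bucketStart;  // start time of the bucket stored in each slot
    private final long[] passCount;    // passed requests counted in each slot

    SketchSlidingWindow(int bucketCount, long bucketMs) {
        this.bucketCount = bucketCount;
        this.bucketMs = bucketMs;
        this.bucketStart = new long[bucketCount];
        this.passCount = new long[bucketCount];
        Arrays.fill(bucketStart, -1L); // -1 marks a slot that was never used
    }

    // Record passed requests at the given time (milliseconds, non-negative).
    synchronized void addPass(long nowMs, int count) {
        long start = nowMs - nowMs % bucketMs;
        int idx = (int) ((nowMs / bucketMs) % bucketCount);
        if (bucketStart[idx] != start) {
            // The slot holds an expired bucket: reset it for the new time span.
            bucketStart[idx] = start;
            passCount[idx] = 0;
        }
        passCount[idx] += count;
    }

    // Sum the buckets that still fall inside the window ending at nowMs.
    synchronized long totalPass(long nowMs) {
        long windowMs = bucketMs * bucketCount;
        long total = 0;
        for (int i = 0; i < bucketCount; i++) {
            if (bucketStart[i] >= 0 && nowMs - bucketStart[i] < windowMs) {
                total += passCount[i];
            }
        }
        return total;
    }
}
```

With two buckets of 500 ms each, the window covers one second, so totalPass approximates the QPS without the hard reset at every second boundary that a single fixed counter would have.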

5.3 FlowSlot

Controls traffic according to the preset flow rules and the statistics collected by the preceding slots.

    @Override
    public boolean canPass(Node node, int acquireCount, boolean prioritized) {
        // Get the QPS already counted in the current time window
        int curCount = avgUsedTokens(node);
        if (curCount + acquireCount > count) {
            // Prioritized request and the rule is in QPS mode: try to wait for a later window
            if (prioritized && grade == RuleConstant.FLOW_GRADE_QPS) {
                long currentTime;
                long waitInMs;
                currentTime = TimeUtil.currentTimeMillis();
                waitInMs = node.tryOccupyNext(currentTime, acquireCount, count);
                // OccupyTimeoutProperty.getOccupyTimeout() = 500 ms
                // Prioritized traffic may occupy tokens of a future window
                if (waitInMs < OccupyTimeoutProperty.getOccupyTimeout()) {
                    // Add to the occupied future QPS; this invokes
                    // OccupiableBucketLeapArray.addWaiting(long time, int acquireCount)
                    node.addWaitingRequest(currentTime + waitInMs, acquireCount);
                    node.addOccupiedPass(acquireCount);
                    sleep(waitInMs);
                    throw new PriorityWaitException(waitInMs);
                }
            }
            return false;
        }
        return true;
    }
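Stripped of the priority handling, the core of this check is a single comparison: the tokens already used plus the tokens requested must not exceed the threshold. A minimal sketch under stated assumptions follows. SketchQpsChecker is a hypothetical name, it uses a fixed one-second window with an injected clock, and unlike the code above it also increments the counter itself so that it can run on its own.

```java
import java.util.function.LongSupplier;

// Simplified QPS check over a fixed one-second window; names are hypothetical.
final class SketchQpsChecker {
    private final long threshold;       // maximum requests allowed per second
    private final LongSupplier clockMs; // injected clock, so the sketch is testable
    private long windowStart = -1;
    private long used;

    SketchQpsChecker(long threshold, LongSupplier clockMs) {
        this.threshold = threshold;
        this.clockMs = clockMs;
    }

    // Reject when used + acquireCount would exceed the threshold; otherwise count and pass.
    synchronized boolean canPass(int acquireCount) {
        long now = clockMs.getAsLong();
        long start = now - now % 1000;
        if (start != windowStart) {
            // A new second has begun: start counting from zero again.
            windowStart = start;
            used = 0;
        }
        if (used + acquireCount > threshold) {
            return false;
        }
        used += acquireCount;
        return true;
    }
}
```

With a threshold of 2, the third single-token request within the same second is rejected, and the budget is available again once the clock moves into the next second.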

5.5 DegradeSlot

Performs circuit breaking and degradation based on statistics and preset rules.

Note: the code below only shows the state changing from OPEN to HALF_OPEN and from HALF_OPEN back to OPEN, not from HALF_OPEN to CLOSED. That transition happens after the request executes normally: entry.exit() calls DegradeSlot.exit(), which changes the state.

    @Override
    public boolean tryPass(Context context) {
        // Template implementation.
        if (currentState.get() == State.CLOSED) {
            return true;
        }
        if (currentState.get() == State.OPEN) {
            // For half-open state we allow a request for probing.
            return retryTimeoutArrived() && fromOpenToHalfOpen(context);
        }
        return false;
    }

    protected boolean fromOpenToHalfOpen(Context context) {
        // Try to set the state from OPEN to HALF_OPEN
        if (currentState.compareAndSet(State.OPEN, State.HALF_OPEN)) {
            // State change notification
            notifyObservers(State.OPEN, State.HALF_OPEN, null);
            Entry entry = context.getCurEntry();
            entry.whenTerminate(new BiConsumer<Context, Entry>() {
                @Override
                public void accept(Context context, Entry entry) {
                    // If the probe request was blocked, reset the state to OPEN
                    if (entry.getBlockError() != null) {
                        // Fallback to OPEN due to detecting request is blocked
                        currentState.compareAndSet(State.HALF_OPEN, State.OPEN);
                        notifyObservers(State.HALF_OPEN, State.OPEN, 1.0d);
                    }
                }
            });
            return true;
        }
        return false;
    }
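The full cycle of the three states, including the HALF_OPEN to CLOSED step that happens on exit, can be sketched as a small state machine. This is a simplified model: SketchCircuitBreaker, its method names, and the rule "a successful probe closes the breaker, a failed probe re-opens it" are assumptions made for illustration, and time is passed in explicitly so the behaviour is deterministic.

```java
import java.util.concurrent.atomic.AtomicReference;

// Simplified circuit breaker state machine; names and the recovery rule are assumptions.
final class SketchCircuitBreaker {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private final AtomicReference<State> state = new AtomicReference<>(State.CLOSED);
    private final long retryTimeoutMs;
    private volatile long nextRetryAtMs;

    SketchCircuitBreaker(long retryTimeoutMs) {
        this.retryTimeoutMs = retryTimeoutMs;
    }

    State state() { return state.get(); }

    // Trip the breaker, for example when an error ratio has been exceeded.
    void open(long nowMs) {
        nextRetryAtMs = nowMs + retryTimeoutMs;
        state.set(State.OPEN);
    }

    // CLOSED lets everything through; OPEN lets one probe through after the timeout.
    boolean tryPass(long nowMs) {
        if (state.get() == State.CLOSED) {
            return true;
        }
        if (state.get() == State.OPEN) {
            // Only the caller that wins the compare-and-set becomes the probe.
            return nowMs >= nextRetryAtMs
                && state.compareAndSet(State.OPEN, State.HALF_OPEN);
        }
        return false; // HALF_OPEN: a probe is already in flight
    }

    // Called when the probe request finishes: success closes, failure re-opens.
    void onRequestComplete(long nowMs, boolean success) {
        if (state.get() != State.HALF_OPEN) {
            return;
        }
        if (success) {
            state.compareAndSet(State.HALF_OPEN, State.CLOSED);
        } else if (state.compareAndSet(State.HALF_OPEN, State.OPEN)) {
            nextRetryAtMs = nowMs + retryTimeoutMs;
        }
    }
}
```

Using compareAndSet for every transition means that when several threads arrive at the same moment, exactly one of them performs the transition and the others observe the new state.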

Sentinel Dashboard server source code

Initialization does three main things:

  • Load the implementation classes of com.alibaba.csp.sentinel.init.InitFunc through SPI;
  • Sort the loaded implementation classes;
  • Call the initialization methods of these implementation classes:
    1. CommandCenterInitFunc: gets the command center, does some preparatory work (registers the handlers for the Dashboard interfaces), and then creates a socket that listens on port 8719 (the port Sentinel uses to communicate with the client)
    2. HeartbeatSenderInitFunc: initializes the heartbeat task
    3. MetricCallbackInit: registers the extended entry and exit callback classes
    4. ParamFlowStatisticSlotCallbackInit: registers the entry and exit callback classes for parameter flow statistics
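The three steps (load, sort, call) can be sketched with the JDK's ServiceLoader. This is a hedged model rather than Sentinel's initializer: SketchInitFunc, its order() method, and SketchInitExecutor are hypothetical, and the sketch assumes that smaller order values run first.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.ServiceLoader;

// Hypothetical stand-in for an SPI-loaded initializer; not Sentinel's InitFunc interface.
interface SketchInitFunc {
    int order();                  // smaller values run first (an assumption of this sketch)
    void init(List<String> log);  // records its name instead of doing real work
}

final class SketchInitExecutor {
    // Step 1: load implementations through ServiceLoader
    // (the result is empty unless providers are registered under META-INF/services).
    static List<SketchInitFunc> load() {
        List<SketchInitFunc> found = new ArrayList<>();
        for (SketchInitFunc func : ServiceLoader.load(SketchInitFunc.class)) {
            found.add(func);
        }
        return found;
    }

    // Steps 2 and 3: sort the functions by order, then call init on each in turn.
    static List<String> runAll(List<SketchInitFunc> funcs) {
        List<SketchInitFunc> sorted = new ArrayList<>(funcs);
        sorted.sort(Comparator.comparingInt(SketchInitFunc::order));
        List<String> log = new ArrayList<>();
        for (SketchInitFunc func : sorted) {
            func.init(log);
        }
        return log;
    }

    // Small helper that builds an initializer for the example.
    static SketchInitFunc named(String name, int order) {
        return new SketchInitFunc() {
            @Override
            public int order() { return order; }

            @Override
            public void init(List<String> log) { log.add(name); }
        };
    }
}
```

Calling runAll with initializers registered in the order "second" (2), "first" (1) runs them as first, then second, which is the point of sorting before initialization.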
