Fault tolerance

Fault-tolerance techniques are important in any framework: no one can guarantee that a system will be error-free 100% of the time, yet we want it to be available 100% of the time. In reality, software servers often fail or become unusable, and hardware fails too. In the telecom industry, which runs large amounts of equipment, equipment failure is “all too common”, and without good countermeasures it is impossible to provide reliable service. This is where the idea of “Let it Crash” came from: since faults cannot be prevented completely, the system must respond to them promptly. For example:

  1. The system needs to be fault-tolerant: it should remain available and keep operating when a failure occurs. A recoverable failure must not escalate into a catastrophic one.
  2. In some cases it is enough for the system's core functions to stay available. The failing part should then be isolated from the core of the system to prevent unexpected results.
  3. A partial failure must not bring down the entire system. A mechanism is needed to isolate specific failures so they can be handled afterwards.

At the end of this chapter, we’ll use a test framework based on Akka TestKit + ScalaTest to verify some fault-tolerance mechanisms of actors that can be confusing or hard to grasp intuitively. See: Akka in Practice: Test-Driven Development (TDD) – Juejin (juejin.cn)

Let it crash

Akka takes a separation-of-concerns approach and isolates the normal business flow from failure handling. An actor performs its own tasks in the normal flow and does not worry about what to do when it encounters an exception. Meanwhile, its supervisor stands ready in a separate recovery flow to handle any errors that might occur.

Where do supervisors come from? As mentioned in the first chapter of this column, Akka uses parental supervision: when Actor A creates Actor B, A takes on the responsibility of supervising B. The supervisor does not catch exceptions itself. It only makes a decision based on the cause of the crash, and the options are:

  1. Restart: the actor is recreated from its registered Props, and the new instance continues with the next message. The ActorRef stays the same.
  2. Resume: the same actor instance ignores the error and moves on to the next message.
  3. Stop: the actor is terminated and processes no further messages.
  4. Escalate: if the supervisor does not know what to do, it reports the problem to its own parent, which is in turn a supervisor.
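The following is a minimal pure-Scala model (not Akka code) that makes the difference between the four directives concrete. The names DirectiveModel, Cell, and decide are hypothetical. A Cell pairs a stable ref name with the id of the instance currently behind it, so you can see that Restart changes the instance but not the ref.

```scala
// Hypothetical model of the four supervisor directives; this is not Akka's API.
object DirectiveModel {
  sealed trait Directive
  case object Resume   extends Directive
  case object Restart  extends Directive
  case object Stop     extends Directive
  case object Escalate extends Directive

  // `ref` never changes; `instanceId` identifies the actor object currently behind it.
  final case class Cell(ref: String, instanceId: Int, alive: Boolean)

  // Returns the child's state after the decision, or None when the decision is
  // passed up to the supervisor's own parent.
  def decide(d: Directive, c: Cell): Option[Cell] = d match {
    case Resume   => Some(c)                                     // same instance keeps going
    case Restart  => Some(c.copy(instanceId = c.instanceId + 1)) // new instance, same ref
    case Stop     => Some(c.copy(alive = false))                 // ref no longer backed by an actor
    case Escalate => None                                        // the parent decides
  }
}
```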

These four options are discussed in detail in the supervision section below. In general, Akka offers the following “Let it Crash” features for building systems:

  1. Fault isolation: a supervisor can decide to stop an actor, removing it from the system altogether.
  2. Fault-tolerant structure: Akka can replace an actor instance without affecting other actors.
  3. Redundancy: one actor can be replaced by another. A supervisor can decide to weed out a failing actor and create an actor of a different type in its place.
  4. Reboot: a failing actor can be brought back to a clean state by restarting it.
  5. Component lifecycle: an actor is an active component that can be started, stopped, and restarted.
  6. Suspension: when an actor crashes, its mailbox is suspended until the supervisor makes a decision.
  7. Separation of concerns: normal message processing and error recovery are two orthogonal, separate flows.

Actor life cycle

There are three important events throughout the life cycle of an Actor: a start event, a stop event, and a restart event.

For each event, Actor provides hook methods. You can insert your own code into them, for example to rebuild specific state in a new actor instance, to reprocess a message that failed earlier, or to release resources when the actor stops. Although Akka calls the hooks asynchronously, the order in which they are called is guaranteed.

Start events

An actor is created with the actorOf method and starts automatically. A top-level actor is created with system.actorOf(), and it can then call context.actorOf() within its own context to create child actors, as illustrated in the supervision section below. The preStart method is called after the actor's constructor. Override preStart to perform any pre-initialization.

Stop events

The stop event is discussed before the restart event. A stop event means the actor has stopped and its life cycle has ended, so it happens only once. An actor can be stopped in three ways:

  1. Calling the stop method of an ActorSystem or ActorContext.
  2. Sending a PoisonPill to its ActorRef.
  3. Its supervisor applying the Stop directive.

Override the postStop hook to release valuable system resources before the actor is removed and recycled, or to save the actor's final state somewhere outside the actor for the rest of the system to use. Note that a stopped actor is detached from its ActorRef. Once the actor has fully stopped, all subsequent messages sent to that ActorRef become dead letters.
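The dead-letter behavior can be sketched with a tiny hypothetical model. ModelRef is an illustration, not Akka's ActorRef. Before the stop, messages reach the actor. After it, everything sent to the same ref ends up as a dead letter.

```scala
import scala.collection.mutable.ListBuffer

// Hypothetical model, not Akka's implementation: once the target has stopped,
// every message sent to its ref is diverted to a dead-letter log.
final class ModelRef(val name: String) {
  private var stopped = false
  val delivered: ListBuffer[Any]   = ListBuffer.empty // processed by the live actor
  val deadLetters: ListBuffer[Any] = ListBuffer.empty // sent after the stop

  def !(msg: Any): Unit =
    if (stopped) deadLetters += msg else delivered += msg

  // Stands in for context.stop / PoisonPill / the Stop directive.
  def stop(): Unit = { stopped = true }
}
```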

Restart events

When an actor is restarted, its ActorRef stays the same but the underlying instance is replaced (the ActorSystem can rebuild the actor from its Props object). A restart involves two actor instances, so the process is much more complex than the start and stop events. Actor provides two hooks for the restart event: preRestart and postRestart.

preRestart runs on the crashed instance before the new instance is built, and it can be used to store that instance's last state. reason is the exception that was thrown, and message holds the message that was being processed when the exception occurred (if any).

override def preRestart(reason: Throwable, message: Option[Any]): Unit = {
    super.preRestart(reason, message)
}

There are two points to note:

  1. The message that raised the exception (message) is discarded by default. This prevents an invalid message from repeatedly triggering restarts while every other message waits, a situation known as mailbox poisoning. You can choose to reprocess it, but only if you fully understand the consequences. One such case is when you can determine that the exception was caused by a transient external error rather than by the message itself.
  2. When overriding preRestart, you should generally call super.preRestart(reason, message). The default implementation stops the current actor's children (which runs their postStop) and calls the current actor's own postStop, so the resources they hold are cleaned up. This is not mandatory either, but if you skip it, resources are not reclaimed immediately and you must be fully aware of the consequences. For example, you might explicitly want the remaining children to finish their work before they are stopped.
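The effect of the first point can be seen in a small simulation. This is a hypothetical model; PoisonModel is not part of Akka. "poison" is a message that always crashes the actor, and each crash counts as one restart. Dropping the failing message, as Akka does by default, lets the mailbox drain. Re-enqueuing it keeps producing restarts until the step cap is reached. The cap (maxSteps) exists only so the model terminates.

```scala
import scala.collection.mutable

// Hypothetical model of mailbox poisoning; this is not Akka code.
object PoisonModel {
  /** Returns (successfully processed messages, number of restarts). */
  def run(msgs: List[String], reEnqueueFailed: Boolean, maxSteps: Int = 20): (List[String], Int) = {
    val mailbox  = mutable.Queue.empty[String] ++= msgs
    val done     = mutable.ListBuffer.empty[String]
    var restarts = 0
    var steps    = 0
    while (mailbox.nonEmpty && steps < maxSteps) {
      steps += 1
      val m = mailbox.dequeue()
      if (m == "poison") {
        restarts += 1                           // the actor crashes and is restarted
        if (reEnqueueFailed) mailbox.enqueue(m) // like `self ! message.get` in preRestart
      } else {
        done += m
      }
    }
    (done.toList, restarts)
  }
}
```

With the failing message dropped there is exactly one restart. With it re-enqueued, the restart count grows until the artificial cap stops the loop.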

Before it is stopped and destroyed, the old instance can “leave” information for the next instance through preRestart. For example:

override def preRestart(reason: Throwable, message: Option[Any]): Unit = {
    super.preRestart(reason, message)
    // This message will be processed by the next instance.
    self ! Map("port" -> 8080)
}

The message is sent via self to the end of the queue in the actor's own mailbox, so the next instance receives and processes it later. For example, if you insist on reprocessing the message that raised the exception, preRestart can repost it to the successor. The postRestart method is called on the new instance after its constructor has run. See Experiment 4 below.
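The sketch below models why this works, assuming only what is described above. Instance fields such as lastMsg are rebuilt by the successor's constructor. A message queued via self survives the restart and lands behind the messages already waiting. Instance and HandoffModel are hypothetical names used for illustration.

```scala
import scala.collection.mutable

// Hypothetical stand-in for an actor instance with one field, like TestedActor's lastMsg.
final class Instance(val id: Int) { var lastMsg: String = "ok" }

object HandoffModel {
  /** The old instance crashes: its preRestart appends a note to the mailbox, then a fresh instance is built. */
  def crashAndRestart(old: Instance, mailbox: mutable.Queue[String]): Instance = {
    mailbox.enqueue(s"ExInfo(crashed in instance ${old.id})") // like `self ! ExInfo(...)`
    new Instance(old.id + 1)                                  // constructor runs again: fields reset
  }
}
```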

The entire life cycle of an actor can be summarized in a single diagram:

Important: as mentioned earlier, if an overridden preRestart calls super.preRestart, the default implementation stops the actor's children (running their postStop) and calls the actor's own postStop to clean up resources. Similarly, if an overridden postRestart calls super.postRestart, the default implementation calls preStart on the new instance. In both cases these calls happen at the point where super is invoked. See Experiment 3 below.
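For one restart of an actor without children, the hook order can be written down as a pure function. This is a hypothetical trace model; LifecycleTrace is not an Akka API. It assumes the overridden hooks call super first, as Experiment 3 does. The #1 and #2 suffixes mark the old and the new instance.

```scala
// Hypothetical trace model of the classic hook order for one restart (no children).
// Default preRestart calls postStop; default postRestart calls preStart.
object LifecycleTrace {
  def trace(callSuper: Boolean): List[String] = {
    val start = List("constructor#1", "preStart#1")
    // Old instance: super.preRestart (if called first) logs postStop before the override's own log.
    val pre = (if (callSuper) List("postStop#1") else Nil) :+ "preRestart#1"
    // New instance: constructor, then postRestart; super.postRestart runs preStart first.
    val post = "constructor#2" :: ((if (callSuper) List("preStart#2") else Nil) :+ "postRestart#2")
    // Final stop, e.g. when the ActorSystem terminates.
    val stop = List("postStop#2")
    start ++ pre ++ post ++ stop
  }
}
```

With callSuper = true, this yields the sequence named in Experiment 3's test description.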

Monitoring

In Akka’s error handling, supervision and monitoring (watching) are two different concepts. They can be used together, and both are closely tied to the actor life cycle.

Akka Series (3): Monitoring and Fault Tolerance – Jianshu (jianshu.com)

Any actor that holds an ActorRef can start or stop watching it. No parent-child relationship is required. For example:

// Set up the monitoring relationship
context.watch(otherActorRef)
// Remove the monitoring relationship
context.unwatch(otherActorRef)

A watched actor can be stopped for one of the following reasons (each corresponds to a stop event in the actor life cycle):

  1. Its parent's supervisor strategy applies the Stop directive (see the supervision section below).
  2. It receives a PoisonPill message.
  3. Its parent stops it with context.stop().

In each of these cases, the watcher receives a Terminated(actorRef) message and can act on it. Note that a restart does not send this message to the watcher, because the ActorRef itself is unaffected. See Experiment 2 below.
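Here is a minimal model of these watch semantics, assuming the behavior described above. Watchers are notified only when the ref is really stopped, while a restart silently swaps the instance. Inbox and WatchedRef are hypothetical stand-ins for a watcher's mailbox and a watched ActorRef.

```scala
import scala.collection.mutable.ListBuffer

// Hypothetical watcher mailbox; plain class, so equality is by reference.
final class Inbox { val msgs: ListBuffer[String] = ListBuffer.empty }

// Hypothetical watched ref: only a real stop notifies watchers.
final class WatchedRef(val path: String) {
  private val watchers = ListBuffer.empty[Inbox]
  var instance: Int = 1                               // which instance currently backs the ref
  def watch(w: Inbox): Unit   = watchers += w         // like context.watch(ref)
  def unwatch(w: Inbox): Unit = watchers -= w         // like context.unwatch(ref)
  def restart(): Unit = instance += 1                 // new instance, nobody is notified
  def stop(): Unit    = watchers.foreach(_.msgs += s"Terminated($path)")
}
```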

Supervision

All user-created actors share a common ancestor called the user guardian, as shown in the figure.

Users create actors with either context.actorOf() or system.actorOf(). Actors created with system.actorOf() are called top-level actors. An ActorSystem usually has only one or very few of them.

All the actors in an Akka system form a parent-child family tree, and developers only need to focus on the user space. Parent and child actors form a supervision relationship naturally, with no explicit setup. For example, if Actor A creates Actor B in its own context, A is B's parent, which also makes A B's supervisor. The supervisor decides what to do when a child encounters an exception. It can also pass a problem it cannot handle up to its own supervisor.

Part I: Actor Architecture – Wang_WbQ’s blog (CSDN)
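Escalation along the parent chain can be modeled as a search for the nearest supervisor willing to decide. In this hypothetical sketch (EscalationModel is not Akka code), each supervisor returns Some(directive) to handle a failure or None to escalate. A None at the top of the chain corresponds to the failure reaching the guardian.

```scala
// Hypothetical model of escalation from the failing child's supervisor upward.
object EscalationModel {
  final case class Supervisor(name: String, handles: Throwable => Option[String])

  /** Walks the chain from the nearest supervisor upward; returns (who decided, directive). */
  def resolve(chain: List[Supervisor], failure: Throwable): Option[(String, String)] =
    chain.iterator
      .map(s => s.handles(failure).map(d => (s.name, d)))
      .collectFirst { case Some(hit) => hit }
}
```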

Predefined strategies

Even if a supervisor does not define a strategy explicitly, every actor uses the default strategy defaultStrategy, whose decider is shown in the source code below. Apart from the first three Akka-internal exceptions, the default strategy restarts the child an unlimited number of times on application exceptions, which can leave the system stuck in some cases.

Failures that no supervisor handles keep being escalated up toward the user guardian. But as the default decider shows, only system-level errors (Throwables that are not Exceptions, i.e. Errors) are actually escalated to the user guardian. Such an error means the program has hit a serious, unrecoverable problem, and the wise course is to shut down the entire system gracefully.

/**
 * When supervisorStrategy is not specified for an actor this
 * `Decider` is used by default in the supervisor strategy.
 * The child will be stopped when [[akka.actor.ActorInitializationException]],
 * [[akka.actor.ActorKilledException]], or [[akka.actor.DeathPactException]] is
 * thrown. It will be restarted for other `Exception` types.
 * The error is escalated if it's a `Throwable`, i.e. `Error`.
 */
final val defaultDecider: Decider = {
  case _: ActorInitializationException => Stop
  case _: ActorKilledException         => Stop
  case _: DeathPactException           => Stop
  case _: Exception                    => Restart
}
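The decider above can be mirrored in plain Scala to check which directive a given failure gets. The three exception classes here are hypothetical stand-ins that share the names of Akka's internal ones, and the directives are plain strings. The final case spells out the escalation that Akka applies to throwables the decider does not match.

```scala
// Plain-Scala mirror of the default decider; not Akka code.
object DefaultDeciderModel {
  // Hypothetical stand-ins for Akka's internal exception types.
  final class ActorInitializationException extends Exception
  final class ActorKilledException         extends Exception
  final class DeathPactException           extends Exception

  def decide(t: Throwable): String = t match {
    case _: ActorInitializationException => "Stop"
    case _: ActorKilledException         => "Stop"
    case _: DeathPactException           => "Stop"
    case _: Exception                    => "Restart"  // any other application exception
    case _                               => "Escalate" // Errors and other non-Exception Throwables
  }
}
```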

When one of a parent actor's children crashes at runtime, the parent can use one of two strategies:

  1. OneForOneStrategy: handle only the failed child. This suits children that perform independent tasks and share no resources.
  2. AllForOneStrategy: apply the decision to all children. This suits children that perform related tasks and share resources.
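The only difference between the two strategies is which children a directive applies to. Here is a hypothetical sketch; StrategyScope is an illustration, not Akka's API.

```scala
// Hypothetical model: which children does the directive apply to under each strategy?
object StrategyScope {
  sealed trait Kind
  case object OneForOne extends Kind
  case object AllForOne extends Kind

  def affected(kind: Kind, children: List[String], failed: String): List[String] = kind match {
    case OneForOne => children.filter(_ == failed) // only the crashed child
    case AllForOne => children                     // every sibling shares the decision
  }
}
```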

Akka also provides a built-in stopping strategy, SupervisorStrategy.stoppingStrategy, which stops any crashed child directly. It is a one-for-one strategy.

// The following two strategies are equivalent (define only one of them in an actor).
override def supervisorStrategy: SupervisorStrategy = OneForOneStrategy() { case _: Exception => Stop }

override def supervisorStrategy: SupervisorStrategy = SupervisorStrategy.stoppingStrategy

Custom strategies

Each supervisor can choose a strategy that fits its situation. For a crashed child, the supervisor has four directives:

  1. Resume: the cheapest and simplest option. The error is ignored and the same actor instance continues processing.
  2. Restart: the old actor instance is removed and replaced with a new one. The ActorRef stays the same, and the mailbox is suspended briefly until the restart completes.
  3. Stop: the child actor is stopped, and its ActorRef is permanently invalidated.
  4. Escalate: the supervisor passes the failure up to its own parent, which decides how to handle it.

Any exception the decider does not match is escalated (Escalate), up to the user guardian if no supervisor along the way handles it.

The Restart directive is tied to the actor's restart event. The maximum number of restarts can be set explicitly with the maxNrOfRetries parameter. Once the restarts exceed this limit, the child is stopped rather than the failure being escalated. withinTimeRange sets the time window in which restarts are counted. For example, the following strategy allows at most five restarts within 10 seconds. Beyond that, the child is stopped.

import scala.concurrent.duration._

override def supervisorStrategy: SupervisorStrategy =
    OneForOneStrategy(maxNrOfRetries = 5, withinTimeRange = 10.seconds) {
        case _: Exception => Restart
    }
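Conceptually, maxNrOfRetries and withinTimeRange form a restart budget. The sketch below is a simplified, hypothetical model; RestartBudget is not Akka's class, and Akka's internal bookkeeping may differ in detail. The window opens at the first failure, and restarts inside it are counted. A failure after the window has expired starts a fresh count.

```scala
// Simplified, hypothetical model of "at most maxNrOfRetries restarts within a time window".
final class RestartBudget(maxNrOfRetries: Int, windowMillis: Long) {
  private var windowStart = -1L
  private var count = 0

  /** true: answer this failure with Restart; false: the budget is spent, so stop the child. */
  def permit(nowMillis: Long): Boolean =
    if (windowStart < 0 || nowMillis - windowStart > windowMillis) {
      windowStart = nowMillis // open a fresh window at this failure
      count = 1
      true
    } else {
      count += 1
      count <= maxNrOfRetries
    }
}
```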

Without any limit on Restart, mailbox poisoning can trap the system in an endless restart loop, as shown in Experiment 5 below.

The ActorSystem always restarts actors as quickly as possible. Akka also offers an alternative, the “backoff restart”, which is more useful in some cases and is not built on the Restart directive. See BackoffSupervisor below.

After Resume or Restart, an actor by default continues with the next message. If the actor is terminated by the Stop directive, all further messages sent to it become dead letters.

Akka does not allow orphan actors. When an actor is stopped, its children are stopped first, from the bottom up. When it is restarted, the default preRestart stops its children as well.

BackoffSupervisor

This section is referenced from Classic Fault Tolerance • Akka Documentation.

The backoff restart strategy suits errors caused by occasional external events, such as a failed database connection that makes a write fail. In such cases it is wiser to let the actor wait a while before trying again than to restart it repeatedly in a near-blocking loop.

A BackoffSupervisor is an actor that receives the Props of the child it supervises at initialization, then creates and supervises that child. It acts as a message proxy for the child: the BackoffSupervisor receives messages from outside and forwards them, while the actual processing logic lives in the child created from those Props.

Instead of using the Restart mechanism, BackoffSupervisor stops the crashed child outright and starts a new instance after a suitable delay. As a result, postStop and preStart are triggered during a backoff restart, but preRestart and postRestart are not. Messages sent to the BackoffSupervisor while it waits to restart the child become dead letters.

BackoffSupervisor offers two trigger options, each suited to specific scenarios:

  1. Backoff.onFailure: a backoff restart is triggered when the child throws an exception. A normal stop, such as context.stop() or a PoisonPill, does not trigger one.
  2. Backoff.onStop: a backoff restart is triggered whenever the child stops, by any means. Use this option for actors that signal failure by stopping themselves rather than by throwing exceptions.

Outside of persistent actors, developers usually only need onFailure. The following code shows TestedActor05 supervised by a BackoffSupervisor:

import scala.concurrent.duration._
import akka.actor.{ActorRef, Props}
import akka.pattern.{Backoff, BackoffSupervisor}

val childProps: Props = Props[TestedActor05]
val supervisorProps: Props = BackoffSupervisor.props(
    Backoff.onFailure(
        childProps = childProps,    // Props of the supervised child
        childName = "child",        // name of the child actor
        minBackoff = 3.seconds,     // minimum backoff time
        maxBackoff = 30.seconds,    // maximum backoff time
        randomFactor = 0.2          // random jitter factor
    )
)
// ref is the ActorRef of the BackoffSupervisor;
// the actual processing happens in the internal child actor.
val ref: ActorRef = context.actorOf(supervisorProps, "backoff-supervisor")

The first backoff equals minBackoff, and each subsequent backoff doubles. In the code above, the backoff grows as 3, 6, 12, 24, 30 (seconds). The base delay never exceeds maxBackoff, and random jitter is then applied on top of it.
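The doubling-with-cap rule can be checked with a short function. It assumes that the base delay for the n-th restart (counting from 0) is min(maxBackoff, minBackoff * 2^n), which reproduces the 3, 6, 12, 24, 30 sequence. BackoffDelays is a hypothetical name, and jitter is left out here.

```scala
import scala.concurrent.duration._

// Hypothetical helper: exponential backoff without jitter, capped at maxBackoff.
object BackoffDelays {
  def baseDelay(restartCount: Int, minBackoff: FiniteDuration, maxBackoff: FiniteDuration): FiniteDuration = {
    val doubled = minBackoff.toMillis * math.pow(2, restartCount) // minBackoff * 2^n, in millis
    if (doubled >= maxBackoff.toMillis) maxBackoff else doubled.toLong.millis
  }
}
```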

Backoff algorithms are very common in the underlying mechanisms of computer networks. They usually add jitter (a random delay) to prevent repeated collisions, and BackoffSupervisor is no exception. Jitter prevents a large number of actors from restarting at the same moment and causing a sudden load spike on an external database, for example. The randomFactor parameter controls the amount of random jitter.
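Jitter can be sketched as a multiplicative factor on the base delay. This sketch assumes the factor is drawn uniformly from [1, 1 + randomFactor), so with randomFactor = 0.2 a 3 s delay lands somewhere between 3 s and 3.6 s. Jitter is a hypothetical helper, and the seeded Random only keeps the example reproducible.

```scala
import scala.util.Random

// Hypothetical helper: stretch the base delay by a random factor in [1, 1 + randomFactor).
object Jitter {
  def jittered(baseMillis: Long, randomFactor: Double, rnd: Random): Long =
    (baseMillis * (1.0 + rnd.nextDouble() * randomFactor)).toLong
}
```

Because each actor draws its own factor, actors that crashed at the same moment come back at slightly different times.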

Sometimes more configuration is needed. You may want to stop instead of backing off for certain errors, or reset the backoff time after the child has run normally for a while. These are configured with withSupervisorStrategy() and withAutoReset(), respectively:

val backoffSupervisor: Props = BackoffSupervisor.props(
    Backoff.onFailure(
        childProps = Props[TestedActor05],
        childName = "child-actor",
        minBackoff = 3.seconds,
        maxBackoff = 30.seconds,
        randomFactor = 0.2
    )
    // Reset the backoff time after the child has run normally for 10 s.
    .withAutoReset(10.seconds)
    // Attach a custom supervisor strategy: stop the child on FatalException.
    .withSupervisorStrategy(
        OneForOneStrategy() { case FatalException => Stop }
    )
)
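The effect of withAutoReset can be modeled by resetting the restart counter once the child has stayed up long enough. AutoResetBackoff is a hypothetical class, not Akka's implementation, and it leaves out jitter. With the parameters above, it shows why a crash after a long quiet period again waits about 3 s, as Experiment 6 observes.

```scala
import scala.concurrent.duration._

// Hypothetical model of an auto-resetting exponential backoff (no jitter).
final class AutoResetBackoff(minBackoff: FiniteDuration, maxBackoff: FiniteDuration, resetAfter: FiniteDuration) {
  private var restartCount = 0

  /** Called when the child crashes after running for `uptime`; returns the delay before the next start. */
  def onCrash(uptime: FiniteDuration): FiniteDuration = {
    if (uptime >= resetAfter) restartCount = 0    // ran long enough: forget earlier failures
    val factor = 1L << math.min(restartCount, 30) // 2^restartCount, capped to avoid overflow
    val delayMillis = math.min(maxBackoff.toMillis, minBackoff.toMillis * factor)
    restartCount += 1
    delayMillis.millis
  }
}
```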

See Experiment 6 for the complete test code.

In most real systems, actor hook methods, monitoring, and supervision are combined to build a sophisticated and effective fault-tolerance mechanism. The following unit tests cover everything in this section.
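The experiments rely on a small message protocol and a StopSystemAfterAll helper trait, neither of which the original shows. The definitions below are an assumed minimal version, inferred from how the messages are used. StopSystemAfterAll is presumably a ScalaTest mixin that terminates the ActorSystem after all tests, and it is not reproduced here.

```scala
// Assumed message protocol for the experiments (hypothetical; not shown in the original).
// Experiment 2 additionally sends akka.actor.SupervisorStrategy.Restart as a plain message.
case object NewActor      // ask a supervisor to create its child
case object ThrowEx       // make the child throw an ordinary Exception
case object ThrowFatalEx  // make the child throw FatalException (Experiment 6)
case object TestOk        // success signal forwarded to TestKit's testActor
case object TestFailed    // failure signal forwarded to TestKit's testActor
final case class ExInfo(exMessage: String) // crash information handed to the next instance
```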

Experiment 1: Receive Terminated by watching the child

The overall logic of the unit test:

  1. TestActor02Supervisor watches TestedActor02 and is also its supervisor (as mentioned earlier, supervision and monitoring do not conflict).
  2. Trigger a stop event.
  3. Verify that the supervisor receives a Terminated(child) message.

The event flow is annotated step by step in the comments. In this test case, TestActor02Supervisor is both the supervisor and the watcher.

import akka.actor._
import akka.actor.SupervisorStrategy._
import akka.testkit.TestKit
import org.scalatest.{MustMatchers, WordSpecLike}

class StopStrategyTest extends TestKit(ActorSystem("testSystem"))
with WordSpecLike
with MustMatchers
with StopSystemAfterAll {
    "The TestedActor02" must {
        "send 'Terminated' to its supervisor when it is broken." in {
            val ref: ActorRef = system.actorOf(Props(new TestActor02Supervisor(testActor)), "supervisor-01")
            // 1. Create the child
            ref ! NewActor
            // 3. Tell the supervisor to make the child throw
            ref ! ThrowEx
            // 9. Verify that the test succeeded
            expectMsg(TestOk)
        }
    }
}

class TestActor02Supervisor(out: ActorRef) extends Actor with ActorLogging {
    // 6. The supervisor handles the child's exception with the Stop directive
    override def supervisorStrategy: SupervisorStrategy = OneForOneStrategy() { case _: Exception => Stop }
    override def receive: Receive = {
        // 2. Create the child and watch it
        case NewActor =>
            val child = context.actorOf(Props[TestedActor02], "child-1")
            context.watch(child)
        // 4. Make the child throw an exception
        case ThrowEx => context.child("child-1").get ! ThrowEx
        // 8. On Terminated, report success to TestKit's testActor
        case Terminated(child) =>
            log.info("the child actor[{}] is terminated.", child.path)
            out ! TestOk
    }
}

class TestedActor02 extends Actor with ActorLogging {
    // 5. Throw an exception
    override def receive: Receive = { case ThrowEx => throw new Exception("Designed Exception") }
    // 7. postStop runs before the actor is destroyed; the watcher then receives Terminated
    override def postStop(): Unit = log.info("TestedActor02 will shut down.")
}

Experiment 2: Verify that the Restart directive does not trigger Terminated

The idea behind this unit test is simple: trigger a Restart event, and if the supervisor receives Terminated(), it sends a failure message to TestKit's testActor. To see the results more clearly, you can add side effects (such as log lines) to the hook methods of TestedActor04.

import akka.actor._
import akka.actor.SupervisorStrategy._
import akka.testkit.TestKit
import org.scalatest.{MustMatchers, WordSpecLike}

class TerminatedTest extends TestKit(ActorSystem("testSystem"))
with MustMatchers
with WordSpecLike
with StopSystemAfterAll {
    "The TestActor04supervisor" must {
        "get no message like `Terminated(child)`" in {
            val ref: ActorRef = system.actorOf(Props(new TestActor04Supervisor(testActor)), "supervisor-1")
            // 1. Create the child
            ref ! NewActor
            // 3. Tell the supervisor to make the child throw
            ref ! ThrowEx
            // testActor should receive nothing: the supervisor gets no Terminated() during a restart
            expectNoMsg()
        }
    }
}

class TestActor04Supervisor(out: ActorRef) extends Actor with ActorLogging {
    // 6. Catch the exception and restart the child
    override def supervisorStrategy: SupervisorStrategy = OneForOneStrategy() { case _: Exception => Restart }
    override def receive: Receive = {
        // 2. Create the child and watch it
        case NewActor =>
            val child: ActorRef = context.actorOf(Props[TestedActor04], "child-1")
            context.watch(child)
        // 4. Make the child throw an exception
        case ThrowEx => context.child("child-1").get ! ThrowEx
        // Receiving Terminated during a restart would be a bug, so report a failure to testActor
        case Terminated(child) =>
            log.info("the child {} was terminated.", child)
            out ! TestFailed
        // 11. The new instance reports the restart; the flow ends here
        case Restart => log.info("the child restarted.")
    }
}

class TestedActor04 extends Actor with ActorLogging {
    // 5. Throw an exception
    override def receive: Receive = { case ThrowEx => throw new Exception("Designed Exception.") }
    // 7. preRestart runs on the old instance
    override def preRestart(reason: Throwable, message: Option[Any]): Unit = {
        // super.preRestart stops the children and calls postStop (step 8)
        super.preRestart(reason, message)
        log.info("invoke preRestart")
    }
    // 8. postStop, called from super.preRestart
    override def postStop(): Unit = log.info("invoke postStop")
    // 10. After the restart, the new instance sends a Restart message to its supervisor
    override def postRestart(reason: Throwable): Unit = context.parent ! Restart
}

Experiment 3: Test the full Actor life cycle

The overall idea of the unit test:

  1. Override preStart, preRestart, postRestart, and postStop, inserting a small side effect (a log line) into each.
  2. Call super.preRestart and super.postRestart.
  3. Trigger a Restart event and observe the order of the log lines.

import akka.actor._
import akka.actor.SupervisorStrategy._
import akka.testkit.TestKit
import org.scalatest.{MustMatchers, WordSpecLike}

class LifecycleTest extends TestKit(ActorSystem("testSystem"))
with WordSpecLike
with MustMatchers
with StopSystemAfterAll {
    // Do not leave trailing spaces after this string, otherwise it causes a bug
    "The Tested Actor" must {
        "go through: <constructor>, preStart, postStop, preRestart, <constructor>, preStart, postRestart, postStop" in {
            val ref: ActorRef = system.actorOf(Props(new TestedActorSupervisor(testActor)), "supervisor-01")
            ref ! NewActor
            ref ! ThrowEx
        }
    }
}

class TestedActorSupervisor(out: ActorRef) extends Actor {
    override def receive: Receive = {
        case NewActor => context.actorOf(Props[TestedActor01], "child-01")
        case ThrowEx => context.child("child-01").get ! ThrowEx
    }
    override def supervisorStrategy: SupervisorStrategy = OneForOneStrategy() { case _: Exception => Restart }
}

class TestedActor01 extends Actor with ActorLogging {
    override def receive: Receive = {
        case ThrowEx => throw new Exception("Designed Exception")
    }
    // Statements in the class body run as part of the constructor.
    log.info("invoke constructor<TestedActor01>:{}", this.hashCode())

    override def preStart(): Unit = log.info("invoke preStart, the hashcode of this instance:{}", this.hashCode())
    override def preRestart(reason: Throwable, message: Option[Any]): Unit = {
        super.preRestart(reason, message)
        log.info("invoke preRestart, the hashcode of this instance:{}", this.hashCode())
    }
    override def postRestart(reason: Throwable): Unit = {
        super.postRestart(reason)
        log.info("invoke postRestart, the hashcode of this instance:{}", this.hashCode())
    }
    override def postStop(): Unit = log.info("invoke postStop, the hashcode of this instance:{}", this.hashCode())
}

To verify that the restarted actor is a different instance, each hook method logs this.hashCode. This unit test needs no assertions. Just watch the order of the log output, as described in the Actor life cycle section above.

Experiment 4: Send information via preRestart to the next instance

The general idea of the unit test is as follows:

  1. TestActorSupervisor makes TestedActor throw an exception and crash.
  2. This triggers a restart event.
  3. Before the old instance is removed, it records the crash time and sends it to the next instance through self.
  4. The next instance receives the crash time and records it in lastMsg.

Note that if supervisorStrategy is not overridden, the default strategy restarts a crashed actor an unlimited number of times.

import java.text.SimpleDateFormat
import java.util.Date
import akka.actor._
import akka.testkit.TestKit
import org.scalatest.{MustMatchers, WordSpecLike}

class RestartTest extends TestKit(ActorSystem("testSystem"))
with WordSpecLike
with MustMatchers
with StopSystemAfterAll {
    "A Tested Actor" must {
        "send crashTime to next Actor instance by `preRestart` method when it breaks." in {
            val ref: ActorRef = system.actorOf(Props(new TestActorSupervisor(testActor)))
            // 1. Create the child
            ref ! NewActor
            // 3. Tell the supervisor to make the child throw
            ref ! ThrowEx
            // 10. testActor receives TestOk: the test succeeds
            expectMsg(TestOk)
        }
    }
}

class TestActorSupervisor(out: ActorRef) extends Actor with ActorLogging {
    // No supervisorStrategy override: the default strategy restarts TestedActor automatically.
    // 6. The default strategy applies Restart
    // override def supervisorStrategy: SupervisorStrategy = OneForOneStrategy(){}...
    override def receive: Receive = {
        // 2. Create the child
        case NewActor => context.actorOf(Props[TestedActor], "child-1")
        // 4. Make the child throw an exception
        case ThrowEx => context.child("child-1").get ! ThrowEx
        // 9. Forward TestOk to TestKit's testActor
        case TestOk => out ! TestOk
    }
}

class TestedActor extends Actor with ActorLogging {
    var lastMsg: String = "ok"
    override def receive: Receive = {
        // 8. The new instance receives ExInfo, records it, and reports to its supervisor
        case ExInfo(exMessage) =>
            lastMsg = exMessage
            log.info(s"this actor last crashed on ${lastMsg}.")
            context.parent ! TestOk
        // 5. Throw an exception
        case ThrowEx => throw new Exception("Designed Exception")
    }
    // 7. Before the old instance is destroyed, it sends an ExInfo message to the next instance
    override def preRestart(reason: Throwable, message: Option[Any]): Unit = {
        super.preRestart(reason, message)
        val crashTime: String = new SimpleDateFormat("yyyy-MM-dd").format(new Date)
        self ! ExInfo(crashTime)
    }
}

Experiment 5: Reproduce mailbox poisoning

This unit test makes a small change to Experiment 4 to create a mailbox-poisoning scenario: during the restart, the old instance re-sends the ThrowEx message that crashed it to the next instance.

import akka.actor._
import akka.testkit.TestKit
import org.scalatest.{MustMatchers, WordSpecLike}

class PoisonMailboxTest extends TestKit(ActorSystem("testSystem"))
with MustMatchers
with WordSpecLike
with StopSystemAfterAll {
    "A TestActor03" must {
        "struggle in a loop of exceptions with the default strategy" in {
            val ref: ActorRef = system.actorOf(Props[TestActor03Supervisor])
            // 1. Create the child
            ref ! NewActor
            // 3. Tell the supervisor to make the child throw
            ref ! ThrowEx
            // No assertion here: put the main thread into the TIMED_WAITING state
            // and observe what the other threads do.
            Thread.sleep(10000)
        }
    }
}

class TestActor03Supervisor extends Actor with ActorLogging {
    // 6. The default strategy applies Restart
    override def receive: Receive = {
        // 2. Create the child
        case NewActor => context.actorOf(Props[TestedActor03], "child-1")
        // 4. Make the child throw an exception
        case ThrowEx => context.child("child-1").get ! ThrowEx
    }
}

class TestedActor03 extends Actor with ActorLogging {
    // 5. Throw an exception
    // 8. The re-sent message throws again, poisoning the mailbox: steps 6-8 repeat forever.
    override def receive: Receive = { case ThrowEx => throw new Exception("Designed Exception") }
    // 7. Pass the message that caused the exception to the next instance
    override def preRestart(reason: Throwable, message: Option[Any]): Unit = {
        super.preRestart(reason, message)
        log.info("send message to next instance:{}", message)
        self ! message.get
    }
}

As mentioned earlier, by default the system applies the Restart directive an unlimited number of times, so a system without restart limits risks getting stuck. Also handle message, the message that made the actor crash, with care, so that the restart does not fall into a loop of repeated exceptions.

Experiment 6: Test BackoffSupervisor

This unit test is the complete code from the BackoffSupervisor section above. Note the following:

  1. TestedActor05's preRestart and postRestart are never called, which shows that BackoffSupervisor's restart mechanism differs from the Restart directive.
  2. Because withAutoReset(10 seconds) is set, the log timestamps show that both backoff restarts happen after about 3 s.
  3. Because withSupervisorStrategy is also set, when the child throws FatalException, the BackoffSupervisor stops it instead of backing off and restarting it.

import scala.concurrent.duration._
import akka.actor._
import akka.actor.SupervisorStrategy._
import akka.pattern.{Backoff, BackoffSupervisor}
import akka.testkit.TestKit
import org.scalatest.{MustMatchers, WordSpecLike}

class BackoffSupervisorTest extends TestKit(ActorSystem("system"))
with WordSpecLike
with MustMatchers
with StopSystemAfterAll {
  "A BackoffSupervisor" must {
    "wait for a moment after crashing." in {

      val backoffSupervisor: Props = BackoffSupervisor.props(
        Backoff.onFailure(
          childProps = Props[TestedActor05],
          childName = "child-actor",
          minBackoff = 3.seconds,
          maxBackoff = 30.seconds,
          randomFactor = 0.2
        )
          .withAutoReset(10.seconds)
          .withSupervisorStrategy(
            OneForOneStrategy() { case FatalException => Stop })
      )
      // Declared as a top-level actor just for testing purposes.
      val ref: ActorRef = system.actorOf(backoffSupervisor, "backoff-supervisor")

      // Send a message to the BackoffSupervisor ref, which forwards it to the internal child actor.
      ref ! ThrowEx
      // Wait patiently for the BackoffSupervisor to reset the backoff time.
      Thread.sleep(15000)
      // Throw an exception again
      ref ! ThrowEx

      // Allow for jitter: wait slightly longer than 3 s.
      // Without the reset, the next delay would be 6 s (> 4 s), the child would not be
      // running yet, and the following message would become a dead letter.
      Thread.sleep(4000)
      ref ! ThrowFatalEx

      // Give the program some time to run
      Thread.sleep(10000)
    }
  }
}

class TestedActor05 extends Actor with ActorLogging {

  override def receive: Receive = {
    case ThrowEx => throw new Exception("Designed Exception")
    case ThrowFatalEx => throw FatalException
  }

  override def postStop(): Unit = log.info("TestedActor shuts down.")
  override def preStart(): Unit = log.info("TestActor starts.")

  // These two hook methods will not be called
  override def preRestart(reason: Throwable, message: Option[Any]): Unit = log.info("pre-Restarting.")
  override def postRestart(reason: Throwable): Unit = log.info("post-Restarting.")
}

object FatalException extends Exception("Fatal Exception")

References

Akka delayed restart – TEH TeH (CSDN blog)

Introduction to Akka Framework – Zhihu (zhihu.com)

22. Talking about Akka (2): supervision and monitoring – LLIANLIANpay (CSDN blog)