• Expressive Code for State Machines in C++
  • Jonathan Boccara
  • The Nuggets translation Project
  • Permanent link to this article: github.com/xitu/gold-m…
  • Translator: zh1an
  • Proofreader: TodayCoder001, PingHGao


This is a guest post by Valentin Tolmer. Valentin is a software engineer at Google, where he tries to improve the quality of the code around him. He got hooked on template programming at a young age and now works exclusively on metaprogramming. You can find some of his work on GitHub, in particular the ProtEnc library this article is about.

Have you ever come across comments like this?

// Important: do not call this before calling SetUp()!

Or do something like this:

if (my_field_.empty()) abort();

These are all symptoms of a protocol that our code must respect. Sometimes you are following an explicit protocol that comes with state requirements, as in an SSL handshake or some other piece of business logic. Or maybe you have an explicit state machine in your code, and every transition has to be checked against a list of allowed transitions.

Let’s look at how we can handle this expressively.

Example: building an HTTP connection

Our example today is building an HTTP connection. To simplify things considerably, let’s say that our connection requires at least one header (and possibly more), exactly one body, and that the headers must be specified before the body (for example, because we are writing into an append-only data structure for performance reasons).

Note: this specific problem could be solved with a constructor that takes the right parameters, but I didn’t want to overcomplicate the protocol. You’ll see how easy it is to extend.

Here is a first implementation:

#include <string>
#include <utility>
#include <vector>

class HttpConnectionBuilder {
 public:
  void add_header(std::string header) {
    headers_.emplace_back(std::move(header));
  }
  // Important: call this only after at least one call to add_header.
  void add_body(std::string body) {
    body_ = std::move(body);
  }
  // Important: call this only after add_body.
  // Consumes the object.
  HttpConnection build() && {
    return {std::move(headers_), std::move(body_)};
  }
 private:
  std::vector<std::string> headers_;
  std::string body_;
};

So far this example is fairly simple, but it relies on users not doing the wrong thing: if they haven’t read the documentation, nothing stops them from adding another header after the body. Put this in a 1000-line file and you’ll quickly see how bad it gets. Worse, nothing checks that the class is used correctly, so the only way to notice misuse is through its unintended effects. If it causes memory corruption, good luck debugging it.

We can do better…

Using a dynamic enum

In general, such a protocol can be described by a finite state machine. It starts in a state where no header has been added (START), in which the only option is to add a header. That takes us to a state with at least one header (HEADER), from which we can either add another header and stay in the same state, or add a body and move to the BODY state. Only from the BODY state can we call build, which takes us to the final state.

So, let’s write these ideas into our class!

#include <cassert>

enum BuilderState {
  START,
  HEADER,
  BODY
};
class HttpConnectionBuilder {
 public:
  void add_header(std::string header) {
    assert(state_ == START || state_ == HEADER);
    headers_.emplace_back(std::move(header));
    state_ = HEADER;
  }
  ...
 private:
  BuilderState state_ = START;  // the protocol begins in START
  ...
};

The other functions follow the same pattern. This is already better: we have an explicit state that tells us which transitions are valid, and we check it. Of course, you have thorough test cases for your code, right? If your tests cover the code well enough, they will catch any violation at test time. You can also keep these checks enabled in production to make sure you never deviate from the protocol (a controlled crash is better than memory corruption), but then you pay for the checks at run time.
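To make the run-time approach concrete, here is a minimal sketch of the complete checked builder. It rests on a few assumptions that are not in the article: HttpConnection is a hypothetical two-field struct, the class is named CheckedHttpConnectionBuilder to keep it apart from the original, and the private helper require throws std::logic_error instead of using assert, so that the check stays active in release builds.

```cpp
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the article's HttpConnection type.
struct HttpConnection {
  std::vector<std::string> headers;
  std::string body;
};

enum BuilderState { START, HEADER, BODY };

class CheckedHttpConnectionBuilder {
 public:
  void add_header(std::string header) {
    require(state_ == START || state_ == HEADER,
            "add_header is only valid in START or HEADER");
    headers_.emplace_back(std::move(header));
    state_ = HEADER;
  }

  void add_body(std::string body) {
    require(state_ == HEADER, "add_body is only valid in HEADER");
    body_ = std::move(body);
    state_ = BODY;
  }

  // Consumes the builder.
  HttpConnection build() && {
    require(state_ == BODY, "build is only valid in BODY");
    return {std::move(headers_), std::move(body_)};
  }

 private:
  // Assumption of this sketch: throw rather than assert, so the check
  // is not compiled out when NDEBUG is defined.
  static void require(bool ok, const char* message) {
    if (!ok) throw std::logic_error(message);
  }

  BuilderState state_ = START;
  std::vector<std::string> headers_;
  std::string body_;
};
```

Every call now pays for one comparison, and a misuse such as calling add_body on a fresh builder is only reported when that line actually runs.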

Using typestates

How can we catch these errors earlier and with 100% certainty? Let the compiler do the work! This is where the concept of typestates comes in.

Roughly speaking, typestates encode the state of an object in its type. Some languages do this by implementing a separate class for each state (e.g. HttpBuilderWithoutHeader, HttpBuilderWithBody, and so on), but this gets very verbose in C++: we would have to declare constructors, delete the copy functions, convert one object into another… It gets old quickly.

But C++ has another trick up its sleeve: templates! We can encode the state in an enum and make our builder a template on that enum. This gives the following code:

template <BuilderState state>
class HttpConnectionBuilder {
 public:
  HttpConnectionBuilder<HEADER>
  add_header(std::string header) && {
    static_assert(state == START || state == HEADER,
      "add_header can only be called from START or HEADER state");
    headers_.emplace_back(std::move(header));
    return {std::move(*this)};
  }
  ...
};

Here we check statically that the object is in the correct state, so invalid code doesn’t even compile, and we get a fairly clear error message. Every time we create a new object corresponding to the target state, we also destroy the object corresponding to the previous state: you call add_header on an object of type HttpConnectionBuilder<START>, but you get back a value of type HttpConnectionBuilder<HEADER>. This is the core idea of typestates.

Note: these methods can only be called on r-values (a temporary or the result of std::move); that is what the && at the end of the function declaration does. Why? It forces the destruction of the previous state, so that only the one relevant state remains. Think of it like a unique_ptr: you don’t want to copy the internals and end up with an invalid state. Just as a unique_ptr has exactly one owner, a typestate object must be in exactly one state.
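The effect of the trailing && can be checked in isolation. The sketch below uses a hypothetical Ticket class with a redeem member, both invented for this illustration, and assumes C++17 for std::is_invocable_v.

```cpp
#include <string>
#include <type_traits>
#include <utility>

// Hypothetical example type, not part of the article's code.
class Ticket {
 public:
  explicit Ticket(std::string owner) : owner_(std::move(owner)) {}

  // The trailing && means *this must be an r-value: a temporary or the
  // result of std::move. The call site visibly gives up the object.
  std::string redeem() && { return std::move(owner_); }

 private:
  std::string owner_;
};

// Checked at compile time: an l-value Ticket cannot call redeem(),
// an r-value can.
static_assert(!std::is_invocable_v<decltype(&Ticket::redeem), Ticket&>);
static_assert(std::is_invocable_v<decltype(&Ticket::redeem), Ticket&&>);

std::string redeem_example() {
  Ticket t("alice");
  // t.redeem();                 // would not compile: t is an l-value
  return std::move(t).redeem();  // OK: explicit hand-over
}
```

The std::move at the call site is the price of the qualifier, and it doubles as documentation that the object must not be used afterwards.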

With the typestate builder in place, you can write:

auto connection = GetConnectionBuilder()
  .add_header("first header")
  .add_header("second header")
  .add_body("body")
  .build();

Any deviation from the protocol will result in compilation failure.

However, there are a few rules to follow:

  • All of your functions must take the object by r-value (that is, *this must be an r-value, hence the trailing &&).
  • You probably want to disable copying, unless it makes sense to fork in the middle of the protocol (that is why we use r-values, after all).
  • You need to declare your constructor private and add a factory function, so that nobody can create an object in a state other than the start state.
  • You need to implement a move constructor from the other states and declare those states as friends; without it you cannot turn an object in one state into an object in another.
  • You need to make sure you have not forgotten the check in any function.
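The rules above can be sketched in one piece as follows. This is one possible way to satisfy them, not the article's or ProtEnc's actual implementation: HttpConnection is a hypothetical stand-in type, the body of GetConnectionBuilder and the helper make_example_connection are assumptions, and the converting constructor is written as a private constructor template.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the article's HttpConnection type.
struct HttpConnection {
  std::vector<std::string> headers;
  std::string body;
};

enum BuilderState { START, HEADER, BODY };

template <BuilderState state> class HttpConnectionBuilder;
HttpConnectionBuilder<START> GetConnectionBuilder();  // the only entry point

template <BuilderState state>
class HttpConnectionBuilder {
 public:
  // Rule 2: no copies, so the protocol cannot be forked midway.
  HttpConnectionBuilder(const HttpConnectionBuilder&) = delete;
  HttpConnectionBuilder& operator=(const HttpConnectionBuilder&) = delete;
  HttpConnectionBuilder(HttpConnectionBuilder&&) = default;

  // Rules 1 and 5: every function is &&-qualified and carries its check.
  HttpConnectionBuilder<HEADER> add_header(std::string header) && {
    static_assert(state == START || state == HEADER,
                  "add_header can only be called from START or HEADER state");
    headers_.emplace_back(std::move(header));
    return HttpConnectionBuilder<HEADER>(std::move(*this));
  }

  HttpConnectionBuilder<BODY> add_body(std::string body) && {
    static_assert(state == HEADER,
                  "add_body can only be called from HEADER state");
    body_ = std::move(body);
    return HttpConnectionBuilder<BODY>(std::move(*this));
  }

  HttpConnection build() && {
    static_assert(state == BODY, "build can only be called from BODY state");
    return {std::move(headers_), std::move(body_)};
  }

 private:
  // Rule 3: only the factory can create a builder from nothing.
  HttpConnectionBuilder() = default;
  friend HttpConnectionBuilder<START> GetConnectionBuilder();

  // Rule 4: the other states are friends, and a private converting
  // constructor moves the data from one state to the next.
  template <BuilderState> friend class HttpConnectionBuilder;
  template <BuilderState other>
  explicit HttpConnectionBuilder(HttpConnectionBuilder<other>&& from)
      : headers_(std::move(from.headers_)), body_(std::move(from.body_)) {}

  std::vector<std::string> headers_;
  std::string body_;
};

HttpConnectionBuilder<START> GetConnectionBuilder() {
  return HttpConnectionBuilder<START>();
}

HttpConnection make_example_connection() {
  return GetConnectionBuilder()
      .add_header("first header")
      .add_header("second header")
      .add_body("body")
      .build();
}
```

Because a static_assert inside a member function of a class template is only evaluated when that member is instantiated, each check fires only for the calls that are actually made; swapping add_body and add_header in the chain is enough to break the build.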

All in all, implementing this correctly from scratch is a bit tricky, and as a codebase grows you probably don’t want 15 different hand-rolled typestate implementations. If only there were a framework for declaring these typestates easily and safely!

ProtEnc library

This is where ProtEnc (short for protocol encoder) comes in. With a scary amount of templates, the library lets you easily declare a class that implements typestate checking. To use it, you need your (unchecked) implementation of the protocol, which is the very first class we wrote, with all the “important” comments.

We will add a wrapper around this class that exposes the same interface but adds the type checks. The wrapper carries the possible initial states, the transitions, and the final transitions in its type. Each wrapper function simply checks that the transition is allowed and then perfect-forwards the call to the wrapped object. None of this involves pointer indirection, run-time components, or memory allocation, so it costs nothing at run time.

So how do we declare this wrapper? First, we have to define the finite state machine. It consists of three parts: the initial states, the transitions, and the final states/transitions. The list of initial states is just a list of values of our enum, like this:

using MyInitialStates = InitialStates<START>;

For the transitions, we need the initial state, the final state, and the function that takes us from one to the other:

using MyTransitions = Transitions<
  Transition<START, HEADER, &HttpConnectionBuilder::add_header>,
  Transition<HEADER, HEADER, &HttpConnectionBuilder::add_header>,
  Transition<HEADER, BODY, &HttpConnectionBuilder::add_body>>;

For the final transition, we also need a state and a function:

using MyFinalTransitions = FinalTransitions<
  FinalTransition<BODY, &HttpConnectionBuilder::build>>;

The extra FinalTransitions layer is there because we may define more than one FinalTransition.
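To see what such a declarative description buys us, independently of ProtEnc, here is a minimal sketch that writes the same state machine as a constexpr table and queries it at compile time. It is not how ProtEnc is implemented: Op, Edge, kTransitions, next_state, and the extra DONE and INVALID values are all hypothetical names introduced for this illustration, and the final transition is folded into the same table for brevity.

```cpp
// Hypothetical model of the protocol; DONE and INVALID are additions
// of this sketch and do not appear in the article's enum.
enum BuilderState { START, HEADER, BODY, DONE, INVALID };
enum class Op { AddHeader, AddBody, Build };

struct Edge {
  BuilderState from;
  Op op;
  BuilderState to;
};

// The whole protocol, declared in one place.
constexpr Edge kTransitions[] = {
    {START, Op::AddHeader, HEADER},
    {HEADER, Op::AddHeader, HEADER},
    {HEADER, Op::AddBody, BODY},
    {BODY, Op::Build, DONE},
};

// Returns the target state, or INVALID if the table has no matching edge.
constexpr BuilderState next_state(BuilderState from, Op op) {
  for (const Edge& edge : kTransitions) {
    if (edge.from == from && edge.op == op) return edge.to;
  }
  return INVALID;
}

// The compiler can answer protocol questions before the program runs.
static_assert(next_state(START, Op::AddHeader) == HEADER,
              "a first header is allowed");
static_assert(next_state(START, Op::AddBody) == INVALID,
              "a body before any header is rejected");
static_assert(next_state(BODY, Op::AddHeader) == INVALID,
              "a header after the body is rejected");
```

Keeping every edge in a single table is what makes the protocol easy to read, review, and generate, which is the same property the Transitions and FinalTransitions lists provide.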

Now we can declare the type of our wrapper. Some unavoidable boilerplate is hidden behind macros, but it is mostly constructors and friend declarations for the base class.

PROTENC_DECLARE_WRAPPER(HttpConnectionBuilderWrapper, HttpConnectionBuilder, BuilderState, MyInitialStates, MyTransitions, MyFinalTransitions);

This opens a scope (a class) in which we can forward our functions:

PROTENC_DECLARE_TRANSITION(add_header);
PROTENC_DECLARE_TRANSITION(add_body);
PROTENC_DECLARE_FINAL_TRANSITION(build);

Then close the scope.

PROTENC_END_WRAPPER;

(That is just a closing brace, but you don’t want mismatched braces, do you?)

With this simple but extensible setup, you can use the wrapper just like the one from the previous step, and every call is checked. 🙂

auto connection = HttpConnectionBuilderWrapper<START>{}
  .add_header("first header")
  .add_header("second header")
  .add_body("body")
  .build();

Calling the functions in the wrong order results in a compilation error. Don’t worry, care was taken to make the first error message readable 😉. For example, remove the .add_body("body") line and you get the following error:

In file included from example/http_connection.cc:6:
src/protenc.h: In instantiation of 'struct prot_enc::internal::return_of_final_transition_t<prot_enc::internal::NotFound, HTTPConnectionBuilder>':
src/protenc.h:273:15:   required by ...
example/http_connection.cc:174:42:   required from here
src/protenc.h:257:17: error: static assertion failed: Final transition not found
   static_assert(!std::is_same_v<T, NotFound>, "Final transition not found");

Just make sure that the wrapped class can only be constructed from the wrapper, and the protocol is enforced throughout your codebase!

If your state machine is encoded in another form (or if it becomes too large), it is straightforward to generate the code that describes it, because all transitions and initial states are gathered in one place, in a format that is easy to read and write.

The full code sample can be found on GitHub. Please note that the code does not currently work with Clang because of bug #35655.
