Parser Combinator can be easily constructed using C++ constexpr capabilities, which unlock great potential for User defined literal strings.

# # the introduction

I recently saw a Talk at CppCon: Constexpr All the things, C++ constexpr All the things, C++ constexpr All the things, C++ constexpr All the things, C++ constexpr All the things, C++ constexpr This greatly improves compile time computing power.

The constexpr feature dates back to C++11, when there were a lot of constraints, a single return statement, and all you could do was simply recursively implement math and hash functions. By C++14, this constraint was loosened, allowing functions to behave like normal functions, and the community produced a series of constexpr libraries; In C++17, constexpr was generalized, allowing if constexpr to replace the SFINAE approach of metaprogramming. Some algorithms in the STL library support constexpr. Even lambda is constexpr by default. Constexpr destructor supports constExpr new. STL vectors are constExpr vectors. Constexpr allocator and CONSTExpr destructor support constExpr new. Then we can unify all constExpr containers.

Parser Combinator is easy to construct with the help of C++ constexpr capabilities, and implementing a Parser is not that complicated, unlocking huge potential for User defined strings, which is the focus of this article.

## What is Parser

Parser is a Parser function that inputs a string and outputs a parsed set of type values. The signature of the function is as follows:

```haskell
Parser a:: String -> [(a, String)]
```
Copy the code

For simplicity, here we consider printing only zero or a type value instead of a collection, so the signature looks like this:

```haskell
Parser a:: String -> Maybe (a, String)
```
Copy the code

For example, a digital Parser parses the input string “123456” and outputs Just (1, “23456”), yielding the number 1 and the remaining string “23456” for use by the next Parser. If parsing fails, None is printed.

The corresponding C++ function signature is as follows:

```cpp // Parser a :: String -> Maybe (a, String) using ParserInput = std::string_view; template <typename T> using ParserResult = std::optional<std::pair<T, ParserInput>>; template <typename T> using Parser = auto(*)(ParserInput) -> ParserResult<T>; ` ` `Copy the code

That’s the definition of Parser.

We can implement several basic parsers by definition, such as matching a given character:

```cpp constexpr auto makeCharParser(char c) { // CharParser :: Parser Char return [=](ParserInput s) -> ParserResult<char> { if (s.empty() || c ! = s[0]) return std::nullopt; return std::make_pair(s[0], ParserInput(s.begin() + 1, s.size() - 1)); }; }; ` ` `Copy the code

MakeCharParser is a factory that, given a character c, creates an Parser that matches c.

Matches characters in a given collection:

```cpp constexpr auto oneOf(std::string_view chars) { // OneOf :: Parser Char return [=](ParserInput s) -> ParserResult<char> { if (s.empty() || chars.find(s[0]) == std::string::npos) return std::nullopt; return std::make_pair(s[0], ParserInput(s.begin() + 1, s.size() - 1)); }; } ` ` `Copy the code

## What is Parser Combinator

Parser is the smallest unit that can be combined, and any complex Parser can be combined from one Parser to another. Parser Combinator is a higher-order function that inputs a series of parsers and outputs a new combined Parser.

By definition, a Combinator can combine two parsers and compute a new result based on the results of both parsers to get a new Parser:

```cpp // combine :: Parser a -> Parser b -> (a -> b -> c) -> Parser c template<typename P1, typename P2, typename F, typename R = std::invoke_result_t<F, Parser_t<P1>, Parser_t<P2>>> constexpr auto combine(P1&& p1, P2&& p2, F&& f) { return [=](ParserInput s) -> ParserResult<R> { auto r1 = p1(s); if (! r1) return std::nullopt; auto r2 = p2(r1->second); if (! r2) return std::nullopt; return std::make_pair(f(r1->first, r2->first), r2->second); }; }; ` ` `Copy the code

Since C++ supports operator overloading, it is possible to override a binary operator to combine two parsers. for example, taking one Parser from two parsers produces a new Parser:

Take the result of the Parser on the left:

```cpp // operator> :: Parser a -> Parser b -> Parser a template<typename P1, typename P2> constexpr auto operator>(P1&& p1, P2&& p2) { return combine(std::forward<P1>(p1), std::forward<P2>(p2), [](auto&& l, auto) { return l; }); }; ` ` `Copy the code

Take the result of the Parser on the right:

```cpp // operator< :: Parser a -> Parser b -> Parser b template<typename P1, typename P2> constexpr auto operator<(P1&& p1, P2&& p2) { return combine(std::forward<P1>(p1), std::forward<P2>(p2), [](auto, auto&& r) { return r; }); }; ` ` `Copy the code

Sometimes the same Parser needs to be matched multiple times, similar to the regular expression * operation. This operation can be seen as a fold, executing the Parser multiple times until the match fails, and each time the result is passed to a function:

```cpp // foldL :: Parser a -> b -> (b -> a -> b) -> ParserInput -> ParserResult b template<typename P, typename R, typename F> constexpr auto foldL(P&& p, R acc, F&& f, ParserInput in) -> ParserResult<R> { while (true) { auto r = p(in); if (! r) return std::make_pair(acc, in); acc = f(acc, r->first); in = r->second; }}; ` ` `Copy the code

With the fold function, it’s easy to implement functions that match any number of times and atLeast once:

```cpp // many :: Parser a -> Parser monostate template<typename P> constexpr auto many(P&& p) { return [p=std::forward<P>(p)](ParserInput  s) -> ParserResult<std::monostate> { return detail::FoldL(p, std::monostate{}, [](auto acc, auto) { return acc; }, s); }; }; ``` ```cpp // atLeast :: Parser a -> b -> (b -> a -> b) -> Parser b template<typename P, typename R, typename F> constexpr auto atLeast(P&& p, R&& init, F&& f) { static_assert(std::is_same_v<std::invoke_result_t<F, R, Parser_t<P>>, R>, "type mismatch!" ); return [p=std::forward<P>(p), f=std::forward<F>(f), init=std::forward<R>(init)](ParserInput s) -> ParserResult<R> { auto r = p(s); if (! r) return std::nullopt; return detail::foldL(p, f(init, r->first), f, r->second); }; }; ` ` `Copy the code

There’s also an operation that matches zero to one, like a regular expression, right? The operation, which I define as the option operation:

```cpp // option :: Parser a -> a -> Parser a template<typename P, typename R = Parser_t<P>> constexpr auto option(P&& p, R&& defaultV) { return [=](ParserInput s) -> ParserResult<R> { auto r = p(s); if (! r) return make_pair(defaultV, s); return r; }; }; ` ` `Copy the code

With the basics, let’s see how it works.

# # of actual combat

### parse the values

In the project, there was a lot of template metaprogramming. Before C++17, the template Dependent type (non-type parameter) did not support double. The temporary solution was to use template

struct NumWrapper {}; To simulate the type of double and get its value, you need to parse the string. This should be done at compile time.

The first match is the symbol +/-, if there is no symbol, it is considered + :

```cpp constexpr auto sign = Option(OneOf("+-"), '+'); ` ` `Copy the code

The second part is the integer part, which may or may not be there. If not, it is considered to be 0:

```cpp constexpr auto number = AtLeast(OneOf("1234567890"), 0l, [](long acc, char c) -> long { return acc * 10 + (c - '0'); }); constexpr auto integer = Option(number, 0l); ` ` `Copy the code

If there is no decimal point, return a long in order not to lose precision.

```cpp constexpr auto point = MakeCharParser('.'); // integer if (! (sign < integer < point)(in)) { return Combine(sign, integer, [](char sign, long number) -> R { return sign == '+' ? number : -number; })(in); } ` ` `Copy the code

If there is a decimal point, it is considered a floating-point number and returns a double.

```cpp // floating constexpr auto decimal = point < Option(number, 0l); Constexpr Auto Value = Combine(integer, decimal, [](long integer, long decimal) -> double {double d = 0.0; constexpr auto Value = Combine(integer, decimal, [](long integer, long decimal) -> double {double d = 0.0; While (decimal) {d = (d + (decimal % 10)) * 0.1; decimal /= 10; } return integer + d; }); return Combine(sign, value, [](char sign, double d) -> R { return sign == '+' ? d : -d; })(in); Because the Parser may return either long or double, it can be both STD :: Variant:  ```cpp constexpr auto ParseNum() { using R = std::variant<double, long>; return [](ParserInput in) -> ParserResult<R> { // ... }; } ` ` `Copy the code

We ended up with the following NumWrapper implementation that could be incorporated into the template type system:

```cpp template<char... Cs> constexpr std::array<char, sizeof... (Cs)> ToArr = {Cs... }; template<char ... Cs> class NumberWrapper { public: constexpr static auto numStr = ToArr<Cs... >; constexpr static auto res = ParseNum()(std::string_view(numStr.begin(), numStr.size())); static_assert(res.has_value() && res->second.empty(), "parse failed!" ); public: constexpr static auto value = std::get<res->first.index()>(res->first); // long or double } ```Copy the code

If it was just for parsing numbers, that would be crazy, because in previous versions of Parser Combinator, I did it ina normal constexpr function, and the code was boring, but NOW I might want to go back to it.

### Json parsing guide

This time around, the theme of CppCon is to parse JSON strings at compile time, using string_view to load strings directly. Then construct some constExpr containers, such as a fixed-length Vector for constExpr. Since we are 17 years old, we can only do this if we don’t support constExpr new. With a CONSTExpr vector, we can then construct a map, which is a very simple pair vector collection.

Furthermore, Parser Combinator is proposed to parse string and Fmap into JSON data structure.

When initially implemented, the JSON data structure is also a large template

struct Json_Value; Template hosting, which results in specifying only the maximum number of recursion levels, is not practical. Then talker to a very clever way to get rid of the layer number of constraints, is to recursively sizes () scan again, calculate the number of all the values, so you can determine how much a Value container to store, then calculate the length of the string, because of the influence of the UTF8, escaped string, eventually to parse is probably less than the length of the length of the input. Once the space is determined, a second recursive value_recur

: value_parser() scan is performed, filling in the Value data structure each time the full Value is parsed. Since arrays are similar to objects, they may be nested, so a third recursive extent_recur<>:: value_Parser () scan is performed. A width-first search is performed to determine the number of outermost elements, and the values are then parsed.
,>

Click to follow, the first time to learn about Huawei cloud fresh technology ~