In the last article, we completed the creation of the AST. Next, we’ll define Parser in detail and give a generic implementation of single-character Parser. If in doubt, read together against the implementation of the project code.

Define the correct posture of the Parser

Parser takes strings and parses them into AST structures, according to the requirements in this example. To define it, use a function:

Parser : String -> AST
Copy the code

Since parsing is a step-by-step process, not all parses can be done at once, we have an improved version of the Parser to show the idea of step-by-step parsing:

Parser : String -> (AST, String)
Copy the code

The input is a String, and the output is a tuple, followed by the parsed AST. The strings that follow represent the strings that are left after parsing the previous AST.

Further refinement, since it is parsed step by step, even a most common AST, such as Exp.Constant(1234) before, parses individual numbers and combines them into integers before forming an AST. To represent something more granular, like parsing a single number. We continue to improve:

Parser : String -> (a,String)
Copy the code

Compared to the last version, we changed AST to a, which is a generic variable, which stands for any type. That is, Parser can be character, String, Int, AST, and so on.

So now we have something that looks good. Now the question is, how do we represent parsing failure? Return some kind of special (ERROR, String)? Or return (a,””)? To indicate parsing failure, we change the structure as follows:

Parser : String -> [(a,String)]
Copy the code

Compared to the previous version, we returned a list of tuples instead of a tuple. This allows an empty list [] to indicate parsing failure. If the parse succeeds, a list of individual tuples is returned. As for why not use the above two methods, readers can look at the next article with this question in mind.

Define Parser in Swift

Given the form of our Parser definition, we can’t imagine the so-called Parser type being a function. For writing purposes, we put it in a structure.

An important way to think about functional programming is to start thinking about the meaning of code in terms of types.

struct Parser<a>{
    let p : String -> [(a,String)]}Copy the code

Look at the Parser structure:

  • First you need a generic parameteraRepresents the type of result parsed by the Parser.
  • I have a member variable,pIts type is a function that takes a String as input and returns [(a,String)]

Implement the first Parser

Let’s start with the simplest example, such as defining a Parser that parses the character “H”. What does that mean? It is this parser that attempts to parse a character H from the string (starting at the beginning of the string), returning [(“H”, the rest of the string)] if it succeeds, and [] if it fails.

In other words, the Parser checks if the string starts with an H, succeeds if it does, and fails otherwise.

let par1 = Parser<Character> { x in
    guard let head = x.characters.first where head == "H" else{
        return[]}return [("H",String(x.characters.dropFirst()))]
}
Copy the code

To create a Parser, first use the constructor of the Parser and specify the type of the generic. Such as the Parser < Character > (…). . Because the structure has only one member and that member is a function type… What you pass in is a closure. Is the Parser < Character > ({… }), and because of Swift’s tail closure nature, closures can be moved from () to the outside, so the final form of constructing Parser is Parser

{x in… }, the code above is structured like this, and finally assigns the Parser to the variable par1.

Now let’s see what the closure for creating the Parser looks like. The closure is of type String->[(a,String)]. We name the input parameter x, which represents the String to be parsed. We use a guard to see if the initial is H. Return [], otherwise construct [(a,String)] return. In this case, the a is the character “H”, and the String is the String with the initial letter removed (because the corresponding part of the input String will be consumed if parsing succeeds).

Test it out.

print(par1.p("HELLO WORLD"))
print(par1.p("NIHAO"))
Copy the code

Output:

[("H"."ELLO WORLD")]
[]
Copy the code

We tried to parse two strings using par1. The first one started with H, so it succeeded, and returned H and the remaining “ELLO WORLD” tuple. The other one failed.

Parser generators

If you think about it, par1, which we just did, doesn’t actually have any use for eggs. Now try modifying the form to make it slightly useful:

func parserChar(c : Character) -> Parser<Character>{
    return Parser { x in
        guard let head = x.characters.first where head == c else{
            return[]}return [(c,String(x.characters.dropFirst()))]
    }
}
Copy the code

We defined a function, parserChar, to analyze how it works from its type. It takes a single character and returns a Parser, equivalent to parserChar being a “single character Parser” constructor. For example, calling parserChar(“E”) generates an Parser that parses character E. This seems to be a lot more generic than the Parser we just implemented in the Hardcode version.

As usual, let’s test this:

let result = parserChar("H").p("HELLO WORLD")
print(result)
let result1 = parserChar("E").p(result[0].1)
print(result1)
Copy the code

Output:

[("H"."ELLO WORLD")]
[("E"."LLO WORLD")]
Copy the code

Here we feed the results of the first Parser to the second parser. Then the second Parser successfully parsed the character “E” from this.

Abstract without end

Still with a single character Parser. In parsing, we tend to parse not just a particular character, but a class of characters, such as all numeric characters, or operator characters. For such requirements, the result of our first abstraction, parserChar, seems inadequate. Let’s modify it slightly:

func satisfy(condition : Character -> Bool) -> Parser<Character>{
    return Parser { x in
        guard let head = x.characters.first where condition(head) else{
            return[]}return [(head,String(x.characters.dropFirst()))]
    }
}
Copy the code

To be more general, we changed the function name to satisfy, but the internal implementation is pretty much the same. The difference is that instead of accepting a single character as an argument, we now accept a function condition, which function? Character->Bool Bool indicates whether a Character meets the condition, and returns true if it does, false otherwise. That means we can define the range of characters using condition. Then, when we implement it, instead of using the first letter to do some sort of parsing, we pass the first letter to condition, which tells Parser if the first letter is qualified or not.

For example, suppose there is a function isDigit(Character) -> Bool to determine whether a Character is a number. We can satisy(isDigit) to build a Parser that recognizes any number. Let’s try it:

Implement isDigit first

func isDigit(c : Character) -> Bool{
    let s = String(c)
    returnInt(s) ! = nil }Copy the code

The Parser is then constructed via satisfy

let pNum = satisfy(isDigit)
print(pNum.p("abcde"))
print(pNum.p("1asd"))
print(pNum.p("6asd"))
print(pNum.p("12asd"))
Copy the code

Output:

[]
[("1"."asd")]
[("6"."asd")]
[("1"."2asd")]
Copy the code

The isDigit implementation is simple enough to ignore, but later we constructed a Parser pNum that recognizes any number through SATISFY and then tried to parse four strings separately. The first one didn’t contain a number, so it failed. Each of the following are resolved to a different number. At least the reason why the following 2 is not parsed together in the last example is that multi-character logic is not yet implemented.

Organize your tasks for the day

Much has been said today, but in terms of Parser implementation progress, we just added the following code:

struct Parser<a>{
    let p : String -> [(a,String)]

}

func isDigit(c : Character) -> Bool{
    let s = String(c)
    returnInt(s) ! = nil } func satisfy(condition : Character -> Bool) -> Parser<Character>{return Parser { x in
        guard let head = x.characters.first where condition(head) else{
            return[]}return [(head,String(x.characters.dropFirst()))]
    }
}
Copy the code

PS:

  • Functional programming can be tricky to understand at first, but once you get used to it, it’s easy to write very concise code.
  • Satisfy is actually very powerful. Try using different conditions to generate different Parser classes

How powerful? You can download the entire project and see how it is used first.


Want to see more? You can pay attention to my Zhihu