We will develop a small but complete Swift library for processing and serializing JSON data.
Project source: github.com/swiftdo/jso…
In the previous article, Swift Code a JSON Parser (2), we successfully parsed JSON strings into JSON values. In this article, the third in the series, we will restructure that parser to make the parsing process clearer.
I have been studying compiler principles recently, so I will look at the process from a compilation perspective. A compiler front end usually has three phases:
- Lexical analysis: dividing the program text into tokens; this can be implemented by constructing a finite automaton.
- Syntax analysis (parsing): recognizing the structure of the program and building an abstract syntax tree (AST) that is easy for a computer to process; this can be done with recursive descent.
- Semantic analysis: eliminating semantic ambiguity and generating attribute information so that the compiler can later generate target code from it.
Implementation summary
A JSON parser is essentially a state machine built from the JSON grammar rules, with JSON strings as input and JSON values as output. In Swift Code a JSON Parser (2) we started parsing from the first character and turned the input directly into JSON according to the JSON grammar. In this article, however, the process is split into two stages: lexical analysis and syntax analysis.
In the lexical analysis phase, the goal is to turn the string into a sequence of tokens. For example, given:
{
"name": "oldbirds"."level": 8
}
lexical analysis produces the following sequence of tokens:
"{"  "name"  ":"  "oldbirds"  ","  "level"  ":"  "8"  "}"
In the syntax analysis stage, we check whether the token sequence forms legal JSON. Because JSON is simple to parse, at this step we can treat the syntax tree directly as the JSON value to be produced.
Implementing lexical analysis
The goal of lexical analysis is to split a JSON string into a stream of tokens according to the lexical rules.
We first need to define the Token type. When the lexer reads a word that conforms to one of the data types specified by JSON, it produces a token for that word according to the word-formation rules. From the definition of JSON we can therefore derive the following token types:
enum JsonToken: Equatable { // Equatable lets the parser compare tokens with != later on
    case objBegin        // {
    case objEnd          // }
    case arrBegin        // [
    case arrEnd          // ]
    case null            // null
    case number(String)  // 1 | 2 | -3.0
    case string(String)  // "a"
    case bool(String)    // true | false
    case sepColon        // :
    case sepComma        // ,
}
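For the example at the beginning of this article, the lexer should produce a token stream like the following (written out by hand here as JsonToken values to make the idea concrete; it is not code from the project):

let tokens: [JsonToken] = [
    .objBegin,
    .string("name"), .sepColon, .string("oldbirds"), .sepComma,
    .string("level"), .sepColon, .number("8"),
    .objEnd
]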
Next we will implement the conversion of a JSON string into a token stream.
With the token type defined, we can implement the lexer:
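The lexer reports problems by throwing JsonParserError. Its definition is not shown in this article; a minimal sketch consistent with how it is used below (the project may define it differently) is:

struct JsonParserError: Error {
    let msg: String
}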
/// Lexer: splits the input string into a stream of tokens
struct JsonTokenizer {
    private var input: String
    private var currentIndex: String.Index

    init(string: String) {
        self.input = string
        self.currentIndex = string.startIndex
    }

    /// The current character
    private var current: Character? {
        guard currentIndex < input.endIndex else { return nil }
        return input[currentIndex]
    }

    /// Move the index forward by one character
    private mutating func advance() {
        currentIndex = input.index(after: currentIndex)
    }

    /// Go back to the previous character
    private mutating func back() {
        currentIndex = input.index(before: currentIndex)
    }

    /// Produce the next token of the stream
    mutating func nextToken() throws -> JsonToken? {
        // Skip spaces
        scanSpaces()
        guard let ch = current else { return nil }
        switch ch {
        case "{":
            advance()
            return JsonToken.objBegin
        case "}":
            advance()
            return JsonToken.objEnd
        case "[":
            advance()
            return JsonToken.arrBegin
        case "]":
            advance()
            return JsonToken.arrEnd
        case ",":
            advance()
            return JsonToken.sepComma
        case ":":
            advance()
            return JsonToken.sepColon
        case "n":
            _ = try scanMatch(string: "null")
            return JsonToken.null
        case "t":
            let str = try scanMatch(string: "true")
            return JsonToken.bool(str)
        case "f":
            let str = try scanMatch(string: "false")
            return JsonToken.bool(str)
        case "\"":
            let str = try scanString()
            advance()
            return JsonToken.string(str)
        case _ where isNumber(c: ch):
            let str = try scanNumbers()
            return JsonToken.number(str)
        default:
            throw JsonParserError(msg: "Unparsed character: \(ch) - \(currentIndex)")
        }
    }

    /// Advance and return the new current character
    private mutating func peekNext() -> Character? {
        advance()
        return current
    }

    mutating func scanString() throws -> String {
        var ret: [Character] = []
        repeat {
            guard let ch = peekNext() else {
                throw JsonParserError(msg: "scanString error at \(currentIndex)")
            }
            switch ch {
            case "\\": // Handle escape sequences: the character after \ must be a valid escape
                guard let cn = peekNext(), isEscape(c: cn) else {
                    throw JsonParserError(msg: "Invalid escape character")
                }
                ret.append("\\")
                ret.append(cn)
                // Handle Unicode escapes such as \u0041
                if cn == "u" {
                    try ret.append(contentsOf: scanUnicode())
                }
            case "\"": // Another quote means the string is complete
                return String(ret)
            case "\r", "\n": // Raw line breaks are not allowed inside a JSON string
                throw JsonParserError(msg: "Invalid character \(ch)")
            default:
                ret.append(ch)
            }
        } while true
    }

    mutating func scanUnicode() throws -> [Character] {
        var ret: [Character] = []
        for _ in 0..<4 {
            if let ch = peekNext(), isHex(c: ch) {
                ret.append(ch)
            } else {
                throw JsonParserError(msg: "Malformed unicode escape at \(currentIndex)")
            }
        }
        return ret
    }

    mutating func scanNumbers() throws -> String {
        let ind = currentIndex
        while let c = current, isNumber(c: c) {
            advance()
        }
        if currentIndex != ind {
            return String(input[ind..<currentIndex])
        }
        throw JsonParserError(msg: "scanNumbers error at \(ind)")
    }

    /// Skip spaces
    mutating func scanSpaces() {
        var ch = current
        while ch != nil && ch == " " {
            ch = peekNext()
        }
    }

    mutating func scanMatch(string: String) throws -> String {
        return try scanMatch(characters: string.map { $0 })
    }

    mutating func scanMatch(characters: [Character]) throws -> String {
        let ind = currentIndex
        var isMatch = true
        for index in 0..<characters.count {
            if characters[index] != current {
                isMatch = false
                break
            }
            advance()
        }
        if isMatch {
            return String(input[ind..<currentIndex])
        }
        throw JsonParserError(msg: "scanMatch failed, expected \(characters)")
    }

    func isEscape(c: Character) -> Bool {
        // \" \\ \u \r \n \b \t \f
        return ["\"", "\\", "u", "r", "n", "b", "t", "f"].contains(c)
    }

    /// Check whether the character can be part of a number
    func isNumber(c: Character) -> Bool {
        let chars: [Character: Bool] = ["-": true, "+": true, "e": true, "E": true, ".": true]
        if let b = chars[c], b {
            return true
        }
        if c >= "0" && c <= "9" {
            return true
        }
        return false
    }

    /// Check whether the character is a hexadecimal digit
    func isHex(c: Character) -> Bool {
        return c >= "0" && c <= "9" || c >= "a" && c <= "f" || c >= "A" && c <= "F"
    }
}
The scanning logic above is essentially the same as in Swift Code a JSON Parser (2), so it is not covered again in detail here.
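To see the lexer on its own, you can call nextToken() in a loop until it returns nil. This is a small usage sketch, not code from the project:

do {
    var tokenizer = JsonTokenizer(string: "{ \"name\": \"oldbirds\", \"level\": 8 }")
    while let token = try tokenizer.nextToken() {
        print(token)
    }
} catch {
    print("Lexing failed: \(error)")
}

// Prints objBegin, string("name"), sepColon, string("oldbirds"),
// sepComma, string("level"), sepColon, number("8"), objEnd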
Implementing syntax analysis
Syntax analysis takes the token sequence produced in the lexical analysis stage as input and outputs a JSON object or a JSON array.
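The parser builds values of the JSON type and reports problems through ParserError; both come from the earlier articles in this series and are not repeated here. A minimal sketch that is consistent with how they are used below (the project's actual definitions may differ) is:

enum JSON {
    case null
    case bool(Bool)
    case int(Int)
    case double(Double)
    case string(String)
    case array([JSON])
    case object([String: JSON])

    init(_ value: [JSON]) { self = .array(value) }
    init(_ value: [String: JSON]) { self = .object(value) }
}

struct ParserError: Error {
    let msg: String
}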
// Syntax parsing
struct JsonParser {
    private var tokenizer: JsonTokenizer

    private init(text: String) {
        tokenizer = JsonTokenizer(string: text)
    }

    static func parse(text: String) throws -> JSON? {
        var parser = JsonParser(text: text)
        return try parser.parse()
    }

    private mutating func parse() throws -> JSON? {
        guard let token = try tokenizer.nextToken() else {
            return nil
        }
        switch token {
        // Starts with [
        case .arrBegin:
            return try JSON(parserArr())
        // Starts with {
        case .objBegin:
            return try JSON(parserObj())
        default:
            return nil
        }
    }
}
The core of the parser implementation is the parserArr and parserObj methods (both members of JsonParser).
In parserArr, elements are separated by .sepComma; when .arrEnd is encountered, the array has been read completely and the result is returned. Each element is itself a JSON value, so it can be handled recursively.
private mutating func parserArr() throws -> [JSON] {
    var arr: [JSON] = []
    repeat {
        guard let ele = try parseElement() else {
            throw ParserError(msg: "parserArr failed: element expected")
        }
        // Collect the element
        arr.append(ele)
        guard let next = try tokenizer.nextToken() else {
            throw ParserError(msg: "parserArr failed: unexpected end of input")
        }
        // A ] means the array is complete
        if case JsonToken.arrEnd = next {
            break
        }
        // If the next token is not a comma, the input does not match the JSON array grammar
        if JsonToken.sepComma != next {
            throw ParserError(msg: "parserArr failed: , expected")
        }
    } while true
    return arr
}
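As a quick check of parserArr, a top-level array goes through the same parse(text:) entry point (a usage sketch):

do {
    if let result = try JsonParser.parse(text: "[1, 2.5, \"three\", true]") {
        // result is a JSON array containing .int(1), .double(2.5), .string("three"), .bool(true)
        print(result)
    }
} catch {
    print("Parsing failed: \(error)")
}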
The parserObj method also uses .sepComma as the separator, and object parsing is complete when .objEnd is read. The difference from parserArr is that each element is a key-value pair.
private mutating func parserObj() throws -> [String: JSON] {
    var obj: [String: JSON] = [:]
    repeat {
        guard let next = try tokenizer.nextToken(), case let .string(key) = next else {
            throw ParserError(msg: "parserObj failed: key not found")
        }
        if obj.keys.contains(key) {
            throw ParserError(msg: "parserObj failed: duplicate key \(key)")
        }
        guard let colon = try tokenizer.nextToken(), case JsonToken.sepColon = colon else {
            throw ParserError(msg: "parserObj failed: : expected")
        }
        guard let value = try parseElement() else {
            throw ParserError(msg: "parserObj failed: value not found")
        }
        obj[key] = value
        guard let nex = try tokenizer.nextToken() else {
            throw ParserError(msg: "parserObj failed: unexpected end of input")
        }
        // A } means the object is complete
        if case JsonToken.objEnd = nex {
            break
        }
        if JsonToken.sepComma != nex {
            throw ParserError(msg: "parserObj failed: , expected")
        }
    } while true
    return obj
}
Finally, parseElement turns one or more tokens into the corresponding JSON value, based on the JSON definition:
private mutating func parseElement() throws -> JSON? {
    guard let nextToken = try tokenizer.nextToken() else {
        return nil
    }
    switch nextToken {
    case .arrBegin:
        return try JSON(parserArr())
    case .objBegin:
        return try JSON(parserObj())
    case .bool(let b):
        return .bool(b == "true")
    case .null:
        return .null
    case .string(let str):
        return .string(str)
    case .number(let n):
        if n.contains("."), let v = Double(n) {
            return .double(v)
        } else if let v = Int(n) {
            return .int(v)
        } else {
            throw ParserError(msg: "Number conversion failed")
        }
    default:
        throw ParserError(msg: "Unexpected token: \(nextToken)")
    }
}
Test

A complete test:
let str = "{ \"a\": [8, 9, 10], \"c\": {\"temp\":true,\"say\":\"hello\",\"name\":\"world\"}, \"b\": 10.2}"

// Parse with the new two-stage parser (Parser 2.0)
do {
    if let result2 = try JsonParser.parse(text: str) {
        print("\n\n✅ Parser 2.0 result:")
        // prettyJson comes from the project and pretty-prints a JSON value
        print(prettyJson(json: result2))
    } else {
        print("\n\n❎ Parser 2.0 returned nil")
    }
} catch {
    print("\n\n❎ Parser 2.0 error: \(error)")
}
The results show:
✅ Parser 2.0 result:
{
"a":[8,9,10],
"c":{
"name":"world",
"say":"hello",
"temp":true
},
"b":10.2
}
Conclusion
By splitting the work into lexical analysis and syntax analysis, the parsing task is divided into two stages, which greatly improves code clarity. The lexer hides index movement and filters out useless characters, handing only meaningful data to the parser. The parser then only needs to recognize JSON; it never moves the cursor itself.
Compared with the approach in the previous article, the method here is clearly more systematic. Performance may be somewhat worse, but the clean separation of concerns is worth it, and the same structure can be applied to similar parsing problems.