SwiftSyntax is a Swift library built on libSyntax for analyzing, generating, and transforming Swift source code. Open source tools already build on it, such as SwiftRewriter, which formats code automatically (including simple code optimizations driven by coding conventions).

The Swift compiler

The Swift compiler follows the LLVM architecture, which separates a front end from a back end. (For Objective-C, the front end is Clang and the back end is likewise LLVM.) The following is the Swift compiler structure:

The stages are explained below:

| Phase | Description | Role |
| --- | --- | --- |
| Parse | Syntax analysis | The parser analyzes the Swift source code token by token and generates an Abstract Syntax Tree (AST) without semantic or type information. The AST produced at this stage carries no warnings or errors from semantic analysis. |
| Sema | Semantic analysis | The semantic analyzer type-checks the tree and produces a type-checked AST, attaching warnings and errors to the source code. |
| SILGen | Swift Intermediate Language generation | SILGen lowers the type-checked AST into raw SIL. Further passes over the raw SIL (such as generic specialization and ARC optimization) then produce canonical SIL. SIL is an intermediate language designed specifically for Swift, and it enables many Swift-specific optimizations that improve performance. SIL is also the heart of the Swift compiler. |
| IRGen | LLVM IR generation | Lowers SIL to LLVM IR, the intermediate language of LLVM. |
| LLVM | Back end of the LLVM compiler architecture | The preceding stages belong to the Swift compiler proper, which plays the role Clang plays for Objective-C: the front end of the LLVM architecture. LLVM itself is the back end: it further optimizes the LLVM IR and generates object files (.o). |

SwiftSyntax

SwiftSyntax operates on the AST generated in the first step of the compilation process. As noted above, this AST contains no semantic or type information. This article focuses on that AST.

The first step, lexical analysis, is also called scanning, and the component that performs it is the scanner (or lexer). It reads the source code and groups characters into tokens according to predefined rules, discarding whitespace, comments, and the like. The result is the entire source split into a list of tokens (a one-dimensional array).

During lexical analysis the source is read character by character, which is why the process is called scanning. When the scanner meets a space, an operator, or a special symbol, it treats the current token as complete.
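
To make the scanning step concrete, here is a minimal sketch of a character-by-character scanner. It is a toy, not the Swift compiler's lexer: the `Token` type and the `scan` function are hypothetical names invented for this illustration, and it only distinguishes words, integer literals, and single-character symbols.

```swift
// A toy scanner for illustration only; not the Swift compiler's lexer.
enum Token: Equatable {
    case word(String)       // identifiers and keywords
    case number(String)     // integer literals
    case symbol(Character)  // operators and punctuation
}

func scan(_ source: String) -> [Token] {
    var tokens: [Token] = []
    var current = ""  // characters of the token currently being built

    // Emit the token accumulated so far, if there is one.
    func flush() {
        guard !current.isEmpty else { return }
        let isNumber = current.allSatisfy { $0.isNumber }
        tokens.append(isNumber ? .number(current) : .word(current))
        current = ""
    }

    for character in source {
        if character.isLetter || character.isNumber || character == "_" {
            // Still inside a word or a number.
            current.append(character)
        } else {
            // Whitespace or a symbol ends the current token.
            flush()
            if !character.isWhitespace {
                tokens.append(.symbol(character))
            }
        }
    }
    flush()
    return tokens
}

let tokens = scan("var x = 2")
print(tokens)
```

The scanner keeps a single buffer and flushes it whenever a separator appears, which mirrors the "token is complete" rule described above. Whitespace is simply dropped here; libSyntax instead keeps it as trivia, as discussed later.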

The second step, parsing, is performed by the parser. It transforms the token array produced by lexical analysis into a tree representation. While building the tree, the parser drops unnecessary tokens (such as redundant parentheses), so the AST is not a 100% match for the source code, but it gives us enough structure to work with.

xcrun swiftc -frontend -emit-syntax ./Cat.swift | python -m json.tool

Run this command in a terminal and it prints the AST as a JSON string. Below is an excerpt, with the leading import statement removed.

{
    "id": 28,
    "kind": "SourceFile",
    "layout": [
        {
            "id": 27,
            "kind": "CodeBlockItemList",
            "layout": [
                {
                    "id": 25,
                    "kind": "CodeBlockItem",
                    "layout": [
                        {
                            "id": 24,
                            "kind": "StructDecl",
                            "layout": [
                                null,
                                null,
                                {
                                    "id": 7,
                                    "leadingTrivia": [{"kind": "Newline", "value": 2}],
                                    "presence": "Present",
                                    "tokenKind": {"kind": "kw_struct"},
                                    "trailingTrivia": [{"kind": "Space", "value": 1}]
                                },
                                {
                                    "id": 8,
                                    "leadingTrivia": [],
                                    "presence": "Present",
                                    "tokenKind": {"kind": "identifier", "text": "Cat"},
                                    "trailingTrivia": [{"kind": "Space", "value": 1}]
                                },
                                null,
                                null,
                                null,
                                {
                                    "id": 23,
                                    "kind": "MemberDeclBlock",
                                    "layout": [
                                        {
                                            "id": 9,
                                            "leadingTrivia": [],
                                            "presence": "Present",
                                            "tokenKind": {"kind": "l_brace"},
                                            "trailingTrivia": []
                                        }

SwiftSyntax internal structure

RawSyntax

RawSyntax is the immutable backing store for all Syntax and represents the raw tree structure underlying the syntax tree. These nodes have no notion of identity and only provide the structure of the tree. They are immutable and can be freely shared between syntax nodes, so they do not maintain any parent relationships. The RawSyntax tree bottoms out in tokens, represented by the TokenSyntax class, which are the leaf nodes.

  • RawSyntax is the immutable backing store for all syntax.
  • RawSyntax builds the tree structure of the syntax.
  • RawSyntax does not store any parent relationships, so raw nodes with the same content can be shared between syntax nodes.

final class RawSyntax: ManagedBuffer<RawSyntaxBase, RawSyntaxDataElement> {
  let data: RawSyntaxData
  var presence: SourcePresence
}

fileprivate enum RawSyntaxData {
  /// A token containing a tokenKind, leading trivia, and trailing trivia
  case token(TokenData)
  /// A tree node containing a syntaxKind and an array of child nodes
  case layout(LayoutData)
}

Trivia

Trivia has nothing to do with the semantics of the program. Here are some examples of Trivia “atoms”:

  • Spaces
  • Tabs
  • Newlines
  • // comments
  • /* ... */ comments
  • /// comments
  • /** ... */ comments
  • Backticks

When parsing or constructing new syntax nodes, follow these two Trivia rules:

  1. Trailing trivia: a token owns all trivia that follows it up to, but not including, the next newline character.
  2. Leading trivia: looking backward in the text, a token owns all trivia before it up to and including the first newline character.

Example:

func foo() {
  var x = 2
}

Let’s break it down token by token:

  • func
    • Leading trivia: none
    • Trailing trivia: owns the one space that follows (rule 1), equivalent to Trivia::spaces(1)
  • foo
    • Leading trivia: none; the preceding func owns the space
    • Trailing trivia: none
  • (
    • Leading trivia: none
    • Trailing trivia: none
  • )
    • Leading trivia: none
    • Trailing trivia: owns the one space that follows (rule 1)
  • {
    • Leading trivia: none; the preceding ) owns the space
    • Trailing trivia: none; it does not own the following newline (rule 1)
  • var
    • Leading trivia: one newline and two spaces (rule 2), equivalent to Trivia::newlines(1) + Trivia::spaces(2)
    • Trailing trivia: owns the one space that follows (rule 1)
  • x
    • Leading trivia: none; the preceding var owns the space
    • Trailing trivia: owns the one space that follows (rule 1)
  • =
    • Leading trivia: none; the preceding x owns the space
    • Trailing trivia: owns the one space that follows (rule 1)
  • 2
    • Leading trivia: none; the preceding = owns the space
    • Trailing trivia: none; it does not own the following newline (rule 1)
  • }
    • Leading trivia: one newline (rule 2)
    • Trailing trivia: none
  • EOF
    • Leading trivia: none
    • Trailing trivia: none
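
The two ownership rules can be checked mechanically. The following is a minimal sketch under simplifying assumptions: trivia is limited to spaces and newlines, a token is either a run of word characters or a single symbol, and any trivia not claimed as trailing becomes the leading trivia of the next token. `TokenWithTrivia` and `attachTrivia` are hypothetical names for this illustration, not part of libSyntax or SwiftSyntax.

```swift
// A simplified model of trivia ownership; not the libSyntax implementation.
struct TokenWithTrivia: Equatable {
    var leading: String   // trivia owned in front of the token
    var text: String      // the token itself ("" stands for EOF)
    var trailing: String  // trivia owned after the token
}

func attachTrivia(_ source: String) -> [TokenWithTrivia] {
    let chars = Array(source)
    var result: [TokenWithTrivia] = []
    var index = 0

    func isWord(_ c: Character) -> Bool { c.isLetter || c.isNumber || c == "_" }

    while true {
        // Rule 2: spaces and newlines left over from the previous token
        // become the leading trivia of this one.
        var leading = ""
        while index < chars.count, chars[index] == " " || chars[index] == "\n" {
            leading.append(chars[index])
            index += 1
        }
        // No characters left: emit an EOF token and stop.
        guard index < chars.count else {
            result.append(TokenWithTrivia(leading: leading, text: "", trailing: ""))
            break
        }
        // The token text: a run of word characters or a single symbol.
        var text = String(chars[index])
        index += 1
        if isWord(chars[index - 1]) {
            while index < chars.count, isWord(chars[index]) {
                text.append(chars[index])
                index += 1
            }
        }
        // Rule 1: trailing trivia runs up to, but not including, the next newline.
        var trailing = ""
        while index < chars.count, chars[index] == " " {
            trailing.append(chars[index])
            index += 1
        }
        result.append(TokenWithTrivia(leading: leading, text: text, trailing: trailing))
    }
    return result
}

let pieces = attachTrivia("func foo() {\n  var x = 2\n}")
for piece in pieces {
    print(piece.text.isEmpty ? "EOF" : piece.text,
          "leading:", piece.leading.debugDescription,
          "trailing:", piece.trailing.debugDescription)
}
```

Running the sketch on the example function should reproduce the breakdown above: var receives one newline and two spaces as leading trivia, and { has no trailing trivia because the next character is a newline.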

SyntaxData

SyntaxData wraps a RawSyntax node with some additional information: a pointer to the parent node, the position of the node within its parent, and cached child nodes. You can think of SyntaxData as a “concrete” or “realized” syntax node. It represents a specific piece of source code, with an absolute position, line and column numbers, and so on. SyntaxData is the underlying storage of each Syntax node; it is private and not exposed publicly.
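
The split between a shareable raw tree and a positioned wrapper can be sketched in a few lines. This is an illustration only: `RawNode` and `DataNode` are hypothetical miniature types, child caching is left out, and the real RawSyntax and SyntaxData are considerably more involved.

```swift
// Hypothetical miniature types; not the SwiftSyntax implementation.

/// Immutable structure only: no parent pointer, no absolute position.
final class RawNode {
    let kind: String
    let text: String        // non-empty only for tokens
    let children: [RawNode]

    init(kind: String, text: String = "", children: [RawNode] = []) {
        self.kind = kind
        self.text = text
        self.children = children
    }

    /// Number of characters of source text this node covers.
    var length: Int { text.count + children.reduce(0) { $0 + $1.length } }
}

/// A raw node placed in a specific tree: parent, index in parent, absolute offset.
final class DataNode {
    let raw: RawNode
    let parent: DataNode?
    let indexInParent: Int
    let offset: Int

    init(raw: RawNode, parent: DataNode? = nil, indexInParent: Int = 0, offset: Int = 0) {
        self.raw = raw
        self.parent = parent
        self.indexInParent = indexInParent
        self.offset = offset
    }

    /// Wraps each raw child on demand and computes its absolute offset.
    var children: [DataNode] {
        var position = offset
        var result: [DataNode] = []
        for (index, rawChild) in raw.children.enumerated() {
            result.append(DataNode(raw: rawChild, parent: self,
                                   indexInParent: index, offset: position))
            position += rawChild.length
        }
        return result
    }
}

// The same raw token is used twice inside one tree.
let one = RawNode(kind: "integerLiteral", text: "1")
let sum = RawNode(kind: "sum", children: [one, RawNode(kind: "plus", text: "+"), one])
let root = DataNode(raw: sum)
let wrapped = root.children
print(wrapped.map { "\($0.raw.text) at offset \($0.offset)" })
```

Because the raw node carries no parent or position, the single `one` instance can appear in two places; only the wrappers differ, each knowing its own parent and offset.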

Syntax

Syntax represents a tree of nodes with tokens at the leaves. Each node has accessors for its known child nodes and allows efficient iteration over them through its children property.

Abstract syntax tree nodes generally fall into three classes: declaration-related, expression-related, and statement-related. Swift follows the same division, with a finer-grained implementation:

public protocol DeclSyntax: Syntax {}

public protocol ExprSyntax: Syntax {}

public protocol StmtSyntax: Syntax {}

public protocol TypeSyntax: Syntax {}

public protocol PatternSyntax: Syntax {}

  • DeclSyntax: Related to declarations, such as TypealiasDeclSyntax, ClassDeclSyntax, StructDeclSyntax, ProtocolDeclSyntax, ExtensionDeclSyntax, FunctionDeclSyntax, DeinitializerDeclSyntax, ImportDeclSyntax, VariableDeclSyntax, EnumCaseDeclSyntax, and so on.
  • StmtSyntax: Related to statements, such as GuardStmtSyntax, ForInStmtSyntax, SwitchStmtSyntax, DoStmtSyntax, BreakStmtSyntax, ReturnStmtSyntax, and so on.
  • ExprSyntax: Related to expressions, such as StringLiteralExprSyntax, IntegerLiteralExprSyntax, TryExprSyntax, FloatLiteralExprSyntax, TupleExprSyntax, DictionaryExprSyntax, and so on.
  • TypeSyntax: Represents types, such as TupleTypeSyntax, FunctionTypeSyntax, DictionaryTypeSyntax, ArrayTypeSyntax, ClassRestrictionTypeSyntax, AttributedTypeSyntax, and so on.
  • PatternSyntax: Related to pattern matching.

Swift has the following patterns:

  • Wildcard pattern (WildcardPatternSyntax)
  • Identifier pattern (IdentifierPatternSyntax)
  • Value-binding pattern (ValueBindingPatternSyntax)
  • Tuple pattern (TuplePatternSyntax)
  • Enumeration case pattern (EnumCasePatternSyntax)
  • Optional pattern (OptionalPatternSyntax)
  • Type-casting pattern (AsTypePatternSyntax)
  • Expression pattern (ExpressionPatternSyntax)
  • UnknownPatternSyntax

In addition to the major categories of Syntax above, there are other Syntax types:

  • SourceFileSyntax
  • FunctionParameterSyntax
  • InitializerClauseSyntax
  • MemberDeclListItemSyntax
  • MemberDeclBlockSyntax
  • TypeInheritanceClauseSyntax
  • InheritedTypeSyntax
  • …

SyntaxNode

SyntaxNode represents a node in the syntax tree. It is a more efficient representation than Syntax because it avoids casts to Syntax when representing the parent hierarchy. It provides general information, such as the node's position, range, and uniqueIdentifier, while still allowing the associated Syntax object to be retrieved if necessary. SyntaxParser uses SyntaxNode to efficiently report which syntax nodes were reused during incremental reparsing.

Example: {return 1}

The diagram for the {return 1} example uses the following legend.

  • Green: RawSyntax type (TokenSyntax is also RawSyntax). The diagram is taken from libSyntax, where RawTokenSyntax corresponds to TokenSyntax in SwiftSyntax.
  • Red: SyntaxData type
  • Blue: Syntax type
  • Gray: Trivia
  • Solid arrow: Strong reference
  • Dotted arrow: weak reference

SwiftSyntax API

Make APIs

let returnKeyword = SyntaxFactory.makeReturnKeyword(trailingTrivia: .spaces(1))
let three = SyntaxFactory.makeIntegerLiteralExpr(digits: SyntaxFactory.makeIntegerLiteral(String(3)))
let returnStmt = SyntaxFactory.makeReturnStmt(returnKeyword: returnKeyword, expression: three)


Output:

return 3

With APIs

The with APIs derive a new node from an existing one. Suppose that instead of returning 3, we want the statement to return "Hello". We call the withExpression method and pass in the string literal expression.

let returnHello = returnStmt.withExpression(SyntaxFactory.makeStringLiteralExpr("Hello"))


Syntax Builders

For each syntax node type, there is a corresponding builder struct. Builders provide an incremental way to construct syntax nodes. If we wanted to build the Cat struct from scratch, we would only need four tokens: the struct keyword, the Cat identifier, and two curly braces.

let structKeyword = SyntaxFactory.makeStructKeyword(trailingTrivia: .spaces(1))
let identifier = SyntaxFactory.makeIdentifier("Cat", trailingTrivia: .spaces(1))

let leftBrace = SyntaxFactory.makeLeftBraceToken()
let rightBrace = SyntaxFactory.makeRightBraceToken(leadingTrivia: .newlines(1))
let members = MemberDeclBlockSyntax { builder in
    builder.useLeftBrace(leftBrace)
    builder.useRightBrace(rightBrace)
}

let structureDeclaration = StructDeclSyntax { builder in
    builder.useStructKeyword(structKeyword)
    builder.useIdentifier(identifier)
    builder.useMembers(members)
}


SyntaxVisitors

Using the SyntaxVisitor, we can walk through the syntax tree. This is useful when we want to extract some information for source code analysis.

class FindPublicExtensionDeclVisitor: SyntaxVisitor {

    func visit(_ node: ExtensionDeclSyntax) -> SyntaxVisitorContinueKind {
        if node.modifiers?.contains(where: { $0.name.tokenKind == .publicKeyword }) == true {
            // Do something if you find a `public extension` declaration.
        }
        return .skipChildren
    }
}

The return value is a continue kind that indicates whether to go on and visit the child nodes of the current node (.visitChildren) or to skip them (.skipChildren).

public enum SyntaxVisitorContinueKind {

  /// The visitor should visit the descendents of the current node.
  case visitChildren

  /// The visitor should avoid visiting the descendents of the current node.
  case skipChildren
}
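
How the two cases steer a traversal can be shown without SwiftSyntax at all. The sketch below uses a hypothetical miniature tree (`Node`, `ContinueKind`, and `walk` are names invented here) and a pre-order walk that asks the callback whether to descend.

```swift
// Hypothetical miniature tree; not the SwiftSyntax types.
struct Node {
    let kind: String
    var children: [Node] = []
}

enum ContinueKind {
    case visitChildren
    case skipChildren
}

/// Pre-order traversal that asks `visit` whether to descend into each node.
func walk(_ node: Node, visit: (Node) -> ContinueKind) {
    if visit(node) == .visitChildren {
        for child in node.children {
            walk(child, visit: visit)
        }
    }
}

let tree = Node(kind: "sourceFile", children: [
    Node(kind: "extension", children: [Node(kind: "function")]),
    Node(kind: "struct", children: [Node(kind: "variable")]),
])

var visited: [String] = []
walk(tree) { node in
    visited.append(node.kind)
    // Skip everything nested inside an extension.
    return node.kind == "extension" ? .skipChildren : .visitChildren
}
print(visited)
```

The function nested inside the extension is never reached, while the variable inside the struct is. Returning .skipChildren is also a cheap way to cut traversal work once a node of interest has been found.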


SyntaxRewriters

SyntaxRewriter lets us change the structure of the tree: we override the relevant visit method and return a new node according to our rules. Note that all nodes are immutable, so instead of modifying a node in place, we create another node and return it to replace the current one.


class PigRewriter: SyntaxRewriter {

    override func visit(_ token: TokenSyntax) -> Syntax {
        guard case .stringLiteral = token.tokenKind else { return token }
        return token.withKind(.stringLiteral("\"🐷\""))
    }
}

In this example, we replace every string literal in the code with the 🐷 emoji. The official example adds one to every integer literal. If you’re interested, take a look at the repository on GitHub.
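
The "return a new node instead of mutating" idea behind the add-one example can be sketched on a toy immutable tree. `Tree` and `rewrite` are hypothetical names for this illustration; the real SyntaxRewriter works on typed syntax nodes rather than plain strings.

```swift
// Hypothetical miniature immutable tree; not the SwiftSyntax types.
enum Tree: Equatable {
    case token(String)
    case node([Tree])
}

/// Builds a new tree in which every token text has been passed through `transform`.
/// The input tree is never modified.
func rewrite(_ tree: Tree, token transform: (String) -> String) -> Tree {
    switch tree {
    case .token(let text):
        return .token(transform(text))
    case .node(let children):
        return .node(children.map { rewrite($0, token: transform) })
    }
}

let original = Tree.node([.token("return"), .token("1")])

// Add one to every token that parses as an integer; leave the rest alone.
let bumped = rewrite(original) { text in
    guard let value = Int(text) else { return text }
    return String(value + 1)
}
print(bumped)
```

The original tree stays intact after the call, so both versions can be kept side by side, which is what makes rewriting safe when nodes are shared.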

Uses of SwiftSyntax

Generating code

We usually don’t use SwiftSyntax to generate large amounts of code, because it takes a lot of code to write and the workload is crushing 😂. We can use GYB instead. GYB (Generate Your Boilerplate) is a tool used internally by Swift to generate source files from templates. Much of the source code in the Swift standard library is generated with GYB. In fact, much of SwiftSyntax itself is generated with GYB, including SyntaxBuilders, SyntaxFactory, SyntaxRewriter, and so on. Another great tool from the open source community is Sourcery, which lets you write templates in Swift (via Stencil) instead of Python. SwiftGen also uses Stencil to generate Swift code.

Analyze and transform code

Two good libraries use SwiftSyntax. One is Periphery, which detects unused Swift code, such as unused protocols and classes, their methods, method parameters, and so on. The other is SwiftRewriter, a Swift code formatting tool. You can also write a Swift syntax highlighting tool, such as SwiftGG.
