4- Parsers

Parsers fall into two categories, depending on the order in which they build the parse tree: top-down parsers and bottom-up parsers.

Top-down parsers: start from the high-level structure of the grammar and try to find a rule that matches. In other words, the parser first matches the highest-level rules and then works its way down to the lower-level ones.

Returning to our earlier example (2 + 3 - 1), a top-down parser starts from the highest-level rule: it identifies 2 + 3 as an expression and then 2 + 3 - 1 as an expression, rather than first identifying the individual terms.
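To make the top-down idea concrete, here is a minimal recursive-descent parser (a common way to write a top-down parser by hand) in Python. It assumes a small grammar for expressions like 2 + 3 - 1, written with repetition instead of left recursion so that a top-down parser can handle it: expression := term (operation term)*, operation := PLUS | MINUS, term := INTEGER. The class name `TopDownParser` and the nested-tuple tree shape are illustrative choices, not taken from any real engine.

```python
import re
from typing import List, Optional, Tuple, Union

# A parse-tree node: an integer leaf, or (operator, left subtree, right subtree).
Tree = Union[int, Tuple[str, "Tree", "Tree"]]


class TopDownParser:
    """Hypothetical recursive-descent parser for the assumed grammar:
         expression := term (operation term)*
         operation  := PLUS | MINUS
         term       := INTEGER
    """

    def __init__(self, text: str) -> None:
        # Crude tokenizer: integers and the two operators; anything else is skipped.
        self.tokens: List[str] = re.findall(r"\d+|[+-]", text)
        self.pos = 0

    def peek(self) -> Optional[str]:
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def parse(self) -> Tree:
        tree = self.expression()  # start from the highest-level rule
        if self.peek() is not None:
            raise SyntaxError(f"unexpected token {self.peek()!r}")
        return tree

    def expression(self) -> Tree:
        left = self.term()
        while self.peek() in ("+", "-"):
            op = self.tokens[self.pos]
            self.pos += 1
            right = self.term()
            # Each completed "term operation term" becomes a larger expression:
            # first 2 + 3, then (2 + 3) - 1.
            left = (op, left, right)
        return left

    def term(self) -> Tree:
        tok = self.peek()
        if tok is None or not tok.isdigit():
            raise SyntaxError(f"expected INTEGER, got {tok!r}")
        self.pos += 1
        return int(tok)


print(TopDownParser("2 + 3 - 1").parse())  # ('-', ('+', 2, 3), 1)
```

Because `expression()` is entered first and only then calls `term()`, the parser works from the top rule downward; the loop is where 2 + 3 is recognized as an expression and then extended into 2 + 3 - 1.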

Bottom-up parsers: also known as shift-reduce parsers, they start from the input and gradually transform it into grammar rules, beginning with the low-level rules and working up until the high-level rules are satisfied.

For the same example, the parser scans the input until a rule matches and then replaces the matched input with that rule (such as term or operation). Partially matched input is kept on the parser's stack, and this continues until the end of the input.
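The bottom-up strategy can be sketched as a tiny shift-reduce loop. This is a hand-written illustration, not a table-driven LR parser of the kind Bison generates, and it assumes the left-recursive grammar expression := expression operation term | term, operation := PLUS | MINUS, term := INTEGER. The function name `shift_reduce_parse` is hypothetical.

```python
import re
from typing import List, Tuple

# Stack entries are (grammar symbol, value); values build a nested-tuple tree.
Entry = Tuple[str, object]


def shift_reduce_parse(text: str, trace: bool = False) -> object:
    """Hypothetical shift-reduce parser for the assumed grammar:
         expression := expression operation term | term
         operation  := PLUS | MINUS
         term       := INTEGER
    """
    # Crude tokenizer: integers and the two operators; anything else is skipped.
    tokens = re.findall(r"\d+|[+-]", text)
    stack: List[Entry] = []
    for tok in tokens:
        # SHIFT: move the next input token onto the stack.
        stack.append(("INTEGER", int(tok)) if tok.isdigit() else ("OP", tok))
        # REDUCE: replace matched symbols on top of the stack with the rule they satisfy.
        while True:
            symbols = [s for s, _ in stack]
            if symbols[-1] == "INTEGER":
                stack[-1] = ("term", stack[-1][1])
            elif symbols[-1] == "OP":
                stack[-1] = ("operation", stack[-1][1])
            elif symbols == ["term"]:
                stack[-1] = ("expression", stack[-1][1])
            elif symbols[-3:] == ["expression", "operation", "term"]:
                (_, left), (_, op), (_, right) = stack[-3:]
                stack[-3:] = [("expression", (op, left, right))]
            else:
                break
        if trace:
            print(" ".join(s for s, _ in stack))
    if [s for s, _ in stack] != ["expression"]:
        raise SyntaxError(f"input did not reduce to an expression: {stack}")
    return stack[0][1]


print(shift_reduce_parse("2 + 3 - 1", trace=True))  # final line: ('-', ('+', 2, 3), 1)
```

With `trace=True` each printed line shows the stack after one shift and its reductions; for 2 + 3 - 1 the stack alternates between `expression` and `expression operation` until the input is exhausted and only a single `expression` remains.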

Although parsing is fairly mechanical (in the end it is just rule matching), writing a parser by hand is still hard: it takes a deep understanding of the parsing process and careful optimization of the whole thing. That is why a tool that generates a parser from nothing more than the language's vocabulary and grammar rules feels almost like a miracle.

The WebKit rendering engine has used two well-known parser generators, Flex and Bison.

Flex (nothing to do with CSS flexbox): generates a lexer. Its input is a file of regular-expression definitions for the language's tokens.
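Flex's input pairs token names with regular expressions. The Python sketch below imitates that idea with a single combined regular expression. The token names and patterns are assumptions chosen for the 2 + 3 - 1 example, and unlike Flex, Python's `re` alternation picks the first alternative that matches rather than the longest match, so pattern order matters here.

```python
import re
from typing import Iterator, List, Tuple

# Token definitions in the spirit of a Flex input file: a name and a regular expression.
# These particular names and patterns are illustrative assumptions.
TOKEN_SPEC: List[Tuple[str, str]] = [
    ("INTEGER", r"0|[1-9][0-9]*"),
    ("PLUS", r"\+"),
    ("MINUS", r"-"),
    ("SKIP", r"[ \t\n]+"),  # whitespace: matched but not emitted
]
MASTER_RE = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def lex(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (token type, lexeme) pairs; fail on characters no rule matches."""
    pos = 0
    while pos < len(text):
        m = MASTER_RE.match(text, pos)
        if m is None:
            raise SyntaxError(f"illegal character {text[pos]!r} at position {pos}")
        if m.lastgroup != "SKIP":
            yield m.lastgroup, m.group()
        pos = m.end()


print(list(lex("2 + 3 - 1")))
# [('INTEGER', '2'), ('PLUS', '+'), ('INTEGER', '3'), ('MINUS', '-'), ('INTEGER', '1')]
```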

Bison: generates a bottom-up (shift-reduce) parser. Its input is the language's grammar rules written in BNF format.

Because the grammar has to be written in BNF, this approach only works for languages with a context-free grammar. In WebKit the CSS parser is generated by Bison, while Firefox's CSS parser is written by hand. In both cases the parser's main job is to turn a CSS file into a StyleSheet object that holds the corresponding CSS rules.
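To show what turning a CSS file into a StyleSheet object means, here is a toy hand-written parser. The `StyleSheet` and `CSSRule` classes and the `parse_css` function are hypothetical stand-ins, not the actual WebKit or Firefox types, and the parser deliberately ignores at-rules, nested blocks, and braces inside strings.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class CSSRule:
    selectors: List[str]          # e.g. ["p", "div"]
    declarations: Dict[str, str]  # e.g. {"color": "red"}


@dataclass
class StyleSheet:
    rules: List[CSSRule] = field(default_factory=list)


def parse_css(text: str) -> StyleSheet:
    """Toy parser: handles only 'selectors { prop: value; ... }' rules."""
    text = re.sub(r"/\*.*?\*/", "", text, flags=re.DOTALL)  # drop comments
    sheet = StyleSheet()
    pos = 0
    while True:
        open_brace = text.find("{", pos)
        if open_brace == -1:
            break  # no more rule blocks
        close_brace = text.find("}", open_brace)
        if close_brace == -1:
            raise SyntaxError("unterminated rule block")
        # Selector list: everything between the previous block and this '{'.
        selectors = [s.strip() for s in text[pos:open_brace].split(",") if s.strip()]
        declarations: Dict[str, str] = {}
        for decl in text[open_brace + 1:close_brace].split(";"):
            if ":" in decl:
                prop, value = decl.split(":", 1)  # split once so values may contain ':'
                declarations[prop.strip().lower()] = value.strip()
        sheet.rules.append(CSSRule(selectors, declarations))
        pos = close_brace + 1
    return sheet


sheet = parse_css("p, div { margin: 0; color: red } /* note */ a { color: blue; }")
for rule in sheet.rules:
    print(rule.selectors, rule.declarations)
```

A real CSS parser has to cope with far more (at-rules such as @media, escapes, strings, error recovery), which is exactly why engines either generate one from a grammar or invest heavily in a hand-written one.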

Can Bison generate an HTML parser? No. HTML's grammar is not context-free, so it cannot be written in BNF. Strictly speaking, the conventional parsers used for CSS and JavaScript cannot parse HTML documents.

HTML can be formally defined by a DTD (Document Type Definition). This is a reminder of how close XML and HTML are as markup languages: XML is also defined by DTDs and XML Schemas, and HTML even has an XML variant, XHTML. So can an XML parser parse HTML?

Still no. Why not? Compared with XML, HTML's syntax rules are far more forgiving: authors may omit certain tags, which are then added implicitly, and in some cases leave out start or end tags altogether. This forgiveness is one of the reasons HTML became so popular, but it also makes HTML's grammar hard to define formally, so it cannot be parsed with a conventional parser.
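The difference in forgiveness is easy to see with Python's standard library: `xml.etree.ElementTree` is a strict XML parser, while `html.parser.HTMLParser` is a lenient HTML tokenizer. The snippet below omits its closing `</p>` tags; the strict parser rejects it and the lenient one accepts it. Note that `HTMLParser` only tolerates the missing tags; unlike a browser's HTML parser, it does not insert the implied end tags or build a document tree. The helper names `TagCollector` and `xml_accepts` are illustrative.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser
from typing import List

# HTML that relies on forgiving rules: no </p> end tags, no <html> or <body>.
SNIPPET = "<p>First paragraph<p>Second paragraph<br>"


class TagCollector(HTMLParser):
    """Records every start and end tag the HTML tokenizer reports."""

    def __init__(self) -> None:
        super().__init__()
        self.events: List[str] = []

    def handle_starttag(self, tag, attrs):
        self.events.append(f"start:{tag}")

    def handle_endtag(self, tag):
        self.events.append(f"end:{tag}")


def xml_accepts(markup: str) -> bool:
    """True if the strict XML parser can parse the markup."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


collector = TagCollector()
collector.feed(SNIPPET)
collector.close()

print("XML parser accepts it:", xml_accepts(SNIPPET))  # False
print("HTML tokenizer events:", collector.events)      # ['start:p', 'start:p', 'start:br']
```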

So what does an HTML parser look like? That is the topic of the next article.
