At work, regular expressions may not come up very often, and when they do there are ready-made examples on the Internet, so it is easy to get by without ever building a comprehensive understanding of them. This article uses HTML tag matching as a running example to describe the common features of regular expressions. All examples match against the following HTML fragment:

let str = `
<div id="app">
    <div>21</div>
    <h1>
        <span>hello</span>
        <span>smile</span>
    </h1>
    <button>button</button>
</div>
`;

The methods used in this article are String.match(RegExp), RegExp.test(String), and RegExp.exec(String).
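As a rough sketch of how these three methods differ, the snippet below runs each of them against a short sample string. The names sample and tagPattern are illustrative and do not come from the original article.

```javascript
// Illustrative sample text, not the article's full HTML fragment.
const sample = '<span>hello</span>';
const tagPattern = /span/;

// String.prototype.match returns a match array, or null when nothing matches.
const matchResult = sample.match(tagPattern);
console.log(matchResult[0]);    // 'span'
console.log(matchResult.index); // 1

// RegExp.prototype.test only reports whether the pattern matches.
const hasSpan = tagPattern.test(sample);
console.log(hasSpan); // true

// RegExp.prototype.exec returns the same kind of array as a non-global match.
const execResult = tagPattern.exec(sample);
console.log(execResult[0]); // 'span'
```

In short, test is enough for a yes/no check, while match and exec are the ones to reach for when the matched text itself is needed.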

1. Declaration of regular expressions

Regular expressions are declared in the same way as ordinary variables. In general, there are two ways: a literal, such as /span/, and the RegExp constructor. Taking a span tag as an example, a pattern that matches the text span can be declared as follows:

let normalPattern = new RegExp('span')
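The two declaration styles can be compared side by side. This is a minimal sketch; the variable names are made up for illustration. It also shows one practical difference: inside a constructor string the backslash itself has to be escaped.

```javascript
// Literal form: the pattern is written between two slashes.
const literalPattern = /span/;
// Constructor form: the pattern is passed as a string.
const constructedPattern = new RegExp('span');

// Both forms describe the same pattern.
console.log(literalPattern.source === constructedPattern.source); // true
console.log(literalPattern.test('<span>'));     // true
console.log(constructedPattern.test('<span>')); // true

// In a constructor string every backslash must be doubled,
// so the literal /\d+/ is written as '\\d+'.
const digitsLiteral = /\d+/;
const digitsConstructed = new RegExp('\\d+');
console.log(digitsLiteral.source === digitsConstructed.source); // true
```

The constructor form is mainly useful when the pattern has to be built from a string at runtime; for fixed patterns the literal is usually easier to read.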

2. Modifiers

To match more than one occurrence, or to ignore case, we need regular expression modifiers. The most common ones are i and g.

i stands for ignore case. For example, /span/ matches only the lowercase string span; it cannot match SPAN or Span. To match regardless of case, add the i modifier.

g stands for global. By default, a regular expression matches only the first occurrence that satisfies the pattern. To match every occurrence, add the g modifier.

The two modifiers can be combined and are written as follows:

let pattern = new RegExp('span', 'ig')
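The effect of each modifier is easiest to see on a small input. The sample text below is an assumption chosen to contain both uppercase and lowercase tags.

```javascript
// Hypothetical sample with mixed-case tags.
const text = '<SPAN>a</SPAN><span>b</span>';

// No modifier: only the first lowercase "span" is found.
const first = text.match(/span/);
console.log(first[0], first.index); // 'span' 15

// g: every lowercase "span".
const allLower = text.match(/span/g);
console.log(allLower); // [ 'span', 'span' ]

// i: the first occurrence in any case.
const firstAnyCase = text.match(/span/i);
console.log(firstAnyCase[0]); // 'SPAN'

// ig: every occurrence in any case.
const allAnyCase = text.match(new RegExp('span', 'ig'));
console.log(allAnyCase); // [ 'SPAN', 'SPAN', 'span', 'span' ]
```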

3. Character classes

The text matched so far is very basic. What if we want to match all the HTML tags? There are too many HTML tags to list them all, so we can use character classes for quick matching. The most common ones are: . (any character except a line terminator), \d (a digit), \w (a letter, digit, or underscore), and \s (a whitespace character). The uppercase forms \D, \W, and \S match the opposite of each.

A character class can, to some extent, be understood as a predefined character set. \d, for example, is the set of digits 0-9. To match all the digits in a piece of text, we can simply declare the following regular expression:

let numberPattern = /\d/g
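As a quick check of what \d picks out, the sketch below applies numberPattern to a short piece of the fragment (the sample string is an assumption made for brevity). Note that each digit is returned as a separate match, because \d stands for exactly one character.

```javascript
const numberPattern = /\d/g;
// A shortened piece of the article's fragment.
const sample = '<div>21</div><h1>';

const digits = sample.match(numberPattern);
console.log(digits); // [ '2', '1', '1' ]

// The uppercase form \D matches everything that is not a digit.
const nonDigits = 'h1'.match(/\D/g);
console.log(nonDigits); // [ 'h' ]
```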

What if we want to match all the HTML tags? Before matching any text, we need to analyze the rules that HTML tags follow. First, HTML tags, like XML tags in general, start with an opening angle bracket < and end with a closing angle bracket >. Second, only letters or digits may appear in between. Finally, those letters or digits appear one or more times. The first step in writing a regular expression is to summarize the rules; only once the rules are clear can we work out how to write the pattern. A regular expression is essentially a written description of rules. From the rules summarized above, we can write:

let matchTags = /<\w+>/g

The matching results are as follows:

[ '<div>', '<h1>', '<span>', '<span>', '<button>' ]
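The result above can be reproduced with the sketch below. The exact whitespace inside str is an assumption. The outer div does not appear in the result because its id attribute contains a space, an equals sign, and quotes, none of which are \w characters.

```javascript
// The article's HTML fragment; the indentation is an assumption.
const str = `
<div id="app">
    <div>21</div>
    <h1>
        <span>hello</span>
        <span>smile</span>
    </h1>
    <button>button</button>
</div>
`;

const matchTags = /<\w+>/g;
const openingTags = str.match(matchTags);
console.log(openingTags);
// [ '<div>', '<h1>', '<span>', '<span>', '<button>' ]

// The tag with an attribute is not matched by this pattern.
console.log(openingTags.includes('<div id="app">')); // false
```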

\w stands for a letter, digit, or underscore, + means one or more occurrences, and g means a global match. When matching text, a pattern often needs to repeat more than once. Regular expressions provide a rich quantifier syntax that covers most needs, as described in the following sections.

Notice that we only matched opening tags; none of the closing tags were matched. How do we match the closing tags? A closing tag has one more character than an opening tag: the slash /. We cannot simply write / in the pattern, because a regular expression literal is itself wrapped in two slashes, so a bare slash would conflict with the delimiters. In that case we use the backslash \ to escape it. Any character that conflicts with regular expression syntax needs to be escaped, for example the special characters ( ) and [ ]. The backslash is also what gives sets such as \w and \d their special meaning.

The regular expression for matching closing tags is as follows:

let matchCloseTags = /<\/\w+>/g

Matching results:

[ '</div>', '</span>', '</span>', '</h1>', '</button>', '</div>' ]
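The sketch below reproduces this result on a single-line copy of the fragment (whitespace removed for brevity, which is an assumption). It also shows that the slash needs no escaping in the constructor form, since a string has no slash delimiters, although the backslash of \w must then be doubled.

```javascript
// Single-line copy of the article's fragment.
const html = '<div id="app"><div>21</div><h1><span>hello</span>' +
  '<span>smile</span></h1><button>button</button></div>';

// Literal form: the slash is escaped as \/.
const matchCloseTags = /<\/\w+>/g;
const closingTags = html.match(matchCloseTags);
console.log(closingTags);
// [ '</div>', '</span>', '</span>', '</h1>', '</button>', '</div>' ]

// Constructor form: no escape for /, but \w becomes \\w inside the string.
const sameFromConstructor = html.match(new RegExp('</\\w+>', 'g'));
console.log(sameFromConstructor.length === closingTags.length); // true
```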

4. Quantifiers

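In the character class section we used the quantifier +, which means one or more occurrences. Regular expressions offer several more: x* (zero or more), x+ (one or more), x? (zero or one), x{n} (exactly n), x{n,} (at least n), and x{n,m} (between n and m).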

For example, suppose we want to match the heading tags in the HTML fragment. The basic rule of a heading tag is the single letter h followed by a single digit. From this we can conclude that the quantifier {n} fits, and the regular expression is as follows:

let matchH = /<h{1}\d{1}>/g

Matching results:

[ '<h1>' ]
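The following sketch tries the {n} quantifier on a hypothetical sample that contains a few near misses. It also illustrates that {1} is the default count, so /<h\d>/g behaves the same way; that shorter form is an observation added here, not the article's pattern.

```javascript
// Hypothetical sample with headings and look-alike tags.
const html = '<h1>title</h1><h2>sub</h2><hr><h12>';

// Exactly one "h" followed by exactly one digit.
const matchH = /<h{1}\d{1}>/g;
const headings = html.match(matchH);
console.log(headings); // [ '<h1>', '<h2>' ]

// {1} is implied, so this shorter pattern gives the same result.
const headingsShort = html.match(/<h\d>/g);
console.log(headingsShort); // [ '<h1>', '<h2>' ]

// {n,m} allows a range: one or two digits also accepts <h12>.
const oneOrTwoDigits = html.match(/<h\d{1,2}>/g);
console.log(oneOrTwoDigits); // [ '<h1>', '<h2>', '<h12>' ]
```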

Here we need to remember two more quantifiers: x*? and x+?. These follow the minimal-match (non-greedy) rule. Suppose we now want to match the first span element together with its text, that is, <span>hello</span>. If we write it the obvious way, starting with <span>, ending with </span>, and allowing any characters in between, the regular expression looks like this:

let matchSpan = /<span>(\S|\s)+<\/span>/g

The matching result may be different from what we expected. In fact, there is only one matching result, as shown below:

[ '<span>hello</span>\n        <span>smile</span>' ]

By default, quantifiers are greedy: instead of stopping at the shortest possible match, the regular expression consumes as much text as it can. To match the two span elements separately, we need the minimal-match quantifiers, as follows:

let matchSpan = /<span>(\S|\s)*?<\/span>/g
let matchSpan2 = /<span>(\S|\s)+?<\/span>/g

Text matching result:

[ '<span>hello</span>', '<span>smile</span>' ]
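Greedy and minimal matching can be compared directly. The sketch below uses a one-line sample (an assumption, for readability) and also shows the one case where *? and +? differ: an empty element.

```javascript
// One-line sample containing two span elements.
const html = '<h1><span>hello</span> <span>smile</span></h1>';

// Greedy: runs from the first <span> to the last </span>.
const greedy = html.match(/<span>(\S|\s)+<\/span>/g);
console.log(greedy); // [ '<span>hello</span> <span>smile</span>' ]

// Minimal: stops at the nearest </span>.
const lazy = html.match(/<span>(\S|\s)+?<\/span>/g);
console.log(lazy); // [ '<span>hello</span>', '<span>smile</span>' ]

// *? accepts empty content, +? requires at least one character.
const emptyWithStar = '<span></span>'.match(/<span>(\S|\s)*?<\/span>/g);
const emptyWithPlus = '<span></span>'.match(/<span>(\S|\s)+?<\/span>/g);
console.log(emptyWithStar); // [ '<span></span>' ]
console.log(emptyWithPlus); // null
```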

5. Sets

A set matches any one of the characters listed between square brackets. For example, to match any of the letters a, b, or c, use pattern = /[abc]/; to match any character other than a, b, and c, use pattern = /[^abc]/. A hyphen - denotes a continuous range, so digits are /[0-9]/ and letters are /[a-zA-Z]/.
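A small sketch of these set forms, using made-up input strings:

```javascript
// Any one of a, b, or c.
const inSet = 'cabbage'.match(/[abc]/g);
console.log(inSet); // [ 'c', 'a', 'b', 'b', 'a' ]

// Any character except a, b, and c.
const notInSet = 'cabbage'.match(/[^abc]/g);
console.log(notInSet); // [ 'g', 'e' ]

// Ranges: digits and letters.
const digits = 'h1 and H2'.match(/[0-9]/g);
const letters = 'h1 and H2'.match(/[a-zA-Z]/g);
console.log(digits);  // [ '1', '2' ]
console.log(letters); // [ 'h', 'a', 'n', 'd', 'H' ]
```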

6. Boundaries

In the simplest use of boundaries, ^ matches the beginning of the input and $ matches the end of the input.
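The sketch below shows both boundaries on a hypothetical one-line input. Using ^ and $ together requires the whole input to match the pattern.

```javascript
// Hypothetical one-line input.
const html = '<div id="app"><span>hello</span></div>';

// ^ anchors the pattern to the start of the input.
const startsWithDiv = /^<div/.test(html);
console.log(startsWithDiv); // true

// $ anchors the pattern to the end of the input.
const endsWithDiv = /<\/div>$/.test(html);
console.log(endsWithDiv); // true

// Both together: the entire input must be exactly "span".
const exact = /^span$/.test('span');
const notExact = /^span$/.test('<span>');
console.log(exact, notExact); // true false
```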

7. Groups

If we need the text inside a span tag but not the span tag itself, we can use grouping, which literally means dividing the rules into groups. The syntax for grouping is simple: parentheses (). Written as a regular expression, it looks like this:

let matchSpanText = /(<span>)(\S+)+?(<\/span>)/
let result = str.match(matchSpanText)

The match result is an array with four parts. The first part is the full match, the second part is the match for (<span>), the third part is the match for (\S+), and the fourth part is the match for (<\/span>). To get the plain text, we can read result[2] directly. We can also conclude that a grouped match is simply the ordinary full match plus the text captured by each group.
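The four parts can be inspected with the sketch below, run on a one-line sample (an assumption, for brevity). The second half is a hypothetical variation rather than the article's method: it uses String.prototype.matchAll with a single capture group to collect the text of every span.

```javascript
// One-line sample containing two span elements.
const html = '<h1><span>hello</span> <span>smile</span></h1>';

// The article's pattern: three groups, no g modifier.
const matchSpanText = /(<span>)(\S+)+?(<\/span>)/;
const result = html.match(matchSpanText);
console.log(result[0]); // '<span>hello</span>' (full match)
console.log(result[1]); // '<span>'
console.log(result[2]); // 'hello'
console.log(result[3]); // '</span>'

// Hypothetical variation: one group plus matchAll to get every text.
const allTexts = [...html.matchAll(/<span>(\S+?)<\/span>/g)].map((m) => m[1]);
console.log(allTexts); // [ 'hello', 'smile' ]
```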

8. Assertions

Assertions are very useful and can solve many awkward problems. Take the example above: to get the text between two tags we can use groups, but having to read the third element of the result array every time is a little annoying. Assertions remove that annoyance, because the match result is the text itself. The basic rules for assertions are as follows:

x(?=y): lookahead, matches x only if x is followed by y
x(?!y): negative lookahead, matches x only if x is not followed by y
(?<=y)x: lookbehind, matches x only if x is preceded by y
(?<!y)x: negative lookbehind, matches x only if x is not preceded by y

From the basic rules we can see that assertions let us place a condition before and after the text we want, without making that condition part of the match. We can rewrite the pattern for the text between the tags as follows:

let matchSpanTextDescribe = /(?<=<span>)(\S+)+?(?=<\/span>)/g
let result = str.match(matchSpanTextDescribe)

The matching results are as follows:

[ 'hello', 'smile' ]
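A minimal sketch of lookbehind and lookahead in action, on a one-line sample (an assumption). The pattern here is slightly simplified compared with the article's, using \S+? without the extra group. Lookbehind was added to JavaScript in ES2018, so this assumes a reasonably recent engine.

```javascript
// One-line sample containing two span elements.
const html = '<h1><span>hello</span> <span>smile</span></h1>';

// Preceded by <span>, followed by </span>, but neither is part of the match.
const textBetweenSpans = /(?<=<span>)\S+?(?=<\/span>)/g;
const texts = html.match(textBetweenSpans);
console.log(texts); // [ 'hello', 'smile' ]

// Negative lookahead: opening tags whose name does not start with "span".
const nonSpanTags = '<div><span><h1>'.match(/<(?!span)\w+>/g);
console.log(nonSpanTags); // [ '<div>', '<h1>' ]
```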

Reference: developer.mozilla.org/zh-CN/docs/…