If you repost this article, please credit the source: juejin.cn/post/699200…

Previous articles in this series:

Learning Regular Expressions Together (Part 1): Those Head-Spinning Metacharacters

Learning Regular Expressions Together (Part 2): Quantifiers and Greediness

Learning Regular Expressions Together (Part 3): Grouping and Referencing

Learning Regular Expressions Together (Part 4): Common Matching Patterns

Learning Regular Expressions Together (Part 5): Assertion Matching

# 0. Before we start

When you hear the word assertion, unit testing is probably the first thing that comes to mind. In a regular expression, an assertion is a requirement on the position where the text matches, rather than on the text itself. For example, suppose we want to find the word tom in a text, but other words such as tomorrow and tomato also contain tom. This is where assertions come into play.

If you think about it, this is similar to assertions in unit tests: both state a condition that must hold. Regular expressions have three common kinds of assertions: word boundaries, line start and end, and lookaround. Let's look at each.

# 1. Word boundaries

We covered grouping and referencing in the third article. Let's review an example from its section on searching:

(\w{2,}) \1

In that example the letter s was matched as well, which is not what we want. We can solve this problem gracefully with the word boundary assertion, \b. A word boundary can be written on the left side of a word, on the right side, or on both.

The matching result differs depending on where \b is placed:

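| Regex | Meaning | tom | tomorrow | atom | atomic |
| --- | --- | --- | --- | --- | --- |
| tom | The word contains tom | yes | yes | yes | yes |
| \btom | The word begins with tom | yes | yes | no | no |
| tom\b | The word ends with tom | yes | no | yes | no |
| \btom\b | Only the word tom | yes | no | no | no |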
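The four patterns can be checked against the four sample words with a short Python sketch. The helper name `matching_words` is made up for this illustration; `re.search` is the standard library call.

```python
import re

words = ["tom", "tomorrow", "atom", "atomic"]

# The four patterns from this section, keyed by what they are meant to find.
patterns = {
    "contains tom": r"tom",
    "begins with tom": r"\btom",
    "ends with tom": r"tom\b",
    "only tom": r"\btom\b",
}

def matching_words(pattern):
    """Return the sample words in which `pattern` is found (hypothetical helper)."""
    return [w for w in words if re.search(pattern, w)]

for label, pattern in patterns.items():
    print(f"{pattern:10} ({label}): {matching_words(pattern)}")
```

Only \btom\b narrows the result down to the single word tom; the other three patterns each let at least one longer word through.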

# 2. Start and end of line

In previous articles, we learned a bit about the start and end of lines. In regular expressions, the caret ^ marks the start of a line and the dollar sign $ marks the end of a line. For example:

Suppose we need to validate a login password that must be exactly 6 digits. We can quickly write the regular expression \d{6} to check it.

A 6-digit string matches, but 7-digit and 8-digit strings match as well, which is obviously wrong. We fix this by anchoring both ends of the pattern: ^\d{6}$

Much like word boundaries, ^ requires the match to begin at a certain position and $ requires it to end at a certain position, except that the unit being delimited is a line of text rather than a word.
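A minimal Python sketch of the password check shows the difference. The function names `loose_check` and `is_valid_password` are invented for this illustration.

```python
import re

def loose_check(text):
    """Unanchored pattern: succeeds if six digits appear anywhere in text."""
    return re.search(r"\d{6}", text) is not None

def is_valid_password(text):
    """Anchored pattern: the line must consist of six digits and nothing else.

    Note that in Python, $ still accepts one trailing newline.
    """
    return re.search(r"^\d{6}$", text) is not None

for candidate in ["123456", "1234567", "12345678", "12345"]:
    print(candidate, loose_check(candidate), is_valid_password(candidate))
```

The unanchored version accepts the 7-digit and 8-digit strings because their first six digits satisfy \d{6}; the anchored version rejects them.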

We can also use \A and \z (\Z in Python) to anchor the beginning and end of the whole text:

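Note: \A and \Z are not affected by multiline mode; they match only at the very start and very end of the whole string. In Python, \Z also does not tolerate a trailing newline the way $ does, so the target string must not contain extra characters such as newlines or spaces.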
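The following Python sketch compares the two pairs of anchors. The sample strings and variable names are chosen only for this illustration.

```python
import re

text = "123456\n654321"

# With re.MULTILINE, ^ and $ match at the start and end of every line.
per_line = re.findall(r"^\d{6}$", text, flags=re.MULTILINE)

# \A and \Z ignore re.MULTILINE and match only at the ends of the whole string.
whole_string = re.findall(r"\A\d{6}\Z", text, flags=re.MULTILINE)

# $ tolerates a single trailing newline; \Z does not.
dollar_ok = re.search(r"^\d{6}$", "123456\n") is not None
z_ok = re.search(r"\A\d{6}\Z", "123456\n") is not None

print(per_line)      # ['123456', '654321']
print(whole_string)  # []
print(dollar_ok)     # True
print(z_ok)          # False
```

For strict validation of a whole input in Python, \A...\Z (or `re.fullmatch`) is therefore the safer choice.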

# 3. Lookaround

Lookaround, also known as a zero-width assertion, is similar to a word boundary but more flexible. It specifies what must, or must not, appear immediately to the left or right of the matched text, without making that surrounding text part of the match:

| Regex | Name | Meaning | Example |
| --- | --- | --- | --- |
| (?<=Y) | Positive lookbehind | The left side is Y | (?<=\d)th: th with a digit on its left; matches the th in 9th |
| (?<!Y) | Negative lookbehind | The left side is not Y | (?<!\d)th: th without a digit on its left; matches the th in health |
| (?=Y) | Positive lookahead | The right side is Y | six(?=\d): six with a digit on its right; matches the six in six6 |
| (?!Y) | Negative lookahead | The right side is not Y | six(?!\d): six without a digit on its right; matches the six in sixGod |
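The four examples can be tried out in Python as a quick sketch. The helper `first_match` is a made-up name for this illustration.

```python
import re

def first_match(pattern, text):
    """Return the matched substring, or None when the pattern is not found."""
    m = re.search(pattern, text)
    return m.group() if m else None

# Lookbehind: condition on what is to the left of "th".
print(first_match(r"(?<=\d)th", "9th"))      # th
print(first_match(r"(?<=\d)th", "health"))   # None
print(first_match(r"(?<!\d)th", "health"))   # th
print(first_match(r"(?<!\d)th", "9th"))      # None

# Lookahead: condition on what is to the right of "six".
print(first_match(r"six(?=\d)", "six6"))     # six
print(first_match(r"six(?=\d)", "sixGod"))   # None
print(first_match(r"six(?!\d)", "sixGod"))   # six
print(first_match(r"six(?!\d)", "six6"))     # None
```

In every successful case the result is only th or six: the digit that the assertion inspected is checked but never included in the match, which is why these assertions are called zero-width.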

The lookaround pattern is wrapped in parentheses. Note that these parentheses do not capture anything: they are not saved as a subgroup, so skip them when counting group numbers for references.
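A small Python sketch illustrates the group numbering. The price string and the pattern are invented for this example.

```python
import re

# The lookbehind (?<=\$) has parentheses but is not a capturing group,
# so the two (\d+) groups are numbered 1 and 2.
pattern = re.compile(r"(?<=\$)(\d+)\.(\d+)")

m = pattern.search("price: $12.50")

print(pattern.groups)  # 2
print(m.group(0))      # 12.50
print(m.group(1))      # 12
print(m.group(2))      # 50
```

The dollar sign is required by the lookbehind but appears neither in the overall match nor in any numbered group.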

These four expressions can look confusing at first glance, but they follow a pattern: a left angle bracket (<) means look to the left, no angle bracket means look to the right, an equals sign means positive, and an exclamation mark means negative. That makes them easy to remember.

Lookaround can also express the word boundaries we learned above; we can write ((?
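The expression is cut off here, so the following Python sketch is only an assumption about one way to write it, not necessarily the original formulation. For a pattern that starts and ends with a word character, such as tom, \btom\b behaves like "no word character on the left, no word character on the right". The sample sentence and variable names are invented for this illustration.

```python
import re

sentence = "tom, tomorrow, atom, atomic"

# Word boundary version from section 1.
with_boundary = [m.start() for m in re.finditer(r"\btom\b", sentence)]

# Assumed lookaround equivalent: negative lookbehind and negative lookahead on \w.
with_lookaround = [m.start() for m in re.finditer(r"(?<!\w)tom(?!\w)", sentence)]

print(with_boundary)    # [0]
print(with_lookaround)  # [0]
```

Both patterns find only the standalone word at position 0 and skip tomorrow, atom, and atomic.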

# 4. Closing remarks

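Finally, to summarize: an assertion constrains the position of a match rather than the matched text. We covered three kinds: word boundaries (\b), line start and end (^ and $, plus \A and \Z for the whole text), and lookaround, which checks what is or is not on the left or right side of the match.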

That’s the end of assertion matching for regular expressions. If you have any questions, leave me a comment. Thank you.

You can test regular expressions online at regex101.com/