1. Foreword

Regular expressions are both familiar and strange to most of us. Familiar, because they are useful in so many everyday scenarios, such as validating mobile phone numbers, email addresses, and passwords.

Strange, because a regular expression looks like a string of garbled characters, and its matching rules are hard to grasp at first glance. Search the Internet for a regular expression that implements a particular rule and you will find that the results all differ and that none behaves quite as you want; try to modify one yourself and you do not know where to start.

Spend 30 minutes today looking at matching from a different angle: understand what a match is meant to achieve and how to think about it, then learn to write regular expressions step by step (reading a regular expression is much harder than writing one).

2. Understand what a regular expression needs to do

What a regular expression does can be summed up in three soul-searching questions: Q1. What to match? Q2. What not to match? Q3. How many times to match?

Q1. What to match?

To match the character a, write /a/. The test succeeds as long as an a appears somewhere in the string:

/a/.test("javascript") //true

To match a string that begins with a, add the metacharacter ^ (the start-of-string anchor), /^a/:

/^a/.test("javascript") // false: the string does not start with a
/^a/.test("abc") // true: the string starts with a

To match a string that ends with a, add the metacharacter $ (the end-of-string anchor), /a$/:

/a$/.test("javascript") // false: the string does not end with a
/a$/.test("cba") // true: the string ends with a

To match a string that contains a or b, use a character set, /[ab]/:

/[ab]/.test("byte") // true

To match a string that contains abc or xyz, use alternation, /abc|xyz/:

/abc|xyz/.test("aabbxyz") // true: the string contains xyz
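One point worth keeping in mind when combining the pieces above: alternation has the lowest precedence in a pattern, so an anchor written next to | applies only to its own branch. The sketch below is an illustration added here, not one of the original examples, and the variable names are made up for the demonstration. It uses a group, which is covered later under Q3.

```javascript
// Anchors bind to a single branch unless the alternatives are grouped.
const loose = /^abc|xyz$/;    // "starts with abc" OR "ends with xyz"
const strict = /^(abc|xyz)$/; // the whole string is exactly abc or exactly xyz

const looseOnLonger = loose.test("abcdef");   // true: the string starts with abc
const strictOnLonger = strict.test("abcdef"); // false: not exactly abc or xyz
const strictOnExact = strict.test("xyz");     // true

console.log(looseOnLonger, strictOnLonger, strictOnExact);
```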

① Match what comes before something (use lookahead)

Lookahead, (?=exp): matches a position that is followed by exp.

exp1(?=exp2): matches an exp1 that is followed by exp2; exp2 is not part of the match.

For example, to match the java that comes before script in a string, use /java(?=script)/:

/java(?=script)/.test("javascript,javaee,typescript") // true: the java in javascript satisfies the rule

// 1. Inspect the match with exec
/java(?=script)/.exec("javascript,javaee,typescript")
// 2. The result is:
// ["java", index: 0, input: "javascript,javaee,typescript", groups: undefined]
// 3. index is 0, so the match is the java in javascript, not the java in javaee
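Because the lookahead only checks a position, the text it inspects is not consumed. A small sketch with replace makes that visible; the replacement word "type" is chosen purely for illustration.

```javascript
const text = "javascript,javaee,typescript";

// Only the "java" that is followed by "script" is replaced.
// "script" itself is not part of the match, so it stays in place.
const replaced = text.replace(/java(?=script)/g, "type");

console.log(replaced); // "typescript,javaee,typescript"
```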

② Match what comes after something (use lookbehind)

Lookbehind, (?<=exp): matches a position that is preceded by exp.

(?<=exp2)exp1: matches an exp1 that is preceded by exp2; exp2 is not part of the match.

For example, to match the ee that comes after java in a string, use /(?<=java)ee/:

/(?<=java)ee/.test("javascript,javaee,typescript") // true: the ee in javaee satisfies the rule

// 1. Inspect the match with exec
/(?<=java)ee/.exec("javascript,javaee,typescript")
// 2. The result is:
// ["ee", index: 15, input: "javascript,javaee,typescript", groups: undefined]
// 3. index is 15, so the match is the ee that follows java in javaee
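As with lookahead, the text inspected by the lookbehind is not part of the match. The sketch below assumes an environment that supports lookbehind and uses an arbitrary replacement, "EE", to show that only the ee is touched.

```javascript
const text = "javascript,javaee,typescript";

// Only the "ee" that is preceded by "java" is replaced; "java" is left alone.
const replaced = text.replace(/(?<=java)ee/g, "EE");

console.log(replaced); // "javascript,javaEE,typescript"
```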

Q2. What not to match?

To match a character that is not a, use a negated character set, /[^a]/. The test succeeds as long as the string contains at least one character other than a:

/[^a]/.test("aaa") // false: every character is a
/[^a]/.test("abc") // true: not every character is a

To match a string that does not begin with a, use /^[^a]/:

/^[^a]/.test("javascript") // true: the string does not start with a
/^[^a]/.test("abc") // false: the string starts with a

To match a string that does not end with a, use /[^a]$/:

/[^a]$/.test("javascript") // true: the string does not end with a
/[^a]$/.test("cba") // false: the string ends with a

To match a character that is none of a, b, or c, use /[^abc]/:

/[^abc]/.test("abccba") // false: every character in the string is a, b, or c
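To see exactly which characters a negated set picks out, it can help to collect every match with the g flag. This is a small sketch with made-up input strings; note that match returns null when nothing matches.

```javascript
// Every character that is not a, b, or c.
const outsiders = "abccbaxyz".match(/[^abc]/g);

// No such character exists here, so match returns null.
const none = "abccba".match(/[^abc]/g);

console.log(outsiders); // ["x", "y", "z"]
console.log(none);      // null
```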

① Match what is not followed by something (use negative lookahead)

Negative lookahead, (?!exp): matches a position that is not followed by exp.

exp1(?!exp2): matches an exp1 that is not followed by exp2; exp2 is not part of the match.

For example, to match a java in a string that is not followed by script, use /java(?!script)/:

/java(?!script)/.test("javascript,javaee,typescript") // true: the java in javaee satisfies the rule

// 1. Inspect the match with exec
/java(?!script)/.exec("javascript,javaee,typescript")
// 2. The result is:
// ["java", index: 11, input: "javascript,javaee,typescript", groups: undefined]
// 3. index is 11, so the match is the java in javaee, which is followed by ee rather than script
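Putting the positive and negative forms side by side shows how they split the same input. The sketch below uses matchAll (which requires the g flag) to list the index of every match; the variable names are illustrative only.

```javascript
const text = "javascript,javaee,typescript";

// Index of every "java" that IS followed by "script".
const followed = [...text.matchAll(/java(?=script)/g)].map((m) => m.index);

// Index of every "java" that is NOT followed by "script".
const notFollowed = [...text.matchAll(/java(?!script)/g)].map((m) => m.index);

console.log(followed);    // [0]
console.log(notFollowed); // [11]
```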

② Match what is not preceded by something (use negative lookbehind)

Negative lookbehind, (?<!exp): matches a position that is not preceded by exp.

(?<!exp2)exp1: matches an exp1 that is not preceded by exp2; exp2 is not part of the match.

For example, to match a script in a string that is not preceded by java, use /(?<!java)script/:

/(?<!java)script/.test("javascript,javaee,typescript") // true: the script in typescript satisfies the rule

// 1. Inspect the match with exec
/(?<!java)script/.exec("javascript,javaee,typescript")
// 2. The result is:
// ["script", index: 22, input: "javascript,javaee,typescript", groups: undefined]
// 3. index is 22, so the match is the script in typescript, not the script in javascript
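A replace call is a quick way to confirm which occurrence the negative lookbehind selects. This sketch assumes lookbehind is supported in your environment; the uppercase replacement is only there to make the change easy to spot.

```javascript
const text = "javascript,javaee,typescript";

// The "script" in javascript is preceded by "java", so it is skipped.
// The "script" in typescript is preceded by "type", so it is replaced.
const replaced = text.replace(/(?<!java)script/g, "SCRIPT");

console.log(replaced); // "javascript,javaee,typeSCRIPT"
```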

③ Match a string that does not contain the sequence abc

This is a special case. If you simply write /[^abc]/, the members of the set are alternatives: the pattern matches any single character that is not a, b, or c, so it cannot tell whether the string contains the consecutive sequence abc.

We need to look at it from another angle: abc must not appear starting at any position in the string. Negative lookahead expresses exactly that, and the regular expression can be built step by step:

  1. The matching position must not be followed by abc. Use negative lookahead to match such a position: /(?!abc)/
  2. The position may be followed by any other character, represented here by \w: /(?!abc)\w/
  3. This can repeat many times in a row, so add + as the quantifier: /((?!abc)\w)+/
  4. Every position from start to end must be covered, so add the start and end anchors: /^((?!abc)\w)+$/

/^((?!abc)\w)+$/.test("cbacbac") // true: the string does not contain the sequence abc
/^((?!abc)\w)+$/.test("cbacbabc") // false: the string contains the sequence abc
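The four steps above can be wrapped in a small helper so the forbidden sequence becomes a parameter. This is only a sketch: escapeRegExp and withoutSequence are hypothetical names introduced here, not built-in functions. Like the original pattern, the result accepts only \w characters and rejects the empty string.

```javascript
// Hypothetical helper: escape characters that are special in a regular expression.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: build /^((?!seq)\w)+$/ for an arbitrary forbidden sequence.
function withoutSequence(seq) {
  return new RegExp("^((?!" + escapeRegExp(seq) + ")\\w)+$");
}

const noAbc = withoutSequence("abc");

console.log(noAbc.source);           // ^((?!abc)\w)+$
console.log(noAbc.test("cbacbac"));  // true
console.log(noAbc.test("cbacbabc")); // false
console.log(noAbc.test(""));         // false: + requires at least one character
```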

You may have a question here: why use negative lookahead (what does not follow) rather than negative lookbehind (what does not precede), requiring only that no position is preceded by abc?

That does not work with this pattern. A lookbehind placed before \w checks what precedes each character, so an abc at the very end of the string is never checked, because no character follows it. /^((?<!abc)\w)+$/ therefore still accepts a string that ends in abc.
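A quick experiment shows how the lookbehind variant behaves. This is a sketch assuming an environment with lookbehind support; the test strings are made up for the comparison.

```javascript
const withLookahead = /^((?!abc)\w)+$/;
const withLookbehind = /^((?<!abc)\w)+$/;

// abc at the end of the string: no character follows it,
// so the lookbehind before \w never gets a chance to see it.
const aheadTrailing = withLookahead.test("xabc");   // false: correctly rejected
const behindTrailing = withLookbehind.test("xabc"); // true: wrongly accepted

// abc followed by another character: the lookbehind before "x" catches it.
const behindLeading = withLookbehind.test("abcx");  // false

console.log(aheadTrailing, behindTrailing, behindLeading);
```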

Of course, in this example you could simply check whether the string contains abc. The point here is only to show how negative lookahead is used.

What "ahead" and "behind" mean

In the figure, the green characters and positions have already been matched and checked, the red character is currently being checked, and the gray characters have not been checked yet. Matching proceeds from left to right. The "ahead" in lookahead and the "behind" in lookbehind are relative to the matching direction, not to the order of positions in the string: "ahead" is the text to the right that has not been scanned yet, and "behind" is the text to the left that has already been scanned. Lookahead and lookbehind are also called forward and backward zero-width assertions (zero-width means that they match only a position and consume no characters); forward and backward there also refer to the matching direction.

So lookahead looks forward along the matching direction from the current position and checks that the characters ahead are exp. Lookbehind looks back from the current position and checks that the characters behind are exp. Negative lookahead looks forward and checks that the characters ahead are not exp. Negative lookbehind looks back and checks that the characters behind are not exp.

Note: lookbehind and negative lookbehind are not supported in some languages and environments. Check for support before relying on them.
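One way to check for support at run time, sketched below, is to compile a lookbehind pattern through the RegExp constructor and catch the syntax error that an unsupported engine throws. supportsLookbehind is a hypothetical helper name. The constructor is used instead of a literal because an unsupported literal would fail when the script is parsed, before any try block could run.

```javascript
// Hypothetical helper: returns true if this engine can compile a lookbehind.
function supportsLookbehind() {
  try {
    new RegExp("(?<=a)b"); // throws a SyntaxError where lookbehind is unsupported
    return true;
  } catch (error) {
    return false;
  }
}

const lookbehindAvailable = supportsLookbehind();
console.log(lookbehindAvailable);
```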

To make lookahead and lookbehind easier to understand, I wrote a separate article on the topic; if you are interested, see "One diagram to master regex lookahead and lookbehind".

Q3. How many times to match?

For example, /\d/ matches a single digit. The simplest way to match three consecutive digits is to write it three times in a row: /\d\d\d/.

But that does not scale when the count is large or not known in advance, so we add a quantifier:

* : matches any number of times (zero or more)

+ : matches at least once

? : matches zero times or once

{m} : matches exactly m times

{m,} : matches at least m times

{m,n} : matches at least m times and at most n times; m must be less than or equal to n
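The quantifiers listed above can be checked one by one with a small table of cases. This is a sketch with made-up inputs; each pattern is anchored with ^ and $ so that the quantifier has to account for the whole string.

```javascript
const cases = [
  { re: /^a*$/,     input: "",     expected: true  }, // * allows zero occurrences
  { re: /^a+$/,     input: "",     expected: false }, // + needs at least one
  { re: /^ab?$/,    input: "a",    expected: true  }, // ? allows the b to be absent
  { re: /^a{3}$/,   input: "aaa",  expected: true  }, // exactly three
  { re: /^a{2,}$/,  input: "a",    expected: false }, // at least two
  { re: /^a{2,3}$/, input: "aaaa", expected: false }, // at most three
];

// Run every case and record whether the outcome matches the expectation.
const results = cases.map(({ re, input, expected }) => re.test(input) === expected);

console.log(results.every(Boolean)); // true
```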

Quantifiers are greedy by default. To make one non-greedy, add ? after it, for example:

/\d{2,}/.exec("1234567890")
// The greedy match takes every digit:
// ["1234567890", index: 0, input: "1234567890", groups: undefined]

// Add ? after the quantifier to make it non-greedy
/\d{2,}?/.exec("1234567890")
// The match now stops at the minimum of two digits:
// ["12", index: 0, input: "1234567890", groups: undefined]
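The difference matters most when a pattern has a closing delimiter that can occur more than once. The sketch below uses a made-up snippet of markup purely as test data; it is not a recommendation to parse HTML with regular expressions.

```javascript
const html = "<b>one</b><b>two</b>";

// Greedy: .* runs to the last </b> it can find.
const greedy = html.match(/<b>.*<\/b>/)[0];

// Non-greedy: .*? stops at the first </b>.
const lazy = html.match(/<b>.*?<\/b>/)[0];

console.log(greedy); // "<b>one</b><b>two</b>"
console.log(lazy);   // "<b>one</b>"
```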

① Use groups

What if we want to match a word repeated several times, such as regregregregregreg? Here reg appears 6 times in a row. Writing reg six times in the regular expression would be clumsy, so we use a group instead, /(reg){6}/:

/(reg){6}/.test("regregregregregreg") // true: the match succeeds

This kind of grouping only works when we already know that the string to repeat is reg. What if we want to match consecutive doubled pairs such as 8899 or 5522 without knowing the digits in advance? We can put the first item of each pair in a group and refer back to what the group captured with \n (where n is the group number) to match the item that follows:

/(\d)\1(\d)\2/.exec("2345566789")
// Matches 5566; group 1 captures 5 and group 2 captures 6
// ["5566", "5", "6", index: 3, input: "2345566789", groups: undefined]
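One detail of this pattern is that it also accepts four identical digits, such as 1111, because nothing forces the second pair to differ from the first. If that is not wanted, a negative lookahead with a backreference can rule it out. The sketch below is one possible refinement, with made-up test strings, rather than part of the original pattern.

```javascript
// Collect every doubled pair in a longer string (matchAll requires the g flag).
const found = [..."2345566789 118822".matchAll(/(\d)\1(\d)\2/g)].map((m) => m[0]);

// The original pattern accepts four identical digits.
const acceptsSame = /(\d)\1(\d)\2/.test("1111");

// (?!\1) requires the second pair to start with a different digit.
const distinctPairs = /(\d)\1(?!\1)(\d)\2/;
const rejectsSame = distinctPairs.test("1111");
const stillFinds = distinctPairs.exec("2345566789")[0];

console.log(found);       // ["5566", "1188"]
console.log(acceptsSame); // true
console.log(rejectsSame); // false
console.log(stillFinds);  // "5566"
```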

② Capturing groups

Groups capture by default. \1 and \2 above refer to the captures from inside the regular expression. From outside, the captured text is available through RegExp.$1 to RegExp.$9 once the regular expression has matched, whether the match was made with test, exec, or the string method replace. These static properties are a legacy feature, so check that your environment supports them.

Using test:

/([a-z]{2})(\d{2})/.test("xyz123")
RegExp.$1 // "yz", the text matched by the first group
RegExp.$2 // "12", the text matched by the second group

Using replace:

"xyz123".replace(/([a-z]{2})(\d{2})/, '$2$1')
// returns "x12yz3": yz from the first group is swapped with 12 from the second group
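The same captures can also be read without the RegExp.$1 to $9 static properties: the array returned by exec holds them by position, and named groups make them available by name. The sketch below shows both; the group names letters and digits are chosen here for illustration.

```javascript
// Captures by position: index 0 is the whole match, 1 and 2 are the groups.
const match = /([a-z]{2})(\d{2})/.exec("xyz123");
const [whole, firstGroup, secondGroup] = match;

// Captures by name, using named groups.
const named = /(?<letters>[a-z]{2})(?<digits>\d{2})/.exec("xyz123");

// Named groups can be referenced in a replacement string with $<name>.
const swapped = "xyz123".replace(
  /(?<letters>[a-z]{2})(?<digits>\d{2})/,
  "$<digits>$<letters>"
);

console.log(whole, firstGroup, secondGroup);            // "yz12" "yz" "12"
console.log(named.groups.letters, named.groups.digits); // "yz" "12"
console.log(swapped);                                   // "x12yz3"
```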

③ Non-capturing groups

If you do not want a group to capture, add ?: at the start of the group:

/([a-z]{2})(?:\d{2})/.test("xyz123")
RegExp.$1 // "yz", the text matched by the first group
RegExp.$2 // "": the second group does not capture, so this is an empty string
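The effect is also visible in the array returned by exec, which has one slot per capturing group. The sketch below compares the two forms and reuses the earlier reg example to show a group that is needed only for repetition.

```javascript
const capturing = /([a-z]{2})(\d{2})/.exec("xyz123");
const nonCapturing = /([a-z]{2})(?:\d{2})/.exec("xyz123");

console.log(Array.from(capturing));    // ["yz12", "yz", "12"]
console.log(Array.from(nonCapturing)); // ["yz12", "yz"]

// A group used only for repetition does not need to capture anything.
const repeated = /(?:reg){6}/.exec("regregregregregreg");
console.log(repeated.length); // 1: only the whole match is stored
```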

3. Summary

This article does not go into the meaning of each metacharacter or how to combine them; those are things you simply have to memorize. Instead, it shows how to approach a regular expression: analyze the matching requirement step by step, break a long, complex requirement into short, simple ones, consider both the positive and the negative form of each requirement, combine the pieces bit by bit, and keep returning to the three soul-searching questions (What to match? What not to match? How many times to match?) until the regular expression is complete.

A final recommendation is regexper.com, which illustrates regular