This article is based on an internal sharing session in the technology department.

1. Regular expressions

1.1 What is a regular expression

A regular expression is a special string pattern used to match a set of strings. Think of it as a mold: it defines a rule, and only the text that fits the rule is matched. Put simply, it is a sieve.

1.2 Online matching tools

  • regex101.com
  • rubular.com

2. Basic usage

2.1 Delimiters

In PHP, a pattern is wrapped in a pair of delimiters, which can be //, ##, ~~, @@, and so on; // is the conventional choice. A JS regex literal always uses //.

2.2 Metacharacters

  • \w: Matches a letter, digit, or underscore

  • \W: Matches any character that is not a letter, digit, or underscore

  • \d: Matches any decimal digit

  • \D: Matches any non-digit character

  • \s: Matches any whitespace character: space, line feed, carriage return, tab, form feed, vertical tab (roughly [ \f\n\r\t\v])

  • \S: Matches any non-whitespace character

  • \b: Matches a word boundary

  • \B: Matches a position that is not a word boundary

  • |: Alternation, meaning "or"

  • []: A character set

[a-z] matches any single character from a to z

  • Dot (.): Matches any single character except a newline

  • Matching Chinese characters: the range 4e00-9fa5 covers common Chinese characters. The escape is written differently in PHP and JS:

$regex = '/[\x{4e00}-\x{9fa5}]/u';   // PHP
var regex = /[\u4e00-\u9fa5]/;       // JS (a regex literal, not a quoted string)

2.3 The special symbols ^ and $

Inside [], ^ means negation ("anything except").

[^\d] matches any character except a digit

Outside [], ^ marks where the match must start and $ marks where it must end.

For example, /^a.*\d$/ matches only strings that start with a and end with a digit
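A short JS sketch of both roles of ^, using the patterns from this section. The sample strings and variable names are assumptions chosen for illustration.

```javascript
// ^ and $ as anchors: the whole string must start with "a" and end with a digit.
const startsWithAEndsWithDigit = /^a.*\d$/;
console.log(startsWithAEndsWithDigit.test("abc1")); // true
console.log(startsWithAEndsWithDigit.test("ba1"));  // false (does not start with a)
console.log(startsWithAEndsWithDigit.test("abc"));  // false (does not end with a digit)

// ^ inside []: negated set, here "any character that is not a digit".
const nonDigit = /[^\d]/;
console.log(nonDigit.test("123")); // false
console.log(nonDigit.test("12a")); // true

// A typical use of the negated set: strip everything except digits.
const digitsOnly = "tel: 010-1234".replace(/[^\d]/g, "");
console.log(digitsOnly); // 0101234
```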

2.4 Quantifiers

  • {} form

    • {n}: Matches the preceding item exactly n times
    • {n,}: Matches at least n times
    • {n,m}: Matches at least n times and at most m times
  • *: Any number of times, equivalent to {0,}

  • +: At least once, equivalent to {1,}

  • ?: 0 times or 1 time, equivalent to {0,1}

For example, you(r)? matches you or your
  • Note
Placed after a metacharacter (or any other item), ? is a quantifier meaning 0 or 1 occurrences. Placed after another quantifier, ? switches that quantifier to lazy mode.
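The quantifiers can be tried out as follows. This is a sketch in JS with illustrative helper names; the patterns are anchored with ^ and $ so that the counts apply to the whole string.

```javascript
const isYear = (s) => /^\d{4}$/.test(s);        // {n}: exactly 4 digits
const atLeastTwoA = (s) => /^a{2,}$/.test(s);   // {n,}: 2 or more
const twoToThreeA = (s) => /^a{2,3}$/.test(s);  // {n,m}: between 2 and 3
const youOrYour = (s) => /^you(r)?$/.test(s);   // ?: optional "r"

console.log(isYear("2018"), isYear("18"));                // true false
console.log(atLeastTwoA("a"), atLeastTwoA("aaaa"));       // false true
console.log(twoToThreeA("aaa"), twoToThreeA("aaaa"));     // true false
console.log(youOrYour("you"), youOrYour("your"), youOrYour("yours")); // true true false
console.log(/^ab*$/.test("a"), /^ab+$/.test("a"));        // true false

// Caveat: with a space inside the braces, JS no longer reads a quantifier
// and matches the characters "{2, 3}" literally.
const spaced = /^a{2, 3}$/;
console.log(spaced.test("aaa"), spaced.test("a{2, 3}"));  // false true
```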

3. Advanced usage

3.1 Lazy mode (match as little as possible)

  • *?: Matches any number of times, but as few as possible
For example, a.*?b applied to babacb matches only ab
  • +?: Matches one or more times, but as few as possible
For example, a.+?b applied to babacb matches abacb
  • ??: Matches 0 or 1 times, but as few as possible
For example, a.??b applied to babacb matches ab
  • {n,m}?: Matches n to m times, but as few as possible
For example, a{0,6}? applied to aaaaaa matches the empty string, because the minimum is 0 times
  • {n,}?: Matches at least n times, but as few as possible
For example, a{2,}? applied to aaaaaa matches aa
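The lazy-mode examples above can be reproduced directly. A minimal JS sketch; `firstMatch` is a hypothetical helper that returns the text of the first match, or null when there is none.

```javascript
// Return the text of the first match, or null if the pattern does not match.
const firstMatch = (re, s) => {
  const m = s.match(re);
  return m ? m[0] : null;
};

console.log(firstMatch(/a.*?b/, "babacb"));   // ab
console.log(firstMatch(/a.+?b/, "babacb"));   // abacb
console.log(firstMatch(/a.??b/, "babacb"));   // ab
console.log(JSON.stringify(firstMatch(/a{0,6}?/, "aaaaaa"))); // "" (empty match)
console.log(firstMatch(/a{2,6}?/, "aaaaaa")); // aa
console.log(firstMatch(/a{2,}?/, "aaaaaa"));  // aa
```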

3.2 Greedy mode

Greedy mode matches as much as possible, provided that the expression as a whole still matches. It is the default behavior of quantifiers.
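To see the difference from lazy mode, the same input can be matched both ways. This is a sketch in JS; the sample strings and variable names are assumptions for illustration.

```javascript
const text = "<b>one</b> and <b>two</b>";

// Greedy: .* runs to the last </b> it can reach.
const greedy = text.match(/<b>.*<\/b>/)[0];
// Lazy: .*? stops at the first </b>.
const lazy = text.match(/<b>.*?<\/b>/)[0];
console.log(greedy); // <b>one</b> and <b>two</b>
console.log(lazy);   // <b>one</b>

// "Provided the whole expression still matches": the greedy \d+ first takes
// all five digits, then gives two back so that \d{2} can match.
const parts = "12345".match(/(\d+)(\d{2})/);
console.log(parts[1], parts[2]); // 123 45
```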

3.3 Capturing subgroups: (pattern)

A capturing subgroup saves the content matched by a subexpression into a numbered or explicitly named group in memory, so that it can be referenced later.

(pattern) matches pattern and captures the match. The captured content can be read from the resulting matches collection, and $1 through $9 refer to the corresponding groups.
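In JS, the matches collection is the array returned by `String.prototype.match`, and `$1` to `$9` are available in the replacement string of `replace`. A small sketch with an assumed date string:

```javascript
const date = "2018-07-25";
const datePattern = /(\d{4})-(\d{2})-(\d{2})/;

// Index 0 is the whole match; indexes 1..3 are the captured groups.
const m = date.match(datePattern);
console.log(m[0], m[1], m[2], m[3]); // 2018-07-25 2018 07 25

// $1, $2, $3 in the replacement string refer to the same groups.
const reordered = date.replace(datePattern, "$3/$2/$1");
console.log(reordered); // 25/07/2018
```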

3.3.1 Classification

  • Ordinary capturing subgroup: (pattern)

  • Named capturing subgroup: (?<name>pattern)

3.3.2 Capture numbering rules

Capturing subgroups are numbered by the position of their opening "(", from left to right, starting at 1. For nested groups this means the outer group comes before the inner one. Number 0 is the whole match.

  • (\d{4})-(\d{2})-(\d{2}), matched against the date 2018-07-25:

No.  Capturing subgroup         Matched content
0    (\d{4})-(\d{2})-(\d{2})    2018-07-25
1    (\d{4})                    2018
2    (\d{2})                    07
3    (\d{2})                    25
  • (\d{4})-(\d{2}-(\d{2}))

No.  Capturing subgroup         Matched content
0    (\d{4})-(\d{2}-(\d{2}))    2018-07-25
1    (\d{4})                    2018
2    (\d{2}-(\d{2}))            07-25
3    (\d{2})                    25
  • (?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})

No.  Name    Capturing subgroup                              Matched content
0            (?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})    2018-07-25
1    year    (?<year>\d{4})                                  2018
2    month   (?<month>\d{2})                                 07
3    day     (?<day>\d{2})                                   25
  • (?<year>\d{4})-(?<date>\d{2}-(?<day>\d{2}))

No.  Name    Capturing subgroup                             Matched content
0            (?<year>\d{4})-(?<date>\d{2}-(?<day>\d{2}))    2018-07-25
1    year    (?<year>\d{4})                                 2018
2    date    (?<date>\d{2}-(?<day>\d{2}))                   07-25
3    day     (?<day>\d{2})                                  25
  • (\d{4})-(?<date>\d{2}-(\d{2}))

No.  Name    Capturing subgroup               Matched content
0            (\d{4})-(?<date>\d{2}-(\d{2}))   2018-07-25
1            (\d{4})                          2018
2    date    (?<date>\d{2}-(\d{2}))           07-25
3            (\d{2})                          25
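The numbering rules can be confirmed in JS, where named groups are additionally exposed on the `groups` property of the match result. A minimal sketch, assuming the same date string as in the tables:

```javascript
const date = "2018-07-25";

// Nested unnamed groups: numbering follows the opening parentheses.
const nested = date.match(/(\d{4})-(\d{2}-(\d{2}))/);
console.log(Array.from(nested)); // [ '2018-07-25', '2018', '07-25', '25' ]

// Named groups keep their numbers and can also be read by name.
const named = date.match(/(?<year>\d{4})-(?<date>\d{2}-(?<day>\d{2}))/);
console.log(named[2], named.groups.date, named.groups.day); // 07-25 07-25 25

// Mixing named and unnamed groups does not change the numbering.
const mixed = date.match(/(\d{4})-(?<date>\d{2}-(\d{2}))/);
console.log(mixed[1], mixed[2], mixed[3], mixed.groups.date); // 2018 07-25 25 07-25
```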

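3.3.3 Usage

  • Get the image address from an <img> tag (lazy mode)

<img[^>]*src="(.+?)".*>

  • Greedy mode, for comparison: (.+) runs to the last double quote it can reach, so the capture can include later attributes

<img[^>]*src="(.+)".*>

  • Batch-change image addresses to the path/images directory and change the image extension to jpeg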
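The three uses above could look like this in JS. This is only a sketch: the HTML snippet is invented, the variable names are illustrative, and the tail of the extraction pattern is changed from the article's `.*>` to `[^>]*>` so that several tags on one line are matched separately. The rewrite pattern is one possible way to do the batch change, not the article's own.

```javascript
const html =
  '<p><img class="a" src="/upload/cat.png" alt="cat">' +
  '<img src="http://example.com/x/dog.gif"></p>';

// Lazy capture: stops at the first closing quote after src=".
const srcs = [...html.matchAll(/<img[^>]*src="(.+?)"[^>]*>/g)].map((m) => m[1]);
console.log(srcs); // [ '/upload/cat.png', 'http://example.com/x/dog.gif' ]

// Greedy capture on a single tag: runs to the last quote and swallows alt.
const greedy = '<img src="/upload/cat.png" alt="cat">'.match(/<img[^>]*src="(.+)".*>/)[1];
console.log(greedy); // /upload/cat.png" alt="cat

// Batch rewrite: $1 keeps the tag up to src=", $2 is the file name without
// directory or extension, $3 is the closing quote.
const rewritten = html.replace(
  /(<img[^>]*src=")[^"]*?([^"\/]+)\.\w+(")/g,
  "$1path/images/$2.jpeg$3"
);
console.log(rewritten);
```

For real-world HTML, an HTML parser is usually more robust than a regular expression; the patterns here only illustrate capturing groups.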

3.4 Non-capturing subgroups: (?:pattern)

  • Unlike a capturing subgroup, it groups without capturing the text, and it is not counted in the group numbering
For example, industr(?:y|ies) matches industry or industries; it is equivalent to 'industry|industries'
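The effect on the match result and on numbering can be seen by comparing both kinds of group. A short JS sketch with illustrative variable names:

```javascript
// Capturing group: the alternative that matched is stored as group 1.
const capturing = "industries".match(/industr(y|ies)/);
console.log(Array.from(capturing)); // [ 'industries', 'ies' ]

// Non-capturing group: same match, nothing stored.
const nonCapturing = "industries".match(/industr(?:y|ies)/);
console.log(Array.from(nonCapturing)); // [ 'industries' ]

// Non-capturing groups are skipped when numbering the capturing ones.
const skipped = "2018-07".match(/(?:\d{2})(\d{2})-(\d{2})/);
console.log(skipped[1], skipped[2]); // 18 07
```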

3.5 Backreferences

A backreference refers, inside the pattern itself, to the text captured by an earlier subexpression. A typical use is matching characters that repeat consecutively.

  • How to reference

    • Use \ followed by a number: \1 refers to the first capturing subgroup, \2 to the second, and so on.

3.5.1 Usage

  • Match n consecutive identical characters (n-1 is a placeholder for the actual number; for n = 3 write \1{2})
$regex = '/(.)\1{n-1}/';
  • Match a string of the form xxabc123xx
$regex = '/(\w)\1(.*)\1{2}/';
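Both patterns can be exercised in JS as follows. This is a sketch under assumptions: `repeated` is a hypothetical helper that builds the pattern for a given n, and the sample strings are made up.

```javascript
// Build /(.)\1{n-1}/ for a concrete n (n identical characters in a row).
const repeated = (n) => new RegExp(`(.)\\1{${n - 1}}`);

console.log(repeated(3).test("heeello")); // true  ("eee")
console.log(repeated(3).test("hello"));   // false (only "ll")

// xxabc123xx: a doubled character, anything, then the same character doubled.
const framed = /(\w)\1(.*)\1{2}/;
const m = "xxabc123xx".match(framed);
console.log(m[1], m[2]);                  // x abc123
console.log(framed.test("xxabc123yy"));   // false

// Another common use: collapse runs of a repeated character.
const collapsed = "aaabbbccd".replace(/(.)\1+/g, "$1");
console.log(collapsed); // abcd
```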

3.6 Lookahead

  • Lookahead, also called a forward lookaround, comes in a positive form and a negative form. It checks the content that follows the current position against pattern, without consuming it.

3.6.1 Positive form (?=pattern): followed by pattern

  • /industr(?=y|ies)/

  • Split the string HelloWorldHelloPhp into words, giving Hello World Hello Php

3.6.2 Negative form (?!pattern): not followed by pattern

  • windows(?!\d+): finds windows that is not followed by a number
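The lookahead examples can be sketched in JS as below. The splitting approach shown (split at every position that is followed by an uppercase letter) is one possible solution, not necessarily the one the article had in mind; variable names are illustrative.

```javascript
// Positive lookahead: "y" or "ies" must follow but is not part of the match.
const stem = "industries".match(/industr(?=y|ies)/)[0];
console.log(stem);                                   // industr
console.log("industrial".match(/industr(?=y|ies)/)); // null

// Split before each uppercase letter, then join the words with spaces.
const words = "HelloWorldHelloPhp".split(/(?=[A-Z])/).join(" ");
console.log(words); // Hello World Hello Php

// Negative lookahead: "windows" not followed by a digit.
const notVersioned = /windows(?!\d+)/;
console.log(notVersioned.test("windows xp")); // true
console.log(notVersioned.test("windows10"));  // false
```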

3.7 Lookbehind

  • Lookbehind, also called a backward lookaround, also comes in a positive form and a negative form. It checks the content that precedes the current position against pattern.

3.7.1 Positive form (?<=pattern): preceded by pattern

  • (?<=[\x{4e00}-\x{9fa5}])\d+: finds a run of digits preceded by a Chinese character

3.7.2 Negative form (?<!pattern): not preceded by pattern

  • (?<!\w)\d+: finds a run of digits not preceded by a letter, digit, or underscore
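A JS sketch of both lookbehind forms. It uses the JS escape \u4e00-\u9fa5 in place of the PHP form \x{4e00}-\x{9fa5}, and assumes an engine with lookbehind support (for example a current Node.js); the sample strings are made up.

```javascript
// Positive lookbehind: digits that come right after a Chinese character.
const afterChinese = /(?<=[\u4e00-\u9fa5])\d+/;
console.log("编号123".match(afterChinese)[0]); // 123
console.log("no123".match(afterChinese));      // null

// Negative lookbehind: digits not preceded by a letter, digit, or underscore.
// "123" is skipped because each of its digits follows a word character.
const standalone = "abc123 456".match(/(?<!\w)\d+/g);
console.log(standalone); // [ '456' ]
```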