Preface

Regular expressions are a technology that has always bothered many programmers, including me. Most of the time, when we need one during development, we search Google or Baidu and copy and paste the result.

Because regular expressions are so versatile, I believe it pays to understand them deeply. I hope that after this article we will understand regular expressions well enough to solve regex-related problems without the help of a search engine.

Feel the charm of regular expressions

1. Extract all digits from the text

let title = "It is 12:10 am, August 22, 2021";
// Without a regex: keep only the characters that parse as a number
let time = [...title].filter((item) => !Number.isNaN(Number.parseInt(item)));
console.log(time.join("")); // 1210222021
// With a regex
let numbs = title.match(/\d/g);
console.log(numbs.join("")); // 1210222021
// The two results are exactly the same
console.log(time.join("") === numbs.join("")); // true

2. How do I check whether the entered mobile phone number is valid?

/^1[3456789]\d{9}$/

/^1(3|4|5|6|7|8|9)\d{9}$/

/^1\d{10}$/

By comparison, regular expressions produce more concise and more powerful code than manipulating characters with ordinary functions.

0x01 What it is

A regular expression is a tool for matching patterns in strings so that text can be searched and replaced. It originated in mathematical work done in the 1950s and was later brought into computing. As the name suggests, it is an expression that describes a rule. The underlying principle is also simple: pattern matching with a state machine.

In short, there are two ways to implement a regular expression engine: DFA (Deterministic Finite Automaton) and NFA (Non-deterministic Finite Automaton).

0x02 Basics

1. Characters

Single characters

The simplest regular expressions consist of plain digits and letters. They carry no special semantics and match one-to-one. To find the character 'a' in the word 'apple', just use the regex /a/.

But what if we want to find a character such as /, which has a special meaning in a regex? Then we need the escape character \.

The backslash is the escape character: it changes the meaning of the character that follows it, and it is used when a character has more than one meaning. Writing \/ turns the slash into an ordinary character.

If the character is not a special character, escaping it can give it a special meaning instead: d is just a letter, but \d matches any digit. We often need to match characters such as spaces, tabs, carriage returns, and newlines, and these are matched with escapes such as \s, \t, \r, and \n.
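A minimal sketch of both directions of escaping in JavaScript. The sample strings are made up for illustration.

```javascript
// Escaping removes a special meaning: \/ is a literal slash, \. is a literal dot.
const url = "https://regex101.com/";
const slashes = url.match(/\//g); // every "/" in the string
console.log(slashes.length); // 3

// Unescaped, the dot matches any character; escaped, only a real dot.
const looseDot = /a.c/.test("abc"); // true
const literalDot = /a\.c/.test("abc"); // false

// Escaping adds a special meaning: d is a letter, \d is any digit,
// \t is a tab, \n is a newline and \s is any whitespace character.
const row = "id\t42\n";
const plainD = /d/.test("42"); // false
const digit = /\d/.test("42"); // true
const fields = row.split(/\s/); // ["id", "42", ""]
console.log(looseDot, literalDot, plainD, digit, fields);
```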

Multiple characters

Single-character matching is one-to-one: each character in the regex matches exactly one character of text. That is clearly not enough. By introducing sets, ranges, and wildcards, one position in the pattern can match many different characters.
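As a rough illustration of sets, ranges, and wildcards, the sketch below runs a few patterns over a made-up sample string.

```javascript
const text = "cat cot cut c9t c_t";

// Set: exactly one of the listed characters.
const bySet = text.match(/c[aou]t/g); // ["cat", "cot", "cut"]

// Range: one character between the two endpoints.
const byRange = text.match(/c[0-9]t/g); // ["c9t"]

// Negated set: one character that is NOT listed.
const byNegation = text.match(/c[^a-z]t/g); // ["c9t", "c_t"]

// Wildcards: . is any character except a line break, \w is a word character.
const byDot = text.match(/c.t/g); // all five words
const byWord = text.match(/c\wt/g); // all five, since \w includes digits and "_"

console.log(bySet, byRange, byNegation, byDot.length, byWord.length);
```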

2. Repeat match modifiers

We use repeat match modifiers (quantifiers) when we want something to repeat: + (one or more), * (zero or more), ? (zero or one), and {n,m} (between n and m times).

Basic use

let name = "sooooo";
// + : one or more
console.log(name.match(/so+/)[0]); // sooooo
// * : zero or more
console.log(name.match(/so*/)[0]); // sooooo
// ? : zero or one
console.log(name.match(/so?/)[0]); // so
// {1,2} : at least one and at most two
console.log(name.match(/so{1,2}/)[0]); // soo

Effect of repeated matching on atomic groups

let name = "sososososososos";
// The quantifier applies to the whole group, not just to the last character
console.log(name.match(/(so)+/g)); // ["sososososososo"]

Non-greedy (lazy) mode

Repetition is greedy by default: a quantifier matches as much as it can. Sometimes that is not what we want. Adding ? after a quantifier turns greedy matching off, so it matches as little as possible.

let name = "soooooo";
// *? : zero or more; the question mark makes it match zero
let reg = /so*?/g;
console.log(name.match(reg)); // ["s"]
// +? : one or more; the question mark makes it match only one
reg = /so+?/g;
console.log(name.match(reg)); // ["so"]
// ?? : zero or one; the question mark makes it match zero
reg = /so??/g;
console.log(name.match(reg)); // ["s"]
// {2,5}? : two to five; the question mark makes it match only two
reg = /so{2,5}?/g;
console.log(name.match(reg)); // ["soo"]

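Exclusive mode

If a quantifier is followed by a plus sign (+), exclusive (possessive) mode is turned on. Like greedy mode, exclusive mode matches as much as it can. The difference is that it never backtracks: if the rest of the pattern then fails, the whole match fails instead of giving characters back. JavaScript regular expressions do not support this syntax, although engines such as Java and PCRE do.

Here are the quantifier forms for the three modes:

  • Greedy: X?, X*, X+, X{n,m}

  • Lazy: X??, X*?, X+?, X{n,m}?

  • Exclusive: X?+, X*+, X++, X{n,m}+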
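The three modes can be compared in JavaScript with a small sketch. Because JavaScript has no exclusive syntax, the last pattern emulates it with a lookahead plus a backreference; that emulation is a substitute technique chosen for this example, not a built-in mode.

```javascript
const digits = "12345";

// Greedy: \d+ first takes all five digits, then backtracks one so \d can match.
const greedy = /^\d+\d$/.test(digits); // true

// Lazy: \d+? takes one digit and grows only as far as needed.
const lazy = digits.match(/\d+?/)[0]; // "1"

// JavaScript rejects the exclusive syntax with a SyntaxError.
let possessiveSupported = true;
try {
  new RegExp("\\d++");
} catch (err) {
  possessiveSupported = false;
}

// Emulation: a lookahead never gives characters back once it has succeeded,
// so (?=(\d+))\1 consumes every digit and leaves nothing for the final \d.
const emulated = /^(?=(\d+))\1\d$/.test(digits); // false

console.log(greedy, lazy, possessiveSupported, emulated);
```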

3. Boundaries

Words are the basic units that form sentences and articles. A common use scenario is to find specific words in articles or sentences. Such as:

The cat scattered his food all over the room. 

Suppose we want to find cat. Using the regex /cat/ alone matches both the word cat and the cat inside scattered. In this case we need the boundary metacharacter \b, where b is the first letter of boundary. In the regex engine it matches the position between a character that can form a word (\w) and a character that cannot (\W).

Rewrite the above example as /\bcat\b/ to match the word cat.
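A short sketch of the difference, using the sentence from the example. \B, the opposite of \b, is added here for contrast and is not part of the original text.

```javascript
const sentence = "The cat scattered his food all over the room.";

// Without boundaries the pattern also hits the "cat" inside "scattered".
const loose = sentence.match(/cat/g); // ["cat", "cat"]

// With \b on both sides only the standalone word is matched.
const strict = sentence.match(/\bcat\b/g); // ["cat"]
const position = sentence.search(/\bcat\b/); // 4

// \B is a position that is NOT a word boundary.
const inner = sentence.search(/\Bcat\B/); // 9, inside "scattered"

console.log(loose.length, strict.length, position, inner);
```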

4. Pattern modifiers

A regex runs in its default mode unless told otherwise. When the default is not what we want, pattern modifiers (flags) change it.

  • g: global matching, find every match instead of only the first

  • i: case-insensitive matching

  • m: multi-line matching, ^ and $ also match at line breaks

  • s: treat the string as a single line, so . also matches line breaks

  • y: sticky matching, each match must start exactly where the previous one ended, and matching stops as soon as that fails
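The modifiers can be tried out with a sketch like the following. The sample strings are invented for the demonstration.

```javascript
const lines = "Apple\napple pie\nAPPLE";

// g finds every match; i ignores case.
const all = lines.match(/apple/gi); // ["Apple", "apple", "APPLE"]

// m lets ^ match at the start of each line, not only at the start of the string.
const lineStarts = lines.match(/^apple/gim); // ["Apple", "apple", "APPLE"]
const withoutM = lines.match(/^apple/gi); // ["Apple"]

// s lets the dot match line breaks.
const dotDefault = /Apple.apple/.test(lines); // false
const dotAll = /Apple.apple/s.test(lines); // true

// y (sticky): every match must start exactly at lastIndex.
const sticky = /\d/y;
const stickyHits = [];
let hit;
while ((hit = sticky.exec("12a3"))) {
  stickyHits.push(hit[0]);
}
// stickyHits is ["1", "2"]: matching stops at "a" and never reaches "3".

console.log(all, lineStarts, withoutM, dotDefault, dotAll, stickyHits);
```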

5. Selectors

The vertical bar | is the selector: it means "or".

let title = "xiaoming";
// The vertical bar means "or": the expression matches if either side matches
console.log(/xiao|m/.test(title)); // true
// Check whether a landline number starts with the area code 010 or 020
let tel = "020-9999999";
// Wrong: | splits the whole pattern, so this means "010" or "020\-\d{7,8}"
console.log(tel.match(/010|020\-\d{7,8}/));
// Right: wrap the alternatives in a group first
console.log(/(010|020)\-\d{7,8}/.test(tel)); // true

0x03 Advanced Usage

We have covered most of character matching, but more advanced usage relies on subexpressions. Nesting and self-reference make a regex much more powerful.

The evolution of a regex from simple to complex usually involves grouping, backreferences, and logical processing. With these three rules you can build arbitrarily complex regular expressions.

1. Atomic groups

Grouping

Everything enclosed between the ( and ) metacharacters forms a group. (This article calls a parenthesized group an atomic group.) Each group is a subexpression, and subexpressions are the basis of advanced regular expressions. On its own, (regex) matches the same text as the ungrouped pattern; its power shows when it is combined with backreferences.

When a pattern with groups is matched without the g modifier, only the first match is returned, and the result contains the following data:

  • 0: the full text of the match

  • 1, 2, ...: the text captured by each atomic group

  • index: the position of the match in the original string

  • input: the original string

  • groups: the named groups, or undefined when there are none

Using atomic groups with match returns the data of every group in the result:

let nh = "nihaoya.com";
console.log(nh.match(/nihao(ya)\.(com)/)); // ["nihaoya.com", "ya", "com", index: 0, input: "nihaoya.com", groups: undefined]
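A sketch of how the g modifier changes what match returns. The input string is repeated here so that there is more than one match; matchAll is shown as one way to keep the group data.

```javascript
const nh = "nihaoya.com nihaoya.com";

// Without g: one result carrying the groups and the position.
const first = nh.match(/nihao(ya)\.(com)/);
console.log(first[0], first[1], first[2], first.index); // nihaoya.com ya com 0

// With g: every full match, but the group data is dropped.
const every = nh.match(/nihao(ya)\.(com)/g); // ["nihaoya.com", "nihaoya.com"]

// matchAll (which requires the g flag) keeps the groups for every match.
const detailed = [...nh.matchAll(/nihao(ya)\.(com)/g)].map((m) => ({
  index: m.index,
  groups: m.slice(1),
}));
// [{ index: 0, groups: ["ya", "com"] }, { index: 12, groups: ["ya", "com"] }]
console.log(every, detailed);
```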

Backreferences

A backreference is when a later part of the pattern refers to a substring matched earlier. You can think of it as a variable. The syntax is \1, \2, and so on, where \1 refers to the text matched by the first subexpression, \2 to the second, and so on. In some engines \0 stands for the entire match; in a JavaScript pattern \0 matches the NUL character instead.
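A common use of \1 is finding a word that was typed twice in a row. The sketch below uses a made-up sentence.

```javascript
// (\w+) captures a word; \1 then demands the very same text again.
const doubled = /\b(\w+)\s+\1\b/g;
const draft = "This is is a test of the the pattern.";

const repeats = draft.match(doubled); // ["is is", "the the"]

// Replacing each pair with its first group removes the duplicates.
const cleaned = draft.replace(doubled, "$1"); // "This is a test of the pattern."

console.log(repeats, cleaned);
```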

Suppose we want to match the heading elements in the following DOM:

<h1>Title 1</h1>

<h2>Title 2</h2>

let dom = "<h1>Title 1</h1> <h2>Title 2</h2>";
// Whatever is wrapped in parentheses is an atomic group;
// \1 means "the same content that the first atomic group matched".
// Without a backreference: /<(h[1-6])>[\s\S]*<\/(h[1-6])>/
let reg = /<(h[1-6])>[\s\S]*<\/\1>/g;
console.log(dom.match(reg)); // ["<h1>Title 1</h1>", "<h2>Title 2</h2>"]

With backreferences we can easily write regexes such as /<(h[1-6])>[\s\S]*<\/\1>/g.

Backreferences are also commonly used in replacement strings, with a slightly different syntax: $1, $2, and so on refer to the captured text.

A replacement string can contain the following special variables:

  • $$: a literal $

  • $&: the matched text

  • $`: the text before the match

  • $': the text after the match

  • $1, $2, ...: the text captured by the corresponding group

Here is JavaScript code for demonstration:

  • Replace characters

    var str = 'abc abc 123';
    str.replace(/(ab)c/g, '$1f'); // 'abf abf 123'

  • Add three = before and after hello

    let hd = "=hello=";
    console.log(hd.replace(/hello/g, "$`$`$&$'$'")); // ===hello===

  • Join the area code and the number with -

    let hd = "(010)99999999 (020)8888888";
    console.log(hd.replace(/\((\d{3,4})\)(\d{7,8})/g, "$1-$2")); // 010-99999999 020-8888888

Nested and non-capturing groups

Groups can be nested, as in /(1(2(3)))/. For backreferences they are numbered by the position of their opening parenthesis, from left to right.
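The numbering can be checked with a few lines; this is only a sketch around the /(1(2(3)))/ pattern above.

```javascript
// Groups are numbered by their opening parenthesis, left to right.
const nested = "123".match(/(1(2(3)))/);
console.log(nested[1]); // "123", the outermost group
console.log(nested[2]); // "23"
console.log(nested[3]); // "3", the innermost group

// The same numbers work as $1, $2, $3 in a replacement string.
const reordered = "123".replace(/(1(2(3)))/, "$3-$2-$1"); // "3-23-123"
console.log(reordered);
```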

If we do not want a subexpression to be referenced, we can use the non-capturing form (?:regex), which avoids wasting memory.

Adding ?: means the atomic group is not recorded, but its grouping function still takes effect.

var str = 'scq000';
// Only one group is recorded, so $2 is left as literal text
str.replace(/(scq00)(?:0)/, '$1,$2'); // returns 'scq00,$2'

Group alias

If you want clearer group data, you can give the atomic groups names, and the result is saved in the groups field of the returned value.

let hd = "<h1>nihaoya.com</h1>";
console.dir(hd.match(/<(?<tag>h[1-6])[\s\S]*<\/\1>/));

  • Use ?<name> to give an atomic group an alias

  • Use $<name> to read the alias
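A sketch of reading an alias in three places: the groups field, the pattern itself (\k<name>), and a replacement string ($<name>). The body group and the <p> replacement are additions made for this example.

```javascript
const hd = "<h1>nihaoya.com</h1>";

// (?<tag>...) names the group; \k<tag> refers back to it inside the pattern.
const heading = /<(?<tag>h[1-6])>(?<body>[\s\S]*)<\/\k<tag>>/;

const found = hd.match(heading);
console.log(found.groups.tag); // "h1"
console.log(found.groups.body); // "nihaoya.com"

// In a replacement string, $<name> inserts the named group.
const paragraph = hd.replace(heading, "<p>$<body></p>"); // "<p>nihaoya.com</p>"
console.log(paragraph);
```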

2. Assertion matching

Sometimes we need to restrict where a pattern, or a backreference, is allowed to match. Assertions do this.

An assertion is written in parentheses but is not a group, so it is not stored in the match result. It can be understood as a condition inside the regex.

An assertion states a fact that must hold at the current position. Matching continues only if the assertion is true.

Zero-width positive lookahead assertion

(?=exp) positive lookahead

Checks that what follows is a particular set of characters.

Change the font color of the name Zhang San depending on the title that follows it

<script>
// Page text: "Zhang San (Mr.) and Zhang San (Ms.) are a match made in heaven."
let main = document.body;
// Match "Zhang San" only when it is followed by " (Mr.)"
let men = /Zhang San(?= \(Mr\.\))/g;
// Match "Zhang San" only when it is followed by " (Ms.)"
let women = /Zhang San(?= \(Ms\.\))/g;
main.innerHTML = main.innerHTML.replace(men, `<span style="color:blue">$&</span>`);
main.innerHTML = main.innerHTML.replace(women, `<span style="color:pink">$&</span>`);
</script>

Here we add .00 to every price that lacks it

<script>
let lessons = 'CSS,200 yuan,300 times js,300.00 yuan,100 times Node.js,180 yuan,260 times';
// Only numbers followed by " yuan" are prices; the dot is escaped so it matches a real dot
let reg = /(\d+)(\.00)?(?= yuan)/gi;
lessons = lessons.replace(reg, (v, ...args) => {
  args[1] = args[1] || ".00";
  return args.splice(0, 2).join("");
});
console.log(lessons); // CSS,200.00 yuan,300 times js,300.00 yuan,100 times Node.js,180.00 yuan,260 times
</script>

Zero-width positive lookbehind assertion

(?<=exp) positive lookbehind

Checks that what comes before is a particular set of characters.

Match the number preceded by shijei

let hd = "nihao789shijei666";
let reg = /(?<=shijei)\d+/i;
console.log(hd.match(reg)); // 666

Match the content that has digits before and after it

let hd = "nihao789shijei666";
let reg = /(?<=\d)[a-z]+(?=\d{3})/i;
console.log(hd.match(reg)); // shijei

Get the content of the title

let hd = `<h1>nihaoya</h1>`;
let reg = /(?<=<h1>).*(?=<\/h1>)/g;
console.log(hd.match(reg)); // ["nihaoya"]

Zero-width negative lookahead assertion

(?!exp) negative lookahead

Checks that a particular set of characters does not follow.

Use (?!exp) to require that abc is not followed by de

let reg = /abc(?!de)/;
reg.test('abcdefg'); // false
reg.test('abcd'); // true
reg.test('abcabc'); // true

Zero-width negative lookbehind assertion

(?<!exp) negative lookbehind

Checks that a particular set of characters does not come before.
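As a rough illustration of a negative lookbehind, the sketch below separates dollar amounts from plain numbers in a made-up string.

```javascript
const prices = "cost $30, weight 45, fee $7, count 128";

// (?<!exp): the position must NOT be preceded by exp.
// (?<![$\d]) rejects digits that follow "$" or another digit,
// so only whole numbers without a dollar sign remain.
const plainNumbers = prices.match(/(?<![$\d])\d+/g); // ["45", "128"]

// For comparison, the positive form (?<=\$) keeps only the dollar amounts.
const dollarAmounts = prices.match(/(?<=\$)\d+/g); // ["30", "7"]

console.log(plainNumbers, dollarAmounts);
```

The \d inside the lookbehind matters: without it, the pattern would still match the 0 in $30, because that digit is preceded by 3 rather than by $.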

0x04 Applications

1. Replacing characters with regular expressions

Use assertions to obscure phone numbers

Sometimes we need to obscure a phone number by replacing the middle four digits with *

// Using groups
let tels = '15036999999';
let reg = /(\d{3})(\d{4})(\d+)/g;
tels = tels.replace(reg, (v, ...args) => {
  args[1] = "*".repeat(4);
  return args.splice(0, 3).join("");
});
console.log(tels); // 150****9999
// Using assertions
let newtel = '15036999999';
function hideTel(tel) {
  // Match the 4 digits that have 3 digits before them and 4 digits after them
  let newReg = /(?<=\d{3})\d{4}(?=\d{4})/g;
  return tel.replace(newReg, (v) => {
    // Replace the matched digits with 4 asterisks
    return "*".repeat(4);
  });
}
console.log(hideTel(newtel)); // 150****9999

Time string formatting

Sometimes the back end returns a time string such as '2021-8-13 16:32:2', and we want to zero-pad every field and lay it out with a template, for example '2021/08/13 16:32:02'.

/**
 * formatTime: format a time string
 * @param {string} time the time string to format
 * @param {string} templete the template; {0} to {5} stand for year, month, day, hour, minute, second
 * @returns {string} the formatted time string
 */
function formatTime(time, templete = "{0} year {1} month {2} day {3} hour {4} minute {5} second") {
  // Collect every run of digits: year, month, day, hour, minute, second
  let timeAry = time.match(/\d+/g);
  return templete.replace(/\{(\d+)\}/g, (...[, $1]) => {
    // The first argument is the whole match; $1 is the text captured by the group.
    // Use $1 as an index into timeAry (fall back to "00" when the field is missing)
    let time = timeAry[$1] || "00";
    return time.length < 2 ? "0" + time : time;
  });
}
let time = "2021-8-13 16:32:2";
console.log(formatTime(time)); // 2021 year 08 month 13 day 16 hour 32 minute 02 second
console.log(formatTime(time, "{0} year {1} month {2} day")); // 2021 year 08 month 13 day
console.log(formatTime(time, "{0}/{1}/{2} {3}:{4}:{5}")); // 2021/08/13 16:32:02
console.log(formatTime(time, "{1}-{2} {3}:{4}")); // 08-13 16:32
let time1 = "2021/8/26";
console.log(formatTime(time1)); // 2021 year 08 month 26 day 00 hour 00 minute 00 second
console.log(formatTime(time1, "{0} year {1} month {2} day")); // 2021 year 08 month 26 day
console.log(formatTime(time1, "{1}-{2} {3}:{4}")); // 08-26 00:00

2. Form verification

WeChat ID regex

var wxPattern = /^[a-zA-Z]([-_a-zA-Z0-9]{5,19})+$/;
console.log(wxPattern.test("nihaoya_com")); // true

Verify numbers, can be positive, negative, or 0

/^0$|^-?[1-9]\d*$/

Exercise: format a price with thousands separators

123456789 => 123,456,789

Thinking steps:

  1. Add a comma before every third digit, counting from the end

  2. The result must not start with a comma (for example, 123 must not become ,123)

Does this fit the (?=p) rule? Here p stands for one or more groups of three digits running to the end of the string, and the comma belongs exactly at the position that (?=p) matches.

Step 1: Insert the last comma

let price = '123456789';
let priceReg = /(?=\d{3}$)/;
console.log(price.replace(priceReg, ',')); // 123456,789

Step 2: Insert all the commas

let price = '123456789';
let priceReg = /(?=(\d{3})+$)/g;
console.log(price.replace(priceReg, ',')); // ,123,456,789

Step 3: Remove the leading comma

let price = '123456789';
let priceReg = /(?!^)(?=(\d{3})+$)/g;
console.log(price.replace(priceReg, ',')); // 123,456,789

Result: 123456789 becomes 123,456,789.
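The final pattern can be wrapped in a small helper. The name formatThousands is hypothetical, and this sketch only handles non-negative integers.

```javascript
// Hypothetical helper; it assumes a non-negative integer as input.
function formatThousands(value) {
  // (?!^) forbids the start of the string; (?=(\d{3})+$) requires that the
  // digits remaining to the right form complete groups of three.
  return String(value).replace(/(?!^)(?=(\d{3})+$)/g, ",");
}

console.log(formatThousands(123456789)); // 123,456,789
console.log(formatThousands(1234)); // 1,234
console.log(formatThousands(123)); // 123
console.log(formatThousands(0)); // 0
```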

0x05 Summary

Used sensibly, regular expressions can greatly simplify our code and help us implement more powerful features. I hope this article has made regular expressions more familiar, so that we can put them to good use in the future.

Recommended regular visualization tools:

1. regex101.com/

2. regexper.com/

3. jex.im/regulex