A little joke first

Yesterday after work it took me ages to get home. The parking lot under our office building is laid out like a maze, and every time I spend forever searching. Then, after all that searching, it hits me: I don't own a car O(╥﹏╥)O.

Preface

There was a time when I feared and loathed regular expressions. Why? Because they always seemed very hard and very dull. Whenever I saw other people write an impressive regex, I wondered when I would ever be as good as them. Until I came across these three things…

In just 10 minutes you will learn

  1. The principles of position matching in regular expressions
  2. The principles of string matching in regular expressions
  3. How to make good use of parentheses in regular expressions
  4. 14 common regular expressions, analyzed step by step to drive the points home

Trust me: by the end of this article you will be able to handle more than 90% of the regex problems you meet at work.

Trust me: by the end of this article you will be able to handle more than 90% of the regex problems you meet at work.

Trust me: by the end of this article you will be able to handle more than 90% of the regex problems you meet at work.

Important things are worth saying three times.

Regular expressions are matching patterns: they match either characters or positions.

Regular expressions are matching patterns: they match either characters or positions.

Regular expressions are matching patterns: they match either characters or positions.

1. What is knowing about positions good for?

Topic 1: Thousands separators for numbers

Convert 123456789 to 123,456,789

Topic 2: Splitting a cell phone number 3-4-4

Convert the phone number 18379836654 to 183-7983-6654

Topic 3: Verifying password validity

The password is 6 to 12 characters long and consists of digits, lowercase letters, and uppercase letters, and it must contain at least two of these character types.

Questions like these come up all the time in job interviews and in everyday work. Understanding positions will help you not only in interviews but also in your day-to-day code.

What is a position?

Regular expressions are matching patterns that match either characters or positions. So what is a position?

As indicated by the arrows in the figure below, a position can be understood as the gap between adjacent characters.

An analogy can be made to the empty string: the start of a string, the end of a string, and every gap between characters can be thought of as an empty string.

'hello' === '' + 'h' + '' + 'e' + '' + 'l' + '' + 'l' + '' + 'o' + '' // true
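To make the idea of positions concrete, here is a small sketch (not from the original article) that marks every position in 'hello'. It relies on the fact that an empty pattern matches the empty string at every position; the marker | and the variable names are just illustrative choices.

```javascript
// An empty pattern matches the empty string at every position,
// so a global replace inserts the marker at each one.
const word = 'hello'
const marked = word.replace(/(?:)/g, '|')

console.log(marked) // |h|e|l|l|o|

// A string of n characters has n + 1 positions.
const positionCount = marked.split('|').length - 1
console.log(positionCount) // 6
```

Every position symbol discussed in this section simply selects a subset of these positions.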

What positions are there?

The symbols commonly used to indicate positions in a regex are:

^, $, \b, \B, (?=p), (?!p), (?<=p) and (?<!p)

Let's go through them one by one.

^

The caret matches the beginning of a line.

For example, how do you insert a smiley face (😄) at the beginning of hello?

let string = 'hello'

console.log(string.replace(/^/, '😄')) // 😄hello

$

The dollar sign matches the end of a line.

And how about a smiley face (😄) at the end of hello?

let string = 'hello'

console.log(string.replace(/$/, '😄')) // hello😄

These two symbols mark the head and the tail of a string; you are surely familiar with them already.

\b

The word boundary. Specifically, it covers three cases:

① The position between \w and \W (in either order)

② The position between ^ and \w

③ The position between \w and $

For example, suppose you want to rename the file xxx_love_study_1.mp4 in the tutorials folder on your computer to ❤️xxx_love_study_1❤️.❤️mp4❤️. How do you do that?

One line of code is all it takes:

'xxx_love_study_1.mp4'.replace(/\b/g, '❤️') // ❤️xxx_love_study_1❤️.❤️mp4❤️

Pictured in words: the four matched positions are the start of the string (before x), between 1 and ., between . and m, and the end of the string (after 4).

\B

The non-word boundary, that is, the opposite of \b. Its rules are:

① The position between \w and \w

② The position between \W and \W

③ The position between ^ and \W

④ The position between \W and $

Again take the file from the tutorials folder, modified slightly. What does this line of code output?

'[[xxx_love_study_1.mp4]]'.replace(/\B/g, '❤️')

Yes, it is full of love!! I can hardly read the name.

❤️[❤️[x❤️x❤️x❤️_❤️l❤️o❤️v❤️e❤️_❤️s❤️t❤️u❤️d❤️y❤️_❤️1.m❤️p❤️4]❤️]❤️

Pictured in words: every position gets a ❤️ except the four word boundaries (between [ and x, between 1 and ., between . and m, and between 4 and ]).
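Since \B is defined as the opposite of \b, the two should split all positions of a string between them. The sketch below checks that on the file name used above; positionsOf is a hypothetical helper introduced only for this illustration, built on the standard String.prototype.matchAll.

```javascript
// Collect the indexes of all zero-width matches of a global regex.
const positionsOf = (str, regex) => [...str.matchAll(regex)].map((m) => m.index)

const fileName = 'xxx_love_study_1.mp4'
const boundaries = positionsOf(fileName, /\b/g)
const nonBoundaries = positionsOf(fileName, /\B/g)

console.log(boundaries) // [ 0, 16, 17, 20 ]

// Together the two sets cover every position 0..length exactly once.
const allPositions = [...boundaries, ...nonBoundaries].sort((a, b) => a - b)
const coversEverything =
  allPositions.length === fileName.length + 1 &&
  allPositions.every((pos, i) => pos === i)

console.log(coversEverything) // true
```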

(?=p)

Matches the position in front of the subpattern p. In other words, it matches a position whose following characters must satisfy p. The technical term is positive lookahead assertion.

Back to xxx_love_study_1.mp4: we want to insert ❤️ before xxx (xxx can stand for anyone you like).

Like this? No: that makes your xxx disappear, and then what is the ❤️ even for?

'xxx_love_study_1.mp4'.replace('xxx', '❤️') // ❤️_love_study_1.mp4

With (?=p) this becomes very easy (can you spot the difference from the version above?):

'xxx_love_study_1.mp4'.replace(/(?=xxx)/g, '❤️') // ❤️xxx_love_study_1.mp4

Pictured in words: (?=xxx) matches only the position at the very start, because that is the one position immediately followed by xxx.

(?!p)

The inverse of (?=p): every position not matched by (?=p) is matched by (?!p). The technical term is negative lookahead assertion.

'xxx_love_study_1.mp4'.replace(/(?!xxx)/g, '❤️')

// Output of (?=xxx)
// ❤️xxx_love_study_1.mp4

// Output of (?!xxx)
// x❤️x❤️x❤️_❤️l❤️o❤️v❤️e❤️_❤️s❤️t❤️u❤️d❤️y❤️_❤️1❤️.❤️m❤️p❤️4❤️

Compare the two closely: (?=xxx) matches the very first position, and every other position is matched by (?!xxx).

(?<=p)

Matches the position after the subpattern p (whereas (?=p) matches the position in front of it). In other words, it matches a position whose preceding characters must satisfy p. The technical term is positive lookbehind assertion.

Another example: we want to insert ❤️ after xxx (xxx can stand for anyone you like).

'xxx_love_study_1.mp4'.replace(/(?<=xxx)/g, '❤️') // xxx❤️_love_study_1.mp4

Pictured in words: (?<=xxx) matches only the position right after xxx, because that is the one position immediately preceded by xxx.

(?<!p)

The inverse of (?<=p): every position not matched by (?<=p) is matched by (?<!p). The technical term is negative lookbehind assertion.

'xxx_love_study_1.mp4'.replace(/(?<!xxx)/g, '❤️')

// Output of (?<=xxx)
// xxx❤️_love_study_1.mp4

// Output of (?<!xxx)
// ❤️x❤️x❤️x_❤️l❤️o❤️v❤️e❤️_❤️s❤️t❤️u❤️d❤️y❤️_❤️1❤️.❤️m❤️p❤️4❤️

Compare the two closely: (?<=xxx) matches the position right after xxx, and every other position is matched by (?<!xxx).
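The four lookaround assertions can be compared side by side by listing the positions each one matches. This is only a sketch: matchPositions is a hypothetical helper, and the shorter sample string 'xxx_love' is used to keep the output readable.

```javascript
// List the indexes at which a zero-width global regex matches.
const matchPositions = (str, regex) => [...str.matchAll(regex)].map((m) => m.index)

const sample = 'xxx_love' // positions 0 through 8

const lookahead = matchPositions(sample, /(?=xxx)/g)
const negativeLookahead = matchPositions(sample, /(?!xxx)/g)
const lookbehind = matchPositions(sample, /(?<=xxx)/g)
const negativeLookbehind = matchPositions(sample, /(?<!xxx)/g)

console.log(lookahead)          // [ 0 ]
console.log(negativeLookahead)  // [ 1, 2, 3, 4, 5, 6, 7, 8 ]
console.log(lookbehind)         // [ 3 ]
console.log(negativeLookbehind) // [ 0, 1, 2, 4, 5, 6, 7, 8 ]
```

Each positive assertion and its negative counterpart are complementary: between them they account for all nine positions.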

Examples

Now that we have covered positions, let's solve the problems from the beginning.

Topic 1: Thousands separators for numbers

Convert 123456789 to 123,456,789

Look at the pattern: counting from the end, a comma goes in front of every group of three digits. Doesn't that fit (?=p)? p stands for each group of three digits, and the place where the comma belongs is exactly the position that (?=p) matches.

Step one: get the first comma in place (the one before the last three digits)

let price = '123456789'
let priceReg = /(?=\d{3}$)/

console.log(price.replace(priceReg, ',')) // 123456,789

Step two: get all the commas in place

To place all the commas, the main question is how to express groups of three digits, that is, a multiple of three digits. We know that parentheses turn a pattern p into a single unit, so we can write it like this:

let price = '123456789'
let priceReg = /(?=(\d{3})+$)/g

console.log(price.replace(priceReg, ',')) // ,123,456,789

Step three: remove the leading comma

This is almost right, but a comma also appears at the very start. How do we get rid of it? Is there something we have learned that fits? That's right: (?!p). Combining the two says: add a comma in front of every group of three digits, but not at the start of the string.

let price = '123456789'
let priceReg = /(?!^)(?=(\d{3})+$)/g

console.log(price.replace(priceReg, ',')) // 123,456,789
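In real code the input is often a number with a sign or a fractional part, which the regex above does not account for. Here is a minimal sketch of one way to wrap it; formatThousands is a hypothetical name, and it swaps (?!^) for \B, an assumption of this sketch, so that no comma lands right after a minus sign.

```javascript
// Sketch: group only the integer part, leave any fraction untouched.
const formatThousands = (value) => {
  const [integerPart, fractionPart] = String(value).split('.')
  // \B rules out the start of the digits and the spot right after a leading '-'
  const grouped = integerPart.replace(/\B(?=(\d{3})+$)/g, ',')
  return fractionPart === undefined ? grouped : `${grouped}.${fractionPart}`
}

console.log(formatThousands(123456789))   // 123,456,789
console.log(formatThousands('1234.5678')) // 1,234.5678
console.log(formatThousands(-123456))     // -123,456
console.log(formatThousands(12))          // 12
```

Values that String() renders in exponent notation are out of scope for this sketch.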

Topic 2: Splitting a cell phone number 3-4-4

Convert the phone number 18379836654 to 183-7983-6654

With the thousands separators behind us, this one is much easier. Again we look for positions from back to front:

the position in front of every group of four digits, which we replace with -

let mobile = '18379836654'
let mobileReg = /(?=(\d{4})+$)/g

console.log(mobile.replace(mobileReg, '-')) // 183-7983-6654

Topic 3: Cell phone number 3-4-4 split, extended

Format a phone number of up to 11 digits as 3-4-4 while it is being typed

Imagine a form that collects the user's cell phone number. The user types digits one by one, and we need to keep the value in 3-4-4 format as they type, up to 11 digits. That is:

123 => 123
1234 => 123-4
12345 => 123-45
123456 => 123-456
1234567 => 123-4567
12345678 => 123-4567-8
123456789 => 123-4567-89
12345678911 => 123-4567-8911

(?=p) is not suitable here: for example, 1234 would become -1234. Which of the tools covered earlier fits this scenario? Right: (?<=p).

Step one: get the first - in place

const formatMobile = (mobile) => {
  return String(mobile).replace(/(?<=\d{3})\d+/, ($0) => '-' + $0)
}

console.log(formatMobile(123)) // 123
console.log(formatMobile(1234)) // 123-4

Step two: get the second - in place

Once the first - has been inserted, the string is one character longer, so the position where the second - goes (after 1234567) moves back by one: it now follows eight characters, each a digit or -.

const formatMobile = (mobile) => {
  return String(mobile).slice(0, 11)
      .replace(/(?<=\d{3})\d+/, ($0) => '-' + $0)
      .replace(/(?<=[\d-]{8})\d{1,4}/, ($0) => '-' + $0)
}

console.log(formatMobile(123)) // 123
console.log(formatMobile(1234)) // 123-4
console.log(formatMobile(12345)) // 123-45
console.log(formatMobile(123456)) // 123-456
console.log(formatMobile(1234567)) // 123-4567
console.log(formatMobile(12345678)) // 123-4567-8
console.log(formatMobile(123456789)) // 123-4567-89
console.log(formatMobile(12345678911)) // 123-4567-8911
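Lookbehind is a comparatively recent addition to JavaScript, and some older engines do not support it. As a hedged alternative, the same formatting can be sketched with plain capture groups; formatMobileNoLookbehind is a hypothetical name, not part of the original solution.

```javascript
// Sketch: same 3-4-4 formatting using capture groups instead of lookbehind.
const formatMobileNoLookbehind = (mobile) => {
  return String(mobile).slice(0, 11)
    // after the first three digits, if at least one more digit follows
    .replace(/^(\d{3})(\d)/, '$1-$2')
    // after 'ddd-dddd', if at least one more digit follows
    .replace(/^(\d{3}-\d{4})(\d)/, '$1-$2')
}

console.log(formatMobileNoLookbehind(123))         // 123
console.log(formatMobileNoLookbehind(1234))        // 123-4
console.log(formatMobileNoLookbehind(12345678))    // 123-4567-8
console.log(formatMobileNoLookbehind(12345678911)) // 123-4567-8911
```

The captured text is written back through $1 and $2, so nothing is lost even though the pattern consumes characters.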

Topic 4: Verifying password validity

The password is 6 to 12 characters long and consists of digits, lowercase letters, and uppercase letters, and it must contain at least two of these character types.

This breaks down into three conditions:

① The password is 6 to 12 characters long

② It consists of digits, lowercase letters, and uppercase letters

③ It must contain at least two of these character types

Step one: write the regex for conditions ① and ②

let reg = /^[a-zA-Z\d]{6,12}$/

Step two: require that a certain type of character (digit, lowercase, or uppercase) is present

let reg = /(?=.*\d)/
// This regex matches a position that must be followed by any number of characters and then a digit.
// Note that what it matches in the end is a position, not characters.
console.log(reg.test('hello')) // false
console.log(reg.test('hello1')) // true
console.log(reg.test('hel2lo')) // true
// The other character types work the same way

Step three: write out the full regex

At least two character types must be present, which can happen in four ways:

① A combination of digits and lowercase letters

② A combination of digits and uppercase letters

③ A combination of lowercase and uppercase letters

④ A combination of digits, lowercase letters, and uppercase letters

// Combinations ① and ②
// let reg = /((?=.*\d)((?=.*[a-z])|(?=.*[A-Z])))/
// Combination ③
// let reg = /(?=.*[a-z])(?=.*[A-Z])/
// Combinations ①, ② and ③ (④ satisfies all of them)
// let reg = /((?=.*\d)((?=.*[a-z])|(?=.*[A-Z])))|(?=.*[a-z])(?=.*[A-Z])/
// All conditions of the problem. The lookaheads are wrapped in one outer group so that
// ^[a-zA-Z\d]{6,12}$ applies to every branch of the alternation, not only to the last one.
let reg = /(((?=.*\d)((?=.*[a-z])|(?=.*[A-Z])))|(?=.*[a-z])(?=.*[A-Z]))^[a-zA-Z\d]{6,12}$/

console.log(reg.test('123456')) // false
console.log(reg.test('aaaaaa')) // false
console.log(reg.test('AAAAAAA')) // false
console.log(reg.test('1a1a1a')) // true
console.log(reg.test('1A1A1A')) // true
console.log(reg.test('aAaAaA')) // true
console.log(reg.test('1aA1aA1aA')) // true
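A regex this dense is easy to get subtly wrong, so it can help to cross-check it against a plainer formulation. The sketch below states the same three conditions as separate steps; isValidPassword is a hypothetical name used only here.

```javascript
// Sketch: the same rules, written as separate checks.
const isValidPassword = (password) => {
  // Conditions ① and ②: 6 to 12 characters, only digits and letters
  if (!/^[a-zA-Z\d]{6,12}$/.test(password)) return false
  // Condition ③: count how many of the three character types occur
  const typesPresent = [/\d/, /[a-z]/, /[A-Z]/].filter((re) => re.test(password)).length
  return typesPresent >= 2
}

console.log(isValidPassword('123456'))    // false
console.log(isValidPassword('1a1a1a'))    // true
console.log(isValidPassword('aAaAaA'))    // true
console.log(isValidPassword('1a'))        // false, too short
console.log(isValidPassword('1aA1aA1aA')) // true
```

Running both versions over the same inputs is a cheap way to catch precedence mistakes in the single-regex form.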

2. String matching turns out to be this simple

Two kinds of fuzzy matching

A regex that could only match exactly would be of little use.

Horizontal fuzziness

The length of the string a regex can match is not fixed. Quantifiers such as {m,n} implement horizontal fuzzy matching.

let reg = /ab{2,5}c/g
let str = 'abc abbc abbbc abbbbc abbbbbc abbbbbbc'

str.match(reg) // [ 'abbc', 'abbbc', 'abbbbc', 'abbbbbc' ]

Vertical fuzziness

At a given position, the character to match is not fixed either; it can be one of several possibilities. This is implemented with character groups (in fact, multi-choice branches with | can do it too).

let reg = /a[123]b/g
let str = 'a0b a1b a2b a3b a4b'

str.match(reg) // [ 'a1b', 'a2b', 'a3b' ]

Character groups

Don't be fooled by the name: although it is called a character group, it only describes the possibilities for a single character.

Range notation

[123456abcdefABCDEF] => [1-6a-fA-F]

Excluded character groups

A character can be anything except certain characters; this is written with ^ as the first character inside the brackets.

For example, any character other than a, b, or c:

[^abc]

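Common shorthand forms

\d // a digit, [0-9]
\D // a non-digit, [^0-9]
\w // [0-9a-zA-Z_]
\W // [^0-9a-zA-Z_]
\s // a whitespace character, [ \t\v\n\r\f]
\S // a non-whitespace character, [^ \t\v\n\r\f]
.  // the wildcard: any character except line terminators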
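A few quick checks make the shorthand table easier to trust. This sketch is an addition to the article; it also shows two details the table simplifies: in JavaScript \s covers more whitespace than the listed characters, and . only matches a line feed when the s flag is set.

```javascript
const shorthandChecks = {
  // \w is letters, digits, and underscore, so '-' is not included
  wordChars: /^\w+$/.test('abc_123'),
  hyphenIsNotWordChar: /^\w+$/.test('abc-123'),
  // \s also matches other Unicode whitespace, e.g. the no-break space
  noBreakSpace: /\s/.test('\u00a0'),
  // . skips line terminators unless the s (dotAll) flag is used
  dotOnLineFeed: /./.test('\n'),
  dotAllOnLineFeed: /./s.test('\n'),
}

console.log(shorthandChecks)
// wordChars: true, hyphenIsNotWordChar: false, noBreakSpace: true,
// dotOnLineFeed: false, dotAllOnLineFeed: true
```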

Quantifiers

Quantifiers and their shorthand

1. {m,} // at least m times
2. {m}  // exactly m times
3. ?    // zero or one time, equivalent to {0,1}
4. +    // at least once, equivalent to {1,}
5. *    // any number of times, equivalent to {0,}

Greedy matching vs. lazy matching

Quantifiers are greedy by default: they match as many characters as the pattern allows.

let regex = /\d{2,5}/g
let string = '123 1234 12345 123456'
// Greedy matching
// string.match(regex) // [ '123', '1234', '12345', '12345' ]

// Lazy matching
let regex2 = /\d{2,5}?/g
// string.match(regex2) // [ '12', '12', '34', '12', '34', '12', '34', '56' ]

Adding a ? after a quantifier makes it lazy:

Greedy quantifier   Lazy quantifier
{m,n}               {m,n}?
{m,}                {m,}?
?                   ??
+                   +?
*                   *?
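The difference between the two columns shows up most clearly when a delimiter occurs more than once. A small sketch, with a made-up sample string:

```javascript
const quoted = 'say "hi" and "bye"'

// Greedy: .* runs to the end and backtracks only as far as the LAST quote
const greedyMatch = quoted.match(/".*"/)[0]
// Lazy: .*? stops at the FIRST quote that lets the match succeed
const lazyMatch = quoted.match(/".*?"/)[0]

console.log(greedyMatch) // "hi" and "bye"
console.log(lazyMatch)   // "hi"
```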

Multi-choice branches

A single pattern can do horizontal and vertical fuzzy matching; multi-choice branches let the regex match any one of several subpatterns. The form is (p1|p2|p3).

let regex = /good|nice/g
let string = 'good idea, nice try.'

// string.match(regex) // [ 'good', 'nice' ]

// Note: matching 'goodbye' with /good|goodbye/ yields 'good',
// because branches are lazy: once an earlier branch matches, the later ones are not tried
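The laziness of branches means their order matters. The sketch below, added for illustration, shows the effect and two common ways around it: put the longer branch first, or anchor the pattern so the engine has to backtrack into the later branch.

```javascript
// The first branch that matches wins
const firstBranchWins = 'goodbye'.match(/good|goodbye/)[0]
// Listing the longer alternative first changes the result
const longerBranchFirst = 'goodbye'.match(/goodbye|good/)[0]
// With anchors, 'good' followed by $ fails, so the second branch is tried
const anchoredStillMatches = /^(good|goodbye)$/.test('goodbye')

console.log(firstBranchWins)      // good
console.log(longerBranchFirst)    // goodbye
console.log(anchoredStillMatches) // true
```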

Case analysis

1. Matching an id

// Approach 1
let regex1 = /id=".*?"/ // Why the ?: without it, the greedy .* would run on to the last quote, the one after class
let string = '<div id="container" class="main"></div>'
console.log(string.match(regex1)[0]) // id="container"
// Approach 2
let regex2 = /id="[^"]*"/
console.log(string.match(regex2)[0]) // id="container"

2. Match the hexadecimal color value

// Match the following colors
/*
#ffbbad
#Fc01DF
#FFF
#ffE
*/

let regex = /#([a-fA-F\d]{6}|[a-fA-F\d]{3})/g
let string = "#ffbbad #Fc01DF #FFF #ffE";

console.log(string.match(regex))
// ["#ffbbad", "#Fc01DF", "#FFF", "#ffE"]

3. Match a 24-hour time

/* Must match 23:59 and 02:07 */
// Analysis:
// First digit: 0, 1, or 2
// Second digit: 0 to 9 when the first digit is 0 or 1, 0 to 3 when the first digit is 2
// Third character: always a colon
// Fourth digit: 0 to 5
// Fifth digit: 0 to 9

let regex = /^([01]\d|2[0-3]):[0-5]\d$/

console.log(regex.test('23:59')) // true
console.log(regex.test('02:07')) // true

// The leading zeros can also be omitted
let regex2 = /^(0?\d|1\d|2[0-3]):(0?|[1-5])\d$/

console.log(regex2.test('23:59')) // true
console.log(regex2.test('02:07')) // true
console.log(regex2.test('7:09')) // true

4. Match a date

/* Must match dates in yyyy-mm-dd format; pay attention to the ranges of the month and the day */

let regex = /\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])/ // month 01-12, day 01-31

console.log(regex.test('2017-06-10')) // true
console.log(regex.test('2017-11-10')) // true
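Keep in mind that this regex checks the format only: a string such as 2017-02-31 still passes. One way to close that gap is to let the regex extract the parts and let Date confirm the day exists. This is a sketch under assumptions: isRealDate is a hypothetical helper, the pattern is anchored with ^ and $, and years before 0100 are not handled.

```javascript
// Sketch: format check by regex, calendar check by Date.
const isRealDate = (text) => {
  const match = /^(\d{4})-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])$/.exec(text)
  if (!match) return false
  const [, year, month, day] = match.map(Number)
  // Date rolls invalid days over (Feb 31 becomes Mar 3), so compare the parts back
  const date = new Date(Date.UTC(year, month - 1, day))
  return (
    date.getUTCFullYear() === year &&
    date.getUTCMonth() === month - 1 &&
    date.getUTCDate() === day
  )
}

console.log(isRealDate('2017-06-10')) // true
console.log(isRealDate('2017-02-31')) // false
console.log(isRealDate('2020-02-29')) // true, leap year
console.log(isRealDate('2021-02-29')) // false
```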

3. The magic of parentheses

Parentheses provide grouping: the pattern inside them is treated as a single unit, a subexpression, that we can then refer to.

Grouping

How do you make a quantifier apply to a whole unit?

let reg = /(ab)+/g
let string = 'ababa abbb ababab'

console.log(string.match(reg)) // ["abab", "ab", "ababab"]

Branching structure

A branching structure is a bit like the logical OR (||) in programming.

/* Match I love JavaScript and I love Regular Expression */

let reg = /I love (JavaScript|Regular Expression)/

console.log(reg.test('I love JavaScript')) // true
console.log(reg.test('I love Regular Expression')) // true

Group references

Parentheses create subexpressions that can be used for data extraction and for powerful replace operations. The content of a group can be referenced from JavaScript.

Extracting data

/* Extract the year, month, and day from 2021-08-14 */

let reg = /(\d{4})-(\d{2})-(\d{2})/

console.log('2021-08-14'.match(reg))
// ["2021-08-14", "2021", "08", "14", index: 0, input: "2021-08-14", groups: undefined]

// The second approach reads the captured groups through the global RegExp.$1 ... RegExp.$9
// (these static properties are a deprecated legacy feature)
let string = '2021-08-14'

reg.test(string)

console.log(RegExp.$1) // 2021
console.log(RegExp.$2) // 08
console.log(RegExp.$3) // 14

Replacing data

/* Convert 2021-08-14 to mm/dd/yyyy format */
let reg = /(\d{4})-(\d{2})-(\d{2})/
let string = '2021-08-14'
// The first way
let result1 = string.replace(reg, '$2/$3/$1')
console.log(result1) // 08/14/2021
// The second way
let result2 = string.replace(reg, () => {
    return RegExp.$2 + '/' + RegExp.$3 + '/' + RegExp.$1
})
console.log(result2) // 08/14/2021
// The third way
let result3 = string.replace(reg, ($0, $1, $2, $3) => {
    return $2 + '/' + $3 + '/' + $1
})
console.log(result3) // 08/14/2021
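Numbered references get hard to follow once a regex has more than a few groups. Named capture groups, a standard feature since ES2018, are one alternative. The following sketch is an addition to the article; the names year, month, and day are my own choices.

```javascript
// Sketch: the same extraction and replacement with named groups.
const dateReg = /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/

// match() exposes the named captures on the groups property
const dateMatch = '2021-08-14'.match(dateReg)
console.log(dateMatch.groups.year)  // 2021
console.log(dateMatch.groups.month) // 08

// In a replacement string, $<name> refers to a named group
const usDate = '2021-08-14'.replace(dateReg, '$<month>/$<day>/$<year>')
console.log(usDate) // 08/14/2021
```

Named groups are still numbered as well, so $1, $2, and $3 keep working alongside the names.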

Backreferences (important)

Besides referencing group contents from JavaScript, you can also reference them from inside the regex itself.

/* Write a regex that supports these three formats: 2016-06-12 2016/06/12 2016.06.12 */
// \2 refers back to the second group, the separator, so both separators must be the same
let regex = /(\d{4})([-\/.])\d{2}\2\d{2}/

var string1 = "2017-06-12";
var string2 = "2017/06/12";
var string3 = "2017.06.12";
var string4 = "2016-06/12";

console.log( regex.test(string1) ); // true
console.log( regex.test(string2) ); // true
console.log( regex.test(string3) ); // true
console.log( regex.test(string4) ); // false

Points to note

  1. What happens when you reference a group that does not exist?

    1. The regex does not fail; \1, \2 and so on then match the escaped characters \1, \2 themselves
  2. What happens when a group is followed by a quantifier?

    1. If a group is followed by a quantifier, the group's final capture is its last match

'12345'.match(/(\d)+/) // ["12345", "5", index: 0, input: "12345", groups: undefined]

/(\d)+ \1/.test('12345 1') // false
/(\d)+ \1/.test('12345 5') // true

Non-capturing parentheses

The parentheses used so far all capture the data they match for later reference, so they are called capturing groups and capturing branches.

If you only want the basic function of parentheses without referencing them, that is, the group should appear neither in API results nor in backreferences, use

non-capturing parentheses (?:p)

// Non-capturing group
let reg1 = /(?:ab)+/g
console.log('ababa abbb ababab'.match(reg1)) // ["abab", "ab", "ababab"]
// Note: because it is not a capturing group, nothing appears at index 1 of the array returned by match
let reg2 = /(?:ab)+/
console.log('ababa abbb ababab'.match(reg2)) // ["abab", index: 0, input: "ababa abbb ababab", groups: undefined]
let reg3 = /(ab)+/
console.log('ababa abbb ababab'.match(reg3)) // ["abab", "ab", index: 0, input: "ababa abbb ababab", groups: undefined]
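The distinction is not only cosmetic: some APIs behave differently depending on whether a group captures. String.prototype.split is a handy example, sketched here with a made-up input.

```javascript
// split() includes the text of capturing groups in its result...
const withCapture = 'a1b2c'.split(/(\d)/)
// ...but not the text of non-capturing groups
const withoutCapture = 'a1b2c'.split(/(?:\d)/)

console.log(withCapture)    // [ 'a', '1', 'b', '2', 'c' ]
console.log(withoutCapture) // [ 'a', 'b', 'c' ]
```

So when parentheses are needed only for grouping or branching inside a split pattern, (?:p) keeps the separators out of the output.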

Cases

1. Simulating the trim method

// 1. Extract the content in the middle using a group reference
const trim1 = (str) => {
  return str.replace(/^\s*(.*?)\s*$/, '$1')
}
// 2. Remove the leading and trailing whitespace
const trim2 = (str) => {
  return str.replace(/^\s*|\s*$/g, '')
}

2. Capitalize the first letter of each word

The key is to find the first letter of each word

// my name is epeli

const titleize = (str) => {
  return str.toLowerCase().replace(/(?:^|\s)\w/g, (c) => c.toUpperCase())
}

console.log(titleize('my name is epeli')) // My Name Is Epeli

// Extension: convert a hyphenated name to upper camel case, e.g. base-act-tab => BaseActTab
'base-act-tab'.replace(/(?:^|-)(\w)/g, ($0, $1) => $1.toUpperCase()) // BaseActTab

3. Camel case

// -moz-transform => MozTransform
const camelize = (str) => {
    return str.replace(/[-_\s]+(\w)/g, (_, $1) => $1.toUpperCase())
}

console.log(camelize('-moz-transform')) // MozTransform

4. Dasherize (hyphenated form)

// MozTransform => -moz-transform
const dasherize = (str) => {
    return str.replace(/[A-Z]/g, ($0) => ('-' + $0).toLowerCase())
}

console.log(dasherize('MozTransform')) // -moz-transform

5. HTML escaping and unescaping

// The HTML escaping rules are listed at https://blog.wpjam.com/m/character-entity/

const escapeHTML = (str) => {
    const escapeChars = {
        '<': 'lt',
        '>': 'gt',
        '"': 'quot',
        "'": '#39',
        '&': 'amp'
    }
    // Builds the character group [<>"'&]
    let regexp = new RegExp(`[${Object.keys(escapeChars).join('')}]`, 'g')

    return str.replace(regexp, (c) => `&${escapeChars[c]};`)
}

console.log(escapeHTML('<div>Blah blah blah</div>')) // &lt;div&gt;Blah blah blah&lt;/div&gt;

// Unescaping
const unescapseHTML = (str) => {
    const htmlEntities = {
        nbsp: ' ',
        lt: '<',
        gt: '>',
        quot: '"',
        amp: '&',
        apos: "'"
    }

    return str.replace(/&([^;]+);/g, ($0, $1) => {
        return htmlEntities[$1] || ''
    })
}

console.log(unescapseHTML('&lt;div&gt;Blah blah blah&lt;/div&gt;')) // <div>Blah blah blah</div>
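One thing to watch in the pair above: escapeHTML writes the apostrophe as the numeric reference &#39;, which the lookup table in unescapseHTML does not contain, so apostrophes are dropped on the way back. Below is a sketch of an unescape step that also decodes numeric references. The names unescapeEntities and namedEntities are hypothetical, and the entity table is deliberately tiny.

```javascript
// Sketch: decode a few named entities plus decimal and hex numeric references.
const namedEntities = { nbsp: '\u00a0', lt: '<', gt: '>', quot: '"', amp: '&', apos: "'" }

const unescapeEntities = (str) => {
  return str.replace(/&(#x[0-9a-fA-F]+|#\d+|[a-zA-Z]+);/g, (whole, body) => {
    if (body[0] === '#') {
      const isHex = body[1] === 'x'
      const codePoint = parseInt(body.slice(isHex ? 2 : 1), isHex ? 16 : 10)
      // Leave out-of-range references untouched instead of throwing
      return codePoint <= 0x10ffff ? String.fromCodePoint(codePoint) : whole
    }
    // Unknown names are kept as they are rather than silently removed
    return Object.prototype.hasOwnProperty.call(namedEntities, body)
      ? namedEntities[body]
      : whole
  })
}

console.log(unescapeEntities('&lt;p&gt;it&#39;s &#x2764;&lt;/p&gt;')) // <p>it's ❤</p>
console.log(unescapeEntities('&bogus;')) // &bogus;
```

Keeping unknown entities intact is a design choice of this sketch; the article's version replaces them with an empty string.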

6. Match paired tags

/* Must match
<title>regular expression</title>
<p>laoyao bye bye</p>
*/
// No g flag here: with g, test() remembers lastIndex between calls, which can give surprising results
let reg = /<([^>]+)>.*?<\/\1>/

console.log(reg.test('<title>regular expression</title>')) // true
console.log(reg.test('<p>laoyao bye bye</div>')) // false
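A word of caution when combining the g flag with test(): a global regex stores where its last match ended in lastIndex and resumes from there on the next call, even on a different string. The sketch below, added for illustration, shows the effect in isolation.

```javascript
const globalReg = /a/g

const firstCall = globalReg.test('a')  // true, lastIndex is now 1
const secondCall = globalReg.test('a') // false, the search started at index 1
const thirdCall = globalReg.test('a')  // true, the failed call reset lastIndex to 0

// Without the g flag every call starts at index 0
const plainReg = /a/
const plainCalls = [plainReg.test('a'), plainReg.test('a')]

console.log(firstCall, secondCall, thirdCall) // true false true
console.log(plainCalls) // [ true, true ]
```

For simple yes-or-no checks, leaving out the g flag avoids this state altogether.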

Until we meet again

I strongly recommend Lao Yao's mini-book on regular expressions. Only after reading it did I gradually begin to understand regular expressions and stop resisting them. This article is mainly a summary based on the content of that book.

References

  1. Full tutorial on regular expressions
  2. Regular expressions: guaranteed to learn in 30 minutes
  3. Talking about those annoying regular expressions