As far as I know, regular expressions are a weak spot for many front-end developers. This may be because front-end work rarely calls for them, and the ones that do come up are fairly simple. As a result, front-end developers tend to greatly underestimate how powerful regular expressions are.

Every time I needed a regex, I had to dig through my bookmarked articles before I could write the one I wanted, so my understanding of regex was only partial. A few days ago I happened to be working on a rich text editor that needed some slightly complex regexes, so I went through my bookmarks again. This time it felt different.

Common characters

Regular expressions have many characters to memorize. Here is a list of the common ones and their meanings:

Common matching characters and their meanings:
  • [0-9]: matches a single digit from 0 to 9
  • [a-z]: matches a single lowercase letter
  • [A-Z]: matches a single uppercase letter
  • \s: matches any whitespace character, such as a space, tab, or newline
  • \n: matches a newline character
  • \b: matches a word boundary, i.e. the position between a word character and a non-word character (such as a space)

Special characters, their meanings, and usage:
  • ^: 1. Matches the start of the input string: /^http/ matches a string starting with http. 2. Inside brackets it negates the set: /[^a-zA-Z]/ matches a non-letter.
  • $: matches the end of the input string: /\.com$/ matches a string ending in .com
  • |: matches either alternative: /a|b/ matches a or b
  • .: matches any single character except the newline \n: /./ matches any character other than a newline
  • []: matches one character from the set: /[aeiou]/ matches one of a, e, i, o, u
  • (): groups a subexpression; the matched text can be referenced later
  • {}: sets how many times the preceding expression repeats: {n} exactly n times, {n,} at least n times, {n,m} between n and m times (no space after the comma)
  • +: matches the preceding subexpression one or more times: /[0-9]+/ matches one or more digits
  • *: matches the preceding subexpression zero or more times: /[0-9]*/ matches zero or more digits
  • ?: 1. Matches the preceding subexpression zero or one time: /[0-9]?/. 2. Placed after a quantifier, it makes the quantifier non-greedy: /<.*?>/ matches a single tag such as <p>.
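To see several of these characters working together, here is a minimal sketch of a validator for CSS hex colors such as #fff or #1a2b3c. The function name isHexColor is hypothetical. It only illustrates anchors, character classes, a group, alternation and {n}:

```javascript
// Hypothetical helper: checks whether a string is a 3- or 6-digit hex color.
function isHexColor(value) {
  // ^ and $ anchor the whole string, [0-9a-fA-F]{3} or {6} counts the digits,
  // and the group with | chooses between the two lengths.
  return /^#([0-9a-fA-F]{3}|[0-9a-fA-F]{6})$/.test(value)
}

console.log(isHexColor('#fff'))    // true
console.log(isHexColor('#1a2b3c')) // true
console.log(isHexColor('#12345'))  // false
```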
To match a special character literally, escape it with a backslash. The characters that need escaping are:
* . ? + $ ^ [ ] ( ) { } | \ /
  • The slash / must be escaped in a literal, but not in the constructor. Both of the following match a slash:
const slashLiteral = /\//
const slashCtor = new RegExp('/')
  • An escape that takes one backslash in a literal takes two (\\) in the constructor, because the string literal consumes one of them. Both of the following match a literal dot:
const dotLiteral = /\./
const dotCtor = new RegExp('\\.')
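Escaping by hand gets error-prone when the pattern comes from user input. A common approach is to escape every special character with a single replace call. The helper name escapeRegExp is only illustrative. It leaves / alone because the constructor does not need it escaped:

```javascript
// Hypothetical helper: escapes characters with special meaning in a regex
// so the result matches the input string literally (for use with new RegExp).
function escapeRegExp(text) {
  // $& in the replacement inserts the whole matched character,
  // so each special character gets a backslash in front of it.
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&')
}

const pattern = new RegExp(escapeRegExp('1.5+2'))
console.log(pattern.test('total: 1.5+2')) // true
console.log(pattern.test('1x5+2'))        // false
```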

Commonly used methods

In JS, a regular expression can be created with a literal or with the RegExp constructor:

// Literal
const regLiteral = /[0-9a-z]/g
// Constructor
const regCtor = new RegExp('[0-9a-z]', 'g')

A literal cannot contain variables, but the constructor can build a pattern from them:

const name = 'Illyere'
const reg = new RegExp(`My name is ${name}`)
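Because the constructor takes a string, every backslash in the pattern has to be doubled. The sketch below builds a whole-word, case-insensitive counter from a variable. countWord is a hypothetical name, and the sketch assumes the word contains only letters, digits or underscores. Any other input would need escaping first:

```javascript
// Hypothetical helper: counts whole-word, case-insensitive occurrences of `word`.
// Assumes `word` contains only word characters (letters, digits, underscore).
function countWord(text, word) {
  // '\\b' in the template literal becomes \b (word boundary) in the pattern.
  const reg = new RegExp(`\\b${word}\\b`, 'gi')
  const matches = text.match(reg)
  // match returns null when nothing is found
  return matches ? matches.length : 0
}

console.log(countWord('Cat, cat and concat', 'cat')) // 2
```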

reg.test(str) is often used to check whether a string contains a match for the regex:

const reg = /[0-9]/
const str = 'Does the text contain numbers such as 1234?'
if (reg.test(str)) {
  // the string contains at least one digit
}
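One pitfall is worth knowing here. A regex with the g flag is stateful: test advances its lastIndex property, so calling test repeatedly on the same regex object can give different answers for the same string. A minimal demonstration:

```javascript
const globalReg = /[0-9]/g
const input = 'a1'

// The first call finds '1' and sets lastIndex to 2.
const first = globalReg.test(input)
// The second call starts searching at index 2, finds nothing and resets lastIndex to 0.
const second = globalReg.test(input)

console.log(first, second) // true false

// Without the g flag, test always searches from the start.
const plainReg = /[0-9]/
console.log(plainReg.test(input), plainReg.test(input)) // true true
```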

str.replace(reg, replacement) is commonly used to replace the matched content of a string:

const reg = /[0-9]/g
const str = "Replace the number 1234 in the text with x's."
const newStr = str.replace(reg, 'x') // "Replace the number xxxx in the text with x's."
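The g flag in the example above matters. Without it, replace only replaces the first match. A quick comparison:

```javascript
const text = 'a1b2c3'

// Without g, only the first match is replaced.
const once = text.replace(/[0-9]/, 'x')
// With g, every match is replaced.
const all = text.replace(/[0-9]/g, 'x')

console.log(once) // 'axb2c3'
console.log(all)  // 'axbxcx'
```

str.replaceAll also accepts a regex, but only one with the g flag. Passing a regex without g throws a TypeError.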

str.match(reg) is used to retrieve matches (reg.exec(str) can be used as well):

const reg = /[0-9]+[.][0-9]+[.][0-9]+/g
const str = "Here's a table called 11.11.11."
str.match(reg) // ['11.11.11']
  • If the regex has the g flag, match returns all substrings that match the full regex, but not the capture groups.
  • Without the g flag, match returns only the first full match and its capture groups, as an array. In that case the array also carries extra properties: index (where the match starts), input (the string that was searched), and groups (the named capture groups, or undefined).
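When you need every match and also its capture groups, str.matchAll(reg) returns an iterator of match arrays. It requires the g flag. A sketch on a made-up version string:

```javascript
const log = 'node 18.17.0, npm 9.6.7'
const reg = /(\d+)\.(\d+)\.(\d+)/g

// With g, match returns only the full matches.
console.log(log.match(reg)) // ['18.17.0', '9.6.7']

// matchAll keeps the capture groups of every match; m[1] is the first group.
const majors = [...log.matchAll(reg)].map((m) => m[1])
console.log(majors) // ['18', '9']
```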

And now, the good part:

Greedy & non-greedy

The * and + quantifiers are greedy: they match as many characters as possible. Adding a ? after them makes them non-greedy (lazy), so they match as few characters as possible.

  • Greedy (the default)
const str = '<h1>Regular expression</h1>'
const reg = /<.*>/
str.match(reg) // ['<h1>Regular expression</h1>']
  • Non-greedy
const str = '<h1>Regular expression</h1>'
const reg = /<.*?>/
str.match(reg) // ['<h1>']
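With the g flag the difference is even clearer. The greedy pattern swallows everything between the first < and the last >, while the lazy one stops at each >. A small comparison:

```javascript
const html = '<b>bold</b> and <i>italic</i>'

// Greedy: .* runs on to the last '>' in the string.
const greedy = html.match(/<.*>/g)
// Non-greedy: .*? stops at the first '>' after each '<'.
const lazy = html.match(/<.*?>/g)

console.log(greedy) // ['<b>bold</b> and <i>italic</i>']
console.log(lazy)   // ['<b>', '</b>', '<i>', '</i>']
```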

Capture groups and backreferences

Subexpressions enclosed in parentheses () are captured as groups that can be referenced later. Suppose you want to extract the h1 element from some HTML:

  • Inside a regex, \n refers to the nth capture group
const str = '<h1>Regular expression</h1><p>Regular expression</p><p>Regular expression</p>'
const reg = /<(h1)>.+?<\/\1>/
str.match(reg) // ['<h1>Regular expression</h1>', 'h1']
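Backreferences are not limited to tag names. As another sketch, \1 can find an accidentally doubled word. findDoubledWords is a hypothetical helper:

```javascript
// Hypothetical helper: returns every word that is immediately repeated.
function findDoubledWords(text) {
  // (\w+) captures a word, \s+ the gap, and \1 requires the same word again.
  const reg = /\b(\w+)\s+\1\b/gi
  return [...text.matchAll(reg)].map((m) => m[1])
}

console.log(findDoubledWords('This is is a test of the the regex')) // ['is', 'the']
```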
  • Outside a regex, $n (or the legacy RegExp.$n) refers to the nth capture group
const str = 'abc'
const reg = /(abc)/
reg.test(str)
RegExp.$1 // 'abc' (set by the last successful match)
str.replace(reg, '$1$1') // 'abcabc'
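Replacement patterns are handy for reordering text. Here is a sketch that swaps a first and last name twice, once with numbered groups and once with named groups. Named groups use the (?<name>...) syntax in the pattern and $<name> in the replacement, and are available since ES2018:

```javascript
const fullName = 'John Smith'

// Numbered groups: $1 is the first name, $2 the last name.
const byIndex = fullName.replace(/(\w+)\s(\w+)/, '$2, $1')

// Named groups make the replacement self-describing.
const byName = fullName.replace(/(?<first>\w+)\s(?<last>\w+)/, '$<last>, $<first>')

console.log(byIndex) // 'Smith, John'
console.log(byName)  // 'Smith, John'
```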

Non-capturing groups and lookaround

Because a capture group () stores each captured result so it can be referenced, it uses extra memory. If you only need the grouping itself and not the stored result, use a non-capturing group (?:):

const str = 'abc'
const reg = /(?:abc)/
reg.test(str)
RegExp.$1 // '' (nothing was captured)
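The difference also shows up in match results. A (?:...) group still groups, for example so that a quantifier applies to the whole thing, but it does not produce an entry in the result array. A small comparison:

```javascript
const str = 'abcabc'

// Capturing: the group appears as an extra entry in the result.
const captured = str.match(/(abc)+/)
// Non-capturing: same overall match, but no group entry.
const notCaptured = str.match(/(?:abc)+/)

console.log(captured.length)    // 2, i.e. ['abcabc', 'abc']
console.log(notCaptured.length) // 1, i.e. ['abcabc']
```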

Besides (?:), the lookaround groups (?=), (?<=), (?!) and (?<!) are also non-capturing.

Lookahead

Lookahead restricts what comes after the match.

  1. (?=): positive lookahead. Finds matches that are followed by the pattern inside (?=...). The text matched by the lookahead is not included in the output.
const str = 'a.png b.jpg c.gif d.svg'
// Find every lowercase letter at a word boundary that is followed by .svg
const reg = /\b[a-z](?=\.svg)/g
str.match(reg) // ['d']
  2. (?!): negative lookahead. Finds matches that are not followed by the pattern inside (?!...). It is not included in the output either.
const str = 'a.png b.jpg c.gif d.svg'
// Find letters followed by a three-letter extension (\.[a-z]{3}) other than .svg
const reg = /\b[a-z](?!\.svg)(?=\.[a-z]{3})/g
str.match(reg) // ['a', 'b', 'c']
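Positive and negative lookahead are often combined. A well-known example inserts thousands separators into a digit string. The pattern \B(?=(\d{3})+(?!\d)) matches each position inside the number that is followed by a multiple of three digits up to the end of the number. Here is a sketch with the hypothetical name formatThousands, which assumes a plain non-negative integer string:

```javascript
// Hypothetical helper: adds commas to a string of digits.
// Assumes the input is a plain non-negative integer string.
function formatThousands(digits) {
  // \B: a position inside the number, never before its first digit.
  // (?=(\d{3})+(?!\d)): followed by groups of exactly three digits until the digits end.
  return digits.replace(/\B(?=(\d{3})+(?!\d))/g, ',')
}

console.log(formatThousands('1234567')) // '1,234,567'
console.log(formatThousands('123'))     // '123'
```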

Lookbehind

Lookbehind restricts what comes before the match.

  1. (?<=): positive lookbehind. Finds matches that are preceded by the pattern inside (?<=...). The lookbehind text is not included in the output.
const str = '1. 1111; 2. 2222; 3. 3333; 4. 4444.'
// Find each item that follows a sequence number
const reg = /(?<=\b[0-9]+\.\s).+?[;.]/g
str.match(reg) // ['1111;', '2222;', '3333;', '4444.']
  2. (?<!): negative lookbehind. Finds matches that are not preceded by the pattern inside (?<!...). It is not included in the output either.
const str = 'a.png b.jpg c.gif d.svg'
// Find extensions whose file name is not a, b, or c
const reg = /\b(?<![abc]\.)[a-z]{3}/g
str.match(reg) // ['svg']
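A practical use of lookbehind is to pull out values that follow a known marker without including the marker itself. Here is a sketch that extracts dollar amounts from some made-up text:

```javascript
const receipt = 'Tea $3.50, cake $4, 2 spoons'

// (?<=\$) requires a '$' right before the number but leaves it out of the result.
const prices = receipt.match(/(?<=\$)\d+(?:\.\d+)?/g)

console.log(prices) // ['3.50', '4']
```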

Some examples you can actually use

In practice, a slightly complex regex usually combines several of these features. Here are a few examples:

Using lookahead and lookbehind together:

Suppose you want to extract the link in the data-img-url attribute of an <img> tag. The link must be preceded by data-img-url=" and followed by " (matched non-greedily).

const str = '<img crossorigin src="https://abcdefg.com" data-img-url="https://test.com">'
const dataImgUrl = 'data-img-url'
const reg = new RegExp(`(?<=${dataImgUrl}=").+?(?=")`, 'g')
str.match(reg) // ['https://test.com']
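The same idea can be wrapped in a reusable helper. This sketch uses the hypothetical name getAttrValues. It assumes the attribute name contains no regex special characters and that values are wrapped in double quotes. The \s added to the lookbehind keeps a short name such as url from matching the end of data-img-url:

```javascript
// Hypothetical helper: returns all double-quoted values of `attr` in an HTML string.
// Assumes `attr` has no regex special characters (letters and '-' are fine).
function getAttrValues(html, attr) {
  // (?<=\sattr=") is the prefix, .+? the value, (?=") the closing quote.
  const reg = new RegExp(`(?<=\\s${attr}=").+?(?=")`, 'g')
  return html.match(reg) || []
}

const img = '<img src="a.png" data-img-url="https://test.com">'
console.log(getAttrValues(img, 'data-img-url')) // ['https://test.com']
console.log(getAttrValues(img, 'url'))          // []
```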

Backreferences and non-greedy matching

Suppose I want the text of a piece of HTML, but without the text inside any tag that carries the not-show-in-text attribute. I can do this:

const notShowInText = 'not-show-in-text'
const html = `
  <p>test1</p>
  <p ${notShowInText} style="text-align: center;"><b>Table 1.4.4 Test table</b></p>
  <p>test2</p>`
const reg = new RegExp(`<([a-z][a-z1-6]*?)[^>]+${notShowInText}[\\s\\S]+?<\\/\\1>`, 'g')
const text = html.replace(reg, '').replace(/<[^>]+>/g, '')

The key is to match the entire tag that carries not-show-in-text. ([a-z][a-z1-6]*?) matches the tag name non-greedily, [^>]+ keeps everything between the tag name and not-show-in-text inside the opening tag, <\/\1> matches the corresponding closing tag, and [\s\S]+? non-greedily matches everything inside the tag, newlines included. In the constructor string above, these backslashes are doubled.
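This approach has one limitation. The non-greedy [\s\S]+? stops at the first closing tag with the same name, so it cannot handle a hidden tag that contains a nested tag of the same name. The sketch below reuses the pattern from the example, with the attribute name written inline, to show the hidden text leaking through. For arbitrary HTML, a real parser such as the browser's DOMParser is the safer choice:

```javascript
// Same pattern as in the example, with the attribute name written out.
const hiddenTag = /<([a-z][a-z1-6]*?)[^>]+not-show-in-text[\s\S]+?<\/\1>/g

const nested = '<div not-show-in-text><div>a</div>b</div>c'

// The lazy match ends at the first </div>, so "b" (which should be hidden) survives.
const visible = nested.replace(hiddenTag, '').replace(/<[^>]+>/g, '')

console.log(visible) // 'bc'
```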

The second argument to replace can be a callback function

For example, how would you convert a date in the format YYYY-MM-DD to MM/DD/YYYY?

var regex = /(\d{4})-(\d{2})-(\d{2})/;
var string = "2017-06-12";
var result = string.replace(regex, "$2/$3/$1");
console.log(result); // "06/12/2017"

In the second argument of replace, $1, $2, and $3 refer to the corresponding groups. This is equivalent to the following:

var regex = /(\d{4})-(\d{2})-(\d{2})/;
var string = "2017-06-12";
var result = string.replace(regex, function() {
	return RegExp.$2 + "/" + RegExp.$3 + "/" + RegExp.$1;
});
console.log(result); // "06/12/2017"

It is also equivalent to:

var regex = /(\d{4})-(\d{2})-(\d{2})/;
var string = "2017-06-12";
var result = string.replace(regex, function(match, year, month, day) {
	return month + "/" + day + "/" + year;
});
console.log(result); // "06/12/2017"
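Because the callback receives the groups as arguments, it can also compute the replacement instead of just reordering the groups. Here is a sketch that converts px values to rem. The name pxToRem and the 16px root font size are illustrative assumptions:

```javascript
// Hypothetical helper: converts every "<number>px" in a CSS string to rem.
// Assumes a root font size of 16px unless another base is passed.
function pxToRem(css, base = 16) {
  // The callback receives the full match first, then each capture group.
  return css.replace(/(\d+(?:\.\d+)?)px/g, (match, value) => `${Number(value) / base}rem`)
}

console.log(pxToRem('margin: 16px 24px;')) // 'margin: 1rem 1.5rem;'
```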

References

  • Don’t memorize regular expressions
  • Functions of regular expression parentheses