Preface

Several data formats and queries in my recent projects use regular expressions. I can write simple regexes, but I had never spent much time studying the complex ones I ran into. While my interest is high, I'm summarizing what I've learned. That way I no longer have to copy and paste from Google without understanding, and I won't be left just admiring the elegant regexes other people write.

Recommended tools

  1. regex, for testing expressions
  2. Regexper, for visualizing (parsing) expressions

Regexper's visualization is genuinely useful: it draws every logical branch of a regex as a diagram. For a long regex that is hard to understand, or a complex one you wrote yourself, rendering it as a diagram first and analyzing that greatly reduces the difficulty.

Regular expression basics

Character classes

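Character  Matches
[...]      Any one character inside the square brackets
[^...]     Any one character not inside the square brackets
\w         A word character, equivalent to [a-zA-Z0-9_]
\W         A non-word character, equivalent to [^a-zA-Z0-9_]
\s         Any Unicode whitespace character
\S         Any character that is not Unicode whitespace
\d         A digit, equivalent to [0-9]
\D         A non-digit, equivalent to [^0-9]
.          Any character except newline and other Unicode line terminators
\b         A word boundary: the position between a \w and a \W character (strictly an anchor, see below; inside [ ] it means backspace)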
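A quick way to check the table is to run several classes over one sample string. The sketch below is illustrative only: the string "a_1 B-2" and the helper collect are assumptions, not part of the original article. It shows in particular that \w includes the underscore and that a negated class also matches spaces and punctuation.

```javascript
// Illustrative sample string (an assumption, not from the original article).
const sample = "a_1 B-2";

// Hypothetical helper: return every match, or [] when match() returns null.
function collect(re) {
  return sample.match(re) || [];
}

const words = collect(/\w/g);       // letters, digits and "_"
const nonWords = collect(/\W/g);    // everything that is not \w
const digits = collect(/\d/g);
const spaces = collect(/\s/g);
const notAToC = collect(/[^a-c]/g); // negated class: anything except a, b, c

console.log(words);    // [ 'a', '_', '1', 'B', '2' ]
console.log(nonWords); // [ ' ', '-' ]
console.log(digits);   // [ '1', '2' ]
console.log(spaces);   // [ ' ' ]
console.log(notAToC);  // [ '_', '1', ' ', 'B', '-', '2' ]

// "." does not match a line terminator.
console.log(/./.test("\n")); // false
```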

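Repetition

Character  Meaning
{n,m}      At least n and at most m times (n <= count <= m)
{n,}       At least n times (count >= n)
{n}        Exactly n times (count === n)
?          Zero or one time, equivalent to {0,1}
+          One or more times, equivalent to {1,}
*          Zero or more times, equivalent to {0,}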

Note:

* and ? can both match zero characters, so a pattern built from them can succeed while matching nothing at all.

/a*/.test('bbbb')   //=> true
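The repetition table can be checked with a few one-line tests. This is a small sketch; the sample strings are arbitrary choices for illustration. It confirms that the {n,m} bounds are inclusive and shows what "matching nothing" looks like in practice.

```javascript
// Quantifier bounds are inclusive: {2,3} accepts 2 or 3 repetitions.
const twoToThree = /^a{2,3}$/;
const bounds = ["a", "aa", "aaa", "aaaa"].map((s) => twoToThree.test(s));
console.log(bounds); // [ false, true, true, false ]

// ? is {0,1}, + is {1,}, * is {0,}.
const optional = [/^colou?r$/.test("color"), /^colou?r$/.test("colour")];
console.log(optional); // [ true, true ]
const plusVsStar = [/^\d+$/.test(""), /^\d*$/.test("")];
console.log(plusVsStar); // [ false, true ]

// Because * may match zero characters, /a*/ "succeeds" on "bbbb"
// by matching the empty string at index 0.
const empty = "bbbb".match(/a*/);
console.log(JSON.stringify(empty[0]), empty.index); // "" 0
```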

Non-greedy repetition

By default, repetition matches as many characters as possible; this is called a “greedy” match.

A non-greedy match matches as few characters as possible. To make a repetition non-greedy, add a question mark after the quantifier: ??, +?, *? or {n,m}?.

// Greedy match
'bbbb'.match(/b+/g)   // => ["bbbb"]
// Non-greedy match
'bbbb'.match(/b+?/g)   // => ["b", "b", "b", "b"]

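A regex always looks for a match at the earliest possible position in the string, and the first match found there is the one returned. A non-greedy quantifier only shortens a match from that starting point; it never makes the engine skip ahead to a shorter match that starts later.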

'aaab'.match(/a+b/g)   // => ["aaab"]
// You might expect the non-greedy version to return 'ab', but it doesn't:
'aaab'.match(/a+?b/g)  // => ["aaab"]
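A more everyday case where greediness matters is matching tag-like text. In this sketch the HTML snippet is an assumed example input, not from the original. The greedy .+ runs to the last ">", while .+? stops at the first one. The last lines show the "earliest position wins" rule again.

```javascript
// Illustrative input, assumed for this sketch.
const html = "<b>bold</b>";

// Greedy: .+ grabs as much as possible, so the match runs to the last ">".
const greedy = html.match(/<.+>/)[0];
console.log(greedy); // <b>bold</b>

// Non-greedy: .+? stops at the first ">" that completes a match.
const lazy = html.match(/<.+?>/g);
console.log(lazy); // [ '<b>', '</b>' ]

// The earliest starting position still wins over a shorter match:
// a non-greedy pattern on "aaab" starts at index 0, not at the last "a".
const earliest = "aaab".match(/a+?b/);
console.log(earliest[0], earliest.index); // aaab 0
```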

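Alternation

Character  Meaning
|          Or: matches either the subexpression on its left or the one on its right. Alternatives are tried from left to right, and the first one that matches is used.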
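Two consequences of | are easy to trip over: the leftmost alternative that matches wins, and | has the lowest precedence, so anchors bind to a single branch unless the alternation is grouped. The sample strings below are arbitrary illustrations.

```javascript
// Alternatives are tried from left to right; the first one that
// succeeds wins, even if a later alternative would match more text.
const leftFirst = "abc".match(/a|ab/)[0];
const longerFirst = "abc".match(/ab|a/)[0];
console.log(leftFirst, longerFirst); // a ab

// | has the lowest precedence, so /^cat|dog$/ means (^cat) OR (dog$).
const ungrouped = /^cat|dog$/.test("hotdog");
// Parentheses limit the alternation so both anchors apply to each branch.
const grouped = /^(cat|dog)$/.test("hotdog");
console.log(ungrouped, grouped); // true false
```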

References

Character  Meaning
(...)      1. Groups separate items into a single subexpression so they can be treated as one unit (for example by |, *, + or ?).
           2. Defines a subpattern within the complete pattern. When the regex matches the target string, the text matched by each subpattern can be extracted.
           3. Allows a later part of the same regex to refer back to an earlier subexpression: \1 refers to the first parenthesized subexpression, \3 to the third. Because groups can be nested, they are numbered by the position of their left parenthesis.
\n         Matches the same text that the nth group matched
\k<name>   Matches the same text that the named group matched, similar to \n

Note that a backreference does not repeat the subexpression's pattern; it matches exactly the text that the referenced subexpression matched. In other words, both parts of the string must contain identical text.

For example, matching single or double quotes:

// Does not require the opening and closing quotes to match
const loosePattern = /['"][^'"]*['"]/g
// Instead, require the closing quote to be the same as the opening one
const strictPattern = /(['"])[^'"]*\1/g
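To see the difference, this sketch re-declares both quote patterns (under illustrative names) and runs them on an assumed input that contains one mismatched pair and one proper pair. Only the backreference version rejects the mismatched "b'.

```javascript
// Illustrative input: one mismatched pair ("b') and one proper pair ("d").
const text = `a "b' c "d"`;

// Loose version: the opening and closing quote may differ.
const loosePattern = /['"][^'"]*['"]/g;
// Strict version: \1 must repeat the exact quote captured by group 1.
const strictPattern = /(['"])[^'"]*\1/g;

const loose = text.match(loosePattern);   // two matches: "b'  and  "d"
const strict = text.match(strictPattern); // one match:   "d"
console.log(loose.length, strict.length); // 2 1
```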

Grouping

Character     Meaning
(...)         Capturing group: remembers the text matched by the group so it can be referenced later
(?:...)       Non-capturing group: groups only, without remembering the matched text
(?<name>...)  Named capturing group: like (...), but the group can also be referenced by name
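The three kinds of group differ mainly in what ends up in the match result. In this sketch the date string is an assumed example. Numbered groups appear by index, (?:...) leaves no entry, and named groups are also exposed on the groups property (named groups need an ES2018 engine).

```javascript
const date = "2018-06-30";

// Capturing groups are numbered by the position of their "(".
const captured = date.match(/(\d{4})-(\d{2})-(\d{2})/);
console.log(captured[1], captured[2], captured[3]); // 2018 06 30

// (?:...) groups without capturing, so it gets no number.
const partial = date.match(/(?:\d{4})-(\d{2})/);
console.log(partial.length, partial[1]); // 2 06

// (?<name>...) captures and also exposes the text on .groups (ES2018).
const named = date.match(/(?<year>\d{4})-(?<month>\d{2})/);
console.log(named.groups.year, named.groups.month); // 2018 06
```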

Anchors

Like \b, an anchor does not match any character; it specifies a position where a match may occur. These elements are called the anchors of a regular expression.

Character  Meaning
\b         A word boundary: the position between a \w and a \W character, or between a word character and the start or end of the string
\B         A position that is not a word boundary
^          Start of the string (or of each line with the m flag)
$          End of the string (or of each line with the m flag)
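Anchors are easiest to understand by counting matches on text that contains the same letters in different positions. The two-line sample below is an assumption chosen for illustration. It also shows how the m flag changes what ^ means.

```javascript
// Sample text, assumed: two lines, "cat" as a word and inside other words.
const text = "cat\ncatalog concat";

// \bcat\b: "cat" as a whole word only.
const wholeWord = text.match(/\bcat\b/g);
// \Bcat: "cat" preceded by a word character (the end of "concat").
const insideWord = text.match(/\Bcat/g);
// Without m, ^ only matches at the start of the whole string...
const startOfString = text.match(/^cat/g);
// ...with m, it also matches right after each newline.
const startOfLine = text.match(/^cat/gm);

console.log(wholeWord.length, insideWord.length);      // 1 1
console.log(startOfString.length, startOfLine.length); // 1 2
```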

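Lookaround assertions (contextual constraints)

Lookarounds check the surrounding text without consuming it, so the checked text is not part of the match.

Character  Meaning
(?=...)    Positive lookahead: the text that follows must match
(?!...)    Negative lookahead: the text that follows must not match
(?<=...)   Positive lookbehind: the text that precedes must match (ES2018)
(?<!...)   Negative lookbehind: the text that precedes must not match (ES2018)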
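Here is one example for each assertion. The size and price strings are made-up inputs. Negative assertions on digit runs are subtle, because \d+ can backtrack to a shorter run. The sketch guards against that by also excluding an adjacent digit. Lookbehind requires an ES2018 engine.

```javascript
// Sample strings, assumed for illustration.
const sizes = "10px 2em 30px";
const prices = "$20 €30 $40";

// (?=...) lookahead: digits that are followed by "px" ("px" is not consumed).
const pxValues = sizes.match(/\d+(?=px)/g);    // [ '10', '30' ]
// (?!...) negative lookahead: digits followed by neither a digit nor "px".
// The \d alternative stops the "1" of "10px" from matching on its own.
const nonPx = sizes.match(/\d+(?!\d|px)/g);    // [ '2' ]

// (?<=...) lookbehind: digits preceded by "$".
const dollars = prices.match(/(?<=\$)\d+/g);   // [ '20', '40' ]
// (?<!...) negative lookbehind: digits not preceded by "$" or another digit.
const others = prices.match(/(?<![$\d])\d+/g); // [ '30' ]
```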

Flags (modifiers)

Character  Meaning
i          Case-insensitive matching
g          Global: find all matches instead of stopping after the first one
m          Multiline: ^ matches at the start of each line and $ at the end of each line
s          (ES2018) dotAll: makes . match any character, including newlines
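All four flags can be seen on one small two-line string. This is a sketch, and the string "Hello\nhello" is an illustrative assumption. Each pair of lines contrasts a pattern with and without the flag.

```javascript
const text = "Hello\nhello";

// Without g, match() stops at the first match (the lowercase one, index 6).
const firstOnly = text.match(/hello/);
// i + g: every match, ignoring case.
const caseless = text.match(/hello/gi);
// m: ^ and $ also match at line boundaries.
const noMultiline = /^hello$/i.test(text);
const multiline = /^hello$/im.test(text);
// s: "." also matches the newline.
const noDotAll = /Hello.hello/.test(text);
const dotAll = /Hello.hello/s.test(text);

console.log(firstOnly.index, caseless);  // 6 [ 'Hello', 'hello' ]
console.log(noMultiline, multiline);     // false true
console.log(noDotAll, dotAll);           // false true
```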

JavaScript Basics

String methods for pattern matching

String.prototype.search()

Returns the index of the first match in the string, or -1 if there is no match. Note: search() does not support global search; the g flag is ignored and only the position of the first match is returned.

"This is a test text".search(/th/i)  // => 0
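The note about the g flag can be checked directly. This sketch uses an assumed sample string. It sets lastIndex on a global regex and shows that search() still reports the first match and leaves lastIndex as it was.

```javascript
const text = "This is a test";
const re = /t/g;
re.lastIndex = 5;

// search() ignores the g flag and lastIndex: it always reports the first match.
const firstT = text.search(re);
// It also restores lastIndex to the value it had before the call.
const afterSearch = re.lastIndex;
// -1 means no match.
const missing = text.search(/z/);

console.log(firstT, afterSearch, missing); // 10 5 -1
```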

String.prototype.replace()

The replace() method of a string replaces the matched text. It takes two arguments: the first is a regular expression describing the search pattern, and the second is the replacement.

var str = ' #id div.class ';

str.replace(/^\s+|\s+$/g, '') // "#id div.class"

In the second argument of replace(), the dollar sign $ can be used to refer to parts of the match.

Symbol   Meaning
$&       The matched substring
$`       The text before the match
$'       The text after the match
$n       The text captured by the nth group (n starts at 1)
$<name>  The text captured by the named group
$$       A literal dollar sign
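Each replacement pattern from the table is shown once below. The inputs are arbitrary examples chosen so that each result is easy to read.

```javascript
const fullName = "John Smith";

// $1, $2: numbered groups.
const swapped = fullName.replace(/(\w+)\s(\w+)/, "$2, $1");
// $<name>: named groups.
const renamed = fullName.replace(/(?<first>\w+)\s(?<last>\w+)/, "$<last> $<first>");
// $&: the whole match.
const bracketed = "price: 5".replace(/\d+/, "[$&]");
// $` and $': the text before and after the match.
const context = "abc".replace(/b/, "[$`|$']");
// $$: a literal dollar sign (followed here by $& for the match).
const dollar = "5".replace(/\d/, "$$$&");

console.log(swapped);   // Smith, John
console.log(renamed);   // Smith John
console.log(bracketed); // price: [5]
console.log(context);   // a[a|c]c
console.log(dollar);    // $5
```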

The second argument to the replace method can also be a function that replaces each match with the return value of the function.

var a = 'The quick brown fox jumped over the lazy dog.';
var pattern = /quick|brown|lazy/ig;

a.replace(pattern, function replacer(match) {
  return match.toUpperCase();
});
// => "The QUICK BROWN fox jumped over the LAZY dog."
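The replacement function also receives more than the match. The arguments are the whole match, then each capture group, then the offset of the match and the whole string. A minimal sketch, where toCamelCase is a hypothetical helper written for illustration:

```javascript
// Hypothetical helper: kebab-case to camelCase.
// The callback receives the whole match first, then each capture group.
function toCamelCase(str) {
  return str.replace(/-(\w)/g, (match, letter) => letter.toUpperCase());
}
console.log(toCamelCase("background-color")); // backgroundColor

// With no groups, the argument after the match is its offset.
const marked = "a-b-c".replace(/-/g, (match, offset) => `[${offset}]`);
console.log(marked); // a[1]b[3]c

// The return value is converted to a string, so arithmetic works too.
const doubled = "3 and 5".replace(/\d+/g, (match) => match * 2);
console.log(doubled); // 6 and 10
```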

String.prototype.match()

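Returns an array of matches, or null if there is no match. The g flag is honored: with g, the array holds every full match; without it, the result is the same as exec() (the first match plus its groups).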
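The difference between match() with and without g is easiest to see with a capture group. This sketch uses an arbitrary sample string. Note that a failed match returns null rather than an empty array.

```javascript
const s = "x1 y2 z3";

// Without g: the first match plus its groups, like exec().
const first = s.match(/[a-z](\d)/);
console.log(first[0], first[1], first.index); // x1 1 0

// With g: every full match, but the capture groups are dropped.
const all = s.match(/[a-z](\d)/g);
console.log(all); // [ 'x1', 'y2', 'z3' ]

// No match gives null, not [], so guard before using the result.
const none = s.match(/q/g) || [];
console.log(none.length); // 0
```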

String.prototype.split()

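Splits the string at each match of separator, which may be a string or a regular expression, and returns the pieces as an array. The optional limit caps the number of elements returned.

str.split(separator[, limit])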
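A regex separator is what makes split() flexible. The inputs in this sketch are made up. It splits on digit runs and on several separator characters at once. It also shows that captured separators are kept in the result and that limit truncates the output.

```javascript
// Split on runs of digits.
const parts = "a1b22c333d".split(/\d+/);
console.log(parts); // [ 'a', 'b', 'c', 'd' ]

// A character class handles several separators at once.
const tokens = "a, b;c  d".split(/[,;\s]+/);
console.log(tokens); // [ 'a', 'b', 'c', 'd' ]

// Capturing groups in the separator are included in the result.
const withSeparators = "a1b2c".split(/(\d)/);
console.log(withSeparators); // [ 'a', '1', 'b', '2', 'c' ]

// limit caps the number of elements returned.
const limited = "a,b,c,d".split(",", 2);
console.log(limited); // [ 'a', 'b' ]
```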

The RegExp object

RegExp.prototype.test()

Returns a boolean indicating whether the pattern matches the string.

If the regex has the g flag, each call to test() starts searching where the previous match ended.

With the g flag, you can also set the regex's lastIndex property to choose where the search starts.

var r = /x/g;
var s = '_x_x';

r.lastIndex = 4;
r.test(s) // false

If the pattern is empty (for example new RegExp('')), it matches every string.
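Because a global regex carries state in lastIndex, calling test() repeatedly on the same string can alternate between true and false. This is a common source of bugs. The sketch below reuses the "_x_x" example and records lastIndex after each call. The last line checks the empty-pattern case.

```javascript
// With g, test() is stateful: lastIndex advances after each success
// and resets to 0 after a failure.
const re = /x/g;
const s = "_x_x";
const runs = [];
for (let i = 0; i < 4; i++) {
  const found = re.test(s);
  runs.push([found, re.lastIndex]);
}
console.log(JSON.stringify(runs)); // [[true,2],[true,4],[false,0],[true,2]]

// An empty pattern matches any string (it matches the empty string at 0).
const emptyMatches = new RegExp("").test("abc");
console.log(emptyMatches); // true
```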

RegExp.prototype.exec()

The exec() method of a regex instance returns match details: an array describing the match if one is found, or null otherwise.

var s = '_x_x';
var r1 = /x/;
var r2 = /y/;

r1.exec(s) // ["x"]
r2.exec(s) // null

If the regex contains parentheses (capturing groups), the returned array has more than one element. The first element is the whole match, and the following elements are the text captured by each group: the second element corresponds to the first pair of parentheses, the third to the second, and so on. The length of the array is therefore the number of groups plus one.

var s = '_x_x';
var r = /_(x)/;

r.exec(s) // ["_x", "x"]

The array returned by exec() also has these two properties:

  1. input: the entire input string.
  2. index: the position (counting from 0) where the whole match starts.

If the regex has the g flag, exec() can be called repeatedly; each search starts where the previous successful match ended.
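Repeated exec() calls are usually written as a loop that stops at null. This is a minimal sketch; findAll is a hypothetical helper, not a built-in. It also handles empty matches, which would otherwise leave lastIndex unchanged and loop forever.

```javascript
// Hypothetical helper: collect every match of a global regex with exec().
function findAll(re, str) {
  if (!re.global) throw new TypeError("findAll expects a regex with the g flag");
  const found = [];
  re.lastIndex = 0; // start from the beginning regardless of earlier use
  let m;
  while ((m = re.exec(str)) !== null) {
    found.push({ text: m[0], group: m[1], index: m.index });
    // An empty match would not advance lastIndex, so step past it manually.
    if (m[0] === "") re.lastIndex++;
  }
  return found;
}

console.log(JSON.stringify(findAll(/_(x)/g, "_x_x")));
// [{"text":"_x","group":"x","index":0},{"text":"_x","group":"x","index":2}]
```

In engines that support ES2020, String.prototype.matchAll() returns an iterator over the same match arrays and avoids managing lastIndex by hand.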

New RegExp features in ES2018

After finishing this article, I learned about the new regex features in ES2018, so while the topic was fresh I added them here.

1. The s (dotAll) flag

The dot (.) is a special character in regex patterns that matches any single character except line terminators such as newline. To match every character including newlines, we previously needed workarounds such as [\d\D].

ES2018 introduces a mode in which the dot matches everything, newlines included. It is enabled per regex with the s flag:

const regold = /test.test/
console.log(regold.test('test\ntest')) // => false

const  reg = /test.test/s
console.log(reg.test('test\ntest')) // => true

2. Named capture groups

Symbol        Meaning
(?<name>...)  Defines a capturing group called name
\k<name>      Inside the pattern, matches the same text the named group matched, similar to \n
$<name>       In a replace() replacement string, inserts the text captured by the named group
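The three named-group forms from the table are shown together below. The date format and sample sentences are assumptions for illustration.

```javascript
const dateRe = /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/;

// (?<name>...): read captured text by name from .groups.
const { year, month, day } = "2018-06-30".match(dateRe).groups;
console.log(year, month, day); // 2018 06 30

// $<name>: use named groups in a replacement string.
const usDate = "2018-06-30".replace(dateRe, "$<month>/$<day>/$<year>");
console.log(usDate); // 06/30/2018

// \k<name>: backreference to a named group inside the pattern.
const repeated = /\b(?<word>\w+) \k<word>\b/;
const hasRepeat = repeated.test("it is is here");
const noRepeat = repeated.test("it is here");
console.log(hasRepeat, noRepeat); // true false
```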

3. Lookbehind assertions

ES2018 adds the lookbehind assertions (?<=...) and (?<!...), already listed earlier in the table of contextual constraints.

4. Unicode property escapes

Character  Meaning
\p{...}    Matches a character that has the given Unicode property; requires the u flag. Examples: /\p{Number}/u matches any Unicode number character, and /\p{Alphabetic}/u matches any Unicode alphabetic character.
\P{...}    Negated form: matches a character that does not have the property
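Property escapes matter most once text goes beyond ASCII. In this sketch the sample text is assumed. It contains a Spanish word and an Arabic-Indic digit, written as \u0663 to avoid display issues. \p{Number} picks up that digit while \d does not.

```javascript
// Sample text, assumed: "Año", ASCII digits and an Arabic-Indic digit (U+0663).
const text = "A\u00f1o 2018 \u0663 cats";

// \p{...} needs the u flag.
const letters = text.match(/\p{Alphabetic}+/gu);
console.log(letters); // [ 'Año', 'cats' ]

const numbers = text.match(/\p{Number}/gu);
console.log(numbers.length); // 5 (2, 0, 1, 8 and U+0663)

// \d only covers ASCII 0-9, so it misses U+0663.
const asciiDigits = text.match(/\d/g);
console.log(asciiDigits.length); // 4

// \P{...} is the negation.
const notNumber = /\P{Number}/u.test("\u0663");
console.log(notNumber); // false
```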

Exercises

That covers the basics. The best way to remember them is to practice.

  1. Match the three digits at the beginning of a string.
  2. Match any character except a, b and c.
  3. What does '1234567'.match(/\d{1,3}/g) return? (Think about greedy matching.)
  4. Match words that do not begin with “th”.
  5. Remove whitespace from the beginning and end of a string.
  6. Format a number with thousands separators (a comma every three digits).
  7. Validate passwords: they must be 4 to 8 characters long and contain at least one digit.
  8. Get the value of a given query parameter in a URL.

There are also some websites for regex practice:

regexone.com/

callumacrae.github.io/regex-tuesd…

If you know any interesting regex problems, please share them.
