This article is based mainly on "JavaScript: The Definitive Guide" (6th edition). I am simply relaying the material while sorting out my own understanding, so please point out any mistakes. Thank you!

Defining regular expressions

Overview

We won't spend time on the concept of regular expressions here. Pattern matching with regular expressions in JavaScript relies on RegExp objects, which can be created in two ways.

1. Regular expression literals (characters enclosed between a pair of /)

var reg = /java/;
reg.test('java'); // true

2. new RegExp()

var reg = new RegExp('java');
reg.test('java'); // true

JavaScript also gives String several methods for pattern matching, but these methods rely on RegExp as well. For example:

var str = 'java';
str.match(/java/); // /java/ is a regular expression literal

The following sections explain how to define regular expressions. The examples use the test and exec methods of RegExp; if you are not familiar with RegExp, read these sections together with the last section of the article.

Regular expressions – literal characters

Literal characters are characters that stand for themselves in a regular expression. Special characters need to be escaped.

Letters and digits -> themselves
\0 -> NUL character (\u0000)
\t -> tab (\u0009)
\n -> newline (\u000A)
\v -> vertical tab (\u000B)
\f -> form feed (\u000C)
\r -> carriage return (\u000D)
...

Regular expressions – character classes

Literal characters placed inside [] form a character class, which matches any one of the characters it contains

[abc] // Matches a, b, or c
[^abc] // ^ negates the class: any character except a, b, and c
[a-z] // - denotes a range: any character from a to z

As shown above, [0-9] matches any digit. For commonly used character classes like this one, JavaScript provides special escape sequences.

. // Any character except newline and other Unicode line terminators
\w // Equivalent to [a-zA-Z0-9_]: any of the 63 characters made up of uppercase and lowercase letters, digits, and the underscore
\W // Equivalent to [^a-zA-Z0-9_]
\s // Any Unicode whitespace character
\S // Any Unicode non-whitespace character
\d // Equivalent to [0-9]
\D // Equivalent to [^0-9]
[\b] // Inside [], \b stands for a literal backspace
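
A minimal sketch of the character classes above in use. The sample strings and variable names are made up for illustration and do not come from the original text.

```javascript
// Illustrative inputs only; none of these strings come from the article.
const hexDigit = /[0-9a-fA-F]/;   // custom class built from three ranges
const notVowel = /[^aeiou]/;      // negated class
const wordOnly = /^\w+$/;         // whole string made of letters, digits, underscores

const classResults = {
  hex: hexDigit.test('z9'),                 // true: '9' belongs to the class
  onlyVowels: notVowel.test('aeiou'),       // false: every character is excluded
  identifier: wordOnly.test('user_name1'),  // true
  withDash: wordOnly.test('user-name'),     // false: '-' is not matched by \w
  digitRuns: '2024-01-15'.match(/\d+/g),    // ['2024', '01', '15']
};
console.log(classResults);
```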

Regular expressions – repetition

Repetition describes how many times the preceding item may occur in a row

{n,m} // Repeat at least n times and at most m times
{n,} // Repeat at least n times
{n} // Repeat exactly n times
? // Equivalent to {0,1}
+ // Equivalent to {1,}
* // Equivalent to {0,}

For example:

var reg = new RegExp('a{2,}');
var str = 'aaa';
var str2 = 'a';
reg.test(str); // true
reg.test(str2); // false

In the example above, reg.test(str) === true. If we use reg.exec(str) to get the match, the result is 'aaa', because repetition is greedy by default. Now suppose we use non-greedy repetition instead:

var reg = new RegExp('a{2,}?');
var str = 'aaa';
reg.exec(str);

In this case, the result is 'aa'. Non-greedy repetition is easy to write: append ? to the repetition, and the regular expression then matches as few repetitions as possible.

{n,m} -> {n,m}?
{n,} -> {n,}?
{n} -> {n}?
? -> ??
+ -> +?
* -> *?
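
To make the difference concrete, here is a small sketch comparing greedy and non-greedy repetition. The markup string and variable names are illustrative assumptions. The last line shows a common surprise: a non-greedy pattern still matches starting from the earliest possible position, so it is not always the shortest substring in the text.

```javascript
// Illustrative input; not taken from the article.
const markup = '<b>one</b><b>two</b>';

// Greedy: .* grabs as much as possible, so the match runs to the last </b>.
const greedyMatch = markup.match(/<b>.*<\/b>/)[0];   // '<b>one</b><b>two</b>'

// Non-greedy: .*? stops at the first </b> that lets the pattern succeed.
const lazyMatch = markup.match(/<b>.*?<\/b>/)[0];    // '<b>one</b>'

// The match still begins at the first 'a', so all three are consumed.
const stillLong = 'aaab'.match(/a+?b/)[0];           // 'aaab'

console.log(greedyMatch, lazyMatch, stillLong);
```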

Alternation

Use | to separate alternatives. The alternatives are tried from left to right.

ab|cd|ef // Matches ab, cd, or ef
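
The left-to-right priority matters when one alternative is a prefix of another. A minimal sketch, with made-up variable names:

```javascript
// The leftmost alternative that matches wins, even if a later one would match more text.
const shortFirst = /a|ab/.exec('ab')[0];  // 'a'
const longFirst = /ab|a/.exec('ab')[0];   // 'ab'

console.log(shortFirst, longFirst);
```

If the longer match is the one you want, put the longer alternative first.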

Subpatterns

Wrapping part of a regular expression in parentheses creates a subpattern. We can refer back to a subpattern later in the same regular expression by writing \index, where index is the number of the subpattern, starting at 1.

var reg1 = /(java)script and \1/;
var reg2 = /javascript and java/;
// reg1 and reg2 are basically equivalent regular expression literals
var str = 'javascript and java';
reg1.test(str); // true
reg2.test(str); // true

Subpatterns also let us extract the text matched by a subexpression. Change the example above to use the exec method and print the results:

var reg1 = /(java)script and \1/;
var reg2 = /javascript and java/;
var str = 'javascript and java';
console.log(reg1.exec(str));
console.log(reg2.exec(str));

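The output (abbreviated) is:

['javascript and java', 'java', index: 0, input: 'javascript and java']
['javascript and java', index: 0, input: 'javascript and java']

The first array holds the subpattern match 'java' after the full match; the second holds only the full match.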

Of course, JavaScript also lets us group a subexpression without creating a subpattern: wrap the subexpression in (?: and ).

var reg = /(?:java)script and java/;

In this case, we can neither refer to the group through \1 nor obtain its match from the result.
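
A short sketch contrasting capturing and non-capturing groups, plus a backreference that forces the closing quote to be the same character as the opening one. All names and sample strings here are illustrative assumptions.

```javascript
// Capturing groups add one element per group to the exec() result.
const capturing = /(java)(script)?/.exec('javascript');
// capturing -> ['javascript', 'java', 'script']

// (?:...) groups without capturing, so only (script) is numbered.
const nonCapturing = /(?:java)(script)?/.exec('javascript');
// nonCapturing -> ['javascript', 'script']

// \1 must match the same text that group 1 matched (the opening quote).
const quoted = /(['"])[^'"]*\1/;
const sameQuotes = quoted.test('"ok"');     // true
const mixedQuotes = quoted.test('"bad\'');  // false

console.log(capturing.length, nonCapturing.length, sameQuotes, mixedQuotes);
```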

Specify the matching position

^ // The start of the string (inside a character class, ^ means negation instead)
$ // The end of the string
\b // A word boundary, i.e. the position between a \w character and a \W character
(?=p) // Requires the following characters to match p, but the match result does not include them
(?!p) // Requires the following characters not to match p

^ and $

/^javascript/ // The string starts with javascript
/javascript$/ // The string ends with javascript

\b

var reg = /\bjava\b/;
var str1 = 'java';
var str2 = 'javascript';
var str3 = 'java c++';
var str4 = 'html java c++';
reg.test(str1); // true
reg.test(str2); // false
reg.test(str3); // true
reg.test(str4); // true

Here \b matches the position between a \w character and a non-\w character, including the start and end of the string. \B is the opposite: it matches positions that are not word boundaries.
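
A small sketch of \b and \B side by side. The strings and variable names are made up for illustration.

```javascript
// \b keeps the replacement from touching 'java' inside 'javascript'.
const replacedWord = 'java javascript'.replace(/\bjava\b/g, 'JAVA');
// replacedWord -> 'JAVA javascript'

// \B requires that 'script' does NOT start at a word boundary.
const insideWord = /\Bscript/.test('javascript');  // true: preceded by 'a'
const atWordStart = /\Bscript/.test('script');     // false: starts the word

console.log(replacedWord, insideWord, atWordStart);
```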

(?=p)

var reg = /java(?=script)/;
var str = 'java';
var str1 = 'javascript';
reg.exec(str); // The match fails because 'java' is not followed by 'script'
reg.exec(str1); // The match succeeds, but the result does not contain 'script'

The first call returns null; the second returns a match whose text is 'java', without 'script'.

(?!p)

var reg = /java(?!script)/;
var str = 'javaee';
var str1 = 'javascript';
reg.exec(str); // The match succeeds and the matching result is 'java'
reg.exec(str1); // The match fails because 'java' is followed by 'script'
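
As a further illustration, lookahead is handy for picking values by what follows them without including that context in the result. This is only a sketch; the input strings and names are assumptions.

```javascript
// Positive lookahead: digits that are followed by 'px'; 'px' itself is not captured.
const pxValues = '10px 20em 30px'.match(/\d+(?=px)/g);
// pxValues -> ['10', '30']

// Negative lookahead: words starting with 'java' that do not continue with 'script'.
const notScript = 'java javascript javaee'.match(/\bjava(?!script)\w*/g);
// notScript -> ['java', 'javaee']

console.log(pxValues, notScript);
```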

Modifiers

i // Case-insensitive matching
m // Multiline mode: ^ and $ match the start and end of each line
g // Global matching: find all matches instead of stopping after the first

To add modifiers to a regular expression literal, write them after the closing slash: /java/gim (multiple modifiers are simply written side by side). To add modifiers through the constructor, pass them as the second argument: new RegExp(reg, 'gim').
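
The effect of each modifier can be seen on one small multi-line string. A minimal sketch; the string and variable names are illustrative.

```javascript
const lines = 'Java\njava\nJAVA';

const caseSensitiveIndex = lines.match(/java/).index;  // 5: only the lowercase line matches
const firstAnyCase = lines.match(/java/i)[0];          // 'Java'
const allAnyCase = lines.match(/java/gi);              // ['Java', 'java', 'JAVA']

// Without m, ^ and $ anchor to the whole string, so nothing matches.
const wholeString = lines.match(/^java$/gi);           // null
// With m, ^ and $ anchor to each line.
const perLine = lines.match(/^java$/gim);              // ['Java', 'java', 'JAVA']

console.log(caseSensitiveIndex, firstAnyCase, allAnyCase, wholeString, perLine);
```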

Pattern matching by String

The String object provides four methods for regular expression matching.

search()

str.search(reg) returns the starting position of the first match on success and -1 on failure. The g modifier has no effect in the search method.

var str = 'hello java';
str.search(/java/); // 6
str.search(/^java/); // -1

match()

str.match(reg) returns null when the match fails and an array of match results when it succeeds. If the regular expression has the g modifier set, the method returns an array containing all matches in the string.

var str = 'hello java and javascript';
str.match(/java/);
str.match(/java/g);

Without the g modifier, the result is an array whose first element is 'java', with an index property of 6. With g, the result is ['java', 'java'].

replace()

replace() can be called in two ways, depending on the second argument

  • str.replace(reg , replaceStr)

    var str = 'javaee javaee';
    // str1 = 'javascript javaee'
    var str1 = str.replace(/e{2}/, 'script');
    // str2 = 'javascript javascript'; the g modifier means global replacement
    var str2 = str.replace(/e{2}/g, 'script');
    // str is still 'javaee javaee', because strings are immutable: every method that appears to change a string returns a new value.

    The second argument, replaceStr, supports some special notation inside the replacement string

    var reg = /"([^"]*)"/g; // Match the content between double quotes; the content itself contains no double quote
    var str = '"java","c++" and "html"';
    var str1 = str.replace(reg, '“$1”'); // Replace straight quotes with curly quotes
    console.log(str1); // “java”,“c++” and “html”

    Here, $1 stands for the text matched by the subpattern ([^"]*), similar to the \1 reference we used when defining regular expressions in the previous section. There are several other uses of $ in the replacement string:

    $index // The text matched by subpattern number index, e.g. $1 in the example above
    $& // The matched substring; for the three matches above: "java", "c++", "html"
    $` // The text to the left of the match; for the three matches above: the empty string, then "java", and then "java","c++" and
    $' // The text to the right of the match; for the three matches above: ,"c++" and "html", then  and "html", then the empty string
    $$ // A literal $
  • str.replace(reg , function)

    var reg = /"([^"]*)"/g; // Match the content between double quotes; the content itself contains no double quote
    var str = '"java","c++" and "html"';
    var str1 = str.replace(reg, function (...arr) {
      console.log(arr);
    });

    The function is called once per match and logs the arr array, whose elements are: 0: the matched text; then one element per subpattern match; last-1: the index of the match; last: the original string.

    We can build the replacement dynamically: the value returned by the function replaces the match in the new string.

     var reg = /"([^"]*)"/g; // Match the content between double quotes; the content itself contains no double quote
     var str = '"java","c++" and "html"';
     var str1 = str.replace(reg, function (...arr) {
       return `“${arr[1]}”`; // arr[1] is the text matched by the subpattern
     });
     console.log(str1); // “java”,“c++” and “html”
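
The two forms of replace() described above can be sketched on smaller inputs. The strings and variable names below are illustrative assumptions, chosen so that each special pattern is easy to check by eye.

```javascript
// String replacement with special patterns.
const swappedName = 'John Smith'.replace(/(\w+)\s(\w+)/, '$2, $1');  // 'Smith, John'
const bracketed = 'a-b'.replace(/-/, '[$&]');     // 'a[-]b'  ($& is the match itself)
const leftCopy = 'a-b'.replace(/-/, '$`');        // 'aab'    ($` is the text before the match)
const rightCopy = 'a-b'.replace(/-/, "$'");       // 'abb'    ($' is the text after the match)
const priced = 'cost: 5'.replace(/\d+/, '$$$&');  // 'cost: $5' ($$ is a literal $)

// Function replacement: the return value becomes the replacement for each match.
const doubled = 'a1b22'.replace(/\d+/g, function (match) {
  return String(Number(match) * 2);
});
// doubled -> 'a2b44'

console.log(swappedName, bracketed, leftCopy, rightCopy, priced, doubled);
```

A function is the better choice when the replacement has to be computed, as in the last case; the special patterns are enough when it only rearranges matched text.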

split()

  • str.split(separator)

    Returns a new array obtained by splitting the string at the delimiter
  • str.split(reg)

    Returns a new array obtained by splitting the string wherever the regular expression matches

    var str = 'a,b-c,e,f-g';
    var arr = str.split(/[,-]/);
    console.log(arr); // ['a', 'b', 'c', 'e', 'f', 'g']
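
A regular expression separator is most useful when the delimiter varies. The sketch below, with made-up input, splits on commas surrounded by optional whitespace and also shows the optional second argument of split(), which limits the number of returned pieces.

```javascript
// Commas with any amount of surrounding whitespace act as one separator.
const fields = '1, 2, 3 , 4'.split(/\s*,\s*/);
// fields -> ['1', '2', '3', '4']

// The second argument caps the number of elements returned.
const firstTwo = 'a,b-c,e'.split(/[,-]/, 2);
// firstTwo -> ['a', 'b']

console.log(fields, firstTwo);
```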

Pattern matching by RegExp

Constructor

var reg = new RegExp('\\w'); // From a regular expression string (backslashes must be doubled inside the string)
var reg1 = new RegExp(/\w/); // From a regular expression literal
var reg2 = new RegExp(reg, 'gim'); // The second argument is the modifiers (passing modifiers together with a RegExp object requires ES2015 or later)

Properties

  • source: String. The text of the regular expression
  • global: Boolean. Whether the global modifier g is set
  • ignoreCase: Boolean. Whether the case-insensitive modifier i is set
  • multiline: Boolean. Whether the multiline modifier m is set
  • lastIndex: Number. If global === true, this property records the index position after each match; the test and exec methods below use it
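
A minimal sketch that reads these properties from a RegExp built with the constructor. The pattern and variable names are illustrative.

```javascript
// Inside a string, the backslash must be doubled to reach the regex as \d.
const digitsRe = new RegExp('\\d+', 'gi');

const sameSource = digitsRe.source === /\d+/.source;  // true: both are \d+
const flagsSeen = [digitsRe.global, digitsRe.ignoreCase, digitsRe.multiline];
// flagsSeen -> [true, true, false]

const before = digitsRe.lastIndex;   // 0 before any match
digitsRe.test('a12b');               // matches '12' at index 1
const after = digitsRe.lastIndex;    // 3: the position right after '12'

console.log(sameSource, flagsSeen, before, after);
```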

Methods

  • reg.exec(str)

    Returns null if the match fails. If the match succeeds, only one match result is returned per call. If global === true, after each successful match the lastIndex property is set to the position immediately after the matched substring, and calling the method again starts matching from that position. (When we reuse the same RegExp to match a new string, it is best to reset the lastIndex property to 0.)
    var reg = new RegExp('java' , 'g');
    var str = 'javascript java javaee';
    var result = reg.exec(str);
    while (result) {
      console.log(result)
      console.log(`lastIndex = ${reg.lastIndex}`)
      result = reg.exec(str);
    }

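    The loop logs three matches of 'java', found at index 0, 11, and 16, with lastIndex equal to 4, 15, and 20 respectively. The fourth call returns null and resets lastIndex to 0, which ends the loop.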

This carry-over of lastIndex does not happen when each operation is written with its own regular expression literal, because in ES5 every evaluation of a regular expression literal creates a new RegExp object, whose lastIndex starts at 0.
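
The lastIndex carry-over is easy to trip over when one global RegExp object is reused. The sketch below shows the pitfall and two ways around it; hasJava is a hypothetical helper introduced only for this illustration.

```javascript
const sharedRe = /java/g;

const firstCall = sharedRe.test('java');   // true; lastIndex is now 4
const secondCall = sharedRe.test('java');  // false: the search started at index 4,
                                           // and the failure reset lastIndex to 0

// Option 1: reset lastIndex explicitly before matching a new string.
sharedRe.lastIndex = 0;
const afterReset = sharedRe.test('javaee');  // true

// Option 2 (hypothetical helper): the literal is evaluated on every call,
// so each call works with a fresh RegExp whose lastIndex is 0.
function hasJava(input) {
  return /java/g.test(input);
}
const freshResults = [hasJava('java'), hasJava('java')];  // [true, true]

console.log(firstCall, secondCall, afterReset, freshResults);
```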

  • reg.test(str)

    The test method behaves basically the same as exec; only the return value differs, and it is simpler: test returns false when exec would return null, and true in all other cases