Advanced regular expression usage

Anchors

Character	Description
^	Matches the position at the start of the input string. If the Multiline property of the RegExp object is set, ^ also matches the position after \n or \r.
$	Matches the position at the end of the input string. If the Multiline property of the RegExp object is set, $ also matches the position before \n or \r.
\b	Matches a word boundary, that is, the position between a word character and a non-word character such as a space.
\B	Matches a non-word boundary.

Note: You cannot use a quantifier with an anchor. Because there cannot be more than one position immediately before or after a line break or word boundary, expressions such as ^* are not allowed.
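The behaviour of the anchors can be checked with a short sketch. It assumes JavaScript, where the Multiline property corresponds to the m flag; the sample strings and variable names are made up for illustration.

```javascript
// Anchors match positions, not characters.
const text = "first line\nsecond line";

// Without the m flag, ^ matches only at the very start of the input.
const startsSingle = text.match(/^\w+/g);

// With the m flag, ^ also matches right after each line break.
const startsMulti = text.match(/^\w+/gm);

// With the m flag, $ matches right before each line break and at the end.
const endsMulti = text.match(/\w+$/gm);

// \b requires a word boundary; \B requires the absence of one.
const wholeWord = /\bcat\b/.test("a cat sat");
const insideWord = /\bcat\b/.test("concatenate");
const nonBoundary = /\Bcat\B/.test("concatenate");

console.log(startsSingle);  // [ 'first' ]
console.log(startsMulti);   // [ 'first', 'second' ]
console.log(endsMulti);     // [ 'line', 'line' ]
console.log(wholeWord, insideWord, nonBoundary); // true false true
```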

Quantifiers

Pattern	Description
X?	X, zero or one time
X*	X, zero or more times
X+	X, one or more times
X{n}	X, exactly n times
X{n,}	X, at least n times
X{n,m}	X, at least n times but not more than m times

But what if we want to repeat several characters as a unit? We can wrap the subexpression in parentheses "()" and then apply a quantifier to it. For example, (ABC)? matches zero or one occurrence of ABC. A parenthesized expression is called a group.
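A minimal JavaScript sketch of quantifiers applied to single characters and to groups follows. The patterns and test strings are illustrative only, and each pattern is anchored with ^ and $ so that the whole string has to match.

```javascript
const optionalChar = /^colou?r$/;   // u may appear zero or one time
const exactCount = /^\d{3}$/;       // exactly three digits
const boundedCount = /^\d{2,4}$/;   // two to four digits
const optionalGroup = /^(ABC)?D$/;  // the whole group ABC is optional
const repeatedGroup = /^(ab)+$/;    // the pair "ab" one or more times

console.log(optionalChar.test("color"), optionalChar.test("colour"));  // true true
console.log(exactCount.test("123"), exactCount.test("1234"));          // true false
console.log(boundedCount.test("1"), boundedCount.test("12345"));       // false false
console.log(optionalGroup.test("D"), optionalGroup.test("ABCD"));      // true true
console.log(optionalGroup.test("ABD"));                                // false
console.log(repeatedGroup.test("ababab"), repeatedGroup.test("aba"));  // true false
```

Without the parentheses, ABC? would make only the C optional.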

Groups can be divided into two forms, capture groups and non-capture groups.

How to get group contents

Using match or exec

 var arr = str.match(reg);
 var arr = reg.exec(str);

In the returned array arr, arr[0] holds the entire match, and arr[1], arr[2], ... hold the text matched by each group.
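As a small illustration, the sketch below reads groups from both calls. The date string and variable names are hypothetical. One caveat worth knowing: when the pattern carries the g flag, match returns only the whole matches and drops the group entries.

```javascript
const dateText = "Released on 2024-05-17.";
const datePattern = /(\d{4})-(\d{2})-(\d{2})/;

// Without the g flag, match and exec return the same kind of array.
const viaMatch = dateText.match(datePattern);
const viaExec = datePattern.exec(dateText);

console.log(viaMatch[0]);                           // 2024-05-17
console.log(viaMatch[1], viaMatch[2], viaMatch[3]); // 2024 05 17
console.log(viaExec[1], viaExec[2], viaExec[3]);    // 2024 05 17

// With the g flag, match lists whole matches only; group contents are dropped.
const globalMatches = "1-2 3-4".match(/(\d)-(\d)/g);
console.log(globalMatches);                         // [ '1-2', '3-4' ]
```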

Through the static properties of the RegExp object

RegExp.$1, RegExp.$2, ..., RegExp.$9 hold the contents of the first through ninth groups, respectively.
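A brief sketch of the static properties, assuming an engine that still implements them (current browsers and Node.js do, although they are a legacy feature that MDN marks as deprecated). The email-like sample string is invented for the example.

```javascript
// The static properties are updated after each successful match.
const matched = /(\w+)@(\w+)\.com/.test("alice@example.com");

// Read them immediately, before any other regular expression runs.
const firstGroup = RegExp.$1;
const secondGroup = RegExp.$2;

console.log(matched);     // true
console.log(firstGroup);  // alice
console.log(secondGroup); // example
```

Because these properties are global state shared by every regular expression, the array returned by match or exec is usually the safer choice.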

Capture group

Capture groups are numbered by counting their opening parentheses from left to right. For example, the expression (A)(B(C)) has four such groups:

Number	Group
0	(A)(B(C))
1	(A)
2	(B(C))
3	(C)

Group 0 always stands for the entire expression.
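The numbering can be confirmed with a short JavaScript check; the input "ABC" is simply the smallest string that the expression matches.

```javascript
const groups = /(A)(B(C))/.exec("ABC");

console.log(groups[0]); // ABC  (group 0: the entire match)
console.log(groups[1]); // A    (first opening parenthesis)
console.log(groups[2]); // BC   (second opening parenthesis)
console.log(groups[3]); // C    (third opening parenthesis, nested inside group 2)
```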

Once you understand capture groups, you can learn about backreferences.

Backreferences

Wrapping a regular expression pattern, or part of one, in parentheses causes the matched text to be stored in a temporary buffer. Each captured submatch is stored in the order in which it appears in the pattern, from left to right. Buffer numbers start at 1, and up to 99 captured subexpressions can be stored. Each buffer can be accessed with \n, where n is a one- or two-digit decimal number that identifies a particular buffer.

That is, later in the expression you can use a group number to refer to the text captured by an earlier group.

For example, take the following regular expression:

([a-z])([a-z])\2\1

It matches the string abba.

Against abba, ([a-z])([a-z])\2\1 behaves like (a)(b)\2\1. The (a)(b) part matches "ab". \1 refers to what group 1 captured (a) and \2 to what group 2 captured (b), so (a)(b)\2 matches "abb" and, in the same way, (a)(b)\2\1 matches "abba".
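The walk-through can be tried out directly. This sketch anchors the pattern with ^ and $ (an addition for the test, not part of the original expression) so that only complete four-letter strings are accepted.

```javascript
const mirrored = /^([a-z])([a-z])\2\1$/;

console.log(mirrored.test("abba")); // true:  \2 repeats "b", \1 repeats "a"
console.log(mirrored.test("noon")); // true
console.log(mirrored.test("abab")); // false: third letter would have to be "b"
console.log(mirrored.test("abbc")); // false: last letter would have to be "a"

// The captured letters are available as groups 1 and 2.
const parts = mirrored.exec("abba");
console.log(parts[1], parts[2]);    // a b
```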

One of the simplest and most useful applications of backreferences is finding two identical adjacent words in text. Take the following sentence for example:

var str = "Is is the cost of of gasoline going up up";
var patt1 = /\b([a-z]+) \1\b/ig;
document.write(str.match(patt1));
> Is is,of of,up up

\1 is a backreference to a previously captured submatch; here it refers to the first captured submatch.

Find the character that appears most often in a string and count its occurrences

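var s = 'aaaaacccccbbbbb';

// Sort the characters so that identical ones sit next to each other.
var a = s.split('');
a.sort();
s = a.join('');
// What does \1 mean here, and what happens without it?
// (\w) captures one character; \1* matches further repeats of that same character.
var pattern = /(\w)\1*/g;
var ans = s.match(pattern);
// Longest run first. The comparator must return a number, not a boolean.
ans.sort(function(a, b) {
  return b.length - a.length;
});
console.log(ans[0][0] + ':' + ans[0].length);

With \1, the value of ans is: ["aaaaa", "bbbbb", "ccccc"]
Without \1 (that is, with /(\w)*/g), the whole string is matched as a single run: ["aaaaabbbbbccccc"] (JavaScript also reports an empty match at the end).
What if it is \2 or \3? There is no second or third group, so the escape cannot refer to captured text and every character is matched on its own. The value of ans is: ["a", "a", "a", "a", "a", "b", "b", "b", "b", "b", "c", "c", "c", "c", "c"]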

In a regular expression, parentheses "()" denote a group. A \1 that follows matches the same text that the first parenthesized group matched.

Note: \1 must be used together with parentheses.

Decompose a Uniform Resource Identifier (URI) into its components. Suppose you want to decompose the following URI into protocol (ftp, http, and so on), domain address, port, and page/path:

var str = "http://www.runoob.com:80/html/html-tutorial.html";
var patt1 = /(\w+):\/\/([^/:]+)(:\d*)?([^# ]*)/;
var arr = str.match(patt1);
for (var i = 0; i < arr.length ; i++) {
    document.write(arr[i]);
    document.write("<br>");
}

The third line of code, str.match(patt1), returns an array. In this example the array contains 5 elements: index 0 holds the whole match, index 1 holds the first parenthesized submatch, and so on.

> http://www.runoob.com:80/html/html-tutorial.html
http
www.runoob.com
:80
/html/html-tutorial.html

The first parenthesized subexpression captures http, and the second captures www.runoob.com; that subexpression matches one or more characters other than : and /. The third parenthesized subexpression captures :80, and the fourth captures /html/html-tutorial.html; it matches any sequence of characters that does not include # or a space.

Non-capture group

You can use the non-capturing metacharacters ?:, ?=, and ?! to override capturing, so that the matched text is not saved.

A group that begins with (? is a pure non-capture group: it does not capture text and is not counted when groups are numbered. In other words, if the content of the parentheses starts with ?, the group captures nothing and has no group number, so it cannot be the target of a backreference. (Named groups of the form (?<name>X), where supported, are the exception: they do capture.)

Why have a non-capture group when you can get what you want to match with a capture group?

The reason is that the content captured by a capture group is stored in memory for later use, for example so that a backreference can refer to it.

A non-capture group does not capture text, nor does it store what it matches as a separate group in memory. Therefore, using non-capture groups is more memory efficient than using capture groups.

1. Non-capture group (?:Pattern)

It matches the characters in Pattern. The advantage is that no text is captured and the matched characters are not stored in memory, which saves memory.

To match industry or industries, you can use industr(y|ies) or industr(?:y|ies).
(?:a|A)123(?:b) matches either a123b or A123b.
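The difference shows up in the result array. In this JavaScript sketch (variable names are illustrative), both patterns match the same text, but only the capturing version adds a group entry.

```javascript
const capturing = "industries".match(/industr(y|ies)/);
const nonCapturing = "industries".match(/industr(?:y|ies)/);

console.log(capturing[0], capturing[1]); // industries ies
console.log(capturing.length);           // 2 (whole match plus one group)
console.log(nonCapturing[0]);            // industries
console.log(nonCapturing.length);        // 1 (whole match only)

// (?:a|A)123(?:b) accepts either case for the leading letter.
const mixed = /^(?:a|A)123(?:b)$/;
console.log(mixed.test("a123b"), mixed.test("A123b"), mixed.test("a123B")); // true true false
```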

Non-capture groups come in several forms, including zero-width assertions and pattern modifiers.

2. Zero-width assertions

Syntax	Description
(?=X)	exp1(?=exp2): find exp1 that is followed by exp2.
(?!X)	exp1(?!exp2): find exp1 that is not followed by exp2.
(?<=X)	(?<=exp2)exp1: find exp1 that is preceded by exp2.
(?<!X)	(?<!exp2)exp1: find exp1 that is not preceded by exp2.
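Each of the four assertions can be exercised with a short JavaScript sketch. It assumes an engine with lookbehind support (for example a recent Node.js or browser); the sample text is invented. Note that the asserted text is checked but never included in the match.

```javascript
const text = "price $30, weight 30kg, fee $5";

// (?=X): digits that are followed by "kg" (the "kg" itself is not part of the match).
const beforeKg = text.match(/\d+(?=kg)/g);

// (?<=X): digits that are preceded by "$".
const afterDollar = text.match(/(?<=\$)\d+/g);

// (?<!X): whole numbers that are not preceded by "$".
// \b keeps the match from starting in the middle of "30".
const notAfterDollar = text.match(/(?<!\$)\b\d+/g);

// (?!X): a "q" that is not followed by "u".
const qWithoutU = /q(?!u)/;

console.log(beforeKg);       // [ '30' ]
console.log(afterDollar);    // [ '30', '5' ]
console.log(notAfterDollar); // [ '30' ]
console.log(qWithoutU.test("Iraq"), qWithoutU.test("quit")); // true false
```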

3. Pattern modifiers

Besides zero-width assertions, non-capture groups that begin with (? also include pattern modifiers.

Pattern modifiers commonly used in regular expressions include i, g, m, s, x, and e. They can be combined.

(?imnsx-imnsx:) enables or disables the specified options within the subexpression. For example, (?i-s:) turns on case insensitivity and turns off single-line mode. To turn off case insensitivity, use (?-i). See Regular Expression Options for more information.

(?i)ab: all characters after (?i) are matched case-insensitively, so it matches ab, aB, Ab, and AB.
(?i:a)b: case insensitivity is enabled only for a, so it matches ab and Ab but not aB or AB.
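Inline modifiers are not available in every engine, and JavaScript has traditionally placed flags after the closing slash instead. The sketch below therefore emulates the two cases rather than using the inline syntax: the i flag stands in for (?i)ab, and a character class stands in for (?i:a)b. The helper matchesAll is a hypothetical name introduced for the example.

```javascript
// Emulates (?i)ab: the i flag makes the whole pattern case-insensitive.
const wholeInsensitive = /^ab$/i;

// Emulates (?i:a)b: only the first letter may vary in case.
const firstLetterOnly = /^[aA]b$/;

// Hypothetical helper: returns the candidates that the pattern accepts.
function matchesAll(pattern, candidates) {
  return candidates.filter((candidate) => pattern.test(candidate));
}

const candidates = ["ab", "aB", "Ab", "AB"];
const acceptedByWhole = matchesAll(wholeInsensitive, candidates);
const acceptedByFirst = matchesAll(firstLetterOnly, candidates);

console.log(acceptedByWhole); // [ 'ab', 'aB', 'Ab', 'AB' ]
console.log(acceptedByFirst); // [ 'ab', 'Ab' ]
```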

4. (?>Pattern) is equivalent to a possessive pattern

\d++ can be written as (?>\d+).

For example, suppose you want to shorten numbers to three decimal places:

\d+\.\d\d[1-9]?\d+

With this pattern, 6.625 matches, which is unnecessary because it already has exactly three decimal places. The final "5" is first matched by [1-9], but \d+ follows and still needs a digit. Because the "?" allows [1-9] to match nothing, the engine backtracks, gives up that match, and hands the "5" to \d+.

\d+\.\d\d[1-9]?+\d+

In the possessive form, once [1-9] has matched the "5" it does not backtrack, so \d+ has nothing left to match and 6.625 fails to match.

This is useful in substitution. If you are shortening numbers to three decimal places, a number that already has exactly three does not need to be replaced, and skipping it improves efficiency. Possessive quantifiers are used mainly to improve matching efficiency.

\d+\.\d\d[1-9]?+\d+ can also be written as \d+\.\d\d(?>[1-9]?)\d+; the two are equivalent.
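JavaScript regular expressions support neither possessive quantifiers nor (?>...) groups, so the sketch below swaps in a common workaround: a lookahead captures the text and a backreference consumes it, as in (?=([1-9]?))\1. Since the engine does not backtrack into a lookahead, this behaves like the possessive form for this pattern. Treat it as an emulation under that assumption, not as the syntax described above.

```javascript
// Ordinary version: [1-9]? may give its digit back to \d+.
const backtracking = /\d+\.\d\d[1-9]?\d+/;

// Emulated possessive version: the lookahead fixes what [1-9]? matched,
// and \1 consumes exactly that text with no second chance.
const possessiveLike = /\d+\.\d\d(?=([1-9]?))\1\d+/;

console.log(backtracking.test("6.625"));    // true:  "5" is handed over to \d+
console.log(possessiveLike.test("6.625"));  // false: "5" stays with [1-9]?, \d+ finds nothing
console.log(possessiveLike.test("6.6251")); // true:  four decimal places
console.log(possessiveLike.test("6.6205")); // true:  [1-9]? matches nothing before "0"
```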
