Regular expression position matching guide

This is the 22nd day of my participation in the August More Text Challenge.

A regular expression is a matching pattern: it matches either characters or positions. Keep this sentence in mind.
Yet most people pay little attention to matching positions when they learn regular expressions.

This chapter summarizes how regular expressions match positions.

The contents include:

What is a position?
How do you match positions?
Properties of positions
Analysis of several application examples

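1. What is a position?

A position is the place between two adjacent characters, for example between “h” and “e” in “hello”. The start and the end of the string count as positions too.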
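One way to make positions visible is to put a marker at every one of them. The sketch below assumes an empty pattern, /(?:)/g, which matches the empty string at each position; the helper name showPositions is made up for illustration.

```javascript
// Hypothetical helper: mark every position in a string with "|".
// The empty group (?:) matches the empty string, so with the g flag
// it matches exactly once at each position.
function showPositions(str) {
  return str.replace(/(?:)/g, "|");
}

console.log(showPositions("hello")); // => "|h|e|l|l|o|"
console.log(showPositions(""));      // => "|"
```

A string of five characters therefore has six positions, and even the empty string has one.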

2. How do you match positions?

In ES5, there are six anchors: **^** **$** **\b** **\B** **(?=p)** **(?!p)**

2.1 ^ and $

^ (caret) matches the start of the string; in multi-line mode it matches the start of each line.

$ (dollar sign) matches the end of the string; in multi-line mode it matches the end of each line.

For example, we can replace the start and the end of a string with “#”:

var result = "hello".replace(/^|$/g, '#');
console.log(result);
// => "#hello#"

With the multi-line flag m, “start” and “end” refer to each line rather than to the whole string, which deserves attention:

var result = "I\nlove\njavascript".replace(/^|$/gm, '#');
console.log(result);
/*
#I#
#love#
#javascript#
*/
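To see what the m flag changes, the same replacement can be run with and without it. This is only a small comparison sketch; JSON.stringify is used so that the line breaks stay visible in the output.

```javascript
var text = "I\nlove\njavascript";

// Without m: ^ and $ only match the start and end of the whole string.
var single = text.replace(/^|$/g, "#");
console.log(JSON.stringify(single)); // => "#I\nlove\njavascript#"

// With m: ^ and $ also match at the start and end of every line.
var multi = text.replace(/^|$/gm, "#");
console.log(JSON.stringify(multi)); // => "#I#\n#love#\n#javascript#"
```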

2.2 \b and \B

\b is a word boundary: the position between \w and \W, between \w and ^, or between \w and $.

For example, replacing every \b in the file name “[JS] Lesson_01.mp4” with “#” gives:

var result = "[JS] Lesson_01.mp4".replace(/\b/g, '#');
console.log(result);
// => "[#JS#] #Lesson_01#.#mp4#"

Why is that? This needs to be looked at carefully.

First, we know that \w is shorthand for the character class [0-9a-zA-Z_], that is, any letter, digit, or underscore. \W is shorthand for the negated class [^0-9a-zA-Z_], that is, any character other than \w.

[#JS#] #Lesson_01#.#mp4#

The first “#” is flanked by “[” and “J”, so it is a position between \W and \w.
The second “#” is flanked by “S” and “]”, so it is a position between \w and \W.
The third “#” is flanked by a space and “L”, so it is a position between \W and \w.
The fourth “#” is flanked by “1” and “.”, so it is a position between \w and \W.
The fifth “#” is flanked by “.” and “m”, so it is a position between \W and \w.
The sixth “#” sits at the end of the string, and the character “4” before it is \w, so it is a position between \w and $.

Now that the concept of \b is clear, \B is relatively easy to understand.

\B is the opposite of \b: a non-word boundary. If you remove every \b from all the positions in a string, what is left are the \B positions.

Specifically, these are the positions between \w and \w, between \W and \W, between ^ and \W, and between \W and $.

As in the example above, replace every \B with “#”:

var result = "[JS] Lesson_01.mp4".replace(/\B/g, '#');
console.log(result);
// => "#[J#S]# L#e#s#s#o#n#_#0#1.m#p#4"
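The claim that \b and \B together cover every position can be checked by listing the indexes where each one matches. The helper positionsOf below is a hypothetical name, and the sketch assumes an engine that supports String.prototype.matchAll (ES2020).

```javascript
// Hypothetical helper: list the indexes at which a zero-width pattern matches.
// The pattern must carry the g flag, as matchAll requires.
function positionsOf(pattern, str) {
  return Array.from(str.matchAll(pattern), function (m) { return m.index; });
}

var fileName = "[JS] Lesson_01.mp4";
var boundaries = positionsOf(/\b/g, fileName);
var nonBoundaries = positionsOf(/\B/g, fileName);

console.log(boundaries);           // => [1, 3, 5, 14, 15, 18]
console.log(nonBoundaries.length); // => 13

// Every one of the 19 positions (indexes 0 through 18) is either \b or \B.
console.log(boundaries.length + nonBoundaries.length === fileName.length + 1); // => true
```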

2.3 (?=p) and (?!p)

(?=p), where p is a subpattern, matches the position in front of p.

For example, (?=l) is the position in front of an “l” character:

var result = "hello".replace(/(?=l)/g, '#');
console.log(result);
// => "he#l#lo"


(?!p) is the opposite of (?=p): it matches every position that is not followed by p. For example:

var result = "hello".replace(/(?!l)/g, '#');
console.log(result);
// => "#h#ell#o#"

The formal names for the two are positive lookahead and negative lookahead.

In Chinese they are translated as positive look-ahead assertion and negative look-ahead assertion, respectively.

Later versions of the language (ES2018 onward) also support positive lookbehind and negative lookbehind.

Specifically, these are (?<=p) and (?<!p).
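The lookbehind forms (?<=p) and (?<!p) can be tried on the same “hello” string. This sketch assumes an engine with lookbehind support (ES2018 or later); the variable names are only for illustration.

```javascript
// Positive lookbehind: positions that come right after an "l".
var afterL = "hello".replace(/(?<=l)/g, "#");
console.log(afterL); // => "hel#l#o"

// Negative lookbehind: positions that do not come right after an "l".
var notAfterL = "hello".replace(/(?<!l)/g, "#");
console.log(notAfterL); // => "#h#e#llo#"
```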

Some books translate these four as “lookaround”, split into looking to the right (lookahead) and looking to the left (lookbehind).

But most books do not stress that all four of them are positions.

For example, (?=p) is usually explained as: the characters that follow must match p, but those characters are not part of the match.

In my opinion, (?=p) is just as easy to understand as ^: it is simply the position in front of p.
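A short sketch can illustrate that (?=p) checks the following characters without including them in the match, and that it behaves like ^ in being zero-width. The variable names here are made up for the example.

```javascript
// The lookahead checks for "l" but does not consume it.
var m = /he(?=l)/.exec("hello");
console.log(m[0]);    // => "he"
console.log(m.index); // => 0

// Because both are zero-width, ^ and (?=h) can sit on the same position.
var stacked = /^(?=h)hello/.test("hello");
console.log(stacked); // => true
```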

3. Properties of positions

A position can be understood as the empty string “”.

For example, a “hello” string is equivalent to the following:

"hello" == "" + "h" + "" + "e" + "" + "l" + "" + "l" + "" + "o" + "";

Also equivalent to:

"hello" == "" + "" + "hello"

Therefore, there is no problem in writing /^hello$/ as /^^hello$$$/:

var result = /^^hello$$$/.test("hello");
console.log(result);
// => true

It could even be written as something more complicated:

var result = /(?=he)^^he(?=\w)llo$\b\b$/.test("hello");
console.log(result);
// => true

In other words, there can be any number of positions between two characters.

A very effective way to understand positions is to think of them as empty strings.
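The empty-string view can be checked directly: a match on an anchor has zero length, and splitting on a position loses no characters. The following is a minimal sketch with illustrative variable names.

```javascript
// Matching an anchor yields the empty string at that position.
var start = /^/.exec("hello");
var end = /$/.exec("hello");
console.log(start[0] === "", start.index); // => true 0
console.log(end[0] === "", end.index);     // => true 5

// Splitting on a position cuts the string without removing anything.
var pieces = "hello".split(/(?=l)/);
console.log(pieces);          // => ["he", "l", "lo"]
console.log(pieces.join("")); // => "hello"
```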

4. Application examples

4.1 A regex that matches nothing

Suppose you are asked to write a regex that never matches anything.

Easy: /.^/

This regex requires one character that is followed by the start of the string, which cannot happen.
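As a quick sanity check, /.^/ can be run against a handful of inputs. The sample strings below are arbitrary, and the regex is used without any flags.

```javascript
var never = /.^/;
var samples = ["", "a", "hello", "a\nb", "^"];

// No sample can satisfy "one character, then the start of the string".
var anyMatch = samples.some(function (s) { return never.test(s); });
console.log(anyMatch); // => false
```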

4.2 Thousands separators in numbers

For example, change “12345678” to “12,345,678”.

We need to put a “,” at the right positions.

What’s the idea?

4.2.1 Insert the last comma

Use (?=\d{3}$):

var result = "12345678".replace(/(?=\d{3}$)/g, ',');
console.log(result);
// => "12345,678"

4.2.2 Insert all the commas

A comma belongs at every position that is followed by groups of three digits up to the end, so \d{3} must appear at least once.

We can use the quantifier +:

var result = "12345678".replace(/(?=(\d{3})+$)/g, ',');
console.log(result);
// => "12,345,678"

4.2.3 Check other cases

After writing a regex, we should test it on a few more cases, and here a problem shows up:

var result = "123456789".replace(/(?=(\d{3})+$)/g, ',');
console.log(result);
// => ",123,456,789"

The regex above only says: counting from the end, wherever the number of remaining digits is a multiple of 3, put a comma in front of them. The start of the string satisfies this too, hence the problem.

How do we solve it? The matched position must not be the start of the string.

We know ^ matches the start, but how do we require that a position is not the start?

Easy: (?!^). Did you think of it? The test is as follows:

var string1 = "12345678",
    string2 = "123456789",
    reg = /(?!^)(?=(\d{3})+$)/g;
var result = string1.replace(reg, ',');
console.log(result);
// => "12,345,678"
result = string2.replace(reg, ',');
console.log(result);
// => "123,456,789"

4.2.4 Supporting other formats

Suppose we want to turn “12345678 123456789” into “12,345,678 123,456,789”.

In this case we need to change the regex, replacing the start ^ and the end $ with \b:

var string = "12345678 123456789",
    reg = /(?!\b)(?=(\d{3})+\b)/g;
var result = string.replace(reg, ',');
console.log(result);
// => "12,345,678 123,456,789"

How should we read (?!\b)?

It requires that the current position is not a position in front of \b. Since \b is itself a position, (?!\b) is simply \B.

So the final regex becomes: /\B(?=(\d{3})+\b)/g.
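The final regex /\B(?=(\d{3})+\b)/g can be wrapped in a small function and run against the cases discussed in this section. The name formatThousands is hypothetical, and the function is only a sketch for strings made of whole numbers.

```javascript
// Hypothetical wrapper around the final regex from this section.
function formatThousands(str) {
  // \B rules out the start of each number; the lookahead requires
  // groups of three digits up to the next word boundary.
  return str.replace(/\B(?=(\d{3})+\b)/g, ",");
}

console.log(formatThousands("12345678"));           // => "12,345,678"
console.log(formatThousands("123456789"));          // => "123,456,789"
console.log(formatThousands("12345678 123456789")); // => "12,345,678 123,456,789"
console.log(formatThousands("123"));                // => "123"
```

Note that this sketch treats every run of digits the same way, so the fractional part of an input such as “1234.5678” would also receive a comma.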

4.3 Validating passwords

The password is 6 to 12 characters long and consists of digits, lowercase letters, and uppercase letters, but it must contain at least two of these character types.

This is fairly easy if we check with several regexes, but harder to write as a single regex.

So let’s take up the challenge and see how well we understand positions.

4.3.1 Simplify

Setting aside the condition “must contain at least two character types” for now, we can easily write:

var reg = /^[0-9A-Za-z]{6,12}$/;

4.3.2 Checking whether certain characters are contained

Suppose the password must contain a digit. What then? We can use (?=.*[0-9]).

So the regex becomes:

var reg = /(?=.*[0-9])^[0-9A-Za-z]{6,12}$/;
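The digit check /(?=.*[0-9])^[0-9A-Za-z]{6,12}$/ can be tried on a few sample passwords. The second regex, atLeastTwoTypes, is not taken from the text: it is one possible way, shown here as an assumption, to combine the same kind of lookahead into the “at least two character types” rule. All sample passwords are made up.

```javascript
// Must be 6 to 12 allowed characters and contain at least one digit.
var hasDigit = /(?=.*[0-9])^[0-9A-Za-z]{6,12}$/;
console.log(hasDigit.test("abcde1")); // => true
console.log(hasDigit.test("abcdef")); // => false (no digit)
console.log(hasDigit.test("12345"));  // => false (too short)

// Assumed extension: digit+lowercase, digit+uppercase, or lowercase+uppercase.
var atLeastTwoTypes = new RegExp(
  "((?=.*[0-9])(?=.*[a-z])|(?=.*[0-9])(?=.*[A-Z])|(?=.*[a-z])(?=.*[A-Z]))" +
  "^[0-9A-Za-z]{6,12}$"
);
console.log(atLeastTwoTypes.test("abc123")); // => true
console.log(atLeastTwoTypes.test("ABCdef")); // => true
console.log(atLeastTwoTypes.test("123456")); // => false (digits only)
```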
