If you need to handle complex string operations, consider using regular expressions.

Regular expressions (regex) are used to find and replace text.

Before we look at the formal concepts, let’s take a quick look at regex through a series of examples, and then dive into some of the concepts of regex.

I recommend the website regexr.com/

Use it to quickly learn the basics of regular expressions by following the examples below.

Introductory examples

A regular expression is written like this:

let reg = /.../

The ... is the content you want to match. For example, how do I match my name, qiuyanxi, in a run of English words?

RegExr was created by gskinner.com,
qiuyanxi and yanxi is proudly qiuyanxi hosted qiu yan xi by Media QiuYanxi Temple.

I just write /qiuyanxi/ and it matches correctly.

g modifier – global

The pattern above only finds the first match. To find every qiuyanxi in the text, add the g modifier, which stands for global matching. It is written like this: /qiuyanxi/g

i modifier – case

All of the patterns above match exactly, including letter case, so QiuYanxi is not matched. To match regardless of case, add the i modifier: /qiuyanxi/gi
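The effect of the two modifiers can be checked in a few lines of JavaScript. This is a minimal sketch: the text is the sample from above, and the variable names are only for illustration.

```javascript
// Sample text from the example above
const text =
  "RegExr was created by gskinner.com,\n" +
  "qiuyanxi and yanxi is proudly qiuyanxi hosted qiu yan xi by Media QiuYanxi Temple.";

// No modifier: only the first occurrence is returned
const first = text.match(/qiuyanxi/);
console.log(first[0]); // "qiuyanxi"

// g: every lowercase occurrence
const all = text.match(/qiuyanxi/g);
console.log(all); // ["qiuyanxi", "qiuyanxi"]

// g + i: every occurrence, ignoring case
const anyCase = text.match(/qiuyanxi/gi);
console.log(anyCase); // ["qiuyanxi", "qiuyanxi", "QiuYanxi"]
```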

\d and [0-9]

Now let’s change the text

My name is QiuYanxi,my skill is 666.
My name is QiuYanxi,my skill is 66.
My name is QiuYanxi,my skill is 6.

I want to match the numbers in this text. A digit is one of 0-9. If you write 10-100, the regex will not find the numbers between 10 and 100, because a regex matches character by character and has no notion of numeric value. So remember that digits in a regex are 0 through 9.

That is, I want to match any one character between 0 and 9. This kind of matching, called vertical matching, is done with a character group []. It is written like this

/[0-9]/g

With this pattern, six separate characters are matched. In other words, the pattern does match the numbers, but each number is broken into single digits.

We don’t want that. To tell the regex how many characters to match at once, we need quantifiers.

Quantifiers are written with {} and follow the item they repeat.

With a quantifier in place, we get complete matches of 6, 66, and 666.
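As a sketch of the difference, the snippet below runs the character group with and without a quantifier. The quantifier {1,3} and the shortened sample text are assumptions made for illustration, since the original example is not shown here.

```javascript
// Assumed sample text containing the numbers 666, 66 and 6
const text = "my skill is 666. my skill is 66. my skill is 6.";

// Without a quantifier, each digit is a separate match
const single = text.match(/[0-9]/g);
console.log(single); // ["6", "6", "6", "6", "6", "6"]

// {1,3} asks for one to three digits in a row
const runs = text.match(/[0-9]{1,3}/g);
console.log(runs); // ["666", "66", "6"]
```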

If you want to match everything except these three numbers, you can use [^...].

Inside brackets, ^ means “not”.

Shorthand characters

When using character groups, [0-9a-zA-Z] represents any single digit or upper- or lower-case letter.

[0-9] has the shorthand \d

[0-9a-zA-Z_] has the shorthand \w (note that \w also includes the underscore)

[^0-9] has the shorthand \D

[^0-9a-zA-Z_] has the shorthand \W

The following shorthand character sets are commonly used

shorthand description
. Any character except a newline
\w Matches letters, digits, and the underscore: [a-zA-Z0-9_]
\W Matches any non-word character, such as symbols; equivalent to [^\w]
\d Matches a digit: [0-9]
\D Matches a non-digit: [^\d]
\s Matches any whitespace character, such as a space, \t, \n, \f, \r, or \v
\S Matches any non-whitespace character: [^\s]
\f Matches a form feed character
\n Matches a newline character
\r Matches a carriage return
\t Matches a tab character
\v Matches a vertical tab character
\p Listed in some references as matching CR/LF (\r\n), the DOS line terminator; JavaScript does not give \p this meaning, so write \r\n explicitly

For example, . matches every character except a newline.

What if you want to match a literal dot? Just as in JS strings, use the escape character, which in a regex is \

Written as \. it matches only the dot itself.
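A short sketch of the difference between the wildcard and the escaped dot; the sample string is made up for illustration.

```javascript
const text = "a.b axb a-b";

// Unescaped dot: any character may sit between a and b
const anyChar = text.match(/a.b/g);
console.log(anyChar); // ["a.b", "axb", "a-b"]

// Escaped dot: only a literal "." is accepted
const literalDot = text.match(/a\.b/g);
console.log(literalDot); // ["a.b"]
```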

Starting position and ending position

In the following characters, I want to match My. What should I do?

My name is QiuYanxi,My skill is 666.
My name is QiuYanxi,My skill is 66.
My name is QiuYanxi,My skill is 6.

That’s easy: /My/g matches every My. But what if I only want the My at the beginning? Use the ^ metacharacter again: inside brackets it means “not”, and outside brackets it means “beginning”.

It is written like this: /^My/g

That gets one match, but the My at the start of the other lines is not matched. This is because a newline is just another character to the regex, so we need to tell it to treat each line separately. Use the m modifier, which, like g, is a modifier.

With the m modifier the pattern becomes /^My/gm

The $ metacharacter works the same way for the end.
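The behaviour of ^, $ and the m modifier can be sketched as follows. The three-line text mirrors the sample above (with 666 on the first line, which is an assumption), and the counts are simply the lengths of the arrays returned by match.

```javascript
const text =
  "My name is QiuYanxi,My skill is 666.\n" +
  "My name is QiuYanxi,My skill is 66.\n" +
  "My name is QiuYanxi,My skill is 6.";

// Every "My" anywhere in the text
const everyMy = text.match(/My/g).length;
console.log(everyMy); // 6

// ^ without m: only the very beginning of the text
const startOnly = text.match(/^My/g).length;
console.log(startOnly); // 1

// ^ with m: the beginning of every line
const eachLineStart = text.match(/^My/gm).length;
console.log(eachLineStart); // 3

// $ with m: "6." at the end of every line
const eachLineEnd = text.match(/6\.$/gm);
console.log(eachLineEnd); // ["6.", "6.", "6."]
```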

Metacharacters

Shorthand characters let us replace [0-9] with the simpler \d. What about quantifiers? Quantifiers also have metacharacters that abbreviate them.

As introduced above, quantifiers are written in braces as {minimum,maximum}, for example

{0,1} zero or one
{1,} at least one, no upper limit
{0,} zero or more, no upper limit

Using metacharacters instead:

{0,1} ==> ?
{1,} ==> +
{0,} ==> *

For example, \d{1,} means at least one digit with no upper limit. We can write it as \d+
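A quick check that each brace form and its metacharacter behave the same. The sample strings are made up for illustration.

```javascript
// {1,} and + : one or more digits
const digitsBraces = "a1 b22 c333".match(/\d{1,}/g);
const digitsPlus = "a1 b22 c333".match(/\d+/g);
console.log(digitsBraces, digitsPlus); // both ["1", "22", "333"]

// {0,1} and ? : the "u" is optional
const optionalBraces = "color colour".match(/colou{0,1}r/g);
const optionalMark = "color colour".match(/colou?r/g);
console.log(optionalBraces, optionalMark); // both ["color", "colour"]

// {0,} and * : zero or more "b"
const anyBraces = "ac abc abbc".match(/ab{0,}c/g);
const anyStar = "ac abc abbc".match(/ab*c/g);
console.log(anyBraces, anyStar); // both ["ac", "abc", "abbc"]
```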

The other two metacharacters are used in the same way. A list of commonly used metacharacters follows

metacharacter description
. Matches any single character except a newline.
[ ] Character class. Matches any one character inside the square brackets.
[^ ] Negated character class. Matches any one character not inside the square brackets.
* Matches 0 or more repetitions of the item before the *.
+ Matches 1 or more repetitions of the item before the +.
? Makes the item before the ? optional.
{n,m} Matches num repetitions of the item before the braces (n <= num <= m).
(xyz) Group. Matches the exact sequence xyz as a whole.
| Or operator. Matches the expression before or after the |.
\ Escapes the next character, so that the reserved characters [ ] ( ) { } . * + ? ^ $ \ | can be matched literally.
^ Matches at the beginning of the input.
$ Matches at the end of the input.

The most important of these are the bracket character set [], the parenthesis group (), and the pipe operator |.

The bracket character set is easy to understand and is used for vertical matching. It matches one of the listed characters, such as [tT], which matches either t or T; the order inside does not matter.

The same match can also be written with parentheses () and the pipe |, as (t|T).

Parentheses () form a group that is treated as a whole.

The pipe | means “or”.
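The two spellings can be compared directly. A minimal sketch with a made-up sentence:

```javascript
const text = "The cat and the hat";

// Character set: one character, either t or T
const withSet = text.match(/[tT]he/g);
console.log(withSet); // ["The", "the"]

// Group with pipe: the same choice written as alternatives
const withPipe = text.match(/(t|T)he/g);
console.log(withPipe); // ["The", "the"]
```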

The parenthesis group () is perhaps harder to understand: it groups its contents into a whole, and the order inside it is strictly defined.

It is harder because it is usually combined with $ references for substitution.

My name is QiuYanxi,QiuYanxi's skill is 666.
How to convert QiuYanxi => 666?

This requires capturing and then replacing. A reference is a group of characters wrapped in parentheses, and $ is used to retrieve it.

That is, first wrap the required characters in parentheses, then refer to them with $ plus a sequence number.

For example, (QiuYanxi) is referenced by $1.

(\d+) is referenced by $2.
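One way this could look in JavaScript is sketched below. The exact pattern and replacement string are assumptions, since the original figure is not available; here the name and the number simply trade places.

```javascript
const text = "My name is QiuYanxi,QiuYanxi's skill is 666.";

// Group 1 captures the name, group 2 captures the digits (pattern is an assumption)
const pattern = /(QiuYanxi)'s skill is (\d+)/;

// $1 and $2 in the replacement string refer back to those groups
const swapped = text.replace(pattern, "$2's skill is $1");
console.log(swapped); // "My name is QiuYanxi,666's skill is QiuYanxi."
```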

In-depth concept

Regular expressions are matching patterns that match either characters or positions.

Character match

1. Horizontal fuzzy matching – how many to match

Horizontal fuzzy matching means the length of a match is not fixed: an item can occur one or more times.

The main way to do this is with quantifiers, such as {m,n}, meaning at least m and at most n occurrences.

var regex = /ab{2,5}c/g;
var string = "abc abbc abbbc abbbbc abbbbbc abbbbbbc";
console.log( string.match(regex) ); 
//=> ["abbc", "abbbc", "abbbbc", "abbbbbc"]

The trailing g in the example is a modifier that selects global matching.

That is, all substrings that satisfy the pattern are found, in order, in the target string; the emphasis is on “all” rather than just “the first”. g is the first letter of the word global.

2. Vertical fuzzy matching – what to match

Vertical fuzzy matching means that a given position can hold any one of several characters rather than one fixed character.

This is done with character groups, such as [abc], which means the character can be any one of “a”, “b”, or “c”.

var regex = /a[123]b/g;
var string = "a0b a1b a2b a3b a4b";
console.log( string.match(regex) ); 
// => ["a1b", "a2b", "a3b"]

The character in the middle can be 1, 2, or 3.

summary

Horizontal fuzzy matching is used to match quantities and vertical fuzzy matching is used to match multiple possibilities.

Quantifiers are used for horizontal fuzzy matching and character groups are used for vertical fuzzy matching

3. Quantifiers: how many times a character occurs

In plain terms, a quantifier is the number of times the preceding item occurs.

Shorthand

{m,} means at least m occurrences.
{m,n} means at least m and at most n occurrences.
{m} is equivalent to {m,m}, meaning exactly m occurrences.
? is equivalent to {0,1}, meaning present or absent. How to remember: a question mark asks “is it there?”
+ is equivalent to {1,}, meaning at least one occurrence. How to remember: a plus sign means adding, and you need one before you can add more.
* is equivalent to {0,}, meaning any number of occurrences, including none. How to remember: look at the stars in the sky. There may be none, a scattered few, or too many to count.

3.1 Greedy matching and lazy matching

Greedy matching takes as many characters as it can.

Lazy matching takes as few characters as it can.

var regex = /\d{2,5}/g;
var string = "123, 1234, 12345, 123456";
console.log( string.match(regex) ); 
// => ["123", "1234", "12345", "12345"]

This is greedy matching: I take as much as I’m given.

In the regex above, \d means I want digits and {2,5} means I want 2 to 5 of them; if 5 are available, I take 5.

Lazy matching is the opposite: give me two and I’m satisfied.

Lazy matching is written like this

var regex = /\d{2,5}?/g;
var string = "123, 1234, 12345, 123456";
console.log( string.match(regex) ); 
// => ["12", "12", "34", "12", "34", "12", "34", "56"]

Lazy matching is achieved by placing a question mark after the quantifier, so the lazy forms are as follows:

{m,n}?

{m,}?

??

+?

*?

The way to remember lazy matching is to put a question mark after the quantifier and ask are you satisfied? Are you greedy?
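The difference is easiest to see when the item being repeated could also swallow the closing delimiter. The following sketch uses a made-up string with two quoted words:

```javascript
const text = 'say "hello" and "bye"';

// Greedy .* runs to the last quote it can find
const greedy = text.match(/".*"/g);
console.log(greedy); // ['"hello" and "bye"']

// Lazy .*? stops at the first quote that lets the match succeed
const lazy = text.match(/".*?"/g);
console.log(lazy); // ['"hello"', '"bye"']
```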

4. Character groups: the range of a character

Note that a character group (character class) matches only one character. For example, [abc] matches a single character, which can be one of “a”, “b”, or “c”.

4.1 What to do if the range of characters is too large

If the set of characters to match is too large to write out, use range notation with a hyphen -.

For example, [123456abcdefGHIJKLM] can be written as [1-6a-fG-M].

For example, the 26 lowercase letters can be written as [a-z]

Because the hyphen is special, how do you match any one of “a”, “-”, or “z”? It cannot be written as [a-z], because that means any lowercase letter. Write it as [-az], [az-], or [a\-z]: put the hyphen at the beginning, at the end, or escape it, so the engine does not read it as a range.
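The three spellings can be verified against the range form. A small sketch with a made-up input:

```javascript
const text = "a-z b";

// Range: any lowercase letter, the hyphen itself is not matched
const asRange = text.match(/[a-z]/g);
console.log(asRange); // ["a", "z", "b"]

// Hyphen first, last, or escaped: only "a", "-" and "z" are matched
const hyphenFirst = text.match(/[-az]/g);
const hyphenLast = text.match(/[az-]/g);
const hyphenEscaped = text.match(/[a\-z]/g);
console.log(hyphenFirst, hyphenLast, hyphenEscaped); // each ["a", "-", "z"]
```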

4.2 What if I want to exclude certain characters

For example, the character can be anything except “a”, “b”, or “c”. Use the exclusion character ^, which inverts the group.

This is the negated character group. For example, [^abc] is any character except “a”, “b”, and “c”. A ^ (caret) in the first position of the group indicates negation.

Common shorthand forms

\d is [0-9]. Represents a digit. How to remember: digit.

\D is [^0-9]. Represents any character except a digit.

\w is [0-9a-zA-Z_]. Represents digits, uppercase and lowercase letters, and the underscore. How to remember: w is short for word, so these are also called word characters.

\W is [^0-9a-zA-Z_]. Non-word characters.

\s is [ \t\v\n\r\f]. Represents whitespace, including spaces, horizontal tabs, vertical tabs, line feeds, carriage returns, and form feeds. How to remember: s is the first letter of space.

\S is [^ \t\v\n\r\f]. Non-whitespace characters.

. is [^\n\r\u2028\u2029]. The wildcard, representing almost any character. Newline, carriage return, line separator, and paragraph separator are excluded. How to remember: think of an ellipsis …, where each dot is a placeholder for anything.

What if I want to match truly arbitrary characters? Use any of [\d\D], [\w\W], [\s\S], or [^].
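The difference shows up as soon as the text contains a line break. A minimal sketch:

```javascript
const text = "a\nb";

// The dot does not match a newline
const withDot = /a.b/.test(text);
console.log(withDot); // false

// A set combined with its complement matches any character at all
const withComplement = /a[\s\S]b/.test(text);
console.log(withComplement); // true

// In JavaScript, [^] (an empty negated set) also matches any character
const withEmptyNegation = /a[^]b/.test(text);
console.log(withEmptyNegation); // true
```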

5. Alternation (multiple choice)

Everything above matches a single pattern, but sometimes we need to allow several patterns. For example, to match one character from abc or one character from xyz, use alternation, separating the branches with the pipe |

var reg = /[abc]|[xyz]/
var string = 'xyz abc '
var string2 = 'abc xyz'
// Without the g modifier, match returns only the first match
console.log(string.match(reg))
console.log(string2.match(reg))
// => ["x", index: 0, input: "xyz abc "]
// => ["a", index: 0, input: "abc xyz"]

For example, to match “good” or “nice”, use /good|nice/. The test is as follows:

var regex = /good|nice/g;
var string = "good idea, nice try.";
console.log( string.match(regex) ); 
// => ["good", "nice"]

Case analysis

With regular expressions, analysis matters most and writing comes second. For example, take the following string

var string = "#ffbbad #Fc01DF #FFF #ffE abc";

We need to match the hexadecimal color values.

Analysis:

1. Hexadecimal characters are 0 to 9, a to f, and A to F

2. There is a # in front

3. There are 3 to 6 characters

The range is the first condition, handled with a character group

The quantity is the third condition, handled with a quantifier

var reg = /#[0-9a-fA-F]{3,6}/g
var string = "#ffbbad #Fc01DF #FFF #ffE abc";
console.log(string.match(reg))
// => ["#ffbbad", "#Fc01DF", "#FFF", "#ffE"]

Match the time

23:59

or

24:00

Analysis:

1. The first digit is between 0 and 2

2. The second digit is between 0 and 9

3. The third digit is between 0-5

4. The fourth digit is between 0 and 9

5. If the first digit is 2, the second digit is between 0 and 4

6. If the first and second digits are 24, then the third and fourth digits must be 00

// ^ and $ make the pattern cover the whole string
var reg = /^(([01][0-9]|2[0-3]):[0-5][0-9]|24:00)$/
console.log(reg.test("01:09")); // true
console.log(reg.test("24:01")); // false
console.log(reg.test("00:60")); // false

If the leading zeros are optional, you can write it like this

var reg = /^((0?[0-9]|1[0-9]|2[0-3]):(0?[0-9]|[1-5][0-9])|24:00)$/
console.log(reg.test("7:9")); // true
console.log(reg.test("24:01")); // false
console.log(reg.test("23:9")); // true

Match the date

Take the YYYY-MM-DD format as an example.

Required to match 2017-06-10

Analysis:

What scope do I need to match?

Year: digits, each between 0 and 9: [0-9]

Month: digits, either 01-09 or 10-12: (0[1-9]|1[0-2])

Day: digits, either 01-09, 10-29, or 30-31: (0[1-9]|[12][0-9]|3[01])

How many digits do I need to match?

The year takes four digits, the month two, and the day two

const reg = /^[0-9]{4}-(0[1-9]|1[0-2])-(0[1-9]|[12][0-9]|3[01])$/
console.log(reg.test("2017-06-10")); // true

Position matching

What is a position

A position is the place between adjacent characters. For example, in “hello” there is a position between h and e, between e and l, and so on, as well as one before h and one after o.

How to match positions

In a regex, there are six anchors

^ $ \b \B (?=p) (?!p)

Match the beginning and end

^ and $ match the beginning and the end

^ (caret) matches the beginning of the input; in multi-line mode it matches the beginning of each line.

$ (dollar sign) matches the end of the input; in multi-line mode it matches the end of each line.

For example, replace the beginning and end of a string with “#”:

var result = "hello".replace(/^|$/g, '#');
console.log(result); 
// => "#hello#"

In multi-line mode, beginning and end refer to each line, which deserves attention:

var result = "I\nlove\njavascript".replace(/^|$/gm, '#');
console.log(result);
/*
#I#
#love#
#javascript#
*/

Match word boundaries and non-word boundaries

\b is a word boundary

\B is a non-word boundary

\b is the position between \w and \W, including between \w and ^, and between \w and $.

\w is [0-9a-zA-Z_]: digits, uppercase and lowercase letters, and the underscore.

\W is everything except digits, letters, and the underscore.

For example, the \b positions in the file name “[JS] Lesson_01.mp4” look like this:

var result = "[JS] Lesson_01.mp4".replace(/\b/g, '#');
console.log(result); 
// => "[#JS#] #Lesson_01#.#mp4#"

The regex above inserts # at every word boundary.

What are the word boundaries here? JS, Lesson_01, and mp4 are all made of \w characters.

So what is \W? Here it is [, ], the space, and the dot.

So let’s analyze it:

  • Between [ and J there is a word boundary
  • Between S and ] there is a word boundary
  • Between the space and L there is a word boundary
  • Between 1 and . there is a word boundary
  • Between . and m there is a word boundary
  • The last # is there because 4 belongs to \w, and there is a word boundary between it and the end $

Now that \b is understood, \B is relatively easy.

\B is the opposite of \b: a non-word boundary. That is, if you remove every \b from all the positions in a string, what remains are the \B positions.

var result = "[JS] Lesson_01.mp4".replace(/\B/g, '#');
console.log(result); 
// => "#[J#S]# L#e#s#s#o#n#_#0#1.m#p#4"

Lookahead positions: before p and not before p

(?=p) and (?!p) are the positions that are, respectively, followed by and not followed by a match of the pattern p. For example

var result = "hello".replace(/(?=l)/g, '#');
console.log(result); 
// => "he#l#lo"

The code above inserts # before each l

(?!p) is the opposite of (?=p), for example:

var result = "hello".replace(/(?!l)/g, '#');

console.log(result); 
// => "#h#ell#o#"

Properties of positions

You can think of a position as an empty string.

For example, the string “hello” is equivalent to the following:

"hello" == "" + "h" + "" + "e" + "" + "l" + "" + "l" + "" + "o" + "";

That is, any number of empty strings can sit at the position between two characters.

A very effective way to understand positions is to treat them as empty strings.
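Because a position consumes no characters, writing the same anchor several times, or stacking several position assertions at one spot, does not change what is matched. The following sketch illustrates this:

```javascript
// One anchor at each end
const plain = /^hello$/.test("hello");
console.log(plain); // true

// Repeated anchors still describe the same two positions
const repeated = /^^hello$$$/.test("hello");
console.log(repeated); // true

// Several different position assertions stacked on the same spots
const stacked = /(?=he)^^he(?=\w)llo$\b\b$/.test("hello");
console.log(stacked); // true
```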

Related case

The thousands separator representation of a number

For example, change “12345678” to “12,345,678”.

A comma must be inserted before each group of three digits, counting from the end, so the regex is

const reg = /(?=(\d{3})+$)/g

console.log('12345678'.replace(reg, ','))
// "12,345,678"

However, if the string is 123456789, the result is “,123,456,789”.

So we need to exclude the first position, which is the position matched by ^.

A position that is not the start can be expressed with (?!^), so the regex becomes

const reg = /(?!^)(?=(\d{3})+$)/g

console.log('123456789'.replace(reg, ','))
// "123,456,789"
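Wrapped in a small helper, the final pattern can be tried on inputs of different lengths. The function name formatThousands is hypothetical and only used for this sketch; it assumes the input is a string of digits.

```javascript
// Hypothetical helper: inserts a comma before every group of three
// digits counted from the end, but never at the start of the string
function formatThousands(digits) {
  return digits.replace(/(?!^)(?=(\d{3})+$)/g, ",");
}

console.log(formatThousands("123"));       // "123"
console.log(formatThousands("1234"));      // "1,234"
console.log(formatThousands("12345678"));  // "12,345,678"
console.log(formatThousands("123456789")); // "123,456,789"
```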

The function of parentheses in regular expressions

The function of parentheses can be stated in one sentence: they provide grouping, so that we can refer to the groups.

There are two places to refer to a group: in JavaScript, or within the regular expression itself.

Grouping and branching structures

We know that /a+/ matches consecutive occurrences of “a”; to match consecutive occurrences of “ab”, we need /(ab)+/.

Here the parentheses provide grouping, so that the quantifier + applies to “ab” as a whole. The test is as follows:

var regex = /(ab)+/g;
var string = "ababa abbb ababab";
console.log( string.match(regex) ); 
// => ["abab", "ab", "ababab"]

In the branch structure (p1|p2), the role of the parentheses is self-evident: they enclose all the possible subexpressions.

For example, to match the following strings:

var regex = /^I love (JavaScript|Regular Expression)$/;
console.log( regex.test("I love JavaScript") );
console.log( regex.test("I love Regular Expression") );
// => true
// => true

Group references

This is an important function of parentheses: it lets us extract data and perform more powerful replacements.

To take advantage of it, we must use the API of the host environment.

Take dates as an example. Assuming the format is YYYY-MM-DD, we can start with a simple regex:

var regex = /\d{4}-\d{2}-\d{2}/;

Then change it to a version with parentheses:

var regex = /(\d{4})-(\d{2})-(\d{2})/;

Why write the regex this way?

Extract the data

For example, to extract the year, month, and day, you can do this:

var regex = /(\d{4})-(\d{2})-(\d{2})/;
var string = "2017-06-12";
console.log( string.match(regex) ); 
// => ["2017-06-12", "2017", "06", "12", index: 0, input: "2017-06-12"]

match returns an array: the first element is the overall match, followed by the match for each group (each pair of parentheses), then the index of the match, and finally the input text. (Note: the array returned by match has a different format depending on whether the regex has the g modifier.)
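The difference in the returned array can be seen side by side. This is a minimal sketch with a made-up input containing two dates; matchAll is used at the end as one way to get the groups for every match.

```javascript
const text = "2017-06-12 and 2018-01-02";

// Without g: the first match, with its groups and index
const one = text.match(/(\d{4})-(\d{2})-(\d{2})/);
console.log(one[0], one[1], one[2], one[3], one.index); // 2017-06-12 2017 06 12 0

// With g: every full match, but no groups
const many = text.match(/(\d{4})-(\d{2})-(\d{2})/g);
console.log(many); // ["2017-06-12", "2018-01-02"]

// matchAll with g: every match, each with its groups
const years = [...text.matchAll(/(\d{4})-(\d{2})-(\d{2})/g)].map((m) => m[1]);
console.log(years); // ["2017", "2018"]
```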

Alternatively, we can use the exec method of the regex object:

var regex = /(\d{4})-(\d{2})-(\d{2})/;
var string = "2017-06-12";
console.log( regex.exec(string) ); 
// => ["2017-06-12", "2017", "06", "12", index: 0, input: "2017-06-12"]

The groups can also be read from the RegExp constructor’s legacy static properties $1 to $9:

var regex = /(\d{4})-(\d{2})-(\d{2})/;
var string = "2017-06-12";

regex.test(string); // any regex operation will do, for example
//regex.exec(string);
//string.match(regex);

console.log(RegExp.$1); // "2017"
console.log(RegExp.$2); // "06"
console.log(RegExp.$3); // "12"

Replace

For example, if you want to replace YYYY-MM-DD with MM/DD/YYYY, what do you do?

var regex = /(\d{4})-(\d{2})-(\d{2})/;
var string = "2017-06-12";
var result = string.replace(regex, "$2/$3/$1");
console.log(result); 
// => "06/12/2017"

This is equivalent to

var regex = /(\d{4})-(\d{2})-(\d{2})/;
var string = "2017-06-12";
var result = string.replace(regex, function() {
  return RegExp.$2 + "/" + RegExp.$3 + "/" + RegExp.$1;
});
console.log(result); 
// => "06/12/2017"
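The callback passed to replace also receives the whole match and each captured group as arguments, so the same result can be had without reading the RegExp.$n properties. A minimal sketch; the parameter names are only illustrative.

```javascript
const dateRegex = /(\d{4})-(\d{2})-(\d{2})/;

// The callback gets the full match first, then one argument per group
const reordered = "2017-06-12".replace(
  dateRegex,
  (match, year, month, day) => `${month}/${day}/${year}`
);
console.log(reordered); // "06/12/2017"
```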

Backreferences

In addition to referring to groups using the corresponding API, you can also refer to groups within the re itself. But you can only refer to the previous grouping, which is called a backreference.

Again, take dates.

For example, to write a re that matches one of the following three formats:

2016-06-12

2016/06/12

2016.06.12

The first regex that might come to mind is:

var regex = /\d{4}(-|\/|\.)\d{2}(-|\/|\.)\d{2}/;
var string1 = "2017-06-12";
var string2 = "2017/06/12";
var string3 = "2017.06.12";
var string4 = "2016-06/12";
console.log( regex.test(string1) ); // true
console.log( regex.test(string2) ); // true
console.log( regex.test(string3) ); // true
console.log( regex.test(string4) ); // true

Here / and . need to be escaped. Although this matches the required formats, it also matches data such as “2016-06/12”.

What if we want the separators to be consistent? Use a backreference:

var regex = /\d{4}(-|\/|\.)\d{2}\1\d{2}/;
var string1 = "2017-06-12";
var string2 = "2017/06/12";
var string3 = "2017.06.12";
var string4 = "2016-06/12";
console.log( regex.test(string1) ); // true
console.log( regex.test(string2) ); // true
console.log( regex.test(string3) ); // true
console.log( regex.test(string4) ); // false

Four operations with regular expressions

Validate

var regex = /\d/;
var string = "abc123";
console.log( regex.test(string) );
// => true

Split

Once we can match, we can also split.

“Splitting” means cutting the target string into segments. In JS this is done with split.

For example, if the target string is “html,css,javascript”, split it by commas:

var regex = /,/;
var string = "html,css,javascript";
console.log( string.split(regex) );
// => ["html", "css", "javascript"]

You can use split to “cut out” year month day:

var regex = /\D/;
console.log( "2017/06/26".split(regex) );
console.log( "2017.06.26".split(regex) );
console.log( "2017-06-26".split(regex) );
// => ["2017", "06", "26"]
// => ["2017", "06", "26"]
// => ["2017", "06", "26"]

Extract

Even when the whole string matches, we sometimes need to extract part of the matched data.

In that case the regex usually uses group references (capturing groups) together with the relevant API.

Here again I take the date as an example and extract the year, month, and day. Note the parentheses in the regex below:

var regex = /^(\d{4})\D(\d{2})\D(\d{2})$/;
var string = "2017-06-26";
console.log( string.match(regex) );
// =>["2017-06-26", "2017", "06", "26", index: 0, input: "2017-06-26"]

Replace

Finding is often not the goal; usually the next step is to replace. In JS this is done with replace.

For example, change the date format from yyyy-mm-dd to yyyy/mm/dd:

var string = "2017-06-26";
var today = new Date( string.replace(/-/g, "/") );
console.log( today );
// => Mon Jun 26 2017 00:00:00 GMT+0800

References

Full tutorial on regular expressions

learn-regex