Regular expression learning notes

Regular expressions have always been dark magic to me. For a mobile phone number regex I searched online. For an email regex I searched online. For something a bit more complex I pieced one together from the docs. For anything beyond that, I could only sheepishly ask a colleague to write it for me. I have never studied them systematically, so today I'm going to tackle them.

1. Learning Objectives:

Understand regular expression syntax
Use regular expressions in the IDE
Use regular expressions to process strings in JavaScript

2. What is a regular expression?

A regular expression (often abbreviated in code as regex, regexp, or RE) is a concept from computer science. A regular expression uses a single string to describe and match a set of strings that follow a given syntactic rule. Many text editors use regular expressions to search for and replace text that matches a pattern.

In plain English: a regular expression is a rule, and it finds the strings that satisfy that rule.

3. Use graphical tools to understand regular expressions

Regexper.com/ is an online tool that draws a regular expression as a diagram to help you understand it. Don't worry if the syntax doesn't make sense yet, because it is covered later. Here we just learn to use the tool as a learning aid.

Mobile phone number regex

/^1[34578][0-9]{9}$/

Note: the number starts with 1, the second digit is one of 3, 4, 5, 7, or 8, and it ends with 9 digits from 0-9. In the diagram, this shows up as one pass through plus 8 repetitions of the loop.

Word boundaries

/\bis\b/

Note: this matches "is" only as a whole word, with a word boundary on each side. Compare it with /is/ to feel the difference between the two. \B is explained in the syntax section.

URL grouping and substitution

/http:(\/\/.+\.jpg)/

Don't panic if you don't understand the syntax yet, because it is covered later. This is just to show you how diagrams can help you understand regular expressions.

Parentheses create groups in a regular expression, and we can refer to the contents of group #1 with $1.

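Here is what this regex is for. If a site is served over HTTPS, the static resources it references must also use HTTPS, otherwise the browser reports an error. If a URL is written protocol-relative, starting with //, the browser automatically picks HTTP or HTTPS to match the page. Replacing each match with $1 therefore turns http://host/x.jpg into //host/x.jpg.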
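Below is a minimal sketch of this substitution in JavaScript. It assumes each string holds a single image URL, and the example URLs are made up. String.prototype.replace with '$1' keeps only the captured //...jpg part.

```javascript
// Group #1 captures everything after "http:" up to and including ".jpg".
const urlReg = /http:(\/\/.+\.jpg)/;

const urls = [
  'http://img.example.com/a.jpg',  // plain HTTP .jpg: rewritten
  'https://img.example.com/b.jpg', // "http:" never appears (it is "https:"), so left as-is
  'http://img.example.com/c.png',  // not a .jpg: no match, left as-is
];

// Replace the whole match with the contents of group #1.
const fixed = urls.map((u) => u.replace(urlReg, '$1'));
console.log(fixed);
// [ '//img.example.com/a.jpg', 'https://img.example.com/b.jpg', 'http://img.example.com/c.png' ]
```

Because .+ is greedy, running this with the g flag over one string that contains several URLs would match from the first http: to the last .jpg. That is why the sketch handles one URL per string.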

Date matching and group substitution

/^\d{4}[/-]\d{2}[/-]\d{2}$/

This regex is more complex, and there are quite a few incantations in it.

Start of line: produced by ^, meaning the string must begin here.
End of line: produced by $, meaning the string must end here.
digit: produced by \d, meaning a numeric digit.
3 times: produced by {4}, meaning four digits in total. At first I wondered why the diagram says 3 times instead of 4. The diagram is drawn like a railroad track that the string travels along from start to end. The first digit is matched on the main line, and the loop back is taken 3 more times, so 1 + 3 = 4. (I'm being a bit long-winded here.)
One of: we already saw this in the mobile phone regex. [/-] accepts either / or -, and as long as the character matches one of them, it passes.

Now that the regex has been explained, what can we do with it?

We can check that a string is a valid date format. Note that only the format is checked: a string like 2023-13-45 also passes.
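This is a quick sketch of that check using RegExp.prototype.test. The sample dates are arbitrary.

```javascript
const dateReg = /^\d{4}[/-]\d{2}[/-]\d{2}$/;

console.log(dateReg.test('2023-01-15')); // true
console.log(dateReg.test('2023/01/15')); // true
console.log(dateReg.test('2023/01-15')); // true: each separator is checked on its own
console.log(dateReg.test('2023.01.15')); // false: "." is not in [/-]
console.log(dateReg.test('2023-13-45')); // true: only the format is checked, not the calendar
```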

Combining this with groups, as in the URL substitution example, we can easily write a date-formatting function.

Let's modify this regex:

/^(\d{4})[/-](\d{2})[/-](\d{2})$/

Now the contents of groups #1, #2, and #3 are available as $1, $2, and $3.
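Here is a minimal sketch of such a date-formatting function. The name formatDate and the MM/DD/YYYY target format are illustrative choices, not something from the original.

```javascript
// Groups: #1 = year, #2 = month, #3 = day.
const dateGroupReg = /^(\d{4})[/-](\d{2})[/-](\d{2})$/;

// Hypothetical helper: reorder the captured groups into MM/DD/YYYY.
// Strings that do not match are returned unchanged.
function formatDate(s) {
  return s.replace(dateGroupReg, '$2/$3/$1');
}

console.log(formatDate('2023-01-15')); // "01/15/2023"
console.log(formatDate('2023/01/15')); // "01/15/2023"
console.log(formatDate('15-01-2023')); // "15-01-2023" (no match)
```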

Now you can read regular expressions with a graphical tool. To write your own, you need to learn the regex syntax.

4. The RegExp object in JavaScript

JavaScript supports regular expressions through the built-in RegExp object, which can be instantiated in two ways:

Literal syntax

const reg = /\bis\b/g

The constructor

const reg = new RegExp('\\bis\\b', 'g')

Note: the first argument is the pattern as a string. Mind the escaping: each backslash must be doubled inside a string literal. The second argument is the modifiers; g means a global search, more on that later.
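This short sketch shows that the two forms build the same regex. It also shows what goes wrong when the backslashes are not doubled. The sample sentence is made up.

```javascript
// Literal: backslashes are written once.
const literalReg = /\bis\b/g;
// Constructor: the pattern is a string, so each backslash must be doubled.
const ctorReg = new RegExp('\\bis\\b', 'g');

console.log(literalReg.source === ctorReg.source); // true: both are \bis\b

const sentence = 'This is an island, is it?';
console.log(sentence.match(literalReg)); // [ 'is', 'is' ]
console.log(sentence.match(ctorReg));    // [ 'is', 'is' ]

// Pitfall: in a string literal '\b' is a backspace character, not a word boundary,
// so this regex looks for backspace + "is" + backspace and finds nothing.
const brokenReg = new RegExp('\bis\b', 'g');
console.log(brokenReg.test('this is')); // false
```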

5. Regular expression syntax

1. Modifiers (three of them: g, i, m)

With the literal syntax, modifiers are written after the closing /. With the constructor, they are passed as the second argument. A complete regular expression can be thought of as a pattern string plus modifiers.

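g: global, performs a global match (finds every match instead of stopping at the first)
i: ignore case, performs a case-insensitive match
m: multiline, makes ^ and $ match at the start and end of each line rather than only of the whole string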

Modifiers can be combined: const reg = /\bis\b/gim

What do they do?

The difference between with g and without g

Without g, only the first match is found; with g, every match is found.

The difference between with i and without i

With i, case is ignored; without i, matching is strictly case-sensitive.
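Here is a small sketch of all three modifiers, using String.prototype.replace and match on made-up strings.

```javascript
const s = 'is this is? Is it';

// No g: only the first match is replaced.
console.log(s.replace(/\bis\b/, 'X'));   // "X this is? Is it"
// g: every match is replaced, but "Is" differs in case.
console.log(s.replace(/\bis\b/g, 'X'));  // "X this X? Is it"
// g + i: every match, ignoring case.
console.log(s.replace(/\bis\b/gi, 'X')); // "X this X? X it"

// m: ^ matches at the start of every line, not only at the start of the string.
const lines = 'line one\nline two';
console.log(lines.match(/^line/g).length);  // 1
console.log(lines.match(/^line/gm).length); // 2
```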

2. Metacharacters

Regular expressions are built from the following kinds of characters:

Literal characters
Non-printing characters
Metacharacters (* + ? $ ^ . | \ ( ) { } [ ])

3. Literal characters

There's nothing to explain here. /is/, the example we've been using all along, matches the literal string 'is'.

\ marks the next character as a special character, a literal character, a backreference, or an octal escape. For example, "n" matches the character n, while "\n" matches a newline. The sequence "\\" matches "\", and "\(" matches "(".
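These few test() calls sketch how escaping turns a metacharacter back into a literal. The strings are arbitrary.

```javascript
// Escaping a metacharacter makes it match literally.
console.log(/1.5/.test('1x5'));  // true: unescaped "." matches any character
console.log(/1\.5/.test('1x5')); // false: "\." only matches a literal dot
console.log(/1\.5/.test('1.5')); // true

console.log(/f\(x\)/.test('f(x)')); // true: "\(" and "\)" match literal parentheses
console.log(/a\\b/.test('a\\b'));   // true: the string is a\b, and "\\" in the regex matches one backslash
console.log(/\n/.test('a\nb'));     // true: "\n" matches a newline
```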

4. Non-printing characters

Character	Description
`\cx`	Matches the control character specified by x. For example, \cM matches a Control-M or carriage return character. The value of x must be in A-Z or a-z. Otherwise, c is treated as a literal 'c' character.
`\f`	Matches a form feed character. Equivalent to \x0c and \cL.
`\n`	Matches a newline character. Equivalent to \x0a and \cJ.
`\r`	Matches a carriage return. Equivalent to \x0d and \cM.
`\s`	Matches any whitespace character, including spaces, tabs, form feeds, and so on. Equivalent to [ \f\n\r\t\v] (in JavaScript it also covers Unicode spaces such as \u00a0).
`\S`	Matches any non-whitespace character. Equivalent to [^ \f\n\r\t\v].
`\t`	Matches a tab character. Equivalent to \x09 and \cI.
`\v`	Matches a vertical tab character. Equivalent to \x0b and \cK.

The non-printing character you will meet most often is \n, the newline.

The others are rarely used in front-end code. They come up more when processing text files on the back end.
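Here is a small sketch of where these characters show up when processing text. It splits a made-up tab-separated snippet whose lines end in either \r\n or \n.

```javascript
const raw = 'name\tage\r\nAlice\t30\nBob\t25';

// \r? makes the carriage return optional, so both Windows and Unix line endings work.
const rows = raw.split(/\r?\n/);
// Split each row on tab characters.
const cells = rows.map((row) => row.split(/\t/));
console.log(cells); // [ [ 'name', 'age' ], [ 'Alice', '30' ], [ 'Bob', '25' ] ]

// \s+ collapses any run of spaces, tabs, and newlines.
const words = '  a \t b\n c '.trim().split(/\s+/);
console.log(words); // [ 'a', 'b', 'c' ]
```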

5. Character class []

[] matches any single character listed inside it. In /^1[34578][0-9]{9}$/, [34578] matches one of 3, 4, 5, 7, or 8. In the date matching and group substitution example 🌰, /^\d{4}[/-]\d{2}[/-]\d{2}$/, [/-] matches either / or -.
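This sketch checks the mobile phone regex against a few made-up numbers. The second digit has to be one of the characters in [34578].

```javascript
const phoneReg = /^1[34578][0-9]{9}$/;

console.log(phoneReg.test('13812345678')); // true
console.log(phoneReg.test('12812345678')); // false: 2 is not in [34578]
console.log(phoneReg.test('1381234567'));  // false: only 10 digits in total
```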

6. Negated character class [^]

[^...] matches any single character that does not belong to the class.

For example, replacing [^ ] (any character that is not a space) with 🐶 turns every non-space character into a dog 🐶.
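A minimal sketch of the negated class follows. The first line is the dog example. The second shows a more everyday use, stripping non-digits from a made-up phone string.

```javascript
// [^ ] matches any single character that is NOT a space.
const dogs = 'a b c'.replace(/[^ ]/g, '🐶');
console.log(dogs); // "🐶 🐶 🐶"

// A more practical use: [^0-9] strips everything that is not a digit.
const phoneDigits = 'tel: 138-1234-5678'.replace(/[^0-9]/g, '');
console.log(phoneDigits); // "13812345678"
```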

7. Ranges [-]

Regular expressions support ranges such as [a-z], [A-Z], and [0-9], and these can be combined into one class, e.g. [a-zA-Z0-9]. To match a literal -, put it at the end of the class. For example, [0-9-] matches a digit or a hyphen.
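Here is a short sketch comparing a range with and without a trailing hyphen, on an arbitrary date string.

```javascript
const date = '2023-01-15';

// [0-9] is a range: any single digit.
console.log(date.replace(/[0-9]/g, 'X'));  // "XXXX-XX-XX"
// Putting "-" last makes it a literal hyphen, so [0-9-] also matches the separators.
console.log(date.replace(/[0-9-]/g, 'X')); // "XXXXXXXXXX"

// Several ranges can be combined in one class.
console.log('aB3-z!'.match(/[a-zA-Z0-9]/g)); // [ 'a', 'B', '3', 'z' ]
```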

8. Predefined classes

These shorthands make common patterns quicker to write.

Character	Equivalent class	Meaning
`.`	`[^\n\r]`	Any character except carriage return and newline
`\d`	`[0-9]`	Digit character
`\D`	`[^0-9]`	Non-digit character
`\s`	`[ \t\n\x0B\f\r]`	Whitespace character
`\S`	`[^ \t\n\x0B\f\r]`	Non-whitespace character
`\w`	`[a-zA-Z_0-9]`	Word character (letter, digit, underscore)
`\W`	`[^a-zA-Z_0-9]`	Non-word character

With these predefined classes, regexes are easy to write. For example, to match ab followed by a digit and then any single character, write /ab\d./.
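This sketch tests /ab\d./ against a few arbitrary strings. It also uses \w+ to pull the words out of a sentence.

```javascript
const abReg = /ab\d./;

console.log(abReg.test('ab1x')); // true
console.log(abReg.test('ab1'));  // false: "." needs one more character
console.log(abReg.test('abx1')); // false: \d must come right after "ab"

// \w+ grabs runs of word characters, skipping spaces and punctuation.
console.log('hello, regex_world 42!'.match(/\w+/g)); // [ 'hello', 'regex_world', '42' ]
```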

9. Boundaries

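Character	Meaning
`^`	Start of the string (begins with xx)
`$`	End of the string (ends with xx)
`\b`	Word boundary: a position between a word character [a-zA-Z_0-9] and a non-word character or the start/end of the string
`\B`	Non-word boundary: any position that is not a word boundary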

A boundary, as the name implies, sets the boundary conditions for a match, and we have already met most of them in the earlier examples. Here is the difference between \b and \B.
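The same two letters "is" can be matched in three different places depending on which boundary is used. This short sketch uses an arbitrary sentence.

```javascript
const text = 'This is an island';

// \b on both sides: "is" as a whole word.
console.log(text.replace(/\bis\b/g, '[IS]')); // "This [IS] an island"
// \B before: "is" at the end of a longer word.
console.log(text.replace(/\Bis\b/g, '[IS]')); // "Th[IS] is an island"
// \B after: "is" at the start of a longer word.
console.log(text.replace(/\bis\B/g, '[IS]')); // "This is an [IS]land"
```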

10. Quantifiers

Character	Meaning
?	Zero or one occurrence
*	Zero or more occurrences (any number)
+	One or more occurrences (at least once)
{n}	Exactly n occurrences
{n,m}	At least n but no more than m occurrences
{n,}	At least n occurrences (a generalized +)
{0,n}	At most n occurrences (a special case of {n,m})

Without quantifiers, matching four digits means writing /\d\d\d\d/. What about 50 or 100 digits? That would drive you crazy. With quantifiers you can simply write /\d{100}/. We already used a quantifier in the mobile phone number regex, where the visualization shows it as the number of times the loop is taken.

Here is an example to try in the visualizer: /\d?@\d*@\d+@\d{10}@\d{10,20}@\d{10,}@\d{0,10}/
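Each quantifier from the table can also be tried directly in JavaScript. This is a small sketch on made-up strings.

```javascript
// ? : zero or one
const colorReg = /^colou?r$/;
console.log(colorReg.test('color'), colorReg.test('colour')); // true true

// {n}: exactly n
const yearReg = /^\d{4}$/;
console.log(yearReg.test('2023'), yearReg.test('20234')); // true false

// {n,m}: between n and m
const shortCodeReg = /^\d{2,3}$/;
console.log(shortCodeReg.test('7'), shortCodeReg.test('42'), shortCodeReg.test('4242')); // false true false

// +: one or more, so each run of digits is taken as a whole
console.log('a1b22c333'.match(/\d+/g)); // [ '1', '22', '333' ]
// *: zero or more, so "ac" with no "b" at all still matches
console.log(/^ab*c$/.test('ac')); // true
```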

11. Greedy and lazy (non-greedy) matching

By default, regular expressions match greedily. What is greedy matching? As the name suggests, it matches as much as possible. Let's look at an example 🌰.

/\d{3,6}/

Greedy mode takes the longest possible match.

The opposite of greedy is lazy mode, which matches as little as possible. How do you switch to lazy mode? Add ? after the quantifier. Continuing the example above:

/\d{3,6}?/
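The two patterns above can be compared in JavaScript with match and the g flag. This is a sketch on an arbitrary digit string, plus the same idea with .+ on a small HTML snippet.

```javascript
const digits = '12345678';

// Greedy: each match takes as many digits as allowed (up to 6).
// The leftover "78" is too short for a second match.
console.log(digits.match(/\d{3,6}/g));  // [ '123456' ]
// Lazy: each match stops as soon as the minimum (3) is met.
console.log(digits.match(/\d{3,6}?/g)); // [ '123', '456' ]

// The same idea with .+ and .+?
const html = '<b>bold</b>';
console.log(html.match(/<.+>/)[0]);  // "<b>bold</b>"
console.log(html.match(/<.+?>/)[0]); // "<b>"
```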

If you want to know how the engine actually matches quantifiers, read up on backtracking in more advanced regex material.

12. Grouping and backreferences

A group, also known as a subexpression, breaks a regular expression into smaller expressions. The concept is dry, so here's an example of how groups are used:

Without grouping: /abc{2}/

The quantifier applies only to the last c.

With grouping: /(abc){2}/

Notice group #1 here: the quantifier now applies to the whole abc.
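This short sketch shows the difference the parentheses make to what {2} repeats.

```javascript
// Without a group, {2} repeats only the "c".
console.log(/^abc{2}$/.test('abcc'));   // true
console.log(/^abc{2}$/.test('abcabc')); // false

// With a group, {2} repeats the whole "abc".
console.log(/^(abc){2}$/.test('abcabc')); // true
console.log('abcabc'.match(/(abc){2}/)[1]); // "abc" (group #1 holds the last repetition)
```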

Grouping is similar to the () operator in arithmetic, except that it lives inside a regular expression, so make sure you understand what a group means. Groups are often abused.

/^(http|https)/: is that really needed? /^https?/ does the job. If you keep writing regexes like that, you will end up with a pile of useless captured groups that will give you a headache.
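This sketch compares the two versions on a made-up URL. Besides the extra capture, the grouped version has a second problem. JavaScript tries alternatives left to right, so http wins and the match stops before the s.

```javascript
const url = 'https://example.com';

// The group adds a capture nobody needs, and because alternatives are tried
// left to right, "http" succeeds first and the match ends before the "s".
const grouped = url.match(/^(http|https)/);
console.log(grouped.length, grouped[0]); // 2 http

// An optional "s" does the job with no group, and the greedy ? takes the "s".
const simple = url.match(/^https?/);
console.log(simple.length, simple[0]); // 1 https
```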

Grouping is often used together with backreferences. Don't be put off by the term. When a regular expression contains groups, each group is automatically assigned a number from left to right: $1, $2, and so on. Inside the pattern itself, the same groups are referred to as \1, \2, and so on.

Back to our earlier example:

/^(\d{4})[/-](\d{2})[/-](\d{2})$/

The contents of groups #1, #2, and #3 are available as $1, $2, and $3.

What if you don't want the year to be captured for a backreference? Add ?: right after the opening parenthesis:

/^(?:\d{4})[/-](\d{2})[/-](\d{2})$/
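Here is a sketch of how (?:...) shifts the group numbers, on an arbitrary date. The last regex is an extra illustration, not from the original. It uses the in-pattern backreference \1 to require both separators to be the same character, which the earlier date regex does not enforce.

```javascript
const input = '2023-01-15';

const capturing = input.match(/^(\d{4})[/-](\d{2})[/-](\d{2})$/);
console.log(capturing.slice(1)); // [ '2023', '01', '15' ]

// (?:...) still groups, but is not numbered, so the month becomes group #1.
const nonCapturing = input.match(/^(?:\d{4})[/-](\d{2})[/-](\d{2})$/);
console.log(nonCapturing.slice(1)); // [ '01', '15' ]
console.log(input.replace(/^(?:\d{4})[/-](\d{2})[/-](\d{2})$/, '$1/$2')); // "01/15"

// Inside the pattern, \1 refers back to group #1: here it forces both separators to match.
const sameSepReg = /^\d{4}([/-])\d{2}\1\d{2}$/;
console.log(sameSepReg.test('2023-01-15'), sameSepReg.test('2023/01-15')); // true false
```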

13. Lookahead

⚠️ This part is advanced material; feel free to skip it.

Regular expressions have the concepts of lookahead and lookbehind, which vividly describe how the regex engine matches. Note that "ahead" and "behind" in regex terms differ a little from how we usually understand them. For a piece of text, we normally call the direction toward the beginning "front" and the direction toward the end "back". The regex engine, however, parses text from head to tail (some engines offer an option to change the direction). So the direction toward the end of the text is called "ahead", because the engine has not reached it yet. The direction toward the beginning is called "behind", because the engine has already passed over it.

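⚠️ Note: lookbehind is said to be relatively expensive, and for a long time JavaScript only supported lookahead. Lookbehind, (?<=xxx) and (?<!xxx), was added in ES2018 and works in current browsers and Node.js. Older browsers, such as Safari before 16.4, do not support it.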

That was a lot of jargon, so let me put it in plain words. Say the emperor is choosing a consort. The first requirement is beauty: she must be as lovely as a flower. But beauty alone is not enough, because the royal family has rules and she must also be of noble blood.

So "beautiful" is the normal match, and once it matches, we still have to check whether her family is noble. "Noble" plays the role of the lookahead condition: it is checked, but it does not become part of the match.

Lookahead comes in two kinds: positive lookahead (?=xxx) and negative lookahead (?!xxx).
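The emperor analogy maps onto code roughly like this. The strings beauty-noble and beauty-commoner are made up for the sketch. Note that the lookahead part is checked but never included in the matched text.

```javascript
// Positive lookahead: match "beauty" only if "-noble" follows; "-noble" itself is not consumed.
const positive = /beauty(?=-noble)/;
console.log('beauty-noble'.match(positive)[0]); // "beauty"
console.log(positive.test('beauty-commoner'));  // false

// Negative lookahead: match "beauty" only if "-noble" does NOT follow.
const negative = /beauty(?!-noble)/;
console.log(negative.test('beauty-noble'));    // false
console.log(negative.test('beauty-commoner')); // true
```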

Easy, right? 😂 Let's put it to work.

Problem: how do you format '123456789' as an amount with thousands separators, '123,456,789'? Formatting amounts like this is a very common requirement.

Before I learned regex, my approach would have been:

Convert the string to an array
Reverse the array
Insert a comma after every third element
Reverse the array back once the commas are in
Join the array back into a string
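The five steps above can be sketched as follows. The name formatByArray is hypothetical, and the sketch assumes the input is a plain string of digits.

```javascript
// Hypothetical helper following the five steps above (digits-only input assumed).
function formatByArray(s) {
  const reversed = s.split('').reverse(); // steps 1-2: string -> reversed array
  const withCommas = [];
  reversed.forEach((ch, i) => {
    // step 3: after every third digit (counting from the right), insert a comma
    if (i > 0 && i % 3 === 0) withCommas.push(',');
    withCommas.push(ch);
  });
  return withCommas.reverse().join(''); // steps 4-5: reverse back, join into a string
}

console.log(formatByArray('123456789')); // "123,456,789"
console.log(formatByArray('1234'));      // "1,234"
```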

That's exhausting.

'123456789'.replace(/(\d)(?=(?:\d{3})+$)/g, '$1,')
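Here is the one-liner wrapped in a hypothetical formatAmount helper, with the regex broken down in comments. The sketch assumes an integer string. Because of the $ anchor, a decimal part such as '1234.5' would come back unchanged.

```javascript
// (\d)            : capture one digit...
// (?=(?:\d{3})+$) : ...but only if it is followed by one or more groups of exactly
//                   3 digits running to the end of the string.
// The lookahead consumes nothing, so '$1,' just puts a comma after the captured digit.
const formatAmount = (s) => s.replace(/(\d)(?=(?:\d{3})+$)/g, '$1,');

console.log(formatAmount('123456789')); // "123,456,789"
console.log(formatAmount('1234'));      // "1,234"
console.log(formatAmount('123'));       // "123"
```

For real number formatting, the built-in Number.prototype.toLocaleString('en-US') and Intl.NumberFormat handle this too, including decimals.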

That's it for the basics of regular expressions. I'll share some tips and tricks in a later article.