The purpose of this article is to present the driest basic knowledge in the plainest possible language.

Article outline:

  1. Metacharacters
  2. Repeat qualifiers
  3. Grouping
  4. Escape
  5. The or condition
  6. Ranges

Regular expressions can be used in almost any language, from JavaScript on the front end to Java or C# on the back end. They all provide interfaces or functions that support regular expressions.

But here’s the amazing thing: no matter which programming language you study in college, there is no regular expression course for you. Until you learn regex, all you can do is watch the regex masters write a string of alien-looking characters that replaces the big pile of if-else code you would otherwise need for the same content validation.

Since you like the idea, you set out to learn it. But when you search Baidu and turn up pile after pile of related material, you find that all of it, without exception, is extremely dull and hard to learn (to be honest, the author had the same mindset back then 😂😂).

I’m going to try to talk about regex in a more informal way, so that after reading this you can write some simple regex yourself or, at the very least, read other people’s regex.

1. Metacharacters

Everything is built from something, and regular expressions are no exception: metacharacters are one of the basic elements they are constructed from. Let’s start by memorizing some common metacharacters:

Metacharacter    Description
.                Matches any character except a newline
\w               Matches a letter, digit, underscore, or Chinese character (Chinese characters only in Unicode-aware engines)
\s               Matches any whitespace character
\d               Matches a digit
\b               Matches the beginning or end of a word
^                Matches the beginning of the string
$                Matches the end of the string
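
To see these metacharacters in action, here is a minimal Python sketch. Python is chosen only for illustration, and the sample strings and variable names are made up. It assumes Python 3, where \w on a str pattern is Unicode-aware and so also matches Chinese characters; other engines, such as JavaScript's, treat \w as ASCII-only by default.

```python
import re

text = "Order 42 shipped to 张三_01 on day 7"

# \d matches one digit at a time, so findall returns each digit separately.
digits = re.findall(r"\d", text)

# \w matches letters, digits, underscores and (in Python 3 str patterns) Chinese characters.
word_chars = re.findall(r"\w", "a_1 张")

# \s matches whitespace such as spaces and tabs.
spaces = re.findall(r"\s", "a b\tc")

# \b matches the position at the edge of a word, so \bday needs "day" to start a word.
starts_word = re.search(r"\bday", text) is not None
inside_word = re.search(r"\bday", "today") is not None

# . matches any character except a newline.
non_newlines = re.findall(r".", "a\nb")

# ^ anchors to the start of the string, $ to the end.
starts_with_order = re.search(r"^Order", text) is not None
ends_with_seven = re.search(r"7$", text) is not None
```

Here starts_word is true because "day" begins a word in the sample text, while inside_word is false because the "day" in "today" is preceded by another word character.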

With metacharacters in place, we can use them to write simple regular expressions, such as:

  1. Match a string starting with abc (\babc matches abc at the start of a word, ^abc at the start of the string):
     \babc or ^abc

  2. Match an 8-digit QQ number:
     ^\d\d\d\d\d\d\d\d$

  3. Match an 11-digit mobile phone number beginning with 1:
     ^1\d\d\d\d\d\d\d\d\d\d$
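
The three patterns above can be tried out with a short Python sketch. The constant names and the matches helper are hypothetical, introduced only for this illustration, and the test numbers are made up.

```python
import re

# The three patterns from the examples above, written with metacharacters only.
STARTS_WITH_ABC = re.compile(r"^abc")
QQ_NUMBER = re.compile(r"^\d\d\d\d\d\d\d\d$")
MOBILE_NUMBER = re.compile(r"^1\d\d\d\d\d\d\d\d\d\d$")


def matches(pattern, text):
    """Return True when the compiled pattern is found in text."""
    return pattern.search(text) is not None


print(matches(STARTS_WITH_ABC, "abcdef"))     # string starts with abc
print(matches(QQ_NUMBER, "12345678"))         # exactly 8 digits
print(matches(MOBILE_NUMBER, "13812345678"))  # 11 digits, first one is 1
```

One detail to keep in mind with Python specifically: $ also matches just before a trailing newline, so re.fullmatch is the stricter choice when validating input.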

2. Repeat qualifiers

You can write a lot of regular expressions with metacharacters alone, but if you look carefully, you’ll notice that other people’s regex is short and clear, while yours is a jumble of repeated metacharacters. Doesn’t regex provide a way to handle these repetitions?

The answer is yes! Regular expressions provide repeat qualifiers (also known as quantifiers), which replace a repeated part with a single element followed by a qualifier. Let’s look at them:

Syntax    Description
*         Repeat zero or more times
+         Repeat one or more times
?         Repeat zero or one time
{n}       Repeat exactly n times
{n,}      Repeat n or more times
{n,m}     Repeat n to m times

With these qualifiers in place, we can modify the previous regular expression, such as:

  1. Match an 8-digit QQ number:
     ^\d{8}$

  2. Match an 11-digit mobile phone number beginning with 1:
     ^1\d{10}$

  3. Match a bank card number of 14 to 18 digits:
     ^\d{14,18}$

  4. Match a string that starts with a, followed by zero or more b:
     ^ab*$
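
Here is a small Python sketch of how each qualifier behaves. The helper name full and the sample strings are assumptions made for this illustration; re.fullmatch is used in place of writing ^ and $ around every pattern.

```python
import re


def full(pattern, text):
    """Return True when the whole of text matches pattern."""
    return re.fullmatch(pattern, text) is not None


# * : zero or more b after the a
star = [full(r"ab*", s) for s in ("a", "ab", "abbb")]

# + : at least one b is required
plus = [full(r"ab+", s) for s in ("a", "ab", "abbb")]

# ? : at most one b
optional = [full(r"ab?", s) for s in ("a", "ab", "abb")]

# {n} : exactly n digits
exactly = [full(r"\d{8}", s) for s in ("1234567", "12345678", "123456789")]

# {n,} : n or more digits
at_least = [full(r"\d{3,}", s) for s in ("12", "123", "123456")]

# {n,m} : between n and m digits; the separator is a comma, not a period
bank_card = [full(r"\d{14,18}", "1" * n) for n in (13, 14, 18, 19)]
```

Each list holds one True or False per sample string, which makes it easy to see where a qualifier's lower or upper bound kicks in.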

3. Grouping

As you can see from the last example above, the * qualifier applies only to the single character immediately to its left. So what if I want * to apply to ab as a whole?

Regular expressions use parentheses () for grouping: whatever is inside the parentheses is treated as a single unit.

So when we want to match ab repeated any number of times, we can write:

^(ab)*
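
The difference between the ungrouped and grouped forms can be checked with a short Python sketch. The variable names and sample strings are made up, and a closing $ is added to both patterns here (an assumption beyond the pattern shown above) so that the whole string has to match.

```python
import re

# Without a group, * applies only to the b.
without_group = re.compile(r"^ab*$")
# With a group, * applies to ab as a whole.
with_group = re.compile(r"^(ab)*$")

samples = ["abbb", "ababab", ""]
ungrouped_hits = [s for s in samples if without_group.search(s)]
grouped_hits = [s for s in samples if with_group.search(s)]

# Without the closing $, ^(ab)* is satisfied by zero repetitions,
# so it "matches" any string with an empty match at the start.
unanchored = re.search(r"^(ab)*", "xyz")
unanchored_text = unanchored.group(0)
```

The last two lines are a reminder that * allows zero repetitions: a pattern like ^(ab)* on its own never rejects anything, so an end anchor or a + qualifier is usually what you want when validating.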

4. Escape

We have seen that regular expressions use parentheses for grouping, so here’s the question:

Is there a conflict if the string to be matched contains parentheses itself? What should I do?

For this case, regex provides escaping: you turn a metacharacter, qualifier, or other special symbol into an ordinary character by placing a backslash (\) before it. For example, to match a string that begins with the literal text (ab), repeated any number of times:

^(\(ab\))*
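
A minimal Python sketch of escaped versus unescaped parentheses follows. The variable names and sample text are assumptions for this illustration; re.escape is a real standard-library function that adds the backslashes for you.

```python
import re

# Unescaped parentheses form a group, so this looks for the text ab at the start.
group_pattern = re.compile(r"^(ab)")
# Escaped parentheses are ordinary characters, so this looks for the text (ab).
literal_pattern = re.compile(r"^\(ab\)")

text = "(ab) is in parentheses"
group_found = group_pattern.search(text) is not None
literal_found = literal_pattern.search(text) is not None

# The pattern from the article: the literal text (ab), repeated zero or more times.
repeated = re.search(r"^(\(ab\))*", "(ab)(ab)c").group(0)

# re.escape inserts the backslashes when the text to match is not known in advance.
escaped = re.escape("(ab)")
```

Here group_found is false because the sample text starts with an opening parenthesis rather than with ab, while literal_found is true.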

5. The or condition

Back to our phone number matching. As we all know, domestic numbers come from three carriers, and each carrier has its own number prefixes; China Unicom, for example, has the prefixes 130/131/132/155/156/185/186/145/176. If we are asked to match a Unicom number, the regex we have learned so far leaves us not knowing where to start, because the task involves parallel conditions, that is, "or". So how do you express "or" in a regular expression?

Regex uses the symbol | to mean "or", also called a branch condition: when any one of the branches is satisfied, it counts as a match.

Now we can solve the problem with the or condition:

^(130|131|132|155|156|185|186|145|176)\d{8}$
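
The branch condition can be sketched in Python as follows. The name is_unicom and the test numbers are hypothetical, and the prefix list is simply the one quoted in this article, which may not reflect current number allocations.

```python
import re

# One branch per prefix; the parentheses keep the alternatives together
# so that ^ and \d{8}$ apply to every branch.
UNICOM = re.compile(r"^(130|131|132|155|156|185|186|145|176)\d{8}$")


def is_unicom(number):
    """Return True when number has one of the listed prefixes plus 8 more digits."""
    return UNICOM.search(number) is not None


# Without the parentheses, | splits the whole pattern into "^130" or "131\d{8}$",
# so a string that merely starts with 130 is accepted.
loose_match = re.search(r"^130|131\d{8}$", "130abc") is not None
```

The loose_match line shows why the parentheses matter: | has the lowest precedence in a pattern, so grouping is what limits how far each branch extends.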

6. Ranges

When you look at the example above, do you see any patterns? Do you feel an urge to simplify it? In fact, you can.

Regex provides square brackets [] to express a range condition (a set of allowed characters).

  1. Restrict to the digits 0 to 9: [0-9]
  2. Restrict to the letters A to Z: [A-Z]
  3. Restrict to specific digits, here 1, 6, or 5: [165]

Let’s change the regular expression to this:

^((13[0-2])|(15[56])|(18[5-6])|145|176)\d{8}$
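
As a sanity check, the following Python sketch compares the range-based pattern with the earlier version that spells out every prefix. The names ALTERNATION, RANGES, and accepted_prefixes are assumptions made for this illustration; the check simply tries every three-digit prefix followed by eight zeros.

```python
import re

# The version that lists every prefix, and the version that uses ranges.
ALTERNATION = re.compile(r"^(130|131|132|155|156|185|186|145|176)\d{8}$")
RANGES = re.compile(r"^((13[0-2])|(15[56])|(18[5-6])|145|176)\d{8}$")


def accepted_prefixes(pattern):
    """Return the set of three-digit prefixes (000 to 999) that the pattern accepts."""
    return {
        f"{n:03d}"
        for n in range(1000)
        if pattern.search(f"{n:03d}" + "0" * 8)
    }


same_prefixes = accepted_prefixes(ALTERNATION) == accepted_prefixes(RANGES)

# A set such as [165] matches exactly one character out of those listed.
in_set = re.fullmatch(r"[165]", "6") is not None
not_in_set = re.fullmatch(r"[165]", "2") is not None
```

If same_prefixes comes out true, the shorter pattern accepts exactly the same prefixes as the longer one, which is the point of the simplification.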

Okay, that’s all for the basic usage of regular expressions. There is much more to learn, including many more metacharacters; only some of the metacharacters and syntax are listed here, as a quick introductory tutorial for people who don’t know regex, or who want to learn it but can’t bear to read the documentation. After reading this tutorial, even if you can’t write sophisticated regex, you can at least write some simple ones or read other people’s. If you need to learn more, it’s up to you to practice.