Earlier we covered the origin, development, flavors, syntax, engines, optimization, and other background of regular expressions. Today we focus on using regular expressions in Python!

Most programming languages modeled their regular expression design on Perl, so the syntax is broadly similar, but each language has its own functions for working with regular expressions. Today we will learn about the regular expression functions in Python.

Note: to avoid code formatting problems, Brother Pig tries to use code screenshots for the demos.

I. Introduction to the re module

When you think of regular expression support in Python, the first thing that comes to mind is the re library, a standard library module for processing text.

The standard library means that this is a built-in Python module, which does not need to be downloaded, and there are currently about 300 built-in Python modules. You can view all of Python’s built-in modules here: docs.python.org/3/py-modind…

Because re is a built-in module, you can import it directly without installing anything.

import re

Official documentation of the re module: docs.python.org/zh-cn/3.8/l…

Source code of the re module: github.com/python/cpyt…

II. re module constants

Constants are values that cannot be changed; in the re module they are used as flags.

There are nine constants in the re module, all of which are int values!

(Screenshot: the RegexFlag enumeration class.)

Let’s quickly learn what these constants do and how to use them, in order of frequency!

1. IGNORECASE

Syntax: re.IGNORECASE or abbreviated to re.I

Function: case-insensitive matching.

Code examples:

(Screenshot: matching capital B and lowercase b.)
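As a minimal sketch of the flag described above (the sample text and variable names are made up for illustration), the same lowercase pattern is run with and without re.IGNORECASE:

```python
import re

text = "Big brown bear"  # hypothetical sample text

# Without the flag, the lowercase pattern misses the capital "B".
default_hits = re.findall(r"b", text)

# With re.IGNORECASE (or re.I), both "B" and "b" match.
ignorecase_hits = re.findall(r"b", text, re.IGNORECASE)

print(default_hits)     # ['b', 'b']
print(ignorecase_hits)  # ['B', 'b', 'b']
```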

2. ASCII

Syntax: re.ASCII or abbreviated re.A

Function: as the name implies, this flag makes \w, \W, \b, \B, \d, \D, \s and \S match only ASCII characters instead of the full Unicode range.

Code examples:

(Screenshot: matching \w+ with and without the ASCII flag.)

Note: this flag only affects string (str) patterns; it is ignored for bytes patterns.
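A small sketch of the difference, assuming a sample string that mixes ASCII and Chinese characters (the text is invented for illustration):

```python
import re

text = "hello 你好 world"  # hypothetical sample text

# In the default (Unicode) mode, \w also matches non-ASCII word characters.
unicode_words = re.findall(r"\w+", text)

# With re.ASCII (or re.A), \w is restricted to [a-zA-Z0-9_].
ascii_words = re.findall(r"\w+", text, re.ASCII)

print(unicode_words)  # ['hello', '你好', 'world']
print(ascii_words)    # ['hello', 'world']
```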

3. DOTALL

Syntax: re.DOTALL or abbreviated re.S

Function: DOT refers to the . metacharacter and ALL means everything, so . matches every character, including the newline \n. By default, . does not match the newline \n.

Code examples:

(Screenshot: by default . does not match \n; with re.DOTALL it also matches \n.)

Note: in the default matching mode, . does not match the newline \n.
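The behaviour can be checked with a short sketch (the two-line sample string is an assumption for illustration):

```python
import re

text = "a\nb"  # hypothetical sample text containing a newline

# Default mode: "." skips the newline.
default_hits = re.findall(r".", text)

# re.DOTALL (or re.S): "." matches the newline as well.
dotall_hits = re.findall(r".", text, re.DOTALL)

print(default_hits)  # ['a', 'b']
print(dotall_hits)   # ['a', '\n', 'b']
```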

4. MULTILINE

Syntax: re.MULTILINE or re.M for short

Function: multi-line mode. When a string contains newline characters \n, the default mode does not recognize line boundaries, so ^ matches only at the beginning of the whole string. In multi-line mode, ^ also matches at the beginning of each line (and $ at the end of each line).

Code examples:

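(Screenshot: matching ^ against a string that contains \n.)

Note: in regex syntax, ^ matches the beginning of a line and \A matches the beginning of the string. \A never treats \n as a line boundary, so even in multi-line mode it matches only at the very start of the string.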
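A hedged sketch of ^ versus \A, using an invented three-line string:

```python
import re

text = "one apple\ntwo pears\nthree plums"  # hypothetical sample text

# Default mode: ^ matches only at the start of the whole string.
default_starts = re.findall(r"^\w+", text)

# re.MULTILINE (or re.M): ^ also matches right after each newline.
multiline_starts = re.findall(r"^\w+", text, re.MULTILINE)

# \A always means "start of string", even in multi-line mode.
string_starts = re.findall(r"\A\w+", text, re.MULTILINE)

print(default_starts)    # ['one']
print(multiline_starts)  # ['one', 'two', 'three']
print(string_starts)     # ['one']
```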

5. VERBOSE

Syntax: re.VERBOSE or re.X for short

Function: verbose mode, which lets you add comments inside a regular expression!

Code examples:

Verbose mode gives you a way to annotate a regular expression when it is complex, but it should not be used just to show off; use it with care!
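To illustrate, here is a sketch of a commented pattern; the date format, group names and sample text are assumptions chosen for the example:

```python
import re

# In verbose mode, whitespace in the pattern is ignored and
# "#" starts a comment that runs to the end of the line.
date_pattern = re.compile(r"""
    (?P<year>\d{4})   # four-digit year
    -                 # separator
    (?P<month>\d{2})  # two-digit month
    -                 # separator
    (?P<day>\d{2})    # two-digit day
""", re.VERBOSE)

found = date_pattern.search("released on 2019-10-14")
date_parts = found.groups() if found else None
print(date_parts)  # ('2019', '10', '14')
```

The equivalent one-line pattern would be r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})"; the verbose form only pays off when the expression is long enough to need explanation.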

6. LOCALE

Syntax: re.LOCALE or simply re.L

Function: makes \w, \W, \b, \B and case-insensitive matching depend on the current locale. This flag is valid only for bytes patterns. It is officially discouraged because the locale mechanism is very unreliable: it can only handle one "culture" at a time and only works with 8-bit locales.

Note: since this flag is officially discouraged and Brother Pig has never used it, no example is given here!

7. UNICODE

Syntax: re.UNICODE or re.U for short

Function: the counterpart of ASCII mode; character classes match according to Unicode. Python 3 strings are already Unicode by default, so this flag is redundant.

8. DEBUG

Syntax: re.DEBUG

Function: Displays debugging information during compilation.

Code examples:

Debug mode does print compilation information, but I do not understand its notation or what it expresses. If you do, please feel free to explain in the comments.

9. TEMPLATE

Syntax: re.TEMPLATE or simply re.T

Function: disables backtracking. This flag is undocumented; it was deprecated in Python 3.11 and removed in Python 3.13.

10. Constant summary

  1. Of the nine constants, the first five (IGNORECASE, ASCII, DOTALL, MULTILINE, and VERBOSE) are useful. Two (LOCALE, UNICODE) are not recommended, and two (TEMPLATE, DEBUG) are experimental and cannot be relied on.
  2. The constants can be passed to the commonly used re functions, as the source code shows.
  3. Constants can be combined. Because each constant's value is a power of 2, combine them with the | operator, not the + operator!
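The third point can be sketched as follows (the sample text is invented; the numeric comparison relies on the flag values defined in CPython's re module):

```python
import re

text = "Hello\nhello"  # hypothetical sample text

# Combine flags with the bitwise OR operator.
flags = re.IGNORECASE | re.MULTILINE
hits = re.findall(r"^hello$", text, flags)
print(hits)  # ['Hello', 'hello']

# OR-ing a flag with itself is harmless ...
same_flag = (re.IGNORECASE | re.IGNORECASE) == re.IGNORECASE
# ... but adding it to itself yields the numeric value of a different flag.
wrong_flag = int(re.IGNORECASE) + int(re.IGNORECASE) == int(re.LOCALE)
print(same_flag, wrong_flag)  # True True
```

This is why + is risky: if the same flag ends up in the sum twice, the result silently means something else.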

Finally, let's summarize the constants in the re module with a mind map. If you need a high-definition image or the XMind file, reply "RE" to the WeChat official account "Naked sleeping pig".

III. re module functions

The re module has 12 functions. Brother Pig will explain them grouped by purpose, which makes them easier to compare and remember.

1. Search for a match

There are three functions that search and return a match: search, match, and fullmatch. The differences are as follows:

  1. search: searches for a match at any position in the string
  2. match: must match from the beginning of the string
  3. fullmatch: the entire string must match the regular expression

Let’s take a look at the actual code example:

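Case 1:

(Screenshot: the search function.)

The fullmatch function requires the entire string to match, so it doesn't match either!

Case 2:

(Screenshot: the match function and the fullmatch function.)

Case 3:

(Screenshot: the fullmatch function.)

Complete case:

(Screenshot: the three functions side by side.)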

Note: the functions that find a single match return a Match object when the pattern matches, and None otherwise.
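The three functions can be compared in one short sketch; the sample string and patterns below are assumptions, not the article's original cases:

```python
import re

text = "abc123"  # hypothetical sample text

# search: the digits may appear anywhere in the string.
search_result = re.search(r"\d+", text)

# match: the pattern must match at the beginning, and "abc123" starts with letters.
match_result = re.match(r"\d+", text)

# fullmatch: the whole string must match the pattern.
fullmatch_digits = re.fullmatch(r"\d+", text)
fullmatch_all = re.fullmatch(r"[a-z]+\d+", text)

print(search_result.group())  # 123
print(match_result)           # None
print(fullmatch_digits)       # None
print(fullmatch_all.group())  # abc123
```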

2. Search for multiple matches

There are two functions that search for multiple matches: findall and finditer.

  1. findall: searches anywhere in the string and returns a list
  2. finditer: searches anywhere in the string and returns an iterator

The two are basically the same, except that one returns a list and the other returns an iterator. A list is built in memory all at once, while an iterator produces its items one at a time as needed, which uses less memory.

(Screenshot: the finditer function and the findall function.)
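A minimal sketch of the two functions on an invented sample string:

```python
import re

text = "a1b22c333"  # hypothetical sample text

# findall returns a list of the matched strings.
numbers = re.findall(r"\d+", text)

# finditer returns an iterator of Match objects, produced lazily.
spans = [(m.group(), m.span()) for m in re.finditer(r"\d+", text)]

print(numbers)  # ['1', '22', '333']
print(spans)    # [('1', (1, 2)), ('22', (3, 5)), ('333', (6, 9))]
```

Because finditer yields Match objects, it also gives access to positions and groups, which findall does not.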

3. Split

re.split(pattern, string, maxsplit=0, flags=0) splits string at each occurrence of pattern and returns a list. If maxsplit is nonzero, at most maxsplit splits are made.
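For example, a sketch that splits on several kinds of separator at once (the sample text and separator set are assumptions):

```python
import re

text = "apple, banana;cherry  date"  # hypothetical sample text

# Split on any run of commas, semicolons, or whitespace.
parts = re.split(r"[,;\s]+", text)

# maxsplit limits the number of splits; the remainder stays in the last item.
limited = re.split(r"[,;\s]+", text, maxsplit=2)

print(parts)    # ['apple', 'banana', 'cherry', 'date']
print(limited)  # ['apple', 'banana', 'cherry  date']
```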

Note: str also has a split method, so how do you choose between the two?

How do the two compare in speed? The chart below compares the execution time of re.split and str.split against the number of executions:

(Figure: execution time of str.split versus re.split.)

So the conclusion is: when you don't need regex support and the data volume and number of calls are small, str.split is more appropriate; otherwise, use re.split.

Note: the actual execution times depend on the test data!

4. Replace

The replacement functions are sub and subn, and they work in much the same way!

Let's start with the sub function:

re.sub(pattern, repl, string, count=0, flags=0) replaces the parts of string that are matched by pattern with repl and returns the resulting string.

Note that the repl argument can be either a string or a function. If repl is a function, it takes exactly one argument: a Match object.

re.subn(pattern, repl, string, count=0, flags=0) works the same as re.sub except that it returns a tuple (string, number of substitutions).
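The sketch below shows a string repl, a function repl, and subn; the sample text and the helper name double are made up for illustration:

```python
import re

text = "price: 10 USD, tax: 2 USD"  # hypothetical sample text

# repl as a string: every run of digits becomes "#".
masked = re.sub(r"\d+", "#", text)

# repl as a function: it receives a Match object and returns the replacement.
def double(match):
    return str(int(match.group()) * 2)

doubled = re.sub(r"\d+", double, text)

# subn returns the new string together with the number of substitutions.
result, count = re.subn(r"\d+", "#", text)

print(masked)         # price: # USD, tax: # USD
print(doubled)        # price: 20 USD, tax: 4 USD
print(result, count)  # price: # USD, tax: # USD 2
```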

5. Compile a regular expression object

The compile and template functions compile a regular expression pattern into a regular expression object (the Pattern object), which offers the same functions as the re module (we'll look at the Pattern object later).

(Screenshot: the template function calls the compile function with the re.TEMPLATE flag.)

6. Other

re.escape(pattern) escapes characters that have special meaning in regular expressions, such as . or *. A practical example:

(Screenshot: using re.escape(pattern).)

Manual escaping is recommended!
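As a small sketch of what escaping changes (the domain-like literal and the test string are invented):

```python
import re

literal = "www.python.org"  # hypothetical text we want to match literally

# re.escape puts a backslash in front of the regex metacharacters.
escaped = re.escape(literal)
print(escaped)  # www\.python\.org

# Unescaped, each "." matches any character, so "X" slips through.
loose = re.search(literal, "wwwXpython.org")

# Escaped, "." must be a literal dot, so the same text no longer matches.
strict = re.search(escaped, "wwwXpython.org")

print(loose is not None)  # True
print(strict)             # None
```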

The re.purge() function is used to purge the regular expression cache. Let’s take a look at the source code to see what it does behind the scenes:

(Screenshot: source code of re.purge().)

7. Summary

Finally, let's summarize the functions in the re module with a mind map. If you need a high-definition image or the XMind file, reply "RE" to the WeChat official account "Naked sleeping pig".

IV. re module exceptions

The re module also defines an exception for regular expression compilation errors, re.error, which is raised when the regular expression itself is invalid.

Let’s look at specific cases:

Note: this exception is raised only because the regular expression itself is invalid; it has nothing to do with the string being matched!
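A minimal sketch, assuming an obviously broken pattern with an unclosed parenthesis:

```python
import re

# An invalid pattern fails at compile time, before any text is examined.
try:
    re.compile(r"(unclosed")
except re.error as exc:
    error_message = str(exc)
else:
    error_message = None

print(error_message is not None)  # True

# A valid pattern that simply finds nothing raises no exception; it returns None.
no_match = re.search(r"\d+", "no digits here")
print(no_match)  # None
```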

V. The regular expression object Pattern

We've covered the constants, functions, and exceptions in the re module, but it's worth going back to the regular expression object Pattern.

1. Consistent with re module functions

Among the re module's functions there is one important function, compile, which precompiles a pattern and returns a Pattern object. This object offers the same functions as the re module; let's look at the source of the Pattern class.

(Screenshot: the functions of the re module next to the methods of the Pattern object.)

If you have read the source of the re module, you will also find that the module-level functions (search, split, sub, etc.) internally call the same compile routine and then call the corresponding method on the Pattern object!

import re

pattern = r"\d+"
text = "order 66"

# re module function
re.search(pattern, text)

# Pattern object method
regex = re.compile(pattern)
regex.search(text)

Is it necessary to use the compile function to get the regular object and then call the search function? Can I just call re.search?

2. What the official documentation says

Does the official documentation say whether to use the re module functions or the Pattern object?

The official documentation recommends the Pattern object, obtained with re.compile(pattern), when a regular expression is used multiple times.

3. What about the actual tests?

The official documentation above recommends using regular objects when using a regular expression multiple times. Is this really the case?

Let’s test it

(Screenshot: timing the re.search function and the compiled object's search function over a number of runs.)

The results are drawn as a line chart:

(Figure: execution time of the Pattern object versus the re module.)

The Python documentation recommends using the Pattern object's methods when a regular expression is used more than once.
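A comparison along these lines can be sketched with timeit. The pattern, text, and run count below are assumptions, and the measured times will differ from machine to machine:

```python
import re
import timeit

pattern = r"\d+"        # hypothetical pattern
text = "abc 12345 def"  # hypothetical sample text
runs = 20000            # arbitrary run count for the sketch

compiled = re.compile(pattern)

# Module-level call: re looks the pattern up in its cache on every call.
module_time = timeit.timeit(lambda: re.search(pattern, text), number=runs)

# Method call on the precompiled Pattern object: no lookup needed.
object_time = timeit.timeit(lambda: compiled.search(text), number=runs)

print(f"re.search:       {module_time:.4f} s")
print(f"compiled.search: {object_time:.4f} s")
```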

VI. Notes

With Python regular expressions covered, there are a few things to be aware of.

1. Bytes and strings

The pattern and the searched string can both be Unicode strings (str) or both be 8-bit byte strings (bytes). However, Unicode strings and 8-bit byte strings cannot be mixed!
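A short sketch of the three combinations, using made-up sample data:

```python
import re

# str pattern with str text, and bytes pattern with bytes text, both work.
str_hits = re.findall(r"\d+", "id 42")
bytes_hits = re.findall(rb"\d+", b"id 42")

# Mixing a str pattern with bytes text raises TypeError.
try:
    re.findall(r"\d+", b"id 42")
except TypeError as exc:
    mixed_error = str(exc)
else:
    mixed_error = None

print(str_hits)                 # ['42']
print(bytes_hits)               # [b'42']
print(mixed_error is not None)  # True
```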

2. The role of r

Regular expressions use the backslash (\) to introduce special forms or to turn special characters into ordinary characters.

Backslashes work the same way in ordinary Python strings, so there’s a conflict.

The solution is to write regular expression patterns in Python's raw string notation: in string literals prefixed with 'r', backslashes are not treated in any special way.
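The conflict is easy to see with \b, which means "backspace" to Python but "word boundary" to the regex engine. A minimal sketch with an invented sentence:

```python
import re

plain = "\bcat\b"  # Python turns each \b into a backspace character (0x08)
raw = r"\bcat\b"   # the backslashes reach the regex engine unchanged

print(len(plain), len(raw))  # 5 7

text = "a cat sat"  # hypothetical sample text

# The plain pattern looks for literal backspace characters and finds nothing.
plain_hits = re.findall(plain, text)

# The raw pattern uses \b as a word boundary, as intended.
raw_hits = re.findall(raw, text)

print(plain_hits)  # []
print(raw_hits)    # ['cat']
```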

3. The match-finding functions return a Match object

It's easy to forget that the functions that find a single match (search, match, fullmatch) all return a Match object; the matched text has to be retrieved with match.group().

(Screenshot: using match.group() and match.groups().)
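A small sketch of reading results from a Match object; the address and pattern are invented for illustration:

```python
import re

found = re.search(r"(\w+)@(\w+)\.com", "contact: pig@example.com")

# Always check for None before using the Match object.
if found:
    whole = found.group()   # the entire matched text
    first = found.group(1)  # the first capture group
    parts = found.groups()  # all capture groups as a tuple
    print(whole)  # pig@example.com
    print(first)  # pig
    print(parts)  # ('pig', 'example')
```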

4. Reusing a regular expression

If you want to reuse a regular expression, it is recommended to call re.compile(pattern) to get a Pattern object and then reuse that object. This is faster!

5. Python regular expressions in interviews

You may need Python regular expressions in a written test, but the questions should not be too difficult. As long as you remember the differences between these functions, you will be able to use them correctly.

All the content of this article has been organized into a mind map. If you would like the mind map in XMind format, follow the WeChat official account "Naked sleeping pig" and reply "re" to get it!