Python: The re (regular expression) module


Hello everyone, I am A Bowl Week, a front-end developer who does not want to get caught up in involution. If I am lucky enough to write an article that you like, I will be very glad.

Python provides a module for working with regular expressions: the re module.

Regular expression flags

Flag	Description	Full name
re.I	Makes matching case-insensitive	re.IGNORECASE
re.A	Makes `\w`, `\W`, `\b`, `\B`, `\d`, `\D`, `\s` and `\S` match only ASCII, not Unicode	re.ASCII
re.L	Makes matching locale-aware	re.LOCALE
re.M	Multi-line matching: `^` and `$` also match at the start and end of each line	re.MULTILINE
re.S	Makes `.` match every character, including newlines	re.DOTALL
re.U	Interprets characters according to the Unicode character set (the default for str patterns in Python 3). This flag affects `\w`, `\W`, `\b`, `\B`.	re.UNICODE
re.X	Allows a more flexible layout (whitespace and comments inside the pattern) so that regular expressions are easier to read.	re.VERBOSE
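As a quick illustration of the table, the sketch below applies a few of these flags. The sample strings are made up for the example and are not from the original article.

```python
import re

text = "Hello\nhello world"  # sample text, made up for this example

# re.I: case-insensitive matching
case_insensitive = re.findall(r"hello", text, flags=re.I)

# re.M: ^ matches at the start of every line, not only the start of the string
line_starts = re.findall(r"^\w+", text, flags=re.M)

# re.S: . also matches the newline character
dot_without_s = re.search(r"Hello.hello", text)
dot_with_s = re.search(r"Hello.hello", text, flags=re.S)

# re.X: whitespace and comments inside the pattern are ignored
verbose = re.compile(
    r"""
    (\d{4})  # year
    -
    (\d{2})  # month
    """,
    re.X,
)

# Flags can be combined with |
combined = re.findall(r"^hello", text, flags=re.I | re.M)

print(case_insensitive)                   # ['Hello', 'hello']
print(line_starts)                        # ['Hello', 'hello']
print(dot_without_s)                      # None
print(repr(dot_with_s.group()))           # 'Hello\nhello'
print(verbose.match("2020-05").groups())  # ('2020', '05')
print(combined)                           # ['Hello', 'hello']
```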

Find a single match

match

re.match returns a match object if zero or more characters at the beginning of the string match the regular expression pattern. If there is no match, it returns None. Note that this is different from a zero-length match.

Syntax format

re.match(pattern, string, flags=0)

pattern: the regular expression to match
string: the string to be matched
flags: flags that control how the regular expression matches, such as case insensitivity and multi-line matching

The re.match function returns a match object on success, or None otherwise.
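The difference between a zero-length match and no match at all can be seen in a small sketch. The patterns and input here are chosen only for illustration.

```python
import re

# \d* can match zero digits, so it succeeds with an empty match
empty = re.match(r"\d*", "abc")
# \d+ needs at least one digit at the start, so there is no match at all
missing = re.match(r"\d+", "abc")

print(empty)                # <re.Match object; span=(0, 0), match=''>
print(repr(empty.group()))  # ''
print(missing)              # None
```

An empty match is still a match object, so it counts as a successful match; only None means the pattern did not match.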

Sample code

    """
    -*- coding:utf-8 -*-
    author:
    Time :2020/5/30
    """
    import re

    string1 = "hello python"
    string2 = "hell5o python"
    pattern = r"[a-z]+\s\w+"
    print(re.match(pattern, string1))  # <re.Match object; span=(0, 12), match='hello python'>
    print(re.match(pattern, string2))  # None

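First import the re module. The r"" prefix marks a raw string, which is the usual way to write a regular expression because backslashes are kept as-is.

string2 does not match because of the digit 5 in the middle: [a-z]+ matches "hell", and the next character must then be whitespace (\s), but it is "5".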

group

Match.group() retrieves the matched text from a match object. If no group is given, it defaults to 0. Group 0 is the complete match, and capturing groups are numbered from 1 in the order they appear. A group can also be given a name.
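Besides group(), a match object has a few related accessors. The following is a minimal sketch; the group names first and last are just illustrative.

```python
import re

m = re.match(r"(?P<first>\w+)\s(?P<last>\w+)", "hello python")

all_groups = m.groups()     # every capturing group as a tuple
pair = m.group(1, 2)        # several groups in a single call
named = m.groupdict()       # named groups as a dict
last_span = m.span("last")  # (start, end) position of a group

print(all_groups)  # ('hello', 'python')
print(pair)        # ('hello', 'python')
print(named)       # {'first': 'hello', 'last': 'python'}
print(m["last"])   # python
print(last_span)   # (6, 12)
```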

Sample code

    """
    -*- coding:utf-8 -*-
    author:
    """
    import re

    string1 = "hello python"
    string2 = "hell5o python"
    pattern = r"[a-z]+\s\w+"
    pattern1 = r"(\w+)(\s)(\w+)"
    pattern2 = r"(?P<first>\w+\s)(?P<last>\w+)"
    print(re.match(pattern, string1))  # <re.Match object; span=(0, 12), match='hello python'>
    print(re.match(pattern, string1).group())  # hello python
    print(re.match(pattern, string2))  # None
    print(re.match(pattern1, string2).group(0))  # hell5o python
    print(re.match(pattern1, string2).group(1))  # hell5o
    print(re.match(pattern1, string2).group(2))  # (a single space)
    print(re.match(pattern1, string2).group(3))  # python
    print(re.match(pattern2, string2).group("last"))  # python

search

re.search scans the entire string for the first position where the pattern matches and returns the corresponding match object. If there is no match, it returns None. Note that this is different from finding a zero-length match. The syntax is the same as for match.

Sample code

    """
    -*- coding:utf-8 -*-
    author:
    Time :2020/5/30
    """
    import re

    string = "Hi World Hello python"
    pattern = r"Hello python"
    print(re.search(pattern, string).group())  # Hello python
    print(re.match(pattern, string))  # None

The difference between the two

re.match matches only at the beginning of the string. If the beginning of the string does not match the regular expression, the match fails and the function returns None. re.search, by contrast, scans through the whole string until it finds a match.

fullmatch

re.fullmatch returns the corresponding match object if the entire string matches the regular expression. Otherwise it returns None. Note that this is different from a zero-length match.

The syntax is the same as above

Sample code

    """
    -*- coding:utf-8 -*-
    author:
    Time :2020/5/30
    """
    import re

    string = "Hi World Hello python"
    pattern = r"Hi World Hello python"
    pattern1 = r"Hi World hello python"
    print(re.fullmatch(pattern, string))  # <re.Match object; span=(0, 21), match='Hi World Hello python'>
    print(re.fullmatch(pattern1, string))  # None

The difference between the three

match: matches only at the beginning of the string

search: searches for a match at any position

fullmatch: the entire string must match the regular expression exactly
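To see the three side by side, the sketch below runs the same pattern against three made-up inputs and records whether each function finds a match.

```python
import re

pattern = r"\d+"
samples = ["123", "123abc", "abc123"]  # made-up inputs

results = {}
for s in samples:
    results[s] = (
        bool(re.match(pattern, s)),      # anchored at the start only
        bool(re.search(pattern, s)),     # anywhere in the string
        bool(re.fullmatch(pattern, s)),  # the whole string
    )
    print(s, results[s])
# 123 (True, True, True)
# 123abc (True, True, False)
# abc123 (False, True, False)
```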

Match objects

A match object always has the Boolean value True. Because match() and search() return None when there is no match, a simple if statement is enough to test whether a match was found.

Sample code

    import re

    string = "Hi World Hello python"
    pattern = r"Hello python"
    match1 = re.search(pattern, string)
    match2 = re.match(pattern, string)
    if match1:
        print(match1.group())  # Hello python
    if match2:
        print(match2.group())  # not executed, because match2 is None

Find multiple matches

compile

re.compile compiles a regular expression pattern into a pattern object that can then be used for matching.

Syntax

re.compile(pattern, flags=0)

pattern: the regular expression to compile
flags: flags that control how the regular expression matches, such as case insensitivity and multi-line matching
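A compiled pattern object exposes the same matching functions as methods, so the pattern and flags only need to be given once. A minimal sketch, with a made-up pattern:

```python
import re

# Compile once, reuse many times
word = re.compile(r"[a-z]+", re.I)

first = word.match("Hello python").group()
found = word.search("123 Python").group()
whole = word.fullmatch("Hello python")  # the space is not covered by [a-z]+

print(first)                    # Hello
print(found)                    # Python
print(whole)                    # None
print(word.pattern)             # [a-z]+
print(bool(word.flags & re.I))  # True
```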

findall

re.findall finds all substrings of the string that match the regular expression and returns them as a list, or an empty list if nothing matches. Unlike match and search, which find a single match, findall finds all of them.

Syntax

re.findall(pattern, string, flags=0)
Pattern.findall(string[, pos[, endpos]])

The second form is the method of a compiled pattern object; pos and endpos are only available there.

string: the string to be matched
pos: optional; the position in the string where the search starts. The default is 0.
endpos: optional; the position in the string where the search ends. The default is the length of the string.
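What findall returns depends on the capturing groups in the pattern, and pos and endpos restrict the region that is searched. The sketch below uses a made-up sample string.

```python
import re

text = "a=1, b=22, c=333"  # made-up sample

# No capturing group: the list holds the full matches
plain = re.findall(r"\d+", text)
# One group: the list holds that group's text
one_group = re.findall(r"(\w)=\d+", text)
# Several groups: the list holds tuples
pairs = re.findall(r"(\w)=(\d+)", text)

# pos and endpos are only available on a compiled pattern
digits = re.compile(r"\d+")
from_pos = digits.findall(text, 5)    # search text[5:]
window = digits.findall(text, 5, 9)   # search text[5:9]

print(plain)      # ['1', '22', '333']
print(one_group)  # ['a', 'b', 'c']
print(pairs)      # [('a', '1'), ('b', '22'), ('c', '333')]
print(from_pos)   # ['22', '333']
print(window)     # ['22']
```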

finditer

re.finditer returns an iterator that yields a match object for every non-overlapping match of pattern in string. The string is scanned from left to right, and matches are returned in the order they are found. Empty matches are also included in the result.

The syntax is the same as for match: re.finditer(pattern, string, flags=0)

Sample code

    import re
    from collections.abc import Iterator

    string = "hello python hi javascript"
    pattern = r"\b\w+\b"
    pattern_object = re.compile(r"\b\w+\b")
    print(type(pattern_object))  # <class 're.Pattern'>
    findall = pattern_object.findall(string)
    for i in findall:
        print(i)
    finditer = re.finditer(pattern, string)
    print(isinstance(finditer, Iterator))  # True
    for _ in range(4):
        finditer1 = next(finditer)  # fetch the next match object
        print(finditer1.group())
    """
    -- output of each loop --
    hello
    python
    hi
    javascript
    """

If there are a large number of matches, finditer is usually lighter on memory than findall, because it yields match objects one at a time instead of building the whole list first. This is the usual difference between an iterator and a list.
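Because finditer yields match objects rather than strings, positions are available for each match, and iteration can stop early. A small sketch using the same kind of input as above:

```python
import re

text = "hello python hi javascript"

# Each item is a match object, so both text and position are available
spans = [(m.group(), m.span()) for m in re.finditer(r"\b\w+\b", text)]
for word, span in spans:
    print(word, span)
# hello (0, 5)
# python (6, 12)
# hi (13, 15)
# javascript (16, 26)

# Stop early: take only the first match from the iterator
first = next(re.finditer(r"\b\w+\b", text), None)
print(first.group())  # hello
```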

Splitting with split

re.split splits the string at every substring that matches the pattern and returns the pieces as a list.

Syntax

re.split(pattern, string, maxsplit=0, flags=0)

pattern: the regular expression used as the delimiter
string: the string to split
maxsplit: the maximum number of splits; for example, maxsplit=1 splits only once. The default is 0, which means no limit.
flags: flags that control how the regular expression matches, such as case insensitivity and multi-line matching

Sample code

    import re

    string = "hello hi good morning goodnight python\njavascript\nLinux\n"
    pattern = r'\s+'  # one or more whitespace characters
    print(re.split(pattern, string))
    # ['hello', 'hi', 'good', 'morning', 'goodnight', 'python', 'javascript', 'Linux', '']
    print(re.split(pattern, string, maxsplit=5))
    # ['hello', 'hi', 'good', 'morning', 'goodnight', 'python\njavascript\nLinux\n']

Unlike str.split, re.split accepts a regular expression as the delimiter.
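Two details are worth a short sketch: a capturing group in the pattern keeps the delimiters in the result, and a regular expression can absorb variable whitespace that a fixed str.split delimiter cannot. The inputs are made up.

```python
import re

text = "1+2-3"  # made-up sample

parts = re.split(r"[+-]", text)
# A capturing group keeps the delimiters in the result
with_delims = re.split(r"([+-])", text)
once = re.split(r"[+-]", text, maxsplit=1)

# str.split only accepts a fixed delimiter
fixed = "a, b,c".split(",")
flexible = re.split(r",\s*", "a, b,c")

print(parts)        # ['1', '2', '3']
print(with_delims)  # ['1', '+', '2', '-', '3']
print(once)         # ['1', '2-3']
print(fixed)        # ['a', ' b', 'c']
print(flexible)     # ['a', 'b', 'c']
```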

replace

sub

re.sub replaces the matches found in a string.

Syntax

re.sub(pattern, repl, string, count=0, flags=0)

pattern: the regular expression to search for
repl: the replacement string; it can also be a function
string: the original string to be searched and replaced
count: the maximum number of replacements. The default is 0, which means all matches are replaced.
flags: flags that control how the regular expression matches; several flags can be combined with |

A small use case: cleaning up abusive comments in a comment section by replacing the offending words.

    # pattern matches the offending words, string is the comment text
    print(re.sub(pattern, "cute", string))
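A self-contained sketch of this kind of comment cleaning might look as follows. The comment text and the word list are invented for illustration, and the names comment, bad_words and mask are hypothetical, not taken from the original example.

```python
import re

# Hypothetical data: the comment and the word list are invented for illustration
comment = "This article is stupid, and the author is dumb"
bad_words = ["stupid", "dumb"]

# re.escape guards against metacharacters in the words; \b restricts to whole words
pattern = r"\b(?:" + "|".join(map(re.escape, bad_words)) + r")\b"

replaced = re.sub(pattern, "cute", comment)
print(replaced)
# This article is cute, and the author is cute


def mask(match):
    """repl can be a function: it receives the match object and returns the replacement."""
    return "*" * len(match.group())


masked = re.sub(pattern, mask, comment)
print(masked)
# This article is ******, and the author is ****

# Backreferences in repl: swap two words
swapped = re.sub(r"(\w+) (\w+)", r"\2 \1", "hello python")
print(swapped)  # python hello
```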


subn

re.subn behaves the same as sub(), but returns a tuple (new string, number of substitutions made).
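A short sketch of the returned tuple, using made-up input:

```python
import re

# Collapse every run of whitespace into a single space
result, count = re.subn(r"\s+", " ", "hello   python \n world")
print(repr(result))  # 'hello python world'
print(count)         # 2

# count limits the number of replacements, and the returned number reflects it
limited = re.subn(r"a", "b", "aaaa", count=2)
print(limited)       # ('bbaa', 2)
```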

escape

re.escape(pattern) escapes the special characters in pattern, such as regular expression metacharacters.

Sample code

    import re

    pattern = r'\w\s*\d\d.'
    print(re.escape(pattern))  # \\w\\s\*\\d\\d\.

This is useful when you want to match an arbitrary literal string that may contain regular expression metacharacters; escaping such text by hand is error-prone.
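A typical case is searching for text that comes from outside the program. In the sketch below, user_text and haystack are hypothetical values made up for the example.

```python
import re

user_text = "1+1=2 (really?)"  # hypothetical literal text to search for
haystack = "Is 1+1=2 (really?) true"

# Unescaped, + ( ) ? are treated as metacharacters and the search fails
raw_result = re.search(user_text, haystack)
# Escaped, the text is matched literally
safe_result = re.search(re.escape(user_text), haystack)

print(raw_result)           # None
print(safe_result.group())  # 1+1=2 (really?)
```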

purge

re.purge() clears the regular expression cache.
