The re module is part of the Python standard library and provides regular expression operations. It makes it convenient to match and replace strings using regular expressions.

Matching and groups

The re module provides the match function, which checks whether the beginning of a string matches a given regular expression. For example, to match Isaac Newton in "Isaac Newton, physicist", you can use the regular expression \w+ \w+, which gives the following result:

>>> import re
>>> m = re.match(r"\w+ \w+", "Isaac Newton, physicist")
>>> m
<re.Match object; span=(0, 12), match='Isaac Newton'>

The first argument to re.match is the regular expression and the second is the string to match against. The pattern \w+ matches a run of one or more consecutive word characters (letters, digits, or underscores), so \w+ \w+ matches two such runs separated by a single space.
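A minimal sketch of what this pattern does and does not match, using only the standard re module (the variable names are illustrative). Note that re.match only tries to match at the beginning of the string and returns None when that fails:

```python
import re

pattern = r"\w+ \w+"  # a raw string keeps the backslashes intact

# Matches: the string starts with two words separated by one space.
hit = re.match(pattern, "Isaac Newton, physicist")
print(hit.group(0))  # Isaac Newton

# No match: re.match only looks at the start of the string,
# and the leading comma is not a word character.
miss = re.match(pattern, ", Isaac Newton")
print(miss)  # None
```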

When match succeeds, it returns a match object whose results are organized into groups. These can be inspected with the match object's group() method:

>>> m.group(0)
'Isaac Newton'

By default there is only one group, group 0, which represents the entire match. In the example above, group 0 is everything that \w+ \w+ matched, namely Isaac Newton. You can define additional groups by placing parentheses () in the regular expression. For example, to capture Isaac and Newton as two separate groups, change the regular expression to (\w+) (\w+):

>>> m = re.match(r"(\w+) (\w+)", "Isaac Newton, physicist")

The regular expression (\w+) (\w+) defines two groups, each matching \w+, separated by a space. Use groups() to see all groups in the match:

>>> m.groups()
('Isaac', 'Newton')

You can also use the group() interface to view each group separately, where group(0) still represents the complete matching result, group(1) represents the first group in the matching result, group(2) represents the second group, and so on:

>>> m.group(0)
'Isaac Newton'
>>> m.group(1)
'Isaac'
>>> m.group(2)
'Newton'
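Because re.match returns None when the pattern does not match, calling group() directly on the result can fail. The following is a small sketch of one way to guard against that; split_name is a hypothetical helper introduced here for illustration, not part of the re module:

```python
import re

def split_name(text):
    """Return (first, last) taken from the start of text, or None if there is no match."""
    m = re.match(r"(\w+) (\w+)", text)
    if m is None:
        return None
    # group() accepts several group numbers and returns them as a tuple
    return m.group(1, 2)

first_result = split_name("Isaac Newton, physicist")
second_result = split_name("physicist")
print(first_result)   # ('Isaac', 'Newton')
print(second_result)  # None
```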

re.sub: matching and replacing

The re.match() function only matches a regular expression. re.sub() both matches and replaces: it substitutes the matches in a string and returns a new string. For example, to replace whatever (\w+) (\w+) matches in the string with Albert Einstein, we could write:

>>> re.sub(r"(\w+) (\w+)", "Albert Einstein", "Isaac Newton, physicist")
'Albert Einstein, physicist'

In re.sub, the first argument is the regular expression to match, the second is the replacement expression, and the third is the original string.
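As a brief sketch of how these arguments behave when the pattern matches more than once: re.sub replaces every non-overlapping match unless the optional count argument limits it, and the original string is left unchanged. The sample text below is made up for illustration:

```python
import re

text = "Isaac Newton, physicist; Marie Curie, chemist"

# By default every non-overlapping match is replaced.
all_replaced = re.sub(r"(\w+) (\w+)", "Albert Einstein", text)
print(all_replaced)  # Albert Einstein, physicist; Albert Einstein, chemist

# count=1 replaces only the first match.
first_only = re.sub(r"(\w+) (\w+)", "Albert Einstein", text, count=1)
print(first_only)  # Albert Einstein, physicist; Marie Curie, chemist

# Strings are immutable, so text itself is unchanged.
print(text)
```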

The replacement expression here is a manually specified new string, Albert Einstein, which has nothing to do with the contents of the original string. To reuse parts of the original string, you need the backreference feature of re.sub.

Backreferences

A backreference lets the replacement expression refer to text that was matched in the original string. For example, after (\w+) (\w+) matches Isaac Newton in the original string, we may want to rewrite the match as FirstName: Isaac, LastName: Newton.

To reference matched text, we need a notation for what was matched. re.sub uses the same groups as re.match, so the replacement expression only needs to reference those groups. Two forms of reference are described here:

  • \number: for example, \1 refers to the first group in the match, which is the Isaac part in this example.
  • \g<number>: for example, \g<1>. Like \number, it refers to the first group in the match, but \g<number> avoids an ambiguity. Suppose you want to replace the match with the first group, Isaac, followed by a literal 0, giving Isaac0. With the \number notation you would have to write \10, which is interpreted as a reference to group 10 rather than group 1 followed by 0. With \g<number>, you simply write \g<1>0.
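The ambiguity described above can be reproduced with a short sketch. It assumes the same two-group pattern used throughout; since that pattern has no group 10, re.sub raises re.error for \10:

```python
import re

pattern = r"(\w+) (\w+)"
text = "Isaac Newton, physicist"

# \1 alone refers to group 1.
plain = re.sub(pattern, r"\1", text)
print(plain)  # Isaac, physicist

# \g<1>0 means "group 1, then a literal 0".
with_zero = re.sub(pattern, r"\g<1>0", text)
print(with_zero)  # Isaac0, physicist

# \10 is read as group 10, which this pattern does not define.
raised = False
try:
    re.sub(pattern, r"\10", text)
except re.error as exc:
    raised = True
    print("error:", exc)
```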

Returning to the original example, the match Isaac Newton can be rewritten as FirstName: Isaac, LastName: Newton with the following expression:

>>> re.sub(r"(\w+) (\w+)", r"FirstName: \g<1>, LastName: \g<2>", "Isaac Newton, physicist")
'FirstName: Isaac, LastName: Newton, physicist'
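Backreferences can also appear in a different order from the groups they refer to. A minimal sketch that swaps the two names, using the \number form with a raw string:

```python
import re

# \2 is the second group (Newton) and \1 is the first (Isaac).
swapped = re.sub(r"(\w+) (\w+)", r"\2, \1", "Isaac Newton, physicist")
print(swapped)  # Newton, Isaac, physicist
```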
