While reading the jQuery source code, I came across an interesting regular expression:

Its purpose is simple: it matches the name of an HTML tag. My first reaction was: isn't the pipe symbol (|) inside that non-capturing group redundant?

But on closer inspection, it’s not so simple.

First, let's review the basics of grouping:

  • Capturing group: (pattern) matches the pattern and captures the matched text.
  • Non-capturing group: (?:pattern) matches the pattern but does not capture the result. The matched text is not stored for later use.
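Here is a minimal Python sketch of the difference, using the standard re module. The input "12-34" is just a made-up sample. Only the capturing group shows up in groups().

```python
import re

# Capturing group: both (\d+) groups store what they matched.
captured = re.match(r"(\d+)-(\d+)", "12-34")
# Non-capturing group: (?:\d+) still has to match, but nothing is stored.
uncaptured = re.match(r"(\d+)-(?:\d+)", "12-34")

print(captured.groups())    # ('12', '34')
print(uncaptured.groups())  # ('12',)
print(uncaptured.group(0))  # '12-34': the overall match is unaffected
```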

To make only part of a pattern case-insensitive, use a modifier group: (?i:pattern). These modifiers, also called inline flags, come in six varieties. The (?-i) syntax turns a flag off again.
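The following is a sketch in Python 3.6+, which as far as I know is where re first accepted scoped flags of this form. It shows a flag applied to part of a pattern and then switched off again. The sample strings are arbitrary.

```python
import re

# (?i:...) makes only the enclosed part case-insensitive.
scoped = re.compile(r"(?i:hello) world")
print(bool(scoped.fullmatch("HeLLo world")))  # True
print(bool(scoped.fullmatch("hello WORLD")))  # False: "world" is still case-sensitive

# (?-i:...) turns case-insensitivity off again inside a (?i) pattern.
mixed = re.compile(r"(?i)hello (?-i:world)")
print(bool(mixed.fullmatch("HELLO world")))  # True
print(bool(mixed.fullmatch("HELLO World")))  # False
```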

The difference between a capturing group and a non-capturing group is that a capturing group stores its match in memory. You can retrieve the contents of group num with \num. This is called a backreference.

For example, in the figure above, \1 refers to the text matched by the group (\w+).
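The figure is not reproduced here, so this Python sketch uses an assumed stand-in: (\w+) followed by \1 to find a word that is immediately repeated. The point is that \1 has to match exactly the same text the group captured.

```python
import re

# (\w+) captures a word; \1 must then match exactly the same text again.
doubled = re.compile(r"\b(\w+)\s+\1\b")

m = doubled.search("this is is a test")
print(m.group(0))  # 'is is'
print(m.group(1))  # 'is'

print(doubled.search("this is a test"))  # None: no word is repeated
```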

If numbering groups feels too cumbersome, that's fine: modern regex flavors also support named capture, which gives a group a name. JavaScript originally lacked this feature (Python has long had it), although ES2018 later added named groups to JavaScript.

The usual syntax is (?<name>regex). The Python syntax is (?P<name>regex).
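Here is a small Python sketch of named capture. The date pattern and input are made-up examples. Named groups also keep their left-to-right numbers, so group("year") and group(1) refer to the same text. In JavaScript (ES2018+), the equivalent pattern would be written with (?<year>...) instead of (?P<year>...).

```python
import re

# (?P<name>...) captures like (...) but also gives the group a name.
date = re.compile(r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})")

m = date.fullmatch("2019-07-15")
print(m.group("year"))  # '2019'
print(m.groupdict())    # {'year': '2019', 'month': '07', 'day': '15'}
# Named groups are still numbered from left to right as well.
print(m.group(1))       # '2019'
```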

Here’s another thing to note:

In Python, a named backreference is written (?P=group_name). Although it is wrapped in parentheses, it is not a group, and nothing else may appear between the name and the closing parenthesis.
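As a minimal Python sketch, with arbitrary sample strings, the pattern below matches a quoted span that closes with the same kind of quote it opened with. It is written once with a named backreference and once with \1. The compiled pattern reports only one capturing group, which confirms that (?P=quote) is not itself a group.

```python
import re

# (?P<quote>...) captures the opening quote; (?P=quote) must match the same character.
named = re.compile(r"(?P<quote>['\"]).*?(?P=quote)")
# The same idea with a numbered backreference.
numbered = re.compile(r"(['\"]).*?\1")

for text in ['say "hi" now', "a 'b' c", "\"oops'"]:
    m1, m2 = named.search(text), numbered.search(text)
    # Both forms find the same span (or both find nothing for mismatched quotes).
    print(m1 and m1.group(0), m2 and m2.group(0))

# (?P=quote) is a backreference, not a group: only one capturing group exists.
print(named.groups)  # 1
```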

Just as named capture groups behave like numbered ones, named backreferences behave like numbered backreferences, though there are slight differences to keep in mind when you use them.

Finally, let's look at using the pipe inside non-capturing groups.

If we want to match HelloWorld and HelloChina, the straightforward pattern is HelloWorld|HelloChina. That is not very concise, because both strings share the substring Hello. If the shared part were long, the regular expression would become needlessly long.

Putting the pipe inside a non-capturing group solves this problem. Once we factor out the common prefix, HelloWorld|HelloChina can be shortened to Hello(?:World|China).
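A quick Python check on a few arbitrary sample strings shows that both patterns accept the same inputs and that the grouped version captures nothing. It also shows why the group is needed. Alternation has the lowest precedence, so HelloWorld|China without parentheses means "HelloWorld or China".

```python
import re

long_form = re.compile(r"HelloWorld|HelloChina")
# The shared prefix "Hello" is written once; the group does not capture.
short_form = re.compile(r"Hello(?:World|China)")

for text in ["HelloWorld", "HelloChina", "HelloMars", "Hello"]:
    # Both patterns accept and reject exactly the same strings.
    assert bool(long_form.fullmatch(text)) == bool(short_form.fullmatch(text))

print(short_form.fullmatch("HelloChina").groups())  # (): nothing was captured

# Without the group, | splits the whole pattern: "HelloWorld" or "China".
no_group = re.fullmatch(r"HelloWorld|China", "China")
print(no_group.group(0))  # 'China'
```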

With these basics in place, let's return to the original question: what does a pipe (|) inside a non-capturing group, with nothing after it, actually do?

It is a trick that lets the contents of the non-capturing group match zero or one times, because the empty last alternative matches the empty string. For example, Hello(?:World|China|) matches HelloWorld and HelloChina, and it can also match just Hello.

This is equivalent to applying the quantifier ? (or {0,1}) to the group: Hello(?:World|China|), Hello(?:World|China)? and Hello(?:World|China){0,1} all have the same effect.
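This Python sketch runs the three forms against the same made-up sample strings and confirms that they accept exactly the same set. With search() instead of fullmatch(), the empty alternative lets the pattern match the bare prefix Hello inside HelloMars.

```python
import re

patterns = [
    r"Hello(?:World|China|)",      # empty alternative after the last pipe
    r"Hello(?:World|China)?",      # ? quantifier
    r"Hello(?:World|China){0,1}",  # explicit {0,1} quantifier
]
samples = ["HelloWorld", "HelloChina", "Hello", "HelloMars", "World"]

# For each pattern, collect the samples it matches in full.
results = {p: [s for s in samples if re.fullmatch(p, s)] for p in patterns}
for p, matched in results.items():
    print(p, matched)  # ['HelloWorld', 'HelloChina', 'Hello'] for all three

# The empty alternative matches the empty string, so search() finds "Hello".
print(re.search(patterns[0], "HelloMars").group(0))  # 'Hello'
```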

In practice, the first form is recommended, because ? already has many different uses in regular expressions:

  • Placed after a quantifier, it makes the match non-greedy (also called lazy), as in +?, *?, {n}?, {n,}?, {n,m}?
  • It is itself a quantifier that matches the preceding expression 0 or 1 times, equivalent to {0,1}. For example, do(es)? matches both do and does
  • It is part of fixed group syntax: for example, (?:pattern) denotes a non-capturing group
  • Named groups: (?'name'pattern), (?<name>pattern), (?P<name>pattern)
  • It is also part of fixed assertion syntax:
    • Zero-width positive lookahead: (?=pattern)
    • Zero-width negative lookahead: (?!pattern)
    • Zero-width positive lookbehind: (?<=pattern)
    • Zero-width negative lookbehind: (?<!pattern)
  • Comments: (?#comment); the content is ignored when the regex is matched
  • Other, less common uses:
    • Atomic grouping: (?>pattern)
    • Inline flags: the syntax is (?imsxXU). For example, (?i)a matches both a and A
    • Recursion over the whole expression: (?R)
    • Recursion into the first subexpression: (?1)
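Several of the uses listed above can be tried directly in Python's standard re module. This sketch covers lazy quantifiers, ? as a quantifier, the four zero-width assertions, and (?#...) comments, all with arbitrary sample strings. Recursion ((?R), (?1)) is not supported by re, so it is left out. Atomic groups are omitted too, since as far as I know re only accepts them from Python 3.11.

```python
import re

# Greedy vs. lazy: .+ grabs as much as possible, .+? as little as possible.
html = "<b>bold</b>"
greedy = re.search(r"<.+>", html).group(0)  # '<b>bold</b>'
lazy = re.search(r"<.+?>", html).group(0)   # '<b>'

# ? as a quantifier: (es)? is optional.
do_forms = [w for w in ["do", "does", "doe"] if re.fullmatch(r"do(es)?", w)]  # ['do', 'does']

# Zero-width assertions test the surrounding text without consuming it.
words = ["foobar", "foobaz", "barfoo"]
lookahead = [w for w in words if re.match(r"foo(?=bar)", w)]         # ['foobar']
neg_lookahead = [w for w in words if re.match(r"foo(?!bar)", w)]     # ['foobaz']
lookbehind = [w for w in words if re.search(r"(?<=bar)foo", w)]      # ['barfoo']
neg_lookbehind = [w for w in words if re.search(r"(?<!bar)foo", w)]  # ['foobar', 'foobaz']
consumed = re.match(r"foo(?=bar)", "foobar").group(0)  # 'foo': "bar" is only checked

# (?#...) is a comment and is ignored by the matcher.
phone = re.fullmatch(r"\d{3}(?#prefix)-\d{4}", "555-1234")

print(greedy, lazy, do_forms, consumed, bool(phone))
print(lookahead, neg_lookahead, lookbehind, neg_lookbehind)
```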

In conclusion.