Q: Match the well-formed heading tags (<h1> to <h6>) and their contents in the following HTML fragment:
<h1>I am a H1</h1>
<div>I am a div</div>
<h2>I am the H2</h2>
<span>I am a span</span>
<h3>I am the H3</h3>
<h3>I'm wrong H3</h4>
<h4>I am the H4</h4>

Overview

The difficulties in this question include:

  1. Every heading level from <h1> to <h6> must be matched.
  2. Tags that sit between two heading tags, such as <span>I am a span</span>, must not be matched.
  3. Malformed heading tags, such as <h3>I'm wrong H3</h4>, must not be matched.

The second point is the easiest to handle: lazy matching between a pair of tags stops the match at the first closing tag instead of running on past it. The first and third points, however, require regular-expression backreferences.
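To make the difficulties concrete, here is a minimal Python sketch using the standard `re` module. The naive pattern and the single-line sample string `one_line` are illustrative assumptions, not part of the original question. It shows that matching the opening and closing levels independently accepts the malformed `<h3>...</h4>` line, and how greedy and lazy quantifiers differ.

```python
import re

# The fragment from the question, one tag per line.
html = """<h1>I am a H1</h1>
<div>I am a div</div>
<h2>I am the H2</h2>
<span>I am a span</span>
<h3>I am the H3</h3>
<h3>I'm wrong H3</h4>
<h4>I am the H4</h4>"""

# Naive attempt: the opening and closing levels are matched independently,
# so nothing forces them to agree.
naive = re.compile(r"<[hH][1-6]>.*?</[hH][1-6]>")
naive_matches = naive.findall(html)
print(naive_matches)  # the malformed <h3>...</h4> line is among the matches

# Greedy vs. lazy on a single line (hypothetical sample string).
one_line = "<h1>a</h1><span>s</span><h1>b</h1>"
greedy = re.findall(r"<h1>.*</h1>", one_line)  # one match that swallows the <span>
lazy = re.findall(r"<h1>.*?</h1>", one_line)   # two separate <h1> matches
print(greedy)
print(lazy)
```

The naive pattern already skips the `<div>` and `<span>` lines, but it cannot reject the mismatched pair, which is the gap a backreference closes.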

Backreferences in regular expressions

A backreference lets the later part of a pattern refer to a subexpression defined earlier in the same pattern. It can only refer to a subexpression, that is, a part of the pattern enclosed in the metacharacters ( and ). A backreference matches the same text that the subexpression actually matched, so, put simply, you can think of it as a reference to a variable.

Usage

\1 refers to the first subexpression, \2 refers to the second subexpression, and so on (backreference numbering usually starts at 1).
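As a small illustration of the numbering, the following Python sketch uses `\1` to find a repeated word and `\1` together with `\2` to recognise a four-digit palindrome. Both patterns and the sample strings are made up for this example.

```python
import re

# \1 must repeat exactly the text captured by the first group,
# so this finds a word that is immediately repeated.
doubled = re.compile(r"\b(\w+) \1\b")
m = doubled.search("this is is a test")
repeated = m.group(0)  # the repeated pair "is is"
print(repeated)

# Two groups: \2 then \1 mirror the first two digits,
# so only strings shaped like "abba" match.
palindrome = re.compile(r"(\d)(\d)\2\1")
ok = palindrome.fullmatch("1221") is not None
bad = palindrome.fullmatch("1231") is not None
print(ok, bad)
```

Note that a backreference repeats the captured text, not the group's pattern: `(\d)\1` matches "11" but not "12".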

Caveats

Because subexpressions are referenced by their position in the expression, backreferences are fragile: once the relative positions of the subexpressions change, a backreference can silently point at the wrong subexpression and invalidate the regex.
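The sketch below illustrates this fragility in Python and one possible way around it. The `shifted` pattern is a hypothetical edit that adds a group in front of the digit. Named groups, written `(?P<name>...)` and `(?P=name)` in Python's `re` module, are an alternative technique the original text does not cover, and other engines spell them differently.

```python
import re

sample = "<h1>x</h1>"

# Numbered backreference: \1 is the level digit.
original = re.compile(r"<[hH]([1-6])>.*?</[hH]\1>")

# Hypothetical edit: a new group is added before the digit.
# \1 now refers to the letter "h", so the pattern no longer matches.
shifted = re.compile(r"<([hH])([1-6])>.*?</[hH]\1>")

# Named groups keep working no matter how many groups are added around them.
named = re.compile(r"<(?P<tag>[hH])(?P<level>[1-6])>.*?</[hH](?P=level)>")

original_ok = original.search(sample) is not None
shifted_ok = shifted.search(sample) is not None
named_ok = named.search(sample) is not None
print(original_ok, shifted_ok, named_ok)
```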

The answer

OK, now you can answer the question at the beginning of the article:

<[hH]([1-6])>.*?</[hH]\1>

Here \1 is a backreference to the first subexpression, ([1-6]), so the closing tag must carry the same level number as the opening tag. Between the opening and closing tags the pattern uses lazy matching rather than greedy matching.
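To check the answer against the fragment from the question, here is a minimal Python sketch. It assumes one tag per line, as in the fragment, and adds a second capturing group around the contents purely for display. That extra group is not part of the answer above.

```python
import re

html = """<h1>I am a H1</h1>
<div>I am a div</div>
<h2>I am the H2</h2>
<span>I am a span</span>
<h3>I am the H3</h3>
<h3>I'm wrong H3</h4>
<h4>I am the H4</h4>"""

# Group 1 is the heading level, reused by \1 in the closing tag.
# Group 2 (added for this sketch) captures the contents lazily.
heading = re.compile(r"<[hH]([1-6])>(.*?)</[hH]\1>")

# With two groups, findall returns (level, contents) tuples.
results = heading.findall(html)
for level, contents in results:
    print(f"h{level}: {contents}")
```

The `<div>`, the `<span>`, and the malformed `<h3>...</h4>` line are all skipped. Because `[hH]` is matched independently in the opening and closing tags, mixed-case pairs such as `<h1>...</H1>` are also accepted.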