While debugging regexp.exec(), I ran into its lastIndex property and found that a regular expression with the g flag is stateful.

Parsing attributes in a string

const strTag = `
      <div id="outer" data-a ttt = "asd"></div>
`;

Suppose you have a string like this and want to match its three attributes. You might write a regular expression like this:

const regAttr = /\s[a-z0-9-_]+\b(\s*=\s*('|")[\s\S]*?\2)?/gi;

const arrAttr = strTag.match(regAttr);
// [' id="outer"', ' data-a', ' ttt = "asd"']
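The g flag causes no trouble at this step. With a global regular expression, String.prototype.match sets lastIndex to 0 before it starts and then collects every match, so leftover state is ignored. Below is a minimal sketch that checks this. The <div> markup is an assumed sample that produces the three attributes listed above. The pre-set lastIndex only simulates state left behind by an earlier call.

```javascript
// Assumed sample markup: any tag carrying these three attributes would do.
const strTag = `
      <div id="outer" data-a ttt = "asd"></div>
`;

// A whitespace, an attribute name, then an optional `= "value"` (or `= 'value'`)
// whose closing quote must be the same character as the opening one (\2).
const regAttr = /\s[a-z0-9-_]+\b(\s*=\s*('|")[\s\S]*?\2)?/gi;

// Simulate state left behind by some earlier exec()/test() call.
regAttr.lastIndex = 5;

// match() with a g regex resets lastIndex to 0 first and returns every match.
const arrAttr = strTag.match(regAttr);

console.log(arrAttr);           // [ ' id="outer"', ' data-a', ' ttt = "asd"' ]
console.log(regAttr.lastIndex); // 0
```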

To split each attribute in that array into a name and a value, you might write a second regular expression:

const regSplitAttr = /(\s[a-z0-9-_]+\b\s*)(?:=(\s*('|")[\s\S]*?\3))?/ig;

for(const attr of arrAttr) {
  const attrBuffer = regSplitAttr.exec(attr);
  console.log(attrBuffer);
}
// [' id="outer"', ' id', '"outer"', '"', index: 0, input: ' id="outer"', groups: undefined]
// null
// [' ttt = "asd"', ' ttt ', ' "asd"', '"', index: 0, input: ' ttt = "asd"', groups: undefined]
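Each result array above is laid out the same way:

- Index 0 is the whole match.
- Group 1 is the name, plus any trailing spaces.
- Group 2 is the value, plus any leading spaces.
- Group 3 is the quote character that \3 refers back to.

The sketch below pulls a name and value out of one exec() result. Trimming the groups and stripping the quotes with slice(1, -1) is an illustration added here, not part of the original code.

```javascript
// Same pattern as above; a freshly created RegExp starts with lastIndex === 0.
const regSplitAttr = /(\s[a-z0-9-_]+\b\s*)(?:=(\s*('|")[\s\S]*?\3))?/gi;

const result = regSplitAttr.exec(' ttt = "asd"');

// result[1] is ' ttt ', result[2] is ' "asd"', result[3] is '"'.
const name = result[1].trim();               // drop the surrounding whitespace
const value = result[2].trim().slice(1, -1); // drop whitespace, then the quotes

console.log(name, value, result.index); // ttt asd 0
```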

Looking at the output, one question arises: why is the second result null?

Background

MDN lastIndex

lastIndex is a readable and writable integer property of a regular expression. It gives the index at which the next match starts. It is only used when the regular expression has the global flag g (or the sticky flag y), and it is updated according to the following rules:

  • If lastIndex is greater than the length of the string, regexp.test() and regexp.exec() fail and lastIndex is set to 0.
  • If lastIndex is equal to the length of the string and the regular expression can match an empty string, the regular expression matches the input starting at lastIndex.
  • If lastIndex is equal to the length of the string and the regular expression cannot match an empty string, the match fails and lastIndex is reset to 0.
  • Otherwise, the search starts at lastIndex. On success, lastIndex is set to the position just after the match; on failure, it is reset to 0.
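Each rule can be checked with a few throwaway regular expressions; the patterns and strings below are made up for illustration. The last example shows the same rules making test() alternate between true and false when it is called repeatedly on one string.

```javascript
// Rule 1: lastIndex > length -> the match fails and lastIndex is reset to 0.
const re1 = /a/g;
re1.lastIndex = 10;
console.log(re1.test('abc'), re1.lastIndex); // false 0

// Rule 2: lastIndex === length and the regex can match "" -> it matches "" there.
const re2 = /x*/g;
re2.lastIndex = 3;
const m2 = re2.exec('abc');
console.log(m2[0] === '', m2.index); // true 3

// Rule 3: lastIndex === length and the regex cannot match "" -> fail, reset to 0.
const re3 = /c/g;
re3.lastIndex = 3;
console.log(re3.exec('abc'), re3.lastIndex); // null 0

// Rule 4: otherwise the search starts at lastIndex, which then moves past the match.
const re4 = /b/g;
const m4 = re4.exec('abcb');
console.log(m4.index, re4.lastIndex); // 1 2

// Same rules, repeated test() on one string: match, fail (and reset), match.
const re5 = /a/g;
const runs = [re5.test('a'), re5.test('a'), re5.test('a')];
console.log(runs); // [ true, false, true ]
```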

If you reuse the same regular expression with the g flag, each call starts from the state left by the previous one. Logging lastIndex before each exec() call shows this:

// Starting from a freshly created regSplitAttr (lastIndex === 0)
for(const attr of arrAttr) {
  console.log(regSplitAttr.lastIndex);
  const attrBuffer = regSplitAttr.exec(attr);
}
// 0
// 11
// 0

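So the second exec() call starts searching ' data-a' at index 11, the lastIndex left by the first match. That string is only 7 characters long, so the match fails, exec() returns null, and lastIndex is reset to 0. That reset is why the third attribute is parsed correctly again.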
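Logging lastIndex after each call, not just before it, makes the reset visible. This sketch rebuilds arrAttr from the match() output shown earlier and uses a fresh regSplitAttr. The trace array is only a printing helper added for this illustration.

```javascript
// The strings returned by strTag.match(regAttr) earlier.
const arrAttr = [' id="outer"', ' data-a', ' ttt = "asd"'];
const regSplitAttr = /(\s[a-z0-9-_]+\b\s*)(?:=(\s*('|")[\s\S]*?\3))?/gi;

const trace = [];
for (const attr of arrAttr) {
  const before = regSplitAttr.lastIndex; // where this exec() starts searching
  const matched = regSplitAttr.exec(attr) !== null;
  trace.push({ length: attr.length, before, matched, after: regSplitAttr.lastIndex });
}

console.table(trace);
// Expected rows (length, before, matched, after):
//   11, 0,  true,  11
//    7, 11, false, 0   (11 > 7, so exec() fails and resets lastIndex)
//   12, 0,  true,  12
```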

Solution

Adding g to a regular expression is a common habit, but it makes the expression stateful, which can cause surprises like this one. In this case, any of the following fixes works:

  • Remove the g flag: /(\s[a-z0-9-_]+\b\s*)(?:=(\s*('|")[\s\S]*?\3))?/i
  • Manually reset lastIndex to 0 before each exec() call: regSplitAttr.lastIndex = 0;
  • Create a new regular expression on each iteration of the loop
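Here is a minimal sketch comparing the three fixes on the same input. The pattern text is shared through RegExp.prototype.source. The toPair helper (a hypothetical name) turns an exec() result into a [name, value] pair, using null for an attribute without a value.

```javascript
const arrAttr = [' id="outer"', ' data-a', ' ttt = "asd"'];

// Shared pattern text, so the fixes below differ only in flags and usage.
const source = /(\s[a-z0-9-_]+\b\s*)(?:=(\s*('|")[\s\S]*?\3))?/.source;

// Hypothetical helper: [name, value] with whitespace trimmed, value kept in quotes.
const toPair = (m) => [m[1].trim(), m[2] === undefined ? null : m[2].trim()];

// Fix 1: no g flag, so exec() ignores lastIndex and always starts at index 0.
const reNoG = new RegExp(source, 'i');
const fix1 = arrAttr.map((attr) => toPair(reNoG.exec(attr)));

// Fix 2: keep the g flag but reset lastIndex before every exec().
const reG = new RegExp(source, 'gi');
const fix2 = arrAttr.map((attr) => {
  reG.lastIndex = 0;
  return toPair(reG.exec(attr));
});

// Fix 3: build a new RegExp (lastIndex 0) for every attribute.
const fix3 = arrAttr.map((attr) => toPair(new RegExp(source, 'gi').exec(attr)));

// fix1, fix2 and fix3 all equal:
// [['id', '"outer"'], ['data-a', null], ['ttt', '"asd"']]
console.log(fix1, fix2, fix3);
```

Removing g is the simplest choice when each string is matched only once. Keeping g plus a manual reset suits code that still needs the flag elsewhere.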

Conclusion

Regular expressions are very powerful, and because we work with strings all the time, there are plenty of opportunities to practice them. One example is VSCode's regular-expression search.

Blog posts

www.chenng.cn/archives/