Regular expressions: matching content that is not followed by a specified string

Previously, all I knew was that /<abc(?!def).+>/.exec("<abcdef>\n<abczzz>") matches <abczzz>.

But /<.+(?!def).+>/.exec("<abcdef>\n<abczzz>")... huh? Why does it match the first tag? From getting started to giving up.
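The two calls above can be checked directly. This is a minimal sketch, assuming the test input is the two-line string "<abcdef>\n<abczzz>"; the variable names are illustrative only.

```javascript
// Input with two tags on separate lines.
const input = "<abcdef>\n<abczzz>";

// Lookahead placed directly after the fixed prefix "abc":
// the first tag is rejected because "def" follows "abc".
const afterPrefix = /<abc(?!def).+>/.exec(input);

// Lookahead placed after a greedy .+ : the first tag is matched anyway.
const afterGreedy = /<.+(?!def).+>/.exec(input);

console.log(afterPrefix[0]); // "<abczzz>"
console.log(afterGreedy[0]); // "<abcdef>"
```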

Yesterday (2019-04-07) I wrote /<(?:(?!def))+>/.exec("<abcdef>\n<abczzz>") and could not understand what the (?!excluded string) structure was doing. No wonder: (.(?!excluded string))+ is the correct form.

One question remained open: a lookahead after each individual character works, but a lookahead after a run of characters of undetermined length does not. Does a lookahead not work together with a preceding +, *, or {}? See the explanation at the end.

Appendix: /<(?!.def).+>/.exec("<abcdef>\n<abczzz>"); /<.+(?!def)z>/.exec("<abcdefzzz>\n<abczzz>").

A regular expression may need to require, or to forbid, a specified string after some given content:

Require it: relatively simple, just write the string you want.

Forbid it: much more complex. How do you exclude it?

Premises and principles

Fixed string content inside the expression should be left out wherever possible and simplified wherever it can be (for example, the abc part written earlier must not be hard-coded).
Use a small amount of lookahead. Lookbehind is basically not worth learning: you would have to research which browsers support it, you would not dare to use it anyway, and there is too much to learn as it is.
NoJS (Not Only JavaScript): this is not just about JS in the browser. That said, JS regex literals are so simple that they need no functions, objects, or strings. No rebuttals accepted.

Assume the text to be matched

htmlRaw=`
<div ***="***
    ***" class="***" ***="***">
    class=" matchX 1"
    <div ***="******" class="*** matchX ***" ***="*** excludeX ***">
        class=" matchX 2"
    </div>
    class=" matchX 3"
</div>

<div ***="***
    ***" class="***" ***="***">
    class=" matchX 4"
    <div ***="******" class="******" ***="***">
        class=" matchX 5"
    </div>
    class=" matchX 6"
</div>
`;
// Note: *** is not fixed content, but it never contains unescaped XML entities, matchX, or excludeX
// Note: htmlNoWrap = htmlRaw.replace(/[\r\n]+/g, " ");
// Regex test code
(/[\s\S]*/.exec(htmlRaw) || [])[0]
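The test line above guards against exec returning null when nothing matches. Here is a small sketch of the same idea wrapped in a function; firstMatch is a hypothetical helper name that does not appear in the original.

```javascript
// exec returns null when there is no match, so indexing [0] directly would throw.
// firstMatch is a hypothetical helper name used only for illustration.
function firstMatch(re, text) {
  return (re.exec(text) || [])[0];
}

// [\s\S] also matches line breaks, so this returns the whole string.
console.log(firstMatch(/[\s\S]*/, "a\nb") === "a\nb"); // true

// No match: undefined instead of a TypeError.
console.log(firstMatch(/x+/, "abc")); // undefined
```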

Excluding a single character

Match the div.matchX tag: <div ***="******" class="*** matchX ***" ***="*** excludeX ***">

You can use [^>] directly to confine the matchX match to a single <> HTML tag, because the text inside <> never contains a closing > character.

A single character is fairly simple:

// works
/<div[^>]+matchX[^>]+>/.exec(htmlRaw)[0]
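To try the bounded pattern without the full sample, here is a minimal sketch. The sample string is a simplified stand-in for htmlRaw (an assumption), with placeholder attribute names.

```javascript
// Simplified stand-in for htmlRaw (an assumption; attribute names are placeholders).
const sample = `
<div id="a" class="x">
    class=" matchX 1"
    <div id="b" class="y matchX z" data-k="v">
    </div>
</div>`;

// [^>] cannot cross a ">", so the whole match stays inside one tag.
const tag = /<div[^>]+matchX[^>]+>/.exec(sample);

console.log(tag[0]); // <div id="b" class="y matchX z" data-k="v">
```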

If the match is not confined to the inside of a <> tag, it may run past the tag boundary, and with that kind of uncertainty something will go wrong sooner or later:

// does not work
/<div.+?matchX.+?>/.exec(htmlNoWrap)[0]
/<div[\s\S]+?matchX[\s\S]+?>/.exec(htmlRaw)[0]
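The overshoot can be seen on a small input. This is a sketch under the assumption that the sample below is a fair simplification of htmlRaw; the variable names are illustrative.

```javascript
// Simplified stand-in for htmlRaw (an assumption).
const sample = `<div id="a" class="x">
    class=" matchX 1"
    <div id="b" class="y matchX z">
    </div>
</div>`;

// Lazy [\s\S]+? may cross ">", so the match starts at the outer div
// and runs through the text node that merely mentions matchX.
const loose = /<div[\s\S]+?matchX[\s\S]+?>/.exec(sample)[0];

// Bounded version for comparison: stays inside the inner tag.
const bounded = /<div[^>]+matchX[^>]+>/.exec(sample)[0];

console.log(loose.startsWith('<div id="a"')); // true: it started at the wrong tag
console.log(bounded);                          // <div id="b" class="y matchX z">
```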

Other single-character scenarios can be handled flexibly in the same way.

Excluding a string

Match the first top-level div whose inner div does not contain the text excludeX; the result should be the div that contains matchX 4.

The [^] syntax can only exclude single characters, not a string: [^abc] excludes each of a, b, and c, not the string abc. There seems to be no simple solution other than lookahead.
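The difference between excluding characters and excluding a string can be sketched as follows. The test strings are made up for illustration.

```javascript
// [^def] rejects any single character d, e, or f;
// it knows nothing about the string "def".
console.log(/^[^def]+$/.test("abczzz")); // true: none of d, e, f present
console.log(/^[^def]+$/.test("abcfed")); // false, although "def" never appears

// A lookahead after each character rejects only the full string "def".
console.log(/^(?:.(?!def))+$/.test("abcfed")); // true
console.log(/^(?:.(?!def))+$/.test("abcdef")); // false
```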

The (.(?!excluded string))+ structure does the job. The core is the (?:[^>](?!excludeX))* part:

// works
/<div[^>]*>[^<]+<div(?:[^>](?!excludeX))*>[^<]+<\/div>[^<]+<\/div>/.exec(htmlRaw)[0]
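A runnable sketch of this pattern follows. The sample is a simplified stand-in for htmlRaw (an assumption): two top-level divs, of which only the first has excludeX inside its inner tag.

```javascript
// Simplified stand-in for htmlRaw (an assumption; attribute names are placeholders).
const sample = `
<div id="a">
    class=" matchX 1"
    <div class="y matchX z" data-k="v excludeX w">
        class=" matchX 2"
    </div>
    class=" matchX 3"
</div>

<div id="b">
    class=" matchX 4"
    <div class="y" data-k="v">
        class=" matchX 5"
    </div>
    class=" matchX 6"
</div>
`;

// Inside the inner tag, each character must not be followed by "excludeX".
const re = /<div[^>]*>[^<]+<div(?:[^>](?!excludeX))*>[^<]+<\/div>[^<]+<\/div>/;
const hit = re.exec(sample)[0];

console.log(hit.startsWith('<div id="b">')); // true: the first block was skipped
console.log(hit.includes("matchX 4"));       // true
console.log(hit.includes("excludeX"));       // false
```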

To require that a string be present, just write the string you want. That case is too simple to need an example.

I have not studied the performance of the (.(?!excluded string))+ structure.

Why `.+(?!excluded string)` does not work

Because (?!excluded string) is not applied to every character that the greedy quantifier consumes, only at the position after the last character it consumed. In other words, the lookahead does not stop + from greedily consuming all the characters before that last position.

/<.+(?!def).+>/.exec("<abcdef>\n<abczzz>")
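The two placements of the lookahead can be compared side by side. This is a minimal sketch, assuming the input "<abcdef>\n<abczzz>"; the last two lines use a made-up input, "<defzzz>", to show an edge case.

```javascript
const input = "<abcdef>\n<abczzz>";

// One lookahead after a greedy .+ : backtracking simply finds a split point
// where the next characters are not "def", so the first tag still matches.
console.log(/<.+(?!def).+>/.exec(input)[0]); // "<abcdef>"

// Lookahead after every character: the loop cannot step past the "c" in
// "<abcdef>" because "def" follows it, so only the second tag matches.
console.log(/<(?:.(?!def))+>/.exec(input)[0]); // "<abczzz>"

// Edge case: with the lookahead after the dot, the position right after "<"
// is never checked. Putting the lookahead before the dot covers it.
console.log(/<(?:.(?!def))+>/.test("<defzzz>")); // true
console.log(/<(?:(?!def).)+>/.test("<defzzz>")); // false
```

Which placement fits depends on whether the excluded string can appear at the very start of the repeated part.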

From: www.v2ex.com/t/552813#re…

Additional notes

/(\d+)(?!\.1)/.exec("123.141")
> 123: \d+ matches greedily up to the "."
> 12: 123 is followed by .1, which fails (?!\.1), so back off one character
> no expression left, return 12

/(.+)(?!\.1)/.exec("123.141")
> 123.141: .+ matches greedily to the end
> 123.141: satisfies (?!\.1)
> no expression left, return 123.141

/(.(?!\.1))+/.exec("123.141")
> 1: . matches a new character
> 1: satisfies (?!\.1)
> 12: . matches a new character
> 12: satisfies (?!\.1)
> 123: . matches a new character
> 12: 123 is followed by .1, which fails (?!\.1), so back off one character and exit the loop
> no expression left, return 12

Seen as a lookahead after every character versus a lookahead only after the last character, the behavior is easier to understand.
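The three traces above can be reproduced with a few lines. This is a minimal sketch; the variable name s is illustrative.

```javascript
const s = "123.141";

// \d+ takes "123", the lookahead fails before ".1", so it backs off to "12".
console.log(/(\d+)(?!\.1)/.exec(s)[0]); // "12"

// .+ takes the whole string; at the end nothing follows, so the lookahead passes.
console.log(/(.+)(?!\.1)/.exec(s)[0]); // "123.141"

// The lookahead runs after each character; the loop stops before "3".
console.log(/(.(?!\.1))+/.exec(s)[0]); // "12"
```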
