Written on May 7, 2015; this may be out of date, so please read it with care.


Today, I’m going to share a small detail about regular expressions that is easy to overlook.

We all know that a regular expression can use . to represent any single character. But in the source code of underscore and jQuery, we can see that these well-known libraries often do not use . to represent an arbitrary character; they use [\w\W] or [\s\S] instead. At first glance the two seem to mean the same thing, so why skip the simple form and take the long way around? Let’s talk a little about that today.

First of all, we must correctly understand what . means. Saying that it represents any single character is misleading: this “any single character” does not include the line terminators, i.e., \n, \r, \u2028, and \u2029. [\w\W] and [\s\S] do match these characters. So the difference between the two approaches is clear: it is simply whether those four line terminators are matched.
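To make the difference concrete, here is a small sketch that tests each of the four line terminators against the three patterns. The variable names (terminators, results) are my own and exist only for illustration.

```javascript
// The four line terminators that . does not match in JavaScript.
const terminators = ['\n', '\r', '\u2028', '\u2029'];

// For each terminator, record whether each pattern matches it as a single character.
const results = terminators.map((ch) => ({
  codePoint: 'U+' + ch.charCodeAt(0).toString(16).toUpperCase().padStart(4, '0'),
  dot: /^.$/.test(ch),             // false for all four
  wordClass: /^[\w\W]$/.test(ch),  // true for all four
  spaceClass: /^[\s\S]$/.test(ch), // true for all four
}));

console.log(results);
```

Every row should show dot as false and both character classes as true, while an ordinary character such as 'a' is matched by all three.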

So when do we need to consider these line terminators? Whenever the string being processed may contain a newline. There are plenty of such scenarios: handling HTML strings, processing templates, reading text files in Node.js, and so on.

When it comes to multi-line text, it’s easy to think of the m flag (multiline mode) for regular expressions. Does it have any effect on what we’re talking about today? As far as I can tell, no. Why not? Some people swear that . means something different in single-line mode than in multiline mode: that in single-line mode it is equivalent to [\w\W] or [\s\S], while in multiline mode it excludes the line terminators. But according to my tests, and according to MDN, this is not true. Regular expressions in many other languages do have a feature like this, but I haven’t seen it in JavaScript, and I’m not sure whether browsers differ. So what does multiline mode do in JavaScript? I think it only changes the meaning of the ^ and $ anchors: in single-line mode they match the beginning and end of the entire string, respectively; in multiline mode they match the start and end of each line. In both modes, I think . does not match line terminators and is equivalent to [^\n\r\u2028\u2029].
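A quick way to check this claim is to run the same patterns with and without the m flag. This is only a sketch on a two-line sample string of my own choosing; the variable names are illustrative.

```javascript
const text = 'first\nsecond';

// The m flag changes what ^ and $ anchor to...
const anchorsWithoutM = /^second$/.test(text); // false: ^ only matches at index 0
const anchorsWithM = /^second$/m.test(text);   // true: ^ also matches after the \n

// ...but it does not change what . matches.
const dotWithoutM = /first.second/.test(text); // false: . will not match \n
const dotWithM = /first.second/m.test(text);   // false: still will not match \n

// A character class that covers everything does cross the newline.
const classMatch = /first[\s\S]second/.test(text); // true

console.log({ anchorsWithoutM, anchorsWithM, dotWithoutM, dotWithM, classMatch });
```

Only the anchor results differ between the two modes; the results for . are identical.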

Going a little further: in modern browsers, you can also use [^] (an empty negated character class) to match any character, line terminators included.
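As a small illustration, here is how the three spellings behave on an HTML string that spans several lines. The sample string and variable names are made up for this sketch.

```javascript
const html = '<div>\n  hello\n</div>';

// . stops at the first newline, so the closing tag is never reached.
const withDot = html.match(/<div>.*<\/div>/); // null

// [^] and [\s\S] both match every character, newlines included.
const withEmptyNegation = html.match(/<div>[^]*<\/div>/);
const withSpaceClass = html.match(/<div>[\s\S]*<\/div>/);

console.log(withDot);                        // null
console.log(withEmptyNegation[0] === html);  // true
console.log(withSpaceClass[0] === html);     // true
```

Note that [^] is a JavaScript-specific spelling; many other regex engines reject it, which may be one reason libraries prefer [\s\S].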

Finally, four expressions worth comparing against a multi-line string: /.*/g, /^.*$/g, /.*/gm, /^.*$/gm.
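The four expressions above can be compared side by side on a two-line string. This is a sketch using a sample input of my own ('a\nb'); the empty strings in the output come from .* being allowed to match zero characters at the newline and at the end of the input.

```javascript
const sample = 'a\nb';

// Without anchors, the m flag makes no difference at all.
const plainG = sample.match(/.*/g);   // ['a', '', 'b', '']
const plainGM = sample.match(/.*/gm); // ['a', '', 'b', '']

// With anchors and no m flag, the whole string must be one line, so nothing matches.
const anchoredG = sample.match(/^.*$/g); // null

// With anchors and the m flag, each line matches once.
const anchoredGM = sample.match(/^.*$/gm); // ['a', 'b']

console.log(plainG, plainGM, anchoredG, anchoredGM);
```

In none of the four cases does a match contain the newline itself, which is consistent with . never matching line terminators regardless of the m flag.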