[This is the 14th day of my participation in the August Challenge. For details, see: August Challenge.]

Searching, matching, and aggregating text are a routine part of everyday web activity: every time you browse or look up a keyword, a search runs somewhere. To make searching and matching efficient and accurate, popular editors such as VS Code and Sublime Text support regular expressions in their search-and-replace features. In these editors, pressing Ctrl + F opens the search box, where you can search for and match text, optionally with a regular expression.

In addition to searching, regular expressions are useful for form input validation. For example, you can check whether the mobile phone number entered by the user consists only of digits, or whether a password contains special characters. Many developers like regular expressions (RegExp) because they are efficient and largely independent of the programming language: a pattern written in JavaScript can usually be ported to Go or Python with little or no change, although some features and syntax details differ between engines.

Yesterday’s article, “5 Methods of JavaScript Regular Expressions,” introduced five methods that work with regular expressions in JavaScript. In this article, I’ll explain what a RegExp is in JavaScript, why it matters, how to create and write one effectively, its special characters, its main use cases, and its properties and methods.

What is a regular expression

A regular expression is a sequence of characters that describes a search pattern, used to match combinations of characters in a string. In JavaScript, regular expressions are also objects.

RegExp makes searching and matching strings easier and faster. For example, text can be easily and efficiently filtered/matched in search engines, logs, editors, etc. This is where the RegExp pattern comes in, defining the search pattern with a series of characters.

The importance of regular expressions

With the acceleration of digital transformation, information has become an important part of more and more industries. The sections below explore why regular expressions are important and how they can be used to advantage in data management.

Search/match of strings

Most developers who use regular expressions use them to search for and match strings. RegExp lets you search for a piece of text inside a larger body of text. When you search with test(), you get true or false depending on whether the pattern is found. When you match with match(), you get an array containing the text that matches the pattern, or null if nothing matches.
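As a minimal sketch of the difference between the two results, the following uses made-up sample text (logLine and the patterns are illustrative, not from any real system):

```javascript
// Sample text and patterns are illustrative only.
const logLine = "user=alice action=login user=bob action=logout";

// test() answers "is there a match?" with a boolean.
const hasUser = /user=\w+/.test(logLine);
console.log(hasUser); // true

// match() with the g flag returns an array of every matching substring.
const users = logLine.match(/user=\w+/g);
console.log(users); // [ 'user=alice', 'user=bob' ]

// When nothing matches, match() returns null rather than an empty array.
const errors = logLine.match(/error=\w+/g);
console.log(errors); // null
```

Because match() can return null, code that uses its result usually checks for null before reading the array.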

Form input validation

Form input validation is a requirement in many projects and is especially common on the front end. For example, you may want a mobile phone number to contain only digits, or an email address to end in @xxx.com. Regular expressions are usually the first tool that comes to mind for such requirements.

Take a look at the following RegExp example to validate user input:

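const mobile = "13000000000";
const regex = new RegExp("^[0-9]+$"); // anchored: every character must be a digit
console.log(regex.test(mobile)); // true

The above code prints true because every character in mobile is a digit from 0 to 9. Without the ^ and $ anchors, the pattern [0-9] would return true for any input that contains at least one digit. Real mobile numbers follow other rules as well, which I won’t expand on here.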
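The other checks mentioned earlier (an email-like address, a password containing a special character) can be sketched the same way. The helper names and the rules below are assumptions for illustration; real projects usually have stricter requirements.

```javascript
// Hypothetical helpers for illustration; the rules are deliberately loose.
function looksLikeEmail(value) {
  // something@something.tld, with no spaces and a single @.
  return /^[^\s@]+@[^\s@]+\.[^\s@]+$/.test(value);
}

function hasSpecialCharacter(value) {
  // True if any character is not a basic Latin letter or digit.
  return /[^A-Za-z0-9]/.test(value);
}

console.log(looksLikeEmail("dev@example.com")); // true
console.log(looksLikeEmail("dev@example")); // false
console.log(hasSpecialCharacter("abc123")); // false
console.log(hasSpecialCharacter("abc123!")); // true
```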

Web scraping

Web scraping involves extracting data from a website. With RegExp, this is often easy to accomplish for simple cases. For example, after fetching the source code of a web page, you can extract substrings such as titles and URLs by matching them against a pattern.
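A minimal sketch of this idea, assuming the page source is already available as a string (the html value below is made up). Regular expressions are not a full HTML parser, so this approach suits simple, predictable markup only.

```javascript
// Illustrative HTML snippet; a real scraper would fetch this text first.
const html = `
<html>
  <head><title>Regex Guide</title></head>
  <body>
    <a href="https://example.com/a">First</a>
    <a href="https://example.com/b">Second</a>
  </body>
</html>`;

// Capture the text between <title> and </title>.
const titleMatch = html.match(/<title>(.*?)<\/title>/i);
const title = titleMatch ? titleMatch[1] : null;
console.log(title); // Regex Guide

// Collect every href value with matchAll() and a capture group.
const urls = Array.from(html.matchAll(/href="([^"]+)"/g), (m) => m[1]);
console.log(urls); // [ 'https://example.com/a', 'https://example.com/b' ]
```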

Data sorting

You can do more with data retrieved from a web page. For example, data collected from the web can be evaluated and arranged into a desired format to support decision-making. With RegExp, data can be aggregated and mapped for analysis purposes. For instance, some COVID-19 data is collected by crawlers and then aggregated and mapped for analysis.

Code generation

Regular expressions can also be used to generate code, which can greatly improve coding efficiency, provided you have a reasonable level of proficiency with them.

How do I create a RegExp object

Regular expressions in JavaScript are RegExp objects, whether they are written as a literal or created with the RegExp constructor. Now that you understand what regular expressions are, let’s look at how to create them in JavaScript.

The constructor

Besides the literal notation, another way to create a regular expression in JavaScript is to use the RegExp constructor, which takes the pattern as a string argument. As of ECMAScript 6, the constructor also accepts a regular expression literal together with a flags argument.

The constructor is recommended when the pattern is not known until run time, for example when it is built from user input or changes inside a loop. The syntax for creating a JavaScript regular expression using the constructor is as follows:

const regexp1 = new RegExp("hello", "g"); // constructor with a string pattern
const regexp2 = new RegExp(/hello/, "g"); // constructor with a regular expression literal
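To illustrate the run-time case, here is a minimal sketch that builds a pattern from an arbitrary keyword. Both escapeRegExp and countOccurrences are hypothetical helpers written for this example, not built-in functions; escaping matters because characters such as + or . have a special meaning in a pattern.

```javascript
// escapeRegExp is a hypothetical helper, not a built-in function.
// It puts a backslash in front of characters that are special in a pattern.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Build a case-insensitive, global pattern from a run-time string.
function countOccurrences(haystack, keyword) {
  const pattern = new RegExp(escapeRegExp(keyword), "gi");
  const matches = haystack.match(pattern);
  return matches ? matches.length : 0;
}

console.log(countOccurrences("Hello China, hello world", "hello")); // 2
console.log(countOccurrences("1+1=2 and 1+1=2", "1+1")); // 2
```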

Just like the literal example in “5 Methods of JavaScript Regular Expressions,” the following uses the RegExp constructor to perform a case-sensitive search:

const strText = "Hello China";
const regex = new RegExp("hello");
console.log(regex.test(strText)); // false

The common flags (tags) g and i can be passed as the second argument of the constructor:

  • g: global matching mode
  • i: case-insensitive mode

const strText = "Hello China";
const regex = new RegExp("hello", "i");
console.log(regex.test(strText)); // true

Regular expression methods

RegExp objects have two main methods, exec() and test(). In addition, several string methods accept regular expressions, such as match(), matchAll(), replace(), replaceAll(), search(), and split(). The rest of this section explores some of these methods; those already introduced in “5 Methods of JavaScript Regular Expressions” are not repeated here.

exec()

This method performs a search and returns a result array, or null if there is no match. With the g flag, it can be called repeatedly to iterate over multiple matches in a string. Here are two examples that use exec() without and with iteration:

Without iteration

const regex = RegExp("chin*", "g");
const strText = "hello china,i love china";

const result = regex.exec(strText);
console.log(result); // ['chin',index: 6,input: 'hello china,i love china',groups: undefined]

With iteration

const regex = RegExp("chin*", "g");
const strText = "hello china,i love china";
let arrayResult = [];
while ((arrayResult = regex.exec(strText)) !== null) {
  console.log(`Found ${arrayResult[0]}, next search starts at index ${regex.lastIndex}.`);
  // Found chin, next search starts at index 10.
  // Found chin, next search starts at index 23.
}

As the output shows, without iteration only the first match and its index are obtained. By iterating, every match can be obtained, because each call to exec() resumes from lastIndex.

matchAll()

The matchAll() method, which must be used with the global flag g, differs from match() in that it returns an iterator containing every match together with its capture groups. With the g flag, match() returns only the matched substrings, without capture groups. Without the g flag, match() returns the first match and its capture groups.

For matchAll(), the g flag is required; otherwise a TypeError is thrown. Here is some sample code using the match() and matchAll() methods:

Example code for match():

const strText = "Hello China,I love China";
const regex = /Ch(i)[a-z]/g;
const found = strText.match(regex);

console.log(found); // [ 'Chin', 'Chin' ]

Example code for matchAll():

const strText = "Hello China,I love China";
const regex = /Ch(i)[a-z]/g;
const found = strText.matchAll(regex);

for (const item of found) {
  console.log(item);
}
// ['Chin', 'i', index: 6, input: 'Hello China,I love China', groups: undefined]
// ['Chin', 'i', index: 19, input: 'Hello China,I love China', groups: undefined]

The output above shows that the capture group i is returned for each match, which is not the case in the match() example. Apart from that, matchAll() is syntactically similar to match().

split()

Substrings can be extracted from a string using the split() method. It splits the string into substrings at every occurrence of the provided separator, which can be a string or a regular expression, and returns an array of all the substrings. You can use it to split a string into words, characters, and so on.

const strText = "Hello China,I love China";
const words = strText.split(" ");
console.log(words); // [ 'Hello', 'China,I', 'love', 'China' ]

const chars = strText.split("");
console.log(chars); // ['H', 'e', 'l', 'l', 'o',' ', 'C', 'h', 'i', 'n','a', ',', 'I', ' ', 'l','o', 'v', 'e', ' ', 'C','h', 'i', 'n', 'a']

const strCopyText = strText.split();
console.log(strCopyText); // [ 'Hello China,I love China' ]
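The examples above pass plain strings to split(). As a short sketch of the regular expression form, the separator below is a pattern, so several different delimiters can be handled in one call (the sample strings are illustrative):

```javascript
const strText = "Hello China,I love China";

// Split on any run of spaces or commas.
const words = strText.split(/[ ,]+/);
console.log(words); // [ 'Hello', 'China', 'I', 'love', 'China' ]

// A capture group in the separator keeps the separators in the result.
const parts = "2021-08-14".split(/(-)/);
console.log(parts); // [ '2021', '-', '08', '-', '14' ]
```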

Write regular expression patterns

In JavaScript, regular expression patterns can be written using simple patterns, special characters, and flags. The following sections look at each of these in turn.

Simple patterns

Sometimes, when searching for text, you want an exact match. For example, you want to find the word China in the phrase Hello China,I love China. You don’t want a partial result like Chin; you want exactly China. In that case a simple pattern, made of the literal characters you are looking for, is enough.

const strText = "Hello China,I love China";
const regex = /China/;
console.log(strText.search(regex)); // 6

Special characters

Sometimes the search doesn’t have to be exact; for example, you might want to search using ranges. You might want to find any of the letters a to c, even if there are spaces between them in the string, and for that you need special characters. Special characters for RegExp in JavaScript fall into the following categories: assertions, character classes, groups and ranges, quantifiers, and Unicode property escapes. Let’s look at how to use the special characters in these categories.

Assertions

Assertions in RegExp describe boundaries rather than characters. With assertions you can mark the beginning and end of the input or of a word, and you can write conditional matches using lookahead and lookbehind expressions. For boundary-type assertions, you can use ^, $, \b, or \B. The syntax is as follows:

  • ^ Matches the beginning of the input. If the multiline flag (m) is set, ^ also matches the position immediately after a '\n' or '\r'.
  • $ Matches the end of the input. If the multiline flag (m) is set, $ also matches the position immediately before a '\n' or '\r'.
  • \b Matches a word boundary, that is, the position between a word character and a non-word character such as a space. For example, er\b matches the er in never but not the er in verb.
  • \B Matches a non-word boundary. For example, er\B matches the er in verb but not the er in never.

For lookahead and lookbehind expressions, the syntax is as follows:

  • x(?=y) Positive lookahead. Matches x only if x is followed by y; y itself is not part of the match. For example, /Man(?=Money)/ matches Man only if it is followed by Money.
  • x(?!y) Negative lookahead. Matches x only if x is not followed by y. For example, /Man(?!Money)/ matches Man only if it is not followed by Money.
  • (?<=y)x Positive lookbehind. Similar to positive lookahead, but in the opposite direction: matches x only if x is preceded by y. For example, /(?<=Money)Man/ matches Man only if it is preceded by Money.
  • (?<!y)x Negative lookbehind. Similar to negative lookahead, but in the opposite direction: matches x only if x is not preceded by y. For example, /(?<!Money)Man/ matches Man only if it is not preceded by Money.

Here is example code for the assertion characters:

// 1) Use ^ to anchor the match at the beginning of the string (or after a line break with the m flag).
let str1 = `let the river dry up`;
str1 = str1.replace(/^l/, "h");
console.log(1, str1); // 1 het the river dry up

// 2) Use $ to anchor the match at the end of the string (or before a line break with the m flag).
let str2 = `let the river dry up`;
str2 = str2.replace(/p$/, "n");
console.log(2, str2); // 2 let the river dry un

// 3) Use \b to match a word boundary.
let str3 = "let the river dry up";
str3 = str3.replace(/\bl/, "n");
console.log(3, str3); // 3 net the river dry up

// 4) Use \B to match a non-word boundary.
let str4 = "let the river dry up";
str4 = str4.replace(/\Bt/, "y");
console.log(4, str4); // 4 ley the river dry up

// 5) Positive lookahead: replace "us" only when " make" follows.
let str5 = "let us make light";
str5 = str5.replace(/us(?= make)/, "them");
console.log(5, str5); // 5 let them make light

// 6) Negative lookahead: replace "us" only when " let" does not follow.
let str6 = "let us make light";
str6 = str6.replace(/us(?! let)/, "everyone");
console.log(6, str6); // 6 let everyone make light

let str6_2 = "let us let light";
str6_2 = str6_2.replace(/us(?! let)/, "everyone");
console.log("6.2", str6_2); // 6.2 let us let light (unchanged: " let" follows "us")

// 7) Positive lookbehind: "us" must be directly preceded by "let".
let str7 = "let us make light";
str7 = str7.replace(/(?<=let)us/, "them");
console.log(7, str7); // 7 let us make light (unchanged: a space sits between "let" and "us")
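Lookaheads are often combined to express several independent conditions on the same input. The sketch below checks a password against a hypothetical rule set (at least 8 characters, one digit, one special character) and then uses a lookbehind to pick out numbers that follow a dollar sign. The rules and sample strings are assumptions for illustration.

```javascript
// Hypothetical rule set: at least 8 characters, one digit, one special character.
// Each lookahead checks one condition from the start of the string without consuming input.
const strongPassword = /^(?=.*\d)(?=.*[^A-Za-z0-9]).{8,}$/;

console.log(strongPassword.test("abc12345")); // false (no special character)
console.log(strongPassword.test("abc123!?")); // true
console.log(strongPassword.test("a1!")); // false (too short)

// Lookbehind: pick out amounts only when they directly follow a dollar sign.
const prices = "cost $25, weight 30, fee $7".match(/(?<=\$)\d+/g);
console.log(prices); // [ '25', '7' ]
```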

Character classes

Character classes are used to distinguish between kinds of characters. For example, you can use character classes to distinguish letters from digits. Here are the special characters for character classes and how they work.

  • \d Matches a digit character, equivalent to [0-9].
  • \D Matches a non-digit character, equivalent to [^0-9].
  • \w Matches letters, digits, and underscores, equivalent to [A-Za-z0-9_].
  • \W Matches any character that is not a letter, digit, or underscore of the basic Latin alphabet, equivalent to [^A-Za-z0-9_].
  • \s Matches any whitespace character, including spaces, tabs, form feeds, and line breaks; for ASCII input this is equivalent to [ \f\n\r\t\v].
  • . Matches any single character except line terminators (such as \n and \r). To match any character including \n, use a pattern like (.|\n).
  • \xhh Matches the character whose code is the two hexadecimal digits hh.
  • \uhhhh Matches the UTF-16 code unit whose value is the four hexadecimal digits hhhh.
  • \cX Matches a control character using caret notation.

There are other special characters, such as \t, \r, \n, \v, and \f, which match horizontal tab, carriage return, newline, vertical tab, and form feed, respectively.

Let’s look at the example code:

const chess = "She played the Queen in C4 and he moved his King in c2.";
const coordinates = /\w\d/g;
console.log(chess.match(coordinates)); // ['C4', 'c2']

const mood = "happy 🙂, confused 😕, sad 😢";
const emoji = /[\u{1F600}-\u{1F64F}]/gu;
console.log(mood.match(emoji)); // ['🙂', '😕', '😢']
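A few more character classes in action, as a small sketch with made-up input strings: \D for cleaning a formatted number, \s for normalizing whitespace, and \W for locating the first non-word character.

```javascript
// Strip everything that is not a digit (\D) from a formatted phone number.
const digitsOnly = "+86 130-0000-0000".replace(/\D/g, "");
console.log(digitsOnly); // 8613000000000

// Collapse runs of whitespace (\s+) into single spaces.
const tidy = "hello \t  china\n again".replace(/\s+/g, " ");
console.log(tidy); // hello china again

// \W finds the first character that is not a letter, digit, or underscore.
const firstNonWord = "user_name!".search(/\W/);
console.log(firstNonWord); // 9
```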

Groups and ranges

If you want to group parts of an expression or match a range of characters, you can do so using the following special characters.

  • x|y Matches either x or y. For example, the expression 189|188 matches 189 or 188 in a string.
  • [xbz] Matches any one of the characters in the brackets. For example, [xbz] matches x, b, or z in a string.
  • [a-c] Matches any one character within the range in the brackets. For example, [a-c] matches a, b, or c. However, if the hyphen is at the beginning or end of the brackets, it is treated as a normal character. For example, [-ac] matches the hyphen in non-profit.
  • [^xyz] Matches any character that is not in the brackets. For example, [^xyz] does not match the y or z in Lazy, but matches L and a.
  • [^a-c] Matches any character that is not within the range in the brackets. For example, [^a-c] does not match the b or a in bank, but matches n and k.
  • (x) A capture group: matches x and remembers the match for later reference. For example, /(family)/ matches family in "make family familiar" and stores it in the capture group.
  • \n A backreference to the substring matched by capture group number n, where n is a positive integer.
  • \k<Name> A backreference to the substring matched by the named capture group <Name>.
  • (?<Name>x) A named capture group: matches x and stores it in the groups property of the returned match under the name specified by <Name>.
  • (?:x) A non-capture group: matches x but does not remember the match, so the matched substring cannot be recalled from the result array.

A capture group saves the content matched by part of a regular expression into a numbered or explicitly named group in memory for later reference.

When () is needed only for grouping and there is no need to reference the result, you can use a non-capture group to prevent it from being stored as a capture group and to reduce memory waste.

let str1 = `let the river dry up`;
str1 = str1.replace(/let|the/, "m");
console.log(1, str1); // 1 m the river dry up

let str2 = `let the river dry up`;
str2 = str2.replace(/[abcde]/, "o");
console.log(2, str2); // 2 lot the river dry up

let str3 = `let the river dry up`;
str3 = str3.replace(/[^abcde]/, "o");
console.log(3, str3); // 3 oet the river dry up

let str4 = "Sir, yes Sir in Do you copy? Sir, yes Sir!";
str4 = str4.replace(/(?<title>\w+), yes \k<title>/, "Hello");
console.log(4, str4); // 4 Hello in Do you copy? Sir, yes Sir!
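As a further sketch of capture groups and backreferences, the example below reads named groups from a date, uses the numbered backreference \1 to detect a doubled word, and refers to named groups in a replacement string. The sample strings and variable names are illustrative.

```javascript
// Named capture groups: the parts of the date end up in result.groups.
const datePattern = /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/;
const result = "Published on 2021-08-14".match(datePattern);
console.log(result.groups.year, result.groups.month, result.groups.day); // 2021 08 14

// Numbered backreference: \1 must repeat whatever group 1 captured,
// which makes it easy to spot an accidentally doubled word.
const doubled = /\b(\w+) \1\b/;
console.log(doubled.test("this is is a typo")); // true
console.log(doubled.test("this is a test")); // false

// Named groups can also be referenced in the replacement string.
const reordered = "2021-08-14".replace(datePattern, "$<day>/$<month>/$<year>");
console.log(reordered); // 14/08/2021
```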

Quantifiers

When matching characters, you sometimes need to specify the number of expressions or characters to match. Quantifiers can indicate the number of expressions or characters you want to match.

  • x* Matches x zero or more times. For example, /bo*/ matches b in bird and nothing in goat.
  • x+ Matches x one or more times. x+ is equivalent to x{1,}.
  • x? Matches x zero or one time.
  • x{n} Matches x exactly n times, where n is a positive integer.
  • x{n,} Matches x at least n times, where n is a positive integer.
  • x{n,m} Matches x at least n and at most m times.

let str = `let the river dry up`;
str = str.replace(/et*/, "a");
console.log(1, str); // 1 la the river dry up

let str1 = `let the river dry up`;
str1 = str1.replace(/e+/, "a");
console.log(2, str1); // 2 lat the river dry up

let str2 = `let the river dry up`;
str2 = str2.replace(/e?et?/, "a");
console.log(3, str2); // 3 la the river dry up

let str3 = `let the riveer dry up`;
str3 = str3.replace(/e{2}/, "a");
console.log(4, str3); // 4 let the rivar dry up

let str4 = `let the riveer dry up`;
str4 = str4.replace(/e{2,}/, "a");
console.log(5, str4); // 5 let the rivar dry up

let str5 = `let theee riveer dry up`;
str5 = str5.replace(/e{1,3}/, "a");
console.log(6, str5); // 6 lat theee riveer dry up
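Quantifiers are also what make simple format checks possible. The patterns below are hypothetical formats chosen for illustration, not rules from any real system.

```javascript
// {11}: exactly eleven digits between the anchors.
const elevenDigits = /^\d{11}$/;
console.log(elevenDigits.test("13000000000")); // true
console.log(elevenDigits.test("130000")); // false (too short)

// ?: the "s" is optional, so both spellings match.
const protocol = /^https?:\/\//;
console.log(protocol.test("http://example.com")); // true
console.log(protocol.test("https://example.com")); // true

// {2,4}: two to four uppercase letters; +: one or more digits.
const code = /^[A-Z]{2,4}-\d+$/;
console.log(code.test("AB-1")); // true
console.log(code.test("ABCDE-1")); // false (five letters)
```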

Flags

Regular expressions in JavaScript have five commonly used flags (tags) that modify how a pattern is applied.

  • d Generates start and end indices for substring matches.
  • g Performs a global search.
  • i Performs a case-insensitive search.
  • m Enables multi-line matching, so that the boundary characters ^ and $ match the beginning and end of each line rather than only the beginning and end of the entire string.
  • s Lets the dot . match the newline character \n. By default, . matches any character except line terminators such as \n; with the s flag, . matches \n as well.
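The effect of the g, i, m, and s flags can be seen side by side in a short sketch (the text value is made up for the example):

```javascript
const text = "first line\nSecond line\nthird LINE";

// i: ignore case; g: find every match rather than only the first.
const allLines = text.match(/line/gi);
console.log(allLines); // [ 'line', 'line', 'LINE' ]

// m: ^ matches at the start of each line, not only at the start of the string.
const firstWords = text.match(/^\w+/gm);
console.log(firstWords); // [ 'first', 'Second', 'third' ]
console.log(text.match(/^\w+/g)); // [ 'first' ]

// s: the dot also matches line breaks.
console.log(/first.*third/.test(text)); // false
console.log(/first.*third/s.test(text)); // true
```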

Conclusion

Regular expressions are widely used in web programming, for example for form input validation in project development and for content extraction in crawlers. A more advanced use is code generation: in software built on an MVC architecture, many parts follow general rules, so regular expressions can help generate that code.