Preface

In development, text processing comes up often: searching data, extracting and reorganizing text, validating user input, and so on. For string problems like these, regular expressions can save a lot of processing effort and make a program simpler and easier to understand. This article walks through the MDN documentation and a few common examples to get you started with the technique in about 15 minutes. Let's get started!

What is a regular expression?

Regular expressions are patterns that describe whether a string conforms to certain syntactic rules. They can be used to search, compare, extract, replace, and transform text. Many programming languages support regular expressions; the examples below use JavaScript.

Writing regular expressions

To write a regular expression, create a RegExp object using either a literal between two slashes (/pattern/) or new RegExp().

// 1. Use a literal. The pattern is compiled when the script is loaded, which suits patterns that stay constant.
const literalRegex = /some text/

// 2. Create a RegExp object with new, suitable when the pattern has to be generated dynamically.
const dynamicRegex = new RegExp('some text')

// Add flags to make matching more powerful. i: case-insensitive. g: global, matches every occurrence in the string
const caseInsensitiveRegex = /some text/i
const globalRegex = new RegExp('some text', 'g')
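
As a minimal sketch of the dynamic case, the constructor can build a pattern from text that is only known at run time. The helper names escapeRegExp and buildKeywordRegex are hypothetical and not part of any library; the escaping step assumes the keyword should be matched literally.

```javascript
// Hypothetical helper: escape characters that have a special meaning in a pattern
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&')
}

// Hypothetical helper: build a case-insensitive, global regex for a keyword
function buildKeywordRegex(keyword) {
  return new RegExp(escapeRegExp(keyword), 'gi')
}

const keywordRegex = buildKeywordRegex('$100')
console.log(keywordRegex.source) // \$100
console.log(keywordRegex.flags) // gi
console.log('It costs $100.'.match(keywordRegex)) // ["$100"]
```

A literal cannot be used here because the keyword is not known when the script is written.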

Using regular expressions

Once you have created a regular expression, you can use the test and exec methods of RegExp to work with strings.

const regex = /hello world/i

// Use test to check whether the string matches the pattern; it returns a boolean
regex.test('Hello World !!') // true

// Use exec to get the details of the match; it returns null if nothing matches
regex.exec('Hello World !!') // ["Hello World", index: 0, input: "Hello World !!", groups: undefined]
regex.exec('Hello Regex !!') // null
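
With the g flag, exec remembers where the last match ended (in lastIndex), so calling it in a loop walks through every match. The sketch below illustrates this; findAllIndexes is a hypothetical helper name used only for the example.

```javascript
// Hypothetical helper: collect the index of every match by calling exec in a loop
function findAllIndexes(pattern, text) {
  // Copy the pattern with the g flag so that exec advances through the text
  const flags = pattern.flags.includes('g') ? pattern.flags : pattern.flags + 'g'
  const regex = new RegExp(pattern.source, flags)
  const indexes = []
  let result
  while ((result = regex.exec(text)) !== null) {
    indexes.push(result.index)
    // Step forward manually on an empty match to avoid an infinite loop
    if (result[0] === '') regex.lastIndex++
  }
  return indexes
}

console.log(findAllIndexes(/hello/i, 'Hello world, hello regex')) // [0, 13]
```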

The search, match, replace, and split methods of String objects also accept regular expressions.

const paragraph = 'Lorem Ipsum is simply dummy text of the printing and typesetting industry.'

// Use search to look for the pattern in the paragraph: it returns the index where the match starts, or -1 if nothing is found
paragraph.search('tExT') // -1
paragraph.search(/tExT/i) // 28

// Use match to get the details of the first match; add the g flag to list every matched string
paragraph.match(/ing/) // ["ing", index: 45, ...]
paragraph.match(/ing/g) // ["ing", "ing"]
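
The replace and split methods mentioned above can be sketched in the same way. The helper names splitList and collapseSpaces are hypothetical; the patterns use sets and the + quantifier, both of which are explained in later sections of this article.

```javascript
// Hypothetical helper: split on commas, semicolons, or whitespace (one or more in a row)
function splitList(text) {
  return text.split(/[,;\s]+/)
}

// Hypothetical helper: collapse every run of whitespace into a single space
function collapseSpaces(text) {
  return text.replace(/\s+/g, ' ')
}

console.log(splitList('apple, banana;cherry   durian')) // ["apple", "banana", "cherry", "durian"]
console.log(collapseSpaces('Lorem   Ipsum \t is  simply')) // "Lorem Ipsum is simply"
```

Without the g flag, replace would only change the first run of whitespace.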

Special characters

In regular expressions, certain characters have a special meaning, so using them literally may not have the desired effect.

const str = 'Rails is a web framework written in Ruby'

// ^ indicates that the pattern must be at the beginning of the string
str.match(/^Rails/) // ["Rails", index: 0, ...]
str.match(/^Ruby/) // null

// $ indicates that the pattern must be at the end of the string
str.match(/Ruby$/) // ["Ruby", index: 36, ...]
str.match(/Rails$/) // null

// | means "or": the string can match the pattern on either side of it
const colorRegex = /color|colour/
colorRegex.exec('color') // ["color", index: 0, ...]
colorRegex.exec('colour') // ["colour", index: 0, ...]

// To match one of these special characters literally, escape it with a backslash '\'
const priceRegex = /\$100/
priceRegex.test('$100') // true
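
These special characters can be combined. The sketch below, with the hypothetical helper isImageFile, uses an escaped dot, the $ anchor, and | together to check a file extension; the list of extensions is only an assumption for the example.

```javascript
// Hypothetical helper: does the file name end in .png or .jpg?
function isImageFile(fileName) {
  // \. matches a literal dot; $ anchors each alternative to the end of the string
  return /\.png$|\.jpg$/i.test(fileName)
}

console.log(isImageFile('photo.JPG')) // true
console.log(isImageFile('photo.jpg.txt')) // false
console.log(isImageFile('photopng')) // false
```

Without the backslash, the dot would match any character, so 'photopng' would be accepted.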

Character sets []

In the previous examples, the pattern was written with specific literal characters. To match letters, digits, or some combination of them, use a set []. A set matches any one of the characters inside the brackets.

// Matches any single uppercase English letter
const upperRegex = /[ABCDEFGHIJKLMNOPQRSTUVWXYZ]/
'K'.match(upperRegex) // ["K", index: 0, ...]
'delta'.match(upperRegex) // null

// Use '-' to shorten the set: 'A-Z' covers every letter from A to Z
const rangeRegex = /[A-Z]/

// To match a letter or a digit, write it like this
const alphanumericRegex = /[A-Za-z0-9]/

Some commonly used sets have shorthand forms.

const anyCharRegex = /./   // Matches any character except a line terminator
const digitRegex = /\d/    // Matches a digit, equal to /[0-9]/
const wordRegex = /\w/     // Matches a letter, digit, or underscore, equal to /[A-Za-z0-9_]/
const spaceRegex = /\s/    // Matches a whitespace character (e.g. space, tab, newline, ...)

Use a negated set [^] to match any character outside the set.

const regex = /[^\w]/
regex.test('a') // false
regex.test('!') // true
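
A negated set is handy for cleaning input. The following is a minimal sketch; digitsOnly and hasSymbol are hypothetical helper names, and the phone number is made-up sample data.

```javascript
// Hypothetical helper: keep only the digits of a string
function digitsOnly(text) {
  // [^0-9] matches any single character that is not a digit
  return text.replace(/[^0-9]/g, '')
}

// Hypothetical helper: does the text contain anything other than letters and digits?
function hasSymbol(text) {
  return /[^A-Za-z0-9]/.test(text)
}

console.log(digitsOnly('(02) 1234-5678')) // "0212345678"
console.log(hasSymbol('abc123')) // false
console.log(hasSymbol('abc-123')) // true
```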

Quantifiers {}

As mentioned in the section on sets, a set matches only one character at a time. To match several consecutive characters that follow the same rule, use a quantifier {}.

// Without quantifiers, you have to write \d five times to match five consecutive digits
const repeatedRegex = /\d\d\d\d\d/
repeatedRegex.test('12345') // true

// Use {5} to indicate five consecutive occurrences
const fiveDigitsRegex = /\d{5}/
fiveDigitsRegex.exec('abcde12345') // ["12345", index: 5, ...]
fiveDigitsRegex.exec('a1b2c3d4e5') // null

// Use {2,} to indicate two or more consecutive occurrences
const plusRegex = /\w\+{2,}/
plusRegex.exec('a+') // null
plusRegex.exec('a++') // ["a++", index: 0, ...]

// Use {2,5} to indicate 2 to 5 consecutive occurrences
const shortWordRegex = /^\w{2,5}!/
shortWordRegex.exec('Hi!') // ["Hi!", index: 0, ...]
shortWordRegex.exec('Helloooo!') // null

Common quantifiers also have shorthand symbols.

// Use ? to indicate 0 or 1 occurrence, equal to {0,1}
const optionalRegex = /\w?/
// Use + to indicate 1 or more occurrences, equal to {1,}
const oneOrMoreRegex = /\w+/
// Use * to indicate 0 or more occurrences, equal to {0,}
const zeroOrMoreRegex = /\w*/
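
Quantifiers are what make input validation practical. The sketch below combines them with the ^ and $ anchors; looksLikeDate and looksLikeNumber are hypothetical helpers that check only the shape of the text, not whether the value is sensible.

```javascript
// Hypothetical helper: check a YYYY-MM-DD shaped string (format only, not calendar validity)
function looksLikeDate(text) {
  return /^\d{4}-\d{2}-\d{2}$/.test(text)
}

// Hypothetical helper: optional minus sign, one or more digits, optional decimal part
function looksLikeNumber(text) {
  return /^-?\d+\.?\d*$/.test(text)
}

console.log(looksLikeDate('2024-01-31')) // true
console.log(looksLikeDate('2024-1-31')) // false
console.log(looksLikeNumber('-12.5')) // true
console.log(looksLikeNumber('1.2.3')) // false
```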

The quantifiers *, +, ?, and {2,5} are all greedy, meaning that they match as many consecutive occurrences as possible. Following a quantifier with a question mark (*?, +?, ??, {2,5}?) turns it into a lazy quantifier, which matches as few consecutive occurrences as possible.

// Greedy: match as many '+' as possible
const greedyRegex = /a\+{2,}/
greedyRegex.exec('a+++++') // ["a+++++", index: 0, ...]
// Lazy: match as few '+' as possible
const lazyRegex = /a\+{2,}?/
lazyRegex.exec('a+++++') // ["a++", index: 0, ...]
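
The difference matters most when a pattern has a closing delimiter. As a rough sketch, the hypothetical helpers greedyQuote and lazyQuotes below extract quoted text from a made-up sentence.

```javascript
// Hypothetical helper: greedy .+ runs to the last quote it can still match
function greedyQuote(text) {
  const match = text.match(/".+"/)
  return match ? match[0] : null
}

// Hypothetical helper: lazy .+? stops at the first closing quote, so each quote is separate
function lazyQuotes(text) {
  return text.match(/".+?"/g) || []
}

const sentence = 'He said "yes" and then "no".'
console.log(greedyQuote(sentence)) // "yes" and then "no"
console.log(lazyQuotes(sentence)) // ['"yes"', '"no"']
```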

Assertions

The last topic to cover is assertions. According to Wikipedia:

An assertion is a first-order logic statement (such as a condition that evaluates to true or false) placed in a program. When execution reaches the assertion, the condition is evaluated: the program continues if the result is true and aborts if it is false.

In regular expressions, assertions specify that a position (anchor point) in the string must meet certain conditions. For example, the special characters ^ (beginning of the string) and $ (end of the string) are classified as assertions. Other common assertions include the word boundary \b and lookaround.

// Suppose you want to find 'Java' rather than 'Javascript'
const str = 'difference between Javascript and Java.'
// Without assertions, the 'Java' inside 'Javascript' is found
str.match(/Java/) // ["Java", index: 19, ...]
// Use the word boundary \b to find the desired result
str.match(/\bJava\b/) // ["Java", index: 34, ...]

// A word boundary is a position where exactly one of the two adjacent characters is a word character (\w)
// Replace every \b with '|' to see the effect
const regex = /\b/g
str.replace(regex, '|') // "|difference| |between| |Javascript| |and| |Java|."
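
The word boundary is useful whenever only whole words should count. A minimal sketch, with countWord as a hypothetical helper that assumes the word contains only letters, digits, or underscores:

```javascript
// Hypothetical helper: count whole-word, case-insensitive occurrences of a word
function countWord(text, word) {
  // Assumes `word` has no special characters, so it is not escaped here
  const regex = new RegExp('\\b' + word + '\\b', 'gi')
  const matches = text.match(regex)
  return matches ? matches.length : 0
}

console.log(countWord('Java, JavaScript and java', 'java')) // 2
console.log(countWord('JavaScript only', 'java')) // 0
```

In a string passed to new RegExp, the backslash itself has to be escaped, which is why the pattern is written as '\\b'.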

To understand strings and anchor points better, try regex101 and work with the site's visualizations. To test more complex conditions, use lookaround.

// Lookaround comes in two kinds, lookahead and lookbehind, and each can be positive or negative
// Positive lookahead: A(?=B) → A
// Negative lookahead: A(?!B) → A
// Positive lookbehind: (?<=A)B → B
// Negative lookbehind: (?<!A)B → B

// Suppose you want to extract the amount paid
const str = "Quantity 2, actual amount 990 yuan"
// Analyze the data: the amount has a whitespace character before it and ' yuan' after it
// Write that rule as a regular expression
const regex = /(?<=\s)\d+(?= yuan)/
str.match(regex) // ["990", index: 26, ...]
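
A well-known use of lookahead is adding thousands separators to a number, the topic of the Stack Overflow entry in the references. The sketch below is one common way to write it, not necessarily the exact code from that answer; numberWithCommas is a hypothetical helper and assumes a non-negative integer. The parentheses form a group so that the + quantifier applies to three digits at a time.

```javascript
// Hypothetical helper: insert thousands separators into a non-negative integer
function numberWithCommas(value) {
  // \B           a position that is not a word boundary (so never the very start)
  // (?=(\d{3})+  followed by one or more complete groups of three digits
  // (?!\d))      and no further digit after those groups
  return String(value).replace(/\B(?=(\d{3})+(?!\d))/g, ',')
}

console.log(numberWithCommas(990)) // "990"
console.log(numberWithCommas(1234567)) // "1,234,567"
```

Because every part of the pattern is an assertion, each match is an empty position, and replace simply inserts a comma there.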

Note that although the syntax of lookaround assertions resembles that of groups, the condition in the assertion is not included in the match result, which is why these are also called zero-width assertions.

Conclusion

Besides web development, regular expressions are also widely used in applications such as crawlers and data analysis. This article took about 15 minutes to introduce their basic use. Further topics, such as operator precedence, writing multiple conditions, and the performance of regular expressions, can be explored through the MDN documentation, a web search, or related books. That is all for this article. If you spot a mistake or something that should be added, please do not hesitate to point it out. Thank you for reading.

References

  • Regular expressions – JavaScript | MDN
  • regex101
  • Assertion – Wikipedia
  • Number with commas – Stack Overflow
  • Google