NSRegularExpression

A regular expression (often abbreviated to regex, regexp, or RE in code) is a concept from computer science. A regular expression is a single string that describes, and is used to match, the set of strings that follow a given syntactic rule. Many text editors use regular expressions to find and replace text that fits a pattern.

Enumerated type

typedef NS_OPTIONS(NSUInteger, NSRegularExpressionOptions) {
    NSRegularExpressionCaseInsensitive            = 1 << 0, // match letters case-insensitively
    NSRegularExpressionAllowCommentsAndWhitespace = 1 << 1, // ignore whitespace and #-prefixed comments in the pattern
    NSRegularExpressionIgnoreMetacharacters       = 1 << 2, // treat the entire pattern as a literal string
    NSRegularExpressionDotMatchesLineSeparators   = 1 << 3, // allow . to match any character, including line separators
    NSRegularExpressionAnchorsMatchLines          = 1 << 4, // allow ^ and $ to match the start and end of lines
    NSRegularExpressionUseUnixLineSeparators      = 1 << 5, // treat only \n as a line separator
    NSRegularExpressionUseUnicodeWordBoundaries   = 1 << 6  // use Unicode TR#29 word boundaries
};

typedef NS_OPTIONS(NSUInteger, NSMatchingOptions) {
    NSMatchingReportProgress         = 1 << 0, // call the block periodically during long-running matches
    NSMatchingReportCompletion       = 1 << 1, // call the block once after matching has completed
    NSMatchingAnchored               = 1 << 2, // only match at the start of the search range
    NSMatchingWithTransparentBounds  = 1 << 3, // allow matching to look beyond the bounds of the search range
    NSMatchingWithoutAnchoringBounds = 1 << 4  // prevent ^ and $ from automatically matching the start and end of the search range
};
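As a rough illustration of how a few of these options change matching, the sketch below counts matches with and without an option. The helper name countMatches and the sample strings are assumptions made up for this example; they are not part of the API.

```swift
import Foundation

// Hypothetical helper: counts matches of `pattern` in `text` under the given options.
func countMatches(_ pattern: String, in text: String,
                  options: NSRegularExpression.Options = []) -> Int {
    guard let regex = try? NSRegularExpression(pattern: pattern, options: options) else { return 0 }
    return regex.numberOfMatches(in: text, options: [], range: NSRange(text.startIndex..., in: text))
}

// "." is a wildcard by default, so "a.c" matches both "a.c" and "abc".
let wildcard = countMatches("a.c", in: "a.c abc")
// With .ignoreMetacharacters the pattern is a literal string, so only "a.c" matches.
let literal = countMatches("a.c", in: "a.c abc", options: .ignoreMetacharacters)
// "." does not match a line feed unless .dotMatchesLineSeparators is set.
let dotDefault = countMatches("a.b", in: "a\nb")
let dotAll = countMatches("a.b", in: "a\nb", options: .dotMatchesLineSeparators)
// .caseInsensitive lets "swift" also match "Swift" and "SWIFT".
let insensitive = countMatches("swift", in: "swift Swift SWIFT", options: .caseInsensitive)

print(wildcard, literal, dotDefault, dotAll, insensitive) // 2 1 0 1 3
```

Options can be combined in an array literal, for example [.caseInsensitive, .anchorsMatchLines].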

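The following flags are only passed to the block of the block-based enumeration method:

typedef NS_OPTIONS(NSUInteger, NSMatchingFlags) {
    NSMatchingProgress      = 1 << 0, // set when reporting progress during a long-running match
    NSMatchingCompleted     = 1 << 1, // set when reporting completion after matching
    NSMatchingHitEnd        = 1 << 2, // set when the match operation reached the end of the search range
    NSMatchingRequiredEnd   = 1 << 3, // set when the match depended on the location of the end of the search range
    NSMatchingInternalError = 1 << 4  // set when matching failed due to an internal error
};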

Methods

1. Return an array of all matches (useful for extracting everything we want to match from a string):
- (NSArray *)matchesInString:(NSString *)string options:(NSMatchingOptions)options range:(NSRange)range;

2. Return the number of matches:
- (NSUInteger)numberOfMatchesInString:(NSString *)string options:(NSMatchingOptions)options range:(NSRange)range;

3. Return the first match. Note that the match is stored in an NSTextCheckingResult:
- (NSTextCheckingResult *)firstMatchInString:(NSString *)string options:(NSMatchingOptions)options range:(NSRange)range;

4. Return the NSRange of the first match:
- (NSRange)rangeOfFirstMatchInString:(NSString *)string options:(NSMatchingOptions)options range:(NSRange)range;

5. Block-based enumeration:
- (void)enumerateMatchesInString:(NSString *)string options:(NSMatchingOptions)options range:(NSRange)range usingBlock:(void (^)(NSTextCheckingResult *result, NSMatchingFlags flags, BOOL *stop))block;
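The five methods above can be tried side by side from Swift. This is only a minimal sketch: the sample text and the \d+ pattern are made up for illustration.

```swift
import Foundation

let text = "order 12, order 345, order 6789"
let nsText = text as NSString
let fullRange = NSRange(location: 0, length: nsText.length)
// Force-try is acceptable here only because the pattern is a fixed, known-valid literal.
let digits = try! NSRegularExpression(pattern: "\\d+", options: [])

// 1. All matches, converted back to substrings.
let all = digits.matches(in: text, options: [], range: fullRange)
    .map { nsText.substring(with: $0.range) }
// 2. The number of matches only.
let count = digits.numberOfMatches(in: text, options: [], range: fullRange)
// 3. The first match as an NSTextCheckingResult (nil when nothing matches).
let first = digits.firstMatch(in: text, options: [], range: fullRange)
    .map { nsText.substring(with: $0.range) }
// 4. The range of the first match; its location is NSNotFound when nothing matches.
let firstRange = digits.rangeOfFirstMatch(in: text, options: [], range: fullRange)
// 5. Block-based enumeration; setting stop to true ends the search early.
var firstTwo: [String] = []
digits.enumerateMatches(in: text, options: [], range: fullRange) { result, _, stop in
    guard let result = result else { return }
    firstTwo.append(nsText.substring(with: result.range))
    if firstTwo.count == 2 { stop.pointee = true }
}

print(all, count, first ?? "none", firstRange, firstTwo)
```

The block-based variant is the one to reach for when the search should stop early or when the matching flags are of interest.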

Replace method

- (NSString *)stringByReplacingMatchesInString:(NSString *)string options:(NSMatchingOptions)options range:(NSRange)range withTemplate:(NSString *)templ;
- (NSUInteger)replaceMatchesInString:(NSMutableString *)string options:(NSMatchingOptions)options range:(NSRange)range withTemplate:(NSString *)templ;
- (NSString *)replacementStringForResult:(NSTextCheckingResult *)result inString:(NSString *)string offset:(NSInteger)offset template:(NSString *)templ;

Use case

String substitution

let test = "Sdgreihen a quiet evening jlosd a"
let regex = "A"
let RE = try NSRegularExpression(pattern: regex, options: .caseInsensitive)
// NSRange counts UTF-16 code units, so use utf16.count rather than count.
let modified = RE.stringByReplacingMatches(in: test, options: [], range: NSRange(location: 0, length: test.utf16.count), withTemplate: "Yes")
print(modified)

Output:

Sdgreihen Yes quiet evening jlosd Yes

String matching

let test = "sdgreihendfjbhiidfjdbjb"
let regex = "jb"
let RE = try NSRegularExpression(pattern: regex, options: .caseInsensitive)
let matches = RE.matches(in: test, options: [], range: NSRange(location: 0, length: test.utf16.count))
print(matches.count) // 2

But sometimes what we need is not an exact match but a fuzzy match, for example when validating a phone number or an email address:

let test = "1832321108"
let regex = "^1[0-9]{10}$"
let RE = try NSRegularExpression(pattern: regex, options: .caseInsensitive)
let matches = RE.matches(in: test, options: [], range: NSRange(location: 0, length: test.utf16.count))
print(matches.count) // 0: the pattern requires 11 digits, but this sample has only 10

Let’s look at the rules of regular expressions

Regular expression

Let’s start by writing a test tool

import Foundation

/// Regular expression matching
///
/// - Parameters:
///   - regex: the matching rule
///   - validateString: the string to match against
/// - Returns: the matched substrings
func RegularExpression(regex: String, validateString: String) -> [String] {
    do {
        let expression = try NSRegularExpression(pattern: regex, options: [])
        // NSRange is measured in UTF-16 code units, hence utf16.count.
        let matches = expression.matches(in: validateString, options: [], range: NSRange(location: 0, length: validateString.utf16.count))

        var data: [String] = []
        for item in matches {
            let string = (validateString as NSString).substring(with: item.range)
            data.append(string)
        }

        return data
    } catch {
        return []
    }
}

/// String replacement
///
/// - Parameters:
///   - validateString: the string to match against
///   - regex: the matching rule
///   - content: the replacement content
/// - Returns: the resulting string
func replace(validateString: String, regex: String, content: String) -> String {
    do {
        let RE = try NSRegularExpression(pattern: regex, options: .caseInsensitive)
        let modified = RE.stringByReplacingMatches(in: validateString, options: [], range: NSRange(location: 0, length: validateString.utf16.count), withTemplate: content)
        return modified
    } catch {
        return validateString
    }
}

The tutorial proceeds in the following order

  • Regular expression character matching guide
  • Regular expression position matching guide
  • The function of regular expression parentheses
  • The principle of regular expression backtracking
  • Regular expression splitting

Chapter 1, regular expression character matching guide

Regular expressions are matching patterns that match either characters or positions

  • 1. Two kinds of fuzzy matching
  • 2. Character groups
  • 3. Quantifiers
  • 4. Branch structure

1. Two kinds of fuzzy matching

A regex that only matches exactly is of limited use: hello, for example, can only match the substring “hello”.

Regular expressions are powerful because they enable fuzzy matching.

Fuzzy matching is “fuzzy” in two directions: horizontal and vertical.

1.1. Horizontal fuzzy matching

Horizontal fuzziness means that the length of the matched string is not fixed and can vary.

This is done with quantifiers. For example, {m,n} means at least m and at most n occurrences.

For example, ab{2,5}c matches a string whose first character is a, followed by two to five b characters, and finally the character c. The test is as follows:

let regex = "ab{2,5}c"
let validate = "abc abbc abbbc abbbbc abbbbbc abbbbbbc"
let result = RegularExpression(regex: regex, validateString: validate)
print(result) // ["abbc", "abbbc", "abbbbc", "abbbbbc"]

1.2. Vertical fuzzy matching

Vertical fuzziness means that, at a given position in the matched string, the character is not fixed and can be one of several possibilities.

This is done with character groups. For example, [abc] means the character can be any one of a, b, or c.

For example, a[123]b can match three strings: a1b, a2b, and a3b. The test is as follows:

let regex = "a[123]b"
let validate = "a0b a1b a2b a3b a4b"
let result = RegularExpression(regex: regex, validateString: validate)
print(result) // ["a1b", "a2b", "a3b"]

2. Character groups

Note that although it is called a character group (character class), it matches only one character. For example, [abc] matches a single character, which can be a, b, or c.

  • 1. Range notation: if the group contains many characters, range notation can be used. For example, [123456abcdefGHIJKLM] can be written as [1-6a-fG-M], using a hyphen to abbreviate each run.

  • 2. Negated character groups: in vertical fuzzy matching, a character may be anything except “a”, “b”, or “c”. This is what negated character groups are for. For example, [^abc] is any character except “a”, “b”, and “c”. A ^ (caret) in the first position of the group means negation.
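Both notations can be checked with a short Swift sketch. The helper matchedStrings and the sample strings are assumptions made up for this example.

```swift
import Foundation

// Hypothetical helper: returns every substring of `text` matched by `pattern`.
func matchedStrings(_ pattern: String, in text: String) -> [String] {
    guard let regex = try? NSRegularExpression(pattern: pattern, options: []) else { return [] }
    let nsText = text as NSString
    return regex.matches(in: text, options: [], range: NSRange(location: 0, length: nsText.length))
        .map { nsText.substring(with: $0.range) }
}

// Range notation: [1-6a-fG-M] accepts the same characters as [123456abcdefGHIJKLM].
let inRange = matchedStrings("[1-6a-fG-M]", in: "0 3 7 c g H N")
// Negated group: [^abc] matches any single character except a, b and c.
let negated = matchedStrings("[^abc]", in: "abcde")

print(inRange) // ["3", "c", "H"]
print(negated) // ["d", "e"]
```

To match a literal hyphen inside a group, place it first or last, or escape it, so that it is not read as a range.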

2.1. Common abbreviations

Once we have the concept of character groups, some common symbols are easy to understand, because they are all built-in shorthand forms.

Pattern   Equivalent group       Meaning and mnemonic
\d        [0-9]                  A digit. Mnemonic: digit.
\D        [^0-9]                 Any character except a digit.
\w        [0-9a-zA-Z_]           A digit, an uppercase or lowercase letter, or an underscore. Mnemonic: w is short for word; also called a word character.
\W        [^0-9a-zA-Z_]          A non-word character.
\s        [ \t\v\n\r\f]          A whitespace character: space, horizontal tab, vertical tab, line feed, carriage return, or form feed. Mnemonic: s is the first letter of space.
\S        [^ \t\v\n\r\f]         A non-whitespace character.
.         [^\n\r\u2028\u2029]    Wildcard: almost any character, excluding line feed, carriage return, line separator, and paragraph separator.

2.2. Quantifiers

Quantifiers are also called repetitions. Once you know exactly what {m,n} means, you only need to remember a few shorthand forms.

  • {m,} means at least m occurrences.
  • {m} is equivalent to {m,m}: exactly m occurrences.
  • ? is equivalent to {0,1}: present or absent. Mnemonic: a question mark asks “is it there?”
  • + is equivalent to {1,}: at least one occurrence. Mnemonic: a plus sign means adding, and you need to have one before you can add more.
  • * is equivalent to {0,}: any number of occurrences, including none. Mnemonic: look at the stars in the sky. There may be none, there may be a few scattered ones, or there may be too many to count.
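The shorthand forms listed above can be compared on one sample string. The helper found and the sample are assumptions made up for this sketch.

```swift
import Foundation

// Hypothetical helper: returns every substring of `text` matched by `pattern`.
func found(_ pattern: String, in text: String) -> [String] {
    guard let regex = try? NSRegularExpression(pattern: pattern, options: []) else { return [] }
    let nsText = text as NSString
    return regex.matches(in: text, options: [], range: NSRange(location: 0, length: nsText.length))
        .map { nsText.substring(with: $0.range) }
}

let sample = "ac abc abbc abbbc"
let optional = found("ab?c", in: sample)       // ? is {0,1}
let oneOrMore = found("ab+c", in: sample)      // + is {1,}
let zeroOrMore = found("ab*c", in: sample)     // * is {0,}
let atLeastTwo = found("ab{2,}c", in: sample)  // {m,}
let exactlyThree = found("ab{3}c", in: sample) // {m}

print(optional)     // ["ac", "abc"]
print(oneOrMore)    // ["abc", "abbc", "abbbc"]
print(zeroOrMore)   // ["ac", "abc", "abbc", "abbbc"]
print(atLeastTwo)   // ["abbc", "abbbc"]
print(exactlyThree) // ["abbbc"]
```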

Greedy matching: match as many as possible. With \d{2,5}, given six digits it takes five; given three, it takes three. Within its limit, the more the better.

Lazy matching: match as few as possible. The two snippets below compare them:

// Greedy
let regex = "\\d{2,5}"
let validate = "123, 1234, 12345, 123456"
let result = RegularExpression(regex: regex, validateString: validate)
print(result) // ["123", "1234", "12345", "12345"]

----------------------------------------

// Lazy
let regex = "\\d{2,5}?"
let validate = "123, 1234, 12345, 123456"
let result = RegularExpression(regex: regex, validateString: validate)
print(result) // ["12", "12", "34", "12", "34", "12", "34", "56"]

Lazy matching can be achieved by placing a question mark after the quantifier, so all lazy matching cases are as follows:

{m,n}? {m,}? ?? +? *?

2.3. Multiple branches

A single pattern can achieve horizontal and vertical fuzzy matching, while a multiple-choice branch supports any one of several subpatterns.

The form is (p1|p2|p3), where p1, p2, and p3 are subpatterns separated by | (pipe), meaning any one of them.

For example, to match good and nice, use good|nice. The test is as follows:

let regex = "good|nice"
let validate = "good idea, nice try."
let result = RegularExpression(regex: regex, validateString: validate)
print(result) // ["good", "nice"]

One thing to watch out for: if I use good|goodbye to match the string “goodbye”, the result is “good”:

let regex = "good|goodbye"
let validate = "goodbye"
let result = RegularExpression(regex: regex, validateString: validate)
print(result) // ["good"]

If the regex is changed to goodbye|good, the result is:

let regex = "goodbye|good"
let validate = "goodbye"
let result = RegularExpression(regex: regex, validateString: validate)
print(result) // ["goodbye"]

In other words, the branch structure is also lazy: once an earlier branch matches, the later ones are not tried.

Chapter 2, regular expression position matching guide

Position matching is introduced from the following aspects:

  • 1. What is a position?
  • 2. How to match positions?

1. What is a position?

A position is the place between two adjacent characters, as well as the place before the first character and the place after the last one.

2. How to match positions?

2.1. ^ and $

  • ^ (caret) matches the beginning of the string; in multi-line matching it matches the beginning of each line.
  • $ (dollar sign) matches the end of the string; in multi-line matching it matches the end of each line.

For example, we replace the beginning and end of a string with “#”:

let regex = "^|$"
let validate = "hello"
let result = replace(validateString: validate, regex: regex, content: "#")
print(result) // #hello#
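In NSRegularExpression, the multi-line behaviour of ^ and $ is switched on with the .anchorsMatchLines option. The sketch below compares the two modes; the helper replacing and the three-line sample are assumptions made up for this example.

```swift
import Foundation

// Hypothetical helper: replaces every match of `pattern` in `text` with `template`.
func replacing(_ pattern: String, in text: String, with template: String,
               options: NSRegularExpression.Options = []) -> String {
    guard let regex = try? NSRegularExpression(pattern: pattern, options: options) else { return text }
    return regex.stringByReplacingMatches(in: text, options: [],
                                          range: NSRange(text.startIndex..., in: text),
                                          withTemplate: template)
}

let lines = "I\nlove\nSwift"
// Default: ^ and $ only match at the start and end of the whole string.
let wholeString = replacing("^|$", in: lines, with: "#")
// With .anchorsMatchLines they also match at the start and end of every line.
let perLine = replacing("^|$", in: lines, with: "#", options: .anchorsMatchLines)

print(wholeString) // "#I", "love", "Swift#" on three lines
print(perLine)     // "#I#", "#love#", "#Swift#" on three lines
```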

2.2. \b and \B

\b is a word boundary: the position between \w and \W, which also includes the position between \w and ^ and the position between \w and $.

let regex = "\\b"
let validate = "[JS] Lesson_01.mp4"
let result = replace(validateString: validate, regex: regex, content: "#")
print(result) // [#JS#] #Lesson_01#.#mp4#

First, we know that \w is shorthand for the character group [0-9a-zA-Z_], that is, any letter, digit, or underscore. \W is shorthand for the negated group [^0-9a-zA-Z_], that is, any character other than \w.

Looking at each “#” in the result:

[#JS#] #Lesson_01#.#mp4#

  • The first “#” is flanked by “[” and “J”: the position between \W and \w.
  • The second “#” is flanked by “S” and “]”: the position between \w and \W.
  • The third “#” is flanked by a space and “L”: the position between \W and \w.
  • The fourth “#” is flanked by “1” and “.”: the position between \w and \W.
  • The fifth “#” is flanked by “.” and “m”: the position between \W and \w.
  • The sixth “#” is at the end, and the character before it, “4”, is \w: the position between \w and $.

\B is the opposite of \b: a non-word boundary. If every \b position is removed from all the positions in a string, what remains are the \B positions.

let regex = "\\B"
let validate = "[JS] Lesson_01.mp4"
let result = replace(validateString: validate, regex: regex, content: "#")
print(result)
//#[J#S]# L#e#s#s#o#n#_#0#1.m#p#4

2.3. (?=p) and (?!p)

(?=p), where p is a subpattern, matches the position before p.

For example, (?=l) is the position before the character l:

let regex = "(?=l)"
let validate = "hello"
let result = replace(validateString: validate, regex: regex, content: "#")
print(result) // he#l#lo

(?!p) is the opposite of (?=p): it matches every position that is not followed by p.

let regex = "(?!l)"
let validate = "hello"
let result = replace(validateString: validate, regex: regex, content: "#")
print(result) // #h#ell#o#

3. Examples

Thousands separators for a number

For example, turn “12345678” into “12 345 678”.

let regex = "(?=(\\d{3})+$)"
let validate = "12345678"
let result = replace(validateString: validate, regex: regex, content: " ")
print(result) // 12 345 678

Ideas:

  • 1. First insert a space before the last three digits, using (?=\d{3}$).
  • 2. Because a space is needed every three digits, add the quantifier +, which gives (?=(\d{3})+$).

If the number is 123456789, this splitting leaves an extra space at the beginning; to avoid it we can add (?!^). To make the effect visible, we use # instead of a space:

let regex = "(?=(\\d{3})+$)"
let validate = "123456789"
let result = replace(validateString: validate, regex: regex, content: "#")
print(result) // #123#456#789

let regex = "(?!^)(?=(\\d{3})+$)"
let validate = "123456789"
let result = replace(validateString: validate, regex: regex, content: "#")
print(result) // 123#456#789

Password verification problem

The password is 6 to 12 characters long, consists of digits, lowercase letters, and uppercase letters, and must contain at least two of these character types.

We can do it step by step:

  • 1. The password is 6 to 12 characters long and consists of digits, lowercase letters, and uppercase letters. The regular expression is ^[0-9a-zA-Z]{6,12}$

  • 2. Check whether a certain type of character is included. To require a digit, use the lookahead (?=.*[0-9]). Here .*[0-9] means any number of arbitrary characters followed by a digit; in plain English, the characters that follow must include a digit.

  • 3. To require two types at once, for example digits and lowercase letters, use (?=.*[0-9])(?=.*[a-z])

  • 4. The complete regular expression for this digit-plus-lowercase case is (?=.*[0-9])(?=.*[a-z])^[0-9A-Za-z]{6,12}$
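Step 4 covers only one pair of character types. One way to cover "at least two types" is to repeat the lookahead pair for each combination and join them with |. The sketch below assumes that approach; the names passwordPattern and isValidPassword are made up for illustration.

```swift
import Foundation

// Sketch: the two-lookahead idea repeated for each pair of character types
// (digit + lowercase, digit + uppercase, lowercase + uppercase).
let passwordPattern =
    "((?=.*[0-9])(?=.*[a-z])|(?=.*[0-9])(?=.*[A-Z])|(?=.*[a-z])(?=.*[A-Z]))^[0-9A-Za-z]{6,12}$"

// Hypothetical helper: true when the password satisfies the pattern.
func isValidPassword(_ password: String) -> Bool {
    guard let regex = try? NSRegularExpression(pattern: passwordPattern, options: []) else { return false }
    let range = NSRange(password.startIndex..., in: password)
    return regex.firstMatch(in: password, options: [], range: range) != nil
}

print(isValidPassword("1234567"))   // false: digits only
print(isValidPassword("abcdef"))    // false: lowercase only
print(isValidPassword("ABCDEFGH"))  // false: uppercase only
print(isValidPassword("ab23C"))     // false: fewer than 6 characters
print(isValidPassword("ABCDEF234")) // true: uppercase and digits
print(isValidPassword("abcdEF234")) // true: all three types
```

Note that .caseInsensitive must not be used here, since the lookaheads rely on telling [a-z] and [A-Z] apart.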

Chapter 3, the function of regular expression parentheses

There are parentheses in every language. Regular expressions are also a language, and the presence of parentheses makes it even more powerful.

The contents include:

  • 1. Grouping and branching structure
  • 2. Grouping references
  • 3. Backreference
  • 4. Non-capture grouping

1. Grouping and branching structure

Grouping

We know that a+ matches consecutive occurrences of “a”, and to match consecutive occurrences of “ab”, we need to use (ab)+.

Where parentheses provide grouping functions, so that the quantifier + applies to the whole of ab, as follows

let regex = "(ab)+"
let validate = "ababa abbb ababab"
let result = RegularExpression(regex: regex, validateString: validate)
print(result)
//["abab", "ab", "ababab"]

Branching structure

In the multiple-choice branch structure (p1|p2), the role of the parentheses is self-evident: they enclose all the alternatives of the subexpression.

Match the following strings:

I love Swift
I love Regular Expression

Test the following

let regex = "^I love (Swift|Regular Expression)$"
let validate = "I love Swift"
let result = RegularExpression(regex: regex, validateString: validate)
print(result)
//["I love Swift"]

2. Grouping references

In Swift, the captured groups of a match are available through the range(at:) method of NSTextCheckingResult, and replacement templates can refer to them as $1, $2, and so on. The examples in this section are written in JavaScript.

This is an important function of parentheses: it lets us extract data and perform more powerful substitutions.

To take advantage of it, you must use the API of the host environment.

Take dates, for example. Assuming the format is yyyy-mm-dd, we can start by writing a simple regex:

var regex = /\d{4}-\d{2}-\d{2}/;

Then we change it to a version with parentheses:

var regex = /(\d{4})-(\d{2})-(\d{2})/;

For example, to extract the year, month, and day, you can do this:

var regex = /(\d{4})-(\d{2})-(\d{2})/;
var string = "2017-06-12";
console.log( string.match(regex) );
// => ["2017-06-12", "2017", "06", "12", index: 0, input: "2017-06-12"]

match returns an array: the first element is the overall match, followed by the content matched by each group (in parentheses), then the index of the match, and finally the input text. (Note: the format of the array returned by match differs depending on whether the regex has the g modifier.)

You can also use the exec method of the regex object:

var regex = /(\d{4})-(\d{2})-(\d{2})/;
var string = "2017-06-12";
console.log( regex.exec(string) );
// => ["2017-06-12", "2017", "06", "12", index: 0, input: "2017-06-12"]

We can also use the constructor’s global properties $1 through $9:

var regex = /(\d{4})-(\d{2})-(\d{2})/;
var string = "2017-06-12";

regex.test(string); // the regex operation; regex.exec(string) or string.match(regex) would work as well

console.log(RegExp.$1); // "2017"
console.log(RegExp.$2); // "06"
console.log(RegExp.$3); // "12"

For example, how do you replace the yyyy-mm-dd format with mm/dd/yyyy?

var regex = /(\d{4})-(\d{2})-(\d{2})/;
var string = "2017-06-12";
var result = string.replace(regex, "$2/$3/$1");
console.log(result); // => "06/12/2017"
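The same extraction and replacement can be sketched in Swift with NSRegularExpression: range(at:) on NSTextCheckingResult returns each group, and $1 to $3 work in the replacement template. The variable names below are made up for this example.

```swift
import Foundation

let dateString = "2017-06-12"
let nsDate = dateString as NSString
let dateRange = NSRange(location: 0, length: nsDate.length)
// Force-try is used only because the pattern is a fixed, valid literal.
let dateRegex = try! NSRegularExpression(pattern: "(\\d{4})-(\\d{2})-(\\d{2})", options: [])

// range(at: 0) is the whole match; range(at: 1) and up are the capture groups.
var groups: [String] = []
if let match = dateRegex.firstMatch(in: dateString, options: [], range: dateRange) {
    for index in 0..<match.numberOfRanges {
        groups.append(nsDate.substring(with: match.range(at: index)))
    }
}
print(groups) // ["2017-06-12", "2017", "06", "12"]

// In a replacement template, $1, $2 and $3 refer to the captured groups.
let reordered = dateRegex.stringByReplacingMatches(in: dateString, options: [],
                                                   range: dateRange,
                                                   withTemplate: "$2/$3/$1")
print(reordered) // 06/12/2017
```

For an optional group that did not take part in the match, range(at:) returns a range whose location is NSNotFound, so production code should check for that before calling substring(with:).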

3. Backreference

In addition to referring to groups through the API, you can also refer to groups within the regex itself. Only groups that appear earlier can be referenced, which is why this is called a backreference.

Again, take dates.

Say you want to write a regex that matches any of the following three formats:

2016-06-12
2016/06/12
2016.06.12

The first regex that might come to mind is:

var regex = /\d{4}(-|\/|\.)\d{2}(-|\/|\.)\d{2}/;
var string1 = "2017-06-12";
var string2 = "2017/06/12";
var string3 = "2017.06.12";
var string4 = "2016-06/12";
console.log( regex.test(string1) ); // true
console.log( regex.test(string2) ); // true
console.log( regex.test(string3) ); // true
console.log( regex.test(string4) ); // true

Here / and . need to be escaped. Although the required formats are matched, data such as “2016-06/12” is matched as well.

What if we want the separators to be consistent? Use a backreference:

var regex = /\d{4}(-|\/|\.)\d{2}\1\d{2}/;
var string1 = "2017-06-12";
var string2 = "2017/06/12";
var string3 = "2017.06.12";
var string4 = "2016-06/12";
console.log( regex.test(string1) ); // true
console.log( regex.test(string2) ); // true
console.log( regex.test(string3) ); // true
console.log( regex.test(string4) ); // false

Note the \1, which refers to the earlier group (-|\/|\.). Whatever that group matched (such as -), \1 matches that same specific character.

Now that we know what \1 means, we understand the concepts \2 and \3, which refer to the second and third groups respectively
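Backreferences are also available in NSRegularExpression patterns. A minimal Swift sketch of the consistent-separator check, where the helper name hasMatch is made up for illustration:

```swift
import Foundation

// Hypothetical helper: true when `pattern` matches somewhere in `text`.
func hasMatch(_ pattern: String, in text: String) -> Bool {
    guard let regex = try? NSRegularExpression(pattern: pattern, options: []) else { return false }
    return regex.firstMatch(in: text, options: [], range: NSRange(text.startIndex..., in: text)) != nil
}

// \1 must repeat whatever separator the first group captured.
let consistentSeparator = "\\d{4}(-|/|\\.)\\d{2}\\1\\d{2}"

print(hasMatch(consistentSeparator, in: "2017-06-12")) // true
print(hasMatch(consistentSeparator, in: "2017/06/12")) // true
print(hasMatch(consistentSeparator, in: "2017.06.12")) // true
print(hasMatch(consistentSeparator, in: "2016-06/12")) // false
```

Unlike in a JavaScript regex literal, the / does not need escaping here, because the pattern is an ordinary string rather than a /.../ literal.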

What about nested parentheses?

The order of the opening parentheses decides. For example:

var regex = /^((\d)(\d(\d)))\1\2\3\4$/;
var string = "1231231233";
console.log( regex.test(string) ); // true
console.log( RegExp.$1 ); // 123
console.log( RegExp.$2 ); // 1
console.log( RegExp.$3 ); // 23
console.log( RegExp.$4 ); // 3

We can walk through how the regex matches:

  • The first character is a digit, such as 1,
  • The second character is a digit, such as 2,
  • The third character is a digit, such as 3,
  • Next is \1, the first group: the group opened by the first parenthesis matched 123,
  • Next is \2: the group opened by the second parenthesis matched 1,
  • Next is \3: the group opened by the third parenthesis matched 23,
  • Last is \4: the group opened by the fourth parenthesis matched 3.

4. Non-capture grouping

The groups in the previous sections capture the data they match for later reference, so they are also called capturing groups.

If you only want the basic grouping function of parentheses and will not refer to the group, neither in the API nor as a backreference in the regex, you can use a non-capturing group (?:p). For example, the (ab)+ example from the start of this chapter can be rewritten as:

var regex = /(?:ab)+/g;
var string = "ababa abbb ababab";
console.log( string.match(regex) );
// => ["abab", "ab", "ababab"]
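In Swift the difference shows up in numberOfCaptureGroups: both forms find the same substrings, but only the capturing form records a group. The variable names below are made up for this sketch.

```swift
import Foundation

let sample = "ababa abbb ababab"
let nsSample = sample as NSString
let sampleRange = NSRange(location: 0, length: nsSample.length)
// Force-try is used only because both patterns are fixed, valid literals.
let capturing = try! NSRegularExpression(pattern: "(ab)+", options: [])
let nonCapturing = try! NSRegularExpression(pattern: "(?:ab)+", options: [])

// Both patterns find the same substrings...
let capturingMatches = capturing.matches(in: sample, options: [], range: sampleRange)
    .map { nsSample.substring(with: $0.range) }
let nonCapturingMatches = nonCapturing.matches(in: sample, options: [], range: sampleRange)
    .map { nsSample.substring(with: $0.range) }
print(capturingMatches)    // ["abab", "ab", "ababab"]
print(nonCapturingMatches) // ["abab", "ab", "ababab"]

// ...but only the capturing form records a group that can be referenced later.
print(capturing.numberOfCaptureGroups)    // 1
print(nonCapturing.numberOfCaptureGroups) // 0
```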

Chapter 4, the principle of regular expression backtracking

To learn regular expressions, you need to know a little about how matching works.

And when it comes to how matching works, one word comes up a lot: “backtracking.”

It sounds lofty, and quite a few people are not clear about what it means.

So this chapter explains what backtracking really is.

The contents include:

  • 1. No backtracking matching
  • 2. Matching with backtracking
  • 3. Common forms of backtracking

1. No backtracking matching

Suppose our regex is ab{1,3}c.

When the target string is “abbbc”, there is no backtracking: b{1,3} matches all three “b” characters, and c then matches “c”.

The subexpression b{1,3} means that the character “b” occurs 1 to 3 times in a row.

2. Matching with backtracking

If the target string is “abbc”, there is backtracking.

The step numbers below refer to the figure in the original article, which is not reproduced here. In step 5, marked in red, the match attempt fails: b{1,3} has already matched two “b” characters and tries for a third, only to find that the next character is “c”. b{1,3} is then considered finished, the state returns to the previous one (step 6, the same as step 4), and finally the subexpression c matches the character “c”. The whole expression then matches. Step 6 is the “backtrack”.

3. Common forms of backtracking

The way a regex engine matches strings is known as backtracking. The basic idea of the backtracking method, also called the trial-and-error method, is: starting from a particular state of the problem (the initial state), search all the states reachable from it; when a path comes to a dead end (it cannot go forward), step back one or more steps and continue the search from another possible state, until all paths (states) have been tried. This method of repeatedly going forward and stepping back to find a solution is called backtracking.

It is essentially a depth-first search. The act of returning to an earlier step is called backtracking. As the process above shows, backtracking occurs when the path is blocked: when a match attempt fails, the next step is usually to backtrack.

Greedy quantifiers

The previous examples all used greedy quantifiers. Take b{1,3}: because it is greedy, it tries the possibilities from most to fewest. It first tries “bbb” and checks whether the whole regex matches. If not, it gives back one “b” and tries again with “bb”. If that does not work either, it gives back another and tries again. And if that still fails? Then the match has failed.

let regex = "\\d{1,3}"
let validate = "12345"
let result = RegularExpression(regex: regex, validateString: validate)
print(result) // ["123", "45"]
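One way to observe this giving back is to compare a greedy quantifier with its possessive form, which ICU (the engine behind NSRegularExpression) writes with a trailing + and which never gives characters back. The patterns and the helper matchCount below are assumptions made up for this sketch.

```swift
import Foundation

// Hypothetical helper: counts matches of `pattern` in `text`.
func matchCount(_ pattern: String, in text: String) -> Int {
    guard let regex = try? NSRegularExpression(pattern: pattern, options: []) else { return 0 }
    return regex.numberOfMatches(in: text, options: [], range: NSRange(text.startIndex..., in: text))
}

// Greedy: b{1,3} first takes "bbb", the following b then fails against "c",
// so the engine backtracks to "bb" and the rest of the pattern can match.
let greedy = matchCount("ab{1,3}bc", in: "abbbc")
// Possessive: b{1,3}+ also takes "bbb" but refuses to give any of it back,
// so the following b can never match and the whole match fails.
let possessive = matchCount("ab{1,3}+bc", in: "abbbc")

print(greedy)     // 1
print(possessive) // 0
```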

Lazy quantifiers

A lazy quantifier is a greedy quantifier followed by a question mark; it matches as few as possible. For example:

let regex = "\\d{1,3}?"
let validate = "12345"
let result = RegularExpression(regex: regex, validateString: validate)
print(result) // ["1", "2", "3", "4", "5"]

Branching structure

We know that branches are also lazy: with /can|candy/ and the target string “candy”, the result is “can”, because the branches are tried one by one and, once an earlier one succeeds, the later ones are not tried. In a branch structure, an earlier subpattern may produce a local match; if the rest of the expression then fails to match as a whole, the engine goes on to try the remaining branches. This can also be regarded as a kind of backtracking.
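A small Swift sketch can make the difference visible. The anchored pattern ^(?:can|candy)$ is an illustrative choice made for this example, as is the helper name firstMatchString.

```swift
import Foundation

// Hypothetical helper: returns the first substring matched by `pattern`, if any.
func firstMatchString(_ pattern: String, in text: String) -> String? {
    guard let regex = try? NSRegularExpression(pattern: pattern, options: []),
          let match = regex.firstMatch(in: text, options: [],
                                       range: NSRange(text.startIndex..., in: text))
    else { return nil }
    return (text as NSString).substring(with: match.range)
}

// Unanchored: the first branch "can" succeeds, so "candy" is never tried.
let unanchored = firstMatchString("can|candy", in: "candy")
// Anchored: after "can" the $ fails, so the engine backtracks and tries the "candy" branch.
let anchored = firstMatchString("^(?:can|candy)$", in: "candy")

print(unanchored ?? "no match") // can
print(anchored ?? "no match")   // candy
```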

Chapter 5, regular expression splitting

There are two ways to measure your mastery of a language: reading and writing.

Not only should you be able to solve your own problems, you should also be able to read other people’s solutions. This is true of code, and it is true of regular expressions. The regex language differs a little from other languages in that a regex is usually just a string of characters, with no concept of a “statement”. How to correctly split a long regex into its pieces is the key to deciphering it.

This chapter addresses this problem, including:

  • 1. Structures and operators
  • 2. Points to note
  • 3. Case analysis

1. Structures and operators

  • Literals match a specific character, both unescaped and escaped. For example, a matches the character “a”.
  • Character groups match one character out of several possibilities; for example, [0-9] matches a digit, and \d is its shorthand. There are also negated character groups, which match any character other than the listed ones, such as [^0-9] for a non-digit character, with the shorthand \D.
  • Quantifiers mean that a character appears consecutively; for example, a{1,3} means the character “a” appears one to three times in a row. There are also common shorthands such as a+, meaning “a” appears at least once in a row.
  • Anchors match a position rather than a character. For example, ^ matches the beginning of the string, \b matches a word boundary, and (?=\d) is the position before a digit.
  • Groups use parentheses to mark a whole; for example, (ab)+ means the characters “ab” appear one or more times in a row. A non-capturing group (?:ab)+ can also be used.
  • Branches choose one of several subexpressions; for example, abc|bcd matches the substring “abc” or “bcd”.

Here, let’s analyze a regex:

ab?(c|de*)+|fg

  • 1. Because of the parentheses, (c|de*) is a whole structure.
  • 2. Inside (c|de*), note the quantifier *, so e* is a whole structure.
  • 3. Because the branch operator | has the lowest priority, c is one whole and de* is another.
  • 4. Similarly, the whole regex is divided into a, b?, (...)+, f, and g. And because of the branch, it splits into two parts: ab?(c|de*)+ and fg.
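The breakdown above can be checked against a few sample strings. In this sketch the regex is wrapped in ^(?:...)$ so that it has to match the whole string; the helper matchesWhole and the samples are assumptions made up for illustration.

```swift
import Foundation

// Hypothetical helper: true when the analyzed regex matches the whole of `text`.
func matchesWhole(_ text: String) -> Bool {
    let pattern = "^(?:ab?(c|de*)+|fg)$"
    guard let regex = try? NSRegularExpression(pattern: pattern, options: []) else { return false }
    return regex.firstMatch(in: text, options: [], range: NSRange(text.startIndex..., in: text)) != nil
}

print(matchesWhole("abc"))    // true:  a, b, then (c)
print(matchesWhole("ade"))    // true:  a, no b, then (de)
print(matchesWhole("acdeee")) // true:  a, then (c) and (deee)
print(matchesWhole("fg"))     // true:  the second branch
print(matchesWhole("ab"))     // false: the group must occur at least once
print(matchesWhole("abfg"))   // false: the two branches are alternatives, not a sequence
```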

2. Points to note

Matching the whole string

Because we want to match the entire string, we often add the anchors ^ and $ before and after the regex.

For example, to match the whole target string “abc” or “bcd”, it is easy to carelessly write ^abc|bcd$.

Anchors and character sequences take precedence over the vertical bar, so this regex actually means: match a string that starts with abc or ends with bcd.

let regex = "^abc|bcd$"
let validate = "abc123456"
let result = RegularExpression(regex: regex, validateString: validate)
print(result)
//["abc"]

The correct form is ^(abc|bcd)$
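A short Swift sketch comparing the two forms; the helper name substrings is made up for this example.

```swift
import Foundation

// Hypothetical helper: returns every substring of `text` matched by `pattern`.
func substrings(_ pattern: String, in text: String) -> [String] {
    guard let regex = try? NSRegularExpression(pattern: pattern, options: []) else { return [] }
    let nsText = text as NSString
    return regex.matches(in: text, options: [], range: NSRange(location: 0, length: nsText.length))
        .map { nsText.substring(with: $0.range) }
}

// Without the group, ^ only applies to abc and $ only applies to bcd.
let loose = substrings("^abc|bcd$", in: "abc123456")
// With the group, both anchors apply to each alternative.
let strict = substrings("^(abc|bcd)$", in: "abc123456")
let exactAbc = substrings("^(abc|bcd)$", in: "abc")
let exactBcd = substrings("^(abc|bcd)$", in: "bcd")

print(loose)    // ["abc"]
print(strict)   // []
print(exactAbc) // ["abc"]
print(exactBcd) // ["bcd"]
```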

Quantifier linking problem

Suppose we want to match a string like this:

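  1. Each character is one of a, b, or c
  2. The length of the string is a multiple of 3

The regex cannot simply be written as ^[abc]{3}+$. In NSRegularExpression, which uses ICU syntax, a + directly after {3} does not repeat the group; it makes the quantifier possessive, so the pattern only matches strings of exactly three characters:

let regex = "^[abc]{3}+$"
let validate = "abcaaa"
let result = RegularExpression(regex: regex, validateString: validate)
print(result)
//[]

The correct form is ^([abc]{3})+$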

let regex = "^([abc]{3})+$"
let validate = "abcaaa"
let result = RegularExpression(regex: regex, validateString: validate)
print(result) 
//["abcaaa"]

Metacharacter escape problem

The metacharacters are:

^ $ . * + ? | \ / ( ) [ ] { } = ! : - ,

let regex = "\\^\\$\\.\\*\\+\\?\\|\\\\\\/\\[\\]\\{\\}\\=\\!\\:\\-\\,"
let validate = "^$.*+?|\\/[]{}=!:-,"
let result = RegularExpression(regex: regex, validateString: validate)
print(result)
//["^$.*+?|\\/[]{}=!:-,"]

In a Swift string literal, each of them is escaped with \\.

Match “[abc]” and “{3,5}”

let regex = "\\[abc]"
let validate = "[abc]"
let result = RegularExpression(regex: regex, validateString: validate)
print(result)
//["[abc]"]

You only need to escape the first square bracket, because the following square brackets do not form character groups and the re does not cause ambiguity.

Reprinted from: Full tutorial on regular expressions