What is a regular expression?

A regular expression is an expression that describes a pattern (rule). It is usually used to find and replace text that matches that pattern.

This article has collected some common usages of regular expressions for easy retrieval, and a detailed regular expression syntax manual is attached at the end.

Examples include email addresses, ID numbers, mobile numbers, landline numbers, domain names, IP addresses, dates, postal codes, passwords, Chinese characters, numbers, and strings.

How does Python support regex?

I use Python to implement the regex and use Jupyter Notebook to write the code.

Python supports regular expressions through the re module, which gives the language full regular expression functionality.

Two functions are used throughout this article:

re.compile compiles a regular expression and returns a regular expression (Pattern) object.

findall finds all substrings in a string that match the regular expression and returns them as a list, or an empty list if nothing matches.

    import re
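As a quick warm-up, here is a minimal sketch of the two functions working together. The pattern and the sample sentences are made up for illustration only.

```python
import re

# Compile the pattern once, then reuse the Pattern object.
digits = re.compile(r"\d+")

# findall returns every non-overlapping match as a list of strings.
found = digits.findall("order 66 shipped in 3 boxes")
print(found)    # ['66', '3']

# When nothing matches, the result is an empty list rather than None.
nothing = digits.findall("no numbers here")
print(nothing)  # []
```

Because an empty list is falsy, `if pattern.findall(text):` is a convenient way to check whether anything was found.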


1. Email

An email address may contain uppercase and lowercase letters, digits, underscores, periods, and hyphens.

Expressions:

[a-zA-Z0-9_-]+@[a-zA-Z0-9_-]+(?:\.[a-zA-Z0-9_-]+)

Case study:

    pattern = re.compile(r"[a-zA-Z0-9_-]+@[a-zA-Z0-9_-]+(?:\.[a-zA-Z0-9_-]+)")
    strs = 'My personal email is [email protected], the company email is [email protected], please register?'
    result = pattern.findall(strs)
    print(result)
    ['[email protected]', '[email protected]']
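To see how the expression behaves on concrete input, here is a small sketch. The addresses (all on example.com / example.org) are hypothetical, and the `LOOSER` pattern is my own variant, not part of the original expression. It shows two limits of the expression: the character class has no period, so a dotted local part is cut, and only one dot-separated domain segment is consumed.

```python
import re

EMAIL = re.compile(r"[a-zA-Z0-9_-]+@[a-zA-Z0-9_-]+(?:\.[a-zA-Z0-9_-]+)")

# Simple addresses are extracted as expected.
simple = EMAIL.findall("Write to alice_01@example.com or bob-smith@example.org.")
print(simple)   # ['alice_01@example.com', 'bob-smith@example.org']

# A dotted local part and a multi-level domain are both truncated.
dotted = EMAIL.findall("first.last@mail.example.com")
print(dotted)   # ['last@mail.example']

# Hypothetical looser variant: periods allowed before the @,
# and one or more dot-separated segments after the domain.
LOOSER = re.compile(r"[\w.-]+@[\w-]+(?:\.[\w-]+)+")
looser = LOOSER.findall("first.last@mail.example.com")
print(looser)   # ['first.last@mail.example.com']
```

Neither pattern is a full validator for real-world addresses; they are only meant for pulling likely addresses out of text.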

2. ID number

An ID number has eighteen digits, laid out as xxxxxx yyyy MM dd 375 0:

  • Area: [1-9]\d{5}

  • First two digits of the year: (18|19|([23]\d)), covering 1800-2399

  • Last two digits of the year: \d{2}

  • Month: ((0[1-9])|(10|11|12))

  • Day: (([0-2][1-9])|10|20|30|31); this cannot reject impossible days such as February 30, or February 29 outside a leap year

  • Three-digit sequence code: \d{3}

  • Two-digit sequence code: \d{2}

  • Check digit: [0-9Xx]

Expressions:

[1-9]\d{5}(18|19|([23]\d))\d{2}((0[1-9])|(10|11|12))(([0-2][1-9])|10|20|30|31)\d{3}[0-9Xx]

Case study:

    pattern = re.compile(r"[1-9]\d{5}(?:18|19|(?:[23]\d))\d{2}(?:(?:0[1-9])|(?:10|11|12))(?:(?:[0-2][1-9])|10|20|30|31)\d{3}[0-9Xx]")
    strs = '342623198910235163'
    result = pattern.findall(strs)
    print(result)
    ['342623198910235163']
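The code uses (?:...) groups while the expression is written with plain (...) groups, and that difference matters for findall. When a pattern contains capturing groups, findall returns the group contents instead of the whole match. A short sketch of the difference, using the same sample number:

```python
import re

id_text = "342623198910235163"

capturing = re.compile(
    r"[1-9]\d{5}(18|19|([23]\d))\d{2}((0[1-9])|(10|11|12))"
    r"(([0-2][1-9])|10|20|30|31)\d{3}[0-9Xx]"
)
non_capturing = re.compile(
    r"[1-9]\d{5}(?:18|19|(?:[23]\d))\d{2}(?:(?:0[1-9])|(?:10|11|12))"
    r"(?:(?:[0-2][1-9])|10|20|30|31)\d{3}[0-9Xx]"
)

# With capturing groups, findall yields one tuple of group values per match.
groups = capturing.findall(id_text)
print(groups)   # [('19', '', '10', '', '10', '23', '23')]

# With non-capturing groups, findall yields the full matched text.
whole = non_capturing.findall(id_text)
print(whole)    # ['342623198910235163']

# finditer plus group(0) gives the full text regardless of grouping style.
via_iter = [m.group(0) for m in capturing.finditer(id_text)]
print(via_iter)  # ['342623198910235163']
```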

3. Domestic mobile phone number

Mobile phone numbers have 11 digits and start with 1. The second digit is usually 3, 5, 6, 7, 8, or 9. The remaining nine digits are arbitrary, for example 13987692110 and 15610098778.

Expressions:

1(3|4|5|6|7|8|9)\d{9}

Case study:

    pattern = re.compile(r"1[356789]\d{9}")
    strs = 'My mobile number is 13987692110'  # illustrative sample text
    result = pattern.findall(strs)
    print(result)
    ['13987692110']
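One thing to watch out for when extracting from free text: the pattern will also match 11 digits buried inside a longer number. A possible fix, sketched below, is to wrap it in lookarounds so that no digit may sit directly before or after the match. The `STRICT` pattern and the sample text are my own additions for illustration.

```python
import re

LOOSE = re.compile(r"1[356789]\d{9}")
# (?<!\d) and (?!\d) require that the match is not part of a longer digit run.
STRICT = re.compile(r"(?<!\d)1[356789]\d{9}(?!\d)")

text = "call 13987692110, order no. 2021139876921105"

loose_hits = LOOSE.findall(text)
print(loose_hits)    # ['13987692110', '13987692110']

strict_hits = STRICT.findall(text)
print(strict_hits)   # ['13987692110']
```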

4. Domestic landline

The area code has 3 to 4 digits and the number has 7 to 8 digits.

For example, 0511-1234567, 021-87654321

Expressions:

\d{3}-\d{8}|\d{4}-\d{7}

Case study:

    pattern = re.compile(r"\d{3}-\d{8}|\d{4}-\d{7}")
    strs = "0511-1234567 is Xiao Ming's home phone, and his office phone is 021-87654321"
    result = pattern.findall(strs)
    print(result)
    ['0511-1234567', '021-87654321']

5. Domain name

May begin with http:// or https://

Expressions:

(?:(?:http:\/\/)|(?:https:\/\/))?(?:[\w](?:[\w\-]{0,21}[\w])?\.)+[a-zA-Z]{2,6}(?:\/)

Case study:

    pattern = re.compile(r"(?:(?:http:\/\/)|(?:https:\/\/))?(?:[\w](?:[\w\-]{0,21}[\w])?\.)+[a-zA-Z]{2,6}(?:\/)")
    strs = 'https://www.python.org/'
    result = pattern.findall(strs)
    print(result)
    ['https://www.python.org/']

6. IP address

An IPv4 address is 32 bits long (2^32 addresses in total). Each address is divided into four 8-bit segments. Each segment is written in decimal, ranges from 0 to 255, and is separated from the next by a period (.).

Expressions:

((?:(?:25[0-5]|2[0-4]\d|[01]?\d?\d)\.){3}(?:25[0-5]|2[0-4]\d|[01]?\d?\d))

Case study:

    pattern = re.compile(r"((?:(?:25[0-5]|2[0-4]\d|[01]?\d?\d)\.){3}(?:25[0-5]|2[0-4]\d|[01]?\d?\d))")
    strs = """Please enter valid IP addresses; illegal IP addresses and other characters will be filtered!
    After adding, deleting, or changing an IP address, save or close Notepad!
    192.168.8.84
    192.168.8.85
    192.168.8.86
    0.0.0.1
    256.1.1.1
    192.256.256.256
    192.255.255.255
    aa.bb.cc.dd"""
    result = pattern.findall(strs)
    print(result)
    ['192.168.8.84', '192.168.8.85', '192.168.8.86', '0.0.0.1', '56.1.1.1', '192.255.255.255']
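Note that the result contains '56.1.1.1', which is the tail of the invalid address 256.1.1.1. If that is not what you want, one option is to forbid a digit or period directly before and after the match. The sketch below is my own variant under that assumption; `OCTET`, `LOOSE`, and `STRICT` are illustrative names.

```python
import re

# One decimal segment in the range 0-255, same alternatives as the expression above.
OCTET = r"(?:25[0-5]|2[0-4]\d|[01]?\d?\d)"

LOOSE = re.compile(rf"(?:{OCTET}\.){{3}}{OCTET}")
# Lookarounds reject matches that are glued to another digit or period.
STRICT = re.compile(rf"(?<![\d.])(?:{OCTET}\.){{3}}{OCTET}(?![\d.])")

text = "192.168.8.84 256.1.1.1 192.256.256.256 0.0.0.1"

loose_ips = LOOSE.findall(text)
print(loose_ips)    # ['192.168.8.84', '56.1.1.1', '0.0.0.1']

strict_ips = STRICT.findall(text)
print(strict_ips)   # ['192.168.8.84', '0.0.0.1']
```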

7. Date

Common date formats are yyyyMMdd, yyyy-MM-dd, yyyy/MM/dd, and yyyy.MM.dd. The expression below covers the three forms that use a separator.

Expressions:

\d{4}(?:-|\/|\.)\d{1,2}(?:-|\/|\.)\d{1,2}

Case study:

    pattern = re.compile(r"\d{4}(?:-|\/|\.)\d{1,2}(?:-|\/|\.)\d{1,2}")
    strs = 'Today is 2020/12/20, last year was 2019.12.20, and next year is 2021-12-20'
    result = pattern.findall(strs)
    print(result)
    ['2020/12/20', '2019.12.20', '2021-12-20']
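A regex like this only checks the shape of a date, so something like 2019.02.30 still matches. If real calendar validity matters, one possible approach is to pass each match through datetime.strptime afterwards. The helper name `real_dates` and the sample text are hypothetical.

```python
import re
from datetime import datetime

DATE = re.compile(r"\d{4}(?:-|\/|\.)\d{1,2}(?:-|\/|\.)\d{1,2}")

def real_dates(text):
    """Return the regex matches that are also valid calendar dates."""
    valid = []
    for candidate in DATE.findall(text):
        # Normalize the separators so a single strptime format is enough.
        normalized = re.sub(r"[/.]", "-", candidate)
        try:
            datetime.strptime(normalized, "%Y-%m-%d")
        except ValueError:
            continue  # right shape, but not a real date
        valid.append(candidate)
    return valid

checked = real_dates("2020/12/20, 2019.02.30, 2021-2-9")
print(checked)  # ['2020/12/20', '2021-2-9']
```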

8. Domestic postal code

China’s postal code uses a four-level, six-digit structure. The first two digits represent the province (or municipality directly under the central government, or autonomous region), the third digit represents the postal area, the fourth digit represents the county, and the last two digits represent the delivery office.

Expressions:

[1-9]\d{5}(?!\d)

Case study:

    pattern = re.compile(r"[1-9]\d{5}(?!\d)")
    strs = '200040'
    result = pattern.findall(strs)
    print(result)
    ['200040']
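The (?!\d) at the end is a negative lookahead: it stops the match from being followed by another digit. It does not stop the match from starting in the middle of a longer number, though. A small sketch of that, with a lookbehind added as one possible fix (the `BOUNDED` pattern and the sample text are my own):

```python
import re

ORIGINAL = re.compile(r"[1-9]\d{5}(?!\d)")
# (?<!\d) additionally requires that no digit comes right before the match.
BOUNDED = re.compile(r"(?<!\d)[1-9]\d{5}(?!\d)")

text = "zip 200040, tracking 1234567"

original_hits = ORIGINAL.findall(text)
print(original_hits)  # ['200040', '234567']

bounded_hits = BOUNDED.findall(text)
print(bounded_hits)   # ['200040']
```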

9. Password

Password (must start with a letter, be 6 to 18 characters long, and contain only letters, digits, and underscores)

Expressions:

[a-zA-Z]\w{5,17}

Strong password (must start with a letter, contain uppercase and lowercase letters and digits, cannot use special characters, and must be between 8 and 10 characters in length)

Expressions:

[a-zA-Z](?=.*\d)(?=.*[a-z])(?=.*[A-Z]).{8,10}

Case study:

    pattern = re.compile(r"[a-zA-Z]\w{5,17}")
    strs = 'pwd: q123456_abc'
    result = pattern.findall(strs)
    print(result)
    ['q123456_abc']

    pattern = re.compile(r"[a-zA-Z](?=.*\d)(?=.*[a-z])(?=.*[A-Z]).{8,10}")
    strs = 'q123456ABc, q123456abc'  # a strong password, then a weak one
    result = pattern.findall(strs)
    print(result)
    # '.' also matches the comma, so the greedy .{8,10} includes it
    ['q123456ABc,']
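For checking a single password (rather than searching inside text), fullmatch is usually a better fit than findall. The sketch below is my own stricter variant, not the expression above: it assumes the 8 to 10 length should count the first letter and replaces `.` with letters and digits only, so special characters are rejected. `STRONG` and `is_strong` are illustrative names.

```python
import re

# Lookaheads require a digit, a lowercase and an uppercase letter somewhere;
# then one leading letter plus 7-9 letters/digits gives 8-10 characters total.
STRONG = re.compile(r"(?=.*\d)(?=.*[a-z])(?=.*[A-Z])[a-zA-Z][a-zA-Z0-9]{7,9}")

def is_strong(password):
    """True only if the whole string satisfies the strong-password rule."""
    return STRONG.fullmatch(password) is not None

print(is_strong("q123456ABc"))   # True
print(is_strong("q123456abc"))   # False, no uppercase letter
print(is_strong("q1234_ABc"))    # False, underscore is a special character
print(is_strong("1q23456ABc"))   # False, does not start with a letter
```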

10. Chinese characters

Expressions:

[\u4e00-\u9fa5]

Case study:

    pattern = re.compile(r"[\u4e00-\u9fa5]")
    strs = 'apple:苹果'
    result = pattern.findall(strs)
    print(result)
    ['苹', '果']
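The expression matches one character at a time, so findall returns each character separately. If you would rather get whole words, adding a + quantifier is one simple option. A short sketch with made-up sample text:

```python
import re

text = "apple:苹果, pear:梨"

# Without a quantifier, each Chinese character is its own match.
single_chars = re.findall(r"[\u4e00-\u9fa5]", text)
print(single_chars)  # ['苹', '果', '梨']

# With +, consecutive Chinese characters are grouped into one match.
runs = re.findall(r"[\u4e00-\u9fa5]+", text)
print(runs)          # ['苹果', '梨']
```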

11. Numbers

  • Digits only: ^[0-9]*$

  • Exactly n digits: ^\d{n}$

  • At least n digits: ^\d{n,}$

  • Between m and n digits: ^\d{m,n}$

  • Zero, or a number that starts with a non-zero digit: ^(0|[1-9][0-9]*)$

  • Positive real numbers with two decimal places: ^[0-9]+(\.[0-9]{2})?$

  • Positive real numbers with 1-3 decimal places: ^[0-9]+(\.[0-9]{1,3})?$

  • Non-zero positive integers: ^\+?[1-9][0-9]*$

  • Non-zero negative integers: ^\-[1-9][0-9]*$

  • Non-negative integers (positive integers + 0): ^\d+$

  • Non-positive integers (negative integers + 0): ^((-\d+)|(0+))$

  • Integers: ^-?\d+$

  • Non-negative floating-point numbers (positive floating-point + 0): ^\d+(\.\d+)?$

  • Positive floating-point numbers: ^(([0-9]+\.[0-9]*[1-9][0-9]*)|([0-9]*[1-9][0-9]*\.[0-9]+)|([0-9]*[1-9][0-9]*))$

  • Non-positive floating-point numbers (negative floating-point + 0): ^((-\d+(\.\d+)?)|(0+(\.0+)?))$

  • Negative floating-point numbers: ^(-(([0-9]+\.[0-9]*[1-9][0-9]*)|([0-9]*[1-9][0-9]*\.[0-9]+)|([0-9]*[1-9][0-9]*)))$

  • Floating-point numbers: ^(-?\d+)(\.\d+)?$
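These are validation patterns, so they are meant to be applied to a whole string. In Python, one detail is easy to miss: `$` also matches just before a trailing newline, so `match` with `^...$` accepts "42\n". Using fullmatch avoids that. Below is a minimal sketch; the `NUMBER_PATTERNS` table and the `check` helper are hypothetical names, and the patterns are a few of the ones listed above with the anchors dropped because fullmatch already anchors both ends.

```python
import re

INTEGER = re.compile(r"^-?\d+$")

# `$` matches before a final newline, so match() lets "42\n" through.
print(bool(INTEGER.match("42\n")))      # True
# fullmatch() requires the entire string, newline included, to match.
print(bool(INTEGER.fullmatch("42\n")))  # False

NUMBER_PATTERNS = {
    "non-negative integer": r"\d+",
    "integer": r"-?\d+",
    "two decimal places": r"[0-9]+(\.[0-9]{2})?",
}

def check(kind, value):
    """Validate value against one of the named patterns."""
    return re.fullmatch(NUMBER_PATTERNS[kind], value) is not None

print(check("integer", "-7"))                # True
print(check("non-negative integer", "-7"))   # False
print(check("two decimal places", "3.14"))   # True
print(check("two decimal places", "3.1"))    # False
```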

12. Strings

  • Letters and digits: ^[A-Za-z0-9]+$ or ^[A-Za-z0-9]{4,40}$

  • Any characters, 3 to 20 in length: ^.{3,20}$

  • Letters only (the 26 English letters): ^[A-Za-z]+$

  • Uppercase letters only: ^[A-Z]+$

  • Lowercase letters only: ^[a-z]+$

  • Letters and digits only: ^[A-Za-z0-9]+$

  • Digits, letters, or underscores: ^\w+$ or ^\w{3,20}$

  • Chinese, English letters, and digits, including underscores: ^[\u4E00-\u9FA5A-Za-z0-9_]+$

  • Chinese, English letters, and digits, but no underscores or other symbols: ^[\u4E00-\u9FA5A-Za-z0-9]+$ or ^[\u4E00-\u9FA5A-Za-z0-9]{2,20}$

  • Strings that contain none of the characters %&',;=? or the double quote: [^%&',;=?\x22]+

  • Strings that contain no ~ and no double quote: [^~\x22]+
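Two of these behave in ways that are worth trying out in Python 3. First, \w is Unicode-aware by default, so it also accepts Chinese characters unless the re.ASCII flag is passed. Second, the bracket patterns that start with ^ are negated classes, so they describe which characters are forbidden. A short sketch with made-up sample strings:

```python
import re

# By default \w matches Unicode word characters, including Chinese.
unicode_ok = re.fullmatch(r"\w{3,20}", "用户_01") is not None
print(unicode_ok)   # True

# With re.ASCII, \w is limited to [a-zA-Z0-9_].
ascii_cjk = re.fullmatch(r"\w{3,20}", "用户_01", re.ASCII) is not None
ascii_plain = re.fullmatch(r"\w{3,20}", "user_01", re.ASCII) is not None
print(ascii_cjk)    # False
print(ascii_plain)  # True

# A negated class: the string passes only if it has none of the listed characters.
FORBIDDEN_FREE = re.compile(r"[^%&',;=?\x22]+")
clean = FORBIDDEN_FREE.fullmatch("plain text") is not None
quoted = FORBIDDEN_FREE.fullmatch('say "hi"') is not None
print(clean)        # True
print(quoted)       # False, \x22 is the double quote
```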

Attached: regular expression syntax details

Read it carefully and practice, and all of this will soon feel familiar. Ha ha!