Note: This article is intended for readers who already have some programming experience. It will help you pick up Python syntax quickly so you can start having fun with it. If you are starting from zero, working through a book step by step is the safest route.

Preface

I happened to see some interesting Python projects on Zhihu, which made me a little curious about Python. There are still two months to go before my internship in Beijing, so I can learn something in this free time. If I can build a small tool that turns out to be helpful at work, why not?

I will not go into environment installation and IDE setup here, since there are many tutorials available online. Here is a blog post you can follow: VSCode to build a Python development environment. I mainly use VSCode because it is free and has a large number of plug-ins available to download, so you can customize your IDE as much as you like. If you haven't used VSCode before, it's a good idea to learn about the essential plug-ins to improve your coding experience. For example, the Python plug-in is recommended.

Once the environment is set up, you can have fun typing code. In VSCode you need to create a Python file yourself, with the .py suffix. Press Ctrl+F5 to run the program and F5 to debug it.

Python basics

Comments

Single-line comment: #

Multi-line comment: ''' (three single quotation marks at the beginning and three single quotation marks at the end)

# This is a one-line comment

'''
This is a multi-line comment.
'''

Variables

A Python variable definition does not need to specify the data type explicitly; just write [variable name = value]. Note that variable names are case-sensitive. For example, name and Name are not the same variable.

name = "Wang"
print(name)  # Output: Wang

Data types

Python provides six basic data types: number, string, list, tuple, dictionary, and set. Numbers in turn come in three types: int, float, and complex.

We will leave lists, tuples and the other containers for the containers section, and look at the numeric types first.

Floating-point type

The floating-point type represents decimals. We create a floating-point variable and use the type function to see its type:

pi = 3.1415926
print(type(pi))  # Output: <class 'float'>

The integer type (int) represents whole numbers.

Complex type

A complex number consists of a real part and an imaginary part.

x = 10+1.2j  # the imaginary part ends with j or J
print(type(x))  # Output: <class 'complex'>

When I first came across complex numbers, I wondered why such a type exists and what it is actually used for, so I did a search:

Mzy0324: Calculations in microelectronics are basically all complex-number calculations.

Hilevel: At least complex numbers are much more convenient than matrices for calculating rotations of vectors. They should be useful in scientific computing and physics. PS: I often use Python as a programmable calculator, which is handy for debugging purely mathematical algorithms.

Morris88: One major area of use for Python is scientific computing, mainly in aerospace, banking, etc.

Complex numbers are mostly used in algorithms, scientific research and similar fields, so clearly I have no need for them for now.

Strings

A string is defined by enclosing text in a pair of double or single quotation marks. For example:

x = "Hello Python"
y = 'Hello Python'
print(x, y)  # Output: Hello Python Hello Python

Built-in string methods:

Method                            Role
find(str[, start, end])           Looks for the substring str in the string and returns its index (-1 if not found); the optional arguments start and end limit the search range
count(str[, start, end])          Counts the occurrences of the substring str in the string; the optional arguments start and end limit the range
replace(old, new[, count])        Replaces the substring old with new; the optional argument count is the number of replacements, all by default
split(sep[, maxsplit])            Splits the string on the delimiter sep and returns a list; the optional argument maxsplit is the number of splits, all by default
upper(), lower()                  Change case
join(sequence)                    Joins the elements of sequence into one string, using the string it is called on as the separator
startswith(prefix[, start, end])  Returns a bool indicating whether the string begins with prefix; endswith checks the ending in the same way
strip([chars])                    Removes whitespace (including \n and \t) from both ends of the string; the optional argument specifies other characters to remove instead
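
To make the table more concrete, here is a small sketch that exercises each method once. The sample string and the variable names are made up purely for illustration.

```python
# Sample text invented for this illustration
s = "  hello python, hello world  "

stripped = s.strip()                           # remove whitespace at both ends
position = stripped.find("python")             # index of the first match, -1 if absent
occurrences = stripped.count("hello")          # number of non-overlapping matches
replaced = stripped.replace("hello", "hi", 1)  # replace only the first match
words = stripped.split(" ")                    # split on single spaces
joined = "-".join(["a", "b", "c"])             # "-" is the separator
starts = stripped.startswith("hello")          # True
ends = stripped.endswith("world")              # True

print(stripped.upper())
print(position, occurrences, replaced)
print(words, joined, starts, ends)
```

Note that join is called on the separator string, not on the list, which tends to surprise people coming from other languages.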

Boolean type

A quick word about Booleans: unlike in Java, the Boolean values True and False must begin with a capital letter:

x = True
print(type(x))  # Output: <class 'bool'>

Type conversion

Having covered a few basic data types, we should also mention type conversion. Python has some built-in functions for it:

Function  Role
int(x)    Converts x to an integer (converting a decimal drops the fractional part)
float(x)  Converts x to a floating-point number
str(x)    Converts x to a string
tuple(x)  Converts x to a tuple
list(x)   Converts x to a list
set(x)    Converts x to a set, removing duplicates
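
A quick sketch of these conversions in action; the variable names and sample values are my own and only serve as illustration.

```python
whole = int(3.99)                 # 3: the fractional part is dropped, not rounded
parsed = int("42")                # 42: a numeric string becomes an integer
decimal = float("1.5")            # 1.5
text = str(10)                    # "10"
as_tuple = tuple([1, 2, 2])       # (1, 2, 2)
as_list = list("abc")             # ['a', 'b', 'c']: a string splits into characters
as_set = set([1, 2, 2, 3])        # {1, 2, 3}: duplicates are removed

print(whole, parsed, decimal, text)
print(as_tuple, as_list, as_set)
```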

Input and output

The input function is input. It returns whatever the user types as a string, so if you expect a number, remember to convert it.

x = input("Please enter a number")
print(type(x), x)  # entering 10 prints: <class 'str'> 10

As demonstrated many times already, the print function prints variables and values directly. To print several values at once, separate them with commas, as in the example above, which prints both the type and the value. By default print ends its output with a newline (end="\n"); if you do not want the newline, pass end="". If you want to output special characters, you may need the escape character: \.

x = 10
y = 20
print(x, y, end="")  # prints 10 20; with end="" no newline is added
print("Hello \\n Python")  # Output: Hello \n Python

When printing, you can also format the output: %s for strings, %d for integers, and %f for floating-point numbers.

z = 1.2
print("%f" % z)  # Output: 1.200000

In addition to formatting, %d, etc. can also be used as placeholders:

name = "Xiao Ming"
age = 18
print("Name: %s, age: %d" % (name, age))  # Output: Name: Xiao Ming, age: 18

If you would rather not deal with typed placeholders, you can also use the format function, where each placeholder is simply a pair of {}:

print("Name: {}, age: {}".format(name, age))  # Output: Name: Xiao Ming, age: 18

Operators

Arithmetic operators

In addition to addition, subtraction, multiplication and division, there are exponentiation (**), modulo (%), and floor division (//).

x = 3 ** 2  # x is 9, that is, 3 to the power of 2
y = 5 % 3  # y is 2: 5 divided by 3 leaves a remainder of 2
z = 5 // 2  # z is 2: 5 divided by 2 is 2.5, and the integer part is 2

Comparison operators

These are basically the same as in other common programming languages: not equal to (!=), greater than or equal to (>=), equal to (==), and so on.

Assignment operators

Python also supports +=, *=, and other compound assignments. The exponentiation, modulo and floor division operators mentioned above have compound forms too, such as floor division assignment (//=) and modulo assignment (%=).

x = 10
x %= 3
print(x)  # Output: 1; x %= 3 is equivalent to x = x % 3

Logical operators

not, and, or

x = True
print(not x)  # Output: False

if, while, for

These three are basically the same as in other programming languages, just written differently. First, the braces are gone and each condition or loop header ends with a colon. Second, code blocks have strict indentation requirements: without curly braces, indentation is how Python determines where a block begins and ends. The rest is basically the same: continue skips the rest of the current iteration, and break exits the loop entirely. Here are three simple examples:

a = 10
# if or else is followed by a colon, and the code block needs to be indented
if a >= 10:
    print("Hello, boss.")
else:
    print("Fuck off")
    
# Similarly, a colon is required after while, and the code block must be indented. (Python has no num++; use num += 1)
# print ends with a newline by default (end="\n"); end="" suppresses it
# " " * (j - i) produces j - i spaces
i = 1
j = 4
while i <= j:
    print(" " * (j - i), end="")
    n = 1
    while n <= 2*i-1:
        print("*", end="")
        n += 1
    print("")
    i += 1

# range(1, 21) produces the integers from 1 to 20 (the end value is excluded)
# continue skips the rest of the current iteration, break exits the entire loop
for i in range(1, 21):
    if i % 2 == 0:
        if i % 10 == 0:
            continue
        if i >= 15:
            break
        print(i)

Containers

Lists

A list is defined with a pair of [], with elements separated by commas. The elements do not have to be of the same type, and individual elements are accessed by index. Let's look at the following code:

info_list = ["Little red", 18, "Male"]  # elements need not be the same type
info_list[2] = "Female"  # modify the element at the specified index
del info_list[1]  # delete the element at the specified index
info_list.remove("Female")  # delete the specified value from the list
for att in info_list:  # iterate over the elements
    print(att)

The sample code above demonstrates only part of what lists can do. Here are some other common functions and syntax:

Function or syntax           Role
list.append(element)         Adds an element to the end of the list (the element can itself be a list)
list.insert(index, element)  Inserts an element at the specified position
list.extend(new_list)        Adds all elements of new_list to the list
list.pop([index])            Removes and returns the last element; with the optional index, removes and returns the element at that position
list.sort([reverse=True])    Sorts the list; the optional argument reverse=True sorts in descending order
list[start:end]              Slices the list; start and end are the start and end indexes
list1 + list2                Concatenates two lists
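
Here is a short sketch that walks through the table entries in order. The list contents are arbitrary sample values, and the comments show the state of the list after each step.

```python
nums = [3, 1, 2]
nums.append(5)            # [3, 1, 2, 5]
nums.insert(0, 9)         # [9, 3, 1, 2, 5]
nums.extend([7, 8])       # [9, 3, 1, 2, 5, 7, 8]
last = nums.pop()         # removes and returns 8
second = nums.pop(1)      # removes and returns 3, leaving [9, 1, 2, 5, 7]
nums.sort(reverse=True)   # [9, 7, 5, 2, 1]
middle = nums[1:3]        # [7, 5]: start is included, end is excluded
combined = nums + [0]     # a new list; nums itself is unchanged

print(last, second, nums, middle, combined)
```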

Tuples

Tuples are defined with a pair of (). Tuples are also ordered; they differ from lists in that the elements of a list can be modified, whereas those of a tuple cannot. Because of this, tuples also take up less memory than lists.

name_list = ("Little red", "Wang")
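
The two claims above (tuples cannot be modified and are smaller than lists) can be checked with a small sketch. The size comparison relies on sys.getsizeof, which I am adding here for illustration; the exact byte counts depend on the Python version and are an implementation detail of CPython.

```python
import sys

name_tuple = ("Little red", "Wang")
print(name_tuple[0])              # reading by index works just like a list

try:
    name_tuple[0] = "Zhang"       # tuples do not support item assignment
except TypeError as err:
    modified = False
    print("Cannot modify a tuple:", err)
else:
    modified = True

# Compare the size of a tuple and a list holding the same elements
tuple_size = sys.getsizeof((1, 2, 3))
list_size = sys.getsizeof([1, 2, 3])
print(tuple_size, list_size)
```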

Dictionaries

Dictionaries are defined with a pair of {}, and the elements are key-value pairs. Examples of usage are as follows:

user_info_dict = {"name": "Wang", "age": "18", "gender": "Male"}
name = user_info_dict["name"]  # get the value by key
age = user_info_dict.get("age")  # you can also get the value with get(key)
user_info_dict["tel"] = "13866663333"  # if the key does not exist, a new key-value pair is added; if it exists, its value is updated
del user_info_dict["tel"]  # delete the specified key-value pair

Those are the common syntax and functions. Dictionaries can also be traversed, but when traversing you need to specify whether you are iterating over the keys or the values, for example:

for k in user_info_dict.keys():  # iterate over all keys
    print(k)
for v in user_info_dict.values():  # iterate over all values
    print(v)
for item in user_info_dict.items():  # iterate over key-value pairs directly
    print(item)

Sets

Sets are unordered and are also defined with a pair of {}, but instead of key-value pairs they hold individual, non-repeating elements. Some usage is as follows:

user_id_set = {"1111", "22222", "3333"}  # elements do not repeat
print(type(user_id_set))  # Output: <class 'set'>
# Instead of using {} directly, you can pass a sequence to the set function, which removes duplicates and returns a set (a string is split into characters)
id_list = ["1111", "1111", "22222"]
new_user_id_set = set(id_list)  # {"1111", "22222"}

Here is a table showing some of the common functions and syntax:

Function or syntax                      Role
element in set                          Checks whether the element is in the set and returns a Boolean
element not in set                      Checks whether the element is not in the set
set.add(element)                        Adds an element to the set
set.update(list, ...)                   Adds every element of the given sequence to the set, dropping duplicates; multiple sequences are separated by commas
set.remove(element)                     Deletes the specified element; an error is raised if it does not exist
set.discard(element)                    Deletes the specified element; no error is raised if it does not exist
set.pop()                               Removes an arbitrary element from the set and returns it
set1 & set2 or set1.intersection(set2)  Finds the intersection of two sets; both forms give the same result
set1 | set2 or set1.union(set2)         Finds the union of two sets
set1 - set2 or set1.difference(set2)    Finds the difference of two sets; order matters, and set1 - set2 means the elements that set1 has and set2 does not
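
A short sketch of the set operations in the table, using made-up sample values. It also shows the difference between remove and discard when the element is missing.

```python
set1 = {1, 2, 3}
set2 = {3, 4}

set1.add(5)                   # {1, 2, 3, 5}
set1.update([2, 6], (7,))     # several sequences at once: {1, 2, 3, 5, 6, 7}
set1.discard(100)             # element is missing, no error

remove_failed = False
try:
    set1.remove(100)          # element is missing, raises KeyError
except KeyError:
    remove_failed = True

common = set1 & set2          # same as set1.intersection(set2)
everything = set1 | set2      # same as set1.union(set2)
only_in_set1 = set1 - set2    # same as set1.difference(set2)

print(3 in set1, 4 not in set1)
print(common, everything, only_in_set1, remove_failed)
```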

Functions

Defining a function

Functions in Python are defined with the def keyword:

def function_name(parameter_list):  # the parameter list may be empty; multiple parameters are separated by commas
    function body
    return return_value  # optional

# Function call
function_name(argument_list)

Default parameters

As with loop bodies, strict indentation is required because there are no braces. Besides the common form above, Python functions support default parameters, that is, parameters with default values. When you call such a function you may omit the argument, in which case the default value is used; if you do pass a value, the value passed in is used.

def num_add(x, y=10):  # y is a default parameter; if only x is passed, y defaults to 10
    return x + y

Named parameters

In general, when a function is called, its arguments are passed in the order of the parameter list. With named parameters, the arguments are passed by parameter name instead, so they do not have to follow the order in which the parameters are defined.

def num_add(x, y):
    print("x:{},y:{}".format(x, y))
    return x+y
# output:
# x:10,y:5
# 15
print(num_add(y=5, x=10))

Variable-length parameters

A variable-length parameter can accept any number of arguments. Python has two ways to declare one: 1. Put one * before the parameter; the arguments passed in are collected into a tuple. 2. Put two asterisks (**) before the parameter; the arguments are received as key-value pairs.

# One *
def eachNum(*args):
    print(type(args))
    for num in args:
        print(num)
# output:
# <class 'tuple'>
# 1, 2, 3, 4, 5 (each number on its own line)
eachNum(1, 2, 3, 4, 5)

# Two **. This example also shows that ordinary parameters can be used together with variable-length parameters
def user_info(other, **info):
    print(type(info))
    print("Other information: {}".format(other))
    for key in info.keys():
        print("{}, {}".format(key, info[key]))
# Pass the arguments as named parameters; there is no need to wrap them in curly braces like a dictionary
# output:
# <class 'dict'>
# Other information: Administrator
# (remaining output omitted)...
user_info("Administrator", name="Zhao four", age=18, gender="Male")

As noted in the comments in the sample code above, when calling a function with variable-length parameters you pass the arguments in directly; there is no need to wrap them the way a dictionary or tuple definition would. Sometimes, however, you want to pass the elements of an existing dictionary, tuple or other container to a variable-length parameter, and for that you need unpacking.

Unpacking simply means putting one or two * in front of the container when passing it in. Using the user_info function above as an example:

user_info_dict = {"name": "Zhao four", "age": 18, "gender": "Male"}
user_info("Administrator", **user_info_dict)  # same effect as above

Note that if the receiving function declares its variable-length parameter with a single *, the argument passed in can only be unpacked with a single *.
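
To see what the single * does at the call site, here is a minimal sketch. The helper count_args is hypothetical and exists only to show how many arguments the function actually receives.

```python
def count_args(*args):
    """Hypothetical helper: report how many positional arguments arrived."""
    return len(args)

numbers = (1, 2, 3)

unpacked = count_args(*numbers)   # same as count_args(1, 2, 3)
not_unpacked = count_args(numbers)  # the whole tuple arrives as one argument

print(unpacked, not_unpacked)
```

Without the *, the tuple is not spread out; it is simply treated as a single argument and ends up nested inside args.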

Anonymous functions

Anonymous functions are functions that have no name. Defining one requires neither a name nor the def keyword. The syntax is as follows:

lambda parameter_list: expression

Multiple parameters are separated by commas, and the anonymous function automatically returns the result of the expression. In use, an anonymous function is usually assigned to a variable or passed directly as an argument.

add = lambda x, y: x + y  # named add rather than sum so the built-in sum is not shadowed
print(add(1, 2))  # Output: 3

Closures and decorators

In Python you can also define a function inside another function. The enclosing function is called the outer function, and the function defined inside it is called the inner function. When the outer function returns a reference to the inner function, the result is called a closure. The inner function can use variables of the outer function. Let's look at an example:

# Outer function
def sum_closure(x):
    # Inner function
    def sum_inner(y):
        return x+y
    return sum_inner  # return the inner function
# Get the inner function
var1 = sum_closure(1)
print(var1)  # Output: <function sum_closure.<locals>.sum_inner at 0x000001D82900E0D0>
print(var1(2))  # Output: 3

With closures out of the way, let's look at decorators. You may be familiar with AOP, aspect-oriented programming. In plain terms, it means running some shared functionality, such as logging or permission checks, before and after a target function. Python of course has a way to do aspect-oriented programming: decorators. Decorators, built on closures, achieve this kind of functionality very flexibly, as shown in the following example:

import datetime  # datetime is part of the standard library, so no installation is needed
# Outer function whose parameter is the target function
def log(func):
    # Inner function; its parameters must match those of the target function. Variable-length parameters can also be used to make it more flexible
    def do(x, y):
        # Pretend to write a log entry as the aspect logic. The first datetime is a module, the second is a class, and now is a method (modules are covered in the next section)
        print("Time: {}".format(datetime.datetime.now()))
        print("Log")
        # Execute the target function
        func(x, y)
    return do

# @ is the syntactic sugar for decorators; log is the outer function
@log
def something(x, y):
    print(x+y)

# Call the target function
# output:
# Time: 2021-01-06 16:17:00.677198
# Log
# 30
something(10, 20)
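
The comment in the example mentions that variable-length parameters make a decorator more flexible. Below is a minimal sketch of that idea, so one decorator can wrap functions with different signatures. The functions add and greet are hypothetical examples, and functools.wraps is an addition of mine from the standard library, not something the example above uses.

```python
import functools

def log(func):
    @functools.wraps(func)              # keep the target function's name and docstring
    def wrapper(*args, **kwargs):
        print("Calling {}".format(func.__name__))
        # Pass every argument through and hand the result back to the caller
        return func(*args, **kwargs)
    return wrapper

@log
def add(x, y):
    return x + y

@log
def greet(name, greeting="Hello"):
    return "{}, {}".format(greeting, name)

print(add(10, 20))
print(greet("Wang", greeting="Hi"))
```

Unlike the fixed do(x, y) version, this wrapper also returns the target function's result, so decorated functions can still be used in expressions.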

So much for functions. There is still some material I have not covered, such as variable scope and return values. That part is almost the same as in any other language, except that the type of the return value is not fixed: the function definition does not declare a return type, as experienced programmers will have noticed.

Packages and modules

Packages

The difference between a package and a normal folder in Python is that a package contains an __init__.py file, which identifies it as a package. This file can be empty, or it can define some initialization operations. When a module in another package imports a module from this package, the contents of __init__.py are executed automatically.

Modules

A Python file is a module. Modules in different packages can have the same name; they are distinguished as "package_name.module_name". Other modules are imported with the import keyword, as shown in the earlier sample code. To import multiple modules, either separate them with commas or write separate import statements. Besides importing an entire module, you can also import specific functions or classes from it:

from module_name import func_name  # or class_name

After importing a function or class this way, you no longer need the module name; simply call the imported function or class directly.
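
As a concrete illustration of the two import styles, here is a small sketch using the standard library's math module (my choice of module, picked only because it is always available).

```python
import math                   # import the whole module
from math import sqrt, pi     # import specific names from the module

via_module = math.sqrt(16)    # called through the module name
direct = sqrt(16)             # called directly, no module prefix needed

print(via_module, direct, pi)
```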

Object-oriented programming

Classes and objects

Python is an object-oriented, interpreted programming language. Object orientation is all about classes and objects. Classes in Python are defined using the class keyword, as follows:

class ClassName:
    def method_name(self[, parameter_list]):
        ...

A function defined inside a class is called a method, simply to distinguish it from functions outside the class; the name itself is not important. Every method in a class has a default first parameter, self, which represents the current object; you can think of it as this in Java. Because a class can create multiple objects, self is how Python knows which object it is operating on. We do not pass self manually when calling the method. Sample code:

class Demo:
    def do(self):
        print(self)
# Create two objects of type Demo
demo1 = Demo()
demo1.do()  # Output: <__main__.Demo object at 0x0000019C78106FA0>
demo2 = Demo()
demo2.do()  # Output: <__main__.Demo object at 0x0000019C77FE8640>
print(type(demo1))  # Output: <class '__main__.Demo'>

Constructors

A constructor initializes an object of a class when the object is created. In Python the constructor is named __init__ (two underscores on each side). When an object is created, __init__ is executed automatically. As with normal methods, a custom constructor must also accept the self parameter. Sample code:

class Demo:
    # Other parameters can be passed in as well
    def __init__(self, var1, var2):
        # Store the parameters on the current object, even though the class declares no such attributes
        self.var1 = var1
        self.var2 = var2
        print("Initialization completed")
    def do(self):
        print("Working...")
# Pass the arguments through the constructor
demo1 = Demo(66, 77)
demo1.do()
# Read the values back from the object
print(demo1.var1)
print(demo1.var2)

Access permissions

Java and C# have several access levels. In Python, an attribute or method is private if its name begins with two underscores, and public otherwise. Private attributes and methods can only be accessed from inside the class's methods, not from outside. As in other languages, the purpose of private members is to protect the accuracy and safety of the data. Example code is as follows:

class Demo:
    # For ease of understanding, we declare a private attribute explicitly
    __num = 10
    # Public method; the check inside ensures the data stays valid
    def do(self, temp):
        if temp > 10:
            self.__set(temp)
    # Private setter; outside code cannot set the attribute directly
    def __set(self, temp):
        self.__num = temp
    # Public getter
    def get(self):
        print(self.__num)

demo1 = Demo()
demo1.do(11)
demo1.get()  # Output: 11

All the self. prefixes can be a little confusing at first; just think of self as this.

Inheritance

Inheritance is another powerful tool in object-oriented programming, and one of its benefits is code reuse. A subclass can only inherit the public attributes and methods of its parent class. The Python syntax is as follows:

class SonClass(FatherClass):
    pass

After creating a SonClass object, we can call the public methods of FatherClass directly on that object. Python also supports multiple inheritance: list the parent classes in the parentheses, separated by commas.

If you want to call a parent-class method from inside a subclass, there are two ways: 1. ParentClassName.method_name(self[, argument list]), where self is the subclass's own self and must be passed in explicitly; 2. super().method_name(). The second way does not name a parent class, so with multiple inheritance, if several parent classes define a method with that name, Python executes the one from the parent class listed first in the parentheses.

If a subclass defines a method with the same name as a parent-class method, the subclass's method overrides the parent's.
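
The rules above (overriding, the two ways of calling a parent method, and the order used with multiple inheritance) can be sketched as follows. The class names Father, Mother and Son are hypothetical and chosen only for this illustration.

```python
class Father:
    def hello(self):
        return "hello from Father"

class Mother:
    def hello(self):
        return "hello from Mother"

class Son(Father, Mother):              # multiple inheritance, Father listed first
    def hello(self):                    # overrides the parents' hello
        return "hello from Son"

    def parent_hello_by_name(self):
        # Way 1: name the parent class explicitly and pass self in
        return Mother.hello(self)

    def parent_hello_by_super(self):
        # Way 2: no parent named, so the first parent in the parentheses wins
        return super().hello()

son = Son()
print(son.hello())
print(son.parent_hello_by_name())
print(son.parent_hello_by_super())
```

Naming the parent class is the only one of the two ways that lets you pick Mother here, since super() resolves to Father first.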

Exception handling

Catch exceptions

The syntax for catching exceptions is as follows:

try:
    code block  # code that may raise an exception
except (ExceptionType, ...) as err:  # separate multiple exception types with commas; with a single type the parentheses can be omitted. err is an alias for the caught exception
    exception handling
finally:
    code block  # executed no matter what

Inside a try block, the code after the line that raised the error is not executed, but code outside the try...except is unaffected. Take a look at an example:

try:
    open("123.txt")  # raises an exception because the file does not exist
    print("hi")  # this line will not execute
except FileNotFoundError as err:
    print("Exception: {}".format(err))  # exception handling

print("I'm code outside of try except")  # executes normally

None of this differs much from other languages, but as someone new to Python, how would I know which exception types exist? Is there a catch-all like Java's Exception type? There is: Python also provides the Exception type, which catches all ordinary exceptions.

What if an exception occurs in code that is not wrapped in try except? Either an error is reported and the program stops running, or the exception is caught by an outer try except; in other words, exceptions propagate. If func1 raises an exception without catching it, and func2 calls func1 inside a try except, then func1's exception is passed up to func2. This is similar to throws in Java.
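
The func1 and func2 scenario can be sketched like this. The function bodies are invented for illustration; int("abc") is just a convenient way to trigger a ValueError.

```python
def func1():
    # Raises ValueError; nothing in func1 catches it
    return int("abc")

def func2():
    try:
        return func1()
    except ValueError as err:
        # The exception raised inside func1 arrives here
        return "func2 caught: {}".format(err)

result = func2()
print(result)
```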

Throwing exceptions

The keyword for throwing an exception in Python is raise, which is similar to throw new in Java. Example code is as follows:

def do(x):
    if x > 3:  # throw an exception if x is greater than 3
        raise Exception("No more than three.")  # if you know the specific exception type, use it, and put the error message in the parentheses
    else:
        print(x)

try:
    do(4)
except Exception as err:
    print("Exception: {}".format(err))  # Output: Exception: No more than three.

File operations

Read and write files

To manipulate a file, you first have to open it. Python has a built-in function called open. You can open a file in three modes: read-only (r, the default, which only reads the file content), write (w, which overwrites the original content), and append (a, which adds new content at the end). The following is an example:

f = open("text.txt", "a")  # get a file object in append mode

Here text.txt is in the same directory as the code, so only the file name is written. If it is not in the same directory, you need to give a relative or absolute path.

Once you have the file object, you can operate on it through a handful of methods.

f = open("text.txt", "a", encoding="utf-8")  # open the file in append mode and set the encoding, since Chinese text will be written next
f.write("234567\n")  # write data; the trailing \n is a newline character
f.writelines(["Zhang San\n", "Zhao four\n", "Wang Wu\n"])  # write takes a single string, while writelines takes a list of strings
f.close()  # remember to close the file when you are done

Those are the two ways to write to a file. Remember to close the file at the end: written data is buffered, and it can be lost if the system crashes before it reaches the disk. Although the file is closed automatically when the program exits, a real project will contain far more than these few lines. Python thoughtfully provides a safe way to open files in case we forget to close them, with open() as alias:, as shown in the following example:

with open("test.txt", "w") as f:  # opens the file safely; no explicit close is needed
    f.write("123")

That covers writing; now for reading. The following is an example:

f = open("text.txt", "r", encoding="utf-8")
data = f.read()  # read reads everything at once
print(data)
f.close()

Besides reading everything at once, we can also read the content line by line into a list so that we can traverse it. The method is readlines, as shown in the following example:

f = open("text.txt", "r", encoding="utf-8")
lines = f.readlines()  # lines is a list
for line in lines:
    print(line)
f.close()

File management

Working with files involves more than reading and writing: you also need to delete, rename and create them, and so on. Before you can manage files with Python functions, you need to import the os module: import os. Below is a brief demonstration of the rename function; the other functions are presented in a table.

import os
os.rename("text.txt", "123.txt")  # rename text.txt to 123.txt

Function          Role
os.remove(path)   Deletes the specified file
os.mkdir(path)    Creates a new folder at the specified path
os.getcwd()       Gets the absolute path of the current working directory
os.listdir(path)  Lists the files and folders in the specified path
os.rmdir(path)    Deletes the empty folder at the specified path (an error is raised if it is not empty)
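
Here is a sketch that strings these functions together. To avoid touching real files it works inside a throwaway directory created with tempfile, and it builds paths with os.path.join; both are additions of mine from the standard library, and the folder and file names are made up.

```python
import os
import tempfile

# Work inside a temporary directory so nothing real is touched
with tempfile.TemporaryDirectory() as base:
    folder = os.path.join(base, "demo")
    os.mkdir(folder)                          # create a new folder

    old_path = os.path.join(folder, "text.txt")
    new_path = os.path.join(folder, "123.txt")
    with open(old_path, "w", encoding="utf-8") as f:
        f.write("hello")

    os.rename(old_path, new_path)             # rename the file
    listing = os.listdir(folder)              # names of the entries in the folder

    os.remove(new_path)                       # the folder must be empty before rmdir
    os.rmdir(folder)
    folder_exists = os.path.exists(folder)

print(listing, folder_exists)
print(os.getcwd())
```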

Working with JSON

Having learned about containers earlier, you will notice that JSON looks a lot like a Python dictionary, with key-value pairs. The format is similar, but there are some minor differences: Python tuples and lists both become arrays in JSON, Python's True and False become lowercase true and false, and None becomes null. Now let's look at some specific functions.

Working with JSON data in Python requires importing the json module. Again, I will only demonstrate one function here; the other commonly used ones are listed in a table.

import json
user_info = {"name": "Zhang", "age": 18, "gender": "Male", "hobby": ("Sing", "Dance", "Play basketball"), "other": None}  # create a dictionary
json_str = json.dumps(user_info, ensure_ascii=False)  # dumps converts the dictionary to a JSON string
# Output: {"name": "Zhang", "age": 18, "gender": "Male", "hobby": ["Sing", "Dance", "Play basketball"], "other": null}
print(json_str)

Note that if the data contains Chinese characters, you need to pass ensure_ascii=False to dumps; otherwise they are written as \u escape sequences.

Function                    Role
json.loads(json_str)        Converts a JSON string into a Python data structure
json.dump(user_info, file)  Writes Python data to a JSON file; file is a file object, so you need to open the file first
json.load(file)             Reads a JSON file into a Python data structure; this also requires opening the file first
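
The functions in the table can be tried out with a short round trip. In this sketch an in-memory io.StringIO buffer stands in for a real file object (an assumption made to keep the example self-contained; a file opened with open() works the same way), and the sample dictionary is invented.

```python
import io
import json

user_info = {"name": "Zhang", "hobby": ("Sing", "Dance"), "vip": True, "other": None}

json_str = json.dumps(user_info)     # Python data -> JSON string
restored = json.loads(json_str)      # JSON string -> Python data

buffer = io.StringIO()               # in-memory stand-in for a file object
json.dump(user_info, buffer)         # write JSON to the "file"
buffer.seek(0)                       # rewind before reading
from_file = json.load(buffer)        # read JSON back from the "file"

print(json_str)
print(restored)
```

Notice that the tuple does not survive the round trip: after loading, the hobby value is a list.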

So much for JSON. JSON is not the only common data format; there are also XML, CSV, and others. To save space I will not go into them here, but you can look up the corresponding APIs as needed.

Regular expressions

The last section covers regular expressions, first because they are basic knowledge that is useful in many places, and second because the hands-on web crawler project later will certainly use regular expressions to parse all kinds of data.

Python has the built-in re module for handling regular expressions, and with it we can easily perform all kinds of pattern matching on strings. The main function is re.match(pattern, string), where pattern is the regular expression and string is the string to be matched. It returns a Match object if the match succeeds, and None otherwise. Matching starts at the beginning of the string and proceeds from left to right; if the beginning does not match, None is returned. The following is an example:

import re
res = re.match("asd", "asdabcqwe")  # matches if the string begins with asd (returns None if asd is not at the beginning)
print(res)  # Output: <re.Match object; span=(0, 3), match='asd'>
print(res.group())  # Output: asd; use group() to get the matched substring

To round things out, here are some of the rules of regular expressions.

Single character matching

Single character matching, as the name implies, matches one character. In addition to using a specific character directly, you can use the following symbols:

Symbol  Role
.       Matches any single character except "\n"
\d      Matches a digit between 0 and 9, equivalent to [0-9]
\D      Matches a non-digit character, equivalent to [^0-9]
\s      Matches any whitespace character, such as space, \t, \n, etc.
\S      Matches any non-whitespace character
\w      Matches a word character: letters, digits, and underscores
\W      Matches a non-word character
[]      Matches any one of the characters listed in [], such as [abc], which matches if any one of these three letters appears

For those of you who have never worked with regular expressions and don’t know how to use them, let me give you a quick demonstration. Suppose I want to match three characters: the first is a number, the second is a space, and the third is a letter. Let’s see how to write this regular expression:

import re
pattern = r"\d\s\w"  # \d matches a digit, \s matches whitespace, and \w matches a word character (here a letter)
string = "2 z Hello"
res = re.match(pattern, string)
print(res.group())  # output: 2 z

You might look at this and think: it's a lot of trouble to spell out every single character. Is there a more flexible way? Of course there is. Keep reading.

Quantifiers

What if we just want to match letters, but don't want to limit how many there are? Take a look at the table below:

Symbol   Role
*        Matches the preceding character zero or more times, equivalent to {0,}
+        Matches the preceding character at least once, equivalent to {1,}
?        Matches the preceding character zero or one time, equivalent to {0,1}
{m}      Matches the preceding character exactly m times
{m,}     Matches the preceding character at least m times
{m,n}    Matches the preceding character m to n times

If a quantifier is followed by ?, it matches as few characters as possible. This is called non-greedy mode; without the ?, the default is greedy mode. For example, {m,} matches as many characters as possible, while {m,}? matches as few as possible, as long as there are at least m. The same goes for the other quantifiers.
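The difference between greedy and non-greedy mode is easiest to see side by side. This is a minimal sketch; the sample strings are made up.

```python
import re

text = "aaaaaa"
greedy = re.match(r"a{2,}", text).group()  # takes as many a's as it can
lazy = re.match(r"a{2,}?", text).group()   # stops as soon as it has 2
print(greedy)  # aaaaaa
print(lazy)    # aa

# The same idea with * and *? on a string containing two tags.
html = "<b>one</b><b>two</b>"
greedy_tag = re.match(r"<b>.*</b>", html).group()
lazy_tag = re.match(r"<b>.*?</b>", html).group()
print(greedy_tag)  # <b>one</b><b>two</b>
print(lazy_tag)    # <b>one</b>
```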

For example, I want to match any number of lowercase letters at the beginning, followed by 1 to 5 digits in the range 2-6, and at least one whitespace character at the end:

import re
pat = r"[a-z]*[2-6]{1,5}\s+"
text = "abc423 hello"
res = re.match(pat, text)
print(res)  # <re.Match object; span=(0, 7), match='abc423 '>

The r at the beginning of the pat string marks it as a raw string, which tells Python not to process the \ in it as an escape character, so the backslashes reach the regular expression unchanged. [a-z] stands for any lowercase letter. The reason \w is not used is that \w also includes digits and underscores, which does not strictly meet our requirements. The * after it means any number of repetitions. Note the logical relationship between single character matching and quantifiers here: [a-z]* means [a-z] repeated any number of times, so each position can be a different letter, not one particular letter repeated. Once you understand this logic, the rest follows.
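Here is a short sketch of what the r prefix changes and of how [a-z]* behaves. The variable names are only for illustration.

```python
import re

# In a normal string "\n" is one character (a newline); in a raw string it is two.
normal_len = len("\n")
raw_len = len(r"\n")
print(normal_len, raw_len)  # 1 2

# "\\d" and r"\d" are the same two-character pattern; the raw form is easier to read.
same_pattern = "\\d" == r"\d"
print(same_pattern)  # True

# [a-z]* repeats the whole character class, so the letters may all be different.
letters = re.match(r"[a-z]*", "abc423").group()
print(letters)  # abc
```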

I made up all the examples so far, but with what we've learned we can now write an expression that is actually useful, like matching a mobile phone number. There are three rules: the number has 11 digits, the first digit must be 1, and the second digit must be one of 3, 5, 7 or 8. Knowing these three rules, let's write the expression: 1[3578]\d{9}. It looks right, but think about it: matching goes from left to right and returns as soon as the pattern is satisfied, without checking that the whole string matches. If the string has 10 digits after the first two, the expression still matches. The next section deals with this problem.

Boundaries

There are two boundary symbols: ^ for the start of the string and $ for the end. With them, the phone number expression above becomes ^1[3578]\d{9}$. The ^ is optional here: re.match always goes from the beginning, and a string that doesn't start with 1 will return None anyway. The $ terminator, however, is required.
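The effect of the end anchor can be checked with a sketch like the following. Both phone numbers are made up, and re.fullmatch is shown only as an alternative that needs no anchors.

```python
import re

loose = r"1[3578]\d{9}"     # no end anchor
strict = r"^1[3578]\d{9}$"  # anchored at both ends

valid = "13812345678"       # 11 digits (invented number)
too_long = "138123456789"   # 12 digits

loose_valid = bool(re.match(loose, valid))
loose_long = bool(re.match(loose, too_long))    # first 11 digits match, the rest is ignored
strict_valid = bool(re.match(strict, valid))
strict_long = bool(re.match(strict, too_long))

print(loose_valid, loose_long)    # True True
print(strict_valid, strict_long)  # True False

# re.fullmatch requires the whole string to match the pattern.
full_long = bool(re.fullmatch(loose, too_long))
print(full_long)  # False
```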

Escape character

What if the character we want to match is one that has a special meaning in regular expressions? Let's say we want to match a literal . character, but . stands for any character in a regular expression. This is where the escape character \ comes in. In fact, this escape character works the same way in many languages. The literal dot can then be written as \. Let's look at another example, matching a mailbox:

import re
pat = r"^\w{4,10}@qq\.com"  # Without the \ in front of it, the . would match any character
text = "[email protected]"
res = re.match(pat, text)
print(res)
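To see what the backslash before the dot buys us, this sketch compares the escaped and unescaped patterns. The address hello123@qq.com is hypothetical, chosen only so that the name part has between 4 and 10 word characters.

```python
import re

escaped = r"^\w{4,10}@qq\.com"
unescaped = r"^\w{4,10}@qq.com"

good = "hello123@qq.com"  # hypothetical address
bad = "hello123@qqxcom"   # an "x" where the dot should be

escaped_good = re.match(escaped, good) is not None
escaped_bad = re.match(escaped, bad) is not None
unescaped_bad = re.match(unescaped, bad) is not None

print(escaped_good)   # True
print(escaped_bad)    # False
print(unescaped_bad)  # True, because the bare . accepts the "x"
```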

Grouping

If I want to match more than just QQ email addresses, what should I do? That's where grouping comes in, which can match multiple situations. The grouping symbols are as follows:

Symbol          Role
(ab)            Treats the contents of the parentheses as a group; groups are numbered starting from 1
|               Joins multiple expressions in an “or” relationship; can be used together with ()
\num            References the content matched by group number num
(?P<name>…)     Gives the group an alias; the alias is written before the expression and is not quoted
(?P=name)       References the content matched by the group with the alias name

Then we modify the example above slightly: ^\w{4,10}@(qq|163|outlook|gmail)\.com. This allows it to match multiple mail providers.
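A minimal sketch of this alternation in use, with invented addresses. group(1) returns whichever alternative inside the parentheses matched.

```python
import re

pat = r"^\w{4,10}@(qq|163|outlook|gmail)\.com"

# Hypothetical addresses; the last one uses a provider that is not in the group.
addresses = [
    "hello123@qq.com",
    "hello123@163.com",
    "hello123@gmail.com",
    "hello123@yahoo.com",
]

providers = []
for addr in addresses:
    m = re.match(pat, addr)
    providers.append(m.group(1) if m else None)
    print(addr, "->", providers[-1])
```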

That was a simple demonstration of |. You may still have questions about the other grouping symbols, so let's demonstrate them below:

import re
pat = r"<(.+)><(.+)>.*<(/\2)><(/\1)>"  # \2 and \1 refer back to what groups 2 and 1 matched
text = "<body><div></div></body>"
res = re.match(pat, text)
print(res)

This expression matches an HTML string made up of two nested tags. It looks like a bit of a hassle at first glance, but it's actually quite simple. Again, ordinary characters can be used in an expression as they are, such as the < and > above.

Let's examine this expression. Each pair of parentheses denotes a group, and the .+ inside matches at least one character other than \n. The .* in the middle matches the content between the tags. In /\2, the slash is the one that starts a closing HTML tag, and \2 refers to the content matched by the second group. Why do we use group references here? Because we also want to make sure that the HTML tags pair up correctly. If the closing tags also used .+, you could swap /div and /body in the string and the expression would still match, but that is clearly not valid HTML.
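The same check can also be written with named groups, (?P<name>…) and (?P=name), which the table lists but the example does not use. This is a sketch: the aliases outer and inner and the sample strings are made up, and \w+ is used for tag names instead of .+ as a simplification.

```python
import re

# Named groups: the closing tags must repeat the names captured earlier.
named = r"<(?P<outer>\w+)><(?P<inner>\w+)>.*</(?P=inner)></(?P=outer)>"
# Without back-references, any closing tags are accepted.
loose = r"<(.+)><(.+)>.*</.+></.+>"

ok = "<body><div>hi</div></body>"
swapped = "<body><div>hi</body></div>"

m = re.match(named, ok)
outer_name = m.group("outer")
inner_name = m.group("inner")
print(outer_name, inner_name)  # body div

named_swapped = re.match(named, swapped)
loose_swapped = re.match(loose, swapped)
print(named_swapped)               # None
print(loose_swapped is not None)   # True
```

The loose pattern accepts the swapped closing tags, which is exactly the problem the group references are there to prevent.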

Functions

Here are some Python functions for working with regular expressions (re is the imported module):

Function                        Role
re.compile(patt)                Compiles the regular expression and returns a pattern object
re.search(patt, string)         Searches from left to right for the first substring matching the regular expression
re.findall(patt, string)        Finds all substrings matched by the regular expression and returns them as a list
re.finditer(patt, string)       Finds all substrings matched by the regular expression and returns an iterator of Match objects
re.sub(patt, newstr, string)    Replaces the substrings matched by the regular expression with newstr and returns the new string; the original string is unchanged
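The functions in the table can be tried together on one string. This is a sketch with an invented sample text, not a complete tour of the re module.

```python
import re

text = "tel: 123, fax: 4567, ext: 89"

# compile once, then reuse the pattern object's methods
number = re.compile(r"\d+")

first = re.search(r"\d+", text)   # first match anywhere in the string
print(first.group())              # 123

all_numbers = re.findall(r"\d+", text)
print(all_numbers)                # ['123', '4567', '89']

# finditer yields Match objects, so positions are available too
starts = [m.start() for m in re.finditer(r"\d+", text)]
print(starts)                     # [5, 15, 26]

masked = re.sub(r"\d+", "#", text)
print(masked)                     # tel: #, fax: #, ext: #
print(text)                       # the original string is unchanged

# The compiled object offers the same operations as methods.
compiled_numbers = number.findall(text)
print(compiled_numbers == all_numbers)  # True
```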

That's the end of my first Python article. Next, I'll keep learning and writing, build some fun Python projects, and share them here. If you spot any mistakes, please point them out. Thank you!

Reference: Python 3 Quickstart and In Action