Processes and threads

Multicore CPUs are very common today, but even the single-core CPUs of the past could run multiple tasks. Since a CPU executes code sequentially, how can a single-core CPU multitask?

The answer is that the operating system takes turns running the tasks: task 1 runs for 0.01 seconds, then the system switches to task 2, which runs for 0.01 seconds, then to task 3 for 0.01 seconds, and so on. Strictly speaking the tasks run alternately, but the CPU is so fast that it feels as if they were all running at the same time.

To the operating system, a task is a process. Opening a browser starts a browser process, opening Notepad starts a Notepad process, opening two Notepads starts two Notepad processes, and opening Word starts a Word process.

Some processes do more than one thing at a time. Word, for example, can handle typing, spell checking, and printing simultaneously. To do more than one thing at a time, a process has to run several “sub-tasks” at once. We call these sub-tasks within a process threads.

Since every process has to do at least one thing, a process has at least one thread. A complex process like Word can have many threads, and they can run at the same time. Multithreading works the same way as multiprocessing: the operating system switches quickly between threads so that each runs briefly in turn and they all appear to run simultaneously. Truly simultaneous execution of multiple threads requires a multicore CPU.

What if we need to perform multiple tasks at once?

There are two solutions:

One is to start multiple processes. Although each process has only one thread, multiple processes can perform multiple tasks together.

Another option is to start a single process and start multiple threads inside it, so that the threads perform the tasks together.

There is also a third way: start multiple processes, each of which starts multiple threads, so that even more tasks run at the same time. This model is more complex and rarely used in practice.

To summarize, multitasking can be achieved in three ways:

Multi-process mode;
Multithreaded mode;
Multi-process + multi-thread mode.

Tasks that run at the same time are usually not unrelated; they need to communicate and coordinate with each other. Sometimes task 1 must pause and wait for task 2 to finish before it can continue, and sometimes task 3 and task 4 must not run at the same time. As a result, multi-process and multi-threaded programs are far more complex than the single-process, single-threaded programs we wrote before.

Multiple processes

Let’s first look at what the operating system provides.

The Unix/Linux operating system provides a very special system call, fork(). A normal function is called once and returns once, but fork() is called once and returns twice, because the operating system automatically copies the current process (the parent process) into a new one (the child process) and then returns in both the parent and the child.

In the child process fork() always returns 0, and in the parent process it returns the ID of the child. The reason is that a parent can fork many children, so it needs to remember the ID of each child, whereas a child can get its parent’s ID simply by calling getppid().

Python’s os module wraps common system calls, including fork, which makes it easy to create child processes in Python programs:

import os

print('Process (%s) start...' % os.getpid())
# Only works on Unix/Linux/Mac:
pid = os.fork()
if pid == 0:
    print('I am child process (%s) and my parent is %s.' % (os.getpid(), os.getppid()))
else:
    print('I (%s) just created a child process (%s).' % (os.getpid(), pid))

The running results are as follows:

Process (876) start...
I (876) just created a child process (877).
I am child process (877) and my parent is 876.

Because Windows does not have a fork call, the above code will not run on Windows.

With fork, a process that receives a new task can copy itself into a child process to handle it. The Apache server is a common example: the parent process listens on the port and forks a child process to handle each new HTTP request as it comes in.

multiprocessing

Since Python is cross-platform, it naturally provides cross-platform multi-process support as well. The multiprocessing module is the cross-platform multi-process module.

The multiprocessing module provides a Process class to represent a process object. The following example starts a child process and waits for it to end:

from multiprocessing import Process
import os

# Code to be executed by the child process:
def run_proc(name):
    print('Run child process %s (%s)...' % (name, os.getpid()))

if __name__=='__main__':
    print('Parent process %s.' % os.getpid())
    p = Process(target=run_proc, args=('test',))
    print('Child process will start.')
    p.start()
    p.join()
    print('Child process end.')

The result is as follows:

Parent process 928.
Child process will start.
Run child process test (929)...
Child process end.

Creating a child process this way is easier than using fork(): pass in the function to execute and its arguments, create a Process instance, and start it with the start() method.

The join() method waits for the child process to finish before continuing, and is typically used for synchronization between processes.

Pool

If you want to start a large number of child processes, you can use the process pool to create a batch of child processes:

from multiprocessing import Pool
import os, time, random

def long_time_task(name):
    print('Run task %s (%s)...' % (name, os.getpid()))
    start = time.time()
    time.sleep(random.random() * 3)
    end = time.time()
    print('Task %s runs %0.2f seconds.' % (name, (end - start)))

if __name__=='__main__':
    print('Parent process %s.' % os.getpid())
    p = Pool(4)
    for i in range(5):
        p.apply_async(long_time_task, args=(i,))
    print('Waiting for all subprocesses done...')
    p.close()
    p.join()
    print('All subprocesses done.')

The result is as follows:

Parent process 669.
Waiting for all subprocesses done...
Run task 0 (671)...
Run task 1 (672)...
Run task 2 (673)...
Run task 3 (674)...
Task 2 runs 0.14 seconds.
Run task 4 (673)...
Task 3 runs 0.86 seconds.
Task 0 runs 1.41 seconds.
Task 4 runs 1.91 seconds.
All subprocesses done.

Calling join() on a Pool object waits for all child processes to finish. close() must be called before join(), and no new tasks can be submitted after close() has been called.

Note that tasks 0, 1, 2, and 3 start immediately, while task 4 has to wait for one of the earlier tasks to finish. This is because the Pool was created with a size of 4, so at most 4 processes run at the same time. This is a deliberate limit of Pool, not a limit of the operating system. If you change it to:

p = Pool(5)

You can run five processes at once.

Since the default size of a Pool is the number of CPU cores, if you are unlucky enough to have an 8-core CPU, you will have to submit at least 9 child processes to see the waiting effect above.
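The core count that the default pool size is based on can be checked directly. This is a minimal sketch using only the standard library; the name default_workers is made up for illustration and simply mirrors the idea of "CPU count, or 1 if unknown":

```python
import os
import multiprocessing

# Number of logical CPU cores as seen by Python (None if it cannot be determined).
cores = os.cpu_count()

# multiprocessing.cpu_count() reports the core count as well, but raises
# NotImplementedError instead of returning None.
mp_cores = multiprocessing.cpu_count()

# Hypothetical helper value: roughly what Pool() uses when no size is given.
default_workers = cores or 1
print('cores:', cores, 'default pool size (approx.):', default_workers)
```

On an 8-core machine this would print 8, which is why 9 tasks are needed there before one of them has to wait.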

Subprocesses

Most of the time, the child process is not a copy of our own program but an external program. After creating the child process, we also need to control its input and output.

The subprocess module makes it very easy to start a child process and then control its inputs and outputs.

The following example shows how to run the command nslookup www.python.org from Python code, which has the same effect as running it directly from the command line:

import subprocess

print('$ nslookup www.python.org')
r = subprocess.call(['nslookup', 'www.python.org'])
print('Exit code:', r)

Running results:

$ nslookup www.python.org
Server:		192.168.19.4
Address:	192.168.19.4#53

Non-authoritative answer:
www.python.org	canonical name = python.map.fastly.net.
Name:	python.map.fastly.net
Address: 199.27.79.223

Exit code: 0

If the child process also needs input, it can be supplied through the communicate() method:

import subprocess

print('$ nslookup')
p = subprocess.Popen(['nslookup'], stdin=subprocess.PIPE, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
output, err = p.communicate(b'set q=mx\npython.org\nexit\n')
print(output.decode('utf-8'))
print('Exit code:', p.returncode)

The above code is equivalent to executing the command nslookup from the command line and manually typing:

set q=mx
python.org
exit

The running results are as follows:

$ nslookup
Server:		192.168.19.4
Address:	192.168.19.4#53

Non-authoritative answer:
python.org	mail exchanger = 50 mail.python.org.

Authoritative answers can be found from:
python.org	internet address = 82.94.164.166
mail.python.org	has AAAA address 2001:888:2000:d::a6

Exit code: 0
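The same input/output control can also be written with subprocess.run(), which starts the child, sends it input, and collects its output in one call. The sketch below is only an illustration: to avoid depending on nslookup being installed, it uses the current Python interpreter as the external program, and the little child_code script is an assumption made up for the example.

```python
import subprocess
import sys

# Stand-in for an external program: it reads stdin and echoes it in upper case.
child_code = 'import sys; print(sys.stdin.read().upper(), end="")'

result = subprocess.run(
    [sys.executable, '-c', child_code],
    input='set q=mx\npython.org\n',   # text sent to the child's stdin
    capture_output=True,              # collect stdout and stderr
    text=True,                        # work with str instead of bytes
)
print(result.stdout)
print('Exit code:', result.returncode)
```

With text=True the input and output are ordinary strings, so no explicit encode() or decode() is needed.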

Interprocess communication

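Processes certainly need to communicate with each other, and the operating system provides many mechanisms for doing so. Python’s multiprocessing module wraps the underlying mechanisms and provides Queue, Pipe, and other ways to exchange data. Taking Queue as an example, the parent process below creates two child processes, one that writes data to the Queue and one that reads from it: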

from multiprocessing import Process, Queue
import os, time, random

# Code executed by the process that writes data:
def write(q):
    print('Process to write: %s' % os.getpid())
    for value in ['A', 'B', 'C']:
        print('Put %s to queue...' % value)
        q.put(value)
        time.sleep(random.random())

# Code executed by the process that reads data:
def read(q):
    print('Process to read: %s' % os.getpid())
    while True:
        value = q.get(True)
        print('Get %s from queue.' % value)

if __name__=='__main__':
    # The parent process creates a Queue and passes it to its children:
    q = Queue()
    pw = Process(target=write, args=(q,))
    pr = Process(target=read, args=(q,))
    # Start the child process pw, which writes:
    pw.start()
    # Start the child process pr, which reads:
    pr.start()
    # Wait for pw to end:
    pw.join()
    # pr runs an infinite loop and cannot be waited for, so terminate it:
    pr.terminate()

The running results are as follows:

Process to write: 50563
Put A to queue...
Process to read: 50564
Get A from queue.
Put B to queue...
Get B from queue.
Put C to queue...
Get C from queue.
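Pipe, the other mechanism mentioned above, can be sketched in a similar way. To keep the example short, both ends of the pipe stay in one process here, which is a simplification; in a real program one of the two connection objects would be passed to a child Process, just as the Queue is passed above.

```python
from multiprocessing import Pipe

# Pipe() returns two connected Connection objects.
parent_conn, child_conn = Pipe()

# Simplification: a child process would normally hold child_conn.
child_conn.send({'value': 'A'})
child_conn.send(['B', 'C'])

# Objects arrive in the order they were sent.
first = parent_conn.recv()
second = parent_conn.recv()
print(first, second)

parent_conn.close()
child_conn.close()
```

A Pipe connects exactly two endpoints, while a Queue can be shared by any number of writers and readers.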

Multithreading

Multitasking can be done by multiple processes or by multiple threads within a process.

The Python standard library provides two modules: _thread and threading. _thread is a low-level module, and threading is a high-level module that encapsulates _thread. For the most part, we just need to use the threading high-level module.

To start a Thread, pass in a function, create an instance of Thread, and call start() to start execution:

import time, threading

# Code executed by the new thread:
def loop():
    print('thread %s is running...' % threading.current_thread().name)
    n = 0
    while n < 5:
        n = n + 1
        print('thread %s >>> %s' % (threading.current_thread().name, n))
        time.sleep(1)
    print('thread %s ended.' % threading.current_thread().name)

print('thread %s is running...' % threading.current_thread().name)
t = threading.Thread(target=loop, name='LoopThread')
t.start()
t.join()
print('thread %s ended.' % threading.current_thread().name)

The result is as follows:

thread MainThread is running...
thread LoopThread is running...
thread LoopThread >>> 1
thread LoopThread >>> 2
thread LoopThread >>> 3
thread LoopThread >>> 4
thread LoopThread >>> 5
thread LoopThread ended.
thread MainThread ended.

Every process starts one thread by default. We call it the main thread, and it can in turn start new threads. Python’s threading module has a current_thread() function that always returns the instance of the current thread. The main thread instance is named MainThread, and a child thread is named when it is created; here we name the child thread LoopThread. The name is only used for display when printing and has no other meaning. If no name is given, Python names the threads automatically: Thread-1, Thread-2, …

Lock

The biggest difference between multithreading and multiprocessing is this: with multiple processes, each process has its own copy of every variable and the copies do not affect one another; with multiple threads, all variables are shared by all threads, so any variable can be modified by any thread. The greatest danger of sharing data between threads is therefore that several threads change a variable at the same time and scramble its contents.

Let’s see how multiple threads working on a variable can mess things up:

import threading

# Assume this is your bank balance:
balance = 0

def change_it(n):
    # Deposit first, then withdraw, so the result should be 0:
    global balance
    balance = balance + n
    balance = balance - n

def run_thread(n):
    for i in range(2000000):
        change_it(n)

t1 = threading.Thread(target=run_thread, args=(5,))
t2 = threading.Thread(target=run_thread, args=(8,))
t1.start()
t2.start()
t1.join()
t2.join()
print(balance)

We define a shared variable balance with an initial value of 0 and start two threads, each of which deposits an amount and then withdraws the same amount. In theory the result should be 0. However, because threads are scheduled by the operating system, t1 and t2 run alternately, and with enough iterations the final value of balance may not be 0.

The reason is that one statement in a high-level language becomes several instructions when the CPU executes it. Even a simple calculation:

balance = balance + n

is carried out in two steps:

Calculate balance + n and store the result in a temporary variable;
Assign the value of the temporary variable to balance.

In other words, it can be seen as:

x = balance + n
balance = x

You don’t want your bank balance to go negative for no reason, so we have to make sure that when one thread changes balance, the other thread cannot change it.
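How can the balance go wrong? The sketch below replays one possible interleaving of t1 and t2 by hand, using the two-step view of each assignment. x1 and x2 stand for the temporary variables of the two threads, and the particular ordering is an assumption chosen for illustration; the real scheduler may pick any other.

```python
balance = 0

x1 = balance + 5      # t1 computes 0 + 5
x2 = balance + 8      # t2 computes 0 + 8 before t1 has stored its result
balance = x2          # t2 stores 8
balance = x1          # t1 stores 5, overwriting t2's deposit

x1 = balance - 5      # t1 computes 5 - 5
balance = x1          # t1 stores 0

x2 = balance - 8      # t2 computes 0 - 8
balance = x2          # t2 stores -8

print(balance)
```

Both threads deposited and withdrew the same amounts, yet the final balance is -8 because t2’s deposit was lost between its two steps.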

To make sure balance is calculated correctly, we put a lock around change_it(). When a thread starts to execute change_it(), we say it has acquired the lock; other threads cannot execute change_it() at the same time and must wait until the lock is released and they acquire it themselves. Because there is only one lock, at most one thread can hold it at any moment no matter how many threads there are, so modifications cannot conflict. A lock is created with threading.Lock():

balance = 0
lock = threading.Lock()

def run_thread(n):
    for i in range(100000):
        # Acquire the lock first:
        lock.acquire()
        try:
            # Now it is safe to change the balance:
            change_it(n)
        finally:
            # Always release the lock after the change:
            lock.release()

When multiple threads call lock.acquire() at the same time, only one of them acquires the lock and continues; the others wait until they can acquire it.

A thread that has acquired the lock must release it when finished; otherwise the threads waiting for the lock will wait forever and become dead threads. That is why we use try...finally to guarantee that the lock is released.
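A Lock can also be used in a with statement, which acquires it on entry and releases it on exit even if an exception is raised. The following self-contained sketch repeats the balance example in that style; the loop count of 100000 is taken from the example above.

```python
import threading

balance = 0
lock = threading.Lock()

def change_it(n):
    # Deposit first, then withdraw, so the net change is 0.
    global balance
    balance = balance + n
    balance = balance - n

def run_thread(n):
    for _ in range(100000):
        # Equivalent to acquire() / try / finally release().
        with lock:
            change_it(n)

t1 = threading.Thread(target=run_thread, args=(5,))
t2 = threading.Thread(target=run_thread, args=(8,))
t1.start()
t2.start()
t1.join()
t2.join()
print(balance)
```

Because every call to change_it() runs while the lock is held, the two steps of each assignment can no longer interleave between threads.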

Multicore CPUs

A multicore CPU should be able to execute multiple threads at the same time. What happens if we write an infinite loop?

To keep every core of an N-core CPU fully busy, we have to start N infinite-loop threads. Let’s try it in Python:

import threading, multiprocessing

def loop():
    x = 0
    while True:
        x = x ^ 1

for i in range(multiprocessing.cpu_count()):
    t = threading.Thread(target=loop)
    t.start()

After starting as many threads as there are CPU cores, the monitored CPU usage on a 4-core CPU is only about 102%, that is, only one core is in use.

But the same infinite loop rewritten in C, C++, or Java can drive all the cores to full load: 400% on 4 cores and 800% on 8 cores. Why not in Python?

Because although Python threads are real threads, the interpreter executes code under the GIL, the Global Interpreter Lock. Any Python thread must acquire the GIL before it runs. In older versions the interpreter automatically released the GIL after every 100 bytecode instructions; since Python 3.2 it releases it after a fixed time interval (5 milliseconds by default), giving other threads a chance to run. The GIL effectively serializes the execution of all threads, so threads in Python can only run alternately: even with 100 threads on a 100-core CPU, only one core is used.
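The interval after which a running thread is asked to give up the GIL can be inspected from Python. A minimal sketch, assuming a CPython 3 interpreter:

```python
import sys

# Time in seconds after which the interpreter asks the running
# thread to release the GIL so another thread can take over.
interval = sys.getswitchinterval()
print('GIL switch interval: %.3f seconds' % interval)
```

sys.setswitchinterval() changes this value, but it only affects how often threads take turns; it does not let two threads run Python code on two cores at once.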

Python cannot use multiple cores with multiple threads, but it can with multiple processes. Each Python process has its own independent GIL, so the processes do not affect one another.
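As an illustration of using several cores through processes, the sketch below spreads a CPU-bound function over a Pool. The function count_primes and the workload sizes are made up for this example; they are not part of the original text.

```python
import multiprocessing

def count_primes(limit):
    """Count primes below limit with simple trial division (CPU-bound)."""
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count

if __name__ == '__main__':
    limits = [20000, 20000, 20000, 20000]
    # Each task runs in its own process with its own GIL.
    with multiprocessing.Pool(processes=4) as pool:
        results = pool.map(count_primes, limits)
    print(results)
```

While this runs, a process monitor should show several Python processes busy at once, in contrast to the threaded infinite loop above.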

ThreadLocal

In a multithreaded environment, each thread has its own data. It is better for a thread to use its own local variables than global variables, because local variables are visible only to the thread itself and do not affect other threads, whereas changes to global variables must be protected by locks.

The problem with local variables, however, is that they are cumbersome to pass when a function is called:

def process_student(name):
    std = Student(name)
    # std is a local variable, but every function needs it, so it must be passed in:
    do_task_1(std)
    do_task_2(std)

def do_task_1(std):
    do_subtask_1(std)
    do_subtask_2(std)

def do_task_2(std):
    do_subtask_2(std)
    do_subtask_2(std)

What if we use a global dict to hold all the Student objects and use the thread itself as the key to look up the corresponding Student object?

global_dict = {}

def std_thread(name):
    std = Student(name)
    # Put std into the global variable global_dict:
    global_dict[threading.current_thread()] = std
    do_task_1()
    do_task_2()

def do_task_1():
    # std is not passed in; it is looked up by the current thread:
    std = global_dict[threading.current_thread()]
    ...

def do_task_2():
    # Any function can look up the std of the current thread:
    std = global_dict[threading.current_thread()]
    ...

This approach works in principle. Its biggest advantage is that the std object no longer has to be passed through every layer of function calls, but the code each function uses to fetch std is a bit ugly.

Is there an easier way?

ThreadLocal exists for exactly this purpose: you do not look up a dict yourself, because ThreadLocal does it for you automatically:

import threading

# Create a global ThreadLocal object:
local_school = threading.local()

def process_student():
    # Get the student associated with the current thread:
    std = local_school.student
    print('Hello, %s (in %s)' % (std, threading.current_thread().name))

def process_thread(name):
    # Bind the student to the ThreadLocal of the current thread:
    local_school.student = name
    process_student()

t1 = threading.Thread(target=process_thread, args=('Alice',), name='Thread-A')
t2 = threading.Thread(target=process_thread, args=('Bob',), name='Thread-B')
t1.start()
t2.start()
t1.join()
t2.join()

Execution Result:

Hello, Alice (in Thread-A)
Hello, Bob (in Thread-B)
Copy the code

The global variable local_school is a ThreadLocal object. Every thread can read and write its student attribute without affecting the others. You can think of local_school as a global variable whose attributes, such as local_school.student, are local to each thread and can be read and written freely without interfering with one another.

The most common use of ThreadLocal is to bind each thread to a database connection, HTTP request, user identity, etc., so that all handlers called by a thread can easily access these resources.
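A small sketch of that usage, with a user identity bound to each worker thread. The names request_ctx, handle_request, and current_user are hypothetical and chosen only for this illustration:

```python
import threading

request_ctx = threading.local()   # hypothetical per-thread context object
seen = {}

def current_user():
    # Any function called by the thread can read the bound value.
    return request_ctx.user

def handle_request(user):
    request_ctx.user = user       # bind the value to the current thread
    seen[threading.current_thread().name] = current_user()

threads = [
    threading.Thread(target=handle_request, args=('alice',), name='worker-1'),
    threading.Thread(target=handle_request, args=('bob',), name='worker-2'),
]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(seen)
# The main thread never set request_ctx.user, so it has no such attribute.
print(hasattr(request_ctx, 'user'))
```

Each worker sees only the value it bound itself, and the main thread sees none at all, which is what makes the object safe to use as a per-thread context.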

Process vs. thread

To implement multitasking, we usually design a master-worker pattern, in which the master assigns tasks and the workers execute them. A multitasking environment therefore usually has one master and multiple workers.

If master-worker is implemented with multiple processes, the main process is the master and the other processes are workers.

If master-worker is implemented with multiple threads, the main thread is the master and the other threads are workers.
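The threaded variant can be sketched with queue.Queue from the standard library. The squaring task, the number of workers, and the None sentinel used to stop them are assumptions made for this example:

```python
import queue
import threading

tasks = queue.Queue()
results = queue.Queue()

def worker():
    while True:
        n = tasks.get()
        if n is None:          # sentinel: no more work for this worker
            break
        results.put((n, n * n))

workers = [threading.Thread(target=worker) for _ in range(3)]
for w in workers:
    w.start()

# The main thread acts as the master: it only hands out tasks.
for n in range(5):
    tasks.put(n)
for _ in workers:
    tasks.put(None)            # one sentinel per worker
for w in workers:
    w.join()

squares = sorted(results.get() for _ in range(5))
print(squares)
```

The results are sorted before printing because the order in which workers finish is not deterministic.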

The biggest advantage of multi-process mode is its high stability, because a crash in one child process does not affect the main process or the other child processes. (Of course, if the main process dies, all processes die, but the master process only assigns tasks, so the probability of it crashing is low.)

The disadvantage of multi-process mode is the cost of creating processes. On Unix/Linux systems a fork call is acceptable, but on Windows creating a process is expensive. In addition, the operating system can only run a limited number of processes at the same time; under memory and CPU constraints, with thousands of processes running simultaneously, even scheduling becomes a problem.

Multithreaded mode is usually a little faster than multi-process mode, but not by much. Its fatal drawback is that a failure in any thread can crash the entire process, since all threads share the process’s memory. On Windows, if the code executed by a thread has a problem, you will often see a message like “This program has performed an illegal operation and is about to be shut down.” The problem is usually in just one thread, but the operating system forces the entire process to end.

On Windows, multithreading is more efficient than multiprocessing, so Microsoft’s IIS server uses multithreaded mode by default. Because of the stability problems of multithreading, IIS is not as stable as Apache. To alleviate this, both IIS and Apache now offer a hybrid multi-process + multi-thread mode, which makes the problem even more complicated.

Compute-intensive vs. IO-intensive

The second consideration for multitasking is the type of task. We can classify tasks as compute-intensive or IO-intensive.

Compute-intensive tasks, such as calculating pi or decoding high-definition video, require a lot of computation and consume CPU resources. Such tasks can also be run as multiple tasks, but the more tasks there are, the more time is spent switching between them and the less efficiently the CPU does the actual work. For the most efficient use of the CPU, the number of compute-intensive tasks running concurrently should equal the number of CPU cores.

Since compute-intensive tasks mainly consume CPU resources, code execution speed is critical. Scripting languages such as Python run slowly and are unsuitable for compute-intensive tasks, which are best written in C.

The second type of task is IO-intensive. Tasks involving network and disk IO are IO-intensive: they consume little CPU and spend most of their time waiting for IO operations to complete (because IO is far slower than the CPU and memory). For IO-intensive tasks, the more tasks there are, the higher the CPU utilization, up to a limit. Most common tasks are IO-intensive, such as web applications.
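The effect is easy to see with threads, because a thread that is waiting for IO does not need the CPU. In the sketch below time.sleep() stands in for a network or disk wait; the function name fake_io and the 0.2 second delay are assumptions for illustration.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_io(task_id):
    time.sleep(0.2)            # stands in for a network or disk wait
    return task_id

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=4) as pool:
    done = list(pool.map(fake_io, range(4)))
elapsed = time.perf_counter() - start
print(done, 'in %.2f seconds' % elapsed)
```

Run one after another, the four waits would take about 0.8 seconds; run in four threads they overlap and finish in roughly the time of a single wait.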

IO-intensive tasks spend 99% of their time on IO and very little on the CPU, so replacing a slow scripting language like Python with very fast C brings almost no performance improvement. For IO-intensive tasks the most suitable language is the one with the highest development efficiency (the least code): scripting languages are the first choice and C is the worst.

Distributed processes

When choosing between threads and processes, processes are preferred because they are more stable and can be distributed across multiple machines, whereas threads can at most be distributed across multiple CPUs of the same machine.

Python’s multiprocessing module not only supports multiple processes; its managers submodule also supports distributing processes across multiple machines. A server process can act as a scheduler, distributing tasks to other processes over the network. Because the managers module is well encapsulated, distributed multi-process programs are easy to write without knowing the details of network communication.

For example, suppose we already have a multi-process program that communicates through a Queue and runs on a single machine. Because the process that handles tasks has a heavy workload, we now want to put the process that sends tasks and the process that handles them on two different machines. How do we do that with distributed processes?

The original Queue can still be used; by exposing it over the network through the managers module, processes on other machines can access it.

The server starts the Queue, registers it with the network, and writes tasks to it:

# task_master.py

import random, time, queue
from multiprocessing.managers import BaseManager

# Queue for sending tasks:
task_queue = queue.Queue()
# Queue for receiving results:
result_queue = queue.Queue()

# QueueManager inherits from BaseManager:
class QueueManager(BaseManager):
    pass

# Register both queues on the network; the callable argument associates the Queue object:
QueueManager.register('get_task_queue', callable=lambda: task_queue)
QueueManager.register('get_result_queue', callable=lambda: result_queue)
# Bind port 5000 and set the authentication key 'abc':
manager = QueueManager(address=('', 5000), authkey=b'abc')
# Start the Queue:
manager.start()
# Get the Queue objects that are accessed through the network:
task = manager.get_task_queue()
result = manager.get_result_queue()
# Put a few tasks in:
for i in range(10):
    n = random.randint(0, 10000)
    print('Put task %d...' % n)
    task.put(n)
# Read the results from the result queue:
print('Try get results...')
for i in range(10):
    r = result.get(timeout=10)
    print('Result: %s' % r)
# Shut down:
manager.shutdown()

Note that in a multi-process program on a single machine, the Queue you create can be used directly. In a distributed multi-process environment, however, tasks must not be added directly to the original task_queue, because that bypasses the QueueManager wrapper; they must be added through the Queue interface obtained from manager.get_task_queue().

Then start the worker process on another machine (the local machine also works):

# task_worker.py

import time, sys, queue
from multiprocessing.managers import BaseManager

# Create a similar QueueManager:
class QueueManager(BaseManager):
    pass

# Since this QueueManager only gets queues from the network, only the name is provided when registering:
QueueManager.register('get_task_queue')
QueueManager.register('get_result_queue')

# Connect to the server, that is, the machine running task_master.py:
server_addr = '127.0.0.1'
print('Connect to server %s...' % server_addr)
# Keep the port and authentication key exactly as set by task_master.py:
m = QueueManager(address=(server_addr, 5000), authkey=b'abc')
# Connect over the network:
m.connect()
# Get the Queue objects:
task = m.get_task_queue()
result = m.get_result_queue()
# Take tasks from the task queue and write the results to the result queue:
for i in range(10):
    try:
        n = task.get(timeout=1)
        print('run task %d * %d...' % (n, n))
        r = '%d * %d = %d' % (n, n, n*n)
        time.sleep(1)
        result.put(r)
    except queue.Empty:
        print('task queue is empty.')
# Done:
print('worker exit.')

The worker process connects to the server process over the network, so the IP address of the server process must be specified.

Now you can see how distributed processes work. Start the task_master.py service process:

$ python3 task_master.py 
Put task 3411...
Put task 1605...
Put task 1398...
Put task 4729...
Put task 5300...
Put task 7471...
Put task 68...
Put task 4219...
Put task 339...
Put task 7866...
Try get results...

After the task_master.py process sends the task, it waits for the result queue. Now start task_worker.py:

$ python3 task_worker.py
Connect to server 127.0.0.1...
run task 3411 * 3411...
run task 1605 * 1605...
run task 1398 * 1398...
run task 4729 * 4729...
run task 5300 * 5300...
run task 7471 * 7471...
run task 68 * 68...
run task 4219 * 4219...
run task 339 * 339...
run task 7866 * 7866...
worker exit.

When the task_worker.py process finishes, the task_master.py process goes on to print the results:

Result: 3411 * 3411 = 11634921
Result: 1605 * 1605 = 2576025
Result: 1398 * 1398 = 1954404
Result: 4729 * 4729 = 22363441
Result: 5300 * 5300 = 28090000
Result: 7471 * 7471 = 55815841
Result: 68 * 68 = 4624
Result: 4219 * 4219 = 17799961
Result: 339 * 339 = 114921
Result: 7866 * 7866 = 61873956

What is this simple master/worker model good for? It is in fact a simple but genuine form of distributed computing. With slight changes to the code and multiple workers started, tasks can be distributed to several or even dozens of machines. For example, replace the code that computes n*n with code that sends email, and you have asynchronous sending from a mail queue.

Where is the Queue object stored? Note that task_worker.py has no code to create a Queue at all, so the Queue object is stored in task_master.py:

[Diagram: task_master.py owns the QueueManager, which holds task_queue and result_queue. Both task_master.py and task_worker.py obtain task and result through get_task_queue() and get_result_queue(); task_worker.py reaches the queues over the network.]

The Queue can be accessed over the network because QueueManager exposes it. Since QueueManager manages more than one Queue, each Queue’s network interface is given a name, such as get_task_queue.

What is authkey for? It ensures that the two machines communicate normally and are not maliciously interfered with by other machines. If the authkey in task_worker.py differs from the one in task_master.py, the connection will fail.

Regular expressions

Regular expressions are a powerful tool for matching strings. The idea is to define a rule for strings in a descriptive language; any string that conforms to the rule is considered a match, otherwise the string is invalid.

So the way we can tell if a string is a valid Email is:

Create a regular expression that matches Email.
The regular expression is used to match the user’s input to determine whether it is valid.

Since regular expressions are also represented as strings, we first need to understand how to describe characters in terms of characters.

In regular expressions, a character given literally is matched exactly. \d matches one digit and \w matches one letter or digit, so:

'00\d' matches '007' but not '00A';
'\d\d\d' matches '010';
'\w\w\d' matches 'py3';

. matches any single character, so:

'py.' matches 'pyc', 'pyo', 'py!', and so on.

To match a variable number of characters, use * for any number of characters (including zero), + for at least one character, ? for zero or one character, {n} for exactly n characters, and {n,m} for n to m characters.

Let’s take a complicated example: \d{3}\s+\d{3,8}.

Let’s read it from left to right:

\d{3} matches three digits, for example '010';
\s matches one whitespace character (a space, a tab, and so on), so \s+ means at least one whitespace character, for example ' ' or '  ';
\d{3,8} matches 3 to 8 digits, for example '1234567'.

Taken together, the regular expression above matches a telephone number with an area code, where the two parts are separated by any number of spaces.
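The pattern can be tried out with Python’s re module. The sample strings below are made up for illustration; re.fullmatch() is used so that the whole string has to fit the pattern.

```python
import re

pattern = r'\d{3}\s+\d{3,8}'
samples = ['010 12345', '010    1234567', '010-12345', '01 12345']

# True where the whole sample fits the pattern, False otherwise.
results = [re.fullmatch(pattern, text) is not None for text in samples]

for text, matched in zip(samples, results):
    print(repr(text), matched)
```

The first two samples match; the third fails because '-' is not whitespace, and the fourth fails because the area code has only two digits.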

What if you want to match a number like ‘010-12345’? Since ‘-’ is a special character in some positions of a regular expression (for example inside []), it can be escaped with ‘\’, so the regular expression above becomes \d{3}\-\d{3,8}.

However, ‘010 - 12345’ still cannot be matched, because it contains spaces. So we need more sophisticated matching.

Advanced

For more accurate matching, use [] to represent ranges, such as:

[0-9a-zA-Z_] matches one digit, letter, or underscore;
[0-9a-zA-Z_]+ matches a string of at least one digit, letter, or underscore, for example 'a100', '0_Z', 'Py3000', and so on;
[a-zA-Z_][0-9a-zA-Z_]* matches a string that starts with a letter or underscore followed by any number of digits, letters, or underscores, which is a valid Python variable name;
[a-zA-Z_][0-9a-zA-Z_]{0,19} limits the length of the variable name more precisely to 1-20 characters (1 character in front plus up to 19 characters after it).

A|B matches A or B, so (P|p)ython matches ‘Python’ or ‘python’.

^ indicates the beginning of a line, and ^\d indicates that the string must begin with a digit.

$ indicates the end of a line, and \d$ indicates that the string must end with a digit.

You may have noticed that py can also match ‘python’, but ^py$ turns it into a whole-line match that only matches ‘py’.
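These rules can be checked quickly in Python. The test strings are chosen for illustration only:

```python
import re

identifier = r'^[a-zA-Z_][0-9a-zA-Z_]{0,19}$'

checks = {
    'Py3000': bool(re.match(identifier, 'Py3000')),
    '_x': bool(re.match(identifier, '_x')),
    '3abc': bool(re.match(identifier, '3abc')),       # starts with a digit
    'a' * 21: bool(re.match(identifier, 'a' * 21)),   # 21 characters, too long
}
print(checks)

# Alternation: either capital or lower-case p.
python_match = [bool(re.match(r'(P|p)ython', s)) for s in ('Python', 'python', 'jython')]
print(python_match)

# ^ and $ force the pattern to cover the whole line.
whole_line = [bool(re.match(r'^py$', s)) for s in ('py', 'python')]
print(whole_line)
```

The first two identifiers pass and the last two fail, (P|p)ython rejects 'jython', and ^py$ accepts 'py' but not 'python'.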

re module

With this groundwork, we are ready to use regular expressions in Python. Python provides the re module, which contains all the regular expression functionality. Because Python strings themselves use \ for escaping, take special care:

s = 'ABC\\-001' # Python string
# The corresponding regular expression string becomes:
# 'ABC\-001'

So we strongly recommend using Python’s r prefix and not worrying about escaping:

S = r' abC-001 '# Python string # corresponds to the regular expression string unchanged: #' abC-001 'Copy the code

Let's look at how to determine whether a regular expression matches:

>>> import re
>>> re.match(r'^\d{3}\-\d{3,8}$', '010-12345')
<_sre.SRE_Match object; span=(0, 9), match='010-12345'>
>>> re.match(r'^\d{3}\-\d{3,8}$', '010 12345')
>>>

The match() method checks whether the pattern matches at the start of the string and returns a Match object on success, or None otherwise. A common way to test is:

test = 'string entered by the user'
if re.match(r'regular expression', test):
    print('ok')
else:
    print('failed')

Splitting strings

Using regular expressions to split strings is more flexible than using fixed characters.

>>> 'a b   c'.split(' ')
['a', 'b', '', '', 'c']

Hmm, consecutive spaces are not handled. Let's try a regular expression:

>>> re.split(r'\s+', 'a b   c')
['a', 'b', 'c']

It splits correctly no matter how many spaces there are. Add ',' and try:

>>> re.split(r'[\s,]+', 'a,b, c  d')
['a', 'b', 'c', 'd']

Add ';' as well and try:

>>> re.split(r'[\s\,\;]+', 'a,b;; c  d')
['a', 'b', 'c', 'd']

If users enter a set of tags, remember to use a regular expression to convert the irregular input into a clean list.
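A tag-cleaning step along those lines might look like the sketch below. The function name parse_tags and the sample input are assumptions for illustration. Note that re.split() returns empty strings when the input starts or ends with a separator, so they are filtered out.

```python
import re

def parse_tags(raw):
    # Split on any run of whitespace, commas, or semicolons.
    parts = re.split(r'[\s,;]+', raw)
    # Leading or trailing separators produce empty strings; drop them.
    return [p for p in parts if p]

print(parse_tags(' python, regex;;  tkinter ,socket '))
```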

Grouping

In addition to simply testing for a match, regular expressions can extract substrings. Parentheses () mark a group to be extracted. For example:

^(\d{3})-(\d{3,8})$ defines two groups, which extract the area code and the local number directly from the matched string:

>>> m = re.match(r'^(\d{3})-(\d{3,8})$', '010-12345')
>>> m
<_sre.SRE_Match object; span=(0, 9), match='010-12345'>
>>> m.group(0)
'010-12345'
>>> m.group(1)
'010'
>>> m.group(2)
'12345'

If groups are defined in the regular expression, substrings can be extracted using the group() method on the Match object.

Note that group(0) is always the entire matched string, while group(1), group(2), and so on are the 1st, 2nd, and later substrings.

Extracting substrings is very useful. Here is a more extreme example:

>>> t = '19:05:30'
>>> m = re.match(r'^(0[0-9]|1[0-9]|2[0-3]|[0-9]):(0[0-9]|1[0-9]|2[0-9]|3[0-9]|4[0-9]|5[0-9]|[0-9]):(0[0-9]|1[0-9]|2[0-9]|3[0-9]|4[0-9]|5[0-9]|[0-9])$', t)
>>> m.groups()
('19', '05', '30')

This regular expression recognizes valid times directly. But sometimes a regular expression cannot do full validation, for example when recognizing dates:

'^(0[1-9]|1[0-2]|[0-9])-(0[1-9]|1[0-9]|2[0-9]|3[0-1]|[0-9])$'

Invalid dates such as '2-30' and '4-31' still cannot be rejected by the regular expression, or the expression becomes very hard to write. In that case the check has to be completed in program code.
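One way to combine the two, sketched under assumptions: the regular expression extracts the month and day, and datetime.date does the calendar check. The helper name is_valid_month_day is made up, and so is the default year 2024, a leap year chosen so that '2-29' is accepted. The pattern has no year of its own.

```python
import re
from datetime import date

MONTH_DAY = re.compile(r'^(0[1-9]|1[0-2]|[0-9])-(0[1-9]|1[0-9]|2[0-9]|3[0-1]|[0-9])$')

def is_valid_month_day(text, year=2024):
    # Step 1: the regular expression checks the shape and extracts the groups.
    m = MONTH_DAY.match(text)
    if m is None:
        return False
    month, day = int(m.group(1)), int(m.group(2))
    # Step 2: date() raises ValueError for impossible dates such as 2-30.
    try:
        date(year, month, day)
    except ValueError:
        return False
    return True

for text in ['2-28', '2-30', '4-31', '12-31']:
    print(text, is_valid_month_day(text))
```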

Greedy matching

Finally, it is important to note that regular expression matching is greedy by default, that is, it matches as many characters as possible. For example, to match the trailing zeros of a number:

>>> re.match(r'^(\d+)(0*)$', '102300').groups()
('102300', '')

Because \d+ is greedy, it consumes all the trailing zeros, so 0* can only match the empty string.

To let 0* match the trailing zeros, \d+ must use non-greedy matching (that is, match as little as possible). Appending ? makes \d+ non-greedy:

>>> re.match(r'^(\d+?)(0*)$', '102300').groups()
('1023', '00')

Graphical interfaces

Python supports a variety of third-party GUI libraries. The library that ships with Python is Tkinter, which wraps Tk, so Tkinter can be used directly without installing any package.

Tkinter

Using Tkinter is very simple. Let's write a GUI version of "Hello, world!".

The first step is to import everything from the tkinter package:

from tkinter import *

The second step is to derive an Application class from Frame, which is the parent container of all widgets:

class Application(Frame):
    def __init__(self, master=None):
        Frame.__init__(self, master)
        self.pack()
        self.createWidgets()

    def createWidgets(self):
        self.helloLabel = Label(self, text='Hello, world!')
        self.helloLabel.pack()
        self.quitButton = Button(self, text='Quit', command=self.quit)
        self.quitButton.pack()

In a GUI, each Button, Label, input box, and so on is a Widget. A Frame is a Widget that can hold other widgets, and all the widgets together form a tree.

The pack() method adds a Widget to its parent container and lays it out. pack() is the simplest layout; grid() can achieve more complex layouts.

In the createWidgets() method, we create a Label and a Button. When the Button is clicked, self.quit() is triggered and the program exits.

The third step is to instantiate Application and start the message loop:

app = Application()
# Set window title:
app.master.title('Hello World')
# Main message loop:
app.mainloop()

No module named '_tkinter' on macOS

If Python on macOS reports the error No module named '_tkinter', install python-tk:

brew install python-tk

Text input

import tkinter
import tkinter.messagebox as messagebox

class Application(tkinter.Frame):
    def __init__(self, master=None):
        tkinter.Frame.__init__(self, master)
        self.alertButton = tkinter.Button(self, text='Hello', command=self.hello)
        self.nameInput = tkinter.Entry(self)
        self.pack()
        self.createWidgets()

    def createWidgets(self):
        self.nameInput.pack()
        self.alertButton.pack()

    def hello(self):
        # Fall back to 'world' when the input box is empty:
        name = self.nameInput.get() or 'world'
        messagebox.showinfo('Message', 'Hello, %s' % name)

app = Application()
# Set window title:
app.master.title('Hello World')
# Main message loop:
app.mainloop()

Turtle graphics

In 1966, Seymour Papert and Wally Feurzeig created a language for children to learn programming, the LOGO language, which featured a turtle that is programmed to draw on the screen.

Turtle Graphics has since been ported to various high-level languages. Python has the turtle library built in, which essentially replicates all the functionality of the original Turtle Graphics.

Let's look at a simple piece of code that tells the turtle to draw a rectangle:

# Import everything from the turtle package:
from turtle import *

# Set brush width:
width(4)

# Forward:
forward(200)
# Turn right 90 degrees:
right(90)

# Brush color:
pencolor('red')
forward(100)
right(90)

pencolor('green')
forward(200)
right(90)

pencolor('blue')
forward(100)
right(90)

# Call done() so the window waits to be closed; otherwise it closes immediately:
done()

As the code shows, turtle drawing means directing the turtle to move forward and turn, and the path the turtle travels is the line that gets drawn. To draw a rectangle, just have the turtle go forward and turn 90 degrees to the right, four times.

The width() function is called to set the brush width, and the pencolor() function is called to set the color. Please refer to the turtle library for more instructions.

When the drawing is complete, remember to call the done() function and let the window enter the message loop, waiting to be closed. Otherwise, the window will be closed immediately, as the Python process will end immediately.

The turtle package itself is just a drawing library, but combined with Python code it can draw all kinds of complex graphics. For example, draw five five-pointed stars in a loop:

from turtle import *

def drawStar(x, y):
    pu()
    goto(x, y)
    pd()
    # set heading: 0
    seth(0)
    for i in range(5):
        fd(40)
        rt(144)

for x in range(0, 250, 50):
    drawStar(x, 0)

done()

With recursion, much more complex shapes can be drawn, for example a fractal tree:

import turtle

# set colormode to RGB:
turtle.colormode(255)

turtle.lt(90)

lv = 14
l = 120
s = 45

turtle.width(lv)

# initialize RGB color:
r = 0
g = 0
b = 0
turtle.pencolor(r, g, b)

turtle.penup()
turtle.bk(l)
turtle.pendown()
turtle.fd(l)

def draw_tree(l, level):
    global r, g, b
    # save the current pen width
    w = turtle.width()

    # narrow the pen width
    turtle.width(w * 3.0 / 4.0)
    # set color:
    r = r + 1
    g = g + 2
    b = b + 3
    turtle.pencolor(r % 200, g % 200, b % 200)

    l = 3.0 / 4.0 * l

    turtle.lt(s)
    turtle.fd(l)

    if level < lv:
        draw_tree(l, level + 1)
    turtle.bk(l)
    turtle.rt(2 * s)
    turtle.fd(l)

    if level < lv:
        draw_tree(l, level + 1)
    turtle.bk(l)
    turtle.lt(s)

    # restore the previous pen width
    turtle.width(w)

turtle.speed("fastest")

draw_tree(l, 4)

turtle.done()

Network programming

Network communication is communication between two processes on two computers. For example, a browser process communicates with a Web service process on a Sina server, while a QQ process communicates with a process on a Tencent server.

TCP/IP

To connect all the different kinds of computers in the world, a universal protocol is required. The Internet Protocol Suite is that universal protocol standard. The word Internet combines inter and net and originally meant a network that connects networks. With the Internet, any private network that supports this protocol can join.

The Internet Protocol Suite contains hundreds of protocol standards, but because the two most important ones are TCP and IP, people usually refer to it simply as TCP/IP.

When communicating, both parties must know each other’s identifiers, just as they must know each other’s email addresses when sending emails. The unique identification of each computer on the Internet is the IP address, similar to 123.123.123.123. If a computer is connected to two or more networks at the same time, such as a router, it will have two or more IP addresses, so the IP address corresponds to the computer’s network interface, usually a network card.

The IP protocol is responsible for sending data from one computer to another over the network. The data is split into small chunks and sent as IP packets. Because Internet links are complex, there are often several routes between two computers, so routers are responsible for deciding how to forward each IP packet. IP packets are sent chunk by chunk and may travel along different routes, but neither delivery nor arrival in order is guaranteed.

An IPv4 address is actually a 32-bit integer. An address written as 192.168.0.1 is that 32-bit integer split into four 8-bit groups, each shown as a decimal number for easy reading.

An IPv6 address is actually a 128-bit integer. IPv6 is the upgrade to the IPv4 currently in use, and an address is written as a string such as 2001:0db8:85a3:0042:1000:8a2e:0370:7334.
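The "address is just an integer" idea can be checked with the standard ipaddress module. This is only an illustrative sketch; the variable names are arbitrary, and the IPv6 value is a sample address in the usual colon-separated notation.

```python
import ipaddress

v4 = ipaddress.IPv4Address('192.168.0.1')
v4_int = int(v4)                 # the whole address as one 32-bit integer
v4_groups = list(v4.packed)      # the same value as four 8-bit groups
print(v4_int, v4_groups)

# Rebuild the integer from the four groups by shifting 8 bits at a time.
rebuilt = 0
for group in v4_groups:
    rebuilt = (rebuilt << 8) | group
print(rebuilt == v4_int)

v6 = ipaddress.IPv6Address('2001:0db8:85a3:0042:1000:8a2e:0370:7334')
v6_bytes = len(v6.packed)        # 16 bytes, i.e. 128 bits
print(v6_bytes, v6.exploded)
```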

TCP is built on top of IP. TCP is responsible for establishing a reliable connection between two computers and ensuring that packets arrive in order. It establishes the connection with a handshake, then numbers each packet so that the other side receives them in order, and automatically resends any packet that is lost.

Many of the more advanced protocols commonly used are based on TCP, such as HTTP for browsers, SMTP for sending mail, and so on.

A TCP packet contains not only the data to be transmitted, but also the source IP address, destination IP address, source port, and destination port.

What are ports for? When two computers communicate, the IP address alone is not enough, because several network programs run on the same computer. When a TCP packet arrives, the port number decides whether it is handed to the browser or to QQ. Each network program requests a unique port number from the operating system, so two processes that establish a network connection between two computers each need their own IP address and their own port number.

A process may also be linked to more than one machine at a time, so it may claim many ports.
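A small sketch of how the operating system hands out ports: binding to port 0 asks the OS to pick any free port, and getsockname() reveals which one was chosen. The variable names are arbitrary, and the actual port numbers differ from run to run.

```python
import socket

# Two sockets in the same process, each asking the OS for a free port (port 0).
a = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
b = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
a.bind(('127.0.0.1', 0))
b.bind(('127.0.0.1', 0))

# getsockname() returns the (address, port) pair the socket is bound to.
port_a = a.getsockname()[1]
port_b = b.getsockname()[1]
print(port_a, port_b)

a.close()
b.close()
```

While both sockets are open, the two ports differ, which is how incoming packets can be routed to the right socket.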

TCP programming

Socket is an abstract concept of network programming. To open a Socket, you need to know the IP address and port number of the target computer, and then specify the protocol type.

Client

Most connections are reliable TCP connections. When a TCP connection is created, the side that initiates the connection is called the client, and the side that responds to it is called the server.

# import the socket library:
import socket

# create a socket:
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
# set up a connection:
s.connect(('www.sina.com.cn', 80))

To initiate a TCP connection, the client must know the server's IP address and port number. Sina's IP address can be resolved automatically from the domain name www.sina.com.cn, but how do we know the port number of Sina's server?

The answer is that a server's port number is fixed by the kind of service it provides. Because we want to access web pages, Sina's web server must listen on port 80, the standard port for web services. Other services have standard ports too, such as port 25 for SMTP and port 21 for FTP. Port numbers below 1024 are reserved for standard Internet services; port numbers above 1024 can be used freely.

After the TCP connection is established, we can send a request to Sina's server asking for the content of the home page:

# send data:
s.send(b'GET / HTTP/1.1\r\nHost: www.sina.com.cn\r\nConnection: close\r\n\r\n')

A TCP connection creates a two-way channel. Both parties can send data to each other at the same time. But who will be sent first and who will be sent later, and how to coordinate, should be decided according to specific agreements. For example, the HTTP protocol requires the client to send a request to the server before the server sends data to the client.

The text that is sent must conform to the HTTP standard. If the format is correct, we can then receive the data returned by Sina's server:

# receive data:
buffer = []
while True:
    # receive at most 1k bytes at a time:
    d = s.recv(1024)
    if d:
        buffer.append(d)
    else:
        break
data = b''.join(buffer)

To receive data, the recv(max) method is called; it receives at most the given number of bytes per call, so it is called repeatedly in a while loop until recv() returns empty data, which means everything has been received and the loop exits.

Once we have received the data, we call close() to close the socket, which completes one full round of network communication:

# close the connection:
s.close()

The data received includes the HTTP header and the page itself. We just need to separate the header from the page, print the header, and save the page content to a file:

header, html = data.split(b'\r\n\r\n', 1)
print(header.decode('utf-8'))
# write the received data to a file:
with open('sina.html', 'wb') as f:
    f.write(html)

Now, just open the sina.html file in your browser and you’ll see sina’s home page.
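The header/body split can be tried without any network access. The sketch below works on a canned response; the sample bytes and the helper name split_http_response are made up for illustration, and header names are lower-cased only to make lookups easier.

```python
def split_http_response(data):
    # Header and body are separated by the first blank line (\r\n\r\n).
    header, _, body = data.partition(b'\r\n\r\n')
    lines = header.decode('iso-8859-1').split('\r\n')
    status_line = lines[0]
    headers = {}
    for line in lines[1:]:
        # Each header line has the form "Name: value".
        name, _, value = line.partition(':')
        headers[name.strip().lower()] = value.strip()
    return status_line, headers, body

# A hypothetical response, not real output from any server.
sample = (b'HTTP/1.1 200 OK\r\n'
          b'Content-Type: text/html\r\n'
          b'Content-Length: 13\r\n'
          b'\r\n'
          b'<h1>Hi!</h1>\n')

status, headers, body = split_http_response(sample)
print(status)
print(headers)
print(body)
```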

Server

The server process first binds a port and listens for connections from clients. When a client connects, the server establishes a socket connection with that client, and all subsequent communication goes through that socket.

So the server opens a fixed port (such as 80) to listen on, and creates a new socket connection each time a client connects. Because the server holds connections from many clients, it must be able to tell which client each socket belongs to. A socket is uniquely identified by four items: server address, server port, client address, and client port.

However, the server also needs to respond to requests from multiple clients at the same time, so each connection needs to be handled by a new process or a new thread. Otherwise, the server can only serve one client at a time.

Let’s write a simple server program that accepts a client connection, adds Hello to the string sent by the client and sends it back.

First, create a Socket based on IPv4 and TCP:

# modules used by the server program:
import socket
import threading
import time

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

Next, we bind the listening address and port. A server may have several network adapters. You can bind to the IP address of one particular adapter, use 0.0.0.0 to bind to all network addresses, or use 127.0.0.1 to bind to the local machine. 127.0.0.1 is a special IP address that refers to the local machine; if you bind to it, a client must run on the same machine to connect, that is, external computers cannot connect.

The port number must be specified in advance. Since the service we are writing is not a standard service, we use port 9999. Note that binding a port number below 1024 requires administrator privileges:

# listen on the port:
s.bind(('127.0.0.1', 9999))

Next, call listen() to start listening on the port; the argument specifies the maximum number of pending connections:

s.listen(5)
print('Waiting for connection...')

Next, the server accepts client connections in an endless loop; accept() blocks until a client connects and then returns the connection:

while True:
    # accept a new connection:
    sock, addr = s.accept()
    # create a new thread to handle the TCP connection:
    t = threading.Thread(target=tcplink, args=(sock, addr))
    t.start()

Each connection must be handled by a new thread (or process); otherwise a single thread cannot accept other clients while it is busy with one connection:

def tcplink(sock, addr):
    print('Accept new connection from %s:%s...' % addr)
    sock.send(b'Welcome!')
    while True:
        data = sock.recv(1024)
        time.sleep(1)
        if not data or data.decode('utf-8') == 'exit':
            break
        sock.send(('Hello, %s!' % data.decode('utf-8')).encode('utf-8'))
    sock.close()
    print('Connection from %s:%s closed.' % addr)

After the connection is established, the server sends a welcome message, waits for the client data, and then sends Hello to the client. If the client sends the exit string, the connection is closed.

To test the server program, we also need to write a client program:

import socket

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
# set up a connection:
s.connect(('127.0.0.1', 9999))
# receive the welcome message:
print(s.recv(1024).decode('utf-8'))
for data in [b'Michael', b'Tracy', b'Sarah']:
    # send data:
    s.send(data)
    print(s.recv(1024).decode('utf-8'))
s.send(b'exit')
s.close()

We need to open two command-line windows, one to run the server program and the other to run the client program, to see the effect.

Server window:

$ python echo_server.py
Waiting for connection...
Accept new connection from 127.0.0.1:64398...
Connection from 127.0.0.1:64398 closed.

Client window:

$ python echo_client.py
Welcome!
Hello, Michael!
Hello, Tracy!
Hello, Sarah!
$

Note that the client program exits when it finishes running, while the server program runs forever; press Ctrl+C to stop it.
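For a quick experiment that does not need two windows, the server and the client can live in one script, with the server side running in a thread. This is a sketch under assumptions: it follows the same welcome/Hello/exit protocol as tcplink, lets the OS choose the port (port 0) instead of using 9999, serves exactly one client so that the script terminates, and relies on the messages being small enough to arrive in a single recv() call.

```python
import socket
import threading

def handle(sock):
    # Same protocol as tcplink, minus the sleep and the logging.
    sock.send(b'Welcome!')
    while True:
        data = sock.recv(1024)
        if not data or data == b'exit':
            break
        sock.send(b'Hello, ' + data + b'!')
    sock.close()

def serve_once(listener):
    # Accept exactly one client so the example ends on its own.
    sock, _addr = listener.accept()
    handle(sock)

listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(('127.0.0.1', 0))       # port 0: let the OS pick a free port
listener.listen(1)
port = listener.getsockname()[1]

server = threading.Thread(target=serve_once, args=(listener,))
server.start()

client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
client.connect(('127.0.0.1', port))
replies = [client.recv(1024)]         # the welcome message
for name in [b'Michael', b'Tracy']:
    client.send(name)
    replies.append(client.recv(1024))
client.send(b'exit')
client.close()

server.join()
listener.close()
print(replies)
```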

UDP programming

TCP establishes a reliable connection, and both parties can send data in the form of streams. In contrast to TCP, UDP is a connectionless protocol.

With UDP you do not need to establish a connection. You only need the peer's IP address and port number to send packets to it, but there is no guarantee that they arrive.

Although data transmission through UDP is not reliable, it has the advantage of being faster than TCP. For data that does not require reliable arrival, UDP can be used.

Let's look at how to transfer data over UDP. As with TCP, UDP communication is divided into a client and a server. The server first needs to bind a port:

import socket

s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
# bind port:
s.bind(('127.0.0.1', 9999))

When creating the socket, SOCK_DGRAM specifies that its type is UDP. Binding a port works the same as for TCP, but instead of calling listen(), the server receives data directly from any client:

print('Bind UDP on 9999...')
while True:
    # receive data:
    data, addr = s.recvfrom(1024)
    print('Received from %s:%s.' % addr)
    s.sendto(b'Hello, %s!' % data, addr)

The recvfrom() method returns the data together with the client's address and port, so after receiving data the server can call sendto() directly to send a UDP reply to that client.

Note that I omitted multithreading, because this is a very simple example.

When the client uses UDP, it first creates a UDP socket and then sends data to the server directly with sendto(), without calling connect():

import socket

s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
for data in [b'Michael', b'Tracy', b'Sarah']:
    # send data:
    s.sendto(data, ('127.0.0.1', 9999))
    # receive data:
    print(s.recv(1024).decode('utf-8'))
s.close()

The recv() method is still used to receive data from the server.

The server and client are again tested from two command-line windows, with the following results.

Server window:

$ python udp_server.py
Bind UDP on 9999...
Received from 127.0.0.1:63823.
Received from 127.0.0.1:63823.
Received from 127.0.0.1:63823.

Client window:

$ python udp_client.py
Hello, Michael!
Hello, Tracy!
Hello, Sarah!
$

UDP is used similarly to TCP, but does not require the establishment of a connection. In addition, the UDP port and TCP port bound to the server do not conflict. That is, UDP port 9999 and TCP port 9999 can be bound separately.
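The claim that a TCP port and a UDP port with the same number do not conflict can be tried out directly. A minimal sketch, assuming nothing else on the machine grabs the chosen UDP port in the meantime; the helper name bind_tcp_and_udp and its retry loop are made up for illustration.

```python
import socket

def bind_tcp_and_udp(host='127.0.0.1', attempts=5):
    # Hypothetical helper: bind a TCP socket to an OS-chosen port, then bind
    # a UDP socket to the same port number.
    for _ in range(attempts):
        tcp = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        udp = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        tcp.bind((host, 0))
        port = tcp.getsockname()[1]
        try:
            udp.bind((host, port))
        except OSError:
            # Some other program already holds this UDP port; try another one.
            tcp.close()
            udp.close()
            continue
        return tcp, udp
    raise RuntimeError('no port was free for both TCP and UDP')

tcp, udp = bind_tcp_and_udp()
tcp_port = tcp.getsockname()[1]
udp_port = udp.getsockname()[1]
print(tcp_port, udp_port)
tcp.close()
udp.close()
```

Both sockets end up bound to the same number at the same time, because TCP and UDP keep separate port tables.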

Using MySQL

MySQL installation

You can download the latest Community Server 5.6.x directly from the MySQL website. MySQL is cross-platform. Select a platform to download the installation file and install it.

During the installation, MySQL will prompt you to enter the password of user root. If you’re afraid you won’t remember, set your password to password.

On Windows, choose UTF-8 encoding during installation so that Chinese text is handled correctly.

On Mac or Linux, you need to edit the MySQL configuration file to change the database's default encoding to UTF-8. By default the configuration file is stored at /etc/my.cnf or /etc/mysql/my.cnf:

[client]
default-character-set = utf8

[mysqld]
default-storage-engine = INNODB
character-set-server = utf8
collation-server = utf8_general_ci

After MySQL restarts, you can check the encoding from the MySQL command-line client:

$ mysql -u root -p
Enter password:
Welcome to the MySQL monitor...
...

mysql> show variables like '%char%';
+--------------------------+--------------------------------------------------------+
| Variable_name            | Value                                                  |
+--------------------------+--------------------------------------------------------+
| character_set_client     | utf8                                                   |
| character_set_connection | utf8                                                   |
| character_set_database   | utf8                                                   |
| character_set_filesystem | binary                                                 |
| character_set_results    | utf8                                                   |
| character_set_server     | utf8                                                   |
| character_set_system     | utf8                                                   |
| character_sets_dir       | /usr/local/mysql-5.1.65-osx10.6-x86_64/share/charsets/ |
+--------------------------+--------------------------------------------------------+
8 rows in set (0.00 sec)

Seeing utf8 indicates that the encoding is set correctly.

Installing the MySQL driver

Since the MySQL server runs as a separate process and serves clients over the network, a MySQL driver for Python is needed to connect to it. MySQL officially provides the mysql-connector-python driver, which is installed here by passing the --allow-external option to pip:

$ pip install mysql-connector-python --allow-external mysql-connector-python

If the above command fails to install, try another driver:

$ pip install mysql-connector

The following demonstrates how to connect to the test database on the MySQL server:

# import the MySQL driver:
>>> import mysql.connector
# note: set password to your own root password:
>>> conn = mysql.connector.connect(user='root', password='password', database='test')
>>> cursor = conn.cursor()
# create the user table:
>>> cursor.execute('create table user (id varchar(20) primary key, name varchar(20))')
# insert a row; note that the MySQL placeholder is %s:
>>> cursor.execute('insert into user (id, name) values (%s, %s)', ['1', 'Michael'])
>>> cursor.rowcount
1
# commit transaction:
>>> conn.commit()
>>> cursor.close()
# run a query:
>>> cursor = conn.cursor()
>>> cursor.execute('select * from user where id = %s', ('1',))
>>> values = cursor.fetchall()
>>> values
[('1', 'Michael')]
# close cursor and connection:
>>> cursor.close()
True
>>> conn.close()
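If no MySQL server is at hand, the same connect / cursor / execute / commit / fetchall flow can be rehearsed with Python's built-in sqlite3 module. This is a stand-in, not the MySQL driver. The main visible difference is the placeholder: sqlite3 uses ? where mysql.connector uses %s. The try/finally block is an addition that makes sure the connection is closed even when a statement fails.

```python
import sqlite3

# Stand-in for mysql.connector.connect(...): an in-memory SQLite database.
conn = sqlite3.connect(':memory:')
try:
    cursor = conn.cursor()
    cursor.execute('create table user (id varchar(20) primary key, name varchar(20))')
    # sqlite3 uses ? as its placeholder; mysql.connector uses %s.
    cursor.execute('insert into user (id, name) values (?, ?)', ('1', 'Michael'))
    inserted = cursor.rowcount
    # commit the transaction:
    conn.commit()
    # run a query:
    cursor.execute('select * from user where id = ?', ('1',))
    values = cursor.fetchall()
    cursor.close()
finally:
    # close the connection even if a statement above raised an exception
    conn.close()

print(inserted, values)
```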

Web development

WSGI interface

Knowing the HTTP protocol and HTML documents, we understand the essence of a Web application:

The browser sends an HTTP request;
The server receives the request and generates an HTML document;
The server sends the HTML document to the browser as the Body of the HTTP response;
The browser receives the HTTP response, pulls the HTML document from the HTTP Body and displays it.

So, the simplest Web application is to save the HTML file, use an off-the-shelf HTTP server software, receive the user’s request, read the HTML from the file, and return it. Apache, Nginx, Lighttpd, and other common static servers do just that.

If you want to generate HTML dynamically, you need to do the above steps yourself. However, accepting HTTP requests, parsing HTTP requests, and sending HTTP responses are all hard work, and if we were to write the underlying code ourselves, it would take us months to read the HTTP specification before we even started writing dynamic HTML.

Instead, the underlying code is implemented by specialized server software, and we use Python to focus on generating HTML documents. Because we don’t want to be exposed to TCP connections, HTTP raw request and response formats, we need a unified interface that lets us focus on writing Web business in Python.

This Interface is WSGI: Web Server Gateway Interface.

The WSGI interface definition is very simple: it requires the Web developer to implement a single function that responds to HTTP requests. Let's look at the simplest Web version of "Hello, web!":

def application(environ, start_response):
    start_response('200 OK', [('Content-Type', 'text/html')])
    return [b'<h1>Hello, web!</h1>']

The application() function above is a WSGI compliant HTTP handler that takes two arguments:

environ: a dict object that contains all the HTTP request information;
start_response: a function that sends the HTTP response.

In the application() function, call:

start_response('200 OK', [('Content-Type', 'text/html')])

This call sends the header of the HTTP response. Note that the header can only be sent once, that is, start_response() can only be called once. The start_response() function takes two parameters: an HTTP status code, and a list of HTTP headers, each represented by a tuple of two str values.

In general, you should send the Content-Type header to the browser. Many other commonly used HTTP headers should be sent as well.

The function's return value, b'<h1>Hello, web!</h1>', is then sent to the browser as the body of the HTTP response.

With WSGI, all we care about is getting the HTTP request information from the environ dict, constructing the HTML, sending the header via start_response(), and returning the body.

The entire application() function itself is not involved in any part of parsing HTTP, which means that we don’t need to write the underlying code ourselves, just worry about how to respond to requests at a higher level.

How is the application() function called? If we call it ourselves, we have no way to provide the environ and start_response arguments, and the returned bytes cannot be sent to the browser.

So the application() function must be called by the WSGI server. There are many WSGI compliant servers, and we can pick and choose one to use. But for now, we just want to test as soon as possible that the application() function we wrote can actually output HTML to the browser, so let’s find the simplest WSGI server we can and get our Web application up and running.

The good news is that Python has a WSGI server built in. The module is called wsgiref, a reference implementation of a WSGI server written in pure Python. A "reference implementation" fully complies with the WSGI standard but pays no attention to runtime efficiency, so it is meant only for development and testing.
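To get a feel for what a WSGI server does when it calls application(), the call can be imitated by hand for testing purposes. In the sketch below, wsgiref.util.setup_testing_defaults fills a dict with plausible request values, and a stand-in start_response simply records what it is given. The helper name call_app is made up; a real server would also write the result to a network connection.

```python
from wsgiref.util import setup_testing_defaults

def application(environ, start_response):
    start_response('200 OK', [('Content-Type', 'text/html')])
    return [b'<h1>Hello, web!</h1>']

def call_app(app, path='/'):
    # Build a fake request environment (REQUEST_METHOD, SERVER_NAME, ...).
    environ = {}
    setup_testing_defaults(environ)
    environ['PATH_INFO'] = path
    captured = {}

    def start_response(status, headers, exc_info=None):
        # Record the status and headers instead of sending them anywhere.
        captured['status'] = status
        captured['headers'] = headers

    # The app returns an iterable of bytes; join it into one body.
    body = b''.join(app(environ, start_response))
    return captured['status'], captured['headers'], body

status, headers, body = call_app(application)
print(status, headers, body)
```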

Run the WSGI service

We'll start by writing hello.py, which implements the WSGI handler for our Web application:

# hello.py

def application(environ, start_response):
    start_response('200 OK', [('Content-Type', 'text/html')])
    return [b'<h1>Hello, web!</h1>']

Then, write a server.py that starts the WSGI server and loads the application() function:

# server.py
# import from the wsgiref module:
from wsgiref.simple_server import make_server
# import our own application function:
from hello import application

# create a server with an empty IP address, port 8000, and application as the handler:
httpd = make_server('', 8000, application)
print('Serving HTTP on port 8000...')
# start listening for HTTP requests:
httpd.serve_forever()

Make sure both files are in the same directory, then start the WSGI server by typing python server.py on the command line.

After it starts successfully, open your browser and enter http://localhost:8000/ to see the result.

Press Ctrl+C to terminate the server.

If you think this Web application is too simple, you can modify it a bit and read PATH_INFO from environ to display more dynamic content:

# hello.py

def application(environ, start_response):
    start_response('200 OK', [('Content-Type', 'text/html')])
    # use the path after the leading '/' as the name, or 'web' if it is empty:
    body = '<h1>Hello, %s!</h1>' % (environ['PATH_INFO'][1:] or 'web')
    return [body.encode('utf-8')]

You can enter a user name in the address bar as part of the URL, and the page will return Hello, xxx!

Using a Web Framework

Each URL can correspond to GET and POST requests, as well as PUT and DELETE requests, but we generally consider only the most common GET and POST requests.

The simplest idea is to take the HTTP request information from the environ variable and judge one by one:

def application(environ, start_response):
    method = environ['REQUEST_METHOD']
    path = environ['PATH_INFO']
    if method=='GET' and path=='/':
        return handle_home(environ, start_response)
    if method=='POST' and path=='/signin':
        return handle_signin(environ, start_response)
    ...

Code written this way quickly becomes unmaintainable. The reason is that the WSGI interface, although higher level than raw HTTP, is still low level compared with the processing logic of a Web App. We need a further abstraction on top of WSGI that lets us focus on handling one URL with one function, and leaves the mapping from URLs to functions to a Web framework.
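The core of that abstraction can be sketched in a few lines: a table that maps (method, path) to a handler function, filled in by a decorator. This is a toy illustration of the idea only, not how Flask or any other framework is actually implemented. The names routes and route, and the 404 fallback, are assumptions made for the example.

```python
# Toy routing table: (HTTP method, path) -> handler function.
routes = {}

def route(method, path):
    # Decorator factory that registers the decorated function in the table.
    def decorator(func):
        routes[(method, path)] = func
        return func
    return decorator

@route('GET', '/')
def handle_home(environ, start_response):
    start_response('200 OK', [('Content-Type', 'text/html')])
    return [b'<h1>Home</h1>']

@route('POST', '/signin')
def handle_signin(environ, start_response):
    start_response('200 OK', [('Content-Type', 'text/html')])
    return [b'<h3>Signed in</h3>']

def application(environ, start_response):
    # One lookup replaces the chain of if statements.
    key = (environ['REQUEST_METHOD'], environ['PATH_INFO'])
    handler = routes.get(key)
    if handler is None:
        start_response('404 Not Found', [('Content-Type', 'text/plain')])
        return [b'Not Found']
    return handler(environ, start_response)
```

Adding a new URL now means writing one decorated function, while application() itself stays unchanged.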

Because it is so easy to develop a Web framework in Python, Python has hundreds of open source Web frameworks. We will not discuss the advantages and disadvantages of the various frameworks here, and will simply pick a popular one, Flask.

Writing a Web App with Flask is easier than using the WSGI interface directly (if it were more complicated than WSGI, why use a framework at all?). Install Flask with pip:

$ pip install flask

Then write an app.py that handles three URLs:

GET /: home page, returns Home;
GET /signin: login page, displays the login form;
POST /signin: processes the login form and displays the login result.

Note that the same URL /signin is requested with both GET and POST, and the two are mapped to two separate handler functions.

Flask uses Python decorators to associate URLs with functions internally, so the code we write looks like this:

from flask import Flask
from flask import request

app = Flask(__name__)

@app.route('/', methods=['GET', 'POST'])
def home():
    return '<h1>Home</h1>'

@app.route('/signin', methods=['GET'])
def signin_form():
    return '''<form action="/signin" method="post">
              <p><input name="username"></p>
              <p><input name="password" type="password"></p>
              <p><button type="submit">Sign In</button></p>
              </form>'''

@app.route('/signin', methods=['POST'])
def signin():
    # The submitted form fields are read from the request object
    if request.form['username']=='admin' and request.form['password']=='password':
        return '<h3>Hello, admin!</h3>'
    return '<h3>Bad username or password.</h3>'

if __name__ == '__main__':
    app.run()

Run python app.py, and Flask's built-in server listens on port 5000:

$ python app.py
 * Running on http://127.0.0.1:5000/

Open your browser and enter http://localhost:5000/ in the address bar:

The home page displays correctly!

Then enter http://localhost:5000/signin in the address bar, and the login form is displayed:

Enter the preset username admin and password password, and the login succeeds.

Entering any other username or password makes the login fail.

A real Web App should take the submitted username and password and check them against a database to decide whether the user can log in.
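
As a rough illustration of what checking against a database could look like, the sketch below stores a salted password hash in an in-memory SQLite database and verifies a login attempt against it. Everything specific here is an assumption for the example: the users table, its column names, the helper names, and the PBKDF2 parameters. A production app would pick these according to its own requirements.

```python
import hashlib
import hmac
import os
import sqlite3

def hash_password(password, salt):
    # Derive a hash from the password and a per-user salt (parameters are illustrative)
    return hashlib.pbkdf2_hmac('sha256', password.encode('utf-8'), salt, 100000)

def create_demo_db():
    # Hypothetical schema: one row per user, storing the salt and hash, never the password
    conn = sqlite3.connect(':memory:')
    conn.execute('CREATE TABLE users (username TEXT PRIMARY KEY, salt BLOB, pw_hash BLOB)')
    salt = os.urandom(16)
    conn.execute('INSERT INTO users VALUES (?, ?, ?)',
                 ('admin', salt, hash_password('password', salt)))
    conn.commit()
    return conn

def check_login(conn, username, password):
    # Parameterized query, so the username cannot inject SQL
    row = conn.execute('SELECT salt, pw_hash FROM users WHERE username = ?',
                       (username,)).fetchone()
    if row is None:
        return False
    salt, pw_hash = row
    # Constant-time comparison of the stored and recomputed hashes
    return hmac.compare_digest(hash_password(password, salt), pw_hash)

conn = create_demo_db()
print(check_login(conn, 'admin', 'password'))
print(check_login(conn, 'admin', 'wrong'))
```

In the signin() handler, a call like check_login(conn, username, password) would take the place of the hard-coded comparison.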

In addition to Flask, common Python Web frameworks are:

Django: Versatile Web framework;
Web.py: a small Web framework;
Bottle: A Web framework similar to Flask;
Tornado: Facebook’s open source asynchronous Web framework.

Use the template

from flask import Flask, request, render_template

app = Flask(__name__)

@app.route('/', methods=['GET', 'POST'])
def home():
    return render_template('home.html')

@app.route('/signin', methods=['GET'])
def signin_form():
    return render_template('form.html')

@app.route('/signin', methods=['POST'])
def signin():
    username = request.form['username']
    password = request.form['password']
    if username=='admin' and password=='password':
        return render_template('signin-ok.html', username=username)
    return render_template('form.html', message='Bad username or password', username=username)

if __name__ == '__main__':
    app.run()

Flask renders templates with the render_template() function. As with Web frameworks, there are many Python template engines. Flask supports Jinja2 by default, so we install Jinja2 directly (pip normally installs it already as a dependency of Flask):

$ pip install jinja2

Then, start writing the Jinja2 templates. By default Flask looks for them in a templates directory next to app.py:

home.html

Template to display the home page:

<html>
<head>
  <title>Home</title>
</head>
<body>
  <h1 style="font-style:italic">Home</h1>
</body>
</html>

form.html

The template to display the login form:

<html>
<head>
  <title>Please Sign In</title>
</head>
<body>
  {% if message %}
  <p style="color:red">{{ message }}</p>
  {% endif %}
  <form action="/signin" method="post">
    <legend>Please sign in:</legend>
    <p><input name="username" placeholder="Username" value="{{ username }}"></p>
    <p><input name="password" placeholder="Password" type="password"></p>
    <p><button type="submit">Sign In</button></p>
  </form>
</body>
</html>

signin-ok.html

Successful login template:

<html>
<head>
  <title>Welcome, {{ username }}</title>
</head>
<body>
  <p>Welcome, {{ username }}!</p>
</body>
</html>

Asynchronous I/O

In the IO programming section, we learned that CPUs are much faster than disks, networks, and other IO devices. Within a thread, the CPU executes code extremely fast. However, once the code reaches an IO operation, such as reading or writing a file or sending data over the network, it has to wait for the IO operation to complete before it can continue. This is called synchronous IO.

Besides running multiple threads or processes, another approach to the IO problem is asynchronous IO. When code needs to perform a time-consuming IO operation, it only issues the IO instruction and does not wait for the result; it goes on to execute other code. Some time later, when the IO result is ready, the CPU is notified and processes it.

The asynchronous IO model requires a message loop in which the main thread repeats the “read message – process message” process:

loop = get_event_loop()
while True:
    event = loop.get_event()
    process_event(event)
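
The loop above is pseudocode. As a runnable toy version of the same "read message, process message" cycle, the sketch below keeps events in a queue and lets a handler post a follow-up event instead of blocking. ToyEventLoop, post, and the two download handlers are hypothetical names; unlike a real message loop, this one simply stops when the queue is empty, and no actual IO takes place.

```python
from collections import deque

class ToyEventLoop:
    def __init__(self):
        self._events = deque()   # pending (name, handler) pairs
        self.log = []            # record of what was processed, in order

    def post(self, name, handler):
        # Queue an event to be processed on a later iteration
        self._events.append((name, handler))

    def run(self):
        # "Read message, process message" until nothing is left
        while self._events:
            name, handler = self._events.popleft()
            handler(self, name)

def start_download(loop, name):
    loop.log.append('start ' + name)
    # Instead of waiting, register a follow-up event for when the IO "completes"
    loop.post(name, finish_download)

def finish_download(loop, name):
    loop.log.append('finish ' + name)

loop = ToyEventLoop()
loop.post('a.txt', start_download)
loop.post('b.txt', start_download)
loop.run()
print(loop.log)
```

Both downloads are started before either one finishes, even though everything runs in a single thread.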

asyncio

asyncio is a standard library module introduced in Python 3.4 that provides built-in support for asynchronous IO. A Hello world written with asyncio looks like this:

import asyncio

# Python 3.4-era syntax; @asyncio.coroutine was removed in Python 3.11
@asyncio.coroutine
def hello():
    print("Hello world!")
    # Call asyncio.sleep(1) asynchronously:
    r = yield from asyncio.sleep(1)
    print("Hello again!")

# Get the EventLoop:
loop = asyncio.get_event_loop()
# Run the coroutine
loop.run_until_complete(hello())
loop.close()

@asyncio.coroutine marks a generator as a coroutine, and then we hand the coroutine to the EventLoop to execute.

hello() first prints Hello world!. The yield from syntax then lets us conveniently call another generator. Since asyncio.sleep() is also a coroutine, the thread does not wait for asyncio.sleep(); it suspends hello() and goes on to the next iteration of the message loop. When asyncio.sleep() returns, the thread gets the return value (None in this case) from yield from and proceeds to the next statement.

Think of asyncio.sleep(1) as an IO operation that takes 1 second, during which the main thread does not wait but instead executes any other executable coroutine in the EventLoop, thus enabling concurrent execution.
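
The way yield from hands back a return value can be seen with plain generators, without asyncio at all. In this minimal sketch, inner and outer are hypothetical names; outer delegates to inner and then continues with the value inner returned.

```python
def inner():
    # A plain generator that yields once and then returns a value
    yield 'inner: working'
    return 'inner result'

def outer():
    # yield from runs inner() to completion, passing its yields through,
    # and evaluates to the value inner() returned
    result = yield from inner()
    yield 'outer: got %r' % result

steps = list(outer())
print(steps)
```

The caller iterating over outer() sees the values yielded by inner() as if outer() had yielded them itself, which is the mechanism the event loop relies on to suspend and resume a coroutine.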

Let’s try wrapping two coroutines in Task:

import threading
import asyncio

@asyncio.coroutine
def hello():
    print('Hello world! (%s)' % threading.current_thread())
    yield from asyncio.sleep(1)
    print('Hello again! (%s)' % threading.current_thread())

loop = asyncio.get_event_loop()
tasks = [hello(), hello()]
loop.run_until_complete(asyncio.wait(tasks))
loop.close()

Observe the execution:

Hello world! (<_MainThread(MainThread, started 140735195337472)>)
Hello world! (<_MainThread(MainThread, started 140735195337472)>)
(pause for about 1 second)
Hello again! (<_MainThread(MainThread, started 140735195337472)>)
Hello again! (<_MainThread(MainThread, started 140735195337472)>)

As you can see from the printed current thread name, both coroutines are executed concurrently by the same thread.

If asyncio.sleep() is replaced with a true IO operation, multiple coroutines can be executed concurrently by a single thread.

We use asyncio's asynchronous network connections to fetch the home pages of Sina, Sohu and 163:

import asyncio

@asyncio.coroutine
def wget(host):
    print('wget %s...' % host)
    connect = asyncio.open_connection(host, 80)
    reader, writer = yield from connect
    header = 'GET / HTTP/1.0\r\nHost: %s\r\n\r\n' % host
    writer.write(header.encode('utf-8'))
    yield from writer.drain()
    while True:
        line = yield from reader.readline()
        # Stop at the blank line that ends the headers, or if the connection closed
        if line in (b'\r\n', b''):
            break
        print('%s header > %s' % (host, line.decode('utf-8').rstrip()))
    # Ignore the body, close the socket
    writer.close()

loop = asyncio.get_event_loop()
tasks = [wget(host) for host in ['www.sina.com.cn', 'www.sohu.com', 'www.163.com']]
loop.run_until_complete(asyncio.wait(tasks))
loop.close()

The result is as follows:

wget www.sohu.com...
wget www.sina.com.cn...
wget www.163.com...
www.sohu.com header > HTTP/1.1 200 OK
www.sohu.com header > Content-Type: text/html
...
www.sina.com.cn header > HTTP/1.1 200 OK
www.sina.com.cn header > Date: Wed, 20 May 2015 04:56:33 GMT
...
www.163.com header > HTTP/1.0 302 Moved Temporarily
www.163.com header > Server: Cdn Cache Server V2.0
...

You can see that the three connections are handled concurrently by a single thread using coroutines.

async/await

With the @asyncio.coroutine decorator provided by asyncio, you can mark a generator as a coroutine and then use yield from inside it to call another coroutine asynchronously.

To simplify asynchronous IO code and make it easier to recognize, Python 3.5 introduced the new syntax async and await, which makes coroutine code more concise and readable. The older decorator was deprecated in Python 3.8 and removed in Python 3.11, so on current versions the new syntax is the one to use.

Note that async and await are new syntax for coroutines. To use them, just make two simple substitutions:

Replace @asyncio.coroutine with async;
Replace yield from with await.

Let’s compare the code from the previous section:

@asyncio.coroutine
def hello():
    print("Hello world!")
    r = yield from asyncio.sleep(1)
    print("Hello again!")

Rewrite it with the new syntax as follows:

async def hello():
    print("Hello world!")
    r = await asyncio.sleep(1)
    print("Hello again!")

The rest of the code stays the same.
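
For reference, here is a complete version of the two-coroutine example in the new syntax, as a minimal sketch. It assumes Python 3.7 or later for asyncio.run(), which takes the place of the explicit get_event_loop(), run_until_complete() and close() calls, and it uses asyncio.gather() to run the coroutines together. The events list, the name argument and the shortened 0.1 second sleep are illustrative choices, not part of the original example.

```python
import asyncio
import threading

# Collected (name, message, thread name) tuples, in the order they happen
events = []

async def hello(name):
    events.append((name, 'Hello world!', threading.current_thread().name))
    # Stand-in for a slow IO operation; the event loop runs other coroutines meanwhile
    await asyncio.sleep(0.1)
    events.append((name, 'Hello again!', threading.current_thread().name))

async def main():
    # Run both coroutines concurrently and wait for both to finish
    await asyncio.gather(hello('a'), hello('b'))

asyncio.run(main())
for event in events:
    print(event)
```

Both coroutines record Hello world! before either records Hello again!, and every entry reports the same thread name.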

Previous chapter: You already know the basics of Python

We'll build a small Python demo later on.

Reference

The Python tutorial by Liao Xuefeng

Python's official website
