This is the 14th day of my participation in the First Challenge 2022.

I had planned to write crawler case #25 directly, but it turned out to require knowledge of the queue module. I checked the “Snowballing Python” column, which I am writing in parallel, and found no post on the topic. So here is an extra article, which I have placed in the Python 120 Crawlers column.

The queue module

Before you start, you may want to open the official documentation and read it alongside this article.

The queue module in Python provides synchronized, thread-safe queue classes: a FIFO queue (Queue, first in, first out), a LIFO queue (LifoQueue, last in, first out), a priority queue (PriorityQueue, ordered by priority), and a simple unbounded FIFO queue (SimpleQueue, new in version 3.7).

First in, first out and last in, first out are basic data-structure concepts, so I will not expand on them here; searching for the keywords will turn up plenty of material.

Just remember that a stack is last in, first out, and a queue is first in, first out.

To get started with queues, begin with a few concepts:

  • Initialize a queue: create an empty queue;
  • Enqueue: add data to the queue;
  • Dequeue: remove the first item from the queue;
  • Destroy a queue: delete the queue and its data.

Initialize a queue

We can import the queue module directly in Python and create a queue:

import queue

q = queue.Queue(maxsize=5)
print(type(q))

The maxsize parameter is an integer that sets the maximum number of items the queue can hold. Once the queue reaches that size, further insertions block until an item is removed. The default value is 0, which means the queue length is unlimited. A negative value also means the queue length is unlimited.
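As a minimal sketch of how maxsize behaves, the snippet below uses an arbitrary capacity of 2 (an assumption made only for illustration) and put_nowait(), the non-blocking form of put(), so the script cannot hang.

```python
import queue

# Bounded queue: holds at most 2 items (arbitrary capacity chosen for this sketch).
bounded = queue.Queue(maxsize=2)
bounded.put_nowait("a")
bounded.put_nowait("b")
print(bounded.full())  # True: both slots are taken

try:
    bounded.put_nowait("c")  # no free slot, so queue.Full is raised immediately
    overflowed = False
except queue.Full:
    overflowed = True
print(overflowed)

# maxsize=0 (the default) and negative values both mean "no limit".
unbounded = queue.Queue(maxsize=-1)
for n in range(1000):
    unbounded.put_nowait(n)
print(unbounded.full())  # False: an unbounded queue never reports full
```

With a blocking put() instead of put_nowait(), the third insertion would wait until another thread removed an item.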

If you want to declare a different type of queue, use the following code:

import queue

# Last in, first out
q = queue.LifoQueue(maxsize=5)
# Priority queue
q = queue.PriorityQueue(maxsize=5)
# Simple FIFO queue with no size limit
q = queue.SimpleQueue()

The four queues differ as follows:

  1. queue.Queue(maxsize=0): First in, first out (FIFO); the first item to enter the queue is the first to leave.
  2. queue.LifoQueue(maxsize=0): Last in, first out (LIFO); the last item to enter the queue is the first to leave.
  3. queue.PriorityQueue(maxsize=0): Entries are compared with each other, and the entry with the smallest value leaves the queue first.
  4. queue.SimpleQueue: Similar to item 1, but a simple unbounded queue without some of the advanced methods.
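The differences in the list above can be seen in a short sketch that feeds the same three values into each class and records the order in which they come back. The helper name drain is hypothetical and not part of the queue module.

```python
import queue

def drain(q):
    """Remove every item from q and return them in retrieval order."""
    items = []
    while not q.empty():
        items.append(q.get())
    return items

items = [3, 1, 2]
results = {}
for cls in (queue.Queue, queue.LifoQueue, queue.PriorityQueue, queue.SimpleQueue):
    q = cls()
    for item in items:
        q.put(item)
    results[cls.__name__] = drain(q)

for name, order in results.items():
    print(name, order)
# Queue [3, 1, 2]
# LifoQueue [2, 1, 3]
# PriorityQueue [1, 2, 3]
# SimpleQueue [3, 1, 2]
```

Checking empty() before get() is safe here only because a single thread uses each queue.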

Adding queue data

To add data to a queue, call the put() method. Its signature is as follows:

Queue.put(item, block=True, timeout=None)

block defaults to True. If block is True and timeout is None (the default), put() makes the calling thread wait until a free slot is available, as in the following example:

# Do not run this code as-is: the third put() waits forever because the queue is full.
import queue

q = queue.Queue(maxsize=2)
q.put("Rubber", block=True)
q.put("Skin", block=True)
q.put("Brush", block=True)  # blocks here: no free slot and no timeout

print(q)

If timeout is a positive number, put() blocks for at most timeout seconds and raises the queue.Full exception if no free slot becomes available in that time. If block is False, the item is put into the queue if a free slot is immediately available; otherwise queue.Full is raised (timeout is ignored in this case).

import queue

q = queue.Queue(maxsize=2)
q.put("Rubber", block=True)
q.put("Skin", block=True)
# The queue is full: wait up to 3 seconds for a free slot, then raise queue.Full
q.put("Brush", block=True, timeout=3)

print(q)

Run the code and, after a 3-second wait, the following exception is raised.

    raise Full
queue.Full
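In practice the exception is usually caught rather than left to end the program. The following is a minimal sketch of both failure modes described above, assuming a capacity of 1 and a 0.2-second timeout chosen only to keep the demo short.

```python
import queue
import time

q = queue.Queue(maxsize=1)
q.put("first")

# block=False: fail immediately when the queue is full.
try:
    q.put("second", block=False)
    immediate = "stored"
except queue.Full:
    immediate = "full"

# timeout=0.2: wait up to 0.2 seconds for a free slot, then give up.
start = time.monotonic()
try:
    q.put("second", timeout=0.2)
    delayed = "stored"
except queue.Full:
    delayed = "full"
waited = time.monotonic() - start

print(immediate, delayed, round(waited, 1))
```

Both attempts end with queue.Full; the only difference is how long the caller waits before finding out.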

Getting values from the queue

The method signature is as follows:

Queue.get(block=True, timeout=None)

This method removes and returns the item at the head of the queue. If block is True and timeout is None (the default), it blocks if necessary until an item is available. If timeout is a positive number, it blocks for at most timeout seconds and raises the queue.Empty exception if no item becomes available in that time. If block is False, it returns an item if one is immediately available; otherwise it raises queue.Empty (timeout is ignored in this case).

import queue

q = queue.Queue(maxsize=4)
q.put("Rubber", block=True)
q.put("Skin", block=True)
q.put("Brush", block=True, timeout=3)

# Items come back in the order they were put (FIFO)
item1 = q.get()
item2 = q.get()
item3 = q.get()
print(item1, item2, item3)
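The empty-queue cases can be sketched the same way. This is a small illustration, assuming a single item and a 0.1-second timeout picked only so the demo finishes quickly.

```python
import queue

q = queue.Queue()
q.put("only item")

first = q.get()  # returns at once because an item is waiting

try:
    q.get(block=False)  # the queue is now empty: raises queue.Empty immediately
    second = "got an item"
except queue.Empty:
    second = "empty"

try:
    q.get(timeout=0.1)  # waits up to 0.1 seconds, then raises queue.Empty
    third = "got an item"
except queue.Empty:
    third = "empty"

print(first, second, third)  # only item empty empty
```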

Other common methods

The basic use of queues is as simple as shown above; the rest comes down to a handful of methods.

  • q.qsize(): returns the approximate size of the queue;
  • q.empty(): checks whether the queue is empty;
  • q.full(): checks whether the queue is full;
  • q.get_nowait(): equivalent to q.get(False); see the section on getting values above;
  • q.put_nowait(item): equivalent to q.put(item, False);
  • q.task_done() and q.join(): read on for more on these two methods.
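A short single-threaded sketch of the inspection methods listed above, using an arbitrary capacity of 2. In multithreaded code the values returned by qsize(), empty(), and full() are only snapshots, since another thread may change the queue right after the call.

```python
import queue

q = queue.Queue(maxsize=2)
state_before = (q.qsize(), q.empty(), q.full())  # (0, True, False)

q.put_nowait("a")
q.put_nowait("b")
state_full = (q.qsize(), q.empty(), q.full())    # (2, False, True)

item = q.get_nowait()                            # "a": the first item put
state_after = (q.qsize(), q.empty(), q.full())   # (1, False, False)

print(state_before, state_full, item, state_after)
```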

The q.task_done() and q.join() methods

SimpleQueue does not support task_done and join methods.

Before you start, read the descriptions of these two methods:

  • task_done: indicates that an item taken from the queue has been processed, i.e., after each get fetches an item, call task_done to report that the work on it is finished. Raises a ValueError exception if it is called more times than there were items placed in the queue;
  • join: blocks until every item in the queue has been fetched and processed. The count of unfinished tasks goes up whenever an item is added to the queue and goes down whenever task_done is called; once it reaches zero, join stops blocking. It can be read as waiting until all queued work is done before doing something else.

If the text is hard to follow, look at the code. The following program blocks forever at join, because two items are put into the queue but task_done is called only once.

import queue

q = queue.Queue(3)
q.put('oak', block=True, timeout=5)
q.put_nowait('skin')
q.task_done()  # only one of the two items is marked as done
print(q.get())
q.join()  # blocks forever: one task is still unfinished

Unblocking is as simple as making one task_done call for each put. (In real code the consumer normally calls task_done after each get.)

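import queue

q = queue.Queue(3)
q.put('oak', block=True, timeout=5)
q.task_done()  # marks the first item as done
q.put_nowait('skin')
q.task_done()  # marks the second item as done

print(q.get())
q.join()  # returns immediately: no unfinished tasks remain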

If you still have trouble understanding, you can refer to the following examples:

import queue

q = queue.Queue()
q.put('oak')
q.put('skin')
q.put('clean')
for i in range(3):
    print(q.get())
    # If task_done is not called, join() blocks until task_done reports that the data has been processed.
    # Uncomment the next line to let join() return.
    # q.task_done()
q.join()
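For comparison, here is a minimal sketch of the balanced pattern, one task_done() per get(), which also shows the ValueError mentioned earlier when task_done() is called one time too many.

```python
import queue

q = queue.Queue()
for word in ("oak", "skin", "clean"):
    q.put(word)

processed = []
while not q.empty():
    processed.append(q.get())
    q.task_done()  # one task_done() per get(): marks this item as processed

q.join()  # returns immediately: every item that was put has been marked done
print(processed)  # ['oak', 'skin', 'clean']

try:
    q.task_done()  # one call more than the number of items put
    extra_call = "accepted"
except ValueError:
    extra_call = "ValueError"
print(extra_call)  # ValueError
```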

The official documentation also provides a multithreading example for reference; it is reproduced here with added comments.

import threading, queue

# Initialize an empty queue with no limit on length
q = queue.Queue()

def worker():
    while True:
        item = q.get()
        print(f'Executing: {item}')
        print(f'Completed: {item}')
        # Signal that this task is complete
        q.task_done()

# Start the worker thread (daemon=True lets the program exit while it is still waiting)
threading.Thread(target=worker, daemon=True).start()

# Send 30 task requests to the worker
for item in range(30):
    q.put(item)

print('All task requests sent\n', end='')

# Block until all tasks are complete
q.join()
print('All tasks completed')

Applying queues to crawlers

Because a queue is a common way for threads to communicate and comes with its own locking, it is a natural fit for crawlers, which are generally built on the producer-consumer pattern.

In the following example, a producer (Eraser) makes one set of courses every 5 seconds. Two consumers watch for new courses and buy each set as soon as Eraser makes it:

from queue import Queue
import time
import threading

# Initialize a queue (maxsize=0 means no size limit)
q = Queue(maxsize=0)


# Producer
def producer(name):
    course_num = 1
    while True:
        q.put('course set {}'.format(course_num))
        print("{} made course set {}".format(name, course_num))
        course_num += 1
        time.sleep(5)


# Consumer
def consumer(name):
    while True:
        print('{} bought {}'.format(name, q.get()))
        time.sleep(1)
        q.task_done()


# Start three threads (this demo runs indefinitely; stop the process manually to end it)
t1 = threading.Thread(target=producer, args=('Eraser',))
t2 = threading.Thread(target=consumer, args=('CSDN account A',))
t3 = threading.Thread(target=consumer, args=('CSDN account B',))

t1.start()
t2.start()
t3.start()
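The same pattern carries over to a crawler: a producer enqueues URLs and worker threads fetch them. The following is only a sketch under stated assumptions: fake_fetch is a hypothetical stand-in for a real HTTP request, the example.com URLs are placeholders, and the None sentinel used to stop the workers is one common convention rather than something the queue module requires.

```python
import queue
import threading

NUM_WORKERS = 2
STOP = None  # sentinel telling a worker to exit (a convention assumed for this sketch)

url_queue = queue.Queue(maxsize=10)
results = []
results_lock = threading.Lock()

def fake_fetch(url):
    """Hypothetical stand-in for a real HTTP request."""
    return "content of " + url

def worker():
    while True:
        url = url_queue.get()
        try:
            if url is STOP:
                return
            page = fake_fetch(url)
            with results_lock:
                results.append((url, page))
        finally:
            url_queue.task_done()  # runs for normal items and for the sentinel

threads = [threading.Thread(target=worker) for _ in range(NUM_WORKERS)]
for t in threads:
    t.start()

# Producer: enqueue the pages to crawl, then one sentinel per worker.
for page_no in range(1, 6):
    url_queue.put("https://example.com/list/{}".format(page_no))
for _ in range(NUM_WORKERS):
    url_queue.put(STOP)

url_queue.join()  # wait until every queued entry has been marked done
for t in threads:
    t.join()

print(len(results))  # 5
```

Unlike the endless demo above, this version terminates on its own, because each worker exits after it receives a sentinel.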

Priority queue

Next, the priority queue. Items leave a priority queue in priority order, so each item must carry its priority when it is put in, typically as a (priority, data) tuple; the entry with the smallest priority value is retrieved first.

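import queue

q = queue.PriorityQueue(5)
# Each entry is a (priority, data) tuple
q.put((5, 'dream'))
q.put((4, 'want to'))
q.put((4, 'oak'))
q.put((3, 'skin'))
q.put((2, 'clean'))


# Entries come out smallest first; ties on priority are broken by comparing the data
print(q.get())
print(q.get())
print(q.get())
print(q.get())
print(q.get())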

The output order is as follows:

(2, 'clean')
(3, 'skin')
(4, 'oak')
(4, 'want to')
(5, 'dream')
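Tuples work only while the data part can be compared whenever two priorities tie. For data that cannot be ordered, such as dictionaries, the official documentation suggests wrapping each entry in a class that compares the priority alone. The sketch below follows that idea; the url values are made up for illustration.

```python
import queue
from dataclasses import dataclass, field
from typing import Any

@dataclass(order=True)
class PrioritizedItem:
    priority: int
    item: Any = field(compare=False)  # excluded from ordering comparisons

q = queue.PriorityQueue()
q.put(PrioritizedItem(2, {"url": "b"}))
q.put(PrioritizedItem(1, {"url": "a"}))
q.put(PrioritizedItem(2, {"url": "c"}))  # same priority as "b"; dicts are never compared

order = []
while not q.empty():
    entry = q.get()
    order.append(entry.priority)
print(order)  # [1, 2, 2]
```

The relative order of the two priority-2 entries is not guaranteed, which is why the sketch records only the priorities.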

Closing notes

This article is part of the 120 Crawlers column, but it does not count as one of the cases. Starting from the next article, we will combine the threading module and the queue module to implement multithreaded crawlers.

Code repository for the 120 crawler cases: codechina.csdn.net/hihell/pyth… Feel free to follow it or give it a Star.

Since you are here, why not leave a comment, a like, or a bookmark?

Today is day 207/365 of continuous writing. You are welcome to follow me, like, comment, and add this article to your favorites.