Small knowledge, big challenge! This article is participating in the creation activity of “Essential Tips for Programmers”.

One: ADBAPI in Twisted

Commit () and execute() of Pymysql commit data synchronously to the database, due to the framework of the parsing and asynchronous multithreading, so the speed of parsing data is far faster than the data write database. If data is written too slowly, database write will be blocked, affecting database write efficiency. The Twisted asynchronous IO framework is used to write data asynchronously. Data is written asynchronously by multiple threads, which improves the data writing speed.

1.1 Two main methods

Adbapi. ConnectionPool:

Create a database connection pool object that contains multiple connection objects, each working in a separate thread. Adbapi simply provides a programming framework for asynchronous database access, while internally allowing libraries like MySQLdb to access the database.

Dbpool. RunInteraction (do_insert, item) :

When do_INSERT is called asynchronously, dbPool selects a connection object from the connection pool and calls insert_DB in a separate thread, where item is passed to the second argument of DO_INSERT, and the first argument passed to do_INSERT is a Transaction object. The interface is similar to that of Cursor objects. You can call the execute method to execute SQL statements. After do_INSERT is executed, the connection object automatically calls the COMMIT method

1.2 Example

from twisted.enterprise import adbapi
Copy the code
Initialize database connection pool (thread pool)
Parameter 1: mysql driver
Mysql > connect to mysql
dbpool = adbapi.ConnectionPool('pymysql', **params)
Copy the code
Parameter 1: insert_db, the function to be executed in the asynchronous task;
Parameter 2: The parameter passed to the function insert_db
query = self.dbpool.runInteraction(self.do_insert, item)
Copy the code
After execute(), there is no need to commit().
def do_insert(self, cursor, item) :
    insert_sql = """ insert into qa_sample( need_id, need_question_uptime, need_title, need_title_describe, need_answer_uptime, need_answer) values (%s, %s, %s, %s, %s, %s) """
    params = (item['need_id'],
              item['need_question_uptime'],
              item['need_title'],
              item['need_title_describe'],
              item['need_answer_uptime'],
              item['need_answer'])
    cursor.execute(insert_sql, params)
Copy the code

< span style = “box-sizing: border-box; word-break: inherit! Important

# -*- coding: utf-8 -*-
from twisted.enterprise import adbapi
import pymysql
 
# Define your item pipelines here
#
# Don't forget to add your pipeline to the ITEM_PIPELINES setting
# See: https://docs.scrapy.org/en/latest/topics/item-pipeline.html
 
 
class QaSpiderPipeline(object) :
    def process_item(self, item, spider) :
        return item
 
class MysqlTwistedPipeline(object) :
    def __init__(self, dbpool) :
        self.dbpool = dbpool
 
    @classmethod
    def from_settings(cls, settings) :
        dbparams = dict(
            host=settings['MYSQL_HOST'],
            db=settings['MYSQL_DBNAME'],
            user=settings['MYSQL_USER'],
            passwd=settings['MYSQL_PASSWORD'],
            charset='utf8',
            cursorclass=pymysql.cursors.DictCursor,
            use_unicode=True
        )
        dbpool = adbapi.ConnectionPool('pymysql', **dbparams)
        return cls(dbpool)
 
    def process_item(self, item, spider) :
        query = self.dbpool.runInteraction(self.do_insert, item)
 
    def do_insert(self, cursor, item) :
        insert_sql = """ insert into qa_sample( need_id, need_question_uptime, need_title, need_title_describe, need_answer_uptime, need_answer) values (%s, %s, %s, %s, %s, %s) """
        params = (item['need_id'],
                  item['need_question_uptime'],
                  item['need_title'],
                  item['need_title_describe'],
                  item['need_answer_uptime'],
                  item['need_answer'])
        cursor.execute(insert_sql, params)
Copy the code