Background

To present the respage01 data better, I want to compute the set difference between each day's data and all earlier data, so that I can see whether stores were added or removed on that day. I also want to count the total number of stores across all historical data.

Using Redis set operations and a Django interface

These requirements are easy to meet with native Redis commands, but we still want to expose the results through an interface. That means the scheduled crawl script also has to change: it must synchronize each day's calculation results to MySQL.

Update redisToMysql.py

* trainunion stores the union of all data up to and including the current day
* trainuniontmp stores the union of all data except the current day's
* trainnew stores the stores added on the current day
* traingone stores the stores that have disappeared as of the current day

127.0.0.1:6379> KEYS *
 1) "trainunion"
 2) "trainuniontmp"
 3) "trainnew"
 4) "traingone"
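The relationship between the four keys can be illustrated with plain Python sets, which behave like Redis sets under union and difference. This is only a sketch: the store IDs and the dates in the key names are made-up sample data, and no Redis server is involved.

```python
# Hypothetical sample data: one set of store IDs per crawl day.
daily = {
    "train_2019_05_01": {"a", "b", "c"},
    "train_2019_05_02": {"a", "b", "d"},
    "train_2019_05_03": {"a", "d", "e"},  # treated as "today"
}
today_key = "train_2019_05_03"

# trainunion: union of every day's set, today included (like SUNIONSTORE).
trainunion = set().union(*daily.values())

# trainuniontmp: union of every day's set except today's.
trainuniontmp = set().union(*(s for k, s in daily.items() if k != today_key))

# trainnew: in today's set but never seen before (like SDIFFSTORE today uniontmp).
trainnew = daily[today_key] - trainuniontmp

# traingone: seen at some point but absent today (like SDIFFSTORE union today).
traingone = trainunion - daily[today_key]

print(sorted(trainnew))   # ['e']
print(sorted(traingone))  # ['b', 'c']
```

Note that store "c" already vanished on the second day but still shows up in traingone on the third: the difference is taken against the whole history, so it lists every store ever seen that is missing today, not only the ones that vanished today.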

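import time  # used by getNewDiffData and getGoneDiffData

# `tool`, getKeyList and getKeyListExceptToday are defined elsewhere in the project.


def getUnionData(re):
    """
    Store the union of all daily sets in trainunion, and the union of all
    daily sets except today's in trainuniontmp.
    :param re: Redis client
    :return: None
    """
    keyList = getKeyList(re)
    re.sunionstore(tool.getFileKey() + 'union', keyList)
    # Drop any stale temporary union first. DEL on a missing key is a no-op,
    # so no try/except is needed around it.
    re.delete(tool.getFileKey() + 'uniontmp')
    re.sunionstore(tool.getFileKey() + 'uniontmp', getKeyListExceptToday(re))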


def getNewDiffData(re):
    """
    Store in trainnew the stores that appear today but on no earlier day.
    :param re: Redis client
    :return: None
    """
    today = time.strftime("%Y_%m_%d")
    todayKey = tool.getFileKey() + '_' + today
    # Only recompute once today's crawl data exists.
    if todayKey in getKeyList(re):
        re.delete(tool.getFileKey() + 'new')
        re.sdiffstore(tool.getFileKey() + 'new', todayKey, tool.getFileKey() + 'uniontmp')


def getGoneDiffData(re):
    """
    Store in traingone the stores that were seen before but are missing today.
    :param re: Redis client
    :return: None
    """
    today = time.strftime("%Y_%m_%d")
    todayKey = tool.getFileKey() + '_' + today
    # Only recompute once today's crawl data exists.
    if todayKey in getKeyList(re):
        re.delete(tool.getFileKey() + 'gone')
        re.sdiffstore(tool.getFileKey() + 'gone', tool.getFileKey() + 'union', todayKey)
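The functions above rely on a helper, getKeyListExceptToday, that is not shown in the post. A minimal sketch of the filtering it presumably performs is given below. The function name keys_except_today, its parameters, and the assumption that daily keys look like train_YYYY_MM_DD are all illustrative, not the author's actual code; it also assumes the keys are already str values (for example from a client created with decode_responses=True).

```python
import time


def keys_except_today(keys, prefix="train", today=None):
    """Return the dated daily keys that do not belong to today's crawl.

    Hypothetical helper: assumes daily keys are named <prefix>_YYYY_MM_DD.
    """
    if today is None:
        today = time.strftime("%Y_%m_%d")
    today_key = prefix + "_" + today
    # Keep only dated keys, which skips derived keys such as "trainunion"
    # or "trainnew", then drop today's key.
    dated = [k for k in keys if k.startswith(prefix + "_")]
    return [k for k in dated if k != today_key]


keys = ["trainunion", "train_2019_05_01", "train_2019_05_02", "trainnew"]
print(keys_except_today(keys, today="2019_05_02"))  # ['train_2019_05_01']
```

Filtering out the derived keys matters here: if trainunion or traingone were fed back into the union, removed stores could never drop out of the history.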

New interfaces

To meet the requirements, three new interfaces are added:
* one that returns the stores added today, i.e. the difference between today's data and all earlier data
* one that returns the stores that have disappeared, i.e. the difference between all earlier data and today's data
* one that returns the union of all data up to the current day
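The post does not show the Django views themselves, so the following is only a sketch of the kind of payload such an interface might return. It is written as a plain function so it runs without Django; the function name build_payload, the response fields, and the sample store IDs are all assumptions.

```python
import json


def build_payload(name, members):
    """Shape one set as a JSON-serialisable dict (hypothetical response format)."""
    items = sorted(members)  # sets are unordered, so sort for stable output
    return {"name": name, "count": len(items), "items": items}


# Hypothetical members, as they might come back from SMEMBERS on trainnew.
new_stores = {"store_17", "store_03"}
body = json.dumps(build_payload("trainnew", new_stores))
print(body)
```

In a Django view, the same dict could be handed to django.http.JsonResponse instead of json.dumps; one view per key (trainnew, traingone, trainunion) would cover the three interfaces.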

The current status is as follows:

Basic version in place; optimization to follow

Judging from the crawled data, the number of disappearing stores fluctuates a lot. I suspect this is caused by crawl errors, which will take time to confirm and fine-tune. Over the few days of data collected so far, the overall fluctuation is very small, so the dashboard could serve as a monitor: a sudden large change could be a signal to take into account when making investment decisions.
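The monitoring idea can be sketched as a simple ratio check: flag a day when the number of added or removed stores is large relative to the total. The function name and the 5% default threshold are arbitrary assumptions for illustration, and the check would only become meaningful once the crawl errors mentioned above are ruled out.

```python
def is_sudden_change(changed, total, threshold=0.05):
    """Return True when the changed share of all stores exceeds the threshold.

    changed:   number of stores added or removed on the day (e.g. SCARD traingone)
    total:     number of stores in the whole history (e.g. SCARD trainunion)
    threshold: hypothetical cut-off, as a fraction of the total
    """
    if total <= 0:
        return False  # nothing to compare against yet
    return changed / total > threshold


print(is_sudden_change(3, 1000))   # False
print(is_sudden_change(80, 1000))  # True
```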

Roubo's dashboard