First, I need to write a crawler to capture the Double Color Ball lottery data. I found this website: datachart.500.com/ssq/. By default, it only displays the winning numbers of the last 30 issues.

To get all the winning numbers on the website, use the history page at datachart.500.com/ssq/history… This is what it looks like when you open it in your browser. The Chinese is garbled, but that doesn't matter, since we only need the numbers. Let's start fetching the data. The code is as follows:

import requests
from lxml import etree

url = "http://datachart.500.com/ssq/history/newinc/history.php?start=00001&end=18081"
response = requests.get(url, timeout=10)
selector = etree.HTML(response.text)
# Each row with class "t_tr1" holds one draw
for row in selector.xpath('//tr[@class="t_tr1"]'):
    cells = row.xpath('td/text()')
    issue = cells[0]   # first cell: the draw's label
    red = cells[1:7]   # next six cells: red ball numbers
    blue = cells[7]    # eighth cell: blue ball number
    print(issue, red, blue)

The result is as follows. The winning numbers were obtained successfully.
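
Because the crawler picks numbers by fixed cell positions, a small sanity check can catch a row that does not match the expected layout. The following is a minimal sketch, not part of the original crawler: `parse_row` is a hypothetical helper, and it assumes each row yields a label, six distinct red numbers (01 to 33), and one blue number (01 to 16), in that order. The sample row is made up for illustration.

```python
RED_MAX = 33   # red balls are numbered 01-33
BLUE_MAX = 16  # blue balls are numbered 01-16


def parse_row(cells):
    """Split one row's cell texts into (issue, reds, blue).

    `cells` is assumed to be the list returned by row.xpath('td/text()'):
    first a label, then six red numbers, then one blue number.
    Raises ValueError if the row does not match that layout.
    """
    if len(cells) < 8:
        raise ValueError("expected at least 8 cells, got %d" % len(cells))
    issue = cells[0].strip()
    reds = [int(c) for c in cells[1:7]]
    blue = int(cells[7])
    if len(set(reds)) != 6 or not all(1 <= r <= RED_MAX for r in reds):
        raise ValueError("bad red numbers: %r" % reds)
    if not 1 <= blue <= BLUE_MAX:
        raise ValueError("bad blue number: %r" % blue)
    return issue, reds, blue


# Made-up sample row for illustration, not real draw data.
sample = ["18081", "01", "08", "14", "20", "22", "26", "12", "extra"]
print(parse_row(sample))
```

Calling it inside the loop, in place of the raw slices, would stop the run at the first malformed row instead of silently counting bad values.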

We first put the winning red ball numbers and the winning blue ball numbers into two lists and convert them to Series. The code is as follows:

import requests
from lxml import etree
from pandas import Series

url = "http://datachart.500.com/ssq/history/newinc/history.php?start=00001&end=18081"
response = requests.get(url, timeout=10)
selector = etree.HTML(response.text)
reds = []
blues = []
for row in selector.xpath('//tr[@class="t_tr1"]'):
    cells = row.xpath('td/text()')
    reds.extend(cells[1:7])   # six red ball numbers per draw
    blues.append(cells[7])    # one blue ball number per draw

# value_counts() tallies how often each number appears, most frequent first
s_blues = Series(blues).value_counts()
s_reds = Series(reds).value_counts()
print(s_blues)

Printing the blue ball counts gives the result below. On the left is the blue ball number and on the right is the number of times it has appeared. The red ball counts have the same form. That isn't intuitive enough, though, so let's draw it with Matplotlib. This code is all we need:

import matplotlib.pyplot as plt

labels = s_blues.index.tolist()   # blue ball numbers
sizes = s_blues.values.tolist()   # occurrences of each number
rect = plt.bar(range(len(sizes)), sizes, tick_label=labels)
plt.show()

The result is as follows. This makes it easy to see which number appears most frequently, but the chart does not show the exact number of occurrences. We need to add a function to display them.

def autolabel(rects):
    # Write each bar's height just above the bar, centered horizontally
    for rect in rects:
        height = rect.get_height()
        plt.text(rect.get_x() + rect.get_width() / 2, 1.02 * height,
                 "%s" % height, ha="center")


labels = s_blues.index.tolist()
sizes = s_blues.values.tolist()
rect = plt.bar(range(len(sizes)), sizes, tick_label=labels)
autolabel(rect)
plt.show()

The result is shown below. That makes it a little clearer. The chart for the red balls is as follows, and the method is similar.
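
Since value_counts() orders numbers by frequency, the bars appear from most to least frequent rather than in ball number order. If ordering by ball number is preferred, the labels and sizes can be sorted before plotting. Here is a minimal sketch that uses collections.Counter from the standard library as a stand-in for value_counts(); `counts_by_number` is a hypothetical helper and the sample data is made up.

```python
from collections import Counter


def counts_by_number(balls):
    """Return (labels, sizes) sorted by ball number instead of by frequency."""
    counts = Counter(balls)
    labels = sorted(counts, key=int)
    sizes = [counts[label] for label in labels]
    return labels, sizes


# Made-up sample data for illustration, not real draw results.
sample_blues = ["12", "03", "12", "16", "03", "12"]
labels, sizes = counts_by_number(sample_blues)
print(labels)  # ['03', '12', '16']
print(sizes)   # [2, 3, 1]
```

The resulting labels and sizes can be passed to plt.bar exactly as before. With pandas, calling sort_index() on the counted Series should give the same ordering, because the numbers are zero-padded strings.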

Anyway, the bottom line is that the most frequently drawn numbers are

01 08 14 20 22 26 + 12
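
A pick like the one above can be read off the counts programmatically instead of from the charts. The following is a rough sketch under stated assumptions: `top_picks` is a hypothetical helper, Counter.most_common stands in for value_counts(), and the sample lists are made up. When counts are tied, most_common keeps the number that was seen first.

```python
from collections import Counter


def top_picks(reds, blues, n_red=6, n_blue=1):
    """Return the n most frequent red and blue numbers, each sorted ascending."""
    top_reds = [ball for ball, _ in Counter(reds).most_common(n_red)]
    top_blues = [ball for ball, _ in Counter(blues).most_common(n_blue)]
    return sorted(top_reds), sorted(top_blues)


# Made-up sample data for illustration, not real draw results.
sample_reds = ["01", "08", "01", "14", "08", "01", "20"]
sample_blues = ["12", "05", "12"]
print(top_picks(sample_reds, sample_blues, n_red=2))
```

With the Series from the article, s_reds.head(6) and s_blues.head(1) should give the same information, since value_counts() already sorts by frequency.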

This is for reference only. Please remember to contact me if anyone wins. The complete code is as follows:

import requests
from lxml import etree
import matplotlib.pyplot as plt
from pandas import Series

url = "http://datachart.500.com/ssq/history/newinc/history.php?start=00001&end=18081"
response = requests.get(url, timeout=10)
selector = etree.HTML(response.text)
reds = []
blues = []
# Each row with class "t_tr1" holds one draw
for row in selector.xpath('//tr[@class="t_tr1"]'):
    cells = row.xpath('td/text()')
    reds.extend(cells[1:7])   # six red ball numbers per draw
    blues.append(cells[7])    # one blue ball number per draw

# value_counts() tallies how often each number appears, most frequent first
s_blues = Series(blues).value_counts()
s_reds = Series(reds).value_counts()


def autolabel(rects):
    # Write each bar's height just above the bar, centered horizontally
    for rect in rects:
        height = rect.get_height()
        plt.text(rect.get_x() + rect.get_width() / 2, 1.02 * height,
                 "%s" % height, ha="center")


# Bar chart of blue ball frequencies
labels = s_blues.index.tolist()
sizes = s_blues.values.tolist()
rect = plt.bar(range(len(sizes)), sizes, tick_label=labels)
autolabel(rect)
plt.show()

# Bar chart of red ball frequencies
labels2 = s_reds.index.tolist()
sizes2 = s_reds.values.tolist()
rect2 = plt.bar(range(len(sizes2)), sizes2, tick_label=labels2)
autolabel(rect2)
plt.show()



The code is on GitHub at github.com/dangsh/hive… If you are interested in crawlers, please check out my other articles or the Hive project on GitHub. I hope it helps.