The release of the iPhone Xs on September 13 was big news in the mobile world, and its price set a new record for phone pricing. After watching the launch event, I suspect many people were in a mood like this:

I used to have an iPhone 6 and was recently ready for a new phone. With money tight, I can't afford to switch to another iPhone, so I'm downgrading my spending and heading for Android.

With a budget of 1,500 yuan, you can’t even buy a used iPhone, but there are plenty of options on Android. In this article, we’ll take a look at how to use data analytics to buy a mobile phone.

Analysis methods

The idea is simple: crawl the data for every mobile phone on Jingdong Mall, filter out the phones that meet the configuration and price requirements, and pick the most cost-effective one from what remains. Drawn as a flow chart, it looks something like this:

Crawl data

The first step is to crawl the data for every mobile phone on sale at Jingdong Mall. We mainly care about the price and the configuration information, which appear on the product page as shown in the following two images:

I wrote code to crawl the price and configuration information of every phone. The core of the crawler is as follows:

import json
import re

import pymongo
import requests
from lxml import etree

# DB (the MongoDB database name) and fix_url() belong to the full source and are not shown here.

# Get the price of a phone item
def get_price(skuid):
    url = "https://c0.3.cn/stock?skuId=" + str(skuid) + "&area=1_72_4137_0&venderId=1000004123&cat=9987,653,655&buyNum=1&choseSuitSkuIds=&extraParam={%22originid%22:%221%22}&ch=1&fqsp=0&pduid=15379228074621272760279&pdpin=&detailedAdd=null&callback=jQuery3285040"
    r = requests.get(url, verify=False)
    content = r.content.decode('GBK')
    # The response is JSON wrapped in a jQuery callback, so strip the wrapper first
    matched = re.search(r'jQuery\d+\((.*)\)', content, re.M)
    if matched:
        data = json.loads(matched.group(1))
        price = float(data["stock"]["jdPrice"]["p"])
        return price
    return 0

# Get the phone's configuration information
def get_item(skuid, url):
    price = get_price(skuid)
    r = requests.get(url, verify=False)
    content = r.content
    root = etree.HTML(content)
    nodes = root.xpath('.//div[@class="Ptable"]/div[@class="Ptable-item"]')
    params = {"price": price, "skuid": skuid}
    for node in nodes:
        text_nodes = node.xpath('./dl')[0]
        k = ""
        v = ""
        # Each <dt> holds a parameter name; the following <dd> without a class holds its value
        for text_node in text_nodes:
            if text_node.tag == "dt":
                k = text_node.text
            elif text_node.tag == "dd" and "class" not in text_node.attrib:
                v = text_node.text
                params[k] = v
    return params

# Get all mobile phone information in a page
def get_cellphone(page):
    url = "https://list.jd.com/list.html?cat=9987,653,655&page={}&sort=sort_rank_asc&trans=1&JL=6_0_0&ms=4#J_main".format(page)
    r = requests.get(url, verify=False)
    content = r.content.decode("utf-8")
    root = etree.HTML(content)
    cell_nodes = root.xpath('.//div[@class="p-img"]/a')
    client = pymongo.MongoClient()
    db = client[DB]
    for node in cell_nodes:
        item_url = fix_url(node.attrib["href"])
        matched = re.search(r'item\.jd\.com/(\d+)\.html', item_url)
        if not matched:
            continue
        skuid = int(matched.group(1))
        # Skip items that have already been saved
        saved = db.items.count_documents({"skuid": skuid})
        if saved > 0:
            print(saved)
            continue
        item = get_item(skuid, item_url)
        # The result is stored in MongoDB
        db.items.insert_one(item)

Note that get_price and get_item fetch their data from two different URLs: the configuration information can be parsed directly from the item page, while the price has to be retrieved through a separate Ajax request. All crawled data is stored in MongoDB.
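
The Ajax response that get_price handles is JSONP, that is, JSON wrapped in a jQuery callback. Here is a minimal sketch of just the unwrapping step. The sample body is made up to mirror the fields get_price reads, and unwrap_jsonp is a hypothetical helper name, not part of the original crawler.

```python
import json
import re


def unwrap_jsonp(body):
    """Strip a jQuery-style callback wrapper and return the decoded JSON payload.

    Returns None when the body does not look like a jQuery callback.
    """
    matched = re.search(r"jQuery\d+\((.*)\)", body, re.S)
    if not matched:
        return None
    return json.loads(matched.group(1))


# Made-up response body, shaped like the fields get_price reads.
sample = 'jQuery3285040({"stock": {"jdPrice": {"p": "1299.00"}}})'
data = unwrap_jsonp(sample)
price = float(data["stock"]["jdPrice"]["p"])
print(price)
```

Returning None for an unexpected body lets the caller decide what to do, whereas get_price above simply falls back to a price of 0.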

Filter the data

More than 4,700 phones from 70 brands had complete information. A word cloud of these brands looks like this:

The mobile phone configuration mainly includes the following parameters

  • Dual-SIM dual-standby support
  • Body material
  • CPU model
  • Memory size
  • Storage capacity
  • Battery capacity
  • Screen material
  • Screen size
  • Screen resolution
  • Camera

I mainly use my phone to read books, browse Zhihu and WeChat, and shop online. So when buying a new phone, I care most about speed, capacity and standby time rather than the camera or the screen material. With that in mind, I set the following conditions when filtering the data:

  • The CPU is a Qualcomm Snapdragon
  • The memory size is at least 6GB
  • The storage capacity is at least 64GB
  • The battery capacity is at least 3000mAh
  • It must support dual-SIM dual-standby
  • The price is at most 1,500 yuan

The code for filtering data is as follows

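import pandas as pd
import pymongo

# DB and preprocess() are defined elsewhere in the full source
client = pymongo.MongoClient()
db = client[DB]
items = db.items.find({})
result = preprocess(items)
df = pd.DataFrame(result)
# Combine the conditions into one boolean mask instead of chaining df[...][...]
mask = ((df.cpu_brand == "Snapdragon") & (df.battery_cap >= 3000) & (df.rom >= 64)
        & (df.ram >= 6) & (df.dual_sim == True) & (df.price <= 1500))
df_res = df[mask]
print(df_res[["brand", "model", "color", "cpu_brand", "cpu_freq", "cpu_core", "cpu_model", "rom", "ram", "battery_cap", "price"]].sort_values(by="price"))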

First read the data from MongoDB, then create a DataFrame and select the data in the DataFrame according to the above criteria. The last line of code prints out the selected phones and sorts them in order of price from lowest to highest.
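
The preprocess function called above is not shown in this article. As a rough idea of what such a step might involve, the sketch below turns raw spec strings into numbers that can be compared against the thresholds. The raw key names ("RAM", "ROM", "battery capacity"), the sample values, and the helpers parse_number and preprocess_item are all assumptions for illustration; the actual crawled keys and the real preprocess may look quite different.

```python
import re


def parse_number(text):
    """Return the first number found in a spec string, or None if there is none."""
    if not text:
        return None
    matched = re.search(r"\d+(?:\.\d+)?", text)
    return float(matched.group(0)) if matched else None


def preprocess_item(raw):
    """Map one raw crawled record to numeric fields (raw key names are hypothetical)."""
    return {
        "skuid": raw.get("skuid"),
        "price": raw.get("price"),
        "ram": parse_number(raw.get("RAM")),
        "rom": parse_number(raw.get("ROM")),
        "battery_cap": parse_number(raw.get("battery capacity")),
    }


# Made-up record in the shape get_item might produce.
raw = {"skuid": 1, "price": 1399.0, "RAM": "6GB", "ROM": "64GB", "battery capacity": "4000mAh"}
item = preprocess_item(raw)
print(item)
```

Fields that cannot be parsed come back as None, which pandas turns into missing values, so those rows simply fail numeric comparisons such as ram >= 6 instead of raising an error.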

After this round of filtering, the following 38 phones remained:

These phones have similar configurations, but Xiaomi phones are generally well reviewed online, so I narrowed the list down to the Xiaomi models and got the following 7:

This is where the Redmi Note 5 and the Mi 6X compete. In terms of price, they are about the same. In terms of configuration, a search online shows that the CPU of the Redmi Note 5 is the Snapdragon 636 (its CPU model is missing from the table above). Compared with the Snapdragon 660 in the Mi 6X, the 636 is not as fast but is more power-efficient. Taking the Redmi Note 5's large 4000mAh battery into account as well, I finally decided to buy the Redmi Note 5. For a thousand-yuan phone, it offers an eight-core Snapdragon 636 CPU, 6GB of memory, 64GB of storage, a 5.99-inch screen, a front camera plus a rear dual camera, and long standby time. This phone is probably the king of thousand-yuan phones.

All the code has been uploaded to GitHub. To get the address, reply “buy mobile phone” to the public account [Python and data Analysis].