Hangzhou rent: highest in Qiantang, with a single room in the Olympic area costing up to 4,830 yuan/month. Many people lament that they escaped high housing prices only to be caught by high rents, and the rising rents leave them feeling hollowed out. In the summer of 2018, rent is becoming the “first straw” that crushes young people. If you are working hard to get by in Hangzhou, has the rent in your city gone up? Can you still breezily say, “If you can’t afford to buy a house, just rent one”?

This is a news item I saw on Sina Finance. Since I have only just arrived in Hangzhou, I can't say whether rents have gone up, but they are certainly high. Or maybe it only feels that way because I'm poor…

How high is the rent in Hangzhou?

Data collection & data cleansing


    import re
    import time

    import requests
    from lxml import etree

    headers = {'user-agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit (KHTML, like Gecko) Chrome/55.0.2883.87 Safari/537.36'}


    def get_city_areas(url):
        # Read every district name and link from the city page.
        response = requests.get(url, headers=headers)
        html = etree.HTML(response.text)
        areas = html.xpath('//ul[@class="new_di_tab sTab"]/a/@title')
        areas_links = html.xpath('//ul[@class="new_di_tab sTab"]/a/@href')
        # Index 0 is skipped; the loop starts from the second entry.
        for i in range(1, len(areas)):
            # Strip the city name and the word "rent" from the link title.
            area = areas[i].replace('hangzhou', '').replace('rent', '')
            area_link = 'https://hz.5i5j.com' + areas_links[i]
            try:
                get_area_page(area, area_link)
            except Exception:
                # Skip districts whose pages cannot be parsed (e.g. blank pages).
                continue


    def get_area_page(area, link):
        key = 1
        num = 1
        url = link
        while key:
            time.sleep(2)  # pause between requests
            response = requests.get(url, headers=headers)
            html = response.text
            nextpage_link_message = re.findall('…







This time I mainly scraped the rental listings on the I Love My Home (5i5j) website. Why 5i5j? Mainly because I pass two of their branches on my way home from work, so their information felt more reliable, and I started with their website. The scraping itself is simple. The main problems were the following:

① Fuyang District and Haining City have no listings; their pages are blank.

② The total number of pages for each district is shown as “…”, so it cannot be read directly. Instead, I follow the next-page link and keep checking whether another one exists, which finally gives the total number of pages.

③ A page normally holds 30 listings, but the last page usually has fewer, so the number of listings on the last page has to be checked.

④ Listing titles can contain English (ASCII) commas, which cause errors later when the TXT file is saved as CSV, so I replace them as soon as the information is fetched.

⑤ For the room type (how many rooms and how many halls), I assumed there would always be numbers, but some listings say “multi-room multi-hall”. These are detected and dropped, because they would cause errors when calculating the price of a single room.
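The second and third points above boil down to a loop that keeps following the next-page link until there is none, counting listings along the way. The following is only a minimal sketch of that idea: `fetch_page` is a hypothetical callable standing in for the real request-and-parse step, and the fake site below is made up for illustration rather than taken from 5i5j's actual markup.

```python
from typing import Callable, List, Optional, Tuple


def walk_pages(first_url: str,
               fetch_page: Callable[[str], Tuple[List[str], Optional[str]]],
               max_pages: int = 1000) -> Tuple[int, int]:
    """Follow next-page links and return (page_count, listing_count).

    fetch_page(url) is a hypothetical helper that returns the listings found
    on that page and the URL of the next page (None on the last page).
    """
    pages = 0
    listings = 0
    url: Optional[str] = first_url
    while url is not None and pages < max_pages:
        items, next_url = fetch_page(url)
        if not items:            # blank page: nothing to count, stop here
            break
        pages += 1
        listings += len(items)   # the last page may hold fewer than 30
        url = next_url
    return pages, listings


# Fake site used only to demonstrate the loop: two full pages and a short one.
fake_site = {
    "/p1": (["house"] * 30, "/p2"),
    "/p2": (["house"] * 30, "/p3"),
    "/p3": (["house"] * 12, None),
    "/empty": ([], None),
}

print(walk_pages("/p1", fake_site.__getitem__))     # (3, 72)
print(walk_pages("/empty", fake_site.__getitem__))  # (0, 0)
```

The `max_pages` cap is an added safety net, not something the original script is known to have; it simply stops the loop if a site ever links pages in a cycle.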

In the end, the information was obtained successfully.
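The fourth and fifth cleaning problems listed above can be handled with two small helpers. This is a sketch under assumptions, not the author's code: the function names are hypothetical, the comma is replaced with a space (the original replacement character is not stated), and the room count is taken to be the first integer in the room-type text.

```python
import re
from typing import Optional


def clean_title(title: str) -> str:
    """Replace ASCII commas so a title cannot split a CSV row."""
    return title.replace(",", " ").strip()


def room_count(room_type: str) -> Optional[int]:
    """Return the number of rooms, or None when the text has no number.

    Assumption: the first integer in the text is the room count,
    as in "3室2厅" (3 rooms, 2 halls).
    """
    match = re.search(r"\d+", room_type)
    if match is None:
        return None              # e.g. "multi-room multi-hall"
    count = int(match.group())
    return count if count > 0 else None


def one_room_price(price: float, room_type: str) -> Optional[float]:
    """Total rent divided by the room count; None if the count is unknown."""
    rooms = room_count(room_type)
    if rooms is None:
        return None
    return price / rooms


print(clean_title("near metro,furnished"))   # near metro furnished
print(one_room_price(4500, "3室2厅"))         # 1500.0
print(one_room_price(4500, "多室多厅"))        # None
```

Returning `None` instead of raising lets the caller drop those rows in one pass before computing averages.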

Data visualization


    import pandas as pd
    from pyecharts import Bar, Line, Overlap  # pyecharts 0.5.x API

    f = open(r'C:\Users\Administrator\Desktop\xuexi\….xlsx', 'rb')
    # read_excel takes no sep/encoding arguments, so they are dropped here.
    df = pd.read_excel(f, header=None,
                       names=['area', 'title', 'room_type', 'room_quantity', 'square',
                              'xiaoqu', 'loupan', 'price', 'one_room_price'])

    # Mean single-room rent and number of listings for each district.
    area_message = df.groupby(['area'])
    area_com = area_message['one_room_price'].agg(['mean', 'count'])
    area_com.reset_index(inplace=True)
    area_message_last = area_com.sort_values('count', ascending=False)

    attr = area_message_last['area']
    v1 = area_message_last['count']
    v2 = area_message_last['mean']

    # Line: mean rent per district (chart and series labels are English stand-ins).
    line = Line()
    line.add("Average rent", attr, v2, is_stack=True, xaxis_rotate=30, yaxis_min=0,
             mark_point=["max", "min"], xaxis_interval=0, line_color='lightblue',
             line_width=4, mark_point_textcolor='black', mark_point_color='lightblue',
             is_splitline_show=False)

    # Bar: number of listings per district.
    bar = Bar("Rent and listings by district")
    bar.add("Listings", attr, v1, is_stack=True, xaxis_rotate=30, yaxis_min=0,
            xaxis_interval=0, is_splitline_show=False)

    # Draw both on one chart, with the line on a second y-axis.
    overlap = Overlap()
    overlap.add(bar)
    overlap.add(line, yaxis_index=1, is_add_yaxis=True)
    overlap.render(…)







An analysis of more than 3,000 rental listings shows that the supply is concentrated in Xihu, Xiacheng, Gongshu, Jianggan, and Shangcheng districts. I calculated rent here as the price of a single room. Rent is highest in Binjiang District, probably because Binjiang is a high-tech zone where Alibaba, NetEase, Huawei, Geely, Hikvision, and many other listed companies are clustered. With the salaries these big companies pay, rents in Binjiang are naturally pushed up. Using the Baidu Map API, I also made a heat map of the listings so that their distribution can be seen more intuitively.


    # Top 20 values of the xiaoqu column by number of listings.
    xiaoqu_message = df.groupby(['xiaoqu'])
    xiaoqu_com = xiaoqu_message['one_room_price'].agg(['mean', 'count'])
    xiaoqu_com.reset_index(inplace=True)
    xiaoqu_message_last = xiaoqu_com.sort_values('count', ascending=False)[0:20]

    attr = xiaoqu_message_last['xiaoqu']
    v1 = xiaoqu_message_last['count']
    v2 = xiaoqu_message_last['mean']

    # Chart and series labels are English stand-ins.
    line = Line()
    line.add("Average rent", attr, v2, is_stack=True, xaxis_rotate=30, yaxis_min=0,
             mark_point=["max", "min"], xaxis_interval=0, line_color='lightblue',
             line_width=4, mark_point_textcolor='black', mark_point_color='lightblue',
             is_splitline_show=False)

    bar = Bar("Rent and listings by xiaoqu (top 20)")
    bar.add("Listings", attr, v1, is_stack=True, xaxis_rotate=30, yaxis_min=0,
            xaxis_interval=0, is_splitline_show=False)

    overlap = Overlap()
    overlap.add(bar)
    overlap.add(line, yaxis_index=1, is_add_yaxis=True)
    overlap.render(…)



Among these, Shenhua and Sandun belong to the same district, which fits with that district having the largest number of listings. The area around the Binjiang District Government has the highest price. My impression of that area is that it is close to Geely and Hikvision and to the bank of the Qiantang River. Binjiang Library is there too, and you can borrow books without a deposit through Alipay; that much I am sure of.


    # Top 20 housing estates (loupan) by number of listings.
    loupan_message = df.groupby(['loupan'])
    loupan_com = loupan_message['one_room_price'].agg(['mean', 'count'])
    loupan_com.reset_index(inplace=True)
    loupan_message_last = loupan_com.sort_values('count', ascending=False)[0:20]

    attr = loupan_message_last['loupan']
    v1 = loupan_message_last['count']
    v2 = loupan_message_last['mean']

    # Chart and series labels are English stand-ins.
    line = Line()
    line.add("Average rent", attr, v2, is_stack=True, xaxis_rotate=30, yaxis_min=0,
             mark_point=["max", "min"], xaxis_interval=0, line_color='lightblue',
             line_width=4, mark_point_textcolor='black', mark_point_color='lightblue',
             is_splitline_show=False)

    bar = Bar("Rent and listings by estate (top 20)")
    bar.add("Listings", attr, v1, is_stack=True, xaxis_rotate=30, yaxis_min=0,
            xaxis_interval=0, is_splitline_show=False)

    overlap = Overlap()
    overlap.add(bar)
    overlap.add(line, yaxis_index=1, is_add_yaxis=True)
    overlap.render(…)



The rent at Oriental Grand Hyatt is extremely high. Since I didn't know much about Hangzhou's housing estates at first, I looked it up specially. It turns out to sit beside the Qiantang River, close to the Civic Center, with river-view rooms, butler service, and high-end serviced apartments. It is the aristocrat of the rental market…


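    price_info = df['one_room_price']
    bins = [0, 1000, 1500, 2000, 2500, 3000, 3500, 4000, 5000, 6000, 7000, 8000, 9000, 10000]
    # One label per interval: len(level) == len(bins) - 1.
    level = ['0-1000', '1000-1500', '1500-2000', '2000-2500', '2500-3000', '3000-3500',
             '3500-4000', '4000-5000', '5000-6000', '6000-7000', '7000-8000', '8000-9000',
             '9000-10000']
    # Count the listings in each price range, kept in range order.
    price_stage = pd.cut(price_info, bins=bins, labels=level).value_counts().sort_index()

    attr = price_stage.index
    v1 = price_stage.values

    # The chart title is an English stand-in.
    bar = Bar("Distribution of single-room rent")
    bar.add("", attr, v1, is_stack=True, xaxis_rotate=30, yaxis_min=0,
            xaxis_interval=0, is_splitline_show=False)
    bar.render(…)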



I was looking for a place in Hangzhou only last month. My budget was originally 1,000-2,000 yuan. From what I found online, all the better places cost 2,000+, so I ended up in an old residential community. It is very close to the company and transport is convenient.


    from pyecharts import Pie  # pyecharts 0.5.x API

    square_info = df['square']
    bins = [0, 30, 60, 90, 120, 150, 200, 300]
    level = ['0-30', '30-60', '60-90', '90-120', '120-150', '150-200', '200+']
    # Count the listings in each floor-area range, kept in range order.
    square_stage = pd.cut(square_info, bins=bins, labels=level).value_counts().sort_index()

    attr = square_stage.index
    v1 = square_stage.values

    # The chart title is an English stand-in.
    pie = Pie("Distribution of floor area", title_pos='center')
    pie.add("", attr, v1, radius=[40, 75], label_text_color=None, is_label_show=True,
            legend_orient="vertical", legend_pos="left")
    pie.render(…)



Most rental homes in Hangzhou are between 30 and 90 square meters, which is what one would expect: a place that is too large is hard to rent out, and one that is too small is uncomfortable to live in.
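The binning behind this distribution relies on `pd.cut`, and two details are easy to get wrong: the intervals are right-closed by default, and there must be exactly one label per interval. The sketch below shows both on a handful of made-up floor areas; the sample values are for illustration only and do not come from the scraped data.

```python
import pandas as pd

# Made-up floor areas in square meters, for illustration only.
areas = pd.Series([25, 30, 45, 60, 88, 95, 130, 250])

bins = [0, 30, 60, 90, 120, 150, 200, 300]
labels = ['0-30', '30-60', '60-90', '90-120', '120-150', '150-200', '200+']
assert len(labels) == len(bins) - 1   # one label per interval

# Intervals are right-closed by default: 30 falls in '0-30', 60 in '30-60'.
levels = pd.cut(areas, bins=bins, labels=labels)

# sort_index() puts the ranges back in bin order instead of frequency order.
counts = levels.value_counts().sort_index()
print(counts)
```

Values above the last edge (here, anything over 300) become missing values and silently drop out of the counts, so it is worth checking the maximum before choosing the last bin.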


    square_info = df['square']
    bins = [0, 30, 60, 90, 120, 150, 200, 300]
    level = ['0-30', '30-60', '60-90', '90-120', '120-150', '150-200', '200+']
    # Tag every listing with its floor-area range.
    df['square_level'] = pd.cut(square_info, bins=bins, labels=level)

    df_message = df[['area', 'title', 'room_type', 'room_quantity', 'square', 'xiaoqu',
                     'loupan', 'price', 'one_room_price', 'square_level']]

    # Mean total rent and number of listings for each floor-area range.
    prices_message = df_message.groupby(['square_level'])
    prices_com = prices_message['price'].agg(['mean', 'count'])
    prices_com.reset_index(inplace=True)

    attr = prices_com['square_level']
    v1 = prices_com['mean']

    # The chart title is an English stand-in.
    bar = Bar("Average rent by floor area")
    bar.add("rent", attr, v1, is_stack=True, xaxis_rotate=30, yaxis_min=0,
            xaxis_interval=0, is_splitline_show=False)



This is the total rent of the home, not the single-room price. Rent rises with floor area, as one would expect.


    from wordcloud import WordCloud, ImageColorGenerator
    import matplotlib.pyplot as plt
    import pandas as pd
    import jieba

    f = open(r'C:\Users\Administrator\Desktop\xuexi\….xlsx', 'rb')
    # read_excel takes no sep/encoding arguments, so they are dropped here.
    df = pd.read_excel(f, header=None,
                       names=['area', 'title', 'room_type', 'room_quantity', 'square',
                              'xiaoqu', 'loupan', 'price', 'one_room_price'])

    # Segment every title with jieba and join the words with spaces.
    text = ''
    for line in df['title']:
        text += ' '.join(jieba.cut(line, cut_all=False)) + ' '

    # The image is used both as the cloud's shape and as its color source.
    backgroud_Image = plt.imread('room.jpg')
    wc = WordCloud(
        background_color='white',
        mask=backgroud_Image,
        font_path=r'C:\Windows\Fonts\STZHONGS.TTF',  # a font with Chinese glyphs
        max_words=2000,
        max_font_size=150,
        random_state=30
    )
    wc.generate_from_text(text)
    img_colors = ImageColorGenerator(backgroud_Image)
    wc.recolor(color_func=img_colors)
    plt.imshow(wc)
    plt.axis('off')
    wc.to_file('house.jpg')
    print('Word cloud generated successfully!')



I made a word cloud from the listing titles. “Move in with just your bags” appears most often, which matches 5i5j's main advertising line; I walk past it every day, so it is hard not to know. Fine decoration, shared flats, a full set of household appliances, convenient transport, and viewings at any time are the things renters generally care about. Of course, price cannot be ignored either.


    import json
    import requests
    import pandas as pd


    def get_lnglat(address):
        # Ask the Baidu geocoder for the coordinates of one location.
        url = 'http://api.map.baidu.com/geocoder/v2/'
        output = 'json'
        ak = 'your key'
        city = 'hangzhou'
        uri = url + '?' + 'address=' + 'Hangzhou' + address + '&city=' + city + '&output=' + output + '&ak=' + ak
        response = requests.get(uri)
        message = json.loads(response.text)
        return message


    f = open(r'C:\Users\Administrator\Desktop\xuexi\….xlsx', 'rb')
    # read_excel takes no sep/encoding arguments, so they are dropped here.
    df = pd.read_excel(f, header=None,
                       names=['area', 'title', 'room_type', 'room_quantity', 'square',
                              'xiaoqu', 'loupan', 'price', 'one_room_price'])

    # Number of listings for each value of the xiaoqu column.
    xiaoqu_message = df.groupby(['xiaoqu'])
    xiaoqu_com = xiaoqu_message['xiaoqu'].agg(['count'])
    xiaoqu_com.reset_index(inplace=True)

    file = open(r'C:\Users\Administrator\Desktop\cuiqingcai\point.json', 'w')
    for i in range(len(xiaoqu_com['xiaoqu'])):
        b = xiaoqu_com['xiaoqu'][i]
        c = xiaoqu_com['count'][i]
        # One request per location instead of one each for lng and lat.
        location = get_lnglat(b)['result']['location']
        lng = location['lng']
        lat = location['lat']
        lng_lat_count = '{"lat":' + str(lat) + ',"lng":' + str(lng) + ',"count":' + str(c) + '},'
        file.write(lng_lat_count)
    file.close()



This is the code that obtains the longitude and latitude: it calls the Baidu Map API to look up the coordinates of each location. Then, on the demo page

http://lbsyun.baidu.com/jsdemo.htm#c1_15

adjust the latitude and longitude of the map's center point (30.28, 120.16), the map level (12), the radius (35), and the maximum count (160) to obtain the heat map. The location information needs to be as complete as possible, because incomplete location names make the coordinate query fail or return the wrong coordinates. For the locations that came back with wrong coordinates, I looked up the location information manually, ended up with reasonably complete location information, and corrected the table by hand (about a dozen entries in my case).
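The geocode-then-write step described above can also be sketched in a way that separates failed lookups from good ones, so the dozen or so problem locations are easy to find and fix by hand. This is a minimal sketch under assumptions: `geocode` is a hypothetical callable that returns `(lng, lat)` or `None` and stands in for the real map API request, and the community names and coordinates below are made up.

```python
import json
from typing import Callable, Dict, List, Optional, Tuple

LngLat = Tuple[float, float]


def build_heat_points(counts: Dict[str, int],
                      geocode: Callable[[str], Optional[LngLat]]
                      ) -> Tuple[List[dict], List[str]]:
    """Return (points, failed); points are {"lng", "lat", "count"} dicts."""
    points: List[dict] = []
    failed: List[str] = []
    for name, count in counts.items():
        location = geocode(name)      # one lookup per location
        if location is None:
            failed.append(name)       # to be looked up and fixed by hand
            continue
        lng, lat = location
        points.append({"lng": lng, "lat": lat, "count": count})
    return points, failed


# Stand-in geocoder with made-up coordinates; a real one would call the map API.
fake_coords = {"Community A": (120.16, 30.28), "Community B": (120.21, 30.21)}

points, failed = build_heat_points(
    {"Community A": 12, "Community B": 5, "Unknown place": 1},
    fake_coords.get,
)
print(json.dumps(points))
print(failed)  # ['Unknown place']
```

Serializing with `json.dumps` avoids the trailing comma that hand-built strings leave behind, and the result is a valid JSON array that can be pasted into the heat map demo as its point list.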


This article is from the cloud community partner “Programmers grow together”. For relevant information, you can pay attention to “Programmers grow together”.