Published with authorization by Big Data Digest

Project developer: Ke Zhenxu

It's that annual peak season for house hunting again, and the flood of rental listings is dizzying. How do you find a reliable home quickly and efficiently?

A tech geek recently built a Scrapy-based crawler project that aggregates rental listings covering hundreds of cities from sites such as Douban, Lianjia (Homelink) and 58.com, so you can search for listings of interest and get around the sites' clunky built-in search features.

Using this “secret weapon,” the tech geek has already used the crawler to find himself a suitable home.

Not only that, he also generously cleaned up the project code and put it up on GitHub.

Project link:

https://github.com/kezhenxu94/house-renting

Click “Read the original article” to view the project introduction, or reply “Rent” to the Big Data Digest account to download the source code ~

Next, follow along with Digest Jun (your Big Data Digest host) to check out this wave of slick operations.

Deployment environment

Python version: Python 2 / Python 3

Crawler frame: Scrapy

Operating system: Linux / Mac / Windows

Service engine: Docker

Get the source code

$ git clone https://github.com/kezhenxu94/house-renting
$ cd house-renting


Alternatively, reply “Rent” to the Big Data Digest account to download the source code ~

Start the service

Using Docker (recommended)

$ docker-compose up --build -d
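The --build flag rebuilds the images before starting, and -d runs everything in the background; once the services are up, docker-compose ps lists their status.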


Environment and version: Docker CE for Mac, version 18.03.1-ce-mac65 (24312).

To make the project easier to use, the author provides a docker-compose.yml file for deploying the services the project needs. However, owing to limitations of Docker itself, non-Professional editions of Windows must use Docker Toolbox instead, which brings a number of problems. For details, see:

http://support.divio.com/local-development/docker/how-to-use-a-directory-outside-cusers-with-docker-toolbox-on-windowsdocker-for-windows

If you run into such a problem, you can submit an Issue here; and if you run into one and solve it yourself, you are welcome to submit a Pull Request to help improve the project!

Issue:

https://github.com/kezhenxu94/house-renting/issues

Pull Request:

https://github.com/kezhenxu94/house-renting/pulls

Manual deployment (not recommended)

Install and start Elasticsearch 5.6.9 and Kibana 5.6.9

Download and install Elasticsearch and Kibana from:

https://www.elastic.co/downloads/past-releases
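Once both are running, a quick way to confirm Elasticsearch is reachable is a few lines of Python (a minimal sketch, assuming the official elasticsearch client and the default port 9200):

from elasticsearch import Elasticsearch

# Connect to the local node and print its version; it should print 5.6.9
es = Elasticsearch([{'host': '127.0.0.1', 'port': 9200}])
print(es.info()['version']['number'])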

Install Redis and start it

Download and install Redis from:

https://redis.io/download
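Likewise, a quick connectivity check for Redis (a sketch, assuming the redis-py client and the default port 6379):

import redis

# ping() returns True when the Redis server is reachable
r = redis.StrictRedis(host='127.0.0.1', port=6379)
print(r.ping())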

Configure the relevant hosts and ports in the crawler/house_renting/settings.py file:

# ES nodes; multiple nodes (a cluster) can be configured. Defaults to None, in which case nothing is stored in ES
ELASTIC_HOSTS = [
    {'host': 'elastic', 'port': 9200},
]
REDIS_HOST = 'redis'  # defaults to None
REDIS_PORT = 6379  # defaults to 6379

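To give a feel for how these settings are used, here is a minimal sketch of a Scrapy item pipeline that indexes scraped items into Elasticsearch. It is an illustration only, with hypothetical class and field choices, not the project's actual pipeline code:

from elasticsearch import Elasticsearch

class ElasticsearchPipeline(object):  # hypothetical pipeline, for illustration
    def open_spider(self, spider):
        # Read the hosts configured above; skip ES entirely when unset
        hosts = spider.settings.get('ELASTIC_HOSTS')
        self.es = Elasticsearch(hosts) if hosts else None

    def process_item(self, item, spider):
        if self.es is not None:
            # Store each scraped item in the house_renting index
            self.es.index(index='house_renting', doc_type='item', body=dict(item))
        return item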

Install the Python dependencies

$ cd crawler
$ pip install -r requirements.txt


Select the cities to crawl (currently Lianjia and 58.com are supported):

Select the cities to crawl on Lianjia:

Open the crawler/house_renting/spider_settings/lianjia.py file and follow the comments to complete the city selection;

# ...
cities = (u'广州', )  # crawl Guangzhou only
# cities = (u'广州', u'北京')  # crawl both Guangzhou and Beijing
# ...


Select the cities to crawl on 58.com:

Open the crawler/house_renting/spider_settings/a58.py file and follow the comments to complete the city selection:

# ...
cities = (u'广州', )  # crawl Guangzhou only
# cities = (u'广州', u'北京')  # crawl both Guangzhou and Beijing
# ...


Start the crawlers

Start each site's crawler in a separate command-line window:

$ scrapy crawl douban   # Douban
$ scrapy crawl lianjia  # Lianjia
$ scrapy crawl 58       # 58.com

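If you prefer not to keep several terminal windows open, Scrapy can also run several spiders from a single script; a minimal sketch (not part of the project), run from the crawler directory:

from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings

# Run all three spiders in one process instead of separate windows
process = CrawlerProcess(get_project_settings())
for spider_name in ('douban', 'lianjia', '58'):
    process.crawl(spider_name)
process.start()  # blocks until every spider has finished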

At this point, congratulations! The rental listings have been scraped successfully; now let's take a look at the results!

View the results

Pick a home from the photos

Once the crawlers have collected some data, a house_renting/data directory is created, and its images folder holds the photos downloaded from the rental listings. Browse this folder with any image viewer, and when you spot a promising photo, search Kibana for its file name to find the corresponding listing details.
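The same lookup can also be done programmatically; a sketch under assumptions (the field name and file name below are hypothetical and may differ from the project's actual schema):

from elasticsearch import Elasticsearch

es = Elasticsearch([{'host': '127.0.0.1', 'port': 9200}])
# Find listings whose downloaded images match a given file name
result = es.search(index='house_renting', body={
    'query': {'match': {'image_paths': 'e5f9a8b0.jpg'}},  # hypothetical field and file name
})
for hit in result['hits']['hits']:
    print(hit['_source'])  # the full listing document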

Search keywords

Open your browser and navigate to http://127.0.0.1:5601 (replace the host with your Docker machine's IP address if Kibana is not running locally).

Set the index pattern

Enter house_renting in the Index pattern input box and press TAB; the Create button should become available, so click it. If the Create button stays unavailable, the crawlers have not yet written any data to Elasticsearch: wait a little longer, and if nothing changes, check whether the crawler services started successfully.
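You can also check from a Python shell whether any data has arrived yet (a sketch, assuming the default port mapping):

from elasticsearch import Elasticsearch

es = Elasticsearch([{'host': '127.0.0.1', 'port': 9200}])
# The document count grows as the crawlers index new listings
print(es.count(index='house_renting')['count'])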

The remaining steps are each illustrated with a screenshot in the original post:

Switch to the Discover page

Add fields to display

Sort by time

Search for a keyword

Search for multiple keywords

Expand a result for details
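A note on the keyword steps: Kibana's search bar accepts Lucene query syntax, so a single term matches any field containing it, and several keywords can be combined with AND (for example, subway AND elevator, where both terms are just placeholders) to require all of them.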

A friendly reminder

If your environment is configured correctly but the results still look wrong, the sites may have updated their page structure; readers can go back to the project page for updated code and try again. The author updates the project as spare time and energy allow, so interested readers can keep an eye on it.