A real-time Weibo hot search monitoring platform built on a PHP crawler

  • Background
  • I. The overall idea
  • II. Data crawling
    • 1. Get HTML
    • 2. Extract data
    • 3. Return data
  • III. Data visualization
    • 1. Draw a bar chart
    • 2. Ajax request data
  • IV. Results
  • Final thoughts

Background

After a long study session (and losing some hair over it), I like to go on Weibo to see whether anything interesting is going on, what "big" news has broken, or whether there is something that will stay on my mind for a long time.

But that is not the point.

The point is that when I opened the Weibo hot search page, it looked like this:



The page had to be refreshed manually all the time, and a quick search turned up no tool offering real-time statistics, so I tried writing one myself.

(P.S. If you know of an existing one, please leave a comment; I would be grateful.)

I. The overall idea

A simple idea came to mind: first collect the data, then present the statistics on a web page.

To collect the data, writing a crawler is the obvious choice. My first instinct was Python, but since it is always fun to try something new, I decided to write the crawler in PHP. So the big picture looks like this:



For the technologies involved, see:

  • Writing a PHP crawler; three ways to build a PHP crawler
  • ECharts usage guide
  • HTML5

II. Data crawling

1. Get HTML

Use PHP to fetch the Weibo hot search page and get its HTML source:

function getUrlContent($url){ // Fetch the HTML content of a URL
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.1)");
    curl_setopt($ch, CURLOPT_HEADER, 0);         // Keep response headers out of the returned string
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); // Return the body instead of printing it
    $output = curl_exec($ch);                    // false if the request fails
    curl_close($ch);
    return $output;
}

Of course, you could simply use file_get_contents or a similar function instead.

2. Extract data

Extract the <table> element from the HTML and convert it into an array using regular expressions:

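function getTable($html) {
	// Grab the first <table>...</table> block in the page
	preg_match_all("/<table[^>]*>[\s\S]*?<\/table>/i", $html, $table);
	if (empty($table[0])) {
		return []; // No table found in the HTML
	}
	$table = $table[0][0];
	// Drop the opening tags and turn the closing tags into split markers
	$table = preg_replace("'<table[^>]*?>'si", "", $table);
	$table = preg_replace("'<tr[^>]*?>'si", "", $table);
	$table = preg_replace("'<td[^>]*?>'si", "", $table);
	$table = str_replace("</tr>", "{tr}", $table);
	$table = str_replace("</td>", "{td}", $table);
	// Remove HTML tags
	$table = preg_replace("'<[/!]*?[^<>]*?>'si", "", $table);
	// Remove whitespace characters.
	// As written (no backslashes), this pattern leaves newlines in place;
	// the front end relies on them to split keyword and count.
	$table = preg_replace("'([rn])[s]+'", "", $table);
	$table = str_replace("&nbsp;", "", $table);
	$table = str_replace(" ", "", $table);
	$table = explode('{tr}', $table);
	array_pop($table); // The piece after the last {tr} is not a row
	$td_array = [];
	foreach ($table as $key => $tr) {
		// You can add further substitutions of your own here
		$tr = str_replace("\n\n", "", $tr);
		$td = explode('{td}', $tr);
		array_pop($td); // The piece after the last {td} is not a cell
		$td_array[] = $td;
	}
	return $td_array;
}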
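The marker-and-split idea behind getTable can be tried out quickly in a browser console or in Node. The sketch below is a hypothetical JavaScript port, not the article's PHP code: the name tableToRows and the trimming of each cell are assumptions made for illustration.

```javascript
// Hypothetical JavaScript port of the marker-and-split idea (illustration only).
function tableToRows(html) {
  // Take the first <table>...</table> block, if there is one.
  var match = html.match(/<table[^>]*>[\s\S]*?<\/table>/i);
  if (!match) { return []; }
  var text = match[0]
    .replace(/<\/tr>/gi, '{tr}')   // mark the end of each row
    .replace(/<\/td>/gi, '{td}')   // mark the end of each cell
    .replace(/<[^<>]*>/g, '');     // drop every remaining tag
  var rows = text.split('{tr}');
  rows.pop();                      // the piece after the last {tr} is not a row
  return rows.map(function (row) {
    var cells = row.split('{td}');
    cells.pop();                   // the piece after the last {td} is not a cell
    return cells.map(function (cell) { return cell.trim(); }); // assumption: trim each cell
  });
}
```

Like the PHP version, this relies on regular expressions rather than a real HTML parser, so it only suits simple, well-formed tables.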

3. Return data

Return the crawled, cleaned-up data so that the front end can request it:

$html = getUrlContent("https://s.weibo.com/top/summary?Refer=top_hot&topnav=1&wvr=6");
$table = getTable($html);
$table = array_slice($table, 2); # Drop the first two rows, which are not needed
echo json_encode($table);

At this point, the code above can be combined into a single PHP file named "weibo.php", ready to be called by the front end through Ajax.

III. Data visualization

To be honest, I am no front-end expert, but I am good at "moving bricks" and stitching pieces together. So I learned some echarts.js on the spot and borrowed from demos that experienced developers have shared online.

1. Draw a bar chart

Use echarts.js to draw the bar chart on a canvas:

function CreateBar(keywords, value){
	// Initialize the echarts instance
	var myChart = echarts.init(document.getElementById('chartmain'));
	myChart.on('click', function(param){
		window.open('#');
	});
	// Specify the chart configuration and data
	var option = {
		title: {
			text: ' '
		},
		tooltip: {},
		grid: {
			top: "15%",
			left: "16%",
			bottom: "5%"
		},
		legend: {
			data: ['Hot Search term']
		},
		xAxis: {},
		yAxis: {
			data: keywords
		},
		series: [{
			name: 'Search volume',
			type: 'bar',
			itemStyle: {
				normal: {
					color: '#ff9406'
				}
			},
			data: value
		}]
	};
	myChart.setOption(option);
}

The function takes two parameters, both arrays: the hot search terms and their search volumes. Both come from an Ajax request to the back end.
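As a rough illustration of how those two arrays might be built, here is a small sketch. The helper name toChartData is hypothetical, and it assumes that each row returned by weibo.php holds the keyword and its count in the second cell, separated by a newline; check the actual response before relying on that layout.

```javascript
// Hypothetical helper: turn the rows returned by weibo.php into the two arrays CreateBar expects.
// Assumption: row[1] looks like "keyword\ncount".
function toChartData(rows, limit) {
  var keywords = [];
  var values = [];
  var count = Math.min(limit, rows.length); // never read past the end of the data
  for (var i = 0; i < count; i++) {
    var parts = String(rows[i][1]).split('\n');
    var volume = Number(parts[1]);
    keywords.push(parts[0]);
    values.push(isNaN(volume) ? 0 : volume); // rows without a numeric count are plotted as 0
  }
  // Reverse both arrays so the first-ranked term ends up last in the axis data.
  return { keywords: keywords.reverse(), values: values.reverse() };
}
```

A call such as toChartData(data, 20) would then give the arguments for CreateBar(result.keywords, result.values).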

2. Ajax request data

Send an Ajax request to the back end (that is, the weibo.php file described earlier) to get the data:

function GetData(){
	$.ajax({
		type: "post",       // Request method (post/get)
		url: "weibo.php",   // URL the request is sent to
		dataType: "json",   // Expected response type; jQuery parses the JSON for us

		success: function(msg){
			// Callback for a successful request
			if(msg != ''){
				var data = msg; // Already parsed by jQuery, so eval() is not needed
				var keywords = [];
				var value = [];
				var count = Math.min(20, data.length);
				for(var i = 0; i < count; i++){ // Take the top 20
					keywords.push(data[i][1].split('\n')[0]);
					value.push(Number(data[i][1].split('\n')[1]));
				}
				CreateBar(keywords.reverse(), value.reverse());
			}
			setTimeout(GetData, 10000); // Request again in 10 s
		},
		error: function(msg){
			// Callback for a failed request
			console.log(msg);
			setTimeout(GetData, 30000); // Retry in 30 s
		}
	});
}

Note: **setTimeout()** schedules the next request after each response, which gives periodic asynchronous polling and therefore real-time monitoring. Calling setInterval() inside the callbacks would register one more repeating timer on every response, so the number of requests would keep growing.
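The timed-request idea can also be separated from the Ajax details. The sketch below is hypothetical: createPoller, fetchOnce, onData and schedule are illustrative names, not part of the page above. It uses a re-armed setTimeout rather than setInterval, so that only one timer is pending at any moment, and it takes the timer function as a parameter so the logic can be checked without actually waiting.

```javascript
// Hypothetical polling helper (illustration only).
// fetchOnce(callback) must call callback(err, data) exactly once per request.
function createPoller(fetchOnce, onData, okDelayMs, errorDelayMs, schedule) {
  schedule = schedule || setTimeout; // injectable timer, defaults to the real one
  var stopped = false;

  function tick() {
    if (stopped) { return; }
    fetchOnce(function (err, data) {
      if (stopped) { return; }
      if (!err) { onData(data); }
      // Arm exactly one timer for the next request; wait longer after a failure.
      schedule(tick, err ? errorDelayMs : okDelayMs);
    });
  }

  return {
    start: tick,
    stop: function () { stopped = true; }
  };
}
```

With jQuery, fetchOnce could wrap $.ajax by calling callback(null, msg) in success and callback(msg) in error, with delays of 10000 and 30000 ms as in the code above.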

IV. Results

After many rounds of trial and revision, I finally got a reasonably satisfying result, which looks roughly like this:

Final thoughts

Another way to achieve this goal is to fetch the data in a script (such as Python) and store it in a database, and then display the data by reading the database, which may be more flexible and allow for more interesting statistical analysis once a certain amount of data has been collected.

If anything in this article falls short, corrections and criticism are welcome! Finally, thank you for patiently reading to the end ~