When it comes to Singles' Day, everyone thinks of Tmall's impressive real-time large screen. And when it comes to real-time large screens, the classic streaming computing architecture comes to mind:

  • Data collection: collect real-time data from all sources at the front end
  • Intermediate storage: a queue such as Kafka decouples the producing and consuming systems
  • Real-time computing: the most important part of the pipeline, which subscribes to the real-time data and computes over windows according to the calculation rules
  • Result storage: the computed results are written to SQL or NoSQL stores
  • Visualization: the results are presented through API calls
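As a toy sketch of the real-time computing step above (hypothetical code, not any of the products mentioned later), a tumbling-window counter that aggregates a stream of events into fixed time windows might look like this:

```javascript
// Minimal tumbling-window counter: align each event timestamp down to
// the start of its window and count events per window.
function windowCounts(events, windowMs) {
  const counts = new Map();
  for (const e of events) {
    // Window start = timestamp rounded down to a multiple of windowMs.
    const win = Math.floor(e.ts / windowMs) * windowMs;
    counts.set(win, (counts.get(win) || 0) + 1);
  }
  return counts;
}

// Example: three events with 1-second windows; the window starting at
// t=1000 holds two events, the window starting at t=2000 holds one.
const c = windowCounts([{ ts: 1000 }, { ts: 1500 }, { ts: 2100 }], 1000);
```

A real stream processor would do this incrementally and emit results as windows close, but the grouping arithmetic is the same.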

Within Alibaba Group there are plenty of mature products for this kind of work; the generally available ones are as follows:

Besides that solution, today I will introduce a new approach: feeding the query analysis (LogSearch/Analytics) API of Log Service (formerly SLS) directly into DataV for large screen display.

In September 2017, Log Service (formerly SLS) strengthened its real-time log analysis feature (LogSearch/Analytics), which lets you analyze logs in real time with query + SQL92 syntax. For visualizing analysis results, besides the built-in dashboard it also supports integrations such as Grafana and Tableau (via JDBC).

How the two approaches differ

Depending on data volume, latency and business requirements, computing can generally be divided into two modes: real-time computing (stream computing) and offline computing (data warehouse + offline computation). Log Service (formerly SLS) provides two ways to connect to the data it collects in real time.

In addition, for log analysis scenarios with large data volumes and real-time requirements, we provide a mechanism that indexes LogHub data in real time, so that LogSearch/Analytics can query and analyze the data directly. The benefits of this approach:

  • Fast: the API returns results as soon as a query is issued; there is no need to wait for, or precompute, results
  • Real-time: in 99.9% of cases, a log is reflected in results within 1 second of being generated
  • Dynamic: however the statistical method is modified, the latest results are returned immediately, with no recomputation to wait for

Of course, no computing system is a panacea, and the limitations of this approach are as follows:

  • Data volume: a single query is limited to under 10 billion rows; beyond that, the time range must be narrowed
  • Computing flexibility: queries are currently limited to SQL92 syntax; custom UDFs are not supported

Practical case: a real-time large screen whose statistical criteria keep changing

During the Computing Conference there was an ad-hoc requirement to count visits to online sites across the country. Since the full log data was already being collected and query analysis was enabled in Log Service, we only had to write a query. Taking UV as an example, we count the unique values of the forward field in all Nginx access logs from October 11 to the present:

* | select approx_distinct(forward) as uv

After it had been running for a day, the requirement changed: we only needed statistics for the Yunqi domain. We added a filter condition (host) and immediately got the result:

host:yunqi.aliyun.com | select approx_distinct(forward) as uv

It then turned out that the forward field in the Nginx access log can contain multiple IP addresses. By default only the first one should be counted, so we handled that in the query:

host:yunqi.aliyun.com | select approx_distinct(split_part(forward,',',1)) as uv

On the third day came another requirement: advertising traffic from UC had to be excluded from the count. So we added one more filter condition (not ...) and again got the latest result right away:

host:yunqi.aliyun.com not url:uc-iflow  | select approx_distinct(split_part(forward,',',1)) as uv
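The split_part(forward, ',', 1) call in the queries above keeps only the first address of the comma-separated forward chain (typically the original client IP). The equivalent logic in plain JavaScript, as an illustration:

```javascript
// JavaScript equivalent of split_part(forward, ',', 1): keep only the
// first (client) address of an "X-Forwarded-For"-style chain.
function firstForwardIp(forward) {
  return forward.split(",")[0].trim();
}

// A proxy chain collapses to the original client address:
firstForwardIp("42.120.74.98, 10.11.12.13"); // → "42.120.74.98"
```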

The final large screen effect is as follows:

Usage: connecting SLS to DataV

There are three main steps:

  1. Data collection; see the documentation
  2. Index setup and console queries; see the index setup and visualization docs, or the website log analysis best-practice example
  3. Connect the DataV plug-in to turn a real-time query SQL into a view

We mainly demonstrate step 3. After steps 1 and 2, you can see the raw logs on the query page:

Create the DataV data source

Type: Simple Log Service - SLS

Name: any custom name

AK ID and AK Secret: enter the AccessKey of the primary account, or of a sub-account with permission to read Log Service.

Endpoint: the endpoint of the region where the Log Service project resides. The figure shows the endpoint for the Hangzhou region.

Create a line chart

Create a line chart. In the data configuration of the line chart, select Simple Log Service-SLS as the data source type, and select log_service_API as the data source.

The following is an example of query parameters:

{
    "projectName": "dashboard-demo",
    "logStoreName": "access-log",
    "topic": "",
    "from": ":from",
    "to": ":to",
    "query": "*| select approx_distinct(remote_addr) as uv ,count(1) as pv , date_format(from_unixtime(date_trunc('hour',__time__) ) ,'%Y/%m/%d %H:%i:%s')   as time group by date_trunc('hour',__time__) order by time limit 1000" ,
    "line": 100,
    "offset": 0
  }

projectName: fill in your project name.

logStoreName: the Logstore the logs are written to.

from and to: the start and end of the log time range, respectively.

Note that we filled in ":from" and ":to". During testing you can first enter Unix timestamps, for example 1509897600. After publishing, change them to :from and :to, so that the time range can be controlled through URL parameters. For example, the preview URL is http://datav.aliyun.com/screen/86312; opening http://datav.aliyun.com/screen/86312?from=1510796077&to=1510798877 makes the computation run over the specified time range.
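To make the placeholder mechanics concrete, here is a hypothetical sketch (function name and fallback window are my own, not part of DataV) of reading from/to out of such a URL and defaulting to the last hour when they are absent:

```javascript
// Read from/to (Unix seconds) from a screen URL; when a parameter is
// missing, fall back to a one-hour window ending "now" (hypothetical
// convention for this sketch, not a documented DataV behavior).
function resolveRange(url, nowSec) {
  const params = new URL(url).searchParams;
  const to = parseInt(params.get("to"), 10) || nowSec;
  const from = parseInt(params.get("from"), 10) || to - 3600;
  return { from, to };
}

resolveRange(
  "http://datav.aliyun.com/screen/86312?from=1510796077&to=1510798877",
  0
);
// → { from: 1510796077, to: 1510798877 }
```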

query: the query condition; see the analysis syntax documentation for details. In this example we show PV per hour. The time format in the query must look like 2017/07/11 12:00:00, so date_format(from_unixtime(date_trunc('hour',__time__)),'%Y/%m/%d %H:%i:%s') is used to align the time to the whole hour and convert it to the target format.
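The same truncate-then-format step can be mirrored in JavaScript to see what the SQL is doing (this sketch formats in UTC; the SQL evaluates in the project's timezone, so the rendered hour may differ):

```javascript
// Mirror of date_trunc('hour', __time__) followed by
// date_format(..., '%Y/%m/%d %H:%i:%s'): align a Unix timestamp
// (in seconds) down to the hour, then render "YYYY/MM/DD HH:MM:SS".
function hourLabel(unixSec) {
  const truncated = unixSec - (unixSec % 3600); // date_trunc('hour', ...)
  const d = new Date(truncated * 1000);
  const pad = (n) => String(n).padStart(2, "0");
  return (
    `${d.getUTCFullYear()}/${pad(d.getUTCMonth() + 1)}/${pad(d.getUTCDate())} ` +
    `${pad(d.getUTCHours())}:${pad(d.getUTCMinutes())}:${pad(d.getUTCSeconds())}`
  );
}

// 1509899999 is 39m59s past the hour; it truncates to the hour mark.
hourLabel(1509899999); // → "2017/11/05 16:00:00" (UTC)
```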

Keep the default values for other parameters.

Once configured, click "View Data Response Result":

Click “Use filters” above and create a new filter:

Filter content:

return Object.keys(data).map((key) => {
    let d = data[key];
    d["pv"] = parseInt(d["pv"]);
    return d;
})

The filter converts the Y-axis values to integers; in the example above the Y-axis is pv, so the pv column must be converted.

The result contains two columns, t and pv, so configure t as the X-axis and pv as the Y-axis.

Configure a pie chart

Fill in the query:

{
    "projectName": "dashboard-demo",
    "logStoreName": "access-log",
    "topic": "",
    "from": 1509897600,
    "to": 1509984000,
    "query": "*| select count(1) as pv ,method group by method" ,
    "line": 100,
    "offset": 0
  }

In the query, we calculate the percentages of different methods.

Add a filter and fill in the filter:

return Object.keys(data).map((key) => {
    let d = data[key];
    d["pv"] = parseInt(d["pv"]);
    return d;
})

In the pie chart configuration, set type to method and value to pv.
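The pie chart derives each method's share from the raw counts returned by the group-by query. The same arithmetic, as a sketch (function name is my own):

```javascript
// Turn per-method counts (rows shaped like the output of
// "select count(1) as pv, method group by method") into percentage
// shares, which is what the pie chart displays.
function methodShares(rows) {
  const total = rows.reduce((sum, r) => sum + r.pv, 0);
  return rows.map((r) => ({
    method: r.method,
    percent: Math.round((r.pv / total) * 100),
  }));
}

methodShares([{ method: "GET", pv: 75 }, { method: "POST", pv: 25 }]);
// → [{ method: "GET", percent: 75 }, { method: "POST", percent: 25 }]
```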

Preview and Release

Click Preview and Publish, and the large screen is created. Developers and business colleagues can watch their business traffic in real time on Singles' Day!

Attached: demo address. By changing the from and to parameters in the URL you can switch to any time range you like.

Operation demonstration: