Source code for this article: available on GitHub and GitEE (addresses are listed in the last section).

One, data visualization

1. Basic concepts

Data visualization is the study of the visual representation of data. Here, a visual representation of data is defined as information extracted in some summary form, including the attributes and variables of the corresponding information units.

Put in practical terms: the routine report statistics found in system development, where data is presented as charts or tables to help operators or decision makers understand the patterns or value in that data, are a simple application of visualization.

From a development point of view, visualization means aggregating the core data in the system by some statistical means and displaying it in polished chart styles, or assembling a series of charts into a stylish large data screen.

From the perspective of operations staff, however, it is more important to analyze business scenarios with the help of visualized data, so as to obtain valuable reference data and guide subsequent decisions or activities. As business lines keep growing, the requirements for data analysis keep rising, which has given rise to the BI analysis tools and BI analysts that are now fairly common.

2. Value of data visualization

  • Convey the patterns and information in the data accurately, efficiently, and intuitively;
  • Monitor the system's data indicators in real time so that the data explains itself;
  • Formulate precise operational strategies based on visual insight into the patterns in the data;

3. Basic construction principles

The basic steps are as follows: process the data to be visualized (collection, rules, scheduled tasks, etc.) according to business requirements, then present it with a combination of commonly used charts. There are a few caveats:

  • Visualized data should be tied to core data that carries business value;
  • Charts should be simple and clear; the essence of a chart is to make the data more intuitive;
  • Do not add charts in bulk just to make the system look fancy;

Two, common chart designs

1. Common basic charts

Bar chart

Features: Usually presents grouped data and shows the differences between groups intuitively. For example, the axis is often divided by week, by month, or by client type.

Line chart

Features: Focuses on showing how data changes, usually with time as the axis, so that the trend of the data over time is visible.

Pie chart

Features: Does not focus on the details of the data; it emphasizes the percentage of each part in the total, or the distribution, and highlights the contrast between modules.
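As a rough illustration of the percentage view behind a pie chart, the sketch below converts absolute values into shares of the total. The class name PieShareDemo, the method shares, and the client categories are made up for this example and do not come from the article's project.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class PieShareDemo {

    // Convert absolute values into percentages of the total.
    // Every slice is reported as 0 when the total is 0, to avoid dividing by zero.
    static Map<String, Double> shares(Map<String, Double> values) {
        double total = 0.0;
        for (double v : values.values()) {
            total += v;
        }
        Map<String, Double> result = new LinkedHashMap<>();
        for (Map.Entry<String, Double> e : values.entrySet()) {
            result.put(e.getKey(), total == 0.0 ? 0.0 : 100.0 * e.getValue() / total);
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical order counts per client type.
        Map<String, Double> orders = new LinkedHashMap<>();
        orders.put("web", 30.0);
        orders.put("ios", 50.0);
        orders.put("android", 20.0);
        System.out.println(shares(orders));
    }
}
```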

Funnel chart

Features: Emphasizes the conversion relationship and step-by-step progression between data. The classic example is the path from user views, to clicks, to paid orders.
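The conversion relationship can be made concrete with a small calculation. The following is a minimal sketch, assuming the stages are passed in funnel order; the class FunnelDemo, the method stepConversion, and the sample numbers are illustrative only.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class FunnelDemo {

    // Conversion rate of each stage relative to the previous stage, in percent.
    // The first stage has no predecessor and is reported as 100.
    // The map must iterate in funnel order (for example a LinkedHashMap).
    static Map<String, Double> stepConversion(Map<String, Long> stages) {
        Map<String, Double> rates = new LinkedHashMap<>();
        Long previous = null;
        for (Map.Entry<String, Long> e : stages.entrySet()) {
            double rate;
            if (previous == null) {
                rate = 100.0;
            } else if (previous == 0L) {
                rate = 0.0; // nothing entered the previous stage
            } else {
                rate = 100.0 * e.getValue() / previous;
            }
            rates.put(e.getKey(), rate);
            previous = e.getValue();
        }
        return rates;
    }

    public static void main(String[] args) {
        // Hypothetical counts for views, clicks, and paid orders.
        Map<String, Long> stages = new LinkedHashMap<>();
        stages.put("views", 1000L);
        stages.put("clicks", 250L);
        stages.put("orders", 50L);
        System.out.println(stepConversion(stages));
    }
}
```

With these sample numbers, a quarter of the views become clicks and a fifth of the clicks become orders, which is exactly the narrowing that a funnel chart draws.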

Combination chart

Features: A combination of several basic charts. Some special business data needs two or more charts combined, with the emphasis on a single report carrying the key combined information.

2. Large data screen

As the name suggests, a large report screen usually shows a wide variety of business data, so it naturally needs several kinds of reports to present them, giving a stronger sense of depth and visual impact.

Note: more often than not, a large data screen is built to impress. That is the key point, and those who know, know.

Three, commonly used statistical methods

1. SQL analysis statements

In reporting business, SQL analysis statements are often used. The following methods are commonly used:

  • COUNT: counts rows, for example, how many users there are;
  • SUM: the sum function, such as total sales or total costs;
  • GROUP BY: grouped statistics, where the grouping values become the axis labels;
  • AVG: average calculation, such as average daily sales;

Although business reports can look complex, the data interfaces behind them are relatively simple: the report data is generated by a handful of basic statistical SQL statements.
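To show what the four statements compute, here is a minimal in-memory sketch in Java. It is not how the project queries the database; the class AggregateDemo, the nested Product class, and the sample rows are assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class AggregateDemo {

    // Hypothetical in-memory stand-in for one row of a product table.
    static class Product {
        final String sort;        // grouping key, e.g. product category
        final double price;
        final double salesAmount;

        Product(String sort, double price, double salesAmount) {
            this.sort = sort;
            this.price = price;
            this.salesAmount = salesAmount;
        }
    }

    // Like COUNT(*): number of rows.
    static long count(List<Product> rows) {
        return rows.size();
    }

    // Like SUM(sales_amount): total over all rows.
    static double sum(List<Product> rows) {
        double total = 0.0;
        for (Product p : rows) {
            total += p.salesAmount;
        }
        return total;
    }

    // Like SUM(sales_amount) ... GROUP BY product_sort: one total per group.
    static Map<String, Double> groupSum(List<Product> rows) {
        Map<String, Double> totals = new TreeMap<>();
        for (Product p : rows) {
            totals.merge(p.sort, p.salesAmount, Double::sum);
        }
        return totals;
    }

    // Like AVG(price), except that an empty list yields 0 here (SQL would yield NULL).
    static double average(List<Product> rows) {
        return rows.stream().mapToDouble(p -> p.price).average().orElse(0.0);
    }

    public static void main(String[] args) {
        List<Product> rows = Arrays.asList(
                new Product("fruit", 10.0, 100.0),
                new Product("fruit", 20.0, 200.0),
                new Product("vegetable", 30.0, 50.0));
        System.out.println(count(rows));
        System.out.println(sum(rows));
        System.out.println(groupSum(rows));
        System.out.println(average(rows));
    }
}
```

The keys of the grouped result are what a bar chart or pie chart would use as axis labels or slices.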

2. Basic cases

Product table and dimension table

CREATE TABLE `vc_product_info` (
  `id` int(11) NOT NULL AUTO_INCREMENT COMMENT 'primary key ID',
  `product_sort` varchar(20) DEFAULT '' COMMENT 'product category',
  `product_name` varchar(50) DEFAULT '' COMMENT 'product name',
  `inventory` int(11) DEFAULT '0' COMMENT 'inventory',
  `price` decimal(10,2) DEFAULT '0.00' COMMENT 'unit price',
  `total_sales` int(11) DEFAULT '0' COMMENT 'total sales volume',
  `sales_amount` decimal(10,2) DEFAULT '0.00' COMMENT 'total sales amount',
  `create_time` datetime DEFAULT NULL COMMENT 'create time',
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COMMENT='product information table';

CREATE TABLE `vc_product_detail` (
  `id` int(11) NOT NULL AUTO_INCREMENT COMMENT 'primary key ID',
  `product_id` int(11) NOT NULL COMMENT 'product ID',
  `place_origin` varchar(50) DEFAULT '' COMMENT 'place of origin',
  `weight` decimal(10,2) DEFAULT '0.00' COMMENT 'weight',
  `color` varchar(50) DEFAULT '' COMMENT 'color',
  `high_praise` int(11) DEFAULT '0' COMMENT 'number of positive reviews',
  `low_praise` int(11) DEFAULT '0' COMMENT 'number of negative reviews',
  `create_time` datetime DEFAULT NULL COMMENT 'create time',
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COMMENT='product dimension table';

Basic query statements

<mapper namespace="com.visual.chart.mapper.ProductInfoMapper">
    <!-- COUNT: number of products -->
    <select id="countNum" resultType="java.lang.Integer">
        SELECT COUNT(product_name) FROM vc_product_info
    </select>
    <!-- SUM: total sales amount -->
    <select id="sumAll" resultType="java.lang.Double">
        SELECT SUM(sales_amount) FROM vc_product_info
    </select>
    <!-- GROUP BY: sales amount per product category -->
    <select id="groupSum" resultType="java.util.Map">
        SELECT product_sort, SUM(sales_amount) FROM vc_product_info GROUP BY product_sort
    </select>
    <!-- AVG: average price -->
    <select id="average" resultType="java.lang.Double">
        SELECT AVG(price) FROM vc_product_info
    </select>
</mapper>

Four, custom tools

1. Data sets

Data set concept

A data set is a collection of data, usually in tabular form. Each column represents a specific variable, and each row corresponds to one member of the data set. In statistical analysis work it is often called a wide table, a shape that makes business analysis easier.

Data set generation

View mode

A view can be used to generate a single-table data set, which simplifies later operations. Views are not a recommended approach and are prohibited in most companies; they are used here only as a simple demonstration.

Based on the product information table and the dimension table above, the data set is generated as a view. The only goal is to make business analysis easier to operate, because several tables are combined so that they feel like a single table.

CREATE OR REPLACE VIEW data_set_view AS
SELECT
    t1.*,
    t2.place_origin,
    t2.weight,
    t2.color,
    t2.high_praise,
    t2.low_praise
FROM vc_product_info t1
LEFT JOIN vc_product_detail t2 ON t1.id = t2.product_id;
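For readers less familiar with LEFT JOIN, the sketch below reproduces the same idea in memory: every product row is kept, and detail columns are filled in when a matching detail row exists. It is only an illustration; the classes Info, Detail, and WideRow are hypothetical and carry far fewer columns than the real tables.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class WideTableDemo {

    static class Info {
        final int id;
        final String productName;
        Info(int id, String productName) { this.id = id; this.productName = productName; }
    }

    static class Detail {
        final int productId;
        final String color;
        Detail(int productId, String color) { this.productId = productId; this.color = color; }
    }

    // One row of the combined "wide table".
    static class WideRow {
        final int id;
        final String productName;
        final String color; // null when the product has no detail row
        WideRow(int id, String productName, String color) {
            this.id = id; this.productName = productName; this.color = color;
        }
    }

    // LEFT JOIN semantics: each Info row appears at least once,
    // and once per matching Detail row when there are several.
    static List<WideRow> leftJoin(List<Info> infos, List<Detail> details) {
        Map<Integer, List<Detail>> byProduct = new HashMap<>();
        for (Detail d : details) {
            byProduct.computeIfAbsent(d.productId, k -> new ArrayList<>()).add(d);
        }
        List<WideRow> rows = new ArrayList<>();
        for (Info i : infos) {
            List<Detail> matches = byProduct.get(i.id);
            if (matches == null) {
                rows.add(new WideRow(i.id, i.productName, null));
            } else {
                for (Detail d : matches) {
                    rows.add(new WideRow(i.id, i.productName, d.color));
                }
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        List<WideRow> rows = leftJoin(
                Arrays.asList(new Info(1, "apple"), new Info(2, "pear")),
                Arrays.asList(new Detail(1, "red")));
        for (WideRow r : rows) {
            System.out.println(r.id + " " + r.productName + " " + r.color);
        }
    }
}
```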

Task mode

Scheduled tasks fetch the data to be analyzed and keep writing it into analysis tables. This is the most common way to generate report data sets in business development, and some systems even compute the figures a report needs directly in the scheduled task. This approach is not suitable for big data scenarios.
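A minimal sketch of the task mode follows. It uses the plain JDK ScheduledExecutorService rather than whatever scheduler the project actually uses, and it keeps the "analysis table" in a map instead of a database. The class ReportRefreshTask and the statistics supplier are assumptions for illustration.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

class ReportRefreshTask {

    // Stand-in for the analysis table: metric name -> latest value.
    private final Map<String, Double> analysisTable = new ConcurrentHashMap<>();

    // Hypothetical source of freshly computed metrics, e.g. a statistics query.
    private final Supplier<Map<String, Double>> statistics;

    ReportRefreshTask(Supplier<Map<String, Double>> statistics) {
        this.statistics = statistics;
    }

    // One refresh cycle: recompute the metrics and overwrite the stored values.
    void refresh() {
        analysisTable.putAll(statistics.get());
    }

    // Reports read the precomputed value instead of querying the business tables.
    Double read(String metric) {
        return analysisTable.get(metric);
    }

    // Run refresh() once immediately and then at a fixed period.
    // The caller is responsible for shutting the returned scheduler down.
    ScheduledExecutorService start(long period, TimeUnit unit) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(this::refresh, 0, period, unit);
        return scheduler;
    }
}
```

One thing to keep in mind with scheduleAtFixedRate: if a run throws an exception, later runs are suppressed, so a real task would normally catch and log failures inside refresh().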

Offline or real-time computing

With big data techniques, business report data is produced by offline or real-time computation and then loaded into an OLAP store for real-time analysis, so that business scenarios can be analyzed at big data scale.

2. Custom BI tools

BI tools integrate business data quickly and effectively, deliver reports quickly and accurately, provide a basis for decisions, and help enterprises make sound business decisions. The concept of business intelligence was first proposed in 1996. At that time, it was defined as a class of technologies and applications made up of data warehouses (or data marts), query and reporting, data analysis, data mining, data backup and recovery, and so on, whose purpose is to help enterprises make decisions.

Basic construction idea:

  • Manage data sources, establish relationships between data tables, and maintain specific data sets;
  • Create a drag-and-drop report panel to host the individual charts;
  • Encapsulate the processing logic of each chart type, its display rules, and its associated data set fields;
  • Encapsulate chart styles so that size, color, background, interaction, and so on are configurable;
  • Bind charts to data sets for analysis, and combine multiple charts on the report panel to generate reports;
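One possible way to read the encapsulation steps above is a small registry that maps a chart type to its processing logic and passes in a configurable style. The sketch below is only one such reading; ChartRegistryDemo, ChartBuilder, ChartStyle, and the text output format are all invented for illustration, and a real tool would produce chart configuration for a front-end library rather than a string.

```java
import java.util.HashMap;
import java.util.Map;

class ChartRegistryDemo {

    // Configurable display settings shared by all chart types.
    static class ChartStyle {
        final int width;
        final int height;
        final String color;
        ChartStyle(int width, int height, String color) {
            this.width = width; this.height = height; this.color = color;
        }
    }

    // Processing logic for one chart type, bound to a data set field.
    interface ChartBuilder {
        String build(String dataSetField, ChartStyle style);
    }

    private final Map<String, ChartBuilder> builders = new HashMap<>();

    void register(String chartType, ChartBuilder builder) {
        builders.put(chartType, builder);
    }

    // Look up the builder for the requested chart type and apply it.
    String render(String chartType, String dataSetField, ChartStyle style) {
        ChartBuilder builder = builders.get(chartType);
        if (builder == null) {
            throw new IllegalArgumentException("Unsupported chart type: " + chartType);
        }
        return builder.build(dataSetField, style);
    }

    private static String describe(String type, String field, ChartStyle s) {
        return type + "[" + field + "] " + s.width + "x" + s.height + " " + s.color;
    }

    // Registry with two sample chart types.
    static ChartRegistryDemo withDefaults() {
        ChartRegistryDemo registry = new ChartRegistryDemo();
        registry.register("bar", (field, style) -> describe("bar", field, style));
        registry.register("line", (field, style) -> describe("line", field, style));
        return registry;
    }

    public static void main(String[] args) {
        ChartStyle style = new ChartStyle(400, 300, "#3366cc");
        System.out.println(withDefaults().render("bar", "product_sort", style));
    }
}
```

Adding a new chart type then means registering one more builder, without touching the report panel code that calls render.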

Actual development is very complex: managing the data sources of the various lines of business, analyzing them jointly, and adapting to the many chart specifications and styles is a long process.

3. Basic cases

Interface input parameters

Continuing with the business scenario above, parameters need to be passed in dynamically, such as the data set to operate on, the chart, the axis a parameter belongs to, or business parameters such as the product.

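import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import javax.annotation.Resource;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RestController;
// ChartParam and DataSetService are project classes; their imports are omitted here.

@RestController
public class DefineController {

    @Resource
    private DataSetService dataSetService;

    /**
     * Build the chart parameters and return the analyzed data, grouped by data type.
     */
    @GetMapping("/getDefChart")
    public Map<Integer, List<ChartParam>> getDefChart() {
        List<ChartParam> chartParamList = new ArrayList<>();
        // Data type 1: count data on the X axis
        chartParamList.add(new ChartParam("X", 1, "data_set_view", "product_sort"));
        chartParamList.add(new ChartParam("X", 1, "data_set_view", "product_name"));
        // Data type 2: sum data on the Y axis
        chartParamList.add(new ChartParam("Y", 2, "data_set_view", "high_praise"));
        chartParamList.add(new ChartParam("Y", 2, "data_set_view", "low_praise"));
        // Data type 3: percent data on the Z axis, for product ID 1
        chartParamList.add(new ChartParam("Z", 3, "data_set_view", "inventory", 1));
        chartParamList.add(new ChartParam("Z", 3, "data_set_view", "total_sales", 1));
        return dataSetService.analyData(chartParamList);
    }
}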
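The ChartParam class itself is not shown in the article. The following is a hypothetical reconstruction that is merely consistent with how the controller and service call it (two constructors, plus the getters and the setter they use). The field names and the Number type of resultNum are assumptions; the real class in the repository may differ.

```java
class ChartParam {

    private final String axis;        // axis the value belongs to: "X", "Y" or "Z"
    private final Integer dataType;   // 1 = count, 2 = sum, 3 = percent
    private final String tableName;   // data set (table or view) to query
    private final String columnName;  // column to aggregate
    private final Integer productId;  // optional business parameter, may be null
    private Number resultNum;         // filled in after the query runs (type assumed)

    ChartParam(String axis, Integer dataType, String tableName, String columnName) {
        this(axis, dataType, tableName, columnName, null);
    }

    ChartParam(String axis, Integer dataType, String tableName,
               String columnName, Integer productId) {
        this.axis = axis;
        this.dataType = dataType;
        this.tableName = tableName;
        this.columnName = columnName;
        this.productId = productId;
    }

    String getAxis() { return axis; }
    Integer getDataType() { return dataType; }
    String getTableName() { return tableName; }
    String getColumnName() { return columnName; }
    Integer getProductId() { return productId; }
    Number getResultNum() { return resultNum; }
    void setResultNum(Number resultNum) { this.resultNum = resultNum; }
}
```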

Parameter parsing

According to various dynamic parameters, the query conditions are parsed and the query results are obtained.

import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import javax.annotation.Resource;
import org.springframework.stereotype.Service;
// ChartParam, DataSetService and DataSetMapper are project classes; their imports are omitted here.

@Service
public class DataSetServiceImpl implements DataSetService {

    @Resource
    private DataSetMapper dataSetMapper;

    @Override
    public Map<Integer, List<ChartParam>> analyData(List<ChartParam> chartParamList) {
        // Group the parameters by data type, then run the matching statistic for each group
        Map<Integer, List<ChartParam>> dataMap = chartParamList.stream()
                .collect(Collectors.groupingBy(ChartParam::getDataType));
        for (Integer dataType : dataMap.keySet()) {
            switch (dataType) {
                case 1: // count-type data
                    taskCount(dataMap.get(dataType));
                    break;
                case 2: // sum-type data
                    taskSum(dataMap.get(dataType));
                    break;
                case 3: // percent-type data
                    taskPercent(dataMap.get(dataType));
                    break;
                default:
                    break;
            }
        }
        return dataMap;
    }

    // Count data
    private void taskCount(List<ChartParam> chartParamList) {
        for (ChartParam chartParam : chartParamList) {
            chartParam.setResultNum(dataSetMapper.taskCount(
                    chartParam.getColumnName(), chartParam.getTableName()));
        }
    }

    // Sum data
    private void taskSum(List<ChartParam> chartParamList) {
        for (ChartParam chartParam : chartParamList) {
            chartParam.setResultNum(dataSetMapper.taskSum(
                    chartParam.getColumnName(), chartParam.getTableName()));
        }
    }

    // Percent data
    private void taskPercent(List<ChartParam> chartParamList) {
        for (ChartParam chartParam : chartParamList) {
            chartParam.setResultNum(dataSetMapper.taskPercent(
                    chartParam.getColumnName(), chartParam.getTableName(),
                    chartParam.getProductId()));
        }
    }
}
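The mapper calls above receive table and column names as parameters. The DataSetMapper XML is not shown, but in MyBatis identifiers cannot be bound as prepared-statement parameters, so they are typically spliced into the SQL text with ${} substitution. Assuming that is the case here, it is worth checking the names against a known list before they reach the mapper. The sketch below shows one way to do that; the class IdentifierWhitelist and its methods are hypothetical and not part of the project.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class IdentifierWhitelist {

    // Table or view name -> column names that may be used in dynamic queries.
    private final Map<String, Set<String>> allowed = new HashMap<>();

    // Register a table together with its queryable columns.
    IdentifierWhitelist allow(String table, String... columns) {
        allowed.put(table, new HashSet<>(Arrays.asList(columns)));
        return this;
    }

    // True only when both the table and the column were registered.
    boolean isAllowed(String table, String column) {
        Set<String> columns = allowed.get(table);
        return columns != null && columns.contains(column);
    }

    // Reject anything that is not registered before it is spliced into SQL.
    void check(String table, String column) {
        if (!isAllowed(table, column)) {
            throw new IllegalArgumentException(
                    "Unknown table or column: " + table + "." + column);
        }
    }
}
```

A service could call check(chartParam.getTableName(), chartParam.getColumnName()) at the top of each task method. In a BI tool the list would naturally come from the managed data set metadata rather than being hard-coded.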

To sum up: building a data visualization tool is a long process. It should not only serve the analysis of your own business but also generate value as an open BI tool.

Five, source code address

GitHub address: https://github.com/cicadasmile/data-manage-parent
GitEE address: https://gitee.com/cicadasmile/data-manage-parent
