Preface

1. If you have done data analysis with both PowerBI and Tableau, you will have noticed that the two products take clearly different approaches to building a chart. In PowerBI, you first pick a chart type and then fill in the configuration that the chosen type requires. As shown below, PowerBI presents the full list of chart types it supports and lets you choose one; once you have chosen, it prompts you for the matching settings. For example, the bar chart on the left asks for X and Y axes, a legend, and so on, while the pie chart on the right asks for values, a legend, and so on. In short, PowerBI asks you to select one of its supported chart types and then configure it.

Figure: PowerBI configuration panes for a bar chart (left) and a pie chart (right).

In Tableau, by contrast, you are not asked to choose a chart type. Instead, you drag fields directly onto rows and columns to form the table structure, and then use a configuration area called the “Marks” card to decide how the current data is drawn; different configurations yield different statistical charts. In the Marks card, users choose a basic mark, that is, a basic graphic symbol such as Circle, Line, Area, or Shape. Besides the mark itself, the card offers settings such as color, size, label, detail, and tooltip (these are called visual channels; more on that later).

Why do the two BI products differ so much in how charts are created? Because PowerBI and Tableau expose different concepts of a chart to their users, that is, different ways of expressing a visual chart. PowerBI expresses charts by name, using the conventional categories such as bar chart and pie chart; the product and its users communicate about charts through these names. Tableau instead draws on a theory called the grammar of graphics. Under this theory, a statistical chart is formalized through the rules and compositional structure of a grammar. For example, the formal language shown in the following figure expresses the two corresponding pie charts.

Put simply, the grammar of graphics does not express a statistical chart by naming it. Instead, it abstracts a formal language that describes the internal composition of the chart, so that the chart is fully described at the level of syntax.

2. If you are a developer choosing a chart library, you will find the same two families. One family, represented by ECharts and Highcharts, expresses charts by their conventional names. In an ECharts configuration, for example, the chart type is given by series.type, and the ECharts documentation lists all the supported chart types.

Figures: the ECharts documentation; sample code for a line chart.

The other family includes ggplot2, the most popular chart library in the R community. Here is how a pie chart is drawn with ggplot2:

As you can see, the ggplot2 sample never states that the result is a pie chart. The pie chart emerges from combining three elements: ggplot, geom_bar, and coord_polar. In fact, ggplot2 is the most complete implementation of the grammar of graphics in the open source community.

Another example is the AntV G2 chart library. How is a pie chart drawn with G2?

As you can see, the G2 sample code does not state that this is a pie chart either; the chart is built through chart.coordinate, chart.data, chart.interval, and so on. ggplot2 and G2 are implementations of the grammar of graphics in R and JavaScript respectively, and both let users express and create statistical charts in terms of that theory.

Grammar of graphics

History

Familiar statistical charts such as bar charts and pie charts have a long history; they were already being published in the 18th century. As computer graphics developed, graphics software multiplied. In the early 1990s, Leland Wilkinson redesigned the SYSTAT scientific graphics package with an object-oriented approach, abstracting graphic elements into a tree structure. A particular chart was then produced by traversing the tree and adding or deleting nodes, which kept the package under 1 MB. In the late 1990s, Leland Wilkinson, Dan Rope, and Dan Carr built a new production system called the Graphics Production Language (GPL), which gives users a language for formally describing statistical charts. The code below describes a scatter plot. The GPL package stays under 0.5 MB.

GGRAPH
  /GRAPHDATASET NAME="graphdataset" VARIABLES=X Y
  /GRAPHSPEC SOURCE=INLINE.
BEGIN GPL
  SOURCE: s=userSource(id("graphdataset"))
  DATA: X=col(source(s), name("X"))
  DATA: Y=col(source(s), name("Y"))
  GUIDE: axis(dim(1), label("X"))
  GUIDE: axis(dim(2), label("Y"))
  ELEMENT: point(position(X*Y))
END GPL.

While developing GPL, Leland Wilkinson wrote The Grammar of Graphics, which formally describes the design and implementation of the grammar. Wilkinson points out that for a long time nobody analyzed the rules and deep structure of charts in depth; charts were simply classified by their visual appearance.

So what is the advantage of expressing statistical charts through a grammar? A grammar is the set of rules of a language, and it is grammar that makes a language expressive. A language made only of words, with no grammar (statement = word), can express only as many ideas as it has words. By specifying how words combine into statements, a grammar expands what the language can say. The same holds for charts: through composition, a grammar of graphics can in principle express any chart instead of being limited to the known, named ones.

  • A grammar of graphics helps us understand how a complex chart is composed
  • A grammar of graphics helps us understand how different charts relate to each other
  • However, a grammar of graphics can also produce charts that are syntactically valid but meaningless

Composition

Wilkinson’s grammar distinguishes two concepts, graph and graphic:

  • A graph is a set of points. A mathematical graph cannot be seen; it is an abstraction.
  • A graphic is a physical representation of a graph: the graph becomes visible only once it is drawn on a physical medium.
  • A graphic carries visual properties such as color and size, and these properties are what allow a graph to be rendered as a graphic.

In terms of implementation, Wilkinson divides the production of a graphic into three stages:

  1. Specification
  2. Assembly
  3. Display

Specification

Specification is the translation of the user’s intent into a formal language, and it is the core of the GPL grammar. In GPL, the specification of a statistical chart is expressed with six kinds of statements:

  1. DATA: data operations that create variables from a data set
  2. TRANS: transformations applied to variables, such as sorting
  3. SCALE: scale transformations, such as log
  4. COORD: the coordinate system, such as Cartesian or polar
  5. ELEMENT: graphic elements (marks) and their visual attributes, such as color and size
  6. GUIDE: one or more guides, such as axes and legends

Of these, the coordinate system (COORD) and the graphic element (ELEMENT) matter most. Coordinate systems and graphic elements are orthogonal in the grammar: any element can be combined with any coordinate system, and this is where much of the grammar’s expressive power comes from.
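The orthogonality of COORD and ELEMENT can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the Spec class, the ELEMENTS and COORDS vocabularies, and the table of names are hypothetical and far simpler than GPL, ggplot2, or G2 (for instance, the stacking that a real pie chart needs is left out).

```python
from dataclasses import dataclass, replace
from itertools import product
from typing import Optional

# Hypothetical, simplified vocabulary; real grammars offer many more options.
ELEMENTS = ("point", "line", "area", "interval")
COORDS = ("rect", "polar", "polar.theta")


@dataclass(frozen=True)
class Spec:
    """Illustrative chart specification, not the API of any real library."""
    data: str                    # name of the data set
    position: str                # variables mapped to position
    element: str                 # graphic element (ELEMENT)
    coord: str = "rect"          # coordinate system (COORD)
    color: Optional[str] = None  # variable mapped to the color channel


# Only a few (element, coord) pairs have a conventional chart name.
CONVENTIONAL_NAMES = {
    ("point", "rect"): "scatter plot",
    ("line", "rect"): "line chart",
    ("area", "rect"): "area chart",
    ("interval", "rect"): "bar chart",
    ("interval", "polar"): "rose chart",
    ("interval", "polar.theta"): "pie chart",
    ("line", "polar"): "radar chart",
}


def conventional_name(spec: Spec) -> str:
    """Look up the familiar name of a specification, if it has one."""
    return CONVENTIONAL_NAMES.get((spec.element, spec.coord), "(no common name)")


# A bar chart and a pie chart differ only in their coordinate system.
bar = Spec(data="sales", position="category*amount",
           element="interval", color="category")
pie = replace(bar, coord="polar.theta")
print(conventional_name(bar))
print(conventional_name(pie))

# Every element can be paired with every coordinate system.
all_specs = [replace(bar, element=e, coord=c) for e, c in product(ELEMENTS, COORDS)]
named = [s for s in all_specs if conventional_name(s) != "(no common name)"]
print(len(all_specs), "combinations,", len(named), "with a common name")
```

The combinations without a common name are still valid specifications, which is the sense in which a grammar reaches beyond a fixed list of chart types.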

Assembly

Assembly is the process of building the canvas to be drawn from the specification of a chart. The specification itself does not prescribe how assembly is done; it has to be worked out when studying and implementing the grammar.

Display

A graph can be seen only once its visual properties are rendered by a display system.

Example

This section uses a pie chart to illustrate the GPL grammar. Start with the flow for drawing a pie chart: the figure below shows the path from raw data to the final graphic.

Let’s simplify the process:

  1. Data transformation and mapping: creating variables from the data set, applying scales, and computing statistics
  2. Graphic symbol processing: mapping data points to graphic symbols
  3. Coordinate system application: computing the position of graphic symbols in the coordinate system
  4. Visual channels: handling visual properties of the graphic symbols, such as color and size
  5. Drawing: final assembly and display of the chart
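The simplified flow above can be sketched as a small Python function. This is a rough illustration rather than how GPL or any chart library actually works: the function name pie_pipeline, the record format, and the palette are assumptions made for the example, and the drawing step is reduced to printing the computed slices.

```python
import math
from collections import Counter


def pie_pipeline(records, field, palette):
    """Turn raw records into pie slices (hypothetical helper for illustration)."""
    # 1. Data transformation: create a variable from the data set and
    #    compute a count statistic per category.
    counts = Counter(r[field] for r in records)
    total = sum(counts.values())

    # 2. Graphic symbols: one interval per category, stacked on [0, 1].
    intervals = []
    start = 0.0
    for category, n in counts.items():
        end = start + n / total
        intervals.append((category, start, end))
        start = end

    slices = []
    for i, (category, lo, hi) in enumerate(intervals):
        slices.append({
            "category": category,
            # 3. Coordinate system: the polar transform maps the stacked
            #    interval to a pair of angles in radians.
            "start_angle": lo * 2 * math.pi,
            "end_angle": hi * 2 * math.pi,
            # 4. Visual channel: map the category to a color.
            "color": palette[i % len(palette)],
        })
    return slices


records = [{"fruit": "apple"}, {"fruit": "apple"},
           {"fruit": "pear"}, {"fruit": "kiwi"}]
slices = pie_pipeline(records, "fruit", ["red", "green", "blue"])

# 5. Drawing: a real renderer would draw arcs; here we only print them.
for s in slices:
    print(f'{s["category"]}: {s["start_angle"]:.2f} to {s["end_angle"]:.2f} rad, {s["color"]}')
```

Note that nothing in the function mentions a pie chart: replacing step 3 with an identity transform would leave stacked intervals on a straight axis, that is, a stacked bar.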

Development

The Grammar of Graphics was published in 1999, with a second edition in 2005. Wilkinson’s grammar was first implemented in GPL and has since been developed further in commercial software, open source implementations, and academic research.

A Layered Grammar of Graphics

The layered grammar of graphics, proposed by Hadley Wickham, builds a chart from layers, and different layers can share default settings. It splits the coordinate system of the GPL grammar into a coordinate system and facets, and it breaks GPL’s ELEMENT into the components of a layer: data mapping, geometry, statistics, and position adjustment. ggplot2, the most popular grammar of graphics implementation in the open source community, is based on this theory.
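How layers share defaults can be sketched as follows. The Plot and Layer classes are hypothetical and only borrow the vocabulary of the layered grammar (geom, stat, position, coord, facet); they are not the API of ggplot2 or any other library, and the data set and column names are purely illustrative.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Layer:
    """One layer: geometry, statistic, position, plus optional overrides."""
    geom: str
    stat: str = "identity"
    position: str = "identity"
    data: Optional[str] = None     # None means: inherit the plot default
    mapping: Optional[dict] = None  # None means: inherit the plot default


@dataclass
class Plot:
    """A plot holds the defaults that its layers share."""
    data: str
    mapping: dict
    coord: str = "cartesian"
    facet: Optional[str] = None
    layers: list = field(default_factory=list)

    def add(self, layer: Layer) -> "Plot":
        self.layers.append(layer)
        return self

    def resolved_layers(self) -> list:
        """Fill in each layer's missing data and mapping from the defaults."""
        resolved = []
        for layer in self.layers:
            resolved.append({
                "geom": layer.geom,
                "stat": layer.stat,
                "position": layer.position,
                "data": layer.data or self.data,
                # Layer-level mappings extend or override the defaults.
                "mapping": {**self.mapping, **(layer.mapping or {})},
            })
        return resolved


plot = Plot(data="cars", mapping={"x": "engine_size", "y": "mileage"})
plot.add(Layer(geom="point", mapping={"color": "car_class"}))
plot.add(Layer(geom="smooth", stat="smooth"))

for layer in plot.resolved_layers():
    print(layer)
```

Both layers inherit the same data set and x/y mapping, and only the point layer adds a color mapping, so a scatter plot and a trend line can be overlaid without repeating the shared settings.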

Interactive grammar

When the GPL grammar was introduced, charts were simply drawn to be looked at. In the Web era, statistical charts must support many kinds of interaction that the GPL grammar was not designed for. In academia, the Interactive Data Lab (IDL) at the University of Washington’s school of computer science has been studying grammars that support interactive behavior, and has developed a body of theory that uses a formal grammar to describe the interaction flow and behavior of charts. IDL also studies grammar implementations driven by reactive streams, and built the Vega and Vega-Lite visualization libraries on this work.

Implementations

Protovis and Tableau

Protovis is a research product of the Stanford visualization community, where Tableau also has its roots (Tableau itself grew out of the Polaris research project). Following the ideas of the grammar of graphics, Protovis implements a domain-specific language for visualization that composes and draws graphics from different marks. The grammar of graphics is also fully reflected in Tableau: users doing data analysis do not pick a chart category but directly specify the graphic element to use and configure visual channels such as color and size.

ggplot2

ggplot2 implements a layered grammar of graphics built on Wilkinson’s grammar, which lets a chart be composed from multiple layers. It is the best-known graphics package in the R community and the most complete implementation of the grammar of graphics.

Vega

Vega is a grammar of graphics that supports interaction. It continues the academic work on the grammar of graphics, proposing a reactive implementation of the grammar and an interactive grammar built on top of it, that is, the research on interactive grammars mentioned above.

AntV G2

G2 is the grammar of graphics implementation from Ant’s AntV team. It is currently the most faithful implementation of The Grammar of Graphics in the JavaScript community, and it was recognized by Leland Wilkinson himself as soon as it was open sourced.

Conclusion

To sum up, we started from two different BI products and two families of chart libraries, explained briefly why the grammar of graphics exists, and gave a basic picture of the theory. This should give you a deeper understanding of the ideas behind charts when you use BI software, and make the concepts easier to grasp when you choose a chart library. For chart users and developers, however, the theory has a real learning threshold and takes time and effort to understand. For this reason, a library like G2 also offers G2Plot, which wraps the grammar in conventional, named chart types to lower that threshold. If you want to learn more about the grammar of graphics, the following resources are a good start:

  • The Grammar of Graphics
  • ggplot2
  • Vega
  • AntV G2