1. Introduction

Tableau’s exploratory analytics are so powerful that the possibilities seem endless.

In this article I take a close look at this exploratory model and how it works.

2. Close reading

To master exploratory analysis, we must first master the thinking model behind it.

Understanding the data

Data worth analyzing usually comes as a table of rows and columns. The columns define what the data means, and the rows hold the individual records.

When we treat data as “raw material”, it helps to wrap the detailed records in the concept of a “data set”. In a data set, each column is a field, and to understand fields we need two concepts: “dimension” and “measure”.

Dimension

A dimension is a field that cannot be aggregated numerically. It is usually a string or other discrete value that describes an aspect of the data, such as region or category.

Measure

A measure is a field that can be aggregated. It is usually a continuous value, such as a number or a date, that describes a quantity.

To analyze data efficiently, first classify the data set’s fields into dimensions and measures. Data analysis means looking at measures from different dimensions. First decide what you want to look at, such as sales or profit; these fields are measures. Then decide how you want to look at them: as a grand total, by year, or by region; these fields are dimensions.
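“Looking at measures from different dimensions” is essentially a group-by. Below is a minimal plain-Python sketch of that idea. It does not describe how Tableau works internally, and the `Region` and `Sales` fields and the sample rows are made up.

```python
from collections import defaultdict

# Hypothetical rows: "Region" is a dimension, "Sales" is a measure.
rows = [
    {"Region": "East", "Sales": 120.0},
    {"Region": "West", "Sales": 80.0},
    {"Region": "East", "Sales": 30.0},
]

def view(rows, measure, dimension=None):
    """Sum `measure`, optionally broken down by `dimension`."""
    if dimension is None:
        # No dimension: the measure collapses to a single grand total.
        return sum(row[measure] for row in rows)
    totals = defaultdict(float)
    for row in rows:
        totals[row[dimension]] += row[measure]
    return dict(totals)

print(view(rows, "Sales"))            # 230.0
print(view(rows, "Sales", "Region"))  # {'East': 150.0, 'West': 80.0}
```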

Dimensions and measures can also be viewed on their own. Looking at a single dimension shows only that dimension’s values, such as those of the Order Date field:

Note that dimension and measure fields can also be divided into continuous and discrete fields.

Continuous

Values are continuous, that is, the difference between any two values can be calculated.

Discrete

Values are discrete: the difference between any two values cannot be calculated, and the values cannot be read as points along a continuum.

**In general, dimension fields are discrete and measure fields are continuous.** The same conclusion follows from field types: dimension fields are usually strings or dates, and strings are discrete, while measure fields are usually numbers, which are inherently continuous.

That said, continuity and discreteness are not strictly tied to field type or to whether a field is a dimension or a measure. For example, a date dimension can be treated as continuous, and even a string field can be given a continuous interpretation, such as its length. Conversely, a numeric measure can be treated as discrete by ignoring the relationships between its values and treating them like strings.
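The rule of thumb (strings and dates are discrete dimensions, numbers are continuous measures) plus the freedom to override continuity can be sketched as follows. The `Field` class and `infer_field` helper are hypothetical names used for illustration only.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class Field:
    name: str
    role: str  # "dimension" or "measure"
    mode: str  # "discrete" or "continuous"

def infer_field(name, sample_value):
    """Default role and mode from a sample value (a rule of thumb, not a law)."""
    # bool is a subclass of int in Python, so exclude it and treat it as categorical.
    if isinstance(sample_value, (int, float)) and not isinstance(sample_value, bool):
        return Field(name, "measure", "continuous")
    # Strings, dates and everything else default to discrete dimensions.
    return Field(name, "dimension", "discrete")

order_date = infer_field("Order Date", date(2016, 3, 1))
sales = infer_field("Sales", 261.96)

# Continuity is independent of role, so either default can be overridden.
order_date.mode = "continuous"  # view dates along a timeline
sales.mode = "discrete"         # treat each sales value like a label
```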

Viewing dates discretely is the intuitive way to look at a dimension, but you can also view dates continuously:

In discrete mode, each dimension value is a single independent entry with no ordering rule between values. In continuous mode, the values are ordered in some way; in the figure above, for example, they are ordered by time. The display also switches from a table to a bar chart, because a table suits discrete data while a bar chart’s axis can represent continuous data.

Because a measure is normally presented against a dimension, looking at a measure on its own shows only its aggregate:

As shown in the figure above, looking at the Sales measure alone can only aggregate all sales values in the data set. The aggregation itself can be one of several types: sum, average, median, count, count distinct, minimum, maximum, variance, and so on:

These capabilities are “orthogonal”: every aggregation type is available both when a measure is viewed on its own and when it is split by a dimension.
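The orthogonality of aggregation type and dimension split can be sketched as one function that takes both as independent parameters. The aggregation keys (`"avg"`, `"count_distinct"`, and so on) and the sample rows are hypothetical. The functions come from Python’s standard `statistics` module and built-ins.

```python
import statistics
from collections import defaultdict

# Some of the aggregation types listed above; the keys are hypothetical names.
AGGREGATIONS = {
    "sum": sum,
    "avg": statistics.mean,
    "median": statistics.median,
    "count": len,
    "count_distinct": lambda values: len(set(values)),
    "min": min,
    "max": max,
    "variance": statistics.variance,  # sample variance; needs at least two values
}

def aggregate(rows, measure, how="sum", by=None):
    """Aggregate `measure` with `how`; if `by` names a dimension, aggregate per member."""
    fn = AGGREGATIONS[how]
    if by is None:
        return fn([row[measure] for row in rows])
    groups = defaultdict(list)
    for row in rows:
        groups[row[by]].append(row[measure])
    return {member: fn(values) for member, values in groups.items()}

rows = [
    {"Region": "East", "Sales": 10},
    {"Region": "East", "Sales": 30},
    {"Region": "West", "Sales": 20},
]
print(aggregate(rows, "Sales", "avg"))               # whole data set
print(aggregate(rows, "Sales", "max", by="Region"))  # split by Region
```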

You can also look at measures in a continuous way:

Unlike a chart with a continuous dimension, this chart holds only a single value, so every position along the axis other than that value carries no data. Continuous dimensions come with their own caveat: when a chart is drawn continuously, points that do not exist in the data are “seamlessly connected” by the line.

Fields can also have parent-child relationships, which let us roll up and drill down. Fields organized this way are called “hierarchical fields”:

Orders in the figure above is a hierarchical field. A hierarchical field is an ordered combination of several fields, forming a drill-down relationship from top to bottom and a roll-up relationship from bottom to top.
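One way to picture a hierarchical field is as an ordered list of dimensions, where the current depth decides how finely a measure is grouped. In this picture, increasing the depth drills down and decreasing it rolls up. The list representation and the `Category`/`Sub-Category` sample are assumptions for illustration.

```python
from collections import defaultdict

def totals_at_level(rows, hierarchy, depth, measure):
    """Sum `measure` grouped by the first `depth` fields of `hierarchy`.
    A larger depth drills down; a smaller depth rolls up."""
    keys = hierarchy[:depth]
    totals = defaultdict(float)
    for row in rows:
        totals[tuple(row[k] for k in keys)] += row[measure]
    return dict(totals)

hierarchy = ["Category", "Sub-Category"]  # hypothetical two-level hierarchy
rows = [
    {"Category": "Furniture", "Sub-Category": "Chairs", "Sales": 5.0},
    {"Category": "Furniture", "Sub-Category": "Tables", "Sales": 7.0},
    {"Category": "Technology", "Sub-Category": "Phones", "Sales": 3.0},
]
print(totals_at_level(rows, hierarchy, 1, "Sales"))  # rolled up to Category
print(totals_at_level(rows, hierarchy, 2, "Sales"))  # drilled down to Sub-Category
```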

Hierarchy

**Only dimension fields can be hierarchical**, because only dimensions can be split; measures cannot.

A hierarchy can follow a logical order or be composed arbitrarily.

Hierarchies with logical meaning

The most typical logical hierarchy is time. Once a good BI system recognizes a date field, it should determine the field’s granularity. If the granularity is day, for example, the system automatically generates a date hierarchy that aggregates to year by default and lets users switch levels at will:

If the field’s values are only precise to the month, the hierarchy can be expanded down to month at most.

The date hierarchy is logical because year, quarter, month, and day naturally run from coarse to fine, which matches everyday intuition.
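Here is a small sketch of an automatically generated date hierarchy. The available levels are capped by the precision of the data, and the hierarchy defaults to the coarsest level. The level names and helper functions are hypothetical.

```python
from datetime import date

LEVELS = ["year", "quarter", "month", "day"]  # coarse to fine

def date_hierarchy(granularity):
    """Levels a date field can expand to, given the precision of its values."""
    return LEVELS[: LEVELS.index(granularity) + 1]

def date_key(d, level):
    """Grouping key for date `d` at `level` of the hierarchy."""
    if level == "year":
        return (d.year,)
    if level == "quarter":
        return (d.year, (d.month - 1) // 3 + 1)
    if level == "month":
        return (d.year, d.month)
    return (d.year, d.month, d.day)

levels = date_hierarchy("day")
default_level = levels[0]  # aggregate to year by default, as described above
```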

Arbitrary hierarchies

When a hierarchy does not represent a date, its fields must be combined according to business meaning. For example, a hierarchy could be order date -> item ID -> ship date:

This drill-down shows which items were ordered on each order date and the ship date of each item.

Alternatively, the order dates and ship dates can be broken down under each item ID. This combination takes the item ID as the primary perspective:

As you can see, different perspectives combine hierarchies in different ways. Suppose a large company is investigating a financial problem, with BU and date as the dimensions and sales as the measure.

There are two possible orders: BU -> date and date -> BU. Either way, you can see each BU’s sales by date. BU -> date groups first by BU and shows each BU’s sales across dates, while date -> BU groups first by date and shows every BU’s sales within each date. The former makes differences between BUs easier to compare, and the latter makes differences between dates easier to compare.
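The two hierarchy orders contain the same leaf values and differ only in how they are nested. Here is a minimal sketch that assumes a simple nested-dict layout. The BU names and quarters are made up.

```python
from collections import defaultdict

def nested_totals(rows, outer, inner, measure):
    """Group by `outer`, then by `inner`, summing `measure`; the order decides the layout."""
    result = defaultdict(lambda: defaultdict(float))
    for row in rows:
        result[row[outer]][row[inner]] += row[measure]
    return {key: dict(value) for key, value in result.items()}

rows = [
    {"BU": "Cloud", "Date": "2016-Q1", "Sales": 10.0},
    {"BU": "Cloud", "Date": "2016-Q2", "Sales": 12.0},
    {"BU": "Retail", "Date": "2016-Q1", "Sales": 8.0},
]
by_bu = nested_totals(rows, "BU", "Date", "Sales")    # BU -> date
by_date = nested_totals(rows, "Date", "BU", "Sales")  # date -> BU
```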

Understanding the configuration

Configuration is the gateway to exploratory analysis, and understanding the analysis model begins with understanding the configuration model.

Tableau’s configuration consists mainly of rows, columns, marks, and filters. These four areas can be combined into a kaleidoscope of data insight models. Let’s look at the thinking behind this configuration and why combining these four areas can cover the whole range of exploratory analysis scenarios.

We can set aside 3D analysis: a 3D perspective distorts the relative sizes of graphical elements, and data that cannot be read accurately has no analytical value. Because data is analyzed on a two-dimensional plane, most charts can be configured in terms of “rows” and “columns”.

One might ask: why configure rows and columns instead of dimensions and measures? This is a good question. People with data analysis experience think in terms of dimensions and measures, so for any chart they would simply configure those. I see three reasons:

  1. In exploratory analysis you don’t fix in advance what the chart is or how it is presented, so the chart is infinitely variable: a line chart can be drawn horizontally, and a column chart can become a bar chart. Put the dimension on columns and you get a column chart; put it on rows and you get a horizontal bar chart.
  2. What really matters is the field you drag. Fields already carry the dimension/measure distinction, so the configuration areas do not need to restrict either one, which lowers the learning cost.
  3. Dimensions and measures can be placed on rows or columns at the same time, which is another key capability of exploratory analysis, as shown below:

When doing exploratory analysis, think outside the box: **why can’t the vertical axis of a bar chart include dimensions?** As the figure above shows, dragging two different measures onto rows produces two charts stacked in rows or a dual-axis chart, while dragging one dimension and one measure facets the chart, for example to see how different customers contributed to sales from 2013 to 2016.

Rows

The rows of a table, or the vertical axis of a chart. Measure fields are generally recommended here.

Columns

The columns of a table, or the horizontal axis of a chart. Dimension fields are generally recommended here.

As shown above, any combination of dimensions and measures can go on rows or columns, with any number of fields, and you can drill down at any level. A chart with multiple dimensions on one axis must be faceted:

In the figure above, two dimension fields are placed on columns to form a bar chart, so the horizontal axis represents both dimensions. If the horizontal axis holds more dimensions, it can keep being subdivided.

The order of multiple dimension fields on the horizontal axis (columns) also affects how the chart is presented. The last field in the figure above is Category, which is discrete by default, so the chart is a bar chart. In general, the chart type is determined by whether the last field on the dimension axis is continuous or discrete.

For example, what if we switched Order Date and Category?

We get the trends of three categories across 12 months as line charts, because the last field on the chart’s dimension axis (columns) is now continuous. If we drill Order Date down to the day level:

As you can see, drill-down is essentially a dimension axis that can be split by multiple dimension fields. As long as a chart can facet its dimension axis by any dimension field, the configuration side can treat drill-down simply as dragging multiple fields.
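The rule above can be sketched as a small layout function: every column field except the last becomes a facet, and the last field’s mode picks the mark type. The `(name, mode)` pair representation and the returned dictionary keys are hypothetical.

```python
def layout_columns(column_fields):
    """Decide faceting and mark type from the ordered fields on columns.

    `column_fields` is a list of (name, mode) pairs, where mode is "discrete"
    or "continuous". All fields but the last become facets; the last
    field's mode picks the mark type."""
    if not column_fields:
        raise ValueError("at least one column field is required")
    *facets, (last_name, last_mode) = column_fields
    return {
        "facet_by": [name for name, _ in facets],
        "axis": last_name,
        "mark": "line" if last_mode == "continuous" else "bar",
    }

print(layout_columns([("Order Date", "continuous"), ("Category", "discrete")]))
print(layout_columns([("Category", "discrete"), ("Order Date", "continuous")]))
```

Swapping the two fields changes only which one facets and which one drives the axis, which mirrors how the chart above changes from bars to lines.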

What happens if we switch from line charts to tables?

Category, which was on columns, is automatically moved to rows, and Sales, which was on rows, is moved to the Marks area. Before formally introducing the Marks area, let’s see why this shift happens:

**A table is a two-dimensional component, while a line chart is a one-dimensional component.** That is, both the rows and the columns of a table are dimensions, whereas once a line chart’s horizontal axis is used for dimensions, its vertical axis should be used for measures. In the example above, the line chart’s dimension axis held two fields rendered as facets. When we switch to a table, which supports two dimension axes, the extra dimension can move to the table’s other dimension area.

Since a table’s rows and columns are both dimensions, cell values have to be expressed through the Text mark, so the line chart’s measure field is automatically moved to the Marks area.
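Here is a sketch of that automatic rearrangement. It assumes a hypothetical config shape with `rows`, `columns` and `marks` keys, where a line chart keeps its measures on rows.

```python
def line_to_table(config):
    """Rearrange a line-chart config into a table config (hypothetical shape).

    A table needs dimensions on both axes, so the extra (facet) column
    dimensions move to rows, and the measures move to the Text mark."""
    columns = config["columns"]
    return {
        "rows": columns[:-1],     # facet dimensions move to rows
        "columns": columns[-1:],  # the last dimension stays on columns
        "marks": {**config.get("marks", {}), "text": config["rows"]},  # measures become cell text
    }

line_chart = {"columns": ["Category", "Order Date"], "rows": ["Sales"]}
table = line_to_table(line_chart)
```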

Marks

The Marks area is also configured by dragging fields: a field dropped there becomes a mark.

The Marks area is divided into **Color, Size, Label, Detail, Tooltip, and Path**. As the name suggests, marks act on the chart itself: they carry auxiliary information without materially changing the chart’s framework.

Rows and columns have the biggest influence on a chart: they determine which chart is used and how the data is split. Marks, by contrast, usually change auxiliary elements of the chart, such as text or color.

Tooltip

Does not affect the chart display; it only adds field information to the hover tooltip.

For a chart, this means the field is added to the tooltip:

As you can see from the figure above, when Profit is placed in the Tooltip area, it is added to the chart’s tooltip. Note that all Tableau charts support tooltips, including tables:

This ensures uniform configuration and behavior.

Size

Controls the size of chart elements.

For a line chart it controls line thickness, for a bubble chart bubble size, and for a bar chart bar width, but it has no obvious effect on area charts or tables. This works because Tableau abstracts the size property of each chart as far as possible.


Text

Text displayed directly on a chart.

For ordinary charts, Text appears as Label, the text displayed directly on the chart. A bar chart, for example, shows no label text by default; values appear only after a field is dragged onto Label.

This reflects a different philosophy from ordinary reports. In a typical report, labels are turned on with a check box, and each label shows the value of the measure being charted. Tableau instead opens label values to field drag and drop, so the displayed text can differ from the charted value, which supports a wider range of uses.

Some people assume that a bar’s length and its label must correspond; this too comes from a different understanding of data. By listing Text (Label) among the marks, Tableau treats text, like color and size, as an additional channel for displaying information. Often there is no need to show the same information twice; it is better to use more visual channels to show different information.

Color

Controls the color of the chart.

For example, while charting sales, you can encode profit as color and even discount as text, seeing several measures at once in a single line chart:

Alternatively, we can put profit on a right-hand Y axis as a dual-axis chart to achieve a similar effect:

Marks are designed to display additional information through channels such as color, size, labels, and tooltips without adding fields to rows or columns.

Detail

Dragging a measure onto Detail has no visible effect, because Detail only takes effect with dimension fields. Detail is really used for drill-down: dragging a dimension onto it splits the data by that dimension.

As shown in the figure above, sales is split into three lines by product line. But the three lines are indistinguishable, so color can be used to split by the dimension instead:

Now each group is drawn in its own color. In general, when a dimension field is placed on Color, Size, Label, or Detail, the data is split by that extra dimension, and the resulting groups are distinguished by color or size.

At this point you may wonder: what is the difference between splitting by a dimension in the Marks area and dragging multiple fields onto rows or columns? Let’s try it. Drag the product category dimension onto rows next to Sales, splitting Sales by that dimension:

As you can see, splitting by multiple fields on rows and columns is implemented with faceting, while splitting by a dimension in the Marks area draws multiple series within a single chart.

The other difference is that a split in the Marks area applies to the measure by default, whereas splitting on rows and columns can be applied to either dimensions or measures.

Meanwhile, the configuration side should only allow discrete fields (dimensions or discrete measures) to be split. As shown in the figure above, we cannot drag Category to the right of Sales unless we set Sales to discrete. Tip: Tableau colors discrete fields blue and continuous fields green, so dimensions usually appear blue and measures green. When we set the green Sales measure to discrete, it turns blue and can be split like a dimension.
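The configuration-side rule “only discrete fields can be split” can be sketched as a check run before a field is appended to a shelf. The shelf representation (a list of `(name, mode)` pairs) and the helper names are hypothetical.

```python
def can_extend(shelf):
    """True if another field may be appended to `shelf` (a list of (name, mode)
    pairs): every field already there must be discrete, because only
    discrete fields can be split further."""
    return all(mode == "discrete" for _, mode in shelf)

def set_mode(shelf, name, mode):
    """Return a copy of `shelf` with field `name` switched to `mode`."""
    return [(n, mode if n == name else m) for n, m in shelf]

rows_shelf = [("Sales", "continuous")]
print(can_extend(rows_shelf))                                 # False
print(can_extend(set_mode(rows_shelf, "Sales", "discrete")))  # True
```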

Finally, the Marks area does more than accept dragged fields: each mark can also be clicked to adjust its settings, such as color:

Or to customize the tooltip content:

Filters

Tableau gathers all filtering into the Filters area; dragging a field there filters on it:

For example, we can look only at Office Supplies and Technology. Beyond this general function, Tableau also supports more powerful chart interactions: clicking or lasso-selecting points (field values) on a chart and then keeping or excluding them:

When we exclude these points, a filter on the dimension field is generated automatically to exclude the selected dates, so the chart remains entirely data-driven:

What if the fields have a drill-down relationship? Whether a dimension is drilled down on rows and columns or split in the Marks area, filtering works on the field hierarchy:

As shown in the figure above, if a field is filtered after drilling down, the filter automatically builds a temporary field hierarchy and filters on it. So hierarchical fields can be composed dynamically in the configuration area, and temporary hierarchies can also be generated inside filters. We need to support fields with arbitrary hierarchy combinations and allow them on filters, rows and columns, and even marks.
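Here is a sketch of why a filter applied after drilling down needs a temporary hierarchy. Excluding the point (2016, June) must not also remove June 2015, so the filter has to match on the combined key. The `hierarchy_filter` helper and the sample rows are hypothetical.

```python
def hierarchy_filter(fields, excluded):
    """Build a predicate that drops rows whose values along `fields`
    (a temporary hierarchy) match any tuple in `excluded`."""
    excluded = set(excluded)

    def keep(row):
        return tuple(row[f] for f in fields) not in excluded

    return keep

rows = [
    {"Year": 2015, "Month": 6, "Sales": 1.0},
    {"Year": 2016, "Month": 6, "Sales": 2.0},
    {"Year": 2016, "Month": 7, "Sales": 3.0},
]
keep = hierarchy_filter(["Year", "Month"], [(2016, 6)])
filtered = [row for row in rows if keep(row)]
```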

Incidentally, a field that carries a filter can also be dragged anywhere:

To support this, every field must be able to carry a filter condition. A normal field has none, and a field with a filter condition can be dragged to any position like any other.

Those were dimension filters. Can measures be filtered too? Yes, but only by manually dragging the measure field into the Filters area:

If we lasso-select on the chart, the resulting filter is always on a dimension:

Think of it this way: dimensions are discrete, and a selection can only express a limited meaning, such as a few points on a line chart. How would we know whether the user is selecting months of the dimension or a range of the measure?

**Because a selection ultimately lands on points rather than an interval (continuous values do not lend themselves to lasso selection), filtering by dimension by default is the most accurate interpretation.** Suppose your intention in the operation above is not to select June through December but to select sales between 13K and 45.5K. In that case you need to drag that measure into the Filters area manually and enter the range precisely:

Note that when filtering a continuous measure, you first choose how it is aggregated. Filtering a range of the sum or a range of the maximum, for example, is very powerful.
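Aggregating before applying the range is what makes a “sum” range filter different from a “max” range filter. Here is a minimal sketch. The month names and values are made up, and the 13K to 45.5K bounds are borrowed from the example above.

```python
from collections import defaultdict

def measure_range_filter(rows, dimension, measure, agg, low, high):
    """Keep the members of `dimension` whose aggregated `measure` is within [low, high]."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[dimension]].append(row[measure])
    totals = {member: agg(values) for member, values in groups.items()}
    return {member: value for member, value in totals.items() if low <= value <= high}

rows = [
    {"Month": "Jun", "Sales": 10_000},
    {"Month": "Jun", "Sales": 5_000},
    {"Month": "Jul", "Sales": 50_000},
    {"Month": "Aug", "Sales": 20_000},
]
print(measure_range_filter(rows, "Month", "Sales", sum, 13_000, 45_500))  # range on the sum
print(measure_range_filter(rows, "Month", "Sales", max, 13_000, 45_500))  # range on the max
```

June passes the range on its sum (15,000) but fails on its maximum (10,000), which is why the aggregation has to be chosen first.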

Understanding the charts

Charts are the vehicle of data visualization. With only data and configuration and no charts, it is hard to gain intuitive insight.

In exploratory analysis, once the data and configuration are set, many visual forms can display the configured information. For example, if date and sales are dragged onto rows and columns respectively, a line chart, table, scatter plot, or bar chart all meet the requirement. But if the field on rows is discrete, line charts and scatter plots are unsuitable. This calls for a chart recommendation feature that suggests suitable charts based on the configuration.

Tableau’s built-in charts fall into several categories: tables, maps, bar/line/pie charts, and scatter/quadrant charts, plus histograms, box-and-whisker plots, Gantt charts, bullet charts, and so on. Evidently, analyzing data does not require a great many kinds of visualization, but each chart component needs deep polish: building a good table or line chart is not easy.

Rows and columns

Tables, maps, bar/line charts, scatter/quadrant charts, and so on can all describe their basic structure with rows and columns:

  • Tables naturally have rows and columns; swapping them transposes the table. A table’s rows and columns must be dimension fields. Dragging a measure onto them automatically switches to another chart, and switching back to a table moves the measure to the Text mark.
  • A map’s rows and columns are latitude and longitude. When a geographic dimension is placed on Detail, latitude and longitude are generated on rows and columns automatically from a geographic mapping table.
  • Bar/line charts and scatter/quadrant charts use Cartesian coordinates, with dimension fields on the dimension axis and measure fields on the measure axis.

Drilling down rows and columns

When a row or column holds more than one dimension field, the chart is drilled down accordingly. A table drilled down on rows looks like this:

The figure above can also be read as the detailed Order Date and Order ID data, grouped and merged by Order Date. Drilling down brings you ever closer to the detailed data, but the goal is not to see a flat list; it is to see some dimensions broken down by others.

Drilling down a chart follows the same idea as a table:

Drilling down a dimension axis splits that axis to a finer granularity. When a chart’s rows and columns are drilled down at the same time, the behavior differs slightly from a table. Looking at the axes alone, though, they are split the same way, with multiple sets of axes shown inside:

**Think of it this way: when the last field on a row or column is a measure, the presentation switches to a chart, because charts are suited to showing continuous values.** If you exclude the blue area in the figure above, what remains is a crosstab. A crosstab has dimension fields on both rows and columns; with dimensions on only rows or only columns, it becomes an ordinary table. Drilling down a chart works just like drilling down a table, except that the text in each “cell” is replaced by a bar or line.

**So drilling down any chart is drilling down an axis.** What never changes is the nature of the cells: a table’s cells are text, and a chart’s cells are graphics.

If you keep adding dimensions to rows, you keep drilling down the axis. Setting the measure fields aside, this is exactly the drill-down of a crosstab, as shown in the figure below, where the part outlined in blue is one group of large cells:

Because the last field is a measure, each leaf node expands not into a table-style cell but into a continuous line.

To sum up, in exploratory analysis the drill-down logic for rows and columns is shared by tables and charts and should be implemented as a whole. Extract the axis functionality into a common module, and the only difference between a table and a chart is whether the cells of the last field are treated as discrete or continuous.
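Here is a sketch of that separation. One shared function builds the axis (the header paths produced by drilling down), and a pluggable leaf renderer decides whether a cell is text (table) or a graphic (chart). The helper names are hypothetical, and the `#` bar is a crude stand-in for real chart rendering.

```python
def build_axis(rows, fields):
    """Shared axis logic for tables and charts: sorted, unique header paths."""
    return sorted({tuple(row[f] for f in fields) for row in rows})

def render(rows, axis_fields, measure, draw_leaf):
    """Lay out one cell per header path. Only `draw_leaf` differs between
    a table (discrete text) and a chart (continuous graphic)."""
    cells = []
    for path in build_axis(rows, axis_fields):
        total = sum(row[measure] for row in rows
                    if tuple(row[f] for f in axis_fields) == path)
        cells.append((path, draw_leaf(total)))
    return cells

def as_text(value):
    return f"{value:g}"        # table cell: the number as text

def as_bar(value):
    return "#" * round(value)  # chart cell: a crude bar, one '#' per unit

rows = [
    {"Category": "A", "Segment": "x", "Sales": 2},
    {"Category": "A", "Segment": "y", "Sales": 1},
    {"Category": "B", "Segment": "x", "Sales": 3},
]
print(render(rows, ["Category", "Segment"], "Sales", as_text))
print(render(rows, ["Category", "Segment"], "Sales", as_bar))
```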

Drilling into hierarchies

Drilling into a hierarchical field looks the same as dragging multiple fields. However, because of the parent-child relationship, “expand” and “collapse” buttons can be shown on the chart. Clicking one does not act on the chart directly. It emits an event that modifies the rows configuration, and the expand or collapse is then completed as a data-driven update.
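The event-to-configuration flow can be sketched as a pure function. It takes the current configuration and returns a new one with the hierarchy’s depth adjusted, and the chart then re-renders from that configuration. The config shape (`hierarchy` plus `depth`) and the handler name are assumptions.

```python
def on_toggle(config, shelf, index, expand):
    """Handle an expand/collapse event by returning a new config in which the
    hierarchy at config[shelf][index] is one level deeper or shallower."""
    item = config[shelf][index]
    depth = item["depth"] + (1 if expand else -1)
    depth = max(1, min(depth, len(item["hierarchy"])))  # stay within the hierarchy
    new_item = {**item, "depth": depth}
    new_shelf = config[shelf][:index] + [new_item] + config[shelf][index + 1:]
    return {**config, shelf: new_shelf}

def visible_fields(item):
    """Fields actually placed on the shelf for a hierarchy item."""
    return item["hierarchy"][: item["depth"]]

config = {"rows": [{"hierarchy": ["Year", "Quarter", "Month"], "depth": 1}]}
expanded = on_toggle(config, "rows", 0, expand=True)
```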

Charts that do not fit rows and columns

Pie charts do not fit rows and columns, because a pie is split by a discrete dimension and the size of each slice is determined by a measure. So for a pie chart, rows correspond to the Color mark and columns to an additional Angle mark:

Charts without a dimension axis

When only rows are configured, a table is recommended, but bar and line charts can also handle it by omitting the horizontal axis:

Visually there is no horizontal axis, but conceptually it is a horizontal axis that aggregates across all dimension values.

Continuous and discrete values

Let’s look at how continuous and discrete settings behave on measures and on dimensions.

On measures

A chart should be able to handle both continuous and discrete values. For example, if Sales is switched to discrete, it is displayed as text:

If Sales is switched to continuous, each cell uses bar length to represent the value, so continuous values create a visual “sense of contrast”:

The component above is a table, which is designed for discrete values, yet it adapts to continuous ones. Charts designed for continuous values, however, cannot adapt to discrete ones:

In this bar chart, for example, switching Sales to discrete automatically switches to a table, because showing two discrete fields in a bar chart is meaningless.

On dimensions

The figure above uses a discrete dimension field. Because the dimension is discrete, a bar chart is used, since the bars are also separate from one another.

When a continuous field is used as the dimension, a scatter plot is the default recommendation, because both the rows and the columns of a scatter plot hold measures:

However, wherever a scatter plot works, a line chart also works. **When the dimension is a continuous date field, a line chart is more appropriate than a scatter plot.** Dates are continuous but not meant to be compared as quantities, so they are better displayed as a continuous dimension. Both axes of a scatter plot are meant for continuous measures, which makes it a poor fit for a continuous dimension field such as a date.

You can still switch a line chart to a scatter plot at any time, but the result has little business value:

Therefore, chart recommendation based on configuration can be implemented by tagging each chart with the fields it suits: a line chart suits a continuous dimension field on its axis, and a scatter plot suits continuous measure fields on both rows and columns.
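Here is a rule-based sketch of configuration-driven recommendation that encodes only the cases discussed above. It looks at the last field on columns and on rows, each described as a hypothetical `(role, mode)` pair, and it is deliberately incomplete. Tables are always offered because they adapt to either mode.

```python
def recommend(x, y):
    """Suggest chart types from the last field on columns (x) and on rows (y),
    each a (role, mode) pair. Covers only the cases discussed in the text."""
    candidates = ["table"]  # tables adapt to both discrete and continuous values
    x_role, x_mode = x
    _, y_mode = y
    if y_mode == "discrete":
        return candidates  # two discrete axes: only a table makes sense
    if x_role == "measure" and x_mode == "continuous":
        candidates.append("scatter")  # continuous measures on both axes
    elif x_mode == "continuous":
        candidates.append("line")     # continuous dimension, e.g. a date
    else:
        candidates.append("bar")      # discrete dimension
    return candidates
```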

Marks

Apart from the special Angle mark for pie charts and Path mark for line charts, all charts support five common marks: Tooltip, Size, Text, Color, and Detail.

Tooltips are relatively simple: every chart shows a tooltip on hover, and the fields in it can be customized and extended.

Size is supported only by line, bar, and scatter charts, because they have a line thickness, bar width, and circle radius, respectively, that it can describe.

Text corresponds to the Label of bar, line, and pie charts and to the cell content of tables, treemaps, and maps.

Color and Detail are more special, as described below:

Dragging a field that is already on rows or columns onto Detail has no effect:

It has no effect because the view is already at that field’s level of detail.

However, dragging such a field onto Color colors the marks by numeric value or by category:

When the field on Color is continuous, a continuous legend with a slider appears; when it is discrete, a categorical legend appears:

If the dragged field is not on rows or columns and is a measure, colors are assigned by its value (dragging the measure onto Detail still has no effect):

As shown above, we can read profit from bar length and sales from color intensity.

If the dragged field is not on rows or columns and is a dimension, the data is first split by that dimension. Then, if the field was placed on Color, each resulting group is given its own color.

Because a split by a dimension in the Marks area is not based on rows and columns, each chart type splits in the way that suits it.

For example, when a bar chart is split by a new dimension, the “stacked bar” strategy is used:

For a line chart, the “multiple lines” strategy is used:

For a scatter plot, the split simply produces additional points. Splitting a scatter plot does not produce visibly separate segments the way it does for line and bar charts, so the groups cannot be told apart without color:

Exploratory analysis is complex because the space of possibilities is:

fields × discrete/continuous × rows/columns × row/column drill-down × mark type × filters × chart type

The Cartesian product of these factors is practically unlimited.
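A toy count shows how quickly this product grows. The option sets below are hypothetical and tiny, and the count is only a crude upper bound: it ignores drill-down structure and allows the same field to be placed more than once.

```python
from itertools import product

# Deliberately tiny, hypothetical option sets.
FIELDS = ["Order Date", "Category", "Sales"]
MODES = ["discrete", "continuous"]
PLACEMENTS = ["rows", "columns", "color", "size", "text", "detail", "tooltip", "filter"]
CHARTS = ["table", "bar", "line", "scatter", "pie"]

def count_configs(n_fields):
    """Count configurations that make `n_fields` ordered (field, mode, placement)
    choices and then pick a chart type. Order matters because shelf order
    changes the chart."""
    one_field = list(product(FIELDS, MODES, PLACEMENTS))
    return len(one_field) ** n_fields * len(CHARTS)

print(count_configs(1))  # 240
print(count_configs(3))  # 552960
```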

Axis interaction

Some chart-specific capabilities are hidden in axis interactions. A line chart, for example, has five drop zones, as shown below:

These zones are normally used for measure fields, so dragging a dimension onto one ends up modifying rows, columns, or marks.

Dragging dimensions

Dragging a dimension to area 1 at the bottom is equivalent to replacing the column field:

Dragging a dimension into area 4, inside the chart, is equivalent to dragging it onto the Color mark:

Dragging a dimension to area 3 on the left is equivalent to drilling down the rows:

Similarly, dragging to the topmost area is equivalent to drilling down the columns.

Dragging measures

Measures can be dropped in more places. For example, dragging a measure to area 5 on the right axis creates a dual-axis chart:

Dragging a measure to area 2 on the left adds another axis to the chart:

Note that rows now show “Measure Values”, a special field, and that Profit and Sales are selected through a filter. Besides dropping onto the chart, you can get the same result by dragging the “Measure Values” field from the left pane directly onto rows:

As the figure above shows, putting the measures on rows and coloring by measure name gives the same effect as dragging a measure into area 2. Every chart interaction therefore ultimately maps to configuration: any drag-and-drop on the chart can be expressed as a configuration change.
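The claim that every drop maps to a configuration change can be sketched as a lookup from zone and field role to a config edit. The zone numbers follow the line-chart figure described above. The config keys (`columns`, `rows`, `color`, `dual_axis`) are a hypothetical shape, not Tableau’s internal model.

```python
def apply_drop(config, zone, field, role):
    """Map a drop on a numbered line-chart zone to a config change (hypothetical shape)."""
    cfg = {key: list(value) for key, value in config.items()}  # copy every shelf
    if role == "dimension":
        if zone == 1:    # bottom axis: replace the column field
            cfg["columns"] = [field]
        elif zone == 3:  # left header: drill down rows, keeping the measure last
            cfg["rows"].insert(len(cfg["rows"]) - 1, field)
        elif zone == 4:  # inside the chart: same as dropping on Color
            cfg["color"] = [field]
    elif role == "measure":
        if zone == 2:    # left axis: add another axis
            cfg["rows"].append(field)
        elif zone == 5:  # right axis: dual axis
            cfg["dual_axis"] = [field]
    return cfg

base = {"columns": ["Order Date"], "rows": ["Sales"], "color": [], "dual_axis": []}
```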

For tables, the drop zones are rows, columns, and cells:

Dragging to a row or column is no different from dragging to the corresponding configuration area, and dragging to a cell is equivalent to dragging to the Text mark. Because the chart is tied to the configuration areas, even people who don’t fully understand the configuration can work intuitively by dragging fields onto the chart.

Click and lasso interactions

Every chart supports selection by clicking and lasso. For tables, the points are cells:

For a bar chart, points are bars:

For a line chart, points are nodes:

For a pie chart, points are slices:

Selected points are highlighted. More importantly, they can be kept, excluded, sorted locally, and so on.

For example, we can sort the selected slices of the pie chart from smallest to largest:

We can also exclude certain points, which, as mentioned in the configuration section, is ultimately translated into a new filter:

Finally, a selection appears to highlight only a single chart. When multiple charts are linked, however, the selection acts as a temporary filter applied to all charts on the same data set and highlights the filtered results in those charts.
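Here is a sketch of the two outcomes of a selection. “Exclude” becomes a persistent exclusion filter on the dimension, while linking treats the selection as a temporary filter that only flags rows for highlighting in every chart on the same data set. All helper names and the sample rows are hypothetical.

```python
def exclude_selected(filters, dimension, selected):
    """Return new filters in which the selected values of `dimension` are excluded."""
    updated = {dim: set(values) for dim, values in filters.items()}
    updated.setdefault(dimension, set()).update(selected)
    return updated

def apply_filters(rows, filters):
    """Drop rows whose value for any filtered dimension has been excluded."""
    return [row for row in rows
            if all(row[dim] not in excluded for dim, excluded in filters.items())]

def highlight(rows, dimension, selected):
    """Linked charts: flag rows that match the selection instead of removing them."""
    chosen = set(selected)
    return [row[dimension] in chosen for row in rows]

rows = [{"Month": m, "Sales": s} for m, s in [("Jun", 3), ("Jul", 5), ("Aug", 2)]]
filters = exclude_selected({}, "Month", ["Jul"])
```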

3. Summary

Once we understand how the exploratory model treats data, configuration, and charts, we can learn to analyze data with exploratory thinking. This is also a useful reference for building exploratory BI tools.

The discussion address is: close reading Tableau Exploratory Model · Issue #199 · dt-fe/weekly

If you’d like to join the discussion, click here. A new topic is published every week, on weekends or Mondays. Front-end Intensive Reading: helps you filter the right content.

Follow the Front-end Intensive Reading WeChat official account.

Copyright notice: free to reproduce, non-commercial, no derivatives, attribution required (Creative Commons 3.0 license)