Largest Triangle Three Buckets. Seeing these words for the first time, you might be stunned. What is this? The largest triangle? Three buckets? Do three buckets make the largest triangle? What's in the buckets? Why a triangle? Fortunately, we can learn, which is exactly what you are doing by reading this article; and since you love learning so much, don't forget to leave a like.

Preface

Based on the paper Downsampling Time Series for Visual Representation, this article introduces the LTTB algorithm: its core idea, the problem it solves, the concrete implementation logic, and its application scenarios in visualization.

Take the line chart as an example. In a visualization scenario, as more and more x-axis points (each with its corresponding y value) are added, the polyline in the chart becomes more and more complicated. Once the count reaches a certain level, it is hard to read the real value of any specific point from the chart, and the data becomes very crowded.

A line chart plotted from 5,000 data points

To see the chart as a whole, we have to hide some of the points and show only those that are representative of the others, or create new data points that represent them. This is the problem that downsampling algorithms solve.

The Largest Triangle Three Buckets algorithm

Nothing in the world is perfect

Every algorithm has its limitations and scope of applicability, and LTTB is no exception. The algorithm keeps only the points that carry important information and are easily perceived, and ignores the rest. The benefits are obvious: you can take in the chart at a glance, rendering is faster, and bandwidth and time are saved.

With the LTTB algorithm, a line chart downsampled from 5,000 data points to 888 points looks almost identical to the original

Algorithm implementation steps:

  1. Excluding the first and last data points, divide the remaining points, in order, as evenly as possible into buckets. What is a bucket? A bucket is simply a subset of the data made up of several consecutive data points (a sketch of this split follows below).
  2. The first data point goes into the first bucket and the last data point goes into the last bucket; those two buckets contain nothing else.
  3. Traverse the buckets and pick the most appropriate point from each one to represent that bucket.

The selected points from all the buckets, i.e. the effective points, form the downsampled set of data points.
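As a rough illustration of step 1, here is a small TypeScript sketch of how the bucket boundaries could be computed for a target output size `threshold` (the function name and return shape are my own, taken neither from the paper nor from ECharts):

```ts
// Split n points into `threshold` buckets: the first and last points get
// single-point buckets; the rest are spread as evenly as possible.
function bucketRanges(n: number, threshold: number): Array<[number, number]> {
  const ranges: Array<[number, number]> = [[0, 1]]; // first bucket: only the first point
  const innerBuckets = threshold - 2;
  const step = (n - 2) / innerBuckets; // points per inner bucket (fractional)
  for (let i = 0; i < innerBuckets; i++) {
    const start = Math.floor(i * step) + 1;
    const end = Math.floor((i + 1) * step) + 1;
    ranges.push([start, end]); // half-open range [start, end)
  }
  ranges.push([n - 1, n]); // last bucket: only the last point
  return ranges;
}

// e.g. bucketRanges(5000, 888) yields 888 buckets: two single-point buckets
// plus 886 inner buckets of 5 or 6 consecutive points each.
```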

The key is how to find the point that represents the current bucket, and that is the core design of the three-bucket algorithm.

Largest triangle area: which three buckets are involved, and how is the data chosen? Let's look at a picture.

Virtual vertical lines divide the data into three sections, each representing the data range of one bucket.

We start with the first bucket. Because the first bucket contains only the first data point, point A is that first data point; it is already known and needs no calculation, so it is the pre-selected point. Point B comes from the second bucket and point C from the third. Let's talk about the selection of point C first. If each bucket held 100 data points, testing every combination would take 100 × 100 calculations. It is more practical to pick a temporary point that stands in for the third bucket, and then find point B with only 100 calculations, which greatly simplifies the process. So point C is a temporary point, the average point of the third bucket, which is easy to compute.

At this point we know the pre-selected point A and the temporary point C. By traversing the data points in bucket B, we select the one that gives triangle ABC the largest area as point B, the representative of bucket B. In short:

  • Point A: the pre-selected point
  • Point B: the point in bucket B that makes the area of triangle ABC largest
  • Point C: the average point of bucket C's data
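In code, that selection boils down to an area comparison. Here is a minimal sketch over plain x/y pairs (the `Point` shape and the function name are mine, not from ECharts):

```ts
interface Point { x: number; y: number; }

// Twice the area of triangle ABC, via the cross product; the 1/2 factor is
// omitted because only comparisons between areas matter.
function triangleArea2(a: Point, b: Point, c: Point): number {
  return Math.abs((a.x - c.x) * (b.y - a.y) - (a.x - b.x) * (c.y - a.y));
}
```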

Then, following the same logic, the chosen point B becomes the new pre-selected point. A temporary average point is taken from bucket D (not shown in the figure), and it is used to calculate the point in bucket C that can truly represent bucket C.

Loop through the buckets in turn until the calculation has been completed for all of them.

The paper gives a clear listing of the algorithm's logic.
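Since that listing is not reproduced here, below is a minimal TypeScript sketch of the same logic, reusing the `Point` type and `triangleArea2` helper from the sketch above; the function name `lttb` and the exact bucket boundaries are illustrative choices, not the paper's code:

```ts
// A compact LTTB pass: keep the first and last points, split the rest into
// threshold - 2 buckets, and pick one representative point per bucket.
function lttb(data: Point[], threshold: number): Point[] {
  const n = data.length;
  if (threshold >= n || threshold < 3) {
    return data.slice(); // nothing to downsample
  }

  const sampled: Point[] = [data[0]]; // the first point is always kept
  const step = (n - 2) / (threshold - 2); // points per bucket (fractional)
  let a = data[0]; // pre-selected point A

  for (let i = 0; i < threshold - 2; i++) {
    // Current bucket: the candidates for point B.
    const start = Math.floor(i * step) + 1;
    const end = Math.min(Math.floor((i + 1) * step) + 1, n - 1);

    // Temporary point C: the average of the next bucket (or of the last point).
    const nextEnd = Math.min(Math.floor((i + 2) * step) + 1, n);
    let avgX = 0;
    let avgY = 0;
    for (let j = end; j < nextEnd; j++) {
      avgX += data[j].x;
      avgY += data[j].y;
    }
    const c: Point = { x: avgX / (nextEnd - end), y: avgY / (nextEnd - end) };

    // Point B: the candidate forming the largest triangle with A and C.
    let b = data[start];
    let maxArea = -1;
    for (let j = start; j < end; j++) {
      const area = triangleArea2(a, data[j], c);
      if (area > maxArea) {
        maxArea = area;
        b = data[j];
      }
    }

    sampled.push(b);
    a = b; // the chosen B becomes A for the next bucket
  }

  sampled.push(data[n - 1]); // the last point is always kept
  return sampled;
}
```

Calling something like `lttb(points, 888)` on 5,000 points should give a curve visually close to the original, as in the figure above.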

Application scenarios in ECharts

In an ECharts line chart, the series-line.sampling option supports the 'lttb' algorithm.
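For example, a minimal option sketch (the container id `chart` and the `largeData` array are assumptions for illustration):

```ts
import * as echarts from 'echarts';

// Assumed to be prepared elsewhere: a large array of [x, y] pairs.
declare const largeData: [number, number][];

const chart = echarts.init(document.getElementById('chart') as HTMLElement);
chart.setOption({
  xAxis: { type: 'value' },
  yAxis: { type: 'value' },
  series: [{
    type: 'line',
    sampling: 'lttb', // downsample with largest-triangle-three-buckets
    data: largeData
  }]
});
```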

Let's analyze the ECharts source code in detail.

/**
 * Large data down sampling using largest-triangle-three-buckets.
 * Bucket: an ordered set containing a subinterval of the data points.
 * @param {string} valueDimension
 * @param {number} rate
 */
lttbDownSample(
    valueDimension: DimensionIndex,
    rate: number
): DataStore {
    const target = this.clone([valueDimension], true);
    const targetStorage = target._chunks; // Target data set, i.e. X and y axis data
    const dimStore = targetStorage[valueDimension]; // Y-axis data
    const len = this.count(); // The total number of Y-axis values, for example, 1630

    let sampledIndex = 0; // The initial sampling index is 0

    const frameSize = Math.floor(1 / rate); // Bucket size (step along the x-axis); 3 in this example
    // The current raw index (note: an index, not a value); starts at 0
    let currentRawIndex = this.getRawIndex(0);
    let maxArea; // Maximum area
    let area; // Temporarily calculate the area
    let nextRawIndex; // Next raw index

    // getIndicesCtor returns Uint32Array or Uint16Array depending on the size of this._rawCount;
    // it is used here to allocate the buffer of sampled indices
    const newIndices = new (getIndicesCtor(this._rawCount))(Math.ceil(len / frameSize) + 2);

    // First frame use the first data.
    newIndices[sampledIndex++] = currentRawIndex;
    // Skip the first and last points; loop over the rest in steps of frameSize
    for (let i = 1; i < len - 1; i += frameSize) {
        // The x-axis average point of the next bucket; it is offset by frameSize
        // because the current bucket starts at i and the next bucket follows it.
        const nextFrameStart = Math.min(i + frameSize, len - 1);
        const nextFrameEnd = Math.min(i + frameSize * 2, len);
        const avgX = (nextFrameEnd + nextFrameStart) / 2;

        // The average y value of the next bucket
        let avgY = 0;
        for (let idx = nextFrameStart; idx < nextFrameEnd; idx++) {
            const rawIndex = this.getRawIndex(idx);
            const y = dimStore[rawIndex] as number; // Get the true value of the Y-axis, as given in data
            if (isNaN(y)) { // Not a number: skip this point
                continue;
            }
            avgY += y as number; // accumulate
        }
        // The average: the sum of the bucket's points (illegal values excluded) divided by the bucket length
        avgY /= (nextFrameEnd - nextFrameStart);

        const frameStart = i; // Start index of the current bucket
        const frameEnd = Math.min(i + frameSize, len); // End index of the current bucket

        const pointAX = i - 1; // x coordinate used for point A (the index just before the current bucket)
        const pointAY = dimStore[currentRawIndex] as number; // The real y value of point A

        maxArea = -1; // Initial maximum area

        // Throughout, a point's x coordinate is its index and its y coordinate is its real value
        nextRawIndex = frameStart;

        for (let idx = frameStart; idx < frameEnd; idx++) {
            const rawIndex = this.getRawIndex(idx);
            const y = dimStore[rawIndex] as number;
            if (isNaN(y)) { // If it is not a number, skip the current loop
                continue;
            }
            // Twice the area of triangle ABC via the cross product:
            // |(xA - xC)(yB - yA) - (xA - xB)(yC - yA)|; the 1/2 factor is omitted
            // because only area comparisons matter.
            area = Math.abs((pointAX - avgX) * (y - pointAY)
                - (pointAX - idx) * (avgY - pointAY)
            );
            if (area > maxArea) {
                maxArea = area;
                nextRawIndex = rawIndex; // Candidate for point B (the next point A)
            }
        }
        // Record the raw index with the largest area for this bucket and advance sampledIndex
        newIndices[sampledIndex++] = nextRawIndex;
        // The chosen point B becomes point A for the next bucket
        currentRawIndex = nextRawIndex;
    }

    // Last frame uses the last data point.
    newIndices[sampledIndex++] = this.getRawIndex(len - 1); // Last point
    target._count = sampledIndex; // Total number of sampled points
    target._indices = newIndices; // Assign an index set

    target.getRawIndex = this._getRawIdx;
    return target;
}

If you want to try out sampling in ECharts, click here.

Conclusion

Visualizing large amounts of data graphically is a very interesting topic and a core point that often comes up in performance optimization. The paper covers several other algorithms, not introduced here, and compares them clearly on performance, complexity, accuracy, and other metrics. In most cases LTTB performs best, so depending on the scenario, give LTTB priority and try the other algorithms when it does not meet your requirements.

References

  1. Downsampling Time Series for Visual Representation
  2. Largest Triangle Three Buckets
  3. Downsampling algorithm for temporal data visualization
  4. Downsampling example
  5. Downsampling example