Recently, I received the demand to make a social relationship map for 10W social sharing data, which should be fully rendered for macro analysis. This paper discusses the optimization method for smooth rendering of ten thousand nodes.

Note: the original link: www.404forest.com/2018/10/12/… Article backup: github.com/jin5354/404…

This code has been packaged as component D3-force-graph, repository address github.com/jin5354/d3-… .

Render effect:




Rendering a social graph with a large amount of data in real time on the browser and ensuring a smooth experience requires many aspects of optimization. Here are some practical lessons from graphic rendering, data I/O, data computation, and details.

1. Graphic rendering

The original data format of social sharing is as follows:

{
  "source": "sourceNodeName",
  "target": "targetNodeName"
}Copy the code

10W pieces of original data, about 5W nodes and 4W lines can be obtained after deduplication, invalid points removal and pre-processing. The data is stored in an Object in the following format and occupies about 10 MB memory.

{ "nodes": ["A", "B", "C", ...] , "links": [{ "source": "A", "target": "B" }, { "source": "C", "target": "D" }, ...] }Copy the code

1.1 Selection: D3-force force guide layout + WebGL rendering

How to distribute so many points on the canvas, and dense, it is best to arrange a large structure in the center, retail placed outside? Force – oriented algorithm is a graph layout algorithm, it can make the point – line relationship in a clear and beautiful posture. This algorithm, based on particle physics, simulates each node as an atom, and at each frame the speed and acceleration of the node are generated by interatomic repulsion (binding to the line), generating a new position. After many iterations, a stable layout with low energy was finally achieved. For more information about force-oriented algorithms, see D3-Force.

Given the location of each node, how can points and lines be drawn? I wrote a demo in SVG and drew 5000 points in SVG on my machine and it was down to 10fps. Therefore, it is impossible to draw 10,000-level nodes without relying on hardware acceleration. The author is familiar with three.js, so he chose WebGL (three.js) for rendering.


1.2 Particle System + LineSegments + BufferGeometry

Three. Geometry is most often used when constructing objects in three.js. Geometry is a data structure in three. js, which contains the vertex position, Color and other information of the Geometry. Data structures such as three. Vector3 and three. Color are used to store the information, which is intuitive and convenient to read and write, but with average performance. According to the most common thinking, for each node, we need to use THREE.CircleGeometry to construct a circle, and for each Line, we need to use THREE.

/ / the first version / / draw a circle enclosing a node paintData. Nodes. The forEach ((node) = > {node. The geometry = new THREE. CircleGeometry (5, 12) node.material = new THREE.MeshBasicMaterial({color: 0xAAAAAA}) node.circle = new THREE.Mesh(node.geometry, Node. This material). The scene. The add (node. Circle)}) / / draw a line segment in every line enclosing paintData. Links. ForEach ((link) = > {link. LineMaterial = new THREE.LineBasicMaterial({color: 0xAAAAAA}) link.lineGeometry = new THREE.Geometry() link.line = new THREE.Line(link.lineGeometry, link.lineMaterial) link.line.frustumCulled = false this.scene.add(link.line) })Copy the code

However, the actual measurement found that the system would also be very difficult to render at 5K nodes. If we want three.js to draw 5W circle objects and 4W line objects, each circle object has 13 vertices, the total number of vertices to draw more than 70 W. To optimize, you have to reduce the number of vertices, reduce the number of objects, and so on.

Using particle systems is a good choice for a large number of objects with little difference. In particle systems, each node needs only one vertex with a circular texture attached to it. After using the particle system, tens of thousands of Circle objects can be reduced to one particle system object, greatly reducing the complexity.


For a large number of straight segments (without turning), use THREE.LineSegments. Using gl.LINES, THREE.LineSegments can pass in a set of vertices, each pair forming a line segment. This reduces tens of thousands of Line objects to a single LineSegments object, greatly reducing complexity.

The disadvantage of particle systems and LineSegments is that GLSL Shaders must be hand-written if the color or size of individual particles is later adjusted.

D3-force-graph allows you to customize node and line styles, such as resizing, color, etc., with a simple shader written in GLSL. Limited by space this article does not introduce GLSL, interested students can view the source code to understand. You can also take a look at the series of demos the author left behind while learning WebGL: WebGL Tutorial.

BufferGeometry is a data structure similar to Geometry used to describe Geometry, which uses binary arrays to store vertex positions, colors, and other information. Binary data must be used for data exchange between Javascript and graphics card. If traditional text format is used, format conversion is required, which is very time-consuming. BufferGeometry can feed binary data into the graphics card intact, significantly improving script performance. In the scenario of this paper, the location array and color array of ten thousand level nodes are all M in size, and it is necessary to replace the Geometry with BufferGeometry.

Using binary arrays reduces the readability of the code, but significantly improves performance.

// Prepare the node, using BufferGeometry, The position is set to (-9999, -9999, 0) point. Geometry = new THREE.BufferGeometry() Positions = new Float32Array(Paintdata.nodes.length * 3) // Use the particle system instead of using geometry to draw circles. Instead use a round PNG with a transparent background // Later for more flexibility, Material = new THREE.PointsMaterial({size: 10, map: texture, transparent: True}) // Fill the binary array of positions, the readability is reduced, The only way to find x,y, and z is to use the subscript +1,+2 paintdata.nodes. ForEach ((e, i) => { point.positions[i * 3] = -9999 point.positions[i * 3 + 1] = -9999 point.positions[i * 3 + 2] = 0 }) ... / / bind position binary array point. Geometry. The addAttribute (' position, the new THREE BufferAttribute (point positions, 3)) point.geometry.computeBoundingSphere() let points = new THREE.Points(point.geometry, Add (points) // Draw the line segment line. Geometry = new THREE.BufferGeometry() line. Positions = new Float32Array(paintdata.links.length * 6) // Float Float32Array(paintdata.links.length * 6) THREE.VertexColors}) // The initial position of all points (-9999, -9999, -0.1) paintdata.links.foreach ((e, Positions [I * 6 + 1] = -9999 line. Positions [I * 6 + 2] = -0.1 Line. Positions [I * 6 + 3] = -9999 line. Positions [I * 6 + 4] = -9999 line. line.geometry.addAttribute('position', new THREE.BufferAttribute(line.positions, 3)) line.geometry.computeBoundingSphere() line.lines = new THREE.LineSegments(line.geometry, line.material) scene.add(line.lines)Copy the code

Another tip is that if you are going to use binary arrays, it is better to use binary arrays from the beginning of the logic, such as Float32Array in the previous code. Although you can use ordinary array has been in the business (ordinary array than binary array some API, or a bit more convenient), until to data into the three. The js to call three. Float32BufferAttribute transform it, But this has resulted in a significant performance loss for the ten thousand levels of data.



After such optimization, the time of drawing a frame has been reduced to less than 50ms, full draw can ensure the frame rate of 15 ~ 30fps, basically smooth. At the end of the layout, use the control plug-in of three. js to drag, pan, zoom and other view operations, stable at 60fps.

1.3 Using Web Worker to avoid main thread blocking

Under the amount of data in this paper, it takes about 2s for D3-force to iterate each frame. So we can see this effect:


The screen moves every 2s, which seems to be very slow, and the main thread is blocked during the calculation process, and the UI does not respond, giving a very poor experience. We can move the part of D3-force into worker to ensure the smoothness of the main thread. Force-directed Web Workers can be seen in Resources for a more detailed demo.

// main this.worker = new worker ('worker.js') // Pass the information to worker worker.postMessage({nodes: nodes, links: Worker. onMessage = function(event) {switch (event.data.type) {case 'tick': return ticked(event.data); case 'end': return ended(event.data); }}Copy the code
ImportScripts ("https://d3js.org/d3-collection.v1.min.js"); // worker.js // call d3-force for layout iteration. importScripts("https://d3js.org/d3-dispatch.v1.min.js"); importScripts("https://d3js.org/d3-quadtree.v1.min.js"); importScripts("https://d3js.org/d3-timer.v1.min.js"); importScripts("https://d3js.org/d3-force.v1.min.js"); onmessage = function(event) { var nodes = event.data.nodes, links = event.data.links; var simulation = d3.forceSimulation(nodes) .force("charge", d3.forceManyBody()) .force("link", d3.forceLink(links).distance(20).strength(1)) .force("x", d3.forceX()) .force("y", d3.forceY()) .stop(); for (var i = 0, n = Math.ceil(Math.log(simulation.alphaMin()) / Math.log(1 - simulation.alphaDecay())); i < n; ++i) { postMessage({type: "tick", progress: i / n}); simulation.tick(); } postMessage({type: "end", nodes: nodes, links: links}); }Copy the code

1.4 Tween animation

After the d3-force layout calculation is moved to the worker, the main thread is no longer blocked, but the picture still moves once in 2s due to the layout speed. During this 2s interval, the main thread is idle, so we can actively add transition animations to improve the flow.



For example, if the first frame is computed at 2000ms and the position x = 5 is computed at 4000ms and the second frame is computed at position x = 10, we can start drawing at 4000ms. The goal of drawing is that x changes from 5 to 10 in 2000ms. Then, when rAF is called to execute, if the current moment is 4400ms, the current position should be (4400-4000) / 2000 * (10-5) + 5 = 6, that is, x = 6 is drawn during this execution. Since the calculation time of d3-force each frame is relatively stable, and the speed becomes slightly faster as the calculation reaches the later stage, the tween strategy can be adjusted slightly, but the general idea remains unchanged. With the addition of tween animation, the animation can be directly increased to around 30fps, which greatly improves the experience.

2. The data I/O

2.1 the progress bar

As the amount of data increases, many small areas that need no attention before have become bottlenecks. For example, 10W pieces of data are about 10M in size. It takes a certain amount of time to pull the interface, calculate and layout. For example, the layout, D3-force by default will iteration 300 times to reach a stable state, in this scenario, adjust the parameter to iteration 50 times can be finished, that also takes 1 ~ 2 minutes. Add a progress bar to make it more user-friendly.

2.2 Transferable ArrayBuffer

In the process of migrating D3-Force to worker, I noticed a phenomenon:


No function is being executed at this point, and my intuition tells me that this part should be the I/O loss of the main thread and the worker thread exchanging data. A review of the postMessage document on MDN found: PostMessage also receives a second argument, which is only allowed to be of Transferable type, including ArrayBuffer, MessagePort and ImageBitmap, Use this parameter to transfer control of the Transferable variable directly from the main thread to the worker thread. Workers ♥ ArrayBuffer it is very easy to pass binary data between the main thread and the worker thread. Performance comparisons using an ArrayBuffer have already been done, using a comparison diagram from Examining Web Worker Performance:



The passed data is then rewritten to form an ArrayBuffer. Readability is also sacrificed here (object descriptions must now be fully tiled into arrays, and ArrayBuffer is cumbersome to pass letters and Chinese characters, which are best mapped to numbers) for performance. The performance of ArrayBuffer is as follows:


3. Data calculation

3.1 Complexity Optimization

Raw data is always preprocessed, such as counting the most influential (most shared) nodes, screening out useless nodes without sharing relationships, data tailoring, and so on. With a lot of data, it’s important to use the right algorithm; The first version of writing is very random, traversal set traversal, high complexity, 1W data can also accept, run hundreds of ms out, 10W data directly stuck for six or seven seconds. Later optimization, multi-use hashmap, space for time, rewrite two or three versions, and finally control the calculation time within 2s, which is still ideal.

3.2 Split multiple Web workers

Migrating the calculation process to worker can avoid blocking the main thread and ensure smooth interaction. However, in order to maximize the speed of calculation, we can split up several Web workers to make full use of multi-core performance. Javascript Web Workers Test v1.4.0 is a Web worker Test that shows that splitting can significantly reduce computation time on multi-core machines. With the aid of the navigator browser interface. HardwareConcurrency we can get the number of processor cores, then you can break up, such as 8-core machines down seven worker thread can maximize use of core. The separation of the calculation logic and the combination of the results need to be designed by ourselves. This paper only made a survey, and did not do the separation work because of the short calculation time.

Details of 4.

4.1 Avoid Vue Observe

As we know, Vue observes data under data. However, it takes a long time to Observe data under data. See the following figure:


Location array assignments and changes often occur during rendering. If each frame triggers such a 90ms operation, it will be unbearable. Therefore, it is recommended that arrays and objects with a large amount of data not be placed under Data to avoid time-consuming due to Observe.

4.2 energy saving

After the layout is finished, continuous rendering is also very performance – the machine fan will keep turning; We can start drawing with the mouse hover on the canvas and stop drawing with the mouse mouseleave in other areas, so that we can avoid wasting machine performance in pure display.

4.3 the throttle

When the number of nodes is large, the pulling and drawing of node head will become a performance problem. Generally speaking, when the field of view is large and the node is small, there is no need to load the image. It can be set to load the image in the viewport only when the node is larger than a certain degree (that is, the SCENE camera Z coordinate is less than a certain value). You can throttle it to once per second if you’re trying to determine which nodes are in view and load them every frame too often. At the same time, the head object is cached, and the field of vision is dynamically unloaded and loaded to avoid performance problems caused by too much head loading.

4.4 GPU acceleration

The profile picture on the server is square, but we want to draw a round image, how to deal with the rounded corner effect? According to the general idea, we can use canvas API to draw a circle to fill the image, and finally export the new image (see Zhang Xinxu’s big article: Small tip: SVG and Canvas respectively achieve the image rounded corner effect). But because we have the ability to manipulate the slice shader, we can modify the texture directly in the shader, which is not only rounded, but also stroke and anti-aliasing. The shader runs directly on the GPU and performs very well. If you use software to simulate anti-aliasing, it will be much more expensive.


Some demo images:


5. Reference materials

  1. d3-force
  2. Geometry – three.js docs
  3. BufferGeometry – three.js docs
  4. ArrayBuffer – Getting started with ECMAScript 6
  5. Points – three.js docs
  6. LineSegments – three.js docs
  7. Force-Directed Web Worker
  8. Workers has ArrayBuffer | Web | Google Developers
  9. Examining Web Worker Performance
  10. Worker.postMessage() – Web APIs | MDN
  11. Javascript Web Workers Test v1.4.0
  12. navigator.hardwareConcurrency – Web APIs | MDN
  13. Drawing Anti-aliased Circular Points Using OpenGL/WebGL
  14. WebGL tutorial