This is the 98th unwatered original, want to get more original good articles, please search the public account to pay attention to us ~ this article was first published in zhengcai cloud front blog: SpreadJS for Data Visualization Exploration

One, foreword

Data visualization includes three branches: scientific visualization, information visualization and visual analysis.

1. Scientific visualization is primarily concerned with the visualization of three-dimensional phenomena, such as systems in architecture, meteorology, medicine, or biology. The emphasis is on realistic renderings of bodies, surfaces, lighting, etc., and perhaps even some kind of dynamic component.

2. Information visualization is a kind of picture that combines data and design, which is helpful for individuals or organizations to transmit information to audiences in a short and effective manner.

3. Visual analytics is defined as an analytical and reasoning science based on visual interactive interface. It integrates graphics, data mining, human-computer interaction and other technologies together to form complementary advantages and mutual improvement of human brain intelligence and machine intelligence.

Visualization analysis of visual report is a top priority to display large amounts of data quickly, and data operation, flexible operation including data filtering, correlation, linkage, drill, copy the query, replace, style Settings, conditional formatting injection to achieve more levels, icon sets, data bars, duplicate values, and the formula of inserts, cross linkage table, etc. SpreadJS is most prominent in addressing visual analysis reports, so we’ll only look at the role SpreadJS plays in visual analysis.

Ii. Difficulty in visualization of reports

In the Internet e-commerce service industry, I usually deal with a large amount of business information and user information. Customer service and data analysts are the main users of reports.

Customer service usually handles a large number of work order filling, customer complaint registration, original data import, statistical summary, review and approval, cable signing, distribution and other work every day. At ordinary times, the carrier of most work information is Excel, and the server needs to process massive documents every day. Because it is difficult to extract the data of Excel documents into the database, it is not convenient to distribute the data to the operator in the first time when the template is updated, and it is difficult to integrate the data into the Web page, etc.

The data analyst needs to get the data and summarize it, calculate the sales of each product brand, the maximum, minimum, average, etc., and identify the valuable data. Capture effective data and make a report to boss.

According to the above scenarios, the following difficulties can be summarized in report visualization:

1. The concurrent

The company has a large number of customer service staff, thousands of people concurrently online heavy operation, short business flow cycle, large amount of data, so the concurrent performance of the server is a great consumption. You can use Apache POI in the background to extract and modify Excel data, and perform the formula calculation. There are two performance bottlenecks:

1) Files need to be uploaded and downloaded frequently, which puts a lot of pressure on the server bandwidth;

2) All Excel parsing and extraction operations are performed on the server side, and the server is overwhelmed by frequent IO operations.

These two performance points are difficult to break through under the current architecture and are one of the most challenging requirements when refactoring a project. Of course, hardheap server configuration is a solution, but it does not solve some of the other problems, and can also bring operational pressure.

2. Perform Excel operations andHigh compatibility requirements

If the new system cannot be used quickly, the training cost will be unbearable for the volume of users of this project. And to be able to directly import the existing Excel report template, otherwise it is difficult to develop or design all Excel reports again.

3. The report format is flexible

For different business scenarios, the report template is also ever-changing. Therefore, without the intervention of research and development, the design and filling of the operator can be completed on the page is particularly important.

4. Support formula calculation

As it involves modules such as commodity, order, cost accounting and financial statistics, it requires higher types and performance of calculation formulas.

5. Data documents in the workflow

In the previous workflow of the system, when the Excel report is involved, either the data will be assembled on the server and the Excel template first, or the system will find the Excel file of the file server according to the path and then flow to the corresponding link. Some new business modules even use mail only for file transfer.

This process will produce a large number of files, the file server brought a lot of pressure, the background also had to do regular batch data splitting and maintenance. This upgrade needs to address this issue.

Third, how to think about selection

First of all, the first step of selection is to make clear what specific products on the market for us to choose, for the current market can be integrated into the system, support the online form document editing products are many, I generally divided them into two categories.

1. Cloud document type product

There are many such products, such as WPS, Graphite document and Office Online. They have a high degree of completion in their own right, have enabled users to implement almost all functions including online collaboration, even support a degree of secondary development and can be privately deployed. However, the problem is that this kind of product is usually closed, secondary customization development is relatively difficult, and not light enough. The authorization method is also based on the time, the amount of concurrent, the number of users and other ways of authorization, the price is expensive, not very suitable for our needs.

2. Control type products

LuckySheet, Handsontable, and SpreadJS are standard controls, all of which are purely front-end table controls that support Excel features and JSON data binding.

LuckySheet is a domestic MIT open source software that can be used for commercial use. However, when I researched it, it had only been online for 1 or 2 months, and unlike React, it was not endorsed by a big factory, so it could not be used in our formal project. Up to now, one year has passed and communication platforms such as QQ groups and forums have been launched successively, but they are still weak.

Handsontable is a foreign commercial table control. It is said that there are many secondary development problems, but the biggest problem for us is that it does not have a Chinese support team.

SpreadJS is grape City’s commercial Excel spreadsheet control, and it’s interesting to note that in V2EX’s LuckySheet comment section, the author of LuckySheet also says that SpreadJS is an industry benchmark. It supports importing most Of Excel’s features, including formulas, charts, styles, and conditional formatting (macros are not supported). And the most surprising thing is that its operation interface is a complete Excel interface, completely pure JS development, using JSON for template and data interaction. SpreadJS also has an after-sale support team, technical issues can be exchanged over the phone and in forums throughout the workday, and there’s plenty of material available, including videos, documentation, examples, API manuals, and even training with their technical advisers. For a project team with a short construction period and heavy development tasks like ours, it can save a lot of energy and reduce risks.

Image credit: SpreadJS online Excel editor

So what is a control? Why use controls?

In computer programming, a control (or widget, or control) is a graphical user interface element that displays an arrangement of information that can be changed by the user, such as a window or a text box. Control definitions are characterized by providing a single point of interaction for direct manipulation of given data. A control is a basic visual building block contained within an application that controls all the data that the program processes and the interactions with that data.

According to my own understanding, the control is only to provide the basic function, support the secondary development of the function module. The control is lighter and more malleable, and has corresponding development documents and apis. It is a basic function package for developers to customize the function according to their needs.

Iv. SpreadJS needs solutions and advantages

1. The concurrent

Because SpreadJS is designed to separate data from templates, the user only needs to complete the report on the page. When submitting, you can just submit the completed JSON data, and the server no longer has to resolve all Excel files in a centralized manner. Bandwidth consumption is also directly reduced by half.

2. High requirements on Excel operation and compatibility

In the internal trial, the financial and customer service girls reported that the use experience was almost exactly the same as Excel, and no additional training was required. And our own large number of Excel reports can be directly imported into (after secondary development can also achieve batch and remote import), including charts, formulas, table style and a series of elements can be directly imported online operation.

3. The report format is flexible

Designers can directly design online, or take the Excel designed report to the Web end, do data binding, submit and save it into JSON format (ssJSON format of Spread JS includes all information of Excel documents).

4. Support formula calculation

Support more than 450 (A total of more than 480 kinds of Excel) formula, but also can develop their own extended custom formula, finance is also enough. It also supports all Excel reference operations, such as cross-sheet references, absolute references, function name information, and so on.

5. Data documents in the workflow

Basically out of the dependency on files, all process state and dependent data can be recorded in the database, the file server only needs to save a small number of template documents (in fact, the number of templates can be directly put into the database, but we have a ready-made file server). This saves us 90% of the space cost of the file server, and the operation friends have to wake up in the middle of the night laughing.

Dive into SpreadJS

The point is that what interests me most as a front-end developer is some of SpreadJS’s underlying design and improvements to memory and performance balance. I have done a lot of research and study in this regard. Fortunately, it is not difficult to find information in this respect, and I can often meet some relevant technical sharing in the open class section of the official forum of Grape City. Here’s a brief summary of what you’ve learned:

1. Rendering performance

Performance is certainly the biggest concern for every depth table control user. We often have thousands of pieces of data, and Excel is not convenient for paging (involving formula calculation and summary on the front end), so we were very worried during selection. SpreadJS was able to easily load 500,000 pieces of data, which took around 200ms to load (the performance demo on the website was only able to load 50,000, which we tested ourselves). Later on, I learned that their approach to solving this problem was like this:

  • Render in real time + Double Buffering :

Using Canvas to render a portion of the table, and only the part of the content that the user sees, you can load 1000 rows and 100000 rows of data very quickly, with the same performance difference.

Double Buffering is designed to solve the continuous experience of continuous rendering, and also to further speed up rendering. I don’t think many people have heard of this, but I think everyone has experienced Double buffering. In graphics, it’s usually called Double buffering, where the actual drawing instruction is done in a buffer, and it’s done very quickly, and after the drawing instruction is done, it’s immediately displayed on the screen by exchanging instructions. This avoids incomplete drawings and is highly efficient. It’s quite common in the game that when our main character is running around the map, the game engine will load and render the map in real time according to the character’s movement direction, which avoids the long wait when loading a large map at once.

[SpreadJS performance optimization]

  • Sparse arrays:

SpreadJS optimizes the storage of tabular data with a sparse array data structure. Sparse arrays are often used to optimize the memory footprint of two-dimensional arrays (such as chess boards, maps, etc.), but they have the inherent disadvantage of slow access performance.

So in response to this question, I gave it a stress test, and the million-level traversal took more than 200 ms. The performance is good enough for our needs.

2. Computing engine

According to the official introduction, the formula engine actually consists of two parts of the implementation, one is the computation logic system and the other is the reference system.

  • Reference system

SpreadJS encapsulates this functionality in a way that doesn’t require developers to worry about (unless there are special needs like reference backtracking).

There are direct references, cross-sheet references, relative/absolute references, named information references, table columns and formulas references, cross-workbook references, and so on. SpreadJS ‘Runtime is on the web side, so cross-workbook references are out of the question, at least for now.

  • Calculation logic

SUM, IF, MATCH, and VLOOKUP, all of which can be typed into a cell, act like little “logic packages”. SpreadJS currently has 460+ native formulas compared to 490+ in Excel, and has the ability to customize them. The experience is the same as the original formula.

For the underlying implementation, these formulas are no longer independent “islands of logic” after multiple iterations. There is a lot of abstraction and reuse at the bottom of the formula implementation, and the new version is said to be significantly less code than the old version while improving performance, which is also more front-end engineering packaging friendly.

For the implementation of nested formula calculation, SpreadJS creates an AST tree at the bottom to parse the calculation logic of user-set formulas. From the code of the official example, a set of expressions are created at the bottom of the formula, and the corresponding public interface can be called, as shown in the figure below:

[SpreadJS formula structure tree display]

  • performance

First of all, as a front-end technology, we can first analyze the performance bottlenecks and the impact caused by the technical requirements of formula calculation. We use in the event of a large number of users in the development time, dirty data, linkage, and other functions, all of these functions to ensure correct operation of an important premise, is must be able to ensure that can get the right results at any time, so the most direct the implementation of the idea is to make the formula with high priority, synchronous way to perform the calculation.

We all know that multithreading can help share the computing burden, but design and implementation difficulties aside, even with Web Worker support, JavaScript is strictly a single-threaded language because the Web Worker child threads are fully controlled by the main thread. And the main thread cannot be blocked and suspended. So even with the introduction of Web workers, there is no guarantee of synchronous execution as mentioned above.

From the above analysis, it can be seen that the computational performance of the formula is limited by the computational power of the JavaScript. I found an image that shows the computing power of Node.js. (Node.js is a V8 engine and is widely recognized as the fastest JS engine.)

Image from “Node.js in A Nutshell”

SpreadJS, however, has a set of formulas to calculate:

In our tests, the above computational performance is close to the computational performance of native JS, and SpreadJS is close to the physical limit of its optimization in this area. At present, in our application scenario, this computing performance is enough to use, but it does not rule out that there will be a massive amount of data and formula of computing needs in the future, and in this respect, the official has also given relevant solutions, please refer to here.

Officials are also said to be working on caching technology that will allow for block-caching of formula calculations: even if a value changes along the reference chain, the entire reference chain is not required to calculate the formula. It sounds very powerful, it sounds like a good idea, and HOPEFULLY it will come out soon.

3. Style system

Excel’s styling system is very complex, with flexible and large implementations for every feature point, including borders, fonts, alignment, data formatting, conditional formatting, etc. When I first started learning about SpreadJS, I was also amazed by its Style class, except for the “visible” borders, backgrounds, fonts, alignments, etc. There are cell types, data formats, table buttons, dropdowns, watermarks, things like that. Style is too heavy. If you customize a large number of cell styles, memory and performance will be poor. However, no bottleneck was found in the actual application. Originally, a layered structure was used to design, as shown in the figure:

[SpreadJS performance optimization]

How to use SpreadJS?

1. Render tables

Figure 6.1-1 Binding data and formulas

First, get the global spread object. The spread object is the body of the entire table, and the spread object is divided into multiple sheets. Every time SpreadJS initializes, it returns a Spread object.

  • Vue version spread object
 <gc-spread-sheets @workbookInitialized='spreadInitHandle($event)' />
  methods:{
     spreadInitHandle: function (spread) {
      this.spread = sprea
    },
  }
Copy the code
  • Bind data, bind formula
 tableDataBind() {
    // The data source can be obtained from the background request
    var dataSource = {
        // Note that a layer of bindPath has been added to map the binding path of the table
        bindPath_table: [{
            c1: 100.c2: 90.c3: 30.c4: 40
        }, {
            c1: 88.c2: 66.c3: 55.c4:100
        }, {
            c1: 30.c2: 89.c3: 100.c4: 40}, {c1: 40.c2: 66.c3: 88.c4: 40}};// The table binding and cell binding data source need to be wrapped in SpreadJS 'CellBindingSource
    var spreadNS = GC.Spread.Sheets;
    var dataSource1 = new spreadNS.Bindings.CellBindingSource(dataSource);
    var table2 = this.activeSheet.tables.add("tableName".0.0.1.5, spreadNS.Tables.TableThemes.light6);
    table2.showFooter(true);
    table2.autoGenerateColumns(false);
    var c1 = new spreadNS.Tables.TableColumn(1);
    c1.name("Chinese");
    c1.dataField("c1");
    var c2 = new spreadNS.Tables.TableColumn(2);
    c2.name("Mathematics");
    c2.dataField("c2");
    var c3 = new spreadNS.Tables.TableColumn(3);
    c3.name("English");
    c3.dataField("c3");
    var c4 = new spreadNS.Tables.TableColumn(4);
    c4.name("Chemistry");
    c4.dataField("c4");
    var c5 = new spreadNS.Tables.TableColumn(5);
    c5.name("Total");
    table2.bindColumns([c1, c2, c3, c4, c5]);
    table2.bindingPath("bindPath_table");
    // Set the formula
    table2.setColumnDataFormula(4."=[@ Language]+[@ math]+[@ English]+[@ chemistry]");
    table2.setColumnFormula(4."= SUBTOTAL (109, [together])");
    // Set to allow the contents of a cell to extend beyond the cell, regardless of binding
    this.activeSheet.options.allowCellOverflow = true;
    / / bind the dataSource
    this.activeSheet.setDataSource(dataSource1);
    this.spread.resumePaint();
},
Copy the code

2. Render conditional formats

Render conditional formatting: data rendering done can only ensure the normal order of the data can be displayed, but it can’t meet the needs of data analysts, but also clearly shows the effective data such as: the maximum and minimum standard red, the progress bar to show a change in state, ICONS to show rising or falling, double color order, three levels, such as, specific how to implement?

  • Icon set: Effects shown here

  • The implementation code
iconset() {
    var activeSheet = this.activeSheet;
    var iconSetRule = new GC.Spread.Sheets.ConditionalFormatting.IconSetRule();
    // The demo first writes the dead region
    iconSetRule.ranges([new GC.Spread.Sheets.Range(0.0.5.5)]);
    // IconSetType Specifies the type of graph flags: arrows, circles, and execl
    iconSetRule.iconSetType(GC.Spread.Sheets.ConditionalFormatting.IconSetType.threeArrowsColored);
    var iconCriteria = iconSetRule.iconCriteria();
    iconCriteria[0] = new GC.Spread.Sheets.ConditionalFormatting.IconCriterion(
      true,
      GC.Spread.Sheets.ConditionalFormatting.IconValueType.number, 
      60
    );/ / (< 60)
    iconCriteria[1] = new GC.Spread.Sheets.ConditionalFormatting.IconCriterion(
      true,
      GC.Spread.Sheets.ConditionalFormatting.IconValueType.number,
      90
    );/ / (60 < = 90)
    iconCriteria[2] = new GC.Spread.Sheets.ConditionalFormatting.IconCriterion(
      true,
      GC.Spread.Sheets.ConditionalFormatting.IconValueType.number,
      90
    );/ / (> = 90)
    iconSetRule.reverseIconOrder(false);
    iconSetRule.showIconOnly(false);
    activeSheet.conditionalFormats.addRule(iconSetRule);
},
Copy the code
  • Progress bar: The effect is shown here

  • The implementation code
dataBar(){
    var activeSheet = this.activeSheet;
    activeSheet.conditionalFormats.addDataBarRule(
        GC.Spread.Sheets.ConditionalFormatting.ScaleValueType.number,0./ / minimum number
        GC.Spread.Sheets.ConditionalFormatting.ScaleValueType.number, 100./ / Max
        "orange"./ / color
        [new GC.Spread.Sheets.Range(0.0.5.4)]); },Copy the code
  • Repeat value: the effect is shown in figure

  • The implementation code
duplicateValue() {
    var activeSheet = this.activeSheet;
    var style = new GC.Spread.Sheets.Style();
    style.backColor = "yellow";
    style.foreColor  = "red";
    var ranges = [new GC.Spread.Sheets.Range(0.0.5.4)];
    activeSheet.conditionalFormats.addDuplicateRule(style, ranges);
},
Copy the code
  • Cell containing text 6: The effect is shown

  • The implementation code
includeText() {
    var activeSheet = this.activeSheet;
    var style = new GC.Spread.Sheets.Style();
    style.backColor = "red";
    var ranges = [new GC.Spread.Sheets.Range(0.0.5.5)];
 	  activeSheet.conditionalFormats.addSpecificTextRule(
   		GC.Spread.Sheets.ConditionalFormatting.TextComparisonOperators.contains, "6", style, ranges
    );
},
Copy the code
  • The above results are shown in the figure

Write at the end

This paper mainly introduces my own exploration in the direction of data visualization, which is helpful for students who are ready to make market market, mail subscription reports, online collaboration, visual analysis and other directions.

Due to the long length of the paper, there are many conceptual things involved, so it is inevitable that mistakes will occur. I hope you can make more corrections. Thank you!

Recommended reading

Through the custom Vue instruction to achieve front-end exposure buried point

H5 Page list caching scheme

, recruiting

ZooTeam (ZooTeam), a young and creative team, belongs to the product RESEARCH and development department of ZooTeam, based in picturesque Hangzhou. The team now has more than 40 front end partners, the average age of 27 years old, nearly 30% are full stack engineers, no problem youth storm team. The membership consists of “old” soldiers from Alibaba and netease, as well as fresh graduates from Zhejiang University, University of Science and Technology of China, Hangzhou Electric And other universities. In addition to the daily business connection, the team also carried out technical exploration and actual practice in the direction of material system, engineering platform, building platform, performance experience, cloud application, data analysis and visualization, promoted and implemented a series of internal technical products, and continued to explore the new boundary of the front-end technology system.

If you want to change the things you’ve been doing, you want to start doing things. If you want to change, you’ve been told you need to think more, but you can’t change; If you want to change, you have the power to achieve that result, but you are not needed; If you want to change what you want to accomplish, you need a team to support you, but there is no place for you to bring people; If you want a change of pace, it’s “3 years of experience in 5 years”; If you want to change the original savvy is good, but there is always a layer of fuzzy window paper… If you believe in the power of belief, that ordinary people can achieve extraordinary things, that you can meet a better version of yourself. If you want to get involved in the growth of a front end team with a deep understanding of the business, a sound technology system, technology that creates value, and spillover impact as the business takes off, I think we should talk about it. Any time, waiting for you to write something, to [email protected]