evs-dsv

The module parses text delimiters. Common data formats are comma-delimited CSV files or tab-delimited TSV files. These tables are very popular in Excel and are much more space-efficient than JSON. RFC 4180Common Format and MIME Type for comma-separated Values (CSV) Files. For those interested, visit Github to read the source code and parse esV-DSV

API Reference and algorithm analysis:

evs.dsvFormat(delimiter):

Constructs a new DSV parser and formatter for the specified DSV. The separator must be a single character, such as ASCII code

dsv.parse(string[, row])

Parses the specified string in the format with the corresponding delimiter value, returning the parsed array of objects.

This approach differs from Dsv.Parserows in that the first line of the DSV content contains a delimited list of column names, for example:

Year, Make, Model, Length 1997, Ford, on the, 2.34 2000, Mercury, so-called Cougar, 2.38Copy the code

The parsed JS array is:

[{"Year": "1997"."Make": "Ford"."Model": "E350"."Length": "2.34"},
  {"Year": "2000"."Make": "Mercury"."Model": "Cougar"."Length": "2.38"}]Copy the code

In addition, for the ROW conversion function, if not specified, the field values are strings and are not converted to numbers, dates, or other types for safety. (Javascript may automatically cast strings to numbers in some cases, such as when using the + operator, which is native Js). But the last one is to specify such a conversion function. In addition, this module also writes an interface to implement automatic conversion functions, infer and coerce coercion type conversion.

If you specify a row conversion function, the specified function will be called for each row, resulting in a d object representing the current row and an item with subscript I starting at 0. Empty or undefined is ignored automatically. The use of an incoming line conversion function looks like this (combined with the array above) :

var data = d3.csvParse(string, function(d) {
  return {
    year: new Date(+d.Year, 0, 1), // lowercase and convert "Year" to Date
    make: d.Make, // lowercase
    model: d.Model, // lowercase
    length: +d.Length // lowercase and convert "Length" to number
  };
});
Copy the code

Another point mentioned in D3 is that using + instead of parseInt or ParseFloat is usually faster, but is restrictive, such as using + to force “30px” returns NaN, and the other two functions return 30.

dsv.parseRows(string[, row])

Parse parses the specified string, which must be formatted with the appropriate delimiter. Unlike DSv.parse, this method handles the absence of headers and converts each line to an array instead of an object, as shown in the following example:

2.34 2000, 1997, Ford, on the, Mercury, so-called Cougar, 2.38Copy the code

The results obtained are:

[["1997"."Ford"."E350"."2.34"],
  ["2000"."Mercury"."Cougar"."2.38"]]Copy the code

Similarly, if the field value is a string without specifying a conversion function, it is better to use autoType, passing in a sample line conversion function:

var data = d3.csvParseRows(string, function(d, i) {
  return {
    year: new Date(+d[0], 0, 1), // convert first colum column to Date
    make: d[1],
    model: d[2],
    length: +d[3] // convert fourth column to number
  };
});
Copy the code

Missing. The format (rows [, columns]) :

Parse transforms an array of javascript objects into a string with a delimiter. This operation is the reverse of dsv.parse, where rows are delimited with \n, columns are delimited with delimiters, and values containing delimiters are escaped with double quotation marks (“) or line breaks.

If colums is specified, the order of the header lines is indeterminate. Colums is used to specify the header and order.

var string = d3.csvFormat(data, ["year"."make"."model"."length"]);
Copy the code

All fields on each row object are converted to strings, using an empty string if the field value is null or undefined, or ECMAScript date-time string format if the field is Date. For better control over the format and manner of the fields in the code, first map the rows to an array of strings, then use dsv.formatRows.

dsv.formatBody(rows[, columns]):

Ignore the result of the dsv.format of the header line, which is useful for appending lines to existing files

dsv.formatRows(rows) :

Is the inverse operation of DSv. parseRows, which separates each row with a newline character, and each column in each row is separated by a separator, and the values of double quotes or newlines are escaped with double quotes. You can use map to convert the values of the original array of objects as follows:

var string = d3.csvFormatRows(data.map(function(d, i) {
  return [
    d.year.getFullYear(), // Assuming d.year is a Date object.
    d.make,
    d.model,
    d.length
  ];
}));
Copy the code

You can also use concat to add a line header:

var string = d3.csvFormatRows([[
    "year"."make"."model"."length"
  ]].concat(data.map(function(d, i) {
  return [
    d.year.getFullYear(), // Assuming d.year is a Date object.
    d.make,
    d.model,
    d.length
  ];
})));
Copy the code

D3. AutoType (object) :

When parsing the text, the default is all string, or the user can customize type transformation, this function is to setting up the conversion type, convenient use, directly and accordingly concluded that the value of the object type forced their transformation, for example: the CSV file, use this function, the corresponding Numbers parsed into digital type:

Year, Make, Model, Length 1997, Ford, on the, 2.34 2000, Mercury, so-called Cougar, 2.38Copy the code

The use of evs. CsvParse:

d3.csvParse(string, d3.autoType)
Copy the code

The resulting object number:

[{"Year": 1997, "Make": "Ford"."Model": "E350"."Length": 2.34},
  {"Year": 2000, "Make": "Mercury"."Model": "Cougar"."Length": 2.38}]Copy the code

The principle of calculation is to add judgment:

  1. Null, returns NULL
  2. “True”, return true
  3. “False”, return false
  4. “NaN”, returns NaN
  5. Determines if it is a digital ECMA link and returns a number
  6. Determines the date and returns the date object
  7. This method automatically ignores leading zeros, “09”->”9″, and converts digits with other characters to the string “$123″->”$123”. For dates, the default time is midnight UTC when specified in THE ECMAScript ISO 8601 format as YYYY-MM-DD. If there is no time zone (such as YYYY-MM-DDTHh: MM), assume local time. In addition to using this function, you can also use your own type conversion function

Conclusion:

Dsv.format (), dsv.formatbody (), and dsv.FormatRows () are apis for converting arrays of objects into text with different delimiters. A columns function is used to modify the values of each row. Add double quotation marks to formatValue() for different empty values, date objects, and escape characters containing delimiters. Then join(delimiter) instead of formatDate, which processes the date object as a string, use date.getutc… The pad() function is used to add zeros to a date that does not meet the specified length. So the whole point of format is this function, InferColumns () dsv.parse() and dsv.Parserows (), which compute all of the header arrays, convert the delimited text into arrays of objects and arrays of arrays, Dsv. parse() according to the DSv. parseRows implementation, the difference is whether to handle the header line problem, very clever use of return, convert does not initialize the header line, this code is as follows:

function parse(text,f){
		// console.log(text);
		var convert,columns,rows=parseRows(text,(row,i)=>{
			// console.log(row);
			if(convert) returnconvert(row,i-1); The first convert line is not initialized, so the header line is ignored. Columns =row,convert=f? customConverter(row,f):objectConverter(row); // This function is executed only once, which is equivalent to saving the title line and adding the convert function, very cleverly}); rows.columns=columns || [];return rows;
	}
Copy the code

The main parsing steps in parseRows, the code uses the initialization of two empty objects to represent the identity, to determine the different situation, although the empty object, but can distinguish between different states, the parsing steps are 3:1. To get rid of the double quotes, CSV file might generate automatically appear double quotation marks, the second step, sought by an identifier to find a value, the third step, the processing, until meet EOL end line, the f in the header row is no incoming, n records used f the number of rows, and change the result is null line is ignored. A lot of callbacks are used anyway. It also declares the token() function internally, and finds the value of the next delimiter. The core handles the following code snippet:

while((t=token()) ! == EOF){ console.log(t); var row = [];while(t ! == EOL && t ! == EOF) row.push(t), t=token();if(f && (row = f(row,n++)) == null) continue; //n++ means from rows.push(row); }Copy the code

In autoType, the re is used to determine strings that match the date pattern:

 else if(/^([-+]\d{2})? \d{4}(-\d{2}(-\d{2})?) ? (T\d{2}:\d{2}(:\d{2}(\.\d{3})?) ? (Z|[-+]\d{2}:\d{2})?) ? $/.test(value)) { value = new Date(value); } /* * match any character with the link symbol - * \d match the number * {n} n times *? Match 0 times or 1 time *$Match end * * ^([-+]\d{2})? * \d{4}(-\d{2}(-\d{2})?) ? Then 1111 (must) - optional - 11-11 * * (T \ d {2} : \ d {2} (: \ d {2} (\. \ d {3})?) ? * : begin with T T11 11:11. 111 * * (Z | + [-] \ d {2} : \ d {2})?) ? $* ends with z-11:11, optional * */Copy the code

The above five interfaces are the most basic implementation. You can customize a delimiter to convert text to object array and object array to text. In addition, the author adds two interfaces to specify delimiters for our convenience, one of which uses comma delimiters, corresponding to CSV format files:

evs.csvParse(string[, row])

Corresponding dsvFormat (“, “). The parse.

evs.csvParseRows(string[, row])

Corresponding dsvFormat (“, “). ParseRows.

evs.csvFormat(rows[, columns])

Corresponding dsvFormat (“, “). The format.

evs.csvFormatBody(rows[, columns])

Corresponding dsvFormat (“, “). FormatBody.

evs.csvFormatRows(rows)

DsvFormat (“,”).formatRows. The other is the TSV file with \t as delimiter:

evs.tsvParse(string[, row])

Corresponding dsvFormat (” \ t “). The parse.

evs.tsvParseRows(string[, row])

Corresponding dsvFormat (” \ t “). ParseRows.

evs.tsvFormat(rows[, columns])

Corresponding dsvFormat (” \ t “). The format.

evs.tsvFormatBody(rows[, columns])

Corresponding dsvFormat (” \ t “). FormatBody.

evs.tsvFormatRows(rows)

Corresponding dsvFormat (” \ t “). FormatRows.

In addition, the author also realize the command line tools included with the three different formats text conversion, dsv2dsv dsv2json/json2dsv interested can go to the code format conversion tools node.

High-level interface: Dsv.format (), dsv.formatbody (), and dsV.FormatRows () convert a normal array of objects into a custom delimiter string. The second function ignores the title and removes the title line. The third function converts a two-dimensional array into a delimiter string. Dsv.parse () and dsv.praserows () are two functions that parse relies on praseRows to handle multiple header lines.

For people who don’t need to deal with complex data, the only functionality we need is to read in a CSV file and save it as an array of objects, using evs.csvParse(data) alone. In addition, I thought of a name, called Evs, meaning is easy vis, simple visualization, because I have the idea to write some easier to use interface, or template page, so that users can easily achieve very good effects.

Deep reading:

The new Function () syntax: www.cnblogs.com/xiaokeai011…