Preface

This article continues from the previous one in the underscore series, on implementing a template engine (part one).

Since there is a lot to cover, let’s start with some background that we will need later.

The backslash

var txt = "We are the so-called "Vikings" from the north."
console.log(txt);

Our intention is to print a string in which Vikings is wrapped in double quotes. In JavaScript, however, a string literal starts and ends with a single or double quotation mark, so the string is cut off right before Vikings and the code throws an Unexpected identifier error.

What if we just want to use single or double quotation marks in strings?

We can use a backslash to insert apostrophes, newlines, quotes, and other special characters into a string:

var txt = "We are the so-called \"Vikings\" from the north."
console.log(txt);

Now JavaScript can output the correct text string.

A backslash followed by another character (a letter, a digit, or a symbol such as a quote) is called an escape sequence.

It is worth noting that escape sequences are treated as single characters.

Other common escape sequences are \n for newline, \t for TAB, \r for carriage return, and so on.
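To see that an escape sequence really counts as a single character, a quick look at string lengths helps. The snippet below is only an illustrative sketch; the variable names are made up for this example.

```javascript
// Each escape sequence below contributes exactly one character to the string.
var quoted = "say \"hi\"";   // s, a, y, space, ", h, i, "
var tabbed = "a\tb";         // a, TAB, b

console.log(quoted.length); // 8
console.log(tabbed.length); // 3
```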

Escape sequences

In JavaScript, a string value is a sequence of zero or more Unicode characters (letters, numbers, and other characters).

Each character in a string can be represented by an escape sequence. For example, the letter a can also be written as the escape sequence \u0061.

The escape sequence begins with a backslash \, which tells the JavaScript interpreter that the next character is special.

The syntax for a Unicode escape sequence is \uhhhh, where hhhh is a four-digit hexadecimal number.

According to this rule, we can work out the escape sequence for a common character, using the letter m as an example:

// 1. Find the Unicode value of the character 'm'
var unicode = 'm'.charCodeAt(0) // 109
// 2. Convert it to hexadecimal
var result = unicode.toString(16); // "6d"

We can therefore write m as \u006d. Type the string '\u006d' into the browser console to see the result.

Note that \n is itself an escape sequence, yet its Unicode escape can be worked out in the same way:

var unicode = '\n'.charCodeAt(0) // 10
var result = unicode.toString(16); // "a"

So \u000A also represents the newline character \n. For example, typing 'a \n b' into the browser console has the same effect as typing 'a \u000A b'.
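The two steps above (look up the code value, then convert it to hexadecimal) can be wrapped in a small helper. This is a sketch only: toUnicodeEscape is a hypothetical name, not part of underscore, and it handles just the first UTF-16 code unit of its argument.

```javascript
// Hypothetical helper: build the \uhhhh escape for the first UTF-16 code unit of a string.
function toUnicodeEscape(ch) {
    var hex = ch.charCodeAt(0).toString(16);
    // Left-pad with zeros so that the result always has four hex digits.
    return '\\u' + ('0000' + hex).slice(-4);
}

console.log(toUnicodeEscape('m'));      // \u006d
console.log(toUnicodeEscape('\n'));     // \u000a
console.log(toUnicodeEscape('\u2028')); // \u2028
```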

With that said, let’s take a look at some common character escape sequences and their meanings:

Unicode value    Escape sequence    Meaning
\u0009           \t                 Tab
\u000A           \n                 Newline
\u000D           \r                 Carriage return
\u0022           \"                 Double quotation mark
\u0027           \'                 Single quotation mark
\u005C           \\                 Backslash
\u2028           (none)             Line separator
\u2029           (none)             Paragraph separator

Line Terminators

Line terminators are the characters that end a line. Like whitespace characters, they can be used to improve the readability of source text.

In ES5, four characters are considered line terminators, and any other line break characters are treated as whitespace.

The four characters look like this:

Code value    Name
\u000A        Newline (line feed)
\u000D        Carriage return
\u2028        Line separator
\u2029        Paragraph separator

Function

Consider whether the following code runs correctly:

var log = new Function("var a = '1\t23'; console.log(a)");
log()

It does. But what about the following?

var log = new Function("var a = '1\n23'; console.log(a)");
log()

Uncaught SyntaxError: Invalid or unexpected token

Why is that?

This is because the Function constructor first performs a ToString operation on the function body argument. The \n in our string literal has already become a real line break, so the body that gets parsed is:

var a = '1
23'; console.log(a)

It then checks whether this code string is syntactically valid. In JavaScript, a string literal cannot contain a raw line break, so an error is thrown.

To avoid this problem, we need to change the code to:

var log = new Function("var a = '1\\n23'; console.log(a)");
log()

In fact, this applies not only to \n: the other three line terminators cause the same error when they appear directly inside a string literal. (Engines that implement ES2019 or later accept a raw \u2028 or \u2029 there, but \n and \r are still rejected.)

We discuss this because the template engine implementation uses the Function constructor. If the template string contains a line terminator, the same error can occur, so these four line terminators need special treatment.
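The behaviour described above can be probed directly. The sketch below uses a hypothetical helper, compiles, that reports whether the Function constructor accepts a given body. The result for \u2028 is deliberately left uncommented because it depends on the engine version.

```javascript
// Hypothetical helper: returns true if `body` parses as a function body, false on a SyntaxError.
function compiles(body) {
    try {
        new Function(body);
        return true;
    } catch (err) {
        if (err instanceof SyntaxError) return false;
        throw err; // anything else is unexpected
    }
}

var rawTab = compiles("var a = '1\t23';");       // true: a raw tab is allowed in a string literal
var rawNewline = compiles("var a = '1\n23';");   // false: a raw line feed is not
var rawReturn = compiles("var a = '1\r23';");    // false: a raw carriage return is not
var escaped = compiles("var a = '1\\n23';");     // true: the body contains the two characters \ and n
var rawSeparator = compiles("var a = '1\u202823';"); // engine-dependent (accepted since ES2019)

console.log(rawTab, rawNewline, rawReturn, escaped, rawSeparator);
```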

Special characters

In addition to these four line terminators, we have two more characters to deal with.

The first is the backslash \.

For example, suppose the template content contains \:

var log = new Function("var a = '1\23'; console.log(a)");
log(); // 1

We wanted to print '1\23', but because \ is treated as the start of an escape sequence, 1 is printed instead.

In the same way, a \ in a template passed to the template engine would be processed incorrectly.

The second is the single quote '.

If the template contains ', then because we build the code by concatenating strings such as p.push('...'), the ' ends the string literal early and the generated code is broken.

So in total, six characters need special handling. We match them with a regular expression and replace each one with its escaped form: a newline becomes \n (a backslash followed by n), \ becomes \\, ' becomes \', and so on. The code is as follows:

var escapes = {
    "'": "'",
    '\\': '\\',
    '\r': 'r',
    '\n': 'n',
    '\u2028': 'u2028',
    '\u2029': 'u2029'
};

var escapeRegExp = /\\|'|\r|\n|\u2028|\u2029/g;

var escapeChar = function(match) {
    return '\\' + escapes[match];
};

Let’s test it out:

var str = 'console.log("I am \n Kevin"); ';
var newStr = str.replace(escapeRegExp, escapeChar);

eval(newStr)
// I am 
// Kevin
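As a further check, here is a sketch showing that the escaping round-trips: text containing all the problem characters is escaped, placed inside a single-quoted literal, compiled with the Function constructor, and comes back unchanged. The helper name escapeForLiteral and the sample text are assumptions made for this illustration.

```javascript
var escapes = {
    "'": "'",
    '\\': '\\',
    '\r': 'r',
    '\n': 'n',
    '\u2028': 'u2028',
    '\u2029': 'u2029'
};
var escapeRegExp = /\\|'|\r|\n|\u2028|\u2029/g;

// Hypothetical helper: make arbitrary text safe to place between single quotes in generated code.
function escapeForLiteral(text) {
    return text.replace(escapeRegExp, function (match) {
        return '\\' + escapes[match];
    });
}

// Sample text with a quote, a backslash, a newline, and a line separator.
var raw = "It's a\\b\nline two\u2028end";
var roundTrip = new Function("return '" + escapeForLiteral(raw) + "';")();

console.log(roundTrip === raw); // true
```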

replace

Let’s talk about the string replace method.

Syntax:

str.replace(regexp|substr, newSubStr|function)

The first argument to replace can be either a string or a regular expression.

The second argument can be either a replacement string or a function.

Let’s focus on the case where a function is passed, starting with a simple example:

var str = 'hello world';
var newStr = str.replace('world', function(match){
    return match + '!'
})
console.log(newStr); // hello world!

Here match is the matched substring, but the function receives more arguments than just match. Let’s look at a more complicated example:

function replacer(match, p1, p2, p3, offset, string) {
    // match: the matched substring, abc12345#$*%
    // p1: the string matched by the first parenthesized group, abc
    // p2: the string matched by the second parenthesized group, 12345
    // p3: the string matched by the third parenthesized group, #$*%
    // offset: the offset of the matched substring in the original string, 0
    // string: the original string being matched, abc12345#$*%
    return [p1, p2, p3].join(' - ');
}
var newString = 'abc12345#$*%'.replace(/([^\d]*)(\d*)([^\w]*)/, replacer); // abc - 12345 - #$*%

Also note that if the first argument is a regular expression with the global flag, the function is called multiple times, once for each match.

For example, suppose we want to match the values inside <%=xxx%> in a string:

var str = '<li><a href="<%=www.baidu.com%>"><%=baidu%></a></li>'

str.replace(/<%=(.+?)%>/g, function(match, p1, offset, string){
    console.log(match);
    console.log(p1);
    console.log(offset);
    console.log(string);
})

The function is called twice. The first call prints:

<%=www.baidu.com%>
www.baidu.com
13
<li><a href="<%=www.baidu.com%>"><%=baidu%></a></li>

The second call prints:

<%=baidu%>
baidu
33
<li><a href="<%=www.baidu.com%>"><%=baidu%></a></li>
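Because the callback runs once per match, replace can also be used purely to collect matches without changing the string. A minimal sketch follows; the found array is introduced only for this illustration.

```javascript
var tpl = '<li><a href="<%=www.baidu.com%>"><%=baidu%></a></li>';
var found = [];

// Return the match unchanged so that the string itself is not modified.
var unchanged = tpl.replace(/<%=(.+?)%>/g, function (match, p1, offset) {
    found.push({ value: p1, offset: offset });
    return match;
});

console.log(found.length);    // 2
console.log(found[0].value);  // www.baidu.com
console.log(found[0].offset); // 13
console.log(found[1].value);  // baidu
console.log(found[1].offset); // 33
```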

Regular expression creation

To create a regular expression, we can use a literal:

var reg = /ab+c/i;

Or the constructor:

new RegExp('ab+c', 'i');

It is worth noting that each regular expression object has a source property that returns a string of the pattern text of the current regular expression object:

var regex = /fooBar/ig;
console.log(regex.source); // "fooBar" does not contain /... / and "ig".
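Since source is a plain string, several patterns can be joined into one regular expression with the constructor. The sketch below combines two tag patterns and an end-of-input alternative, then checks which capture group took part in a match. The helper classify and its labels are hypothetical.

```javascript
var interpolate = /<%=([\s\S]+?)%>/g;
var evaluate = /<%([\s\S]+?)%>/g;

// Join the pattern texts with | and add $ as a third alternative.
var matcher = new RegExp([interpolate.source, evaluate.source].join('|') + '|$', 'g');

// Hypothetical helper: report which alternative produced the first match in `tag`.
function classify(tag) {
    matcher.lastIndex = 0; // a global regex keeps state between exec calls
    var m = matcher.exec(tag);
    if (m[1] !== undefined) return 'interpolate';
    if (m[2] !== undefined) return 'evaluate';
    return 'end of input';
}

console.log(matcher.source);               // <%=([\s\S]+?)%>|<%([\s\S]+?)%>|$
console.log(classify('<%=name%>'));        // interpolate
console.log(classify('<% if (ok) { %>'));  // evaluate
console.log(classify('plain text'));       // end of input
```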

Special characters of a regular expression

Regular expressions have special characters such as \d to match a number, equivalent to [0-9].

In the previous section, we used /<%=(.+?)%>/g to match <%=xxx%>. The underscore implementation, however, uses /<%=([\s\S]+?)%>/g.

We know that \s matches a whitespace character, including spaces, tabs, form feeds, line feeds, and other Unicode spaces, and that \S matches any non-whitespace character, so [\s\S] matches every character. But why not just use . ?

We might think that . matches any single character. In fact, it matches any single character except a line terminator. Let’s try:

var str = '<%=hello world%>'

str.replace(/<%=(.+?)%>/g, function(match){
    console.log(match); // <%=hello world%>
})

But if we put a line terminator such as '\u2029' between hello and world:

var str = '<%=hello \u2029 world%>'

str.replace(/<%=(.+?)%>/g, function(match){
    console.log(match);
})

Because nothing matches, console.log is never executed.

If we change the pattern to /<%=([\s\S]+?)%>/g, the whole tag matches:

var str = '<%=hello \u2029 world%>'

str.replace(/<%=([\s\S]+?)%>/g, function(match){
    console.log(match); // <%=hello \u2029 world%> (printed with a real paragraph separator)
})
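The difference between . and [\s\S] can be checked against all four line terminators at once. This is a small sketch with made-up variable names. The last line relies on the s (dotAll) flag from ES2018, so it assumes a reasonably recent engine.

```javascript
var terminators = ['\n', '\r', '\u2028', '\u2029'];

// Does a single . match each line terminator?
var dotMatches = terminators.map(function (ch) { return /^.$/.test(ch); });
// Does a single [\s\S] match each line terminator?
var classMatches = terminators.map(function (ch) { return /^[\s\S]$/.test(ch); });
// With the s flag, . matches line terminators as well.
var dotAllMatches = terminators.map(function (ch) { return new RegExp('^.$', 's').test(ch); });

console.log(dotMatches);    // [ false, false, false, false ]
console.log(classMatches);  // [ true, true, true, true ]
console.log(dotAllMatches); // [ true, true, true, true ]
```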

Lazy matching

Looking at /<%=([\s\S]+?)%>/g, we know that x+ matches x one or more times and that x? matches x zero or one time, but what does +? mean?

When ? immediately follows one of the quantifiers *, +, ?, or {}, it makes that quantifier non-greedy (lazy), so it matches as few characters as possible. By default, quantifiers are greedy and match as many characters as possible.

Here’s an example:

console.log("aaabc".replace(/a+/g, "d")); // dbc

console.log("aaabc".replace(/a+?/g, "d")); // dddbc

Here we should use lazy (non-greedy) matching, for example:

var str = '<li><a href="<%=www.baidu.com%>"><%=baidu%></a></li>'

str.replace(/<%=(.+?)%>/g, function(match){
    console.log(match);
})

// <%=www.baidu.com%>
// <%=baidu%>

If we use greedy matching instead:

var str = '<li><a href="<%=www.baidu.com%>"><%=baidu%></a></li>'

str.replace(/<%=(.+)%>/g, function(match){
    console.log(match);
})

// <%=www.baidu.com%>"><%=baidu%>
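The same comparison can be made side by side with String.prototype.match, which returns every match when the pattern has the global flag. The template string and variable names here are assumptions for illustration.

```javascript
var tpl = '<a href="<%=url%>"><%=name%></a>';

// Lazy: each tag stops at the nearest closing %>.
var lazy = tpl.match(/<%=(.+?)%>/g);
// Greedy: one match runs from the first <%= to the last %>.
var greedy = tpl.match(/<%=(.+)%>/g);

console.log(lazy);   // [ '<%=url%>', '<%=name%>' ]
console.log(greedy); // [ '<%=url%>"><%=name%>' ]
```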

template

With the essentials behind us, we move on to how the underscore template engine is implemented.

Unlike our previous approach, which pushed onto an array and then joined it, underscore uses string concatenation.

For example, take a template string like this:

<% for ( var i = 0; i < users.length; i++ ) { %>
    <li>
        <a href="<%=users[i].url%>">
            <%=users[i].name%>
        </a>
    </li>
<% } %>

We first replace <%=xxx%> with '+ xxx +', and then replace <% xxx %> with '; xxx __p+=':

'; for ( var i = 0; i < users.length; i++ ) { __p+='
    <li>
        <a href="'+ users[i].url + '">
            '+ users[i].name +'
        </a>
    </li>
'; } __p+='

This code cannot run on its own, so we add some header and footer code to form a complete code string:

var __p='';
with(obj){
__p+=''; for ( var i = 0; i < users.length; i++ ) { __p+='
    <li>
        <a href="'+ users[i].url + '">
            '+ users[i].name +'
        </a>
    </li>
'; } __p+='';
};
return __p;

Tidied up, the code is:

var __p='';
with(obj){
    __p+='';
    for ( var i = 0; i < users.length; i++ ) {
        __p+='<li><a href="'+ users[i].url + '"> '+ users[i].name +'</a></li>';
    }
    __p+='';
};
return __p

We then pass this code string (called source here) to the Function constructor, naming its parameter obj to match with(obj):

var render = new Function('obj', source)

Calling render with the required data returns an HTML string:

render(data)
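To make these steps concrete, here is a hand-written version of such a code string compiled and run. It is a sketch, not generated output: the source is typed out by hand, and the shape of the data (a users array of objects with name and url) is an assumption taken from the template above.

```javascript
// A hand-written code string of the kind the template engine would generate.
var source = [
    "var __p = '';",
    "with (obj) {",
    "    for (var i = 0; i < users.length; i++) {",
    "        __p += '<li><a href=\"' + users[i].url + '\">' + users[i].name + '</a></li>';",
    "    }",
    "}",
    "return __p;"
].join('\n');

// The parameter is named obj so that with (obj) can find `users`.
var render = new Function('obj', source);

var html = render({
    users: [
        { name: 'Kevin', url: 'http://localhost' },
        { name: 'Daisy', url: 'http://example.com' }
    ]
});

console.log(html);
// <li><a href="http://localhost">Kevin</a></li><li><a href="http://example.com">Daisy</a></li>
```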

Fifth version – Handling special characters

We continue from the fourth version, this time escaping special characters and building the output with string concatenation:

// Fifth version
var settings = {
    // evaluate: run code
    evaluate: /<%([\s\S]+?)%>/g,
    // interpolate: insert a value
    interpolate: /<%=([\s\S]+?)%>/g
};

var escapes = {
    "'": "'",
    '\\': '\\',
    '\r': 'r',
    '\n': 'n',
    '\u2028': 'u2028',
    '\u2029': 'u2029'
};

var escapeRegExp = /\\|'|\r|\n|\u2028|\u2029/g;

var template = function(text) {

    var source = "var __p='';\n";
    source = source + "with(obj){\n"
    source = source + "__p+='";

    var main = text
    .replace(escapeRegExp, function(match) {
        return '\\' + escapes[match];
    })
    .replace(settings.interpolate, function(match, interpolate){
        return "'+\n" + interpolate + "+\n'"
    })
    .replace(settings.evaluate, function(match, evaluate){
        return "';\n " + evaluate + "\n__p+='"
    })

    source = source + main + "';\n }; \n return __p;";

    console.log(source)

    var render = new Function('obj',  source);

    return render;
};

See Template Example 5 for the complete usage code.

Sixth version – Handling special values

There is one caveat, though:

What if users[i].url does not exist in the data? The value is then undefined, and we know that:

'1' + undefined // "1undefined"

The string "undefined" would be concatenated into the output, which is definitely not what we want. We can add a check to the generated code:

.replace(settings.interpolate, function(match, interpolate){
    return "'+\n((" + interpolate + ")==null?'':(" + interpolate + "))+\n'"
})

But I don’t like writing interpolate twice, so let’s do this instead:

var source = "var __t, __p='';\n";

...

.replace(settings.interpolate, function(match, interpolate){
    return "'+\n((__t=(" + interpolate + "))==null?'':__t)+\n'"
})

In the generated code, this is equivalent to:

var __t;

var result = (__t = interpolate) == null ? '' : __t;

See Template Example 6 for the complete usage code.
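The effect of the __t check can be observed in isolation. The sketch below builds only the interpolation expression and compiles a tiny function around it; interpolateExpr and the user data are hypothetical names used for this illustration.

```javascript
// Hypothetical helper: wrap an expression so that null and undefined become ''.
function interpolateExpr(expr) {
    return "((__t=(" + expr + "))==null?'':__t)";
}

// Generated code: evaluate user.url once, store it in __t, and reuse it.
var render = new Function(
    'obj',
    "var __t; with (obj) { return 'url:' + " + interpolateExpr('user.url') + "; }"
);

console.log(render({ user: {} }));              // url:
console.log(render({ user: { url: '/home' } })); // url:/home
console.log(render({ user: { url: 0 } }));       // url:0
```

Note that == null catches only null and undefined, so falsy values such as 0 are still rendered.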

Seventh version

So far we process the template string with several replace passes, whereas the underscore implementation makes only one. Let’s see how underscore does it:

var template = function(text) {
    var matcher = RegExp([
        (settings.interpolate).source,
        (settings.evaluate).source
    ].join('|') + '|$', 'g');

    var index = 0;
    var source = "__p+='";

    text.replace(matcher, function(match, interpolate, evaluate, offset) {
        source += text.slice(index, offset).replace(escapeRegExp, function(match) {
            return '\\' + escapes[match];
        });

        index = offset + match.length;

        if (interpolate) {
            source += "'+\n((__t=(" + interpolate + "))==null?'':__t)+\n'";
        } else if (evaluate) {
            source += "';\n" + evaluate + "\n__p+='";
        }

        return match;
    });

    source += "';\n";

    source = 'with(obj||{}){\n' + source + '}\n'

    source = "var __t, __p='';" +
        source + 'return __p;\n';

    var render = new Function('obj', source);

    return render;
};

The principle is simple: each time the callback runs for a match, we slice out the text before that match, escape it, and append it to the code string, followed by the code for the match itself. Finally we add the header and footer to get the complete code string.

In this code, the matcher expression ends up as /<%=([\s\S]+?)%>|<%([\s\S]+?)%>|$/g.

The question is: why add |$ ? Let’s take a look at $:

var str = "abc";
str.replace(/$/g, function(match, offset){
    console.log(match === '') // true: $ matches an empty string
    console.log(offset) // 3
    return match
})

Matching $ gives us the position of the end of the string, so that text.slice(index, offset) can also capture the text that follows the last template tag.

See Template Example 7 for the complete usage code.
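For a quick end-to-end check, the single-pass approach can be condensed into a self-contained sketch and run against a small template. This is an illustrative rewrite under the same assumptions as the seventh version, not underscore’s actual source; the sample template and data are made up.

```javascript
var settings = {
    evaluate: /<%([\s\S]+?)%>/g,
    interpolate: /<%=([\s\S]+?)%>/g
};
var escapes = { "'": "'", '\\': '\\', '\r': 'r', '\n': 'n', '\u2028': 'u2028', '\u2029': 'u2029' };
var escapeRegExp = /\\|'|\r|\n|\u2028|\u2029/g;

function template(text) {
    // One combined pattern: interpolate first, then evaluate, then end of input.
    var matcher = new RegExp(
        [settings.interpolate.source, settings.evaluate.source].join('|') + '|$', 'g');
    var index = 0;
    var source = "__p+='";

    text.replace(matcher, function (match, interpolate, evaluate, offset) {
        // Escape and append the plain text between the previous match and this one.
        source += text.slice(index, offset).replace(escapeRegExp, function (ch) {
            return '\\' + escapes[ch];
        });
        index = offset + match.length;

        if (interpolate) {
            source += "'+\n((__t=(" + interpolate + "))==null?'':__t)+\n'";
        } else if (evaluate) {
            source += "';\n" + evaluate + "\n__p+='";
        }
        return match;
    });

    source = "var __t, __p='';\nwith(obj||{}){\n" + source + "';\n}\nreturn __p;\n";
    return new Function('obj', source);
}

var render = template(
    '<% for (var i = 0; i < users.length; i++) { %><li><%=users[i].name%></li><% } %>');
var html = render({ users: [{ name: 'Kevin' }, { name: 'Daisy' }] });

console.log(html); // <li>Kevin</li><li>Daisy</li>
```

Because only the text between tags is escaped, quotes and line breaks in the template survive, while the code inside <% %> and <%= %> is passed through untouched.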

Final version

Our implementation is now very close to underscore’s, but underscore adds a few more details, such as:

1. Escaping data
2. Accepting configuration options
3. Error handling
4. A source property, which makes it easy to inspect the code string
5. A print function to help with debugging
6. ...

These are fairly simple, so I won’t write them out step by step. The final version is in Template Example 8. If you have any questions about it, feel free to leave a comment.

The underscore series

Underscore series catalogue: github.com/mqyqingfeng… .

The underscore series aims to help you read the source code and write your own underscore, focusing on code architecture, chained calls, internal functions, the template engine, and other highlights.

If anything here is wrong or imprecise, please point it out. If you like the series or find it useful, a star is welcome as encouragement.