Understanding

Encryption means designing an algorithm that turns a string into seemingly random, jumbled text, so that a reader cannot tell what it originally said.

Decryption reverses the process: it recovers the original text by running the designed algorithm backwards.

The algorithm is up to the developer. It can be simple or complex, depending entirely on the developer’s own design ability.

The tools required

To encrypt strings in JS, we can use several tools (methods) to transform the string. Applying the transformations a few times over is enough to confuse a casual reader.

Before doing that, you should have some knowledge of Unicode, ASCII, UTF-8, base64, and the other encodings involved. For a quick introduction, read this article on common character sets and encoding schemes.

Encoding turns easily understood characters into numbers or jumbled characters that are hard to read at a glance, which already gives a basic encryption effect: other people cannot understand the text directly.

However, simply converting a string with Unicode values or a standard encoding is too weak a form of encryption. The rules are fixed and well known, and the mapping is one-to-one, so the result is easily detected and reversed.
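To make the one-to-one point concrete, here is a minimal sketch showing that a plain conversion to code values can be undone by anyone who knows the rule. The helper names toCodes and fromCodes are made up for this illustration.

```javascript
// Hypothetical helpers: turn a string into its UTF-16 code values and back.
function toCodes(text) {
    return Array.from({ length: text.length }, (_, index) => text.charCodeAt(index));
}

function fromCodes(codes) {
    return String.fromCharCode(...codes);
}

const hidden = toCodes('secret');   // [115, 101, 99, 114, 101, 116]
const restored = fromCodes(hidden); // 'secret'
console.log(hidden.join(' '), '->', restored);
```

Nothing secret is needed to reverse the conversion, which is why the methods below are only building blocks.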

Therefore, the methods need to be mixed skillfully into a somewhat more complex algorithm to achieve even a basic encryption effect.

To do this, let’s look at some methods for string conversion.

fromCharCode

Converts the given UTF-16 code values into the corresponding characters. It is a static method of String.

Syntax:

String.fromCharCode(n1[, n2, ...n])

The parameters are UTF-16 code values, which can be written in any base, such as decimal or hexadecimal.

The return value is a string made of the corresponding characters, in the order of the code values, for example:

String.fromCharCode(68)      // returns the character 'D'; the Unicode value of D is 68
String.fromCharCode(100)     // returns the character 'd'
String.fromCharCode(68, 100) // returns the string 'Dd'
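As a small illustration of the "any base" remark, the sketch below writes the same code value in decimal, hexadecimal, octal and binary. The variable names are only for this example.

```javascript
// The same code value, 68, written in four different bases.
const fromDecimal = String.fromCharCode(68);
const fromHex = String.fromCharCode(0x44);
const fromOctal = String.fromCharCode(0o104);
const fromBinary = String.fromCharCode(0b1000100);

console.log(fromDecimal, fromHex, fromOctal, fromBinary); // D D D D
```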

Note that UTF-16 is one of the encoding forms of Unicode. In the range 0 to 65535 (0x0000 to 0xFFFF), that is, in the Basic Multilingual Plane (BMP, plane 0), the UTF-16 code value and the Unicode code point are the same, and one 16-bit value represents one character. Beyond that range they differ: UTF-16 uses two 16-bit values, that is, two code values, to represent one character. This is called a surrogate pair (sometimes translated as “proxy pair”).

So to produce a character above code value 65535 with this method, the arguments must be a surrogate pair rather than the Unicode code point itself. If a value above the BMP is passed, the bits above the low 16 are simply truncated, because each argument can only be a 16-bit value. No error is reported, but the output is silently different from the expected result.

// For example U+1F303: in Unicode, 1F303 is a single code point for the character "Night with Stars" 🌃
// In UTF-16, the character is represented by two code values (0xD83C, 0xDF03).
// The result is 🌃
String.fromCharCode(0xD83C, 0xDF03)

// Above 0xFFFF the high bits are truncated, so this gives the same
// result as String.fromCharCode(0xF303)
String.fromCharCode(0x1F303)
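The truncation can be checked directly. The sketch below assumes nothing beyond the standard String.fromCharCode behavior; it masks the value to its low 16 bits by hand and compares the results.

```javascript
const codePoint = 0x1F303;
const truncated = codePoint & 0xFFFF; // keep only the low 16 bits: 0xF303

// fromCharCode silently drops the high bits, so both calls give the same character.
const sameAsTruncated = String.fromCharCode(codePoint) === String.fromCharCode(truncated);
// The result is not the character we wanted.
const isNightWithStars = String.fromCharCode(codePoint) === '🌃';

console.log(truncated.toString(16)); // f303
console.log(sameAsTruncated);        // true
console.log(isNightWithStars);       // false
```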

charCodeAt

The inverse of fromCharCode: it converts a character to its UTF-16 code value. Unlike fromCharCode, it is an instance method, called on a string instance.

Syntax:

string.charCodeAt(index)

The argument is the index of a character in the string (ranging from 0 to string.length - 1).

The return value

  • Is the UTF-16 code value of the character at the given index.
  • If index is a number outside the valid range, the return value is NaN. If it is omitted, non-numeric or null, it defaults to 0.
  • As we saw with fromCharCode, a character may be represented by a surrogate pair. In that case the character’s length is 2 rather than 1: index 0 returns the first value of the pair and index 1 returns the second.

// The surrogate pair of '🌃' is 55356, 57091
'🌃'.length        // 2
'🌃'.charCodeAt(0) // returns 55356
'🌃'.charCodeAt(1) // returns 57091
'i am a teacher'.charCodeAt(2); // returns the Unicode value of the third character, a: 97
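The return-value rules above can be tried out with a short sketch. The helper codeUnits is a made-up name for this example; it simply calls charCodeAt on every index.

```javascript
// Hypothetical helper: list every UTF-16 code value in a string.
function codeUnits(text) {
    const units = [];
    for (let index = 0; index < text.length; index++) {
        units.push(text.charCodeAt(index));
    }
    return units;
}

console.log(codeUnits('a🌃'));                  // [ 97, 55356, 57091 ]
console.log(Number.isNaN('abc'.charCodeAt(5))); // true: index out of range
console.log('abc'.charCodeAt());                // 97: a missing index defaults to 0
```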

fromCodePoint

As shown above, fromCharCode has a drawback: characters outside the BMP need surrogate pairs, which means going through the cumbersome process of computing the pair. To avoid that unnecessary calculation, ES2015 added a new method, fromCodePoint.

As the name suggests, it converts directly from code points, that is, from Unicode code values, instead of representing a character as a UTF-16 surrogate pair. It is a static method.
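To illustrate the calculation that fromCodePoint spares you, here is a sketch of the standard UTF-16 surrogate pair formula. The function name toSurrogatePair is an assumption for this example, and it only makes sense for code points above 0xFFFF.

```javascript
// Hypothetical helper: compute the UTF-16 surrogate pair for a code point above 0xFFFF.
function toSurrogatePair(codePoint) {
    const offset = codePoint - 0x10000;      // 20-bit value
    const high = 0xD800 + (offset >> 10);    // top 10 bits
    const low = 0xDC00 + (offset & 0x3FF);   // bottom 10 bits
    return [high, low];
}

const [high, low] = toSurrogatePair(0x1F303);
console.log(high.toString(16), low.toString(16)); // d83c df03
console.log(String.fromCharCode(high, low) === String.fromCodePoint(0x1F303)); // true
```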

Syntax:

String.fromCodePoint(n1[, n2, ...n]);

The arguments are Unicode code points, that is, numbers, written in any base.

The return value is a string made by concatenating the characters for the given code points, in argument order.

Again, take the character ‘🌃’, which is U+1F303 in Unicode, with code value 0x1F303. In UTF-16 it is represented by the surrogate pair (0xD83C, 0xDF03).

// fromCharCode needs the surrogate pair
String.fromCharCode(0xD83C, 0xDF03)

// fromCodePoint converts the code value directly
String.fromCodePoint(0x1F303)

The difference between fromCodePoint and fromCharCode is that fromCodePoint accepts Unicode code points directly. However, IE does not support it, so it is less compatible than fromCharCode.

codePointAt

codePointAt is the inverse of fromCodePoint and the counterpart of charCodeAt, so its use is easy to guess.

It converts a character to its Unicode code point. It is an instance method, also new in ES2015.

Syntax:

string.codePointAt(index)

The parameter is the index of a character in the string (ranging from 0 to string.length - 1). Note that if a character is represented by a surrogate pair, that is, a character above code value 65535, passing the index of its first position returns the character’s actual Unicode code point. Passing the second position returns the second code value of the surrogate pair.

'🌃'.codePointAt(0) // returns 127747, the Unicode code point of this character
'🌃'.codePointAt(1) // returns 57091, the second code value of the surrogate pair (55356, 57091)

The return value

  • The Unicode code point of the character at the given index.
  • If index is outside the valid range (whether passed as a number or as another type that converts to one), the return value is undefined;
  • If it is '', null, undefined, false, or omitted, it defaults to 0 (as observed in Chrome);

Again, IE does not support it.
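The behavior described above can be sketched as follows. The helper codePoints is a made-up name; it relies on the fact that iterating a string (here through Array.from) walks it by code point rather than by code value.

```javascript
// Hypothetical helper: list the code points of a string, one per character.
function codePoints(text) {
    // Array.from iterates a string by code point, so a surrogate pair
    // arrives as a single two-unit string.
    return Array.from(text, (char) => char.codePointAt(0));
}

console.log(codePoints('a🌃b'));        // [ 97, 127747, 98 ]
console.log('a🌃b'.length);             // 4 code values
console.log(codePoints('a🌃b').length); // 3 characters
console.log('abc'.codePointAt(5));      // undefined: index out of range
```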

encodeURI && encodeURIComponent

For an explanation of these two functions, see my article on the similarities, differences and use cases of the two URI encoding methods.
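As a quick reminder of the difference, here is a minimal sketch. The URL is a made-up example: encodeURI leaves the characters that structure a URI (such as :, /, ?, = and &) untouched, while encodeURIComponent escapes them as well.

```javascript
// Hypothetical URL used only for illustration.
const url = 'https://example.com/search?q=a b&lang=中文';

const wholeUri = encodeURI(url);
const component = encodeURIComponent('a b&lang=中文');

console.log(wholeUri);  // https://example.com/search?q=a%20b&lang=%E4%B8%AD%E6%96%87
console.log(component); // a%20b%26lang%3D%E4%B8%AD%E6%96%87
```

Both methods escape non-ASCII characters as percent-encoded UTF-8 bytes, which is what makes them useful for the tricks later in this article.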

btoa && atob

btoa stands for binary to ASCII, and atob for ASCII to binary. Both are methods of the window object. Internet Explorer 9 and below do not support them.

As the names suggest, btoa converts a string of characters into base64-encoded ASCII text, and atob is the inverse operation: it decodes base64 text, restoring the original characters.

Here we mainly discuss btoa, because atob is very simple: it takes the result of btoa as its argument. Everything below is about btoa.
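A minimal round trip looks like this. The sketch assumes an environment where btoa and atob are available as globals (browsers, or a recent Node.js version).

```javascript
const encoded = btoa('Hello'); // 'SGVsbG8='
const decoded = atob(encoded); // 'Hello'

console.log(encoded, decoded);
```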

Syntax:

window.btoa(stringToEncode)

It treats each character of the argument as one byte of binary data. The method therefore only accepts characters whose code value fits in a single byte, so anything outside 0x00 to 0xFF raises an InvalidCharacterError. In other words, an error is reported for any Unicode character beyond that range.

If the argument is a number, it is converted to a string first. For example, 100 is base64-encoded as the three characters 1, 0 and 0.

Clearly, this method cannot encode arbitrary Unicode characters, so you cannot use it directly on Chinese characters, for example.
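These limits are easy to observe. The sketch below assumes btoa is available as a global (browsers, or a recent Node.js version); the variable errorName is only for this example.

```javascript
console.log(btoa(100) === btoa('100')); // true: the number is converted to '100' first
console.log(btoa('100'));               // 'MTAw'
console.log(btoa('\u00e9'));            // '6Q==': é (U+00E9) still fits in one byte

let errorName = null;
try {
    btoa('中'); // U+4E2D does not fit in one byte
} catch (error) {
    errorName = error.name;
}
console.log(errorName); // 'InvalidCharacterError'
```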

There is a way to extend it to support Unicode:

// ucs-2 string to base64 encoded ascii
function utoa(str) {
    return window.btoa(encodeURIComponent(str));
}
// base64 encoded ascii to ucs-2 string
function atou(str) {
    return decodeURIComponent(window.atob(str));
}
// Usage:
utoa('✓ à la mode'); // "JUUyJTlDJTkzJTIwJUMzJUEwJTIwbGElMjBtb2Rl"
atou('JUUyJTlDJTkzJTIwJUMzJUEwJTIwbGElMjBtb2Rl'); // "✓ à la mode"
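An alternative sketch, not taken from the original article, base64-encodes the UTF-8 bytes of the string directly instead of its percent-encoded form, which gives a shorter result. It assumes TextEncoder, TextDecoder, btoa and atob are available as globals; the function names utf8ToBase64 and base64ToUtf8 are made up for this example.

```javascript
// Hypothetical helper: string -> UTF-8 bytes -> base64 text.
function utf8ToBase64(text) {
    const bytes = new TextEncoder().encode(text);
    let binary = '';
    for (const byte of bytes) {
        binary += String.fromCharCode(byte); // one character per byte, always <= 0xFF
    }
    return btoa(binary);
}

// Hypothetical helper: base64 text -> UTF-8 bytes -> string.
function base64ToUtf8(encoded) {
    const binary = atob(encoded);
    const bytes = Uint8Array.from(binary, (char) => char.charCodeAt(0));
    return new TextDecoder().decode(bytes);
}

console.log(utf8ToBase64('✓ à la mode'));          // 4pyTIMOgIGxhIG1vZGU=
console.log(base64ToUtf8('4pyTIMOgIGxhIG1vZGU=')); // ✓ à la mode
```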

Summary

Several methods were introduced; here is a simple classification to avoid confusion.

  • fromCharCode & charCodeAt work with UTF-16 code values;
  • fromCodePoint & codePointAt work with Unicode code points;
  • encodeURI & encodeURIComponent work with UTF-8 encoding;
  • btoa & atob work with base64 encoding.

Template sample

Here is a simple example to draw some inspiration from. I do not recommend applying it as-is; adapt it yourself, which is safer and also lets you design something more complex.

The examples are for reference only.

The algorithm designed here is relatively simple:

We convert each character of the string to be encrypted to its Unicode value, apply some arithmetic to that value to get a new number, and convert that number back to a Unicode character. This creates a new string. To make it a little more complicated, the result is then URI-encoded.

// Encryption method
function encryptChar(target) {
    let result = '';
    for (let i = 0, j = 0; j < target.length; j++, i++) {
        let code = target.codePointAt(j);
        result += String.fromCodePoint(code + i + j);
        // Above the BMP, one character takes two code values, so skip the second one
        if (code > 65535) {
            j += 1;
        }
    }
    return encodeURIComponent(result);
}

// Decryption method
function decryptChar(target) {
    let newTarget = decodeURIComponent(target);
    let result = '';
    for (let i = 0, j = 0; j < newTarget.length; j++, i++) {
        let code = newTarget.codePointAt(j);
        result += String.fromCodePoint(code - i - j);
        // Above the BMP, one character takes two code values, so skip the second one
        if (code > 65535) {
            j += 1;
        }
    }
    return result;
}

This differs from a lot of the material online (much of it copied and pasted): codePointAt and fromCodePoint are used instead of charCodeAt and fromCharCode, which cannot handle the full range of Unicode. Once the whole Unicode range is taken into account, a character represented by a surrogate pair has length 2, so while looping over the string we use j as the index for reading each character’s actual Unicode code value: for a character outside the BMP, codePointAt must be given the index of the character’s first code value to return its code point. i simply counts the characters.
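The roles of the two counters can be traced with a short sketch. It reuses the same loop shape as the example, on a made-up input, and records i, j and the code point at every step; the names text and steps are only for this illustration.

```javascript
const text = 'a🌃b';
const steps = [];

for (let i = 0, j = 0; j < text.length; j++, i++) {
    const code = text.codePointAt(j);
    steps.push({ i, j, code });
    // A character above the BMP occupies two code values, so skip the second one.
    if (code > 0xFFFF) {
        j += 1;
    }
}

console.log(steps);
// i counts characters (0, 1, 2) while j is the string index (0, 1, 3):
// [ { i: 0, j: 0, code: 97 }, { i: 1, j: 1, code: 127747 }, { i: 2, j: 3, code: 98 } ]
```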

The resulting solution is far more reliable than the online sources (I researched quite a few of them, and they all go about it the same way, copied and pasted from one another); at the very least, it fixes their shortcomings here.

Conclusion

After looking at the examples above, you probably have an idea of how to mix these tools. You can design more complicated algorithms using all kinds of operations: addition, subtraction, multiplication, division, modulo and so on.
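As one possible variation, and not the method from this article, the sketch below works on UTF-8 bytes instead of code points: each byte is shifted by a key byte plus its position, wrapped with modulo 256, and the result is base64-encoded. The names scramble, unscramble and key are assumptions for this illustration; key must be a non-empty string, and TextEncoder, TextDecoder, btoa and atob are assumed to be globals.

```javascript
// Hypothetical helper: shift every UTF-8 byte by (key byte + position), then base64-encode.
function scramble(text, key) {
    const bytes = new TextEncoder().encode(text);
    const keyBytes = new TextEncoder().encode(key);
    let binary = '';
    bytes.forEach((byte, index) => {
        const shift = keyBytes[index % keyBytes.length] + index;
        binary += String.fromCharCode((byte + shift) % 256);
    });
    return btoa(binary);
}

// Hypothetical helper: undo scramble with the same key.
function unscramble(encoded, key) {
    const binary = atob(encoded);
    const keyBytes = new TextEncoder().encode(key);
    const bytes = Uint8Array.from(binary, (char, index) => {
        const shift = keyBytes[index % keyBytes.length] + index;
        // Double modulo keeps the result in 0..255 even when the difference is negative.
        return (((char.charCodeAt(0) - shift) % 256) + 256) % 256;
    });
    return new TextDecoder().decode(bytes);
}

const secret = scramble('i am a teacher 🌃', 'my key');
console.log(secret);
console.log(unscramble(secret, 'my key')); // i am a teacher 🌃
```

Because every byte stays a byte, the shifted values can never land on an invalid code point, so the round trip works for any input string.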

I recommend digesting some of the encoding knowledge, as well as the strengths and weaknesses of the methods listed above; it will help you.

If it helps you, please give a thumbs-up

Please do not reprint without permission