Last week, for reasons I still don't understand, helicopters suddenly became all the rage in WeChat Moments, and lots of friends took off in all sorts of fancy aircraft.

Before I knew what was going on, the helicopter went viral on Weibo.

More and more people followed with helicopters of their own, hovering over WeChat Moments. So plenty of friends rolled out tanks built specifically to shoot down helicopters, hitting one with every shot.

All right, back to the point!

All programmers should be familiar with Unicode, which contains almost all the symbols in the world. For example, the corresponding Unicode codes for the helicopter symbols are:

P.S. Here is a recommended website where you can look up the Unicode code for a symbol: unicode.yunser.com/unicode

In addition to these normal characters, Unicode contains a wide variety of exotic characters.

Exotic characters

In addition to the normal, familiar characters, Unicode has some strange characters, such as these

In addition to these strange characters, Unicode also has some strange symbols.

Take the following set of mah-jongg tiles:

A set of playing cards:

A complete set of chess:

In addition to these (๑•̀ㅂ•́)و✧, Unicode also includes the emojis that we all use.

In addition to these, there are some special characters in Unicode that we can do a lot of interesting things with.

Combining characters

Unicode has a class of characters called combining characters, which attach to the preceding non-combining character so that the whole sequence looks like a single character.

Combining characters were originally designed to meet the needs of certain regional scripts, such as Thai tone marks and vowel signs.

In normal use, the number of such combinations is limited. The Unicode design, however, imposes no such limit, so we can stack combining characters on a base character indefinitely.

This feature can be used to achieve prank effects such as “piercing the ceiling” and “piercing the floor”.

The effect above relies on the following two combining characters:

You can append these two characters to a normal character by writing the corresponding HTML code right after it, as shown in the following example:

    black&#785;&#814;

Unicode code values are usually written as U+N, where N is the code value in hexadecimal; for example, U+0041 is A.

In HTML, a Unicode character can be written as &#N;, where N is the code value in decimal.

In JS, a Unicode character is written as \uN, where N is the four-digit hexadecimal code value.
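The three notations describe the same number. As a minimal sketch (the helper names are made up for illustration, and the \u form shown only covers code values that fit in four hexadecimal digits), they can all be derived from one code value in JS:

```javascript
// Hypothetical helpers: format one code value in the three notations.
const toUPlus = (code) => 'U+' + code.toString(16).toUpperCase().padStart(4, '0');
const toHtmlEntity = (code) => '&#' + code + ';';
const toJsEscape = (code) => '\\u' + code.toString(16).padStart(4, '0');

const code = 'A'.charCodeAt(0); // 65 in decimal, 41 in hexadecimal

console.log(toUPlus(code));      // U+0041
console.log(toHtmlEntity(code)); // &#65;
console.log(toJsEscape(code));   // \u0041
console.log('\u0041' === 'A');   // true
```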

If we append many copies of these combining characters after a normal character, we get the “piercing” effect shown above.
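A rough sketch of that stacking in JS, assuming the two combining characters are U+0311 (decimal 785) and U+032E (decimal 814) as in the HTML example; the `stack` helper and the repeat count are made up for illustration:

```javascript
const COMBINING_ABOVE = '\u0311'; // &#785; combining inverted breve, drawn above
const COMBINING_BELOW = '\u032e'; // &#814; combining breve below

// Hypothetical helper: pile `count` copies of each mark onto every character.
const stack = (text, count) =>
  Array.from(text)
    .map((ch) => ch + COMBINING_ABOVE.repeat(count) + COMBINING_BELOW.repeat(count))
    .join('');

const result = stack('black', 20);
console.log(result);        // "black" with tall stacks of marks above and below
console.log(result.length); // 205: 5 base characters + 5 * 40 combining characters
```

How tall the stacks actually look depends on the font and renderer, so the visual result varies between systems.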

Remember the Thai text mentioned above? For a while it was popular to post “spray text” like the example below.

This spray effect works by attaching Thai tone marks to other, ordinary characters.

The effect can no longer be reproduced; today it renders like this:

It may still be visible on older systems and browsers, so let me know in the comments section if you can see it.

Zero-width characters

Unicode also has a class of format characters that are neither visible nor printable. They are used to adjust how characters are displayed, and are called zero-width characters.

There are the following types of zero-width characters:

Zero-width space U+200B: marks a possible line-break position inside a long word

Zero-width no-break space U+FEFF: prevents a line break at a particular position

Zero-width joiner U+200D: used in Arabic and Hindi text to produce a joining effect between characters that would not otherwise join

Zero-width non-joiner U+200C: used in Arabic, German, and Hindi text to prevent the joining (ligature) effect between characters that would otherwise join

Left-to-right mark U+200E: used in multilingual text with mixed directions (e.g., left-to-right English mixed with right-to-left Hebrew) to specify that the text is typeset left-to-right

Right-to-left mark U+200F: used in multilingual text with mixed directions to specify that the text is typeset right-to-left

We can also pull off some pranks by taking advantage of the fact that zero-width characters are invisible.

Blank Weibo post

When posting on Weibo, a post with blank content cannot be published.

But if we paste zero-width characters, such as the zero-width space U+200B, into the post, we can publish a blank post.

We can copy zero-width characters using the Chrome console as follows:
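As a rough sketch: the string below contains nothing but zero-width spaces. `copy()` is a utility that the Chrome DevTools console provides, so it is left as a comment; the rest runs in any JS environment. Whether Weibo's blank check behaves like `trim()` is purely an assumption, used here to illustrate why such a post could slip through.

```javascript
// A "blank" string made of three zero-width spaces (U+200B).
const blank = '\u200b'.repeat(3);

console.log(blank.length);        // 3: the characters are there, just invisible
console.log(blank.trim().length); // 3: trim() does not treat U+200B as whitespace
console.log('   '.trim().length); // 0: ordinary spaces are stripped

// In the Chrome DevTools console, this would put the string on the clipboard:
// copy(blank);
```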

The published result looks like this:

Invisible watermarking

For internal forums or novel websites, you can embed an invisible watermark in posts or novel content using zero-width characters.

When crawlers copy the content to other sites, the invisible watermark makes it easy to find the user who leaked it.

The basic principle is to convert user information, such as the username, into zero-width characters with some algorithm, so that ordinary users do not see the watermark while reading.

If the content is copied to another site, the invisible watermark is copied along with it. We only need to find the watermark and convert the zero-width characters back into the username.

Here is one conversion method. The JS code is mainly based on the following GitHub project:

Github.com/umpox/zero-…

Invisible watermark generation method

The first step is to convert each character of the plaintext string into a binary string.

    // Convert each character to binary, separated by spaces
    const textToBinary = username => (
      username
        .split('')
        // charCodeAt returns the character's code value (UTF-16 code unit)
        .map(char => char.charCodeAt(0).toString(2))
        .join(' ')
    );

The following is an example:
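For instance, the made-up username `ab` comes out as two 7-bit groups. The snippet repeats `textToBinary` so that it runs on its own:

```javascript
const textToBinary = username => (
  username
    .split('')
    .map(char => char.charCodeAt(0).toString(2))
    .join(' ')
);

// 'a' is 97 and 'b' is 98 in decimal
console.log(textToBinary('ab')); // 1100001 1100010
```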

In the second step, the binary string is converted to a zero-width string as follows:

  • 1 is converted to \u200b (zero-width space)
  • 0 is converted to \u200c (zero-width non-joiner)
  • Anything else (the space between groups) is converted to \u200d (zero-width joiner)
  • Finally, \ufeff (zero-width no-break space) is used as the delimiter between the converted characters

    const binaryToZeroWidth = binary => (
      binary.split('').map((binaryNum) => {
        const num = parseInt(binaryNum, 10);
        if (num === 1) {
          return '\u200b'; // zero-width space
        } else if (num === 0) {
          return '\u200c'; // zero-width non-joiner
        }
        return '\u200d'; // zero-width joiner (a space parses to NaN)
      }).join('\ufeff') // zero-width no-break space as the delimiter
    );

The final encoding method is as follows:

    const encode = username => {
      const binaryUsername = textToBinary(username);
      const zeroWidthUsername = binaryToZeroWidth(binaryUsername);
      return zeroWidthUsername;
    };

After the plaintext string is encoded, the resulting string is invisible to the naked eye, but it really is there.

In fact, if we paste the encoded string into the BEJSON site, we can see the characters.

You can also paste the encoded string into IDEA to see the corresponding Unicode code values.
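The same check can be done in a JS console without any external tool. A minimal sketch, where `reveal` is a hypothetical helper that prints each code unit as an escape:

```javascript
// Hypothetical helper: show every UTF-16 code unit as a \uXXXX escape.
const reveal = (text) =>
  text
    .split('')
    .map((char) => '\\u' + char.charCodeAt(0).toString(16).padStart(4, '0'))
    .join('');

const hidden = '\u200b\ufeff\u200c'; // prints as nothing visible

console.log(hidden.length);  // 3
console.log(reveal(hidden)); // \u200b\ufeff\u200c
```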

Decoding the invisible watermark

Once we know how the encoding works, decoding is simple: we just perform the steps in reverse.

The first step is to convert the invisible watermark back to a binary string according to the following rules:

  • Split the string on \ufeff
  • \u200b becomes 1
  • \u200c becomes 0
  • Any other character becomes a space

    const zeroWidthToBinary = string => (
      string.split('\ufeff').map((char) => { // split on zero-width no-break space
        if (char === '\u200b') { // zero-width space
          return '1';
        } else if (char === '\u200c') { // zero-width non-joiner
          return '0';
        }
        return ' '; // zero-width joiner marks the gap between characters
      }).join('')
    );

By calling this method, the invisible watermark is converted to a binary string.

The second step is to convert the binary to the corresponding character.

    const binaryToText = string => (
      // parseInt(num, 2) parses each binary group; fromCharCode turns the value back into a character
      string.split(' ').map(num => String.fromCharCode(parseInt(num, 2))).join('')
    );

The final decoding method is as follows:

    const decode = zeroWidthUsername => {
      const binaryUsername = zeroWidthToBinary(zeroWidthUsername);
      const textUsername = binaryToText(binaryUsername);
      return textUsername;
    };

The decryption example is as follows:
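A compact, self-contained round trip might look like the sketch below. The functions follow the rules described above in condensed form; the username `xiaohei`, the post text, and the extraction regex are assumptions added for illustration.

```javascript
const textToBinary = text =>
  text.split('').map(char => char.charCodeAt(0).toString(2)).join(' ');

const binaryToZeroWidth = binary =>
  binary.split('').map((bit) => {
    if (bit === '1') return '\u200b'; // zero-width space
    if (bit === '0') return '\u200c'; // zero-width non-joiner
    return '\u200d';                  // zero-width joiner for the space
  }).join('\ufeff');

const zeroWidthToBinary = hidden =>
  hidden.split('\ufeff').map((char) => {
    if (char === '\u200b') return '1';
    if (char === '\u200c') return '0';
    return ' ';
  }).join('');

const binaryToText = binary =>
  binary.split(' ').map(num => String.fromCharCode(parseInt(num, 2))).join('');

const encode = text => binaryToZeroWidth(textToBinary(text));
const decode = hidden => binaryToText(zeroWidthToBinary(hidden));

// Embed the watermark in the middle of some visible text.
const watermark = encode('xiaohei');
const post = 'Chapter 1' + watermark + ' It was a dark night.';
console.log(post); // looks like ordinary text

// Recover it: keep only the four characters the encoder emits, then decode.
const extracted = post.replace(/[^\u200b\u200c\u200d\ufeff]/g, '');
console.log(decode(extracted)); // xiaohei
```

The extraction step assumes the surrounding text contains none of these four characters itself; otherwise the recovered bits would be corrupted.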

Short URLs

In a typical short URL, the domain name is followed by a random string, which is what maps the short URL to the long one. For example:

sourl.cn/iLyn9S

However, zero-width characters can achieve the same effect. The following website, for example, generates short URLs of this kind:

zws.im/

The short URL appears to have nothing after the domain, but it is actually followed by a string of zero-width characters. When the browser visits the short URL, the back-end program only needs to decode the trailing zero-width characters, look up the corresponding URL, and redirect to that website.

The decoding principle is the same as in the invisible watermark code above.
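To make the idea concrete, here is a sketch under assumptions of my own, not a description of how zws.im works internally: the invisible path is a numeric id written in binary with two zero-width characters, and the back end keeps a lookup table from id to long URL. The domain `short.example` and all names are hypothetical.

```javascript
const ZERO = '\u200c'; // zero-width non-joiner stands for bit 0
const ONE = '\u200b';  // zero-width space stands for bit 1

// Hypothetical helpers: numeric id <-> invisible path.
const idToPath = (id) =>
  id.toString(2).split('').map((bit) => (bit === '1' ? ONE : ZERO)).join('');

const pathToId = (path) =>
  parseInt(path.split('').map((char) => (char === ONE ? '1' : '0')).join(''), 2);

// Hypothetical lookup table kept by the back end.
const longUrls = new Map([[5, 'https://example.com/a/very/long/address']]);

const base = 'https://short.example/';
const shortUrl = base + idToPath(5);
console.log(shortUrl.length - base.length); // 3 invisible characters (binary 101)

// What the back end would do with the path it receives:
const id = pathToId(shortUrl.slice(base.length));
console.log(longUrls.get(id)); // https://example.com/a/very/long/address
```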

Beware of zero-width characters

In daily development, we sometimes need to read text content from some files and do the corresponding processing.

Sometimes we run into strange behavior, like a case I hit a while ago.

The back-end program reads text from an Excel file and checks whether it equals a specified string.

But for one particular Excel file, the check never passed. I assumed the Excel content contained a stray space, but when I opened the file the text looked exactly the same as the specified string, with no extra characters.

It was the first time I had run into something like this, and with no experience to go on I investigated for a long time, to the point of doubting life a little. Finally, I happened to paste the text into IDEA and discovered that it was mixed with zero-width characters!

If you run into this problem, paste the text into IDEA and check whether it contains invisible characters ~
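The same check can also be done in code before comparing. A minimal sketch, assuming the text has already been read into a string; the helper names and sample values are made up, and the pattern covers only the six zero-width characters listed earlier:

```javascript
// U+200B to U+200F, plus U+FEFF: the six zero-width characters listed earlier.
const ZERO_WIDTH = /[\u200b-\u200f\ufeff]/g;

// Hypothetical helpers for cleaning text read from a file before comparing it.
const hasZeroWidth = (text) => text.search(ZERO_WIDTH) !== -1;
const stripZeroWidth = (text) => text.replace(ZERO_WIDTH, '');

const expected = 'approved';
const fromExcel = 'appr\u200boved'; // looks identical on screen

console.log(fromExcel === expected);                 // false
console.log(hasZeroWidth(fromExcel));                // true
console.log(stripZeroWidth(fromExcel) === expected); // true
```

Stripping is only safe when the text is not supposed to contain these characters; in Arabic text or emoji sequences, for example, the joiners carry meaning.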

And finally (thumbs up!)

These past two weeks have been very busy, constantly on a 9106 schedule and really tiring, so I skipped a week!

Fortunately, the project has recently entered testing, so things are a little more relaxed and I have some time to write. But picking up the pen again, I found my rhythm was a bit off!

I dawdled over this article for a long time before it finally came out. Next week I'll get back to the weekly rhythm: no matter how busy or tired I am, one article every week.

Everyone is welcome to camp here every week and gank me!!

Okay, I’m downstairs, see you next week!!

Reference links

  1. juejin.cn/post/684490…
  2. zero.rovelast.com/
  3. zws.im/
  4. imweb.io/topic/5a08a…

You are welcome to follow my official account, procedures, for daily pushes of practical content. If you are interested in my topics, you can also follow my blog: studyidea.cn