From the IMWeb community. Author: Li Teng.

Preface

Encoding is a topic no programmer can avoid. For front-end engineers, characters are what show up directly on the interface. When most people think of text, they picture characters lined up in neat rows. But as the saying goes, a big enough forest holds every kind of bird: with more than 6,800 languages in the world, some strange birds are bound to fly out… And does Unicode, the so-called "universal code", run into odd behavior when it comes to encoding and displaying text?

In fact, you’ve probably seen it before:





X̙͈̝͍͕̙̄͛̽̆͌́̕͟g̘̣̠̝̟̤̥̼̼̽͑͋̈̑̒͟͞q̛̤̦̝̘͎͋̔̋͌͒̆̋̚͡f̵̢̙͇̮̠̋̀͌̅̉̃̔͜͜͠͡r̢̜̩͙̭̲͓͈̈̀͑̆͋̚͢͜m̷̛͙̝̣̲̭͍͉̊̓̾̈̋̿̚͢͟͠s̷̡̩͔̮͈̜̊̽͂̆̈́̃̓͋̏


Did a Lanxiang excavator just dig through your screen? See for yourself: ̷̸̨̀͒̏̃ͦ̈́̾̀́̎͢҉̵̶͚̼͉͖̺̥͔͇̰̹̮͙͉̻̼̭̻͕̮͇ͨͬͪ͗̇̑̽͋̀̋̊͌ͧͨͭ̓̅͐ͥ̂̔̊ͧ͊҉̶̵̷̞̩̦̳̺̳̬̬̩̣̫͇̯̥͖͍͕̠̦̼̗ͯ̽͌̔ͪͯ́́͋̍ͨ̿̿̎͒ͤ̓̅̀͂ͧ͋̏ͫͣ̔͘͜͠͏̶̵̸̧̧̥̺͓̘̺͎̜̥͕͈̝̫͎̺̮̱̤̠̠͖̳̻̥̣̪͍͕͇̮͙̹̪ͮͧͫ͂͒ͤͣ̌̽ͨͪ͒̄̄̉̒̊ͩ̅͆͘̚͘͘̚͟͟͝ͅ





Today we are going to look at these strange characters.


1. Characters can wear hats and shoes

When Thai comes up, many people immediately think of "sawasdee" (hello). But how is it written? In fact, the greeting differs by the speaker's gender: men write สวัสดีครับ, and women write สวัสดีค่ะ.

But that's not the important part. What matters is that some of these characters carry extra marks above them, as if wearing a hat. In fact, Thai characters don't just "wear hats"; occasionally they "wear shoes" too, as in the following three characters:

With a little more imagination, you might wonder whether a character can wear more than one hat. It can: Thai allows one pair of shoes and two hats. The full form looks like this:

So those of us used to reading Chinese and English need to adjust our thinking. Not all of the world's scripts are laid out in a neat horizontal grid; some, like Thai, are transformers.

Although standard Thai limits how many "hats" and "shoes" a character can have, Thai text displayed on a computer can carry an unlimited number of them, thanks to a curious design choice in Unicode. In other words, it becomes text that can grow along the Y axis!

And the marks don't just shoot straight up; they can shoot out at an angle, too…

Or even like this:

(These are screenshots, because rendering the actual text crashes some systems.)
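To see why the stacking has no ceiling, here is a minimal Python sketch using only the standard library. It assumes the base letter ส (U+0E2A) and the Thai mark ็ (U+0E47, MAITAIKHU); the helper name `stack_marks` is ours, chosen for illustration. Unicode puts no upper limit on how many combining marks may follow one base character, so the string below is perfectly legal.

```python
import unicodedata

def stack_marks(base: str, mark: str, count: int) -> str:
    """Return `base` followed by `count` copies of the combining `mark`."""
    return base + mark * count

tower = stack_marks("\u0e2a", "\u0e47", 20)  # ส followed by 20 copies of ็

# Every code point after the base is a nonspacing mark (category "Mn"),
# so a renderer tries to draw all of them on the one base letter.
print(len(tower))                                      # 21 code points
print({unicodedata.category(ch) for ch in tower[1:]})  # {'Mn'}
```

Raising `count` to 200 still yields a single base letter carrying a very tall stack of marks; how tall it actually looks depends on the browser and font.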

2. The conflict between humans and machines

So we ask Unicode: why so unreasonable? A design like this is practically a bug, and if it really were a bug, surely it would have been fixed long ago.

In fact, the design exists to solve a problem: the conflict between humans and machines.

The first conflict is storage. If every composed Thai character had its own code, at least 44×21×4 = 3696 codes would be needed (possibly more). Spending that many codes on a script with only 69 basic elements (44 + 21 + 4) would be wasteful, so computers use a design called complex text layout (CTL) to resolve the conflict. Put simply, each basic Thai element gets one code. The user types several basic characters in sequence through the input method and finishes with a special "end character"; the basic characters are then combined into a single Thai character on screen. This avoids wasting storage space.
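The storage argument is just arithmetic, so a short Python sketch makes it concrete. It reuses the paragraph's own figures (44, 21, 4) and then stores the men's greeting สวัสดีครับ the "building block" way, where each letter and each mark is its own code point. The variable names are ours.

```python
consonants, vowels, tones = 44, 21, 4  # figures used in the paragraph above

precomposed = consonants * vowels * tones      # one code per finished combination
building_blocks = consonants + vowels + tones  # one code per basic element

print(precomposed)      # 3696
print(building_blocks)  # 69

# สวัสดีครับ stored as separate base letters and marks; the renderer
# assembles them into stacked glyphs on screen.
hello_male = "\u0e2a\u0e27\u0e31\u0e2a\u0e14\u0e35\u0e04\u0e23\u0e31\u0e1a"
print(len(hello_male))                  # 10 code points
print(len(hello_male.encode("utf-8")))  # 30 bytes: 3 per Thai code point
```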

But this creates the second conflict: recognition. A person can easily tell whether a Thai word is spelled correctly and makes sense, but a machine has a hard time judging this at display time, and even when it can, the check costs performance. How is this handled? Today some of the work is done in the input method, for example by refusing to let a tone mark be typed where it doesn't belong. However, because composed Thai characters are assembled from basic characters up to an end character, nothing stops "artists" from copying and pasting, rearranging characters by hand, and so on, to push this to the limit.
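As a rough illustration of the kind of rule an input method might enforce, here is a hypothetical Python check that flags any base character carrying more than one Thai tone mark (U+0E48 to U+0E4B). The function name and the exact rule are our assumptions, not the logic of any real input method.

```python
import unicodedata

TONE_MARKS = {"\u0e48", "\u0e49", "\u0e4a", "\u0e4b"}  # ่ ้ ๊ ๋

def has_stacked_tones(text: str) -> bool:
    """Hypothetical check: True if any base character carries 2+ tone marks."""
    tones_on_current_base = 0
    for ch in text:
        if unicodedata.category(ch) == "Mn":
            # Other nonspacing marks (vowels above/below) don't count as tones.
            if ch in TONE_MARKS:
                tones_on_current_base += 1
                if tones_on_current_base > 1:
                    return True
        else:
            tones_on_current_base = 0  # a new base character starts a new cluster
    return False
```

For example, `has_stacked_tones("สวัสดีค่ะ")` is False, while a pasted string such as `"ค" + "\u0e48" * 3 + "ะ"` is True. As the paragraph notes, a check like this only helps at input time; it cannot stop text that is copied in from elsewhere.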

In addition, newer versions of WebKit keep this kind of vertically stacked text from spilling across the display, or at least from disrupting the surrounding layout. So in some places the marks no longer pile up and down. Try viewing the following character in different browsers: ส็็็็็็็็็็็็็็็็็็็็็็็็็็็็็็็็็็็็็็็็็็็็็็็็็็็็็็็็็็็
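The source doesn't describe WebKit's internals, so the following is a swapped-in, application-level technique rather than WebKit's own logic: a Python sketch that caps how many nonspacing (Mn) or enclosing (Me) marks may follow any one base character. The function name and the default limit of 2 are arbitrary choices for illustration.

```python
import unicodedata

def cap_combining_marks(text: str, limit: int = 2) -> str:
    """Keep at most `limit` Mn/Me marks after each base character."""
    out = []
    run = 0  # marks seen since the last base character
    for ch in text:
        if unicodedata.category(ch) in ("Mn", "Me"):
            run += 1
            if run <= limit:
                out.append(ch)
        else:
            run = 0
            out.append(ch)
    return "".join(out)

tower = "\u0e2a" + "\u0e47" * 58          # ส with 58 copies of ็
print(len(cap_combining_marks(tower)))    # 3: the base plus two marks
```

A limit that is too low would damage legitimate text (Thai can need two or three marks on one letter), which is one reason renderers prefer to contain the overflow rather than drop marks.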

3. (•̀_•́) Mixing up a medley of sweet and bitter

You may find Thai strange, but because the tones are written into the text, you can read a word exactly as you see it. That makes it easy to pass on: isn't it less trouble than reading classical Chinese with a dictionary at hand? So every script design has its pros and cons.

Thai is not the only script built this way; it is just a typical example. Lao and Tibetan are other common ones. Artists with rich imaginations then thought of something fun: could a Lao "hat" be combined with Tibetan "shoes" into a single character?

They succeeded, and before long such emoticons caught on.

For example, the emoticon (;´༎ຶд༎ຶ‘) contains a tearful-eye character, ༎ຶ. What language is it?

In fact, it doesn't belong to any language in the world! The eye part is Lao and the tear part is Tibetan. Yet when you copy or select it, it behaves as a single character, which is pretty amazing, for the reasons covered in Section 2.

How do I know? I'm not a linguist, of course; I simply escaped ༎ຶ, found that its code points are \u0f0e\u0eb6, and looked them up in a Unicode table.
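The same lookup can be done with Python's standard `unicodedata` module instead of an online table. This sketch escapes the character, confirms it is two code points, and prints each one's category and official name.

```python
import unicodedata

tear = "\u0f0e\u0eb6"  # ༎ຶ, the tearful-eye character

# Escape it the way the article did.
print(tear.encode("unicode_escape").decode("ascii"))  # \u0f0e\u0eb6
print(len(tear))  # 2 code points, even though it looks like one character

# Look each code point up by category and name.
for ch in tear:
    print(f"U+{ord(ch):04X}", unicodedata.category(ch), unicodedata.name(ch))
```

The names should identify U+0F0E as a Tibetan mark and U+0EB6 as a Lao vowel sign with category "Mn" (nonspacing mark), which is why it sits on top of the Tibetan base.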

A few more examples:

  • In ε´▶, the ε is a Greek letter
  • In ʕ-‘ᴥ‘-, the ʕ and ᴥ are phonetic alphabet symbols
  • In (,ཀ,“<), the ཀ is Tibetan
  • (•̀_•́)-style mash-ups draw on Thai
  • Some borrow Chinese characters, and others borrow Russian Cyrillic, such as the д in the crying face above

So it seems that mastering kaomoji (Japanese-style emoticons) takes fluency in eighteen languages ✧
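To take an emoticon apart by script yourself, a quick heuristic is to read the first word of each character's Unicode name. This is only a rough stand-in for the real Unicode Script property, which Python's standard library doesn't expose, and the helper name `script_hints` is ours. It works for names like "TIBETAN MARK NYIS SHAD" or "CYRILLIC SMALL LETTER DE", but it would label IPA symbols such as ʕ as "LATIN".

```python
import unicodedata

def script_hints(text: str) -> dict[str, str]:
    """Map each distinct character to the first word of its Unicode name."""
    return {ch: unicodedata.name(ch, "UNKNOWN").split()[0] for ch in text}

crying_face = "(;\u00b4\u0f0e\u0eb6\u0434\u0f0e\u0eb6\u2018)"  # (;´༎ຶд༎ຶ‘)
for ch, hint in script_hints(crying_face).items():
    print(repr(ch), hint)
```

Punctuation comes back with hints like "LEFT" or "SEMICOLON", which you can simply ignore; the interesting entries are "TIBETAN", "LAO", and "CYRILLIC".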

4. Misplaced glyphs across fonts

So far we've been talking about the characters themselves, but there's another key factor in how they appear on screen: fonts. The same character looks different in different fonts.

Set a line of Chinese in "bold" or "cursive" and it generally stays neat; set a line of English in "cursive", though, and problems may appear.

In the browser, if the font has no glyph for a code point, the character is shown as a box, which at least doesn't disturb the layout of the surrounding characters. But Unicode's forest is simply too big… In some fonts, special characters are laid out incorrectly.

For example: 热҈的҈字҈都҈出҈汗҈了҈ ("so hot that even the characters are sweating")

Escaping it gives: \u70ed\u0488\u7684\u0488\u5b57\u0488\u90fd\u0488\u51fa\u0488\u6c57\u0488\u4e86\u0488

U+0488 is the character ҈, a combining Cyrillic mark (COMBINING CYRILLIC HUNDRED THOUSANDS SIGN).
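The "sweating" text is easy to reproduce. This Python sketch attaches U+0488 after every character of the phrase and prints the escaped form for comparison with the escape sequence given earlier. The helper name `add_sweat` is ours.

```python
import unicodedata

SWEAT = "\u0488"  # ҈ COMBINING CYRILLIC HUNDRED THOUSANDS SIGN

def add_sweat(text: str) -> str:
    """Attach the combining mark U+0488 after every character."""
    return "".join(ch + SWEAT for ch in text)

sweaty = add_sweat("热的字都出汗了")
print(sweaty)
print(sweaty.encode("unicode_escape").decode("ascii"))
print(unicodedata.name(SWEAT), unicodedata.category(SWEAT))  # "Me" = enclosing mark
```

Its category is "Me" (enclosing mark): the font is expected to draw it around the preceding character, and how well that works is exactly what varies from font to font.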

This character is misplaced in most commonly used fonts, while some other fonts, such as Courier New, display it separately:

So which of these two renderings is correct? Neither. The Wikipedia article on Cyrillic numerals shows that the mark is only meant to be used together with Cyrillic numerals:

When it is combined with characters from other scripts, we get either misplacement or separate display, and no international body specifies how it should be rendered. In fact, there are so many scripts that it is impossible to manage them all, which is why the results are so inconsistent.

Finally, a handful of creative "artists" use this character to slip past sensitive-word filters…
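Why does this work? A naive filter does a plain substring match, and the inserted marks break the match. The Python sketch below shows the failure and one possible countermeasure, stripping nonspacing and enclosing marks before comparing. The helper names are hypothetical, and this is an illustration, not any real filter's method. Note that stripping marks also removes legitimate Thai vowels and tones, so the stripped text should only be used for matching, never for display.

```python
import unicodedata

def strip_marks(text: str) -> str:
    """Drop nonspacing (Mn) and enclosing (Me) marks before matching."""
    return "".join(ch for ch in text if unicodedata.category(ch) not in ("Mn", "Me"))

def contains_blocked_word(text: str, blocked: str) -> bool:
    """Hypothetical filter: match against the mark-stripped text."""
    return blocked in strip_marks(text)

disguised = "出\u0488汗\u0488"  # 出҈汗҈
print("出汗" in disguised)                      # False: the naive check misses it
print(contains_blocked_word(disguised, "出汗"))  # True once the marks are removed
```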

5. Chaos and innovation?

It's worth noting that Unicode keeps growing and is constantly being updated. Emoji, for example, are part of Unicode's standard character set.

As mentioned earlier, people play with these weird characters by exploiting "bug-like" design choices, and that only goes so far. So the "artists" had a bolder idea: could they create new characters outright, purely for misplacing and combining?

I'm not sure whether this is innovation or just more chaos. But in fact, iOS has already tried this with its own built-in characters. Sogou Input Method on iOS also offers a curated selection of many such symbols for users. The feature is called Ripple.

You have to admire how unusual the artists' perspective is. With things heading this way, let's close by hoping that our homemade emoticons make it into Unicode one day.

(To be continued)