A few days ago, our PM asked us to limit the length of the rich text that users can enter, for example to a maximum of 400 characters. Chinese and English characters count alike, so the limit is 400 characters as the user sees them. The rich text editor used in this project is TinyMCE. Other editors may store content differently, so I will only describe my own approach here and offer it as one possible idea.


Ignoring tags

First, tags should definitely not be counted, including their styles and any attributes. A natural first idea is to scan for the angle brackets: stop counting when < is encountered and resume counting after the matching >. That leads to the following code:

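const getPlainTextLen = richText => {
    let count = 0
    for (let i = 0; i < richText.length; i++) {
        if (richText[i] === '<') {
            // skip ahead to the closing '>'; the bounds check avoids an
            // endless loop when a '<' is never closed
            while (i < richText.length && richText[i] !== '>') {
                i++
            }
        } else {
            count++
        }
    }
    return count
}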

The code above does ignore tags, but the implementation is rather clumsy. We can instead use a regular expression to replace every <...> with an empty string and then take the length. replace returns a new string and leaves the original untouched, so the function stays pure.

const getPlainTextLen = richText => richText.replace(/<[^>]+>/g, '').length
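To make the effect of the tag-stripping regular expression concrete, here is a minimal sketch. The helper name stripTags and the sample markup are my own illustration, not something taken from the project.

```javascript
// Hypothetical helper: remove anything that looks like <...>.
const stripTags = richText => richText.replace(/<[^>]+>/g, '')

// Sample markup with a style attribute and a nested tag (made up for this example).
const sample = '<p style="color: red">Hello <strong>world</strong></p>'

console.log(stripTags(sample))        // Hello world
console.log(stripTags(sample).length) // 11

// replace returned a new string, so the original still contains its tags.
console.log(sample.startsWith('<p'))  // true
```

At this stage the space between the two words is still counted; the next section deals with whitespace.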

Handling whitespace and newlines

Obviously, whitespace should not be counted either. In HTML, whitespace can be stored as a plain space, as &nbsp;, or as &ensp;. \n and \r\n, which are newlines in JavaScript, are also rendered as spaces. replace returns a string, so the calls can be chained to extend the function above:

const getPlainTextLen = richText => richText
    .replace(/<[^>]+>/g, '')
    .replace(/ |\n|\r\n|&nbsp;|&ensp;/g, '')
    .length

Some editors store whitespace only as plain spaces and nothing else. In that case, for performance reasons (fewer alternatives to test), you do not have to copy the second replace exactly. Conversely, other entities such as &emsp; also represent whitespace, and the code above lists only two, so it is best to adapt the pattern to what your editor actually stores.
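One way to keep that pattern easy to adjust is to build it from a list. This is only a sketch: the names WHITESPACE_TOKENS and stripWhitespace are hypothetical, and the list is an assumption that you should replace with whatever your editor really emits.

```javascript
// Hypothetical list of whitespace entities; adjust it to your editor's output.
const WHITESPACE_TOKENS = ['&nbsp;', '&ensp;', '&emsp;']

// \s covers literal whitespace characters (space, tab, \n, \r, and so on);
// the listed entities are appended as further alternatives.
const whitespaceRe = new RegExp(`\\s|${WHITESPACE_TOKENS.join('|')}`, 'g')

const stripWhitespace = text => text.replace(whitespaceRe, '')

console.log(stripWhitespace('a&nbsp;b &emsp;c\r\nd')) // abcd
```

Using \s is a slightly broader choice than the explicit space and newline alternatives in the article's function, so tabs are removed as well.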

Handling HTML entities (HTML character entities)

If you type 1 + 1 < 3, how does the editor store it, and why does <p>1234</p> show up correctly when you type <p>1234</p>?

1 + 1 < 3 is stored as 1 + 1 &lt; 3. The < is stored as an HTML entity, presumably so that HTML typed by the user is not interpreted as markup. For a list of HTML entities, see W3Schools, HTML Entities (w3schools.com); if the page does not open for you, you may need a proxy. &nbsp; is an entity too, but because it renders as whitespace it has to be handled in the earlier step. HTML entities also cover emoji such as 😀, which many people love. In the example above, 1 + 1 < 3 has a length of 5 without its spaces but a length of 8 once stored, and 😀 is stored as &#128512;. Visually each of these should count as a single character, so entities need their own handling.

HTML entities have two representations: &entity_name; or &#entity_number;. The regular expression below is written accordingly.

// {2,5} and {1,6} indicate the length to be matched
const getPlainTextLen = richText => richText
    .replace(/<[^>]+>/g, '')
    .replace(/ |\n|\r\n|&nbsp;|&ensp;/g, '')
    // each entity becomes a single placeholder character, so it counts as 1
    .replace(/&([a-z]{2,5}|#[0-9]{1,6});/g, ' ')
    .length
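The length arithmetic for entities can be checked directly. The snippet below is a small sketch that applies only the entity step to two stored strings; the variable names are mine.

```javascript
// The entity pattern from the article: a named or a numeric entity.
const entityRe = /&([a-z]{2,5}|#[0-9]{1,6});/g

// "1+1<3" as stored by the editor, with whitespace already removed.
const stored = '1+1&lt;3'
console.log(stored.length)                        // 8
console.log(stored.replace(entityRe, ' ').length) // 5

// The emoji stored as a numeric entity.
const emoji = '&#128512;'
console.log(emoji.length)                         // 9
console.log(emoji.replace(entityRe, ' ').length)  // 1
```

Replacing each entity with a one-character placeholder, rather than with an empty string, is what makes it count as exactly one character.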

Finally, there are special glyphs, such as the tone marks we learned in primary school, for example a -> à, which is stored as a&#768; (the base letter followed by the entity for a combining mark). Let's write the last regular expression. It has to run before the entity replace in the code above; otherwise "à" becomes "a" plus a placeholder and is counted as two characters.

const getPlainTextLen = richText => richText
    .replace(/<[^>]+>/g, '')
    .replace(/ |\n|\r\n|&nbsp;|&ensp;/g, '')
    // a letter followed by a combining mark (&#768; to &#879;) counts as one character
    .replace(/[a-zA-Z]&#(76[89]|7[7-9][0-9]|8[0-7][0-9]);/g, ' ')
    .replace(/&([a-z]{2,5}|#[0-9]{1,6});/g, ' ')
    .length

Handling glyphs is optional, since not every editor displays them correctly; TinyMCE, which I use, is one that does not. Again, it is up to you to decide whether you need this step.
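The ordering point can be seen in a small sketch. I am assuming here that the glyph pattern should cover the Unicode combining diacritical marks, U+0300 to U+036F, which is 768 to 879 in decimal; the variable names are mine.

```javascript
// Assumption: a combining mark is a numeric entity in the range 768..879.
const glyphRe = /[a-zA-Z]&#(76[89]|7[7-9][0-9]|8[0-7][0-9]);/g
const entityRe = /&([a-z]{2,5}|#[0-9]{1,6});/g

// "à" stored as a base letter followed by a combining grave accent.
const stored = 'a&#768;'

// Glyph step first: letter and mark collapse into one placeholder.
const glyphFirst = stored.replace(glyphRe, ' ').replace(entityRe, ' ')

// Entity step first: the mark becomes its own placeholder next to the letter.
const entityFirst = stored.replace(entityRe, ' ').replace(glyphRe, ' ')

console.log(glyphFirst.length)  // 1
console.log(entityFirst.length) // 2
```

Once the entity step has run, the glyph pattern has nothing left to match, which is why the glyph replace must come first.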


Validation

Finally, let's verify that this function is correct:

// 5 + 8 + 4 + 10 + 11 + 5 + 3 + 4 + 1 = 51
const html =
    `
<h1>Title</h1>
<h2>subtitle</h2>
<p>H&#800;e&#768; H&#800;e&#768;</p>
<ul>
  <li>First block</li>
  <li>Second&nbsp;block</li>
  <li>1 + 1 &lt; 3</li>
  <li>5 &gt; 3</li>
  <li>1&ensp;00&#8364;</li>
  <li>&#128512;</li>
</ul>
`
// The visual length is also 51, so the result is correct
console.log(getPlainTextLen(html)) // 51
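To tie this back to the original requirement, here is a minimal sketch of how the count could be compared against the limit. The helper checkLimit and the constant MAX_LEN are hypothetical names, and how you obtain the HTML string and react to a failed check depends on your editor, so that part is left out. The glyph step is omitted here for brevity.

```javascript
// The counting function from the article, without the optional glyph step.
const getPlainTextLen = richText => richText
    .replace(/<[^>]+>/g, '')
    .replace(/ |\n|\r\n|&nbsp;|&ensp;/g, '')
    .replace(/&([a-z]{2,5}|#[0-9]{1,6});/g, ' ')
    .length

const MAX_LEN = 400 // the limit from the requirement

// Hypothetical helper: report the visual length and whether it fits.
const checkLimit = (html, max = MAX_LEN) => {
    const length = getPlainTextLen(html)
    return { length, ok: length <= max }
}

console.log(checkLimit('<p>1 + 1 &lt; 3</p>'))          // { length: 5, ok: true }
console.log(checkLimit(`<p>${'a'.repeat(401)}</p>`).ok) // false
```

Returning the length together with the flag makes it easy to show the user a counter such as 5/400 next to the editor.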
