Offer to come, dig friends take it! I am participating in the 2022 Spring Recruit Punch card activity. Click here for details.

Topic describes

This is a 393. Utf-8 encoding verification on LeetCode with moderate difficulty.

Tag: “Simulation”

Given an array of integers representing data, returns whether it is a valid UTF−8UTF-8UTF−8 encoding.

A character in UTF-8 may be 111 to 444 bytes in length, following the following rules:

  1. for
    1 1
    Byte character, the first byte of which is set to
    0 0
    Behind,
    7 7
    Bit for this symbolunicodeCode.
  2. for
    n n
    Byte character (
    n > 1 n > 1
    ), before the first byte
    n n
    Bit is set to
    1 1
    In the first
    n + 1 n+1
    Bit is set to
    0 0
    , the first two characters of the following bytes are set to
    10 10
    . The remaining bits, not mentioned, are all bits of this symbolunicodeCode.

Here’s how UTF-8 encoding works:

   Char. number range  |        UTF-8 octet sequence
      (hexadecimal)    |              (binary)
   --------------------+---------------------------------------------
   0000 0000-0000 007F | 0xxxxxxx
   0000 0080-0000 07FF | 110xxxxx 10xxxxxx
   0000 0800-0000 FFFF | 1110xxxx 10xxxxxx 10xxxxxx
   0001 0000-0010 FFFF | 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
Copy the code

Note: The input is an array of integers. Only the lowest eight significant bits of each integer are used to store data. This means that each integer represents only 1 byte of data.

Example 1:

Input: data = [197,130,1] output: true description: data representation byte sequence :11000101 10000010 00000001. This is the valid UTF-8 encoding for a 2-byte character followed by a 1-byte character.Copy the code

Example 2:

Input: data = [235,140,4] output: false description: data represents an 8-bit sequence: 11101011 10001100 00000100. The first three bits are all 1's, and the fourth bit is 0 to indicate that it is a 3-byte character. The next byte is a continuation byte starting with 10, which is correct. But the second continuation byte does not start with 10, so it is against the rule.Copy the code

Tip:


  • 1 < = d a t a . l e n g t h < = 2 1 0 4 1 <= data.length <= 2 * 10^4

  • 0 < = d a t a [ i ] < = 255 0 <= data[i] <= 255

simulation

Carry out the simulation according to the meaning of the question.

Data [I]data[I]data[I]data[I]data[I]data[I]data[I]data[I]data[I]data[I]data[I]data[I]data[I]data[I]data[I]

  • if
    c n t cnt

    1 1
    Or greater than
    4 4
    Both violate encoding rules (and the character length is
    1 1
    The encoding rule and character length of the
    1 1

    4 4
    Conflict), returnFalse;
  • If the position
    i i
    Behind the lack of
    c n t 1 cnt – 1
    Also returnFalse;
  • Otherwise, the check subscript range is
    [ i + 1 . i + c n t 1 ] [i + 1, i + cnt – 1]
    If the first two digits of
    10 10
    If the request is not met, returnFalse.

Returns True if the above procedure satisfies the requirement to skip to the next checkpoint for checking and there is no conflict in the entire data.

Code:

class Solution {
    public boolean validUtf8(int[] data) {
        int n = data.length;
        for (int i = 0; i < n; ) {
            int t = data[i], j = 7;
            while (j >= 0 && (((t >> j) & 1) = =1)) j--;
            int cnt = 7 - j;
            if (cnt == 1 || cnt > 4) return false;
            if (i + cnt - 1 >= n) return false;
            for (int k = i + 1; k < i + cnt; k++) {
                if ((((data[k] >> 7) & 1) = =1) && (((data[k] >> 6) & 1) = =0)) continue;
                return false;
            }
            if (cnt == 0) i++;
            else i += cnt;
        }
        return true; }}Copy the code
  • Time complexity: O(n)O(n)O(n)
  • Space complexity: O(1)O(1)O(1)

The last

This is the No.393 of our “Brush through LeetCode” series, which started on 2021/01/01. There are 1916 topics in LeetCode by the start date, some of which have locks, we will first finish all the topics without locks.

In this series of articles, in addition to explaining how to solve the problem, I will also present the most concise code possible. If a general solution is involved, the corresponding code template will also be used.

In order to facilitate the students to debug and submit the code on the computer, I set up the relevant warehouse: github.com/SharingSour… .

In the repository, you’ll see links to the series of articles, the corresponding code for the series of articles, the LeetCode source links, and other preferred solutions.