This article is participating in the “Java Theme Month – Java Brush questions punch card”, see the activity link for details

Topic describes

This is 395 on LeetCode. The oldest string with at least K repeated characters, with difficulty Medium.

Given a string s and an integer k, find the oldest string in s and require that each character in the string occur at least k times.

Returns the length of this substring.

Example 1:

Input: s = "aaabb", k = 3 Output: 3 Explanation: The oldest string is "aaa", where the 'a' is repeated three times.Copy the code

Example 2:

Input: s = "ababbc", k = 2 Output: 5 Explanation: The oldest string is "ababb", where 'a' is repeated twice and 'b' is repeated three times.Copy the code

Tip:

  • 1 <= s.length <=
    1 0 4 10^4
  • The “S” is a lowercase letter only
  • 1 <= k <=
    1 0 5 10^5

Enumeration + double pointer

Actually see this question, my first reaction is “dichotomy”, direct “dichotomy” answer.

But if you go down, you can see that dichotomy is not feasible because it doesn’t have the two-paragraph property.

So if I have some interval of length t that satisfies this requirement, does the interval of length t plus one satisfy or not satisfy?

Obviously not, it depends on whether the characters appearing at t + 1 are in the original interval.

For example 🌰, suppose we have drawn an interval of length t that satisfies this requirement (k > 1), then when we expand the length to t + 1 (either to the left or to the right) :

  • If the character of the new position appears in the original interval, it must meet the requirement that the number of occurrences is greater than K, and then the length of t + 1 meets the requirement
  • If the character in the new position does not appear in the original interval, the new character appears only once, and the length of t + 1 does not meet the requirements

Therefore, we can not use the “binary”, and accordingly can not directly use the “sliding window” idea of the dual pointer.

In fact, the double pointer also makes use of the two-paragraph property. When a pointer is determined at a certain position, the other pointer can fall at a certain clear segmentation point, so that the left half meets the requirements, while the right half does not.

So what other properties can we use? At this point, pay attention to the “small value” content of the data range.

[1, 26] [1, 26] [1, 26] [1, 26] [1, 26] [2, 26] [1, 26] [1, 26] [2, 26] [1, 26] [1, 26] [26, 26] [1, 26] [26, 26] [1, 26] [26, 26] [1, 26] [26, 26]

You’ll notice that when you determine the number of character types a length contains, the interval regains its two-paragraph nature.

When we use double Pointers:

  1. Moving the right endpoint to the right must result in an increase (or no change) in the number of character types
  2. Moving the left endpoint to the right invariably results in a reduced (or unchanged) number of character types

Of course, we also need to keep track of how many characters meet the requirement (k or more) and update the answer when all characters in the interval match.

Code:

class Solution {
    public int longestSubstring(String s, int k) {
        int ans = 0;
        int n = s.length();
        char[] cs = s.toCharArray();
        int[] cnt = new int[26];
        for (int p = 1; p <= 26; p++) {
            Arrays.fill(cnt, 0);
            // tot represents the number of character types in the [j, I] interval; Sum indicates the number of character types whose occurrence times are at least K
            for (int i = 0, j = 0, tot = 0, sum = 0; i < n; i++) {
                int u = cs[i] - 'a';
                cnt[u]++;
                // If the value is 1 after being added to CNT, the total number of characters +1
                if (cnt[u] == 1) tot++;
                // if added to CNT equals k, it indicates that the character is changed from unqualified to qualified, and the number of qualified is + 1
                if (cnt[u] == k) sum++;
                // When the number of character types in the range tot exceeds the current limit p, we need to remove some letters, i.e. the "left pointer" moves right
                while (tot > p) {
                    int t = cs[j++] - 'a';
                    cnt[t]--;
                    // If it is 0 after being added to CNT, the total number of characters is -1
                    if (cnt[t] == 0) tot--;
                    // CNT = k-1; // CNT = k-1
                    if (cnt[t] == k - 1) sum--;
                }
                // When all characters are correct, update the answer
                if (tot == sum) ans = Math.max(ans, i - j + 1); }}returnans; }}Copy the code
  • Time complexity: Enumerate 26 possibilities, each of which scans the array once. The complexity is O(n)O(n)O(n)
  • Space complexity: O(n)O(n)O(n)

Answering questions

Why enumerate the 26 possibilities (the number of character types included in the answer) first? What is the essence?

The solution is simply stated to restore the duality of the interval, but the main purpose is to restore the monotony of the two-pointer.

Double pointer monotonicity: when one side of the double pointer (such as the right pointer) is determined, the other side of the double pointer (such as the left pointer) can clearly fall at a segmentation point, so that one side is valid and the other side is invalid.

We need to restore the monotony of the two-pointer to solve the problem of “not knowing which direction the pointer should move”.

The complete thought process is as follows:

First of all, we know that the characters to the left of the answer substring and the characters to the right of the answer substring must not appear in the substring, otherwise it would not be optimal.

But if we just start from the nature, simple solution would be to use a sliding window, continuous adjustment of sliding window around the border, to make it meet the “left on the right side of the border on the left side of the characters and the right boundary character will not appear in the window”, this is actually double pointer solution, but if you don’t finalized first (enumeration) the number of characters contained the answer, The double pointer here is not monotonic.

In other words, you can’t do logic using this property alone.

The problem is: the properties are correct, but not yet directly usable.

Therefore, we need to use the finiteness of the number of characters (enumerable) as a starting point to make the property that “the characters to the left of the answer substring and the characters to the right of the answer substring must not appear in the substring” monotonic under the realization of double Pointers. So that’s what the solution says to make the interval two segments again.

Then through the 26 possibilities (the number of character types contained in the answer), applying a sliding window to each possibility (guaranteed by the above properties), you can get the maximum of each possibility (local optimal), and from the maximum of all possibilities you can get the answer (global optimal).

review

In fact, the breakthrough analysis of this problem is similar to 1178.

Solution: When conventional analysis fails, pay attention to “small” values in the data range. Because small numbers are “enumerable,” they are often an important (if not the only) way to solve problems or reduce complexity.

Get used to this “breakthrough” approach

The last

This is the No.* of our “Brush through LeetCode” series, which started on 2021/01/01. There are 1916 topics on LeetCode as of the start date, some of which have locks.

In this series of articles, in addition to explaining how to solve the problem, I will also present the most concise code possible. If a general solution is involved, the corresponding code template will also be used.

In order to facilitate the students to debug and submit the code on the computer, I set up the relevant warehouse: github.com/SharingSour…

In the repository, you’ll see links to the series of articles, the corresponding code for the series of articles, the LeetCode source links, and other preferred solutions.