Given a string S, consider all of its repeated substrings (consecutive substrings of S, occurring twice or more, may overlap). Returns any repeated substring with the longest possible length. (If S does not contain repeating substrings, the answer is “”.) 2 <= S.length <= 10^5 S Consists of lowercase letters.

Example 1

Input: “banana” Output: “ana”

Example 2

Input: abcd Output: “”

1. How to solve the problem

First of all, if complexity is not taken into account, it is obvious that the problem can be solved by traversing all substrings of length to determine whether there is repetition until there is no repetition. But this is a very difficult problem, and the length of S in the problem can be as high as 10^5, and the time complexity will explode, so it’s definitely not the desired solution. The main source of complexity is that the length of the substring L ranges from 1 to s.length () -1, which is the outer loop. For each length L, we need to use its length as a window to get S.length() -l + 1 substring. There are so many substrings that we need to compare whether there are duplicate substrings. My own way of reducing the complexity of this problem is from the idea of solving the longest common substring problem in dynamic programming, which is that I’m matching a substring of length L as long as the two substrings are equal at L minus 1. So the first L-1 character does not need to be compared, only the last character needs to be compared for equality. If index=1 and index=3 are equal at L=2, then the length of L can be extended by 1 as long as S[1 + 2] == S[3 + 2]. The values at index=2 and index=4 are equal at L=2, but S[2 + 2]! = S[4 + 2] because S[6] is out of the range of the string. Assuming S[6] exists, but is different, strings starting at 2 and 4 cannot be extended to L+1 = 3, so the substring starting at these two positions cannot match until the next round of processing. By extending the length of L until there are no more substrings left to extend, we get the maximum length of the substring. This method can reduce the time complexity of string matching to a certain extent, because inappropriate substrings are excluded early, and only the substrings that can be extended are left behind. Moreover, each extension only needs to compare one character pair of the corresponding substring, and there is no need to repeat the comparison for the already compared ones. So I wrote the following code, which ran out of simple use cases, but eventually ran out of time. The main reason for considering timeout is that for a very long string, when L is very small, there are bound to be many matching string pairs, and two pairs of them need to be compared to determine whether to discard or continue to expand. Then L needs to expand from small to large once, and the outer loop needs to do many times, and the inner loop has a high time complexity when L is small. In the worst case the time complexity is on the order of n*O(n^2), so the time complexity needs to be improved.

class Solution {
public:
	string longestDupSubstring(string S) {
		vector<vector<int>> table(26);
		int len = S.length();
		for (int i = 0; i < len; i++) {
			table[S[i] - 'a'].push_back(i);
		}
		vector<vector<int>> pairs;
		for (int i = 0; i < 26; i++) {
			int numOfSame = table[i].size();
			if (numOfSame > 1) {
				for (int a = 0; a < numOfSame; a++) {
					for (int b = a + 1; b < numOfSame; b++) {
						vector<int> tmp; tmp.push_back(table[i][a]); tmp.push_back(table[i][b]); pairs.push_back(tmp); }}}}if (pairs.empty())
			return "";
		int result = 1;
		int start = 0;
		while(! pairs.empty()) { start = pairs[0] [0];
			for (autoit = pairs.begin(); it ! = pairs.end(); ) {int index1 = (*it)[0] + result;
				int index2 = (*it)[1] + result;
				if (index1 < len && index2 < len && S[index1] == S[index2]) {
					it++;
				}
				else {
					it = pairs.erase(it);
				}
			}
			result++;
		}
		string sub = S.substr(start, result - 1);
		returnsub; }};Copy the code

2. Official solution: binary search + Rabin-Karp string encoding

The official solution is also improved in the direction of time complexity of the violent solution. However, the improvement is in two aspects: first, the expansion of L is not to increase or decrease one by one, but to use binary search to convert the complexity of O(n) into O(logn); In addition, the rabin-Karp encoding method is adopted to reduce the complexity of O(L * n) string matching to O(1 * n). The final time is only order n logn. After I understood the way to solve the problem, I began to try to finish the code by myself. However, the difficulty of the problem was really different. Even if I followed this way, I still encountered many problems.





class Solution {
public:
	// If there is a duplicate substring of length L, return index of the duplicate substring, otherwise return -1
	int RabinKarp(vector<int>& nums, string& S, int L, long long mod) {
		set<long long> table;
		long long code = 0;
		// Evaluates the string encoded value of the first window
		for (int i = 0; i < L; i++) {
			code = code * 26 + nums[i];
			code = code % mod;
		}
		table.insert(code);
		long long aL = 1;
        for(int i = 0; i < L - 1; i++){
            aL = (aL * 26) % mod;
        }
		// The window moves backwards once to update the value of code
		for (int i = L; i < nums.size(); i++) {
			int firstNum = nums[i - L];
			long long num = (firstNum * aL) % mod;
			code = ((code - num + mod) % mod * 26) % mod  + nums[i];
			code = code % mod;
			auto it = table.find(code);
			if (it == table.end())
				table.insert(code);
			else {
				return i - L + 1; }}return - 1;
	}

	string longestDupSubstring(string S) {
		// This function implements binary L, passing the string matching operation to another function
		int n = S.length();
		long long mod = (long long)pow(2.32);
		vector<int> nums(n, 0);
		for (int i = 0; i < n; i++) {
			nums[i] = S[i] - 'a';
		}
		int left = 1;
		int right = n;
		int L;
		while(left ! = right) { L = left + (right - left) /2;
			if(RabinKarp(nums, S, L, mod) ! =- 1) {
				left = L + 1;
			}
			else
				right = L;
		}
		int start = RabinKarp(nums, S, left - 1, mod);
		if (start == - 1)
			return "";
		return S.substr(start, left - 1); }};Copy the code

3. Summary

Difficult problems have really high requirements for algorithm efficiency. Usually, I don’t pay much attention to the use of data structure, because time complexity is not a bottleneck. Basically, I use vector whenever I can use it. However, in this case, extreme optimization time efficiency is required. Even if the standard solution is used, the implementation does not pay much attention to the impact of data structure selection, resulting in the complexity of the whole algorithm still cannot be reduced due to this bottleneck. It is really a pity.