Divide and conquer solution: the longest substring with at least K repeated characters

The common applications of divide and conquer strategy include dichotomy (divide and conquer is applied to boundary division), merge sort and quick sort. See for implementation and detailed explanation Sorting summary (insert sort, merge sort, heap sort, quick sort, bucket sort)

Here is a brief summary of the idea of divide and conquer. A problem can be divided into many sub problems with the same structure, and the sub problems and the original problem are of the same nature, and the sub problems do not affect each other, that is, they are independent of each other. At the same time, the solution of a subproblem is the solution of the original problem.

Next, let's get to the point and take LeetCode 395. Longest substring with at least K repeated characters For example, sort out the idea of divide and conquer.

subject

Give you a string s and an integer k, please find the longest substring in s, and the number of occurrences of each character in the substring shall not be less than k. Returns the length of this substring.

Example 1:
Input: s = "aaabb", k = 3
 Output: 3
 Explanation: the longest substring is "aaa" ,among 'a' Repeated three times.

Example 2:
Input: s = "ababbc", k = 2
 Output: 5
 Explanation: the longest substring is "ababb" ,among 'a' Repeated twice, 'b' Repeated three times.

thinking

1, Divide and conquer

   for this problem, through observation, it is not difficult to find that the characters with frequency less than the required frequency k in the string will not appear in the longest substring. Then, using this property, the whole string can be divided into left and right substrings, and then the longest substrings of left and right substrings can be solved separately, Find the longest substring among all the longest substrings that meet the problem design and return its length to complete the problem requirements. Thus, it can be established that the divide and conquer method can be used to solve problems.

The specific ideas are as follows:
   first traverse the whole string, count the frequency of each character, and traverse the string from the beginning. If the frequency of the current character is found to be less than the required frequency, it means that the longest substring will not contain this character. Then, the whole character string can be decomposed into the maximum value of the longest substring of the left and right substrings where the currently traversed character is the dividing point. give an example:

For a string, if the substring is required to appear at least k Times, then if some letters appear less than k,
These letters must not appear in the longest substring, and these letters divide the whole character substring into small segments, which may be the longest
 However, due to being segmented, it is still necessary to check this short paragraph. If some letters appear less than k,Will continue to divide the segment,
Such as string"aacbbbdc",At least 2 occurrences are required,We record the left-right closed interval,,
first round[0,7],handle"aacbbbdc",d There is only one dissatisfaction, so the interval is solved recursively[0,5],[7,7]
Second round[0,5],handle"aacbbb",  c There is only one dissatisfaction, so the interval is solved recursively[0,1],[3,4] 
Second round[7,7],handle"c",       c There is only one dissatisfaction, and recursion will not continue
 Third round[0,1],handle"aa",      Meet occurrence times>=2,ret=2
 Third round[3,4],handle"bbb",     Meet occurrence times>=2 ret=3;

   note that in code implementation, divide and conquer and recursion are usually associated, because splitting a large problem into equal sub problems, obtaining the answer of the sub problem and updating the answer of the large problem means the generation of recursive call. There are three elements involved in recursion: return condition, behavior in this layer of recursion and return value.

   for return conditions and return values: ① if K is not greater than 2, the length of S can be returned directly; ② If s is empty, or the length of S is less than k, return 0 directly; ③ If the occurrence frequency of characters in the whole string is greater than or equal to K, the length of the string is returned directly; ④ Otherwise, it returns the larger value of the longest substring length of the divided left and right substrings;

   for recursive actions in this layer, it is to find the partition point / interval and make recursive calls.

    int longestSubstring(string s, int k) {
        if(k <= 1) return s.size();
        if(s.empty() || s.size() < k) return 0;

        int hash[26] = {0};
        for(auto ch : s) ++hash[ch - 'a'];

        int i = 0;
        while(i < s.size() && hash[s[i] - 'a'] >= k) ++i;
        if(i == s.size()) return s.size();

        int l = longestSubstring(s.substr(0, i), k);
        //Optimize the operation, expand the segmentation point, find the segmentation substring, and traverse i to the character position with occurrence frequency > = K
        while(i < s.size() && hash[s[i] - 'a'] < k) i++;
        //i = i == s.size() - 1 ? i : i + 1;
        int r = longestSubstring(s.substr(i, s.size()), k);

        return max(l, r);
    }
Time complexity O(n * ∣Σ∣), Spatial complexity O(∣Σ∣ * ∣Σ∣),among n Is the length of the string,∣Σ∣Maximum recursion depth.

2, Sliding window

As an extension, it is also a method, which is not intuitive.

For substrings that meet the requirements of the topic,The type of characters contained in it is limited to the character set of lowercase letters,Size∣Σ∣ = 26
 Suppose that the longest substring satisfying the question contains the number of character types type,Maintain a sliding window,Record the following information in this window;
1.Left boundary index l;
2.Right boundary index r;
3.The number of occurrences of each character in the window hash[s[idx] - 'a'];
4.The number of occurrences in the window is less than required K Number of character types lessK

With the above window information,Then the idea of maintaining the sliding window is available,The goal is to maintain the number of character types in the sliding window equal to type
 During maintenance,The whole is to move the right boundary of the window to the right l(Time complexity O(n))):
1.If the category in the window is not reached type Then continue to move the right boundary to the right l,Update window information
2.If the number of categories in the discovery window is greater than type,You need to move the left boundary to the right at this time r,Update window information at the same time
3.If the number of categories in the discovery window is equal to type,And lessK Equal to 0,Then the sliding window at this time is the substring that meets the problem setting,Update the length of the longest substring
    int longestSubstring(string s, int k) {
        int res = 0;
        for(int types = 1; types <= 26; ++types) {
            int l = 0, r = 0;
            int hash[26] = {0};
            int type = 0, lessK = 0;
            while(r < s.size()) {
                hash[s[r] - 'a']++;
                if(hash[s[r] - 'a'] == 1) { type++; lessK++;}
                if(hash[s[r] - 'a'] == k) lessK--;

                while(type > types) {
                    hash[s[l] - 'a']--;
                    if(hash[s[l] - 'a'] == k - 1) lessK++;
                    if(hash[s[l] - 'a'] == 0) { type--; lessK--;}
                    l++;
                }
                if(lessK == 0) res = max(res, r - l + 1);
                r++;
            }
        }
        return res;
    }
The time complexity is O(∣Σ∣*(n + ∣Σ∣)),The space complexity is O(∣Σ∣)

Tags: Algorithm string leetcode divide and conquer

Posted by tex1820 on Fri, 15 Apr 2022 00:34:41 +0930