Hash table efficient processing of strings

Hash table (hash table) is a very efficient search data structure, in principle and other search is not the same, it avoids the tedious repeated comparison between keywords, but directly one step in place to find the results. Of course, this brings with it the disadvantage of having no connection between the records. It should be said that hash tables are very good for finding data that requires high performance and no relationship between records. Note the choice of hash functions and how conflicts are handled.

Hash tables take O(1) time to insert, delete, and search data. However, Hash tables do not guarantee the order of data in the table. Therefore, the maximum or minimum search time in the Hash table is O(N).

 

/* Complete the function of filtering repeated characters in the string.

[Input] :1. Constant string; 2. String length; 3. [out] outputs the filtered character string.

[Output] : The filtered character string.

* /

Idea 1, circular decision method. Step 1: Record the first character in the string. The second step, and then from the second character, determine whether it is the same as the character before it, if not, the statistics; In the same case, the loop continues until the end of the string (‘ \0 ‘is encountered). Time complexity: O (n2).

Idea 2: Hash table filtering. The first step is to initialize a hash table to store characters (keys) and the number of occurrences of characters. Step 2, iterate through the hash table for statistical counting; In the third step, the output count is 1 and the output count is more than 1 (output one time). Time complexity: O (n).

// Loop filter to filter out repeated characters

void stringFilter(const char*pInputStr, long lInputLen, char *pOutputStr)
{
       if(pInputStr== NULL || lInputLen == 0 || pOutputStr == NULL)
       {
              return;
       }
      
       intnCnt = 0;
       *pOutputStr= pInputStr[0];            // Do the first one first
       ++nCnt;
      
       intnNotEqualCnt = 0;                 // Statistics count
       for(inti = 1; i < lInputLen; i++)
       {
              nNotEqualCnt= 0;
              for(intj = i- 1; j >=0; j--)
              {
                     if(pInputStr[i]!= pInputStr[j])
                     {
                            ++nNotEqualCnt;
                     }
              }
             
              if(nNotEqualCnt== i)  // Not the same as the previous ones.{ pOutputStr[nCnt++]= pInputStr[i]; }}//endfor
       pOutputStr[nCnt]= '\ 0';
}
Copy the code

\

// Hash filters duplicate characters in a string

void stringFilterFast(const char*pInputStr, long lInputLen, char *pOutputStr)
{
       charrstChar = '\ 0';
       boolbNotRepeatFound = false;
       constunsigned int size = 256;
       unsignedint hashTable[size];
       constchar* pHashKey = pInputStr;
       intoutPutCnt = 0;
      
       if(pInputStr== NULL)
       {
              return;
       }
      
       // Initialize the hash table
       for(unsignedint i = 0; i < size; i++)
       {
              hashTable[i]= 0;
       }
      
       // Read pString into the hash table
       while(*pHashKey! ='\ 0')
       {
              cout<< *pHashKey << "\t";
              hashTable[*pHashKey]++;    // Statistics count
              pHashKey++;
       }    
 
       // Read the hash table, store the hash table that occurs only once, store the hash table that occurs more than once.
       pHashKey= pInputStr;
       while(*pHashKey! ='\ 0')
       {
              if((hashTable[*(pHashKey)])== 1)   // Only once,
              {
                     pOutputStr[outPutCnt++]= *pHashKey;
              }
              elseif((hashTable[*(pHashKey)]) > 1) // If there is more than one time, count the first time
              {
                     pOutputStr[outPutCnt++]= *pHashKey;
                     hashTable[*(pHashKey)]= 0;
              }
              pHashKey++;
       }
       pOutputStr[outPutCnt]= '\ 0';
 
}
 
int main(a)
{
       constchar* strSrc = "desdefedeffdsswwwwwwwwwwdd";//"desdefedeffdssw";
       char*strRst =new char[strlen(strSrc)+1];
       stringFilter(strSrc,strlen(strSrc), strRst);
       cout<< strRst << endl;
       if (NULL! = strRst){delete[] strRst;  strRst = NULL; }return 0;
}
Copy the code


\

// Hash table searches for the first non-repeating character in a string

[function] : Search for the first non-repeating character in a string.

[Input] : string.

[Output] : The first non-repeating character.

Time complexity O (n), the idea is similar to the above hash table filtering method.

char FirstNotRepeatingChar(constchar* pString)
{
       charrstChar = '\ 0';
       boolbNotRepeatFound = false;
       constunsigned int size = 256;
       unsignedchar hashTable[size];
       constchar* pHashKey = pString;
 
       if(pString== NULL)
       {
              returnrstChar;
       }
 
       // Initialize the hash table
       for(unsignedint i = 0; i < size; i++)
       {
              hashTable[i] = 0;
       }
      
       // Store the pString into the hash table
       while(*pHashKey! ='\ 0')
       {
              hashTable[*(pHashKey++)]++;    // Statistics count
       }
 
       // Read the hash table, find the first character =1, bNotRepeatFound used to find. .
       pHashKey= pString;
       while(*pHashKey! ='\ 0')
       {
              if((hashTable[*(pHashKey)]) == 1)
              {
                     bNotRepeatFound= true;
                     rstChar= *pHashKey;
                     break;
              }
              pHashKey++;
       }
 
       if(bNotRepeatFound)
       {
              cout<< "The first not Repeate char is " << rstChar <<endl;
       }
       else
       {
              cout<< "The first not Repeate char is not Exist " << endl;
       }
 
       returnrstChar;
}
 
int main(a)
{
       constchar* strSrc = "google";
       constchar* strSrc2 = "[email protected]";
       constchar* strSrc3 = "aabbccddeeff";
       constchar* strsrc4 = "11111111";  
                                                                                                                   
       constchar* strArray[4] = {strSrc, strSrc2, strSrc3, strsrc4};
 
      for(inti = 0; i < 4; i++)
      {
              FirstNotRepeatingChar(strArray[i]);
       }
 
       return0;
}
Copy the code

For a massive file with different urls, minimize the time complexity to remove duplicate urls. Can refer to the string processing hash table filtering method. However, the large file here is equivalent to the previous string, and the URL here is equivalent to the previous character.