warning

In this article, I’ll introduce you to the different search algorithms and their complexity. Because we try to be easy to understand, so the length may be longer, we can Mark down first, every day take time to read and understand a little bit. Welcome to the Github Repo, which will be updated all the time.

The opening

Similar to sorting, search, or search, is one of the most commonly used algorithms. Whether we’re searching a database or a file, we’re actually using some kind of search algorithm to locate the data we’re looking for.

Linear search

The most common way to perform a search is to compare each item to the data we are looking for, which is called a linear or sequential search. It’s the most basic way to perform a search. If I have n items in my list. In the worst case. We have to search n items to find a particular item. Let’s go through an array to find an item.

function linearSearch(array $arr, int $needle) {
    for ($i = 0, $count = count($arr); $i < $count; $i++) {
        if ($needle === $arr[$i]) {
            return true; }}return false;
}
Copy the code

The complexity of linear lookup

best time complexity O(1)
worst time complexity O(n)
Average time complexity O(n)
Space time complexity O(1)

Binary search

The average time complexity or worst time complexity of a linear search is O(n), which does not change with the order of the array to be searched. So if the items in the array are sorted in a particular order, we don’t have to do a linear search. We can get better results by performing selective searches. The most popular and famous search algorithm is binary search. It’s a little bit like the binary search tree we talked about earlier, but we don’t have to construct a binary search tree to use this algorithm.


function binarySearch(array $arr, int $needle) {
    $low = 0;
    $high = count($arr) - 1;

    while ($low <= $high) {
        $middle = (int)(($high + $low) / 2);

        if ($arr[$middle] < $needle) {
            $low = $middle + 1;
        } elseif ($arr[$middle] > $needle) {
            $high = $middle - 1;
        } else {
            return true; }}return false;
}
Copy the code

In a binary search algorithm, we start at the middle of the data, check whether the middle item is smaller or larger than the item we are looking for, and decide which way to go. In this way, we split the list in half and discard one half completely, as shown in the following image.

Recursive version:

function binarySearchRecursion(array $arr, int $needle, int $low, int $high)
{
    if ($high < $low) return false;

    $middle = (int)(($high + $low) / 2);

    if ($arr[$middle] < $needle) {
        return binarySearchRecursion($arr, $needle, $middle + 1, $high);
    } elseif ($arr[$middle] > $needle) {
        return binarySearchRecursion($arr, $needle, $low, $middle - 1);
    } else {
        return true; }}Copy the code

Binary search complexity analysis

For each iteration, we split the data in half, discarded half, and used the other half for search. After one, two and three iterations, respectively, our list length gradually reduces to n/2, n/4, n/8… . Therefore, we can see that after k iterations, only n over 2 to the k term will be left. So we end up with n over 2 to the k is equal to 1, and then we take the log of both sides and we get k is equal to log of n, which is the worst run time for the binary search algorithm.

best time complexity O(1)
worst time complexity O(log n)
Average time complexity O(log n)
Space time complexity O(1)

Duplicate binary search

In a scenario where we have an array of duplicate data, if we want to find the first occurrence of 2 in the array, the previous algorithm will return the fifth element. However, as we can clearly see from the image below, the correct result tells us that it is not the fifth element, but the second element. Therefore, the above binary search algorithm needs to be modified to make it a repeated search that does not stop until the first appearance of the element.

function repetitiveBinarySearch(array $data, int $needle)
{
    $low = 0;
    $high = count($data);
    $firstIndex = - 1;

    while ($low <= $high) {
        $middle = ($low + $high) >> 1;

        if ($data[$middle] === $needle) {
            $firstIndex = $middle;
            $high = $middle - 1;
        } elseif ($data[$middle] > $needle) {
            $high = $middle - 1;
        } else {
            $low = $middle + 1; }}return $firstIndex;
}
Copy the code

First we check whether mid corresponds to the value we are looking for. If so, we specify the intermediate index as the index that appeared the first time, and we continue to check the element to the left of the intermediate element to see if the value we are looking for appears again. And then continue iterating untilHigh. If the value is not found again, the first occurrence is the value of the item’s first index. If not, return -1 as usual. Let’s run a test to see if the code is correct:

public function testRepetitiveBinarySearch(a)
{
    $arr = [1.1.1.2.3.4.5.5.5.5.5.6.7.8.9.10];

    $firstIndex = repetitiveBinarySearch($arr, 6);

    $this->assertEquals(11, $firstIndex);
}
Copy the code

The results were found to be correct.

So far, we can conclude that binary search is definitely faster than linear search. The prerequisite for all this, however, is that the array is sorted. Applying binary search to an unsorted array results in incorrect results. So there might be a situation where we have an array, and we’re not sure if it’s sorted or not. Now the question is, should I sort the array first and then apply binary search? Or do you stick with the linear search algorithm?

Small thinking

For an array of n items, and they are not sorted. Since we knew that binary search was faster, we decided to sort it first and then use binary search. However, we know that the best sorting algorithm has a worst-case time complexity of O(nlogn), and a worst-case complexity of O(logn) for binary search. So, if we sort and apply binary search, the complexity is going to be order nlogn.

However, we also know that for any linear or sequential search (sorted or unsorted), the worst time complexity is O (n), which is clearly better than the above scheme.

Consider another case where we need to search a given array multiple times. We’re going to express k as the number of times we want to search the array. If k is one, then we can easily apply our previous linear search method. If the value of k is smaller than the size of the array, let’s use n for the time being. If k is closer to or greater than n, then we have some problems with linear methods. Assuming k = n, linear search will have complexity O (n2). Now, if we sort and then search, then even if k is larger, it’s only going to take order nlogn time to sort. Then, the complexity of each search is O (logn), and the complexity of n searches is O (nlogn). If we take the worst-case run here, and sort and search k times the total complexity is order nlogn, which is obviously better than the sequential search.

We can conclude that if some of the search operations are smaller than the length of the array, it is better not to sort the array and just do the sequential search. However, if the number of search operations is greater than the size of the array, it is best to sort the array first and then use binary search.

There are many different versions of binary search algorithms. Instead of selecting an intermediate index every time, we can make a computational decision to choose which index to use next. We now look at two variations of the binary search algorithm: interpolating search and exponential search.

The interpolation search

In a binary search algorithm, the search process always starts in the middle of the array. If an array is evenly distributed, and the data we’re looking for may be near the end of the array, then searching in the middle may not be a good choice. In this case, interpolating searches can be very useful. Interpolating search is an improvement of binary search algorithm. Interpolating search can reach different positions based on the value of the search. For example, if we are searching for values near the beginning of the array, it will go directly to the first part of the array instead of the middle. Calculate the position using the formula shown below

As you can see, we will go from the generic mid = (low * high)/2 to a more complex equation. If the value of the search is closer to ARR [high], this formula returns a higher index, and if the value is closer to ARR [low], this formula returns a lower index.

function interpolationSearch(array $arr, int $needle)
{
    $low = 0;
    $high = count($arr) - 1;

    while($arr[$low] ! = $arr[$high] && $needle >= $arr[$low] && $needle <= $arr[$high]) { $middle = intval($low + ($needle - $arr[$low]) * ($high - $low) / ($arr[$high] - $arr[$low]));if ($arr[$middle] < $needle) {
            $low = $middle + 1;
        } elseif ($arr[$middle] > $needle) {
            $high = $middle - 1;
        } else {
            return$middle; }}if ($needle == $arr[$low]) {
    	return $low;
    } 
    
    return - 1;
    
}
Copy the code

Interpolating search requires more computational steps, but if the data is evenly distributed, the average complexity of this algorithm is O(log(log n)), which is much better than the complexity O(logn) of binary search. In addition, we must be careful if the values are not evenly distributed. In this case, the performance of the interpolated search may need to be reevaluated. Next we’ll explore another variant of binary search called exponential search.

Index search

In binary search, we search the entire list for the given data. Exponential search improves binary search by determining the lower and upper bounds of the search, so that we don’t search the entire list. It reduces the number of elements we compare during the search. The index search is done in two steps:

1. We determine the boundary size by looking for the first exponent K, where the value 2^k is greater than the search term. Now, 2 to the k and 2 to the k minus 1 are the upper limit and lower limit, respectively. 2. Use the above boundaries for binary search.

Let’s look at the code for the PHP implementation

function exponentialSearch(array $arr, int $needle): int
{
    $length = count($arr);
    if ($length == 0) return - 1;

    $bound = 1;

    while ($bound < $length && $arr[$bound] < $needle) {
        $bound *= 2;
    }

    return binarySearchRecursion($arr, $needle, $bound >> 1, min($bound, $length));
}
Copy the code

If we mark the position where $needle appears as bit I, then the time complexity of our first step is O(logi). That means our while loop needs to execute O(logi) times in order to find the upper bound. Because the next step is to apply a binary search, and the time is also order logi. So let’s assume that j is the number of times we executed our last while loop, so this binary search we’re going to be searching in the range from 2^j minus 1 to 2^j, and j is equal to logi, which is

So the time complexity of our binary search is log2 over this range, which is

So the entire exponential search time is two O logi, minus the constants, is order logi.

best time complexity O(1)
worst time complexity O(log i)
Average time complexity O(log i)
Space time complexity O(1)

Hash lookup

Hash tables can be very efficient data structures for search operations. In a hash table, each piece of data has a unique index associated with it. If we know which index to look at, we can easily find the corresponding value. Normally, in other programming languages, we would have to use separate hash functions to calculate the hash index of the stored value. Hash functions are designed to generate the same index for the same value and avoid collisions.

The array itself is a hash table in PHP’s underlying C implementation, and since arrays are dynamic, you don’t have to worry about array overflow. We can store values in associative arrays so that we can associate values with keys.

function hashSearch(array $arr, int $needle)
{
    return isset($arr[$needle]) ? true : false;
}
Copy the code

Tree search

One of the best ways to search hierarchical data is to create a search tree. In understanding and implementing trees, we learned how to build binary search trees and improve search efficiency, and introduced different methods of traversing trees. Now, let’s move on to the two most common methods of searching for trees, commonly referred to as breadth-first (BFS) and depth-first (DFS).

Breadth First Search (BFS)

In a tree structure, the root is connected to its children, and each child node can continue to be represented as a tree. In a breadth-first search, we start with the node (primarily the root node) and visit all neighboring nodes first before visiting other neighbor nodes. In other words, we have to move step by step when using BFS.

Using BFS, you get the following sequence.

The pseudocode is as follows:

procedure BFS(Node root)
    Q := empty queue
    Q.enqueue(root)
    
    while(Q ! = empty) u := Q.dequeue()for each node w that is childnode of u
            Q.enqueue(w)
        end for each
    end while
end procedure        
Copy the code

Here is the PHP code.

class TreeNode
{
    public $data = null;
    public $children = [];

    public function __construct(string $data = null)
    {
        $this->data = $data;
    }

    public function addChildren(TreeNode $treeNode)
    {
        $this->children[] = $treeNode; }}class Tree
{
    public $root = null;

    public function __construct(TreeNode $treeNode)
    {
        $this->root = $treeNode;
    }

    public function BFS(TreeNode $node): SplQueue
    {
        $queue = new SplQueue();
        $visited = new SplQueue();

        $queue->enqueue($node);

        while(! $queue->isEmpty()) { $current = $queue->dequeue(); $visited->enqueue($current);foreach ($current->children as$children) { $queue->enqueue($children); }}return$visited; }}Copy the code

Full examples and tests can be found here.

If you want to find out if a node exists, you can simply add a simple condition to the current node value. BFS worst time complexity is O (| V | E | | +), in which V is the number of vertices or nodes, E is the edge or the number of connections between nodes, the worst space complexity is O (| V |).

The BFS of the figure is similar to the above, but slightly different. Since graphs are looping (you can create loops), you need to make sure that we don’t repeat visits to the same node to create an infinite loop. To avoid revisiting graph nodes, you must keep track of nodes that have already been visited. You can use queues or you can use graph coloring algorithms.

Depth first Search (DFS)

Depth first search (DFS) refers to a search starting at a node and reaching as deep as possible from the target node through branches. DFS differs from BFS in that, simply put, DFS digs deeper rather than diffuses first. The DFS reaches the end of the branch and then backtracks up and moves to the next available adjacent node until the search is complete. It’s still the tree up there

This time we will get an invalid traversal order:

Start at the root, and then access the first child, which is 3. We then go to the child node of 3 and do this again and again until we reach the bottom of the branch. In DFS, we will do this recursively.

procedure DFS(Node current)
    for each node v that is childnode of current
       DFS(v)
    end for each
end procedure

Copy the code
 public function DFS(TreeNode $node): SplQueue
{
    $this->visited->enqueue($node);

    if ($node->children) {
        foreach ($node->children as $child) {
            $this->DFS($child); }}return $this->visited;
}
Copy the code

If you need to use an iterative implementation, you must remember to use a stack, not a queue, to track the next node to be accessed. The following uses an implementation of the iterative approach

public function DFS(TreeNode $node): SplQueue
{
    $stack = new SplStack();
    $visited = new SplQueue();

    $stack->push($node);

    while(! $stack->isEmpty()) { $current = $stack->pop(); $visited->enqueue($current);foreach ($current->children as$child) { $stack->push($child); }}return $visited;
}
Copy the code

This looks very similar to the BFS algorithm. The main difference is the use of stacks rather than queues to store the nodes being accessed. It makes a difference to the outcome. The above code will print 8, 10, 14, 13, 3, 6, 7, 4, 1. This is different from the algorithm output that we use with iteration, but it’s actually fine.

Because you use a stack to store the children of a particular node. For the root node with a value of 8, the first child node with a value of 3 is pushed first, followed by 10. Since 10 is pushed later, it follows LIFO. So, if we implement DFS with a stack, the output always starts from the last branch to the first. You can make some small adjustments in the DFS code to achieve the desired effect.

public function DFS(TreeNode $node): SplQueue
{
    $stack = new SplStack();
    $visited = new SplQueue();

    $stack->push($node);

    while(! $stack->isEmpty()) { $current = $stack->pop(); $visited->enqueue($current); $current->children = array_reverse($current->children);foreach ($current->children as$child) { $stack->push($child); }}return $visited;
}
Copy the code

Since the stack follows a last-in, first-out (LIFO), by reversing, you can ensure that the First node is accessed First, and because the order is reversed, the stack is actually working as a queue. If we are searching for a binary tree, we don’t need any inversion, because we can choose to push the right child first and then the left child first.

The time complexity of DFS is similar to BFS.