Unit 21: Searching
Learning Objectives
After this unit students should be familiar with the linear search and binary search algorithms, including understanding how the algorithms work, implementing the algorithms, arguing their correctness, and analyzing their running time.
Linear Search
Let's continue the discussion on efficiency on one of the fundamental problems in computing: how to look for something. Given a list of items
Let's write a function to solve this.
1 2 3 4 5 6 7 8 9 10 11 12 13 |
|
What is the worst-case running time, expressed in Big-O notation, of the function above? Suppose the query
Can we do better? It turns out that this running time
In previous units, we have seen a recursive divide-and-conquer solution for max
. We can easily adapt it to search as well:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 |
|
We then call this function as:
1 |
|
In the code above, we first handle the base case where i > j
, which means that the sublist list[i]..list[j]
to search for q
in, is empty. In this case, q
is not found and we return -1. If the list list[i] .. list[j]
is not empty, we check the element in the middle if it is q
. If we have not yet find q
, then we first search the left sublist. If still not found, we search the right sublist and return the result.
In the worst case, when q
is not in the list, we still have to check through every element in the list. So the running time is still
We can also argue that the running time of the code above is
Here,
Binary Search
But, do we always have to check every element in the list? It turns out that, like many real-life situations, if the input list is sorted, we do not have to scan through every element. We can eliminate a huge chunk of the elements based on whether a chosen element is bigger or smaller than
Suppose that the input list is sorted in increasing order. Pick a random element
Suppose that
We can modify the earlier recursive function into the one below.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 |
|
We have changed the function so that it only searches either the left side or the right side, but never both.
The search algorithm above is called binary search since it repeatedly cut the range of values to search by half.
Why is it correct?
It is not obvious at first glance that the code above correctly searches for
Let's analyze this function more systematically by writing an assertion for this function. What we want to do here is to eliminate elements in the array that cannot possibly contain list[i]
.. list[j]
range. In other words, we want to assert that
1 |
|
at the beginning of the function. In other words, this is a precondition for the function.
Let's see if this precondition is true at the beginning. Since list[0]..list[i-1]
and list[j+1]..list[n-1]
are empty, so the assertion is true.
What happens if list[0]
..list[i-1]
and the range list[j+1]
..list[n-1]
overlap. We can be sure that list
.
Let's see how we ensure this assertion is true in the recursive call.
1 2 3 4 5 |
|
Line 3 of the snippet above is invoked only if list[mid] > q
. Since the array list
is sorted, we know for sure that any element in list[mid+1]
..list[j]
is larger than
1 |
|
Thus, when Line 3 is invoked, the same assertion holds true. You can apply the same argument to the call:
1 |
|
To summarize, we annotate the code above with the assertions:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 |
|
How Efficient is Binary Search
We have seen that if the input list is not sorted, then we minimally have to check every element in the list, leading to an
With a sorted input and using binary search, however, we can do better. Let's consider the worst case, where
We can also argue that the running time of the code above is
Now,
The "Magic" of Binary Search
How did we reduce the running time of searching from
Exploiting underlying properties of the inputs is a crucial weapon we have in speeding up our algorithm.
Problem Set 21
Problem 21.1
Re-write the binary search algorithm using a loop.
Problem 21.2
Instead of returning -1 if the query q
is not found, modify the binary search algorithm in Problem 21.1 such that it returns either:
- a position
k
, such thatlist[k] <= q <= list[k+1]
. -1
ifq < list[0]
n
ifq > list[n-1]