Number of Guesses in Binary Search?

How do we know the EXACT number of maximum guesses it takes to find a number in a list of numbers, using binary search? Well for one, we know that in general, the binary search algorithm just halves the range of possible numbers after each guess. So if we start with a list of 8 elements, the first "pass" through the algorithm (aka the first iteration of the while loop) will compare the number we are looking for with the middle element of the array (since there are 8 elements, the middle index is rounded down, which leaves a power of 2 elements left to check, more on this later). So since there's 8 elements and we half the list each time, the max guesses should be 3 right? Because 2^3 = 8?? Wrong! Once we get to ONE element in the array to check (in other words, when max and min are the same number), that requires an iteration through the loop to see whether the last, unchecked element is indeed the same as the key. This "extra" iteration is always added to the max number of guesses, which explains why the runtime is always

But why do we floor the log of n?
Well, we can start by writing out some of the max guesses for some sizes of n.

N = 1, Max Guesses = 1.
N = 2, Max Guesses = 2.
N = 3, Max Guesses = 2.
N = 4, Max Guesses = 3.
N = 5, Max Guesses = 3.
N = 6, Max Guesses = 3.
N = 7, Max Guesses = 3.
N = 8, Max Guesses = 4.

You can see that whenever n becomes a power of 2, the max guesses increments by 1. So that's why we floor the log of n, so that we always take the next lowest perfect power of 2.

Yeah... But.. Why?

The truth is, when n=4, and n=7, they have the same number of guesses, even though n is bigger when it is 7 as opposed to 4. The reason is because we truncate the decimal point of any middle index that isn't perfectly in the middle. For example, when n=7, after the first pass, we divide the possible numbers by 2, so 7/2 = 3.5. But since indexing can only be done by integers, this value, when stored inside an int without explicit double casting, is always going to be just 3 (we cut off, or truncated, the decimal point). Repeating again, 3/2 = 1.5 -> 1. So because we truncate decimal points when we find the "middle" index, finding the log of 7 and finding the log of 4 had the same result. Another way to think about this is the log of anything from 4 to 7 is always going to have the same number of guesses, BECAUSE, they are less than the log of 8, which is the bare minimum to have a whole 'nother guess (log 8 is 3, 3 + 1 = 4 which is one larger than 3).

I hope this cleared up some confusion about binary search guesses, some exams actually make you find the number of passes needed to find a specific number (ahem AP Computer Science A), which should be simple since we know the general formula.

No comments:

Post a Comment