algorithm - How can I better understand the one-comparison-per-iteration binary search?
What is the point of the one-comparison-per-iteration binary search? And can you explain how it works?
There are two reasons to binary search with one comparison per iteration. The less important one is performance. Detecting an exact match early using two comparisons per iteration saves, on average, one iteration of the loop, whereas (assuming comparisons involve significant work) binary searching with one comparison per iteration roughly halves the work done per iteration.
When binary searching an array of integers, it probably makes little difference either way. Even with a fairly expensive comparison, the asymptotic performance is the same, and the half-rather-than-minus-one saving probably isn't worth pursuing in most cases. Besides, expensive comparisons are often coded as functions that return negative, zero or positive for <, == or >, so you can get both comparisons for pretty much the price of one anyway.
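To illustrate that last point, here is a minimal sketch in C of the two-test, early-exit style of search, assuming a sorted array of C strings and using the standard strcmp as the three-way comparison. The function name find_any_match is made up for this example.

```c
#include <stddef.h>
#include <string.h>

/* Early-exit binary search over a sorted array of C strings.
   One strcmp call yields a negative, zero or positive result, so the
   "equal" test and the "less than" test share a single expensive comparison.
   Returns the index of some matching element, or -1 if there is none. */
long find_any_match(const char *const *items, size_t count, const char *goal)
{
    size_t lowerbound = 0;
    size_t upperbound = count;   /* half-open range [lowerbound, upperbound) */

    while (upperbound > lowerbound) {
        size_t testpos = lowerbound + (upperbound - lowerbound) / 2;
        int cmp = strcmp(items[testpos], goal);   /* the only expensive step */

        if (cmp == 0) {
            return (long)testpos;       /* exact match: stop early */
        } else if (cmp < 0) {
            lowerbound = testpos + 1;   /* test item sorts before the goal */
        } else {
            upperbound = testpos;       /* test item sorts after the goal */
        }
    }
    return -1;
}
```

Note that with duplicate keys this returns whichever match it happens to hit first, not necessarily the first or the last one.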
The important reason to do binary searches with one comparison per iteration is that you can get more useful results than just some-equal-match. The main searches you can do are...
- first key > goal
- first key >= goal
- first key == goal
- last key < goal
- last key <= goal
- last key == goal
These all reduce to the same basic algorithm. Understanding it well enough that you can code all the variants easily isn't that difficult, but I've not really seen a good explanation - only pseudocode and mathematical proofs. This is my attempt at an explanation.
There are games where the idea is to get as close as possible to a target without overshooting. Change that to "undershooting", and that's what "find first >" does. Consider the ranges at some stage during the search...
    | lower bound     | goal                    | upper bound
    +-----------------+-------------------------+--------------
    |         illegal | better            worse |
    +-----------------+-------------------------+--------------
The range between the current lower and upper bounds still needs to be searched. Our goal is (normally) in there somewhere, but we don't yet know where. The interesting point about items above the upper bound is that they are legal, in the sense that they are greater than the goal. We can say that the item just above the current upper bound is our best-so-far solution. We can even say this at the very start, even though there is probably no item at that position - in a sense, if there is no valid in-range solution, the best solution that hasn't been disproved is just past the upper bound.
At each iteration, we pick an item to compare between the upper and lower bound. For a binary search, that's a rounded half-way item. For a binary tree search, it's dictated by the structure of the tree. The principle is the same either way.
As we are searching for an item greater than our goal, we compare the test item using item[testpos] > goal. If the result is false, we have overshot (or rather undershot) our goal, so we keep our existing best-so-far solution and adjust our lower bound upwards. If the result is true, we have found a new best-so-far solution, so we adjust the upper bound down to reflect that.
Either way, we never want to compare that test item again, so we adjust our bound to eliminate (only just) the test item from the range to search. Being careless with this usually results in infinite loops.
Normally, half-open ranges are used - an inclusive lower bound and an exclusive upper bound. Using this system, the item at the upper bound index is not in the search range (at least not now), but it is the best-so-far solution. When you move the lower bound up, you move it to testpos+1 (to exclude the item you just tested from the range). When you move the upper bound down, you move it to testpos (the upper bound is exclusive anyway).
    if (item[testpos] > goal) {
        // new best-so-far
        upperbound = testpos;
    } else {
        lowerbound = testpos + 1;
    }
When the range between the lower and upper bounds is empty (using half-open, when both have the same index), your result is your most recent best-so-far solution, just above your upper bound (i.e. at the upper bound index for half-open).
So the full algorithm is...
    while (upperbound > lowerbound) {
        testpos = lowerbound + ((upperbound - lowerbound) / 2);
        if (item[testpos] > goal) {
            // new best-so-far
            upperbound = testpos;
        } else {
            lowerbound = testpos + 1;
        }
    }
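As a minimal sketch, the loop above can be wrapped into a complete C function, assuming a sorted array of int and size_t indexes. The name first_greater and the convention of returning count when nothing qualifies are choices made for this example.

```c
#include <stddef.h>

/* Returns the index of the first element strictly greater than goal in the
   sorted array item[0..count-1], or count if no element qualifies. */
size_t first_greater(const int *item, size_t count, int goal)
{
    size_t lowerbound = 0;
    size_t upperbound = count;   /* the initial "imaginary" best-so-far */

    while (upperbound > lowerbound) {
        size_t testpos = lowerbound + (upperbound - lowerbound) / 2;
        if (item[testpos] > goal) {
            upperbound = testpos;       /* new best-so-far */
        } else {
            lowerbound = testpos + 1;   /* exclude the item just tested */
        }
    }
    return upperbound;
}
```

For the array {1, 3, 3, 5, 8} and goal 3, the bounds move from [0, 5) to [3, 5), then [3, 4), then [3, 3), so the result is index 3 (the 5).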
To change from first key > goal to first key >= goal, you literally switch the comparison operator in the if line. The relational operator and goal could be replaced by a single parameter - a predicate function that returns true if (and only if) its parameter is on the greater-than side of the goal.
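One way to sketch that predicate idea in C is with a function pointer. Everything named here (above_goal_fn, first_where, is_greater, is_greater_or_equal, and the ctx pointer used to carry the goal) is an assumption for illustration, not something from the original answer.

```c
#include <stddef.h>

/* Hypothetical predicate type: returns nonzero if (and only if) the item is
   on the "greater-than" side of the goal. ctx carries the goal itself. */
typedef int (*above_goal_fn)(int item, const void *ctx);

/* Returns the index of the first element for which above_goal is true, or
   count if there is none. Assumes all "false" items precede all "true" ones. */
size_t first_where(const int *item, size_t count,
                   above_goal_fn above_goal, const void *ctx)
{
    size_t lowerbound = 0;
    size_t upperbound = count;

    while (upperbound > lowerbound) {
        size_t testpos = lowerbound + (upperbound - lowerbound) / 2;
        if (above_goal(item[testpos], ctx)) {
            upperbound = testpos;       /* new best-so-far */
        } else {
            lowerbound = testpos + 1;
        }
    }
    return upperbound;
}

/* Two predicates that differ only in the comparison operator. */
int is_greater(int item, const void *ctx)
{
    return item > *(const int *)ctx;
}

int is_greater_or_equal(int item, const void *ctx)
{
    return item >= *(const int *)ctx;
}
```

Passing is_greater gives "first >", and passing is_greater_or_equal gives "first >=", with no change to the search loop itself.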
That gives you "first >" and "first >=". To get "first ==", use "first >=" and add an equality check after the loop exits.
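A minimal sketch of the "first ==" variant in C, assuming a sorted array of int. The name first_equal and the -1 "not found" return value are choices made for this example.

```c
#include <stddef.h>

/* Returns the index of the first element equal to goal in the sorted array
   item[0..count-1], or -1 if goal is not present. */
long first_equal(const int *item, size_t count, int goal)
{
    size_t lowerbound = 0;
    size_t upperbound = count;

    /* Plain "first >=" search. */
    while (upperbound > lowerbound) {
        size_t testpos = lowerbound + (upperbound - lowerbound) / 2;
        if (item[testpos] >= goal) {
            upperbound = testpos;
        } else {
            lowerbound = testpos + 1;
        }
    }

    /* The result may still be the out-of-range position count, so check the
       index before reading the item for the equality test. */
    if (upperbound < count && item[upperbound] == goal) {
        return (long)upperbound;
    }
    return -1;
}
```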
For "last <" etc., the principle is the same as above, but the range is reflected. This just means you swap over the bound adjustments (but not the comment) as well as changing the operator. But before doing that, consider the following...
    a >  b  ==  !(a <= b)
    a >= b  ==  !(a <  b)
Also...
- position (last key < goal) = position (first key >= goal) - 1
- position (last key <= goal) = position (first key > goal ) - 1
When we move our bounds during the search, both sides are being moved towards the goal until they meet at the goal. And there is a special item just below the lower bound, just as there is one just above the upper bound...
    while (upperbound > lowerbound) {
        testpos = lowerbound + ((upperbound - lowerbound) / 2);
        if (item[testpos] > goal) {
            // new best-so-far for first key > goal at [upperbound]
            upperbound = testpos;
        } else {
            // new best-so-far for last key <= goal at [lowerbound - 1]
            lowerbound = testpos + 1;
        }
    }
So in a way, we have two complementary searches running at once. When upperbound and lowerbound meet, we have a useful search result on each side of that single boundary.
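As a rough sketch, a single search can report both of those results at once. The struct boundary type, its field names and find_boundary are hypothetical names for this example, and the "last" side is returned as a signed value so that -1 can stand for "no such item".

```c
#include <stddef.h>

/* Hypothetical result type holding both sides of the boundary. */
struct boundary {
    size_t first_greater;     /* index of first key > goal, or count if none */
    long   last_not_greater;  /* index of last key <= goal, or -1 if none */
};

struct boundary find_boundary(const int *item, size_t count, int goal)
{
    size_t lowerbound = 0;
    size_t upperbound = count;
    struct boundary result;

    while (upperbound > lowerbound) {
        size_t testpos = lowerbound + (upperbound - lowerbound) / 2;
        if (item[testpos] > goal) {
            upperbound = testpos;       /* best-so-far "first >" */
        } else {
            lowerbound = testpos + 1;   /* best-so-far "last <=" is testpos */
        }
    }

    /* The bounds have met: one answer sits on each side of the boundary. */
    result.first_greater = upperbound;
    result.last_not_greater = (long)lowerbound - 1;
    return result;
}
```

Switching the comparison to >= would give "first >=" and "last <" in the same way.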
For all cases, there's the chance that an original "imaginary" out-of-bounds best-so-far position is your final result (there was no match in the search range). This needs to be checked before doing a final == check for the first == and last == cases. It might be useful behaviour as well - e.g. if you're searching for the position to insert your goal item, adding it after the end of your existing items is the right thing to do if all the existing items are smaller than your goal item.
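The insertion case can be sketched as follows, assuming an int array whose storage has room for at least one more element. The name insert_sorted is made up for this example, and it uses a "first >" search so that a new item lands after any equal keys.

```c
#include <stddef.h>
#include <string.h>

/* Inserts goal into the sorted array item[0..count-1], after any equal keys.
   ASSUMPTION: the caller guarantees storage for at least count + 1 elements.
   Returns the index at which goal was placed. */
size_t insert_sorted(int *item, size_t count, int goal)
{
    size_t lowerbound = 0;
    size_t upperbound = count;

    /* "first >" search: upperbound ends up at the insertion position. */
    while (upperbound > lowerbound) {
        size_t testpos = lowerbound + (upperbound - lowerbound) / 2;
        if (item[testpos] > goal) {
            upperbound = testpos;
        } else {
            lowerbound = testpos + 1;
        }
    }

    /* If upperbound == count, the "imaginary" position is the answer and
       nothing needs shifting (memmove is called with a size of zero). */
    memmove(item + upperbound + 1, item + upperbound,
            (count - upperbound) * sizeof item[0]);
    item[upperbound] = goal;
    return upperbound;
}
```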
A couple of notes on the selection of testpos...
    testpos = lowerbound + ((upperbound - lowerbound) / 2);
First off, this will never overflow, unlike the more obvious ((lowerbound + upperbound) / 2). It also works with pointers as well as integer indexes.
Second, the division is assumed to round down. Rounding down for non-negatives is OK (that's all you can be sure of in C), as the difference is always non-negative anyway.
This is one aspect that may need care if you use non-half-open ranges, though - make sure the test position is inside the search range, and not just outside it (on one of the already-found best-so-far positions).
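To show the pointer case, here is a minimal sketch of the same "first >" search over a half-open pointer range [begin, end). The name first_greater_ptr is an assumption for this example.

```c
#include <stddef.h>

/* Returns a pointer to the first element greater than goal in the sorted,
   half-open range [begin, end), or end if no element qualifies. */
const int *first_greater_ptr(const int *begin, const int *end, int goal)
{
    const int *lowerbound = begin;
    const int *upperbound = end;

    while (upperbound > lowerbound) {
        /* Pointer difference plus offset is valid C; adding two pointers,
           as (lowerbound + upperbound) / 2 would need, does not compile. */
        const int *testpos = lowerbound + (upperbound - lowerbound) / 2;
        if (*testpos > goal) {
            upperbound = testpos;
        } else {
            lowerbound = testpos + 1;
        }
    }
    return upperbound;
}
```

Because upperbound - lowerbound is never negative inside the loop, the division always rounds towards lowerbound and testpos stays inside the range.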
Finally, in a binary tree search, the moving of bounds is implicit and the choice of testpos is built into the structure of the tree (which may be unbalanced), yet the same principles apply to what the search is doing. In this case, we choose our child node to shrink the implicit ranges. For first-match cases, either we've found a new smaller best match (go to the lower child in hopes of finding an even smaller and better one) or we've overshot (go to the higher child in hopes of recovering). Again, the four main cases can be handled by switching the comparison operator.
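A minimal sketch of "first key > goal" on a binary search tree, assuming a simple node layout. The struct node type, its field names and tree_first_greater are hypothetical and only serve this example.

```c
#include <stddef.h>

/* Hypothetical node layout for an ordinary binary search tree. */
struct node {
    int key;
    struct node *lower;    /* subtree of smaller keys */
    struct node *higher;   /* subtree of larger keys */
};

/* Returns the node with the smallest key greater than goal, or NULL if no
   key in the tree is greater than goal. */
const struct node *tree_first_greater(const struct node *root, int goal)
{
    const struct node *best = NULL;   /* plays the role of the upper bound */
    const struct node *test = root;

    while (test != NULL) {
        if (test->key > goal) {
            best = test;            /* new best-so-far */
            test = test->lower;     /* look for an even smaller match */
        } else {
            test = test->higher;    /* not far enough yet: go up in key order */
        }
    }
    return best;
}
```

Here NULL plays the part of the "imaginary" position past the upper bound.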
BTW - there are more possible operators to use for that template parameter. Consider an array sorted by year, then month. Maybe you want to find the first item for a particular year. To do this, write a comparison function that compares the year and ignores the month - the goal compares as equal if the year is equal, but the goal value may be of a different type to the key, one that doesn't even have a month value to compare. I think of this as a "partial key comparison"; plug that into your binary search template and you get what I think of as a "partial key search".
EDIT: The paragraph below used to say "31 Dec 1999 to be equal to 1 Feb 2000". That wouldn't work unless the whole range in between was also considered equal. The point is that all three parts of the begin- and end-of-range dates differ, so you're not dealing with a "partial" key, but the keys considered equivalent for the search must form a contiguous block in the container, which will normally imply a contiguous block in the ordered set of possible keys.
It's not strictly just "partial" keys, either. Your custom comparison might consider 31 Dec 1999 to be equal to 1 Jan 2000, yet all other dates different. The point is that the custom comparison must agree with the original key about the ordering, but it might not be so picky about considering all different values different - it can treat a range of keys as an "equivalence class".
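The year-and-month case might look like the following sketch in C. The struct year_month type and the function first_of_year are hypothetical; the goal is a bare year, so it has no month to compare at all.

```c
#include <stddef.h>

/* Hypothetical key type: the array is sorted by year, then by month. */
struct year_month {
    int year;
    int month;
};

/* Partial key search: returns the index of the first entry whose year equals
   goal_year, or -1 if no entry has that year. The month is ignored. */
long first_of_year(const struct year_month *item, size_t count, int goal_year)
{
    size_t lowerbound = 0;
    size_t upperbound = count;

    /* "first >=" search using only the year part of each key. */
    while (upperbound > lowerbound) {
        size_t testpos = lowerbound + (upperbound - lowerbound) / 2;
        if (item[testpos].year >= goal_year) {
            upperbound = testpos;
        } else {
            lowerbound = testpos + 1;
        }
    }

    /* Final equality check, again on the year only. */
    if (upperbound < count && item[upperbound].year == goal_year) {
        return (long)upperbound;
    }
    return -1;
}
```

This works because all entries of one year form a contiguous block in the array, which is the condition described above.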
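An extra note about bounds that I really should have included before, but I may not have thought about it this way at the time.
One way of thinking about bounds is that they aren't item indexes at all. A bound is the boundary line between two items, so you can number the boundary lines as easily as you can number the items...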
    |     |     |     |     |     |     |     |     |
    | +-+ | +-+ | +-+ | +-+ | +-+ | +-+ | +-+ | +-+ |
    | |0| | |1| | |2| | |3| | |4| | |5| | |6| | |7| |
    | +-+ | +-+ | +-+ | +-+ | +-+ | +-+ | +-+ | +-+ |
    |     |     |     |     |     |     |     |     |
    0     1     2     3     4     5     6     7     8
Obviously the numbering of bounds is related to the numbering of the items. As long as you number your bounds left-to-right and the same way you number your items (in this case starting from zero), the result is effectively the same as the common half-open convention.
It would be possible to select a middle bound to bisect the range precisely into two, but that's not what a binary search does. For binary search, you select an item to test - not a bound. That item will be tested in this iteration and must never be tested again, so it's excluded from both subranges.
    |     |     |     |     |     |     |     |     |
    | +-+ | +-+ | +-+ | +-+ | +-+ | +-+ | +-+ | +-+ |
    | |0| | |1| | |2| | |3| | |4| | |5| | |6| | |7| |
    | +-+ | +-+ | +-+ | +-+ | +-+ | +-+ | +-+ | +-+ |
    |     |     |     |     |     |     |     |     |
    0     1     2     3     4     5     6     7     8
                               ^
          |<-------------------|------------->|
                               |
          |<--------------->|  |  |<--------->|
               low range            hi range
So the testpos and testpos+1 in the algorithm are the two cases of translating the item index into a bound index. Of course, if the two bounds are equal, there are no items in that range to choose, so the loop cannot continue, and the only possible result is that one bound value.
The ranges shown above are just the ranges still to be searched - the gap we intend to close between the proven-lower and proven-higher ranges.
In this model, the binary search is searching for the boundary between two ordered kinds of values - those classed as "lower" and those classed as "higher". The predicate test classifies one item. There is no "equal" class - equal-to-key values are part of the higher class (for x[i] >= key) or the lower class (for x[i] > key).