algorithm - Google search results: How to find the minimum window that contains all the search keywords? -

- April 15, 2010

what complexity of algorithm is used find smallest snippet contains search key words?

as stated, problem solved rather simple algorithm:

just through input text sequentially beginning , check each word: whether in search key or not. if word in key, add end of structure call the current block. current block linear sequence of words, each word accompanied position @ found in text. current block must maintain following property: first word in current block must present in current block once , once. if add new word end of current block, , above property becomes violated, have remove first word block. process called normalization of current block. normalization potentially iterative process, since once remove first word block, new first word might violate property, you'll have remove well. , on.

so, current block fifo sequence: new words arrive @ right end, , removed normalization process left end.

all have solve problem through text, maintain current block, normalizing when necessary satisfies property. shortest block keywords in ever build answer problem.

for example, consider text

cxxxaxxxbxxaxxcxbaxxxc

with keywords a, b , c. looking through text you'll build following sequence of blocks

c ca cab - words, length 9 (cxxxaxxxb...) caba - words, length 12 (cxxxaxxxbxxa...) cabac - violates property, remove first c abac - violates property, remove first bac - words, length 7 (...bxxaxxc...) bacb - violates property, remove first b acb - words, length 6 (...axxcxb...) acba - violates property, remove first cba - words, length 4 (...cxba...) cbac - violates property, remove first c bac - words, length 6 (...baxxxc)

the best block built has length 4, answer in case

cxxxaxxxbxxaxx cxba xxxc

the exact complexity of algorithm depends on input, since dictates how many iterations normalization process make, ignoring normalization complexity trivially o(n * log m), n number of words in text , m number of keywords, , o(log m) complexity of checking whether current word belongs keyword set.

now, having said that, have admit suspect might not need. since mentioned google in caption, might statement of problem gave in post not complete. maybe in case text indexed? (with indexing above algorithm still applicable, becomes more efficient). maybe there's tricky database describes text , allows more efficient solution (like without looking through entire text)? can guess , not saying...

Search This Blog

Manage

algorithm - Google search results: How to find the minimum window that contains all the search keywords? -

Comments

Post a Comment

Popular posts from this blog

How do .net 4.0 [named] tuples work under the hood? -

javascript - Enclosure Memory Copies -

php - Replacing tags in braces, even nested tags, with regex -