
Hive – Merge rows with search term substrings

I have a search results log table with the search terms in one of its columns. Because the search results are produced while the user is still typing, each search leaves several rows holding partial strings of the term. For example, as the user types "world", the resulting rows in the table will be:

Expected results:

I’m using Hive and would like to know: is there a way to compare these substrings across multiple rows from the same user within a time limit?

Thanks!


Answer

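You can treat this as a gaps-and-islands problem. Look back at the previous row for the same user and flag the current row as the start of a new search when the previous term is not a prefix of it; then accumulate the flags with a running sum to number the searches, and aggregate per search: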
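A minimal HiveQL sketch of the look-back approach, offered as an illustration under assumptions rather than a definitive implementation: the table name `search_log`, its columns `user_id`, `ts` (event time as epoch seconds) and `search_term`, and the 10-second limit between keystrokes are all hypothetical, and the `stack()` CTE only supplies made-up sample rows so the query runs on its own. `LAG` compares each row with the user's previous row, a running `SUM` of the new-search flags numbers the islands, and the final `GROUP BY` collapses each island to one row.

```sql
-- Sketch only: search_log, its columns, epoch-second ts and the 10-second
-- limit are hypothetical. Replace the stack() CTE with the real table.
WITH search_log AS (
  SELECT stack(7,
    'u1', 100, 'w',
    'u1', 101, 'wo',
    'u1', 102, 'wor',
    'u1', 103, 'worl',
    'u1', 104, 'world',
    'u2', 100, 'hi',
    'u2', 102, 'hive'
  ) AS (user_id, ts, search_term)
),
with_prev AS (
  -- look back one row for the same user
  SELECT user_id, ts, search_term,
         LAG(search_term) OVER (PARTITION BY user_id ORDER BY ts) AS prev_term,
         LAG(ts)          OVER (PARTITION BY user_id ORDER BY ts) AS prev_ts
  FROM search_log
),
flagged AS (
  -- 1 = this row starts a new search: no previous row, the time limit was
  -- exceeded, or the previous term is not a prefix of the current term
  SELECT user_id, ts, search_term,
         CASE
           WHEN prev_term IS NULL THEN 1
           WHEN ts - prev_ts > 10 THEN 1
           WHEN substr(search_term, 1, length(prev_term)) <> prev_term THEN 1
           ELSE 0
         END AS is_new_search
  FROM with_prev
),
grouped AS (
  -- running total of the flags gives each island (search) an id per user
  SELECT user_id, ts, search_term,
         SUM(is_new_search) OVER (PARTITION BY user_id ORDER BY ts
                                  ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS search_id
  FROM flagged
)
SELECT user_id,
       search_id,
       MIN(ts)          AS first_ts,
       MAX(ts)          AS last_ts,
       MAX(search_term) AS search_term  -- every term in an island is a prefix of the last one
FROM grouped
GROUP BY user_id, search_id;
```

Because each term in an island extends the one before it, every term is a prefix of the last one, so `MAX(search_term)` picks the completed term. With the sample rows the query returns one row per search: `world` for `u1` and `hive` for `u2`. If a user can log two rows with the same `ts`, add a tie-breaking column to the window `ORDER BY`.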

Actually, there is an easier way: just look forward. Keep a row only when the next row for the same user does not extend its search term:
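A minimal HiveQL sketch of the look-forward version, under the same kind of assumptions: the `search_log` table and its columns are hypothetical, `ts` is taken to be epoch seconds, the 10-second limit is illustrative, and `stack()` only provides made-up sample rows. `LEAD` fetches the user's next row, and a row is kept only if no next row extends its term within the limit, so no grouping step is needed.

```sql
-- Sketch only: search_log, its columns, epoch-second ts and the 10-second
-- limit are hypothetical. Replace the stack() CTE with the real table.
WITH search_log AS (
  SELECT stack(7,
    'u1', 100, 'w',
    'u1', 101, 'wo',
    'u1', 102, 'wor',
    'u1', 103, 'worl',
    'u1', 104, 'world',
    'u2', 100, 'hi',
    'u2', 102, 'hive'
  ) AS (user_id, ts, search_term)
),
with_next AS (
  -- look forward one row for the same user
  SELECT user_id, ts, search_term,
         LEAD(search_term) OVER (PARTITION BY user_id ORDER BY ts) AS next_term,
         LEAD(ts)          OVER (PARTITION BY user_id ORDER BY ts) AS next_ts
  FROM search_log
)
SELECT user_id, ts, search_term
FROM with_next
-- keep the row unless the user's next row, within the limit, extends this term
WHERE next_term IS NULL
   OR next_ts - ts > 10
   OR substr(next_term, 1, length(search_term)) <> search_term;
```

With the sample rows only the final terms survive: `world` for `u1` and `hive` for `u2`.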

User contributions licensed under: CC BY-SA