Hive – Merge rows with search term substrings

Question

I got a search results log table with search terms in one of the columns. Since the search results are produced as the user types in the search terms, there are multiple rows for each search term with the partial string. For example, as the user types world the resulting rows in the table will be: Expected results: I'm using

Accepted Answer

You can treat this as a gaps-and-islands problem.  Look back and see if the word does not match and then accumulate and aggregate:with words as (      select t.*,             sum(case when prev_word is null or                           word like prev_word || '%' or                           prev_ts < ts - 60   -- say within a minute                      then 1 else 0                 end) as grp      from (select t.*,                    lag(word) over (partition by userid order by ts) as prev_word,                   lag(ts) over (partition by userid order by ts) as prev_ts            from t           ) t     )select w.*from (select w.*,             row_number() over (partition by userid, grp order by ts desc) as seqnum      from words w     ) wwhere seqnum = 1;Actually, there is an easier way.  Just look forward:select w.*from (select t.*,              lag(word) over (partition by userid order by ts) as next_word,              lead(ts) over (partition by userid order by ts) as next_ts       from t      ) twhere next_ts is null or      next_ts > ts + 60 or      next_word not like word || '%';

Advertisement

Answer