Skip to content
Advertisement

Failed to breakup Windowing invocations into Groups. At least 1 group must only depend on input columns

I have a dataset with booking hotels. date_in has format “yyyy-MM-dd”. I need select top 10 the most visited hotel by month.

SELECT top_visits.date_ci, top_visits.hotel_id, top_visits.count_visits
FROM (
   SELECT  date_ci, hotel_id, COUNT(id) AS count_visits, 
   RANK() OVER (
   PARTITION BY date_ci, hotel_id ORDER BY COUNT(id) DESC) as rank
   FROM ( 
      SELECT id, hotel_id, SUBSTRING(my_tab.date_in, 1, 7) as date_ci 
      FROM my_database.my_tab) x
   ) top_visits
GROUP BY date_ci, hotel_id HAVING rank <= 10;

I get the following error:

Error: Error while compiling statement: FAILED: SemanticException Failed to breakup Windowing invocations into Groups. At least 1 group must only depend on input columns. Also check for circular dependencies. Underlying error: org.apache.hadoop.hive.ql.parse.SemanticException: Line 4:13 Expression not in GROUP BY key ‘hotel_id’

Advertisement

Answer

Move COUNT(id) aggregation into the subquery, add group by:

SELECT top_visits.date_ci, top_visits.hotel_id, top_visits.count_visits
FROM (
      SELECT date_ci, hotel_id, count_visits, 
             RANK() OVER (PARTITION BY date_ci, hotel_id ORDER BY count_visits DESC) as rank
        FROM ( 
              SELECT hotel_id, SUBSTRING(my_tab.date_in, 1, 7) as date_ci,
                     COUNT(id) AS count_visits 
                FROM my_database.my_tab
              GROUP BY hotel_id, SUBSTRING(my_tab.date_in, 1, 7)
             ) x
     ) top_visits
WHERE rank <= 10;
User contributions licensed under: CC BY-SA
6 People found this is helpful
Advertisement