I am getting the following error when trying to run TPC-DS query 30 in Hive. From my research, Hive does not allow this kind of subquery, so I am wondering how to rewrite the query. I took it directly from this website: http://www.tpc.org/tpcds/default5.asp
Error: Error while compiling statement: FAILED: SemanticException Line 0:-1 Unsupported SubQuery Expression 'ctr_state': Only SubQuery expressions that are top level conjuncts are allowed
Query 30
with customer_total_return as
 (select wr_returning_customer_sk as ctr_customer_sk
        ,ca_state as ctr_state
        ,sum(wr_return_amt) as ctr_total_return
  from web_returns
      ,date_dim
      ,customer_address
  where wr_returned_date_sk = d_date_sk
    and d_year = 2000
    and wr_returning_addr_sk = ca_address_sk
  group by wr_returning_customer_sk
          ,ca_state)
select c_customer_id, c_salutation, c_first_name, c_last_name, c_preferred_cust_flag
      ,c_birth_day, c_birth_month, c_birth_year, c_birth_country, c_login, c_email_address
      ,c_last_review_date_sk, ctr_total_return
from customer_total_return ctr1
    ,customer_address
    ,customer
where ctr1.ctr_total_return > (select avg(ctr_total_return) * 1.2
                               from customer_total_return ctr2
                               where ctr1.ctr_state = ctr2.ctr_state)
  and ca_address_sk = c_current_addr_sk
  and ca_state = 'GA'
  and ctr1.ctr_customer_sk = c_customer_sk
order by c_customer_id, c_salutation, c_first_name, c_last_name, c_preferred_cust_flag
        ,c_birth_day, c_birth_month, c_birth_year, c_birth_country, c_login, c_email_address
        ,c_last_review_date_sk, ctr_total_return
limit 100;
Update
Query 30, as generated by the TPC-DS suite, may contain a typo: my customer table has no column named c_last_review_date_sk, so you need to change it to c_last_review_date.
Answer
Calculate avg(ctr_total_return) inside the customer_total_return subquery using an analytic (window) function, and remove the subquery from the WHERE clause:
with customer_total_return as
 (select ctr_customer_sk
        ,ctr_state
        ,ctr_total_return
        ,avg(ctr_total_return) over (partition by ctr_state) as ctr_state_avg
  from (select wr_returning_customer_sk as ctr_customer_sk
              ,ca_state as ctr_state
              ,sum(wr_return_amt) as ctr_total_return
        from web_returns
            ,date_dim
            ,customer_address
        where wr_returned_date_sk = d_date_sk
          and d_year = 2000
          and wr_returning_addr_sk = ca_address_sk
        group by wr_returning_customer_sk
                ,ca_state) s)
select c_customer_id, c_salutation, c_first_name, c_last_name, c_preferred_cust_flag
      ,c_birth_day, c_birth_month, c_birth_year, c_birth_country, c_login, c_email_address
      ,c_last_review_date_sk, ctr_total_return
from customer_total_return ctr1
    ,customer_address
    ,customer
where ctr1.ctr_total_return > ctr1.ctr_state_avg * 1.2
  and ca_address_sk = c_current_addr_sk
  and ca_state = 'GA'
  and ctr1.ctr_customer_sk = c_customer_sk
order by c_customer_id, c_salutation, c_first_name, c_last_name, c_preferred_cust_flag
        ,c_birth_day, c_birth_month, c_birth_year, c_birth_country, c_login, c_email_address
        ,c_last_review_date_sk, ctr_total_return
limit 100;
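If you want to convince yourself that the window-function version filters the same rows as the correlated subquery, a quick way is to run both predicates on toy data. The sketch below uses Python's built-in sqlite3 module rather than Hive (SQLite supports window functions from version 3.25). The table holds made-up rows standing in for the customer_total_return CTE, and the helper run_query is hypothetical, just for illustration.

```python
import sqlite3

# Made-up toy rows standing in for the customer_total_return CTE:
# (ctr_customer_sk, ctr_state, ctr_total_return)
ROWS = [
    (1, "GA", 100.0),
    (2, "GA", 10.0),
    (3, "GA", 10.0),
    (4, "TX", 90.0),
    (5, "TX", 30.0),
    (6, "TX", 30.0),
]

# Original form: correlated subquery in WHERE (rejected by Hive).
CORRELATED = """
SELECT ctr1.ctr_customer_sk
FROM customer_total_return ctr1
WHERE ctr1.ctr_total_return > (SELECT avg(ctr_total_return) * 1.2
                               FROM customer_total_return ctr2
                               WHERE ctr1.ctr_state = ctr2.ctr_state)
ORDER BY ctr1.ctr_customer_sk
"""

# Rewritten form: per-state average computed with a window function.
WINDOWED = """
SELECT ctr_customer_sk
FROM (SELECT ctr_customer_sk,
             ctr_total_return,
             avg(ctr_total_return) OVER (PARTITION BY ctr_state) AS ctr_state_avg
      FROM customer_total_return) s
WHERE ctr_total_return > ctr_state_avg * 1.2
ORDER BY ctr_customer_sk
"""


def run_query(sql):
    """Hypothetical helper: load ROWS into an in-memory table and run sql."""
    if sqlite3.sqlite_version_info < (3, 25, 0):
        raise RuntimeError("SQLite 3.25+ is required for window functions")
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE customer_total_return "
            "(ctr_customer_sk INTEGER, ctr_state TEXT, ctr_total_return REAL)"
        )
        conn.executemany("INSERT INTO customer_total_return VALUES (?, ?, ?)", ROWS)
        return [row[0] for row in conn.execute(sql)]
    finally:
        conn.close()


if __name__ == "__main__":
    print(run_query(CORRELATED))
    print(run_query(WINDOWED))
```

On this data the GA average is 40 (threshold 48) and the TX average is 50 (threshold 60), so both queries return customers 1 and 4. In the Hive rewrite, the window average is computed inside the CTE, before the outer WHERE applies ca_state = 'GA', so each partition still sees every customer for that state, just as the correlated subquery does.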