Skip to content
Advertisement

Cross Join in Hive

I’m trying to create a new column time_period while running the query below. If the date difference between a given transaction and the most recent transaction in the reference table is fewer than 7 days, then mark it as a recent transaction, else mark it as an old transaction.

However, the query below is generating an error in the subquery associated with the cross join. The error is “Cannot recognize input near ‘select’ ‘(”max’

SELECT
    c.*
    FROM(
        SELECT
         a.acct_nb,
         a.txn_date,
         a.txn_amt,
         (CASE WHEN datediff(b.most_recent_txn,a.txn_date)<7 THEN 'recent' ELSE 'old' END) AS time_period
         FROM db.t1 a
         CROSS JOIN(
              SELECT max(txn_date) AS most_recent_txn --ERROR OCCURS HERE
              FROM db.t1 b)
        )c
        WHERE c.time_period='new';

What could be causing this error?

Advertisement

Answer

The alias b should be applied to the cross joined subquery and not to the table db.t1 inside the subquery:

SELECT c.*
FROM (
  SELECT a.acct_nb, a.txn_date, a.txn_amt,
         CASE WHEN datediff(b.most_recent_txn, a.txn_date) < 7 THEN 'recent' ELSE 'old' END AS time_period
  FROM db.t1 a
  CROSS JOIN (
    SELECT max(txn_date) AS most_recent_txn
    FROM db.t1 
  ) b
) c
WHERE c.time_period='new';  

Also, there is no branch of your CASE expression returning 'new' so the last WHERE clause will filter out all rows.

User contributions licensed under: CC BY-SA
7 People found this is helpful
Advertisement