
Tag: hive

Hive: group by calculated column

I need to execute a query like: select myUsualField, SOME_FUNCTION(myAnotherField) as myUnusualField from MYTABLE group by myUsualField, myUnusualField. In Hive this query fails: it cannot find field …

How to count all rows in raw data file using Hive?

I am reading some raw input which looks something like this: Note the first two rows are “good” rows and the last two rows are “bad” rows, since they are missing some data. Here is the snippet of my Hive query which reads this raw data into a read-only external table: I need to get …

Performance difference with WHERE condition in subquery/CTE

Is there a performance difference between applying the WHERE condition to a subquery data source and applying it in the joined statement? Let’s say I have two Hive tables, A and B, which are both partitioned on the field date. Is that query’s p…

Selecting most recent rows in a SQL query

I want to join two tables, selecting the most recent rows for an ID value present in table 1. That is, for each ID value in table 1, return only the most recently added row. For example, table 1 looks something like this: So if the same ID value is found twice in this table, only return …