Tag: hive

SQL SELECT with an extra column with more than 1 value

I have a table with column_1, column_2, column_3, and select it by: What I want is to add an extra column ‘hour’ with 24 possible values, from 0 to 23. The outcome should be every row [column_1, column_2, column_3] repeated 24 times, once for each of the 24 values of the extra column hour: H…
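
One common way to get that shape is to cross join the table with a 24-row list of hours. The sketch below only illustrates the idea: it uses Python's standard-library sqlite3 as a stand-in engine rather than Hive, and the table name `my_table`, the helper table `hours`, and the sample rows are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical source table with the three columns from the question.
conn.execute("CREATE TABLE my_table (column_1 TEXT, column_2 TEXT, column_3 TEXT)")
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?, ?)",
    [("a", "b", "c"), ("d", "e", "f")],
)

# Hypothetical helper table holding the 24 hour values 0..23.
conn.execute("CREATE TABLE hours (hour INTEGER)")
conn.executemany("INSERT INTO hours VALUES (?)", [(h,) for h in range(24)])

# CROSS JOIN pairs every source row with every hour value.
rows = conn.execute(
    """
    SELECT t.column_1, t.column_2, t.column_3, h.hour
    FROM my_table AS t
    CROSS JOIN hours AS h
    ORDER BY t.column_1, h.hour
    """
).fetchall()

print(len(rows))  # 2 source rows x 24 hours
```

How the 24 hour values are generated in Hive itself is a separate choice; a small lookup table, as assumed here, is only one option.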

Hive SQL – time interval in 5 minutes

hive intervals sql time

My data is too large to analyze, since it is collected every second or so. To reduce the data, I would like to group it into 5-minute intervals. I tried converting to a Unix timestamp and back, but that didn’t work. I tried something like this: Original data or output: Desired output: Answer: I don…
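
The arithmetic behind 5-minute bucketing can be sketched in Python: convert the timestamp to epoch seconds, subtract the remainder modulo 300, and convert back. This is not the truncated answer above, just an illustration; the function name and the assumption that timestamps are `YYYY-MM-DD HH:MM:SS` strings interpreted as UTC are mine. In Hive the same idea would presumably be expressed with `unix_timestamp` and `from_unixtime`.

```python
from datetime import datetime, timezone

BUCKET_SECONDS = 5 * 60  # width of one interval in seconds


def floor_to_five_minutes(ts: str) -> str:
    """Round a 'YYYY-MM-DD HH:MM:SS' string down to its 5-minute boundary."""
    fmt = "%Y-%m-%d %H:%M:%S"
    # Treat the value as UTC so the epoch round trip is unambiguous.
    parsed = datetime.strptime(ts, fmt).replace(tzinfo=timezone.utc)
    epoch = int(parsed.timestamp())
    # Drop the seconds that fall past the last bucket boundary.
    floored = epoch - epoch % BUCKET_SECONDS
    return datetime.fromtimestamp(floored, tz=timezone.utc).strftime(fmt)


print(floor_to_five_minutes("2021-01-15 09:34:21"))
```

Grouping by the floored value then yields one row per 5-minute interval instead of one row per second.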

How to include 0 results in count with GROUP BY in HiveQL

database hive hiveql sql

I’m a newbie in Hive. I want to include 0 rows in the results. I have one table like this: This is my query: Example result is: But my desired result is: How can I get a 0 in the results? Answer: You can do this, but you need to remove the WHERE clause. You can also do it using a self join. EDIT – I
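
A minimal sketch of why removing the WHERE clause helps: a filter drops rows before grouping, so groups with no matching rows vanish, whereas moving the condition into a conditional sum keeps every group. The example uses Python's standard-library sqlite3 as a stand-in for Hive, and the `events` table, its columns, and its rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (category TEXT, status TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [("a", "ok"), ("a", "ok"), ("b", "failed"), ("c", "ok"), ("c", "failed")],
)

# Filtering first: category 'b' has no 'ok' rows, so it disappears.
with_where = conn.execute(
    """
    SELECT category, COUNT(*)
    FROM events
    WHERE status = 'ok'
    GROUP BY category
    ORDER BY category
    """
).fetchall()

# Condition moved into the aggregate: every category survives, 'b' gets 0.
without_where = conn.execute(
    """
    SELECT category, SUM(CASE WHEN status = 'ok' THEN 1 ELSE 0 END)
    FROM events
    GROUP BY category
    ORDER BY category
    """
).fetchall()

print(with_where)
print(without_where)
```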

How to create a Hive external table with Parquet format

hdfs hive hiveql impala sql

I am trying to create an external table in Hive with the following query in HDFS. I am getting the error: Error while compiling statement: FAILED: ParseException line 11:2 missing EOF at ‘LOCATION’ near ‘)’ What is the best way to create a Hive external table with data stored in Parquet format? …

HIVE CBO. Wrong results with Hive SQL query with MULTIPLE IN conditions in where clause

hadoop hive sql

I am running one SQL query in Hive and it gives different results with CBO enabled and disabled. The results are wrong when CBO is enabled (set hive.cbo.enable=true;). Prerequisites: Apache Hadoop 2.10.1 + Apache Hive 2.3.6 installed. (I tried to reproduce the issue with Apache Hive 3+ version and Hadoop 3+ v…

How to replace EXISTS in Hive with two correlated subqueries

hive sql

I have a query that looks like this: I researched and read that IN and EXISTS are not supported statements in Hive. I read that a workaround for this would be to use a LEFT JOIN. I have tried this, but I am having trouble with the GROUP BY u.id. I read that this always needs to be paired with

COUNT with CASE WHEN is showing the same result when using division

hive hiveql sql

I have the following query, which returns the sold products: I want to calculate the percentage of sold products compared to all products for product_category = 7: I get 100 as the result, although when I execute each query separately they don’t return the same result. Answer: count() counts both 0s and 1s, it …
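
The point about count() can be reproduced in a small sketch. COUNT counts every non-NULL value, so a CASE expression that returns 0 in its ELSE branch is still counted; SUM over the same expression adds up only the 1s. The example runs on Python's standard-library sqlite3 rather than Hive, and the `products` table, its `sold` flag, and its rows are hypothetical (only `product_category` comes from the question).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE products (id INTEGER, product_category INTEGER, sold INTEGER)"
)
conn.executemany(
    "INSERT INTO products VALUES (?, ?, ?)",
    [(1, 7, 1), (2, 7, 0), (3, 7, 0), (4, 7, 1), (5, 8, 1)],
)

wrong_pct, right_pct = conn.execute(
    """
    SELECT
        -- COUNT sees a non-NULL value (0 or 1) on every row, so it counts all rows.
        100.0 * COUNT(CASE WHEN sold = 1 THEN 1 ELSE 0 END) / COUNT(*),
        -- SUM adds the 1s and ignores the 0s, giving the number of sold rows.
        100.0 * SUM(CASE WHEN sold = 1 THEN 1 ELSE 0 END) / COUNT(*)
    FROM products
    WHERE product_category = 7
    """
).fetchone()

print(wrong_pct, right_pct)
```

Dropping the ELSE branch so that unsold rows yield NULL would also make COUNT behave as intended, since NULLs are not counted.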

convert a single row into 2 rows on impala/hive

hive impala sql

I have a huge table with millions of rows/IDs in the format below. I need to convert it into the format below, so that the values are in 2 rows as shown. Can you please help me with an Impala/Hive query for this? Thanks a lot. Answer: I think a way would be this one:

How can I add days to a Hive timestamp without losing hours, minutes and seconds

hive hiveql sql timestamp

I am using Hive 2.6.5, and when I add days to my timestamp, it doesn’t keep the hours, minutes and seconds. Example: In addition to that, it returns a wrong result: I would like it to return the value 2021-01-17 09:34:21. Thank you. Answer: date_add truncates. Unnecessary unix_timestamp+from_unixti…
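
The difference between date-only and full-timestamp arithmetic can be illustrated in Python. This is a sketch of the behaviour, not the truncated answer's Hive code: the two function names, the starting timestamp 2021-01-15 09:34:21, and the offset of 2 days are assumptions chosen so that the result matches the desired value from the question.

```python
from datetime import datetime, timedelta

FMT = "%Y-%m-%d %H:%M:%S"


def add_days_date_only(ts: str, days: int) -> str:
    """Mimic a date-based add: the time of day is discarded."""
    parsed = datetime.strptime(ts, FMT)
    return (parsed.date() + timedelta(days=days)).isoformat()


def add_days_keep_time(ts: str, days: int) -> str:
    """Add whole days to the full timestamp, keeping hours, minutes, seconds."""
    parsed = datetime.strptime(ts, FMT)
    return (parsed + timedelta(days=days)).strftime(FMT)


print(add_days_date_only("2021-01-15 09:34:21", 2))
print(add_days_keep_time("2021-01-15 09:34:21", 2))
```

The first function shows the truncation the answer describes; the second keeps the time component because the offset is applied to the timestamp itself rather than to its date part.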

What is the difference between WHERE and JOIN in Hive SQL when joining two tables?

hive hiveql join sql

For example: What is the difference between WHERE and JOIN in Hive SQL when joining two tables? Answer: A join like this is bad practice because, in general, WHERE is applied after the join, and it is up to the optimizer to transform it into a proper JOIN, push down the predicates, and avoid a CROSS join (joi…