I have a table with column_1, column_2, column_3, and select it by: What I want is to add an extra column ‘hour’, it would have 24 possible values from 0 to 23. The outcome is to have every row [column_1, column_2, column_3] repeated 24 times with all possible 24 values of the extra column hour: How should I do it?
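One possible approach, sketched under assumptions: the table name `my_table` is hypothetical (the question does not show it), and `posexplode` needs Hive 0.13 or later. `space(23)` yields 23 spaces, splitting on a single space yields 24 empty strings, and `posexplode` emits their positions 0 through 23, so each source row is repeated once per hour.

```sql
-- my_table is a placeholder name; column_1..column_3 come from the question.
-- The LATERAL VIEW multiplies every source row by the 24 generated positions.
SELECT t.column_1,
       t.column_2,
       t.column_3,
       h.pos AS `hour`   -- backticks, since hour is also a built-in function name
FROM my_table t
LATERAL VIEW posexplode(split(space(23), ' ')) h AS pos, val;
```

A cross join against a small 24-row helper table would give the same shape; the lateral view just avoids creating that table.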
Tag: hive
Hive SQL – time interval in 5 minutes
My data is too large to analyze since it is collected every second or so. To reduce the data, I would like to group it into 5-minute intervals. I tried converting into unix timestamp and reverting it back, but it didn’t work. I tried something like this Original data or output Desired output Answer I don’t know Hive, but make
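A minimal sketch of the unix-timestamp route the question mentions, not the truncated answer's method. The names `events` and `event_time` are hypothetical, `event_time` is assumed to be a TIMESTAMP (or a `yyyy-MM-dd HH:mm:ss` string), and `COUNT(*)` stands in for whatever aggregate is actually needed.

```sql
-- events and event_time are placeholder names, not from the question.
-- 300 seconds = 5 minutes: flooring the epoch seconds to a multiple of 300
-- maps every row to the start of its 5-minute interval.
SELECT from_unixtime(floor(unix_timestamp(event_time) / 300) * 300) AS interval_start,
       COUNT(*) AS n_rows
FROM events
GROUP BY from_unixtime(floor(unix_timestamp(event_time) / 300) * 300);
```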
How to include 0 results in COUNT with GROUP BY in HiveQL
I’m a newbie in Hive. I want to include 0 rows in the results. I have one table like this: this is my query: the example result is: but my desired result is: How can I get a 0 in the results? Answer You can do this, but you need to remove the WHERE clause. You can also do it using a self join. EDIT – I
How to create a Hive external table in Parquet format
I am trying to create an external table in Hive on HDFS with the following query. I am getting the error: Error while compiling statement: FAILED: ParseException line 11:2 missing EOF at ‘LOCATION’ near ‘)’ What is the best way to create a Hive external table with data stored in Parquet format? Answer I am able to create the table after removing the property TBLPROPERTIES("Parquet.compression"="SNAPPY")
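The original statement is not shown, so the following is only a guess at the cause. Hive's `CREATE TABLE` grammar expects `LOCATION` before `TBLPROPERTIES`, and a `TBLPROPERTIES(...)` clause placed ahead of `LOCATION` would produce a "missing EOF at 'LOCATION'" error like the one quoted. A sketch with the clauses in the accepted order, using hypothetical database, table, column, and path names and the conventional lowercase property key:

```sql
-- All names and the path below are placeholders.
CREATE EXTERNAL TABLE IF NOT EXISTS my_db.my_table (
  id   BIGINT,
  name STRING
)
STORED AS PARQUET
LOCATION '/path/to/parquet/data'
TBLPROPERTIES ('parquet.compression' = 'SNAPPY');
```

The compression property only affects files that Hive writes later; existing Parquet files under the location are read with whatever codec they were written with.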
HIVE CBO. Wrong results with Hive SQL query with MULTIPLE IN conditions in where clause
I am running one SQL query in Hive and it gives different results with CBO enabled and disabled. The results are wrong when CBO is enabled (set hive.cbo.enable=true;). Prerequisites: Apache Hadoop 2.10.1 + Apache Hive 2.3.6 installed. (I tried to reproduce the issue with Apache Hive 3+ version and Hadoop 3+ version and they work fine.) Actions to reproduce: 1)
How to replace EXISTS in Hive with two correlated subqueries
I have a query that looks like this I researched and read that in Hive IN and EXISTS are not supported statements. I read that a workaround for this would be to use a LEFT JOIN. I have tried this but I am having trouble with the GROUP BY u.id. I read that this always needs to be paired with
COUNT with CASE WHEN is showing the same result when using division
I have the following query which returns the sold products: I want to calculate the percentage of sold products compared to all products for product_category = 7: I get 100 as the result, but when I execute each query separately they don’t return the same result. Answer count() counts both 0s and 1s; it does not count NULLs. Use ELSE
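To illustrate the point about `count()`, here is a hedged sketch. The table `products` and the flag column `sold` are hypothetical, since the original query is not shown; only `product_category = 7` comes from the question. Two equivalent ways to count only the matching rows are shown.

```sql
-- products and sold are placeholder names.
-- pct_sold_count: the CASE has no ELSE branch, so non-matching rows become
--   NULL and COUNT skips them.
-- pct_sold_sum: ELSE 0 is harmless with SUM, because adding 0 leaves the
--   total unchanged.
SELECT 100 * COUNT(CASE WHEN sold = 1 THEN 1 END) / COUNT(*)        AS pct_sold_count,
       100 * SUM(CASE WHEN sold = 1 THEN 1 ELSE 0 END) / COUNT(*)   AS pct_sold_sum
FROM products
WHERE product_category = 7;
```

Writing `COUNT(CASE WHEN sold = 1 THEN 1 ELSE 0 END)` instead would count every row, which makes the ratio come out as 100.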
Convert a single row into 2 rows on Impala/Hive
I have a huge table with millions of rows/IDs in the below format. I need to convert this into the below format so that the values are in 2 rows as shown below. Can you please help me with an impala/hive query to help with this? Thanks a lot. Answer I think a way would be this one:
How can I add days to a Hive timestamp without losing hours, minutes and seconds
I am using Hive 2.6.5 and when I want to add days to my timestamp, it doesn’t keep the hours, minutes and seconds. Example: in addition to that, it returns a wrong result: I would like it to return the value 2021-01-17 09:34:21 Thank you Answer date_add truncates. Unnecessary unix_timestamp+from_unixtime conversion. Convert to timestamp, add interval: Result: Timestamp
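The "convert to timestamp, add interval" step could look like the sketch below. The input literal is an assumption chosen to match the desired output (the original example was stripped from the excerpt), and interval arithmetic needs Hive 1.2.0 or later.

```sql
-- The input value is assumed; only the target 2021-01-17 09:34:21 is from the question.
-- Adding a day interval to a TIMESTAMP keeps the time-of-day part,
-- whereas date_add returns a date without the time.
SELECT CAST('2021-01-16 09:34:21' AS TIMESTAMP) + INTERVAL '1' DAY AS plus_one_day;
```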
What is the difference between WHERE and JOIN in Hive SQL when joining two tables?
For example, what is the difference between WHERE and JOIN in Hive SQL when joining two tables? Answer A join like this is bad practice because, in general, WHERE is applied after the join, and it is up to the optimizer to transform it into a proper JOIN, push down the predicates, and avoid a CROSS join (a join without an ON condition). Always use
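The example from the question is not shown, so the two forms below are an illustrative sketch with hypothetical tables `a` and `b` joined on `id`. Both express the same inner join; the second states the join condition explicitly instead of relying on the optimizer to derive it from the WHERE clause.

```sql
-- a, b, id and val are placeholder names.

-- Implicit form: tables separated by a comma, join condition in WHERE.
SELECT a.id, b.val
FROM a, b
WHERE a.id = b.id;

-- Explicit form: join condition in ON.
SELECT a.id, b.val
FROM a
JOIN b ON a.id = b.id;
```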