I am running a SQL query in Hive that returns different results depending on whether CBO is enabled. The results are wrong when CBO is enabled (set hive.cbo.enable=true;). Prerequisites: Apache Hadoop 2.10.1 + Apache Hive 2.3.6 installed. (I tried to reproduce the issue with Apache Hive 3+ and Hadoop 3+, and they work fine.) Steps to reproduce: 1)
Tag: hadoop
Extract year from timestamp in hive
I am writing a query to show the data entries for a specific year. The date is stored as dd/mm/yyyy hh:mm:ss (Date TIMESTAMP – e.g. 12/2/2014 0:00:00). I am trying to display two columns (name, orderdate) filtered by a specific year (the year from orderdate). The requirement is to enter only the year (2010, 2020, etc.), not the entire date. I tried using date_format()
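A minimal sketch of one way to filter by year, not taken from the original thread. It assumes a hypothetical table named orders and assumes orderdate is held as a string in day/month/year form; only the column names name and orderdate come from the question.

```sql
-- Hypothetical table name `orders`; columns name and orderdate come from the question.
-- Assumes orderdate is a STRING such as '12/2/2014 0:00:00' (day/month/year).
SELECT name, orderdate
FROM orders
WHERE year(from_unixtime(unix_timestamp(orderdate, 'd/M/yyyy H:mm:ss'))) = 2014;

-- If orderdate is already a true TIMESTAMP column, no conversion is needed:
-- SELECT name, orderdate FROM orders WHERE year(orderdate) = 2014;
```

Comparing year(...) against an integer lets the caller supply just the year (2010, 2020, and so on) rather than a full date.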
Hive – Query to get Saturday as week start date for a given date
I have a requirement in Hive to calculate the week start date for a given date, with Saturday as the first day of the week. E.g.: I tried using pmod and other date functions but did not get the desired output. Any insight is much appreciated. Answer Hive offers next_day(), which can be adapted for this purpose. I think the logic you want is: This is a
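The answer's own query is not preserved in this excerpt. As an illustrative sketch only, one way to adapt next_day() might look like the following, assuming a hypothetical table events with a date column dt:

```sql
-- Hypothetical table `events` with a DATE (or 'yyyy-MM-dd' string) column `dt`.
-- next_day(dt, 'SAT') returns the first Saturday strictly after dt,
-- so stepping back 7 days yields the Saturday on or before dt.
SELECT dt,
       date_sub(next_day(dt, 'SAT'), 7) AS week_start
FROM events;
```

With this form, a dt that already falls on a Saturday maps to itself, and any other day maps to the preceding Saturday.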
Combining Aggregate Function with resampling in Impala
I have a table in Hadoop that holds data for different sensor units with a sampling time ts of 1 ms. I can resample the data for a single unit with a combination of different aggregate functions using the following query in Impala (let’s say I want to resample the data to 5-minute intervals using LAST_VALUE() as aggregate
Setting transactional-table properties results in external table
I am creating a managed table via Impala as follows: This should result in a managed table which does not support Hive ACID. However, when I run the command I still end up with an external table. Why is this? Answer I found out in the Cloudera documentation that omitting the EXTERNAL keyword when creating the table does not mean that the
Exclude records with certain values in Qubole
Using Qubole, I have Table A (columns in json parsed…). I need to select only the IDs which have Recommendation GOOD but Decision BAD, so the output should be 3. I tried: Answer Use analytic functions. Demo: Result:
How to combine two tables to get a single table in Hive
I have the following tables and need to combine them in Hive. Could anyone please help me achieve this? I tried the date part with coalesce and it works fine, but I am not able to merge the fam part into a single column. Really appreciate your help. Thanks, Babu Answer You can use a full outer join. However, union with left
SQL Nested Joins (Case Statement and Join)
Hive DBMS; two tables, A and B. Table A Table B Question: I am trying to execute a query that joins table A with table B first on prnt_id; if it’s “unknown”, then on sub_id; and if that is “unknown”, on ac_nm. Desired Output: Answer You must use LEFT joins of TableB to 3 copies of TableA and filter
Hive – How to read a column from a table which is of type list
I have a Hive table named customer, which has a column named cust_id of list type, with the following values: cust_id Now I want to read only this column in my select query, returning all these list values as separate values of cust_id: Basically I want to fetch all the values of cust_id from
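A hedged sketch of how a list column is commonly flattened in Hive, assuming cust_id is declared as an ARRAY type; the aliases t and id are illustrative and do not come from the question.

```sql
-- Assumes cust_id is an ARRAY column on the customer table from the question.
-- explode() emits one output row per array element; `t` and `id` are illustrative aliases.
SELECT t.id
FROM customer
LATERAL VIEW explode(cust_id) t AS id;
```

Rows whose array is empty or NULL drop out of this result; LATERAL VIEW OUTER keeps them, with a NULL id.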
Impala: Split single row into multiple rows based on Date and time
I want to split a single row into multiple rows based on time. Expected output is below: A day runs from 00:00 AM to 00:00 AM the next day. When the EndDate time is later than 00:00 AM (midnight), the row should be split into two rows: the first row ends at 30/03/2020 11:59:00 and the next row starts at 31/03/2020 00:00:00. Please help me to