Tag: hadoop

HIVE CBO. Wrong results with Hive SQL query with MULTIPLE IN conditions in where clause

I am running a SQL query in Hive, and it gives different results with CBO enabled and disabled. The results are wrong when CBO is enabled (set hive.cbo.enable=true;). Prerequisites: Apache Hadoop 2.10.1 + Apache Hive 2.3.6 installed. (I tried to reproduce the issue with Apache Hive 3+ version and Hadoop 3+ v…

Extract year from timestamp in hive

cloudera hadoop hive nosql sql

I am writing a query to show the data entries for a specific year. The date is stored as dd/mm/yyyy hh:mm:ss (Date TIMESTAMP, e.g. 12/2/2014 0:00:00). I am trying to display two columns (name, orderdate) filtered by a specific year (the year from orderdate). The requirement is to enter the specific year(20…
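The filtering logic described in this question can be sketched outside Hive. The Python below is only an illustration, not Hive syntax: it assumes orderdate arrives as a string in the dd/mm/yyyy hh:mm:ss layout quoted above, and the sample rows and the helper name rows_for_year are hypothetical.

```python
from datetime import datetime

# Assumed input layout, taken from the example value "12/2/2014 0:00:00".
DATE_FORMAT = "%d/%m/%Y %H:%M:%S"


def rows_for_year(rows, year):
    """Return the (name, orderdate) pairs whose orderdate falls in `year`."""
    return [
        (name, orderdate)
        for name, orderdate in rows
        if datetime.strptime(orderdate, DATE_FORMAT).year == year
    ]


# Hypothetical sample rows; the names are invented for illustration.
sample = [
    ("alice", "12/2/2014 0:00:00"),
    ("bob", "3/11/2015 8:30:00"),
]
print(rows_for_year(sample, 2014))
```

Parsing the value first and comparing the year afterwards avoids fragile substring matching on the raw text.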

Hive – Query to get Saturday as week start date for a given date

hadoop hive sql

I have a requirement in Hive to calculate the week start date for a given date, with Saturday as the first day of the week. Eg) I tried using pmod and other date functions but did not get the desired output. Any insight is much appreciated. Answer: Hive offers next_day(), which can be adapted for this purpose. I think the logic you want…
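The arithmetic behind a Saturday-based week start can be sketched in Python. This is a rough illustration, not the answer's own query: it assumes the week start is the most recent Saturday on or before the given date, and the function name week_start_saturday is hypothetical.

```python
from datetime import date, timedelta

SATURDAY = 5  # date.weekday() numbers Monday as 0, so Saturday is 5


def week_start_saturday(d):
    """Return the most recent Saturday on or before `d` (assumed definition)."""
    days_since_saturday = (d.weekday() - SATURDAY) % 7
    return d - timedelta(days=days_since_saturday)


# A Tuesday maps back to the preceding Saturday; a Saturday maps to itself.
print(week_start_saturday(date(2020, 3, 31)))
print(week_start_saturday(date(2020, 3, 28)))
```

Under the same assumption, a comparable Hive expression would be date_sub(next_day(dt, 'SAT'), 7), where dt is a hypothetical date column: next_day() returns the first Saturday strictly after dt, so stepping back seven days lands on the Saturday on or before it.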

Combining Aggregate Function with resampling in Impala

hadoop impala qsqlquery sql

I have a table in Hadoop containing data for different sensor units, with a sampling time ts of 1 ms. I can resample the data for a single unit with a combination of different aggregate functions using the following query in Impala (Let’s say I want to resample the data for each 5-minute interval using LAST_…

Setting transactional-table properties results in external table

hadoop hive impala parquet sql

I am creating a managed table via Impala as follows: This should result in a managed table which does not support Hive ACID. However, when I run the command I still end up with an external table. Why is this? Answer: I found out in the Cloudera documentation that neglecting the EXTERNAL-keyword when creating t…

Exclude records with certain values in Qubole

hadoop hive hiveql qubole sql

Using Qubole, I have Table A (columns in json parsed…). I need to select only the IDs that have Recommendation GOOD but Decision BAD, so the output should be 3. I tried: Answer: Use analytic functions. Demo: Result:

How to combine two tables to get a single table in Hive

hadoop hive sql

I have the following tables and need to combine them in Hive. Could anyone please help me with how to achieve this? I tried the date part with coalesce and it works fine, but I am not able to merge the fam part into a single column. Really appreciate your help. Thanks, Babu Answer: You can use full outer join. However, union with…

SQL Nested Joins (Case Statement and Join)

case hadoop hive join sql

Hive DBMS; two tables, A and B. Table A Table B Question: I am trying to execute a query that joins table A with table B, first on prnt_id; if it’s “unknown”, then on sub_id; and if that is “unknown”, on ac_nm. Desired Output: Answer: You must use LEFT joins of T…

Hive – How to read a column from a table which is of type list

database hadoop hive hql sql

I have a Hive table named customer, which has a column named cust_id of list type, with the following values: cust_id Now I want to read only this column cust_id in my select query, so that all these list values come back as separate values of the column cust_id: Basically I want to fetch all the…

Impala: Split single row into multiple rows based on Date and time

cloudera hadoop hive impala sql

I want to split a single row into multiple rows based on time. Expected output is below: A day runs from 00:00 AM to 00:00 AM the next day. When the EndDate time is later than 00:00 AM (midnight), the row should be split into two rows. The first row's end date is 30/03/2020 11:59:00 and the next row starts at 31/03/2020 00:00:00. Pleas…
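The splitting rule described here can be sketched in Python. This is only an illustration of the logic, not Impala syntax, and it rests on assumptions: each segment before the last is taken to end one minute before the following midnight (read from the example values above), an interval ending exactly at midnight is not split, and the function name split_at_midnight is hypothetical.

```python
from datetime import datetime, timedelta


def split_at_midnight(start, end):
    """Split [start, end] into per-day segments.

    Assumption: a segment that is cut at midnight ends one minute earlier,
    and the next segment starts exactly at 00:00:00.
    """
    segments = []
    current = start
    while True:
        next_midnight = datetime.combine(
            current.date() + timedelta(days=1), datetime.min.time()
        )
        if end <= next_midnight:
            # The remainder fits inside the current day: emit it and stop.
            segments.append((current, end))
            return segments
        segments.append((current, next_midnight - timedelta(minutes=1)))
        current = next_midnight


# Hypothetical interval crossing one midnight.
for seg_start, seg_end in split_at_midnight(
    datetime(2020, 3, 30, 22, 0), datetime(2020, 3, 31, 2, 0)
):
    print(seg_start, seg_end)
```

The loop handles intervals spanning several days as well, emitting one segment per calendar day.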