In the Hive query below, I need to read the null/empty “string” tags from the XML content as well. Currently, only the non-null “string” tags are included in the XPATH() list…
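The query itself is not shown, so the following is an assumption. An XPath expression that ends in text(), such as //string/text(), matches only existing text nodes. An empty element like <string/> or <string></string> has no text node, so it contributes nothing to the resulting list. The Python sketch below uses xml.etree.ElementTree as a stand-in for this XPath behaviour; it is not Hive's xpath() UDF. It contrasts two approaches. The first keeps only non-empty text. The second selects the elements themselves and maps missing text to an empty string, which preserves every tag's position. The sample XML and both function names are hypothetical.

```python
import xml.etree.ElementTree as ET

def non_empty_only(xml_text):
    """Mimic a text()-based selection: empty <string> elements contribute nothing."""
    root = ET.fromstring(xml_text)
    return [elem.text for elem in root.iter("string") if elem.text]

def string_tag_values(xml_text):
    """Return the text of every <string> element, keeping empty ones as ''."""
    root = ET.fromstring(xml_text)
    # iter() visits every <string> element, including <string/> and <string></string>,
    # whose .text is None; mapping None to '' keeps one entry per tag.
    return [elem.text or "" for elem in root.iter("string")]

# Hypothetical sample document.
sample = "<list><string>a</string><string/><string>c</string><string></string></list>"
print(non_empty_only(sample))     # ['a', 'c']
print(string_tag_values(sample))  # ['a', '', 'c', '']
```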
Tag: hiveql
Performance difference with WHERE condition in subquery/CTE
Is there a performance difference between applying the WHERE condition to a subquery data source and applying it to the joined statement? Let’s say I have two Hive tables, A and B, which are both partitioned on the field date. Is that query’s performance the same as the following? Answer The
HiveQL Declaration
What is the difference between CHAR() and VARCHAR() declarations in HQL?
How to get the first and last value of one column based on another column's values
I have data that looks like the following. I want to extract the first and last values of the “TS” column for each run of “Col” values (A, B, and C), that is, each time the “Col” value changes. The expected output should be as follows: Thanks for your help in advance! Answer This is a type of gaps-and-islands problem. This version is probably best addressed
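The gaps-and-islands grouping can be illustrated with a minimal Python sketch. It assumes rows arrive as hypothetical (TS, Col) pairs already sorted by TS. itertools.groupby merges only adjacent equal keys, so a new group (island) starts exactly when Col changes. The first and last TS of each group are the values the question asks for. This is not the answer's SQL. In SQL, one common formulation labels islands with the difference between ROW_NUMBER() over all rows and ROW_NUMBER() partitioned by Col, both ordered by TS. The sample data and the function name are illustrative.

```python
from itertools import groupby

def first_last_per_run(rows):
    """Collapse consecutive rows with the same Col value into one island.

    rows: (TS, Col) pairs already ordered by TS.
    Returns a list of (col, first_ts, last_ts), one entry per run.
    """
    result = []
    # groupby only groups *adjacent* equal keys, so A, A, B, A yields three runs.
    for col, run in groupby(rows, key=lambda r: r[1]):
        run = list(run)
        result.append((col, run[0][0], run[-1][0]))
    return result

# Hypothetical sample data as (TS, Col) pairs, ordered by TS.
rows = [(1, "A"), (2, "A"), (3, "B"), (4, "B"), (5, "A"), (6, "C")]
print(first_last_per_run(rows))
# [('A', 1, 2), ('B', 3, 4), ('A', 5, 5), ('C', 6, 6)]
```

Note that A appears twice in the output because it forms two separate islands. A plain GROUP BY Col would merge them into one row.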
Exclude records with certain values in Qubole
Using Qubole, I have Table A (columns parsed from JSON…). I need to select only the IDs that have Recommendation GOOD but Decision BAD, so the output should be 3. I tried: Answer Use analytic functions. Demo: Result:
Filtering records not containing numbers
I have a table that stores numbers as strings. Ideally, the table should contain only 10-digit numbers in string format, but it has many junk values. I want to filter out the records that are not ideal …
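The filter condition here is "the whole value is exactly ten digits". The Python sketch below uses re.fullmatch to express this, which rejects shorter and longer values, embedded letters, surrounding spaces, and NULLs. In Hive, the same test is usually written with RLIKE and an explicitly anchored pattern such as '^[0-9]{10}$'. Without anchors, a partial match inside a longer string would also pass. The sample values and the name is_ideal are hypothetical.

```python
import re

# Exactly ten ASCII digits, nothing before or after.
TEN_DIGITS = re.compile(r"[0-9]{10}")

def is_ideal(value):
    """True if value is a string consisting of exactly 10 digits."""
    return value is not None and TEN_DIGITS.fullmatch(value) is not None

# Hypothetical column values, including typical junk.
values = ["0123456789", "12345", "12345678901", "12345abcde", None, " 0123456789"]
print([v for v in values if is_ideal(v)])      # ['0123456789']
print([v for v in values if not is_ideal(v)])  # the records to filter out
```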
Getting NULL when concatenating strings with date function results
Given a date column with the value 2020-05-01, I want to return 2020-Q2. The QUARTER() function is not available in the Hive version we are using. I can get the quarter number with (INT((MONTH(yyyy_mm_dd)-1)/3)+1). When I try to combine this with the YEAR() function and strings, I get NULL: How can I properly concatenate this to get the desired
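The arithmetic in the question is sound: for May, (5 - 1) / 3 truncates to 1, and adding 1 gives quarter 2. The failing query is not shown. One likely cause, stated here as an assumption, is joining the pieces with an arithmetic operator rather than a string function. In Hive, string pieces are normally joined with CONCAT(). The Python sketch below only checks the quarter formula and the intended "YYYY-Qn" label. It converts each piece to a string explicitly before joining. The function name year_quarter is hypothetical.

```python
from datetime import date

def year_quarter(d):
    """Format a date as 'YYYY-Qn' using the question's INT((MONTH(d)-1)/3)+1 formula."""
    quarter = (d.month - 1) // 3 + 1  # integer division plays the role of INT(...)
    # Convert each piece to a string explicitly before joining them.
    return str(d.year) + "-Q" + str(quarter)

print(year_quarter(date(2020, 5, 1)))  # 2020-Q2
```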
Convert PostgreSQL query to Hive/MySQL
I have this table: I want each footballer to appear only once in a new table. For instance, Messi appears twice, but I want to keep any one occurrence of Messi in the new table. I am not sure how to convert it to either Hive or MySQL. This is what I want the desired results to look
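The requirement is "keep one arbitrary row per player". In Hive and in MySQL 8.0+, this is commonly expressed as ROW_NUMBER() OVER (PARTITION BY player ORDER BY ...), keeping only the rows where the row number is 1. The Python sketch below implements the same rule procedurally, using "first row seen" as the deterministic choice. The table's real columns are not shown, so the (player, value) rows and the function name are hypothetical.

```python
def one_row_per_player(rows):
    """Keep the first row seen for each player; later duplicates are dropped.

    Any occurrence is acceptable per the question, so "first seen" is just one
    arbitrary but deterministic choice (like ROW_NUMBER() ... = 1 with a fixed ORDER BY).
    """
    seen = set()
    kept = []
    for row in rows:
        player = row[0]
        if player not in seen:
            seen.add(player)
            kept.append(row)
    return kept

# Hypothetical rows: (player, value)
rows = [("Messi", "x"), ("Ronaldo", "y"), ("Messi", "z")]
print(one_row_per_player(rows))  # [('Messi', 'x'), ('Ronaldo', 'y')]
```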
Create Missing Data in Hive SQL
I have a table that records an activity date whenever something changes, such as:
2020-08-13 123 Upgrade
2020-08-17 123 Downgrade
2020-08-21 123 Upgrade
Basically, this relates to a line, and there are 3 …
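The question is truncated, so the desired output is an assumption. Here it is taken to be one row per day, with each status carried forward until the day before the next change. The Python sketch below expands the three sample change events into daily rows. The end date is a hypothetical parameter, since nothing in the question says when the last status stops. In Hive, a similar result is often approached by using LEAD() to fetch each row's next change date and then generating the days in between.

```python
from datetime import date, timedelta

def fill_daily(changes, end):
    """Expand change events into one row per day, carrying the latest status forward.

    changes: (date, line_id, status) tuples for a single line, sorted by date.
    end: hypothetical last date to emit (inclusive) for the final status.
    """
    rows = []
    for i, (start, line_id, status) in enumerate(changes):
        # A status holds until the day before the next change, or through `end`.
        if i + 1 < len(changes):
            stop = changes[i + 1][0] - timedelta(days=1)
        else:
            stop = end
        day = start
        while day <= stop:
            rows.append((day, line_id, status))
            day += timedelta(days=1)
    return rows

changes = [
    (date(2020, 8, 13), 123, "Upgrade"),
    (date(2020, 8, 17), 123, "Downgrade"),
    (date(2020, 8, 21), 123, "Upgrade"),
]
filled = fill_daily(changes, end=date(2020, 8, 22))
for row in filled:
    print(row)
```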
Hive: converting Unix timestamps for calculation
I’m trying to subtract timestamps and would like to convert them into a form that can be expressed in minutes. I used regexp_replace to convert a timestamp into this form: The following code converts it to seconds: I have two other timestamps that I wish to convert to seconds, such as: How should I convert these two timestamp
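Hive's unix_timestamp(string, pattern) parses a string using a date pattern and returns seconds since the epoch. Two such values can therefore be subtracted and divided by 60 to get minutes. Giving each timestamp a pattern that matches its own layout may make a separate regexp_replace step unnecessary. The Python sketch below follows the same parse-subtract-divide approach. The question's actual timestamp formats are not shown, so the "%Y-%m-%d %H:%M:%S" default and the second layout used in the tests are assumptions. The function name minutes_between is hypothetical.

```python
from datetime import datetime

# Assumed layout; the question's actual formats are not shown.
DEFAULT_FMT = "%Y-%m-%d %H:%M:%S"

def minutes_between(start, end, fmt=DEFAULT_FMT):
    """Parse two timestamp strings with the given layout and return end - start in minutes."""
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return delta.total_seconds() / 60

print(minutes_between("2020-08-13 10:00:00", "2020-08-13 11:30:00"))  # 90.0
```

Each input layout needs its own format string, in the same way that each Hive unix_timestamp call needs a pattern matching its input.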