I have the following tables and need to combine them in Hive. Could anyone please help me with how to achieve this? I tried the date part with coalesce and it works fine, but the fam part does not merge into a single column. I would really appreciate your help. Thanks, Babu Answer You can use a full outer join. However, union with left
Tag: hive
SQL: Expression Not in GROUP BY Key
I have a transaction table t1 in Hive that looks like this: store_id cust_id zip_code transaction_count spend 1000 100 123 3 50 2000 200 …
how to merge multiple rows into single in MSSQL
This is my data:
id        segment     country  product    status     month  year
83916512  Government  Null     Null       Null       Null   2014
83916512  Null        Germany  Null       Null       Null   2014
83916512  Null        Null     Carretera  Null       Null   2014
83916512  Null        Null     Null       completed  Null   2014
83916512  Null        Null     Null       Null       June   2014
83916512  Null        Null     Null       Null       Null   2014
I want the output below. Can anybody help?
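The desired output is not shown in this excerpt. A minimal sketch, assuming the goal is one row per id and year with the non-null value from each column, and assuming the Null entries are real NULLs rather than the string 'Null' (the table name my_data is hypothetical):

```sql
-- MAX() ignores NULLs, so each column collapses to its single non-null value.
-- [month] and [year] are bracketed because they are also T-SQL function names.
SELECT id,
       MAX(segment) AS segment,
       MAX(country) AS country,
       MAX(product) AS product,
       MAX(status)  AS status,
       MAX([month]) AS [month],
       [year]
FROM my_data              -- hypothetical table name
GROUP BY id, [year];
```

If a column can hold more than one distinct non-null value per id, MAX() keeps only the largest one, so this only fits data shaped like the sample above.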
how to join two hive tables with embedded array of struct and array on pyspark
I am trying to join two Hive tables on Databricks. tab1: The schema of “some_questions” “some_questions” example: tab2: I need to join tab1 and tab2 by “question_id” such that I get a new table. I tried to join them with PySpark, but I am not sure how to decompose the array with the embedded struct/array. Thanks. Answer For SparkSQL, you can
Hive: randomly select N values from distinct values of one column
Suppose I have a dataset like this I would like to randomly select, say, 3 values from the distinct ID values. One possibility is to get a table like this How shall I do that in Hive? Answer Here is one option using a join and rand(): The subquery randomly selects 3 ids, then the outer query brings all related
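The answer's query is not included in this excerpt. The join-and-rand() approach it describes could look roughly like this, assuming a hypothetical table my_table with an id column:

```sql
-- Inner subquery: take the distinct ids, attach a random number, keep 3 of them.
-- Outer query: join back to fetch every row belonging to the sampled ids.
SELECT t.*
FROM my_table t                                   -- hypothetical table name
JOIN (
    SELECT id
    FROM (
        SELECT id, rand() AS r
        FROM (SELECT DISTINCT id FROM my_table) d
    ) x
    ORDER BY r
    LIMIT 3
) s
ON t.id = s.id;
```

Because rand() is unseeded here, each run picks a different set of ids. ORDER BY forces a single reducer in Hive, which is usually acceptable when the number of distinct ids is modest.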
Getting NULL after combining strings between date functions
Given a date column with a value 2020-05-01, I want to return 2020-Q2. The QUARTER() function is not available due to the Hive version we are using. I can get the quarter number with: (INT((MONTH(yyyy_mm_dd)-1)/3)+1). When I try to combine this with the YEAR() function and strings, I get null: How can I properly concatenate this to get the desired
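The failing query is not shown in this excerpt, so the cause of the NULL is a guess. One common cause is joining strings with +, which Hive treats as arithmetic. A sketch that builds the label with CONCAT and explicit casts instead, using the column name from the question and a hypothetical table name my_table:

```sql
-- CONCAT needs string arguments, so both numeric parts are cast explicitly.
-- FLOOR((month - 1) / 3) + 1 maps months 1-3 to 1, 4-6 to 2, 7-9 to 3, 10-12 to 4.
SELECT CONCAT(
           CAST(YEAR(yyyy_mm_dd) AS STRING),
           '-Q',
           CAST(FLOOR((MONTH(yyyy_mm_dd) - 1) / 3) + 1 AS STRING)
       ) AS year_quarter
FROM my_table;            -- hypothetical table name
```

For 2020-05-01 the month is 5, so the quarter expression evaluates to 2 and the result is 2020-Q2.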
Convert PostgreSQL query to Hive/MySQL
I have this table: I want a situation where each footballer appears only once in a new table. For instance, Messi appears twice, but I want to take any one occurrence of Messi in the new table. I am not sure how to convert it to either Hive or MySQL. This is what I want the desired results to look
Create Missing Data Hive SQL
I have a table that has an activity date for when things change, such as: 2020-08-13 123 Upgrade 2020-08-17 123 Downgrade 2020-08-21 123 Upgrade. Basically, this is in relation to a line; there are 3 …
How to transform data into a map using group by in Hive SQL?
I have data like below …and I want to create a map with lecture as the key and count as the value. How can I get an output like below? Answer If you can live with count being a string, you will probably be able to use the Hive str_to_map() function to get the desired map. That will require a couple of
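The sample data and the rest of the answer are cut off here. As a rough sketch of the str_to_map() idea, assume a hypothetical table lecture_counts with columns student_id, lecture, and an already computed cnt, and assume lecture names contain neither ',' nor ':':

```sql
-- Build "lecture:count" pairs, join them with commas, then parse into a map.
-- The resulting map is map<string,string>, so the counts come back as strings.
SELECT student_id,
       str_to_map(
           concat_ws(',', collect_list(concat(lecture, ':', CAST(cnt AS STRING)))),
           ',',   -- delimiter between key-value pairs
           ':'    -- delimiter between key and value
       ) AS lecture_map
FROM lecture_counts       -- hypothetical table and column names
GROUP BY student_id;
```

If the counts still have to be computed, a subquery with COUNT(*) grouped by student_id and lecture would need to feed this query.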
Cross Join in Hive
I’m trying to create a new column time_period while running the query below. If the date difference between a given transaction and the most recent transaction in the reference table is fewer than 7 days, then mark it as a recent transaction, else mark it as an old transaction. However, the query below is generating an error in the subquery
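The original query and its error are not shown in this excerpt. One way to avoid a subquery in the SELECT list, sketched with hypothetical table and column names (transactions, reference_table, transaction_date), is to compute the most recent reference date once and cross join it to every row:

```sql
-- The subquery returns exactly one row, so the cross join does not multiply rows.
-- datediff(end, start) returns the number of days from start to end.
SELECT t.*,
       CASE
           WHEN datediff(m.max_date, t.transaction_date) < 7 THEN 'recent'
           ELSE 'old'
       END AS time_period
FROM transactions t                               -- hypothetical table name
CROSS JOIN (
    SELECT MAX(transaction_date) AS max_date
    FROM reference_table                          -- hypothetical table name
) m;
```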