We have a separate table maintained for conditions/filters. Based on those conditions, filters are to be applied to the base table. Here is sample conditional input data for reference. Based on these input conditions, the filter is to be derived as follows. Please help me build the filter query. Answer: The Spark SQL below will help you build the WHERE …
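One common approach (a sketch only; the conditions table and its columns `col_name`, `operator`, `value` are assumptions, since the actual schema is not shown) is to concatenate each condition row into a single WHERE-clause string:

```sql
-- Build one filter string like "col_a = 'x' AND col_b > 10"
-- from a hypothetical conditions table cond(col_name, operator, value).
SELECT concat_ws(' AND ',
         collect_list(concat(col_name, ' ', operator, ' ', value))) AS where_clause
FROM cond;
```

The resulting string can then be spliced into a query against the base table by the driver program.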
Tag: apache-spark-sql
Transpose Columns having values into Rows
I have a requirement where columns with values need to be transposed into rows. For instance, refer to the table below:

cust:
cust_id | cover1 | cover2 | cover3
1234    | 'PAG'  | Null   | 'TDE'
5678    | …
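A minimal sketch of one way to unpivot such columns in Spark SQL, assuming the table is named `cust` as in the excerpt:

```sql
-- Explode the cover columns into one row each, dropping the nulls.
SELECT cust_id, cover
FROM cust
LATERAL VIEW explode(array(cover1, cover2, cover3)) t AS cover
WHERE cover IS NOT NULL;
```

For cust_id 1234 this yields one row for 'PAG' and one for 'TDE'; the `stack()` table-generating function is an alternative when the column-to-label mapping matters.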
Order of the tables in a JOIN
In Spark SQL I have a query that joins several tables (both large and small). My question is: does the order of these tables matter with respect to query performance? For e.g. select …
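As a rough illustration (table names here are placeholders, not from the question): the Catalyst optimizer may reorder joins regardless of how they are written, but small tables can be hinted for broadcast so their join order stops mattering for shuffle cost:

```sql
-- Hint a small dimension table to be broadcast to every executor,
-- avoiding a shuffle of the large fact table for this join.
SELECT /*+ BROADCAST(small_tbl) */ l.*, s.label
FROM large_tbl l
JOIN small_tbl s
  ON l.key = s.key;
```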
Why does outer reference in SQL subquery produce different results?
I run two SQL queries. The first one has an outer reference to a table from inside the subquery. In the second one I add the same table inside the subquery. The results are different; it fails due to multiple …
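A sketch of the likely situation (the tables `t` and `u` and their columns are invented for illustration): in the first form the inner reference is correlated to the outer row; in the second form the inner copy of the table shadows the outer one, so the subquery becomes uncorrelated and can return more than one row:

```sql
-- Correlated: t.id inside the subquery refers to the current outer row of t.
SELECT t.id,
       (SELECT u.val FROM u WHERE u.id = t.id) AS v
FROM t;

-- Adding t to the subquery's FROM shadows the outer t; the subquery is now
-- an uncorrelated cross join that may return multiple rows per outer row,
-- which fails the single-row requirement of a scalar subquery.
SELECT t.id,
       (SELECT u.val FROM u, t WHERE u.id = t.id) AS v
FROM t;
```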
Split column in hive
I am new to Hive and the Hadoop framework. I am trying to write a Hive query to split a column delimited by the pipe '|' character. Then I want to group up the 2 adjacent values and separate them into …
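A minimal sketch, assuming a table `my_table` with a pipe-delimited string column `raw_col` (names invented). Note that `split()` takes a regex, so the pipe must be escaped:

```sql
-- split() returns an array; index it to pull out the delimited fields.
SELECT split(raw_col, '\\|')[0] AS field0,
       split(raw_col, '\\|')[1] AS field1,
       split(raw_col, '\\|')[2] AS field2
FROM my_table;
```

From there, adjacent fields can be paired with `concat` or restructured with `posexplode` over the array.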
How to get the COUNT of emails for each id in Scala
I use this SQL query to return how many user_id values have more than one email. How would I write the same query against a users DataFrame in Scala? Also, how would I be able to return the exact …
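The original SQL is not shown; a typical form (column names `user_id` and `email` assumed from the question) is below, and it can be run as-is on a registered `users` view via `spark.sql` — the DataFrame equivalent is `groupBy("user_id").agg(...)` followed by a `filter`:

```sql
-- Count emails per user and keep only users with more than one.
SELECT user_id, count(email) AS email_count
FROM users
GROUP BY user_id
HAVING count(email) > 1;
```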
Aggregate data from multiple rows to one and then nest the data
I’m relatively new to Scala and Spark programming. I have a use case where I need to group data by certain columns, get a count of a certain column (using pivot), and then finally I need …
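One possible shape of the aggregation-then-nesting step, sketched in Spark SQL with invented names (`events`, `id`, `category`) since the real schema is truncated away:

```sql
-- First aggregate counts per (id, category), then collapse each id's
-- rows into a single nested array of structs.
SELECT id,
       collect_list(struct(category, cnt)) AS nested
FROM (
  SELECT id, category, count(*) AS cnt
  FROM events
  GROUP BY id, category
)
GROUP BY id;
```

If wide columns per category are wanted instead of a nested array, the inner query can be wrapped in a `PIVOT` clause (Spark 2.4+).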
How to join two tables with same date but different time such that the closest time is chosen?
I am trying to left join a Notes table, where the user uploads notes at some time each day, onto a Score table with a date and time. The dates need to be the same, but I need to choose the match …
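A sketch of one common technique for "closest time" joins (all table, key, and timestamp column names below are assumptions): join on the date, rank the candidate notes by time distance with a window, and keep rank 1:

```sql
-- For each score row, rank same-day notes by absolute time difference
-- and keep only the closest one.
SELECT * FROM (
  SELECT s.*, n.note,
         row_number() OVER (
           PARTITION BY s.score_id
           ORDER BY abs(unix_timestamp(n.note_ts) - unix_timestamp(s.score_ts))
         ) AS rn
  FROM scores s
  LEFT JOIN notes n
    ON to_date(n.note_ts) = to_date(s.score_ts)
)
WHERE rn = 1;
```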
Add column with yesterday sale information on a daily sales database
I’m working with a database that contains daily sales information of different products and stores.

StoreSku | Date       | UnitsSale
A-2134   | 20/04/2019 | 2
A-2135   | 20/04/2019 | 1
A-2134   | …
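A minimal sketch using a window function, assuming the table is called `daily_sales` and `Date` is (or is cast to) a proper date type — with a `dd/MM/yyyy` string as shown, it should be converted with `to_date(Date, 'dd/MM/yyyy')` first so the ordering is chronological:

```sql
-- lag() over each store/SKU, ordered by date, pulls in yesterday's units.
SELECT StoreSku, Date, UnitsSale,
       lag(UnitsSale, 1) OVER (PARTITION BY StoreSku ORDER BY Date) AS UnitsSaleYesterday
FROM daily_sales;
```

If dates can be missing for some stores, a self-join on `date_sub(Date, 1)` is the safer variant, since `lag` only looks at the previous row, not the previous calendar day.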
What’s the default window frame for window functions
Running the following code: The result is: There is no window frame defined in the code above; it looks like the default window frame is rowsBetween(Window.unboundedPreceding, Window.currentRow). I am not sure my understanding of the default window frame is correct. Answer From Spark Gotchas: the default frame specification depends on other aspects of a given window definition: if the ORDER BY clause is specified and …
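The distinction is visible with duplicate ordering keys: with an ORDER BY and no explicit frame, Spark defaults to a RANGE frame (unbounded preceding to current row), under which peer rows with equal keys share the same aggregate, whereas an explicit ROWS frame treats them separately. A small self-contained illustration:

```sql
-- Default frame with ORDER BY is RANGE BETWEEN UNBOUNDED PRECEDING AND
-- CURRENT ROW, so the two rows with v = 2 are peers and get the same sum;
-- the explicit ROWS frame advances row by row instead.
SELECT v,
       sum(v) OVER (ORDER BY v)                                          AS range_default,
       sum(v) OVER (ORDER BY v
                    ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)    AS rows_frame
FROM VALUES (1), (2), (2), (3) AS t(v);
```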