
Tag: scala

How to avoid a self-join in Spark Scala

I have a DataFrame called product_relationship_current, and I’m doing a self-join to retrieve a new DataFrame like below: First I’m giving it an alias so I can treat it as two different DataFrames: And then I’m doing a self-join to get a new DataFrame: But I’m looking for another way to do that without a self-join, so I don’t …

Filter a DataFrame using a subset of it and two specific fields in Spark/Scala [closed]

Closed. This question needs debugging details. It is not currently accepting answers. Closed 10 months ago. I have a Scala/Spark question. I’m using Spark 2.1.1. I have a DataFrame …

Create rows from columns in an Apache Spark Dataset

I’m trying to create rows from the existing columns of a dataset. Here is my case:

InputDataset

accountid           payingaccountid     billedaccountid     startdate                   enddate
0011t00000MY1U3AAL  0011t00000MY1U3XXX  0011t00000ZZ1U3AAL  2020-06-10 00:00:00.000000  NULL

And I would like to have something like this:

accountid           startdate                   enddate
0011t00000MY1U3AAL  2021-06-10 00:00:00.000000  NULL
0011t00000MY1U3XXX  2021-06-10 00:00:00.000000  NULL
0011t00000ZZ1U3AAL  2021-06-10 00:00:00.000000  NULL

In the input dataset the columns billedaccountid and …
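One possible way to get one row per account id, given only the column names shown in the excerpt, is to pack the three id columns into an array and explode it. This is a minimal sketch, not the answer from the original thread: the object name `AccountRows`, the helper `toRows`, and the local `SparkSession` setup are assumptions made for illustration, and the dates are carried through unchanged.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{array, col, explode}

object AccountRows {
  // Hypothetical helper: turns the three id columns into one row each,
  // keeping startdate and enddate as they are.
  def toRows(input: DataFrame): DataFrame =
    input.select(
      explode(array(col("accountid"), col("payingaccountid"), col("billedaccountid"))).as("accountid"),
      col("startdate"),
      col("enddate")
    )

  def main(args: Array[String]): Unit = {
    // Local session for illustration only.
    val spark = SparkSession.builder().master("local[*]").appName("account-rows").getOrCreate()
    import spark.implicits._

    // Sample row copied from the excerpt; enddate is NULL there, modelled as None here.
    val input = Seq(
      ("0011t00000MY1U3AAL", "0011t00000MY1U3XXX", "0011t00000ZZ1U3AAL",
        "2020-06-10 00:00:00.000000", Option.empty[String])
    ).toDF("accountid", "payingaccountid", "billedaccountid", "startdate", "enddate")

    toRows(input).show(false)
    spark.stop()
  }
}
```

If payingaccountid or billedaccountid can be null, the exploded row would carry a null accountid; adding a filter on `col("accountid").isNotNull` after the select would drop such rows.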

Spark SQL: keep a non-key row after join

I have two datasets as follows: and: I want to join the two datasets so that I get the ingredient information for each smoothie whose price is lower than 15$, but keep the smoothies whose price is higher as well, filling in the ingredient field with the string "To be communicated". I tried smoothieDs.join(ingredientDs).filter(col("price").lt(15)) and it gives: But my expected …

Single quotes cause trouble while filtering in Slick

I have statements such as below and they fail with exceptions such as this I have tried to escape the single quote but wasn’t successful. When I tried to insert a record such as this: The exception I got is: Please note that I am using H2 in MySQL mode to run my tests. Answer: That error suggests to me …

Finding largest number of location IDs per hour from each zone

I am using Scala with Spark and having a hard time understanding how to calculate the maximum count of pickups from a location for each hour. Currently I have a DataFrame with three columns (Location, hour, Zone), where Location is an integer, hour is an integer from 0 to 23 giving the hour of the day, and Zone is a string. Something like this …
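Under one reading of the question (for each zone and hour, find the location with the most pickups), the calculation could be sketched as a grouped count followed by a window ranking. The column names Location, hour and Zone come from the excerpt; the object name `BusiestLocation`, the helper `busiestPerZoneHour`, and the sample rows are assumptions for illustration only.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, desc, row_number}

object BusiestLocation {
  // Hypothetical helper: one output row per (Zone, hour), holding the
  // Location with the highest pickup count and that count.
  def busiestPerZoneHour(pickups: DataFrame): DataFrame = {
    // Each input row is treated as one pickup.
    val counts = pickups.groupBy("Zone", "hour", "Location").count()
    // Rank locations inside each (Zone, hour) group by descending count.
    val byZoneHour = Window.partitionBy("Zone", "hour").orderBy(desc("count"))
    counts
      .withColumn("rank", row_number().over(byZoneHour))
      .filter(col("rank") === 1)
      .drop("rank")
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("busiest-location").getOrCreate()
    import spark.implicits._

    // Made-up sample data in the (Location, hour, Zone) shape described above.
    val pickups = Seq((1, 0, "A"), (1, 0, "A"), (2, 0, "A"), (3, 1, "B"))
      .toDF("Location", "hour", "Zone")

    busiestPerZoneHour(pickups).show(false)
    spark.stop()
  }
}
```

`row_number` keeps exactly one location per group and breaks ties arbitrarily; using `rank` instead would keep every tied location.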

SQL Database using JDBC + parameterize SQL Query + Databricks

In Databricks I am reading a SQL table as: How can I parameterize SourceSystem and RuleCode in the WHERE clause? I was referring to: Answer: If you import the Spark implicits, you can create references to columns with the dollar ($) interpolator. You can also use the Column API to build the logic; it will be something like this. As you can …
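The approach the answer describes (column references through the `$` interpolator, with the filter values passed in as ordinary Scala parameters) might look roughly like the sketch below. The column names SourceSystem and RuleCode come from the question; the object name `RuleFilter`, the helper `filterRules`, the in-memory sample table, and its values are assumptions, and the JDBC read from the original question is left out.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object RuleFilter {
  // Hypothetical helper: the two filter values arrive as plain parameters
  // instead of being concatenated into a SQL string.
  def filterRules(spark: SparkSession, rules: DataFrame,
                  sourceSystem: String, ruleCode: String): DataFrame = {
    import spark.implicits._ // brings the $"column" interpolator into scope
    rules.filter($"SourceSystem" === sourceSystem && $"RuleCode" === ruleCode)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("rule-filter").getOrCreate()
    import spark.implicits._

    // Stand-in for the table that the question reads over JDBC; values are made up.
    val rules = Seq(("SAP", "R1", "first rule"), ("SAP", "R2", "second rule"), ("CRM", "R1", "third rule"))
      .toDF("SourceSystem", "RuleCode", "Description")

    filterRules(spark, rules, "SAP", "R1").show(false)
    spark.stop()
  }
}
```

Because the values are compared through the Column API rather than spliced into query text, quoting and escaping of the parameter values is handled by Spark.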