I have a DataFrame called product_relationship_current and I'm doing a self-join to retrieve a new DataFrame, like below. First I give it an alias so I can treat it as two different DataFrames: And then I do a self-join to get the new DataFrame: But I'm looking for another way to do that without a self-join, so I don't …
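A minimal sketch of the alias-and-self-join pattern the question describes; the schema and join columns (product_id, parent_product_id) are hypothetical, since the excerpt doesn't show them:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

val spark = SparkSession.builder().appName("self-join-sketch").master("local[*]").getOrCreate()
import spark.implicits._

// Hypothetical stand-in for product_relationship_current.
val product_relationship_current = Seq(
  ("p1", "p2"),
  ("p2", "p3")
).toDF("product_id", "parent_product_id")

// Alias the same DataFrame twice so it can be joined with itself unambiguously.
val left  = product_relationship_current.alias("l")
val right = product_relationship_current.alias("r")

val joined = left
  .join(right, col("l.parent_product_id") === col("r.product_id"))
  .select(col("l.product_id"), col("r.product_id").as("parent_id"))
```

Depending on what the join actually computes, a window function or a groupBy aggregation over the single DataFrame can sometimes replace the self-join and avoid the extra shuffle.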
Tag: scala
Filter a DataFrame using a subset of it and two specific fields in Spark/Scala [closed]
Closed. This question needs debugging details. It is not currently accepting answers. Closed 10 months ago. I have a Scala/Spark question. I'm using Spark 2.1.1. I have a DataFrame …
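The question body is cut off above, so the sketch below is only a guess at what "filter using a subset of it and two specific fields" might look like: a left_semi join keeps exactly the rows of the original whose two key fields appear in the subset. All names are hypothetical:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("semi-join-sketch").master("local[*]").getOrCreate()
import spark.implicits._

// Hypothetical data, purely for illustration.
val df     = Seq((1, "a", 10.0), (2, "b", 20.0), (3, "a", 30.0)).toDF("id", "key", "value")
val subset = Seq((1, "a"), (3, "a")).toDF("id", "key")

// left_semi keeps only rows of df whose (id, key) pair also occurs in
// subset, and adds no columns from subset to the result.
val filtered = df.join(subset, Seq("id", "key"), "left_semi")
```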
Create rows from columns in an Apache Spark dataset
I'm trying to create rows from the existing columns of a dataset. Here is my case:

Input dataset:

accountid           payingaccountid     billedaccountid     startdate                   enddate
0011t00000MY1U3AAL  0011t00000MY1U3XXX  0011t00000ZZ1U3AAL  2020-06-10 00:00:00.000000  NULL

And I would like to have something like this:

accountid           startdate                   enddate
0011t00000MY1U3AAL  2021-06-10 00:00:00.000000  NULL
0011t00000MY1U3XXX  2021-06-10 00:00:00.000000  NULL
0011t00000ZZ1U3AAL  2021-06-10 00:00:00.000000  NULL

In the input dataset the columns billedaccountid and …
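One way to get that shape, sketched under the assumption that the three id columns simply need to be unpivoted into rows: collapse them into an array and explode it. The sample data mirrors the table above:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{array, col, explode}

val spark = SparkSession.builder().appName("cols-to-rows").master("local[*]").getOrCreate()
import spark.implicits._

val input = Seq(
  ("0011t00000MY1U3AAL", "0011t00000MY1U3XXX", "0011t00000ZZ1U3AAL",
   "2020-06-10 00:00:00.000000", Option.empty[String])
).toDF("accountid", "payingaccountid", "billedaccountid", "startdate", "enddate")

// Collapse the three id columns into an array, then explode the array
// into one row per id, carrying startdate/enddate along with each one.
val result = input.select(
  explode(array(col("accountid"), col("payingaccountid"), col("billedaccountid")))
    .as("accountid"),
  col("startdate"),
  col("enddate"))
```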
Spark SQL: keep a non-key row after join
I have two datasets, as follows: and: I want to join the two datasets so that I can get the ingredient information for each smoothie whose price is lower than $15, but keep the more expensive smoothies too, filling in the string "To be communicated" for the ingredient field. I tried smoothieDs.join(ingredientDs).filter(col("price").lt(15)) and it gives: But my expected …
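A sketch of the usual fix: push the price condition into the join itself and use a left join so the expensive smoothies survive, then fill the missing ingredient with coalesce. The schemas are assumptions, since the excerpt doesn't reproduce the datasets:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{coalesce, col, lit}

val spark = SparkSession.builder().appName("keep-rows").master("local[*]").getOrCreate()
import spark.implicits._

// Hypothetical schemas for the two datasets.
val smoothieDs   = Seq(("s1", "Berry", 10.0), ("s2", "Deluxe", 20.0)).toDF("id", "name", "price")
val ingredientDs = Seq(("s1", "strawberry"), ("s2", "secret")).toDF("id", "ingredient")

// The price filter lives in the join condition, so the left join keeps
// every smoothie; ingredients only attach where price < 15.
val result = smoothieDs
  .join(ingredientDs,
        smoothieDs("id") === ingredientDs("id") && smoothieDs("price") < 15,
        "left")
  .select(smoothieDs("name"), smoothieDs("price"),
          coalesce(col("ingredient"), lit("To be communicated")).as("ingredient"))
```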
SQLServerException: No column name was specified for column 1 of ‘bounds’
I am trying to run the following code. The idea is to obtain the lower and upper bounds of the source table, based on a given ID column. However, what I am getting is: And I am not really sure what the issue could be. Answer I haven't used Scala or Databricks, but I do use SQL Server, so my answer …
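The excerpt cuts off before the code, but this particular SQLServerException usually means the subquery aliased as bounds returns a column with no name: SQL Server requires every column of a derived table to be named, and an aggregate like MIN(id) has none by default. A sketch of that fix over JDBC, with placeholder connection, table, and column names:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("bounds-sketch").master("local[*]").getOrCreate()

val jdbcUrl = "jdbc:sqlserver://host:1433;databaseName=mydb" // placeholder

// Give both aggregates explicit aliases so the derived table 'bounds'
// has named columns.
val boundsQuery =
  """(SELECT MIN(id) AS lower_bound, MAX(id) AS upper_bound
    |   FROM dbo.source_table) AS bounds""".stripMargin

val bounds = spark.read
  .format("jdbc")
  .option("url", jdbcUrl)
  .option("dbtable", boundsQuery)
  .load()
```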
Single quotes cause trouble while filtering in Slick
I have statements such as the ones below, and they fail with exceptions such as this: I have tried to escape the single quote but wasn't successful. When I tried to insert a record such as this: the exception I got was: Please note that I am using H2 in MySQL mode to run my tests. Answer That error suggests to me …
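The statements themselves aren't shown, but failures like this typically come from splicing the value into a SQL string by hand. A minimal sketch (hypothetical table) of letting Slick bind the value as a parameter instead, which makes quote escaping unnecessary:

```scala
import slick.jdbc.H2Profile.api._

// Hypothetical table definition, just enough to demonstrate the filter.
class Users(tag: Tag) extends Table[(Int, String)](tag, "USERS") {
  def id   = column[Int]("ID", O.PrimaryKey)
  def name = column[String]("NAME")
  def *    = (id, name)
}
val users = TableQuery[Users]

// The value travels as a bind parameter, so the single quote in
// "O'Brien" never has to be escaped in the generated SQL.
val userName = "O'Brien"
val query    = users.filter(_.name === userName)
val action   = query.result
```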
SparkSQLContext dataframe Select query based on column array
This is my dataframe: I want to select all books where the author is Udo Haiber, but of course it didn't work because authors is an array. Answer You can use array_contains to check whether the author is inside the array: Use single quotes around the author name because you're using double quotes for the query string.
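A sketch of the answer's array_contains approach, with a hypothetical books DataFrame; both the DataFrame API form and the quoted-SQL form are shown:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{array_contains, col}

val spark = SparkSession.builder().appName("array-contains").master("local[*]").getOrCreate()
import spark.implicits._

// Hypothetical DataFrame with an array-typed authors column.
val books = Seq(
  ("Book A", Seq("Udo Haiber", "Jane Doe")),
  ("Book B", Seq("John Roe"))
).toDF("title", "authors")

// DataFrame API form.
val byAuthor = books.where(array_contains(col("authors"), "Udo Haiber"))

// SQL form: single quotes around the name inside the double-quoted query.
books.createOrReplaceTempView("books")
val viaSql = spark.sql("SELECT * FROM books WHERE array_contains(authors, 'Udo Haiber')")
```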
How can I compare rows of data in an array based on distinct attributes of a column?
I have a tricky student assignment in Spark. I need to write an SQL query for this kind of array: There are more departments, and correspondingly loans for each department, both for males and females. How can I compute a new array where the female loans are greater than the male loans per department, and print/show only the departments where female loans …
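One way to approach it, sketched under the assumption that the data has department, gender, and loan columns: pivot gender into columns, sum the loans, and keep the departments where the female total exceeds the male total:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, sum}

val spark = SparkSession.builder().appName("loans-sketch").master("local[*]").getOrCreate()
import spark.implicits._

// Hypothetical layout of the data described in the question.
val loans = Seq(
  ("Dept A", "Female", 120.0), ("Dept A", "Male", 100.0),
  ("Dept B", "Female", 80.0),  ("Dept B", "Male", 95.0)
).toDF("department", "gender", "loan")

// Pivot gender into Female/Male columns, then compare the sums.
val femaleAhead = loans
  .groupBy("department")
  .pivot("gender", Seq("Female", "Male"))
  .agg(sum("loan"))
  .where(col("Female") > col("Male"))
```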
Finding largest number of location IDs per hour from each zone
I am using Scala with Spark and having a hard time understanding how to calculate the maximum count of pickups from a location for each hour. Currently I have a df with three columns (Location, hour, Zone), where Location is an integer, hour is an integer 0-23 signifying the hour of the day, and Zone is a string. Something like this …
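A sketch of one standard way to do this: count pickups per (Zone, hour, Location), then use a window over (Zone, hour) to keep the location with the highest count. The sample rows are made up:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, count, row_number}

val spark = SparkSession.builder().appName("pickups-sketch").master("local[*]").getOrCreate()
import spark.implicits._

val df = Seq(
  (101, 8, "Zone A"), (101, 8, "Zone A"), (102, 8, "Zone A"),
  (201, 9, "Zone B")
).toDF("Location", "hour", "Zone")

// Pickup counts per location within each (Zone, hour) pair.
val counts = df.groupBy("Zone", "hour", "Location").agg(count("*").as("pickups"))

// Rank locations by count within each (Zone, hour) and keep the top one.
val w = Window.partitionBy("Zone", "hour").orderBy(col("pickups").desc)
val topPerZoneHour = counts
  .withColumn("rn", row_number().over(w))
  .where(col("rn") === 1)
  .drop("rn")
```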
SQL Database using JDBC + parameterize SQL Query + Databricks
In Databricks I am reading a SQL table as: How can I parameterize SourceSystem and RuleCode in the Where clause? I was referring to: https://docs.microsoft.com/en-us/azure/databricks/data/data-sources/sql-databases Answer If you import the Spark implicits, you can create references to columns with the dollar ($) interpolator. You can also use the column API to build the logic; it will be something like this. As you can …
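A sketch of what the answer describes: import the Spark implicits so the $ interpolator yields column references, then compare them against ordinary Scala variables in where. The URL, table, and values below are placeholders:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("jdbc-filter").master("local[*]").getOrCreate()
import spark.implicits._ // enables the $"col" interpolator

val jdbcUrl = "jdbc:sqlserver://host:1433;databaseName=mydb" // placeholder

val table = spark.read
  .format("jdbc")
  .option("url", jdbcUrl)
  .option("dbtable", "dbo.rules") // placeholder table
  .load()

// Parameter values held in plain Scala variables; Spark pushes the
// filter down to the database where the source supports it.
val sourceSystem = "SAP"
val ruleCode     = "R001"

val filtered = table.where($"SourceSystem" === sourceSystem && $"RuleCode" === ruleCode)
```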