
Tag: scala

How to avoid a self-join in Spark Scala

I have a DataFrame called product_relationship_current and I'm doing a self-join to retrieve a new DataFrame, like below. First I give it an alias so I can treat it as two different DataFrames, and then I do a self-join to get the new DataFrame. But I'm looking for another way to do that without doing a self-join, so I don't …
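A minimal sketch of the aliased self-join the question describes; the schema and the join columns (product_id, related_product_id) are assumptions, since the excerpt does not show them:

```scala
import org.apache.spark.sql.SparkSession

object SelfJoinSketch extends App {
  val spark = SparkSession.builder().appName("self-join-sketch").master("local[*]").getOrCreate()
  import spark.implicits._

  // Hypothetical stand-in for product_relationship_current.
  val product_relationship_current = Seq(
    ("p1", "p2"),
    ("p2", "p3")
  ).toDF("product_id", "related_product_id")

  // Alias the same DataFrame twice and join it with itself, as described in the question.
  val left  = product_relationship_current.as("l")
  val right = product_relationship_current.as("r")
  val selfJoined = left.join(right, $"l.related_product_id" === $"r.product_id")
  selfJoined.show()
}
```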

Filter a DataFrame using a subset of it and two specific fields in Spark/Scala [closed]

Closed. This question needs debugging details. It is not currently accepting answers. Edit the question to include desired behavior, a specific problem or error, and the shortest code necessary to reproduce the problem. This will help others answer the question. Closed 10 months ago. I have a Scala/Spark question. I'm using Spark 2.1.1. I have a DataFrame …
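The excerpt stops before the details, but assuming the goal is to keep only the rows of a large DataFrame whose pair of fields also appears in a smaller subset DataFrame, a left-semi join is one way to express that filter; the column names below are hypothetical:

```scala
import org.apache.spark.sql.SparkSession

object SubsetFilterSketch extends App {
  val spark = SparkSession.builder().appName("semi-join-filter").master("local[*]").getOrCreate()
  import spark.implicits._

  // Hypothetical data; the question does not show its schema.
  val df     = Seq(("a", 1, "x"), ("b", 2, "y"), ("c", 3, "z")).toDF("key1", "key2", "payload")
  val subset = Seq(("a", 1), ("c", 3)).toDF("key1", "key2")

  // Keep only the rows of df whose (key1, key2) pair also occurs in the subset.
  val filtered = df.join(subset, Seq("key1", "key2"), "left_semi")
  filtered.show()
}
```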

Create rows from columns in an Apache Spark dataset

I'm trying to create rows from existing columns of a dataset. Here is my case: the input dataset has the columns accountid, payingaccountid, billedaccountid, startdate and enddate, with a row like 0011t00000MY1U3AAL | 0011t00000MY1U3XXX | 0011t00000ZZ1U3AAL | 2020-06-10 00:00:00.000000 | NULL. And I would like to have something like this: accountid, startdate, enddate, with one row per account id (0011t00000MY1U3AAL, 0011t00000MY1U3XXX, 0011t00000ZZ1U3AAL), each with 2021-06-10 00:00:00.000000 and NULL. In the input dataset the columns billedaccountid and …
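A sketch of one way to turn the three id columns into separate rows, using explode over an array of columns; the column types are assumptions reconstructed from the sample above:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object ColsToRowsSketch extends App {
  val spark = SparkSession.builder().appName("cols-to-rows-sketch").master("local[*]").getOrCreate()
  import spark.implicits._

  // Reconstructed from the sample shown above; dates kept as strings for simplicity.
  val input = Seq(
    ("0011t00000MY1U3AAL", "0011t00000MY1U3XXX", "0011t00000ZZ1U3AAL",
     "2020-06-10 00:00:00.000000", null: String)
  ).toDF("accountid", "payingaccountid", "billedaccountid", "startdate", "enddate")

  // Explode the three id columns into one row each, keeping startdate and enddate.
  val result = input
    .select(
      explode(array($"accountid", $"payingaccountid", $"billedaccountid")).as("accountid"),
      $"startdate", $"enddate")
    .distinct()
  result.show(false)
}
```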

Spark SQL: keep a non-key row after join

I have two datasets, as follows: I want to join the two datasets so that I get ingredient information for each smoothie whose price is lower than $15, but keep the others even if the price is higher, and fill in the ingredient field with the string "To be communicated". I tried smoothieDs.join(ingredientDs).filter(col("price").lt(15)) and it gives: But my expected …
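A sketch of one way to keep every smoothie: a left join followed by a conditional fill of the ingredient field. The schemas and column names below are assumptions, since the excerpt does not show the datasets:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object KeepNonKeyRowsSketch extends App {
  val spark = SparkSession.builder().appName("keep-non-key-rows").master("local[*]").getOrCreate()
  import spark.implicits._

  // Hypothetical datasets standing in for smoothieDs and ingredientDs.
  val smoothieDs   = Seq(("Berry Blast", 12.0), ("Deluxe", 18.0)).toDF("smoothie", "price")
  val ingredientDs = Seq(("Berry Blast", "strawberry, banana")).toDF("smoothie", "ingredients")

  // Left join keeps every smoothie; ingredients of the expensive ones are replaced afterwards.
  val result = smoothieDs
    .join(ingredientDs, Seq("smoothie"), "left")
    .withColumn("ingredients",
      when(col("price") < 15, col("ingredients")).otherwise(lit("To be communicated")))
  result.show(false)
}
```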

Single quotes cause trouble while filtering in Slick

I have statements such as the ones below, and they fail with exceptions such as this. I have tried to escape the single quote but wasn't successful. When I tried to insert a record such as this, the exception I got is shown below. Please note that I am using H2 in MySQL mode to run my tests. Answer: That error suggests to me …
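The excerpt cuts off before the answer, but one common resolution is to let Slick's lifted queries pass the value as a bind parameter instead of interpolating it into raw SQL, so a single quote in the data needs no manual escaping. A minimal sketch, with a hypothetical table definition (the question's schema is not shown) and the plain H2 profile rather than H2-in-MySQL mode:

```scala
import slick.jdbc.H2Profile.api._

object SingleQuoteSketch {
  // Hypothetical table definition.
  class Users(tag: Tag) extends Table[(Int, String)](tag, "USERS") {
    def id   = column[Int]("ID", O.PrimaryKey)
    def name = column[String]("NAME")
    def *    = (id, name)
  }
  val users = TableQuery[Users]

  // Filtering and inserting with lifted values: Slick sends these as bind parameters,
  // so the single quote in "O'Brien" does not break the generated SQL.
  val troublesomeName = "O'Brien"
  val query  = users.filter(_.name === troublesomeName).result
  val insert = users += ((1, troublesomeName))
}
```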

Finding largest number of location IDs per hour from each zone

I am using Scala with Spark and having a hard time understanding how to calculate the maximum count of pickups from a location corresponding to each hour. Currently I have a DataFrame with three columns (Location, hour, Zone), where Location is an integer, hour is an integer 0–23 signifying the hour of the day, and Zone is a string. Something like this …
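A sketch of one way to read that requirement: count pickups per (Zone, hour, Location) and then keep, for every (Zone, hour) pair, the location with the highest count. The sample data is made up; only the three column names come from the question:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions._

object MaxPickupsSketch extends App {
  val spark = SparkSession.builder().appName("max-pickups-sketch").master("local[*]").getOrCreate()
  import spark.implicits._

  // Hypothetical pickups data matching the described schema (Location, hour, Zone).
  val df = Seq(
    (101, 8, "A"), (101, 8, "A"), (102, 8, "A"),
    (103, 9, "B"), (103, 9, "B"), (104, 9, "B"), (104, 9, "B")
  ).toDF("Location", "hour", "Zone")

  // Count pickups per (Zone, hour, Location), then rank locations within each (Zone, hour).
  val counts = df.groupBy("Zone", "hour", "Location").agg(count("*").as("pickups"))
  val w      = Window.partitionBy("Zone", "hour").orderBy(desc("pickups"))
  val top    = counts.withColumn("rank", row_number().over(w)).filter($"rank" === 1).drop("rank")
  top.show()
}
```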

SQL Database using JDBC + parameterize SQL Query + Databricks

In Databricks I am reading a SQL table as shown below. How can I parameterize SourceSystem and RuleCode in the WHERE clause? I was referring to: https://docs.microsoft.com/en-us/azure/databricks/data/data-sources/sql-databases Answer: If you import the Spark implicits, you can create references to columns with the dollar ($) interpolator. You can also build the logic with the Column API; it will be something like this. As you can …
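A sketch of the approach the answer describes: read the table over JDBC and push the parameter values in through column expressions rather than hard-coding them in the SQL text. The connection details and the filter values are placeholders:

```scala
import org.apache.spark.sql.SparkSession

object JdbcFilterSketch extends App {
  // On Databricks a SparkSession is already provided; getOrCreate reuses it.
  val spark = SparkSession.builder().appName("jdbc-filter-sketch").getOrCreate()
  import spark.implicits._ // enables the $"col" column interpolator

  // Hypothetical connection details; substitute your own JDBC URL, table and credentials.
  val jdbcDF = spark.read
    .format("jdbc")
    .option("url", "jdbc:sqlserver://<server>:1433;database=<db>")
    .option("dbtable", "dbo.Rules")
    .option("user", "<user>")
    .option("password", "<password>")
    .load()

  // Parameter values supplied from Scala variables instead of being baked into the query.
  val sourceSystem = "CRM"
  val ruleCode     = "R-100"

  val filtered = jdbcDF.where($"SourceSystem" === sourceSystem && $"RuleCode" === ruleCode)
  filtered.show()
}
```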
