I have a query like this in MySQL val selectQ = "SELECT NAME FROM EMPLOYEE" val date = "2010-10-10" val age = 10 Now I have some dynamic AND clauses, like val whereNameFilter = "WHERE date = $…
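The common answer to this kind of question is to model each optional filter as an `Option[String]` and concatenate only the filters that are present. A minimal sketch, assuming hypothetical filter names (`dateFilter`, `ageFilter` are illustrative, not from the original question):

```scala
object DynamicWhere {
  // Build "base [WHERE f1 AND f2 ...]" from whichever filters are defined.
  def buildQuery(base: String, filters: Seq[Option[String]]): String = {
    val active = filters.flatten
    if (active.isEmpty) base
    else base + " WHERE " + active.mkString(" AND ")
  }

  def main(args: Array[String]): Unit = {
    val selectQ = "SELECT NAME FROM EMPLOYEE"
    val date    = "2010-10-10"
    val age     = 10
    // Hypothetical dynamic clauses; each may or may not apply.
    val dateFilter = Some(s"date = '$date'")
    val ageFilter  = if (age > 0) Some(s"age > $age") else None
    println(buildQuery(selectQ, Seq(dateFilter, ageFilter)))
    // SELECT NAME FROM EMPLOYEE WHERE date = '2010-10-10' AND age > 10
  }
}
```

Note that interpolating values straight into SQL invites injection; in real code prefer parameterized queries and interpolate only the clause structure.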
Is there a way to compare all rows in one column of a dataframe against all rows in another column of another dataframe (spark)?
I have two dataframes in Spark, both with an IP column. One column has over 800,000 entries while the other has 4,000 entries. What I want to do is to see if the IPs in the smaller dataframe appear in the IP column of the large dataframe. At the moment all I can manage is to compare the first row
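In Spark the standard answer is a semi join, e.g. `small.join(large, Seq("ip"), "left_semi")` (the column name `"ip"` is an assumption), which keeps only the small-side rows whose IP appears on the large side. The underlying set semantics can be sketched in plain Scala:

```scala
object IpOverlap {
  // Plain-collections stand-in for a left-semi join: return the entries of
  // `small` that also occur in `large`. Building a Set once gives O(1) lookups,
  // analogous to broadcasting the 4,000-row side in Spark.
  def matching(small: Seq[String], large: Seq[String]): Seq[String] = {
    val largeSet = large.toSet
    small.filter(largeSet.contains)
  }
}
```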
Spark Scala Compare Row and Row of 2 Data frames and get differences
I have Dataframe 1 (Df1) and Dataframe 2 (Df2) with the same schema. I have Row 1 from Df1 (Dfw1) and Row 1 from Df2 (Dfw2). I need to compare both to get the differences between Dfw1 and Dfw2 and return the differences as a collection (a Map or similar). Answer A simple solution would be to transform the Row objects
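The answer excerpt suggests transforming the Row objects; one way is to view each Row as a field-name-to-value map (in Spark, via `row.getValuesMap(row.schema.fieldNames)`) and collect the fields whose values differ. A sketch over plain maps, with field names assumed for illustration:

```scala
object RowDiff {
  // Compare two rows represented as fieldName -> value; return a map of
  // differing fields to their (left, right) value pair.
  def diff(r1: Map[String, Any], r2: Map[String, Any]): Map[String, (Any, Any)] =
    (r1.keySet ++ r2.keySet).flatMap { k =>
      val (a, b) = (r1.get(k), r2.get(k))
      if (a == b) None else Some(k -> ((a.orNull, b.orNull)))
    }.toMap
}
```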
Pivot on Spark dataframe returns unexpected nulls on only one of several columns
I’ve pivoted a Spark dataframe, which works correctly for all columns except one, even though they’re all almost exactly the same. I have a dataframe which looks like this: (there are 29 distinct cf_id values, but in this example only two) when I run: I’d expect to see: All columns work correctly except the final one displayed here (300019829932), which
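A pivot produces null wherever a (group, pivot-value) combination has no rows, so unexpected nulls in only one output column usually mean that one cf_id's values don't line up with the group keys (whitespace, type mismatch, etc.). The mechanics can be illustrated with plain collections, where missing combinations surface as `None` just as Spark surfaces null:

```scala
object TinyPivot {
  // rows: (groupKey, pivotKey, value). Every group gets a cell for every
  // distinct pivotKey; combinations with no row come out as None.
  def pivot(rows: Seq[(String, String, Int)]): Map[String, Map[String, Option[Int]]] = {
    val pivotKeys = rows.map(_._2).distinct
    rows.groupBy(_._1).map { case (g, rs) =>
      val byKey = rs.map(r => r._2 -> r._3).toMap
      g -> pivotKeys.map(k => k -> byKey.get(k)).toMap
    }
  }
}
```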
spark [dataframe].write.option("mode","overwrite").saveAsTable("foo") fails with 'already exists' if foo exists
I think I am seeing a bug in spark where mode 'overwrite' is not respected; instead, an exception is thrown on an attempt to do saveAsTable into a table that already exists (using mode 'overwrite'). …
How to get the COUNT of emails for each id in Scala
I use this query in SQL to return how many user_ids have more than one email. How would I write this same query against a users DataFrame in Scala? Also, how would I be able to return the exact …
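The SQL shape here is presumably `GROUP BY user_id HAVING COUNT(email) > 1`; in Spark that would be something like `df.groupBy("user_id").agg(count("email").as("n")).filter($"n" > 1)` (column names assumed). The same aggregation over plain collections:

```scala
object EmailCounts {
  // users: (user_id, email). Count emails per user_id and keep only the
  // ids with more than one, i.e. GROUP BY ... HAVING COUNT(...) > 1.
  def multiEmailUsers(users: Seq[(Int, String)]): Map[Int, Int] =
    users.groupBy(_._1)
      .map { case (id, emails) => id -> emails.size }
      .filter { case (_, n) => n > 1 }
}
```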
Aggregate data from multiple rows to one and then nest the data
I'm relatively new to Scala and Spark programming. I have a use case where I need to group data by certain columns and get a count of a certain column (using pivot), and then finally I need …
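The group-then-count-then-nest shape can be sketched with nested `groupBy` calls over plain collections; the column roles (country, city, product) are invented for illustration, since the original excerpt is truncated:

```scala
object GroupPivotNest {
  // rows: (country, city, product). Group by (country, city), count per
  // product value, and nest the counts: country -> city -> product -> count.
  def nest(rows: Seq[(String, String, String)]): Map[String, Map[String, Map[String, Int]]] =
    rows.groupBy(_._1).map { case (country, cs) =>
      country -> cs.groupBy(_._2).map { case (city, ps) =>
        city -> ps.groupBy(_._3).map { case (p, xs) => p -> xs.size }
      }
    }
}
```

In Spark the nesting step would typically use `struct`/`collect_list` after the pivoted count, but the grouping logic is the same.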
How to do SQL from Akka?
What is the idiomatic Akka way to issue SQL statements from an Akka application? Specifically, I have an Akka Http REST endpoint that wants to do SQL commands. Is there some official SQL support or async or message passing style SQL library? Answer Answering your specific question, “what is the idiomatic akka way to issue sql”, I would agree with
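The usual Akka advice is to keep blocking JDBC work off the actor system's default dispatcher: run it on a dedicated dispatcher (or use an async library like Slick) and return a `Future` to the route. A minimal stand-in using `scala.concurrent` directly, where `runQuery` is a stub rather than a real JDBC call:

```scala
import scala.concurrent.{ExecutionContext, Future, blocking}

object BlockingSql {
  // In an actor system you would configure a dedicated blocking dispatcher;
  // here the global EC plus the `blocking` marker stands in for that.
  implicit val ec: ExecutionContext = ExecutionContext.global

  def runQuery(sql: String): Future[Int] =
    Future {
      blocking {
        sql.length // stand-in for executing a real JDBC statement
      }
    }
}
```

An Akka HTTP route would then complete with the future, e.g. `complete(runQuery(...).map(_.toString))`, keeping the endpoint non-blocking.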
Including null values in an Apache Spark Join
I would like to include null values in an Apache Spark join. Spark doesn't include rows with null by default. Here is the default Spark behavior. val numbersDf = Seq( ("123"), ("456"), (null),…
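Spark's null-safe equality operator `<=>` (`Column.eqNullSafe`) treats `null <=> null` as true, so joining on `numbersDf("n") <=> lettersDf("n")` keeps the null matches (the column name `n` is an assumption). The semantics are the same as `Option` equality in plain Scala:

```scala
object NullSafeEq {
  // Spark's <=> treats null <=> null as true; modeling nullable columns as
  // Option makes that just ==.
  def eqNullSafe(a: Option[String], b: Option[String]): Boolean = a == b

  // Null-safe inner join of two columns of nullable keys.
  def join(left: Seq[Option[String]], right: Seq[Option[String]]): Seq[Option[String]] =
    for (l <- left; r <- right if eqNullSafe(l, r)) yield l
}
```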
How to send plain SQL queries (and retrieve results) using scala slick 3
I am trying to make a class with methods that can send and get data to an SQLite db using plain SQL queries. This unfortunately does not work. I do not want to use the withSession implicit parts. The following error is thrown: type mismatch; found: slick.profile.SqlStreamingAction[Vector[(Int, Double, String)],(Int, Double, String),slick.dbio.Effect] required: slick.dbio.DBIOAction[(Int, Double, String),slick.dbio.NoStream,Nothing] DBops.scala Answer I