Tag: apache-spark
I am trying to use Spark and I am stuck on reading the data. Here is my code, and the error message says that 'property' object has no attribute 'format', so I think there is something wrong with format. I tried to read the Spark source code, but it was just too hard. I would really appreciate it if anybody …
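In PySpark, 'property' object has no attribute 'format' usually means that read was accessed on the SparkSession class itself rather than on a session instance, since SparkSession.read is a Python property. The question's code is not shown, so here is a minimal Scala sketch of the correct pattern, with a placeholder path and an assumed CSV format:

```scala
import org.apache.spark.sql.SparkSession

// Build (or reuse) a session instance first; the reader hangs off the instance,
// not off the SparkSession class itself.
val spark = SparkSession.builder().master("local[*]").appName("read-demo").getOrCreate()

val df = spark.read
  .format("csv")                // assumed format; the question does not say
  .option("header", "true")
  .load("/path/to/data.csv")    // hypothetical path
df.show()
```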
Load the ResultSet of query in dataframe using Spark / java
I want to load the result set of a select query into a Spark DataFrame. I'm using the following code: public static void func(Dataset<Row> df) { df.repartition(20); // one connection per …
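The excerpt's code is cut off, but the usual route for loading a select query's result set into a DataFrame is Spark's JDBC reader with the query option (available since Spark 2.4). A sketch with a hypothetical URL, credentials, and query text; the matching JDBC driver must be on the classpath:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").getOrCreate()

val df = spark.read
  .format("jdbc")
  .option("url", "jdbc:postgresql://dbhost:5432/mydb")                   // hypothetical database
  .option("user", "reader")
  .option("password", "secret")
  .option("query", "SELECT id, amount FROM orders WHERE status = 'OPEN'") // hypothetical query
  .load()

// Note: repartition returns a new Dataset; the excerpt's df.repartition(20)
// discards its result unless it is assigned like this.
val repartitioned = df.repartition(20)
```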
Spark SQL INSERTION TECHNIQUE for Result got from calculation
For insertion I'm using the below code (entire code included for better understanding). This code gives an error while inserting. Any help would be great. Error: … Answer: This worked. The arrangement, just ('"____"'), is all I wanted to know.
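As far as the excerpt shows, the fix was just the quoting arrangement for string literals inside the INSERT statement. A minimal sketch under that assumption, with made-up table and column names: string values are wrapped in single quotes inside the SQL text, numeric values are not.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").getOrCreate()

// Hypothetical table for the computed result.
spark.sql("CREATE TABLE IF NOT EXISTS results (name STRING, score DOUBLE) USING parquet")

val name  = "alice"   // values computed earlier in the original code
val score = 0.97

// Single quotes around the string value, none around the number.
spark.sql(s"INSERT INTO results VALUES ('$name', $score)")
```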
How to cast from double to int in from_json Spark SQL (NULL output)
I have a table with a JSON string. When running this Spark SQL query: select from_json('[{"column_1":"hola", "some_number":1.0}]', 'array…
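The schema string is truncated, but the NULL in the title is what from_json produces when the declared field type does not match the JSON value, here an int field against the double 1.0. A sketch of the common workaround, parsing the field as double and casting afterwards; column and field names follow the excerpt:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._

val df = Seq("""[{"column_1":"hola","some_number":1.0}]""").toDF("js")

// Declare some_number as double to match the JSON, then cast each element to int
// with the transform higher-order function (Spark 2.4+).
df.selectExpr("from_json(js, 'array<struct<column_1:string,some_number:double>>') AS arr")
  .selectExpr("transform(arr, x -> cast(x.some_number AS int)) AS nums")
  .show(false)
```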
How does spark SQL access databases
Suppose you access a SQL database with Spark SQL. With RDDs, Spark partitions the data into many different parts that together make up the data set. My question is: how does Spark SQL manage this access from the N nodes to the database? I can see several possibilities: each node of the RDD accesses the database and builds up …
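For JDBC sources this is configurable: given a numeric partition column and bounds, Spark issues one range-bounded query per partition, and each task opens its own connection to fetch its slice; without those options the whole table comes through a single connection into one partition. A sketch with hypothetical connection details:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").getOrCreate()

// Spark turns this into 8 queries, each bounded by a stride of the id range,
// each run over its own JDBC connection.
val df = spark.read
  .format("jdbc")
  .option("url", "jdbc:postgresql://dbhost:5432/mydb")   // hypothetical database
  .option("dbtable", "public.events")                    // hypothetical table
  .option("user", "reader")
  .option("password", "secret")
  .option("partitionColumn", "id")
  .option("lowerBound", "1")
  .option("upperBound", "1000000")
  .option("numPartitions", "8")
  .load()
```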
how to pass value from one dataframe to another dataframe?
I have to pass the C_ID value as a parameter to the WHERE condition in the data frame below. Any suggestions on how I can do this? I should not use a subquery, as the data runs into millions of rows and multiple tables are involved in the joins; I have included a sample query here. Answer: Store the SQL result in a variable using mkString and then use …
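A sketch of the mkString approach the answer describes, with stand-in tables: collect the C_ID values on the driver, join them into a quoted list, and splice that list into the second query's IN clause.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._

Seq("c1", "c2").toDF("C_ID").createOrReplaceTempView("customers")                      // stand-in data
Seq(("c1", 100), ("c3", 250)).toDF("C_ID", "amount").createOrReplaceTempView("orders") // stand-in data

// Build 'c1','c2' with mkString.
val cids = spark.sql("SELECT C_ID FROM customers")
  .collect()
  .map(_.getString(0))
  .mkString("'", "','", "'")

spark.sql(s"SELECT * FROM orders WHERE C_ID IN ($cids)").show()
```

The collected list lives on the driver, so this is only practical when the C_ID set is small; for millions of values a join is the safer route.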
Is there a way to compare all rows in one column of a dataframe against all rows in another column of another dataframe (spark)?
I have two dataframes in Spark, both with an IP column. One has over 800,000 entries while the other has 4,000. What I want to do is see whether the IPs in the smaller dataframe appear in the IP column of the larger one. At the moment all I can manage is to compare the first row …
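One way to do this for whole columns at once is a left_semi join, which keeps each row of the small frame whose IP has at least one match in the large frame. A sketch with toy data and an assumed column name ip:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._

val large = Seq("10.0.0.1", "10.0.0.2", "10.0.0.3").toDF("ip")  // stands in for ~800000 rows
val small = Seq("10.0.0.2", "192.168.0.9").toDF("ip")           // stands in for ~4000 rows

// left_semi keeps small's matching rows only; no columns from large are carried along.
small.join(large, Seq("ip"), "left_semi").show()
```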
Dynamically frame Filter condition based on conditions
We have a separate table maintained for conditions/filters. Based on those conditions, filters are to be applied to the base table. Here is the sample input condition data for reference. Based on these input conditions, the filter is to be derived as follows. Please help me in building the filter query. Answer: The Spark SQL below will help you build the WHERE …
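The condition table's exact layout is not shown, so this sketch assumes rows of (column, operator, value) and folds them into one WHERE expression, which Dataset.where accepts as a SQL string:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._

// Assumed layout of the condition/filter table.
val conditions = Seq(("age", ">", "30"), ("country", "=", "'US'")).toDF("col", "op", "value")

val whereClause = conditions.collect()
  .map(r => s"${r.getString(0)} ${r.getString(1)} ${r.getString(2)}")
  .mkString(" AND ")                     // -> "age > 30 AND country = 'US'"

val base = Seq((34, "US"), (21, "DE")).toDF("age", "country")   // stand-in base table
base.where(whereClause).show()
```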
Spark Scala Compare Row and Row of 2 Data frames and get differences
I have DataFrame 1 (Df1) and DataFrame 2 (Df2) with the same schema. I have row 1 from Df1 (Dfw1) and row 1 from Df2 (Dfw2). I need to compare both to get the differences between Dfw1 and Dfw2, returned as a collection (a Map or something similar). Answer: A simple solution would be to transform the Row objects …
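A sketch of the answer's idea: transform each Row into a Map keyed by field name with getValuesMap, then keep the fields whose values differ. Schema and sample rows are made up.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._

val dfw1 = Seq((1, "a", 10)).toDF("id", "name", "score").head()
val dfw2 = Seq((1, "b", 10)).toDF("id", "name", "score").head()

// Row -> Map[fieldName, value], then keep the keys whose values differ.
val m1 = dfw1.getValuesMap[Any](dfw1.schema.fieldNames)
val m2 = dfw2.getValuesMap[Any](dfw2.schema.fieldNames)

val diffs: Map[String, (Any, Any)] =
  m1.keys.filter(k => m1(k) != m2(k)).map(k => k -> (m1(k), m2(k))).toMap
// diffs == Map("name" -> ("a", "b"))
```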
Pivot on Spark dataframe returns unexpected nulls on only one of several columns
I've pivoted a Spark dataframe, and it works correctly for all columns except one, even though they are all almost exactly the same. I have a dataframe that looks like this (there are 29 distinct cf_id values, but only two in this example). When I run the pivot, I'd expect to see the following. All columns work correctly except the final one displayed here (300019829932), which …
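The dataframe and query are truncated, but a pivot in the described shape looks like the sketch below (column names assumed from the excerpt). When exactly one pivoted column comes back null, a mismatch confined to that one cf_id, such as stray whitespace or a differing type in the source values, is a common culprit worth checking.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.first

val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._

// Two of the 29 distinct cf_id values, as in the excerpt.
val df = Seq(
  ("row1", "300019829932", "x"),
  ("row1", "300019829933", "y")
).toDF("id", "cf_id", "value")

// One row per id, one column per distinct cf_id, values taken with first().
df.groupBy("id").pivot("cf_id").agg(first("value")).show()
```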