Skip to content
Advertisement

Register temp table in dataframe not working

Below is my script to use sql in dataframe with python:

pyspark --packages com.databricks:spark-csv_2.10:1.4.0
from pyspark.sql import SQLContext
sqlContext = SQLContext(sc)
df = sqlContext.read.format('com.databricks.spark.csv').options(header='true', inferschema='true').load('file:///root/Downloads/data/flight201601short.csv')

df.show(5) shows result below:

enter image description here

then I register the dataframe to a temp table:

df.registerTempTable("flight201601")

and tried to run some sql query like below:

sqlContext.sql("select distinct CARRIER from flight201601")

It doesn’t produce expected result, instead:

enter image description here

I also tried:

sqlContext.sql("select * from flight201601")

and it gives me:

enter image description here

So it seems the registerTempTable method only create the table schema and the table is NOT populated, what am I missing?

Advertisement

Answer

You will have to call show() method on Dataframe returned by sqlContext.sql to get the result of query. Check the spark document which says

The sql function on a SQLContext enables applications to run SQL queries programmatically and returns the result as a DataFrame.

sqlDF = sqlContext.sql("select * from flight201601")
sqlDF.show()
User contributions licensed under: CC BY-SA
4 People found this is helpful
Advertisement