
Is there a way to connect to PostgreSQL (DBeaver) from PySpark?

Hello, I just installed PySpark, and I have a local PostgreSQL database that I manage in DBeaver. How can I connect to Postgres from PySpark, please?

I tried this:

from pyspark.sql import DataFrameReader

url = 'postgresql://localhost:5432/coucou'
properties = {'user': 'postgres', 'password': 'admin'}
df = DataFrameReader(sqlContext).jdbc(
    url='jdbc:%s' % url, table='tw_db', properties=properties
)

but I get this error:

  File "", line 326, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o37.jdbc.
: java.lang.ClassNotFoundException: C:/Users/Desktop/postgresql-42.2.23.jre7.jar


You need to add the jars you want to use when creating the SparkSession. There are two ways to do this.

Either when you start pyspark:

pyspark --repositories MAVEN_REPO
# OR
pyspark --jars PATH_TO_JAR

or when you create your SparkSession object:

SparkSession.builder.master("yarn").appName(app_name).config("spark.jars.packages", "MAVEN_PACKAGE")
# OR
SparkSession.builder.master("yarn").appName(app_name).config("spark.jars", "PATH_TO_JAR")

You need Maven packages when you do not have the jar locally, or when your jar needs additional dependencies.