I’m trying to load a remote Oracle database table into the Apache Spark shell.
This is how I started spark-shell:
./spark-shell --driver-class-path ../jars/ojdbc6.jar --jars ../jars/ojdbc6.jar --master local
At the Scala prompt, I try to load an Oracle database table as shown below (I use a custom JDBC URL):
val jdbcDF = spark.read
  .format("jdbc")
  .option("url", "jdbc:oracle:thin:@(DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=WHATEVER)(HOST=myDummyHost.com)(PORT=xxxx)))(CONNECT_DATA=(SERVICE_NAME=dummy)(INSTANCE_NAME=dummyKaMummy)(UR=A)(SERVER=DEDICATED)))")
  .option("dbtable", "THE_DUMMY_TABLE")
  .option("user", "DUMMY_USER")
  .option("password", "DUMMYPASSWORD")
  .option("driver", "oracle.jdbc.driver.OracleDriver")
  .load()
(I have replaced employer data with dummy values.)
And then I get this error:
java.sql.SQLException: Unrecognized SQL type -102
  at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.org$apache$spark$sql$execution$datasources$jdbc$JdbcUtils$$getCatalystType(JdbcUtils.scala:246)
  at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$8.apply(JdbcUtils.scala:316)
  at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$8.apply(JdbcUtils.scala:316)
  at scala.Option.getOrElse(Option.scala:121)
  at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.getSchema(JdbcUtils.scala:315)
  at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$.resolveTable(JDBCRDD.scala:63)
  at org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation$.getSchema(JDBCRelation.scala:210)
  at org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:35)
  at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:318)
  at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:223)
  at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:211)
  at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:167)
  ... 49 elided
I tried to see if there is an issue with the quotes, but it’s not that.
Can somebody save my life, please?
Answer
The problem is an incompatible column type in the database. If you cannot modify the database but would still like to read from it, the solution is to exclude the offending columns (in my case it is a field of type geography). With the help of How to select specific columns through Spark JDBC?, here is a solution in PySpark (the Scala solution would be similar):
df = spark.read.jdbc(url=connectionString, table="(select colName from Table) as CompatibleTable", properties=properties)
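For the spark-shell session shown in the question, a rough Scala equivalent of the same column-limiting trick might look like the sketch below. COL_A and COL_B are hypothetical placeholders for the columns Spark can map; note that Oracle does not accept the AS keyword when aliasing a derived table, so the subquery alias is written without it. The block can be entered with :paste in spark-shell.

// Sketch only: read a column-limited subquery via "dbtable" so the
// incompatible column never reaches Spark's schema resolution.
// COL_A and COL_B are placeholders for the compatible columns.
val compatibleDF = spark.read
  .format("jdbc")
  .option("url", "jdbc:oracle:thin:@(DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=WHATEVER)(HOST=myDummyHost.com)(PORT=xxxx)))(CONNECT_DATA=(SERVICE_NAME=dummy)(INSTANCE_NAME=dummyKaMummy)(UR=A)(SERVER=DEDICATED)))")
  .option("dbtable", "(SELECT COL_A, COL_B FROM THE_DUMMY_TABLE) CompatibleTable")
  .option("user", "DUMMY_USER")
  .option("password", "DUMMYPASSWORD")
  .option("driver", "oracle.jdbc.driver.OracleDriver")
  .load()

Everything except the dbtable option is unchanged from the original read; since Spark resolves the schema by querying the given dbtable expression, only the selected columns are examined and the unsupported type never triggers the error.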