Databricks Secrets with Apache Spark SQL Connecting to Oracle

I’m using Spark SQL to pull tables from an Oracle database, some of them fairly sizable, into Azure Databricks as tables, so I can run jobs on them and leave them visible for the team to use. I need to pull them daily because the data changes and gets updated, and I also want to source the connection credentials through Azure Key Vault / Secrets rather than hard-coding them.

I have the secrets set up, and I can run a process that does all of this in Python; however, for the larger tables it is too slow when writing the table out from the DataFrame.

I know I can do this faster in Spark SQL using the statement below, and I can successfully pull almost all of the tables in a fraction of the time it takes in Python. However, I can’t find a way to source the secrets for the username and password from within the SQL.

%sql

CREATE TABLE <table-out>
USING org.apache.spark.sql.jdbc
OPTIONS (
  dbtable '<table-source>',
  driver 'oracle.jdbc.driver.OracleDriver',
  user '<username>',
  password '<password>',
  url 'jdbc:oracle:thin:@//<server>:<port>');

In Python I would obtain the secrets using dbutils, as below:

%python
jdbcUsername = dbutils.secrets.get(scope="jdbc", key="<key>")
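
For context, the slow Python path mentioned above is roughly the sketch below; the key names, connection details, and write mode are placeholders/assumptions rather than my exact code:

%python
# Fetch both credentials from the secret scope (key names are placeholders)
jdbcUsername = dbutils.secrets.get(scope="jdbc", key="<username-key>")
jdbcPassword = dbutils.secrets.get(scope="jdbc", key="<password-key>")

# Read the source table over JDBC into a DataFrame ...
df = (spark.read.format("jdbc")
      .option("url", "jdbc:oracle:thin:@//<server>:<port>")
      .option("driver", "oracle.jdbc.driver.OracleDriver")
      .option("dbtable", "<table-source>")
      .option("user", jdbcUsername)
      .option("password", jdbcPassword)
      .load())

# ... then write it out as a table; this write is the slow step for big tables
df.write.mode("overwrite").saveAsTable("<table-out>")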

Is there an equivalent way I can do this in the SQL approach above? I realise I will still need to optimise the tables, but I will cross that bridge when I get to it.

Any help would be appreciated.

Thanks, c2


Answer

So I managed to do this by building the SQL statement as a string and executing it with spark.sql:

# Positional placeholders: {0}=jdbcUrl, {1}=line (source table name),
# {2}=jdbcUsername, {3}=jdbcPassword, {4}=dbloc (target database prefix)
sqlQry = '''
CREATE TABLE IF NOT EXISTS {4}{1}
USING org.apache.spark.sql.jdbc
OPTIONS (
  driver 'oracle.jdbc.driver.OracleDriver',
  url '{0}',
  dbtable '{1}',
  user '{2}',
  password '{3}')'''.format(jdbcUrl, line, jdbcUsername, jdbcPassword, dbloc)

spark.sql(sqlQry)
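
For reference, a minimal sketch of the setup this snippet assumes (the scope, key names, URL, and table names here are placeholders):

jdbcUsername = dbutils.secrets.get(scope="jdbc", key="<username-key>")
jdbcPassword = dbutils.secrets.get(scope="jdbc", key="<password-key>")
jdbcUrl = 'jdbc:oracle:thin:@//<server>:<port>'
dbloc = '<target-database>.'  # prefix, so {4}{1} resolves to <target-database>.<table>
line = '<table-source>'       # source table name, e.g. one entry from a list of tables

Since the credentials come from dbutils.secrets, they never appear as literals in the notebook, and Databricks redacts the values if they are ever echoed in cell output.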