I’m using Spark SQL to pull tables from an Oracle database, some of them fairly sizable, into Azure Databricks as tables so I can run jobs on them and leave them visible for the team to use. I need to pull them daily because the data changes and gets updated, and I also want to authenticate by passing my credentials through Azure Key Vault-backed secrets so they stay redacted.
I have secrets set up, and I can run a process that does all of this in Python; however, for the larger tables it is too slow when writing the table out from the DataFrame.
I know I can do this faster in Spark SQL using the statement below, and I can successfully pull almost all the tables in a fraction of the time Python takes. However, I can’t work out how to source and pass through the secrets for the username and password.
```sql
%sql
CREATE TABLE <table-out>
USING org.apache.spark.sql.jdbc
OPTIONS (
  dbtable '<table-source>',
  driver 'oracle.jdbc.driver.OracleDriver',
  user '<username>',
  password '<password>',
  url 'jdbc:oracle:thin:@//<server>:<port>'
);
```
In Python I would have obtained the secrets using dbutils, as below:
```python
%python
jdbcUsername = dbutils.secrets.get(scope="jdbc", key="<key>")
```
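For context, the slow pure-Python path looks roughly like the sketch below. This is a minimal reconstruction, not my exact code: the scope name `jdbc`, the key names `username` and `password`, and the table placeholders are illustrative.

```python
%python
# Minimal sketch of the DataFrame-based approach described above.
# Scope/key names are assumptions; placeholders are as in the question.
jdbcUsername = dbutils.secrets.get(scope="jdbc", key="username")
jdbcPassword = dbutils.secrets.get(scope="jdbc", key="password")
jdbcUrl = "jdbc:oracle:thin:@//<server>:<port>"

# Read the source table over JDBC into a DataFrame.
df = (spark.read.format("jdbc")
      .option("url", jdbcUrl)
      .option("dbtable", "<table-source>")
      .option("user", jdbcUsername)
      .option("password", jdbcPassword)
      .option("driver", "oracle.jdbc.driver.OracleDriver")
      .load())

# Materialising the DataFrame as a managed table is the step that is
# too slow for the larger tables.
df.write.mode("overwrite").saveAsTable("<table-out>")
```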
Is there an equivalent way to do this in the SQL approach above? I realise I will still need to optimise the tables, but I’ll cross that bridge when I get to it.
Any help would be appreciated.
Thanks, c2
Answer
So I managed to do this by building the SQL statement as a string and submitting it with spark.sql():
```python
# jdbcUrl, jdbcUsername and jdbcPassword are defined earlier (the
# credentials come from dbutils.secrets.get()); line holds the source
# table name and dbloc the target database prefix.
sqlQry = '''
CREATE TABLE IF NOT EXISTS {4}{1}
USING org.apache.spark.sql.jdbc
OPTIONS (
  driver 'oracle.jdbc.driver.OracleDriver',
  url '{0}',
  dbtable '{1}',
  user '{2}',
  password '{3}'
)'''.format(jdbcUrl, line, jdbcUsername, jdbcPassword, dbloc)

spark.sql(sqlQry)
```
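To make the surrounding wiring concrete, here is a hedged end-to-end sketch of how the snippet above could be driven. The secret scope/key names, the `tables` list, and the `dbloc` value are illustrative assumptions, not the original code:

```python
%python
# Illustrative wiring around the snippet above; names are assumptions.
jdbcUsername = dbutils.secrets.get(scope="jdbc", key="username")
jdbcPassword = dbutils.secrets.get(scope="jdbc", key="password")
jdbcUrl = "jdbc:oracle:thin:@//<server>:<port>"
dbloc = "mydb."                      # target database prefix
tables = ["TABLE_A", "TABLE_B"]      # source tables to register daily

for line in tables:
    sqlQry = '''
    CREATE TABLE IF NOT EXISTS {4}{1}
    USING org.apache.spark.sql.jdbc
    OPTIONS (
      driver 'oracle.jdbc.driver.OracleDriver',
      url '{0}',
      dbtable '{1}',
      user '{2}',
      password '{3}'
    )'''.format(jdbcUrl, line, jdbcUsername, jdbcPassword, dbloc)
    spark.sql(sqlQry)
```

Because the credential values come from dbutils.secrets, Databricks redacts them in notebook output, which is what the question was after.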