I am trying to run the following code. The idea is to obtain the lower and upper bounds of a source table, based on a given ID column.
%scala
val bounds_sql = s"(select min(PersonID), max(PersonID) from dbo.persons) as bounds"
val connectionProperties = new Properties()
val (lower, upper) = spark.read.jdbc(jdbcString, bounds_sql, connectionProperties)
  .as[(Option[Long], Option[Long])]
  .take(1)
  .map { case (a, b) => (a.getOrElse(0L), b.getOrElse(0L)) }
  .head
However, what I am getting is:
SQLServerException: No column name was specified for column 1 of 'bounds'
And I am not really sure what the issue could be.
Answer
I haven’t used Scala or Databricks, but I do use SQL Server, so my answer is based on a more generic approach.
I believe your issue lies in the statement
(select min(PersonID), max(PersonID) from dbo.persons) as bounds
You are creating a derived (virtual) table called ‘bounds’. However, you have not named the columns of that table. I have two suggested approaches.
- Name your columns e.g.,
(select min(PersonID) as bounds_lower, max(PersonID) as bounds_upper from dbo.persons) as bounds
PS: beware of calling them lower and upper, as these are often built-in SQL function names (converting to lower case and upper case respectively).
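With the columns named, the Spark call from the question should then work unchanged. A sketch of the corrected version (jdbcString and the connection details are assumed to be as in the question):

```scala
import java.util.Properties

// Same call as in the question, but the derived table's columns now have
// names, so SQL Server can describe 'bounds' to the JDBC driver.
val bounds_sql =
  "(select min(PersonID) as bounds_lower, max(PersonID) as bounds_upper " +
  "from dbo.persons) as bounds"

val connectionProperties = new Properties()

val (lower, upper) = spark.read.jdbc(jdbcString, bounds_sql, connectionProperties)
  .as[(Option[Long], Option[Long])]
  .take(1)
  .map { case (a, b) => (a.getOrElse(0L), b.getOrElse(0L)) } // default to 0 on empty table
  .head
```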
- Do you need the derived table at all? If you just need the two values, try removing the ‘bounds’ alias
select min(PersonID), max(PersonID) from dbo.persons
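One caveat on that second approach in Spark: when you pass a string to spark.read.jdbc, Spark treats it as a table name or subquery, which is why the question's code wraps the SELECT in `(...) as bounds` to begin with. If you would rather not build the derived table string at all, one alternative (assuming Spark 2.4 or later) is the JDBC data source's `query` option, which accepts a bare SELECT; a sketch, with jdbcString assumed from the question:

```scala
// Sketch: pass a bare SELECT via the 'query' option instead of a
// '(...) as bounds' subquery string. Aliasing the aggregates is still
// worthwhile so the resulting DataFrame has usable column names.
val boundsDF = spark.read
  .format("jdbc")
  .option("url", jdbcString)
  .option("query",
    "select min(PersonID) as bounds_lower, max(PersonID) as bounds_upper from dbo.persons")
  .load()
```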