Skip to content
Advertisement

Partitioned by gives me error column duplicated when creating external table

I’m creating an external table using ‘partitioned by’ but I get the following error:

 'Invalid operation: column "#id" duplicated'

I first thought it could have something to do with the character #, but then I tried a different column and I got the same mistake. (In case anyone asks about #, I uploaded the files from a csv to S3 with header including # and I couldn’t change the column name when creating the table to a name without this #, otherwise it would bring null values only.)

CREATE EXTERNAL TABLE schema.table_name
(
   #id            BIGINT,
   uf             varchar(255)
)



 partitioned by (#id BIGINT,
                  uf varchar(255))
  row format delimited
  fields terminated by ','
  stored as parquet
  location 's3://bucket/folder/'

Advertisement

Answer

Your problem is that you are trying to name a partition with a name that is already being used by a column.

As you can see here on the documentation, that is not allowed:

Create an external table and specify the partition key in the PARTITIONED BY clause. The partition key can’t be the name of a table column. […]

Just rename your partition and it should work.

User contributions licensed under: CC BY-SA
6 People found this is helpful
Advertisement