Getting clustering/bucketing columns programmatically

Question

For reference, I am connecting to Amazon Athena via SQLAlchemy using essentially: create_engine(f'awsathena+rest://:@athena.{myRegion}.amazonaws.com:443/{athena_schema}?s3_staging_dir={...

Accepted Answer

Athena uses Glue Data Catalog to store metadata about databases and tables. I don't know how much of this is exposed in information_schema, and there is very little documentation about it.

However, you can get everything Athena knows by querying the Glue Data Catalog directly. In this case, if you call GetTable (e.g. aws glue get-table …) you will find the bucketing information in Table.StorageDescriptor.BucketColumns.

The GetTable call will also give you the storage format and the location of the files (but for a partitioned table you need to make additional calls with GetPartitions to retrieve the location of each partition's data).
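Since the question is about doing this from Python, here is a minimal sketch of the same GetTable / GetPartitions lookup. It assumes a boto3 Glue client is passed in; the helper names get_bucketing_info and get_partition_locations are made up for illustration and are not part of boto3 or PyAthena.

```python
from typing import Any, Dict, List


def get_bucketing_info(glue_client: Any, database: str, table: str) -> Dict[str, Any]:
    """Read bucketing and storage details for one table from the Glue Data Catalog.

    glue_client is expected to behave like boto3.client("glue").
    """
    response = glue_client.get_table(DatabaseName=database, Name=table)
    descriptor = response["Table"].get("StorageDescriptor", {})
    return {
        # Columns the table is bucketed (clustered) by; empty if not bucketed.
        "bucket_columns": descriptor.get("BucketColumns", []),
        "number_of_buckets": descriptor.get("NumberOfBuckets"),
        # Storage format and root location of the table's files.
        "input_format": descriptor.get("InputFormat"),
        "location": descriptor.get("Location"),
    }


def get_partition_locations(glue_client: Any, database: str, table: str) -> List[str]:
    """Collect the storage location of every partition of a partitioned table."""
    paginator = glue_client.get_paginator("get_partitions")
    locations: List[str] = []
    # GetPartitions is paginated, so walk every page of results.
    for page in paginator.paginate(DatabaseName=database, TableName=table):
        for partition in page["Partitions"]:
            locations.append(partition["StorageDescriptor"]["Location"])
    return locations
```

With boto3 installed and AWS credentials configured, a call would look something like get_bucketing_info(boto3.client("glue"), "my_database", "my_table"), where the database and table names are placeholders. Passing the client in as an argument, instead of creating it inside the helper, keeps the region and credential setup in the caller's hands.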

Answer