Athena geospatial SQL joins never complete

Question

A very basic geospatial join, based on this example, times out every time. The table polygons contains 340K polygons, while points contains 5K rows with latitude/longitude pairs (and an ID). Both are single .csv files in S3. Query: The SQL query above never completes in the default 30-minute Athena query time…

Accepted Answer

The Athena service is based on Presto 0.172. That version did not support geospatial joins, so your query is executed as a cross join followed by a filter on a single node. Newer Presto versions include an optimization that lets them execute such queries in a distributed manner.

Besides going back to EMR, you can try Starburst Presto for AWS, which gives you an easy-to-launch build of the best Presto version, prepackaged for AWS. (note: I work for Starburst)
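
To see why the cross join hurts, note that 340K polygons times 5K points is 340,000 × 5,000 = 1.7 billion candidate pairs, each needing a point-in-polygon test. The following is a minimal, single-machine Python sketch (not Athena or Presto code) contrasting that cross-join-plus-filter plan with an indexed join. The grid bucketing in `grid_join` is a deliberately simple stand-in chosen for illustration, not how Presto implements its spatial join; the function names, the `(lon, lat)` tuple representation, and the `cell` size parameter are all hypothetical.

```python
import math
from collections import defaultdict

# Hypothetical representations for this sketch only.
Point = tuple[float, float]  # (longitude, latitude)
Polygon = list[Point]        # single ring, vertices in order, not closed


def point_in_polygon(p: Point, poly: Polygon) -> bool:
    """Ray-casting test: toggle 'inside' for each edge crossed by a ray going in the +x direction."""
    x, y = p
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        # Only edges that straddle the horizontal line through p can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def naive_join(points: dict[str, Point], polygons: dict[str, Polygon]):
    """Cross join + filter: every point is tested against every polygon."""
    matches, tests = [], 0
    for pid, p in points.items():
        for gid, poly in polygons.items():
            tests += 1
            if point_in_polygon(p, poly):
                matches.append((pid, gid))
    return matches, tests


def grid_join(points: dict[str, Point], polygons: dict[str, Polygon], cell: float = 1.0):
    """Hypothetical indexed join: bucket polygons by the grid cells their
    bounding box overlaps, then test each point only against its own cell."""
    index = defaultdict(list)
    for gid, poly in polygons.items():
        xs = [x for x, _ in poly]
        ys = [y for _, y in poly]
        for cx in range(math.floor(min(xs) / cell), math.floor(max(xs) / cell) + 1):
            for cy in range(math.floor(min(ys) / cell), math.floor(max(ys) / cell) + 1):
                index[(cx, cy)].append(gid)

    matches, tests = [], 0
    for pid, (x, y) in points.items():
        key = (math.floor(x / cell), math.floor(y / cell))
        for gid in index.get(key, []):  # .get does not create empty buckets
            tests += 1
            if point_in_polygon((x, y), polygons[gid]):
                matches.append((pid, gid))
    return matches, tests


# Tiny made-up example: two square polygons and three points.
polygons = {
    "A": [(0.1, 0.1), (0.9, 0.1), (0.9, 0.9), (0.1, 0.9)],
    "B": [(5.1, 5.1), (5.9, 5.1), (5.9, 5.9), (5.1, 5.9)],
}
points = {"p1": (0.5, 0.5), "p2": (5.5, 5.5), "p3": (3.0, 3.0)}

print(naive_join(points, polygons))  # ([('p1', 'A'), ('p2', 'B')], 6)
print(grid_join(points, polygons))   # ([('p1', 'A'), ('p2', 'B')], 2)
```

Both plans return the same matches, but the indexed version only runs the expensive geometric test on candidates that share a cell. That pruning, combined with spreading the work across nodes, is what a proper spatial join buys over a single-node cross join.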


Answer