Join 2 parquet files with different columns but common key(id) column in Athena

Question

I'm wondering whether there is a way in AWS Athena to "merge" 2 Parquet files into a single table, just by leveraging the columnar model of Parquet, meaning without doing any joins or post-…

Accepted Answer

Athena is basically just a modified version of Presto/Trino, which is a pure SQL interface that goes via Hive (or something like Glue/Iceberg). It doesn't really care what the underlying storage is, aside from having a reader for it. So this is doubtful: it would need to scan each file and join on the keys, because it treats Parquet, ORC, CSV, etc. all similarly.

Parquet is also a pretty complex format. Even if those two files had the same columns, they could be laid out internally or sorted very differently. It's not like they're both simple sorted CSVs where you can "just grab and merge everything from both files on line 12" or something like that.

So I doubt you'll find anything like this, in Athena/Presto or outside of them. It doesn't sound viable. Anything doing this would basically have to do a join anyway, even if you didn't call it that.

id	first_name
1	Jonh
2	Joe

id	last_name	status
1	Doe	1
2	Smith	0

id	first_name	last_name	status
1	Jonh	Doe	1
2	Joe	Smith	0
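
Since the answer's conclusion is that a join is unavoidable, here is a minimal sketch of the join that turns the first two tables above into the third. It uses an in-memory SQLite database as a stand-in for Athena so that it runs anywhere; the table names `names` and `details` are hypothetical. In Athena, each would presumably be an external table defined over the S3 location holding one of the Parquet files, and the same `SELECT ... JOIN` should apply there.

```python
import sqlite3

# In-memory SQLite stands in for Athena here; only the SQL join is the point.
conn = sqlite3.connect(":memory:")

# Hypothetical tables mirroring the two sample Parquet files.
conn.executescript("""
    CREATE TABLE names (id INTEGER PRIMARY KEY, first_name TEXT);
    CREATE TABLE details (id INTEGER PRIMARY KEY, last_name TEXT, status INTEGER);
    INSERT INTO names VALUES (1, 'Jonh'), (2, 'Joe');
    INSERT INTO details VALUES (1, 'Doe', 1), (2, 'Smith', 0);
""")

# Join on the shared key to get one row per id with the columns of both tables.
JOIN_SQL = """
    SELECT n.id, n.first_name, d.last_name, d.status
    FROM names AS n
    JOIN details AS d ON d.id = n.id
    ORDER BY n.id
"""

rows = conn.execute(JOIN_SQL).fetchall()
for row in rows:
    print(row)
```

An inner join drops any id that appears in only one of the tables; if such rows need to be kept, a `LEFT JOIN` or `FULL OUTER JOIN` would be the variant to try in Athena.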
