My source is SQL Server and I am using SSIS to export data to S3 Buckets, but now my requirement is to send files as parquet File formate.
Can you guys give some clues on how to achieve this?
Thanks, Ven
Advertisement
Answer
For folks stumbling on this answer, Apache Parquet is a project that specifies a columnar file format employed by Hadoop and other Apache projects.
Unless you find a custom component or write some .NET code to do it, you’re not going to be able to export data from SQL Server to a Parquet file. KingswaySoft’s SSIS Big Data Components might offer one such custom component, but I’ve got no familiarity.
If you were exporting to Azure, you’d have two options:
Use the Flexible File Destination component (part of the Azure feature pack), which exports to a Parquet file hosted in Azure Blob or Data Lake Gen2 storage.
Leverage PolyBase, a SQL Server feature. It let’s you export to a Parquet file via the external table feature. However, that file has to be hosted in a location mentioned here. Unfortunately S3 isn’t an option.
If it were me, I’d move the data to S3 as a CSV file then use Athena to convert the CSV file to Pqrquet. There is a nifty article here that talks through the Athena piece:
https://www.cloudforecast.io/blog/Athena-to-transform-CSV-to-Parquet/
Net-net, you’ll need to spend a little money, get creative, switch to Azure, or do the conversion in AWS.