AWS Athena custom data format?

I’d like to query my app logs on S3 with AWS Athena but I’m having trouble creating the table/specifying the data format.

Each log line is a timestamp, followed by a space, followed by the JSON line I want to query.

Is there a way to query logs like this? I see that the CSV, TSV, JSON, Apache Web Logs, and Text File with Custom Delimiters data formats are supported, but because of the leading timestamp I can't simply use JSON.

Answer

Define a table with a single column, so that each whole log line is read as one string.
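A minimal sketch of such a table, assuming the hypothetical names `app_logs_raw` and `line` and a placeholder S3 location. It relies on the default text-file field delimiter (Ctrl-A) not occurring in the logs, so nothing gets split off the line:

```sql
-- Hypothetical table, column, and bucket names; adjust to your setup.
-- With a single string column and the default field delimiter (Ctrl-A),
-- each log line is expected to land in `line` unchanged.
CREATE EXTERNAL TABLE app_logs_raw (
  line string
)
STORED AS TEXTFILE
LOCATION 's3://your-bucket/path/to/logs/';
```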

You can then extract the timestamp and the JSON with a regular expression and parse the JSON separately.
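As a rough sketch, assuming a single-column table named `app_logs_raw` with a string column `line` (both hypothetical names), a timestamp that contains no spaces, and a JSON key called `message` (also hypothetical), the query could look like this:

```sql
-- Assumption: the timestamp has no embedded spaces, so the first space
-- on the line separates it from the JSON payload.
WITH parsed AS (
  SELECT
    regexp_extract(line, '^([^ ]+) (.*)$', 1) AS ts,        -- capture group 1: timestamp
    regexp_extract(line, '^([^ ]+) (.*)$', 2) AS json_col   -- capture group 2: JSON text
  FROM app_logs_raw
)
SELECT
  ts,
  json_extract_scalar(json_col, '$.message') AS message     -- hypothetical JSON key
FROM parsed;
```

If your timestamps do contain a space (for example a date and a time separated by one), the first capture group needs to be adapted to match that format.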

Alternatively, you can define a RegexSerDe table with two columns. The SerDe splits each line into the two columns for you, so all that is left is to parse JSON_COL.
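A sketch of the RegexSerDe variant, again under the assumption that the timestamp contains no spaces. The table name `app_logs`, the column name `ts`, the S3 location, and the JSON key `message` are placeholders. Each capture group in `input.regex` maps to one column, in order:

```sql
-- Hypothetical names and location; both columns are declared as string
-- and the regex has exactly one capture group per column.
CREATE EXTERNAL TABLE app_logs (
  ts string,
  json_col string
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
  'input.regex' = '^([^ ]+) (.*)$'
)
LOCATION 's3://your-bucket/path/to/logs/';

-- Only the JSON column still needs parsing at query time.
SELECT
  ts,
  json_extract_scalar(json_col, '$.message') AS message     -- hypothetical JSON key
FROM app_logs;
```

The pattern deliberately avoids backslash classes such as `\S`, which sidesteps the question of how backslashes must be escaped inside the DDL string literal.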

User contributions licensed under: CC BY-SA