Skip to content
Advertisement

Concatenate variable in Athena SQL query from Python Lambda function

I have a Python Lambda function that creates a SQL table in Athena. How do I properly concatenate variables in my query? When I set the LOCATION value, I receive the error response below. The function runs successfully if I hard code the LOCATION value.

LOCATION “”” + s3_bucket_test + “””

Error response:

Response
{
  "errorMessage": "An error occurred (InvalidRequestException) when calling the StartQueryExecution operation: line 1:8: mismatched input 'EXTERNAL'. Expecting: 'OR', 'SCHEMA', 'TABLE', 'VIEW'",
  "errorType": "InvalidRequestException",
  "stackTrace": [
    "  File "/var/task/lambda_function.py", line 34, in lambda_handlern    queryStart = client.start_query_execution(n",
    "  File "/var/runtime/botocore/client.py", line 386, in _api_calln    return self._make_api_call(operation_name, kwargs)n",
    "  File "/var/runtime/botocore/client.py", line 705, in _make_api_calln    raise error_class(parsed_response, operation_name)n"
  ]
}

Lambda function:

import boto3
import json
import time

database = ‘daily_reports’
s3_bucket = 's3://test/’
s3_bucket_results = 's3://test/results’

query = ("""
          CREATE EXTERNAL TABLE IF NOT EXISTS `reports` (
              `timestamp` bigint,
              `user_id` string,
              `name` string
            )
            ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' 
            WITH SERDEPROPERTIES (
              'serialization.format' = '1'
            ) LOCATION “”” + s3_bucket + “””
            TBLPROPERTIES ('has_encrypted_data'='false');
        """)        


def lambda_handler(event, context):
    client = boto3.client('athena')
    queryStart = client.start_query_execution(
        QueryString = query,
        QueryExecutionContext = {
            'Database': database
        },
        ResultConfiguration = {
            'OutputLocation': s3_bucket_results
        }
    )

Thank you.

Advertisement

Answer

Have you tried to use Python’s format method? Something like this

query = ("""
          CREATE EXTERNAL TABLE IF NOT EXISTS `reports` (
              `timestamp` bigint,
              `user_id` string,
              `name` string
            )
            ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' 
            WITH SERDEPROPERTIES (
              'serialization.format' = '1'
            ) LOCATION '{}' 
            TBLPROPERTIES ('has_encrypted_data'='false');
        """).format(s3_bucket)    
User contributions licensed under: CC BY-SA
9 People found this is helpful
Advertisement