Now that Redshift has announced support for Geometry types and spatial functions, I'd like to create a table with polygons for all countries. The INSERT is failing, and I would appreciate help.
Here is what I’ve tried:
I downloaded the GeoJSON file from https://datahub.io/core/geo-countries and unzipped it.
Then the following python snippet was used to create the table successfully (I’ve used the type GEOMETRY, not sure if I can optimise and use the sub-type POLYGON):
    import psycopg2

    conn = psycopg2.connect(...connection params)
    cur = conn.cursor()
    cur.execute("CREATE TABLE engagement.geospatial_countries (id INTEGER PRIMARY KEY, name VARCHAR(25), code VARCHAR(10), polygon GEOMETRY);")
The following script reads the GeoJSON successfully; each entry in "countries" holds a GeoJSON feature with a Polygon geometry:

    import json

    # Parse the file; "features" holds one entry per country.
    with open("geospatial-data/countries.geojson", "r") as f:
        countries_geojson = json.load(f)
    countries = countries_geojson["features"]
For those not familiar with GeoJSON, it's simply JSON data that describes geospatial shapes. Here is an excerpt of the data:

    {
      "type": "FeatureCollection",
      "features": [
        {
          "type": "Feature",
          "properties": { "ADMIN": "Aruba", "ISO_A3": "ABW" },
          "geometry": {
            "type": "Polygon",
            "coordinates": [ [
              [ -69.996937628999916, 12.577582098000036 ],
              [ -69.936390753999945, 12.531724351000051 ],
              [ -69.924672003999945, 12.519232489000046 ],
              [ -69.915760870999918, 12.497015692000076 ],
              [ -69.880197719999842, 12.453558661000045 ],
              [ -69.876820441999939, 12.427394924000097 ],
              [ -69.888091600999928, 12.417669989000046 ],
              [ -69.908802863999938, 12.417792059000107 ],
              [ -69.930531378999888, 12.425970770000035 ],
              [ -69.945139126999919, 12.44037506700009 ],
              [ -69.924672003999945, 12.44037506700009 ],
              [ -69.924672003999945, 12.447211005000014 ],
              [ -69.958566860999923, 12.463202216000099 ],
              [ -70.027658657999922, 12.522935289000088 ],
              [ -70.048085089999887, 12.531154690000079 ],
              [ -70.058094855999883, 12.537176825000088 ],
              [ -70.062408006999874, 12.546820380000057 ],
              [ -70.060373501999948, 12.556952216000113 ],
              [ -70.051096157999893, 12.574042059000064 ],
              [ -70.048736131999931, 12.583726304000024 ],
              [ -70.052642381999931, 12.600002346000053 ],
              [ -70.059641079999921, 12.614243882000054 ],
              [ -70.061105923999975, 12.625392971000068 ],
              [ -70.048736131999931, 12.632147528000104 ],
              [ -70.00715084499987, 12.5855166690001 ],
              [ -69.996937628999916, 12.577582098000036 ]
            ] ]
          }
        },
        ... more countries
      ]
    }
Before inserting all countries, I first want to try a single country:

    country = countries[0]
    geometry_to_insert = (
        country["properties"]["ADMIN"],
        country["properties"]["ISO_A3"],
        # Have also tried psycopg2.extras.Json(country["geometry"]), as well as just using the dict
        json.dumps(country["geometry"]),
    )
The following fails:
    cur.execute(
        "INSERT INTO engagement.geospatial_countries (name, code, polygon) VALUES %s",
        geometry_to_insert
    )

With the following error:

    TypeError: not all arguments converted during string formatting
I've also tried:

    cur.execute(
        "INSERT INTO engagement.geospatial_countries (name, code, polygon) VALUES (%s, %s, %s)",
        geometry_to_insert
    )

But that gives the following error:

    psycopg2.errors.InternalError_: Compass I/O exception: Invalid hexadecimal character(s) found
How do I insert a polygon into Redshift, using the new Geometry types?
Answer
Here are the steps that worked for inserting the geometry into the database.
First, a minor correction in creating a table for the geometries, using IDENTITY to have an auto-incrementing ID:
    import psycopg2

    conn = psycopg2.connect(...connection params)
    cur = conn.cursor()
    cur.execute("CREATE TABLE engagement.geospatial_countries (id INTEGER IDENTITY(0,1) PRIMARY KEY, name VARCHAR(25), code VARCHAR(10), polygon GEOMETRY);")
On to the geometries. To insert the value, convert it to WKT (well-known text), here with shapely:

    import geojson
    from shapely.geometry import shape

    ...  # exact same steps as in question to read file, then

    country = countries[0]
    geom = shape(country["geometry"])
    geometry_to_insert = (
        country["properties"]["ADMIN"],
        country["properties"]["ISO_A3"],
        geom.wkt,
    )
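To make the WKT step less opaque, here is a minimal, dependency-free sketch of the kind of string that conversion produces. It is not shapely's implementation; `geojson_to_wkt` and its helpers are hypothetical names introduced only for illustration, and the sketch handles just Polygon and MultiPolygon geometries in two dimensions.

```python
def ring_to_wkt(ring):
    # A ring is a list of [x, y] positions; WKT writes "x y" pairs separated by commas.
    return "(" + ", ".join(f"{pos[0]} {pos[1]}" for pos in ring) + ")"


def polygon_body(rings):
    # A polygon is a list of rings: the outer boundary first, then any holes.
    return "(" + ", ".join(ring_to_wkt(ring) for ring in rings) + ")"


def geojson_to_wkt(geometry):
    """Convert a GeoJSON Polygon or MultiPolygon dict to a WKT string."""
    kind = geometry["type"]
    coords = geometry["coordinates"]
    if kind == "Polygon":
        return "POLYGON " + polygon_body(coords)
    if kind == "MultiPolygon":
        return "MULTIPOLYGON (" + ", ".join(polygon_body(p) for p in coords) + ")"
    raise ValueError(f"unsupported geometry type: {kind}")


# Illustrative geometry, not taken from the countries file.
triangle = {"type": "Polygon", "coordinates": [[[0, 0], [1, 0], [0, 1], [0, 0]]]}
print(geojson_to_wkt(triangle))  # POLYGON ((0 0, 1 0, 0 1, 0 0))
```

In practice `shape(...).wkt` is the more robust choice, since it covers every geometry type; the sketch only shows that the value passed to the database is plain text rather than JSON.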
Then the following command to insert the value:
    cur.execute(
        "INSERT INTO engagement.geospatial_countries (name, code, polygon) VALUES (%s, %s, ST_GeomFromText(%s))",
        geometry_to_insert
    )
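The same statement can be extended from one country to the whole file. The following is a sketch under assumptions, not a tested load script: `build_rows` and `insert_countries` are hypothetical helper names, `to_wkt` stands for whatever function converts a GeoJSON geometry to WKT, and `cur` is assumed to be a psycopg2 cursor (whose `executemany` method runs the statement once per parameter tuple).

```python
INSERT_SQL = (
    "INSERT INTO engagement.geospatial_countries (name, code, polygon) "
    "VALUES (%s, %s, ST_GeomFromText(%s))"
)


def build_rows(features, to_wkt):
    """Turn GeoJSON features into (name, code, wkt) parameter tuples."""
    return [
        (
            feature["properties"]["ADMIN"],
            feature["properties"]["ISO_A3"],
            to_wkt(feature["geometry"]),
        )
        for feature in features
    ]


def insert_countries(cur, features, to_wkt):
    """Insert every feature through the given cursor; returns the row count."""
    rows = build_rows(features, to_wkt)
    cur.executemany(INSERT_SQL, rows)
    return len(rows)
```

With the objects from the steps above, a call might look like `insert_countries(cur, countries, lambda g: shape(g).wkt)` followed by `conn.commit()`, since psycopg2 does not autocommit by default. Some country names in the file may be longer than the VARCHAR(25) column allows, so the column width may need adjusting before a full load.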
Answers from both @Maurice Meyer and @piro guided me to this answer.
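As background on the first error in the question: the query had a single %s placeholder but received a three-element tuple. Plain Python string formatting raises the same message in that situation, which the sketch below reproduces. This is only an analogy, since psycopg2 also quotes and escapes each value while the `%` operator does not, and the parameter values are illustrative.

```python
params = ("Aruba", "ABW", "POLYGON ((0 0, 1 0, 0 1, 0 0))")

message = None
try:
    # One placeholder, three values: the leftover values cannot be consumed.
    "VALUES %s" % params
except TypeError as exc:
    message = str(exc)

# One placeholder per value formats without error.
rendered = "VALUES (%s, %s, %s)" % params

print(message)   # not all arguments converted during string formatting
print(rendered)  # VALUES (Aruba, ABW, POLYGON ((0 0, 1 0, 0 1, 0 0)))
```

The second error appears to come from Redshift rather than from psycopg2: judging by the message, a plain string sent to a GEOMETRY column seems to be read as hexadecimal-encoded geometry, which JSON text is not. Wrapping the parameter in ST_GeomFromText and sending WKT avoids that path.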