Inserting large amounts of generated test data into a PostgreSQL database

I want to insert a billion rows of floating-point data into a PostgreSQL database so that I can test the performance of various PostGIS functions. My effort below takes ages, seems really inefficient, and the memory consumption balloons. Can anyone suggest a better way of doing this? I was thinking it would be better to insert a million rows per INSERT, but I can’t work out how to build the value list, e.g. (a, b), (c, d), to insert.

Any help gratefully received. Please note I am a bit of a newbie with SQL, so I’m not able to digest hyper-optimised solutions that require an advanced CS degree to assimilate 🙂 I’m looking for “good enough”.

Ta,

Andrew


Answer

Typically using generate_series() is faster than using loops in PL/pgSQL.

To generate the “position” value, you can use random().

The following will insert 100 million rows with random values for the second column:
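A minimal sketch of that, assuming a simple two-column table (the name testdata and the column definitions are placeholders, not taken from the original question):

    -- Placeholder table: an id column and a floating-point "position" column
    CREATE TABLE testdata
    (
      id       bigint,
      position double precision
    );

    -- One statement, 100 million rows:
    -- generate_series() produces 1 .. 100000000 for the first column,
    -- random() supplies a value between 0 and 1 for the second.
    INSERT INTO testdata (id, position)
    SELECT g.id, random()
    FROM generate_series(1, 100000000) AS g(id);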

I prefer to insert test data in chunks (e.g. 10 million at a time). That is easier to do if you let Postgres generate a unique value for the first column, e.g. by defining it as an identity column:
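For example, with an identity column each chunk only needs to supply the second column (again, testdata and the column names are placeholders):

    -- The first column is generated automatically by Postgres
    CREATE TABLE testdata
    (
      id       bigint GENERATED ALWAYS AS IDENTITY,
      position double precision
    );

    -- Load one chunk of 10 million rows; repeat the statement
    -- (e.g. ten times) to reach 100 million rows.
    INSERT INTO testdata (position)
    SELECT random()
    FROM generate_series(1, 10000000);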

If you really need a constant increase for the second column, you can derive that from the identity column:
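One way to do that, sketched with an arbitrary step of 0.5 (the step and the table name are assumptions):

    -- Derive a steadily increasing second column from the identity column
    UPDATE testdata
       SET position = id * 0.5;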

Or do it in chunks:
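Roughly like this, restricting each UPDATE to a slice of the identity values (the ranges and the step are again placeholders):

    -- Each statement touches only one range of ids
    UPDATE testdata
       SET position = id * 0.5
     WHERE id BETWEEN 1 AND 10000000;

    UPDATE testdata
       SET position = id * 0.5
     WHERE id BETWEEN 10000001 AND 20000000;

    -- ...and so on for the remaining ranges.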
