I want to insert a billion rows of floaty data into a postgresql database so that I can test the performance of various postgis functions. My effort below takes ages and seems to be really inefficient and the memory consumption seems to balloon. Can anyone suggest a more better way of doing this – I was thinking it would be better to insert a million rows each insert but I can’t work out how to build the object eg: (a, b), (c, d)
to insert.
Any help gratefully received. Please note I am a bit newbie with SQL so I’m not able to digest hyper optimised solutions that require an advances CS degrees to assimilate 🙂 I’m looking for “good enough”.
Ta,
Andrew
do $$ declare position float := 0; measurement float := 0; counting integer := 0; begin while position < 100 loop INSERT into lat_longs values (counting, postition); position := position + 0.0000001; counting := counting + 1; end loop; raise notice 'count: %', counting; end$$;
Advertisement
Answer
Typically using generate_series()
is faster than using loops in PL/pgSQL.
To generate the “position” value, you can use random()
The following will insert 100 million rows with random values for the second column:
insert into lat_longs(c1, c2) select g.id, random() * 100 from generate_series(1,100e6) as g(id);
I prefer to insert test data in chunks (e.g. 10 million at a time). That is easier to do, if you let Postgres generate a unique value for the first column, e.g. by defining it as an identify column:
create table lat_longs ( c1 bigint generated always as identity, c2 float ) insert into lat_longs(c2) select random() * 100 from generate_series(1,10e6) as g(id); insert into lat_longs(c2) select random() * 100 from generate_series(1,10e6) as g(id); ...
If you really need a constant increase for the second column you can do that if you have the identity column:
insert into lat_longs(c2) select g.position from generate_series(0, 100, 0.0000001) as g(position);
Or do it in chunks:
insert into lat_longs(c2) select g.position from generate_series(0, 10, 0.0000001) as g(position); insert into lat_longs(c2) select g.position from generate_series(10, 20, 0.0000001) as g(position); ...