I use the Greenplum database, a massively parallel processing (MPP) database based on Postgres. I have a table that is about 100 GB in size.
It holds data from 2019 up to today. The table is not explicitly ordered, but we insert new data every day, so it is roughly sorted by sales day. I would like to recreate this table and sort the data before the insert. The table is currently compressed with quicklz and uses column-oriented storage. Sorting by a specific key should be beneficial because Greenplum supports RLE (run-length encoding): identical values will be stored next to each other.
By recreating the table I hope to reclaim some space. Would this have any impact on the performance?
Answer
Using RLE (which also applies delta compression internally) would definitely be beneficial for your table. Query performance should also improve, because a better compression ratio means less I/O has to be performed.
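To see why sorting helps run-length encoding, here is a small Python sketch. It is only a toy model of RLE, not Greenplum's actual implementation, and the column values (store ids "A", "B", "C") are made up for illustration. It counts how many (value, run length) pairs are needed for the same column before and after sorting.

```python
from itertools import groupby


def rle_encode(values):
    """Collapse consecutive equal values into (value, run_length) pairs."""
    return [(value, sum(1 for _ in run)) for value, run in groupby(values)]


# Hypothetical column: one store id per sales row, in arrival order.
unsorted_column = ["B", "A", "B", "C", "A", "B", "C", "A"]
sorted_column = sorted(unsorted_column)

unsorted_runs = rle_encode(unsorted_column)
sorted_runs = rle_encode(sorted_column)

# Fewer runs means fewer pairs to store, i.e. better compression.
print(len(unsorted_runs), "runs when unsorted")
print(len(sorted_runs), "runs when sorted")
```

In this toy input no two neighbouring values are equal before sorting, so every row becomes its own run; after sorting, the column collapses to one run per distinct value. How much a real table gains depends on the cardinality of the sort key and of the other columns.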