Skip to content
Advertisement

Storing a huge amount of points(x,y,z) in a relational database

I need to store a very simple data structure on disk – the Point. It’s fields are just:

  • Moment – 64-bit integer, representing a time with high precision.
  • EventType – 32-bit integer, reference to another object.
  • Value – 64-bit floating point number.

Requirements:

  1. The pair of (Moment + EventType) is unique identifier of the Point, so I suspect it to be a composite primary key.
  2. There’s a huge number of Points. Up to 5 billions (1-2 TB of disk space). So the format must be as small as possible.
  3. Typical and almost single usage of the table is a retrieval (or creating a view) of millions of Points by exact EventType and a range of Moments.

Questions:

  • Which RDBMS to choose and why?
  • What is the optimal SQL definition for a table of Points?
  • And comments about my thoughts below are also appreciated.

I don’t need a huge professional system with all tools, features and extensions like PostgreSQL or MSSQL. Also I don’t need a server, so the choice of SQLite looks optimal. Another great RDBMS with a feature of embedded database is Firebird, but I was seduced by SQLite’s dynamic typing paradigm. It looks like it can save me space on disk because integer fields can be stored in “smaller” form (1, 2, 3, 4, 6 bytes).

But first of all, SQLite creates special ROWID column (64-bits length) when primary key is composite:

CREATE TABLE points (
    moment integer not null,
    event_id integer not null,
    value numeric not null,
    PRIMARY KEY (moment, event_id)
);

It means table wastes nearly 40% more space for nothing. I found “The WITHOUT ROWID Optimization”. But it will be available only in 3.8.2 version of SQLite (December 2013). Waiting for ADO.NET provider which I need is inappropriate.

Another problem is SQLite uses B-tree for tables. It looks like it is inefficient for selecting data ranges. To select a big block of Points based on primary key’s range it looks SQLite will be a bad choice.

Advertisement

Answer

B-trees are the most efficient organization for selecting data ranges.

If you search a constant event_id value and a range of moment values, the two-column index can be used for both lookups only if the event_id is the first column in the index:

CREATE TABLE points (
    event_id INTEGER NOT NULL,
    moment INTEGER NOT NULL,
    value NUMERIC NOT NULL,
    PRIMARY KEY (event_id, moment)
);

You should try to use version 3.8.2 so that you can use the WITHOUT ROWID optimization.
The developers are likely to be happy that somebody will test this function, and give you a compiled prerelease version.

User contributions licensed under: CC BY-SA
10 People found this is helpful
Advertisement