Skip to content
Advertisement

A more efficient way to organize a DB schema for key value storage in a related table

Here is the DB schema that I currently use (some tables and table columns have been removed and the rest have been renamed for simplicity). I’ve added some indexes and at the bottom there are sample INSERT and SELECT queries that I frequently execute.

My question is: How can I further optimize the DB schema by adding / removing some indexes or changing some data types. The question mostly concerns the eventTypeParameterValues table and its value column which could take on arbitrary values (hence specified as TEXT here) some of which can be IDs of columns of other tables.

EDIT: The value column currently does NOT use an index when filtering on it. Should I add one?

CREATE TABLE "objects"(
    objectId INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE CHECK(LENGTH(name) > 0),
    label TEXT NOT NULL UNIQUE CHECK(LENGTH(label) > 0)
);

CREATE INDEX index_objects_label ON "objects"(label);
INSERT INTO "objects"(name, label) 
    VALUES
        ('foo sfsdf', 'foo foo'),
        ('bar bar bar', 'bar');



CREATE TABLE "eventTypes"(
    eventTypeId INTEGER PRIMARY KEY,
    label TEXT NOT NULL UNIQUE CHECK(LENGTH(label) > 0)
);
INSERT OR IGNORE INTO "eventTypes"(label)
    VALUES
        ('foo'),
        ('bar'),
        ('bla'),
        -- ...
        ('bla bla');


CREATE TABLE "events"(
    eventId INTEGER PRIMARY KEY,
    eventTypeId INT NOT NULL REFERENCES "eventTypes"(eventTypeId) ON DELETE CASCADE
);
CREATE INDEX index_events_eventTypeId ON "events"(eventTypeId);


CREATE TABLE "eventTypeParameters"(
    eventTypeParameterId INTEGER PRIMARY KEY,
    label TEXT NOT NULL UNIQUE CHECK(LENGTH(label) > 0)
);
INSERT OR IGNORE INTO "eventTypeParameters"(label)
    VALUES
        ('param1ObjectId'),
        ('param1'),
        ('param2ObjectId'),
        ('param2'),
        ('param3'),
        -- ...
        ('param1000');


CREATE TABLE "eventTypeParameterValues"(
    eventTypeParameterValueId INTEGER PRIMARY KEY,
    eventId INT NOT NULL REFERENCES "events"(eventId) ON DELETE CASCADE,
    eventTypeParameterId INT NOT NULL REFERENCES "eventTypeParameters"(eventTypeParameterId) ON DELETE CASCADE,
    value TEXT NOT NULL
);
CREATE INDEX index_eventTypeParameterValues_eventId ON "eventTypeParameterValues"(eventId);
CREATE INDEX index_eventTypeParameterValues_eventTypeParameterId ON "eventTypeParameterValues"(eventTypeParameterId);


-- a sample event
INSERT INTO "events"(eventTypeId)
    VALUES((SELECT eventTypeId FROM eventTypes WHERE label = 'bar'));
INSERT INTO "eventTypeParameterValues"(eventId, eventTypeParameterId, value)
    VALUES((SELECT MAX(eventId) FROM events), (SELECT eventTypeParameterId FROM eventTypeParameters WHERE label = 'param2ObjectId'), (SELECT objectId FROM objects WHERE label = 'bar'));
INSERT INTO "eventTypeParameterValues"(eventId, eventTypeParameterId, value)
    VALUES((SELECT MAX(eventId) FROM events), (SELECT eventTypeParameterId FROM eventTypeParameters WHERE label = 'param3'), 'sdfasf sflsfjlsd');



SELECT ppv1.value FROM "events" t 
    INNER JOIN "eventTypeParameterValues" ppv1 ON t.eventId = ppv1.eventId
    INNER JOIN "eventTypeParameterValues" ppv2 ON t.eventId = ppv2.eventId
    WHERE t.eventTypeId = 2 AND ppv1.eventTypeParameterId = 3
        AND ppv2.eventTypeParameterId = 5 AND ppv2.value = 'sdfasf sflsfjlsd';

EDIT 2: Since work on this project has just begun, more radical DB schema changes are NOT out of the question. So feel free to suggest even more drastic changes.

Advertisement

Answer

SQLite uses only one index per table in your query. Adding more indices is unlikely to improve performance of your query.

Based on your sample query, you’d benefit from the following indices:

events (eventId, eventTypeId)
eventTypeParameterValues (eventId, eventTypeParameterId, value)

It’s hard to say whether that would actually improve performance without seeing your EXPLAIN as it is today. When I tried this on db-fiddle, the index on eventTypeParameterValues is used, but not the one on events, presumably because in the sample data, eventTypeId isn’t sufficiently distributed.

Adding these indices will help if the following are true:

  • in the events table for any given event ID, there can be thousands of event types.
  • in the eventTypeParameterValues table, for any given eventID, there can be thousands of combinations of eventTypeParamenterID and value

Your current EXPLAIN says that when searching the eventTypeParameterValues, the query optimizer uses an index to find all the records matching the foreign key to events. If that yields a set of “some” (<1000?) records, the additional index is unlikely to make a huge difference. If it yields “lost” (>100.000?), it will make a huge difference.

The good news is that you don’t need to worry too much – if you create the compound index, you can drop index_eventTypeParameterValues_eventId as its covered by the compound index.

User contributions licensed under: CC BY-SA
4 People found this is helpful
Advertisement