I have the following schema for a table (simplified for the sake of discussion) :
Task max_solved_by int, # Number of users who can solve this task currently_solved_by int[], # Number of users who currently solved this task so far state VARCHAR(10), # State of a task can either be NOT_SOLVED or SOLVED
A task transitions from NOT_SOLVED
TO SOLVED
when array_length(current_solved_by) == max_solved_by
Questions:
- Is it a good practice to maintain another column, in this case
state
, which can be derived from other data (Based on size of currently_solved_by.
Main reason I added the state is to be able query on state: e.g. SELECT * FROM TASKS where state = 'NOT_SOLVED'
. But if this query can be (efficiently) queried using array_length(currently_solved_by), I probably don’t need this state.
I am a beginner to postgres. So any help in what is a good practice in this case is appreciated.
Thanks
Advertisement
Answer
There are two options:
- You may use generated column, which will hide all the business logic behind and just provide you meaningful values like
SOLVED
orNOT_SOLVED
. But it will of course consume additional space. - You may use expression index on comparison of array length and max length. This will allow you to use fast searches on the expression, but you’ll need to use this exact expression for filtering every time and no business logic preserved in this case.
I’m not a Postgres guru, so I currently didn’t manage the query to use index on boolean value when filtering with this boolean value. Then I will use cast to char(1)
as a workaround.
Two approaches are below and both use more or less the same access pattern, but generated column is business meaningful:
create table t ( name varchar(100), x int[], y int )
<some inserts here...>
create index ix_t_bool on t((array_length(x, 1) = y))
explain analyze select * from t where (array_length(x, 1) = y) = true| QUERY PLAN | | :------------------------------------------------------------------------------------------- | | Seq Scan on t (cost=0.00..20.09 rows=5 width=254) (actual time=0.014..0.092 rows=2 loops=1) | | Filter: (array_length(x, 1) = y) | | Rows Removed by Filter: 1004 | | Planning Time: 0.393 ms | | Execution Time: 0.105 ms |
Index is not used for some reason. Let’s cast to use explicit =
predicate.
create index ix_t_int on t(( (array_length(x, 1) = y)::char(1) ))
explain analyze select * from t where (array_length(x, 1) = y)::char(1) = 'Y'| QUERY PLAN | | :-------------------------------------------------------------------------------------------------------------- | | Bitmap Heap Scan on t (cost=4.31..9.69 rows=5 width=254) (actual time=0.045..0.045 rows=0 loops=1) | | Recheck Cond: (((array_length(x, 1) = y))::character(1) = 'Y'::bpchar) | | -> Bitmap Index Scan on ix_t_int (cost=0.00..4.31 rows=5 width=0) (actual time=0.043..0.043 rows=0 loops=1) | | Index Cond: (((array_length(x, 1) = y))::character(1) = 'Y'::bpchar) | | Planning Time: 0.250 ms | | Execution Time: 0.067 ms |
alter table t add column state text generated always as ( case array_length(x, 1) when y then 'SOLVED' else 'NOT_SOLVED' end ) stored
create index ix_t_calc on t(state)
explain analyze select * from t where state = 'SOLVED'| QUERY PLAN | | :--------------------------------------------------------------------------------------------------------------- | | Bitmap Heap Scan on t (cost=4.31..11.31 rows=5 width=286) (actual time=0.025..0.026 rows=2 loops=1) | | Recheck Cond: (state = 'SOLVED'::text) | | Heap Blocks: exact=1 | | -> Bitmap Index Scan on ix_t_calc (cost=0.00..4.31 rows=5 width=0) (actual time=0.020..0.020 rows=2 loops=1) | | Index Cond: (state = 'SOLVED'::text) | | Planning Time: 0.307 ms | | Execution Time: 0.047 ms |
db<>fiddle here