Skip to content
Advertisement

How could I speed up this SQL query?

I have this query:

select 
    "time_interval", 
    SUM("mv"."percent_ampl_bought") as "percent_ampl_bought", 
    SUM("mv"."percent_ampl_sold") as "percent_ampl_sold", 
    SUM("mv"."percent_ampl_transferred") as "percent_ampl_transferred", 
    SUM("mv"."amount_ampl_bought") as "amount_ampl_bought", 
    SUM("mv"."amount_ampl_sold") as "amount_ampl_sold", 
    SUM("mv"."amount_ampl_transferred") as "amount_ampl_transferred" 
from "mv_30day_daily_aggregate_buys_sells_transfers" as "mv" 
group by grouping sets ( (time_interval), () ) 
order by time_interval desc nulls last;

With this explain plan:

https://explain.depesz.com/s/gJXC

I have these indexes:

CREATE UNIQUE INDEX mv_30day_daily_aggregate_buys_sells_transfers_primary
  ON public.mv_30day_daily_aggregate_buys_sells_transfers USING btree
  (time_interval, contract_address);

CREATE INDEX mv_30day_daily_aggregate_buys_sells_transfers_time_interval
  ON public.mv_30day_daily_aggregate_buys_sells_transfers USING btree
  (time_interval);

CREATE INDEX v_30day_daily_aggregate_buys_sells_transfers_contract_address
  ON public.mv_30day_daily_aggregate_buys_sells_transfers USING btree
  (contract_address)

Is it possible to further optimise this? There are only 30 time intervals for this table so I feel like I should be able to get it faster.

Advertisement

Answer

A main limitation here (at least if you have CPUs to spare) is that GROUPING SETS does not support parallel execution. As far as I can tell, there is no fundamental reason for that, it is just that no one got around to hooking it up yet. With only 32 groups, this should parallelize nicely if it were willing to use it. So one thing you could do is rewrite this as two queries with a UNION ALL, so that you can get parallelization.

select 
    "time_interval", 
    SUM("mv"."percent_ampl_bought") as "percent_ampl_bought", 
    SUM("mv"."percent_ampl_sold") as "percent_ampl_sold", 
    SUM("mv"."percent_ampl_transferred") as "percent_ampl_transferred", 
    SUM("mv"."amount_ampl_bought") as "amount_ampl_bought", 
    SUM("mv"."amount_ampl_sold") as "amount_ampl_sold", 
    SUM("mv"."amount_ampl_transferred") as "amount_ampl_transferred" 
from "mv_30day_daily_aggregate_buys_sells_transfers" as "mv" 
group by time_interval 
union all 
select 
    NULL, 
    SUM("mv"."percent_ampl_bought") as "percent_ampl_bought", 
    SUM("mv"."percent_ampl_sold") as "percent_ampl_sold", 
    SUM("mv"."percent_ampl_transferred") as "percent_ampl_transferred", 
    SUM("mv"."amount_ampl_bought") as "amount_ampl_bought", 
    SUM("mv"."amount_ampl_sold") as "amount_ampl_sold", 
    SUM("mv"."amount_ampl_transferred") as "amount_ampl_transferred" 
from "mv_30day_daily_aggregate_buys_sells_transfers" as "mv" 
order by time_interval desc nulls last;

But you might as well take advantage of our knowledge that sums can be pieced together from their arbitrarily divided parts:

with t as (select 
    "time_interval", 
    SUM("mv"."percent_ampl_bought") as "percent_ampl_bought", 
    SUM("mv"."percent_ampl_sold") as "percent_ampl_sold", 
    SUM("mv"."percent_ampl_transferred") as "percent_ampl_transferred", 
    SUM("mv"."amount_ampl_bought") as "amount_ampl_bought", 
    SUM("mv"."amount_ampl_sold") as "amount_ampl_sold", 
    SUM("mv"."amount_ampl_transferred") as "amount_ampl_transferred" 
    from "mv_30day_daily_aggregate_buys_sells_transfers" as "mv" 
    group by time_interval 
) 
select * from t 
union all 
select NULL, sum(percent_ampl_bought), sum(percent_ampl_sold), sum(percent_ampl_transferred),sum(amount_ampl_bought), sum(amount_ampl_sold), sum(amount_ampl_transferred) from t 
order by time_interval desc nulls last;

With this last one, I get about a 3 fold speed over the original, but it would be more if I had more than 2 CPUs on my test box.

Also, if you change the types to double precision rather than numeric, then it would be faster yet.

User contributions licensed under: CC BY-SA
10 People found this is helpful
Advertisement