
How can I optimize PostgreSQL ARRAY_AGG queries for large tables?

I am using PostgreSQL for its array functionality. Here’s my schema:

CREATE TABLE questions (
  id INTEGER PRIMARY KEY,
  product_id INTEGER UNIQUE NOT NULL,
  body VARCHAR(1000) NOT NULL,
  date_written DATE NOT NULL DEFAULT current_date,
  asker_name VARCHAR(60) NOT NULL,
  asker_email VARCHAR(60) NOT NULL,
  reported BOOLEAN DEFAULT FALSE,
  helpful INTEGER NOT NULL DEFAULT 0
);

CREATE TABLE answers (
  id INTEGER PRIMARY KEY,
  question_id INTEGER NOT NULL,
  body VARCHAR(1000) NOT NULL,
  date_written DATE NOT NULL DEFAULT current_date,
  answerer_name VARCHAR(60) NOT NULL,
  answerer_email VARCHAR(60) NOT NULL,
  reported BOOLEAN DEFAULT FALSE,
  helpful INTEGER NOT NULL DEFAULT 0
);

CREATE TABLE photos (
  id INTEGER UNIQUE,
  answer_id INTEGER NOT NULL,
  photo VARCHAR(200)
);

I am trying to query my answers table to get a list of all the answers for a given question id, including an array of all photos that exist for each answer_id. The results should be sorted in descending order of helpfulness. So far, I have a large query that returns the results I'm looking for, but its execution time is 729.595 ms, and I am trying to get that down to 200 ms. I have the following indexes in place to try to speed up the query:

  indexname    |                                 indexdef                                  
-----------------+---------------------------------------------------------------------------
 answer_id       | CREATE UNIQUE INDEX answer_id ON public.answers USING btree (id)
 question_id     | CREATE INDEX question_id ON public.answers USING btree (question_id)
 idx_reported_id | CREATE INDEX idx_reported_id ON public.answers USING btree (reported, id)
 answers_pkey    | CREATE UNIQUE INDEX answers_pkey ON public.answers USING btree (id)

  indexname    |                                  indexdef                                  
----------------+----------------------------------------------------------------------------
 id             | CREATE UNIQUE INDEX id ON public.questions USING btree (id)
 idx_q_reported | CREATE INDEX idx_q_reported ON public.questions USING btree (id, reported)
 questions_pkey | CREATE UNIQUE INDEX questions_pkey ON public.questions USING btree (id)

   indexname   |                              indexdef                               
---------------+---------------------------------------------------------------------
 photos_id_key | CREATE UNIQUE INDEX photos_id_key ON public.photos USING btree (id)
 p_links       | CREATE INDEX p_links ON public.photos USING btree (photo)

In my analysis, I noticed that the GroupAggregate node is time-consuming:

GroupAggregate (cost=126222.21..126222.71 rows=25 width=129) (actual time=729.497..729.506 rows=5 loops=1)
  Group Key: answers.id
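
That line comes from the EXPLAIN ANALYZE output for the query below; the BUFFERS option, which I have not used here, would additionally report whether the time is being spent on I/O:

EXPLAIN (ANALYZE, BUFFERS)
SELECT ... ;  -- the full query, shown below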

Is there a way I can avoid the time-consuming GROUP BY? Am I missing something with the indexes? Here is the query itself:

SELECT answers.id, 
       question_id, 
       body, 
       date_written, 
       answerer_name, 
       answerer_email, 
       reported, 
       helpful, 
       ARRAY_AGG(photo) as photos 
FROM answers 
  LEFT JOIN photos ON answers.id = photos.answer_id 
WHERE reported IS false 
  AND answers.id IN (SELECT id 
                     FROM answers 
                     WHERE question_id = 20012) 
GROUP BY answers.id 
ORDER BY helpful DESC;

Thanks!


Answer

I think you can skip the subquery:

SELECT answers.id, 
       question_id, 
       body, 
       date_written, 
       answerer_name, 
       answerer_email, 
       reported, 
       helpful, 
       ARRAY_AGG(photo) as photos 
FROM answers 
LEFT JOIN photos ON answers.id = photos.answer_id 
WHERE reported IS false AND question_id = 20012 
GROUP BY answers.id, question_id, body, date_written, answerer_name, answerer_email, reported, helpful
ORDER BY helpful DESC;

You can also add a btree index on photos.answer_id, because this field is used in the join clause.
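
For example (the index name is just illustrative):

CREATE INDEX idx_photos_answer_id ON photos (answer_id);

None of the existing indexes on photos cover answer_id, so the planner currently has no way to look photos up by that column.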

You had also left some of the selected fields out of the GROUP BY clause, so I added them.
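
If the GroupAggregate itself remains the bottleneck, one variant worth testing (a sketch, not verified against your data) is to build the photo array in a correlated subquery, so the outer query needs no GROUP BY at all:

SELECT a.id,
       a.question_id,
       a.body,
       a.date_written,
       a.answerer_name,
       a.answerer_email,
       a.reported,
       a.helpful,
       (SELECT ARRAY_AGG(p.photo)
          FROM photos p
         WHERE p.answer_id = a.id) AS photos  -- NULL (not an empty array) when an answer has no photos
FROM answers a
WHERE a.reported IS false
  AND a.question_id = 20012
ORDER BY a.helpful DESC;

With the index on photos (answer_id) in place, each subquery becomes a short index lookup; whether this beats the join-plus-aggregate plan depends on how many answers match, so compare both with EXPLAIN ANALYZE.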
