Ok, so I’ve got two tables: jobs, and job runs. I’m using Postgres.
I want to look at 2 periods. 7 days ago until now, and 14 days ago to 7 days ago.
For each job, I want a total of the number of runs, and a percentage of successful and unsuccessful runs for each period. I’ve cooked up this terrible query:
WITH results AS ( select coalesce(count(case when succeeded = true AND timestamp BETWEEN NOW() - INTERVAL '14 DAY' AND NOW() - INTERVAL '7 DAY' then 1 end), 0) as previous_passes, coalesce(count(case when succeeded = false AND timestamp BETWEEN NOW() - INTERVAL '14 DAY' AND NOW() - INTERVAL '7 DAY' then 1 end), 0) as previous_failures, coalesce(count(case when timestamp BETWEEN NOW() - INTERVAL '14 DAY' AND NOW() - INTERVAL '7 DAY' then 1 end), 0) as previous_total_runs, coalesce(count(case when infrastructure_failure = true AND timestamp BETWEEN NOW() - INTERVAL '14 DAY' AND NOW() - INTERVAL '7 DAY' then 1 end), 0) as previous_infrastructure_failures, coalesce(count(case when succeeded = true AND timestamp > NOW() - INTERVAL '7 DAY' then 1 end), 0) as current_passes, coalesce(count(case when succeeded = false AND timestamp > NOW() - INTERVAL '7 DAY' then 1 end), 0) as current_failures, coalesce(count(case when timestamp > NOW() - INTERVAL '7 DAY' then 1 end), 0) as current_total_runs, coalesce(count(case when infrastructure_failure = true AND timestamp > NOW() - INTERVAL '7 DAY' then 1 end), 0) as current_infrastructure_failures FROM prow_job_runs JOIN prow_jobs ON prow_jobs.id = prow_job_runs.prow_job_id WHERE prow_jobs.name = 'promote-release-openshift-machine-os-content-e2e-aws-4.10' ) SELECT *, previous_passes * 100.0 / NULLIF(previous_total_runs, 0) AS previous_pass_percentage, previous_failures * 100.0 / NULLIF(previous_total_runs, 0) AS previous_failure_percentage, current_passes * 100.0 / NULLIF(current_total_runs, 0) AS current_pass_percentage, current_failures * 100.0 / NULLIF(current_total_runs, 0) AS current_failure_percentage FROM results;
Which gets me the result I want:
-[ RECORD 1 ]--------------------+----------------------- previous_passes | 591 previous_failures | 4 previous_total_runs | 595 previous_infrastructure_failures | 1 current_passes | 67 current_failures | 0 current_total_runs | 67 current_infrastructure_failures | 0 previous_pass_percentage | 99.3277310924369748 previous_failure_percentage | 0.67226890756302521008 current_pass_percentage | 100.0000000000000000 current_failure_percentage | 0.00000000000000000000
Here is the execution plan:
QUERY PLAN ------------------------------------------------------------------------------------------------------------------ Subquery Scan on results (cost=661.12..661.19 rows=1 width=192) -> Aggregate (cost=661.12..661.13 rows=1 width=64) -> Hash Join (cost=8.30..650.89 rows=93 width=10) Hash Cond: (prow_job_runs.prow_job_id = prow_jobs.id) -> Seq Scan on prow_job_runs (cost=0.00..603.60 rows=14460 width=18) -> Hash (cost=8.29..8.29 rows=1 width=8) -> Index Scan using prow_jobs_name_key on prow_jobs (cost=0.27..8.29 rows=1 width=8) Index Cond: (name = 'promote-release-openshift-machine-os-content-e2e-aws-4.10'::text) (8 rows)
But it’s only for a single job, how do I do get results for each job without doing a for loop in code?
I also think my query is really slow, it’s > 8ms for just one job run.
TY
Advertisement
Answer
you need to provide the query execution plan
.
but you have to make sure you have necessary indices and also maybe you limit your number of rows in join it will be helpful:
WITH results AS ( select prow_jobs.name, coalesce(count(case when succeeded = true AND timestamp BETWEEN NOW() - INTERVAL '14 DAY' AND NOW() - INTERVAL '7 DAY' then 1 end), 0) as previous_passes, coalesce(count(case when succeeded = false AND timestamp BETWEEN NOW() - INTERVAL '14 DAY' AND NOW() - INTERVAL '7 DAY' then 1 end), 0) as previous_failures, coalesce(count(case when timestamp BETWEEN NOW() - INTERVAL '14 DAY' AND NOW() - INTERVAL '7 DAY' then 1 end), 0) as previous_total_runs, coalesce(count(case when infrastructure_failure = true AND timestamp BETWEEN NOW() - INTERVAL '14 DAY' AND NOW() - INTERVAL '7 DAY' then 1 end), 0) as previous_infrastructure_failures, coalesce(count(case when succeeded = true AND timestamp > NOW() - INTERVAL '7 DAY' then 1 end), 0) as current_passes, coalesce(count(case when succeeded = false AND timestamp > NOW() - INTERVAL '7 DAY' then 1 end), 0) as current_failures, coalesce(count(case when timestamp > NOW() - INTERVAL '7 DAY' then 1 end), 0) as current_total_runs, coalesce(count(case when infrastructure_failure = true AND timestamp > NOW() - INTERVAL '7 DAY' then 1 end), 0) as current_infrastructure_failures FROM prow_job_runs JOIN prow_jobs ON prow_jobs.id = prow_job_runs.prow_job_id and timestamp BETWEEN NOW() and now() - INTERVAL '14 DAY' group by prow_jobs.name ) SELECT *, previous_passes * 100.0 / NULLIF(previous_total_runs, 0) AS previous_pass_percentage, previous_failures * 100.0 / NULLIF(previous_total_runs, 0) AS previous_failure_percentage, current_passes * 100.0 / NULLIF(current_total_runs, 0) AS current_pass_percentage, current_failures * 100.0 / NULLIF(current_total_runs, 0) AS current_failure_percentage FROM results;
and seems like you don’t have any index on prow_job_runs table, add an index on that table with columns (id,succeeded,infrastructure_failure,timestamp, prow_job_id)