There are two tables:
Table education_data
(list of countries with values by year per measured indicator).
create table education_data (country_id int, indicator_id int, year date, value float );
Table indicators
(list of all indicators):
create table indicators (id int PRIMARY KEY, name varchar(200), code varchar(25) );
I want to find the indicators for which the highest number of countries lack information entirely i.e. max (count of missing indicators by country)
I have solved the problem in excel (by counting blanks in a pivot table by country)
pivot table with count for missing indicators by country
I haven’t figured our yet the SQL query to return the same results.
I am able to return the number of missing indicators for a set country , read query below, but not for all countries.
SELECT COUNT(*) FROM education_data AS edu RIGHT JOIN indicators AS ind ON edu.indicator_id = ind.id and country_id = 10 WHERE value IS NULL GROUP BY country_id
I have tried with a cross join without success so far.
Advertisement
Answer
You will have to join on the contries
as well, otherwise you can not tell if a contry has no entry in education_data
at all:
create table countries(id serial primary key, name varchar); create table indicators (id int PRIMARY KEY, name varchar(200), code varchar(25) ); create table education_data (country_id int references countries, indicator_id int references indicators, year date, value float ); insert into countries values (1,'USA'); insert into countries values (2,'Norway'); insert into countries values (3,'France'); insert into indicators values (1,'foo','xxx'); insert into indicators values (2,'bar', 'yyy'); insert into education_data values(1,1,'01-01-2020',1.1); SELECT count (c.id), i.id, i.name FROM countries c JOIN indicators i ON (true) LEFT JOIN education_data e ON(c.id = e.country_id AND i.id = e.indicator_id) WHERE indicator_id IS NULL GROUP BY i.id; count | id | name -------+----+------ 3 | 2 | bar 2 | 1 | foo (2 rows)