Skip to content
Advertisement

How to find missing data in table Sql

This is similar to How to find missing data rows using SQL? and How to find missing rows (dates) in a mysql table? but a bit more complex, so I’m hitting a wall.

I have a data table with the noted Primary key:

country_id (PK)
product_id (PK)
history_date (PK)
amount

I have a products table with all products, a countries table, and a calendar table with all valid dates.

I’d like to find all countries, dates and products for which there are missing products, with this wrinkle: I only care about dates for which there are entries for a country for at least one product (i.e. if the country has NOTHING on that day, I don’t need to find it) – so, by definition, there is an entry in the history table for every country and date I care about.

I know it’s going to involve some joins maybe a cross join, but I’m hitting a real wall in finding missing data.

I tried this (pretty sure it wouldn’t work):

SELECT h.history_date, h.product_id, h.country_id, h.amount
FROM products p
LEFT JOIN history h ON (p.product_id = h.product_id)
WHERE h.product_id IS NULL

No Joy.

I tried this too:

WITH allData AS (SELECT h1.country_id, p.product_id, h1.history_date
FROM products p 
CROSS JOIN (SELECT DISTINCT country_id, history_date FROM history) h1)
SELECT f.history_date, f.product_id, f.country_id
FROM allData f
LEFT OUTER JOIN history h ON (f.country_id = h.country_id AND f.history_date = h.history_date AND f.product_id = h.product_id)
WHERE h.product_id IS NULL 
   AND h.country_id IS NOT NULL 
   AND h.history_date IS NOT null

also no luck. The CTE does get me every product on every date that there is also data, but the rest returns nothing.

Advertisement

Answer

I only care about dates for which there are entries for a country for at least one product (i.e. if the country has NOTHING on that day, I don’t need to find it)

So we care about this combination:

from    (select distinct country_id, history_date from history) country_date
cross join products p

Then it’s just a matter of checking for existence:

select *
from    (select distinct country_id, history_date from history) country_date
cross join products p
where not exists (select null
                  from history h
                  where country_date.country_id   = h.country_id
                   and  country_date.history_date = h.history_date
                   and  p.product_id              = h.product_id
                 ) 
User contributions licensed under: CC BY-SA
8 People found this is helpful
Advertisement