Skip to content
Advertisement

Remove duplicate values by taking latest data load

I’m working with enterprise data that looks like this.

| load_number | id        | time     | slot_time | region | network |
|-------------|-----------|----------|-----------|--------|---------|
| 1692        | 641131146 | 00:20:00 | 00:20:00  | FX-4   | SBOB    |
| 1692        | 641131146 | 00:20:00 | 00:20:30  | FX-4   | SBOB    |
| 1442        | 570732257 | 00:20:00 | 00:20:00  | FX-4   | SBOB    |
| 1442        | 570732257 | 00:20:00 | 00:20:30  | FX-4   | SBOB    |
| 1692        | 641131147 | 00:55:00 | 00:55:00  | FX-4   | SBOB    |
| 1692        | 641131147 | 00:55:00 | 00:55:30  | FX-4   | SBOB    |
| 1442        | 570732258 | 00:55:00 | 00:55:00  | FX-4   | SBOB    |
| 1442        | 570732258 | 00:55:00 | 00:55:30  | FX-4   | SBOB    |

The issue is that the company has bad data practices and changes/reuses IDs, but only updates the load_number field.

How do I construct my sql query to pull the latest loaded data like so:

| load_number | id        | time     | slot_time | region | network |
|-------------|-----------|----------|-----------|--------|---------|
| 1692        | 641131146 | 00:20:00 | 00:20:00  | FX-4   | SBOB    |
| 1692        | 641131146 | 00:20:00 | 00:20:30  | FX-4   | SBOB    |
| 1692        | 641131147 | 00:55:00 | 00:55:00  | FX-4   | SBOB    |
| 1692        | 641131147 | 00:55:00 | 00:55:30  | FX-4   | SBOB    |

Basically every field will match except the id and the load_number. So given that every field matches except those two fields, can I remove ‘duplicates’ by taking the row with the higher load_number.

I was thinking of some sort of descending rank() on the load_number, Any help is much appreciated!

Advertisement

Answer

Try something like this

with max_load_numbers_by_id AS (
  SELECT et.id, MAX(et.load_number) AS max_load_number
  FROM enterprise_table et
  GROUP BY et.id 
)

SELECT et.*
FROM enterprise_table et
JOIN max_load_numbers_by_id mlnbi
  ON et.id = mlnbi.id
  AND et.max_load_number = mlnbi.load_number
User contributions licensed under: CC BY-SA
2 People found this is helpful
Advertisement