I’m working with enterprise data that looks like this.
| load_number | id | time | slot_time | region | network | |-------------|-----------|----------|-----------|--------|---------| | 1692 | 641131146 | 00:20:00 | 00:20:00 | FX-4 | SBOB | | 1692 | 641131146 | 00:20:00 | 00:20:30 | FX-4 | SBOB | | 1442 | 570732257 | 00:20:00 | 00:20:00 | FX-4 | SBOB | | 1442 | 570732257 | 00:20:00 | 00:20:30 | FX-4 | SBOB | | 1692 | 641131147 | 00:55:00 | 00:55:00 | FX-4 | SBOB | | 1692 | 641131147 | 00:55:00 | 00:55:30 | FX-4 | SBOB | | 1442 | 570732258 | 00:55:00 | 00:55:00 | FX-4 | SBOB | | 1442 | 570732258 | 00:55:00 | 00:55:30 | FX-4 | SBOB |
The issue is that the company has bad data practices and changes/reuses IDs, but only updates the load_number
field.
How do I construct my sql query to pull the latest loaded data like so:
| load_number | id | time | slot_time | region | network | |-------------|-----------|----------|-----------|--------|---------| | 1692 | 641131146 | 00:20:00 | 00:20:00 | FX-4 | SBOB | | 1692 | 641131146 | 00:20:00 | 00:20:30 | FX-4 | SBOB | | 1692 | 641131147 | 00:55:00 | 00:55:00 | FX-4 | SBOB | | 1692 | 641131147 | 00:55:00 | 00:55:30 | FX-4 | SBOB |
Basically every field will match except the id
and the load_number
. So given that every field matches except those two fields, can I remove ‘duplicates’ by taking the row with the higher load_number
.
I was thinking of some sort of descending rank()
on the load_number
,
Any help is much appreciated!
Advertisement
Answer
Try something like this
with max_load_numbers_by_id AS ( SELECT et.id, MAX(et.load_number) AS max_load_number FROM enterprise_table et GROUP BY et.id ) SELECT et.* FROM enterprise_table et JOIN max_load_numbers_by_id mlnbi ON et.id = mlnbi.id AND et.max_load_number = mlnbi.load_number