My data looks like this in Teradata
| load_number | id | time | slot_time | region | network | |-------------|-----------|----------|-----------|--------|---------| | 1692 | 641131146 | 00:20:00 | 00:20:00 | FX-4 | SBOB | | 1692 | 641131146 | 00:20:00 | 00:20:30 | FX-4 | SBOB | | 1442 | 570732257 | 00:20:00 | 00:20:00 | FX-4 | SBOB | | 1442 | 570732257 | 00:20:00 | 00:20:30 | FX-4 | SBOB | | 1692 | 641131147 | 00:55:00 | 00:55:00 | FX-4 | SBOB | | 1692 | 641131147 | 00:55:00 | 00:55:30 | FX-4 | SBOB | | 1442 | 570732258 | 00:55:00 | 00:55:00 | FX-4 | SBOB | | 1442 | 570732258 | 00:55:00 | 00:55:30 | FX-4 | SBOB | | 14 | 123456 | 00:20:00 | 00:20:00 | ABC | XYZ | | 14 | 123456 | 00:20:00 | 00:20:30 | ABC | XYZ | | 19 | 234567 | 00:20:00 | 00:20:00 | ABC | XYZ | | 19 | 234567 | 00:20:00 | 00:20:30 | ABC | XYZ |
How do I construct my sql query to pull the latest loaded data like so:
| load_number | id | time | slot_time | region | network | |-------------|-----------|----------|-----------|--------|---------| | 1692 | 641131146 | 00:20:00 | 00:20:00 | FX-4 | SBOB | | 1692 | 641131146 | 00:20:00 | 00:20:30 | FX-4 | SBOB | | 1692 | 641131147 | 00:55:00 | 00:55:00 | FX-4 | SBOB | | 1692 | 641131147 | 00:55:00 | 00:55:30 | FX-4 | SBOB | | 19 | 234567 | 00:20:00 | 00:20:00 | ABC | XYZ | | 19 | 234567 | 00:20:00 | 00:20:30 | ABC | XYZ |
Basically every field will match except the id
and the load_number
. So given that every field matches except those two fields, can I remove ‘duplicates’ by taking the row with the higher load_number
. The data can differ in the region
and network
columns, and also the load_number
can be different as well.
I was thinking of some sort of descending rank() on the load_number
or windowing over all columns that match on all fields but the id
and load_number
, and then taking the highest load_number , Any help is much appreciated!
Advertisement
Answer
If I understand correctly, you can use row_number()
and qualify
:
select t.* from t qualify row_number() over (partition by time, slot_time, region, network order by load_number desc) = 1