
Query to deduplicate based on one column

My data looks like this in Teradata

How do I construct my SQL query to pull only the latest loaded data, like so:

Basically, every field will match except the id and the load_number. Given that, can I remove the 'duplicates' by keeping only the row with the highest load_number? The data can differ in the region and network columns, and the load_number can be different as well.

I was thinking of some sort of descending rank() on the load_number, or windowing over all the columns that match (everything except id and load_number) and then taking the highest load_number. Any help is much appreciated!


Answer

If I understand correctly, you can use row_number() and qualify:
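
The query itself did not survive on this page, so here is a minimal sketch of what that approach could look like. The table name `my_table` and the columns `col_a` and `col_b` are placeholders, not names from the question; replace the partition list with every column that should match, i.e. everything except id and load_number. I am assuming region and network belong in that list.

```sql
-- Sketch only: my_table, col_a and col_b are hypothetical names.
SELECT t.*
FROM my_table t
QUALIFY ROW_NUMBER() OVER (
    -- one group per set of "duplicate" rows:
    -- list every column except id and load_number
    PARTITION BY region, network, col_a, col_b
    -- highest load_number gets row number 1
    ORDER BY load_number DESC
) = 1;
```

QUALIFY filters on the result of the window function, so no subquery is needed. If two rows in a group share the same highest load_number, row_number() keeps one of them arbitrarily; adding id DESC to the ORDER BY would make the choice deterministic, and rank() would keep both.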

User contributions licensed under: CC BY-SA