
Remove duplicate values by taking latest data load

I’m working with enterprise data that looks like this.

The issue is that the company has bad data practices and changes/reuses IDs, but only updates the load_number field.

How do I construct my sql query to pull the latest loaded data like so:

Basically, every field will match except id and load_number. Given that, can I remove the 'duplicates' by keeping only the row with the highest load_number?

I was thinking of some sort of descending rank() on load_number. Any help is much appreciated!


Answer

Try something like this
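
A minimal sketch of the window-function approach, assuming a database that supports ROW_NUMBER(). The table name enterprise_data and the columns field_a and field_b are hypothetical placeholders: list every column other than id and load_number in the PARTITION BY.

```sql
-- Hypothetical names: enterprise_data is the source table; field_a and
-- field_b stand in for every column other than id and load_number.
SELECT id, field_a, field_b, load_number
FROM (
    SELECT
        t.*,
        ROW_NUMBER() OVER (
            PARTITION BY field_a, field_b   -- columns that must match to count as a duplicate
            ORDER BY load_number DESC       -- highest load_number gets rn = 1
        ) AS rn
    FROM enterprise_data t
) ranked
WHERE rn = 1;   -- keep only the latest load in each group of duplicates
```

ROW_NUMBER() is used here instead of RANK() because it returns exactly one row per group even if two rows share the same load_number; with RANK(), tied rows would all get rank 1 and the duplicates would come back.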

User contributions licensed under: CC BY-SA