Skip to content
Advertisement

How to get the size of the duplicates in this query?

Let’s say I have the following data:

I want to find out how much data I have in duplicates — that is — if we have one of each of the files (unique by md5) how much space do we save? The answer should be:

Here is the base query I have thus far:

I think the simplest way to do this is to do ABS(file_size - total_size), but I’m not sure how to do this without using a sub-select or an outer-query that has access to both the aggregate size and the single file size. What would be the proper way to do this, or is the only way with a subselect?

Advertisement

Answer

You could use:

screen capture from demo link below

Demo

Note that my answer assumes that all records for a given md5 would always have the same file_size values, in the event that there be more than one record. If not, then my answer would not work, but we would have to redefine the logic anyway in this case.

User contributions licensed under: CC BY-SA
10 People found this is helpful
Advertisement