Let’s say I have the following data:
md5 file_size 1234 127.4 1234 127.4 1234 127.4 909 18.2 909 18.2 104 207
I want to find out how much data I have in duplicates — that is — if we have one of each of the files (unique by md5) how much space do we save? The answer should be:
md5 size_saved 1234 254.8 909 18.2 104 0
Here is the base query I have thus far:
SELECT file_size, SUM(file_size) total_size FROM files GROUP BY md5
I think the simplest way to do this is to do ABS(file_size - total_size)
, but I’m not sure how to do this without using a sub-select or an outer-query that has access to both the aggregate size and the single file size. What would be the proper way to do this, or is the only way with a subselect?
Advertisement
Answer
You could use:
SELECT md5, SUM(file_size) - MIN(file_size) AS size_saved FROM yourTable GROUP BY md5;
Note that my answer assumes that all records for a given md5
would always have the same file_size
values, in the event that there be more than one record. If not, then my answer would not work, but we would have to redefine the logic anyway in this case.