Skip to content
Advertisement

Why does this suggested solution prefer a different way of calculating percentages based on aggregations?

I’m working through a problem set from CMU’s public db systems course. I have the following two tables:

Order

Id CustomerId EmployeeId OrderDate RequiredDate ShippedDate ShipVia Freight ShipName ShipAddress ShipCity ShipRegion ShipPostalCode ShipCountry
10248 VINET 5 2012-07-04 2012-08-01 2012-07-16 3 16.75 Vins et alcools Chevalier 59 rue de l’Abbaye Reims Western Europe 51100 France
10249 TOMSP 6 2012-07-05 2012-08-16 2012-07-10 1 22.25 Toms Spezialitäten Luisenstr. 48 Münster Western Europe 44087 Germany

Shipper

Id CompanyName Phone
1 Speedy Express (503) 555-9831
2 United Package (503) 555-3199

The question I am attempting to answer is the following:

For each Shipper, find the percentage of orders which are late. Details: An order is considered late if ShippedDate > RequiredDate. Print the following format, order by descending percentage, rounded to the nearest hundredths, like United Package|23.44.

This is the query I came up with, which produces the expected results:

The official solution is a more involved query as follows:

The outputs of the two queries match, each returning the following:

What is the benefit of doing it as shown in the solution rather than using my approach?

Advertisement

Answer

The official solution is a very bad solution because it aggregates twice on the same table Order and then joins the results whilst the same can be achieved by a single aggregation query (like your first query).

I can’t believe that this would be proposed as a proper solution for this type of problems.
I hope it’s just for practice on joins for subqueries.

Also, I find it very disappointing to see the use of single quotes for table (or column) names/aliases.
This is a very bad practice an it can lead to unexpected results.
When needed use double quotes which is the SQL standard (square brackets or backticks also work in SQLite).

As I said your first query is correct and it could be further simplified with the use of the aggregate function AVG():

Note the use of Id (it is the primary key of Shipper) in the GROUP BY clause instead of CompanyName, which provides better performance.

Also, I used COALESCE(), because I found (after downloading your database) that in the column ShippedDate there are nulls.

User contributions licensed under: CC BY-SA
8 People found this is helpful
Advertisement