Skip to content
Advertisement

How can I select an ID from the row with the maximum value for each of several columns?

I want to extract the values of an ID column from the rows containing the maximum values of each of several other columns, and then collate these in a new table which has the column headers in one column and the row IDs of the maximum values in another.

To clarify:

My data is structured like this, extending into a roughly 20×20 grid:

table:

 id | name  | A | B | C | D | E | ... 
----|-------|---|---|---|---|---|----
101 | name1 | 4 | 4 | 1 | 3 | 3 | ... 
102 | name2 | 3 | 1 | 2 | 4 | 2 | ... 
103 | name3 | 2 | 2 | 3 | 2 | 1 | ... 
104 | name4 | 1 | 3 | 4 | 1 | 4 | ... 
... | ..... | . | . | . | . | . | ... 

And I want a query that returns the name which contains the highest value for each column A, B, C etc., like so:

top_scorers:

id | top_scorer
---|-----------
 A | name1
 B | name1
 C | name4
 D | name2
 E | name4

In this case I know the maximum values will all be the same so I can search for values = 4, but it might not be in future so finding the max would be a more robust method.

The best I’ve come up with so far is:

SELECT name AS top_scorer FROM table
WHERE A = 4
UNION
SELECT name AS top_scorer FROM table
WHERE B = 4
UNION
...

And so on, repeating for each column header.

I hoped this would generate the top_scorer column above, which I could then join to another table containing the column headers of my original table. However it does not return any name more than once, so I only get as many rows are there are distinct entries in the top_scorer column. The resulting values are also sorted alphabetically rather than in the order they’re first found.

So rather than

top_scorer
----------
name1
name1
name4
name2
name4

the code above returns

top_scorer
----------
name1
name2
name4

I also tried:

SELECT name AS top_scorer FROM table
WHERE A = MAX(A)
UNION
SELECT name AS top_scorer FROM table
WHERE B = MAX(B)
UNION
...

But this gave an error: “misuse of aggregate function MAX()”

Can anyone suggest a query that would return the top_scorers table above?

Similar questions I’ve found are either looking to return maximums from one column using GROUP BY, or the maximum value across a range of columns for each row (equivalent to the highest score achieved by each name in this case), which aren’t what I’m looking for.

Is there maybe something clever that can be done with PIVOT or some other transposing function? I’m a SQL beginner and haven’t tried using anything like that before.

This is running on DB Browser for SQLite.

Advertisement

Answer

The following approach takes advantage of one of sqlite3’s documented but non-standard behaviors when mixing aggregate and non-aggregate results in a query – when using max(), all non-aggregate values are taken from one of the rows with the maximum value (Ties broken at random):

WITH maxes(column, name, maxval) AS
  (SELECT 'A', name, max(A) FROM mytable
   UNION ALL
   SELECT 'B', name, max(B) FROM mytable
   UNION ALL
   SELECT 'C', name, max(C) FROM mytable
   UNION ALL
   SELECT 'D', name, max(D) FROM mytable
   UNION ALL
   SELECT 'E', name, max(E) FROM mytable)
SELECT column AS id, name AS top_scorer
FROM maxes
ORDER BY column;

which gives

id          top_scorer
----------  ----------
A           name1
B           name1
C           name4
D           name2
E           name4

However, a database design that uses a one-to-many relationship with a second table instead of one column per thing is going to be a better approach.

Consider this schema:

CREATE TABLE names(id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE scores(name_id INTEGER REFERENCES names(id)
                  , score_id TEXT
                  , val INTEGER
                  , PRIMARY KEY(name_id, score_id)) WITHOUT ROWID;

populated with your test data:

INSERT INTO names VALUES(101,'name1');
INSERT INTO names VALUES(102,'name2');
INSERT INTO names VALUES(103,'name3');
INSERT INTO names VALUES(104,'name4');
INSERT INTO scores VALUES(101,'A',4);
INSERT INTO scores VALUES(101,'B',4);
INSERT INTO scores VALUES(101,'C',1);
INSERT INTO scores VALUES(101,'D',3);
INSERT INTO scores VALUES(101,'E',3);
INSERT INTO scores VALUES(102,'A',3);
INSERT INTO scores VALUES(102,'B',1);
INSERT INTO scores VALUES(102,'C',2);
INSERT INTO scores VALUES(102,'D',4);
INSERT INTO scores VALUES(102,'E',2);
INSERT INTO scores VALUES(103,'A',2);
INSERT INTO scores VALUES(103,'B',2);
INSERT INTO scores VALUES(103,'C',3);
INSERT INTO scores VALUES(103,'D',2);
INSERT INTO scores VALUES(103,'E',1);
INSERT INTO scores VALUES(104,'A',1);
INSERT INTO scores VALUES(104,'B',3);
INSERT INTO scores VALUES(104,'C',4);
INSERT INTO scores VALUES(104,'D',1);
INSERT INTO scores VALUES(104,'E',4);

you can use this query:

WITH maxes AS
 (SELECT score_id, name, max(val)
  FROM names
  JOIN scores ON id = name_id
  GROUP BY score_id)
SELECT score_id AS id, name AS top_scorer
FROM maxes
ORDER BY score_id;

which avoids having to hard code each id being tracked like the current one-per-column design you’re using. Much, much cleaner and flexible.

User contributions licensed under: CC BY-SA
6 People found this is helpful
Advertisement