MySQL query for multi-column distinct plus an ancillary column condition

Imagine a flat table that tracks game matches in which each game has three participants: an attacker (player 1), a defender (player 2) and a bettor who is wagering on the outcome of the battle between players 1 and 2. The table includes the names of the players and the bettor of each game, as well as the date of the game, the score of each player, the game venue and the name of the referee. I have included the CREATE TABLE and INSERT statements for some sample data below.

DROP TABLE IF EXISTS `game`;

CREATE TABLE `game` (
  `game_date` text,
  `player_1` text,
  `player_2` text,
  `bettor` text,
  `p1_score` double DEFAULT NULL,
  `p2_score` double DEFAULT NULL,
  `result` double DEFAULT NULL,
  `venue` text,
  `referee` text
);

INSERT INTO `game` VALUES ('2020-04-05','Bob','Kelly','Kevin',100,78,0.2,'TS1','Richard'),('2020-03-06','Jim','Bob','Dave',100,97,1.2,'TS2','Mike'),('2020-02-05','Jim','Bob','Kevin',100,86,0.9,'TS2','Mike'),('2020-01-06','Kelly','Bob','Jim',100,92,1.3,'TS2','Richard'),('2019-12-07','Kelly','Bob','Jim',100,98,1.7,'TS1','Mike'),('2019-11-07','Kelly','Bob','Kevin',78,100,2.1,'TS2','Mike'),('2019-10-08','Kelly','Bob','Kevin',97,100,1.5,'TS1','Mike'),('2019-09-08','Kelly','Jim','Dave',86,100,2.4,'TS1','Richard'),('2019-08-09','Kelly','Jim','Dave',92,100,2.8,'TS2','Mike'),('2019-07-10','Kelly','Jim','Dave',98,100,2.2,'TS2','Mike'),('2019-06-10','Kelly','Jim','Dave',100,78,1.9,'TS2','Richard'),('2019-05-11','Sarah','Jim','Kevin',100,97,2.1,'TS1','Mike'),('2019-04-11','Sarah','Jim','Kevin',100,86,2.1,'TS2','Mike'),('2019-03-12','Sarah','Jim','Kevin',100,92,2.8,'TS1','Mike'),('2019-02-10','Sarah','Jim','Kevin',100,98,1.8,'TS1','Richard');

I need a query that returns match info for each unique assembly of match participants… but only for the first match that the three participants ever played in all together, i.e., for the earliest game_date among the matches that all three participated in.

For example, a game where Bob was player 1, Kelly was player 2 and Kevin was the bettor would constitute a unique threesome. In the data, there is only one match for this threesome, so the query would return a row for that one match.

In the case of Sarah as player 1, Jim as player 2 and Kevin as bettor, there are four matches with that threesome, so the query would return info only for the earliest match, i.e., the one on 02/10/2019.

Note that in the sample data there are two matches with the threesome 'Kelly','Bob','Jim'. There are also two other matches with the threesome 'Kelly','Jim','Bob'. These are not the same because Bob and Jim swap places as player 2 and bettor. So the query would return one row for each of them, i.e., the matches dated '12/07/2019' and '08/09/2019', respectively.

Using DISTINCT, I can return a list of all of the unique player groupings.

SELECT DISTINCT player_1, player_2, bettor FROM game;

Using GROUP BY, I can return all of the game info for all of the matches the group played in.

SELECT * FROM game GROUP BY player_1, player_2, bettor;

But I can’t figure out how to return all of the game info but only for the earliest game where all three participants played together and in distinct roles in the games.

I have tried sub-queries using MIN() for game_date but that’s a loser. I suspect there is perhaps an INNER JOIN solution but I haven’t found it yet.

I am grateful for any guidance you can provide.

Answer

One canonical approach joins the table to a subquery that finds the earliest game_date for each trio (player 1, player 2, bettor):

SELECT g1.*
FROM game g1
INNER JOIN
(
    -- earliest game_date for each (player_1, player_2, bettor) trio
    SELECT player_1, player_2, bettor,
           MIN(game_date) AS min_game_date
    FROM game
    GROUP BY player_1, player_2, bettor
) g2
    -- keep only the row whose trio and date match that earliest game
    ON g2.player_1 = g1.player_1 AND
       g2.player_2 = g1.player_2 AND
       g2.bettor = g1.bettor AND
       g2.min_game_date = g1.game_date;
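
The same idea can also be written without an explicit join, as a row-constructor IN comparison. This is only a sketch against the `game` table and column names from the CREATE TABLE statement in the question; it relies on game_date being stored as 'YYYY-MM-DD' text, so that MIN() on the strings agrees with chronological order. The ORDER BY is an addition for readable output and is not required.

```sql
-- Sketch: keep each row whose (trio, date) tuple equals the
-- (trio, earliest date) tuple produced by the grouped subquery.
SELECT g.*
FROM game AS g
WHERE (g.player_1, g.player_2, g.bettor, g.game_date) IN (
    SELECT player_1, player_2, bettor, MIN(game_date)
    FROM game
    GROUP BY player_1, player_2, bettor
)
ORDER BY g.game_date;  -- optional, for readable output only
```

Against the sample data this should return seven rows, one per distinct trio, with the Sarah/Jim/Kevin row dated 2019-02-10. Like the join version, it returns more than one row for a trio if that trio has two games on the same earliest date.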

If you are running MySQL 8+, then the ROW_NUMBER() window function provides another option:

WITH cte AS (
    -- number each trio's games from earliest (rn = 1) to latest
    SELECT *, ROW_NUMBER() OVER (PARTITION BY player_1, player_2,
                                              bettor
                                 ORDER BY game_date) AS rn
    FROM game
)

SELECT *
FROM cte
WHERE rn = 1;
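
A possible refinement of the ROW_NUMBER() query, again only a sketch for MySQL 8+: list the columns explicitly so the helper column rn is not returned, and add tie-breakers so the choice is deterministic if a trio ever has two games on the same date. Using venue and referee as tie-breakers is an assumption made for illustration; the sample data contains no such ties.

```sql
WITH ranked AS (
    SELECT g.*,
           ROW_NUMBER() OVER (
               PARTITION BY player_1, player_2, bettor
               -- venue, referee: assumed tie-breakers, not from the original answer
               ORDER BY game_date, venue, referee
           ) AS rn
    FROM game AS g
)
-- explicit column list, so the helper column rn is left out of the result
SELECT game_date, player_1, player_2, bettor,
       p1_score, p2_score, result, venue, referee
FROM ranked
WHERE rn = 1
ORDER BY game_date;
```

Unlike the join version, this always returns exactly one row per trio, because ROW_NUMBER() assigns rn = 1 to a single row in each partition.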
User contributions licensed under: CC BY-SA