I’ve got a music player system which is supposed to play each song in my database, mostly randomly, before repeating any songs. My problem is, there are some very old songs in the database that are being skipped.
My query is set up to pull the 50 oldest songs in the database but with no duplicate artists.
Here is my query:
SELECT * FROM spotify_master WHERE days IS NULL AND id NOT LIKE 'random_%' AND type = ? AND autoplay = 1 GROUP BY Substring_index(artist, ',', 1) ORDER BY last_played ASC LIMIT 50
If I remove the GROUP BY
part, the query actually returns the oldest songs in the database, but it includes duplicate artists.
I’m not sure why this is straight up causing songs to not show up at all, but the oldest song in my database has a timestamp of 2019-10-30 10:55:28
, while this query is selecting songs with a timestamp of only 2 weeks ago or so.
I currently have around 550 songs that haven’t played in at 3-6 months, which is 1/4th of all the music it can choose from.
How can I get this query to select the oldest songs in the database with no duplicate artists?
Here is an image of the results of my query as-is (but changed to LIMIT 5)
Here is an image of the exact same query, except without the GROUP BY
clause. This returns the expected results with the oldest timestamps, but it also has the potential to return multiple songs from the same artist, which is what I’m trying to avoid.
Advertisement
Answer
I must preface this by saying I am no MySQL expert.
Generally when you do a GROUP BY
you group by a list of columns. For the other columns an aggregate function needs to be applied. I assume MySQL is using any_value
. Thus your ORDER BY
will be applied on data that was aggregated rendering it meaningless.
After doing a bit of research, it seems that indeed the values are randomly choosen by MySQL and it is just bad practice. I strongly advise you not to write query like these.
To disallow this, you can say at runtime:
SET sql_mode := CONCAT('ONLY_FULL_GROUP_BY,',@@sql_mode);
I think this should do the trick for what you want to achieve:
SELECT DISTINCT FIRST_VALUE(id) OVER (PARTITION BY Substring_index(artist, ',', 1) ORDER BY last_played ASC ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING ) as id, FIRST_VALUE(name) OVER (PARTITION BY Substring_index(artist, ',', 1) ORDER BY last_played ASC ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING ) as name, FIRST_VALUE(artist) OVER (PARTITION BY Substring_index(artist, ',', 1) ORDER BY last_played ASC ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING ) as artist, FIRST_VALUE(last_played) OVER (PARTITION BY Substring_index(artist, ',', 1) ORDER BY last_played ASC ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING ) as last_played FROM spotify_master WHERE days IS NULL AND id NOT LIKE 'random_%' AND type = 'english' AND autoplay = 1 LIMIT 50
There is more work involved in making it work prior to MySQL 8 as you will need to partition by hand :
SELECT * FROM (SELECT ( CASE Substring_index(artist, ',', 1) WHEN @curType THEN @curRow := @curRow + 1 ELSE @curRow := 1 AND @curType := Substring_index(artist, ',', 1) END ) + 1 AS rank, id, name, artist, Substring_index(artist, ',', 1), days, type, autoplay FROM spotify_master , (SELECT @curRow := 0, @curType := '') r ORDER BY Substring_index(artist, ',', 1), last_played ASC) ManualRankingDueToLackOfPartitioning WHERE rank=1 LIMIT 50
What I do is basically parition by Substring_index(artist, ',', 1)
and order by last_played
, in this way showing the oldest played song for each artists and picking 50 artists.
If you want to prioritise the oldest last_played songs you just need to add an ORDER BY
SELECT * FROM (SELECT ( CASE Substring_index(artist, ',', 1) WHEN @curType THEN @curRow := @curRow + 1 ELSE @curRow := 1 AND @curType := Substring_index(artist, ',', 1) END ) + 1 AS rank, id, name, artist, Substring_index(artist, ',', 1), days, type, autoplay, last_played FROM spotify_master , (SELECT @curRow := 0, @curType := '') r ORDER BY Substring_index(artist, ',', 1), last_played ASC) ManualRankingDueToLackOfPartitioning WHERE rank=1 ORDER BY last_played ASC LIMIT 50