Skip to content
Advertisement

Unexpected results from query pulling oldest 50 songs in database with unique artists

I’ve got a music player system which is supposed to play each song in my database, mostly randomly, before repeating any songs. My problem is, there are some very old songs in the database that are being skipped.

My query is set up to pull the 50 oldest songs in the database but with no duplicate artists.

Here is my query:

SELECT * 
FROM   spotify_master 
WHERE  days IS NULL 
       AND id NOT LIKE 'random_%' 
       AND type = ? 
       AND autoplay = 1 
GROUP  BY Substring_index(artist, ',', 1) 
ORDER  BY last_played ASC 
LIMIT  50

If I remove the GROUP BY part, the query actually returns the oldest songs in the database, but it includes duplicate artists.

I’m not sure why this is straight up causing songs to not show up at all, but the oldest song in my database has a timestamp of 2019-10-30 10:55:28, while this query is selecting songs with a timestamp of only 2 weeks ago or so.

I currently have around 550 songs that haven’t played in at 3-6 months, which is 1/4th of all the music it can choose from.

How can I get this query to select the oldest songs in the database with no duplicate artists?

Here is an image of the results of my query as-is (but changed to LIMIT 5) enter image description here

Here is an image of the exact same query, except without the GROUP BY clause. This returns the expected results with the oldest timestamps, but it also has the potential to return multiple songs from the same artist, which is what I’m trying to avoid. enter image description here

Advertisement

Answer

I must preface this by saying I am no MySQL expert.

Generally when you do a GROUP BY you group by a list of columns. For the other columns an aggregate function needs to be applied. I assume MySQL is using any_value. Thus your ORDER BY will be applied on data that was aggregated rendering it meaningless.

After doing a bit of research, it seems that indeed the values are randomly choosen by MySQL and it is just bad practice. I strongly advise you not to write query like these.

To disallow this, you can say at runtime:

SET sql_mode := CONCAT('ONLY_FULL_GROUP_BY,',@@sql_mode);

I think this should do the trick for what you want to achieve:

SELECT DISTINCT 
       FIRST_VALUE(id) OVER (PARTITION BY Substring_index(artist, ',', 1) ORDER BY last_played ASC ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING ) as id,
       FIRST_VALUE(name) OVER (PARTITION BY Substring_index(artist, ',', 1) ORDER BY last_played ASC ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING ) as name,
       FIRST_VALUE(artist) OVER (PARTITION BY Substring_index(artist, ',', 1) ORDER BY last_played ASC ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING ) as artist,
       FIRST_VALUE(last_played) OVER (PARTITION BY Substring_index(artist, ',', 1) ORDER BY last_played ASC ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING ) as last_played
FROM   spotify_master 
WHERE  days IS NULL 
       AND id NOT LIKE 'random_%' 
       AND type = 'english'
       AND autoplay = 1 
LIMIT  50

There is more work involved in making it work prior to MySQL 8 as you will need to partition by hand :

SELECT * FROM 
(SELECT    ( 
              CASE Substring_index(artist, ',', 1)
              WHEN @curType 
              THEN @curRow := @curRow + 1 
              ELSE @curRow := 1 AND @curType := Substring_index(artist, ',', 1) END
            ) + 1 AS rank,
            id,
            name, 
            artist,
      Substring_index(artist, ',', 1),
      days,
      type,
      autoplay
  FROM      spotify_master ,
            (SELECT @curRow := 0, @curType := '') r
  ORDER BY  Substring_index(artist, ',', 1), last_played ASC) ManualRankingDueToLackOfPartitioning 
WHERE rank=1 
LIMIT 50 

What I do is basically parition by Substring_index(artist, ',', 1) and order by last_played, in this way showing the oldest played song for each artists and picking 50 artists.

If you want to prioritise the oldest last_played songs you just need to add an ORDER BY

  SELECT * FROM 
    (SELECT    ( 
                  CASE Substring_index(artist, ',', 1)
                  WHEN @curType 
                  THEN @curRow := @curRow + 1 
                  ELSE @curRow := 1 AND @curType := Substring_index(artist, ',', 1) END
                ) + 1 AS rank,
                id,
                name, 
                artist,
          Substring_index(artist, ',', 1),
          days,
          type,
          autoplay,
          last_played
      FROM      spotify_master ,
                (SELECT @curRow := 0, @curType := '') r
      ORDER BY  Substring_index(artist, ',', 1), last_played ASC) ManualRankingDueToLackOfPartitioning 
    WHERE rank=1 
    ORDER BY last_played ASC
    LIMIT 50 
User contributions licensed under: CC BY-SA
4 People found this is helpful
Advertisement