I have a database with a schema similar to:
CREATE TABLE IF NOT EXISTS song_artist ( song INTEGER, artist INTEGER, FOREIGN KEY("song") REFERENCES "Songs"("song_id") ON DELETE CASCADE, FOREIGN KEY("artist") REFERENCES "Artists"("artist_id") ON DELETE CASCADE ); CREATE TABLE IF NOT EXISTS Artists ( "artist_id" INTEGER PRIMARY KEY, "artist_name" TEXT ); CREATE TABLE IF NOT EXISTS Songs ( "song_id" INTEGER PRIMARY KEY, "song_name" TEXT );
Say they have the following data in them:
INSERT INTO Artists (artist_name) VALUES ("Bob"); INSERT INTO Artists (artist_name) VALUES ("Shawn"); INSERT INTO Songs (song_name) VALUES ("one song"); INSERT INTO Songs (song_name) VALUES ("another song"); INSERT INTO song_artist (artist, song) VALUES (1, 1); INSERT INTO song_artist (artist, song) VALUES (2, 1); INSERT INTO song_artist (artist, song) VALUES (2, 2);
Artists
:
artist_id | artist_name |
---|---|
1 | Bob |
2 | Shawn |
Songs
:
song_id | song_name |
---|---|
1 | one song |
2 | another song |
song_artist
:
song | artist |
---|---|
1 | 1 |
1 | 2 |
2 | 2 |
Here, both Bob
and Shawn
are involved in one song
, and only Shawn
is involved in another song
. Say I wanted to get the names of the songs where Shawn
is involved, but not Bob
. So, in this case it would be only another song
. However, as to be expected, this query doesn’t work as it gives me both the songs:
SELECT song_name FROM Songs JOIN Artists ON Artists.artist_id = song_artist.artist JOIN song_artist ON song_artist.song = Songs.song_id WHERE Artists.artist_name = "Shawn" AND Artists.artist_name != "Bob"
song_name | |
---|---|
1 | one song |
2 | another song |
This would be the result of the join table song_artist
having multiple records for the same song, so even if a record with Bob
‘s ID is excluded, there’s still another record without Bob
‘s ID that gets included in the final result.
Here’s how I’ve found a workaround to this:
SELECT song_name FROM Songs JOIN Artists ON Artists.artist_id = song_artist.artist JOIN song_artist ON song_artist.song = Songs.song_id WHERE Artists.artist_name = "Shawn" AND Songs.song_id NOT IN (SELECT song_id FROM Songs JOIN Artists ON Artists.artist_id = song_artist.artist JOIN song_artist ON song_artist.song = Songs.song_id WHERE Artists.artist_name == "Bob")
song_name | |
---|---|
1 | another song |
Now, while this works as expected, as one can imagine, this becomes a long winded query once you have a lot more than a single join table. I also assume this is a very un-optimized query. My question is, what is a better way to do this kind of query?
Advertisement
Answer
First, add a WHERE
clause in your query which filters out any other artists than Shawn and Bob.
Then use conditional aggregation and in the HAVING
clause add the condition that Bob does not participate in the song (which leaves only Shawn):
SELECT s.* FROM Songs s JOIN song_artist sa ON sa.song = s.song_id JOIN Artists a ON a.artist_id = sa.artist WHERE a.artist_name IN ('Shawn', 'Bob') GROUP BY s.song_id HAVING SUM(a.artist_name = 'Bob') = 0;
See the demo.