SQL WHERE IN () Performance Optimization

Question

I checked several questions for a duplicate but I couldn't find one. I am dealing with three tables, the first "articles", the second "tags", and the third "article_tags" ...

Accepted Answer

You can get everything you need in a single query:

    SELECT TOP (5) a.ID
    FROM article AS a
    WHERE a.publish_flag = 1
    AND a.publish_date < DATEADD(mi, DATEDIFF(mi, SYSDATETIME(), GETUTCDATE()), SYSDATETIME())
    AND a.ID <> @ID
    AND EXISTS
        (   SELECT 1
            FROM article_tags AS at
            WHERE at.ArticleID = a.ID
            AND EXISTS
                (   SELECT 1
                    FROM article_tags AS at2
                    WHERE at2.ArticleID = @ID
                    AND at2.TagID = at.TagID
                )
        )
    ORDER BY a.publish_date DESC;

I have assumed that you were originally using TOP 4 for tags as an arbitrary limit for performance reasons, as there was no sort, so I have omitted it.

I have also changed your predicate from:

    SYSDATETIME() > DATEADD(mi, DATEDIFF(mi, GETUTCDATE(), SYSDATETIME()), a.publish_date)

to

    a.publish_date < DATEADD(mi, DATEDIFF(mi, SYSDATETIME(), GETUTCDATE()), SYSDATETIME())

The meaning is the same. Moving the offset to the other side of the comparison negates it, hence the swapped DATEDIFF arguments. However, by calling DATEADD/DATEDIFF only on the run time constants SYSDATETIME() and GETUTCDATE(), the calculation is done once, rather than once for every a.publish_date, and because the column is no longer wrapped in a function, any index on publish_date is now usable.

The other change I have made is to use EXISTS rather than JOIN to link articles to tags. This avoids duplicates, which a JOIN would produce for any article that shares more than one tag with @ID. It would be equally trivial to remove those duplicates using GROUP BY, e.g.

    SELECT TOP (5) a.ID
    FROM article AS a
        INNER JOIN article_tags AS at
            ON at.ArticleID = a.ID
    WHERE a.publish_flag = 1
    AND a.publish_date < DATEADD(mi, DATEDIFF(mi, SYSDATETIME(), GETUTCDATE()), SYSDATETIME())
    AND a.ID <> @ID
    AND EXISTS
        (   SELECT 1
            FROM article_tags AS at2
            WHERE at2.ArticleID = @ID
            AND at2.TagID = at.TagID
        )
    GROUP BY a.ID, a.publish_date
    ORDER BY a.publish_date DESC;

A few side notes that don't directly relate to the above answer, but are still worth mentioning:

The implicit join syntax you are using was replaced 28 years ago by ANSI 92 explicit join syntax. There are plenty of good reasons to switch to the "new" syntax, so I would advise you do.

Parameterised queries are about more than just SQL injection attacks (the benefits include, but are not limited to, type safety and query plan caching), so just because your input isn't coming from a user doesn't mean you shouldn't use parameterised queries.

I would strongly advise against re-using your SqlClient objects (SqlConnection, SqlCommand). Create a new object for each use, and dispose of it correctly when done.
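
To make the duplicate behaviour and the last two side notes concrete, here is a minimal runnable sketch. It is not the SQL Server code itself: it assumes Python's built-in sqlite3 module as a stand-in, so LIMIT replaces TOP, a :now parameter replaces the DATEADD expression, and the sample rows, the alias names and the helper names (build_demo_db, query_ids) are made up for illustration. It shows a parameterised query, a fresh connection that is closed after each use, and the difference between the EXISTS form and a plain JOIN without GROUP BY.

```python
import os
import sqlite3
import tempfile
from contextlib import closing

SCHEMA = """
CREATE TABLE article (ID INTEGER PRIMARY KEY, publish_flag INTEGER, publish_date TEXT);
CREATE TABLE article_tags (ArticleID INTEGER, TagID INTEGER);
"""

# EXISTS form: each related article appears at most once.
EXISTS_SQL = """
SELECT a.ID
FROM article AS a
WHERE a.publish_flag = 1
  AND a.publish_date < :now
  AND a.ID <> :id
  AND EXISTS (SELECT 1
              FROM article_tags AS t
              WHERE t.ArticleID = a.ID
                AND EXISTS (SELECT 1
                            FROM article_tags AS t2
                            WHERE t2.ArticleID = :id
                              AND t2.TagID = t.TagID))
ORDER BY a.publish_date DESC
LIMIT 5
"""

# Plain JOIN form without GROUP BY: one row per shared tag.
JOIN_SQL = """
SELECT a.ID
FROM article AS a
    INNER JOIN article_tags AS t ON t.ArticleID = a.ID
    INNER JOIN article_tags AS t2 ON t2.TagID = t.TagID AND t2.ArticleID = :id
WHERE a.publish_flag = 1
  AND a.publish_date < :now
  AND a.ID <> :id
ORDER BY a.publish_date DESC
"""


def build_demo_db(db_path):
    """Create the two tables and load a few made-up rows."""
    with closing(sqlite3.connect(db_path)) as conn:
        conn.executescript(SCHEMA)
        conn.executemany(
            "INSERT INTO article VALUES (?, ?, ?)",
            [
                (1, 1, "2020-01-01"),  # the article we start from
                (2, 1, "2020-01-02"),  # shares two tags with article 1
                (3, 1, "2020-01-03"),  # shares one tag
                (4, 0, "2020-01-04"),  # unpublished
                (5, 1, "2020-01-05"),  # no shared tag
                (6, 1, "2999-01-01"),  # publish date in the future
            ],
        )
        conn.executemany(
            "INSERT INTO article_tags VALUES (?, ?)",
            [(1, 10), (1, 20), (2, 10), (2, 20), (3, 20), (4, 10), (5, 30), (6, 10)],
        )
        conn.commit()


def query_ids(db_path, sql, article_id, now):
    """Open a new connection for this call, bind parameters, close when done."""
    with closing(sqlite3.connect(db_path)) as conn:
        rows = conn.execute(sql, {"id": article_id, "now": now}).fetchall()
    return [row[0] for row in rows]


with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.db")
    build_demo_db(demo_path)
    related = query_ids(demo_path, EXISTS_SQL, article_id=1, now="2020-06-01")
    joined = query_ids(demo_path, JOIN_SQL, article_id=1, now="2020-06-01")

print(related)  # [3, 2]
print(joined)   # [3, 2, 2]
```

Article 2 shares two tags with article 1, so the plain JOIN returns it twice, while the EXISTS form returns it once. In C# the equivalent of the closing(...) blocks would be using blocks around SqlConnection and SqlCommand, with the article ID passed as a command parameter rather than concatenated into the SQL string.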

Answer