I checked several questions for a duplicate but I couldn’t find one. I am dealing with three tables, the first “articles”, the second “tags”, and the third “article_tags” which contains two foreign keys “articleid” and “tagid”. The table “article_tags” relates articles sharing the same tag together.
My SQL query is as follows:
'Getting 4 tag ids from table article_tags MyCommand = New SqlCommand("SELECT TOP (4) at.tagid FROM articles a, article_tags at WHERE at.articleid=a.id AND at.articleid=@id", myconnection) MyCommand.Parameters.Add("@id", SqlDbType.Int).Value = intID Dim daQuery = New SqlDataAdapter(MyCommand) Dim dtArticleTags = New DataTable daQuery.Fill(dtArticleTags) Dim strTagIds As String = "" If dtArticleTags.Rows.Count > 0 Then 'Save all tags id's in a string "strTags" '--------------------------------------------- Dim i As Integer = 0 For Each myrow As DataRow In dtArticleTags.Rows 'Store the Tag Ids in a string strTagIds += myrow.Item("tagid").ToString If i <> dtArticleTags.Rows.Count - 1 Then strTagIds += "," End If i += 1 Next '--------------------------------------------- 'Getting 5 related articles sharing the same tags (Note: I know that strTagIds is not parametrized but this can never be inputted by a user) MyCommand = New SqlCommand("SELECT TOP(5) at.articleid FROM article_tags at, articles a WHERE a.id=at.articleid AND a.publish_flag=1 AND SYSDATETIME() > DATEADD(mi, DATEDIFF(mi, GETUTCDATE(), SYSDATETIME()), a.publish_date) AND at.tagid IN (" & strTagIds & ") AND at.articleid<>@id Group by at.articleid, a.publish_date ORDER BY a.publish_date DESC", myconnection) MyCommand.Parameters.Add("@id", SqlDbType.Int).Value = intID daRelated = New SqlDataAdapter(MyCommand) daRelated.Fill(dsArticle, "related") End If
I feel that the above query is taking time to load especially with the “article_tags” table growing bigger. I am using SQL Express Edition and the tables are indexed and I would like to know if this can be done in a better way to enhance performance.
Attached is the Execution Plan:
Running the below query as suggested by @SuperPoney, returned the same results, and below is the execution plan:
WITH top_art AS ( SELECT TOP (4) at.tagid FROM articles a, article_tags at WHERE at.articleid=a.id AND at.articleid=@id ) SELECT TOP(5) at.articleid FROM article_tags at, articles a, top_art WHERE a.id=at.articleid AND a.publish_flag=1 AND SYSDATETIME() > DATEADD(mi, DATEDIFF(mi, GETUTCDATE(), SYSDATETIME()), a.publish_date) AND at.tagid=top_art.tag_id AND at.articleid<>@id Group by at.articleid, a.publish_date ORDER BY a.publish_date DESC
Advertisement
Answer
You can get everything you need in a single query:
SELECT TOP (5) a.ID FROM article AS a WHERE a.publish_flag = 1 AND a.publish_date < DATEADD(mi, DATEDIFF(mi, GETUTCDATE(), SYSDATETIME()), SYSDATETIME()) AND a.Id <> @ID AND EXISTS ( SELECT 1 FROM article_tags AS at WHERE at.ArticleID = a.ID AND EXISTS ( SELECT 1 FROM article_tags AS at2 WHERE at2.ArticleID = @ID AND at2.TagID = at.TagID ) ) ORDER BY a.publish_date DESC;
I have assumed that you were originally using TOP 4
for tags as an arbitrary limit for performance reasons, as there was no sort. So have ommitted this. I have also changed your predicate from:
SYSDATETIME() > DATEADD(mi, DATEDIFF(mi, GETUTCDATE(), SYSDATETIME()), a.publish_date)
to
a.publish_date < DATEADD(mi, DATEDIFF(mi, GETUTCDATE(), SYSDATETIME()), SYSDATETIME())
The meaning is the same, however by calling the DATEADD
/DATEDIFF
functions on the run time constants SYSDATETIME()
and UTCDATETIME()
it means this calculation is only done once, rather than once for every a.publish_date
meaning any index on publish_date
is now usable.
The other change I have made is to use EXISTS
rather than JOIN
to link articles to tags. This will avoid duplicates, however it would be equally trivial to remove duplicates using GROUP BY
e.g.
SELECT TOP (5) a.ID FROM article AS a INNER JOIN article_tags AS at ON at.ArticleID = a.ID WHERE a.publish_flag = 1 AND a.publish_date < DATEADD(mi, DATEDIFF(mi, GETUTCDATE(), SYSDATETIME()), SYSDATETIME()) AND a.Id <> @ID AND EXISTS ( SELECT 1 FROM article_tags AS at2 WHERE at2.ArticleID = @ID AND at2.TagID = at.TagID ) GROUP BY a.ID, a.publish_date ORDER BY a.publish_date DESC;
A few side notes as well that don’t directly relate to the above answer, but are still worth mentioning.
- The Implicit join syntax you are using was replaced 28 years ago by ANSI 92 explicit join syntax. There are plenty of good reasons to switch to the “new” syntax, so I would advise you do.
- Parameterised queries are about more than just SQL Injection attacks (including but not limited to type safety and query plan caching), so just because your input isn’t coming from a user doesn’t mean you shouldn’t use parametrized queries.
- I would strongly advise against re-using your SqlClient objects (SqlConnection, SqlCommand), create a new object for each use, and dispose of it correctly when done.