I have the query below to fetch the books of a random author, ordered by the books' last modified date. Because I am new to SQL, something tells me this query can be improved and simplified. I would appreciate the help of the SQL experts; if this is already simplified, then I will consider myself an SQL expert :).
```sql
SELECT b.id, bc.author
FROM book_authors bc INNER JOIN
     books b
     ON bc.book = b.id
WHERE bc.author = (SELECT author FROM book_authors ORDER BY random() LIMIT 1) AND
      b.status = 'GOOD'
GROUP BY b.id, bc.author
ORDER BY MAX(b.modified_date) DESC
LIMIT 10 OFFSET 0  -- for pagination purposes
```
Table structures
book_authors

| book | author |
| --- | --- |
| 1 | 1 |
| 2 | 3 |
| 3 | 1 |

books

| id | status | modified_date |
| --- | --- | --- |
| 1 | GOOD | 01-01-2010 |
| 2 | GOOD | 02-01-2010 |
| 3 | GOOD | 03-01-2010 |

authors

| id |
| --- |
| 1 |
| 2 |
| 3 |
Looking for an output like:

| authorId | bookId |
| --- | --- |
| 1 (a random author) | 3 |
| 1 (same random author) | 1 |
Answer
Your query is not taking a random sample of authors. It is taking a random sample from `book_authors`, which is different. In particular, authors with more books are more likely to be chosen. Consider:

| author_id | book_id |
| --- | --- |
| 1 | 1 |
| 1 | 2 |
| 1 | 3 |
| 2 | 4 |

A "random" sample from this table is going to choose author `1` three times more often than author `2`. Such biased results are not how I interpret "fetch books of a random author".
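If you want to see this bias concretely, you can compute each author's share of `book_authors` rows; because `ORDER BY random() LIMIT 1` picks each row with equal probability, that share is exactly the probability of that author being chosen. A quick sketch against the hypothetical four-row data above:

```sql
-- Each book_authors row is equally likely under ORDER BY random() LIMIT 1,
-- so an author's pick probability is their share of the rows.
-- For the four-row example above: author 1 -> 0.75, author 2 -> 0.25.
SELECT author,
       COUNT(*)::numeric / SUM(COUNT(*)) OVER () AS pick_probability
FROM book_authors
GROUP BY author;
```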
So, you should be doing the sampling from the `authors` table. The structure of your query is reasonable (see below), although no aggregation is needed:

```sql
SELECT b.id, bc.author
FROM book_authors bc INNER JOIN
     books b
     ON bc.book = b.id
WHERE bc.author = (SELECT a.id FROM authors a ORDER BY random() LIMIT 1) AND
      b.status = 'GOOD'
ORDER BY b.modified_date DESC;
```
Now, whether this actually does what you want is interesting. As written, Postgres has two ways of running this:

- Running the subquery once for the whole query, i.e. getting one random author and using it throughout.
- Running the subquery once per comparison in the `WHERE` clause.
The subquery would usually return different results each time it is run (technically, the subquery is "nondeterministic"). So, the second method would use a different random author for each comparison — and that is not the result you intend. In practice, the Postgres optimizer (I think) ignores the indeterminate nature of the subquery and executes it only once.
To fix this potential problem, it is better to move the logic to the `FROM` clause; then it is evaluated only once:
```sql
SELECT b.id, bc.author
FROM (SELECT a.id
      FROM authors a
      ORDER BY random()
      LIMIT 1
     ) a INNER JOIN
     book_authors bc
     ON bc.author = a.id INNER JOIN
     books b
     ON bc.book = b.id
WHERE b.status = 'GOOD'
ORDER BY b.modified_date DESC;
```
Note: The use of `ORDER BY random() LIMIT 1` to fetch one random row works. However, for anything but small tables it is quite expensive performance-wise, because it sorts the entire table just to keep one row. I wouldn't get in the habit of using it.
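If the `authors` table is large and performance matters, one cheaper pattern is `TABLESAMPLE` (available in Postgres 9.5+), which reads only a sample of the table's pages instead of sorting the whole table. This is a sketch, not a drop-in replacement: `SYSTEM` sampling is page-based, so it is only approximately uniform, and a small sample percentage can occasionally return zero rows.

```sql
-- Pick a roughly random author without sorting the whole table.
-- TABLESAMPLE SYSTEM (1) reads about 1% of the table's pages;
-- ORDER BY random() then only has to sort that small slice.
SELECT a.id
FROM authors a TABLESAMPLE SYSTEM (1)
ORDER BY random()
LIMIT 1;
```

If a truly uniform pick is required, `TABLESAMPLE BERNOULLI` samples individual rows rather than pages, at somewhat higher cost.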