
Is this a perfect SQL query to fetch random items

I have the query below to fetch the books of a random author, ordered by the books' last modified date. Because I am new to SQL, something tells me this query can be improved and simplified. I would appreciate the help of the SQL experts; if this is already simplified, then I will consider myself an SQL expert :).
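
For illustration, a query of roughly this shape (table and column names are assumptions) matches what the answer below describes: the random author is picked from book_authors inside the WHERE clause, and the outer query aggregates.

    SELECT b.*
    FROM books b
    JOIN book_authors ba ON ba.book_id = b.book_id
    WHERE ba.author_id = (SELECT author_id
                          FROM book_authors       -- sampling the join table, not authors
                          ORDER BY random()
                          LIMIT 1)
    GROUP BY b.book_id                            -- aggregation that turns out to be unnecessary
    ORDER BY b.last_modified_date DESC;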

Table structures
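
Assumed structures, for illustration only, matching the table and column names used in the answer:

    CREATE TABLE authors (
        author_id  serial PRIMARY KEY,
        name       text NOT NULL
    );

    CREATE TABLE books (
        book_id            serial PRIMARY KEY,
        title              text NOT NULL,
        last_modified_date timestamptz NOT NULL DEFAULT now()
    );

    CREATE TABLE book_authors (
        book_id   int NOT NULL REFERENCES books (book_id),
        author_id int NOT NULL REFERENCES authors (author_id),
        PRIMARY KEY (book_id, author_id)
    );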

I am looking for an output listing one random author's books, ordered by their last modified date.


Answer

Your query is not taking a random sample of authors. It is taking a random sample from book_authors, which is different. In particular, authors with more books are more likely to be chosen. Consider:
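
For example, suppose book_authors contains these rows (hypothetical values):

    book_id | author_id
    --------+-----------
         10 |         1
         11 |         1
         12 |         1
         13 |         2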

A “random” sample from this table is going to choose author 1 three times more often than author 2. Such biased results are not how I interpret “fetch books of a random author”.

So, you should be doing the sampling from the authors table. The structure of your query is reasonable (see below), although no aggregation is needed:
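
For example, with the same assumed names as above, the random author now comes from authors and there is no aggregation:

    SELECT b.*
    FROM books b
    JOIN book_authors ba ON ba.book_id = b.book_id
    WHERE ba.author_id = (SELECT a.author_id
                          FROM authors a          -- sample authors, not book_authors
                          ORDER BY random()
                          LIMIT 1)
    ORDER BY b.last_modified_date DESC;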

Now, whether this actually does what you want is interesting. As written, Postgres has two ways of running this:

  • Running the subquery once for the query. That is, getting one random author and using it throughout.
  • Running the subquery once per comparison in the WHERE.

The subquery would usually return different results each time it is run (technically, the subquery is “nondeterministic”). So, the second method would use a different random author for each comparison, and that is not the result you intend. In practice, the Postgres optimizer (I think) ignores the nondeterministic nature of the subquery and executes it only once.

To fix this potential problem, it is better to move the logic to the FROM clause; then it is evaluated only once:
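
For example (same assumed names), the random author is picked in a derived table:

    SELECT b.*
    FROM (SELECT a.author_id
          FROM authors a
          ORDER BY random()
          LIMIT 1
         ) ra                                     -- one random author, evaluated once
    JOIN book_authors ba ON ba.author_id = ra.author_id
    JOIN books b ON b.book_id = ba.book_id
    ORDER BY b.last_modified_date DESC;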

Note: The use of ORDER BY random() LIMIT 1 to fetch one random row works. However, for anything but small tables it is quite expensive performance-wise. I wouldn’t get in the habit of using that.
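
One cheaper alternative on large tables (not mentioned in the answer, and only approximately uniform) is Postgres's TABLESAMPLE clause, which reads a random subset of table pages instead of sorting every row; it can occasionally return no rows at all:

    SELECT author_id
    FROM authors TABLESAMPLE SYSTEM (1)   -- read roughly 1% of the table's pages
    LIMIT 1;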

User contributions licensed under: CC BY-SA