Random ordering via order_by("?")

Using order_by('?') can be very inefficient if you have lots of rows in the table. Moving the randomness to the application layer will probably give significant a performance improvement.

Using order_by('?') can be very slow as to achieve random ordering the database has to:

  • Generate a random number for each row.
  • Scan the numbers for the smallest one.
  • Read the one row we're interested in.

Let that sink in: generating random numbers is slow and scanning is not quick. We're doing that for every row in the table just to read one row! If there are a lot of rows then expect a performance impact. In short, databases are not good at random. Applications are though. So consier splitting responsibilities between the database and the application:

  • At database level determine the number of rows via objects.count()
  • At application level get a random number between 0 and that count.
  • At database level select the row at that index.

This will mean two database reads are performed, but that will be significantly quicker than the database copying the table and ordering all the rows randomly.

If we spot this issue in your GitHub pull request we give this advice:

views.pyperformancemedium
+
from random import randint
+
def get_random_item():
-
    return MyModel.objects.order_by('?')[0]
+
    index = randint(0, MyModel.objects.count() -1)
+
    return model.objects.all()[index]

Using order_by('?') can be very inefficient if you have lots of rows in the table. Moving the randomness to the application layer will probably give significant a performance improvement.

Read more
Protect your pull requests from over 40 types of common Django technical debts with our GitHub code review bot.

Configuring this check

Django Doctor will run this check by default. No configuration is needed but the check can be turned on/off using check code inefficient-order-by-random in your pyproject.toml file.

Read more about configuring Django Doctor.