Skip to content
Advertisement

Django: Really slow RunPython migration on related models

I am trying to optimise the speed of a migration on a large table (250K objects). The goal is to add to each row a user field based on the user of an object related to this row:

I tried using an F expression but sadly django won’t allow relations in them. Note that I am mostly a beginner in SQL 🙂

## models.py
class Band(models.Model):
    user = models.ForeignKey(
        settings.AUTH_USER_MODEL,
        on_delete=models.CASCADE,
        null=True,
        default=None)
    ...

class Album(models.Model):
    band = models.ForeignKey(
        Band
        on_delete=models.CASCADE,
        null=True,
        default=None)

    user = models.ForeignKey(
        settings.AUTH_USER_MODEL,
        on_delete=models.CASCADE,
        null=True,
        default=None)
    ...

## 0035_album_add_user.py
def forwards_add_user(apps, schema_editor):
    Album = apps.get_model('band', 'Album')
    db_alias = schema_editor.connection.alias

    albums = Album.objects 
        .using(db_alias) 
        .filter(band__isnull=False) 
        .select_related('band', 'band__user')

    for each in albums:
        each.user = each.band.user

    Album.objects 
        .using(db_alias) 
        .bulk_update(albums, ['user'])

class Migration(migrations.Migration):
    dependencies = [ ... ]

    operations = [
        migrations.RunPython(forwards_add_user, reverse_add_user),
    ]

This migrations currently takes an hour on my local with 100% usage on the database for the whole duration. I have a better CPU thank my production database so this worries me a lot, downing the production database for an hour is not really an option.

I’m looking at two different solutions both of which I have no idea of how to accomplish : – Optimize more the python code to make it run faster – Throttle the database’s CPU consummation during the migration so it stays available. Both being the best case scenario 🙂

I’m running Python3.6.9 with Django2.2.9 and PostgreSQL10.6 on RDS in production an in a Docker in local.

Thanks !

Advertisement

Answer

Use subquery to update field, something similar to following as you cannot use F() function in update joining tables

Album.objects.update(
    user=Subquery(
        Band.objects.filter(
            id=OuterRef('band')
        ).values('user')[:1]
    )
)
User contributions licensed under: CC BY-SA
5 People found this is helpful
Advertisement