I am trying to optimise the speed of a migration on a large table (250K objects). The goal is to populate a new user field on each row, based on the user of an object related to that row:
I tried using an F() expression, but sadly Django won't allow joined field references in them. Note that I am mostly a beginner in SQL.
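For reference, the failing attempt looked roughly like this (a reconstruction, since the exact code is not in the question); update() rejects joined field references in F() expressions:

from django.db.models import F

# Raises FieldError: "Joined field references are not permitted in this query"
Album.objects.update(user=F('band__user'))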
## models.py
from django.conf import settings
from django.db import models


class Band(models.Model):
    user = models.ForeignKey(
        settings.AUTH_USER_MODEL,
        on_delete=models.CASCADE,
        null=True,
        default=None)


class Album(models.Model):
    band = models.ForeignKey(
        Band,
        on_delete=models.CASCADE,
        null=True,
        default=None)

    user = models.ForeignKey(
        settings.AUTH_USER_MODEL,
        on_delete=models.CASCADE,
        null=True,
        default=None)
## 0035_album_add_user.py
from django.db import migrations


def forwards_add_user(apps, schema_editor):
    Album = apps.get_model('band', 'Album')
    db_alias = schema_editor.connection.alias

    albums = (Album.objects
              .using(db_alias)
              .filter(band__isnull=False)
              .select_related('band', 'band__user'))

    for each in albums:
        each.user = each.band.user

    Album.objects.using(db_alias).bulk_update(albums, ['user'])


def reverse_add_user(apps, schema_editor):
    # Reverse step was not shown in the question; a no-op stub keeps the file runnable.
    pass


class Migration(migrations.Migration):
    dependencies = []

    operations = [
        migrations.RunPython(forwards_add_user, reverse_add_user),
    ]
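A related knob, given the availability concern below: on PostgreSQL, Django wraps each migration in a single transaction, so a long data migration holds its locks for the entire run. A migration can opt out via the atomic flag (a hedged suggestion, only safe if the migration can be re-run after a partial apply):

class Migration(migrations.Migration):
    # Opt out of the implicit wrapping transaction so batched updates
    # commit as they go instead of holding one long transaction.
    atomic = False

    dependencies = []

    operations = [
        migrations.RunPython(forwards_add_user, reverse_add_user),
    ]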
This migration currently takes an hour on my local machine, with the database at 100% CPU for the whole duration. I have a better CPU than my production database, so this worries me a lot; taking the production database down for an hour is not really an option.

I'm looking at two different solutions, neither of which I know how to accomplish:

- Optimize the Python code so it runs faster (see the sketch below).
- Throttle the database's CPU consumption during the migration so it stays available.

Both together would be the best-case scenario.
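For the first option, here is a minimal sketch of chunking the loop so rows stream through memory and each bulk_update() call stays bounded (the function name and chunk size are illustrative assumptions, not from the question):

def forwards_add_user_batched(apps, schema_editor):
    # Hypothetical batched variant of the migration above.
    Album = apps.get_model('band', 'Album')
    db_alias = schema_editor.connection.alias

    albums = (Album.objects
              .using(db_alias)
              .filter(band__isnull=False)
              .select_related('band')
              .iterator(chunk_size=2000))  # stream rows instead of loading all 250K

    batch = []
    for album in albums:
        album.user_id = album.band.user_id
        batch.append(album)
        if len(batch) >= 2000:
            Album.objects.using(db_alias).bulk_update(batch, ['user'])
            batch = []
    if batch:
        Album.objects.using(db_alias).bulk_update(batch, ['user'])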
I'm running Python 3.6.9 with Django 2.2.9, and PostgreSQL 10.6 on RDS in production and in Docker locally.

Thanks!
Answer
Use a subquery to update the field, something similar to the following, since you cannot use F() expressions that join tables inside update():
from django.db.models import OuterRef, Subquery

Album.objects.update(
    user=Subquery(
        Band.objects.filter(
            id=OuterRef('band')
        ).values('user')[:1]
    )
)
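Inside the data migration this would look roughly as below, using the historical models from apps.get_model as in the question (a sketch; OuterRef('band_id') targets the FK column directly). The whole thing executes as a single UPDATE statement inside PostgreSQL:

from django.db.models import OuterRef, Subquery


def forwards_add_user(apps, schema_editor):
    Album = apps.get_model('band', 'Album')
    Band = apps.get_model('band', 'Band')
    db_alias = schema_editor.connection.alias

    # One server-side UPDATE: for each album, copy the user id from its band.
    Album.objects.using(db_alias).filter(band__isnull=False).update(
        user=Subquery(
            Band.objects.using(db_alias)
            .filter(pk=OuterRef('band_id'))
            .values('user')[:1]
        )
    )

Because no rows are fetched into Python, this should be dramatically faster than updating 250K objects one by one from a loop.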