Performance and Scaling¶
Here we overview some of the main things to consider regarding performance
and scale. It’s useful to consider some of these recommendations before using
django-pghistory
because some changes can involve tricky migrations if
done after the fact.
Trigger Execution¶
Postgres triggers
are used to store events. When using a pghistory.Snapshot
tracker, for example,
this means a snapshot of your model is saved every time it is created or updated.
In other words, an INSERT
statement for the event table is ran alongside the
query that creates or updates data on the tracked model.
While this will have a performance impact when creating or updating models, keep in mind that triggers run in the database and do not require expensive round trips from the application. This can result in substantially better performance when compared to traditional history tracking solutions that are implemented in the application and do separate round trips.
When triggers execute, the following happens:
By default, context will be updated or inserted into the main
Context
model’s table.A new entry will be made in the event table.
The following sections discuss considerations to help the performance of both operations in the trigger.
Ignoring Context¶
If you don’t want to attach context to events, set settings.PGHISTORY_CONTEXT_FIELD
to
None
. All event models won’t include the pgh_context
field, and the associated
context operations won’t happen.
Note
You can still track context on a per-event-model basis with the context_field
argument to pghistory.track
.
Denormalizing Context¶
As discussed in the Denormalizing Context section, you can avoid doing an update or insert on the main context table and instead duplicate the context data on the event model. This not only reduces the overhead of maintaining an index to the context table from the event table, but it also reduces the contention on a shared context table among multiple event triggers.
Large amounts of events should also be taken into consideration too. As the context table grows, so will the indices and the associated time it takes to update the index. Denormalizing context reduces the overhead at the expense of more storage. It also makes it easier to partition your event tables.
Indices and Foreign Key Constraints¶
By default, all event tables use unconstrained foreign keys. Event tables also drop all indices except for foreign keys.
If you don’t plan to query or join your event models, you may not need to index the foreign keys. You could instead set the default foreign key behavior to:
PGHISTORY_FOREIGN_KEY_FIELD = pghistory.ForeignKey(db_index=False)
Keep in mind that this could negatively impact the performance of the admin integration. If you want to continue using core admin functionality, be sure to index the object field, otherwise the foreign key settings will override it:
PGHISTORY_OBJ_FIELD = pghistory.ObjForeignKey(db_index=True)
If you prefer that your foreign keys have referential integrity, do:
PGHISTORY_FOREIGN_KEY_FIELD = pghistory.ForeignKey(
db_constraint=True,
on_delete=pghistory.DEFAULT
)
Remember that there will be a performance hit for maintaining the foreign key constraint, and Django will also have to cascade delete more models.
The Events
Proxy Model¶
The pghistory.models.Events
proxy model uses a common table expression (CTE) across event tables
to query an aggregate view of data. Postgres 12 optimizes filters on CTEs, but you may experience
performance issues if trying to directly filter Events
on earlier versions of Postgres.
See Aggregating Events and Diffs for more information on how to use the special model manager methods to more efficiently filter events.