Tutorial¶
There are two main components of django-pghistory:
Event. Events are configured to track various historical events that happen in an application. django-pghistory has several utilities for automatically tracking events based on changes in the database. Events range from snapshots of all model changes, to specific events tied to particular database changes, to manually tracked events that cannot be expressed at the database level. Users have flexibility in how event models are structured.
Context. An application can track as many events across as many models as desired, which can result in multiple tables and events for even a single request. django-pghistory provides the ability to contextualize all of these events and group them under the same context. Along with this, the application can also add as much free-form metadata to the context of events as desired. django-pghistory comes with middleware that will automatically group events in a request under the same context and annotate additional information about the events (e.g. the URL and the authenticated user).
These two concepts, along with advanced usage examples, will be covered in more detail throughout this tutorial.
Tracking Snapshots of Models When Fields Change¶
We dive into django-pghistory’s event tracking by first showing how to configure it to track any changes to relevant models. After these examples, we’ll dive deeper into how to configure django-pghistory to automatically track custom events and also show examples of how to manually track events in an application.
The pghistory.track decorator is the primary interface for configuring event tracking. Using this decorator not only configures event tracking for a model, but it will also create another tracking model dynamically that tracks all changes.
For example, let’s say that we have a TestModel like so:
from django.db import models

class TestModel(models.Model):
    int_field = models.IntegerField()
    char_field = models.CharField(max_length=16)

Tracking model changes can be configured with:

import pghistory

@pghistory.track(
    pghistory.Snapshot('test_model.snapshot')
)
class TestModel(models.Model):
    ...
Here’s an overview of what’s going on above:
We’ve registered the test_model.snapshot event to be tracked. This is a pghistory.Snapshot event that will snapshot the model when it is created and anytime a field is updated.
Tracking happens automatically at the database level via Postgres triggers. Triggers are installed when migrations run.
The tracked changes are stored in an automatically generated tracking model. By default, the tracking model has a structure nearly identical to that of the model being tracked, along with some additional metadata inserted by django-pghistory. These models will appear when calling manage.py makemigrations, and other parameters to pghistory.track can be configured to limit which fields are tracked, among other things. Every event tracked in this table will have a label of test_model.snapshot.
For this particular scenario, the automatically-generated event model looks like this:
class TestModelEvent(pghistory.models.Event):
    pgh_obj = models.ForeignKey(TestModel, on_delete=models.DO_NOTHING, related_name='event')
    pgh_label = models.TextField()
    pgh_context = models.ForeignKey('pghistory.Context', null=True, on_delete=models.DO_NOTHING, related_name='+')
    pgh_created_at = models.DateTimeField(auto_now_add=True)
    id = models.IntegerField()
    int_field = models.IntegerField()
    char_field = models.CharField(max_length=16)
When a TestModel is inserted or updated, a TestModelEvent is created with the new values of the TestModel object. Events fire only when fields change, so no event will be stored if an empty update happens. This provides a complete history of all of the values of that particular model. One can query all updates to the model with test_model_instance.event.all().
The pgh_obj field is a foreign key that references the original object. This foreign key can be modified via the obj_fk parameter to pghistory.track. See pghistory.track for more information.
The pgh_context field is a foreign key that points to context about the historical event. The context object, along with tracking free-form metadata from the app, allows grouping of similar events. More on this later.
Since the event model is automatically tracking fields, it will also be migrated whenever the original model is changed. It is up to the user to write appropriate data migrations for these circumstances.
Note
You may have noticed the use of DO_NOTHING on the deletion of foreign keys. By default, all django-pghistory event models create foreign keys that are unconstrained, even for the foreign keys of the tracked model. This helps ensure the tracked values are accurate for the point in time at which they were tracked and that Django does not try to modify them during deletions. It is up to the user to handle referential integrity errors from tracking models as a result or to override the generated tracking models if referential integrity is important. More on this in a later section.
django-pghistory provides the ability to specify tracking for only a subset of fields, or even to have different event models for different field updates. For example, this will create two different event models: one for changes to int_field and one for changes to char_field:
@pghistory.track(pghistory.Snapshot('test_model.int_field_snapshot'), fields=['int_field'])
@pghistory.track(pghistory.Snapshot('test_model.char_field_snapshot'), fields=['char_field'])
class TestModel(models.Model):
    ...
In the above, two different tracking models would be created with the following structure:
class TestModelIntFieldEvent(pghistory.models.Event):
    pgh_obj = models.ForeignKey(TestModel, on_delete=models.DO_NOTHING, related_name='int_field_event')
    pgh_label = models.TextField()
    pgh_context = models.ForeignKey('pghistory.Context', null=True, on_delete=models.DO_NOTHING, related_name='+')
    pgh_created_at = models.DateTimeField(auto_now_add=True)
    int_field = models.IntegerField()

class TestModelCharFieldEvent(pghistory.models.Event):
    pgh_obj = models.ForeignKey(TestModel, on_delete=models.DO_NOTHING, related_name='char_field_event')
    pgh_label = models.TextField()
    pgh_context = models.ForeignKey('pghistory.Context', null=True, on_delete=models.DO_NOTHING, related_name='+')
    pgh_created_at = models.DateTimeField(auto_now_add=True)
    char_field = models.CharField(max_length=16)
The fields argument to pghistory.track can take any combination of fields that should be snapshot when any field in the group is changed.
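The behavior of field groups can be pictured with a small pure-Python simulation. This is only a sketch of the rule described above: django-pghistory performs the check inside Postgres triggers, and the helper name snapshot_if_changed and the dictionary rows are assumptions made for illustration.

```python
# Illustrative simulation only: django-pghistory performs this check inside
# Postgres triggers, not in Python.

def snapshot_if_changed(history, old_row, new_row, fields):
    """Append a snapshot of `fields` to `history` when any of them changed.

    `old_row` is None for an insert, which always produces a snapshot.
    Returns True when a snapshot was stored.
    """
    changed = old_row is None or any(old_row[f] != new_row[f] for f in fields)
    if changed:
        history.append({f: new_row[f] for f in fields})
    return changed


int_history, char_history = [], []

row_v1 = {"int_field": 1, "char_field": "a"}
row_v2 = {"int_field": 2, "char_field": "a"}  # only int_field changes

# An insert, an update to int_field, and then an empty update.
for old, new in [(None, row_v1), (row_v1, row_v2), (row_v2, row_v2)]:
    snapshot_if_changed(int_history, old, new, ["int_field"])
    snapshot_if_changed(char_history, old, new, ["char_field"])

print(int_history)   # [{'int_field': 1}, {'int_field': 2}]
print(char_history)  # [{'char_field': 'a'}]
```

The empty update at the end stores nothing in either history, and the update to int_field leaves the char_field history untouched.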
Tracking Specific Model Events¶
Oftentimes changes of specific fields to specific values directly correspond to events in an application. For example, the creation of a new user could mean that a new user has signed up. The change of a status field of a model might indicate a model has progressed to a new stage.
Similar to the pghistory.Snapshot event, django-pghistory comes with some utilities for automatically storing events based on conditional changes in the database. For example, let’s take our previous example of storing an event when a user is created:
@pghistory.track(
    pghistory.AfterInsert('user.create'),
)
class User(models.Model):
    username = models.CharField(max_length=64)
In the above, we’ve registered a pghistory.AfterInsert event. When an insert happens, an event will be created with the label of user.create.
The event model is generated in an identical way to the previous snapshot examples. By default, every value of the model will be snapshot alongside the event label. If it isn’t important to have all of this additional information alongside the event, one can override this behavior with the fields argument. For example, the following will only track the username field when a user.create event happens:
@pghistory.track(
    pghistory.AfterInsert('user.create'),
    fields=['username']
)
class User(models.Model):
    username = models.CharField(max_length=64)
    password = models.CharField(max_length=128)
django-pghistory comes with five utility classes for automatically creating events based on changes in rows:
pghistory.AfterInsert: For creating an event based on the fields after an insert. Values from the NEW row (from the Postgres trigger) will be stored.
pghistory.BeforeUpdate: For creating an event based on the fields before the update. Values from the OLD row will be stored.
pghistory.AfterUpdate: For creating an event based on the fields after an update. Values from the NEW row will be stored.
pghistory.BeforeDelete: For creating an event based on the fields before a delete. Values from the OLD row will be stored.
pghistory.AfterInsertOrUpdate: A helper to create an event after an insert or an update based on the rows after the operation. Values from the NEW row will be stored.
Similar to snapshots, these five event types directly map to Postgres triggers that are installed in the database, meaning that they all can be given a condition argument to specify when they should be fired. It is up to the application developer to understand when it makes sense to snapshot the OLD or the NEW row when using pghistory.BeforeUpdate or pghistory.AfterUpdate.
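To make the OLD versus NEW distinction concrete, here is a rough pure-Python simulation of the list above. It is not how the library works internally: the real events are Postgres triggers, and the fire helper and its Python condition predicate are hypothetical stand-ins for the trigger and its condition.

```python
# Illustrative simulation only: the real events are Postgres triggers.
# Which row version each event type stores, per the list above.
ROW_STORED = {
    "AfterInsert": "NEW",
    "BeforeUpdate": "OLD",
    "AfterUpdate": "NEW",
    "BeforeDelete": "OLD",
    "AfterInsertOrUpdate": "NEW",
}


def fire(event_type, old_row, new_row, condition=None):
    """Return the values an event would record, or None if it does not fire.

    `condition` is a hypothetical Python predicate standing in for the
    trigger condition that the library evaluates inside the database.
    """
    if condition is not None and not condition(old_row, new_row):
        return None
    return old_row if ROW_STORED[event_type] == "OLD" else new_row


def status_changed(old_row, new_row):
    return old_row["status"] != new_row["status"]


old = {"status": "pending"}
new = {"status": "shipped"}

before = fire("BeforeUpdate", old, new, condition=status_changed)
after = fire("AfterUpdate", old, new, condition=status_changed)
skipped = fire("AfterUpdate", new, new, condition=status_changed)

print(before)   # {'status': 'pending'}
print(after)    # {'status': 'shipped'}
print(skipped)  # None
```

For the same update, BeforeUpdate records the previous status while AfterUpdate records the new one, and neither fires when the condition is not met.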
Manually Tracking Events¶
Sometimes it is not possible to express an event based on a series of changes to a model. pghistory.create_event can be used for circumstances where the event needs to be manually instrumented. These events must still be declared with the model though, for example:
@pghistory.track(
    pghistory.Event('user.create'),
    fields=['username']
)
class User(models.Model):
    username = models.CharField(max_length=64)
    password = models.CharField(max_length=128)
In the above, we have defined the user.create event like before, but it will not automatically be created. We will have to instrument our code to create the event after a user is created:
user = User.objects.create(...)
pghistory.create_event(user, label='user.create')
Note
Manually-created events will still be linked with context if context tracking has been enabled. More on context tracking in a later section.
Creating a Custom Event Model¶
django-pghistory also provides the ability for the user to create a custom event model if one needs to override field declarations or add custom attributes to fields (e.g. an index). pghistory.get_event_model is used like so:
class TestModel(models.Model):
    ...

class MySnapshotModel(pghistory.get_event_model(
    TestModel,
    pghistory.Snapshot('test_model.snapshot'),
    fields=['int_field'],
)):
    pass
The call signature for pghistory.get_event_model is almost identical to pghistory.track with the exception that the tracked model is the first argument.
Grouping Changes and Metadata¶
By default, all django-pghistory event models come with a pgh_context foreign key that points to the pghistory.models.Context object associated with the event. The pghistory.models.Context model has a UUID id primary key field and a metadata JSON field.
In order to group changes under the same context, use pghistory.context:
with pghistory.context(key='val'):
    # Do changes here...
    ...
When using pghistory.context, all contained changes will point to the same Context object. The Context object in this example will also have {"key": "val"} in its metadata.
Context can be added anywhere in an application. For example, imagine one has a core system of their application that imports data and they want to add context about a file that was imported to any change that happens. This can be done by entering pghistory.context(additional='metadata') before the import happens and attaching additional metadata. The metadata will be accumulated into the shared Context object associated with all changes since the root pghistory.context call happened.
Normally an application will group changes together at the following levels of granularity:
Request. Changes for an entire POST request can be grouped together by using the middleware in pghistory.middleware.HistoryMiddleware. The default middleware attaches the authenticated user and the URL of the request to the context metadata. Note: be sure to add the middleware after django.contrib.auth in order to track the correct user.
Management Command. If users run a management command outside of a request, one can instrument manage.py with pghistory.context to apply the same context for all changes in the management command.
Task. When running periodic or asynchronous tasks, one can instrument the core task objects to contextualize all changes in the same task run.
Note
If one does not wrap database changes in pghistory.context, the associated events will have a pgh_context set to None. If one directly connects to the database and runs raw SQL, for example, the changes would still be tracked, but there would be missing context as to why the change happened.
django-pghistory context is meant to group together events and bring more clarity around why a particular event happened in an application. It is ultimately up to the application developers to decide what core sets of free-form metadata should be tracked alongside structured events.
Advanced Usage Examples¶
Tracking Third-Party Model Changes¶
django-pghistory can track changes to third-party models like Django’s User model by using a proxy model. Below we show how to track the default Django User model:
from django.contrib.auth.models import User
import pghistory

# Track the user model, excluding the password field
@pghistory.track(
    pghistory.Snapshot('user.snapshot'),
    exclude=['password'],
)
class UserProxy(User):
    class Meta:
        proxy = True
Note
Although it’s possible to track the models directly with pghistory.track(...)(model_name), doing so would create migrations in a third-party app. Using proxy models ensures that the migration files are created inside your project.
Tracking Many-To-Many Events¶
Events in many-to-many fields, such as user groups or permissions, can be configured by tracking the “through” model of the many-to-many relationship. When creating a many-to-many relationship, Django automatically generates a “through” model that is populated based on changes to the many-to-many field (and one can override this behavior with their own custom “through” model).
django-pghistory’s tracking functions can be used as a decorator on any custom “through” models, and a proxy model can be created for any default “through” models. Here we show an example of how to track group “add” and “remove” events for users:
from django.contrib.auth.models import User
import pghistory

# Track add and remove events to user groups
@pghistory.track(
    pghistory.AfterInsert('group.add'),
    pghistory.BeforeDelete('group.remove'),
    obj_fk=None,
)
class UserGroups(User.groups.through):
    class Meta:
        proxy = True
Two events are set up to track additions and deletions to the “through” model, which will track every time a user is added or removed from a group.
Note
Django does not allow foreign keys to auto-generated “through” models. Setting obj_fk=None will create an event model that does not contain a reference to the original “through” model.
Assuming one has created and executed migrations, the following code will show tracked changes to user group relationships:
# Note: this is pseudo-code
>>> user = User.objects.create_user('username')
>>> group = Group.objects.create(name='group')
>>> user.groups.add(group)
>>> user.groups.remove(group)
>>> print(my_app_models.UserGroupsEvent.objects.values('pgh_label', 'user', 'group'))
[
    {'user': user.id, 'group': group.id, 'pgh_label': 'group.add'},
    {'user': user.id, 'group': group.id, 'pgh_label': 'group.remove'},
]
Configuring Context Collection¶
When using pghistory.middleware.HistoryMiddleware, all GET, POST, PUT, PATCH, and DELETE requests will automatically be tracked with pghistory.context and events will reference the same context object in their associated models (i.e. the pgh_context foreign key). By default, the authenticated user is added as the user key and the URL is added as the url key.
Note
Packages like django-rest-framework add the user to the request object in the view layer. pghistory.middleware.HistoryMiddleware modifies the Django request object so that any changes to request.user in the view lifecycle will be captured.
Users, however, can enter pghistory.context at any point in their application code to attach more information to the context. For example, this will attach an is_import flag whenever an import of data is triggered:
import pghistory

@pghistory.context(is_import=True)
def import_data():
    ...
Note that is_import=True is attached to the current context. Events will be grouped together under the same context based on the highest level at which pghistory.context was started. So, for example, if an import is issued in a request and the middleware is configured, all changes in the request will have an is_import flag in their context. If the middleware was not enabled and this was the first time the application entered pghistory.context, only changes inside of this function would be grouped under the same context.
If one desires to only add context if a parent function has already entered pghistory.context (e.g. the middleware), one can call pghistory.context directly:
pghistory.context(my='context')
The context from the above example will not be added if a parent process has not entered pghistory.context.
It is up to the application developer to determine the levels of granularity at which history should be grouped together and how this will be used in their application. A general rule of thumb is to group changes by web requests. Things outside of web requests, such as Celery tasks or management commands, can be instrumented at their own levels individually.
Celery Tasks¶
One can override the Celery base task like so to group all task events under the same context with the same task name:
import celery
import pghistory

class Task(celery.Task):
    def __call__(self, *args, **kwargs):
        with pghistory.context(task=self.name):
            return super().__call__(*args, **kwargs)

# Override the celery task decorator for your application
app = create_celery_app('my-app')
task = app.task(base=Task)
Management Commands¶
To capture all events issued under a management command, one can instrument manage.py like so:
#!/usr/bin/env python
import contextlib
import sys

import pghistory

if __name__ == "__main__":
    if (
        len(sys.argv) > 1
        and not sys.argv[1].startswith('runserver')
    ):
        # Group history context under the same management command if
        # we aren't running a server.
        history_context = pghistory.context(command=sys.argv[1])
    else:
        # Otherwise, history will be grouped together every request
        # in the middleware
        history_context = contextlib.ExitStack()

    import configurations.management

    with history_context:
        configurations.management.execute_from_command_line(sys.argv)
In the above, we ignore tracking context for runserver commands. Otherwise every single change in a development session would be grouped under the same context.
Note
This example uses django-configurations for settings management. The default manage.py generated by Django will look different, but pghistory instrumentation will be the same.
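The decision made at the top of the manage.py example can be isolated into a small function that is easy to check on its own. This is only a sketch: the helper name context_command is an assumption and is not part of django-pghistory or Django.

```python
def context_command(argv):
    """Return the command name to record as context, or None to skip grouping.

    Mirrors the check in the manage.py example: group history under the
    command name unless no command was given or a server is being run.
    """
    if len(argv) > 1 and not argv[1].startswith("runserver"):
        return argv[1]
    return None


print(context_command(["manage.py", "migrate"]))    # migrate
print(context_command(["manage.py", "runserver"]))  # None
print(context_command(["manage.py"]))               # None
```

A manage.py could call such a helper and enter pghistory.context(command=...) only when it returns a name, falling back to an empty contextlib.ExitStack() otherwise.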