Aggregating Events and Diffs

Tracking events across models results in multiple event tables, creating challenges to query the the history for a single object or other objects that reference it. Similarly, obtaining diffs to see what actually changed can be cumbersome. The pghistory.models.Events proxy model makes both of these use cases easy.

Using the Events model

The pghistory.models.Events proxy model treats all event tables as a unified view that can be queried as a normal Django model. It comes with the following fields:

  • pgh_slug: The unique identifier across all event tables.

  • pgh_model: The event model label formatted as “app_label.ModelName”.

  • pgh_id: The primary key of the event.

  • pgh_created_at: When the event was created.

  • pgh_label: The event label.

  • pgh_data: The raw data of the event.

  • pgh_diff: The diff against the previous event of the same label.

  • pgh_context_id: The context UUID.

  • pgh_context: The context JSON associated with the event.

  • pgh_obj_model: The object model.

  • pgh_obj_id: The primary key of the object.

Let’s create a snapshot tracker for a User model and query the associated events:

@pghistory.track(pghistory.Snapshot())
class User(models.Model):
    username = models.CharField()
    name = models.CharField()

We create a user, modify the username, and view the data and diffs using pghistory.models.Events.

import pghistory.models

user = User.objects.create(username="hello", name="world")
user.username = "hi"
user.save()

print(pghistory.models.Events.objects.order_by("pgh_created_at").values())

The events look like:

[{
    "pgh_slug": "app.UserEvent:<event_id>"
    "pgh_model": "app.UserEvent",
    "pgh_id": "<event_id>",
    "pgh_created_at": datetime(2020, 6, 17, 12, 20, 10),
    "pgh_label": "snapshot",
    "pgh_data": {
        "username": "hello",
        "name": "world",
        "id": "<user_id>"
    },
    "pgh_diff": None,
    "pgh_context_id": None
    "pgh_context": None,
    "pgh_obj_model": "app.User",
    "pgh_obj_id": "<user_id>"
},
{
    "pgh_slug": "app.UserEvent:<event_id>",
    "pgh_model": "app.UserEvent",
    "pgh_id": "<event_id>",
    "pgh_created_at": datetime(2020, 6, 17, 12, 20, 20),
    "pgh_label": "snapshot",
    "pgh_data": {
        "username": "hi",
        "name": "world",
        "id": "<user_id>"
    },
    "pgh_diff": {
        "username": ["hello", "hi"]
    }
    "pgh_context_id": None
    "pgh_context": None,
    "pgh_obj_model": "app.User",
    "pgh_obj_id": "<user_id>"
}]

Above we see that pgh_data shows the raw data and pgh_diff shows the changes. The first diff is empty because there is no previous event of the same object and label.

How does it work?

Underneath the hood, pghistory.models.Events is a common table expression (CTE) that does a UNION ALL across event tables. It uses window functions to compute the diff.

When filtering the events directly using Events.objects.filter(), keep in mind that the aggregate CTE is filtered. In versions of Postgres before 12, CTEs are materialized before being queried, which can lead to poor performance when working with many large event tables. Postgres 12 changed how it treats CTEs and can optimize how CTEs are filtered.

Regardless of what version of Postgres you’re using, we recommend using the across(), tracks() and references() methods on the queryset for basic filtering. We cover these in the next sections.

Filtering event models using objects.across()

Use Event.objects.across("app.Model") to filter events by their associated event model. Event.objects.across() can be supplied with multiple model classes or model import strings. It is a much more efficient query than running Event.objects.filter(pgh_model="my.Model").

Filtering by tracked objects using objects.tracks()

Filter aggregate events by object using Events.objects.tracks(). This method takes one or multiple objects and limits the search space by the tracked object. Event models without a pgh_obj field will be ignored.

Filtering by referenced objects using objects.references()

The aforementioned objects.tracks() method only filters by the pgh_obj field. objects.references() will filter events that have any foreign key to the associate object(s). This allows one to query events related to a particular object, such as group or permission events for a particular user. Simply supply the primary object(s) to Events.objects.references(), and all referencing events will be returned.

Note that only events up to one level deep will be returned. Indirect relationships through multiple foreign keys are not returned.

For example, say that we have two models like so:

class Company(models.Model):
    name = models.TextField()

class Product(models.Model):
    company = models.ForeignKey(Company, on_delete=models.CASCADE)

If we make changes to the company or products, doing Events.objects.references(company_object) will return all events for the company and any products that reference it.

Note that like other methods, Events.objects.references takes a variable amount of arguments.

Querying Context as Structured Fields

Similar to individual event models, pghistory.models.Events can also have child classes that make use of the pghistory.ProxyField utilty.

For example, let’s say that we track the url attribute in our context metadata. Here we create a subclass that proxies this field:

class EventsProxy(pghistory.models.Events):
    url = pghistory.ProxyField("pgh_context__url", models.TextField())

    class Meta:
        proxy = True

The EventsProxy model from above now has access to the url field from the context as a normal field. For example, one can now do:

EventsProxy.objects.tracks(object).filter(url="https://some_url.com")

Unlike individual event models, only the pgh_context field can be proxied on the pghistory.models.Events model.

Note

If the corresponding attribute doesn’t exist in the JSON, None will be returned.

Using the MiddlewareEvents model

If you use the middleware to attach context on requests, you can make use of pghistory.models.MiddlewareEvents, which attaches a user and url field that correspond to the attributes captured by the middleware.