Introduction Link to heading
An audit trail
is a way capture changes being made to a database table. This articles describes a common way to implement audit trails in a relational database like Postgres using audit tables
. Audit tables record changes made to a table, including INSERT, UPDATE, and DELETE operations. This allows you to track who made what changes and when, which is crucial for understanding the history of the data. These trails can help in achieving security compliance, in error resolution that require tracing the history of data, and can service as historical records for generating reports. There’s also use cases where audit table records can be replayed (chronologically, based on audit timestamp fields), to reconstruct the state of a table from the beginging of time.
A table and its audit table Link to heading
Consider the following table that holds users information in a system.
CREATE TABLE users (
id SERIAL PRIMARY KEY, -- Table primary key; An autoincrementing number
first_name VARCHAR(255), -- User first name string
last_name VARCHAR(255), -- User last name string
address TEXT, -- Free-form text field for address
last_logged_in TIMESTAMP WITH TIME ZONE, -- The time the User last logged into a system
created_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP, -- The timestamp this record was created
updated_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP -- The timestamp this record was last updated
)
To keep track of all the changes made to the users
table, we’ll create an audit table. This table will have two versions of the fields that are in the table its auditing - one to keep track of the original value of the record, and the second to keep track of the updated values. The schema for that could be like this -
CREATE TABLE audit_users (
audit_id SERIAL PRIMARY KEY,
operation_type CHAR(1), -- 'C' for Create, 'U' for Update, 'D' for Delete
user_id INT, -- id value in the users table
old_first_name VARCHAR(255),
new_first_name VARCHAR(255),
old_last_name VARCHAR(255),
new_last_name VARCHAR(255),
old_address TEXT,
new_address TEXT,
old_last_logged_in TIMESTAMP WITH TIME ZONE,
new_last_logged_in TIMESTAMP WITH TIME ZONE,
old_created_at TIMESTAMP WITH TIME ZONE,
new_created_at TIMESTAMP WITH TIME ZONE,
old_updated_at TIMESTAMP WITH TIME ZONE,
new_updated_at TIMESTAMP WITH TIME ZONE,
changed_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP -- timestamp this audit record was created
);
With these two schemas in place, we can now update both, whenever there’s a CREATE, UPDATE or DELETE operation on the users
table. So now,
- Each row in
audit_users
corresponds to an operation performed on theusers
table - For INSERT operations (‘C’), all the
new_
fields will be populated with the inserted values. Theold_
fields will have NULL values - For UPDATE operations (‘U’), the
new_
fields will contain the updated values, theold_
fields will have the values before the update transaction was committed - For DELETE operations (‘D’), you might only have the
user_id
available. Thenew_
fields could be NULL or retain the last known values before deletion, depending on how you wish to track deletions.
This approach with an audit table captures both, the old and new values after a transactions updating the users
table completes. This approach for an audit table is suitable when you have a use case where its important to get instant access to the old and new values for a row associated with a particular update. The caveat with this approach is that its expensive from a storage point-of-view. If storage is a concern and/or, you don’t need the old values associated with an update, a following reduced schema can be used -
CREATE TABLE audit_users (
audit_id SERIAL PRIMARY KEY,
operation_type CHAR(1),
user_id INT,
new_first_name VARCHAR(255),
new_last_name VARCHAR(255),
new_address TEXT,
new_last_logged_in TIMESTAMP WITH TIME ZONE,
new_created_at TIMESTAMP WITH TIME ZONE,
new_updated_at TIMESTAMP WITH TIME ZONE,
changed_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP
);
Now this simplified audit table will only capture the state of the row after the update transaction is complete.
Adding to the Audit table Link to heading
There are at least two possible ways we can write to the audit table when there’s a CREATE, UPDATE, or DELETE operation on the users
table. One way is to write to audit table within the same transaction that’s updating the users
table. This approach might seem logical however, it has the significant downsides:
- The authors of the transactions that update the
users
table needs to be aware that they need to consistently update theusers
audit table too within the transaction - If multiple actors (services) update the
users
table, all the transactions code that updates the table then needs to be updated - If there is a migration to the
users
table, again there might be a need to update all the transactions updating theusers
table.
All of these make this method error-prone, as developer intervention is required in all places where the transactions are being handled.
There’s an alternate approach that can take the burden of keeping the audit table updated away from the developer. That’s the use of triggers. For Postgres, a trigger function can be defined in the following way on a table -
CREATE OR REPLACE FUNCTION audit_user_changes() RETURNS TRIGGER AS $$
BEGIN
IF (TG_OP = 'INSERT') THEN
INSERT INTO audit_users(operation_type, user_id, new_first_name, new_last_name, new_address, new_last_logged_in, new_created_at, new_updated_at, changed_at)
VALUES ('C', NEW.id, NEW.first_name, NEW.last_name, NEW.address, NEW.last_logged_in, NEW.created_at, NEW.updated_at, CURRENT_TIMESTAMP);
ELSIF (TG_OP = 'UPDATE') THEN
INSERT INTO audit_users(operation_type, user_id, new_first_name, new_last_name, new_address, new_last_logged_in, new_created_at, new_updated_at, changed_at)
VALUES ('U', NEW.id, NEW.first_name, NEW.last_name, NEW.address, NEW.last_logged_in, NEW.created_at, NEW.updated_at, CURRENT_TIMESTAMP);
ELSIF (TG_OP = 'DELETE') THEN
INSERT INTO audit_users(operation_type, user_id, changed_at)
VALUES ('D', OLD.id, CURRENT_TIMESTAMP);
END IF;
RETURN NEW;
END;
$$ LANGUAGE plpgsql;
For our example schemas presented above, this trigger function updates the
audit_users
table with the simplified schema. Modifying it to work with the extended schema, that captures both the old and new tables values should be trivial.
This trigger can then be activated on CREATE
UPDATE
DELETE
operations on the table like this -
CREATE TRIGGER trigger_user_changes
AFTER INSERT OR UPDATE OR DELETE ON users
FOR EACH ROW EXECUTE FUNCTION audit_user_changes();
This approach has the advantage of taking the hassle of keeping the audit table update away from the developer. In fact, the author of a transaction updating the users
table need not be aware of the audit table at all!
Audit table audit timestamps Link to heading
One of the things to notice in the audit_users
table is the changed_at
field. The purpose of this field is to record the timestamp of when the audit record was created. Notice that the default value of this field in the database is CURRENT_TIMESTAMP
. The trigger function is also defined to set this value at CURRENT_TIMESTAMP
. This property of the created_at
fields having the CURRENT_TIMESTAMP
value has a subtle but important implication, that’s best illustrated by this example. Assume the following two transactions that are lined up in time as shown -
Transaction timeline Link to heading
Assume two transactions that are executed in time like this -
time | transaction 1 | transaction 2 |
---|---|---|
t0 | begin; | |
t1 | INSERT INTO Users (first_name, last_name, address, last_logged_in, created_at, updated_at) VALUES ('Sam', 'Smith', '123 Main St, Oaktown, USA', CURRENT_TIMESTAMP, CURRENT_TIMESTAMP, CURRENT_TIMESTAMP); | |
t2 | UPDATE Users set address = '789 Another St, Newtown,USA' WHERE id = 4; | |
t3 | commit; |
Assume that the insert operation at t1
inserted a row with the identifier 4. If you look at the audit table for users
you will notice something peculiar. Here’s an example of when this operation was run. Take a look at the created_at
timestamp. The timestamp value for the Update operation, which is chronologically after the create operation, is lower (16:14:01.677
) than that of the Create operation (16:14:22.834
)!
6 C 4 Sam Smith 123 Main St, Oaktown, USA 2023-11-25 16:14:22.834 -0800 2023-11-25 16:14:22.834 -0800 2023-11-25 16:14:22.834 -0800 2023-11-25 16:14:22.834 -0800
7 U 4 Sam Smith 789 Another St, Newtown, USA 2023-11-25 16:14:22.834 -0800 2023-11-25 16:14:22.834 -0800 2023-11-25 16:14:22.834 -0800 2023-11-25 16:14:01.677 -0800
The reason why this happens is because the CREATE_TIMESTAMP value is captured at the beginning of a transaction. Since in the above timeline, the second transaction was started earlier than the first one, which actually did the insert, the update audit entry for the same row (with id 4), had a lower timestamp value. The implication of this observation is that with this approach we would not be able to replay transactions back on a table purely based on a the created_at
. One important thing to mention is that this issue only happens if the transaction isolation level for the update transaction is READ COMMITTED
or lower. It will not happen if its either REPEATABLE READ
or SERIALIZABLE
.
Final Notes Link to heading
We have talked about keeping audit records in a database like Postgres. Similar technique can be employed with other relational database like MySQL, SQL Server etc. The notion of keeping an audit trail is not limited to relational database though. Some databases offer solution-specific audit solutions. For example, MongoDB offer an auditing facility described here, Cassandra offers Audit Logging etc.