Auditing via logical decoding

Started by Philip Scottover 7 years ago7 messages

from_postgres@safetyphil.com

over 7 years ago

Hi Postgres Hackers,

We have been using our own trigger-based audit system at my firm
successfully for some years, but the performance penalty is starting to
grate a bit and so I have been tasked with seeing if we can make use of
the new logical decoding functions to achieve the same thing. I thought
that someone must already have written something that would satisfy our
use-case but my internet searches have come up short so far so I am
considering writing a logical decoding plugin to do what we want.

I thought I would run the idea past you all here just in case my plan is
crazy; I’ve browsed around the postgres source code a bit before but
I’ve never really gotten my hands dirty and am a little bit nervous
about putting my own C code into the heart of our DBMS so if this comes
to anything I would like to offer my code up for review and/or possible
inclusion as a contributed module.

A quick summary of requirements:

We want to log (to a separate, remote database)
- One row for every transaction that changes the state of the
database.
We call this table ‘audit_entry’ and contains the xid, transaction
timestamp, username, client hostname, and application name of the
session that caused the change.
- One row for each change made by each transaction which records the
state of the tuple before the change.
We call this table ‘audit_detail’ and contains xid, statement
timestamp, table name & schema, event_type, primary_key (hstore),
old_row (hstore), and the text of the query that was responsible for the
change.

A lot of that information is available already by listening to the
pgoutput decoding, and my first thought was that I could just write a
receiver for that. However, application name, username, client hostname
and current_query() are not available. This is understandable as they
aren’t useful for logical replication.

I was about to give up, when I discovered pg_logical_emit_message.

My current thoughts are to:
- Write this extra data into a logical message while the transaction
is still in progess
Either with a deferred trigger per table or, perhaps better
Find some global commit-time (or xid-assigment time) hook emit it there

- Then get the information out of the database:
Either modify the existing pgoutput plugin & protocol to forward
such messages in its stream,
Or write a dedicated ‘audit’ decoding plugin with its own protocol

- Then get the information into the ‘auditing’ database:
Either with some standalone process that connects to both, consumes
the output created above, translates it to SQL to run in the auditing
DB.
Figure out how to create a proper postgres background process to do
it, in a similar fashion to the logical replication worker

Any input you folks have would be very much appreciated.

Kinds Regards,

Philip

PS: If there is someone out there who is willing & able to build this
for less than my company will have to pay me to do it, please drop me a
line ☺

Jeremy Finzel

finzelj@gmail.com

over 7 years ago

In reply to: Philip Scott (#1)

Re: Auditing via logical decoding

On Fri, Jul 27, 2018 at 5:41 AM Philip Scott <from_postgres@safetyphil.com>
wrote:

Hi Postgres Hackers,

We have been using our own trigger-based audit system at my firm
successfully for some years, but the performance penalty is starting to
grate a bit and so I have been tasked with seeing if we can make use of
the new logical decoding functions to achieve the same thing. I thought
that someone must already have written something that would satisfy our
use-case but my internet searches have come up short so far so I am
considering writing a logical decoding plugin to do what we want.

I thought I would run the idea past you all here just in case my plan is
crazy; I’ve browsed around the postgres source code a bit before but
I’ve never really gotten my hands dirty and am a little bit nervous
about putting my own C code into the heart of our DBMS so if this comes
to anything I would like to offer my code up for review and/or possible
inclusion as a contributed module.

A quick summary of requirements:

We want to log (to a separate, remote database)
- One row for every transaction that changes the state of the
database.
We call this table ‘audit_entry’ and contains the xid, transaction
timestamp, username, client hostname, and application name of the
session that caused the change.
- One row for each change made by each transaction which records the
state of the tuple before the change.
We call this table ‘audit_detail’ and contains xid, statement
timestamp, table name & schema, event_type, primary_key (hstore),
old_row (hstore), and the text of the query that was responsible for the
change.

A lot of that information is available already by listening to the
pgoutput decoding, and my first thought was that I could just write a
receiver for that. However, application name, username, client hostname
and current_query() are not available. This is understandable as they
aren’t useful for logical replication.

I was about to give up, when I discovered pg_logical_emit_message.

My current thoughts are to:
- Write this extra data into a logical message while the transaction
is still in progess
Either with a deferred trigger per table or, perhaps better
Find some global commit-time (or xid-assigment time) hook emit it
there

- Then get the information out of the database:
Either modify the existing pgoutput plugin & protocol to forward
such messages in its stream,
Or write a dedicated ‘audit’ decoding plugin with its own protocol

- Then get the information into the ‘auditing’ database:
Either with some standalone process that connects to both, consumes
the output created above, translates it to SQL to run in the auditing
DB.
Figure out how to create a proper postgres background process to do
it, in a similar fashion to the logical replication worker

Any input you folks have would be very much appreciated.

Kinds Regards,

Philip

PS: If there is someone out there who is willing & able to build this
for less than my company will have to pay me to do it, please drop me a
line ☺

All I can say is +1 this would be an awesome feature to have and I hope to
see it someday.

Tomas Vondra

tomas.vondra@2ndquadrant.com

over 7 years ago

In reply to: Philip Scott (#1)

Re: Auditing via logical decoding

On 07/27/2018 12:44 PM, Philip Scott wrote:

Hi Postgres Hackers,

We have been using our own trigger-based audit system at my firm
successfully for some years, but the performance penalty is starting to
grate a bit and so I have been tasked with seeing if we can make use of
the new logical decoding functions to achieve the same thing. I thought
that someone must already have written something that would satisfy our
use-case but my internet searches have come up short so far so I am
considering writing a logical decoding plugin to do what we want.

I thought I would run the idea past you all here just in case my plan is
crazy; I’ve browsed around the postgres source code a bit before but
I’ve never really gotten my hands dirty and am a little bit nervous
about putting my own C code into the heart of our DBMS so if this comes
to anything I would like to offer my code up for review and/or possible
inclusion as a contributed module.

A quick summary of requirements:

We want to log (to a separate, remote database)
- One row for every transaction that changes the state of the database.
We call this table ‘audit_entry’ and contains the xid, transaction
timestamp, username, client hostname, and application name of the
session that caused the change.
- One row for each change made by each transaction which records the
state of the tuple before the change.
We call this table ‘audit_detail’ and contains xid, statement
timestamp, table name & schema, event_type, primary_key (hstore),
old_row (hstore), and the text of the query that was responsible for the
change.

Have you checked pgaudit [1]https://www.pgaudit.org/? I haven't checked if it matches all your
requirements, but considering it's an extension aimed at auditing use
cases it might. And it's already available, of course.

[1]: https://www.pgaudit.org/

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Noname

from_postgres@safetyphil.com

over 7 years ago

In reply to: Tomas Vondra (#3)

RE: Auditing via logical decoding

We have been using our own trigger-based audit system at my firm
successfully for some years, but the performance penalty is starting to
grate a bit and so I have been tasked with seeing if we can make use of
the new logical decoding functions to achieve the same thing. I thought
that someone must already have written something that would satisfy our
use-case but my internet searches have come up short so far so I am
considering writing a logical decoding plugin to do what we want.

Have you checked pgaudit [1]? I haven't checked if it matches all your
requirements, but considering it's an extension aimed at auditing use
cases it might. And it's already available, of course.

Actually no, I hadn't come across this before, thanks for the heads up. It is important for us to be able to get the audit data back into a different database, but it looks like I could scrape the logs and do that. At the very least it shows that it is possible to hook into postgres in the right places to emit extra logical messages if that turns out to be the better way to do it.

It doesn't appear to support application_name, nor the connection details of the user doing the update - but perhaps that would be just as sensible to add it here than in a logical output plugin.

It's a bit tricky coming to a big codebase like postgres and trying to decide the best route of doing something; I don't have much of a mental model about how complicated the various systems are :)

Kind Regards,

Phil

Pavel Stehule

pavel.stehule@gmail.com

over 7 years ago

In reply to: Noname (#4)

Re: Auditing via logical decoding

2018-07-27 16:43 GMT+02:00 <from_postgres@safetyphil.com>:

We have been using our own trigger-based audit system at my firm
successfully for some years, but the performance penalty is starting to
grate a bit and so I have been tasked with seeing if we can make use of
the new logical decoding functions to achieve the same thing. I thought
that someone must already have written something that would satisfy our
use-case but my internet searches have come up short so far so I am
considering writing a logical decoding plugin to do what we want.

Have you checked pgaudit [1]? I haven't checked if it matches all your
requirements, but considering it's an extension aimed at auditing use
cases it might. And it's already available, of course.

Actually no, I hadn't come across this before, thanks for the heads up. It
is important for us to be able to get the audit data back into a different
database, but it looks like I could scrape the logs and do that. At the
very least it shows that it is possible to hook into postgres in the right
places to emit extra logical messages if that turns out to be the better
way to do it.

It doesn't appear to support application_name, nor the connection details
of the user doing the update - but perhaps that would be just as sensible
to add it here than in a logical output plugin.

it is very simple to show application name or any other info

Few years ago I customized pgAudit and it was not hard work - almost all
time

Regards

Pavel

Show quoted text

It's a bit tricky coming to a big codebase like postgres and trying to
decide the best route of doing something; I don't have much of a mental
model about how complicated the various systems are :)

Kind Regards,

Phil

Tomas Vondra

tomas.vondra@2ndquadrant.com

over 7 years ago

In reply to: Noname (#4)

Re: Auditing via logical decoding

On 07/27/2018 04:43 PM, from_postgres@safetyphil.com wrote:

We have been using our own trigger-based audit system at my firm
successfully for some years, but the performance penalty is
starting to grate a bit and so I have been tasked with seeing if
we can make use of the new logical decoding functions to achieve
the same thing. I thought that someone must already have written
something that would satisfy our use-case but my internet
searches have come up short so far so I am considering writing a
logical decoding plugin to do what we want.

Have you checked pgaudit [1]? I haven't checked if it matches all
your requirements, but considering it's an extension aimed at
auditing use cases it might. And it's already available, of
course.

Actually no, I hadn't come across this before, thanks for the heads
up. It is important for us to be able to get the audit data back into
a different database, but it looks like I could scrape the logs and
do that. At the very least it shows that it is possible to hook into
postgres in the right places to emit extra logical messages if that
turns out to be the better way to do it.

It doesn't appear to support application_name, nor the connection
details of the user doing the update - but perhaps that would be just
as sensible to add it here than in a logical output plugin.

It's a bit tricky coming to a big codebase like postgres and trying
to decide the best route of doing something; I don't have much of a
mental model about how complicated the various systems are :)

It may not support everything, but in general when a feature can be done
in an extension, it's generally faster to deliver that way. One of the
reasons is the ad hoc release cycle (compared to the 1-per-year cycle of
PostgreSQL). So I'd suggest investigating this option first.

It may not meet all your requirements immediately (or at all), but well,
no solution probably will.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Stephen Frost

sfrost@snowman.net

over 7 years ago

In reply to: Noname (#4)

Re: Auditing via logical decoding

Greetings,

* from_postgres@safetyphil.com (from_postgres@safetyphil.com) wrote:

We have been using our own trigger-based audit system at my firm
successfully for some years, but the performance penalty is starting to
grate a bit and so I have been tasked with seeing if we can make use of
the new logical decoding functions to achieve the same thing. I thought
that someone must already have written something that would satisfy our
use-case but my internet searches have come up short so far so I am
considering writing a logical decoding plugin to do what we want.

Have you checked pgaudit [1]? I haven't checked if it matches all your
requirements, but considering it's an extension aimed at auditing use
cases it might. And it's already available, of course.

Actually no, I hadn't come across this before, thanks for the heads up. It is important for us to be able to get the audit data back into a different database, but it looks like I could scrape the logs and do that. At the very least it shows that it is possible to hook into postgres in the right places to emit extra logical messages if that turns out to be the better way to do it.

It doesn't appear to support application_name, nor the connection details of the user doing the update - but perhaps that would be just as sensible to add it here than in a logical output plugin.

I've not been following this very closely but when it comes to pgAudit,
the log lines will include whatever is in log_line_prefix, where you can
include information like the application_name, login user, etc, which is
why you don't explicitly see options for those in pgAudit itself.

This is because pgAudit logs messages the same way PG itself does (which
also means it goes to the same place, such as to a CSV log that can then
be imported into a PG database).

Thanks!

Stephen