pg_dump's aborted transactions

Started by Stephen Frostalmost 11 years ago6 messages
#1Stephen Frost
sfrost@snowman.net

All,

We recently had a client complain that check_postgres' commitratio
check would alert about relatively unused databases. As it turns
out, the reason for this is because they automate running pg_dump
against their databases (surely a good thing..), but pg_dump doesn't
close out its transaction cleanly, leading to rolled back
transactions.

At first blush, at least, this strikes me as an oversight which we
should probably fix and possibly backpatch.

Thoughts?

Thanks,

Stephen

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Stephen Frost (#1)
Re: pg_dump's aborted transactions

Stephen Frost <sfrost@snowman.net> writes:

All,
We recently had a client complain that check_postgres' commitratio
check would alert about relatively unused databases. As it turns
out, the reason for this is because they automate running pg_dump
against their databases (surely a good thing..), but pg_dump doesn't
close out its transaction cleanly, leading to rolled back
transactions.

At first blush, at least, this strikes me as an oversight which we
should probably fix and possibly backpatch.

No, somebody should fix check_postgres to count rollbacks as well as
commits as activity (as they obviously are).

This is not an oversight, it's 100% intentional. The reason pg_dump
aborts rather than commits is to make entirely sure that it does not
commit any changes to the database. I would be against removing that
safety feature, considering that pg_dump is typically run as superuser.
We have frequently worried about security exploits that involve hijacking
superuser activities, and this behavior provides at least a small
increment of safety against such holes.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#3Guillaume Lelarge
guillaume@lelarge.info
In reply to: Tom Lane (#2)
Re: pg_dump's aborted transactions

2015-02-04 6:37 GMT+01:00 Tom Lane <tgl@sss.pgh.pa.us>:

Stephen Frost <sfrost@snowman.net> writes:

All,
We recently had a client complain that check_postgres' commitratio
check would alert about relatively unused databases. As it turns
out, the reason for this is because they automate running pg_dump
against their databases (surely a good thing..), but pg_dump doesn't
close out its transaction cleanly, leading to rolled back
transactions.

At first blush, at least, this strikes me as an oversight which we
should probably fix and possibly backpatch.

No, somebody should fix check_postgres to count rollbacks as well as
commits as activity (as they obviously are).

Well, actually, no. This is a commit ratio, not an activity counter, not
even a transactions count.

The formula right now is:

round(100.*sd.xact_commit/(sd.xact_commit+sd.xact_rollback), 2)

which, AFAICT, is correct.

The fact that the OP uses it to know if there's activity on his databases
can get him false positives if he has no actual activity, except for dumps.

I might be wrong, but there is nothing to fix on the check_postgres (at
least, for this issue ;) ). The expectation of this user is to fix :)

This is not an oversight, it's 100% intentional. The reason pg_dump

aborts rather than commits is to make entirely sure that it does not
commit any changes to the database. I would be against removing that
safety feature, considering that pg_dump is typically run as superuser.
We have frequently worried about security exploits that involve hijacking
superuser activities, and this behavior provides at least a small
increment of safety against such holes.

+1

--
Guillaume.
http://blog.guillaume.lelarge.info
http://www.dalibo.com

#4Stephen Frost
sfrost@snowman.net
In reply to: Guillaume Lelarge (#3)
Re: pg_dump's aborted transactions

* Guillaume Lelarge (guillaume@lelarge.info) wrote:

2015-02-04 6:37 GMT+01:00 Tom Lane <tgl@sss.pgh.pa.us>:

Stephen Frost <sfrost@snowman.net> writes:
No, somebody should fix check_postgres to count rollbacks as well as
commits as activity (as they obviously are).

Well, actually, no. This is a commit ratio, not an activity counter, not
even a transactions count.

The formula right now is:

round(100.*sd.xact_commit/(sd.xact_commit+sd.xact_rollback), 2)

which, AFAICT, is correct.

The fact that the OP uses it to know if there's activity on his databases
can get him false positives if he has no actual activity, except for dumps.

I might be wrong, but there is nothing to fix on the check_postgres (at
least, for this issue ;) ). The expectation of this user is to fix :)

Apologies for the confusion- the client isn't using it to determine if
there's activity. They're using it exactly as it's intended, as I
understand it- to check and see if the number of rollbacks is
signifigant compared to the number of commits. The issue is that, with
databases that have little activity, you can end up with a commit ratio
of 50% or less. Perhaps check_postgres could have an option to only
complain when some minimum number of transactions has been reached..

I've already told them that not all aborted transactions are necessairly
bad ones (I tend to create a lot when I'm using psql, in fact), and that
the commit ratio check is really intended for active databases as just a
heuristic to detect if things have suddenly changed for the worse.

* Tom Lane (tgl@sss.pgh.pa.us) wrote:

This is not an oversight, it's 100% intentional. The reason pg_dump
aborts rather than commits is to make entirely sure that it does not
commit any changes to the database. I would be against removing that
safety feature, considering that pg_dump is typically run as superuser.
We have frequently worried about security exploits that involve hijacking
superuser activities, and this behavior provides at least a small
increment of safety against such holes.

We already mark the transaction as READ ONLY in modern versions and so
I'm not sure that this really buys us all that much. If someone's able
to get sufficient transaction control to get out from the read-only one
which is created, aborting at the end isn't going to help.

Thanks,

Stephen

#5Robert Haas
robertmhaas@gmail.com
In reply to: Stephen Frost (#4)
Re: pg_dump's aborted transactions

On Wed, Feb 4, 2015 at 8:08 AM, Stephen Frost <sfrost@snowman.net> wrote:

Apologies for the confusion- the client isn't using it to determine if
there's activity. They're using it exactly as it's intended, as I
understand it- to check and see if the number of rollbacks is
signifigant compared to the number of commits. The issue is that, with
databases that have little activity, you can end up with a commit ratio
of 50% or less. Perhaps check_postgres could have an option to only
complain when some minimum number of transactions has been reached..

That sounds like a much better idea. pg_dump isn't really doing
anything wrong here, so "fixing" it feels like the wrong solution.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#6Stephen Frost
sfrost@snowman.net
In reply to: Robert Haas (#5)
Re: pg_dump's aborted transactions

* Robert Haas (robertmhaas@gmail.com) wrote:

On Wed, Feb 4, 2015 at 8:08 AM, Stephen Frost <sfrost@snowman.net> wrote:

Apologies for the confusion- the client isn't using it to determine if
there's activity. They're using it exactly as it's intended, as I
understand it- to check and see if the number of rollbacks is
signifigant compared to the number of commits. The issue is that, with
databases that have little activity, you can end up with a commit ratio
of 50% or less. Perhaps check_postgres could have an option to only
complain when some minimum number of transactions has been reached..

That sounds like a much better idea. pg_dump isn't really doing
anything wrong here, so "fixing" it feels like the wrong solution.

It occurs to me that 'commit ratio' might not be the check that
check_postgres users really want but is, instead, a proxy as we don't
actually provide a way to get at what they really want.

Perhaps we should consider adding a new stat- *errored* transactions.
This would be a count of transactions which rolled back due to an error
(and not an explicit "rollback;" or abort), or even have counts of both
errored and aborted transactions.

At least in my experience, rolled back transactions aren't all bad, but
transactions which were terminated due to an error case are things you
care about and want to look into. I've used tail-n-mail with PG logs to
check for such errors, but it'd certainly be nice to have a Nagios check
which could report on them.

Thanks,

Stephen