spurious wrap-around shutdown
In 9.3 HEAD I am getting what seems to be spurious wrap-around shutdowns.
postgres=# SELECT datname, datfrozenxid, age(datfrozenxid) FROM pg_database;
datname | datfrozenxid | age
-----------+--------------+-----------
template1 | 2621759843 | 0
template0 | 2621759843 | 0
postgres | 2571759843 | 50000000
jjanes | 2437230921 | 184528922
postgres=# select txid_current();
ERROR: database is not accepting commands to avoid wraparound data loss in
database "jjanes"
HINT: Stop the postmaster and use a standalone backend to vacuum that
database.
You might also need to commit or roll back old prepared transactions.
184,528,922 is well short of 2 billion, so what is going on?
I thought maybe the ShmemVariableCache were not getting updated when vacuum
finished, but if I restart the server (forcing shared memory to get rebuilt
from disk) the condition continues.
I tried setting a breakpoint on SetTransactionIdLimit, but that seems to
get executed on startup before the -W flag takes effect, so I can't find it.
Any tips on how to debug this? I figure the next step is running git
bisect, but that is sure to be tedious.
I'm using a variant of the below to reach wraparound quicker, perhaps that
is introducing a bug?
/messages/by-id/20130207203216.GE5172@alvh.no-ip.org
Cheers,
Jeff
On 2013-06-16 11:54:24 -0700, Jeff Janes wrote:
In 9.3 HEAD I am getting what seems to be spurious wrap-around shutdowns.
postgres=# SELECT datname, datfrozenxid, age(datfrozenxid) FROM pg_database;
datname | datfrozenxid | age
-----------+--------------+-----------
template1 | 2621759843 | 0
template0 | 2621759843 | 0
postgres | 2571759843 | 50000000
jjanes | 2437230921 | 184528922postgres=# select txid_current();
ERROR: database is not accepting commands to avoid wraparound data loss in
database "jjanes"
HINT: Stop the postmaster and use a standalone backend to vacuum that
database.
You might also need to commit or roll back old prepared transactions.184,528,922 is well short of 2 billion, so what is going on?
I guess you're sure you don't have any old prepared xacts running
around?
I thought maybe the ShmemVariableCache were not getting updated when vacuum
finished, but if I restart the server (forcing shared memory to get rebuilt
from disk) the condition continues.I tried setting a breakpoint on SetTransactionIdLimit, but that seems to
get executed on startup before the -W flag takes effect, so I can't find it.Any tips on how to debug this? I figure the next step is running git
bisect, but that is sure to be tedious.
I'd first add the actual xids limits that are assumed to be dangerous to
the error messages in GetNewTransactionId(). That already might give a
hint.
I'm using a variant of the below to reach wraparound quicker, perhaps that
is introducing a bug?
I don't really trust that patch because it skips loads of checks since
it only repeats part of the work that GetNewTransactionId does. I don't
immediately see what the problem that could cause though.
IIRC I had postes a patch in that thread that looped around
GetNewTransactionId() in that thread. It might be worthwile to test
whether that also reproduces the issue.
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Sun, Jun 16, 2013 at 11:54 AM, Jeff Janes
<jeff.janes@gmail.com<javascript:_e({}, 'cvml',
'jeff.janes@gmail.com');>
wrote:
In 9.3 HEAD I am getting what seems to be spurious wrap-around shutdowns.
postgres=# SELECT datname, datfrozenxid, age(datfrozenxid) FROM
pg_database;datname | datfrozenxid | age
-----------+--------------+-----------
template1 | 2621759843 | 0
template0 | 2621759843 | 0
postgres | 2571759843 | 50000000
jjanes | 2437230921 | 184528922
While the behavior is weird, it is not a regression (also present in 9.2
with suitable changes in timing) and the shutdown is not spurious.
If I execute the above query immediately after the shutdown, I see what I
would expect, jjanes has an age of about 2^31.
The one table that is holding everything back is already getting autovac
for wraparound at that point, and eventually that vacuum finishes. When
done, pg_class and pg_database are updated (I don't know how they get
updated without trying to assign another transaction), and then I get the
above query results.
I would think the database would re-allow new transactions at this point,
but it does not. I don't know why.
Since this isn't a regression in 9.3, I probably won't pursue it any more
at this time, unless encouraged to.
Cheers,
Jeff