txid_status returns NULL for recently commited transactions

Started by Michail Nikolaevover 7 years ago5 messages
#1Michail Nikolaev
michail.nikolaev@gmail.com

Hello everyone.

I see strange beaviour of txid_status with one of my production servers.

SELECT txid_status(txid_current()) -> NULL
SELECT txid_current() -> 4447342811

It also returns null for recent commited and aborted transactions.

SELECT datname, age(datfrozenxid), datfrozenxid FROM pg_database;
template 0239961994 4207340131
postgres 289957756 4157344369 <(415)%20734-4369>
template1 289957661 4157344464 <(415)%20734-4464>
project 682347934 3764954191

GDB shows it happens because of TransactionIdPrecedes(xid,
ShmemVariableCache->oldestClogXid)) at txid.c:155.

xid_epoch = 1
xid = 150227913
now_epoch = 1
now_epoch_last_xid = 150468261
ShmemVariableCache->oldestClogXid = 2207340131 <(220)%20734-0131>

The strangest thing is ShmemVariableCache->oldestClogXid == 2207340131
<(220)%20734-0131> which is greater than current xid without an epoch
(150227913) .

Postgres was updated from 9.6 to PostgreSQL 10.4 (Ubuntu
10.4-2.pgdg14.04+1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu
4.8.4-2ubuntu1~14.04.4) 4.8.4, 64-bit.

Also, another small issue:

On the same database:
SELECT txid_status(BIGINT '4294967295') -> 'commited'.
SELECT txid_status(BIGINT '4294967296') -> NULL

Both tx ids are from my head and not valid.
First 'commited' happends because int32 overflow:
TransactionIdPrecedes(59856482, 2207340131 <(220)%20734-0131>) == false

Could anyone help me with understanding of txid_status behaviour?

Thanks,
Michail.

#2Michail Nikolaev
michail.nikolaev@gmail.com
In reply to: Michail Nikolaev (#1)
Re: txid_status returns NULL for recently commited transactions

Sorry, some auto formatting in recent email.

Again wtih fixed formatiing:

I see strange beaviour of txid_status with one of my production servers.

SELECT txid_status(txid_current()) -> NULL
SELECT txid_current() -> 4447342811

It also returns null for recent commited and aborted transactions.

SELECT datname, age(datfrozenxid), datfrozenxid FROM pg_database;
template 0239961994 4207340131
postgres 289957756 4157344369
template1 289957661 4157344464
project 682347934 3764954191

GDB shows it happens because of TransactionIdPrecedes(xid,
ShmemVariableCache->oldestClogXid)) at txid.c:155.

xid_epoch = 1
xid = 150227913
now_epoch = 1
now_epoch_last_xid = 150468261
ShmemVariableCache->oldestClogXid = 2207340131

The strangest thing is ShmemVariableCache->oldestClogXid == 2207340131
which is greater than current xid without an epoch (150227913) .

Postgres was updated from 9.6 to PostgreSQL 10.4 (Ubuntu
10.4-2.pgdg14.04+1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu
4.8.4-2ubuntu1~14.04.4) 4.8.4, 64-bit.

Also, another small issue:

On the same database:
SELECT txid_status(BIGINT '4294967295') -> 'commited'.
SELECT txid_status(BIGINT '4294967296') -> NULL

Both tx ids are from my head and not valid.
First 'commited' happends because int32 overflow:
TransactionIdPrecedes(59856482, 2207340131) == false

Could anyone help me with understanding of txid_status behaviour?

Thanks,
Michail.

вт, 25 сент. 2018 г. в 19:47, Michail Nikolaev <michail.nikolaev@gmail.com>:

Show quoted text

Hello everyone.

I see strange beaviour of txid_status with one of my production servers.

SELECT txid_status(txid_current()) -> NULL
SELECT txid_current() -> 4447342811

It also returns null for recent commited and aborted transactions.

SELECT datname, age(datfrozenxid), datfrozenxid FROM pg_database;
template 0239961994 4207340131
postgres 289957756 4157344369 <(415)%20734-4369>
template1 289957661 4157344464 <(415)%20734-4464>
project 682347934 3764954191

GDB shows it happens because of TransactionIdPrecedes(xid,
ShmemVariableCache->oldestClogXid)) at txid.c:155.

xid_epoch = 1
xid = 150227913
now_epoch = 1
now_epoch_last_xid = 150468261
ShmemVariableCache->oldestClogXid = 2207340131 <(220)%20734-0131>

The strangest thing is ShmemVariableCache->oldestClogXid == 2207340131
<(220)%20734-0131> which is greater than current xid without an epoch
(150227913) .

Postgres was updated from 9.6 to PostgreSQL 10.4 (Ubuntu
10.4-2.pgdg14.04+1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu
4.8.4-2ubuntu1~14.04.4) 4.8.4, 64-bit.

Also, another small issue:

On the same database:
SELECT txid_status(BIGINT '4294967295') -> 'commited'.
SELECT txid_status(BIGINT '4294967296') -> NULL

Both tx ids are from my head and not valid.
First 'commited' happends because int32 overflow:
TransactionIdPrecedes(59856482, 2207340131 <(220)%20734-0131>) == false

Could anyone help me with understanding of txid_status behaviour?

Thanks,
Michail.

#3Andres Freund
andres@anarazel.de
In reply to: Michail Nikolaev (#1)
Re: txid_status returns NULL for recently commited transactions

Hi,

On 2018-09-25 19:47:40 +0300, Michail Nikolaev wrote:

I see strange beaviour of txid_status with one of my production servers.

What are you using it for?

SELECT txid_status(txid_current()) -> NULL

I can't reproduce that...

SELECT txid_current() -> 4447342811

It also returns null for recent commited and aborted transactions.

SELECT datname, age(datfrozenxid), datfrozenxid FROM pg_database;
template 0239961994 4207340131
postgres 289957756 4157344369 <(415)%20734-4369>
template1 289957661 4157344464 <(415)%20734-4464>
project 682347934 3764954191

"0239961994" - I can't see how a leading zero is possible.

Your mailer appears to do very annoying things by converting numbers to
phone numbers.

GDB shows it happens because of TransactionIdPrecedes(xid,
ShmemVariableCache->oldestClogXid)) at txid.c:155.

xid_epoch = 1
xid = 150227913
now_epoch = 1
now_epoch_last_xid = 150468261
ShmemVariableCache->oldestClogXid = 2207340131 <(220)%20734-0131>

The strangest thing is ShmemVariableCache->oldestClogXid == 2207340131
<(220)%20734-0131> which is greater than current xid without an epoch
(150227913) .

It's from the last epoch. Plain xids are 32bit wide, the epochs deal
with values that are bigger. And 2207340131 is less than 2^31 in the
past.

Postgres was updated from 9.6 to PostgreSQL 10.4 (Ubuntu
10.4-2.pgdg14.04+1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu
4.8.4-2ubuntu1~14.04.4) 4.8.4, 64-bit.

Also, another small issue:

On the same database:
SELECT txid_status(BIGINT '4294967295') -> 'commited'.
SELECT txid_status(BIGINT '4294967296') -> NULL

Both tx ids are from my head and not valid.
First 'commited' happends because int32 overflow:
TransactionIdPrecedes(59856482, 2207340131 <(220)%20734-0131>) == false

Why do you think that is the wrong result?

Greetings,

Andres Freund

#4Michail Nikolaev
michail.nikolaev@gmail.com
In reply to: Andres Freund (#3)
Re: txid_status returns NULL for recently commited transactions

Hi, thanks for the reply!

What are you using it for?

I want to use it to validate status of related entities in other database
(queue) in short interval after PG transaction commit/rollback.

I can't reproduce that...

Yes, it happens only with one cluster. All others work as expected.

Your mailer appears to do very annoying things by converting numbers to

phone numbers.

Sorry.

It's from the last epoch. Plain xids are 32bit wide, the epochs deal
with values that are bigger. And 2207340131 is less than 2^31 in the
past.

Yes, and probably it is cause of the issue.

ShmemVariableCache->oldestClogXid = 2207340131
xid_epoch = 1
xid = 150227913
TransactionIdPrecedes(xid, ShmemVariableCache->oldestClogXid)) return TRUE
, then TransactionIdInRecentPast return FALSE and txtd_status return NULL.

But xid (1) and xid_epoch (150227913) are correct values from my active (or
recently commited) transaction.

SELECT txid_status(BIGINT '4294967295') -> 'commited'.
SELECT txid_status(BIGINT '4294967296') -> NULL

Why do you think that is the wrong result?

Let's leave it for now (maybe my misunderstanding). I think it is better to
deal with "txid_status(txid_current()) -> NULL" issue first.

Thanks,
Michail.

#5Michail Nikolaev
michail.nikolaev@gmail.com
In reply to: Michail Nikolaev (#4)
Re: txid_status returns NULL for recently commited transactions

Hello.

Got some new information.

There are 6 replicas and master in cluster. I rebooted two replicas... And
they started to work as expected!

So, on master and 4 untouched replicas:
txid_current() -> 4459388265
txid_status(BIGINT '4459388265') -> NULL

On two rebooted replicas:
txid_status(BIGINT '4459388265') -> 'commited'

All replicas are in sync with master.

Root cause is ShmemVariableCache.oldestClogXid value, as described in
previous message.

On master and 4 replicas:
(gdb) p ShmemVariableCache.oldestClogXid
$13 = 2207340131
(gdb) p ShmemVariableCache.oldestXid
$11 = 3764954191

On two rebooted replicas:
(gdb) p ShmemVariableCache.oldestClogXid
$14 = 3764954191
(gdb) p ShmemVariableCache.oldestXid
$12 = 3764954191

SELECT datname, age(datfrozenxid), datfrozenxid FROM pg_database;
template0 252790897 4207340131
postgres 302786659 4157344369
template1 302786564 4157344464
project 695176837 3764954191

As far as I remember master and replicas were not rebooted after upgrading
from 9.6 to 10. So, maybe issue is upgrade-related.

вт, 25 сент. 2018 г. в 22:22, Michail Nikolaev <michail.nikolaev@gmail.com>:

Show quoted text

Hi, thanks for the reply!

What are you using it for?

I want to use it to validate status of related entities in other database
(queue) in short interval after PG transaction commit/rollback.

I can't reproduce that...

Yes, it happens only with one cluster. All others work as expected.

Your mailer appears to do very annoying things by converting numbers to

phone numbers.

Sorry.

It's from the last epoch. Plain xids are 32bit wide, the epochs deal
with values that are bigger. And 2207340131 <(220)%20734-0131> is less

than 2^31 in the

past.

Yes, and probably it is cause of the issue.

ShmemVariableCache->oldestClogXid = 2207340131 <(220)%20734-0131>

xid_epoch = 1
xid = 150227913
TransactionIdPrecedes(xid, ShmemVariableCache->oldestClogXid)) return TRUE
, then TransactionIdInRecentPast return FALSE and txtd_status return NULL.

But xid (1) and xid_epoch (150227913) are correct values from my active
(or recently commited) transaction.

SELECT txid_status(BIGINT '4294967295') -> 'commited'.
SELECT txid_status(BIGINT '4294967296') -> NULL

Why do you think that is the wrong result?

Let's leave it for now (maybe my misunderstanding). I think it is better
to deal with "txid_status(txid_current()) -> NULL" issue first.

Thanks,
Michail.