Race condition in TransactionIdIsInProgress

Started by Konstantin Knizhnikalmost 4 years ago18 messages
#1Konstantin Knizhnik
knizhnik@garret.ru

Hi hackers,

Postgres first records state transaction in CLOG, then removes
transaction from procarray and finally releases locks.
But it can happen that transaction is marked as committed in CLOG,
XMAX_COMMITTED bit is set in modified tuple but
TransactionIdIsInProgress still returns true. As a result "t_xmin %u is
uncommitted in tuple (%u,%u) to be updated in table"
error is reported.

TransactionIdIsInProgress first checks for cached XID status:

    /*
     * We may have just checked the status of this transaction, so if it is
     * already known to be completed, we can fall out without any access to
     * shared memory.
     */
    if (TransactionIdIsKnownCompleted(xid))
    {
        xc_by_known_xact_inc();
        return false;
    }

This status cache is updated by TransactionLogFetch.
So once transaction status is fetched from CLOG, subsequent invocation
of TransactionIdIsKnownCompleted will return true
even if transaction is not removed from procarray.

The following stack illustrates how it can happen:

#6  0x000055ab7758bbaf in TransactionLogFetch (transactionId=4482)
    at
/home/knizhnik/zenith.main/tmp_install/build/../../vendor/postgres/src/backend/access/transam/transam.c:88
#7  TransactionLogFetch (transactionId=4482)
    at
/home/knizhnik/zenith.main/tmp_install/build/../../vendor/postgres/src/backend/access/transam/transam.c:52
#8  0x000055ab7758bc7d in TransactionIdDidAbort
(transactionId=transactionId@entry=4482)
    at
/home/knizhnik/zenith.main/tmp_install/build/../../vendor/postgres/src/backend/access/transam/transam.c:186
#9  0x000055ab77812a43 in TransactionIdIsInProgress (xid=4482)
    at
/home/knizhnik/zenith.main/tmp_install/build/../../vendor/postgres/src/backend/storage/ipc/procarray.c:1587
#10 0x000055ab7753de79 in HeapTupleSatisfiesVacuumHorizon
(htup=htup@entry=0x7ffe562cec50, buffer=buffer@entry=1817,
    dead_after=dead_after@entry=0x7ffe562ceb14)
    at
/home/knizhnik/zenith.main/tmp_install/build/../../vendor/postgres/src/backend/access/heap/heapam_visibility.c:1281
#11 0x000055ab7753e2c5 in HeapTupleSatisfiesNonVacuumable (buffer=1817,
snapshot=0x7ffe562cec70, htup=0x7ffe562cec50)
    at
/home/knizhnik/zenith.main/tmp_install/build/../../vendor/postgres/src/backend/access/heap/heapam_visibility.c:1449
#12 HeapTupleSatisfiesVisibility (tup=tup@entry=0x7ffe562cec50,
snapshot=snapshot@entry=0x7ffe562cec70, buffer=buffer@entry=1817)
    at
/home/knizhnik/zenith.main/tmp_install/build/../../vendor/postgres/src/backend/access/heap/heapam_visibility.c:1804
#13 0x000055ab7752f3b5 in heap_hot_search_buffer
(tid=tid@entry=0x7ffe562cec2a, relation=relation@entry=0x7fa57fbf17c8,
buffer=buffer@entry=1817,
    snapshot=snapshot@entry=0x7ffe562cec70,
heapTuple=heapTuple@entry=0x7ffe562cec50, all_dead=all_dead@entry=0x0,
first_call=true)
    at
/home/knizhnik/zenith.main/tmp_install/build/../../vendor/postgres/src/backend/access/heap/heapam.c:1802
#14 0x000055ab775370d8 in heap_index_delete_tuples (rel=0x7fa57fbf17c8,
delstate=<optimized out>)
    at
/home/knizhnik/zenith.main/tmp_install/build/../../vendor/postgres/src/backend/access/heap/heapam.c:7480
#15 0x000055ab7755767a in table_index_delete_tuples
(delstate=0x7ffe562d0160, rel=0x2)
    at
/home/knizhnik/zenith.main/tmp_install/build/../../vendor/postgres/src/include/access/tableam.h:1327
#16 _bt_delitems_delete_check (rel=rel@entry=0x7fa57fbf4518,
buf=buf@entry=1916, heapRel=heapRel@entry=0x7fa57fbf17c8,
    delstate=delstate@entry=0x7ffe562d0160)
    at
/home/knizhnik/zenith.main/tmp_install/build/../../vendor/postgres/src/backend/access/nbtree/nbtpage.c:1541
#17 0x000055ab7754e475 in _bt_bottomupdel_pass
(rel=rel@entry=0x7fa57fbf4518, buf=buf@entry=1916,
heapRel=heapRel@entry=0x7fa57fbf17c8, newitemsz=20)
    at
/home/knizhnik/zenith.main/tmp_install/build/../../vendor/postgres/src/backend/access/nbtree/nbtdedup.c:406
#18 0x000055ab7755010a in _bt_delete_or_dedup_one_page
(rel=0x7fa57fbf4518, heapRel=0x7fa57fbf17c8, insertstate=0x7ffe562d0640,
--Type <RET> for more, q to quit, c to continue without paging--
    simpleonly=<optimized out>, checkingunique=<optimized out>,
uniquedup=<optimized out>, indexUnchanged=true)
    at
/home/knizhnik/zenith.main/tmp_install/build/../../vendor/postgres/src/backend/access/nbtree/nbtinsert.c:2763
#19 0x000055ab775548c3 in _bt_findinsertloc (heapRel=0x7fa57fbf17c8,
stack=0x55ab7a1cde08, indexUnchanged=true, checkingunique=true,
    insertstate=0x7ffe562d0640, rel=0x7fa57fbf4518)
    at
/home/knizhnik/zenith.main/tmp_install/build/../../vendor/postgres/src/backend/access/nbtree/nbtinsert.c:904
#20 _bt_doinsert (rel=rel@entry=0x7fa57fbf4518,
itup=itup@entry=0x55ab7a1cde30,
checkUnique=checkUnique@entry=UNIQUE_CHECK_YES,
    indexUnchanged=indexUnchanged@entry=true,
heapRel=heapRel@entry=0x7fa57fbf17c8)
    at
/home/knizhnik/zenith.main/tmp_install/build/../../vendor/postgres/src/backend/access/nbtree/nbtinsert.c:255
#21 0x000055ab7755a931 in btinsert (rel=0x7fa57fbf4518,
values=<optimized out>, isnull=<optimized out>, ht_ctid=0x55ab7a1cd068,
heapRel=0x7fa57fbf17c8,
    checkUnique=UNIQUE_CHECK_YES, indexUnchanged=true,
indexInfo=0x55ab7a1cdbd0)
    at
/home/knizhnik/zenith.main/tmp_install/build/../../vendor/postgres/src/backend/access/nbtree/nbtree.c:199
#22 0x000055ab7769ae59 in ExecInsertIndexTuples
(resultRelInfo=resultRelInfo@entry=0x55ab7a1c88b8,
slot=slot@entry=0x55ab7a1cd038,
    estate=estate@entry=0x55ab7a1c8430, update=update@entry=true,
noDupErr=noDupErr@entry=false, specConflict=specConflict@entry=0x0,
    arbiterIndexes=0x0) at
/home/knizhnik/zenith.main/tmp_install/build/../../vendor/postgres/src/backend/executor/execIndexing.c:415

heap_lock_tuple will not wait transaction completion if
HEAP_XMAX_COMMITTED is set in tuple and returns TM_Updated:

        /*
         * Time to sleep on the other transaction/multixact, if necessary.
         *
         * If the other transaction is an update/delete that's already
         * committed, then sleeping cannot possibly do any good: if we're
         * required to sleep, get out to raise an error instead.
...

The problem is not so easy to reproduce. I got it when performe commits
of transaction with large number of subtransactions with several
synchronous replicas.
But it is possible to artificially introduce delay between
RecordTransactionCommit() and ProcArrayEndTransaction():

diff --git a/src/backend/access/transam/xact.c 
b/src/backend/access/transam/xact.c
index ca6f6d57d3..d095fd13b0 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -2187,6 +2187,7 @@ CommitTransaction(void)
                  * durably commit.
                  */
                 latestXid = RecordTransactionCommit();
+               pg_usleep(random() % 100000);
         }
         else
         {

Then the problem can be reproduced with the following pgbebch script:

update.sql:
\set aid random(1, 1000)
select do_update(:aid,100)

update function:
create or replace function do_update(id integer, level integer) returns
void as $$
begin
    begin
        if level > 0 then
            perform do_update(id, level-1);
        else
            update pgbench_accounts SET abalance = abalance + 1 WHERE
aid = id;
        end if;
    exception WHEN OTHERS THEN
        raise notice '% %', SQLERRM, SQLSTATE;
    end;
end; $$ language plpgsql;

pgbench -i postgres
pgbench  -f update.sql -c 100 -T 100 -P 1 postgres

#2Andrey Borodin
x4mmm@yandex-team.ru
In reply to: Konstantin Knizhnik (#1)
Re: Race condition in TransactionIdIsInProgress

On 10 Feb 2022, at 21:46, Konstantin Knizhnik <knizhnik@garret.ru> wrote:
As a result "t_xmin %u is uncommitted in tuple (%u,%u) to be updated in table"
error is reported.

Wow, cool, that seem to be a solution to one more mysterious corruption thread - “reporting TID/table with corruption error" [0]/messages/by-id/202108191637.oqyzrdtnheir@alvherre.pgsql. Thank you!

Now it’s obvious that this is not a real data corruption. Maybe let’s remove corruption error code from the error? I had been literally woken at night by this code few times in January.

And do you have a plan how to fix the actual issue?

Best regards, Andrey Borodin.

[0]: /messages/by-id/202108191637.oqyzrdtnheir@alvherre.pgsql

#3Alvaro Herrera
alvherre@alvh.no-ip.org
In reply to: Andrey Borodin (#2)
Re: Race condition in TransactionIdIsInProgress

On 2022-Feb-10, Andrey Borodin wrote:

On 10 Feb 2022, at 21:46, Konstantin Knizhnik <knizhnik@garret.ru> wrote:
As a result "t_xmin %u is uncommitted in tuple (%u,%u) to be updated in table"
error is reported.

Wow, cool, that seem to be a solution to one more mysterious
corruption thread - “reporting TID/table with corruption error" [0].
Thank you!

Ooh, great find. I'll have a look at this soon. Thanks for CCing me.

--
Álvaro Herrera Valdivia, Chile — https://www.EnterpriseDB.com/
"I love the Postgres community. It's all about doing things _properly_. :-)"
(David Garamond)

#4Robert Haas
robertmhaas@gmail.com
In reply to: Konstantin Knizhnik (#1)
Re: Race condition in TransactionIdIsInProgress

On Thu, Feb 10, 2022 at 11:46 AM Konstantin Knizhnik <knizhnik@garret.ru> wrote:

Postgres first records state transaction in CLOG, then removes
transaction from procarray and finally releases locks.
But it can happen that transaction is marked as committed in CLOG,
XMAX_COMMITTED bit is set in modified tuple but
TransactionIdIsInProgress still returns true. As a result "t_xmin %u is
uncommitted in tuple (%u,%u) to be updated in table"
error is reported.

This is exactly why the HeapTupleSatisfiesFOO() functions all test
TransactionIdIsInProgress() first and TransactionIdDidCommit() only if
it returns false. I don't think it's really legal to test
TransactionIdDidCommit() without checking TransactionIdIsInProgress()
first. And I bet that's exactly what this code is doing...

--
Robert Haas
EDB: http://www.enterprisedb.com

#5Andres Freund
andres@anarazel.de
In reply to: Robert Haas (#4)
Re: Race condition in TransactionIdIsInProgress

Hi,

On 2022-02-10 15:21:27 -0500, Robert Haas wrote:

On Thu, Feb 10, 2022 at 11:46 AM Konstantin Knizhnik <knizhnik@garret.ru> wrote:

Postgres first records state transaction in CLOG, then removes
transaction from procarray and finally releases locks.
But it can happen that transaction is marked as committed in CLOG,
XMAX_COMMITTED bit is set in modified tuple but
TransactionIdIsInProgress still returns true. As a result "t_xmin %u is
uncommitted in tuple (%u,%u) to be updated in table"
error is reported.

This is exactly why the HeapTupleSatisfiesFOO() functions all test
TransactionIdIsInProgress() first and TransactionIdDidCommit() only if
it returns false. I don't think it's really legal to test
TransactionIdDidCommit() without checking TransactionIdIsInProgress()
first. And I bet that's exactly what this code is doing...

The problem is that the TransactionIdDidAbort() call is coming from within
TransactionIdIsInProgress().

If I understand correctly, the problem is that the xid cache stuff doesn't
correctly work for subtransactions and breaks TransactionIdIsInProgress().

For the problem to occur you need an xid that is for a subtransaction that
suboverflowed and that is 'clog committed' but not 'procarray committed'.

If that xid is tested twice with TransactionIdIsInProgress(), you will get a
bogus result on the second call within the same session.

The first TransactionIdIsInProgress() will go to
/*
* Step 4: have to check pg_subtrans.
*
* At this point, we know it's either a subtransaction of one of the Xids
* in xids[], or it's not running. If it's an already-failed
* subtransaction, we want to say "not running" even though its parent may
* still be running. So first, check pg_xact to see if it's been aborted.
*/
xc_slow_answer_inc();

if (TransactionIdDidAbort(xid))
return false;

and fetch that xid. The TransactionLogFetch() will set cachedFetchXid, because
the xid's status is TRANSACTION_STATUS_COMMITTED. Because the transaction
didn't abort TransactionIdIsInProgress() will go onto the the following loop,
see the toplevel xid is in progress in the procarray and return true.

The second call to TransactionIdIsInProgress() will see that the fetched
transaction matches the cached transactionid and return false.

/*
* We may have just checked the status of this transaction, so if it is
* already known to be completed, we can fall out without any access to
* shared memory.
*/
if (TransactionIdIsKnownCompleted(xid))
{
xc_by_known_xact_inc();
return false;
}

Without any improper uses of TransactionIdDidAbort()/TransactionIdDidCommit()
before TransactionIdIsInProgress(). And other sessions will still return
true. And even *this* session can return true again, if another transaction is
checked that ends up caching another transaction status in cachedFetchXid.

It looks to me like the TransactionIdIsKnownCompleted() path in
TransactionIdIsInProgress() is just entirely broken. An prior
TransactionLogFetch() can't be allowed to make subsequent
TransactionIdIsInProgress() calls return wrong values.

If I understand the problem correctly, a SELECT
pg_xact_status(clog-committed-pgproc-in-progres) can make
TransactionIdIsInProgress() return wrong values too. It's not just
TransactionIdIsInProgress() itself.

Looks like this problem goes back all the way back to

commit 611b4393f22f2bb43135501cd6b7591299b6b453
Author: Tom Lane <tgl@sss.pgh.pa.us>
Date: 2008-03-11 20:20:35 +0000

Make TransactionIdIsInProgress check transam.c's single-item XID status cache
before it goes groveling through the ProcArray. In situations where the same
recently-committed transaction ID is checked repeatedly by tqual.c, this saves
a lot of shared-memory searches. And it's cheap enough that it shouldn't
hurt noticeably when it doesn't help.
Concept and patch by Simon, some minor tweaking and comment-cleanup by Tom.

I think this may actually mean that the hot corruption problem fixed in

commit 18b87b201f7
Author: Andres Freund <andres@anarazel.de>
Date: 2021-12-10 20:12:26 -0800

Fix possible HOT corruption when RECENTLY_DEAD changes to DEAD while pruning.

for 14/master is present in older branches too :(. Need to trace through the
HTSV and pruning logic with a bit more braincells than currently available to
be sure.

Greetings,

Andres Freund

#6Andres Freund
andres@anarazel.de
In reply to: Andres Freund (#5)
Re: Race condition in TransactionIdIsInProgress

Hi,

On 2022-02-10 21:56:09 -0800, Andres Freund wrote:

I think this may actually mean that the hot corruption problem fixed in

commit 18b87b201f7
Author: Andres Freund <andres@anarazel.de>
Date: 2021-12-10 20:12:26 -0800

Fix possible HOT corruption when RECENTLY_DEAD changes to DEAD while pruning.

for 14/master is present in older branches too :(. Need to trace through the
HTSV and pruning logic with a bit more braincells than currently available to
be sure.

On second thought, there's probably sufficiently more direct corruption this
can cause than corruption via hot pruning. Not that that's not a problem, but
... Inconsistent TransactionIdIsInProgress() result can wreak havoc quite
broadly.

Looks lik syncrep will make this a lot worse, because it can drastically
increase the window between the TransactionIdCommitTree() and
ProcArrayEndTransaction() due to the SyncRepWaitForLSN() inbetween. But at
least it might make it easier to write tests exercising this scenario...

Greetings,

Andres Freund

#7Simon Riggs
simon.riggs@enterprisedb.com
In reply to: Andres Freund (#6)
Re: Race condition in TransactionIdIsInProgress

On Fri, 11 Feb 2022 at 06:11, Andres Freund <andres@anarazel.de> wrote:

Looks lik syncrep will make this a lot worse, because it can drastically
increase the window between the TransactionIdCommitTree() and
ProcArrayEndTransaction() due to the SyncRepWaitForLSN() inbetween. But at
least it might make it easier to write tests exercising this scenario...

Agreed

TransactionIdIsKnownCompleted(xid) is only broken because the single
item cache is set too early in some cases. The single item cache is
important for performance, so we just need to be more careful about
setting the cache.

--
Simon Riggs http://www.EnterpriseDB.com/

#8Simon Riggs
simon.riggs@enterprisedb.com
In reply to: Simon Riggs (#7)
2 attachment(s)
Re: Race condition in TransactionIdIsInProgress

On Fri, 11 Feb 2022 at 08:48, Simon Riggs <simon.riggs@enterprisedb.com> wrote:

On Fri, 11 Feb 2022 at 06:11, Andres Freund <andres@anarazel.de> wrote:

Looks lik syncrep will make this a lot worse, because it can drastically
increase the window between the TransactionIdCommitTree() and
ProcArrayEndTransaction() due to the SyncRepWaitForLSN() inbetween. But at
least it might make it easier to write tests exercising this scenario...

Agreed

TransactionIdIsKnownCompleted(xid) is only broken because the single
item cache is set too early in some cases. The single item cache is
important for performance, so we just need to be more careful about
setting the cache.

Something like this... fix_cachedFetchXid.v1.patch prevents the cache
being set, but this fails! Not worked out why, yet.

just_remove_TransactionIdIsKnownCompleted_call.v1.patch
just removes the known offending call, passes make check, but IMHO
leaves the same error just as likely by other callers.

--
Simon Riggs http://www.EnterpriseDB.com/

Attachments:

fix_cachedFetchXid.v1.patchapplication/octet-stream; name=fix_cachedFetchXid.v1.patchDownload
diff --git a/src/backend/access/transam/transam.c b/src/backend/access/transam/transam.c
index dbc5f884e8..c99646fa6f 100644
--- a/src/backend/access/transam/transam.c
+++ b/src/backend/access/transam/transam.c
@@ -73,6 +73,14 @@ TransactionLogFetch(TransactionId transactionId)
 		return TRANSACTION_STATUS_ABORTED;
 	}
 
+	/*
+	 * Safeguard that we have called TransactionIsIdInProgress() before
+	 * checking commit log manager, to ensure that we do not cache the
+	 * result until the xid is no longer in the procarray at eoxact.
+	 */
+	if (!TransactionIdKnownNotInProgress(transactionId))
+		return TRANSACTION_STATUS_IN_PROGRESS;
+
 	/*
 	 * Get the transaction status.
 	 */
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index 13d192ec2b..a47ba74c68 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -262,6 +262,11 @@ static ProcArrayStruct *procArray;
 
 static PGPROC *allProcs;
 
+/*
+ * Cache to reduce overhead of repeated calls to TransactionIdIsInProgress()
+ */
+static TransactionId cachedXidIsNotInProgress = InvalidTransactionId;
+
 /*
  * Bookkeeping for tracking emulated transactions in recovery
  */
@@ -1400,7 +1405,7 @@ TransactionIdIsInProgress(TransactionId xid)
 	 * already known to be completed, we can fall out without any access to
 	 * shared memory.
 	 */
-	if (TransactionIdIsKnownCompleted(xid))
+	if (TransactionIdKnownNotInProgress(xid))
 	{
 		xc_by_known_xact_inc();
 		return false;
@@ -1558,6 +1563,7 @@ TransactionIdIsInProgress(TransactionId xid)
 	if (nxids == 0)
 	{
 		xc_no_overflow_inc();
+		cachedXidIsNotInProgress = xid;
 		return false;
 	}
 
@@ -1572,7 +1578,10 @@ TransactionIdIsInProgress(TransactionId xid)
 	xc_slow_answer_inc();
 
 	if (TransactionIdDidAbort(xid))
+	{
+		cachedXidIsNotInProgress = xid;
 		return false;
+	}
 
 	/*
 	 * It isn't aborted, so check whether the transaction tree it belongs to
@@ -1590,6 +1599,16 @@ TransactionIdIsInProgress(TransactionId xid)
 		}
 	}
 
+	cachedXidIsNotInProgress = xid;
+	return false;
+}
+
+bool
+TransactionIdKnownNotInProgress(TransactionId xid)
+{
+	if (TransactionIdEquals(cachedXidIsNotInProgress, xid))
+		return true;
+
 	return false;
 }
 
diff --git a/src/include/access/transam.h b/src/include/access/transam.h
index 9a2816de51..50a5ac13e5 100644
--- a/src/include/access/transam.h
+++ b/src/include/access/transam.h
@@ -289,6 +289,9 @@ extern TransactionId TransactionIdLatest(TransactionId mainxid,
 										 int nxids, const TransactionId *xids);
 extern XLogRecPtr TransactionIdGetCommitLSN(TransactionId xid);
 
+/* in procarray.c */
+extern bool TransactionIdKnownNotInProgress(TransactionId xid);
+
 /* in transam/varsup.c */
 extern FullTransactionId GetNewTransactionId(bool isSubXact);
 extern void AdvanceNextFullTransactionIdPastXid(TransactionId xid);
just_remove_TransactionIdIsKnownCompleted_call.v1.patchapplication/octet-stream; name=just_remove_TransactionIdIsKnownCompleted_call.v1.patchDownload
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index 13d192ec2b..a47ba74c68 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -262,6 +262,11 @@ static ProcArrayStruct *procArray;
 
 static PGPROC *allProcs;
 
+/*
+ * Cache to reduce overhead of repeated calls to TransactionIdIsInProgress()
+ */
+static TransactionId cachedXidIsNotInProgress = InvalidTransactionId;
+
 /*
  * Bookkeeping for tracking emulated transactions in recovery
  */
@@ -1400,7 +1405,7 @@ TransactionIdIsInProgress(TransactionId xid)
 	 * already known to be completed, we can fall out without any access to
 	 * shared memory.
 	 */
-	if (TransactionIdIsKnownCompleted(xid))
+	if (TransactionIdKnownNotInProgress(xid))
 	{
 		xc_by_known_xact_inc();
 		return false;
@@ -1558,6 +1563,7 @@ TransactionIdIsInProgress(TransactionId xid)
 	if (nxids == 0)
 	{
 		xc_no_overflow_inc();
+		cachedXidIsNotInProgress = xid;
 		return false;
 	}
 
@@ -1572,7 +1578,10 @@ TransactionIdIsInProgress(TransactionId xid)
 	xc_slow_answer_inc();
 
 	if (TransactionIdDidAbort(xid))
+	{
+		cachedXidIsNotInProgress = xid;
 		return false;
+	}
 
 	/*
 	 * It isn't aborted, so check whether the transaction tree it belongs to
@@ -1590,6 +1599,16 @@ TransactionIdIsInProgress(TransactionId xid)
 		}
 	}
 
+	cachedXidIsNotInProgress = xid;
+	return false;
+}
+
+bool
+TransactionIdKnownNotInProgress(TransactionId xid)
+{
+	if (TransactionIdEquals(cachedXidIsNotInProgress, xid))
+		return true;
+
 	return false;
 }
 
diff --git a/src/include/access/transam.h b/src/include/access/transam.h
index 9a2816de51..50a5ac13e5 100644
--- a/src/include/access/transam.h
+++ b/src/include/access/transam.h
@@ -289,6 +289,9 @@ extern TransactionId TransactionIdLatest(TransactionId mainxid,
 										 int nxids, const TransactionId *xids);
 extern XLogRecPtr TransactionIdGetCommitLSN(TransactionId xid);
 
+/* in procarray.c */
+extern bool TransactionIdKnownNotInProgress(TransactionId xid);
+
 /* in transam/varsup.c */
 extern FullTransactionId GetNewTransactionId(bool isSubXact);
 extern void AdvanceNextFullTransactionIdPastXid(TransactionId xid);
#9Andres Freund
andres@anarazel.de
In reply to: Simon Riggs (#8)
Re: Race condition in TransactionIdIsInProgress

Hi,

On 2022-02-11 13:48:59 +0000, Simon Riggs wrote:

On Fri, 11 Feb 2022 at 08:48, Simon Riggs <simon.riggs@enterprisedb.com> wrote:

On Fri, 11 Feb 2022 at 06:11, Andres Freund <andres@anarazel.de> wrote:

Looks lik syncrep will make this a lot worse, because it can drastically
increase the window between the TransactionIdCommitTree() and
ProcArrayEndTransaction() due to the SyncRepWaitForLSN() inbetween. But at
least it might make it easier to write tests exercising this scenario...

Agreed

TransactionIdIsKnownCompleted(xid) is only broken because the single
item cache is set too early in some cases. The single item cache is
important for performance, so we just need to be more careful about
setting the cache.

Something like this... fix_cachedFetchXid.v1.patch prevents the cache
being set, but this fails! Not worked out why, yet.

I don't think it's safe to check !TransactionIdKnownNotInProgress() in
TransactionLogFetch(), given that TransactionIdKnownNotInProgress() needs to
do TransactionLogFetch() internally.

just_remove_TransactionIdIsKnownCompleted_call.v1.patch

I think this might contain a different diff than what the name suggests?

just removes the known offending call, passes make check, but IMHO
leaves the same error just as likely by other callers.

Which other callers are you referring to?

To me it seems that the "real" reason behind this specific issue being
nontrivial to fix and behind the frequent error of calling
TransactionIdDidCommit() before TransactionIdIsInProgress() is
that we've really made a mess out of the transaction status determination code
/ API. IMO the original sin is requiring the complicated

if (TransactionIdIsCurrentTransactionId(xid)
...
else if (TransactionIdIsInProgress(xid)
...
else if (TransactionIdDidCommit(xid)
...

dance at pretty much any place checking transaction status. The multiple calls
for the same xid is, I think, what pushed the caching down to the
TransactionLogFetch level.

ISTM that we should introduce something like TransactionIdStatus(xid) that
returns
XACT_STATUS_CURRENT
XACT_STATUS_IN_PROGRESS
XACT_STATUS_COMMITTED
XACT_STATUS_ABORTED
accompanied by a TransactionIdStatusMVCC(xid, snapshot) that checks
XidInMVCCSnapshot() instead of TransactionIdIsInProgress().

I think that'd also make visibility checks faster - we spend a good chunk of
cycles doing unnecessary function calls and repeatedly gathering information.

It's also kind of weird that TransactionIdIsCurrentTransactionId() isn't more
optimized - TransactionIdIsInProgress() knows it doesn't need to check
anything before RecentXmin, but TransactionIdIsCurrentTransactionId()
doesn't. Nor does TransactionIdIsCurrentTransactionId() check if the xid is
smaller than GetTopTransactionIdIfAny() before bsearching through subxids.

Of course anything along these lines would never be backpatchable...

Greetings,

Andres Freund

#10Andres Freund
andres@anarazel.de
In reply to: Andres Freund (#6)
Re: Race condition in TransactionIdIsInProgress

Hi,

On 2022-02-10 22:11:38 -0800, Andres Freund wrote:

Looks lik syncrep will make this a lot worse, because it can drastically
increase the window between the TransactionIdCommitTree() and
ProcArrayEndTransaction() due to the SyncRepWaitForLSN() inbetween. But at
least it might make it easier to write tests exercising this scenario...

FWIW, I've indeed reproduced this fairly easily with such a setup. A pgbench
r/w workload that's been modified to start 70 savepoints at the start shows

pgbench: error: client 22 script 0 aborted in command 12 query 0: ERROR: t_xmin 3853739 is uncommitted in tuple (2,159) to be updated in table "pgbench_branches"
pgbench: error: client 13 script 0 aborted in command 12 query 0: ERROR: t_xmin 3954305 is uncommitted in tuple (2,58) to be updated in table "pgbench_branches"
pgbench: error: client 7 script 0 aborted in command 12 query 0: ERROR: t_xmin 4017908 is uncommitted in tuple (3,44) to be updated in table "pgbench_branches"

after a few minutes of running with a local, not slowed down, syncrep. Without
any other artifical slowdowns or such.

Greetings,

Andres Freund

#11Andres Freund
andres@anarazel.de
In reply to: Andres Freund (#10)
Re: Race condition in TransactionIdIsInProgress

Hi,

On 2022-02-11 16:41:24 -0800, Andres Freund wrote:

FWIW, I've indeed reproduced this fairly easily with such a setup. A pgbench
r/w workload that's been modified to start 70 savepoints at the start shows

pgbench: error: client 22 script 0 aborted in command 12 query 0: ERROR: t_xmin 3853739 is uncommitted in tuple (2,159) to be updated in table "pgbench_branches"
pgbench: error: client 13 script 0 aborted in command 12 query 0: ERROR: t_xmin 3954305 is uncommitted in tuple (2,58) to be updated in table "pgbench_branches"
pgbench: error: client 7 script 0 aborted in command 12 query 0: ERROR: t_xmin 4017908 is uncommitted in tuple (3,44) to be updated in table "pgbench_branches"

after a few minutes of running with a local, not slowed down, syncrep. Without
any other artifical slowdowns or such.

And this can easily be triggered even without subtransactions, in a completely
reliable way.

The only reason I'm so far not succeeding in turning it into an
isolationtester spec is that a transaction waiting for SyncRep doesn't count
as waiting for isolationtester.

Basically

S1: BEGIN; $xid = txid_current(); UPDATE; COMMIT; <commit wait for syncrep>
S2: SELECT pg_xact_status($xid);
S2: UPDATE;

suffices, because the pg_xact_status() causes an xlog fetch, priming the xid
cache, which then causes the TransactionIdIsInProgress() to take the early
return path, despite the transaction still being in progress. Which then
allows the update to proceed, despite the S1 not having "properly committed"
yet.

Greetings,

Andres Freund

#12Simon Riggs
simon.riggs@enterprisedb.com
In reply to: Andres Freund (#9)
Re: Race condition in TransactionIdIsInProgress

On Fri, 11 Feb 2022 at 19:08, Andres Freund <andres@anarazel.de> wrote:

On 2022-02-11 13:48:59 +0000, Simon Riggs wrote:

On Fri, 11 Feb 2022 at 08:48, Simon Riggs <simon.riggs@enterprisedb.com> wrote:

On Fri, 11 Feb 2022 at 06:11, Andres Freund <andres@anarazel.de> wrote:

Looks lik syncrep will make this a lot worse, because it can drastically
increase the window between the TransactionIdCommitTree() and
ProcArrayEndTransaction() due to the SyncRepWaitForLSN() inbetween. But at
least it might make it easier to write tests exercising this scenario...

Agreed

TransactionIdIsKnownCompleted(xid) is only broken because the single
item cache is set too early in some cases. The single item cache is
important for performance, so we just need to be more careful about
setting the cache.

Something like this... fix_cachedFetchXid.v1.patch prevents the cache
being set, but this fails! Not worked out why, yet.

I don't think it's safe to check !TransactionIdKnownNotInProgress() in
TransactionLogFetch(), given that TransactionIdKnownNotInProgress() needs to
do TransactionLogFetch() internally.

That's not correct because you're confusing
TransactionIdKnownNotInProgress(), which is a new function introduced
by the patch, with the existing function
TransactionIdIsKnownCompleted().

just_remove_TransactionIdIsKnownCompleted_call.v1.patch

I think this might contain a different diff than what the name suggests?

Not at all, please don't be confused by the name. The patch removes
the call to TransactionIdIsKnownCompleted() from
TransactionIdIsInProgress().

I'm not sure it is possible to remove TransactionIdIsKnownCompleted()
in backbranches.

just removes the known offending call, passes make check, but IMHO
leaves the same error just as likely by other callers.

Which other callers are you referring to?

I don't know of any, but we can't just remove a function in a
backbranch, can we?

To me it seems that the "real" reason behind this specific issue being
nontrivial to fix and behind the frequent error of calling
TransactionIdDidCommit() before TransactionIdIsInProgress() is
that we've really made a mess out of the transaction status determination code
/ API. IMO the original sin is requiring the complicated

if (TransactionIdIsCurrentTransactionId(xid)
...
else if (TransactionIdIsInProgress(xid)
...
else if (TransactionIdDidCommit(xid)
...

dance at pretty much any place checking transaction status.

Agreed, it's pretty weird to have to call the functions in the right
order or you get the wrong answer. Especially since we have no
cross-check to ensure the correct sequence was followed.

The multiple calls
for the same xid is, I think, what pushed the caching down to the
TransactionLogFetch level.

ISTM that we should introduce something like TransactionIdStatus(xid) that
returns
XACT_STATUS_CURRENT
XACT_STATUS_IN_PROGRESS
XACT_STATUS_COMMITTED
XACT_STATUS_ABORTED
accompanied by a TransactionIdStatusMVCC(xid, snapshot) that checks
XidInMVCCSnapshot() instead of TransactionIdIsInProgress().

I think that'd also make visibility checks faster - we spend a good chunk of
cycles doing unnecessary function calls and repeatedly gathering information.

It's also kind of weird that TransactionIdIsCurrentTransactionId() isn't more
optimized - TransactionIdIsInProgress() knows it doesn't need to check
anything before RecentXmin, but TransactionIdIsCurrentTransactionId()
doesn't. Nor does TransactionIdIsCurrentTransactionId() check if the xid is
smaller than GetTopTransactionIdIfAny() before bsearching through subxids.

Of course anything along these lines would never be backpatchable...

Agreed

I've presented a simple patch intended for backpatching. You may not
like the name, but please have another look at
"just_remove_TransactionIdIsKnownCompleted_call.v1.patch".
I believe it is backpatchable with minimal impact and without loss of
performance.

--
Simon Riggs http://www.EnterpriseDB.com/

#13Andres Freund
andres@anarazel.de
In reply to: Simon Riggs (#12)
Re: Race condition in TransactionIdIsInProgress

Hi,

On 2022-02-12 13:25:58 +0000, Simon Riggs wrote:

On Fri, 11 Feb 2022 at 19:08, Andres Freund <andres@anarazel.de> wrote:

On 2022-02-11 13:48:59 +0000, Simon Riggs wrote:

On Fri, 11 Feb 2022 at 08:48, Simon Riggs <simon.riggs@enterprisedb.com> wrote:

On Fri, 11 Feb 2022 at 06:11, Andres Freund <andres@anarazel.de> wrote:

Looks lik syncrep will make this a lot worse, because it can drastically
increase the window between the TransactionIdCommitTree() and
ProcArrayEndTransaction() due to the SyncRepWaitForLSN() inbetween. But at
least it might make it easier to write tests exercising this scenario...

Agreed

TransactionIdIsKnownCompleted(xid) is only broken because the single
item cache is set too early in some cases. The single item cache is
important for performance, so we just need to be more careful about
setting the cache.

Something like this... fix_cachedFetchXid.v1.patch prevents the cache
being set, but this fails! Not worked out why, yet.

I don't think it's safe to check !TransactionIdKnownNotInProgress() in
TransactionLogFetch(), given that TransactionIdKnownNotInProgress() needs to
do TransactionLogFetch() internally.

That's not correct because you're confusing
TransactionIdKnownNotInProgress(), which is a new function introduced
by the patch, with the existing function
TransactionIdIsKnownCompleted().

I don't think so. This call of the new TransactionIdKnownNotInProgress()

--- a/src/backend/access/transam/transam.c
+++ b/src/backend/access/transam/transam.c
@@ -73,6 +73,14 @@ TransactionLogFetch(TransactionId transactionId)
return TRANSACTION_STATUS_ABORTED;
}
+	/*
+	 * Safeguard that we have called TransactionIsIdInProgress() before
+	 * checking commit log manager, to ensure that we do not cache the
+	 * result until the xid is no longer in the procarray at eoxact.
+	 */
+	if (!TransactionIdKnownNotInProgress(transactionId))
+		return TRANSACTION_STATUS_IN_PROGRESS;
+
/*
* Get the transaction status.
*/

isn't right. Consider what happens to TransactionIdIsInProgress(x)'s call to
TransactionIdDidAbort(x) at "step 4". TransactionIdDidAbort(x) will call
TransactionLogFetch(x). cachedXidIsNotInProgress isn't yet set to x, so
TransactionLogFetch() returns TRANSACTION_STATUS_IN_PROGRESS. Even if the
(sub)transaction aborted.

Which explains why your first patch doesn't work.

just_remove_TransactionIdIsKnownCompleted_call.v1.patch

I think this might contain a different diff than what the name suggests?

Not at all, please don't be confused by the name. The patch removes
the call to TransactionIdIsKnownCompleted() from
TransactionIdIsInProgress().

I guess for me "just remove" doesn't include adding a new cache...

I'm not sure it is possible to remove TransactionIdIsKnownCompleted()
in backbranches.

I think it'd be fine if we needed to. Or we could just make it always return
false or such.

just removes the known offending call, passes make check, but IMHO
leaves the same error just as likely by other callers.

Which other callers are you referring to?

I don't know of any, but we can't just remove a function in a
backbranch, can we?

We have in the past, if it's a "sufficiently internal" function.

Greetings,

Andres Freund

#14Heikki Linnakangas
hlinnaka@iki.fi
In reply to: Andres Freund (#11)
1 attachment(s)
Re: Race condition in TransactionIdIsInProgress

On 12/02/2022 05:42, Andres Freund wrote:

On 2022-02-11 16:41:24 -0800, Andres Freund wrote:

FWIW, I've indeed reproduced this fairly easily with such a setup. A pgbench
r/w workload that's been modified to start 70 savepoints at the start shows

pgbench: error: client 22 script 0 aborted in command 12 query 0: ERROR: t_xmin 3853739 is uncommitted in tuple (2,159) to be updated in table "pgbench_branches"
pgbench: error: client 13 script 0 aborted in command 12 query 0: ERROR: t_xmin 3954305 is uncommitted in tuple (2,58) to be updated in table "pgbench_branches"
pgbench: error: client 7 script 0 aborted in command 12 query 0: ERROR: t_xmin 4017908 is uncommitted in tuple (3,44) to be updated in table "pgbench_branches"

after a few minutes of running with a local, not slowed down, syncrep. Without
any other artifical slowdowns or such.

And this can easily be triggered even without subtransactions, in a completely
reliable way.

The only reason I'm so far not succeeding in turning it into an
isolationtester spec is that a transaction waiting for SyncRep doesn't count
as waiting for isolationtester.

Basically

S1: BEGIN; $xid = txid_current(); UPDATE; COMMIT; <commit wait for syncrep>
S2: SELECT pg_xact_status($xid);
S2: UPDATE;

suffices, because the pg_xact_status() causes an xlog fetch, priming the xid
cache, which then causes the TransactionIdIsInProgress() to take the early
return path, despite the transaction still being in progress. Which then
allows the update to proceed, despite the S1 not having "properly committed"
yet.

I started to improve isolationtester, to allow the spec file to specify
a custom query to check for whether the backend is blocked. But then I
realized that it's not quite enough: there is currently no way write the
"$xid = txid_current()" step either. Isolationtester doesn't have
variables like that. Also, you need to set synchronous_standby_names to
make it block, which seems a bit weird in an isolation test.

I wish we had an explicit way to inject delays or failures. We discussed
it before
(/messages/by-id/CANXE4TdxdESX1jKw48xet-5GvBFVSq=4cgNeioTQff372KO45A@mail.gmail.com),
but I don't want to pick up that task right now.

Anyway, I wrote a TAP test for this instead. See attached. I'm not sure
if this is worth committing, the pg_xact_status() trick is quite special
for this particular bug.

Simon's just_remove_TransactionIdIsKnownCompleted_call.v1.patch looks
good to me. Replying to the discussion on that:

On 12/02/2022 23:50, Andres Freund wrote:

On 2022-02-12 13:25:58 +0000, Simon Riggs wrote:

I'm not sure it is possible to remove TransactionIdIsKnownCompleted()
in backbranches.

I think it'd be fine if we needed to. Or we could just make it always return
false or such.

just removes the known offending call, passes make check, but IMHO
leaves the same error just as likely by other callers.

Which other callers are you referring to?

I don't know of any, but we can't just remove a function in a
backbranch, can we?

We have in the past, if it's a "sufficiently internal" function.

I think we should remove it in v15, and leave it as it is in
backbranches. Just add a comment to point out that the name is a bit
misleading, because it checks the clog rather than the proc array. It's
not inherently dangerous, and someone might have a legit use case for
it. True, someone might also be doing this incorrect thing with it, but
still seems like the right balance to me.

I think we also need to change pg_xact_status(), to also call
TransactionIdIsInProgress() before TransactionIdDidCommit/Abort().
Currently, if it returns "committed", and you start a new transaction,
the new transaction might not yet see the effects of that "committed"
transaction. With that, you cannot reproduce the original bug with
pg_xact_status() though, so there's no point in committing that test then.

In summary, I think we should:
- commit and backpatch Simon's
just_remove_TransactionIdIsKnownCompleted_call.v1.patch
- fix pg_xact_status() to check TransactionIdIsInProgress()
- in v15, remove TransationIdIsKnownCompleted function altogether

I'll try to get around to that in the next few days, unless someone
beats me to it.

- Heikki

Attachments:

add-test-for-xid-cache-bug-1.patchtext/x-patch; charset=UTF-8; name=add-test-for-xid-cache-bug-1.patchDownload
diff --git a/src/test/modules/test_misc/t/004_bug_xact_inprogress.pl b/src/test/modules/test_misc/t/004_bug_xact_inprogress.pl
new file mode 100644
index 0000000000..8a651c0c60
--- /dev/null
+++ b/src/test/modules/test_misc/t/004_bug_xact_inprogress.pl
@@ -0,0 +1,86 @@
+# Test for visibility checks, when an updating transaction has already
+# written commit record but has not yet removed its entry from the
+# proc array, and thus isn't yet visible to other transactions.
+#
+# In particular, this reproduced an old bug in that, see:
+# https://www.postgresql.org/message-id/flat/4da7913d-398c-e2ad-d777-f752cf7f0bbb%40garret.ru
+
+use strict;
+use warnings;
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More;
+
+my $node = PostgreSQL::Test::Cluster->new('main');
+$node->init(allows_streaming => 1);
+$node->start;
+
+# Create test table
+$node->safe_psql('postgres', 'CREATE TABLE xact_test (t text)');
+$node->safe_psql('postgres', "INSERT INTO xact_test VALUES ('');");
+
+# Launch a psql session in the background that UPDATEs the row and
+# commits, but the COMMIT blocks waiting for synchronous replication.
+#
+# We don't have a replica configured, so it will hang forever.
+$node->safe_psql('postgres',
+				 "ALTER SYSTEM SET synchronous_standby_names = 'nonexistent';");
+$node->reload;
+
+my $in  = '';
+my $out = '';
+my $timeout = IPC::Run::timeout($PostgreSQL::Test::Utils::timeout_default);
+
+my $h = $node->background_psql('postgres', \$in, \$out, $timeout);
+
+# Read current xid, we'll need it later
+$out = '';
+$in = "BEGIN;\n";
+$in .= "show synchronous_standby_names;\n";
+$in .= "SELECT pg_backend_pid(), pg_current_xact_id();\n";
+pump_until($h, $timeout, \$out, qr/([[:digit:]]+)\|([[:digit:]]+)[\r\n]$/m);
+
+$out =~ /([[:digit:]]+)\|([[:digit:]]+)[\r\n]/;
+my ($pid, $xid) = ($1, $2);
+
+# Update and commit. The commit will block, waiting for the synchronous replica (which doesn't
+# exist).
+$in = "UPDATE xact_test SET t = 'first';\n";
+$in .= "COMMIT;\n";
+$h->pump_nb;
+
+# Make sure the COMMIT has progressed to the synchronous replication wait.
+my $wait_query = "SELECT true FROM pg_stat_activity WHERE pid = $pid AND wait_event = 'SyncRep'";
+$node->poll_query_until('postgres', $wait_query)
+  or die "Timed out while waiting for commit to reach SyncRep state";
+
+# Try to UPDATE the same row from another session.
+#
+# This is expected to block, waiting for the first updating transaction to finish.
+my ($ret, $stdout, $stderr) = $node->psql(
+	'postgres',
+	qq{
+\\set ECHO all
+SET lock_timeout=1;
+
+-- Prime the single-element CLOG lookup cache. (We used to have a bug where the
+-- single-element cache was used incorrectly in the visibility check)
+SELECT pg_xact_status(xid8 '$xid');
+
+-- This is expected to fail due to the lock timeout
+BEGIN;
+UPDATE xact_test SET t = t || ' second';
+
+-- Not reached, because the UPDATE should fail
+SELECT * from xact_test;
+ROLLBACK;
+});
+print "#### Begin standard output\n";
+print $stdout;
+print "\n#### End standard output\n";
+like($stderr, qr/canceling statement due to lock timeout/, "UPDATE fails due to lock timeout");
+
+$node->stop;
+$h->finish;
+
+done_testing();
#15Andres Freund
andres@anarazel.de
In reply to: Heikki Linnakangas (#14)
Re: Race condition in TransactionIdIsInProgress

Hi,

On 2022-06-23 22:03:27 +0300, Heikki Linnakangas wrote:

On 12/02/2022 05:42, Andres Freund wrote:

The only reason I'm so far not succeeding in turning it into an
isolationtester spec is that a transaction waiting for SyncRep doesn't count
as waiting for isolationtester.

Basically

S1: BEGIN; $xid = txid_current(); UPDATE; COMMIT; <commit wait for syncrep>
S2: SELECT pg_xact_status($xid);
S2: UPDATE;

suffices, because the pg_xact_status() causes an xlog fetch, priming the xid
cache, which then causes the TransactionIdIsInProgress() to take the early
return path, despite the transaction still being in progress. Which then
allows the update to proceed, despite the S1 not having "properly committed"
yet.

I started to improve isolationtester, to allow the spec file to specify a
custom query to check for whether the backend is blocked. But then I
realized that it's not quite enough: there is currently no way write the
"$xid = txid_current()" step either. Isolationtester doesn't have variables
like that. Also, you need to set synchronous_standby_names to make it block,
which seems a bit weird in an isolation test.

I don't think we strictly need $xid - it likely can be implemented via
something like
SELECT pg_xact_status(backend_xid) FROM pg_stat_activity WHERE application_name = 'testname/s1';

I wish we had an explicit way to inject delays or failures. We discussed it
before (/messages/by-id/CANXE4TdxdESX1jKw48xet-5GvBFVSq=4cgNeioTQff372KO45A@mail.gmail.com),
but I don't want to pick up that task right now.

Understandable :(

Anyway, I wrote a TAP test for this instead. See attached. I'm not sure if
this is worth committing, the pg_xact_status() trick is quite special for
this particular bug.

Yea, not sure either.

Simon's just_remove_TransactionIdIsKnownCompleted_call.v1.patch looks good
to me. Replying to the discussion on that:

On 12/02/2022 23:50, Andres Freund wrote:

On 2022-02-12 13:25:58 +0000, Simon Riggs wrote:

I'm not sure it is possible to remove TransactionIdIsKnownCompleted()
in backbranches.

I think it'd be fine if we needed to. Or we could just make it always return
false or such.

just removes the known offending call, passes make check, but IMHO
leaves the same error just as likely by other callers.

Which other callers are you referring to?

I don't know of any, but we can't just remove a function in a
backbranch, can we?

We have in the past, if it's a "sufficiently internal" function.

I think we should remove it in v15, and leave it as it is in backbranches.
Just add a comment to point out that the name is a bit misleading, because
it checks the clog rather than the proc array. It's not inherently
dangerous, and someone might have a legit use case for it. True, someone
might also be doing this incorrect thing with it, but still seems like the
right balance to me.

I think we also need to change pg_xact_status(), to also call
TransactionIdIsInProgress() before TransactionIdDidCommit/Abort().
Currently, if it returns "committed", and you start a new transaction, the
new transaction might not yet see the effects of that "committed"
transaction. With that, you cannot reproduce the original bug with
pg_xact_status() though, so there's no point in committing that test then.

In summary, I think we should:
- commit and backpatch Simon's
just_remove_TransactionIdIsKnownCompleted_call.v1.patch
- fix pg_xact_status() to check TransactionIdIsInProgress()
- in v15, remove TransationIdIsKnownCompleted function altogether

I'll try to get around to that in the next few days, unless someone beats me
to it.

Makes sense.

Greetings,

Andres Freund

#16Heikki Linnakangas
hlinnaka@iki.fi
In reply to: Andres Freund (#15)
1 attachment(s)
Re: Race condition in TransactionIdIsInProgress

On 24/06/2022 04:43, Andres Freund wrote:

On 2022-06-23 22:03:27 +0300, Heikki Linnakangas wrote:

In summary, I think we should:
- commit and backpatch Simon's
just_remove_TransactionIdIsKnownCompleted_call.v1.patch
- fix pg_xact_status() to check TransactionIdIsInProgress()
- in v15, remove TransationIdIsKnownCompleted function altogether

I'll try to get around to that in the next few days, unless someone beats me
to it.

Makes sense.

This is what I came up with for master. One difference from Simon's
original patch is that I decided to not expose the
TransactionIdIsKnownNotInProgress(), as there are no other callers of it
in core, and it doesn't seem useful to extensions. I inlined it into the
caller instead.

BTW, should we worry about XID wraparound with the cache? Could you have
a backend sit idle for 2^32 transactions, without making any
TransactionIdIsKnownNotInProgress() calls? That's not new with this
patch, though, it could happen with the single-item cache in transam.c, too.

- Heikki

Attachments:

0001-Fix-visibility-check-when-XID-is-committed-in-CLOG-b.patchtext/x-patch; charset=UTF-8; name=0001-Fix-visibility-check-when-XID-is-committed-in-CLOG-b.patchDownload
From de81639af2ce0ed635778d4f7cafb3dfc4fd84f5 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Sat, 25 Jun 2022 12:02:39 +0300
Subject: [PATCH 1/1] Fix visibility check when XID is committed in CLOG but
 not in procarray.

TransactionIdIsInProgress had a fast path to return 'false' if the
single-item CLOG cache said that the transaction was known to be
committed. However, that was wrong, because a transaction is first
marked as committed in the CLOG but doesn't become visible to others
until it has removed its XID from the proc array. That could lead to an
error:

    ERROR:  t_xmin is uncommitted in tuple to be updated

or for an UPDATE to go ahead without blocking, before the previous
UPDATE on the same row was made visible.

The window is usually very short, but synchronous replication makes it
much wider, because the wait for synchronous replica happens in that
window.

Another thing that makes it hard to hit is that it's hard to get such
a commit-in-progress transaction into the single item CLOG cache.
Normally, if you call TransactionIdIsInProgress on such a transaction,
it determines that the XID is in progress without checking the CLOG
and without populating the cache. One way to prime the cache is to
explicitly call pg_xact_status() on the XID. Another way is use a lot
of subtransactions, so that the subxid cache in the proc array is
overflown, making TransactionIdIsInProgress rely on CLOG checks.

This has been broken ever since it was introduced in 2008, but the race
condition is very hard to hit, especially without synchronous
replication.  There were a couple of reports of this starting from
summer 2021, but no one was able to find the root cause then.

TransactionIdIsKnownCompleted() is now unused. In 'master', remove it,
but I left in place in backbranches in case it's used by extensions.

Also change pg_xact_status() to check TransactionIdIsInProgress().
Previously, it only checked the CLOG, and returned "committed" before
the transaction was actually made visible to other queries. Note that
this also means that you cannot use pg_xact_status() to reproduce the
bug anymore, even if the code wasn't fixed.

Report and analysis by Konstantin Knizhnik. Patch by Simon Riggs, with
the pg_xact_status() change added by me.

Author: Simon Riggs
Reviewed-by: Andres Freund
Discussion: https://www.postgresql.org/message-id/flat/4da7913d-398c-e2ad-d777-f752cf7f0bbb%40garret.ru
---
 src/backend/access/transam/transam.c | 27 -------------------------
 src/backend/storage/ipc/procarray.c  | 12 ++++++++++-
 src/backend/utils/adt/xid8funcs.c    | 30 +++++++++++-----------------
 src/include/access/transam.h         |  1 -
 4 files changed, 23 insertions(+), 47 deletions(-)

diff --git a/src/backend/access/transam/transam.c b/src/backend/access/transam/transam.c
index dbc5f884e8..5865810135 100644
--- a/src/backend/access/transam/transam.c
+++ b/src/backend/access/transam/transam.c
@@ -219,33 +219,6 @@ TransactionIdDidAbort(TransactionId transactionId)
 	return false;
 }
 
-/*
- * TransactionIdIsKnownCompleted
- *		True iff transaction associated with the identifier is currently
- *		known to have either committed or aborted.
- *
- * This does NOT look into pg_xact but merely probes our local cache
- * (and so it's not named TransactionIdDidComplete, which would be the
- * appropriate name for a function that worked that way).  The intended
- * use is just to short-circuit TransactionIdIsInProgress calls when doing
- * repeated heapam_visibility.c checks for the same XID.  If this isn't
- * extremely fast then it will be counterproductive.
- *
- * Note:
- *		Assumes transaction identifier is valid.
- */
-bool
-TransactionIdIsKnownCompleted(TransactionId transactionId)
-{
-	if (TransactionIdEquals(transactionId, cachedFetchXid))
-	{
-		/* If it's in the cache at all, it must be completed. */
-		return true;
-	}
-
-	return false;
-}
-
 /*
  * TransactionIdCommitTree
  *		Marks the given transaction and children as committed
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index 4da53a5b3f..dadaa958a8 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -261,6 +261,11 @@ static ProcArrayStruct *procArray;
 
 static PGPROC *allProcs;
 
+/*
+ * Cache to reduce overhead of repeated calls to TransactionIdIsInProgress()
+ */
+static TransactionId cachedXidIsNotInProgress = InvalidTransactionId;
+
 /*
  * Bookkeeping for tracking emulated transactions in recovery
  */
@@ -1396,7 +1401,7 @@ TransactionIdIsInProgress(TransactionId xid)
 	 * already known to be completed, we can fall out without any access to
 	 * shared memory.
 	 */
-	if (TransactionIdIsKnownCompleted(xid))
+	if (TransactionIdEquals(cachedXidIsNotInProgress, xid))
 	{
 		xc_by_known_xact_inc();
 		return false;
@@ -1554,6 +1559,7 @@ TransactionIdIsInProgress(TransactionId xid)
 	if (nxids == 0)
 	{
 		xc_no_overflow_inc();
+		cachedXidIsNotInProgress = xid;
 		return false;
 	}
 
@@ -1568,7 +1574,10 @@ TransactionIdIsInProgress(TransactionId xid)
 	xc_slow_answer_inc();
 
 	if (TransactionIdDidAbort(xid))
+	{
+		cachedXidIsNotInProgress = xid;
 		return false;
+	}
 
 	/*
 	 * It isn't aborted, so check whether the transaction tree it belongs to
@@ -1586,6 +1595,7 @@ TransactionIdIsInProgress(TransactionId xid)
 		}
 	}
 
+	cachedXidIsNotInProgress = xid;
 	return false;
 }
 
diff --git a/src/backend/utils/adt/xid8funcs.c b/src/backend/utils/adt/xid8funcs.c
index 6c57ec3d35..81ca6cfdd0 100644
--- a/src/backend/utils/adt/xid8funcs.c
+++ b/src/backend/utils/adt/xid8funcs.c
@@ -36,6 +36,7 @@
 #include "miscadmin.h"
 #include "postmaster/postmaster.h"
 #include "storage/lwlock.h"
+#include "storage/procarray.h"
 #include "utils/builtins.h"
 #include "utils/memutils.h"
 #include "utils/snapmgr.h"
@@ -676,29 +677,22 @@ pg_xact_status(PG_FUNCTION_ARGS)
 	{
 		Assert(TransactionIdIsValid(xid));
 
-		if (TransactionIdIsCurrentTransactionId(xid))
+		/*
+		 * Like when doing visiblity checks on a row, check whether the transaction
+		 * is still in progress before looking into the CLOG. Otherwise we
+		 * would incorrectly return "committed" for a transaction that is
+		 * committing and has already updated the CLOG, but hasn't removed its XID
+		 * from the proc array yet. (See comment on that race condition at
+		 * the top of heapam_visibility.c)
+		 */
+		if (TransactionIdIsInProgress(xid))
 			status = "in progress";
 		else if (TransactionIdDidCommit(xid))
 			status = "committed";
-		else if (TransactionIdDidAbort(xid))
-			status = "aborted";
 		else
 		{
-			/*
-			 * The xact is not marked as either committed or aborted in clog.
-			 *
-			 * It could be a transaction that ended without updating clog or
-			 * writing an abort record due to a crash. We can safely assume
-			 * it's aborted if it isn't committed and is older than our
-			 * snapshot xmin.
-			 *
-			 * Otherwise it must be in-progress (or have been at the time we
-			 * checked commit/abort status).
-			 */
-			if (TransactionIdPrecedes(xid, GetActiveSnapshot()->xmin))
-				status = "aborted";
-			else
-				status = "in progress";
+			/* it must have aborted or crashed */
+			status = "aborted";
 		}
 	}
 	else
diff --git a/src/include/access/transam.h b/src/include/access/transam.h
index 338dfca5a0..775471d2a7 100644
--- a/src/include/access/transam.h
+++ b/src/include/access/transam.h
@@ -273,7 +273,6 @@ extern PGDLLIMPORT VariableCache ShmemVariableCache;
  */
 extern bool TransactionIdDidCommit(TransactionId transactionId);
 extern bool TransactionIdDidAbort(TransactionId transactionId);
-extern bool TransactionIdIsKnownCompleted(TransactionId transactionId);
 extern void TransactionIdCommitTree(TransactionId xid, int nxids, TransactionId *xids);
 extern void TransactionIdAsyncCommitTree(TransactionId xid, int nxids, TransactionId *xids, XLogRecPtr lsn);
 extern void TransactionIdAbortTree(TransactionId xid, int nxids, TransactionId *xids);
-- 
2.30.2

#17Simon Riggs
simon.riggs@enterprisedb.com
In reply to: Heikki Linnakangas (#16)
Re: Race condition in TransactionIdIsInProgress

On Sat, 25 Jun 2022 at 10:18, Heikki Linnakangas <hlinnaka@iki.fi> wrote:

On 24/06/2022 04:43, Andres Freund wrote:

On 2022-06-23 22:03:27 +0300, Heikki Linnakangas wrote:

In summary, I think we should:
- commit and backpatch Simon's
just_remove_TransactionIdIsKnownCompleted_call.v1.patch
- fix pg_xact_status() to check TransactionIdIsInProgress()
- in v15, remove TransationIdIsKnownCompleted function altogether

I'll try to get around to that in the next few days, unless someone beats me
to it.

Makes sense.

This is what I came up with for master. One difference from Simon's
original patch is that I decided to not expose the
TransactionIdIsKnownNotInProgress(), as there are no other callers of it
in core, and it doesn't seem useful to extensions. I inlined it into the
caller instead.

Looks good, thanks.

BTW, should we worry about XID wraparound with the cache? Could you have
a backend sit idle for 2^32 transactions, without making any
TransactionIdIsKnownNotInProgress() calls? That's not new with this
patch, though, it could happen with the single-item cache in transam.c, too.

Yes, as a separate patch only in PG15+, imho.

--
Simon Riggs http://www.EnterpriseDB.com/

#18Heikki Linnakangas
hlinnaka@iki.fi
In reply to: Simon Riggs (#17)
Re: Race condition in TransactionIdIsInProgress

On 25/06/2022 13:10, Simon Riggs wrote:

On Sat, 25 Jun 2022 at 10:18, Heikki Linnakangas <hlinnaka@iki.fi> wrote:

On 24/06/2022 04:43, Andres Freund wrote:

On 2022-06-23 22:03:27 +0300, Heikki Linnakangas wrote:

In summary, I think we should:
- commit and backpatch Simon's
just_remove_TransactionIdIsKnownCompleted_call.v1.patch
- fix pg_xact_status() to check TransactionIdIsInProgress()
- in v15, remove TransationIdIsKnownCompleted function altogether

I'll try to get around to that in the next few days, unless someone beats me
to it.

Makes sense.

This is what I came up with for master. One difference from Simon's
original patch is that I decided to not expose the
TransactionIdIsKnownNotInProgress(), as there are no other callers of it
in core, and it doesn't seem useful to extensions. I inlined it into the
caller instead.

Looks good, thanks.

Committed and backpatched. Thanks!

- Heikki