collect_corrupt_items_vacuum.patch
Hi hackers!
I’ve been working on this [ /messages/by-id/cfcca574-6967-c5ab-7dc3-2c82b6723b99@mail.ru ] bug. Finally, I’ve come up with the patch you can find attached. Basically what is does is rises a PROC_IN_VACUUM flag and resets it afterwards. I know this seems kinda crunchy and I hope you guys will give me some hints on where to continue. This [ /messages/by-id/20220218175119.7hwv7ksamfjwijbx@alap3.anarazel.de ] message contains reproduction script. Thank you very much in advance.
Kind regards,
Daniel Shelepanov
Attachments:
collect_corrupt_items_vacuum.patchtext/x-diff; name="=?UTF-8?B?Y29sbGVjdF9jb3JydXB0X2l0ZW1zX3ZhY3V1bS5wYXRjaA==?="Download
diff --git a/contrib/pg_visibility/pg_visibility.c b/contrib/pg_visibility/pg_visibility.c
index 53aaf61cb94..700ebeaa018 100644
--- a/contrib/pg_visibility/pg_visibility.c
+++ b/contrib/pg_visibility/pg_visibility.c
@@ -18,6 +18,7 @@
#include "funcapi.h"
#include "miscadmin.h"
#include "storage/bufmgr.h"
+#include "storage/proc.h"
#include "storage/procarray.h"
#include "storage/smgr.h"
#include "utils/rel.h"
@@ -562,12 +563,19 @@ collect_corrupt_items(Oid relid, bool all_visible, bool all_frozen)
Buffer vmbuffer = InvalidBuffer;
BufferAccessStrategy bstrategy = GetAccessStrategy(BAS_BULKREAD);
TransactionId OldestXmin = InvalidTransactionId;
+ uint8 oldStatusFlags;
rel = relation_open(relid, AccessShareLock);
/* Only some relkinds have a visibility map */
check_relation_relkind(rel);
+ LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
+ oldStatusFlags = MyProc->statusFlags;
+ MyProc->statusFlags |= PROC_IN_VACUUM;
+ ProcGlobal->statusFlags[MyProc->pgxactoff] = MyProc->statusFlags;
+ LWLockRelease(ProcArrayLock);
+
if (all_visible)
OldestXmin = GetOldestNonRemovableTransactionId(rel);
@@ -722,6 +730,11 @@ collect_corrupt_items(Oid relid, bool all_visible, bool all_frozen)
items->count = items->next;
items->next = 0;
+ LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
+ MyProc->statusFlags = oldStatusFlags;
+ ProcGlobal->statusFlags[MyProc->pgxactoff] = MyProc->statusFlags;
+ LWLockRelease(ProcArrayLock);
+
return items;
}
On Mon, Apr 4, 2022 at 4:51 AM Daniel Shelepanov <deniel1495@mail.ru> wrote:
I’ve been working on this [/messages/by-id/cfcca574-6967-c5ab-7dc3-2c82b6723b99@mail.ru%5D bug. Finally, I’ve come up with the patch you can find attached. Basically what is does is rises a PROC_IN_VACUUM flag and resets it afterwards. I know this seems kinda crunchy and I hope you guys will give me some hints on where to continue. This [/messages/by-id/20220218175119.7hwv7ksamfjwijbx@alap3.anarazel.de%5D message contains reproduction script. Thank you very much in advance.
I noticed the CommitFest entry for this thread today and decided to
take a look. I think the general issue here can be stated in this way:
suppose a VACUUM computes an all-visible cutoff X, i.e. it thinks all
committed XIDs < X are all-visible. Then, at a later time, pg_visible
computes an all-visible cutoff Y, i.e. it thinks all committed XIDs <
Y are all-visible. If Y < X, pg_check_visible() might falsely report
corruption, because VACUUM might have marked as all-visible some page
containing tuples which pg_check_visibile() thinks aren't really
all-visible.
In reality, the oldest all-visible XID cannot move backward, but
ComputeXidHorizons() lets it move backward, because it's intended for
use by a caller who wants to mark pages all-visible, and it's only
concerned with making sure that the value is old enough to be safe.
And that's a problem for the way that pg_visibility is (mis-)using it.
To say that another way, ComputeXidHorizons() is perfectly fine with
returning a value that is older than the true answer, as long as it
never returns a value that is newer than the new answer. pg_visibility
wants the opposite. Here, a value that is newer than the true value
can't do worse than hide corruption, which is sort of OK, but a value
that's older than the true value can report corruption where none
exists, which is very bad.
I have a feeling, therefore, that this isn't really a complete fix. I
think it might address one way for the horizon reported by
ComputeXidHorizons() to move backward, but not all the ways.
Unfortunately, I am out of time for today to study this... but will
try to find more time on another day.
--
Robert Haas
EDB: http://www.enterprisedb.com
Robert Haas <robertmhaas@gmail.com> writes:
In reality, the oldest all-visible XID cannot move backward, but
ComputeXidHorizons() lets it move backward, because it's intended for
use by a caller who wants to mark pages all-visible, and it's only
concerned with making sure that the value is old enough to be safe.
Right.
And that's a problem for the way that pg_visibility is (mis-)using it.
To say that another way, ComputeXidHorizons() is perfectly fine with
returning a value that is older than the true answer, as long as it
never returns a value that is newer than the new answer. pg_visibility
wants the opposite. Here, a value that is newer than the true value
can't do worse than hide corruption, which is sort of OK, but a value
that's older than the true value can report corruption where none
exists, which is very bad.
Maybe we need a different function for pg_visibility to call?
If we want ComputeXidHorizons to serve both these purposes, then it
has to always deliver exactly the right answer, which seems like
a definition that will be hard and expensive to achieve.
regards, tom lane
On Wed, Jul 27, 2022 at 5:56 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Maybe we need a different function for pg_visibility to call?
If we want ComputeXidHorizons to serve both these purposes, then it
has to always deliver exactly the right answer, which seems like
a definition that will be hard and expensive to achieve.
Yeah, I was thinking along similar lines.
I'm also kind of wondering why these calculations use
latestCompletedXid. Is that something we do solely to reduce locking?
The XIDs of running transactions matter, and their snapshots matter,
and the XIDs that could start running in the future matter, but I
don't know why it matters what the latest completed XID is.
--
Robert Haas
EDB: http://www.enterprisedb.com
On Wed, Jul 27, 2022 at 09:47:19PM -0400, Robert Haas wrote:
On Wed, Jul 27, 2022 at 5:56 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Maybe we need a different function for pg_visibility to call?
If we want ComputeXidHorizons to serve both these purposes, then it
has to always deliver exactly the right answer, which seems like
a definition that will be hard and expensive to achieve.Yeah, I was thinking along similar lines.
I'm also kind of wondering why these calculations use
latestCompletedXid. Is that something we do solely to reduce locking?
The XIDs of running transactions matter, and their snapshots matter,
and the XIDs that could start running in the future matter, but I
don't know why it matters what the latest completed XID is.
Daniel, it seems to me that this thread is waiting for some input from
you, based on the remarks of Tom and Robert. Are you planning to do
so? This is marked as a bug fix, so I have moved this item to the
next CF for now.
--
Michael
Hi hackers!
Daniel is busy with other tasks. I've found this topic and this problem
seems to be actual
or v15 too.
Please correct me if I am wrong. I've checked another discussion related to
pg_visibility [1]/messages/by-id/c0610352-8433-ab4b-986d-0e803c628efe@postgrespro.ru.
According to discussion: if using latest completed xid is not right for
checking visibility, than
it should be the least running transaction xid? So it must be another
function to be used for
these calculations, not the GetOldestNonRemovableTransactionId that uses
the ComputeXidHorizons.
[1]: /messages/by-id/c0610352-8433-ab4b-986d-0e803c628efe@postgrespro.ru
/messages/by-id/c0610352-8433-ab4b-986d-0e803c628efe@postgrespro.ru
On Wed, Oct 12, 2022 at 8:15 AM Michael Paquier <michael@paquier.xyz> wrote:
On Wed, Jul 27, 2022 at 09:47:19PM -0400, Robert Haas wrote:
On Wed, Jul 27, 2022 at 5:56 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Maybe we need a different function for pg_visibility to call?
If we want ComputeXidHorizons to serve both these purposes, then it
has to always deliver exactly the right answer, which seems like
a definition that will be hard and expensive to achieve.Yeah, I was thinking along similar lines.
I'm also kind of wondering why these calculations use
latestCompletedXid. Is that something we do solely to reduce locking?
The XIDs of running transactions matter, and their snapshots matter,
and the XIDs that could start running in the future matter, but I
don't know why it matters what the latest completed XID is.Daniel, it seems to me that this thread is waiting for some input from
you, based on the remarks of Tom and Robert. Are you planning to do
so? This is marked as a bug fix, so I have moved this item to the
next CF for now.
--
Michael
--
Regards,
Nikita Malakhov
Postgres Professional
https://postgrespro.ru/
Hi hackers!
Just to bump this thread, because the problem seems to be still actual:
Please correct me if I am wrong. I've checked another discussion related to
pg_visibility [1]/messages/by-id/c0610352-8433-ab4b-986d-0e803c628efe@postgrespro.ru.
According to discussion: if using latest completed xid is not right for
checking visibility, than
it should be the least running transaction xid? So it must be another
function to be used for
these calculations, not the GetOldestNonRemovableTransactionId that uses
the ComputeXidHorizons.
[1]: /messages/by-id/c0610352-8433-ab4b-986d-0e803c628efe@postgrespro.ru
/messages/by-id/c0610352-8433-ab4b-986d-0e803c628efe@postgrespro.ru
On Wed, Oct 12, 2022 at 8:15 AM Michael Paquier <michael@paquier.xyz> wrote:
On Wed, Jul 27, 2022 at 09:47:19PM -0400, Robert Haas wrote:
On Wed, Jul 27, 2022 at 5:56 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Maybe we need a different function for pg_visibility to call?
If we want ComputeXidHorizons to serve both these purposes, then it
has to always deliver exactly the right answer, which seems like
a definition that will be hard and expensive to achieve.Yeah, I was thinking along similar lines.
I'm also kind of wondering why these calculations use
latestCompletedXid. Is that something we do solely to reduce locking?
The XIDs of running transactions matter, and their snapshots matter,
and the XIDs that could start running in the future matter, but I
don't know why it matters what the latest completed XID is.Daniel, it seems to me that this thread is waiting for some input from
you, based on the remarks of Tom and Robert. Are you planning to do
so? This is marked as a bug fix, so I have moved this item to the
next CF for now.
--
Michael
--
Regards,
Nikita Malakhov
Postgres Professional
https://postgrespro.ru/
This patch has been waiting on the author for about a year now, so I will close
it as Returned with Feedback. Plesae feel free to resubmit to a future CF when
there is renewed interest in working on this.
--
Daniel Gustafsson
Hi!
On Tue, Jul 4, 2023 at 10:21 AM Daniel Gustafsson <daniel@yesql.se> wrote:
This patch has been waiting on the author for about a year now, so I will close
it as Returned with Feedback. Plesae feel free to resubmit to a future CF when
there is renewed interest in working on this.
I'd like to revive this thread. While the patch proposed definitely
makes things better. But as pointed out by Robert and Tom, it didn't
allow to avoid all false reports. The reason is that the way we
currently calculate the oldest xmin, it could move backwards (see
comments to ComputeXidHorizons()). The attached patch implements own
function to calculate strict oldest xmin, which should be always
greater or equal to any xid horizon calculated before. I have to do
the following changes in comparison to what ComputeXidHorizons() do.
1. Ignore processes xmin's, because they take into account connection
to other databases which were ignored before.
2. Ignore KnownAssignedXids, because they are not database-aware.
While primary could compute its horizons database-aware.
3. Ignore walsender xmin, because it could go backward if some
replication connections don't use replication slots.
Surely these would significantly sacrifice accuracy. But we have to do
so in order to avoid reporting false errors.
Any thoughts?
------
Regards,
Alexander Korotkov
Attachments:
0001-Fix-false-reports-in-pg_visibility-v2.patchapplication/octet-stream; name=0001-Fix-false-reports-in-pg_visibility-v2.patchDownload
From 7723ba395dbf15a0badd6cdafda2752c19356baa Mon Sep 17 00:00:00 2001
From: Alexander Korotkov <akorotkov@postgresql.org>
Date: Mon, 6 Nov 2023 11:07:35 +0200
Subject: [PATCH] Fix false reports in pg_visibility
---
contrib/pg_visibility/pg_visibility.c | 75 ++++++++++++++++++++++++---
1 file changed, 69 insertions(+), 6 deletions(-)
diff --git a/contrib/pg_visibility/pg_visibility.c b/contrib/pg_visibility/pg_visibility.c
index 2a4acfd1eee..036a9fd7681 100644
--- a/contrib/pg_visibility/pg_visibility.c
+++ b/contrib/pg_visibility/pg_visibility.c
@@ -19,6 +19,7 @@
#include "funcapi.h"
#include "miscadmin.h"
#include "storage/bufmgr.h"
+#include "storage/proc.h"
#include "storage/procarray.h"
#include "storage/smgr.h"
#include "utils/rel.h"
@@ -532,6 +533,58 @@ collect_visibility_data(Oid relid, bool include_pd)
return info;
}
+/*
+ * The "strict" version of GetOldestNonRemovableTransactionId(). The
+ * pg_visiblity check can tolerale false positives (don't report some of the
+ * errors), but can't toletate false negatives (report false errors).
+ * This makes us fundamentally unhappy with ComputeXidHorizons(). Normally,
+ * horozons moves forwards, but there are cases when it could move backwards
+ * (see comment for ComputeXidHorizons()).
+ *
+ * This is why we have to implement our own function for xid horizon, which
+ * would be guaranteed to be newer or equal to any xid horizon computed before.
+ * We have to do the following to achieve this.
+ *
+ * 1. Ignore processes xmin's, because they take into account connection to
+ * other databases which were ignored before.
+ * 2. Ignore KnownAssignedXids, because they are not database-aware. While
+ * primary could compute its horizons database-aware.
+ * 3. Ignore walsender xmin, because it could go backward if some replication
+ * connections don't use replication slots.
+ *
+ * Surely these would significantly sacrifice accuracy. But we have to do so
+ * in order to avoid reporting false errors.
+ */
+static TransactionId
+get_strict_xid_horizon(void)
+{
+ TransactionId xmin = InvalidTransactionId;
+
+ LWLockAcquire(ProcArrayLock, LW_SHARED);
+ for (int index = 0; index < ProcGlobal->allProcCount; index++)
+ {
+ TransactionId xid = (uint32) (*((volatile uint32 *) &ProcGlobal->xids[index]));
+ PGPROC *proc = &ProcGlobal->allProcs[index];
+ uint8 statusFlags;
+
+ if (likely(xid == InvalidTransactionId))
+ continue;
+
+ statusFlags = ProcGlobal->statusFlags[index];
+ if (statusFlags & (PROC_IN_LOGICAL_DECODING | PROC_IN_VACUUM))
+ continue;
+
+ if (proc->databaseId == MyDatabaseId ||
+ (statusFlags & PROC_AFFECTS_ALL_HORIZONS))
+ {
+ xmin = TransactionIdOlder(xmin, xid);
+ }
+ }
+ LWLockRelease(ProcArrayLock);
+
+ return xmin;
+}
+
/*
* Returns a list of items whose visibility map information does not match
* the status of the tuples on the page.
@@ -562,8 +615,14 @@ collect_corrupt_items(Oid relid, bool all_visible, bool all_frozen)
/* Only some relkinds have a visibility map */
check_relation_relkind(rel);
+ /* Mark ourselves as PROC_IN_VACUUM to exclude our influence on xmin's */
+ LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
+ MyProc->statusFlags |= PROC_IN_VACUUM;
+ ProcGlobal->statusFlags[MyProc->pgxactoff] = MyProc->statusFlags;
+ LWLockRelease(ProcArrayLock);
+
if (all_visible)
- OldestXmin = GetOldestNonRemovableTransactionId(rel);
+ OldestXmin = get_strict_xid_horizon();
nblocks = RelationGetNumberOfBlocks(rel);
@@ -670,12 +729,11 @@ collect_corrupt_items(Oid relid, bool all_visible, bool all_frozen)
* From a concurrency point of view, it sort of sucks to
* retake ProcArrayLock here while we're holding the buffer
* exclusively locked, but it should be safe against
- * deadlocks, because surely
- * GetOldestNonRemovableTransactionId() should never take a
- * buffer lock. And this shouldn't happen often, so it's worth
- * being careful so as to avoid false positives.
+ * deadlocks, because surely get_strict_xid_horizon() should
+ * never take a buffer lock. And this shouldn't happen often,
+ * so it's worth being careful so as to avoid false positives.
*/
- RecomputedOldestXmin = GetOldestNonRemovableTransactionId(rel);
+ RecomputedOldestXmin = get_strict_xid_horizon();
if (!TransactionIdPrecedes(OldestXmin, RecomputedOldestXmin))
record_corrupt_item(items, &tuple.t_self);
@@ -715,6 +773,11 @@ collect_corrupt_items(Oid relid, bool all_visible, bool all_frozen)
items->count = items->next;
items->next = 0;
+ LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
+ MyProc->statusFlags &= ~PROC_IN_VACUUM;
+ ProcGlobal->statusFlags[MyProc->pgxactoff] = MyProc->statusFlags;
+ LWLockRelease(ProcArrayLock);
+
return items;
}
--
2.39.3 (Apple Git-145)
Hi Alexander,
06.11.2023 12:30, Alexander Korotkov wrote:
Surely these would significantly sacrifice accuracy. But we have to do
so in order to avoid reporting false errors.
I've reduced the dirty reproducer Daniel Shelepanov posted initially
to the following:
numdbs=10
for ((d=1;d<=$numdbs;d++)); do
createdb db$d
psql db$d -c "create extension pg_visibility"
done
for ((i=1;i<=300;i++)); do
echo "iteration $i"
for ((d=1;d<=$numdbs;d++)); do
(
echo "
create table vacuum_test as select 42 i;
vacuum (disable_page_skipping) vacuum_test;
select * from pg_check_visible('vacuum_test');
" | psql db$d -a -q >psql-$d.log 2>&1
) &
done
wait
res=0
for ((d=1;d<=$numdbs;d++)); do
grep -q '0 rows' psql-$d.log || { echo "Error condition in psql-$d.log:"; cat psql-$d.log; res=1; break; }
psql db$d -q -c "drop table vacuum_test"
done
[ $res == 0 ] || break;
done
It looks like the v2 patch doesn't fix the original issue. Maybe I miss
something, but with the patch applied, I see the failure immediately,
though without the patch several iterations are needed to get it.
Best regards,
Alexander
Hi, Alexander.
On Tue, Nov 7, 2023 at 1:00 PM Alexander Lakhin <exclusion@gmail.com> wrote:
It looks like the v2 patch doesn't fix the original issue. Maybe I miss
something, but with the patch applied, I see the failure immediately,
though without the patch several iterations are needed to get it.
That's a bug in the patch. Thank you for cathing it. It should start
calculation from latestCompletedXid + 1, not InvalidTransactionId.
Please, check the revised patch.
------
Regards,
Alexander Korotkov
Attachments:
0001-Fix-false-reports-in-pg_visibility-v3.patchapplication/octet-stream; name=0001-Fix-false-reports-in-pg_visibility-v3.patchDownload
From 19ccad18b18d8ebdcfde6f4be102a02a4f6488ec Mon Sep 17 00:00:00 2001
From: Alexander Korotkov <akorotkov@postgresql.org>
Date: Mon, 6 Nov 2023 11:07:35 +0200
Subject: [PATCH] Fix false reports in pg_visibility
---
contrib/pg_visibility/pg_visibility.c | 78 ++++++++++++++++++++++++---
1 file changed, 72 insertions(+), 6 deletions(-)
diff --git a/contrib/pg_visibility/pg_visibility.c b/contrib/pg_visibility/pg_visibility.c
index 2a4acfd1eee..18721324207 100644
--- a/contrib/pg_visibility/pg_visibility.c
+++ b/contrib/pg_visibility/pg_visibility.c
@@ -19,6 +19,7 @@
#include "funcapi.h"
#include "miscadmin.h"
#include "storage/bufmgr.h"
+#include "storage/proc.h"
#include "storage/procarray.h"
#include "storage/smgr.h"
#include "utils/rel.h"
@@ -532,6 +533,61 @@ collect_visibility_data(Oid relid, bool include_pd)
return info;
}
+/*
+ * The "strict" version of GetOldestNonRemovableTransactionId(). The
+ * pg_visiblity check can tolerale false positives (don't report some of the
+ * errors), but can't toletate false negatives (report false errors).
+ * This makes us fundamentally unhappy with ComputeXidHorizons(). Normally,
+ * horozons moves forwards, but there are cases when it could move backwards
+ * (see comment for ComputeXidHorizons()).
+ *
+ * This is why we have to implement our own function for xid horizon, which
+ * would be guaranteed to be newer or equal to any xid horizon computed before.
+ * We have to do the following to achieve this.
+ *
+ * 1. Ignore processes xmin's, because they take into account connection to
+ * other databases which were ignored before.
+ * 2. Ignore KnownAssignedXids, because they are not database-aware. While
+ * primary could compute its horizons database-aware.
+ * 3. Ignore walsender xmin, because it could go backward if some replication
+ * connections don't use replication slots.
+ *
+ * Surely these would significantly sacrifice accuracy. But we have to do so
+ * in order to avoid reporting false errors.
+ */
+static TransactionId
+get_strict_xid_horizon(void)
+{
+ TransactionId xmin;
+
+ LWLockAcquire(ProcArrayLock, LW_SHARED);
+ xmin = XidFromFullTransactionId(ShmemVariableCache->latestCompletedXid);
+ Assert(TransactionIdIsValid(xmin));
+ TransactionIdAdvance(xmin);
+ for (int index = 0; index < ProcGlobal->allProcCount; index++)
+ {
+ TransactionId xid = (uint32) (*((volatile uint32 *) &ProcGlobal->xids[index]));
+ PGPROC *proc = &ProcGlobal->allProcs[index];
+ uint8 statusFlags;
+
+ if (likely(xid == InvalidTransactionId))
+ continue;
+
+ statusFlags = ProcGlobal->statusFlags[index];
+ if (statusFlags & (PROC_IN_LOGICAL_DECODING | PROC_IN_VACUUM))
+ continue;
+
+ if (proc->databaseId == MyDatabaseId ||
+ (statusFlags & PROC_AFFECTS_ALL_HORIZONS))
+ {
+ xmin = TransactionIdOlder(xmin, xid);
+ }
+ }
+ LWLockRelease(ProcArrayLock);
+
+ return xmin;
+}
+
/*
* Returns a list of items whose visibility map information does not match
* the status of the tuples on the page.
@@ -562,8 +618,14 @@ collect_corrupt_items(Oid relid, bool all_visible, bool all_frozen)
/* Only some relkinds have a visibility map */
check_relation_relkind(rel);
+ /* Mark ourselves as PROC_IN_VACUUM to exclude our influence on xmin's */
+ LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
+ MyProc->statusFlags |= PROC_IN_VACUUM;
+ ProcGlobal->statusFlags[MyProc->pgxactoff] = MyProc->statusFlags;
+ LWLockRelease(ProcArrayLock);
+
if (all_visible)
- OldestXmin = GetOldestNonRemovableTransactionId(rel);
+ OldestXmin = get_strict_xid_horizon();
nblocks = RelationGetNumberOfBlocks(rel);
@@ -670,12 +732,11 @@ collect_corrupt_items(Oid relid, bool all_visible, bool all_frozen)
* From a concurrency point of view, it sort of sucks to
* retake ProcArrayLock here while we're holding the buffer
* exclusively locked, but it should be safe against
- * deadlocks, because surely
- * GetOldestNonRemovableTransactionId() should never take a
- * buffer lock. And this shouldn't happen often, so it's worth
- * being careful so as to avoid false positives.
+ * deadlocks, because surely get_strict_xid_horizon() should
+ * never take a buffer lock. And this shouldn't happen often,
+ * so it's worth being careful so as to avoid false positives.
*/
- RecomputedOldestXmin = GetOldestNonRemovableTransactionId(rel);
+ RecomputedOldestXmin = get_strict_xid_horizon();
if (!TransactionIdPrecedes(OldestXmin, RecomputedOldestXmin))
record_corrupt_item(items, &tuple.t_self);
@@ -715,6 +776,11 @@ collect_corrupt_items(Oid relid, bool all_visible, bool all_frozen)
items->count = items->next;
items->next = 0;
+ LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
+ MyProc->statusFlags &= ~PROC_IN_VACUUM;
+ ProcGlobal->statusFlags[MyProc->pgxactoff] = MyProc->statusFlags;
+ LWLockRelease(ProcArrayLock);
+
return items;
}
--
2.39.3 (Apple Git-145)
07.11.2023 14:38, Alexander Korotkov wrote:
Hi, Alexander.
On Tue, Nov 7, 2023 at 1:00 PM Alexander Lakhin <exclusion@gmail.com> wrote:
It looks like the v2 patch doesn't fix the original issue. Maybe I miss
something, but with the patch applied, I see the failure immediately,
though without the patch several iterations are needed to get it.That's a bug in the patch. Thank you for cathing it. It should start
calculation from latestCompletedXid + 1, not InvalidTransactionId.
Please, check the revised patch.
Thanks for looking at this!
Unfortunately, I still see the failure with the v3, but not on a first
iteration:
...
iteration 316
Error condition in psql-8.log:
create table vacuum_test as select 42 i;
vacuum (disable_page_skipping) vacuum_test;
select * from pg_check_visible('vacuum_test');
t_ctid
--------
(0,1)
(1 row)
(I've double-checked that the patch is applied and get_strict_xid_horizon()
is called.)
Best regards,
Alexander
Hi, Alexander!
On Tue, Nov 7, 2023 at 3:00 PM Alexander Lakhin <exclusion@gmail.com> wrote:
07.11.2023 14:38, Alexander Korotkov wrote:
Hi, Alexander.
On Tue, Nov 7, 2023 at 1:00 PM Alexander Lakhin <exclusion@gmail.com> wrote:
It looks like the v2 patch doesn't fix the original issue. Maybe I miss
something, but with the patch applied, I see the failure immediately,
though without the patch several iterations are needed to get it.That's a bug in the patch. Thank you for cathing it. It should start
calculation from latestCompletedXid + 1, not InvalidTransactionId.
Please, check the revised patch.Thanks for looking at this!
Unfortunately, I still see the failure with the v3, but not on a first
iteration:
...
iteration 316
Error condition in psql-8.log:
create table vacuum_test as select 42 i;
vacuum (disable_page_skipping) vacuum_test;
select * from pg_check_visible('vacuum_test');
t_ctid
--------
(0,1)
(1 row)(I've double-checked that the patch is applied and get_strict_xid_horizon()
is called.)
I managed to reproduce this on a Linux VM. This problem should arise
because in extension I don't have access to ProcArrayStruct. So, my
code is iterating the whole PGPROC's array. I reimplemented the new
horizon calculation function in the core with usage of
ProcArrayStruct. Now it doesn't fall for me. Please, recheck.
------
Regards,
Alexander Korotkov
Attachments:
0001-Fix-false-reports-in-pg_visibility-v4.patchapplication/octet-stream; name=0001-Fix-false-reports-in-pg_visibility-v4.patchDownload
From b3e2d15a0add4a55907195a05f38f1ce22857ade Mon Sep 17 00:00:00 2001
From: Alexander Korotkov <akorotkov@postgresql.org>
Date: Mon, 6 Nov 2023 11:07:35 +0200
Subject: [PATCH] Fix false reports in pg_visibility
---
contrib/pg_visibility/pg_visibility.c | 23 +++++++---
src/backend/storage/ipc/procarray.c | 66 +++++++++++++++++++++++++++
src/include/storage/procarray.h | 1 +
3 files changed, 84 insertions(+), 6 deletions(-)
diff --git a/contrib/pg_visibility/pg_visibility.c b/contrib/pg_visibility/pg_visibility.c
index 2a4acfd1eee..78dac323e51 100644
--- a/contrib/pg_visibility/pg_visibility.c
+++ b/contrib/pg_visibility/pg_visibility.c
@@ -19,6 +19,7 @@
#include "funcapi.h"
#include "miscadmin.h"
#include "storage/bufmgr.h"
+#include "storage/proc.h"
#include "storage/procarray.h"
#include "storage/smgr.h"
#include "utils/rel.h"
@@ -562,8 +563,14 @@ collect_corrupt_items(Oid relid, bool all_visible, bool all_frozen)
/* Only some relkinds have a visibility map */
check_relation_relkind(rel);
+ /* Mark ourselves as PROC_IN_VACUUM to exclude our influence on xmin's */
+ LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
+ MyProc->statusFlags |= PROC_IN_VACUUM;
+ ProcGlobal->statusFlags[MyProc->pgxactoff] = MyProc->statusFlags;
+ LWLockRelease(ProcArrayLock);
+
if (all_visible)
- OldestXmin = GetOldestNonRemovableTransactionId(rel);
+ OldestXmin = GetStrictOldestNonRemovableTransactionId();
nblocks = RelationGetNumberOfBlocks(rel);
@@ -670,12 +677,11 @@ collect_corrupt_items(Oid relid, bool all_visible, bool all_frozen)
* From a concurrency point of view, it sort of sucks to
* retake ProcArrayLock here while we're holding the buffer
* exclusively locked, but it should be safe against
- * deadlocks, because surely
- * GetOldestNonRemovableTransactionId() should never take a
- * buffer lock. And this shouldn't happen often, so it's worth
- * being careful so as to avoid false positives.
+ * deadlocks, because surely get_strict_xid_horizon() should
+ * never take a buffer lock. And this shouldn't happen often,
+ * so it's worth being careful so as to avoid false positives.
*/
- RecomputedOldestXmin = GetOldestNonRemovableTransactionId(rel);
+ RecomputedOldestXmin = GetStrictOldestNonRemovableTransactionId();
if (!TransactionIdPrecedes(OldestXmin, RecomputedOldestXmin))
record_corrupt_item(items, &tuple.t_self);
@@ -715,6 +721,11 @@ collect_corrupt_items(Oid relid, bool all_visible, bool all_frozen)
items->count = items->next;
items->next = 0;
+ LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
+ MyProc->statusFlags &= ~PROC_IN_VACUUM;
+ ProcGlobal->statusFlags[MyProc->pgxactoff] = MyProc->statusFlags;
+ LWLockRelease(ProcArrayLock);
+
return items;
}
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index 80ab026bf56..e36a463cc66 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -237,6 +237,30 @@ typedef struct ComputeXidHorizonsResult
*/
TransactionId data_oldest_nonremovable;
+ /*
+ * The "strict" version of GetOldestNonRemovableTransactionId(). The
+ * pg_visiblity check can tolerale false positives (don't report some of the
+ * errors), but can't toletate false negatives (report false errors).
+ * This makes us fundamentally unhappy with ComputeXidHorizons(). Normally,
+ * horozons moves forwards, but there are cases when it could move backwards
+ * (see comment for ComputeXidHorizons()).
+ *
+ * This is why we have to implement our own function for xid horizon, which
+ * would be guaranteed to be newer or equal to any xid horizon computed before.
+ * We have to do the following to achieve this.
+ *
+ * 1. Ignore processes xmin's, because they take into account connection to
+ * other databases which were ignored before.
+ * 2. Ignore KnownAssignedXids, because they are not database-aware. While
+ * primary could compute its horizons database-aware.
+ * 3. Ignore walsender xmin, because it could go backward if some replication
+ * connections don't use replication slots.
+ *
+ * Surely these would significantly sacrifice accuracy. But we have to do so
+ * in order to avoid reporting false errors.
+ */
+ TransactionId data_strict_oldest_nonremovable;
+
/*
* Oldest xid for which deleted tuples need to be retained in this
* session's temporary tables.
@@ -252,6 +276,7 @@ typedef enum GlobalVisHorizonKind
VISHORIZON_SHARED,
VISHORIZON_CATALOG,
VISHORIZON_DATA,
+ VISHORIZON_DATA_STRICT,
VISHORIZON_TEMP,
} GlobalVisHorizonKind;
@@ -1743,6 +1768,7 @@ ComputeXidHorizons(ComputeXidHorizonsResult *h)
h->oldest_considered_running = initial;
h->shared_oldest_nonremovable = initial;
h->data_oldest_nonremovable = initial;
+ h->data_strict_oldest_nonremovable = initial;
/*
* Only modifications made by this backend affect the horizon for
@@ -1842,6 +1868,10 @@ ComputeXidHorizons(ComputeXidHorizonsResult *h)
{
h->data_oldest_nonremovable =
TransactionIdOlder(h->data_oldest_nonremovable, xmin);
+
+ if (TransactionIdIsValid(xid))
+ h->data_strict_oldest_nonremovable =
+ TransactionIdOlder(h->data_strict_oldest_nonremovable, xid);
}
}
@@ -1997,6 +2027,8 @@ GetOldestNonRemovableTransactionId(Relation rel)
return horizons.catalog_oldest_nonremovable;
case VISHORIZON_DATA:
return horizons.data_oldest_nonremovable;
+ case VISHORIZON_DATA_STRICT:
+ return horizons.data_strict_oldest_nonremovable;
case VISHORIZON_TEMP:
return horizons.temp_oldest_nonremovable;
}
@@ -2005,6 +2037,38 @@ GetOldestNonRemovableTransactionId(Relation rel)
return InvalidTransactionId;
}
+/*
+ * The "strict" version of GetOldestNonRemovableTransactionId(). The
+ * pg_visiblity check can tolerale false positives (don't report some of the
+ * errors), but can't toletate false negatives (report false errors).
+ * This makes us fundamentally unhappy with ComputeXidHorizons(). Normally,
+ * horozons moves forwards, but there are cases when it could move backwards
+ * (see comment for ComputeXidHorizons()).
+ *
+ * This is why we have to implement our own function for xid horizon, which
+ * would be guaranteed to be newer or equal to any xid horizon computed before.
+ * We have to do the following to achieve this.
+ *
+ * 1. Ignore processes xmin's, because they take into account connection to
+ * other databases which were ignored before.
+ * 2. Ignore KnownAssignedXids, because they are not database-aware. While
+ * primary could compute its horizons database-aware.
+ * 3. Ignore walsender xmin, because it could go backward if some replication
+ * connections don't use replication slots.
+ *
+ * Surely these would significantly sacrifice accuracy. But we have to do so
+ * in order to avoid reporting false errors.
+ */
+TransactionId
+GetStrictOldestNonRemovableTransactionId(void)
+{
+ ComputeXidHorizonsResult horizons;
+
+ ComputeXidHorizons(&horizons);
+
+ return horizons.data_strict_oldest_nonremovable;
+}
+
/*
* Return the oldest transaction id any currently running backend might still
* consider running. This should not be used for visibility / pruning
@@ -4028,6 +4092,8 @@ GlobalVisTestFor(Relation rel)
case VISHORIZON_TEMP:
state = &GlobalVisTempRels;
break;
+ default:
+ break;
}
Assert(FullTransactionIdIsValid(state->definitely_needed) &&
diff --git a/src/include/storage/procarray.h b/src/include/storage/procarray.h
index d8cae3ce1c5..d44c2a98b2d 100644
--- a/src/include/storage/procarray.h
+++ b/src/include/storage/procarray.h
@@ -55,6 +55,7 @@ extern RunningTransactions GetRunningTransactionData(void);
extern bool TransactionIdIsInProgress(TransactionId xid);
extern bool TransactionIdIsActive(TransactionId xid);
extern TransactionId GetOldestNonRemovableTransactionId(Relation rel);
+extern TransactionId GetStrictOldestNonRemovableTransactionId(void);
extern TransactionId GetOldestTransactionIdConsideredRunning(void);
extern TransactionId GetOldestActiveTransactionId(void);
extern TransactionId GetOldestSafeDecodingTransactionId(bool catalogOnly);
--
2.39.3 (Apple Git-145)
Hi Alexander,
04.12.2023 03:23, Alexander Korotkov wrote:
I managed to reproduce this on a Linux VM. This problem should arise
because in extension I don't have access to ProcArrayStruct. So, my
code is iterating the whole PGPROC's array. I reimplemented the new
horizon calculation function in the core with usage of
ProcArrayStruct. Now it doesn't fall for me. Please, recheck.
Yes, v4 works for me as well (thousands of iterations passed).
Thank you!
Though the test passes even without manipulations with the PROC_IN_VACUUM
flag in pg_visibility.c (maybe the test is not good enough to show why
those manipulations are needed).
I also couldn't see where VISHORIZON_DATA_STRICT comes into play...
Best regards,
Alexander
Hi!
I agree with Alexander Lakhin about PROC_IN_VACUUM and
VISHORIZON_DATA_STRICT:
1) probably manipulations with the PROC_IN_VACUUM flag in
pg_visibility.c were needed for condition [1]https://github.com/postgres/postgres/blob/4d0cf0b05defcee985d5af38cb0db2b9c2f8dbae/src/backend/storage/ipc/procarray.c#L1812 and can be removed now;
2) the VISHORIZON_DATA_STRICT macro is probably unnecessary too (since
we are not going to use it in the GlobalVisHorizonKindForRel() function).
Also It would be nice to remove the get_strict_xid_horizon() function
from the comment (replace to GetStrictOldestNonRemovableTransactionId()?).
[1]: https://github.com/postgres/postgres/blob/4d0cf0b05defcee985d5af38cb0db2b9c2f8dbae/src/backend/storage/ipc/procarray.c#L1812
https://github.com/postgres/postgres/blob/4d0cf0b05defcee985d5af38cb0db2b9c2f8dbae/src/backend/storage/ipc/procarray.c#L1812
--
With best regards,
Dmitry Koval
Postgres Professional: http://postgrespro.com
Hi!
On Tue, Dec 5, 2023 at 9:03 PM Dmitry Koval <d.koval@postgrespro.ru> wrote:
I agree with Alexander Lakhin about PROC_IN_VACUUM and
VISHORIZON_DATA_STRICT:
1) probably manipulations with the PROC_IN_VACUUM flag in
pg_visibility.c were needed for condition [1] and can be removed now;
Right, PROC_IN_VACUUM is no longer required. The possible benefit of
it would be to avoid bloat during a possibly long run of
pg_visibility() function. But the downside are problems with the
snapshot if the invoking query contains something except a single call
of the pg_visibility() function, and complexity. Removed.
2) the VISHORIZON_DATA_STRICT macro is probably unnecessary too (since
we are not going to use it in the GlobalVisHorizonKindForRel() function).
Makes sense, removed.
Also It would be nice to remove the get_strict_xid_horizon() function
from the comment (replace to GetStrictOldestNonRemovableTransactionId()?).
Right, fixed.
The revised patch is attached. Besides the fixes above, it contains
improvements for comments and the detailed commit message.
Tom, Robert, what do you think about the patch attached? It required
a new type of xid horizon in core and sacrifices accuracy. But this
is the only way I can imagine, we can fix the problem in a general
way.
------
Regards,
Alexander Korotkov
Attachments:
0001-Fix-false-reports-in-pg_visibility-v5.patchapplication/octet-stream; name=0001-Fix-false-reports-in-pg_visibility-v5.patchDownload
From 67a32e2cd504d77cb68283b8ee473405dbdb8137 Mon Sep 17 00:00:00 2001
From: Alexander Korotkov <akorotkov@postgresql.org>
Date: Mon, 6 Nov 2023 11:07:35 +0200
Subject: [PATCH] Fix false reports in pg_visibility
Currently pg_visibility computes its xid horizon using the
GetOldestNonRemovableTransactionId(). The problem is that this horizon can
sometimes go backwards. That can lead to reporting false errors.
In order to fix that, this commit implements new funciton
GetStrictOldestNonRemovableTransactionId(). This function computes the xid
horizon, which would be guaranteed to be newer or equal to any xid horizon
computed before.
We have to do the following to achieve this.
1. Ignore processes xmin's, because they take into account connection to
other databases which were ignored before.
2. Ignore KnownAssignedXids, because they are not database-aware. While
primary could compute its horizons database-aware.
3. Ignore walsender xmin, because it could go backward if some replication
connections don't use replication slots.
Surely these would significantly sacrifice accuracy. But we have to do so
in order to avoid reporting false errors.
---
contrib/pg_visibility/pg_visibility.c | 11 +++----
src/backend/storage/ipc/procarray.c | 42 +++++++++++++++++++++++++++
src/include/storage/procarray.h | 1 +
3 files changed, 49 insertions(+), 5 deletions(-)
diff --git a/contrib/pg_visibility/pg_visibility.c b/contrib/pg_visibility/pg_visibility.c
index 17c50a44722..42c654261f7 100644
--- a/contrib/pg_visibility/pg_visibility.c
+++ b/contrib/pg_visibility/pg_visibility.c
@@ -19,6 +19,7 @@
#include "funcapi.h"
#include "miscadmin.h"
#include "storage/bufmgr.h"
+#include "storage/proc.h"
#include "storage/procarray.h"
#include "storage/smgr.h"
#include "utils/rel.h"
@@ -563,7 +564,7 @@ collect_corrupt_items(Oid relid, bool all_visible, bool all_frozen)
check_relation_relkind(rel);
if (all_visible)
- OldestXmin = GetOldestNonRemovableTransactionId(rel);
+ OldestXmin = GetStrictOldestNonRemovableTransactionId();
nblocks = RelationGetNumberOfBlocks(rel);
@@ -671,11 +672,11 @@ collect_corrupt_items(Oid relid, bool all_visible, bool all_frozen)
* retake ProcArrayLock here while we're holding the buffer
* exclusively locked, but it should be safe against
* deadlocks, because surely
- * GetOldestNonRemovableTransactionId() should never take a
- * buffer lock. And this shouldn't happen often, so it's worth
- * being careful so as to avoid false positives.
+ * GetStrictOldestNonRemovableTransactionId() should never
+ * take a buffer lock. And this shouldn't happen often, so
+ * it's worth being careful so as to avoid false positives.
*/
- RecomputedOldestXmin = GetOldestNonRemovableTransactionId(rel);
+ RecomputedOldestXmin = GetStrictOldestNonRemovableTransactionId();
if (!TransactionIdPrecedes(OldestXmin, RecomputedOldestXmin))
record_corrupt_item(items, &tuple.t_self);
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index e2f71da4a01..e57d28cda91 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -237,6 +237,12 @@ typedef struct ComputeXidHorizonsResult
*/
TransactionId data_oldest_nonremovable;
+ /*
+ * The "strict" oldest xid, which is guarantteed to be to be newer or
+ * equal to any oldest xid computed before.
+ */
+ TransactionId data_strict_oldest_nonremovable;
+
/*
* Oldest xid for which deleted tuples need to be retained in this
* session's temporary tables.
@@ -1743,6 +1749,7 @@ ComputeXidHorizons(ComputeXidHorizonsResult *h)
h->oldest_considered_running = initial;
h->shared_oldest_nonremovable = initial;
h->data_oldest_nonremovable = initial;
+ h->data_strict_oldest_nonremovable = initial;
/*
* Only modifications made by this backend affect the horizon for
@@ -1842,6 +1849,10 @@ ComputeXidHorizons(ComputeXidHorizonsResult *h)
{
h->data_oldest_nonremovable =
TransactionIdOlder(h->data_oldest_nonremovable, xmin);
+
+ if (TransactionIdIsValid(xid))
+ h->data_strict_oldest_nonremovable =
+ TransactionIdOlder(h->data_strict_oldest_nonremovable, xid);
}
}
@@ -2005,6 +2016,37 @@ GetOldestNonRemovableTransactionId(Relation rel)
return InvalidTransactionId;
}
+/*
+ * The "strict" version of GetOldestNonRemovableTransactionId(). The
+ * pg_visiblity check can tolerale false positives (don't report some of the
+ * errors), but can't toletate false negatives (report false errors).
+ * Normally, horozons moves forwards, but there are cases when it could move
+ * backwards (see comment for ComputeXidHorizons()).
+ *
+ * This is why we have to implement our own function for xid horizon, which
+ * would be guaranteed to be newer or equal to any xid horizon computed before.
+ * We have to do the following to achieve this.
+ *
+ * 1. Ignore processes xmin's, because they take into account connection to
+ * other databases which were ignored before.
+ * 2. Ignore KnownAssignedXids, because they are not database-aware. While
+ * primary could compute its horizons database-aware.
+ * 3. Ignore walsender xmin, because it could go backward if some replication
+ * connections don't use replication slots.
+ *
+ * Surely these would significantly sacrifice accuracy. But we have to do so
+ * in order to avoid reporting false errors.
+ */
+TransactionId
+GetStrictOldestNonRemovableTransactionId(void)
+{
+ ComputeXidHorizonsResult horizons;
+
+ ComputeXidHorizons(&horizons);
+
+ return horizons.data_strict_oldest_nonremovable;
+}
+
/*
* Return the oldest transaction id any currently running backend might still
* consider running. This should not be used for visibility / pruning
diff --git a/src/include/storage/procarray.h b/src/include/storage/procarray.h
index f3eba9b7640..c1f09c5d812 100644
--- a/src/include/storage/procarray.h
+++ b/src/include/storage/procarray.h
@@ -55,6 +55,7 @@ extern RunningTransactions GetRunningTransactionData(void);
extern bool TransactionIdIsInProgress(TransactionId xid);
extern bool TransactionIdIsActive(TransactionId xid);
extern TransactionId GetOldestNonRemovableTransactionId(Relation rel);
+extern TransactionId GetStrictOldestNonRemovableTransactionId(void);
extern TransactionId GetOldestTransactionIdConsideredRunning(void);
extern TransactionId GetOldestActiveTransactionId(void);
extern TransactionId GetOldestSafeDecodingTransactionId(bool catalogOnly);
--
2.39.3 (Apple Git-145)
Hi!
Thank you, there is one small point left (in the comment): can you
replace "guarantteed to be to be newer" to "guaranteed to be newer",
file src/backend/storage/ipc/procarray.c?
--
With best regards,
Dmitry Koval
Postgres Professional: http://postgrespro.com
Hi, Dmitry!
On Sat, Jan 13, 2024 at 7:33 PM Dmitry Koval <d.koval@postgrespro.ru> wrote:
Thank you, there is one small point left (in the comment): can you
replace "guarantteed to be to be newer" to "guaranteed to be newer",
file src/backend/storage/ipc/procarray.c?
Fixed. Thank you for catching this.
------
Regards,
Alexander Korotkov
Attachments:
0001-Fix-false-reports-in-pg_visibility-v5.patchapplication/octet-stream; name=0001-Fix-false-reports-in-pg_visibility-v5.patchDownload
From 444ac0890030a7bf6804104dcb3663b661fe696a Mon Sep 17 00:00:00 2001
From: Alexander Korotkov <akorotkov@postgresql.org>
Date: Mon, 6 Nov 2023 11:07:35 +0200
Subject: [PATCH] Fix false reports in pg_visibility
Currently pg_visibility computes its xid horizon using the
GetOldestNonRemovableTransactionId(). The problem is that this horizon can
sometimes go backwards. That can lead to reporting false errors.
In order to fix that, this commit implements new funciton
GetStrictOldestNonRemovableTransactionId(). This function computes the xid
horizon, which would be guaranteed to be newer or equal to any xid horizon
computed before.
We have to do the following to achieve this.
1. Ignore processes xmin's, because they take into account connection to
other databases which were ignored before.
2. Ignore KnownAssignedXids, because they are not database-aware. While
primary could compute its horizons database-aware.
3. Ignore walsender xmin, because it could go backward if some replication
connections don't use replication slots.
Surely these would significantly sacrifice accuracy. But we have to do so
in order to avoid reporting false errors.
---
contrib/pg_visibility/pg_visibility.c | 11 +++----
src/backend/storage/ipc/procarray.c | 42 +++++++++++++++++++++++++++
src/include/storage/procarray.h | 1 +
3 files changed, 49 insertions(+), 5 deletions(-)
diff --git a/contrib/pg_visibility/pg_visibility.c b/contrib/pg_visibility/pg_visibility.c
index 17c50a44722..42c654261f7 100644
--- a/contrib/pg_visibility/pg_visibility.c
+++ b/contrib/pg_visibility/pg_visibility.c
@@ -19,6 +19,7 @@
#include "funcapi.h"
#include "miscadmin.h"
#include "storage/bufmgr.h"
+#include "storage/proc.h"
#include "storage/procarray.h"
#include "storage/smgr.h"
#include "utils/rel.h"
@@ -563,7 +564,7 @@ collect_corrupt_items(Oid relid, bool all_visible, bool all_frozen)
check_relation_relkind(rel);
if (all_visible)
- OldestXmin = GetOldestNonRemovableTransactionId(rel);
+ OldestXmin = GetStrictOldestNonRemovableTransactionId();
nblocks = RelationGetNumberOfBlocks(rel);
@@ -671,11 +672,11 @@ collect_corrupt_items(Oid relid, bool all_visible, bool all_frozen)
* retake ProcArrayLock here while we're holding the buffer
* exclusively locked, but it should be safe against
* deadlocks, because surely
- * GetOldestNonRemovableTransactionId() should never take a
- * buffer lock. And this shouldn't happen often, so it's worth
- * being careful so as to avoid false positives.
+ * GetStrictOldestNonRemovableTransactionId() should never
+ * take a buffer lock. And this shouldn't happen often, so
+ * it's worth being careful so as to avoid false positives.
*/
- RecomputedOldestXmin = GetOldestNonRemovableTransactionId(rel);
+ RecomputedOldestXmin = GetStrictOldestNonRemovableTransactionId();
if (!TransactionIdPrecedes(OldestXmin, RecomputedOldestXmin))
record_corrupt_item(items, &tuple.t_self);
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index e2f71da4a01..50716c5ef99 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -237,6 +237,12 @@ typedef struct ComputeXidHorizonsResult
*/
TransactionId data_oldest_nonremovable;
+ /*
+ * The "strict" oldest xid, which is guarantteed to be newer or equal to
+ * any oldest xid computed before.
+ */
+ TransactionId data_strict_oldest_nonremovable;
+
/*
* Oldest xid for which deleted tuples need to be retained in this
* session's temporary tables.
@@ -1743,6 +1749,7 @@ ComputeXidHorizons(ComputeXidHorizonsResult *h)
h->oldest_considered_running = initial;
h->shared_oldest_nonremovable = initial;
h->data_oldest_nonremovable = initial;
+ h->data_strict_oldest_nonremovable = initial;
/*
* Only modifications made by this backend affect the horizon for
@@ -1842,6 +1849,10 @@ ComputeXidHorizons(ComputeXidHorizonsResult *h)
{
h->data_oldest_nonremovable =
TransactionIdOlder(h->data_oldest_nonremovable, xmin);
+
+ if (TransactionIdIsValid(xid))
+ h->data_strict_oldest_nonremovable =
+ TransactionIdOlder(h->data_strict_oldest_nonremovable, xid);
}
}
@@ -2005,6 +2016,37 @@ GetOldestNonRemovableTransactionId(Relation rel)
return InvalidTransactionId;
}
+/*
+ * The "strict" version of GetOldestNonRemovableTransactionId(). The
+ * pg_visiblity check can tolerale false positives (don't report some of the
+ * errors), but can't toletate false negatives (report false errors).
+ * Normally, horozons moves forwards, but there are cases when it could move
+ * backwards (see comment for ComputeXidHorizons()).
+ *
+ * This is why we have to implement our own function for xid horizon, which
+ * would be guaranteed to be newer or equal to any xid horizon computed before.
+ * We have to do the following to achieve this.
+ *
+ * 1. Ignore processes xmin's, because they take into account connection to
+ * other databases which were ignored before.
+ * 2. Ignore KnownAssignedXids, because they are not database-aware. While
+ * primary could compute its horizons database-aware.
+ * 3. Ignore walsender xmin, because it could go backward if some replication
+ * connections don't use replication slots.
+ *
+ * Surely these would significantly sacrifice accuracy. But we have to do so
+ * in order to avoid reporting false errors.
+ */
+TransactionId
+GetStrictOldestNonRemovableTransactionId(void)
+{
+ ComputeXidHorizonsResult horizons;
+
+ ComputeXidHorizons(&horizons);
+
+ return horizons.data_strict_oldest_nonremovable;
+}
+
/*
* Return the oldest transaction id any currently running backend might still
* consider running. This should not be used for visibility / pruning
diff --git a/src/include/storage/procarray.h b/src/include/storage/procarray.h
index f3eba9b7640..c1f09c5d812 100644
--- a/src/include/storage/procarray.h
+++ b/src/include/storage/procarray.h
@@ -55,6 +55,7 @@ extern RunningTransactions GetRunningTransactionData(void);
extern bool TransactionIdIsInProgress(TransactionId xid);
extern bool TransactionIdIsActive(TransactionId xid);
extern TransactionId GetOldestNonRemovableTransactionId(Relation rel);
+extern TransactionId GetStrictOldestNonRemovableTransactionId(void);
extern TransactionId GetOldestTransactionIdConsideredRunning(void);
extern TransactionId GetOldestActiveTransactionId(void);
extern TransactionId GetOldestSafeDecodingTransactionId(bool catalogOnly);
--
2.39.3 (Apple Git-145)
On Sun, Jan 14, 2024 at 4:35 AM Alexander Korotkov <aekorotkov@gmail.com> wrote:
On Sat, Jan 13, 2024 at 7:33 PM Dmitry Koval <d.koval@postgrespro.ru> wrote:
Thank you, there is one small point left (in the comment): can you
replace "guarantteed to be to be newer" to "guaranteed to be newer",
file src/backend/storage/ipc/procarray.c?Fixed. Thank you for catching this.
I made the following improvements to the patch.
1. I find a way to implement the path with less changes to the core
code. The GetRunningTransactionData() function allows to get the
least running xid, all I need is to add database-aware values.
2. I added the TAP test reproducing the original issue.
I'm going to push this if no objections.
------
Regards,
Alexander Korotkov
Attachments:
v6-0001-Fix-false-reports-in-pg_visibility.patchapplication/octet-stream; name=v6-0001-Fix-false-reports-in-pg_visibility.patchDownload
From c1b15b8a4a165f75fc91eba64d3a3ce4f4234a9a Mon Sep 17 00:00:00 2001
From: Alexander Korotkov <akorotkov@postgresql.org>
Date: Mon, 6 Nov 2023 11:07:35 +0200
Subject: [PATCH v6] Fix false reports in pg_visibility
Currently, pg_visibility computes its xid horizon using the
GetOldestNonRemovableTransactionId(). The problem is that this horizon can
sometimes go backward. That can lead to reporting false errors.
In order to fix that, this commit implements a new function
GetStrictOldestNonRemovableTransactionId(). This function computes the xid
horizon, which would be guaranteed to be newer or equal to any xid horizon
computed before.
We have to do the following to achieve this.
1. Ignore processes xmin's, because they consider connection to other databases
that were ignored before.
2. Ignore KnownAssignedXids, because they are not database-aware. At the same
time, the primary could compute its horizons database-aware.
3. Ignore walsender xmin, because it could go backward if some replication
connections don't use replication slots.
As a result, we're using only currently running xids to compute the horizon.
Surely these would significantly sacrifice accuracy. But we have to do so to
avoid reporting false errors.
Inspired by earlier patch by Daniel Shelepanov and the following discussion
with Robert Haas and Tom Lane.
Discussion: https://postgr.es/m/1649062270.289865713%40f403.i.mail.ru
Reviewed-by: Alexander Lakhin, Dmitry Koval
---
contrib/pg_visibility/Makefile | 1 +
contrib/pg_visibility/meson.build | 4 ++
contrib/pg_visibility/pg_visibility.c | 68 +++++++++++++++++--
.../t/001_concurrent_transaction.pl | 48 +++++++++++++
src/backend/storage/ipc/procarray.c | 13 +++-
src/include/storage/standby.h | 1 +
6 files changed, 129 insertions(+), 6 deletions(-)
create mode 100644 contrib/pg_visibility/t/001_concurrent_transaction.pl
diff --git a/contrib/pg_visibility/Makefile b/contrib/pg_visibility/Makefile
index b3b1a89e47d..d3cb411cc90 100644
--- a/contrib/pg_visibility/Makefile
+++ b/contrib/pg_visibility/Makefile
@@ -11,6 +11,7 @@ DATA = pg_visibility--1.1.sql pg_visibility--1.1--1.2.sql \
PGFILEDESC = "pg_visibility - page visibility information"
REGRESS = pg_visibility
+TAP_TESTS = 1
ifdef USE_PGXS
PG_CONFIG = pg_config
diff --git a/contrib/pg_visibility/meson.build b/contrib/pg_visibility/meson.build
index 38e242d95c0..f932371f62d 100644
--- a/contrib/pg_visibility/meson.build
+++ b/contrib/pg_visibility/meson.build
@@ -32,5 +32,9 @@ tests += {
'sql': [
'pg_visibility',
],
+ 'tap': {
+ 'tests': [
+ 't/001_concurrent_transaction.pl',
+ ],
},
}
diff --git a/contrib/pg_visibility/pg_visibility.c b/contrib/pg_visibility/pg_visibility.c
index 17c50a44722..c654aca269c 100644
--- a/contrib/pg_visibility/pg_visibility.c
+++ b/contrib/pg_visibility/pg_visibility.c
@@ -19,6 +19,7 @@
#include "funcapi.h"
#include "miscadmin.h"
#include "storage/bufmgr.h"
+#include "storage/proc.h"
#include "storage/procarray.h"
#include "storage/smgr.h"
#include "utils/rel.h"
@@ -532,6 +533,63 @@ collect_visibility_data(Oid relid, bool include_pd)
return info;
}
+/*
+ * The "strict" version of GetOldestNonRemovableTransactionId(). The
+ * pg_visiblity check can tolerate false positives (don't report some of the
+ * errors), but can't tolerate false negatives (report false errors). Normally,
+ * horozons move forwards, but there are cases when it could move backward
+ * (see comment for ComputeXidHorizons()).
+ *
+ * This is why we have to implement our own function for xid horizon, which
+ * would be guaranteed to be newer or equal to any xid horizon computed before.
+ * We have to do the following to achieve this.
+ *
+ * 1. Ignore processes xmin's, because they consider connection to other
+ * databases that were ignored before.
+ * 2. Ignore KnownAssignedXids, because they are not database-aware. At the
+ * same time, the primary could compute its horizons database-aware.
+ * 3. Ignore walsender xmin, because it could go backward if some replication
+ * connections don't use replication slots.
+ *
+ * As a result, we're using only currently running xids to compute the horizon.
+ * Surely these would significantly sacrifice accuracy. But we have to do so
+ * to avoid reporting false errors.
+ */
+static TransactionId
+GetStrictOldestNonRemovableTransactionId(Relation rel)
+{
+ RunningTransactions runningTransactions;
+
+ if (rel == NULL || rel->rd_rel->relisshared || RecoveryInProgress())
+ {
+ /* Shared relation: take into account all running xids */
+ runningTransactions = GetRunningTransactionData();
+ LWLockRelease(ProcArrayLock);
+ LWLockRelease(XidGenLock);
+ return runningTransactions->oldestRunningXid;
+ }
+ else if (!RELATION_IS_LOCAL(rel))
+ {
+ /*
+ * Normal relation: take into account xids running within the current
+ * database
+ */
+ runningTransactions = GetRunningTransactionData();
+ LWLockRelease(ProcArrayLock);
+ LWLockRelease(XidGenLock);
+ return runningTransactions->oldestDatabaseRunningXid;
+ }
+ else
+ {
+ /*
+ * For temporary relations, ComputeXidHorizons() uses only
+ * TransamVariables->latestCompletedXid and MyProc->xid. These two
+ * shouldn't go backwards. So we're fine with this horizon.
+ */
+ return GetOldestNonRemovableTransactionId(rel);
+ }
+}
+
/*
* Returns a list of items whose visibility map information does not match
* the status of the tuples on the page.
@@ -563,7 +621,7 @@ collect_corrupt_items(Oid relid, bool all_visible, bool all_frozen)
check_relation_relkind(rel);
if (all_visible)
- OldestXmin = GetOldestNonRemovableTransactionId(rel);
+ OldestXmin = GetStrictOldestNonRemovableTransactionId(rel);
nblocks = RelationGetNumberOfBlocks(rel);
@@ -671,11 +729,11 @@ collect_corrupt_items(Oid relid, bool all_visible, bool all_frozen)
* retake ProcArrayLock here while we're holding the buffer
* exclusively locked, but it should be safe against
* deadlocks, because surely
- * GetOldestNonRemovableTransactionId() should never take a
- * buffer lock. And this shouldn't happen often, so it's worth
- * being careful so as to avoid false positives.
+ * GetStrictOldestNonRemovableTransactionId() should never
+ * take a buffer lock. And this shouldn't happen often, so
+ * it's worth being careful so as to avoid false positives.
*/
- RecomputedOldestXmin = GetOldestNonRemovableTransactionId(rel);
+ RecomputedOldestXmin = GetStrictOldestNonRemovableTransactionId(rel);
if (!TransactionIdPrecedes(OldestXmin, RecomputedOldestXmin))
record_corrupt_item(items, &tuple.t_self);
diff --git a/contrib/pg_visibility/t/001_concurrent_transaction.pl b/contrib/pg_visibility/t/001_concurrent_transaction.pl
new file mode 100644
index 00000000000..f0169963473
--- /dev/null
+++ b/contrib/pg_visibility/t/001_concurrent_transaction.pl
@@ -0,0 +1,48 @@
+
+# Copyright (c) 2021-2024, PostgreSQL Global Development Group
+
+# Check a concurrent transaction doesn't cause false negatives in
+# pg_check_visible() function
+use strict;
+use warnings FATAL => 'all';
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More;
+
+
+my $node = PostgreSQL::Test::Cluster->new('main');
+
+$node->init;
+$node->start;
+
+# Setup another database
+$node->safe_psql("postgres",
+ "CREATE DATABASE other_database;\n");
+my $bsession = $node->background_psql('other_database');
+
+# Run a concurrent transaction
+$bsession->query_safe(
+ qq[
+ BEGIN;
+ SELECT txid_current();
+]);
+
+# Create a sample table and run vacuum
+$node->safe_psql("postgres",
+ "CREATE EXTENSION pg_visibility;\n"
+ . "CREATE TABLE vacuum_test AS SELECT 42 i;\n"
+ . "VACUUM (disable_page_skipping) vacuum_test;");
+
+# Run pg_check_visible()
+my $result = $node->safe_psql("postgres",
+ "SELECT * FROM pg_check_visible('vacuum_test');");
+
+# There should be no false negatives
+ok($result eq "", "pg_check_visible() detects no errors");
+
+# Shutdown
+$bsession->query_safe("COMMIT;");
+$bsession->quit;
+$node->stop;
+
+done_testing();
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index 9eea1ed315a..a8f31aa091d 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -2688,6 +2688,7 @@ GetRunningTransactionData(void)
RunningTransactions CurrentRunningXacts = &CurrentRunningXactsData;
TransactionId latestCompletedXid;
TransactionId oldestRunningXid;
+ TransactionId oldestDatabaseRunningXid;
TransactionId *xids;
int index;
int count;
@@ -2732,7 +2733,7 @@ GetRunningTransactionData(void)
latestCompletedXid =
XidFromFullTransactionId(TransamVariables->latestCompletedXid);
- oldestRunningXid =
+ oldestDatabaseRunningXid = oldestRunningXid =
XidFromFullTransactionId(TransamVariables->nextXid);
/*
@@ -2740,6 +2741,8 @@ GetRunningTransactionData(void)
*/
for (index = 0; index < arrayP->numProcs; index++)
{
+ int pgprocno = arrayP->pgprocnos[index];
+ PGPROC *proc = &allProcs[pgprocno];
TransactionId xid;
/* Fetch xid just once - see GetNewTransactionId */
@@ -2760,6 +2763,13 @@ GetRunningTransactionData(void)
if (TransactionIdPrecedes(xid, oldestRunningXid))
oldestRunningXid = xid;
+ /*
+ * Also, update the oldest running xid within the current database.
+ */
+ if (proc->databaseId == MyDatabaseId &&
+ TransactionIdPrecedes(xid, oldestRunningXid))
+ oldestDatabaseRunningXid = xid;
+
if (ProcGlobal->subxidStates[index].overflowed)
suboverflowed = true;
@@ -2826,6 +2836,7 @@ GetRunningTransactionData(void)
CurrentRunningXacts->subxid_overflow = suboverflowed;
CurrentRunningXacts->nextXid = XidFromFullTransactionId(TransamVariables->nextXid);
CurrentRunningXacts->oldestRunningXid = oldestRunningXid;
+ CurrentRunningXacts->oldestDatabaseRunningXid = oldestDatabaseRunningXid;
CurrentRunningXacts->latestCompletedXid = latestCompletedXid;
Assert(TransactionIdIsValid(CurrentRunningXacts->nextXid));
diff --git a/src/include/storage/standby.h b/src/include/storage/standby.h
index 534d56fbc9f..52e8eebe087 100644
--- a/src/include/storage/standby.h
+++ b/src/include/storage/standby.h
@@ -82,6 +82,7 @@ typedef struct RunningTransactionsData
bool subxid_overflow; /* snapshot overflowed, subxids missing */
TransactionId nextXid; /* xid from TransamVariables->nextXid */
TransactionId oldestRunningXid; /* *not* oldestXmin */
+ TransactionId oldestDatabaseRunningXid; /* same as above, but within the current database */
TransactionId latestCompletedXid; /* so we can set xmax */
TransactionId *xids; /* array of (sub)xids still running */
--
2.39.3 (Apple Git-145)
On Tue, Mar 12, 2024 at 02:10:59PM +0200, Alexander Korotkov wrote:
I'm going to push this if no objections.
Commit e85662d wrote:
--- a/src/backend/storage/ipc/procarray.c +++ b/src/backend/storage/ipc/procarray.c
@@ -2740,6 +2741,8 @@ GetRunningTransactionData(void) */ for (index = 0; index < arrayP->numProcs; index++) { + int pgprocno = arrayP->pgprocnos[index]; + PGPROC *proc = &allProcs[pgprocno]; TransactionId xid;/* Fetch xid just once - see GetNewTransactionId */
@@ -2760,6 +2763,13 @@ GetRunningTransactionData(void)
if (TransactionIdPrecedes(xid, oldestRunningXid))
oldestRunningXid = xid;+ /* + * Also, update the oldest running xid within the current database. + */ + if (proc->databaseId == MyDatabaseId && + TransactionIdPrecedes(xid, oldestRunningXid)) + oldestDatabaseRunningXid = xid;
Shouldn't that be s/oldestRunningXid/oldestDatabaseRunningXid/?
While this isn't a hot path, I likely would test TransactionIdPrecedes()
before fetching pgprocno and PGPROC, to reduce wasted cache misses.
On Mon, Jul 1, 2024 at 2:18 AM Noah Misch <noah@leadboat.com> wrote:
On Tue, Mar 12, 2024 at 02:10:59PM +0200, Alexander Korotkov wrote:
I'm going to push this if no objections.
Commit e85662d wrote:
--- a/src/backend/storage/ipc/procarray.c +++ b/src/backend/storage/ipc/procarray.c@@ -2740,6 +2741,8 @@ GetRunningTransactionData(void) */ for (index = 0; index < arrayP->numProcs; index++) { + int pgprocno = arrayP->pgprocnos[index]; + PGPROC *proc = &allProcs[pgprocno]; TransactionId xid;/* Fetch xid just once - see GetNewTransactionId */
@@ -2760,6 +2763,13 @@ GetRunningTransactionData(void)
if (TransactionIdPrecedes(xid, oldestRunningXid))
oldestRunningXid = xid;+ /* + * Also, update the oldest running xid within the current database. + */ + if (proc->databaseId == MyDatabaseId && + TransactionIdPrecedes(xid, oldestRunningXid)) + oldestDatabaseRunningXid = xid;Shouldn't that be s/oldestRunningXid/oldestDatabaseRunningXid/?
Thank you for catching this.
While this isn't a hot path, I likely would test TransactionIdPrecedes()
before fetching pgprocno and PGPROC, to reduce wasted cache misses.
And thanks for suggestion.
The patchset is attached. 0001 implements
s/oldestRunningXid/oldestDatabaseRunningXid/. 0002 implements cache
misses optimization.
If no objection, I'll backpatch 0001 and apply 0002 to the head.
------
Regards,
Alexander Korotkov
Supabase
Attachments:
v1-0002-Optimize-memory-access-in-GetRunningTransactionDa.patchapplication/octet-stream; name=v1-0002-Optimize-memory-access-in-GetRunningTransactionDa.patchDownload
From 622361df39c04f1efd8c9febeb559c952ba843fb Mon Sep 17 00:00:00 2001
From: Alexander Korotkov <akorotkov@postgresql.org>
Date: Wed, 3 Jul 2024 00:21:13 +0300
Subject: [PATCH v1 2/2] Optimize memory access in GetRunningTransactionData()
e85662df44 made GetRunningTransactionData() calculate the oldest running
transaction id within the current database. This commit optimized this
calculation by performing a cheap transaction id comparison before fetching
the process database id, while the latter could cause extra cache misses.
Reported-by: Noah Misch
Discussion: https://postgr.es/m/20240630231816.bf.nmisch%40google.com
---
src/backend/storage/ipc/procarray.c | 17 +++++++++++------
1 file changed, 11 insertions(+), 6 deletions(-)
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index 9fc930e98f8..16b5803d388 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -2753,8 +2753,6 @@ GetRunningTransactionData(void)
*/
for (index = 0; index < arrayP->numProcs; index++)
{
- int pgprocno = arrayP->pgprocnos[index];
- PGPROC *proc = &allProcs[pgprocno];
TransactionId xid;
/* Fetch xid just once - see GetNewTransactionId */
@@ -2776,11 +2774,18 @@ GetRunningTransactionData(void)
oldestRunningXid = xid;
/*
- * Also, update the oldest running xid within the current database.
+ * Also, update the oldest running xid within the current database. As
+ * fetching pgprocno and PGPROC could cause cache misses, we do cheap
+ * TransactionId comparison first.
*/
- if (proc->databaseId == MyDatabaseId &&
- TransactionIdPrecedes(xid, oldestDatabaseRunningXid))
- oldestDatabaseRunningXid = xid;
+ if (TransactionIdPrecedes(xid, oldestDatabaseRunningXid))
+ {
+ int pgprocno = arrayP->pgprocnos[index];
+ PGPROC *proc = &allProcs[pgprocno];
+
+ if (proc->databaseId == MyDatabaseId)
+ oldestDatabaseRunningXid = xid;
+ }
if (ProcGlobal->subxidStates[index].overflowed)
suboverflowed = true;
--
2.39.3 (Apple Git-145)
v1-0001-Fix-typo-in-GetRunningTransactionData.patchapplication/octet-stream; name=v1-0001-Fix-typo-in-GetRunningTransactionData.patchDownload
From 75ebdd659785a7a215cee93b5ae1e896dc98aea9 Mon Sep 17 00:00:00 2001
From: Alexander Korotkov <akorotkov@postgresql.org>
Date: Wed, 3 Jul 2024 00:16:21 +0300
Subject: [PATCH v1 1/2] Fix typo in GetRunningTransactionData()
e85662df44 made GetRunningTransactionData() calculate the oldest running
transaction id within the current database. However, because of the typo,
the new code uses oldestRunningXid instead of oldestDatabaseRunningXid
in comparison before updating oldestDatabaseRunningXid. This commit fixes
that issue.
Reported-by: Noah Misch
Discussion: https://postgr.es/m/20240630231816.bf.nmisch%40google.com
Backpatch-through: 17
---
src/backend/storage/ipc/procarray.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index 387b4a405b0..9fc930e98f8 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -2779,7 +2779,7 @@ GetRunningTransactionData(void)
* Also, update the oldest running xid within the current database.
*/
if (proc->databaseId == MyDatabaseId &&
- TransactionIdPrecedes(xid, oldestRunningXid))
+ TransactionIdPrecedes(xid, oldestDatabaseRunningXid))
oldestDatabaseRunningXid = xid;
if (ProcGlobal->subxidStates[index].overflowed)
--
2.39.3 (Apple Git-145)
On Wed, Jul 03, 2024 at 12:31:48AM +0300, Alexander Korotkov wrote:
On Mon, Jul 1, 2024 at 2:18 AM Noah Misch <noah@leadboat.com> wrote:
Commit e85662d wrote:
--- a/src/backend/storage/ipc/procarray.c +++ b/src/backend/storage/ipc/procarray.c@@ -2740,6 +2741,8 @@ GetRunningTransactionData(void) */ for (index = 0; index < arrayP->numProcs; index++) { + int pgprocno = arrayP->pgprocnos[index]; + PGPROC *proc = &allProcs[pgprocno]; TransactionId xid;/* Fetch xid just once - see GetNewTransactionId */
@@ -2760,6 +2763,13 @@ GetRunningTransactionData(void)
if (TransactionIdPrecedes(xid, oldestRunningXid))
oldestRunningXid = xid;+ /* + * Also, update the oldest running xid within the current database. + */ + if (proc->databaseId == MyDatabaseId && + TransactionIdPrecedes(xid, oldestRunningXid)) + oldestDatabaseRunningXid = xid;Shouldn't that be s/oldestRunningXid/oldestDatabaseRunningXid/?
Thank you for catching this.
While this isn't a hot path, I likely would test TransactionIdPrecedes()
before fetching pgprocno and PGPROC, to reduce wasted cache misses.And thanks for suggestion.
The patchset is attached. 0001 implements
s/oldestRunningXid/oldestDatabaseRunningXid/. 0002 implements cache
misses optimization.If no objection, I'll backpatch 0001 and apply 0002 to the head.
Looks fine. I'd drop the comment update as saying the obvious, but keeping it
is okay.
This causes an assertion failure when executed in a hot standby server:
select * from pg_check_visible('pg_database');
TRAP: failed Assert("!RecoveryInProgress()"), File:
"../src/backend/storage/ipc/procarray.c", Line: 2710, PID: 1142572
GetStrictOldestNonRemovableTransactionId does this:
if (rel == NULL || rel->rd_rel->relisshared || RecoveryInProgress())
{
/* Shared relation: take into account all running xids */
runningTransactions = GetRunningTransactionData();
LWLockRelease(ProcArrayLock);
LWLockRelease(XidGenLock);
return runningTransactions->oldestRunningXid;
}
And GetRunningTransactionData() has this:
Assert(!RecoveryInProgress());
So it's easy to see that you will hit that assertion.
--
Heikki Linnakangas
Neon (https://neon.tech)
On Tue, Aug 13, 2024 at 9:39 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
This causes an assertion failure when executed in a hot standby server:
select * from pg_check_visible('pg_database');
TRAP: failed Assert("!RecoveryInProgress()"), File:
"../src/backend/storage/ipc/procarray.c", Line: 2710, PID: 1142572GetStrictOldestNonRemovableTransactionId does this:
if (rel == NULL || rel->rd_rel->relisshared || RecoveryInProgress())
{
/* Shared relation: take into account all running xids */
runningTransactions = GetRunningTransactionData();
LWLockRelease(ProcArrayLock);
LWLockRelease(XidGenLock);
return runningTransactions->oldestRunningXid;
}And GetRunningTransactionData() has this:
Assert(!RecoveryInProgress());
So it's easy to see that you will hit that assertion.
Oh, thank you!
I'll fix this and add a test for recovery!
------
Regards,
Alexander Korotkov
Supabase
On Tue, Aug 13, 2024 at 10:15 PM Alexander Korotkov
<aekorotkov@gmail.com> wrote:
On Tue, Aug 13, 2024 at 9:39 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
This causes an assertion failure when executed in a hot standby server:
select * from pg_check_visible('pg_database');
TRAP: failed Assert("!RecoveryInProgress()"), File:
"../src/backend/storage/ipc/procarray.c", Line: 2710, PID: 1142572GetStrictOldestNonRemovableTransactionId does this:
if (rel == NULL || rel->rd_rel->relisshared || RecoveryInProgress())
{
/* Shared relation: take into account all running xids */
runningTransactions = GetRunningTransactionData();
LWLockRelease(ProcArrayLock);
LWLockRelease(XidGenLock);
return runningTransactions->oldestRunningXid;
}And GetRunningTransactionData() has this:
Assert(!RecoveryInProgress());
So it's easy to see that you will hit that assertion.
Oh, thank you!
I'll fix this and add a test for recovery!
Attached patch fixes the problem and adds the corresponding test. I
would appreciate if you take a look at it.
------
Regards,
Alexander Korotkov
Supabase
Attachments:
v1-0001-Fix-GetStrictOldestNonRemovableTransactionId-on-s.patchapplication/octet-stream; name=v1-0001-Fix-GetStrictOldestNonRemovableTransactionId-on-s.patchDownload
From 0d4b46ac9e69ef6289d8dbf2c52e505c48515cd5 Mon Sep 17 00:00:00 2001
From: Alexander Korotkov <akorotkov@postgresql.org>
Date: Wed, 14 Aug 2024 03:13:57 +0300
Subject: [PATCH v1] Fix GetStrictOldestNonRemovableTransactionId() on standby
e85662df44 implemented GetStrictOldestNonRemovableTransactionId() function
for computation of xid horizon that avoid reporting of false errors.
However, GetStrictOldestNonRemovableTransactionId() uses
GetRunningTransactionData() even on standby leading to an assertion failure.
Given that we decided to ignore KnownAssignedXids and standby can't have
own running xids, we switch to use TransamVariables->nextXid as a xid horizon.
Reported-by: Heikki Linnakangas
Discussion: https://postgr.es/m/42218c4f-2c8d-40a3-8743-4d34dd0e4cce%40iki.fi
---
contrib/pg_visibility/pg_visibility.c | 12 +++++++++++-
.../t/001_concurrent_transaction.pl | 19 +++++++++++++++++--
2 files changed, 28 insertions(+), 3 deletions(-)
diff --git a/contrib/pg_visibility/pg_visibility.c b/contrib/pg_visibility/pg_visibility.c
index 1a1a4ff7be7..067c804e00d 100644
--- a/contrib/pg_visibility/pg_visibility.c
+++ b/contrib/pg_visibility/pg_visibility.c
@@ -560,7 +560,17 @@ GetStrictOldestNonRemovableTransactionId(Relation rel)
{
RunningTransactions runningTransactions;
- if (rel == NULL || rel->rd_rel->relisshared || RecoveryInProgress())
+ if (RecoveryInProgress())
+ {
+ TransactionId result;
+
+ /* As we ignore KnownAssignedXids on standby, just pick nextXid */
+ LWLockAcquire(XidGenLock, LW_SHARED);
+ result = XidFromFullTransactionId(TransamVariables->nextXid);
+ LWLockRelease(XidGenLock);
+ return result;
+ }
+ else if (rel == NULL || rel->rd_rel->relisshared)
{
/* Shared relation: take into account all running xids */
runningTransactions = GetRunningTransactionData();
diff --git a/contrib/pg_visibility/t/001_concurrent_transaction.pl b/contrib/pg_visibility/t/001_concurrent_transaction.pl
index c31d041757d..498ce412d9a 100644
--- a/contrib/pg_visibility/t/001_concurrent_transaction.pl
+++ b/contrib/pg_visibility/t/001_concurrent_transaction.pl
@@ -10,11 +10,18 @@ use PostgreSQL::Test::Utils;
use Test::More;
+# Initialize the primary node
my $node = PostgreSQL::Test::Cluster->new('main');
-
-$node->init;
+$node->init(allows_streaming => 1);
$node->start;
+# Initialize the streaming standby
+my $backup_name = 'my_backup';
+$node->backup($backup_name);
+my $standby = PostgreSQL::Test::Cluster->new('standby');
+$standby->init_from_backup($node, $backup_name, has_streaming => 1);
+$standby->start;
+
# Setup another database
$node->safe_psql("postgres", "CREATE DATABASE other_database;\n");
my $bsession = $node->background_psql('other_database');
@@ -39,9 +46,17 @@ my $result = $node->safe_psql("postgres",
# There should be no false negatives
ok($result eq "", "pg_check_visible() detects no errors");
+# Run pg_check_visible() on standby
+$result = $standby->safe_psql("postgres",
+ "SELECT * FROM pg_check_visible('vacuum_test');");
+
+# There should be no false negatives either
+ok($result eq "", "pg_check_visible() detects no errors");
+
# Shutdown
$bsession->query_safe("COMMIT;");
$bsession->quit;
$node->stop;
+$standby->stop;
done_testing();
--
2.39.3 (Apple Git-146)
On 14/08/2024 04:51, Alexander Korotkov wrote:
On Tue, Aug 13, 2024 at 10:15 PM Alexander Korotkov
<aekorotkov@gmail.com> wrote:On Tue, Aug 13, 2024 at 9:39 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
This causes an assertion failure when executed in a hot standby server:
select * from pg_check_visible('pg_database');
TRAP: failed Assert("!RecoveryInProgress()"), File:
"../src/backend/storage/ipc/procarray.c", Line: 2710, PID: 1142572GetStrictOldestNonRemovableTransactionId does this:
if (rel == NULL || rel->rd_rel->relisshared || RecoveryInProgress())
{
/* Shared relation: take into account all running xids */
runningTransactions = GetRunningTransactionData();
LWLockRelease(ProcArrayLock);
LWLockRelease(XidGenLock);
return runningTransactions->oldestRunningXid;
}And GetRunningTransactionData() has this:
Assert(!RecoveryInProgress());
So it's easy to see that you will hit that assertion.
Oh, thank you!
I'll fix this and add a test for recovery!Attached patch fixes the problem and adds the corresponding test. I
would appreciate if you take a look at it.
The code changes seem fine. I think the "Ignore KnownAssignedXids"
comment above the function could be made more clear. It's not wrong, but
I think it doesn't explain the reasoning very well:
* We are now doing no effectively no checking in a standby, because we
always just use nextXid. It's better than nothing, I suppose it will
catch very broken cases where an XID is in the future, but that's all.
* We *could* use KnownAssignedXids for shared catalogs, because with
shared catalogs, the global horizon is used, not a database-aware one.
* Then again, there might be rare corner cases that a transaction has
crashed in the primary without writing a commit/abort record, and hence
it looks like it's still running in the standby but has already ended in
the primary. So I think it's good we ignore KnownAssignedXids for shared
catalogs anyway.
--
Heikki Linnakangas
Neon (https://neon.tech)
On Wed, Aug 14, 2024 at 10:20 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
On 14/08/2024 04:51, Alexander Korotkov wrote:
On Tue, Aug 13, 2024 at 10:15 PM Alexander Korotkov
<aekorotkov@gmail.com> wrote:On Tue, Aug 13, 2024 at 9:39 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
This causes an assertion failure when executed in a hot standby server:
select * from pg_check_visible('pg_database');
TRAP: failed Assert("!RecoveryInProgress()"), File:
"../src/backend/storage/ipc/procarray.c", Line: 2710, PID: 1142572GetStrictOldestNonRemovableTransactionId does this:
if (rel == NULL || rel->rd_rel->relisshared || RecoveryInProgress())
{
/* Shared relation: take into account all running xids */
runningTransactions = GetRunningTransactionData();
LWLockRelease(ProcArrayLock);
LWLockRelease(XidGenLock);
return runningTransactions->oldestRunningXid;
}And GetRunningTransactionData() has this:
Assert(!RecoveryInProgress());
So it's easy to see that you will hit that assertion.
Oh, thank you!
I'll fix this and add a test for recovery!Attached patch fixes the problem and adds the corresponding test. I
would appreciate if you take a look at it.The code changes seem fine. I think the "Ignore KnownAssignedXids"
comment above the function could be made more clear. It's not wrong, but
I think it doesn't explain the reasoning very well:* We are now doing no effectively no checking in a standby, because we
always just use nextXid. It's better than nothing, I suppose it will
catch very broken cases where an XID is in the future, but that's all.* We *could* use KnownAssignedXids for shared catalogs, because with
shared catalogs, the global horizon is used, not a database-aware one.* Then again, there might be rare corner cases that a transaction has
crashed in the primary without writing a commit/abort record, and hence
it looks like it's still running in the standby but has already ended in
the primary. So I think it's good we ignore KnownAssignedXids for shared
catalogs anyway.
Thank you for the detailed explanation. I've updated the
GetStrictOldestNonRemovableTransactionId() header comment accordingly.
I'm going to push this if no objections.
------
Regards,
Alexander Korotkov
Supabase
Attachments:
v2-0001-Fix-GetStrictOldestNonRemovableTransactionId-on-s.patchapplication/octet-stream; name=v2-0001-Fix-GetStrictOldestNonRemovableTransactionId-on-s.patchDownload
From 463ba8129abfebd4f56db416b47e4c6220ec95c8 Mon Sep 17 00:00:00 2001
From: Alexander Korotkov <akorotkov@postgresql.org>
Date: Wed, 14 Aug 2024 03:13:57 +0300
Subject: [PATCH v2] Fix GetStrictOldestNonRemovableTransactionId() on standby
e85662df44 implemented GetStrictOldestNonRemovableTransactionId() function
for computation of xid horizon that avoid reporting of false errors.
However, GetStrictOldestNonRemovableTransactionId() uses
GetRunningTransactionData() even on standby leading to an assertion failure.
Given that we decided to ignore KnownAssignedXids and standby can't have
own running xids, we switch to use TransamVariables->nextXid as a xid horizon.
Also, revise the comment regarding ignoring KnownAssignedXids with more
detailed reasoning provided by Heikki.
Reported-by: Heikki Linnakangas
Discussion: https://postgr.es/m/42218c4f-2c8d-40a3-8743-4d34dd0e4cce%40iki.fi
---
contrib/pg_visibility/pg_visibility.c | 26 ++++++++++++++++---
.../t/001_concurrent_transaction.pl | 19 ++++++++++++--
2 files changed, 40 insertions(+), 5 deletions(-)
diff --git a/contrib/pg_visibility/pg_visibility.c b/contrib/pg_visibility/pg_visibility.c
index 1a1a4ff7be7..9ee2738da0a 100644
--- a/contrib/pg_visibility/pg_visibility.c
+++ b/contrib/pg_visibility/pg_visibility.c
@@ -546,11 +546,21 @@ collect_visibility_data(Oid relid, bool include_pd)
*
* 1. Ignore processes xmin's, because they consider connection to other
* databases that were ignored before.
- * 2. Ignore KnownAssignedXids, because they are not database-aware. At the
- * same time, the primary could compute its horizons database-aware.
+ * 2. Ignore KnownAssignedXids, as they are not database-aware. Although we
+ * now perform minimal checking on a standby by always using nextXid, this
+ * approach is better than nothing and will at least catch extremely broken
+ * cases where a xid is in the future.
* 3. Ignore walsender xmin, because it could go backward if some replication
* connections don't use replication slots.
*
+ * While it might seem like we could use KnownAssignedXids for shared
+ * catalogs, since shared catalogs rely on a global horizon rather than a
+ * database-specific one—there are potential edge cases. For example, a
+ * transaction may crash on the primary without writing a commit/abort record.
+ * This would lead to a situation where it appears to still be running on the
+ * standby, even though it has already ended on the primary. For this reason,
+ * it's safer to ignore KnownAssignedXids, even for shared catalogs.
+ *
* As a result, we're using only currently running xids to compute the horizon.
* Surely these would significantly sacrifice accuracy. But we have to do so
* to avoid reporting false errors.
@@ -560,7 +570,17 @@ GetStrictOldestNonRemovableTransactionId(Relation rel)
{
RunningTransactions runningTransactions;
- if (rel == NULL || rel->rd_rel->relisshared || RecoveryInProgress())
+ if (RecoveryInProgress())
+ {
+ TransactionId result;
+
+ /* As we ignore KnownAssignedXids on standby, just pick nextXid */
+ LWLockAcquire(XidGenLock, LW_SHARED);
+ result = XidFromFullTransactionId(TransamVariables->nextXid);
+ LWLockRelease(XidGenLock);
+ return result;
+ }
+ else if (rel == NULL || rel->rd_rel->relisshared)
{
/* Shared relation: take into account all running xids */
runningTransactions = GetRunningTransactionData();
diff --git a/contrib/pg_visibility/t/001_concurrent_transaction.pl b/contrib/pg_visibility/t/001_concurrent_transaction.pl
index c31d041757d..498ce412d9a 100644
--- a/contrib/pg_visibility/t/001_concurrent_transaction.pl
+++ b/contrib/pg_visibility/t/001_concurrent_transaction.pl
@@ -10,11 +10,18 @@ use PostgreSQL::Test::Utils;
use Test::More;
+# Initialize the primary node
my $node = PostgreSQL::Test::Cluster->new('main');
-
-$node->init;
+$node->init(allows_streaming => 1);
$node->start;
+# Initialize the streaming standby
+my $backup_name = 'my_backup';
+$node->backup($backup_name);
+my $standby = PostgreSQL::Test::Cluster->new('standby');
+$standby->init_from_backup($node, $backup_name, has_streaming => 1);
+$standby->start;
+
# Setup another database
$node->safe_psql("postgres", "CREATE DATABASE other_database;\n");
my $bsession = $node->background_psql('other_database');
@@ -39,9 +46,17 @@ my $result = $node->safe_psql("postgres",
# There should be no false negatives
ok($result eq "", "pg_check_visible() detects no errors");
+# Run pg_check_visible() on standby
+$result = $standby->safe_psql("postgres",
+ "SELECT * FROM pg_check_visible('vacuum_test');");
+
+# There should be no false negatives either
+ok($result eq "", "pg_check_visible() detects no errors");
+
# Shutdown
$bsession->query_safe("COMMIT;");
$bsession->quit;
$node->stop;
+$standby->stop;
done_testing();
--
2.39.3 (Apple Git-146)