Pruning never visible changes
A user asked me whether we prune never visible changes, such as
BEGIN;
INSERT...
UPDATE.. (same row)
COMMIT;
Once committed, the original insert is no longer visible to anyone, so
"ought to be able to be pruned", sayeth the user. And they also say
that changing the app is much harder, as ever.
After some thought, Yes, we can prune, but not in all cases - only if
the never visible tuple is at the root end of the update chain. The
only question is can that be done cheaply enough to bother with. The
answer in one specific case is Yes, in other cases No.
This patch adds a new test for this use case, and code to remove the
never visible row when the changes are made by the same xid.
(I'm pretty sure there used to be a test for this some years back and
I'm guessing it was removed because it isn't always possible to remove
the tuple, which this new patch honours.)
Please let me know what you think.
--
Simon Riggs http://www.EnterpriseDB.com/
Attachments:
never_visible.v1.patchapplication/octet-stream; name=never_visible.v1.patchDownload
commit 70973802b461b3752c520a9e43d2bb96ebb36f14
Author: Simon Riggs <simon.riggs@enterprisedb.com>
Date: Mon Sep 5 14:37:11 2022 +0100
Improve effectiveness of heap pruning for never visible tuples
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 9f43bbe25f..15548896b8 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -597,7 +597,8 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum, PruneState *prstate)
HeapTupleHeader htup;
OffsetNumber latestdead = InvalidOffsetNumber,
maxoff = PageGetMaxOffsetNumber(dp),
- offnum;
+ offnum,
+ priorOffnum;
OffsetNumber chainitems[MaxHeapTuplesPerPage];
int nchain = 0,
i;
@@ -654,7 +655,8 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum, PruneState *prstate)
{
ItemId lp;
bool tupdead,
- recent_dead;
+ recent_dead,
+ never_visible;
/* Sanity check (pure paranoia) */
if (offnum < FirstOffsetNumber)
@@ -718,7 +720,7 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum, PruneState *prstate)
/*
* Check tuple's visibility status.
*/
- tupdead = recent_dead = false;
+ tupdead = recent_dead = never_visible = false;
switch ((HTSV_Result) prstate->htsv[offnum])
{
@@ -727,16 +729,32 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum, PruneState *prstate)
break;
case HEAPTUPLE_RECENTLY_DEAD:
+ {
+ TransactionId prune_xmin = HeapTupleHeaderGetXmin(htup);
+ TransactionId prune_xmax = HeapTupleHeaderGetUpdateXid(htup);
recent_dead = true;
+ /*
+ * Row versions that are both inserted and updated by the same
+ * transaction will never be visible outside of the transaction
+ * and might be removable, even though only recently dead.
+ * This check can be performed very cheaply with data at hand.
+ *
+ * It would also be correct to check for rows inserted in one
+ * subtransaction, then later inserted in a later subtransaction
+ * within one top-level transaction, but we'd need to hit subtrans
+ * with lots of requests, so it will be very slow.
+ */
+ if (TransactionIdEquals(prune_xmin, prune_xmax))
+ never_visible = true;
+
/*
* This tuple may soon become DEAD. Update the hint field so
* that the page is reconsidered for pruning in future.
*/
- heap_prune_record_prunable(prstate,
- HeapTupleHeaderGetUpdateXid(htup));
+ heap_prune_record_prunable(prstate,prune_xmax);
break;
-
+ }
case HEAPTUPLE_DELETE_IN_PROGRESS:
/*
@@ -769,8 +787,19 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum, PruneState *prstate)
* but we can't advance past anything else. We have to make sure that
* we don't miss any DEAD tuples, since DEAD tuples that still have
* tuple storage after pruning will confuse VACUUM.
+ *
+ * There are some useful edge cases where we can remove tuples, even
+ * when they are only recently dead:
+ * - We can remove never_visible tuples at the start of the chain.
+ * - If the last member of the chain was latestdead then we can also
+ * remove never_visible tuples immediately afterwards, since doing
+ * so will not break the chain. Doing this can greatly increase the
+ * effectiveness of pruning when an application performs an INSERT
+ * then an UPDATEs the row in the same transaction, or when an app
+ * performs multiple UPDATEs of the same row in the same transaction.
*/
- if (tupdead)
+ if (tupdead ||
+ (never_visible && latestdead == priorOffnum))
{
latestdead = offnum;
HeapTupleHeaderAdvanceLatestRemovedXid(htup,
@@ -794,6 +823,7 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum, PruneState *prstate)
*/
Assert(ItemPointerGetBlockNumber(&htup->t_ctid) ==
BufferGetBlockNumber(buffer));
+ priorOffnum = offnum;
offnum = ItemPointerGetOffsetNumber(&htup->t_ctid);
priorXmax = HeapTupleHeaderGetUpdateXid(htup);
}
diff --git a/src/test/isolation/expected/never-visible.out b/src/test/isolation/expected/never-visible.out
new file mode 100644
index 0000000000..cbc1f25cb2
--- /dev/null
+++ b/src/test/isolation/expected/never-visible.out
@@ -0,0 +1,12 @@
+Parsed test spec with 2 sessions
+
+starting permutation: s1b s1i s2b s2i s2u s2u s2c s2v s1c
+step s1b: BEGIN;
+step s1i: INSERT INTO t (id) VALUES (1);
+step s2b: BEGIN;
+step s2i: INSERT INTO t (id) VALUES (2);
+step s2u: UPDATE t SET txt = 'a' WHERE id = 2;
+step s2u: UPDATE t SET txt = 'a' WHERE id = 2;
+step s2c: COMMIT;
+step s2v: VACUUM t;
+step s1c: COMMIT;
diff --git a/src/test/isolation/isolation_schedule b/src/test/isolation/isolation_schedule
index 529a4cbd4d..49f72a7069 100644
--- a/src/test/isolation/isolation_schedule
+++ b/src/test/isolation/isolation_schedule
@@ -16,6 +16,7 @@ test: ri-trigger
test: partial-index
test: two-ids
test: multiple-row-versions
+test: never-visible
test: index-only-scan
test: predicate-lock-hot-tuple
test: update-conflict-out
diff --git a/src/test/isolation/specs/never-visible.spec b/src/test/isolation/specs/never-visible.spec
new file mode 100644
index 0000000000..67d995ee87
--- /dev/null
+++ b/src/test/isolation/specs/never-visible.spec
@@ -0,0 +1,28 @@
+# Never visible test
+#
+# Test pruning of rows that were never visible outside of the
+# originating transaction.
+
+setup
+{
+ CREATE TABLE t (id int NOT NULL, txt text) WITH (fillfactor=50);
+}
+
+teardown
+{
+ DROP TABLE t;
+}
+
+session s1
+step s1b { BEGIN; }
+step s1i { INSERT INTO t (id) VALUES (1); }
+step s1c { COMMIT; }
+
+session s2
+step s2b { BEGIN; }
+step s2i { INSERT INTO t (id) VALUES (2); }
+step s2u { UPDATE t SET txt = 'a' WHERE id = 2; }
+step s2c { COMMIT; }
+step s2v { VACUUM t; }
+
+permutation s1b s1i s2b s2i s2u s2u s2c s2v s1c
Simon Riggs <simon.riggs@enterprisedb.com> writes:
A user asked me whether we prune never visible changes, such as
BEGIN;
INSERT...
UPDATE.. (same row)
COMMIT;
Didn't we just have this discussion in another thread? You cannot
do that, at least not without checking that the originating
transaction has no snapshots that could see the older row version.
I'm not sure whether or not snapmgr.c has enough information to
determine that, but in any case this formulation is surely
unsafe, because it isn't even checking whether that transaction is
our own, let alone asking snapmgr.c.
I'm dubious that a safe version would fire often enough to be
worth the cycles spent.
regards, tom lane
On Fri, 16 Sept 2022 at 15:26, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Simon Riggs <simon.riggs@enterprisedb.com> writes:
A user asked me whether we prune never visible changes, such as
BEGIN;
INSERT...
UPDATE.. (same row)
COMMIT;Didn't we just have this discussion in another thread?
Not that I was aware of, but it sounds like a different case anyway.
You cannot
do that, at least not without checking that the originating
transaction has no snapshots that could see the older row version.
I'm saying this is possible only AFTER the end of the originating
xact, so there are no issues with additional snapshots.
i.e. the never visible row has to be judged RECENTLY_DEAD before we even check.
--
Simon Riggs http://www.EnterpriseDB.com/
Simon Riggs <simon.riggs@enterprisedb.com> writes:
On Fri, 16 Sept 2022 at 15:26, Tom Lane <tgl@sss.pgh.pa.us> wrote:
You cannot
do that, at least not without checking that the originating
transaction has no snapshots that could see the older row version.
I'm saying this is possible only AFTER the end of the originating
xact, so there are no issues with additional snapshots.
I see, so the point is just that we can prune even if the originating
xact hasn't yet passed the global xmin horizon. I agree that's safe,
but will it fire often enough to be worth the trouble? Also, why
does it need to be restricted to certain shapes of HOT chains ---
that is, why can't we do exactly what we'd do if the xact *were*
past the xmin horizon?
regards, tom lane
On Fri, 2022-09-16 at 10:26 -0400, Tom Lane wrote:
Simon Riggs <simon.riggs@enterprisedb.com> writes:
A user asked me whether we prune never visible changes, such as
BEGIN;
INSERT...
UPDATE.. (same row)
COMMIT;Didn't we just have this discussion in another thread?
For reference: that was
/messages/by-id/f6a491b32cb44bb5daaafec835364f7149348fa1.camel@cybertec.at
Yours,
Laurenz Albe
On Fri, 16 Sept 2022 at 21:07, Laurenz Albe <laurenz.albe@cybertec.at> wrote:
On Fri, 2022-09-16 at 10:26 -0400, Tom Lane wrote:
Simon Riggs <simon.riggs@enterprisedb.com> writes:
A user asked me whether we prune never visible changes, such as
BEGIN;
INSERT...
UPDATE.. (same row)
COMMIT;Didn't we just have this discussion in another thread?
For reference: that was
/messages/by-id/f6a491b32cb44bb5daaafec835364f7149348fa1.camel@cybertec.at
Thanks. I confirm I hadn't seen that, and indeed, I wrote the patch on
5 Sept before you posted.
--
Simon Riggs http://www.EnterpriseDB.com/
On Fri, 16 Sept 2022 at 18:37, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Simon Riggs <simon.riggs@enterprisedb.com> writes:
On Fri, 16 Sept 2022 at 15:26, Tom Lane <tgl@sss.pgh.pa.us> wrote:
You cannot
do that, at least not without checking that the originating
transaction has no snapshots that could see the older row version.I'm saying this is possible only AFTER the end of the originating
xact, so there are no issues with additional snapshots.I see, so the point is just that we can prune even if the originating
xact hasn't yet passed the global xmin horizon. I agree that's safe,
but will it fire often enough to be worth the trouble?
It is an edge case with limited utility, I agree, which is why I
looked for a cheap test (xmin == xmax only).
This additional test is also valid, but too expensive to apply:
TransactionIdGetTopmostTranactionId(xmax) ==
TransactionIdGetTopmostTranactionId(xmin)
Also, why
does it need to be restricted to certain shapes of HOT chains ---
that is, why can't we do exactly what we'd do if the xact *were*
past the xmin horizon?
I thought it important to maintain the integrity of the HOT chain, in
that the xmax of one tuple version matches the xmin of the next. So
pruning only from the root of the chain allows us to maintain that
validity check.
I'm on the fence myself, which is why I didn't post it immediately I
had written it.
If not, it would be useful to add a note in comments to the code to
explain why we don't do this.
--
Simon Riggs http://www.EnterpriseDB.com/
On Fri, 16 Sept 2022 at 10:27, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Simon Riggs <simon.riggs@enterprisedb.com> writes:
A user asked me whether we prune never visible changes, such as
BEGIN;
INSERT...
UPDATE.. (same row)
COMMIT;Didn't we just have this discussion in another thread?
Well..... not "just" :)
commit 44e4bbf75d56e643b6afefd5cdcffccb68cce414
Author: Tom Lane <tgl@sss.pgh.pa.us>
Date: Fri Apr 29 16:29:42 2011 -0400
Remove special case for xmin == xmax in HeapTupleSatisfiesVacuum().
VACUUM was willing to remove a committed-dead tuple immediately if it was
deleted by the same transaction that inserted it. The idea is that such a
tuple could never have been visible to any other transaction, so we don't
need to keep it around to satisfy MVCC snapshots. However, there was
already an exception for tuples that are part of an update chain, and this
exception created a problem: we might remove TOAST tuples (which are never
part of an update chain) while their parent tuple stayed around (if it was
part of an update chain). This didn't pose a problem for most things,
since the parent tuple is indeed dead: no snapshot will ever consider it
visible. But MVCC-safe CLUSTER had a problem, since it will try to copy
RECENTLY_DEAD tuples to the new table. It then has to copy their TOAST
data too, and would fail if VACUUM had already removed the toast tuples.
Easiest fix is to get rid of the special case for xmin == xmax. This may
delay reclaiming dead space for a little bit in some cases, but it's by far
the most reliable way to fix the issue.
Per bug #5998 from Mark Reid. Back-patch to 8.3, which is the oldest
version with MVCC-safe CLUSTER.
On Mon, 19 Sept 2022 at 01:16, Greg Stark <stark@mit.edu> wrote:
On Fri, 16 Sept 2022 at 10:27, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Simon Riggs <simon.riggs@enterprisedb.com> writes:
A user asked me whether we prune never visible changes, such as
BEGIN;
INSERT...
UPDATE.. (same row)
COMMIT;Didn't we just have this discussion in another thread?
Well..... not "just" :)
This recent thread [0]/messages/by-id/2031521.1663076724@sss.pgh.pa.us Subject: Re: Tuples inserted and deleted by the same transaction Date: 2022-09-13 14:13:44 mentioned the same, and I mentioned it in [1]/messages/by-id/CAEze2Whjnhg96Wt2-DxtBydhmMDmVm2WfWOX3aGcB2C2Hbry0Q@mail.gmail.com Subject: Re: pg14b1 stuck in lazy_scan_prune/heap_page_prune of pg_statistic Date: 2021-06-14 09:53:47 (in a thread about a PS comment)
too last year.
Kind regards,
Matthias van de Meent
[0]: /messages/by-id/2031521.1663076724@sss.pgh.pa.us Subject: Re: Tuples inserted and deleted by the same transaction Date: 2022-09-13 14:13:44
Subject: Re: Tuples inserted and deleted by the same transaction
Date: 2022-09-13 14:13:44
[1]: /messages/by-id/CAEze2Whjnhg96Wt2-DxtBydhmMDmVm2WfWOX3aGcB2C2Hbry0Q@mail.gmail.com Subject: Re: pg14b1 stuck in lazy_scan_prune/heap_page_prune of pg_statistic Date: 2021-06-14 09:53:47 (in a thread about a PS comment)
Subject: Re: pg14b1 stuck in lazy_scan_prune/heap_page_prune of pg_statistic
Date: 2021-06-14 09:53:47
(in a thread about a PS comment)
On Mon, 19 Sept 2022 at 00:16, Greg Stark <stark@mit.edu> wrote:
On Fri, 16 Sept 2022 at 10:27, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Simon Riggs <simon.riggs@enterprisedb.com> writes:
A user asked me whether we prune never visible changes, such as
BEGIN;
INSERT...
UPDATE.. (same row)
COMMIT;Didn't we just have this discussion in another thread?
Well..... not "just" :)
commit 44e4bbf75d56e643b6afefd5cdcffccb68cce414
Author: Tom Lane <tgl@sss.pgh.pa.us>
Date: Fri Apr 29 16:29:42 2011 -0400Remove special case for xmin == xmax in HeapTupleSatisfiesVacuum().
VACUUM was willing to remove a committed-dead tuple immediately if it was
deleted by the same transaction that inserted it. The idea is that such a
tuple could never have been visible to any other transaction, so we don't
need to keep it around to satisfy MVCC snapshots. However, there was
already an exception for tuples that are part of an update chain, and this
exception created a problem: we might remove TOAST tuples (which are never
part of an update chain) while their parent tuple stayed around (if it was
part of an update chain). This didn't pose a problem for most things,
since the parent tuple is indeed dead: no snapshot will ever consider it
visible. But MVCC-safe CLUSTER had a problem, since it will try to copy
RECENTLY_DEAD tuples to the new table. It then has to copy their TOAST
data too, and would fail if VACUUM had already removed the toast tuples.Easiest fix is to get rid of the special case for xmin == xmax. This may
delay reclaiming dead space for a little bit in some cases, but it's by far
the most reliable way to fix the issue.Per bug #5998 from Mark Reid. Back-patch to 8.3, which is the oldest
version with MVCC-safe CLUSTER.
Good research Greg, thank you. Only took 10 years for me to notice it
was gone ;-)
--
Simon Riggs http://www.EnterpriseDB.com/
On 2022-Sep-22, Simon Riggs wrote:
On Mon, 19 Sept 2022 at 00:16, Greg Stark <stark@mit.edu> wrote:
VACUUM was willing to remove a committed-dead tuple immediately if it was
deleted by the same transaction that inserted it. The idea is that such a
tuple could never have been visible to any other transaction, so we don't
need to keep it around to satisfy MVCC snapshots. However, there was
already an exception for tuples that are part of an update chain, and this
exception created a problem: we might remove TOAST tuples (which are never
part of an update chain) while their parent tuple stayed around (if it was
part of an update chain). This didn't pose a problem for most things,
since the parent tuple is indeed dead: no snapshot will ever consider it
visible. But MVCC-safe CLUSTER had a problem, since it will try to copy
RECENTLY_DEAD tuples to the new table. It then has to copy their TOAST
data too, and would fail if VACUUM had already removed the toast tuples.
Good research Greg, thank you. Only took 10 years for me to notice it
was gone ;-)
But this begs the question: is the proposed change safe, given that
ancient consideration? I don't think TOAST issues have been mentioned
in this thread so far, so I wonder if there is a test case that verifies
that this problem doesn't occur for some reason.
--
Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
On Thu, 22 Sept 2022 at 15:16, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
On 2022-Sep-22, Simon Riggs wrote:
On Mon, 19 Sept 2022 at 00:16, Greg Stark <stark@mit.edu> wrote:
VACUUM was willing to remove a committed-dead tuple immediately if it was
deleted by the same transaction that inserted it. The idea is that such a
tuple could never have been visible to any other transaction, so we don't
need to keep it around to satisfy MVCC snapshots. However, there was
already an exception for tuples that are part of an update chain, and this
exception created a problem: we might remove TOAST tuples (which are never
part of an update chain) while their parent tuple stayed around (if it was
part of an update chain). This didn't pose a problem for most things,
since the parent tuple is indeed dead: no snapshot will ever consider it
visible. But MVCC-safe CLUSTER had a problem, since it will try to copy
RECENTLY_DEAD tuples to the new table. It then has to copy their TOAST
data too, and would fail if VACUUM had already removed the toast tuples.Good research Greg, thank you. Only took 10 years for me to notice it
was gone ;-)But this begs the question: is the proposed change safe, given that
ancient consideration? I don't think TOAST issues have been mentioned
in this thread so far, so I wonder if there is a test case that verifies
that this problem doesn't occur for some reason.
Oh, completely agreed.
I will submit a modified patch that adds a test case and just a
comment to explain why we can't remove such rows.
--
Simon Riggs http://www.EnterpriseDB.com/