pgsql: Fix "base" snapshot handling in logical decoding
Fix "base" snapshot handling in logical decoding
Two closely related bugs are fixed. First, xmin of logical slots was
advanced too early. During xl_running_xacts processing, xmin of the
slot was set to the oldest running xid in the record, but that's wrong:
actually, snapshots which will be used for not-yet-replayed transactions
might consider older txns as running too, so we need to keep xmin back
for them. The problem wasn't noticed earlier because DDL which allows
to delete tuple (set xmax) while some another not-yet-committed
transaction looks at it is pretty rare, if not unique: e.g. all forms of
ALTER TABLE which change schema acquire ACCESS EXCLUSIVE lock
conflicting with any inserts. The included test case (test_decoding's
oldest_xmin) uses ALTER of a composite type, which doesn't have such
interlocking.
To deal with this, we must be able to quickly retrieve oldest xmin
(oldest running xid among all assigned snapshots) from ReorderBuffer. To
fix, add another list of ReorderBufferTXNs to the reorderbuffer, where
transactions are sorted by base-snapshot-LSN. This is slightly
different from the existing (sorted by first-LSN) list, because a
transaction can have an earlier LSN but a later Xmin, if its first
record does not obtain an xmin (eg. xl_xact_assignment). Note this new
list doesn't fully replace the existing txn list: we still need that one
to prevent WAL recycling.
The second issue concerns SnapBuilder snapshots and subtransactions.
SnapBuildDistributeNewCatalogSnapshot never assigned a snapshot to a
transaction that is known to be a subtxn, which is good in the common
case that the top-level transaction already has one (no point in doing
so), but a bug otherwise. To fix, arrange to transfer the snapshot from
the subtxn to its top-level txn as soon as the kinship gets known.
test_decoding's snapshot_transfer verifies this.
Also, fix a minor memory leak: refcount of toplevel's old base snapshot
was not decremented when the snapshot is transferred from child.
Liberally sprinkle code comments, and rewrite a few existing ones. This
part is my (Álvaro's) contribution to this commit, as I had to write all
those comments in order to understand the existing code and Arseny's
patch.
Reported-by: Arseny Sher <a.sher@postgrespro.ru>
Diagnosed-by: Arseny Sher <a.sher@postgrespro.ru>
Co-authored-by: Arseny Sher <a.sher@postgrespro.ru>
Co-authored-by: Álvaro Herrera <alvherre@alvh.no-ip.org>
Reviewed-by: Antonin Houska <ah@cybertec.at>
Discussion: /messages/by-id/87lgdyz1wj.fsf@ars-thinkpad
Branch
------
master
Details
-------
https://git.postgresql.org/pg/commitdiff/f49a80c481f74fa81407dce8e51dea6956cb64f8
Modified Files
--------------
contrib/test_decoding/Makefile | 3 +-
contrib/test_decoding/expected/oldest_xmin.out | 27 ++
.../test_decoding/expected/snapshot_transfer.out | 49 ++++
contrib/test_decoding/specs/oldest_xmin.spec | 37 +++
contrib/test_decoding/specs/snapshot_transfer.spec | 42 +++
src/backend/replication/logical/reorderbuffer.c | 304 +++++++++++++++------
src/backend/replication/logical/snapbuild.c | 29 +-
src/include/replication/reorderbuffer.h | 22 +-
8 files changed, 409 insertions(+), 104 deletions(-)
Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
Fix "base" snapshot handling in logical decoding
According to buildfarm member friarbird, and as confirmed here,
the contrib/test_decoding/specs/oldest_xmin.spec test added by this
commit fails under CLOBBER_CACHE_ALWAYS.
friarbird's error output doesn't seem to be getting captured by the
buildfarm script, but what I see is
***************
*** 16,21 ****
--- 16,22 ----
step s1_commit: COMMIT;
step s0_vacuum: VACUUM FULL;
+ ERROR: canceling statement due to user request
step s0_get_changes: SELECT data FROM pg_logical_slot_get_changes('isolation_s
lot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
data
which looks like it might just be a timeout problem.
regards, tom lane
On 2018-Jun-28, Tom Lane wrote:
Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
Fix "base" snapshot handling in logical decoding
According to buildfarm member friarbird, and as confirmed here,
the contrib/test_decoding/specs/oldest_xmin.spec test added by this
commit fails under CLOBBER_CACHE_ALWAYS.
Thanks for the notice, looking into it now.
--
�lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 2018-Jun-28, Tom Lane wrote:
Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
Fix "base" snapshot handling in logical decoding
According to buildfarm member friarbird, and as confirmed here,
the contrib/test_decoding/specs/oldest_xmin.spec test added by this
commit fails under CLOBBER_CACHE_ALWAYS.
Hm. Running this test under CLOBBER_CACHE_ALWAYS I see that the VACUUM
FULL step takes 31 seconds (and the test succeeds). This is not
top-of-the-line hardware by a long shot (Intel(R) Core(TM) i7-4600U CPU
@ 2.10GHz) but I can believe that the other machines are slower or
busier.
Or there might be a completely different explanation ...
--
�lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 2018-Jun-29, Alvaro Herrera wrote:
On 2018-Jun-28, Tom Lane wrote:
Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
Fix "base" snapshot handling in logical decoding
According to buildfarm member friarbird, and as confirmed here,
the contrib/test_decoding/specs/oldest_xmin.spec test added by this
commit fails under CLOBBER_CACHE_ALWAYS.Hm. Running this test under CLOBBER_CACHE_ALWAYS I see that the VACUUM
FULL step takes 31 seconds (and the test succeeds). This is not
top-of-the-line hardware by a long shot (Intel(R) Core(TM) i7-4600U CPU
@ 2.10GHz) but I can believe that the other machines are slower or
busier.
It does fail in the indicated way in CLOBBER_CACHE_RECURSIVELY, but I
guess that's expected. It also fails if I reduce the timeout from 60
seconds to 25.
I suppose 60 seconds (isolationtester's default timeout) is just not
enough time for those machines. We could increase it to 180 seconds and
see if that's enough to make them pass ...
Another possibility is to examine column 'state' in isolationtester. I
thought we only used that timeout for 'waiting' queries, but I see it's
not doing that.
--
�lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Alvaro Herrera <alvherre@2ndquadrant.com> writes:
On 2018-Jun-28, Tom Lane wrote:
According to buildfarm member friarbird, and as confirmed here,
the contrib/test_decoding/specs/oldest_xmin.spec test added by this
commit fails under CLOBBER_CACHE_ALWAYS.
I suppose 60 seconds (isolationtester's default timeout) is just not
enough time for those machines. We could increase it to 180 seconds and
see if that's enough to make them pass ...
What I want to know is why this test is doing a database-wide VACUUM FULL
in the first place. If that isn't profligate wastage of testing cycles,
why not?
regards, tom lane
Tom Lane <tgl@sss.pgh.pa.us> writes:
Alvaro Herrera <alvherre@2ndquadrant.com> writes:
On 2018-Jun-28, Tom Lane wrote:
According to buildfarm member friarbird, and as confirmed here,
the contrib/test_decoding/specs/oldest_xmin.spec test added by this
commit fails under CLOBBER_CACHE_ALWAYS.I suppose 60 seconds (isolationtester's default timeout) is just not
enough time for those machines. We could increase it to 180 seconds and
see if that's enough to make them pass ...What I want to know is why this test is doing a database-wide VACUUM FULL
in the first place. If that isn't profligate wastage of testing cycles,
why not?regards, tom lane
Oh, that's my fault. I think just VACUUM pg_attribute is enough there --
it takes 42ms on my laptop with CLOBBER_CACHE_ALWAYS ('vacuum full'
occupies 1 minute), patch is attached. The test is still steadily fails
without the main patch.
There is also one thing that puzzles me as I don't know much about
vacuum internals. If I do plain VACUUM of pg_attribute in the test, it
shouts "catalog is missing 1 attribute(s) for relid" error (which is
quite expected), while with 'VACUUM FULL pg_attribute' the tuple is
silently (and wrongly, with dropped column missing) decoded. Moreover,
if I perform the test manually, and do 'VACUUM FULL;', sometimes test
becomes useless -- that is, tuple is successfully decoded with all three
columns, as though VACUUM was not actually executed. All this is without
the main patch, of course. I think I will look into this soon.
--
Arseny Sher
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Attachments:
less_agressive_vacuum_in_oldest_xmin_test.patchtext/x-diffDownload
diff --git a/contrib/test_decoding/expected/oldest_xmin.out b/contrib/test_decoding/expected/oldest_xmin.out
index d09342c4be..a5d7d3c2ca 100644
--- a/contrib/test_decoding/expected/oldest_xmin.out
+++ b/contrib/test_decoding/expected/oldest_xmin.out
@@ -15,7 +15,7 @@ step s0_get_changes: SELECT data FROM pg_logical_slot_get_changes('isolation_slo
data
step s1_commit: COMMIT;
-step s0_vacuum: VACUUM FULL;
+step s0_vacuum: VACUUM pg_attribute;
step s0_get_changes: SELECT data FROM pg_logical_slot_get_changes('isolation_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
data
diff --git a/contrib/test_decoding/specs/oldest_xmin.spec b/contrib/test_decoding/specs/oldest_xmin.spec
index 4f8af70aa2..cd2b69cae2 100644
--- a/contrib/test_decoding/specs/oldest_xmin.spec
+++ b/contrib/test_decoding/specs/oldest_xmin.spec
@@ -22,7 +22,7 @@ step "s0_getxid" { SELECT txid_current() IS NULL; }
step "s0_alter" { ALTER TYPE basket DROP ATTRIBUTE mangos; }
step "s0_commit" { COMMIT; }
step "s0_checkpoint" { CHECKPOINT; }
-step "s0_vacuum" { VACUUM FULL; }
+step "s0_vacuum" { VACUUM pg_attribute; }
step "s0_get_changes" { SELECT data FROM pg_logical_slot_get_changes('isolation_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1'); }
session "s1"
Arseny Sher <a.sher@postgrespro.ru> writes:
There is also one thing that puzzles me as I don't know much about
vacuum internals. If I do plain VACUUM of pg_attribute in the test, it
shouts "catalog is missing 1 attribute(s) for relid" error (which is
quite expected), while with 'VACUUM FULL pg_attribute' the tuple is
silently (and wrongly, with dropped column missing) decoded. Moreover,
if I perform the test manually, and do 'VACUUM FULL;', sometimes test
becomes useless -- that is, tuple is successfully decoded with all three
columns, as though VACUUM was not actually executed. All this is without
the main patch, of course. I think I will look into this soon.
So I have been jumping around this and learned a few curious things.
1) Test in its current shape sometimes doesn't fulfill its aim indeed --
that is, despite issued VACUUM the tuple is still decoded with all
three fields. This happens because during attempt to advance xmin
there is a good possiblity to encounter xl_running_xacts record logged
before our CHECKPOINT (they are logged each 15 seconds). We do not
try to advance xmin twice without client acknowledgment, so in this
case xmin will not be advanced far enough to allow vacuum prune entry
from pg_attribute.
2) This is not easy to notice because often (but not always) explicit
VACUUM is not needed at all: tuple is often pruned by microvacuum
(heap_page_prune_opts) right in the final decoding session. If we
hadn't bumped xmin far enough during previous get_changes, we do that
now, so microvacuum actually purges the entry. But if we were so
unfortunate that 1) extra xl_running_xacts was logged and 2)
microvacuum was in bad mood and didn't come, pg_attribute is not
vacuumed and test becomes useless. To make this bulletproof, in the
attached patch I doubled first get_changes: now there are two client
acks, so our VACUUM always does the job.
3) As a side note, answer to my question 'why do we get different errors
with VACUUM and VACUUM FULL' is the following. With VACUUM FULL, not
only old pg_attribute entry is pruned, but also xmin of new entry
with attisdropped=true is reset to frozen xid. This means that
decoding session (RelationBuildTupleDesc) actually sees 3 attributes,
and the fact that one of them is dropped doesn't embarrass this
function (apparently relnatts in pg_class is never decremented) --
we just go ahead and decode only live attributes.
--
Arseny Sher
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Attachments:
less_agressive_vacuum_in_oldest_xmin_test.patchtext/x-diffDownload
diff --git a/contrib/test_decoding/expected/oldest_xmin.out b/contrib/test_decoding/expected/oldest_xmin.out
index d09342c4be..d1b4f17e3a 100644
--- a/contrib/test_decoding/expected/oldest_xmin.out
+++ b/contrib/test_decoding/expected/oldest_xmin.out
@@ -1,6 +1,6 @@
Parsed test spec with 2 sessions
-starting permutation: s0_begin s0_getxid s1_begin s1_insert s0_alter s0_commit s0_checkpoint s0_get_changes s1_commit s0_vacuum s0_get_changes
+starting permutation: s0_begin s0_getxid s1_begin s1_insert s0_alter s0_commit s0_checkpoint s0_get_changes s0_get_changes s1_commit s0_vacuum s0_get_changes
step s0_begin: BEGIN;
step s0_getxid: SELECT txid_current() IS NULL;
?column?
@@ -14,8 +14,11 @@ step s0_checkpoint: CHECKPOINT;
step s0_get_changes: SELECT data FROM pg_logical_slot_get_changes('isolation_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
data
+step s0_get_changes: SELECT data FROM pg_logical_slot_get_changes('isolation_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
+data
+
step s1_commit: COMMIT;
-step s0_vacuum: VACUUM FULL;
+step s0_vacuum: VACUUM pg_attribute;
step s0_get_changes: SELECT data FROM pg_logical_slot_get_changes('isolation_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
data
diff --git a/contrib/test_decoding/specs/oldest_xmin.spec b/contrib/test_decoding/specs/oldest_xmin.spec
index 4f8af70aa2..141fe2b145 100644
--- a/contrib/test_decoding/specs/oldest_xmin.spec
+++ b/contrib/test_decoding/specs/oldest_xmin.spec
@@ -22,7 +22,7 @@ step "s0_getxid" { SELECT txid_current() IS NULL; }
step "s0_alter" { ALTER TYPE basket DROP ATTRIBUTE mangos; }
step "s0_commit" { COMMIT; }
step "s0_checkpoint" { CHECKPOINT; }
-step "s0_vacuum" { VACUUM FULL; }
+step "s0_vacuum" { VACUUM pg_attribute; }
step "s0_get_changes" { SELECT data FROM pg_logical_slot_get_changes('isolation_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1'); }
session "s1"
@@ -30,8 +30,11 @@ step "s1_begin" { BEGIN; }
step "s1_insert" { INSERT INTO harvest VALUES ((1, 2, 3)); }
step "s1_commit" { COMMIT; }
-# Checkpoint with following get_changes forces to advance xmin. ALTER of a
+# Checkpoint with following get_changes forces xmin advancement. We do
+# get_changes twice because if one more xl_running_xacts record had slipped
+# before our CHECKPOINT, xmin will be advanced only on this record, thus not
+# reaching value needed for vacuuming corresponding pg_attribute entry. ALTER of
# composite type is a rare form of DDL which allows T1 to see the tuple which
# will be removed (xmax set) before T1 commits. That is, interlocking doesn't
# forbid modifying catalog after someone read it (and didn't commit yet).
-permutation "s0_begin" "s0_getxid" "s1_begin" "s1_insert" "s0_alter" "s0_commit" "s0_checkpoint" "s0_get_changes" "s1_commit" "s0_vacuum" "s0_get_changes"
+permutation "s0_begin" "s0_getxid" "s1_begin" "s1_insert" "s0_alter" "s0_commit" "s0_checkpoint" "s0_get_changes" "s0_get_changes""s1_commit" "s0_vacuum" "s0_get_changes"
Hi Arseny. I'm writing a commit message to push this test change, and I
can't explain this bit:
On 2018-Jul-02, Arseny Sher wrote:
3) As a side note, answer to my question 'why do we get different errors
with VACUUM and VACUUM FULL' is the following. With VACUUM FULL, not
only old pg_attribute entry is pruned, but also xmin of new entry
with attisdropped=true is reset to frozen xid. This means that
decoding session (RelationBuildTupleDesc) actually sees 3 attributes,
and the fact that one of them is dropped doesn't embarrass this
function (apparently relnatts in pg_class is never decremented) --
we just go ahead and decode only live attributes.
I just don't see it that VACUUM FULL would change the xmin of anything
to FrozenXid, and in my experiments it doesn't. Did you mean VACUUM
FREEZE?
PS - sorry about the broken CC I added :-(
--
�lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Thanks for the detective work! I pushed this test change. (I still
don't quite understand what you mean with the freezing part, though.)
--
�lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Alvaro Herrera <alvherre@2ndquadrant.com> writes:
I just don't see it that VACUUM FULL would change the xmin of anything
to FrozenXid, and in my experiments it doesn't. Did you mean VACUUM
FREEZE?
Well, docs for VACUUM say:
FREEZE
Selects aggressive “freezing” of tuples. Specifying FREEZE is
equivalent to performing VACUUM with the vacuum_freeze_min_age and
vacuum_freeze_table_age parameters set to zero. Aggressive freezing
is always performed when the table is rewritten, so this option is
redundant when FULL is specified.
So this is literally the same. rewrite_heap_tuple does the job.
Thanks for the detective work! I pushed this test change.
Thank you, I appreciate this.
--
Arseny Sher
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company