pgsql: Fix freezing of a dead HOT-updated tuple
Fix freezing of a dead HOT-updated tuple
Vacuum calls page-level HOT prune to remove dead HOT tuples before doing
liveness checks (HeapTupleSatisfiesVacuum) on the remaining tuples. But
concurrent transaction commit/abort may turn DEAD some of the HOT tuples
that survived the prune, before HeapTupleSatisfiesVacuum tests them.
This happens to activate the code that decides to freeze the tuple ...
which resuscitates it, duplicating data.
(This is especially bad if there are any unique constraints, because those
are now internally violated due to the duplicate entries, though you
won't know until you try to REINDEX or dump/restore the table.)
One possible fix would be to simply skip doing anything to the tuple,
and hope that the next HOT prune would remove it. But there is a
problem: if the tuple is older than freeze horizon, this would leave an
unfrozen XID behind, and if no HOT prune happens to clean it up before
the containing pg_clog segment is truncated away, it'd later cause an
error when the XID is looked up.
Fix the problem by having the tuple freezing routines cope with the
situation: don't freeze the tuple (and keep it dead). In the cases that
the XID is older than the freeze age, set the HEAP_XMAX_COMMITTED flag
so that there is no need to look up the XID in pg_clog later on.
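In rough outline, the idea is something like the following (a hypothetical
sketch of the approach just described, not the committed diff; the function
name and arguments are invented purely for illustration):

static void
mark_dead_tuple_xmax_committed(HeapTupleHeader tuple, TransactionId xid,
                               TransactionId cutoff_xid)
{
    /*
     * The tuple stays dead and its xmax is left unfrozen; we merely record
     * that the updating transaction committed, so pg_clog never needs to be
     * consulted for this XID again, even after its clog segment is truncated.
     */
    if (TransactionIdPrecedes(xid, cutoff_xid) &&
        TransactionIdDidCommit(xid))
        tuple->t_infomask |= HEAP_XMAX_COMMITTED;
}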
An isolation test is included, authored by Michael Paquier, loosely
based on Daniel Wood's original reproducer. It only tests one
particular scenario, though, not all the possible ways for this problem
to surface; it'd be good to have a more reliable way to test this more
fully, but it'd require more work.
In message /messages/by-id/20170911140103.5akxptyrwgpc25bw@alvherre.pgsql
I outlined another test case (more closely matching Dan Wood's) that
exposed a few more ways for the problem to occur.
Backpatch all the way back to 9.3, where this problem was introduced by
multixact juggling. In branches 9.3 and 9.4, this includes a backpatch
of commit e5ff9fefcd50 (of 9.5 era), since the original is not
correctable without matching the coding pattern in 9.5 up.
Reported-by: Daniel Wood
Diagnosed-by: Daniel Wood
Reviewed-by: Yi Wen Wong, Michaël Paquier
Discussion: /messages/by-id/E5711E62-8FDF-4DCA-A888-C200BF6B5742@amazon.com
Branch
------
master
Details
-------
https://git.postgresql.org/pg/commitdiff/20b655224249e6d2daf7ef0595995228baddb381
Modified Files
--------------
src/backend/access/heap/heapam.c | 57 +++++++++----
src/backend/commands/vacuumlazy.c | 20 ++---
src/test/isolation/expected/freeze-the-dead.out | 101 ++++++++++++++++++++++++
src/test/isolation/isolation_schedule | 1 +
src/test/isolation/specs/freeze-the-dead.spec | 27 +++++++
5 files changed, 179 insertions(+), 27 deletions(-)
On Thu, Sep 28, 2017 at 7:47 AM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
Fix freezing of a dead HOT-updated tuple
If I run Dan Wood's test case again, the obvious symptom (spurious
duplicates) goes away. However, the enhanced amcheck, and thus CREATE
INDEX/REINDEX, still isn't happy about this:
postgres=# select bt_index_check('t_pkey', true);
DEBUG: 00000: verifying presence of required tuples in index "t_pkey"
LOCATION: bt_check_every_level, verify_nbtree.c:424
ERROR: XX000: failed to find parent tuple for heap-only tuple at
(0,6) in table "t"
LOCATION: IndexBuildHeapRangeScan, index.c:2597
Time: 3.699 ms
--
Peter Geoghegan
Peter Geoghegan wrote:
On Thu, Sep 28, 2017 at 7:47 AM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
Fix freezing of a dead HOT-updated tuple
If I run Dan Wood's test case again, the obvious symptom (spurious
duplicates) goes away. However, the enhanced amcheck, and thus CREATE
INDEX/REINDEX, still isn't happy about this:
postgres=# select bt_index_check('t_pkey', true);
DEBUG: 00000: verifying presence of required tuples in index "t_pkey"
LOCATION: bt_check_every_level, verify_nbtree.c:424
ERROR: XX000: failed to find parent tuple for heap-only tuple at
(0,6) in table "t"
LOCATION: IndexBuildHeapRangeScan, index.c:2597
Time: 3.699 ms
... Rats, I obviously missed the message where you said that amcheck
detected this problem.
Odd that it's not fixed. I guess there's still some more work to do
here ...
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Alvaro Herrera wrote:
Odd that it's not fixed. I guess there's still some more work to do
here ...
Maybe what this means is that we need to do both Dan's initially
proposed patch (or something related to it) apart from the fixes already
pushed. IOW we need to put back some of the "tupkeep" business ...
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Thu, Sep 28, 2017 at 1:20 PM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
Alvaro Herrera wrote:
Odd that it's not fixed. I guess there's still some more work to do
here ...
Maybe what this means is that we need to do both Dan's initially
proposed patch (or something related to it) apart from the fixes already
pushed. IOW we need to put back some of the "tupkeep" business ...
We certainly do still see wrong answers to queries here:
postgres=# select ctid, xmin, xmax, * from t;
ctid | xmin | xmax | id | name | x
-------+-------+------+----+------+---
(0,1) | 21171 | 0 | 1 | 111 | 0
(0,7) | 21177 | 0 | 3 | 333 | 5
(2 rows)
postgres=# select * from t where id = 3;
id | name | x
----+------+---
3 | 333 | 5
(1 row)
postgres=# set enable_seqscan = off;
SET
postgres=# select * from t where id = 3;
id | name | x
----+------+---
(0 rows)
FWIW, I am reminded a little bit of the MultiXact/recovery bug I
reported way back in February of 2014 [1], which also had a HOT
interaction that caused index scans to give wrong answers, despite
more or less structurally sound indexes.
[1]: /messages/by-id/CAM3SWZTMQiCi5PV5OWHb+bYkUcnCk=O67w0cSswPvV7XfUcU5g@mail.gmail.com
--
Peter Geoghegan
Peter Geoghegan wrote:
We certainly do still see wrong answers to queries here:
postgres=# select ctid, xmin, xmax, * from t;
ctid | xmin | xmax | id | name | x
-------+-------+------+----+------+---
(0,1) | 21171 | 0 | 1 | 111 | 0
(0,7) | 21177 | 0 | 3 | 333 | 5
(2 rows)
postgres=# select * from t where id = 3;
id | name | x
----+------+---
3 | 333 | 5
(1 row)
postgres=# set enable_seqscan = off;
SET
postgres=# select * from t where id = 3;
id | name | x
----+------+---
(0 rows)
Yeah, oops.
FWIW, I am reminded a little bit of the MultiXact/recovery bug I
reported way back in February of 2014 [1], which also had a HOT
interaction that caused index scans to give wrong answers, despite
more or less structurally sound indexes.
[1]: /messages/by-id/CAM3SWZTMQiCi5PV5OWHb+bYkUcnCk=O67w0cSswPvV7XfUcU5g@mail.gmail.com
Thanks for the reference. I didn't remember this problem and it's not
(wasn't) in my list of things to look into. Perhaps these are both the
same bug.
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Thu, Sep 28, 2017 at 2:39 PM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
FWIW, I am reminded a little bit of the MultiXact/recovery bug I
reported way back in February of 2014 [1], which also had a HOT
interaction that caused index scans to give wrong answers, despite
more or less structurally sound indexes.
[1]: /messages/by-id/CAM3SWZTMQiCi5PV5OWHb+bYkUcnCk=O67w0cSswPvV7XfUcU5g@mail.gmail.com
Thanks for the reference. I didn't remember this problem and it's not
(wasn't) in my list of things to look into. Perhaps these are both the
same bug.
I was reminded of that old bug because initially, at the time, it
looked very much like a corrupt index: sequential scans were fine, but
index scans gave wrong answers. This is what I saw today.
In the end, commit 6bfa88a fixed that old recovery bug by making sure
the recovery routine heap_xlog_lock() did the right thing. In both
cases (Feb 2014 and today), the index wasn't really corrupt -- it just
pointed to the root of a HOT chain when it should point to some child
tuple (or maybe a successor HOT chain).
--
Peter Geoghegan
On Thu, Sep 28, 2017 at 2:47 PM, Peter Geoghegan <pg@bowt.ie> wrote:
In the end, commit 6bfa88a fixed that old recovery bug by making sure
the recovery routine heap_xlog_lock() did the right thing. In both
cases (Feb 2014 and today), the index wasn't really corrupt -- it just
pointed to the root of a HOT chain when it should point to some child
tuple (or maybe a successor HOT chain).
Just to be clear, obviously I don't mean that the index should have
been magically updated to compensate for that 2014 heap_lock_tuple()
recovery bug (say, via an in-place IndexTuple update during recovery).
Certainly, the index AM was entitled to assume that the heap TID it
pointed to would continue to be the root of the same HOT chain, at
least until the next VACUUM. That was what I meant by "the index
wasn't actually corrupt".
All I meant here is that the index scan and sequential scan would have
been in agreement if something like that *had* happened artificially
(say, when the index tuple was edited with a hex editor). And, that
the actual high level problem was that we failed to treat a
HOT-updated tuple as a HOT-updated tuple, leading to consequences that
were mostly noticeable during index scans. Probably on both occasions
(back in 2014, and now).
--
Peter Geoghegan
On Thu, Sep 28, 2017 at 5:47 PM, Peter Geoghegan <pg@bowt.ie> wrote:
In the end, commit 6bfa88a fixed that old recovery bug by making sure
the recovery routine heap_xlog_lock() did the right thing. In both
cases (Feb 2014 and today), the index wasn't really corrupt -- it just
pointed to the root of a HOT chain when it should point to some child
tuple (or maybe a successor HOT chain).
Unless I'm very confused, it's really not OK to point at a child tuple
rather than the root of the HOT chain.
Pointing to a successor HOT chain would be OK, as long as you point to
the root tuple thereof.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Thu, Sep 28, 2017 at 3:20 PM, Robert Haas <robertmhaas@gmail.com> wrote:
On Thu, Sep 28, 2017 at 5:47 PM, Peter Geoghegan <pg@bowt.ie> wrote:
In the end, commit 6bfa88a fixed that old recovery bug by making sure
the recovery routine heap_xlog_lock() did the right thing. In both
cases (Feb 2014 and today), the index wasn't really corrupt -- it just
pointed to the root of a HOT chain when it should point to some child
tuple (or maybe a successor HOT chain).
Unless I'm very confused, it's really not OK to point at a child tuple
rather than the root of the HOT chain.
Please see my later clarification.
--
Peter Geoghegan
On Fri, Sep 29, 2017 at 6:39 AM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
Peter Geoghegan wrote:
We certainly do still see wrong answers to queries here:
postgres=# select ctid, xmin, xmax, * from t;
ctid | xmin | xmax | id | name | x
-------+-------+------+----+------+---
(0,1) | 21171 | 0 | 1 | 111 | 0
(0,7) | 21177 | 0 | 3 | 333 | 5
(2 rows)
postgres=# select * from t where id = 3;
id | name | x
----+------+---
3 | 333 | 5
(1 row)
postgres=# set enable_seqscan = off;
SET
postgres=# select * from t where id = 3;
id | name | x
----+------+---
(0 rows)
Yeah, oops.
This really looks like a problem at heap-level with the parent
redirection not getting defined (?). Please note that a subsequent
REINDEX fails as well:
=# reindex index t_pkey ;
ERROR: XX000: failed to find parent tuple for heap-only tuple at
(0,7) in table "t"
LOCATION: IndexBuildHeapRangeScan, index.c:2597
VACUUM FREEZE also is not getting things right, but a VACUUM FULL does.
Also, dropping the constraint and attempting to recreate it is failing:
=# alter table t drop constraint t_pkey;
ALTER TABLE
=# create index t_pkey on t(id);
ERROR: XX000: failed to find parent tuple for heap-only tuple at
(0,7) in table "t"
LOCATION: IndexBuildHeapRangeScan, index.c:2597
A corrupted page clearly indicates that there are no tuple redirections:
=# select lp, t_ctid, lp_off, t_infomask, t_infomask2 from
heap_page_items(get_raw_page('t', 0));
lp | t_ctid | lp_off | t_infomask | t_infomask2
----+--------+--------+------------+-------------
1 | (0,1) | 8152 | 2818 | 3
2 | null | 0 | null | null
3 | (0,4) | 8112 | 9986 | 49155
4 | (0,5) | 8072 | 9986 | 49155
5 | (0,6) | 8032 | 9986 | 49155
6 | (0,7) | 7992 | 9986 | 49155
7 | (0,7) | 7952 | 11010 | 32771
(7 rows)
And a non-corrupted page clearly shows the redirection done with lp_off:
lp | t_ctid | lp_off | t_infomask | t_infomask2
----+--------+--------+------------+-------------
1 | (0,1) | 8152 | 2818 | 3
2 | null | 7 | null | null
3 | null | 0 | null | null
4 | null | 0 | null | null
5 | null | 0 | null | null
6 | null | 0 | null | null
7 | (0,7) | 8112 | 11010 | 32771
(7 rows)
--
Michael
On Thu, Sep 28, 2017 at 1:20 PM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
Maybe what this means is that we need to do both Dan's initially
proposed patch (or something related to it) apart from the fixes already
pushed. IOW we need to put back some of the "tupkeep" business ...
I took the time to specifically check if that would fix the problem.
Unfortunately, it did not. We see exactly the same problem, or at
least amcheck/REINDEX produces exactly the same error. I checked both
Dan's original update_freeze.patch, and your revision that retained
some of the "tupkeep" stuff,
0002-Don-t-freeze-recently-dead-HOT-tuples, which you posted on
September 6th.
--
Peter Geoghegan
On Sun, Oct 1, 2017 at 10:25 AM, Peter Geoghegan <pg@bowt.ie> wrote:
On Thu, Sep 28, 2017 at 1:20 PM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
Maybe what this means is that we need to do both Dan's initially
proposed patch (or something related to it) apart from the fixes already
pushed. IOW we need to put back some of the "tupkeep" business ...
I took the time to specifically check if that would fix the problem.
Unfortunately, it did not. We see exactly the same problem, or at
least amcheck/REINDEX produces exactly the same error. I checked both
Dan's original update_freeze.patch, and your revision that retained
some of the "tupkeep" stuff,
0002-Don-t-freeze-recently-dead-HOT-tuples, which you posted on
September 6th.
I did not spend more than two hours digging into this, but my first
feeling is that some race condition is going on with the heap pruning.
--
Michael
Michael Paquier wrote:
On Sun, Oct 1, 2017 at 10:25 AM, Peter Geoghegan <pg@bowt.ie> wrote:
On Thu, Sep 28, 2017 at 1:20 PM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
Maybe what this means is that we need to do both Dan's initially
proposed patch (or something related to it) apart from the fixes already
pushed. IOW we need to put back some of the "tupkeep" business ...
I took the time to specifically check if that would fix the problem.
Unfortunately, it did not. We see exactly the same problem, or at
least amcheck/REINDEX produces exactly the same error. I checked both
Dan's original update_freeze.patch, and your revision that retained
some of the "tupkeep" stuff,
0002-Don-t-freeze-recently-dead-HOT-tuples, which you posted on
September 6th.
I did not spend more than two hours digging into this, but my first
feeling is that some race condition is going on with the heap pruning.
I'll look into this on Monday. I found out yesterday that the
problematic case is when HTSV returns HEAPTUPLE_DEAD and the HOT tests
return true. I haven't yet figured if it is one of those specifically,
but I suspect it is the IsHotUpdated case.
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
So here's my attempt at an explanation for what is going on. At one
point, we have this:
select lp, lp_flags, t_xmin, t_xmax, t_ctid, to_hex(t_infomask) as infomask,
to_hex(t_infomask2) as infomask2
from heap_page_items(get_raw_page('t', 0));
lp | lp_flags | t_xmin | t_xmax | t_ctid | infomask | infomask2
----+----------+--------+--------+--------+----------+-----------
1 | 1 | 2 | 0 | (0,1) | 902 | 3
2 | 0 | | | | |
3 | 1 | 2 | 19928 | (0,4) | 3142 | c003
4 | 1 | 14662 | 19929 | (0,5) | 3142 | c003
5 | 1 | 14663 | 19931 | (0,6) | 3142 | c003
6 | 1 | 14664 | 19933 | (0,7) | 3142 | c003
7 | 1 | 14665 | 0 | (0,7) | 2902 | 8003
(7 rows)
which shows a HOT-update chain, where the t_xmax are multixacts. Then a
vacuum freeze comes, and because the multixacts are below the freeze
horizon for multixacts, we get this:
select lp, lp_flags, t_xmin, t_xmax, t_ctid, to_hex(t_infomask) as infomask,
to_hex(t_infomask2) as infomask2
from heap_page_items(get_raw_page('t', 0));
lp | lp_flags | t_xmin | t_xmax | t_ctid | infomask | infomask2
----+----------+--------+--------+--------+----------+-----------
1 | 1 | 2 | 0 | (0,1) | 902 | 3
2 | 0 | | | | |
3 | 1 | 2 | 14662 | (0,4) | 2502 | c003
4 | 1 | 2 | 14663 | (0,5) | 2502 | c003
5 | 1 | 2 | 14664 | (0,6) | 2502 | c003
6 | 1 | 2 | 14665 | (0,7) | 2502 | c003
7 | 1 | 2 | 0 | (0,7) | 2902 | 8003
(7 rows)
where the xmin values have all been frozen, and the xmax values are now
regular Xids. I think the HOT code that walks the chain fails to detect
these as chains, because the xmin values no longer match the xmax
values. I modified the multixact freeze code, so that whenever the
update Xid is below the cutoff Xid, it's set to FrozenTransactionId,
since keeping the other value is invalid anyway (even though we have set
the HEAP_XMAX_COMMITTED flag). But that still doesn't fix the problem;
as far as I can see, vacuum removes the root of the chain, not yet sure
why, and then things are just as corrupted as before.
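For reference, the chain-continuity test being referred to appears to be this
one in heap_prune_chain() (pruneheap.c), quoted again further down the thread;
once xmin has been frozen it no longer equals the priorXmax remembered from
the previous tuple's updater, so the walk stops early:

    /*
     * Check the tuple XMIN against prior XMAX, if any
     */
    if (TransactionIdIsValid(priorXmax) &&
        !TransactionIdEquals(HeapTupleHeaderGetXmin(htup), priorXmax))
        break;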
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Tue, Oct 3, 2017 at 9:48 AM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
which shows a HOT-update chain, where the t_xmax are multixacts. Then a
vacuum freeze comes, and because the multixacts are below the freeze
horizon for multixacts, we get this:
select lp, lp_flags, t_xmin, t_xmax, t_ctid, to_hex(t_infomask) as infomask,
to_hex(t_infomask2) as infomask2
from heap_page_items(get_raw_page('t', 0));
lp | lp_flags | t_xmin | t_xmax | t_ctid | infomask | infomask2
----+----------+--------+--------+--------+----------+-----------
1 | 1 | 2 | 0 | (0,1) | 902 | 3
2 | 0 | | | | |
3 | 1 | 2 | 14662 | (0,4) | 2502 | c003
4 | 1 | 2 | 14663 | (0,5) | 2502 | c003
5 | 1 | 2 | 14664 | (0,6) | 2502 | c003
6 | 1 | 2 | 14665 | (0,7) | 2502 | c003
7 | 1 | 2 | 0 | (0,7) | 2902 | 8003
(7 rows)
where the xmin values have all been frozen, and the xmax values are now
regular Xids.
I thought that we no longer store FrozenTransactionId (xid 2) as our
"raw" xmin while freezing, and yet that's what we see here. It looks
like pageinspect is looking at raw xmin (it's calling
HeapTupleHeaderGetRawXmin(), not HeapTupleHeaderGetXmin()). What's the
deal with that? Shouldn't the original xmin be preserved for forensic
analysis, following commit 37484ad2?
I guess what you mean is that this is what you see having modified the
code to actually store FrozenTransactionId as xmin once more, in an
effort to fix this?
--
Peter Geoghegan
On Tue, Oct 3, 2017 at 9:48 AM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
But that still doesn't fix the problem;
as far as I can see, vacuum removes the root of the chain, not yet sure
why, and then things are just as corrupted as before.
Are you sure it's not opportunistic pruning? Another thing that I've
noticed with this problem is that the relevant IndexTuple will pretty
quickly vanish, presumably due to LP_DEAD setting (but maybe not
actually due to LP_DEAD setting).
(Studies the problem some more...)
I now think that it actually is a VACUUM problem, specifically a
problem with VACUUM pruning. You see the HOT xmin-to-xmax check
pattern that you mentioned within heap_prune_chain(), which looks like
where the incorrect tuple prune (or possibly, at times, redirect?)
takes place. (I refer to the prune/kill that you mentioned today, that
frustrated your first attempt at a fix -- "I modified the multixact
freeze code...".)
The attached patch "fixes" the problem -- I cannot get amcheck to
complain about corruption with this applied. And, "make check-world"
passes. Hopefully it goes without saying that this isn't actually my
proposed fix. It tells us something that this at least *masks* the
problem, though; it's a start.
FYI, the repro case page contents looks like this with the patch applied:
postgres=# select lp, lp_flags, t_xmin, t_xmax, t_ctid,
to_hex(t_infomask) as infomask,
to_hex(t_infomask2) as infomask2
from heap_page_items(get_raw_page('t', 0));
lp | lp_flags | t_xmin | t_xmax | t_ctid | infomask | infomask2
----+----------+---------+--------+--------+----------+-----------
1 | 1 | 1845995 | 0 | (0,1) | b02 | 3
2 | 2 | | | | |
3 | 0 | | | | |
4 | 0 | | | | |
5 | 0 | | | | |
6 | 0 | | | | |
7 | 1 | 1846001 | 0 | (0,7) | 2b02 | 8003
(7 rows)
--
Peter Geoghegan
Attachment: suppress-bad-prune.patch (text/x-patch)
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 52231ac..90eb39e 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -470,13 +470,6 @@ heap_prune_chain(Relation relation, Buffer buffer, OffsetNumber rootoffnum,
 		ItemPointerSet(&(tup.t_self), BufferGetBlockNumber(buffer), offnum);
 
 		/*
-		 * Check the tuple XMIN against prior XMAX, if any
-		 */
-		if (TransactionIdIsValid(priorXmax) &&
-			!TransactionIdEquals(HeapTupleHeaderGetXmin(htup), priorXmax))
-			break;
-
-		/*
 		 * OK, this tuple is indeed a member of the chain.
 		 */
 		chainitems[nchain++] = offnum;
On Wed, Oct 4, 2017 at 8:10 AM, Peter Geoghegan <pg@bowt.ie> wrote:
I now think that it actually is a VACUUM problem, specifically a
problem with VACUUM pruning. You see the HOT xmin-to-xmax check
pattern that you mentioned within heap_prune_chain(), which looks like
where the incorrect tuple prune (or possibly, at times, redirect?)
takes place. (I refer to the prune/kill that you mentioned today, that
frustrated your first attempt at a fix -- "I modified the multixact
freeze code...".)
My investigation of the problem converges to the same conclusion. Something
is wrong with the vacuum's pruning. I have spent some time trying to
design a patch; all the solutions I tried just made the problem
harder to reproduce, and it still showed up, sometimes after
dozens of attempts.
The attached patch "fixes" the problem -- I cannot get amcheck to
complain about corruption with this applied. And, "make check-world"
passes. Hopefully it goes without saying that this isn't actually my
proposed fix. It tells us something that this at least *masks* the
problem, though; it's a start.
Yep.
FYI, the repro case page contents looks like this with the patch applied:
postgres=# select lp, lp_flags, t_xmin, t_xmax, t_ctid,
to_hex(t_infomask) as infomask,
to_hex(t_infomask2) as infomask2
from heap_page_items(get_raw_page('t', 0));
lp | lp_flags | t_xmin | t_xmax | t_ctid | infomask | infomask2
----+----------+---------+--------+--------+----------+-----------
1 | 1 | 1845995 | 0 | (0,1) | b02 | 3
2 | 2 | | | | |
3 | 0 | | | | |
4 | 0 | | | | |
5 | 0 | | | | |
6 | 0 | | | | |
7 | 1 | 1846001 | 0 | (0,7) | 2b02 | 8003
(7 rows)
Is lp_off for tid (0,2) pointing to (0,7)? A hot chain preserved is
what would look correct to me.
- * Check the tuple XMIN against prior XMAX, if any
- */
- if (TransactionIdIsValid(priorXmax) &&
- !TransactionIdEquals(HeapTupleHeaderGetXmin(htup), priorXmax))
- break;
If you remove this check, you could also remove priorXmax completely.
Actually, I may be missing something, but why is priorXmax updated
even for dead tuples? For example just doing that is also taking care
of the problem:
@@ -558,7 +558,8 @@ heap_prune_chain(Relation relation, Buffer buffer, OffsetNumber rootoffnum,
 			Assert(ItemPointerGetBlockNumber(&htup->t_ctid) ==
 				   BufferGetBlockNumber(buffer));
 			offnum = ItemPointerGetOffsetNumber(&htup->t_ctid);
-			priorXmax = HeapTupleHeaderGetUpdateXid(htup);
+			if (!tupdead)
+				priorXmax = HeapTupleHeaderGetUpdateXid(htup);
 		}
--
Michael
I’ve just started looking at this again after a few weeks break.
There is a tangled web of issues here. With the community fix we get a corrupted page (invalid redirect ptr from indexed item). The cause of that is:
pruneheap.c:
/*
* Check the tuple XMIN against prior XMAX, if any
*/
if (TransactionIdIsValid(priorXmax) &&
!TransactionIdEquals(HeapTupleHeaderGetXmin(htup), priorXmax))
break;
chainitems[nchain++] = offnum;
The priorXmax is a multixact key share lock, thus not equal to xmin. This terminates the scan. Unlike the scan termination with “if (!recent_dead) break;” below the switch (...SatisfiesVacuum…), this “break” does not put the offnum into the chain even though it is in the chain. If the first not-deleted item isn’t put in the chain then we won’t call heap_prune_record_redirect().
I do not know what the above ‘if’ test is protecting. Normally the xmin is equal to the priorXmax. Other than protecting against corruption, a key share lock can cause this mismatch. So, I tried a fix which does the “if” check after adding the item to chainitems. This will break whatever real situation this ‘if’ was protecting. Tom Lane put this in.
OK: With that hack of a fix the redirect now works correctly. However, we still get the reindex problem with not finding the parent. That problem is caused by:
Pruneheap.c:heap_get_root_tuples()
if (TransactionIdIsValid(priorXmax) &&
!TransactionIdEquals(priorXmax, HeapTupleHeaderGetXmin(htup)))
break;
In this case, instead of these not being equal because of a multixact key share lock, it is because XMIN is frozen and FrozenTransactionId doesn’t equal the priorXmax. Thus, we don’t build the entire chain from root to most current row version and this causes the reindex failure.
If we disable this ‘if’ as a hack then we no longer get a problem on the reindex. However, YiWen reported that at the end of an installcheck run, index checking reported corruption in the system catalogues. So we are still looking.
We need to understand why these TXID equal checks exist. Can we differentiate the cases they are protecting against with the two exceptions I’ve found?
FYI, someone should look at the same ”if” test in heapam.c: heap_lock_updated_tuple_rec(). Also, I hope there are no strange issues with concurrent index builds.
Finally, the idea behind the original fix was simply NOT to do an unsupported freeze on a dead tuple. It had two drawbacks:
1) CLOG truncation. This could have been handled by keeping track of the old unpruned item found and using that to update the table's/DB's freeze xid.
2) Not making freeze progress. The only reason the prune would fail should be because of an open TXN. Unless that TXN was so old that its XID was as old as the “min freeze threshold”, we would make progress. If we were doing TXNs that old then we'd be having problems anyway.
On Tue, Oct 3, 2017 at 5:15 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:
FYI, the repro case page contents looks like this with the patch applied:
postgres=# select lp, lp_flags, t_xmin, t_xmax, t_ctid,
to_hex(t_infomask) as infomask,
to_hex(t_infomask2) as infomask2
from heap_page_items(get_raw_page('t', 0));
lp | lp_flags | t_xmin | t_xmax | t_ctid | infomask | infomask2
----+----------+---------+--------+--------+----------+-----------
1 | 1 | 1845995 | 0 | (0,1) | b02 | 3
2 | 2 | | | | |
3 | 0 | | | | |
4 | 0 | | | | |
5 | 0 | | | | |
6 | 0 | | | | |
7 | 1 | 1846001 | 0 | (0,7) | 2b02 | 8003
(7 rows)
Is lp_off for tid (0,2) pointing to (0,7)? A hot chain preserved is
what would look correct to me.
Yes, it is:
postgres=# select * from bt_page_items('foo', 1);
itemoffset | ctid | itemlen | nulls | vars | data
------------+-------+---------+-------+------+-------------------------
1 | (0,1) | 16 | f | f | 01 00 00 00 00 00 00 00
2 | (0,2) | 16 | f | f | 03 00 00 00 00 00 00 00
(2 rows)
I can tell from looking at my hex editor that the 4 bytes of ItemId
that we see for position '(0,2)' in the ItemId array are "07 00 01
00", meaning that '(0,2)' this is a LP_REDIRECT item, repointing us to
'(0,7)'. Everything here looks sane to me, at least at first blush.
- * Check the tuple XMIN against prior XMAX, if any
- */
- if (TransactionIdIsValid(priorXmax) &&
- !TransactionIdEquals(HeapTupleHeaderGetXmin(htup), priorXmax))
- break;
If you remove this check, you could also remove priorXmax completely.
Actually, I may be missing something, but why is priorXmax updated
even for dead tuples? For example just doing that is also taking care
of the problem:
I'll study what you suggest here some more tomorrow.
--
Peter Geoghegan
On Tue, Oct 3, 2017 at 6:09 PM, Wood, Dan <hexpert@amazon.com> wrote:
I’ve just started looking at this again after a few weeks break.
if (TransactionIdIsValid(priorXmax) &&
!TransactionIdEquals(priorXmax, HeapTupleHeaderGetXmin(htup)))
break;
We need to understand why these TXID equal checks exist. Can we differentiate the cases they are protecting against with the two exceptions I’ve found?
I haven't read your remarks here in full, since I'm about to stop
working for the day, but I will point out that
src/backend/access/heap/README.HOT says a fair amount about this,
under "Abort Cases".
--
Peter Geoghegan
One minor side note… Is it weird for xmin/xmax to go backwards in a hot row chain?
lp | t_ctid | lp_off | t_infomask | t_infomask2 | t_xmin | t_xmax
----+--------+--------+------------+-------------+--------+--------
1 | (0,1) | 8152 | 2818 | 3 | 36957 | 0
2 | | 5 | | | |
3 | | 0 | | | |
4 | | 0 | | | |
5 | (0,6) | 8112 | 9986 | 49155 | 36962 | 36963
6 | (0,7) | 8072 | 9986 | 49155 | 36963 | 36961
7 | (0,7) | 8032 | 11010 | 32771 | 36961 | 0
(7 rows)
My interpretation of README.HOT is the check is just to ensure the chain is continuous; in which case the condition should be:
if (TransactionIdIsValid(priorXmax) &&
!TransactionIdEquals(priorXmax, HeapTupleHeaderGetRawXmin(htup)))
break;
So the difference is GetRawXmin vs GetXmin, because otherwise we get the FreezeId instead of the Xmin when the transaction happened
The interesting consequence of changing that is that the prune seems to get the entire chain altogether with Dan's repro... I've run it a couple of times and have consistently gotten the following page:
lp | t_ctid | lp_off | lp_flags | t_infomask | t_infomask2
----+--------+--------+----------+------------+-------------
1 | (0,1) | 8152 | 1 | 2818 | 3
2 | | 7 | 2 | |
3 | | 0 | 0 | |
4 | | 0 | 0 | |
5 | | 0 | 0 | |
6 | | 0 | 0 | |
7 | (0,7) | 8112 | 1 | 11010 | 32771
(7 rows)
I've made this change to conditions in both heap_prune_chain and heap_get_root_tuples and this seems to cause things to pass my smoke tests.
I'll look into this deeper tomorrow.
Yi Wen
Peter Geoghegan wrote:
I thought that we no longer store FrozenTransactionId (xid 2) as our
"raw" xmin while freezing, and yet that's what we see here.
I'm doing this in 9.3. I can't tell if the new tuple freezing stuff
broke things more thoroughly, but it is already broken in earlier
releases.
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Wood, Dan wrote:
There is a tangled web of issues here. With the community fix we get a corrupted page (invalid redirect ptr from indexed item). The cause of that is:
pruneheap.c:
/*
* Check the tuple XMIN against prior XMAX, if any
*/
if (TransactionIdIsValid(priorXmax) &&
!TransactionIdEquals(HeapTupleHeaderGetXmin(htup), priorXmax))
break;
chainitems[nchain++] = offnum;
The priorXmax is a multixact key share lock,
Uhh, what? That certainly shouldn't happen -- the priorXmax comes from
priorXmax = HeapTupleHeaderGetUpdateXid(htup);
so only the XID of the update itself should be reported, not a multixact
and certainly not just a tuple lock XID.
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Wood, Dan wrote:
One minor side note… Is it weird for xmin/xmax to go backwards in a hot row chain?
lp | t_ctid | lp_off | t_infomask | t_infomask2 | t_xmin | t_xmax
----+--------+--------+------------+-------------+--------+--------
1 | (0,1) | 8152 | 2818 | 3 | 36957 | 0
2 | | 5 | | | |
3 | | 0 | | | |
4 | | 0 | | | |
5 | (0,6) | 8112 | 9986 | 49155 | 36962 | 36963
6 | (0,7) | 8072 | 9986 | 49155 | 36963 | 36961
7 | (0,7) | 8032 | 11010 | 32771 | 36961 | 0
(7 rows)
No, it just means transaction A got its XID before transaction B, but B
executed the update first and A updated the tuple second.
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Alvaro Herrera wrote:
Peter Geoghegan wrote:
I thought that we no longer store FrozenTransactionId (xid 2) as our
"raw" xmin while freezing, and yet that's what we see here.I'm doing this in 9.3. I can't tell if the new tuple freezing stuff
broke things more thoroughly, but it is already broken in earlier
releases.
In fact, I think in 9.3 we should include this patch, to set the Xmax to
FrozenXid. 9.4 and onwards have commit 37484ad2a "Change the way we
mark tuples as frozen" which uses a combination of infomask bits, but in
9.3 I think leaving the unfrozen value in the xmax field is a bad idea
even if we set the HEAP_XMAX_COMMITTED bit.
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment: fix-multi-freezing.patch (text/plain)
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index e00dc6c1ca..e68746fc3b 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -5500,7 +5500,10 @@ FreezeMultiXactId(MultiXactId multi, uint16 t_infomask,
 	if (TransactionIdPrecedes(xid, cutoff_xid))
 	{
 		if (TransactionIdDidCommit(xid))
+		{
+			xid = FrozenTransactionId;
 			*flags = FRM_MARK_COMMITTED | FRM_RETURN_IS_XID;
+		}
 		else
 		{
 			*flags |= FRM_INVALIDATE_XMAX;
Wong, Yi Wen wrote:
My interpretation of README.HOT is the check is just to ensure the chain is continuous; in which case the condition should be:
if (TransactionIdIsValid(priorXmax) &&
!TransactionIdEquals(priorXmax, HeapTupleHeaderGetRawXmin(htup)))
break;
So the difference is GetRawXmin vs GetXmin, because otherwise we get the FreezeId instead of the Xmin when the transaction happened
I independently arrived at the same conclusion. Since I was trying with
9.3, the patch differs -- in the old version we must explicitly test
for the FrozenTransactionId value, instead of using GetRawXmin.
Attached is the patch I'm using, and my own oneliner test (pretty much
the same I posted earlier) seems to survive dozens of iterations without
showing any problem in REINDEX.
This patch is incomplete, since I think there are other places that need
to be patched in the same way (EvalPlanQualFetch? heap_get_latest_tid?).
Of course, for 9.4 and onwards we need to patch like you described.
This bit in EvalPlanQualFetch caught my attention ... why is it saying
xmin never changes? It does change with freezing.
/*
* If xmin isn't what we're expecting, the slot must have been
* recycled and reused for an unrelated tuple. This implies that
* the latest version of the row was deleted, so we need do
* nothing. (Should be safe to examine xmin without getting
* buffer's content lock, since xmin never changes in an existing
* tuple.)
*/
if (!TransactionIdEquals(HeapTupleHeaderGetXmin(tuple.t_data),
priorXmax))
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment: 93-fix.patch (text/plain)
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index e00dc6c1ca..e68746fc3b 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -5500,7 +5500,10 @@ FreezeMultiXactId(MultiXactId multi, uint16 t_infomask,
 	if (TransactionIdPrecedes(xid, cutoff_xid))
 	{
 		if (TransactionIdDidCommit(xid))
+		{
+			xid = FrozenTransactionId;
 			*flags = FRM_MARK_COMMITTED | FRM_RETURN_IS_XID;
+		}
 		else
 		{
 			*flags |= FRM_INVALIDATE_XMAX;
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 9a8db74cb9..561acd144a 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -435,7 +435,8 @@ heap_prune_chain(Relation relation, Buffer buffer, OffsetNumber rootoffnum,
 	 * Check the tuple XMIN against prior XMAX, if any
 	 */
 	if (TransactionIdIsValid(priorXmax) &&
-		!TransactionIdEquals(HeapTupleHeaderGetXmin(htup), priorXmax))
+		!TransactionIdEquals(HeapTupleHeaderGetXmin(htup), priorXmax) &&
+		HeapTupleHeaderGetXmin(htup) != FrozenTransactionId)
 		break;
 
 	/*
On Tue, Oct 3, 2017 at 8:43 PM, Wong, Yi Wen <yiwong@amazon.com> wrote:
My interpretation of README.HOT is the check is just to ensure the chain is continuous; in which case the condition should be:
if (TransactionIdIsValid(priorXmax) &&
!TransactionIdEquals(priorXmax, HeapTupleHeaderGetRawXmin(htup)))
break;
So the difference is GetRawXmin vs GetXmin, because otherwise we get the FreezeId instead of the Xmin when the transaction happened
I was thinking along similar lines.
The interesting consequence of changing that is that the prune seems to get the entire chain altogether with Dan's repro... I've run it a couple of times and have consistently gotten the following page:
lp | t_ctid | lp_off | lp_flags | t_infomask | t_infomask2
----+--------+--------+----------+------------+-------------
1 | (0,1) | 8152 | 1 | 2818 | 3
2 | | 7 | 2 | |
3 | | 0 | 0 | |
4 | | 0 | 0 | |
5 | | 0 | 0 | |
6 | | 0 | 0 | |
7 | (0,7) | 8112 | 1 | 11010 | 32771
(7 rows)
That's also what I see. This is a good thing, I think; that's how we
ought to prune.
--
Peter Geoghegan
Sorry. Should have looked at the macro. I sent this too soon. The early “break;” here is likely due to the frozen xmin, as I found in the other loop.
On Wed, Oct 4, 2017 at 11:00 AM, Wood, Dan <hexpert@amazon.com> wrote:
The early “break;” here is likely due to the frozen xmin, as I found in the other loop.
It looks that way.
Since we're already very defensive when it comes to this xmin/xmax
matching business, and we're defensive while following an update chain
more generally, I wonder if we should be checking
HeapTupleHeaderIsSpeculative() on versions >= 9.5 (versions with ON
CONFLICT DO UPDATE, where t_ctid block number might actually be a
speculative insertion token). Or, at least acknowledging that case in
comments. I remember expressing concern that something like this was
possible at the time that that went in.
We certainly don't want to have heap_abort_speculative() "super
delete" the wrong tuple in the event of item pointer recycling. There
are at least defensive sanity checks that would turn that into an
error within heap_abort_speculative(), so it wouldn't be a disaster if
it was attempted. I don't think that it's possible in practice, and
maybe it's sufficient that routines like heap_get_latest_tid() check
for a sane item offset, which will discriminate against
SpecTokenOffsetNumber, which must be well out of range for ItemIds on
the page. Could be worth a comment.
(I guess that heap_prune_chain() wouldn't need to be changed if we
decide to add such comments, because the speculative tuple ItemId is
going to be skipped over due to being ItemIdIsUsed() before we even
get there.)
--
Peter Geoghegan
On Wed, Oct 4, 2017 at 6:46 AM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
Wong, Yi Wen wrote:
My interpretation of README.HOT is the check is just to ensure the chain is continuous; in which case the condition should be:
if (TransactionIdIsValid(priorXmax) &&
!TransactionIdEquals(priorXmax, HeapTupleHeaderGetRawXmin(htup)))
break;
So the difference is GetRawXmin vs GetXmin, because otherwise we get the FreezeId instead of the Xmin when the transaction happened
As you know, on version 9.4+, as of commit 37484ad2a, we decided that
we are "largely ignoring the value to which it [xmin] is set". The
expectation became that raw xmin is available after freezing, but
mostly for forensic purposes. I think Alvaro should now memorialize
the idea that its value is actually critical in some place
(htup_details.h?).
I independently arrived at the same conclusion. Since I was trying with
9.3, the patch differs -- in the old version we must explicitly test
for the FrozenTransactionId value, instead of using GetRawXmin.
Obviously you're going to have to be prepared for a raw xmin of
FrozenTransactionId, even on 9.4+, due to pg_upgrade. I can see why it
would be safe (or at least no more dangerous) to rely on
HeapTupleHeaderGetRawXmin() in the way mentioned here, at least on
installations that initdb'd on a version after commit 37484ad2a
(version 9.4+). However, I'm not sure why what you propose here would
be safe when even raw xmin happens to be FrozenTransactionId. Are you
sure that that's truly race-free? If it's really true that we only
need to check for FrozenTransactionId on 9.3, why not just do that on
all versions, and never bother with HeapTupleHeaderGetRawXmin()?
("Sheer paranoia" is a valid answer; I just want us to be clear on the
reasoning.)
Obviously any race would have a ridiculously tiny window, but it's not
obvious why this protocol would be completely race-free (in the event
of a FrozenTransactionId raw xmin).
--
Peter Geoghegan
On Wed, Oct 4, 2017 at 10:46 PM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
Wong, Yi Wen wrote:
My interpretation of README.HOT is the check is just to ensure the chain is continuous; in which case the condition should be:
if (TransactionIdIsValid(priorXmax) &&
!TransactionIdEquals(priorXmax, HeapTupleHeaderGetRawXmin(htup)))
break;
So the difference is GetRawXmin vs GetXmin, because otherwise we get the FreezeId instead of the Xmin when the transaction happened
I independently arrived at the same conclusion. Since I was trying with
9.3, the patch differs -- in the old version we must explicitly test
for the FrozenTransactionId value, instead of using GetRawXmin.
Attached is the patch I'm using, and my own oneliner test (pretty much
the same I posted earlier) seems to survive dozens of iterations without
showing any problem in REINDEX.
Confirmed, the problem goes away with this patch on 9.3.
This patch is incomplete, since I think there are other places that need
to be patched in the same way (EvalPlanQualFetch? heap_get_latest_tid?).
Of course, for 9.4 and onwards we need to patch like you described.
I have just gone through the source code, and here is an
exhaustive list of the places in need of surgery:
- heap_hot_search_buffer
- heap_get_latest_tid
- heap_lock_updated_tuple_rec
- heap_prune_chain
- heap_get_root_tuples
- rewrite_heap_tuple
- EvalPlanQualFetch (twice)
This bit in EvalPlanQualFetch caught my attention ... why is it saying
xmin never changes? It does change with freezing.
/*
* If xmin isn't what we're expecting, the slot must have been
* recycled and reused for an unrelated tuple. This implies that
* the latest version of the row was deleted, so we need do
* nothing. (Should be safe to examine xmin without getting
* buffer's content lock, since xmin never changes in an existing
* tuple.)
*/
if (!TransactionIdEquals(HeapTupleHeaderGetXmin(tuple.t_data),
priorXmax))
Agreed. That's not good.
--
Michael
Whatever you do, make sure to also test 250 clients running lock.sql. Even with the community's fix plus YiWen’s fix I can still get duplicate rows. What works for “in-block” HOT chains may not work when spanning blocks.
Once nearly all 250 clients have done their updates and everybody is waiting to vacuum, which one by one will take a while, I usually just “pkill -9 psql”. After that I have many duplicate “id=3” rows. On top of that I think we might have a lock leak. After the pkill I tried to rerun setup.sql to drop/create the table and it hangs. I see an autovacuum process starting and exiting every couple of seconds. Only by killing and restarting PG can I drop the table.
On Thu, Oct 5, 2017 at 10:39 AM, Wood, Dan <hexpert@amazon.com> wrote:
Whatever you do, make sure to also test 250 clients running lock.sql. Even with the community's fix plus YiWen’s fix I can still get duplicate rows. What works for “in-block” HOT chains may not work when spanning blocks.
Interesting. Which version did you test? Only 9.6?
Once nearly all 250 clients have done their updates and everybody is waiting to vacuum, which one by one will take a while, I usually just “pkill -9 psql”. After that I have many duplicate “id=3” rows. On top of that I think we might have a lock leak. After the pkill I tried to rerun setup.sql to drop/create the table and it hangs. I see an autovacuum process starting and exiting every couple of seconds. Only by killing and restarting PG can I drop the table.
Yeah, that's more or less what I have been doing. My tests involve
using your initial script with way more sessions triggering lock.sql,
minus the kill-9 portion (good idea actually). I can of course see the
sessions queuing for VACUUM, still I cannot see duplicated rows, even
if I headshot Postgres in the middle of the VACUUM waiting queue. Note
that I have just tested Alvaro's patch on 9.3.
--
Michael
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Wood, Dan wrote:
Whatever you do make sure to also test 250 clients running lock.sql. Even with the community's fix plus Yi Wen’s fix I can still get duplicate rows. What works for “in-block” HOT chains may not work when spanning blocks.
Good idea. You can achieve a similar effect by adding a filler column,
and reducing fillfactor.
Once nearly all 250 clients have done their updates and everybody is
waiting to vacuum (which, one by one, will take a while), I usually just
“pkill -9 psql”. After that I have many duplicate “id=3” rows.
Odd ...
On top of that I think we might have a lock leak. After the pkill I
tried to rerun setup.sql to drop/create the table and it hangs. I see
an autovacuum process starting and exiting every couple of seconds.
Only by killing and restarting PG can I drop the table.
Please do try to figure this one out. It'd be a separate problem,
worthy of its own thread.
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Peter Geoghegan wrote:
As you know, on version 9.4+, as of commit 37484ad2a, we decided that
we are "largely ignoring the value to which it [xmin] is set". The
expectation became that raw xmin is available after freezing, but
mostly for forensic purposes. I think Alvaro should now memorialize
the idea that its value is actually critical in some place
(htup_details.h?).
I'd love to be certain that we preserve the original value in all
cases.
On Wed, Oct 4, 2017 at 6:46 AM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
I independently arrived at the same conclusion. Since I was trying with
9.3, the patch differs -- in the old version we must explicitly test
for the FrozenTransactionId value, instead of using GetRawXmin.

Obviously you're going to have to be prepared for a raw xmin of
FrozenTransactionId, even on 9.4+, due to pg_upgrade.
Hmm. It would be good to be able to get rid of that case, actually,
because I now think it's less safe (see below).
At any rate, I was thinking in a new routine to encapsulate the logic,
/*
* Check the tuple XMIN against prior XMAX, if any
*/
if (!HeapTupleUpdateXmaxMatchesXmin(priorXmax, HeapTupleHeaderGetXmin(htup)))
break;
where ..XmaxMatchesXmin would know about checking for possibly frozen
tuples.
I can see why it
would be safe (or at least no more dangerous) to rely on
HeapTupleHeaderGetRawXmin() in the way mentioned here, at least on
installations that initdb'd on a version after commit 37484ad2a
(version 9.4+). However, I'm not sure why what you propose here would
be safe when even raw xmin happens to be FrozenTransactionId. Are you
sure that that's truly race-free? If it's really true that we only
need to check for FrozenTransactionId on 9.3, why not just do that on
all versions, and never bother with HeapTupleHeaderGetRawXmin()?
("Sheer paranoia" is a valid answer; I just want us to be clear on the
reasoning.)
I think the RawXmin is the better mechanism. I'm not absolutely certain
that the window is completely closed in 9.3; as I understand things, it
is possible for transaction A to prune an aborted heap-only tuple, then
transaction B to insert a frozen tuple in the same location, then
transaction C follows a link to the HOT tuple that was pruned. I think we'd
end up considering that the new frozen tuple is part of the chain, which
is wrong ... In 9.4 we can compare the real xmin.
I hope I am proved wrong about this, because if not, I think we're
looking at an essentially unsolvable problem in 9.3.
Obviously any race would have a ridiculously tiny window, but it's not
obvious why this protocol would be completely race-free (in the event
of a FrozenTransactionId raw xmin).
As far as I understand, this problem only emerges if one part of a HOT
chain reaches the min freeze age while another part of the same chain is
still visible to some running transaction. It is particularly
noticeable in our current test case because we use a min freeze age of
0, with many concurrent sessions modifying the same page. What this says to me
is that VACUUM FREEZE is mildly dangerous when there's lots of high
concurrent HOT UPDATE activity.
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
I think this is the patch for 9.3. I ran the test a few hundred times
(with some additional changes such as randomly having an update inside a
savepoint that's randomly aborted, randomly aborting the transaction,
randomly skipping the for key share lock, randomly sleeping at a few
points; and also adding filler columns, reducing fillfactor and using
5, 50, 180, 250, 500 sessions after verifying that it causes the tuples
to stay in the same page or migrate to later pages). The final REINDEX
has never complained again about failing to find the root tuple. I hope
it's good now.
The attached patch needs a few small tweaks, such as improving
commentary in the new function, maybe turn it into a macro (otherwise I
think it could be bad for performance; I'd like a static func but not
sure those are readily available in 9.3), change the XID comparison to
use the appropriate macro rather than ==, and such.
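(For illustration only -- this is not part of the attached patch -- a static-inline variant would look roughly like the sketch below, assuming "static inline" is actually usable in a 9.3 header, which is exactly the doubt above; the body just mirrors the new function with the == changed to the macro:

    /* hypothetical inline variant of the new check, mirroring the patch */
    static inline bool
    HeapTupleUpdateXmaxMatchesXmin(TransactionId xmax, HeapTupleHeader htup)
    {
        TransactionId xmin = HeapTupleHeaderGetXmin(htup);

        /* xmax must not be invalid or frozen */
        Assert(TransactionIdIsNormal(xmax));

        /* direct match between the old tuple's xmax and the new tuple's xmin */
        if (TransactionIdEquals(xmax, xmin))
            return true;

        /*
         * On 9.3, freezing overwrites xmin with FrozenTransactionId, so a
         * frozen xmin may still belong to the chain; err towards a match.
         */
        if (TransactionIdEquals(xmin, FrozenTransactionId))
            return true;

        return false;
    }
)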
Regarding changes of xmin/xmax comparison, I also checked manually the
spots I thought should be modified and later double-checked against the
list that Michael posted. It's a match, except for rewriteheap.c which
I cannot make heads or tails of. (I think it's rather unfortunate
that it sticks a tuple's Xmax into a field that's called Xmin, but let's
put that aside). Maybe there's a problem here, maybe there isn't.
I'm now going to forward-port this to 9.4.
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachments:
0001-Fix-bugs.patch (text/plain; charset=us-ascii)
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index e00dc6c1ca..87bce0e7ea 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -1718,8 +1718,7 @@ heap_hot_search_buffer(ItemPointer tid, Relation relation, Buffer buffer,
* broken.
*/
if (TransactionIdIsValid(prev_xmax) &&
- !TransactionIdEquals(prev_xmax,
- HeapTupleHeaderGetXmin(heapTuple->t_data)))
+ !HeapTupleUpdateXmaxMatchesXmin(prev_xmax, heapTuple->t_data))
break;
/*
@@ -1888,7 +1887,7 @@ heap_get_latest_tid(Relation relation,
* tuple. Check for XMIN match.
*/
if (TransactionIdIsValid(priorXmax) &&
- !TransactionIdEquals(priorXmax, HeapTupleHeaderGetXmin(tp.t_data)))
+ !HeapTupleUpdateXmaxMatchesXmin(priorXmax, tp.t_data))
{
UnlockReleaseBuffer(buffer);
break;
@@ -1920,6 +1919,36 @@ heap_get_latest_tid(Relation relation,
} /* end of loop */
}
+/*
+ * Given a tuple, verify whether the given Xmax matches the tuple's Xmin,
+ * taking into account that the Xmin might have been frozen.
+ */
+bool
+HeapTupleUpdateXmaxMatchesXmin(TransactionId xmax, HeapTupleHeader htup)
+{
+ TransactionId xmin = HeapTupleHeaderGetXmin(htup);
+
+ /* xmax must not be invalid or frozen */
+ Assert(TransactionIdIsNormal(xmax));
+
+ /*
+ * If the xmax of the old tuple is identical to the xmin of the new one,
+ * it's a match.
+ */
+ if (xmax == xmin)
+ return true;
+
+ /* FIXME Here we need to check the HEAP_XMIN_FROZEN in 9.4 and up */
+
+ /*
+ * We actually don't know if there's a match, but if the previous tuple
+ * was frozen, we cannot really rely on a perfect match.
+ */
+ if (xmin == FrozenTransactionId)
+ return true;
+
+ return false;
+}
/*
* UpdateXmaxHintBits - update tuple hint bits after xmax transaction ends
@@ -5045,8 +5074,7 @@ l4:
* the end of the chain, we're done, so return success.
*/
if (TransactionIdIsValid(priorXmax) &&
- !TransactionIdEquals(HeapTupleHeaderGetXmin(mytup.t_data),
- priorXmax))
+ !HeapTupleUpdateXmaxMatchesXmin(priorXmax, mytup.t_data))
{
UnlockReleaseBuffer(buf);
return HeapTupleMayBeUpdated;
@@ -5500,7 +5528,10 @@ FreezeMultiXactId(MultiXactId multi, uint16 t_infomask,
if (TransactionIdPrecedes(xid, cutoff_xid))
{
if (TransactionIdDidCommit(xid))
+ {
+ xid = FrozenTransactionId;
*flags = FRM_MARK_COMMITTED | FRM_RETURN_IS_XID;
+ }
else
{
*flags |= FRM_INVALIDATE_XMAX;
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 9a8db74cb9..a342f57f42 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -435,7 +435,7 @@ heap_prune_chain(Relation relation, Buffer buffer, OffsetNumber rootoffnum,
* Check the tuple XMIN against prior XMAX, if any
*/
if (TransactionIdIsValid(priorXmax) &&
- !TransactionIdEquals(HeapTupleHeaderGetXmin(htup), priorXmax))
+ !HeapTupleUpdateXmaxMatchesXmin(priorXmax, htup))
break;
/*
@@ -774,7 +774,7 @@ heap_get_root_tuples(Page page, OffsetNumber *root_offsets)
htup = (HeapTupleHeader) PageGetItem(page, lp);
if (TransactionIdIsValid(priorXmax) &&
- !TransactionIdEquals(priorXmax, HeapTupleHeaderGetXmin(htup)))
+ !HeapTupleUpdateXmaxMatchesXmin(priorXmax, htup))
break;
/* Remember the root line pointer for this item */
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 315a555dcf..79e4701bb8 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -2019,8 +2019,7 @@ EvalPlanQualFetch(EState *estate, Relation relation, int lockmode,
* buffer's content lock, since xmin never changes in an existing
* tuple.)
*/
- if (!TransactionIdEquals(HeapTupleHeaderGetXmin(tuple.t_data),
- priorXmax))
+ if (!HeapTupleUpdateXmaxMatchesXmin(priorXmax, tuple.t_data))
{
ReleaseBuffer(buffer);
return NULL;
@@ -2138,8 +2137,7 @@ EvalPlanQualFetch(EState *estate, Relation relation, int lockmode,
/*
* As above, if xmin isn't what we're expecting, do nothing.
*/
- if (!TransactionIdEquals(HeapTupleHeaderGetXmin(tuple.t_data),
- priorXmax))
+ if (!HeapTupleUpdateXmaxMatchesXmin(priorXmax, tuple.t_data))
{
ReleaseBuffer(buffer);
return NULL;
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 4a187f899c..93a481f483 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -127,6 +127,9 @@ extern void heap_get_latest_tid(Relation relation, Snapshot snapshot,
ItemPointer tid);
extern void setLastTid(const ItemPointer tid);
+extern bool HeapTupleUpdateXmaxMatchesXmin(TransactionId xmax,
+ HeapTupleHeader htup);
+
extern BulkInsertState GetBulkInsertState(void);
extern void FreeBulkInsertState(BulkInsertState);
--
2.11.0
On Thu, Oct 5, 2017 at 4:35 AM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
At any rate, I was thinking in a new routine to encapsulate the logic,
/*
* Check the tuple XMIN against prior XMAX, if any
*/
if (!HeapTupleUpdateXmaxMatchesXmin(priorXmax, HeapTupleHeaderGetXmin(htup)))
break;

where ..XmaxMatchesXmin would know about checking for possibly frozen
tuples.
Makes sense.
I can see why it
would be safe (or at least no more dangerous) to rely on
HeapTupleHeaderGetRawXmin() in the way mentioned here, at least on
installations that initdb'd on a version after commit 37484ad2a
(version 9.4+). However, I'm not sure why what you propose here would
be safe when even raw xmin happens to be FrozenTransactionId. Are you
sure that that's truly race-free? If it's really true that we only
need to check for FrozenTransactionId on 9.3, why not just do that on
all versions, and never bother with HeapTupleHeaderGetRawXmin()?
("Sheer paranoia" is a valid answer; I just want us to be clear on the
reasoning.)

I think the RawXmin is the better mechanism. I'm not absolutely certain
that the window is completely closed in 9.3; as I understand things, it
is possible for transaction A to prune an aborted heap-only tuple, then
transaction B to insert a frozen tuple in the same location, then
transaction C follows a link to the HOT tuple that was pruned. I think we'd
end up considering that the new frozen tuple is part of the chain, which
is wrong ... In 9.4 we can compare the real xmin.
Good point: the race doesn't exist on 9.4+ with pg_upgrade from 9.3,
when raw xmin happens to be FrozenTransactionId, because there can be
no new tuples that have that property. This possible race is strictly
a 9.3 issue. (You have to deal with a raw xmin of FrozenTransactionId
on 9.4+, but there are no race conditions, because affected tuples are
all pre-pg_upgrade tuples -- they were frozen months ago, not
microseconds ago.)
I hope I am proved wrong about this, because if not, I think we're
looking at an essentially unsolvable problem in 9.3.
Well, as noted within README.HOT, the xmin/xmax matching did not
appear in earlier versions of the original HOT patch, because it was
thought that holding a pin of the buffer was a sufficient interlock
against concurrent line pointer recycling (pruning must have a buffer
cleanup lock). Clearly it is more robust to match xmin to xmax, but is
there actually any evidence that it was really necessary at all (for
single-page HOT chain traversals) when HOT went in in 2007, or that it
has since become necessary? heap_page_prune()'s caller has to have a
buffer cleanup lock.
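(For anyone not steeped in the bufmgr API, the cleanup-lock interlock being referenced follows roughly the pattern below -- a simplified sketch, not code quoted from vacuumlazy.c, with "relation" and "blkno" assumed to be in scope:

    Buffer      buf;

    buf = ReadBuffer(relation, blkno);

    /*
     * A cleanup lock is an exclusive content lock granted only once no
     * other backend holds a pin on the page.  A reader walking a HOT chain
     * keeps its pin, which is why a pin alone was once considered a
     * sufficient interlock against line pointer recycling by pruning.
     */
    LockBufferForCleanup(buf);

    /* ... heap_page_prune() may now recycle line pointers safely ... */

    UnlockReleaseBuffer(buf);
)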
I'm certainly not advocating removing the xmin/xmax matching within
heap_prune_chain() on 9.3. However, it may be acceptable to rely on
holding a buffer cleanup lock within heap_prune_chain() on 9.3, just
for the rare/theoretical cases where the FrozenTransactionId raw xmin
ambiguity means that xmin/xmax matching alone might not be enough. As
you say, it seems unsolvable through looking at state on the page
directly on 9.3, so there may be no other way. And, that's still
strictly better than what we have today.
As far as I understand, this problem only emerges if one part of a HOT
chain reaches the min freeze age while another part of the same chain is
still visible to some running transaction. It is particularly
noticeable in our current test case because we use a min freeze age of
0, with many concurrent sessions modifying the same page. What this says to me
is that VACUUM FREEZE is mildly dangerous when there's lots of high
concurrent HOT UPDATE activity.
I'm not sure what you mean here. It is dangerous right now, because
there is a bug, but we can squash the bug.
--
Peter Geoghegan
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Yes, I’ve been testing 9.6. I’ll try Alvaro’s patch today.
I would prefer to focus on either latest 9X or 11dev. Does Alvaro’s patch presume any of the other patches to set COMMITTED in the freeze code?
On 10/4/17, 7:17 PM, "Michael Paquier" <michael.paquier@gmail.com> wrote:
On Thu, Oct 5, 2017 at 10:39 AM, Wood, Dan <hexpert@amazon.com> wrote:
Whatever you do make sure to also test 250 clients running lock.sql. Even with the community's fix plus Yi Wen’s fix I can still get duplicate rows. What works for “in-block” HOT chains may not work when spanning blocks.
Interesting. Which version did you test? Only 9.6?
Once nearly all 250 clients have done their updates and everybody is waiting to vacuum (which, one by one, will take a while), I usually just “pkill -9 psql”. After that I have many duplicate “id=3” rows. On top of that I think we might have a lock leak. After the pkill I tried to rerun setup.sql to drop/create the table and it hangs. I see an autovacuum process starting and exiting every couple of seconds. Only by killing and restarting PG can I drop the table.
Yeah, that's more or less what I have been doing. My tests involve
using your initial script with way more sessions triggering lock.sql,
minus the kill-9 portion (good idea actually). I can of course see the
sessions queuing for VACUUM, still I cannot see duplicated rows, even
if I headshot Postgres in the middle of the VACUUM waiting queue. Note
that I have just tested Alvaro's patch on 9.3.
--
Michael
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Fri, Oct 6, 2017 at 1:24 AM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
I think this is the patch for 9.3. I ran the test a few hundred times
(with some additional changes such as randomly having an update inside a
savepoint that's randomly aborted, randomly aborting the transaction,
randomly skipping the for key share lock, randomly sleeping at a few
points; and also adding filler columns, reducing fillfactor and using
5, 50, 180, 250, 500 sessions after verifying that it causes the tuples
to stay in the same page or migrate to later pages). The final REINDEX
has never complained again about failing to find the root tuple. I hope
it's good now.
I have looked and played with your patch as well for a couple of
hours, and did not notice any failures again. The structure of the
pages looked sane as well, I could see also with pageinspect a correct
HOT chain using the first set of tests provided on this thread.
The attached patch needs a few small tweaks, such as improving
commentary in the new function, maybe turn it into a macro (otherwise I
think it could be bad for performance; I'd like a static func but not
sure those are readily available in 9.3), change the XID comparison to
use the appropriate macro rather than ==, and such.
Yeah that would be better.
Regarding changes of xmin/xmax comparison, I also checked manually the
spots I thought should be modified and later double-checked against the
list that Michael posted.
Thanks. I can see that your patch is filling the holes.
It's a match, except for rewriteheap.c which
I cannot make heads or tails of. (I think it's rather unfortunate
that it sticks a tuple's Xmax into a field that's called Xmin, but let's
put that aside). Maybe there's a problem here, maybe there isn't.
rewrite_heap_tuple is used only by CLUSTER, and the oldest new tuples
are frozen before being rewritten. So this looks safe to me at the
end.
I'm now going to forward-port this to 9.4.
+ /*
+ * If the xmax of the old tuple is identical to the xmin of the new one,
+ * it's a match.
+ */
+ if (xmax == xmin)
+ return true;
I would use TransactionIdEquals() here, to remember once you switch
that to a macro.
+/*
+ * Given a tuple, verify whether the given Xmax matches the tuple's Xmin,
+ * taking into account that the Xmin might have been frozen.
+ */
[...]
+ /*
+ * We actually don't know if there's a match, but if the previous tuple
+ * was frozen, we cannot really rely on a perfect match.
+ */
--
Michael
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Michael Paquier wrote:
On Fri, Oct 6, 2017 at 1:24 AM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
+	/*
+	 * If the xmax of the old tuple is identical to the xmin of the new one,
+	 * it's a match.
+	 */
+	if (xmax == xmin)
+		return true;

I would use TransactionIdEquals() here, to remember once you switch
that to a macro.
I've had second thoughts about the macro thing -- for now I'm keeping it
a function, actually.
+/*
+ * Given a tuple, verify whether the given Xmax matches the tuple's Xmin,
+ * taking into account that the Xmin might have been frozen.
+ */
[...]
+	/*
+	 * We actually don't know if there's a match, but if the previous tuple
+	 * was frozen, we cannot really rely on a perfect match.
+	 */
I don't know what you had in mind here, but I tweaked the 9.3 version so
that it now looks like this:
/*
* HeapTupleUpdateXmaxMatchesXmin - verify update chain xmax/xmin lineage
*
* Given the new version of a tuple after some update, verify whether the
* given Xmax (corresponding to the previous version) matches the tuple's
* Xmin, taking into account that the Xmin might have been frozen after the
* update.
*/
bool
HeapTupleUpdateXmaxMatchesXmin(TransactionId xmax, HeapTupleHeader htup)
{
TransactionId xmin = HeapTupleHeaderGetXmin(htup);
/*
* If the xmax of the old tuple is identical to the xmin of the new one,
* it's a match.
*/
if (TransactionIdEquals(xmax, xmin))
return true;
/*
* When a tuple is frozen, the original Xmin is lost, but we know it's a
* committed transaction. So unless the Xmax is InvalidXid, we don't
* know for certain that there is a match, but there may be one; and we
* must return true so that a HOT chain that is half-frozen can be walked
* correctly.
*/
if (TransactionIdEquals(xmin, FrozenTransactionId) &&
TransactionIdIsValid(xmax))
return true;
return false;
}
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Wood, Dan wrote:
Yes, I’ve been testing 9.6. I’ll try Alvaro’s patch today.
I would prefer to focus on either latest 9X or 11dev.
I tested my patch on 9.4 and 9.5 today and it seems to close the problem
(with the patch, I waited 10x as many iterations as it took for the
problem to occur ~10 times without the patch), but I can reproduce a
problem in 9.6 with my patch installed. There must be something new in
9.6 that is causing the problem to reappear.
Does Alvaro’s patch presume any of the other patches to set COMMITTED in
the freeze code?
I don't know what you mean. Here is the 9.6 version of my patch. Note
that HEAP_XMIN_FROZEN (which uses the XMIN_COMMITTED bit as I recall)
was introduced in 9.4.
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachments:
0001-Fix-bugs.patch (text/plain; charset=us-ascii)
From 46ca12d56402d33a78ea0e13367d1e0e25a474dd Mon Sep 17 00:00:00 2001
From: Alvaro Herrera <alvherre@alvh.no-ip.org>
Date: Fri, 6 Oct 2017 14:11:34 +0200
Subject: [PATCH] Fix bugs
---
src/backend/access/heap/heapam.c | 52 +++++++++++++++++++++++++++++++++----
src/backend/access/heap/pruneheap.c | 4 +--
src/backend/executor/execMain.c | 6 ++---
src/include/access/heapam.h | 3 +++
4 files changed, 54 insertions(+), 11 deletions(-)
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index b41f2a2fdd..10916b140e 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2060,8 +2060,7 @@ heap_hot_search_buffer(ItemPointer tid, Relation relation, Buffer buffer,
* broken.
*/
if (TransactionIdIsValid(prev_xmax) &&
- !TransactionIdEquals(prev_xmax,
- HeapTupleHeaderGetXmin(heapTuple->t_data)))
+ !HeapTupleUpdateXmaxMatchesXmin(prev_xmax, heapTuple->t_data))
break;
/*
@@ -2244,7 +2243,7 @@ heap_get_latest_tid(Relation relation,
* tuple. Check for XMIN match.
*/
if (TransactionIdIsValid(priorXmax) &&
- !TransactionIdEquals(priorXmax, HeapTupleHeaderGetXmin(tp.t_data)))
+ !HeapTupleUpdateXmaxMatchesXmin(priorXmax, tp.t_data))
{
UnlockReleaseBuffer(buffer);
break;
@@ -2276,6 +2275,50 @@ heap_get_latest_tid(Relation relation,
} /* end of loop */
}
+/*
+ * HeapTupleUpdateXmaxMatchesXmin - verify update chain xmax/xmin lineage
+ *
+ * Given the new version of a tuple after some update, verify whether the
+ * given Xmax (corresponding to the previous version) matches the tuple's
+ * Xmin, taking into account that the Xmin might have been frozen after the
+ * update.
+ */
+bool
+HeapTupleUpdateXmaxMatchesXmin(TransactionId xmax, HeapTupleHeader htup)
+{
+ TransactionId xmin = HeapTupleHeaderGetXmin(htup);
+
+ /*
+ * If the xmax of the old tuple is identical to the xmin of the new one,
+ * it's a match.
+ */
+ if (TransactionIdEquals(xmax, xmin))
+ return true;
+
+ /*
+ * If the Xmin that was in effect prior to a freeze matches the Xmax,
+ * it's good too.
+ */
+ if (HeapTupleHeaderXminFrozen(htup) &&
+ TransactionIdEquals(HeapTupleHeaderGetRawXmin(htup), xmax))
+ return true;
+
+ /*
+ * When a tuple is frozen, the original Xmin is lost, but we know it's a
+ * committed transaction. So unless the Xmax is InvalidXid, we don't know
+ * for certain that there is a match, but there may be one; and we must
+ * return true so that a HOT chain that is half-frozen can be walked
+ * correctly.
+ *
+ * We no longer freeze tuples this way, but we must keep this in order to
+ * interpret pre-pg_upgrade pages correctly.
+ */
+ if (TransactionIdEquals(xmin, FrozenTransactionId) &&
+ TransactionIdIsValid(xmax))
+ return true;
+
+ return false;
+}
/*
* UpdateXmaxHintBits - update tuple hint bits after xmax transaction ends
@@ -5693,8 +5736,7 @@ l4:
* end of the chain, we're done, so return success.
*/
if (TransactionIdIsValid(priorXmax) &&
- !TransactionIdEquals(HeapTupleHeaderGetXmin(mytup.t_data),
- priorXmax))
+ !HeapTupleUpdateXmaxMatchesXmin(priorXmax, mytup.t_data))
{
result = HeapTupleMayBeUpdated;
goto out_locked;
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 52231ac417..7753ee7b12 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -473,7 +473,7 @@ heap_prune_chain(Relation relation, Buffer buffer, OffsetNumber rootoffnum,
* Check the tuple XMIN against prior XMAX, if any
*/
if (TransactionIdIsValid(priorXmax) &&
- !TransactionIdEquals(HeapTupleHeaderGetXmin(htup), priorXmax))
+ !HeapTupleUpdateXmaxMatchesXmin(priorXmax, htup))
break;
/*
@@ -813,7 +813,7 @@ heap_get_root_tuples(Page page, OffsetNumber *root_offsets)
htup = (HeapTupleHeader) PageGetItem(page, lp);
if (TransactionIdIsValid(priorXmax) &&
- !TransactionIdEquals(priorXmax, HeapTupleHeaderGetXmin(htup)))
+ !HeapTupleUpdateXmaxMatchesXmin(priorXmax, htup))
break;
/* Remember the root line pointer for this item */
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 396b7a1e83..ac7e0fb48b 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -2594,8 +2594,7 @@ EvalPlanQualFetch(EState *estate, Relation relation, int lockmode,
* atomic, and Xmin never changes in an existing tuple, except to
* invalid or frozen, and neither of those can match priorXmax.)
*/
- if (!TransactionIdEquals(HeapTupleHeaderGetXmin(tuple.t_data),
- priorXmax))
+ if (!HeapTupleUpdateXmaxMatchesXmin(priorXmax, tuple.t_data))
{
ReleaseBuffer(buffer);
return NULL;
@@ -2742,8 +2741,7 @@ EvalPlanQualFetch(EState *estate, Relation relation, int lockmode,
/*
* As above, if xmin isn't what we're expecting, do nothing.
*/
- if (!TransactionIdEquals(HeapTupleHeaderGetXmin(tuple.t_data),
- priorXmax))
+ if (!HeapTupleUpdateXmaxMatchesXmin(priorXmax, tuple.t_data))
{
ReleaseBuffer(buffer);
return NULL;
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 4e41024e92..9f4367d704 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -146,6 +146,9 @@ extern void heap_get_latest_tid(Relation relation, Snapshot snapshot,
ItemPointer tid);
extern void setLastTid(const ItemPointer tid);
+extern bool HeapTupleUpdateXmaxMatchesXmin(TransactionId xmax,
+ HeapTupleHeader htup);
+
extern BulkInsertState GetBulkInsertState(void);
extern void FreeBulkInsertState(BulkInsertState);
extern void ReleaseBulkInsertStatePin(BulkInsertState bistate);
--
2.11.0
On Fri, Oct 6, 2017 at 7:57 PM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
Michael Paquier wrote:
On Fri, Oct 6, 2017 at 1:24 AM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
+/*
+ * Given a tuple, verify whether the given Xmax matches the tuple's Xmin,
+ * taking into account that the Xmin might have been frozen.
+ */
[...]
+	/*
+	 * We actually don't know if there's a match, but if the previous tuple
+	 * was frozen, we cannot really rely on a perfect match.
+	 */

I don't know what you had in mind here,
Impossible to know if I don't actually send the contents :)
but I tweaked the 9.3 version so that it now looks like this:
I wanted to mention that the comments could be reworked. And forgot to
suggest some.
/*
* HeapTupleUpdateXmaxMatchesXmin - verify update chain xmax/xmin lineage
*
* Given the new version of a tuple after some update, verify whether the
* given Xmax (corresponding to the previous version) matches the tuple's
* Xmin, taking into account that the Xmin might have been frozen after the
* update.
*/
bool
HeapTupleUpdateXmaxMatchesXmin(TransactionId xmax, HeapTupleHeader htup)
{
TransactionId xmin = HeapTupleHeaderGetXmin(htup);

/*
* If the xmax of the old tuple is identical to the xmin of the new one,
* it's a match.
*/
if (TransactionIdEquals(xmax, xmin))
return true;

/*
* When a tuple is frozen, the original Xmin is lost, but we know it's a
* committed transaction. So unless the Xmax is InvalidXid, we don't
* know for certain that there is a match, but there may be one; and we
* must return true so that a HOT chain that is half-frozen can be walked
* correctly.
*/
if (TransactionIdEquals(xmin, FrozenTransactionId) &&
TransactionIdIsValid(xmax))
return true;

return false;
}
Those are clearly improvements.
--
Michael
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Fri, Oct 6, 2017 at 10:18 PM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
Wood, Dan wrote:
Yes, I’ve been testing 9.6. I’ll try Alvaro’s patch today.
I would prefer to focus on either latest 9X or 11dev.
I tested my patch on 9.4 and 9.5 today and it seems to close the problem
(with the patch, I waited 10x as many iterations as it took for the
problem to occur ~10 times without the patch), but I can reproduce a
problem in 9.6 with my patch installed. There must be something new in
9.6 that is causing the problem to reappear.
The freeze visibility map has been introduced in 9.6... There could be
interactions on this side.
--
Michael
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Michael Paquier wrote:
On Fri, Oct 6, 2017 at 10:18 PM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
Wood, Dan wrote:
Yes, I’ve been testing 9.6. I’ll try Alvaro’s patch today.
I would prefer to focus on either latest 9X or 11dev.
I tested my patch on 9.4 and 9.5 today and it seems to close the problem
(with the patch, I waited 10x as many iterations as it took for the
problem to occur ~10 times without the patch), but I can reproduce a
problem in 9.6 with my patch installed. There must be something new in
9.6 that is causing the problem to reappear.

The freeze visibility map has been introduced in 9.6... There could be
interactions on this side.
Ah, thanks for the tip. I hope the authors of that can do the gruntwork
of researching this problem there, then. I'll go commit what I have
now.
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Fri, Oct 6, 2017 at 10:57 PM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
Ah, thanks for the tip. I hope the authors of that can do the gruntwork
of researching this problem there, then.
I have some stuff using 9.6 extensively, so like Dan I think I'll
chime in anyway. Not before Tuesday though, long weekend in Japan
ahead.
I'll go commit what I have now.
As far as I saw this set definitely improves the situation.
--
Michael
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Robert,
* Alvaro Herrera (alvherre@alvh.no-ip.org) wrote:
Michael Paquier wrote:
On Fri, Oct 6, 2017 at 10:18 PM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
Wood, Dan wrote:
Yes, I’ve been testing 9.6. I’ll try Alvaro’s patch today.
I would prefer to focus on either latest 9X or 11dev.
I tested my patch on 9.4 and 9.5 today and it seems to close the problem
(with the patch, I waited 10x as many iterations as it took for the
problem to occur ~10 times without the patch), but I can reproduce a
problem in 9.6 with my patch installed. There must be something new in
9.6 that is causing the problem to reappear.

The freeze visibility map has been introduced in 9.6... There could be
interactions on this side.

Ah, thanks for the tip. I hope the authors of that can do the gruntwork
of researching this problem there, then. I'll go commit what I have
now.
I don't doubt you're watching this thread too, but just to be 110% sure
that we don't end up with the November releases still having this issue,
I'm adding you to the CC on this thread as the one who did the freeze
visibility map work. Depending on hope here is a bit too squishy for me
when we're talking about corruption issues.
Thanks!
Stephen
By the way, I still wonder if there's any way for a new tuple to get
inserted in the place where a HOT redirect would be pointing to, and
have it be marked as Frozen, where the old redirect contains a
non-invalid Xmax. I tried to think of a way for that to happen, but
couldn't think of anything.
What I imagine is a sequence like this:
1. insert a tuple
2. HOT-update a tuple
3. prune the page, making lp 1 be a redirect (lp 2 is the new tuple)
4. start transaction
5. HOT-update the tuple again, creating HOT in lp 3
6. abort transaction (leaving aborted update in lp 3)
7. somehow remove tuple from lp 3, make slot available
8. new transaction comes along, inserts tuple in lp 3
9. somebody freezes tuple in lp3 (???)
Then we follow the HOT chain, see that Xmin=2 in lp3 and conclude that
the tuple is part of the chain because of an xid "match".
Basically from step 7 onwards I don't think this is possible, but maybe
I'm just blind.
Maybe we can forestall the problem by checking whether the Xmax
TransactionIdIsCurrentTransaction || TransactionIdDidCommit (or some
variant thereof). This would be very slow but safer; and in 9.4 and up
we'd only need to do it if the xmin value is actually FrozenXid which
should be rare (only in pages upgraded from 9.3).
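To make that concrete, a sketch of the kind of check I mean (hypothetical, not something proposed for commit as-is; the helper name is invented) could be:

    /* needs access/transam.h and access/xact.h for the two checks below */
    static bool
    XmaxIsCurrentOrCommitted(TransactionId xmax)
    {
        /* our own (sub)transaction counts as a live updater */
        if (TransactionIdIsCurrentTransactionId(xmax))
            return true;

        /* otherwise require that the updater actually committed */
        return TransactionIdDidCommit(xmax);
    }

and the FrozenTransactionId branch of HeapTupleUpdateXmaxMatchesXmin would then only return true when TransactionIdIsValid(xmax) && XmaxIsCurrentOrCommitted(xmax). The pg_clog lookup inside TransactionIdDidCommit is what would make this slow.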
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Fri, Oct 6, 2017 at 6:18 AM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
Wood, Dan wrote:
Yes, I’ve been testing 9.6. I’ll try Alvaro’s patch today.
I would prefer to focus on either latest 9X or 11dev.
I tested my patch on 9.4 and 9.5 today and it seems to close the problem
(with the patch, I waited 10x as many iterations as it took for the
problem to occur ~10 times without the patch), but I can reproduce a
problem in 9.6 with my patch installed. There must be something new in
9.6 that is causing the problem to reappear.
What problem persists? The original one (or, at least, the original
symptom of pruning HOT chains incorrectly)? If that's what you mean, I
wouldn't be so quick to assume that it's the freeze map.
--
Peter Geoghegan
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Peter Geoghegan wrote:
On Fri, Oct 6, 2017 at 6:18 AM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
Wood, Dan wrote:
Yes, I’ve been testing 9.6. I’ll try Alvaro’s patch today.
I would prefer to focus on either latest 9X or 11dev.
I tested my patch on 9.4 and 9.5 today and it seems to close the problem
(with the patch, I waited 10x as many iterations as it took for the
problem to occur ~10 times without the patch), but I can reproduce a
problem in 9.6 with my patch installed. There must be something new in
9.6 that is causing the problem to reappear.

What problem persists? The original one (or, at least, the original
symptom of pruning HOT chains incorrectly)? If that's what you mean, I
wouldn't be so quick to assume that it's the freeze map.
I can tell that, in 9.6, REINDEX still reports the error we saw in
earlier releases, after some of the runs of my reproducer scripts. I'm
unable to reproduce it anymore in 9.3 to 9.5. I can't see the one Dan
originally reported anywhere, either.
I don't know if it's really the freeze map at fault or something else.
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Fri, Oct 6, 2017 at 7:59 AM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
By the way, I still wonder if there's any way for a new tuple to get
inserted in the place where a HOT redirect would be pointing to, and
have it be marked as Frozen, where the old redirect contains a
non-invalid Xmax. I tried to think of a way for that to happen, but
couldn't think of anything.

What I imagine is a sequence like this:
1. insert a tuple
2. HOT-update a tuple
3. prune the page, making lp 1 be a redirect (lp 2 is the new tuple)
4. start transaction
5. HOT-update the tuple again, creating HOT in lp 3
6. abort transaction (leaving aborted update in lp 3)
7. somehow remove tuple from lp 3, make slot available
8. new transaction comes along, inserts tuple in lp 3
9. somebody freezes tuple in lp3 (???)

Then we follow the HOT chain, see that Xmin=2 in lp3 and conclude that
the tuple is part of the chain because of an xid "match".

Basically from step 7 onwards I don't think this is possible, but maybe
I'm just blind.
For the record, I also think that this is impossible, in part because
pruning requires a cleanup buffer lock (and because HOT chains cannot
span pages). I wouldn't say that I am 100% confident about this,
though.
BTW, is this comment block that appears above
heap_prepare_freeze_tuple() now obsolete, following 20b65522 (and
maybe much earlier commits)?
* NB: It is not enough to set hint bits to indicate something is
* committed/invalid -- they might not be set on a standby, or after crash
* recovery. We really need to remove old xids.
*/
We WAL-log setting hint bits during freezing now, iff tuple xmin is
before the Xid cutoff and tuple is a heap-only tuple.
--
Peter Geoghegan
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Fri, Oct 6, 2017 at 10:49 AM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
I can tell that, in 9.6, REINDEX still reports the error we saw in
earlier releases, after some of the runs of my reproducer scripts. I'm
unable to reproduce it anymore in 9.3 to 9.5. I can't see the one Dan
originally reported anywhere, either.
You mean the enhanced stress-test that varied fillfactor, added filler
columns, and so on [1]? Can you post that to the list, please? I think
that several of us would like to have a reproducible test case.
I don't know if it's really the freeze map at fault or something else.
Ideally, it would be possible to effectively disable the new freeze
map stuff in a minimal way, for testing purposes. Perhaps the authors
of that patch, CC'd, can suggest a way to do that.
If I had to guess, I'd say that it's just as likely that the issue is
only reproducible on 9.6 because of the enhancements added in that
release that improved buffer pinning (the use of atomic ops to pin
buffers, moving buffer content locks into buffer descriptors, etc). It
was already a bit tricky to get the problem that remained after
20b6552 but before today's a5736bf to reproduce with Dan's script. It
often took me 4 or 5 attempts. (I wonder what it looks like with your
enhanced version of that script -- the one that I just asked about.)
It seems possible that we've merely reduced the window for the race to
the point that it's practically (though not theoretically) impossible
to reproduce the problem on versions < 9.6, though not on 9.6+.
Applying Occam's razor, the problem doesn't seem particularly likely
to be in the freeze map stuff, which isn't actually all that closely
related.
[1] /messages/by-id/20171005162402.jahqflf3mekileqm@alvherre.pgsql
--
Peter Geoghegan
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Fri, Oct 6, 2017 at 11:34 AM, Peter Geoghegan <pg@bowt.ie> wrote:
I don't know if it's really the freeze map at fault or something else.
Ideally, it would be possible to effectively disable the new freeze
map stuff in a minimal way, for testing purposes. Perhaps the authors
of that patch, CC'd, can suggest a way to do that.
Actually, the simplest thing might be to just use pg_visibility's
pg_check_frozen() to check that the visibility/freeze map accurately
summarizes the all-frozen status of tuples in the heap. If that
doesn't indicate that there is corruption, we can be fairly confident
that the problem is elsewhere. The metadata in the visibility/freeze
map should be accurate when a bit is set to indicate that an entire
heap page is all-frozen (or, separately, all-visible). We can hardly
expect it to have better information that the authoritative source of
truth, the heap itself.
The more I think about it, the more I tend to doubt that the remaining
problems are with the freeze map. If the freeze map was wrong, and
incorrectly said that a page was all-frozen, then surely the outward
symptoms would take a long time to show up, as they always do when we
accidentally fail to freeze a tuple before a relfrozenxid cutoff. ISTM
that that's the only meaningful way that the freeze map can be wrong
-- it only promises to be accurate when it says that no further
freezing is needed for a page/bit.
--
Peter Geoghegan
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Fri, Oct 6, 2017 at 11:34 AM, Peter Geoghegan <pg@bowt.ie> wrote:
I don't know if it's really the freeze map at fault or something else.
Ideally, it would be possible to effectively disable the new freeze
map stuff in a minimal way, for testing purposes. Perhaps the authors of that patch, CC'd, can suggest a way to do that.
Actually, the simplest thing might be to just use pg_visibility's
pg_check_frozen() to check that the visibility/freeze map accurately
summarizes the all-frozen status of tuples in the heap. If that
doesn't indicate that there is corruption, we can be fairly confident
that the problem is elsewhere. The metadata in the visibility/freeze
map should be accurate when a bit is set to indicate that an entire
heap page is all-frozen (or, separately, all-visible). We can hardly
expect it to have better information than the authoritative source of
truth, the heap itself.
The more I think about it, the more I tend to doubt that the remaining
problems are with the freeze map. If the freeze map was wrong, and
incorrectly said that a page was all-frozen, then surely the outward
symptoms would take a long time to show up, as they always do when we
accidentally fail to freeze a tuple before a relfrozenxid cutoff. ISTM
that that's the only meaningful way that the freeze map can be wrong
-- it only promises to be accurate when it says that no further
freezing is needed for a page/bit.
Yesterday, I've been spending time with pg_visibility on the pages when I reproduce the issue in 9.6.
None of the all-frozen or all-visible bits are necessarily set in problematic pages.
ERROR: failed to find parent tuple for heap-only tuple at (0,182) in table "accounts"
blkno | all_visible | all_frozen | pd_all_visible
-------+-------------+------------+----------------
0 | f | f | f
(1 row)
Even when the bits were set, I haven't found issues with the pg_check_xxx functions in the dozens of times I've run them.
postgres=# select * from pg_visibility('accounts'::regclass);
blkno | all_visible | all_frozen | pd_all_visible
-------+-------------+------------+----------------
0 | f | f | f
1 | t | t | t
(2 rows)
postgres=# select pg_check_visible('accounts'::regclass);
pg_check_visible
------------------
(0 rows)
postgres=# select pg_check_frozen('accounts'::regclass);
pg_check_frozen
-----------------
(0 rows)
Yi Wen
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Fri, Oct 6, 2017 at 2:09 PM, Wong, Yi Wen <yiwong@amazon.com> wrote:
Yesterday, I've been spending time with pg_visibility on the pages when I reproduce the issue in 9.6.
None of the all-frozen or all-visible bits are necessarily set in problematic pages.
Since this happened yesterday, I assume it was with an unfixed version?
As you must have seen, Alvaro said he has a variant of Dan's original
script that demonstrates that a problem remains, at least on 9.6+,
even with today's fix. I think it's the stress-test that plays with
fillfactor, many clients, etc [1].
I've tried to independently reproduce the problem on the master
branch's current tip, with today's new fix, but cannot break things
despite trying many variations. I cannot reproduce the problem that
Alvaro still sees.
I'll have to wait until Alvaro posts his repro to the list before
commenting further, which I assume he'll post as soon as he can. There
doesn't seem to be much point in not waiting for that.
[1] /messages/by-id/20171005162402.jahqflf3mekileqm@alvherre.pgsql
--
Peter Geoghegan
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Peter Geoghegan wrote:
On Fri, Oct 6, 2017 at 2:09 PM, Wong, Yi Wen <yiwong@amazon.com> wrote:
Yesterday, I've been spending time with pg_visibility on the pages when I reproduce the issue in 9.6.
None of the all-frozen or all-visible bits are necessarily set in problematic pages.

Since this happened yesterday, I assume it was with an unfixed version?
As you must have seen, Alvaro said he has a variant of Dan's original
script that demonstrates that a problem remains, at least on 9.6+,
even with today's fix. I think it's the stress-test that plays with
fillfactor, many clients, etc [1].
I just execute setup.sql once and then run this shell command,
while :; do
psql -e -P pager=off -f ./repro.sql
for i in `seq 1 5`; do
psql -P pager=off -e --no-psqlrc -f ./lock.sql &
done
wait && psql -P pager=off -e --no-psqlrc -f ./reindex.sql
psql -P pager=off -e --no-psqlrc -f ./report.sql
echo "done"
done
Note that you need to use pg10's psql because of the \if lines in
lock.sql. For some runs I change the values to compare random() to, and
originally the commented out section in lock.sql was not commented out,
but I'm fairly sure the failures I saw were with this version. Also, I
sometimes change the 5 in the `seq` command to higher values (180, 250).
I didn't find the filler column to have any effect, so I took that out.
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Sat, Oct 7, 2017 at 1:31 AM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
As you must have seen, Alvaro said he has a variant of Dan's original
script that demonstrates that a problem remains, at least on 9.6+,
even with today's fix. I think it's the stress-test that plays with
fillfactor, many clients, etc [1].

I just execute setup.sql once and then run this shell command,
while :; do
psql -e -P pager=off -f ./repro.sql
for i in `seq 1 5`; do
psql -P pager=off -e --no-psqlrc -f ./lock.sql &
done
wait && psql -P pager=off -e --no-psqlrc -f ./reindex.sql
psql -P pager=off -e --no-psqlrc -f ./report.sql
echo "done"
done
I cannot reproduce the problem on my personal machine using this
script/stress-test. I tried to do so on the master branch git tip.
This reinforces the theory that there is some timing sensitivity,
because the remaining race condition is very narrow.
--
Peter Geoghegan
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Peter Geoghegan wrote:
On Sat, Oct 7, 2017 at 1:31 AM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
As you must have seen, Alvaro said he has a variant of Dan's original
script that demonstrates that a problem remains, at least on 9.6+,
even with today's fix. I think it's the stress-test that plays with
fillfactor, many clients, etc [1].

I just execute setup.sql once and then run this shell command,
while :; do
psql -e -P pager=off -f ./repro.sql
for i in `seq 1 5`; do
psql -P pager=off -e --no-psqlrc -f ./lock.sql &
done
wait && psql -P pager=off -e --no-psqlrc -f ./reindex.sql
psql -P pager=off -e --no-psqlrc -f ./report.sql
echo "done"
done

I cannot reproduce the problem on my personal machine using this
script/stress-test. I tried to do so on the master branch git tip.
This reinforces the theory that there is some timing sensitivity,
because the remaining race condition is very narrow.
Hmm, I think I added a random sleep (max. 100ms) right after the
HeapTupleSatisfiesVacuum call in vacuumlazy.c (lazy_scan_heap), and that
makes the race easier to hit.
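Roughly like this, in case it helps anyone reproduce -- a debug-only sketch from memory, not anything committed (the #ifdef guard is made up):

    /* in lazy_scan_heap(), right after the HeapTupleSatisfiesVacuum() call */
    #ifdef DEBUG_WIDEN_FREEZE_RACE
        /* sleep up to ~100ms so a concurrent commit/abort can flip the tuple */
        pg_usleep(random() % 100000L);
    #endif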
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Sat, Oct 7, 2017 at 4:25 PM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
Hmm, I think I added a random sleep (max. 100ms) right after the
HeapTupleSatisfiesVacuum call in vacuumlazy.c (lazy_scan_heap), and that
makes the race easier to hit.
I still cannot reproduce. Perhaps you can be more specific?
--
Peter Geoghegan
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
I’m unclear on what is being repro’d in 9.6. Are you getting the duplicate rows problem or just the reindex problem? Are you testing with asserts enabled (I’m not)?
If you are getting the dup rows consider the code in the block in heapam.c that starts with the comment “replace multi by update xid”.
When I repro this I find that MultiXactIdGetUpdateXid() returns 0. There is an updater in the multixact array; however, the status is MultiXactStatusForNoKeyUpdate and not MultiXactStatusNoKeyUpdate. I assume this is a preliminary status before the following row in the HOT chain has its multixact set to NoKeyUpdate.
Since a 0 is returned this does precede cutoff_xid and TransactionIdDidCommit(0) will return false. This ends up aborting the multixact on the row even though the real xid is committed. This sets XMAX to 0 and that row becomes visible as one of the dups. Interestingly the real xid of the updater is 122944 and the cutoff_xid is 122945.
I’m still debugging but I start late so I’m passing this incomplete info along now.
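Purely as an illustration of where this could be caught (a hypothetical diagnostic, variable names from memory, not a proposed fix), something in that “replace multi by update xid” block along the lines of:

    update_xid = MultiXactIdGetUpdateXid(multi, t_infomask);
    if (!TransactionIdIsValid(update_xid))
        elog(ERROR, "no update xid found in multixact %u", multi);

would at least turn the silent "treat 0 as an aborted xid older than cutoff_xid" behavior into a visible error while debugging.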
On 10/7/17, 4:25 PM, "Alvaro Herrera" <alvherre@alvh.no-ip.org> wrote:
Peter Geoghegan wrote:
On Sat, Oct 7, 2017 at 1:31 AM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
As you must have seen, Alvaro said he has a variant of Dan's original
script that demonstrates that a problem remains, at least on 9.6+,
even with today's fix. I think it's the stress-test that plays with
fillfactor, many clients, etc [1].I just execute setup.sql once and then run this shell command,
while :; do
psql -e -P pager=off -f ./repro.sql
for i in `seq 1 5`; do
psql -P pager=off -e --no-psqlrc -f ./lock.sql &
done
wait && psql -P pager=off -e --no-psqlrc -f ./reindex.sql
psql -P pager=off -e --no-psqlrc -f ./report.sql
echo "done"
done

I cannot reproduce the problem on my personal machine using this
script/stress-test. I tried to do so on the master branch git tip.
This reinforces the theory that there is some timing sensitivity,
because the remaining race condition is very narrow.
Hmm, I think I added a random sleep (max. 100ms) right after the
HeapTupleSatisfiesVacuum call in vacuumlazy.c (lazy_scan_heap), and that
makes the race easier to hit.
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Mon, Oct 9, 2017 at 2:29 AM, Peter Geoghegan <pg@bowt.ie> wrote:
On Sat, Oct 7, 2017 at 4:25 PM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
Hmm, I think I added a random sleep (max. 100ms) right after the
HeapTupleSatisfiesVacuum call in vacuumlazy.c (lazy_scan_heap), and that
makes the race easier to hit.

I still cannot reproduce. Perhaps you can be more specific?
I have been trying to reproduce things for a total of 4 hours, testing
various variations of the proposed test cases (killed Postgres,
changed fillfactor, manual sleep calls), but I am proving unable to
see a regression as well. I would not think that the OS matters here,
all my attempts were on macos with assertions and debugging enabled.
At least the code is now more stable, which is definitely a good thing.
--
Michael
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Wood, Dan wrote:
I’m unclear on what is being repro’d in 9.6. Are you getting the
duplicate rows problem or just the reindex problem? Are you testing
with asserts enabled (I’m not)?
I was seeing just the reindex problem. I don't see any more dups.
But I've tried to reproduce it afresh now, and let it run for a long
time and nothing happened. Maybe I made a mistake last week and
ran an unfixed version. I don't see any more problems now.
If you are getting the dup rows consider the code in the block in
heapam.c that starts with the comment “replace multi by update xid”.

When I repro this I find that MultiXactIdGetUpdateXid() returns 0.
There is an updater in the multixact array however the status is
MultiXactStatusForNoKeyUpdate and not MultiXactStatusNoKeyUpdate. I
assume this is a preliminary status before the following row in the
HOT chain has its multixact set to NoKeyUpdate.
Yes, the "For" version is the locker version rather than the actual
update. That lock is acquired by EvalPlanQual locking the row just
before doing the update. I think GetUpdateXid has no reason to return
such an Xid, since it's not an update.
Since a 0 is returned this does precede cutoff_xid and
TransactionIdDidCommit(0) will return false. This ends up aborting
the multixact on the row even though the real xid is committed. This
sets XMAX to 0 and that row becomes visible as one of the dups.
Interestingly the real xid of the updater is 122944 and the cutoff_xid
is 122945.
I haven't seen this effect. Please keep us updated if you're able to
verify corruption this way.
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Tue, Oct 10, 2017 at 11:14 PM, Alvaro Herrera
<alvherre@alvh.no-ip.org> wrote:
I was seeing just the reindex problem. I don't see any more dups.
But I've tried to reproduce it afresh now, and let it run for a long
time and nothing happened. Maybe I made a mistake last week and
ran an unfixed version. I don't see any more problems now.
Okay, so that's one more person confirming this trend, making three with
Peter and me.
If you are getting the dup rows consider the code in the block in
heapam.c that starts with the comment “replace multi by update xid”.

When I repro this I find that MultiXactIdGetUpdateXid() returns 0.
There is an updater in the multixact array however the status is
MultiXactStatusForNoKeyUpdate and not MultiXactStatusNoKeyUpdate. I
assume this is a preliminary status before the following row in the
HOT chain has its multixact set to NoKeyUpdate.

Yes, the "For" version is the locker version rather than the actual
update. That lock is acquired by EvalPlanQual locking the row just
before doing the update. I think GetUpdateXid has no reason to return
such an Xid, since it's not an update.
Since a 0 is returned this does precede cutoff_xid and
TransactionIdDidCommit(0) will return false. This ends up aborting
the multixact on the row even though the real xid is committed. This
sets XMAX to 0 and that row becomes visible as one of the dups.
Interestingly the real xid of the updater is 122944 and the cutoff_xid
is 122945.
I haven't seen this effect. Please keep us updated if you're able to
verify corruption this way.
Me neither. It would be nice to not live long with such a sword of Damocles.
--
Michael
I found one glitch with our merge of the original dup row fix. With that corrected AND Alvaro’s Friday fix things are solid.
No dup’s. No index corruption.
Thanks so much.
On Wed, Oct 11, 2017 at 11:31 AM, Wood, Dan <hexpert@amazon.com> wrote:
I found one glitch with our merge of the original dup row fix. With that corrected AND Alvaro’s Friday fix things are solid.
No dup’s. No index corruption.
Thanks so much.
Nice to hear that! You guys seem to be doing extensive testing and
actually report back about it, which is really nice to see.
--
Michael
Hi,
On 2017-09-28 14:47:53 +0000, Alvaro Herrera wrote:
Fix freezing of a dead HOT-updated tuple
Vacuum calls page-level HOT prune to remove dead HOT tuples before doing
liveness checks (HeapTupleSatisfiesVacuum) on the remaining tuples. But
concurrent transaction commit/abort may turn DEAD some of the HOT tuples
that survived the prune, before HeapTupleSatisfiesVacuum tests them.
This happens to activate the code that decides to freeze the tuple ...
which resuscitates it, duplicating data.
(This is especially bad if there's any unique constraints, because those
are now internally violated due to the duplicate entries, though you
won't know until you try to REINDEX or dump/restore the table.)
One possible fix would be to simply skip doing anything to the tuple,
and hope that the next HOT prune would remove it. But there is a
problem: if the tuple is older than freeze horizon, this would leave an
unfrozen XID behind, and if no HOT prune happens to clean it up before
the containing pg_clog segment is truncated away, it'd later cause an
error when the XID is looked up.
Fix the problem by having the tuple freezing routines cope with the
situation: don't freeze the tuple (and keep it dead). In the cases that
the XID is older than the freeze age, set the HEAP_XMAX_COMMITTED flag
so that there is no need to look up the XID in pg_clog later on.
I think this is the wrong fix - the assumption that ctid chains can be
validated based on the prev-xmax = cur-xmin is fairly ingrained into the
system, and we shouldn't just be breaking it. The need to later
lobotomize the checks, in a5736bf754, is some evidence of that.
I spent some time discussing this with Robert today (with both of us
alternating between feeling the other and ourselves as stupid), and the
conclusion I think is that the problem is on the pruning, rather than
the freezing side.
FWIW, I don't think the explanation in the commit message of how the
problem triggers is actually correct - the testcase you added doesn't
have any transactions concurrently committing / aborting when
crashing. Rather, the problem is that the liveness checks for freezing
are different from the ones for pruning. HTSV considers xmin
RECENTLY_DEAD when there's a multi xmax with at least one alive locker,
whereas pruning thinks it has to do something because there's the member
xid below the cutoff. No concurrent activity is needed to trigger that.
I think the problem is on the pruning, rather than the freezing side. We
can't freeze a tuple if it has an alive predecessor - rather than
weakening this, we should be fixing the pruning to not have the alive
predecessor.
The relevant logic in HTSV is:
if (tuple->t_infomask & HEAP_XMAX_IS_MULTI)
{
TransactionId xmax;
if (MultiXactIdIsRunning(HeapTupleHeaderGetRawXmax(tuple), false))
{
/* already checked above */
Assert(!HEAP_XMAX_IS_LOCKED_ONLY(tuple->t_infomask));
xmax = HeapTupleGetUpdateXid(tuple);
/* not LOCKED_ONLY, so it has to have an xmax */
Assert(TransactionIdIsValid(xmax));
if (TransactionIdIsInProgress(xmax))
return HEAPTUPLE_DELETE_IN_PROGRESS;
else if (TransactionIdDidCommit(xmax))
/* there are still lockers around -- can't return DEAD here */
return HEAPTUPLE_RECENTLY_DEAD;
/* updating transaction aborted */
return HEAPTUPLE_LIVE;
with the problematic branch being the TransactionIdDidCommit()
case. Rather than unconditionally returning HEAPTUPLE_RECENTLY_DEAD, we
should test the update xid against OldestXmin and return DEAD /
RECENTLY_DEAD according to that.
If the update xmin is actually below the cutoff we can remove the tuple
even if there's live lockers - the lockers will also be present in the
newer version of the tuple. I verified that for me that fixes the
problem. Obviously that'd require some comment work and more careful
diagnosis.
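As a minimal sketch of that idea (the real patch appears later in this thread; this fragment only illustrates the DEAD vs. RECENTLY_DEAD decision for a committed updater inside the HEAP_XMAX_IS_MULTI branch of HeapTupleSatisfiesVacuum(), assuming the locked-only case has already been handled):

	xmax = HeapTupleGetUpdateXid(tuple);
	/* not LOCKED_ONLY, so it has to have an xmax */
	Assert(TransactionIdIsValid(xmax));

	if (TransactionIdIsInProgress(xmax))
		return HEAPTUPLE_DELETE_IN_PROGRESS;
	else if (TransactionIdDidCommit(xmax))
	{
		/*
		 * Lockers may still be running, but the updater decides deadness:
		 * if the update xid is below OldestXmin the tuple can be pruned,
		 * because the lockers are also present in the newer tuple version.
		 */
		if (!TransactionIdPrecedes(xmax, OldestXmin))
			return HEAPTUPLE_RECENTLY_DEAD;
		return HEAPTUPLE_DEAD;
	}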
I think a5736bf754c82d8b86674e199e232096c679201d might be dangerous in
the face of previously corrupted tuple chains and pg_upgraded clusters -
it can lead to tuples being considered related, even though they're
from entirely independent hot chains. Especially when upgrading 9.3 post
your fix, to current releases.
In short, I think the two commits should be reverted, and replaced with
a fix along what I'm outlining above.
There'll be some trouble for people that upgraded to an unreleased
version, but I don't really see what we could do about that.
I could be entirely wrong - I've been travelling for the last two weeks
and my brain is somewhat more fried than usual.
Regards,
Andres
On Thu, Nov 2, 2017 at 4:50 PM, Andres Freund <andres@anarazel.de> wrote:
I think a5736bf754c82d8b86674e199e232096c679201d might be dangerous in
the face of previously corrupted tuple chains and pg_upgraded clusters -
it can lead to tuples being considered related, even though they're
from entirely independent hot chains. Especially when upgrading 9.3 post
your fix, to current releases.
I think this is a key point. If the new behavior were merely not
entirely correct, we could perhaps refine it later. But it's not only
not correct - it actually has the potential to create new problems
that didn't exist before those commits. And if we release without
reverting those commits then we can't change our mind later.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Andres Freund wrote:
I spent some time discussing this with Robert today (with both of us
alternating between feeling the other and ourselves as stupid), and the
conclusion I think is that the problem is on the pruning, rather than
the freezing side.
Thanks both for spending some more time on this.
I think the problem is on the pruning, rather than the freezing side. We
can't freeze a tuple if it has an alive predecessor - rather than
weakening this, we should be fixing the pruning to not have the alive
predecessor.
I gave a look at HTSV back then, but I didn't find what the right tweak
was, but then I only tried changing the return value to DEAD and
DELETE_IN_PROGRESS; the thought of selecting DEAD or RECENTLY_DEAD based
on OldestXmin didn't occur to me ... I was thinking that the fact that
there were live lockers meant that the tuple could not be removed,
obviously failing to notice that the subsequent versions of the tuple
would be good enough.
If the update xmin is actually below the cutoff we can remove the tuple
even if there's live lockers - the lockers will also be present in the
newer version of the tuple. I verified that for me that fixes the
problem. Obviously that'd require some comment work and more careful
diagnosis.
Sounds good.
I'll revert those commits then, keeping the test, and then you can
commit your change. OK?
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Hi,
On 2017-11-02 13:49:47 +0100, Alvaro Herrera wrote:
Andres Freund wrote:
I think the problem is on the pruning, rather than the freezing side. We
can't freeze a tuple if it has an alive predecessor - rather than
weakening this, we should be fixing the pruning to not have the alive
predecessor.
I gave a look at HTSV back then, but I didn't find what the right tweak
was, but then I only tried changing the return value to DEAD and
DELETE_IN_PROGRESS; the thought of selecting DEAD or RECENTLY_DEAD based
on OldestXmin didn't occur to me ... I was thinking that the fact that
there were live lockers meant that the tuple could not be removed,
obviously failing to notice that the subsequent versions of the tuple
would be good enough.
I'll try to write up a commit based on that idea. I think there's some
comment work needed too, Robert and I were both confused by a few
things.
I'm unfortunately travelling atm - it's evening here, and I'll be flying
back to the US all Saturday. I'm fairly sure I'll be able to come up
with a decent patch tomorrow, but I'll need review and testing by
others.
If the update xmin is actually below the cutoff we can remove the tuple
even if there's live lockers - the lockers will also be present in the
newer version of the tuple. I verified that for me that fixes the
problem. Obviously that'd require some comment work and more careful
diagnosis.
Sounds good.
I'll revert those commits then, keeping the test, and then you can
commit your change. OK?
Generally that sounds good - but you can't keep the testcase in without
the new fix - the buildfarm would immediately turn red. I guess the best
thing would be to temporarily remove it from the schedule?
Do we care about people upgrading to unreleased versions? We could do
nothing, document it in the release notes, or ???
Greetings,
Andres Freund
Andres Freund <andres@anarazel.de> writes:
Do we care about people upgrading to unreleased versions? We could do
nothing, document it in the release notes, or ???
Do nothing.
regards, tom lane
* Tom Lane (tgl@sss.pgh.pa.us) wrote:
Andres Freund <andres@anarazel.de> writes:
Do we care about people upgrading to unreleased versions? We could do
nothing, document it in the release notes, or ???
Do nothing.
Agreed. Not much we can do there.
Thanks!
Stephen
Stephen Frost wrote:
* Tom Lane (tgl@sss.pgh.pa.us) wrote:
Andres Freund <andres@anarazel.de> writes:
Do we care about people upgrading to unreleased versions? We could do
nothing, document it in the release notes, or ???
Do nothing.
Agreed. Not much we can do there.
Pushed the reverts.
I noticed while doing so that REL_10_STABLE contains the bogus commits.
Does that change our opinion regarding what to do for people upgrading
to a version containing the broken commits? I don't think so, because
1) we hope that not many people will trust their data to 10.0
immediately after release
2) the bug is very low probability
3) it doesn't look like we can do a lot about it anyway.
I'll experiment with Andres' proposed fix now.
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Thu, Nov 2, 2017 at 8:25 PM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
Pushed the reverts.
I noticed while doing so that REL_10_STABLE contains the bogus commits.
Does that change our opinion regarding what to do for people upgrading
to a version containing the broken commits? I don't think so, because
1) we hope that not many people will trust their data to 10.0
immediately after release
2) the bug is very low probability
3) it doesn't look like we can do a lot about it anyway.
Just to be clear, it looks like "Fix freezing of a dead HOT-updated
tuple" (46c35116ae1acc8826705ef2a7b5d9110f9d6e84) went in before 10.0
was stamped, but "Fix traversal of half-frozen update chains"
(22576734b805fb0952f9be841ca8f643694ee868) went in afterwards and is
therefore unreleased at present.
Users of 10.0 who hit the code introduced by
46c35116ae1acc8826705ef2a7b5d9110f9d6e84 will have XIDs stored in the
xmax fields of tuples that predate relfrozenxid. Those tuples will be
hinted-committed. That's not good, but it might not really have much
in the way of consequences. *IF* the next VACUUM doesn't get confused
by the old XID, then it will prune the tuple then and I think we'll be
OK. And I think it won't, because it should just call
HeapTupleSatisfiesVacuum() and that should see that
HEAP_XMAX_COMMITTED is set and not actually try to consult the old
CLOG. If that hint bit can ever get lost - or fail to propagate to a
standby - then we have more trouble, but the fact that it's set by a
logged operation makes me hope that can't happen. Furthermore, that
follow-on VACUUM should indeed arrive in due time, because we will not
have marked the page all-visible -- HeapTupleSatisfiesVacuum() will
NOT have returned HEAPTUPLE_LIVE when called from lazy_scan_heap(),
and therefore we will have set all_visible = false.
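(For context, this is roughly what the non-multixact xmax path of HeapTupleSatisfiesVacuum() does -- a paraphrase rather than an exact quote of tqual.c: once HEAP_XMAX_COMMITTED is set, classification needs only OldestXmin and never consults clog.)

	if (!(tuple->t_infomask & HEAP_XMAX_COMMITTED))
	{
		if (TransactionIdIsInProgress(HeapTupleHeaderGetRawXmax(tuple)))
			return HEAPTUPLE_DELETE_IN_PROGRESS;
		else if (TransactionIdDidCommit(HeapTupleHeaderGetRawXmax(tuple)))
			SetHintBits(tuple, buffer, HEAP_XMAX_COMMITTED,
						HeapTupleHeaderGetRawXmax(tuple));
		else
		{
			/* deleter aborted or crashed */
			SetHintBits(tuple, buffer, HEAP_XMAX_INVALID,
						InvalidTransactionId);
			return HEAPTUPLE_LIVE;
		}
	}

	/* hinted committed: the decision depends only on OldestXmin, not clog */
	if (!TransactionIdPrecedes(HeapTupleHeaderGetRawXmax(tuple), OldestXmin))
		return HEAPTUPLE_RECENTLY_DEAD;
	return HEAPTUPLE_DEAD;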
The second commit (22576734b805fb0952f9be841ca8f643694ee868) is where
I think things get a lot more dangerous. The problem (as Andres
pointed out to me this afternoon) is that it seems possible to end up
with a situation where there should be two HOT chains on a page, and
because of the weakened xmin/xmax checking rules, we end up thinking
that the second one is a continuation of the first one, which will be
all kinds of bad news. That would be particularly likely to happen in
releases from before we invented HEAP_XMIN_FROZEN, when there's no
xmin/xmax matching at all, but could happen on later releases if we
are extraordinarily unlucky (i.e. xmin of the first tuple in the
second chain happens to be the same as the pre-freeze xmax in the old
chain, probably because the same XID was used to update the page in
two consecutive epochs). Fortunately, that commit is (I think) not
released anywhere.
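(For reference, the chain-validation rule being discussed looks roughly like this when walking a HOT chain -- a paraphrase of the pattern in heap_get_root_tuples() and friends, not an exact quote. It is this xmin/xmax equality check that the second commit weakened for frozen tuples.)

	/* check that the chain is intact before following t_ctid */
	if (TransactionIdIsValid(priorXmax) &&
		!TransactionIdEquals(priorXmax, HeapTupleHeaderGetXmin(htup)))
		break;					/* not part of the same update chain */

	/* remember this tuple's update xid before moving to the next member */
	priorXmax = HeapTupleHeaderGetUpdateXid(htup);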
Personally, I think it would be best to push the release out a week.
I think we understand this well enough now that we can fix it
relatively easily, but haste makes bugs, and (I know you're all tired
of hearing me say this) patches that implicate the on-disk format are
scary.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Thu, Nov 2, 2017 at 4:20 AM, Andres Freund <andres@anarazel.de> wrote:
I think the problem is on the pruning, rather than the freezing side. We
can't freeze a tuple if it has an alive predecessor - rather than
weakining this, we should be fixing the pruning to not have the alive
predecessor.
Excellent catch.
If the update xmin is actually below the cutoff we can remove the tuple
even if there's live lockers - the lockers will also be present in the
newer version of the tuple. I verified that for me that fixes the
problem. Obviously that'd require some comment work and more careful
diagnosis.
I didn't even know that that was safe.
I think a5736bf754c82d8b86674e199e232096c679201d might be dangerous in
the face of previously corrupted tuple chains and pg_upgraded clusters -
it can lead to tuples being considered related, even though they're
from entirely independent hot chains. Especially when upgrading 9.3 post
your fix, to current releases.
Frankly, I'm relieved that you got to this. I was highly suspicious of
a5736bf754c82d8b86674e199e232096c679201d, even beyond my specific,
actionable concern about how it failed to handle the
9.3/FrozenTransactionId xmin case as special. As I went into in the
"heap/SLRU verification, relfrozenxid cut-off, and freeze-the-dead
bug" thread, these commits left us with a situation where there didn't
seem to be a reliable way of knowing whether or not it is safe to
interrogate clog for a given heap tuple using a tool like amcheck.
And, it wasn't obvious that you couldn't have a codepath that failed
to account for pre-cutoff non-frozen tuples -- codepaths that call
TransactionIdDidCommit() despite it actually being unsafe.
If I'm not mistaken, your proposed fix restores sanity there.
--
Peter Geoghegan
On Thu, Nov 2, 2017 at 9:44 AM, Robert Haas <robertmhaas@gmail.com> wrote:
The second commit (22576734b805fb0952f9be841ca8f643694ee868) is where
I think things get a lot more dangerous. The problem (as Andres
pointed out to me this afternoon) is that it seems possible to end up
with a situation where there should be two HOT chains on a page, and
because of the weakened xmin/xmax checking rules, we end up thinking
that the second one is a continuation of the first one, which will be
all kinds of bad news. That would be particularly likely to happen in
releases from before we invented HEAP_XMIN_FROZEN, when there's no
xmin/xmax matching at all, but could happen on later releases if we
are extraordinarily unlucky (i.e. xmin of the first tuple in the
second chain happens to be the same as the pre-freeze xmax in the old
chain, probably because the same XID was used to update the page in
two consecutive epochs). Fortunately, that commit is (I think) not
released anywhere.
FWIW, if you look at the second commit
(22576734b805fb0952f9be841ca8f643694ee868) carefully, you'll realize
that it doesn't even treat those two cases differently. It was buggy
even on its own terms. The FrozenTransactionId test used an xmin from
HeapTupleHeaderGetXmin(), not HeapTupleHeaderGetRawXmin().
--
Peter Geoghegan
Robert Haas <robertmhaas@gmail.com> writes:
Just to be clear, it looks like "Fix freezing of a dead HOT-updated
tuple" (46c35116ae1acc8826705ef2a7b5d9110f9d6e84) went in before 10.0
was stamped, but "Fix traversal of half-frozen update chains"
(22576734b805fb0952f9be841ca8f643694ee868) went in afterwards and is
therefore unreleased at present.
Thanks for doing this analysis of the actual effects in 10.0.
Personally, I think it would be best to push the release out a week.
I would only be in favor of that if there were some reason to think that
the bug is worse now than it's been in the four years since 9.3 was
released. Otherwise, we should ship the bug fixes we have on-schedule.
I think it's a very very safe bet that there are other data-loss-causing
bugs in there, so I see no good reason for panicking over this one.
... and (I know you're all tired of hearing me say this) patches that
implicate the on-disk format are scary.
Agreed, but how is that relevant now that the bogus patches are reverted?
regards, tom lane
On Thu, Nov 2, 2017 at 10:26 PM, Peter Geoghegan <pg@bowt.ie> wrote:
On Thu, Nov 2, 2017 at 9:44 AM, Robert Haas <robertmhaas@gmail.com> wrote:
The second commit (22576734b805fb0952f9be841ca8f643694ee868) is where
I think things get a lot more dangerous. The problem (as Andres
pointed out to me this afternoon) is that it seems possible to end up
with a situation where there should be two HOT chains on a page, and
because of the weakened xmin/xmax checking rules, we end up thinking
that the second one is a continuation of the first one, which will be
all kinds of bad news. That would be particularly likely to happen in
releases from before we invented HEAP_XMIN_FROZEN, when there's no
xmin/xmax matching at all, but could happen on later releases if we
are extraordinarily unlucky (i.e. xmin of the first tuple in the
second chain happens to be the same as the pre-freeze xmax in the old
chain, probably because the same XID was used to update the page in
two consecutive epochs). Fortunately, that commit is (I think) not
released anywhere.
FWIW, if you look at the second commit
(22576734b805fb0952f9be841ca8f643694ee868) carefully, you'll realize
that it doesn't even treat those two cases differently. It was buggy
even on its own terms. The FrozenTransactionId test used an xmin from
HeapTupleHeaderGetXmin(), not HeapTupleHeaderGetRawXmin().
Oh, wow. You seem to be correct.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Thu, Nov 2, 2017 at 10:38 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Personally, I think it would be best to push the release out a week.
I would only be in favor of that if there were some reason to think that
the bug is worse now than it's been in the four years since 9.3 was
released. Otherwise, we should ship the bug fixes we have on-schedule.
I think it's a very very safe bet that there are other data-loss-causing
bugs in there, so I see no good reason for panicking over this one.
Well, my thought was that delaying this release for a week would be
better than either (a) doing an extra minor release just to get this
fix out or (b) waiting another three months to release this fix. The
former seems like fairly unnecessary work, and the latter doesn't seem
particularly responsible. Users can't reasonably expect us to fix
data-loss-causing bugs that we don't know about yet, but they can
reasonably expect us to issue fixes promptly for ones that we do know
about.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes:
Well, my thought was that delaying this release for a week would be
better than either (a) doing an extra minor release just to get this
fix out or (b) waiting another three months to release this fix. The
former seems like fairly unnecessary work, and the latter doesn't seem
particularly responsible. Users can't reasonably expect us to fix
data-loss-causing bugs that we don't know about yet, but they can
reasonably expect us to issue fixes promptly for ones that we do know
about.
Our experience with "hold the release waiting for a fix for bug X"
decisions has been consistently bad. Furthermore, if we can't produce
a patch we trust by Monday, I would much rather that we not do it
in a rushed fashion at all. I think it would be entirely reasonable
to consider making off-cadence releases in, perhaps, a month, once
the dust is entirely settled.
regards, tom lane
On 2017-11-02 06:05:51 -0700, Andres Freund wrote:
Hi,
On 2017-11-02 13:49:47 +0100, Alvaro Herrera wrote:
Andres Freund wrote:
I think the problem is on the pruning, rather than the freezing side. We
can't freeze a tuple if it has an alive predecessor - rather than
weakening this, we should be fixing the pruning to not have the alive
predecessor.
I gave a look at HTSV back then, but I didn't find what the right tweak
was, but then I only tried changing the return value to DEAD and
DELETE_IN_PROGRESS; the thought of selecting DEAD or RECENTLY_DEAD based
on OldestXmin didn't occur to me ... I was thinking that the fact that
there were live lockers meant that the tuple could not be removed,
obviously failing to notice that the subsequent versions of the tuple
would be good enough.
I'll try to write up a commit based on that idea. I think there's some
comment work needed too, Robert and I were both confused by a few
things.
I'm unfortunately travelling atm - it's evening here, and I'll be flying
back to the US all Saturday. I'm fairly sure I'll be able to come up
with a decent patch tomorrow, but I'll need review and testing by
others.
Here's that patch. I've stared at this some, and Robert did too. Robert
mentioned that the commit message might need some polish and I'm not
100% sure about the error message texts yet.
I'm not yet convinced that the new elog in vacuumlazy can never trigger
- but I also don't think we want to actually freeze the tuple in that
case.
Staring at the vacuumlazy hunk I think I might have found a related bug:
heap_update_tuple() just copies the old xmax to the new tuple's xmax if
a multixact and still running. It does so without verifying liveliness
of members. Isn't that buggy? Consider what happens if we have three
blocks: 1 has free space, two is being vacuumed and is locked, three is
full and has a tuple that's key share locked by a live tuple and is
updated by a dead xmax from before the xmin horizon. In that case afaict
the multi will be copied from the third page to the first one. Which is
quite bad, because vacuum already processed it, and we'll set
relfrozenxid accordingly. I hope I'm missing something here?
Greetings,
Andres Freund
Attachments:
0001-Fix-pruning-of-locked-and-updated-tuples.patch (text/x-diff; charset=us-ascii)
From 774376dc8f7ed76657660e6beb0f5191d1bdaae4 Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Fri, 3 Nov 2017 07:52:29 -0700
Subject: [PATCH] Fix pruning of locked and updated tuples.
Previously it was possible that a tuple was not pruned during vacuum,
even though its update xmax (i.e. the updating xid in a multixact
with both key share lockers and an updater) was below the cutoff
horizon.
As the freezing code assumed, rightly so, that that's not supposed to
happen, xmax would be preserved (as a member of a new multixact or
xmax directly). That causes two problems: For one the tuple is below
the xmin horizon, which can cause problems if the clog is truncated or
once there's an xid wraparound. The bigger problem is that that will
break HOT chains, leading to index lookups failing, which in turn can
lead to constraints being violated.
Fix the problem by recognizing that tuples can be DEAD instead of
RECENTLY_DEAD, even if the multixactid has alive members, if the
update_xid is below the xmin horizon. That's safe because newer
versions of the tuple will contain the locking xids.
As the wrong freezing of xmax can cause data corruption, extend sanity
checks and promote them to elogs rather than assertions.
Per-Discussion: Robert Haas and Freund
Reported-By: Daniel Wood
Discussion:
https://postgr.es/m/E5711E62-8FDF-4DCA-A888-C200BF6B5742@amazon.com
https://postgr.es/m/20171102112019.33wb7g5wp4zpjelu@alap3.anarazel.de
---
src/backend/access/heap/heapam.c | 34 ++++++++++++++++------
src/backend/commands/vacuumlazy.c | 13 +++++++++
src/backend/utils/time/tqual.c | 53 +++++++++++++++--------------------
src/test/isolation/isolation_schedule | 1 +
4 files changed, 62 insertions(+), 39 deletions(-)
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 765750b8743..9b963e0cd0d 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -6412,7 +6412,8 @@ FreezeMultiXactId(MultiXactId multi, uint16 t_infomask,
*/
if (TransactionIdPrecedes(xid, cutoff_xid))
{
- Assert(!TransactionIdDidCommit(xid));
+ if (TransactionIdDidCommit(xid))
+ elog(ERROR, "can't freeze committed xmax");
*flags |= FRM_INVALIDATE_XMAX;
xid = InvalidTransactionId; /* not strictly necessary */
}
@@ -6484,6 +6485,8 @@ FreezeMultiXactId(MultiXactId multi, uint16 t_infomask,
{
TransactionId xid = members[i].xid;
+ Assert(TransactionIdIsValid(xid));
+
/*
* It's an update; should we keep it? If the transaction is known
* aborted or crashed then it's okay to ignore it, otherwise not.
@@ -6512,18 +6515,24 @@ FreezeMultiXactId(MultiXactId multi, uint16 t_infomask,
update_committed = true;
update_xid = xid;
}
+ else
+ {
+ /*
+ * Not in progress, not committed -- must be aborted or crashed;
+ * we can ignore it.
+ */
+ }
- /*
- * Not in progress, not committed -- must be aborted or crashed;
- * we can ignore it.
- */
/*
* Since the tuple wasn't marked HEAPTUPLE_DEAD by vacuum, the
- * update Xid cannot possibly be older than the xid cutoff.
+ * update Xid cannot possibly be older than the xid cutoff. The
+ * presence of such a tuple would cause corruption, so be paranoid
+ * and check.
*/
- Assert(!TransactionIdIsValid(update_xid) ||
- !TransactionIdPrecedes(update_xid, cutoff_xid));
+ if (TransactionIdIsValid(update_xid) &&
+ TransactionIdPrecedes(update_xid, cutoff_xid))
+ elog(ERROR, "encountered tuple from before xid cutoff");
/*
* If we determined that it's an Xid corresponding to an update
@@ -6714,7 +6723,16 @@ heap_prepare_freeze_tuple(HeapTupleHeader tuple, TransactionId cutoff_xid,
else if (TransactionIdIsNormal(xid))
{
if (TransactionIdPrecedes(xid, cutoff_xid))
+ {
+ /*
+ * If we freeze xmax, make absolutely sure that it's not an XID
+ * that is important.
+ */
+ if (!(tuple->t_infomask & HEAP_XMAX_LOCK_ONLY) &&
+ TransactionIdDidCommit(xid))
+ elog(ERROR, "can't freeze committed xmax");
freeze_xmax = true;
+ }
else
totally_frozen = false;
}
diff --git a/src/backend/commands/vacuumlazy.c b/src/backend/commands/vacuumlazy.c
index 6587db77acf..93e3ba31214 100644
--- a/src/backend/commands/vacuumlazy.c
+++ b/src/backend/commands/vacuumlazy.c
@@ -1007,7 +1007,20 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
*/
if (HeapTupleIsHotUpdated(&tuple) ||
HeapTupleIsHeapOnly(&tuple))
+ {
nkeep += 1;
+
+ /*
+ * If this were to happen for a tuple that actually
+ * needed to be frozen, we'd be in trouble, because
+ * it'd leave a tuple below the relation's xmin
+ * horizon alive.
+ */
+ if (heap_tuple_needs_freeze(tuple.t_data, FreezeLimit,
+ MultiXactCutoff, buf))
+ elog(ERROR, "encountered tuple from before xid cutoff");
+
+ }
else
tupgone = true; /* we can delete the tuple */
all_visible = false;
diff --git a/src/backend/utils/time/tqual.c b/src/backend/utils/time/tqual.c
index b7aab0dd190..6209b4e17cc 100644
--- a/src/backend/utils/time/tqual.c
+++ b/src/backend/utils/time/tqual.c
@@ -1311,49 +1311,40 @@ HeapTupleSatisfiesVacuum(HeapTuple htup, TransactionId OldestXmin,
if (tuple->t_infomask & HEAP_XMAX_IS_MULTI)
{
- TransactionId xmax;
+ TransactionId xmax = HeapTupleGetUpdateXid(tuple);
- if (MultiXactIdIsRunning(HeapTupleHeaderGetRawXmax(tuple), false))
- {
- /* already checked above */
- Assert(!HEAP_XMAX_IS_LOCKED_ONLY(tuple->t_infomask));
-
- xmax = HeapTupleGetUpdateXid(tuple);
-
- /* not LOCKED_ONLY, so it has to have an xmax */
- Assert(TransactionIdIsValid(xmax));
-
- if (TransactionIdIsInProgress(xmax))
- return HEAPTUPLE_DELETE_IN_PROGRESS;
- else if (TransactionIdDidCommit(xmax))
- /* there are still lockers around -- can't return DEAD here */
- return HEAPTUPLE_RECENTLY_DEAD;
- /* updating transaction aborted */
- return HEAPTUPLE_LIVE;
- }
-
- Assert(!(tuple->t_infomask & HEAP_XMAX_COMMITTED));
-
- xmax = HeapTupleGetUpdateXid(tuple);
+ /* already checked above */
+ Assert(!HEAP_XMAX_IS_LOCKED_ONLY(tuple->t_infomask));
/* not LOCKED_ONLY, so it has to have an xmax */
Assert(TransactionIdIsValid(xmax));
- /* multi is not running -- updating xact cannot be */
- Assert(!TransactionIdIsInProgress(xmax));
- if (TransactionIdDidCommit(xmax))
+ if (TransactionIdIsInProgress(xmax))
+ return HEAPTUPLE_DELETE_IN_PROGRESS;
+ else if (TransactionIdDidCommit(xmax))
{
+ /*
+ * The multixact might still be running due to lockers. If the
+ * updater is below the horizon we have to return DEAD regardless
+ * - otherwise we could end up with a tuple where the updater has
+ * to be removed due to the horizon, but is not pruned away. It's
+ * not a problem to prune that tuple because all the lockers will
+ * also be present in the newer tuple version.
+ */
if (!TransactionIdPrecedes(xmax, OldestXmin))
return HEAPTUPLE_RECENTLY_DEAD;
else
return HEAPTUPLE_DEAD;
}
+ else if (!MultiXactIdIsRunning(HeapTupleHeaderGetRawXmax(tuple), false))
+ {
+ /*
+ * Not in Progress, Not Committed, so either Aborted or crashed.
+ * Remove the Xmax.
+ */
+ SetHintBits(tuple, buffer, HEAP_XMAX_INVALID, InvalidTransactionId);
+ }
- /*
- * Not in Progress, Not Committed, so either Aborted or crashed.
- * Remove the Xmax.
- */
- SetHintBits(tuple, buffer, HEAP_XMAX_INVALID, InvalidTransactionId);
return HEAPTUPLE_LIVE;
}
diff --git a/src/test/isolation/isolation_schedule b/src/test/isolation/isolation_schedule
index 32c965b2a02..7dad3c23163 100644
--- a/src/test/isolation/isolation_schedule
+++ b/src/test/isolation/isolation_schedule
@@ -44,6 +44,7 @@ test: update-locked-tuple
test: propagate-lock-delete
test: tuplelock-conflict
test: tuplelock-update
+test: freeze-the-dead
test: nowait
test: nowait-2
test: nowait-3
--
2.14.1.536.g6867272d5b.dirty
Andres Freund <andres@anarazel.de> wrote:
Here's that patch. I've stared at this some, and Robert did too. Robert
mentioned that the commit message might need some polish and I'm not
100% sure about the error message texts yet.
The commit message should probably say that the bug involves the
resurrection of previously dead tuples, which is different to there
being duplicates because a constraint is not enforced because HOT chains
are broken (that's a separate, arguably less serious problem).
Staring at the vacuumlazy hunk I think I might have found a related bug:
heap_update_tuple() just copies the old xmax to the new tuple's xmax if
a multixact and still running. It does so without verifying liveliness
of members. Isn't that buggy? Consider what happens if we have three
blocks: 1 has free space, two is being vacuumed and is locked, three is
full and has a tuple that's key share locked by a live tuple and is
updated by a dead xmax from before the xmin horizon. In that case afaict
the multi will be copied from the third page to the first one. Which is
quite bad, because vacuum already processed it, and we'll set
relfrozenxid accordingly. I hope I'm missing something here?
Can you be more specific about what you mean here? I think that I
understand where you're going with this, but I'm not sure.
--
Peter Geoghegan
Peter Geoghegan wrote:
Andres Freund <andres@anarazel.de> wrote:
Staring at the vacuumlazy hunk I think I might have found a related bug:
heap_update_tuple() just copies the old xmax to the new tuple's xmax if
a multixact and still running. It does so without verifying liveliness
of members. Isn't that buggy? Consider what happens if we have three
blocks: 1 has free space, two is being vacuumed and is locked, three is
full and has a tuple that's key share locked by a live tuple and is
updated by a dead xmax from before the xmin horizon. In that case afaict
the multi will be copied from the third page to the first one. Which is
quite bad, because vacuum already processed it, and we'll set
relfrozenxid accordingly. I hope I'm missing something here?
Can you be more specific about what you mean here? I think that I
understand where you're going with this, but I'm not sure.
He means that the tuple that heap_update moves to page 1 (which will no
longer be processed by vacuum) will contain a multixact that's older
than relminmxid -- because it is copied unchanged by heap_update instead
of properly checking against age limit.
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
He means that the tuple that heap_update moves to page 1 (which will no
longer be processed by vacuum) will contain a multixact that's older
than relminmxid -- because it is copied unchanged by heap_update instead
of properly checking against age limit.
I see. The problem is more or less with this heap_update() code:
/*
* And also prepare an Xmax value for the new copy of the tuple. If there
* was no xmax previously, or there was one but all lockers are now gone,
* then use InvalidXid; otherwise, get the xmax from the old tuple. (In
* rare cases that might also be InvalidXid and yet not have the
* HEAP_XMAX_INVALID bit set; that's fine.)
*/
if ((oldtup.t_data->t_infomask & HEAP_XMAX_INVALID) ||
HEAP_LOCKED_UPGRADED(oldtup.t_data->t_infomask) ||
(checked_lockers && !locker_remains))
xmax_new_tuple = InvalidTransactionId;
else
xmax_new_tuple = HeapTupleHeaderGetRawXmax(oldtup.t_data);
My naive guess is that we have to create a new MultiXactId here in at
least some cases, just like FreezeMultiXactId() sometimes does.
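One purely hypothetical shape for that (the helper name and the policy of keeping only still-running members are invented for illustration, not taken from any posted patch) would be to rebuild the xmax from the surviving members, loosely following what FreezeMultiXactId() does:

#include "postgres.h"
#include "access/multixact.h"
#include "storage/procarray.h"

static MultiXactId
sketch_rebuild_xmax_for_new_tuple(MultiXactId old_multi)
{
	MultiXactMember *members;
	int			nmembers;
	int			nlive = 0;
	int			i;
	MultiXactId newmulti = InvalidMultiXactId;

	nmembers = GetMultiXactIdMembers(old_multi, &members, false, false);
	if (nmembers <= 0)
		return InvalidMultiXactId;

	for (i = 0; i < nmembers; i++)
	{
		/* keep members that are still running, drop everything else */
		if (TransactionIdIsInProgress(members[i].xid))
			members[nlive++] = members[i];
	}

	if (nlive > 0)
		newmulti = MultiXactIdCreateFromMembers(nlive, members);

	pfree(members);
	return newmulti;
}

Whether the caller would also have to adjust the xmax infomask bits (HEAP_XMAX_IS_MULTI and friends) is left out of the sketch.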
--
Peter Geoghegan
On November 4, 2017 1:22:04 AM GMT+05:30, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
Peter Geoghegan wrote:
Andres Freund <andres@anarazel.de> wrote:
Staring at the vacuumlazy hunk I think I might have found a related bug:
heap_update_tuple() just copies the old xmax to the new tuple's xmax if
a multixact and still running. It does so without verifying liveliness
of members. Isn't that buggy? Consider what happens if we have three
blocks: 1 has free space, two is being vacuumed and is locked, three is
full and has a tuple that's key share locked by a live tuple and is
updated by a dead xmax from before the xmin horizon. In that case afaict
the multi will be copied from the third page to the first one. Which is
quite bad, because vacuum already processed it, and we'll set
relfrozenxid accordingly. I hope I'm missing something here?
Can you be more specific about what you mean here? I think that I
understand where you're going with this, but I'm not sure.
He means that the tuple that heap_update moves to page 1 (which will no
longer be processed by vacuum) will contain a multixact that's older
than relminmxid -- because it is copied unchanged by heap_update instead
of properly checking against age limit.
Right. That, or a member xid below relminxid. I think both scenarios are possible.
Andres
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.
On 2017-11-03 12:36:59 -0700, Peter Geoghegan wrote:
Andres Freund <andres@anarazel.de> wrote:
Here's that patch. I've stared at this some, and Robert did too. Robert
mentioned that the commit message might need some polish and I'm not
100% sure about the error message texts yet.
The commit message should probably say that the bug involves the
resurrection of previously dead tuples, which is different to there
being duplicates because a constraint is not enforced because HOT chains
are broken (that's a separate, arguably less serious problem).
The reason for that is that I hadn't yet quite figured out how the bug I
described in the commit message (and the previously committed testcase)
would cause that. I was planning to diagnose / experiment more with this
and write an email if I couldn't come up with an explanation. The
committed test does *not* actually trigger that.
The reason I couldn't quite figure out how the problem triggers is that
the xmax removing branch in FreezeMultiXactId() can only be reached if
the multi is from before the cutoff - which it can't have been for a
single vacuum execution to trigger the bug, because that'd require the
multi to be running to falsely return RECENTLY_DEAD rather than DEAD (by
definition a multi can't be below the cutoff if running).
For the problem to occur I think vacuum has to be executed *twice*: The
first time through HTSV mistakenly returns RECENTLY_DEAD preventing the
tuple from being pruned. That triggers FreezeMultiXactId() to create a
*new* multi with dead members. At this point the database already is in
a bad state. Then in a second vacuum HTSV returns DEAD, but
* Ordinarily, DEAD tuples would have been removed by
* heap_page_prune(), but it's possible that the tuple
* state changed since heap_page_prune() looked. In
* particular an INSERT_IN_PROGRESS tuple could have
* changed to DEAD if the inserter aborted. So this
* cannot be considered an error condition.
*
..
if (HeapTupleIsHotUpdated(&tuple) ||
HeapTupleIsHeapOnly(&tuple))
{
nkeep += 1;
prevents the tuple from being removed. If now the multi xmax is below
the xmin horizon it triggers
/*
* If the xid is older than the cutoff, it has to have aborted,
* otherwise the tuple would have gotten pruned away.
*/
if (TransactionIdPrecedes(xid, cutoff_xid))
{
if (TransactionIdDidCommit(xid))
elog(ERROR, "can't freeze committed xmax");
*flags |= FRM_INVALIDATE_XMAX;
in FreezeMultiXact. Without the new elog, this then causes xmax to be
removed, reviving the tuple.
The current testcase, and I think the descriptions in the relevant
threads, all actually fail to point out the precise way the bug is
triggered. As evidenced e.g. by the fact that the freeze-the-dead case
appears to not cause any failures in !assertion builds even if the bug is present.
The good news is that the error checks I added in the patch upthread
prevent all of this from happening, even though I'd not yet understood
the mechanics fully - it's imnsho pretty clear that we need to be more
paranoid in production builds around this. A bunch of users that
triggered largely "invisible" corruption (the first vacuum described
above) will possibly run into one of these elog()s, but that seems far
preferable to making the corruption a lot worse.
I think unfortunately the check + elog() in the
if (HeapTupleIsHotUpdated(&tuple) ||
HeapTupleIsHeapOnly(&tuple))
{
nkeep += 1;
/*
* If this were to happen for a tuple that actually
* needed to be frozen, we'd be in trouble, because
* it'd leave a tuple below the relation's xmin
* horizon alive.
*/
if (heap_tuple_needs_freeze(tuple.t_data, FreezeLimit,
MultiXactCutoff, buf))
{
elog(ERROR, "encountered tuple from before xid cutoff, sleeping");
case needs to go, because it's too coarse - there very well could be
lockers or such that need to be removed and where it's fine to do
so. The checks the actual freezing code ought to be sufficient however.
Greetings,
Andres Freund
On 2017-11-04 06:15:00 -0700, Andres Freund wrote:
The current testcase, and I think the descriptions in the relevant
threads, all actually fail to point out the precise way the bug is
triggered. As evidenced e.g. by the fact that the freeze-the-dead case
appears to not cause any failures in !assertion builds even if the bug is present.
Trying to write up tests reproducing more of the issues in the area, I
think I might have found a third issue - although I'm not sure how
practically relevant it is:
When FreezeMultiXactId() decides it needs to create a new multi because
the old one is below the cutoff, that attempt can be defeated by the
multixact id caching. If the new multi has exactly the same members the
multixact id cache will just return the existing multi with the same
members. The cache will routinely be primed from the lookup of its
members.
I'm not yet sure how easily this can be hit in practice, because
commonly the multixact horizon should prevent a multi with all its
members living from being below the horizon. I found a situation where
that's not the case with the current bug, but I'm not sure if that can
happen otherwise.
Greetings,
Andres Freund
On 2017-11-04 06:15:00 -0700, Andres Freund wrote:
The reason for that is that I hadn't yet quite figured out how the bug I
described in the commit message (and the previously committed testcase)
would cause that. I was planning to diagnose / experiment more with this
and write an email if I couldn't come up with an explanation. The
committed test does *not* actually trigger that.
The reason I couldn't quite figure out how the problem triggers is that
[ long explanation ]
Attached is a version of the already existing regression test that both
reproduces the broken hot chain (and thus failing index lookups) and
then also the tuple reviving. I don't see any need for letting this run
with arbitrary permutations.
Thanks to whoever allowed isolationtester permutations to go over
multiple lines and allow comments. I was wondering about adding that as
a feature just to discover it's already there ;)
What I'm currently wondering about is how much we need to harden
postgres against such existing corruption. If e.g. the hot chains are
broken somebody might have reindexed thinking the problem is fixed - but
if they then later vacuum everything goes to shit again, with dead rows
reappearing. There's no way we can fix hot chains after the fact, but
preventing dead rows from reappearing seems important. A minimal version
of that is fairly easy - we slap a bunch of
if (!TransactionIdDidCommit()) elog(ERROR) checks at various code paths. But that'll
often trigger clog access errors when the problem occurred - if we want
to do better we need to pass down relfrozenxid/relminmxid to a few
functions. I'm inclined to do so, but it'll make the patch larger...
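As a sketch of what such a hardened check might look like once relfrozenxid is passed down (the function name is invented for illustration; this is not the committed hardening):

#include "postgres.h"
#include "access/transam.h"

static void
sketch_check_xid_against_relfrozenxid(TransactionId xid,
									  TransactionId relfrozenxid)
{
	/*
	 * An xid from before relfrozenxid may already have been truncated away
	 * from clog, so raise an error here instead of letting a later
	 * TransactionIdDidCommit() call fail with a clog access error.
	 */
	if (TransactionIdIsNormal(xid) &&
		TransactionIdPrecedes(xid, relfrozenxid))
		elog(ERROR, "found xid %u from before relfrozenxid %u",
			 xid, relfrozenxid);
}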
Comments?
Greetings,
Andres Freund
Attachments:
On Thu, Nov 9, 2017 at 2:24 PM, Andres Freund <andres@anarazel.de> wrote:
Attached is a version of the already existing regression test that both
reproduces the broken hot chain (and thus failing index lookups) and
then also the tuple reviving. I don't see any need for letting this run
with arbitrary permutations.
I thought that the use of every possible permutation was excessive,
myself. It left us with an isolation test that didn't precisely
describe the behavior that is tested. What you came up with seems far,
far better, especially because of the comments you included. The mail
message-id references seem to add a lot, too.
What I'm currently wondering about is how much we need to harden
postgres against such existing corruption. If e.g. the hot chains are
broken somebody might have reindexed thinking the problem is fixed - but
if they then later vacuum everything goes to shit again, with dead rows
reappearing.
I don't follow you here. Why would REINDEXing make the rows that
should be dead disappear again, even for a short period of time? It
might do so for index scans, I suppose, but not for sequential scans.
Are you concerned about a risk of somebody not noticing that
sequential scans are still broken?
Actually, on second thought, I take that back -- I don't think that
REINDEXing will even finish once a HOT chain is broken by the bug.
IndexBuildHeapScan() actually does quite a good job of making sure
that HOT chains are sane, which is how the enhanced amcheck notices
the bug here in practice. (Before this bug was discovered, I would
have expected amcheck to catch problems like it slightly later, during
the Bloom filter probe for that HOT chain...but, in fact, it never
gets there with corruption from this bug in practice, AFAIK.)
--
Peter Geoghegan
On 2017-11-09 16:02:17 -0800, Peter Geoghegan wrote:
What I'm currently wondering about is how much we need to harden
postgres against such existing corruption. If e.g. the hot chains are
broken somebody might have reindexed thinking the problem is fixed - but
if they then later vacuum everything goes to shit again, with dead rows
reappearing.
I don't follow you here. Why would REINDEXing make the rows that
should be dead disappear again, even for a short period of time?
It's not the REINDEX that makes them reappear. It's the second
vacuum. The reindex part was about $user trying to fix the problem...
As you need two vacuums with appropriate cutoffs to hit the "rows
revive" problem, that'll often in practice not happen immediately.
Actually, on second thought, I take that back -- I don't think that
REINDEXing will even finish once a HOT chain is broken by the bug.
IndexBuildHeapScan() actually does quite a good job of making sure
that HOT chains are sane, which is how the enhanced amcheck notices
the bug here in practice.
I think that's too optimistic.
Greetings,
Andres Freund
On Thu, Nov 9, 2017 at 4:17 PM, Andres Freund <andres@anarazel.de> wrote:
I don't follow you here. Why would REINDEXing make the rows that
should be dead disappear again, even for a short period of time?
It's not the REINDEX that makes them reappear.
Of course. I was just trying to make sense of what you said.
It's the second
vacuum. The reindex part was about $user trying to fix the problem...
As you need two vacuums with appropriate cutoffs to hit the "rows
revive" problem, that'll often in practice not happen immediately.
This explanation clears things up, though.
--
Peter Geoghegan
On Thu, Nov 9, 2017 at 4:17 PM, Andres Freund <andres@anarazel.de> wrote:
Actually, on second thought, I take that back -- I don't think that
REINDEXing will even finish once a HOT chain is broken by the bug.
IndexBuildHeapScan() actually does quite a good job of making sure
that HOT chains are sane, which is how the enhanced amcheck notices
the bug here in practice.
I think that's too optimistic.
Why? Because the "find the TID of the root" logic in
IndexBuildHeapScan()/heap_get_root_tuples() won't reliably find the
actual root (it might be some other HOT chain root following TID
recycling by VACUUM)?
Assuming that's what you meant: I would have thought that the
xmin/xmax matching within heap_get_root_tuples() makes the sanity
checking fairly reliable in practice.
--
Peter Geoghegan
On 2017-11-09 16:45:07 -0800, Peter Geoghegan wrote:
On Thu, Nov 9, 2017 at 4:17 PM, Andres Freund <andres@anarazel.de> wrote:
Actually, on second thought, I take that back -- I don't think that
REINDEXing will even finish once a HOT chain is broken by the bug.
IndexBuildHeapScan() actually does quite a good job of making sure
that HOT chains are sane, which is how the enhanced amcheck notices
the bug here in practice.
I think that's too optimistic.
Why? Because the "find the TID of the root" logic in
IndexBuildHeapScan()/heap_get_root_tuples() won't reliably find the
actual root (it might be some other HOT chain root following TID
recycling by VACUUM)?
Primarily because it's not an anti-corruption tool. I'd be surprised if
there weren't ways to corrupt the page using these corruptions that
aren't detected by it. But even if it were, I don't think there's
enough information to do so in the general case. You very well can end
up with pages where subsequent hot pruning has removed a good bit of the
direct evidence of this bug.
But I'm not really sure why the error detection capabilities matter
much for the principal point I raised, which is how much work we need to
do to not further worsen the corruption.
Greetings,
Andres Freund
On Thu, Nov 9, 2017 at 4:53 PM, Andres Freund <andres@anarazel.de> wrote:
Primarily because it's not an anti-corruption tool. I'd be surprised if
there weren't ways to corrupt the page using these corruptions that
aren't detected by it.
It's very hard to assess the risk of missing something that's actually
detectable with total confidence, but I think that the check is actually
very thorough.
But even if it were, I don't think there's
enough information to do so in the general case. You very well can end
up with pages where subsequent hot pruning has removed a good bit of the
direct evidence of this bug.
Sure, but maybe those are cases that can't get any worse anyway. So the
question of avoiding making it worse doesn't arise.
But I'm not really sure why the error detection capabilities of matter
much for the principal point I raised, which is how much work we need to
do to not further worsen the corruption.
You're right. Just trying to put the risk in context, and to understand the
extent of the concern that you have.
--
Peter Geoghegan
Hi,
On 2017-11-03 07:53:30 -0700, Andres Freund wrote:
Here's that patch. I've stared at this some, and Robert did too. Robert
mentioned that the commit message might need some polish and I'm not
100% sure about the error message texts yet.
I'm not yet convinced that the new elog in vacuumlazy can never trigger
- but I also don't think we want to actually freeze the tuple in that
case.
I'm fairly sure it could be triggered, therefore I've rewritten that.
I've played around quite some with the attached patch. So far, after
applying the second patch, neither VACUUM nor VACUUM FULL / CLUSTER make
the situation worse for already existing corruption. HOT pruning can
change the exact appearance of existing corruption a bit, but I don't
think it can make the corruption meaningfully worse. It's a bit
annoying and scary to add so many checks to backbranches but it kinda
seems required. The error message texts aren't perfect, but these are
"should never be hit" type elog()s so I'm not too worried about that.
Please review!
One thing I'm wondering about is whether we have any way for users to
diagnose whether they need to dump & restore - I can't really think of
anything actually meaningful. Reindexing will find some instances, but
far from all. VACUUM (disable_page_skipping, freeze) pg_class will
also find a bunch. Not a perfect story.
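To be concrete, the kind of checks I mean would look roughly like this
(a sketch only; the table name is a placeholder):

-- reindexing errors out on some, but not all, broken HOT chains
-- (the heapallindexed-style amcheck discussed upthread catches more)
REINDEX TABLE some_table;

-- an aggressive freeze pass; the additional checks added in patch 0002
-- complain if an xid / multixact from before the relation's horizon is hit
VACUUM (DISABLE_PAGE_SKIPPING, FREEZE) some_table;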
Staring at the vacuumlazy hunk I think I might have found a related bug:
heap_update_tuple() just copies the old xmax to the new tuple's xmax if
a multixact and still running. It does so without verifying liveness
of members. Isn't that buggy? Consider what happens if we have three
blocks: 1 has free space, two is being vacuumed and is locked, three is
full and has a tuple that's key share locked by a live tuple and is
updated by a dead xmax from before the xmin horizon. In that case afaict
the multi will be copied from the third page to the first one. Which is
quite bad, because vacuum already processed it, and we'll set
relfrozenxid accordingly. I hope I'm missing something here?
Trying to write a testcase for that now.
Greetings,
Andres Freund
Attachments:
0001-Fix-pruning-of-locked-and-updated-tuples.patch (text/x-diff; charset=us-ascii)
From 6d9f0b60da30aca9bf5eaab814b1d5d7f1ccb8c4 Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Fri, 3 Nov 2017 07:52:29 -0700
Subject: [PATCH 1/2] Fix pruning of locked and updated tuples.
Previously it was possible that a tuple was not pruned during vacuum,
even though its update xmax (i.e. the updating xid in a multixact
with both key share lockers and an updater) was below the cutoff
horizon.
As the freezing code assumed, rightly so, that that's not supposed to
happen, xmax would be preserved (as a member of a new multixact or
xmax directly). That causes two problems: For one the tuple is below
the xmin horizon, which can cause problems if the clog is truncated or
once there's an xid wraparound. The bigger problem is that that will
break HOT chains, which in turn can lead to two breakages: First,
failing index lookups, which in turn can e.g. lead to constraints being
violated. Second, future hot prunes / vacuums can end up making
invisible tuples visible again. There are other harmful scenarios.
Fix the problem by recognizing that tuples can be DEAD instead of
RECENTLY_DEAD, even if the multixactid has alive members, if the
update_xid is below the xmin horizon. That's safe because newer
versions of the tuple will contain the locking xids.
This commit also rewrites, and activates, a previously added
regression test to actually showcase corruption without the fix
applied.
A followup commit will harden the code somewhat against future similar
bugs and already corrupted data.
Author: Andres Freund
Per-Discussion: Robert Haas and Freund
Reported-By: Daniel Wood
Discussion:
https://postgr.es/m/E5711E62-8FDF-4DCA-A888-C200BF6B5742@amazon.com
https://postgr.es/m/20171102112019.33wb7g5wp4zpjelu@alap3.anarazel.de
Backpatch: 9.3-
---
src/backend/utils/time/tqual.c | 61 ++++++--------
src/test/isolation/expected/freeze-the-dead.out | 107 +++++-------------------
src/test/isolation/specs/freeze-the-dead.spec | 36 +++++++-
3 files changed, 82 insertions(+), 122 deletions(-)
diff --git a/src/backend/utils/time/tqual.c b/src/backend/utils/time/tqual.c
index a821e2eed10..8be2980116c 100644
--- a/src/backend/utils/time/tqual.c
+++ b/src/backend/utils/time/tqual.c
@@ -1311,49 +1311,42 @@ HeapTupleSatisfiesVacuum(HeapTuple htup, TransactionId OldestXmin,
if (tuple->t_infomask & HEAP_XMAX_IS_MULTI)
{
- TransactionId xmax;
+ TransactionId xmax = HeapTupleGetUpdateXid(tuple);
- if (MultiXactIdIsRunning(HeapTupleHeaderGetRawXmax(tuple), false))
- {
- /* already checked above */
- Assert(!HEAP_XMAX_IS_LOCKED_ONLY(tuple->t_infomask));
-
- xmax = HeapTupleGetUpdateXid(tuple);
-
- /* not LOCKED_ONLY, so it has to have an xmax */
- Assert(TransactionIdIsValid(xmax));
-
- if (TransactionIdIsInProgress(xmax))
- return HEAPTUPLE_DELETE_IN_PROGRESS;
- else if (TransactionIdDidCommit(xmax))
- /* there are still lockers around -- can't return DEAD here */
- return HEAPTUPLE_RECENTLY_DEAD;
- /* updating transaction aborted */
- return HEAPTUPLE_LIVE;
- }
-
- Assert(!(tuple->t_infomask & HEAP_XMAX_COMMITTED));
-
- xmax = HeapTupleGetUpdateXid(tuple);
+ /* already checked above */
+ Assert(!HEAP_XMAX_IS_LOCKED_ONLY(tuple->t_infomask));
/* not LOCKED_ONLY, so it has to have an xmax */
Assert(TransactionIdIsValid(xmax));
- /* multi is not running -- updating xact cannot be */
- Assert(!TransactionIdIsInProgress(xmax));
- if (TransactionIdDidCommit(xmax))
+ if (TransactionIdIsInProgress(xmax))
+ return HEAPTUPLE_DELETE_IN_PROGRESS;
+ else if (TransactionIdDidCommit(xmax))
{
- if (!TransactionIdPrecedes(xmax, OldestXmin))
- return HEAPTUPLE_RECENTLY_DEAD;
- else
+ /*
+ * The multixact might still be running due to lockers. If the
+ * updater is below the horizon we have to return DEAD regardless
+ * - otherwise we could end up with a tuple where the updater has
+ * to be removed due to the horizon, but is not pruned away. It's
+ * not a problem to prune that tuple because all the lockers will
+ * also be present in the newer tuple version.
+ */
+ if (TransactionIdPrecedes(xmax, OldestXmin))
+ {
return HEAPTUPLE_DEAD;
+ }
+ else
+ return HEAPTUPLE_RECENTLY_DEAD;
+ }
+ else if (!MultiXactIdIsRunning(HeapTupleHeaderGetRawXmax(tuple), false))
+ {
+ /*
+ * Not in Progress, Not Committed, so either Aborted or crashed.
+ * Remove the Xmax.
+ */
+ SetHintBits(tuple, buffer, HEAP_XMAX_INVALID, InvalidTransactionId);
}
- /*
- * Not in Progress, Not Committed, so either Aborted or crashed.
- * Remove the Xmax.
- */
- SetHintBits(tuple, buffer, HEAP_XMAX_INVALID, InvalidTransactionId);
return HEAPTUPLE_LIVE;
}
diff --git a/src/test/isolation/expected/freeze-the-dead.out b/src/test/isolation/expected/freeze-the-dead.out
index dd045613f93..8e638f132f9 100644
--- a/src/test/isolation/expected/freeze-the-dead.out
+++ b/src/test/isolation/expected/freeze-the-dead.out
@@ -1,101 +1,36 @@
-Parsed test spec with 2 sessions
+Parsed test spec with 3 sessions
-starting permutation: s1_update s1_commit s1_vacuum s2_key_share s2_commit
-step s1_update: UPDATE tab_freeze SET x = x + 1 WHERE id = 3;
-step s1_commit: COMMIT;
-step s1_vacuum: VACUUM FREEZE tab_freeze;
-step s2_key_share: SELECT id FROM tab_freeze WHERE id = 3 FOR KEY SHARE;
-id
-
-3
-step s2_commit: COMMIT;
-
-starting permutation: s1_update s1_commit s2_key_share s1_vacuum s2_commit
-step s1_update: UPDATE tab_freeze SET x = x + 1 WHERE id = 3;
-step s1_commit: COMMIT;
-step s2_key_share: SELECT id FROM tab_freeze WHERE id = 3 FOR KEY SHARE;
-id
-
-3
-step s1_vacuum: VACUUM FREEZE tab_freeze;
-step s2_commit: COMMIT;
-
-starting permutation: s1_update s1_commit s2_key_share s2_commit s1_vacuum
-step s1_update: UPDATE tab_freeze SET x = x + 1 WHERE id = 3;
-step s1_commit: COMMIT;
-step s2_key_share: SELECT id FROM tab_freeze WHERE id = 3 FOR KEY SHARE;
-id
-
-3
-step s2_commit: COMMIT;
-step s1_vacuum: VACUUM FREEZE tab_freeze;
-
-starting permutation: s1_update s2_key_share s1_commit s1_vacuum s2_commit
+starting permutation: s1_begin s2_begin s3_begin s1_update s2_key_share s3_key_share s1_update s1_commit s2_commit s2_vacuum s1_selectone s3_commit s2_vacuum s1_selectall
+step s1_begin: BEGIN;
+step s2_begin: BEGIN;
+step s3_begin: BEGIN;
step s1_update: UPDATE tab_freeze SET x = x + 1 WHERE id = 3;
step s2_key_share: SELECT id FROM tab_freeze WHERE id = 3 FOR KEY SHARE;
id
3
-step s1_commit: COMMIT;
-step s1_vacuum: VACUUM FREEZE tab_freeze;
-step s2_commit: COMMIT;
-
-starting permutation: s1_update s2_key_share s1_commit s2_commit s1_vacuum
-step s1_update: UPDATE tab_freeze SET x = x + 1 WHERE id = 3;
-step s2_key_share: SELECT id FROM tab_freeze WHERE id = 3 FOR KEY SHARE;
-id
-
-3
-step s1_commit: COMMIT;
-step s2_commit: COMMIT;
-step s1_vacuum: VACUUM FREEZE tab_freeze;
-
-starting permutation: s1_update s2_key_share s2_commit s1_commit s1_vacuum
-step s1_update: UPDATE tab_freeze SET x = x + 1 WHERE id = 3;
-step s2_key_share: SELECT id FROM tab_freeze WHERE id = 3 FOR KEY SHARE;
-id
-
-3
-step s2_commit: COMMIT;
-step s1_commit: COMMIT;
-step s1_vacuum: VACUUM FREEZE tab_freeze;
-
-starting permutation: s2_key_share s1_update s1_commit s1_vacuum s2_commit
-step s2_key_share: SELECT id FROM tab_freeze WHERE id = 3 FOR KEY SHARE;
-id
-
-3
-step s1_update: UPDATE tab_freeze SET x = x + 1 WHERE id = 3;
-step s1_commit: COMMIT;
-step s1_vacuum: VACUUM FREEZE tab_freeze;
-step s2_commit: COMMIT;
-
-starting permutation: s2_key_share s1_update s1_commit s2_commit s1_vacuum
-step s2_key_share: SELECT id FROM tab_freeze WHERE id = 3 FOR KEY SHARE;
+step s3_key_share: SELECT id FROM tab_freeze WHERE id = 3 FOR KEY SHARE;
id
3
step s1_update: UPDATE tab_freeze SET x = x + 1 WHERE id = 3;
step s1_commit: COMMIT;
step s2_commit: COMMIT;
-step s1_vacuum: VACUUM FREEZE tab_freeze;
+step s2_vacuum: VACUUM FREEZE tab_freeze;
+step s1_selectone:
+ BEGIN;
+ SET LOCAL enable_seqscan = false;
+ SET LOCAL enable_bitmapscan = false;
+ SELECT * FROM tab_freeze WHERE id = 3;
+ COMMIT;
-starting permutation: s2_key_share s1_update s2_commit s1_commit s1_vacuum
-step s2_key_share: SELECT id FROM tab_freeze WHERE id = 3 FOR KEY SHARE;
-id
+id name x
-3
-step s1_update: UPDATE tab_freeze SET x = x + 1 WHERE id = 3;
-step s2_commit: COMMIT;
-step s1_commit: COMMIT;
-step s1_vacuum: VACUUM FREEZE tab_freeze;
+3 333 2
+step s3_commit: COMMIT;
+step s2_vacuum: VACUUM FREEZE tab_freeze;
+step s1_selectall: SELECT * FROM tab_freeze ORDER BY name, id;
+id name x
-starting permutation: s2_key_share s2_commit s1_update s1_commit s1_vacuum
-step s2_key_share: SELECT id FROM tab_freeze WHERE id = 3 FOR KEY SHARE;
-id
-
-3
-step s2_commit: COMMIT;
-step s1_update: UPDATE tab_freeze SET x = x + 1 WHERE id = 3;
-step s1_commit: COMMIT;
-step s1_vacuum: VACUUM FREEZE tab_freeze;
+1 111 0
+3 333 2
diff --git a/src/test/isolation/specs/freeze-the-dead.spec b/src/test/isolation/specs/freeze-the-dead.spec
index 3cd9965b2fa..e24d7d5d116 100644
--- a/src/test/isolation/specs/freeze-the-dead.spec
+++ b/src/test/isolation/specs/freeze-the-dead.spec
@@ -16,12 +16,44 @@ teardown
}
session "s1"
-setup { BEGIN; }
+step "s1_begin" { BEGIN; }
step "s1_update" { UPDATE tab_freeze SET x = x + 1 WHERE id = 3; }
step "s1_commit" { COMMIT; }
step "s1_vacuum" { VACUUM FREEZE tab_freeze; }
+step "s1_selectone" {
+ BEGIN;
+ SET LOCAL enable_seqscan = false;
+ SET LOCAL enable_bitmapscan = false;
+ SELECT * FROM tab_freeze WHERE id = 3;
+ COMMIT;
+}
+step "s1_selectall" { SELECT * FROM tab_freeze ORDER BY name, id; }
+step "s1_reindex" { REINDEX TABLE tab_freeze; }
session "s2"
-setup { BEGIN; }
+step "s2_begin" { BEGIN; }
step "s2_key_share" { SELECT id FROM tab_freeze WHERE id = 3 FOR KEY SHARE; }
step "s2_commit" { COMMIT; }
+step "s2_vacuum" { VACUUM FREEZE tab_freeze; }
+
+session "s3"
+step "s3_begin" { BEGIN; }
+step "s3_key_share" { SELECT id FROM tab_freeze WHERE id = 3 FOR KEY SHARE; }
+step "s3_commit" { COMMIT; }
+step "s3_vacuum" { VACUUM FREEZE tab_freeze; }
+
+# This permutation verifies that a previous bug
+# https://postgr.es/m/E5711E62-8FDF-4DCA-A888-C200BF6B5742@amazon.com
+# https://postgr.es/m/20171102112019.33wb7g5wp4zpjelu@alap3.anarazel.de
+# is not reintroduced. We used to make wrong pruning / freezing
+# decision for multixacts, which could lead to a) broken hot chains b)
+# dead rows being revived.
+permutation "s1_begin" "s2_begin" "s3_begin" # start transactions
+ "s1_update" "s2_key_share" "s3_key_share" # have xmax be a multi with an updater, updater being oldest xid
+ "s1_update" # create additional row version that has multis
+ "s1_commit" "s2_commit" # commit both updater and share locker
+ "s2_vacuum" # due to bug in freezing logic, we used to *not* prune updated row, and then froze it
+ "s1_selectone" # if hot chain is broken, the row can't be found via index scan
+ "s3_commit" # commit remaining open xact
+ "s2_vacuum" # pruning / freezing in broken hot chains would unset xmax, reviving rows
+ "s1_selectall" # show borkedness
--
2.14.1.536.g6867272d5b.dirty
0002-Perform-a-lot-more-sanity-checks-when-freezing-tuple.patch (text/x-diff; charset=us-ascii)
From 3f432d15aa9550112fff18a6670e4b75119ddbb6 Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Mon, 13 Nov 2017 18:45:47 -0800
Subject: [PATCH 2/2] Perform a lot more sanity checks when freezing tuples.
The previous commit has shown that the sanity checks around freezing
aren't strong enough. Strengthening them seems especially important
because the existence of the bug has caused corruption that we don't
want to make even worse during future vacuum cycles.
Author: Andres Freund
Discussion: https://postgr.es/m/20171102112019.33wb7g5wp4zpjelu@alap3.anarazel.de
Backpatch: 9.3-
---
src/backend/access/heap/heapam.c | 74 ++++++++++++++++++++++++++++-------
src/backend/access/heap/rewriteheap.c | 5 ++-
src/backend/commands/vacuumlazy.c | 15 ++++++-
src/include/access/heapam.h | 5 ++-
src/include/access/heapam_xlog.h | 2 +
5 files changed, 81 insertions(+), 20 deletions(-)
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 3acef279f47..eb93718baa7 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -6357,6 +6357,7 @@ heap_inplace_update(Relation relation, HeapTuple tuple)
*/
static TransactionId
FreezeMultiXactId(MultiXactId multi, uint16 t_infomask,
+ TransactionId relfrozenxid, TransactionId relminmxid,
TransactionId cutoff_xid, MultiXactId cutoff_multi,
uint16 *flags)
{
@@ -6385,14 +6386,19 @@ FreezeMultiXactId(MultiXactId multi, uint16 t_infomask,
}
else if (MultiXactIdPrecedes(multi, cutoff_multi))
{
+ if (MultiXactIdPrecedes(multi, relminmxid))
+ elog(ERROR, "encountered multixact from before horizon");
+
/*
* This old multi cannot possibly have members still running. If it
* was a locker only, it can be removed without any further
* consideration; but if it contained an update, we might need to
* preserve it.
*/
- Assert(!MultiXactIdIsRunning(multi,
- HEAP_XMAX_IS_LOCKED_ONLY(t_infomask)));
+ if (MultiXactIdIsRunning(multi,
+ HEAP_XMAX_IS_LOCKED_ONLY(t_infomask)))
+ elog(ERROR, "multixact from before cutoff found to be still running");
+
if (HEAP_XMAX_IS_LOCKED_ONLY(t_infomask))
{
*flags |= FRM_INVALIDATE_XMAX;
@@ -6412,7 +6418,10 @@ FreezeMultiXactId(MultiXactId multi, uint16 t_infomask,
*/
if (TransactionIdPrecedes(xid, cutoff_xid))
{
- Assert(!TransactionIdDidCommit(xid));
+ if (TransactionIdPrecedes(xid, relfrozenxid))
+ elog(ERROR, "encountered xid from before horizon");
+ if (TransactionIdDidCommit(xid))
+ elog(ERROR, "can't freeze committed xmax");
*flags |= FRM_INVALIDATE_XMAX;
xid = InvalidTransactionId; /* not strictly necessary */
}
@@ -6484,6 +6493,10 @@ FreezeMultiXactId(MultiXactId multi, uint16 t_infomask,
{
TransactionId xid = members[i].xid;
+ Assert(TransactionIdIsValid(xid));
+ if (TransactionIdPrecedes(xid, relfrozenxid))
+ elog(ERROR, "encountered xid from before horizon");
+
/*
* It's an update; should we keep it? If the transaction is known
* aborted or crashed then it's okay to ignore it, otherwise not.
@@ -6512,18 +6525,24 @@ FreezeMultiXactId(MultiXactId multi, uint16 t_infomask,
update_committed = true;
update_xid = xid;
}
+ else
+ {
+ /*
+ * Not in progress, not committed -- must be aborted or crashed;
+ * we can ignore it.
+ */
+ }
- /*
- * Not in progress, not committed -- must be aborted or crashed;
- * we can ignore it.
- */
/*
* Since the tuple wasn't marked HEAPTUPLE_DEAD by vacuum, the
- * update Xid cannot possibly be older than the xid cutoff.
+ * update Xid cannot possibly be older than the xid cutoff. The
+ * presence of such a tuple would cause corruption, so be paranoid
+ * and check.
*/
- Assert(!TransactionIdIsValid(update_xid) ||
- !TransactionIdPrecedes(update_xid, cutoff_xid));
+ if (TransactionIdIsValid(update_xid) &&
+ TransactionIdPrecedes(update_xid, cutoff_xid))
+ elog(ERROR, "encountered xid from before xid cutoff");
/*
* If we determined that it's an Xid corresponding to an update
@@ -6620,8 +6639,9 @@ FreezeMultiXactId(MultiXactId multi, uint16 t_infomask,
* recovery. We really need to remove old xids.
*/
bool
-heap_prepare_freeze_tuple(HeapTupleHeader tuple, TransactionId cutoff_xid,
- TransactionId cutoff_multi,
+heap_prepare_freeze_tuple(HeapTupleHeader tuple,
+ TransactionId relfrozenxid, TransactionId relminmxid,
+ TransactionId cutoff_xid, TransactionId cutoff_multi,
xl_heap_freeze_tuple *frz, bool *totally_frozen_p)
{
bool changed = false;
@@ -6640,6 +6660,12 @@ heap_prepare_freeze_tuple(HeapTupleHeader tuple, TransactionId cutoff_xid,
{
if (TransactionIdPrecedes(xid, cutoff_xid))
{
+ if (TransactionIdPrecedes(xid, relfrozenxid))
+ elog(ERROR, "encountered xid from before xid cutoff");
+
+ if (!TransactionIdDidCommit(xid))
+ elog(ERROR, "uncommitted xmin needs to be frozen");
+
frz->t_infomask |= HEAP_XMIN_FROZEN;
changed = true;
}
@@ -6664,6 +6690,7 @@ heap_prepare_freeze_tuple(HeapTupleHeader tuple, TransactionId cutoff_xid,
uint16 flags;
newxmax = FreezeMultiXactId(xid, tuple->t_infomask,
+ relfrozenxid, relminmxid,
cutoff_xid, cutoff_multi, &flags);
if (flags & FRM_INVALIDATE_XMAX)
@@ -6714,7 +6741,21 @@ heap_prepare_freeze_tuple(HeapTupleHeader tuple, TransactionId cutoff_xid,
else if (TransactionIdIsNormal(xid))
{
if (TransactionIdPrecedes(xid, cutoff_xid))
+ {
+ if (TransactionIdPrecedes(xid, relfrozenxid))
+ elog(ERROR, "encountered xid from before xid cutoff");
+
+ /*
+ * If we freeze xmax, make absolutely sure that it's not an XID
+ * that is important (Note, a lock-only xmax can be removed
+ * independent of committedness, since a committed lock holder has
+ * released the lock).
+ */
+ if (!(tuple->t_infomask & HEAP_XMAX_LOCK_ONLY) &&
+ TransactionIdDidCommit(xid))
+ elog(ERROR, "can't freeze committed xmax");
freeze_xmax = true;
+ }
else
totally_frozen = false;
}
@@ -6819,14 +6860,17 @@ heap_execute_freeze_tuple(HeapTupleHeader tuple, xl_heap_freeze_tuple *frz)
* Useful for callers like CLUSTER that perform their own WAL logging.
*/
bool
-heap_freeze_tuple(HeapTupleHeader tuple, TransactionId cutoff_xid,
- TransactionId cutoff_multi)
+heap_freeze_tuple(HeapTupleHeader tuple,
+ TransactionId relfrozenxid, TransactionId relminmxid,
+ TransactionId cutoff_xid, TransactionId cutoff_multi)
{
xl_heap_freeze_tuple frz;
bool do_freeze;
bool tuple_totally_frozen;
- do_freeze = heap_prepare_freeze_tuple(tuple, cutoff_xid, cutoff_multi,
+ do_freeze = heap_prepare_freeze_tuple(tuple,
+ relfrozenxid, relminmxid,
+ cutoff_xid, cutoff_multi,
&frz, &tuple_totally_frozen);
/*
diff --git a/src/backend/access/heap/rewriteheap.c b/src/backend/access/heap/rewriteheap.c
index f93c194e182..7d163c91379 100644
--- a/src/backend/access/heap/rewriteheap.c
+++ b/src/backend/access/heap/rewriteheap.c
@@ -407,7 +407,10 @@ rewrite_heap_tuple(RewriteState state,
* While we have our hands on the tuple, we may as well freeze any
* eligible xmin or xmax, so that future VACUUM effort can be saved.
*/
- heap_freeze_tuple(new_tuple->t_data, state->rs_freeze_xid,
+ heap_freeze_tuple(new_tuple->t_data,
+ state->rs_old_rel->rd_rel->relfrozenxid,
+ state->rs_old_rel->rd_rel->relminmxid,
+ state->rs_freeze_xid,
state->rs_cutoff_multi);
/*
diff --git a/src/backend/commands/vacuumlazy.c b/src/backend/commands/vacuumlazy.c
index 6587db77acf..4a01137527c 100644
--- a/src/backend/commands/vacuumlazy.c
+++ b/src/backend/commands/vacuumlazy.c
@@ -467,6 +467,8 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
blkno;
HeapTupleData tuple;
char *relname;
+ TransactionId relfrozenxid = onerel->rd_rel->relfrozenxid;
+ TransactionId relminmxid = onerel->rd_rel->relminmxid;
BlockNumber empty_pages,
vacuumed_pages;
double num_tuples,
@@ -1004,6 +1006,13 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
* tuple, we choose to keep it, because it'll be a lot
* cheaper to get rid of it in the next pruning pass than
* to treat it like an indexed tuple.
+ *
+ * If this were to happen for a tuple that actually needed
+ * to be deleted , we'd be in trouble, because it'd
+ * possibly leave a tuple below the relation's xmin
+ * horizon alive. But checks in
+ * heap_prepare_freeze_tuple() would trigger in that case,
+ * preventing corruption.
*/
if (HeapTupleIsHotUpdated(&tuple) ||
HeapTupleIsHeapOnly(&tuple))
@@ -1095,8 +1104,10 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
* Each non-removable tuple must be checked to see if it needs
* freezing. Note we already have exclusive buffer lock.
*/
- if (heap_prepare_freeze_tuple(tuple.t_data, FreezeLimit,
- MultiXactCutoff, &frozen[nfrozen],
+ if (heap_prepare_freeze_tuple(tuple.t_data,
+ relfrozenxid, relminmxid,
+ FreezeLimit, MultiXactCutoff,
+ &frozen[nfrozen],
&tuple_totally_frozen))
frozen[nfrozen++].offset = offnum;
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 4e41024e926..f1366ed9581 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -168,8 +168,9 @@ extern HTSU_Result heap_lock_tuple(Relation relation, HeapTuple tuple,
bool follow_update,
Buffer *buffer, HeapUpdateFailureData *hufd);
extern void heap_inplace_update(Relation relation, HeapTuple tuple);
-extern bool heap_freeze_tuple(HeapTupleHeader tuple, TransactionId cutoff_xid,
- TransactionId cutoff_multi);
+extern bool heap_freeze_tuple(HeapTupleHeader tuple,
+ TransactionId relfrozenxid, TransactionId relminmxid,
+ TransactionId cutoff_xid, TransactionId cutoff_multi);
extern bool heap_tuple_needs_freeze(HeapTupleHeader tuple, TransactionId cutoff_xid,
MultiXactId cutoff_multi, Buffer buf);
extern bool heap_tuple_needs_eventual_freeze(HeapTupleHeader tuple);
diff --git a/src/include/access/heapam_xlog.h b/src/include/access/heapam_xlog.h
index 81a6a395c4f..ad5f982978c 100644
--- a/src/include/access/heapam_xlog.h
+++ b/src/include/access/heapam_xlog.h
@@ -390,6 +390,8 @@ extern XLogRecPtr log_heap_freeze(Relation reln, Buffer buffer,
TransactionId cutoff_xid, xl_heap_freeze_tuple *tuples,
int ntuples);
extern bool heap_prepare_freeze_tuple(HeapTupleHeader tuple,
+ TransactionId relfrozenxid,
+ TransactionId relminmxid,
TransactionId cutoff_xid,
TransactionId cutoff_multi,
xl_heap_freeze_tuple *frz,
--
2.14.1.536.g6867272d5b.dirty
On 2017-11-13 19:03:41 -0800, Andres Freund wrote:
Hi,
On 2017-11-03 07:53:30 -0700, Andres Freund wrote:
Here's that patch. I've stared at this some, and Robert did too. Robert
mentioned that the commit message might need some polish and I'm not
100% sure about the error message texts yet.
I'm not yet convinced that the new elog in vacuumlazy can never trigger
- but I also don't think we want to actually freeze the tuple in that
case.
I'm fairly sure it could be triggered, therefore I've rewritten that.
I've played around quite some with the attached patch. So far, after
applying the second patch, neither VACUUM nor VACUUM FULL / CLUSTER make
the situation worse for already existing corruption. HOT pruning can
change the exact appearance of existing corruption a bit, but I don't
think it can make the corruption meaningfully worse. It's a bit
annoying and scary to add so many checks to backbranches but it kinda
seems required. The error message texts aren't perfect, but these are
"should never be hit" type elog()s so I'm not too worried about that.Please review!
Please note:
I'd forgotten to git add the change to isolation_schedule re-activating
the freeze-the-dead regression test.
- Andres
On 2017-11-13 19:03:41 -0800, Andres Freund wrote:
Staring at the vacuumlazy hunk I think I might have found a related bug:
heap_update_tuple() just copies the old xmax to the new tuple's xmax if
a multixact and still running. It does so without verifying liveness
of members. Isn't that buggy? Consider what happens if we have three
blocks: 1 has free space, two is being vacuumed and is locked, three is
full and has a tuple that's key share locked by a live tuple and is
updated by a dead xmax from before the xmin horizon. In that case afaict
the multi will be copied from the third page to the first one. Which is
quite bad, because vacuum already processed it, and we'll set
relfrozenxid accordingly. I hope I'm missing something here?
Trying to write a testcase for that now.
This indeed happens, but I can't quite figure out a way to write an
isolationtester test for this. The problem is that to have something
reproducible one has to make vacuum block on a cleanup lock, but that
currently doesn't register as waiting for the purpose of
isolationtester's logic.
Greetings,
Andres Freund
Hi,
On 2017-11-13 19:03:41 -0800, Andres Freund wrote:
Hi,
On 2017-11-03 07:53:30 -0700, Andres Freund wrote:
Here's that patch. I've stared at this some, and Robert did too. Robert
mentioned that the commit message might need some polish and I'm not
100% sure about the error message texts yet.
I'm not yet convinced that the new elog in vacuumlazy can never trigger
- but I also don't think we want to actually freeze the tuple in that
case.
I'm fairly sure it could be triggered, therefore I've rewritten that.
I've played around quite some with the attached patch. So far, after
applying the second patch, neither VACUUM nor VACUUM FULL / CLUSTER make
the situation worse for already existing corruption. HOT pruning can
change the exact appearance of existing corruption a bit, but I don't
think it can make the corruption meaningfully worse. It's a bit
annoying and scary to add so many checks to backbranches but it kinda
seems required. The error message texts aren't perfect, but these are
"should never be hit" type elog()s so I'm not too worried about that.Please review!
Ping? Alvaro, it'd be good to get some input here.
- Andres
On Tue, Nov 21, 2017 at 4:18 AM, Andres Freund <andres@anarazel.de> wrote:
On 2017-11-13 19:03:41 -0800, Andres Freund wrote:
On 2017-11-03 07:53:30 -0700, Andres Freund wrote:
Here's that patch. I've stared at this some, and Robert did too. Robert
mentioned that the commit message might need some polish and I'm not
100% sure about the error message texts yet.
I'm not yet convinced that the new elog in vacuumlazy can never trigger
- but I also don't think we want to actually freeze the tuple in that
case.
I'm fairly sure it could be triggered, therefore I've rewritten that.
I've played around quite some with the attached patch. So far, after
applying the second patch, neither VACUUM nor VACUUM FULL / CLUSTER make
the situation worse for already existing corruption. HOT pruning can
change the exact appearance of existing corruption a bit, but I don't
think it can make the corruption meaningfully worse. It's a bit
annoying and scary to add so many checks to backbranches but it kinda
seems required. The error message texts aren't perfect, but these are
"should never be hit" type elog()s so I'm not too worried about that.Please review!
Ping? Alvaro, it'd be good to get some input here.
Note that I will be able to jump on the ship after being released from
commit fest duties. This is likely a multi-day task for testing and
looking at it, and I am not the most knowledgeable human being with
this code.
--
Michael
On 2017-11-20 11:18:45 -0800, Andres Freund wrote:
Hi,
On 2017-11-13 19:03:41 -0800, Andres Freund wrote:
Hi,
On 2017-11-03 07:53:30 -0700, Andres Freund wrote:
Here's that patch. I've stared at this some, and Robert did too. Robert
mentioned that the commit message might need some polish and I'm not
100% sure about the error message texts yet.
I'm not yet convinced that the new elog in vacuumlazy can never trigger
- but I also don't think we want to actually freeze the tuple in that
case.
I'm fairly sure it could be triggered, therefore I've rewritten that.
I've played around quite some with the attached patch. So far, after
applying the second patch, neither VACUUM nor VACUUM FULL / CLUSTER make
the situation worse for already existing corruption. HOT pruning can
change the exact appearance of existing corruption a bit, but I don't
think it can make the corruption meaningfully worse. It's a bit
annoying and scary to add so many checks to backbranches but it kinda
seems required. The error message texts aren't perfect, but these are
"should never be hit" type elog()s so I'm not too worried about that.Please review!
Ping? Alvaro, it'd be good to get some input here.
Ping. I'm a bit surprised that a bugfix for a significant data
corruption issue has gotten no reviews at all.
Greetings,
Andres Freund
On Wed, Dec 6, 2017 at 5:03 PM, Andres Freund <andres@anarazel.de> wrote:
Ping. I'm a bit surprised that a bugfix for a significant data
corruption issue has gotten no reviews at all.
Note that I was planning to look at this problem today and tomorrow my
time, having been stuck with CF handling last week and a conference this
week. If you think that helps, I'll be happy to help to the extent of
what I can do.
--
Michael
I think you've done a stellar job of identifying what the actual problem
was. I like the new (simpler) coding of that portion of
HeapTupleSatisfiesVacuum.
freeze-the-dead is not listed in isolation_schedule; an easy fix.
I confirm that the test crashes with an assertion failure without the
code fix, and that it doesn't with it.
I think the comparison to OldestXmin should be reversed:
if (!TransactionIdPrecedes(xmax, OldestXmin))
return HEAPTUPLE_RECENTLY_DEAD;
return HEAPTUPLE_DEAD;
This way, an xmax that has exactly the OldestXmin value will return
RECENTLY_DEAD rather than DEAD, which seems reasonable to me (since
OldestXmin value itself is supposed to be still possibly visible to
somebody). Also, this way it is consistent with the other comparison to
OldestXmin at the bottom of the function. There is no reason for the
"else" or the extra braces.
Put together, I propose the attached delta for 0001.
Your commit message does a poor job of acknowledging prior work on
diagnosing the problem starting from Dan's initial test case and patch.
I haven't looked at your 0002 yet.
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachments:
tweaks.patch (text/plain; charset=us-ascii)
diff --git a/src/backend/utils/time/tqual.c b/src/backend/utils/time/tqual.c
index 8be2980116..aab03835d1 100644
--- a/src/backend/utils/time/tqual.c
+++ b/src/backend/utils/time/tqual.c
@@ -1324,25 +1324,23 @@ HeapTupleSatisfiesVacuum(HeapTuple htup, TransactionId OldestXmin,
else if (TransactionIdDidCommit(xmax))
{
/*
- * The multixact might still be running due to lockers. If the
- * updater is below the horizon we have to return DEAD regardless
- * - otherwise we could end up with a tuple where the updater has
- * to be removed due to the horizon, but is not pruned away. It's
- * not a problem to prune that tuple because all the lockers will
- * also be present in the newer tuple version.
+ * The multixact might still be running due to lockers. If the
+ * updater is below the xid horizon, we have to return DEAD
+ * regardless -- otherwise we could end up with a tuple where the
+ * updater has to be removed due to the horizon, but is not pruned
+ * away. It's not a problem to prune that tuple, because any
+ * remaining lockers will also be present in newer tuple versions.
*/
- if (TransactionIdPrecedes(xmax, OldestXmin))
- {
- return HEAPTUPLE_DEAD;
- }
- else
+ if (!TransactionIdPrecedes(xmax, OldestXmin))
return HEAPTUPLE_RECENTLY_DEAD;
+
+ return HEAPTUPLE_DEAD;
}
else if (!MultiXactIdIsRunning(HeapTupleHeaderGetRawXmax(tuple), false))
{
/*
* Not in Progress, Not Committed, so either Aborted or crashed.
- * Remove the Xmax.
+ * Mark the Xmax as invalid.
*/
SetHintBits(tuple, buffer, HEAP_XMAX_INVALID, InvalidTransactionId);
}
diff --git a/src/test/isolation/isolation_schedule b/src/test/isolation/isolation_schedule
index e41b9164cd..eb566ebb6c 100644
--- a/src/test/isolation/isolation_schedule
+++ b/src/test/isolation/isolation_schedule
@@ -44,6 +44,7 @@ test: update-locked-tuple
test: propagate-lock-delete
test: tuplelock-conflict
test: tuplelock-update
+test: freeze-the-dead
test: nowait
test: nowait-2
test: nowait-3
Andres Freund wrote:
I've played around quite some with the attached patch. So far, after
applying the second patch, neither VACUUM nor VACUUM FULL / CLUSTER make
the situation worse for already existing corruption. HOT pruning can
change the exact appearance of existing corruption a bit, but I don't
think it can make the corruption meaningfully worse. It's a bit
annoying and scary to add so many checks to backbranches but it kinda
seems required. The error message texts aren't perfect, but these are
"should never be hit" type elog()s so I'm not too worried about that.
Looking at 0002: I agree with the stuff being done here. I think a
couple of these checks could be moved one block outwards in terms of
scope; I don't see any reason why the check should not apply in that
case. I didn't catch any place missing additional checks.
Despite these being "shouldn't happen" conditions, I think we should
turn these up all the way to ereports with an errcode and all, and also
report the XIDs being complained about. No translation required,
though. Other than those changes and minor copy editing a commit
(attached), 0002 looks good to me.
I started thinking it'd be good to report block number whenever anything
happened while scanning the relation. The best way to go about this
seems to be to add an errcontext callback to lazy_scan_heap, so I attach
a WIP untested patch to add that. (I'm not proposing this for
back-patch for now, mostly because I don't have the time/energy to push
for it right now.)
It appears that you got all the places that seem to reasonably need
additional checks.
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachments:
0001-tweaks-for-0002.patch (text/plain; charset=us-ascii)
From 9ac665638c86460f0b767628203f8bf28df35e49 Mon Sep 17 00:00:00 2001
From: Alvaro Herrera <alvherre@alvh.no-ip.org>
Date: Wed, 6 Dec 2017 16:44:25 -0300
Subject: [PATCH 1/2] tweaks for 0002
---
src/backend/access/heap/heapam.c | 67 +++++++++++++++++++++++++++------------
src/backend/commands/vacuumlazy.c | 6 ++--
2 files changed, 50 insertions(+), 23 deletions(-)
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index eb93718baa..5c284a4c32 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -6387,17 +6387,23 @@ FreezeMultiXactId(MultiXactId multi, uint16 t_infomask,
else if (MultiXactIdPrecedes(multi, cutoff_multi))
{
if (MultiXactIdPrecedes(multi, relminmxid))
- elog(ERROR, "encountered multixact from before horizon");
+ ereport(ERROR,
+ (errcode(ERRCODE_DATA_CORRUPTED),
+ errmsg_internal("found multixact %u from before relminmxid %u",
+ multi, relminmxid)));
/*
- * This old multi cannot possibly have members still running. If it
- * was a locker only, it can be removed without any further
- * consideration; but if it contained an update, we might need to
- * preserve it.
+ * This old multi cannot possibly have members still running, but
+ * verify just in case. If it was a locker only, it can be removed
+ * without any further consideration; but if it contained an update, we
+ * might need to preserve it.
*/
if (MultiXactIdIsRunning(multi,
HEAP_XMAX_IS_LOCKED_ONLY(t_infomask)))
- elog(ERROR, "multixact from before cutoff found to be still running");
+ ereport(ERROR,
+ (errcode(ERRCODE_DATA_CORRUPTED),
+ errmsg_internal("multixact %u from before cutoff %u found to be still running",
+ multi, cutoff_multi)));
if (HEAP_XMAX_IS_LOCKED_ONLY(t_infomask))
{
@@ -6419,9 +6425,14 @@ FreezeMultiXactId(MultiXactId multi, uint16 t_infomask,
if (TransactionIdPrecedes(xid, cutoff_xid))
{
if (TransactionIdPrecedes(xid, relfrozenxid))
- elog(ERROR, "encountered xid from before horizon");
+ ereport(ERROR,
+ (errcode(ERRCODE_DATA_CORRUPTED),
+ errmsg_internal("found xid %u from before relfrozenxid %u",
+ xid, relfrozenxid)));
if (TransactionIdDidCommit(xid))
- elog(ERROR, "can't freeze committed xmax");
+ ereport(ERROR,
+ (errcode(ERRCODE_DATA_CORRUPTED),
+ errmsg_internal("can't freeze committed xmax %u", xid)));
*flags |= FRM_INVALIDATE_XMAX;
xid = InvalidTransactionId; /* not strictly necessary */
}
@@ -6495,7 +6506,10 @@ FreezeMultiXactId(MultiXactId multi, uint16 t_infomask,
Assert(TransactionIdIsValid(xid));
if (TransactionIdPrecedes(xid, relfrozenxid))
- elog(ERROR, "encountered xid from before horizon");
+ ereport(ERROR,
+ (errcode(ERRCODE_DATA_CORRUPTED),
+ errmsg_internal("found update xid %u from before relfrozenxid %u",
+ xid, relfrozenxid)));
/*
* It's an update; should we keep it? If the transaction is known
@@ -6533,7 +6547,6 @@ FreezeMultiXactId(MultiXactId multi, uint16 t_infomask,
*/
}
-
/*
* Since the tuple wasn't marked HEAPTUPLE_DEAD by vacuum, the
* update Xid cannot possibly be older than the xid cutoff. The
@@ -6542,7 +6555,10 @@ FreezeMultiXactId(MultiXactId multi, uint16 t_infomask,
*/
if (TransactionIdIsValid(update_xid) &&
TransactionIdPrecedes(update_xid, cutoff_xid))
- elog(ERROR, "encountered xid from before xid cutoff");
+ ereport(ERROR,
+ (errcode(ERRCODE_DATA_CORRUPTED),
+ errmsg_internal("found update xid %u from before xid cutoff %u",
+ update_xid, cutoff_xid)));
/*
* If we determined that it's an Xid corresponding to an update
@@ -6658,13 +6674,19 @@ heap_prepare_freeze_tuple(HeapTupleHeader tuple,
xid = HeapTupleHeaderGetXmin(tuple);
if (TransactionIdIsNormal(xid))
{
+ if (TransactionIdPrecedes(xid, relfrozenxid))
+ ereport(ERROR,
+ (errcode(ERRCODE_DATA_CORRUPTED),
+ errmsg_internal("found xid %u from before relfrozenxid %u",
+ xid, relfrozenxid)));
+
if (TransactionIdPrecedes(xid, cutoff_xid))
{
- if (TransactionIdPrecedes(xid, relfrozenxid))
- elog(ERROR, "encountered xid from before xid cutoff");
-
if (!TransactionIdDidCommit(xid))
- elog(ERROR, "uncommitted xmin needs to be frozen");
+ ereport(ERROR,
+ (errcode(ERRCODE_DATA_CORRUPTED),
+ errmsg_internal("uncommitted Xmin %u from before xid cutoff %u needs to be frozen",
+ xid, cutoff_xid)));
frz->t_infomask |= HEAP_XMIN_FROZEN;
changed = true;
@@ -6740,20 +6762,25 @@ heap_prepare_freeze_tuple(HeapTupleHeader tuple,
}
else if (TransactionIdIsNormal(xid))
{
+ if (TransactionIdPrecedes(xid, relfrozenxid))
+ ereport(ERROR,
+ (errcode(ERRCODE_DATA_CORRUPTED),
+ errmsg("found xid %u from before relfrozenxid %u",
+ xid, relfrozenxid)));
+
if (TransactionIdPrecedes(xid, cutoff_xid))
{
- if (TransactionIdPrecedes(xid, relfrozenxid))
- elog(ERROR, "encountered xid from before xid cutoff");
-
/*
* If we freeze xmax, make absolutely sure that it's not an XID
- * that is important (Note, a lock-only xmax can be removed
+ * that is important. (Note, a lock-only xmax can be removed
* independent of committedness, since a committed lock holder has
* released the lock).
*/
if (!(tuple->t_infomask & HEAP_XMAX_LOCK_ONLY) &&
TransactionIdDidCommit(xid))
- elog(ERROR, "can't freeze committed xmax");
+ ereport(ERROR,
+ (errcode(ERRCODE_DATA_CORRUPTED),
+ errmsg("can't freeze committed xmax %u", xid)));
freeze_xmax = true;
}
else
diff --git a/src/backend/commands/vacuumlazy.c b/src/backend/commands/vacuumlazy.c
index 7035a7a6f7..f95346acdb 100644
--- a/src/backend/commands/vacuumlazy.c
+++ b/src/backend/commands/vacuumlazy.c
@@ -1008,10 +1008,10 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
* to treat it like an indexed tuple.
*
* If this were to happen for a tuple that actually needed
- * to be deleted , we'd be in trouble, because it'd
+ * to be deleted, we'd be in trouble, because it'd
* possibly leave a tuple below the relation's xmin
- * horizon alive. But checks in
- * heap_prepare_freeze_tuple() would trigger in that case,
+ * horizon alive. heap_prepare_freeze_tuple() is prepared
+ * to detect that case and abort the transaction,
* preventing corruption.
*/
if (HeapTupleIsHotUpdated(&tuple) ||
--
2.11.0
0002-blkno-rel-context-info-for-lazy_scan_heap.patch (text/plain; charset=us-ascii)
From 5bdc203b9eeda5e43546e0b77dd83d06894fb76b Mon Sep 17 00:00:00 2001
From: Alvaro Herrera <alvherre@alvh.no-ip.org>
Date: Wed, 6 Dec 2017 17:02:17 -0300
Subject: [PATCH 2/2] blkno/rel context info for lazy_scan_heap
---
src/backend/commands/vacuumlazy.c | 41 +++++++++++++++++++++++++++++++++++++++
1 file changed, 41 insertions(+)
diff --git a/src/backend/commands/vacuumlazy.c b/src/backend/commands/vacuumlazy.c
index f95346acdb..332adc6550 100644
--- a/src/backend/commands/vacuumlazy.c
+++ b/src/backend/commands/vacuumlazy.c
@@ -131,6 +131,14 @@ typedef struct LVRelStats
bool lock_waiter_detected;
} LVRelStats;
+/*
+ * Struct to hold context info for lazy_scan_heap.
+ */
+typedef struct LazyScanHeapInfo
+{
+ Relation relation;
+ BlockNumber blkno;
+} LazyScanHeapInfo;
/* A few variables that don't seem worth passing around as parameters */
static int elevel = -1;
@@ -146,6 +154,7 @@ static BufferAccessStrategy vac_strategy;
static void lazy_scan_heap(Relation onerel, int options,
LVRelStats *vacrelstats, Relation *Irel, int nindexes,
bool aggressive);
+static void lazy_scan_heap_cb(void *arg);
static void lazy_vacuum_heap(Relation onerel, LVRelStats *vacrelstats);
static bool lazy_check_needs_freeze(Buffer buf, bool *hastup);
static void lazy_vacuum_index(Relation indrel,
@@ -476,6 +485,8 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
nkeep,
nunused;
IndexBulkDeleteResult **indstats;
+ ErrorContextCallback callback;
+ LazyScanHeapInfo info;
int i;
PGRUsage ru0;
Buffer vmbuffer = InvalidBuffer;
@@ -526,6 +537,15 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
initprog_val[2] = vacrelstats->max_dead_tuples;
pgstat_progress_update_multi_param(3, initprog_index, initprog_val);
+ /* set up our verbose error context callback */
+ info.relation = onerel;
+ info.blkno = InvalidBlockNumber;
+
+ callback.callback = lazy_scan_heap_cb;
+ callback.arg = &info;
+ callback.previous = error_context_stack;
+ error_context_stack = &callback;
+
/*
* Except when aggressive is set, we want to skip pages that are
* all-visible according to the visibility map, but only when we can skip
@@ -616,6 +636,9 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
bool has_dead_tuples;
TransactionId visibility_cutoff_xid = InvalidTransactionId;
+ /* update error callback info */
+ info.blkno = blkno;
+
/* see note above about forcing scanning of last page */
#define FORCE_CHECK_PAGE() \
(blkno == nblocks - 1 && should_attempt_truncation(vacrelstats))
@@ -1275,6 +1298,8 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
RecordPageWithFreeSpace(onerel, blkno, freespace);
}
+ error_context_stack = callback.previous;
+
/* report that everything is scanned and vacuumed */
pgstat_progress_update_param(PROGRESS_VACUUM_HEAP_BLKS_SCANNED, blkno);
@@ -1384,6 +1409,22 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
pfree(buf.data);
}
+/*
+ * lazy_scan_heap_cb
+ * Error context callback for lazy_scan_heap.
+ */
+static void
+lazy_scan_heap_cb(void *arg)
+{
+ LazyScanHeapInfo *info = (LazyScanHeapInfo *) arg;
+
+ if (info->blkno != InvalidBlockNumber)
+ errcontext("while scanning page %u of relation %s",
+ info->blkno, RelationGetRelationName(info->relation));
+ else
+ errcontext("while vacuuming relation %s",
+ RelationGetRelationName(info->relation));
+}
/*
* lazy_vacuum_heap() -- second pass over the heap
--
2.11.0
On Thu, Dec 7, 2017 at 1:21 AM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
Put together, I propose the attached delta for 0001.
I have been looking at Andres' 0001 and your tweaks here for some time
since yesterday...
I have also performed sanity checks using all the scripts that have
accumulated on my archives for this stuff. This looks solid to me. I
have not seen failures with broken hot chains, REINDEX, etc.
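Roughly the kind of checks I mean, sketched rather than the exact
scripts (tab_freeze is the table from the isolation spec, and the
bt_index_check() call assumes the amcheck extension):

REINDEX TABLE tab_freeze;                    -- fails if a HOT chain is broken
SELECT bt_index_check(indexrelid)
FROM pg_index
WHERE indrelid = 'tab_freeze'::regclass;
SELECT * FROM tab_freeze ORDER BY name, id;  -- no revived / duplicate rows expected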
This way, an xmax that has exactly the OldestXmin value will return
RECENTLY_DEAD rather than DEAD, which seems reasonable to me (since
OldestXmin value itself is supposed to be still possibly visible to
somebody). Also, this way it is consistent with the other comparison to
OldestXmin at the bottom of the function. There is no reason for the
"else" or the extra braces.
+1. It would be nice to add a comment in the patched portion
mentioning that the new code had better match what is at the bottom of
the function.
+ else if (!MultiXactIdIsRunning(HeapTupleHeaderGetRawXmax(tuple), false))
+ {
+ /*
+ * Not in Progress, Not Committed, so either Aborted or crashed.
+ * Mark the Xmax as invalid.
+ */
+ SetHintBits(tuple, buffer, HEAP_XMAX_INVALID, InvalidTransactionId);
}
- /*
- * Not in Progress, Not Committed, so either Aborted or crashed.
- * Remove the Xmax.
- */
- SetHintBits(tuple, buffer, HEAP_XMAX_INVALID, InvalidTransactionId);
return HEAPTUPLE_LIVE;
I would find it cleaner if the last "else if" were put into its own
separate if condition, and if it noted that for a multixact still running
this refers to an aborted updating transaction, so hint bits are not set.
Your commit message does a poor job of acknowledging prior work on
diagnosing the problem starting from Dan's initial test case and patch.
(Nit: I extracted an isolation test from Dan's test case, which Andres
then reduced to a subset of permutations. Peter G. also complained about
what is visibly the same bug we are discussing here, but without a test
case.)
--
Michael
On Thu, Dec 7, 2017 at 5:23 AM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
Looking at 0002: I agree with the stuff being done here.
The level of details you are providing with a proper error code is an
improvement over the first version proposed in my opinion.
I think a
couple of these checks could be moved one block outwards in terms of
scope; I don't see any reason why the check should not apply in that
case. I didn't catch any place missing additional checks.
In FreezeMultiXactId() wouldn't it be better to issue an error as well
for this assertion?
Assert(!TransactionIdPrecedes(members[i].xid, cutoff_xid));
Despite these being "shouldn't happen" conditions, I think we should
turn these up all the way to ereports with an errcode and all, and also
report the XIDs being complained about. No translation required,
though. Other than those changes and minor copy editing a commit
(attached), 0002 looks good to me.
+ if (!(tuple->t_infomask & HEAP_XMAX_LOCK_ONLY) &&
+ TransactionIdDidCommit(xid))
+ ereport(ERROR,
+ (errcode(ERRCODE_DATA_CORRUPTED),
+ errmsg("can't freeze committed xmax %u", xid)));
The usual wording used in errmsg is not the "can't" but "cannot".
+ ereport(ERROR,
+ (errcode(ERRCODE_DATA_CORRUPTED),
+ errmsg_internal("uncommitted Xmin %u from
before xid cutoff %u needs to be frozen",
+ xid, cutoff_xid)));
"Xmin" I have never seen, but "xmin" I did.
I started thinking it'd be good to report block number whenever anything
happened while scanning the relation. The best way to go about this
seems to be to add an errcontext callback to lazy_scan_heap, so I attach
a WIP untested patch to add that. (I'm not proposing this for
back-patch for now, mostly because I don't have the time/energy to push
for it right now.)
I would recommend to start a new thread and to add that patch to the
next commit fest as you would get more visibility and input from other
folks on -hackers. It looks like a good idea to me.
--
Michael
Hi,
On 2017-12-06 13:21:15 -0300, Alvaro Herrera wrote:
I think you've done a stellar job of identifying what the actual problem
was. I like the new (simpler) coding of that portion of
HeapTupleSatisfiesVacuum.
Thanks!
freeze-the-dead is not listed in isolation_schedule; an easy fix.
Yea, I'd sent an update about that, stupidly forgot to git amend the
commit...
I confirm that the test crashes with an assertion failure without the
code fix, and that it doesn't with it.
I think the comparison to OldestXmin should be reversed:
if (!TransactionIdPrecedes(xmax, OldestXmin))
return HEAPTUPLE_RECENTLY_DEAD;
return HEAPTUPLE_DEAD;
This way, an xmax that has exactly the OldestXmin value will return
RECENTLY_DEAD rather than DEAD, which seems reasonable to me (since
OldestXmin value itself is supposed to be still possibly visible to
somebody).
Yes, I think you're right. That's a bug.
Your commit message does a poor job of acknowledging prior work on
diagnosing the problem starting from Dan's initial test case and patch.
Yea, you're right. I was writing it with 14h of jetlag, apparently that
does something to your brain...
Greetings,
Andres Freund
Hi,
On 2017-12-06 17:23:55 -0300, Alvaro Herrera wrote:
I've played around quite some with the attached patch. So far, after
applying the second patch, neither VACUUM nor VACUUM FULL / CLUSTER make
the situation worse for already existing corruption. HOT pruning can
change the exact appearance of existing corruption a bit, but I don't
think it can make the corruption meaningfully worse. It's a bit
annoying and scary to add so many checks to backbranches but it kinda
seems required. The error message texts aren't perfect, but these are
"should never be hit" type elog()s so I'm not too worried about that.Looking at 0002: I agree with the stuff being done here. I think a
couple of these checks could be moved one block outerwards in term of
scope; I don't see any reason why the check should not apply in that
case. I didn't catch any place missing additional checks.
I think I largely put them into the inner blocks because they were
guaranteed to be reached in those cases (the horizon has to be before the
cutoff etc), and that way additional branches are avoided.
Despite these being "shouldn't happen" conditions, I think we should
turn these up all the way to ereports with an errcode and all, and also
report the XIDs being complained about. No translation required,
though. Other than those changes and minor copy editing a commit
(attached), 0002 looks good to me.
Hm, I don't really care one way or another. I do see that you used
errmsg() in some places, errmsg_internal() in others. Was that
intentional?
I started thinking it'd be good to report block number whenever anything
happened while scanning the relation. The best way to go about this
seems to be to add an errcontext callback to lazy_scan_heap, so I attach
a WIP untested patch to add that. (I'm not proposing this for
back-patch for now, mostly because I don't have the time/energy to push
for it right now.)
That seems like a good idea. There are some cases where that could
increase log spam noticeably (uninitialized blocks), but that seems
acceptable.
+static void
+lazy_scan_heap_cb(void *arg)
+{
+ LazyScanHeapInfo *info = (LazyScanHeapInfo *) arg;
+
+ if (info->blkno != InvalidBlockNumber)
+ errcontext("while scanning page %u of relation %s",
+ info->blkno, RelationGetRelationName(info->relation));
+ else
+ errcontext("while vacuuming relation %s",
+ RelationGetRelationName(info->relation));
+}
Hm, perhaps rephrase so both messages refer to vacuuming? E.g. just by
replacing scanning with vacuuming?
Greetings,
Andres Freund
On 2017-12-07 12:08:38 +0900, Michael Paquier wrote:
Your commit message does a poor job of acknowledging prior work on
diagnosing the problem starting from Dan's initial test case and patch.
(Nit: I extracted an isolation test from Dan's test case, which Andres
then reduced to a subset of permutations. Peter G. also complained about
what is visibly the same bug we are discussing here, but without a test
case.)
The previous versions of the test case didn't actually hit the real
issues, so I don't think that matters much.
Greetings,
Andres Freund
Hello,
Andres Freund wrote:
On 2017-12-06 17:23:55 -0300, Alvaro Herrera wrote:
I've played around quite some with the attached patch. So far, after
applying the second patch, neither VACUUM nor VACUUM FULL / CLUSTER make
the situation worse for already existing corruption. HOT pruning can
change the exact appearance of existing corruption a bit, but I don't
think it can make the corruption meaningfully worse. It's a bit
annoying and scary to add so many checks to backbranches but it kinda
seems required. The error message texts aren't perfect, but these are
"should never be hit" type elog()s so I'm not too worried about that.Looking at 0002: I agree with the stuff being done here. I think a
couple of these checks could be moved one block outerwards in term of
scope; I don't see any reason why the check should not apply in that
case. I didn't catch any place missing additional checks.I think I largely put them into the inner blocks because they were
guaranteed to be reached in those case (the horizon has to be before the
cutoff etc), and that way additional branches are avoided.
Hmm, it should be possible to call vacuum with a very low freeze_min_age
(which sets a very recent relfrozenxid), then shortly thereafter call it
with a large one, no? So it's not really guaranteed ...
Despite these being "shouldn't happen" conditions, I think we should
turn these up all the way to ereports with an errcode and all, and also
report the XIDs being complained about. No translation required,
though. Other than those changes and minor copy editing a commit
(attached), 0002 looks good to me.
Hm, I don't really care one way or another. I do see that you used
errmsg() in some places, errmsg_internal() in others. Was that
intentional?
Eh, no, my intention was to make these all errmsg_internal() to avoid
translation (serves no purpose here). Feel free to update the remaining
ones.
I started thinking it'd be good to report block number whenever anything
happened while scanning the relation. The best way to go about this
seems to be to add an errcontext callback to lazy_scan_heap, so I attach
a WIP untested patch to add that. (I'm not proposing this for
back-patch for now, mostly because I don't have the time/energy to push
for it right now.)
That seems like a good idea. There are some cases where that could
increase log spam noticeably (uninitialized blocks), but that seems
acceptable.
Yeah, I noticed that and I agree it seems ok.
+ if (info->blkno != InvalidBlockNumber)
+ errcontext("while scanning page %u of relation %s",
+ info->blkno, RelationGetRelationName(info->relation));
+ else
+ errcontext("while vacuuming relation %s",
+ RelationGetRelationName(info->relation));
Hm, perhaps rephrase so both messages refer to vacuuming? E.g. just by
replacing scanning with vacuuming?
Makes sense.
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 2017-12-07 17:41:56 -0300, Alvaro Herrera wrote:
Looking at 0002: I agree with the stuff being done here. I think a
couple of these checks could be moved one block outwards in terms of
scope; I don't see any reason why the check should not apply in that
case. I didn't catch any place missing additional checks.
I think I largely put them into the inner blocks because they were
guaranteed to be reached in those cases (the horizon has to be before the
cutoff etc), and that way additional branches are avoided.
Hmm, it should be possible to call vacuum with a very low freeze_min_age
(which sets a very recent relfrozenxid), then shortly thereafter call it
with a large one, no? So it's not really guaranteed ...
Fair point!
- Andres
On 2017-12-07 18:32:51 +0900, Michael Paquier wrote:
On Thu, Dec 7, 2017 at 5:23 AM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
Looking at 0002: I agree with the stuff being done here.
The level of details you are providing with a proper error code is an
improvement over the first version proposed in my opinion.
I think a couple of these checks could be moved one block outwards
in terms of scope; I don't see any reason why the check should not apply
in that case. I didn't catch any place missing additional checks.
In FreezeMultiXactId() wouldn't it be better to issue an error as well
for this assertion?
Assert(!TransactionIdPrecedes(members[i].xid, cutoff_xid));
I'm not really concerned that much about pure lockers, they don't cause
permanent data corruption...
Despite these being "shouldn't happen" conditions, I think we should
turn these up all the way to ereports with an errcode and all, and also
report the XIDs being complained about. No translation required,
though. Other than those changes and minor copy editing (commit attached),
0002 looks good to me.
If you want to go around doing that in some more places we can do so in
master only...
+       if (!(tuple->t_infomask & HEAP_XMAX_LOCK_ONLY) &&
+           TransactionIdDidCommit(xid))
+           ereport(ERROR,
+                   (errcode(ERRCODE_DATA_CORRUPTED),
+                    errmsg("can't freeze committed xmax %u", xid)));

The usual wording used in errmsg is not "can't" but "cannot".

+       ereport(ERROR,
+               (errcode(ERRCODE_DATA_CORRUPTED),
+                errmsg_internal("uncommitted Xmin %u from before xid cutoff %u needs to be frozen",
+                                xid, cutoff_xid)));

"Xmin" I have never seen, but "xmin" I have.
Changed...
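(For illustration, with those two wording tweaks the calls would presumably
end up roughly like this -- a sketch of the adjustment, not necessarily the
exact committed text:)

    if (!(tuple->t_infomask & HEAP_XMAX_LOCK_ONLY) &&
        TransactionIdDidCommit(xid))
        ereport(ERROR,
                (errcode(ERRCODE_DATA_CORRUPTED),
                 errmsg_internal("cannot freeze committed xmax %u", xid)));

    ereport(ERROR,
            (errcode(ERRCODE_DATA_CORRUPTED),
             errmsg_internal("uncommitted xmin %u from before xid cutoff %u needs to be frozen",
                             xid, cutoff_xid)));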
Greetings,
Andres Freund
Hi,
On 2017-11-13 19:03:41 -0800, Andres Freund wrote:
diff --git a/src/backend/access/heap/rewriteheap.c b/src/backend/access/heap/rewriteheap.c
index f93c194e182..7d163c91379 100644
--- a/src/backend/access/heap/rewriteheap.c
+++ b/src/backend/access/heap/rewriteheap.c
@@ -407,7 +407,10 @@ rewrite_heap_tuple(RewriteState state,
     * While we have our hands on the tuple, we may as well freeze any
     * eligible xmin or xmax, so that future VACUUM effort can be saved.
     */
-   heap_freeze_tuple(new_tuple->t_data, state->rs_freeze_xid,
+   heap_freeze_tuple(new_tuple->t_data,
+                     state->rs_old_rel->rd_rel->relfrozenxid,
+                     state->rs_old_rel->rd_rel->relminmxid,
+                     state->rs_freeze_xid,
                      state->rs_cutoff_multi);
Hm. So this requires backpatching the introduction of
RewriteStateData->rs_old_rel into 9.3, which in turn requires a new
argument to begin_heap_rewrite(). It originally was added in the
logical decoding commit (i.e. 9.4).
I'm fine with that, but it could theoretically cause issues for somebody
with an extension that calls begin_heap_rewrite() - which seems unlikely
and I couldn't find any that does so.
Does anybody have a problem with that?
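(Concretely, the API change under discussion: 9.3's begin_heap_rewrite()
would gain the old heap's Relation, as 9.4 and later already have. A sketch
only -- parameter names are approximated from the newer branches:)

    /* 9.3 as it stands (sketch) */
    extern RewriteState begin_heap_rewrite(Relation new_heap,
                                           TransactionId oldest_xmin,
                                           TransactionId freeze_xid,
                                           MultiXactId cutoff_multi,
                                           bool use_wal);

    /* after backpatching rs_old_rel, matching 9.4+ (sketch) */
    extern RewriteState begin_heap_rewrite(Relation old_heap, Relation new_heap,
                                           TransactionId oldest_xmin,
                                           TransactionId freeze_xid,
                                           MultiXactId cutoff_multi,
                                           bool use_wal);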
Regards,
Andres
On 2017-12-14 17:00:29 -0800, Andres Freund wrote:
Hi,
On 2017-11-13 19:03:41 -0800, Andres Freund wrote:
diff --git a/src/backend/access/heap/rewriteheap.c b/src/backend/access/heap/rewriteheap.c
index f93c194e182..7d163c91379 100644
--- a/src/backend/access/heap/rewriteheap.c
+++ b/src/backend/access/heap/rewriteheap.c
@@ -407,7 +407,10 @@ rewrite_heap_tuple(RewriteState state,
     * While we have our hands on the tuple, we may as well freeze any
     * eligible xmin or xmax, so that future VACUUM effort can be saved.
     */
-   heap_freeze_tuple(new_tuple->t_data, state->rs_freeze_xid,
+   heap_freeze_tuple(new_tuple->t_data,
+                     state->rs_old_rel->rd_rel->relfrozenxid,
+                     state->rs_old_rel->rd_rel->relminmxid,
+                     state->rs_freeze_xid,
                      state->rs_cutoff_multi);

Hm. So this requires backpatching the introduction of
RewriteStateData->rs_old_rel into 9.3, which in turn requires a new
argument to begin_heap_rewrite(). It originally was added in the
logical decoding commit (i.e. 9.4).

I'm fine with that, but it could theoretically cause issues for somebody
with an extension that calls begin_heap_rewrite() - which seems unlikely
and I couldn't find any that does so.

Does anybody have a problem with that?
Pushed this way. Moved some more relfrozenxid/relminmxid tests outside
of the cutoff changes, polished some error messages.
Alvaro, Michael, Peter, and everyone else I'd greatly appreciate if you
could have a look at the backported version, just about everything but
v10 had conflicts, some of them not insubstantial.
Greetings,
Andres Freund
On Fri, Dec 15, 2017 at 11:30 AM, Andres Freund <andres@anarazel.de> wrote:
Alvaro, Michael, Peter, and everyone else I'd greatly appreciate if you
could have a look at the backported version, just about everything but
v10 had conflicts, some of them not insubstantial.
I have gone through the backpatched versions for the fixes in tuple
pruning, running some tests on the way and those look good to me. I
have not taken the time to go through the ones changing the assertions
to ereport() though...
--
Michael
On Thu, Dec 14, 2017 at 6:30 PM, Andres Freund <andres@anarazel.de> wrote:
Pushed this way. Moved some more relfrozenxid/relminmxid tests outside
of the cutoff changes, polished some error messages.

Alvaro, Michael, Peter, and everyone else I'd greatly appreciate if you
could have a look at the backported version, just about everything but
v10 had conflicts, some of them not insubstantial.
I have one minor piece of feedback on the upgrading of assertions to
ereport()s with ERRCODE_DATA_CORRUPTION: It would be nice if you could
upgrade the raw elog() "can't happen" error within
IndexBuildHeapRangeScan() to be an ereport() with
ERRCODE_DATA_CORRUPTION. I'm referring to the "failed to find parent
tuple for heap-only tuple" error, which I think merits being a real
user-visible error, just like the relfrozenxid/relminmxid tests are
now. As you know, the enhanced amcheck will sometimes detect
corruption due to this bug by hitting that error.
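(In other words, the elog() in IndexBuildHeapRangeScan() would become
something along these lines -- a sketch of the requested change, with the
surrounding guard and variable names approximated rather than copied from
index.c:)

    if (!OffsetNumberIsValid(root_offsets[offnum - 1]))
        ereport(ERROR,
                (errcode(ERRCODE_DATA_CORRUPTED),
                 errmsg_internal("failed to find parent tuple for heap-only tuple at (%u,%u) in table \"%s\"",
                                 ItemPointerGetBlockNumber(&heapTuple->t_self),
                                 offnum,
                                 RelationGetRelationName(heapRelation))));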
It would be nice if we could tighten up the number of errcodes that
can be involved in an error that amcheck detects. I know that elog()
implicitly has an errcode, that could be included in the errcodes to
check when verification occurs in an automated fashion across a large
number of databases. However, this is a pretty esoteric point, and I'd
prefer to just try to limit it to ERRCODE_DATA_CORRUPTION and
ERRCODE_INDEX_CORRUPTED, insofar as that's practical. When we ran
amcheck on the Heroku fleet, back when I worked there, there were all
kinds of non-interesting errors that could occur that needed to be
filtered out. I want to try to make that process somewhat less painful.
--
Peter Geoghegan
On 2017-12-15 20:25:22 +0900, Michael Paquier wrote:
On Fri, Dec 15, 2017 at 11:30 AM, Andres Freund <andres@anarazel.de> wrote:
Alvaro, Michael, Peter, and everyone else I'd greatly appreciate if you
could have a look at the backported version, just about everything but
v10 had conflicts, some of them not insubstantial.

I have gone through the backpatched versions for the fixes in tuple
pruning, running some tests on the way and those look good to me.
Thanks.
I have not taken the time to go through the ones changing the
assertions to ereport() though...
Those were the ones with a lot of conflicts though - I'd temporarily broken
freezing for 9.3, but both review and testing caught it...
Greetings,
Andres Freund
On 2017-12-15 10:46:05 -0800, Peter Geoghegan wrote:
On Thu, Dec 14, 2017 at 6:30 PM, Andres Freund <andres@anarazel.de> wrote:
Pushed this way. Moved some more relfrozenxid/relminmxid tests outside
of the cutoff changes, polished some error messages.Alvaro, Michael, Peter, and everyone else I'd greatly appreciate if you
could have a look at the backported version, just about everything but
v10 had conflicts, some of them not insubstantial.

I have one minor piece of feedback on the upgrading of assertions to
ereport()s with ERRCODE_DATA_CORRUPTION: It would be nice if you could
upgrade the raw elog() "can't happen" error within
IndexBuildHeapRangeScan() to be an ereport() with
ERRCODE_DATA_CORRUPTION. I'm referring to the "failed to find parent
tuple for heap-only tuple" error, which I think merits being a real
user-visible error, just like the relfrozenxid/relminmxid tests are
now. As you know, the enhanced amcheck will sometimes detect
corruption due to this bug by hitting that error.
I'm not opposed to that; it just seems independent from this thread. Not
sure I really want to go around and backpatch such a change, that code
has changed a bit between branches. Happy to do so on master.
Greetings,
Andres Freund
On Fri, Dec 15, 2017 at 10:58 AM, Andres Freund <andres@anarazel.de> wrote:
I have one minor piece of feedback on the upgrading of assertions to
ereport()s with ERRCODE_DATA_CORRUPTION: It would be nice if you could
upgrade the raw elog() "can't happen" error within
IndexBuildHeapRangeScan() to be an ereport() with
ERRCODE_DATA_CORRUPTION. I'm referring to the "failed to find parent
tuple for heap-only tuple" error, which I think merits being a real
user-visible error, just like the relfrozenxid/relminmxid tests are
now. As you know, the enhanced amcheck will sometimes detect
corruption due to this bug by hitting that error.

I'm not opposed to that; it just seems independent from this thread. Not
sure I really want to go around and backpatch such a change, that code
has changed a bit between branches. Happy to do so on master.
The elog(), which was itself upgraded from a simple Assert by commit
d70cf811, appears in exactly the same form in 9.3+. Things did change
there, but they were kept in sync.
--
Peter Geoghegan
On Fri, Dec 15, 2017 at 11:03 AM, Peter Geoghegan <pg@bowt.ie> wrote:
The elog(), which was itself upgraded from a simple Assert by commit
d70cf811, appears in exactly the same form in 9.3+. Things did change
there, but they were kept in sync.
BTW, if you're going to do it, I would target the similar error within
validate_index_heapscan(), too. That was also added by d70cf811.
--
Peter Geoghegan
On 2017-12-15 11:15:47 -0800, Peter Geoghegan wrote:
On Fri, Dec 15, 2017 at 11:03 AM, Peter Geoghegan <pg@bowt.ie> wrote:
The elog(), which was itself upgraded from a simple Assert by commit
d70cf811, appears in exactly the same form in 9.3+. Things did change
there, but they were kept in sync.

BTW, if you're going to do it, I would target the similar error within
validate_index_heapscan(), too. That was also added by d70cf811.
Please send a patch for master on a *new* thread.
Greetings,
Andres Freund
I noticed that I'm committer for this patch in the commitfest, though I
don't remember setting that. Are you expecting me to commit it? I
thought you'd do it, but if you want me to assume the responsibility I
can do that.
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On December 23, 2017 9:26:22 PM GMT+01:00, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
I noticed that I'm committer for this patch in the commitfest, though I
don't remember setting that. Are you expecting me to commit it? I
thought you'd do it, but if you want me to assume the responsibility I
can do that.
I thought I pushed it to all branches? Do you see anything missing? Didn't know there's a CF entry...
Andres
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.
Andres Freund wrote:
On December 23, 2017 9:26:22 PM GMT+01:00, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
I noticed that I'm committer for this patch in the commitfest, though I
don't remember setting that. Are you expecting me to commit it? I
thought you'd do it, but if you want me to assume the responsibility I
can do that.

I thought I pushed it to all branches? Do you see anything missing?
Didn't know there's a CF entry...
Ah, no, looks correct to me in all branches. Updated the CF now too.
Thanks!
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Sat, Dec 23, 2017 at 05:26:22PM -0300, Alvaro Herrera wrote:
I noticed that I'm committer for this patch in the commitfest, though I
don't remember setting that. Are you expecting me to commit it? I
thought you'd do it, but if you want me to assume the responsibility I
can do that.
I tend to create CF entries for all patches that we need to track as bug
fixes; nothing gets lost this way. Alvaro, you have been marked as the
committer of this patch as you were the one to work on and push the first
version, which ended up being reverted.
--
Michael