Old row version in hot chain become visible after a freeze

Started by Wood, Dan over 8 years ago | 34 messages | pgsql-bugs
#1 Wood, Dan <hexpert@amazon.com>

I’ve found a bug in Postgres which causes old row versions to appear in a table. DEAD rows in a hot chain are getting frozen and becoming visible. I’ve repro’d this in both 9.6.1 and 11-devel.

The repro consists of two short psql scripts.

While the repro does an explicit VACUUM FREEZE, this bug also happens with autovacuum.

FILE: lock.sql
begin;
select id from t where id=3 for key share;
select pg_sleep(1);
update t set x=x+1 where id=3;
commit;
vacuum freeze t;
select ctid, xmin, xmax, id from t;

FILE: repro.sql
drop table t;
create table t (id int primary key, name char(3), x integer);

insert into t values (1, '111', 0);
insert into t values (3, '333', 0);

\! psql -p 5432 postgres -f lock.sql &
\! psql -p 5432 postgres -f lock.sql &
\! psql -p 5432 postgres -f lock.sql &
\! psql -p 5432 postgres -f lock.sql &
\! psql -p 5432 postgres -f lock.sql &

It’s about 50-50 whether any given run of repro.sql produces the correct output (first block below) or the corrupted output (second block):

ctid | xmin | xmax | id | x
-------+------+------+----+---
(0,1) | 984 | 0 | 1 | 0
(0,7) | 990 | 0 | 3 | 5
(2 rows)

ctid | xmin | xmax | id | x
-------+------+------+----+---
(0,1) | 984 | 0 | 1 | 0
(0,3) | 986 | 0 | 3 | 1 // This, and x = 2, 3 and 4 came back from the DEAD
(0,4) | 987 | 0 | 3 | 2
(0,5) | 988 | 0 | 3 | 3
(0,6) | 989 | 0 | 3 | 4
(0,7) | 990 | 0 | 3 | 5
(6 rows)

Root cause analysis: lazy_scan_heap() deletes DEAD tuples in heap_page_prune(). However, it is possible for concurrent commits/rollbacks to render a tuple DEAD by the time we reach the switch statement on HeapTupleSatisfiesVacuum(). If such a row IsHotUpdated or IsHeapOnly we can’t delete it below, and must allow a later prune to take care of it.

if (HeapTupleIsHotUpdated(&tuple) || HeapTupleIsHeapOnly(&tuple))
    nkeep += 1;        /* don't delete; let a later prune remove it */
else
    tupgone = true;    /* we can delete it below */

Because tupgone is false, we freeze the tuple instead of deleting it. Freezing a DEAD tuple makes it visible again. Here is a comment in heap_prepare_freeze_tuple():

* It is assumed that the caller has checked the tuple with
* HeapTupleSatisfiesVacuum() and determined that it is not HEAPTUPLE_DEAD
* (else we should be removing the tuple, not freezing it).

It is rare to encounter a DEAD tuple in this way during a freeze; more often RECENTLY_DEAD is returned. But we did see this with a more realistic long-running test, and I was able to create the simplified test case above. Skipping the freeze on a DEAD tuple that IsHotUpdated or IsHeapOnly does fix the problem. I’ve attached a patch with this fix.

Attachments:

update_freeze.patch (application/octet-stream, +14/-5)
#2 Peter Geoghegan <pg@bowt.ie>
In reply to: Wood, Dan (#1)
Re: Old row version in hot chain become visible after a freeze

Hi Dan,

Nice to hear from you.

On Thu, Aug 31, 2017 at 3:36 PM, Wood, Dan <hexpert@amazon.com> wrote:

Because tupgone is false we freeze instead of deleting. Freezing a DEAD
tuple makes it visible. Here is a comment in heap_prepare_freeze_tuple()

* It is assumed that the caller has checked the tuple with

* HeapTupleSatisfiesVacuum() and determined that it is not HEAPTUPLE_DEAD

Funny that there'd be another bug associated with
heap_prepare_freeze_tuple() so soon after the last one was discovered.
Are you aware of the other one [1]?

BTW, I just posted a patch to enhance amcheck, to allow it to verify
that an index has all the entries that it ought to [2]. Perhaps you'd
find it useful for this kind of thing. I welcome your feedback on
that.

[1]: https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=31b8db8e6c1fa4436116f4be5ca789f3a01b9ebf;hp=f1dae097f2945ffcb59a9f236843e0e0bbf0920d
[2]: https://commitfest.postgresql.org/14/1263/
--
Peter Geoghegan

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

#3 Wood, Dan <hexpert@amazon.com>
In reply to: Peter Geoghegan (#2)
Re: Old row version in hot chain become visible after a freeze

I was aware of the other one and, in fact, reverted the change just to make sure it wasn’t involved. A strange coincidence indeed.


#4 Peter Geoghegan <pg@bowt.ie>
In reply to: Wood, Dan (#3)
Re: Old row version in hot chain become visible after a freeze

On Thu, Aug 31, 2017 at 4:20 PM, Wood, Dan <hexpert@amazon.com> wrote:

I was aware of the other one and, in fact, reverted the change just to make sure it wasn’t Involved. A strange coincidence indeed.

FWIW, the enhanced amcheck does actually detect this:

postgres=# select bt_index_check('t_pkey', true);
ERROR: XX000: failed to find parent tuple for heap-only tuple at
(0,3) in table "t"
LOCATION: IndexBuildHeapRangeScan, index.c:2597

The good news is that this happens in a codepath that is also used by REINDEX:

postgres=# reindex index t_pkey ;
ERROR: XX000: failed to find parent tuple for heap-only tuple at
(0,3) in table "t"
LOCATION: IndexBuildHeapRangeScan, index.c:2597

Perhaps the lack of field reports of this error suggests that it's
fairly rare. Very hard to be sure about that, though.

--
Peter Geoghegan


#5 Peter Geoghegan <pg@bowt.ie>
In reply to: Wood, Dan (#1)
Re: Old row version in hot chain become visible after a freeze

On Thu, Aug 31, 2017 at 3:36 PM, Wood, Dan <hexpert@amazon.com> wrote:

I’ve found a bug in Postgres which causes old row versions to appear in a
table. DEAD rows in a hot chain are getting frozen and becoming visible.
I’ve repro’d this in both 9.6.1 and 11-devel.

I can confirm that this goes all the way back to 9.3, where "for key
share"/foreign key locks were introduced.

--
Peter Geoghegan


#6 Michael Paquier <michael@paquier.xyz>
In reply to: Peter Geoghegan (#5)
Re: Old row version in hot chain become visible after a freeze

On Fri, Sep 1, 2017 at 12:25 PM, Peter Geoghegan <pg@bowt.ie> wrote:

On Thu, Aug 31, 2017 at 3:36 PM, Wood, Dan <hexpert@amazon.com> wrote:

I’ve found a bug in Postgres which causes old row versions to appear in a
table. DEAD rows in a hot chain are getting frozen and becoming visible.
I’ve repro’d this in both 9.6.1 and 11-devel.

I can confirm that this goes all the way back to 9.3, where "for key
share"/foreign key locks were introduced.

Hm. That looks pretty bad to me... It is bad luck that this report
arrives just after the last round of minor releases went out; we would
have needed at least a couple of days and a couple of pairs of eyes to
come up with a correct patch. (I haven't looked at the proposed
solution and the attached patch yet, so no opinion yet.)
--
Michael


#7 Michael Paquier <michael@paquier.xyz>
In reply to: Michael Paquier (#6)
Re: Old row version in hot chain become visible after a freeze

On Fri, Sep 1, 2017 at 12:46 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:

On Fri, Sep 1, 2017 at 12:25 PM, Peter Geoghegan <pg@bowt.ie> wrote:

On Thu, Aug 31, 2017 at 3:36 PM, Wood, Dan <hexpert@amazon.com> wrote:

I’ve found a bug in Postgres which causes old row versions to appear in a
table. DEAD rows in a hot chain are getting frozen and becoming visible.
I’ve repro’d this in both 9.6.1 and 11-devel.

I can confirm that this goes all the way back to 9.3, where "for key
share"/foreign key locks were introduced.

Hm. That looks pretty bad to me... It is bad luck that this report
happens just after the last round of minor releases has been out, and
we would have needed at least a couple of days and a couple of pairs
of eyes to come up with a correct patch. (I haven't looked at the
proposed solution and the attached patch yet, so no opinions yet).

Using a build where assertions are enabled, the provided test case
blows up before even returning results:
frame #3: 0x0000000109e8df90
postgres`ExceptionalCondition(conditionName="!(!((update_xid) !=
((TransactionId) 0)) || !TransactionIdPrecedes(update_xid,
cutoff_xid))", errorType="FailedAssertion", fileName="heapam.c",
lineNumber=6533) + 128 at assert.c:54
frame #4: 0x00000001098fba6b postgres`FreezeMultiXactId(multi=34,
t_infomask=4930, cutoff_xid=897, cutoff_multi=30,
flags=0x00007fff56372fae) + 1179 at heapam.c:6532
frame #5: 0x00000001098fb2cd
postgres`heap_prepare_freeze_tuple(tuple=0x000000010b04d0b0,
cutoff_xid=897, cutoff_multi=30, frz=0x00007fae0d800040,
totally_frozen_p="\x01\x02") + 285 at heapam.c:6673
frame #6: 0x0000000109af356d
postgres`lazy_scan_heap(onerel=0x000000010a6fe630, options=9,
vacrelstats=0x00007fae0a0580a8, Irel=0x00007fae0a058688, nindexes=1,
aggressive='\x01') + 4413 at vacuumlazy.c:1081
frame #7: 0x0000000109af1ef2
postgres`lazy_vacuum_rel(onerel=0x000000010a6fe630, options=9,
params=0x00007fff56373608, bstrategy=0x00007fae0a047c40) + 626 at
vacuumlazy.c:253
(lldb) up 1
frame #4: 0x00000001098fba6b postgres`FreezeMultiXactId(multi=34,
t_infomask=4930, cutoff_xid=897, cutoff_multi=30,
flags=0x00007fff56372fae) + 1179 at heapam.c:6532
6529 * Since the tuple wasn't marked HEAPTUPLE_DEAD
by vacuum, the
6530 * update Xid cannot possibly be older than the
xid cutoff.
6531 */
-> 6532 Assert(!TransactionIdIsValid(update_xid) ||
6533 !TransactionIdPrecedes(update_xid, cutoff_xid));
6534
6535 /*
(lldb) p update_xid
(TransactionId) $0 = 896
(lldb) p cutoff_xid
(TransactionId) $1 = 897

As the portion doing vacuum freeze is the one blowing up, I think that
it is possible to extract an isolation test and include it in the
patch with repro.sql as the initialization phase.

                 /*
-                 * Each non-removable tuple must be checked to see if it needs
+                 * Each non-removable tuple that we do not keep must be checked to see if it needs
                  * freezing.  Note we already have exclusive buffer lock.
                  */
-                if (heap_prepare_freeze_tuple(tuple.t_data, FreezeLimit,
+                if (!tupkeep)
+                {
+                    if (heap_prepare_freeze_tuple(tuple.t_data, FreezeLimit,
                                               MultiXactCutoff, &frozen[nfrozen],
                                               &tuple_totally_frozen))
-                    frozen[nfrozen++].offset = offnum;
+                        frozen[nfrozen++].offset = offnum;

Wouldn't it be better to check if freeze is needed for the tuple
scanned with heap_tuple_needs_freeze() instead of inventing a new
custom condition? Please note as well that heap_tuple_needs_freeze()
does not need its last "buf" argument.
--
Michael


#8 Michael Paquier <michael@paquier.xyz>
In reply to: Michael Paquier (#7)
Re: Old row version in hot chain become visible after a freeze

On Tue, Sep 5, 2017 at 9:44 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:

As the portion doing vacuum freeze is the one blowing up, I think that
it is possible to extract an isolation test and include it in the
patch with repro.sql as the initialization phase.

Indeed, I have been able to reduce that to an isolation test which
crashes with a rate of 100% if assertions are enabled.

I have also spent a couple more hours looking at the proposed patch
and eye-balling the surrounding code, and my suggestion about
heap_tuple_needs_freeze() is proving to be wrong. So I am arriving at
the conclusion that your patch takes the right approach: skip freezing
entirely when the tuple is not to be removed yet, i.e. when vacuum
sees it as either DEAD or RECENTLY_DEAD.

I am adding as well in CC Álvaro and Andres, who reworked tuple
freezing in 3b97e682 a couple of years back for 9.3. Could you look
more at the bug and the attached patch? It does not fundamentally
change from what has been proposed first, just did the following:
- Updated set of comments that incorrectly assumed that only DEAD
tuples should have freeze skipped.
- Added isolation test for the failure.

An entry has been added in the next open CF to track this problem.
This should not fall into the cracks!
--
Michael

Attachments:

update_freeze_v2.patch (application/octet-stream, +162/-16)
#9 Alvaro Herrera <alvherre@2ndquadrant.com>
In reply to: Michael Paquier (#8)
Re: Old row version in hot chain become visible after a freeze

Michael Paquier wrote:

I have also spent a couple more hours looking at the proposed patch
and eye-balling the surrounding code, and my suggestion about
heap_tuple_needs_freeze() is proving to be wrong. So I am arriving at
the conclusion that your patch is taking the right approach to skip
freezing completely if the tuple is not to be removed yet if it is for
vacuum either DEAD or RECENTLY_DEAD.

I think in the "tupkeep" case we must not mark the page as frozen in VM;
in other words I think that block needs to look like this:

/* tupgone = false */
{
    bool        tuple_totally_frozen;

    num_tuples += 1;
    hastup = true;

    /*
     * Each non-removable tuple that we do not keep must be checked
     * to see if it needs freezing.  Note we already have exclusive
     * buffer lock.
     */
    if (!tupkeep &&
        heap_prepare_freeze_tuple(tuple.t_data, FreezeLimit,
                                  MultiXactCutoff,
                                  &frozen[nfrozen],
                                  &tuple_totally_frozen))
        frozen[nfrozen++].offset = offnum;

    if (tupkeep || !tuple_totally_frozen)
        all_frozen = false;
}

Otherwise, we risk marking the page as all-frozen, and it would be
skipped by vacuum. If we never come around to HOT-pruning the page, a
non-permanent xid (or a multixact? not sure that that can happen;
probably not) would linger unnoticed and cause a DoS condition later
("cannot open file pg_clog/1234") when the tuple header is read.

Now, it is possible that HOT pruning would fix the page promptly without
causing an actual DoS, but nonetheless it seems dangerous to leave
things like this.

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


#10 Alvaro Herrera <alvherre@2ndquadrant.com>
In reply to: Michael Paquier (#8)
Re: Old row version in hot chain become visible after a freeze

Another thing is that if you're going through heap_copy_data, the tuple
is only checked for DEAD, not RECENTLY_DEAD. Maybe there is no bug here
but I'm not sure yet.

I attach the patch as I have it now -- mostly comment tweaks, but also
the change to not mark the page as all-frozen when we skip freezing a
recently dead tuple. I also renamed the test case, because I think it
may be possible to reach the problem code without involving a multixact.
Haven't tried too hard though.

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachments:

0001-test-cases.patch (text/plain, +129/-1)
0002-Don-t-freeze-recently-dead-HOT-tuples.patch (text/plain, +32/-14)
#11 Wood, Dan <hexpert@amazon.com>
In reply to: Alvaro Herrera (#9)
Re: Old row version in hot chain become visible after a freeze

First new reply


#12 Wood, Dan <hexpert@amazon.com>
In reply to: Wood, Dan (#11)
Re: Old row version in hot chain become visible after a freeze

Sorry for my mistaken last mail


#13 Alvaro Herrera <alvherre@2ndquadrant.com>
In reply to: Michael Paquier (#7)
Re: Old row version in hot chain become visible after a freeze

Michael Paquier wrote:

frame #4: 0x00000001098fba6b postgres`FreezeMultiXactId(multi=34,
t_infomask=4930, cutoff_xid=897, cutoff_multi=30,
flags=0x00007fff56372fae) + 1179 at heapam.c:6532
6529 * Since the tuple wasn't marked HEAPTUPLE_DEAD by vacuum, the
6530 * update Xid cannot possibly be older than the xid cutoff.
6531 */
-> 6532 Assert(!TransactionIdIsValid(update_xid) ||
6533 !TransactionIdPrecedes(update_xid, cutoff_xid));
6534
6535 /*
(lldb) p update_xid
(TransactionId) $0 = 896
(lldb) p cutoff_xid
(TransactionId) $1 = 897

So, looking at this closely, I think there is a bigger problem here: if
we use any of the proposed patches or approaches, we risk leaving an old
Xid in a tuple (because heap_prepare_freeze_tuple() is skipped on a
tuple which remains in the heap with live Xids), followed by later
truncating pg_multixact / pg_clog, removing a segment that might be
critical to resolving that tuple's status later on.

I think doing the tuple freezing dance for any tuple we don't remove
from the heap is critical, not optional. Maybe a later HOT pruning
would save you from actual disaster, but I think it'd be a bad idea to
rely on that.

So ISTM we need a different solution than what's been proposed so far;
and I think that solution is different for each of the possible problem
cases, which are two: HEAPTUPLE_DEAD and HEAPTUPLE_RECENTLY_DEAD.

I think we can cover the HEAPTUPLE_DEAD case by just redoing the
heap_page_prune (just add a "goto" back to it if we detect the case).
It's a bit wasteful because we'd re-process all the prior tuples in the
loop below, but since it's supposed to be an infrequent condition, I
think it should be okay.

The RECENTLY_DEAD case is interesting. We know that the updater is
committed, and since the update XID is older than the cutoff XID, then
we know nobody else can see the tuple. So we can simply remove it ...
and we already have a mechanism for that: return FRM_MARK_COMMITTED in
FreezeMultiXactId. But the code already does that! The only thing we
need in order for this to be handled correctly is to remove the assert.

A case I didn't think about yet is RECENTLY_DEAD if the xmax is a plain
Xid (not a multi). My vague feeling is that there is no bug here.

I haven't actually tested this. Planning to look into it tomorrow.

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


#14 Michael Paquier <michael@paquier.xyz>
In reply to: Alvaro Herrera (#10)
Re: Old row version in hot chain become visible after a freeze

On Wed, Sep 6, 2017 at 7:40 PM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

Otherwise, we risk marking the page as all-frozen, and it would be
skipped by vacuum. If we never come around to HOT-pruning the page, a
non-permanent xid (or a multixact? not sure that that can happen;
probably not) would linger unnoticed and cause a DoS condition later
("cannot open file pg_clog/1234") when the tuple header is read.

Yes, you are definitely right here.

On Wed, Sep 6, 2017 at 10:27 PM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

Another thing is that if you're going through heap_copy_data, the tuple
is only checked for DEAD, not RECENTLY_DEAD. Maybe there is no bug here
but I'm not sure yet.

Not sure what you are referring to here. Perhaps heap_prune_chain()?
It seems to me that it does the right thing.

I attach the patch as I have it now -- mostly comment tweaks, but also
the change to not mark page as all-frozen when we skip freezing a
recently read tuple. I also renamed the test case, because I think it
may be possible to reach the problem code without involving a multixact.
Haven't tried too hard though.

+test: freeze-the-dead
This one made me smile.

+++ b/src/test/isolation/expected/freeze-the-dead.spec
@@ -0,0 +1,101 @@
+Parsed test spec with 2 sessions
The test fails; the expected/ files need to be named with .out, not .spec.
-- 
Michael


#15 Jeff Frost <jeff@pgexperts.com>
In reply to: Alvaro Herrera (#10)
Re: Old row version in hot chain become visible after a freeze

On Sep 6, 2017, at 6:27 AM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

Another thing is that if you're going through heap_copy_data, the tuple
is only checked for DEAD, not RECENTLY_DEAD. Maybe there is no bug here
but I'm not sure yet.

Any idea on how to identify affected rows on a running system?


#16 Jeff Frost <jeff@pgexperts.com>
In reply to: Jeff Frost (#15)
Re: Old row version in hot chain become visible after a freeze

On Sep 6, 2017, at 8:23 PM, Jeff Frost <jeff@pgexperts.com> wrote:

Any idea on how to identify affected rows on a running system?

I guess better questions would be:

Is it as simple as:

SELECT id, count(*) FROM foo GROUP BY id HAVING count(*) > 1;

Maybe also need to:

set enable_indexscan = 0;
set enable_indexonlyscan = 0;

before running the SELECT?

Is it possible to affect a DELETE or does it need to be a HOT updated row?


#17 Wong, Yi Wen <yiwong@amazon.com>
In reply to: Alvaro Herrera (#13)
Re: Old row version in hot chain become visible after a freeze

________________________________________
From: pgsql-bugs-owner@postgresql.org <pgsql-bugs-owner@postgresql.org> on behalf of Alvaro Herrera <alvherre@alvh.no-ip.org>
Sent: Wednesday, September 6, 2017 3:12 PM
To: Michael Paquier
Cc: Peter Geoghegan; Wood, Dan; pgsql-bugs@postgresql.org
Subject: Re: [BUGS] Old row version in hot chain become visible after a freeze

Michael Paquier wrote:

frame #4: 0x00000001098fba6b postgres`FreezeMultiXactId(multi=34,
t_infomask=4930, cutoff_xid=897, cutoff_multi=30,
flags=0x00007fff56372fae) + 1179 at heapam.c:6532
6529 * Since the tuple wasn't marked HEAPTUPLE_DEAD by vacuum, the
6530 * update Xid cannot possibly be older than the xid cutoff.
6531 */
-> 6532 Assert(!TransactionIdIsValid(update_xid) ||
6533 !TransactionIdPrecedes(update_xid, cutoff_xid));
6534
6535 /*
(lldb) p update_xid
(TransactionId) $0 = 896
(lldb) p cutoff_xid
(TransactionId) $1 = 897

So, looking at this closely, I think there is a bigger problem here: if
we use any of the proposed patches or approaches, we risk leaving an old
Xid in a tuple (because of skipping the heap_tuple_prepare_freeze on a
tuple which remains in the heap with live Xids), followed by later
truncating pg_multixact / pg_clog removing a segment that might be
critical to resolving this tuple status later on.

I think doing the tuple freezing dance for any tuple we don't remove
from the heap is critical, not optional. Maybe a later HOT pruning
would save you from actual disaster, but I think it'd be a bad idea to
rely on that.

There is actually another case where HEAPTUPLE_DEAD tuples may be kept and have
prepare_freeze skipped on them entirely.

lazy_record_dead_tuple() may fail to record a dead tuple for later
removal by lazy_vacuum_heap() if there is already a sufficiently large
number of dead tuples in the array:

/*
* The array shouldn't overflow under normal behavior, but perhaps it
* could if we are given a really small maintenance_work_mem. In that
* case, just forget the last few tuples (we'll get 'em next time).
*/
if (vacrelstats->num_dead_tuples < vacrelstats->max_dead_tuples)
...

It looks like we don't even check whether lazy_record_dead_tuple()
actually recorded anything in the tupgone case...

if (tupgone)
{
lazy_record_dead_tuple(vacrelstats, &(tuple.t_self));
HeapTupleHeaderAdvanceLatestRemovedXid(tuple.t_data,
&vacrelstats->latestRemovedXid);
tups_vacuumed += 1;
has_dead_tuples = true;
}

It's probably unsafe for any code calling the TruncateXXX routines to assume
that old Xids don't stick around after a vacuum by itself.

Thanks,
Yi Wen


#18 Alvaro Herrera <alvherre@2ndquadrant.com>
In reply to: Wong, Yi Wen (#17)
Re: Old row version in hot chain become visible after a freeze

Wong, Yi Wen wrote:

There is actually another case where HEAPTUPLE_DEAD tuples may be kept and have
prepare_freeze skipped on them entirely.

lazy_record_dead_tuple may fail to record the heap for later pruning
for lazy_vacuum_heap if there is already a sufficiently large number of dead tuples
in the array:

Hmm, ouch, good catch.

AFAICS this is a shouldn't-happen condition, since we bail out of the
loop pessimistically as soon as we would be over the array limit if the
next page were to be full of dead tuples (i.e., we never give the chance
for overflow to actually happen). So unless I misunderstand, this could
only fail if you give maint_work_mem smaller than necessary for one
pageful of dead tuples, which should be about 1800 bytes ...

If we wanted to be very sure about this we could add a test and perhaps
abort the vacuum, but I'm not sure it's worth the trouble.

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


#19 Tom Lane <tgl@sss.pgh.pa.us>
In reply to: Alvaro Herrera (#18)
Re: Old row version in hot chain become visible after a freeze

Alvaro Herrera <alvherre@2ndquadrant.com> writes:

Wong, Yi Wen wrote:

lazy_record_dead_tuple may fail to record the heap for later pruning
for lazy_vacuum_heap if there is already a sufficiently large number of dead tuples
in the array:

Hmm, ouch, good catch.

AFAICS this is a shouldn't-happen condition, since we bail out of the
loop pessimistically as soon as we would be over the array limit if the
next page were to be full of dead tuples (i.e., we never give the chance
for overflow to actually happen). So unless I misunderstand, this could
only fail if you give maint_work_mem smaller than necessary for one
pageful of dead tuples, which should be about 1800 bytes ...

If we wanted to be very sure about this we could add a test and perhaps
abort the vacuum, but I'm not sure it's worth the trouble.

I think if we're going to depend on that, we should change the logic from
"don't record tuple if no space" to "throw error on no space".

regards, tom lane


#20 Alvaro Herrera <alvherre@2ndquadrant.com>
In reply to: Michael Paquier (#14)
Re: Old row version in hot chain become visible after a freeze

Michael Paquier wrote:

On Wed, Sep 6, 2017 at 7:40 PM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

Otherwise, we risk marking the page as all-frozen, and it would be
skipped by vacuum. If we never come around to HOT-pruning the page, a
non-permanent xid (or a multixact? not sure that that can happen;
probably not) would linger unnoticed and cause a DoS condition later
("cannot open file pg_clog/1234") when the tuple header is read.

Yes, you are definitely right here.

On Wed, Sep 6, 2017 at 10:27 PM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

Another thing is that if you're going through heap_copy_data, the tuple
is only checked for DEAD, not RECENTLY_DEAD. Maybe there is no bug here
but I'm not sure yet.

Not sure to what you are referring here. Perhaps heap_prune_chain()?
It seems to me that it does the right thing.

I meant copy_heap_data, which also "freezes" tuples while copying the
heap. In the end, I concluded the fact that it runs with access
exclusive lock is sufficient protection. I was worried that in the
future we might improve that code and enable it to run without AEL, but
after looking at it a little more, it sounds like a big enough project
that this is going to be the least of its troubles.

Anyway, in order to test this, I tried running this one-liner (files
attached):
psql -f /tmp/repro.sql ; for i in `seq 1 50`; do psql --no-psqlrc -f /tmp/lock.sql & done

(I also threw in a small sleep between heap_page_prune and
HeapTupleSatisfiesVacuum while testing, just to widen the problem window
to hopefully make any remaining problems more evident.)

This turned up a few different failure modes, which I fixed until no
further problems arose. With the attached patch, I no longer see any
failures (assertion failures) or misbehavior (additional rows), in a few
dozen runs, which were easy to come by with the original code. The
resulting patch, which I like better than the previously proposed idea
of skipping the freeze, takes the approach of handling freeze correctly
for the cases where the tuple still exists after pruning.

I also tweaked lazy_record_dead_tuple to fail with ERROR if the tuple
cannot be recorded, as observed by Yi Wen. AFAICS that's not reachable
because of the way the array is allocated, so an elog(ERROR) is
sufficient.

I regret my inability to turn the oneliner into a committable test case,
but I think that's beyond what I can do for now.

FWIW running this against 9.2 I couldn't detect any problems, which I
suppose was expected. Changing the lock mode in the test file to FOR
SHARE I only see an enormous amount of deadlocks reported (which makes
me confident that doing multixact stuff in 9.3 was a good thing, despite
what bug chasers may think.)

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachments:

update_freeze_v3.patch (text/plain, +40/-23)
repro.sql (text/plain)
lock.sql (text/plain)
#21 Michael Paquier <michael@paquier.xyz>
In reply to: Alvaro Herrera (#20)
#22 Alvaro Herrera <alvherre@2ndquadrant.com>
In reply to: Michael Paquier (#21)
#23 Michael Paquier <michael@paquier.xyz>
In reply to: Alvaro Herrera (#22)
#24 Alvaro Herrera <alvherre@2ndquadrant.com>
In reply to: Michael Paquier (#23)
#25 Wong, Yi Wen <yiwong@amazon.com>
In reply to: Alvaro Herrera (#22)
#26 Wong, Yi Wen <yiwong@amazon.com>
In reply to: Wong, Yi Wen (#25)
#27 Wong, Yi Wen <yiwong@amazon.com>
In reply to: Wong, Yi Wen (#26)
#28 Alvaro Herrera <alvherre@2ndquadrant.com>
In reply to: Alvaro Herrera (#20)
#29 Michael Paquier <michael@paquier.xyz>
In reply to: Alvaro Herrera (#28)
#30 Alvaro Herrera <alvherre@2ndquadrant.com>
In reply to: Wong, Yi Wen (#27)
#31 Alvaro Herrera <alvherre@2ndquadrant.com>
In reply to: Alvaro Herrera (#30)
#32 Alvaro Herrera <alvherre@2ndquadrant.com>
In reply to: Jeff Frost (#16)
#33 Alvaro Herrera <alvherre@2ndquadrant.com>
In reply to: Alvaro Herrera (#31)
#34 Alvaro Herrera <alvherre@2ndquadrant.com>
In reply to: Alvaro Herrera (#31)