Incomplete freezing when truncating a relation during vacuum

Started by Andres Freund · over 12 years ago · 40 messages · pgsql-hackers
#1 Andres Freund
andres@anarazel.de

Hi,

A customer of ours reported that some columns were missing from
pg_attribute. Investigation showed that they were visible to sequential
but not index scans. Looking closer, the page with the missing
attributes is marked as all-visible, but the xids on the individual rows
were more than 2^31 in the past - so, effectively in the future and
invisible.
pg_class.relfrozenxid, pg_database.datfrozenxid looked perfectly normal,
not indicating any wraparound and were well past the xid in the
particular rows.
So evidently, something around freezing has gone wrong. We haven't
frozen every row, but updated relfrozenxid regardless.

A longer period of staring revealed a likely reason, in lazy_vacuum_rel:
/* Do the vacuuming */
lazy_scan_heap(onerel, vacrelstats, Irel, nindexes, scan_all);
...
if (whatever)
    lazy_truncate_heap(onerel, vacrelstats);
...
new_frozen_xid = FreezeLimit;
if (vacrelstats->scanned_pages < vacrelstats->rel_pages)
    new_frozen_xid = InvalidTransactionId;

but lazy_truncate_heap() does, after it's finished truncating:
    vacrelstats->rel_pages = new_rel_pages;

Which means, we might consider a partial vacuum as a vacuum that has
frozen all old rows if just enough pages have been truncated away.

This seems to be the case since
b4b6923e03f4d29636a94f6f4cc2f5cf6298b8c8. I suggest we go back to using
scan_all to determine whether we can set new_frozen_xid. That's a slight
pessimization when we scan a relation fully without explicitly scanning
it in its entirety, but given this isn't the first bug around
scanned_pages/rel_pages I'd rather go that way. The aforementioned
commit wasn't primarily concerned with that.
Alternatively we could just compute new_frozen_xid et al before the
lazy_truncate_heap.
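The ordering problem can be modeled outside the server. This is a minimal,
self-contained C sketch with hypothetical names (VacStats,
can_advance_relfrozenxid, truncate_heap are stand-ins, not PostgreSQL
source) showing how truncation shrinking rel_pages after the scan makes a
partial vacuum look like a full one:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the vacrelstats fields discussed above. */
typedef struct VacStats
{
    int rel_pages;      /* relation size in pages */
    int scanned_pages;  /* pages this vacuum actually visited */
} VacStats;

/* The check lazy_vacuum_rel() effectively performs: only a vacuum that
 * scanned every page may advance relfrozenxid. */
static bool
can_advance_relfrozenxid(const VacStats *s)
{
    return s->scanned_pages >= s->rel_pages;
}

/* What lazy_truncate_heap() does: shrink rel_pages to the new size. */
static void
truncate_heap(VacStats *s, int new_rel_pages)
{
    s->rel_pages = new_rel_pages;
}
```

With rel_pages = 100 and scanned_pages = 40, the check correctly refuses
before truncation; after truncating to 40 pages it wrongly passes. Either
proposed fix (computing new_frozen_xid before lazy_truncate_heap(), or
keying the decision off scan_all) breaks that dependency on the mutated
rel_pages.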

This is somewhat nasty :(.

I am not sure how we could fix up the resulting corruption. In some cases
we can check for the page-level all-visible bit and fix up the individual
xids. But it's not guaranteed, although likely, that the page-level
all-visible bit has been set...

Comments?

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#2 Robert Haas
robertmhaas@gmail.com
In reply to: Andres Freund (#1)
Re: Incomplete freezing when truncating a relation during vacuum

On Tue, Nov 26, 2013 at 7:32 AM, Andres Freund <andres@2ndquadrant.com> wrote:

This is somewhat nasty :(.

Yeah, that's bad. Real bad.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


#3 Andres Freund
andres@anarazel.de
In reply to: Andres Freund (#1)
Re: Incomplete freezing when truncating a relation during vacuum

On 2013-11-26 13:32:44 +0100, Andres Freund wrote:

A longer period of staring revealed a likely reason, in lazy_vacuum_rel:
/* Do the vacuuming */
lazy_scan_heap(onerel, vacrelstats, Irel, nindexes, scan_all);
...
if (whatever)
    lazy_truncate_heap(onerel, vacrelstats);
...
new_frozen_xid = FreezeLimit;
if (vacrelstats->scanned_pages < vacrelstats->rel_pages)
    new_frozen_xid = InvalidTransactionId;

but lazy_truncate_heap() does, after it's finished truncating:
    vacrelstats->rel_pages = new_rel_pages;

Which means, we might consider a partial vacuum as a vacuum that has
frozen all old rows if just enough pages have been truncated away.

repro.sql is a reproducer for the problem.

This seems to be the case since
b4b6923e03f4d29636a94f6f4cc2f5cf6298b8c8. I suggest we go back to using
scan_all to determine whether we can set new_frozen_xid. That's a slight
pessimization when we scan a relation fully without explicitly scanning
it in its entirety, but given this isn't the first bug around
scanned_pages/rel_pages I'd rather go that way. The aforementioned
commit wasn't primarily concerned with that.
Alternatively we could just compute new_frozen_xid et al before the
lazy_truncate_heap.

I've gone for the latter in this preliminary patch. Not increasing
relfrozenxid after an initial data load seems like a bit of a shame.

I wonder if we should just do scan_all || vacrelstats->scanned_pages <
vacrelstats->rel_pages?

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachments:

repro.sql (text/plain)
0001-Don-t-sometimes-incorrectly-increase-relfrozenxid-in.patch (text/x-patch, +18 -9)
#4 Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Andres Freund (#3)
Re: Incomplete freezing when truncating a relation during vacuum

On 11/27/13 01:21, Andres Freund wrote:

On 2013-11-26 13:32:44 +0100, Andres Freund wrote:

This seems to be the case since
b4b6923e03f4d29636a94f6f4cc2f5cf6298b8c8. I suggest we go back to using
scan_all to determine whether we can set new_frozen_xid. That's a slight
pessimization when we scan a relation fully without explicitly scanning
it in its entirety, but given this isn't the first bug around
scanned_pages/rel_pages I'd rather go that way. The aforementioned
commit wasn't primarily concerned with that.
Alternatively we could just compute new_frozen_xid et al before the
lazy_truncate_heap.

I've gone for the latter in this preliminary patch. Not increasing
relfrozenxid after an initial data load seems like a bit of a shame.

I wonder if we should just do scan_all || vacrelstats->scanned_pages <
vacrelstats->rel_pages?

Hmm, you did (scan_all || vacrelstats->scanned_pages <
vacrelstats->rel_pages) for relminmxid, and just
(vacrelstats->scanned_pages < vacrelstats->rel_pages) for relfrozenxid.
That was probably not what you meant to do, the thing you did for
relfrozenxid looks good to me.

Does the attached look correct to you?

- Heikki

Attachments:

0001-Don-t-sometimes-incorrectly-increase-relfrozenxid-in-2.patch (text/x-diff, +21 -10)
#5 Andres Freund
andres@anarazel.de
In reply to: Heikki Linnakangas (#4)
Re: Incomplete freezing when truncating a relation during vacuum

On 2013-11-27 11:01:55 +0200, Heikki Linnakangas wrote:

On 11/27/13 01:21, Andres Freund wrote:

On 2013-11-26 13:32:44 +0100, Andres Freund wrote:

This seems to be the case since
b4b6923e03f4d29636a94f6f4cc2f5cf6298b8c8. I suggest we go back to using
scan_all to determine whether we can set new_frozen_xid. That's a slight
pessimization when we scan a relation fully without explicitly scanning
it in its entirety, but given this isn't the first bug around
scanned_pages/rel_pages I'd rather go that way. The aforementioned
commit wasn't primarily concerned with that.
Alternatively we could just compute new_frozen_xid et al before the
lazy_truncate_heap.

I've gone for the latter in this preliminary patch. Not increasing
relfrozenxid after an initial data load seems like a bit of a shame.

I wonder if we should just do scan_all || vacrelstats->scanned_pages <
vacrelstats->rel_pages?

Hmm, you did (scan_all || vacrelstats->scanned_pages <
vacrelstats->rel_pages) for relminmxid, and just (vacrelstats->scanned_pages
< vacrelstats->rel_pages) for relfrozenxid. That was probably not what you
meant to do, the thing you did for relfrozenxid looks good to me.

I said it's a preliminary patch ;). Really, I wasn't sure which of the
two to go for.

Does the attached look correct to you?

Looks good.

I wonder if we need to integrate any mitigating logic? Currently the
corruption may only become apparent long after it occurred, which is
pretty bad. And instructing people to run a vacuum after the upgrade will
cause the corrupted data to be lost if the rows are already 2^31 xids in
the past. But integrating logic to fix things into heap_page_prune()
looks somewhat ugly as well.
Afaics the likelihood of the issue occurring on non-all-visible pages is
pretty low, since they'd need to be skipped due to lock contention
repeatedly.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


#6 Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Andres Freund (#5)
Re: Incomplete freezing when truncating a relation during vacuum

On 11/27/13 11:15, Andres Freund wrote:

On 2013-11-27 11:01:55 +0200, Heikki Linnakangas wrote:

On 11/27/13 01:21, Andres Freund wrote:

On 2013-11-26 13:32:44 +0100, Andres Freund wrote:

This seems to be the case since
b4b6923e03f4d29636a94f6f4cc2f5cf6298b8c8. I suggest we go back to using
scan_all to determine whether we can set new_frozen_xid. That's a slight
pessimization when we scan a relation fully without explicitly scanning
it in its entirety, but given this isn't the first bug around
scanned_pages/rel_pages I'd rather go that way. The aforementioned
commit wasn't primarily concerned with that.
Alternatively we could just compute new_frozen_xid et al before the
lazy_truncate_heap.

I've gone for the latter in this preliminary patch. Not increasing
relfrozenxid after an initial data load seems like a bit of a shame.

I wonder if we should just do scan_all || vacrelstats->scanned_pages <
vacrelstats->rel_pages?

Hmm, you did (scan_all || vacrelstats->scanned_pages <
vacrelstats->rel_pages) for relminmxid, and just (vacrelstats->scanned_pages
< vacrelstats->rel_pages) for relfrozenxid. That was probably not what you
meant to do, the thing you did for relfrozenxid looks good to me.

I said it's a preliminary patch ;). Really, I wasn't sure which of the
two to go for.

Does the attached look correct to you?

Looks good.

Ok, committed and backpatched that.

I wonder if we need to integrate any mitigating logic? Currently the
corruption may only become apparent long after it occurred, which is
pretty bad. And instructing people to run a vacuum after the upgrade will
cause the corrupted data to be lost if the rows are already 2^31 xids in
the past.

Ugh :-(. Running vacuum after the upgrade is the right thing to do to
prevent further damage, but you're right that it will cause any
already-wrapped-around data to be lost forever. Nasty.

But integrating logic to fix things into heap_page_prune() looks
somewhat ugly as well.

I think any mitigating logic we might add should go into vacuum. It
should be possible for a DBA to run a command, and after it's finished,
be confident that you're safe. That means vacuum.

Afaics the likelihood of the issue occurring on non-all-visible pages is
pretty low, since they'd need to be skipped due to lock contention
repeatedly.

Hmm. If a page has its visibility-map flag set, but contains a tuple
that appears to be dead because you've wrapped around, vacuum will give
a warning: "page containing dead tuples is marked as all-visible in
relation \"%s\" page %u". So I think if a manual VACUUM FREEZE passes
without giving that warning, that vacuum hasn't destroyed any data. So
we could advise users to take a physical backup of the data directory, and run
a manual VACUUM FREEZE on all databases. If it doesn't give a warning,
you're safe from that point onwards. If it does, you'll want to recover
from an older backup, or try to manually salvage just the lost rows from
the backup, and re-index. Ugly, but hopefully rare in practice.

Unfortunately that doesn't mean that you haven't already lost some data.
Wrap-around could've already happened, and vacuum might already have run
and removed some rows. You'll want to examine your logs and grep for
that warning.

- Heikki


#7 Andres Freund
andres@anarazel.de
In reply to: Heikki Linnakangas (#6)
Re: Incomplete freezing when truncating a relation during vacuum

On 2013-11-27 13:56:58 +0200, Heikki Linnakangas wrote:

Ok, committed and backpatched that.

Thanks.

I wonder if we need to integrate any mitigating logic? Currently the
corruption may only become apparent long after it occurred, which is
pretty bad. And instructing people to run a vacuum after the upgrade will
cause the corrupted data to be lost if the rows are already 2^31 xids in
the past.

Ugh :-(. Running vacuum after the upgrade is the right thing to do to
prevent further damage, but you're right that it will cause any
already-wrapped-around data to be lost forever. Nasty.

But integrating logic to fix things into heap_page_prune() looks
somewhat ugly as well.

I think any mitigating logic we might add should go into vacuum. It should
be possible for a DBA to run a command, and after it's finished, be
confident that you're safe. That means vacuum.

Well, heap_page_prune() is the first thing that's executed by
lazy_scan_heap(), that's why I was talking about it. So anything we do
needs to happen in there or before.

Afaics the likelihood of the issue occurring on non-all-visible pages is
pretty low, since they'd need to be skipped due to lock contention
repeatedly.

Hmm. If a page has its visibility-map flag set, but contains a tuple that
appears to be dead because you've wrapped around, vacuum will give a
warning: "page containing dead tuples is marked as all-visible in relation
\"%s\" page %u".

I don't think this warning is likely to be hit as the code stands -
heap_page_prune() et al. will have removed all dead tuples already,
so has_dead_tuples won't be set.

Independent from this, ISTM we should add a
else if (PageIsAllVisible(page) && all_visible)
to those checks.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


#8 Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Andres Freund (#7)
Re: Incomplete freezing when truncating a relation during vacuum

On 11/27/13 14:11, Andres Freund wrote:

On 2013-11-27 13:56:58 +0200, Heikki Linnakangas wrote:

Afaics the likelihood of the issue occurring on non-all-visible pages is
pretty low, since they'd need to be skipped due to lock contention
repeatedly.

Hmm. If a page has its visibility-map flag set, but contains a tuple that
appears to be dead because you've wrapped around, vacuum will give a
warning: "page containing dead tuples is marked as all-visible in relation
\"%s\" page %u".

I don't think this warning is likely to be hit as the code stands -
heap_page_prune() et al. will have removed all dead tuples already,
so has_dead_tuples won't be set.

It might be a good idea to add such a warning to heap_page_prune(). Or
also emit the warning in lazy_scan_heap() if heap_page_prune() returned > 0.

Independent from this, ISTM we should add a
else if (PageIsAllVisible(page) && all_visible)
to those checks.

Can you elaborate, where should that be added?

- Heikki


#9 Andres Freund
andres@anarazel.de
In reply to: Heikki Linnakangas (#8)
Re: Incomplete freezing when truncating a relation during vacuum

On 2013-11-27 14:45:25 +0200, Heikki Linnakangas wrote:

On 11/27/13 14:11, Andres Freund wrote:

I don't think this warning is likely to be hit as the code stands -
heap_page_prune() et al. will have removed all dead tuples already,
so has_dead_tuples won't be set.

It might be a good idea to add such a warning to heap_page_prune(). Or also
emit the warning in lazy_scan_heap() if heap_page_prune() returned > 0.

Independent from this, ISTM we should add a
else if (PageIsAllVisible(page) && all_visible)
to those checks.

Can you elaborate, where should that be added?

I was thinking of adding such a warning below the existing
elog(WARNING, "page containing dead tuples is marked as all-visible in relation \"%s\" page %u",..)
but we cannot warn about that case because GetOldestXmin() can go backwards...

I think it's probably sufficient to set has_dead_tuples = true in the
ItemIdIsDead() branch in lazy_scan_heap(). That should catch relevant
actions from heap_page_prune().

Besides not warning against deletions from heap_page_prune(), the
current warning logic is also buggy for tables without indexes...

/*
 * If there are no indexes then we can vacuum the page right now
 * instead of doing a second scan.
 */
if (nindexes == 0 &&
    vacrelstats->num_dead_tuples > 0)
{
    /* Remove tuples from heap */
    lazy_vacuum_page(onerel, blkno, buf, 0, vacrelstats, &vmbuffer);
    has_dead_tuples = false;

That happens before the
else if (PageIsAllVisible(page) && has_dead_tuples)
check.

With regard to fixing things up, ISTM the best bet is heap_prune_chain()
so far. That's executed by vacuum and by opportunistic pruning, and we
know we have the appropriate locks there. Looks relatively easy to fix
up things there. Not sure if there are any possible routes to WAL-log
this other than using log_newpage()?
I am really not sure what the best course of action is :(

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


#10 Noah Misch
noah@leadboat.com
In reply to: Andres Freund (#9)
Re: Incomplete freezing when truncating a relation during vacuum

How would you characterize the chances of this happening with default
*vacuum_freeze_*_age settings? Offhand, it seems you would need to encounter
this bug during each of ~10 generations of autovacuum_freeze_max_age before
the old rows actually become invisible.
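The "~10 generations" figure follows from dividing the 2^31-XID visibility
horizon by the default autovacuum_freeze_max_age of 200 million. A sketch
of that back-of-the-envelope arithmetic (the function name is hypothetical,
not anything in PostgreSQL):

```c
#include <assert.h>
#include <stdint.h>

/* Roughly how many anti-wraparound vacuum cycles fit into the 2^31-XID
 * horizon before an unfrozen xmin wraps into the apparent "future"?
 * freeze_max_age models autovacuum_freeze_max_age (default 200 million). */
static unsigned
generations_until_wraparound(uint32_t freeze_max_age)
{
    return (unsigned) (UINT32_C(0x80000000) / freeze_max_age); /* 2^31 / age */
}
```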

On Wed, Nov 27, 2013 at 02:14:53PM +0100, Andres Freund wrote:

With regard to fixing things up, ISTM the best bet is heap_prune_chain()
so far. That's executed by vacuum and by opportunistic pruning, and we
know we have the appropriate locks there. Looks relatively easy to fix
up things there. Not sure if there are any possible routes to WAL-log
this other than using log_newpage()?
I am really not sure what the best course of action is :(

Maximizing detection is valuable, and the prognosis for automated repair is
poor. I would want a way to extract tuples having xmin outside the range of
CLOG that are marked HEAP_XMIN_COMMITTED or appear on an all-visible page. At
first, I supposed we could offer a tool to blindly freeze such tuples.
However, there's no guarantee that they are in harmony with recent changes to
the database; transactions that wrongly considered those tuples invisible may
have made decisions incompatible with their existence. For example, reviving
such a tuple could violate a UNIQUE constraint if the user had already
replaced the missing row manually. A module that offers "SELECT * FROM
rows_wrongly_invisible('anytable')" would aid manual cleanup efforts.
freeze_if_wrongly_invisible(tid) would be useful, too.

Thanks,
nm

--
Noah Misch
EnterpriseDB http://www.enterprisedb.com


#11 Andres Freund
andres@anarazel.de
In reply to: Noah Misch (#10)
Re: Incomplete freezing when truncating a relation during vacuum

On 2013-11-27 14:53:27 -0500, Noah Misch wrote:

How would you characterize the chances of this happening with default
*vacuum_freeze_*_age settings? Offhand, it seems you would need to encounter
this bug during each of ~10 generations of autovacuum_freeze_max_age before
the old rows actually become invisible.

I think realistically, to actually trigger the bug, it needs to happen
quite a bit more often. But in some workloads it's pretty darn easy to
hit. E.g. if significant parts of the table are regularly deleted, lots,
if not most, of your vacuums will spuriously increase relfrozenxid above
the actual value. Each time only by a small amount, but due to that
small increase there never will be an actual full table vacuum since
freeze_table_age will never even remotely be reached.

The client that made me look into the issue noticed problems on
pg_attribute - presumably because of temporary table usage primarily
affecting the tail end of pg_attribute.

On Wed, Nov 27, 2013 at 02:14:53PM +0100, Andres Freund wrote:

With regard to fixing things up, ISTM the best bet is heap_prune_chain()
so far. That's executed by vacuum and by opportunistic pruning, and we
know we have the appropriate locks there. Looks relatively easy to fix
up things there. Not sure if there are any possible routes to WAL-log
this other than using log_newpage()?
I am really not sure what the best course of action is :(

Maximizing detection is valuable, and the prognosis for automated repair is
poor. I would want a way to extract tuples having xmin outside the range of
CLOG that are marked HEAP_XMIN_COMMITTED or appear on an all-visible
page.

I think the likelihood of the problem affecting !all-visible pages is
close to zero. Each vacuum will try to clean those, so they surely will
get vacuumed at some point. I think the only way that could happen is if
the ConditionalLockBufferForCleanup() fails in each vacuum. And that
seems a bit unlikely.

At first, I supposed we could offer a tool to blindly freeze such tuples.
However, there's no guarantee that they are in harmony with recent changes to
the database; transactions that wrongly considered those tuples invisible may
have made decisions incompatible with their existence. For example, reviving
such a tuple could violate a UNIQUE constraint if the user had already
replaced the missing row manually.

Good point, although since they are all on all-visible pages sequential
scans will currently already find those. It's primarily index scans that
won't. So it's not really reviving them...
The primary reason why I think it might be a good idea to "revive"
automatically is that an eventual full-table/freeze vacuum will
currently delete them, which seems bad.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


#12 Noah Misch
noah@leadboat.com
In reply to: Andres Freund (#11)
Re: Incomplete freezing when truncating a relation during vacuum

On Wed, Nov 27, 2013 at 10:43:05PM +0100, Andres Freund wrote:

On 2013-11-27 14:53:27 -0500, Noah Misch wrote:

How would you characterize the chances of this happening with default
*vacuum_freeze_*_age settings? Offhand, it seems you would need to encounter
this bug during each of ~10 generations of autovacuum_freeze_max_age before
the old rows actually become invisible.

I think realistically, to actually trigger the bug, it needs to happen
quite a bit more often. But in some workloads it's pretty darn easy to
hit. E.g. if significant parts of the table are regularly deleted, lots,
if not most, of your vacuums will spuriously increase relfrozenxid above
the actual value. Each time only by a small amount, but due to that
small increase there never will be an actual full table vacuum since
freeze_table_age will never even remotely be reached.

That makes sense.

Maximizing detection is valuable, and the prognosis for automated repair is
poor. I would want a way to extract tuples having xmin outside the range of
CLOG that are marked HEAP_XMIN_COMMITTED or appear on an all-visible
page.

I think the likelihood of the problem affecting !all-visible pages is
close to zero. Each vacuum will try to clean those, so they surely will
get vacuumed at some point. I think the only way that could happen is if
the ConditionalLockBufferForCleanup() fails in each vacuum. And that
seems a bit unlikely.

The page could have sat all-visible (through multiple XID epochs, let's say)
until a recent UPDATE.

At first, I supposed we could offer a tool to blindly freeze such tuples.
However, there's no guarantee that they are in harmony with recent changes to
the database; transactions that wrongly considered those tuples invisible may
have made decisions incompatible with their existence. For example, reviving
such a tuple could violate a UNIQUE constraint if the user had already
replaced the missing row manually.

Good point, although since they are all on all-visible pages sequential
scans will currently already find those. It's primarily index scans that
won't. So it's not really reviving them...

True. Since a dump/reload of the database would already get the duplicate key
violation, the revival is not making anything clearly worse. And if we hope
for manual repair, many DBAs just won't do that at all.

The primary reason why I think it might be a good idea to "revive"
automatically is that an eventual full-table/freeze vacuum will
currently delete them, which seems bad.

Will it? When the page became all-visible, the tuples were all hinted. They
will never be considered dead. Every 2B transactions, they will alternate
between live and not-yet-committed.
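The alternation every 2B transactions falls out of PostgreSQL's modulo-2^32
XID comparison. A simplified sketch in the spirit of
TransactionIdPrecedes() (ignoring the special bootstrap/frozen XIDs, so
not the actual server code): an XID is "older" iff the signed 32-bit
difference is negative, which means anything more than 2^31 in the past
compares as being in the future.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified modulo-2^32 XID ordering: true iff a is logically older
 * than b. An xmin that falls more than 2^31 behind the current XID
 * stops comparing as "in the past" and appears not-yet-committed. */
static bool
xid_precedes(uint32_t a, uint32_t b)
{
    return (int32_t) (a - b) < 0;
}
```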

--
Noah Misch
EnterpriseDB http://www.enterprisedb.com


#13 Andres Freund
andres@anarazel.de
In reply to: Noah Misch (#12)
Re: Incomplete freezing when truncating a relation during vacuum

On 2013-11-27 18:18:02 -0500, Noah Misch wrote:

I think the likelihood of the problem affecting !all-visible pages is
close to zero. Each vacuum will try to clean those, so they surely will
get vacuumed at some point. I think the only way that could happen is if
the ConditionalLockBufferForCleanup() fails in each vacuum. And that
seems a bit unlikely.

The page could have sat all-visible (through multiple XID epochs, let's say)
until a recent UPDATE.

Good point.

At first, I supposed we could offer a tool to blindly freeze such tuples.
However, there's no guarantee that they are in harmony with recent changes to
the database; transactions that wrongly considered those tuples invisible may
have made decisions incompatible with their existence. For example, reviving
such a tuple could violate a UNIQUE constraint if the user had already
replaced the missing row manually.

Good point, although since they are all on all-visible pages sequential
scans will currently already find those. It's primarily index scans that
won't. So it's not really reviving them...

True. Since a dump/reload of the database would already get the duplicate key
violation, the revival is not making anything clearly worse. And if we hope
for manual repair, many DBAs just won't do that at all.

Especially if it involves compiling C code...

The primary reason why I think it might be a good idea to "revive"
automatically is that an eventual full-table/freeze vacuum will
currently delete them, which seems bad.

Will it? When the page became all-visible, the tuples were all hinted. They
will never be considered dead. Every 2B transactions, they will alternate
between live and not-yet-committed.

Good point again. And pretty damn good luck.

Although 9.3+ multixact rows look like they could return _DEAD since
we'll do a TransactionIdDidAbort() (via GetUpdateXid()) and
TransactionIdDidCommit() in there and we don't set XMAX_COMMITTED hint
bits for XMAX_IS_MULTI rows. As an additional problem, once multixact.c
has pruned the old multis away we'll get errors from there on.
But that's less likely to affect many rows.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


#14 Noah Misch
noah@leadboat.com
In reply to: Andres Freund (#11)
Re: Incomplete freezing when truncating a relation during vacuum

On Wed, Nov 27, 2013 at 02:14:53PM +0100, Andres Freund wrote:

With regard to fixing things up, ISTM the best bet is heap_prune_chain()
so far. That's executed by vacuum and by opportunistic pruning, and we
know we have the appropriate locks there. Looks relatively easy to fix
up things there. Not sure if there are any possible routes to WAL-log
this other than using log_newpage()?
I am really not sure what the best course of action is :(

Based on subsequent thread discussion, the plan you outline sounds reasonable.
Here is a sketch of the specific semantics of that fixup. If a HEAPTUPLE_LIVE
tuple has XIDs older than the current relfrozenxid/relminmxid of its relation
or newer than ReadNewTransactionId()/ReadNextMultiXactId(), freeze those XIDs.
Do likewise for HEAPTUPLE_DELETE_IN_PROGRESS, ensuring a proper xmin if the
in-progress deleter aborts. Using log_newpage_buffer() seems fine; there's no
need to optimize performance there. (All the better if we can, with minimal
hacks, convince heap_freeze_tuple() itself to log the right changes.)

I am wary about the performance loss of doing these checks in every
heap_prune_chain() call, for all time. If it's measurable, can we shed
the overhead once corrections are done? Maybe bump the page layout version
and skip the checks for v5 pages? (Ugh.)

Time is tight to finalize this, but it would be best to get this into next
week's release. That way, the announcement, fix, and mitigating code
pertaining to this data loss bug all land in the same release. If necessary,
I think it would be worth delaying the release, or issuing a new release a
week or two later, to closely align those events. That being said, I'm
prepared to review a patch in this area over the weekend.

--
Noah Misch
EnterpriseDB http://www.enterprisedb.com


#15 Andres Freund
andres@anarazel.de
In reply to: Noah Misch (#14)
Re: Incomplete freezing when truncating a relation during vacuum

Hi Noah,

On 2013-11-30 00:40:06 -0500, Noah Misch wrote:

On Wed, Nov 27, 2013 at 02:14:53PM +0100, Andres Freund wrote:

With regard to fixing things up, ISTM the best bet is heap_prune_chain()
so far. That's executed by vacuum and by opportunistic pruning, and we
know we have the appropriate locks there. Looks relatively easy to fix
up things there. Not sure if there are any possible routes to WAL-log
this other than using log_newpage()?
I am really not sure what the best course of action is :(

Based on subsequent thread discussion, the plan you outline sounds reasonable.
Here is a sketch of the specific semantics of that fixup. If a HEAPTUPLE_LIVE
tuple has XIDs older than the current relfrozenxid/relminmxid of its relation
or newer than ReadNewTransactionId()/ReadNextMultiXactId(), freeze those XIDs.
Do likewise for HEAPTUPLE_DELETE_IN_PROGRESS, ensuring a proper xmin if the
in-progress deleter aborts. Using log_newpage_buffer() seems fine; there's no
need to optimize performance there.

We'd need to decide what to do with xmax values; they'd likely need to
be treated differently.

The problem with log_newpage_buffer() is that we'd quite possibly issue
one such call per item on a page. And that might become quite
expensive. Logging ~1.5MB per 8k page in the worst case sounds a bit
scary.

(All the better if we can, with minimal
hacks, convince heap_freeze_tuple() itself to log the right changes.)

That likely comes too late - we've already pruned the page and might have
made wrong decisions there. Also, heap_freeze_tuple() is run on both the
primary and standbys.
I think our xl_heap_freeze format, which relies on running
heap_freeze_tuple() during recovery, is a terrible idea, but we can't
change that right now.

Time is tight to finalize this, but it would be best to get this into next
week's release. That way, the announcement, fix, and mitigating code
pertaining to this data loss bug all land in the same release. If necessary,
I think it would be worth delaying the release, or issuing a new release a
week or two later, to closely align those events. That being said, I'm
prepared to review a patch in this area over the weekend.

I don't think I currently have the energy/brainpower/time to develop
such a fix of a suitable quality by Monday. I've worked pretty hard on
trying to fix the host of multixact data corruption bugs over the last
few days, and developing a solution that I'd be happy to put into such
critical paths is certainly several days' worth of work.

I am not sure if it's a good idea to delay the release because of this,
there are so many other critical issues that that seems like a bad
tradeoff.

That said, if somebody else is taking the lead I am certainly willing to
help in detail with review and testing.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


#16Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#15)
Re: Incomplete freezing when truncating a relation during vacuum

Andres Freund <andres@2ndquadrant.com> writes:

I am not sure if it's a good idea to delay the release because of this,
there are so many other critical issues that that seems like a bad
tradeoff.

Indeed. We already said that this release was being done *now* because
of the replication bug, and I see no reason to change that. The more
last-minute stuff we try to cram in, the bigger the risk of (another)
mistake. We've already taken a credibility hit from introducing a new
bug into the last round of update releases; let's please not take a
risk of doing that again.

regards, tom lane


#17Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#16)
Re: Incomplete freezing when truncating a relation during vacuum

On 2013-11-30 11:18:09 -0500, Tom Lane wrote:

We've already taken a credibility hit from introducing a new
bug into the last round of update releases; let's please not take a
risk of doing that again.

On that front: I'd love for somebody else to look at the revised 9.3
freezing logic. It's way too complicated, and there are quite a few
subtleties about freezing that are hard to get right, as evidenced by
9.3.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


#18Noah Misch
noah@leadboat.com
In reply to: Andres Freund (#17)
Re: Incomplete freezing when truncating a relation during vacuum

On Sat, Nov 30, 2013 at 05:22:04PM +0100, Andres Freund wrote:

On that front: I'd love for somebody else to look at the revised 9.3
freezing logic.

Do you speak of the changes to xmax freezing arising from the FK lock
optimization?

--
Noah Misch
EnterpriseDB http://www.enterprisedb.com


#19Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#16)
Re: Incomplete freezing when truncating a relation during vacuum

On 2013-11-30 11:18:09 -0500, Tom Lane wrote:

Indeed. We already said that this release was being done *now* because
of the replication bug, and I see no reason to change that.

FWIW, I think the two other data corrupting bugs, "incomplete freezing
due to truncation" (all branches) and freezing overall (in 9.3), are at
least as bad because they take effect on the primary.
I'm not saying that because of my involvement, but because I think they
need to be presented at least as prominently in the release notes.
The bugs themselves are all fixed in the relevant branches, but I do
think we need to talk about how to detect and fix possible
corruption.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


#20Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#19)
Re: Incomplete freezing when truncating a relation during vacuum

Andres Freund <andres@2ndquadrant.com> writes:

FWIW, I think the two other data corrupting bugs, "incomplete freezing
due to truncation" (all branches) and freezing overall (in 9.3), are at
least as bad because they take effect on the primary.
I'm not saying that because of my involvement, but because I think they
need to be presented at least as prominently in the release notes.
The bugs themselves are all fixed in the relevant branches, but I do
think we need to talk about how to detect and fix possible
corruption.

I was planning to draft up the release notes today. Can you propose
text about the above?

regards, tom lane


#21Andres Freund
andres@anarazel.de
In reply to: Noah Misch (#18)
#22Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#20)
#23Noah Misch
noah@leadboat.com
In reply to: Andres Freund (#15)
#24Andres Freund
andres@anarazel.de
In reply to: Noah Misch (#23)
#25Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#22)
#26Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#20)
#27Andres Freund
andres@anarazel.de
In reply to: Noah Misch (#10)
#28Andres Freund
andres@anarazel.de
In reply to: Andres Freund (#27)
#29Noah Misch
noah@leadboat.com
In reply to: Andres Freund (#28)
#30Andres Freund
andres@anarazel.de
In reply to: Noah Misch (#29)
#31Noah Misch
noah@leadboat.com
In reply to: Andres Freund (#30)
#32Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#26)
#33Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#32)
#34Andres Freund
andres@anarazel.de
In reply to: Noah Misch (#31)
#35Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#33)
#36Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#26)
#37Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#36)
#38Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#35)
#39Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#37)
#40Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#39)