pgsql: Fix bufmgr so CHECKPOINT_END_OF_RECOVERY behaves as a shutdown c

Started by Simon Riggs, over 13 years ago, 6 messages, hackers
#1Simon Riggs
simon@2ndQuadrant.com

Fix bufmgr so CHECKPOINT_END_OF_RECOVERY behaves as a shutdown checkpoint.
Recovery code documents clearly that a shutdown checkpoint is executed at
end of recovery - a shutdown checkpoint WAL record is written but the buffer
manager had been altered to treat end of recovery as a normal checkpoint.
This bug exacerbates the bufmgr relpersistence bug.

Bug spotted by Andres Freund, patch by me.

Branch
------
master

Details
-------
http://git.postgresql.org/pg/commitdiff/64e196b6efbd58893a4381013a35c84b167b4856

Modified Files
--------------
src/backend/storage/buffer/bufmgr.c | 4 ++--
1 files changed, 2 insertions(+), 2 deletions(-)
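
The change described in the commit message can be sketched as follows. This is a minimal, self-contained illustration, not the committed diff: the flag values are placeholders, the helper names (`required_bits_before`, `required_bits_after`, `would_write`) are hypothetical, and it assumes the buffer manager selects buffers to write by comparing each buffer's flags against a mask.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative values only; the real definitions live in PostgreSQL's headers. */
#define CHECKPOINT_IS_SHUTDOWN      0x0001
#define CHECKPOINT_END_OF_RECOVERY  0x0002
#define BM_DIRTY      (1u << 0)
#define BM_PERMANENT  (1u << 1)

/* Assumed behaviour before the fix: only a shutdown checkpoint
 * drops the BM_PERMANENT requirement. */
static uint32_t
required_bits_before(int flags)
{
    uint32_t mask = BM_DIRTY;

    if (!(flags & CHECKPOINT_IS_SHUTDOWN))
        mask |= BM_PERMANENT;
    return mask;
}

/* Assumed behaviour after the fix: end of recovery is treated
 * like shutdown, so every dirty buffer qualifies. */
static uint32_t
required_bits_after(int flags)
{
    uint32_t mask = BM_DIRTY;

    if (!(flags & (CHECKPOINT_IS_SHUTDOWN | CHECKPOINT_END_OF_RECOVERY)))
        mask |= BM_PERMANENT;
    return mask;
}

/* A buffer is written only if it carries every bit in the mask. */
static bool
would_write(uint32_t buf_flags, uint32_t mask)
{
    return (buf_flags & mask) == mask;
}
```

Under these assumptions, a dirty buffer without BM_PERMANENT is skipped by an end-of-recovery checkpoint in the "before" variant and written in the "after" variant; ordinary checkpoints behave the same in both.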

#2Robert Haas
robertmhaas@gmail.com
In reply to: Simon Riggs (#1)
Re: [COMMITTERS] pgsql: Fix bufmgr so CHECKPOINT_END_OF_RECOVERY behaves as a shutdown c

On Sun, Sep 16, 2012 at 2:54 PM, Simon Riggs <simon@2ndquadrant.com> wrote:

> Fix bufmgr so CHECKPOINT_END_OF_RECOVERY behaves as a shutdown checkpoint.
> Recovery code documents clearly that a shutdown checkpoint is executed at
> end of recovery - a shutdown checkpoint WAL record is written but the buffer
> manager had been altered to treat end of recovery as a normal checkpoint.
> This bug exacerbates the bufmgr relpersistence bug.
>
> Bug spotted by Andres Freund, patch by me.

I am confused by this patch. It seems to me that the effect of this
patch is to force unlogged buffers to be written at end-of-recovery as
well as at shutdown. But, barring bugs elsewhere, there shouldn't be
any unlogged buffers in shared_buffers at end-of-recovery, so this
won't make any difference at all. Am I missing something?

Maybe what we should do is - if this is an end-of-recovery checkpoint
- *assert* that the BM_PERMANENT bit is set on every buffer we find.
That would provide a useful cross-check that we don't have a bug
similar to the one Jeff already fixed in any other code path.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#3Andres Freund
andres@anarazel.de
In reply to: Robert Haas (#2)
Re: [COMMITTERS] pgsql: Fix bufmgr so CHECKPOINT_END_OF_RECOVERY behaves as a shutdown c

On Monday, September 17, 2012 04:59:06 PM Robert Haas wrote:

> On Sun, Sep 16, 2012 at 2:54 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> > Fix bufmgr so CHECKPOINT_END_OF_RECOVERY behaves as a shutdown
> > checkpoint. Recovery code documents clearly that a shutdown checkpoint
> > is executed at end of recovery - a shutdown checkpoint WAL record is
> > written but the buffer manager had been altered to treat end of recovery
> > as a normal checkpoint. This bug exacerbates the bufmgr relpersistence
> > bug.
> >
> > Bug spotted by Andres Freund, patch by me.
>
> I am confused by this patch. It seems to me that the effect of this
> patch is to force unlogged buffers to be written at end-of-recovery as
> well as at shutdown. But, barring bugs elsewhere, there shouldn't be
> any unlogged buffers in shared_buffers at end-of-recovery, so this
> won't make any difference at all. Am I missing something?

I just noticed, while investigating the impact of the fakerelcache bug, that
contrary to what's claimed in several places, END_OF_RECOVERY checkpoints do
*not* behave the same way CHECKPOINT_IS_SHUTDOWN ones do, which doesn't seem
to be a good idea. E.g. the impact of this bug would have been smaller if they
were really treated the same. Unless I missed something, that's the only place
of relevance that treats them differently.
IMO, treating them differently in some remote places (2 calls away) is a good
way to introduce further bugs.

> Maybe what we should do is - if this is an end-of-recovery checkpoint
> - *assert* that the BM_PERMANENT bit is set on every buffer we find.
> That would provide a useful cross-check that we don't have a bug
> similar to the one Jeff already fixed in any other code path.

I haven't looked into the details, but can't a new unlogged relation be created
since the last checkpoint and thus have pages in s_b?

Greetings,

Andres

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#4Simon Riggs
simon@2ndQuadrant.com
In reply to: Robert Haas (#2)
Re: [COMMITTERS] pgsql: Fix bufmgr so CHECKPOINT_END_OF_RECOVERY behaves as a shutdown c

On 17 September 2012 15:59, Robert Haas <robertmhaas@gmail.com> wrote:

> On Sun, Sep 16, 2012 at 2:54 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> > Fix bufmgr so CHECKPOINT_END_OF_RECOVERY behaves as a shutdown checkpoint.
> > Recovery code documents clearly that a shutdown checkpoint is executed at
> > end of recovery - a shutdown checkpoint WAL record is written but the buffer
> > manager had been altered to treat end of recovery as a normal checkpoint.
> > This bug exacerbates the bufmgr relpersistence bug.
> >
> > Bug spotted by Andres Freund, patch by me.
>
> I am confused by this patch. It seems to me that the effect of this
> patch is to force unlogged buffers to be written at end-of-recovery as
> well as at shutdown. But, barring bugs elsewhere, there shouldn't be
> any unlogged buffers in shared_buffers at end-of-recovery, so this
> won't make any difference at all.

There shouldn't be, but this coding is the fail safe way.

> Am I missing something?

If you or others do, this will save us.

> Maybe what we should do is - if this is an end-of-recovery checkpoint
> - *assert* that the BM_PERMANENT bit is set on every buffer we find.
> That would provide a useful cross-check that we don't have a bug
> similar to the one Jeff already fixed in any other code path.

Safety net is needed there, not an Assert.
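
The difference between the two positions, an assertion versus a safety net, can be sketched roughly as below. This is an illustration under stated assumptions, not PostgreSQL code: the flag values are placeholders, both function names are hypothetical, and the standard C `assert` stands in for PostgreSQL's own assertion macro.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative values only; the real definitions live in PostgreSQL's headers. */
#define CHECKPOINT_END_OF_RECOVERY  0x0002
#define BM_DIRTY      (1u << 0)
#define BM_PERMANENT  (1u << 1)

/* Assertion approach: a dirty non-permanent buffer at end of recovery
 * is treated as a bug and aborts an assert-enabled build. When built
 * with NDEBUG the check compiles away and nothing is detected. */
static void
crosscheck_buffer(int checkpoint_flags, uint32_t buf_flags)
{
    if ((checkpoint_flags & CHECKPOINT_END_OF_RECOVERY) &&
        (buf_flags & BM_DIRTY))
        assert(buf_flags & BM_PERMANENT);
}

/* Safety-net approach: at end of recovery every dirty buffer is
 * written, whether or not it is marked permanent, in all builds. */
static bool
safety_net_writes(int checkpoint_flags, uint32_t buf_flags)
{
    if (!(buf_flags & BM_DIRTY))
        return false;
    if (checkpoint_flags & CHECKPOINT_END_OF_RECOVERY)
        return true;
    return (buf_flags & BM_PERMANENT) != 0;
}
```

The assertion reports an unexpected buffer only in builds where assertions are enabled, while the safety net silently flushes it in every build; the two are not mutually exclusive.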

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#5Robert Haas
robertmhaas@gmail.com
In reply to: Andres Freund (#3)
Re: [COMMITTERS] pgsql: Fix bufmgr so CHECKPOINT_END_OF_RECOVERY behaves as a shutdown c

On Mon, Sep 17, 2012 at 11:14 AM, Andres Freund <andres@2ndquadrant.com> wrote:

> I just noticed, while investigating the impact of the fakerelcache bug, that
> contrary to what's claimed in several places, END_OF_RECOVERY checkpoints do
> *not* behave the same way CHECKPOINT_IS_SHUTDOWN ones do, which doesn't seem
> to be a good idea. E.g. the impact of this bug would have been smaller if they
> were really treated the same. Unless I missed something, that's the only place
> of relevance that treats them differently.
> IMO, treating them differently in some remote places (2 calls away) is a good
> way to introduce further bugs.

OK, I can agree with that. As a backstop against future mistakes, it
makes some sense to me.

> > Maybe what we should do is - if this is an end-of-recovery checkpoint
> > - *assert* that the BM_PERMANENT bit is set on every buffer we find.
> > That would provide a useful cross-check that we don't have a bug
> > similar to the one Jeff already fixed in any other code path.
>
> I haven't looked into the details, but can't a new unlogged relation be created
> since the last checkpoint and thus have pages in s_b?

Data changes to unlogged relations are not WAL-logged, so there's no
reason for recovery to ever read them. Even if such a reason existed,
there wouldn't be anything to read, because the backing files are
unlinked before WAL replay begins.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#6Andres Freund
andres@anarazel.de
In reply to: Robert Haas (#5)
Re: [COMMITTERS] pgsql: Fix bufmgr so CHECKPOINT_END_OF_RECOVERY behaves as a shutdown c

On Tuesday, September 18, 2012 04:18:01 AM Robert Haas wrote:

> > > Maybe what we should do is - if this is an end-of-recovery checkpoint
> > > - *assert* that the BM_PERMANENT bit is set on every buffer we find.
> > > That would provide a useful cross-check that we don't have a bug
> > > similar to the one Jeff already fixed in any other code path.
> >
> > I haven't looked into the details, but can't a new unlogged relation be
> > created since the last checkpoint and thus have pages in s_b?
>
> Data changes to unlogged relations are not WAL-logged, so there's no
> reason for recovery to ever read them. Even if such a reason existed,
> there wouldn't be anything to read, because the backing files are
> unlinked before WAL replay begins.

Back then I thought that resetting the relation by copying the init fork might
use the buffer cache. It doesn't atm...
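
The sequence discussed in this exchange can be modelled with a toy state machine. Everything here is hypothetical (the `ToyUnloggedRel` struct and the three functions are not PostgreSQL APIs); it only assumes what the thread states: the main fork is unlinked before WAL replay begins, and the relation is later reset by copying its init fork.

```c
#include <stdbool.h>

/* Toy stand-in for the on-disk state of one unlogged relation. */
typedef struct ToyUnloggedRel
{
    bool has_init_fork;      /* template written when the relation was created */
    bool has_main_fork;      /* actual data file */
    bool main_matches_init;  /* true once main has been rebuilt from init */
} ToyUnloggedRel;

/* Before WAL replay: the data file is removed, so replay has nothing to read. */
static void
toy_cleanup_before_replay(ToyUnloggedRel *rel)
{
    if (rel->has_init_fork)
        rel->has_main_fork = false;
}

/* During replay, could any page of this relation be read into shared buffers? */
static bool
toy_replay_can_read(const ToyUnloggedRel *rel)
{
    return rel->has_main_fork;
}

/* After replay: main fork is recreated as a copy of the init fork.
 * Per the thread, this copy does not go through the buffer cache. */
static void
toy_reset_after_replay(ToyUnloggedRel *rel)
{
    if (rel->has_init_fork)
    {
        rel->has_main_fork = true;
        rel->main_matches_init = true;
    }
}
```

In this model no step between cleanup and reset can place a page of the unlogged relation in shared buffers, which is the basis for the claim that an end-of-recovery checkpoint should find none.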

Andres
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services