Should buffer of initialization fork have a BM_PERMANENT flag

Started by Wang Haoabout 9 years ago11 messageshackers
Jump to latest
#1Wang Hao
whberet@gmail.com

An unlogged table has an initialization fork. The initialization fork does
not have an BM_PERMANENT flag when get a buffer.
In checkpoint (not shutdown or end of recovery), it will not write to disk.
after a crash recovery, the page of initialization fork will not correctly,
then make the main fork not correctly too.

Here is an example for GIN index.

create unlogged table gin_test_tbl(i int4[]);
create index gin_test_idx on gin_test_tbl using gin (i);
checkpoint;

kill all the postgres process, and restart again.

vacuum gin_test_tbl; -- crash.

It seems have same problem in BRIN, GIN, GiST and HASH index which using
buffer for meta page initialize in ambuildempty function.

#2Michael Paquier
michael@paquier.xyz
In reply to: Wang Hao (#1)
Re: Should buffer of initialization fork have a BM_PERMANENT flag

(Adding Robert in CC.)

On Thu, Jan 26, 2017 at 4:34 AM, Wang Hao <whberet@gmail.com> wrote:

An unlogged table has an initialization fork. The initialization fork does
not have an BM_PERMANENT flag when get a buffer.
In checkpoint (not shutdown or end of recovery), it will not write to disk.
after a crash recovery, the page of initialization fork will not correctly,
then make the main fork not correctly too.

For init forks the flush need absolutely to happen, so that's really
not good. We ought to fix BufferAlloc() appropriately here.

Here is an example for GIN index.

create unlogged table gin_test_tbl(i int4[]);
create index gin_test_idx on gin_test_tbl using gin (i);
checkpoint;

kill all the postgres process, and restart again.

vacuum gin_test_tbl; -- crash.

It seems have same problem in BRIN, GIN, GiST and HASH index which using
buffer for meta page initialize in ambuildempty function.

Yeah, other index AMs deal directly with the sync of the page, that's
why there is no issue for them.

So the patch attached fixes the problem by changing BufferAlloc() in
such a way that initialization forks are permanently written to disk,
which is what you are suggesting. As a simple fix for back-branches
that's enough, though on HEAD I think that we should really rework the
empty() routines so as the write goes through shared buffers first,
that seems more solid than relying on the sgmr routines to do this
work. Robert, what do you think?
--
Michael

Attachments:

unlogged-flush-fix.patchapplication/octet-stream; name=unlogged-flush-fix.patchDownload+8-0
#3Michael Paquier
michael@paquier.xyz
In reply to: Michael Paquier (#2)
Re: Should buffer of initialization fork have a BM_PERMANENT flag

On Thu, Jan 26, 2017 at 9:14 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:

So the patch attached fixes the problem by changing BufferAlloc() in
such a way that initialization forks are permanently written to disk,
which is what you are suggesting. As a simple fix for back-branches
that's enough, though on HEAD I think that we should really rework the
empty() routines so as the write goes through shared buffers first,
that seems more solid than relying on the sgmr routines to do this
work. Robert, what do you think?

Attached is what I have in mind for HEAD. btree, gist, spgist and
bloom indexes are changed so as the init forks created go through the
shared buffers instead of having their empty() routines handle the
flush of the page created. This removes any kind of race conditions
between the checkpointer and the init fork creations, which is I think
a good thing.

Here are the tests I have done.
First running those commands to create all types of indexes.
create extension bloom;
create extension btree_gist;
create extension btree_gin;
create unlogged table foo (a int);
create index foo_bt on foo(a);
create index foo_bloom on foo using bloom(a);
create index foo_gin on foo using gin (a);
create index foo_gist on foo using gist (a);
create index foo_brin on foo using brin (a);
create unlogged table foo_geo (a box);
create index foo_spgist ON foo_geo using spgist(a);
checkpoint;

Then crash the server, restart it, and the following vacuums are able
to complete.
vacuum foo;
vacuum foo_geo;

I have as well created a CF entry for this set of patches:
https://commitfest.postgresql.org/13/971/

Thanks,
--
Michael

Attachments:

unlogged-flush-fix-head.patchapplication/x-patch; name=unlogged-flush-fix-head.patchDownload+80-86
#4Arthur Zakirov
a.zakirov@postgrespro.ru
In reply to: Michael Paquier (#3)
Re: Should buffer of initialization fork have a BM_PERMANENT flag

Hello,

I wanted to review the patch. But the patch is applied with errors. I've
rebased the local copy and have done review on it. I'm not sure is it
properly to send rebased patch by reviewer, so I haven't sent it to
avoid confuses.

On 29.01.2017 17:00, Michael Paquier wrote:

Attached is what I have in mind for HEAD. btree, gist, spgist and
bloom indexes are changed so as the init forks created go through the
shared buffers instead of having their empty() routines handle the
flush of the page created. This removes any kind of race conditions
between the checkpointer and the init fork creations, which is I think
a good thing.

I think this is good fixes. I've checked them. And in my opinion they
are correct.

The code also is good.

Here are the tests I have done.
First running those commands to create all types of indexes.
create extension bloom;
create extension btree_gist;
create extension btree_gin;
create unlogged table foo (a int);
create index foo_bt on foo(a);
create index foo_bloom on foo using bloom(a);
create index foo_gin on foo using gin (a);
create index foo_gist on foo using gist (a);
create index foo_brin on foo using brin (a);
create unlogged table foo_geo (a box);
create index foo_spgist ON foo_geo using spgist(a);
checkpoint;

Then crash the server, restart it, and the following vacuums are able
to complete.
vacuum foo;
vacuum foo_geo;

I've done this tests. Before the patch server crashes on vacuum command.
After applying the patch server doesn't crash on vacuum command.

I have run regression and TAP tests. They all passed without error.

I think the patch can be marked as "Ready for Committer" after rebase.

--
Artur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#5Michael Paquier
michael@paquier.xyz
In reply to: Arthur Zakirov (#4)
Re: Should buffer of initialization fork have a BM_PERMANENT flag

On Thu, Mar 9, 2017 at 10:25 PM, Artur Zakirov <a.zakirov@postgrespro.ru> wrote:

I think this is good fixes. I've checked them. And in my opinion they are
correct.

The code also is good.

Having something with conflicts is not nice, so attached is a rebased version.

I have run regression and TAP tests. They all passed without error.

I think the patch can be marked as "Ready for Committer" after rebase.

Thanks for the review.
--
Michael

Attachments:

unlogged-flush-fix-head-v2.patchapplication/octet-stream; name=unlogged-flush-fix-head-v2.patchDownload+80-86
#6Arthur Zakirov
a.zakirov@postgrespro.ru
In reply to: Michael Paquier (#5)
Re: Should buffer of initialization fork have a BM_PERMANENT flag

On 10.03.2017 04:00, Michael Paquier wrote:

On Thu, Mar 9, 2017 at 10:25 PM, Artur Zakirov <a.zakirov@postgrespro.ru> wrote:

I think this is good fixes. I've checked them. And in my opinion they are
correct.

The code also is good.

Having something with conflicts is not nice, so attached is a rebased version.

Thank you!

I've rerun regression and TAP tests. They all passed.

Also maybe it will be good to fix comments.

In buf_internals.h:

#define BM_PERMANENT (1U << 31) /* permanent relation (not
* unlogged) */

And in FlushBuffer():

/*
* Force XLOG flush up to buffer's LSN. This implements the basic WAL
* rule that log updates must hit disk before any of the data-file changes
* they describe do.
*
* However, this rule does not apply to unlogged relations, which will be
* lost after a crash anyway. Most unlogged relation pages do not bear

Because BM_PERMANENT is used for init forks of unlogged indexes now.

--
Artur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#7Michael Paquier
michael@paquier.xyz
In reply to: Arthur Zakirov (#6)
Re: Should buffer of initialization fork have a BM_PERMANENT flag

On Sat, Mar 11, 2017 at 12:03 AM, Artur Zakirov
<a.zakirov@postgrespro.ru> wrote:

Because BM_PERMANENT is used for init forks of unlogged indexes now.

Yes, indeed.
--
Michael

Attachments:

unlogged-flush-fix-head-v3.patchtext/x-patch; charset=US-ASCII; name=unlogged-flush-fix-head-v3.patchDownload+83-88
#8Robert Haas
robertmhaas@gmail.com
In reply to: Michael Paquier (#2)
Re: Should buffer of initialization fork have a BM_PERMANENT flag

On Wed, Jan 25, 2017 at 7:14 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:

(Adding Robert in CC.)

On Thu, Jan 26, 2017 at 4:34 AM, Wang Hao <whberet@gmail.com> wrote:

An unlogged table has an initialization fork. The initialization fork does
not have an BM_PERMANENT flag when get a buffer.
In checkpoint (not shutdown or end of recovery), it will not write to disk.
after a crash recovery, the page of initialization fork will not correctly,
then make the main fork not correctly too.

For init forks the flush need absolutely to happen, so that's really
not good. We ought to fix BufferAlloc() appropriately here.

I agree with that, but I propose the attached version instead. It
seems cleaner to have the entire test for setting BM_PERMANENT in one
place rather than splitting it up as you did.

I believe this sets a record for the longest-lived data corruption bug
in a commit made by me. Six years and change, woohoo. :-(

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachments:

unlogged-flush-fix-rmh.patchapplication/octet-stream; name=unlogged-flush-fix-rmh.patchDownload+6-1
#9Michael Paquier
michael@paquier.xyz
In reply to: Robert Haas (#8)
Re: Should buffer of initialization fork have a BM_PERMANENT flag

On Tue, Mar 14, 2017 at 4:46 AM, Robert Haas <robertmhaas@gmail.com> wrote:

On Wed, Jan 25, 2017 at 7:14 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:

(Adding Robert in CC.)

On Thu, Jan 26, 2017 at 4:34 AM, Wang Hao <whberet@gmail.com> wrote:

An unlogged table has an initialization fork. The initialization fork does
not have an BM_PERMANENT flag when get a buffer.
In checkpoint (not shutdown or end of recovery), it will not write to disk.
after a crash recovery, the page of initialization fork will not correctly,
then make the main fork not correctly too.

For init forks the flush need absolutely to happen, so that's really
not good. We ought to fix BufferAlloc() appropriately here.

I agree with that, but I propose the attached version instead. It
seems cleaner to have the entire test for setting BM_PERMANENT in one
place rather than splitting it up as you did.

Fine for me. You may want to update the comment of BM_PERMANENT in
buf_internals.h as Artur has mentioned upthread. For example by just
adding "and init forks".

I believe this sets a record for the longest-lived data corruption bug
in a commit made by me.

Really? I'll need to double-check the git history here.

Six years and change, woohoo. :-(

And that much for someone to report it.
--
Michael

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#10Robert Haas
robertmhaas@gmail.com
In reply to: Michael Paquier (#9)
Re: Should buffer of initialization fork have a BM_PERMANENT flag

On Mon, Mar 13, 2017 at 8:51 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:

I agree with that, but I propose the attached version instead. It
seems cleaner to have the entire test for setting BM_PERMANENT in one
place rather than splitting it up as you did.

Fine for me. You may want to update the comment of BM_PERMANENT in
buf_internals.h as Artur has mentioned upthread. For example by just
adding "and init forks".

OK, done, and back-patched all the way.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#11Michael Paquier
michael@paquier.xyz
In reply to: Robert Haas (#10)
Re: Should buffer of initialization fork have a BM_PERMANENT flag

On Wed, Mar 15, 2017 at 1:13 AM, Robert Haas <robertmhaas@gmail.com> wrote:

On Mon, Mar 13, 2017 at 8:51 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:

I agree with that, but I propose the attached version instead. It
seems cleaner to have the entire test for setting BM_PERMANENT in one
place rather than splitting it up as you did.

Fine for me. You may want to update the comment of BM_PERMANENT in
buf_internals.h as Artur has mentioned upthread. For example by just
adding "and init forks".

OK, done, and back-patched all the way.

Thanks.
--
Michael

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers