Bug in ginRedoRecompress that causes opaque data on page to be overrun

Started by R, Siva · over 7 years ago · 13 messages
#1 R, Siva
sivasubr@amazon.com
1 attachment(s)

Hi,

We recently encountered an issue where the opaque data flags on a GIN data leaf page were corrupted while replaying a GIN insert WAL record. Upon further examination of the redo code, we found a bug in the ginRedoRecompress code, which extracts the WAL information and updates the page.

Specifically, when a new segment is inserted in the middle of a page, a memmove operation is performed [1] at the current point in the page to make room for the new segment. If this segment insertion is followed by delete-segment actions that are yet to be processed, and the total data size is very close to GinDataPageMaxDataSize, then we may move the data portion beyond the boundary, corrupting the opaque data.
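
To illustrate the failure mode, here is a standalone C sketch with made-up sizes and layout (not actual GIN structures): the in-place memmove for an insert action writes past the data boundary, even though delete actions queued later in the same record would bring the total back under the limit.

#include <assert.h>
#include <string.h>

#define DATA_CAPACITY 32            /* stand-in for GinDataPageMaxDataSize */

int main(void)
{
    char page[DATA_CAPACITY + 8];   /* last 8 bytes play the opaque area */
    int  used = 30;                 /* data portion is nearly full */
    int  insert_at = 10;            /* an insert action lands mid-page */
    int  newsegsize = 6;            /* size of the new segment */

    memset(page, 'a', sizeof(page));

    /*
     * In-place replay makes room for the new segment before the pending
     * delete actions are processed.  This memmove writes bytes [16, 36),
     * and offsets 32..35 already belong to the "opaque" area.
     */
    memmove(page + insert_at + newsegsize, page + insert_at, used - insert_at);
    used += newsegsize;

    assert(used <= DATA_CAPACITY);  /* fires: 36 > 32 */
    return 0;
}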

One way of solving this problem is to perform the replay work in a scratch space and sanity-check the total size of the data portion before copying it back to the actual page. While this involves an additional memory allocation and extra memcpy operations, it is safer and similar to the 'do' code path, where we make a copy of all segments past the first modified segment before placing them back on the page [2].
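
A minimal sketch of that idea, again in C with made-up sizes and a hand-rolled action stream rather than real GIN structures (the attached patch is the actual implementation): actions are applied into the scratch buffer left to right, nothing is ever shifted on the page, and the size check happens before the page is touched.

#include <assert.h>
#include <string.h>

#define DATA_CAPACITY 32            /* stand-in for GinDataPageMaxDataSize */

int main(void)
{
    char page[DATA_CAPACITY];       /* data portion of the leaf page */
    char scratch[DATA_CAPACITY];    /* replay target */
    int  used = 30;                 /* bytes currently on the page */
    int  in = 0;                    /* read position on the page */
    int  out = 0;                   /* write position in scratch */

    memset(page, 'a', sizeof(page));

    /* An unmodified 10-byte segment is copied to scratch verbatim. */
    memcpy(scratch + out, page + in, 10);
    out += 10;
    in += 10;

    /* Insert: a 6-byte segment from the WAL record goes into scratch. */
    memset(scratch + out, 'N', 6);
    out += 6;

    /* Delete: an 8-byte old segment is simply skipped on the read side. */
    in += 8;

    /* Copy the remaining unmodified tail. */
    memcpy(scratch + out, page + in, used - in);
    out += used - in;

    /* Sanity-check the total size before the page is modified at all. */
    assert(out <= DATA_CAPACITY);   /* holds: 28 <= 32 */
    memcpy(page, scratch, out);
    return 0;
}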

I have attached a patch for that approach. Please let us know if you have any comments or feedback.
Thanks!

Best
Siva

References:
[1]: https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/backend/access/gin/ginxlog.c;h=7515f8bc167c2eafceced5d6ad5d74f7ec09e0a5;hb=refs/heads/REL9_6_STABLE#l278
[2]: https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/backend/access/gin/gindatapage.c;h=cd3b9dfb784b084dd27a37146a4909fa1109ee81;hb=refs/heads/REL9_6_STABLE#l1726

Attachments:

rewrite-ginRedoRecompress-scratch-space-processing-replay_v1.patch (application/octet-stream)
From 9f971e7802d041573024eede7e0072372ef1d6e6 Mon Sep 17 00:00:00 2001
From: Siva R <sivasubr@amazon.com>
Date: Tue, 4 Sep 2018 19:19:16 +0000
Subject: [PATCH] Rewrite ginRedoRecompress to use a scratch space for
 processing replay actions instead of modifying contents on the actual page to
 avoid accidental overwrites of opaque data.

---
 src/backend/access/gin/ginxlog.c | 180 +++++++++++++++++++++------------------
 1 file changed, 98 insertions(+), 82 deletions(-)

diff --git a/src/backend/access/gin/ginxlog.c b/src/backend/access/gin/ginxlog.c
index 7515f8b..7ea798f 100644
--- a/src/backend/access/gin/ginxlog.c
+++ b/src/backend/access/gin/ginxlog.c
@@ -138,17 +138,21 @@ ginRedoRecompress(Page page, ginxlogRecompressDataLeaf *data)
 {
 	int			actionno;
 	int			segno;
-	GinPostingList *oldseg;
+	GinPostingList *segiter;
+	Pointer		segmentcopybegin;
 	Pointer		segmentend;
 	char	   *walbuf;
 	int			totalsize;
+	Pointer		scratch;
+	Pointer		scratchbegin;
+	bool		modified;
 
 	/*
 	 * If the page is in pre-9.4 format, convert to new format first.
 	 */
 	if (!GinPageIsCompressed(page))
 	{
-		ItemPointer uncompressed = (ItemPointer) GinDataPageGetData(page);
+		ItemPointer uncompressed = (ItemPointer)GinDataPageGetData(page);
 		int			nuncompressed = GinPageGetOpaque(page)->maxoff;
 		int			npacked;
 
@@ -180,63 +184,77 @@ ginRedoRecompress(Page page, ginxlogRecompressDataLeaf *data)
 		GinPageGetOpaque(page)->maxoff = InvalidOffsetNumber;
 	}
 
-	oldseg = GinDataLeafPageGetPostingList(page);
-	segmentend = (Pointer) oldseg + GinDataLeafPageGetPostingListSize(page);
-	segno = 0;
+	scratch = palloc(GinDataPageMaxDataSize);
+
+	segiter = GinDataLeafPageGetPostingList(page);
+	segmentend = (Pointer)segiter + GinDataLeafPageGetPostingListSize(page);
+	segmentcopybegin = segmentend;
 
-	walbuf = ((char *) data) + sizeof(ginxlogRecompressDataLeaf);
+	segno = 0;
+	modified = false;
+	scratchbegin = scratch;
+	walbuf = ((char *)data) + sizeof(ginxlogRecompressDataLeaf);
 	for (actionno = 0; actionno < data->nactions; actionno++)
 	{
-		uint8		a_segno = *((uint8 *) (walbuf++));
-		uint8		a_action = *((uint8 *) (walbuf++));
+		uint8		a_segno = *((uint8 *)(walbuf++));
+		uint8		a_action = *((uint8 *)(walbuf++));
 		GinPostingList *newseg = NULL;
 		int			newsegsize = 0;
 		ItemPointerData *items = NULL;
 		uint16		nitems = 0;
-		ItemPointerData *olditems;
-		int			nolditems;
-		ItemPointerData *newitems;
-		int			nnewitems;
-		int			segsize;
-		Pointer		segptr;
-		int			szleft;
-
-		/* Extract all the information we need from the WAL record */
-		if (a_action == GIN_SEGMENT_INSERT ||
-			a_action == GIN_SEGMENT_REPLACE)
-		{
-			newseg = (GinPostingList *) walbuf;
-			newsegsize = SizeOfGinPostingList(newseg);
-			walbuf += SHORTALIGN(newsegsize);
-		}
-
-		if (a_action == GIN_SEGMENT_ADDITEMS)
-		{
-			memcpy(&nitems, walbuf, sizeof(uint16));
-			walbuf += sizeof(uint16);
-			items = (ItemPointerData *) walbuf;
-			walbuf += nitems * sizeof(ItemPointerData);
-		}
 
 		/* Skip to the segment that this action concerns */
 		Assert(segno <= a_segno);
 		while (segno < a_segno)
 		{
-			oldseg = GinNextPostingListSegment(oldseg);
+			if (modified)
+			{
+				int			cursegsize;
+
+				/* Copy out the segment to scratch space */
+				cursegsize = SizeOfGinPostingList(segiter);
+				Assert(scratch + cursegsize - scratchbegin <= GinDataPageMaxDataSize);
+				memcpy(scratch, segiter, cursegsize);
+				scratch += cursegsize;
+			}
+
+			/* Move to next segment in the page */
+			segiter = GinNextPostingListSegment(segiter);
 			segno++;
 		}
 
 		/*
-		 * ADDITEMS action is handled like REPLACE, but the new segment to
-		 * replace the old one is reconstructed using the old segment from
-		 * disk and the new items from the WAL record.
+		 * Encountered first segment with an action. Every subsequent segment
+		 * needs to be in scratch space.
+		 */
+		if (!modified)
+		{
+			modified = true;
+			segmentcopybegin = (Pointer)segiter;
+		}
+
+		/*
+		 * Positioned after the last existing segment. Only INSERTs expected
+		 * here.
 		 */
+		if ((Pointer)segiter == segmentend)
+			Assert(a_action == GIN_SEGMENT_INSERT);
+
+		/* Process the action on the segment */
 		if (a_action == GIN_SEGMENT_ADDITEMS)
 		{
+			ItemPointerData *olditems;
+			int			nolditems;
+			ItemPointerData *newitems;
+			int			nnewitems;
 			int			npacked;
 
-			olditems = ginPostingListDecode(oldseg, &nolditems);
+			memcpy(&nitems, walbuf, sizeof(uint16));
+			walbuf += sizeof(uint16);
+			items = (ItemPointerData *)walbuf;
+			walbuf += nitems * sizeof(ItemPointerData);
 
+			olditems = ginPostingListDecode(segiter, &nolditems);
 			newitems = ginMergeItemPointers(items, nitems,
 											olditems, nolditems,
 											&nnewitems);
@@ -247,61 +265,59 @@ ginRedoRecompress(Page page, ginxlogRecompressDataLeaf *data)
 			Assert(npacked == nnewitems);
 
 			newsegsize = SizeOfGinPostingList(newseg);
-			a_action = GIN_SEGMENT_REPLACE;
+			memcpy(scratch, newseg, newsegsize);
+			scratch += newsegsize;
 		}
+		else if (a_action == GIN_SEGMENT_INSERT ||
+				 a_action == GIN_SEGMENT_REPLACE)
+		{
+			newseg = (GinPostingList *)walbuf;
+			newsegsize = SizeOfGinPostingList(newseg);
+			walbuf += SHORTALIGN(SizeOfGinPostingList(newseg));
 
-		segptr = (Pointer) oldseg;
-		if (segptr != segmentend)
-			segsize = SizeOfGinPostingList(oldseg);
+			/* Copy the new/replaced segment into scratch space */
+			memcpy(scratch, newseg, newsegsize);
+			scratch += newsegsize;
+		}
+		else if (a_action == GIN_SEGMENT_DELETE)
+		{
+			/* No op */
+		}
 		else
 		{
-			/*
-			 * Positioned after the last existing segment. Only INSERTs
-			 * expected here.
-			 */
-			Assert(a_action == GIN_SEGMENT_INSERT);
-			segsize = 0;
+			elog(ERROR, "unexpected GIN leaf action: %u", a_action);
 		}
-		szleft = segmentend - segptr;
 
-		switch (a_action)
+		/*
+		 * Advance to next segment on original page if the current segment
+		 * action is not insert.
+		 */
+		if (a_action != GIN_SEGMENT_INSERT)
 		{
-			case GIN_SEGMENT_DELETE:
-				memmove(segptr, segptr + segsize, szleft - segsize);
-				segmentend -= segsize;
-
-				segno++;
-				break;
-
-			case GIN_SEGMENT_INSERT:
-				/* make room for the new segment */
-				memmove(segptr + newsegsize, segptr, szleft);
-				/* copy the new segment in place */
-				memcpy(segptr, newseg, newsegsize);
-				segmentend += newsegsize;
-				segptr += newsegsize;
-				break;
-
-			case GIN_SEGMENT_REPLACE:
-				/* shift the segments that follow */
-				memmove(segptr + newsegsize,
-						segptr + segsize,
-						szleft - segsize);
-				/* copy the replacement segment in place */
-				memcpy(segptr, newseg, newsegsize);
-				segmentend -= segsize;
-				segmentend += newsegsize;
-				segptr += newsegsize;
-				segno++;
-				break;
-
-			default:
-				elog(ERROR, "unexpected GIN leaf action: %u", a_action);
+			segiter = GinNextPostingListSegment(segiter);
+			segno++;
 		}
-		oldseg = (GinPostingList *) segptr;
 	}
 
-	totalsize = segmentend - (Pointer) GinDataLeafPageGetPostingList(page);
+	/* Assert that there was some modification performed */
+	Assert(modified);
+
+	/* Copy out remainder of the page into scratch */
+	if ((Pointer)segiter < segmentend)
+		memcpy(scratch, segiter, (segmentend - ((Pointer)segiter)));
+
+	/*
+	 * Check that the new total size is within the max data page size allowed.
+	 * New total size is sum of scratch space size consumed and portion of
+	 * page that was unmodified.
+	 */
+	totalsize = (scratch - scratchbegin) + (segmentcopybegin - (Pointer)GinDataLeafPageGetPostingList(page));
+	Assert(totalsize <= GinDataPageMaxDataSize);
+
+	/* Copy the items back onto the page */
+	memcpy(segmentcopybegin, scratchbegin, (scratch - scratchbegin));
+
+	/* Set size on the page */
 	GinDataPageSetDataSize(page, totalsize);
 }
 
-- 
2.7.3.AMZN

#2 Alexander Korotkov
a.korotkov@postgrespro.ru
In reply to: R, Siva (#1)
Re: Bug in ginRedoRecompress that causes opaque data on page to be overrun

Hi, Siva!

On Tue, Sep 4, 2018 at 11:01 PM R, Siva <sivasubr@amazon.com> wrote:

We recently encountered an issue where the opaque data flags on a GIN data leaf page were corrupted while replaying a GIN insert WAL record. Upon further examination of the redo code, we found a bug in the ginRedoRecompress code, which extracts the WAL information and updates the page.

Specifically, when a new segment is inserted in the middle of a page, a memmove operation is performed [1] at the current point in the page to make room for the new segment. If this segment insertion is followed by delete-segment actions that are yet to be processed, and the total data size is very close to GinDataPageMaxDataSize, then we may move the data portion beyond the boundary, corrupting the opaque data.

One way of solving this problem is to perform the replay work in a scratch space and sanity-check the total size of the data portion before copying it back to the actual page. While this involves an additional memory allocation and extra memcpy operations, it is safer and similar to the 'do' code path, where we make a copy of all segments past the first modified segment before placing them back on the page [2].

I have attached a patch for that approach. Please let us know if you have any comments or feedback.

Do you have a test scenario for reproduction of this issue? We need
it to ensure that the fix is correct.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

#3 Peter Geoghegan
pg@bowt.ie
In reply to: R, Siva (#1)
Re: Bug in ginRedoRecompress that causes opaque data on page to be overrun

On Tue, Sep 4, 2018 at 12:59 PM, R, Siva <sivasubr@amazon.com> wrote:

We recently encountered an issue where the opaque data flags on a GIN
data leaf page were corrupted while replaying a GIN insert WAL record.
Upon further examination of the redo code, we found a bug in the
ginRedoRecompress code, which extracts the WAL information and updates the page.

I wonder how you managed to detect it in the first place. Were you
using something like wal_consistency_checking=all with a custom
stress test?
stress-test?

--
Peter Geoghegan

#4 R, Siva
sivasubr@amazon.com
In reply to: Peter Geoghegan (#3)
Re: Bug in ginRedoRecompress that causes opaque data on page to be overrun

Hi Alexander!
On Tue, Sep 4, 2018 at 09:16 PM, Alexander Korotkov <a.korotkov@postgrespro.ru> wrote:

Do you have a test scenario for reproduction of this issue? We need
it to ensure that fix is correct.

Unfortunately, I do not have a way of reproducing this issue.
So far I have tried a workload consisting of inserts (of the
same indexed attribute value), batch deletes of rows, and
vacuum, interleaved with engine crashes/restarts.

Hi Peter!
On Tue, Sep 4, 2018 at 09:55 PM, Peter Geoghegan <pg@bowt.ie> wrote:

I wonder how you managed to detect it in the first place. Were you
using something like wal_consistency_checking=all, with a custom
stress-test?

We observed this corruption during stress testing and were
able to isolate the corrupted page and WAL record changes
leading up to the corruption using some internal diagnostic
tools.

Best
Siva

#5 Alexander Korotkov
a.korotkov@postgrespro.ru
In reply to: R, Siva (#4)
Re: Bug in ginRedoRecompress that causes opaque data on page to be overrun

On Wed, Sep 5, 2018 at 1:45 AM R, Siva <sivasubr@amazon.com> wrote:

On Tue, Sep 4, 2018 at 09:16 PM, Alexander Korotkov <a.korotkov@postgrespro.ru> wrote:

Do you have a test scenario for reproduction of this issue? We need
it to ensure that the fix is correct.

Unfortunately, I do not have a way of reproducing this issue.
So far I have tried a workload consisting of inserts (of the
same indexed attribute value), batch deletes of rows, and
vacuum, interleaved with engine crashes/restarts.

Issue reproduction and testing are essential for a bug fix. Remember
the last time you reported a GIN bug [1]: after reproducing the issue,
it turned out we had more things to fix. It's quite clear to me that
if the segment list contains GIN_SEGMENT_INSERT before GIN_SEGMENT_DELETE,
then it might lead to wrong behavior in ginRedoRecompress(). But it's
not yet clear what code path could lead to such a segment
list... I'll explore the code more and will probably come up with some idea.

Links
[1]: /messages/by-id/1531867212836.63354@amazon.com

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

#6 Alexander Korotkov
a.korotkov@postgrespro.ru
In reply to: Alexander Korotkov (#5)
1 attachment(s)
Re: Bug in ginRedoRecompress that causes opaque data on page to be overrun

On Wed, Sep 5, 2018 at 12:26 PM Alexander Korotkov
<a.korotkov@postgrespro.ru> wrote:

Issue reproduction and testing are essential for a bug fix. Remember
the last time you reported a GIN bug [1]: after reproducing the issue,
it turned out we had more things to fix. It's quite clear to me that
if the segment list contains GIN_SEGMENT_INSERT before GIN_SEGMENT_DELETE,
then it might lead to wrong behavior in ginRedoRecompress(). But it's
not yet clear what code path could lead to such a segment
list... I'll explore the code more and will probably come up with some idea.

Aha, I've managed to reproduce this.
1. Apply ginRedoRecompress-asserts.patch, which adds assertions to
ginRedoRecompress() detecting writes past the opaque area.
2. Set up streaming replication.
3. Execute the following on the master.
create or replace function test () returns void as $$
declare
    i int;
begin
    FOR i IN 1..1000 LOOP
        insert into test values ('{1}');
    end loop;
end
$$ language plpgsql;
create table test (a int[]);
insert into test (select '{}'::int[] from generate_series(1,10000));
insert into test (select '{1}'::int[] from generate_series(1,100000));
create index test_idx on test using gin(a) with (fastupdate = off);
delete from test where a = '{}'::int[];
vacuum test;
select test();

So, since we managed to reproduce this, I'm going to test and review your fix.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachments:

ginRedoRecompress-asserts.patch (application/x-patch)
diff --git a/src/backend/access/gin/ginxlog.c b/src/backend/access/gin/ginxlog.c
index 7a1e94a1d56..301c527cab9 100644
--- a/src/backend/access/gin/ginxlog.c
+++ b/src/backend/access/gin/ginxlog.c
@@ -276,6 +276,7 @@ ginRedoRecompress(Page page, ginxlogRecompressDataLeaf *data)
 				break;
 
 			case GIN_SEGMENT_INSERT:
+				Assert(segptr + newsegsize + szleft < PageGetSpecialPointer(page));
 				/* make room for the new segment */
 				memmove(segptr + newsegsize, segptr, szleft);
 				/* copy the new segment in place */
@@ -285,6 +286,8 @@ ginRedoRecompress(Page page, ginxlogRecompressDataLeaf *data)
 				break;
 
 			case GIN_SEGMENT_REPLACE:
+				Assert(segptr + newsegsize + (szleft - segsize) < PageGetSpecialPointer(page));
+				Assert(segptr + szleft < PageGetSpecialPointer(page));
 				/* shift the segments that follow */
 				memmove(segptr + newsegsize,
 						segptr + segsize,
#7 Alexander Korotkov
a.korotkov@postgrespro.ru
In reply to: Alexander Korotkov (#6)
1 attachment(s)
Re: Bug in ginRedoRecompress that causes opaque data on page to be overrun

On Wed, Sep 5, 2018 at 2:39 PM Alexander Korotkov
<a.korotkov@postgrespro.ru> wrote:

Aha, I've managed to reproduce this.
1. Apply ginRedoRecompress-asserts.patch, which adds assertions to
ginRedoRecompress() detecting writes past the opaque area.

That was wrong, sorry. It appears that I put strict inequalities into
the asserts where there should be non-strict inequalities. A correct
version of the patch is attached. And the scenario I posted doesn't
really reproduce the bug...

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachments:

ginRedoRecompress-asserts-v2.patch (application/octet-stream)
diff --git a/src/backend/access/gin/ginxlog.c b/src/backend/access/gin/ginxlog.c
index 7a1e94a1d56..52d84a2c207 100644
--- a/src/backend/access/gin/ginxlog.c
+++ b/src/backend/access/gin/ginxlog.c
@@ -276,6 +276,7 @@ ginRedoRecompress(Page page, ginxlogRecompressDataLeaf *data)
 				break;
 
 			case GIN_SEGMENT_INSERT:
+				Assert(segptr + newsegsize + szleft <= PageGetSpecialPointer(page));
 				/* make room for the new segment */
 				memmove(segptr + newsegsize, segptr, szleft);
 				/* copy the new segment in place */
@@ -285,6 +286,8 @@ ginRedoRecompress(Page page, ginxlogRecompressDataLeaf *data)
 				break;
 
 			case GIN_SEGMENT_REPLACE:
+				Assert(segptr + newsegsize + (szleft - segsize) <= PageGetSpecialPointer(page));
+				Assert(segptr + szleft <= PageGetSpecialPointer(page));
 				/* shift the segments that follow */
 				memmove(segptr + newsegsize,
 						segptr + segsize,
#8 Masahiko Sawada
sawada.mshk@gmail.com
In reply to: R, Siva (#1)
Re: Bug in ginRedoRecompress that causes opaque data on page to be overrun

On Wed, Sep 5, 2018 at 4:59 AM, R, Siva <sivasubr@amazon.com> wrote:

Hi,

We recently encountered an issue where the opaque data flags on a GIN data
leaf page were corrupted while replaying a GIN insert WAL record. Upon
further examination of the redo code, we found a bug in the ginRedoRecompress
code, which extracts the WAL information and updates the page.

Specifically, when a new segment is inserted in the middle of a page, a
memmove operation is performed [1] at the current point in the page to make
room for the new segment. If this segment insertion is followed by
delete-segment actions that are yet to be processed, and the total data size
is very close to GinDataPageMaxDataSize, then we may move the data portion
beyond the boundary, corrupting the opaque data.

One way of solving this problem is to perform the replay work in a scratch
space and sanity-check the total size of the data portion before copying it
back to the actual page. While this involves an additional memory allocation
and extra memcpy operations, it is safer and similar to the 'do' code path,
where we make a copy of all segments past the first modified segment before
placing them back on the page [2].

Hmm, could you share the sequence of WAL actions that was applied to
the broken page? I suspect the segment list contains
GIN_SEGMENT_REPLACE before GIN_SEGMENT_INSERT.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

#9 Alexander Korotkov
a.korotkov@postgrespro.ru
In reply to: Alexander Korotkov (#7)
Re: Bug in ginRedoRecompress that causes opaque data on page to be overrun

On Wed, Sep 5, 2018 at 5:05 PM Alexander Korotkov
<a.korotkov@postgrespro.ru> wrote:

That was wrong, sorry. It appears that I put strict inequalities into
the asserts where there should be non-strict inequalities. A correct
version of the patch is attached. And the scenario I posted doesn't
really reproduce the bug...

Finally, I managed to reproduce the bug. The scenario is as follows.
The underlying idea is that when an insertion of multiple tuples at the
beginning of the page succeeds only thanks to the collapse of some short
segments together, that insertion wouldn't fit on the page if given alone.

create table test (i integer primary key, a int[]) with (fillfactor = 50);
insert into test (select id, array[id%2]::int[] from
generate_series(1,15376) id);
create index test_idx on test using gin(a) with (fastupdate = off);
update test set a = '{1}' where i % 4 = 0 or i % 16 = 2 or i % 64 in
(6, 46, 36) or i % 256 = 54;
vacuum test;
insert into test (select id + 16376, '{0}' from generate_series(1,5180) id);
update test set a = '{1}' where i <= 15376 and i % 256 in (182, 198);
vacuum test;
alter index test_idx set (fastupdate = on);
delete from test where i <= 134 and a = '{1}';
vacuum test;
insert into test (select id+30000, '{0}' from generate_series(1,110) id);
vacuum test;

With ginRedoRecompress-asserts-v2.patch, the following assertion fires.
TRAP: FailedAssertion("!(segptr + newsegsize + (szleft - segsize) <= (
((void) ((_Bool) (! (!(PageValidateSpecialPointer(page))) ||
(ExceptionalCondition("!(PageValidateSpecialPointer(page))",
("FailedAssertion"), "ginxlog.c", 289), 0)))), (char *) ((char *)
(page) + ((PageHeader) (page))->pd_special) ))", File: "ginxlog.c",
Line: 289)

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

#10 R, Siva
sivasubr@amazon.com
In reply to: Alexander Korotkov (#9)
Re: Bug in ginRedoRecompress that causes opaque data on page to be overrun

On Tue, Sep 5, 2018 at 08:55 PM, Alexander Korotkov <a.korotkov@postgrespro.ru> wrote:

Finally, I managed to reproduce the bug. The scenario is as follows.
The underlying idea is that when an insertion of multiple tuples at the
beginning of the page succeeds only thanks to the collapse of some short
segments together, that insertion wouldn't fit on the page if given alone.

Sorry for the late reply.
Thank you so much for working on this and reproducing the issue!
As you mentioned, the WAL record where we detected this problem
has future segments deleted due to compaction and written out
as an insert segment.

alter index test_idx set (fastupdate = on);

Just curious: why does this help with the repro? This seems related
only to use of the GIN pending list vs. the posting tree.

I will try to reproduce the issue with the above workload, test the
fix against it as well, and report back.

On Wed, Sep 5, 2018 at 5:24 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:

Hmm, could you share the sequence of WAL actions that was applied to
the broken page? I suspect the segment list contains
GIN_SEGMENT_REPLACE before GIN_SEGMENT_INSERT.

These are the segment operations in the WAL record, in sequence:
- 1 replace action on segment N
- 5 insert actions after segment N; the 5th insert action essentially
replaces the last 3 remaining segments with a new one
- 3 delete actions on the remaining segments
In other words, the inserts transiently grow the data before the trailing
deletes shrink it back, which is exactly the pattern that overruns the
opaque data during in-place replay.

Best
Siva

#11 Alexander Korotkov
a.korotkov@postgrespro.ru
In reply to: R, Siva (#10)
1 attachment(s)
Re: Bug in ginRedoRecompress that causes opaque data on page to be overrun

On Thu, Sep 6, 2018 at 12:53 AM R, Siva <sivasubr@amazon.com> wrote:

alter index test_idx set (fastupdate = on);

Just curious: why does this help with the repro? This seems related
only to use of the GIN pending list vs. the posting tree.

With (fastupdate = on), GIN performs bulk updates of posting lists,
inserting multiple tuples at once if possible. With (fastupdate =
off), GIN always inserts tuples one by one. It might still be possible
to reproduce the issue with (fastupdate = off), but it seems even
harder.

BTW, I've tried the patch you posted. On my test case it fails
with the following assertion.
TRAP: FailedAssertion("!(a_action == 2)", File: "ginxlog.c", Line: 243)

I thought more about fixing this issue, and I decided we can fix it in
a less invasive way. Once modification is started, we can copy the tail
of the page into a separately allocated chunk of memory and then use it
as the source of the original segments. See the attached patch.
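
Here is a standalone C sketch of that tail-copy idea with made-up sizes (the attached patch is the real fix): once the first modifying action is reached, the unprocessed tail is snapshotted once, old segments are read from the snapshot, and new data is written left to right on the page, with asserts guarding the boundary.

#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define DATA_CAPACITY 32            /* stand-in for GinDataPageMaxDataSize */

int main(void)
{
    char  page[DATA_CAPACITY];
    int   used = 30;                /* bytes currently on the page */
    char *write_ptr = page;         /* left-to-right write position */
    char *tail_copy;                /* snapshot of the unprocessed tail */
    char *read_ptr;
    int   tail_len;

    memset(page, 'a', sizeof(page));

    /* A 10-byte unmodified segment before the first action stays put. */
    write_ptr += 10;

    /*
     * First modifying action reached: snapshot the rest of the page once;
     * from now on, old segments are read from the snapshot.
     */
    tail_len = used - 10;
    tail_copy = malloc(tail_len);
    memcpy(tail_copy, page + 10, tail_len);
    read_ptr = tail_copy;

    /* Insert: write the 6-byte new segment; the old bytes it overwrites
     * on the page are safe in tail_copy. */
    assert(write_ptr + 6 <= page + DATA_CAPACITY);
    memset(write_ptr, 'N', 6);
    write_ptr += 6;

    /* Delete: skip an 8-byte old segment in the snapshot. */
    read_ptr += 8;

    /* Copy back the remaining unmodified segments from the snapshot. */
    memcpy(write_ptr, read_ptr, tail_len - (read_ptr - tail_copy));
    write_ptr += tail_len - (read_ptr - tail_copy);

    assert(write_ptr - page <= DATA_CAPACITY);  /* holds: 28 <= 32 */
    free(tail_copy);
    return 0;
}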

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachments:

ginRedoRecompress-tail-copy-v1.patch (application/octet-stream)
commit a1debdcdb11376544e7e46348ec13d585a123b7a
Author: Alexander Korotkov <akorotkov@postgresql.org>
Date:   Thu Sep 6 00:17:19 2018 +0300

    Fix past pd_upper write in ginRedoRecompress()
    
    ginRedoRecompress() replays actions over the compressed segments of a
    posting list in place.  However, this might lead to a write past pd_upper,
    because the intermediate state while playing the changes can take more
    space than both the original state and the final state.  This commit fixes
    that by abandoning in-place modification.  Instead, the page tail is copied
    once modification is started, and then it's used as the source of the
    original segments.  Backpatch to 9.4, where posting list compression was
    introduced.
    
    Reported-by: Sivasubramanian Ramasubramanian
    Discussion: https://postgr.es/m/1536091151804.6588%40amazon.com
    Author: Alexander Korotkov, based on a patch and ideas from Sivasubramanian Ramasubramanian
    Backpatch-through: 9.4

diff --git a/src/backend/access/gin/ginxlog.c b/src/backend/access/gin/ginxlog.c
index 7a1e94a1d56..06a025afcb8 100644
--- a/src/backend/access/gin/ginxlog.c
+++ b/src/backend/access/gin/ginxlog.c
@@ -135,6 +135,14 @@ ginRedoInsertEntry(Buffer buffer, bool isLeaf, BlockNumber rightblkno, void *rda
 	}
 }
 
+/*
+ * Redo recompression of a posting list.  Doing all the changes in place is
+ * not always possible, because it might require more space than we have on
+ * the page.  Instead, once modification is required, we copy the unprocessed
+ * tail of the page into a separately allocated chunk of memory, so we can
+ * still read the original versions of the segments.  Thanks to that, we
+ * don't have to bother with moving page data in place.
+ */
 static void
 ginRedoRecompress(Page page, ginxlogRecompressDataLeaf *data)
 {
@@ -144,6 +152,9 @@ ginRedoRecompress(Page page, ginxlogRecompressDataLeaf *data)
 	Pointer		segmentend;
 	char	   *walbuf;
 	int			totalsize;
+	Pointer		tailCopy = NULL;
+	Pointer		writePtr;
+	Pointer		segptr;
 
 	/*
 	 * If the page is in pre-9.4 format, convert to new format first.
@@ -183,6 +194,7 @@ ginRedoRecompress(Page page, ginxlogRecompressDataLeaf *data)
 	}
 
 	oldseg = GinDataLeafPageGetPostingList(page);
+	writePtr = (Pointer) oldseg;
 	segmentend = (Pointer) oldseg + GinDataLeafPageGetPostingListSize(page);
 	segno = 0;
 
@@ -200,8 +212,6 @@ ginRedoRecompress(Page page, ginxlogRecompressDataLeaf *data)
 		ItemPointerData *newitems;
 		int			nnewitems;
 		int			segsize;
-		Pointer		segptr;
-		int			szleft;
 
 		/* Extract all the information we need from the WAL record */
 		if (a_action == GIN_SEGMENT_INSERT ||
@@ -224,6 +234,16 @@ ginRedoRecompress(Page page, ginxlogRecompressDataLeaf *data)
 		Assert(segno <= a_segno);
 		while (segno < a_segno)
 		{
+			/*
+			 * Once modification is started and the page tail is copied, we
+			 * have to copy the unmodified segments too.
+			 */
+			if (tailCopy)
+			{
+				Assert(writePtr + SizeOfGinPostingList(oldseg) < PageGetSpecialPointer(page));
+				memcpy(writePtr, (Pointer) oldseg, SizeOfGinPostingList(oldseg));
+			}
+			writePtr += SizeOfGinPostingList(oldseg);
 			oldseg = GinNextPostingListSegment(oldseg);
 			segno++;
 		}
@@ -264,36 +284,42 @@ ginRedoRecompress(Page page, ginxlogRecompressDataLeaf *data)
 			Assert(a_action == GIN_SEGMENT_INSERT);
 			segsize = 0;
 		}
-		szleft = segmentend - segptr;
+
+		/*
+		 * We're about to start modification of the page.  So, copy tail of the
+		 * page if it's not done already.
+		 */
+		if (!tailCopy && segptr != segmentend)
+		{
+			int tailSize = segmentend - segptr;
+
+			tailCopy = (Pointer) palloc(tailSize);
+			memcpy(tailCopy, segptr, tailSize);
+			segptr = tailCopy;
+			oldseg = (GinPostingList *) segptr;
+			segmentend = segptr + tailSize;
+		}
 
 		switch (a_action)
 		{
 			case GIN_SEGMENT_DELETE:
-				memmove(segptr, segptr + segsize, szleft - segsize);
-				segmentend -= segsize;
-
+				segptr += segsize;
 				segno++;
 				break;
 
 			case GIN_SEGMENT_INSERT:
-				/* make room for the new segment */
-				memmove(segptr + newsegsize, segptr, szleft);
 				/* copy the new segment in place */
-				memcpy(segptr, newseg, newsegsize);
-				segmentend += newsegsize;
-				segptr += newsegsize;
+				Assert(writePtr + newsegsize <= PageGetSpecialPointer(page));
+				memcpy(writePtr, newseg, newsegsize);
+				writePtr += newsegsize;
 				break;
 
 			case GIN_SEGMENT_REPLACE:
-				/* shift the segments that follow */
-				memmove(segptr + newsegsize,
-						segptr + segsize,
-						szleft - segsize);
-				/* copy the replacement segment in place */
-				memcpy(segptr, newseg, newsegsize);
-				segmentend -= segsize;
-				segmentend += newsegsize;
-				segptr += newsegsize;
+				/* copy the new version of segment in place */
+				Assert(writePtr + newsegsize <= PageGetSpecialPointer(page));
+				memcpy(writePtr, newseg, newsegsize);
+				writePtr += newsegsize;
+				segptr += segsize;
 				segno++;
 				break;
 
@@ -303,7 +329,18 @@ ginRedoRecompress(Page page, ginxlogRecompressDataLeaf *data)
 		oldseg = (GinPostingList *) segptr;
 	}
 
-	totalsize = segmentend - (Pointer) GinDataLeafPageGetPostingList(page);
+	/* Copy the rest of unmodified segments if any. */
+	segptr = (Pointer) oldseg;
+	if (segptr != segmentend && tailCopy)
+	{
+		int restSize = segmentend - segptr;
+
+		Assert(writePtr + restSize <= PageGetSpecialPointer(page));
+		memcpy(writePtr, segptr, restSize);
+		writePtr += restSize;
+	}
+
+	totalsize = writePtr - (Pointer) GinDataLeafPageGetPostingList(page);
 	GinDataPageSetDataSize(page, totalsize);
 }
 
#12 R, Siva
sivasubr@amazon.com
In reply to: Alexander Korotkov (#11)
Re: Bug in ginRedoRecompress that causes opaque data on page to be overrun

On Tue, Sep 6, 2018 at 09:53 AM, Alexander Korotkov <a.korotkov@postgrespro.ru> wrote:

With (fastupdate = on), GIN performs bulk updates of posting lists,
inserting multiple tuples at once if possible. With (fastupdate =
off), GIN always inserts tuples one by one. It might still be possible
to reproduce the issue with (fastupdate = off), but it seems even
harder.

Ah, I see. That is good to know; I will keep it in mind for future testing. Thanks!

BTW, I've tried the patch you posted. On my test case it fails
with the following assertion.
TRAP: FailedAssertion("!(a_action == 2)", File: "ginxlog.c", Line: 243)

I thought more about fixing this issue, and I decided we can fix it in
a less invasive way. Once modification is started, we can copy the tail
of the page into a separately allocated chunk of memory and then use it
as the source of the original segments. See the attached patch.

I'm also running into this assert with the workload. I think my patch
is not correctly handling the case where the action is ADDITEMS on the
last segment of the page. I'm still investigating to find the source
of the bug.

Meanwhile, I reviewed your patch and it looks good to me. I agree that
copying the entire tail out to the scratch space in one shot, versus
copying out every segment, reduces the number of memcpy calls and
simplifies the solution overall. Let us go ahead with this patch.

Best
Siva

#13 Alexander Korotkov
a.korotkov@postgrespro.ru
In reply to: R, Siva (#12)
Re: Bug in ginRedoRecompress that causes opaque data on page to be overrun

On Thu, Sep 6, 2018 at 9:02 PM R, Siva <sivasubr@amazon.com> wrote:

I'm also running into this assert with the workload. I think my patch
is not correctly handling the case where the action is ADDITEMS on the
last segment of the page. I'm still investigating to find the source
of the bug.

Meanwhile, I reviewed your patch and it looks good to me. I agree that
copying the entire tail out to the scratch space in one shot, versus
copying out every segment, reduces the number of memcpy calls and
simplifies the solution overall. Let us go ahead with this patch.

Thank you for the review! Pushed with minor beautification.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company