[PATCH] BUG FIX: inconsistent page found in BRIN_REGULAR_PAGE
Hi hackers,
I found that when wal_consistency_checking = brin is set, it may cause redo
abort, all the standby-nodes lost, and the primary node can not be restart.
This bug exists in all versions of PostgreSQL.
The operation steps are as follows:
1. Create a primary instance, set wal_consistency_checking = brin, and
start the primary instance.
initdb -D pg_test
echo "wal_consistency_checking = brin" >> pg_test/postgresql.conf
echo "port=53320" >> pg_test/postgresql.conf
pg_ctl start -D pg_test -l pg_test.logfile
2. Create a standby instance.
pg_basebackup -R -p 53320 -D pg_test_slave
echo "wal_consistency_checking = brin" >>
pg_test_slave/postgresql.conf
echo "port=53321" >> pg_test_slave/postgresql.conf
pg_ctl start -D pg_test_slave -l pg_test_slave.logfile
3. Execute brin_redo_abort.sql through psql, and find that the standby
machine is lost.
psql -p 53320 -f brin_redo_abort.sql
4. The standby instance is lost during redo, FATAL messages as follows:
FATAL: inconsistent page found, rel 1663/12978/16387, forknum 0,
blkno 2
5. The primary instance cannot be restarted through pg_ctl restart -mi.
pg_ctl restart -D pg_test -mi -l pg_test.logfile
6. FATAL messages when restart primary instance as follows:
FATAL: inconsistent page found, rel 1663/12978/16387, forknum 0,
blkno 2
I analyzed the reasons as follows:
1. When the revmap needs to be extended by brinRevmapExtend,
we may set BRIN_EVACUATE_PAGE flag on a REGULAR_PAGE to prevent
other concurrent backends from adding more BrinTuple to that page
in brin_start_evacuating_page.
2. But, during redo-process, it is not needed to set BRIN_EVACUATE_PAGE
flag on that REGULAR_PAGE after removing the old BrinTuple in
brin_xlog_update, since no one will add BrinTuple to that Page at
this time.
3. As a result, this will cause a FATAL message to be thrown in
CheckXLogConsistency after redo, due to inconsistency checking of
the BRIN_EVACUATE_PAGE flag, finally cause redo to abort.
4. Therefore, the BRIN_EVACUATE_PAGE flag should be cleared before
CheckXLogConsistency.
For the above reasons, the patch file, sql file, shell script file, and the
log files are given in the attachment.
Best Regards!
Haiyang Wang
Attachments:
0001-clear-BRIN_EVACUATE_PAGE-before-consistency-checking.patchapplication/octet-stream; name=0001-clear-BRIN_EVACUATE_PAGE-before-consistency-checking.patchDownload+24-1
At Wed, 3 Aug 2022 02:37:30 -0400, 王海洋 <wanghaiyang.001@bytedance.com> wrote in
Hi hackers,
I found that when wal_consistency_checking = brin is set, it may cause redo
abort, all the standby-nodes lost, and the primary node can not be restart.This bug exists in all versions of PostgreSQL.
The operation steps are as follows:
Nice reproducer!
I analyzed the reasons as follows:
1. When the revmap needs to be extended by brinRevmapExtend,
we may set BRIN_EVACUATE_PAGE flag on a REGULAR_PAGE to prevent
other concurrent backends from adding more BrinTuple to that page
in brin_start_evacuating_page.2. But, during redo-process, it is not needed to set BRIN_EVACUATE_PAGE
flag on that REGULAR_PAGE after removing the old BrinTuple in
brin_xlog_update, since no one will add BrinTuple to that Page at
this time.3. As a result, this will cause a FATAL message to be thrown in
CheckXLogConsistency after redo, due to inconsistency checking of
the BRIN_EVACUATE_PAGE flag, finally cause redo to abort.4. Therefore, the BRIN_EVACUATE_PAGE flag should be cleared before
CheckXLogConsistency.
+1 for what the patch does. The flag is actually non-logged,
primary-only flag.
The coment looks too verbose. It looks as if saying we evading some
flaw by masking the flag. The inconsistency is caused by that the flag
is changed unlogged and not operated via wal-redo. So, I think it can
be shortened as the following example. (It is taken from btree_mask())
BRIN_EVACUATE_PAGE is an unlogged bit, which is never set during
recovery. See brin_start_evacuating_page() for details.
I think we need the corresponding comment that explains we don't need
to wal-log the bit in the function, too.
regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center
I'd like to explain a bit further.
At Thu, 04 Aug 2022 16:55:35 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in
BRIN_EVACUATE_PAGE is an unlogged bit, which is never set during
recovery. See brin_start_evacuating_page() for details.
That being said, I think that some bug could unexpectedly set the flag
during recovery. I sought for clean way to check that only while not
in evacuation, but I found none..
regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center
Thanks, I agree. So my new modified patch is attached.
From: "Kyotaro Horiguchi"<horikyota.ntt@gmail.com>
Date: Thu, Aug 4, 2022, 16:12
Subject: [External] Re: [PATCH] BUG FIX: inconsistent page found in
BRIN_REGULAR_PAGE
To: <wanghaiyang.001@bytedance.com>
Cc: <pgsql-bugs@lists.postgresql.org>
I'd like to explain a bit further. At Thu, 04 Aug 2022 16:55:35 +0900
(JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in > >
BRIN_EVACUATE_PAGE is an unlogged bit, which is never set during > >
recovery. See brin_start_evacuating_page() for details. That being said, I
think that some bug could unexpectedly set the flag during recovery. I
sought for clean way to check that only while not in evacuation, but I
found none.. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Attachments:
0001-mask-un-logged-hint-bit-BRIN_EVACUATE_PAGE-before-co.patchapplication/octet-stream; name=0001-mask-un-logged-hint-bit-BRIN_EVACUATE_PAGE-before-co.patchDownload+13-2
On 2022-Aug-04, 王海洋 wrote:
Thanks, I agree. So my new modified patch is attached.
I propose the attached. This also adds a new TAP test that runs BRIN
through a revmap extension and through wal_consistency_checking. This
increases coverage, as currently there's none for brin_evacuate_page.
--
Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
"Every machine is a smoke machine if you operate it wrong enough."
https://twitter.com/libseybieda/status/1541673325781196801
Nice, benefited a lot, thanks.
Best Regards!
Haiyang Wang
From: "Alvaro Herrera"<alvherre@alvh.no-ip.org>
Date: Fri, Aug 5, 2022, 21:17
Subject: Re: [External] Re: [PATCH] BUG FIX: inconsistent page found in
BRIN_REGULAR_PAGE
To: "王海洋"<wanghaiyang.001@bytedance.com>
Cc: "Kyotaro Horiguchi"<horikyota.ntt@gmail.com>, <
pgsql-bugs@lists.postgresql.org>
On 2022-Aug-04, 王海洋 wrote: > Thanks, I agree. So my new modified patch is
attached. I propose the attached. This also adds a new TAP test that runs
BRIN through a revmap extension and through wal_consistency_checking. This
increases coverage, as currently there's none for brin_evacuate_page. --
Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/ "Every
machine is a smoke machine if you operate it wrong enough."
https://twitter.com/libseybieda/status/1541673325781196801
On 2022-Aug-05, 王海洋 wrote:
Nice, benefited a lot, thanks.
Thanks, pushed to all branches. I ran the test individually in each
branch and they all pass locally, though I'll wait to see what does th
buildfarm have to say about it.
--
Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
Hi Alvaro!
Is there any special reason to read WAL records until the last inserted
record?
+my $end_lsn = $whiskey->lsn('insert');
+
+my ($ret, $out, $err) = $whiskey->psql(
+ 'postgres', qq{
+ select count(*) from pg_get_wal_records_info('$start_lsn', '$end_lsn')
+ where resource_manager = 'BRIN' AND
+ record_type ILIKE '%revmap%'
+ });
+cmp_ok($out, '>=', 1);
It seems that in some rare situations on slower machines this test can
fail. If
any background process inserts a WAL record before lsn('insert') and
this record
isn't flushed before pg_get_wal_records_info('$start_lsn', '$end_lsn'),
pg_get_wal_records_info('$start_lsn', '$end_lsn') ends with ERROR
"cannot accept
future end LSN" as it works only if record with end LSN is inserted.
I attached two patches with two ways of fixing this minor issue.
Best regards,
Karina Litskevich
Postgres Professional: http://postgrespro.com/
Attachments:
v1-0001-prevent-test-from-failing-on-slower-machines.patchtext/x-patch; charset=UTF-8; name=v1-0001-prevent-test-from-failing-on-slower-machines.patchDownload+1-2
v2-0001-prevent-test-from-failing-on-slower-machines.patchtext/x-patch; charset=UTF-8; name=v2-0001-prevent-test-from-failing-on-slower-machines.patchDownload+1-3
Hello,
On 2022-Dec-23, Karina Litskevich wrote:
Is there any special reason to read WAL records until the last inserted
record?
Hmm, no, there's no good reason to do it that way.
It seems that in some rare situations on slower machines this test can fail.
If any background process inserts a WAL record before lsn('insert')
and this record isn't flushed before
pg_get_wal_records_info('$start_lsn', '$end_lsn'),
pg_get_wal_records_info('$start_lsn', '$end_lsn') ends with ERROR
"cannot accept future end LSN" as it works only if record with end LSN
is inserted.
Hmm. I've never seen that, but it sounds plausible.
I attached two patches with two ways of fixing this minor issue.
-my $end_lsn = $whiskey->lsn('insert'); +my $end_lsn = $whiskey->lsn('flush');
Thank you, I used this one.
Regards
--
Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
Evelyn,
I ran the query that runs when you click on Check Invoices that returns
the invTotal and TotDetailAmt and here's what came up. As you will see,
the invTotal amounts from your list don't match the production data and
most of the amounts in TotDetailAmt don't match production data. Where
did your list come from? Is this an old list from the other day?
I have turned on the invTotal field in both invoice forms so you can
compare the Total Invoice amount that is calculated from the line items
with the invTotal amount that is saved to the database. This will allow
you to see what is being saved at the time of creating the invoice.
*/Patrick Headley/*
Linx Consulting, Inc.
(303) 916-5522
pheadley@linxco-inc.com
www.linxco-inc.com
Show quoted text
On 12/23/22 9:35 AM, Alvaro Herrera wrote:
Hello,
On 2022-Dec-23, Karina Litskevich wrote:
Is there any special reason to read WAL records until the last inserted
record?Hmm, no, there's no good reason to do it that way.
It seems that in some rare situations on slower machines this test can fail.
If any background process inserts a WAL record before lsn('insert')
and this record isn't flushed before
pg_get_wal_records_info('$start_lsn', '$end_lsn'),
pg_get_wal_records_info('$start_lsn', '$end_lsn') ends with ERROR
"cannot accept future end LSN" as it works only if record with end LSN
is inserted.Hmm. I've never seen that, but it sounds plausible.
I attached two patches with two ways of fixing this minor issue. -my $end_lsn = $whiskey->lsn('insert'); +my $end_lsn = $whiskey->lsn('flush');Thank you, I used this one.
Regards