BUG #17568: unexpected zero page at block 0 during REINDEX CONCURRENTLY

Started by PG Bug reporting form · 5 messages · pgsql-bugs list
#1 PG Bug reporting form
noreply@postgresql.org

The following bug has been logged on the website:

Bug reference: 17568
Logged by: Sergei Kornilov
Email address: sk@zsrv.org
PostgreSQL version: 14.4
Operating system: Ubuntu 20.04
Description:

Hello
I recently ran "REINDEX INDEX CONCURRENTLY i_sess_uuid;" (PG 14.4, table
around 700 GB), but suddenly, after the start of the "index validation:
scanning index" phase, insert and update operations started returning an
error:

ERROR: index "i_sess_uuid_ccnew" contains unexpected zero page at block 0
HINT: Please REINDEX it.

i_sess_uuid_ccnew is exactly the new index that REINDEX CONCURRENTLY was
building at the time. It is clear that the errors started after
index_set_state_flags set INDEX_CREATE_SET_READY, because insert and update
queries then need to update this index too. But it remains unclear how
exactly page 0 turned out to be all zeros at that point.

I think some process may have loaded the btree metapage (page 0) into shared
buffers prior to the end of _bt_load. In that case, the error is reproducible
(14.4, REL_14_STABLE, HEAD):

create extension pageinspect;
create table test as select generate_series(1,1e4) as id;
create index test_id_idx on test(id);
-- attach gdb to this backend with a breakpoint on _bt_uppershutdown
reindex index concurrently test_id_idx;
While gdb is stopped at the breakpoint, run from a second session:

insert into test values (0);
SELECT * FROM bt_metap('test_id_idx_ccnew');
-[ RECORD 1 ]-------------+---
magic | 0
version | 0
root | 0
level | 0
fastroot | 0
fastlevel | 0
last_cleanup_num_delpages | 0
last_cleanup_num_tuples | -1
allequalimage | f

Then continue the reindex backend. New inserts, along with the reindex
itself, will report the error "index "test_id_idx_ccnew" contains unexpected
zero page at block 0". After the _bt_uppershutdown call, the metapage on disk
is written correctly and correctly replicated to standbys, but it remains
wrong in shared buffers on the primary.

I still don't know if this is what happened to my database. Monitoring
queries (like pg_total_relation_size, pg_stat_user_indexes,
pg_statio_user_indexes) do not load the metapage into shared buffers, and
normal select/insert/update/delete should not touch a not-yet-ready index in
any way. This database has no extensions installed other than those available
in contrib.

Thoughts?

regards, Sergei

#2 Sergei Kornilov
sk@zsrv.org
In reply to: PG Bug reporting form (#1)
Re: BUG #17568: unexpected zero page at block 0 during REINDEX CONCURRENTLY

Hello
Luckily, I found a call to the bt_metap function in one query that was definitely executed while the reindex was running, before these errors started to appear. So this was simply pilot error.

It would be nice to protect shared buffers from such premature page loading, but that probably can't be done without a performance penalty.

regards, Sergei

#3 Andres Freund
andres@anarazel.de
In reply to: PG Bug reporting form (#1)
Re: BUG #17568: unexpected zero page at block 0 during REINDEX CONCURRENTLY

Hi,

On 2022-08-03 14:23:34 +0000, PG Bug reporting form wrote:

I think some process may have loaded btree metapage (page 0) into shared
buffers prior the end of _bt_load. In this case, the error is reproduced
(14.4, 14 STABLE, HEAD):

create extension pageinspect;
create table test as select generate_series(1,1e4) as id;
create index test_id_idx on test(id);
# prepare gdb for this backend with breakpoint on _bt_uppershutdown
reindex index concurrently test_id_idx ;

Worth noting that this doesn't even require REINDEX CONCURRENTLY; it's also an
issue for CIC (CREATE INDEX CONCURRENTLY).

The problem basically is that once the first non-meta page of the btree is
written (e.g. by _bt_blwritepage() calling smgrextend()), concurrent sessions
can read the metapage (and potentially other pages that are also still
zero-filled) into shared_buffers. At the end of _bt_uppershutdown() we write
the metapage to disk, bypassing shared buffers. And boom, the all-zeroes
version read into memory earlier is suddenly out of date.
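To make the ordering concrete, here is a toy Python model (not PostgreSQL code; the names and the dict-based "disk" and "buffers" are inventions for illustration) of the race described above: the build extends the relation on disk while block 0 is still zeroes, a concurrent session faults that zero page into the cache, and the later direct-to-disk metapage write leaves the cached copy stale.

```python
# Toy model of the shared-buffers bypass race (illustrative only).

ZERO_PAGE = b"\x00" * 8  # stand-in for an 8kB all-zeroes page

disk = {}     # block number -> page contents on "disk"
buffers = {}  # "shared_buffers": block number -> cached page

def smgr_extend(blkno, page):
    """Write a page directly to disk, bypassing the buffer cache."""
    disk[blkno] = page

def read_buffer(blkno):
    """Read through the cache, faulting the page in from disk if absent."""
    if blkno not in buffers:
        buffers[blkno] = disk.get(blkno, ZERO_PAGE)
    return buffers[blkno]

# 1. _bt_blwritepage()-style: the build extends the file with a leaf page;
#    block 0 (the metapage) is still all zeroes at this point.
smgr_extend(1, b"leafpage")

# 2. A concurrent session (e.g. a bt_metap() call) reads block 0 too early
#    and caches an all-zeroes page.
early = read_buffer(0)

# 3. _bt_uppershutdown()-style: the build writes the real metapage,
#    again bypassing the buffer cache.
smgr_extend(0, b"metapage")

# 4. Later readers hit the stale cached copy: disk is correct, the cache
#    is not -- "unexpected zero page at block 0".
stale = read_buffer(0)
```

Dropping the cached buffer (the analogue of forgetting the relation's buffers at the end of the build) makes the next read see the correct on-disk metapage.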

The easiest fix is likely to force all of the relation's buffers to be
forgotten at the end of index_concurrently_build() or such. I don't
immediately see a nicer way to fix this; we can't just lock the new index
relation exclusively.

We could of course also stop bypassing s_b for CIC/RIC, but that seems mighty
invasive for a bugfix.

Greetings,

Andres Freund

#4 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#3)
Re: BUG #17568: unexpected zero page at block 0 during REINDEX CONCURRENTLY

Andres Freund <andres@anarazel.de> writes:

The easiest fix is likely to force all buffers to be forgotten at the end of
index_concurrently_build() or such.

Race conditions there ...

I don't immediately see a nicer way to fix
this, we can't just lock the new index relation exclusively.

Why not? If the index isn't valid yet, other backends have zero
business touching it. I'd think about taking an exclusive lock
to start with, and releasing it (downgrading to a non-exclusive
lock) once the index is valid enough that other backends can
access it, which would be just before we set pg_index.indisready
to true.
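The locking scheme suggested here can be sketched as a toy Python model (not PostgreSQL code; the threading primitives stand in for heavyweight relation locks, and all names are inventions for illustration): the builder holds an exclusive lock for the whole build and lets readers in only once the index is complete, so no session can observe half-built pages.

```python
import threading

index_lock = threading.Lock()   # stands in for an exclusive relation lock
pages = {}                      # the index "relation"
indisready = threading.Event()  # stands in for pg_index.indisready

def build_index():
    with index_lock:            # exclusive for the entire build
        pages[1] = "leafpage"   # pages written in arbitrary order...
        pages[0] = "metapage"   # ...metapage last, as in _bt_uppershutdown
        indisready.set()        # "downgrade": readers may now proceed

def read_metapage(out):
    indisready.wait()                # reader waits for the index to be ready
    with index_lock:                 # lock is only available post-build
        out.append(pages[0])         # so the metapage is never all-zeroes

out = []
reader = threading.Thread(target=read_metapage, args=(out,))
reader.start()
build_index()
reader.join()
```

The reader can never cache a zeroed block 0, because it cannot touch the relation until the builder releases the lock, which happens only after the metapage is written.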

Basically, this is to enforce the previously-implicit contract
that other sessions won't touch the index too soon against
careless superusers.

regards, tom lane

#5 Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#4)
Re: BUG #17568: unexpected zero page at block 0 during REINDEX CONCURRENTLY

Hi,

On 2022-08-15 19:56:40 -0400, Tom Lane wrote:

Andres Freund <andres@anarazel.de> writes:

The easiest fix is likely to force all buffers to be forgotten at the end of
index_concurrently_build() or such.

Race conditions there ...

Not immediately seeing it? New reads from disk will read valid data.

But I agree, it's a shitty approach.

I don't immediately see a nicer way to fix
this, we can't just lock the new index relation exclusively.

Why not? If the index isn't valid yet, other backends have zero
business touching it. I'd think about taking an exclusive lock
to start with, and releasing it (downgrading to a non-exclusive
lock) once the index is valid enough that other backends can
access it, which would be just before we set pg_index.indisready
to true.

I'm afraid we'd start blocking in quite a few places, both inside and outside
of core PG. E.g. ExecOpenIndices(), ExecInitPartitionInfo(),
calculate_toast_table_size(), ... will open all indislive indexes, even if not
indisready.

Basically, this is to enforce the previously-implicit contract
that other sessions won't touch the index too soon against
careless superusers.

I suspect this isn't restricted to superusers, fwiw. E.g. pg_prewarm doesn't
require superuser.

Greetings,

Andres Freund