BUG #17268: Possible corruption in toast index after reindex index concurrently
The following bug has been logged on the website:
Bug reference: 17268
Logged by: Alexey Ermakov
Email address: alexey.ermakov@dataegret.com
PostgreSQL version: 13.4
Operating system: Ubuntu 18.04.5 LTS
Description:
Hello,
after running this query:
2021-11-02 13:15:45.469 UTC 2379 postgres@xx from [local] [vxid:24/0 txid:0]
[REINDEX] LOG: duration: 1719120.441 ms statement: REINDEX INDEX
CONCURRENTLY pg_toast.pg_toast_2624976286_index
we got the following errors (here is the first one; the others look exactly
the same, with the same "toast value"):
2021-11-02 20:37:12.305 UTC 27935 yy@xx from [local] [vxid:41/199242673
txid:0] [SELECT] ERROR: missing chunk number 0 for toast value 4040061139
in pg_toast_2624976286
it turns out that one entry in the toast index (?) was corrupted:
select md5(body) from zz where id = ...;
ERROR: missing chunk number 0 for toast value 4040061139 in
pg_toast_2624976286
According to the "created_at" column in the linked table, the row was created
at "2021-11-02 13:04:22.192125", i.e. during the concurrent reindex.
I'm wondering whether this is a known bug and how risky it is to reindex
toast indexes. The reindex was run automatically by a tool that monitors
index bloat; the index size was reduced several times over in this case.
It's possible to get additional details from the backup and WAL archive, but
it will take some time to restore (the row with the corrupted data has since
been removed).
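Since the other errors showed the same shape, every affected row can be
enumerated by forcing detoasting row by row and trapping the error. A sketch
only, assuming the anonymized table zz with columns id and body from above:

```sql
-- Sketch: report every id whose toast data can no longer be detoasted.
-- "zz", "id" and "body" are the anonymized names used in this report.
DO $$
DECLARE
    r record;
BEGIN
    FOR r IN SELECT id FROM zz LOOP
        BEGIN
            -- md5() forces full detoasting of the value
            PERFORM md5(body) FROM zz WHERE id = r.id;
        EXCEPTION WHEN OTHERS THEN
            RAISE WARNING 'broken toast pointer for id %: %', r.id, SQLERRM;
        END;
    END LOOP;
END $$;
```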
--
Thanks,
Alexey Ermakov
On Wed, Nov 3, 2021 at 3:26 AM PG Bug reporting form
<noreply@postgresql.org> wrote:
it turns out one entry in toast index (?) was corrupted:
select md5(body) from zz where id = ...;
ERROR: missing chunk number 0 for toast value 4040061139 in
pg_toast_2624976286
According to "created_at" column in linked table row was created at
"2021-11-02 13:04:22.192125", i.e. during reindex concurrently.
I wonder if it's a coincidence that that number (~4.04 billion) is not
that far from 2^32-1 (~4.294 billion).
Can you run amcheck? Perhaps the output of the following will be interesting:
create extension amcheck;
set client_min_messages=debug1;
select bt_index_check('pg_toast.pg_toast_2624976286_index', true);
(Couldn't hurt to try it, at least.)
I'm wondering whether this is a known bug and how risky it is to reindex
toast indexes. The reindex was run automatically by a tool that monitors
index bloat; the index size was reduced several times over in this case.
If I had to guess, I'd guess that this is a new and unknown bug.
--
Peter Geoghegan
On Thu, Nov 4, 2021 at 3:07 AM Peter Geoghegan <pg@bowt.ie> wrote:
On Wed, Nov 3, 2021 at 3:26 AM PG Bug reporting form
<noreply@postgresql.org> wrote:
it turns out one entry in toast index (?) was corrupted:
select md5(body) from zz where id = ...;
ERROR: missing chunk number 0 for toast value 4040061139 in
pg_toast_2624976286
According to "created_at" column in linked table row was created at
"2021-11-02 13:04:22.192125", i.e. during reindex concurrently.
I wonder if it's a coincidence that that number (~4.04 billion) is not
that far from 2^32-1 (~4.294 billion).
Can you run amcheck? Perhaps the output of the following will be interesting:
create extension amcheck;
set client_min_messages=debug1;
select bt_index_check('pg_toast.pg_toast_2624976286_index', true);
(Couldn't hurt to try it, at least.)
Hi Peter,
Please see the result of amcheck:
select bt_index_check('pg_toast.pg_toast_2624976286_index', true);
DEBUG: verifying consistency of tree structure for index
"pg_toast_2624976286_index"
DEBUG: verifying level 3 (true root level)
DEBUG: verifying level 2
DEBUG: verifying level 1
DEBUG: verifying level 0 (leaf level)
DEBUG: leaf block 715360 of index "pg_toast_2624976286_index" has no
first data item
DEBUG: verifying that tuples from index "pg_toast_2624976286_index"
are present in "pg_toast_2624976286"
ERROR: heap tuple (59561917,1) from table "pg_toast_2624976286" lacks
matching index tuple within index "pg_toast_2624976286_index"
HINT: Retrying verification using the function
bt_index_parent_check() might provide a more specific error.
--
Maxim Boguk
Senior Postgresql DBA
https://dataegret.com/
Phone RU: +7 985 433 0000
Phone UA: +380 99 143 0000
Phone AU: +61 45 218 5678
LinkedIn: http://www.linkedin.com/pub/maksym-boguk/80/b99/b1b
Skype: maxim.boguk
"Doctor, you advised me not to do this, but why does it still hurt
when I do it again?"
On Thu, Nov 4, 2021 at 11:08 AM Maxim Boguk <maxim.boguk@gmail.com> wrote:
select bt_index_check('pg_toast.pg_toast_2624976286_index', true);
DEBUG: verifying consistency of tree structure for index
"pg_toast_2624976286_index"
DEBUG: verifying level 3 (true root level)
DEBUG: verifying level 2
DEBUG: verifying level 1
DEBUG: verifying level 0 (leaf level)
DEBUG: leaf block 715360 of index "pg_toast_2624976286_index" has no
first data item
DEBUG: verifying that tuples from index "pg_toast_2624976286_index"
are present in "pg_toast_2624976286"
ERROR: heap tuple (59561917,1) from table "pg_toast_2624976286" lacks
matching index tuple within index "pg_toast_2624976286_index"
HINT: Retrying verification using the function
bt_index_parent_check() might provide a more specific error.
That's an unusually large TOAST table. It's at least ~454.42GiB, based
on this error. Is the block number 59561917 near the end of the table?
Structural verification of the B-Tree itself doesn't throw any errors
here. This leaves little doubt that the problem is that the index is
missing entries that it really should have. That's not really new
information, though -- I was pretty sure of that much already.
--
Peter Geoghegan
On Thu, Nov 4, 2021 at 8:18 PM Peter Geoghegan <pg@bowt.ie> wrote:
On Thu, Nov 4, 2021 at 11:08 AM Maxim Boguk <maxim.boguk@gmail.com> wrote:
select bt_index_check('pg_toast.pg_toast_2624976286_index', true);
DEBUG: verifying consistency of tree structure for index
"pg_toast_2624976286_index"
DEBUG: verifying level 3 (true root level)
DEBUG: verifying level 2
DEBUG: verifying level 1
DEBUG: verifying level 0 (leaf level)
DEBUG: leaf block 715360 of index "pg_toast_2624976286_index" has no
first data item
DEBUG: verifying that tuples from index "pg_toast_2624976286_index"
are present in "pg_toast_2624976286"
ERROR: heap tuple (59561917,1) from table "pg_toast_2624976286" lacks
matching index tuple within index "pg_toast_2624976286_index"
HINT: Retrying verification using the function
bt_index_parent_check() might provide a more specific error.
That's an unusually large TOAST table. It's at least ~454.42GiB, based
on this error. Is the block number 59561917 near the end of the table?
select pg_size_pretty(pg_relation_size('pg_toast.pg_toast_2624976286'));
pg_size_pretty
----------------
473 GB
now... and yes, at the time of the error, page 59561917 was very close
to the end of the table.
There is a high chance (though not 100%) that the corresponding main
table entry was inserted during the REINDEX CONCURRENTLY run on the
toast index.
We have a base backup and WAL archive, so it's theoretically possible to
reconstruct the sequence of writes that led to the error, but given the
huge size of the relation in question (and the even bigger size of the
whole database, 10+TB) and the large volume of writes, it's a complicated
task (especially since I'm not really sure what exactly to look for in
the pg_waldump output).
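For what it's worth, a pg_waldump pass over the restored archive could narrow
things down. A sketch only, with hypothetical paths and segment names; on 13
there is no per-relation filter option, so the toast index's relfilenode has
to be grepped for in the block references:

```shell
# Sketch: dump Btree WAL records from the restored archive and grep for
# the toast index's relfilenode (paths and segment names are hypothetical).
RELFILENODE=$(psql -Atc \
  "SELECT pg_relation_filenode('pg_toast.pg_toast_2624976286_index')")
pg_waldump -p /restore/wal -r Btree \
    000000010000ABCD00000010 000000010000ABCD000000FF \
  | grep "rel .*/$RELFILENODE"
```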
On Thu, Nov 4, 2021 at 9:47 PM Maxim Boguk <maxim.boguk@gmail.com> wrote:
On Thu, Nov 4, 2021 at 8:18 PM Peter Geoghegan <pg@bowt.ie> wrote:
On Thu, Nov 4, 2021 at 11:08 AM Maxim Boguk <maxim.boguk@gmail.com> wrote:
select bt_index_check('pg_toast.pg_toast_2624976286_index', true);
DEBUG: verifying consistency of tree structure for index
"pg_toast_2624976286_index"
DEBUG: verifying level 3 (true root level)
DEBUG: verifying level 2
DEBUG: verifying level 1
DEBUG: verifying level 0 (leaf level)
DEBUG: leaf block 715360 of index "pg_toast_2624976286_index" has no
first data item
DEBUG: verifying that tuples from index "pg_toast_2624976286_index"
are present in "pg_toast_2624976286"
ERROR: heap tuple (59561917,1) from table "pg_toast_2624976286" lacks
matching index tuple within index "pg_toast_2624976286_index"
HINT: Retrying verification using the function
bt_index_parent_check() might provide a more specific error.
That's an unusually large TOAST table. It's at least ~454.42GiB, based
on this error. Is the block number 59561917 near the end of the table?
select pg_size_pretty(pg_relation_size('pg_toast.pg_toast_2624976286'));
pg_size_pretty
----------------
473 GB
now... and yes, at the time of the error, page 59561917 was very close
to the end of the table.
There is a high chance (though not 100%) that the corresponding main
table entry was inserted during the REINDEX CONCURRENTLY run on the
toast index.
UPDATE: the problem base table entry with the erroneous toast data was
definitely created during the REINDEX run on the toast index.
2021-11-02 13:15:45.469 UTC 2379 postgres@*** from [local] [vxid:24/0
txid:0] [REINDEX] LOG: duration: 1719120.441 ms statement: REINDEX
INDEX CONCURRENTLY pg_toast.pg_toast_2624976286_index
and the problem entry was created at 2021-11-02 13:04:22.192125 UTC.
So there seems to be some subtle bug in indexing new toast data during
REINDEX INDEX CONCURRENTLY of the toast index.
On Thu, Nov 4, 2021 at 12:47 PM Maxim Boguk <maxim.boguk@gmail.com> wrote:
now... and yes during the time of error page 59561917 was very close
to the end of the table.
There was a high chance (but not 100%) that the corresponding main
table entry had been inserted during reindex CONCURRENTLY of the toast
index run.
It might be useful if you located the leaf page that the missing index
tuple is supposed to be on. It's possible that there is a recognizable
pattern. If you knew the block number of the relevant leaf page in the
index already, then you could easily dump the relevant page to a small
file, and share it with us here. The usual procedure is described
here:
The tricky part is figuring out which block number is the one of
interest. I can't think of any easy way of doing that in a production
database. The easiest approach I can think of is to use
pg_buffercache. Restart the database (or more likely an instance of
the database that has the problem, but isn't the live production
database). Then write a query like this:
EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM pg_toast_2624976286 WHERE
chunk_id = 4040061139;
(The BUFFERS stuff is to verify that you got buffer hits.)
Next query the blocks that you see in pg_buffercache:
SELECT relblocknumber FROM pg_buffercache WHERE relfilenode =
pg_relation_filenode('pg_toast.pg_toast_2624976286_index');
Finally, note down the block numbers that the query returns. There
will probably be 3 or 4. Just send me any that are non-0 (that's just
the metapage). I only really care about the leaf page, but I can
figure that part out myself when I have the pages you access.
(It's a pity there isn't a less cumbersome procedure here.)
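The steps above can be condensed into a single psql session (a sketch; the
block to dump in step 3 is whichever non-zero number step 2 returns, and
715360 is only the leaf block that amcheck complained about earlier):

```sql
-- 1. Pull the relevant index pages into shared buffers.
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM pg_toast.pg_toast_2624976286 WHERE chunk_id = 4040061139;

-- 2. See which index blocks were touched (block 0 is just the metapage).
SELECT relblocknumber
FROM pg_buffercache
WHERE relfilenode = pg_relation_filenode('pg_toast.pg_toast_2624976286_index')
  AND relblocknumber <> 0
ORDER BY relblocknumber;

-- 3. Dump a page of interest in a shareable form (needs pageinspect).
SELECT encode(get_raw_page('pg_toast.pg_toast_2624976286_index', 715360),
              'base64');
```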
--
Peter Geoghegan
Hi,
On November 4, 2021 1:01:44 PM PDT, Maxim Boguk <maxim.boguk@gmail.com> wrote:
UPDATE: the problem base table entry which had erroneous toast data
definitely had been created during the REINDEX run of toast idx.
Do you have any details about the transaction / application that inserted the data?
2021-11-02 13:15:45.469 UTC 2379 postgres@*** from [local] [vxid:24/0
txid:0] [REINDEX] LOG: duration: 1719120.441 ms statement: REINDEX
INDEX CONCURRENTLY pg_toast.pg_toast_2624976286_index
and the problem entry had been created at 2021-11-02 13:04:22.192125 UTC.
So there seems some subtle bug with indexing new toast data during
REINDEX INDEX CONCURRENTLY of the toast index.
Any chance you're using prepared transactions?
Regards,
Andres
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.
On Sun, Nov 7, 2021 at 2:07 AM Andres Freund <andres@anarazel.de> wrote:
Hi,
On November 4, 2021 1:01:44 PM PDT, Maxim Boguk <maxim.boguk@gmail.com> wrote:
UPDATE: the problem base table entry which had erroneous toast data
definitely had been created during the REINDEX run of toast idx.
Do you have any details about the transaction / application that inserted the data?
Well... JDBC with server-side prepared statements off (pgbouncer).
2021-11-02 13:15:45.469 UTC 2379 postgres@*** from [local] [vxid:24/0
txid:0] [REINDEX] LOG: duration: 1719120.441 ms statement: REINDEX
INDEX CONCURRENTLY pg_toast.pg_toast_2624976286_index
and the problem entry had been created at 2021-11-02 13:04:22.192125 UTC.
So there seems some subtle bug with indexing new toast data during
REINDEX INDEX CONCURRENTLY of the toast index.
Any chance you're using prepared transactions?
Most definitely not; prepared transactions are disabled server-wide.
postgres=# show max_prepared_transactions ;
max_prepared_transactions
---------------------------
0
Hello,
On 2021-11-08 02:29, Maxim Boguk wrote:
On Sun, Nov 7, 2021 at 2:07 AM Andres Freund <andres@anarazel.de> wrote:
Hi,
On November 4, 2021 1:01:44 PM PDT, Maxim Boguk <maxim.boguk@gmail.com> wrote:
UPDATE: the problem base table entry which had erroneous toast data
definitely had been created during the REINDEX run of toast idx.
Do you have any details about the transaction / application that inserted the data?
Well... jdbc with server side prepared statements off (pgbouncer).
I was told that the application inserts rows into the table in question with
queries like:
INSERT INTO zz (id, body) VALUES (?, ?) ON CONFLICT (id) DO NOTHING
inside a transaction. The same id may be inserted several times inside one
transaction (in that case that id is usually a new one, i.e. it was not
present in the table before that transaction).
Hope this helps in some way.
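For reference, the transaction shape described would look roughly like this
(a sketch; zz's real schema and value sizes are assumptions, beyond id being
unique and body being large enough to be stored out of line):

```sql
BEGIN;
-- the same id may be attempted more than once within one transaction;
-- only the first INSERT survives, later ones take the ON CONFLICT arm
INSERT INTO zz (id, body)
SELECT 42, string_agg(random()::text, '') FROM generate_series(1, 500)
ON CONFLICT (id) DO NOTHING;
INSERT INTO zz (id, body)
SELECT 42, string_agg(random()::text, '') FROM generate_series(1, 500)
ON CONFLICT (id) DO NOTHING;
COMMIT;
```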
--
Thanks,
Alexey Ermakov
Hi,
On 2021-11-04 20:07:28 +0200, Maxim Boguk wrote:
ERROR: heap tuple (59561917,1) from table "pg_toast_2624976286" lacks
matching index tuple within index "pg_toast_2624976286_index"
HINT: Retrying verification using the function
bt_index_parent_check() might provide a more specific error.
Could you use https://www.postgresql.org/docs/devel/pageinspect.html
to get the heap items for this page? Best after verifying that that ctid still
shows an amcheck error. Something like
SELECT * FROM page_header(get_raw_page('pg_toast_2624976286', 59561917));
SELECT lp, lp_off, lp_flags, lp_len, t_xmin, t_xmax, t_field3, t_ctid, t_infomask2, t_infomask, mask.raw_flags, mask.combined_flags, t_hoff, t_bits
FROM heap_page_items(get_raw_page('pg_toast_2624976286', 59561917)),
heap_tuple_infomask_flags(t_infomask, t_infomask2) AS mask;
Do you have WAL archiving or such set up? If you still have the WAL from that
time it'd be helpful.
Greetings,
Andres Freund
Hi,
On 2021-11-04 22:01:44 +0200, Maxim Boguk wrote:
2021-11-02 13:15:45.469 UTC 2379 postgres@*** from [local] [vxid:24/0
txid:0] [REINDEX] LOG: duration: 1719120.441 ms statement: REINDEX
INDEX CONCURRENTLY pg_toast.pg_toast_2624976286_index
and the problem entry had been created at 2021-11-02 13:04:22.192125 UTC.
Another question on this. Are you running multiple reindexes concurrently? Other
maintenance commands?
Greetings,
Andres Freund
Hi,
On 2021-11-08 17:04:40 +0600, Alexey Ermakov wrote:
I was told that application inserts rows into given table with queries like:
INSERT INTO zz (id, body) VALUES (?, ?) ON CONFLICT (id) DO NOTHING
inside transaction. It's possible to attempt to insert row with same id
several times inside one transaction (in that case usually that id is new
one i.e. was not presented in the table before that transaction).
Hope it could help in some way.
It did. I've managed to reproduce an issue like this. I'm still narrowing down
the reproducer, but here's what I have so far:
DROP TABLE IF EXISTS wide; CREATE TABLE wide(id serial primary key, wide text);
pgbench -n -Mprepared -T1000 -f ~/tmp/insert_cci_bug.sql -c 16 -j 16 postgres
In a concurrent psql session:
SELECT indrelid::regclass, indexrelid::regclass, bt_index_parent_check(indexrelid::regclass, true) FROM pg_index WHERE indrelid = (SELECT reltoastrelid FROM pg_class WHERE oid = 'wide'::regclass);
REINDEX INDEX CONCURRENTLY pg_toast.pg_toast_331092_index;
usually after 1-3 iterations the index is corrupt. Reproduces both in 13 and
HEAD.
Greetings,
Andres Freund
Attachment: insert_cci_bug.sql (application/sql)
Hi,
On 2021-11-08 10:36:46 -0800, Andres Freund wrote:
DROP TABLE IF EXISTS wide; CREATE TABLE wide(id serial primary key, wide text);
pgbench -n -Mprepared -T1000 -f ~/tmp/insert_cci_bug.sql -c 16 -j 16 postgres
in concurrent psql
SELECT indrelid::regclass, indexrelid::regclass, bt_index_parent_check(indexrelid::regclass, true) FROM pg_index WHERE indrelid = (SELECT reltoastrelid FROM pg_class WHERE oid = 'wide'::regclass);
REINDEX INDEX CONCURRENTLY pg_toast.pg_toast_331092_index;
usually after 1-3 iterations the index is corrupt. Reproduces both in 13 and
HEAD.
Oh, wow (or ugh). It narrows down quite a bit. A single pgbench session
running
INSERT INTO wide(wide) SELECT string_agg(g.i::text || random()::text, '') FROM generate_series(1, 500) g(i);
triggers the issue. Sometimes takes quite a few iterations of REINDEX INDEX
CONCURRENTLY, but...
Greetings,
Andres Freund
PS:
to trigger the reindex / validate I use "\i /tmp/ri.sql", where /tmp/ri.sql
contains:
SELECT 'REINDEX INDEX CONCURRENTLY '||indexrelid::regclass FROM pg_index WHERE indrelid = (SELECT reltoastrelid FROM pg_class WHERE oid = 'wide'::regclass) \gexec
SELECT indrelid::regclass, indexrelid::regclass, bt_index_parent_check(indexrelid::regclass, true) FROM pg_index WHERE indrelid = (SELECT reltoastrelid FROM pg_class WHERE oid = 'wide'::regclass);
Hi,
On 2021-11-08 10:53:17 -0800, Andres Freund wrote:
Oh, wow (or ugh). It narrows down quite a bit. A single pgbench session
running
INSERT INTO wide(wide) SELECT string_agg(g.i::text || random()::text, '') FROM generate_series(1, 500) g(i);
triggers the issue. Sometimes takes quite a few iterations of REINDEX INDEX
CONCURRENTLY, but...
After several wrong turns, I found the cause of the problem:
The problem is that toast_save_datum() releases the lock on the toast relation
directly after inserting. That breaks the interlock between
WaitForLockersMultiple() in ReindexRelationConcurrently() and the "toast
inserter". The comments in ReindexRelationConcurrently() don't meaningfully
explain the need for that interlock, but the equivalent case in DefineIndex()
does:
/*
* Phase 2 of concurrent index build (see comments for validate_index()
* for an overview of how this works)
*
* Now we must wait until no running transaction could have the table open
* with the old list of indexes. Use ShareLock to consider running
* transactions that hold locks that permit writing to the table. Note we
* do not need to worry about xacts that open the table for writing after
* this point; they will see the new index when they open it.
*
* Note: the reason we use actual lock acquisition here, rather than just
* checking the ProcArray and sleeping, is that deadlock is possible if
* one of the transactions in question is blocked trying to acquire an
* exclusive lock on our table. The lock code will detect deadlock and
* error out properly.
*/
WaitForLockers(heaplocktag, ShareLock, true);
The problem that this causes is that the toast-inserting backend will not
necessarily have seen the new index at this point (since there was no need to
process relcache invalidations), but the scan below validate_index() will not
necessarily find new tuples inserted after the start of the scan (particularly
if they're in a new portion of the table).
I verified that if I change the table_close() in toast_save_datum() to not
release the lock the problem doesn't happen anymore.
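In diff form, the experiment described above amounts to something like this
(a sketch against the 13-era toast_internals.c, shown only to illustrate the
idea, not as a committed fix):

```diff
--- a/src/backend/access/common/toast_internals.c
+++ b/src/backend/access/common/toast_internals.c
@@ toast_save_datum()
-	table_close(toastrel, RowExclusiveLock);
+	/*
+	 * Keep the lock until end of transaction, preserving the
+	 * WaitForLockers() interlock with REINDEX CONCURRENTLY.
+	 */
+	table_close(toastrel, NoLock);
```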
This problem doesn't exist for CREATE INDEX CONCURRENTLY afaict, because we
don't allow that on system tables, including toast tables. But we *do* reindex
toast indexes concurrently.
One possible way to fix this would be to make ReindexRelationConcurrently()
acquire a lock on the underlying table when reindexing a toast table. Another
to not release the lock in toast_save_datum().
Greetings,
Andres Freund
On Mon, Nov 08, 2021 at 12:36:41PM -0800, Andres Freund wrote:
One possible way to fix this would be to make ReindexRelationConcurrently()
acquire a lock on the underlying table when reindexing a toast table. Another
to not release the lock in toast_save_datum().
The latter is more future-proof. Does it have material disadvantages?
Hi,
On November 8, 2021 7:56:24 PM PST, Noah Misch <noah@leadboat.com> wrote:
On Mon, Nov 08, 2021 at 12:36:41PM -0800, Andres Freund wrote:
One possible way to fix this would be to make ReindexRelationConcurrently()
acquire a lock on the underlying table when reindexing a toast table. Another
to not release the lock in toast_save_datum().
The latter is more future-proof. Does it have material disadvantages?
I don't immediately see any. But I've long been of the opinion, and have had
plenty of discussions around it, that our habit of releasing locks early is
far too widely used.
I do however wonder if there's another path to the problem, besides saving
toast datums. We also release those locks early in other places, and while
e.g. r/o locks won't cause a problem with this specific interlock, they could
cause problems around dropping the relation, for example.
Andres
On Mon, Nov 08, 2021 at 08:37:58PM -0800, Andres Freund wrote:
On November 8, 2021 7:56:24 PM PST, Noah Misch <noah@leadboat.com> wrote:
On Mon, Nov 08, 2021 at 12:36:41PM -0800, Andres Freund wrote:
One possible way to fix this would be to make ReindexRelationConcurrently()
acquire a lock on the underlying table when reindexing a toast table. Another
to not release the lock in toast_save_datum().
Thanks for the test case. That reproduces really quickly.
The latter is more future-proof. Does it have material disadvantages?
I don't immediately see any. But I've long been of the opinion, and
have had plenty of discussions around it, that our habit of releasing
locks early is far too widely used.
Yes, I'd agree that not patching the reindex concurrent path would be
safer in the long run. This feels a bit like e629a01, in spirit, not
in scope.
I do however wonder if there's another path to the problem, besides
saving toast datums. We also release those locks early in other
places, and while e.g. r/o locks won't cause a problem with this
specific interlock, they could cause problems around dropping the
relation, for example.
Hmm. Perhaps there could be some latent issue around
toast_delete_datum()?
--
Michael
Hi,
On 2021-11-09 14:02:19 +0900, Michael Paquier wrote:
On Mon, Nov 08, 2021 at 08:37:58PM -0800, Andres Freund wrote:
On November 8, 2021 7:56:24 PM PST, Noah Misch <noah@leadboat.com> wrote:
On Mon, Nov 08, 2021 at 12:36:41PM -0800, Andres Freund wrote:
One possible way to fix this would be to make ReindexRelationConcurrently()
acquire a lock on the underlying table when reindexing a toast table. Another
to not release the lock in toast_save_datum().
Thanks for the test case. That reproduces really quickly.
The latter is more future-proof. Does it have material disadvantages?
I don't immediately see any. But I've long been of the opinion, and
have had plenty of discussions around it, that our habit of releasing
locks early is far too widely used.
Yes, I'd agree that not patching the reindex concurrent path would be
safer in the long run. This feels a bit like e629a01, in spirit, not
in scope.
I wonder if we should do both...
Greetings,
Andres Freund
On Mon, Nov 08, 2021 at 09:08:48PM -0800, Andres Freund wrote:
On 2021-11-09 14:02:19 +0900, Michael Paquier wrote:
On Mon, Nov 08, 2021 at 08:37:58PM -0800, Andres Freund wrote:
On November 8, 2021 7:56:24 PM PST, Noah Misch <noah@leadboat.com> wrote:
On Mon, Nov 08, 2021 at 12:36:41PM -0800, Andres Freund wrote:
One possible way to fix this would be to make ReindexRelationConcurrently()
acquire a lock on the underlying table when reindexing a toast table. Another
to not release the lock in toast_save_datum().
The latter is more future-proof. Does it have material disadvantages?
I don't immediately see any. But I've long been of the opinion, and
have had plenty of discussions around it, that our habit of releasing
locks early is far too widely used.
Yes, I'd agree that not patching the reindex concurrent path would be
safer in the long run. This feels a bit like e629a01, in spirit, not
in scope.
I wonder if we should do both...
Doing both sounds harmless. Regarding REINDEX CONCURRENTLY on a system
catalog, I bet that can still reach bugs even if we do both, considering this:
$ git grep 'table_close.*, RowExclusiveLock' | wc -l
288