ERROR: posting list tuple with 20 items cannot be split at offset 168

Started by Herman verschootenover 4 years ago10 messagesbugs
Jump to latest
#1Herman verschooten
Herman@verschooten.net

Hi,

I upgraded postgres 13 to 14+231.pgdg18.04+1 during the weekend, virtual ubuntu 20.04.3.

this morning a user reported an error when trying to change a flag in our web app (Phoenix, ecto).
I traced it back to an index issue on a boolean column.

This is the table:

tranman_production=# \d freights
Table "public.freights"
Column | Type | Collation | Nullable | Default
--------------+-----------------------------+-----------+----------+--------------------------------------
id | integer | | not null | nextval('freights_id_seq'::regclass)
order_id | integer | | |
product_id | integer | | |
amount | integer | | |
reference_1 | character varying | | |
reference_2 | character varying | | |
reference_3 | character varying | | |
reference_4 | character varying | | |
status | integer | | | 0
created_at | timestamp without time zone | | not null |
updated_at | timestamp without time zone | | not null |
cmr | bigint | | |
amount_out | integer | | not null | 0
paused | boolean | | | false
cmr_received | boolean | | | false
Indexes:
"freights_pkey" PRIMARY KEY, btree (id)
"index_freights_on_cmr" btree (cmr)
"index_freights_on_cmr_received" btree (cmr_received)
"index_freights_on_order_id" btree (order_id)
"index_freights_on_product_id" btree (product_id)
"index_freights_on_reference_1" btree (reference_1)
"index_freights_on_reference_2" btree (reference_2)
"index_freights_on_reference_3" btree (reference_3)
"index_freights_on_reference_4" btree (reference_4)
"index_freights_on_status" btree (status)
Foreign-key constraints:
"fk_rails_21fdf332ee" FOREIGN KEY (product_id) REFERENCES products(id)
"fk_rails_af9f7b0831" FOREIGN KEY (order_id) REFERENCES orders(id)
Referenced by:
TABLE "freight_docs" CONSTRAINT "fk_rails_10448cfa4f" FOREIGN KEY (freight_id) REFERENCES freights(id)
TABLE "freight_events" CONSTRAINT "fk_rails_20c83ae774" FOREIGN KEY (freight_id) REFERENCES freights(id)
TABLE "rides" CONSTRAINT "fk_rails_a50483d9c3" FOREIGN KEY (freight_id) REFERENCES freights(id)

And this is what happens:

tranman_production=# update freights set cmr_received=false where id=49632;
ERROR: XX000: posting list tuple with 20 items cannot be split at offset 168
LOCATION: _bt_swap_posting, nbtdedup.c:1037

If I drop the index index_freights_on_cmr_received, then the update succeeds.

If I recreate the index, the error resurfaces.

Note that it does not happen for every row in the database.

Sincerely,

Herman verschooten

#2Dmitry Dolgov
9erthalion6@gmail.com
In reply to: Herman verschooten (#1)
Re: ERROR: posting list tuple with 20 items cannot be split at offset 168

On Mon, Oct 25, 2021 at 11:58:48AM +0200, Herman verschooten wrote:
Hi,

I upgraded postgres 13 to 14+231.pgdg18.04+1 during the weekend, virtual ubuntu 20.04.3.

this morning a user reported an error when trying to change a flag in our web app (Phoenix, ecto).
I traced it back to an index issue on a boolean column.

Hi,

thanks for reporting the issue.

tranman_production=# update freights set cmr_received=false where id=49632;
ERROR: XX000: posting list tuple with 20 items cannot be split at offset 168
LOCATION: _bt_swap_posting, nbtdedup.c:1037

If I drop the index index_freights_on_cmr_received, then the update succeeds.

If I recreate the index, the error resurfaces.

Note that it does not happen for every row in the database.

Interesting, during posting list split the postingoff is bigger than the
number of elements in the posting list. Have you tried to run amcheck
[1]: https://www.postgresql.org/docs/13/amcheck.html
important, as the index recreation doesn't help. Otherwise, is there a
chance to use pageinspect [2]https://www.postgresql.org/docs/13/pageinspect.html to take a look at the structure if the
index?

[1]: https://www.postgresql.org/docs/13/amcheck.html
[2]: https://www.postgresql.org/docs/13/pageinspect.html

#3Dmitry Dolgov
9erthalion6@gmail.com
In reply to: Herman verschooten (#1)
Re: ERROR: posting list tuple with 20 items cannot be split at offset 168

On Mon, Oct 25, 2021 at 02:41:43PM +0200, Herman verschooten wrote:
I made a mistake apparently on checking the wrong thing.
someone on slack helped me.

tranman_production=# SELECT bt_index_check(c.oid), c.relname, c.relpages
tranman_production-# FROM pg_index i
tranman_production-# JOIN pg_opclass op ON i.indclass[0] = op.oid
tranman_production-# JOIN pg_am am ON op.opcmethod = am.oid
tranman_production-# JOIN pg_class c ON i.indexrelid = c.oid
tranman_production-# JOIN pg_namespace n ON c.relnamespace = n.oid
tranman_production-# WHERE am.amname = 'btree' AND n.nspname = 'public'
tranman_production-# -- Don't check temp tables, which may be from another session:
tranman_production-# AND c.relpersistence != 't'
tranman_production-# -- Function may throw an error when this is omitted:
tranman_production-# AND i.indisready AND i.indisvalid
tranman_production-# ORDER BY c.relpages;
ERROR: item order invariant violated for index "index_products_on_group"
DETAIL: Lower index tid=(1,353) (points to heap tid=(19,8)) higher index tid=(1,354) (points to heap tid=(14,14)) page lsn=5/CEC25DA8.
tranman_production=# SELECT bt_index_check(c.oid), c.relname, c.relpages
FROM pg_index i
JOIN pg_opclass op ON i.indclass[0] = op.oid
JOIN pg_am am ON op.opcmethod = am.oid
JOIN pg_class c ON i.indexrelid = c.oid
JOIN pg_namespace n ON c.relnamespace = n.oid
WHERE am.amname = 'btree' AND n.nspname = 'public'
-- Don't check temp tables, which may be from another session:
AND c.relpersistence != 't'
-- Function may throw an error when this is omitted:
AND i.indisready AND i.indisvalid
ORDER BY c.relpages;
ERROR: item order invariant violated for index "index_products_on_group"
DETAIL: Lower index tid=(1,353) (points to heap tid=(19,8)) higher index tid=(1,354) (points to heap tid=(14,14)) page lsn=5/CEC25DA8.

I dropped that index and ran it again, and kept on dropping indexes

ERROR: item order invariant violated for index "index_cleanings_on_receipt"
DETAIL: Lower index tid=(39,21) (points to heap tid=(238,98)) higher index tid=(39,22) (points to heap tid=(243,7)) page lsn=1/B853F98.

ERROR: item order invariant violated for index "index_freights_on_reference_4"
DETAIL: Lower index tid=(34,51) (points to heap tid=(67,7)) higher index tid=(34,52) (points to heap tid=(84,2)) page lsn=6/7EFB68.

ERROR: item order invariant violated for index "index_freights_on_reference_3"
DETAIL: Lower index tid=(38,82) (points to heap tid=(362,75)) higher index tid=(38,83) (points to heap tid=(194,21)) page lsn=4/FEA79090.

ERROR: item order invariant violated for index "index_freights_on_reference_2"
DETAIL: Lower index tid=(10,44) (points to heap tid=(176,26)) higher index tid=(10,45) (points to heap tid=(215,60)) page lsn=5/46A4D90.

ERROR: item order invariant violated for index "index_freights_on_reference_1"
DETAIL: Lower index tid=(43,214) (points to heap tid=(112,46)) higher index tid=(43,215) (points to heap tid=(112,43)) page lsn=5/C3999990.

Now no more errors are displayed, and I can run my initial update again, with the index created.

I recreated the 6 indexes and everyhting still works.

Herman

Does it mean that everything is fine after recreating
index_freights_on_reference_<d> indexes? Well, that would mean they were
somehow corrupted, and reindexing is the way to go. Although it's not
clear for me, why the problem was temporarily avoided by dropping
another index on cmr_received.

PS: One small tip, please reply to all in the mailing list, otherwise it
would be harder for others to understand what happened with the discussion.

In reply to: Herman verschooten (#1)
Re: ERROR: posting list tuple with 20 items cannot be split at offset 168

On Mon, Oct 25, 2021 at 2:59 AM Herman verschooten
<Herman@verschooten.net> wrote:

tranman_production=# update freights set cmr_received=false where id=49632;
ERROR: XX000: posting list tuple with 20 items cannot be split at offset 168
LOCATION: _bt_swap_posting, nbtdedup.c:1037

If I drop the index index_freights_on_cmr_received, then the update succeeds.

What you see here is a defensive "can't happen" error that I added in
commit 8f72bbac, and backpatched to Postgres 13.4, which came out on
2021-08-12. The goal of that error is to make a possible hard crash
due to corruption into a slightly friendlier kind of failure (the
error that you see here). Were you running 13.4 before the upgrade?

If you were on 13.3 or earlier before the upgrade to 14, then it's
possible that the problem was there all along, but is only now visible
for the first time.

--
Peter Geoghegan

#5Herman verschooten
Herman@verschooten.net
In reply to: Peter Geoghegan (#4)
Re: ERROR: posting list tuple with 20 items cannot be split at offset 168

Hi Peter,

Thanks for the info.
I don’t recall the exact 13 version we were on.

I can confirm that after dropping and recreating the 6 indexes everything is working fine.

Thanks for all the help, both here and on slack,

Herman

Show quoted text

Op 25 okt. 2021, om 17:29 heeft Peter Geoghegan <pg@bowt.ie> het volgende geschreven:

On Mon, Oct 25, 2021 at 2:59 AM Herman verschooten
<Herman@verschooten.net> wrote:

tranman_production=# update freights set cmr_received=false where id=49632;
ERROR: XX000: posting list tuple with 20 items cannot be split at offset 168
LOCATION: _bt_swap_posting, nbtdedup.c:1037

If I drop the index index_freights_on_cmr_received, then the update succeeds.

What you see here is a defensive "can't happen" error that I added in
commit 8f72bbac, and backpatched to Postgres 13.4, which came out on
2021-08-12. The goal of that error is to make a possible hard crash
due to corruption into a slightly friendlier kind of failure (the
error that you see here). Were you running 13.4 before the upgrade?

If you were on 13.3 or earlier before the upgrade to 14, then it's
possible that the problem was there all along, but is only now visible
for the first time.

--
Peter Geoghegan

In reply to: Herman verschooten (#5)
Re: ERROR: posting list tuple with 20 items cannot be split at offset 168

On Mon, Oct 25, 2021 at 10:35 PM Herman verschooten
<Herman@verschooten.net> wrote:

I can confirm that after dropping and recreating the 6 indexes everything is working fine.

Did you also run VACUUM?

You said that not all rows seem to be affected. Is there any pattern
that you notice about the rows that are affected?

I understand (from your remarks onSlack) that you found that the
problem seemed to go away once you dropped the index named
index_freights_on_cmr_received. But once you recreated the index from
scratch, the problem came back.

In other words, rebuilding the index didn't seem to make the "posting
list tuple with 20 items cannot be split at offset 168" error go away,
but dropping the index appeared to *temporarily* resolve the issue?
Suggesting that the problem was either in the table, or in the code
that builds a new index structure by scanning the table?

Thanks for all the help, both here and on slack,

Thanks for the report!

--
Peter Geoghegan

#7Herman verschooten
Herman@verschooten.net
In reply to: Peter Geoghegan (#6)
Re: ERROR: posting list tuple with 20 items cannot be split at offset 168

Peter,

No we just noticed that we were unable to update that 1 record, and only that Boolean field. We did not search for other records that might have been affected.

I did as yet not run VACUUM, I’ll look into that later today.

Herman

Verstuurd vanaf mijn iPad

Show quoted text

Op 27 okt. 2021 om 03:45 heeft Peter Geoghegan <pg@bowt.ie> het volgende geschreven:

On Mon, Oct 25, 2021 at 10:35 PM Herman verschooten
<Herman@verschooten.net> wrote:

I can confirm that after dropping and recreating the 6 indexes everything is working fine.

Did you also run VACUUM?

You said that not all rows seem to be affected. Is there any pattern
that you notice about the rows that are affected?

I understand (from your remarks onSlack) that you found that the
problem seemed to go away once you dropped the index named
index_freights_on_cmr_received. But once you recreated the index from
scratch, the problem came back.

In other words, rebuilding the index didn't seem to make the "posting
list tuple with 20 items cannot be split at offset 168" error go away,
but dropping the index appeared to *temporarily* resolve the issue?
Suggesting that the problem was either in the table, or in the code
that builds a new index structure by scanning the table?

Thanks for all the help, both here and on slack,

Thanks for the report!

--
Peter Geoghegan

#8Andres Freund
andres@anarazel.de
In reply to: Herman verschooten (#1)
Re: ERROR: posting list tuple with 20 items cannot be split at offset 168

Hi,

On 2021-10-25 11:58:48 +0200, Herman verschooten wrote:

I upgraded postgres 13 to 14+231.pgdg18.04+1 during the weekend, virtual ubuntu 20.04.3.

What filesystem is the postgres data stored on?

this morning a user reported an error when trying to change a flag in our web app (Phoenix, ecto).
I traced it back to an index issue on a boolean column.

If I drop the index index_freights_on_cmr_received, then the update succeeds.

If I recreate the index, the error resurfaces.

Any chance you still have the WAL around from since the update? If we could
scan the WAL to see what type of activity lead to the corruption, we might be
able to narrow the cause down more easily.

Greetings,

Andres Freund

#9Herman verschooten
Herman@verschooten.net
In reply to: Andres Freund (#8)
Re: ERROR: posting list tuple with 20 items cannot be split at offset 168

Hi,

The filesystem is EXT4 on a Proxmox container using LVM-thin.

I have to check, but I believe there may still be a complete backup of the instance.

Herman

Show quoted text

Op 28 okt. 2021, om 23:47 heeft Andres Freund <andres@anarazel.de> het volgende geschreven:

Hi,

On 2021-10-25 11:58:48 +0200, Herman verschooten wrote:

I upgraded postgres 13 to 14+231.pgdg18.04+1 during the weekend, virtual ubuntu 20.04.3.

What filesystem is the postgres data stored on?

this morning a user reported an error when trying to change a flag in our web app (Phoenix, ecto).
I traced it back to an index issue on a boolean column.

If I drop the index index_freights_on_cmr_received, then the update succeeds.

If I recreate the index, the error resurfaces.

Any chance you still have the WAL around from since the update? If we could
scan the WAL to see what type of activity lead to the corruption, we might be
able to narrow the cause down more easily.

Greetings,

Andres Freund

In reply to: Herman verschooten (#9)
Re: ERROR: posting list tuple with 20 items cannot be split at offset 168

On Fri, Oct 29, 2021 at 1:47 AM Herman verschooten
<Herman@verschooten.net> wrote:

The filesystem is EXT4 on a Proxmox container using LVM-thin.

I have to check, but I believe there may still be a complete backup of the instance.

Thanks for getting back about this.

A fix for this bug will appear in Postgres 14.1. In the meantime I
suggest that you avoid using parallel VACUUM. For example, any scripts
that run VACUUM could be temporarily changed to make sure that
parallelism won't be used. If the script says "vacuum my_table;", then
that could be changed to "vacuum (parallel 0) my_table;", for example.
Just until you can upgrade to the next point release.

--
Peter Geoghegan