BUG #16595: Reads fail with "lost saved point in index" error after writes

Started by PG Bug reporting formalmost 6 years ago5 messagesbugs

noreply@postgresql.org

almost 6 years ago

The following bug has been logged on the website:

Bug reference: 16595
Logged by: Jesse Kinkead
Email address: jesse@falkon.ai
PostgreSQL version: 11.6
Operating system: Linux
Description:

We've seen this on a few different servers running Postgres 11.6.

The table in question has a JSONB column with a GIN index, defined as:

"index_name_idx" gin (jsonb_column_name jsonb_path_ops, integer_column,
timestamp_with_time_zone_column)

Under load (especially writes followed immediately by reads), we
occasionally see reads failing an internal exception (error code XX000) with
the message "lost saved point in index".

This seems to come from this source file:
https://doxygen.postgresql.org/ginget_8c_source.html

To further complicate matters, this server is using TimescaleDB
(https://www.timescale.com/), and has this table configured as a hypertable
(https://docs.timescale.com/latest/using-timescaledb/hypertables), which I
believe means it's actually multiple tables masquerading as a single
table.

The errors are quite frequent when running with slow CPUs / disks, but more
rare when running with more resources.

Tom Lane

tgl@sss.pgh.pa.us

almost 6 years ago

In reply to: PG Bug reporting form (#1)

Re: BUG #16595: Reads fail with "lost saved point in index" error after writes

PG Bug reporting form <noreply@postgresql.org> writes:

The table in question has a JSONB column with a GIN index, defined as:
"index_name_idx" gin (jsonb_column_name jsonb_path_ops, integer_column,
timestamp_with_time_zone_column)
Under load (especially writes followed immediately by reads), we
occasionally see reads failing an internal exception (error code XX000) with
the message "lost saved point in index".

[ stares at that for awhile... ] Hm, I think this is just broken for
multi-column GIN indexes. The code is expecting that it can re-find
an index item that was there before, and that the item must either be
where it was before in the index or somewhere to the right. Both
of those assumptions seem valid, since ginvacuum never removes items;
but insertions could push the item to the right.

But this:

if (gintuple_get_attrnum(btree->ginstate, itup) != attnum)
elog(ERROR, "lost saved point in index"); /* must not happen !!! */

amounts to an assumption that things can't get pushed so far to the
right that our pointer is now pointing at an item with lower attnum.
There's no reason for that to be true --- AFAICS, the attnum is the
first sort key for the items, but there's not physical segregation
into different index pages or anything like that.

We could throw an error if we reach a higher attnum, but since the
whole thing is supposed to be can't-happen, I doubt it's worth
expending code on. What we should do is just advance over any
item that hasn't got the right attnum.

regards, tom lane

Jesse Kinkead

jesse@falkon.ai

almost 6 years ago

In reply to: Tom Lane (#2)

Re: BUG #16595: Reads fail with "lost saved point in index" error after writes

Thanks for the quick reply!

It sounds like this isn't a known problem, and that the only workaround
might be just dropping the index. Hmm!

So far, we've had fine luck simply retrying on error, and retries typically
succeed, so that's also a fine path forward.

Thanks again!

- jesse

On Thu, Aug 27, 2020 at 8:01 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Show quoted text

PG Bug reporting form <noreply@postgresql.org> writes:

The table in question has a JSONB column with a GIN index, defined as:
"index_name_idx" gin (jsonb_column_name jsonb_path_ops, integer_column,
timestamp_with_time_zone_column)
Under load (especially writes followed immediately by reads), we
occasionally see reads failing an internal exception (error code XX000)

with

the message "lost saved point in index".

[ stares at that for awhile... ] Hm, I think this is just broken for
multi-column GIN indexes. The code is expecting that it can re-find
an index item that was there before, and that the item must either be
where it was before in the index or somewhere to the right. Both
of those assumptions seem valid, since ginvacuum never removes items;
but insertions could push the item to the right.

But this:

if (gintuple_get_attrnum(btree->ginstate, itup) != attnum)
elog(ERROR, "lost saved point in index"); /* must
not happen !!! */

amounts to an assumption that things can't get pushed so far to the
right that our pointer is now pointing at an item with lower attnum.
There's no reason for that to be true --- AFAICS, the attnum is the
first sort key for the items, but there's not physical segregation
into different index pages or anything like that.

We could throw an error if we reach a higher attnum, but since the
whole thing is supposed to be can't-happen, I doubt it's worth
expending code on. What we should do is just advance over any
item that hasn't got the right attnum.

regards, tom lane

Tom Lane

tgl@sss.pgh.pa.us

almost 6 years ago

In reply to: Jesse Kinkead (#3)

Re: BUG #16595: Reads fail with "lost saved point in index" error after writes

Jesse Kinkead <jesse@falkon.ai> writes:

It sounds like this isn't a known problem,

Well, now it is :-). I'm just finishing up testing a patch.

and that the only workaround
might be just dropping the index. Hmm!

Single-column GIN indexes aren't subject to the issue, so maybe you
could work around it by replacing the multicolumn index with some
single-column ones?

Also, now that I've studied the code some more, the code path with
the issue is only hit for queries that use (a) partial match or
(b) an empty query that forces a full-index scan, similar to
"arraycol @> '{}'". That might explain how come the bug went
undetected for so long. I dunno if that info will help you dodge
the bug, but maybe.

If you can't find any acceptable workaround, another idea is to
build your own server version with the patch applied, to tide
you over until there's an official release with the fix. IMO
one of the main benefits of using open-source code is that that's
normally not very difficult.

regards, tom lane

Jesse Kinkead

jesse@falkon.ai

almost 6 years ago

In reply to: Tom Lane (#4)

Re: BUG #16595: Reads fail with "lost saved point in index" error after writes

On Thu, Aug 27, 2020 at 1:39 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Jesse Kinkead <jesse@falkon.ai> writes:

It sounds like this isn't a known problem,

Well, now it is :-). I'm just finishing up testing a patch.

Awesome!

and that the only workaround
might be just dropping the index. Hmm!

Single-column GIN indexes aren't subject to the issue, so maybe you
could work around it by replacing the multicolumn index with some
single-column ones?

This might be an option for us to try out; thank you.

Also, now that I've studied the code some more, the code path with
the issue is only hit for queries that use (a) partial match or
(b) an empty query that forces a full-index scan, similar to
"arraycol @> '{}'". That might explain how come the bug went
undetected for so long. I dunno if that info will help you dodge
the bug, but maybe.

Interestingly, we're not querying on empty values, but we ARE querying
on multiple values at the same time (jsonb_column @> "value1" AND
jsonb_column @> "value2").

If you can't find any acceptable workaround, another idea is to
build your own server version with the patch applied, to tide
you over until there's an official release with the fix. IMO
one of the main benefits of using open-source code is that that's
normally not very difficult.

Indeed! And they frequently have very responsive maintainers. :)