BUG #16792: silent corruption of GIN index resulting in SELECTs returning non-matching rows

Started by PG Bug reporting formover 5 years ago56 messagesbugs
Jump to latest
#1PG Bug reporting form
noreply@postgresql.org

The following bug has been logged on the website:

Bug reference: 16792
Logged by: pawel kudzia
Email address: kudzia@gmail.com
PostgreSQL version: 11.10
Operating system: Debian Buster x86_64
Description:

I'd like to ask for advice how to proceed with pin-pointing cause of the
silent corruption of GIN index that I'm facing.

Relevant part of the table structure:

CREATE TABLE public.entity (
entity_id bigint NOT NULL,
attribute_value_ids integer[] NOT NULL,
attribute_name_ids integer[] NOT NULL,
[..]
);

ALTER TABLE ONLY public.entity ADD CONSTRAINT entity_pkey PRIMARY KEY
(entity_id);
CREATE INDEX entity_attribute_name_ids_gin ON public.entity USING gin
(attribute_name_ids public.gin__int_ops);
CREATE INDEX entity_attribute_value_ids_gin ON public.entity USING gin
(attribute_value_ids public.gin__int_ops);

How does the issue manifest?

Queries which use GIN on integer[] column occasionally return too many rows,
including ones that actually do not match criteria expressed in WHERE.
Queries like below should never return any rows, yet - occasionally they do
return some results:

data=> SELECT entity_id FROM entity WHERE ( attribute_name_ids && '{4980}' )
AND NOT ( (attribute_name_ids||0) && '{4980}') ;
entity_id
-----------
31213924
31195117
31209184
(3 rows)

Query plan:
QUERY PLAN
-------------------------------------------------------------------------------------------------
Bitmap Heap Scan on entity (cost=50.39..3068.11 rows=26923 width=22)
Recheck Cond: (attribute_name_ids && '{4980}'::integer[])
Filter: (NOT ((attribute_name_ids || 0) && '{4980}'::integer[]))
-> Bitmap Index Scan on entity_attribute_name_ids_gin (cost=0.00..50.39
rows=27058 width=0)
Index Cond: (attribute_name_ids && '{4980}'::integer[])
(5 rows)

The query was crafted intentionally to let PostgreSQL use GIN index
entity_attribute_name_ids_gin [ left side of AND ] and then filter out rows
that after inspection of their values do not actually match that criteria.

What have I done so far?

Originally the problem was discovered when running PostgreSQL 11.7 from
Debian's repo. In the first step we've upgraded to PostgreSQL 11.9 also from
Debian's repo and run:

reindex (verbose) table entity; vacuum (verbose, full, analyze) entity;

After few days of updates the problem returned.

We've upgraded to PostgreSQL 11.10 from postgresql.org repository for Debian
and reindexed / vacuumed again. After few weeks of updates problem returned
again.

Other information worth noting:
* table in question has 38M rows and is the only table created after
PostgreSQL installation
* server in question is master of streaming replication; issue occurs also
on the slave servers - this suggests corruption of the on-disk GIN index
data
* just rewriting rows falsely returned by select above - UPDATE entity SET
attribute_name_ids ='{....}' WHERE entity_id=123 - fixes the issue
* we've set up rudimentary consistency checks that allow to catch some of
the corruptions within 1h of occurring - so far corruptions only happen to
rows that have been updated since the previous check
* server in question handles heavy read/write traffic
* PostgreSQL in question runs in LXC container with Debian Buster running on
top of Debian Buster running on top of bare-metal server
* it's highly unlikely that hardware malfunction is to be blamed - the same
physical server handles also few MySQL instances with hundreds of GB of data
with heavy consistency checks cross checking content of MySQL with
PostgreSQL, unpacking gzip'ed blobs stored in MySQL databases and
de-serializing objects stored there. if there was a bit-rot / bit-flip in
memory would that's not detected or fixed by ECC - high level consistency
checks would pick it.
* despite numerous attempts i cannot reproduce the issue in test
environment, possibly due to much lower level of write traffic. it takes
days to weeks to have it re-occur on the production server.

Thank you in advance for your suggestions how to tackle this.

best regards,
Pawel

#2Pawel Kudzia
kudzia@gmail.com
In reply to: PG Bug reporting form (#1)
Re: BUG #16792: silent corruption of GIN index resulting in SELECTs returning non-matching rows

On Sun, Dec 27, 2020 at 6:19 PM PG Bug reporting form
<noreply@postgresql.org> wrote:

The following bug has been logged on the website:

Bug reference: 16792
Logged by: pawel kudzia
Email address: kudzia@gmail.com
PostgreSQL version: 11.10
Operating system: Debian Buster x86_64
Description:

I'd like to ask for advice how to proceed with pin-pointing cause of the
silent corruption of GIN index that I'm facing.

Relevant part of the table structure:

CREATE TABLE public.entity (
entity_id bigint NOT NULL,
attribute_value_ids integer[] NOT NULL,
attribute_name_ids integer[] NOT NULL,
[..]
);

ALTER TABLE ONLY public.entity ADD CONSTRAINT entity_pkey PRIMARY KEY
(entity_id);
CREATE INDEX entity_attribute_name_ids_gin ON public.entity USING gin
(attribute_name_ids public.gin__int_ops);
CREATE INDEX entity_attribute_value_ids_gin ON public.entity USING gin
(attribute_value_ids public.gin__int_ops);

How does the issue manifest?

Queries which use GIN on integer[] column occasionally return too many rows,
including ones that actually do not match criteria expressed in WHERE.
Queries like below should never return any rows, yet - occasionally they do
return some results:

data=> SELECT entity_id FROM entity WHERE ( attribute_name_ids && '{4980}' )
AND NOT ( (attribute_name_ids||0) && '{4980}') ;
entity_id
-----------
31213924
31195117
31209184
(3 rows)

Query plan:
QUERY PLAN
-------------------------------------------------------------------------------------------------
Bitmap Heap Scan on entity (cost=50.39..3068.11 rows=26923 width=22)
Recheck Cond: (attribute_name_ids && '{4980}'::integer[])
Filter: (NOT ((attribute_name_ids || 0) && '{4980}'::integer[]))
-> Bitmap Index Scan on entity_attribute_name_ids_gin (cost=0.00..50.39
rows=27058 width=0)
Index Cond: (attribute_name_ids && '{4980}'::integer[])
(5 rows)

The query was crafted intentionally to let PostgreSQL use GIN index
entity_attribute_name_ids_gin [ left side of AND ] and then filter out rows
that after inspection of their values do not actually match that criteria.

What have I done so far?

Originally the problem was discovered when running PostgreSQL 11.7 from
Debian's repo. In the first step we've upgraded to PostgreSQL 11.9 also from
Debian's repo and run:

reindex (verbose) table entity; vacuum (verbose, full, analyze) entity;

After few days of updates the problem returned.

We've upgraded to PostgreSQL 11.10 from postgresql.org repository for Debian
and reindexed / vacuumed again. After few weeks of updates problem returned
again.

Other information worth noting:
* table in question has 38M rows and is the only table created after
PostgreSQL installation
* server in question is master of streaming replication; issue occurs also
on the slave servers - this suggests corruption of the on-disk GIN index
data
* just rewriting rows falsely returned by select above - UPDATE entity SET
attribute_name_ids ='{....}' WHERE entity_id=123 - fixes the issue
* we've set up rudimentary consistency checks that allow to catch some of
the corruptions within 1h of occurring - so far corruptions only happen to
rows that have been updated since the previous check
* server in question handles heavy read/write traffic
* PostgreSQL in question runs in LXC container with Debian Buster running on
top of Debian Buster running on top of bare-metal server
* it's highly unlikely that hardware malfunction is to be blamed - the same
physical server handles also few MySQL instances with hundreds of GB of data
with heavy consistency checks cross checking content of MySQL with
PostgreSQL, unpacking gzip'ed blobs stored in MySQL databases and
de-serializing objects stored there. if there was a bit-rot / bit-flip in
memory would that's not detected or fixed by ECC - high level consistency
checks would pick it.
* despite numerous attempts i cannot reproduce the issue in test
environment, possibly due to much lower level of write traffic. it takes
days to weeks to have it re-occur on the production server.

Thank you in advance for your suggestions how to tackle this.

best regards,
Pawel

I'd like to provide more details from another occurrence of this issue:

psql (11.7 (Debian 11.7-0+deb10u1), server 11.9 (Debian 11.9-0+deb10u1))

Example where GIN is returning misleading information:

data=# SELECT entity_id,attribute_name_ids FROM entity WHERE (
attribute_name_ids && '{471}' ) AND NOT ( (attribute_name_ids||0) &&
'{471}') ;
entity_id | attribute_name_ids
-----------+------------------------------------------------------------------------------
197911914 | {2,3,4,6,7,8,9,10,13,17,35,48,66,71,3270,3273,3279,4377,4842,5379,5787,7416}
(1 row)

data=# EXPLAIN ANALYZE SELECT entity_id,attribute_name_ids FROM
entity WHERE ( attribute_name_ids && '{471}' ) AND NOT (
(attribute_name_ids||0) && '{471}') ;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on entity (cost=71.04..3432.73 rows=29881
width=126) (actual time=64.179..65.255 rows=1 loops=1)
Recheck Cond: (attribute_name_ids && '{471}'::integer[])
Filter: (NOT ((attribute_name_ids || 0) && '{471}'::integer[]))
Rows Removed by Filter: 8814
Heap Blocks: exact=7877
-> Bitmap Index Scan on entity_attribute_name_ids_gin
(cost=0.00..71.04 rows=30031 width=0) (actual time=10.396..10.398
rows=13628 loops=1)
Index Cond: (attribute_name_ids && '{471}'::integer[])
Planning Time: 0.164 ms
Execution Time: 65.458 ms
(9 rows)

When GIN is not used results are as expected:

data=# EXPLAIN ANALYZE SELECT entity_id,attribute_name_ids FROM
entity WHERE entity_id=197911914 AND attribute_name_ids && '{471}';
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------
Index Scan using entity_pkey on entity (cost=0.00..0.20 rows=1
width=126) (actual time=0.090..0.091 rows=0 loops=1)
Index Cond: (entity_id = 197911914)
Filter: (attribute_name_ids && '{471}'::integer[])
Rows Removed by Filter: 1
Planning Time: 0.233 ms
Execution Time: 0.116 ms
(6 rows)

data=# SELECT entity_id,attribute_name_ids FROM entity WHERE
entity_id=197911914 AND attribute_name_ids && '{471}';
entity_id | attribute_name_ids
-----------+--------------------
(0 rows)

From my observations introducing of misleading entry in GIN coincides
with update of database row that will be incorrectly reported.

Fixing the issue can be done by:

UPDATE entity SET attribute_name_ids ='{}' WHERE entity_id=197911914;
UPDATE entity SET attribute_name_ids
='{2,3,4,6,7,8,9,10,13,17,35,48,66,71,3270,3273,3279,4377,4842,5379,5787,7416}'
WHERE entity_id=197911914';

The table has following structure:

Column | Type | Collation |
Nullable | Default
-----------------------+--------------------------------+-----------+----------+---------------
entity_id | bigint | | not null |
attribute_value_ids | integer[] | | not null |
attribute_name_ids | integer[] | | not null |
[..]
Indexes:
"entity_pkey" PRIMARY KEY, btree (entity_id)
"entity_attribute_name_ids_gin" gin (attribute_name_ids gin__int_ops)
"entity_attribute_value_ids_gin" gin (attribute_value_ids gin__int_ops)
[..]

Disk footprint of it is ~ 60GB, ~38M rows. In fact it's a single table
in this database.

The table gets tens of millions of updates each day and maybe 10-100k
inserts with very little deletes.

I have taken a copy of database files from streaming replication slave
server. i also have WAL files from around the time that this
particular corruption occurred.

Is there any additional information I can provide?

thank you!

--
regards,
Pawel Kudzia

#3Pawel Kudzia
kudzia@gmail.com
In reply to: Pawel Kudzia (#2)
Re: BUG #16792: silent corruption of GIN index resulting in SELECTs returning non-matching rows

Based on the suggestions we've received from IRC we've:
* Moved some of our postgresql instances to another
physical server,
* Enabled data_checksums on all instances,
* Upgraded upgraded our servers to 13.3
(Debian 13.3-1.pgdg100+1) with hope that
https://github.com/postgres/postgres/commit/0d779d22a290a89b6c892137a37280b9588ad0cc
addresses the issue [ although it's ts-vector
specific, so unlikely to be related to our issue ].

Yet soon after we've observe another inconsistency.

Recap of the problem - SELECTs return rows, based on GIN,
that do not actually meet criteria expressed in WHERE.

SELECT entity_id,attribute_name_ids FROM entity WHERE
( attribute_name_ids && '{7572}' ) AND NOT
( (attribute_name_ids||0) && '{7572}') LIMIT 100 ;

entity_id | attribute_name_ids
-----------+----------------------------------------------------------------------------------------
22327791 | {1,2,3,4,6,8,9,10,11,13,14,17,19,21,35,72,366,1659,2208,2211,3270,3273,3279,5787,7650}

(1 row)

EXPLAIN ANALYZE of the query above:

QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=35.46..46.67 rows=100 width=132) (actual
time=307.221..335.842 rows=1 loops=1)
-> Bitmap Heap Scan on entity (cost=35.46..3705.32 rows=32724
width=132) (actual time=307.220..335.839 rows=1 loops=1)
Recheck Cond: (attribute_name_ids && '{7572}'::integer[])
Rows Removed by Index Recheck: 72012
Filter: (NOT ((attribute_name_ids || 0) && '{7572}'::integer[]))
Rows Removed by Filter: 21501
Heap Blocks: exact=8998 lossy=9257
-> Bitmap Index Scan on entity_attribute_name_ids_gin
(cost=0.00..35.46 rows=32889 width=0) (actual time=19.790..19.790
rows=115485 loops=1)
Index Cond: (attribute_name_ids && '{7572}'::integer[])
Planning Time: 0.068 ms
Execution Time: 335.879 ms
(11 rows)

Relevant part of the table structure:

CREATE TABLE public.entity (
entity_id bigint NOT NULL,
attribute_value_ids integer[] NOT NULL,
attribute_name_ids integer[] NOT NULL,
[..]
);

ALTER TABLE ONLY public.entity ADD CONSTRAINT entity_pkey PRIMARY KEY
(entity_id);
CREATE INDEX entity_attribute_name_ids_gin ON public.entity USING gin
(attribute_name_ids public.gin__int_ops);
CREATE INDEX entity_attribute_value_ids_gin ON public.entity USING gin
(attribute_value_ids public.gin__int_ops);

There are at leat tens of updates / inserts per second to the table all the
time. the issue above manifests very rarely, after few days if not weeks of
uptime. we did not find any deterministic way of reproducing it, but it's
a matter of time till it shows.

The issue persist is related to on-disk data, it replicates from
streaming replication
masters to slaves.

Thank you in advance for your suggestions how to tackle this.

--
regards,
Pawel Kudzia

#4Pawel Kudzia
kudzia@gmail.com
In reply to: Pawel Kudzia (#3)
Re: BUG #16792: silent corruption of GIN index resulting in SELECTs returning non-matching rows

with help from IRC we've found that decreasing work_mem from 1MB to 256kB
or less makes the problem go away:

data=# show work_mem;
work_mem
----------
256kB
(1 row)

data=# SELECT entity_id,attribute_name_ids FROM entity WHERE (
attribute_name_ids && '{7572}' ) AND NOT ( (attribute_name_ids||0) &&
'{7572}') LIMIT 100 ;
entity_id | attribute_name_ids
-----------+--------------------
(0 rows)

data=# explain analyze SELECT entity_id,attribute_name_ids FROM entity
WHERE ( attribute_name_ids && '{7572}' ) AND NOT (
(attribute_name_ids||0) && '{7572}') LIMIT 100 ;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=26.53..38.04 rows=100 width=132) (actual
time=110.013..110.015 rows=0 loops=1)
-> Bitmap Heap Scan on entity (cost=26.53..3780.78 rows=32606
width=132) (actual time=110.011..110.011 rows=0 loops=1)
Recheck Cond: (attribute_name_ids && '{7572}'::integer[])
Rows Removed by Index Recheck: 102983
Filter: (NOT ((attribute_name_ids || 0) && '{7572}'::integer[]))
Rows Removed by Filter: 21501
Heap Blocks: exact=898 lossy=13752
-> Bitmap Index Scan on entity_attribute_name_ids_gin
(cost=0.00..26.53 rows=32770 width=0) (actual time=3.582..3.583
rows=21518 loops=1)
Index Cond: (attribute_name_ids && '{7572}'::integer[])
Planning Time: 0.173 ms
Execution Time: 110.220 ms
(11 rows)

problem manifests again with work_mem increased to 512kB or higher:

data=# show work_mem;
work_mem
----------
512kB
(1 row)

data=# SELECT entity_id,attribute_name_ids FROM entity WHERE (
attribute_name_ids && '{7572}' ) AND NOT ( (attribute_name_ids||0) &&
'{7572}') LIMIT 100 ;
entity_id | attribute_name_ids
-----------+----------------------------------------------------------------------------------------
22327791 | {1,2,3,4,6,8,9,10,11,13,14,17,19,21,35,72,366,1659,2208,2211,3270,3273,3279,5787,7650}
(1 row)

data=# explain analyze SELECT entity_id,attribute_name_ids FROM entity
WHERE ( attribute_name_ids && '{7572}' ) AND NOT (
(attribute_name_ids||0) && '{7572}') LIMIT 100 ;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=26.73..38.14 rows=100 width=132) (actual
time=112.268..119.475 rows=1 loops=1)
-> Bitmap Heap Scan on entity (cost=26.73..3748.28 rows=32606
width=132) (actual time=112.267..119.473 rows=1 loops=1)
Recheck Cond: (attribute_name_ids && '{7572}'::integer[])
Rows Removed by Index Recheck: 68905
Filter: (NOT ((attribute_name_ids || 0) && '{7572}'::integer[]))
Rows Removed by Filter: 21501
Heap Blocks: exact=5630 lossy=9012
-> Bitmap Index Scan on entity_attribute_name_ids_gin
(cost=0.00..26.73 rows=32770 width=0) (actual time=3.924..3.924
rows=21518 loops=1)
Index Cond: (attribute_name_ids && '{7572}'::integer[])
Planning Time: 0.113 ms
Execution Time: 119.801 ms
(11 rows)

"vacuum table entity;" did not help, neither did
"select gin_clean_pending_list('entity_attribute_name_ids_gin');"

#5Tom Lane
tgl@sss.pgh.pa.us
In reply to: Pawel Kudzia (#4)
Re: BUG #16792: silent corruption of GIN index resulting in SELECTs returning non-matching rows

Pawel Kudzia <kudzia@gmail.com> writes:

with help from IRC we've found that decreasing work_mem from 1MB to 256kB
or less makes the problem go away:

Hmm. So that suggests that the index itself is *not* corrupt,
but the problem is associated with a bug in the indexscan
algorithms.

Have you experimented with different index opclasses? Your
original report said you were using gin__int_ops, but that's
not the default, and we sort of deprecate intarray these days.

regards, tom lane

#6Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#5)
Re: BUG #16792: silent corruption of GIN index resulting in SELECTs returning non-matching rows

I wrote:

Pawel Kudzia <kudzia@gmail.com> writes:

with help from IRC we've found that decreasing work_mem from 1MB to 256kB
or less makes the problem go away:

Hmm. So that suggests that the index itself is *not* corrupt,
but the problem is associated with a bug in the indexscan
algorithms.

After staring at the code I think there is at least one bug, and
possibly two, in shimTriConsistentFn. That's likely to be implicated
here because intarray's GIN opclass only provides a bool consistent
function. I'm not very clear on the circumstances that lead to the scan
code inventing GIN_MAYBE inputs, so I haven't been able to construct a
test case.

The definite bug is triggered because intarray relies on the API
specification that says that if the search mode is
GIN_SEARCH_MODE_DEFAULT, then the consistentFn will only be called
when there's at least one TRUE key:

case RTOverlapStrategyNumber:
/* result is not lossy */
*recheck = false;
/* at least one element in check[] is true, so result = true */
res = true;
break;

shimTriConsistentFn ignores this and calls it on all-FALSE inputs,
for which it'll incorrectly get a TRUE result, as it will also for
all the following checks. The upshot is that shimTriConsistentFn
will convert any case with a MAYBE input to a hard TRUE result,
with no recheck requirement. This'd easily explain the reported
misbehavior. So in the spot where we do this:

/* First call consistent function with all the maybe-inputs set FALSE */
for (i = 0; i < nmaybe; i++)
key->entryRes[maybeEntries[i]] = GIN_FALSE;
curResult = directBoolConsistentFn(key);

we need to add some code that checks for default searchMode, and in
that case doesn't call the consistentFn unless at least one
(non-MAYBE) input is TRUE.

The other thing that I'm unsure about, because the code is horribly
underdocumented, is that it's not very clear whether
shimTriConsistentFn is accurately converting between the bool and
the tristate APIs. That's because it's not very clear what the
tristate API actually *is*. In particular, is the end state of
key->recheckCurItem meaningful in the tristate case? If it's not,
then the short-circuit case for no MAYBE inputs is broken, because
it'll return TRUE when the bool consistentFn is trying to tell us
to recheck. But if it is meaningful, then the multiway case is broken,
because it will return the recheckCurItem result set by the last call to
the bool consistentfn; which might be false even though other calls said
true. (So in that case I think we'd need "key->recheckCurItem = recheck"
at the end.) I also wonder how any of that works correctly for real
triconsistent functions, which don't have access to the recheckCurItem
flag.

Anyway, I'm punting this to the code authors. I'd like to see
some comments clarifying what the API really is, not just a
quick-n-dirty code fix.

regards, tom lane

#7Alexander Korotkov
aekorotkov@gmail.com
In reply to: Tom Lane (#6)
Re: BUG #16792: silent corruption of GIN index resulting in SELECTs returning non-matching rows

On Thu, Jun 17, 2021 at 10:49 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

I wrote:

Pawel Kudzia <kudzia@gmail.com> writes:

with help from IRC we've found that decreasing work_mem from 1MB to 256kB
or less makes the problem go away:

Hmm. So that suggests that the index itself is *not* corrupt,
but the problem is associated with a bug in the indexscan
algorithms.

After staring at the code I think there is at least one bug, and
possibly two, in shimTriConsistentFn. That's likely to be implicated
here because intarray's GIN opclass only provides a bool consistent
function. I'm not very clear on the circumstances that lead to the scan
code inventing GIN_MAYBE inputs, so I haven't been able to construct a
test case.

The definite bug is triggered because intarray relies on the API
specification that says that if the search mode is
GIN_SEARCH_MODE_DEFAULT, then the consistentFn will only be called
when there's at least one TRUE key:

case RTOverlapStrategyNumber:
/* result is not lossy */
*recheck = false;
/* at least one element in check[] is true, so result = true */
res = true;
break;

shimTriConsistentFn ignores this and calls it on all-FALSE inputs,
for which it'll incorrectly get a TRUE result, as it will also for
all the following checks. The upshot is that shimTriConsistentFn
will convert any case with a MAYBE input to a hard TRUE result,
with no recheck requirement. This'd easily explain the reported
misbehavior. So in the spot where we do this:

/* First call consistent function with all the maybe-inputs set FALSE */
for (i = 0; i < nmaybe; i++)
key->entryRes[maybeEntries[i]] = GIN_FALSE;
curResult = directBoolConsistentFn(key);

we need to add some code that checks for default searchMode, and in
that case doesn't call the consistentFn unless at least one
(non-MAYBE) input is TRUE.

+1,
I think in default search mode we can just start with curResult equal
to GIN_FALSE instead of calling directBoolConsistentFn().

The other thing that I'm unsure about, because the code is horribly
underdocumented, is that it's not very clear whether
shimTriConsistentFn is accurately converting between the bool and
the tristate APIs. That's because it's not very clear what the
tristate API actually *is*. In particular, is the end state of
key->recheckCurItem meaningful in the tristate case? If it's not,
then the short-circuit case for no MAYBE inputs is broken, because
it'll return TRUE when the bool consistentFn is trying to tell us
to recheck. But if it is meaningful, then the multiway case is broken,
because it will return the recheckCurItem result set by the last call to
the bool consistentfn; which might be false even though other calls said
true. (So in that case I think we'd need "key->recheckCurItem = recheck"
at the end.) I also wonder how any of that works correctly for real
triconsistent functions, which don't have access to the recheckCurItem
flag.

As far as I recall, returning MAYBE for no MAYBE inputs in the
triConsistent function is equivalent to setting the recheck flag in
the bool consistent function. Thus, short-circuit case for no MAYBE
inputs should be broken.

Anyway, I'm punting this to the code authors. I'd like to see
some comments clarifying what the API really is, not just a
quick-n-dirty code fix.

Yep, I'm going to have a closer look at this tomorrow.

------
Regards,
Alexander Korotkov

#8Tom Lane
tgl@sss.pgh.pa.us
In reply to: Alexander Korotkov (#7)
Re: BUG #16792: silent corruption of GIN index resulting in SELECTs returning non-matching rows

Alexander Korotkov <aekorotkov@gmail.com> writes:

On Thu, Jun 17, 2021 at 10:49 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

we need to add some code that checks for default searchMode, and in
that case doesn't call the consistentFn unless at least one
(non-MAYBE) input is TRUE.

I think in default search mode we can just start with curResult equal
to GIN_FALSE instead of calling directBoolConsistentFn().

No, it's not that simple, because there might be other keys that are
TRUE not MAYBE. So the result could legitimately be TRUE in this case.

BTW, I think it'd be a really good idea for this function to restore
all the MAYBE entries to GIN_MAYBE before returning, so that you can
remove the caveat that it destroys the contents of entryRes[]. I have
basically zero faith that that's safe, and it seems pretty cheap to
not have to make the assumption.

regards, tom lane

#9Alexander Korotkov
aekorotkov@gmail.com
In reply to: Tom Lane (#8)
Re: BUG #16792: silent corruption of GIN index resulting in SELECTs returning non-matching rows

On Fri, Jun 18, 2021 at 1:04 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Alexander Korotkov <aekorotkov@gmail.com> writes:

On Thu, Jun 17, 2021 at 10:49 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

we need to add some code that checks for default searchMode, and in
that case doesn't call the consistentFn unless at least one
(non-MAYBE) input is TRUE.

I think in default search mode we can just start with curResult equal
to GIN_FALSE instead of calling directBoolConsistentFn().

No, it's not that simple, because there might be other keys that are
TRUE not MAYBE. So the result could legitimately be TRUE in this case.

Yes, that's it. Thank you for correction.

BTW, I think it'd be a really good idea for this function to restore
all the MAYBE entries to GIN_MAYBE before returning, so that you can
remove the caveat that it destroys the contents of entryRes[]. I have
basically zero faith that that's safe, and it seems pretty cheap to
not have to make the assumption.

+1, sounds like a good idea.

------
Regards,
Alexander Korotkov

#10Alexander Korotkov
aekorotkov@gmail.com
In reply to: Alexander Korotkov (#7)
Re: BUG #16792: silent corruption of GIN index resulting in SELECTs returning non-matching rows

On Fri, Jun 18, 2021 at 12:55 AM Alexander Korotkov
<aekorotkov@gmail.com> wrote:

On Thu, Jun 17, 2021 at 10:49 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Anyway, I'm punting this to the code authors. I'd like to see
some comments clarifying what the API really is, not just a
quick-n-dirty code fix.

Yep, I'm going to have a closer look at this tomorrow.

Sorry for the delay. I'm going to take a closer look in the next
couple of days.

------
Regards,
Alexander Korotkov

#11Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Tom Lane (#6)
IRe: BUG #16792: silent corruption of GIN index resulting in SELECTs returning non-matching rows

On 17/06/2021 22:49, Tom Lane wrote:

I wrote:

Pawel Kudzia <kudzia@gmail.com> writes:

with help from IRC we've found that decreasing work_mem from 1MB to 256kB
or less makes the problem go away:

Hmm. So that suggests that the index itself is *not* corrupt,
but the problem is associated with a bug in the indexscan
algorithms.

After staring at the code I think there is at least one bug, and
possibly two, in shimTriConsistentFn. That's likely to be implicated
here because intarray's GIN opclass only provides a bool consistent
function. I'm not very clear on the circumstances that lead to the scan
code inventing GIN_MAYBE inputs, so I haven't been able to construct a
test case.

The definite bug is triggered because intarray relies on the API
specification that says that if the search mode is
GIN_SEARCH_MODE_DEFAULT, then the consistentFn will only be called
when there's at least one TRUE key:

case RTOverlapStrategyNumber:
/* result is not lossy */
*recheck = false;
/* at least one element in check[] is true, so result = true */
res = true;
break;

shimTriConsistentFn ignores this and calls it on all-FALSE inputs,
for which it'll incorrectly get a TRUE result, as it will also for
all the following checks. The upshot is that shimTriConsistentFn
will convert any case with a MAYBE input to a hard TRUE result,
with no recheck requirement. This'd easily explain the reported
misbehavior.

That's subtle. The documentation says:

If <literal>*searchMode</literal> is set to
<literal>GIN_SEARCH_MODE_DEFAULT</literal> (which is the value it is
initialized to before call), only items that match at least one of
the returned keys are considered candidate matches.

I guess you can read that as "consistentFn will only be called when
there is at least one matching item", but that didn't occur to me when I
read that at first. But I agree we need to fix shimTriConsistentFn to
respect that.

The other thing that I'm unsure about, because the code is horribly
underdocumented, is that it's not very clear whether
shimTriConsistentFn is accurately converting between the bool and
the tristate APIs. That's because it's not very clear what the
tristate API actually *is*. In particular, is the end state of
key->recheckCurItem meaningful in the tristate case? If it's not,
then the short-circuit case for no MAYBE inputs is broken, because
it'll return TRUE when the bool consistentFn is trying to tell us
to recheck. But if it is meaningful, then the multiway case is broken,
because it will return the recheckCurItem result set by the last call to
the bool consistentfn; which might be false even though other calls said
true. (So in that case I think we'd need "key->recheckCurItem = recheck"
at the end.) I also wonder how any of that works correctly for real
triconsistent functions, which don't have access to the recheckCurItem
flag.

Anyway, I'm punting this to the code authors. I'd like to see
some comments clarifying what the API really is, not just a
quick-n-dirty code fix.

I've been trying to create a test case for this, but no luck so far. I
cannot find a codepath that would hit these bugs with the kind of query
that Pawel used. The search condition is very simple: "column &&
'{constant}'", with only one key and one constant to search for. There
are three calls to triConsistentFn:

1. In startScanKey, but that's only reached if (key->nentries > 1), so
not reachable with Pawel's query.

2. In keyGetItem, when haveLossyEntry==true. But AFAICS, haveLossyEntry
is never true with Pawel's query. That would require curItem to be a
lossy pointer, and that in turn can only happen when matchBitmap is
used, and that's only used with partial match queries and with queries
that need a full-index scan.

3. The second call in keyGetItem is reached, but it is only passed
GIN_MAYBE when curItem is lossy. Which isn't possible with this query,
see point 2.

AFAICS the recheckCurItem bug should also not cause wrong results with
that query. I must be missing something.

I could probably construct a test case with a different query, one
involving multiple elements in the search key, but I'd like to have a
solid explanation for the original report.

Pawel, is this a production system, or would it be possible for you to
build from sources with a patch with some extra debugging output?

- Heikki

#12Pawel Kudzia
kudzia@gmail.com
In reply to: Heikki Linnakangas (#11)
Re: IRe: BUG #16792: silent corruption of GIN index resulting in SELECTs returning non-matching rows

Pawel, is this a production system, or would it be possible for you to
build from sources with a patch with some extra debugging output?

thank you for looking into it!

it is a production system but we have couple of replicas. when the
problem occurs - it can be reproduced on any replica as well. when
it happens again - i can stop one of replicas, take file level backup
and set up a test machine on which i can try to build PG from source,
including your patch adding extra verbosity.

please note that the last inconsistency was fixed, but i expect that
new one will show up within few days.

greetings!

--
regards,
Pawel Kudzia

#13Alexander Korotkov
aekorotkov@gmail.com
In reply to: Heikki Linnakangas (#11)
Re: IRe: BUG #16792: silent corruption of GIN index resulting in SELECTs returning non-matching rows

(On Thu, Jun 24, 2021 at 3:59 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote:

On 17/06/2021 22:49, Tom Lane wrote:

I wrote:

Pawel Kudzia <kudzia@gmail.com> writes:

with help from IRC we've found that decreasing work_mem from 1MB to 256kB
or less makes the problem go away:

Hmm. So that suggests that the index itself is *not* corrupt,
but the problem is associated with a bug in the indexscan
algorithms.

After staring at the code I think there is at least one bug, and
possibly two, in shimTriConsistentFn. That's likely to be implicated
here because intarray's GIN opclass only provides a bool consistent
function. I'm not very clear on the circumstances that lead to the scan
code inventing GIN_MAYBE inputs, so I haven't been able to construct a
test case.

The definite bug is triggered because intarray relies on the API
specification that says that if the search mode is
GIN_SEARCH_MODE_DEFAULT, then the consistentFn will only be called
when there's at least one TRUE key:

case RTOverlapStrategyNumber:
/* result is not lossy */
*recheck = false;
/* at least one element in check[] is true, so result = true */
res = true;
break;

shimTriConsistentFn ignores this and calls it on all-FALSE inputs,
for which it'll incorrectly get a TRUE result, as it will also for
all the following checks. The upshot is that shimTriConsistentFn
will convert any case with a MAYBE input to a hard TRUE result,
with no recheck requirement. This'd easily explain the reported
misbehavior.

That's subtle. The documentation says:

If <literal>*searchMode</literal> is set to
<literal>GIN_SEARCH_MODE_DEFAULT</literal> (which is the value it is
initialized to before call), only items that match at least one of
the returned keys are considered candidate matches.

I guess you can read that as "consistentFn will only be called when
there is at least one matching item", but that didn't occur to me when I
read that at first. But I agree we need to fix shimTriConsistentFn to
respect that.

The other thing that I'm unsure about, because the code is horribly
underdocumented, is that it's not very clear whether
shimTriConsistentFn is accurately converting between the bool and
the tristate APIs. That's because it's not very clear what the
tristate API actually *is*. In particular, is the end state of
key->recheckCurItem meaningful in the tristate case? If it's not,
then the short-circuit case for no MAYBE inputs is broken, because
it'll return TRUE when the bool consistentFn is trying to tell us
to recheck. But if it is meaningful, then the multiway case is broken,
because it will return the recheckCurItem result set by the last call to
the bool consistentfn; which might be false even though other calls said
true. (So in that case I think we'd need "key->recheckCurItem = recheck"
at the end.) I also wonder how any of that works correctly for real
triconsistent functions, which don't have access to the recheckCurItem
flag.

Anyway, I'm punting this to the code authors. I'd like to see
some comments clarifying what the API really is, not just a
quick-n-dirty code fix.

I've been trying to create a test case for this, but no luck so far. I
cannot find a codepath that would hit these bugs with the kind of query
that Pawel used. The search condition is very simple: "column &&
'{constant}'", with only one key and one constant to search for. There
are three calls to triConsistentFn:

1. In startScanKey, but that's only reached if (key->nentries > 1), so
not reachable with Pawel's query.

2. In keyGetItem, when haveLossyEntry==true. But AFAICS, haveLossyEntry
is never true with Pawel's query. That would require curItem to be a
lossy pointer, and that in turn can only happen when matchBitmap is
used, and that's only used with partial match queries and with queries
that need a full-index scan.

3. The second call in keyGetItem is reached, but it is only passed
GIN_MAYBE when curItem is lossy. Which isn't possible with this query,
see point 2.

AFAICS the recheckCurItem bug should also not cause wrong results with
that query. I must be missing something.

I could probably construct a test case with a different query, one
involving multiple elements in the search key, but I'd like to have a
solid explanation for the original report.

+1,
I see query provided by Pawel is too simple. Bugs, which Tom digged
in shimTriConsistentFn() don't seem to explain the wrong query results
provided by Pawel.

Pawel, is this a production system, or would it be possible for you to
build from sources with a patch with some extra debugging output?

Do you think it also worth checking whether bug persists when set
fastupdate = off. Then we could localize whether bug is related to
pending lists.

------
Regards,
Alexander Korotkov

#14Pawel Kudzia
kudzia@gmail.com
In reply to: Pawel Kudzia (#12)
Re: IRe: BUG #16792: silent corruption of GIN index resulting in SELECTs returning non-matching rows

On Thu, Jun 24, 2021 at 3:32 PM Pawel Kudzia <kudzia@gmail.com> wrote:

Pawel, is this a production system, or would it be possible for you to
build from sources with a patch with some extra debugging output?

thank you for looking into it!

it is a production system but we have couple of replicas. when the
problem occurs - it can be reproduced on any replica as well. when
it happens again - i can stop one of replicas, take file level backup
and set up a test machine on which i can try to build PG from source,
including your patch adding extra verbosity.

please note that the last inconsistency was fixed, but i expect that
new one will show up within few days.

hello again,

i have a consistent file-level backup of postgresql's /var/lib/postgresql +
/etc/postgresql on which i can reproduce the issue reliably. it's on a test
machine where we can put patched version of PG. currently this machine
is using Debian 13.3-1.pgdg100+1.

set enable_seqscan=off;
SELECT entity_id FROM entity WHERE ( attribute_name_ids && '{1737}' )
AND NOT ( (attribute_name_ids||0) && '{1737}') LIMIT 10;

returns me 2 rows while it should return none.

also in this case lowering work_mem from 1MB to 256kB makes fixes the
issue - SELECT returns 0 rows instead of 2.

i'll be happy to run patched version and send you back logs produced by it.

greetings!

--
regards,
Pawel Kudzia

#15Alexander Korotkov
aekorotkov@gmail.com
In reply to: Alexander Korotkov (#10)
Re: BUG #16792: silent corruption of GIN index resulting in SELECTs returning non-matching rows

( . () w On Tue, Jun 22, 2021 at 7:02 PM Alexander Korotkov
<aekorotkov@gmail.com> wrote:

On Fri, Jun 18, 2021 at 12:55 AM Alexander Korotkov
<aekorotkov@gmail.com> wrote:

On Thu, Jun 17, 2021 at 10:49 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Anyway, I'm punting this to the code authors. I'd like to see
some comments clarifying what the API really is, not just a
quick-n-dirty code fix.

Yep, I'm going to have a closer look at this tomorrow.

Sorry for the delay. I'm going to take a closer look in the next
couple of days.

I've closer look at shimTriConsistentFn() function. It looks to me
like the function itself has inconsistencies. But the way it's
currently used in GIN shouldn't produce the wrong query answers.

shimTriConsistentFn() is one of implementation of
GinScanKey.triConsistentFn. In turn, GinScanKey.triConsistentFn have
3 callers:
1) startScanKey() to determine required keys
2) keyGetItem() for lossy item check
3) keyGetItem() for normal item check

Let's see shimTriConsistentFn() inconsistencies and how callers handle them.
1) shimTriConsistentFn() returns result of directBoolConsistentFn()
for zero maybe entries without examining key->recheckCurItem. That
may result in returning GIN_TRUE instead of GIN_MAYBE
1.1) startScanKey() doesn't care about recheck, just looking for
GIN_FALSE result.
1.2) keyGetItem() for lossy item always implies recheck.
1.3) keyGetItem() for a normal item does the trick. Whether a current
item is rechecked is controlled by key->recheckCurItem variable (the
same which is set by directBoolConsistentFn(). The following switch
sets key->recheckCurItem for GIN_MAYBE, but leaves it untouched for
GIN_TRUE. Thus, GIN_TRUE with key->recheckCurItem works here just
like GIN_MAYBE. I think this is inconsistent by itself, but this
inconsistency compensates for inconsistency in shimTriConsistentFn().
2) shimTriConsistentFn() might call directBoolConsistentFn() with all
false inputs for GIN_SEARCH_MODE_DEFAULT. The result is intended to
be false, but opclass consistent method isn't intended to handle this
situation correctly. For instance, ginint4_consistent() returns true
without checking the input array. That could make
shimTriConsistentFn() return GIN_TRUE instead of GIN_MAYBE, or
GIN_MAYBE instead of GIN_FALSE.
2.1) In principle, this could lead startScanKey() to select less
required entries than possible. But this is definitely not the case
of ginint4_consistent() when meeting any of entries is enough for
match.
2.2) In principle, keyGetItem() could get false positives for lossy
items. But that wouldn't lead to wrong query answers. Again, this is
not the case of ginint4_consistent().
2.3) keyGetItem() for a normal item doesn't call triConsistentFn()
with any GIN_MAYBE or all GIN_FALSE.

To sum up, I don't see how inconsistencies in shimTriConsistentFn()
could lead to wrong query answers, especially in intarray GIN index.
Nevertheless, these inconsistencies should be fixed. I'm going to
propose a patch soon.

------
Regards,
Alexander Korotkov

#16Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Alexander Korotkov (#15)
Re: BUG #16792: silent corruption of GIN index resulting in SELECTs returning non-matching rows

On 11/07/2021 00:56, Alexander Korotkov wrote:

I've closer look at shimTriConsistentFn() function. It looks to me
like the function itself has inconsistencies. But the way it's
currently used in GIN shouldn't produce the wrong query answers.

shimTriConsistentFn() is one of implementation of
GinScanKey.triConsistentFn. In turn, GinScanKey.triConsistentFn have
3 callers:
1) startScanKey() to determine required keys
2) keyGetItem() for lossy item check
3) keyGetItem() for normal item check

Let's see shimTriConsistentFn() inconsistencies and how callers handle them.
1) shimTriConsistentFn() returns result of directBoolConsistentFn()
for zero maybe entries without examining key->recheckCurItem. That
may result in returning GIN_TRUE instead of GIN_MAYBE
1.1) startScanKey() doesn't care about recheck, just looking for
GIN_FALSE result.
1.2) keyGetItem() for lossy item always implies recheck.
1.3) keyGetItem() for a normal item does the trick. Whether a current
item is rechecked is controlled by key->recheckCurItem variable (the
same which is set by directBoolConsistentFn(). The following switch
sets key->recheckCurItem for GIN_MAYBE, but leaves it untouched for
GIN_TRUE. Thus, GIN_TRUE with key->recheckCurItem works here just
like GIN_MAYBE. I think this is inconsistent by itself, but this
inconsistency compensates for inconsistency in shimTriConsistentFn().
2) shimTriConsistentFn() might call directBoolConsistentFn() with all
false inputs for GIN_SEARCH_MODE_DEFAULT. The result is intended to
be false, but opclass consistent method isn't intended to handle this
situation correctly. For instance, ginint4_consistent() returns true
without checking the input array. That could make
shimTriConsistentFn() return GIN_TRUE instead of GIN_MAYBE, or
GIN_MAYBE instead of GIN_FALSE.
2.1) In principle, this could lead startScanKey() to select less
required entries than possible. But this is definitely not the case
of ginint4_consistent() when meeting any of entries is enough for
match.
2.2) In principle, keyGetItem() could get false positives for lossy
items. But that wouldn't lead to wrong query answers. Again, this is
not the case of ginint4_consistent().
2.3) keyGetItem() for a normal item doesn't call triConsistentFn()
with any GIN_MAYBE or all GIN_FALSE.

To sum up, I don't see how inconsistencies in shimTriConsistentFn()
could lead to wrong query answers, especially in intarray GIN index.

Agreed, I came to the same conclusion looking at the code. Which means
that we still don't have an explanation for the original bug report :-(.

Nevertheless, these inconsistencies should be fixed. I'm going to
propose a patch soon.

Thanks! I came up with the attached patch. I changed
directBoolConsistentFn() to return a GinTernaryValue rather than bool, I
think that makes the logic in shimTriConsistentFn() more clear.

I also tried to write a test case to expose issue 2.1 above, but
couldn't come up with an example.

- Heikki

Attachments:

fix-tri-consistent-issues-1.patchtext/x-patch; charset=UTF-8; name=fix-tri-consistent-issues-1.patchDownload+71-53
#17Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Pawel Kudzia (#14)
Re: IRe: BUG #16792: silent corruption of GIN index resulting in SELECTs returning non-matching rows

On 05/07/2021 16:36, Pawel Kudzia wrote:

i have a consistent file-level backup of postgresql's /var/lib/postgresql +
/etc/postgresql on which i can reproduce the issue reliably. it's on a test
machine where we can put patched version of PG. currently this machine
is using Debian 13.3-1.pgdg100+1.

set enable_seqscan=off;
SELECT entity_id FROM entity WHERE ( attribute_name_ids && '{1737}' )
AND NOT ( (attribute_name_ids||0) && '{1737}') LIMIT 10;

returns me 2 rows while it should return none.

also in this case lowering work_mem from 1MB to 256kB makes fixes the
issue - SELECT returns 0 rows instead of 2.

i'll be happy to run patched version and send you back logs produced by it.

Thanks! Here's a patch that prints lines with "GINBUG:" prefix in the
log. It should apply cleanly to PostgreSQL v13.

This should help to confirm whether the bugs in shimTriConsistentFn that
Tom pointed out are the root cause of this issue. It should also tell us
whether the rows are being returned from the pending-inserts list or the
regular part of the GIN index.

We might need a few iterations to test different theories, but we'll get
there...

- Heikki

Ps. Sorry for the delay, I didn't see you reply until now, it went into
spam folder

Attachments:

trace-gin-consistent-funcs-1.patchtext/x-patch; charset=UTF-8; name=trace-gin-consistent-funcs-1.patchDownload+42-2
#18Pawel Kudzia
kudzia@gmail.com
In reply to: Heikki Linnakangas (#17)
Re: IRe: BUG #16792: silent corruption of GIN index resulting in SELECTs returning non-matching rows

On Sun, Jul 11, 2021 at 8:42 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote:

On 05/07/2021 16:36, Pawel Kudzia wrote:

i have a consistent file-level backup of postgresql's /var/lib/postgresql +
/etc/postgresql on which i can reproduce the issue reliably. it's on a test
machine where we can put patched version of PG. currently this machine
is using Debian 13.3-1.pgdg100+1.

set enable_seqscan=off;
SELECT entity_id FROM entity WHERE ( attribute_name_ids && '{1737}' )
AND NOT ( (attribute_name_ids||0) && '{1737}') LIMIT 10;

returns me 2 rows while it should return none.

also in this case lowering work_mem from 1MB to 256kB makes fixes the
issue - SELECT returns 0 rows instead of 2.

i'll be happy to run patched version and send you back logs produced by it.

Thanks! Here's a patch that prints lines with "GINBUG:" prefix in the
log. It should apply cleanly to PostgreSQL v13.

This should help to confirm whether the bugs in shimTriConsistentFn that
Tom pointed out are the root cause of this issue. It should also tell us
whether the rows are being returned from the pending-inserts list or the
regular part of the GIN index.

We might need a few iterations to test different theories, but we'll get
there...

Heikki - thank you for the patch! it applied cleanly.

i've run these two commands:

SET enable_seqscan=off;
SELECT entity_id FROM entity WHERE ( attribute_name_ids && '{1737}' )
AND NOT ( (attribute_name_ids||0) && '{1737}') LIMIT 10;

and got this in the output:

2021-07-12 07:41:05 UTC LOG: GINBUG: startScanKey called: excludeOnly
0 nentries 1

2021-07-12 07:41:05 UTC STATEMENT: SELECT entity_id FROM entity WHERE
( attribute_name_ids && '{1737}' ) AND NOT ( (attribute_name_ids||0)
&& '{1737}') LIMIT 10;
2021-07-12 07:41:05 UTC LOG: GINBUG: called triConsistentFn(1): 1 0

the "STATEMENT" line is repeated 79586 times, the "GINBUG: called
triConsistentFn" - 79585 times.

there's nothing else in the log besides information about server startup.

i'll be happy to help with the follow up checks.

Ps. Sorry for the delay, I didn't see you reply until now, it went into
spam folder

no worries!

--
regards,
Pawel Kudzia

#19Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Pawel Kudzia (#18)
Re: IRe: BUG #16792: silent corruption of GIN index resulting in SELECTs returning non-matching rows

On 12/07/2021 20:24, Pawel Kudzia wrote:

i've run these two commands:

SET enable_seqscan=off;
SELECT entity_id FROM entity WHERE ( attribute_name_ids && '{1737}' )
AND NOT ( (attribute_name_ids||0) && '{1737}') LIMIT 10;

and got this in the output:

2021-07-12 07:41:05 UTC LOG: GINBUG: startScanKey called: excludeOnly
0 nentries 1

2021-07-12 07:41:05 UTC STATEMENT: SELECT entity_id FROM entity WHERE
( attribute_name_ids && '{1737}' ) AND NOT ( (attribute_name_ids||0)
&& '{1737}') LIMIT 10;
2021-07-12 07:41:05 UTC LOG: GINBUG: called triConsistentFn(1): 1 0

the "STATEMENT" line is repeated 79586 times, the "GINBUG: called
triConsistentFn" - 79585 times.

there's nothing else in the log besides information about server startup.

i'll be happy to help with the follow up checks.

Ok, that confirms that it's taking the simple path through
triConsistentFn(), with a single non-lossy key. Also, there don't seem
to be any pending inserts. Now that we've ruled those out, I have no
clue what the problem might be.

Is the data sensitive, or you could you share it? I realize it's a
pretty large database, but if I had a copy I could poke around to look
at the index pages involved here, to see if they're corrupt and how. If
that's possible, you can contact me directly, off-list, to figure the
logistics to send over a copy of the data directory.

- Heikki

In reply to: Heikki Linnakangas (#19)
Re: IRe: BUG #16792: silent corruption of GIN index resulting in SELECTs returning non-matching rows

On Tue, Jul 13, 2021 at 3:25 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote:

Is the data sensitive, or you could you share it? I realize it's a
pretty large database, but if I had a copy I could poke around to look
at the index pages involved here, to see if they're corrupt and how. If
that's possible, you can contact me directly, off-list, to figure the
logistics to send over a copy of the data directory.

Worth noting that the Wiki describes a number two ways to do this when
reporting a bug -- using gdb, and using contrib/pageinspect:

https://wiki.postgresql.org/wiki/Getting_a_stack_trace_of_a_running_PostgreSQL_backend_on_Linux/BSD#Dumping_a_page_image_from_within_GDB

--
Peter Geoghegan

#21Pawel Kudzia
kudzia@gmail.com
In reply to: Peter Geoghegan (#20)
#22Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Pawel Kudzia (#21)
#23Pawel Kudzia
kudzia@gmail.com
In reply to: Heikki Linnakangas (#22)
#24Alexander Korotkov
aekorotkov@gmail.com
In reply to: Alexander Korotkov (#13)
#25Pawel Kudzia
kudzia@gmail.com
In reply to: Alexander Korotkov (#24)
#26Alexander Korotkov
aekorotkov@gmail.com
In reply to: Pawel Kudzia (#25)
#27Pawel Kudzia
kudzia@gmail.com
In reply to: Alexander Korotkov (#26)
#28Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Pawel Kudzia (#23)
#29Pawel Kudzia
kudzia@gmail.com
In reply to: Heikki Linnakangas (#28)
In reply to: Alexander Korotkov (#24)
#31Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Peter Geoghegan (#30)
In reply to: Heikki Linnakangas (#31)
#33Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Pawel Kudzia (#29)
#34Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Peter Geoghegan (#32)
In reply to: Heikki Linnakangas (#34)
In reply to: Peter Geoghegan (#35)
#37Pawel Kudzia
kudzia@gmail.com
In reply to: Heikki Linnakangas (#33)
#38Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Pawel Kudzia (#37)
#39Pawel Kudzia
kudzia@gmail.com
In reply to: Heikki Linnakangas (#38)
#40Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Pawel Kudzia (#39)
#41Pawel Kudzia
kudzia@gmail.com
In reply to: Heikki Linnakangas (#40)
#42Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: PG Bug reporting form (#1)
#43Pawel Kudzia
kudzia@gmail.com
In reply to: Heikki Linnakangas (#42)
#44Pawel Kudzia
kudzia@gmail.com
In reply to: Pawel Kudzia (#43)
#45Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Pawel Kudzia (#43)
#46Pawel Kudzia
kudzia@gmail.com
In reply to: Heikki Linnakangas (#45)
#47Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Pawel Kudzia (#46)
#48Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Pawel Kudzia (#46)
#49Pawel Kudzia
kudzia@gmail.com
In reply to: Heikki Linnakangas (#47)
#50Pawel Kudzia
kudzia@gmail.com
In reply to: Heikki Linnakangas (#48)
In reply to: Pawel Kudzia (#49)
#52Pawel Kudzia
kudzia@gmail.com
In reply to: Peter Geoghegan (#51)
#53Pawel Kudzia
kudzia@gmail.com
In reply to: Tom Lane (#5)
#54Pawel Kudzia
kudzia@gmail.com
In reply to: Pawel Kudzia (#53)
In reply to: Heikki Linnakangas (#48)
#56Pawel Kudzia
kudzia@gmail.com
In reply to: Pawel Kudzia (#54)