GIN index trouble
I’ve hit this same message
Caused by: org.postgresql.util.PSQLException: ERROR: right sibling of GIN page is of different type
in a couple of contexts and I’m starting to get worried.
I’ve rebuilt the index, but will that help?
Is there a way to see what the ‘different type’ is?
Is it caught/cleaned up by VACUUM ANALYZE or some such?
I’ve had good results using "<@" and "@>" and believe I’ve defended the use of an array, but I can’t lose three days’ worth of simulations to this dang wrong sibling.
select version(); — will use production release of 10 next week.
version
------------------------------------------------------------------------------------------------------------
PostgreSQL 10beta3 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-11), 64-bit
The only gin index I have is in this table definition:
\d sui.probandset
Table "sui.probandset"
   Column    |       Type       | Modifiers
-------------+------------------+-----------
 id          | uuid             | not null
 name        | text             |
 probands    | uuid[]           | not null
 meioses     | integer          |
 min_kincoef | double precision |
 max_kincoef | double precision |
 people_id   | uuid             | not null
Indexes:
"probandset_pkey" PRIMARY KEY, btree (id)
"probandsetunique" gin (probands)
Check constraints:
"sortedset" CHECK (issorteduuids(probands))
Foreign-key constraints:
"probandset_people_id_fkey" FOREIGN KEY (people_id) REFERENCES base.people(id)
Referenced by:
TABLE "sui.probandset_group_member" CONSTRAINT "probandset_group_member_member_id_fkey" FOREIGN KEY (member_id) REFERENCES sui.probandset(id)
TABLE "sui.segment" CONSTRAINT "segment_probandset_id_fkey" FOREIGN KEY (probandset_id) REFERENCES sui.probandset(id)
and I um, er, enabled gin on uuid by copying from a thread in this list, as follows:
create operator class _uuid_ops
default for type _uuid
using gin as
operator 1 &&(anyarray, anyarray)
,operator 2 @>(anyarray, anyarray)
,operator 3 <@(anyarray, anyarray)
,operator 4 =(anyarray, anyarray)
,function 1 uuid_cmp(uuid, uuid)
,function 2 ginarrayextract(anyarray, internal, internal)
,function 3 ginqueryarrayextract(anyarray, internal, smallint, internal, internal, internal, internal)
,function 4 ginarrayconsistent(internal, smallint, anyarray, integer, internal, internal, internal, internal)
,storage uuid;
Rob Sargent <rsargent@xmission.com> writes:
I’ve hit this same message
Caused by: org.postgresql.util.PSQLException: ERROR: right sibling of GIN page is of different type
in a couple of contexts and I’m starting to get worried.
If you can make a test case that (eventually) hits that, we'd be
interested to see it ...
and I um, er, enabled gin on uuid by copying from a thread in this list, as follows:
create operator class _uuid_ops
default for type _uuid
using gin as
operator 1 &&(anyarray, anyarray)
,operator 2 @>(anyarray, anyarray)
,operator 3 <@(anyarray, anyarray)
,operator 4 =(anyarray, anyarray)
,function 1 uuid_cmp(uuid, uuid)
,function 2 ginarrayextract(anyarray, internal, internal)
,function 3 ginqueryarrayextract(anyarray, internal, smallint, internal, internal, internal, internal)
,function 4 ginarrayconsistent(internal, smallint, anyarray, integer, internal, internal, internal, internal)
,storage uuid;
You should not have needed to do that, I think, as the standard
anyarray GIN opclass should've handled it. Having said that,
I don't immediately see anything broken about this definition,
so it seems like it should've worked.
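Tom's point could be checked with a short sketch (table and index names are taken from the thread; this assumes the built-in array_ops GIN support on a stock PostgreSQL 10 rather than the custom opclass):

```sql
-- Hedged sketch: the default GIN opclass for arrays (array_ops) already
-- handles uuid[], so the custom _uuid_ops class should be unnecessary.
DROP INDEX IF EXISTS sui.probandsetunique;

CREATE INDEX probandsetunique ON sui.probandset USING gin (probands);

-- Containment queries use the index as before:
SELECT count(*)
FROM sui.probandset
WHERE probands @> '{65fe3b60-1c86-4b14-a85d-21abdf68f9e2}'::uuid[];
```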
regards, tom lane
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
If you can make a test case that (eventually) hits that, we'd be
interested to see it ...
Any hint(s) on what might trigger this sort of thing? I could duplicate
the upload, but I doubt you want the 800K records, 200M input file even
if it did regenerate the problem.
Would select * from <table> order by <gin'd column> show the message?
and I um, er, enabled gin on uuid by copying from a thread in this list, as follows:
create operator class _uuid_ops
default for type _uuid
using gin as
operator 1 &&(anyarray, anyarray)
,operator 2 @>(anyarray, anyarray)
,operator 3 <@(anyarray, anyarray)
,operator 4 =(anyarray, anyarray)
,function 1 uuid_cmp(uuid, uuid)
,function 2 ginarrayextract(anyarray, internal, internal)
,function 3 ginqueryarrayextract(anyarray, internal, smallint, internal, internal, internal, internal)
,function 4 ginarrayconsistent(internal, smallint, anyarray, integer, internal, internal, internal, internal)
,storage uuid;

You should not have needed to do that, I think, as the standard
anyarray GIN opclass should've handled it. Having said that,
I don't immediately see anything broken about this definition,
so it seems like it should've worked.
Good to hear.
Thanks.
On Mon, Oct 30, 2017 at 7:35 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Rob Sargent <rsargent@xmission.com> writes:
I’ve hit this same message
Caused by: org.postgresql.util.PSQLException: ERROR: right sibling of GIN page is of different type
in a couple of contexts and I’m starting to get worried.

If you can make a test case that (eventually) hits that, we'd be
interested to see it ...
I suspect that this is the 9.6 bug that I described on that recent
-bugs thread [1]. It's just another symptom of the same problem.
It's certainly true that we saw a mix of undetectable
deadlocks/lock-ups (as seen on that -bugs thread) and corruption (as
seen on this thread) before commit e2c79e14 tried to address those
problems. Jeff Janes reported both symptoms in the thread leading up
to that commit during the beta period for 9.6. My guess is that that
commit was insufficient, and that we now continue to see the same mix
of symptoms for what is essentially the same bug.
[1]: /messages/by-id/CAH2-WzmtLXbs8+c19t1T=Rj0KyP7vK9q8hQJULgDLdVMuEeeUw@mail.gmail.com
--
Peter Geoghegan
Rob Sargent <robjsargent@gmail.com> writes:
If you can make a test case that (eventually) hits that, we'd be
interested to see it ...
Any hint(s) on what might trigger this sort of thing? I could duplicate
the upload, but I doubt you want the 800K records, 200M input file even
if it did regenerate the problem.
It's possible you could duplicate the failure with synthetic data
generated by a not-very-long script. That would beat uploading
a large data file, not to mention possibly needing to sanitize
your data.
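A sketch of the sort of generator Tom means might look like this (scratch table name and row counts are assumptions; gen_random_uuid() comes from pgcrypto on PostgreSQL 10):

```sql
-- Hedged sketch of a synthetic-data script: random sorted uuid[] arrays,
-- bulk-loaded in one transaction, then indexed with GIN.
CREATE EXTENSION IF NOT EXISTS pgcrypto;  -- for gen_random_uuid() on PG 10

CREATE TABLE scratch_probandset (
    id       uuid   PRIMARY KEY DEFAULT gen_random_uuid(),
    probands uuid[] NOT NULL
);

-- ~800K rows with 2..16 UUIDs each, sorted to mimic the sortedset check.
INSERT INTO scratch_probandset (probands)
SELECT (SELECT array_agg(u ORDER BY u)
        FROM (SELECT gen_random_uuid() AS u
              FROM generate_series(1, 2 + (g % 15))) AS s)
FROM generate_series(1, 800000) AS g;

CREATE INDEX scratch_gin ON scratch_probandset USING gin (probands);
```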
regards, tom lane
On Fri, Oct 27, 2017 at 3:15 PM, Rob Sargent <rsargent@xmission.com> wrote:
I’ve hit this same message
Caused by: org.postgresql.util.PSQLException: ERROR: right sibling of GIN page is of different type
in a couple of contexts and I’m starting to get worried.
I’ve rebuilt the index, but will that help?
Is there a way to see what the ‘different type’ is?
Is it caught/cleaned up by VACUUM ANALYZE or some such?
Is there a lot of churn on this table? Do you either heavily update or
heavily delete rows in the table? Does vacuum tend to run on the table
rather frequently?
--
Peter Geoghegan
On 10/30/2017 10:32 AM, Peter Geoghegan wrote:
On Fri, Oct 27, 2017 at 3:15 PM, Rob Sargent <rsargent@xmission.com> wrote:
I’ve hit this same message
Caused by: org.postgresql.util.PSQLException: ERROR: right sibling of GIN page is of different type
in a couple of contexts and I’m starting to get worried.
I’ve rebuilt the index, but will that help?
Is there a way to see what the ‘different type’ is?
Is it caught/cleaned up by VACUUM ANALYZE or some such?

Is there a lot of churn on this table? Do you either heavily update or
heavily delete rows in the table? Does vacuum tend to run on the table
rather frequently?
Peter, you beat me to the punch. I was just about to say "Having read
the referenced message I thought I would add that we never delete from
this table." In this particular case the table was written to, record by
record, in a previous execution, and at the time of the error it was only
being read. (In case you've been following, the failed execution would
have added ~1M "segments", each of which references an entry in the gin'd
table "probandsets" - but like a rookie I'm looking up each
probandset (2^16) individually. Re-working that NOW.)
On Mon, Oct 30, 2017 at 9:45 AM, Rob Sargent <robjsargent@gmail.com> wrote:
Peter, you beat me to the punch. I was just about to say "Having read the
referenced message I thought I would add that we never delete from this
table." In this particular case it was written to record by record, in a
previous execution and at the time of the error it was only being read. (In
case you've been following, the failed execution would have added ~1M
"segments", each which references an entry in the gin'd table "probandsets"
- but like a rookie I'm looking up each probandset(2^16) individually.
Re-working that NOW.)
It's not surprising that only a SELECT statement could see this
problem. I guess that it's possible that only page deletions used for
the pending list are involved here.
I'm not sure how reliably you can recreate the problem, but if it
doesn't take too long then it would be worth seeing what effect
turning off the FASTUPDATE storage parameter for the GIN index has.
That could prevent the problem from recurring, and would support my
theory about what's up here. (It wouldn't fix the corruption, though.)
Of course, what I'd much prefer is a self-contained test case. But if
you can't manage that, or if reproducing the issue takes hours, then
this simpler experiment might be worthwhile.
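For reference, the experiment Peter suggests might be sketched like so (index name from the thread; gin_clean_pending_list() is available from 9.6 on):

```sql
-- Hedged sketch: disable FASTUPDATE on the suspect index. Flushing the
-- pending list first moves any queued entries into the main structure.
SELECT gin_clean_pending_list('sui.probandsetunique');

ALTER INDEX sui.probandsetunique SET (fastupdate = off);

-- Per Peter, this would not repair existing corruption; a rebuild does:
REINDEX INDEX sui.probandsetunique;
```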
--
Peter Geoghegan
On 10/30/2017 10:56 AM, Peter Geoghegan wrote:
On Mon, Oct 30, 2017 at 9:45 AM, Rob Sargent <robjsargent@gmail.com> wrote:
Peter, you beat me to the punch. I was just about to say "Having read the
referenced message I thought I would add that we never delete from this
table." In this particular case it was written to record by record, in a
previous execution and at the time of the error it was only being read. (In
case you've been following, the failed execution would have added ~1M
"segments", each which references an entry in the gin'd table "probandsets"
- but like a rookie I'm looking up each probandset(2^16) individually.
Re-working that NOW.)

It's not surprising that only a SELECT statement could see this
problem. I guess that it's possible that only page deletions used for
the pending list are involved here.

I'm not sure how reliably you can recreate the problem, but if it
doesn't take too long then it would be worth seeing what effect
turning off the FASTUPDATE storage parameter for the GIN index has.
That could prevent the problem from recurring, and would support my
theory about what's up here. (It wouldn't fix the corruption, though.)

Of course, what I'd much prefer is a self-contained test case. But if
you can't manage that, or if reproducing the issue takes hours, then
this simpler experiment might be worthwhile.
I can reload the gin'd table repeatedly in a dev environment. Does
select * from <table> order by <gin'd column> expose the corruption or
does the load itself necessarily fail at the moment of corruption?
On 10/30/2017 10:56 AM, Peter Geoghegan wrote:
On Mon, Oct 30, 2017 at 9:45 AM, Rob Sargent <robjsargent@gmail.com> wrote:
Peter, you beat me to the punch. I was just about to say "Having read the
referenced message I thought I would add that we never delete from this
table." In this particular case it was written to record by record, in a
previous execution and at the time of the error it was only being read. (In
case you've been following, the failed execution would have added ~1M
"segments", each which references an entry in the gin'd table "probandsets"
- but like a rookie I'm looking up each probandset(2^16) individually.
Re-working that NOW.)

It's not surprising that only a SELECT statement could see this
problem. I guess that it's possible that only page deletions used for
the pending list are involved here.

I'm not sure how reliably you can recreate the problem, but if it
doesn't take too long then it would be worth seeing what effect
turning off the FASTUPDATE storage parameter for the GIN index has.
That could prevent the problem from recurring, and would support my
theory about what's up here. (It wouldn't fix the corruption, though.)

Of course, what I'd much prefer is a self-contained test case. But if
you can't manage that, or if reproducing the issue takes hours, then
this simpler experiment might be worthwhile.
My test database machine is:
Not virtual
Intel(R) Xeon(R) CPU E3-1241 v3 @ 3.50GHz (quad core, hyperthreaded)
MemTotal: 16272548 kB
default postgresql.conf from yum install postgresql-10*
I've loaded thrice the number of records (190K) into the problem table,
but no sign yet of the problem. But unlike the production
lookup-notfind-insert (anti)pattern, these were all loaded in a single
transaction. I think the following query has to read the gin'd column of
every record:
select array_length(probands,1) as heads,
count(*) as occurs
from tld.probandset
where probands @>
'{65fe3b60-1c86-4b14-a85d-21abdf68f9e2,f0963403-3f3c-426d-a828-b5bfff914bb4}'
group by array_length(probands,1)
order by array_length(probands,1);
heads | occurs
-------+--------
2 | 1
3 | 14
4 | 91
5 | 364
6 | 1001
7 | 2002
8 | 3003
9 | 3432
10 | 3003
11 | 2002
12 | 1001
13 | 364
14 | 91
15 | 14
16 | 1
(15 rows)
Time: 17.125 ms
Happy as a clam.
I'll try a run of the antipattern. I have NOT diddled FASTUPDATE at all.
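As an aside, the per-row lookup-notfind-insert pattern mentioned upthread could be collapsed into one set-based statement; a hypothetical sketch (the staging table and its columns are assumptions, not from the thread):

```sql
-- Hedged sketch: load all candidate rows into a staging table, then
-- insert only those whose probands array is not already present.
CREATE TEMP TABLE staging_probandset (
    probands  uuid[] NOT NULL,
    people_id uuid   NOT NULL
);
-- ... COPY candidate rows into staging_probandset ...

-- Whole-array equality can use a plain join; no per-row round trips.
-- gen_random_uuid() assumes pgcrypto on PG 10.
INSERT INTO sui.probandset (id, probands, people_id)
SELECT gen_random_uuid(), s.probands, s.people_id
FROM staging_probandset AS s
WHERE NOT EXISTS (
    SELECT 1 FROM sui.probandset p WHERE p.probands = s.probands
);
```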