ltree_gist indexes broken after pg_upgrade from 12 to 13
Hi,
Victor Yegorov reported a crash related to a GiST index on ltree [1]/messages/by-id/17406-71e02820ae79bb40@postgresql.org,
following a pg_upgrade from 12.x to 14.x, with a data set reproducing
this. I spent some time investigating this, and it seems this is a silly
bug in commit
commit 911e70207703799605f5a0e8aad9f06cff067c63 (HEAD)
Author: Alexander Korotkov <akorotkov(at)postgresql(dot)org>
Date: Mon Mar 30 19:17:11 2020 +0300
Implement operator class parameters
...
in PG13, which modified ltree_gist so that siglen is opclass parameter
(and not hard-coded). But the procedures use LTREE_GET_ASIGLEN macro to
extract the value, which either extracts the value from the catalog (if
the index has opclass parameter) or uses a default value - but it always
uses LTREE_ASIGLEN_DEFAULT, which belongs to _ltree_gist opclass (i.e.
for array of ltree). And that's 28 instead of the 8, as it should be.
Attached is a simple patch tweaking the macros, which resolves the
issue. Maybe there should be a separate macro though, because "ASIGLEN"
probably refers to "array siglen".
But the bigger issue is this fixes indexes created before the upgrade,
but it breaks indexes created after it. Because those were created with
siglen 28B, so reverting to 8B breaks them. A bit of a symmetry :-/
I'm not sure what to do about this. There's no way to distinguish
indexes, and I think an index may actually be broken both with and
without the patch - imagine an index created before the upgrade (so
using siglen 8), but then receiving a couple new rows (siglen 28).
After thinking about this I only see two options:
1) Don't apply the patch and tell everyone using ltree_gist they need to
rebuild the index after pg_upgrade from 12 to 13+. The downside of this
is larger indexes (because some tuples are 20B larger).
2) Apply the patch + tell those who already upgraded from 12 to rebuild
ltree_gist indexes, because those might be broken due to new inserts.
My opinion is to do (2), because at least those who'll upgrade later
(which is a lot of people) will be fine without a rebuild. And it also
makes the indexes a bit smaller, thanks to saving 20B.
regards
[1]: /messages/by-id/17406-71e02820ae79bb40@postgresql.org
/messages/by-id/17406-71e02820ae79bb40@postgresql.org
--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Attachments:
0001-ltree-gist-siglen-fix.patchtext/x-patch; charset=UTF-8; name=0001-ltree-gist-siglen-fix.patchDownload+13-14
Tomas Vondra <tomas.vondra@enterprisedb.com> writes:
After thinking about this I only see two options:
1) Don't apply the patch and tell everyone using ltree_gist they need to
rebuild the index after pg_upgrade from 12 to 13+. The downside of this
is larger indexes (because some tuples are 20B larger).
2) Apply the patch + tell those who already upgraded from 12 to rebuild
ltree_gist indexes, because those might be broken due to new inserts.
Yeah, let's fix it and advise people to reindex those indexes.
Won't be the first time release notes have contained such advice.
Plus, it sounds like we'd have to advise people to reindex *anyway*,
just in a slightly different set of cases.
regards, tom lane
Hi,
On 2022-02-24 22:15:20 +0100, Tomas Vondra wrote:
After thinking about this I only see two options:
1) Don't apply the patch and tell everyone using ltree_gist they need to
rebuild the index after pg_upgrade from 12 to 13+. The downside of this
is larger indexes (because some tuples are 20B larger).2) Apply the patch + tell those who already upgraded from 12 to rebuild
ltree_gist indexes, because those might be broken due to new inserts.My opinion is to do (2), because at least those who'll upgrade later
(which is a lot of people) will be fine without a rebuild. And it also
makes the indexes a bit smaller, thanks to saving 20B.
Won't 2) also break indexes created without a pg_upgrade? "already upgraded
from 12" sounds like it wouldn't but I don't see why?
Greetings,
Andres Freund
On 2/24/22 23:06, Andres Freund wrote:
Hi,
On 2022-02-24 22:15:20 +0100, Tomas Vondra wrote:
After thinking about this I only see two options:
1) Don't apply the patch and tell everyone using ltree_gist they need to
rebuild the index after pg_upgrade from 12 to 13+. The downside of this
is larger indexes (because some tuples are 20B larger).2) Apply the patch + tell those who already upgraded from 12 to rebuild
ltree_gist indexes, because those might be broken due to new inserts.My opinion is to do (2), because at least those who'll upgrade later
(which is a lot of people) will be fine without a rebuild. And it also
makes the indexes a bit smaller, thanks to saving 20B.Won't 2) also break indexes created without a pg_upgrade? "already upgraded
from 12" sounds like it wouldn't but I don't see why?
It will, unfortunately - that's why I wrote "upgrade" in that sentence.
I should have been more explicit, sorry. But any new index tuples formed
after starting the 13+ cluster are/may be corrupted.
regards
--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 2/24/22 23:13, Tomas Vondra wrote:
On 2/24/22 23:06, Andres Freund wrote:
Hi,
On 2022-02-24 22:15:20 +0100, Tomas Vondra wrote:
After thinking about this I only see two options:
1) Don't apply the patch and tell everyone using ltree_gist they need to
rebuild the index after pg_upgrade from 12 to 13+. The downside of this
is larger indexes (because some tuples are 20B larger).2) Apply the patch + tell those who already upgraded from 12 to rebuild
ltree_gist indexes, because those might be broken due to new inserts.My opinion is to do (2), because at least those who'll upgrade later
(which is a lot of people) will be fine without a rebuild. And it also
makes the indexes a bit smaller, thanks to saving 20B.Won't 2) also break indexes created without a pg_upgrade? "already upgraded
from 12" sounds like it wouldn't but I don't see why?It will, unfortunately - that's why I wrote "upgrade" in that sentence.
I should have been more explicit, sorry. But any new index tuples formed
after starting the 13+ cluster are/may be corrupted.
I wonder if we could check the index tuple length, and adjust the siglen
based on that, somehow ...
regards
--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Tomas Vondra <tomas.vondra@enterprisedb.com> writes:
I wonder if we could check the index tuple length, and adjust the siglen
based on that, somehow ...
In an already-corrupted index, there won't be a unique answer.
regards, tom lane
Hi,
On February 24, 2022 5:44:57 PM PST, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Tomas Vondra <tomas.vondra@enterprisedb.com> writes:
I wonder if we could check the index tuple length, and adjust the siglen
based on that, somehow ...In an already-corrupted index, there won't be a unique answer.
If we're breaking every ltree gist index, can we detect that it's not an index generated by the new version? Most people don't read release notes, and this is just about guaranteed to break all.
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.
On 25.02.2022 00:15, Tomas Vondra wrote:
Hi,
Victor Yegorov reported a crash related to a GiST index on ltree [1],
following a pg_upgrade from 12.x to 14.x, with a data set reproducing
this. I spent some time investigating this, and it seems this is a silly
bug in commitcommit 911e70207703799605f5a0e8aad9f06cff067c63 (HEAD)
Author: Alexander Korotkov <akorotkov(at)postgresql(dot)org>
Date: Mon Mar 30 19:17:11 2020 +0300Implement operator class parameters
...in PG13, which modified ltree_gist so that siglen is opclass parameter
(and not hard-coded). But the procedures use LTREE_GET_ASIGLEN macro to
extract the value, which either extracts the value from the catalog (if
the index has opclass parameter) or uses a default value - but it always
uses LTREE_ASIGLEN_DEFAULT, which belongs to _ltree_gist opclass (i.e.
for array of ltree). And that's 28 instead of the 8, as it should be.
It seems that ltree extension simply was not updated to v1.2 after
pg_upgrade:
ALTER EXTENSION ltree UPDATE TO '1.2'; -- is missing
Upgrade script ltree--1.1--1.2.sql creates ltree_gist_options() and
registers it in the opclass. ltree_gist_options() initializes bytea
options using the correct SIGLEN_DEFAULT=8.
If ltree_gist_options() is absent, LTREE_GET_ASIGLEN() receives NULL
and wrong LTREE_ASIGLEN_DEFAULT is used. But if ltree_gist_options()
is registered, LTREE_GET_ASIGLEN() receives non-NULL bytea options
with the correct default value.
So, we probably have corrupted indexes that were updated since such
"incomplete" upgrade of ltree.
Also I found that contrib/pg_trgm/trgm_gist.c uses wrongly named macro
LTREE_GET_ASIGLEN().
--
Nikita Glukhov
Postgres Professional:http://www.postgrespro.com
The Russian Postgres Company
On 3/4/22 20:29, Nikita Glukhov wrote:
On 25.02.2022 00:15, Tomas Vondra wrote:
Hi,
Victor Yegorov reported a crash related to a GiST index on ltree [1],
following a pg_upgrade from 12.x to 14.x, with a data set reproducing
this. I spent some time investigating this, and it seems this is a silly
bug in commitcommit 911e70207703799605f5a0e8aad9f06cff067c63 (HEAD)
Author: Alexander Korotkov <akorotkov(at)postgresql(dot)org>
Date: Mon Mar 30 19:17:11 2020 +0300Implement operator class parameters
...in PG13, which modified ltree_gist so that siglen is opclass parameter
(and not hard-coded). But the procedures use LTREE_GET_ASIGLEN macro to
extract the value, which either extracts the value from the catalog (if
the index has opclass parameter) or uses a default value - but it always
uses LTREE_ASIGLEN_DEFAULT, which belongs to _ltree_gist opclass (i.e.
for array of ltree). And that's 28 instead of the 8, as it should be.It seems that ltree extension simply was not updated to v1.2 after
pg_upgrade:
ALTER EXTENSION ltree UPDATE TO '1.2'; -- is missingUpgrade script ltree--1.1--1.2.sql creates ltree_gist_options() and
registers it in the opclass. ltree_gist_options() initializes bytea
options using the correct SIGLEN_DEFAULT=8.If ltree_gist_options() is absent, LTREE_GET_ASIGLEN() receives NULL
and wrong LTREE_ASIGLEN_DEFAULT is used. But if ltree_gist_options()
is registered, LTREE_GET_ASIGLEN() receives non-NULL bytea options
with the correct default value.So, we probably have corrupted indexes that were updated since such
"incomplete" upgrade of ltree.
IIRC pg_upgrade is not expected to upgrade extensions - it keeps the
installed version of the extension, and that's intentional.
Moreover, it's perfectly legal to install older version, so you can do
CREATE EXTENSION ltree VERSION '1.1';
So I don't think we can call this "incomplete upgrade".
Also I found that contrib/pg_trgm/trgm_gist.c uses wrongly named macro
LTREE_GET_ASIGLEN().
Does it? When I grep for LTREE_GET_ASIGLEN, I don't get any matches in
pg_trgm.
regards
--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Tomas Vondra <tomas.vondra@enterprisedb.com> writes:
On 3/4/22 20:29, Nikita Glukhov wrote:
So, we probably have corrupted indexes that were updated since such
"incomplete" upgrade of ltree.
IIRC pg_upgrade is not expected to upgrade extensions - it keeps the
installed version of the extension, and that's intentional.
Yeah, exactly. But this opens up an additional consideration we
have to account for: whatever we do needs to work with either 1.1
or 1.2 SQL-level versions of the extension.
regards, tom lane
On 04.03.2022 23:28, Tom Lane wrote:
Tomas Vondra<tomas.vondra@enterprisedb.com> writes:
On 3/4/22 20:29, Nikita Glukhov wrote:
So, we probably have corrupted indexes that were updated since such
"incomplete" upgrade of ltree.IIRC pg_upgrade is not expected to upgrade extensions - it keeps the
installed version of the extension, and that's intentional.Yeah, exactly. But this opens up an additional consideration we
have to account for: whatever we do needs to work with either 1.1
or 1.2 SQL-level versions of the extension.regards, tom lane
It becomes clear that ltree upgrade 1.1 => 1.2 is broken, the problem
is not so much related to PG12 => PG13+ upgrades.
The following examples crashes PG13+:
CREATE EXTENSION ltree VERSION "1.1";
CREATE TABLE test AS
SELECT i::text::ltree data
FROM generate_series(1, 100000) i;
CREATE INDEX ON test USING gist (data);
ALTER EXTENSION ltree UPDATE TO "1.2";
-- works, cached bytea options is still NULL and siglen is 28
SELECT * FROM test WHERE data = '12345';
\connect
-- crash, siglen = 8
SELECT * FROM test WHERE data = '12345';
You can see here another problem: installation of opclass options
procedure does not invalidate relation's opclass options cache.
--
Nikita Glukhov
Postgres Professional:http://www.postgrespro.com
The Russian Postgres Company
On 3/4/22 23:09, Nikita Glukhov wrote:
On 04.03.2022 23:28, Tom Lane wrote:
Tomas Vondra <tomas.vondra@enterprisedb.com> writes:
On 3/4/22 20:29, Nikita Glukhov wrote:
So, we probably have corrupted indexes that were updated since such
"incomplete" upgrade of ltree.IIRC pg_upgrade is not expected to upgrade extensions - it keeps the
installed version of the extension, and that's intentional.Yeah, exactly. But this opens up an additional consideration we
have to account for: whatever we do needs to work with either 1.1
or 1.2 SQL-level versions of the extension.regards, tom lane
It becomes clear that ltree upgrade 1.1 => 1.2 is broken, the problem
is not so much related to PG12 => PG13+ upgrades.
Well, it's quite a mess :-(
It very clearly is not just 1.1 -> 1.2 upgrade issue, because you can
get a crash with 1.1 after PG12 -> PG13 upgrade, as demonstrated
earlier. So just "fixing" the extension upgrade is no enough.
But as you showed, 1.1 -> 1.2 upgrade is broken too.
You can see here another problem: installation of opclass options
procedure does not invalidate relation's opclass options cache.
Seems like that.
regards
--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Hi!
Sorry for this terrible oversight by me.
On Sat, Mar 5, 2022 at 10:13 AM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:
On 3/4/22 23:09, Nikita Glukhov wrote:
On 04.03.2022 23:28, Tom Lane wrote:
Tomas Vondra <tomas.vondra@enterprisedb.com> writes:
On 3/4/22 20:29, Nikita Glukhov wrote:
So, we probably have corrupted indexes that were updated since such
"incomplete" upgrade of ltree.IIRC pg_upgrade is not expected to upgrade extensions - it keeps the
installed version of the extension, and that's intentional.Yeah, exactly. But this opens up an additional consideration we
have to account for: whatever we do needs to work with either 1.1
or 1.2 SQL-level versions of the extension.regards, tom lane
It becomes clear that ltree upgrade 1.1 => 1.2 is broken, the problem
is not so much related to PG12 => PG13+ upgrades.
So, it seems that ltree 1.1 in PG13+ is incompatible with ltree on
PG12 and ltree 1.2 on PG13+. And there are many scenarios involving.
It seems too difficult to identify all the broken cases in the release
notes. What about applying a patch and asking all ltree users to
reindex their indexes?
Well, it's quite a mess :-(
It very clearly is not just 1.1 -> 1.2 upgrade issue, because you can
get a crash with 1.1 after PG12 -> PG13 upgrade, as demonstrated
earlier. So just "fixing" the extension upgrade is no enough.But as you showed, 1.1 -> 1.2 upgrade is broken too.
You can see here another problem: installation of opclass options
procedure does not invalidate relation's opclass options cache.Seems like that.
I think opclass options procedure shouldn't normally change opclass
options chache. Before installation of options procedure, there
should be no options. And options procedure shouldn't change the
defaults in this case.
------
Regards,
Alexander Korotkov
On 3/4/22 15:28, Tom Lane wrote:
Tomas Vondra <tomas.vondra@enterprisedb.com> writes:
On 3/4/22 20:29, Nikita Glukhov wrote:
So, we probably have corrupted indexes that were updated since such
"incomplete" upgrade of ltree.IIRC pg_upgrade is not expected to upgrade extensions - it keeps the
installed version of the extension, and that's intentional.Yeah, exactly. But this opens up an additional consideration we
have to account for: whatever we do needs to work with either 1.1
or 1.2 SQL-level versions of the extension.
This is an area not currently touched by the buildfarm's cross version
upgrade testing, which basically compares a pre-upgrade and post-upgrade
dump of the databases. The upgraded cluster does contain
contrib_regression_ltree.
I'm open to suggestions on how we might improve the buildfarm's testing
of upgraded indexes generally.
cheers
andrew
--
Andrew Dunstan
EDB: https://www.enterprisedb.com
On 3/6/22 08:09, Alexander Korotkov wrote:
Hi!
Sorry for this terrible oversight by me.
On Sat, Mar 5, 2022 at 10:13 AM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:On 3/4/22 23:09, Nikita Glukhov wrote:
On 04.03.2022 23:28, Tom Lane wrote:
Tomas Vondra <tomas.vondra@enterprisedb.com> writes:
On 3/4/22 20:29, Nikita Glukhov wrote:
So, we probably have corrupted indexes that were updated since such
"incomplete" upgrade of ltree.IIRC pg_upgrade is not expected to upgrade extensions - it keeps the
installed version of the extension, and that's intentional.Yeah, exactly. But this opens up an additional consideration we
have to account for: whatever we do needs to work with either 1.1
or 1.2 SQL-level versions of the extension.regards, tom lane
It becomes clear that ltree upgrade 1.1 => 1.2 is broken, the problem
is not so much related to PG12 => PG13+ upgrades.So, it seems that ltree 1.1 in PG13+ is incompatible with ltree on
PG12 and ltree 1.2 on PG13+. And there are many scenarios involving.It seems too difficult to identify all the broken cases in the release
notes. What about applying a patch and asking all ltree users to
reindex their indexes?
Yeah. I think this is getting so complicated that there's little chance
we'd be able to clearly explain when to reindex.
Well, it's quite a mess :-(
It very clearly is not just 1.1 -> 1.2 upgrade issue, because you can
get a crash with 1.1 after PG12 -> PG13 upgrade, as demonstrated
earlier. So just "fixing" the extension upgrade is no enough.But as you showed, 1.1 -> 1.2 upgrade is broken too.
You can see here another problem: installation of opclass options
procedure does not invalidate relation's opclass options cache.Seems like that.
I think opclass options procedure shouldn't normally change opclass
options chache. Before installation of options procedure, there
should be no options. And options procedure shouldn't change the
defaults in this case.
I should clarify I haven't looked into the details - I merely verified
accessing the index in the same backend works, and after reconnection it
crashes (per Nikita's example). I don't know why that happans.
regards
--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Hi,
On 2022-03-06 07:46:04 -0500, Andrew Dunstan wrote:
This is an area not currently touched by the buildfarm's cross version
upgrade testing, which basically compares a pre-upgrade and post-upgrade
dump of the databases. The upgraded cluster does contain
contrib_regression_ltree.I'm open to suggestions on how we might improve the buildfarm's testing
of upgraded indexes generally.
One thing that's likely worth doing as part of the cross version upgrade test,
even if it wouldn't even help in this case, is to run amcheck post
upgrade. Just dumping data isn't going to touch indices at all.
A sequence of
pg_upgrade; amcheck; upgrade all extensions; amcheck;
would make sense.
Greetings,
Andres Freund
пн, 7 мар. 2022 г. в 00:34, Andres Freund <andres@anarazel.de>:
One thing that's likely worth doing as part of the cross version upgrade
test,
even if it wouldn't even help in this case, is to run amcheck post
upgrade. Just dumping data isn't going to touch indices at all.A sequence of
pg_upgrade; amcheck; upgrade all extensions; amcheck;
would make sense.
Is it possible to amcheck gist indexes?..
--
Victor Yegorov
Hi,
On 2022-03-07 01:12:07 +0200, Victor Yegorov wrote:
пн, 7 мар. 2022 г. в 00:34, Andres Freund <andres@anarazel.de>:
One thing that's likely worth doing as part of the cross version upgrade
test,
even if it wouldn't even help in this case, is to run amcheck post
upgrade. Just dumping data isn't going to touch indices at all.A sequence of
pg_upgrade; amcheck; upgrade all extensions; amcheck;
would make sense.Is it possible to amcheck gist indexes?..
No. That was what I was alluding to with "even if it wouldn't even help in
this case".
Greetings,
Andres Freund
On Sun, Mar 6, 2022 at 8:28 PM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:
On 3/6/22 08:09, Alexander Korotkov wrote:
Sorry for this terrible oversight by me.
On Sat, Mar 5, 2022 at 10:13 AM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:On 3/4/22 23:09, Nikita Glukhov wrote:
On 04.03.2022 23:28, Tom Lane wrote:
Tomas Vondra <tomas.vondra@enterprisedb.com> writes:
On 3/4/22 20:29, Nikita Glukhov wrote:
So, we probably have corrupted indexes that were updated since such
"incomplete" upgrade of ltree.IIRC pg_upgrade is not expected to upgrade extensions - it keeps the
installed version of the extension, and that's intentional.Yeah, exactly. But this opens up an additional consideration we
have to account for: whatever we do needs to work with either 1.1
or 1.2 SQL-level versions of the extension.regards, tom lane
It becomes clear that ltree upgrade 1.1 => 1.2 is broken, the problem
is not so much related to PG12 => PG13+ upgrades.So, it seems that ltree 1.1 in PG13+ is incompatible with ltree on
PG12 and ltree 1.2 on PG13+. And there are many scenarios involving.It seems too difficult to identify all the broken cases in the release
notes. What about applying a patch and asking all ltree users to
reindex their indexes?Yeah. I think this is getting so complicated that there's little chance
we'd be able to clearly explain when to reindex.
Good. The revised patch is attached. Instead of adding argument to
LTREE_GET_ASIGLEN(), it introduces separate LTREE_GET_SIGLEN() and
LTREE_GET_ASIGLEN() macros.
------
Regards,
Alexander Korotkov
Attachments:
0001-ltree-gist-siglen-fix-v2.patchapplication/octet-stream; name=0001-ltree-gist-siglen-fix-v2.patchDownload+12-12
On Tue, Mar 8, 2022 at 2:05 AM Alexander Korotkov <aekorotkov@gmail.com> wrote:
On Sun, Mar 6, 2022 at 8:28 PM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:On 3/6/22 08:09, Alexander Korotkov wrote:
Sorry for this terrible oversight by me.
On Sat, Mar 5, 2022 at 10:13 AM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:On 3/4/22 23:09, Nikita Glukhov wrote:
On 04.03.2022 23:28, Tom Lane wrote:
Tomas Vondra <tomas.vondra@enterprisedb.com> writes:
On 3/4/22 20:29, Nikita Glukhov wrote:
So, we probably have corrupted indexes that were updated since such
"incomplete" upgrade of ltree.IIRC pg_upgrade is not expected to upgrade extensions - it keeps the
installed version of the extension, and that's intentional.Yeah, exactly. But this opens up an additional consideration we
have to account for: whatever we do needs to work with either 1.1
or 1.2 SQL-level versions of the extension.regards, tom lane
It becomes clear that ltree upgrade 1.1 => 1.2 is broken, the problem
is not so much related to PG12 => PG13+ upgrades.So, it seems that ltree 1.1 in PG13+ is incompatible with ltree on
PG12 and ltree 1.2 on PG13+. And there are many scenarios involving.It seems too difficult to identify all the broken cases in the release
notes. What about applying a patch and asking all ltree users to
reindex their indexes?Yeah. I think this is getting so complicated that there's little chance
we'd be able to clearly explain when to reindex.Good. The revised patch is attached. Instead of adding argument to
LTREE_GET_ASIGLEN(), it introduces separate LTREE_GET_SIGLEN() and
LTREE_GET_ASIGLEN() macros.
No feedback yet. I'm going to push this if no objections.
------
Regards,
Alexander Korotkov