Fixing the btree_gist inet mess
As we've known for years[1]/messages/by-id/201010112055.o9BKtZf7011251@wwwmaster.postgresql.org[2]/messages/by-id/7891efc1-8378-2cf2-617b-4143848ec895@proxel.se[3]/messages/by-id/19000-2525470d200672ab@postgresql.org, contrib/btree_gist's opclasses
for inet/cidr columns are fundamentally broken: they rely on the
planner's convert_network_to_scalar() function, which was only
ever intended to give approximate results, so you get the wrong
answers in edge cases. There isn't anything that can be done
about that without breaking on-disk compatibility for such indexes,
so we haven't tried. What we did do some time ago was to implement
a hopefully-correct, in-core gist network_ops opclass to replace the
btree_gist opclasses. But people are still using the btree_gist
opclasses, because those are marked default and the in-core opclass
isn't.
It's past time to move this problem along and try to get out of the
business of encouraging use of known-broken code. I propose that
for v19, we should flip the opcdefault status so that network_ops is
marked default and the btree_gist opclasses are not. This will be
enough to ensure that network_ops is used unless the user explicitly
specifies to do differently. I don't think we should go further than
that yet (ie, not actively disable the btree_gist code) for a couple
of reasons: (1) this step is messy enough already, and (2) given the
current situation, the in-core network_ops opclass may be less well
tested than one would like. So I don't think we have enough evidence
to decide that we can summarily force everyone onto it; broken or not,
there haven't been that many complaints about btree_gist's opclasses.
Having done this, the effects of a plain pg_dump from v18- and restore
into v19+ will be to recreate GiST indexes on inet/cidr columns using
network_ops even if they were previously using btree_gist. That will
happen because in v18-, those opclasses were marked opcdefault and
pg_dump intentionally omits the explicit opclass specification in that
case. So that works the way we want.
pg_upgrade is more of a problem, because its invocation of pg_dump
will also omit the explicit opclass specification, resulting in the
new server thinking that the index uses network_ops while the on-disk
data isn't compatible with that. We can't really change that pg_dump
behavior, because that aspect is managed inside the old server's
pg_get_indexdef() function. The only solution I can see is for
pg_upgrade to refuse to upgrade indexes that use those opclasses.
We can tell users to replace them with network_ops indexes before
upgrading --- that's possible in 9.4 and later, so it should be
a good enough answer for almost everybody.
The attached draft patch implements these ideas and seems to do
the right things in testing. It's worth remarking on the way
that I did the "mark the btree_gist opclasses not-default" part:
I hacked up DefineOpClass() to ignore the DEFAULT specification if
the opclass being created has the right name and input data type.
That certainly has a foul odor about it, but the alternatives seem
worse. We can't simply add a btree_gist update step to remove
the DEFAULT setting, because btree_gist--1.2.sql will already have
failed as a consequence of trying to create a default opclass when
there already is one. Modifying btree_gist--1.2.sql to remove the
DEFAULT markings might be safe, but it goes against our longstanding
rule that extension scripts don't change once shipped, and I'm not
entirely sure that there aren't bad consequences if we break that
rule. (I did go as far as to add a comment to it about what will
really happen.) Moreover, even if we were willing to risk changing
btree_gist--1.2.sql, that's not enough: pg_upgrade would still fail,
because it dumps extensions by content, and what it will see in the
old installation is btree_gist opclasses that are marked default.
So hacking up DefineOpClass() can solve both the
normal-extension-install case and the pg_upgrade case for not a lot
of code, and I'm not seeing another way that's better.
There are a couple of loose ends still to be dealt with. We need
to say something about this in btree-gist.sgml, but I've not
attempted to write that text yet. Also, I expect that
cross-version-upgrade testing will spit up on the inet/cidr indexes
created by btree_gist's regression tests. There's probably
nothing that can be done about the latter except to teach
AdjustUpgrade.pm to drop those indexes from the old installation.
Thoughts?
regards, tom lane
[1]: /messages/by-id/201010112055.o9BKtZf7011251@wwwmaster.postgresql.org
[2]: /messages/by-id/7891efc1-8378-2cf2-617b-4143848ec895@proxel.se
[3]: /messages/by-id/19000-2525470d200672ab@postgresql.org
Attachments:
v1-0001-Mark-GiST-network_ops-opcdefault-and-btree_gist-s.patchtext/x-diff; charset=us-ascii; name*0=v1-0001-Mark-GiST-network_ops-opcdefault-and-btree_gist-s.p; name*1=atchDownload+117-8
On Fri, 1 Aug 2025 at 20:17, Tom Lane <tgl@sss.pgh.pa.us> wrote:
There are a couple of loose ends still to be dealt with. We need
to say something about this in btree-gist.sgml, but I've not
attempted to write that text yet. Also, I expect that
cross-version-upgrade testing will spit up on the inet/cidr indexes
created by btree_gist's regression tests. There's probably
nothing that can be done about the latter except to teach
AdjustUpgrade.pm to drop those indexes from the old installation.Thoughts?
This was long overdue from a project perspective, so thanks for picking this up.
I think we should still adjust btree-gist--1.2.sql, if only because it
adds stronger protections against any future installs that might try
to get this flag configured. Especially if we at some point in the far
future want to be able to remove this hack we should stop shipping
code that would break without the hack in new releases.
That doesn't remove the need for the pg_upgrade -related code changes,
but I think that just means we need to do both.
As for the rest of the patch:
+ /* + * HACK: if we're trying to create btree_gist's gist_inet_ops or + * gist_cidr_ops, avoid failure in the next stanza by silently making the + * new opclass non-default. Without this kluge, we would fail to load + * pre-v19 definitions of contrib/btree_gist. We can remove it sometime + * in the far future when we don't expect any such definitions to exist. + */ + if (isDefault) + { + if (amoid == GIST_AM_OID && + ((typeoid == INETOID && strcmp(opcname, "gist_inet_ops") == 0) || + (typeoid == CIDROID && strcmp(opcname, "gist_cidr_ops") == 0))) + isDefault = false; + }
Could we either limit this hack to pg_upgrade cases, or add a WARNING
whenever this condition is triggered and the DEFAULT flag is
overwritten? I think that a user trying to execute such commands
should be made aware that some part of their SQL command was ignored.
Kind regards,
Matthias van de Meent
Databricks (https://www.databricks.com)
Matthias van de Meent <boekewurm+postgres@gmail.com> writes:
On Fri, 1 Aug 2025 at 20:17, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Thoughts?
This was long overdue from a project perspective, so thanks for picking this up.
I think we should still adjust btree-gist--1.2.sql, if only because it
adds stronger protections against any future installs that might try
to get this flag configured. Especially if we at some point in the far
future want to be able to remove this hack we should stop shipping
code that would break without the hack in new releases.
Well ... mumble. Project policy has been that extension scripts don't
change once shipped. We have no experience with violating that policy
and hence little certainty about what might break. I don't think that
we can be certain that nothing will break, because for example there
might be some packager out there who has relied on that policy and
decided they could store extension scripts from multiple PG releases
in one directory.
It's probably worth crossing that bridge at some point, but I'd
rather not make a bug fix dependent on it.
One potential path forward is to roll up the existing series of
update scripts to create a new installation-from-scratch script
btree-gist--1.9.sql which would not try to mark the opclasses as
default. And I guess we could provide a btree-gist--1.8--1.9.sql
update script that includes manual catalog updates to turn off the
opcdefault flags if they're somehow on; though I'm not sure that
that case would be reachable, so maybe the 1.8--1.9 update could
as well be empty. Sometime in the very far future, when we have
deprecated pg_upgrade from pre-v19 versions, we could remove all
the pre-1.9 script versions and remove the hack in DefineOpClass.
BTW, one reason why I'm not *that* excited about this is that we've
tolerated some related hacks for a very long time indeed. See for
instance this twenty-year-old gem in DefineIndex:
/*
* Hack to provide more-or-less-transparent updating of old RTREE
* indexes to GiST: if RTREE is requested and not found, use GIST.
*/
if (strcmp(accessMethodName, "rtree") == 0)
{
ereport(NOTICE,
(errmsg("substituting access method \"gist\" for obsolete method \"rtree\"")));
accessMethodName = "gist";
tuple = SearchSysCache1(AMNAME, PointerGetDatum(accessMethodName));
}
Could we either limit this hack to pg_upgrade cases, or add a WARNING
whenever this condition is triggered and the DEFAULT flag is
overwritten? I think that a user trying to execute such commands
should be made aware that some part of their SQL command was ignored.
I'm not opposed in principle to having a warning, but I don't want one
to come out when some user merely does CREATE EXTENSION btree_gist.
And I don't see how to avoid that if we don't touch
btree-gist--1.2.sql. In practice, the odds that somebody would hit
this behavior in some other context seem negligible: nobody would be
re-using btree_gist's opclass names.
regards, tom lane
I wrote:
One potential path forward is to roll up the existing series of
update scripts to create a new installation-from-scratch script
btree-gist--1.9.sql which would not try to mark the opclasses as
default.
Matthias van de Meent <boekewurm+postgres@gmail.com> writes:
Could we either limit this hack to pg_upgrade cases, or add a WARNING
whenever this condition is triggered and the DEFAULT flag is
overwritten? I think that a user trying to execute such commands
should be made aware that some part of their SQL command was ignored.
I'm not opposed in principle to having a warning, but I don't want one
to come out when some user merely does CREATE EXTENSION btree_gist.
And I don't see how to avoid that if we don't touch
btree-gist--1.2.sql.
Wait a minute ... if we create a rolled-up btree-gist--1.9.sql as
above, and that's the default version, then plain CREATE EXTENSION
wouldn't show the problem anyway. You'd have to explicitly try to
create an old version to reach the hack. So maybe a warning-if-
not-in-binary-upgrade-mode wouldn't be too noisy after all.
Let me give that a try.
regards, tom lane
On 01/08/2025 21:17, Tom Lane wrote:
It's past time to move this problem along and try to get out of the
business of encouraging use of known-broken code. I propose that
for v19, we should flip the opcdefault status so that network_ops is
marked default and the btree_gist opclasses are not. This will be
enough to ensure that network_ops is used unless the user explicitly
specifies to do differently.
+1
Bunch of ideas and opinions below, but I'm fine with your plan as is too:
I don't think we should go further than
that yet (ie, not actively disable the btree_gist code) for a couple
of reasons: (1) this step is messy enough already, and (2) given the> current situation, the in-core network_ops opclass may be less well
tested than one would like. So I don't think we have enough evidence
to decide that we can summarily force everyone onto it; broken or not,
there haven't been that many complaints about btree_gist's opclasses.
If we implement upgrade the way you propose, all upgraded databases,
whether it's with pg_dump or pg_upgrade, will switch to using
network_ops. The only way to get an index with the old btree_gist
opclass is to specify it explicitly, on v19. I think we might as well
remove it completely.
The upgrade as proposed will be a hurdle for anyone using the old
opclass anyway, so people will get some warning to test their
application after the upgrade. In the worst case people can stick with
the old version until any issues have been fixed.
Having done this, the effects of a plain pg_dump from v18- and restore
into v19+ will be to recreate GiST indexes on inet/cidr columns using
network_ops even if they were previously using btree_gist. That will
happen because in v18-, those opclasses were marked opcdefault and
pg_dump intentionally omits the explicit opclass specification in that
case. So that works the way we want.pg_upgrade is more of a problem, because its invocation of pg_dump
will also omit the explicit opclass specification, resulting in the
new server thinking that the index uses network_ops while the on-disk
data isn't compatible with that. We can't really change that pg_dump
behavior, because that aspect is managed inside the old server's
pg_get_indexdef() function. The only solution I can see is for
pg_upgrade to refuse to upgrade indexes that use those opclasses.
We can tell users to replace them with network_ops indexes before
upgrading --- that's possible in 9.4 and later, so it should be
a good enough answer for almost everybody.
I wonder if we could move the old opclass's code to core, and somehow
detect at runtime e.g. by looking at the root page, whether the index
was built before the upgrade. Hack initGISTstate() to redirect
everything to the old AM if it was created before the upgrade. The idea
is that after upgrade, all indexes would appear to be using the new
opclass if you look at the catalogs, but if it was pg_upgraded from an
older version, it would actually use the old functions. If you REINDEX,
it would get recreated in the new format, and would start using the new
functions. That would be the best user experience, but not sure it's
worth the effort and all the special hacks.
The attached draft patch implements these ideas and seems to do
the right things in testing. It's worth remarking on the way
that I did the "mark the btree_gist opclasses not-default" part:
I hacked up DefineOpClass() to ignore the DEFAULT specification if
the opclass being created has the right name and input data type.
That certainly has a foul odor about it, but the alternatives seem
worse. We can't simply add a btree_gist update step to remove
the DEFAULT setting, because btree_gist--1.2.sql will already have
failed as a consequence of trying to create a default opclass when
there already is one. Modifying btree_gist--1.2.sql to remove the
DEFAULT markings might be safe, but it goes against our longstanding
rule that extension scripts don't change once shipped, and I'm not
entirely sure that there aren't bad consequences if we break that
rule. (I did go as far as to add a comment to it about what will
really happen.) Moreover, even if we were willing to risk changing
btree_gist--1.2.sql, that's not enough: pg_upgrade would still fail,
because it dumps extensions by content, and what it will see in the
old installation is btree_gist opclasses that are marked default.
So hacking up DefineOpClass() can solve both the
normal-extension-install case and the pg_upgrade case for not a lot
of code, and I'm not seeing another way that's better.
Instead of having the hack in DefineOpClass(), we could have a similar
hack in pg_dump's dump_opclass(), only in binary upgrade mode.
It seems silly to keep an unmodified btree_gist--1.2.sql in v19, if it
actually gets installed in a different way. I feel we should truncate
the history and only include a new btree_gist--1.9.sql in v19.
Putting all that together, you get a more aggressive plan:
- Remove the old opclass entirely
- Remove all old btree_gist--*.sql scripts, start afresh with
btree_gist--1.9.sql
- Hack pg_dump, in binary upgrade mode, to dump the btree_gist extension
as simply "CREATE EXTENSION btree_gist;", instead of dumping the
individual members like it usually does.
Because btree_gist a contrib extension, we have the luxury that we can
do special hacks like this, in DefineOpClass() or in pg_upgrade. Out of
core extensions don't have that luxury. Could we generalize this?
I think the common case for extensions is that you'd want them be
implicitly upgraded to the latest version when you pg_upgrade. So for
most extensions, you would actually want pg_upgrade's dump and restore
to just do "CREATE EXTENSION foo;" instead of dumping the individual
members. But I'm sure there are exceptions. Could we add information to
the control file about that? For example, list all the older extension
versions that new default version is binary-compatible with. In
pg_upgrade, if the new default version is marked as binary-compatible
with the old installed version, install the new version on the new
cluster directly. Have support for extension scripts that are run on
pg_upgrade.
In the btree_gist case, the new version would be marked as
binary-compatible with all previous extension versions, but there would
be a pre-upgrade script that throws an error if there are any indexes
using the old opclass.
Also, I expect that cross-version-upgrade testing will spit up on
the inet/cidr indexes created by btree_gist's regression tests.
There's probably nothing that can be done about the latter except to
teach AdjustUpgrade.pm to drop those indexes from the old
installation.
Yeah. It would be nice to not drop them so that we have some test
coverage for upgrading them, though. At least if we do more with them
than just refuse the upgrade.
- Heikki
On 18/12/2025 13:15, Heikki Linnakangas wrote:
On 01/08/2025 21:17, Tom Lane wrote:
It's past time to move this problem along and try to get out of the
business of encouraging use of known-broken code. I propose that
for v19, we should flip the opcdefault status so that network_ops is
marked default and the btree_gist opclasses are not. This will be
enough to ensure that network_ops is used unless the user explicitly
specifies to do differently.+1
Bunch of ideas and opinions below, but I'm fine with your plan as is too:
Sorry, I was confused by my emails and replied to this old email
ignoring the later discussion. But I think all I said is still valid,
and some of it was already mentioned.
- Heikki
Heikki Linnakangas <hlinnaka@iki.fi> writes:
On 01/08/2025 21:17, Tom Lane wrote:
It's past time to move this problem along and try to get out of the
business of encouraging use of known-broken code. I propose that
for v19, we should flip the opcdefault status so that network_ops is
marked default and the btree_gist opclasses are not. This will be
enough to ensure that network_ops is used unless the user explicitly
specifies to do differently.
I wonder if we could move the old opclass's code to core, and somehow
detect at runtime e.g. by looking at the root page, whether the index
was built before the upgrade. Hack initGISTstate() to redirect
everything to the old AM if it was created before the upgrade. The idea
is that after upgrade, all indexes would appear to be using the new
opclass if you look at the catalogs, but if it was pg_upgraded from an
older version, it would actually use the old functions. If you REINDEX,
it would get recreated in the new format, and would start using the new
functions. That would be the best user experience, but not sure it's
worth the effort and all the special hacks.
It's more work than I want to do, anyway. Also, that path would
pretty much mean we could never get rid of the broken code.
Instead of having the hack in DefineOpClass(), we could have a similar
hack in pg_dump's dump_opclass(), only in binary upgrade mode.
Hmm ... I'll take a look at that. However, that would not allow
people to install pre-1.9 versions of btree_gist, at least not without
manually editing the extension script.
Putting all that together, you get a more aggressive plan:
- Remove the old opclass entirely
- Remove all old btree_gist--*.sql scripts, start afresh with
btree_gist--1.9.sql
- Hack pg_dump, in binary upgrade mode, to dump the btree_gist extension
as simply "CREATE EXTENSION btree_gist;", instead of dumping the
individual members like it usually does.
I think that that is where we want to end up in a release or two.
But as I explained upthread, I'm afraid to do it right off the bat:
I don't have 100% confidence in the new opclass code because I fear
it hasn't gotten enough road mileage. So I don't want to tell people
they flat out cannot use the old code as of v19. I think switching
the default choice of opclass is the right amount of risk for v19.
We could plan to remove the old code as of v20 or v21 or so.
regards, tom lane
Here's a v2 patchset that tries to address all the discussion so far.
The principal change from v1 is that I made a rolled-up
btree_gist--1.9.sql script in which the problem opclasses are not
marked DEFAULT. So that version can be installed without any
hack in DefineOpClass. This answer is much better than my v1 in
terms of having a clean way to describe the change in the module,
too.
We still need a hack for binary-upgrade mode though, since pg_dump
will dump the CREATE OPERATOR CLASS commands with DEFAULT if it's
looking at an old copy of btree_gist. (I looked at putting said hack
into pg_dump instead of the server, but it didn't seem like an
improvement. Heikki's idea of making pg_dump --binary-upgrade
dump "CREATE EXTENSION btree_gist" seemed much messier than this,
too.)
Because I didn't change DefineOpClass's behavior when
!IsBinaryUpgrade, any attempt to install a pre-1.9 version of
btree_gist will now fail. So we could remove btree_gist--1.2.sql
as well as btree_gist--1.0--1.1.sql and btree_gist--1.1--1.2.sql
without cost. (I've not done that here, as it would just bloat the
patchset some more.) However we should keep btree_gist--1.2--1.3.sql
and later delta scripts, so that users can update old definitions of
the module to 1.9 after a pg_upgrade.
I've also fixed up the cross-version-upgrade tests and written
some documentation.
One point perhaps worth mentioning here is that it works to
copy btree_gist--1.8--1.9.sql into a v18 installation and
issue
ALTER EXTENSION btree_gist UPDATE TO '1.9';
after which pg_upgrade will let you upgrade your old indexes
without complaint, because pg_dump will now do the right things.
I did not document this because (a) it does not work in anything
before v18 due to lack of btree_gist 1.8, and (b) we don't want
to encourage people to stay on the old opclasses in v19. But
perhaps somebody would find a reason to want to do this.
I'd probably squash all this into one commit at the end, but
I made it into several patches for review purposes.
regards, tom lane
Attachments:
v2-0001-Create-btree_gist-v1.9-in-which-inet-cidr-opclass.patchtext/x-diff; charset=us-ascii; name*0=v2-0001-Create-btree_gist-v1.9-in-which-inet-cidr-opclass.p; name*1=atchDownload+2015-2
v2-0002-Mark-btree_gist-1.9-as-the-default-version.patchtext/x-diff; charset=us-ascii; name=v2-0002-Mark-btree_gist-1.9-as-the-default-version.patchDownload+5-6
v2-0003-Mark-GiST-inet_ops-as-opcdefault-and-deal-with-en.patchtext/x-diff; charset=us-ascii; name*0=v2-0003-Mark-GiST-inet_ops-as-opcdefault-and-deal-with-en.p; name*1=atchDownload+186-4
On Fri, 19 Dec 2025 at 19:27, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Here's a v2 patchset that tries to address all the discussion so far.
Thanks!
[...]
Because I didn't change DefineOpClass's behavior when
!IsBinaryUpgrade, any attempt to install a pre-1.9 version of
btree_gist will now fail. So we could remove btree_gist--1.2.sql
as well as btree_gist--1.0--1.1.sql and btree_gist--1.1--1.2.sql
without cost. (I've not done that here, as it would just bloat the
patchset some more.) However we should keep btree_gist--1.2--1.3.sql
and later delta scripts, so that users can update old definitions of
the module to 1.9 after a pg_upgrade.
That all seems reasonable, yes.
One point perhaps worth mentioning here is that it works to
copy btree_gist--1.8--1.9.sql into a v18 installation and
issue
ALTER EXTENSION btree_gist UPDATE TO '1.9';
after which pg_upgrade will let you upgrade your old indexes
without complaint, because pg_dump will now do the right things.
I did not document this because (a) it does not work in anything
before v18 due to lack of btree_gist 1.8, and (b) we don't want
to encourage people to stay on the old opclasses in v19.
Agreed.
But
perhaps somebody would find a reason to want to do this.
Yes, perhaps. I also hope nobody needs to reach for this.
I'd probably squash all this into one commit at the end, but
I made it into several patches for review purposes.
LGTM.
Kind regards,
Matthias van de Meent
Databricks (https://www.databricks.com)