Dangling operator family after DROP TYPE

Started by Yoran Helingover 1 year ago4 messagesbugs
Jump to latest
#1Yoran Heling
contact@yorhel.nl

Hello list,

pg_upgrade was failing on one of my databases and, while digging a bit,
I found that the database in question contained a dangling operator
family for a type that didn't exist anymore.

I've attached a script to recreate this situation: creating a new type,
defining an operator class for it and then dropping the type causes the
implicitly created operator family to remain.

Attempting to drop this operator family results in an error. Attempting
to do a dump/restore results in a syntax error on restore.

The problematic database I found this on was running 15.10, but with the
attached script I can also reproduce it on 17.2.

Happy to provide more information as needed,

Yoran.

Attachments:

t.sqlapplication/sqlDownload
#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Yoran Heling (#1)
Re: Dangling operator family after DROP TYPE

Yoran Heling <contact@yorhel.nl> writes:

pg_upgrade was failing on one of my databases and, while digging a bit,
I found that the database in question contained a dangling operator
family for a type that didn't exist anymore.

I've attached a script to recreate this situation: creating a new type,
defining an operator class for it and then dropping the type causes the
implicitly created operator family to remain.

Thanks for the report. I don't think it's wrong for the now-empty
operator family to stick around: it has no direct dependency on the
dropped type. Also, trying to make it go away would cause problems
if another operator class for another type had been added to the
family meanwhile. However, these things are bad:

Attempting to drop this operator family results in an error. Attempting
to do a dump/restore results in a syntax error on restore.

The problem seems to be that the pg_amproc entry for the opclass'
function 4 doesn't get dropped. Examining the pre-drop pg_depend
entries for the opclass and opfamily, we find

# select objid, pg_describe_object(classid,objid,objsubid) as obj, pg_describe_object(refclassid,refobjid,refobjsubid) as ref, deptype from pg_depend where ...
objid | obj | ref | deptype
-------+---------------------------------------------------------------------------------------------+-----------------------------------------------------+---------
...
45105 | operator family t_btree_ops for access method btree | schema public | n
45107 | operator 1 (t, t) of operator family t_btree_ops for access method btree: <(t,t) | operator <(t,t) | n
45107 | operator 1 (t, t) of operator family t_btree_ops for access method btree: <(t,t) | operator class t_btree_ops for access method btree | i
45108 | operator 2 (t, t) of operator family t_btree_ops for access method btree: <=(t,t) | operator <=(t,t) | n
45108 | operator 2 (t, t) of operator family t_btree_ops for access method btree: <=(t,t) | operator class t_btree_ops for access method btree | i
45109 | operator 3 (t, t) of operator family t_btree_ops for access method btree: =(t,t) | operator =(t,t) | n
45109 | operator 3 (t, t) of operator family t_btree_ops for access method btree: =(t,t) | operator class t_btree_ops for access method btree | i
45110 | operator 4 (t, t) of operator family t_btree_ops for access method btree: >=(t,t) | operator >=(t,t) | n
45110 | operator 4 (t, t) of operator family t_btree_ops for access method btree: >=(t,t) | operator class t_btree_ops for access method btree | i
45111 | operator 5 (t, t) of operator family t_btree_ops for access method btree: >(t,t) | operator >(t,t) | n
45111 | operator 5 (t, t) of operator family t_btree_ops for access method btree: >(t,t) | operator class t_btree_ops for access method btree | i
45112 | function 1 (t, t) of operator family t_btree_ops for access method btree: t_cmp(t,t) | function t_cmp(t,t) | n
45112 | function 1 (t, t) of operator family t_btree_ops for access method btree: t_cmp(t,t) | operator class t_btree_ops for access method btree | i
45113 | function 4 (t, t) of operator family t_btree_ops for access method btree: btequalimage(oid) | operator family t_btree_ops for access method btree | a
45106 | operator class t_btree_ops for access method btree | schema public | n
45106 | operator class t_btree_ops for access method btree | operator family t_btree_ops for access method btree | a
45106 | operator class t_btree_ops for access method btree | type t | n
...

pg_amproc OID 45112 (function 1) has a dependency on t_cmp(t,t), which
of course depends in turn on type t. It also has a dependency on
the operator class, which also depends on t, so for sure it's going
away during "DROP TYPE t". But look at 45113 (function 4). It would
have a dependency on btequalimage(), but we don't record that because
btequalimage() is a pinned built-in function. Its other dependency
is on the operator family not the operator class. This seems like the
wrong thing. It's intentional according to the code: in nbtvalidate.c
we have

if (op->is_func && op->number != BTORDER_PROC)
{
/* Optional support proc, so always a soft family dependency */
op->ref_is_hard = false;
op->ref_is_family = true;
op->refobjid = opfamilyoid;
}

But I think we copied that pattern from other index AMs without
thinking too hard about it. In AMs like GiST, the argument is

* Operator members of a GiST opfamily should never have hard
* dependencies, since their connection to the opfamily depends only on
* what the support functions think, and that can be altered. For
* consistency, we make all soft dependencies point to the opfamily,
* though a soft dependency on the opclass would work as well in the
* CREATE OPERATOR CLASS case.

It seems like maybe btree should be using soft dependencies on the
opclass for optional support functions? Not quite sure about that.
There were a lot of moving parts in these choices IIRC.

Now the big reason that the leftover pg_amproc entry causes problems
is that its amproclefttype/amprocrighttype entries are still
referencing the deleted type "t". That wouldn't really stop us from
deleting the opfamily, except that during DROP CASCADE we try to print
descriptions of all the dropped objects, and getObjectDescription
calls format_type_extended which fails. The leftover entry also
causes issues for pg_dump, which will emit something like

CREATE OPERATOR FAMILY public.t_btree_ops USING btree;
ALTER OPERATOR FAMILY public.t_btree_ops USING btree ADD
FUNCTION 4 (45086, 45086) btequalimage(oid);

which of course doesn't parse during restore.

So we really need to fix things so that the pg_amproc entry goes away.
Switching its dependency to be on the opclass would do. A different
approach that might solve more problems is to be careful to record
a dependency from a pg_amproc entry to the type(s) mentioned in it.
This would be redundant in the case where the referenced function
has those types as input, but we can't really assume that for
support functions.

At least in the back branches, I'm inclined to also fix
getObjectDescription to use FORMAT_TYPE_ALLOW_INVALID when
printing the types of a pg_amproc entry, so that you're not
quite so thoroughly hosed if you already have this situation
in your catalog.

Peter, any thoughts about this?

regards, tom lane

In reply to: Tom Lane (#2)
Re: Dangling operator family after DROP TYPE

On Fri, Dec 6, 2024 at 12:15 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Thanks for the report. I don't think it's wrong for the now-empty
operator family to stick around: it has no direct dependency on the
dropped type. Also, trying to make it go away would cause problems
if another operator class for another type had been added to the
family meanwhile. However, these things are bad:

Attempting to drop this operator family results in an error. Attempting
to do a dump/restore results in a syntax error on restore.

Agreed.

It's intentional according to the code: in nbtvalidate.c
we have

if (op->is_func && op->number != BTORDER_PROC)
{
/* Optional support proc, so always a soft family dependency */
op->ref_is_hard = false;
op->ref_is_family = true;
op->refobjid = opfamilyoid;
}

But I think we copied that pattern from other index AMs without
thinking too hard about it.

That is accurate.

Peter, any thoughts about this?

Nothing much to say about it.

I would just point out that using the built-in allequalimage function
is specifically documented as bad practice. After all, you as an
individual non-core opclass author don't have any control over its
behavior. At the same time, I do understand the temptation to use the
built-in allequalimage function. In practice most individual B-Tree
opclasses are *obviously* deduplication-safe, and it's convenient to
have a trivial function for that.

--
Peter Geoghegan

#4Tom Lane
tgl@sss.pgh.pa.us
In reply to: Peter Geoghegan (#3)
Re: Dangling operator family after DROP TYPE

Peter Geoghegan <pg@bowt.ie> writes:

On Fri, Dec 6, 2024 at 12:15 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Peter, any thoughts about this?

Nothing much to say about it.

I would just point out that using the built-in allequalimage function
is specifically documented as bad practice.

Perhaps, but there are surely plenty of other ways to get into this
situation, when you have support functions whose signatures don't
involve the data type of the indexed column.

Here's a couple of proposed patches. The first just makes
getObjectDescription robust against dangling
amproclefttype/amprocrighttype links. (I did the same for pg_amop
entries, though that may be dead code, per comments below.) I checked
that this allows dropping the busted opfamily.

The second one solves the problem more permanently by adding
dependencies on the types whenever we don't have an indirect
dependency through the operator or function. Coverage checking shows
that the function case is actually hit in our regression tests (during
creation of contrib GiST opclasses), but the operator case isn't.
I think that the check for operators may be dead code, because AFAICS
from a quick look through opclasscmds.c, assignOperTypes will always
fill lefttype/righttype from the operator's input types and there's
nothing to override that. But it's at least conceivable that the
index AM's amadjustmembers function would modify the
lefttype/righttype settings. So I'm inclined to include that code
even if it does nothing today.

I looked at whether we could add a regression test for this, but
all of the cases that presently hit it are contrib extensions.
So there's no way to drop the data type without also dropping the
opfamily (which'd be likewise a member of the extension). That
probably explains the lack of field reports of this old problem.
We could devise something no doubt, but it doesn't quite seem
worth the trouble and test cycles.

regards, tom lane

Attachments:

0001-avoid-failure-with-missing-type.patchtext/x-diff; charset=us-ascii; name=0001-avoid-failure-with-missing-type.patchDownload+20-4
0002-make-missing-dependencies.patchtext/x-diff; charset=us-ascii; name=0002-make-missing-dependencies.patchDownload+102-0