Remove hardcoded hash opclass function signature exceptions

Started by Peter Eisentrautover 1 year ago4 messageshackers
Jump to latest
#1Peter Eisentraut
peter_e@gmx.net

hashvalidate(), which validates the signatures of support functions for
the hash AM, contains several hardcoded exceptions. For example,
hash/date_ops support function 1 is hashint4(), which would ordinarily
fail validation because the function argument is int4, not date. But
this works internally because int4 and date are of the same size. There
are several more exceptions like this that happen to work and were
allowed historically but would now fail the function signature validation.

AFAICT, these exceptions were just carried over from before the current
index AM API and validation functions were added. The code contains
comments like "For the moment, fix it by having a list of allowed
cases.", so it probably wasn't meant as the ideal state.

This patch removes those exceptions by providing new support functions
that have the proper declared signatures. They internally share most of
the C code with the "wrong" functions they replace, so the behavior is
still the same.

With the exceptions gone, hashvalidate() is now simplified and relies
fully on check_amproc_signature(), similar to other index AMs.

I'm also fixing one case where a brin opclass used hashvarlena() for
bytea, even though in that case, there is no function signature
validation done, so it doesn't matter that much.

Not done here, but maybe hashvarlena() and hashvarlenaextended() should
be removed from pg_proc.dat, since their use as opclass support
functions is now dubious. They could continue to exist in the C code as
internal support functions.

Attachments:

0001-Remove-hardcoded-hash-opclass-function-signature-exc.patchtext/plain; charset=UTF-8; name=0001-Remove-hardcoded-hash-opclass-function-signature-exc.patchDownload+169-119
#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Peter Eisentraut (#1)
Re: Remove hardcoded hash opclass function signature exceptions

Peter Eisentraut <peter@eisentraut.org> writes:

hashvalidate(), which validates the signatures of support functions for
the hash AM, contains several hardcoded exceptions.
...
This patch removes those exceptions by providing new support functions
that have the proper declared signatures. They internally share most of
the C code with the "wrong" functions they replace, so the behavior is
still the same.

+1 for cleaning this up. A couple of minor nitpicks:

* I don't really like the new control structure, or rather lack of
structure, in hashvalidate. In particular the uncommented
s/break/continue/ changes look like typos. They aren't, but can't
you do this in a less confusing fashion? Or at least add comments
like "continue not break because the code below the switch doesn't
apply to this case".

* Hand-picking OIDs as you did in pg_proc.dat is kind of passé now.
I guess it's all right as long as nobody else does the same thing in
the near future, but ...

Not done here, but maybe hashvarlena() and hashvarlenaextended() should
be removed from pg_proc.dat, since their use as opclass support
functions is now dubious.

I wish we could get rid of those, but according to
codesearch.debian.net, postgis and a couple of other extensions are
relying on them. If we remove them we'll break any convenient upgrade
path for those extensions.

regards, tom lane

#3Peter Eisentraut
peter_e@gmx.net
In reply to: Tom Lane (#2)
Re: Remove hardcoded hash opclass function signature exceptions

On 06.09.24 21:43, Tom Lane wrote:

Peter Eisentraut <peter@eisentraut.org> writes:

hashvalidate(), which validates the signatures of support functions for
the hash AM, contains several hardcoded exceptions.
...
This patch removes those exceptions by providing new support functions
that have the proper declared signatures. They internally share most of
the C code with the "wrong" functions they replace, so the behavior is
still the same.

+1 for cleaning this up. A couple of minor nitpicks:

* I don't really like the new control structure, or rather lack of
structure, in hashvalidate. In particular the uncommented
s/break/continue/ changes look like typos. They aren't, but can't
you do this in a less confusing fashion? Or at least add comments
like "continue not break because the code below the switch doesn't
apply to this case".

Ok, I cleaned that up a bit.

* Hand-picking OIDs as you did in pg_proc.dat is kind of passé now.
I guess it's all right as long as nobody else does the same thing in
the near future, but ...

Renumbered with the suggested "random" numbers.

Not done here, but maybe hashvarlena() and hashvarlenaextended() should
be removed from pg_proc.dat, since their use as opclass support
functions is now dubious.

I wish we could get rid of those, but according to
codesearch.debian.net, postgis and a couple of other extensions are
relying on them. If we remove them we'll break any convenient upgrade
path for those extensions.

Those are using the C function, which is ok. I was thinking about
removing the SQL function (from pg_proc.dat), because you can't use that
for much anymore. (You can't call it directly, and the hash AM will no
longer accept it.) I have done that in this patch version and added
some code comments around it.

Attachments:

v2-0001-Remove-hardcoded-hash-opclass-function-signature-.patchtext/plain; charset=UTF-8; name=v2-0001-Remove-hardcoded-hash-opclass-function-signature-.patchDownload+177-127
#4Tom Lane
tgl@sss.pgh.pa.us
In reply to: Peter Eisentraut (#3)
Re: Remove hardcoded hash opclass function signature exceptions

Peter Eisentraut <peter@eisentraut.org> writes:

On 06.09.24 21:43, Tom Lane wrote:

* I don't really like the new control structure, or rather lack of
structure, in hashvalidate. In particular the uncommented
s/break/continue/ changes look like typos. They aren't, but can't
you do this in a less confusing fashion? Or at least add comments
like "continue not break because the code below the switch doesn't
apply to this case".

Ok, I cleaned that up a bit.

That looks nicer. Thanks.

I wish we could get rid of those, but according to
codesearch.debian.net, postgis and a couple of other extensions are
relying on them. If we remove them we'll break any convenient upgrade
path for those extensions.

Those are using the C function, which is ok. I was thinking about
removing the SQL function (from pg_proc.dat), because you can't use that
for much anymore. (You can't call it directly, and the hash AM will no
longer accept it.) I have done that in this patch version and added
some code comments around it.

No, it isn't okay. What postgis (and the others) is doing is
equivalent to

regression=# create function myhash(bytea) returns int as 'hashvarlena' LANGUAGE 'internal' IMMUTABLE STRICT PARALLEL SAFE;
CREATE FUNCTION

After applying the v2 patch, I get

regression=# create function myhash(bytea) returns int as 'hashvarlena' LANGUAGE 'internal' IMMUTABLE STRICT PARALLEL SAFE;
ERROR: there is no built-in function named "hashvarlena"

The reason is that the fmgr_builtins table is built from
pg_proc.dat, and only names appearing in it can be used as 'internal'
function definitions. So you really can't remove the pg_proc entry.

The other thing that's made from pg_proc.dat is the list of extern
function declarations in fmgrprotos.h. That's why you had to add
those cowboy declarations inside hashfunc.c, which are both ugly
and not helpful for any external module that might wish to call those
functions at the C level.

Other than the business about removing those pg_proc entries,
I think this is good to go.

regards, tom lane