Why does to_json take "anyelement" rather than "any"?

Started by Nikhil Beneschabout 5 years ago7 messages
#1Nikhil Benesch
nikhil.benesch@gmail.com

to_json is declared as taking "anyelement" as input, which means
you can't pass it something of unknown type:

postgres=# SELECT to_json('foo');
ERROR: could not determine polymorphic type because input has type unknown

But this works fine with the very similar json_build_array function:

postgres=# SELECT json_build_array('foo');
json_build_array
------------------
["foo"]
(1 row)

The difference is that json_build_array takes type "any" as input, while
to_json takes "anyelement" as input.

Is there some reason to_json couldn't be switched to take "any" as input?
Hacking this together seems to mostly just work:

postgres=# CREATE FUNCTION my_to_json ("any") RETURNS json LANGUAGE 'internal' AS 'to_json';
postgres=# SELECT my_to_json('foo');
my_to_json
------------
"foo"
(1 row)

Is there something I'm missing?

Nikhil

#2Mohamed Wael Khobalatte
mkhobalatte@grubhub.com
In reply to: Nikhil Benesch (#1)
Re: Why does to_json take "anyelement" rather than "any"?

On Tue, Nov 3, 2020 at 11:54 AM Nikhil Benesch <nikhil.benesch@gmail.com>
wrote:

to_json is declared as taking "anyelement" as input, which means
you can't pass it something of unknown type:

postgres=# SELECT to_json('foo');
ERROR: could not determine polymorphic type because input has type
unknown

But this works fine with the very similar json_build_array function:

postgres=# SELECT json_build_array('foo');
json_build_array
------------------
["foo"]
(1 row)

The difference is that json_build_array takes type "any" as input, while
to_json takes "anyelement" as input.

Is there some reason to_json couldn't be switched to take "any" as input?
Hacking this together seems to mostly just work:

postgres=# CREATE FUNCTION my_to_json ("any") RETURNS json LANGUAGE
'internal' AS 'to_json';
postgres=# SELECT my_to_json('foo');
my_to_json
------------
"foo"
(1 row)

Is there something I'm missing?

Nikhil

Hm, good question. I am also curious as to why this happens.
`json_build_array` ends up casting unknowns to text (from reading the
code), which seems like a reasonable (although not completely tight)
assumption. Not sure why `to_json` can't just do the same. You can always
cast to text yourself, of course, but I am not familiar with the type
hierarchy enough to tell why `to_json` can't deduce that as text whereas
the other function can.

#3David G. Johnston
david.g.johnston@gmail.com
In reply to: Mohamed Wael Khobalatte (#2)
Re: Why does to_json take "anyelement" rather than "any"?

On Thu, Nov 5, 2020 at 3:43 PM Mohamed Wael Khobalatte <
mkhobalatte@grubhub.com> wrote:

You can always cast to text yourself, of course, but I am not familiar
with the type hierarchy enough to tell why `to_json` can't deduce that as
text whereas the other function can.

My understanding is that "any" is defined to accept that behavior -
allowing any pseudo-type and unknown. The "anyelement" polymorphic
pseudo-type is defined such that only concrete known types are allowed to
match - and then the rules of polymorphism apply when performing a lookup.
My uninformed conclusion is that since to_json only defines a single
parameter that changing it from "anyelement" to "any" would be reasonable
and the hack describe probably "just works" (though I'd test it on a
wide-range of built-in types first if I was actually going to use the hack).

You only get to use "any" for a C-language function but that is indeed the
case here.

David J.

#4Nikhil Benesch
nikhil.benesch@gmail.com
In reply to: David G. Johnston (#3)
Re: Why does to_json take "anyelement" rather than "any"?

On 11/5/20 7:38 PM, David G. Johnston wrote:

My understanding is that "any" is defined to accept that behavior - allowing any pseudo-type and unknown.  The "anyelement" polymorphic pseudo-type is defined such that only concrete known types are allowed to match - and then the rules of polymorphism apply when performing a lookup.  My uninformed conclusion is that since to_json only defines a single parameter that changing it from "anyelement" to "any" would be reasonable and the hack describe probably "just works" (though I'd test it on a wide-range of built-in types first if I was actually going to use the hack).

You only get to use "any" for a C-language function but that is indeed the case here.

That exactly matches my understanding as well. I'll put together a patch.

Nikhil

#5Tom Lane
tgl@sss.pgh.pa.us
In reply to: Nikhil Benesch (#4)
Re: Why does to_json take "anyelement" rather than "any"?

Nikhil Benesch <nikhil.benesch@gmail.com> writes:

On 11/5/20 7:38 PM, David G. Johnston wrote:

My understanding is that "any" is defined to accept that behavior - allowing any pseudo-type and unknown.  The "anyelement" polymorphic pseudo-type is defined such that only concrete known types are allowed to match - and then the rules of polymorphism apply when performing a lookup.  My uninformed conclusion is that since to_json only defines a single parameter that changing it from "anyelement" to "any" would be reasonable and the hack describe probably "just works" (though I'd test it on a wide-range of built-in types first if I was actually going to use the hack).

You only get to use "any" for a C-language function but that is indeed the case here.

That exactly matches my understanding as well. I'll put together a patch.

"any" is a dinosaur IMO. It's definitely lower-level than anyelement;
for example the function has to be prepared to deal with raw "unknown"
literals. So I feel like the proposed solution here is a bit of a hack.

What I'm wondering about as I think about this is why we don't allow
unknown literals to be resolved as text when matching to anyelement.
Maybe that was intentional, or maybe just overly conservative; or maybe
there is a good reason for it. I don't recall, but it would be worth
excavating in the list archives to see if it was discussed when the
polymorphic types were being designed.

A relevant data point is that we *do* allow the case with the more
recently added "anycompatible" polymorphics:

regression=# create function foo(anycompatible) returns anycompatible as 'select $1' language sql;
CREATE FUNCTION
regression=# select foo('bar');
foo
-----
bar
(1 row)

regression=# select pg_typeof(foo('bar'));
pg_typeof
-----------
text
(1 row)

So even if we decide that changing the rules for "anyelement" is
too scary, I think switching to_json to anycompatible would be
preferable to switching it to "any".

regards, tom lane

#6Nikhil Benesch
nikhil.benesch@gmail.com
In reply to: Tom Lane (#5)
Re: Why does to_json take "anyelement" rather than "any"?

On 11/5/20 8:58 PM, Tom Lane wrote:

"any" is a dinosaur IMO. It's definitely lower-level than anyelement;
for example the function has to be prepared to deal with raw "unknown"
literals. So I feel like the proposed solution here is a bit of a hack.

I see what you are saying, but since the code for to_jsonb is shared with
the code for jsonb_build_array and jsonb_build_object, which already
handle the pitfalls of "any", the patch seems to be literally this
simple:

diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index c01da4bf01..11adf748c9 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -8466,7 +8466,7 @@
    prosrc => 'json_object_two_arg' },
  { oid => '3176', descr => 'map input to json',
    proname => 'to_json', provolatile => 's', prorettype => 'json',
-  proargtypes => 'anyelement', prosrc => 'to_json' },
+  proargtypes => 'any', prosrc => 'to_json' },
  { oid => '3261', descr => 'remove object fields with null values from json',
    proname => 'json_strip_nulls', prorettype => 'json', proargtypes => 'json',
    prosrc => 'json_strip_nulls' },
@@ -9289,7 +9289,7 @@
    proargtypes => '_text _text', prosrc => 'jsonb_object_two_arg' },
  { oid => '3787', descr => 'map input to jsonb',
    proname => 'to_jsonb', provolatile => 's', prorettype => 'jsonb',
-  proargtypes => 'anyelement', prosrc => 'to_jsonb' },
+  proargtypes => 'any', prosrc => 'to_jsonb' },
  { oid => '3265', descr => 'jsonb aggregate transition function',
    proname => 'jsonb_agg_transfn', proisstrict => 'f', provolatile => 's',
    prorettype => 'internal', proargtypes => 'internal anyelement',

I think my argument is that regardless of which of
{any,anyelement,anycompatible} is best, it seems like to_jsonb,
jsonb_build_array, and jsonb_build_object should all use the same
type.

What I'm wondering about as I think about this is why we don't allow
unknown literals to be resolved as text when matching to anyelement.
Maybe that was intentional, or maybe just overly conservative; or maybe
there is a good reason for it. I don't recall, but it would be worth
excavating in the list archives to see if it was discussed when the
polymorphic types were being designed.

I excavated two separate threads from 2003 from when you and Joe Conway
were designing SQL99 array support and the initial polymorphic types:

/messages/by-id/3E701869.4020301@joeconway.com
/messages/by-id/1272.1048633920@sss.pgh.pa.us

I didn't see anything obvious about unknown coercions, though I
certainly could have overlooked something. For what it's worth, the
error message "could not determine polymorphic type because input has
type unknown" has existed, with slightly different wording, since
the very first commit of the feature:

https://github.com/postgres/postgres/commit/730840c9b (parse_coerce.c, L840)

A relevant data point is that we *do* allow the case with the more
recently added "anycompatible" polymorphics:

<...snipped examples...>

So even if we decide that changing the rules for "anyelement" is
too scary, I think switching to_json to anycompatible would be
preferable to switching it to "any".

Oh these new polymorphic types are interesting. I hadn't seen these.

So, I don't feel particularly qualified to determine how to proceed.
These are the options that I would be excited about:

1. Switch to_jsonb to take "any", as in the above patch.
2. Convert all of to_jsonb, jsonb_build_array, and jsonb_build_object
to use the new "anycompatible" type.
3. Switch to_jsonb to take "anyelement", but change "anyelement" and
friends so that "unknown" arguments are coereced to text.

Would someone care to offer guidance on which path to choose?

Nikhil

#7Pavel Stehule
pavel.stehule@gmail.com
In reply to: David G. Johnston (#3)
Re: Why does to_json take "anyelement" rather than "any"?

pá 6. 11. 2020 v 1:39 odesílatel David G. Johnston <
david.g.johnston@gmail.com> napsal:

On Thu, Nov 5, 2020 at 3:43 PM Mohamed Wael Khobalatte <
mkhobalatte@grubhub.com> wrote:

You can always cast to text yourself, of course, but I am not familiar
with the type hierarchy enough to tell why `to_json` can't deduce that as
text whereas the other function can.

My understanding is that "any" is defined to accept that behavior -
allowing any pseudo-type and unknown. The "anyelement" polymorphic
pseudo-type is defined such that only concrete known types are allowed to
match - and then the rules of polymorphism apply when performing a lookup.
My uninformed conclusion is that since to_json only defines a single
parameter that changing it from "anyelement" to "any" would be reasonable
and the hack describe probably "just works" (though I'd test it on a
wide-range of built-in types first if I was actually going to use the hack).

You only get to use "any" for a C-language function but that is indeed the
case here.

Type "anyelement" can force the function's result type directly. But there
cannot be function that returns UNKNOWN.

Type "any" just accept any argument without any impact on result type.
Unfortunately, inside a function is necessary to do much more work related
to casting types, and the execution can be slower.

I checked the source code of to_json and this function can use "any"
without any change.

Regards

Pavel

Show quoted text

David J.