Inaccurate description of UNION/CASE/etc type selection

Started by Tom Lanealmost 6 years ago10 messagesdocs

tgl@sss.pgh.pa.us

almost 6 years ago

We had a question about why an ARRAY[] construct was resolving the
array's type strangely [1]/messages/by-id/CAOwYNKYfKPfAL4rgP0AO_w0Mn7h8yiXd_Qi9swPdAc4CAUXeAQ@mail.gmail.com. The documentation about this [2]https://www.postgresql.org/docs/current/typeconv-union-case.html says
that the relevant resolution rules are:

5. Choose the first non-unknown input type which is a preferred type in
that category, if there is one.

6. Otherwise, choose the last non-unknown input type that allows all the
preceding non-unknown inputs to be implicitly converted to it. (There
always is such a type, since at least the first type in the list must
satisfy this condition.)

But what select_common_type() actually does is:

else if (!pispreferred &&
can_coerce_type(1, &ptype, &ntype, COERCION_IMPLICIT) &&
!can_coerce_type(1, &ntype, &ptype, COERCION_IMPLICIT))
{
/*
* take new type if can coerce to it implicitly but not the
* other way; but if we have a preferred type, stay on it.
*/
pexpr = nexpr;
ptype = ntype;
pcategory = ncategory;
pispreferred = nispreferred;
}

(ptype is the currently selected common type, ntype is the next
input type to consider, and we've already eliminated cases involving
UNKNOWN.)

In the reported case, we have ptype = "name", ntype = "text", and there
are implicit coercions in both directions so we stay with "name" even
though it's not preferred.

So, the step-5 claim that we always choose a preferred type over other
types is just wrong. Step 6 is much short of truthful as well, since
it fails to describe the check about coercion in the other direction.
(Also, we're not really checking that *every* earlier argument can be
promoted to ntype, only the currently best one. Typically, if there's
an implicit coercion from A to B and also one from B to C, there'd be
one from A to C too; but there are lots of counterexamples.)

Now, this code is old enough to vote, so I think changing its behavior
is probably a really bad idea. I did experiment with giving preferred
types fractionally more preference, like this:

else if (can_coerce_type(1, &ptype, &ntype, COERCION_IMPLICIT) &&
(nispreferred > pispreferred ||
(!pispreferred &&
!can_coerce_type(1, &ntype, &ptype, COERCION_IMPLICIT))))

but this broke a couple of regression test cases, so I'm sure it'd
break real-world queries too. So I think we need to leave the code
alone and fix the docs to describe it more accurately.

However, I'm having a hard time coming up with wording that describes
this accurately without being a verbatim statement of the algorithm.
(I see that I already made one attempt at improving the description,
back in 07daff63c, but it's clearly still not good enough.)

Any ideas?

regards, tom lane

[1]: /messages/by-id/CAOwYNKYfKPfAL4rgP0AO_w0Mn7h8yiXd_Qi9swPdAc4CAUXeAQ@mail.gmail.com
[2]: https://www.postgresql.org/docs/current/typeconv-union-case.html

David G. Johnston

david.g.johnston@gmail.com

almost 6 years ago

In reply to: Tom Lane (#1)

Re: Inaccurate description of UNION/CASE/etc type selection

On Sun, Aug 16, 2020 at 2:26 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

We had a question about why an ARRAY[] construct was resolving the
array's type strangely [1].

In this specific complaint it strikes me as undesirable to have an lossy
implicit cast from text to name - and if that wasn't present then text
would have been chosen as expected.

The documentation about this [2] says
that the relevant resolution rules are:

5. Choose the first non-unknown input type which is a preferred type in
that category, if there is one.

6. Otherwise, choose the last non-unknown input type that allows all the
preceding non-unknown inputs to be implicitly converted to it. (There
always is such a type, since at least the first type in the list must
satisfy this condition.)

But what select_common_type() actually does is:

else if (!pispreferred &&
can_coerce_type(1, &ptype, &ntype, COERCION_IMPLICIT)
&&
!can_coerce_type(1, &ntype, &ptype,
COERCION_IMPLICIT))
{
/*
* take new type if can coerce to it implicitly but not the
* other way; but if we have a preferred type, stay on it.
*/
pexpr = nexpr;
ptype = ntype;
pcategory = ncategory;
pispreferred = nispreferred;
}

(ptype is the currently selected common type, ntype is the next
input type to consider, and we've already eliminated cases involving
UNKNOWN.)

Seems like 5 and 6 need to be merged - the chosen type is the first one
that all subsequent types can be implicitly cast to. We do not guarantee
that previous types will be implicitly convertible to this type.

In pseudo-code:
else if (can_coerce(n->p)) continue /* matches when pispreferred */
else if (can_coerce(p->n)) "change chosen type"
else continue /* but now we expect a runtime implicit cast not found error
*/

To go along with the above simplification:

/* move on to next one if no new information... */
if (ntype != UNKNOWNOID && ntype != ptype) {
should be written positively as
if (ntype == UNKNOWNOID || ntype == ptype)
continue;

Random thought, once we get to the end of the loop why not conditionally go
over the list a second time so the elements prior to the selected value's
type are re-considered in lieu of the new choice? That at least clears up
the documentation to state that all encountered types are compared against
the chosen type for implicit casting instead of just those appearing after
the one we settled upon. Maybe on the second pass we don't actually change
the chosen type but instead provide for a better error message in the case
of a missing implicit type cast.

I do agree that while this doesn't feel like an ideal algorithm it seems
undesirable to change the implementation - or at least its behavior in
non-error situations. Given the lack of complaints here error handling
improvements don't seem like a win either.

David J.

Tom Lane

tgl@sss.pgh.pa.us

almost 6 years ago

In reply to: David G. Johnston (#2)

Re: Inaccurate description of UNION/CASE/etc type selection

"David G. Johnston" <david.g.johnston@gmail.com> writes:

On Sun, Aug 16, 2020 at 2:26 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

We had a question about why an ARRAY[] construct was resolving the
array's type strangely [1].

In this specific complaint it strikes me as undesirable to have an lossy
implicit cast from text to name - and if that wasn't present then text
would have been chosen as expected.

Yeah, in an ideal world maybe, but I don't see us removing that
implicit cast either --- I expect that'd also break a lot of queries.
[ experiments ... actually it might not be that bad ] But in any case,
that's not very relevant to the documentation problem.

Seems like 5 and 6 need to be merged - the chosen type is the first one
that all subsequent types can be implicitly cast to. We do not guarantee
that previous types will be implicitly convertible to this type.
In pseudo-code:
else if (can_coerce(n->p)) continue /* matches when pispreferred */
else if (can_coerce(p->n)) "change chosen type"
else continue /* but now we expect a runtime implicit cast not found error
*/

This formulation fails to account for the pispreferred check, though.
The pseudo-code would be correct if you made the first line be

else if (pispreferred || can_coerce(n->p)) continue

but then it doesn't map neatly to a description that fails to mention
preferred-ness. (Note that this would be simpler if we could assume
that pispreferred implies that there is a cast from every other category
member type to p. But there are counterexamples.)

regards, tom lane

David G. Johnston

david.g.johnston@gmail.com

almost 6 years ago

In reply to: Tom Lane (#3)

Re: Inaccurate description of UNION/CASE/etc type selection

On Sun, Aug 16, 2020 at 5:32 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

"David G. Johnston" <david.g.johnston@gmail.com> writes:

On Sun, Aug 16, 2020 at 2:26 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Seems like 5 and 6 need to be merged - the chosen type is the first one
that all subsequent types can be implicitly cast to. We do not guarantee
that previous types will be implicitly convertible to this type.
In pseudo-code:
else if (can_coerce(n->p)) continue /* matches when pispreferred */
else if (can_coerce(p->n)) "change chosen type"
else continue /* but now we expect a runtime implicit cast not found

error

*/

This formulation fails to account for the pispreferred check, though.
The pseudo-code would be correct if you made the first line be

else if (pispreferred || can_coerce(n->p)) continue

but then it doesn't map neatly to a description that fails to mention
preferred-ness. (Note that this would be simpler if we could assume
that pispreferred implies that there is a cast from every other category
member type to p. But there are counterexamples.)

I was making that assumption.

In the pseudo-code:

else if (pispreferred) continue; /* never move away from a preferred type */
{the rest of the else ifs from above}

In the docs:

5. If the first non-unknown type is a preferred type it is chosen,
otherwise it is made a candidate, and then,
6. each subsequent type is compared to the current candidate, with a new
candidate being chosen only when there exists a non-mutal implicit cast to
the new type.
6a. If at any point a preferred type is made a candidate then it will be
chosen.

David J.

David G. Johnston

david.g.johnston@gmail.com

almost 6 years ago

In reply to: David G. Johnston (#4)

Re: Inaccurate description of UNION/CASE/etc type selection

On Sun, Aug 16, 2020 at 5:58 PM David G. Johnston <
david.g.johnston@gmail.com> wrote:

5. If the first non-unknown type is a preferred type it is chosen,
otherwise it is made a candidate, and then,
6. each subsequent type is compared to the current candidate, with a new
candidate being chosen only when there exists a non-mutal implicit cast to
the new type.
6a. If at any point a preferred type is made a candidate then it will be
chosen.

More concisely:

Make the first input type a candidate type. Each subsequent input type is
compared to the current candidate, with a new candidate being chosen only
when there exists a non-mutal implicit cast to the new type. If at any
point a preferred type is made a candidate then it will be chosen.

In my original the whole if/otherwise in 5 is pointless given the presence
of 6a.

David J.

Tom Lane

tgl@sss.pgh.pa.us

almost 6 years ago

In reply to: David G. Johnston (#5)

Re: Inaccurate description of UNION/CASE/etc type selection

"David G. Johnston" <david.g.johnston@gmail.com> writes:

More concisely:

Make the first input type a candidate type. Each subsequent input type is
compared to the current candidate, with a new candidate being chosen only
when there exists a non-mutal implicit cast to the new type. If at any
point a preferred type is made a candidate then it will be chosen.

So this is just a verbatim statement of the algorithm, which is what
I was hoping to avoid :-(. But maybe there's no simpler way.

regards, tom lane

David G. Johnston

david.g.johnston@gmail.com

almost 6 years ago

In reply to: Tom Lane (#6)

Re: Inaccurate description of UNION/CASE/etc type selection

On Mon, Aug 17, 2020 at 8:31 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

"David G. Johnston" <david.g.johnston@gmail.com> writes:

More concisely:

Make the first input type a candidate type. Each subsequent input type

is

compared to the current candidate, with a new candidate being chosen only
when there exists a non-mutal implicit cast to the new type. If at any
point a preferred type is made a candidate then it will be chosen.

So this is just a verbatim statement of the algorithm, which is what
I was hoping to avoid :-(. But maybe there's no simpler way.

I got nothin'. The locking onto the preferred type is conditional on one
being chosen and there doesn't seem to be any greater principle that
emerges from the pairwise matching algorithm - at least given that implicit
casts are essentially randomly distributed and the algorithm is
order-dependent.

David J.

Tom Lane

tgl@sss.pgh.pa.us

almost 6 years ago

In reply to: David G. Johnston (#7)

Re: Inaccurate description of UNION/CASE/etc type selection

"David G. Johnston" <david.g.johnston@gmail.com> writes:

On Mon, Aug 17, 2020 at 8:31 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

So this is just a verbatim statement of the algorithm, which is what
I was hoping to avoid :-(. But maybe there's no simpler way.

I got nothin'.

Yeah, me either. So here's a proposed patch, fixing a couple other
things:

* Re-reading this, I thought the use of "preferred" in the existing
footnote about domains could be read as meaning that we treat the
base type as a preferred type; so I changed that.

* Something that's been true for a very long time, but never documented,
is that CASE puts its ELSE clause at the front for this purpose.
I figured that if we're trying to tell the full truth we better mention
that.

regards, tom lane

David G. Johnston

david.g.johnston@gmail.com

almost 6 years ago

In reply to: Tom Lane (#8)

Re: Inaccurate description of UNION/CASE/etc type selection

On Mon, Aug 17, 2020 at 10:11 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

"David G. Johnston" <david.g.johnston@gmail.com> writes:

On Mon, Aug 17, 2020 at 8:31 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

So this is just a verbatim statement of the algorithm, which is what
I was hoping to avoid :-(. But maybe there's no simpler way.

I got nothin'.

Yeah, me either. So here's a proposed patch, fixing a couple other
things:

LGTM

David J.

#10

Tom Lane

tgl@sss.pgh.pa.us

almost 6 years ago

In reply to: David G. Johnston (#9)

Re: Inaccurate description of UNION/CASE/etc type selection

"David G. Johnston" <david.g.johnston@gmail.com> writes:

On Mon, Aug 17, 2020 at 10:11 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Yeah, me either. So here's a proposed patch, fixing a couple other
things:

LGTM

Pushed, thanks for review.

regards, tom lane

Inaccurate description of UNION/CASE/etc type selection

Attachments: