Better error message for select_common_type()
So I was informed today that UNION types integer and text cannot be
matched. Alright, but it failed to tell which particular expressions
in this 3-branch, 30-columns-each UNION clause in a 100-line statement
it was talking about. So I made the attached patch to give some better
pointers. Example:
peter=# values(0,1), (1::bigint,2), ('text'::text,3);
ERROR: 42804: VALUES types bigint at position 2 and text at position 3
cannot be matched in instance 1
I'm not sure about the terminology "position" and "instance"; they're
just two coordinates to get at the problem.
None of this will help if you have multiple unrelated clauses that
invoke select_common_type(), but that might be better handled using the
parser location mechanism.
Comments?
--
Peter Eisentraut
http://developer.postgresql.org/~petere/
Attachments:
select-common-type-with-position.patchtext/x-diff; charset=us-ascii; name=select-common-type-with-position.patchDownload
diff -ur ../cvs-pgsql/src/backend/parser/analyze.c ./src/backend/parser/analyze.c
--- ../cvs-pgsql/src/backend/parser/analyze.c 2007-03-13 10:59:04.000000000 +0100
+++ ./src/backend/parser/analyze.c 2007-04-23 21:23:23.000000000 +0200
@@ -2195,7 +2195,7 @@
*/
for (i = 0; i < sublist_length; i++)
{
- coltypes[i] = select_common_type(coltype_lists[i], "VALUES");
+ coltypes[i] = select_common_type(coltype_lists[i], "VALUES", sublist_length > 1 ? i + 1 : 0);
}
newExprsLists = NIL;
@@ -2632,6 +2632,7 @@
ListCell *lcm;
ListCell *rcm;
const char *context;
+ int count;
context = (stmt->op == SETOP_UNION ? "UNION" :
(stmt->op == SETOP_INTERSECT ? "INTERSECT" :
@@ -2665,6 +2666,7 @@
/* don't have a "foreach4", so chase two of the lists by hand */
lcm = list_head(lcoltypmods);
rcm = list_head(rcoltypmods);
+ count = 0;
forboth(lct, lcoltypes, rct, rcoltypes)
{
Oid lcoltype = lfirst_oid(lct);
@@ -2674,9 +2676,12 @@
Oid rescoltype;
int32 rescoltypmod;
+ count++;
+
/* select common type, same as CASE et al */
rescoltype = select_common_type(list_make2_oid(lcoltype, rcoltype),
- context);
+ context,
+ list_length(lcoltypes) > 1 ? count : 0);
/* if same type and same typmod, use typmod; else default */
if (lcoltype == rcoltype && lcoltypmod == rcoltypmod)
rescoltypmod = lcoltypmod;
diff -ur ../cvs-pgsql/src/backend/parser/parse_clause.c ./src/backend/parser/parse_clause.c
--- ../cvs-pgsql/src/backend/parser/parse_clause.c 2007-02-03 14:08:51.000000000 +0100
+++ ./src/backend/parser/parse_clause.c 2007-04-23 21:13:14.000000000 +0200
@@ -965,7 +965,7 @@
{
outcoltype = select_common_type(list_make2_oid(l_colvar->vartype,
r_colvar->vartype),
- "JOIN/USING");
+ "JOIN/USING", 0);
outcoltypmod = -1; /* ie, unknown */
}
else if (outcoltypmod != r_colvar->vartypmod)
diff -ur ../cvs-pgsql/src/backend/parser/parse_coerce.c ./src/backend/parser/parse_coerce.c
--- ../cvs-pgsql/src/backend/parser/parse_coerce.c 2007-04-03 22:26:05.000000000 +0200
+++ ./src/backend/parser/parse_coerce.c 2007-04-23 21:52:52.000000000 +0200
@@ -932,23 +932,31 @@
* typeids is a nonempty list of type OIDs. Note that earlier items
* in the list will be preferred if there is doubt.
* 'context' is a phrase to use in the error message if we fail to select
- * a usable type.
+ * a usable type. 'instance_no' is to decorate the error message in case
+ * there are multiple invocations of this function for a clause.
*/
Oid
-select_common_type(List *typeids, const char *context)
+select_common_type(List *typeids, const char *context, int instance_no)
{
Oid ptype;
CATEGORY pcategory;
ListCell *type_item;
+ int ptype_pos;
+ int ntype_pos;
Assert(typeids != NIL);
ptype = getBaseType(linitial_oid(typeids));
+ ptype_pos = 1;
pcategory = TypeCategory(ptype);
+ ntype_pos = 1;
+
for_each_cell(type_item, lnext(list_head(typeids)))
{
Oid ntype = getBaseType(lfirst_oid(type_item));
+ ntype_pos++;
+
/* move on to next one if no new information... */
if ((ntype != InvalidOid) && (ntype != UNKNOWNOID) && (ntype != ptype))
{
@@ -956,6 +964,7 @@
{
/* so far, only nulls so take anything... */
ptype = ntype;
+ ptype_pos = ntype_pos;
pcategory = TypeCategory(ptype);
}
else if (TypeCategory(ntype) != pcategory)
@@ -968,10 +977,20 @@
/*------
translator: first %s is name of a SQL construct, eg CASE */
- errmsg("%s types %s and %s cannot be matched",
- context,
- format_type_be(ptype),
- format_type_be(ntype))));
+ (instance_no > 0
+ ? errmsg("%s types %s at position %d and %s at position %d cannot be matched in instance %d",
+ context,
+ format_type_be(ptype),
+ ptype_pos,
+ format_type_be(ntype),
+ ntype_pos,
+ instance_no)
+ : errmsg("%s types %s at position %d and %s at position %d cannot be matched",
+ context,
+ format_type_be(ptype),
+ ptype_pos,
+ format_type_be(ntype),
+ ntype_pos))));
}
else if (!IsPreferredType(pcategory, ptype) &&
can_coerce_type(1, &ptype, &ntype, COERCION_IMPLICIT) &&
@@ -982,6 +1001,7 @@
* other way; but if we have a preferred type, stay on it.
*/
ptype = ntype;
+ ptype_pos = ntype_pos;
pcategory = TypeCategory(ptype);
}
}
diff -ur ../cvs-pgsql/src/backend/parser/parse_expr.c ./src/backend/parser/parse_expr.c
--- ../cvs-pgsql/src/backend/parser/parse_expr.c 2007-04-03 22:26:05.000000000 +0200
+++ ./src/backend/parser/parse_expr.c 2007-04-23 21:13:00.000000000 +0200
@@ -870,7 +870,7 @@
* LHS' type is first in the list, it will be preferred when there is
* doubt (eg, when all the RHS items are unknown literals).
*/
- scalar_type = select_common_type(typeids, "IN");
+ scalar_type = select_common_type(typeids, "IN", 0);
/* Do we have an array type to use? */
array_type = get_array_type(scalar_type);
@@ -1074,7 +1074,7 @@
*/
typeids = lcons_oid(exprType((Node *) newc->defresult), typeids);
- ptype = select_common_type(typeids, "CASE");
+ ptype = select_common_type(typeids, "CASE", 0);
Assert(OidIsValid(ptype));
newc->casetype = ptype;
@@ -1249,7 +1249,7 @@
}
/* Select a common type for the elements */
- element_type = select_common_type(typeids, "ARRAY");
+ element_type = select_common_type(typeids, "ARRAY", 0);
/* Coerce arguments to common type if necessary */
foreach(element, newelems)
@@ -1325,7 +1325,7 @@
typeids = lappend_oid(typeids, exprType(newe));
}
- newc->coalescetype = select_common_type(typeids, "COALESCE");
+ newc->coalescetype = select_common_type(typeids, "COALESCE", 0);
/* Convert arguments if necessary */
foreach(args, newargs)
@@ -1363,7 +1363,7 @@
typeids = lappend_oid(typeids, exprType(newe));
}
- newm->minmaxtype = select_common_type(typeids, "GREATEST/LEAST");
+ newm->minmaxtype = select_common_type(typeids, "GREATEST/LEAST", 0);
/* Convert arguments if necessary */
foreach(args, newargs)
diff -ur ../cvs-pgsql/src/include/parser/parse_coerce.h ./src/include/parser/parse_coerce.h
--- ../cvs-pgsql/src/include/parser/parse_coerce.h 2007-04-01 10:57:50.000000000 +0200
+++ ./src/include/parser/parse_coerce.h 2007-04-23 21:08:47.000000000 +0200
@@ -59,7 +59,7 @@
Oid targetTypeId,
const char *constructName);
-extern Oid select_common_type(List *typeids, const char *context);
+extern Oid select_common_type(List *typeids, const char *context, int instance_no);
extern Node *coerce_to_common_type(ParseState *pstate, Node *node,
Oid targetTypeId,
const char *context);
"Peter Eisentraut" <peter_e@gmx.net> writes:
peter=# values(0,1), (1::bigint,2), ('text'::text,3);
ERROR: 42804: VALUES types bigint at position 2 and text at position 3
cannot be matched in instance 1I'm not sure about the terminology "position" and "instance"; they're
just two coordinates to get at the problem.
Wouldn't that just be column 1 in rows 2 and 3?
--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Peter Eisentraut <peter_e@gmx.net> writes:
None of this will help if you have multiple unrelated clauses that
invoke select_common_type(), but that might be better handled using the
parser location mechanism.
+1 on using the parser location mechanism and avoiding the terminology
problem altogether. I fear though that we're not set up to have
multiple locations in one error report. Will it be sufficient if we
point at one of the two offending expressions? (I'd guess pointing at
the second makes the most sense, if feasible.)
regards, tom lane
Gregory Stark wrote:
"Peter Eisentraut" <peter_e@gmx.net> writes:
peter=# values(0,1), (1::bigint,2), ('text'::text,3);
ERROR: 42804: VALUES types bigint at position 2 and text at
position 3 cannot be matched in instance 1I'm not sure about the terminology "position" and "instance";
they're just two coordinates to get at the problem.Wouldn't that just be column 1 in rows 2 and 3?
In this case yes, but the routine is also used for UNION, IN, GREATEST,
JOIN/USING and others.
--
Peter Eisentraut
http://developer.postgresql.org/~petere/
Tom Lane wrote:
+1 on using the parser location mechanism and avoiding the
terminology problem altogether.
I figured we would let the parser only point to the UNION or VALUES or
whatever word. It would be fairly cumbersome to drag the individual
expression positions down into select_common_value() for full
precision.
I fear though that we're not set up
to have multiple locations in one error report. Will it be
sufficient if we point at one of the two offending expressions? (I'd
guess pointing at the second makes the most sense, if feasible.)
I don't think that would help. In the example I was looking at 90
expression and I had no idea in most cases what their results types
are, so if it tells me that the 15th expression somewhere doesn't
match, I would need to know which is the other mismatching one.
--
Peter Eisentraut
http://developer.postgresql.org/~petere/
Peter Eisentraut <peter_e@gmx.net> writes:
Tom Lane wrote:
+1 on using the parser location mechanism and avoiding the
terminology problem altogether.
I figured we would let the parser only point to the UNION or VALUES or
whatever word. It would be fairly cumbersome to drag the individual
expression positions down into select_common_value() for full
precision.
Possibly. I was thinking of demanding that callers pass an additional
list containing positions for the expressions, but hadn't looked to see
how easy that might be. In any case, if we need to point at both
expressions then it's not gonna work.
For the VALUES case, the suggestion of "row" and "column" terminology
seems the right thing, but for UNION it would be better to use "branch"
perhaps ("row" certainly seems misleading). How can we make that work
without indulging in untranslatable keyword-insertion?
Another possibility is "alternative" and "column", which seems like it
applies more or less equally poorly to both cases.
regards, tom lane
Tom Lane wrote:
For the VALUES case, the suggestion of "row" and "column" terminology
seems the right thing, but for UNION it would be better to use "branch"
perhaps ("row" certainly seems misleading). How can we make that work
without indulging in untranslatable keyword-insertion?Another possibility is "alternative" and "column", which seems like it
applies more or less equally poorly to both cases.
Maybe it would be good to have distinctive terminology if at all
possible, as it will be clearer for both cases.
--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
"Tom Lane" <tgl@sss.pgh.pa.us> writes:
For the VALUES case, the suggestion of "row" and "column" terminology
seems the right thing, but for UNION it would be better to use "branch"
perhaps ("row" certainly seems misleading). How can we make that work
without indulging in untranslatable keyword-insertion?
Hm, I guess the SQL spec terminology in both cases would be "table
expression".
--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Peter Eisentraut <peter_e@gmx.net> writes:
... I'm not sure about the terminology "position" and "instance"; they're
just two coordinates to get at the problem.
None of this will help if you have multiple unrelated clauses that
invoke select_common_type(), but that might be better handled using the
parser location mechanism.
Were there any objections to changing this patch so that it reports
the second expression's parser location, instead of some arbitrary
numbers? The way I'm envisioning doing it is:
1. Invent an exprLocation() function (comparable to, say, exprType)
that knows how to get the parser location from any subtype of Node that
has one.
2. Make a variant of select_common_type() that takes a list of Exprs
instead of just type OIDs. It can get the type IDs from these
using exprType(), and it can get their locations using exprLocation()
if needed.
We could almost just replace the current form of select_common_type()
with the version envisioned in #2. In a quick grep, there is only
one usage of select_common_type() that isn't invoking it on a list
of exprType() results that could be trivially changed over to the
underlying expressions instead --- and that is in
transformSetOperationTree, which may be dealing with inputs that
are previously-resolved output types for child set operations.
I'm not sure about a better way to complain about type mismatches in
nested set operations, anyway. We could possibly keep track of one of
the sub-expressions that had determined the resolved output type, and
point to that, but it would likely seem a bit arbitrary to the user.
Thoughts anyone?
regards, tom lane
I wrote:
Were there any objections to changing this patch so that it reports
the second expression's parser location, instead of some arbitrary
numbers? The way I'm envisioning doing it is:
1. Invent an exprLocation() function (comparable to, say, exprType)
that knows how to get the parser location from any subtype of Node that
has one.
2. Make a variant of select_common_type() that takes a list of Exprs
instead of just type OIDs. It can get the type IDs from these
using exprType(), and it can get their locations using exprLocation()
if needed.
I started to look at this and immediately found out that the above
blithe sketch has nothing to do with reality. The problem is that
the current parser location mechanism stores locations only for
nodes that appear in raw grammar trees (gram.y output), *not* in
analyzed expressions (transformExpr output). This was an intentional
choice based on a couple of factors:
* Once we no longer have the parser input string available, the location
information would be just so much wasted space.
* It would add a weird special case to the equalfuncs.c routines:
should location fields be compared? (Probably not, but it seems a bit
unprincipled to ignore them.) And other places might have comparable
uncertainties what to do with 'em.
We'd need to either go back on that decision or pass in location
information separately to select_common_type. I think I prefer the
latter, but it's messier.
(On the third hand, you could make a case that including location
info in analyzed expressions makes it feasible to point at problems
detected at higher levels of analyze.c than just the first-level
transformExpr() call. select_common_type's problem could be seen
as just one aspect of what might be a widespread need.)
Another problem is that only a rather small subset of raw-grammar
expression node types actually carry locations at all. I had always
intended to go back and extend that, but it's not done yet. One reason
it's not done is that currently a lot of expression node types are used
for both raw-grammar output and analyzed expressions, which brings us
right back up against the issue above. I'd be inclined to fix that by
extending AExpr even more, and/or inventing an analogous raw-grammar
node type for things that take variable numbers of arguments, but
still it's more work.
So this is all eminently do-able but it seems too much to be tackling
during commit fest. I'd like to throw this item back on the TODO list.
Or we could apply Peter's patch more or less as-is, but I don't like
that. I don't think it solves the stated problem: if you know that CASE
branches 3 and 5 don't match, that still doesn't help you in a monster
query with lots of CASEs. I think we can and must do better.
regards, tom lane
"Tom Lane" <tgl@sss.pgh.pa.us> writes:
Or we could apply Peter's patch more or less as-is, but I don't like
that. I don't think it solves the stated problem: if you know that CASE
branches 3 and 5 don't match, that still doesn't help you in a monster
query with lots of CASEs. I think we can and must do better.
Do we have something more helpful than "branches 3 and 5"? Perhaps printing
the actual transformed expressions?
--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Get trained by Bruce Momjian - ask me about EnterpriseDB's PostgreSQL training!
Gregory Stark <stark@enterprisedb.com> writes:
"Tom Lane" <tgl@sss.pgh.pa.us> writes:
Or we could apply Peter's patch more or less as-is, but I don't like
that. I don't think it solves the stated problem: if you know that CASE
branches 3 and 5 don't match, that still doesn't help you in a monster
query with lots of CASEs. I think we can and must do better.
Do we have something more helpful than "branches 3 and 5"?
That's exactly the point of discussion. A parser location is what we
need, the problem is that this patch doesn't provide it.
Perhaps printing the actual transformed expressions?
Don't think it solves the problem either. For instance, if there are
a hundred references to variable X in your query, printing "X" isn't
going to get you far.
regards, tom lane
Am Dienstag, 18. M�rz 2008 schrieb Tom Lane:
Or we could apply Peter's patch more or less as-is, but I don't like
that. �I don't think it solves the stated problem: if you know that CASE
branches 3 and 5 don't match, that still doesn't help you in a monster
query with lots of CASEs. �I think we can and must do better.
Yeah, that and the other reason I sort of gave up on this approach is that it
is nearly impossible to find some good terminology that works for all callers
of select_common_type() (VALUES, UNION, JOIN, IN, CASE, ARRAY, COALESCE,
GREATEST, according to my notes). A pointer into the statement would
certainly be much nicer.
Added to TODO:
* Improve reporting of UNION type mismatches
http://archives.postgresql.org/pgsql-hackers/2007-04/msg00944.php
http://archives.postgresql.org/pgsql-hackers/2008-03/msg00597.php
---------------------------------------------------------------------------
Peter Eisentraut wrote:
So I was informed today that UNION types integer and text cannot be
matched. Alright, but it failed to tell which particular expressions
in this 3-branch, 30-columns-each UNION clause in a 100-line statement
it was talking about. So I made the attached patch to give some better
pointers. Example:peter=# values(0,1), (1::bigint,2), ('text'::text,3);
ERROR: 42804: VALUES types bigint at position 2 and text at position 3
cannot be matched in instance 1I'm not sure about the terminology "position" and "instance"; they're
just two coordinates to get at the problem.None of this will help if you have multiple unrelated clauses that
invoke select_common_type(), but that might be better handled using the
parser location mechanism.Comments?
--
Peter Eisentraut
http://developer.postgresql.org/~petere/
[ Attachment, skipping... ]
---------------------------(end of broadcast)---------------------------
TIP 7: You can help support the PostgreSQL project by donating at
--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://postgres.enterprisedb.com
+ If your life is a hard drive, Christ can be your backup. +