Some progress on INSERT/SELECT/GROUP BY bugs
I believe I've identified the main cause of the peculiar behavior we
are seeing with INSERT ... SELECT ... GROUP/ORDER BY: it's a subtle
parser bug.
Here is the test case I'm looking at:
CREATE TABLE si_tmpverifyaccountbalances (
type int4 NOT NULL,
memberid int4 NOT NULL,
categoriesid int4 NOT NULL,
amount numeric);
CREATE TABLE invoicelinedetails (
invoiceid int4,
memberid int4,
totshippinghandling numeric,
invoicelinesid int4);
INSERT INTO si_tmpverifyaccountbalances SELECT invoiceid+3,
memberid, 1, totshippinghandling FROM invoicelinedetails
GROUP BY invoiceid+3, memberid, totshippinghandling;
ERROR: INSERT has more expressions than target columns
The reason this is coming out is that the matching of GROUP BY (also
ORDER BY) items to targetlist entries is fundamentally broken in this
context. The GROUP BY items "memberid" and "totshippinghandling" are
simply unvarnished Ident nodes when they arrive at findTargetlistEntry()
in parse_clause.c; what findTargetlistEntry() does with them is to try
to match them against the resdom names of the existing targetlist items.
I think that's correct behavior in the plain SELECT case (but note it
means "SELECT a AS b, b AS c GROUP BY b" will really group by a not b
--- is that per spec??). But it fails miserably in the INSERT/SELECT
case, because by the time control gets here, the targetlist items have
been given resdom names *corresponding to the column names of the target
table*.
So, in the example at hand, "memberid" is matched to the correct column
by pure luck (because it has the same name in the destination table),
and then "totshippinghandling" is not recognized as one of the existing
TLEs because it does not match any destination column name.
Now, call me silly, but it seems to me that SELECT ... GROUP BY ought
to mean the same thing no matter whether there is an INSERT in front of
it or not, and thus that letting target column names affect the meaning
of GROUP BY items is dead wrong. (Don't have a spec to check this with,
however.)
I believe the most reasonable fix for this is to postpone relabeling
of the targetlist entries with destination column names until after
analysis of the SELECT's subsidiary clauses is complete. In particular,
it should *not* be done instantly when each TLE is made, which is what
MakeTargetEntryIdent currently does. The TLEs should have the same
resnames as in the SELECT case until after subsidiary clause processing
is done.
(MakeTargetEntryIdent is broken anyway because it tries to associate
a destination column with every TLE, even the resjunk ones. The reason
we see the quoted error message in this situation is that after
findTargetlistEntry fails to detect that totshippinghandling is already
a TLE, it calls MakeTargetEntryIdent to make a junk TLE for
totshippinghandling, and then MakeTargetEntryIdent tries to find a
target column to go with the junk TLE. So the revised code should only
assign dest column names to non-junk TLEs.)
I'm not really familiar enough with the parser to want to tackle this
size of change by myself --- Thomas, do you want to do it? I think it's
largely a matter of moving code around, but I'm not sure where is the
right place for it...
regards, tom lane
(MakeTargetEntryIdent is broken anyway because it tries to associate
a destination column with every TLE, even the resjunk ones. The reason
we see the quoted error message in this situation is that after
findTargetlistEntry fails to detect that totshippinghandling is already
a TLE, it calls MakeTargetEntryIdent to make a junk TLE for
totshippinghandling, and then MakeTargetEntryIdent tries to find a
target column to go with the junk TLE. So the revised code should only
assign dest column names to non-junk TLEs.)I'm not really familiar enough with the parser to want to tackle this
size of change by myself --- Thomas, do you want to do it? I think it's
largely a matter of moving code around, but I'm not sure where is the
right place for it...
Yes, I clearly remember the INSERT assigning target names to columns in
the select to match up the entries. I still am unclear which of these
are valid SQL:
select a as b from test order by a
select a as b from test order by b
Can we just defer the renaming until after we do group-by?
--
Bruce Momjian | http://www.op.net/~candle
maillist@candle.pha.pa.us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026
I still am unclear which of these are valid SQL:
select a as b from test order by a
select a as b from test order by b
Both are valid, and don't forget the third variant:
select a as b from test order by 1
Andreas
Import Notes
Resolved by subject fallback
I still am unclear which of these are valid SQL:
select a as b from test order by a
select a as b from test order by bBoth are valid, and don't forget the third variant:
select a as b from test order by 1
Andreas
I wonder why this should be valid. Consider the following
test case:
CREATE TABLE t1 (a int4, b int4);
SELECT a AS b, b AS a FROM t1 GROUP BY a, b;
Is that now GROUP BY 1,2 or BY 2,1? Without the grouping, it
is a totally valid statement because the column DISPLAY-names
given with AS don't affect the rest of it.
Jan
--
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me. #
#======================================== jwieck@debis.com (Jan Wieck) #
I still am unclear which of these are valid SQL:
select a as b from test order by a
select a as b from test order by bBoth are valid, and don't forget the third variant:
select a as b from test order by 1
Andreas
I wonder why this should be valid. Consider the following
test case:CREATE TABLE t1 (a int4, b int4);
SELECT a AS b, b AS a FROM t1 GROUP BY a, b;Is that now GROUP BY 1,2 or BY 2,1? Without the grouping, it
The order of the columns in a group by don't affect the result.
It will affect the sort order, but without an order by, the order is
implementation depentent and not guaranteed by ANSI.
is a totally valid statement because the column DISPLAY-names
given with AS don't affect the rest of it.
Resumee:
group by and where ignores alias completely (in Oracle and Informix)
order by uses alias
(only if unambiguous in Informix, alias precedes column name
in Oracle)
So I guess our group by code does it different, than all others :-(
At last what about this, even if it is how the others do it, it is not
consistent with
our group by:
regression=> select a as b, b as c from a where c=3;
b|c
-+-
3|1
(1 row)
Does anyone know what standard says ?
Andreas
Import Notes
Resolved by subject fallback
Tom, was this done?
I believe I've identified the main cause of the peculiar behavior we
are seeing with INSERT ... SELECT ... GROUP/ORDER BY: it's a subtle
parser bug.Here is the test case I'm looking at:
CREATE TABLE si_tmpverifyaccountbalances (
type int4 NOT NULL,
memberid int4 NOT NULL,
categoriesid int4 NOT NULL,
amount numeric);CREATE TABLE invoicelinedetails (
invoiceid int4,
memberid int4,
totshippinghandling numeric,
invoicelinesid int4);INSERT INTO si_tmpverifyaccountbalances SELECT invoiceid+3,
memberid, 1, totshippinghandling FROM invoicelinedetails
GROUP BY invoiceid+3, memberid, totshippinghandling;ERROR: INSERT has more expressions than target columns
The reason this is coming out is that the matching of GROUP BY (also ORDER BY) items to targetlist entries is fundamentally broken in this context. The GROUP BY items "memberid" and "totshippinghandling" are simply unvarnished Ident nodes when they arrive at findTargetlistEntry() in parse_clause.c; what findTargetlistEntry() does with them is to try to match them against the resdom names of the existing targetlist items. I think that's correct behavior in the plain SELECT case (but note it means "SELECT a AS b, b AS c GROUP BY b" will really group by a not b --- is that per spec??). But it fails miserably in the INSERT/SELECT case, because by the time control gets here, the targetlist items have been given resdom names *corresponding to the column names of the target table*.So, in the example at hand, "memberid" is matched to the correct column
by pure luck (because it has the same name in the destination table),
and then "totshippinghandling" is not recognized as one of the existing
TLEs because it does not match any destination column name.Now, call me silly, but it seems to me that SELECT ... GROUP BY ought
to mean the same thing no matter whether there is an INSERT in front of
it or not, and thus that letting target column names affect the meaning
of GROUP BY items is dead wrong. (Don't have a spec to check this with,
however.)I believe the most reasonable fix for this is to postpone relabeling
of the targetlist entries with destination column names until after
analysis of the SELECT's subsidiary clauses is complete. In particular,
it should *not* be done instantly when each TLE is made, which is what
MakeTargetEntryIdent currently does. The TLEs should have the same
resnames as in the SELECT case until after subsidiary clause processing
is done.(MakeTargetEntryIdent is broken anyway because it tries to associate
a destination column with every TLE, even the resjunk ones. The reason
we see the quoted error message in this situation is that after
findTargetlistEntry fails to detect that totshippinghandling is already
a TLE, it calls MakeTargetEntryIdent to make a junk TLE for
totshippinghandling, and then MakeTargetEntryIdent tries to find a
target column to go with the junk TLE. So the revised code should only
assign dest column names to non-junk TLEs.)I'm not really familiar enough with the parser to want to tackle this
size of change by myself --- Thomas, do you want to do it? I think it's
largely a matter of moving code around, but I'm not sure where is the
right place for it...regards, tom lane
--
Bruce Momjian | http://www.op.net/~candle
maillist@candle.pha.pa.us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026
Bruce Momjian <maillist@candle.pha.pa.us> writes:
Tom, was this done?
This is not done --- I wasn't willing to try to do such a thing by
myself when we were already in 6.5 beta. It's on my todo list for 6.6.
6.5 fails in a different way than 6.4 did, for reasons that I don't
recall offhand, but the only real fix is to restructure the analyzer.
regards, tom lane
Show quoted text
I believe I've identified the main cause of the peculiar behavior we
are seeing with INSERT ... SELECT ... GROUP/ORDER BY: it's a subtle
parser bug.Here is the test case I'm looking at:
CREATE TABLE si_tmpverifyaccountbalances (
type int4 NOT NULL,
memberid int4 NOT NULL,
categoriesid int4 NOT NULL,
amount numeric);CREATE TABLE invoicelinedetails (
invoiceid int4,
memberid int4,
totshippinghandling numeric,
invoicelinesid int4);INSERT INTO si_tmpverifyaccountbalances SELECT invoiceid+3,
memberid, 1, totshippinghandling FROM invoicelinedetails
GROUP BY invoiceid+3, memberid, totshippinghandling;ERROR: INSERT has more expressions than target columns
The reason this is coming out is that the matching of GROUP BY (also ORDER BY) items to targetlist entries is fundamentally broken in this context. The GROUP BY items "memberid" and "totshippinghandling" are simply unvarnished Ident nodes when they arrive at findTargetlistEntry() in parse_clause.c; what findTargetlistEntry() does with them is to try to match them against the resdom names of the existing targetlist items. I think that's correct behavior in the plain SELECT case (but note it means "SELECT a AS b, b AS c GROUP BY b" will really group by a not b --- is that per spec??). But it fails miserably in the INSERT/SELECT case, because by the time control gets here, the targetlist items have been given resdom names *corresponding to the column names of the target table*.So, in the example at hand, "memberid" is matched to the correct column
by pure luck (because it has the same name in the destination table),
and then "totshippinghandling" is not recognized as one of the existing
TLEs because it does not match any destination column name.Now, call me silly, but it seems to me that SELECT ... GROUP BY ought
to mean the same thing no matter whether there is an INSERT in front of
it or not, and thus that letting target column names affect the meaning
of GROUP BY items is dead wrong. (Don't have a spec to check this with,
however.)I believe the most reasonable fix for this is to postpone relabeling
of the targetlist entries with destination column names until after
analysis of the SELECT's subsidiary clauses is complete. In particular,
it should *not* be done instantly when each TLE is made, which is what
MakeTargetEntryIdent currently does. The TLEs should have the same
resnames as in the SELECT case until after subsidiary clause processing
is done.(MakeTargetEntryIdent is broken anyway because it tries to associate
a destination column with every TLE, even the resjunk ones. The reason
we see the quoted error message in this situation is that after
findTargetlistEntry fails to detect that totshippinghandling is already
a TLE, it calls MakeTargetEntryIdent to make a junk TLE for
totshippinghandling, and then MakeTargetEntryIdent tries to find a
target column to go with the junk TLE. So the revised code should only
assign dest column names to non-junk TLEs.)I'm not really familiar enough with the parser to want to tackle this
size of change by myself --- Thomas, do you want to do it? I think it's
largely a matter of moving code around, but I'm not sure where is the
right place for it...regards, tom lane
-- Bruce Momjian | http://www.op.net/~candle maillist@candle.pha.pa.us | (610) 853-3000 + If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania 19026
Import Notes
Reply to msg id not found: YourmessageofTue6Jul1999234416-0400199907070344.XAA02925@candle.pha.pa.us | Resolved by subject fallback
Bruce Momjian <maillist@candle.pha.pa.us> writes:
Tom, was this done?
This is not done --- I wasn't willing to try to do such a thing by
myself when we were already in 6.5 beta. It's on my todo list for 6.6.
On your list. Good. I can't possibly figure out how to describe this
bug.
6.5 fails in a different way than 6.4 did, for reasons that I don't
recall offhand, but the only real fix is to restructure the analyzer.regards, tom lane
I believe I've identified the main cause of the peculiar behavior we
are seeing with INSERT ... SELECT ... GROUP/ORDER BY: it's a subtle
parser bug.Here is the test case I'm looking at:
CREATE TABLE si_tmpverifyaccountbalances (
type int4 NOT NULL,
memberid int4 NOT NULL,
categoriesid int4 NOT NULL,
amount numeric);CREATE TABLE invoicelinedetails (
invoiceid int4,
memberid int4,
totshippinghandling numeric,
invoicelinesid int4);INSERT INTO si_tmpverifyaccountbalances SELECT invoiceid+3,
memberid, 1, totshippinghandling FROM invoicelinedetails
GROUP BY invoiceid+3, memberid, totshippinghandling;ERROR: INSERT has more expressions than target columns
The reason this is coming out is that the matching of GROUP BY (also ORDER BY) items to targetlist entries is fundamentally broken in this context. The GROUP BY items "memberid" and "totshippinghandling" are simply unvarnished Ident nodes when they arrive at findTargetlistEntry() in parse_clause.c; what findTargetlistEntry() does with them is to try to match them against the resdom names of the existing targetlist items. I think that's correct behavior in the plain SELECT case (but note it means "SELECT a AS b, b AS c GROUP BY b" will really group by a not b --- is that per spec??). But it fails miserably in the INSERT/SELECT case, because by the time control gets here, the targetlist items have been given resdom names *corresponding to the column names of the target table*.So, in the example at hand, "memberid" is matched to the correct column
by pure luck (because it has the same name in the destination table),
and then "totshippinghandling" is not recognized as one of the existing
TLEs because it does not match any destination column name.Now, call me silly, but it seems to me that SELECT ... GROUP BY ought
to mean the same thing no matter whether there is an INSERT in front of
it or not, and thus that letting target column names affect the meaning
of GROUP BY items is dead wrong. (Don't have a spec to check this with,
however.)I believe the most reasonable fix for this is to postpone relabeling
of the targetlist entries with destination column names until after
analysis of the SELECT's subsidiary clauses is complete. In particular,
it should *not* be done instantly when each TLE is made, which is what
MakeTargetEntryIdent currently does. The TLEs should have the same
resnames as in the SELECT case until after subsidiary clause processing
is done.(MakeTargetEntryIdent is broken anyway because it tries to associate
a destination column with every TLE, even the resjunk ones. The reason
we see the quoted error message in this situation is that after
findTargetlistEntry fails to detect that totshippinghandling is already
a TLE, it calls MakeTargetEntryIdent to make a junk TLE for
totshippinghandling, and then MakeTargetEntryIdent tries to find a
target column to go with the junk TLE. So the revised code should only
assign dest column names to non-junk TLEs.)I'm not really familiar enough with the parser to want to tackle this
size of change by myself --- Thomas, do you want to do it? I think it's
largely a matter of moving code around, but I'm not sure where is the
right place for it...regards, tom lane
-- Bruce Momjian | http://www.op.net/~candle maillist@candle.pha.pa.us | (610) 853-3000 + If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania 19026
--
Bruce Momjian | http://www.op.net/~candle
maillist@candle.pha.pa.us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026
Bruce Momjian <maillist@candle.pha.pa.us> writes:
This is not done --- I wasn't willing to try to do such a thing by
myself when we were already in 6.5 beta. It's on my todo list for 6.6.
On your list. Good. I can't possibly figure out how to describe this
bug.
If you want a TODO entry try
* INSERT ... SELECT ... GROUP BY groups by target columns not source columns
There are other failure modes associated with this bug but that one will
do for the list.
regards, tom lane
Import Notes
Reply to msg id not found: YourmessageofWed7Jul1999123633-0400199907071636.MAA01188@candle.pha.pa.us | Resolved by subject fallback