Counting bool flags in a complex query

Started by Michael Richardsover 26 years ago22 messages
#1Michael Richards
miker@scifair.acadiau.ca

Hi.

I think I've created a monster...

Working on an email system I have the following:
Table    = usermail
+----------------------------------+--------------------------+-------+
|              Field               |              Type        | Length|
+----------------------------------+--------------------------+-------+
| contentlength                    | int4                     |     4 |
| folder                           | int4                     |     4 |
| flagnew                          | bool                     |     1 |
etc...
And:
Table    = folders
+----------------------------------+--------------------------+-------+
|              Field               |              Type        | Length|
+----------------------------------+--------------------------+-------+
| loginid                          | varchar() not null       |    16 |
| folderid                         | int4 not null default (  |     4 |
| foldername                       | varchar()                |    25 |
etc...

So each email message has an entry in usermail, and each mail folder has
an entry in folders. I need to extract the following info:
foldername, number of messages in that folder, number of messages in that
folder with flagread set, total size of all the messages in each folder

Since postgres does not appear to support outer joins, I've come up with a
really icky query that almost does what I want:

SELECT folderid,foldername,count(*),sum(contentlength)
FROM usermail,folders
WHERE usermail.loginid='michael' AND
folders.loginid=usermail.loginid AND
usermail.folder=folders.folderid
GROUP BY folderid,foldername
UNION SELECT folderid,foldername,null,null
FROM folders
WHERE loginid='michael' AND
folderid NOT IN
(SELECT folder FROM usermail WHERE loginid='michael');

WHEW!

folderid|foldername |count| sum
--------+----------------+-----+-------
-4|Deleted Messages| 110| 245627
-3|Saved Drafts | |
-2|Sent Mail | 7| 10878
-1|New Mail Folder | 73|8831226
1|OOL | 7| 8470
etc...

My final problem is to count all the messages with flagnew set to true.
The only way I can think to do this is to convert the bool value to a 1 or
0 (which I think should be a standard conversion anyway) and run a sum()
on them.

Unless anyone can come up with a better way to do this, What is the best
way to implement a conversion from bool to int?

-Michael

#2Duane Currie
dcurrie@sandman.acadiau.ca
In reply to: Michael Richards (#1)
Re: [HACKERS] Counting bool flags in a complex query

Hi.

I think I've created a monster...

...

My final problem is to count all the messages with flagnew set to true.
The only way I can think to do this is to convert the bool value to a 1 or
0 (which I think should be a standard conversion anyway) and run a sum()
on them.

Unless anyone can come up with a better way to do this, What is the best
way to implement a conversion from bool to int?

-Michael

Of course, you could always use count() and a 'WHERE flagnew' clause...

Duane

#3Michael Richards
miker@scifair.acadiau.ca
In reply to: Duane Currie (#2)
Re: [HACKERS] Counting bool flags in a complex query

On Wed, 14 Jul 1999, Duane Currie wrote:

My final problem is to count all the messages with flagnew set to true.
The only way I can think to do this is to convert the bool value to a 1 or
0 (which I think should be a standard conversion anyway) and run a sum()
on them.

Unless anyone can come up with a better way to do this, What is the best
way to implement a conversion from bool to int?

Of course, you could always use count() and a 'WHERE flagnew' clause...

Problem with that of course is that by limiting the query with a "where",
I'd lose all the records in the original count, and therefore the total
number of messages (a count that ignores the status of flagnew) would be
wrong.

What I was sort of hoping for was a way to implement a native conversion
from bool to int, and have it included in the standard postgres system. I
think the conversion if a reasonable logical one where true==1 and
false==0. The problem is, I don't have a sweet clue how to do this. I
think it should be a trivial matter to insert something into a system
table...

-Michael

#4Thomas Lockhart
lockhart@alumni.caltech.edu
In reply to: Michael Richards (#1)
Re: [HACKERS] Counting bool flags in a complex query

Unless anyone can come up with a better way to do this, What is the best
way to implement a conversion from bool to int?

Try

select sum(case when bfield = TRUE then 1 else 0 end) from table;

It works for me...

- Thomas

--
Thomas Lockhart lockhart@alumni.caltech.edu
South Pasadena, California

#5Michael Richards
miker@scifair.acadiau.ca
In reply to: Thomas Lockhart (#4)
Re: [HACKERS] Counting bool flags in a complex query

On Wed, 14 Jul 1999, Thomas Lockhart wrote:

Unless anyone can come up with a better way to do this, What is the best
way to implement a conversion from bool to int?

select sum(case when bfield = TRUE then 1 else 0 end) from table;

I'm not sure this is correct, but I think I see a bug of some sort...

SELECT folderid,foldername,count(*),sum(contentlength),sum(case when
flagnew = TRUE then 1 else 0 end) FROM usermail,folders WHERE
usermail.loginid='michael' and folders.loginid=usermail.loginid AND
usermail.folder = folders.folderid GROUP BY folderid,foldername UNION
SELECT folderid,foldername,0,0,0 FROM folders WHERE loginid='michael' AND
NOT EXISTS (SELECT folder FROM usermail WHERE loginid='michael' AND
folder=folderid) ;
ERROR: _finalize_primnode: can't handle node 723

It seems to be the union that is confuzing it...

SELECT folderid,foldername,count(*),sum(contentlength),sum(case when
flagnew = TRUE then 1 else 0 end) FROM usermail,folders WHERE
usermail.loginid='michael' and folders.loginid=usermail.loginid AND
usermail.folder = folders.folderid GROUP BY folderid,foldername;
folderid|foldername |count| sum|sum
--------+----------------+-----+-------+---
-4|Deleted Messages| 110| 245627| 50
-2|Sent Mail | 7| 10878| 2
-1|New Mail Folder | 73|8831226| 1
1|OOL | 7| 8470| 0
etc

-Michael

#6Tom Lane
tgl@sss.pgh.pa.us
In reply to: Michael Richards (#5)
Re: [HACKERS] Counting bool flags in a complex query

Michael Richards <miker@scifair.acadiau.ca> writes:

ERROR: _finalize_primnode: can't handle node 723

Grumble. Still another routine that doesn't know as much as it should
about traversing parsetrees. Looks like a job for <flourish of trumpets>
expression_tree_walker.

It seems to be the union that is confuzing it...

CASE expression inside a UNION/INTERSECT/EXCEPT, to be specific.

Will fix this in time for 6.5.1.

regards, tom lane

#7Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#6)
Re: [HACKERS] Counting bool flags in a complex query

Michael Richards <miker@scifair.acadiau.ca> writes:

I'm not sure this is correct, but I think I see a bug of some sort...

SELECT folderid,foldername,count(*),sum(contentlength),sum(case when
flagnew = TRUE then 1 else 0 end) FROM usermail,folders WHERE
usermail.loginid='michael' and folders.loginid=usermail.loginid AND
usermail.folder = folders.folderid GROUP BY folderid,foldername UNION
SELECT folderid,foldername,0,0,0 FROM folders WHERE loginid='michael' AND
NOT EXISTS (SELECT folder FROM usermail WHERE loginid='michael' AND
folder=folderid) ;
ERROR: _finalize_primnode: can't handle node 723

I committed a fix last night; it will be in 6.5.1.

regards, tom lane

#8Michael Richards
miker@scifair.acadiau.ca
In reply to: Tom Lane (#7)
Re: [HACKERS] Counting bool flags in a complex query

On Thu, 15 Jul 1999, Tom Lane wrote:

Michael Richards <miker@scifair.acadiau.ca> writes:

I'm not sure this is correct, but I think I see a bug of some sort...

I committed a fix last night; it will be in 6.5.1.

I've found what I believe is another set of bugs:
This is my monster query again...

My folder numbers are: negative numbers are system folders such as New
mail, trash, drafts and sentmail. I wanted to order the tuples so that the
folderids were sorted from -1 to -4, then 1 to x. This way the system
folders would always appear first in the list.

This may not be valid SQL, as none of my books mention it. Is it possible
to order by an expression?

Here are some examples which some some odd behaviour. My suspected bug
findings are at the end:

SELECT folderid,foldername,count(*) as "messgaes",sum(bool2int(flagnew))
as "newmessages",sum(contentlength) as "size" FROM usermail,folders WHERE
usermail.loginid='michael' and folders.loginid=usermail.loginid AND
usermail.folder = folders.folderid GROUP BY folderid,foldername UNION
SELECT folderid,foldername,0,0,0 FROM folders WHERE loginid='michael' AND
NOT EXISTS (SELECT folder FROM usermail WHERE loginid='michael' AND
folder=folderid) order by (folderid>0);
folderid|foldername |messgaes|newmessages| size
--------+----------------+--------+-----------+-------
-4|Deleted Messages| 110| 50| 245627
-2|Sent Mail | 7| 2| 10878
-1|New Mail Folder | 73| 1|8831226
1|OOL | 7| 0| 8470
2|suggestions | 26| 0| 35433
3|Acadia | 5| 0| 17703
4|advertising | 4| 2| 5394
5|dealt with | 3| 0| 2883
36|dauphne | 9| 0| 66850
-3|Saved Drafts | 0| 0| 0
(10 rows)

It looks like the order by is only being applied to the original select,
not the unioned select. Some authority should check on it, but by thought
it that a union does not necessarily maintain the order, so the entire
select should be applied to the order.

I'm not so good at interpreting the query plan, but here it is:
Unique (cost=8.10 rows=0 width=0)
-> Sort (cost=8.10 rows=0 width=0)
-> Append (cost=8.10 rows=0 width=0)
-> Aggregate (cost=6.05 rows=1 width=49)
-> Group (cost=6.05 rows=1 width=49)
-> Sort (cost=6.05 rows=1 width=49)
-> Nested Loop (cost=6.05 rows=1 width=49)
-> Index Scan using usermail_pkey on usermail (cost=2.05 rows=2 width=21)
-> Index Scan using folders_pkey on folders (cost=2.00 rows=8448 width=28)
-> Index Scan using folders_pkey on folders (cost=2.05 rows=2 width=16)
SubPlan
-> Index Scan using usermail_pkey on usermail (cost=2.05 rows=1 width=4)

I would have expected the folderid -3 to appear as the 3rd one in this
case.

I'm probably going to change the numbering scheme of the system folders so
they will sort correctly without a kluge such as:
create function ordfolderid(int) returns int as 'select $1*-1 where $1<0
union select $1+1*10 where $1>=0' language 'sql';

Then running the order clause as:
order by (folderid<0),ordfolderid(folderid)
My thought behind this kludge is that the table should first be ordered by
the t/f value of the fact folderid<0, then within each of the true and
false sortings, subsort those by the value of folderid.

Complicated enough for you?

Well, in my playing I notice what appears to be more of a bug...
SELECT folderid,foldername,count(*) as "messages",sum(bool2int(flagnew))
as "newmessages",sum(contentlength) as "size" FROM usermail,folders WHERE
usermail.loginid='michael' and folders.loginid=usermail.loginid AND
usermail.folder = folders.folderid GROUP BY folderid,foldername UNION
SELECT folderid,foldername,0,0,0 FROM folders WHERE loginid='michael' AND
NOT EXISTS (SELECT folder FROM usermail WHERE loginid='michael' AND
folder=folderid) order by (folderid<0);
folderid|foldername |messgaes|newmessages| size
--------+----------------+--------+-----------+-------
1|OOL | 7| 0| 8470
2|suggestions | 26| 0| 35433
3|Acadia | 5| 0| 17703
4|advertising | 4| 2| 5394
5|dealt with | 3| 0| 2883
36|dauphne | 9| 0| 66850
-4|Deleted Messages| 110| 50| 245627
-2|Sent Mail | 7| 2| 10878
-1|New Mail Folder | 73| 1|8831226
-3|Saved Drafts | 0| 0| 0
(10 rows)

SELECT folderid,foldername,count(*) as "messages",sum(bool2int(flagnew))
as "newmessages",sum(contentlength) as "size" FROM usermail,folders WHERE
usermail.loginid='michael' and folders.loginid=usermail.loginid AND
usermail.folder = folders.folderid GROUP BY folderid,foldername UNION
SELECT folderid,foldername,0,0,0 FROM folders WHERE loginid='michael' AND
NOT EXISTS (SELECT folder FROM usermail WHERE loginid='michael' AND
folder=folderid) order by (messages<10);
ERROR: attribute 'messages' not found

Using a column name within an expression in the order by does not seem to
work...
Or a much simpler example to illustrate the bug:
fastmail=> select 1 as "test" order by (test<9);
ERROR: attribute 'test' not found

fastmail=> select 1 as "test" order by test;
test
----
1
(1 row)

I was almost able to make it work properly aside from the sorting issue
with my kludged up routine... This is so nasty that I most definitely
don't want to put it into production:

SELECT folderid,foldername,count(*) as "messages",sum(bool2int(flagnew))
as "newmessages",sum(contentlength) as "size",(folderid>=0) FROM
usermail,folders WHERE usermail.loginid='michael' and
folders.loginid=usermail.loginid AND usermail.folder = folders.folderid
GROUP BY folderid,foldername UNION SELECT
folderid,foldername,0,0,0,(folderid>=0) FROM folders WHERE
loginid='michael' AND NOT EXISTS (SELECT folder FROM usermail WHERE
loginid='michael' AND folder=folderid) order by 6,ordfolderid(folderid);
folderid|foldername |messages|newmessages| size|?column?
--------+----------------+--------+-----------+-------+--------
-1|New Mail Folder | 73| 1|8831226|f
-2|Sent Mail | 7| 2| 10878|f
-4|Deleted Messages| 110| 50| 245627|f
-3|Saved Drafts | 0| 0| 0|f
1|OOL | 7| 0| 8470|t
2|suggestions | 26| 0| 35433|t
3|Acadia | 5| 0| 17703|t
4|advertising | 4| 2| 5394|t
5|dealt with | 3| 0| 2883|t
36|dauphne | 9| 0| 66850|t
(10 rows)

Do I need outer joins to make this work instead of the screwed up union
method I'm trying here, or is it just a series of bugs?

-Michael

#9Ansley, Michael
Michael.Ansley@intec.co.za
In reply to: Michael Richards (#8)
RE: [HACKERS] Counting bool flags in a complex query

Why don't you adjust the ids of your system folders, such that they are
ordered properly? You should have a fixed number of system folders, so you
can guarantee the ids that they will receive. So make the Inbox -4. Then
you just order by folder id, ascending. -4 comes first, with the user
folders always coming after the system folders.

Alternatively, you can sort by an expression, something like:

ORDER BY (if(folderid < 0) then return(abs(folderid)) else
return(folderid+max(abs(MIN(folderid)))))

What this does is shift all the ids up to ensure that they all fall into the
positive range, while inverting the order of the negative ids, which seems
like it's what you want. Of course, this isn't legal SQL. You would
probably have to write a function to implement this. This will work no
matter what folders you add, system or user, and will always give you the
oldest folders (i.e.: those with the lowest absolute id) first, for each
group.
The MAX will make it slow though, except, of course, that in a function, you
can store the value, instead of repeatedly looking it up.

So:

SELECT folderid,
foldername,
count(*) as "messgaes",
sum(bool2int(flagnew)) as "newmessages",
sum(contentlength) as "size"
FROM usermail,folders
WHERE usermail.loginid='michael' AND
folders.loginid=usermail.loginid AND
usermail.folder = folders.folderid
GROUP BY folderid,foldername

UNION ALL

SELECT folderid,
foldername,
0,
0,
0
FROM folders
WHERE loginid='michael' AND
NOT EXISTS (SELECT folder
FROM usermail
WHERE loginid='michael' AND
folder=folderid
)

ORDER BY get_effective_order(folderid);

And then define the function get_effective_order using pgsql to return the
value described above.

However, I don't think that you are going to get away from the UNION ALL.

BTW
If you are going to do this:

fastmail=> select 1 as "test" order by (test<9);

then why not just do this:
select 1 as "test" order by (1<9);
If you actually have a field, then you would be able to put it in. If you
have an expression like this:
select x+y\z as "some_number" from test order by (somenumber>9);
then you could just as easily do this:
select x+y\z as "some_number" from test order by (x+y\z>9);
That's why the expression will not evaluate properly, I think.

MikeA

My folder numbers are: negative numbers are system folders
such as New
mail, trash, drafts and sentmail. I wanted to order the
tuples so that the
folderids were sorted from -1 to -4, then 1 to x. This way the system
folders would always appear first in the list.

<big snip>

Using a column name within an expression in the order by
does not seem to
work...
Or a much simpler example to illustrate the bug:
fastmail=> select 1 as "test" order by (test<9);
ERROR: attribute 'test' not found

fastmail=> select 1 as "test" order by test;
test
----
1
(1 row)

<not so big snip>

Show quoted text

Do I need outer joins to make this work instead of the
screwed up union
method I'm trying here, or is it just a series of bugs?

-Michael

#10Ansley, Michael
Michael.Ansley@intec.co.za
In reply to: Ansley, Michael (#9)
RE: [HACKERS] Counting bool flags in a complex query

Sorry guys, this line:

ORDER BY (if(folderid < 0) then return(abs(folderid)) else
return(folderid+max(abs(MIN(folderid)))))

Should have read:
ORDER BY (if(folderid < 0) then return(abs(folderid)) else
return(folderid+abs(MIN(folderid))))

The max was an error.

MikeA

Show quoted text

Why don't you adjust the ids of your system folders, such
that they are
ordered properly? You should have a fixed number of system
folders, so you
can guarantee the ids that they will receive. So make the
Inbox -4. Then
you just order by folder id, ascending. -4 comes first,
with the user
folders always coming after the system folders.

Alternatively, you can sort by an expression, something like:

ORDER BY (if(folderid < 0) then return(abs(folderid)) else
return(folderid+max(abs(MIN(folderid)))))

What this does is shift all the ids up to ensure that they
all fall into the
positive range, while inverting the order of the negative
ids, which seems

#11Tom Lane
tgl@sss.pgh.pa.us
In reply to: Ansley, Michael (#10)
Re: [HACKERS] Counting bool flags in a complex query

Michael Richards <miker@scifair.acadiau.ca> writes:

I've found what I believe is another set of bugs:

I can shed some light on these.

This may not be valid SQL, as none of my books mention it. Is it possible
to order by an expression?

Postgres accepts expressions as ORDER BY clauses, although strict SQL92
only allows sorting by a column name or number.

It looks like the order by is only being applied to the original select,
not the unioned select. Some authority should check on it, but by thought
it that a union does not necessarily maintain the order, so the entire
select should be applied to the order.

That looks like a bug to me too --- I think the ORDER BY is supposed to
apply across the whole UNION result. Will look into it.

I'm probably going to change the numbering scheme of the system folders so
they will sort correctly without a kluge such as:

Good plan. Although you could sort by a user-defined function result,
it's likely to be horribly slow (because user-defined functions are
slow:-().

Using a column name within an expression in the order by does not seem to
work...
Or a much simpler example to illustrate the bug:
fastmail=> select 1 as "test" order by (test<9);
ERROR: attribute 'test' not found

This is not so much a bug as a definitional issue. For SQL92
compatibility, we accept ORDER BY a column label so long as it's
a bare column label, but column labels are NOT part of the namespace
for full expression evaluation. You can't do this either:

select 1 as "test" , test<9 ;
ERROR: attribute 'test' not found

There are all sorts of squirrely questions about this feature IMHO.
For example,

create table z1 (f1 int4, f2 int4);
CREATE
select f1 as f2, f2 from z1 order by f2;
f2|f2
--+--
(0 rows)

Which column do you think it's ordering by? Which column *should* it
order by? I think this ought to draw an "ambiguous column label" error
... there is code in there that claims to be looking for such a thing,
in fact, so I am not quite sure why it doesn't trigger on this example.

regards, tom lane

#12Michael Richards
miker@scifair.acadiau.ca
In reply to: Tom Lane (#11)
Re: [HACKERS] Counting bool flags in a complex query

On Fri, 16 Jul 1999, Tom Lane wrote:

Good plan. Although you could sort by a user-defined function result,
it's likely to be horribly slow (because user-defined functions are
slow:-().

Yes, but I did include my horrible design ideas so you could see why in
"god's name" I was trying to do what I was trying to do when I found what
looked to be a "bug"

This is not so much a bug as a definitional issue. For SQL92
compatibility, we accept ORDER BY a column label so long as it's
a bare column label, but column labels are NOT part of the namespace
for full expression evaluation. You can't do this either:

select 1 as "test" , test<9 ;
ERROR: attribute 'test' not found

There are all sorts of squirrely questions about this feature IMHO.
For example,

create table z1 (f1 int4, f2 int4);
CREATE
select f1 as f2, f2 from z1 order by f2;
f2|f2
--+--
(0 rows)

Which column do you think it's ordering by? Which column *should* it
order by? I think this ought to draw an "ambiguous column label" error
... there is code in there that claims to be looking for such a thing,
in fact, so I am not quite sure why it doesn't trigger on this example.

Good point. Is there anything in the SQL standard that defined how this
"is supposed" to work? I suppose with no expression support it isn't
really necessary. How about requiring quotes when we're to look at it was
"named" columns? If
select f1 as f2, f2 from z1 order by "f2";

Of course I have no idea how this would conflicy with SQL-92. It's more of
an idea...

-Michael

#13Tom Lane
tgl@sss.pgh.pa.us
In reply to: Michael Richards (#12)
Re: [SQL] Re: [HACKERS] Counting bool flags in a complex query

Michael Richards <miker@scifair.acadiau.ca> writes:

For example,

create table z1 (f1 int4, f2 int4);
CREATE
select f1 as f2, f2 from z1 order by f2;
f2|f2
--+--
(0 rows)

Which column do you think it's ordering by? Which column *should* it
order by? I think this ought to draw an "ambiguous column label" error

Good point. Is there anything in the SQL standard that defined how this
"is supposed" to work?

After looking at the SQL spec I think the above definitely ought to draw
an error. We have the following verbiage concerning the column names
for the result of a SELECT:

a) If the i-th <derived column> in the <select list> specifies
an <as clause> that contains a <column name> C, then the
<column name> of the i-th column of the result is C.

b) If the i-th <derived column> in the <select list> does not
specify an <as clause> and the <value expression> of that
<derived column> is a single <column reference>, then the
<column name> of the i-th column of the result is C.

c) Otherwise, the <column name> of the i-th column of the <query
specification> is implementation-dependent and different
from the <column name> of any column, other than itself, of
a table referenced by any <table reference> contained in the
SQL-statement.

which Postgres does indeed follow, and we see from (a) and (b) that "f2"
is the required column name for both columns of the SELECT result.
Now ORDER BY says

a) If a <sort specification> contains a <column name>, then T
shall contain exactly one column with that <column name> and
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
the <sort specification> identifies that column.

which sure looks to me like it mandates an error for the example
statement.

However, since SQL doesn't consider the possibility of expressions as
ORDER BY entries, we are more or less on our own for those. An
expression appearing in the target list of a SELECT is not allowed to
refer to columns by their "AS" names (and this does seem to be mandated
by SQL92). So I think it makes sense to carry over the same restriction
to ORDER BY.

regards, tom lane

#14tjk@tksoft.com
tk@uno.tksoft.com
In reply to: Michael Richards (#12)
user defined function speeds

Good plan. Although you could sort by a user-defined function result,
it's likely to be horribly slow (because user-defined functions are
slow:-().

What is the actual overhead from using a userdefined
function (a C function, say)?

Obviously there is an overhead involved in calling
any function. I would like to know what kind of issues
are involved when using results of a user defined function for
sorting.

Are the results calculated only once, as one would expect,
for example.

Thanks,

Troy

Troy Korjuslommi Tksoft OY, Inc.
tjk@tksoft.com Software Development
Open Source Solutions
Hosting Services

#15Herouth Maoz
herouth@oumail.openu.ac.il
In reply to: Michael Richards (#8)
Re: [SQL] Re: [HACKERS] Counting bool flags in a complex query

At 11:37 +0300 on 16/07/1999, Michael Richards wrote:

My folder numbers are: negative numbers are system folders such as New
mail, trash, drafts and sentmail. I wanted to order the tuples so that the
folderids were sorted from -1 to -4, then 1 to x. This way the system
folders would always appear first in the list.

This may not be valid SQL, as none of my books mention it. Is it possible
to order by an expression?

Here are some examples which some some odd behaviour. My suspected bug
findings are at the end:

I think the problem results from using non-standard constructs such as
order by expression, and indeed ordering by columns that don't appear in
the select list.

If you want to do the best by yourself, put the expression by which you
order in the select list. A simple example would be:

Instead of:
SELECT f1, min( f2 ), max ( f3 )
GROUP BY f1
ORDER BY expr( f1 );

Use:

SELECT expr( f1 ) AS ordcol, f1, min( f2 ), max( f3 )
GROUP BY ordcol, f1
ORDER BY ordcol;

What is the difference? The difference is that now GROUP BY (which also
does internal sorting) knows about that expression and considers it. Since
ordcol is the same for each value of f1, this should not change the groups.
This simply makes sure all parts of the query are aware of what is being
done around them. This is also the standard, as far as I recall.

What's the problem? You have a column in the output that you didn't really
want. But hey, why should that bother you? If you're reading it through
some frontend, simply have it ignore the first column that returns.

Herouth

--
Herouth Maoz, Internet developer.
Open University of Israel - Telem project
http://telem.openu.ac.il/~herutma

#16Tom Lane
tgl@sss.pgh.pa.us
In reply to: Herouth Maoz (#15)
Re: [SQL] Re: [HACKERS] Counting bool flags in a complex query

Herouth Maoz <herouth@oumail.openu.ac.il> writes:

I think the problem results from using non-standard constructs such as
order by expression, and indeed ordering by columns that don't appear in
the select list.

I replied:

No, that's not the problem.

Looks like I spoke too soon :-(. On further investigation, it does seem
that the main problem in Richards' example is that he is trying to sort
the result of a UNION by a resjunk attribute. That would work fine as
far as the primary SELECT goes, but there's no mechanism right now for
creating the same resjunk attribute in the sub-selects.

Indeed, we seem to have a whole passel of problems that are related to
transformations done on the target list --- not only resjunk attribute
addition, but rearrangement of the tlist order for INSERT ... SELECT,
and probably other things. In a UNION query these will get done on the
top-level target list but not propagated into the union'd selects.
For example:

create table src (a text, b text, c text);
insert into src values ('a', 'b', 'c');

create table dest (a text default 'A', b text default 'B',
c text default 'C');

insert into dest (a,c) select a,b from src;

select * from dest;
a|b|c
-+-+-
a|B|b
(1 row)

-- OK so far, but now try this:

insert into dest (a,c) select a,b from src union select a,c from src;

ERROR: Each UNION | EXCEPT | INTERSECT query must have the same number
of columns.

-- The default for B was added to the first select, but not the second.
-- Even more interesting:

insert into dest (a,c,b) select a,b,c from src union select a,b,c from src;

select * from dest;
a|b|c
-+-+-
a|B|b
a|c|b
a|b|c
(3 rows)

-- The first select's columns were rearranged per the insert column
-- spec, but the second's were not.

I'm also worried about what happens when different sub-selects have
different collections of resjunk attributes and they all get APPENDed
together...

We've got a few bugs to fix here :-(

Meanwhile, I suspect that Richards' SELECT ... UNION ... ORDER BY
would work OK so long as the ORDER BY was for one of the displayed
columns.

regards, tom lane

#17Bruce Momjian
maillist@candle.pha.pa.us
In reply to: Tom Lane (#16)
Re: [SQL] Re: [HACKERS] Counting bool flags in a complex query

We've got a few bugs to fix here :-(

Meanwhile, I suspect that Richards' SELECT ... UNION ... ORDER BY
would work OK so long as the ORDER BY was for one of the displayed
columns.

Tom, can you give me a list for the TODO list?

-- 
  Bruce Momjian                        |  http://www.op.net/~candle
  maillist@candle.pha.pa.us            |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
#18Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#17)
Re: [SQL] Re: [HACKERS] Counting bool flags in a complex query

Bruce Momjian <maillist@candle.pha.pa.us> writes:

We've got a few bugs to fix here :-(

Tom, can you give me a list for the TODO list?

The two cases I mentioned yesterday can be summarized as

* SELECT ... UNION ... ORDER BY fails when sort expr not in result list
* INSERT ... SELECT ... UNION is not reliable

Another thing I realized last night is that Except_Intersect_Rewrite's
coercion of all the sub-select target lists to compatible types is
poorly done; for example in the regression database

regression=> select f1 from int4_tbl union select q1 from int8_tbl;
ERROR: int8 conversion to int4 is out of range

I think we want to use logic similar to what exists for CASE expressions
to find the common supertype of the sub-select results and coerce all
the sub-selects to that type. (Thomas, any comments here? Can we pull
the CASE logic out of transformExpr and make it into a utility routine?)

* Be smarter about promoting types when UNION merges different data types

Finally, heaven help you if you have a GROUP BY in one of the subselects
whose column gets coerced to a different type by Except_Intersect_Rewrite,
because the sortop for the GROUP BY has already been assigned.
(This is another situation where a multi-level output representation
would be a better answer...)

* SELECT ... UNION ... GROUP BY fails if column types disagree

regards, tom lane

#19Bruce Momjian
maillist@candle.pha.pa.us
In reply to: Tom Lane (#18)
Re: [SQL] Re: [HACKERS] Counting bool flags in a complex query

Tom, can you give me a list for the TODO list?

The two cases I mentioned yesterday can be summarized as

* SELECT ... UNION ... ORDER BY fails when sort expr not in result list
* INSERT ... SELECT ... UNION is not reliable

Another thing I realized last night is that Except_Intersect_Rewrite's
coercion of all the sub-select target lists to compatible types is
poorly done; for example in the regression database

regression=> select f1 from int4_tbl union select q1 from int8_tbl;
ERROR: int8 conversion to int4 is out of range

I think we want to use logic similar to what exists for CASE expressions
to find the common supertype of the sub-select results and coerce all
the sub-selects to that type. (Thomas, any comments here? Can we pull
the CASE logic out of transformExpr and make it into a utility routine?)

* Be smarter about promoting types when UNION merges different data types

Finally, heaven help you if you have a GROUP BY in one of the subselects
whose column gets coerced to a different type by Except_Intersect_Rewrite,
because the sortop for the GROUP BY has already been assigned.
(This is another situation where a multi-level output representation
would be a better answer...)

* SELECT ... UNION ... GROUP BY fails if column types disagree

All added to TODO.

-- 
  Bruce Momjian                        |  http://www.op.net/~candle
  maillist@candle.pha.pa.us            |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
#20Tom Lane
tgl@sss.pgh.pa.us
In reply to: Michael Richards (#8)
Re: Counting bool flags in a complex query

Quite awhile ago, Michael Richards <miker@scifair.acadiau.ca> wrote:

It looks like the order by is only being applied to the original select,
not the unioned select. Some authority should check on it, but by thought
it that a union does not necessarily maintain the order, so the entire
select should be applied to the order.

Just FYI, I have committed code for 7.1 that allows ORDER BY to work
correctly for a UNION'd query. A limitation is that you can only do
ordering on columns that are outputs of the UNION:

regression=# select q1 from int8_tbl union select q2 from int8_tbl order by 1;
q1
-------------------
-4567890123456789
123
456
4567890123456789
(4 rows)

regression=# select q1 from int8_tbl union select q2 from int8_tbl order by int8_tbl.q1+1;
ERROR: ORDER BY on a UNION/INTERSECT/EXCEPT result must be on one of the result columns

In the general case of an arbitrary ORDER BY expression, it's not clear
how to transpose it into each UNION source select anyway. It could
be made to work for expressions using only the output columns, but since
ORDER BY expressions are not standard SQL I'm not in a big hurry to make
that happen...

regards, tom lane

#21Josh Berkus
josh@agliodbs.com
In reply to: Tom Lane (#20)
Re: Re: [HACKERS] Counting bool flags in a complex query

Tom,

Just FYI, I have committed code for 7.1 that allows ORDER
BY to work
correctly for a UNION'd query. A limitation is that you
can only do
ordering on columns that are outputs of the UNION:

As far as I know, that limitation is standard to all SQL
that supports UNION; the relational calculus (I'm told) is
impossible otherwise.

So ... we keep hearing about all the fantastic fixes in 7.1.
When will a stable build show up? :-)

-Josh

#22Tom Lane
tgl@sss.pgh.pa.us
In reply to: Josh Berkus (#21)
Re: Re: [HACKERS] Counting bool flags in a complex query

"Josh Berkus" <josh@agliodbs.com> writes:

Just FYI, I have committed code for 7.1 that allows ORDER BY to work
correctly for a UNION'd query. A limitation is that you can only do
ordering on columns that are outputs of the UNION:

As far as I know, that limitation is standard to all SQL
that supports UNION; the relational calculus (I'm told) is
impossible otherwise.

It's not very reasonable to imagine ordering on arbitrary expressions;
how would you interpret the expression in each sub-SELECT? But it's
reasonable to imagine ordering on expressions that use only the
output columns of the UNION-type query:

SELECT q1, q2 FROM tbl1 UNION SELECT ...
ORDER BY q1+q2;

However, I didn't try to implement this yet.

So ... we keep hearing about all the fantastic fixes in 7.1.
When will a stable build show up? :-)

How stable is stable? I'd say it's plenty stable enough for beta
testing now, even though we're not putting out formal beta releases
quite yet. You could grab a nightly snapshot off the FTP server
if you want to try it. (Beware that you will most likely have to
do another initdb before beta, so loading lots and lots of data
into a snapshot installation is probably a waste of time.)

regards, tom lane