Rules and Views

Started by Curt Sampsonover 23 years ago30 messages
#1Curt Sampson
cjs@cynic.net

I'm having a weird problem on my " PostgreSQL 7.2.1 on i386--netbsdelf,
compiled by GCC 2.95.3" system. Executing these commands:

CREATE TABLE test_one (id int PRIMARY KEY, value_one text);
CREATE TABLE test_two (id int PRIMARY KEY, value_two text);
CREATE VIEW test AS
SELECT test_one.id, value_one, value_two
FROM test_one
JOIN test_two USING (id);
CREATE RULE test_insert AS
ON INSERT TO test
DO (
INSERT INTO test_one (id, value_one) VALUES (NEW.id, NEW.value_one);
INSERT INTO test_two (id, value_two) VALUES (NEW.id, NEW.value_two);
);
INSERT INTO test VALUES (1, 'one', 'onemore');

returns "ERROR: Cannot insert into a view without an appropriate rule"
for that last statement. The rule does show up in pg_rules, though.

What am I doing wrong here? Is there a bug?

cjs
--
Curt Sampson <cjs@cynic.net> +81 90 7737 2974 http://www.netbsd.org
Don't you know, in this new Dark Age, we're all light. --XTC

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Curt Sampson (#1)
Re: Rules and Views

Curt Sampson <cjs@cynic.net> writes:

CREATE VIEW test AS ...
CREATE RULE test_insert AS
ON INSERT TO test
DO ...
INSERT INTO test VALUES (1, 'one', 'onemore');
ERROR: Cannot insert into a view without an appropriate rule

What am I doing wrong here? Is there a bug?

Make that "ON INSERT DO INSTEAD". As coded, the rule leaves the
original insertion into the view still active.

Perhaps the error message could be phrased better --- any thoughts?

regards, tom lane

#3Hannu Krosing
hannu@tm.ee
In reply to: Tom Lane (#2)
Re: Rules and Views

On Wed, 2002-07-31 at 10:22, Tom Lane wrote:

Curt Sampson <cjs@cynic.net> writes:

On Wed, 31 Jul 2002, Tom Lane wrote:

Well, to my mind that's what the error message says now. The reason
it didn't help you was that you *did* have a rule ... but it didn't
completely override the view insertion.

Right, like I said, my model was wrong. I didn't think of the error
message as being an "insert behaviour" that had to be overridden; I
thought of it as a "there is no behaviour right now" message.

Hm. How about

ERROR: Cannot insert into a view
You need an unconditional ON INSERT DO INSTEAD rule

Seems more accurate, but actually you may also have two or more
conditional rules that cover all possibilities if taken together.

Maybe

ERROR: Cannot insert into a view
You need an ON INSERT DO INSTEAD rule that matches your INSERT

Which covers both cases.

-----------------
Hannu

#4Curt Sampson
cjs@cynic.net
In reply to: Tom Lane (#2)
Re: Rules and Views

On Tue, 30 Jul 2002, Tom Lane wrote:

Curt Sampson <cjs@cynic.net> writes:

CREATE VIEW test AS ...
CREATE RULE test_insert AS
ON INSERT TO test
DO ...
INSERT INTO test VALUES (1, 'one', 'onemore');
ERROR: Cannot insert into a view without an appropriate rule

What am I doing wrong here? Is there a bug?

Make that "ON INSERT DO INSTEAD". As coded, the rule leaves the
original insertion into the view still active.

Ah, I see! My model of how this was working was wrong.

Perhaps the error message could be phrased better --- any thoughts?

Maybe a message that says something along the lines of "cannot insert
into views; you need to override this behaviour with a rule"? Also, some
examples in the manual would be helpful.

cjs
--
Curt Sampson <cjs@cynic.net> +81 90 7737 2974 http://www.netbsd.org
Don't you know, in this new Dark Age, we're all light. --XTC

#5Tom Lane
tgl@sss.pgh.pa.us
In reply to: Curt Sampson (#4)
Re: Rules and Views

Curt Sampson <cjs@cynic.net> writes:

ERROR: Cannot insert into a view without an appropriate rule

Perhaps the error message could be phrased better --- any thoughts?

Maybe a message that says something along the lines of "cannot insert
into views; you need to override this behaviour with a rule"?

Well, to my mind that's what the error message says now. The reason
it didn't help you was that you *did* have a rule ... but it didn't
completely override the view insertion.

I'm not sure how to phrase a more useful message. Note that the place
where the error can be detected doesn't have any good way to know that
a non-INSTEAD rule was in fact processed, so we can't say anything quite
as obvious as "You needed to use INSTEAD in your rule, luser". Can we
cover both the no-rule-at-all case and the had-a-rule-but-it-wasn't-
INSTEAD case in a single, reasonably phrased error message? (Just
to make life interesting, there's also the case where you made an
INSTEAD rule but it's conditional.)

Also, some examples in the manual would be helpful.

Aren't there several already? But feel free to contribute more...

regards, tom lane

#6Curt Sampson
cjs@cynic.net
In reply to: Tom Lane (#5)
Re: Rules and Views

On Wed, 31 Jul 2002, Tom Lane wrote:

Well, to my mind that's what the error message says now. The reason
it didn't help you was that you *did* have a rule ... but it didn't
completely override the view insertion.

Right, like I said, my model was wrong. I didn't think of the error
message as being an "insert behaviour" that had to be overridden; I
thought of it as a "there is no behaviour right now" message.

Maybe it's just me not reading the docs all that well; I wouldn't worry
about this if it's not been a problem for others.

Also, some examples in the manual would be helpful.

Aren't there several already? But feel free to contribute more...

Yeah, but nothing showing these rules on a view across two tables.
I'll try to work it out and send it here for comments.

cjs
--
Curt Sampson <cjs@cynic.net> +81 90 7737 2974 http://www.netbsd.org
Don't you know, in this new Dark Age, we're all light. --XTC

#7Tom Lane
tgl@sss.pgh.pa.us
In reply to: Curt Sampson (#6)
Re: Rules and Views

Curt Sampson <cjs@cynic.net> writes:

On Wed, 31 Jul 2002, Tom Lane wrote:

Well, to my mind that's what the error message says now. The reason
it didn't help you was that you *did* have a rule ... but it didn't
completely override the view insertion.

Right, like I said, my model was wrong. I didn't think of the error
message as being an "insert behaviour" that had to be overridden; I
thought of it as a "there is no behaviour right now" message.

Hm. How about

ERROR: Cannot insert into a view
You need an unconditional ON INSERT DO INSTEAD rule

regards, tom lane

#8Curt Sampson
cjs@cynic.net
In reply to: Tom Lane (#7)
Re: Rules and Views

On Wed, 31 Jul 2002, Tom Lane wrote:

ERROR: Cannot insert into a view
You need an unconditional ON INSERT DO INSTEAD rule

Sounds great to me!

cjs
--
Curt Sampson <cjs@cynic.net> +81 90 7737 2974 http://www.netbsd.org
Don't you know, in this new Dark Age, we're all light. --XTC

#9Tom Lane
tgl@sss.pgh.pa.us
In reply to: Hannu Krosing (#3)
Re: Rules and Views

Hannu Krosing <hannu@tm.ee> writes:

On Wed, 2002-07-31 at 10:22, Tom Lane wrote:

Hm. How about

ERROR: Cannot insert into a view
You need an unconditional ON INSERT DO INSTEAD rule

Seems more accurate, but actually you may also have two or more
conditional rules that cover all possibilities if taken together.
Maybe
ERROR: Cannot insert into a view
You need an ON INSERT DO INSTEAD rule that matches your INSERT
Which covers both cases.

Actually not: the system insists that you provide an unconditional
DO INSTEAD rule. The other would require trying to prove (during
rule expansion) a theorem that the conditions of the available
conditional rules cover all possible cases.

Alternatively we could move the test for insertion-into-a-view out of
the rewriter and into a low level of the executor, producing an error
message only if some inserted tuple actually gets past the rule
conditions. I don't much care for that answer because (a) it turns a
once-per-query overhead check into once-per-tuple overhead, and
(b) if you fail to span the full space of possibilities in your rule
conditions, you might not find out about it until your application goes
belly-up in production. There's some version of Murphy's Law that says
rare conditions arise with very low probability during testing, and very
high probability as soon as you go live...

regards, tom lane

#10Zeugswetter Andreas SB SD
ZeugswetterA@spardat.at
In reply to: Tom Lane (#9)
Re: Rules and Views

Seems more accurate, but actually you may also have two or more
conditional rules that cover all possibilities if taken together.
Maybe
ERROR: Cannot insert into a view
You need an ON INSERT DO INSTEAD rule that matches your INSERT
Which covers both cases.

Actually not: the system insists that you provide an unconditional
DO INSTEAD rule. The other would require trying to prove (during
rule expansion) a theorem that the conditions of the available
conditional rules cover all possible cases.

Alternatively we could move the test for insertion-into-a-view out of
the rewriter and into a low level of the executor, producing an error
message only if some inserted tuple actually gets past the rule
conditions. I don't much care for that answer because (a) it turns a
once-per-query overhead check into once-per-tuple overhead, and

Since I see a huge benefit in allowing conditional rules for a view,
I think it is worth finding a solution.

The current rewriter test could still catch the case where no instead rule
exists at all.

The utility is "Table Partitioning by expression".

Basically you have a union view like:
create view history as
select * from history2000 where yearcol=2000
union all
select * from history2001 where yearcol=2001

You get the idea.
Now you need conditional insert and update rules to act on the
correct table.

Maybe we would also need additional intelligence in the planner
to eliminate the history2000 table in a select * from history where
yearcol=2001.

But that is all you need for a really useful feature for large databases.

(b) if you fail to span the full space of possibilities in your rule
conditions, you might not find out about it until your application goes
belly-up in production. There's some version of Murphy's Law that says
rare conditions arise with very low probability during testing, and very
high probability as soon as you go live...

This is true for other db's table partitioning capabilities as well, and they
still implement the feature.

Andreas

#11Tom Lane
tgl@sss.pgh.pa.us
In reply to: Zeugswetter Andreas SB SD (#10)
Re: Rules and Views

"Zeugswetter Andreas SB SD" <ZeugswetterA@spardat.at> writes:

Since I see a huge benefit in allowing conditional rules for a view,
I think it is worth finding a solution.

We do allow conditional rules for a view. You just have to write an
unconditional one too (which can be merely DO INSTEAD NOTHING).

regards, tom lane

#12Zeugswetter Andreas SB SD
ZeugswetterA@spardat.at
In reply to: Tom Lane (#11)
Re: Rules and Views

Since I see a huge benefit in allowing conditional rules for a view,
I think it is worth finding a solution.

We do allow conditional rules for a view. You just have to write an
unconditional one too (which can be merely DO INSTEAD NOTHING).

Hmm, but you cannot then trow an error, but that is prbbly a minor issue.
Good that we can do Table Partitioning :-)

Andreas

#13Curt Sampson
cjs@cynic.net
In reply to: Zeugswetter Andreas SB SD (#10)
Re: Rules and Views

On Wed, 31 Jul 2002, Zeugswetter Andreas SB SD wrote:

The utility is "Table Partitioning by expression".

Basically you have a union view like:
create view history as
select * from history2000 where yearcol=2000
union all
select * from history2001 where yearcol=2001

You want to be careful with this sort of stuff, since the query planner
sometimes won't do the view as efficiently as it would do the fully
specified equivalant query. I've posted about this here before.

cjs
--
Curt Sampson <cjs@cynic.net> +81 90 7737 2974 http://www.netbsd.org
Don't you know, in this new Dark Age, we're all light. --XTC

#14Tom Lane
tgl@sss.pgh.pa.us
In reply to: Curt Sampson (#13)
Re: Rules and Views

Curt Sampson <cjs@cynic.net> writes:

You want to be careful with this sort of stuff, since the query planner
sometimes won't do the view as efficiently as it would do the fully
specified equivalant query. I've posted about this here before.

Please provide an example. AFAIK a view is a query macro, and nothing
else.

regards, tom lane

#15Curt Sampson
cjs@cynic.net
In reply to: Tom Lane (#14)
Re: Rules and Views

On Thu, 1 Aug 2002, Tom Lane wrote:

Curt Sampson <cjs@cynic.net> writes:

You want to be careful with this sort of stuff, since the query planner
sometimes won't do the view as efficiently as it would do the fully
specified equivalant query. I've posted about this here before.

Please provide an example. AFAIK a view is a query macro, and nothing
else.

I already did provide an example, and you even replied to it. :-)
See the appended message.

BTW, this page

http://archives.postgresql.org/pgsql-general/2002-06/threads.php

does not display in Navigator 4.78. Otherwise I would have provided a
reference to the thread in the archive.

Maybe we need a web based form for reporting problem pages in the archives.

cjs
--
Curt Sampson <cjs@cynic.net> +81 90 7737 2974 http://www.netbsd.org
Don't you know, in this new Dark Age, we're all light. --XTC

#16Tom Lane
tgl@sss.pgh.pa.us
In reply to: Curt Sampson (#15)
Re: Rules and Views

Curt Sampson <cjs@cynic.net> writes:

On Thu, 1 Aug 2002, Tom Lane wrote:

Curt Sampson <cjs@cynic.net> writes:

You want to be careful with this sort of stuff, since the query planner
sometimes won't do the view as efficiently as it would do the fully
specified equivalant query. I've posted about this here before.

Please provide an example. AFAIK a view is a query macro, and nothing
else.

I already did provide an example, and you even replied to it. :-)

But that isn't an "equivalent query". You've manually transformed
SELECT * FROM (SELECT something UNION SELECT somethingelse) WHERE foo;
into
(SELECT something WHERE foo) UNION (SELECT somethingelse WHERE foo);
As has been pointed out repeatedly, it's not entirely obvious whether
this is a valid transformation in the general case. (The knee-jerk
reaction that it's obviously right should be held in check, since SQL's
three-valued notion of boolean logic tends to trip up the intuition.)
If you can provide a proof that it's always safe, or that it's safe
under such-and-such conditions, I'll see what I can do about making it
happen.

regards, tom lane

#17Curt Sampson
cjs@cynic.net
In reply to: Tom Lane (#16)
Re: Rules and Views

On Thu, 1 Aug 2002, Tom Lane wrote:

But that isn't an "equivalent query". You've manually transformed
SELECT * FROM (SELECT something UNION SELECT somethingelse) WHERE foo;
into
(SELECT something WHERE foo) UNION (SELECT somethingelse WHERE foo);

Right.

As has been pointed out repeatedly, it's not entirely obvious whether
this is a valid transformation in the general case.

Right. And I agreed that it as soon as you first pointed it out.
And still do.

But the message I was replying to was a similar union query, and I was
thinking that that person might be having a similar initial intuitive
reaction, "well, it looks kinda the same." I just wanted to note that
you need to check this stuff with explain, rather than blindly assuming
you know what's going on.

If you can provide a proof that it's always safe, or that it's safe
under such-and-such conditions, I'll see what I can do about making it
happen.

It's on my list of things to do, but not high enough that it's
likely I'll ever get to it. :-)

BTW, if anybody can think of a way to make a view that really does
represent my original query, I'd appreciate a hint.

cjs
--
Curt Sampson <cjs@cynic.net> +81 90 7737 2974 http://www.netbsd.org
Don't you know, in this new Dark Age, we're all light. --XTC

#18Zeugswetter Andreas SB SD
ZeugswetterA@spardat.at
In reply to: Curt Sampson (#17)
Re: Rules and Views

But the message I was replying to was a similar union query, and I was
thinking that that person might be having a similar initial intuitive
reaction, "well, it looks kinda the same." I just wanted to note that
you need to check this stuff with explain, rather than
blindly assuming
you know what's going on.

I had a "union all" view, which is actually a quite different animal than
a "union" view which needs to eliminate duplicates before further processing.

Andreas

#19Curt Sampson
cjs@cynic.net
In reply to: Zeugswetter Andreas SB SD (#18)
Re: Rules and Views

On Thu, 1 Aug 2002, Zeugswetter Andreas SB SD wrote:

I had a "union all" view, which is actually a quite different animal than
a "union" view which needs to eliminate duplicates before further processing.

I had the same problem with UNION ALL.

cjs
--
Curt Sampson <cjs@cynic.net> +81 90 7737 2974 http://www.netbsd.org
Don't you know, in this new Dark Age, we're all light. --XTC

#20Hannu Krosing
hannu@tm.ee
In reply to: Curt Sampson (#19)
Re: Rules and Views

On Thu, 2002-08-01 at 12:29, Curt Sampson wrote:

On Thu, 1 Aug 2002, Zeugswetter Andreas SB SD wrote:

I had a "union all" view, which is actually a quite different animal than
a "union" view which needs to eliminate duplicates before further processing.

I had the same problem with UNION ALL.

Could someone give an example where it is not safe to push the WHERE
clause down to individual parts of UNION (or UNION ALL) wher these parts
are simple (non-aggregate) queries?

I can see that it has to be made into HAVING in subquery if UNION's
subqueries are aggregate (GROUP BY) queries, but can anyone give an
example where the meaning of the query changes for non-aggregate
subqueries.

---------------
Hannu

#21Stephan Szabo
sszabo@megazone23.bigpanda.com
In reply to: Hannu Krosing (#20)
Re: Rules and Views

On 1 Aug 2002, Hannu Krosing wrote:

On Thu, 2002-08-01 at 12:29, Curt Sampson wrote:

On Thu, 1 Aug 2002, Zeugswetter Andreas SB SD wrote:

I had a "union all" view, which is actually a quite different animal than
a "union" view which needs to eliminate duplicates before further processing.

I had the same problem with UNION ALL.

Could someone give an example where it is not safe to push the WHERE
clause down to individual parts of UNION (or UNION ALL) wher these parts
are simple (non-aggregate) queries?

For union, queries that want to do something like use a temporary
sequence to act sort of like rownum and do row limiting. Admittedly
that's already pretty much unspecified behavior, but it does change
the behavior in the place of duplicate removal. In addition, I think
using bits of the spec we don't completely support you can have the
same issue with the undefined behavior of which duplicate is returned
for values that aren't the same but are equal, for example where the
duplicate removal is in one collation but the outer comparison has
a different explicitly given one.

I haven't come up with any useful examples, and not really any for
union all, however.

#22Stephan Szabo
sszabo@megazone23.bigpanda.com
In reply to: Stephan Szabo (#21)
Re: Rules and Views

On Thu, 1 Aug 2002, Stephan Szabo wrote:

On 1 Aug 2002, Hannu Krosing wrote:

On Thu, 2002-08-01 at 12:29, Curt Sampson wrote:

On Thu, 1 Aug 2002, Zeugswetter Andreas SB SD wrote:

I had a "union all" view, which is actually a quite different animal than
a "union" view which needs to eliminate duplicates before further processing.

I had the same problem with UNION ALL.

Could someone give an example where it is not safe to push the WHERE
clause down to individual parts of UNION (or UNION ALL) wher these parts
are simple (non-aggregate) queries?

For union, queries that want to do something like use a temporary
sequence to act sort of like rownum and do row limiting. Admittedly
that's already pretty much unspecified behavior, but it does change
the behavior in the place of duplicate removal. In addition, I think
using bits of the spec we don't completely support you can have the
same issue with the undefined behavior of which duplicate is returned
for values that aren't the same but are equal, for example where the
duplicate removal is in one collation but the outer comparison has
a different explicitly given one.

Replying to myself, you can do this right now with char columns if you
just push the conditions down blindly, something like:

create table t1(a char(5));
create table t2(a char(6));

insert into t1 values ('aaaaa');
insert into t2 values ('aaaaa');

select * from (select * from t2 union select * from t1) as f where
a::text='aaaaa';
select * from (select * from t2 where a::text='aaaaa' union
select * from t1 where a::text='aaaaa') as f;

The first select gives no rows, the second gives one. We'd have
to transform the second where clause to something like
cast(a as char(6))::text='aaaaa' in order to get the same effect
I think.

#23Tom Lane
tgl@sss.pgh.pa.us
In reply to: Stephan Szabo (#21)
Re: Rules and Views

Stephan Szabo <sszabo@megazone23.bigpanda.com> writes:

For union, queries that want to do something like use a temporary
sequence to act sort of like rownum and do row limiting. Admittedly
that's already pretty much unspecified behavior, but it does change
the behavior in the place of duplicate removal. In addition, I think
using bits of the spec we don't completely support you can have the
same issue with the undefined behavior of which duplicate is returned
for values that aren't the same but are equal, for example where the
duplicate removal is in one collation but the outer comparison has
a different explicitly given one.

Hmm. I think this consideration boils down to whether the WHERE clause
can give different results for rows that appear equal under the rules of
UNION/EXCEPT/INTERSECT. If it gives the same result for any two such
rows, then it's safe to push down; otherwise not.

It's not too difficult to come up with examples. I invite you to play
with

select z,length(z) from
(select 'abc '::char(7) as z intersect
select 'abc '::char(8) as z) ss;

and contemplate the effects of pushing down a qual involving length(z).

Whether this particular case is very important in the real world is hard
to say. But there might be more-important cases out there.

And yet, I think we can do it anyway. The score card looks like this to
me:

UNION ALL: always safe to push down, since the rows will be passed
independently to the outer WHERE anyway.

UNION: it's unspecified which of a set of "equal" rows will be returned,
and therefore the behavior would be unspecified anyway if the outer
WHERE can distinguish the rows - you might get 1 row of the set out or
none. If we push down, then we create a situation where the returned
row will always be one that passes the outer WHERE, but that is a legal
behavior.

INTERSECT: again it's unspecified which of a set of "equal" rows will be
returned, and so you might get 1 row out or none. If we push down then
it's still unspecified whether you get a row out (example: if the outer
WHERE will pass only for rows of the left table and not the right, then
push down will result in no rows of the "equal" set being emitted, but
that's a legal behavior).

INTERSECT ALL: if a set of "equal" rows contains M rows from the left
table and N from the right table, you're supposed to get min(M,N) rows
of the set out of the INTERSECT ALL. Again you can't say which of the
set you will get, so the outer WHERE might let anywhere between 0 and
min(M,N) rows out. With push down, M and N will be reduced by the WHERE
before we do the intersection, so you still have 0 to min(M,N) rows out.
The behavior will change, but it's still legal per spec AFAICT.

EXCEPT, EXCEPT ALL: the same sort of analysis seems to hold.

In short, it looks to me like the spec was carefully designed to allow
push down. Pushing down a condition of this sort *does* change the
behavior, but the new behavior is still within spec.

The above analysis assumes that the WHERE condition is "stable", ie its
results for a row don't depend on the order in which the rows are tested
or anything as weird as that. But we're assuming that already when we
push down a qual in a non-set-operation case, I think.

Comments? Are there any other considerations to worry about?

regards, tom lane

#24Stephan Szabo
sszabo@megazone23.bigpanda.com
In reply to: Tom Lane (#23)
Re: Rules and Views

On Thu, 1 Aug 2002, Tom Lane wrote:

Stephan Szabo <sszabo@megazone23.bigpanda.com> writes:

For union, queries that want to do something like use a temporary
sequence to act sort of like rownum and do row limiting. Admittedly
that's already pretty much unspecified behavior, but it does change
the behavior in the place of duplicate removal. In addition, I think
using bits of the spec we don't completely support you can have the
same issue with the undefined behavior of which duplicate is returned
for values that aren't the same but are equal, for example where the
duplicate removal is in one collation but the outer comparison has
a different explicitly given one.

Hmm. I think this consideration boils down to whether the WHERE clause
can give different results for rows that appear equal under the rules of
UNION/EXCEPT/INTERSECT. If it gives the same result for any two such
rows, then it's safe to push down; otherwise not.

It's not too difficult to come up with examples. I invite you to play
with

select z,length(z) from
(select 'abc '::char(7) as z intersect
select 'abc '::char(8) as z) ss;

and contemplate the effects of pushing down a qual involving length(z).

Whether this particular case is very important in the real world is hard
to say. But there might be more-important cases out there.

And yet, I think we can do it anyway. The score card looks like this to
me:

UNION ALL: always safe to push down, since the rows will be passed
independently to the outer WHERE anyway.

UNION: it's unspecified which of a set of "equal" rows will be returned,
and therefore the behavior would be unspecified anyway if the outer
WHERE can distinguish the rows - you might get 1 row of the set out or
none. If we push down, then we create a situation where the returned
row will always be one that passes the outer WHERE, but that is a legal
behavior.

INTERSECT: again it's unspecified which of a set of "equal" rows will be
returned, and so you might get 1 row out or none. If we push down then
it's still unspecified whether you get a row out (example: if the outer
WHERE will pass only for rows of the left table and not the right, then
push down will result in no rows of the "equal" set being emitted, but
that's a legal behavior).

INTERSECT ALL: if a set of "equal" rows contains M rows from the left
table and N from the right table, you're supposed to get min(M,N) rows
of the set out of the INTERSECT ALL. Again you can't say which of the
set you will get, so the outer WHERE might let anywhere between 0 and
min(M,N) rows out. With push down, M and N will be reduced by the WHERE
before we do the intersection, so you still have 0 to min(M,N) rows out.
The behavior will change, but it's still legal per spec AFAICT.

EXCEPT, EXCEPT ALL: the same sort of analysis seems to hold.

Actually I think in except you may only push down to the left, since in
this case you know that any duplicate from the right will not be
returned (since there must be none). So, you can't potentially drop
a row from the right side that may have been a duplicate of a left
side row that does match the condition.

If we assume two collations one case sensitive one not with the
except in the non-sensitive and the where in the sensitive and
a left with 'A' and right with 'a', it'd be incorrect to push a
case sensitive where foo='A' down to the right since that'd change the
output from zero rows to one.

Something similar for except all since lowering the number of rows
on the right can increase the number of returned rows above
m-n (if say all m dups match the condition and none of n do)

The above analysis assumes that the WHERE condition is "stable", ie its
results for a row don't depend on the order in which the rows are tested
or anything as weird as that. But we're assuming that already when we
push down a qual in a non-set-operation case, I think.

In which case we don't have to worry about the nextval() case.

#25Tom Lane
tgl@sss.pgh.pa.us
In reply to: Stephan Szabo (#24)
Re: Rules and Views

Stephan Szabo <sszabo@megazone23.bigpanda.com> writes:

Actually I think in except you may only push down to the left, since in
this case you know that any duplicate from the right will not be
returned (since there must be none). So, you can't potentially drop
a row from the right side that may have been a duplicate of a left
side row that does match the condition.

But we *want* to push down --- the point is to get some selectivity
into the bottom queries. You're right that in a plain EXCEPT it would
be possible to push only to the left, but that doesn't give the
performance improvement we want.

If we assume two collations one case sensitive one not with the
except in the non-sensitive and the where in the sensitive and
a left with 'A' and right with 'a', it'd be incorrect to push a
case sensitive where foo='A' down to the right since that'd change the
output from zero rows to one.

You missed my point. Per spec, either zero or one rows out of the whole
thing is okay, because either the 'A' or the 'a' row might be returned
as the representative row for the group by the EXCEPT. Yes, the
behavior may change, but it's still within spec.

In which case we don't have to worry about the nextval() case.

Yeah, I think nextval() and random() and so forth can be ignored;
the transformations we already do will confuse the results for such
cases, so one more isn't gonna make it worse.

regards, tom lane

#26Stephan Szabo
sszabo@megazone23.bigpanda.com
In reply to: Tom Lane (#25)
Re: Rules and Views

On Thu, 1 Aug 2002, Tom Lane wrote:

Stephan Szabo <sszabo@megazone23.bigpanda.com> writes:

If we assume two collations one case sensitive one not with the
except in the non-sensitive and the where in the sensitive and
a left with 'A' and right with 'a', it'd be incorrect to push a
case sensitive where foo='A' down to the right since that'd change the
output from zero rows to one.

You missed my point. Per spec, either zero or one rows out of the whole
thing is okay, because either the 'A' or the 'a' row might be returned
as the representative row for the group by the EXCEPT. Yes, the
behavior may change, but it's still within spec.

Except can't return 'A' or 'a', there is no representative row because
n>0. That's the difference with UNION and INTERSECT.

"If EXCEPT is specified, then
Case:
A) If m>0 and n=0, then T contains exactly one duplicate of R.
B) Otherwise, T contains no duplicate of R."

So if T1 has a #dups>0 and T2 has a #dups>0 we should get
no rows, but what if T1' (with the clause) has a #dups>0 but
T2' has a #dups=0?

#27Hannu Krosing
hannu@tm.ee
In reply to: Tom Lane (#23)
Re: Rules and Views

On Thu, 2002-08-01 at 18:02, Tom Lane wrote:

Stephan Szabo <sszabo@megazone23.bigpanda.com> writes:

For union, queries that want to do something like use a temporary
sequence to act sort of like rownum and do row limiting. Admittedly
that's already pretty much unspecified behavior, but it does change
the behavior in the place of duplicate removal. In addition, I think
using bits of the spec we don't completely support you can have the
same issue with the undefined behavior of which duplicate is returned
for values that aren't the same but are equal, for example where the
duplicate removal is in one collation but the outer comparison has
a different explicitly given one.

Hmm. I think this consideration boils down to whether the WHERE clause
can give different results for rows that appear equal under the rules of
UNION/EXCEPT/INTERSECT.

Yes. I originally started to ponder this when trying to draw up a plan
for automatic generation of ON UPDATE DO INSTEAD rules for views. While
pushing down the WHERE clause is just a performance thing for SELECT it
is essential for ON UPDATE rules.

If it gives the same result for any two such
rows, then it's safe to push down; otherwise not.

It's not too difficult to come up with examples. I invite you to play
with

select z,length(z) from
(select 'abc '::char(7) as z intersect
select 'abc '::char(8) as z) ss;

and contemplate the effects of pushing down a qual involving length(z).

I guess the pushdown must also push implicit conversions done to parts
of union.

if that conversion were applied to z's in both parts of UNION then the
result should be the same.

select z,length(z) from
(
select 'abc '::char(7) as z
union
select 'abc '::char(8) as z
) ss where length(z) = 7;

becomes:

select z,length(z) from
(
select 'abc '::char(7) as z
where length(cast('abc '::char(7) as char(7))) = 7
union
select 'abc '::char(8) as z
where length(cast('abc '::char(8) as char(7))) = 7
) ss ;

which both return 'abc ', 7

Of course it is beneficial to detect when the conversion is not needed,
so that indexes will be used if available.

---------------
Hannu

#28Tom Lane
tgl@sss.pgh.pa.us
In reply to: Stephan Szabo (#26)
Re: Rules and Views

Stephan Szabo <sszabo@megazone23.bigpanda.com> writes:

So if T1 has a #dups>0 and T2 has a #dups>0 we should get
no rows, but what if T1' (with the clause) has a #dups>0 but
T2' has a #dups=0?

Um, you're right --- pushing down into the right-hand side would reduce
N, thereby possibly *increasing* the number of output rows not reducing
it. My mistake ... should have worked out the EXCEPT case in more
detail.

This says that we can't push down at all in the EXCEPT ALL case, I
think, and I'm leery about whether we should push for EXCEPT. But
the UNION and INTERSECT cases are probably the important ones anyway.

regards, tom lane

#29Stephan Szabo
sszabo@megazone23.bigpanda.com
In reply to: Tom Lane (#28)
Re: Rules and Views

On Thu, 1 Aug 2002, Tom Lane wrote:

Stephan Szabo <sszabo@megazone23.bigpanda.com> writes:

So if T1 has a #dups>0 and T2 has a #dups>0 we should get
no rows, but what if T1' (with the clause) has a #dups>0 but
T2' has a #dups=0?

Um, you're right --- pushing down into the right-hand side would reduce
N, thereby possibly *increasing* the number of output rows not reducing
it. My mistake ... should have worked out the EXCEPT case in more
detail.

This says that we can't push down at all in the EXCEPT ALL case, I
think, and I'm leery about whether we should push for EXCEPT. But
the UNION and INTERSECT cases are probably the important ones anyway.

I think that we can push to the left in both (should is a separate issue).

If the condition is true for all of the left hand dups, we can
choose to have emitted such rows as the output of the EXCEPT ALL in
the theoretical case so that the output is the same, max(0, m-n) rows.
If the condition is false for any of the left hand dups, we can safely
return any number of rows between 0 and max(0,m-n) rows since we can
say that the difference were rows that failed the where clause. If
we push the condition down, we'll get some number m1 rows that succeed
the condition (with m1<m), so returning max(0, m1-n) should be safe.
If the condition is false for all of the rows, m1=0 so we'll correctly
return no rows.

I think.

#30Zeugswetter Andreas SB SD
ZeugswetterA@spardat.at
In reply to: Stephan Szabo (#29)
Re: Rules and Views

Hmm. I think this consideration boils down to whether the WHERE clause
can give different results for rows that appear equal under the rules of
UNION/EXCEPT/INTERSECT. If it gives the same result for any two such
rows, then it's safe to push down; otherwise not.

It's not too difficult to come up with examples. I invite you to play
with

select z,length(z) from
(select 'abc '::char(7) as z intersect
select 'abc '::char(8) as z) ss;

and contemplate the effects of pushing down a qual involving
length(z).

I guess that is why e.g. Informix returns 3 for both of them. Imho that
makes a lot of sense. The trailing spaces in char's are supposed to be
irrellevant. (But iirc this has already been discussed and rejected)

Whether this particular case is very important in the real world is hard
to say. But there might be more-important cases out there.

And yet, I think we can do it anyway. The score card looks
like this to
me:

UNION ALL: always safe to push down, since the rows will be passed
independently to the outer WHERE anyway.

Yes, that would imho also be the most important optimization.

UNION: it's unspecified which of a set of "equal" rows will be returned,
and therefore the behavior would be unspecified anyway if the outer
WHERE can distinguish the rows - you might get 1 row of the set out or
none. If we push down, then we create a situation where the returned
row will always be one that passes the outer WHERE, but that
is a legal behavior.

INTERSECT: again it's unspecified which of a set of "equal" rows will be
returned, and so you might get 1 row out or none. If we push down then
it's still unspecified whether you get a row out (example: if the outer
WHERE will pass only for rows of the left table and not the right, then
push down will result in no rows of the "equal" set being emitted, but
that's a legal behavior).

INTERSECT ALL: if a set of "equal" rows contains M rows from the left
table and N from the right table, you're supposed to get min(M,N) rows
of the set out of the INTERSECT ALL. Again you can't say which of the
set you will get, so the outer WHERE might let anywhere between 0 and
min(M,N) rows out. With push down, M and N will be reduced by the WHERE
before we do the intersection, so you still have 0 to
min(M,N) rows out.
The behavior will change, but it's still legal per spec AFAICT.

EXCEPT, EXCEPT ALL: the same sort of analysis seems to hold.

The imho difficult question is, which select locks down the datatype to use
for this column. In a strict sense char(6) and char(7) are not the same
type. Since I would certainly not want to be that strict, it imho has to be
decided what type the union/intersect... is supposed to use.
Informix converts them both to the longer char. I do not think it is
valid to return variable length char's.

e.g.:
create table atab1 (a char(6));
create table atab2 (a char(8));
insert into atab1 values ('abc');
insert into atab2 values ('abc');
create view aview as select * from atab1 union all select * from atab2;
select '<'||a||'>' from aview;
Informix:
(expression)
<abc >
<abc >
PostgreSQL:
?column?
------------
<abc >
<abc >

I am not sure eighter answer is strictly correct. I would probably have
expected <abc > <abc > (char(6)) since the first select is supposed to
lock down the type, no ?

In short, it looks to me like the spec was carefully designed to allow
push down. Pushing down a condition of this sort *does* change the
behavior, but the new behavior is still within spec.

I think this would be a great performance boost for views and thus
worth a change in results that are within spec.
Would you want to push down always ? There could be outer where clauses,
that are so expensive that you would not want to do them twice.
If it is all or nothing, I do think pushing down always is better than not.

Andreas