is it a known issue or just a bug?

Started by Hans-Jürgen Schönigover 21 years ago6 messages
#1Hans-Jürgen Schönig
postgres@cybertec.at

Folks,

Last week one of my students confronted me with a nice little SQL
statement which made me call gdb ...

Consider the following scenario:

[hs@fedora bug]$ cat q1.sql
create temporary sequence seq_ab;

select * from (Select nextval('seq_ab') as nv,
* from ( select
t_product.id,t_text.value,t_price.price
from t_product,t_price,t_text
where t_product.id = t_price.product_id
and t_product.name = t_text.id
and t_text.lang='de'
and t_price.typ = 'default'
order by price desc ) as t ) as u
-- WHERE nv <= 1
;
[hs@fedora bug]$ psql test < q1.sql
CREATE SEQUENCE
nv | id | value | price
----+----+---------+-------
1 | 3 | Banane | 12
2 | 1 | T-Shirt | 10
3 | 2 | Apfel | 7
(3 rows)

this query returns the right result.
however, when uncommenting the WHERE clause things look different:

[hs@fedora bug]$ cat q2.sql
create temporary sequence seq_ab;

select * from (Select nextval('seq_ab') as nv,
* from ( select
t_product.id,t_text.value,t_price.price
from t_product,t_price,t_text
where t_product.id = t_price.product_id
and t_product.name = t_text.id
and t_text.lang='de'
and t_price.typ = 'default'
order by price desc ) as t ) as u
WHERE nv <= 1
;
[hs@fedora bug]$ psql test < q2.sql
CREATE SEQUENCE
nv | id | value | price
----+----+---------+-------
4 | 1 | T-Shirt | 10
(1 row)

Obviously nv = 4 is wrong ...
Looking at the execution plan of the second query the problem seems
quite obvious:

QUERY PLAN
--------------------------------------------------------------------------------------------------------
Subquery Scan t (cost=69.24..69.26 rows=1 width=68)
-> Sort (cost=69.24..69.25 rows=1 width=68)
Sort Key: t_price.price
-> Hash Join (cost=22.51..69.23 rows=1 width=68)
Hash Cond: ("outer".name = "inner".id)
Join Filter: (nextval('seq_ab'::text) <= 1)
-> Nested Loop (cost=0.00..46.68 rows=5 width=40)
-> Seq Scan on t_price (cost=0.00..22.50 rows=5
width=36)
Filter: (typ = 'default'::text)
-> Index Scan using t_product_pkey on t_product
(cost=0.00..4.82 rows=1 width=8)
Index Cond: (t_product.id = "outer".product_id)
-> Hash (cost=22.50..22.50 rows=5 width=36)
-> Seq Scan on t_text (cost=0.00..22.50 rows=5
width=36)
Filter: (lang = 'de'::text)
(14 rows)

nextval() is called again when processing the WHERE clause.
this was fine if nextval() would return the same thing again and again
(which is not the job of nextval).
if the planner materialized the subquery things would materialize the
subquery in case of unstable functions things would work in this case.

I know I temp table would easily fix this query and it is certainly not
the best query I have ever seen but still it seems like a bug and I just
wanted to know whether it is a know issue or not.
Looking at the code I did not quite know whether this is something which
should / can be fixed or not.

here is the data:
------------------------------------------------------
CREATE TABLE t_text (
id int4,
lang text,
value text
);

CREATE TABLE t_group (
id int4,
name int4, -- mehrsprachig in t_text
valid boolean
);

INSERT INTO t_group VALUES (1, 1, 't');
INSERT INTO t_text VALUES (1, 'de', 'Obst');
INSERT INTO t_text VALUES (1, 'en', 'Fruits');

INSERT INTO t_group VALUES (2, 2, 't');
INSERT INTO t_text VALUES (2, 'de', 'Kleidung');
INSERT INTO t_text VALUES (2, 'en', 'Clothes');

CREATE UNIQUE INDEX idx_group_id ON t_group (id);

CREATE TABLE t_product (
id int4,
name int4, -- mehrsprachig in t_text
active boolean,
PRIMARY KEY (id)
);

INSERT INTO t_product VALUES (1, 3, 't');
INSERT INTO t_text VALUES (3, 'de', 'T-Shirt');
INSERT INTO t_text VALUES (3, 'en', 'T-Shirt');

INSERT INTO t_product VALUES (2, 4, 't');
INSERT INTO t_text VALUES (4, 'de', 'Apfel');
INSERT INTO t_text VALUES (4, 'en', 'Apple');

INSERT INTO t_product VALUES (3, 5, 't');
INSERT INTO t_text VALUES (5, 'de', 'Banane');
INSERT INTO t_text VALUES (5, 'en', 'Banana');

CREATE TABLE t_product_group (
product_id int4 REFERENCES t_product(id)
ON UPDATE CASCADE
ON DELETE CASCADE,
group_id int4 REFERENCES t_group(id)
ON UPDATE CASCADE
ON DELETE CASCADE
);

INSERT INTO t_product_group VALUES (2, 1);
INSERT INTO t_product_group VALUES (3, 1);
INSERT INTO t_product_group VALUES (1, 2);

CREATE TABLE t_price (
id int4,
typ text,
price numeric,
product_id int4 REFERENCES t_product(id)
ON UPDATE CASCADE
ON DELETE CASCADE,
PRIMARY KEY (id)
);

INSERT INTO t_price VALUES (1, 'default', '10', 1);
INSERT INTO t_price VALUES (2, 'sonder', '20', 1);
INSERT INTO t_price VALUES (3, 'spezial', '30', 1);
INSERT INTO t_price VALUES (4, 'default', '7', 2);
INSERT INTO t_price VALUES (5, 'default', '12', 3);

Best regards,

Hans

--
Cybertec Geschwinde u Schoenig
Schoengrabern 134, A-2020 Hollabrunn, Austria
Tel: +43/720/10 1234567 or +43/660/816 40 77
www.cybertec.at, www.postgresql.at, kernel.cybertec.at

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Hans-Jürgen Schönig (#1)
Re: is it a known issue or just a bug?

=?ISO-8859-1?Q?Hans-J=FCrgen_Sch=F6nig?= <postgres@cybertec.at> writes:

Consider the following scenario:

select * from (Select nextval('seq_ab') as nv,
* from ( select
t_product.id,t_text.value,t_price.price
from t_product,t_price,t_text
where t_product.id = t_price.product_id
and t_product.name = t_text.id
and t_text.lang='de'
and t_price.typ = 'default'
order by price desc ) as t ) as u
WHERE nv <= 1
;

I don't think there's any very clean way to fix this sort of problem in
general. We could make this particular example work if

(1) we prevented a subquery containing volatile functions in its
targetlist from being flattened into the parent query, and

(2) we prevented outer WHERE clauses from being pushed down into a
subquery when they reference subquery outputs containing volatile
functions.

There has been some recent discussion about doing (1) but I think we
forgot about the necessity to also do (2); otherwise you'd end up with

select * from (Select nextval('seq_ab') as nv,
...
WHERE nextval('seq_ab') <= 1
) as u
;

which is hardly any better.

Now those things are both doable but where it really falls down is when
you join the subselect to some other table. Short of materializing the
subselect there'd be no way to guarantee single evaluation of any one
row in the subselect.

I'd be willing to do (1) and (2) but not to force materialization; the
performance hit for that just seems unacceptable.

regards, tom lane

#3Josh Berkus
josh@agliodbs.com
In reply to: Tom Lane (#2)
Re: is it a known issue or just a bug?

Tom,

I don't think there's any very clean way to fix this sort of problem in
general.  We could make this particular example work if

Frankly, I don't think there *is* any safe way to use volatile functions in
subqueries -- I certainly avoid it, except now() and random() which as
discussed are special cases. Perhaps a WARNING is in order?

--
Josh Berkus
Aglio Database Solutions
San Francisco

#4Hans-Jürgen Schönig
postgres@cybertec.at
In reply to: Josh Berkus (#3)
Re: is it a known issue or just a bug?

Josh Berkus wrote:

Tom,

I don't think there's any very clean way to fix this sort of problem in
general. We could make this particular example work if

Frankly, I don't think there *is* any safe way to use volatile functions in
subqueries -- I certainly avoid it, except now() and random() which as
discussed are special cases. Perhaps a WARNING is in order?

Personally I like Josh's idea. A warning would be a nice thing.

I have just looked at Oracle and SQL Server to find out what those
systems are doing ...

Oracle has a very interesting concept *g*:

ORA-02287 sequence number not allowed here:
The usage of a sequence is limited and it can be used only in few areas
of PL/SQL and SQL coding.

The following are the cases where you can't use a sequence:
For a SELECT Statement:
1. In a WHERE clause
2. In a GROUP BY or ORDER BY clause
3. In a DISTINCT clause
4. Along with a UNION or INTERSECT or MINUS
5. In a sub-query

--------------------------

SQL Server does not support sequences the way we know it so it is hard
to compare.

I did not have time to test DB2 yet.

Thanks a lot,

Hans

--
Cybertec Geschwinde u Schoenig
Schoengrabern 134, A-2020 Hollabrunn, Austria
Tel: +43/720/10 1234567 or +43/660/816 40 77
www.cybertec.at, www.postgresql.at, kernel.cybertec.at

#5Tom Lane
tgl@sss.pgh.pa.us
In reply to: Hans-Jürgen Schönig (#4)
Re: is it a known issue or just a bug?

=?UTF-8?B?SGFucy1Kw7xyZ2VuIFNjaMO2bmln?= <postgres@cybertec.at> writes:

Josh Berkus wrote:

Frankly, I don't think there *is* any safe way to use volatile functions in
subqueries -- I certainly avoid it, except now() and random() which as
discussed are special cases. Perhaps a WARNING is in order?

Personally I like Josh's idea. A warning would be a nice thing.

From the planner's perspective, it would have to warn about any volatile
function, which would probably be overly chatty --- remember that the
default marking for user-defined functions is "volatile".

This default may also be a good reason not to put in the anti-flattening
defenses I suggested before, because it would mean that even slight
sloppiness in the definition of a user function could cripple subquery
optimization. I'm not sure that that's a strong argument, but it's
something to think about.

It'd be easy enough to put in the anti-flattening defenses (checks (1)
and (2) in my prior message) but I've got mixed emotions about whether
this is really a good thing to do. Any opinions out there?

regards, tom lane

#6Josh Berkus
josh@agliodbs.com
In reply to: Tom Lane (#5)
Re: is it a known issue or just a bug?

Tom,

It'd be easy enough to put in the anti-flattening defenses (checks (1)
and (2) in my prior message) but I've got mixed emotions about whether
this is really a good thing to do. Any opinions out there?

If my opinion wasn't clear, I was suggesting adding a WARNING and not doing
anything about flattening. I can't say that, in 5 years of developing
applications in Postgres, that this has ever been a problem for me personally
-- from my perspective the persons reporting the issue needs to re-code their
query, it's not what sequences were meant for.

--
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco