ERROR: cannot handle unplanned sub-select

Started by Merlin Moncureover 15 years ago6 messages
#1Merlin Moncure
mmoncure@gmail.com

Not 100% sure I have a bug, but I've never seen this before so I
though it was worth a post. Confirmed on 8.4.4 and 9.0 beta. I have
a small self contained test case that I can send off line or on the
list following some code obfuscation (it's fairly complex to set up).

merlin

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Merlin Moncure (#1)
Re: ERROR: cannot handle unplanned sub-select

Merlin Moncure <mmoncure@gmail.com> writes:

Not 100% sure I have a bug, but I've never seen this before so I
though it was worth a post. Confirmed on 8.4.4 and 9.0 beta.

So let's see the test case ...

regards, tom lane

#3Tom Lane
tgl@sss.pgh.pa.us
In reply to: Merlin Moncure (#1)
pg_dump and join aliases (was Re: [BUGS] ERROR: cannot handle unplanned sub-select)

Merlin sent me a test case off-list for the problem mentioned here:
http://archives.postgresql.org/pgsql-bugs/2010-07/msg00025.php

After some investigation I was able to simplify it to the following
example using the regression database:

select
(select sq1) as qq1
from
(select exists(select 1 from int4_tbl where f1 = q2) as sq1, 42 as dummy
from int8_tbl) sq0
join
int4_tbl i4 on dummy = i4.f1;

The problem is that flatten_join_alias_vars() can push SubLink
expressions down into sub-selects, as in this example when it replaces
the "sq1" reference with the EXISTS() subexpression that was previously
pulled up by flattening sq0. But it fails to set the hasSubLinks flag
in the sub-Query, so subsequent processing doesn't think it needs to do
SS_process_sublinks within the sub-Query, and eventually we fail when we
come across the unprocessed SubLink. This bug goes clear back to 7.4.
Fortunately it's simple to fix.

What seems more interesting is that I initially had a hard time
reproducing the bug under different conditions, and didn't figure out
what was going on until I realized that I had used pg_dump to
consolidate the multiple files Merlin sent ... and *reloading pg_dump's
version of the views didn't exhibit the bug*. This is because pg_dump,
or more accurately ruleutils.c, has a habit of qualifying variable
references whether or not they were qualified in the original query.
If you turn the above example into a view and then dump it, you'll get

... (select sq0.sq1) as qq1 ...

and that doesn't tickle this bug. (That's because "sq0.sq1" isn't a
join alias Var, whereas unqualified "sq1" is.)

So the question that seems worth discussing is whether this difference
ought to be considered a bug in ruleutils. In theory it shouldn't
matter if pg_dump adds an "unnecessary" qualification, but here's an
example where it did matter. Do we care? People tend to assume that
dumping and reloading will make no change in the behavior of their
views, so this seems kind of scary to me. On the other hand, the
"extra" qualifications make view definitions a bit more robust in the
face of column additions, renamings, etc. So there's certainly a case
to be made that the dump behavior is preferable as-is.

Thoughts?

regards, tom lane

#4Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#3)
Re: pg_dump and join aliases (was Re: [BUGS] ERROR: cannot handle unplanned sub-select)

On Wed, Jul 7, 2010 at 6:55 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

After some investigation I was able to simplify it to the following
example using the regression database:

select
 (select sq1) as qq1
from
 (select exists(select 1 from int4_tbl where f1 = q2) as sq1, 42 as dummy
  from int8_tbl) sq0
 join
 int4_tbl i4 on dummy = i4.f1;

[discussion of bug]

What seems more interesting is that I initially had a hard time
reproducing the bug under different conditions, and didn't figure out
what was going on until I realized that I had used pg_dump to
consolidate the multiple files Merlin sent ... and *reloading pg_dump's
version of the views didn't exhibit the bug*.  This is because pg_dump,
or more accurately ruleutils.c, has a habit of qualifying variable
references whether or not they were qualified in the original query.
If you turn the above example into a view and then dump it, you'll get

       ... (select sq0.sq1) as qq1 ...

and that doesn't tickle this bug.  (That's because "sq0.sq1" isn't a
join alias Var, whereas unqualified "sq1" is.)

I'm lost. What's a join alias var?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

#5Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#4)
Re: pg_dump and join aliases (was Re: [BUGS] ERROR: cannot handle unplanned sub-select)

Robert Haas <robertmhaas@gmail.com> writes:

I'm lost. What's a join alias var?

Suppose we have t1 with columns a,b,c and t2 with columns d,e,f, then
consider

select a from t1 join t2 on (whatever)
select t1.a from t1 join t2 on (whatever)

In the first case the parser generates a Var that references a column of
the unnamed join's RTE; in the second case you get a Var that references
t1 directly. These particular cases are semantically equivalent, but
there are lots of other cases where it's important to draw the
distinction. One interesting example is

select x from (t1 join t2 on (whatever)) as j(x,y,z,xx,yy,zz)

where per SQL spec it'd actually be illegal to write a (or t1.a) because
the named join hides its components. But I think what forced us to have
different representations is FULL JOIN USING. If you have

select id from taba full join tabb using (id)

then taba.id and tabb.id and the join's output variable id are all
semantically different and *must* be given different representations at
the Var level.

Anyway, the way it works is that the parser generates "alias Vars" that
refer to the join RTE, mainly because this makes life simpler for
ruleutils. But the planner prefers to work with the "real" underlying
columns whenever those are semantically equivalent, so it has a pass
that does the replacement, and that's what's broken ...

regards, tom lane

#6Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#5)
Re: pg_dump and join aliases (was Re: [BUGS] ERROR: cannot handle unplanned sub-select)

On Wed, Jul 7, 2010 at 7:43 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Anyway, the way it works is that the parser generates "alias Vars" that
refer to the join RTE, mainly because this makes life simpler for
ruleutils.  But the planner prefers to work with the "real" underlying
columns whenever those are semantically equivalent, so it has a pass
that does the replacement, and that's what's broken ...

Well, +1 from me for leaving the ruleutils as-is. I don't think we
should go out of our way to generate join alias vars just on the off
chance that there's a bug in the translation from join alias vars to
plain ol' vars, and I agree with your statement upthread that
qualification makes things more robust.

I like robust.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company