Oddity with NOT IN

Started by Jim Nasby over 9 years ago, 10 messages
#1 Jim Nasby
Jim.Nasby@BlueTreble.com

I've got a customer that discovered something odd...

SELECT f1 FROM v1 WHERE f2 not in (SELECT bad FROM v2 WHERE f3 = 1);

does not error, even though bad doesn't exist, but

SELECT bad FROM v2 WHERE f3 = 1;
gives

ERROR: column "bad" does not exist

Is that expected?

This is on 9.4.8, and both v1 and v2 are views. The only "odd" thing
that I see is that v1 is a UNION ALL and v2 is a UNION. I don't think
there are any tables in common between the two views.
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532) mobile: 512-569-9461

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#2 Marko Tiikkaja
marko@joh.to
In reply to: Jim Nasby (#1)
Re: Oddity with NOT IN

On 2016-08-04 11:23 PM, Jim Nasby wrote:

> I've got a customer that discovered something odd...
>
> SELECT f1 FROM v1 WHERE f2 not in (SELECT bad FROM v2 WHERE f3 = 1);
>
> does not error, even though bad doesn't exist, but

I'm guessing there's a v1.bad?

This is a common mistake, and also why I recommend always table
qualifying column references when there's more than one table in scope.
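
A minimal sketch of the behaviour described above, using Python's built-in sqlite3 module as a stand-in for PostgreSQL (an assumption made only so the example runs anywhere; both resolve an unqualified name in a subquery against the outer query when the inner FROM list has no such column). The table layouts and sample rows are made up for illustration; only the names v1, v2, f1, f2, f3 and bad come from the thread.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE v1 (f1 INTEGER, f2 INTEGER, bad INTEGER);  -- outer relation HAS a column "bad"
    CREATE TABLE v2 (f3 INTEGER, good INTEGER);             -- inner relation does NOT
    INSERT INTO v1 VALUES (1, 10, 10), (2, 20, 99);
    INSERT INTO v2 VALUES (1, 10);
""")

# Unqualified: "bad" is not found in v2, so it silently resolves to v1.bad.
# The query is valid and compares each row's f2 against that same row's bad.
unqualified = conn.execute(
    "SELECT f1 FROM v1 WHERE f2 NOT IN (SELECT bad FROM v2 WHERE f3 = 1)"
).fetchall()

# Qualified: v2.bad can only mean a column of v2, so the mistake is reported.
try:
    conn.execute(
        "SELECT f1 FROM v1 WHERE f2 NOT IN (SELECT v2.bad FROM v2 WHERE f3 = 1)"
    )
    qualified_error = None
except sqlite3.OperationalError as exc:
    qualified_error = str(exc)

print(unqualified)      # rows where f2 differs from v1.bad
print(qualified_error)  # the "no such column" style error
```

With table qualification the typo surfaces as an error at parse time instead of quietly changing what the query means.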

.m

#3 Jim Nasby
Jim.Nasby@BlueTreble.com
In reply to: Marko Tiikkaja (#2)
Re: Oddity with NOT IN

On 8/4/16 4:53 PM, Marko Tiikkaja wrote:

> On 2016-08-04 11:23 PM, Jim Nasby wrote:
>
>> I've got a customer that discovered something odd...
>>
>> SELECT f1 FROM v1 WHERE f2 not in (SELECT bad FROM v2 WHERE f3 = 1);
>>
>> does not error, even though bad doesn't exist, but
>
> I'm guessing there's a v1.bad?
>
> This is a common mistake, and also why I recommend always table
> qualifying column references when there's more than one table in scope.

Well now I feel dumb...

It would be very useful if we had some way to warn users about stuff
like this. Emitting a NOTICE comes to mind.

#4 Pavel Stehule
pavel.stehule@gmail.com
In reply to: Jim Nasby (#3)
Re: Oddity with NOT IN

2016-08-06 18:53 GMT+02:00 Jim Nasby <Jim.Nasby@bluetreble.com>:

> On 8/4/16 4:53 PM, Marko Tiikkaja wrote:
>
>> On 2016-08-04 11:23 PM, Jim Nasby wrote:
>>
>>> I've got a customer that discovered something odd...
>>>
>>> SELECT f1 FROM v1 WHERE f2 not in (SELECT bad FROM v2 WHERE f3 = 1);
>>>
>>> does not error, even though bad doesn't exist, but
>>
>> I'm guessing there's a v1.bad?
>>
>> This is a common mistake, and also why I recommend always table
>> qualifying column references when there's more than one table in scope.
>
> Well now I feel dumb...
>
> It would be very useful if we had some way to warn users about stuff like
> this. Emitting a NOTICE comes to mind.

This can be a valid query.

Regards

Pavel

#5 Andrew Gierth
andrew@tao11.riddles.org.uk
In reply to: Pavel Stehule (#4)
Re: Oddity with NOT IN

"Pavel" == Pavel Stehule <pavel.stehule@gmail.com> writes:

>> Well now I feel dumb...
>>
>> It would be very useful if we had some way to warn users about stuff
>> like this. Emitting a NOTICE comes to mind.

Pavel> This can be a valid query

It can be, but it essentially never is. The cases where you genuinely
want a correlated IN query are rare, but even then there would be
something in the targetlist that referenced the inner query.

The easy to catch case, I think, is when the targetlist of the IN or NOT
IN subquery contains vars of the outer query level but no vars of the
inner one and no volatile functions. This can be checked for with a
handful of lines in the parser or a couple of dozen lines in a plugin
module (though one would have to invent an error code, none of the
existing WARNING sqlstates would do).
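
The rule described above can be sketched as a toy model. This is not PostgreSQL parser code: the Var class, its levelsup field (loosely modelled on the varlevelsup field of PostgreSQL's Var node) and the function name are all hypothetical, and the caller is assumed to have already extracted the column references from the subquery's targetlist.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Var:
    """A resolved column reference (hypothetical, for illustration only)."""
    name: str
    levelsup: int  # 0 = column of the subquery itself, >0 = column of an enclosing query


def looks_like_mistake(targetlist_vars: List[Var], has_volatile_function: bool) -> bool:
    """Flag an IN / NOT IN subquery whose targetlist uses only outer columns."""
    has_outer = any(v.levelsup > 0 for v in targetlist_vars)
    has_inner = any(v.levelsup == 0 for v in targetlist_vars)
    return has_outer and not has_inner and not has_volatile_function


# (SELECT bad FROM v2 ...) where "bad" resolved to v1.bad: flagged.
typo_case = looks_like_mistake([Var("bad", 1)], has_volatile_function=False)

# (SELECT v2.f3 + v1.f2 FROM v2 ...): a genuine correlated subquery, not flagged.
correlated_case = looks_like_mistake([Var("f3", 0), Var("f2", 1)], has_volatile_function=False)

print(typo_case, correlated_case)
```

A targetlist with no column references at all (for example a constant) is also left alone, since there is no outer var to complain about.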

Maybe David Fetter's suggested module for catching missing WHERE clauses
could be expanded into a more general SQL-'Lint' module?

--
Andrew (irc:RhodiumToad)

#6 Jim Nasby
Jim.Nasby@BlueTreble.com
In reply to: Pavel Stehule (#4)
Re: Oddity with NOT IN

On 8/6/16 12:03 PM, Pavel Stehule wrote:

>> It would be very useful if we had some way to warn users about stuff
>> like this. Emitting a NOTICE comes to mind.
>
> This can be a valid query

Right, but in my experience it's an extremely uncommon pattern and much
more likely to be a mistake (that ends up being very time consuming to
debug). That's why I think something like a NOTICE or even a WARNING
would be useful. The only thing I don't like about that idea is if you
ever did actually want this behavior you'd have to do something to
squash the ereport. Though, that's a problem we already have in some
places, so perhaps not worth worrying about.

#7 Pavel Stehule
pavel.stehule@gmail.com
In reply to: Jim Nasby (#6)
Re: Oddity with NOT IN

2016-08-06 20:01 GMT+02:00 Jim Nasby <Jim.Nasby@bluetreble.com>:

> On 8/6/16 12:03 PM, Pavel Stehule wrote:
>
>>> It would be very useful if we had some way to warn users about stuff
>>> like this. Emitting a NOTICE comes to mind.
>>
>> This can be a valid query
>
> Right, but in my experience it's an extremely uncommon pattern and much
> more likely to be a mistake (that ends up being very time consuming to
> debug). That's why I think something like a NOTICE or even a WARNING would
> be useful. The only thing I don't like about that idea is if you ever did
> actually want this behavior you'd have to do something to squash the
> ereport. Though, that's a problem we already have in some places, so
> perhaps not worth worrying about.

I worked for a company that generated sets of SQL queries by transforming
a multidimensional query language. Similar queries are possible there.

I don't think using NOTICE or WARNING for any valid query is a good idea.

I like the idea of a special extension that can block or raise a warning
for strange plans like this one, or ones with a Cartesian product ...

Regards

Pavel

#8 Andrew Gierth
andrew@tao11.riddles.org.uk
In reply to: Andrew Gierth (#5)
Re: Oddity with NOT IN

"Andrew" == Andrew Gierth <andrew@tao11.riddles.org.uk> writes:

Andrew> The easy to catch case, I think, is when the targetlist of the
Andrew> IN or NOT IN subquery contains vars of the outer query level
Andrew> but no vars of the inner one and no volatile functions. This
Andrew> can be checked for with a handful of lines in the parser or a
Andrew> couple of dozen lines in a plugin module (though one would have
Andrew> to invent an error code, none of the existing WARNING sqlstates
Andrew> would do).

Actually thinking about this, if you did it in a module, you'd probably
want to make it an ERROR not a WARNING, because you'd want to actually
stop queries like

delete from t1 where x in (select x from table_with_no_column_x);

rather than let them run.
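
To illustrate why letting such a statement run is dangerous, here is a small sketch using Python's built-in sqlite3 module as a stand-in for PostgreSQL (an assumption for portability; the name resolution involved is the same). The column y and the sample rows are invented for the demonstration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (x INTEGER);
    CREATE TABLE table_with_no_column_x (y INTEGER);
    INSERT INTO t1 VALUES (1), (2), (3);
    INSERT INTO table_with_no_column_x VALUES (42);
""")

# "x" in the subquery resolves to t1.x, so the condition is effectively
# "x IN (x)" for every row, as long as the other table is not empty.
cur = conn.execute(
    "DELETE FROM t1 WHERE x IN (SELECT x FROM table_with_no_column_x)"
)
deleted = cur.rowcount
remaining = conn.execute("SELECT count(*) FROM t1").fetchone()[0]

print(deleted, remaining)
```

Every row of t1 with a non-NULL x is removed, even though no value in the other table matches anything in t1. If the other table were empty, the same statement would delete nothing, which makes the bug easy to miss in testing.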

--
Andrew (irc:RhodiumToad)

#9 Jim Nasby
Jim.Nasby@BlueTreble.com
In reply to: Andrew Gierth (#5)
Re: Oddity with NOT IN

On 8/6/16 12:57 PM, Andrew Gierth wrote:

> The easy to catch case, I think, is when the targetlist of the IN or NOT
> IN subquery contains vars of the outer query level but no vars of the
> inner one and no volatile functions. This can be checked for with a
> handful of lines in the parser or a couple of dozen lines in a plugin
> module (though one would have to invent an error code, none of the
> existing WARNING sqlstates would do).

I would still like to warn when any outer vars show up in the target list
(other than as function params), because it's still very likely to be a
bug. But I agree that what you describe is even more certain to be one.

> Maybe David Fetter's suggested module for catching missing WHERE clauses
> could be expanded into a more general SQL-'Lint' module?

Possibly, though I hadn't really considered treating this condition as
an error.

Also, there are some other common gotchas that we could better warn users
about, some of which involve DDL. One example is accidentally defining
duplicate indexes.
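
As a rough illustration of what a duplicate-index check could look like, the sketch below groups indexes by their column list. It uses SQLite's PRAGMA index_list and PRAGMA index_info through Python's sqlite3 module purely as a runnable stand-in; a PostgreSQL version would consult the system catalogs instead. The table and index names are hypothetical, and the check deliberately ignores uniqueness, partial-index predicates, sort order and collation.

```python
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (a INTEGER, b INTEGER);
    CREATE INDEX t_a_idx  ON t (a);
    CREATE INDEX t_a_idx2 ON t (a);     -- accidental duplicate of t_a_idx
    CREATE INDEX t_ab_idx ON t (a, b);  -- different column list, not a duplicate
""")

# Group index names by the ordered tuple of columns they cover.
by_columns = defaultdict(list)
for row in conn.execute("PRAGMA index_list('t')").fetchall():
    index_name = row[1]  # columns: seq, name, unique, origin, partial
    columns = tuple(
        col[2]  # columns: seqno, cid, name
        for col in conn.execute(f"PRAGMA index_info('{index_name}')").fetchall()
    )
    by_columns[columns].append(index_name)

# Any column list covered by more than one index is a candidate duplicate.
duplicates = {
    cols: sorted(names) for cols, names in by_columns.items() if len(names) > 1
}
print(duplicates)
```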

#10 Corey Huinker
corey.huinker@gmail.com
In reply to: Jim Nasby (#9)
Re: Oddity with NOT IN

On Sat, Aug 6, 2016 at 2:13 PM, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:

> On 8/6/16 12:57 PM, Andrew Gierth wrote:
>
>> The easy to catch case, I think, is when the targetlist of the IN or NOT
>> IN subquery contains vars of the outer query level but no vars of the
>> inner one and no volatile functions. This can be checked for with a
>> handful of lines in the parser or a couple of dozen lines in a plugin
>> module (though one would have to invent an error code, none of the
>> existing WARNING sqlstates would do).
>
> I would still like to warn when any outer vars show up in the target list
> (other than as function params), because it's still very likely to be a
> bug. But I agree that what you describe is even more certain to be one.
>
>> Maybe David Fetter's suggested module for catching missing WHERE clauses
>> could be expanded into a more general SQL-'Lint' module?
>
> Possibly, though I hadn't really considered treating this condition as an
> error.
>
> Also, there are some other common gotchas that we could better warn users
> about, some of which involve DDL. One example is accidentally defining
> duplicate indexes.

If we are contemplating a setting wherein we issue debug/notice/warning
messages for potentially erroneous SQL, I would suggest a simple test would
be any reference to a column without a corresponding table/alias.

This is fine:

SELECT a.x, b.y FROM table_that_has_x a JOIN table_that_has_y b ON a.id = b.foreign_id

This gives the notice/warning:

SELECT x, b.y FROM table_that_has_x a JOIN table_that_has_y b ON a.id = b.foreign_id

We'd have to suppress the warning in cases where no tables are mentioned
(no table to alias, i.e. "SELECT 'a_constant' as config_var"), and I could
see a reason for suppressing it where only one table is mentioned, though I
often urge table aliasing and full references because it makes it easier
when you modify the query to add another table.

Some setting name suggestions:

notify_vague_column_reference = (on,off)
pedantic_column_identifiers = (off,debug,notice,warn,error)