Boolean operators without commutators vs. ALL/ANY
Hi
I've recently wanted to define a check constraint on an array
column that verifies that all array entries match some regular
expression. Unfortunately, t
The most natural way of expressing such a check would be
CHECK ('<regexp>' ~ ANY(field)),
but that doesn't work, because "~" expects the *value*
to be the left argument and the *pattern* to be the right.
The next try was
CHECK (ANY(field) ~ '<regexp>'),
but that doesn't even parse.
Ok, so then use UNNEST() and BOOL_AND() I figured, and wrote
CHECK ((SELECT BOOL_AND(v ~ '<regexp>') FROM UNNEST(field) v)).
But that of course lead to nothing but
ERROR: cannot use subquery in check constraint
So I the end, I had to wrap the sub-query in a SQL-language
function and use that in the check constraint. While this
solved my immediate problem, the necessity of doing that
highlights a few problems
(A) "~" is an extremely bad name for the regexp-matching
operators, since it's visual form is symmetric but it's
behaviour isn't. This doesn't only make its usage very
error-prone, it also makes it very hard to come up with
sensible name for an commutator of "~". I suggest that we
add "=~" as an alias for "~", "~=" as an commutator
for "=~", and deprecate "~". The same holds for "~~".
We might want to do this starting with 9.1.
(B) There should be a way to use ANY()/ALL() with the
array elements becoming the left arguments of the operator.
Ideally, we'd support "ANY(<array>) <operator> <value>",
but if that's not possible grammar-wise, I suggest we extend
the OPERATOR() syntax to allow
<value> OPERATOR(COMMUTATOR <operator>) ANY(<array>).
OPERATOR(COMMUTATOR <operator>) would use the COMMUTATOR
of the specified operator if one exists, and otherwise
use the original operator with the arguments swapped.
(C) Why do we forbid sub-queries in CHECK constraints?
I do realize that any non-IMMUTABLE CHECK constraint is
a foot-gun, but since we already allow STABLE and even
VOLATILE functions to be used inside CHECK constraint,
forbidding sub-queries seems a bit pointless...
best regards,
Florian Pflug
On Sun, Jun 12, 2011 at 7:46 AM, Florian Pflug <fgp@phlo.org> wrote:
So I the end, I had to wrap the sub-query in a SQL-language
function and use that in the check constraint. While this
solved my immediate problem, the necessity of doing that
highlights a few problems(A) "~" is an extremely bad name for the regexp-matching
operators, since it's visual form is symmetric but it's
behaviour isn't. This doesn't only make its usage very
error-prone, it also makes it very hard to come up with
sensible name for an commutator of "~". I suggest that we
add "=~" as an alias for "~", "~=" as an commutator
for "=~", and deprecate "~". The same holds for "~~".
Does any other database or programming language implement it this way?
(B) There should be a way to use ANY()/ALL() with the
array elements becoming the left arguments of the operator.
Ideally, we'd support "ANY(<array>) <operator> <value>",
but if that's not possible grammar-wise, I suggest we extend
the OPERATOR() syntax to allow
<value> OPERATOR(COMMUTATOR <operator>) ANY(<array>).
OPERATOR(COMMUTATOR <operator>) would use the COMMUTATOR
of the specified operator if one exists, and otherwise
use the original operator with the arguments swapped.
It seems to me that if we provided some way of handling this, your
first proposal would be moot; and I have to say I like the idea of
allowing this a lot more than tinkering with the operator names. I'm
not crazy about the proposed syntax, though; it seems cumbersome, and
it's really only needed for SOME/ALL/ANY, not in general operator
expressions. Since ANY is a reserved keyword, I believe we could
allow something like "expr op ANY BACKWARD ( ... )" -- or some other
keyword in lieu of BACKWARD if you prefer.
Hath the spec anything to say about this?
(C) Why do we forbid sub-queries in CHECK constraints?
I do realize that any non-IMMUTABLE CHECK constraint is
a foot-gun, but since we already allow STABLE and even
VOLATILE functions to be used inside CHECK constraint,
forbidding sub-queries seems a bit pointless...
Dunno. Maybe it's just an implementation restriction?
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes:
On Sun, Jun 12, 2011 at 7:46 AM, Florian Pflug <fgp@phlo.org> wrote:
(B) There should be a way to use ANY()/ALL() with the
array elements becoming the left arguments of the operator.
It seems to me that if we provided some way of handling this, your
first proposal would be moot; and I have to say I like the idea of
allowing this a lot more than tinkering with the operator names.
There are syntactic reasons not to do that. It'd be a lot easier just
to provide a commutator operator for ~.
(C) Why do we forbid sub-queries in CHECK constraints?
Dunno. Maybe it's just an implementation restriction?
(1) We don't want to invoke the planner in the places where we'd
have to do so to make that work.
(2) It's just about inevitable that a sub-query would have results
dependent on other rows beside the one being checked. As such, it
would be trying to enforce semantics that you simply can't enforce
via CHECK. (And yes, you can bypass that with a function, but guess
what: it still won't actually work.)
regards, tom lane
On Sun, Jun 12, 2011 at 11:44 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Robert Haas <robertmhaas@gmail.com> writes:
On Sun, Jun 12, 2011 at 7:46 AM, Florian Pflug <fgp@phlo.org> wrote:
(B) There should be a way to use ANY()/ALL() with the
array elements becoming the left arguments of the operator.It seems to me that if we provided some way of handling this, your
first proposal would be moot; and I have to say I like the idea of
allowing this a lot more than tinkering with the operator names.There are syntactic reasons not to do that. It'd be a lot easier just
to provide a commutator operator for ~.
Details?
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Jun13, 2011, at 05:12 , Robert Haas wrote:
On Sun, Jun 12, 2011 at 7:46 AM, Florian Pflug <fgp@phlo.org> wrote:
So I the end, I had to wrap the sub-query in a SQL-language
function and use that in the check constraint. While this
solved my immediate problem, the necessity of doing that
highlights a few problems(A) "~" is an extremely bad name for the regexp-matching
operators, since it's visual form is symmetric but it's
behaviour isn't. This doesn't only make its usage very
error-prone, it also makes it very hard to come up with
sensible name for an commutator of "~". I suggest that we
add "=~" as an alias for "~", "~=" as an commutator
for "=~", and deprecate "~". The same holds for "~~".Does any other database or programming language implement it this way?
Ruby has "=~", which returns the position of the regexp's first
match, or nil if there is none.
$ ruby -e "puts 'hello' =~ /l+/"
2
$ ruby -e "puts 'hello' =~ /x+/"
nil
(B) There should be a way to use ANY()/ALL() with the
array elements becoming the left arguments of the operator.
Ideally, we'd support "ANY(<array>) <operator> <value>",
but if that's not possible grammar-wise, I suggest we extend
the OPERATOR() syntax to allow
<value> OPERATOR(COMMUTATOR <operator>) ANY(<array>).
OPERATOR(COMMUTATOR <operator>) would use the COMMUTATOR
of the specified operator if one exists, and otherwise
use the original operator with the arguments swapped.It seems to me that if we provided some way of handling this, your
first proposal would be moot; and I have to say I like the idea of
allowing this a lot more than tinkering with the operator names.
Well, the issue of "~" being anti-self-explanatory remains
independent from whether we do (B) or not.
I'm
not crazy about the proposed syntax, though; it seems cumbersome, and
it's really only needed for SOME/ALL/ANY, not in general operator
expressions. Since ANY is a reserved keyword, I believe we could
allow something like "expr op ANY BACKWARD ( ... )" -- or some other
keyword in lieu of BACKWARD if you prefer.
Hm, that's less bulky but more kludgy, I'd say. But wait a minute...
If ANY and ALL are reserved anyway, should it be possible to
make "(ANY(..) <op> <expr>)" and "(ALL(...) <op> <expr>)"
work grammar-wise? (Note the enclosing parens)
I just tried that, and it seems to work. bison doesn't report
and conflicts, the regression tests still succeed, and
I get the following
postgres=# select (all(array[1,2]) = 1);
ERROR: ANY()/ALL() <op> <expr> is not yet implemented at character 9
STATEMENT: select (all(array[1,2]) = 1);
ERROR: ANY()/ALL() <op> <expr> is not yet implemented
LINE 1: select (all(array[1,2]) = 1);
^
I've attached a patch with the changes to gram.y.
best regards,
Florian Pflug
Attachments:
pg_anyall_reversed.v0.patchapplication/octet-stream; name=pg_anyall_reversed.v0.patchDownload+14-0
On Sun, Jun 12, 2011 at 6:46 AM, Florian Pflug <fgp@phlo.org> wrote:
(B) There should be a way to use ANY()/ALL() with the
array elements becoming the left arguments of the operator.
FWIW, in case people were unaware, this is getting close to Perl 6
junctions/superpositions. See:
<http://dev.perl.org/perl6/doc/design/exe/E06.html> "The Wonderful
World of Junctions". Beyond "any" and "all" they propose "one" and
"none".
On Jun13, 2011, at 05:44 , Tom Lane wrote:
Robert Haas <robertmhaas@gmail.com> writes:
On Sun, Jun 12, 2011 at 7:46 AM, Florian Pflug <fgp@phlo.org> wrote:
(C) Why do we forbid sub-queries in CHECK constraints?
Dunno. Maybe it's just an implementation restriction?
(1) We don't want to invoke the planner in the places where we'd
have to do so to make that work.
Hm, OK, I get that.
(2) It's just about inevitable that a sub-query would have results
dependent on other rows beside the one being checked. As such, it
would be trying to enforce semantics that you simply can't enforce
via CHECK. (And yes, you can bypass that with a function, but guess
what: it still won't actually work.)
Yeah, I never expected non-immutable CHECK constraints to work. I was
simply pointing out that UNNEST() allows one to write all kinds of
interesting CHECK constraints, all of which are immutable.
Now, if supporting these would increase code complexity, or cause
a performance drop for non-sub-query CHECK constraints, I'm perfectly
fine with leaving them unsupported. I just wanted to make sure we
aren't simply nannying the user - especially since it's a nanny who's
out-smarted by function calls.
best regards,
Florian Pflug
Robert Haas <robertmhaas@gmail.com> writes:
On Sun, Jun 12, 2011 at 11:44 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
There are syntactic reasons not to do that. It'd be a lot easier just
to provide a commutator operator for ~.
Details?
Well, for one, it becomes unobvious what
A op ANY (B) op C
means. This has come up before, and I seem to recall that we identified
some more-compelling problems, but that's the best I can do before
consuming any caffeine.
In any case, if you dig around enough to notice all the infrastructure
that's involved with this, you'll definitely come to the conclusion that
it'd be a lot less work to just add the missing commutator operators.
regards, tom lane
On Mon, Jun 13, 2011 at 3:01 AM, Florian Pflug <fgp@phlo.org> wrote:
On Jun13, 2011, at 05:12 , Robert Haas wrote:
On Sun, Jun 12, 2011 at 7:46 AM, Florian Pflug <fgp@phlo.org> wrote:
So I the end, I had to wrap the sub-query in a SQL-language
function and use that in the check constraint. While this
solved my immediate problem, the necessity of doing that
highlights a few problems(A) "~" is an extremely bad name for the regexp-matching
operators, since it's visual form is symmetric but it's
behaviour isn't. This doesn't only make its usage very
error-prone, it also makes it very hard to come up with
sensible name for an commutator of "~". I suggest that we
add "=~" as an alias for "~", "~=" as an commutator
for "=~", and deprecate "~". The same holds for "~~".Does any other database or programming language implement it this way?
Ruby has "=~", which returns the position of the regexp's first
match, or nil if there is none.$ ruby -e "puts 'hello' =~ /l+/"
2
$ ruby -e "puts 'hello' =~ /x+/"
nil
Sure. Some languages use =~ and some use just ~... I was just
wondering if anyone thought the commutator of =~ was ~=...
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 06/13/2011 10:07 AM, Robert Haas wrote:
Some languages use =~ and some use just ~... I was just
wondering if anyone thought the commutator of =~ was ~=...
My feeling is it's a bit dangerous. It's too easy to fat-finger the
reverse op, and get something quite unintended.
cheers
andrew (whose lack of coordination sadly increases with age)
On Mon, Jun 13, 2011 at 09:01:45AM +0200, Florian Pflug wrote:
Hm, that's less bulky but more kludgy, I'd say. But wait a minute...
If ANY and ALL are reserved anyway, should it be possible to
make "(ANY(..) <op> <expr>)" and "(ALL(...) <op> <expr>)"
work grammar-wise? (Note the enclosing parens)
This would be a very, very useful feature. :)
Cheers,
David.
--
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
Skype: davidfetter XMPP: david.fetter@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics
Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
On Jun13, 2011, at 16:19 , Andrew Dunstan wrote:
On 06/13/2011 10:07 AM, Robert Haas wrote:
Some languages use =~ and some use just ~... I was just
wondering if anyone thought the commutator of =~ was ~=...My feeling is it's a bit dangerous. It's too easy to fat-finger the reverse op, and get something quite unintended.
Well, but with "~" you need to *remember* that the regexp
goes on the right side and the text on the left. That seems
worse than the risk of fat-fingering "=~" and getting "~=".
At, at least, have looked up the argument order of "~"
countless in the past...
best regards,
Florian Pflug
On Jun13, 2011, at 05:44 , Tom Lane wrote:
Robert Haas <robertmhaas@gmail.com> writes:
On Sun, Jun 12, 2011 at 7:46 AM, Florian Pflug <fgp@phlo.org> wrote:
(B) There should be a way to use ANY()/ALL() with the
array elements becoming the left arguments of the operator.It seems to me that if we provided some way of handling this, your
first proposal would be moot; and I have to say I like the idea of
allowing this a lot more than tinkering with the operator names.There are syntactic reasons not to do that. It'd be a lot easier just
to provide a commutator operator for ~.
My suggestion would be the add a commutator for "~" as a short-term
solution (preferably in 9.1).
Since "~" doesn't inspire any obvious names for a possible commutator,
I suggest adding "=~" and "~=".
Is there any support for that proposal?
In the long term, I'd like to add support for "(ANY() <op> <expr>)"
(Note the enclosing parens). I've checked that this works grammar-wise,
but haven't no idea how much tweaking the executor needs to support
that...
best regards,
Florian Pflug
On Tue, Jun 14, 2011 at 6:10 AM, Florian Pflug <fgp@phlo.org> wrote:
On Jun13, 2011, at 05:44 , Tom Lane wrote:
Robert Haas <robertmhaas@gmail.com> writes:
On Sun, Jun 12, 2011 at 7:46 AM, Florian Pflug <fgp@phlo.org> wrote:
(B) There should be a way to use ANY()/ALL() with the
array elements becoming the left arguments of the operator.It seems to me that if we provided some way of handling this, your
first proposal would be moot; and I have to say I like the idea of
allowing this a lot more than tinkering with the operator names.There are syntactic reasons not to do that. It'd be a lot easier just
to provide a commutator operator for ~.My suggestion would be the add a commutator for "~" as a short-term
solution (preferably in 9.1).
I don't think we want to bump catversion again before release if we
can avoid it. And I don't see this as being a terribly urgent problem
- it's not like this is a new regression, and I can't remember hearing
any complaints about it prior to two days ago.
Since "~" doesn't inspire any obvious names for a possible commutator,
I suggest adding "=~" and "~=".Is there any support for that proposal?
I'm OK with adding a commutator but I guess I don't see the point of
adding a synonym for ~ along the way. The existing use of ~ is
consistent with, for example, awk, so it's not like we've dreamed up
something utterly crazy that we now need to fix. I'd suggest we just
come up with some arbitrary variant, like ~~ or <~ or #~ or
!#!%@~bikeshed++!.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Jun14, 2011, at 14:29 , Robert Haas wrote:
On Tue, Jun 14, 2011 at 6:10 AM, Florian Pflug <fgp@phlo.org> wrote:
On Jun13, 2011, at 05:44 , Tom Lane wrote:
Robert Haas <robertmhaas@gmail.com> writes:
On Sun, Jun 12, 2011 at 7:46 AM, Florian Pflug <fgp@phlo.org> wrote:
(B) There should be a way to use ANY()/ALL() with the
array elements becoming the left arguments of the operator.It seems to me that if we provided some way of handling this, your
first proposal would be moot; and I have to say I like the idea of
allowing this a lot more than tinkering with the operator names.There are syntactic reasons not to do that. It'd be a lot easier just
to provide a commutator operator for ~.My suggestion would be the add a commutator for "~" as a short-term
solution (preferably in 9.1).I don't think we want to bump catversion again before release if we
can avoid it. And I don't see this as being a terribly urgent problem
- it's not like this is a new regression, and I can't remember hearing
any complaints about it prior to two days ago.
Hm, OK, that makes sense...
Since "~" doesn't inspire any obvious names for a possible commutator,
I suggest adding "=~" and "~=".Is there any support for that proposal?
I'm OK with adding a commutator but I guess I don't see the point of
adding a synonym for ~ along the way. The existing use of ~ is
consistent with, for example, awk, so it's not like we've dreamed up
something utterly crazy that we now need to fix. I'd suggest we just
come up with some arbitrary variant, like ~~ or <~ or #~ or
!#!%@~bikeshed++!.
That, however, I'm not at all happy with. Quite frankly, operator
naming is already a bit of a mess, and readability of queries
suffers as a result. The geometric types are especially vile
offenders in this regard, but the various array-related operators
aren't poster children either.
I think we should try to work towards more mnemonic operator
naming, not add to the mess by defining commutator pairs whose
names bear no visual resemblance whatsoever to one each other.
I'm not wedded to "=~", it's just the only name I could come
up which
(a) has a natural commutator
(b) gives visual indication of which argument constitutes the
text and which the pattern
(c) there is precedent for.
BTW, there's actually precedent for a commutator of "~", namely
"@". Some of the geometric types (polygon, box, circle, point,
path) use "~" as a commutator for "@" (which stands for "contains").
But IMHO that mainly proves that the geometric types are vile
offenders when it comes to readability...
The pair ("@", "~" ) is also the only pair of commutators whose
names are totally unrelated to each other. Given a suitable
definition of a reverse() function for text [1]I used create or replace function reverse(text) returns text as $$ select string_agg(substring($1, i, 1), '') from generate_series(length($1), 1, -1) i $$ language sql immutable;, the following query
select
o1.oprleft::regtype || ' ' || o1.oprname || ' ' || o1.oprright::regtype as opr,
o2.oprleft::regtype || ' ' || o2.oprname || ' ' || o2.oprright::regtype as com,
o1.oprcode as opr_code,
o2.oprcode as com_code
from pg_operator o1
join pg_operator o2 on o1.oprcom = o2.oid or o2.oprcom = o1.oid
where
o1.oid < o2.oid and
o1.oprname <> reverse(translate(o2.oprname, '<>', '><')) and
o1.oprname <> translate(o2.oprname, '<>', '><');
produces
opr | com | opr_code | com_code
-------------------+-------------------+---------------------+-------------------
polygon @ polygon | polygon ~ polygon | poly_contained | poly_contain
box @ box | box ~ box | box_contained | box_contain
circle @ circle | circle ~ circle | circle_contained | circle_contain
point @ path | path ~ point | on_ppath | path_contain_pt
point @ polygon | polygon ~ point | pt_contained_poly | poly_contain_pt
point @ circle | circle ~ point | pt_contained_circle | circle_contain_pt
(6 rows)
best regards,
Florian Pflug
[1]: I used create or replace function reverse(text) returns text as $$ select string_agg(substring($1, i, 1), '') from generate_series(length($1), 1, -1) i $$ language sql immutable;
create or replace function reverse(text) returns text as $$
select string_agg(substring($1, i, 1), '') from generate_series(length($1), 1, -1) i
$$ language sql immutable;
On Jun13, 2011, at 17:41 , David Fetter wrote:
On Mon, Jun 13, 2011 at 09:01:45AM +0200, Florian Pflug wrote:
Hm, that's less bulky but more kludgy, I'd say. But wait a minute...
If ANY and ALL are reserved anyway, should it be possible to
make "(ANY(..) <op> <expr>)" and "(ALL(...) <op> <expr>)"
work grammar-wise? (Note the enclosing parens)This would be a very, very useful feature. :)
There, you dared me. So here's a patch :-P
So far it supports (ANY(array) op scalar) and (ALL(array) op scalar),
but not (ANY/ALL(subselect) op scalar). "ANY/ALL op scalar" is
supported independent from whether <op> has a commutator or not.
In the latter case, the clause isn't indexable, but thats no
different from the case "const op field". Both ANY(...) = ctid
and ctid = ANY(...) are correctly executed as TID scans (if the
array expression is a pseudo-constant, that is).
The patch adds a fields "aryArgIdx" (0 or 1) to ScalarArrayOpExpr
which stores the index of the array-valued argument. Thus, for
the traditional "scalar op ANY/ALL(array)" variant, aryArgIdx == 1,
for "ANY/ALL(array) op scalar" it's zero.
I've updates all places that I could find which inspect
ScalarArrayOpExprs to take the aryArgIdx into account. And boy,
those were quite a few, which explains the size of the patch.
Most of the changes are pretty trivial, though.
The indexing support works similar to the case "scalar op scalar",
i.e. match_clause_to_indexcol() consideres the clause to be index-able
if the scalar argument matches an index definition, and relies on
fix_indexqual_references() to commute the ScalarArrayOpExprs if
the scalar argument is on the right side instead of on the left.
I noticed that ANY/ALL is hardly exercised by the regression tests
at all, so I added a (pretty exhaustive, I think) test any_all.
The test exercises the traditional and the reversed form of
ANY/ALL and verify that an index is used if possible.
Comments are extremely welcome, especially ones regarding
the overall approach taken in this patch. If people consider
that to be acceptable, I'd try to add the missing features
and add documentation.
best regards,
Florian Pflug
Attachments:
pg_anyall_reversed.v0.patchapplication/octet-stream; name=pg_anyall_reversed.v0.patchDownload+1212-292
Florian Pflug <fgp@phlo.org> writes:
Comments are extremely welcome, especially ones regarding
the overall approach taken in this patch. If people consider
that to be acceptable, I'd try to add the missing features
and add documentation.
Quite honestly, I don't like this one bit and would rather you not
pursue the idea. There is no such syntax in the standard, and
presumably that's not because the SQL committee never thought of it.
They may have some incompatible idea in mind for the future, who knows?
But in any case, this won't provide any functionality whatever that we
couldn't provide at much less effort and risk, just by providing
commutator operators for the few missing cases.
(FWIW, I've come around to liking the idea of using =~ and the obvious
variants of that for regex operators, mainly because of the Perl
precedent.)
regards, tom lane
On mån, 2011-06-13 at 10:19 -0400, Andrew Dunstan wrote:
On 06/13/2011 10:07 AM, Robert Haas wrote:
Some languages use =~ and some use just ~... I was just
wondering if anyone thought the commutator of =~ was ~=...My feeling is it's a bit dangerous. It's too easy to fat-finger the
reverse op, and get something quite unintended.
Yes, it looked highly dangerous to me as well.
On tis, 2011-06-14 at 15:38 +0200, Florian Pflug wrote:
BTW, there's actually precedent for a commutator of "~", namely
"@". Some of the geometric types (polygon, box, circle, point,
path) use "~" as a commutator for "@" (which stands for "contains").
I wouldn't have a problem with naming the reverse operator "@".
On ons, 2011-06-15 at 22:19 -0400, Tom Lane wrote:
(FWIW, I've come around to liking the idea of using =~ and the obvious
variants of that for regex operators, mainly because of the Perl
precedent.)
Maybe I'm not completely up to date on this, but I observe that Perl
itself doesn't appear to have a commutator for =~ .