Named Operators
Technically correct name of this feature would be Readable Names for
Operators, or Pronounceable Names for Operators. But I'd like to call
it Named Operators.
With this patch in place, the users can name the operators as
:some_pronounceable_name: instead of having to choose from the special
characters like #^&@. For example, users will be able to create and
use operators like:
select
expr1 :distance: expr2,
expr3 :contains_all: expr4,
expr5 :contains_any: expr6
expr7 :contains_exactly_two_of: expr8
from mytable;
instead of being forced to use these:
select
expr1 <#> expr2,
expr3 ?& expr4,
expr5 ?| expr6
expr7 ??!! expr8 -- ¯\_(ツ)_/¯
from mytable;
I think Named Operators will significantly improve the readability
of queries.
After a little trial-an-error, it was easy to develop the scan.l
rules to implement this feature, without flex barfing. The hard part
has been convincing myself that this is a safe implementation, even
though there are no regressions in `make check`. I am unsure of this
implementation's compatibility with the SQL Spec, and I'm not able to
envision problems with its interaction with some current or potential
feature of Postgres. So I'd really appreciate feedback from someone
who is conversant with the SQL Spec.
If the colon character being used as a delimiter poses a
challenge, other good candidates for the delimiter seem to be one of
~^` Although I haven't tested any of these to see if they cause a
regression. The colon character is be preferable for the delimiter,
since it is already used in the typecast :: operator.
I tried to strip the delimiters/colons from the name right in the
scanner, primarily because that would allow the identifier part of the
name to be as long as NAMEDATALEN-1, just like other identifiers
Postgres allows. Added benefit of stripping delimiters was that the
rest of the code, and catalogs/storage won't have to see the
delimiters. But stripping the delimiters made the code brittle; some
places in code now had to be taught different handling depending on
whether the operator name was coming from the user command, or from
the catalogs. I had to special-case code in pg_dump, as well. To share
code with frontends like pg_dump, I had to place code in src/common/.
I was still not able to address some obvious bugs.
By retaining the delimiters : in the name, the code became much
simpler; pg_dump support came for free! The bugs became a non-issue.
To see how much code and complexity was reduced, one can see this
commit [1]Commit: Don't strip the delimiters https://github.com/gurjeet/postgres/commit/62d11a578e5325c32109bb2a55a624d0d89d4b7e. The downside of retaining the delimiters is that the
identifier part of the name can be no more than NAMEDATALEN-3 in
length.
Because of the minimal changes to the scanner rules, and no
changes in the grammar, I don't think there's any impact on precedence
and associativity rules of the operators. I'd be happy to learn
otherwise.
Here's a rudimentary test case to demonstrate the feature:
create operator :add_point: (function = box_add, leftarg = box,
rightarg = point);
create table test(a box);
insert into test values('((0,0),(1,1))'), ('((0,0),(2,1))');
select a as original, a :add_point: '(1,1)' as modified from test;
drop operator :add_point:(box, point);
Feedback will be much appreciated!
[1]: Commit: Don't strip the delimiters https://github.com/gurjeet/postgres/commit/62d11a578e5325c32109bb2a55a624d0d89d4b7e
https://github.com/gurjeet/postgres/commit/62d11a578e5325c32109bb2a55a624d0d89d4b7e
[2]: Git branch named_operators https://github.com/gurjeet/postgres/tree/named_operators
https://github.com/gurjeet/postgres/tree/named_operators
Best regards,
Gurjeet
http://Gurje.et
Attachments:
named_operators.patchapplication/octet-stream; name=named_operators.patchDownload
diff --git a/src/backend/catalog/pg_operator.c b/src/backend/catalog/pg_operator.c
index 1017f2eed1..c5b8562cb5 100644
--- a/src/backend/catalog/pg_operator.c
+++ b/src/backend/catalog/pg_operator.c
@@ -31,6 +31,7 @@
#include "catalog/pg_type.h"
#include "miscadmin.h"
#include "parser/parse_oper.h"
+#include "parser/scansup.h"
#include "utils/acl.h"
#include "utils/builtins.h"
#include "utils/lsyscache.h"
@@ -79,6 +80,10 @@ validOperatorName(const char *name)
if (len == 0 || len >= NAMEDATALEN)
return false;
+ /* Is this a Named Operator? */
+ if (validNamedOperator(name))
+ return true;
+
/* Can't contain any invalid characters */
/* Test string here should match op_chars in scan.l */
if (strspn(name, "~!@#^&|`?+-*/%<>=") != len)
diff --git a/src/backend/parser/scan.l b/src/backend/parser/scan.l
index db8b0fe8eb..cef1da0305 100644
--- a/src/backend/parser/scan.l
+++ b/src/backend/parser/scan.l
@@ -379,6 +379,15 @@ self [,()\[\].;\:\+\-\*\/\%\^\<\>\=]
op_chars [\~\!\@\#\^\&\|\`\?\+\-\*\/\%\<\>\=]
operator {op_chars}+
+/*
+ * Named Operators, e.g. :foo:
+ *
+ * {namedopfailed} is an error rule to avoid scanner backup when {namedop}
+ * fails to match its trailing ":".
+ */
+namedop \:{identifier}\:
+namedopfailed \:{identifier}
+
/*
* Numbers
*
@@ -768,6 +777,23 @@ other .
}
<xdolq><<EOF>> { yyerror("unterminated dollar-quoted string"); }
+{namedop} {
+ SET_YYLLOC();
+ if (yyleng >= NAMEDATALEN)
+ yyerror("operator name too long");
+ /* XXX Should we support double-quoted, case sensitive names? */
+ yylval->str = downcase_identifier(yytext, yyleng, false, false);
+ return Op;
+ }
+
+{namedopfailed} {
+ SET_YYLLOC();
+ /* throw back all but the initial ':' */
+ yyless(1);
+ /* and treat it as {self}, since ':' is a member of that set of chars. */
+ return yytext[0];
+ }
+
{xdstart} {
SET_YYLLOC();
BEGIN(xd);
diff --git a/src/backend/parser/scansup.c b/src/backend/parser/scansup.c
index 602108a40f..678d9d0e5c 100644
--- a/src/backend/parser/scansup.c
+++ b/src/backend/parser/scansup.c
@@ -125,3 +125,71 @@ scanner_isspace(char ch)
return true;
return false;
}
+
+bool
+validNamedOperator(const char *name)
+{
+ size_t len = strlen(name);
+ bool valid_identifier;
+ char *tmp;
+
+ if (len < 3 || len >= NAMEDATALEN)
+ return false;
+
+ if (name[0] != ':' || name[len-1] != ':')
+ return false;
+
+ tmp = pstrdup(name);
+
+ // Disregard the delimiters
+ tmp[len-1] = '\0';
+ tmp += 1;
+ valid_identifier = validIdentifier(tmp);
+ tmp -= 1;
+ pfree(tmp);
+
+ return valid_identifier;
+}
+
+/*
+ * Function to resemble the same check as the following lex
+ * rules in scan.l:
+ *
+ * ident_start [A-Za-z\200-\377_]
+ * ident_cont [A-Za-z\200-\377_0-9\$]
+ *
+ * Note: this function does not check if the identifier length
+ * is less than NAMEDATALEN.
+ */
+bool
+validIdentifier(const char *name)
+{
+ uint8 c;
+ size_t i, len = strlen(name);
+
+ // Reject if first character is not part of ident_start
+ c = name[0];
+ if ( !(c == '_'
+ || (c >='A' && c <= 'Z')
+ || (c >='a' && c <= 'z')
+ || (c >= 0200 && c <= 0377)))
+ {
+ return false;
+ }
+
+ // Reject if other characters are not part of ident_cont
+ for (i = 1; i < len; ++i)
+ {
+ c = name[i];
+ if ( !(c == '_' || c == '$'
+ || (c >='A' && c <= 'Z')
+ || (c >='a' && c <= 'z')
+ || (c >='0' && c <= '9')
+ || (c >= 0200 && c <= 0377)))
+ {
+ return false;
+ }
+ }
+
+ return true;
+}
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index da427f4d4a..0aafb3297e 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -62,6 +62,7 @@
#include "getopt_long.h"
#include "libpq/libpq-fs.h"
#include "parallel.h"
+#include "common/scansup.h"
#include "pg_backup_db.h"
#include "pg_backup_utils.h"
#include "pg_dump.h"
diff --git a/src/fe_utils/psqlscan.l b/src/fe_utils/psqlscan.l
index ae531ec240..39bacc6738 100644
--- a/src/fe_utils/psqlscan.l
+++ b/src/fe_utils/psqlscan.l
@@ -317,6 +317,15 @@ self [,()\[\].;\:\+\-\*\/\%\^\<\>\=]
op_chars [\~\!\@\#\^\&\|\`\?\+\-\*\/\%\<\>\=]
operator {op_chars}+
+/*
+ * Named Operators, e.g. :foo:
+ *
+ * {namedopfailed} is an error rule to avoid scanner backup when {namedop}
+ * fails to match its trailing ":".
+ */
+namedop \:{identifier}\:
+namedopfailed \:{identifier}
+
/*
* Numbers
*
@@ -570,6 +579,16 @@ other .
ECHO;
}
+{namedop} {
+ ECHO;
+ }
+
+{namedopfailed} {
+ /* throw back all but the initial ':' */
+ yyless(1);
+ ECHO;
+ }
+
{xdstart} {
BEGIN(xd);
ECHO;
diff --git a/src/include/parser/scansup.h b/src/include/parser/scansup.h
index ff65224bf6..0f6aff8b44 100644
--- a/src/include/parser/scansup.h
+++ b/src/include/parser/scansup.h
@@ -24,4 +24,7 @@ extern void truncate_identifier(char *ident, int len, bool warn);
extern bool scanner_isspace(char ch);
+extern bool validNamedOperator(const char *name);
+extern bool validIdentifier(const char *name);
+
#endif /* SCANSUP_H */
Please see attached a slightly updated patch. There were some comment
changes sitting in uncommitted in Git worktree, that were missed.
Best regards,
Gurjeet
http://Gurje.et
Attachments:
named_operators_v1.patchapplication/octet-stream; name=named_operators_v1.patchDownload
diff --git a/src/backend/catalog/pg_operator.c b/src/backend/catalog/pg_operator.c
index 1017f2eed1..c5b8562cb5 100644
--- a/src/backend/catalog/pg_operator.c
+++ b/src/backend/catalog/pg_operator.c
@@ -31,6 +31,7 @@
#include "catalog/pg_type.h"
#include "miscadmin.h"
#include "parser/parse_oper.h"
+#include "parser/scansup.h"
#include "utils/acl.h"
#include "utils/builtins.h"
#include "utils/lsyscache.h"
@@ -79,6 +80,10 @@ validOperatorName(const char *name)
if (len == 0 || len >= NAMEDATALEN)
return false;
+ /* Is this a Named Operator? */
+ if (validNamedOperator(name))
+ return true;
+
/* Can't contain any invalid characters */
/* Test string here should match op_chars in scan.l */
if (strspn(name, "~!@#^&|`?+-*/%<>=") != len)
diff --git a/src/backend/parser/scan.l b/src/backend/parser/scan.l
index db8b0fe8eb..cef1da0305 100644
--- a/src/backend/parser/scan.l
+++ b/src/backend/parser/scan.l
@@ -379,6 +379,15 @@ self [,()\[\].;\:\+\-\*\/\%\^\<\>\=]
op_chars [\~\!\@\#\^\&\|\`\?\+\-\*\/\%\<\>\=]
operator {op_chars}+
+/*
+ * Named Operators, e.g. :foo:
+ *
+ * {namedopfailed} is an error rule to avoid scanner backup when {namedop}
+ * fails to match its trailing ":".
+ */
+namedop \:{identifier}\:
+namedopfailed \:{identifier}
+
/*
* Numbers
*
@@ -768,6 +777,23 @@ other .
}
<xdolq><<EOF>> { yyerror("unterminated dollar-quoted string"); }
+{namedop} {
+ SET_YYLLOC();
+ if (yyleng >= NAMEDATALEN)
+ yyerror("operator name too long");
+ /* XXX Should we support double-quoted, case sensitive names? */
+ yylval->str = downcase_identifier(yytext, yyleng, false, false);
+ return Op;
+ }
+
+{namedopfailed} {
+ SET_YYLLOC();
+ /* throw back all but the initial ':' */
+ yyless(1);
+ /* and treat it as {self}, since ':' is a member of that set of chars. */
+ return yytext[0];
+ }
+
{xdstart} {
SET_YYLLOC();
BEGIN(xd);
diff --git a/src/backend/parser/scansup.c b/src/backend/parser/scansup.c
index 602108a40f..8d5287a5f5 100644
--- a/src/backend/parser/scansup.c
+++ b/src/backend/parser/scansup.c
@@ -125,3 +125,73 @@ scanner_isspace(char ch)
return true;
return false;
}
+
+/*
+ * validNamedOperator() -- return true if name adheres to the scanner rule
+ * {namedop}
+ */
+bool
+validNamedOperator(const char *name)
+{
+ size_t len = strlen(name);
+ bool valid_identifier;
+ char *tmp;
+
+ if (len < 3 || len >= NAMEDATALEN)
+ return false;
+
+ if (name[0] != ':' || name[len-1] != ':')
+ return false;
+
+ // Make a copy of the name, since we need to scribble on it
+ tmp = pstrdup(name);
+
+ // Disregard the delimiters
+ tmp[len-1] = '\0';
+ tmp += 1;
+ valid_identifier = validIdentifier(tmp);
+ tmp -= 1;
+ pfree(tmp);
+
+ return valid_identifier;
+}
+
+/*
+ * validIdentifier() -- return true if name adheres to the scanner rule
+ * {identifier}
+ *
+ * Note: this function does not check if the identifier length
+ * is less than NAMEDATALEN.
+ */
+bool
+validIdentifier(const char *name)
+{
+ uint8 c;
+ size_t i, len = strlen(name);
+
+ // Reject if first character is not part of ident_start
+ c = name[0];
+ if ( !(c == '_'
+ || (c >='A' && c <= 'Z')
+ || (c >='a' && c <= 'z')
+ || (c >= 0200 && c <= 0377)))
+ {
+ return false;
+ }
+
+ // Reject if other characters are not part of ident_cont
+ for (i = 1; i < len; ++i)
+ {
+ c = name[i];
+ if ( !(c == '_' || c == '$'
+ || (c >='A' && c <= 'Z')
+ || (c >='a' && c <= 'z')
+ || (c >='0' && c <= '9')
+ || (c >= 0200 && c <= 0377)))
+ {
+ return false;
+ }
+ }
+
+ return true;
+}
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index da427f4d4a..0aafb3297e 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -62,6 +62,7 @@
#include "getopt_long.h"
#include "libpq/libpq-fs.h"
#include "parallel.h"
+#include "common/scansup.h"
#include "pg_backup_db.h"
#include "pg_backup_utils.h"
#include "pg_dump.h"
diff --git a/src/fe_utils/psqlscan.l b/src/fe_utils/psqlscan.l
index ae531ec240..39bacc6738 100644
--- a/src/fe_utils/psqlscan.l
+++ b/src/fe_utils/psqlscan.l
@@ -317,6 +317,15 @@ self [,()\[\].;\:\+\-\*\/\%\^\<\>\=]
op_chars [\~\!\@\#\^\&\|\`\?\+\-\*\/\%\<\>\=]
operator {op_chars}+
+/*
+ * Named Operators, e.g. :foo:
+ *
+ * {namedopfailed} is an error rule to avoid scanner backup when {namedop}
+ * fails to match its trailing ":".
+ */
+namedop \:{identifier}\:
+namedopfailed \:{identifier}
+
/*
* Numbers
*
@@ -570,6 +579,16 @@ other .
ECHO;
}
+{namedop} {
+ ECHO;
+ }
+
+{namedopfailed} {
+ /* throw back all but the initial ':' */
+ yyless(1);
+ ECHO;
+ }
+
{xdstart} {
BEGIN(xd);
ECHO;
diff --git a/src/include/parser/scansup.h b/src/include/parser/scansup.h
index ff65224bf6..0f6aff8b44 100644
--- a/src/include/parser/scansup.h
+++ b/src/include/parser/scansup.h
@@ -24,4 +24,7 @@ extern void truncate_identifier(char *ident, int len, bool warn);
extern bool scanner_isspace(char ch);
+extern bool validNamedOperator(const char *name);
+extern bool validIdentifier(const char *name);
+
#endif /* SCANSUP_H */
On Thu, 12 Jan 2023 at 10:16, Gurjeet Singh <gurjeet@singh.im> wrote:
Technically correct name of this feature would be Readable Names for
Operators, or Pronounceable Names for Operators. But I'd like to call
it Named Operators.With this patch in place, the users can name the operators as
:some_pronounceable_name: instead of having to choose from the special
characters like #^&@.
[...]
I think Named Operators will significantly improve the readability
of queries.
Couldn't the user better opt to call the functions that implement the
operator directly if they want more legible operations? So, from your
example, `SELECT box_add(a, b)` instead of `SELECT a :add_point: b`?
I'm -1 on the chosen syntax; :name: shadows common variable
substitution patterns including those of psql.
Kind regards,
Matthias van de Meent
On Thu, Jan 12, 2023 at 1:49 AM Matthias van de Meent
<boekewurm+postgres@gmail.com> wrote:
On Thu, 12 Jan 2023 at 10:16, Gurjeet Singh <gurjeet@singh.im> wrote:
Technically correct name of this feature would be Readable Names for
Operators, or Pronounceable Names for Operators. But I'd like to call
it Named Operators.With this patch in place, the users can name the operators as
:some_pronounceable_name: instead of having to choose from the special
characters like #^&@.
[...]
I think Named Operators will significantly improve the readability
of queries.Couldn't the user better opt to call the functions that implement the
operator directly if they want more legible operations? So, from your
example, `SELECT box_add(a, b)` instead of `SELECT a :add_point: b`?
Matter of taste, I guess. But more importantly, defining an operator
gives you many additional features that the planner can use to
optimize your query differently, which it can't do with functions. See
the COMMUTATOR, HASHES, etc. clause in the CREATE OPERATOR command.
https://www.postgresql.org/docs/current/sql-createoperator.html
This proposal is primarily a replacement for the myriad of
hard-to-pronounce operators that users have to memorize. For example,
it'd be nice to have readable names for the PostGIS operators.
https://postgis.net/docs/reference.html#Operators
For someone who's reading/troubleshooting a PostGIS query, when they
encounter operator <<| — in the query for the first time, they'd have
to open up the docs. But if the query used the :strictly_below:
operator, there's no need to switch to docs and lose context.
I'm -1 on the chosen syntax; :name: shadows common variable
substitution patterns including those of psql.
Ah, thanks for reminding! Early on when I hadn't written code yet, I
remember discarding colon : as a delimiter choice, precisely because
it is used for using variables in psql, and perhaps in some drivers,
as well. But in the rush of implementing and wrangling code, I forgot
about that argument altogether.
I'll consider using one of the other special characters. Do you have
any suggestions?
Best regards,
Gurjeet
http://Gurje.et
On Thu, 12 Jan 2023 at 11:59, Gurjeet Singh <gurjeet@singh.im> wrote:
On Thu, Jan 12, 2023 at 1:49 AM Matthias van de Meent
<boekewurm+postgres@gmail.com> wrote:On Thu, 12 Jan 2023 at 10:16, Gurjeet Singh <gurjeet@singh.im> wrote:
Technically correct name of this feature would be Readable Names for
Operators, or Pronounceable Names for Operators. But I'd like to call
it Named Operators.With this patch in place, the users can name the operators as
:some_pronounceable_name: instead of having to choose from the special
characters like #^&@.
[...]
I think Named Operators will significantly improve the readability
of queries.Couldn't the user better opt to call the functions that implement the
operator directly if they want more legible operations? So, from your
example, `SELECT box_add(a, b)` instead of `SELECT a :add_point: b`?Matter of taste, I guess. But more importantly, defining an operator
gives you many additional features that the planner can use to
optimize your query differently, which it can't do with functions. See
the COMMUTATOR, HASHES, etc. clause in the CREATE OPERATOR command.
I see. Wouldn't it be better then to instead make it possible for the
planner to detect the use of the functions used in operators and treat
them as aliases of the operator? Or am I missing something w.r.t.
differences between operator and function invocation?
E.g. indexes on `int8pl(my_bigint, 1)` does not match queries for
`my_bigint + 1` (and vice versa), while they should be able to support
that, as OPERATOR(pg_catalog.+(int8, int8)) 's function is int8pl.
Kind regards,
Matthias van de Meent
Matthias van de Meent <boekewurm+postgres@gmail.com> writes:
I'm -1 on the chosen syntax; :name: shadows common variable
substitution patterns including those of psql.
Yeah, this syntax is DOA because of that. I think almost
anything you might invent is going to have conflict risks.
We could probably make it work by allowing the existing OPERATOR
syntax to take things that look like names as well as operators,
like
expr3 OPERATOR(contains_all) expr4
But that's bulky enough that nobody will care to use it.
On the whole I don't see this proposal going anywhere.
There's too much investment in the existing operator names,
and too much risk of conflicts if you try to shorten the
syntax.
regards, tom lane
On Thu, Jan 12, 2023 at 3:59 AM Gurjeet Singh <gurjeet@singh.im> wrote:
On Thu, Jan 12, 2023 at 1:49 AM Matthias van de Meent
<boekewurm+postgres@gmail.com> wrote:I'm -1 on the chosen syntax; :name: shadows common variable
substitution patterns including those of psql.I'll consider using one of the other special characters. Do you have
any suggestions?
The R language uses %...% to denote custom operators.
That would be a bit annoying for dynamic SQL using format though...
Do we have to choose? There are 15 allowed characters for operator names
presently (aside from + and -), could we define the rule that an operator
name can contain any sequence of alphabetic+underscore+space? characters so
long as the first and last symbol of the operator name is one of those 15
characters?
Another appealing option would be the non-matching but complementary pair
<...> (I'd consider removing these from the 15 choices in we go that route)
SELECT 1 <add> 2;
I would probably avoid requiring back-ticks given their usage as identifier
quoting in other systems - probably remove it from the 15 choices if we go
that route.
David J.
On Thu, 12 Jan 2023 at 05:59, Gurjeet Singh <gurjeet@singh.im> wrote:
I'll consider using one of the other special characters. Do you have
any suggestions?
What about backticks (`)? They are allowed as operator characters but do
not otherwise appear in the lexical syntax as far as I can tell:
https://www.postgresql.org/docs/current/sql-syntax-lexical.html
Isaac Morland <isaac.morland@gmail.com> writes:
What about backticks (`)? They are allowed as operator characters but do
not otherwise appear in the lexical syntax as far as I can tell:
https://www.postgresql.org/docs/current/sql-syntax-lexical.html
Since they're already allowed as operator characters, you can't
use them for this purpose without breaking existing use-cases.
Even if they were completely unused, I'd be pretty hesitant to
adopt them for this purpose because of the potential confusion
for users coming from mysql.
Pretty much the only available syntax space is curly braces,
and I don't really want to give those up for this either.
(One has to assume that the SQL committee has their eyes
on those too.)
regards, tom lane
On 1/12/23 18:14, Tom Lane wrote:
Pretty much the only available syntax space is curly braces,
and I don't really want to give those up for this either.
(One has to assume that the SQL committee has their eyes
on those too.)
They are used in row pattern recognition.
--
Vik Fearing
On Thu, Jan 12, 2023 at 10:14 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Isaac Morland <isaac.morland@gmail.com> writes:
What about backticks (`)? They are allowed as operator characters but do
not otherwise appear in the lexical syntax as far as I can tell:
https://www.postgresql.org/docs/current/sql-syntax-lexical.htmlSince they're already allowed as operator characters, you can't
use them for this purpose without breaking existing use-cases.
IIUC, specifically the fact that an operator is defined to start with one
of those symbols and end at the first non-symbol. We can't change the
allowed set of non-symbols at this point, without defining something else
to denote the start of an operator.
David J.
On Thu, Jan 12, 2023 at 5:55 AM Matthias van de Meent
<boekewurm+postgres@gmail.com> wrote:
On Thu, 12 Jan 2023 at 11:59, Gurjeet Singh <gurjeet@singh.im> wrote:
... defining an operator
gives you many additional features that the planner can use to
optimize your query differently, which it can't do with functions. See
the COMMUTATOR, HASHES, etc. clause in the CREATE OPERATOR command.I see. Wouldn't it be better then to instead make it possible for the
planner to detect the use of the functions used in operators and treat
them as aliases of the operator? Or am I missing something w.r.t.
differences between operator and function invocation?E.g. indexes on `int8pl(my_bigint, 1)` does not match queries for
`my_bigint + 1` (and vice versa), while they should be able to support
that, as OPERATOR(pg_catalog.+(int8, int8)) 's function is int8pl.
Such a feature would be immensely useful in its own right. But it's
also going to be at least 2 orders of magnitude (or more) effort to
implement, and to get accepted in the community. I'm thinking of
changes in planner, catalogs, etc.
On Thu, Jan 12, 2023 at 7:21 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Matthias van de Meent <boekewurm+postgres@gmail.com> writes:
I'm -1 on the chosen syntax; :name: shadows common variable
substitution patterns including those of psql.Yeah, this syntax is DOA because of that. I think almost
anything you might invent is going to have conflict risks.
I remember discussing this in a meeting with Joe Conway a few weeks
ago, when this was just a proposal in my head and I was just bouncing
it off him. And I remember pointing out that colons would be a bad
choice because of their use in psql; but for life of me I can't think
of a reason (except temporary memory loss) why I failed to consider
the psql conflict when implementing the feature. If only some test in
`make check` would have pointed out the mistake, I wouldn't have made
this obvious mistake.
We could probably make it work by allowing the existing OPERATOR
syntax to take things that look like names as well as operators,
likeexpr3 OPERATOR(contains_all) expr4
But that's bulky enough that nobody will care to use it.
+1. Although that'd be better for readers than the all-special-char
names, this format is bulky enough that you won't be able to convince
the query writers to bother using it. But if all other efforts fail,
I'll take this format over the cryptic ones any day.
On the whole I don't see this proposal going anywhere.
There's too much investment in the existing operator names,
and too much risk of conflicts if you try to shorten the
syntax.
I wouldn't give up on the idea, yet :-) See new proposal below.
On Thu, Jan 12, 2023 at 9:14 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Isaac Morland <isaac.morland@gmail.com> writes:
What about backticks (`)?
Since they're already allowed as operator characters, you can't
use them for this purpose without breaking existing use-cases.Even if they were completely unused, I'd be pretty hesitant to
adopt them for this purpose because of the potential confusion
for users coming from mysql.
Since when have we started caring for the convenience of users of
other databases?!! /s
Pretty much the only available syntax space is curly braces,
and I don't really want to give those up for this either.
(One has to assume that the SQL committee has their eyes
on those too.)
On Thu, Jan 12, 2023 at 9:45 AM Vik Fearing <vik@postgresfriends.org> wrote:
They are used in row pattern recognition.
I was very hopeful of using { }, and hoping that we'd beat the SQL
committee to it, so that they have to choose something else, if we
release this into the wild before them. But it seems that they beat us
to it long ago. (tangent: Reading some blog posts, I have to say I
loved the Row Pattern Recognition feature!)
Considering that there are almost no printable characters left in
1-255 ASCII range for us to choose from, I had to get creative; and I
believe I have found a way to make it work.
Unless the SQL committee has their eyes on a freestanding backslash \
character for something, I believe we can use it as a prefix for Named
Operators. Since the most common use of backslash is for escaping
characters, I believe it would feel natural for the users to use it as
described below.
New scheme for the named operators: \#foo That is, an identifier
prefixed with \# would serve as an operator name. psql considers \ to
be the start of its commands, but it wasn't hard to convince psql to
ignore \# and let it pass through to server.
I agree that an identifier _surrounded_ by the same token (e.g. #foo#)
or the pairing token (e.g. {foo}) looks better aesthetically, so I am
okay with any of the following variations of the scheme, as well:
\#foo\# (tested; works)
\#foo# (not tested; reduces ident length by 1)
We can choose a different character, instead of #. Perhaps \{foo} !
Attached is the v2 patch that supports \#foo style Named Operators.
Following is the SQL snippet to see what the usage looks like.
create operator \#add_point
(function = box_add, leftarg = box, rightarg = point);
create table test(a box);
insert into test values('((0,0),(1,1))'), ('((0,0),(2,1))');
select a as original, a \#add_point '(1,1)' as modified from test;
drop operator \#add_point(box, point);
Although we have never done it before, but by using backslash we
might be able to define new custom token types as well, if needed.
For those interested, I have couple of different branches with
named_operators* prefix in my Git fork [1]https://github.com/gurjeet/postgres/branches where I'm trying different
combinations.
[1]: https://github.com/gurjeet/postgres/branches
Best regards,
Gurjeet
http://Gurje.et
Attachments:
named_operators_v2.patchapplication/octet-stream; name=named_operators_v2.patchDownload
diff --git a/src/backend/catalog/pg_operator.c b/src/backend/catalog/pg_operator.c
index 1017f2eed1..c5b8562cb5 100644
--- a/src/backend/catalog/pg_operator.c
+++ b/src/backend/catalog/pg_operator.c
@@ -31,6 +31,7 @@
#include "catalog/pg_type.h"
#include "miscadmin.h"
#include "parser/parse_oper.h"
+#include "parser/scansup.h"
#include "utils/acl.h"
#include "utils/builtins.h"
#include "utils/lsyscache.h"
@@ -79,6 +80,10 @@ validOperatorName(const char *name)
if (len == 0 || len >= NAMEDATALEN)
return false;
+ /* Is this a Named Operator? */
+ if (validNamedOperator(name))
+ return true;
+
/* Can't contain any invalid characters */
/* Test string here should match op_chars in scan.l */
if (strspn(name, "~!@#^&|`?+-*/%<>=") != len)
diff --git a/src/backend/parser/scan.l b/src/backend/parser/scan.l
index db8b0fe8eb..fa282dde4d 100644
--- a/src/backend/parser/scan.l
+++ b/src/backend/parser/scan.l
@@ -379,6 +379,15 @@ self [,()\[\].;\:\+\-\*\/\%\^\<\>\=]
op_chars [\~\!\@\#\^\&\|\`\?\+\-\*\/\%\<\>\=]
operator {op_chars}+
+/*
+ * Named Operators, e.g. \#foo
+ *
+ * {namedopfailed} is an error rule to avoid scanner backup when
+ * {namedop} fails to match its trailing {identifier}.
+ */
+namedop \\\#{identifier}
+namedopfailed \\\#
+
/*
* Numbers
*
@@ -768,6 +777,18 @@ other .
}
<xdolq><<EOF>> { yyerror("unterminated dollar-quoted string"); }
+{namedop} {
+ SET_YYLLOC();
+ if (yyleng >= NAMEDATALEN)
+ yyerror("operator name too long");
+ yylval->str = downcase_identifier(yytext, yyleng, false, false);
+ return Op;
+ }
+
+{namedopfailed} {
+ yyerror("unexpected token");
+ }
+
{xdstart} {
SET_YYLLOC();
BEGIN(xd);
diff --git a/src/backend/parser/scansup.c b/src/backend/parser/scansup.c
index 602108a40f..db94a927f7 100644
--- a/src/backend/parser/scansup.c
+++ b/src/backend/parser/scansup.c
@@ -125,3 +125,65 @@ scanner_isspace(char ch)
return true;
return false;
}
+
+/*
+ * validNamedOperator() -- return true if name adheres to the scanner rule
+ * {namedop}
+ */
+bool
+validNamedOperator(const char *name)
+{
+ size_t len = strlen(name);
+ bool valid_identifier;
+
+ if (len < 3 || len >= NAMEDATALEN)
+ return false;
+
+ if (name[0] != '\\' || name[1] != '#')
+ return false;
+
+ // Disregard the delimiters
+ valid_identifier = validIdentifier(name + 2);
+
+ return valid_identifier;
+}
+
+/*
+ * validIdentifier() -- return true if name adheres to the scanner rule
+ * {identifier}
+ *
+ * Note: this function does not check if the identifier length
+ * is less than NAMEDATALEN.
+ */
+bool
+validIdentifier(const char *name)
+{
+ uint8 c;
+ size_t i, len = strlen(name);
+
+ // Reject if first character is not part of ident_start
+ c = name[0];
+ if ( !(c == '_'
+ || (c >='A' && c <= 'Z')
+ || (c >='a' && c <= 'z')
+ || (c >= 0200 && c <= 0377)))
+ {
+ return false;
+ }
+
+ // Reject if other characters are not part of ident_cont
+ for (i = 1; i < len; ++i)
+ {
+ c = name[i];
+ if ( !(c == '_' || c == '$'
+ || (c >='A' && c <= 'Z')
+ || (c >='a' && c <= 'z')
+ || (c >='0' && c <= '9')
+ || (c >= 0200 && c <= 0377)))
+ {
+ return false;
+ }
+ }
+
+ return true;
+}
diff --git a/src/fe_utils/psqlscan.l b/src/fe_utils/psqlscan.l
index ae531ec240..8ec9572d5e 100644
--- a/src/fe_utils/psqlscan.l
+++ b/src/fe_utils/psqlscan.l
@@ -317,6 +317,15 @@ self [,()\[\].;\:\+\-\*\/\%\^\<\>\=]
op_chars [\~\!\@\#\^\&\|\`\?\+\-\*\/\%\<\>\=]
operator {op_chars}+
+/*
+ * Named Operators, e.g. \#foo
+ *
+ * {namedopfailed} is an error rule to avoid scanner backup when
+ * {namedop} fails to match its trailing {identifier}.
+ */
+namedop \\\#{identifier}
+namedopfailed \\\#
+
/*
* Numbers
*
@@ -570,6 +579,14 @@ other .
ECHO;
}
+{namedop} {
+ ECHO;
+ }
+
+{namedopfailed} {
+ ECHO;
+ }
+
{xdstart} {
BEGIN(xd);
ECHO;
diff --git a/src/include/parser/scansup.h b/src/include/parser/scansup.h
index ff65224bf6..0f6aff8b44 100644
--- a/src/include/parser/scansup.h
+++ b/src/include/parser/scansup.h
@@ -24,4 +24,7 @@ extern void truncate_identifier(char *ident, int len, bool warn);
extern bool scanner_isspace(char ch);
+extern bool validNamedOperator(const char *name);
+extern bool validIdentifier(const char *name);
+
#endif /* SCANSUP_H */
On Sat, Jan 14, 2023 at 6:14 AM Gurjeet Singh <gurjeet@singh.im> wrote:
I agree that an identifier _surrounded_ by the same token (e.g. #foo#)
or the pairing token (e.g. {foo}) looks better aesthetically, so I am
okay with any of the following variations of the scheme, as well:\#foo\# (tested; works)
\#foo# (not tested; reduces ident length by 1)We can choose a different character, instead of #. Perhaps \{foo} !
Please find attached the patch that uses \{foo} styled Named
Operators. This is in line with Tom's reluctant hint at possibly using
curly braces as delimiter characters. Since the curly braces are used
by the SQL Specification for row pattern recognition, this patch
proposes escaping the first of the curly braces.
We can get rid of the leading backslash, if (a) we're confident that
SQL committee will not use curly braces anywhere else, and (b) if
we're confident that if/when Postgres supports Row Pattern Recognition
feature, we'll be able to treat curly braces inside the PATTERN clause
specially. Since both of those conditions are unlikely, I think we
must settle for the escaped-first-curly-brace style for the naming our
operators.
Keeping with the previous posts, here's a sample SQL script showing
what the proposed syntax will look like in action. Personally, I
prefer the \#foo style, since the \# prefix stands out among the text,
better than \{..} does, and because # character is a better signal of
an operator than {.
create operator \{add_point}
(function = box_add, leftarg = box, rightarg = point);
create table test(a box);
insert into test values('((0,0),(1,1))'), ('((0,0),(2,1))');
select a as original, a \{add_point} '(1,1)' as modified from test;
drop operator \{add_point}(box, point);
Best regards,
Gurjeet
http://Gurje.et
Attachments:
named_operators_v3.patchapplication/x-patch; name=named_operators_v3.patchDownload
diff --git a/src/backend/catalog/pg_operator.c b/src/backend/catalog/pg_operator.c
index 1017f2eed1..c5b8562cb5 100644
--- a/src/backend/catalog/pg_operator.c
+++ b/src/backend/catalog/pg_operator.c
@@ -31,6 +31,7 @@
#include "catalog/pg_type.h"
#include "miscadmin.h"
#include "parser/parse_oper.h"
+#include "parser/scansup.h"
#include "utils/acl.h"
#include "utils/builtins.h"
#include "utils/lsyscache.h"
@@ -79,6 +80,10 @@ validOperatorName(const char *name)
if (len == 0 || len >= NAMEDATALEN)
return false;
+ /* Is this a Named Operator? */
+ if (validNamedOperator(name))
+ return true;
+
/* Can't contain any invalid characters */
/* Test string here should match op_chars in scan.l */
if (strspn(name, "~!@#^&|`?+-*/%<>=") != len)
diff --git a/src/backend/parser/scan.l b/src/backend/parser/scan.l
index db8b0fe8eb..8587b82c8d 100644
--- a/src/backend/parser/scan.l
+++ b/src/backend/parser/scan.l
@@ -379,6 +379,16 @@ self [,()\[\].;\:\+\-\*\/\%\^\<\>\=]
op_chars [\~\!\@\#\^\&\|\`\?\+\-\*\/\%\<\>\=]
operator {op_chars}+
+/*
+ * Named Operators, e.g. \{foo}
+ *
+ * {namedopfailed*} are error rules to avoid scanner backup when
+ * {namedop} fails to match its trailing tokens.
+ */
+namedop \\\{{identifier}\}
+namedopfailed1 \\\{{identifier}
+namedopfailed2 \\\{
+
/*
* Numbers
*
@@ -768,6 +778,23 @@ other .
}
<xdolq><<EOF>> { yyerror("unterminated dollar-quoted string"); }
+{namedop} {
+ SET_YYLLOC();
+ if (yyleng >= NAMEDATALEN)
+ yyerror("operator name too long");
+ /* XXX Should we support double-quoted, case sensitive names? */
+ yylval->str = downcase_identifier(yytext, yyleng, false, false);
+ return Op;
+ }
+
+{namedopfailed1} {
+ yyerror("unexpected token");
+ }
+
+{namedopfailed2} {
+ yyerror("unexpected token");
+ }
+
{xdstart} {
SET_YYLLOC();
BEGIN(xd);
diff --git a/src/backend/parser/scansup.c b/src/backend/parser/scansup.c
index 602108a40f..05c46ae09e 100644
--- a/src/backend/parser/scansup.c
+++ b/src/backend/parser/scansup.c
@@ -125,3 +125,70 @@ scanner_isspace(char ch)
return true;
return false;
}
+
+/*
+ * validNamedOperator() -- return true if name adheres to the scanner rule
+ * {namedop}
+ */
+bool
+validNamedOperator(const char *name)
+{
+ size_t len = strlen(name);
+ bool valid_identifier;
+ char *tmp;
+
+ if (len < 4 || len >= NAMEDATALEN)
+ return false;
+
+ if (name[0] != '\\' || name[1] != '{' || name[len-1] != '}')
+ return false;
+
+ tmp = pstrdup(name);
+
+ // Disregard the delimiters
+ tmp[len-1] = '\0';
+ valid_identifier = validIdentifier(tmp + 2);
+ pfree(tmp);
+
+ return valid_identifier;
+}
+
+/*
+ * validIdentifier() -- return true if name adheres to the scanner rule
+ * {identifier}
+ *
+ * Note: this function does not check if the identifier length
+ * is less than NAMEDATALEN.
+ */
+bool
+validIdentifier(const char *name)
+{
+ uint8 c;
+ size_t i, len = strlen(name);
+
+ // Reject if first character is not part of ident_start
+ c = name[0];
+ if ( !(c == '_'
+ || (c >='A' && c <= 'Z')
+ || (c >='a' && c <= 'z')
+ || (c >= 0200 && c <= 0377)))
+ {
+ return false;
+ }
+
+ // Reject if other characters are not part of ident_cont
+ for (i = 1; i < len; ++i)
+ {
+ c = name[i];
+ if ( !(c == '_' || c == '$'
+ || (c >='A' && c <= 'Z')
+ || (c >='a' && c <= 'z')
+ || (c >='0' && c <= '9')
+ || (c >= 0200 && c <= 0377)))
+ {
+ return false;
+ }
+ }
+
+ return true;
+}
diff --git a/src/fe_utils/psqlscan.l b/src/fe_utils/psqlscan.l
index ae531ec240..98a2561886 100644
--- a/src/fe_utils/psqlscan.l
+++ b/src/fe_utils/psqlscan.l
@@ -317,6 +317,16 @@ self [,()\[\].;\:\+\-\*\/\%\^\<\>\=]
op_chars [\~\!\@\#\^\&\|\`\?\+\-\*\/\%\<\>\=]
operator {op_chars}+
+/*
+ * Named Operators, e.g. :foo:
+ *
+ * {namedopfailed*} are error rules to avoid scanner backup when
+ * {namedop} fails to match its trailing tokens.
+ */
+namedop \\\{{identifier}\}
+namedopfailed1 \\\{{identifier}
+namedopfailed2 \\\{
+
/*
* Numbers
*
@@ -570,6 +580,18 @@ other .
ECHO;
}
+{namedop} {
+ ECHO;
+ }
+
+{namedopfailed1} {
+ ECHO;
+ }
+
+{namedopfailed2} {
+ ECHO;
+ }
+
{xdstart} {
BEGIN(xd);
ECHO;
diff --git a/src/include/parser/scansup.h b/src/include/parser/scansup.h
index ff65224bf6..0f6aff8b44 100644
--- a/src/include/parser/scansup.h
+++ b/src/include/parser/scansup.h
@@ -24,4 +24,7 @@ extern void truncate_identifier(char *ident, int len, bool warn);
extern bool scanner_isspace(char ch);
+extern bool validNamedOperator(const char *name);
+extern bool validIdentifier(const char *name);
+
#endif /* SCANSUP_H */
On Fri, Jan 20, 2023 at 9:17 AM Gurjeet Singh <gurjeet@singh.im> wrote:
On Sat, Jan 14, 2023 at 6:14 AM Gurjeet Singh <gurjeet@singh.im> wrote:
I agree that an identifier _surrounded_ by the same token (e.g. #foo#)
or the pairing token (e.g. {foo}) looks better aesthetically, so I am
okay with any of the following variations of the scheme, as well:\#foo\# (tested; works)
\#foo# (not tested; reduces ident length by 1)We can choose a different character, instead of #. Perhaps \{foo} !
Please find attached the patch that uses \{foo} styled Named
Operators. This is in line with Tom's reluctant hint at possibly using
curly braces as delimiter characters. Since the curly braces are used
by the SQL Specification for row pattern recognition, this patch
proposes escaping the first of the curly braces.We can get rid of the leading backslash, if (a) we're confident that
SQL committee will not use curly braces anywhere else, and (b) if
we're confident that if/when Postgres supports Row Pattern Recognition
feature, we'll be able to treat curly braces inside the PATTERN clause
specially. Since both of those conditions are unlikely, I think we
must settle for the escaped-first-curly-brace style for the naming our
operators.Keeping with the previous posts, here's a sample SQL script showing
what the proposed syntax will look like in action. Personally, I
prefer the \#foo style, since the \# prefix stands out among the text,
better than \{..} does, and because # character is a better signal of
an operator than {.create operator \{add_point}
(function = box_add, leftarg = box, rightarg = point);
create table test(a box);
insert into test values('((0,0),(1,1))'), ('((0,0),(2,1))');
select a as original, a \{add_point} '(1,1)' as modified from test;
drop operator \{add_point}(box, point);Best regards,
Gurjeet
http://Gurje.et
Hi,
Since `validIdentifier` doesn't modify the contents of `name` string, it
seems that there is no need to create `tmp` string in `validNamedOperator`.
You can pass the start and end offsets into the string (name) as second and
third parameters to `validIdentifier`.
Cheers
On Fri, Jan 20, 2023 at 9:32 AM Ted Yu <yuzhihong@gmail.com> wrote:
Since `validIdentifier` doesn't modify the contents of `name` string, it seems that there is no need to create `tmp` string in `validNamedOperator`.
You can pass the start and end offsets into the string (name) as second and third parameters to `validIdentifier`.
Thanks for reviewing the patch!
I was making a temporary copy of the string, since I had to modify it
before the validation, whereas the callee expects a `const char*`. I
agree that the same check can be done with more elegance, while
eliminating the temporary allocation. Please find the updated patch
attached.
Instead of passing the start and end of region I want to check, as
suggested, I'm now passing just the length of the string I want
validated. But I think that's for the better, since it now aligns with
the comment that validIdentifier() does not check if the passed string
is shorter than NAMEDATALEN.
Best regards,
Gurjeet
http://Gurje.et
Attachments:
named_operators_v4.patchapplication/octet-stream; name=named_operators_v4.patchDownload
diff --git a/src/backend/catalog/pg_operator.c b/src/backend/catalog/pg_operator.c
index 1017f2eed1..c5b8562cb5 100644
--- a/src/backend/catalog/pg_operator.c
+++ b/src/backend/catalog/pg_operator.c
@@ -31,6 +31,7 @@
#include "catalog/pg_type.h"
#include "miscadmin.h"
#include "parser/parse_oper.h"
+#include "parser/scansup.h"
#include "utils/acl.h"
#include "utils/builtins.h"
#include "utils/lsyscache.h"
@@ -79,6 +80,10 @@ validOperatorName(const char *name)
if (len == 0 || len >= NAMEDATALEN)
return false;
+ /* Is this a Named Operator? */
+ if (validNamedOperator(name))
+ return true;
+
/* Can't contain any invalid characters */
/* Test string here should match op_chars in scan.l */
if (strspn(name, "~!@#^&|`?+-*/%<>=") != len)
diff --git a/src/backend/parser/scan.l b/src/backend/parser/scan.l
index db8b0fe8eb..8587b82c8d 100644
--- a/src/backend/parser/scan.l
+++ b/src/backend/parser/scan.l
@@ -379,6 +379,16 @@ self [,()\[\].;\:\+\-\*\/\%\^\<\>\=]
op_chars [\~\!\@\#\^\&\|\`\?\+\-\*\/\%\<\>\=]
operator {op_chars}+
+/*
+ * Named Operators, e.g. \{foo}
+ *
+ * {namedopfailed*} are error rules to avoid scanner backup when
+ * {namedop} fails to match its trailing tokens.
+ */
+namedop \\\{{identifier}\}
+namedopfailed1 \\\{{identifier}
+namedopfailed2 \\\{
+
/*
* Numbers
*
@@ -768,6 +778,23 @@ other .
}
<xdolq><<EOF>> { yyerror("unterminated dollar-quoted string"); }
+{namedop} {
+ SET_YYLLOC();
+ if (yyleng >= NAMEDATALEN)
+ yyerror("operator name too long");
+ /* XXX Should we support double-quoted, case sensitive names? */
+ yylval->str = downcase_identifier(yytext, yyleng, false, false);
+ return Op;
+ }
+
+{namedopfailed1} {
+ yyerror("unexpected token");
+ }
+
+{namedopfailed2} {
+ yyerror("unexpected token");
+ }
+
{xdstart} {
SET_YYLLOC();
BEGIN(xd);
diff --git a/src/backend/parser/scansup.c b/src/backend/parser/scansup.c
index 602108a40f..f209ac3752 100644
--- a/src/backend/parser/scansup.c
+++ b/src/backend/parser/scansup.c
@@ -125,3 +125,65 @@ scanner_isspace(char ch)
return true;
return false;
}
+
+/*
+ * validNamedOperator() -- return true if name adheres to the scanner rule
+ * {namedop}
+ */
+bool
+validNamedOperator(const char *name)
+{
+ size_t len = strlen(name);
+ bool valid_identifier;
+
+ if (len < 4 || len >= NAMEDATALEN)
+ return false;
+
+ if (name[0] != '\\' || name[1] != '{' || name[len-1] != '}')
+ return false;
+
+ // Disregard the delimiters
+ valid_identifier = validIdentifier(name + 2, len - 3);
+
+ return valid_identifier;
+}
+
+/*
+ * validIdentifier() -- return true if name adheres to the scanner rule
+ * {identifier}
+ *
+ * Note: this function does not check if the identifier length
+ * is less than NAMEDATALEN.
+ */
+bool
+validIdentifier(const char *name, size_t len)
+{
+ uint8 c;
+ size_t i;
+
+ // Reject if first character is not part of ident_start
+ c = name[0];
+ if ( !(c == '_'
+ || (c >='A' && c <= 'Z')
+ || (c >='a' && c <= 'z')
+ || (c >= 0200 && c <= 0377)))
+ {
+ return false;
+ }
+
+ // Reject if other characters are not part of ident_cont
+ for (i = 1; i < len; ++i)
+ {
+ c = name[i];
+ if ( !(c == '_' || c == '$'
+ || (c >='A' && c <= 'Z')
+ || (c >='a' && c <= 'z')
+ || (c >='0' && c <= '9')
+ || (c >= 0200 && c <= 0377)))
+ {
+ return false;
+ }
+ }
+
+ return true;
+}
diff --git a/src/fe_utils/psqlscan.l b/src/fe_utils/psqlscan.l
index ae531ec240..98a2561886 100644
--- a/src/fe_utils/psqlscan.l
+++ b/src/fe_utils/psqlscan.l
@@ -317,6 +317,16 @@ self [,()\[\].;\:\+\-\*\/\%\^\<\>\=]
op_chars [\~\!\@\#\^\&\|\`\?\+\-\*\/\%\<\>\=]
operator {op_chars}+
+/*
+ * Named Operators, e.g. :foo:
+ *
+ * {namedopfailed*} are error rules to avoid scanner backup when
+ * {namedop} fails to match its trailing tokens.
+ */
+namedop \\\{{identifier}\}
+namedopfailed1 \\\{{identifier}
+namedopfailed2 \\\{
+
/*
* Numbers
*
@@ -570,6 +580,18 @@ other .
ECHO;
}
+{namedop} {
+ ECHO;
+ }
+
+{namedopfailed1} {
+ ECHO;
+ }
+
+{namedopfailed2} {
+ ECHO;
+ }
+
{xdstart} {
BEGIN(xd);
ECHO;
diff --git a/src/include/parser/scansup.h b/src/include/parser/scansup.h
index ff65224bf6..b9eb0a96a4 100644
--- a/src/include/parser/scansup.h
+++ b/src/include/parser/scansup.h
@@ -24,4 +24,7 @@ extern void truncate_identifier(char *ident, int len, bool warn);
extern bool scanner_isspace(char ch);
+extern bool validNamedOperator(const char *name);
+extern bool validIdentifier(const char *name, size_t len);
+
#endif /* SCANSUP_H */
diff --git a/test.sql b/test.sql
new file mode 100644
index 0000000000..4d0c44c629
--- /dev/null
+++ b/test.sql
@@ -0,0 +1,38 @@
+
+create operator \{hello_named_operators} (function = box_add, leftarg = box, rightarg = point);
+
+select '((0,0), (1,1))'::box \{hello_named_operators} '(1,1)'::point;
+
+drop operator \{hello_named_operators}(box, point);
+
+create operator \{add_point} (function = box_add, leftarg = box, rightarg = point);
+
+create table test(a box);
+
+insert into test values('((0,0),(1,1))'), ('((0,0),(2,1))');
+
+select a as original, a \{add_point} '(1,1)' as modified from test;
+
+CREATE OPERATOR \{equal_1} (
+ LEFTARG = int,
+ RIGHTARG = int,
+ PROCEDURE = int4eq,
+ COMMUTATOR = \{equal_1}
+);
+
+CREATE OPERATOR \{equal_2} (
+ LEFTARG = int,
+ RIGHTARG = int,
+ PROCEDURE = int4eq,
+ COMMUTATOR = equal_2 -- The absence of delimiters causes an expected failure
+);
+
+
+\! pg_dump -U postgres --schema public
+
+drop operator \{add_point}(box, point);
+drop operator \{equal_1}(int, int);
+drop operator \{equal_2}(int, int);
+
+drop table test;
+
On 12.01.23 14:55, Matthias van de Meent wrote:
Matter of taste, I guess. But more importantly, defining an operator
gives you many additional features that the planner can use to
optimize your query differently, which it can't do with functions. See
the COMMUTATOR, HASHES, etc. clause in the CREATE OPERATOR command.I see. Wouldn't it be better then to instead make it possible for the
planner to detect the use of the functions used in operators and treat
them as aliases of the operator? Or am I missing something w.r.t.
differences between operator and function invocation?E.g. indexes on `int8pl(my_bigint, 1)` does not match queries for
`my_bigint + 1` (and vice versa), while they should be able to support
that, as OPERATOR(pg_catalog.+(int8, int8)) 's function is int8pl.
I have been thinking about something like this for a long time.
Basically, we would merge pg_proc and pg_operator internally. Then, all
the special treatment for operators would also be available to
two-argument functions.
On Fri, 27 Jan 2023 at 16:26, Peter Eisentraut
<peter.eisentraut@enterprisedb.com> wrote:
On 12.01.23 14:55, Matthias van de Meent wrote:
Matter of taste, I guess. But more importantly, defining an operator
gives you many additional features that the planner can use to
optimize your query differently, which it can't do with functions. See
the COMMUTATOR, HASHES, etc. clause in the CREATE OPERATOR command.I see. Wouldn't it be better then to instead make it possible for the
planner to detect the use of the functions used in operators and treat
them as aliases of the operator? Or am I missing something w.r.t.
differences between operator and function invocation?E.g. indexes on `int8pl(my_bigint, 1)` does not match queries for
`my_bigint + 1` (and vice versa), while they should be able to support
that, as OPERATOR(pg_catalog.+(int8, int8)) 's function is int8pl.I have been thinking about something like this for a long time.
Basically, we would merge pg_proc and pg_operator internally. Then, all
the special treatment for operators would also be available to
two-argument functions.
And single-argument functions in case of prefix operators, right?
On 27.01.23 16:34, Matthias van de Meent wrote:
On Fri, 27 Jan 2023 at 16:26, Peter Eisentraut
<peter.eisentraut@enterprisedb.com> wrote:On 12.01.23 14:55, Matthias van de Meent wrote:
Matter of taste, I guess. But more importantly, defining an operator
gives you many additional features that the planner can use to
optimize your query differently, which it can't do with functions. See
the COMMUTATOR, HASHES, etc. clause in the CREATE OPERATOR command.I see. Wouldn't it be better then to instead make it possible for the
planner to detect the use of the functions used in operators and treat
them as aliases of the operator? Or am I missing something w.r.t.
differences between operator and function invocation?E.g. indexes on `int8pl(my_bigint, 1)` does not match queries for
`my_bigint + 1` (and vice versa), while they should be able to support
that, as OPERATOR(pg_catalog.+(int8, int8)) 's function is int8pl.I have been thinking about something like this for a long time.
Basically, we would merge pg_proc and pg_operator internally. Then, all
the special treatment for operators would also be available to
two-argument functions.And single-argument functions in case of prefix operators, right?
Right.
(The removal of postfix operators is helpful to remove ambiguity here.)
Peter Eisentraut <peter.eisentraut@enterprisedb.com> writes:
On 12.01.23 14:55, Matthias van de Meent wrote:
Matter of taste, I guess. But more importantly, defining an operator
gives you many additional features that the planner can use to
optimize your query differently, which it can't do with functions. See
the COMMUTATOR, HASHES, etc. clause in the CREATE OPERATOR command.
I see. Wouldn't it be better then to instead make it possible for the
planner to detect the use of the functions used in operators and treat
them as aliases of the operator? Or am I missing something w.r.t.
differences between operator and function invocation?
E.g. indexes on `int8pl(my_bigint, 1)` does not match queries for
`my_bigint + 1` (and vice versa), while they should be able to support
that, as OPERATOR(pg_catalog.+(int8, int8)) 's function is int8pl.
I have been thinking about something like this for a long time.
Basically, we would merge pg_proc and pg_operator internally. Then, all
the special treatment for operators would also be available to
two-argument functions.
I had a thought about this ...
I do not think this proposal is going anywhere as-written.
There seems very little chance that we can invent a syntax that
is concise, non-ugly, and not likely to get blindsided by future
SQL spec extensions. Even if we were sure that, say, "{foo}"
was safe from spec interference, the syntax "a {foo} b" has
exactly nothing to recommend it compared to "foo(a,b)".
It's not shorter, it's not standard, it won't help any pre-existing
queries, and it can't use function-call features such as named
arguments.
As Matthias said, what we actually need is for the planner to be able
to optimize function calls on the same basis as operators. We should
tackle that directly rather than inventing new syntax.
We could go after that by inventing a bunch of new function properties
to parallel operator properties, but there is a simpler way: just
teach the planner to look to see if a function call is a call of the
underlying function of some operator, and if so treat it like that
operator. Right now that'd be an expensive lookup, but we could
remove that objection with an index on pg_operator.oprcode or a
single new field in pg_proc.
This approach does have a couple of shortcomings:
* You still have to invent an operator name, even if you never
plan to use it in queries. This is just cosmetic though.
It's not going to matter if the operator name is long or looks like
line noise, if you only need to use it a few times in setup DDL.
* We could not extend this to support index functions with more than
two arguments, a request we've heard once in awhile in the past.
Our answer to that so far has been "make a function/operator with
one indexed argument and one composite-type argument", which is a
bit of an ugly workaround but seems to be serviceable enough.
On the whole I don't think these shortcomings are big enough
to justify all the work that would be involved in attaching
operator-like optimization information directly to functions.
(To mention just one nontrivial stumbling block: do you really
want to invent "shell functions" similar to the shell-operator
hack? If not, how are you going to handle declaration of
commutator pairs?)
In the long run this might lead to thinking of pg_operator as
an extension of pg_proc in the same way that pg_aggregate is.
But we have not unified pg_aggregate into pg_proc, and I don't
think anyone wants to, because pg_proc rows are undesirably
wide already. There's a similar objection to attaching
optimization fields directly to pg_proc.
You could imagine some follow-on internal cleanup like trying
to unify FuncExpr and OpExpr into a single node type (carrying
a function OID and optionally an operator OID). But that need
not have any user-visible impact either; it'd mainly be good
for eliminating a lot of near-duplicate code.
regards, tom lane
I wrote:
This approach does have a couple of shortcomings:
* You still have to invent an operator name, even if you never
plan to use it in queries. This is just cosmetic though.
It's not going to matter if the operator name is long or looks like
line noise, if you only need to use it a few times in setup DDL.
Oh, one other thought is that we could address that complaint
by allowing OPERATOR(identifier), so that your DDL could use
a meaningful name for the operator. I see that we don't
actually support OPERATOR() right now in CREATE OPERATOR or
ALTER OPERATOR:
regression=# create operator operator(+) (function = foo);
ERROR: syntax error at or near "("
LINE 1: create operator operator(+) (function = foo);
^
but I doubt that'd be hard to fix.
regards, tom lane
On 8 Feb 2023, at 16:57, Tom Lane <tgl@sss.pgh.pa.us> wrote:
I do not think this proposal is going anywhere as-written.
Reading this thread, it seems there is concensus against this proposal in its
current form, and no updated patch has been presented, so I will mark this as
Returned with Feedback. Please feel free to resubmit to a future CF when there
is renewed interest in working on this.
--
Daniel Gustafsson