Hook for extensible parsing.
Hi,
Being able to extend the core parser has been requested multiple times, and AFAICT
all previous attempts were rejected not because this isn't wanted but because
the proposed implementations required plugins to reimplement all of the core
grammar with their own changes, as bison-generated parsers aren't extensible.
I'd like to propose an alternative approach, which is to allow multiple parsers
to coexist, and let third-party parsers optionally fall back on the core
parser. I'm sending this now as a follow-up of [1]
(/messages/by-id/20210315164336.ak32whndsxna5mjf@nol) and to avoid duplicated
efforts, as multiple people are interested in this topic.
Obviously, since this is only about parsing, all modules can only implement
some kind of syntactic sugar, as they have to produce valid parsetrees, but
this could be a first step toward later allowing custom nodes and letting
plugins implement e.g. new UTILITY commands.
So, this approach should allow different custom parser implementations:

1. Implement only a few new commands on top of the core grammar. For instance,
an extension could add support for CREATE [PHYSICAL | LOGICAL] REPLICATION SLOT
and rewrite it to a SelectStmt on top of the existing function, or add a
CREATE HYPOTHETICAL INDEX, which would internally add a new option in
IndexStmt->options, to be intercepted in processUtility to bypass its
execution and handle it in the extension instead.

2. Implement a totally different grammar for a different language. In case of
error, just silently fall back to the core parser (or another hook) so both
parsers can still be used. Any language could be parsed as long as you can
produce a valid postgres parsetree.

3. Implement a superset of the core grammar and replace the core parser
entirely. This could arguably be done like the 1st case, but the idea is to
avoid possibly parsing the same input string twice, or to forbid the core
parser if that's somehow wanted.
I'm attaching some POC patches that implement this approach to start a
discussion. I split the infrastructure part into 2 patches to make it easier
to review, and I'm also adding 2 other patches with a small parser
implementation to be able to test the infrastructure. Here are some more
details on the patches and implementation:
0001 simply adds a parser hook, which is called instead of raw_parser. This is
enough to make multiple parsers coexist, with one exception: multi-statement
query strings. If multiple statements are provided, then all of them will be
parsed using the same grammar, which obviously won't work if they are written
for different grammars.
0002 implements a lame "sqlol" parser, based on LOLCODE syntax, with only the
ability to produce "select [col, ] col FROM table" parsetrees, for testing
purposes. I chose it to ensure that everything works properly even with a
totally different grammar that has different keywords, and which doesn't even
end statements with a semicolon but with a plain keyword.
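For reference, going by the grammar in this patch (HAI FCONST I HAS A
qualified_name GIMMEH gimmeh_list, with KTHXBYE ending statements), a sqlol
equivalent of "SELECT id, val FROM mytable" should look something like:

```
HAI 1.2 I HAS A mytable GIMMEH id, val KTHXBYE
```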
0003 is where the real modifications are done to allow multi-statement strings
to be parsed using different grammars. It implements a new MODE_SINGLE_QUERY
mode, which is used when a parser_hook is present. In that case,
pg_parse_query() will only parse part of the query string and loop until
everything is parsed (or some error happens).
pg_parse_query() will instruct plugins to parse one query at a time. They're
free to ignore that mode if they want to implement the 3rd case. If so, they
should either return multiple RawStmts, a single RawStmt with a stmt_len of 0
or strlen(query_string), or error out. Otherwise, they will implement either
case 1 or 2, and they should always return a List containing a single RawStmt
with a properly set stmt_len, even if the underlying statement is NULL. This
is required to properly skip valid strings that don't contain a statement, and
pg_parse_query() will skip any RawStmt that doesn't contain an underlying
statement.
It also teaches the core parser to do the same, by optionally starting to
parse somewhere in the input string and stopping once a valid statement is
found. Note that the whole input string is provided to the parsers in order to
report correct cursor positions, so all tokens can get a correct location.
This means that raw_parser()'s signature needs an additional offset to know
where parsing should start.
Finally, 0004 modifies the sqlol parser to implement the MODE_SINGLE_QUERY
mode, adds grammar for creating views, and adds some regression tests to
validate proper parsing and error location reporting with multi-statement
input strings.
As far as I can tell it's all working as expected, but I may have missed some
use cases. The regression tests still pass with the additional parser
configured. The only difference is for pg_stat_statements: in
MODE_SINGLE_QUERY the trailing semicolon has to be included in the statement,
since other grammars may treat semicolons differently.
The obvious drawback is that this can cause overhead, as the same input can be
parsed multiple times. This could be avoided by plugins implementing a GUC to
enable/disable their parser, so it's only active by default for some
users/databases, or has to be enabled interactively by the client application.
The error messages can also be unhelpful for cases 1 and 2. If the custom
parser doesn't error out, syntax errors will be raised by the core parser
based on the core grammar, which will likely point out an unrelated problem.
Some of that can be avoided by letting the custom parsers raise errors when
they know for sure they're parsing what they're supposed to parse (there's an
example of that in the sqlol parser for qualified_name parsing, as it can only
happen once some specific keywords have already matched). For the rest of the
errors, the only option I can think of is another GUC to let custom parsers
always raise an error (or a warning) to help people debug their queries.
I'll park this patch in the next commitfest so it can be discussed when pg15
development starts.
Attachments:
v1-0001-Add-a-parser_hook-hook.patch (text/x-diff; charset=us-ascii)
From fa1f13fdf771c02e5a68388eac71493006113202 Mon Sep 17 00:00:00 2001
From: Julien Rouhaud <julien.rouhaud@free.fr>
Date: Wed, 21 Apr 2021 22:47:18 +0800
Subject: [PATCH v1 1/4] Add a parser_hook hook.
This does nothing but allow third-party plugins to implement a different
syntax, and fall back on the core parser if they don't implement a superset of
the supported core syntax.
---
src/backend/tcop/postgres.c | 16 ++++++++++++++--
src/include/tcop/tcopprot.h | 5 +++++
2 files changed, 19 insertions(+), 2 deletions(-)
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 2d6d145ecc..e91db69830 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -99,6 +99,9 @@ int log_statement = LOGSTMT_NONE;
/* GUC variable for maximum stack depth (measured in kilobytes) */
int max_stack_depth = 100;
+/* Hook for plugins to get control in pg_parse_query() */
+parser_hook_type parser_hook = NULL;
+
/* wait N seconds to allow attach from a debugger */
int PostAuthDelay = 0;
@@ -589,18 +592,27 @@ ProcessClientWriteInterrupt(bool blocked)
* database tables. So, we rely on the raw parser to determine whether
* we've seen a COMMIT or ABORT command; when we are in abort state, other
* commands are not processed any further than the raw parse stage.
+ *
+ * To support loadable plugins that monitor parsing or implement SQL syntactic
+ * sugar, we provide a hook variable that lets a plugin get control before and
+ * after the standard parsing process. If the plugin only implements a subset
+ * of the syntax supported by postgres, it's its duty to call raw_parser (or
+ * the previous hook, if any) for the statements it doesn't understand.
*/
List *
pg_parse_query(const char *query_string)
{
- List *raw_parsetree_list;
+ List *raw_parsetree_list = NIL;
TRACE_POSTGRESQL_QUERY_PARSE_START(query_string);
if (log_parser_stats)
ResetUsage();
- raw_parsetree_list = raw_parser(query_string, RAW_PARSE_DEFAULT);
+ if (parser_hook)
+ raw_parsetree_list = (*parser_hook) (query_string, RAW_PARSE_DEFAULT);
+ else
+ raw_parsetree_list = raw_parser(query_string, RAW_PARSE_DEFAULT);
if (log_parser_stats)
ShowUsage("PARSER STATISTICS");
diff --git a/src/include/tcop/tcopprot.h b/src/include/tcop/tcopprot.h
index 968345404e..131dc2b22e 100644
--- a/src/include/tcop/tcopprot.h
+++ b/src/include/tcop/tcopprot.h
@@ -17,6 +17,7 @@
#include "nodes/params.h"
#include "nodes/parsenodes.h"
#include "nodes/plannodes.h"
+#include "parser/parser.h"
#include "storage/procsignal.h"
#include "utils/guc.h"
#include "utils/queryenvironment.h"
@@ -43,6 +44,10 @@ typedef enum
extern PGDLLIMPORT int log_statement;
+/* Hook for plugins to get control in pg_parse_query() */
+typedef List *(*parser_hook_type) (const char *str, RawParseMode mode);
+extern PGDLLIMPORT parser_hook_type parser_hook;
+
extern List *pg_parse_query(const char *query_string);
extern List *pg_rewrite_query(Query *query);
extern List *pg_analyze_and_rewrite(RawStmt *parsetree,
--
2.30.1
v1-0002-Add-a-sqlol-parser.patch (text/x-diff; charset=us-ascii)
From 559760e9fa5adf32d6ca6ed2236fa5f4bb0471ea Mon Sep 17 00:00:00 2001
From: Julien Rouhaud <julien.rouhaud@free.fr>
Date: Wed, 21 Apr 2021 23:54:02 +0800
Subject: [PATCH v1 2/4] Add a sqlol parser.
This is a toy example of an alternative grammar that only accepts a
LOLCODE-compatible version of
SELECT [column, ] column FROM tablename
and falls back on the core parser for everything else.
---
contrib/Makefile | 1 +
contrib/sqlol/.gitignore | 7 +
contrib/sqlol/Makefile | 33 ++
contrib/sqlol/sqlol.c | 107 +++++++
contrib/sqlol/sqlol_gram.y | 440 ++++++++++++++++++++++++++
contrib/sqlol/sqlol_gramparse.h | 61 ++++
contrib/sqlol/sqlol_keywords.c | 98 ++++++
contrib/sqlol/sqlol_keywords.h | 38 +++
contrib/sqlol/sqlol_kwlist.h | 21 ++
contrib/sqlol/sqlol_scan.l | 544 ++++++++++++++++++++++++++++++++
contrib/sqlol/sqlol_scanner.h | 118 +++++++
11 files changed, 1468 insertions(+)
create mode 100644 contrib/sqlol/.gitignore
create mode 100644 contrib/sqlol/Makefile
create mode 100644 contrib/sqlol/sqlol.c
create mode 100644 contrib/sqlol/sqlol_gram.y
create mode 100644 contrib/sqlol/sqlol_gramparse.h
create mode 100644 contrib/sqlol/sqlol_keywords.c
create mode 100644 contrib/sqlol/sqlol_keywords.h
create mode 100644 contrib/sqlol/sqlol_kwlist.h
create mode 100644 contrib/sqlol/sqlol_scan.l
create mode 100644 contrib/sqlol/sqlol_scanner.h
diff --git a/contrib/Makefile b/contrib/Makefile
index f27e458482..2a80cd137b 100644
--- a/contrib/Makefile
+++ b/contrib/Makefile
@@ -43,6 +43,7 @@ SUBDIRS = \
postgres_fdw \
seg \
spi \
+ sqlol \
tablefunc \
tcn \
test_decoding \
diff --git a/contrib/sqlol/.gitignore b/contrib/sqlol/.gitignore
new file mode 100644
index 0000000000..3c4b587792
--- /dev/null
+++ b/contrib/sqlol/.gitignore
@@ -0,0 +1,7 @@
+# Generated subdirectories
+/log/
+/results/
+/tmp_check/
+sqlol_gram.c
+sqlol_gram.h
+sqlol_scan.c
diff --git a/contrib/sqlol/Makefile b/contrib/sqlol/Makefile
new file mode 100644
index 0000000000..025e77c4ff
--- /dev/null
+++ b/contrib/sqlol/Makefile
@@ -0,0 +1,33 @@
+# contrib/sqlol/Makefile
+
+MODULE_big = sqlol
+OBJS = \
+ $(WIN32RES) \
+ sqlol.o sqlol_gram.o sqlol_scan.o sqlol_keywords.o
+PGFILEDESC = "sqlol - Toy alternative grammar based on LOLCODE"
+
+sqlol_gram.h: sqlol_gram.c
+ touch $@
+
+sqlol_gram.c: BISONFLAGS += -d
+# sqlol_gram.c: BISON_CHECK_CMD = $(PERL) $(srcdir)/check_keywords.pl $< $(top_srcdir)/src/include/parser/kwlist.h
+
+
+sqlol_scan.c: FLEXFLAGS = -CF -p -p
+sqlol_scan.c: FLEX_NO_BACKUP=yes
+sqlol_scan.c: FLEX_FIX_WARNING=yes
+
+
+# Force these dependencies to be known even without dependency info built:
+sqlol_gram.o sqlol_scan.o parser.o: sqlol_gram.h
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = contrib/sqlol
+top_builddir = ../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
diff --git a/contrib/sqlol/sqlol.c b/contrib/sqlol/sqlol.c
new file mode 100644
index 0000000000..b986966181
--- /dev/null
+++ b/contrib/sqlol/sqlol.c
@@ -0,0 +1,107 @@
+/*-------------------------------------------------------------------------
+ *
+ * sqlol.c
+ *
+ *
+ * Copyright (c) 2008-2021, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * contrib/sqlol/sqlol.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "tcop/tcopprot.h"
+
+#include "sqlol_gramparse.h"
+#include "sqlol_keywords.h"
+
+PG_MODULE_MAGIC;
+
+
+/* Saved hook values in case of unload */
+static parser_hook_type prev_parser_hook = NULL;
+
+void _PG_init(void);
+void _PG_fini(void);
+
+static List *sqlol_parser_hook(const char *str, RawParseMode mode);
+
+
+/*
+ * Module load callback
+ */
+void
+_PG_init(void)
+{
+ /* Install hooks. */
+ prev_parser_hook = parser_hook;
+ parser_hook = sqlol_parser_hook;
+}
+
+/*
+ * Module unload callback
+ */
+void
+_PG_fini(void)
+{
+ /* Uninstall hooks. */
+ parser_hook = prev_parser_hook;
+}
+
+/*
+ * sqlol_parser_hook: parse our grammar
+ */
+static List *
+sqlol_parser_hook(const char *str, RawParseMode mode)
+{
+ sqlol_yyscan_t yyscanner;
+ sqlol_base_yy_extra_type yyextra;
+ int yyresult;
+
+ if (mode != RAW_PARSE_DEFAULT)
+ {
+ if (prev_parser_hook)
+ return (*prev_parser_hook) (str, mode);
+ else
+ return raw_parser(str, mode);
+ }
+
+ /* initialize the flex scanner */
+ yyscanner = sqlol_scanner_init(str, &yyextra.sqlol_yy_extra,
+ sqlol_ScanKeywords, sqlol_NumScanKeywords);
+
+ /* initialize the bison parser */
+ sqlol_parser_init(&yyextra);
+
+ /* Parse! */
+ yyresult = sqlol_base_yyparse(yyscanner);
+
+ /* Clean up (release memory) */
+ sqlol_scanner_finish(yyscanner);
+
+ /*
+ * Invalid statement, fallback on previous parser_hook if any or
+ * raw_parser()
+ */
+ if (yyresult)
+ {
+ if (prev_parser_hook)
+ return (*prev_parser_hook) (str, mode);
+ else
+ return raw_parser(str, mode);
+ }
+
+ return yyextra.parsetree;
+}
+
+int
+sqlol_base_yylex(YYSTYPE *lvalp, YYLTYPE *llocp, sqlol_yyscan_t yyscanner)
+{
+ int cur_token;
+
+ cur_token = sqlol_yylex(&(lvalp->sqlol_yystype), llocp, yyscanner);
+
+ return cur_token;
+}
diff --git a/contrib/sqlol/sqlol_gram.y b/contrib/sqlol/sqlol_gram.y
new file mode 100644
index 0000000000..64d00d14ca
--- /dev/null
+++ b/contrib/sqlol/sqlol_gram.y
@@ -0,0 +1,440 @@
+%{
+
+/*#define YYDEBUG 1*/
+/*-------------------------------------------------------------------------
+ *
+ * sqlol_gram.y
+ * sqlol BISON rules/actions
+ *
+ * Portions Copyright (c) 1996-2021, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ * contrib/sqlol/sqlol_gram.y
+ *
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "catalog/namespace.h"
+#include "nodes/makefuncs.h"
+
+#include "sqlol_gramparse.h"
+
+/*
+ * Location tracking support --- simpler than bison's default, since we only
+ * want to track the start position not the end position of each nonterminal.
+ */
+#define YYLLOC_DEFAULT(Current, Rhs, N) \
+ do { \
+ if ((N) > 0) \
+ (Current) = (Rhs)[1]; \
+ else \
+ (Current) = (-1); \
+ } while (0)
+
+/*
+ * The above macro assigns -1 (unknown) as the parse location of any
+ * nonterminal that was reduced from an empty rule, or whose leftmost
+ * component was reduced from an empty rule. This is problematic
+ * for nonterminals defined like
+ * OptFooList: / * EMPTY * / { ... } | OptFooList Foo { ... } ;
+ * because we'll set -1 as the location during the first reduction and then
+ * copy it during each subsequent reduction, leaving us with -1 for the
+ * location even when the list is not empty. To fix that, do this in the
+ * action for the nonempty rule(s):
+ * if (@$ < 0) @$ = @2;
+ * (Although we have many nonterminals that follow this pattern, we only
+ * bother with fixing @$ like this when the nonterminal's parse location
+ * is actually referenced in some rule.)
+ *
+ * A cleaner answer would be to make YYLLOC_DEFAULT scan all the Rhs
+ * locations until it's found one that's not -1. Then we'd get a correct
+ * location for any nonterminal that isn't entirely empty. But this way
+ * would add overhead to every rule reduction, and so far there's not been
+ * a compelling reason to pay that overhead.
+ */
+
+/*
+ * Bison doesn't allocate anything that needs to live across parser calls,
+ * so we can easily have it use palloc instead of malloc. This prevents
+ * memory leaks if we error out during parsing. Note this only works with
+ * bison >= 2.0. However, in bison 1.875 the default is to use alloca()
+ * if possible, so there's not really much problem anyhow, at least if
+ * you're building with gcc.
+ */
+#define YYMALLOC palloc
+#define YYFREE pfree
+
+
+#define parser_yyerror(msg) sqlol_scanner_yyerror(msg, yyscanner)
+#define parser_errposition(pos) sqlol_scanner_errposition(pos, yyscanner)
+
+static void sqlol_base_yyerror(YYLTYPE *yylloc, sqlol_yyscan_t yyscanner,
+ const char *msg);
+static RawStmt *makeRawStmt(Node *stmt, int stmt_location);
+static void updateRawStmtEnd(RawStmt *rs, int end_location);
+static Node *makeColumnRef(char *colname, List *indirection,
+ int location, sqlol_yyscan_t yyscanner);
+static void check_qualified_name(List *names, sqlol_yyscan_t yyscanner);
+static List *check_indirection(List *indirection, sqlol_yyscan_t yyscanner);
+
+%}
+
+%pure-parser
+%expect 0
+%name-prefix="sqlol_base_yy"
+%locations
+
+%parse-param {sqlol_yyscan_t yyscanner}
+%lex-param {sqlol_yyscan_t yyscanner}
+
+%union
+{
+ sqlol_YYSTYPE sqlol_yystype;
+ /* these fields must match sqlol_YYSTYPE: */
+ int ival;
+ char *str;
+ const char *keyword;
+
+ List *list;
+ Node *node;
+ Value *value;
+ RangeVar *range;
+ ResTarget *target;
+}
+
+%type <node> stmt toplevel_stmt GimmehStmt simple_gimmeh columnref
+ indirection_el
+
+%type <list> parse_toplevel stmtmulti gimmeh_list indirection
+
+%type <range> qualified_name
+
+%type <str> ColId ColLabel attr_name
+
+%type <target> gimmeh_el
+
+/*
+ * Non-keyword token types. These are hard-wired into the "flex" lexer.
+ * They must be listed first so that their numeric codes do not depend on
+ * the set of keywords. PL/pgSQL depends on this so that it can share the
+ * same lexer. If you add/change tokens here, fix PL/pgSQL to match!
+ *
+ */
+%token <str> IDENT FCONST SCONST Op
+
+/*
+ * If you want to make any keyword changes, update the keyword table in
+ * src/include/parser/kwlist.h and add new keywords to the appropriate one
+ * of the reserved-or-not-so-reserved keyword lists, below; search
+ * this file for "Keyword category lists".
+ */
+
+/* ordinary key words in alphabetical order */
+%token <keyword> A GIMMEH HAI HAS I KTHXBYE
+
+
+%%
+
+/*
+ * The target production for the whole parse.
+ *
+ * Ordinarily we parse a list of statements, but if we see one of the
+ * special MODE_XXX symbols as first token, we parse something else.
+ * The options here correspond to enum RawParseMode, which see for details.
+ */
+parse_toplevel:
+ stmtmulti
+ {
+ pg_yyget_extra(yyscanner)->parsetree = $1;
+ }
+ ;
+
+/*
+ * At top level, we wrap each stmt with a RawStmt node carrying start location
+ * and length of the stmt's text. Notice that the start loc/len are driven
+ * entirely from semicolon locations (@2). It would seem natural to use
+ * @1 or @3 to get the true start location of a stmt, but that doesn't work
+ * for statements that can start with empty nonterminals (opt_with_clause is
+ * the main offender here); as noted in the comments for YYLLOC_DEFAULT,
+ * we'd get -1 for the location in such cases.
+ * We also take care to discard empty statements entirely.
+ */
+stmtmulti: stmtmulti KTHXBYE toplevel_stmt
+ {
+ if ($1 != NIL)
+ {
+ /* update length of previous stmt */
+ updateRawStmtEnd(llast_node(RawStmt, $1), @2);
+ }
+ if ($3 != NULL)
+ $$ = lappend($1, makeRawStmt($3, @2 + 1));
+ else
+ $$ = $1;
+ }
+ | toplevel_stmt
+ {
+ if ($1 != NULL)
+ $$ = list_make1(makeRawStmt($1, 0));
+ else
+ $$ = NIL;
+ }
+ ;
+
+/*
+ * toplevel_stmt includes BEGIN and END. stmt does not include them, because
+ * those words have different meanings in function bodies.
+ */
+toplevel_stmt:
+ stmt
+ ;
+
+stmt:
+ GimmehStmt
+ | /*EMPTY*/
+ { $$ = NULL; }
+ ;
+
+/*****************************************************************************
+ *
+ * GIMMEH statement
+ *
+ *****************************************************************************/
+
+GimmehStmt:
+ simple_gimmeh { $$ = $1; }
+ ;
+
+simple_gimmeh:
+ HAI FCONST I HAS A qualified_name
+ GIMMEH gimmeh_list
+ {
+ SelectStmt *n = makeNode(SelectStmt);
+ n->targetList = $8;
+ n->fromClause = list_make1($6);
+ $$ = (Node *)n;
+ }
+ ;
+
+gimmeh_list:
+ gimmeh_el { $$ = list_make1($1); }
+ | gimmeh_list ',' gimmeh_el { $$ = lappend($1, $3); }
+
+gimmeh_el:
+ columnref
+ {
+ $$ = makeNode(ResTarget);
+ $$->name = NULL;
+ $$->indirection = NIL;
+ $$->val = (Node *)$1;
+ $$->location = @1;
+ }
+
+qualified_name:
+ ColId
+ {
+ $$ = makeRangeVar(NULL, $1, @1);
+ }
+ | ColId indirection
+ {
+ check_qualified_name($2, yyscanner);
+ $$ = makeRangeVar(NULL, NULL, @1);
+ switch (list_length($2))
+ {
+ case 1:
+ $$->catalogname = NULL;
+ $$->schemaname = $1;
+ $$->relname = strVal(linitial($2));
+ break;
+ case 2:
+ $$->catalogname = $1;
+ $$->schemaname = strVal(linitial($2));
+ $$->relname = strVal(lsecond($2));
+ break;
+ default:
+ /*
+ * It's ok to error out here as at this point we
+ * already parsed a "HAI FCONST" preamble, and no
+ * other grammar is likely to accept a command
+ * starting with that, so there's no point trying
+ * to fall back on the other grammars.
+ */
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("improper qualified name (too many dotted names): %s",
+ NameListToString(lcons(makeString($1), $2))),
+ parser_errposition(@1)));
+ break;
+ }
+ }
+ ;
+
+columnref: ColId
+ {
+ $$ = makeColumnRef($1, NIL, @1, yyscanner);
+ }
+ | ColId indirection
+ {
+ $$ = makeColumnRef($1, $2, @1, yyscanner);
+ }
+ ;
+
+ColId: IDENT { $$ = $1; }
+
+indirection:
+ indirection_el { $$ = list_make1($1); }
+ | indirection indirection_el { $$ = lappend($1, $2); }
+ ;
+
+indirection_el:
+ '.' attr_name
+ {
+ $$ = (Node *) makeString($2);
+ }
+ ;
+
+attr_name: ColLabel { $$ = $1; };
+
+ColLabel: IDENT { $$ = $1; }
+
+%%
+
+/*
+ * The signature of this function is required by bison. However, we
+ * ignore the passed yylloc and instead use the last token position
+ * available from the scanner.
+ */
+static void
+sqlol_base_yyerror(YYLTYPE *yylloc, sqlol_yyscan_t yyscanner, const char *msg)
+{
+ parser_yyerror(msg);
+}
+
+static RawStmt *
+makeRawStmt(Node *stmt, int stmt_location)
+{
+ RawStmt *rs = makeNode(RawStmt);
+
+ rs->stmt = stmt;
+ rs->stmt_location = stmt_location;
+ rs->stmt_len = 0; /* might get changed later */
+ return rs;
+}
+
+/* Adjust a RawStmt to reflect that it doesn't run to the end of the string */
+static void
+updateRawStmtEnd(RawStmt *rs, int end_location)
+{
+ /*
+ * If we already set the length, don't change it. This is for situations
+ * like "select foo ;; select bar" where the same statement will be last
+ * in the string for more than one semicolon.
+ */
+ if (rs->stmt_len > 0)
+ return;
+
+ /* OK, update length of RawStmt */
+ rs->stmt_len = end_location - rs->stmt_location;
+}
+
+static Node *
+makeColumnRef(char *colname, List *indirection,
+ int location, sqlol_yyscan_t yyscanner)
+{
+ /*
+ * Generate a ColumnRef node, with an A_Indirection node added if there
+ * is any subscripting in the specified indirection list. However,
+ * any field selection at the start of the indirection list must be
+ * transposed into the "fields" part of the ColumnRef node.
+ */
+ ColumnRef *c = makeNode(ColumnRef);
+ int nfields = 0;
+ ListCell *l;
+
+ c->location = location;
+ foreach(l, indirection)
+ {
+ if (IsA(lfirst(l), A_Indices))
+ {
+ A_Indirection *i = makeNode(A_Indirection);
+
+ if (nfields == 0)
+ {
+ /* easy case - all indirection goes to A_Indirection */
+ c->fields = list_make1(makeString(colname));
+ i->indirection = check_indirection(indirection, yyscanner);
+ }
+ else
+ {
+ /* got to split the list in two */
+ i->indirection = check_indirection(list_copy_tail(indirection,
+ nfields),
+ yyscanner);
+ indirection = list_truncate(indirection, nfields);
+ c->fields = lcons(makeString(colname), indirection);
+ }
+ i->arg = (Node *) c;
+ return (Node *) i;
+ }
+ else if (IsA(lfirst(l), A_Star))
+ {
+ /* We only allow '*' at the end of a ColumnRef */
+ if (lnext(indirection, l) != NULL)
+ parser_yyerror("improper use of \"*\"");
+ }
+ nfields++;
+ }
+ /* No subscripting, so all indirection gets added to field list */
+ c->fields = lcons(makeString(colname), indirection);
+ return (Node *) c;
+}
+
+/* check_qualified_name --- check the result of qualified_name production
+ *
+ * It's easiest to let the grammar production for qualified_name allow
+ * subscripts and '*', which we then must reject here.
+ */
+static void
+check_qualified_name(List *names, sqlol_yyscan_t yyscanner)
+{
+ ListCell *i;
+
+ foreach(i, names)
+ {
+ if (!IsA(lfirst(i), String))
+ parser_yyerror("syntax error");
+ }
+}
+
+/* check_indirection --- check the result of indirection production
+ *
+ * We only allow '*' at the end of the list, but it's hard to enforce that
+ * in the grammar, so do it here.
+ */
+static List *
+check_indirection(List *indirection, sqlol_yyscan_t yyscanner)
+{
+ ListCell *l;
+
+ foreach(l, indirection)
+ {
+ if (IsA(lfirst(l), A_Star))
+ {
+ if (lnext(indirection, l) != NULL)
+ parser_yyerror("improper use of \"*\"");
+ }
+ }
+ return indirection;
+}
+
+/* sqlol_parser_init()
+ * Initialize to parse one query string
+ */
+void
+sqlol_parser_init(sqlol_base_yy_extra_type *yyext)
+{
+ yyext->parsetree = NIL; /* in case grammar forgets to set it */
+}
diff --git a/contrib/sqlol/sqlol_gramparse.h b/contrib/sqlol/sqlol_gramparse.h
new file mode 100644
index 0000000000..58233a8d87
--- /dev/null
+++ b/contrib/sqlol/sqlol_gramparse.h
@@ -0,0 +1,61 @@
+/*-------------------------------------------------------------------------
+ *
+ * sqlol_gramparse.h
+ * Shared definitions for the "raw" parser (flex and bison phases only)
+ *
+ *
+ * Portions Copyright (c) 1996-2021, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * contrib/sqlol/sqlol_gramparse.h
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#ifndef SQLOL_GRAMPARSE_H
+#define SQLOL_GRAMPARSE_H
+
+#include "nodes/parsenodes.h"
+#include "sqlol_scanner.h"
+
+/*
+ * NB: include gram.h only AFTER including scanner.h, because scanner.h
+ * is what #defines YYLTYPE.
+ */
+#include "sqlol_gram.h"
+
+/*
+ * The YY_EXTRA data that a flex scanner allows us to pass around. Private
+ * state needed for raw parsing/lexing goes here.
+ */
+typedef struct sqlol_base_yy_extra_type
+{
+ /*
+ * Fields used by the core scanner.
+ */
+ sqlol_yy_extra_type sqlol_yy_extra;
+
+ /*
+ * State variables that belong to the grammar.
+ */
+ List *parsetree; /* final parse result is delivered here */
+} sqlol_base_yy_extra_type;
+
+/*
+ * In principle we should use yyget_extra() to fetch the yyextra field
+ * from a yyscanner struct. However, flex always puts that field first,
+ * and this is sufficiently performance-critical to make it seem worth
+ * cheating a bit to use an inline macro.
+ */
+#define pg_yyget_extra(yyscanner) (*((sqlol_base_yy_extra_type **) (yyscanner)))
+
+
+/* from parser.c */
+extern int sqlol_base_yylex(YYSTYPE *lvalp, YYLTYPE *llocp,
+ sqlol_yyscan_t yyscanner);
+
+/* from gram.y */
+extern void sqlol_parser_init(sqlol_base_yy_extra_type *yyext);
+extern int sqlol_base_yyparse(sqlol_yyscan_t yyscanner);
+
+#endif /* SQLOL_GRAMPARSE_H */
diff --git a/contrib/sqlol/sqlol_keywords.c b/contrib/sqlol/sqlol_keywords.c
new file mode 100644
index 0000000000..dbbdf5493c
--- /dev/null
+++ b/contrib/sqlol/sqlol_keywords.c
@@ -0,0 +1,98 @@
+/*-------------------------------------------------------------------------
+ *
+ * sqlol_keywords.c
+ * lexical token lookup for key words in PostgreSQL
+ *
+ *
+ * Portions Copyright (c) 1996-2018, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ * sqlol/sqlol_keywords.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "sqlol_gramparse.h"
+
+#define PG_KEYWORD(a,b,c) {a,b,c},
+
+const sqlol_ScanKeyword sqlol_ScanKeywords[] = {
+#include "sqlol_kwlist.h"
+};
+
+const int sqlol_NumScanKeywords = lengthof(sqlol_ScanKeywords);
+
+#undef PG_KEYWORD
+
+
+/*
+ * ScanKeywordLookup - see if a given word is a keyword
+ *
+ * The table to be searched is passed explicitly, so that this can be used
+ * to search keyword lists other than the standard list appearing above.
+ *
+ * Returns a pointer to the sqlol_ScanKeyword table entry, or NULL if no match.
+ *
+ * The match is done case-insensitively. Note that we deliberately use a
+ * dumbed-down case conversion that will only translate 'A'-'Z' into 'a'-'z',
+ * even if we are in a locale where tolower() would produce more or different
+ * translations. This is to conform to the SQL99 spec, which says that
+ * keywords are to be matched in this way even though non-keyword identifiers
+ * receive a different case-normalization mapping.
+ */
+const sqlol_ScanKeyword *
+sqlol_ScanKeywordLookup(const char *text,
+ const sqlol_ScanKeyword *keywords,
+ int num_keywords)
+{
+ int len,
+ i;
+ char word[NAMEDATALEN];
+ const sqlol_ScanKeyword *low;
+ const sqlol_ScanKeyword *high;
+
+ len = strlen(text);
+ /* We assume all keywords are shorter than NAMEDATALEN. */
+ if (len >= NAMEDATALEN)
+ return NULL;
+
+ /*
+ * Apply an ASCII-only downcasing. We must not use tolower() since it may
+ * produce the wrong translation in some locales (eg, Turkish).
+ */
+ for (i = 0; i < len; i++)
+ {
+ char ch = text[i];
+
+ if (ch >= 'A' && ch <= 'Z')
+ ch += 'a' - 'A';
+ word[i] = ch;
+ }
+ word[len] = '\0';
+
+ /*
+ * Now do a binary search using plain strcmp() comparison.
+ */
+ low = keywords;
+ high = keywords + (num_keywords - 1);
+ while (low <= high)
+ {
+ const sqlol_ScanKeyword *middle;
+ int difference;
+
+ middle = low + (high - low) / 2;
+ difference = strcmp(middle->name, word);
+ if (difference == 0)
+ return middle;
+ else if (difference < 0)
+ low = middle + 1;
+ else
+ high = middle - 1;
+ }
+
+ return NULL;
+}
+
diff --git a/contrib/sqlol/sqlol_keywords.h b/contrib/sqlol/sqlol_keywords.h
new file mode 100644
index 0000000000..bc4acf4541
--- /dev/null
+++ b/contrib/sqlol/sqlol_keywords.h
@@ -0,0 +1,38 @@
+/*-------------------------------------------------------------------------
+ *
+ * sqlol_keywords.h
+ * lexical token lookup for key words in PostgreSQL
+ *
+ *
+ * Portions Copyright (c) 1996-2021, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * contrib/sqlol/sqlol_keywords.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef SQLOL_KEYWORDS_H
+#define SQLOL_KEYWORDS_H
+
+/* Keyword categories --- should match lists in gram.y */
+#define UNRESERVED_KEYWORD 0
+#define COL_NAME_KEYWORD 1
+#define TYPE_FUNC_NAME_KEYWORD 2
+#define RESERVED_KEYWORD 3
+
+
+typedef struct sqlol_ScanKeyword
+{
+ const char *name; /* in lower case */
+ int16 value; /* grammar's token code */
+ int16 category; /* see codes above */
+} sqlol_ScanKeyword;
+
+extern PGDLLIMPORT const sqlol_ScanKeyword sqlol_ScanKeywords[];
+extern PGDLLIMPORT const int sqlol_NumScanKeywords;
+
+extern const sqlol_ScanKeyword *sqlol_ScanKeywordLookup(const char *text,
+ const sqlol_ScanKeyword *keywords,
+ int num_keywords);
+
+#endif /* SQLOL_KEYWORDS_H */
diff --git a/contrib/sqlol/sqlol_kwlist.h b/contrib/sqlol/sqlol_kwlist.h
new file mode 100644
index 0000000000..2de3893ee4
--- /dev/null
+++ b/contrib/sqlol/sqlol_kwlist.h
@@ -0,0 +1,21 @@
+/*-------------------------------------------------------------------------
+ *
+ * sqlol_kwlist.h
+ *
+ *
+ * Portions Copyright (c) 1996-2021, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * IDENTIFICATION
+ * contrib/sqlol/sqlol_kwlist.h
+ *
+ *-------------------------------------------------------------------------
+ */
+
+/* name, value, category */
+PG_KEYWORD("a", A, UNRESERVED_KEYWORD)
+PG_KEYWORD("gimmeh", GIMMEH, UNRESERVED_KEYWORD)
+PG_KEYWORD("hai", HAI, RESERVED_KEYWORD)
+PG_KEYWORD("has", HAS, UNRESERVED_KEYWORD)
+PG_KEYWORD("i", I, UNRESERVED_KEYWORD)
+PG_KEYWORD("kthxbye", KTHXBYE, UNRESERVED_KEYWORD)
diff --git a/contrib/sqlol/sqlol_scan.l b/contrib/sqlol/sqlol_scan.l
new file mode 100644
index 0000000000..a7088b8390
--- /dev/null
+++ b/contrib/sqlol/sqlol_scan.l
@@ -0,0 +1,544 @@
+%top{
+/*-------------------------------------------------------------------------
+ *
+ * sqlol_scan.l
+ * lexical scanner for sqlol
+ *
+ *
+ * Portions Copyright (c) 1996-2021, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * IDENTIFICATION
+ * contrib/sqlol/sqlol_scan.l
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "common/string.h"
+#include "sqlol_gramparse.h"
+#include "parser/scansup.h"
+#include "mb/pg_wchar.h"
+
+#include "sqlol_keywords.h"
+}
+
+%{
+
+/* LCOV_EXCL_START */
+
+/* Avoid exit() on fatal scanner errors (a bit ugly -- see yy_fatal_error) */
+#undef fprintf
+#define fprintf(file, fmt, msg) fprintf_to_ereport(fmt, msg)
+
+static void
+fprintf_to_ereport(const char *fmt, const char *msg)
+{
+ ereport(ERROR, (errmsg_internal("%s", msg)));
+}
+
+
+/*
+ * Set the type of YYSTYPE.
+ */
+#define YYSTYPE sqlol_YYSTYPE
+
+/*
+ * Set the type of yyextra. All state variables used by the scanner should
+ * be in yyextra, *not* statically allocated.
+ */
+#define YY_EXTRA_TYPE sqlol_yy_extra_type *
+
+/*
+ * Each call to yylex must set yylloc to the location of the found token
+ * (expressed as a byte offset from the start of the input text).
+ * When we parse a token that requires multiple lexer rules to process,
+ * this should be done in the first such rule, else yylloc will point
+ * into the middle of the token.
+ */
+#define SET_YYLLOC() (*(yylloc) = yytext - yyextra->scanbuf)
+
+/*
+ * Advance yylloc by the given number of bytes.
+ */
+#define ADVANCE_YYLLOC(delta) ( *(yylloc) += (delta) )
+
+/*
+ * Sometimes, we do want yylloc to point into the middle of a token; this is
+ * useful for instance to throw an error about an escape sequence within a
+ * string literal. But if we find no error there, we want to revert yylloc
+ * to the token start, so that that's the location reported to the parser.
+ * Use PUSH_YYLLOC/POP_YYLLOC to save/restore yylloc around such code.
+ * (Currently the implied "stack" is just one location, but someday we might
+ * need to nest these.)
+ */
+#define PUSH_YYLLOC() (yyextra->save_yylloc = *(yylloc))
+#define POP_YYLLOC() (*(yylloc) = yyextra->save_yylloc)
+
+#define startlit() ( yyextra->literallen = 0 )
+static void addlit(char *ytext, int yleng, sqlol_yyscan_t yyscanner);
+static void addlitchar(unsigned char ychar, sqlol_yyscan_t yyscanner);
+static char *litbufdup(sqlol_yyscan_t yyscanner);
+
+#define yyerror(msg) sqlol_scanner_yyerror(msg, yyscanner)
+
+#define lexer_errposition() sqlol_scanner_errposition(*(yylloc), yyscanner)
+
+/*
+ * Work around a bug in flex 2.5.35: it emits a couple of functions that
+ * it forgets to emit declarations for. Since we use -Wmissing-prototypes,
+ * this would cause warnings. Providing our own declarations should be
+ * harmless even when the bug gets fixed.
+ */
+extern int sqlol_yyget_column(yyscan_t yyscanner);
+extern void sqlol_yyset_column(int column_no, yyscan_t yyscanner);
+
+%}
+
+%option reentrant
+%option bison-bridge
+%option bison-locations
+%option 8bit
+%option never-interactive
+%option nodefault
+%option noinput
+%option nounput
+%option noyywrap
+%option noyyalloc
+%option noyyrealloc
+%option noyyfree
+%option warn
+%option prefix="sqlol_yy"
+
+/*
+ * OK, here is a short description of lex/flex rules behavior.
+ * The longest pattern which matches an input string is always chosen.
+ * For equal-length patterns, the first occurring in the rules list is chosen.
+ * INITIAL is the starting state, to which all non-conditional rules apply.
+ * Exclusive states change parsing rules while the state is active. When in
+ * an exclusive state, only those rules defined for that state apply.
+ *
+ * We use exclusive states for quoted strings, extended comments,
+ * and to eliminate parsing troubles for numeric strings.
+ * Exclusive states:
+ * <xd> delimited identifiers (double-quoted identifiers)
+ * <xq> standard quoted strings
+ * <xqs> quote stop (detect continued strings)
+ *
+ * Remember to add an <<EOF>> case whenever you add a new exclusive state!
+ * The default one is probably not the right thing.
+ */
+
+%x xd
+%x xq
+%x xqs
+
+/*
+ * In order to make the world safe for Windows and Mac clients as well as
+ * Unix ones, we accept either \n or \r as a newline. A DOS-style \r\n
+ * sequence will be seen as two successive newlines, but that doesn't cause
+ * any problems. Comments that start with -- and extend to the next
+ * newline are treated as equivalent to a single whitespace character.
+ *
+ * NOTE a fine point: if there is no newline following --, we will absorb
+ * everything to the end of the input as a comment. This is correct. Older
+ * versions of Postgres failed to recognize -- as a comment if the input
+ * did not end with a newline.
+ *
+ * XXX perhaps \f (formfeed) should be treated as a newline as well?
+ *
+ * XXX if you change the set of whitespace characters, fix scanner_isspace()
+ * to agree.
+ */
+
+space [ \t\n\r\f]
+horiz_space [ \t\f]
+newline [\n\r]
+non_newline [^\n\r]
+
+comment ("--"{non_newline}*)
+
+whitespace ({space}+|{comment})
+
+/*
+ * SQL requires at least one newline in the whitespace separating
+ * string literals that are to be concatenated. Silly, but who are we
+ * to argue? Note that {whitespace_with_newline} should not have * after
+ * it, whereas {whitespace} should generally have a * after it...
+ */
+
+special_whitespace ({space}+|{comment}{newline})
+horiz_whitespace ({horiz_space}|{comment})
+whitespace_with_newline ({horiz_whitespace}*{newline}{special_whitespace}*)
+
+quote '
+/* If we see {quote} then {quotecontinue}, the quoted string continues */
+quotecontinue {whitespace_with_newline}{quote}
+
+/*
+ * {quotecontinuefail} is needed to avoid lexer backup when we fail to match
+ * {quotecontinue}. It might seem that this could just be {whitespace}*,
+ * but if there's a dash after {whitespace_with_newline}, it must be consumed
+ * to see if there's another dash --- which would start a {comment} and thus
+ * allow continuation of the {quotecontinue} token.
+ */
+quotecontinuefail {whitespace}*"-"?
+
+/* Extended quote
+ * xqdouble implements embedded quote, ''''
+ */
+xqstart {quote}
+xqdouble {quote}{quote}
+xqinside [^']+
+
+/* Double quote
+ * Allows embedded spaces and other special characters into identifiers.
+ */
+dquote \"
+xdstart {dquote}
+xdstop {dquote}
+xddouble {dquote}{dquote}
+xdinside [^"]+
+
+digit [0-9]
+ident_start [A-Za-z\200-\377_]
+ident_cont [A-Za-z\200-\377_0-9\$]
+
+identifier {ident_start}{ident_cont}*
+
+decimal (({digit}+)|({digit}*\.{digit}+)|({digit}+\.{digit}*))
+
+other .
+
+%%
+
+{whitespace} {
+ /* ignore */
+ }
+
+
+{xqstart} {
+ yyextra->saw_non_ascii = false;
+ SET_YYLLOC();
+ BEGIN(xq);
+ startlit();
+}
+<xq>{quote} {
+ /*
+ * When we are scanning a quoted string and see an end
+ * quote, we must look ahead for a possible continuation.
+ * If we don't see one, we know the end quote was in fact
+ * the end of the string. To reduce the lexer table size,
+ * we use a single "xqs" state to do the lookahead for all
+ * types of strings.
+ */
+ yyextra->state_before_str_stop = YYSTATE;
+ BEGIN(xqs);
+ }
+<xqs>{quotecontinue} {
+ /*
+ * Found a quote continuation, so return to the in-quote
+ * state and continue scanning the literal. Nothing is
+ * added to the literal's contents.
+ */
+ BEGIN(yyextra->state_before_str_stop);
+ }
+<xqs>{quotecontinuefail} |
+<xqs>{other} |
+<xqs><<EOF>> {
+ /*
+ * Failed to see a quote continuation. Throw back
+ * everything after the end quote, and handle the string
+ * according to the state we were in previously.
+ */
+ yyless(0);
+ BEGIN(INITIAL);
+
+ switch (yyextra->state_before_str_stop)
+ {
+ case xq:
+ /*
+ * Check that the data remains valid, if it might
+ * have been made invalid by unescaping any chars.
+ */
+ if (yyextra->saw_non_ascii)
+ pg_verifymbstr(yyextra->literalbuf,
+ yyextra->literallen,
+ false);
+ yylval->str = litbufdup(yyscanner);
+ return SCONST;
+ default:
+ yyerror("unhandled previous state in xqs");
+ }
+ }
+
+<xq>{xqdouble} {
+ addlitchar('\'', yyscanner);
+ }
+<xq>{xqinside} {
+ addlit(yytext, yyleng, yyscanner);
+ }
+<xq><<EOF>> { yyerror("unterminated quoted string"); }
+
+
+{xdstart} {
+ SET_YYLLOC();
+ BEGIN(xd);
+ startlit();
+ }
+<xd>{xdstop} {
+ char *ident;
+
+ BEGIN(INITIAL);
+ if (yyextra->literallen == 0)
+ yyerror("zero-length delimited identifier");
+ ident = litbufdup(yyscanner);
+ if (yyextra->literallen >= NAMEDATALEN)
+ truncate_identifier(ident, yyextra->literallen, true);
+ yylval->str = ident;
+ return IDENT;
+ }
+<xd>{xddouble} {
+ addlitchar('"', yyscanner);
+ }
+<xd>{xdinside} {
+ addlit(yytext, yyleng, yyscanner);
+ }
+<xd><<EOF>> { yyerror("unterminated quoted identifier"); }
+
+{decimal} {
+ SET_YYLLOC();
+ yylval->str = pstrdup(yytext);
+ return FCONST;
+ }
+
+{identifier} {
+ const sqlol_ScanKeyword *keyword;
+ char *ident;
+
+ SET_YYLLOC();
+
+ /* Is it a keyword? */
+ keyword = sqlol_ScanKeywordLookup(yytext,
+ yyextra->keywords,
+ yyextra->num_keywords);
+ if (keyword != NULL)
+ {
+ yylval->keyword = keyword->name;
+ return keyword->value;
+ }
+
+ /*
+ * No. Convert the identifier to lower case, and truncate
+ * if necessary.
+ */
+ ident = downcase_truncate_identifier(yytext, yyleng, true);
+ yylval->str = ident;
+ return IDENT;
+ }
+
+{other} {
+ SET_YYLLOC();
+ return yytext[0];
+ }
+
+<<EOF>> {
+ SET_YYLLOC();
+ yyterminate();
+ }
+
+%%
+
+/* LCOV_EXCL_STOP */
+
+/*
+ * Arrange access to yyextra for subroutines of the main yylex() function.
+ * We expect each subroutine to have a yyscanner parameter. Rather than
+ * use the yyget_xxx functions, which might or might not get inlined by the
+ * compiler, we cheat just a bit and cast yyscanner to the right type.
+ */
+#undef yyextra
+#define yyextra (((struct yyguts_t *) yyscanner)->yyextra_r)
+
+/* Likewise for a couple of other things we need. */
+#undef yylloc
+#define yylloc (((struct yyguts_t *) yyscanner)->yylloc_r)
+#undef yyleng
+#define yyleng (((struct yyguts_t *) yyscanner)->yyleng_r)
+
+
+/*
+ * scanner_errposition
+ * Report a lexer or grammar error cursor position, if possible.
+ *
+ * This is expected to be used within an ereport() call. The return value
+ * is a dummy (always 0, in fact).
+ *
+ * Note that this can only be used for messages emitted during raw parsing
+ * (essentially, sqlol_scan.l, sqlol_parser.c, and sqlol_gram.y), since it
+ * requires the yyscanner struct to still be available.
+ */
+int
+sqlol_scanner_errposition(int location, sqlol_yyscan_t yyscanner)
+{
+ int pos;
+
+ if (location < 0)
+ return 0; /* no-op if location is unknown */
+
+ /* Convert byte offset to character number */
+ pos = pg_mbstrlen_with_len(yyextra->scanbuf, location) + 1;
+ /* And pass it to the ereport mechanism */
+ return errposition(pos);
+}
+
+/*
+ * scanner_yyerror
+ * Report a lexer or grammar error.
+ *
+ * Just ignore it, as we'll fall back to raw_parser().
+ */
+void
+sqlol_scanner_yyerror(const char *message, sqlol_yyscan_t yyscanner)
+{
+ return;
+}
+
+
+/*
+ * Called before any actual parsing is done
+ */
+sqlol_yyscan_t
+sqlol_scanner_init(const char *str,
+ sqlol_yy_extra_type *yyext,
+ const sqlol_ScanKeyword *keywords,
+ int num_keywords)
+{
+ Size slen = strlen(str);
+ yyscan_t scanner;
+
+ if (yylex_init(&scanner) != 0)
+ elog(ERROR, "yylex_init() failed: %m");
+
+ sqlol_yyset_extra(yyext, scanner);
+
+ yyext->keywords = keywords;
+ yyext->num_keywords = num_keywords;
+
+ /*
+ * Make a scan buffer with special termination needed by flex.
+ */
+ yyext->scanbuf = (char *) palloc(slen + 2);
+ yyext->scanbuflen = slen;
+ memcpy(yyext->scanbuf, str, slen);
+ yyext->scanbuf[slen] = yyext->scanbuf[slen + 1] = YY_END_OF_BUFFER_CHAR;
+ yy_scan_buffer(yyext->scanbuf, slen + 2, scanner);
+
+ /* initialize literal buffer to a reasonable but expansible size */
+ yyext->literalalloc = 1024;
+ yyext->literalbuf = (char *) palloc(yyext->literalalloc);
+ yyext->literallen = 0;
+
+ return scanner;
+}
+
+
+/*
+ * Called after parsing is done to clean up after scanner_init()
+ */
+void
+sqlol_scanner_finish(sqlol_yyscan_t yyscanner)
+{
+ /*
+ * We don't bother to call yylex_destroy(), because all it would do is
+ * pfree a small amount of control storage. It's cheaper to leak the
+ * storage until the parsing context is destroyed. The amount of space
+ * involved is usually negligible compared to the output parse tree
+ * anyway.
+ *
+ * We do bother to pfree the scanbuf and literal buffer, but only if they
+ * represent a nontrivial amount of space. The 8K cutoff is arbitrary.
+ */
+ if (yyextra->scanbuflen >= 8192)
+ pfree(yyextra->scanbuf);
+ if (yyextra->literalalloc >= 8192)
+ pfree(yyextra->literalbuf);
+}
+
+
+static void
+addlit(char *ytext, int yleng, sqlol_yyscan_t yyscanner)
+{
+ /* enlarge buffer if needed */
+ if ((yyextra->literallen + yleng) >= yyextra->literalalloc)
+ {
+ do
+ {
+ yyextra->literalalloc *= 2;
+ } while ((yyextra->literallen + yleng) >= yyextra->literalalloc);
+ yyextra->literalbuf = (char *) repalloc(yyextra->literalbuf,
+ yyextra->literalalloc);
+ }
+ /* append new data */
+ memcpy(yyextra->literalbuf + yyextra->literallen, ytext, yleng);
+ yyextra->literallen += yleng;
+}
+
+
+static void
+addlitchar(unsigned char ychar, sqlol_yyscan_t yyscanner)
+{
+ /* enlarge buffer if needed */
+ if ((yyextra->literallen + 1) >= yyextra->literalalloc)
+ {
+ yyextra->literalalloc *= 2;
+ yyextra->literalbuf = (char *) repalloc(yyextra->literalbuf,
+ yyextra->literalalloc);
+ }
+ /* append new data */
+ yyextra->literalbuf[yyextra->literallen] = ychar;
+ yyextra->literallen += 1;
+}
+
+
+/*
+ * Create a palloc'd copy of literalbuf, adding a trailing null.
+ */
+static char *
+litbufdup(sqlol_yyscan_t yyscanner)
+{
+ int llen = yyextra->literallen;
+ char *new;
+
+ new = palloc(llen + 1);
+ memcpy(new, yyextra->literalbuf, llen);
+ new[llen] = '\0';
+ return new;
+}
+
+/*
+ * Interface functions to make flex use palloc() instead of malloc().
+ * It'd be better to make these static, but flex insists otherwise.
+ */
+
+void *
+sqlol_yyalloc(yy_size_t bytes, sqlol_yyscan_t yyscanner)
+{
+ return palloc(bytes);
+}
+
+void *
+sqlol_yyrealloc(void *ptr, yy_size_t bytes, sqlol_yyscan_t yyscanner)
+{
+ if (ptr)
+ return repalloc(ptr, bytes);
+ else
+ return palloc(bytes);
+}
+
+void
+sqlol_yyfree(void *ptr, sqlol_yyscan_t yyscanner)
+{
+ if (ptr)
+ pfree(ptr);
+}
diff --git a/contrib/sqlol/sqlol_scanner.h b/contrib/sqlol/sqlol_scanner.h
new file mode 100644
index 0000000000..0a497e9d91
--- /dev/null
+++ b/contrib/sqlol/sqlol_scanner.h
@@ -0,0 +1,118 @@
+/*-------------------------------------------------------------------------
+ *
+ * sqlol_scanner.h
+ * API for the sqlol scanner (flex machine)
+ *
+ *
+ * Portions Copyright (c) 1996-2021, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * contrib/sqlol/sqlol_scanner.h
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#ifndef SQLOL_SCANNER_H
+#define SQLOL_SCANNER_H
+
+#include "sqlol_keywords.h"
+
+/*
+ * The scanner returns extra data about scanned tokens in this union type.
+ * Note that this is a subset of the fields used in YYSTYPE of the bison
+ * parsers built atop the scanner.
+ */
+typedef union sqlol_YYSTYPE
+{
+ int ival; /* for integer literals */
+ char *str; /* for identifiers and non-integer literals */
+ const char *keyword; /* canonical spelling of keywords */
+} sqlol_YYSTYPE;
+
+/*
+ * We track token locations in terms of byte offsets from the start of the
+ * source string, not the column number/line number representation that
+ * bison uses by default. Also, to minimize overhead we track only one
+ * location (usually the first token location) for each construct, not
+ * the beginning and ending locations as bison does by default. It's
+ * therefore sufficient to make YYLTYPE an int.
+ */
+#define YYLTYPE int
+
+/*
+ * Another important component of the scanner's API is the token code numbers.
+ * However, those are not defined in this file, because bison insists on
+ * defining them for itself. The token codes used by the core scanner are
+ * the ASCII characters plus these:
+ * %token <str> IDENT UIDENT FCONST SCONST USCONST BCONST XCONST Op
+ * %token <ival> ICONST PARAM
+ * %token TYPECAST DOT_DOT COLON_EQUALS EQUALS_GREATER
+ * %token LESS_EQUALS GREATER_EQUALS NOT_EQUALS
+ * The above token definitions *must* be the first ones declared in any
+ * bison parser built atop this scanner, so that they will have consistent
+ * numbers assigned to them (specifically, IDENT = 258 and so on).
+ */
+
+/*
+ * The YY_EXTRA data that a flex scanner allows us to pass around.
+ * Private state needed by the core scanner goes here. Note that the actual
+ * yy_extra struct may be larger and have this as its first component, thus
+ * allowing the calling parser to keep some fields of its own in YY_EXTRA.
+ */
+typedef struct sqlol_yy_extra_type
+{
+ /*
+ * The string the scanner is physically scanning. We keep this mainly so
+ * that we can cheaply compute the offset of the current token (yytext).
+ */
+ char *scanbuf;
+ Size scanbuflen;
+
+ /*
+ * The keyword list to use, and the associated grammar token codes.
+ */
+ const sqlol_ScanKeyword *keywords;
+ int num_keywords;
+
+ /*
+ * literalbuf is used to accumulate literal values when multiple rules are
+ * needed to parse a single literal. Call startlit() to reset buffer to
+ * empty, addlit() to add text. NOTE: the string in literalbuf is NOT
+ * necessarily null-terminated, but there always IS room to add a trailing
+ * null at offset literallen. We store a null only when we need it.
+ */
+ char *literalbuf; /* palloc'd expandable buffer */
+ int literallen; /* actual current string length */
+ int literalalloc; /* current allocated buffer size */
+
+ /*
+ * Random assorted scanner state.
+ */
+ int state_before_str_stop; /* start cond. before end quote */
+ YYLTYPE save_yylloc; /* one-element stack for PUSH_YYLLOC() */
+
+ /* state variables for literal-lexing warnings */
+ bool saw_non_ascii;
+} sqlol_yy_extra_type;
+
+/*
+ * The type of yyscanner is opaque outside scan.l.
+ */
+typedef void *sqlol_yyscan_t;
+
+
+/* Constant data exported from contrib/sqlol/sqlol_scan.l */
+extern PGDLLIMPORT const uint16 sqlol_ScanKeywordTokens[];
+
+/* Entry points in contrib/sqlol/sqlol_scan.l */
+extern sqlol_yyscan_t sqlol_scanner_init(const char *str,
+ sqlol_yy_extra_type *yyext,
+ const sqlol_ScanKeyword *keywords,
+ int num_keywords);
+extern void sqlol_scanner_finish(sqlol_yyscan_t yyscanner);
+extern int sqlol_yylex(sqlol_YYSTYPE *lvalp, YYLTYPE *llocp,
+ sqlol_yyscan_t yyscanner);
+extern int sqlol_scanner_errposition(int location, sqlol_yyscan_t yyscanner);
+extern void sqlol_scanner_yyerror(const char *message, sqlol_yyscan_t yyscanner);
+
+#endif /* SQLOL_SCANNER_H */
--
2.30.1
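As a standalone illustration of the technique used by sqlol_ScanKeywordLookup in the patch above (locale-independent ASCII downcasing followed by a plain strcmp() binary search over a sorted table), here is a minimal sketch. The `ScanKeyword`/`keyword_lookup` names, the token values, and `MAX_KW_LEN` are invented for the example; only the lookup logic mirrors the patch.

```c
#include <stddef.h>
#include <string.h>

#define MAX_KW_LEN 64            /* stand-in for NAMEDATALEN */

typedef struct ScanKeyword
{
    const char *name;            /* keyword, in lower case */
    int         value;           /* token code (invented here) */
} ScanKeyword;

/* Table must stay sorted in strcmp() order of name. */
static const ScanKeyword keywords[] = {
    {"gimmeh", 1},
    {"hai", 2},
    {"has", 3},
    {"kthxbye", 4},
};

static const ScanKeyword *
keyword_lookup(const char *text, size_t len)
{
    char        word[MAX_KW_LEN];
    const ScanKeyword *low = keywords;
    const ScanKeyword *high = keywords
        + (sizeof(keywords) / sizeof(keywords[0])) - 1;
    size_t      i;

    if (len >= MAX_KW_LEN)
        return NULL;             /* too long to be any keyword */

    /*
     * ASCII-only downcasing: tolower() is locale-dependent (e.g. the
     * Turkish dotless 'i'), so shift A-Z by hand instead.
     */
    for (i = 0; i < len; i++)
    {
        char    ch = text[i];

        if (ch >= 'A' && ch <= 'Z')
            ch += 'a' - 'A';
        word[i] = ch;
    }
    word[len] = '\0';

    /* Plain binary search over the sorted table. */
    while (low <= high)
    {
        const ScanKeyword *middle = low + (high - low) / 2;
        int         diff = strcmp(middle->name, word);

        if (diff == 0)
            return middle;
        if (diff < 0)
            low = middle + 1;
        else
            high = middle - 1;
    }
    return NULL;                 /* an ordinary identifier */
}
```

Because the table is compared with strcmp() after downcasing, the keywords must be stored pre-sorted and in lower case, which is why sqlol_kwlist.h keeps its entries in ASCII order.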
Attachment: v1-0003-Add-a-new-MODE_SINGLE_QUERY-to-the-core-parser-an.patch (text/x-diff; charset=us-ascii)
From 566b43c29526c2c6a64b695a049531a747501540 Mon Sep 17 00:00:00 2001
From: Julien Rouhaud <julien.rouhaud@free.fr>
Date: Thu, 22 Apr 2021 01:33:42 +0800
Subject: [PATCH v1 3/4] Add a new MODE_SINGLE_QUERY to the core parser and use
it in pg_parse_query.
If a third-party module provides a parser_hook, pg_parse_query() switches to
single-query parsing so multi-query commands using different grammars can work
properly. If the third-party module supports the full set of SQL we support,
or wants to prevent fallback on the core parser, it can ignore the
MODE_SINGLE_QUERY mode and parse the full query string. In that case it must
return a List with more than one RawStmt, or a single RawStmt with a 0 length
to stop the parsing phase, or raise an ERROR.
Otherwise, plugins should parse a single query only and always return a List
containing a single RawStmt with a properly set length (possibly 0 if it was a
single query without an end-of-query delimiter). If the command is valid but
doesn't contain any statement (e.g. a single semicolon), a single RawStmt with
a NULL stmt field should be returned, containing the consumed query string
length so we can move to the next command in a single pass rather than 1 byte
at a time.
Also, third-party modules can choose to ignore some or all parsing errors if
they want to implement only a subset of the syntax postgres supports, or even
a totally different syntax, and fall back on the core grammar for unhandled
cases. In that case, they should set the error flag to true. The returned List
will be ignored and the same offset of the input string will be parsed using
the core parser.
Finally, note that third-party plugins that want to fall back on another
grammar should first try to call any previous parser hook before setting the
error switch and returning.
---
.../pg_stat_statements/pg_stat_statements.c | 3 +-
src/backend/commands/tablecmds.c | 2 +-
src/backend/executor/spi.c | 4 +-
src/backend/parser/gram.y | 27 ++++
src/backend/parser/parse_type.c | 2 +-
src/backend/parser/parser.c | 7 +-
src/backend/parser/scan.l | 13 +-
src/backend/tcop/postgres.c | 131 ++++++++++++++++--
src/include/parser/parser.h | 5 +-
src/include/parser/scanner.h | 3 +-
src/include/tcop/tcopprot.h | 3 +-
src/pl/plpgsql/src/pl_gram.y | 2 +-
src/pl/plpgsql/src/pl_scanner.c | 2 +-
13 files changed, 179 insertions(+), 25 deletions(-)
diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index f42f07622e..7c911ef58d 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -2711,7 +2711,8 @@ fill_in_constant_lengths(JumbleState *jstate, const char *query,
yyscanner = scanner_init(query,
&yyextra,
&ScanKeywords,
- ScanKeywordTokens);
+ ScanKeywordTokens,
+ 0);
/* we don't want to re-emit any escape string warnings */
yyextra.escape_string_warning = false;
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index d9ba87a2a3..cc9c86778c 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -12602,7 +12602,7 @@ ATPostAlterTypeParse(Oid oldId, Oid oldRelId, Oid refRelId, char *cmd,
* parse_analyze() or the rewriter, but instead we need to pass them
* through parse_utilcmd.c to make them ready for execution.
*/
- raw_parsetree_list = raw_parser(cmd, RAW_PARSE_DEFAULT);
+ raw_parsetree_list = raw_parser(cmd, RAW_PARSE_DEFAULT, 0);
querytree_list = NIL;
foreach(list_item, raw_parsetree_list)
{
diff --git a/src/backend/executor/spi.c b/src/backend/executor/spi.c
index 00aa78ea53..e456172fef 100644
--- a/src/backend/executor/spi.c
+++ b/src/backend/executor/spi.c
@@ -2121,7 +2121,7 @@ _SPI_prepare_plan(const char *src, SPIPlanPtr plan)
/*
* Parse the request string into a list of raw parse trees.
*/
- raw_parsetree_list = raw_parser(src, plan->parse_mode);
+ raw_parsetree_list = raw_parser(src, plan->parse_mode, 0);
/*
* Do parse analysis and rule rewrite for each raw parsetree, storing the
@@ -2229,7 +2229,7 @@ _SPI_prepare_oneshot_plan(const char *src, SPIPlanPtr plan)
/*
* Parse the request string into a list of raw parse trees.
*/
- raw_parsetree_list = raw_parser(src, plan->parse_mode);
+ raw_parsetree_list = raw_parser(src, plan->parse_mode, 0);
/*
* Construct plancache entries, but don't do parse analysis yet.
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index b4ab4014c8..9733b30529 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -753,6 +753,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
%token MODE_PLPGSQL_ASSIGN1
%token MODE_PLPGSQL_ASSIGN2
%token MODE_PLPGSQL_ASSIGN3
+%token MODE_SINGLE_QUERY
/* Precedence: lowest to highest */
@@ -858,6 +859,32 @@ parse_toplevel:
pg_yyget_extra(yyscanner)->parsetree =
list_make1(makeRawStmt((Node *) n, 0));
}
+ | MODE_SINGLE_QUERY toplevel_stmt ';'
+ {
+ RawStmt *raw = makeRawStmt($2, 0);
+ updateRawStmtEnd(raw, @3 + 1);
+ /* NOTE: we can return a raw statement containing a NULL stmt.
+ * This is done to allow pg_parse_query to ignore that part of
+ * the input string and move to the next command.
+ */
+ pg_yyget_extra(yyscanner)->parsetree = list_make1(raw);
+ YYACCEPT;
+ }
+ /*
+ * We need to explicitly look for EOF to parse non-semicolon-
+ * terminated statements in single-query mode, as we could
+ * otherwise successfully parse the beginning of an invalid
+ * query.
+ */
+ | MODE_SINGLE_QUERY toplevel_stmt YYEOF
+ {
+ /* NOTE: we can return a raw statement containing a NULL stmt.
+ * This is done to allow pg_parse_query to ignore that part of
+ * the input string.
+ */
+ pg_yyget_extra(yyscanner)->parsetree = list_make1(makeRawStmt($2, 0));
+ YYACCEPT;
+ }
;
/*
diff --git a/src/backend/parser/parse_type.c b/src/backend/parser/parse_type.c
index abe131ebeb..e9a7b5d62a 100644
--- a/src/backend/parser/parse_type.c
+++ b/src/backend/parser/parse_type.c
@@ -746,7 +746,7 @@ typeStringToTypeName(const char *str)
ptserrcontext.previous = error_context_stack;
error_context_stack = &ptserrcontext;
- raw_parsetree_list = raw_parser(str, RAW_PARSE_TYPE_NAME);
+ raw_parsetree_list = raw_parser(str, RAW_PARSE_TYPE_NAME, 0);
error_context_stack = ptserrcontext.previous;
diff --git a/src/backend/parser/parser.c b/src/backend/parser/parser.c
index 875de7ba28..7297733168 100644
--- a/src/backend/parser/parser.c
+++ b/src/backend/parser/parser.c
@@ -39,7 +39,7 @@ static char *str_udeescape(const char *str, char escape,
* list have the form required by the specified RawParseMode.
*/
List *
-raw_parser(const char *str, RawParseMode mode)
+raw_parser(const char *str, RawParseMode mode, int offset)
{
core_yyscan_t yyscanner;
base_yy_extra_type yyextra;
@@ -47,7 +47,7 @@ raw_parser(const char *str, RawParseMode mode)
/* initialize the flex scanner */
yyscanner = scanner_init(str, &yyextra.core_yy_extra,
- &ScanKeywords, ScanKeywordTokens);
+ &ScanKeywords, ScanKeywordTokens, offset);
/* base_yylex() only needs us to initialize the lookahead token, if any */
if (mode == RAW_PARSE_DEFAULT)
@@ -61,7 +61,8 @@ raw_parser(const char *str, RawParseMode mode)
MODE_PLPGSQL_EXPR, /* RAW_PARSE_PLPGSQL_EXPR */
MODE_PLPGSQL_ASSIGN1, /* RAW_PARSE_PLPGSQL_ASSIGN1 */
MODE_PLPGSQL_ASSIGN2, /* RAW_PARSE_PLPGSQL_ASSIGN2 */
- MODE_PLPGSQL_ASSIGN3 /* RAW_PARSE_PLPGSQL_ASSIGN3 */
+ MODE_PLPGSQL_ASSIGN3, /* RAW_PARSE_PLPGSQL_ASSIGN3 */
+ MODE_SINGLE_QUERY /* RAW_PARSE_SINGLE_QUERY */
};
yyextra.have_lookahead = true;
diff --git a/src/backend/parser/scan.l b/src/backend/parser/scan.l
index 9f9d8a1706..2191360a72 100644
--- a/src/backend/parser/scan.l
+++ b/src/backend/parser/scan.l
@@ -1189,8 +1189,10 @@ core_yyscan_t
scanner_init(const char *str,
core_yy_extra_type *yyext,
const ScanKeywordList *keywordlist,
- const uint16 *keyword_tokens)
+ const uint16 *keyword_tokens,
+ int offset)
{
+ YY_BUFFER_STATE state;
Size slen = strlen(str);
yyscan_t scanner;
@@ -1213,13 +1215,20 @@ scanner_init(const char *str,
yyext->scanbuflen = slen;
memcpy(yyext->scanbuf, str, slen);
yyext->scanbuf[slen] = yyext->scanbuf[slen + 1] = YY_END_OF_BUFFER_CHAR;
- yy_scan_buffer(yyext->scanbuf, slen + 2, scanner);
+ state = yy_scan_buffer(yyext->scanbuf, slen + 2, scanner);
/* initialize literal buffer to a reasonable but expansible size */
yyext->literalalloc = 1024;
yyext->literalbuf = (char *) palloc(yyext->literalalloc);
yyext->literallen = 0;
+ /*
+ * Adjust the offset in the input string. This is required in single-query
+ * mode, as we need to register the same token locations as we would have
+ * in normal mode with a multi-statement query string.
+ */
+ state->yy_buf_pos += offset;
+
return scanner;
}
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index e91db69830..a45dd602c0 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -602,17 +602,130 @@ ProcessClientWriteInterrupt(bool blocked)
List *
pg_parse_query(const char *query_string)
{
- List *raw_parsetree_list = NIL;
+ List *result = NIL;
+ int stmt_len, offset;
TRACE_POSTGRESQL_QUERY_PARSE_START(query_string);
if (log_parser_stats)
ResetUsage();
- if (parser_hook)
- raw_parsetree_list = (*parser_hook) (query_string, RAW_PARSE_DEFAULT);
- else
- raw_parsetree_list = raw_parser(query_string, RAW_PARSE_DEFAULT);
+ stmt_len = 0; /* lazily computed when needed */
+ offset = 0;
+
+ while (true)
+ {
+ List *raw_parsetree_list;
+ RawStmt *raw;
+ bool error = false;
+
+ /*----------------
+ * Start parsing the input string. If a third-party module provided a
+ * parser_hook, we switch to single-query parsing so multi-query
+ * commands using different grammar can work properly.
+ * If the third-party module supports the full set of SQL we support,
+ * or wants to prevent fallback on the core parser, it can ignore the
+ * RAW_PARSE_SINGLE_QUERY flag and parse the full query string.
+ * In that case it must return a List with more than one RawStmt or a
+ * single RawStmt with a 0 length to stop the parsing phase, or raise
+ * an ERROR.
+ *
+ * Otherwise, plugins should parse a single query only and always
+ * return a List containing a single RawStmt with a properly set length
+ * (possibly 0 if it was a single query without an end-of-query
+ * delimiter). If the command is valid but doesn't contain any
+ * statement (e.g. a single semicolon), a single RawStmt with a NULL
+ * stmt field should be returned, containing the consumed query string
+ * length so we can move to the next command in a single pass rather
+ * than 1 byte at a time.
+ *
+ * Also, third-party modules can choose to ignore some or all
+ * parsing errors if they want to implement only a subset of the
+ * syntax postgres supports, or even a totally different syntax, and
+ * fall back on the core grammar for unhandled cases. In that case,
+ * they should set the error flag to true. The returned List will be
+ * ignored and the same offset of the input string will be parsed
+ * using the core parser.
+ *
+ * Finally, note that third-party modules that want to fall back on
+ * another grammar should first try to call any previous parser hook
+ * before setting the error switch and returning.
+ */
+ if (parser_hook)
+ raw_parsetree_list = (*parser_hook) (query_string,
+ RAW_PARSE_SINGLE_QUERY,
+ offset,
+ &error);
+
+ /*
+ * If a third-party module couldn't parse a single query or if no
+ * third-party module is configured, fallback on core parser.
+ */
+ if (error || !parser_hook)
+ raw_parsetree_list = raw_parser(query_string,
+ error ? RAW_PARSE_SINGLE_QUERY : RAW_PARSE_DEFAULT, offset);
+
+ /*
+ * If there is no third-party plugin, or none of the parsers found a
+ * valid query, or if a third-party module consumed the whole
+ * query string, we're done.
+ */
+ if (!parser_hook || raw_parsetree_list == NIL ||
+ list_length(raw_parsetree_list) > 1)
+ {
+ /*
+ * Error out if third-party plugins mix the "single query" and "whole
+ * input string" strategies, rather than silently accepting it and
+ * possibly falling back on the core grammar even when they want to
+ * avoid that. This way plugin authors are warned of the issue early.
+ */
+ if (result != NIL)
+ {
+ Assert(parser_hook != NULL);
+ elog(ERROR, "parser_hook should parse a single statement at "
+ "a time or consume the whole input string at once");
+ }
+ result = raw_parsetree_list;
+ break;
+ }
+
+ if (stmt_len == 0)
+ stmt_len = strlen(query_string);
+
+ raw = linitial_node(RawStmt, raw_parsetree_list);
+
+ /*
+ * In single-query mode, the parser returns statement location info
+ * relative to the beginning of the complete original string, not to
+ * the part we just parsed, so adjust the location info.
+ */
+ if (offset > 0 && raw->stmt_len > 0)
+ {
+ Assert(raw->stmt_len > offset);
+ raw->stmt_location = offset;
+ raw->stmt_len -= offset;
+ }
+
+ /* Ignore the statement if it didn't contain any command. */
+ if (raw->stmt)
+ result = lappend(result, raw);
+
+ if (raw->stmt_len == 0)
+ {
+ /* The statement was the whole string, we're done. */
+ break;
+ }
+ else if (raw->stmt_len + offset >= stmt_len)
+ {
+ /* We consumed all of the input string, we're done. */
+ break;
+ }
+ else
+ {
+ /* Advance the offset to the next command. */
+ offset += raw->stmt_len;
+ }
+ }
if (log_parser_stats)
ShowUsage("PARSER STATISTICS");
@@ -620,13 +733,13 @@ pg_parse_query(const char *query_string)
#ifdef COPY_PARSE_PLAN_TREES
/* Optional debugging check: pass raw parsetrees through copyObject() */
{
- List *new_list = copyObject(raw_parsetree_list);
+ List *new_list = copyObject(result);
/* This checks both copyObject() and the equal() routines... */
- if (!equal(new_list, raw_parsetree_list))
+ if (!equal(new_list, result))
elog(WARNING, "copyObject() failed to produce an equal raw parse tree");
else
- raw_parsetree_list = new_list;
+ result = new_list;
}
#endif
@@ -638,7 +751,7 @@ pg_parse_query(const char *query_string)
TRACE_POSTGRESQL_QUERY_PARSE_DONE(query_string);
- return raw_parsetree_list;
+ return result;
}
/*
diff --git a/src/include/parser/parser.h b/src/include/parser/parser.h
index 853b0f1606..5694ae791a 100644
--- a/src/include/parser/parser.h
+++ b/src/include/parser/parser.h
@@ -41,7 +41,8 @@ typedef enum
RAW_PARSE_PLPGSQL_EXPR,
RAW_PARSE_PLPGSQL_ASSIGN1,
RAW_PARSE_PLPGSQL_ASSIGN2,
- RAW_PARSE_PLPGSQL_ASSIGN3
+ RAW_PARSE_PLPGSQL_ASSIGN3,
+ RAW_PARSE_SINGLE_QUERY
} RawParseMode;
/* Values for the backslash_quote GUC */
@@ -59,7 +60,7 @@ extern PGDLLIMPORT bool standard_conforming_strings;
/* Primary entry point for the raw parsing functions */
-extern List *raw_parser(const char *str, RawParseMode mode);
+extern List *raw_parser(const char *str, RawParseMode mode, int offset);
/* Utility functions exported by gram.y (perhaps these should be elsewhere) */
extern List *SystemFuncName(char *name);
diff --git a/src/include/parser/scanner.h b/src/include/parser/scanner.h
index 0d8182faa0..2747e8b1a0 100644
--- a/src/include/parser/scanner.h
+++ b/src/include/parser/scanner.h
@@ -136,7 +136,8 @@ extern PGDLLIMPORT const uint16 ScanKeywordTokens[];
extern core_yyscan_t scanner_init(const char *str,
core_yy_extra_type *yyext,
const ScanKeywordList *keywordlist,
- const uint16 *keyword_tokens);
+ const uint16 *keyword_tokens,
+ int offset);
extern void scanner_finish(core_yyscan_t yyscanner);
extern int core_yylex(core_YYSTYPE *lvalp, YYLTYPE *llocp,
core_yyscan_t yyscanner);
diff --git a/src/include/tcop/tcopprot.h b/src/include/tcop/tcopprot.h
index 131dc2b22e..27201dde1d 100644
--- a/src/include/tcop/tcopprot.h
+++ b/src/include/tcop/tcopprot.h
@@ -45,7 +45,8 @@ typedef enum
extern PGDLLIMPORT int log_statement;
/* Hook for plugins to get control in pg_parse_query() */
-typedef List *(*parser_hook_type) (const char *str, RawParseMode mode);
+typedef List *(*parser_hook_type) (const char *str, RawParseMode mode,
+ int offset, bool *error);
extern PGDLLIMPORT parser_hook_type parser_hook;
extern List *pg_parse_query(const char *query_string);
diff --git a/src/pl/plpgsql/src/pl_gram.y b/src/pl/plpgsql/src/pl_gram.y
index 34e0520719..6e09f01370 100644
--- a/src/pl/plpgsql/src/pl_gram.y
+++ b/src/pl/plpgsql/src/pl_gram.y
@@ -3690,7 +3690,7 @@ check_sql_expr(const char *stmt, RawParseMode parseMode, int location)
error_context_stack = &syntax_errcontext;
oldCxt = MemoryContextSwitchTo(plpgsql_compile_tmp_cxt);
- (void) raw_parser(stmt, parseMode);
+ (void) raw_parser(stmt, parseMode, 0);
MemoryContextSwitchTo(oldCxt);
/* Restore former ereport callback */
diff --git a/src/pl/plpgsql/src/pl_scanner.c b/src/pl/plpgsql/src/pl_scanner.c
index e4c7a91ab5..a2886c42ec 100644
--- a/src/pl/plpgsql/src/pl_scanner.c
+++ b/src/pl/plpgsql/src/pl_scanner.c
@@ -587,7 +587,7 @@ plpgsql_scanner_init(const char *str)
{
/* Start up the core scanner */
yyscanner = scanner_init(str, &core_yy,
- &ReservedPLKeywords, ReservedPLKeywordTokens);
+ &ReservedPLKeywords, ReservedPLKeywordTokens, 0);
/*
* scanorig points to the original string, which unlike the scanner's
--
2.30.1
v1-0004-Teach-sqlol-to-use-the-new-MODE_SINGLE_QUERY-pars.patch (text/x-diff)
From d2e263f181bcab99acc6ab25a02c9985c0e95810 Mon Sep 17 00:00:00 2001
From: Julien Rouhaud <julien.rouhaud@free.fr>
Date: Thu, 22 Apr 2021 02:15:54 +0800
Subject: [PATCH v1 4/4] Teach sqlol to use the new MODE_SINGLE_QUERY parser
mode.
This way multi-statement commands using both the core parser and the sqlol
parser can be supported.
Also add a LOLCODE version of CREATE VIEW viewname AS to easily test
multi-statement commands.
---
contrib/sqlol/Makefile | 2 +
contrib/sqlol/expected/01_sqlol.out | 74 +++++++++++++++++++++++++++++
contrib/sqlol/repro.sql | 18 +++++++
contrib/sqlol/sql/01_sqlol.sql | 40 ++++++++++++++++
contrib/sqlol/sqlol.c | 24 ++++++----
contrib/sqlol/sqlol_gram.y | 63 ++++++++++++------------
contrib/sqlol/sqlol_kwlist.h | 1 +
contrib/sqlol/sqlol_scan.l | 13 ++++-
contrib/sqlol/sqlol_scanner.h | 3 +-
9 files changed, 192 insertions(+), 46 deletions(-)
create mode 100644 contrib/sqlol/expected/01_sqlol.out
create mode 100644 contrib/sqlol/repro.sql
create mode 100644 contrib/sqlol/sql/01_sqlol.sql
diff --git a/contrib/sqlol/Makefile b/contrib/sqlol/Makefile
index 025e77c4ff..554fe91eae 100644
--- a/contrib/sqlol/Makefile
+++ b/contrib/sqlol/Makefile
@@ -6,6 +6,8 @@ OBJS = \
sqlol.o sqlol_gram.o sqlol_scan.o sqlol_keywords.o
PGFILEDESC = "sqlol - Toy alternative grammar based on LOLCODE"
+REGRESS = 01_sqlol
+
sqlol_gram.h: sqlol_gram.c
touch $@
diff --git a/contrib/sqlol/expected/01_sqlol.out b/contrib/sqlol/expected/01_sqlol.out
new file mode 100644
index 0000000000..a18eaf6801
--- /dev/null
+++ b/contrib/sqlol/expected/01_sqlol.out
@@ -0,0 +1,74 @@
+LOAD 'sqlol';
+-- create a base table, falling back on core grammar
+CREATE TABLE t1 (id integer, val text);
+-- test a SQLOL statement
+HAI 1.2 I HAS A t1 GIMMEH id, "val" KTHXBYE\g
+ id | val
+----+-----
+(0 rows)
+
+-- create a view in SQLOL
+HAI 1.2 MAEK I HAS A t1 GIMMEH id, "val" A v0 KTHXBYE\g
+-- combine standard SQL with a trailing SQLOL statement in multi-statements command
+CREATE VIEW v1 AS SELECT * FROM t1\; CREATE VIEW v2 AS SELECT * FROM t1\;HAI 1.2 I HAS A t1 GIMMEH "id", id KTHXBYE\g
+ id | id
+----+----
+(0 rows)
+
+-- interleave standard SQL and SQLOL commands in multi-statements command
+CREATE VIEW v3 AS SELECT * FROM t1\; HAI 1.2 MAEK I HAS A t1 GIMMEH id, "val" A v4 KTHXBYE CREATE VIEW v5 AS SELECT * FROM t1\;HAI 1.2 I HAS A t1 GIMMEH "id", id KTHXBYE\g
+ id | id
+----+----
+(0 rows)
+
+-- test MODE_SINGLE_QUERY with no trailing semicolon
+SELECT 1\;SELECT 2\;SELECT 3 \g
+ ?column?
+----------
+ 3
+(1 row)
+
+-- test empty statement ignoring
+\;\;select 1 \g
+ ?column?
+----------
+ 1
+(1 row)
+
+-- check the created views
+\d
+ List of relations
+ Schema | Name | Type | Owner
+--------+------+-------+-------
+ public | t1 | table | rjuju
+ public | v0 | view | rjuju
+ public | v1 | view | rjuju
+ public | v2 | view | rjuju
+ public | v3 | view | rjuju
+ public | v4 | view | rjuju
+ public | v5 | view | rjuju
+(7 rows)
+
+--
+-- Error position
+--
+SELECT 1\;err;
+ERROR: syntax error at or near "err"
+LINE 1: SELECT 1;err;
+ ^
+-- sqlol won't trigger an error on incorrect GIMME keyword, so core parser will
+-- complain about HAI
+SELECT 1\;HAI 1.2 I HAS A t1 GIMME id KTHXBYE\g
+ERROR: syntax error at or near "HAI"
+LINE 1: SELECT 1;HAI 1.2 I HAS A t1 GIMME id KTHXBYE
+ ^
+-- sqlol will trigger the error about too many qualifiers on t1
+SELECT 1\;HAI 1.2 I HAS A some.thing.public.t1 GIMMEH id KTHXBYE\g
+ERROR: improper qualified name (too many dotted names): some.thing.public.t1
+LINE 1: SELECT 1;HAI 1.2 I HAS A some.thing.public.t1 GIMMEH id KTHX...
+ ^
+-- position reported outside of the parser/scanner should be correct too
+SELECT 1\;SELECT * FROM notatable;
+ERROR: relation "notatable" does not exist
+LINE 1: SELECT 1;SELECT * FROM notatable;
+ ^
diff --git a/contrib/sqlol/repro.sql b/contrib/sqlol/repro.sql
new file mode 100644
index 0000000000..0ebcb53160
--- /dev/null
+++ b/contrib/sqlol/repro.sql
@@ -0,0 +1,18 @@
+DROP TABLE IF EXISTS t1 CASCADE;
+
+LOAD 'sqlol';
+
+\;\; SELECT 1\;
+
+CREATE TABLE t1 (id integer, val text);
+
+HAI 1.2 I HAS A t1 GIMMEH id, "val" KTHXBYE\g
+
+HAI 1.2 MAEK I HAS A t1 GIMMEH id, "val" A v0 KTHXBYE\g
+
+CREATE VIEW v1 AS SELECT * FROM t1\; CREATE VIEW v2 AS SELECT * FROM t1\;HAI 1.2 I HAS A t1 GIMMEH "id", id KTHXBYE\g
+
+CREATE VIEW v3 AS SELECT * FROM t1\; HAI 1.2 MAEK I HAS A t1 GIMMEH id, "val" A v4 KTHXBYE CREATE VIEW v5 AS SELECT * FROM t1\;HAI 1.2 I HAS A t1 GIMMEH "id", id KTHXBYE\g
+
+SELECT 1\;SELECT 2\;SELECT 3 \g
+\d
diff --git a/contrib/sqlol/sql/01_sqlol.sql b/contrib/sqlol/sql/01_sqlol.sql
new file mode 100644
index 0000000000..918caf94c0
--- /dev/null
+++ b/contrib/sqlol/sql/01_sqlol.sql
@@ -0,0 +1,40 @@
+LOAD 'sqlol';
+
+-- create a base table, falling back on core grammar
+CREATE TABLE t1 (id integer, val text);
+
+-- test a SQLOL statement
+HAI 1.2 I HAS A t1 GIMMEH id, "val" KTHXBYE\g
+
+-- create a view in SQLOL
+HAI 1.2 MAEK I HAS A t1 GIMMEH id, "val" A v0 KTHXBYE\g
+
+-- combine standard SQL with a trailing SQLOL statement in multi-statements command
+CREATE VIEW v1 AS SELECT * FROM t1\; CREATE VIEW v2 AS SELECT * FROM t1\;HAI 1.2 I HAS A t1 GIMMEH "id", id KTHXBYE\g
+
+-- interleave standard SQL and SQLOL commands in multi-statements command
+CREATE VIEW v3 AS SELECT * FROM t1\; HAI 1.2 MAEK I HAS A t1 GIMMEH id, "val" A v4 KTHXBYE CREATE VIEW v5 AS SELECT * FROM t1\;HAI 1.2 I HAS A t1 GIMMEH "id", id KTHXBYE\g
+
+-- test MODE_SINGLE_QUERY with no trailing semicolon
+SELECT 1\;SELECT 2\;SELECT 3 \g
+
+-- test empty statement ignoring
+\;\;select 1 \g
+
+-- check the created views
+\d
+
+--
+-- Error position
+--
+SELECT 1\;err;
+
+-- sqlol won't trigger an error on incorrect GIMME keyword, so core parser will
+-- complain about HAI
+SELECT 1\;HAI 1.2 I HAS A t1 GIMME id KTHXBYE\g
+
+-- sqlol will trigger the error about too many qualifiers on t1
+SELECT 1\;HAI 1.2 I HAS A some.thing.public.t1 GIMMEH id KTHXBYE\g
+
+-- position reported outside of the parser/scanner should be correct too
+SELECT 1\;SELECT * FROM notatable;
diff --git a/contrib/sqlol/sqlol.c b/contrib/sqlol/sqlol.c
index b986966181..7d4e1b631f 100644
--- a/contrib/sqlol/sqlol.c
+++ b/contrib/sqlol/sqlol.c
@@ -26,7 +26,8 @@ static parser_hook_type prev_parser_hook = NULL;
void _PG_init(void);
void _PG_fini(void);
-static List *sqlol_parser_hook(const char *str, RawParseMode mode);
+static List *sqlol_parser_hook(const char *str, RawParseMode mode, int offset,
+ bool *error);
/*
@@ -54,23 +55,25 @@ _PG_fini(void)
* sqlol_parser_hook: parse our grammar
*/
static List *
-sqlol_parser_hook(const char *str, RawParseMode mode)
+sqlol_parser_hook(const char *str, RawParseMode mode, int offset, bool *error)
{
sqlol_yyscan_t yyscanner;
sqlol_base_yy_extra_type yyextra;
int yyresult;
- if (mode != RAW_PARSE_DEFAULT)
+ if (mode != RAW_PARSE_DEFAULT && mode != RAW_PARSE_SINGLE_QUERY)
{
if (prev_parser_hook)
- return (*prev_parser_hook) (str, mode);
- else
- return raw_parser(str, mode);
+ return (*prev_parser_hook) (str, mode, offset, error);
+
+ *error = true;
+ return NIL;
}
/* initialize the flex scanner */
yyscanner = sqlol_scanner_init(str, &yyextra.sqlol_yy_extra,
- sqlol_ScanKeywords, sqlol_NumScanKeywords);
+ sqlol_ScanKeywords, sqlol_NumScanKeywords,
+ offset);
/* initialize the bison parser */
sqlol_parser_init(&yyextra);
@@ -88,9 +91,10 @@ sqlol_parser_hook(const char *str, RawParseMode mode)
if (yyresult)
{
if (prev_parser_hook)
- return (*prev_parser_hook) (str, mode);
- else
- return raw_parser(str, mode);
+ return (*prev_parser_hook) (str, mode, offset, error);
+
+ *error = true;
+ return NIL;
}
return yyextra.parsetree;
diff --git a/contrib/sqlol/sqlol_gram.y b/contrib/sqlol/sqlol_gram.y
index 64d00d14ca..4c36cfef5e 100644
--- a/contrib/sqlol/sqlol_gram.y
+++ b/contrib/sqlol/sqlol_gram.y
@@ -20,6 +20,7 @@
#include "catalog/namespace.h"
#include "nodes/makefuncs.h"
+#include "catalog/pg_class_d.h"
#include "sqlol_gramparse.h"
@@ -106,10 +107,10 @@ static List *check_indirection(List *indirection, sqlol_yyscan_t yyscanner);
ResTarget *target;
}
-%type <node> stmt toplevel_stmt GimmehStmt simple_gimmeh columnref
+%type <node> stmt toplevel_stmt GimmehStmt MaekStmt simple_gimmeh columnref
indirection_el
-%type <list> parse_toplevel stmtmulti gimmeh_list indirection
+%type <list> parse_toplevel rawstmt gimmeh_list indirection
%type <range> qualified_name
@@ -134,22 +135,19 @@ static List *check_indirection(List *indirection, sqlol_yyscan_t yyscanner);
*/
/* ordinary key words in alphabetical order */
-%token <keyword> A GIMMEH HAI HAS I KTHXBYE
-
+%token <keyword> A GIMMEH HAI HAS I KTHXBYE MAEK
%%
/*
* The target production for the whole parse.
- *
- * Ordinarily we parse a list of statements, but if we see one of the
- * special MODE_XXX symbols as first token, we parse something else.
- * The options here correspond to enum RawParseMode, which see for details.
*/
parse_toplevel:
- stmtmulti
+ rawstmt
{
pg_yyget_extra(yyscanner)->parsetree = $1;
+
+ YYACCEPT;
}
;
@@ -163,24 +161,11 @@ parse_toplevel:
* we'd get -1 for the location in such cases.
* We also take care to discard empty statements entirely.
*/
-stmtmulti: stmtmulti KTHXBYE toplevel_stmt
- {
- if ($1 != NIL)
- {
- /* update length of previous stmt */
- updateRawStmtEnd(llast_node(RawStmt, $1), @2);
- }
- if ($3 != NULL)
- $$ = lappend($1, makeRawStmt($3, @2 + 1));
- else
- $$ = $1;
- }
- | toplevel_stmt
+rawstmt: toplevel_stmt KTHXBYE
{
- if ($1 != NULL)
- $$ = list_make1(makeRawStmt($1, 0));
- else
- $$ = NIL;
+ RawStmt *raw = makeRawStmt($1, 0);
+ updateRawStmtEnd(raw, @2 + 7);
+ $$ = list_make1(raw);
}
;
@@ -189,13 +174,12 @@ stmtmulti: stmtmulti KTHXBYE toplevel_stmt
* those words have different meanings in function bodies.
*/
toplevel_stmt:
- stmt
+ HAI FCONST stmt { $$ = $3; }
;
stmt:
GimmehStmt
- | /*EMPTY*/
- { $$ = NULL; }
+ | MaekStmt
;
/*****************************************************************************
@@ -209,12 +193,11 @@ GimmehStmt:
;
simple_gimmeh:
- HAI FCONST I HAS A qualified_name
- GIMMEH gimmeh_list
+ I HAS A qualified_name GIMMEH gimmeh_list
{
SelectStmt *n = makeNode(SelectStmt);
- n->targetList = $8;
- n->fromClause = list_make1($6);
+ n->targetList = $6;
+ n->fromClause = list_make1($4);
$$ = (Node *)n;
}
;
@@ -233,6 +216,20 @@ gimmeh_el:
$$->location = @1;
}
+MaekStmt:
+ MAEK GimmehStmt A qualified_name
+ {
+ ViewStmt *n = makeNode(ViewStmt);
+ n->view = $4;
+ n->view->relpersistence = RELPERSISTENCE_PERMANENT;
+ n->aliases = NIL;
+ n->query = $2;
+ n->replace = false;
+ n->options = NIL;
+ n->withCheckOption = false;
+ $$ = (Node *) n;
+ }
+
qualified_name:
ColId
{
diff --git a/contrib/sqlol/sqlol_kwlist.h b/contrib/sqlol/sqlol_kwlist.h
index 2de3893ee4..8b50d88df9 100644
--- a/contrib/sqlol/sqlol_kwlist.h
+++ b/contrib/sqlol/sqlol_kwlist.h
@@ -19,3 +19,4 @@ PG_KEYWORD("hai", HAI, RESERVED_KEYWORD)
PG_KEYWORD("has", HAS, UNRESERVED_KEYWORD)
PG_KEYWORD("i", I, UNRESERVED_KEYWORD)
PG_KEYWORD("kthxbye", KTHXBYE, UNRESERVED_KEYWORD)
+PG_KEYWORD("maek", MAEK, UNRESERVED_KEYWORD)
diff --git a/contrib/sqlol/sqlol_scan.l b/contrib/sqlol/sqlol_scan.l
index a7088b8390..e6d4d53446 100644
--- a/contrib/sqlol/sqlol_scan.l
+++ b/contrib/sqlol/sqlol_scan.l
@@ -412,8 +412,10 @@ sqlol_yyscan_t
sqlol_scanner_init(const char *str,
sqlol_yy_extra_type *yyext,
const sqlol_ScanKeyword *keywords,
- int num_keywords)
+ int num_keywords,
+ int offset)
{
+ YY_BUFFER_STATE state;
Size slen = strlen(str);
yyscan_t scanner;
@@ -432,13 +434,20 @@ sqlol_scanner_init(const char *str,
yyext->scanbuflen = slen;
memcpy(yyext->scanbuf, str, slen);
yyext->scanbuf[slen] = yyext->scanbuf[slen + 1] = YY_END_OF_BUFFER_CHAR;
- yy_scan_buffer(yyext->scanbuf, slen + 2, scanner);
+ state = yy_scan_buffer(yyext->scanbuf, slen + 2, scanner);
/* initialize literal buffer to a reasonable but expansible size */
yyext->literalalloc = 1024;
yyext->literalbuf = (char *) palloc(yyext->literalalloc);
yyext->literallen = 0;
+ /*
+ * Adjust the offset in the input string. This is required in single-query
+ * mode, as we need to register the same token locations as we would have
+ * in normal mode with a multi-statement query string.
+ */
+ state->yy_buf_pos += offset;
+
return scanner;
}
diff --git a/contrib/sqlol/sqlol_scanner.h b/contrib/sqlol/sqlol_scanner.h
index 0a497e9d91..57f95867ee 100644
--- a/contrib/sqlol/sqlol_scanner.h
+++ b/contrib/sqlol/sqlol_scanner.h
@@ -108,7 +108,8 @@ extern PGDLLIMPORT const uint16 sqlol_ScanKeywordTokens[];
extern sqlol_yyscan_t sqlol_scanner_init(const char *str,
sqlol_yy_extra_type *yyext,
const sqlol_ScanKeyword *keywords,
- int num_keywords);
+ int num_keywords,
+ int offset);
extern void sqlol_scanner_finish(sqlol_yyscan_t yyscanner);
extern int sqlol_yylex(sqlol_YYSTYPE *lvalp, YYLTYPE *llocp,
sqlol_yyscan_t yyscanner);
--
2.30.1
On Sat, May 01, 2021 at 03:24:58PM +0800, Julien Rouhaud wrote:
I'm attaching some POC patches that implement this approach to start a
discussion.
I just noticed that the cfbot fails with the v1 patch. Attached v2 that should
fix that.
Attachments:
v2-0001-Add-a-parser_hook-hook.patch (text/x-diff)
From db68ec4dd0f0590db2275f0ca99ec24948983462 Mon Sep 17 00:00:00 2001
From: Julien Rouhaud <julien.rouhaud@free.fr>
Date: Wed, 21 Apr 2021 22:47:18 +0800
Subject: [PATCH v2 1/4] Add a parser_hook hook.
This does nothing but allow third-party plugins to implement a different
syntax, and fall back on the core parser if they don't implement a superset of
the supported core syntax.
---
src/backend/tcop/postgres.c | 16 ++++++++++++++--
src/include/tcop/tcopprot.h | 5 +++++
2 files changed, 19 insertions(+), 2 deletions(-)
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 8cea10c901..e941b59b85 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -99,6 +99,9 @@ int log_statement = LOGSTMT_NONE;
/* GUC variable for maximum stack depth (measured in kilobytes) */
int max_stack_depth = 100;
+/* Hook for plugins to get control in pg_parse_query() */
+parser_hook_type parser_hook = NULL;
+
/* wait N seconds to allow attach from a debugger */
int PostAuthDelay = 0;
@@ -589,18 +592,27 @@ ProcessClientWriteInterrupt(bool blocked)
* database tables. So, we rely on the raw parser to determine whether
* we've seen a COMMIT or ABORT command; when we are in abort state, other
* commands are not processed any further than the raw parse stage.
+ *
+ * To support loadable plugins that monitor the parsing or implement SQL
+ * syntactic sugar, we provide a hook variable that lets a plugin get control
+ * before and after the standard parsing process. If the plugin only implements
+ * a subset of the syntax postgres supports, it is its duty to call raw_parser
+ * (or the previous hook, if any) for the statements it doesn't understand.
*/
List *
pg_parse_query(const char *query_string)
{
- List *raw_parsetree_list;
+ List *raw_parsetree_list = NIL;
TRACE_POSTGRESQL_QUERY_PARSE_START(query_string);
if (log_parser_stats)
ResetUsage();
- raw_parsetree_list = raw_parser(query_string, RAW_PARSE_DEFAULT);
+ if (parser_hook)
+ raw_parsetree_list = (*parser_hook) (query_string, RAW_PARSE_DEFAULT);
+ else
+ raw_parsetree_list = raw_parser(query_string, RAW_PARSE_DEFAULT);
if (log_parser_stats)
ShowUsage("PARSER STATISTICS");
diff --git a/src/include/tcop/tcopprot.h b/src/include/tcop/tcopprot.h
index 968345404e..131dc2b22e 100644
--- a/src/include/tcop/tcopprot.h
+++ b/src/include/tcop/tcopprot.h
@@ -17,6 +17,7 @@
#include "nodes/params.h"
#include "nodes/parsenodes.h"
#include "nodes/plannodes.h"
+#include "parser/parser.h"
#include "storage/procsignal.h"
#include "utils/guc.h"
#include "utils/queryenvironment.h"
@@ -43,6 +44,10 @@ typedef enum
extern PGDLLIMPORT int log_statement;
+/* Hook for plugins to get control in pg_parse_query() */
+typedef List *(*parser_hook_type) (const char *str, RawParseMode mode);
+extern PGDLLIMPORT parser_hook_type parser_hook;
+
extern List *pg_parse_query(const char *query_string);
extern List *pg_rewrite_query(Query *query);
extern List *pg_analyze_and_rewrite(RawStmt *parsetree,
--
2.31.1
v2-0002-Add-a-sqlol-parser.patch (text/x-diff)
From 3fb28880df308894cd2ce114f4b3cd97c90c03d6 Mon Sep 17 00:00:00 2001
From: Julien Rouhaud <julien.rouhaud@free.fr>
Date: Wed, 21 Apr 2021 23:54:02 +0800
Subject: [PATCH v2 2/4] Add a sqlol parser.
This is a toy example of an alternative grammar that only accepts a LOLCODE
compatible version of a
SELECT [column, ] column FROM tablename
and falls back on the core parser for everything else.
---
contrib/Makefile | 1 +
contrib/sqlol/.gitignore | 7 +
contrib/sqlol/Makefile | 33 ++
contrib/sqlol/sqlol.c | 107 +++++++
contrib/sqlol/sqlol_gram.y | 440 ++++++++++++++++++++++++++
contrib/sqlol/sqlol_gramparse.h | 61 ++++
contrib/sqlol/sqlol_keywords.c | 98 ++++++
contrib/sqlol/sqlol_keywords.h | 38 +++
contrib/sqlol/sqlol_kwlist.h | 21 ++
contrib/sqlol/sqlol_scan.l | 544 ++++++++++++++++++++++++++++++++
contrib/sqlol/sqlol_scanner.h | 118 +++++++
11 files changed, 1468 insertions(+)
create mode 100644 contrib/sqlol/.gitignore
create mode 100644 contrib/sqlol/Makefile
create mode 100644 contrib/sqlol/sqlol.c
create mode 100644 contrib/sqlol/sqlol_gram.y
create mode 100644 contrib/sqlol/sqlol_gramparse.h
create mode 100644 contrib/sqlol/sqlol_keywords.c
create mode 100644 contrib/sqlol/sqlol_keywords.h
create mode 100644 contrib/sqlol/sqlol_kwlist.h
create mode 100644 contrib/sqlol/sqlol_scan.l
create mode 100644 contrib/sqlol/sqlol_scanner.h
diff --git a/contrib/Makefile b/contrib/Makefile
index f27e458482..2a80cd137b 100644
--- a/contrib/Makefile
+++ b/contrib/Makefile
@@ -43,6 +43,7 @@ SUBDIRS = \
postgres_fdw \
seg \
spi \
+ sqlol \
tablefunc \
tcn \
test_decoding \
diff --git a/contrib/sqlol/.gitignore b/contrib/sqlol/.gitignore
new file mode 100644
index 0000000000..3c4b587792
--- /dev/null
+++ b/contrib/sqlol/.gitignore
@@ -0,0 +1,7 @@
+# Generated subdirectories
+/log/
+/results/
+/tmp_check/
+sqlol_gram.c
+sqlol_gram.h
+sqlol_scan.c
diff --git a/contrib/sqlol/Makefile b/contrib/sqlol/Makefile
new file mode 100644
index 0000000000..025e77c4ff
--- /dev/null
+++ b/contrib/sqlol/Makefile
@@ -0,0 +1,33 @@
+# contrib/sqlol/Makefile
+
+MODULE_big = sqlol
+OBJS = \
+ $(WIN32RES) \
+ sqlol.o sqlol_gram.o sqlol_scan.o sqlol_keywords.o
+PGFILEDESC = "sqlol - Toy alternative grammar based on LOLCODE"
+
+sqlol_gram.h: sqlol_gram.c
+ touch $@
+
+sqlol_gram.c: BISONFLAGS += -d
+# sqlol_gram.c: BISON_CHECK_CMD = $(PERL) $(srcdir)/check_keywords.pl $< $(top_srcdir)/src/include/parser/kwlist.h
+
+
+sqlol_scan.c: FLEXFLAGS = -CF -p -p
+sqlol_scan.c: FLEX_NO_BACKUP=yes
+sqlol_scan.c: FLEX_FIX_WARNING=yes
+
+
+# Force these dependencies to be known even without dependency info built:
+sqlol_gram.o sqlol_scan.o parser.o: sqlol_gram.h
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = contrib/sqlol
+top_builddir = ../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
diff --git a/contrib/sqlol/sqlol.c b/contrib/sqlol/sqlol.c
new file mode 100644
index 0000000000..b986966181
--- /dev/null
+++ b/contrib/sqlol/sqlol.c
@@ -0,0 +1,107 @@
+/*-------------------------------------------------------------------------
+ *
+ * sqlol.c
+ *
+ *
+ * Copyright (c) 2008-2021, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * contrib/sqlol/sqlol.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "tcop/tcopprot.h"
+
+#include "sqlol_gramparse.h"
+#include "sqlol_keywords.h"
+
+PG_MODULE_MAGIC;
+
+
+/* Saved hook values in case of unload */
+static parser_hook_type prev_parser_hook = NULL;
+
+void _PG_init(void);
+void _PG_fini(void);
+
+static List *sqlol_parser_hook(const char *str, RawParseMode mode);
+
+
+/*
+ * Module load callback
+ */
+void
+_PG_init(void)
+{
+ /* Install hooks. */
+ prev_parser_hook = parser_hook;
+ parser_hook = sqlol_parser_hook;
+}
+
+/*
+ * Module unload callback
+ */
+void
+_PG_fini(void)
+{
+ /* Uninstall hooks. */
+ parser_hook = prev_parser_hook;
+}
+
+/*
+ * sqlol_parser_hook: parse our grammar
+ */
+static List *
+sqlol_parser_hook(const char *str, RawParseMode mode)
+{
+ sqlol_yyscan_t yyscanner;
+ sqlol_base_yy_extra_type yyextra;
+ int yyresult;
+
+ if (mode != RAW_PARSE_DEFAULT)
+ {
+ if (prev_parser_hook)
+ return (*prev_parser_hook) (str, mode);
+ else
+ return raw_parser(str, mode);
+ }
+
+ /* initialize the flex scanner */
+ yyscanner = sqlol_scanner_init(str, &yyextra.sqlol_yy_extra,
+ sqlol_ScanKeywords, sqlol_NumScanKeywords);
+
+ /* initialize the bison parser */
+ sqlol_parser_init(&yyextra);
+
+ /* Parse! */
+ yyresult = sqlol_base_yyparse(yyscanner);
+
+ /* Clean up (release memory) */
+ sqlol_scanner_finish(yyscanner);
+
+ /*
+ * Invalid statement: fall back on the previous parser_hook if any,
+ * or on raw_parser()
+ */
+ if (yyresult)
+ {
+ if (prev_parser_hook)
+ return (*prev_parser_hook) (str, mode);
+ else
+ return raw_parser(str, mode);
+ }
+
+ return yyextra.parsetree;
+}
+
+int
+sqlol_base_yylex(YYSTYPE *lvalp, YYLTYPE *llocp, sqlol_yyscan_t yyscanner)
+{
+ int cur_token;
+
+ cur_token = sqlol_yylex(&(lvalp->sqlol_yystype), llocp, yyscanner);
+
+ return cur_token;
+}
diff --git a/contrib/sqlol/sqlol_gram.y b/contrib/sqlol/sqlol_gram.y
new file mode 100644
index 0000000000..64d00d14ca
--- /dev/null
+++ b/contrib/sqlol/sqlol_gram.y
@@ -0,0 +1,440 @@
+%{
+
+/*#define YYDEBUG 1*/
+/*-------------------------------------------------------------------------
+ *
+ * sqlol_gram.y
+ * sqlol BISON rules/actions
+ *
+ * Portions Copyright (c) 1996-2021, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ * contrib/sqlol/sqlol_gram.y
+ *
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "catalog/namespace.h"
+#include "nodes/makefuncs.h"
+
+#include "sqlol_gramparse.h"
+
+/*
+ * Location tracking support --- simpler than bison's default, since we only
+ * want to track the start position not the end position of each nonterminal.
+ */
+#define YYLLOC_DEFAULT(Current, Rhs, N) \
+ do { \
+ if ((N) > 0) \
+ (Current) = (Rhs)[1]; \
+ else \
+ (Current) = (-1); \
+ } while (0)
+
+/*
+ * The above macro assigns -1 (unknown) as the parse location of any
+ * nonterminal that was reduced from an empty rule, or whose leftmost
+ * component was reduced from an empty rule. This is problematic
+ * for nonterminals defined like
+ * OptFooList: / * EMPTY * / { ... } | OptFooList Foo { ... } ;
+ * because we'll set -1 as the location during the first reduction and then
+ * copy it during each subsequent reduction, leaving us with -1 for the
+ * location even when the list is not empty. To fix that, do this in the
+ * action for the nonempty rule(s):
+ * if (@$ < 0) @$ = @2;
+ * (Although we have many nonterminals that follow this pattern, we only
+ * bother with fixing @$ like this when the nonterminal's parse location
+ * is actually referenced in some rule.)
+ *
+ * A cleaner answer would be to make YYLLOC_DEFAULT scan all the Rhs
+ * locations until it's found one that's not -1. Then we'd get a correct
+ * location for any nonterminal that isn't entirely empty. But this way
+ * would add overhead to every rule reduction, and so far there's not been
+ * a compelling reason to pay that overhead.
+ */
+
+/*
+ * Bison doesn't allocate anything that needs to live across parser calls,
+ * so we can easily have it use palloc instead of malloc. This prevents
+ * memory leaks if we error out during parsing. Note this only works with
+ * bison >= 2.0. However, in bison 1.875 the default is to use alloca()
+ * if possible, so there's not really much problem anyhow, at least if
+ * you're building with gcc.
+ */
+#define YYMALLOC palloc
+#define YYFREE pfree
+
+
+#define parser_yyerror(msg) sqlol_scanner_yyerror(msg, yyscanner)
+#define parser_errposition(pos) sqlol_scanner_errposition(pos, yyscanner)
+
+static void sqlol_base_yyerror(YYLTYPE *yylloc, sqlol_yyscan_t yyscanner,
+ const char *msg);
+static RawStmt *makeRawStmt(Node *stmt, int stmt_location);
+static void updateRawStmtEnd(RawStmt *rs, int end_location);
+static Node *makeColumnRef(char *colname, List *indirection,
+ int location, sqlol_yyscan_t yyscanner);
+static void check_qualified_name(List *names, sqlol_yyscan_t yyscanner);
+static List *check_indirection(List *indirection, sqlol_yyscan_t yyscanner);
+
+%}
+
+%pure-parser
+%expect 0
+%name-prefix="sqlol_base_yy"
+%locations
+
+%parse-param {sqlol_yyscan_t yyscanner}
+%lex-param {sqlol_yyscan_t yyscanner}
+
+%union
+{
+ sqlol_YYSTYPE sqlol_yystype;
+ /* these fields must match sqlol_YYSTYPE: */
+ int ival;
+ char *str;
+ const char *keyword;
+
+ List *list;
+ Node *node;
+ Value *value;
+ RangeVar *range;
+ ResTarget *target;
+}
+
+%type <node> stmt toplevel_stmt GimmehStmt simple_gimmeh columnref
+ indirection_el
+
+%type <list> parse_toplevel stmtmulti gimmeh_list indirection
+
+%type <range> qualified_name
+
+%type <str> ColId ColLabel attr_name
+
+%type <target> gimmeh_el
+
+/*
+ * Non-keyword token types. These are hard-wired into the "flex" lexer.
+ * They must be listed first so that their numeric codes do not depend on
+ * the set of keywords. PL/pgSQL depends on this so that it can share the
+ * same lexer. If you add/change tokens here, fix PL/pgSQL to match!
+ *
+ */
+%token <str> IDENT FCONST SCONST Op
+
+/*
+ * If you want to make any keyword changes, update the keyword table in
+ * src/include/parser/kwlist.h and add new keywords to the appropriate one
+ * of the reserved-or-not-so-reserved keyword lists, below; search
+ * this file for "Keyword category lists".
+ */
+
+/* ordinary key words in alphabetical order */
+%token <keyword> A GIMMEH HAI HAS I KTHXBYE
+
+
+%%
+
+/*
+ * The target production for the whole parse.
+ *
+ * Ordinarily we parse a list of statements, but if we see one of the
+ * special MODE_XXX symbols as first token, we parse something else.
+ * The options here correspond to enum RawParseMode, which see for details.
+ */
+parse_toplevel:
+ stmtmulti
+ {
+ pg_yyget_extra(yyscanner)->parsetree = $1;
+ }
+ ;
+
+/*
+ * At top level, we wrap each stmt with a RawStmt node carrying start location
+ * and length of the stmt's text. Notice that the start loc/len are driven
+ * entirely from the KTHXBYE separator locations (@2). It would seem natural
+ * to use @1 or @3 to get the true start location of a stmt, but that doesn't
+ * work for statements that can start with empty nonterminals; as noted in the
+ * comments for YYLLOC_DEFAULT, we'd get -1 for the location in such cases.
+ * We also take care to discard empty statements entirely.
+ */
+stmtmulti: stmtmulti KTHXBYE toplevel_stmt
+ {
+ if ($1 != NIL)
+ {
+ /* update length of previous stmt */
+ updateRawStmtEnd(llast_node(RawStmt, $1), @2);
+ }
+ if ($3 != NULL)
+ $$ = lappend($1, makeRawStmt($3, @2 + 1));
+ else
+ $$ = $1;
+ }
+ | toplevel_stmt
+ {
+ if ($1 != NULL)
+ $$ = list_make1(makeRawStmt($1, 0));
+ else
+ $$ = NIL;
+ }
+ ;
+
+/*
+ * toplevel_stmt is currently identical to stmt; it's kept as a separate
+ * production to mirror the core grammar, where the two differ because some
+ * words have different meanings in function bodies.
+ */
+toplevel_stmt:
+ stmt
+ ;
+
+stmt:
+ GimmehStmt
+ | /*EMPTY*/
+ { $$ = NULL; }
+ ;
+
+/*****************************************************************************
+ *
+ * GIMMEH statement
+ *
+ *****************************************************************************/
+
+GimmehStmt:
+ simple_gimmeh { $$ = $1; }
+ ;
+
+simple_gimmeh:
+ HAI FCONST I HAS A qualified_name
+ GIMMEH gimmeh_list
+ {
+ SelectStmt *n = makeNode(SelectStmt);
+ n->targetList = $8;
+ n->fromClause = list_make1($6);
+ $$ = (Node *)n;
+ }
+ ;
+
+gimmeh_list:
+ gimmeh_el { $$ = list_make1($1); }
+			| gimmeh_list ',' gimmeh_el { $$ = lappend($1, $3); }
+		;
+
+gimmeh_el:
+ columnref
+ {
+ $$ = makeNode(ResTarget);
+ $$->name = NULL;
+ $$->indirection = NIL;
+ $$->val = (Node *)$1;
+ $$->location = @1;
+			}
+		;
+
+qualified_name:
+ ColId
+ {
+ $$ = makeRangeVar(NULL, $1, @1);
+ }
+ | ColId indirection
+ {
+ check_qualified_name($2, yyscanner);
+ $$ = makeRangeVar(NULL, NULL, @1);
+ switch (list_length($2))
+ {
+ case 1:
+ $$->catalogname = NULL;
+ $$->schemaname = $1;
+ $$->relname = strVal(linitial($2));
+ break;
+ case 2:
+ $$->catalogname = $1;
+ $$->schemaname = strVal(linitial($2));
+ $$->relname = strVal(lsecond($2));
+ break;
+ default:
+ /*
+ * It's ok to error out here as at this point we
+ * already parsed a "HAI FCONST" preamble, and no
+ * other grammar is likely to accept a command
+ * starting with that, so there's no point trying
+ * to fall back on the other grammars.
+ */
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("improper qualified name (too many dotted names): %s",
+ NameListToString(lcons(makeString($1), $2))),
+ parser_errposition(@1)));
+ break;
+ }
+ }
+ ;
+
+columnref: ColId
+ {
+ $$ = makeColumnRef($1, NIL, @1, yyscanner);
+ }
+ | ColId indirection
+ {
+ $$ = makeColumnRef($1, $2, @1, yyscanner);
+ }
+ ;
+
+ColId: IDENT { $$ = $1; }
+		;
+
+indirection:
+ indirection_el { $$ = list_make1($1); }
+ | indirection indirection_el { $$ = lappend($1, $2); }
+ ;
+
+indirection_el:
+ '.' attr_name
+ {
+ $$ = (Node *) makeString($2);
+ }
+ ;
+
+attr_name: ColLabel { $$ = $1; };
+
+ColLabel: IDENT { $$ = $1; }
+		;
+
+%%
+
+/*
+ * The signature of this function is required by bison. However, we
+ * ignore the passed yylloc and instead use the last token position
+ * available from the scanner.
+ */
+static void
+sqlol_base_yyerror(YYLTYPE *yylloc, sqlol_yyscan_t yyscanner, const char *msg)
+{
+ parser_yyerror(msg);
+}
+
+static RawStmt *
+makeRawStmt(Node *stmt, int stmt_location)
+{
+ RawStmt *rs = makeNode(RawStmt);
+
+ rs->stmt = stmt;
+ rs->stmt_location = stmt_location;
+ rs->stmt_len = 0; /* might get changed later */
+ return rs;
+}
+
+/* Adjust a RawStmt to reflect that it doesn't run to the end of the string */
+static void
+updateRawStmtEnd(RawStmt *rs, int end_location)
+{
+ /*
+ * If we already set the length, don't change it. This is for situations
+ * like "select foo ;; select bar" where the same statement will be last
+ * in the string for more than one semicolon.
+ */
+ if (rs->stmt_len > 0)
+ return;
+
+ /* OK, update length of RawStmt */
+ rs->stmt_len = end_location - rs->stmt_location;
+}
+
+static Node *
+makeColumnRef(char *colname, List *indirection,
+ int location, sqlol_yyscan_t yyscanner)
+{
+ /*
+ * Generate a ColumnRef node, with an A_Indirection node added if there
+ * is any subscripting in the specified indirection list. However,
+ * any field selection at the start of the indirection list must be
+ * transposed into the "fields" part of the ColumnRef node.
+ */
+ ColumnRef *c = makeNode(ColumnRef);
+ int nfields = 0;
+ ListCell *l;
+
+ c->location = location;
+ foreach(l, indirection)
+ {
+ if (IsA(lfirst(l), A_Indices))
+ {
+ A_Indirection *i = makeNode(A_Indirection);
+
+ if (nfields == 0)
+ {
+ /* easy case - all indirection goes to A_Indirection */
+ c->fields = list_make1(makeString(colname));
+ i->indirection = check_indirection(indirection, yyscanner);
+ }
+ else
+ {
+ /* got to split the list in two */
+ i->indirection = check_indirection(list_copy_tail(indirection,
+ nfields),
+ yyscanner);
+ indirection = list_truncate(indirection, nfields);
+ c->fields = lcons(makeString(colname), indirection);
+ }
+ i->arg = (Node *) c;
+ return (Node *) i;
+ }
+ else if (IsA(lfirst(l), A_Star))
+ {
+ /* We only allow '*' at the end of a ColumnRef */
+ if (lnext(indirection, l) != NULL)
+ parser_yyerror("improper use of \"*\"");
+ }
+ nfields++;
+ }
+ /* No subscripting, so all indirection gets added to field list */
+ c->fields = lcons(makeString(colname), indirection);
+ return (Node *) c;
+}
+
+/* check_qualified_name --- check the result of qualified_name production
+ *
+ * It's easiest to let the grammar production for qualified_name allow
+ * subscripts and '*', which we then must reject here.
+ */
+static void
+check_qualified_name(List *names, sqlol_yyscan_t yyscanner)
+{
+ ListCell *i;
+
+ foreach(i, names)
+ {
+ if (!IsA(lfirst(i), String))
+ parser_yyerror("syntax error");
+ }
+}
+
+/* check_indirection --- check the result of indirection production
+ *
+ * We only allow '*' at the end of the list, but it's hard to enforce that
+ * in the grammar, so do it here.
+ */
+static List *
+check_indirection(List *indirection, sqlol_yyscan_t yyscanner)
+{
+ ListCell *l;
+
+ foreach(l, indirection)
+ {
+ if (IsA(lfirst(l), A_Star))
+ {
+ if (lnext(indirection, l) != NULL)
+ parser_yyerror("improper use of \"*\"");
+ }
+ }
+ return indirection;
+}
+
+/* sqlol_parser_init()
+ * Initialize to parse one query string
+ */
+void
+sqlol_parser_init(sqlol_base_yy_extra_type *yyext)
+{
+ yyext->parsetree = NIL; /* in case grammar forgets to set it */
+}
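
To make the grammar above concrete, here is a hedged illustration (the input and table name are invented for this example): simple_gimmeh accepts a HAI preamble with an FCONST version number, an I HAS A table clause and a GIMMEH target list, with KTHXBYE separating statements, and builds an ordinary SelectStmt:

```
HAI 1.2 I HAS A pg_class GIMMEH relname, relkind KTHXBYE

-- parsed into the same parse tree as the plain SQL:
SELECT relname, relkind FROM pg_class;
```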
diff --git a/contrib/sqlol/sqlol_gramparse.h b/contrib/sqlol/sqlol_gramparse.h
new file mode 100644
index 0000000000..58233a8d87
--- /dev/null
+++ b/contrib/sqlol/sqlol_gramparse.h
@@ -0,0 +1,61 @@
+/*-------------------------------------------------------------------------
+ *
+ * sqlol_gramparse.h
+ * Shared definitions for the "raw" parser (flex and bison phases only)
+ *
+ *
+ * Portions Copyright (c) 1996-2021, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * contrib/sqlol/sqlol_gramparse.h
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#ifndef SQLOL_GRAMPARSE_H
+#define SQLOL_GRAMPARSE_H
+
+#include "nodes/parsenodes.h"
+#include "sqlol_scanner.h"
+
+/*
+ * NB: include sqlol_gram.h only AFTER including sqlol_scanner.h, because
+ * sqlol_scanner.h is what #defines YYLTYPE.
+ */
+#include "sqlol_gram.h"
+
+/*
+ * The YY_EXTRA data that a flex scanner allows us to pass around. Private
+ * state needed for raw parsing/lexing goes here.
+ */
+typedef struct sqlol_base_yy_extra_type
+{
+ /*
+ * Fields used by the core scanner.
+ */
+ sqlol_yy_extra_type sqlol_yy_extra;
+
+ /*
+ * State variables that belong to the grammar.
+ */
+ List *parsetree; /* final parse result is delivered here */
+} sqlol_base_yy_extra_type;
+
+/*
+ * In principle we should use yyget_extra() to fetch the yyextra field
+ * from a yyscanner struct. However, flex always puts that field first,
+ * and this is sufficiently performance-critical to make it seem worth
+ * cheating a bit to use an inline macro.
+ */
+#define pg_yyget_extra(yyscanner) (*((sqlol_base_yy_extra_type **) (yyscanner)))
+
+
+/* from parser.c */
+extern int sqlol_base_yylex(YYSTYPE *lvalp, YYLTYPE *llocp,
+ sqlol_yyscan_t yyscanner);
+
+/* from gram.y */
+extern void sqlol_parser_init(sqlol_base_yy_extra_type *yyext);
+extern int sqlol_baseyyparse(sqlol_yyscan_t yyscanner);
+
+#endif /* SQLOL_GRAMPARSE_H */
diff --git a/contrib/sqlol/sqlol_keywords.c b/contrib/sqlol/sqlol_keywords.c
new file mode 100644
index 0000000000..dbbdf5493c
--- /dev/null
+++ b/contrib/sqlol/sqlol_keywords.c
@@ -0,0 +1,98 @@
+/*-------------------------------------------------------------------------
+ *
+ * sqlol_keywords.c
+ * lexical token lookup for key words in PostgreSQL
+ *
+ *
+ * Portions Copyright (c) 1996-2021, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ *	  contrib/sqlol/sqlol_keywords.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "sqlol_gramparse.h"
+
+#define PG_KEYWORD(a,b,c) {a,b,c},
+
+const sqlol_ScanKeyword sqlol_ScanKeywords[] = {
+#include "sqlol_kwlist.h"
+};
+
+const int sqlol_NumScanKeywords = lengthof(sqlol_ScanKeywords);
+
+#undef PG_KEYWORD
+
+
+/*
+ * ScanKeywordLookup - see if a given word is a keyword
+ *
+ * The table to be searched is passed explicitly, so that this can be used
+ * to search keyword lists other than the standard list appearing above.
+ *
+ * Returns a pointer to the sqlol_ScanKeyword table entry, or NULL if no match.
+ *
+ * The match is done case-insensitively. Note that we deliberately use a
+ * dumbed-down case conversion that will only translate 'A'-'Z' into 'a'-'z',
+ * even if we are in a locale where tolower() would produce more or different
+ * translations. This is to conform to the SQL99 spec, which says that
+ * keywords are to be matched in this way even though non-keyword identifiers
+ * receive a different case-normalization mapping.
+ */
+const sqlol_ScanKeyword *
+sqlol_ScanKeywordLookup(const char *text,
+ const sqlol_ScanKeyword *keywords,
+ int num_keywords)
+{
+ int len,
+ i;
+ char word[NAMEDATALEN];
+ const sqlol_ScanKeyword *low;
+ const sqlol_ScanKeyword *high;
+
+ len = strlen(text);
+ /* We assume all keywords are shorter than NAMEDATALEN. */
+ if (len >= NAMEDATALEN)
+ return NULL;
+
+ /*
+ * Apply an ASCII-only downcasing. We must not use tolower() since it may
+ * produce the wrong translation in some locales (eg, Turkish).
+ */
+ for (i = 0; i < len; i++)
+ {
+ char ch = text[i];
+
+ if (ch >= 'A' && ch <= 'Z')
+ ch += 'a' - 'A';
+ word[i] = ch;
+ }
+ word[len] = '\0';
+
+ /*
+ * Now do a binary search using plain strcmp() comparison.
+ */
+ low = keywords;
+ high = keywords + (num_keywords - 1);
+ while (low <= high)
+ {
+ const sqlol_ScanKeyword *middle;
+ int difference;
+
+ middle = low + (high - low) / 2;
+ difference = strcmp(middle->name, word);
+ if (difference == 0)
+ return middle;
+ else if (difference < 0)
+ low = middle + 1;
+ else
+ high = middle - 1;
+ }
+
+ return NULL;
+}
+
diff --git a/contrib/sqlol/sqlol_keywords.h b/contrib/sqlol/sqlol_keywords.h
new file mode 100644
index 0000000000..bc4acf4541
--- /dev/null
+++ b/contrib/sqlol/sqlol_keywords.h
@@ -0,0 +1,38 @@
+/*-------------------------------------------------------------------------
+ *
+ * sqlol_keywords.h
+ * lexical token lookup for key words in PostgreSQL
+ *
+ *
+ * Portions Copyright (c) 1996-2021, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * contrib/sqlol/sqlol_keywords.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef SQLOL_KEYWORDS_H
+#define SQLOL_KEYWORDS_H
+
+/* Keyword categories --- should match lists in sqlol_gram.y */
+#define UNRESERVED_KEYWORD 0
+#define COL_NAME_KEYWORD 1
+#define TYPE_FUNC_NAME_KEYWORD 2
+#define RESERVED_KEYWORD 3
+
+
+typedef struct sqlol_ScanKeyword
+{
+ const char *name; /* in lower case */
+ int16 value; /* grammar's token code */
+ int16 category; /* see codes above */
+} sqlol_ScanKeyword;
+
+extern PGDLLIMPORT const sqlol_ScanKeyword sqlol_ScanKeywords[];
+extern PGDLLIMPORT const int sqlol_NumScanKeywords;
+
+extern const sqlol_ScanKeyword *sqlol_ScanKeywordLookup(const char *text,
+ const sqlol_ScanKeyword *keywords,
+ int num_keywords);
+
+#endif /* SQLOL_KEYWORDS_H */
diff --git a/contrib/sqlol/sqlol_kwlist.h b/contrib/sqlol/sqlol_kwlist.h
new file mode 100644
index 0000000000..2de3893ee4
--- /dev/null
+++ b/contrib/sqlol/sqlol_kwlist.h
@@ -0,0 +1,21 @@
+/*-------------------------------------------------------------------------
+ *
+ * sqlol_kwlist.h
+ *
+ *
+ * Portions Copyright (c) 1996-2021, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * IDENTIFICATION
+ * contrib/sqlol/sqlol_kwlist.h
+ *
+ *-------------------------------------------------------------------------
+ */
+
+/* name, value, category */
+PG_KEYWORD("a", A, UNRESERVED_KEYWORD)
+PG_KEYWORD("gimmeh", GIMMEH, UNRESERVED_KEYWORD)
+PG_KEYWORD("hai", HAI, RESERVED_KEYWORD)
+PG_KEYWORD("has", HAS, UNRESERVED_KEYWORD)
+PG_KEYWORD("i", I, UNRESERVED_KEYWORD)
+PG_KEYWORD("kthxbye", KTHXBYE, UNRESERVED_KEYWORD)
diff --git a/contrib/sqlol/sqlol_scan.l b/contrib/sqlol/sqlol_scan.l
new file mode 100644
index 0000000000..a7088b8390
--- /dev/null
+++ b/contrib/sqlol/sqlol_scan.l
@@ -0,0 +1,544 @@
+%top{
+/*-------------------------------------------------------------------------
+ *
+ * sqlol_scan.l
+ * lexical scanner for sqlol
+ *
+ *
+ * Portions Copyright (c) 1996-2021, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * IDENTIFICATION
+ * contrib/sqlol/sqlol_scan.l
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "common/string.h"
+#include "sqlol_gramparse.h"
+#include "parser/scansup.h"
+#include "mb/pg_wchar.h"
+
+#include "sqlol_keywords.h"
+}
+
+%{
+
+/* LCOV_EXCL_START */
+
+/* Avoid exit() on fatal scanner errors (a bit ugly -- see yy_fatal_error) */
+#undef fprintf
+#define fprintf(file, fmt, msg) fprintf_to_ereport(fmt, msg)
+
+static void
+fprintf_to_ereport(const char *fmt, const char *msg)
+{
+ ereport(ERROR, (errmsg_internal("%s", msg)));
+}
+
+
+/*
+ * Set the type of YYSTYPE.
+ */
+#define YYSTYPE sqlol_YYSTYPE
+
+/*
+ * Set the type of yyextra. All state variables used by the scanner should
+ * be in yyextra, *not* statically allocated.
+ */
+#define YY_EXTRA_TYPE sqlol_yy_extra_type *
+
+/*
+ * Each call to yylex must set yylloc to the location of the found token
+ * (expressed as a byte offset from the start of the input text).
+ * When we parse a token that requires multiple lexer rules to process,
+ * this should be done in the first such rule, else yylloc will point
+ * into the middle of the token.
+ */
+#define SET_YYLLOC() (*(yylloc) = yytext - yyextra->scanbuf)
+
+/*
+ * Advance yylloc by the given number of bytes.
+ */
+#define ADVANCE_YYLLOC(delta) ( *(yylloc) += (delta) )
+
+/*
+ * Sometimes, we do want yylloc to point into the middle of a token; this is
+ * useful for instance to throw an error about an escape sequence within a
+ * string literal. But if we find no error there, we want to revert yylloc
+ * to the token start, so that that's the location reported to the parser.
+ * Use PUSH_YYLLOC/POP_YYLLOC to save/restore yylloc around such code.
+ * (Currently the implied "stack" is just one location, but someday we might
+ * need to nest these.)
+ */
+#define PUSH_YYLLOC() (yyextra->save_yylloc = *(yylloc))
+#define POP_YYLLOC() (*(yylloc) = yyextra->save_yylloc)
+
+#define startlit() ( yyextra->literallen = 0 )
+static void addlit(char *ytext, int yleng, sqlol_yyscan_t yyscanner);
+static void addlitchar(unsigned char ychar, sqlol_yyscan_t yyscanner);
+static char *litbufdup(sqlol_yyscan_t yyscanner);
+
+#define yyerror(msg) sqlol_scanner_yyerror(msg, yyscanner)
+
+#define lexer_errposition() sqlol_scanner_errposition(*(yylloc), yyscanner)
+
+/*
+ * Work around a bug in flex 2.5.35: it emits a couple of functions that
+ * it forgets to emit declarations for. Since we use -Wmissing-prototypes,
+ * this would cause warnings. Providing our own declarations should be
+ * harmless even when the bug gets fixed.
+ */
+extern int sqlol_yyget_column(yyscan_t yyscanner);
+extern void sqlol_yyset_column(int column_no, yyscan_t yyscanner);
+
+%}
+
+%option reentrant
+%option bison-bridge
+%option bison-locations
+%option 8bit
+%option never-interactive
+%option nodefault
+%option noinput
+%option nounput
+%option noyywrap
+%option noyyalloc
+%option noyyrealloc
+%option noyyfree
+%option warn
+%option prefix="sqlol_yy"
+
+/*
+ * OK, here is a short description of lex/flex rules behavior.
+ * The longest pattern which matches an input string is always chosen.
+ * For equal-length patterns, the first occurring in the rules list is chosen.
+ * INITIAL is the starting state, to which all non-conditional rules apply.
+ * Exclusive states change parsing rules while the state is active. When in
+ * an exclusive state, only those rules defined for that state apply.
+ *
+ * We use exclusive states for quoted strings, extended comments,
+ * and to eliminate parsing troubles for numeric strings.
+ * Exclusive states:
+ * <xd> delimited identifiers (double-quoted identifiers)
+ * <xq> standard quoted strings
+ * <xqs> quote stop (detect continued strings)
+ *
+ * Remember to add an <<EOF>> case whenever you add a new exclusive state!
+ * The default one is probably not the right thing.
+ */
+
+%x xd
+%x xq
+%x xqs
+
+/*
+ * In order to make the world safe for Windows and Mac clients as well as
+ * Unix ones, we accept either \n or \r as a newline. A DOS-style \r\n
+ * sequence will be seen as two successive newlines, but that doesn't cause
+ * any problems. Comments that start with -- and extend to the next
+ * newline are treated as equivalent to a single whitespace character.
+ *
+ * NOTE a fine point: if there is no newline following --, we will absorb
+ * everything to the end of the input as a comment. This is correct. Older
+ * versions of Postgres failed to recognize -- as a comment if the input
+ * did not end with a newline.
+ *
+ * XXX perhaps \f (formfeed) should be treated as a newline as well?
+ *
+ * XXX if you change the set of whitespace characters, fix scanner_isspace()
+ * to agree.
+ */
+
+space [ \t\n\r\f]
+horiz_space [ \t\f]
+newline [\n\r]
+non_newline [^\n\r]
+
+comment ("--"{non_newline}*)
+
+whitespace ({space}+|{comment})
+
+/*
+ * SQL requires at least one newline in the whitespace separating
+ * string literals that are to be concatenated. Silly, but who are we
+ * to argue? Note that {whitespace_with_newline} should not have * after
+ * it, whereas {whitespace} should generally have a * after it...
+ */
+
+special_whitespace ({space}+|{comment}{newline})
+horiz_whitespace ({horiz_space}|{comment})
+whitespace_with_newline ({horiz_whitespace}*{newline}{special_whitespace}*)
+
+quote '
+/* If we see {quote} then {quotecontinue}, the quoted string continues */
+quotecontinue {whitespace_with_newline}{quote}
+
+/*
+ * {quotecontinuefail} is needed to avoid lexer backup when we fail to match
+ * {quotecontinue}. It might seem that this could just be {whitespace}*,
+ * but if there's a dash after {whitespace_with_newline}, it must be consumed
+ * to see if there's another dash --- which would start a {comment} and thus
+ * allow continuation of the {quotecontinue} token.
+ */
+quotecontinuefail {whitespace}*"-"?
+
+/* Extended quote
+ * xqdouble implements embedded quote, ''''
+ */
+xqstart {quote}
+xqdouble {quote}{quote}
+xqinside [^']+
+
+/* Double quote
+ * Allows embedded spaces and other special characters into identifiers.
+ */
+dquote \"
+xdstart {dquote}
+xdstop {dquote}
+xddouble {dquote}{dquote}
+xdinside [^"]+
+
+digit [0-9]
+ident_start [A-Za-z\200-\377_]
+ident_cont [A-Za-z\200-\377_0-9\$]
+
+identifier {ident_start}{ident_cont}*
+
+decimal (({digit}+)|({digit}*\.{digit}+)|({digit}+\.{digit}*))
+
+other .
+
+%%
+
+{whitespace} {
+ /* ignore */
+ }
+
+
+{xqstart} {
+ yyextra->saw_non_ascii = false;
+ SET_YYLLOC();
+ BEGIN(xq);
+ startlit();
+}
+<xq>{quote} {
+ /*
+ * When we are scanning a quoted string and see an end
+ * quote, we must look ahead for a possible continuation.
+ * If we don't see one, we know the end quote was in fact
+ * the end of the string. To reduce the lexer table size,
+ * we use a single "xqs" state to do the lookahead for all
+ * types of strings.
+ */
+ yyextra->state_before_str_stop = YYSTATE;
+ BEGIN(xqs);
+ }
+<xqs>{quotecontinue} {
+ /*
+ * Found a quote continuation, so return to the in-quote
+ * state and continue scanning the literal. Nothing is
+ * added to the literal's contents.
+ */
+ BEGIN(yyextra->state_before_str_stop);
+ }
+<xqs>{quotecontinuefail} |
+<xqs>{other} |
+<xqs><<EOF>> {
+ /*
+ * Failed to see a quote continuation. Throw back
+ * everything after the end quote, and handle the string
+ * according to the state we were in previously.
+ */
+ yyless(0);
+ BEGIN(INITIAL);
+
+ switch (yyextra->state_before_str_stop)
+ {
+ case xq:
+ /*
+ * Check that the data remains valid, if it might
+ * have been made invalid by unescaping any chars.
+ */
+ if (yyextra->saw_non_ascii)
+ pg_verifymbstr(yyextra->literalbuf,
+ yyextra->literallen,
+ false);
+ yylval->str = litbufdup(yyscanner);
+ return SCONST;
+ default:
+ yyerror("unhandled previous state in xqs");
+ }
+ }
+
+<xq>{xqdouble} {
+ addlitchar('\'', yyscanner);
+ }
+<xq>{xqinside} {
+ addlit(yytext, yyleng, yyscanner);
+ }
+<xq><<EOF>> { yyerror("unterminated quoted string"); }
+
+
+{xdstart} {
+ SET_YYLLOC();
+ BEGIN(xd);
+ startlit();
+ }
+<xd>{xdstop} {
+ char *ident;
+
+ BEGIN(INITIAL);
+ if (yyextra->literallen == 0)
+ yyerror("zero-length delimited identifier");
+ ident = litbufdup(yyscanner);
+ if (yyextra->literallen >= NAMEDATALEN)
+ truncate_identifier(ident, yyextra->literallen, true);
+ yylval->str = ident;
+ return IDENT;
+ }
+<xd>{xddouble} {
+ addlitchar('"', yyscanner);
+ }
+<xd>{xdinside} {
+ addlit(yytext, yyleng, yyscanner);
+ }
+<xd><<EOF>> { yyerror("unterminated quoted identifier"); }
+
+{decimal} {
+ SET_YYLLOC();
+ yylval->str = pstrdup(yytext);
+ return FCONST;
+ }
+
+{identifier} {
+ const sqlol_ScanKeyword *keyword;
+ char *ident;
+
+ SET_YYLLOC();
+
+ /* Is it a keyword? */
+ keyword = sqlol_ScanKeywordLookup(yytext,
+ yyextra->keywords,
+ yyextra->num_keywords);
+ if (keyword != NULL)
+ {
+ yylval->keyword = keyword->name;
+ return keyword->value;
+ }
+
+ /*
+ * No. Convert the identifier to lower case, and truncate
+ * if necessary.
+ */
+ ident = downcase_truncate_identifier(yytext, yyleng, true);
+ yylval->str = ident;
+ return IDENT;
+ }
+
+{other} {
+ SET_YYLLOC();
+ return yytext[0];
+ }
+
+<<EOF>> {
+ SET_YYLLOC();
+ yyterminate();
+ }
+
+%%
+
+/* LCOV_EXCL_STOP */
+
+/*
+ * Arrange access to yyextra for subroutines of the main yylex() function.
+ * We expect each subroutine to have a yyscanner parameter. Rather than
+ * use the yyget_xxx functions, which might or might not get inlined by the
+ * compiler, we cheat just a bit and cast yyscanner to the right type.
+ */
+#undef yyextra
+#define yyextra (((struct yyguts_t *) yyscanner)->yyextra_r)
+
+/* Likewise for a couple of other things we need. */
+#undef yylloc
+#define yylloc (((struct yyguts_t *) yyscanner)->yylloc_r)
+#undef yyleng
+#define yyleng (((struct yyguts_t *) yyscanner)->yyleng_r)
+
+
+/*
+ * scanner_errposition
+ * Report a lexer or grammar error cursor position, if possible.
+ *
+ * This is expected to be used within an ereport() call. The return value
+ * is a dummy (always 0, in fact).
+ *
+ * Note that this can only be used for messages emitted during raw parsing
+ * (essentially, sqlol_scan.l, sqlol_parser.c, and sqlol_gram.y), since it
+ * requires the yyscanner struct to still be available.
+ */
+int
+sqlol_scanner_errposition(int location, sqlol_yyscan_t yyscanner)
+{
+ int pos;
+
+ if (location < 0)
+ return 0; /* no-op if location is unknown */
+
+ /* Convert byte offset to character number */
+ pos = pg_mbstrlen_with_len(yyextra->scanbuf, location) + 1;
+ /* And pass it to the ereport mechanism */
+ return errposition(pos);
+}
+
+/*
+ * scanner_yyerror
+ * Report a lexer or grammar error.
+ *
+ * Just ignore the error, as we'll fall back on raw_parser().
+ */
+void
+sqlol_scanner_yyerror(const char *message, sqlol_yyscan_t yyscanner)
+{
+ return;
+}
+
+
+/*
+ * Called before any actual parsing is done
+ */
+sqlol_yyscan_t
+sqlol_scanner_init(const char *str,
+ sqlol_yy_extra_type *yyext,
+ const sqlol_ScanKeyword *keywords,
+ int num_keywords)
+{
+ Size slen = strlen(str);
+ yyscan_t scanner;
+
+ if (yylex_init(&scanner) != 0)
+ elog(ERROR, "yylex_init() failed: %m");
+
+ sqlol_yyset_extra(yyext, scanner);
+
+ yyext->keywords = keywords;
+ yyext->num_keywords = num_keywords;
+
+ /*
+ * Make a scan buffer with special termination needed by flex.
+ */
+ yyext->scanbuf = (char *) palloc(slen + 2);
+ yyext->scanbuflen = slen;
+ memcpy(yyext->scanbuf, str, slen);
+ yyext->scanbuf[slen] = yyext->scanbuf[slen + 1] = YY_END_OF_BUFFER_CHAR;
+ yy_scan_buffer(yyext->scanbuf, slen + 2, scanner);
+
+ /* initialize literal buffer to a reasonable but expansible size */
+ yyext->literalalloc = 1024;
+ yyext->literalbuf = (char *) palloc(yyext->literalalloc);
+ yyext->literallen = 0;
+
+ return scanner;
+}
+
+
+/*
+ * Called after parsing is done to clean up after scanner_init()
+ */
+void
+sqlol_scanner_finish(sqlol_yyscan_t yyscanner)
+{
+ /*
+ * We don't bother to call yylex_destroy(), because all it would do is
+ * pfree a small amount of control storage. It's cheaper to leak the
+ * storage until the parsing context is destroyed. The amount of space
+ * involved is usually negligible compared to the output parse tree
+ * anyway.
+ *
+ * We do bother to pfree the scanbuf and literal buffer, but only if they
+ * represent a nontrivial amount of space. The 8K cutoff is arbitrary.
+ */
+ if (yyextra->scanbuflen >= 8192)
+ pfree(yyextra->scanbuf);
+ if (yyextra->literalalloc >= 8192)
+ pfree(yyextra->literalbuf);
+}
+
+
+static void
+addlit(char *ytext, int yleng, sqlol_yyscan_t yyscanner)
+{
+ /* enlarge buffer if needed */
+ if ((yyextra->literallen + yleng) >= yyextra->literalalloc)
+ {
+ do
+ {
+ yyextra->literalalloc *= 2;
+ } while ((yyextra->literallen + yleng) >= yyextra->literalalloc);
+ yyextra->literalbuf = (char *) repalloc(yyextra->literalbuf,
+ yyextra->literalalloc);
+ }
+ /* append new data */
+ memcpy(yyextra->literalbuf + yyextra->literallen, ytext, yleng);
+ yyextra->literallen += yleng;
+}
+
+
+static void
+addlitchar(unsigned char ychar, sqlol_yyscan_t yyscanner)
+{
+ /* enlarge buffer if needed */
+ if ((yyextra->literallen + 1) >= yyextra->literalalloc)
+ {
+ yyextra->literalalloc *= 2;
+ yyextra->literalbuf = (char *) repalloc(yyextra->literalbuf,
+ yyextra->literalalloc);
+ }
+ /* append new data */
+ yyextra->literalbuf[yyextra->literallen] = ychar;
+ yyextra->literallen += 1;
+}
+
+
+/*
+ * Create a palloc'd copy of literalbuf, adding a trailing null.
+ */
+static char *
+litbufdup(sqlol_yyscan_t yyscanner)
+{
+ int llen = yyextra->literallen;
+ char *new;
+
+ new = palloc(llen + 1);
+ memcpy(new, yyextra->literalbuf, llen);
+ new[llen] = '\0';
+ return new;
+}
+
+/*
+ * Interface functions to make flex use palloc() instead of malloc().
+ * It'd be better to make these static, but flex insists otherwise.
+ */
+
+void *
+sqlol_yyalloc(yy_size_t bytes, sqlol_yyscan_t yyscanner)
+{
+ return palloc(bytes);
+}
+
+void *
+sqlol_yyrealloc(void *ptr, yy_size_t bytes, sqlol_yyscan_t yyscanner)
+{
+ if (ptr)
+ return repalloc(ptr, bytes);
+ else
+ return palloc(bytes);
+}
+
+void
+sqlol_yyfree(void *ptr, sqlol_yyscan_t yyscanner)
+{
+ if (ptr)
+ pfree(ptr);
+}
diff --git a/contrib/sqlol/sqlol_scanner.h b/contrib/sqlol/sqlol_scanner.h
new file mode 100644
index 0000000000..0a497e9d91
--- /dev/null
+++ b/contrib/sqlol/sqlol_scanner.h
@@ -0,0 +1,118 @@
+/*-------------------------------------------------------------------------
+ *
+ * sqlol_scanner.h
+ * API for the core scanner (flex machine)
+ *
+ *
+ * Portions Copyright (c) 1996-2021, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * contrib/sqlol/sqlol_scanner.h
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#ifndef SQLOL_SCANNER_H
+#define SQLOL_SCANNER_H
+
+#include "sqlol_keywords.h"
+
+/*
+ * The scanner returns extra data about scanned tokens in this union type.
+ * Note that this is a subset of the fields used in YYSTYPE of the bison
+ * parsers built atop the scanner.
+ */
+typedef union sqlol_YYSTYPE
+{
+ int ival; /* for integer literals */
+ char *str; /* for identifiers and non-integer literals */
+ const char *keyword; /* canonical spelling of keywords */
+} sqlol_YYSTYPE;
+
+/*
+ * We track token locations in terms of byte offsets from the start of the
+ * source string, not the column number/line number representation that
+ * bison uses by default. Also, to minimize overhead we track only one
+ * location (usually the first token location) for each construct, not
+ * the beginning and ending locations as bison does by default. It's
+ * therefore sufficient to make YYLTYPE an int.
+ */
+#define YYLTYPE int
+
+/*
+ * Another important component of the scanner's API is the token code numbers.
+ * However, those are not defined in this file, because bison insists on
+ * defining them for itself. The token codes used by the core scanner are
+ * the ASCII characters plus these:
+ *		%token <str>	IDENT FCONST SCONST Op
+ * The above token definitions *must* be the first ones declared in any
+ * bison parser built atop this scanner, so that they will have consistent
+ * numbers assigned to them (specifically, IDENT = 258 and so on).
+ */
+
+/*
+ * The YY_EXTRA data that a flex scanner allows us to pass around.
+ * Private state needed by the core scanner goes here. Note that the actual
+ * yy_extra struct may be larger and have this as its first component, thus
+ * allowing the calling parser to keep some fields of its own in YY_EXTRA.
+ */
+typedef struct sqlol_yy_extra_type
+{
+ /*
+ * The string the scanner is physically scanning. We keep this mainly so
+ * that we can cheaply compute the offset of the current token (yytext).
+ */
+ char *scanbuf;
+ Size scanbuflen;
+
+ /*
+ * The keyword list to use, and the associated grammar token codes.
+ */
+ const sqlol_ScanKeyword *keywords;
+ int num_keywords;
+
+ /*
+ * literalbuf is used to accumulate literal values when multiple rules are
+ * needed to parse a single literal. Call startlit() to reset buffer to
+ * empty, addlit() to add text. NOTE: the string in literalbuf is NOT
+ * necessarily null-terminated, but there always IS room to add a trailing
+ * null at offset literallen. We store a null only when we need it.
+ */
+ char *literalbuf; /* palloc'd expandable buffer */
+ int literallen; /* actual current string length */
+ int literalalloc; /* current allocated buffer size */
+
+ /*
+ * Random assorted scanner state.
+ */
+ int state_before_str_stop; /* start cond. before end quote */
+ YYLTYPE save_yylloc; /* one-element stack for PUSH_YYLLOC() */
+
+ /* state variables for literal-lexing warnings */
+ bool saw_non_ascii;
+} sqlol_yy_extra_type;
+
+/*
+ * The type of yyscanner is opaque outside scan.l.
+ */
+typedef void *sqlol_yyscan_t;
+
+
+/* Constant data exported from parser/scan.l */
+extern PGDLLIMPORT const uint16 sqlol_ScanKeywordTokens[];
+
+/* Entry points in parser/scan.l */
+extern sqlol_yyscan_t sqlol_scanner_init(const char *str,
+ sqlol_yy_extra_type *yyext,
+ const sqlol_ScanKeyword *keywords,
+ int num_keywords);
+extern void sqlol_scanner_finish(sqlol_yyscan_t yyscanner);
+extern int sqlol_yylex(sqlol_YYSTYPE *lvalp, YYLTYPE *llocp,
+ sqlol_yyscan_t yyscanner);
+extern int sqlol_scanner_errposition(int location, sqlol_yyscan_t yyscanner);
+extern void sqlol_scanner_yyerror(const char *message, sqlol_yyscan_t yyscanner);
+
+#endif /* SQLOL_SCANNER_H */
--
2.31.1
v2-0003-Add-a-new-MODE_SINGLE_QUERY-to-the-core-parser-an.patch
From 767ef007e65e44a5a2db7018fb759b29796c5f41 Mon Sep 17 00:00:00 2001
From: Julien Rouhaud <julien.rouhaud@free.fr>
Date: Thu, 22 Apr 2021 01:33:42 +0800
Subject: [PATCH v2 3/4] Add a new MODE_SINGLE_QUERY to the core parser and use
it in pg_parse_query.
If a third-party module provides a parser_hook, pg_parse_query() switches to
single-query parsing so multi-query commands using different grammars can work
properly. If the third-party module supports the full set of SQL we support,
or wants to prevent fallback on the core parser, it can ignore the
MODE_SINGLE_QUERY mode and parse the full query string. In that case it must
return a List with more than one RawStmt, or a single RawStmt with a 0 length,
to stop the parsing phase, or raise an ERROR.
Otherwise, plugins should parse a single query only and always return a List
containing a single RawStmt with a properly set length (possibly 0 if it was a
single query without an end-of-query delimiter). If the command is valid but
doesn't contain any statement (e.g. a single semi-colon), a single RawStmt
with a NULL stmt field should be returned, containing the consumed query string
length so we can move to the next command in a single pass rather than 1 byte
at a time.
Also, third-party modules can choose to ignore some or all parsing errors if
they want to implement only a subset of the syntax postgres supports, or even
a totally different syntax, and fall back on the core grammar for unhandled
cases. In that case, they should set the error flag to true. The returned List
will be ignored and the same offset of the input string will be parsed using
the core parser.
Finally, note that third-party plugins that want to fall back on another
grammar should first try to call a previous parser hook, if any, before
setting the error switch and returning.
---
.../pg_stat_statements/pg_stat_statements.c | 3 +-
src/backend/commands/tablecmds.c | 2 +-
src/backend/executor/spi.c | 4 +-
src/backend/parser/gram.y | 29 +++-
src/backend/parser/parse_type.c | 2 +-
src/backend/parser/parser.c | 15 +-
src/backend/parser/scan.l | 26 +++-
src/backend/tcop/postgres.c | 138 ++++++++++++++++--
src/include/parser/parser.h | 5 +-
src/include/parser/scanner.h | 6 +-
src/include/tcop/tcopprot.h | 3 +-
src/pl/plpgsql/src/pl_gram.y | 2 +-
src/pl/plpgsql/src/pl_scanner.c | 2 +-
13 files changed, 210 insertions(+), 27 deletions(-)
diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index 09433c8c96..d852575613 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -2718,7 +2718,8 @@ fill_in_constant_lengths(JumbleState *jstate, const char *query,
yyscanner = scanner_init(query,
&yyextra,
&ScanKeywords,
- ScanKeywordTokens);
+ ScanKeywordTokens,
+ 0);
/* we don't want to re-emit any escape string warnings */
yyextra.escape_string_warning = false;
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 028e8ac46b..284933c693 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -12677,7 +12677,7 @@ ATPostAlterTypeParse(Oid oldId, Oid oldRelId, Oid refRelId, char *cmd,
* parse_analyze() or the rewriter, but instead we need to pass them
* through parse_utilcmd.c to make them ready for execution.
*/
- raw_parsetree_list = raw_parser(cmd, RAW_PARSE_DEFAULT);
+ raw_parsetree_list = raw_parser(cmd, RAW_PARSE_DEFAULT, 0);
querytree_list = NIL;
foreach(list_item, raw_parsetree_list)
{
diff --git a/src/backend/executor/spi.c b/src/backend/executor/spi.c
index b8bd05e894..f05b3ce9e7 100644
--- a/src/backend/executor/spi.c
+++ b/src/backend/executor/spi.c
@@ -2120,7 +2120,7 @@ _SPI_prepare_plan(const char *src, SPIPlanPtr plan)
/*
* Parse the request string into a list of raw parse trees.
*/
- raw_parsetree_list = raw_parser(src, plan->parse_mode);
+ raw_parsetree_list = raw_parser(src, plan->parse_mode, 0);
/*
* Do parse analysis and rule rewrite for each raw parsetree, storing the
@@ -2228,7 +2228,7 @@ _SPI_prepare_oneshot_plan(const char *src, SPIPlanPtr plan)
/*
* Parse the request string into a list of raw parse trees.
*/
- raw_parsetree_list = raw_parser(src, plan->parse_mode);
+ raw_parsetree_list = raw_parser(src, plan->parse_mode, 0);
/*
* Construct plancache entries, but don't do parse analysis yet.
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 9ee90e3f13..2cac062ef4 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -626,7 +626,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
%token <str> IDENT UIDENT FCONST SCONST USCONST BCONST XCONST Op
%token <ival> ICONST PARAM
%token TYPECAST DOT_DOT COLON_EQUALS EQUALS_GREATER
-%token LESS_EQUALS GREATER_EQUALS NOT_EQUALS
+%token LESS_EQUALS GREATER_EQUALS NOT_EQUALS END_OF_FILE
/*
* If you want to make any keyword changes, update the keyword table in
@@ -753,6 +753,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
%token MODE_PLPGSQL_ASSIGN1
%token MODE_PLPGSQL_ASSIGN2
%token MODE_PLPGSQL_ASSIGN3
+%token MODE_SINGLE_QUERY
/* Precedence: lowest to highest */
@@ -858,6 +859,32 @@ parse_toplevel:
pg_yyget_extra(yyscanner)->parsetree =
list_make1(makeRawStmt((Node *) n, 0));
}
+ | MODE_SINGLE_QUERY toplevel_stmt ';'
+ {
+ RawStmt *raw = makeRawStmt($2, 0);
+ updateRawStmtEnd(raw, @3 + 1);
+ /* NOTE: we can return a raw statement containing a NULL stmt.
+ * This is done to allow pg_parse_query to ignore that part of
+ * the input string and move to the next command.
+ */
+ pg_yyget_extra(yyscanner)->parsetree = list_make1(raw);
+ YYACCEPT;
+ }
+ /*
+ * We need to explicitly look for EOF to parse non-semicolon
+ * terminated statements in single query mode, as we could
+			 * otherwise successfully parse just the beginning of an invalid
+			 * query.
+ */
+ | MODE_SINGLE_QUERY toplevel_stmt END_OF_FILE
+ {
+ /* NOTE: we can return a raw statement containing a NULL stmt.
+ * This is done to allow pg_parse_query to ignore that part of
+ * the input string.
+ */
+ pg_yyget_extra(yyscanner)->parsetree = list_make1(makeRawStmt($2, 0));
+ YYACCEPT;
+ }
;
/*
diff --git a/src/backend/parser/parse_type.c b/src/backend/parser/parse_type.c
index abe131ebeb..e9a7b5d62a 100644
--- a/src/backend/parser/parse_type.c
+++ b/src/backend/parser/parse_type.c
@@ -746,7 +746,7 @@ typeStringToTypeName(const char *str)
ptserrcontext.previous = error_context_stack;
error_context_stack = &ptserrcontext;
- raw_parsetree_list = raw_parser(str, RAW_PARSE_TYPE_NAME);
+ raw_parsetree_list = raw_parser(str, RAW_PARSE_TYPE_NAME, 0);
error_context_stack = ptserrcontext.previous;
diff --git a/src/backend/parser/parser.c b/src/backend/parser/parser.c
index 875de7ba28..418c50ee8f 100644
--- a/src/backend/parser/parser.c
+++ b/src/backend/parser/parser.c
@@ -37,17 +37,25 @@ static char *str_udeescape(const char *str, char escape,
*
* Returns a list of raw (un-analyzed) parse trees. The contents of the
* list have the form required by the specified RawParseMode.
+ *
+ * For all modes other than RAW_PARSE_SINGLE_QUERY, the caller should provide
+ * a 0 offset, as the whole input string will be parsed. Otherwise, the
+ * caller should provide the wanted offset in the input string, or -1 if no
+ * offset is required.
*/
List *
-raw_parser(const char *str, RawParseMode mode)
+raw_parser(const char *str, RawParseMode mode, int offset)
{
core_yyscan_t yyscanner;
base_yy_extra_type yyextra;
int yyresult;
+	Assert((mode != RAW_PARSE_SINGLE_QUERY && offset == 0) ||
+		   (mode == RAW_PARSE_SINGLE_QUERY && offset != 0));
+
/* initialize the flex scanner */
yyscanner = scanner_init(str, &yyextra.core_yy_extra,
- &ScanKeywords, ScanKeywordTokens);
+ &ScanKeywords, ScanKeywordTokens, offset);
/* base_yylex() only needs us to initialize the lookahead token, if any */
if (mode == RAW_PARSE_DEFAULT)
@@ -61,7 +69,8 @@ raw_parser(const char *str, RawParseMode mode)
MODE_PLPGSQL_EXPR, /* RAW_PARSE_PLPGSQL_EXPR */
MODE_PLPGSQL_ASSIGN1, /* RAW_PARSE_PLPGSQL_ASSIGN1 */
MODE_PLPGSQL_ASSIGN2, /* RAW_PARSE_PLPGSQL_ASSIGN2 */
- MODE_PLPGSQL_ASSIGN3 /* RAW_PARSE_PLPGSQL_ASSIGN3 */
+ MODE_PLPGSQL_ASSIGN3, /* RAW_PARSE_PLPGSQL_ASSIGN3 */
+ MODE_SINGLE_QUERY /* RAW_PARSE_SINGLE_QUERY */
};
yyextra.have_lookahead = true;
diff --git a/src/backend/parser/scan.l b/src/backend/parser/scan.l
index 9f9d8a1706..8ccbe95ac6 100644
--- a/src/backend/parser/scan.l
+++ b/src/backend/parser/scan.l
@@ -1041,7 +1041,10 @@ other .
<<EOF>> {
SET_YYLLOC();
- yyterminate();
+ if (yyextra->return_eof)
+ return END_OF_FILE;
+ else
+ yyterminate();
}
%%
@@ -1189,8 +1192,10 @@ core_yyscan_t
scanner_init(const char *str,
core_yy_extra_type *yyext,
const ScanKeywordList *keywordlist,
- const uint16 *keyword_tokens)
+ const uint16 *keyword_tokens,
+ int offset)
{
+ YY_BUFFER_STATE state;
Size slen = strlen(str);
yyscan_t scanner;
@@ -1213,13 +1218,28 @@ scanner_init(const char *str,
yyext->scanbuflen = slen;
memcpy(yyext->scanbuf, str, slen);
yyext->scanbuf[slen] = yyext->scanbuf[slen + 1] = YY_END_OF_BUFFER_CHAR;
- yy_scan_buffer(yyext->scanbuf, slen + 2, scanner);
+ state = yy_scan_buffer(yyext->scanbuf, slen + 2, scanner);
/* initialize literal buffer to a reasonable but expansible size */
yyext->literalalloc = 1024;
yyext->literalbuf = (char *) palloc(yyext->literalalloc);
yyext->literallen = 0;
+ /*
+ * Note that pg_parse_query will set a -1 offset rather than 0 for the
+ * first query of a possibly multi-query string if it wants us to return an
+ * EOF token.
+ */
+ yyext->return_eof = (offset != 0);
+
+ /*
+ * Adjust the offset in the input string. This is required in single-query
+ * mode, as we need to register the same token locations as we would have
+ * in normal mode with a multi-statement query string.
+ */
+ if (offset > 0)
+ state->yy_buf_pos += offset;
+
return scanner;
}
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index e941b59b85..9331628add 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -602,17 +602,137 @@ ProcessClientWriteInterrupt(bool blocked)
List *
pg_parse_query(const char *query_string)
{
- List *raw_parsetree_list = NIL;
+ List *result = NIL;
+ int stmt_len, offset;
TRACE_POSTGRESQL_QUERY_PARSE_START(query_string);
if (log_parser_stats)
ResetUsage();
- if (parser_hook)
- raw_parsetree_list = (*parser_hook) (query_string, RAW_PARSE_DEFAULT);
- else
- raw_parsetree_list = raw_parser(query_string, RAW_PARSE_DEFAULT);
+ stmt_len = 0; /* lazily computed when needed */
+ offset = 0;
+
+ while(true)
+ {
+ List *raw_parsetree_list;
+ RawStmt *raw;
+ bool error = false;
+
+ /*----------------
+ * Start parsing the input string. If a third-party module provided a
+ * parser_hook, we switch to single-query parsing so multi-query
+	 * commands using different grammars can work properly.
+	 * If the third-party module supports the full set of SQL we support,
+	 * or wants to prevent fallback on the core parser, it can ignore the
+ * RAW_PARSE_SINGLE_QUERY flag and parse the full query string.
+ * In that case they must return a List with more than one RawStmt or a
+ * single RawStmt with a 0 length to stop the parsing phase, or raise
+ * an ERROR.
+ *
+ * Otherwise, plugins should parse a single query only and always
+ * return a List containing a single RawStmt with a properly set length
+ * (possibly 0 if it was a single query without end of query
+ * delimiter). If the command is valid but doesn't contain any
+ * statements (e.g. a single semi-colon), a single RawStmt with a NULL
+ * stmt field should be returned, containing the consumed query string
+ * length so we can move to the next command in a single pass rather
+ * than 1 byte at a time.
+ *
+	 * Also, third-party modules can choose to ignore some or all parsing
+	 * errors if they want to implement only a subset of the syntax
+	 * postgres supports, or even a totally different syntax, and fall back
+	 * on the core grammar for unhandled cases. In that case, they should
+	 * set the error flag to true. The returned List will be ignored and
+	 * the same offset of the input string will be parsed using the core
+	 * parser.
+	 *
+	 * Finally, note that third-party modules that want to fall back on
+	 * another grammar should first try to call a previous parser hook, if
+	 * any, before setting the error switch and returning.
+ */
+ if (parser_hook)
+ raw_parsetree_list = (*parser_hook) (query_string,
+ RAW_PARSE_SINGLE_QUERY,
+ offset,
+ &error);
+
+ /*
+ * If a third-party module couldn't parse a single query or if no
+ * third-party module is configured, fallback on core parser.
+ */
+ if (error || !parser_hook)
+ {
+ /* Send a -1 offset to raw_parser to specify that it should
+ * explicitly detect EOF during parsing. scanner_init() will treat
+ * it the same as a 0 offset.
+ */
+ raw_parsetree_list = raw_parser(query_string,
+ error ? RAW_PARSE_SINGLE_QUERY : RAW_PARSE_DEFAULT,
+ (error && offset == 0) ? -1 : offset);
+ }
+
+ /*
+	 * If there is no third-party plugin, or none of the parsers found a
+	 * valid query, or if a third-party module consumed the whole
+	 * query string, we're done.
+ */
+ if (!parser_hook || raw_parsetree_list == NIL ||
+ list_length(raw_parsetree_list) > 1)
+ {
+ /*
+ * Warn third-party plugins if they mix "single query" and "whole
+			 * input string" strategies rather than silently accepting it
+			 * and maybe allowing fallback on the core grammar even if they
+			 * want to avoid that. This way plugin authors can be warned
+			 * early of the issue.
+ */
+ if (result != NIL)
+ {
+ Assert(parser_hook != NULL);
+ elog(ERROR, "parser_hook should parse a single statement at "
+ "a time or consume the whole input string at once");
+ }
+ result = raw_parsetree_list;
+ break;
+ }
+
+ if (stmt_len == 0)
+ stmt_len = strlen(query_string);
+
+ raw = linitial_node(RawStmt, raw_parsetree_list);
+
+ /*
+ * In single-query mode, the parser will return statement location info
+	 * relative to the beginning of the complete original string, not the part
+ * we just parsed, so adjust the location info.
+ */
+ if (offset > 0 && raw->stmt_len > 0)
+ {
+ Assert(raw->stmt_len > offset);
+ raw->stmt_location = offset;
+ raw->stmt_len -= offset;
+ }
+
+ /* Ignore the statement if it didn't contain any command. */
+ if (raw->stmt)
+ result = lappend(result, raw);
+
+ if (raw->stmt_len == 0)
+ {
+ /* The statement was the whole string, we're done. */
+ break;
+ }
+ else if (raw->stmt_len + offset >= stmt_len)
+ {
+ /* We consumed all of the input string, we're done. */
+ break;
+ }
+ else
+ {
+ /* Advance the offset to the next command. */
+ offset += raw->stmt_len;
+ }
+ }
if (log_parser_stats)
ShowUsage("PARSER STATISTICS");
@@ -620,13 +740,13 @@ pg_parse_query(const char *query_string)
#ifdef COPY_PARSE_PLAN_TREES
/* Optional debugging check: pass raw parsetrees through copyObject() */
{
- List *new_list = copyObject(raw_parsetree_list);
+ List *new_list = copyObject(result);
/* This checks both copyObject() and the equal() routines... */
- if (!equal(new_list, raw_parsetree_list))
+ if (!equal(new_list, result))
elog(WARNING, "copyObject() failed to produce an equal raw parse tree");
else
- raw_parsetree_list = new_list;
+ result = new_list;
}
#endif
@@ -638,7 +758,7 @@ pg_parse_query(const char *query_string)
TRACE_POSTGRESQL_QUERY_PARSE_DONE(query_string);
- return raw_parsetree_list;
+ return result;
}
/*
diff --git a/src/include/parser/parser.h b/src/include/parser/parser.h
index 853b0f1606..5694ae791a 100644
--- a/src/include/parser/parser.h
+++ b/src/include/parser/parser.h
@@ -41,7 +41,8 @@ typedef enum
RAW_PARSE_PLPGSQL_EXPR,
RAW_PARSE_PLPGSQL_ASSIGN1,
RAW_PARSE_PLPGSQL_ASSIGN2,
- RAW_PARSE_PLPGSQL_ASSIGN3
+ RAW_PARSE_PLPGSQL_ASSIGN3,
+ RAW_PARSE_SINGLE_QUERY
} RawParseMode;
/* Values for the backslash_quote GUC */
@@ -59,7 +60,7 @@ extern PGDLLIMPORT bool standard_conforming_strings;
/* Primary entry point for the raw parsing functions */
-extern List *raw_parser(const char *str, RawParseMode mode);
+extern List *raw_parser(const char *str, RawParseMode mode, int offset);
/* Utility functions exported by gram.y (perhaps these should be elsewhere) */
extern List *SystemFuncName(char *name);
diff --git a/src/include/parser/scanner.h b/src/include/parser/scanner.h
index 0d8182faa0..a2e97be5d5 100644
--- a/src/include/parser/scanner.h
+++ b/src/include/parser/scanner.h
@@ -113,6 +113,9 @@ typedef struct core_yy_extra_type
/* state variables for literal-lexing warnings */
bool warn_on_first_escape;
bool saw_non_ascii;
+
+ /* state variable for returning an EOF token in single query mode */
+ bool return_eof;
} core_yy_extra_type;
/*
@@ -136,7 +139,8 @@ extern PGDLLIMPORT const uint16 ScanKeywordTokens[];
extern core_yyscan_t scanner_init(const char *str,
core_yy_extra_type *yyext,
const ScanKeywordList *keywordlist,
- const uint16 *keyword_tokens);
+ const uint16 *keyword_tokens,
+ int offset);
extern void scanner_finish(core_yyscan_t yyscanner);
extern int core_yylex(core_YYSTYPE *lvalp, YYLTYPE *llocp,
core_yyscan_t yyscanner);
diff --git a/src/include/tcop/tcopprot.h b/src/include/tcop/tcopprot.h
index 131dc2b22e..27201dde1d 100644
--- a/src/include/tcop/tcopprot.h
+++ b/src/include/tcop/tcopprot.h
@@ -45,7 +45,8 @@ typedef enum
extern PGDLLIMPORT int log_statement;
/* Hook for plugins to get control in pg_parse_query() */
-typedef List *(*parser_hook_type) (const char *str, RawParseMode mode);
+typedef List *(*parser_hook_type) (const char *str, RawParseMode mode,
+ int offset, bool *error);
extern PGDLLIMPORT parser_hook_type parser_hook;
extern List *pg_parse_query(const char *query_string);
diff --git a/src/pl/plpgsql/src/pl_gram.y b/src/pl/plpgsql/src/pl_gram.y
index 3fcca43b90..e5a8a6477a 100644
--- a/src/pl/plpgsql/src/pl_gram.y
+++ b/src/pl/plpgsql/src/pl_gram.y
@@ -3656,7 +3656,7 @@ check_sql_expr(const char *stmt, RawParseMode parseMode, int location)
error_context_stack = &syntax_errcontext;
oldCxt = MemoryContextSwitchTo(plpgsql_compile_tmp_cxt);
- (void) raw_parser(stmt, parseMode);
+ (void) raw_parser(stmt, parseMode, 0);
MemoryContextSwitchTo(oldCxt);
/* Restore former ereport callback */
diff --git a/src/pl/plpgsql/src/pl_scanner.c b/src/pl/plpgsql/src/pl_scanner.c
index e4c7a91ab5..a2886c42ec 100644
--- a/src/pl/plpgsql/src/pl_scanner.c
+++ b/src/pl/plpgsql/src/pl_scanner.c
@@ -587,7 +587,7 @@ plpgsql_scanner_init(const char *str)
{
/* Start up the core scanner */
yyscanner = scanner_init(str, &core_yy,
- &ReservedPLKeywords, ReservedPLKeywordTokens);
+ &ReservedPLKeywords, ReservedPLKeywordTokens, 0);
/*
* scanorig points to the original string, which unlike the scanner's
--
2.31.1
v2-0004-Teach-sqlol-to-use-the-new-MODE_SINGLE_QUERY-pars.patch
From 36506de98e53432f13c4ca0b6b9907371fa133a6 Mon Sep 17 00:00:00 2001
From: Julien Rouhaud <julien.rouhaud@free.fr>
Date: Thu, 22 Apr 2021 02:15:54 +0800
Subject: [PATCH v2 4/4] Teach sqlol to use the new MODE_SINGLE_QUERY parser
mode.
This way, multi-statement commands using both the core parser and the sqlol
parser can be supported.
Also add a LOLCODE version of CREATE VIEW viewname AS to easily test
multi-statement commands.
---
contrib/sqlol/Makefile | 2 +
contrib/sqlol/expected/01_sqlol.out | 74 +++++++++++++++++++++++++++++
contrib/sqlol/repro.sql | 18 +++++++
contrib/sqlol/sql/01_sqlol.sql | 40 ++++++++++++++++
contrib/sqlol/sqlol.c | 24 ++++++----
contrib/sqlol/sqlol_gram.y | 63 ++++++++++++------------
contrib/sqlol/sqlol_kwlist.h | 1 +
contrib/sqlol/sqlol_scan.l | 13 ++++-
contrib/sqlol/sqlol_scanner.h | 3 +-
9 files changed, 192 insertions(+), 46 deletions(-)
create mode 100644 contrib/sqlol/expected/01_sqlol.out
create mode 100644 contrib/sqlol/repro.sql
create mode 100644 contrib/sqlol/sql/01_sqlol.sql
diff --git a/contrib/sqlol/Makefile b/contrib/sqlol/Makefile
index 025e77c4ff..554fe91eae 100644
--- a/contrib/sqlol/Makefile
+++ b/contrib/sqlol/Makefile
@@ -6,6 +6,8 @@ OBJS = \
sqlol.o sqlol_gram.o sqlol_scan.o sqlol_keywords.o
PGFILEDESC = "sqlol - Toy alternative grammar based on LOLCODE"
+REGRESS = 01_sqlol
+
sqlol_gram.h: sqlol_gram.c
touch $@
diff --git a/contrib/sqlol/expected/01_sqlol.out b/contrib/sqlol/expected/01_sqlol.out
new file mode 100644
index 0000000000..a18eaf6801
--- /dev/null
+++ b/contrib/sqlol/expected/01_sqlol.out
@@ -0,0 +1,74 @@
+LOAD 'sqlol';
+-- create a base table, falling back on core grammar
+CREATE TABLE t1 (id integer, val text);
+-- test a SQLOL statement
+HAI 1.2 I HAS A t1 GIMMEH id, "val" KTHXBYE\g
+ id | val
+----+-----
+(0 rows)
+
+-- create a view in SQLOL
+HAI 1.2 MAEK I HAS A t1 GIMMEH id, "val" A v0 KTHXBYE\g
+-- combine standard SQL with a trailing SQLOL statement in multi-statements command
+CREATE VIEW v1 AS SELECT * FROM t1\; CREATE VIEW v2 AS SELECT * FROM t1\;HAI 1.2 I HAS A t1 GIMMEH "id", id KTHXBYE\g
+ id | id
+----+----
+(0 rows)
+
+-- interleave standard SQL and SQLOL commands in multi-statements command
+CREATE VIEW v3 AS SELECT * FROM t1\; HAI 1.2 MAEK I HAS A t1 GIMMEH id, "val" A v4 KTHXBYE CREATE VIEW v5 AS SELECT * FROM t1\;HAI 1.2 I HAS A t1 GIMMEH "id", id KTHXBYE\g
+ id | id
+----+----
+(0 rows)
+
+-- test MODE_SINGLE_QUERY with no trailing semicolon
+SELECT 1\;SELECT 2\;SELECT 3 \g
+ ?column?
+----------
+ 3
+(1 row)
+
+-- test empty statement ignoring
+\;\;select 1 \g
+ ?column?
+----------
+ 1
+(1 row)
+
+-- check the created views
+\d
+ List of relations
+ Schema | Name | Type | Owner
+--------+------+-------+-------
+ public | t1 | table | rjuju
+ public | v0 | view | rjuju
+ public | v1 | view | rjuju
+ public | v2 | view | rjuju
+ public | v3 | view | rjuju
+ public | v4 | view | rjuju
+ public | v5 | view | rjuju
+(7 rows)
+
+--
+-- Error position
+--
+SELECT 1\;err;
+ERROR: syntax error at or near "err"
+LINE 1: SELECT 1;err;
+ ^
+-- sqlol won't trigger an error on incorrect GIMME keyword, so core parser will
+-- complain about HAI
+SELECT 1\;HAI 1.2 I HAS A t1 GIMME id KTHXBYE\g
+ERROR: syntax error at or near "HAI"
+LINE 1: SELECT 1;HAI 1.2 I HAS A t1 GIMME id KTHXBYE
+ ^
+-- sqlol will trigger the error about too many qualifiers on t1
+SELECT 1\;HAI 1.2 I HAS A some.thing.public.t1 GIMMEH id KTHXBYE\g
+ERROR: improper qualified name (too many dotted names): some.thing.public.t1
+LINE 1: SELECT 1;HAI 1.2 I HAS A some.thing.public.t1 GIMMEH id KTHX...
+ ^
+-- position reported outside of the parser/scanner should be correct too
+SELECT 1\;SELECT * FROM notatable;
+ERROR: relation "notatable" does not exist
+LINE 1: SELECT 1;SELECT * FROM notatable;
+ ^
diff --git a/contrib/sqlol/repro.sql b/contrib/sqlol/repro.sql
new file mode 100644
index 0000000000..0ebcb53160
--- /dev/null
+++ b/contrib/sqlol/repro.sql
@@ -0,0 +1,18 @@
+DROP TABLE IF EXISTS t1 CASCADE;
+
+LOAD 'sqlol';
+
+\;\; SELECT 1\;
+
+CREATE TABLE t1 (id integer, val text);
+
+HAI 1.2 I HAS A t1 GIMMEH id, "val" KTHXBYE\g
+
+HAI 1.2 MAEK I HAS A t1 GIMMEH id, "val" A v0 KTHXBYE\g
+
+CREATE VIEW v1 AS SELECT * FROM t1\; CREATE VIEW v2 AS SELECT * FROM t1\;HAI 1.2 I HAS A t1 GIMMEH "id", id KTHXBYE\g
+
+CREATE VIEW v3 AS SELECT * FROM t1\; HAI 1.2 MAEK I HAS A t1 GIMMEH id, "val" A v4 KTHXBYE CREATE VIEW v5 AS SELECT * FROM t1\;HAI 1.2 I HAS A t1 GIMMEH "id", id KTHXBYE\g
+
+SELECT 1\;SELECT 2\;SELECT 3 \g
+\d
diff --git a/contrib/sqlol/sql/01_sqlol.sql b/contrib/sqlol/sql/01_sqlol.sql
new file mode 100644
index 0000000000..918caf94c0
--- /dev/null
+++ b/contrib/sqlol/sql/01_sqlol.sql
@@ -0,0 +1,40 @@
+LOAD 'sqlol';
+
+-- create a base table, falling back on core grammar
+CREATE TABLE t1 (id integer, val text);
+
+-- test a SQLOL statement
+HAI 1.2 I HAS A t1 GIMMEH id, "val" KTHXBYE\g
+
+-- create a view in SQLOL
+HAI 1.2 MAEK I HAS A t1 GIMMEH id, "val" A v0 KTHXBYE\g
+
+-- combine standard SQL with a trailing SQLOL statement in multi-statements command
+CREATE VIEW v1 AS SELECT * FROM t1\; CREATE VIEW v2 AS SELECT * FROM t1\;HAI 1.2 I HAS A t1 GIMMEH "id", id KTHXBYE\g
+
+-- interleave standard SQL and SQLOL commands in multi-statements command
+CREATE VIEW v3 AS SELECT * FROM t1\; HAI 1.2 MAEK I HAS A t1 GIMMEH id, "val" A v4 KTHXBYE CREATE VIEW v5 AS SELECT * FROM t1\;HAI 1.2 I HAS A t1 GIMMEH "id", id KTHXBYE\g
+
+-- test MODE_SINGLE_QUERY with no trailing semicolon
+SELECT 1\;SELECT 2\;SELECT 3 \g
+
+-- test empty statement ignoring
+\;\;select 1 \g
+
+-- check the created views
+\d
+
+--
+-- Error position
+--
+SELECT 1\;err;
+
+-- sqlol won't trigger an error on incorrect GIMME keyword, so core parser will
+-- complain about HAI
+SELECT 1\;HAI 1.2 I HAS A t1 GIMME id KTHXBYE\g
+
+-- sqlol will trigger the error about too many qualifiers on t1
+SELECT 1\;HAI 1.2 I HAS A some.thing.public.t1 GIMMEH id KTHXBYE\g
+
+-- position reported outside of the parser/scanner should be correct too
+SELECT 1\;SELECT * FROM notatable;
diff --git a/contrib/sqlol/sqlol.c b/contrib/sqlol/sqlol.c
index b986966181..7d4e1b631f 100644
--- a/contrib/sqlol/sqlol.c
+++ b/contrib/sqlol/sqlol.c
@@ -26,7 +26,8 @@ static parser_hook_type prev_parser_hook = NULL;
void _PG_init(void);
void _PG_fini(void);
-static List *sqlol_parser_hook(const char *str, RawParseMode mode);
+static List *sqlol_parser_hook(const char *str, RawParseMode mode, int offset,
+ bool *error);
/*
@@ -54,23 +55,25 @@ _PG_fini(void)
* sqlol_parser_hook: parse our grammar
*/
static List *
-sqlol_parser_hook(const char *str, RawParseMode mode)
+sqlol_parser_hook(const char *str, RawParseMode mode, int offset, bool *error)
{
sqlol_yyscan_t yyscanner;
sqlol_base_yy_extra_type yyextra;
int yyresult;
- if (mode != RAW_PARSE_DEFAULT)
+ if (mode != RAW_PARSE_DEFAULT && mode != RAW_PARSE_SINGLE_QUERY)
{
if (prev_parser_hook)
- return (*prev_parser_hook) (str, mode);
- else
- return raw_parser(str, mode);
+ return (*prev_parser_hook) (str, mode, offset, error);
+
+ *error = true;
+ return NIL;
}
/* initialize the flex scanner */
yyscanner = sqlol_scanner_init(str, &yyextra.sqlol_yy_extra,
- sqlol_ScanKeywords, sqlol_NumScanKeywords);
+ sqlol_ScanKeywords, sqlol_NumScanKeywords,
+ offset);
/* initialize the bison parser */
sqlol_parser_init(&yyextra);
@@ -88,9 +91,10 @@ sqlol_parser_hook(const char *str, RawParseMode mode)
if (yyresult)
{
if (prev_parser_hook)
- return (*prev_parser_hook) (str, mode);
- else
- return raw_parser(str, mode);
+ return (*prev_parser_hook) (str, mode, offset, error);
+
+ *error = true;
+ return NIL;
}
return yyextra.parsetree;
diff --git a/contrib/sqlol/sqlol_gram.y b/contrib/sqlol/sqlol_gram.y
index 64d00d14ca..4c36cfef5e 100644
--- a/contrib/sqlol/sqlol_gram.y
+++ b/contrib/sqlol/sqlol_gram.y
@@ -20,6 +20,7 @@
#include "catalog/namespace.h"
#include "nodes/makefuncs.h"
+#include "catalog/pg_class_d.h"
#include "sqlol_gramparse.h"
@@ -106,10 +107,10 @@ static List *check_indirection(List *indirection, sqlol_yyscan_t yyscanner);
ResTarget *target;
}
-%type <node> stmt toplevel_stmt GimmehStmt simple_gimmeh columnref
+%type <node> stmt toplevel_stmt GimmehStmt MaekStmt simple_gimmeh columnref
indirection_el
-%type <list> parse_toplevel stmtmulti gimmeh_list indirection
+%type <list> parse_toplevel rawstmt gimmeh_list indirection
%type <range> qualified_name
@@ -134,22 +135,19 @@ static List *check_indirection(List *indirection, sqlol_yyscan_t yyscanner);
*/
/* ordinary key words in alphabetical order */
-%token <keyword> A GIMMEH HAI HAS I KTHXBYE
-
+%token <keyword> A GIMMEH HAI HAS I KTHXBYE MAEK
%%
/*
* The target production for the whole parse.
- *
- * Ordinarily we parse a list of statements, but if we see one of the
- * special MODE_XXX symbols as first token, we parse something else.
- * The options here correspond to enum RawParseMode, which see for details.
*/
parse_toplevel:
- stmtmulti
+ rawstmt
{
pg_yyget_extra(yyscanner)->parsetree = $1;
+
+ YYACCEPT;
}
;
@@ -163,24 +161,11 @@ parse_toplevel:
* we'd get -1 for the location in such cases.
* We also take care to discard empty statements entirely.
*/
-stmtmulti: stmtmulti KTHXBYE toplevel_stmt
- {
- if ($1 != NIL)
- {
- /* update length of previous stmt */
- updateRawStmtEnd(llast_node(RawStmt, $1), @2);
- }
- if ($3 != NULL)
- $$ = lappend($1, makeRawStmt($3, @2 + 1));
- else
- $$ = $1;
- }
- | toplevel_stmt
+rawstmt: toplevel_stmt KTHXBYE
{
- if ($1 != NULL)
- $$ = list_make1(makeRawStmt($1, 0));
- else
- $$ = NIL;
+ RawStmt *raw = makeRawStmt($1, 0);
+ updateRawStmtEnd(raw, @2 + 7);
+ $$ = list_make1(raw);
}
;
@@ -189,13 +174,12 @@ stmtmulti: stmtmulti KTHXBYE toplevel_stmt
* those words have different meanings in function bodys.
*/
toplevel_stmt:
- stmt
+ HAI FCONST stmt { $$ = $3; }
;
stmt:
GimmehStmt
- | /*EMPTY*/
- { $$ = NULL; }
+ | MaekStmt
;
/*****************************************************************************
@@ -209,12 +193,11 @@ GimmehStmt:
;
simple_gimmeh:
- HAI FCONST I HAS A qualified_name
- GIMMEH gimmeh_list
+ I HAS A qualified_name GIMMEH gimmeh_list
{
SelectStmt *n = makeNode(SelectStmt);
- n->targetList = $8;
- n->fromClause = list_make1($6);
+ n->targetList = $6;
+ n->fromClause = list_make1($4);
$$ = (Node *)n;
}
;
@@ -233,6 +216,20 @@ gimmeh_el:
$$->location = @1;
}
+MaekStmt:
+ MAEK GimmehStmt A qualified_name
+ {
+ ViewStmt *n = makeNode(ViewStmt);
+ n->view = $4;
+ n->view->relpersistence = RELPERSISTENCE_PERMANENT;
+ n->aliases = NIL;
+ n->query = $2;
+ n->replace = false;
+ n->options = NIL;
+ n->withCheckOption = false;
+ $$ = (Node *) n;
+ }
+
qualified_name:
ColId
{
diff --git a/contrib/sqlol/sqlol_kwlist.h b/contrib/sqlol/sqlol_kwlist.h
index 2de3893ee4..8b50d88df9 100644
--- a/contrib/sqlol/sqlol_kwlist.h
+++ b/contrib/sqlol/sqlol_kwlist.h
@@ -19,3 +19,4 @@ PG_KEYWORD("hai", HAI, RESERVED_KEYWORD)
PG_KEYWORD("has", HAS, UNRESERVED_KEYWORD)
PG_KEYWORD("i", I, UNRESERVED_KEYWORD)
PG_KEYWORD("kthxbye", KTHXBYE, UNRESERVED_KEYWORD)
+PG_KEYWORD("maek", MAEK, UNRESERVED_KEYWORD)
diff --git a/contrib/sqlol/sqlol_scan.l b/contrib/sqlol/sqlol_scan.l
index a7088b8390..e6d4d53446 100644
--- a/contrib/sqlol/sqlol_scan.l
+++ b/contrib/sqlol/sqlol_scan.l
@@ -412,8 +412,10 @@ sqlol_yyscan_t
sqlol_scanner_init(const char *str,
sqlol_yy_extra_type *yyext,
const sqlol_ScanKeyword *keywords,
- int num_keywords)
+ int num_keywords,
+ int offset)
{
+ YY_BUFFER_STATE state;
Size slen = strlen(str);
yyscan_t scanner;
@@ -432,13 +434,20 @@ sqlol_scanner_init(const char *str,
yyext->scanbuflen = slen;
memcpy(yyext->scanbuf, str, slen);
yyext->scanbuf[slen] = yyext->scanbuf[slen + 1] = YY_END_OF_BUFFER_CHAR;
- yy_scan_buffer(yyext->scanbuf, slen + 2, scanner);
+ state = yy_scan_buffer(yyext->scanbuf, slen + 2, scanner);
/* initialize literal buffer to a reasonable but expansible size */
yyext->literalalloc = 1024;
yyext->literalbuf = (char *) palloc(yyext->literalalloc);
yyext->literallen = 0;
+ /*
+ * Adjust the offset in the input string. This is required in single-query
+ * mode, as we need to register the same token locations as we would have
+ * in normal mode with a multi-statement query string.
+ */
+ state->yy_buf_pos += offset;
+
return scanner;
}
diff --git a/contrib/sqlol/sqlol_scanner.h b/contrib/sqlol/sqlol_scanner.h
index 0a497e9d91..57f95867ee 100644
--- a/contrib/sqlol/sqlol_scanner.h
+++ b/contrib/sqlol/sqlol_scanner.h
@@ -108,7 +108,8 @@ extern PGDLLIMPORT const uint16 sqlol_ScanKeywordTokens[];
extern sqlol_yyscan_t sqlol_scanner_init(const char *str,
sqlol_yy_extra_type *yyext,
const sqlol_ScanKeyword *keywords,
- int num_keywords);
+ int num_keywords,
+ int offset);
extern void sqlol_scanner_finish(sqlol_yyscan_t yyscanner);
extern int sqlol_yylex(sqlol_YYSTYPE *lvalp, YYLTYPE *llocp,
sqlol_yyscan_t yyscanner);
--
2.31.1
On Sun, Jun 06, 2021 at 02:50:19PM +0800, Julien Rouhaud wrote:
On Sat, May 01, 2021 at 03:24:58PM +0800, Julien Rouhaud wrote:
I'm attaching some POC patches that implement this approach to start a
discussion.

I just noticed that the cfbot fails with the v1 patch. Attached v2 that should
fix that.
The cfbot then revealed a missing dependency in the makefile used to generate
the contrib parser, which triggers a failure in make check-world without a
previous make -C contrib.
Thanks a lot to Thomas Munro for getting me the logfile from the failed cfbot
run and the fix!
Attachments:
v3-0001-Add-a-parser_hook-hook.patch (text/x-diff; charset=us-ascii)
From 3522fd2b0b27f52ab400abe1c9fbd5bb0c6169b4 Mon Sep 17 00:00:00 2001
From: Julien Rouhaud <julien.rouhaud@free.fr>
Date: Wed, 21 Apr 2021 22:47:18 +0800
Subject: [PATCH v3 1/4] Add a parser_hook hook.
This does nothing but allow third-party plugins to implement a different
syntax, and fall back on the core parser if they don't implement a superset of
the supported core syntax.
---
src/backend/tcop/postgres.c | 16 ++++++++++++++--
src/include/tcop/tcopprot.h | 5 +++++
2 files changed, 19 insertions(+), 2 deletions(-)
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 8cea10c901..e941b59b85 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -99,6 +99,9 @@ int log_statement = LOGSTMT_NONE;
/* GUC variable for maximum stack depth (measured in kilobytes) */
int max_stack_depth = 100;
+/* Hook for plugins to get control in pg_parse_query() */
+parser_hook_type parser_hook = NULL;
+
/* wait N seconds to allow attach from a debugger */
int PostAuthDelay = 0;
@@ -589,18 +592,27 @@ ProcessClientWriteInterrupt(bool blocked)
* database tables. So, we rely on the raw parser to determine whether
* we've seen a COMMIT or ABORT command; when we are in abort state, other
* commands are not processed any further than the raw parse stage.
+ *
+ * To support loadable plugins that monitor the parsing or implement SQL
+ * syntactic sugar, we provide a hook variable that lets a plugin get control
+ * before and after the standard parsing process. If the plugin only implements
+ * a subset of the syntax supported by Postgres, it is its duty to call
+ * raw_parser (or the previous hook, if any) for statements it doesn't understand.
*/
List *
pg_parse_query(const char *query_string)
{
- List *raw_parsetree_list;
+ List *raw_parsetree_list = NIL;
TRACE_POSTGRESQL_QUERY_PARSE_START(query_string);
if (log_parser_stats)
ResetUsage();
- raw_parsetree_list = raw_parser(query_string, RAW_PARSE_DEFAULT);
+ if (parser_hook)
+ raw_parsetree_list = (*parser_hook) (query_string, RAW_PARSE_DEFAULT);
+ else
+ raw_parsetree_list = raw_parser(query_string, RAW_PARSE_DEFAULT);
if (log_parser_stats)
ShowUsage("PARSER STATISTICS");
diff --git a/src/include/tcop/tcopprot.h b/src/include/tcop/tcopprot.h
index 968345404e..131dc2b22e 100644
--- a/src/include/tcop/tcopprot.h
+++ b/src/include/tcop/tcopprot.h
@@ -17,6 +17,7 @@
#include "nodes/params.h"
#include "nodes/parsenodes.h"
#include "nodes/plannodes.h"
+#include "parser/parser.h"
#include "storage/procsignal.h"
#include "utils/guc.h"
#include "utils/queryenvironment.h"
@@ -43,6 +44,10 @@ typedef enum
extern PGDLLIMPORT int log_statement;
+/* Hook for plugins to get control in pg_parse_query() */
+typedef List *(*parser_hook_type) (const char *str, RawParseMode mode);
+extern PGDLLIMPORT parser_hook_type parser_hook;
+
extern List *pg_parse_query(const char *query_string);
extern List *pg_rewrite_query(Query *query);
extern List *pg_analyze_and_rewrite(RawStmt *parsetree,
--
2.31.1
v3-0002-Add-a-sqlol-parser.patch (text/x-diff; charset=us-ascii)
From 51a4fd99b8c66b970c3f8819cc135e1095126c48 Mon Sep 17 00:00:00 2001
From: Julien Rouhaud <julien.rouhaud@free.fr>
Date: Wed, 21 Apr 2021 23:54:02 +0800
Subject: [PATCH v3 2/4] Add a sqlol parser.
This is a toy example of an alternative grammar that only accepts a
LOLCODE-compatible version of a
SELECT [column, ] column FROM tablename
and falls back on the core parser for everything else.
---
contrib/Makefile | 1 +
contrib/sqlol/.gitignore | 7 +
contrib/sqlol/Makefile | 33 ++
contrib/sqlol/sqlol.c | 107 +++++++
contrib/sqlol/sqlol_gram.y | 440 ++++++++++++++++++++++++++
contrib/sqlol/sqlol_gramparse.h | 61 ++++
contrib/sqlol/sqlol_keywords.c | 98 ++++++
contrib/sqlol/sqlol_keywords.h | 38 +++
contrib/sqlol/sqlol_kwlist.h | 21 ++
contrib/sqlol/sqlol_scan.l | 544 ++++++++++++++++++++++++++++++++
contrib/sqlol/sqlol_scanner.h | 118 +++++++
11 files changed, 1468 insertions(+)
create mode 100644 contrib/sqlol/.gitignore
create mode 100644 contrib/sqlol/Makefile
create mode 100644 contrib/sqlol/sqlol.c
create mode 100644 contrib/sqlol/sqlol_gram.y
create mode 100644 contrib/sqlol/sqlol_gramparse.h
create mode 100644 contrib/sqlol/sqlol_keywords.c
create mode 100644 contrib/sqlol/sqlol_keywords.h
create mode 100644 contrib/sqlol/sqlol_kwlist.h
create mode 100644 contrib/sqlol/sqlol_scan.l
create mode 100644 contrib/sqlol/sqlol_scanner.h
diff --git a/contrib/Makefile b/contrib/Makefile
index f27e458482..2a80cd137b 100644
--- a/contrib/Makefile
+++ b/contrib/Makefile
@@ -43,6 +43,7 @@ SUBDIRS = \
postgres_fdw \
seg \
spi \
+ sqlol \
tablefunc \
tcn \
test_decoding \
diff --git a/contrib/sqlol/.gitignore b/contrib/sqlol/.gitignore
new file mode 100644
index 0000000000..3c4b587792
--- /dev/null
+++ b/contrib/sqlol/.gitignore
@@ -0,0 +1,7 @@
+# Generated subdirectories
+/log/
+/results/
+/tmp_check/
+sqlol_gram.c
+sqlol_gram.h
+sqlol_scan.c
diff --git a/contrib/sqlol/Makefile b/contrib/sqlol/Makefile
new file mode 100644
index 0000000000..3850ac3fce
--- /dev/null
+++ b/contrib/sqlol/Makefile
@@ -0,0 +1,33 @@
+# contrib/sqlol/Makefile
+
+MODULE_big = sqlol
+OBJS = \
+ $(WIN32RES) \
+ sqlol.o sqlol_gram.o sqlol_scan.o sqlol_keywords.o
+PGFILEDESC = "sqlol - Toy alternative grammar based on LOLCODE"
+
+sqlol_gram.h: sqlol_gram.c
+ touch $@
+
+sqlol_gram.c: BISONFLAGS += -d
+# sqlol_gram.c: BISON_CHECK_CMD = $(PERL) $(srcdir)/check_keywords.pl $< $(top_srcdir)/src/include/parser/kwlist.h
+
+
+sqlol_scan.c: FLEXFLAGS = -CF -p -p
+sqlol_scan.c: FLEX_NO_BACKUP=yes
+sqlol_scan.c: FLEX_FIX_WARNING=yes
+
+
+# Force these dependencies to be known even without dependency info built:
+sqlol.o sqlol_gram.o sqlol_scan.o parser.o: sqlol_gram.h
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = contrib/sqlol
+top_builddir = ../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
diff --git a/contrib/sqlol/sqlol.c b/contrib/sqlol/sqlol.c
new file mode 100644
index 0000000000..b986966181
--- /dev/null
+++ b/contrib/sqlol/sqlol.c
@@ -0,0 +1,107 @@
+/*-------------------------------------------------------------------------
+ *
+ * sqlol.c
+ *
+ *
+ * Copyright (c) 2008-2021, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * contrib/sqlol/sqlol.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "tcop/tcopprot.h"
+
+#include "sqlol_gramparse.h"
+#include "sqlol_keywords.h"
+
+PG_MODULE_MAGIC;
+
+
+/* Saved hook values in case of unload */
+static parser_hook_type prev_parser_hook = NULL;
+
+void _PG_init(void);
+void _PG_fini(void);
+
+static List *sqlol_parser_hook(const char *str, RawParseMode mode);
+
+
+/*
+ * Module load callback
+ */
+void
+_PG_init(void)
+{
+ /* Install hooks. */
+ prev_parser_hook = parser_hook;
+ parser_hook = sqlol_parser_hook;
+}
+
+/*
+ * Module unload callback
+ */
+void
+_PG_fini(void)
+{
+ /* Uninstall hooks. */
+ parser_hook = prev_parser_hook;
+}
+
+/*
+ * sqlol_parser_hook: parse our grammar
+ */
+static List *
+sqlol_parser_hook(const char *str, RawParseMode mode)
+{
+ sqlol_yyscan_t yyscanner;
+ sqlol_base_yy_extra_type yyextra;
+ int yyresult;
+
+ if (mode != RAW_PARSE_DEFAULT)
+ {
+ if (prev_parser_hook)
+ return (*prev_parser_hook) (str, mode);
+ else
+ return raw_parser(str, mode);
+ }
+
+ /* initialize the flex scanner */
+ yyscanner = sqlol_scanner_init(str, &yyextra.sqlol_yy_extra,
+ sqlol_ScanKeywords, sqlol_NumScanKeywords);
+
+ /* initialize the bison parser */
+ sqlol_parser_init(&yyextra);
+
+ /* Parse! */
+ yyresult = sqlol_base_yyparse(yyscanner);
+
+ /* Clean up (release memory) */
+ sqlol_scanner_finish(yyscanner);
+
+ /*
+ * Invalid statement, fallback on previous parser_hook if any or
+ * raw_parser()
+ */
+ if (yyresult)
+ {
+ if (prev_parser_hook)
+ return (*prev_parser_hook) (str, mode);
+ else
+ return raw_parser(str, mode);
+ }
+
+ return yyextra.parsetree;
+}
+
+int
+sqlol_base_yylex(YYSTYPE *lvalp, YYLTYPE *llocp, sqlol_yyscan_t yyscanner)
+{
+ int cur_token;
+
+ cur_token = sqlol_yylex(&(lvalp->sqlol_yystype), llocp, yyscanner);
+
+ return cur_token;
+}
diff --git a/contrib/sqlol/sqlol_gram.y b/contrib/sqlol/sqlol_gram.y
new file mode 100644
index 0000000000..64d00d14ca
--- /dev/null
+++ b/contrib/sqlol/sqlol_gram.y
@@ -0,0 +1,440 @@
+%{
+
+/*#define YYDEBUG 1*/
+/*-------------------------------------------------------------------------
+ *
+ * sqlol_gram.y
+ * sqlol BISON rules/actions
+ *
+ * Portions Copyright (c) 1996-2021, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ * contrib/sqlol/sqlol_gram.y
+ *
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "catalog/namespace.h"
+#include "nodes/makefuncs.h"
+
+#include "sqlol_gramparse.h"
+
+/*
+ * Location tracking support --- simpler than bison's default, since we only
+ * want to track the start position not the end position of each nonterminal.
+ */
+#define YYLLOC_DEFAULT(Current, Rhs, N) \
+ do { \
+ if ((N) > 0) \
+ (Current) = (Rhs)[1]; \
+ else \
+ (Current) = (-1); \
+ } while (0)
+
+/*
+ * The above macro assigns -1 (unknown) as the parse location of any
+ * nonterminal that was reduced from an empty rule, or whose leftmost
+ * component was reduced from an empty rule. This is problematic
+ * for nonterminals defined like
+ * OptFooList: / * EMPTY * / { ... } | OptFooList Foo { ... } ;
+ * because we'll set -1 as the location during the first reduction and then
+ * copy it during each subsequent reduction, leaving us with -1 for the
+ * location even when the list is not empty. To fix that, do this in the
+ * action for the nonempty rule(s):
+ * if (@$ < 0) @$ = @2;
+ * (Although we have many nonterminals that follow this pattern, we only
+ * bother with fixing @$ like this when the nonterminal's parse location
+ * is actually referenced in some rule.)
+ *
+ * A cleaner answer would be to make YYLLOC_DEFAULT scan all the Rhs
+ * locations until it's found one that's not -1. Then we'd get a correct
+ * location for any nonterminal that isn't entirely empty. But this way
+ * would add overhead to every rule reduction, and so far there's not been
+ * a compelling reason to pay that overhead.
+ */
+
+/*
+ * Bison doesn't allocate anything that needs to live across parser calls,
+ * so we can easily have it use palloc instead of malloc. This prevents
+ * memory leaks if we error out during parsing. Note this only works with
+ * bison >= 2.0. However, in bison 1.875 the default is to use alloca()
+ * if possible, so there's not really much problem anyhow, at least if
+ * you're building with gcc.
+ */
+#define YYMALLOC palloc
+#define YYFREE pfree
+
+
+#define parser_yyerror(msg) sqlol_scanner_yyerror(msg, yyscanner)
+#define parser_errposition(pos) sqlol_scanner_errposition(pos, yyscanner)
+
+static void sqlol_base_yyerror(YYLTYPE *yylloc, sqlol_yyscan_t yyscanner,
+ const char *msg);
+static RawStmt *makeRawStmt(Node *stmt, int stmt_location);
+static void updateRawStmtEnd(RawStmt *rs, int end_location);
+static Node *makeColumnRef(char *colname, List *indirection,
+ int location, sqlol_yyscan_t yyscanner);
+static void check_qualified_name(List *names, sqlol_yyscan_t yyscanner);
+static List *check_indirection(List *indirection, sqlol_yyscan_t yyscanner);
+
+%}
+
+%pure-parser
+%expect 0
+%name-prefix="sqlol_base_yy"
+%locations
+
+%parse-param {sqlol_yyscan_t yyscanner}
+%lex-param {sqlol_yyscan_t yyscanner}
+
+%union
+{
+ sqlol_YYSTYPE sqlol_yystype;
+ /* these fields must match sqlol_YYSTYPE: */
+ int ival;
+ char *str;
+ const char *keyword;
+
+ List *list;
+ Node *node;
+ Value *value;
+ RangeVar *range;
+ ResTarget *target;
+}
+
+%type <node> stmt toplevel_stmt GimmehStmt simple_gimmeh columnref
+ indirection_el
+
+%type <list> parse_toplevel stmtmulti gimmeh_list indirection
+
+%type <range> qualified_name
+
+%type <str> ColId ColLabel attr_name
+
+%type <target> gimmeh_el
+
+/*
+ * Non-keyword token types. These are hard-wired into the "flex" lexer.
+ * They must be listed first so that their numeric codes do not depend on
+ * the set of keywords. PL/pgSQL depends on this so that it can share the
+ * same lexer. If you add/change tokens here, fix PL/pgSQL to match!
+ *
+ */
+%token <str> IDENT FCONST SCONST Op
+
+/*
+ * If you want to make any keyword changes, update the keyword table in
+ * src/include/parser/kwlist.h and add new keywords to the appropriate one
+ * of the reserved-or-not-so-reserved keyword lists, below; search
+ * this file for "Keyword category lists".
+ */
+
+/* ordinary key words in alphabetical order */
+%token <keyword> A GIMMEH HAI HAS I KTHXBYE
+
+
+%%
+
+/*
+ * The target production for the whole parse.
+ *
+ * Ordinarily we parse a list of statements, but if we see one of the
+ * special MODE_XXX symbols as first token, we parse something else.
+ * The options here correspond to enum RawParseMode, which see for details.
+ */
+parse_toplevel:
+ stmtmulti
+ {
+ pg_yyget_extra(yyscanner)->parsetree = $1;
+ }
+ ;
+
+/*
+ * At top level, we wrap each stmt with a RawStmt node carrying start location
+ * and length of the stmt's text. Notice that the start loc/len are driven
+ * entirely from semicolon locations (@2). It would seem natural to use
+ * @1 or @3 to get the true start location of a stmt, but that doesn't work
+ * for statements that can start with empty nonterminals (opt_with_clause is
+ * the main offender here); as noted in the comments for YYLLOC_DEFAULT,
+ * we'd get -1 for the location in such cases.
+ * We also take care to discard empty statements entirely.
+ */
+stmtmulti: stmtmulti KTHXBYE toplevel_stmt
+ {
+ if ($1 != NIL)
+ {
+ /* update length of previous stmt */
+ updateRawStmtEnd(llast_node(RawStmt, $1), @2);
+ }
+ if ($3 != NULL)
+ $$ = lappend($1, makeRawStmt($3, @2 + 1));
+ else
+ $$ = $1;
+ }
+ | toplevel_stmt
+ {
+ if ($1 != NULL)
+ $$ = list_make1(makeRawStmt($1, 0));
+ else
+ $$ = NIL;
+ }
+ ;
+
+/*
+ * toplevel_stmt includes BEGIN and END. stmt does not include them, because
+ * those words have different meanings in function bodies.
+ */
+toplevel_stmt:
+ stmt
+ ;
+
+stmt:
+ GimmehStmt
+ | /*EMPTY*/
+ { $$ = NULL; }
+ ;
+
+/*****************************************************************************
+ *
+ * GIMMEH statement
+ *
+ *****************************************************************************/
+
+GimmehStmt:
+ simple_gimmeh { $$ = $1; }
+ ;
+
+simple_gimmeh:
+ HAI FCONST I HAS A qualified_name
+ GIMMEH gimmeh_list
+ {
+ SelectStmt *n = makeNode(SelectStmt);
+ n->targetList = $8;
+ n->fromClause = list_make1($6);
+ $$ = (Node *)n;
+ }
+ ;
+
+gimmeh_list:
+ gimmeh_el { $$ = list_make1($1); }
+ | gimmeh_list ',' gimmeh_el { $$ = lappend($1, $3); }
+
+gimmeh_el:
+ columnref
+ {
+ $$ = makeNode(ResTarget);
+ $$->name = NULL;
+ $$->indirection = NIL;
+ $$->val = (Node *)$1;
+ $$->location = @1;
+ }
+
+qualified_name:
+ ColId
+ {
+ $$ = makeRangeVar(NULL, $1, @1);
+ }
+ | ColId indirection
+ {
+ check_qualified_name($2, yyscanner);
+ $$ = makeRangeVar(NULL, NULL, @1);
+ switch (list_length($2))
+ {
+ case 1:
+ $$->catalogname = NULL;
+ $$->schemaname = $1;
+ $$->relname = strVal(linitial($2));
+ break;
+ case 2:
+ $$->catalogname = $1;
+ $$->schemaname = strVal(linitial($2));
+ $$->relname = strVal(lsecond($2));
+ break;
+ default:
+ /*
+ * It's ok to error out here as at this point we
+ * already parsed a "HAI FCONST" preamble, and no
+ * other grammar is likely to accept a command
+ * starting with that, so there's no point trying
+ * to fall back on the other grammars.
+ */
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("improper qualified name (too many dotted names): %s",
+ NameListToString(lcons(makeString($1), $2))),
+ parser_errposition(@1)));
+ break;
+ }
+ }
+ ;
+
+columnref: ColId
+ {
+ $$ = makeColumnRef($1, NIL, @1, yyscanner);
+ }
+ | ColId indirection
+ {
+ $$ = makeColumnRef($1, $2, @1, yyscanner);
+ }
+ ;
+
+ColId: IDENT { $$ = $1; }
+
+indirection:
+ indirection_el { $$ = list_make1($1); }
+ | indirection indirection_el { $$ = lappend($1, $2); }
+ ;
+
+indirection_el:
+ '.' attr_name
+ {
+ $$ = (Node *) makeString($2);
+ }
+ ;
+
+attr_name: ColLabel { $$ = $1; };
+
+ColLabel: IDENT { $$ = $1; }
+
+%%
+
+/*
+ * The signature of this function is required by bison. However, we
+ * ignore the passed yylloc and instead use the last token position
+ * available from the scanner.
+ */
+static void
+sqlol_base_yyerror(YYLTYPE *yylloc, sqlol_yyscan_t yyscanner, const char *msg)
+{
+ parser_yyerror(msg);
+}
+
+static RawStmt *
+makeRawStmt(Node *stmt, int stmt_location)
+{
+ RawStmt *rs = makeNode(RawStmt);
+
+ rs->stmt = stmt;
+ rs->stmt_location = stmt_location;
+ rs->stmt_len = 0; /* might get changed later */
+ return rs;
+}
+
+/* Adjust a RawStmt to reflect that it doesn't run to the end of the string */
+static void
+updateRawStmtEnd(RawStmt *rs, int end_location)
+{
+ /*
+ * If we already set the length, don't change it. This is for situations
+ * like "select foo ;; select bar" where the same statement will be last
+ * in the string for more than one semicolon.
+ */
+ if (rs->stmt_len > 0)
+ return;
+
+ /* OK, update length of RawStmt */
+ rs->stmt_len = end_location - rs->stmt_location;
+}
+
+static Node *
+makeColumnRef(char *colname, List *indirection,
+ int location, sqlol_yyscan_t yyscanner)
+{
+ /*
+ * Generate a ColumnRef node, with an A_Indirection node added if there
+ * is any subscripting in the specified indirection list. However,
+ * any field selection at the start of the indirection list must be
+ * transposed into the "fields" part of the ColumnRef node.
+ */
+ ColumnRef *c = makeNode(ColumnRef);
+ int nfields = 0;
+ ListCell *l;
+
+ c->location = location;
+ foreach(l, indirection)
+ {
+ if (IsA(lfirst(l), A_Indices))
+ {
+ A_Indirection *i = makeNode(A_Indirection);
+
+ if (nfields == 0)
+ {
+ /* easy case - all indirection goes to A_Indirection */
+ c->fields = list_make1(makeString(colname));
+ i->indirection = check_indirection(indirection, yyscanner);
+ }
+ else
+ {
+ /* got to split the list in two */
+ i->indirection = check_indirection(list_copy_tail(indirection,
+ nfields),
+ yyscanner);
+ indirection = list_truncate(indirection, nfields);
+ c->fields = lcons(makeString(colname), indirection);
+ }
+ i->arg = (Node *) c;
+ return (Node *) i;
+ }
+ else if (IsA(lfirst(l), A_Star))
+ {
+ /* We only allow '*' at the end of a ColumnRef */
+ if (lnext(indirection, l) != NULL)
+ parser_yyerror("improper use of \"*\"");
+ }
+ nfields++;
+ }
+ /* No subscripting, so all indirection gets added to field list */
+ c->fields = lcons(makeString(colname), indirection);
+ return (Node *) c;
+}
+
+/* check_qualified_name --- check the result of qualified_name production
+ *
+ * It's easiest to let the grammar production for qualified_name allow
+ * subscripts and '*', which we then must reject here.
+ */
+static void
+check_qualified_name(List *names, sqlol_yyscan_t yyscanner)
+{
+ ListCell *i;
+
+ foreach(i, names)
+ {
+ if (!IsA(lfirst(i), String))
+ parser_yyerror("syntax error");
+ }
+}
+
+/* check_indirection --- check the result of indirection production
+ *
+ * We only allow '*' at the end of the list, but it's hard to enforce that
+ * in the grammar, so do it here.
+ */
+static List *
+check_indirection(List *indirection, sqlol_yyscan_t yyscanner)
+{
+ ListCell *l;
+
+ foreach(l, indirection)
+ {
+ if (IsA(lfirst(l), A_Star))
+ {
+ if (lnext(indirection, l) != NULL)
+ parser_yyerror("improper use of \"*\"");
+ }
+ }
+ return indirection;
+}
+
+/* sqlol_parser_init()
+ * Initialize to parse one query string
+ */
+void
+sqlol_parser_init(sqlol_base_yy_extra_type *yyext)
+{
+ yyext->parsetree = NIL; /* in case grammar forgets to set it */
+}
diff --git a/contrib/sqlol/sqlol_gramparse.h b/contrib/sqlol/sqlol_gramparse.h
new file mode 100644
index 0000000000..58233a8d87
--- /dev/null
+++ b/contrib/sqlol/sqlol_gramparse.h
@@ -0,0 +1,61 @@
+/*-------------------------------------------------------------------------
+ *
+ * sqlol_gramparse.h
+ * Shared definitions for the "raw" parser (flex and bison phases only)
+ *
+ *
+ * Portions Copyright (c) 1996-2021, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * contrib/sqlol/sqlol_gramparse.h
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#ifndef SQLOL_GRAMPARSE_H
+#define SQLOL_GRAMPARSE_H
+
+#include "nodes/parsenodes.h"
+#include "sqlol_scanner.h"
+
+/*
+ * NB: include gram.h only AFTER including scanner.h, because scanner.h
+ * is what #defines YYLTYPE.
+ */
+#include "sqlol_gram.h"
+
+/*
+ * The YY_EXTRA data that a flex scanner allows us to pass around. Private
+ * state needed for raw parsing/lexing goes here.
+ */
+typedef struct sqlol_base_yy_extra_type
+{
+ /*
+ * Fields used by the core scanner.
+ */
+ sqlol_yy_extra_type sqlol_yy_extra;
+
+ /*
+ * State variables that belong to the grammar.
+ */
+ List *parsetree; /* final parse result is delivered here */
+} sqlol_base_yy_extra_type;
+
+/*
+ * In principle we should use yyget_extra() to fetch the yyextra field
+ * from a yyscanner struct. However, flex always puts that field first,
+ * and this is sufficiently performance-critical to make it seem worth
+ * cheating a bit to use an inline macro.
+ */
+#define pg_yyget_extra(yyscanner) (*((sqlol_base_yy_extra_type **) (yyscanner)))
+
+
+/* from parser.c */
+extern int sqlol_base_yylex(YYSTYPE *lvalp, YYLTYPE *llocp,
+ sqlol_yyscan_t yyscanner);
+
+/* from gram.y */
+extern void sqlol_parser_init(sqlol_base_yy_extra_type *yyext);
+extern int sqlol_base_yyparse(sqlol_yyscan_t yyscanner);
+
+#endif /* SQLOL_GRAMPARSE_H */
diff --git a/contrib/sqlol/sqlol_keywords.c b/contrib/sqlol/sqlol_keywords.c
new file mode 100644
index 0000000000..dbbdf5493c
--- /dev/null
+++ b/contrib/sqlol/sqlol_keywords.c
@@ -0,0 +1,98 @@
+/*-------------------------------------------------------------------------
+ *
+ * sqlol_keywords.c
+ * lexical token lookup for key words in PostgreSQL
+ *
+ *
+ * Portions Copyright (c) 1996-2018, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ * sqlol/sqlol_keywords.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "sqlol_gramparse.h"
+
+#define PG_KEYWORD(a,b,c) {a,b,c},
+
+const sqlol_ScanKeyword sqlol_ScanKeywords[] = {
+#include "sqlol_kwlist.h"
+};
+
+const int sqlol_NumScanKeywords = lengthof(sqlol_ScanKeywords);
+
+#undef PG_KEYWORD
+
+
+/*
+ * ScanKeywordLookup - see if a given word is a keyword
+ *
+ * The table to be searched is passed explicitly, so that this can be used
+ * to search keyword lists other than the standard list appearing above.
+ *
+ * Returns a pointer to the sqlol_ScanKeyword table entry, or NULL if no match.
+ *
+ * The match is done case-insensitively. Note that we deliberately use a
+ * dumbed-down case conversion that will only translate 'A'-'Z' into 'a'-'z',
+ * even if we are in a locale where tolower() would produce more or different
+ * translations. This is to conform to the SQL99 spec, which says that
+ * keywords are to be matched in this way even though non-keyword identifiers
+ * receive a different case-normalization mapping.
+ */
+const sqlol_ScanKeyword *
+sqlol_ScanKeywordLookup(const char *text,
+ const sqlol_ScanKeyword *keywords,
+ int num_keywords)
+{
+ int len,
+ i;
+ char word[NAMEDATALEN];
+ const sqlol_ScanKeyword *low;
+ const sqlol_ScanKeyword *high;
+
+ len = strlen(text);
+ /* We assume all keywords are shorter than NAMEDATALEN. */
+ if (len >= NAMEDATALEN)
+ return NULL;
+
+ /*
+ * Apply an ASCII-only downcasing. We must not use tolower() since it may
+ * produce the wrong translation in some locales (eg, Turkish).
+ */
+ for (i = 0; i < len; i++)
+ {
+ char ch = text[i];
+
+ if (ch >= 'A' && ch <= 'Z')
+ ch += 'a' - 'A';
+ word[i] = ch;
+ }
+ word[len] = '\0';
+
+ /*
+ * Now do a binary search using plain strcmp() comparison.
+ */
+ low = keywords;
+ high = keywords + (num_keywords - 1);
+ while (low <= high)
+ {
+ const sqlol_ScanKeyword *middle;
+ int difference;
+
+ middle = low + (high - low) / 2;
+ difference = strcmp(middle->name, word);
+ if (difference == 0)
+ return middle;
+ else if (difference < 0)
+ low = middle + 1;
+ else
+ high = middle - 1;
+ }
+
+ return NULL;
+}
+
diff --git a/contrib/sqlol/sqlol_keywords.h b/contrib/sqlol/sqlol_keywords.h
new file mode 100644
index 0000000000..bc4acf4541
--- /dev/null
+++ b/contrib/sqlol/sqlol_keywords.h
@@ -0,0 +1,38 @@
+/*-------------------------------------------------------------------------
+ *
+ * sqlol_keywords.h
+ * lexical token lookup for key words in PostgreSQL
+ *
+ *
+ * Portions Copyright (c) 1996-2021, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * contrib/sqlol/sqlol_keywords.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef SQLOL_KEYWORDS_H
+#define SQLOL_KEYWORDS_H
+
+/* Keyword categories --- should match lists in gram.y */
+#define UNRESERVED_KEYWORD 0
+#define COL_NAME_KEYWORD 1
+#define TYPE_FUNC_NAME_KEYWORD 2
+#define RESERVED_KEYWORD 3
+
+
+typedef struct sqlol_ScanKeyword
+{
+ const char *name; /* in lower case */
+ int16 value; /* grammar's token code */
+ int16 category; /* see codes above */
+} sqlol_ScanKeyword;
+
+extern PGDLLIMPORT const sqlol_ScanKeyword sqlol_ScanKeywords[];
+extern PGDLLIMPORT const int sqlol_NumScanKeywords;
+
+extern const sqlol_ScanKeyword *sqlol_ScanKeywordLookup(const char *text,
+ const sqlol_ScanKeyword *keywords,
+ int num_keywords);
+
+#endif /* SQLOL_KEYWORDS_H */
diff --git a/contrib/sqlol/sqlol_kwlist.h b/contrib/sqlol/sqlol_kwlist.h
new file mode 100644
index 0000000000..2de3893ee4
--- /dev/null
+++ b/contrib/sqlol/sqlol_kwlist.h
@@ -0,0 +1,21 @@
+/*-------------------------------------------------------------------------
+ *
+ * sqlol_kwlist.h
+ *
+ *
+ * Portions Copyright (c) 1996-2021, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * IDENTIFICATION
+ * contrib/sqlol/sqlol_kwlist.h
+ *
+ *-------------------------------------------------------------------------
+ */
+
+/* name, value, category, is-bare-label */
+PG_KEYWORD("a", A, UNRESERVED_KEYWORD)
+PG_KEYWORD("gimmeh", GIMMEH, UNRESERVED_KEYWORD)
+PG_KEYWORD("hai", HAI, RESERVED_KEYWORD)
+PG_KEYWORD("has", HAS, UNRESERVED_KEYWORD)
+PG_KEYWORD("i", I, UNRESERVED_KEYWORD)
+PG_KEYWORD("kthxbye", KTHXBYE, UNRESERVED_KEYWORD)
diff --git a/contrib/sqlol/sqlol_scan.l b/contrib/sqlol/sqlol_scan.l
new file mode 100644
index 0000000000..a7088b8390
--- /dev/null
+++ b/contrib/sqlol/sqlol_scan.l
@@ -0,0 +1,544 @@
+%top{
+/*-------------------------------------------------------------------------
+ *
+ * sqlol_scan.l
+ * lexical scanner for sqlol
+ *
+ *
+ * Portions Copyright (c) 1996-2021, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * IDENTIFICATION
+ * contrib/sqlol/sqlol_scan.l
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "common/string.h"
+#include "sqlol_gramparse.h"
+#include "parser/scansup.h"
+#include "mb/pg_wchar.h"
+
+#include "sqlol_keywords.h"
+}
+
+%{
+
+/* LCOV_EXCL_START */
+
+/* Avoid exit() on fatal scanner errors (a bit ugly -- see yy_fatal_error) */
+#undef fprintf
+#define fprintf(file, fmt, msg) fprintf_to_ereport(fmt, msg)
+
+static void
+fprintf_to_ereport(const char *fmt, const char *msg)
+{
+ ereport(ERROR, (errmsg_internal("%s", msg)));
+}
+
+
+/*
+ * Set the type of YYSTYPE.
+ */
+#define YYSTYPE sqlol_YYSTYPE
+
+/*
+ * Set the type of yyextra. All state variables used by the scanner should
+ * be in yyextra, *not* statically allocated.
+ */
+#define YY_EXTRA_TYPE sqlol_yy_extra_type *
+
+/*
+ * Each call to yylex must set yylloc to the location of the found token
+ * (expressed as a byte offset from the start of the input text).
+ * When we parse a token that requires multiple lexer rules to process,
+ * this should be done in the first such rule, else yylloc will point
+ * into the middle of the token.
+ */
+#define SET_YYLLOC() (*(yylloc) = yytext - yyextra->scanbuf)
+
+/*
+ * Advance yylloc by the given number of bytes.
+ */
+#define ADVANCE_YYLLOC(delta) ( *(yylloc) += (delta) )
+
+/*
+ * Sometimes, we do want yylloc to point into the middle of a token; this is
+ * useful for instance to throw an error about an escape sequence within a
+ * string literal. But if we find no error there, we want to revert yylloc
+ * to the token start, so that that's the location reported to the parser.
+ * Use PUSH_YYLLOC/POP_YYLLOC to save/restore yylloc around such code.
+ * (Currently the implied "stack" is just one location, but someday we might
+ * need to nest these.)
+ */
+#define PUSH_YYLLOC() (yyextra->save_yylloc = *(yylloc))
+#define POP_YYLLOC() (*(yylloc) = yyextra->save_yylloc)
+
+#define startlit() ( yyextra->literallen = 0 )
+static void addlit(char *ytext, int yleng, sqlol_yyscan_t yyscanner);
+static void addlitchar(unsigned char ychar, sqlol_yyscan_t yyscanner);
+static char *litbufdup(sqlol_yyscan_t yyscanner);
+
+#define yyerror(msg) sqlol_scanner_yyerror(msg, yyscanner)
+
+#define lexer_errposition() sqlol_scanner_errposition(*(yylloc), yyscanner)
+
+/*
+ * Work around a bug in flex 2.5.35: it emits a couple of functions that
+ * it forgets to emit declarations for. Since we use -Wmissing-prototypes,
+ * this would cause warnings. Providing our own declarations should be
+ * harmless even when the bug gets fixed.
+ */
+extern int sqlol_yyget_column(yyscan_t yyscanner);
+extern void sqlol_yyset_column(int column_no, yyscan_t yyscanner);
+
+%}
+
+%option reentrant
+%option bison-bridge
+%option bison-locations
+%option 8bit
+%option never-interactive
+%option nodefault
+%option noinput
+%option nounput
+%option noyywrap
+%option noyyalloc
+%option noyyrealloc
+%option noyyfree
+%option warn
+%option prefix="sqlol_yy"
+
+/*
+ * OK, here is a short description of lex/flex rules behavior.
+ * The longest pattern which matches an input string is always chosen.
+ * For equal-length patterns, the first occurring in the rules list is chosen.
+ * INITIAL is the starting state, to which all non-conditional rules apply.
+ * Exclusive states change parsing rules while the state is active. When in
+ * an exclusive state, only those rules defined for that state apply.
+ *
+ * We use exclusive states for quoted strings, extended comments,
+ * and to eliminate parsing troubles for numeric strings.
+ * Exclusive states:
+ * <xd> delimited identifiers (double-quoted identifiers)
+ * <xq> standard quoted strings
+ * <xqs> quote stop (detect continued strings)
+ *
+ * Remember to add an <<EOF>> case whenever you add a new exclusive state!
+ * The default one is probably not the right thing.
+ */
+
+%x xd
+%x xq
+%x xqs
+
+/*
+ * In order to make the world safe for Windows and Mac clients as well as
+ * Unix ones, we accept either \n or \r as a newline. A DOS-style \r\n
+ * sequence will be seen as two successive newlines, but that doesn't cause
+ * any problems. Comments that start with -- and extend to the next
+ * newline are treated as equivalent to a single whitespace character.
+ *
+ * NOTE a fine point: if there is no newline following --, we will absorb
+ * everything to the end of the input as a comment. This is correct. Older
+ * versions of Postgres failed to recognize -- as a comment if the input
+ * did not end with a newline.
+ *
+ * XXX perhaps \f (formfeed) should be treated as a newline as well?
+ *
+ * XXX if you change the set of whitespace characters, fix scanner_isspace()
+ * to agree.
+ */
+
+space [ \t\n\r\f]
+horiz_space [ \t\f]
+newline [\n\r]
+non_newline [^\n\r]
+
+comment ("--"{non_newline}*)
+
+whitespace ({space}+|{comment})
+
+/*
+ * SQL requires at least one newline in the whitespace separating
+ * string literals that are to be concatenated. Silly, but who are we
+ * to argue? Note that {whitespace_with_newline} should not have * after
+ * it, whereas {whitespace} should generally have a * after it...
+ */
+
+special_whitespace ({space}+|{comment}{newline})
+horiz_whitespace ({horiz_space}|{comment})
+whitespace_with_newline ({horiz_whitespace}*{newline}{special_whitespace}*)
+
+quote '
+/* If we see {quote} then {quotecontinue}, the quoted string continues */
+quotecontinue {whitespace_with_newline}{quote}
+
+/*
+ * {quotecontinuefail} is needed to avoid lexer backup when we fail to match
+ * {quotecontinue}. It might seem that this could just be {whitespace}*,
+ * but if there's a dash after {whitespace_with_newline}, it must be consumed
+ * to see if there's another dash --- which would start a {comment} and thus
+ * allow continuation of the {quotecontinue} token.
+ */
+quotecontinuefail {whitespace}*"-"?
+
+/* Extended quote
+ * xqdouble implements embedded quote, ''''
+ */
+xqstart {quote}
+xqdouble {quote}{quote}
+xqinside [^']+
+
+/* Double quote
+ * Allows embedded spaces and other special characters into identifiers.
+ */
+dquote \"
+xdstart {dquote}
+xdstop {dquote}
+xddouble {dquote}{dquote}
+xdinside [^"]+
+
+digit [0-9]
+ident_start [A-Za-z\200-\377_]
+ident_cont [A-Za-z\200-\377_0-9\$]
+
+identifier {ident_start}{ident_cont}*
+
+decimal (({digit}+)|({digit}*\.{digit}+)|({digit}+\.{digit}*))
+
+other .
+
+%%
+
+{whitespace} {
+ /* ignore */
+ }
+
+
+{xqstart} {
+ yyextra->saw_non_ascii = false;
+ SET_YYLLOC();
+ BEGIN(xq);
+ startlit();
+}
+<xq>{quote} {
+ /*
+ * When we are scanning a quoted string and see an end
+ * quote, we must look ahead for a possible continuation.
+ * If we don't see one, we know the end quote was in fact
+ * the end of the string. To reduce the lexer table size,
+ * we use a single "xqs" state to do the lookahead for all
+ * types of strings.
+ */
+ yyextra->state_before_str_stop = YYSTATE;
+ BEGIN(xqs);
+ }
+<xqs>{quotecontinue} {
+ /*
+ * Found a quote continuation, so return to the in-quote
+ * state and continue scanning the literal. Nothing is
+ * added to the literal's contents.
+ */
+ BEGIN(yyextra->state_before_str_stop);
+ }
+<xqs>{quotecontinuefail} |
+<xqs>{other} |
+<xqs><<EOF>> {
+ /*
+ * Failed to see a quote continuation. Throw back
+ * everything after the end quote, and handle the string
+ * according to the state we were in previously.
+ */
+ yyless(0);
+ BEGIN(INITIAL);
+
+ switch (yyextra->state_before_str_stop)
+ {
+ case xq:
+ /*
+ * Check that the data remains valid, if it might
+ * have been made invalid by unescaping any chars.
+ */
+ if (yyextra->saw_non_ascii)
+ pg_verifymbstr(yyextra->literalbuf,
+ yyextra->literallen,
+ false);
+ yylval->str = litbufdup(yyscanner);
+ return SCONST;
+ default:
+ yyerror("unhandled previous state in xqs");
+ }
+ }
+
+<xq>{xqdouble} {
+ addlitchar('\'', yyscanner);
+ }
+<xq>{xqinside} {
+ addlit(yytext, yyleng, yyscanner);
+ }
+<xq><<EOF>> { yyerror("unterminated quoted string"); }
+
+
+{xdstart} {
+ SET_YYLLOC();
+ BEGIN(xd);
+ startlit();
+ }
+<xd>{xdstop} {
+ char *ident;
+
+ BEGIN(INITIAL);
+ if (yyextra->literallen == 0)
+ yyerror("zero-length delimited identifier");
+ ident = litbufdup(yyscanner);
+ if (yyextra->literallen >= NAMEDATALEN)
+ truncate_identifier(ident, yyextra->literallen, true);
+ yylval->str = ident;
+ return IDENT;
+ }
+<xd>{xddouble} {
+ addlitchar('"', yyscanner);
+ }
+<xd>{xdinside} {
+ addlit(yytext, yyleng, yyscanner);
+ }
+<xd><<EOF>> { yyerror("unterminated quoted identifier"); }
+
+{decimal} {
+ SET_YYLLOC();
+ yylval->str = pstrdup(yytext);
+ return FCONST;
+ }
+
+{identifier} {
+ const sqlol_ScanKeyword *keyword;
+ char *ident;
+
+ SET_YYLLOC();
+
+ /* Is it a keyword? */
+ keyword = sqlol_ScanKeywordLookup(yytext,
+ yyextra->keywords,
+ yyextra->num_keywords);
+ if (keyword != NULL)
+ {
+ yylval->keyword = keyword->name;
+ return keyword->value;
+ }
+
+ /*
+ * No. Convert the identifier to lower case, and truncate
+ * if necessary.
+ */
+ ident = downcase_truncate_identifier(yytext, yyleng, true);
+ yylval->str = ident;
+ return IDENT;
+ }
+
+{other} {
+ SET_YYLLOC();
+ return yytext[0];
+ }
+
+<<EOF>> {
+ SET_YYLLOC();
+ yyterminate();
+ }
+
+%%
+
+/* LCOV_EXCL_STOP */
+
+/*
+ * Arrange access to yyextra for subroutines of the main yylex() function.
+ * We expect each subroutine to have a yyscanner parameter. Rather than
+ * use the yyget_xxx functions, which might or might not get inlined by the
+ * compiler, we cheat just a bit and cast yyscanner to the right type.
+ */
+#undef yyextra
+#define yyextra (((struct yyguts_t *) yyscanner)->yyextra_r)
+
+/* Likewise for a couple of other things we need. */
+#undef yylloc
+#define yylloc (((struct yyguts_t *) yyscanner)->yylloc_r)
+#undef yyleng
+#define yyleng (((struct yyguts_t *) yyscanner)->yyleng_r)
+
+
+/*
+ * scanner_errposition
+ * Report a lexer or grammar error cursor position, if possible.
+ *
+ * This is expected to be used within an ereport() call. The return value
+ * is a dummy (always 0, in fact).
+ *
+ * Note that this can only be used for messages emitted during raw parsing
+ * (essentially, sqlol_scan.l, sqlol_parser.c, and sqlol_gram.y), since it
+ * requires the yyscanner struct to still be available.
+ */
+int
+sqlol_scanner_errposition(int location, sqlol_yyscan_t yyscanner)
+{
+ int pos;
+
+ if (location < 0)
+ return 0; /* no-op if location is unknown */
+
+ /* Convert byte offset to character number */
+ pos = pg_mbstrlen_with_len(yyextra->scanbuf, location) + 1;
+ /* And pass it to the ereport mechanism */
+ return errposition(pos);
+}
+
+/*
+ * scanner_yyerror
+ * Report a lexer or grammar error.
+ *
+ * Just ignore the error, as we'll fall back to raw_parser().
+ */
+void
+sqlol_scanner_yyerror(const char *message, sqlol_yyscan_t yyscanner)
+{
+ return;
+}
+
+
+/*
+ * Called before any actual parsing is done
+ */
+sqlol_yyscan_t
+sqlol_scanner_init(const char *str,
+ sqlol_yy_extra_type *yyext,
+ const sqlol_ScanKeyword *keywords,
+ int num_keywords)
+{
+ Size slen = strlen(str);
+ yyscan_t scanner;
+
+ if (yylex_init(&scanner) != 0)
+ elog(ERROR, "yylex_init() failed: %m");
+
+ sqlol_yyset_extra(yyext, scanner);
+
+ yyext->keywords = keywords;
+ yyext->num_keywords = num_keywords;
+
+ /*
+ * Make a scan buffer with special termination needed by flex.
+ */
+ yyext->scanbuf = (char *) palloc(slen + 2);
+ yyext->scanbuflen = slen;
+ memcpy(yyext->scanbuf, str, slen);
+ yyext->scanbuf[slen] = yyext->scanbuf[slen + 1] = YY_END_OF_BUFFER_CHAR;
+ yy_scan_buffer(yyext->scanbuf, slen + 2, scanner);
+
+ /* initialize literal buffer to a reasonable but expansible size */
+ yyext->literalalloc = 1024;
+ yyext->literalbuf = (char *) palloc(yyext->literalalloc);
+ yyext->literallen = 0;
+
+ return scanner;
+}
+
+
+/*
+ * Called after parsing is done to clean up after scanner_init()
+ */
+void
+sqlol_scanner_finish(sqlol_yyscan_t yyscanner)
+{
+ /*
+ * We don't bother to call yylex_destroy(), because all it would do is
+ * pfree a small amount of control storage. It's cheaper to leak the
+ * storage until the parsing context is destroyed. The amount of space
+ * involved is usually negligible compared to the output parse tree
+ * anyway.
+ *
+ * We do bother to pfree the scanbuf and literal buffer, but only if they
+ * represent a nontrivial amount of space. The 8K cutoff is arbitrary.
+ */
+ if (yyextra->scanbuflen >= 8192)
+ pfree(yyextra->scanbuf);
+ if (yyextra->literalalloc >= 8192)
+ pfree(yyextra->literalbuf);
+}
+
+
+static void
+addlit(char *ytext, int yleng, sqlol_yyscan_t yyscanner)
+{
+ /* enlarge buffer if needed */
+ if ((yyextra->literallen + yleng) >= yyextra->literalalloc)
+ {
+ do
+ {
+ yyextra->literalalloc *= 2;
+ } while ((yyextra->literallen + yleng) >= yyextra->literalalloc);
+ yyextra->literalbuf = (char *) repalloc(yyextra->literalbuf,
+ yyextra->literalalloc);
+ }
+ /* append new data */
+ memcpy(yyextra->literalbuf + yyextra->literallen, ytext, yleng);
+ yyextra->literallen += yleng;
+}
+
+
+static void
+addlitchar(unsigned char ychar, sqlol_yyscan_t yyscanner)
+{
+ /* enlarge buffer if needed */
+ if ((yyextra->literallen + 1) >= yyextra->literalalloc)
+ {
+ yyextra->literalalloc *= 2;
+ yyextra->literalbuf = (char *) repalloc(yyextra->literalbuf,
+ yyextra->literalalloc);
+ }
+ /* append new data */
+ yyextra->literalbuf[yyextra->literallen] = ychar;
+ yyextra->literallen += 1;
+}
+
+
+/*
+ * Create a palloc'd copy of literalbuf, adding a trailing null.
+ */
+static char *
+litbufdup(sqlol_yyscan_t yyscanner)
+{
+ int llen = yyextra->literallen;
+ char *new;
+
+ new = palloc(llen + 1);
+ memcpy(new, yyextra->literalbuf, llen);
+ new[llen] = '\0';
+ return new;
+}
+
+/*
+ * Interface functions to make flex use palloc() instead of malloc().
+ * It'd be better to make these static, but flex insists otherwise.
+ */
+
+void *
+sqlol_yyalloc(yy_size_t bytes, sqlol_yyscan_t yyscanner)
+{
+ return palloc(bytes);
+}
+
+void *
+sqlol_yyrealloc(void *ptr, yy_size_t bytes, sqlol_yyscan_t yyscanner)
+{
+ if (ptr)
+ return repalloc(ptr, bytes);
+ else
+ return palloc(bytes);
+}
+
+void
+sqlol_yyfree(void *ptr, sqlol_yyscan_t yyscanner)
+{
+ if (ptr)
+ pfree(ptr);
+}
diff --git a/contrib/sqlol/sqlol_scanner.h b/contrib/sqlol/sqlol_scanner.h
new file mode 100644
index 0000000000..0a497e9d91
--- /dev/null
+++ b/contrib/sqlol/sqlol_scanner.h
@@ -0,0 +1,118 @@
+/*-------------------------------------------------------------------------
+ *
+ * sqlol_scanner.h
+ * API for the core scanner (flex machine)
+ *
+ *
+ * Portions Copyright (c) 1996-2021, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * contrib/sqlol/sqlol_scanner.h
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#ifndef SQLOL_SCANNER_H
+#define SQLOL_SCANNER_H
+
+#include "sqlol_keywords.h"
+
+/*
+ * The scanner returns extra data about scanned tokens in this union type.
+ * Note that this is a subset of the fields used in YYSTYPE of the bison
+ * parsers built atop the scanner.
+ */
+typedef union sqlol_YYSTYPE
+{
+ int ival; /* for integer literals */
+ char *str; /* for identifiers and non-integer literals */
+ const char *keyword; /* canonical spelling of keywords */
+} sqlol_YYSTYPE;
+
+/*
+ * We track token locations in terms of byte offsets from the start of the
+ * source string, not the column number/line number representation that
+ * bison uses by default. Also, to minimize overhead we track only one
+ * location (usually the first token location) for each construct, not
+ * the beginning and ending locations as bison does by default. It's
+ * therefore sufficient to make YYLTYPE an int.
+ */
+#define YYLTYPE int
+
+/*
+ * Another important component of the scanner's API is the token code numbers.
+ * However, those are not defined in this file, because bison insists on
+ * defining them for itself. The token codes used by the core scanner are
+ * the ASCII characters plus these:
+ * %token <str> IDENT UIDENT FCONST SCONST USCONST BCONST XCONST Op
+ * %token <ival> ICONST PARAM
+ * %token TYPECAST DOT_DOT COLON_EQUALS EQUALS_GREATER
+ * %token LESS_EQUALS GREATER_EQUALS NOT_EQUALS
+ * The above token definitions *must* be the first ones declared in any
+ * bison parser built atop this scanner, so that they will have consistent
+ * numbers assigned to them (specifically, IDENT = 258 and so on).
+ */
+
+/*
+ * The YY_EXTRA data that a flex scanner allows us to pass around.
+ * Private state needed by the core scanner goes here. Note that the actual
+ * yy_extra struct may be larger and have this as its first component, thus
+ * allowing the calling parser to keep some fields of its own in YY_EXTRA.
+ */
+typedef struct sqlol_yy_extra_type
+{
+ /*
+ * The string the scanner is physically scanning. We keep this mainly so
+ * that we can cheaply compute the offset of the current token (yytext).
+ */
+ char *scanbuf;
+ Size scanbuflen;
+
+ /*
+ * The keyword list to use, and the associated grammar token codes.
+ */
+ const sqlol_ScanKeyword *keywords;
+ int num_keywords;
+
+ /*
+ * literalbuf is used to accumulate literal values when multiple rules are
+ * needed to parse a single literal. Call startlit() to reset buffer to
+ * empty, addlit() to add text. NOTE: the string in literalbuf is NOT
+ * necessarily null-terminated, but there always IS room to add a trailing
+ * null at offset literallen. We store a null only when we need it.
+ */
+ char *literalbuf; /* palloc'd expandable buffer */
+ int literallen; /* actual current string length */
+ int literalalloc; /* current allocated buffer size */
+
+ /*
+ * Random assorted scanner state.
+ */
+ int state_before_str_stop; /* start cond. before end quote */
+ YYLTYPE save_yylloc; /* one-element stack for PUSH_YYLLOC() */
+
+ /* state variables for literal-lexing warnings */
+ bool saw_non_ascii;
+} sqlol_yy_extra_type;
+
+/*
+ * The type of yyscanner is opaque outside sqlol_scan.l.
+ */
+typedef void *sqlol_yyscan_t;
+
+
+/* Constant data exported from contrib/sqlol/sqlol_scan.l */
+extern PGDLLIMPORT const uint16 sqlol_ScanKeywordTokens[];
+
+/* Entry points in contrib/sqlol/sqlol_scan.l */
+extern sqlol_yyscan_t sqlol_scanner_init(const char *str,
+ sqlol_yy_extra_type *yyext,
+ const sqlol_ScanKeyword *keywords,
+ int num_keywords);
+extern void sqlol_scanner_finish(sqlol_yyscan_t yyscanner);
+extern int sqlol_yylex(sqlol_YYSTYPE *lvalp, YYLTYPE *llocp,
+ sqlol_yyscan_t yyscanner);
+extern int sqlol_scanner_errposition(int location, sqlol_yyscan_t yyscanner);
+extern void sqlol_scanner_yyerror(const char *message, sqlol_yyscan_t yyscanner);
+
+#endif /* SQLOL_SCANNER_H */
--
2.31.1
v3-0003-Add-a-new-MODE_SINGLE_QUERY-to-the-core-parser-an.patch
From e85972610aa0d3aa090011c7f2c3bbe976a7c803 Mon Sep 17 00:00:00 2001
From: Julien Rouhaud <julien.rouhaud@free.fr>
Date: Thu, 22 Apr 2021 01:33:42 +0800
Subject: [PATCH v3 3/4] Add a new MODE_SINGLE_QUERY to the core parser and use
it in pg_parse_query.
If a third-party module provides a parser_hook, pg_parse_query() switches to
single-query parsing so multi-query commands using different grammar can work
properly. If the third-party module supports the full set of SQL we support,
or wants to prevent fallback on the core parser, it can ignore the
MODE_SINGLE_QUERY mode and parse the full query string. In that case they must
return a List with more than one RawStmt or a single RawStmt with a 0 length to
stop the parsing phase, or raise an ERROR.
Otherwise, plugins should parse a single query only and always return a List
containing a single RawStmt with a properly set length (possibly 0 if it was a
single query without end of query delimiter). If the command is valid but
doesn't contain any statements (e.g. a single semi-colon), a single RawStmt
with a NULL stmt field should be returned, containing the consumed query string
length so we can move to the next command in a single pass rather than 1 byte
at a time.
Also, third-party modules can choose to ignore some or all parsing errors if
they want to implement only a subset of the syntax Postgres supports, or even a
totally different syntax, and fall back on the core grammar for unhandled
cases. In that case, they should set the error flag to true. The returned List will be
ignored and the same offset of the input string will be parsed using the core
parser.
Finally, note that third-party plugins that want to fall back on another
grammar should first try to call any previous parser hook before setting the
error flag and returning.
---
.../pg_stat_statements/pg_stat_statements.c | 3 +-
src/backend/commands/tablecmds.c | 2 +-
src/backend/executor/spi.c | 4 +-
src/backend/parser/gram.y | 29 +++-
src/backend/parser/parse_type.c | 2 +-
src/backend/parser/parser.c | 15 +-
src/backend/parser/scan.l | 26 +++-
src/backend/tcop/postgres.c | 138 ++++++++++++++++--
src/include/parser/parser.h | 5 +-
src/include/parser/scanner.h | 6 +-
src/include/tcop/tcopprot.h | 3 +-
src/pl/plpgsql/src/pl_gram.y | 2 +-
src/pl/plpgsql/src/pl_scanner.c | 2 +-
13 files changed, 210 insertions(+), 27 deletions(-)
diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index 09433c8c96..d852575613 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -2718,7 +2718,8 @@ fill_in_constant_lengths(JumbleState *jstate, const char *query,
yyscanner = scanner_init(query,
&yyextra,
&ScanKeywords,
- ScanKeywordTokens);
+ ScanKeywordTokens,
+ 0);
/* we don't want to re-emit any escape string warnings */
yyextra.escape_string_warning = false;
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 028e8ac46b..284933c693 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -12677,7 +12677,7 @@ ATPostAlterTypeParse(Oid oldId, Oid oldRelId, Oid refRelId, char *cmd,
* parse_analyze() or the rewriter, but instead we need to pass them
* through parse_utilcmd.c to make them ready for execution.
*/
- raw_parsetree_list = raw_parser(cmd, RAW_PARSE_DEFAULT);
+ raw_parsetree_list = raw_parser(cmd, RAW_PARSE_DEFAULT, 0);
querytree_list = NIL;
foreach(list_item, raw_parsetree_list)
{
diff --git a/src/backend/executor/spi.c b/src/backend/executor/spi.c
index b8bd05e894..f05b3ce9e7 100644
--- a/src/backend/executor/spi.c
+++ b/src/backend/executor/spi.c
@@ -2120,7 +2120,7 @@ _SPI_prepare_plan(const char *src, SPIPlanPtr plan)
/*
* Parse the request string into a list of raw parse trees.
*/
- raw_parsetree_list = raw_parser(src, plan->parse_mode);
+ raw_parsetree_list = raw_parser(src, plan->parse_mode, 0);
/*
* Do parse analysis and rule rewrite for each raw parsetree, storing the
@@ -2228,7 +2228,7 @@ _SPI_prepare_oneshot_plan(const char *src, SPIPlanPtr plan)
/*
* Parse the request string into a list of raw parse trees.
*/
- raw_parsetree_list = raw_parser(src, plan->parse_mode);
+ raw_parsetree_list = raw_parser(src, plan->parse_mode, 0);
/*
* Construct plancache entries, but don't do parse analysis yet.
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 9ee90e3f13..2cac062ef4 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -626,7 +626,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
%token <str> IDENT UIDENT FCONST SCONST USCONST BCONST XCONST Op
%token <ival> ICONST PARAM
%token TYPECAST DOT_DOT COLON_EQUALS EQUALS_GREATER
-%token LESS_EQUALS GREATER_EQUALS NOT_EQUALS
+%token LESS_EQUALS GREATER_EQUALS NOT_EQUALS END_OF_FILE
/*
* If you want to make any keyword changes, update the keyword table in
@@ -753,6 +753,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
%token MODE_PLPGSQL_ASSIGN1
%token MODE_PLPGSQL_ASSIGN2
%token MODE_PLPGSQL_ASSIGN3
+%token MODE_SINGLE_QUERY
/* Precedence: lowest to highest */
@@ -858,6 +859,32 @@ parse_toplevel:
pg_yyget_extra(yyscanner)->parsetree =
list_make1(makeRawStmt((Node *) n, 0));
}
+ | MODE_SINGLE_QUERY toplevel_stmt ';'
+ {
+ RawStmt *raw = makeRawStmt($2, 0);
+ updateRawStmtEnd(raw, @3 + 1);
+ /* NOTE: we can return a raw statement containing a NULL stmt.
+ * This is done to allow pg_parse_query to ignore that part of
+ * the input string and move to the next command.
+ */
+ pg_yyget_extra(yyscanner)->parsetree = list_make1(raw);
+ YYACCEPT;
+ }
+ /*
+ * We need to explicitly look for EOF to parse non-semicolon
+ * terminated statements in single query mode, as we could
+ * otherwise successfully parse the beginning of an otherwise
+ * invalid query.
+ */
+ | MODE_SINGLE_QUERY toplevel_stmt END_OF_FILE
+ {
+ /* NOTE: we can return a raw statement containing a NULL stmt.
+ * This is done to allow pg_parse_query to ignore that part of
+ * the input string.
+ */
+ pg_yyget_extra(yyscanner)->parsetree = list_make1(makeRawStmt($2, 0));
+ YYACCEPT;
+ }
;
/*
diff --git a/src/backend/parser/parse_type.c b/src/backend/parser/parse_type.c
index abe131ebeb..e9a7b5d62a 100644
--- a/src/backend/parser/parse_type.c
+++ b/src/backend/parser/parse_type.c
@@ -746,7 +746,7 @@ typeStringToTypeName(const char *str)
ptserrcontext.previous = error_context_stack;
error_context_stack = &ptserrcontext;
- raw_parsetree_list = raw_parser(str, RAW_PARSE_TYPE_NAME);
+ raw_parsetree_list = raw_parser(str, RAW_PARSE_TYPE_NAME, 0);
error_context_stack = ptserrcontext.previous;
diff --git a/src/backend/parser/parser.c b/src/backend/parser/parser.c
index 875de7ba28..23fd49e74c 100644
--- a/src/backend/parser/parser.c
+++ b/src/backend/parser/parser.c
@@ -37,17 +37,25 @@ static char *str_udeescape(const char *str, char escape,
*
* Returns a list of raw (un-analyzed) parse trees. The contents of the
* list have the form required by the specified RawParseMode.
+ *
+ * For all modes other than MODE_SINGLE_QUERY, the caller should provide a 0
+ * offset as the whole input string should be parsed. Otherwise, the caller
+ * should provide the wanted offset in the input string, or -1 if no offset
+ * is required.
*/
List *
-raw_parser(const char *str, RawParseMode mode)
+raw_parser(const char *str, RawParseMode mode, int offset)
{
core_yyscan_t yyscanner;
base_yy_extra_type yyextra;
int yyresult;
+ Assert((mode != RAW_PARSE_SINGLE_QUERY && offset == 0) ||
+ (mode == RAW_PARSE_SINGLE_QUERY && offset != 0));
+
/* initialize the flex scanner */
yyscanner = scanner_init(str, &yyextra.core_yy_extra,
- &ScanKeywords, ScanKeywordTokens);
+ &ScanKeywords, ScanKeywordTokens, offset);
/* base_yylex() only needs us to initialize the lookahead token, if any */
if (mode == RAW_PARSE_DEFAULT)
@@ -61,7 +69,8 @@ raw_parser(const char *str, RawParseMode mode)
MODE_PLPGSQL_EXPR, /* RAW_PARSE_PLPGSQL_EXPR */
MODE_PLPGSQL_ASSIGN1, /* RAW_PARSE_PLPGSQL_ASSIGN1 */
MODE_PLPGSQL_ASSIGN2, /* RAW_PARSE_PLPGSQL_ASSIGN2 */
- MODE_PLPGSQL_ASSIGN3 /* RAW_PARSE_PLPGSQL_ASSIGN3 */
+ MODE_PLPGSQL_ASSIGN3, /* RAW_PARSE_PLPGSQL_ASSIGN3 */
+ MODE_SINGLE_QUERY /* RAW_PARSE_SINGLE_QUERY */
};
yyextra.have_lookahead = true;
diff --git a/src/backend/parser/scan.l b/src/backend/parser/scan.l
index 9f9d8a1706..8ccbe95ac6 100644
--- a/src/backend/parser/scan.l
+++ b/src/backend/parser/scan.l
@@ -1041,7 +1041,10 @@ other .
<<EOF>> {
SET_YYLLOC();
- yyterminate();
+ if (yyextra->return_eof)
+ return END_OF_FILE;
+ else
+ yyterminate();
}
%%
@@ -1189,8 +1192,10 @@ core_yyscan_t
scanner_init(const char *str,
core_yy_extra_type *yyext,
const ScanKeywordList *keywordlist,
- const uint16 *keyword_tokens)
+ const uint16 *keyword_tokens,
+ int offset)
{
+ YY_BUFFER_STATE state;
Size slen = strlen(str);
yyscan_t scanner;
@@ -1213,13 +1218,28 @@ scanner_init(const char *str,
yyext->scanbuflen = slen;
memcpy(yyext->scanbuf, str, slen);
yyext->scanbuf[slen] = yyext->scanbuf[slen + 1] = YY_END_OF_BUFFER_CHAR;
- yy_scan_buffer(yyext->scanbuf, slen + 2, scanner);
+ state = yy_scan_buffer(yyext->scanbuf, slen + 2, scanner);
/* initialize literal buffer to a reasonable but expansible size */
yyext->literalalloc = 1024;
yyext->literalbuf = (char *) palloc(yyext->literalalloc);
yyext->literallen = 0;
+ /*
+ * Note that pg_parse_query will set a -1 offset rather than 0 for the
+ * first query of a possibly multi-query string if it wants us to return an
+ * EOF token.
+ */
+ yyext->return_eof = (offset != 0);
+
+ /*
+ * Adjust the offset in the input string. This is required in single-query
+ * mode, as we need to register the same token locations as we would have
+ * in normal mode with multi-statement query string.
+ */
+ if (offset > 0)
+ state->yy_buf_pos += offset;
+
return scanner;
}
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index e941b59b85..9331628add 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -602,17 +602,137 @@ ProcessClientWriteInterrupt(bool blocked)
List *
pg_parse_query(const char *query_string)
{
- List *raw_parsetree_list = NIL;
+ List *result = NIL;
+ int stmt_len, offset;
TRACE_POSTGRESQL_QUERY_PARSE_START(query_string);
if (log_parser_stats)
ResetUsage();
- if (parser_hook)
- raw_parsetree_list = (*parser_hook) (query_string, RAW_PARSE_DEFAULT);
- else
- raw_parsetree_list = raw_parser(query_string, RAW_PARSE_DEFAULT);
+ stmt_len = 0; /* lazily computed when needed */
+ offset = 0;
+
+ while(true)
+ {
+ List *raw_parsetree_list;
+ RawStmt *raw;
+ bool error = false;
+
+ /*----------------
+ * Start parsing the input string. If a third-party module provided a
+ * parser_hook, we switch to single-query parsing so multi-query
+ * commands using different grammar can work properly.
+ * If the third-party module supports the full set of SQL we support,
+ * or wants to prevent fallback on the core parser, it can ignore the
+ * RAW_PARSE_SINGLE_QUERY flag and parse the full query string.
+ * In that case they must return a List with more than one RawStmt or a
+ * single RawStmt with a 0 length to stop the parsing phase, or raise
+ * an ERROR.
+ *
+ * Otherwise, plugins should parse a single query only and always
+ * return a List containing a single RawStmt with a properly set length
+ * (possibly 0 if it was a single query without end of query
+ * delimiter). If the command is valid but doesn't contain any
+ * statements (e.g. a single semi-colon), a single RawStmt with a NULL
+ * stmt field should be returned, containing the consumed query string
+ * length so we can move to the next command in a single pass rather
+ * than 1 byte at a time.
+ *
+ * Also, third-party modules can choose to ignore some or all parsing
+ * errors if they want to implement only a subset of the syntax Postgres
+ * supports, or even a totally different syntax, and fall back on the
+ * core grammar for unhandled cases. In that case, they should set
+ * the error flag to true. The returned List will be ignored and the
+ * same offset of the input string will be parsed using the core
+ * parser.
+ *
+ * Finally, note that third-party modules that want to fall back on
+ * another grammar should first try to call any previous parser hook
+ * before setting the error flag and returning.
+ */
+ if (parser_hook)
+ raw_parsetree_list = (*parser_hook) (query_string,
+ RAW_PARSE_SINGLE_QUERY,
+ offset,
+ &error);
+
+ /*
+ * If a third-party module couldn't parse a single query or if no
+ * third-party module is configured, fall back on the core parser.
+ */
+ if (error || !parser_hook)
+ {
+ /* Send a -1 offset to raw_parser to specify that it should
+ * explicitly detect EOF during parsing. scanner_init() will treat
+ * it the same as a 0 offset.
+ */
+ raw_parsetree_list = raw_parser(query_string,
+ error ? RAW_PARSE_SINGLE_QUERY : RAW_PARSE_DEFAULT,
+ (error && offset == 0) ? -1 : offset);
+ }
+
+ /*
+ * If there is no third-party plugin, if none of the parsers found a
+ * valid query, or if a third-party module consumed the whole query
+ * string, we're done.
+ */
+ if (!parser_hook || raw_parsetree_list == NIL ||
+ list_length(raw_parsetree_list) > 1)
+ {
+ /*
+ * Warn third-party plugins if they mix the "single query" and "whole
+ * input string" strategies rather than silently accepting it, which
+ * might allow fallback on the core grammar even when they want to
+ * avoid that. This way plugin authors can be warned early of the issue.
+ */
+ if (result != NIL)
+ {
+ Assert(parser_hook != NULL);
+ elog(ERROR, "parser_hook should parse a single statement at "
+ "a time or consume the whole input string at once");
+ }
+ result = raw_parsetree_list;
+ break;
+ }
+
+ if (stmt_len == 0)
+ stmt_len = strlen(query_string);
+
+ raw = linitial_node(RawStmt, raw_parsetree_list);
+
+ /*
+ * In single-query mode, the parser will return statement location info
+ * relative to the beginning of complete original string, not the part
+ * we just parsed, so adjust the location info.
+ */
+ if (offset > 0 && raw->stmt_len > 0)
+ {
+ Assert(raw->stmt_len > offset);
+ raw->stmt_location = offset;
+ raw->stmt_len -= offset;
+ }
+
+ /* Ignore the statement if it didn't contain any command. */
+ if (raw->stmt)
+ result = lappend(result, raw);
+
+ if (raw->stmt_len == 0)
+ {
+ /* The statement was the whole string, we're done. */
+ break;
+ }
+ else if (raw->stmt_len + offset >= stmt_len)
+ {
+ /* We consumed all of the input string, we're done. */
+ break;
+ }
+ else
+ {
+ /* Advance the offset to the next command. */
+ offset += raw->stmt_len;
+ }
+ }
if (log_parser_stats)
ShowUsage("PARSER STATISTICS");
@@ -620,13 +740,13 @@ pg_parse_query(const char *query_string)
#ifdef COPY_PARSE_PLAN_TREES
/* Optional debugging check: pass raw parsetrees through copyObject() */
{
- List *new_list = copyObject(raw_parsetree_list);
+ List *new_list = copyObject(result);
/* This checks both copyObject() and the equal() routines... */
- if (!equal(new_list, raw_parsetree_list))
+ if (!equal(new_list, result))
elog(WARNING, "copyObject() failed to produce an equal raw parse tree");
else
- raw_parsetree_list = new_list;
+ result = new_list;
}
#endif
@@ -638,7 +758,7 @@ pg_parse_query(const char *query_string)
TRACE_POSTGRESQL_QUERY_PARSE_DONE(query_string);
- return raw_parsetree_list;
+ return result;
}
/*
diff --git a/src/include/parser/parser.h b/src/include/parser/parser.h
index 853b0f1606..5694ae791a 100644
--- a/src/include/parser/parser.h
+++ b/src/include/parser/parser.h
@@ -41,7 +41,8 @@ typedef enum
RAW_PARSE_PLPGSQL_EXPR,
RAW_PARSE_PLPGSQL_ASSIGN1,
RAW_PARSE_PLPGSQL_ASSIGN2,
- RAW_PARSE_PLPGSQL_ASSIGN3
+ RAW_PARSE_PLPGSQL_ASSIGN3,
+ RAW_PARSE_SINGLE_QUERY
} RawParseMode;
/* Values for the backslash_quote GUC */
@@ -59,7 +60,7 @@ extern PGDLLIMPORT bool standard_conforming_strings;
/* Primary entry point for the raw parsing functions */
-extern List *raw_parser(const char *str, RawParseMode mode);
+extern List *raw_parser(const char *str, RawParseMode mode, int offset);
/* Utility functions exported by gram.y (perhaps these should be elsewhere) */
extern List *SystemFuncName(char *name);
diff --git a/src/include/parser/scanner.h b/src/include/parser/scanner.h
index 0d8182faa0..a2e97be5d5 100644
--- a/src/include/parser/scanner.h
+++ b/src/include/parser/scanner.h
@@ -113,6 +113,9 @@ typedef struct core_yy_extra_type
/* state variables for literal-lexing warnings */
bool warn_on_first_escape;
bool saw_non_ascii;
+
+ /* state variable for returning an EOF token in single query mode */
+ bool return_eof;
} core_yy_extra_type;
/*
@@ -136,7 +139,8 @@ extern PGDLLIMPORT const uint16 ScanKeywordTokens[];
extern core_yyscan_t scanner_init(const char *str,
core_yy_extra_type *yyext,
const ScanKeywordList *keywordlist,
- const uint16 *keyword_tokens);
+ const uint16 *keyword_tokens,
+ int offset);
extern void scanner_finish(core_yyscan_t yyscanner);
extern int core_yylex(core_YYSTYPE *lvalp, YYLTYPE *llocp,
core_yyscan_t yyscanner);
diff --git a/src/include/tcop/tcopprot.h b/src/include/tcop/tcopprot.h
index 131dc2b22e..27201dde1d 100644
--- a/src/include/tcop/tcopprot.h
+++ b/src/include/tcop/tcopprot.h
@@ -45,7 +45,8 @@ typedef enum
extern PGDLLIMPORT int log_statement;
/* Hook for plugins to get control in pg_parse_query() */
-typedef List *(*parser_hook_type) (const char *str, RawParseMode mode);
+typedef List *(*parser_hook_type) (const char *str, RawParseMode mode,
+ int offset, bool *error);
extern PGDLLIMPORT parser_hook_type parser_hook;
extern List *pg_parse_query(const char *query_string);
diff --git a/src/pl/plpgsql/src/pl_gram.y b/src/pl/plpgsql/src/pl_gram.y
index 3fcca43b90..e5a8a6477a 100644
--- a/src/pl/plpgsql/src/pl_gram.y
+++ b/src/pl/plpgsql/src/pl_gram.y
@@ -3656,7 +3656,7 @@ check_sql_expr(const char *stmt, RawParseMode parseMode, int location)
error_context_stack = &syntax_errcontext;
oldCxt = MemoryContextSwitchTo(plpgsql_compile_tmp_cxt);
- (void) raw_parser(stmt, parseMode);
+ (void) raw_parser(stmt, parseMode, 0);
MemoryContextSwitchTo(oldCxt);
/* Restore former ereport callback */
diff --git a/src/pl/plpgsql/src/pl_scanner.c b/src/pl/plpgsql/src/pl_scanner.c
index e4c7a91ab5..a2886c42ec 100644
--- a/src/pl/plpgsql/src/pl_scanner.c
+++ b/src/pl/plpgsql/src/pl_scanner.c
@@ -587,7 +587,7 @@ plpgsql_scanner_init(const char *str)
{
/* Start up the core scanner */
yyscanner = scanner_init(str, &core_yy,
- &ReservedPLKeywords, ReservedPLKeywordTokens);
+ &ReservedPLKeywords, ReservedPLKeywordTokens, 0);
/*
* scanorig points to the original string, which unlike the scanner's
--
2.31.1
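The patch above threads a new `offset` argument through `raw_parser()` and `scanner_init()` so that, in single-query mode, token locations are reported relative to the full multi-statement string rather than the current fragment. A minimal self-contained model of that idea (all names here — `toy_scanner`, `toy_token_location` — are invented for illustration; the real patch instead advances flex's `yy_buf_pos`):

```c
#include <assert.h>
#include <stddef.h>

/*
 * Toy model of the offset parameter added to scanner_init() above.
 * Starting the lexing position at `offset` means every location the
 * scanner reports is already an index into the full query string, so
 * error carets line up in multi-statement commands.
 */
struct toy_scanner
{
    const char *scanbuf;        /* full multi-statement query string */
    size_t      pos;            /* current lexing position */
};

static void
toy_scanner_init(struct toy_scanner *s, const char *str, size_t offset)
{
    s->scanbuf = str;
    s->pos = offset;            /* begin at the current statement... */
}

/* ...so reported locations need no later adjustment */
static size_t
toy_token_location(const struct toy_scanner *s)
{
    return s->pos;
}
```

With `"SELECT 1;err;"` and an offset of 9, the first token location is 9, matching the `LINE 1: SELECT 1;err;` caret position in the regression output above.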
Attachment: v3-0004-Teach-sqlol-to-use-the-new-MODE_SINGLE_QUERY-pars.patch (text/x-diff)
From 379695587598a0af4490fef22f17f7f28f7df0ad Mon Sep 17 00:00:00 2001
From: Julien Rouhaud <julien.rouhaud@free.fr>
Date: Thu, 22 Apr 2021 02:15:54 +0800
Subject: [PATCH v3 4/4] Teach sqlol to use the new MODE_SINGLE_QUERY parser
mode.
This way, multi-statement commands using both the core parser and the sqlol
parser can be supported.
Also add a LOLCODE version of CREATE VIEW viewname AS to easily test
multi-statement commands.
---
contrib/sqlol/Makefile | 2 +
contrib/sqlol/expected/01_sqlol.out | 74 +++++++++++++++++++++++++++++
contrib/sqlol/repro.sql | 18 +++++++
contrib/sqlol/sql/01_sqlol.sql | 40 ++++++++++++++++
contrib/sqlol/sqlol.c | 24 ++++++----
contrib/sqlol/sqlol_gram.y | 63 ++++++++++++------------
contrib/sqlol/sqlol_kwlist.h | 1 +
contrib/sqlol/sqlol_scan.l | 13 ++++-
contrib/sqlol/sqlol_scanner.h | 3 +-
9 files changed, 192 insertions(+), 46 deletions(-)
create mode 100644 contrib/sqlol/expected/01_sqlol.out
create mode 100644 contrib/sqlol/repro.sql
create mode 100644 contrib/sqlol/sql/01_sqlol.sql
diff --git a/contrib/sqlol/Makefile b/contrib/sqlol/Makefile
index 3850ac3fce..eaf94801c2 100644
--- a/contrib/sqlol/Makefile
+++ b/contrib/sqlol/Makefile
@@ -6,6 +6,8 @@ OBJS = \
sqlol.o sqlol_gram.o sqlol_scan.o sqlol_keywords.o
PGFILEDESC = "sqlol - Toy alternative grammar based on LOLCODE"
+REGRESS = 01_sqlol
+
sqlol_gram.h: sqlol_gram.c
touch $@
diff --git a/contrib/sqlol/expected/01_sqlol.out b/contrib/sqlol/expected/01_sqlol.out
new file mode 100644
index 0000000000..a18eaf6801
--- /dev/null
+++ b/contrib/sqlol/expected/01_sqlol.out
@@ -0,0 +1,74 @@
+LOAD 'sqlol';
+-- create a base table, falling back on core grammar
+CREATE TABLE t1 (id integer, val text);
+-- test a SQLOL statement
+HAI 1.2 I HAS A t1 GIMMEH id, "val" KTHXBYE\g
+ id | val
+----+-----
+(0 rows)
+
+-- create a view in SQLOL
+HAI 1.2 MAEK I HAS A t1 GIMMEH id, "val" A v0 KTHXBYE\g
+-- combine standard SQL with a trailing SQLOL statement in multi-statements command
+CREATE VIEW v1 AS SELECT * FROM t1\; CREATE VIEW v2 AS SELECT * FROM t1\;HAI 1.2 I HAS A t1 GIMMEH "id", id KTHXBYE\g
+ id | id
+----+----
+(0 rows)
+
+-- interleave standard SQL and SQLOL commands in multi-statements command
+CREATE VIEW v3 AS SELECT * FROM t1\; HAI 1.2 MAEK I HAS A t1 GIMMEH id, "val" A v4 KTHXBYE CREATE VIEW v5 AS SELECT * FROM t1\;HAI 1.2 I HAS A t1 GIMMEH "id", id KTHXBYE\g
+ id | id
+----+----
+(0 rows)
+
+-- test MODE_SINGLE_QUERY with no trailing semicolon
+SELECT 1\;SELECT 2\;SELECT 3 \g
+ ?column?
+----------
+ 3
+(1 row)
+
+-- test empty statement ignoring
+\;\;select 1 \g
+ ?column?
+----------
+ 1
+(1 row)
+
+-- check the created views
+\d
+ List of relations
+ Schema | Name | Type | Owner
+--------+------+-------+-------
+ public | t1 | table | rjuju
+ public | v0 | view | rjuju
+ public | v1 | view | rjuju
+ public | v2 | view | rjuju
+ public | v3 | view | rjuju
+ public | v4 | view | rjuju
+ public | v5 | view | rjuju
+(7 rows)
+
+--
+-- Error position
+--
+SELECT 1\;err;
+ERROR: syntax error at or near "err"
+LINE 1: SELECT 1;err;
+ ^
+-- sqlol won't trigger an error on incorrect GIMME keyword, so core parser will
+-- complain about HAI
+SELECT 1\;HAI 1.2 I HAS A t1 GIMME id KTHXBYE\g
+ERROR: syntax error at or near "HAI"
+LINE 1: SELECT 1;HAI 1.2 I HAS A t1 GIMME id KTHXBYE
+ ^
+-- sqlol will trigger the error about too many qualifiers on t1
+SELECT 1\;HAI 1.2 I HAS A some.thing.public.t1 GIMMEH id KTHXBYE\g
+ERROR: improper qualified name (too many dotted names): some.thing.public.t1
+LINE 1: SELECT 1;HAI 1.2 I HAS A some.thing.public.t1 GIMMEH id KTHX...
+ ^
+-- position reported outside of the parser/scanner should be correct too
+SELECT 1\;SELECT * FROM notatable;
+ERROR: relation "notatable" does not exist
+LINE 1: SELECT 1;SELECT * FROM notatable;
+ ^
diff --git a/contrib/sqlol/repro.sql b/contrib/sqlol/repro.sql
new file mode 100644
index 0000000000..0ebcb53160
--- /dev/null
+++ b/contrib/sqlol/repro.sql
@@ -0,0 +1,18 @@
+DROP TABLE IF EXISTS t1 CASCADE;
+
+LOAD 'sqlol';
+
+\;\; SELECT 1\;
+
+CREATE TABLE t1 (id integer, val text);
+
+HAI 1.2 I HAS A t1 GIMMEH id, "val" KTHXBYE\g
+
+HAI 1.2 MAEK I HAS A t1 GIMMEH id, "val" A v0 KTHXBYE\g
+
+CREATE VIEW v1 AS SELECT * FROM t1\; CREATE VIEW v2 AS SELECT * FROM t1\;HAI 1.2 I HAS A t1 GIMMEH "id", id KTHXBYE\g
+
+CREATE VIEW v3 AS SELECT * FROM t1\; HAI 1.2 MAEK I HAS A t1 GIMMEH id, "val" A v4 KTHXBYE CREATE VIEW v5 AS SELECT * FROM t1\;HAI 1.2 I HAS A t1 GIMMEH "id", id KTHXBYE\g
+
+SELECT 1\;SELECT 2\;SELECT 3 \g
+\d
diff --git a/contrib/sqlol/sql/01_sqlol.sql b/contrib/sqlol/sql/01_sqlol.sql
new file mode 100644
index 0000000000..918caf94c0
--- /dev/null
+++ b/contrib/sqlol/sql/01_sqlol.sql
@@ -0,0 +1,40 @@
+LOAD 'sqlol';
+
+-- create a base table, falling back on core grammar
+CREATE TABLE t1 (id integer, val text);
+
+-- test a SQLOL statement
+HAI 1.2 I HAS A t1 GIMMEH id, "val" KTHXBYE\g
+
+-- create a view in SQLOL
+HAI 1.2 MAEK I HAS A t1 GIMMEH id, "val" A v0 KTHXBYE\g
+
+-- combine standard SQL with a trailing SQLOL statement in multi-statements command
+CREATE VIEW v1 AS SELECT * FROM t1\; CREATE VIEW v2 AS SELECT * FROM t1\;HAI 1.2 I HAS A t1 GIMMEH "id", id KTHXBYE\g
+
+-- interleave standard SQL and SQLOL commands in multi-statements command
+CREATE VIEW v3 AS SELECT * FROM t1\; HAI 1.2 MAEK I HAS A t1 GIMMEH id, "val" A v4 KTHXBYE CREATE VIEW v5 AS SELECT * FROM t1\;HAI 1.2 I HAS A t1 GIMMEH "id", id KTHXBYE\g
+
+-- test MODE_SINGLE_QUERY with no trailing semicolon
+SELECT 1\;SELECT 2\;SELECT 3 \g
+
+-- test empty statement ignoring
+\;\;select 1 \g
+
+-- check the created views
+\d
+
+--
+-- Error position
+--
+SELECT 1\;err;
+
+-- sqlol won't trigger an error on incorrect GIMME keyword, so core parser will
+-- complain about HAI
+SELECT 1\;HAI 1.2 I HAS A t1 GIMME id KTHXBYE\g
+
+-- sqlol will trigger the error about too many qualifiers on t1
+SELECT 1\;HAI 1.2 I HAS A some.thing.public.t1 GIMMEH id KTHXBYE\g
+
+-- position reported outside of the parser/scanner should be correct too
+SELECT 1\;SELECT * FROM notatable;
diff --git a/contrib/sqlol/sqlol.c b/contrib/sqlol/sqlol.c
index b986966181..7d4e1b631f 100644
--- a/contrib/sqlol/sqlol.c
+++ b/contrib/sqlol/sqlol.c
@@ -26,7 +26,8 @@ static parser_hook_type prev_parser_hook = NULL;
void _PG_init(void);
void _PG_fini(void);
-static List *sqlol_parser_hook(const char *str, RawParseMode mode);
+static List *sqlol_parser_hook(const char *str, RawParseMode mode, int offset,
+ bool *error);
/*
@@ -54,23 +55,25 @@ _PG_fini(void)
* sqlol_parser_hook: parse our grammar
*/
static List *
-sqlol_parser_hook(const char *str, RawParseMode mode)
+sqlol_parser_hook(const char *str, RawParseMode mode, int offset, bool *error)
{
sqlol_yyscan_t yyscanner;
sqlol_base_yy_extra_type yyextra;
int yyresult;
- if (mode != RAW_PARSE_DEFAULT)
+ if (mode != RAW_PARSE_DEFAULT && mode != RAW_PARSE_SINGLE_QUERY)
{
if (prev_parser_hook)
- return (*prev_parser_hook) (str, mode);
- else
- return raw_parser(str, mode);
+ return (*prev_parser_hook) (str, mode, offset, error);
+
+ *error = true;
+ return NIL;
}
/* initialize the flex scanner */
yyscanner = sqlol_scanner_init(str, &yyextra.sqlol_yy_extra,
- sqlol_ScanKeywords, sqlol_NumScanKeywords);
+ sqlol_ScanKeywords, sqlol_NumScanKeywords,
+ offset);
/* initialize the bison parser */
sqlol_parser_init(&yyextra);
@@ -88,9 +91,10 @@ sqlol_parser_hook(const char *str, RawParseMode mode)
if (yyresult)
{
if (prev_parser_hook)
- return (*prev_parser_hook) (str, mode);
- else
- return raw_parser(str, mode);
+ return (*prev_parser_hook) (str, mode, offset, error);
+
+ *error = true;
+ return NIL;
}
return yyextra.parsetree;
diff --git a/contrib/sqlol/sqlol_gram.y b/contrib/sqlol/sqlol_gram.y
index 64d00d14ca..4c36cfef5e 100644
--- a/contrib/sqlol/sqlol_gram.y
+++ b/contrib/sqlol/sqlol_gram.y
@@ -20,6 +20,7 @@
#include "catalog/namespace.h"
#include "nodes/makefuncs.h"
+#include "catalog/pg_class_d.h"
#include "sqlol_gramparse.h"
@@ -106,10 +107,10 @@ static List *check_indirection(List *indirection, sqlol_yyscan_t yyscanner);
ResTarget *target;
}
-%type <node> stmt toplevel_stmt GimmehStmt simple_gimmeh columnref
+%type <node> stmt toplevel_stmt GimmehStmt MaekStmt simple_gimmeh columnref
indirection_el
-%type <list> parse_toplevel stmtmulti gimmeh_list indirection
+%type <list> parse_toplevel rawstmt gimmeh_list indirection
%type <range> qualified_name
@@ -134,22 +135,19 @@ static List *check_indirection(List *indirection, sqlol_yyscan_t yyscanner);
*/
/* ordinary key words in alphabetical order */
-%token <keyword> A GIMMEH HAI HAS I KTHXBYE
-
+%token <keyword> A GIMMEH HAI HAS I KTHXBYE MAEK
%%
/*
* The target production for the whole parse.
- *
- * Ordinarily we parse a list of statements, but if we see one of the
- * special MODE_XXX symbols as first token, we parse something else.
- * The options here correspond to enum RawParseMode, which see for details.
*/
parse_toplevel:
- stmtmulti
+ rawstmt
{
pg_yyget_extra(yyscanner)->parsetree = $1;
+
+ YYACCEPT;
}
;
@@ -163,24 +161,11 @@ parse_toplevel:
* we'd get -1 for the location in such cases.
* We also take care to discard empty statements entirely.
*/
-stmtmulti: stmtmulti KTHXBYE toplevel_stmt
- {
- if ($1 != NIL)
- {
- /* update length of previous stmt */
- updateRawStmtEnd(llast_node(RawStmt, $1), @2);
- }
- if ($3 != NULL)
- $$ = lappend($1, makeRawStmt($3, @2 + 1));
- else
- $$ = $1;
- }
- | toplevel_stmt
+rawstmt: toplevel_stmt KTHXBYE
{
- if ($1 != NULL)
- $$ = list_make1(makeRawStmt($1, 0));
- else
- $$ = NIL;
+ RawStmt *raw = makeRawStmt($1, 0);
+ updateRawStmtEnd(raw, @2 + 7);
+ $$ = list_make1(raw);
}
;
@@ -189,13 +174,12 @@ stmtmulti: stmtmulti KTHXBYE toplevel_stmt
* those words have different meanings in function bodies.
*/
toplevel_stmt:
- stmt
+ HAI FCONST stmt { $$ = $3; }
;
stmt:
GimmehStmt
- | /*EMPTY*/
- { $$ = NULL; }
+ | MaekStmt
;
/*****************************************************************************
@@ -209,12 +193,11 @@ GimmehStmt:
;
simple_gimmeh:
- HAI FCONST I HAS A qualified_name
- GIMMEH gimmeh_list
+ I HAS A qualified_name GIMMEH gimmeh_list
{
SelectStmt *n = makeNode(SelectStmt);
- n->targetList = $8;
- n->fromClause = list_make1($6);
+ n->targetList = $6;
+ n->fromClause = list_make1($4);
$$ = (Node *)n;
}
;
@@ -233,6 +216,20 @@ gimmeh_el:
$$->location = @1;
}
+MaekStmt:
+ MAEK GimmehStmt A qualified_name
+ {
+ ViewStmt *n = makeNode(ViewStmt);
+ n->view = $4;
+ n->view->relpersistence = RELPERSISTENCE_PERMANENT;
+ n->aliases = NIL;
+ n->query = $2;
+ n->replace = false;
+ n->options = NIL;
+ n->withCheckOption = false;
+ $$ = (Node *) n;
+ }
+
qualified_name:
ColId
{
diff --git a/contrib/sqlol/sqlol_kwlist.h b/contrib/sqlol/sqlol_kwlist.h
index 2de3893ee4..8b50d88df9 100644
--- a/contrib/sqlol/sqlol_kwlist.h
+++ b/contrib/sqlol/sqlol_kwlist.h
@@ -19,3 +19,4 @@ PG_KEYWORD("hai", HAI, RESERVED_KEYWORD)
PG_KEYWORD("has", HAS, UNRESERVED_KEYWORD)
PG_KEYWORD("i", I, UNRESERVED_KEYWORD)
PG_KEYWORD("kthxbye", KTHXBYE, UNRESERVED_KEYWORD)
+PG_KEYWORD("maek", MAEK, UNRESERVED_KEYWORD)
diff --git a/contrib/sqlol/sqlol_scan.l b/contrib/sqlol/sqlol_scan.l
index a7088b8390..e6d4d53446 100644
--- a/contrib/sqlol/sqlol_scan.l
+++ b/contrib/sqlol/sqlol_scan.l
@@ -412,8 +412,10 @@ sqlol_yyscan_t
sqlol_scanner_init(const char *str,
sqlol_yy_extra_type *yyext,
const sqlol_ScanKeyword *keywords,
- int num_keywords)
+ int num_keywords,
+ int offset)
{
+ YY_BUFFER_STATE state;
Size slen = strlen(str);
yyscan_t scanner;
@@ -432,13 +434,20 @@ sqlol_scanner_init(const char *str,
yyext->scanbuflen = slen;
memcpy(yyext->scanbuf, str, slen);
yyext->scanbuf[slen] = yyext->scanbuf[slen + 1] = YY_END_OF_BUFFER_CHAR;
- yy_scan_buffer(yyext->scanbuf, slen + 2, scanner);
+ state = yy_scan_buffer(yyext->scanbuf, slen + 2, scanner);
/* initialize literal buffer to a reasonable but expansible size */
yyext->literalalloc = 1024;
yyext->literalbuf = (char *) palloc(yyext->literalalloc);
yyext->literallen = 0;
+ /*
+ * Adjust the offset in the input string. This is required in single-query
+ * mode, as we need to register the same token locations as we would have
+ * in normal mode with a multi-statement query string.
+ */
+ state->yy_buf_pos += offset;
+
return scanner;
}
diff --git a/contrib/sqlol/sqlol_scanner.h b/contrib/sqlol/sqlol_scanner.h
index 0a497e9d91..57f95867ee 100644
--- a/contrib/sqlol/sqlol_scanner.h
+++ b/contrib/sqlol/sqlol_scanner.h
@@ -108,7 +108,8 @@ extern PGDLLIMPORT const uint16 sqlol_ScanKeywordTokens[];
extern sqlol_yyscan_t sqlol_scanner_init(const char *str,
sqlol_yy_extra_type *yyext,
const sqlol_ScanKeyword *keywords,
- int num_keywords);
+ int num_keywords,
+ int offset);
extern void sqlol_scanner_finish(sqlol_yyscan_t yyscanner);
extern int sqlol_yylex(sqlol_YYSTYPE *lvalp, YYLTYPE *llocp,
sqlol_yyscan_t yyscanner);
--
2.31.1
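The regression tests in this patch interleave SQL and SQLOL statements in one command string, with the fallback decided per statement. A self-contained sketch of that control flow (names like `parse_multi` are invented; the real code builds `RawStmt` lists from flex buffers rather than counting matches):

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Toy model of per-statement fallback in MODE_SINGLE_QUERY: each
 * statement is first offered to the alternative grammar, and handed to
 * the core grammar only if the alternative one rejects it.
 */
static bool
alt_parser_accepts(const char *stmt)
{
    /* the sqlol grammar above requires a "HAI FCONST" preamble */
    return strncmp(stmt, "HAI ", 4) == 0;
}

/* Walk the statements one at a time, falling back individually */
static void
parse_multi(const char *stmts[], int n, int *alt, int *core)
{
    *alt = *core = 0;
    for (int i = 0; i < n; i++)
    {
        if (alt_parser_accepts(stmts[i]))
            (*alt)++;
        else
            (*core)++;      /* fallback: core grammar takes this one */
    }
}
```

This mirrors the `CREATE VIEW ...\; HAI 1.2 ...` cases in the expected output: each grammar consumes only the statements it understands.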
On Tue, Jun 08, 2021 at 12:16:48PM +0800, Julien Rouhaud wrote:
On Sun, Jun 06, 2021 at 02:50:19PM +0800, Julien Rouhaud wrote:
On Sat, May 01, 2021 at 03:24:58PM +0800, Julien Rouhaud wrote:
I'm attaching some POC patches that implement this approach to start a
discussion.
The regression tests weren't stable; v4 fixes that.
Attachments:
v4-0001-Add-a-parser_hook-hook.patch (text/x-diff)
From 236c61e4f26b5ef2dedb9ecb7efacb175777fba8 Mon Sep 17 00:00:00 2001
From: Julien Rouhaud <julien.rouhaud@free.fr>
Date: Wed, 21 Apr 2021 22:47:18 +0800
Subject: [PATCH v4 1/4] Add a parser_hook hook.
This does nothing but allow third-party plugins to implement a different
syntax, and fall back on the core parser if they don't implement a superset of
the core syntax.
---
src/backend/tcop/postgres.c | 16 ++++++++++++++--
src/include/tcop/tcopprot.h | 5 +++++
2 files changed, 19 insertions(+), 2 deletions(-)
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 8cea10c901..e941b59b85 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -99,6 +99,9 @@ int log_statement = LOGSTMT_NONE;
/* GUC variable for maximum stack depth (measured in kilobytes) */
int max_stack_depth = 100;
+/* Hook for plugins to get control in pg_parse_query() */
+parser_hook_type parser_hook = NULL;
+
/* wait N seconds to allow attach from a debugger */
int PostAuthDelay = 0;
@@ -589,18 +592,27 @@ ProcessClientWriteInterrupt(bool blocked)
* database tables. So, we rely on the raw parser to determine whether
* we've seen a COMMIT or ABORT command; when we are in abort state, other
* commands are not processed any further than the raw parse stage.
+ *
+ * To support loadable plugins that monitor parsing or implement SQL
+ * syntactic sugar, we provide a hook variable that lets a plugin get control
+ * before and after the standard parsing process. If the plugin only implements
+ * a subset of the syntax supported by core, it is its duty to call raw_parser
+ * (or the previous hook, if any) for the statements it doesn't understand.
*/
List *
pg_parse_query(const char *query_string)
{
- List *raw_parsetree_list;
+ List *raw_parsetree_list = NIL;
TRACE_POSTGRESQL_QUERY_PARSE_START(query_string);
if (log_parser_stats)
ResetUsage();
- raw_parsetree_list = raw_parser(query_string, RAW_PARSE_DEFAULT);
+ if (parser_hook)
+ raw_parsetree_list = (*parser_hook) (query_string, RAW_PARSE_DEFAULT);
+ else
+ raw_parsetree_list = raw_parser(query_string, RAW_PARSE_DEFAULT);
if (log_parser_stats)
ShowUsage("PARSER STATISTICS");
diff --git a/src/include/tcop/tcopprot.h b/src/include/tcop/tcopprot.h
index 968345404e..131dc2b22e 100644
--- a/src/include/tcop/tcopprot.h
+++ b/src/include/tcop/tcopprot.h
@@ -17,6 +17,7 @@
#include "nodes/params.h"
#include "nodes/parsenodes.h"
#include "nodes/plannodes.h"
+#include "parser/parser.h"
#include "storage/procsignal.h"
#include "utils/guc.h"
#include "utils/queryenvironment.h"
@@ -43,6 +44,10 @@ typedef enum
extern PGDLLIMPORT int log_statement;
+/* Hook for plugins to get control in pg_parse_query() */
+typedef List *(*parser_hook_type) (const char *str, RawParseMode mode);
+extern PGDLLIMPORT parser_hook_type parser_hook;
+
extern List *pg_parse_query(const char *query_string);
extern List *pg_rewrite_query(Query *query);
extern List *pg_analyze_and_rewrite(RawStmt *parsetree,
--
2.31.1
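The hook contract in this patch follows the usual PostgreSQL convention: a module saves the previous hook value at load time and delegates to it (or to the core parser) for input it does not handle. A minimal self-contained model of that chaining (everything here is a stand-in — the toy functions return an int tag instead of a `List *`):

```c
#include <assert.h>
#include <stddef.h>

/* toy stand-in for parser_hook_type / pg_parse_query() */
typedef int (*toy_parser_hook)(const char *str);

static toy_parser_hook toy_hook = NULL;

static int
toy_core_parser(const char *str)
{
    (void) str;
    return 1;                   /* pretend: core parsetree */
}

static int
toy_parse_query(const char *str)
{
    /* mirrors pg_parse_query(): prefer the hook, else the core parser */
    return toy_hook ? toy_hook(str) : toy_core_parser(str);
}

/* a module's hook: handle strings starting with 'H', delegate the rest */
static toy_parser_hook prev_hook = NULL;

static int
toy_module_hook(const char *str)
{
    if (str[0] == 'H')
        return 2;               /* pretend: module's own parsetree */
    return prev_hook ? prev_hook(str) : toy_core_parser(str);
}
```

Installing the hook is then the same save-and-replace dance as in `_PG_init()` above: save the current value into `prev_hook`, then point `toy_hook` at `toy_module_hook`.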
v4-0002-Add-a-sqlol-parser.patch (text/x-diff)
From dcd5a2b45fcc65724ac81e78afb0611e310f15e7 Mon Sep 17 00:00:00 2001
From: Julien Rouhaud <julien.rouhaud@free.fr>
Date: Wed, 21 Apr 2021 23:54:02 +0800
Subject: [PATCH v4 2/4] Add a sqlol parser.
This is a toy example of an alternative grammar that only accepts a
LOLCODE-compatible version of
SELECT [column, ] column FROM tablename
and falls back on the core parser for everything else.
---
contrib/Makefile | 1 +
contrib/sqlol/.gitignore | 7 +
contrib/sqlol/Makefile | 33 ++
contrib/sqlol/sqlol.c | 107 +++++++
contrib/sqlol/sqlol_gram.y | 440 ++++++++++++++++++++++++++
contrib/sqlol/sqlol_gramparse.h | 61 ++++
contrib/sqlol/sqlol_keywords.c | 98 ++++++
contrib/sqlol/sqlol_keywords.h | 38 +++
contrib/sqlol/sqlol_kwlist.h | 21 ++
contrib/sqlol/sqlol_scan.l | 544 ++++++++++++++++++++++++++++++++
contrib/sqlol/sqlol_scanner.h | 118 +++++++
11 files changed, 1468 insertions(+)
create mode 100644 contrib/sqlol/.gitignore
create mode 100644 contrib/sqlol/Makefile
create mode 100644 contrib/sqlol/sqlol.c
create mode 100644 contrib/sqlol/sqlol_gram.y
create mode 100644 contrib/sqlol/sqlol_gramparse.h
create mode 100644 contrib/sqlol/sqlol_keywords.c
create mode 100644 contrib/sqlol/sqlol_keywords.h
create mode 100644 contrib/sqlol/sqlol_kwlist.h
create mode 100644 contrib/sqlol/sqlol_scan.l
create mode 100644 contrib/sqlol/sqlol_scanner.h
diff --git a/contrib/Makefile b/contrib/Makefile
index f27e458482..2a80cd137b 100644
--- a/contrib/Makefile
+++ b/contrib/Makefile
@@ -43,6 +43,7 @@ SUBDIRS = \
postgres_fdw \
seg \
spi \
+ sqlol \
tablefunc \
tcn \
test_decoding \
diff --git a/contrib/sqlol/.gitignore b/contrib/sqlol/.gitignore
new file mode 100644
index 0000000000..3c4b587792
--- /dev/null
+++ b/contrib/sqlol/.gitignore
@@ -0,0 +1,7 @@
+# Generated subdirectories
+/log/
+/results/
+/tmp_check/
+sqlol_gram.c
+sqlol_gram.h
+sqlol_scan.c
diff --git a/contrib/sqlol/Makefile b/contrib/sqlol/Makefile
new file mode 100644
index 0000000000..3850ac3fce
--- /dev/null
+++ b/contrib/sqlol/Makefile
@@ -0,0 +1,33 @@
+# contrib/sqlol/Makefile
+
+MODULE_big = sqlol
+OBJS = \
+ $(WIN32RES) \
+ sqlol.o sqlol_gram.o sqlol_scan.o sqlol_keywords.o
+PGFILEDESC = "sqlol - Toy alternative grammar based on LOLCODE"
+
+sqlol_gram.h: sqlol_gram.c
+ touch $@
+
+sqlol_gram.c: BISONFLAGS += -d
+# sqlol_gram.c: BISON_CHECK_CMD = $(PERL) $(srcdir)/check_keywords.pl $< $(top_srcdir)/src/include/parser/kwlist.h
+
+
+sqlol_scan.c: FLEXFLAGS = -CF -p -p
+sqlol_scan.c: FLEX_NO_BACKUP=yes
+sqlol_scan.c: FLEX_FIX_WARNING=yes
+
+
+# Force these dependencies to be known even without dependency info built:
+sqlol.o sqlol_gram.o sqlol_scan.o parser.o: sqlol_gram.h
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = contrib/sqlol
+top_builddir = ../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
diff --git a/contrib/sqlol/sqlol.c b/contrib/sqlol/sqlol.c
new file mode 100644
index 0000000000..b986966181
--- /dev/null
+++ b/contrib/sqlol/sqlol.c
@@ -0,0 +1,107 @@
+/*-------------------------------------------------------------------------
+ *
+ * sqlol.c
+ *
+ *
+ * Copyright (c) 2008-2021, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * contrib/sqlol/sqlol.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "tcop/tcopprot.h"
+
+#include "sqlol_gramparse.h"
+#include "sqlol_keywords.h"
+
+PG_MODULE_MAGIC;
+
+
+/* Saved hook values in case of unload */
+static parser_hook_type prev_parser_hook = NULL;
+
+void _PG_init(void);
+void _PG_fini(void);
+
+static List *sqlol_parser_hook(const char *str, RawParseMode mode);
+
+
+/*
+ * Module load callback
+ */
+void
+_PG_init(void)
+{
+ /* Install hooks. */
+ prev_parser_hook = parser_hook;
+ parser_hook = sqlol_parser_hook;
+}
+
+/*
+ * Module unload callback
+ */
+void
+_PG_fini(void)
+{
+ /* Uninstall hooks. */
+ parser_hook = prev_parser_hook;
+}
+
+/*
+ * sqlol_parser_hook: parse our grammar
+ */
+static List *
+sqlol_parser_hook(const char *str, RawParseMode mode)
+{
+ sqlol_yyscan_t yyscanner;
+ sqlol_base_yy_extra_type yyextra;
+ int yyresult;
+
+ if (mode != RAW_PARSE_DEFAULT)
+ {
+ if (prev_parser_hook)
+ return (*prev_parser_hook) (str, mode);
+ else
+ return raw_parser(str, mode);
+ }
+
+ /* initialize the flex scanner */
+ yyscanner = sqlol_scanner_init(str, &yyextra.sqlol_yy_extra,
+ sqlol_ScanKeywords, sqlol_NumScanKeywords);
+
+ /* initialize the bison parser */
+ sqlol_parser_init(&yyextra);
+
+ /* Parse! */
+ yyresult = sqlol_base_yyparse(yyscanner);
+
+ /* Clean up (release memory) */
+ sqlol_scanner_finish(yyscanner);
+
+ /*
+ * Invalid statement; fall back on the previous parser_hook if any, or
+ * on raw_parser().
+ */
+ if (yyresult)
+ {
+ if (prev_parser_hook)
+ return (*prev_parser_hook) (str, mode);
+ else
+ return raw_parser(str, mode);
+ }
+
+ return yyextra.parsetree;
+}
+
+int
+sqlol_base_yylex(YYSTYPE *lvalp, YYLTYPE *llocp, sqlol_yyscan_t yyscanner)
+{
+ int cur_token;
+
+ cur_token = sqlol_yylex(&(lvalp->sqlol_yystype), llocp, yyscanner);
+
+ return cur_token;
+}
diff --git a/contrib/sqlol/sqlol_gram.y b/contrib/sqlol/sqlol_gram.y
new file mode 100644
index 0000000000..64d00d14ca
--- /dev/null
+++ b/contrib/sqlol/sqlol_gram.y
@@ -0,0 +1,440 @@
+%{
+
+/*#define YYDEBUG 1*/
+/*-------------------------------------------------------------------------
+ *
+ * sqlol_gram.y
+ * sqlol BISON rules/actions
+ *
+ * Portions Copyright (c) 1996-2021, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ * contrib/sqlol/sqlol_gram.y
+ *
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "catalog/namespace.h"
+#include "nodes/makefuncs.h"
+
+#include "sqlol_gramparse.h"
+
+/*
+ * Location tracking support --- simpler than bison's default, since we only
+ * want to track the start position not the end position of each nonterminal.
+ */
+#define YYLLOC_DEFAULT(Current, Rhs, N) \
+ do { \
+ if ((N) > 0) \
+ (Current) = (Rhs)[1]; \
+ else \
+ (Current) = (-1); \
+ } while (0)
+
+/*
+ * The above macro assigns -1 (unknown) as the parse location of any
+ * nonterminal that was reduced from an empty rule, or whose leftmost
+ * component was reduced from an empty rule. This is problematic
+ * for nonterminals defined like
+ * OptFooList: / * EMPTY * / { ... } | OptFooList Foo { ... } ;
+ * because we'll set -1 as the location during the first reduction and then
+ * copy it during each subsequent reduction, leaving us with -1 for the
+ * location even when the list is not empty. To fix that, do this in the
+ * action for the nonempty rule(s):
+ * if (@$ < 0) @$ = @2;
+ * (Although we have many nonterminals that follow this pattern, we only
+ * bother with fixing @$ like this when the nonterminal's parse location
+ * is actually referenced in some rule.)
+ *
+ * A cleaner answer would be to make YYLLOC_DEFAULT scan all the Rhs
+ * locations until it's found one that's not -1. Then we'd get a correct
+ * location for any nonterminal that isn't entirely empty. But this way
+ * would add overhead to every rule reduction, and so far there's not been
+ * a compelling reason to pay that overhead.
+ */
+
+/*
+ * Bison doesn't allocate anything that needs to live across parser calls,
+ * so we can easily have it use palloc instead of malloc. This prevents
+ * memory leaks if we error out during parsing. Note this only works with
+ * bison >= 2.0. However, in bison 1.875 the default is to use alloca()
+ * if possible, so there's not really much problem anyhow, at least if
+ * you're building with gcc.
+ */
+#define YYMALLOC palloc
+#define YYFREE pfree
+
+
+#define parser_yyerror(msg) sqlol_scanner_yyerror(msg, yyscanner)
+#define parser_errposition(pos) sqlol_scanner_errposition(pos, yyscanner)
+
+static void sqlol_base_yyerror(YYLTYPE *yylloc, sqlol_yyscan_t yyscanner,
+ const char *msg);
+static RawStmt *makeRawStmt(Node *stmt, int stmt_location);
+static void updateRawStmtEnd(RawStmt *rs, int end_location);
+static Node *makeColumnRef(char *colname, List *indirection,
+ int location, sqlol_yyscan_t yyscanner);
+static void check_qualified_name(List *names, sqlol_yyscan_t yyscanner);
+static List *check_indirection(List *indirection, sqlol_yyscan_t yyscanner);
+
+%}
+
+%pure-parser
+%expect 0
+%name-prefix="sqlol_base_yy"
+%locations
+
+%parse-param {sqlol_yyscan_t yyscanner}
+%lex-param {sqlol_yyscan_t yyscanner}
+
+%union
+{
+ sqlol_YYSTYPE sqlol_yystype;
+ /* these fields must match sqlol_YYSTYPE: */
+ int ival;
+ char *str;
+ const char *keyword;
+
+ List *list;
+ Node *node;
+ Value *value;
+ RangeVar *range;
+ ResTarget *target;
+}
+
+%type <node> stmt toplevel_stmt GimmehStmt simple_gimmeh columnref
+ indirection_el
+
+%type <list> parse_toplevel stmtmulti gimmeh_list indirection
+
+%type <range> qualified_name
+
+%type <str> ColId ColLabel attr_name
+
+%type <target> gimmeh_el
+
+/*
+ * Non-keyword token types. These are hard-wired into the "flex" lexer.
+ * They must be listed first so that their numeric codes do not depend on
+ * the set of keywords. PL/pgSQL depends on this so that it can share the
+ * same lexer. If you add/change tokens here, fix PL/pgSQL to match!
+ *
+ */
+%token <str> IDENT FCONST SCONST Op
+
+/*
+ * If you want to make any keyword changes, update the keyword table in
+ * src/include/parser/kwlist.h and add new keywords to the appropriate one
+ * of the reserved-or-not-so-reserved keyword lists, below; search
+ * this file for "Keyword category lists".
+ */
+
+/* ordinary key words in alphabetical order */
+%token <keyword> A GIMMEH HAI HAS I KTHXBYE
+
+
+%%
+
+/*
+ * The target production for the whole parse.
+ *
+ * Ordinarily we parse a list of statements, but if we see one of the
+ * special MODE_XXX symbols as first token, we parse something else.
+ * The options here correspond to enum RawParseMode, which see for details.
+ */
+parse_toplevel:
+ stmtmulti
+ {
+ pg_yyget_extra(yyscanner)->parsetree = $1;
+ }
+ ;
+
+/*
+ * At top level, we wrap each stmt with a RawStmt node carrying start location
+ * and length of the stmt's text. Notice that the start loc/len are driven
+ * entirely from semicolon locations (@2). It would seem natural to use
+ * @1 or @3 to get the true start location of a stmt, but that doesn't work
+ * for statements that can start with empty nonterminals (opt_with_clause is
+ * the main offender here); as noted in the comments for YYLLOC_DEFAULT,
+ * we'd get -1 for the location in such cases.
+ * We also take care to discard empty statements entirely.
+ */
+stmtmulti: stmtmulti KTHXBYE toplevel_stmt
+ {
+ if ($1 != NIL)
+ {
+ /* update length of previous stmt */
+ updateRawStmtEnd(llast_node(RawStmt, $1), @2);
+ }
+ if ($3 != NULL)
+ $$ = lappend($1, makeRawStmt($3, @2 + 1));
+ else
+ $$ = $1;
+ }
+ | toplevel_stmt
+ {
+ if ($1 != NULL)
+ $$ = list_make1(makeRawStmt($1, 0));
+ else
+ $$ = NIL;
+ }
+ ;
+
+/*
+ * toplevel_stmt includes BEGIN and END. stmt does not include them, because
+ * those words have different meanings in function bodies.
+ */
+toplevel_stmt:
+ stmt
+ ;
+
+stmt:
+ GimmehStmt
+ | /*EMPTY*/
+ { $$ = NULL; }
+ ;
+
+/*****************************************************************************
+ *
+ * GIMMEH statement
+ *
+ *****************************************************************************/
+
+GimmehStmt:
+ simple_gimmeh { $$ = $1; }
+ ;
+
+simple_gimmeh:
+ HAI FCONST I HAS A qualified_name
+ GIMMEH gimmeh_list
+ {
+ SelectStmt *n = makeNode(SelectStmt);
+ n->targetList = $8;
+ n->fromClause = list_make1($6);
+ $$ = (Node *)n;
+ }
+ ;
+
+gimmeh_list:
+ gimmeh_el { $$ = list_make1($1); }
+ | gimmeh_list ',' gimmeh_el { $$ = lappend($1, $3); }
+
+gimmeh_el:
+ columnref
+ {
+ $$ = makeNode(ResTarget);
+ $$->name = NULL;
+ $$->indirection = NIL;
+ $$->val = (Node *)$1;
+ $$->location = @1;
+ }
+
+qualified_name:
+ ColId
+ {
+ $$ = makeRangeVar(NULL, $1, @1);
+ }
+ | ColId indirection
+ {
+ check_qualified_name($2, yyscanner);
+ $$ = makeRangeVar(NULL, NULL, @1);
+ switch (list_length($2))
+ {
+ case 1:
+ $$->catalogname = NULL;
+ $$->schemaname = $1;
+ $$->relname = strVal(linitial($2));
+ break;
+ case 2:
+ $$->catalogname = $1;
+ $$->schemaname = strVal(linitial($2));
+ $$->relname = strVal(lsecond($2));
+ break;
+ default:
+ /*
+ * It's ok to error out here as at this point we
+ * already parsed a "HAI FCONST" preamble, and no
+ * other grammar is likely to accept a command
+ * starting with that, so there's no point trying
+ * to fall back on the other grammars.
+ */
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("improper qualified name (too many dotted names): %s",
+ NameListToString(lcons(makeString($1), $2))),
+ parser_errposition(@1)));
+ break;
+ }
+ }
+ ;
+
+columnref: ColId
+ {
+ $$ = makeColumnRef($1, NIL, @1, yyscanner);
+ }
+ | ColId indirection
+ {
+ $$ = makeColumnRef($1, $2, @1, yyscanner);
+ }
+ ;
+
+ColId: IDENT { $$ = $1; }
+
+indirection:
+ indirection_el { $$ = list_make1($1); }
+ | indirection indirection_el { $$ = lappend($1, $2); }
+ ;
+
+indirection_el:
+ '.' attr_name
+ {
+ $$ = (Node *) makeString($2);
+ }
+ ;
+
+attr_name: ColLabel { $$ = $1; };
+
+ColLabel: IDENT { $$ = $1; }
+
+%%
+
+/*
+ * The signature of this function is required by bison. However, we
+ * ignore the passed yylloc and instead use the last token position
+ * available from the scanner.
+ */
+static void
+sqlol_base_yyerror(YYLTYPE *yylloc, sqlol_yyscan_t yyscanner, const char *msg)
+{
+ parser_yyerror(msg);
+}
+
+static RawStmt *
+makeRawStmt(Node *stmt, int stmt_location)
+{
+ RawStmt *rs = makeNode(RawStmt);
+
+ rs->stmt = stmt;
+ rs->stmt_location = stmt_location;
+ rs->stmt_len = 0; /* might get changed later */
+ return rs;
+}
+
+/* Adjust a RawStmt to reflect that it doesn't run to the end of the string */
+static void
+updateRawStmtEnd(RawStmt *rs, int end_location)
+{
+ /*
+ * If we already set the length, don't change it. This is for situations
+ * like "select foo ;; select bar" where the same statement will be last
+ * in the string for more than one semicolon.
+ */
+ if (rs->stmt_len > 0)
+ return;
+
+ /* OK, update length of RawStmt */
+ rs->stmt_len = end_location - rs->stmt_location;
+}
+
+static Node *
+makeColumnRef(char *colname, List *indirection,
+ int location, sqlol_yyscan_t yyscanner)
+{
+ /*
+ * Generate a ColumnRef node, with an A_Indirection node added if there
+ * is any subscripting in the specified indirection list. However,
+ * any field selection at the start of the indirection list must be
+ * transposed into the "fields" part of the ColumnRef node.
+ */
+ ColumnRef *c = makeNode(ColumnRef);
+ int nfields = 0;
+ ListCell *l;
+
+ c->location = location;
+ foreach(l, indirection)
+ {
+ if (IsA(lfirst(l), A_Indices))
+ {
+ A_Indirection *i = makeNode(A_Indirection);
+
+ if (nfields == 0)
+ {
+ /* easy case - all indirection goes to A_Indirection */
+ c->fields = list_make1(makeString(colname));
+ i->indirection = check_indirection(indirection, yyscanner);
+ }
+ else
+ {
+ /* got to split the list in two */
+ i->indirection = check_indirection(list_copy_tail(indirection,
+ nfields),
+ yyscanner);
+ indirection = list_truncate(indirection, nfields);
+ c->fields = lcons(makeString(colname), indirection);
+ }
+ i->arg = (Node *) c;
+ return (Node *) i;
+ }
+ else if (IsA(lfirst(l), A_Star))
+ {
+ /* We only allow '*' at the end of a ColumnRef */
+ if (lnext(indirection, l) != NULL)
+ parser_yyerror("improper use of \"*\"");
+ }
+ nfields++;
+ }
+ /* No subscripting, so all indirection gets added to field list */
+ c->fields = lcons(makeString(colname), indirection);
+ return (Node *) c;
+}
+
+/* check_qualified_name --- check the result of qualified_name production
+ *
+ * It's easiest to let the grammar production for qualified_name allow
+ * subscripts and '*', which we then must reject here.
+ */
+static void
+check_qualified_name(List *names, sqlol_yyscan_t yyscanner)
+{
+ ListCell *i;
+
+ foreach(i, names)
+ {
+ if (!IsA(lfirst(i), String))
+ parser_yyerror("syntax error");
+ }
+}
+
+/* check_indirection --- check the result of indirection production
+ *
+ * We only allow '*' at the end of the list, but it's hard to enforce that
+ * in the grammar, so do it here.
+ */
+static List *
+check_indirection(List *indirection, sqlol_yyscan_t yyscanner)
+{
+ ListCell *l;
+
+ foreach(l, indirection)
+ {
+ if (IsA(lfirst(l), A_Star))
+ {
+ if (lnext(indirection, l) != NULL)
+ parser_yyerror("improper use of \"*\"");
+ }
+ }
+ return indirection;
+}
+
+/* sqlol_parser_init()
+ * Initialize to parse one query string
+ */
+void
+sqlol_parser_init(sqlol_base_yy_extra_type *yyext)
+{
+ yyext->parsetree = NIL; /* in case grammar forgets to set it */
+}
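The qualified_name action above dispatches on the length of the indirection list: one trailing name means schema.rel, two mean catalog.schema.rel, and anything longer is an error. A standalone sketch of that dispatch, using a hypothetical RangeVarParts struct and plain C string arrays in place of PostgreSQL's List and RangeVar:

```c
#include <stddef.h>

typedef struct RangeVarParts
{
    const char *catalogname;
    const char *schemaname;
    const char *relname;
} RangeVarParts;

/*
 * names[0] is the leading ColId, names[1..nnames-1] the indirection
 * elements. Returns 0 on success, -1 for too many dotted names
 * (the grammar above reports a syntax error in that case).
 */
static int
split_qualified_name(const char **names, int nnames, RangeVarParts *rv)
{
    rv->catalogname = NULL;
    rv->schemaname = NULL;
    rv->relname = NULL;

    switch (nnames)
    {
        case 1:                     /* rel */
            rv->relname = names[0];
            return 0;
        case 2:                     /* schema.rel */
            rv->schemaname = names[0];
            rv->relname = names[1];
            return 0;
        case 3:                     /* catalog.schema.rel */
            rv->catalogname = names[0];
            rv->schemaname = names[1];
            rv->relname = names[2];
            return 0;
        default:
            return -1;              /* improper qualified name */
    }
}
```

This mirrors the switch in the production, just with the leading ColId folded into the same array as the indirection elements.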
diff --git a/contrib/sqlol/sqlol_gramparse.h b/contrib/sqlol/sqlol_gramparse.h
new file mode 100644
index 0000000000..58233a8d87
--- /dev/null
+++ b/contrib/sqlol/sqlol_gramparse.h
@@ -0,0 +1,61 @@
+/*-------------------------------------------------------------------------
+ *
+ * sqlol_gramparse.h
+ * Shared definitions for the "raw" parser (flex and bison phases only)
+ *
+ *
+ * Portions Copyright (c) 1996-2021, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * contrib/sqlol/sqlol_gramparse.h
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#ifndef SQLOL_GRAMPARSE_H
+#define SQLOL_GRAMPARSE_H
+
+#include "nodes/parsenodes.h"
+#include "sqlol_scanner.h"
+
+/*
+ * NB: include sqlol_gram.h only AFTER including sqlol_scanner.h, because
+ * sqlol_scanner.h is what #defines YYLTYPE.
+ */
+#include "sqlol_gram.h"
+
+/*
+ * The YY_EXTRA data that a flex scanner allows us to pass around. Private
+ * state needed for raw parsing/lexing goes here.
+ */
+typedef struct sqlol_base_yy_extra_type
+{
+ /*
+ * Fields used by the core scanner.
+ */
+ sqlol_yy_extra_type sqlol_yy_extra;
+
+ /*
+ * State variables that belong to the grammar.
+ */
+ List *parsetree; /* final parse result is delivered here */
+} sqlol_base_yy_extra_type;
+
+/*
+ * In principle we should use yyget_extra() to fetch the yyextra field
+ * from a yyscanner struct. However, flex always puts that field first,
+ * and this is sufficiently performance-critical to make it seem worth
+ * cheating a bit to use an inline macro.
+ */
+#define pg_yyget_extra(yyscanner) (*((sqlol_base_yy_extra_type **) (yyscanner)))
+
+
+/* from parser.c */
+extern int sqlol_base_yylex(YYSTYPE *lvalp, YYLTYPE *llocp,
+ sqlol_yyscan_t yyscanner);
+
+/* from gram.y */
+extern void sqlol_parser_init(sqlol_base_yy_extra_type *yyext);
+extern int sqlol_baseyyparse(sqlol_yyscan_t yyscanner);
+
+#endif /* SQLOL_GRAMPARSE_H */
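The pg_yyget_extra() macro above works only because flex lays out the yyextra pointer as the first field of its internal scanner struct, so casting the opaque scanner handle to a pointer-to-pointer recovers it without a function call. A minimal standalone sketch of that first-member cast (all names here are stand-ins, not flex's real internals):

```c
#include <stddef.h>

/* Stand-in for flex's internal yyguts_t: the extra pointer comes first. */
typedef struct fake_yyguts
{
    void   *yyextra_r;      /* must be the first field for the cast to work */
    int     other_state;
} fake_yyguts;

typedef struct parser_extra
{
    int     dummy;
} parser_extra;

/*
 * Mimics pg_yyget_extra(): reading the first pointer-sized field of the
 * opaque handle yields yyextra_r. Valid only while that field stays first.
 */
#define fake_yyget_extra(scanner) (*((parser_extra **) (scanner)))
```

If a future flex release reordered its struct, this cheat would silently break, which is why the header calls it out as performance-motivated.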
diff --git a/contrib/sqlol/sqlol_keywords.c b/contrib/sqlol/sqlol_keywords.c
new file mode 100644
index 0000000000..dbbdf5493c
--- /dev/null
+++ b/contrib/sqlol/sqlol_keywords.c
@@ -0,0 +1,98 @@
+/*-------------------------------------------------------------------------
+ *
+ * sqlol_keywords.c
+ *	  lexical token lookup for sqlol key words
+ *
+ *
+ * Portions Copyright (c) 1996-2021, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ *	  contrib/sqlol/sqlol_keywords.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "sqlol_gramparse.h"
+
+#define PG_KEYWORD(a,b,c) {a,b,c},
+
+const sqlol_ScanKeyword sqlol_ScanKeywords[] = {
+#include "sqlol_kwlist.h"
+};
+
+const int sqlol_NumScanKeywords = lengthof(sqlol_ScanKeywords);
+
+#undef PG_KEYWORD
+
+
+/*
+ * ScanKeywordLookup - see if a given word is a keyword
+ *
+ * The table to be searched is passed explicitly, so that this can be used
+ * to search keyword lists other than the standard list appearing above.
+ *
+ * Returns a pointer to the sqlol_ScanKeyword table entry, or NULL if no match.
+ *
+ * The match is done case-insensitively. Note that we deliberately use a
+ * dumbed-down case conversion that will only translate 'A'-'Z' into 'a'-'z',
+ * even if we are in a locale where tolower() would produce more or different
+ * translations. This is to conform to the SQL99 spec, which says that
+ * keywords are to be matched in this way even though non-keyword identifiers
+ * receive a different case-normalization mapping.
+ */
+const sqlol_ScanKeyword *
+sqlol_ScanKeywordLookup(const char *text,
+ const sqlol_ScanKeyword *keywords,
+ int num_keywords)
+{
+ int len,
+ i;
+ char word[NAMEDATALEN];
+ const sqlol_ScanKeyword *low;
+ const sqlol_ScanKeyword *high;
+
+ len = strlen(text);
+ /* We assume all keywords are shorter than NAMEDATALEN. */
+ if (len >= NAMEDATALEN)
+ return NULL;
+
+ /*
+ * Apply an ASCII-only downcasing. We must not use tolower() since it may
+ * produce the wrong translation in some locales (eg, Turkish).
+ */
+ for (i = 0; i < len; i++)
+ {
+ char ch = text[i];
+
+ if (ch >= 'A' && ch <= 'Z')
+ ch += 'a' - 'A';
+ word[i] = ch;
+ }
+ word[len] = '\0';
+
+ /*
+ * Now do a binary search using plain strcmp() comparison.
+ */
+ low = keywords;
+ high = keywords + (num_keywords - 1);
+ while (low <= high)
+ {
+ const sqlol_ScanKeyword *middle;
+ int difference;
+
+ middle = low + (high - low) / 2;
+ difference = strcmp(middle->name, word);
+ if (difference == 0)
+ return middle;
+ else if (difference < 0)
+ low = middle + 1;
+ else
+ high = middle - 1;
+ }
+
+ return NULL;
+}
+
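The lookup routine above combines an ASCII-only downcasing pass with a binary search over a sorted keyword table. The same logic can be sketched outside PostgreSQL; this standalone version uses a hypothetical three-entry table and a fixed-size word buffer in place of NAMEDATALEN:

```c
#include <stddef.h>
#include <string.h>

typedef struct ScanKeyword
{
    const char *name;       /* in lower case */
    int         value;      /* grammar's token code */
} ScanKeyword;

/* Table must be sorted by name, as the binary search requires. */
static const ScanKeyword keywords[] = {
    {"gimmeh", 1}, {"hai", 2}, {"kthxbye", 3}
};

static const ScanKeyword *
kw_lookup(const char *text)
{
    char        word[64];
    size_t      len = strlen(text);
    size_t      i;
    const ScanKeyword *low;
    const ScanKeyword *high;

    if (len >= sizeof(word))
        return NULL;            /* keywords are assumed shorter than this */

    /* ASCII-only downcasing, deliberately ignoring the locale */
    for (i = 0; i < len; i++)
    {
        char        ch = text[i];

        if (ch >= 'A' && ch <= 'Z')
            ch += 'a' - 'A';
        word[i] = ch;
    }
    word[len] = '\0';

    /* plain strcmp() binary search over the sorted table */
    low = keywords;
    high = keywords + (sizeof(keywords) / sizeof(keywords[0])) - 1;
    while (low <= high)
    {
        const ScanKeyword *middle = low + (high - low) / 2;
        int         diff = strcmp(middle->name, word);

        if (diff == 0)
            return middle;
        else if (diff < 0)
            low = middle + 1;
        else
            high = middle - 1;
    }
    return NULL;                /* not a keyword: caller treats it as IDENT */
}
```

The locale-independent downcasing matters: with tolower() in a Turkish locale, "HAI" would downcase to "haı" and never match.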
diff --git a/contrib/sqlol/sqlol_keywords.h b/contrib/sqlol/sqlol_keywords.h
new file mode 100644
index 0000000000..bc4acf4541
--- /dev/null
+++ b/contrib/sqlol/sqlol_keywords.h
@@ -0,0 +1,38 @@
+/*-------------------------------------------------------------------------
+ *
+ * sqlol_keywords.h
+ *	  lexical token lookup for sqlol key words
+ *
+ *
+ * Portions Copyright (c) 1996-2021, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * contrib/sqlol/sqlol_keywords.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef SQLOL_KEYWORDS_H
+#define SQLOL_KEYWORDS_H
+
+/* Keyword categories --- should match lists in gram.y */
+#define UNRESERVED_KEYWORD 0
+#define COL_NAME_KEYWORD 1
+#define TYPE_FUNC_NAME_KEYWORD 2
+#define RESERVED_KEYWORD 3
+
+
+typedef struct sqlol_ScanKeyword
+{
+ const char *name; /* in lower case */
+ int16 value; /* grammar's token code */
+ int16 category; /* see codes above */
+} sqlol_ScanKeyword;
+
+extern PGDLLIMPORT const sqlol_ScanKeyword sqlol_ScanKeywords[];
+extern PGDLLIMPORT const int sqlol_NumScanKeywords;
+
+extern const sqlol_ScanKeyword *sqlol_ScanKeywordLookup(const char *text,
+ const sqlol_ScanKeyword *keywords,
+ int num_keywords);
+
+#endif /* SQLOL_KEYWORDS_H */
diff --git a/contrib/sqlol/sqlol_kwlist.h b/contrib/sqlol/sqlol_kwlist.h
new file mode 100644
index 0000000000..2de3893ee4
--- /dev/null
+++ b/contrib/sqlol/sqlol_kwlist.h
@@ -0,0 +1,21 @@
+/*-------------------------------------------------------------------------
+ *
+ * sqlol_kwlist.h
+ *
+ *
+ * Portions Copyright (c) 1996-2021, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * IDENTIFICATION
+ * contrib/sqlol/sqlol_kwlist.h
+ *
+ *-------------------------------------------------------------------------
+ */
+
+/* name, value, category */
+PG_KEYWORD("a", A, UNRESERVED_KEYWORD)
+PG_KEYWORD("gimmeh", GIMMEH, UNRESERVED_KEYWORD)
+PG_KEYWORD("hai", HAI, RESERVED_KEYWORD)
+PG_KEYWORD("has", HAS, UNRESERVED_KEYWORD)
+PG_KEYWORD("i", I, UNRESERVED_KEYWORD)
+PG_KEYWORD("kthxbye", KTHXBYE, UNRESERVED_KEYWORD)
diff --git a/contrib/sqlol/sqlol_scan.l b/contrib/sqlol/sqlol_scan.l
new file mode 100644
index 0000000000..a7088b8390
--- /dev/null
+++ b/contrib/sqlol/sqlol_scan.l
@@ -0,0 +1,544 @@
+%top{
+/*-------------------------------------------------------------------------
+ *
+ * sqlol_scan.l
+ * lexical scanner for sqlol
+ *
+ *
+ * Portions Copyright (c) 1996-2021, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * IDENTIFICATION
+ * contrib/sqlol/sqlol_scan.l
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "common/string.h"
+#include "sqlol_gramparse.h"
+#include "parser/scansup.h"
+#include "mb/pg_wchar.h"
+
+#include "sqlol_keywords.h"
+}
+
+%{
+
+/* LCOV_EXCL_START */
+
+/* Avoid exit() on fatal scanner errors (a bit ugly -- see yy_fatal_error) */
+#undef fprintf
+#define fprintf(file, fmt, msg) fprintf_to_ereport(fmt, msg)
+
+static void
+fprintf_to_ereport(const char *fmt, const char *msg)
+{
+ ereport(ERROR, (errmsg_internal("%s", msg)));
+}
+
+
+/*
+ * Set the type of YYSTYPE.
+ */
+#define YYSTYPE sqlol_YYSTYPE
+
+/*
+ * Set the type of yyextra. All state variables used by the scanner should
+ * be in yyextra, *not* statically allocated.
+ */
+#define YY_EXTRA_TYPE sqlol_yy_extra_type *
+
+/*
+ * Each call to yylex must set yylloc to the location of the found token
+ * (expressed as a byte offset from the start of the input text).
+ * When we parse a token that requires multiple lexer rules to process,
+ * this should be done in the first such rule, else yylloc will point
+ * into the middle of the token.
+ */
+#define SET_YYLLOC() (*(yylloc) = yytext - yyextra->scanbuf)
+
+/*
+ * Advance yylloc by the given number of bytes.
+ */
+#define ADVANCE_YYLLOC(delta) ( *(yylloc) += (delta) )
+
+/*
+ * Sometimes, we do want yylloc to point into the middle of a token; this is
+ * useful for instance to throw an error about an escape sequence within a
+ * string literal. But if we find no error there, we want to revert yylloc
+ * to the token start, so that that's the location reported to the parser.
+ * Use PUSH_YYLLOC/POP_YYLLOC to save/restore yylloc around such code.
+ * (Currently the implied "stack" is just one location, but someday we might
+ * need to nest these.)
+ */
+#define PUSH_YYLLOC() (yyextra->save_yylloc = *(yylloc))
+#define POP_YYLLOC() (*(yylloc) = yyextra->save_yylloc)
+
+#define startlit() ( yyextra->literallen = 0 )
+static void addlit(char *ytext, int yleng, sqlol_yyscan_t yyscanner);
+static void addlitchar(unsigned char ychar, sqlol_yyscan_t yyscanner);
+static char *litbufdup(sqlol_yyscan_t yyscanner);
+
+#define yyerror(msg) sqlol_scanner_yyerror(msg, yyscanner)
+
+#define lexer_errposition() sqlol_scanner_errposition(*(yylloc), yyscanner)
+
+/*
+ * Work around a bug in flex 2.5.35: it emits a couple of functions that
+ * it forgets to emit declarations for. Since we use -Wmissing-prototypes,
+ * this would cause warnings. Providing our own declarations should be
+ * harmless even when the bug gets fixed.
+ */
+extern int sqlol_yyget_column(yyscan_t yyscanner);
+extern void sqlol_yyset_column(int column_no, yyscan_t yyscanner);
+
+%}
+
+%option reentrant
+%option bison-bridge
+%option bison-locations
+%option 8bit
+%option never-interactive
+%option nodefault
+%option noinput
+%option nounput
+%option noyywrap
+%option noyyalloc
+%option noyyrealloc
+%option noyyfree
+%option warn
+%option prefix="sqlol_yy"
+
+/*
+ * OK, here is a short description of lex/flex rules behavior.
+ * The longest pattern which matches an input string is always chosen.
+ * For equal-length patterns, the first occurring in the rules list is chosen.
+ * INITIAL is the starting state, to which all non-conditional rules apply.
+ * Exclusive states change parsing rules while the state is active. When in
+ * an exclusive state, only those rules defined for that state apply.
+ *
+ * We use exclusive states for quoted strings, extended comments,
+ * and to eliminate parsing troubles for numeric strings.
+ * Exclusive states:
+ * <xd> delimited identifiers (double-quoted identifiers)
+ * <xq> standard quoted strings
+ * <xqs> quote stop (detect continued strings)
+ *
+ * Remember to add an <<EOF>> case whenever you add a new exclusive state!
+ * The default one is probably not the right thing.
+ */
+
+%x xd
+%x xq
+%x xqs
+
+/*
+ * In order to make the world safe for Windows and Mac clients as well as
+ * Unix ones, we accept either \n or \r as a newline. A DOS-style \r\n
+ * sequence will be seen as two successive newlines, but that doesn't cause
+ * any problems. Comments that start with -- and extend to the next
+ * newline are treated as equivalent to a single whitespace character.
+ *
+ * NOTE a fine point: if there is no newline following --, we will absorb
+ * everything to the end of the input as a comment. This is correct. Older
+ * versions of Postgres failed to recognize -- as a comment if the input
+ * did not end with a newline.
+ *
+ * XXX perhaps \f (formfeed) should be treated as a newline as well?
+ *
+ * XXX if you change the set of whitespace characters, fix scanner_isspace()
+ * to agree.
+ */
+
+space [ \t\n\r\f]
+horiz_space [ \t\f]
+newline [\n\r]
+non_newline [^\n\r]
+
+comment ("--"{non_newline}*)
+
+whitespace ({space}+|{comment})
+
+/*
+ * SQL requires at least one newline in the whitespace separating
+ * string literals that are to be concatenated. Silly, but who are we
+ * to argue? Note that {whitespace_with_newline} should not have * after
+ * it, whereas {whitespace} should generally have a * after it...
+ */
+
+special_whitespace ({space}+|{comment}{newline})
+horiz_whitespace ({horiz_space}|{comment})
+whitespace_with_newline ({horiz_whitespace}*{newline}{special_whitespace}*)
+
+quote '
+/* If we see {quote} then {quotecontinue}, the quoted string continues */
+quotecontinue {whitespace_with_newline}{quote}
+
+/*
+ * {quotecontinuefail} is needed to avoid lexer backup when we fail to match
+ * {quotecontinue}. It might seem that this could just be {whitespace}*,
+ * but if there's a dash after {whitespace_with_newline}, it must be consumed
+ * to see if there's another dash --- which would start a {comment} and thus
+ * allow continuation of the {quotecontinue} token.
+ */
+quotecontinuefail {whitespace}*"-"?
+
+/* Extended quote
+ * xqdouble implements embedded quote, ''''
+ */
+xqstart {quote}
+xqdouble {quote}{quote}
+xqinside [^']+
+
+/* Double quote
+ * Allows embedded spaces and other special characters into identifiers.
+ */
+dquote \"
+xdstart {dquote}
+xdstop {dquote}
+xddouble {dquote}{dquote}
+xdinside [^"]+
+
+digit [0-9]
+ident_start [A-Za-z\200-\377_]
+ident_cont [A-Za-z\200-\377_0-9\$]
+
+identifier {ident_start}{ident_cont}*
+
+decimal (({digit}+)|({digit}*\.{digit}+)|({digit}+\.{digit}*))
+
+other .
+
+%%
+
+{whitespace} {
+ /* ignore */
+ }
+
+
+{xqstart} {
+ yyextra->saw_non_ascii = false;
+ SET_YYLLOC();
+ BEGIN(xq);
+ startlit();
+}
+<xq>{quote} {
+ /*
+ * When we are scanning a quoted string and see an end
+ * quote, we must look ahead for a possible continuation.
+ * If we don't see one, we know the end quote was in fact
+ * the end of the string. To reduce the lexer table size,
+ * we use a single "xqs" state to do the lookahead for all
+ * types of strings.
+ */
+ yyextra->state_before_str_stop = YYSTATE;
+ BEGIN(xqs);
+ }
+<xqs>{quotecontinue} {
+ /*
+ * Found a quote continuation, so return to the in-quote
+ * state and continue scanning the literal. Nothing is
+ * added to the literal's contents.
+ */
+ BEGIN(yyextra->state_before_str_stop);
+ }
+<xqs>{quotecontinuefail} |
+<xqs>{other} |
+<xqs><<EOF>> {
+ /*
+ * Failed to see a quote continuation. Throw back
+ * everything after the end quote, and handle the string
+ * according to the state we were in previously.
+ */
+ yyless(0);
+ BEGIN(INITIAL);
+
+ switch (yyextra->state_before_str_stop)
+ {
+ case xq:
+ /*
+ * Check that the data remains valid, if it might
+ * have been made invalid by unescaping any chars.
+ */
+ if (yyextra->saw_non_ascii)
+ pg_verifymbstr(yyextra->literalbuf,
+ yyextra->literallen,
+ false);
+ yylval->str = litbufdup(yyscanner);
+ return SCONST;
+ default:
+ yyerror("unhandled previous state in xqs");
+ }
+ }
+
+<xq>{xqdouble} {
+ addlitchar('\'', yyscanner);
+ }
+<xq>{xqinside} {
+ addlit(yytext, yyleng, yyscanner);
+ }
+<xq><<EOF>> { yyerror("unterminated quoted string"); }
+
+
+{xdstart} {
+ SET_YYLLOC();
+ BEGIN(xd);
+ startlit();
+ }
+<xd>{xdstop} {
+ char *ident;
+
+ BEGIN(INITIAL);
+ if (yyextra->literallen == 0)
+ yyerror("zero-length delimited identifier");
+ ident = litbufdup(yyscanner);
+ if (yyextra->literallen >= NAMEDATALEN)
+ truncate_identifier(ident, yyextra->literallen, true);
+ yylval->str = ident;
+ return IDENT;
+ }
+<xd>{xddouble} {
+ addlitchar('"', yyscanner);
+ }
+<xd>{xdinside} {
+ addlit(yytext, yyleng, yyscanner);
+ }
+<xd><<EOF>> { yyerror("unterminated quoted identifier"); }
+
+{decimal} {
+ SET_YYLLOC();
+ yylval->str = pstrdup(yytext);
+ return FCONST;
+ }
+
+{identifier} {
+ const sqlol_ScanKeyword *keyword;
+ char *ident;
+
+ SET_YYLLOC();
+
+ /* Is it a keyword? */
+ keyword = sqlol_ScanKeywordLookup(yytext,
+ yyextra->keywords,
+ yyextra->num_keywords);
+ if (keyword != NULL)
+ {
+ yylval->keyword = keyword->name;
+ return keyword->value;
+ }
+
+ /*
+ * No. Convert the identifier to lower case, and truncate
+ * if necessary.
+ */
+ ident = downcase_truncate_identifier(yytext, yyleng, true);
+ yylval->str = ident;
+ return IDENT;
+ }
+
+{other} {
+ SET_YYLLOC();
+ return yytext[0];
+ }
+
+<<EOF>> {
+ SET_YYLLOC();
+ yyterminate();
+ }
+
+%%
+
+/* LCOV_EXCL_STOP */
+
+/*
+ * Arrange access to yyextra for subroutines of the main yylex() function.
+ * We expect each subroutine to have a yyscanner parameter. Rather than
+ * use the yyget_xxx functions, which might or might not get inlined by the
+ * compiler, we cheat just a bit and cast yyscanner to the right type.
+ */
+#undef yyextra
+#define yyextra (((struct yyguts_t *) yyscanner)->yyextra_r)
+
+/* Likewise for a couple of other things we need. */
+#undef yylloc
+#define yylloc (((struct yyguts_t *) yyscanner)->yylloc_r)
+#undef yyleng
+#define yyleng (((struct yyguts_t *) yyscanner)->yyleng_r)
+
+
+/*
+ * scanner_errposition
+ * Report a lexer or grammar error cursor position, if possible.
+ *
+ * This is expected to be used within an ereport() call. The return value
+ * is a dummy (always 0, in fact).
+ *
+ * Note that this can only be used for messages emitted during raw parsing
+ * (essentially, sqlol_scan.l, sqlol_parser.c, and sqlol_gram.y), since it
+ * requires the yyscanner struct to still be available.
+ */
+int
+sqlol_scanner_errposition(int location, sqlol_yyscan_t yyscanner)
+{
+ int pos;
+
+ if (location < 0)
+ return 0; /* no-op if location is unknown */
+
+ /* Convert byte offset to character number */
+ pos = pg_mbstrlen_with_len(yyextra->scanbuf, location) + 1;
+ /* And pass it to the ereport mechanism */
+ return errposition(pos);
+}
+
+/*
+ * scanner_yyerror
+ * Report a lexer or grammar error.
+ *
+ * Just ignore the error, as we'll fall back on raw_parser().
+ */
+void
+sqlol_scanner_yyerror(const char *message, sqlol_yyscan_t yyscanner)
+{
+ return;
+}
+
+
+/*
+ * Called before any actual parsing is done
+ */
+sqlol_yyscan_t
+sqlol_scanner_init(const char *str,
+ sqlol_yy_extra_type *yyext,
+ const sqlol_ScanKeyword *keywords,
+ int num_keywords)
+{
+ Size slen = strlen(str);
+ yyscan_t scanner;
+
+ if (yylex_init(&scanner) != 0)
+ elog(ERROR, "yylex_init() failed: %m");
+
+ sqlol_yyset_extra(yyext, scanner);
+
+ yyext->keywords = keywords;
+ yyext->num_keywords = num_keywords;
+
+ /*
+ * Make a scan buffer with special termination needed by flex.
+ */
+ yyext->scanbuf = (char *) palloc(slen + 2);
+ yyext->scanbuflen = slen;
+ memcpy(yyext->scanbuf, str, slen);
+ yyext->scanbuf[slen] = yyext->scanbuf[slen + 1] = YY_END_OF_BUFFER_CHAR;
+ yy_scan_buffer(yyext->scanbuf, slen + 2, scanner);
+
+ /* initialize literal buffer to a reasonable but expansible size */
+ yyext->literalalloc = 1024;
+ yyext->literalbuf = (char *) palloc(yyext->literalalloc);
+ yyext->literallen = 0;
+
+ return scanner;
+}
+
+
+/*
+ * Called after parsing is done to clean up after scanner_init()
+ */
+void
+sqlol_scanner_finish(sqlol_yyscan_t yyscanner)
+{
+ /*
+ * We don't bother to call yylex_destroy(), because all it would do is
+ * pfree a small amount of control storage. It's cheaper to leak the
+ * storage until the parsing context is destroyed. The amount of space
+ * involved is usually negligible compared to the output parse tree
+ * anyway.
+ *
+ * We do bother to pfree the scanbuf and literal buffer, but only if they
+ * represent a nontrivial amount of space. The 8K cutoff is arbitrary.
+ */
+ if (yyextra->scanbuflen >= 8192)
+ pfree(yyextra->scanbuf);
+ if (yyextra->literalalloc >= 8192)
+ pfree(yyextra->literalbuf);
+}
+
+
+static void
+addlit(char *ytext, int yleng, sqlol_yyscan_t yyscanner)
+{
+ /* enlarge buffer if needed */
+ if ((yyextra->literallen + yleng) >= yyextra->literalalloc)
+ {
+ do
+ {
+ yyextra->literalalloc *= 2;
+ } while ((yyextra->literallen + yleng) >= yyextra->literalalloc);
+ yyextra->literalbuf = (char *) repalloc(yyextra->literalbuf,
+ yyextra->literalalloc);
+ }
+ /* append new data */
+ memcpy(yyextra->literalbuf + yyextra->literallen, ytext, yleng);
+ yyextra->literallen += yleng;
+}
+
+
+static void
+addlitchar(unsigned char ychar, sqlol_yyscan_t yyscanner)
+{
+ /* enlarge buffer if needed */
+ if ((yyextra->literallen + 1) >= yyextra->literalalloc)
+ {
+ yyextra->literalalloc *= 2;
+ yyextra->literalbuf = (char *) repalloc(yyextra->literalbuf,
+ yyextra->literalalloc);
+ }
+ /* append new data */
+ yyextra->literalbuf[yyextra->literallen] = ychar;
+ yyextra->literallen += 1;
+}
+
+
+/*
+ * Create a palloc'd copy of literalbuf, adding a trailing null.
+ */
+static char *
+litbufdup(sqlol_yyscan_t yyscanner)
+{
+ int llen = yyextra->literallen;
+ char *new;
+
+ new = palloc(llen + 1);
+ memcpy(new, yyextra->literalbuf, llen);
+ new[llen] = '\0';
+ return new;
+}
+
+/*
+ * Interface functions to make flex use palloc() instead of malloc().
+ * It'd be better to make these static, but flex insists otherwise.
+ */
+
+void *
+sqlol_yyalloc(yy_size_t bytes, sqlol_yyscan_t yyscanner)
+{
+ return palloc(bytes);
+}
+
+void *
+sqlol_yyrealloc(void *ptr, yy_size_t bytes, sqlol_yyscan_t yyscanner)
+{
+ if (ptr)
+ return repalloc(ptr, bytes);
+ else
+ return palloc(bytes);
+}
+
+void
+sqlol_yyfree(void *ptr, sqlol_yyscan_t yyscanner)
+{
+ if (ptr)
+ pfree(ptr);
+}
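addlit()/addlitchar() above grow literalbuf by repeated doubling, so a literal assembled from many small scanner rules costs amortized O(1) per appended byte. A standalone sketch of that growth policy (hypothetical litbuf type, with malloc/realloc standing in for palloc/repalloc):

```c
#include <stdlib.h>
#include <string.h>

typedef struct litbuf
{
    char   *buf;
    int     len;        /* bytes currently stored */
    int     alloc;      /* allocated size */
} litbuf;

static void
litbuf_init(litbuf *lb)
{
    lb->alloc = 1024;           /* reasonable but expansible start size */
    lb->buf = malloc(lb->alloc);
    lb->len = 0;
}

static void
litbuf_add(litbuf *lb, const char *data, int dlen)
{
    /* enlarge by doubling until the new data fits, leaving room for a NUL */
    if (lb->len + dlen >= lb->alloc)
    {
        do
        {
            lb->alloc *= 2;
        } while (lb->len + dlen >= lb->alloc);
        lb->buf = realloc(lb->buf, lb->alloc);
    }
    /* append new data; the buffer is not kept NUL-terminated */
    memcpy(lb->buf + lb->len, data, dlen);
    lb->len += dlen;
}
```

As in the scanner, the stored string is not NUL-terminated; the `>=` test guarantees there is always room to add a trailing NUL later (litbufdup() does exactly that).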
diff --git a/contrib/sqlol/sqlol_scanner.h b/contrib/sqlol/sqlol_scanner.h
new file mode 100644
index 0000000000..0a497e9d91
--- /dev/null
+++ b/contrib/sqlol/sqlol_scanner.h
@@ -0,0 +1,118 @@
+/*-------------------------------------------------------------------------
+ *
+ * sqlol_scanner.h
+ * API for the core scanner (flex machine)
+ *
+ *
+ * Portions Copyright (c) 1996-2021, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * contrib/sqlol/sqlol_scanner.h
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#ifndef SQLOL_SCANNER_H
+#define SQLOL_SCANNER_H
+
+#include "sqlol_keywords.h"
+
+/*
+ * The scanner returns extra data about scanned tokens in this union type.
+ * Note that this is a subset of the fields used in YYSTYPE of the bison
+ * parsers built atop the scanner.
+ */
+typedef union sqlol_YYSTYPE
+{
+ int ival; /* for integer literals */
+ char *str; /* for identifiers and non-integer literals */
+ const char *keyword; /* canonical spelling of keywords */
+} sqlol_YYSTYPE;
+
+/*
+ * We track token locations in terms of byte offsets from the start of the
+ * source string, not the column number/line number representation that
+ * bison uses by default. Also, to minimize overhead we track only one
+ * location (usually the first token location) for each construct, not
+ * the beginning and ending locations as bison does by default. It's
+ * therefore sufficient to make YYLTYPE an int.
+ */
+#define YYLTYPE int
+
+/*
+ * Another important component of the scanner's API is the token code numbers.
+ * However, those are not defined in this file, because bison insists on
+ * defining them for itself. The token codes used by the core scanner are
+ * the ASCII characters plus these:
+ * %token <str> IDENT UIDENT FCONST SCONST USCONST BCONST XCONST Op
+ * %token <ival> ICONST PARAM
+ * %token TYPECAST DOT_DOT COLON_EQUALS EQUALS_GREATER
+ * %token LESS_EQUALS GREATER_EQUALS NOT_EQUALS
+ * The above token definitions *must* be the first ones declared in any
+ * bison parser built atop this scanner, so that they will have consistent
+ * numbers assigned to them (specifically, IDENT = 258 and so on).
+ */
+
+/*
+ * The YY_EXTRA data that a flex scanner allows us to pass around.
+ * Private state needed by the core scanner goes here. Note that the actual
+ * yy_extra struct may be larger and have this as its first component, thus
+ * allowing the calling parser to keep some fields of its own in YY_EXTRA.
+ */
+typedef struct sqlol_yy_extra_type
+{
+ /*
+ * The string the scanner is physically scanning. We keep this mainly so
+ * that we can cheaply compute the offset of the current token (yytext).
+ */
+ char *scanbuf;
+ Size scanbuflen;
+
+ /*
+ * The keyword list to use, and the associated grammar token codes.
+ */
+ const sqlol_ScanKeyword *keywords;
+ int num_keywords;
+
+ /*
+ * literalbuf is used to accumulate literal values when multiple rules are
+ * needed to parse a single literal. Call startlit() to reset buffer to
+ * empty, addlit() to add text. NOTE: the string in literalbuf is NOT
+ * necessarily null-terminated, but there always IS room to add a trailing
+ * null at offset literallen. We store a null only when we need it.
+ */
+ char *literalbuf; /* palloc'd expandable buffer */
+ int literallen; /* actual current string length */
+ int literalalloc; /* current allocated buffer size */
+
+ /*
+ * Random assorted scanner state.
+ */
+ int state_before_str_stop; /* start cond. before end quote */
+ YYLTYPE save_yylloc; /* one-element stack for PUSH_YYLLOC() */
+
+ /* state variables for literal-lexing warnings */
+ bool saw_non_ascii;
+} sqlol_yy_extra_type;
+
+/*
+ * The type of yyscanner is opaque outside scan.l.
+ */
+typedef void *sqlol_yyscan_t;
+
+
+/* Constant data exported from contrib/sqlol/sqlol_scan.l */
+extern PGDLLIMPORT const uint16 sqlol_ScanKeywordTokens[];
+
+/* Entry points in contrib/sqlol/sqlol_scan.l */
+extern sqlol_yyscan_t sqlol_scanner_init(const char *str,
+ sqlol_yy_extra_type *yyext,
+ const sqlol_ScanKeyword *keywords,
+ int num_keywords);
+extern void sqlol_scanner_finish(sqlol_yyscan_t yyscanner);
+extern int sqlol_yylex(sqlol_YYSTYPE *lvalp, YYLTYPE *llocp,
+ sqlol_yyscan_t yyscanner);
+extern int sqlol_scanner_errposition(int location, sqlol_yyscan_t yyscanner);
+extern void sqlol_scanner_yyerror(const char *message, sqlol_yyscan_t yyscanner);
+
+#endif /* SQLOL_SCANNER_H */
--
2.31.1
Attachment: v4-0003-Add-a-new-MODE_SINGLE_QUERY-to-the-core-parser-an.patch (text/x-diff; charset=us-ascii)
From b362e0238049009c586fdadb2fb53b2eb5dc6e3d Mon Sep 17 00:00:00 2001
From: Julien Rouhaud <julien.rouhaud@free.fr>
Date: Thu, 22 Apr 2021 01:33:42 +0800
Subject: [PATCH v4 3/4] Add a new MODE_SINGLE_QUERY to the core parser and use
it in pg_parse_query.
If a third-party module provides a parser_hook, pg_parse_query() switches to
single-query parsing so multi-query commands using different grammars can work
properly. If the third-party module supports the full set of SQL we support,
or wants to prevent any fallback on the core parser, it can ignore
MODE_SINGLE_QUERY and parse the full query string. In that case it must either
return a List with more than one RawStmt, or a single RawStmt with a 0 length,
to stop the parsing phase, or raise an ERROR.
Otherwise, plugins should parse a single query only and always return a List
containing a single RawStmt with a properly set length (possibly 0 if it was a
single query without an end-of-query delimiter). If the command is valid but
doesn't contain any statement (e.g. a single semicolon), a single RawStmt with
a NULL stmt field should be returned, containing the consumed query string
length, so we can move to the next command in a single pass rather than 1 byte
at a time.
Also, third-party modules can choose to ignore some or all parsing errors if
they want to implement only a subset of the syntax supported by postgres, or
even a totally different syntax, and fall back on the core grammar for
unhandled cases. In that case, they should set the error flag to true. The
returned List will be ignored and the same offset of the input string will be
parsed using the core parser.
Finally, note that third-party plugins that want to fall back on another
grammar should first call any previous parser hook before setting the error
flag and returning.
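The single-query contract described above boils down to: consume exactly one command and report how many bytes were consumed, so the caller can advance to the next command without rescanning. A deliberately simplistic standalone sketch of that accounting (it treats every ';' as a terminator and ignores quoting and comments, which a real hook must of course handle):

```c
#include <string.h>

/*
 * Return the number of bytes consumed for the command starting at
 * str + offset, including the trailing semicolon if any; 0 if nothing
 * remains. In the hook contract this is what a plugin would report in
 * the returned RawStmt's stmt_len.
 */
static int
consume_one_command(const char *str, int offset)
{
    const char *start = str + offset;
    const char *semi;

    if (*start == '\0')
        return 0;               /* end of input reached */

    semi = strchr(start, ';');
    if (semi == NULL)
        return (int) strlen(start);     /* last command, runs to the end */

    return (int) (semi - start) + 1;    /* include the ';' delimiter */
}
```

A caller can then loop: parse one command, add the consumed length to the offset, and hand the remainder to the next grammar (or the core parser) on error.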
---
.../pg_stat_statements/pg_stat_statements.c | 3 +-
src/backend/commands/tablecmds.c | 2 +-
src/backend/executor/spi.c | 4 +-
src/backend/parser/gram.y | 29 +++-
src/backend/parser/parse_type.c | 2 +-
src/backend/parser/parser.c | 15 +-
src/backend/parser/scan.l | 26 +++-
src/backend/tcop/postgres.c | 138 ++++++++++++++++--
src/include/parser/parser.h | 5 +-
src/include/parser/scanner.h | 6 +-
src/include/tcop/tcopprot.h | 3 +-
src/pl/plpgsql/src/pl_gram.y | 2 +-
src/pl/plpgsql/src/pl_scanner.c | 2 +-
13 files changed, 210 insertions(+), 27 deletions(-)
diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index 09433c8c96..d852575613 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -2718,7 +2718,8 @@ fill_in_constant_lengths(JumbleState *jstate, const char *query,
yyscanner = scanner_init(query,
&yyextra,
&ScanKeywords,
- ScanKeywordTokens);
+ ScanKeywordTokens,
+ 0);
/* we don't want to re-emit any escape string warnings */
yyextra.escape_string_warning = false;
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 028e8ac46b..284933c693 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -12677,7 +12677,7 @@ ATPostAlterTypeParse(Oid oldId, Oid oldRelId, Oid refRelId, char *cmd,
* parse_analyze() or the rewriter, but instead we need to pass them
* through parse_utilcmd.c to make them ready for execution.
*/
- raw_parsetree_list = raw_parser(cmd, RAW_PARSE_DEFAULT);
+ raw_parsetree_list = raw_parser(cmd, RAW_PARSE_DEFAULT, 0);
querytree_list = NIL;
foreach(list_item, raw_parsetree_list)
{
diff --git a/src/backend/executor/spi.c b/src/backend/executor/spi.c
index b8bd05e894..f05b3ce9e7 100644
--- a/src/backend/executor/spi.c
+++ b/src/backend/executor/spi.c
@@ -2120,7 +2120,7 @@ _SPI_prepare_plan(const char *src, SPIPlanPtr plan)
/*
* Parse the request string into a list of raw parse trees.
*/
- raw_parsetree_list = raw_parser(src, plan->parse_mode);
+ raw_parsetree_list = raw_parser(src, plan->parse_mode, 0);
/*
* Do parse analysis and rule rewrite for each raw parsetree, storing the
@@ -2228,7 +2228,7 @@ _SPI_prepare_oneshot_plan(const char *src, SPIPlanPtr plan)
/*
* Parse the request string into a list of raw parse trees.
*/
- raw_parsetree_list = raw_parser(src, plan->parse_mode);
+ raw_parsetree_list = raw_parser(src, plan->parse_mode, 0);
/*
* Construct plancache entries, but don't do parse analysis yet.
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index eb24195438..911bb4d24b 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -626,7 +626,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
%token <str> IDENT UIDENT FCONST SCONST USCONST BCONST XCONST Op
%token <ival> ICONST PARAM
%token TYPECAST DOT_DOT COLON_EQUALS EQUALS_GREATER
-%token LESS_EQUALS GREATER_EQUALS NOT_EQUALS
+%token LESS_EQUALS GREATER_EQUALS NOT_EQUALS END_OF_FILE
/*
* If you want to make any keyword changes, update the keyword table in
@@ -753,6 +753,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
%token MODE_PLPGSQL_ASSIGN1
%token MODE_PLPGSQL_ASSIGN2
%token MODE_PLPGSQL_ASSIGN3
+%token MODE_SINGLE_QUERY
/* Precedence: lowest to highest */
@@ -858,6 +859,32 @@ parse_toplevel:
pg_yyget_extra(yyscanner)->parsetree =
list_make1(makeRawStmt((Node *) n, 0));
}
+ | MODE_SINGLE_QUERY toplevel_stmt ';'
+ {
+ RawStmt *raw = makeRawStmt($2, 0);
+ updateRawStmtEnd(raw, @3 + 1);
+ /* NOTE: we can return a raw statement containing a NULL stmt.
+ * This is done to allow pg_parse_query to ignore that part of
+ * the input string and move to the next command.
+ */
+ pg_yyget_extra(yyscanner)->parsetree = list_make1(raw);
+ YYACCEPT;
+ }
+ /*
+ * We need to explicitly look for EOF to parse non-semicolon
+ * terminated statements in single query mode, as we could
+ * otherwise successfully parse the beginning of an otherwise
+ * invalid query.
+ */
+ | MODE_SINGLE_QUERY toplevel_stmt END_OF_FILE
+ {
+ /* NOTE: we can return a raw statement containing a NULL stmt.
+ * This is done to allow pg_parse_query to ignore that part of
+ * the input string.
+ */
+ pg_yyget_extra(yyscanner)->parsetree = list_make1(makeRawStmt($2, 0));
+ YYACCEPT;
+ }
;
/*
diff --git a/src/backend/parser/parse_type.c b/src/backend/parser/parse_type.c
index abe131ebeb..e9a7b5d62a 100644
--- a/src/backend/parser/parse_type.c
+++ b/src/backend/parser/parse_type.c
@@ -746,7 +746,7 @@ typeStringToTypeName(const char *str)
ptserrcontext.previous = error_context_stack;
error_context_stack = &ptserrcontext;
- raw_parsetree_list = raw_parser(str, RAW_PARSE_TYPE_NAME);
+ raw_parsetree_list = raw_parser(str, RAW_PARSE_TYPE_NAME, 0);
error_context_stack = ptserrcontext.previous;
diff --git a/src/backend/parser/parser.c b/src/backend/parser/parser.c
index 875de7ba28..23fd49e74c 100644
--- a/src/backend/parser/parser.c
+++ b/src/backend/parser/parser.c
@@ -37,17 +37,25 @@ static char *str_udeescape(const char *str, char escape,
*
* Returns a list of raw (un-analyzed) parse trees. The contents of the
* list have the form required by the specified RawParseMode.
+ *
+ * For all modes other than MODE_SINGLE_QUERY, the caller should provide a 0
+ * offset as the whole input string should be parsed. Otherwise, the caller
+ * should provide the wanted offset in the input string, or -1 if no offset is
+ * required.
*/
List *
-raw_parser(const char *str, RawParseMode mode)
+raw_parser(const char *str, RawParseMode mode, int offset)
{
core_yyscan_t yyscanner;
base_yy_extra_type yyextra;
int yyresult;
+ Assert((mode != RAW_PARSE_SINGLE_QUERY && offset == 0) ||
+ (mode == RAW_PARSE_SINGLE_QUERY && offset != 0));
+
/* initialize the flex scanner */
yyscanner = scanner_init(str, &yyextra.core_yy_extra,
- &ScanKeywords, ScanKeywordTokens);
+ &ScanKeywords, ScanKeywordTokens, offset);
/* base_yylex() only needs us to initialize the lookahead token, if any */
if (mode == RAW_PARSE_DEFAULT)
@@ -61,7 +69,8 @@ raw_parser(const char *str, RawParseMode mode)
MODE_PLPGSQL_EXPR, /* RAW_PARSE_PLPGSQL_EXPR */
MODE_PLPGSQL_ASSIGN1, /* RAW_PARSE_PLPGSQL_ASSIGN1 */
MODE_PLPGSQL_ASSIGN2, /* RAW_PARSE_PLPGSQL_ASSIGN2 */
- MODE_PLPGSQL_ASSIGN3 /* RAW_PARSE_PLPGSQL_ASSIGN3 */
+ MODE_PLPGSQL_ASSIGN3, /* RAW_PARSE_PLPGSQL_ASSIGN3 */
+ MODE_SINGLE_QUERY /* RAW_PARSE_SINGLE_QUERY */
};
yyextra.have_lookahead = true;
diff --git a/src/backend/parser/scan.l b/src/backend/parser/scan.l
index 9f9d8a1706..8ccbe95ac6 100644
--- a/src/backend/parser/scan.l
+++ b/src/backend/parser/scan.l
@@ -1041,7 +1041,10 @@ other .
<<EOF>> {
SET_YYLLOC();
- yyterminate();
+ if (yyextra->return_eof)
+ return END_OF_FILE;
+ else
+ yyterminate();
}
%%
@@ -1189,8 +1192,10 @@ core_yyscan_t
scanner_init(const char *str,
core_yy_extra_type *yyext,
const ScanKeywordList *keywordlist,
- const uint16 *keyword_tokens)
+ const uint16 *keyword_tokens,
+ int offset)
{
+ YY_BUFFER_STATE state;
Size slen = strlen(str);
yyscan_t scanner;
@@ -1213,13 +1218,28 @@ scanner_init(const char *str,
yyext->scanbuflen = slen;
memcpy(yyext->scanbuf, str, slen);
yyext->scanbuf[slen] = yyext->scanbuf[slen + 1] = YY_END_OF_BUFFER_CHAR;
- yy_scan_buffer(yyext->scanbuf, slen + 2, scanner);
+ state = yy_scan_buffer(yyext->scanbuf, slen + 2, scanner);
/* initialize literal buffer to a reasonable but expansible size */
yyext->literalalloc = 1024;
yyext->literalbuf = (char *) palloc(yyext->literalalloc);
yyext->literallen = 0;
+ /*
+ * Note that pg_parse_query will set a -1 offset rather than 0 for the
+ * first query of a possibly multi-query string if it wants us to return an
+ * EOF token.
+ */
+ yyext->return_eof = (offset != 0);
+
+ /*
+ * Adjust the offset in the input string. This is required in single-query
+ * mode, as we need to register the same token locations as we would have
+ * in normal mode with multi-statement query string.
+ */
+ if (offset > 0)
+ state->yy_buf_pos += offset;
+
return scanner;
}
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index e941b59b85..9331628add 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -602,17 +602,137 @@ ProcessClientWriteInterrupt(bool blocked)
List *
pg_parse_query(const char *query_string)
{
- List *raw_parsetree_list = NIL;
+ List *result = NIL;
+ int stmt_len, offset;
TRACE_POSTGRESQL_QUERY_PARSE_START(query_string);
if (log_parser_stats)
ResetUsage();
- if (parser_hook)
- raw_parsetree_list = (*parser_hook) (query_string, RAW_PARSE_DEFAULT);
- else
- raw_parsetree_list = raw_parser(query_string, RAW_PARSE_DEFAULT);
+ stmt_len = 0; /* lazily computed when needed */
+ offset = 0;
+
+ while(true)
+ {
+ List *raw_parsetree_list;
+ RawStmt *raw;
+ bool error = false;
+
+ /*----------------
+ * Start parsing the input string. If a third-party module provided a
+ * parser_hook, we switch to single-query parsing so multi-query
+ * commands using different grammars can work properly.
+ * If a third-party module supports the full set of SQL we support,
+ * or wants to prevent fallback on the core parser, it can ignore the
+ * RAW_PARSE_SINGLE_QUERY flag and parse the full query string.
+ * In that case they must return a List with more than one RawStmt or a
+ * single RawStmt with a 0 length to stop the parsing phase, or raise
+ * an ERROR.
+ *
+ * Otherwise, plugins should parse a single query only and always
+ * return a List containing a single RawStmt with a properly set length
+ * (possibly 0 if it was a single query without end of query
+ * delimiter). If the command is valid but doesn't contain any
+ * statements (e.g. a single semi-colon), a single RawStmt with a NULL
+ * stmt field should be returned, containing the consumed query string
+ * length so we can move to the next command in a single pass rather
+ * than 1 byte at a time.
+ *
+ * Also, third-party modules can choose to ignore some or all
+ * parsing errors if they want to implement only a subset of the
+ * syntax supported by postgres, or even a totally different syntax,
+ * and fall back on the core grammar for unhandled cases. In that
+ * case, they should set the error flag to true. The returned List
+ * will be ignored and the same offset of the input string will be
+ * parsed using the core parser.
+ *
+ * Finally, note that third-party modules that want to fall back on
+ * another grammar should first try to call a previous parser hook,
+ * if any, before setting the error switch and returning.
+ */
+ if (parser_hook)
+ raw_parsetree_list = (*parser_hook) (query_string,
+ RAW_PARSE_SINGLE_QUERY,
+ offset,
+ &error);
+
+ /*
+ * If a third-party module couldn't parse a single query or if no
+ * third-party module is configured, fallback on core parser.
+ */
+ if (error || !parser_hook)
+ {
+ /* Send a -1 offset to raw_parser to specify that it should
+ * explicitly detect EOF during parsing. scanner_init() will treat
+ * it the same as a 0 offset.
+ */
+ raw_parsetree_list = raw_parser(query_string,
+ error ? RAW_PARSE_SINGLE_QUERY : RAW_PARSE_DEFAULT,
+ (error && offset == 0) ? -1 : offset);
+ }
+
+ /*
+ * If there is no third-party plugin, or none of the parsers found a
+ * valid query, or if a third party module consumed the whole
+ * query string we're done.
+ */
+ if (!parser_hook || raw_parsetree_list == NIL ||
+ list_length(raw_parsetree_list) > 1)
+ {
+ /*
+ * Warn third-party plugins if they mix "single query" and "whole
+ * input string" strategy rather than silently accepting it and
+ * maybe allow fallback on core grammar even if they want to avoid
+ * that. This way plugin authors can be warned early of the issue.
+ */
+ if (result != NIL)
+ {
+ Assert(parser_hook != NULL);
+ elog(ERROR, "parser_hook should parse a single statement at "
+ "a time or consume the whole input string at once");
+ }
+ result = raw_parsetree_list;
+ break;
+ }
+
+ if (stmt_len == 0)
+ stmt_len = strlen(query_string);
+
+ raw = linitial_node(RawStmt, raw_parsetree_list);
+
+ /*
+ * In single-query mode, the parser will return statement location info
+ * relative to the beginning of complete original string, not the part
+ * we just parsed, so adjust the location info.
+ */
+ if (offset > 0 && raw->stmt_len > 0)
+ {
+ Assert(raw->stmt_len > offset);
+ raw->stmt_location = offset;
+ raw->stmt_len -= offset;
+ }
+
+ /* Ignore the statement if it didn't contain any command. */
+ if (raw->stmt)
+ result = lappend(result, raw);
+
+ if (raw->stmt_len == 0)
+ {
+ /* The statement was the whole string, we're done. */
+ break;
+ }
+ else if (raw->stmt_len + offset >= stmt_len)
+ {
+ /* We consumed all of the input string, we're done. */
+ break;
+ }
+ else
+ {
+ /* Advance the offset to the next command. */
+ offset += raw->stmt_len;
+ }
+ }
if (log_parser_stats)
ShowUsage("PARSER STATISTICS");
@@ -620,13 +740,13 @@ pg_parse_query(const char *query_string)
#ifdef COPY_PARSE_PLAN_TREES
/* Optional debugging check: pass raw parsetrees through copyObject() */
{
- List *new_list = copyObject(raw_parsetree_list);
+ List *new_list = copyObject(result);
/* This checks both copyObject() and the equal() routines... */
- if (!equal(new_list, raw_parsetree_list))
+ if (!equal(new_list, result))
elog(WARNING, "copyObject() failed to produce an equal raw parse tree");
else
- raw_parsetree_list = new_list;
+ result = new_list;
}
#endif
@@ -638,7 +758,7 @@ pg_parse_query(const char *query_string)
TRACE_POSTGRESQL_QUERY_PARSE_DONE(query_string);
- return raw_parsetree_list;
+ return result;
}
/*
diff --git a/src/include/parser/parser.h b/src/include/parser/parser.h
index 853b0f1606..5694ae791a 100644
--- a/src/include/parser/parser.h
+++ b/src/include/parser/parser.h
@@ -41,7 +41,8 @@ typedef enum
RAW_PARSE_PLPGSQL_EXPR,
RAW_PARSE_PLPGSQL_ASSIGN1,
RAW_PARSE_PLPGSQL_ASSIGN2,
- RAW_PARSE_PLPGSQL_ASSIGN3
+ RAW_PARSE_PLPGSQL_ASSIGN3,
+ RAW_PARSE_SINGLE_QUERY
} RawParseMode;
/* Values for the backslash_quote GUC */
@@ -59,7 +60,7 @@ extern PGDLLIMPORT bool standard_conforming_strings;
/* Primary entry point for the raw parsing functions */
-extern List *raw_parser(const char *str, RawParseMode mode);
+extern List *raw_parser(const char *str, RawParseMode mode, int offset);
/* Utility functions exported by gram.y (perhaps these should be elsewhere) */
extern List *SystemFuncName(char *name);
diff --git a/src/include/parser/scanner.h b/src/include/parser/scanner.h
index 0d8182faa0..a2e97be5d5 100644
--- a/src/include/parser/scanner.h
+++ b/src/include/parser/scanner.h
@@ -113,6 +113,9 @@ typedef struct core_yy_extra_type
/* state variables for literal-lexing warnings */
bool warn_on_first_escape;
bool saw_non_ascii;
+
+ /* state variable for returning an EOF token in single query mode */
+ bool return_eof;
} core_yy_extra_type;
/*
@@ -136,7 +139,8 @@ extern PGDLLIMPORT const uint16 ScanKeywordTokens[];
extern core_yyscan_t scanner_init(const char *str,
core_yy_extra_type *yyext,
const ScanKeywordList *keywordlist,
- const uint16 *keyword_tokens);
+ const uint16 *keyword_tokens,
+ int offset);
extern void scanner_finish(core_yyscan_t yyscanner);
extern int core_yylex(core_YYSTYPE *lvalp, YYLTYPE *llocp,
core_yyscan_t yyscanner);
diff --git a/src/include/tcop/tcopprot.h b/src/include/tcop/tcopprot.h
index 131dc2b22e..27201dde1d 100644
--- a/src/include/tcop/tcopprot.h
+++ b/src/include/tcop/tcopprot.h
@@ -45,7 +45,8 @@ typedef enum
extern PGDLLIMPORT int log_statement;
/* Hook for plugins to get control in pg_parse_query() */
-typedef List *(*parser_hook_type) (const char *str, RawParseMode mode);
+typedef List *(*parser_hook_type) (const char *str, RawParseMode mode,
+ int offset, bool *error);
extern PGDLLIMPORT parser_hook_type parser_hook;
extern List *pg_parse_query(const char *query_string);
diff --git a/src/pl/plpgsql/src/pl_gram.y b/src/pl/plpgsql/src/pl_gram.y
index 3fcca43b90..e5a8a6477a 100644
--- a/src/pl/plpgsql/src/pl_gram.y
+++ b/src/pl/plpgsql/src/pl_gram.y
@@ -3656,7 +3656,7 @@ check_sql_expr(const char *stmt, RawParseMode parseMode, int location)
error_context_stack = &syntax_errcontext;
oldCxt = MemoryContextSwitchTo(plpgsql_compile_tmp_cxt);
- (void) raw_parser(stmt, parseMode);
+ (void) raw_parser(stmt, parseMode, 0);
MemoryContextSwitchTo(oldCxt);
/* Restore former ereport callback */
diff --git a/src/pl/plpgsql/src/pl_scanner.c b/src/pl/plpgsql/src/pl_scanner.c
index e4c7a91ab5..a2886c42ec 100644
--- a/src/pl/plpgsql/src/pl_scanner.c
+++ b/src/pl/plpgsql/src/pl_scanner.c
@@ -587,7 +587,7 @@ plpgsql_scanner_init(const char *str)
{
/* Start up the core scanner */
yyscanner = scanner_init(str, &core_yy,
- &ReservedPLKeywords, ReservedPLKeywordTokens);
+ &ReservedPLKeywords, ReservedPLKeywordTokens, 0);
/*
* scanorig points to the original string, which unlike the scanner's
--
2.31.1
Attachment: v4-0004-Teach-sqlol-to-use-the-new-MODE_SINGLE_QUERY-pars.patch (text/x-diff)
From 346b336b959bec03e1e7dbb6827c1cfd150785ff Mon Sep 17 00:00:00 2001
From: Julien Rouhaud <julien.rouhaud@free.fr>
Date: Thu, 22 Apr 2021 02:15:54 +0800
Subject: [PATCH v4 4/4] Teach sqlol to use the new MODE_SINGLE_QUERY parser
mode.
This way multi-statement commands using both the core parser and the sqlol
parser can be supported.
Also add a LOLCODE version of CREATE VIEW viewname AS to easily test
multi-statements commands.
---
contrib/sqlol/Makefile | 2 +
contrib/sqlol/expected/01_sqlol.out | 77 +++++++++++++++++++++++++++++
contrib/sqlol/repro.sql | 18 +++++++
contrib/sqlol/sql/01_sqlol.sql | 44 +++++++++++++++++
contrib/sqlol/sqlol.c | 24 +++++----
contrib/sqlol/sqlol_gram.y | 63 +++++++++++------------
contrib/sqlol/sqlol_kwlist.h | 1 +
contrib/sqlol/sqlol_scan.l | 13 ++++-
contrib/sqlol/sqlol_scanner.h | 3 +-
9 files changed, 199 insertions(+), 46 deletions(-)
create mode 100644 contrib/sqlol/expected/01_sqlol.out
create mode 100644 contrib/sqlol/repro.sql
create mode 100644 contrib/sqlol/sql/01_sqlol.sql
diff --git a/contrib/sqlol/Makefile b/contrib/sqlol/Makefile
index 3850ac3fce..eaf94801c2 100644
--- a/contrib/sqlol/Makefile
+++ b/contrib/sqlol/Makefile
@@ -6,6 +6,8 @@ OBJS = \
sqlol.o sqlol_gram.o sqlol_scan.o sqlol_keywords.o
PGFILEDESC = "sqlol - Toy alternative grammar based on LOLCODE"
+REGRESS = 01_sqlol
+
sqlol_gram.h: sqlol_gram.c
touch $@
diff --git a/contrib/sqlol/expected/01_sqlol.out b/contrib/sqlol/expected/01_sqlol.out
new file mode 100644
index 0000000000..9c51dd62c2
--- /dev/null
+++ b/contrib/sqlol/expected/01_sqlol.out
@@ -0,0 +1,77 @@
+LOAD 'sqlol';
+-- create a base table, falling back on core grammar
+CREATE TABLE t1 (id integer, val text);
+-- test a SQLOL statement
+HAI 1.2 I HAS A t1 GIMMEH id, "val" KTHXBYE\g
+ id | val
+----+-----
+(0 rows)
+
+-- create a view in SQLOL
+HAI 1.2 MAEK I HAS A t1 GIMMEH id, "val" A v0 KTHXBYE\g
+-- combine standard SQL with a trailing SQLOL statement in multi-statements command
+CREATE VIEW v1 AS SELECT * FROM t1\; CREATE VIEW v2 AS SELECT * FROM t1\;HAI 1.2 I HAS A t1 GIMMEH "id", id KTHXBYE\g
+ id | id
+----+----
+(0 rows)
+
+-- interleave standard SQL and SQLOL commands in multi-statements command
+CREATE VIEW v3 AS SELECT * FROM t1\; HAI 1.2 MAEK I HAS A t1 GIMMEH id, "val" A v4 KTHXBYE CREATE VIEW v5 AS SELECT * FROM t1\;HAI 1.2 I HAS A t1 GIMMEH "id", id KTHXBYE\g
+ id | id
+----+----
+(0 rows)
+
+-- test MODE_SINGLE_QUERY with no trailing semicolon
+SELECT 1\;SELECT 2\;SELECT 3 \g
+ ?column?
+----------
+ 3
+(1 row)
+
+-- test empty statement ignoring
+\;\;select 1 \g
+ ?column?
+----------
+ 1
+(1 row)
+
+-- check the created views
+SELECT relname, relkind
+FROM pg_class c
+JOIN pg_namespace n ON c.relnamespace = n.oid
+WHERE nspname = 'public'
+ORDER BY relname COLLATE "C";
+ relname | relkind
+---------+---------
+ t1 | r
+ v0 | v
+ v1 | v
+ v2 | v
+ v3 | v
+ v4 | v
+ v5 | v
+(7 rows)
+
+--
+-- Error position
+--
+SELECT 1\;err;
+ERROR: syntax error at or near "err"
+LINE 1: SELECT 1;err;
+ ^
+-- sqlol won't trigger an error on incorrect GIMME keyword, so core parser will
+-- complain about HAI
+SELECT 1\;HAI 1.2 I HAS A t1 GIMME id KTHXBYE\g
+ERROR: syntax error at or near "HAI"
+LINE 1: SELECT 1;HAI 1.2 I HAS A t1 GIMME id KTHXBYE
+ ^
+-- sqlol will trigger the error about too many qualifiers on t1
+SELECT 1\;HAI 1.2 I HAS A some.thing.public.t1 GIMMEH id KTHXBYE\g
+ERROR: improper qualified name (too many dotted names): some.thing.public.t1
+LINE 1: SELECT 1;HAI 1.2 I HAS A some.thing.public.t1 GIMMEH id KTHX...
+ ^
+-- position reported outside of the parser/scanner should be correct too
+SELECT 1\;SELECT * FROM notatable;
+ERROR: relation "notatable" does not exist
+LINE 1: SELECT 1;SELECT * FROM notatable;
+ ^
diff --git a/contrib/sqlol/repro.sql b/contrib/sqlol/repro.sql
new file mode 100644
index 0000000000..0ebcb53160
--- /dev/null
+++ b/contrib/sqlol/repro.sql
@@ -0,0 +1,18 @@
+DROP TABLE IF EXISTS t1 CASCADE;
+
+LOAD 'sqlol';
+
+\;\; SELECT 1\;
+
+CREATE TABLE t1 (id integer, val text);
+
+HAI 1.2 I HAS A t1 GIMMEH id, "val" KTHXBYE\g
+
+HAI 1.2 MAEK I HAS A t1 GIMMEH id, "val" A v0 KTHXBYE\g
+
+CREATE VIEW v1 AS SELECT * FROM t1\; CREATE VIEW v2 AS SELECT * FROM t1\;HAI 1.2 I HAS A t1 GIMMEH "id", id KTHXBYE\g
+
+CREATE VIEW v3 AS SELECT * FROM t1\; HAI 1.2 MAEK I HAS A t1 GIMMEH id, "val" A v4 KTHXBYE CREATE VIEW v5 AS SELECT * FROM t1\;HAI 1.2 I HAS A t1 GIMMEH "id", id KTHXBYE\g
+
+SELECT 1\;SELECT 2\;SELECT 3 \g
+\d
diff --git a/contrib/sqlol/sql/01_sqlol.sql b/contrib/sqlol/sql/01_sqlol.sql
new file mode 100644
index 0000000000..e89a3dd9a0
--- /dev/null
+++ b/contrib/sqlol/sql/01_sqlol.sql
@@ -0,0 +1,44 @@
+LOAD 'sqlol';
+
+-- create a base table, falling back on core grammar
+CREATE TABLE t1 (id integer, val text);
+
+-- test a SQLOL statement
+HAI 1.2 I HAS A t1 GIMMEH id, "val" KTHXBYE\g
+
+-- create a view in SQLOL
+HAI 1.2 MAEK I HAS A t1 GIMMEH id, "val" A v0 KTHXBYE\g
+
+-- combine standard SQL with a trailing SQLOL statement in multi-statements command
+CREATE VIEW v1 AS SELECT * FROM t1\; CREATE VIEW v2 AS SELECT * FROM t1\;HAI 1.2 I HAS A t1 GIMMEH "id", id KTHXBYE\g
+
+-- interleave standard SQL and SQLOL commands in multi-statements command
+CREATE VIEW v3 AS SELECT * FROM t1\; HAI 1.2 MAEK I HAS A t1 GIMMEH id, "val" A v4 KTHXBYE CREATE VIEW v5 AS SELECT * FROM t1\;HAI 1.2 I HAS A t1 GIMMEH "id", id KTHXBYE\g
+
+-- test MODE_SINGLE_QUERY with no trailing semicolon
+SELECT 1\;SELECT 2\;SELECT 3 \g
+
+-- test empty statement ignoring
+\;\;select 1 \g
+
+-- check the created views
+SELECT relname, relkind
+FROM pg_class c
+JOIN pg_namespace n ON c.relnamespace = n.oid
+WHERE nspname = 'public'
+ORDER BY relname COLLATE "C";
+
+--
+-- Error position
+--
+SELECT 1\;err;
+
+-- sqlol won't trigger an error on incorrect GIMME keyword, so core parser will
+-- complain about HAI
+SELECT 1\;HAI 1.2 I HAS A t1 GIMME id KTHXBYE\g
+
+-- sqlol will trigger the error about too many qualifiers on t1
+SELECT 1\;HAI 1.2 I HAS A some.thing.public.t1 GIMMEH id KTHXBYE\g
+
+-- position reported outside of the parser/scanner should be correct too
+SELECT 1\;SELECT * FROM notatable;
diff --git a/contrib/sqlol/sqlol.c b/contrib/sqlol/sqlol.c
index b986966181..7d4e1b631f 100644
--- a/contrib/sqlol/sqlol.c
+++ b/contrib/sqlol/sqlol.c
@@ -26,7 +26,8 @@ static parser_hook_type prev_parser_hook = NULL;
void _PG_init(void);
void _PG_fini(void);
-static List *sqlol_parser_hook(const char *str, RawParseMode mode);
+static List *sqlol_parser_hook(const char *str, RawParseMode mode, int offset,
+ bool *error);
/*
@@ -54,23 +55,25 @@ _PG_fini(void)
* sqlol_parser_hook: parse our grammar
*/
static List *
-sqlol_parser_hook(const char *str, RawParseMode mode)
+sqlol_parser_hook(const char *str, RawParseMode mode, int offset, bool *error)
{
sqlol_yyscan_t yyscanner;
sqlol_base_yy_extra_type yyextra;
int yyresult;
- if (mode != RAW_PARSE_DEFAULT)
+ if (mode != RAW_PARSE_DEFAULT && mode != RAW_PARSE_SINGLE_QUERY)
{
if (prev_parser_hook)
- return (*prev_parser_hook) (str, mode);
- else
- return raw_parser(str, mode);
+ return (*prev_parser_hook) (str, mode, offset, error);
+
+ *error = true;
+ return NIL;
}
/* initialize the flex scanner */
yyscanner = sqlol_scanner_init(str, &yyextra.sqlol_yy_extra,
- sqlol_ScanKeywords, sqlol_NumScanKeywords);
+ sqlol_ScanKeywords, sqlol_NumScanKeywords,
+ offset);
/* initialize the bison parser */
sqlol_parser_init(&yyextra);
@@ -88,9 +91,10 @@ sqlol_parser_hook(const char *str, RawParseMode mode)
if (yyresult)
{
if (prev_parser_hook)
- return (*prev_parser_hook) (str, mode);
- else
- return raw_parser(str, mode);
+ return (*prev_parser_hook) (str, mode, offset, error);
+
+ *error = true;
+ return NIL;
}
return yyextra.parsetree;
diff --git a/contrib/sqlol/sqlol_gram.y b/contrib/sqlol/sqlol_gram.y
index 64d00d14ca..4c36cfef5e 100644
--- a/contrib/sqlol/sqlol_gram.y
+++ b/contrib/sqlol/sqlol_gram.y
@@ -20,6 +20,7 @@
#include "catalog/namespace.h"
#include "nodes/makefuncs.h"
+#include "catalog/pg_class_d.h"
#include "sqlol_gramparse.h"
@@ -106,10 +107,10 @@ static List *check_indirection(List *indirection, sqlol_yyscan_t yyscanner);
ResTarget *target;
}
-%type <node> stmt toplevel_stmt GimmehStmt simple_gimmeh columnref
+%type <node> stmt toplevel_stmt GimmehStmt MaekStmt simple_gimmeh columnref
indirection_el
-%type <list> parse_toplevel stmtmulti gimmeh_list indirection
+%type <list> parse_toplevel rawstmt gimmeh_list indirection
%type <range> qualified_name
@@ -134,22 +135,19 @@ static List *check_indirection(List *indirection, sqlol_yyscan_t yyscanner);
*/
/* ordinary key words in alphabetical order */
-%token <keyword> A GIMMEH HAI HAS I KTHXBYE
-
+%token <keyword> A GIMMEH HAI HAS I KTHXBYE MAEK
%%
/*
* The target production for the whole parse.
- *
- * Ordinarily we parse a list of statements, but if we see one of the
- * special MODE_XXX symbols as first token, we parse something else.
- * The options here correspond to enum RawParseMode, which see for details.
*/
parse_toplevel:
- stmtmulti
+ rawstmt
{
pg_yyget_extra(yyscanner)->parsetree = $1;
+
+ YYACCEPT;
}
;
@@ -163,24 +161,11 @@ parse_toplevel:
* we'd get -1 for the location in such cases.
* We also take care to discard empty statements entirely.
*/
-stmtmulti: stmtmulti KTHXBYE toplevel_stmt
- {
- if ($1 != NIL)
- {
- /* update length of previous stmt */
- updateRawStmtEnd(llast_node(RawStmt, $1), @2);
- }
- if ($3 != NULL)
- $$ = lappend($1, makeRawStmt($3, @2 + 1));
- else
- $$ = $1;
- }
- | toplevel_stmt
+rawstmt: toplevel_stmt KTHXBYE
{
- if ($1 != NULL)
- $$ = list_make1(makeRawStmt($1, 0));
- else
- $$ = NIL;
+ RawStmt *raw = makeRawStmt($1, 0);
+ updateRawStmtEnd(raw, @2 + 7);
+ $$ = list_make1(raw);
}
;
@@ -189,13 +174,12 @@ stmtmulti: stmtmulti KTHXBYE toplevel_stmt
* those words have different meanings in function bodys.
*/
toplevel_stmt:
- stmt
+ HAI FCONST stmt { $$ = $3; }
;
stmt:
GimmehStmt
- | /*EMPTY*/
- { $$ = NULL; }
+ | MaekStmt
;
/*****************************************************************************
@@ -209,12 +193,11 @@ GimmehStmt:
;
simple_gimmeh:
- HAI FCONST I HAS A qualified_name
- GIMMEH gimmeh_list
+ I HAS A qualified_name GIMMEH gimmeh_list
{
SelectStmt *n = makeNode(SelectStmt);
- n->targetList = $8;
- n->fromClause = list_make1($6);
+ n->targetList = $6;
+ n->fromClause = list_make1($4);
$$ = (Node *)n;
}
;
@@ -233,6 +216,20 @@ gimmeh_el:
$$->location = @1;
}
+MaekStmt:
+ MAEK GimmehStmt A qualified_name
+ {
+ ViewStmt *n = makeNode(ViewStmt);
+ n->view = $4;
+ n->view->relpersistence = RELPERSISTENCE_PERMANENT;
+ n->aliases = NIL;
+ n->query = $2;
+ n->replace = false;
+ n->options = NIL;
+ n->withCheckOption = false;
+ $$ = (Node *) n;
+ }
+
qualified_name:
ColId
{
diff --git a/contrib/sqlol/sqlol_kwlist.h b/contrib/sqlol/sqlol_kwlist.h
index 2de3893ee4..8b50d88df9 100644
--- a/contrib/sqlol/sqlol_kwlist.h
+++ b/contrib/sqlol/sqlol_kwlist.h
@@ -19,3 +19,4 @@ PG_KEYWORD("hai", HAI, RESERVED_KEYWORD)
PG_KEYWORD("has", HAS, UNRESERVED_KEYWORD)
PG_KEYWORD("i", I, UNRESERVED_KEYWORD)
PG_KEYWORD("kthxbye", KTHXBYE, UNRESERVED_KEYWORD)
+PG_KEYWORD("maek", MAEK, UNRESERVED_KEYWORD)
diff --git a/contrib/sqlol/sqlol_scan.l b/contrib/sqlol/sqlol_scan.l
index a7088b8390..e6d4d53446 100644
--- a/contrib/sqlol/sqlol_scan.l
+++ b/contrib/sqlol/sqlol_scan.l
@@ -412,8 +412,10 @@ sqlol_yyscan_t
sqlol_scanner_init(const char *str,
sqlol_yy_extra_type *yyext,
const sqlol_ScanKeyword *keywords,
- int num_keywords)
+ int num_keywords,
+ int offset)
{
+ YY_BUFFER_STATE state;
Size slen = strlen(str);
yyscan_t scanner;
@@ -432,13 +434,20 @@ sqlol_scanner_init(const char *str,
yyext->scanbuflen = slen;
memcpy(yyext->scanbuf, str, slen);
yyext->scanbuf[slen] = yyext->scanbuf[slen + 1] = YY_END_OF_BUFFER_CHAR;
- yy_scan_buffer(yyext->scanbuf, slen + 2, scanner);
+ state = yy_scan_buffer(yyext->scanbuf, slen + 2, scanner);
/* initialize literal buffer to a reasonable but expansible size */
yyext->literalalloc = 1024;
yyext->literalbuf = (char *) palloc(yyext->literalalloc);
yyext->literallen = 0;
+ /*
+ * Adjust the offset in the input string. This is required in single-query
+ * mode, as we need to register the same token locations as we would have
+ * in normal mode with multi-statement query string.
+ */
+ state->yy_buf_pos += offset;
+
return scanner;
}
diff --git a/contrib/sqlol/sqlol_scanner.h b/contrib/sqlol/sqlol_scanner.h
index 0a497e9d91..57f95867ee 100644
--- a/contrib/sqlol/sqlol_scanner.h
+++ b/contrib/sqlol/sqlol_scanner.h
@@ -108,7 +108,8 @@ extern PGDLLIMPORT const uint16 sqlol_ScanKeywordTokens[];
extern sqlol_yyscan_t sqlol_scanner_init(const char *str,
sqlol_yy_extra_type *yyext,
const sqlol_ScanKeyword *keywords,
- int num_keywords);
+ int num_keywords,
+ int offset);
extern void sqlol_scanner_finish(sqlol_yyscan_t yyscanner);
extern int sqlol_yylex(sqlol_YYSTYPE *lvalp, YYLTYPE *llocp,
sqlol_yyscan_t yyscanner);
--
2.31.1
On Sat, Jun 12, 2021 at 4:29 AM Julien Rouhaud <rjuju123@gmail.com> wrote:
I'd like to propose an alternative approach, which is to allow multiple parsers
to coexist, and let third-party parsers optionally fallback on the core
parsers. I'm sending this now as a follow-up of [1] and to avoid duplicated
efforts, as multiple people are interested in that topic.
The patches all build properly and pass all regressions tests.
pg_parse_query() will instruct plugins to parse a query at a time. They're
free to ignore that mode if they want to implement the 3rd mode. If so, they
should either return multiple RawStmt, a single RawStmt with a 0 or
strlen(query_string) stmt_len, or error out. Otherwise, they will implement
either mode 1 or 2, and they should always return a List containing a single
RawStmt with properly set stmt_len, even if the underlying statement is NULL.
This is required to properly skip valid strings that don't contain a
statements, and pg_parse_query() will skip RawStmt that don't contain an
underlying statement.
Wouldn't we want to only loop through the individual statements if parser_hook
exists? The current patch seems to go through the new code path regardless
of the hook being grabbed.
Thanks for the review Jim!
On Wed, Jul 7, 2021 at 3:26 AM Jim Mlodgenski <jimmy76@gmail.com> wrote:
On Sat, Jun 12, 2021 at 4:29 AM Julien Rouhaud <rjuju123@gmail.com> wrote:
The patches all build properly and pass all regressions tests.
Note that the cfbot reports a compilation error on windows. That's on
the grammar extension part, so I'm not really interested in trying
to fix that for now, as it's mostly a quick POC to demonstrate how one
could implement a different grammar and validate that everything works
as expected.
Also, if this patch is eventually committed and having some code to
experience the hook is wanted it would probably be better to have a
very naive parser (based on a few strcmp() calls or something like
that) to validate the behavior rather than having a real parser.
pg_parse_query() will instruct plugins to parse a query at a time. They're
free to ignore that mode if they want to implement the 3rd mode. If so, they
should either return multiple RawStmt, a single RawStmt with a 0 or
strlen(query_string) stmt_len, or error out. Otherwise, they will implement
either mode 1 or 2, and they should always return a List containing a single
RawStmt with properly set stmt_len, even if the underlying statement is NULL.
This is required to properly skip valid strings that don't contain a
statement, and pg_parse_query() will skip RawStmts that don't contain an
underlying statement.
Wouldn't we want to only loop through the individual statements if parser_hook
exists? The current patch seems to go through the new code path regardless
of the hook being grabbed.
I did think about it, but I eventually chose to write it this way.
Having a different code path for the no-hook situation won't make the
with-hook code any easier (it should only remove some check for the
hook in some places that have 2 or 3 other checks already). On the
other hand, having a single code path avoids some (minimal) code
duplication, and also ensures that the main loop is actively tested
even without the hook being set. That's not 100% coverage, but it's
better than nothing. Performance-wise, it shouldn't make any
noticeable difference for the no-hook case.
On Wed, Jul 7, 2021 at 5:26 AM Julien Rouhaud <rjuju123@gmail.com> wrote:
Also, if this patch is eventually committed and having some code to
exercise the hook is wanted, it would probably be better to have a
very naive parser (based on a few strcmp() calls or something like
that) to validate the behavior rather than a real parser.
The test module is very useful to show how to use the hook, but it isn't
very useful to the general user, unlike most other things in contrib. It
probably fits better in src/test/modules.
On Wed, Jul 7, 2021 at 8:45 PM Jim Mlodgenski <jimmy76@gmail.com> wrote:
The test module is very useful to show how to use the hook, but it isn't
very useful to the general user, unlike most other things in contrib. It
probably fits better in src/test/modules.
I agree that it's not useful at all to eventually have it as a
contrib module, but it's somewhat convenient at this stage to be able to
easily test the hook, possibly with different behaviors.
But as I said, if there's an agreement on the approach and the
implementation, I don't think that it would make sense to keep it even
in src/test/modules. A full bison parser, even with a limited
grammar, will be about 99% noise when it comes to demonstrating how
the hook is supposed to work, which basically is having a "single
query" parser or a "full input string" parser. I'm not even convinced
that flex/bison will be the preferred choice for someone who wants to
implement a custom parser.
I tried to add really thorough comments in the various parts of the
patch to make it clear how to do that and how the system will react
depending on what a hook does. I also added some protections to catch
inconsistent hook implementations. I think that's the best way to help
external parser authors implement what they want, and I'll be happy
to improve the comments if necessary. But if eventually people would
like to have a real parser in the tree, for testing or guidance, I
will of course take care of doing the required changes and moving the
demo parser to src/test/modules.
On Sat, Jun 12, 2021 at 1:59 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
On Tue, Jun 08, 2021 at 12:16:48PM +0800, Julien Rouhaud wrote:
On Sun, Jun 06, 2021 at 02:50:19PM +0800, Julien Rouhaud wrote:
On Sat, May 01, 2021 at 03:24:58PM +0800, Julien Rouhaud wrote:
I'm attaching some POC patches that implement this approach to start a
discussion.
The regression tests weren't stable; v4 fixes that.
1) CFBOT showed the following compilation errors in windows:
"C:\projects\postgresql\pgsql.sln" (default target) (1) ->
"C:\projects\postgresql\sqlol.vcxproj" (default target) (69) ->
(ClCompile target) ->
c1 : fatal error C1083: Cannot open source file:
'contrib/sqlol/sqlol_gram.c': No such file or directory
[C:\projects\postgresql\sqlol.vcxproj]
c:\projects\postgresql\contrib\sqlol\sqlol_gramparse.h(25): fatal
error C1083: Cannot open include file: 'sqlol_gram.h': No such file or
directory (contrib/sqlol/sqlol.c)
[C:\projects\postgresql\sqlol.vcxproj]
c:\projects\postgresql\contrib\sqlol\sqlol_gramparse.h(25): fatal
error C1083: Cannot open include file: 'sqlol_gram.h': No such file or
directory (contrib/sqlol/sqlol_keywords.c)
[C:\projects\postgresql\sqlol.vcxproj]
c1 : fatal error C1083: Cannot open source file:
'contrib/sqlol/sqlol_scan.c': No such file or directory
[C:\projects\postgresql\sqlol.vcxproj]
0 Warning(s)
4 Error(s)
Time Elapsed 00:05:40.23
2) There was one small whitespace error with the patch:
git am v4-0002-Add-a-sqlol-parser.patch
Applying: Add a sqlol parser.
.git/rebase-apply/patch:818: new blank line at EOF.
+
warning: 1 line adds whitespace errors.
Regards,
Vignesh
On Thu, Jul 22, 2021 at 12:01:34PM +0530, vignesh C wrote:
1) CFBOT showed the following compilation errors in windows:
Thanks for looking at it. I'm aware of this issue on windows, but as mentioned
in the thread the new contrib is there to demonstrate how the new
infrastructure works. If there were some interest in pushing the patch, I
don't think that we would add a full bison parser, whether it's in contrib or
test modules.
So unless there's a clear indication from a committer that we would want to
integrate such a parser, or if someone is interested in reviewing the patch and
only has a windows machine, I don't plan to spend time trying to fix a windows
only problem for something that will disappear anyway.
2) There was one small whitespace error with the patch:
git am v4-0002-Add-a-sqlol-parser.patch
Applying: Add a sqlol parser.
.git/rebase-apply/patch:818: new blank line at EOF.
+
warning: 1 line adds whitespace errors.
Indeed, there's a trailing empty line in contrib/sqlol/sqlol_keywords.c. I
fixed it locally, but as I said this module is most certainly going to
disappear, so I'm not sending an updated patch right now.
On Thu, Jul 22, 2021 at 03:04:19PM +0800, Julien Rouhaud wrote:
On Thu, Jul 22, 2021 at 12:01:34PM +0530, vignesh C wrote:
1) CFBOT showed the following compilation errors in windows:
Thanks for looking at it. I'm aware of this issue on windows, but as mentioned
in the thread the new contrib is there to demonstrate how the new
infrastructure works. If there were some interest in pushing the patch, I
don't think that we would add a full bison parser, whether it's in contrib or
test modules.
So unless there's a clear indication from a committer that we would want to
integrate such a parser, or if someone is interested in reviewing the patch and
only has a windows machine, I don't plan to spend time trying to fix a windows
only problem for something that will disappear anyway.
I'm not sure what changed in the Windows part of the cfbot, but somehow it's
not hitting any compilation error anymore and all the tests are now green.
v5 attached, fixing conflict with 639a86e36a (Remove Value node struct)
Attachments:
v5-0001-Add-a-parser_hook-hook.patch (text/x-diff; charset=us-ascii)
From 00644b3ec87c11fa4c4a1215ed79238e1407cd29 Mon Sep 17 00:00:00 2001
From: Julien Rouhaud <julien.rouhaud@free.fr>
Date: Wed, 21 Apr 2021 22:47:18 +0800
Subject: [PATCH v5 1/4] Add a parser_hook hook.
This does nothing but allow third-party plugins to implement a different
syntax, and fall back on the core parser if they don't implement a superset of
the supported core syntax.
---
src/backend/tcop/postgres.c | 16 ++++++++++++++--
src/include/tcop/tcopprot.h | 5 +++++
2 files changed, 19 insertions(+), 2 deletions(-)
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 3f9ed549f9..66ee58a4b1 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -99,6 +99,9 @@ int log_statement = LOGSTMT_NONE;
/* GUC variable for maximum stack depth (measured in kilobytes) */
int max_stack_depth = 100;
+/* Hook for plugins to get control in pg_parse_query() */
+parser_hook_type parser_hook = NULL;
+
/* wait N seconds to allow attach from a debugger */
int PostAuthDelay = 0;
@@ -589,18 +592,27 @@ ProcessClientWriteInterrupt(bool blocked)
* database tables. So, we rely on the raw parser to determine whether
* we've seen a COMMIT or ABORT command; when we are in abort state, other
* commands are not processed any further than the raw parse stage.
+ *
+ * To support loadable plugins that monitor the parsing or implement SQL
+ * syntactic sugar, we provide a hook variable that lets a plugin get control
+ * before and after the standard parsing process. If the plugin only implements
+ * a subset of the syntax supported by postgres, it's its duty to call
+ * raw_parser (or the previous hook if any) for statements it doesn't understand.
*/
List *
pg_parse_query(const char *query_string)
{
- List *raw_parsetree_list;
+ List *raw_parsetree_list = NIL;
TRACE_POSTGRESQL_QUERY_PARSE_START(query_string);
if (log_parser_stats)
ResetUsage();
- raw_parsetree_list = raw_parser(query_string, RAW_PARSE_DEFAULT);
+ if (parser_hook)
+ raw_parsetree_list = (*parser_hook) (query_string, RAW_PARSE_DEFAULT);
+ else
+ raw_parsetree_list = raw_parser(query_string, RAW_PARSE_DEFAULT);
if (log_parser_stats)
ShowUsage("PARSER STATISTICS");
diff --git a/src/include/tcop/tcopprot.h b/src/include/tcop/tcopprot.h
index 968345404e..131dc2b22e 100644
--- a/src/include/tcop/tcopprot.h
+++ b/src/include/tcop/tcopprot.h
@@ -17,6 +17,7 @@
#include "nodes/params.h"
#include "nodes/parsenodes.h"
#include "nodes/plannodes.h"
+#include "parser/parser.h"
#include "storage/procsignal.h"
#include "utils/guc.h"
#include "utils/queryenvironment.h"
@@ -43,6 +44,10 @@ typedef enum
extern PGDLLIMPORT int log_statement;
+/* Hook for plugins to get control in pg_parse_query() */
+typedef List *(*parser_hook_type) (const char *str, RawParseMode mode);
+extern PGDLLIMPORT parser_hook_type parser_hook;
+
extern List *pg_parse_query(const char *query_string);
extern List *pg_rewrite_query(Query *query);
extern List *pg_analyze_and_rewrite(RawStmt *parsetree,
--
2.32.0
v5-0002-Add-a-sqlol-parser.patch (text/x-diff; charset=us-ascii)
From a8815ab1cfdd29a5ae77bfff92279e25d7c380db Mon Sep 17 00:00:00 2001
From: Julien Rouhaud <julien.rouhaud@free.fr>
Date: Wed, 21 Apr 2021 23:54:02 +0800
Subject: [PATCH v5 2/4] Add a sqlol parser.
This is a toy example of an alternative grammar that only accepts a LOLCODE
compatible version of a
SELECT [column, ] column FROM tablename
and falls back on the core parser for everything else.
---
contrib/Makefile | 1 +
contrib/sqlol/.gitignore | 7 +
contrib/sqlol/Makefile | 33 ++
contrib/sqlol/sqlol.c | 107 +++++++
contrib/sqlol/sqlol_gram.y | 439 ++++++++++++++++++++++++++
contrib/sqlol/sqlol_gramparse.h | 61 ++++
contrib/sqlol/sqlol_keywords.c | 97 ++++++
contrib/sqlol/sqlol_keywords.h | 38 +++
contrib/sqlol/sqlol_kwlist.h | 21 ++
contrib/sqlol/sqlol_scan.l | 544 ++++++++++++++++++++++++++++++++
contrib/sqlol/sqlol_scanner.h | 118 +++++++
11 files changed, 1466 insertions(+)
create mode 100644 contrib/sqlol/.gitignore
create mode 100644 contrib/sqlol/Makefile
create mode 100644 contrib/sqlol/sqlol.c
create mode 100644 contrib/sqlol/sqlol_gram.y
create mode 100644 contrib/sqlol/sqlol_gramparse.h
create mode 100644 contrib/sqlol/sqlol_keywords.c
create mode 100644 contrib/sqlol/sqlol_keywords.h
create mode 100644 contrib/sqlol/sqlol_kwlist.h
create mode 100644 contrib/sqlol/sqlol_scan.l
create mode 100644 contrib/sqlol/sqlol_scanner.h
diff --git a/contrib/Makefile b/contrib/Makefile
index f27e458482..2a80cd137b 100644
--- a/contrib/Makefile
+++ b/contrib/Makefile
@@ -43,6 +43,7 @@ SUBDIRS = \
postgres_fdw \
seg \
spi \
+ sqlol \
tablefunc \
tcn \
test_decoding \
diff --git a/contrib/sqlol/.gitignore b/contrib/sqlol/.gitignore
new file mode 100644
index 0000000000..3c4b587792
--- /dev/null
+++ b/contrib/sqlol/.gitignore
@@ -0,0 +1,7 @@
+# Generated subdirectories
+/log/
+/results/
+/tmp_check/
+sqlol_gram.c
+sqlol_gram.h
+sqlol_scan.c
diff --git a/contrib/sqlol/Makefile b/contrib/sqlol/Makefile
new file mode 100644
index 0000000000..3850ac3fce
--- /dev/null
+++ b/contrib/sqlol/Makefile
@@ -0,0 +1,33 @@
+# contrib/sqlol/Makefile
+
+MODULE_big = sqlol
+OBJS = \
+ $(WIN32RES) \
+ sqlol.o sqlol_gram.o sqlol_scan.o sqlol_keywords.o
+PGFILEDESC = "sqlol - Toy alternative grammar based on LOLCODE"
+
+sqlol_gram.h: sqlol_gram.c
+ touch $@
+
+sqlol_gram.c: BISONFLAGS += -d
+# sqlol_gram.c: BISON_CHECK_CMD = $(PERL) $(srcdir)/check_keywords.pl $< $(top_srcdir)/src/include/parser/kwlist.h
+
+
+sqlol_scan.c: FLEXFLAGS = -CF -p -p
+sqlol_scan.c: FLEX_NO_BACKUP=yes
+sqlol_scan.c: FLEX_FIX_WARNING=yes
+
+
+# Force these dependencies to be known even without dependency info built:
+sqlol.o sqlol_gram.o sqlol_scan.o parser.o: sqlol_gram.h
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = contrib/sqlol
+top_builddir = ../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
diff --git a/contrib/sqlol/sqlol.c b/contrib/sqlol/sqlol.c
new file mode 100644
index 0000000000..b986966181
--- /dev/null
+++ b/contrib/sqlol/sqlol.c
@@ -0,0 +1,107 @@
+/*-------------------------------------------------------------------------
+ *
+ * sqlol.c
+ *
+ *
+ * Copyright (c) 2008-2021, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * contrib/sqlol/sqlol.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "tcop/tcopprot.h"
+
+#include "sqlol_gramparse.h"
+#include "sqlol_keywords.h"
+
+PG_MODULE_MAGIC;
+
+
+/* Saved hook values in case of unload */
+static parser_hook_type prev_parser_hook = NULL;
+
+void _PG_init(void);
+void _PG_fini(void);
+
+static List *sqlol_parser_hook(const char *str, RawParseMode mode);
+
+
+/*
+ * Module load callback
+ */
+void
+_PG_init(void)
+{
+ /* Install hooks. */
+ prev_parser_hook = parser_hook;
+ parser_hook = sqlol_parser_hook;
+}
+
+/*
+ * Module unload callback
+ */
+void
+_PG_fini(void)
+{
+ /* Uninstall hooks. */
+ parser_hook = prev_parser_hook;
+}
+
+/*
+ * sqlol_parser_hook: parse our grammar
+ */
+static List *
+sqlol_parser_hook(const char *str, RawParseMode mode)
+{
+ sqlol_yyscan_t yyscanner;
+ sqlol_base_yy_extra_type yyextra;
+ int yyresult;
+
+ if (mode != RAW_PARSE_DEFAULT)
+ {
+ if (prev_parser_hook)
+ return (*prev_parser_hook) (str, mode);
+ else
+ return raw_parser(str, mode);
+ }
+
+ /* initialize the flex scanner */
+ yyscanner = sqlol_scanner_init(str, &yyextra.sqlol_yy_extra,
+ sqlol_ScanKeywords, sqlol_NumScanKeywords);
+
+ /* initialize the bison parser */
+ sqlol_parser_init(&yyextra);
+
+ /* Parse! */
+ yyresult = sqlol_base_yyparse(yyscanner);
+
+ /* Clean up (release memory) */
+ sqlol_scanner_finish(yyscanner);
+
+ /*
+ * Invalid statement, fallback on previous parser_hook if any or
+ * raw_parser()
+ */
+ if (yyresult)
+ {
+ if (prev_parser_hook)
+ return (*prev_parser_hook) (str, mode);
+ else
+ return raw_parser(str, mode);
+ }
+
+ return yyextra.parsetree;
+}
+
+int
+sqlol_base_yylex(YYSTYPE *lvalp, YYLTYPE *llocp, sqlol_yyscan_t yyscanner)
+{
+ int cur_token;
+
+ cur_token = sqlol_yylex(&(lvalp->sqlol_yystype), llocp, yyscanner);
+
+ return cur_token;
+}
diff --git a/contrib/sqlol/sqlol_gram.y b/contrib/sqlol/sqlol_gram.y
new file mode 100644
index 0000000000..3214865a53
--- /dev/null
+++ b/contrib/sqlol/sqlol_gram.y
@@ -0,0 +1,439 @@
+%{
+
+/*#define YYDEBUG 1*/
+/*-------------------------------------------------------------------------
+ *
+ * sqlol_gram.y
+ * sqlol BISON rules/actions
+ *
+ * Portions Copyright (c) 1996-2021, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ * contrib/sqlol/sqlol_gram.y
+ *
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "catalog/namespace.h"
+#include "nodes/makefuncs.h"
+
+#include "sqlol_gramparse.h"
+
+/*
+ * Location tracking support --- simpler than bison's default, since we only
+ * want to track the start position not the end position of each nonterminal.
+ */
+#define YYLLOC_DEFAULT(Current, Rhs, N) \
+ do { \
+ if ((N) > 0) \
+ (Current) = (Rhs)[1]; \
+ else \
+ (Current) = (-1); \
+ } while (0)
+
+/*
+ * The above macro assigns -1 (unknown) as the parse location of any
+ * nonterminal that was reduced from an empty rule, or whose leftmost
+ * component was reduced from an empty rule. This is problematic
+ * for nonterminals defined like
+ * OptFooList: / * EMPTY * / { ... } | OptFooList Foo { ... } ;
+ * because we'll set -1 as the location during the first reduction and then
+ * copy it during each subsequent reduction, leaving us with -1 for the
+ * location even when the list is not empty. To fix that, do this in the
+ * action for the nonempty rule(s):
+ * if (@$ < 0) @$ = @2;
+ * (Although we have many nonterminals that follow this pattern, we only
+ * bother with fixing @$ like this when the nonterminal's parse location
+ * is actually referenced in some rule.)
+ *
+ * A cleaner answer would be to make YYLLOC_DEFAULT scan all the Rhs
+ * locations until it's found one that's not -1. Then we'd get a correct
+ * location for any nonterminal that isn't entirely empty. But this way
+ * would add overhead to every rule reduction, and so far there's not been
+ * a compelling reason to pay that overhead.
+ */
+
+/*
+ * Bison doesn't allocate anything that needs to live across parser calls,
+ * so we can easily have it use palloc instead of malloc. This prevents
+ * memory leaks if we error out during parsing. Note this only works with
+ * bison >= 2.0. However, in bison 1.875 the default is to use alloca()
+ * if possible, so there's not really much problem anyhow, at least if
+ * you're building with gcc.
+ */
+#define YYMALLOC palloc
+#define YYFREE pfree
+
+
+#define parser_yyerror(msg) sqlol_scanner_yyerror(msg, yyscanner)
+#define parser_errposition(pos) sqlol_scanner_errposition(pos, yyscanner)
+
+static void sqlol_base_yyerror(YYLTYPE *yylloc, sqlol_yyscan_t yyscanner,
+ const char *msg);
+static RawStmt *makeRawStmt(Node *stmt, int stmt_location);
+static void updateRawStmtEnd(RawStmt *rs, int end_location);
+static Node *makeColumnRef(char *colname, List *indirection,
+ int location, sqlol_yyscan_t yyscanner);
+static void check_qualified_name(List *names, sqlol_yyscan_t yyscanner);
+static List *check_indirection(List *indirection, sqlol_yyscan_t yyscanner);
+
+%}
+
+%pure-parser
+%expect 0
+%name-prefix="sqlol_base_yy"
+%locations
+
+%parse-param {sqlol_yyscan_t yyscanner}
+%lex-param {sqlol_yyscan_t yyscanner}
+
+%union
+{
+ sqlol_YYSTYPE sqlol_yystype;
+ /* these fields must match sqlol_YYSTYPE: */
+ int ival;
+ char *str;
+ const char *keyword;
+
+ List *list;
+ Node *node;
+ RangeVar *range;
+ ResTarget *target;
+}
+
+%type <node> stmt toplevel_stmt GimmehStmt simple_gimmeh columnref
+ indirection_el
+
+%type <list> parse_toplevel stmtmulti gimmeh_list indirection
+
+%type <range> qualified_name
+
+%type <str> ColId ColLabel attr_name
+
+%type <target> gimmeh_el
+
+/*
+ * Non-keyword token types. These are hard-wired into the "flex" lexer.
+ * They must be listed first so that their numeric codes do not depend on
+ * the set of keywords. PL/pgSQL depends on this so that it can share the
+ * same lexer. If you add/change tokens here, fix PL/pgSQL to match!
+ *
+ */
+%token <str> IDENT FCONST SCONST Op
+
+/*
+ * If you want to make any keyword changes, update the keyword table in
+ * src/include/parser/kwlist.h and add new keywords to the appropriate one
+ * of the reserved-or-not-so-reserved keyword lists, below; search
+ * this file for "Keyword category lists".
+ */
+
+/* ordinary key words in alphabetical order */
+%token <keyword> A GIMMEH HAI HAS I KTHXBYE
+
+
+%%
+
+/*
+ * The target production for the whole parse.
+ *
+ * Ordinarily we parse a list of statements, but if we see one of the
+ * special MODE_XXX symbols as first token, we parse something else.
+ * The options here correspond to enum RawParseMode, which see for details.
+ */
+parse_toplevel:
+ stmtmulti
+ {
+ pg_yyget_extra(yyscanner)->parsetree = $1;
+ }
+ ;
+
+/*
+ * At top level, we wrap each stmt with a RawStmt node carrying start location
+ * and length of the stmt's text. Notice that the start loc/len are driven
+ * entirely from semicolon locations (@2). It would seem natural to use
+ * @1 or @3 to get the true start location of a stmt, but that doesn't work
+ * for statements that can start with empty nonterminals (opt_with_clause is
+ * the main offender here); as noted in the comments for YYLLOC_DEFAULT,
+ * we'd get -1 for the location in such cases.
+ * We also take care to discard empty statements entirely.
+ */
+stmtmulti: stmtmulti KTHXBYE toplevel_stmt
+ {
+ if ($1 != NIL)
+ {
+ /* update length of previous stmt */
+ updateRawStmtEnd(llast_node(RawStmt, $1), @2);
+ }
+ if ($3 != NULL)
+ $$ = lappend($1, makeRawStmt($3, @2 + 1));
+ else
+ $$ = $1;
+ }
+ | toplevel_stmt
+ {
+ if ($1 != NULL)
+ $$ = list_make1(makeRawStmt($1, 0));
+ else
+ $$ = NIL;
+ }
+ ;
+
+/*
+ * toplevel_stmt includes BEGIN and END. stmt does not include them, because
+ * those words have different meanings in function bodies.
+ */
+toplevel_stmt:
+ stmt
+ ;
+
+stmt:
+ GimmehStmt
+ | /*EMPTY*/
+ { $$ = NULL; }
+ ;
+
+/*****************************************************************************
+ *
+ * GIMMEH statement
+ *
+ *****************************************************************************/
+
+GimmehStmt:
+ simple_gimmeh { $$ = $1; }
+ ;
+
+simple_gimmeh:
+ HAI FCONST I HAS A qualified_name
+ GIMMEH gimmeh_list
+ {
+ SelectStmt *n = makeNode(SelectStmt);
+ n->targetList = $8;
+ n->fromClause = list_make1($6);
+ $$ = (Node *)n;
+ }
+ ;
+
+gimmeh_list:
+ gimmeh_el { $$ = list_make1($1); }
+ | gimmeh_list ',' gimmeh_el { $$ = lappend($1, $3); }
+
+gimmeh_el:
+ columnref
+ {
+ $$ = makeNode(ResTarget);
+ $$->name = NULL;
+ $$->indirection = NIL;
+ $$->val = (Node *)$1;
+ $$->location = @1;
+ }
+
+qualified_name:
+ ColId
+ {
+ $$ = makeRangeVar(NULL, $1, @1);
+ }
+ | ColId indirection
+ {
+ check_qualified_name($2, yyscanner);
+ $$ = makeRangeVar(NULL, NULL, @1);
+ switch (list_length($2))
+ {
+ case 1:
+ $$->catalogname = NULL;
+ $$->schemaname = $1;
+ $$->relname = strVal(linitial($2));
+ break;
+ case 2:
+ $$->catalogname = $1;
+ $$->schemaname = strVal(linitial($2));
+ $$->relname = strVal(lsecond($2));
+ break;
+ default:
+ /*
+ * It's ok to error out here as at this point we
+ * already parsed a "HAI FCONST" preamble, and no
+ * other grammar is likely to accept a command
+ * starting with that, so there's no point trying
+ * to fall back on the other grammars.
+ */
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("improper qualified name (too many dotted names): %s",
+ NameListToString(lcons(makeString($1), $2))),
+ parser_errposition(@1)));
+ break;
+ }
+ }
+ ;
+
+columnref: ColId
+ {
+ $$ = makeColumnRef($1, NIL, @1, yyscanner);
+ }
+ | ColId indirection
+ {
+ $$ = makeColumnRef($1, $2, @1, yyscanner);
+ }
+ ;
+
+ColId: IDENT { $$ = $1; }
+
+indirection:
+ indirection_el { $$ = list_make1($1); }
+ | indirection indirection_el { $$ = lappend($1, $2); }
+ ;
+
+indirection_el:
+ '.' attr_name
+ {
+ $$ = (Node *) makeString($2);
+ }
+ ;
+
+attr_name: ColLabel { $$ = $1; };
+
+ColLabel: IDENT { $$ = $1; }
+
+%%
+
+/*
+ * The signature of this function is required by bison. However, we
+ * ignore the passed yylloc and instead use the last token position
+ * available from the scanner.
+ */
+static void
+sqlol_base_yyerror(YYLTYPE *yylloc, sqlol_yyscan_t yyscanner, const char *msg)
+{
+ parser_yyerror(msg);
+}
+
+static RawStmt *
+makeRawStmt(Node *stmt, int stmt_location)
+{
+ RawStmt *rs = makeNode(RawStmt);
+
+ rs->stmt = stmt;
+ rs->stmt_location = stmt_location;
+ rs->stmt_len = 0; /* might get changed later */
+ return rs;
+}
+
+/* Adjust a RawStmt to reflect that it doesn't run to the end of the string */
+static void
+updateRawStmtEnd(RawStmt *rs, int end_location)
+{
+ /*
+ * If we already set the length, don't change it. This is for situations
+ * like "select foo ;; select bar" where the same statement will be last
+ * in the string for more than one semicolon.
+ */
+ if (rs->stmt_len > 0)
+ return;
+
+ /* OK, update length of RawStmt */
+ rs->stmt_len = end_location - rs->stmt_location;
+}
+
+static Node *
+makeColumnRef(char *colname, List *indirection,
+ int location, sqlol_yyscan_t yyscanner)
+{
+ /*
+ * Generate a ColumnRef node, with an A_Indirection node added if there
+ * is any subscripting in the specified indirection list. However,
+ * any field selection at the start of the indirection list must be
+ * transposed into the "fields" part of the ColumnRef node.
+ */
+ ColumnRef *c = makeNode(ColumnRef);
+ int nfields = 0;
+ ListCell *l;
+
+ c->location = location;
+ foreach(l, indirection)
+ {
+ if (IsA(lfirst(l), A_Indices))
+ {
+ A_Indirection *i = makeNode(A_Indirection);
+
+ if (nfields == 0)
+ {
+ /* easy case - all indirection goes to A_Indirection */
+ c->fields = list_make1(makeString(colname));
+ i->indirection = check_indirection(indirection, yyscanner);
+ }
+ else
+ {
+ /* got to split the list in two */
+ i->indirection = check_indirection(list_copy_tail(indirection,
+ nfields),
+ yyscanner);
+ indirection = list_truncate(indirection, nfields);
+ c->fields = lcons(makeString(colname), indirection);
+ }
+ i->arg = (Node *) c;
+ return (Node *) i;
+ }
+ else if (IsA(lfirst(l), A_Star))
+ {
+ /* We only allow '*' at the end of a ColumnRef */
+ if (lnext(indirection, l) != NULL)
+ parser_yyerror("improper use of \"*\"");
+ }
+ nfields++;
+ }
+ /* No subscripting, so all indirection gets added to field list */
+ c->fields = lcons(makeString(colname), indirection);
+ return (Node *) c;
+}
+
+/* check_qualified_name --- check the result of qualified_name production
+ *
+ * It's easiest to let the grammar production for qualified_name allow
+ * subscripts and '*', which we then must reject here.
+ */
+static void
+check_qualified_name(List *names, sqlol_yyscan_t yyscanner)
+{
+ ListCell *i;
+
+ foreach(i, names)
+ {
+ if (!IsA(lfirst(i), String))
+ parser_yyerror("syntax error");
+ }
+}
+
+/* check_indirection --- check the result of indirection production
+ *
+ * We only allow '*' at the end of the list, but it's hard to enforce that
+ * in the grammar, so do it here.
+ */
+static List *
+check_indirection(List *indirection, sqlol_yyscan_t yyscanner)
+{
+ ListCell *l;
+
+ foreach(l, indirection)
+ {
+ if (IsA(lfirst(l), A_Star))
+ {
+ if (lnext(indirection, l) != NULL)
+ parser_yyerror("improper use of \"*\"");
+ }
+ }
+ return indirection;
+}
+
+/* sqlol_parser_init()
+ * Initialize to parse one query string
+ */
+void
+sqlol_parser_init(sqlol_base_yy_extra_type *yyext)
+{
+ yyext->parsetree = NIL; /* in case grammar forgets to set it */
+}
diff --git a/contrib/sqlol/sqlol_gramparse.h b/contrib/sqlol/sqlol_gramparse.h
new file mode 100644
index 0000000000..58233a8d87
--- /dev/null
+++ b/contrib/sqlol/sqlol_gramparse.h
@@ -0,0 +1,61 @@
+/*-------------------------------------------------------------------------
+ *
+ * sqlol_gramparse.h
+ * Shared definitions for the "raw" parser (flex and bison phases only)
+ *
+ *
+ * Portions Copyright (c) 1996-2021, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * contrib/sqlol/sqlol_gramparse.h
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#ifndef SQLOL_GRAMPARSE_H
+#define SQLOL_GRAMPARSE_H
+
+#include "nodes/parsenodes.h"
+#include "sqlol_scanner.h"
+
+/*
+ * NB: include gram.h only AFTER including scanner.h, because scanner.h
+ * is what #defines YYLTYPE.
+ */
+#include "sqlol_gram.h"
+
+/*
+ * The YY_EXTRA data that a flex scanner allows us to pass around. Private
+ * state needed for raw parsing/lexing goes here.
+ */
+typedef struct sqlol_base_yy_extra_type
+{
+ /*
+ * Fields used by the core scanner.
+ */
+ sqlol_yy_extra_type sqlol_yy_extra;
+
+ /*
+ * State variables that belong to the grammar.
+ */
+ List *parsetree; /* final parse result is delivered here */
+} sqlol_base_yy_extra_type;
+
+/*
+ * In principle we should use yyget_extra() to fetch the yyextra field
+ * from a yyscanner struct. However, flex always puts that field first,
+ * and this is sufficiently performance-critical to make it seem worth
+ * cheating a bit to use an inline macro.
+ */
+#define pg_yyget_extra(yyscanner) (*((sqlol_base_yy_extra_type **) (yyscanner)))
+
+
+/* from parser.c */
+extern int sqlol_base_yylex(YYSTYPE *lvalp, YYLTYPE *llocp,
+ sqlol_yyscan_t yyscanner);
+
+/* from gram.y */
+extern void sqlol_parser_init(sqlol_base_yy_extra_type *yyext);
+extern int sqlol_base_yyparse(sqlol_yyscan_t yyscanner);
+
+#endif /* SQLOL_GRAMPARSE_H */
diff --git a/contrib/sqlol/sqlol_keywords.c b/contrib/sqlol/sqlol_keywords.c
new file mode 100644
index 0000000000..ee51f423ac
--- /dev/null
+++ b/contrib/sqlol/sqlol_keywords.c
@@ -0,0 +1,97 @@
+/*-------------------------------------------------------------------------
+ *
+ * sqlol_keywords.c
+ * lexical token lookup for key words in PostgreSQL
+ *
+ *
+ * Portions Copyright (c) 1996-2018, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ * sqlol/sqlol_keywords.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "sqlol_gramparse.h"
+
+#define PG_KEYWORD(a,b,c) {a,b,c},
+
+const sqlol_ScanKeyword sqlol_ScanKeywords[] = {
+#include "sqlol_kwlist.h"
+};
+
+const int sqlol_NumScanKeywords = lengthof(sqlol_ScanKeywords);
+
+#undef PG_KEYWORD
+
+
+/*
+ * ScanKeywordLookup - see if a given word is a keyword
+ *
+ * The table to be searched is passed explicitly, so that this can be used
+ * to search keyword lists other than the standard list appearing above.
+ *
+ * Returns a pointer to the sqlol_ScanKeyword table entry, or NULL if no match.
+ *
+ * The match is done case-insensitively. Note that we deliberately use a
+ * dumbed-down case conversion that will only translate 'A'-'Z' into 'a'-'z',
+ * even if we are in a locale where tolower() would produce more or different
+ * translations. This is to conform to the SQL99 spec, which says that
+ * keywords are to be matched in this way even though non-keyword identifiers
+ * receive a different case-normalization mapping.
+ */
+const sqlol_ScanKeyword *
+sqlol_ScanKeywordLookup(const char *text,
+ const sqlol_ScanKeyword *keywords,
+ int num_keywords)
+{
+ int len,
+ i;
+ char word[NAMEDATALEN];
+ const sqlol_ScanKeyword *low;
+ const sqlol_ScanKeyword *high;
+
+ len = strlen(text);
+ /* We assume all keywords are shorter than NAMEDATALEN. */
+ if (len >= NAMEDATALEN)
+ return NULL;
+
+ /*
+ * Apply an ASCII-only downcasing. We must not use tolower() since it may
+ * produce the wrong translation in some locales (eg, Turkish).
+ */
+ for (i = 0; i < len; i++)
+ {
+ char ch = text[i];
+
+ if (ch >= 'A' && ch <= 'Z')
+ ch += 'a' - 'A';
+ word[i] = ch;
+ }
+ word[len] = '\0';
+
+ /*
+ * Now do a binary search using plain strcmp() comparison.
+ */
+ low = keywords;
+ high = keywords + (num_keywords - 1);
+ while (low <= high)
+ {
+ const sqlol_ScanKeyword *middle;
+ int difference;
+
+ middle = low + (high - low) / 2;
+ difference = strcmp(middle->name, word);
+ if (difference == 0)
+ return middle;
+ else if (difference < 0)
+ low = middle + 1;
+ else
+ high = middle - 1;
+ }
+
+ return NULL;
+}
diff --git a/contrib/sqlol/sqlol_keywords.h b/contrib/sqlol/sqlol_keywords.h
new file mode 100644
index 0000000000..bc4acf4541
--- /dev/null
+++ b/contrib/sqlol/sqlol_keywords.h
@@ -0,0 +1,38 @@
+/*-------------------------------------------------------------------------
+ *
+ * sqlol_keywords.h
+ * lexical token lookup for key words in PostgreSQL
+ *
+ *
+ * Portions Copyright (c) 1996-2021, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * contrib/sqlol/sqlol_keywords.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef SQLOL_KEYWORDS_H
+#define SQLOL_KEYWORDS_H
+
+/* Keyword categories --- should match lists in gram.y */
+#define UNRESERVED_KEYWORD 0
+#define COL_NAME_KEYWORD 1
+#define TYPE_FUNC_NAME_KEYWORD 2
+#define RESERVED_KEYWORD 3
+
+
+typedef struct sqlol_ScanKeyword
+{
+ const char *name; /* in lower case */
+ int16 value; /* grammar's token code */
+ int16 category; /* see codes above */
+} sqlol_ScanKeyword;
+
+extern PGDLLIMPORT const sqlol_ScanKeyword sqlol_ScanKeywords[];
+extern PGDLLIMPORT const int sqlol_NumScanKeywords;
+
+extern const sqlol_ScanKeyword *sqlol_ScanKeywordLookup(const char *text,
+ const sqlol_ScanKeyword *keywords,
+ int num_keywords);
+
+#endif /* SQLOL_KEYWORDS_H */
diff --git a/contrib/sqlol/sqlol_kwlist.h b/contrib/sqlol/sqlol_kwlist.h
new file mode 100644
index 0000000000..2de3893ee4
--- /dev/null
+++ b/contrib/sqlol/sqlol_kwlist.h
@@ -0,0 +1,21 @@
+/*-------------------------------------------------------------------------
+ *
+ * sqlol_kwlist.h
+ *
+ *
+ * Portions Copyright (c) 1996-2021, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * IDENTIFICATION
+ * contrib/sqlol/sqlol_kwlist.h
+ *
+ *-------------------------------------------------------------------------
+ */
+
+/* name, value, category */
+PG_KEYWORD("a", A, UNRESERVED_KEYWORD)
+PG_KEYWORD("gimmeh", GIMMEH, UNRESERVED_KEYWORD)
+PG_KEYWORD("hai", HAI, RESERVED_KEYWORD)
+PG_KEYWORD("has", HAS, UNRESERVED_KEYWORD)
+PG_KEYWORD("i", I, UNRESERVED_KEYWORD)
+PG_KEYWORD("kthxbye", KTHXBYE, UNRESERVED_KEYWORD)
diff --git a/contrib/sqlol/sqlol_scan.l b/contrib/sqlol/sqlol_scan.l
new file mode 100644
index 0000000000..a7088b8390
--- /dev/null
+++ b/contrib/sqlol/sqlol_scan.l
@@ -0,0 +1,544 @@
+%top{
+/*-------------------------------------------------------------------------
+ *
+ * sqlol_scan.l
+ * lexical scanner for sqlol
+ *
+ *
+ * Portions Copyright (c) 1996-2021, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * IDENTIFICATION
+ * contrib/sqlol/sqlol_scan.l
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "common/string.h"
+#include "sqlol_gramparse.h"
+#include "parser/scansup.h"
+#include "mb/pg_wchar.h"
+
+#include "sqlol_keywords.h"
+}
+
+%{
+
+/* LCOV_EXCL_START */
+
+/* Avoid exit() on fatal scanner errors (a bit ugly -- see yy_fatal_error) */
+#undef fprintf
+#define fprintf(file, fmt, msg) fprintf_to_ereport(fmt, msg)
+
+static void
+fprintf_to_ereport(const char *fmt, const char *msg)
+{
+ ereport(ERROR, (errmsg_internal("%s", msg)));
+}
+
+
+/*
+ * Set the type of YYSTYPE.
+ */
+#define YYSTYPE sqlol_YYSTYPE
+
+/*
+ * Set the type of yyextra. All state variables used by the scanner should
+ * be in yyextra, *not* statically allocated.
+ */
+#define YY_EXTRA_TYPE sqlol_yy_extra_type *
+
+/*
+ * Each call to yylex must set yylloc to the location of the found token
+ * (expressed as a byte offset from the start of the input text).
+ * When we parse a token that requires multiple lexer rules to process,
+ * this should be done in the first such rule, else yylloc will point
+ * into the middle of the token.
+ */
+#define SET_YYLLOC() (*(yylloc) = yytext - yyextra->scanbuf)
+
+/*
+ * Advance yylloc by the given number of bytes.
+ */
+#define ADVANCE_YYLLOC(delta) ( *(yylloc) += (delta) )
+
+/*
+ * Sometimes, we do want yylloc to point into the middle of a token; this is
+ * useful for instance to throw an error about an escape sequence within a
+ * string literal. But if we find no error there, we want to revert yylloc
+ * to the token start, so that that's the location reported to the parser.
+ * Use PUSH_YYLLOC/POP_YYLLOC to save/restore yylloc around such code.
+ * (Currently the implied "stack" is just one location, but someday we might
+ * need to nest these.)
+ */
+#define PUSH_YYLLOC() (yyextra->save_yylloc = *(yylloc))
+#define POP_YYLLOC() (*(yylloc) = yyextra->save_yylloc)
+
+#define startlit() ( yyextra->literallen = 0 )
+static void addlit(char *ytext, int yleng, sqlol_yyscan_t yyscanner);
+static void addlitchar(unsigned char ychar, sqlol_yyscan_t yyscanner);
+static char *litbufdup(sqlol_yyscan_t yyscanner);
+
+#define yyerror(msg) sqlol_scanner_yyerror(msg, yyscanner)
+
+#define lexer_errposition() sqlol_scanner_errposition(*(yylloc), yyscanner)
+
+/*
+ * Work around a bug in flex 2.5.35: it emits a couple of functions that
+ * it forgets to emit declarations for. Since we use -Wmissing-prototypes,
+ * this would cause warnings. Providing our own declarations should be
+ * harmless even when the bug gets fixed.
+ */
+extern int sqlol_yyget_column(yyscan_t yyscanner);
+extern void sqlol_yyset_column(int column_no, yyscan_t yyscanner);
+
+%}
+
+%option reentrant
+%option bison-bridge
+%option bison-locations
+%option 8bit
+%option never-interactive
+%option nodefault
+%option noinput
+%option nounput
+%option noyywrap
+%option noyyalloc
+%option noyyrealloc
+%option noyyfree
+%option warn
+%option prefix="sqlol_yy"
+
+/*
+ * OK, here is a short description of lex/flex rules behavior.
+ * The longest pattern which matches an input string is always chosen.
+ * For equal-length patterns, the first occurring in the rules list is chosen.
+ * INITIAL is the starting state, to which all non-conditional rules apply.
+ * Exclusive states change parsing rules while the state is active. When in
+ * an exclusive state, only those rules defined for that state apply.
+ *
+ * We use exclusive states for quoted strings, extended comments,
+ * and to eliminate parsing troubles for numeric strings.
+ * Exclusive states:
+ * <xd> delimited identifiers (double-quoted identifiers)
+ * <xq> standard quoted strings
+ * <xqs> quote stop (detect continued strings)
+ *
+ * Remember to add an <<EOF>> case whenever you add a new exclusive state!
+ * The default one is probably not the right thing.
+ */
+
+%x xd
+%x xq
+%x xqs
+
+/*
+ * In order to make the world safe for Windows and Mac clients as well as
+ * Unix ones, we accept either \n or \r as a newline. A DOS-style \r\n
+ * sequence will be seen as two successive newlines, but that doesn't cause
+ * any problems. Comments that start with -- and extend to the next
+ * newline are treated as equivalent to a single whitespace character.
+ *
+ * NOTE a fine point: if there is no newline following --, we will absorb
+ * everything to the end of the input as a comment. This is correct. Older
+ * versions of Postgres failed to recognize -- as a comment if the input
+ * did not end with a newline.
+ *
+ * XXX perhaps \f (formfeed) should be treated as a newline as well?
+ *
+ * XXX if you change the set of whitespace characters, fix scanner_isspace()
+ * to agree.
+ */
+
+space [ \t\n\r\f]
+horiz_space [ \t\f]
+newline [\n\r]
+non_newline [^\n\r]
+
+comment ("--"{non_newline}*)
+
+whitespace ({space}+|{comment})
+
+/*
+ * SQL requires at least one newline in the whitespace separating
+ * string literals that are to be concatenated. Silly, but who are we
+ * to argue? Note that {whitespace_with_newline} should not have * after
+ * it, whereas {whitespace} should generally have a * after it...
+ */
+
+special_whitespace ({space}+|{comment}{newline})
+horiz_whitespace ({horiz_space}|{comment})
+whitespace_with_newline ({horiz_whitespace}*{newline}{special_whitespace}*)
+
+quote '
+/* If we see {quote} then {quotecontinue}, the quoted string continues */
+quotecontinue {whitespace_with_newline}{quote}
+
+/*
+ * {quotecontinuefail} is needed to avoid lexer backup when we fail to match
+ * {quotecontinue}. It might seem that this could just be {whitespace}*,
+ * but if there's a dash after {whitespace_with_newline}, it must be consumed
+ * to see if there's another dash --- which would start a {comment} and thus
+ * allow continuation of the {quotecontinue} token.
+ */
+quotecontinuefail {whitespace}*"-"?
+
+/* Extended quote
+ * xqdouble implements embedded quote, ''''
+ */
+xqstart {quote}
+xqdouble {quote}{quote}
+xqinside [^']+
+
+/* Double quote
+ * Allows embedded spaces and other special characters into identifiers.
+ */
+dquote \"
+xdstart {dquote}
+xdstop {dquote}
+xddouble {dquote}{dquote}
+xdinside [^"]+
+
+digit [0-9]
+ident_start [A-Za-z\200-\377_]
+ident_cont [A-Za-z\200-\377_0-9\$]
+
+identifier {ident_start}{ident_cont}*
+
+decimal (({digit}+)|({digit}*\.{digit}+)|({digit}+\.{digit}*))
+
+other .
+
+%%
+
+{whitespace} {
+ /* ignore */
+ }
+
+
+{xqstart} {
+ yyextra->saw_non_ascii = false;
+ SET_YYLLOC();
+ BEGIN(xq);
+ startlit();
+}
+<xq>{quote} {
+ /*
+ * When we are scanning a quoted string and see an end
+ * quote, we must look ahead for a possible continuation.
+ * If we don't see one, we know the end quote was in fact
+ * the end of the string. To reduce the lexer table size,
+ * we use a single "xqs" state to do the lookahead for all
+ * types of strings.
+ */
+ yyextra->state_before_str_stop = YYSTATE;
+ BEGIN(xqs);
+ }
+<xqs>{quotecontinue} {
+ /*
+ * Found a quote continuation, so return to the in-quote
+ * state and continue scanning the literal. Nothing is
+ * added to the literal's contents.
+ */
+ BEGIN(yyextra->state_before_str_stop);
+ }
+<xqs>{quotecontinuefail} |
+<xqs>{other} |
+<xqs><<EOF>> {
+ /*
+ * Failed to see a quote continuation. Throw back
+ * everything after the end quote, and handle the string
+ * according to the state we were in previously.
+ */
+ yyless(0);
+ BEGIN(INITIAL);
+
+ switch (yyextra->state_before_str_stop)
+ {
+ case xq:
+ /*
+ * Check that the data remains valid, if it might
+ * have been made invalid by unescaping any chars.
+ */
+ if (yyextra->saw_non_ascii)
+ pg_verifymbstr(yyextra->literalbuf,
+ yyextra->literallen,
+ false);
+ yylval->str = litbufdup(yyscanner);
+ return SCONST;
+ default:
+ yyerror("unhandled previous state in xqs");
+ }
+ }
+
+<xq>{xqdouble} {
+ addlitchar('\'', yyscanner);
+ }
+<xq>{xqinside} {
+ addlit(yytext, yyleng, yyscanner);
+ }
+<xq><<EOF>> { yyerror("unterminated quoted string"); }
+
+
+{xdstart} {
+ SET_YYLLOC();
+ BEGIN(xd);
+ startlit();
+ }
+<xd>{xdstop} {
+ char *ident;
+
+ BEGIN(INITIAL);
+ if (yyextra->literallen == 0)
+ yyerror("zero-length delimited identifier");
+ ident = litbufdup(yyscanner);
+ if (yyextra->literallen >= NAMEDATALEN)
+ truncate_identifier(ident, yyextra->literallen, true);
+ yylval->str = ident;
+ return IDENT;
+ }
+<xd>{xddouble} {
+ addlitchar('"', yyscanner);
+ }
+<xd>{xdinside} {
+ addlit(yytext, yyleng, yyscanner);
+ }
+<xd><<EOF>> { yyerror("unterminated quoted identifier"); }
+
+{decimal} {
+ SET_YYLLOC();
+ yylval->str = pstrdup(yytext);
+ return FCONST;
+ }
+
+{identifier} {
+ const sqlol_ScanKeyword *keyword;
+ char *ident;
+
+ SET_YYLLOC();
+
+ /* Is it a keyword? */
+ keyword = sqlol_ScanKeywordLookup(yytext,
+ yyextra->keywords,
+ yyextra->num_keywords);
+ if (keyword != NULL)
+ {
+ yylval->keyword = keyword->name;
+ return keyword->value;
+ }
+
+ /*
+ * No. Convert the identifier to lower case, and truncate
+ * if necessary.
+ */
+ ident = downcase_truncate_identifier(yytext, yyleng, true);
+ yylval->str = ident;
+ return IDENT;
+ }
+
+{other} {
+ SET_YYLLOC();
+ return yytext[0];
+ }
+
+<<EOF>> {
+ SET_YYLLOC();
+ yyterminate();
+ }
+
+%%
+
+/* LCOV_EXCL_STOP */
+
+/*
+ * Arrange access to yyextra for subroutines of the main yylex() function.
+ * We expect each subroutine to have a yyscanner parameter. Rather than
+ * use the yyget_xxx functions, which might or might not get inlined by the
+ * compiler, we cheat just a bit and cast yyscanner to the right type.
+ */
+#undef yyextra
+#define yyextra (((struct yyguts_t *) yyscanner)->yyextra_r)
+
+/* Likewise for a couple of other things we need. */
+#undef yylloc
+#define yylloc (((struct yyguts_t *) yyscanner)->yylloc_r)
+#undef yyleng
+#define yyleng (((struct yyguts_t *) yyscanner)->yyleng_r)
+
+
+/*
+ * scanner_errposition
+ * Report a lexer or grammar error cursor position, if possible.
+ *
+ * This is expected to be used within an ereport() call. The return value
+ * is a dummy (always 0, in fact).
+ *
+ * Note that this can only be used for messages emitted during raw parsing
+ * (essentially, sqlol_scan.l, sqlol_parser.c, and sqlol_gram.y), since it
+ * requires the yyscanner struct to still be available.
+ */
+int
+sqlol_scanner_errposition(int location, sqlol_yyscan_t yyscanner)
+{
+ int pos;
+
+ if (location < 0)
+ return 0; /* no-op if location is unknown */
+
+ /* Convert byte offset to character number */
+ pos = pg_mbstrlen_with_len(yyextra->scanbuf, location) + 1;
+ /* And pass it to the ereport mechanism */
+ return errposition(pos);
+}
+
+/*
+ * scanner_yyerror
+ * Report a lexer or grammar error.
+ *
+ * Just ignore as we'll fallback to raw_parser().
+ */
+void
+sqlol_scanner_yyerror(const char *message, sqlol_yyscan_t yyscanner)
+{
+ return;
+}
+
+
+/*
+ * Called before any actual parsing is done
+ */
+sqlol_yyscan_t
+sqlol_scanner_init(const char *str,
+ sqlol_yy_extra_type *yyext,
+ const sqlol_ScanKeyword *keywords,
+ int num_keywords)
+{
+ Size slen = strlen(str);
+ yyscan_t scanner;
+
+ if (yylex_init(&scanner) != 0)
+ elog(ERROR, "yylex_init() failed: %m");
+
+ sqlol_yyset_extra(yyext, scanner);
+
+ yyext->keywords = keywords;
+ yyext->num_keywords = num_keywords;
+
+ /*
+ * Make a scan buffer with special termination needed by flex.
+ */
+ yyext->scanbuf = (char *) palloc(slen + 2);
+ yyext->scanbuflen = slen;
+ memcpy(yyext->scanbuf, str, slen);
+ yyext->scanbuf[slen] = yyext->scanbuf[slen + 1] = YY_END_OF_BUFFER_CHAR;
+ yy_scan_buffer(yyext->scanbuf, slen + 2, scanner);
+
+ /* initialize literal buffer to a reasonable but expansible size */
+ yyext->literalalloc = 1024;
+ yyext->literalbuf = (char *) palloc(yyext->literalalloc);
+ yyext->literallen = 0;
+
+ return scanner;
+}
+
+
+/*
+ * Called after parsing is done to clean up after scanner_init()
+ */
+void
+sqlol_scanner_finish(sqlol_yyscan_t yyscanner)
+{
+ /*
+ * We don't bother to call yylex_destroy(), because all it would do is
+ * pfree a small amount of control storage. It's cheaper to leak the
+ * storage until the parsing context is destroyed. The amount of space
+ * involved is usually negligible compared to the output parse tree
+ * anyway.
+ *
+ * We do bother to pfree the scanbuf and literal buffer, but only if they
+ * represent a nontrivial amount of space. The 8K cutoff is arbitrary.
+ */
+ if (yyextra->scanbuflen >= 8192)
+ pfree(yyextra->scanbuf);
+ if (yyextra->literalalloc >= 8192)
+ pfree(yyextra->literalbuf);
+}
+
+
+static void
+addlit(char *ytext, int yleng, sqlol_yyscan_t yyscanner)
+{
+ /* enlarge buffer if needed */
+ if ((yyextra->literallen + yleng) >= yyextra->literalalloc)
+ {
+ do
+ {
+ yyextra->literalalloc *= 2;
+ } while ((yyextra->literallen + yleng) >= yyextra->literalalloc);
+ yyextra->literalbuf = (char *) repalloc(yyextra->literalbuf,
+ yyextra->literalalloc);
+ }
+ /* append new data */
+ memcpy(yyextra->literalbuf + yyextra->literallen, ytext, yleng);
+ yyextra->literallen += yleng;
+}
+
+
+static void
+addlitchar(unsigned char ychar, sqlol_yyscan_t yyscanner)
+{
+ /* enlarge buffer if needed */
+ if ((yyextra->literallen + 1) >= yyextra->literalalloc)
+ {
+ yyextra->literalalloc *= 2;
+ yyextra->literalbuf = (char *) repalloc(yyextra->literalbuf,
+ yyextra->literalalloc);
+ }
+ /* append new data */
+ yyextra->literalbuf[yyextra->literallen] = ychar;
+ yyextra->literallen += 1;
+}
+
+
+/*
+ * Create a palloc'd copy of literalbuf, adding a trailing null.
+ */
+static char *
+litbufdup(sqlol_yyscan_t yyscanner)
+{
+ int llen = yyextra->literallen;
+ char *new;
+
+ new = palloc(llen + 1);
+ memcpy(new, yyextra->literalbuf, llen);
+ new[llen] = '\0';
+ return new;
+}
+
+/*
+ * Interface functions to make flex use palloc() instead of malloc().
+ * It'd be better to make these static, but flex insists otherwise.
+ */
+
+void *
+sqlol_yyalloc(yy_size_t bytes, sqlol_yyscan_t yyscanner)
+{
+ return palloc(bytes);
+}
+
+void *
+sqlol_yyrealloc(void *ptr, yy_size_t bytes, sqlol_yyscan_t yyscanner)
+{
+ if (ptr)
+ return repalloc(ptr, bytes);
+ else
+ return palloc(bytes);
+}
+
+void
+sqlol_yyfree(void *ptr, sqlol_yyscan_t yyscanner)
+{
+ if (ptr)
+ pfree(ptr);
+}
diff --git a/contrib/sqlol/sqlol_scanner.h b/contrib/sqlol/sqlol_scanner.h
new file mode 100644
index 0000000000..0a497e9d91
--- /dev/null
+++ b/contrib/sqlol/sqlol_scanner.h
@@ -0,0 +1,118 @@
+/*-------------------------------------------------------------------------
+ *
+ * sqlol_scanner.h
+ * API for the sqlol scanner (flex machine)
+ *
+ *
+ * Portions Copyright (c) 1996-2021, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * contrib/sqlol/sqlol_scanner.h
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#ifndef SQLOL_SCANNER_H
+#define SQLOL_SCANNER_H
+
+#include "sqlol_keywords.h"
+
+/*
+ * The scanner returns extra data about scanned tokens in this union type.
+ * Note that this is a subset of the fields used in YYSTYPE of the bison
+ * parsers built atop the scanner.
+ */
+typedef union sqlol_YYSTYPE
+{
+ int ival; /* for integer literals */
+ char *str; /* for identifiers and non-integer literals */
+ const char *keyword; /* canonical spelling of keywords */
+} sqlol_YYSTYPE;
+
+/*
+ * We track token locations in terms of byte offsets from the start of the
+ * source string, not the column number/line number representation that
+ * bison uses by default. Also, to minimize overhead we track only one
+ * location (usually the first token location) for each construct, not
+ * the beginning and ending locations as bison does by default. It's
+ * therefore sufficient to make YYLTYPE an int.
+ */
+#define YYLTYPE int
+
+/*
+ * Another important component of the scanner's API is the token code numbers.
+ * However, those are not defined in this file, because bison insists on
+ * defining them for itself. The token codes used by the core scanner are
+ * the ASCII characters plus these:
+ * %token <str> IDENT UIDENT FCONST SCONST USCONST BCONST XCONST Op
+ * %token <ival> ICONST PARAM
+ * %token TYPECAST DOT_DOT COLON_EQUALS EQUALS_GREATER
+ * %token LESS_EQUALS GREATER_EQUALS NOT_EQUALS
+ * The above token definitions *must* be the first ones declared in any
+ * bison parser built atop this scanner, so that they will have consistent
+ * numbers assigned to them (specifically, IDENT = 258 and so on).
+ */
+
+/*
+ * The YY_EXTRA data that a flex scanner allows us to pass around.
+ * Private state needed by the core scanner goes here. Note that the actual
+ * yy_extra struct may be larger and have this as its first component, thus
+ * allowing the calling parser to keep some fields of its own in YY_EXTRA.
+ */
+typedef struct sqlol_yy_extra_type
+{
+ /*
+ * The string the scanner is physically scanning. We keep this mainly so
+ * that we can cheaply compute the offset of the current token (yytext).
+ */
+ char *scanbuf;
+ Size scanbuflen;
+
+ /*
+ * The keyword list to use, and the associated grammar token codes.
+ */
+ const sqlol_ScanKeyword *keywords;
+ int num_keywords;
+
+ /*
+ * literalbuf is used to accumulate literal values when multiple rules are
+ * needed to parse a single literal. Call startlit() to reset buffer to
+ * empty, addlit() to add text. NOTE: the string in literalbuf is NOT
+ * necessarily null-terminated, but there always IS room to add a trailing
+ * null at offset literallen. We store a null only when we need it.
+ */
+ char *literalbuf; /* palloc'd expandable buffer */
+ int literallen; /* actual current string length */
+ int literalalloc; /* current allocated buffer size */
+
+ /*
+ * Random assorted scanner state.
+ */
+ int state_before_str_stop; /* start cond. before end quote */
+ YYLTYPE save_yylloc; /* one-element stack for PUSH_YYLLOC() */
+
+ /* state variables for literal-lexing warnings */
+ bool saw_non_ascii;
+} sqlol_yy_extra_type;
+
+/*
+ * The type of yyscanner is opaque outside sqlol_scan.l.
+ */
+typedef void *sqlol_yyscan_t;
+
+
+/* Constant data exported from sqlol_scan.l */
+extern PGDLLIMPORT const uint16 sqlol_ScanKeywordTokens[];
+
+/* Entry points in sqlol_scan.l */
+extern sqlol_yyscan_t sqlol_scanner_init(const char *str,
+ sqlol_yy_extra_type *yyext,
+ const sqlol_ScanKeyword *keywords,
+ int num_keywords);
+extern void sqlol_scanner_finish(sqlol_yyscan_t yyscanner);
+extern int sqlol_yylex(sqlol_YYSTYPE *lvalp, YYLTYPE *llocp,
+ sqlol_yyscan_t yyscanner);
+extern int sqlol_scanner_errposition(int location, sqlol_yyscan_t yyscanner);
+extern void sqlol_scanner_yyerror(const char *message, sqlol_yyscan_t yyscanner);
+
+#endif /* SQLOL_SCANNER_H */
--
2.32.0
v5-0003-Add-a-new-MODE_SINGLE_QUERY-to-the-core-parser-an.patch
From 1577da8672747bf9bb3ef8cb22114b526b4419d1 Mon Sep 17 00:00:00 2001
From: Julien Rouhaud <julien.rouhaud@free.fr>
Date: Thu, 22 Apr 2021 01:33:42 +0800
Subject: [PATCH v5 3/4] Add a new MODE_SINGLE_QUERY to the core parser and use
it in pg_parse_query.
If a third-party module provides a parser_hook, pg_parse_query() switches to
single-query parsing so multi-query commands using different grammars can work
properly. If the third-party module supports the full set of SQL we support,
or wants to prevent falling back on the core parser, it can ignore the
MODE_SINGLE_QUERY mode and parse the full query string. In that case they must
return a List with more than one RawStmt or a single RawStmt with a 0 length to
stop the parsing phase, or raise an ERROR.
Otherwise, plugins should parse a single query only and always return a List
containing a single RawStmt with a properly set length (possibly 0 if it was a
single query without an end-of-query delimiter). If the command is valid but
doesn't contain any statements (e.g. a single semicolon), a single RawStmt
with a NULL stmt field should be returned, containing the consumed query string
length so we can move to the next command in a single pass rather than 1 byte
at a time.
Also, third-party modules can choose to ignore some or all parsing errors if
they want to implement only a subset of the syntax Postgres supports, or even a
totally different syntax, and fall back on the core grammar for unhandled cases.
In that case, they should set the error flag to true. The returned List will be
ignored and the same offset of the input string will be parsed using the core
parser.
Finally, note that third-party plugins that want to fall back on another grammar
should first try to call any previously installed parser hook before setting the
error flag and returning.
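To make the fallback contract concrete, here is a minimal, self-contained sketch
of the dispatch logic. The names (toy_parser_hook, FakeRawStmt, parse_one) are
hypothetical illustrations only; the real hook traffics in a List of RawStmt
nodes and is called from pg_parse_query():

```c
#include <stdbool.h>
#include <string.h>

/* Illustrative stand-in for a parse result; the real hook returns a List
 * of RawStmt nodes. */
typedef struct
{
	const char *stmt;			/* recognized command, or NULL */
	int			len;			/* consumed query string length */
} FakeRawStmt;

/* A toy "custom parser": it only understands commands starting with "HAI"
 * and signals fallback for anything else by setting *error. */
FakeRawStmt
toy_parser_hook(const char *query, int offset, bool *error)
{
	FakeRawStmt result = {NULL, 0};
	const char *start = query + offset;

	if (strncmp(start, "HAI", 3) == 0)
	{
		/* Pretend we consumed the whole remaining command. */
		result.stmt = start;
		result.len = (int) strlen(start);
		*error = false;
	}
	else
	{
		/* Not our grammar: ask the caller to retry with the core parser. */
		*error = true;
	}
	return result;
}

/* Dispatch in the spirit of pg_parse_query(): try the hook first and fall
 * back on the core parser when the hook reports an error. */
const char *
parse_one(const char *query, int offset)
{
	bool		error = false;
	FakeRawStmt raw = toy_parser_hook(query, offset, &error);

	if (error)
		return "core";			/* would call raw_parser(query, ..., offset) */
	return raw.stmt ? "hook" : "empty";
}
```

The key point is that the hook never raises an ERROR on foreign syntax: it sets
the error flag so the caller can retry the same offset with the core parser.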
---
.../pg_stat_statements/pg_stat_statements.c | 3 +-
src/backend/commands/tablecmds.c | 2 +-
src/backend/executor/spi.c | 4 +-
src/backend/parser/gram.y | 29 +++-
src/backend/parser/parse_type.c | 2 +-
src/backend/parser/parser.c | 15 +-
src/backend/parser/scan.l | 26 +++-
src/backend/tcop/postgres.c | 138 ++++++++++++++++--
src/include/parser/parser.h | 5 +-
src/include/parser/scanner.h | 6 +-
src/include/tcop/tcopprot.h | 3 +-
src/pl/plpgsql/src/pl_gram.y | 2 +-
src/pl/plpgsql/src/pl_scanner.c | 2 +-
13 files changed, 210 insertions(+), 27 deletions(-)
diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index 07fe0e7cda..6655ac12b9 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -2720,7 +2720,8 @@ fill_in_constant_lengths(JumbleState *jstate, const char *query,
yyscanner = scanner_init(query,
&yyextra,
&ScanKeywords,
- ScanKeywordTokens);
+ ScanKeywordTokens,
+ 0);
/* we don't want to re-emit any escape string warnings */
yyextra.escape_string_warning = false;
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index dbee6ae199..770d3abe9a 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -12791,7 +12791,7 @@ ATPostAlterTypeParse(Oid oldId, Oid oldRelId, Oid refRelId, char *cmd,
* parse_analyze() or the rewriter, but instead we need to pass them
* through parse_utilcmd.c to make them ready for execution.
*/
- raw_parsetree_list = raw_parser(cmd, RAW_PARSE_DEFAULT);
+ raw_parsetree_list = raw_parser(cmd, RAW_PARSE_DEFAULT, 0);
querytree_list = NIL;
foreach(list_item, raw_parsetree_list)
{
diff --git a/src/backend/executor/spi.c b/src/backend/executor/spi.c
index a5aec7ba7d..dc6c1dea1d 100644
--- a/src/backend/executor/spi.c
+++ b/src/backend/executor/spi.c
@@ -2126,7 +2126,7 @@ _SPI_prepare_plan(const char *src, SPIPlanPtr plan)
/*
* Parse the request string into a list of raw parse trees.
*/
- raw_parsetree_list = raw_parser(src, plan->parse_mode);
+ raw_parsetree_list = raw_parser(src, plan->parse_mode, 0);
/*
* Do parse analysis and rule rewrite for each raw parsetree, storing the
@@ -2234,7 +2234,7 @@ _SPI_prepare_oneshot_plan(const char *src, SPIPlanPtr plan)
/*
* Parse the request string into a list of raw parse trees.
*/
- raw_parsetree_list = raw_parser(src, plan->parse_mode);
+ raw_parsetree_list = raw_parser(src, plan->parse_mode, 0);
/*
* Construct plancache entries, but don't do parse analysis yet.
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index e3068a374e..843b27ff39 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -625,7 +625,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
%token <str> IDENT UIDENT FCONST SCONST USCONST BCONST XCONST Op
%token <ival> ICONST PARAM
%token TYPECAST DOT_DOT COLON_EQUALS EQUALS_GREATER
-%token LESS_EQUALS GREATER_EQUALS NOT_EQUALS
+%token LESS_EQUALS GREATER_EQUALS NOT_EQUALS END_OF_FILE
/*
* If you want to make any keyword changes, update the keyword table in
@@ -752,6 +752,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
%token MODE_PLPGSQL_ASSIGN1
%token MODE_PLPGSQL_ASSIGN2
%token MODE_PLPGSQL_ASSIGN3
+%token MODE_SINGLE_QUERY
/* Precedence: lowest to highest */
@@ -857,6 +858,32 @@ parse_toplevel:
pg_yyget_extra(yyscanner)->parsetree =
list_make1(makeRawStmt((Node *) n, 0));
}
+ | MODE_SINGLE_QUERY toplevel_stmt ';'
+ {
+ RawStmt *raw = makeRawStmt($2, 0);
+ updateRawStmtEnd(raw, @3 + 1);
+ /* NOTE: we can return a raw statement containing a NULL stmt.
+ * This is done to allow pg_parse_query to ignore that part of
+ * the input string and move to the next command.
+ */
+ pg_yyget_extra(yyscanner)->parsetree = list_make1(raw);
+ YYACCEPT;
+ }
+ /*
+ * We need to explicitly look for EOF to parse non-semicolon
+ * terminated statements in single query mode, as we could
+ * otherwise successfully parse the beginning of an otherwise
+ * invalid query.
+ */
+ | MODE_SINGLE_QUERY toplevel_stmt END_OF_FILE
+ {
+ /* NOTE: we can return a raw statement containing a NULL stmt.
+ * This is done to allow pg_parse_query to ignore that part of
+ * the input string.
+ */
+ pg_yyget_extra(yyscanner)->parsetree = list_make1(makeRawStmt($2, 0));
+ YYACCEPT;
+ }
;
/*
diff --git a/src/backend/parser/parse_type.c b/src/backend/parser/parse_type.c
index 31b07ad5ae..576726cb5b 100644
--- a/src/backend/parser/parse_type.c
+++ b/src/backend/parser/parse_type.c
@@ -750,7 +750,7 @@ typeStringToTypeName(const char *str)
ptserrcontext.previous = error_context_stack;
error_context_stack = &ptserrcontext;
- raw_parsetree_list = raw_parser(str, RAW_PARSE_TYPE_NAME);
+ raw_parsetree_list = raw_parser(str, RAW_PARSE_TYPE_NAME, 0);
error_context_stack = ptserrcontext.previous;
diff --git a/src/backend/parser/parser.c b/src/backend/parser/parser.c
index 875de7ba28..23fd49e74c 100644
--- a/src/backend/parser/parser.c
+++ b/src/backend/parser/parser.c
@@ -37,17 +37,25 @@ static char *str_udeescape(const char *str, char escape,
*
* Returns a list of raw (un-analyzed) parse trees. The contents of the
* list have the form required by the specified RawParseMode.
+ *
+ * For all modes other than RAW_PARSE_SINGLE_QUERY, the caller should provide
+ * a 0 offset, as the whole input string will be parsed. Otherwise, the caller
+ * should provide the desired offset into the input string, or -1 if no offset
+ * is required.
*/
List *
-raw_parser(const char *str, RawParseMode mode)
+raw_parser(const char *str, RawParseMode mode, int offset)
{
core_yyscan_t yyscanner;
base_yy_extra_type yyextra;
int yyresult;
+ Assert((mode != RAW_PARSE_SINGLE_QUERY && offset == 0) ||
+ (mode == RAW_PARSE_SINGLE_QUERY && offset != 0));
+
/* initialize the flex scanner */
yyscanner = scanner_init(str, &yyextra.core_yy_extra,
- &ScanKeywords, ScanKeywordTokens);
+ &ScanKeywords, ScanKeywordTokens, offset);
/* base_yylex() only needs us to initialize the lookahead token, if any */
if (mode == RAW_PARSE_DEFAULT)
@@ -61,7 +69,8 @@ raw_parser(const char *str, RawParseMode mode)
MODE_PLPGSQL_EXPR, /* RAW_PARSE_PLPGSQL_EXPR */
MODE_PLPGSQL_ASSIGN1, /* RAW_PARSE_PLPGSQL_ASSIGN1 */
MODE_PLPGSQL_ASSIGN2, /* RAW_PARSE_PLPGSQL_ASSIGN2 */
- MODE_PLPGSQL_ASSIGN3 /* RAW_PARSE_PLPGSQL_ASSIGN3 */
+ MODE_PLPGSQL_ASSIGN3, /* RAW_PARSE_PLPGSQL_ASSIGN3 */
+ MODE_SINGLE_QUERY /* RAW_PARSE_SINGLE_QUERY */
};
yyextra.have_lookahead = true;
diff --git a/src/backend/parser/scan.l b/src/backend/parser/scan.l
index 6e6824faeb..156c0df7d7 100644
--- a/src/backend/parser/scan.l
+++ b/src/backend/parser/scan.l
@@ -1042,7 +1042,10 @@ other .
<<EOF>> {
SET_YYLLOC();
- yyterminate();
+ if (yyextra->return_eof)
+ return END_OF_FILE;
+ else
+ yyterminate();
}
%%
@@ -1190,8 +1193,10 @@ core_yyscan_t
scanner_init(const char *str,
core_yy_extra_type *yyext,
const ScanKeywordList *keywordlist,
- const uint16 *keyword_tokens)
+ const uint16 *keyword_tokens,
+ int offset)
{
+ YY_BUFFER_STATE state;
Size slen = strlen(str);
yyscan_t scanner;
@@ -1214,13 +1219,28 @@ scanner_init(const char *str,
yyext->scanbuflen = slen;
memcpy(yyext->scanbuf, str, slen);
yyext->scanbuf[slen] = yyext->scanbuf[slen + 1] = YY_END_OF_BUFFER_CHAR;
- yy_scan_buffer(yyext->scanbuf, slen + 2, scanner);
+ state = yy_scan_buffer(yyext->scanbuf, slen + 2, scanner);
/* initialize literal buffer to a reasonable but expansible size */
yyext->literalalloc = 1024;
yyext->literalbuf = (char *) palloc(yyext->literalalloc);
yyext->literallen = 0;
+ /*
+ * Note that pg_parse_query will set a -1 offset rather than 0 for the
+ * first query of a possibly multi-query string if it wants us to return an
+ * EOF token.
+ */
+ yyext->return_eof = (offset != 0);
+
+ /*
+ * Adjust the offset in the input string. This is required in single-query
+ * mode, as we need to register the same token locations as we would have
+ * in normal mode with multi-statement query string.
+ */
+ if (offset > 0)
+ state->yy_buf_pos += offset;
+
return scanner;
}
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 66ee58a4b1..d94d2b3c10 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -602,17 +602,137 @@ ProcessClientWriteInterrupt(bool blocked)
List *
pg_parse_query(const char *query_string)
{
- List *raw_parsetree_list = NIL;
+ List *result = NIL;
+ int stmt_len, offset;
TRACE_POSTGRESQL_QUERY_PARSE_START(query_string);
if (log_parser_stats)
ResetUsage();
- if (parser_hook)
- raw_parsetree_list = (*parser_hook) (query_string, RAW_PARSE_DEFAULT);
- else
- raw_parsetree_list = raw_parser(query_string, RAW_PARSE_DEFAULT);
+ stmt_len = 0; /* lazily computed when needed */
+ offset = 0;
+
+ while(true)
+ {
+ List *raw_parsetree_list;
+ RawStmt *raw;
+ bool error = false;
+
+ /*----------------
+ * Start parsing the input string. If a third-party module provided a
+ * parser_hook, we switch to single-query parsing so multi-query
+ * commands using different grammars can work properly.
+ * If the third-party module supports the full set of SQL we support,
+ * or wants to prevent falling back on the core parser, it can ignore the
+ * RAW_PARSE_SINGLE_QUERY flag and parse the full query string.
+ * In that case they must return a List with more than one RawStmt or a
+ * single RawStmt with a 0 length to stop the parsing phase, or raise
+ * an ERROR.
+ *
+ * Otherwise, plugins should parse a single query only and always
+ * return a List containing a single RawStmt with a properly set length
+ * (possibly 0 if it was a single query without end of query
+ * delimiter). If the command is valid but doesn't contain any
+ * statements (e.g. a single semi-colon), a single RawStmt with a NULL
+ * stmt field should be returned, containing the consumed query string
+ * length so we can move to the next command in a single pass rather
+ * than 1 byte at a time.
+ *
+ * Also, third-party modules can choose to ignore some or all of
+ * parsing error if they want to implement only subset of postgres
+ * suppoted syntax, or even a totally different syntax, and fall-back
+ * on core grammar for unhandled case. In thase case, they should set
+ * the error flag to true. The returned List will be ignored and the
+ * same offset of the input string will be parsed using the core
+ * parser.
+ *
+ * Finally, note that third-party modules that wants to fallback on
+ * other grammar should first try to call a previous parser hook if any
+ * before setting the error switch and returning .
+ */
+ if (parser_hook)
+ raw_parsetree_list = (*parser_hook) (query_string,
+ RAW_PARSE_SINGLE_QUERY,
+ offset,
+ &error);
+
+ /*
+ * If a third-party module couldn't parse a single query or if no
+ * third-party module is configured, fallback on core parser.
+ */
+ if (error || !parser_hook)
+ {
+ /*
+ * Send a -1 offset to raw_parser to specify that it should
+ * explicitly detect EOF during parsing. scanner_init() will treat
+ * it the same as a 0 offset.
+ */
+ raw_parsetree_list = raw_parser(query_string,
+ error ? RAW_PARSE_SINGLE_QUERY : RAW_PARSE_DEFAULT,
+ (error && offset == 0) ? -1 : offset);
+ }
+
+ /*
+ * If there is no third-party plugin, if none of the parsers found a
+ * valid query, or if a third-party module consumed the whole query
+ * string, we're done.
+ */
+ if (!parser_hook || raw_parsetree_list == NIL ||
+ list_length(raw_parsetree_list) > 1)
+ {
+ /*
+ * Warn third-party plugins if they mix the "single query" and "whole
+ * input string" strategies, rather than silently accepting it, which
+ * could allow fallback on the core grammar even when they want to
+ * avoid that. This way plugin authors can be warned of the issue
+ * early.
+ */
+ if (result != NIL)
+ {
+ Assert(parser_hook != NULL);
+ elog(ERROR, "parser_hook should parse a single statement at "
+ "a time or consume the whole input string at once");
+ }
+ result = raw_parsetree_list;
+ break;
+ }
+
+ if (stmt_len == 0)
+ stmt_len = strlen(query_string);
+
+ raw = linitial_node(RawStmt, raw_parsetree_list);
+
+ /*
+ * In single-query mode, the parser will return statement location info
+ * relative to the beginning of complete original string, not the part
+ * we just parsed, so adjust the location info.
+ */
+ if (offset > 0 && raw->stmt_len > 0)
+ {
+ Assert(raw->stmt_len > offset);
+ raw->stmt_location = offset;
+ raw->stmt_len -= offset;
+ }
+
+ /* Ignore the statement if it didn't contain any command. */
+ if (raw->stmt)
+ result = lappend(result, raw);
+
+ if (raw->stmt_len == 0)
+ {
+ /* The statement was the whole string, we're done. */
+ break;
+ }
+ else if (raw->stmt_len + offset >= stmt_len)
+ {
+ /* We consumed all of the input string, we're done. */
+ break;
+ }
+ else
+ {
+ /* Advance the offset to the next command. */
+ offset += raw->stmt_len;
+ }
+ }
if (log_parser_stats)
ShowUsage("PARSER STATISTICS");
@@ -620,13 +740,13 @@ pg_parse_query(const char *query_string)
#ifdef COPY_PARSE_PLAN_TREES
/* Optional debugging check: pass raw parsetrees through copyObject() */
{
- List *new_list = copyObject(raw_parsetree_list);
+ List *new_list = copyObject(result);
/* This checks both copyObject() and the equal() routines... */
- if (!equal(new_list, raw_parsetree_list))
+ if (!equal(new_list, result))
elog(WARNING, "copyObject() failed to produce an equal raw parse tree");
else
- raw_parsetree_list = new_list;
+ result = new_list;
}
#endif
@@ -638,7 +758,7 @@ pg_parse_query(const char *query_string)
TRACE_POSTGRESQL_QUERY_PARSE_DONE(query_string);
- return raw_parsetree_list;
+ return result;
}
/*
diff --git a/src/include/parser/parser.h b/src/include/parser/parser.h
index 853b0f1606..5694ae791a 100644
--- a/src/include/parser/parser.h
+++ b/src/include/parser/parser.h
@@ -41,7 +41,8 @@ typedef enum
RAW_PARSE_PLPGSQL_EXPR,
RAW_PARSE_PLPGSQL_ASSIGN1,
RAW_PARSE_PLPGSQL_ASSIGN2,
- RAW_PARSE_PLPGSQL_ASSIGN3
+ RAW_PARSE_PLPGSQL_ASSIGN3,
+ RAW_PARSE_SINGLE_QUERY
} RawParseMode;
/* Values for the backslash_quote GUC */
@@ -59,7 +60,7 @@ extern PGDLLIMPORT bool standard_conforming_strings;
/* Primary entry point for the raw parsing functions */
-extern List *raw_parser(const char *str, RawParseMode mode);
+extern List *raw_parser(const char *str, RawParseMode mode, int offset);
/* Utility functions exported by gram.y (perhaps these should be elsewhere) */
extern List *SystemFuncName(char *name);
diff --git a/src/include/parser/scanner.h b/src/include/parser/scanner.h
index 0d8182faa0..a2e97be5d5 100644
--- a/src/include/parser/scanner.h
+++ b/src/include/parser/scanner.h
@@ -113,6 +113,9 @@ typedef struct core_yy_extra_type
/* state variables for literal-lexing warnings */
bool warn_on_first_escape;
bool saw_non_ascii;
+
+ /* state variable for returning an EOF token in single query mode */
+ bool return_eof;
} core_yy_extra_type;
/*
@@ -136,7 +139,8 @@ extern PGDLLIMPORT const uint16 ScanKeywordTokens[];
extern core_yyscan_t scanner_init(const char *str,
core_yy_extra_type *yyext,
const ScanKeywordList *keywordlist,
- const uint16 *keyword_tokens);
+ const uint16 *keyword_tokens,
+ int offset);
extern void scanner_finish(core_yyscan_t yyscanner);
extern int core_yylex(core_YYSTYPE *lvalp, YYLTYPE *llocp,
core_yyscan_t yyscanner);
diff --git a/src/include/tcop/tcopprot.h b/src/include/tcop/tcopprot.h
index 131dc2b22e..27201dde1d 100644
--- a/src/include/tcop/tcopprot.h
+++ b/src/include/tcop/tcopprot.h
@@ -45,7 +45,8 @@ typedef enum
extern PGDLLIMPORT int log_statement;
/* Hook for plugins to get control in pg_parse_query() */
-typedef List *(*parser_hook_type) (const char *str, RawParseMode mode);
+typedef List *(*parser_hook_type) (const char *str, RawParseMode mode,
+ int offset, bool *error);
extern PGDLLIMPORT parser_hook_type parser_hook;
extern List *pg_parse_query(const char *query_string);
diff --git a/src/pl/plpgsql/src/pl_gram.y b/src/pl/plpgsql/src/pl_gram.y
index 0f6a5b30b1..5c4f4c08bc 100644
--- a/src/pl/plpgsql/src/pl_gram.y
+++ b/src/pl/plpgsql/src/pl_gram.y
@@ -3656,7 +3656,7 @@ check_sql_expr(const char *stmt, RawParseMode parseMode, int location)
error_context_stack = &syntax_errcontext;
oldCxt = MemoryContextSwitchTo(plpgsql_compile_tmp_cxt);
- (void) raw_parser(stmt, parseMode);
+ (void) raw_parser(stmt, parseMode, 0);
MemoryContextSwitchTo(oldCxt);
/* Restore former ereport callback */
diff --git a/src/pl/plpgsql/src/pl_scanner.c b/src/pl/plpgsql/src/pl_scanner.c
index e4c7a91ab5..a2886c42ec 100644
--- a/src/pl/plpgsql/src/pl_scanner.c
+++ b/src/pl/plpgsql/src/pl_scanner.c
@@ -587,7 +587,7 @@ plpgsql_scanner_init(const char *str)
{
/* Start up the core scanner */
yyscanner = scanner_init(str, &core_yy,
- &ReservedPLKeywords, ReservedPLKeywordTokens);
+ &ReservedPLKeywords, ReservedPLKeywordTokens, 0);
/*
* scanorig points to the original string, which unlike the scanner's
--
2.32.0
Attachment: v5-0004-Teach-sqlol-to-use-the-new-MODE_SINGLE_QUERY-pars.patch (text/x-diff, charset=us-ascii)
From 8a17a647b4249400754bc026a221735ef9f7c523 Mon Sep 17 00:00:00 2001
From: Julien Rouhaud <julien.rouhaud@free.fr>
Date: Thu, 22 Apr 2021 02:15:54 +0800
Subject: [PATCH v5 4/4] Teach sqlol to use the new MODE_SINGLE_QUERY parser
mode.
This way multi-statement commands using both the core parser and the sqlol
parser can be supported.
Also add a LOLCODE version of CREATE VIEW viewname AS to easily test
multi-statement commands.
---
contrib/sqlol/Makefile | 2 +
contrib/sqlol/expected/01_sqlol.out | 77 +++++++++++++++++++++++++++++
contrib/sqlol/repro.sql | 18 +++++++
contrib/sqlol/sql/01_sqlol.sql | 44 +++++++++++++++++
contrib/sqlol/sqlol.c | 24 +++++----
contrib/sqlol/sqlol_gram.y | 63 +++++++++++------------
contrib/sqlol/sqlol_kwlist.h | 1 +
contrib/sqlol/sqlol_scan.l | 13 ++++-
contrib/sqlol/sqlol_scanner.h | 3 +-
9 files changed, 199 insertions(+), 46 deletions(-)
create mode 100644 contrib/sqlol/expected/01_sqlol.out
create mode 100644 contrib/sqlol/repro.sql
create mode 100644 contrib/sqlol/sql/01_sqlol.sql
diff --git a/contrib/sqlol/Makefile b/contrib/sqlol/Makefile
index 3850ac3fce..eaf94801c2 100644
--- a/contrib/sqlol/Makefile
+++ b/contrib/sqlol/Makefile
@@ -6,6 +6,8 @@ OBJS = \
sqlol.o sqlol_gram.o sqlol_scan.o sqlol_keywords.o
PGFILEDESC = "sqlol - Toy alternative grammar based on LOLCODE"
+REGRESS = 01_sqlol
+
sqlol_gram.h: sqlol_gram.c
touch $@
diff --git a/contrib/sqlol/expected/01_sqlol.out b/contrib/sqlol/expected/01_sqlol.out
new file mode 100644
index 0000000000..9c51dd62c2
--- /dev/null
+++ b/contrib/sqlol/expected/01_sqlol.out
@@ -0,0 +1,77 @@
+LOAD 'sqlol';
+-- create a base table, falling back on core grammar
+CREATE TABLE t1 (id integer, val text);
+-- test a SQLOL statement
+HAI 1.2 I HAS A t1 GIMMEH id, "val" KTHXBYE\g
+ id | val
+----+-----
+(0 rows)
+
+-- create a view in SQLOL
+HAI 1.2 MAEK I HAS A t1 GIMMEH id, "val" A v0 KTHXBYE\g
+-- combine standard SQL with a trailing SQLOL statement in a multi-statement command
+CREATE VIEW v1 AS SELECT * FROM t1\; CREATE VIEW v2 AS SELECT * FROM t1\;HAI 1.2 I HAS A t1 GIMMEH "id", id KTHXBYE\g
+ id | id
+----+----
+(0 rows)
+
+-- interleave standard SQL and SQLOL commands in a multi-statement command
+CREATE VIEW v3 AS SELECT * FROM t1\; HAI 1.2 MAEK I HAS A t1 GIMMEH id, "val" A v4 KTHXBYE CREATE VIEW v5 AS SELECT * FROM t1\;HAI 1.2 I HAS A t1 GIMMEH "id", id KTHXBYE\g
+ id | id
+----+----
+(0 rows)
+
+-- test MODE_SINGLE_QUERY with no trailing semicolon
+SELECT 1\;SELECT 2\;SELECT 3 \g
+ ?column?
+----------
+ 3
+(1 row)
+
+-- test empty statement ignoring
+\;\;select 1 \g
+ ?column?
+----------
+ 1
+(1 row)
+
+-- check the created views
+SELECT relname, relkind
+FROM pg_class c
+JOIN pg_namespace n ON c.relnamespace = n.oid
+WHERE nspname = 'public'
+ORDER BY relname COLLATE "C";
+ relname | relkind
+---------+---------
+ t1 | r
+ v0 | v
+ v1 | v
+ v2 | v
+ v3 | v
+ v4 | v
+ v5 | v
+(7 rows)
+
+--
+-- Error position
+--
+SELECT 1\;err;
+ERROR: syntax error at or near "err"
+LINE 1: SELECT 1;err;
+ ^
+-- sqlol won't trigger an error on incorrect GIMME keyword, so core parser will
+-- complain about HAI
+SELECT 1\;HAI 1.2 I HAS A t1 GIMME id KTHXBYE\g
+ERROR: syntax error at or near "HAI"
+LINE 1: SELECT 1;HAI 1.2 I HAS A t1 GIMME id KTHXBYE
+ ^
+-- sqlol will trigger the error about too many qualifiers on t1
+SELECT 1\;HAI 1.2 I HAS A some.thing.public.t1 GIMMEH id KTHXBYE\g
+ERROR: improper qualified name (too many dotted names): some.thing.public.t1
+LINE 1: SELECT 1;HAI 1.2 I HAS A some.thing.public.t1 GIMMEH id KTHX...
+ ^
+-- position reported outside of the parser/scanner should be correct too
+SELECT 1\;SELECT * FROM notatable;
+ERROR: relation "notatable" does not exist
+LINE 1: SELECT 1;SELECT * FROM notatable;
+ ^
diff --git a/contrib/sqlol/repro.sql b/contrib/sqlol/repro.sql
new file mode 100644
index 0000000000..0ebcb53160
--- /dev/null
+++ b/contrib/sqlol/repro.sql
@@ -0,0 +1,18 @@
+DROP TABLE IF EXISTS t1 CASCADE;
+
+LOAD 'sqlol';
+
+\;\; SELECT 1\;
+
+CREATE TABLE t1 (id integer, val text);
+
+HAI 1.2 I HAS A t1 GIMMEH id, "val" KTHXBYE\g
+
+HAI 1.2 MAEK I HAS A t1 GIMMEH id, "val" A v0 KTHXBYE\g
+
+CREATE VIEW v1 AS SELECT * FROM t1\; CREATE VIEW v2 AS SELECT * FROM t1\;HAI 1.2 I HAS A t1 GIMMEH "id", id KTHXBYE\g
+
+CREATE VIEW v3 AS SELECT * FROM t1\; HAI 1.2 MAEK I HAS A t1 GIMMEH id, "val" A v4 KTHXBYE CREATE VIEW v5 AS SELECT * FROM t1\;HAI 1.2 I HAS A t1 GIMMEH "id", id KTHXBYE\g
+
+SELECT 1\;SELECT 2\;SELECT 3 \g
+\d
diff --git a/contrib/sqlol/sql/01_sqlol.sql b/contrib/sqlol/sql/01_sqlol.sql
new file mode 100644
index 0000000000..e89a3dd9a0
--- /dev/null
+++ b/contrib/sqlol/sql/01_sqlol.sql
@@ -0,0 +1,44 @@
+LOAD 'sqlol';
+
+-- create a base table, falling back on core grammar
+CREATE TABLE t1 (id integer, val text);
+
+-- test a SQLOL statement
+HAI 1.2 I HAS A t1 GIMMEH id, "val" KTHXBYE\g
+
+-- create a view in SQLOL
+HAI 1.2 MAEK I HAS A t1 GIMMEH id, "val" A v0 KTHXBYE\g
+
+-- combine standard SQL with a trailing SQLOL statement in a multi-statement command
+CREATE VIEW v1 AS SELECT * FROM t1\; CREATE VIEW v2 AS SELECT * FROM t1\;HAI 1.2 I HAS A t1 GIMMEH "id", id KTHXBYE\g
+
+-- interleave standard SQL and SQLOL commands in a multi-statement command
+CREATE VIEW v3 AS SELECT * FROM t1\; HAI 1.2 MAEK I HAS A t1 GIMMEH id, "val" A v4 KTHXBYE CREATE VIEW v5 AS SELECT * FROM t1\;HAI 1.2 I HAS A t1 GIMMEH "id", id KTHXBYE\g
+
+-- test MODE_SINGLE_QUERY with no trailing semicolon
+SELECT 1\;SELECT 2\;SELECT 3 \g
+
+-- test empty statement ignoring
+\;\;select 1 \g
+
+-- check the created views
+SELECT relname, relkind
+FROM pg_class c
+JOIN pg_namespace n ON c.relnamespace = n.oid
+WHERE nspname = 'public'
+ORDER BY relname COLLATE "C";
+
+--
+-- Error position
+--
+SELECT 1\;err;
+
+-- sqlol won't trigger an error on incorrect GIMME keyword, so core parser will
+-- complain about HAI
+SELECT 1\;HAI 1.2 I HAS A t1 GIMME id KTHXBYE\g
+
+-- sqlol will trigger the error about too many qualifiers on t1
+SELECT 1\;HAI 1.2 I HAS A some.thing.public.t1 GIMMEH id KTHXBYE\g
+
+-- position reported outside of the parser/scanner should be correct too
+SELECT 1\;SELECT * FROM notatable;
diff --git a/contrib/sqlol/sqlol.c b/contrib/sqlol/sqlol.c
index b986966181..7d4e1b631f 100644
--- a/contrib/sqlol/sqlol.c
+++ b/contrib/sqlol/sqlol.c
@@ -26,7 +26,8 @@ static parser_hook_type prev_parser_hook = NULL;
void _PG_init(void);
void _PG_fini(void);
-static List *sqlol_parser_hook(const char *str, RawParseMode mode);
+static List *sqlol_parser_hook(const char *str, RawParseMode mode, int offset,
+ bool *error);
/*
@@ -54,23 +55,25 @@ _PG_fini(void)
* sqlol_parser_hook: parse our grammar
*/
static List *
-sqlol_parser_hook(const char *str, RawParseMode mode)
+sqlol_parser_hook(const char *str, RawParseMode mode, int offset, bool *error)
{
sqlol_yyscan_t yyscanner;
sqlol_base_yy_extra_type yyextra;
int yyresult;
- if (mode != RAW_PARSE_DEFAULT)
+ if (mode != RAW_PARSE_DEFAULT && mode != RAW_PARSE_SINGLE_QUERY)
{
if (prev_parser_hook)
- return (*prev_parser_hook) (str, mode);
- else
- return raw_parser(str, mode);
+ return (*prev_parser_hook) (str, mode, offset, error);
+
+ *error = true;
+ return NIL;
}
/* initialize the flex scanner */
yyscanner = sqlol_scanner_init(str, &yyextra.sqlol_yy_extra,
- sqlol_ScanKeywords, sqlol_NumScanKeywords);
+ sqlol_ScanKeywords, sqlol_NumScanKeywords,
+ offset);
/* initialize the bison parser */
sqlol_parser_init(&yyextra);
@@ -88,9 +91,10 @@ sqlol_parser_hook(const char *str, RawParseMode mode)
if (yyresult)
{
if (prev_parser_hook)
- return (*prev_parser_hook) (str, mode);
- else
- return raw_parser(str, mode);
+ return (*prev_parser_hook) (str, mode, offset, error);
+
+ *error = true;
+ return NIL;
}
return yyextra.parsetree;
diff --git a/contrib/sqlol/sqlol_gram.y b/contrib/sqlol/sqlol_gram.y
index 3214865a53..94e20c2018 100644
--- a/contrib/sqlol/sqlol_gram.y
+++ b/contrib/sqlol/sqlol_gram.y
@@ -20,6 +20,7 @@
#include "catalog/namespace.h"
#include "nodes/makefuncs.h"
+#include "catalog/pg_class_d.h"
#include "sqlol_gramparse.h"
@@ -105,10 +106,10 @@ static List *check_indirection(List *indirection, sqlol_yyscan_t yyscanner);
ResTarget *target;
}
-%type <node> stmt toplevel_stmt GimmehStmt simple_gimmeh columnref
+%type <node> stmt toplevel_stmt GimmehStmt MaekStmt simple_gimmeh columnref
indirection_el
-%type <list> parse_toplevel stmtmulti gimmeh_list indirection
+%type <list> parse_toplevel rawstmt gimmeh_list indirection
%type <range> qualified_name
@@ -133,22 +134,19 @@ static List *check_indirection(List *indirection, sqlol_yyscan_t yyscanner);
*/
/* ordinary key words in alphabetical order */
-%token <keyword> A GIMMEH HAI HAS I KTHXBYE
-
+%token <keyword> A GIMMEH HAI HAS I KTHXBYE MAEK
%%
/*
* The target production for the whole parse.
- *
- * Ordinarily we parse a list of statements, but if we see one of the
- * special MODE_XXX symbols as first token, we parse something else.
- * The options here correspond to enum RawParseMode, which see for details.
*/
parse_toplevel:
- stmtmulti
+ rawstmt
{
pg_yyget_extra(yyscanner)->parsetree = $1;
+
+ YYACCEPT;
}
;
@@ -162,24 +160,11 @@ parse_toplevel:
* we'd get -1 for the location in such cases.
* We also take care to discard empty statements entirely.
*/
-stmtmulti: stmtmulti KTHXBYE toplevel_stmt
- {
- if ($1 != NIL)
- {
- /* update length of previous stmt */
- updateRawStmtEnd(llast_node(RawStmt, $1), @2);
- }
- if ($3 != NULL)
- $$ = lappend($1, makeRawStmt($3, @2 + 1));
- else
- $$ = $1;
- }
- | toplevel_stmt
+rawstmt: toplevel_stmt KTHXBYE
{
- if ($1 != NULL)
- $$ = list_make1(makeRawStmt($1, 0));
- else
- $$ = NIL;
+ RawStmt *raw = makeRawStmt($1, 0);
+ updateRawStmtEnd(raw, @2 + 7);
+ $$ = list_make1(raw);
}
;
@@ -188,13 +173,12 @@ stmtmulti: stmtmulti KTHXBYE toplevel_stmt
* those words have different meanings in function bodys.
*/
toplevel_stmt:
- stmt
+ HAI FCONST stmt { $$ = $3; }
;
stmt:
GimmehStmt
- | /*EMPTY*/
- { $$ = NULL; }
+ | MaekStmt
;
/*****************************************************************************
@@ -208,12 +192,11 @@ GimmehStmt:
;
simple_gimmeh:
- HAI FCONST I HAS A qualified_name
- GIMMEH gimmeh_list
+ I HAS A qualified_name GIMMEH gimmeh_list
{
SelectStmt *n = makeNode(SelectStmt);
- n->targetList = $8;
- n->fromClause = list_make1($6);
+ n->targetList = $6;
+ n->fromClause = list_make1($4);
$$ = (Node *)n;
}
;
@@ -232,6 +215,20 @@ gimmeh_el:
$$->location = @1;
}
+MaekStmt:
+ MAEK GimmehStmt A qualified_name
+ {
+ ViewStmt *n = makeNode(ViewStmt);
+ n->view = $4;
+ n->view->relpersistence = RELPERSISTENCE_PERMANENT;
+ n->aliases = NIL;
+ n->query = $2;
+ n->replace = false;
+ n->options = NIL;
+ n->withCheckOption = false;
+ $$ = (Node *) n;
+ }
+
qualified_name:
ColId
{
diff --git a/contrib/sqlol/sqlol_kwlist.h b/contrib/sqlol/sqlol_kwlist.h
index 2de3893ee4..8b50d88df9 100644
--- a/contrib/sqlol/sqlol_kwlist.h
+++ b/contrib/sqlol/sqlol_kwlist.h
@@ -19,3 +19,4 @@ PG_KEYWORD("hai", HAI, RESERVED_KEYWORD)
PG_KEYWORD("has", HAS, UNRESERVED_KEYWORD)
PG_KEYWORD("i", I, UNRESERVED_KEYWORD)
PG_KEYWORD("kthxbye", KTHXBYE, UNRESERVED_KEYWORD)
+PG_KEYWORD("maek", MAEK, UNRESERVED_KEYWORD)
diff --git a/contrib/sqlol/sqlol_scan.l b/contrib/sqlol/sqlol_scan.l
index a7088b8390..e6d4d53446 100644
--- a/contrib/sqlol/sqlol_scan.l
+++ b/contrib/sqlol/sqlol_scan.l
@@ -412,8 +412,10 @@ sqlol_yyscan_t
sqlol_scanner_init(const char *str,
sqlol_yy_extra_type *yyext,
const sqlol_ScanKeyword *keywords,
- int num_keywords)
+ int num_keywords,
+ int offset)
{
+ YY_BUFFER_STATE state;
Size slen = strlen(str);
yyscan_t scanner;
@@ -432,13 +434,20 @@ sqlol_scanner_init(const char *str,
yyext->scanbuflen = slen;
memcpy(yyext->scanbuf, str, slen);
yyext->scanbuf[slen] = yyext->scanbuf[slen + 1] = YY_END_OF_BUFFER_CHAR;
- yy_scan_buffer(yyext->scanbuf, slen + 2, scanner);
+ state = yy_scan_buffer(yyext->scanbuf, slen + 2, scanner);
/* initialize literal buffer to a reasonable but expansible size */
yyext->literalalloc = 1024;
yyext->literalbuf = (char *) palloc(yyext->literalalloc);
yyext->literallen = 0;
+ /*
+ * Adjust the offset in the input string. This is required in single-query
+ * mode, as we need to register the same token locations as we would have
+ * in normal mode with a multi-statement query string.
+ */
+ state->yy_buf_pos += offset;
+
return scanner;
}
diff --git a/contrib/sqlol/sqlol_scanner.h b/contrib/sqlol/sqlol_scanner.h
index 0a497e9d91..57f95867ee 100644
--- a/contrib/sqlol/sqlol_scanner.h
+++ b/contrib/sqlol/sqlol_scanner.h
@@ -108,7 +108,8 @@ extern PGDLLIMPORT const uint16 sqlol_ScanKeywordTokens[];
extern sqlol_yyscan_t sqlol_scanner_init(const char *str,
sqlol_yy_extra_type *yyext,
const sqlol_ScanKeyword *keywords,
- int num_keywords);
+ int num_keywords,
+ int offset);
extern void sqlol_scanner_finish(sqlol_yyscan_t yyscanner);
extern int sqlol_yylex(sqlol_YYSTYPE *lvalp, YYLTYPE *llocp,
sqlol_yyscan_t yyscanner);
--
2.32.0
On Sat, 1 May 2021 at 08:24, Julien Rouhaud <rjuju123@gmail.com> wrote:
Being able to extend core parser has been requested multiple times, and AFAICT
all previous attempts were rejected not because this isn't wanted but because
the proposed implementations required plugins to reimplement all of the core
grammar with their own changes, as bison generated parsers aren't extensible.
I'd like to propose an alternative approach, which is to allow multiple parsers
to coexist, and let third-party parsers optionally fallback on the core
parsers.
Yes, that approach has been discussed by many people, most recently
around the idea to create a fast-path grammar to make the most
frequently used SQL parse faster.
0002 implements a lame "sqlol" parser, based on LOLCODE syntax, with only the
ability to produce a "select [col, ] col FROM table" parsetree, for testing
purposes. I chose it to ensure that everything works properly even with a
totally different grammar that has different keywords, and which doesn't even end
statements with a semicolon but with a plain keyword.
The general rule has always been that we don't just put hooks in, we
always require an in-core use for those hooks. I was reminded of that
myself recently.
What we need is something in core that actually makes use of this. The
reason for that is not politics, but a simple test of whether the
feature makes sense AND includes all required bells and whistles to be
useful in the real world.
Core doesn't need a LOL parser and I don't think we should commit that.
If we do this, I think it should have CREATE LANGUAGE support, so that
each plugin can be seen as an in-core object and allow security around
which users can execute which language types, allow us to switch
between languages and have default languages for specific users or
databases.
--
Simon Riggs http://www.EnterpriseDB.com/
On Wed, Sep 15, 2021 at 9:25 AM Simon Riggs
<simon.riggs@enterprisedb.com> wrote:
The general rule has always been that we don't just put hooks in, we
always require an in-core use for those hooks. I was reminded of that
myself recently.
That's not historically what has happened. There are several hooks with
no in core use such as emit_log_hook and ExplainOneQuery_hook. The recent
openssl_tls_init_hook only has a usage in src/test/modules
What we need is something in core that actually makes use of this. The
reason for that is not politics, but a simple test of whether the
feature makes sense AND includes all required bells and whistles to be
useful in the real world.
Agreed. There should be something in src/test/modules to exercise this
but probably more to flush things out. Maybe extending adminpack to use
this so if enabled, it can use syntax like:
FILE READ 'foo.txt'
Core doesn't need a LOL parser and I don't think we should commit that.
+1
If we do this, I think it should have CREATE LANGUAGE support, so that
each plugin can be seen as an in-core object and allow security around
which users can execute which language types, allow us to switch
between languages and have default languages for specific users or
databases.
This hook allows extension developers to supplement syntax in addition
to adding a whole new language allowing the extension to appear more
native to the end user. For example, pglogical could use this to add
syntax to do a CREATE NODE instead of calling the function create_node.
Adding CREATE LANGUAGE support around this would just be for a narrow
set of use cases where a new language is added.
On Wed, Sep 15, 2021 at 10:14 PM Jim Mlodgenski <jimmy76@gmail.com> wrote:
On Wed, Sep 15, 2021 at 9:25 AM Simon Riggs
<simon.riggs@enterprisedb.com> wrote:
The general rule has always been that we don't just put hooks in, we
always require an in-core use for those hooks. I was reminded of that
myself recently.
That's not historically what has happened. There are several hooks with
no in-core use, such as emit_log_hook and ExplainOneQuery_hook. The recent
openssl_tls_init_hook only has a usage in src/test/modules.
Yes, I also think that it's not a strict requirement that all hooks
have a caller in the core, even if it's obviously better if that's the
case.
What we need is something in core that actually makes use of this. The
reason for that is not politics, but a simple test of whether the
feature makes sense AND includes all required bells and whistles to be
useful in the real world.
Agreed. There should be something in src/test/modules to exercise this
but probably more to flesh things out. Maybe extending adminpack to use
this so if enabled, it can use syntax like:
FILE READ 'foo.txt'
For this hook, maintaining a real alternative parser seems like way
too much trouble to justify an in-core user. The fact that many
people have asked for such a feature over the years should be enough to
justify the use case. We could try to invent some artificial need
like the one you suggest for adminpack, but it also feels like a waste
of resources.
As far as I'm concerned a naive strcmp-based parser in
src/test/modules should be enough to validate that the hook is
working, there's no need for more. In any case if the only
requirement for it to be committed is to write a real parser, whether
in contrib or src/test/modules, I'll be happy to do it.
Core doesn't need a LOL parser and I don't think we should commit that.
+1
I entirely agree, and I repeatedly mentioned in that thread that I did
*not* want to add this parser in core. The only purpose of patches
0002 and 0004 is to make the third-party bison based parser
requirements less abstract, and demonstrate that this approach can
successfully make two *radically different* parsers cohabit.
If we do this, I think it should have CREATE LANGUAGE support, so that
each plugin can be seen as an in-core object and allow security around
which users can execute which language types, allow us to switch
between languages and have default languages for specific users or
databases.
This hook allows extension developers to supplement syntax in addition
to adding a whole new language allowing the extension to appear more
native to the end user. For example, pglogical could use this to add
syntax to do a CREATE NODE instead of calling the function create_node.
Adding CREATE LANGUAGE support around this would just be for a narrow
set of use cases where a new language is added.
Yes, this hook can be used to implement multiple things, as I mentioned
in my initial email. Additionally, if this is eventually committed I'd
like to add support for CREATE HYPOTHETICAL INDEX grammar in hypopg.
Such a parser would only support one command (that extends an existing
one), so it can't really be called a language. Of course it would be
better to have the core parser accept a CREATE [ HYPOTHETICAL ] INDEX
and set up a flag so that third-party modules can intercept this
utility command, but until that happens I could provide that syntactic
sugar for my users, as long as I'm motivated enough to write this
parser.
Also, a hook-based approach is still compatible with per-database /
per-role configuration. It can be done either via specific
session_preload_libraries, or via a custom GUC if for some reason the
module needs to be in shared_preload_libraries.
On Wed, Sep 15, 2021 at 02:25:17PM +0100, Simon Riggs wrote:
On Sat, 1 May 2021 at 08:24, Julien Rouhaud <rjuju123@gmail.com> wrote:
Being able to extend core parser has been requested multiple times, and AFAICT
all previous attempts were rejected not because this isn't wanted but because
the proposed implementations required plugins to reimplement all of the core
grammar with their own changes, as bison generated parsers aren't extensible.
I'd like to propose an alternative approach, which is to allow multiple parsers
to coexist, and let third-party parsers optionally fallback on the core
parsers.
Yes, that approach has been discussed by many people, most recently
around the idea to create a fast-path grammar to make the most
frequently used SQL parse faster.
0002 implements a lame "sqlol" parser, based on LOLCODE syntax, with only the
ability to produce a "select [col, ] col FROM table" parsetree, for testing
purposes. I chose it to ensure that everything works properly even with a
totally different grammar that has different keywords, and which doesn't even end
statements with a semicolon but with a plain keyword.
The general rule has always been that we don't just put hooks in, we
always require an in-core use for those hooks. I was reminded of that
myself recently.
What we need is something in core that actually makes use of this. The
reason for that is not politics, but a simple test of whether the
feature makes sense AND includes all required bells and whistles to be
useful in the real world.
Core doesn't need a LOL parser and I don't think we should commit that.
It doesn't, but it very likely needs something people can use when
they create a new table AM, and we should use the hook in core to
implement the heap* table AM to make sure the thing works at DDL
time.
If we do this, I think it should have CREATE LANGUAGE support, so
that each plugin can be seen as an in-core object and allow security
around which users can execute which language types, allow us to
switch between languages and have default languages for specific
users or databases.
That's a great idea, but I must be missing something important as it
relates to parser hooks. Could you connect those a little more
explicitly?
Best,
David.
* It's not actually a heap in the sense that the term is normally used
in computing. I'd love to find out how it got to have this name and
document same so others aren't also left wondering.
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778
Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
On Wed, Sep 15, 2021 at 4:55 PM, Julien Rouhaud <rjuju123@gmail.com>
wrote:
On Wed, Sep 15, 2021 at 10:14 PM Jim Mlodgenski <jimmy76@gmail.com> wrote:
On Wed, Sep 15, 2021 at 9:25 AM Simon Riggs
<simon.riggs@enterprisedb.com> wrote:The general rule has always been that we don't just put hooks in, we
always require an in-core use for those hooks. I was reminded of that
myself recently.That's not historically what has happened. There are several hooks with
no in core use such as emit_log_hook and ExplainOneQuery_hook. The recent
openssl_tls_init_hook only has a usage in src/test/modulesYes, I also think that it's not a strict requirement that all hooks
have a caller in the core, even if it's obviously better if that's the
case.What we need is something in core that actually makes use of this. The
reason for that is not politics, but a simple test of whether the
feature makes sense AND includes all required bells and whistles to be
useful in the real world.Agreed. There should be something in src/test/modules to exercise this
but probably more to flush things out. Maybe extending adminpack to use
this so if enabled, it can use syntax like:
FILE READ 'foo.txt'For this hook, maintaining a real alternative parser seems like way
too much trouble to justify an in-core user. The fact that many
people have asked for such a feature over the year should be enough to
justify the use case. We could try to invent some artificial need
like the one you suggest for adminpack, but it also feels like a waste
of resources.As far as I'm concerned a naive strcmp-based parser in
src/test/modules should be enough to validate that the hook is
working, there's no need for more. In any case if the only
requirement for it to be committed is to write a real parser, whether
in contrib or src/test/modules, I'll be happy to do it.Core doesn't need a LOL parser and I don't think we should commit that.
+1
I entirely agree, and I repeatedly mentioned in that thread that I did
*not* want to add this parser in core. The only purpose of patches
0002 and 0004 is to make the third-party bison-based parser
requirements less abstract, and demonstrate that this approach can
successfully make two *radically different* parsers cohabit.

If we do this, I think it should have CREATE LANGUAGE support, so that
each plugin can be seen as an in-core object and allow security around
which users can execute which language types, allow us to switch
between languages and have default languages for specific users or
databases.

This hook allows extension developers to supplement syntax in addition
to adding a whole new language, allowing the extension to appear more
native to the end user. For example, pglogical could use this to add
syntax to do a CREATE NODE instead of calling the function create_node.
Adding CREATE LANGUAGE support around this would just be for a narrow
set of use cases where a new language is added.

Yes, this hook can be used to implement multiple things, as I mentioned
in my initial email. Additionally, if this is eventually committed I'd
like to add support for a CREATE HYPOTHETICAL INDEX grammar in hypopg.
Such a parser would only support one command (that extends an existing
one), so it can't really be called a language. Of course it would be
better to have the core parser accept CREATE [ HYPOTHETICAL ] INDEX
and set up a flag so that a third-party module can intercept this
utility command, but until that happens I could provide that syntactic
sugar for my users, as long as I'm motivated enough to write this
parser.
There were nice streaming databases, but they ended because maintaining a fork
is too expensive. And without direct SQL (without the possibility of enhancing
the parser), commands based on a function-call API are not as readable or as
flexible as SQL. Sometimes we really don't want to replace
PostgreSQL, but just to enhance the main interface for extensions.
Also, a hook-based approach is still compatible with per-database /
per-role configuration. It can be done either via a specific
session_preload_libraries, or via a custom GUC if for some reason the
module requires to be in shared_preload_libraries.
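For example (the database, role, and module names here are hypothetical), the per-database and per-role setup could look like this:

```sql
-- Load a hypothetical custom parser module only for sessions connecting
-- to one database or as one role; everyone else keeps the core parser.
ALTER DATABASE migration_db SET session_preload_libraries = 'custom_parser';
ALTER ROLE migration_user SET session_preload_libraries = 'custom_parser';
```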
Jim Mlodgenski <jimmy76@gmail.com> writes:
On Wed, Sep 15, 2021 at 9:25 AM Simon Riggs
<simon.riggs@enterprisedb.com> wrote:

The general rule has always been that we don't just put hooks in, we
always require an in-core use for those hooks. I was reminded of that
myself recently.
That's not historically what has happened. There are several hooks with
no in-core use such as emit_log_hook and ExplainOneQuery_hook.
Yeah. I think the proper expectation is that there be a sufficiently
worked-out example to convince us that the proposed hooks have real-world
usefulness, and are not missing any basic requirements to make them do
something useful. Whether the example ends up in our tree is a
case-by-case decision.
In the case at hand, what's troubling me is that I don't see any
particular use in merely substituting a new bison grammar, if it
still has to produce parse trees that the rest of the system will
understand. Yeah, you could make some very simple surface-syntax
changes that way, but it doesn't seem like you could do anything
interesting (like, say, support Oracle-style outer join syntax).
AFAICS, to get to a useful feature, you'd then need to invent an
extensible Node system (which'd be hugely invasive if it's feasible
at all), and then probably more things on top of that. So I'm not
convinced that you've demonstrated any real-world usefulness.
regards, tom lane
On Wed, Sep 15, 2021 at 11:26 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
In the case at hand, what's troubling me is that I don't see any
particular use in merely substituting a new bison grammar, if it
still has to produce parse trees that the rest of the system will
understand. Yeah, you could make some very simple surface-syntax
changes that way, but it doesn't seem like you could do anything
interesting (like, say, support Oracle-style outer join syntax).
AFAICS, to get to a useful feature, you'd then need to invent an
extensible Node system (which'd be hugely invasive if it's feasible
at all), and then probably more things on top of that. So I'm not
convinced that you've demonstrated any real-world usefulness.
I agree that this patchset can only implement syntactic sugar,
nothing more (although for utility commands you can do a bit more than
that). But that's already something people can use, mostly for
migration-to-Postgres use cases, probably.
I'm not sure why you couldn't implement an Oracle-style outer join
with such a hook? The requirement is that the parser can't leak any
node that the rest of the system doesn't know about, but you can do
what you want inside the parser. And as far as I can see we have
already had extensible nodes since bcac23de73b, so it seems to me that
there's enough infrastructure to handle this kind of use case.

The main downside is that you'll have to make a first pass to
transform your "custom raw statement" into a valid RawStmt in your
parser, and the system will do another one to transform it into a Query.
But apart from that it should work. Am I missing something?
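As a standalone illustration of that contract (none of the type or function names below are the real PostgreSQL API; they only mimic its shape), a custom parser's first pass rewrites its own syntax into something core-compatible, and otherwise hands the string to the core parser unchanged, reusing pglogical's CREATE NODE example from earlier in the thread:

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Toy stand-in for RawStmt; the real one lives in the PostgreSQL
 * sources and looks nothing like this. */
typedef struct
{
    char stmt[256];             /* core-compatible statement text */
    bool from_custom_grammar;   /* did the custom first pass fire? */
} RawStmtSketch;

/* "Core" parser stand-in: accepts the string as-is. */
static void core_parse(const char *query, RawStmtSketch *out)
{
    snprintf(out->stmt, sizeof(out->stmt), "%s", query);
    out->from_custom_grammar = false;
}

/* Custom parser: a first pass rewrites the made-up CREATE NODE command
 * into a core-understood function call, and silently falls back to the
 * core parser for everything else. */
static void custom_parse(const char *query, RawStmtSketch *out)
{
    const char *prefix = "CREATE NODE ";

    if (strncmp(query, prefix, strlen(prefix)) == 0)
    {
        snprintf(out->stmt, sizeof(out->stmt),
                 "SELECT create_node('%s')", query + strlen(prefix));
        out->from_custom_grammar = true;
        return;
    }
    core_parse(query, out);     /* silent fallback to the core grammar */
}
```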
Julien Rouhaud <rjuju123@gmail.com> writes:
I'm not sure why you couldn't implement an Oracle-style outer join
with such a hook?
Try it.
The requirement is that the parser can't leak any
node that the rest of the system doesn't know about, but you can do
what you want inside the parser.
That's not what the patch actually does, though. It only replaces
the grammar, not semantic analysis. So you couldn't associate the
(+)-decorated WHERE clause with the appropriate join. (And no,
I will not accept that it's okay to perform catalog lookups in
the grammar to get around that. See comment at the head of gram.y.)
In general, I'm having a hard time believing that anything very
interesting can be done at only the grammar level without changing
the parse analysis phase. That's not unrelated to the restriction
that the grammar can't do catalog accesses. Maybe with some fundamental
restructuring, we could get around that issue ... but this patch isn't
doing any fundamental restructuring, it's just putting a hook where it's
easy to do so. We've often found that such hooks aren't as useful as
they initially seem.
regards, tom lane
On Thu, Sep 16, 2021 at 12:57 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
The requirement is that the parser can't leak any
node that the rest of the system doesn't know about, but you can do
what you want inside the parser.

That's not what the patch actually does, though. It only replaces
the grammar, not semantic analysis. So you couldn't associate the
(+)-decorated WHERE clause with the appropriate join. (And no,
I will not accept that it's okay to perform catalog lookups in
the grammar to get around that. See comment at the head of gram.y.)
I never said that one should do catalog lookups for that. What I said
is that you can do a specific semantic analysis pass in the hook if
you know that you can have extensible nodes in your parsetree, and you
can do that with this hook unless I'm missing something?

Yes, that's not ideal, but I don't see how it can be worse than writing
some middleware that parses the query and rewrites it to Postgres-style
SQL on the fly so that Postgres can parse it again. I'm also not sure
how the semantic analysis could be made generally extensible, if
that's possible at all, so that's the best I can propose.
If that approach is a deal breaker then fine I can accept it.
On Thu, Sep 16, 2021 at 1:23 AM Julien Rouhaud <rjuju123@gmail.com> wrote:
On Thu, Sep 16, 2021 at 12:57 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
The requirement is that the parser can't leak any
node that the rest of the system doesn't know about, but you can do
what you want inside the parser.

That's not what the patch actually does, though. It only replaces
the grammar, not semantic analysis. So you couldn't associate the
(+)-decorated WHERE clause with the appropriate join. (And no,
I will not accept that it's okay to perform catalog lookups in
the grammar to get around that. See comment at the head of gram.y.)

I never said that one should do catalog lookups for that. What I said
is that you can do a specific semantic analysis pass in the hook if
you know that you can have extensible nodes in your parsetree, and you
can do that with that hook unless I'm missing something?
Ah, now that I think more about it, I think that you're talking about
unqualified fields? I was naively assuming that those wouldn't be
allowed by Oracle, but I guess that's wishful thinking.
Hi,
On 2021-09-15 12:57:00 -0400, Tom Lane wrote:
That's not what the patch actually does, though. It only replaces
the grammar, not semantic analysis. So you couldn't associate the
(+)-decorated WHERE clause with the appropriate join. (And no,
I will not accept that it's okay to perform catalog lookups in
the grammar to get around that. See comment at the head of gram.y.)
In general, I'm having a hard time believing that anything very
interesting can be done at only the grammar level without changing
the parse analysis phase. That's not unrelated to the restriction
that the grammar can't do catalog accesses. Maybe with some fundamental
restructuring, we could get around that issue ... but this patch isn't
doing any fundamental restructuring, it's just putting a hook where it's
easy to do so. We've often found that such hooks aren't as useful as
they initially seem.
Agreed - it doesn't make sense to me to have a hook that only replaces raw
parsing, without also hooking into parse analysis. ISTM that the least a
patchset going for a parser hook would have to do is enough
restructuring so that one could hook into both raw parsing and
analysis. It could still be two callbacks, but perhaps we'd ensure that
they're both set.
Greetings,
Andres Freund
On Wed, Sep 15, 2021 at 3:55 PM Andres Freund <andres@anarazel.de> wrote:
On 2021-09-15 12:57:00 -0400, Tom Lane wrote:
That's not what the patch actually does, though. It only replaces
the grammar, not semantic analysis. So you couldn't associate the
(+)-decorated WHERE clause with the appropriate join. (And no,
I will not accept that it's okay to perform catalog lookups in
the grammar to get around that. See comment at the head of gram.y.)

In general, I'm having a hard time believing that anything very
interesting can be done at only the grammar level without changing
the parse analysis phase. That's not unrelated to the restriction
that the grammar can't do catalog accesses. Maybe with some fundamental
restructuring, we could get around that issue ... but this patch isn't
doing any fundamental restructuring, it's just putting a hook where it's
easy to do so. We've often found that such hooks aren't as useful as
they initially seem.

Agreed - it doesn't make sense to me to have a hook that only replaces raw
parsing, without also hooking into parse analysis. ISTM that the least a
patchset going for a parser hook would have to do is enough
restructuring so that one could hook into both raw parsing and
analysis. It could still be two callbacks, but perhaps we'd ensure that
they're both set.
This is a bad example as it doesn't require semantic analysis from
Postgres. While most of the tools out there tend to do simple replacement,
this can be done within a custom parser by simply walking its own AST,
evaluating join conditions against the expression, and rewriting the join
accordingly. Or, do you have an example that couldn't be done this way
within a custom parser?
--
Jonah H. Harris
Hi,
On 2021-09-15 16:35:53 -0400, Jonah H. Harris wrote:
On Wed, Sep 15, 2021 at 3:55 PM Andres Freund <andres@anarazel.de> wrote:
On 2021-09-15 12:57:00 -0400, Tom Lane wrote:
Agreed - it doesn't make sense to me to have a hook that only replaces raw
parsing, without also hooking into parse analysis. ISTM that the least a
patchset going for a parser hook would have to do is enough
restructuring so that one could hook into both raw parsing and
analysis. It could still be two callbacks, but perhaps we'd ensure that
they're both set.

This is a bad example as it doesn't require semantic analysis from
Postgres.
"it"? I assume you mean a different type of join? If so, I'm highly doubtful -
without semantic analysis you can't really handle column references.
While most of the tools out there tend to do simple replacement,
this can be done within a custom parser by simply walking its own AST,
evaluating join conditions against the expression, and rewriting the join
accordingly. Or, do you have an example that couldn't be done this way
within a custom parser?
You cannot just "evaluate conditions" in a raw parse tree... You don't even
know what things are functions, columns etc, nor to what relation a column
belongs.
Greetings,
Andres Freund
Andres Freund <andres@anarazel.de> writes:
Agreed - it doesn't make sense to me to have a hook that only replaces raw
parsing, without also hooking into parse analysis. ISTM that the least a
patchset going for a parser hook would have to do is enough
restructuring so that one could hook into both raw parsing and
analysis. It could still be two callbacks, but perhaps we'd ensure that
they're both set.
The other problem here is that a simple call-this-instead-of-that
top-level hook doesn't seem all that useful anyway, because it leaves
you with the task of duplicating a huge amount of functionality that
you're then going to make some tweaks within. That's already an issue
when you're just thinking about the grammar, and if you have to buy
into it for parse analysis too, I doubt that it's going to be very
practical. If, say, you'd like to support some weird function that
requires special parsing and analysis rules, I don't see how you get
that out of this without first duplicating a very large fraction of
src/backend/parser/.
(As a comparison point, we do have a top-level hook for replacing
the planner; but I have never heard of anyone actually doing so.
There are people using that hook to *wrap* the planner with some
before-and-after processing, which is quite a different thing.)
I don't have any better ideas to offer :-( ... but I very much fear
that the approach proposed here is a dead end.
regards, tom lane
Hi,
On 2021-09-15 16:51:37 -0400, Tom Lane wrote:
The other problem here is that a simple call-this-instead-of-that
top-level hook doesn't seem all that useful anyway, because it leaves
you with the task of duplicating a huge amount of functionality that
you're then going to make some tweaks within. That's already an issue
when you're just thinking about the grammar, and if you have to buy
into it for parse analysis too, I doubt that it's going to be very
practical. If, say, you'd like to support some weird function that
requires special parsing and analysis rules, I don't see how you get
that out of this without first duplicating a very large fraction of
src/backend/parser/.
We do have a small amount of infrastructure around this - the hackery that
plpgsql uses. That's not going to help you with everything, but I think it
should be enough to recognize e.g. additional top-level
statements. Obviously not enough to intercept parsing deeper into a statement,
but at least something.
And parse analysis for some types of things will be doable with the current
infrastructure, e.g. by handling the new top-level statement in the hook, and
then passing the buck to the normal parse analysis for e.g. the expressions
it contains.
Obviously not going to get you that far...
(As a comparison point, we do have a top-level hook for replacing
the planner; but I have never heard of anyone actually doing so.
There are people using that hook to *wrap* the planner with some
before-and-after processing, which is quite a different thing.)
Citus IIRC has some paths that do not end up calling into the standard
planner, but only for a few simplistic cases.
I don't have any better ideas to offer :-( ... but I very much fear
that the approach proposed here is a dead end.
I unfortunately don't see a good way forward without changing the way we do
parsing on a more fundamental level :(.
Greetings,
Andres Freund
On Thu, Sep 16, 2021 at 5:40 AM Andres Freund <andres@anarazel.de> wrote:
I don't have any better ideas to offer :-( ... but I very much fear
that the approach proposed here is a dead end.

I unfortunately don't see a good way forward without changing the way we do
parsing on a more fundamental level :(.
Using the ExtensibleNode infrastructure, I can see two ways to try to
leverage that.

The first one would be to require modules to wrap their RawStmt->stmt in
an ExtensibleNode if they want to do anything that requires semantic
analysis, and handle such nodes in transformStmt() with another hook.
I think it would allow modules to do pretty much anything, at the cost
of walking the stmt twice and duplicating a possibly huge amount of
analyze.c and friends.

The other one would be to allow the parser to leak ExtensibleNodes in
the middle of the RawStmt and catch them in the transform* functions,
with e.g. some generic transformExtensibleNode(pstate, node,
some_identifier...) (the identifier giving both the general transform
action and some secondary info, like ParseExprKind for expressions).
This would avoid the downsides of the first approach, but would
require calling this new hook in a bunch of places.

Or we could combine both approaches so that the most common use cases,
like transformExprRecurse(), would be easily handled while more exotic
cases would have to go the hard way. Parser authors could still ask
for a new call to this hook to be added to ease their work in the next
major version.
Would any of that be a reasonable approach?
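To make the first approach above a little more concrete, here is a standalone sketch of the dispatch it implies; the structs and names only imitate the ExtensibleNode idea and are not PostgreSQL's actual API:

```c
#include <stdbool.h>
#include <string.h>

/* Toy envelope imitating an ExtensibleNode-wrapped raw statement. */
typedef struct
{
    const char *extnodename;    /* registered name of the custom node */
    const void *payload;        /* the module's private statement data */
} WrappedStmtSketch;

/* Toy result imitating what transformStmt() would produce. */
typedef struct
{
    const char *commandTag;
    const void *payload;
} QuerySketch;

/* A module's transform hook: it only claims envelopes it registered
 * itself, and returns false so the next hook (or core) can handle or
 * reject everything else. */
static bool my_transform_hook(const WrappedStmtSketch *raw, QuerySketch *out)
{
    if (strcmp(raw->extnodename, "MyCustomStmt") != 0)
        return false;
    out->commandTag = "MY CUSTOM STMT";
    out->payload = raw->payload;
    return true;
}
```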
On Thu, 16 Sept 2021 at 05:33, Julien Rouhaud <rjuju123@gmail.com> wrote:
Would any of that be a reasonable approach?
The way I summarize all of the above is that
1) nobody is fundamentally opposed to the idea
2) we just need to find real-world example(s) and show that any
associated in-core patch provides all that is needed in a clean way,
since that point is currently in doubt among senior committers.
So what is needed is some actual prototypes that explore this. I guess
that means they have to be open source, but those examples could be
under a different licence, as long as the in-core patch is clearly a
project submission to PostgreSQL.
I presume a few real-world examples could be:
* Grammar extensions to support additional syntax for Greenplum, Citus, XL
* A grammar that adds commands for an extension, such as pglogical
(Jim's example)
* A strict SQL Standard grammar/parser
* GQL implementation
--
Simon Riggs http://www.EnterpriseDB.com/
On Thu, Sep 23, 2021 at 07:37:27AM +0100, Simon Riggs wrote:
On Thu, 16 Sept 2021 at 05:33, Julien Rouhaud <rjuju123@gmail.com> wrote:
Would any of that be a reasonable approach?
The way I summarize all of the above is that
1) nobody is fundamentally opposed to the idea
2) we just need to find real-world example(s) and show that any
associated in-core patch provides all that is needed in a clean way,
since that point is currently in doubt among senior committers.

So what is needed is some actual prototypes that explore this. I guess
that means they have to be open source, but those examples could be
under a different licence, as long as the in-core patch is clearly a
project submission to PostgreSQL.

I presume a few real-world examples could be:
* Grammar extensions to support additional syntax for Greenplum, Citus, XL
* A grammar that adds commands for an extension, such as pglogical
(Jim's example)
* A strict SQL Standard grammar/parser
* GQL implementation
As I mentioned, there's at least one use case that would work with this
approach and that I will be happy to code in hypopg, which is an open-source
project. As a quick prototype, here's a basic overview of how I can use this
hook to implement a CREATE HYPOTHETICAL INDEX command:
rjuju=# LOAD 'hypopg';
LOAD
rjuju=# create hypothetical index meh on t1(id);
CREATE INDEX
rjuju=# explain select * from t1 where id = 1;
QUERY PLAN
--------------------------------------------------------------------------------
Index Scan using "<13543>btree_t1_id" on t1 (cost=0.04..8.05 rows=1 width=13)
Index Cond: (id = 1)
(2 rows)
rjuju=# \d t1
Table "public.t1"
Column | Type | Collation | Nullable | Default
--------+---------+-----------+----------+---------
id | integer | | |
val | text | | |
My POC's grammar is only like:
CREATE HYPOTHETICAL INDEX opt_index_name ON relation_expr '(' index_params ')'
{
IndexStmt *n = makeNode(IndexStmt);
n->idxname = $4;
n->relation = $6;
n->accessMethod = DEFAULT_INDEX_TYPE;
n->indexParams = $8;
n->options = list_make1(makeDefElem("hypothetical", NULL, -1));
$$ = (Node *) n;
}
as I'm not willing to import the whole CREATE INDEX grammar for now for a patch
that may be rejected. I can, however, publish this POC if that helps. Note
that once my parser returns this parse tree, all I need to do is intercept any
IndexStmt containing this option in a utility hook and run my code rather than
a plain DefineIndex(), which works as intended as I showed above.
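That interception step can be sketched in standalone form; the structs below merely imitate the shape of PostgreSQL's DefElem and IndexStmt, and a real hook would of course use the actual List API and ProcessUtility hook signature:

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Simplified stand-ins for PostgreSQL's DefElem and IndexStmt. */
typedef struct DefElemSketch
{
    const char *defname;
    struct DefElemSketch *next;
} DefElemSketch;

typedef struct
{
    const char *idxname;
    DefElemSketch *options;
} IndexStmtSketch;

/* Returns true when the statement carries the "hypothetical" option the
 * custom grammar attached; in that case a utility hook would run the
 * extension's code instead of the regular DefineIndex() path. */
static bool index_is_hypothetical(const IndexStmtSketch *stmt)
{
    for (const DefElemSketch *opt = stmt->options; opt != NULL; opt = opt->next)
    {
        if (strcmp(opt->defname, "hypothetical") == 0)
            return true;
    }
    return false;
}
```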
One could easily imagine similar usage to extend existing commands, like
implementing a new syntax on top of CREATE TABLE to implement an automatic
partition creation grammar (which would return multiple CreateStmt),
or even a partition manager.
I'm not an expert in other RDBMS syntax, but maybe you could use such a
hook to implement SQL Server or MySQL syntax, which use at least different
quoting rules. Maybe the Amazon people could confirm that, as it looks like
they implemented an SQL Server parser using a similar hook?
So yes you can't create new commands or implement grammars that require
additional semantic analysis with this hook, but I think that there are still
real use cases that can be implemented using only a different parser.
Julien Rouhaud <rjuju123@gmail.com> writes:
My POC's grammar is only like:
CREATE HYPOTHETICAL INDEX opt_index_name ON relation_expr '(' index_params ')'
{
IndexStmt *n = makeNode(IndexStmt);
n->idxname = $4;
n->relation = $6;
n->accessMethod = DEFAULT_INDEX_TYPE;
n->indexParams = $8;
n->options = list_make1(makeDefElem("hypothetical", NULL, -1));
$$ = (Node *) n;
}
I'm not too impressed by this example, because there seems little
reason why you couldn't just define "hypothetical" as an index_param
option, and not need to touch the grammar at all.
as I'm not willing to import the whole CREATE INDEX grammar for now for a patch
that may be rejected.
The fact that that's so daunting seems to me to be a perfect illustration
of the problems with this concept. Doing anything interesting with a
hook like this will create a maintenance nightmare, because you'll have
to duplicate (and track every change in) large swaths of gram.y. To the
extent that you fail to, say, match every detail of the core's expression
grammar, you'll be creating a crappy user experience.
Note that once my parser returns this parse tree, all I need to do is intercept
IndexStmt containing this option in a utility_hook and run my code rather than
a plain DefineIndex(), which works as intended as I showed above.
And I'm even less impressed by the idea of half a dozen extensions
each adding its own overhead to the parser and also to ProcessUtility
so that they can process statements in this klugy, highly-restricted
way.
I do have sympathy for the idea that extensions would like to define
their own statement types. I just don't see a practical way to do it
in our existing parser infrastructure. This patch certainly doesn't
offer that.
regards, tom lane
On Thu, Sep 23, 2021 at 10:21:20AM -0400, Tom Lane wrote:
I do have sympathy for the idea that extensions would like to define
their own statement types. I just don't see a practical way to do it
in our existing parser infrastructure. This patch certainly doesn't
offer that.
Allowing extensions to define their own (utility) statement types is just a
matter of allowing ExtensibleNode as a top-level statement. As far as I can
see the only change required for that is to give those a specific command tag
in CreateCommandTag(), since transformStmt() defaults to emitting a utility
command. You can then easily intercept such statements in the utility hook and
fetch your custom struct.
I could do that, but I'm assuming that you still wouldn't be satisfied, as a
custom parser would still be needed, which may or may not require
copy/pasting chunks of the core grammar.

If so, do you have any suggestion for an approach you would accept?
Hi,
On Fri, Sep 24, 2021 at 02:33:59PM +0800, Julien Rouhaud wrote:
On Thu, Sep 23, 2021 at 10:21:20AM -0400, Tom Lane wrote:
I do have sympathy for the idea that extensions would like to define
their own statement types. I just don't see a practical way to do it
in our existing parser infrastructure. This patch certainly doesn't
offer that.

Allowing extensions to define their own (utility) statement types is just a
matter of allowing ExtensibleNode as a top-level statement. As far as I can
see the only change required for that is to give those a specific command tag
in CreateCommandTag(), since transformStmt() defaults to emitting a utility
command. You can then easily intercept such statements in the utility hook and
fetch your custom struct.

I could do that, but I'm assuming that you still wouldn't be satisfied, as a
custom parser would still be needed, which may or may not require
copy/pasting chunks of the core grammar.

If so, do you have any suggestion for an approach you would accept?
Given the total lack of answers on the various improvements I suggested, I'm
assuming that no one is interested in this feature, so I'm marking it as
Rejected.