[PROPOSAL] comments in repl_scanner
Hi hackers,
certain instrumentation tools do prefix each statement with an
informational comment, typically to provide some tracing information to
logs (datadog for example). While this works for SQL statements, it's not
possible with logical replication statements because their grammar doesn't
support comments and it is causing unnecessary syntax errors.
I can imagine this limitation is likely a holdover from the system's
evolution from physical replication where comments make no sense. However,
in logical replication walsender mode both SQL and replication statements
can be issued [1], so the current state brings the necessity to distinguish
when to inject the comment and when not to. What do you feel, are there any
unexpected impacts of extending the replication grammar with comments?
I attached a simple patch extending the `replication/repl_scanner.l` with
following test:
```
psql "dbname=postgres replication=database"
Border style is 2.
Line style is unicode.
psql (14.19 (Homebrew), server 19devel)
WARNING: psql major version 14, server major version 19.
Some psql features might not work.
Type "help" for help.
postgres=# -- foo
/* bar
/* blah
*/
*/
IDENTIFY_SYSTEM;
┌─────────────────────┬──────────┬────────────┬──────────┐
│ systemid │ timeline │ xlogpos │ dbname │
├─────────────────────┼──────────┼────────────┼──────────┤
│ 7561021876571120357 │ 1 │ 0/0176B5A8 │ postgres │
└─────────────────────┴──────────┴────────────┴──────────┘
(1 row)
```
What do you feel, is that a good idea and/or are there any downsides I am
missing? Thank you all for the feedback.
Regards,
Matej Klonfar
Attachments:
0001-Allowing-comments-in-replication-statements.patchapplication/octet-stream; name=0001-Allowing-comments-in-replication-statements.patchDownload
From 27d3b98cdf869b413efaadbc8dc3946f7d530bf8 Mon Sep 17 00:00:00 2001
From: Matej Klonfar <matej.klonfar@gmail.com>
Date: Tue, 14 Oct 2025 12:47:36 +0200
Subject: [PATCH] Allowing comments in replication statements
---
src/backend/replication/repl_scanner.l | 71 +++++++++++++++++++++++++-
1 file changed, 70 insertions(+), 1 deletion(-)
diff --git a/src/backend/replication/repl_scanner.l b/src/backend/replication/repl_scanner.l
index b6930e28659..035a82c2e4e 100644
--- a/src/backend/replication/repl_scanner.l
+++ b/src/backend/replication/repl_scanner.l
@@ -45,6 +45,9 @@ struct replication_yy_extra_type
/* Work area for collecting literals */
StringInfoData litbuf;
+
+ /* Depth of nesting in slash-star comments */
+ int xcdepth;
};
static void startlit(yyscan_t yyscanner);
@@ -75,15 +78,44 @@ static void addlitchar(unsigned char ychar, yyscan_t yyscanner);
* Exclusive states:
* <xd> delimited identifiers (double-quoted identifiers)
* <xq> standard single-quoted strings
+ * <xc> extended C-style comments
*/
%x xd
%x xq
+%x xc
space [ \t\n\r\f\v]
+op_chars [\~\!\@\#\^\&\|\`\?\+\-\*\/\%\<\>\=]
+non_newline [^\n\r]
+comment ("--"{non_newline}*)
+whitespace ({space}+|{comment})
quote '
quotestop {quote}
+/* C-style comments
+ *
+ * The "extended comment" syntax closely resembles allowable operator syntax.
+ * The tricky part here is to get lex to recognize a string starting with
+ * slash-star as a comment, when interpreting it as an operator would produce
+ * a longer match --- remember lex will prefer a longer match! Also, if we
+ * have something like plus-slash-star, lex will think this is a 3-character
+ * operator whereas we want to see it as a + operator and a comment start.
+ * The solution is two-fold:
+ * 1. append {op_chars}* to xcstart so that it matches as much text as
+ * {operator} would. Then the tie-breaker (first matching rule of same
+ * length) ensures xcstart wins. We put back the extra stuff with yyless()
+ * in case it contains a star-slash that should terminate the comment.
+ * 2. In the operator rule, check for slash-star within the operator, and
+ * if found throw it back with yyless(). This handles the plus-slash-star
+ * problem.
+ * Dash-dash comments have similar interactions with the operator rule.
+ */
+xcstart \/\*{op_chars}*
+xcstop \*+\/
+xcinside [^*/]+
+
+
/* Extended quote
* xqdouble implements embedded quote, ''''
*/
@@ -145,7 +177,44 @@ USE_SNAPSHOT { return K_USE_SNAPSHOT; }
WAIT { return K_WAIT; }
UPLOAD_MANIFEST { return K_UPLOAD_MANIFEST; }
-{space}+ { /* do nothing */ }
+{whitespace} { /* do nothing */ }
+{xcstart} {
+ yyextra->xcdepth = 0;
+ BEGIN(xc);
+ /* Put back any characters past slash-star; see above */
+ yyless(2);
+ }
+
+<xc>{
+{xcstart} {
+ (yyextra->xcdepth)++;
+ /* Put back any characters past slash-star; see above */
+ yyless(2);
+ }
+
+{xcstop} {
+ if (yyextra->xcdepth <= 0)
+ BEGIN(INITIAL);
+ else
+ (yyextra->xcdepth)--;
+ }
+
+{xcinside} {
+ /* ignore */
+ }
+{op_chars} {
+ /* ignore */
+ }
+
+\*+ {
+ /* ignore */
+ }
+
+<<EOF>> {
+ replication_yyerror(NULL, yyscanner, "unterminated /* comment");
+ }
+} /* <xc> */
+
{digit}+ {
yylval->uintval = strtoul(yytext, NULL, 10);
--
2.51.0
On Tue, Oct 14, 2025 at 01:13:38PM +0200, Matěj Klonfar wrote:
I can imagine this limitation is likely a holdover from the system's
evolution from physical replication where comments make no sense. However,
in logical replication walsender mode both SQL and replication statements
can be issued [1], so the current state brings the necessity to distinguish
when to inject the comment and when not to. What do you feel, are there any
unexpected impacts of extending the replication grammar with comments?I attached a simple patch extending the `replication/repl_scanner.l` with
following test:What do you feel, is that a good idea and/or are there any downsides I am
missing? Thank you all for the feedback.
A downside here is the extra maintenance that this creates. I cannot
get much excited about the support of comments in the context of
replication commands, TBH. That's just one opinion, of course, others
may have a different view. Note that even if one uses
"replication=database", the code falls back to the main query parser,
where comments work.
FWIW, this is a bit like a past proposal with making the replication
commands case-insensitive, back in 2017:
/messages/by-id/CAB7nPqSg2ZECE+ctGXieiCpVFibLyWgwAGaoEP3-zwYqJwaP-g@mail.gmail.com
The argument for rejection back then is that these commands are not
for manual consumption, so we should keep the replication grammar file
simpler. Perhaps there could be an argument with allowing commands
when it comes to the logging of replication commands in the server
logs, where comments can be used as a reference. That's a very narrow
case, so I'd still argue that this is not enough to balance in favor
of this proposal. Just my 2c.
--
Michael
Hi Michael,
Thanks for your notes.
čt 16. 10. 2025 v 6:08 odesílatel Michael Paquier <michael@paquier.xyz>
napsal:
On Tue, Oct 14, 2025 at 01:13:38PM +0200, Matěj Klonfar wrote:
I can imagine this limitation is likely a holdover from the system's
evolution from physical replication where comments make no sense.However,
in logical replication walsender mode both SQL and replication statements
can be issued [1], so the current state brings the necessity todistinguish
when to inject the comment and when not to. What do you feel, are there
any
unexpected impacts of extending the replication grammar with comments?
I attached a simple patch extending the `replication/repl_scanner.l` with
following test:What do you feel, is that a good idea and/or are there any downsides I am
missing? Thank you all for the feedback.A downside here is the extra maintenance that this creates. I cannot
get much excited about the support of comments in the context of
replication commands, TBH.
Makes sense, on the other hand, the most of the work was done
in e4a8fb8fefb9 where `yyextra` was introduced to handle the context
information (necessary for nested `/* */` comments). This proposal is then,
in my opinion, just a small enhancement built on top of that with no
significant impact to the maintenance costs.
That's just one opinion, of course, others
may have a different view. Note that even if one uses
"replication=database", the code falls back to the main query parser,
where comments work.
I do not agree. Tried now on the current master HEAD (commit 41c674d2):
```
psql "dbname=postgres replication=database"
Border style is 2.
Line style is unicode.
psql (14.19 (Homebrew), server 19devel)
WARNING: psql major version 14, server major version 19.
Some psql features might not work.
Type "help" for help.
postgres=# /* foo */ IDENTIFY_SYSTEM;
2025-10-16 09:49:59.152 CEST [49754] ERROR: syntax error at or near
"IDENTIFY_SYSTEM" at character 11
2025-10-16 09:49:59.152 CEST [49754] STATEMENT: /* foo */ IDENTIFY_SYSTEM;
ERROR: syntax error at or near "IDENTIFY_SYSTEM"
LINE 1: /* foo */ IDENTIFY_SYSTEM;
```
Matej
Show quoted text
FWIW, this is a bit like a past proposal with making the replication
commands case-insensitive, back in 2017:/messages/by-id/CAB7nPqSg2ZECE+ctGXieiCpVFibLyWgwAGaoEP3-zwYqJwaP-g@mail.gmail.com
The argument for rejection back then is that these commands are not
for manual consumption, so we should keep the replication grammar file
simpler. Perhaps there could be an argument with allowing commands
when it comes to the logging of replication commands in the server
logs, where comments can be used as a reference. That's a very narrow
case, so I'd still argue that this is not enough to balance in favor
of this proposal. Just my 2c.
--
Michael
On 14.10.25 13:13, Matěj Klonfar wrote:
certain instrumentation tools do prefix each statement with an
informational comment, typically to provide some tracing information to
logs (datadog for example). While this works for SQL statements, it's
not possible with logical replication statements because their grammar
doesn't support comments and it is causing unnecessary syntax errors.I can imagine this limitation is likely a holdover from the system's
evolution from physical replication where comments make no sense.
However, in logical replication walsender mode both SQL and replication
statements can be issued [1], so the current state brings the necessity
to distinguish when to inject the comment and when not to. What do you
feel, are there any unexpected impacts of extending the replication
grammar with comments?
Another approach could be to get rid of repl_scanner.l and use the main
scanner. This would be similar to how plpgsql works.