BUG #17379: Cannot issue multi-command statements using a replication connection

Started by PG Bug reporting formover 4 years ago3 messagesbugs

noreply@postgresql.org

over 4 years ago

The following bug has been logged on the website:

Bug reference: 17379
Logged by: Greg Rychlewski
Email address: greg.rychlewski@gmail.com
PostgreSQL version: 13.5
Operating system: Mac OS
Description:

When I issue the following multi-command query on a replication connection I
receive a syntax error:

$psql "dbname=postgres replication=database" -c "select 1;select 2;"
ERROR: syntax error

I was curious why this happens, given the documentation at [1]https://www.postgresql.org/docs/current/protocol-replication.html states "In
either physical replication or logical replication walsender mode, only the
simple query protocol can be used."

For comparison, the following query returns the expected result:

$psql "dbname=postgres replication=database" -c "select 1;"
?column?
----------
1

[1]: https://www.postgresql.org/docs/current/protocol-replication.html

Tom Lane

tgl@sss.pgh.pa.us

over 4 years ago

In reply to: PG Bug reporting form (#1)

Re: BUG #17379: Cannot issue multi-command statements using a replication connection

PG Bug reporting form <noreply@postgresql.org> writes:

When I issue the following multi-command query on a replication connection I
receive a syntax error:

$psql "dbname=postgres replication=database" -c "select 1;select 2;"
ERROR: syntax error

As I mentioned on the pgsql-novice thread, I think the proximate cause
of this is that repl_gram.y's make_sqlcmd() tries to skip to the end
of the SQL statement, but for some reason it is coded to stop at a
semicolon. It needs to eat the whole rest of the string,
unconditionally.

It gets worse though. repl_scanner.l is not built to lex everything
the core scanner can (and I don't think we want to require it to).
But this approach to consuming non-replication commands requires it
to be able to do so. It's not very hard to find cases that break it,
for example

$ psql "dbname=postgres replication=database"
psql (15devel)
Type "help" for help.

postgres=# select $x$ " $x$;
ERROR: unterminated quoted string

Of course that happens because repl_scanner.l doesn't know about
dollar-quoting, so it tries to process the ", which it mis-recognizes
as the start of a quoted string. We probably want to shut down the
lexer as soon as we realize it's a non-replication command, instead
of asking it to lex to the end of the string.

Still worse, if you repeat that a few times, you find the behavior
is unstable:

postgres=# select $x$ " $x$;
?column?
----------
"
(1 row)

postgres=# select $x$ " $x$;
ERROR: unterminated quoted string
postgres=# select $x$ " $x$;
?column?
----------
"
(1 row)

postgres=# select $x$ " $x$;
ERROR: unterminated quoted string

I've not traced the reason for that in detail, but I bet it is
because there is static state in repl_scanner.l that doesn't
get cleaned up after elog(ERROR).

Oh, and another thing:

postgres=# /* foo */ select 42;
ERROR: syntax error

Presuming that all SQL statements start with a keyword has
its problems.

This sort of half-baked implementation was probably fine when
the replication protocol was first designed, but if we're going
to claim that clients can issue arbitrary SQL, it needs upgrading.

regards, tom lane

Tom Lane

tgl@sss.pgh.pa.us

over 4 years ago

In reply to: Tom Lane (#2)

Re: BUG #17379: Cannot issue multi-command statements using a replication connection

I wrote:

[ assorted whining about replication-command lexing ]

The business with unstable results turned out to be due to failure
to reset the lexer's start state, so that's a one-line fix that I
already pushed. Attached is the patch I propose to fix the rest
of it.

The core of this is deciding that we cannot try to run repl_scanner.l
over the whole input string when it is not a replication command.
That's just going to leave us chasing a moving target of what it has
to know to lex successfully. If it were designed to never have any
lexer failure conditions, maybe this could be made to work, but that
ship already sailed. Hence, what this does is to lex just the first
token, see if that's one of the replication-command keywords, and if
so push it back so that repl_gram.y will succeed. If not, we just
punt immediately without examining any more of the string. This
gets rid of all of the other failure conditions discussed, and allows
deletion of nearly as much code as it adds. Notably, we don't
need the SQLCmd node type anymore, since repl_gram.y will never be
asked to look at a general SQL command.

Note: I put the switch() recognizing command-starting keywords into
repl_scanner.l. I'd tried to put it in walsender.c, which seemed
like a more natural place, but the keyword token names aren't
currently exported outside repl_gram.y + repl_scanner.l. Moving
them to a header file seems like way more work than is justified.
You'd have to touch repl_scanner.l anyway while adding a new
command keyword, so this arrangement isn't terribly awful.

I also failed to resist the temptation to clean up some poor style
in repl_scanner.l, as well as bad decisions like not having
the same idea of what's whitespace as the core lexer does.

The part about removing the SQLCmd node type can't be back-patched
(since we can't renumber enum NodeTag in stable branches), but
I don't see any reason the rest of this can't be.

regards, tom lane

BUG #17379: Cannot issue multi-command statements using a replication connection

Attachments: