Add Pipelining support in psql

Started by Anthonin Bonnefoy · over 1 year ago · 55 messages · pgsql-hackers
#1 Anthonin Bonnefoy
anthonin.bonnefoy@datadoghq.com

Hi,

With \bind, \parse, \bind_named and \close, it is possible to issue
queries from psql using the extended protocol. However, it wasn't
possible to send those queries using pipelining, and the only way to
test pipelined queries was through pgbench's TAP tests.

The attached patch adds pipelining support to psql with 3 new
meta-commands, mirroring what's already done in pgbench:
- \startpipeline starts a new pipeline. All extended queries will be
queued until the end of the pipeline is reached.
- \endpipeline ends an ongoing pipeline. All queued commands will be
sent to the server and all responses will be processed by psql.
- \syncpipeline queues a synchronisation point without flushing the
commands to the server.

Those meta-commands will allow testing pipelined query behaviour using
psql regression tests.
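As a sketch, a session exercising the three meta-commands could look
like this (hypothetical transcript; prompts and result output elided):

```
\startpipeline
SELECT $1 \bind 'val1' \g
\syncpipeline
SELECT $1 \bind 'val2' \g
\endpipeline
```

Both queries are queued client-side; the results are only displayed
once \endpipeline sends everything and processes the responses.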

Regards,
Anthonin

Attachments:

v01-0001-Add-pipelining-support-in-psql.patch (application/octet-stream, +785/-5)
#2 Kirill Reshke
reshkekirill@gmail.com
In reply to: Anthonin Bonnefoy (#1)
Re: Add Pipelining support in psql

On Wed, 27 Nov 2024 at 14:50, Anthonin Bonnefoy
<anthonin.bonnefoy@datadoghq.com> wrote:

Hi,

With \bind, \parse, \bind_named and \close, it is possible to issue
queries from psql using the extended protocol. However, it wasn't
possible to send those queries using pipelining and the only way to
test pipelined queries was through pgbench's tap tests.

Hello, good concept. Our connection pooler testing will greatly
benefit from this feature. At the moment, our tests are Go-based; they
build pipelines using the extended protocol and verify the outcome.
Regression tests based on SQL will be far more thorough and organic.

The attached patch adds pipelining support to psql with 3 new
meta-commands, mirroring what's already done in pgbench:
- \startpipeline starts a new pipeline. All extended queries will be
queued until the end of the pipeline is reached.
- \endpipeline ends an ongoing pipeline. All queued commands will be
sent to the server and all responses will be processed by psql.
- \syncpipeline queues a synchronisation point without flushing the
commands to the server.

I'm very doubtful about \syncpipeline. Maybe we should instead support
a \sync meta-command in psql? That would be a useful contribution in
itself.

Those meta-commands will allow testing pipelined query behaviour using
psql regression tests.

Regards,
Anthonin

I haven't looked into the patch in detail yet.

--
Best regards,
Kirill Reshke

#3 Kirill Reshke
reshkekirill@gmail.com
In reply to: Anthonin Bonnefoy (#1)
Re: Add Pipelining support in psql

On Wed, 27 Nov 2024 at 14:50, Anthonin Bonnefoy
<anthonin.bonnefoy@datadoghq.com> wrote:

Hi,

With \bind, \parse, \bind_named and \close, it is possible to issue
queries from psql using the extended protocol. However, it wasn't
possible to send those queries using pipelining and the only way to
test pipelined queries was through pgbench's tap tests.

The attached patch adds pipelining support to psql with 3 new
meta-commands, mirroring what's already done in pgbench:
- \startpipeline starts a new pipeline. All extended queries will be
queued until the end of the pipeline is reached.
- \endpipeline ends an ongoing pipeline. All queued commands will be
sent to the server and all responses will be processed by psql.
- \syncpipeline queues a synchronisation point without flushing the
commands to the server.

Those meta-commands will allow testing pipelined query behaviour using
psql regression tests.

Regards,
Anthonin

Hi! I spotted this:

```
db1=# \startpipeline
db1=# begin \parse p1
db1=*#
```
Notice the asterisk that appeared after the Parse message. This
typically indicates we are in a transaction block. However, this is
untrue before the bind+execute messages for p1 are sent (\bind_named
meta-command). Am I correct?

--
Best regards,
Kirill Reshke

#4 Jelte Fennema-Nio
postgres@jeltef.nl
In reply to: Anthonin Bonnefoy (#1)
Re: Add Pipelining support in psql

On Wed, 27 Nov 2024 at 10:50, Anthonin Bonnefoy
<anthonin.bonnefoy@datadoghq.com> wrote:

With \bind, \parse, \bind_named and \close, it is possible to issue
queries from psql using the extended protocol. However, it wasn't
possible to send those queries using pipelining and the only way to
test pipelined queries was through pgbench's tap tests.

Big +1. Not being able to use psql for even the most basic pipeline
tests has definitely been an annoyance of mine.

I played around quickly with this patch and it works quite well. A few
things would be nice improvements, I think. Feel free to change the
command names:
1. Add a \flush command that calls PQflush
2. Add a \flushrequest command that calls PQsendFlushRequest
3. Add a \getresult command so you can get a result from a query
without having to close the pipeline

To be clear, not having those additional commands isn't a blocker for
this patch imho, but I'd definitely miss them if they weren't there
when I would be using it.
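A hypothetical session with those commands (names and behaviour as
suggested above, not implemented at this point) might be:

```
\startpipeline
SELECT 1 \bind \g
\flushrequest
\getresult
SELECT 2 \bind \g
\endpipeline
```

Here \flushrequest would force the server to send the first result
back even though no sync has been issued, and \getresult would read
that single result without closing the pipeline.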

#5 Jelte Fennema-Nio
postgres@jeltef.nl
In reply to: Kirill Reshke (#3)
Re: Add Pipelining support in psql

On Wed, 27 Nov 2024 at 13:05, Kirill Reshke <reshkekirill@gmail.com> wrote:

```
db1=# \startpipeline
db1=# begin \parse p1
db1=*#
```
Notice the asterisk that appeared after the Parse message. This
typically indicates we are in a transaction block. However, this is
untrue before the bind+execute messages for p1 are sent (\bind_named
meta-command). Am I correct?

This behaviour is expected; it also happens if you send "SELECT 1"
instead of "begin" in the Parse message:
db1=# \startpipeline
db1=# SELECT 1 \parse p1
db1=*#

The reason this happens is that the first command in a pipeline sent
to the server opens an implicit transaction. See the "implicit COMMIT"
wording here[1], or look at this code in exec_parse_message:

/*
* Start up a transaction command so we can run parse analysis etc. (Note
* that this will normally change current memory context.) Nothing happens
* if we are already in one. This also arms the statement timeout if
* necessary.
*/
start_xact_command();

[1]: https://www.postgresql.org/docs/current/protocol-flow.html#PROTOCOL-FLOW-PIPELINING

#6 Michael Paquier
michael@paquier.xyz
In reply to: Jelte Fennema-Nio (#4)
Re: Add Pipelining support in psql

On Wed, Nov 27, 2024 at 01:45:32PM +0100, Jelte Fennema-Nio wrote:

I played around quickly with this patch and it works quite well. A few
things would be nice improvements, I think. Feel free to change the
command names:
1. Add a \flush command that calls PQflush
2. Add a \flushrequest command that calls PQsendFlushRequest
3. Add a \getresult command so you can get a result from a query
without having to close the pipeline

To be clear, not having those additional commands isn't a blocker for
this patch imho, but I'd definitely miss them if they weren't there
when I would be using it.

Hmm. The start, end and sync meta-commands are useful for testing. I
find the flush control a bit less interesting, TBH.

What would you use these for? Perhaps these would be better applied to
pgbench rather than psql, where it is possible to pass down custom SQL
sequences with input files?
--
Michael

#7 Jelte Fennema-Nio
postgres@jeltef.nl
In reply to: Michael Paquier (#6)
Re: Add Pipelining support in psql

On Thu, 28 Nov 2024 at 07:43, Michael Paquier <michael@paquier.xyz> wrote:

Hmm. The start, end and sync meta-commands are useful for testing. I
find the flush control a bit less interesting, TBH.

What would you use these for?

I guess mostly for interactively playing around with pipelining from psql.

But I think \getresult would be useful for testing too. This would
allow us to test that we can read part of the pipeline, without
sending a sync and waiting for everything.

To be clear \flushrequest and \flush would be necessary to make
\getresult work reliably.

#8 Anthonin Bonnefoy
anthonin.bonnefoy@datadoghq.com
In reply to: Jelte Fennema-Nio (#7)
Re: Add Pipelining support in psql

On Wed, Nov 27, 2024 at 11:46 AM Kirill Reshke <reshkekirill@gmail.com> wrote:

I'm very doubtful about \syncpipeline. Maybe we should instead support
a \sync meta-command in psql? That would be a useful contribution in
itself.

\syncpipeline is useful to cover regression tests involving implicit
transaction blocks within a pipeline, where a sync acts as an implicit
COMMIT. What would be the use of sending a \sync outside of a
pipeline? All extended query meta-commands sent by psql already send a
sync if you're not in a pipeline, so the only use of a \sync
meta-command would be to send a single sync (which can be achieved
with a \startpipeline followed by an \endpipeline).

On Wed, Nov 27, 2024 at 1:54 PM Jelte Fennema-Nio <postgres@jeltef.nl> wrote:

This behaviour is expected, it also happens if you send "SELECT 1"
instead of "begin" in the parse message:
db1=# \startpipeline
db1=# SELECT 1 \parse p1
db1=*#

The reason this happens is that the first command in a pipeline sent
to the server opens an implicit transaction. See the "implicit COMMIT"
wording here[1], or look at this code in exec_parse_message:

I don't think that's the case here. Nothing was sent to the server, so
there's no active transaction yet. From prompt.c, psql will show a '*'
when PQtransactionStatus is either PQTRANS_ACTIVE or PQTRANS_INTRANS.

Looking at the state of the PGConn, after a \startpipeline, we have:
db->xactStatus: (PGTransactionStatusType) PQTRANS_IDLE
db->asyncStatus: (PGAsyncStatusType) PGASYNC_IDLE
db->pipelineStatus: (PGpipelineStatus) PQ_PIPELINE_ON

After the first command is sent to the pipeline:
db->asyncStatus: (PGAsyncStatusType) PGASYNC_BUSY
db->xactStatus: (PGTransactionStatusType) PQTRANS_IDLE
db->pipelineStatus: (PGpipelineStatus) PQ_PIPELINE_ON

The xactStatus is idle; however, since asyncStatus reports busy, we
fall into the "if (conn->asyncStatus != PGASYNC_IDLE) return
PQTRANS_ACTIVE;" branch and psql displays '*' in the prompt. This
might be a bit misleading since there's no ongoing transaction block
on the server, so maybe it's worth having a new state? I've updated
the patch to display a tentative '|' when there's an ongoing pipeline.
On the other hand, a pipeline will start an implicit transaction block
even without BEGIN, so leaving the '*' may be a good way to reflect
that?

On Wed, Nov 27, 2024 at 1:45 PM Jelte Fennema-Nio <postgres@jeltef.nl> wrote:

I played around quickly with this patch and it works quite well. A few
things that would be nice improvements I think. Feel free to change
the command names:
1. Add a \flush command that calls PQflush
2. Add a \flushrequest command that calls PQsendFlushRequest
3. Add a \getresult command so you can get a result from a query
without having to close the pipeline

To be clear, not having those additional commands isn't a blocker for
this patch imho, but I'd definitely miss them if they weren't there
when I would be using it.

I will need a bit of time to check those and how they can be included
in the regression tests.

Attachments:

v02-0001-Add-pipelining-support-in-psql.patch (application/octet-stream, +799/-10)
#9 Anthonin Bonnefoy
anthonin.bonnefoy@datadoghq.com
In reply to: Anthonin Bonnefoy (#8)
Re: Add Pipelining support in psql

An updated version of the patch, still a bit rough around the edges.

I've added \flushrequest, \flush and \getresults meta-commands.
\flushrequest and \getresults look interesting as they add an
additional protocol message to test, but they also introduce the
possibility of being in the middle of an aborted pipeline, which adds
some complexity to the patch. I'm a bit skeptical of \flush as
\getresults already flushes automatically through PQgetResult, and I
haven't thought of interesting tests that could rely on \flush.

I've added a new prompt interpolation instead of trying to hijack the
existing %x: %P reports the pipeline status as either
'|num_syncs,num_queries,pending_results|' for an ongoing pipeline or
'+num_syncs,num_queries,pending_results+' for an aborted one. The
format is definitely open to debate, but I found this information
useful while debugging.

With the new \flushrequest and \getresults, it's visible that %x
doesn't reflect the transaction status when pipelines are involved. If
I do:

postgres=# \startpipeline
postgres=|0,0,0|# BEGIN \bind \g
postgres=|0,1,0|*# \flushrequest
postgres=|0,1,1|*# \getresults
BEGIN
postgres=|0,0,0|#

There's an ongoing transaction block running on the server, but %x
will be empty as there are no queued commands and the server results
are consumed. I'm tempted to change %x to not display anything within
a pipeline.

Calling PQgetResult while a command is piped and not followed by a
flush request or a sync would block since the asyncStatus is busy. So
\getresults has to keep track of the number of pending results, to
avoid calling PQgetResult when there are none. This covers cases like:

postgres=# \startpipeline
postgres=|0,0,0|# select 1 \bind \g
postgres=|0,1,0|*# \getresults
No pending results to get
postgres=|0,1,0|*#

There are some interesting behaviours with pipeline + COPY. Running
the following:

\startpipeline
SELECT $1 \bind 'val1' \g
COPY psql_pipeline FROM STDIN \bind \g
\flushrequest
\getresults
3 test3
\.
\endpipeline

will complain with "message type 0x5a arrived from server while idle"
(but still work). It's also possible to trigger a protocol error with
a query error after a piped COPY:

truncate pgbench_accounts;
\startpipeline
copy pgbench_accounts FROM stdin \bind \g
SELECT 1 \bind \g
\endpipeline
1 2 3 asd
\.

The COPY will fail with "ERROR: unexpected message type 0x50 during
COPY from stdin" and the connection will be terminated.

Attachments:

v03-0001-Add-pipelining-support-in-psql.patch (application/octet-stream, +1311/-16)
#10 Anthonin Bonnefoy
anthonin.bonnefoy@datadoghq.com
In reply to: Anthonin Bonnefoy (#9)
Re: Add Pipelining support in psql

An improved version with simplifications and refinements.

num_queries (the 2nd element in the pipeline status prompt) now tracks
queued queries that were not yet flushed (with a flush request or
sync) to the server. It used to count both unflushed and flushed
queries.

The code in ExecQueryAndProcessResults should be simpler now.
- The DiscardAbortedPipelineResults function handles both discarding
results until a synchronisation point is reached and discarding
results until there are no more pending results.
- The logic to process the pipeline's results and fetch the next
results fits better with the existing flow.
- Tests didn't cover chunked results, so I've added additional tests
covering pipelining + FETCH_COUNT.

Attachments:

v04-0001-Add-pipelining-support-in-psql.patch (application/octet-stream, +1330/-11)
#11 Jelte Fennema-Nio
postgres@jeltef.nl
In reply to: Anthonin Bonnefoy (#10)
Re: Add Pipelining support in psql

On Tue, 10 Dec 2024 at 11:44, Anthonin Bonnefoy
<anthonin.bonnefoy@datadoghq.com> wrote:

num_queries (2nd element in the pipeline status prompt) is now used to
track queued queries that were not flushed (with a flush request or
sync) to the server. It used to count both unflushed queries and
flushed queries.

I skimmed the code a bit, but haven't looked closely at it yet. I did
try the patch out though. My thoughts below:

I think that new prompt is super useful, so useful in fact that I'd
suggest linking to it from the \startpipeline docs. I do think that
the wording in the docs could be a bit more precise:
1. The columns are not necessarily queries; they are messages or
commands, i.e. \parse and \bind_named each count as 1, even though
they form one query together.
2. Messages not followed by \sync or \flushrequest could very well
already be sent to the server (if the client buffer got full, or in
case of a manual \flush). The main thing \sync and \flushrequest do is
make sure the server actually sends its results back, even if its
buffer is not full yet.

The main feedback I have after playing around with this version is
that I'd like to have a \getresult (without the s), to only get a
single result. So that you can get results one by one, possibly
interleaved with some other queries again.

One thing I'm wondering is how useful the num_syncs count is in the
pipeline prompt, since you never really wait for a sync.

Regarding the usefulness of \flush. I agree it's not as useful as I
thought, because indeed \getresults already flushes everything. But
it's not completely useless either. The main way I was able to use it
interactively in a somewhat interesting way was to send it after a
long running query, and then while that's processing type up the next
query after it. Something like the following:

localhost jelte@postgres:5432-26274=
#> \startpipeline
Time: 0.000 ms
localhost jelte@postgres:5432-26274=
#|0,0,0|> select pg_sleep(5) \bind \g
Time: 0.000 ms
localhost jelte@postgres:5432-26274=
#|0,1,0|*> \flush
Time: 0.000 ms
localhost jelte@postgres:5432-26274=
#|0,1,0|*> select 1 \bind \g
Time: 0.000 ms
localhost jelte@postgres:5432-26274=
#|0,2,0|*> \syncpipeline
Time: 0.000 ms
localhost jelte@postgres:5432-26274=
#|1,0,2|*> \getresults
pg_sleep
──────────

(1 row)

?column?
──────────
1
(1 row)

Time: 0.348 ms

#12 Anthonin Bonnefoy
anthonin.bonnefoy@datadoghq.com
In reply to: Jelte Fennema-Nio (#11)
Re: Add Pipelining support in psql

Thanks for the review!

On Thu, Dec 12, 2024 at 12:53 AM Jelte Fennema-Nio <postgres@jeltef.nl> wrote:

I think that new prompt is super useful, so useful in fact that I'd
suggest linking to it from the \startpipeline docs.

Good point, I've added a paragraph with the link to the %P prompt.

I do think that
the wording in the docs could be a bit more precise:
1. The columns are not necessarily queries, they are messages or
commands. i.e. \parse and \bind_named both count as 1, even though
they form one query together.

Yeah, I wasn't super happy with num_queries and the wording. I've
renamed the prompt columns to piped_syncs, piped_commands and
pending_results, and added more precision on what counts as a command.

2. messages not followed by \sync and \flushrequest, could very well
already "be sent to the server" (if the client buffer got full, or in
case of manual \flush). The main thing that \sync and \flushrequest do
is make sure that the server actually sends its own result back, even
if its buffer is not full yet.

I had the wrong assumptions about what was happening. I've changed the
definition of pending_results to: "pending_results is the number of
commands that were followed by either a \flushrequest or a
\syncpipeline, forcing the server to send the results that can be
retrieved with \getresults."

The main feedback I have after playing around with this version is
that I'd like to have a \getresult (without the s), to only get a
single result. So that you can get results one by one, possibly
interleaved with some other queries again.

\getresults was easier to implement since it was more or less the
current behaviour :). I didn't want to add a new meta-command just for
that (I feel like I'm already adding a lot of them), so I've added a
num_results parameter to \getresults. You should be able to use
\getresults 1 to get a single result. A sync response counts as a
result.
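For instance, consuming the pipeline one result at a time could look
like this (hypothetical transcript based on the behaviour described;
result output elided):

```
\startpipeline
SELECT 1 \bind \g
SELECT 2 \bind \g
\syncpipeline
\getresults 1
\getresults
\endpipeline
```

The first \getresults reads only the result of SELECT 1; the second
drains the rest, the sync response counting as one of the results.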

One thing I'm wondering is how useful the num_syncs count is in the
pipeline prompt, since you never really wait for a sync.

With the addition of the num_results parameter for \getresults,
knowing the number of syncs becomes more useful when results are
consumed one by one.

Regarding the usefulness of \flush. I agree it's not as useful as I
thought, because indeed \getresults already flushes everything. But
it's not completely useless either. The main way I was able to use it
interactively in a somewhat interesting way was to send it after a
long running query, and then while that's processing type up the next
query after it.

I feel there's a large overlap between \flush and \flushrequest. On
the other hand, if people want to test the behaviour of pushing data
with and without a flush request message, then \flush can be useful.

Attachments:

v05-0001-Add-pipelining-support-in-psql.patch (application/octet-stream, +1576/-11)
#13 Anthonin Bonnefoy
anthonin.bonnefoy@datadoghq.com
In reply to: Anthonin Bonnefoy (#12)
Re: Add Pipelining support in psql

I feel there's a large overlap between \flush and \flushrequest. On
the other hand, if people want to test the behaviour of pushing data
with and without a flush request message, then \flush can be useful.

I've been looking into some issues related to a backend process stuck
in the ClientWrite state blocking WAL replay, and it turns out that
pipelining + \flush was extremely useful to reproduce the issue. It
makes it possible to saturate the server-to-client socket buffer since
psql doesn't consume the results, easily triggering the state where
the process is stuck in ClientWrite. So I'm amending my position on
the usefulness of \flush; I definitely see interesting usages.

I've also found out that I didn't correctly manage connection reset.
I've fixed this in v6 by resetting the pipeline counters on connection
reset and only calling discardAbortedPipelineResults if we're inside a
pipeline.

Attachments:

v06-0001-Add-pipelining-support-in-psql.patch (application/octet-stream, +1594/-11)
#14 Anthonin Bonnefoy
anthonin.bonnefoy@datadoghq.com
In reply to: Anthonin Bonnefoy (#13)
Re: Add Pipelining support in psql

During my tests, I've noticed I didn't handle query cancellation, this
is now fixed. I've also added additional comments related to
available_results to make it clearer that it depends on what the
server has flushed to the client.

Attachments:

v07-0001-Add-pipelining-support-in-psql.patch (application/octet-stream, +1619/-13)
#15 Michael Paquier
michael@paquier.xyz
In reply to: Anthonin Bonnefoy (#14)
Re: Add Pipelining support in psql

On Tue, Jan 14, 2025 at 09:49:02AM +0100, Anthonin Bonnefoy wrote:

During my tests, I've noticed I didn't handle query cancellation, this
is now fixed. I've also added additional comments related to
available_results to make it clearer that it depends on what the
server has flushed to the client.

This is a pretty cool patch. I like the structure you have used for
the integration: tracking the number of commands, the number of syncs
(like pgbench) put in a pipeline, the number of results requested and
the number of results available. That makes the whole thing easier to
follow with a state in pset.

+	PSQL_SEND_PIPELINE_SYNC,
+	PSQL_START_PIPELINE_MODE,
+	PSQL_END_PIPELINE_MODE,
+	PSQL_FLUSH,
+	PSQL_SEND_FLUSH_REQUEST,
+	PSQL_GET_RESULTS,

These new values are inconsistent, let's use some more PSQL_SEND_*
here. That makes the whole set of values more greppable.

The tests in psql.sql are becoming really long. Perhaps it would be
better to split that into its own file, say psql_pipeline.sql? The
input file is already 2k lines, you are adding 15% more lines to that.

+ * otherwise, calling PQgetResults will block.

Likely PQgetResults => PQgetResult().

Wondering if the cancellation in ExecQueryAndProcessResults() is
sound, I've not been able to break it, still..

I can also get behind the additions of \flush and \flushrequest to
query different parts of libpq.

+	if (pset.requested_results == 0)
+		/* We've read all requested results, exit */
+		return res;

Set of nits with the style of the code, but I'd suggest to use
braces {} here, to outline that the comment is in a block. There's a
second, larger one in discardAbortedPipelineResults().

+	if (strcmp(cmd, "gx") == 0 && PQpipelineStatus(pset.db) != PQ_PIPELINE_OFF)
+	{
+		pg_log_error("\\gx not allowed in pipeline mode");
+		clean_extended_state();
+		return PSQL_CMD_ERROR;
+	}

What is the reasoning here behind this restriction? \gx is a wrapper
of \g with expanded mode on, but it is also possible to call \g with
expanded=on, bypassing this restriction.

Let's split the prompt patch with the support of %P into its own
patch.

-#define DEFAULT_PROMPT1 "%/%R%x%# "
-#define DEFAULT_PROMPT2 "%/%R%x%# "
+#define DEFAULT_PROMPT1 "%/%R%P%x%# "
+#define DEFAULT_PROMPT2 "%/%R%P%x%# "
 #define DEFAULT_PROMPT3 ">> "

I don't think that changing this default is a good idea. Everybody
can do that in their own .psqlrc (spoiler: I do).

The idea to use three fields with a hardcoded format does not look
like a good idea to me. I think that this should be done in a
different and more extensible way:
- Use %P to track if we are in pipeline mode on, off or abort.
- Define three new variables that behave like ROW_COUNT, but for the
fields you want to track here. These could then be passed down to a
PROMPT with %:name:, grammar already supported.

That would make the whole much more flexible. At it seems to me that
we could also add requested_results to this set? These could be named
with the same prefix, like PIPELINE_SYNC_COUNT, etc.
--
Michael

#16 Anthonin Bonnefoy
anthonin.bonnefoy@datadoghq.com
In reply to: Michael Paquier (#15)
Re: Add Pipelining support in psql

On Tue, Feb 18, 2025 at 8:23 AM Michael Paquier <michael@paquier.xyz> wrote:

This is a pretty cool patch. I like the structure you have used for
the integration with the tracking of the number of commands, the
number of syncs (like pgbench) put in a pipeline, the number of
results requested and the number of results available. That makes the
whole easier to look at with a state in pset.

Thanks!

+       PSQL_SEND_PIPELINE_SYNC,
+       PSQL_START_PIPELINE_MODE,
+       PSQL_END_PIPELINE_MODE,
+       PSQL_FLUSH,
+       PSQL_SEND_FLUSH_REQUEST,
+       PSQL_GET_RESULTS,

These new values are inconsistent, let's use some more PSQL_SEND_*
here. That makes the whole set of values more greppable.

Changed.

The tests in psql.sql are becoming really long. Perhaps it would be
better to split that into its own file, say psql_pipeline.sql? The
input file is already 2k lines, you are adding 15% more lines to that.

Agreed, I wasn't sure if this was enough to warrant a dedicated test
file. This is now separated into psql_pipeline.sql.

+ * otherwise, calling PQgetResults will block.

Likely PQgetResults => PQgetResult().

Indeed, this is fixed.

Set of nits with the style of the code, but I'd suggest to use
braces {} here, to outline that the comment is in a block. There's a
second, larger one in discardAbortedPipelineResults().

+       if (strcmp(cmd, "gx") == 0 && PQpipelineStatus(pset.db) != PQ_PIPELINE_OFF)
+       {
+               pg_log_error("\\gx not allowed in pipeline mode");
+               clean_extended_state();
+               return PSQL_CMD_ERROR;
+       }

Changed.

What is the reasoning here behind this restriction? \gx is a wrapper
of \g with expanded mode on, but it is also possible to call \g with
expanded=on, bypassing this restriction.

The issue is that \gx enables expanded mode for the duration of the
query and immediately resets it in sendquery_cleanup. With pipelining,
the command is piped, and display is done by either \endpipeline or
\getresults, so the flag change has no impact. Forbidding it was a way
to make it clearer that it won't have the expected effect. If we
wanted a similar feature, it would need to be done with something like
\endpipelinex or \getresultsx.

Let's split the prompt patch with the support of %P into its own
patch.

-#define DEFAULT_PROMPT1 "%/%R%x%# "
-#define DEFAULT_PROMPT2 "%/%R%x%# "
+#define DEFAULT_PROMPT1 "%/%R%P%x%# "
+#define DEFAULT_PROMPT2 "%/%R%P%x%# "
#define DEFAULT_PROMPT3 ">> "

I don't think that changing this default is a good idea. Everybody
can do that in their own .psqlrc (spoiler: I do).

The idea to use three fields with a hardcoded format does not look
like a good idea to me. I think that this should be done in a
different and more extensible way:
- Use %P to track if we are in pipeline mode on, off or abort.
- Define three new variables that behave like ROW_COUNT, but for the
fields you want to track here. These could then be passed down to a
PROMPT with %:name:, grammar already supported.

That would make the whole much more flexible. And it seems to me that
we could also add requested_results to this set? These could be named
with the same prefix, like PIPELINE_SYNC_COUNT, etc.

I've split the patch and created the 3 special variables:
PIPELINE_SYNC_COUNT, PIPELINE_COMMAND_COUNT, PIPELINE_RESULT_COUNT.

For requested_results, I don't think there's value in exposing it
since it is used as an exit condition and thus will always be 0
outside of ExecQueryAndProcessResults.

Attachments:

v08-0001-Add-pipelining-support-in-psql.patch (application/octet-stream, +919/-12)
v08-0002-Add-prompt-interpolation-and-variables-for-psql-.patch (application/octet-stream, +92/-5)
#17 Michael Paquier
michael@paquier.xyz
In reply to: Anthonin Bonnefoy (#16)
Re: Add Pipelining support in psql

On Tue, Feb 18, 2025 at 06:34:20PM +0100, Anthonin Bonnefoy wrote:

On Tue, Feb 18, 2025 at 8:23 AM Michael Paquier <michael@paquier.xyz> wrote:

The tests in psql.sql are becoming really long. Perhaps it would be
better to split that into its own file, say psql_pipeline.sql? The
input file is already 2k lines, you are adding 15% more lines to that.

Agreed, I wasn't sure if this was enough to warrant a dedicated test
file. This is now separated in psql_pipeline.sql.

You have forgotten the expected output. Not a big issue as the input
was sent.

What is the reasoning here behind this restriction? \gx is a wrapper
of \g with expanded mode on, but it is also possible to call \g with
expanded=on, bypassing this restriction.

The issue is that \gx enables expanded mode for the duration of the
query and immediately reset it in sendquery_cleanup. With pipelining,
the command is piped and displaying is done by either \endpipeline or
\getresults, so the flag change has no impact. Forbidding it was a way
to make it clearer that it won't have the expected effect. If we
wanted a similar feature, this would need to be done with something
like \endpipelinex or \getresultsx.

Hmm, okay. If one wants one mode or the other, it is always possible
to force one with \pset expanded when getting the results. Not sure
there is any need for new specific commands for these two ways of
printing the results. Another option would be to authorize the command
to run, but perhaps your option is just better given the enforced
behavior in the output. So fine by me. There is coverage, so we'll
know if there are arguments in favor of authorizing the command, if
need be.

I've split the patch and created the 3 special variables:
PIPELINE_SYNC_COUNT, PIPELINE_COMMAND_COUNT, PIPELINE_RESULT_COUNT.

Thanks. Looks sensible now.

For requested_results, I don't think there's value in exposing it
since it is used as an exit condition and thus will always be 0
outside of ExecQueryAndProcessResults.

I've been playing with this patch and this configuration:
\set PROMPT1 '=(pipeline=%P,sync=%:PIPELINE_SYNC_COUNT:,cmd=%:PIPELINE_COMMAND_COUNT:,res=%:PIPELINE_RESULT_COUNT:)%#'

That's long, but seeing the evolution of the pipeline status is pretty
cool depending on the meta-commands used.

While testing, I have been able to run into an assertion failure by
adding some tests in psql.sql to check for the case of inactive
branches for \if.  For example:
--- a/src/test/regress/sql/psql.sql
+++ b/src/test/regress/sql/psql.sql
@@ -1047,11 +1047,15 @@ select \if false \\ (bogus \else \\ 42 \endif \\ forty_two;
    \echo arg1 arg2 arg3 arg4 arg5
    \echo arg1
    \encoding arg1
+   \endpipeline
    \errverbose

And the report:
+psql: mainloop.c:513: MainLoop: Assertion `conditional_active(cond_stack)' failed.

We should have tests for all new six meta-commands in psql.sql.
MainLoop() is wrong when in pipeline mode for inactive branches.
--
Michael

#18Anthonin Bonnefoy
anthonin.bonnefoy@datadoghq.com
In reply to: Michael Paquier (#17)
Re: Add Pipelining support in psql

On Thu, Feb 20, 2025 at 9:02 AM Michael Paquier <michael@paquier.xyz> wrote:

You have forgotten the expected output. Not a big issue as the input
was sent.

I was writing the mail with the missing file when you sent this mail.
This is fixed.

While testing, I have been able to run into an assertion failure by
adding some tests in psql.sql to check for the case of inactive
branches for \if.  For example:
--- a/src/test/regress/sql/psql.sql
+++ b/src/test/regress/sql/psql.sql
@@ -1047,11 +1047,15 @@ select \if false \\ (bogus \else \\ 42 \endif \\ forty_two;
\echo arg1 arg2 arg3 arg4 arg5
\echo arg1
\encoding arg1
+   \endpipeline
\errverbose

And the report:
+psql: mainloop.c:513: MainLoop: Assertion `conditional_active(cond_stack)' failed.

We should have tests for all new six meta-commands in psql.sql.
MainLoop() is wrong when in pipeline mode for inactive branches.

Ha yeah, I forgot about the inactive branches. I've added the new
commands and fixed the behaviour.

A small issue I've noticed while testing: when a pipeline has at least
one queued command, pqClearConnErrorState isn't called in
PQsendQueryStart, so errors are appended to one another. For example:

\startpipeline
select 1 \bind \g
select 1;
PQsendQuery not allowed in pipeline mode
select 1;
PQsendQuery not allowed in pipeline mode
PQsendQuery not allowed in pipeline mode

This looks more like an issue on libpq's side, as there's no way to
reset or advance errorReported from ExecQueryAndProcessResults (plus
PQerrorMessage seems to ignore errorReported). I've added an
additional test to track this behaviour for now, as it would probably
be better discussed in a dedicated thread.

Attachments:

v09-0002-Add-prompt-interpolation-and-variables-for-psql-.patchapplication/octet-stream; name=v09-0002-Add-prompt-interpolation-and-variables-for-psql-.patchDownload+92-5
v09-0001-Add-pipelining-support-in-psql.patchapplication/octet-stream; name=v09-0001-Add-pipelining-support-in-psql.patchDownload+1640-12
#19Michael Paquier
michael@paquier.xyz
In reply to: Anthonin Bonnefoy (#18)
Re: Add Pipelining support in psql

On Thu, Feb 20, 2025 at 10:29:33AM +0100, Anthonin Bonnefoy wrote:

Ha yeah, I forgot about the inactive branches. I've added the new
commands and fixed the behaviour.

And I did not notice that it was as simple as forcing the status in
the routines for the new meta-commands, as we do for the existing
ones. Noted.

This looks more like an issue on libpq's side as there's no way to
reset or advance the errorReported from ExecQueryAndProcessResults
(plus PQerrorMessage seems to ignore errorReported). I've added an
additional test to track this behaviour for now as this would probably
be better discussed in a dedicated thread.

I am not sure we should change that, actually, as it does not feel
completely wrong to stack these errors. That's a bit confusing,
sure. Perhaps a new libpq API to retrieve the stacked errors when in
pipeline mode would be more appropriate? The design would be
different.

Anyway, I've stared at the result processing code for a couple of
hours, and the branches we're taking for the pipeline modes look
correct the way you have implemented them. The docs, comments and
tests needed quite a few tweaks and edits to be more consistent.
There were some grammar mistakes and some frenchisms.

I'm hoping that there won't be any issues but, let's be honest, I am
sure there will be some more tuning required. It comes down to
whether we want this set of features, and I do, to be able to expand
the in-core tests with the extended query protocol and pipelines,
knowing that there is demand from out-of-core projects. This one
being reachable with a COPY gave me a huge smile:
+message type 0x5a arrived from server while idle

So let's take the first step here: I have applied the main patch. I
am really excited by the possibilities all this stuff offers.

Attached are the remaining pieces, split here because they are
different bullet points:
- Tests for implicit transactions with various commands, with some
edits.
- Prompt support, with more edits.

I'm putting these on standby for a few days, to let the buildfarm
digest the main change.
--
Michael

Attachments:

v10-0001-Add-tests-with-implicit-transaction-blocks-and-p.patchtext/x-diff; charset=us-asciiDownload+184-5
v10-0002-Add-prompt-interpolation-and-variables-for-psql-.patchtext/x-diff; charset=us-asciiDownload+95-5
#20Michael Paquier
michael@paquier.xyz
In reply to: Michael Paquier (#19)
Re: Add Pipelining support in psql

On Fri, Feb 21, 2025 at 11:33:41AM +0900, Michael Paquier wrote:

Attached are the remaining pieces, split here because they are
different bullet points:
- Tests for implicit transactions with various commands, with some
edits.
- Prompt support, with more edits.

I'm putting these on standby for a few days, to let the buildfarm
digest the main change.

Initial digestion has gone well. The remaining pieces have been
applied as 3ce357584e79 and a4e986ef5a46. For the prompt part, I have
added a couple of tests with \echo and the variables; the patch felt
incomplete without these. Perhaps we could extend them more, but at
least we have a starting point.
--
Michael

#21Anthonin Bonnefoy
anthonin.bonnefoy@datadoghq.com
In reply to: Michael Paquier (#20)
#22Daniel Verite
daniel@manitou-mail.org
In reply to: Anthonin Bonnefoy (#16)
#23Michael Paquier
michael@paquier.xyz
In reply to: Daniel Verite (#22)
#24Anthonin Bonnefoy
anthonin.bonnefoy@datadoghq.com
In reply to: Michael Paquier (#23)
#25Daniel Verite
daniel@manitou-mail.org
In reply to: Anthonin Bonnefoy (#24)
#26Anthonin Bonnefoy
anthonin.bonnefoy@datadoghq.com
In reply to: Daniel Verite (#25)
#27Michael Paquier
michael@paquier.xyz
In reply to: Anthonin Bonnefoy (#24)
#28Michael Paquier
michael@paquier.xyz
In reply to: Anthonin Bonnefoy (#26)
#29Daniel Verite
daniel@manitou-mail.org
In reply to: Anthonin Bonnefoy (#26)
#30Michael Paquier
michael@paquier.xyz
In reply to: Daniel Verite (#29)
#31Jelte Fennema-Nio
postgres@jeltef.nl
In reply to: Michael Paquier (#20)
#32Anthonin Bonnefoy
anthonin.bonnefoy@datadoghq.com
In reply to: Jelte Fennema-Nio (#31)
#33Daniel Verite
daniel@manitou-mail.org
In reply to: Jelte Fennema-Nio (#31)
#34Anthonin Bonnefoy
anthonin.bonnefoy@datadoghq.com
In reply to: Daniel Verite (#33)
#35Michael Paquier
michael@paquier.xyz
In reply to: Anthonin Bonnefoy (#34)
#36Michael Paquier
michael@paquier.xyz
In reply to: Anthonin Bonnefoy (#34)
#37Anthonin Bonnefoy
anthonin.bonnefoy@datadoghq.com
In reply to: Michael Paquier (#36)
#38Jelte Fennema-Nio
postgres@jeltef.nl
In reply to: Anthonin Bonnefoy (#37)
#39Daniel Verite
daniel@manitou-mail.org
In reply to: Anthonin Bonnefoy (#34)
#40Michael Paquier
michael@paquier.xyz
In reply to: Daniel Verite (#39)
#41Michael Paquier
michael@paquier.xyz
In reply to: Jelte Fennema-Nio (#38)
#42Michael Paquier
michael@paquier.xyz
In reply to: Anthonin Bonnefoy (#37)
#43Daniel Verite
daniel@manitou-mail.org
In reply to: Michael Paquier (#41)
#44Noah Misch
noah@leadboat.com
In reply to: Michael Paquier (#19)
#45Jelte Fennema-Nio
postgres@jeltef.nl
In reply to: Noah Misch (#44)
#46a.kozhemyakin
a.kozhemyakin@postgrespro.ru
In reply to: Michael Paquier (#30)
#47Michael Paquier
michael@paquier.xyz
In reply to: a.kozhemyakin (#46)
#48Michael Paquier
michael@paquier.xyz
In reply to: Noah Misch (#44)
#49Noah Misch
noah@leadboat.com
In reply to: Michael Paquier (#48)
#50Michael Paquier
michael@paquier.xyz
In reply to: Noah Misch (#49)
#51Michael Paquier
michael@paquier.xyz
In reply to: Michael Paquier (#47)
#52Michael Paquier
michael@paquier.xyz
In reply to: Michael Paquier (#51)
#53Anthonin Bonnefoy
anthonin.bonnefoy@datadoghq.com
In reply to: Michael Paquier (#52)
#54Michael Paquier
michael@paquier.xyz
In reply to: Anthonin Bonnefoy (#53)
#55Michael Paquier
michael@paquier.xyz
In reply to: Michael Paquier (#54)