BUG #19494: Error on transaction commit inside pipeline triggers psql's Assert
The following bug has been logged on the website:
Bug reference: 19494
Logged by: Alexander Lakhin
Email address: exclusion@gmail.com
PostgreSQL version: 18.4
Operating system: Ubuntu 24.04
Description:
The following psql script:
CREATE TABLE t(a INTEGER PRIMARY KEY DEFERRABLE INITIALLY DEFERRED);
\startpipeline
INSERT INTO t VALUES ($1), ($1) RETURNING * \bind 1 \sendpipeline
\endpipeline
results in:
a
---
1
1
(2 rows)
ERROR: duplicate key value violates unique constraint "t_pkey"
DETAIL: Key (a)=(1) already exists.
psql: common.c:1503: discardAbortedPipelineResults: Assertion
`pset.available_results > 0' failed.
Program terminated with signal SIGABRT, Aborted.
(gdb) bt
#0 __pthread_kill_implementation (threadid=281473811984096,
signo=signo@entry=6, no_tid=no_tid@entry=0)
at ./nptl/pthread_kill.c:44
#1 0x0000ffffba60b718 [PAC] in __pthread_kill_internal (threadid=<optimized
out>, signo=6) at ./nptl/pthread_kill.c:89
#2 0x0000ffffba5b757c in __GI_raise (sig=sig@entry=6) at
../sysdeps/posix/raise.c:26
#3 0x0000ffffba5a1d48 [PAC] in __GI_abort () at ./stdlib/abort.c:77
#4 0x0000ffffba5b05e0 [PAC] in __assert_fail_base (fmt=<optimized out>,
assertion=<optimized out>,
file=0xaaaac5a317a8 "common.c", line=1503, function=<optimized out>) at
./assert/assert.c:118
#5 0x0000aaaac59d3ffc [PAC] in discardAbortedPipelineResults () at
common.c:1503
#6 0x0000aaaac59d4b28 in ExecQueryAndProcessResults (query=0xaaaae5929250
"INSERT INTO t VALUES ($1), ($1) RETURNING * ",
elapsed_msec=0xffffe4d657b0, svpt_gone_p=0xffffe4d657af, is_watch=false,
min_rows=0, opt=0x0, printQueryFout=0x0)
at common.c:1830
#7 0x0000aaaac59d37b0 in SendQuery (query=0xaaaae5929250 "INSERT INTO t
VALUES ($1), ($1) RETURNING * ") at common.c:1214
#8 0x0000aaaac59ee4f8 in MainLoop (source=0xffffba730870 <_IO_2_1_stdin_>)
at mainloop.c:515
#9 0x0000aaaac59cd488 in process_file (filename=0x0,
use_relative_path=false) at command.c:4977
#10 0x0000aaaac59fbbcc in main (argc=10, argv=0xffffe4d65f68) at
startup.c:424
Reproduced starting from 41625ab8e (with s/\sendpipeline/\g/ before
17caf6644).
On Tue, May 26, 2026 at 07:00:01PM +0000, PG Bug reporting form wrote:
The following psql script:
CREATE TABLE t(a INTEGER PRIMARY KEY DEFERRABLE INITIALLY DEFERRED);
\startpipeline
INSERT INTO t VALUES ($1), ($1) RETURNING * \bind 1 \sendpipeline
\endpipeline
psql: common.c:1503: discardAbortedPipelineResults: Assertion
`pset.available_results > 0' failed.
That's one assertion, and with a bit more imagination for a deferred
constraint, I can trigger a second one for the number of syncs when at
the end of a pipeline:
\startpipeline
INSERT INTO pipeline_defer_tab VALUES ($1), ($1) \bind 1 \sendpipeline
\syncpipeline
SELECT 1 \bind \sendpipeline
\endpipeline
#5 0x000000000041bb0a in ExecQueryAndProcessResults (query=0x5db560
"SELECT 1 ", elapsed_msec=0x7fffffffc660, svpt_gone_p=0x7fffffffc65f,
is_watch=false, min_rows=0, opt=0x0, printQueryFout=0x0) at
common.c:2190
2190 Assert(pset.piped_syncs == 0);
(gdb) p pset.piped_syncs
$1 = 1
I am not completely sure yet, but it looks like we will need to be smarter
with the handling of the number of piped commands by tracking them
across the syncs in the shape of a queue, or something like that? So
it feels like we need to think harder about the tracking of this
activity depending on the state of the pipeline we're in. Or we could
lift some of these assertions, but that would not be right to me.
--
Michael
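A toy model of the counter bookkeeping discussed above, written as a sketch rather than as psql's actual code. The names PipelineCounters, ResKind and account_result() are hypothetical; only the counter names available_results and piped_syncs come from the thread. It assumes, as the first report suggests, that a deferred-constraint error shows up after every command result has already been consumed, and it reports the mismatch to the caller instead of asserting.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical result kinds; a simplification of what libpq reports. */
typedef enum { RES_TUPLES, RES_ERROR, RES_SYNC } ResKind;

/* Hypothetical mirror of the two psql counters named in the thread. */
typedef struct
{
    int available_results;  /* command results still expected */
    int piped_syncs;        /* sync results still expected */
} PipelineCounters;

/*
 * Account for one incoming result.  Returns false when the result does
 * not match what the counters expect (the situation where an Assert
 * would fire); in that case the counters are left untouched instead of
 * going negative.
 */
static bool
account_result(PipelineCounters *pc, ResKind kind)
{
    switch (kind)
    {
        case RES_SYNC:
            if (pc->piped_syncs == 0)
                return false;
            pc->piped_syncs--;
            return true;
        case RES_TUPLES:
            if (pc->available_results == 0)
                return false;
            pc->available_results--;
            return true;
        case RES_ERROR:
            /* An error raised by a command uses up that command's slot. */
            if (pc->available_results > 0)
            {
                pc->available_results--;
                return true;
            }
            /* Assumed deferred case: no command slot is left. */
            return false;
    }
    return false;
}
```

With one piped command and one sync, feeding RES_TUPLES and then RES_ERROR makes the second call return false with both counters unchanged, which is the point where the caller would have to decide how to recover.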
On Thu, May 28, 2026 at 12:51:38PM +0900, Michael Paquier wrote:
I am not completely sure yet, but it looks like we will need to be smarter
with the handling of the number of piped commands by tracking them
across the syncs in the shape of a queue, or something like that? So
it feels like we need to think harder about the tracking of this
activity depending on the state of the pipeline we're in. Or we could
lift some of these assertions, but that would not be right to me.
Hmm. Taking a step back, this would be overcomplicating things. As
long as we are careful to consume the synced results still in a
pipeline, it looks like we should be fine. While digging into it, I
have found a third assertion that was triggerable with
available_results at the end of the pipeline, once I began mixing
\getresults with a deferred error.
This stuff is tricky enough that I may not have covered all the
possible patterns, of course, but at least this is progress.
Alexander, what do you think?
--
Michael
Attachments:
0001-psql-Fix-failures-with-deferred-errors-in-pipelines.patch (text/plain; charset=us-ascii, +166 -13)
Hello Michael,
28.05.2026 08:26, Michael Paquier wrote:
On Thu, May 28, 2026 at 12:51:38PM +0900, Michael Paquier wrote:
I am not completely sure yet, but it looks like we will need to be smarter
with the handling of the number of piped commands by tracking them
across the syncs in the shape of a queue, or something like that? So
it feels like we need to think harder about the tracking of this
activity depending on the state of the pipeline we're in. Or we could
lift some of these assertions, but that would not be right to me.
Hmm. Taking a step back, this would be overcomplicating things. As
long as we are careful to consume the synced results still in a
pipeline, it looks like we should be fine. While digging into it, I
have found a third assertion that was triggerable with
available_results at the end of the pipeline, once I began mixing
\getresults with a deferred error.
This stuff is tricky enough that I may not have covered all the
possible patterns, of course, but at least this is progress.
Alexander, what do you think?
While testing the patch, I've observed an apparently new anomaly: psql got
stuck inside this loop:
if (end_pipeline)
{
    /*
     * Reset available/requested results. Normally these are already 0,
     * but an error generated by Sync processing itself can leave some of
     * them behind. Consume them before exiting pipeline mode.
     */
    while (pset.piped_syncs > 0)
    {
        PGresult *remaining = PQgetResult(pset.db);
        if (remaining == NULL)
            continue;
        ...
    }
It's happening upon/after postgres process termination, so PQgetResult()
returns NULL while pset.piped_syncs == 1. I need more time to look deeper and
to come up with a reproducer, but maybe you can already see what's wrong.
Best regards,
Alexander
On Fri, May 29, 2026 at 07:00:01AM +0300, Alexander Lakhin wrote:
It's happening upon/after postgres process termination, so PQgetResult()
returns NULL while pset.piped_syncs == 1. I need more time to look deeper and
to come up with a reproducer, but maybe you can already see what's wrong.
Yeah, I do. Nice catch. See this sequence to reproduce the problem:
\startpipeline
INSERT INTO psql_pipeline_defer VALUES (1), (1) \bind \sendpipeline
\syncpipeline
SELECT pg_terminate_backend(pg_backend_pid()) \bind \sendpipeline
SELECT 1 \bind \sendpipeline
\endpipeline
When ending the pipeline, the loop consuming the results is stuck, so
we could check the connection state. We are going to enter a
freeze of the branches due to beta1 next week, so let's take our time.
Please feel free to use the v2 attached for your tests. I am also testing
it more on my side.
--
Michael
Attachments:
v2-0001-psql-Fix-issues-with-deferred-errors-in-pipelines.patch (text/plain; charset=us-ascii, +227 -12)
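A minimal sketch of the "check the connection state" idea from the message above. It is not the code from the v2 patch and does not use libpq: MockConn, mock_get_result() and drain_pending_syncs() are hypothetical stand-ins, with MockConn.status playing the role that PQstatus() would play against a real connection. The mock assumes that once its scripted results run out, the backend is gone and nothing more will ever arrive.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for libpq objects; this is not the real API. */
typedef enum { MOCK_CONN_OK, MOCK_CONN_BAD } MockConnStatus;
typedef enum { MOCK_RES_TUPLES, MOCK_RES_ERROR, MOCK_RES_SYNC } MockResKind;

typedef struct
{
    MockConnStatus status;      /* stands in for PQstatus() */
    const MockResKind *results; /* scripted results still to deliver */
    size_t nresults;
    size_t next;
} MockConn;

/*
 * Return the next scripted result, or NULL when nothing is available.
 * Running out of results simulates a terminated backend: the connection
 * is marked bad and stays that way.
 */
static const MockResKind *
mock_get_result(MockConn *conn)
{
    if (conn->status == MOCK_CONN_BAD)
        return NULL;
    if (conn->next >= conn->nresults)
    {
        conn->status = MOCK_CONN_BAD;
        return NULL;
    }
    return &conn->results[conn->next++];
}

/*
 * Consume results until every expected sync has been seen.  Returns
 * true if all syncs arrived, false if the connection was lost first.
 * The counter is reset to 0 in both cases, so the caller can always
 * leave pipeline mode instead of spinning on NULL results.
 */
static bool
drain_pending_syncs(MockConn *conn, int *piped_syncs)
{
    while (*piped_syncs > 0)
    {
        const MockResKind *res = mock_get_result(conn);

        if (res == NULL)
        {
            if (conn->status == MOCK_CONN_BAD)
            {
                *piped_syncs = 0;   /* nothing more will arrive */
                return false;
            }
            continue;
        }
        if (*res == MOCK_RES_SYNC)
            (*piped_syncs)--;
    }
    return true;
}
```

The only difference from the loop quoted earlier in the thread is the extra test on the connection state before the continue, which is what turns an endless loop into an early exit when the backend has gone away.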
Hello Michael,
29.05.2026 08:00, Michael Paquier wrote:
When ending the pipeline, the loop consuming the results is stuck, so
we could check the connection state. We are going to enter a
freeze of the branches due to beta1 next week, so let's take our time.
Please feel free to use the v2 attached for your tests. I am also testing
it more on my side.
Thank you for the fix! I haven't discovered new issues so far.
I've found a way to trigger another assertion, but I don't think it's
legitimate:
--- a/src/backend/libpq/pqcomm.c
+++ b/src/backend/libpq/pqcomm.c
@@ -880,7 +880,7 @@ RemoveSocketFiles(void)
static void
socket_set_nonblocking(bool nonblocking)
{
- if (MyProcPort == NULL)
+ if ((MyProcPort == NULL) || (rand() % 10 == 0))
ereport(ERROR,
(errcode(ERRCODE_CONNECTION_DOES_NOT_EXIST),
errmsg("there is no client connection")));
makes this script:
(
echo "\startpipeline"
for i in {1..50}; do echo "\syncpipeline"; done
echo "
SELECT 1;
\endpipeline
\startpipeline
SELECT 2;
\endpipeline
"
) | psql
trigger
psql: common.c:2055: ExecQueryAndProcessResults: Assertion `pset.piped_syncs > 0' failed.
Probably there could be another way to throw an ERROR on \syncpipeline,
but I have no good idea yet.
Running psql_pipeline in a loop with the above modification applied:
for i in {1..1000}; do echo "ITERATION $i"; NO_TEMP_INSTALL=1 TESTS=psql_pipeline make -s check-tests; done
I also observed the test hanging (at iterations 284, 543, 218) due to loss
of synchronization between psql and postgres.
Best regards,
Alexander
On Sun, May 31, 2026 at 12:00:01PM +0300, Alexander Lakhin wrote:
I've found a way to trigger another assertion, but I don't think it's
legitimate:
--- a/src/backend/libpq/pqcomm.c
+++ b/src/backend/libpq/pqcomm.c
@@ -880,7 +880,7 @@ RemoveSocketFiles(void)
static void
socket_set_nonblocking(bool nonblocking)
{
- if (MyProcPort == NULL)
+ if ((MyProcPort == NULL) || (rand() % 10 == 0))
ereport(ERROR,
(errcode(ERRCODE_CONNECTION_DOES_NOT_EXIST),
errmsg("there is no client connection")));
Server-side error injection. Noice.
trigger
psql: common.c:2055: ExecQueryAndProcessResults: Assertion
`pset.piped_syncs > 0' failed.
This one would be in the same spirit as the others: if we cannot
really guarantee that the counters will be correct all the time, we can
just be more defensive. That's an error thrown while the sync
message itself is being processed, causing piped_syncs to get out of step.
Probably there could be another way to throw an ERROR on \syncpipeline,
but I have no good idea yet.
There is one challenge here, as far as I can see: libpq does not
really offer a way to tell the difference between this thrown error
and an error that comes from a Sync, so it seems like we cannot do
much on the psql side except be more defensive? I am not sure if this
is worth the extra facility in libpq, the point would be moot in the
back branches anyway. And there is a benefit in keeping the psql code
as simple as possible, as well, so I'd tend to keep it simple while
still useful.
Running psql_pipeline in a loop with the above modification applied:
for i in {1..1000}; do echo "ITERATION $i"; NO_TEMP_INSTALL=1 TESTS=psql_pipeline make -s check-tests; done
I also observed the test hanging (at iterations 284, 543, 218) due to loss
of synchronization between psql and postgres.
I have looked at that as well, and I don't think that this is fixable
only from the point of psql, because the error injected creates a
state where libpq's internal command queue gets out of sync regarding
what the backend has sent. The only thing that could be done is
inside libpq, as far as I can see, where we should try to detect that
the state is not synchronized anymore and fail rather than block. So
IMO, and with the error injected (which would never happen in
production in practice), the best thing I can come up with is the
attached for now.
One thing that I could see us doing as an extra improvement is in
ExecQueryAndProcessResults(), where we consume the results: check if
we're still in a busy state (some PQconsumeInput+PQisBusy). I don't
think that this should be a problem in practice, but this feels like
just hiding the real problem on the libpq side with the inconsistent
protocol state generated by the backend. I have also quickly tested
an approach based on that, unfortunately this leads to some
instability in the tests due to the async nature of the commands.
Anyway, the v3 attached passes the regression tests, handles the
pg_terminate_backend() case gracefully, handles the error case with
the error injected on the backend side a bit better, and can avoid
some of the issues in the fourth case, but not all, as we don't have
access to the pipe state when reaching the results due to the backend
messing up the libpq state. Handling the 4th case more
gracefully would require some libpq changes, which may not be justified
by the cases we are dealing with here, at least to me. As a whole, I'd feel
that v3 is a good improvement in itself, and it addresses your
original issues and the assertions.
What do you think?
--
Michael
Attachments:
v3-0001-psql-Fix-issues-with-deferred-errors-in-pipelines.patch (text/plain; charset=us-ascii, +245 -19)
Hello Michael,
01.06.2026 05:11, Michael Paquier wrote:
I have looked at that as well, and I don't think that this is fixable
only from the point of psql, because the error injected creates a
state where libpq's internal command queue gets out of sync regarding
what the backend has sent. The only thing that could be done is
inside libpq, as far as I can see, where we should try to detect that
the state is not synchronized anymore and fail rather than block. So
IMO, and with the error injected (which would never happen in
production in practice), the best thing I can come up with is the
attached for now.
One thing that I could see us doing as an extra improvement is in
ExecQueryAndProcessResults(), where we consume the results: check if
we're still in a busy state (some PQconsumeInput+PQisBusy). I don't
think that this should be a problem in practice, but this feels like
just hiding the real problem on the libpq side with the inconsistent
protocol state generated by the backend. I have also quickly tested
an approach based on that, unfortunately this leads to some
instability in the tests due to the async nature of the commands.
Anyway, the v3 attached passes the regression tests, handles the
pg_terminate_backend() case gracefully, handles the error case with
the error injected on the backend side a bit better, and can avoid
some of the issues in the fourth case, but not all, as we don't have
access to the pipe state when reaching the results due to the backend
messing up the libpq state. Handling the 4th case more
gracefully would require some libpq changes, which may not be justified
by the cases we are dealing with here, at least to me. As a whole, I'd feel
that v3 is a good improvement in itself, and it addresses your
original issues and the assertions.
What do you think?
I agree with your points, the v3 looks good to me. Thank you for paying
attention to all of these issues!
Best regards,
Alexander
On Mon, Jun 01, 2026 at 08:00:01AM +0300, Alexander Lakhin wrote:
I agree with your points, the v3 looks good to me. Thank you for paying
attention to all of these issues!
Okay, thanks!
--
Michael
On Mon, Jun 01, 2026 at 08:00:01AM +0300, Alexander Lakhin wrote:
I agree with your points, the v3 looks good to me. Thank you for paying
attention to all of these issues!
And now applied down to v18 as of d21604e17e49. Thanks, Alexander.
--
Michael