Assertion failure in pgbench

Started by Fujii Masao 9 months ago (11 messages, hackers)
#1 Fujii Masao
masao.fujii@gmail.com

Hi,

I encountered the following assertion failure in pgbench on the current master:

Assertion failed: (res == ((void*)0)), function discardUntilSync,
file pgbench.c, line 3515.
Abort trap: 6

This can be reliably reproduced with the following steps:

------------------------
$ psql -c "ALTER SYSTEM SET default_transaction_isolation TO 'serializable'"

$ psql -c "SELECT pg_reload_conf()"

$ pgbench -i

$ cat test.sql
\set aid random(1, 100000 * :scale)
\set bid random(1, 1 * :scale)
\set tid random(1, 10 * :scale)
\set delta random(-5000, 5000)
\startpipeline
BEGIN;
UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :aid;
SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
UPDATE pgbench_tellers SET tbalance = tbalance + :delta WHERE tid = :tid;
UPDATE pgbench_branches SET bbalance = bbalance + :delta WHERE bid = :bid;
INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES
(:tid, :bid, :aid, :delta, CURRENT_TIMESTAMP);
END;
\endpipeline

$ pgbench -f test.sql -c 10 -j 10 -T 60 -M extended
------------------------

Even without a custom script, shutting down the server in
immediate mode while running "pgbench -c 10 -j 10 -T 60" could
trigger the same assertion, though not reliably.

/* receive PGRES_PIPELINE_SYNC and null following it */
for (;;)
{
    PGresult   *res = PQgetResult(st->con);

    if (PQresultStatus(res) == PGRES_PIPELINE_SYNC)
    {
        PQclear(res);
        res = PQgetResult(st->con);
        Assert(res == NULL);
        break;
    }
    PQclear(res);
}

The failure occurs in this code, which assumes that PGRES_PIPELINE_SYNC
is always followed by NULL. However, it seems that another
PGRES_PIPELINE_SYNC can arrive immediately afterward, which violates that
assumption and causes the assertion to fail. Thoughts?
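To make the failing assumption concrete, here is a self-contained C simulation, not pgbench code. It assumes a simplified, hypothetical result sequence in which a second PGRES_PIPELINE_SYNC arrives directly after the first. MockStatus, MockStream and mock_get_result() are invented stand-ins for PQresultStatus()/PQgetResult(), with RES_NULL modeling a NULL return. The function follows the control flow of the loop quoted above but returns false where Assert(res == NULL) would fire.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for libpq result statuses (not libpq types). */
typedef enum
{
    RES_NULL,                   /* models PQgetResult() returning NULL */
    RES_COMMAND_OK,
    RES_FATAL_ERROR,
    RES_PIPELINE_SYNC
} MockStatus;

/* A canned sequence of results, consumed in order. */
typedef struct
{
    const MockStatus *items;
    int         len;
    int         pos;
} MockStream;

/* Hypothetical PQgetResult() replacement: next queued result, or RES_NULL. */
static MockStatus
mock_get_result(MockStream *s)
{
    assert(s->pos <= s->len);
    if (s->pos == s->len)
        return RES_NULL;
    return s->items[s->pos++];
}

/*
 * Same control flow as the quoted pgbench loop.  Returns true if
 * Assert(res == NULL) would hold, false if it would fail.
 */
static bool
old_discard_until_sync(MockStream *s)
{
    for (;;)
    {
        MockStatus  res = mock_get_result(s);

        if (res == RES_PIPELINE_SYNC)
        {
            res = mock_get_result(s);
            return res == RES_NULL;     /* the original Assert */
        }
        /* Mock-only guard: the stream ran out without any sync. */
        if (res == RES_NULL && s->pos == s->len)
            return false;
    }
}
```

With a single sync the check holds; with two consecutive syncs, the result read right after the first sync is the second sync rather than NULL, which is exactly the condition the Assert rejects.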

Regards.

--
Fujii Masao

#2 Tatsuo Ishii
t-ishii@sra.co.jp
In reply to: Fujii Masao (#1)
Re: Assertion failure in pgbench

/* receive PGRES_PIPELINE_SYNC and null following it */
for (;;)
{
    PGresult   *res = PQgetResult(st->con);

    if (PQresultStatus(res) == PGRES_PIPELINE_SYNC)
    {
        PQclear(res);
        res = PQgetResult(st->con);
        Assert(res == NULL);
        break;
    }
    PQclear(res);
}

The failure occurs in this code, which assumes that PGRES_PIPELINE_SYNC
is always followed by NULL. However, it seems that another
PGRES_PIPELINE_SYNC can arrive immediately afterward, which violates that
assumption and causes the assertion to fail. Thoughts?

Yes. When an error occurs and the backend returns an error response,
pgbench sends one more sync message, then sends ROLLBACK if
necessary. I think the code above should be changed to call
PQgetResult repeatedly until it returns NULL.
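A minimal sketch of the suggested approach, using invented stand-ins (MockStatus, MockStream, mock_get_result()) rather than libpq, with RES_NULL modeling a NULL return from PQgetResult(). After the first PGRES_PIPELINE_SYNC, it keeps fetching results until NULL arrives instead of asserting that the very next result is NULL. This is an illustration under those assumptions, not the patch that was later attached and committed.

```c
#include <assert.h>

/* Hypothetical stand-ins for libpq result statuses (not libpq types). */
typedef enum
{
    RES_NULL,                   /* models PQgetResult() returning NULL */
    RES_COMMAND_OK,
    RES_FATAL_ERROR,
    RES_PIPELINE_SYNC
} MockStatus;

/* A canned sequence of results, consumed in order. */
typedef struct
{
    const MockStatus *items;
    int         len;
    int         pos;
} MockStream;

/* Hypothetical PQgetResult() replacement: next queued result, or RES_NULL. */
static MockStatus
mock_get_result(MockStream *s)
{
    assert(s->pos <= s->len);
    if (s->pos == s->len)
        return RES_NULL;
    return s->items[s->pos++];
}

/*
 * Discard results up to the sync point.  Once a PGRES_PIPELINE_SYNC is
 * seen, keep fetching until NULL instead of asserting that the next result
 * is NULL; any extra syncs are counted and dropped.
 * Returns the number of sync results consumed.
 */
static int
drain_until_sync(MockStream *s)
{
    int         nsyncs = 0;

    for (;;)
    {
        MockStatus  res = mock_get_result(s);

        if (res == RES_PIPELINE_SYNC)
        {
            nsyncs++;
            while ((res = mock_get_result(s)) != RES_NULL)
            {
                if (res == RES_PIPELINE_SYNC)
                    nsyncs++;
            }
            return nsyncs;
        }
        /* Mock-only guard: the stream ran out without any sync. */
        if (res == RES_NULL && s->pos == s->len)
            return nsyncs;
    }
}
```

On the two-sync sequence, the loop consumes both syncs and stops at the trailing NULL, leaving nothing unread in the mock stream; with a single sync it behaves like the original loop.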

Best regards,
--
Tatsuo Ishii
SRA OSS K.K.
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp

#3 Tatsuo Ishii
t-ishii@sra.co.jp
In reply to: Tatsuo Ishii (#2)
Re: Assertion failure in pgbench

Yes. When an error occurs and the backend returns an error response,
pgbench sends one more sync message, then sends ROLLBACK if
necessary. I think the code above should be changed to call
PQgetResult repeatedly until it returns NULL.

Correction: that would not be a proper fix. Wouldn't it be enough to
just remove the inner PQgetResult call and the Assert?

Best regards,
--
Tatsuo Ishii

#4 Stepan Neretin
slpmcf@gmail.com
In reply to: Tatsuo Ishii (#3)
Re: Assertion failure in pgbench

On Thu, Jul 31, 2025 at 9:03 AM Tatsuo Ishii <ishii@postgresql.org> wrote:

Even without a custom script, shutting down the server in
immediate mode while running "pgbench -c 10 -j 10 -T 60" could
trigger the same assertion, though not reliably.

Hi, Tatsuo.
Do you understand why there is an assertion error in the immediate shutdown
case?
Best Regards,
Stepan Neretin

#5 Tatsuo Ishii
t-ishii@sra.co.jp
In reply to: Stepan Neretin (#4)
Re: Assertion failure in pgbench

Hi, Tatsuo.
Do you understand why there is an assertion error in the immediate shutdown
case?

No. I have not been able to reproduce that case so far.

Best regards,
--
Tatsuo Ishii

#6 Fujii Masao
masao.fujii@gmail.com
In reply to: Tatsuo Ishii (#3)
Re: Assertion failure in pgbench

On Thu, Jul 31, 2025 at 11:03 AM Tatsuo Ishii <ishii@postgresql.org> wrote:

Yes. When an error occurs and the backend returns an error response,
pgbench sends one more sync message, then sends ROLLBACK if
necessary. I think the code above should be changed to call
PQgetResult repeatedly until it returns NULL.

I was thinking the same. The attached patch implements that approach,
and it seems to eliminate the assertion failure.

Correction: that would not be a proper fix. Wouldn't it be enough to
just remove the inner PQgetResult call and the Assert?

Could you explain why you think repeatedly calling PQgetResult
until it returns NULL isn't the right fix?

Regards,

--
Fujii Masao

Attachments:

v1-0001-Fix-assertion-failure-in-pgbench-when-handling-mu.patch (application/octet-stream, +15 -3)
#7 Fujii Masao
masao.fujii@gmail.com
In reply to: Tatsuo Ishii (#5)
Re: Assertion failure in pgbench

On Thu, Jul 31, 2025 at 11:56 AM Tatsuo Ishii <ishii@postgresql.org> wrote:

Hi, Tatsuo.
Do you understand why there is an assertion error in the immediate shutdown
case?

No. I have not been able to reproduce that case so far.

I may have been mistaken here, as I haven't been able to reproduce
the issue (i.e., the assertion failure caused by an immediate shutdown)
since my earlier post. Sorry for the noise.

Regards,

--
Fujii Masao

#8 Tatsuo Ishii
t-ishii@sra.co.jp
In reply to: Fujii Masao (#6)
Re: Assertion failure in pgbench

On Thu, Jul 31, 2025 at 11:03 AM Tatsuo Ishii <ishii@postgresql.org> wrote:

Yes. When an error occurs and the backend returns an error response,
pgbench sends one more sync message, then sends ROLLBACK if
necessary. I think the code above should be changed to call
PQgetResult repeatedly until it returns NULL.

I was thinking the same. The attached patch implements that approach,
and it seems to eliminate the assertion failure.

The patch looks good to me, and I confirmed that it fixes the assertion
failure.

Correction: that would not be a proper fix. Wouldn't it be enough to
just remove the inner PQgetResult call and the Assert?

Could you explain why you think repeatedly calling PQgetResult
until it returns NULL isn't the right fix?

Sorry, that was my thinko. I should have had more coffee.
--
Tatsuo Ishii

#9 Fujii Masao
masao.fujii@gmail.com
In reply to: Tatsuo Ishii (#8)
Re: Assertion failure in pgbench

On Fri, Aug 1, 2025 at 9:03 AM Tatsuo Ishii <ishii@postgresql.org> wrote:

On Thu, Jul 31, 2025 at 11:03 AM Tatsuo Ishii <ishii@postgresql.org> wrote:

Yes. When an error occurs and the backend returns an error response,
pgbench sends one more sync message, then sends ROLLBACK if
necessary. I think the code above should be changed to call
PQgetResult repeatedly until it returns NULL.

I was thinking the same. The attached patch implements that approach,
and it seems to eliminate the assertion failure.

The patch looks good to me, and I confirmed that it fixes the assertion
failure.

Thanks for the review and testing!

I've updated the commit message and attached a revised version of the patch.

This assertion failure can occur when retriable errors (like
serialization errors) happen while using pipeline mode. Since
this issue exists from v15 onward, the fix should be back-patched
to v15. I’ve also attached a version of the patch that applies
cleanly to v15 and v16, as the master patch doesn’t apply cleanly
to those branches.

Unless there are any objections, I plan to commit this and
back-patch to v15.

Regards,

--
Fujii Masao

Attachments:

v2-0001-PG15_PG16-Fix-assertion-failure-in-pgbench-when-handling-mu.patch (application/octet-stream, +10 -3)
v2-0001-Fix-assertion-failure-in-pgbench-when-handling-mu.patch (application/octet-stream, +15 -3)
#10 Fujii Masao
masao.fujii@gmail.com
In reply to: Fujii Masao (#9)
Re: Assertion failure in pgbench

On Fri, Aug 1, 2025 at 8:55 PM Fujii Masao <masao.fujii@gmail.com> wrote:

Unless there are any objections, I plan to commit this and
back-patch to v15.

I've pushed the patch. Thanks!

Regards,

--
Fujii Masao

#11 Tatsuo Ishii
t-ishii@sra.co.jp
In reply to: Fujii Masao (#10)
Re: Assertion failure in pgbench

On Fri, Aug 1, 2025 at 8:55 PM Fujii Masao <masao.fujii@gmail.com> wrote:

Unless there are any objections, I plan to commit this and
back-patch to v15.

I've pushed the patch. Thanks!

Thanks. Great!
--
Tatsuo Ishii