BUG #19500: pgrepack logical decoding plugin can crash assert builds via SQL decoding API
The following bug has been logged on the website:
Bug reference: 19500
Logged by: Nikita Kalinin
Email address: n.kalinin@postgrespro.ru
PostgreSQL version: 18.4
Operating system: Fedora 44
Description:
Hi,
It appears that the pgrepack output plugin is accessible through the SQL
logical decoding API, even though the plugin code explicitly indicates that
this interface is not supported. Reading changes from such a slot can cause
a backend process crash in builds with asserts enabled.
The crash is reproducible on the current master branch. Since the web form
does not allow selecting master, I selected the latest available released
version instead.
Steps to reproduce:
CREATE TABLE rp(a int);
SELECT *
FROM pg_create_logical_replication_slot('s_repack', 'pgrepack');
INSERT INTO rp VALUES (1);
SELECT *
FROM pg_logical_slot_get_binary_changes('s_repack', NULL, NULL);
Server log:
2026-05-28 21:32:23.185 +07 [142878] STATEMENT: SELECT *
FROM pg_create_logical_replication_slot('s_repack', 'pgrepack');
TRAP: failed Assert("RelationGetRelid(relation) == private->relid"), File: "pgrepack.c", Line: 100, PID: 142878
postgres: nkpit postgres [local] SELECT(ExceptionalCondition+0x57) [0xa2d437]
/tmp/pg/lib/postgresql/pgrepack.so(+0xa99) [0x7f7dd9332a99]
Backtrace:
#0 __pthread_kill_implementation (threadid=<optimized out>,
signo=signo@entry=6,
no_tid=no_tid@entry=0) at pthread_kill.c:44
#1 0x00007f7dd807a8d3 in __pthread_kill_internal (threadid=<optimized out>,
signo=6)
at pthread_kill.c:89
#2 0x00007f7dd801f48e in __GI_raise (sig=sig@entry=6) at
../sysdeps/posix/raise.c:26
#3 0x00007f7dd80067b3 in __GI_abort () at abort.c:77
#4 0x0000000000a2d458 in ExceptionalCondition (
conditionName=conditionName@entry=0x7f7dd93337c8
"RelationGetRelid(relation) == private->relid",
fileName=fileName@entry=0x7f7dd93337f5 "pgrepack.c",
lineNumber=lineNumber@entry=100) at assert.c:65
#5 0x00007f7dd9332a99 in repack_process_change (ctx=<optimized out>,
txn=<optimized out>, relation=<optimized out>, change=<optimized out>)
at pgrepack.c:100
#6 0x0000000000821223 in change_cb_wrapper (cache=<optimized out>,
txn=<optimized out>, relation=<optimized out>, change=<optimized out>)
at logical.c:1111
#7 0x000000000082d91b in ReorderBufferApplyChange (rb=<optimized out>,
txn=<optimized out>, relation=0x7f7dd848f7e8, change=0x29f71f10,
streaming=false)
at reorderbuffer.c:2080
#8 ReorderBufferProcessTXN (rb=0x29f55f20, txn=0x29f49e30,
commit_lsn=25673024,
snapshot_now=<optimized out>, command_id=command_id@entry=0,
streaming=streaming@entry=false) at reorderbuffer.c:2387
#9 0x000000000082dca9 in ReorderBufferReplay (txn=<optimized out>,
rb=<optimized out>, commit_lsn=<optimized out>, end_lsn=<optimized out>,
commit_time=<optimized out>, origin_id=<optimized out>, origin_lsn=0,
xid=<optimized out>) at reorderbuffer.c:2872
#10 0x000000000082ea38 in ReorderBufferCommit (rb=<optimized out>,
xid=<optimized out>, commit_lsn=<optimized out>, end_lsn=<optimized out>,
commit_time=<optimized out>, origin_id=<optimized out>,
origin_lsn=<optimized out>)
at reorderbuffer.c:2896
#11 0x000000000081d075 in DecodeCommit (ctx=0x29f3de70, buf=0x7ffe263bd7e0,
parsed=0x7ffe263bd630, xid=695, two_phase=false) at decode.c:755
#12 xact_decode (ctx=0x29f3de70, buf=0x7ffe263bd7e0) at decode.c:254
#13 0x000000000081cbaa in LogicalDecodingProcessRecord
(ctx=ctx@entry=0x29f3de70,
record=<optimized out>) at decode.c:117
#14 0x0000000000823b71 in pg_logical_slot_get_changes_guts
(fcinfo=0x29f2d400,
confirm=confirm@entry=true, binary=binary@entry=true) at
logicalfuncs.c:267
#15 0x0000000000823d13 in pg_logical_slot_get_binary_changes
(fcinfo=<optimized out>)
at logicalfuncs.c:354
#16 0x00000000006b7cb5 in ExecMakeTableFunctionResult (setexpr=0x29f279c8,
econtext=0x29f27818, argContext=<optimized out>,
expectedDesc=0x29f2f348,
randomAccess=false) at execSRF.c:235
#17 0x00000000006ccad7 in FunctionNext (node=0x29f27608) at
nodeFunctionscan.c:95
#18 0x00000000006ac21a in ExecProcNode (node=0x29f27608)
at ../../../src/include/executor/executor.h:327
#19 ExecutePlan (queryDesc=0x29e40780, operation=CMD_SELECT,
sendTuples=true,
numberTuples=0, direction=<optimized out>, dest=0x29f29618) at
execMain.c:1736
#20 standard_ExecutorRun (queryDesc=0x29e40780, direction=<optimized out>,
count=0)
at execMain.c:377
#21 0x00000000008c5f98 in PortalRunSelect (portal=portal@entry=0x29eb7130,
forward=forward@entry=true, count=0, count@entry=9223372036854775807,
dest=dest@entry=0x29f29618) at pquery.c:917
#22 0x00000000008c767e in PortalRun (portal=portal@entry=0x29eb7130,
count=count@entry=9223372036854775807, isTopLevel=isTopLevel@entry=true,
dest=dest@entry=0x29f29618, altdest=altdest@entry=0x29f29618,
qc=qc@entry=0x7ffe263bdd50) at pquery.c:761
#23 0x00000000008c3308 in exec_simple_query (
query_string=0x29e13800 "SELECT *\n FROM
pg_logical_slot_get_binary_changes('s_repack', NULL, NULL);") at
postgres.c:1290
#24 0x00000000008c4de1 in PostgresMain (dbname=<optimized out>,
username=<optimized out>) at postgres.c:4856
#25 0x00000000008beddd in BackendMain (startup_data=<optimized out>,
startup_data_len=<optimized out>) at backend_startup.c:124
#26 0x00000000007fed6e in postmaster_child_launch (child_type=<optimized out>,
child_slot=1, startup_data=startup_data@entry=0x7ffe263be1a0,
startup_data_len=startup_data_len@entry=24,
client_sock=client_sock@entry=0x7ffe263be1c0) at launch_backend.c:268
#27 0x0000000000802776 in BackendStartup (client_sock=0x7ffe263be1c0)
at postmaster.c:3627
#28 ServerLoop () at postmaster.c:1728
#29 0x0000000000804239 in PostmasterMain (argc=argc@entry=3,
argv=argv@entry=0x29dbcfe0) at postmaster.c:1415
#30 0x00000000004a1b48 in main (argc=3, argv=0x29dbcfe0) at main.c:231
postgres=# select version();
version
-------------------------------------------------------------------------------------------------------------
PostgreSQL 19devel on x86_64-pc-linux-gnu, compiled by gcc (GCC) 16.1.1 20260515 (Red Hat 16.1.1-2), 64-bit
(1 row)
Is this considered normal behavior for the pgrepack plugin, i.e. essentially
a “don’t do that” situation?
Hi,
On 2026-05-28, PG Bug reporting form wrote:
It appears that the pgrepack output plugin is accessible through the SQL
logical decoding API, even though the plugin code explicitly indicates that
this interface is not supported. Reading changes from such a slot can cause
a backend process crash in builds with asserts enabled.
Is this considered normal behavior for the pgrepack plugin, i.e. essentially
a “don’t do that” situation?
Yeah, I would like to have a way to prevent this, if only for user-friendliness, but it's not terribly pressing since only a role with REPLICATION privs can create the replication slot, which as I recall are already pretty powerful.
--
Álvaro Herrera
On 2026-May-29, Álvaro Herrera wrote:
On 2026-05-28, PG Bug reporting form wrote:
It appears that the pgrepack output plugin is accessible through the
SQL logical decoding API, even though the plugin code explicitly
indicates that this interface is not supported. Reading changes from
such a slot can cause a backend process crash in builds with asserts
enabled.
Yeah, I would like to have a way to prevent this, if only for
user-friendliness, but it's not terribly pressing since only a role
with REPLICATION privs can create the replication slot, which as I
recall are already pretty powerful.
How about something like this? It makes your test case throw an error
instead of failing the assertion, which I suppose is an improvement.
The patch is a bit noisy because I moved more code than the minimum
necessary; but the gist of it is that we allocate RepackDecodingState in
repack_startup(), then have repack_setup_logical_decoding() fill in a
magic number, which we later check in repack_begin_txn(). This is a bit
wasteful, because we have to do that check once for each and every
transaction; however I see no other callback that would let us do this
kind of check after the slot is created but before we start to consume
from it.
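The guard described above can be pictured with a small standalone sketch. This is not the attached patch: the struct, the function names and the magic value below are hypothetical stand-ins, assuming only what the paragraph says (state allocated at startup, magic filled in by the REPACK setup path, magic checked at the start of each transaction).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical constant; the value used by the actual patch is not shown here. */
#define SKETCH_REPACK_MAGIC 0x5245504bU

/* Simplified stand-in for RepackDecodingState. */
typedef struct SketchDecodingState
{
    uint32_t magic;     /* stays 0 unless the REPACK setup path ran */
    unsigned relid;
} SketchDecodingState;

/* Stand-in for repack_startup(): allocate zeroed state. */
static SketchDecodingState *
sketch_startup(void)
{
    return calloc(1, sizeof(SketchDecodingState));
}

/* Stand-in for repack_setup_logical_decoding(): only REPACK itself calls this. */
static void
sketch_setup_logical_decoding(SketchDecodingState *state, unsigned relid)
{
    assert(state != NULL);
    state->magic = SKETCH_REPACK_MAGIC;
    state->relid = relid;
}

/*
 * Stand-in for the check in repack_begin_txn(). Returns false where the
 * real plugin would report an error instead of tripping an assertion.
 */
static bool
sketch_begin_txn_ok(const SketchDecodingState *state)
{
    return state != NULL && state->magic == SKETCH_REPACK_MAGIC;
}
```

A slot created through the SQL API never goes through the setup step, so the magic stays zero and the first transaction is rejected.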
--
Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/
"Before you were born your parents weren't as boring as they are now. They
got that way paying your bills, cleaning up your room and listening to you
tell them how idealistic you are." -- Charles J. Sykes' advice to teenagers
Attachments:
0001-Have-RepackDecodingState-carry-a-magic-number.patch (text/x-diff; charset=utf-8; +60 -43)
On 2 Jun 2026, at 00:12, Álvaro Herrera <alvherre@kurilemu.de> wrote:
How about something like this? It makes your test case throw an error
instead of failing the assertion, which I suppose is an improvement.
The patch is a bit noisy because I moved more code than the minimum
necessary; but the gist of it is that we allocate RepackDecodingState in
repack_startup(), then have repack_setup_logical_decoding() fill in a
magic number, which we later check in repack_begin_txn(). This is a bit
wasteful, because we have to do that check once for each and every
transaction; however I see no other callback that would let us do this
kind of check after the slot is created but before we start to consume
from it.
Yes, I agree that returning an error to the user makes sense.
But does the error message need to be that detailed? Perhaps something like
"ERROR: wrong magic number in "pgrepack" decoder plugin"
would be sufficient.
Nevertheless, I tested the patch and can confirm that there are no assertion failures anymore.
I also ran it under ASAN and did not observe any issues.
Would it make sense to add a test for this case from the bug report?
Hi Nikita,
On 2026-Jun-02, Никита Калинин wrote:
On 2 Jun 2026, at 00:12, Álvaro Herrera <alvherre@kurilemu.de> wrote:
But does the error message need to be that detailed? Perhaps something like
"ERROR: wrong magic number in "pgrepack" decoder plugin"
would be sufficient.
Maybe. Getting 0x00000000 would be quite different from 0x7f7f7f7f for
instance, or a completely random number, so I don't want to judge ahead
of time.
Nevertheless, I tested the patch and can confirm that there are no
assertion failures anymore.
I also ran it under ASAN and did not observe any issues.
Thanks for testing it.
Would it make sense to add a test for this case from the bug report?
Sure, I would do that.
--
Álvaro Herrera Breisgau, Deutschland — https://www.EnterpriseDB.com/
"Take out the book that your religion considers the right one for finding the
prayer that brings peace to your soul. Then reboot the computer
and see if it works" (Carlos Duclós)
Hi Álvaro,
On Mon, Jun 1, 2026 at 10:43 PM Álvaro Herrera <alvherre@kurilemu.de> wrote:
On 2026-May-29, Álvaro Herrera wrote:
On 2026-05-28, PG Bug reporting form wrote:
It appears that the pgrepack output plugin is accessible through the
SQL logical decoding API, even though the plugin code explicitly
indicates that this interface is not supported. Reading changes from
such a slot can cause a backend process crash in builds with asserts
enabled.
Yeah, I would like to have a way to prevent this, if only for
user-friendliness, but it's not terribly pressing since only a role
with REPLICATION privs can create the replication slot, which as I
recall are already pretty powerful.
How about something like this? It makes your test case throw an error
instead of failing the assertion, which I suppose is an improvement.
The patch is a bit noisy because I moved more code than the minimum
necessary; but the gist of it is that we allocate RepackDecodingState in
repack_startup(), then have repack_setup_logical_decoding() fill in a
magic number, which we later check in repack_begin_txn(). This is a bit
wasteful, because we have to do that check once for each and every
transaction; however I see no other callback that would let us do this
kind of check after the slot is created but before we start to consume
from it.
The magic guard is correct. One thing worth noting: the check is in the
begin callback, which fires only at the transaction's commit, so a single
large transaction (a bulk load) is decoded in full and buffered, spilling to
disk past logical_decoding_work_mem, before the plugin rejects it.
That work is then thrown away. It's a misuse path, so this might not be
a big concern, I guess, but it does mean the wasted work scales with
the transaction size rather than being negligible.
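To make the timing concrete, here is a toy model of the point above. It is only a sketch under the assumption stated in this message (changes are buffered until the commit record, and the begin callback is the first place the plugin can object); the types and names are hypothetical and do not correspond to reorderbuffer.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum { SKETCH_REC_CHANGE, SKETCH_REC_COMMIT } SketchRecordType;

typedef struct SketchDecoder
{
    size_t buffered;        /* changes currently held for the open transaction */
    size_t peak_buffered;   /* high-water mark, i.e. work done before rejection */
    bool   rejected;        /* set once the begin-time check fails */
} SketchDecoder;

/*
 * Feed one record to the toy decoder. Changes are only accumulated; the
 * plugin-side check runs when the commit record arrives and replay starts.
 */
static void
sketch_process_record(SketchDecoder *d, SketchRecordType type, bool plugin_state_valid)
{
    assert(d != NULL);
    if (d->rejected)
        return;

    if (type == SKETCH_REC_CHANGE)
    {
        d->buffered++;
        if (d->buffered > d->peak_buffered)
            d->peak_buffered = d->buffered;
        return;
    }

    /* Commit: the begin callback would run here, before any change is replayed. */
    if (!plugin_state_valid)
        d->rejected = true;
    d->buffered = 0;        /* buffered work is discarded or consumed either way */
}
```

In this model the rejection always arrives after every change of the transaction has been buffered, so the discarded work grows with the transaction size.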
Could we reject the pgrepack plugin at slot creation instead, in
pg_create_logical_replication_slot() and the CREATE_REPLICATION_SLOT
command, so misuse gets a clear "reserved for REPACK (CONCURRENTLY)"
error up front, before any decoding? REPACK creates its slot directly via
ReplicationSlotCreate(), so it's unaffected, and the begin-callback check
with magic guard can stay as the internal safety net.
Happy to be told this isn't worth special-casing :)
I attached a patch that implements the above behaviour; it applies on top
of your patch.
Thoughts?
--
Thanks,
Srinath Reddy Sadipiralla
EDB: https://www.enterprisedb.com/
Attachments:
v1-0002-Reject-the-pgrepack-output-plugin-outside-REPACK-CON.patch (application/octet-stream; +33 -1)
Srinath Reddy Sadipiralla <srinath2133@gmail.com> wrote:
Could we reject the pgrepack plugin at slot creation instead, in
pg_create_logical_replication_slot() and the CREATE_REPLICATION_SLOT
command, so misuse gets a clear "reserved for REPACK (CONCURRENTLY)"
error up front, before any decoding? REPACK creates its slot directly via
ReplicationSlotCreate(), so it's unaffected, and the begin-callback check
with magic guard can stay as the internal safety net.
Happy to be told this isn't worth special-casing :)
Another possible approach: restrict the use of the plugin to the REPACK
decoding worker.
--
Antonin Houska
Web: https://www.cybertec-postgresql.com
Attachments:
repack_reserve_plugin_name.diff (text/x-diff; +14 -3)
On Wed, Jun 3, 2026 at 1:00 PM Antonin Houska <ah@cybertec.at> wrote:
Srinath Reddy Sadipiralla <srinath2133@gmail.com> wrote:
Could we reject the pgrepack plugin at slot creation instead, in
pg_create_logical_replication_slot() and the CREATE_REPLICATION_SLOT
command, so misuse gets a clear "reserved for REPACK (CONCURRENTLY)"
error up front, before any decoding? REPACK creates its slot directly via
ReplicationSlotCreate(), so it's unaffected, and the begin-callback check
with magic guard can stay as the internal safety net.
Happy to be told this isn't worth special-casing :)
Another possible approach: restrict the use of the plugin to the REPACK
decoding worker.
Cool, that's cleaner. I incorporated these changes and added a test and
an errcode.
--
Thanks :)
Srinath Reddy Sadipiralla
EDB: https://www.enterprisedb.com/
Attachments:
v2-0002-Reject-the-pgrepack-output-plugin-outside-REPACK-CON.patch (application/octet-stream; +23 -4)
On 2026-Jun-03, Antonin Houska wrote:
Srinath Reddy Sadipiralla <srinath2133@gmail.com> wrote:
Could we reject the pgrepack plugin at slot creation instead, in
pg_create_logical_replication_slot() and the CREATE_REPLICATION_SLOT
command, so misuse gets a clear "reserved for REPACK (CONCURRENTLY)"
error up front, before any decoding? REPACK creates its slot directly via
ReplicationSlotCreate(), so it's unaffected, and the begin-callback check
with magic guard can stay as the internal safety net.
Happy to be told this isn't worth special-casing :)
Another possible approach: restrict the use of the plugin to the REPACK
decoding worker.
I don't like either of these approaches, because they are forcing the
generic facility (either slot creation or logical decoding setup) to
know something about one specific user of the facility. That is to say,
the restriction is being added on the wrong side of the abstraction.
I know my implementation has the drawback you (Srinath) mentioned, because
the abstraction doesn't provide us with a great way to inject an error
report at the exact spot we need it; but I think it's at the correct
side of the abstraction.
(I'm not really sure that there _is_ a great way to throw an error
report at the right time. That would require every single output plugin
author to add a function we can call; and every single one of them,
except REPACK, would do nothing. This seems quite pointless.)
I frankly don't have a problem with letting a transaction spill a few
GBs to disk only to then report an error that pgrepack is being misused.
It's just not something that anyone would do for fun.
--
Álvaro Herrera Breisgau, Deutschland — https://www.EnterpriseDB.com/
Alvaro Herrera <alvherre@kurilemu.de> wrote:
On 2026-Jun-03, Antonin Houska wrote:
Srinath Reddy Sadipiralla <srinath2133@gmail.com> wrote:
Could we reject the pgrepack plugin at slot creation instead, in
pg_create_logical_replication_slot() and the CREATE_REPLICATION_SLOT
command, so misuse gets a clear "reserved for REPACK (CONCURRENTLY)"
error up front, before any decoding? REPACK creates its slot directly via
ReplicationSlotCreate(), so it's unaffected, and the begin-callback check
with magic guard can stay as the internal safety net.
Happy to be told this isn't worth special-casing :)
Another possible approach: restrict the use of the plugin to the REPACK
decoding worker.
I don't like either of these approaches, because they are forcing the
generic facility (either slot creation or logical decoding setup) to
know something about one specific user of the facility. That is to say,
the restriction is being added on the wrong side of the abstraction.
I know my implementation has the drawback you (Srinath) mentioned, because
the abstraction doesn't provide us with a great way to inject an error
report at the exact spot we need it; but I think it's at the correct
side of the abstraction.
I noticed that ReplicationSlotAcquire() already does something like that:
    /*
     * Do not allow users to acquire the reserved slot. This scenario may
     * occur if the launcher that owns the slot has terminated unexpectedly
     * due to an error, and a backend process attempts to reuse the slot.
     */
    if (!IsLogicalLauncher() && IsSlotForConflictCheck(name))
        ereport(ERROR,
                errcode(ERRCODE_UNDEFINED_OBJECT),
                errmsg("cannot acquire replication slot \"%s\"", name),
                errdetail("The slot is reserved for conflict detection and can only be acquired by logical replication launcher."));
but I agree that it's not perfect to hard-wire particular slot names into
functions like this. Perhaps we could introduce a concept of "reserved slots"
and an API (callback) to perform these checks, but that's not appropriate for
beta release.
(I'm not really sure that there _is_ a great way to throw an error
report at the right time. That would require every single output plugin
author to add a function we can call; and every single one of them,
except REPACK, would do nothing. This seems quite pointless.)
I frankly don't have a problem with letting a transaction spill a few
GBs to disk only to then report an error that pgrepack is being misused.
It's just not something that anyone would do for fun.
I admit that the possibility of wasted processing of a transaction didn't
really frighten me. The idea I posted just occurred to me somehow, but I don't
consider it urgent. I'm fine with your approach.
--
Antonin Houska
Web: https://www.cybertec-postgresql.com
On Thu, Jun 4, 2026 at 2:50 AM Alvaro Herrera <alvherre@kurilemu.de> wrote:
On 2026-Jun-03, Antonin Houska wrote:
Srinath Reddy Sadipiralla <srinath2133@gmail.com> wrote:
Could we reject the pgrepack plugin at slot creation instead, in
pg_create_logical_replication_slot() and the CREATE_REPLICATION_SLOT
command, so misuse gets a clear "reserved for REPACK (CONCURRENTLY)"
error up front, before any decoding? REPACK creates its slot directly via
ReplicationSlotCreate(), so it's unaffected, and the begin-callback check
with magic guard can stay as the internal safety net.
Happy to be told this isn't worth special-casing :)
Another possible approach: restrict the use of the plugin to the REPACK
decoding worker.
I don't like either of these approaches, because they are forcing the
generic facility (either slot creation or logical decoding setup) to
know something about one specific user of the facility. That is to say,
the restriction is being added on the wrong side of the abstraction.
I know my implementation has the drawback you (Srinath) mentioned, because
the abstraction doesn't provide us with a great way to inject an error
report at the exact spot we need it; but I think it's at the correct
side of the abstraction.
(I'm not really sure that there _is_ a great way to throw an error
report at the right time. That would require every single output plugin
author to add a function we can call; and every single one of them,
except REPACK, would do nothing. This seems quite pointless.)
I frankly don't have a problem with letting a transaction spill a few
GBs to disk only to then report an error that pgrepack is being misused.
It's just not something that anyone would do for fun.
makes sense, we can go with your approach, thanks for
the clarification.
--
Thanks :)
Srinath Reddy Sadipiralla
EDB: https://www.enterprisedb.com/
On Thursday, June 4, 2026 5:03 AM Alvaro Herrera <alvherre@kurilemu.de> wrote:
On 2026-Jun-03, Antonin Houska wrote:
Srinath Reddy Sadipiralla <srinath2133@gmail.com> wrote:
Could we reject the pgrepack plugin at slot creation instead, in
pg_create_logical_replication_slot() and the CREATE_REPLICATION_SLOT
command, so misuse gets a clear "reserved for REPACK (CONCURRENTLY)"
error up front, before any decoding? REPACK creates its slot
directly via ReplicationSlotCreate(), so it's unaffected, and the
begin-callback check with magic guard can stay as the internal safety net.
Happy to be told this isn't worth special-casing :)
Another possible approach: restrict the use of the plugin to the
REPACK decoding worker.
I don't like either of these approaches, because they are forcing the generic
facility (either slot creation or logical decoding setup) to know something
about one specific user of the facility. That is to say, the restriction is being
added on the wrong side of the abstraction.
I know my implementation has the drawback you (Srinath) mentioned, because
the abstraction doesn't provide us with a great way to inject an error report at
the exact spot we need it; but I think it's at the correct side of the abstraction.
I have no objection to the proposed approach. But I would like to confirm
whether reporting an ERROR in the startup callback (when the context is not a
REPACK decoding worker) is considered acceptable.
Like:
    repack_startup(LogicalDecodingContext *ctx, OutputPluginOptions *opt,
                   bool is_init)
    ...
        if (!AmRepackWorker())
            ereport(ERROR,
                    errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
                    errmsg("this plugin can only be used by REPACK (CONCURRENTLY)"));
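The shape of this idea can be sketched outside the server as follows. It is an illustration only, assuming a generic facility that calls a per-plugin startup callback and gives up if the callback refuses. Every name below is hypothetical, and the boolean flag merely stands in for AmRepackWorker().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* A plugin's startup hook: returns false where the real code would raise ERROR. */
typedef bool (*SketchStartupCb) (void);

typedef struct SketchPlugin
{
    SketchStartupCb startup_cb;     /* may be NULL if the plugin has no checks */
} SketchPlugin;

/* Stand-in for AmRepackWorker(); set only inside the REPACK decoding worker. */
static bool sketch_am_repack_worker = false;

/* The repack plugin polices its own use. */
static bool
sketch_repack_startup(void)
{
    return sketch_am_repack_worker;
}

/* Any other plugin has nothing to check. */
static bool
sketch_other_startup(void)
{
    return true;
}

/*
 * Generic facility: it knows nothing about individual plugins and simply
 * fails context creation when the startup callback refuses.
 */
static bool
sketch_create_decoding_context(const SketchPlugin *plugin)
{
    assert(plugin != NULL);
    return plugin->startup_cb == NULL || plugin->startup_cb();
}
```

The restriction lives entirely in the plugin's own callback, so the generic code path needs no special case, and the refusal happens before any change is decoded.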
Best Regards,
Hou zj
On 2026-Jun-05, Zhijie Hou (Fujitsu) wrote:
I have no objection to the proposed approach. But I would like to confirm
whether reporting an ERROR in the startup callback (when the context is not a
REPACK decoding worker) is considered acceptable.
Like:
    repack_startup(LogicalDecodingContext *ctx, OutputPluginOptions *opt,
                   bool is_init)
    ...
        if (!AmRepackWorker())
            ereport(ERROR,
                    errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
                    errmsg("this plugin can only be used by REPACK (CONCURRENTLY)"));
Hmm, yeah, that works for me, we can ditch the magic number then. I'm
considering something like the attached. I added the test case and
edited nearby comments. Will stare some more at it tomorrow ...
--
Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
Attachments:
v2-0001-Disallow-direct-use-of-the-pgrepack-plugin.patch (text/x-diff; charset=utf-8; +79 -62)
On 2026-Jun-08, Alvaro Herrera wrote:
Hmm, yeah, that works for me, we can ditch the magic number then. I'm
considering something like the attached. I added the test case and
edited nearby comments. Will stare some more at it tomorrow ...
Pushed, thanks everyone.
--
Álvaro Herrera Breisgau, Deutschland — https://www.EnterpriseDB.com/
"How can you trust something that you pay for and do not see,
and not trust something that is given to you and shown to you?" (Germán Poo)