Reset the output buffer after sending from WalSndWriteData
Hi,
I recently stumbled over an issue with an unintentional re-transmission.
While this clearly was our fault in the output plugin code, I think the
walsender's exposed API could easily be hardened to prevent the bad
consequence from this mistake.
Does anything speak against the attached one line patch?
Best Regards
Markus
Attachments:
0001-Reset-the-output-buffer-after-sending-from-WalSndWri.patchtext/x-patch; charset=UTF-8; name=0001-Reset-the-output-buffer-after-sending-from-WalSndWri.patchDownload
From 65dd24ed975cc513e4e0de5e175dff16b4b0f6d4 Mon Sep 17 00:00:00 2001
From: Markus Wanner <markus.wanner@enterprisedb.com>
Date: Thu, 20 Feb 2025 21:01:45 +0100
Subject: [PATCH] Reset the output buffer after sending from WalSndWriteData
While not strictly necessary, clearing the buffer right away after
sending prevents an accidential re-delivery in case WalSndWriteData
is called again.
---
src/backend/replication/walsender.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index 446d10c1a7d..96884ce152f 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -1563,6 +1563,9 @@ WalSndWriteData(LogicalDecodingContext *ctx, XLogRecPtr lsn, TransactionId xid,
/* output previously gathered data in a CopyData packet */
pq_putmessage_noblock('d', ctx->out->data, ctx->out->len);
+ /* reset the output buffer to prevent re-sending */
+ resetStringInfo(ctx->out);
+
CHECK_FOR_INTERRUPTS();
/* Try to flush pending output to the client */
--
2.39.5
On Thu, Feb 20, 2025 at 12:50 PM Markus Wanner
<markus.wanner@enterprisedb.com> wrote:
Hi,
I recently stumbled over an issue with an unintentional re-transmission.
While this clearly was our fault in the output plugin code, I think the
walsender's exposed API could easily be hardened to prevent the bad
consequence from this mistake.Does anything speak against the attached one line patch?
According to the documentation[1]https://www.postgresql.org/docs/devel/logicaldecoding-output-plugin.html#LOGICALDECODING-OUTPUT-PLUGIN-OUTPUT, OutputPluginPrepareWrite() has to
be called before OutputPluginWrite(). When it comes to walsender
codes, we reset the ctx->out buffer in WalSndPrepareWrite() called via
OutputPluginPrepareWrite(). Could you share the case where you faced
the unintentional re-transmission error?
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
On 2/21/25 07:17, Masahiko Sawada wrote:
On Thu, Feb 20, 2025 at 12:50 PM Markus Wanner
<markus.wanner@enterprisedb.com> wrote:Hi,
I recently stumbled over an issue with an unintentional re-transmission.
While this clearly was our fault in the output plugin code, I think the
walsender's exposed API could easily be hardened to prevent the bad
consequence from this mistake.Does anything speak against the attached one line patch?
According to the documentation[1], OutputPluginPrepareWrite() has to
be called before OutputPluginWrite().
Sure, that's exactly what we forgot. My complaint is that it's a mistake
that's unnecessarily hard to spot and the API could be improved. There
is no good reason to keep the sent data in the ctx->out buffer. But
clearly, there's some danger to it.
(Note that it wasn't quite as simple as you may think, though. With an
error involved in the send/recv loop of the walsender invoked from
OutputPluginWrite. The error handling code in PG_CATCH then attempting
to flush the queue with OutputPluginWrite.)
Arguably, one could even say that replacing the call to resetStringInfo
in WalSndPrepareWrite with an Assert(ctx->out->len == 0) would catch a
variant of improper use. And another Assert in WalSndWriteData to check
OutputPluginPrepareWrite had properly been invoked before can't hurt,
either.
Best Regards
Markus
Attachments:
0002-Add-asserts-to-guide-to-proper-API-usage.patchtext/x-patch; charset=UTF-8; name=0002-Add-asserts-to-guide-to-proper-API-usage.patchDownload
From 5b9d6285252204d7109001fc42323add00ac773f Mon Sep 17 00:00:00 2001
From: Markus Wanner <markus.wanner@enterprisedb.com>
Date: Fri, 21 Feb 2025 09:00:28 +0100
Subject: [PATCH] Add asserts to guide to proper API usage
---
src/backend/replication/walsender.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index 96884ce152f..b5d7baf83ad 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -1523,7 +1523,7 @@ WalSndPrepareWrite(LogicalDecodingContext *ctx, XLogRecPtr lsn, TransactionId xi
if (!last_write)
lsn = InvalidXLogRecPtr;
- resetStringInfo(ctx->out);
+ Assert(ctx->out->len == 0);
pq_sendbyte(ctx->out, 'w');
pq_sendint64(ctx->out, lsn); /* dataStart */
@@ -1549,6 +1549,8 @@ WalSndWriteData(LogicalDecodingContext *ctx, XLogRecPtr lsn, TransactionId xid,
{
TimestampTz now;
+ Assert(ctx->out->len >= 1 + 3 * sizeof(int64);
+
/*
* Fill the send timestamp last, so that it is taken as late as possible.
* This is somewhat ugly, but the protocol is set as it's already used for
--
2.39.5