Accept recovery conflict interrupt on blocked writing

Started by Anthonin Bonnefoy over 1 year ago · 8 messages · hackers
#1 Anthonin Bonnefoy
anthonin.bonnefoy@datadoghq.com

Hi,

I have replicas that have regular transient bursts of replay lag (>5
minutes). The events have the following symptoms:
- Replicas are using physical replication slot and hot_standby_feedback
- The lag recovers by itself after at most 15 minutes
- During the same timeframe, there's a query stuck in ClientWrite on
the replica for ~15 minutes (despite the 30s statement_timeout). The
matching client had been terminated at the beginning, when the server
started sending results; when the server process eventually died, the
replay lag started to recover.

The 15-minute timeout matches the default Linux TCP retransmission
timeout[1]. A similar situation can be triggered by using the psql
pipelining patch[2].

# On the primary, generate constant updates:
echo "\set aid random(1, 100000 * :scale)
UPDATE pgbench_accounts SET bid=bid+1 WHERE aid=:aid;" > update.sql
pgbench -f update.sql -T900

-- On a replica:
\startpipeline
-- At least 2 selects are needed to completely saturate the socket buffers
select * from pgbench_accounts \bind \g
select * from pgbench_accounts \bind \g
-- Flush the commands to the server
\flush

After that, the queries are sent to the server but the client doesn't
consume the results and the backend process should be stuck in a
ClientWrite state. Eventually, usually after ~10s, WAL replay becomes
completely blocked (sometimes I need to redo the pipeline query if
nothing happens). Looking at the backtrace, the recovery process is
blocked on ResolveRecoveryConflictWithBufferPin. The conflicting
buffer pin is held by the pipelined query, currently blocked trying to
write the result to the socket, which is completely saturated. During
this time, all interrupts are ignored and WAL replay won't be able to
recover until the socket either becomes writable or fails with an
error.

The same situation happened on my instances where a backend process
was sending results to a client that died without sending a FIN or a
RST. The only way for this process to die is to reach the TCP
retransmission timeout after 15m. During this time, it can conflict
with the recovery as it holds a buffer pin, possibly blocking the
recovery for the whole duration.

To avoid blocking recovery for an extended period of time, this patch
changes client write interrupts by handling recovery conflict
interrupts instead of ignoring them. Since the interrupt happens while
we're likely to have partially written results on the socket, there's
no easy way to keep protocol sync so the session needs to be
terminated.

Setting tcp_user_timeout is also a possible mitigation for users, but
my understanding of recovery conflict handling is that blocking
recovery for an extended period of time is not desirable, and that it
is fine to be aggressive once the standby delay is exceeded.

[1]: https://pracucci.com/linux-tcp-rto-min-max-and-tcp-retries2.html
[2]: https://commitfest.postgresql.org/51/5407/

Regards,
Anthonin Bonnefoy

Attachments:

v01-0001-Accept-recovery-conflict-interrupt-on-blocked-wr.patch (application/octet-stream, +39 −4)
#2 Thomas Munro
thomas.munro@gmail.com
In reply to: Anthonin Bonnefoy (#1)
Re: Accept recovery conflict interrupt on blocked writing

Bonjour Anthonin,

On Mon, Jan 13, 2025 at 11:31 PM Anthonin Bonnefoy
<anthonin.bonnefoy@datadoghq.com> wrote:

To avoid blocking recovery for an extended period of time, this patch
changes client write interrupts by handling recovery conflict
interrupts instead of ignoring them. Since the interrupt happens while
we're likely to have partially written results on the socket, there's
no easy way to keep protocol sync so the session needs to be
terminated.

Right. Before commit 0da096d7 in v17, the recovery conflict code
running in a signal handler would have set ProcDiePending, so this
looks like an unintended regression due to that commit.

In your patch you have CHECK_FOR_INTERRUPTS(), but I think that means
it could also service other interrupts that we probably don't want to
introduce without more study, no? For example if
ProcessRecoveryConflictInterrupts() returns without throwing, and
another pending interrupt flag is also set.

Thinking of other proc signals/interrupts, of course it's not really
acceptable that we don't process at least also global barriers here,
ie for a time controllable by a client, but let's call that an
independent problem. I think the minimal thing for this specific
problem might be to use ProcessRecoveryConflictInterrupts() directly.

I understand why you created DoingBlockingWrite, based on the variable
named DoingQueryRead. I wonder if it would be better to model it on
QueryCancelHoldoffCount, inventing QueryCancelPromoteToFatalCount, but
I'm not sure.

The read from the socket also seems to have a variant of the problem,
unless I'm missing something, except in that case I'm not sure it's
new. For sending you're proposing that our only choice is to kill the
session, which makes sense, but the equivalent read side stay-in-sync
strategy is to keep deferring until the client gets around to sending
a complete message, if I read that right.

Hmm, I wonder if we could write tests for both directions in
031_recovery_conflict.pl, using Perl to speak the raw pq protocol over
TCP, as seen in some other recent proposals... I might have a crack at
that soon unless you want to.
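
For what it's worth, the startup packet is simple enough to build by hand; a minimal sketch of the message layout (Python for illustration, the actual test would be Perl), based on the documented v3 protocol:

```python
import struct

def startup_message(user: str, database: str) -> bytes:
    # Protocol 3.0 startup packet: int32 length (self-inclusive),
    # int32 version number 196608 (3 << 16), then "name\0value\0"
    # parameter pairs, closed by a final NUL terminator.
    params = b""
    for name, value in (("user", user), ("database", database)):
        params += name.encode() + b"\x00" + value.encode() + b"\x00"
    body = struct.pack("!i", 196608) + params + b"\x00"
    return struct.pack("!i", len(body) + 4) + body
```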

#3 Anthonin Bonnefoy
anthonin.bonnefoy@datadoghq.com
In reply to: Thomas Munro (#2)
Re: Accept recovery conflict interrupt on blocked writing

Bonjour Thomas,

On Wed, Jan 15, 2025 at 6:21 AM Thomas Munro <thomas.munro@gmail.com> wrote:

Right. Before commit 0da096d7 in v17, the recovery conflict code
running in a signal handler would have set ProcDiePending, so this
looks like an unintended regression due to that commit.

The issue is happening on instances with 14.13, though I haven't yet
tried to reproduce it on a version older than the current HEAD.

In your patch you have CHECK_FOR_INTERRUPTS(), but I think that means
it could also service other interrupts that we probably don't want to
introduce without more study, no? For example if
ProcessRecoveryConflictInterrupts() returns without throwing, and
another pending interrupt flag is also set.

Thinking of other proc signals/interrupts, of course it's not really
acceptable that we don't process at least also global barriers here,
ie for a time controllable by a client, but let's call that an
independent problem. I think the minimal thing for this specific
problem might be to use ProcessRecoveryConflictInterrupts() directly.

I think that's definitely happening: my first version used a
pg_unreachable after the CHECK_FOR_INTERRUPTS(), which was triggered.
I've replaced it with a direct call to
ProcessRecoveryConflictInterrupts().

The read from the socket also seems to have a variant of the problem,
unless I'm missing something, except in that case I'm not sure it's
new. For sending you're proposing that our only choice is to kill the
session, which makes sense, but the equivalent read side stay-in-sync
strategy is to keep deferring until the client gets around to sending
a complete message, if I read that right.

I'm not sure the read path has the same issue. That would require a
replica to do a non-command read, but I don't think that's possible? A
recovering instance can't run COPY, and is there another situation
where a replica would do a blocking read without the DoingCommandRead
flag? The read side also ends up killing the session, since
DoingCommandRead will be true most of the time, falling through to the
FATAL case.

Hmm, I wonder if we could write tests for both directions in
031_recovery_conflict.pl, using a Perl to do raw pq protocol in TCP,
as seen in some other recent proposals... I might have a crack at
that soon unless you want to.

I've tried adding a test for this. I've implemented some utility
functions to start a session, read a payload and send a simple query
using raw TCP. The implementation is definitely a bit rough as I'm far
from being fluent in Perl. The socket utility functions should
probably be moved to a dedicated file and the Windows build is
currently failing. However, the test does trigger the issue with the
recovery blocked by the session blocked in ClientWrite.

Attachments:

v02-0001-Accept-recovery-conflict-interrupt-on-blocked-wr.patch (application/octet-stream, +216 −28)
#4 Anthonin Bonnefoy
anthonin.bonnefoy@datadoghq.com
In reply to: Anthonin Bonnefoy (#3)
Re: Accept recovery conflict interrupt on blocked writing

I've tested on a PG16, the issue is indeed triggered with the
replication blocked while the conflicting query is stuck in
ClientWrite.

I've cleaned up the tests: I've created a dedicated PgProto
(definitely open to suggestions for a better name...) module
containing all the helpers to send and receive messages on a raw
socket in 0001.

0002 contains the change with the test using the new PgProto module.
The failure I had on Windows was due to using the wrong username: I
was using the $USER environment variable, and I'm now getting the user
from an existing psql session instead. Also, I think the
DoingBlockingWrite variable was unnecessary? Calling
ProcessRecoveryConflictInterrupts should be enough, as I don't think
QueryCancelHoldoffCount can be >0 when writing on a socket, so it
should always be possible to cancel the statement.

Attachments:

v03-0001-Add-PgProto-test-module-to-send-message-on-a-raw.patch (application/octet-stream, +222 −1)
v03-0002-Accept-recovery-conflict-interrupt-on-blocked-wr.patch (application/octet-stream, +103 −25)
#5 Andres Freund
andres@anarazel.de
In reply to: Anthonin Bonnefoy (#4)
Re: Accept recovery conflict interrupt on blocked writing

Hi,

On 2025-01-17 17:01:53 +0100, Anthonin Bonnefoy wrote:

I've cleaned up the tests: I've created a dedicated PgProto
(definitely open to suggestions for a better name...) module
containing all the helpers to send and receive messages on a raw
socket in 0001.

Might be worth using it in src/test/postmaster/t/002_start_stop.pl? That
has e.g. code to send a startup message.

Also, I think the DoingBlockingWrite variable was unnecessary? Calling
ProcessRecoveryConflictInterrupts should be enough, as I don't think
QueryCancelHoldoffCount can be >0 when writing on a socket, so it should
always be possible to cancel the statement.

I'd probably add an assertion to that effect, it's too easy for that to change
in a few years otherwise.

Previously, all interrupts except process dying were ignored while a
process was blocked writing to a socket. If the connection to the client
was broken (no clean FIN nor RST), a process sending results to the
client could be stuck for 924s until the TCP retransmission timeout is
reached. During this time, it was possible for the process to conflict
with recovery: For example, the process returning results can have a
conflicting buffer pin.

To avoid blocking recovery for an extended period of time, this patch
changes client write interrupts by handling recovery conflict interrupts
instead of ignoring them. Since the interrupt happens while we're likely
to have partially written results on the socket, there's no easy way to
keep protocol sync so the session needs to be terminated.

I don't see anything implementing the promotion of ERRORs to FATAL? You're
preventing the error message being sent to the client, but I don't think that
causes the connection to be terminated. The pre-existing code doesn't have
that problem, because it's only active when ProcDiePending is already set.

In fact, the test your patch added goes through
ProcessRecoveryConflictInterrupts() multiple times:

2025-01-17 12:52:47.842 EST [3376411] LOG: recovery still waiting after 20.709 ms: recovery conflict on buffer pin
2025-01-17 12:52:47.842 EST [3376411] CONTEXT: WAL redo at 0/3462288 for Heap2/PRUNE_VACUUM_SCAN: , isCatalogRel: F, nplans: 0, nredirected: 0, ndead: 0, nun>
3376451: recovery conflict interrupt while blocked
3376451: recovery conflict processing done
write(8192) = -1: 11/Resource temporarily unavailable
3376451: recovery conflict interrupt while blocked
3376451: recovery conflict processing done
...
write(8192) = -1: 11/Resource temporarily unavailable
3376451: recovery conflict interrupt while blocked
3376451: recovery conflict processing done
write(8192) = -1: 11/Resource temporarily unavailable
3376451: recovery conflict interrupt while blocked
2025-01-17 12:52:48.072 EST [3376451] 031_recovery_conflict.pl ERROR: canceling statement due to conflict with recovery
2025-01-17 12:52:48.072 EST [3376451] 031_recovery_conflict.pl DETAIL: User was holding shared buffer pin for too long.
2025-01-17 12:52:48.072 EST [3376451] 031_recovery_conflict.pl STATEMENT:
BEGIN;
DECLARE test_recovery_conflict_cursor CURSOR FOR SELECT b FROM test_recovery_conflict_table1;
FETCH FORWARD FROM test_recovery_conflict_cursor;
SELECT generate_series(1, 100000);

backend 3376451> 2025-01-17 12:52:48.072 EST [3376411] LOG: recovery finished waiting after 250.681 ms: recovery conflict on buffer pin

I don't actually know why the conflict ends up being resolved after a bunch of
retries.

Note also the "backend>" prompt (to which I added the PID to identify it)
which gets emitted. Just setting whereToSendOutput = DestNone has side
effects when we're not actually on a process exit path...

@@ -581,6 +582,22 @@ ProcessClientWriteInterrupt(bool blocked)
else
SetLatch(MyLatch);
}
+	else if (blocked && RecoveryConflictPending)
+	{
+		/*
+		 * We're conflicting with recovery while being blocked writing. This
+		 * can happen when the process is returning results and no ACK is
+		 * received (broken connection, client overloaded...), eventually
+		 * saturating the socket buffer while the process holds a page pin
+		 * that eventually conflict with recovery.
+		 */
+		if (InterruptHoldoffCount == 0 && CritSectionCount == 0)

I was about to say that we shouldn't get here with CritSectionCount != 0, but
unfortunately I'm not sure that's true. Thinking of e.g. debug messages or
such. And it's of course a copy of existing code...

Playing around with this it's unfortunately worse - we very commonly get to
ProcessClientWriteInterrupt() with InterruptHoldoffCount > 0. Including just
after processing recovery conflict interrupts. The reason for that is that if
we do throw an error the first thing we do is to hold interrupts
(c.f. sigsetjmp() in PostgresMain()) and then we call EmitErrorReport().

Unfortunately I suspect that this means any protection we'd get from this
version of the patch is rather incomplete - if the recovery conflict is
triggered while not blocked, we'll ERROR out and report the message with
interrupts held. That cycle of recovery conflict signalling wouldn't ever be
resolved in such a case, I think. Of course we do re-signal recovery conflict
interrupts rather aggressively - but I'm not sure that's going to help
reliably.

I've previously suggested that we ought to report some errors to the client by
trying to send the data in a non-blocking way, and just terminate if we
can't send the data immediately. IIRC Tom was concerned that that would lead
to too many lost messages. Perhaps we could instead implement this with a
timer that terminates connections when stuck trying to write out an
error message?
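
Outside the backend, the "send without blocking, else terminate" idea looks roughly like this (a plain-socket Python sketch, not actual backend code; the function name is invented for illustration):

```python
import socket

def send_error_nonblocking(sock: socket.socket, payload: bytes) -> bool:
    """Try to push an error message to the client without blocking.

    Returns True if the whole payload was handed to the kernel, False if
    we would have to block (the caller should then just terminate the
    connection rather than wait on the socket)."""
    sock.setblocking(False)
    sent = 0
    while sent < len(payload):
        try:
            n = sock.send(payload[sent:])
        except BlockingIOError:
            return False  # kernel send buffer full: give up immediately
        if n == 0:
            return False
        sent += n
    return True
```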

+		{
+			if (whereToSendOutput == DestRemote)
+				whereToSendOutput = DestNone;
+			ProcessRecoveryConflictInterrupts();
+		}
+	}

This largely duplicates the existing code in
ProcessClientWriteInterrupt(). Perhaps it's worth flipping the order of
branches so that we first check that we're neither blocked nor have
InterruptHoldoffCount == 0 && CritSectionCount == 0?

Greetings,

Andres Freund

#6 Andres Freund
andres@anarazel.de
In reply to: Andres Freund (#5)
Re: Accept recovery conflict interrupt on blocked writing

Hi,

On 2025-01-17 13:03:35 -0500, Andres Freund wrote:

I don't see anything implementing the promotion of ERRORs to FATAL? You're
preventing the error message being sent to the client, but I don't think that
causes the connection to be terminated. The pre-existing code doesn't have
that problem, because it's only active when ProcDiePending is already set.

In fact, the test your patch added goes through
ProcessRecoveryConflictInterrupts() multiple times:

2025-01-17 12:52:47.842 EST [3376411] LOG: recovery still waiting after 20.709 ms: recovery conflict on buffer pin
2025-01-17 12:52:47.842 EST [3376411] CONTEXT: WAL redo at 0/3462288 for Heap2/PRUNE_VACUUM_SCAN: , isCatalogRel: F, nplans: 0, nredirected: 0, ndead: 0, nun>
3376451: recovery conflict interrupt while blocked
3376451: recovery conflict processing done
write(8192) = -1: 11/Resource temporarily unavailable
3376451: recovery conflict interrupt while blocked
3376451: recovery conflict processing done
...
write(8192) = -1: 11/Resource temporarily unavailable
3376451: recovery conflict interrupt while blocked
3376451: recovery conflict processing done
write(8192) = -1: 11/Resource temporarily unavailable
3376451: recovery conflict interrupt while blocked
2025-01-17 12:52:48.072 EST [3376451] 031_recovery_conflict.pl ERROR: canceling statement due to conflict with recovery
2025-01-17 12:52:48.072 EST [3376451] 031_recovery_conflict.pl DETAIL: User was holding shared buffer pin for too long.
2025-01-17 12:52:48.072 EST [3376451] 031_recovery_conflict.pl STATEMENT:
BEGIN;
DECLARE test_recovery_conflict_cursor CURSOR FOR SELECT b FROM test_recovery_conflict_table1;
FETCH FORWARD FROM test_recovery_conflict_cursor;
SELECT generate_series(1, 100000);

backend 3376451> 2025-01-17 12:52:48.072 EST [3376411] LOG: recovery finished waiting after 250.681 ms: recovery conflict on buffer pin

I don't actually know why the conflict ends up being resolved after a bunch of
retries.

It's because the test sets deadlock_timeout lower than
max_standby_streaming_delay.

Note also the "backend>" (to which I added the PID to identify it) which gets
emitted. Just setting whereToSendOutput = DestNone has side effects when not
actually in a process exit status...

I think the only reason the patch works at all is that we end up in
InteractiveBackend()'s EOF handling, because InteractiveBackend reads from
stdin. That's closed here, but I don't think we have any guarantee that stdin
isn't something that can be read from.

Greetings,

Andres Freund

#7 Andres Freund
andres@anarazel.de
In reply to: Andres Freund (#5)
Re: Accept recovery conflict interrupt on blocked writing

Hi,

On 2025-01-17 13:03:35 -0500, Andres Freund wrote:

Previously, all interrupts except process dying were ignored while a
process was blocked writing to a socket. If the connection to the client
was broken (no clean FIN nor RST), a process sending results to the
client could be stuck for 924s until the TCP retransmission timeout is
reached. During this time, it was possible for the process to conflict
with recovery: For example, the process returning results can have a
conflicting buffer pin.

To avoid blocking recovery for an extended period of time, this patch
changes client write interrupts by handling recovery conflict interrupts
instead of ignoring them. Since the interrupt happens while we're likely
to have partially written results on the socket, there's no easy way to
keep protocol sync so the session needs to be terminated.

I don't see anything implementing the promotion of ERRORs to FATAL? You're
preventing the error message being sent to the client, but I don't think that
causes the connection to be terminated. The pre-existing code doesn't have
that problem, because it's only active when ProcDiePending is already set.

The easiest way to do that promotion is probably to set ExitOnAnyError to
true before entering ProcessRecoveryConflictInterrupts(). That does prevent
ending up in InteractiveBackend().

But I suspect this won't suffice:

@@ -581,6 +582,22 @@ ProcessClientWriteInterrupt(bool blocked)
else
SetLatch(MyLatch);
}
+	else if (blocked && RecoveryConflictPending)
+	{
+		/*
+		 * We're conflicting with recovery while being blocked writing. This
+		 * can happen when the process is returning results and no ACK is
+		 * received (broken connection, client overloaded...), eventually
+		 * saturating the socket buffer while the process holds a page pin
+		 * that eventually conflict with recovery.
+		 */
+		if (InterruptHoldoffCount == 0 && CritSectionCount == 0)

I was about to say that we shouldn't get here with CritSectionCount != 0, but
unfortunately I'm not sure that's true. Thinking of e.g. debug messages or
such. And it's of course a copy of existing code...

Playing around with this it's unfortunately worse - we very commonly get to
ProcessClientWriteInterrupt() with InterruptHoldoffCount > 0. Including just
after processing recovery conflict interrupts. The reason for that is that if
we do throw an error the first thing we do is to hold interrupts
(c.f. sigsetjmp() in PostgresMain()) and then we call EmitErrorReport().

Unfortunately I suspect that this means any protection we'd get from this
version of the patch is rather incomplete - if the recovery conflict is
triggered while not blocked, we'll ERROR out and report the message with
interrupts held. That cycle of recovery conflict signalling wouldn't ever be
resolved in such a case, I think. Of course we do re-signal recovery conflict
interrupts rather aggressively - but I'm not sure that's going to help
reliably.

This unfortunately indeed is true. If I, instead of the generate_series(),
add the following to the pgproto query:
DO \$\$BEGIN RAISE 'endless scream: %', repeat('a', 1000000); END;\$\$;

the conflict doesn't get handled.

Example log, with a fair bit of debugging added:

2025-01-17 13:31:46.226 EST [3407842] 031_recovery_conflict.pl ERROR: endless scream: aa...
2025-01-17 13:31:46.226 EST [3407842] 031_recovery_conflict.pl CONTEXT: PL/pgSQL function inline_code_block line 1 at RAISE
2025-01-17 13:31:46.226 EST [3407842] 031_recovery_conflict.pl STATEMENT:
BEGIN;
DECLARE test_recovery_conflict_cursor CURSOR FOR SELECT b FROM test_recovery_conflict_table1;
FETCH FORWARD FROM test_recovery_conflict_cursor;
DO $$BEGIN RAISE 'endless scream: %', repeat('a', 1000000); END;$$;

3407842: start fronted error
3407842: ProcessClientWriteInterrupt blocked: 0, IHC: 1, CSC: 0, ProcDieP: 0, RecoveryConflictPending: 0
3407842: write(8192) = 8192: 0/Success
3407842: ProcessClientWriteInterrupt blocked: 0, IHC: 1, CSC: 0, ProcDieP: 0, RecoveryConflictPending: 0
3407842: ProcessClientWriteInterrupt blocked: 0, IHC: 1, CSC: 0, ProcDieP: 0, RecoveryConflictPending: 0
3407842: write(992023) = 219264: 0/Success
3407842: ProcessClientWriteInterrupt blocked: 0, IHC: 1, CSC: 0, ProcDieP: 0, RecoveryConflictPending: 0
3407842: ProcessClientWriteInterrupt blocked: 0, IHC: 1, CSC: 0, ProcDieP: 0, RecoveryConflictPending: 0
3407842: write(772759) = -1: 11/Resource temporarily unavailable
2025-01-17 13:31:46.231 EST [3407800] DEBUG: reaping dead processes
2025-01-17 13:31:46.232 EST [3407800] DEBUG: releasing pm child slot 3
2025-01-17 13:31:46.232 EST [3407800] DEBUG: client backend (PID 3407844) exited with exit code 0
2025-01-17 13:31:46.407 EST [3407804] DEBUG: sendtime 2025-01-17 13:31:46.407326-05 receipttime 2025-01-17 13:31:46.40737-05 replication apply delay 0 ms transfer latency 1 ms
2025-01-17 13:31:46.407 EST [3407804] DEBUG: sending write 0/3460F78 flush 0/3460DA0 apply 0/3460DA0
2025-01-17 13:31:46.407 EST [3407804] DEBUG: sending write 0/3460F78 flush 0/3460F78 apply 0/3460DA0
3407842: ProcessClientWriteInterrupt blocked: 1, IHC: 1, CSC: 0, ProcDieP: 0, RecoveryConflictPending: 1
3407832: !HoldingBufferPinThatDelaysRecovery
3407842: write(772759) = -1: 11/Resource temporarily unavailable
2025-01-17 13:31:46.707 EST [3407803] LOG: recovery still waiting after 300.099 ms: recovery conflict on buffer pin
2025-01-17 13:31:46.707 EST [3407803] CONTEXT: WAL redo at 0/3460DA0 for Heap2/PRUNE_VACUUM_SCAN: , isCatalogRel: F, nplans: 0, nredirected: 0, ndead: 0, nunused: 1, unused: [21]; blkref #0: rel 1663/16385/16386, blk 0
3407832: !HoldingBufferPinThatDelaysRecovery
3407842: ProcessClientWriteInterrupt blocked: 1, IHC: 1, CSC: 0, ProcDieP: 0, RecoveryConflictPending: 1
3407842: write(772759) = -1: 11/Resource temporarily unavailable
3407842: ProcessClientWriteInterrupt blocked: 1, IHC: 1, CSC: 0, ProcDieP: 0, RecoveryConflictPending: 1
3407842: write(772759) = -1: 11/Resource temporarily unavailable
2025-01-17 13:31:56.417 EST [3407804] DEBUG: sending write 0/3460F78 flush 0/3460F78 apply 0/3460DA0
2025-01-17 13:32:06.427 EST [3407804] DEBUG: sending write 0/3460F78 flush 0/3460F78 apply 0/3460DA0
2025-01-17 13:32:07.224 EST [3407804] DEBUG: sendtime 2025-01-17 13:32:07.224295-05 receipttime 2025-01-17 13:32:07.224413-05 replication apply delay 20817 ms transfer latency 1 ms

Greetings,

Andres Freund

#8 Anthonin Bonnefoy
anthonin.bonnefoy@datadoghq.com
In reply to: Andres Freund (#7)
Re: Accept recovery conflict interrupt on blocked writing

Hi,

Thanks for the detailed explanations, I've definitely misinterpreted
how interrupts and errors were handled.

On Fri, Jan 17, 2025 at 7:03 PM Andres Freund <andres@anarazel.de> wrote:

Might be worth using it in src/test/postmaster/t/002_start_stop.pl? That
has e.g. code to send a startup message.

I've changed pgproto to provide the necessary helpers to be used in
001_connection_limits.pl and 002_start_stop.pl.

Playing around with this it's unfortunately worse - we very commonly get to
ProcessClientWriteInterrupt() with InterruptHoldoffCount > 0. Including just
after processing recovery conflict interrupts. The reason for that is that if
we do throw an error the first thing we do is to hold interrupts
(c.f. sigsetjmp() in PostgresMain()) and then we call EmitErrorReport().

Unfortunately I suspect that this means any protection we'd get from this
version of the patch is rather incomplete - if the recovery conflict is
triggered while not blocked, we'll ERROR out and report the message with
interrupts held. That cycle of recovery conflict signalling wouldn't ever be
resolved in such a case, I think. Of course we do re-signal recovery conflict
interrupts rather aggressively - but I'm not sure that's going to help
reliably.

This unfortunately indeed is true. If I, instead of the generate_series(),
add the following to the pgproto query:
DO \$\$BEGIN RAISE 'endless scream: %', repeat('a', 1000000); END;\$\$;

the conflict doesn't get handled.

A possible approach to handle this case could be to stop retrying and
return a write error if there's a recovery conflict on a blocked
write, with ProcessClientWriteInterrupt deciding whether a write is
retryable or not. Within ProcessClientWriteInterrupt, if we conflict
with recovery on a blocked write and interrupts can be processed, we
can call ProcessRecoveryConflictInterrupts with ExitOnAnyError set.
Otherwise, if the interrupts can't be processed, we make the write
non-retryable, leading to a write error which will close the socket
with a "connection to client lost".
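
As a toy model of that decision (Python purely for illustration; the state fields are invented stand-ins for the backend globals, and the stub below only models the "conflict turned out ignorable" path):

```python
from dataclasses import dataclass

@dataclass
class BackendState:
    # Hypothetical stand-ins for RecoveryConflictPending,
    # InterruptHoldoffCount, CritSectionCount and ExitOnAnyError.
    recovery_conflict_pending: bool = False
    interrupt_holdoff_count: int = 0
    crit_section_count: int = 0
    exit_on_any_error: bool = False

def process_recovery_conflict_interrupts(state: BackendState) -> None:
    # Stub: the real function may throw FATAL; if it returns,
    # the conflict was judged ignorable.
    state.recovery_conflict_pending = False

def should_retry_blocked_write(state: BackendState) -> bool:
    """Decide whether a blocked client write is retryable."""
    if not state.recovery_conflict_pending:
        return True  # no conflict: keep waiting for the socket
    if state.interrupt_holdoff_count == 0 and state.crit_section_count == 0:
        state.exit_on_any_error = True  # promote any ERROR to FATAL
        process_recovery_conflict_interrupts(state)
        return True  # conflict processed (or ignored): retry the write
    return False  # can't process interrupts: fail the write, drop the client
```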

This requires a new CheckBlockedWriteConflictInterrupts function that
validates whether the process conflicts or not, without processing the
conflict, as we don't want to abort a write if the conflict could be
ignored.

The code is a bit rough as there's duplicated logic between
CheckBlockedWriteConflictInterrupts and
ProcessRecoveryConflictInterrupt that could very probably be
simplified, but I wanted to validate the approach first. It would
probably also be worth adding more tests to cover all conflict types
while there's a blocked write, though I did add the RAISE case in the
tests.

Regards,
Anthonin Bonnefoy

Attachments:

v04-0002-Accept-recovery-conflict-interrupt-on-blocked-wr.patch (application/octet-stream, +303 −60)
v04-0001-Add-PgProto-test-module-to-send-message-on-a-raw.patch (application/octet-stream, +287 −13)