Queries that should be canceled will get stuck on secure_write function
Hi, all
Recently, I ran into a problem where the startup process of a standby gets stuck and stays in a waiting state. The backtrace of the startup process shows that it is waiting for a backend process that conflicts with recovery to exit; the GUC parameters max_standby_streaming_delay and max_standby_archive_delay are both set to 30 seconds, but that doesn't help. The backend process stays alive, and its backtrace shows that it is waiting for the socket to become writable in secure_write(). After further reading of the code, I found that ProcessClientWriteInterrupt() in secure_write() only processes interrupts when ProcDiePending is true, and otherwise does nothing. However, a snapshot conflict with recovery only sets QueryCancelPending to true, so the response to the signal can be delayed indefinitely if the corresponding client is stuck, thus blocking the recovery process.
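For readers not familiar with this code path, the relevant logic looks roughly like the following; this is only a simplified paraphrase of ProcessClientWriteInterrupt() in src/backend/tcop/postgres.c, not a verbatim copy:

void
ProcessClientWriteInterrupt(bool blocked)
{
	int		save_errno = errno;

	if (ProcDiePending)
	{
		if (blocked)
		{
			if (InterruptHoldoffCount == 0 && CritSectionCount == 0)
			{
				/*
				 * Stop talking to the client before dying: part of a
				 * protocol message may already have been sent, so emitting
				 * the FATAL message would block again and break protocol
				 * sync anyway.
				 */
				if (whereToSendOutput == DestRemote)
					whereToSendOutput = DestNone;

				ProcessInterrupts();
			}
		}
		else
			ProcessInterrupts();
	}

	/* A pending query cancel by itself is deliberately left alone here. */

	errno = save_errno;
}

So while the backend is blocked in secure_write(), only a die request can get it out of the wait; a plain cancel, which is all a recovery conflict produces, stays pending.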
I want to know why the interrupt is only handled when ProcDiePending is true; I think a query that is supposed to be canceled should also respond to the signal.
I also want to share a patch with you. It adds a GUC parameter, max_standby_client_write_delay: if a query is supposed to be canceled and the time it has been delayed by a client exceeds max_standby_client_write_delay, ProcDiePending is set to true to avoid an indefinite delay. What do you think of it? I hope to get your reply.
Thanks & Best Regards
Attachments:
0001-Set-max-client-write-delay-timeout-for-query-which-i.patch
From bbcf92ed6d17e2c6f4a8e8607a34534869d5edd6 Mon Sep 17 00:00:00 2001
From: "suyu.cmj" <mengjuan.cmj@alibaba-inc.com>
Date: Sun, 22 Aug 2021 19:16:21 +0800
Subject: [PATCH] Set max client write delay timeout for query which is
supposed to be canceled on a hot standby server
---
src/backend/libpq/be-secure.c | 19 +++++++++++++++++++
src/backend/utils/misc/guc.c | 11 +++++++++++
src/include/libpq/libpq.h | 1 +
3 files changed, 31 insertions(+)
diff --git a/src/backend/libpq/be-secure.c b/src/backend/libpq/be-secure.c
index 8ef083200a..bb5b39e527 100644
--- a/src/backend/libpq/be-secure.c
+++ b/src/backend/libpq/be-secure.c
@@ -36,6 +36,7 @@
#include "storage/proc.h"
#include "tcop/tcopprot.h"
#include "utils/memutils.h"
+#include "utils/timestamp.h"
char *ssl_library;
char *ssl_cert_file;
@@ -63,6 +64,9 @@ bool SSLPreferServerCiphers;
int ssl_min_protocol_version;
int ssl_max_protocol_version;
+/* GUC variable: Max client write delay for a standby query which is supposed to be canceled */
+int max_standby_client_write_delay = 30 * 1000;
+
/* ------------------------------------------------------------ */
/* Procedures common to all secure sessions */
/* ------------------------------------------------------------ */
@@ -261,6 +265,7 @@ secure_write(Port *port, void *ptr, size_t len)
{
ssize_t n;
int waitfor;
+ TimestampTz waitstart = GetCurrentTimestamp();
/* Deal with any already-pending interrupt condition. */
ProcessClientWriteInterrupt(false);
@@ -287,6 +292,9 @@ retry:
waitfor = WL_SOCKET_WRITEABLE;
}
+ if (n >= 0)
+ waitstart = GetCurrentTimestamp();
+
if (n < 0 && !port->noblock && (errno == EWOULDBLOCK || errno == EAGAIN))
{
WaitEvent event;
@@ -298,6 +306,17 @@ retry:
WaitEventSetWait(FeBeWaitSet, -1 /* no timeout */ , &event, 1,
WAIT_EVENT_CLIENT_WRITE);
+ /*
+ * If the current query is supposed to be canceled and the client has kept
+ * us waiting longer than max_standby_client_write_delay, set ProcDiePending
+ * to true so the interrupt is handled and a stuck client cannot delay us indefinitely.
+ */
+ if (!ProcDiePending &&
+ QueryCancelPending &&
+ (max_standby_client_write_delay >= 0) &&
+ TimestampDifferenceExceeds(waitstart, GetCurrentTimestamp(), max_standby_client_write_delay))
+ ProcDiePending = true;
+
/* See comments in secure_read. */
if (event.events & WL_POSTMASTER_DEATH)
ereport(FATAL,
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index a2e0f8de7e..69f65ae73e 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -2257,6 +2257,17 @@ static struct config_int ConfigureNamesInt[] =
NULL, NULL, NULL
},
+ {
+ {"max_standby_client_write_delay", PGC_SIGHUP, REPLICATION_STANDBY,
+ gettext_noop("Sets the maximum delay for a query that is waiting for a client write and is supposed to be canceled on a hot standby server."),
+ NULL,
+ GUC_UNIT_MS
+ },
+ &max_standby_client_write_delay,
+ 30 * 1000, -1, INT_MAX,
+ NULL, NULL, NULL
+ },
+
{
{"recovery_min_apply_delay", PGC_SIGHUP, REPLICATION_STANDBY,
gettext_noop("Sets the minimum delay for applying changes during recovery."),
diff --git a/src/include/libpq/libpq.h b/src/include/libpq/libpq.h
index 6c51b2f20f..1776553f43 100644
--- a/src/include/libpq/libpq.h
+++ b/src/include/libpq/libpq.h
@@ -121,6 +121,7 @@ extern char *SSLECDHCurve;
extern bool SSLPreferServerCiphers;
extern int ssl_min_protocol_version;
extern int ssl_max_protocol_version;
+extern int max_standby_client_write_delay;
enum ssl_protocol_versions
{
--
2.24.3 (Apple Git-128)
On Mon, Aug 23, 2021 at 4:15 AM 蔡梦娟(玊于) <mengjuan.cmj@alibaba-inc.com> wrote:
I want to know why the interrupt is only handled when ProcDiePending is true; I think a query that is supposed to be canceled should also respond to the signal.
Well, if we're halfway through sending a message to the client and we
abort the write, we have no way of re-establishing protocol sync,
right? It's OK to die without sending any more data to the client,
since then the connection is closed anyway and the loss of sync
doesn't matter, but continuing the session doesn't seem workable.
Your proposed patch actually seems to dodge this problem and I think
perhaps we could consider something along those lines. It would be
interesting to hear what Andres thinks about this idea, since I think
he was the last one to rewrite that code.
--
Robert Haas
EDB: http://www.enterprisedb.com
On 2021-Aug-23, Robert Haas wrote:
On Mon, Aug 23, 2021 at 4:15 AM 蔡梦娟(玊于) <mengjuan.cmj@alibaba-inc.com> wrote:
I want to know why the interrupt is only handled when ProcDiePending
is true; I think a query that is supposed to be canceled should also
respond to the signal.
Yeah, I agree.
Well, if we're halfway through sending a message to the client and we
abort the write, we have no way of re-establishing protocol sync,
right? It's OK to die without sending any more data to the client,
since then the connection is closed anyway and the loss of sync
doesn't matter, but continuing the session doesn't seem workable.
Your proposed patch actually seems to dodge this problem and I think
perhaps we could consider something along those lines.
Do we actually need new GUCs, though? I think we should never let an
unresponsive client dictate what the server does, because that opens the
door for uncooperative or malicious clients to wreak serious havoc. I
think the implementation should wait until time now+X to cancel the
query, but if by time now+2X (or whatever we deem reasonable -- maybe
now+1.1X) we're still waiting, then it's okay to just close the
connection. This suggests a completely different implementation, though.
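Roughly, I'm imagining a two-step escalation like the sketch below, as seen from the process waiting on the conflict. The helpers conflict_resolved(), send_conflict_cancel() and send_conflict_terminate() are made-up names; the real conflict-resolution code in standby.c is structured differently:

static void
resolve_conflict_with_escalation(int delay_ms)
{
	TimestampTz start = GetCurrentTimestamp();
	TimestampTz cancel_at = TimestampTzPlusMilliseconds(start, delay_ms);
	TimestampTz terminate_at = TimestampTzPlusMilliseconds(start, 2 * delay_ms);
	bool        cancel_sent = false;

	while (!conflict_resolved())
	{
		TimestampTz now = GetCurrentTimestamp();

		if (now >= terminate_at)
		{
			/* Backend ignored the cancel (e.g. stuck in secure_write()):
			 * close its connection so recovery can proceed. */
			send_conflict_terminate();
			break;
		}
		if (now >= cancel_at && !cancel_sent)
		{
			/* First try a plain query cancel, as we do today. */
			send_conflict_cancel();
			cancel_sent = true;
		}
		pg_usleep(100 * 1000L);		/* recheck every 100 ms */
	}
}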
I wonder if it's possible to write a test for this. We would have to
send a query and then hang the client somehow. I recently added a TAP
test that uses SIGSTOP to a walsender ... can we use SIGSTOP with a
background psql that's running SELECT pg_sleep() perhaps?
(Or maybe it's sufficient to start background psql and not pump() it?)
--
Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
"It takes less than 2 seconds to get to 78% complete; that's a good sign.
A few seconds later it's at 90%, but it seems to have stuck there. Did
somebody make percentages logarithmic while I wasn't looking?"
http://smylers.hates-software.com/2005/09/08/1995c749.html
On Mon, Aug 23, 2021 at 10:45 AM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
Do we actually need new GUCs, though? I think we should never let an
unresponsive client dictate what the server does, because that opens the
door for uncooperative or malicious clients to wreak serious havoc. I
think the implementation should wait until time now+X to cancel the
query, but if by time now+2X (or whatever we deem reasonable -- maybe
now+1.1X) we're still waiting, then it's okay to just close the
connection. This suggests a completely different implementation, though.
I don't quite understand what you mean by waiting until time now+X to
cancel the query. We don't know a priori when query cancels are going
to happen, but when they do happen we want to respond to them as
quickly as we can. It seems reasonable to say that if we can't handle
them within X amount of time we can resort to emergency measures, but
that's basically what the patch does, except it uses a GUC instead of
hardcoding X.
I wonder if it's possible to write a test for this. We would have to
send a query and then hang the client somehow. I recently added a TAP
test that uses SIGSTOP to a walsender ... can we use SIGSTOP with a
background psql that's running SELECT pg_sleep() perhaps?
(Or maybe it's sufficient to start background psql and not pump() it?)
Starting a background process and not pumping it sounds promising,
because it seems like it would be more likely to be portable. I think
we'd want to be careful not to assume very much about how large the
output buffer is, because I'm guessing that varies by platform and
that it might in some cases be fairly large. Perhaps we could use
pg_stat_activity to wait until we block in a ClientWrite state,
although I wonder if we might find out that we sometimes block on a
different wait state than what we expect to see.
--
Robert Haas
EDB: http://www.enterprisedb.com
On 2021-Aug-23, Robert Haas wrote:
On Mon, Aug 23, 2021 at 10:45 AM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
Do we actually need new GUCs, though? I think we should never let an
unresponsive client dictate what the server does, because that opens the
door for uncooperative or malicious clients to wreak serious havoc. I
think the implementation should wait until time now+X to cancel the
query, but if by time now+2X (or whatever we deem reasonable -- maybe
now+1.1X) we're still waiting, then it's okay to just close the
connection. This suggests a completely different implementation, though.
I don't quite understand what you mean by waiting until time now+X to
cancel the query. We don't know a priori when query cancels are going
to happen, but when they do happen we want to respond to them as
quickly as we can. It seems reasonable to say that if we can't handle
them within X amount of time we can resort to emergency measures, but
that's basically what the patch does, except it uses a GUC instead of
hardcoding X.
Aren't we talking about query cancellations that occur in response to a
standby delay limit? Those aren't in response to user action. What I
mean is that if the standby delay limit is exceeded, then we send a
query cancel; we expect the standby to cancel its query at that time and
then the primary can move on. But if the standby doesn't react, then we
can have it terminate its connection. I'm looking at the problem from
the primary's point of view rather than the standby's point of view.
--
Álvaro Herrera Valdivia, Chile — https://www.EnterpriseDB.com/
On Mon, Aug 23, 2021 at 11:26 AM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
Aren't we talking about query cancellations that occur in response to a
standby delay limit? Those aren't in response to user action.
Oh, you're right. But I guess a similar problem could also occur in
response to pg_terminate_backend(), no?
--
Robert Haas
EDB: http://www.enterprisedb.com
On 2021/08/24 0:26, Alvaro Herrera wrote:
Aren't we talking about query cancellations that occur in response to a
standby delay limit? Those aren't in response to user action. What I
mean is that if the standby delay limit is exceeded, then we send a
query cancel; we expect the standby to cancel its query at that time and
then the primary can move on. But if the standby doesn't react, then we
can have it terminate its connection.
+1
On 2021/08/24 3:45, Robert Haas wrote:
On Mon, Aug 23, 2021 at 11:26 AM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
Aren't we talking about query cancellations that occur in response to a
standby delay limit? Those aren't in response to user action.
Oh, you're right. But I guess a similar problem could also occur in
response to pg_terminate_backend(), no?
There seems no problem in that case because pg_terminate_backend() causes
a backend to set ProcDiePending to true in the die() signal handler and
ProcessClientWriteInterrupt() called by secure_write() handles ProcDiePending.
No?
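For reference, a much simplified paraphrase of the two signal handlers in src/backend/tcop/postgres.c (errno saving and other details omitted) shows why the terminate request gets through while a plain cancel does not:

/* SIGTERM, e.g. from pg_terminate_backend(): sets ProcDiePending,
 * which ProcessClientWriteInterrupt() does act on. */
void
die(SIGNAL_ARGS)
{
	InterruptPending = true;
	ProcDiePending = true;
	SetLatch(MyLatch);		/* wake up any WaitEventSetWait() */
}

/* SIGINT, e.g. from pg_cancel_backend(): sets only QueryCancelPending,
 * which is ignored while we are blocked in secure_write().  A recovery
 * conflict arrives via SIGUSR1 but likewise ends up setting
 * QueryCancelPending rather than ProcDiePending. */
void
StatementCancelHandler(SIGNAL_ARGS)
{
	InterruptPending = true;
	QueryCancelPending = true;
	SetLatch(MyLatch);
}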
Regards,
--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
Yes, pg_terminate_backend() can terminate the connection successfully in this case because ProcDiePending is set to true and ProcessClientWriteInterrupt() can handle it.
Queries that exceed the standby delay limit can be terminated in this way, but what about other queries that should be canceled but get stuck in secure_write()? After all, there is currently no way to list all possible situations and then terminate these queries one by one.
------------------------------------------------------------------
From: Fujii Masao <masao.fujii@oss.nttdata.com>
Sent: Tuesday, August 24, 2021 13:15
To: Robert Haas <robertmhaas@gmail.com>; Alvaro Herrera <alvherre@alvh.no-ip.org>
Cc: 蔡梦娟(玊于) <mengjuan.cmj@alibaba-inc.com>; pgsql-hackers <pgsql-hackers@lists.postgresql.org>; Andres Freund <andres@anarazel.de>
Subject: Re: Queries that should be canceled will get stuck on secure_write function
On Tue, Aug 24, 2021 at 1:15 AM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
Oh, you're right. But I guess a similar problem could also occur in
response to pg_terminate_backend(), no?
There seems no problem in that case because pg_terminate_backend() causes
a backend to set ProcDiePending to true in the die() signal handler and
ProcessClientWriteInterrupt() called by secure_write() handles ProcDiePending.
No?
Hmm, maybe you're right. What about pg_cancel_backend()?
--
Robert Haas
EDB: http://www.enterprisedb.com
On 2021/08/25 2:30, Robert Haas wrote:
Hmm, maybe you're right. What about pg_cancel_backend()?
I was thinking that it's valid even if secure_write() doesn't react to
pg_cancel_backend() because it's basically called outside transaction block,
i.e., there is no longer transaction to cancel in that case. But there can be
some cases where secure_write() is called inside transaction block,
for example, when the query generates NOTICE message. In these cases,
secure_write() might need to react to the cancel request.
BTW, when an error happens, I found that a backend calls EmitErrorReport()
to report an error to a client, and then calls AbortCurrentTransaction()
to abort the transaction. If secure_write() called by EmitErrorReport()
gets stuck, a backend gets stuck inside transaction block and the locks
keep being held unnecessarily. Isn't this problematic? Can we change
the order of them?
Regards,
--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
On Tue, Aug 24, 2021 at 9:58 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
I was thinking that it's valid even if secure_write() doesn't react to
pg_cancel_backend() because it's basically called outside transaction block,
i.e., there is no longer transaction to cancel in that case. But there can be
some cases where secure_write() is called inside transaction block,
for example, when the query generates NOTICE message. In these cases,
secure_write() might need to react to the cancel request.
Yeah. I think we could also be sending tuple data.
BTW, when an error happens, I found that a backend calls EmitErrorReport()
to report an error to a client, and then calls AbortCurrentTransaction()
to abort the transaction. If secure_write() called by EmitErrorReport()
gets stuck, a backend gets stuck inside transaction block and the locks
keep being held unnecessarily. Isn't this problematic? Can we change
the order of them?
I think there might be problems with that, like perhaps the ErrorData
object can have pointers into the memory contexts that we'd be
destroying in AbortCurrentTransaction().
More generally, I think it's hopeless to try to improve the situation
very much by changing the place where secure_write() happens. It
happens in so many different places, and it is clearly not going to be
possible to make it so that in none of those places do we hold any
important server resources. Therefore I think the only solution is to
fix secure_write() itself ... and the trick is what to do there given
that we have to be very careful not to do anything that might try to
write another message while we are already in the middle of writing a
message.
--
Robert Haas
EDB: http://www.enterprisedb.com
Hi,
On 2021-08-23 10:13:03 -0400, Robert Haas wrote:
On Mon, Aug 23, 2021 at 4:15 AM 蔡梦娟(玊于) <mengjuan.cmj@alibaba-inc.com> wrote:
I want to know why the interrupt is only handled when ProcDiePending is true; I think a query that is supposed to be canceled should also respond to the signal.
Well, if we're halfway through sending a message to the client and we
abort the write, we have no way of re-establishing protocol sync,
right? It's OK to die without sending any more data to the client,
since then the connection is closed anyway and the loss of sync
doesn't matter, but continuing the session doesn't seem workable.
Right.
Your proposed patch actually seems to dodge this problem and I think
perhaps we could consider something along those lines. It would be
interesting to hear what Andres thinks about this idea, since I think
he was the last one to rewrite that code.
I think it's a reasonable idea. I have some quibbles with the implementation
(new code should be in ProcessClientWriteInterrupt(), not secure_write()), and
I suspect we should escalate more quickly to killing the connection, but those
seem like details.
I think that if we want to tackle this, we need to do the same for
secure_read() as well. secure_read() will process interrupts normally while
DoingCommandRead, but if we're already in a command, we'd just as well be
prevented from processing a !ProcDiePending interrupt.
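As a very rough sketch of what I mean; client_write_block_start (the time the write first blocked) and client_write_escalation_ms are invented names here, and the existing ProcDiePending handling is elided:

void
ProcessClientWriteInterrupt(bool blocked)
{
	int		save_errno = errno;

	/*
	 * If a cancel has been pending while the client kept us blocked longer
	 * than the grace period, promote it to a die so the existing
	 * ProcDiePending path below can close the connection.
	 */
	if (blocked && QueryCancelPending && !ProcDiePending &&
		TimestampDifferenceExceeds(client_write_block_start,
								   GetCurrentTimestamp(),
								   client_write_escalation_ms))
		ProcDiePending = true;

	if (ProcDiePending)
	{
		/* ... existing handling, unchanged ... */
	}

	errno = save_errno;
}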
Greetings,
Andres Freund
Hi,
On 2021-08-27 08:27:38 -0400, Robert Haas wrote:
On Tue, Aug 24, 2021 at 9:58 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
to report an error to a client, and then calls AbortCurrentTransaction()
to abort the transaction. If secure_write() called by EmitErrorReport()
gets stuck, a backend gets stuck inside transaction block and the locks
keep being held unnecessarily. Isn't this problematic? Can we change
the order of them?...
More generally, I think it's hopeless to try to improve the situation
very much by changing the place where secure_write() happens. It
happens in so many different places, and it is clearly not going to be
possible to make it so that in none of those places do we hold any
important server resources. Therefore I think the only solution is to
fix secure_write() itself ... and the trick is what to do there given
that we have to be very careful not to do anything that might try to
write another message while we are already in the middle of writing a
message.
I wonder if we could improve the situation somewhat by using the non-blocking
pqcomm functions in a few select places. E.g. if elog.c's
send_message_to_frontend() sent its message via a new pq_endmessage_noblock()
(which'd use the existing pq_putmessage_noblock()) and used
pq_flush_if_writable() instead of pq_flush(), we'd a) not block sending to the
client before AbortCurrentTransaction(), b) able to queue further error
messages safely.
I think this'd not violate the goal of putting pq_flush() into
send_message_to_frontend():
/*
* This flush is normally not necessary, since postgres.c will flush out
* waiting data when control returns to the main loop. But it seems best
* to leave it here, so that the client has some clue what happened if the
* backend dies before getting back to the main loop ... error/notice
* messages should not be a performance-critical path anyway, so an extra
* flush won't hurt much ...
*/
pq_flush();
because the only situations where we'd not send the data out immediately would
be when the socket buffer is already full. In which case the client wouldn't
get the error immediately anyway?
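To sketch it: pq_endmessage_noblock() doesn't exist today, it would just mirror pq_endmessage() on top of the existing pq_putmessage_noblock():

void
pq_endmessage_noblock(StringInfo buf)
{
	/* msgtype was saved in the cursor field by pq_beginmessage() */
	pq_putmessage_noblock(buf->cursor, buf->data, buf->len);
	pfree(buf->data);
	buf->data = NULL;
}

and send_message_to_frontend() would then, roughly, finish with pq_endmessage_noblock(&msgbuf) and pq_flush_if_writable() instead of pq_endmessage(&msgbuf) and pq_flush().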
Greetings,
Andres Freund
On Fri, Aug 27, 2021 at 3:24 PM Andres Freund <andres@anarazel.de> wrote:
I wonder if we could improve the situation somewhat by using the non-blocking
pqcomm functions in a few select places. E.g. if elog.c's
send_message_to_frontend() sent its message via a new pq_endmessage_noblock()
(which'd use the existing pq_putmessage_noblock()) and used
pq_flush_if_writable() instead of pq_flush(), we'd a) not block sending to the
client before AbortCurrentTransaction(), b) able to queue further error
messages safely.
pq_flush_if_writable() could succeed in sending only part of the data,
so I don't see how this works.
--
Robert Haas
EDB: http://www.enterprisedb.com
Hi,
On Fri, Aug 27, 2021, at 13:07, Robert Haas wrote:
On Fri, Aug 27, 2021 at 3:24 PM Andres Freund <andres@anarazel.de> wrote:
I wonder if we could improve the situation somewhat by using the non-blocking
pqcomm functions in a few select places. E.g. if elog.c's
send_message_to_frontend() sent its message via a new pq_endmessage_noblock()
(which'd use the existing pq_putmessage_noblock()) and used
pq_flush_if_writable() instead of pq_flush(), we'd a) not block sending to the
client before AbortCurrentTransaction(), b) able to queue further error
messages safely.
pq_flush_if_writable() could succeed in sending only part of the data,
so I don't see how this works.
All the data is buffered though, so I don't see what problem that causes?
Andres
I added a test to reproduce the problem; see the attachment for the details.
During the last sleep of the test, I used pstack to get the stack of the backend process, which is as follows:
#0 0x00007f6ebdd744d3 in __epoll_wait_nocancel () from /lib64/libc.so.6
#1 0x00000000007738d2 in WaitEventSetWait ()
#2 0x0000000000675aae in secure_write ()
#3 0x000000000067bfab in internal_flush ()
#4 0x000000000067c13a in internal_putbytes ()
#5 0x000000000067c217 in socket_putmessage ()
#6 0x0000000000497f36 in printtup ()
#7 0x00000000006301e0 in standard_ExecutorRun ()
#8 0x00000000007985fb in PortalRunSelect ()
#9 0x0000000000799968 in PortalRun ()
#10 0x0000000000795866 in exec_simple_query ()
#11 0x0000000000796cff in PostgresMain ()
#12 0x0000000000488339 in ServerLoop ()
#13 0x0000000000717bbc in PostmasterMain ()
#14 0x0000000000488f26 in main ()
Attachments:
0001-Test-stuck-on-secure-write.patch
From 10afde0a2105511fbf17b5f5223b7e6ca2f3136d Mon Sep 17 00:00:00 2001
From: "suyu.cmj" <mengjuan.cmj@alibaba-inc.com>
Date: Mon, 6 Sep 2021 07:20:17 +0000
Subject: [PATCH] Add test for backend process gets stuck on secure_write and
can not be canceled
---
src/test/perl/PostgresNode.pm | 27 +++++++++++++
src/test/recovery/t/000_stuck_on_secure_write.pl | 49 ++++++++++++++++++++++++
2 files changed, 76 insertions(+)
create mode 100644 src/test/recovery/t/000_stuck_on_secure_write.pl
diff --git a/src/test/perl/PostgresNode.pm b/src/test/perl/PostgresNode.pm
index c59da75..b6a158a 100644
--- a/src/test/perl/PostgresNode.pm
+++ b/src/test/perl/PostgresNode.pm
@@ -844,6 +844,33 @@ sub start
=pod
+=item $node->find_child(process_name)
+Get child pid by process_name
+=cut
+sub find_child
+{
+ my ($self, $process_name) = @_;
+ my $pid=0;
+ my @childs=`ps -o pid,cmd --ppid $self->{_pid}` or die "can't run ps! $! \n";
+
+ foreach my $child (@childs)
+ {
+ $child =~ s/^\s+|\s+$//g;
+ my $pos = index($child, $process_name);
+ if ($pos > 0)
+ {
+ $pos = index($child, ' ');
+ $pid = substr($child, 0, $pos);
+ $pid =~ s/^\s+|\s+$//g;
+ last;
+ }
+ }
+
+ return $pid;
+}
+
+=pod
+
=item $node->kill9()
Send SIGKILL (signal 9) to the postmaster.
diff --git a/src/test/recovery/t/000_stuck_on_secure_write.pl b/src/test/recovery/t/000_stuck_on_secure_write.pl
new file mode 100644
index 0000000..cc37734
--- /dev/null
+++ b/src/test/recovery/t/000_stuck_on_secure_write.pl
@@ -0,0 +1,49 @@
+# Test for backend process gets stuck on secure_write()
+use strict;
+use warnings;
+use PostgresNode;
+use TestLib ();
+use Test::More tests=>4;
+
+# primary node
+my $node_primary = PostgresNode->new('primary');
+$node_primary->init;
+
+$node_primary->start;
+$node_primary->safe_psql('postgres', 'CREATE TABLE test_table(val integer);');
+# insert more data so that the output buffer can be filled up when select from the table
+$node_primary->safe_psql('postgres', "INSERT INTO test_table(val) SELECT generate_series(1,5000000) as newwal");
+
+my $connstr = $node_primary->connstr;
+my @res = `nohup psql -d "$connstr" -c "begin; select pg_sleep(5); select * from test_table;" > myout.file 2>&1 &`;
+my $client = readpipe("ps -ef | grep \"select pg_sleep(5)\" | grep -v grep | awk '{print \$2}'");
+print "client pid: $client";
+ok($client != 0, "ready to stop client");
+
+# stop client and backend process will get stuck on secure_write()
+sleep 3;
+@res = `kill -stop $client`;
+my $pid = $node_primary->find_child("[local] SELECT");
+print "backend pid:$pid\n";
+ok($pid != 0, "get corresponding backend process");
+
+# cancel the query
+sleep 10;
+$node_primary->safe_psql('postgres', "select pg_cancel_backend($pid)");
+# check the query
+sleep 3;
+my $pid_new = $node_primary->find_child("[local] SELECT");
+print "backend pid:$pid_new\n";
+ok($pid_new == $pid, "backend can't be canceled");
+
+# cancel the query again
+$node_primary->safe_psql('postgres', "select pg_cancel_backend($pid)");
+# check the query
+sleep 3;
+$pid_new = $node_primary->find_child("[local] SELECT");
+print "backend pid:$pid_new\n";
+ok($pid_new == $pid, "backend can't be canceled");
+
+# sleep to check the backtrace of backend process by pstack backend_pid
+sleep 300;
+$node_primary->stop;
\ No newline at end of file
--
1.8.3.1
I changed the implementation for this problem:
a) if the cancel-query interrupt comes from the database itself for some reason, such as a recovery conflict, handle it immediately, and the cancel request is treated as a terminate request;
b) if the cancel-query interrupt comes from the client, ignore it as before.
In the attached patch, I also add a TAP test that generates a recovery conflict on a standby while the backend process is stuck on a client write, and checks whether it can handle the cancel-query request caused by the recovery conflict.
What do you think of it? I hope to get your reply.
Thanks & Best Regards
Attachments:
0001-Handle-cancel-interrupts-during-client-readwrite.patch
From 630c7d5e769dda6676eeef4ad56f8ed8ab9878e0 Mon Sep 17 00:00:00 2001
From: "suyu.cmj" <mengjuan.cmj@alibaba-inc.com>
Date: Thu, 9 Sep 2021 07:37:01 +0000
Subject: [PATCH] Handle cancel query interrupts during client read/write
---
src/backend/tcop/postgres.c | 37 ++++++++
src/test/perl/PostgresNode.pm | 30 ++++++
.../t/026_recovery_stuck_on_client_write.pl | 102 +++++++++++++++++++++
3 files changed, 169 insertions(+)
create mode 100644 src/test/recovery/t/026_recovery_stuck_on_client_write.pl
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 58b5960..a7570f4 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -206,6 +206,7 @@ static void drop_unnamed_stmt(void);
static void log_disconnections(int code, Datum arg);
static void enable_statement_timeout(void);
static void disable_statement_timeout(void);
+static void ProcessClientReadWriteCancelInterrupt(void);
/* ----------------------------------------------------------------
@@ -493,6 +494,9 @@ ProcessClientReadInterrupt(bool blocked)
{
int save_errno = errno;
+ /* Process cancel query interrupts */
+ ProcessClientReadWriteCancelInterrupt();
+
if (DoingCommandRead)
{
/* Check for general interrupts that arrived before/while reading */
@@ -539,6 +543,9 @@ ProcessClientWriteInterrupt(bool blocked)
{
int save_errno = errno;
+ /* Process cancel query interrupts */
+ ProcessClientReadWriteCancelInterrupt();
+
if (ProcDiePending)
{
/*
@@ -4958,3 +4965,33 @@ disable_statement_timeout(void)
if (get_timeout_active(STATEMENT_TIMEOUT))
disable_timeout(STATEMENT_TIMEOUT, false);
}
+
+/*
+ * process cancel query request during client read/write
+ * in original way, cancel query requests are all ignored during client read/write,
+ * which is not appropriate. for example, if a query which generated recovery conflict
+ * can't be canceled when it is reading from/writing to client, the recovery process of
+ * startup will be delay infinitely by a stuck client. Improve it in this way:
+ * 1) if cancel query request is from db itself, then we handle it right now, and cancel request
+ * is treated as a terminate request, connection will be terminated;
+ * 2) if cancel query request is from client, then we ignore it as original way, client needs
+ * to send a termiante request to cancel it truly.
+ */
+static void
+ProcessClientReadWriteCancelInterrupt(void)
+{
+ bool handle_cancel_request = false;
+ bool stmt_timeout_occurred;
+ bool lock_timeout_occurred;
+
+ lock_timeout_occurred = get_timeout_indicator(LOCK_TIMEOUT, true);
+ stmt_timeout_occurred = get_timeout_indicator(STATEMENT_TIMEOUT, true);
+
+ /* cancel request need to be handled */
+ if (lock_timeout_occurred || stmt_timeout_occurred || RecoveryConflictPending)
+ handle_cancel_request = true;
+
+ /* treat cancel request as terminate request */
+ if (!ProcDiePending && QueryCancelPending && handle_cancel_request)
+ ProcDiePending = true;
+}
\ No newline at end of file
diff --git a/src/test/perl/PostgresNode.pm b/src/test/perl/PostgresNode.pm
index c59da75..82da7f6 100644
--- a/src/test/perl/PostgresNode.pm
+++ b/src/test/perl/PostgresNode.pm
@@ -844,6 +844,36 @@ sub start
=pod
+=item $node->find_child(process_name)
+Find child process base on process name
+
+=cut
+
+sub find_child
+{
+ my ($self, $process_name) = @_;
+ my $pid=0;
+ my @childs=`ps -o pid,cmd --ppid $self->{_pid}` or die "can't run ps! $! \n";
+
+ foreach my $child (@childs)
+ {
+ $child =~ s/^\s+|\s+$//g;
+ my $pos = index($child, $process_name);
+ if ($pos > 0)
+ {
+ $pos = index($child, ' ');
+ $pid = substr($child, 0, $pos);
+ $pid =~ s/^\s+|\s+$//g;
+ print "### Found child process \"$pid\", \"$child\"\n";
+ last;
+ }
+ }
+
+ return $pid;
+}
+
+=pod
+
=item $node->kill9()
Send SIGKILL (signal 9) to the postmaster.
diff --git a/src/test/recovery/t/026_recovery_stuck_on_client_write.pl b/src/test/recovery/t/026_recovery_stuck_on_client_write.pl
new file mode 100644
index 0000000..fb227c5
--- /dev/null
+++ b/src/test/recovery/t/026_recovery_stuck_on_client_write.pl
@@ -0,0 +1,102 @@
+# Test for backend process gets stuck on client_write
+use strict;
+use warnings;
+
+use PostgresNode;
+use TestLib;
+use IPC::Run;
+use Test::More tests=>5;
+
+# primary node
+my $node_primary = PostgresNode->new('primary');
+$node_primary->init(
+ allows_streaming => 1,
+ auth_extra => [ '--create-role', 'repl_role' ]);
+$node_primary->start;
+
+# standby node
+my $backup_name = 'my_backup';
+# Take backup
+$node_primary->backup($backup_name);
+# Create streaming standby linking to primary
+my $node_standby = PostgresNode->new('standby');
+$node_standby->init_from_backup($node_primary, $backup_name,
+ has_streaming => 1);
+# start standby
+$node_standby->append_conf('postgresql.conf', "max_standby_streaming_delay = 5000");
+$node_standby->append_conf('postgresql.conf', "max_standby_archive_delay = 5000");
+$node_standby->append_conf('postgresql.conf', "hot_standby_feedback = off");
+$node_standby->append_conf('postgresql.conf', "logging_collector = on");
+$node_standby->append_conf('postgresql.conf', "log_directory = 'log'");
+$node_standby->start;
+
+# Wait for standbys to catch up
+$node_primary->safe_psql('postgres', 'CREATE TABLE test_table(val integer);');
+# insert more data so that the output buffer can be filled up when select from the table
+$node_primary->safe_psql('postgres', "INSERT INTO test_table(val) SELECT generate_series(1,10000000) as newwal");
+$node_primary->safe_psql('postgres', "checkpoint;");
+$node_primary->wait_for_catchup($node_standby, 'replay', $node_primary->lsn('insert'));
+my $result = $node_standby->safe_psql('postgres', "SELECT count(*) FROM test_table");
+print "standby result: $result\n";
+ok($result == 10000000, 'check streamed content on standby');
+
+my $connstr = $node_primary->connstr;
+my @res = `nohup psql -d "$connstr" -c "begin; select pg_sleep(2); select * from test_table;" > tmp_check/myout.file 2>&1 &`;
+my $host = $node_primary->host;
+my $port = $node_primary->port;
+my $client = readpipe("ps -ef | grep \"psql\" | grep \"$host\" | grep \"$port\" | grep -v grep | awk '{print \$2}'");
+# stop client and backend will get stuck on secure_write()
+print "ready to stop primary client pid: $client\n";
+sleep 1;
+@res = `kill -stop $client`;
+my $backend = $node_primary->find_child("SELECT");
+print "backend pid:$backend\n";
+
+# cancel the query
+sleep 2;
+$node_primary->safe_psql('postgres', "select pg_cancel_backend($backend)");
+# check the query
+my $pid_new = $node_primary->find_child("SELECT");
+print "backend pid:$pid_new\n";
+ok($pid_new == $backend, "backend can't be canceled");
+@res = `kill -cont $client`;
+@res = `kill -s 9 $client`;
+
+# create recovery conflict
+$connstr = $node_standby->connstr;
+@res = `nohup psql -d "$connstr" -c "begin; select pg_sleep(2); select * from test_table;" > tmp_check/myout.file 2>&1 &`;
+$host = $node_standby->host;
+$port = $node_standby->port;
+$client = readpipe("ps -ef | grep \"psql\" | grep \"$host\" | grep \"$port\" | grep -v grep | awk '{print \$2}'");
+print "ready to stop standby client pid: $client\n";
+sleep 1;
+@res = `kill -stop $client`;
+
+# delete data and do vacuum in primary node
+$node_primary->safe_psql("postgres", "delete from test_table");
+$node_primary->safe_psql("postgres", "vacuum test_table");
+# cancel the query
+$backend = $node_standby->find_child("SELECT");
+$node_standby->safe_psql('postgres', "select pg_cancel_backend($backend)");
+# check the query
+sleep 2;
+$pid_new = $node_standby->find_child("SELECT");
+print "backend pid:$pid_new\n";
+ok($pid_new == $backend, "backend can't be canceled by user");
+
+# wait until recovery conflict generate
+sleep 3;
+$pid_new = $node_standby->find_child("SELECT");
+print "backend pid:$pid_new\n";
+ok($pid_new == 0, "backend is canceled");
+@res = `kill -cont $client`;
+@res = `kill -s 9 $client`;
+
+my $basedir = $node_standby->basedir();
+my $logdir = "$basedir/pgdata/log";
+my @exits = `grep -rn "terminating connection due to conflict with recovery" $logdir`;
+my $found = @exits;
+ok($found == 1, "backend is canceled due to recovery conflict");
+
+$node_standby->stop;
+$node_primary->stop;
--
1.8.3.1
On Fri, Aug 27, 2021 at 4:10 PM Andres Freund <andres@anarazel.de> wrote:
On Fri, Aug 27, 2021, at 13:07, Robert Haas wrote:
On Fri, Aug 27, 2021 at 3:24 PM Andres Freund <andres@anarazel.de> wrote:
I wonder if we could improve the situation somewhat by using the non-blocking
pqcomm functions in a few select places. E.g. if elog.c's
send_message_to_frontend() sent its message via a new pq_endmessage_noblock()
(which'd use the existing pq_putmessage_noblock()) and used
pq_flush_if_writable() instead of pq_flush(), we'd a) not block sending to the
client before AbortCurrentTransaction(), b) able to queue further error
messages safely.
pq_flush_if_writable() could succeed in sending only part of the data,
so I don't see how this works.
All the data is buffered though, so I don't see what problem that causes?
OK, I guess I'm confused here.
If we're always buffering the data, then I suppose there's no risk of
injecting a protocol message into the middle of some other protocol
message, assuming that we don't have a non-local transfer of control
halfway through putting a message in the buffer. But there's still the
risk of getting out of step with the client. Suppose the client does
SELECT 1/0 and we send an ErrorResponse complaining about the division
by zero. But as we're trying to send that response, we block. Later, a
query cancel occurs. We can't queue another ErrorResponse because the
client will interpret that as the response to the next query, since
the division by zero error is the response to the current one. I don't
think that changing pq_flush() to pq_flush_if_writable() in elog.c or
anywhere else can fix that problem.
But that doesn't mean that it isn't a good idea. Any place where we're
doing a pq_flush() and know that another one will happen soon
afterward, before we wait for data from the client, can be changed to
pq_flush_if_writable() without harm, and it's beneficial to do so,
because like you say, it avoids blocking in places that users may find
inconvenient - e.g. while holding locks, as Fujii-san said. The
comment here claims that "postgres.c will flush out waiting data when
control returns to the main loop" but the only pq_flush() call that's
directly present in postgres.c is in response to receiving a Flush
message, so I suppose this is actually talking about the pq_flush()
inside ReadyForQuery. It's not 100% clear to me that we do that in all
relevant cases, though. Suppose we hit an error while processing some
protocol message that does not set send_ready_for_query = true, like
for example Describe ('D'). I think in that case the flush in elog.c
is the only one. Perhaps we ought to change postgres.c so that if we
don't enter the block guarded by "if (send_ready_for_query)" we
instead pq_flush().
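Concretely, something like the following at the bottom of the main loop; this is a paraphrase of the existing structure, not the actual code:

		if (send_ready_for_query)
		{
			/* existing path: report transaction status; ReadyForQuery() flushes */
			ReadyForQuery(whereToSendOutput);
			send_ready_for_query = false;
		}
		else
		{
			/*
			 * We handled a message (e.g. Describe) that doesn't trigger
			 * ReadyForQuery, so flush here in case elog.c queued an error or
			 * notice that would otherwise sit in the output buffer until the
			 * next ReadyForQuery.
			 */
			pq_flush();
		}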
--
Robert Haas
EDB: http://www.enterprisedb.com
Hi all, I want to know your opinion on this patch, or in what way do you think we should solve this problem?
------------------------------------------------------------------
From: 蔡梦娟(玊于) <mengjuan.cmj@alibaba-inc.com>
Sent: Thursday, September 9, 2021 17:38
To: Robert Haas <robertmhaas@gmail.com>; Andres Freund <andres@anarazel.de>; alvherre <alvherre@alvh.no-ip.org>; masao.fujii <masao.fujii@oss.nttdata.com>
Cc: pgsql-hackers <pgsql-hackers@lists.postgresql.org>
Subject: Re: Queries that should be canceled will get stuck on secure_write function
On 2021/09/22 1:14, 蔡梦娟(玊于) wrote:
Hi all, I want to know your opinion on this patch, or in what way do you think we should solve this problem?
I agree that something like the patch (i.e., introduction of promotion
from cancel request to terminate one) is necessary for the fix. One concern
on the patch is that the cancel request can be promoted to the terminate one
even when secure_write() doesn't actually get stuck. Is this acceptable?
Maybe I'm tempted to set up the duration until the promotion happens....
Or we should introduce the dedicated timer for communication on the socket?
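For example, something along these lines using the timeout.h machinery; CLIENT_WRITE_TIMEOUT would be a new TimeoutId and client_write_promotion_delay_ms is only a placeholder name:

/* Registered once at backend startup, e.g.:
 *     RegisterTimeout(CLIENT_WRITE_TIMEOUT, ClientWriteTimeoutHandler);
 */
static void
ClientWriteTimeoutHandler(void)
{
	/* Promote a pending cancel to a die only if the write is really stuck. */
	if (QueryCancelPending && !ProcDiePending)
	{
		InterruptPending = true;
		ProcDiePending = true;
		SetLatch(MyLatch);
	}
}

/* Around the blocking wait in secure_write(): */
enable_timeout_after(CLIENT_WRITE_TIMEOUT, client_write_promotion_delay_ms);
WaitEventSetWait(FeBeWaitSet, -1 /* no timeout */ , &event, 1,
				 WAIT_EVENT_CLIENT_WRITE);
disable_timeout(CLIENT_WRITE_TIMEOUT, false);

That way, the promotion would only happen when the backend has in fact been unable to hand the data to the client for the configured period.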
Regards,
--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
Yes, it is more appropriate to set a duration to determine whether secure_write() is stuck, but it is difficult to define how long that duration should be.
In my first patch, I added a GUC to allow the user to set the time; or it could be hardcoded if a time deemed reasonable is agreed on?
------------------------------------------------------------------
I agree that something like the patch (i.e., introduction of promotion
from cancel request to terminate one) is necessary for the fix. One concern
on the patch is that the cancel request can be promoted to the terminate one
even when secure_write() doesn't actually get stuck. Is this acceptable?
Maybe I'm tempted to set up the duration until the promotion happens....
Or we should introduce the dedicated timer for communication on the socket?
On 8/23/21 10:15 AM, 蔡梦娟(玊于) wrote:
Hello,
A customer encountered a similar issue (Postgres 15.12):
1. A heavily updated table on the primary. Reads seem to generate heap prune records.
2. This same table is heavily read on the standby, and queries conflict with recovery.
3. This generates lag and the load balancer decides to cut the connections to the clients. Thus, we have active sessions with a ClientWrite wait event.
4. Recovery is still waiting; there is no wait_event on the startup process, nor any "not granted" locks.
5. After ~900s, recovery resumes.
In the logs we have:
LOG: recovery still waiting after 1002.241 ms: recovery conflict on snapshot
<here the load balancer decides to cut connections>
Then ~900s later, we can see a canceled query due to "connection to client lost".
Then, recovery can resume:
LOG: recovery finished waiting after 952697.269 ms: recovery conflict
on snapshot
It is surprising, as we have both:
max_standby_archive_delay = 30s
max_standby_streaming_delay = 30s
And all standbys became stuck at around the same time, for the same duration.
We tried to put aggressive tcp_keepalives* settings:
tcp_keepalives_count = 3
tcp_keepalives_idle = 5
tcp_keepalives_interval = 5
client_connection_check_interval = 10s
It changed nothing; we are investigating why they are not working.
We suspect "something" in the k8s network layers.
Anyway, it is on the "system side".
As mentioned in this thread, we expect the query to be canceled after 30s in all cases (even if the network is lying).
(The main issue was that hot_standby_feedback was off. However, I know that even when it is on, it can't prevent all recovery conflicts. That's why I wanted to add a safety belt with the keepalive settings.)
Thanks
--
Adrien NAYRAT