Escaping from blocked send() reprised.
Hello, I have received inquiries related to blocked communication
several times over the past few weeks, with different symptoms. Then I
found this message in the archive,
Subject: Escaping a blocked sendto() syscall without causing a restart
Mr. Tom Lane commented in reply to it,
Offhand it looks to me like most signals would kick the backend off the
send() call ... but it would loop right back and try again. See
internal_flush() in pqcomm.c. (If you're using SSL, this diagnosis
may or may not apply.)
We can't do anything except repeat the send attempt if the client
connection is to be kept in a sane state.
(snipped)
And I'm not at all sure if we could get it to work in SSL mode...
That's true for timeouts after which the connection should continue,
say, statement_timeout, but for intentional backend termination I
think it does no harm to break the connection abruptly, even under
SSL. On the other hand, it still seems preferable to keep the
connection when it is not blocked. The following expression would
detect such a blocked state just before the next send(2), after the
previous attempt was interrupted by a signal.
(ProcDiePending && select(1, NULL, fd, NULL, '1 sec') == 0)
With this patch (break_socket_blocking_on_termination_v1.patch),
pg_terminate_backend() works even when send is blocked, for both SSL
and non-SSL connections, after a 1 second delay.
Nevertheless, statement_timeout of course cannot be made effective
by this method, since it breaks the consistency of the client
protocol. That would need a change in the client protocol to provide
an "out of band" mechanism or something similar, maybe.
Any suggestions?
Attached patches are:
- break_socket_blocking_on_termination_v1.patch : The patch to
break blocked state of send(2) for pg_terminate_backend().
- socket_block_test.patch : debug printing and changing send
buffer of libpq for reproducing the blocked situation.
Some points of discussion follow,
==== Discussion about the appropriateness of looking into
ProcDiePending there and calling CHECK_FOR_INTERRUPTS()
on seeing it.
I am somewhat uneasy about these things, but the most we could do
seems to be replacing ProcDiePending here with some other
variable, say ImmediatelyExitFromBlockedState, and somehow going
back up through the normal return path. An additional Try-Catch
could probably do that, but it looks like no benefit for the added
complexity.
==== Discussion on breaking up the connection, especially for SSL
Breaking up an SSL connection in my_sock_write() causes the
following message on the client side if the client is still alive
and resumes receiving from the connection. This seems to show that
the client handles the event properly.
| SSL SYSCALL error: EOF detected
| The connection to the server was lost. Attempting reset: Succeeded.
==== Discussion on reliability of select(2)
This method is not a perfect solution, since select(2) sometimes
gives a wrong prediction about whether the following send(2) will
block.
Even so, as far as I can see, select(2) called just after a blocked
send(2) is interrupted by a signal always seems to say 'write will
block', so this patch seems to cover most cases, except when some
amount of socket buffer is freed just before the following select.
In that case a second chance to exit from the blocked send(2) won't
come (until one more pg_terminate_backend()?).
Removing the select(2) from the condition (that is, always calling
CHECK_FOR_INTERRUPTS() when ProcDiePending is true) prevents such a
possibility, in exchange for killing 'really live' connections, but
IMHO that's no problem for intentional server termination.
A more reliable measure would be non-blocking IO, but it changes
more of the code.
==== Reproducing the situation.
Another possible question is how likely such blocking is, or how to
reproduce the situation. I found that send(2) on CentOS 6.5 somehow,
in most cases, successfully sends the remaining data at the retry
after being interrupted by a signal while blocked on a full buffer,
in spite of no change in the environment.
So reproducing the stuck situation is rather difficult on the
server as is. But the situation can be reproduced quite easily
with a cheat, that is, enlarging the PQ send buffer, say by ten
times.
With the attached testing patch (socket_block_test.patch) applied,
the following steps produce the stuck situation.
1. Run a select which returns a big result and press Ctrl-Z just
after invoking it.
cl> $ psql -h localhost postgres
cl> postgres=# select 1 from generate_series(0, 9999999);
cl> ^Z
cl> [4]+ Stopped psql -h localhost postgres
2. Watch the server get stuck.
The server starts to print lines like the following to the console
after a while, then stops. The number enclosed in square
brackets is the PID of the server process.
sv> #### [8809] (bare) rest = 0 / 81920 bytes, ProcDiePending = 0
3. Do pg_terminate_backend().
cl> $ psql postgres -c "select pg_terminate_backend(8809)"
The server will get stuck as follows. PID=8811 is the other
session made by the command just above.
sv> #### [8809] (bare) rest = 0 / 81920 bytes, ProcDiePending = 0
sv> #### [8811] (bare) rest = 0 / 327 bytes, ProcDiePending = 0
sv> #### [8809] (bare) rest = 500 / 81920 bytes, ProcDiePending = 1
sv> #### [8811] (bare) rest = 0 / 78 bytes, ProcDiePending = 0
The server 8809 is blocked while sending the remaining 500
bytes and never comes back unless it gets SIGKILL or the
client reads the data (fg does that).
cl> $ fg
sv> #### [8809] (bare) rest = 0 / 500 bytes, ProcDiePending = 1
sv> FATAL: terminating connection due to administrator command
sv> STATEMENT: select 1 from generate_series(0, 9999999);
sv> #### [8809] (bare) rest = 0 / 116 bytes, ProcDiePending = 0
sv> #### [8883] (bare) rest = 0 / 327 bytes, ProcDiePending = 0
If you don't see the situation occur, changing the value in the
select clause (its length, not its value :) would be
effective, or pressing Ctrl-Z after some debug output would
also be effective.
For SSL connections, the debug output looks like the following,
sv> #### [20064] (bare) rest = 0 / 81920 bytes, ProcDiePending = 0
sv> #### [20064] (SSL) rest = 0 / 16413 bytes, ProcDiePending = 0
sv> #### [20064] (SSL) rest = 0 / 16413 bytes, ProcDiePending = 0
sv> #### [20064] (SSL) rest = 0 / 16413 bytes, ProcDiePending = 0
sv> #### [20064] (SSL) rest = 980 / 16413 bytes, ProcDiePending = 1
sv> #### [20064] (SSL) rest = 0 / 980 bytes, ProcDiePending = 1
sv> #### [20064] (SSL) rest = 1029 / 16413 bytes, ProcDiePending = 1
"bare" here in turn means the status of SSL_write and "SSL"
means the status of the underlying 'bare' socket of the SSL
connection. (Sorry for the confusing labels..)
The (bare) line above does not correspond to the following
bunch of (SSL) lines, but to those preceding them. At the fifth line,
send(2) is interrupted by the signal issued by pg_terminate_backend(),
then the retry (somehow) succeeds at the sixth line, but the SSL layer
hands over another 16413 bytes; the first try leaves 1029 bytes of
that unsent (the seventh line), and the server gets stuck at the
second try. Control doesn't come back to secure_write()
during this sequence.
regards,
--
Kyotaro Horiguchi
NTT Open Source Software Center
Attachments:
break_socketblocking_on_termination_v1.patch (text/x-patch; charset=us-ascii)
diff --git a/src/backend/libpq/be-secure.c b/src/backend/libpq/be-secure.c
index 59204cf..f01c140 100644
--- a/src/backend/libpq/be-secure.c
+++ b/src/backend/libpq/be-secure.c
@@ -76,6 +76,7 @@
#include "libpq/libpq.h"
#include "tcop/tcopprot.h"
+#include "miscadmin.h"
#include "utils/memutils.h"
@@ -323,6 +324,41 @@ rloop:
}
/*
+ * Although we basically should try to send all data staying in our send
+ * buffer, we also should consider the possibility that hanging of clients or
+ * network cutoff has compelled send(2) to be blocked. We need to be allowed
+ * to exit from send() if such blocking states last for a while during process
+ * termination. Returns true if send blocking is detected.
+ *
+ * The worse side of this would be that extra-slow receiver (specifically
+ * under one packet per second) might fail to receive the result to the end,
+ * but pg_terminate_backend() is such a thing.
+ */
+static bool
+is_write_blocked(int fd)
+{
+#ifndef WIN32
+ fd_set wfds;
+ struct timeval tv;
+ int ret;
+ int save_errno = errno;
+
+ FD_ZERO(&wfds);
+ FD_SET(fd, &wfds);
+ tv.tv_sec = 1;
+ tv.tv_usec = 0;
+ ret = select(fd + 1, NULL, &wfds, NULL, &tv);
+
+ /* The error of select here is safely ignorable. */
+ errno = save_errno;
+
+ return ret <= 0 ? true : false;
+#else
+ return false;
+#endif
+}
+
+/*
* Write data to a secure connection.
*/
ssize_t
@@ -457,6 +493,14 @@ wloop:
#endif
n = send(port->sock, ptr, len, 0);
+ /*
+ * Check for interrupt if the socket was blocked during process is
+ * being terminated.
+ */
+ if (ProcDiePending && is_write_blocked(port->sock))
+ CHECK_FOR_INTERRUPTS();
+
+
return n;
}
@@ -515,6 +559,14 @@ my_sock_write(BIO *h, const char *buf, int size)
res = send(h->num, buf, size, 0);
BIO_clear_retry_flags(h);
+
+ /*
+ * Check for interrupt if the socket was blocked during process is being
+ * terminated.
+ */
+ if (ProcDiePending && is_write_blocked(h->num))
+ CHECK_FOR_INTERRUPTS();
+
if (res <= 0)
{
if (errno == EINTR)
socket_block_test.patch (text/x-patch; charset=us-ascii)
diff --git a/src/backend/libpq/be-secure.c b/src/backend/libpq/be-secure.c
index 59204cf..98ee41e 100644
--- a/src/backend/libpq/be-secure.c
+++ b/src/backend/libpq/be-secure.c
@@ -75,6 +75,7 @@
#endif /* USE_SSL */
#include "libpq/libpq.h"
+#include "miscadmin.h"
#include "tcop/tcopprot.h"
#include "utils/memutils.h"
@@ -457,6 +458,9 @@ wloop:
#endif
n = send(port->sock, ptr, len, 0);
+ fprintf(stderr, "#### [%d] (bare) rest = %d / %d bytes, ProcDiePending = %d\n",
+ getpid(), len - n, len, ProcDiePending);
+
return n;
}
@@ -514,6 +518,8 @@ my_sock_write(BIO *h, const char *buf, int size)
int res = 0;
res = send(h->num, buf, size, 0);
+ fprintf(stderr, "#### [%d] (SSL) rest = %d / %d bytes, ProcDiePending = %d\n",
+ getpid(), size - res, size, ProcDiePending);
BIO_clear_retry_flags(h);
if (res <= 0)
{
diff --git a/src/backend/libpq/pqcomm.c b/src/backend/libpq/pqcomm.c
index 605d891..259b6ac 100644
--- a/src/backend/libpq/pqcomm.c
+++ b/src/backend/libpq/pqcomm.c
@@ -114,7 +114,7 @@ static List *sock_paths = NIL;
* enlarged by pq_putmessage_noblock() if the message doesn't fit otherwise.
*/
-#define PQ_SEND_BUFFER_SIZE 8192
+#define PQ_SEND_BUFFER_SIZE (8192 * 10)
#define PQ_RECV_BUFFER_SIZE 8192
static char *PqSendBuffer;
On Mon, Jun 30, 2014 at 4:13 AM, Kyotaro HORIGUCHI
<horiguchi.kyotaro@lab.ntt.co.jp> wrote:
Hello, I have received inquiries related to blocked communication
several times over the past few weeks, with different symptoms. Then I
found this message in the archive,
Subject: Escaping a blocked sendto() syscall without causing a restart
Mr. Tom Lane commented in reply to it,
Offhand it looks to me like most signals would kick the backend off the
send() call ... but it would loop right back and try again. See
internal_flush() in pqcomm.c. (If you're using SSL, this diagnosis
may or may not apply.)
We can't do anything except repeat the send attempt if the client
connection is to be kept in a sane state.
(snipped)
And I'm not at all sure if we could get it to work in SSL mode...
That's true for timeouts after which the connection should continue,
say, statement_timeout, but for intentional backend termination I
think it does no harm to break the connection abruptly, even under
SSL. On the other hand, it still seems preferable to keep the
connection when it is not blocked. The following expression would
detect such a blocked state just before the next send(2), after the
previous attempt was interrupted by a signal.
(ProcDiePending && select(1, NULL, fd, NULL, '1 sec') == 0)
With this patch (break_socket_blocking_on_termination_v1.patch),
pg_terminate_backend() works even when send is blocked, for both SSL
and non-SSL connections, after a 1 second delay.
Nevertheless, statement_timeout of course cannot be made effective
by this method, since it breaks the consistency of the client
protocol. That would need a change in the client protocol to provide
an "out of band" mechanism or something similar, maybe.
Any suggestions?
You should probably add your patch here, so it doesn't get forgotten about:
https://commitfest.postgresql.org/action/commitfest_view/open
We're focused on reviewing patches for the current CommitFest, so your
patch might not get attention right away. A couple of general
thoughts on this topic:
1. I think it's the case that there are platforms around where a
signal won't cause send() to return EINTR.... and I'd be entirely
unsurprised if SSL_write() doesn't necessarily return EINTR in that
case. I'm not sure what, if anything, we can do about that.
2. I think it would be reasonable to try to kill off the connection
without notifying the client if we're unable to send the data to the
client in a reasonable period of time. But I'm unsure what "a
reasonable period of time" means. This patch would basically do it
after no delay at all, which seems like it might be too aggressive.
However, I'm not sure.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Hello. The replies that follow are mainly a memo for myself, so
please don't bother to answer until the time comes.
At Mon, 30 Jun 2014 11:27:47 -0400, Robert Haas <robertmhaas@gmail.com> wrote in <CA+TgmoZfcGzAEmtbyoCe6VdHnq085x+ox752zuJ2AKN=Wc8PnQ@mail.gmail.com>
You should probably add your patch here, so it doesn't get forgotten about:
https://commitfest.postgresql.org/action/commitfest_view/open
Ok, I'll put this there.
We're focused on reviewing patches for the current CommitFest, so your
patch might not get attention right away. A couple of general
thoughts on this topic:
Thank you for the suggestions. I'll consider them.
1. I think it's the case that there are platforms around where a
signal won't cause send() to return EINTR.... and I'd be entirely
unsurprised if SSL_write() doesn't necessarily return EINTR in that
case. I'm not sure what, if anything, we can do about that.
man 2 send on FreeBSD has no description about EINTR.. And even
on linux, send won't return EINTR for most cases, at least I
haven't seen that. So send()=-1,EINTR seems to me as only an
equivalent of send() = 0. I have no idea about what the
implementer thought the difference is.
2. I think it would be reasonable to try to kill off the connection
without notifying the client if we're unable to send the data to the
client in a reasonable period of time. But I'm unsure what "a
reasonable period of time" means. This patch would basically do it
after no delay at all, which seems like it might be too aggressive.
However, I'm not sure.
I think there's no such reasonable time. The behavior should
perhaps be determined from another point of view. One alternative
would be to let pg_terminate_backend() have a parameter instructing
forced shutdown (how to propagate it?), or to make a forced shutdown
on duplicate invocation of pg_terminate_backend().
regards,
--
Kyotaro Horiguchi
NTT Open Source Software Center
On Mon, Jun 30, 2014 at 11:26 PM, Kyotaro HORIGUCHI
<horiguchi.kyotaro@lab.ntt.co.jp> wrote:
2. I think it would be reasonable to try to kill off the connection
without notifying the client if we're unable to send the data to the
client in a reasonable period of time. But I'm unsure what "a
reasonable period of time" means. This patch would basically do it
after no delay at all, which seems like it might be too aggressive.
However, I'm not sure.
I think there's no such reasonable time. The behavior should
perhaps be determined from another point of view. One alternative
would be to let pg_terminate_backend() have a parameter instructing
forced shutdown (how to propagate it?), or to make a forced shutdown
on duplicate invocation of pg_terminate_backend().
Well, I think that when people call pg_terminate_backend() just once,
they expect it to kill the target backend. I think people will
tolerate a short delay, like a few seconds; after all, there's no
guarantee, even today, that the backend will hit a
CHECK_FOR_INTERRUPTS() in less than a few hundred milliseconds. But
they are not going to want to have to take a second action to kill the
backend - killing it once should be sufficient.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Tue, Jul 01, 2014 at 12:26:43PM +0900, Kyotaro HORIGUCHI wrote:
1. I think it's the case that there are platforms around where a
signal won't cause send() to return EINTR.... and I'd be entirely
unsurprised if SSL_write() doesn't necessarily return EINTR in that
case. I'm not sure what, if anything, we can do about that.
man 2 send on FreeBSD has no description about EINTR.. And even
on linux, send won't return EINTR for most cases, at least I
haven't seen that. So send()=-1,EINTR seems to me as only an
equivalent of send() = 0. I have no idea about what the
implementer thought the difference is.
Whether send() returns EINTR or not depends on whether the signal has
been marked restartable or not. This is configurable per signal, see
sigaction(). If the signal is marked to restart, the kernel returns
ERESTARTHAND (IIRC) and the libc will redo the call internally.
Default BSD does not return EINTR normally, but supports sigaction().
Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
He who writes carelessly confesses thereby at the very outset that he does
not attach much importance to his own thoughts.
-- Arthur Schopenhauer
Hello,
At Tue, 1 Jul 2014 21:21:38 +0200, Martijn van Oosterhout <kleptog@svana.org> wrote in <20140701192138.GB20140@svana.org>
On Tue, Jul 01, 2014 at 12:26:43PM +0900, Kyotaro HORIGUCHI wrote:
1. I think it's the case that there are platforms around where a
signal won't cause send() to return EINTR.... and I'd be entirely
unsurprised if SSL_write() doesn't necessarily return EINTR in that
case. I'm not sure what, if anything, we can do about that.
man 2 send on FreeBSD has no description about EINTR.. And even
on linux, send won't return EINTR for most cases, at least I
haven't seen that. So send()=-1,EINTR seems to me as only an
equivalent of send() = 0. I have no idea about what the
implementer thought the difference is.
Whether send() returns EINTR or not depends on whether the signal has
been marked restartable or not. This is configurable per signal, see
sigaction(). If the signal is marked to restart, the kernel returns
ERESTARTHAND (IIRC) and the libc will redo the call internally.
Wow, thank you for the detailed information. I'll study that and take
it into account in future discussion.
Default BSD does not return EINTR normally, but supports sigaction().
I guess that is for ease of keeping compatibility; it seems
reasonable.
have a nice day,
--
Kyotaro Horiguchi
NTT Open Source Software Center
Hello, thank you for keeping this discussion moving.
I think there's no such reasonable time. The behavior should
perhaps be determined from another point of view. One alternative
would be to let pg_terminate_backend() have a parameter instructing
forced shutdown (how to propagate it?), or to make a forced shutdown
on duplicate invocation of pg_terminate_backend().
Well, I think that when people call pg_terminate_backend() just once,
they expect it to kill the target backend. I think people will
tolerate a short delay, like a few seconds; after all, there's no
guarantee, even today, that the backend will hit a
CHECK_FOR_INTERRUPTS() in less than a few hundred milliseconds.
Sure.
But they are not going to want to have to take a second action
to kill the backend - killing it once should be sufficient.
Hmm, that sounds persuasive. Well, do you think they would tolerate
a -force option? (Even though its technical practicality is not
clear)
regards,
--
Kyotaro Horiguchi
NTT Open Source Software Center
On 07/01/2014 06:26 AM, Kyotaro HORIGUCHI wrote:
At Mon, 30 Jun 2014 11:27:47 -0400, Robert Haas <robertmhaas@gmail.com> wrote in <CA+TgmoZfcGzAEmtbyoCe6VdHnq085x+ox752zuJ2AKN=Wc8PnQ@mail.gmail.com>
1. I think it's the case that there are platforms around where a
signal won't cause send() to return EINTR.... and I'd be entirely
unsurprised if SSL_write() doesn't necessarily return EINTR in that
case. I'm not sure what, if anything, we can do about that.
We use a custom "write" routine with SSL_write, where we call send()
ourselves, so that's not a problem as long as we put the check in the
right place (in secure_raw_write(), after my recent SSL refactoring -
the patch needs to be rebased).
man 2 send on FreeBSD has no description about EINTR.. And even
on linux, send won't return EINTR for most cases, at least I
haven't seen that. So send()=-1,EINTR seems to me as only an
equivalent of send() = 0. I have no idea about what the
implementer thought the difference is.
As the patch stands, there's a race condition: if the SIGTERM arrives
*before* the send() call, the send() won't return EINTR anyway. So
there's a chance that you still block. Calling pg_terminate_backend()
again will dislodge it (assuming send() returns with EINTR on signal),
but I don't think we want to define the behavior as "usually,
pg_terminate_backend() will kill a backend that's blocked on sending to
the client, but sometimes you have to call it twice (or more!) to really
kill it".
A more robust way is to set ImmediateInterruptOK before calling send().
That wouldn't let you send data that can be sent without blocking
though. For that, you could put the socket to non-blocking mode, and
sleep with select(), also waiting for the process' latch at the same
time (die() sets the latch, so that will wake up the select() if a
termination request arrives).
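The approach described above can be sketched roughly as follows. This is only an illustration under assumptions, not the PostgreSQL implementation: a plain pipe read end (wakeup_fd) stands in for the process latch, MSG_DONTWAIT (available on Linux and the BSDs) stands in for putting the socket into non-blocking mode, and send_unless_terminated and SEND_TERMINATED are hypothetical names.

```c
#include <assert.h>
#include <errno.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

#define SEND_TERMINATED (-2)    /* hypothetical "termination requested" code */

/*
 * Send len bytes without ever blocking inside send().  When the kernel
 * buffer is full, sleep in select() until either the socket becomes
 * writable or wakeup_fd becomes readable (the stand-in for a latch).
 * Returns len on success, SEND_TERMINATED if woken up while blocked,
 * or -1 on other errors.
 */
static ssize_t
send_unless_terminated(int sock, const char *buf, size_t len, int wakeup_fd)
{
    size_t      sent = 0;

    /* select() only supports descriptors below FD_SETSIZE */
    assert(sock >= 0 && sock < FD_SETSIZE);
    assert(wakeup_fd >= 0 && wakeup_fd < FD_SETSIZE);

    while (sent < len)
    {
        ssize_t     n = send(sock, buf + sent, len - sent, MSG_DONTWAIT);

        if (n >= 0)
        {
            sent += (size_t) n;
            continue;
        }
        if (errno == EINTR)
            continue;
        if (errno != EAGAIN && errno != EWOULDBLOCK)
            return -1;

        /* Send buffer is full: wait for space or for a wakeup. */
        fd_set      rfds;
        fd_set      wfds;
        int         maxfd = (sock > wakeup_fd) ? sock : wakeup_fd;

        FD_ZERO(&rfds);
        FD_SET(wakeup_fd, &rfds);
        FD_ZERO(&wfds);
        FD_SET(sock, &wfds);

        if (select(maxfd + 1, &rfds, &wfds, NULL, NULL) < 0)
        {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (FD_ISSET(wakeup_fd, &rfds))
            return SEND_TERMINATED;
    }
    return (ssize_t) sent;
}
```

In this sketch a signal handler for the termination request would write one byte to the other end of the pipe. Because the pipe stays readable, the sender stops waiting even if the signal arrived before it went to sleep, which is the race a plain blocking send() cannot avoid.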
Is it actually safe to process the die-interrupt where send() is called?
ProcessInterrupts() does "ereport(FATAL, ...)", which will attempt to
send a message to the client. If that happens in the middle of
constructing some other message, that will violate the protocol.
2. I think it would be reasonable to try to kill off the connection
without notifying the client if we're unable to send the data to the
client in a reasonable period of time. But I'm unsure what "a
reasonable period of time" means. This patch would basically do it
after no delay at all, which seems like it might be too aggressive.
However, I'm not sure.
I think there's no such reasonable time.
I agree it's pretty hard to define any reasonable timeout here. I think
it would be fine to just cut the connection; even if you don't block
while sending, you'll probably reach a CHECK_FOR_INTERRUPTS() somewhere
higher in the stack and kill the connection almost as abruptly anyway.
(you can't violate the protocol, however)
- Heikki
Sorry, I was absorbed by other tasks..
Thank you for reviewing this.
On 07/01/2014 06:26 AM, Kyotaro HORIGUCHI wrote:
At Mon, 30 Jun 2014 11:27:47 -0400, Robert Haas
<robertmhaas@gmail.com> wrote in
<CA+TgmoZfcGzAEmtbyoCe6VdHnq085x+ox752zuJ2AKN=Wc8PnQ@mail.gmail.com>
1. I think it's the case that there are platforms around where a
signal won't cause send() to return EINTR.... and I'd be entirely
unsurprised if SSL_write() doesn't necessarily return EINTR in that
case. I'm not sure what, if anything, we can do about that.
We use a custom "write" routine with SSL_write, where we call send()
ourselves, so that's not a problem as long as we put the check in the
right place (in secure_raw_write(), after my recent SSL refactoring -
the patch needs to be rebased).
man 2 send on FreeBSD has no description about EINTR.. And even
on linux, send won't return EINTR for most cases, at least I
haven't seen that. So send()=-1,EINTR seems to me as only an
equivalent of send() = 0. I have no idea about what the
implementer thought the difference is.
As the patch stands, there's a race condition: if the SIGTERM arrives
*before* the send() call, the send() won't return EINTR anyway. So
there's a chance that you still block. Calling pg_terminate_backend()
again will dislodge it (assuming send() returns with EINTR on signal),
Yes, that window wouldn't be closed without introducing
something more. EINTR is set only when nothing was sent by the
call. So AFAICS the chance of getting EINTR is far smaller than
expected.
but I don't think we want to define the behavior as "usually,
pg_terminate_backend() will kill a backend that's blocked on sending
to the client, but sometimes you have to call it twice (or more!) to
really kill it".
I agree that avoiding that is desirable, if there is any measure to
do so. But I think it's still better than doing kill -9, which takes
down all the innocent backends.
A more robust way is to set ImmediateInterruptOK before calling
send(). That wouldn't let you send data that can be sent without
blocking though. For that, you could put the socket to non-blocking
mode, and sleep with select(), also waiting for the process' latch at
the same time (die() sets the latch, so that will wake up the select()
if a termination request arrives).
I considered it, but select() frequently (rather, in most cases when
send() blocks on send buffer exhaustion) fails to predict that the
following send() will block. (If my memory is correct.) So
the final problem would still be the blocked send()...
Is it actually safe to process the die-interrupt where send() is
called? ProcessInterrupts() does "ereport(FATAL, ...)", which will
attempt to send a message to the client. If that happens in the middle
of constructing some other message, that will violate the protocol.
So I would strongly agree with you if select() worked the way the
man page gives the impression it does.
2. I think it would be reasonable to try to kill off the connection
without notifying the client if we're unable to send the data to the
client in a reasonable period of time. But I'm unsure what "a
reasonable period of time" means. This patch would basically do it
after no delay at all, which seems like it might be too aggressive.
However, I'm not sure.
I think there's no such reasonable time.
I agree it's pretty hard to define any reasonable timeout here. I
think it would be fine to just cut the connection; even if you don't
block while sending, you'll probably reach a CHECK_FOR_INTERRUPTS()
somewhere higher in the stack and kill the connection almost as
abruptly anyway. (you can't violate the protocol, however)
Yes, closing the blocked connection seems one of the smartest
ways, and checking for the interrupt that occurred could avoid the
protocol violation. But the problem is that there seems to be no
means to close the socket from anywhere other than the blocking
handle. A dup(2)'ed handle cannot release the resource by itself.
regards,
--
Kyotaro Horiguchi
NTT Open Source Software Center
On 08/26/2014 09:17 AM, Kyotaro HORIGUCHI wrote:
but I don't think we want to define the behavior as "usually,
pg_terminate_backend() will kill a backend that's blocked on sending
to the client, but sometimes you have to call it twice (or more!) to
really kill it".
I agree that avoiding that is desirable, if there is any measure to
do so. But I think it's still better than doing kill -9, which takes
down all the innocent backends.
A more robust way is to set ImmediateInterruptOK before calling
send(). That wouldn't let you send data that can be sent without
blocking though. For that, you could put the socket to non-blocking
mode, and sleep with select(), also waiting for the process' latch at
the same time (die() sets the latch, so that will wake up the select()
if a termination request arrives).
I considered it, but select() frequently (rather, in most cases when
send() blocks on send buffer exhaustion) fails to predict that the
following send() will block. (If my memory is correct.) So
the final problem would still be the blocked send()...
My point was to put the socket in non-blocking mode, so that send() will
return immediately with EAGAIN instead of blocking, if the send buffer
is full. See WalSndWriteData for how that would work, it does something
similar.
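A minimal sketch of the non-blocking behavior described above, assuming a POSIX system. It is not the WalSndWriteData code; set_nonblocking and send_until_full are hypothetical helpers that only show send() returning EAGAIN (or EWOULDBLOCK) once the kernel send buffer is full, instead of blocking.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Put fd into non-blocking mode; returns 0 on success, -1 on error. */
static int
set_nonblocking(int fd)
{
    int         flags = fcntl(fd, F_GETFL, 0);

    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/*
 * Keep sending the same chunk (len must be > 0) until the kernel refuses
 * more data.  Returns the number of bytes accepted when send() reports
 * EAGAIN/EWOULDBLOCK, or -1 on any other error.  A blocking socket would
 * instead hang inside send() at that point.
 */
static ssize_t
send_until_full(int sock, const char *buf, size_t len)
{
    ssize_t     total = 0;

    for (;;)
    {
        ssize_t     n = send(sock, buf, len, 0);

        if (n > 0)
        {
            total += n;
            continue;
        }
        if (n < 0 && errno == EINTR)
            continue;
        if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
            return total;
        return -1;
    }
}
```

Once EAGAIN is seen, the caller is free to decide what to do next (wait for the socket, check for a pending termination request, or give up), which is the control a blocking send() does not offer.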
Is it actually safe to process the die-interrupt where send() is
called? ProcessInterrupts() does "ereport(FATAL, ...)", which will
attempt to send a message to the client. If that happens in the middle
of constructing some other message, that will violate the protocol.
So I would strongly agree with you if select() worked the way the
man page gives the impression it does.
Not sure what you mean, but the above is a fatal problem with the patch
right now, regardless of how you do the sleeping.
2. I think it would be reasonable to try to kill off the connection
without notifying the client if we're unable to send the data to the
client in a reasonable period of time. But I'm unsure what "a
reasonable period of time" means. This patch would basically do it
after no delay at all, which seems like it might be too aggressive.
However, I'm not sure.
I think there's no such reasonable time.
I agree it's pretty hard to define any reasonable timeout here. I
think it would be fine to just cut the connection; even if you don't
block while sending, you'll probably reach a CHECK_FOR_INTERRUPTS()
somewhere higher in the stack and kill the connection almost as
abruptly anyway. (you can't violate the protocol, however)
Yes, closing the blocked connection seems one of the smartest
ways, and checking for the interrupt that occurred could avoid the
protocol violation. But the problem is that there seems to be no
means to close the socket from anywhere other than the blocking
handle. A dup(2)'ed handle cannot release the resource by itself.
I didn't understand that, surely you can just close() the socket? There
is no dup(2) involved. And we don't necessarily need to close the
socket, we just need to avoid writing to it when we're already in the
middle of sending a message.
I'm marking this as Waiting on Author in the commitfest app, because:
1. the protocol violation needs to be avoided one way or another, and
2. the behavior needs to be consistent so that a single
pg_terminate_backend() is enough to always kill the connection.
- Heikki
Hello,
I considered it, but select() frequently (rather, in most cases when
send() blocks on send buffer exhaustion) fails to predict that the
following send() will block. (If my memory is correct.) So
the final problem would still be the blocked send()...
My point was to put the socket in non-blocking mode, so that send()
will return immediately with EAGAIN instead of blocking, if the send
buffer is full. See WalSndWriteData for how that would work, it does
something similar.
I confused it with what I did while writing this patch: select()
followed by a blocking send(). Sorry for confusing the discussion. I
now understand correctly what you mean, and it sounds reasonable.
I agree it's pretty hard to define any reasonable timeout here. I
think it would be fine to just cut the connection; even if you don't
block while sending, you'll probably reach a CHECK_FOR_INTERRUPTS()
somewhere higher in the stack and kill the connection almost as
abruptly anyway. (you can't violate the protocol, however)
Yes, closing the blocked connection seems one of the smartest
ways, and checking for the interrupt that occurred could avoid the
protocol violation. But the problem is that there seems to be no
means to close the socket from anywhere other than the blocking
handle. A dup(2)'ed handle cannot release the resource by itself.
I didn't understand that, surely you can just close() the socket?
There is no dup(2) involved. And we don't necessarily need to close
the socket, we just need to avoid writing to it when we're already in
the middle of sending a message.
My assumption there was select() and *blocking* send(). So the
sentence cited is out of point from the first.
I'm marking this as Waiting on Author in the commitfest app, because:
1. the protocol violation needs to be avoided one way or another, and
2. the behavior needs to be consistent so that a single
pg_terminate_backend() is enough to always kill the connection.
Thank you for the suggestion. I think I can go forward with that
and will come up with a new patch.
regards,
--
Kyotaro Horiguchi
NTT Open Source Software Center
Hello, sorry for the dazed reply in the previous mail.
I made a revised patch for this issue.
The attached patches are as follows:
- 0001_Revise_socket_emulation_for_win32_backend.patch
Revises socket emulation on win32 backend so that each socket
can have its own blocking mode state.
- 0002_Allow_backend_termination_during_write_blocking.patch
The patch to solve the issue. This patch depends on the 0001_
patch.
==========
I'm marking this as Waiting on Author in the commitfest app, because:
1. the protocol violation needs to be avoided one way or another, and
2. the behavior needs to be consistent so that a single
pg_terminate_backend() is enough to always kill the connection.
- Preventing protocol violation.
To prevent a protocol violation, secure_write() sets
ClientConnectionLost when SIGTERM is detected; internal_flush()
and ProcessInterrupts() then act on that flag.
- A single pg_terminate_backend() reliably kills the backend.
secure_raw_write() uses a non-blocking socket and a loop of
select() with a timeout so that a received signal (SIGTERM)
is reliably detected.
To avoid frequent switching of the blocking mode, the bare socket
for Port is put into non-blocking mode from the start in
StreamConnection(), and blocking behavior is controlled only by
Port->noblock in secure_raw_read/write().
To keep the code mentioned above (patch 0002) tidy, the
socket emulation code for win32 backends is rewritten so that each
socket can have its own non-blocking state (patch 0001).
Some concerns about this patch:
- This patch limits the number of non-blocking sockets to fewer than
64 (FD_SETSIZE) on a win32 backend, but that seems to be sufficient.
- This patch introduces redundant socket emulation for the win32
backend, but the bare win32 socket for Port is already non-blocking
as described, so it doesn't seem to be a serious performance
problem. In addition, since I don't know the reason why
win32/socket.c provides the blocking-mode socket emulation, I
decided to keep the blocking socket emulation in win32/socket.c.
Possibly it can be removed.
Any suggestions?
regards,
--
Kyotaro Horiguchi
NTT Open Source Software Center
Attachments:
0001_Revise_socket_emulation_for_win32_backend.patch (text/x-patch; charset=us-ascii)
diff --git a/src/backend/libpq/pqcomm.c b/src/backend/libpq/pqcomm.c
index 605d891..c92851e 100644
--- a/src/backend/libpq/pqcomm.c
+++ b/src/backend/libpq/pqcomm.c
@@ -795,10 +795,6 @@ pq_set_nonblocking(bool nonblocking)
if (MyProcPort->noblock == nonblocking)
return;
-#ifdef WIN32
- pgwin32_noblock = nonblocking ? 1 : 0;
-#else
-
/*
* Use COMMERROR on failure, because ERROR would try to send the error to
* the client, which might require changing the mode again, leading to
@@ -816,7 +812,7 @@ pq_set_nonblocking(bool nonblocking)
ereport(COMMERROR,
(errmsg("could not set socket to blocking mode: %m")));
}
-#endif
+
MyProcPort->noblock = nonblocking;
}
diff --git a/src/backend/port/win32/socket.c b/src/backend/port/win32/socket.c
index c981169..f0ff3e7 100644
--- a/src/backend/port/win32/socket.c
+++ b/src/backend/port/win32/socket.c
@@ -21,11 +21,8 @@
* non-blocking mode in order to be able to deliver signals, we must
* specify this in a separate flag if we actually need non-blocking
* operation.
- *
- * This flag changes the behaviour *globally* for all socket operations,
- * so it should only be set for very short periods of time.
*/
-int pgwin32_noblock = 0;
+static fd_set nonblockset;
#undef socket
#undef accept
@@ -33,6 +30,7 @@ int pgwin32_noblock = 0;
#undef select
#undef recv
#undef send
+#undef closesocket
/*
* Blocking socket functions implemented so they listen on both
@@ -40,6 +38,34 @@ int pgwin32_noblock = 0;
*/
/*
+ * Set blocking mode for each socket
+ */
+void
+pgwin32_set_socket_nonblock(SOCKET s, int nonblock)
+{
+ if (nonblock)
+ FD_SET(s, &nonblockset);
+ else
+ FD_CLR(s, &nonblockset);
+
+ /*
+ * fd_set cannot have more than FD_SETSIZE entries. It's not likey to come
+ * close to this limit but if it goes above the limit, non blocking state
+ * of some existing sockets will be discarded.
+ */
+ if (nonblockset.fd_count >= FD_SETSIZE)
+ elog(FATAL, "Too many sockets requested to be nonblocking mode.");
+}
+
+void
+pgwin32_nonblockset_init()
+{
+ FD_ZERO(&nonblockset);
+}
+
+#define socket_is_nonblocking(s) FD_ISSET((s), &nonblockset)
+
+/*
* Convert the last socket error code into errno
*/
static void
@@ -256,6 +282,10 @@ pgwin32_socket(int af, int type, int protocol)
TranslateSocketError();
return INVALID_SOCKET;
}
+
+ /* newly cerated socket should be in blocking mode */
+ pgwin32_set_socket_nonblock(s, false);
+
errno = 0;
return s;
@@ -334,7 +364,7 @@ pgwin32_recv(SOCKET s, char *buf, int len, int f)
return -1;
}
- if (pgwin32_noblock)
+ if (socket_is_nonblocking(s))
{
/*
* No data received, and we are in "emulated non-blocking mode", so
@@ -420,7 +450,7 @@ pgwin32_send(SOCKET s, const void *buf, int len, int flags)
return -1;
}
- if (pgwin32_noblock)
+ if (socket_is_nonblocking(s))
{
/*
* No data sent, and we are in "emulated non-blocking mode", so
@@ -645,6 +675,15 @@ pgwin32_select(int nfds, fd_set *readfds, fd_set *writefds, fd_set *exceptfds, c
/*
+ * Unused entry in nonblockset needs to be removed when closing socket.
+ */
+int pgwin32_closesocket(SOCKET s)
+{
+ pgwin32_set_socket_nonblock(s, false);
+ return closesocket(s);
+}
+
+/*
* Return win32 error string, since strerror can't
* handle winsock codes
*/
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index c7f41a5..72e0576 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -3255,22 +3255,10 @@ PgstatCollectorMain(int argc, char *argv[])
/*
* Try to receive and process a message. This will not block,
* since the socket is set to non-blocking mode.
- *
- * XXX On Windows, we have to force pgwin32_recv to cooperate,
- * despite the previous use of pg_set_noblock() on the socket.
- * This is extremely broken and should be fixed someday.
*/
-#ifdef WIN32
- pgwin32_noblock = 1;
-#endif
-
len = recv(pgStatSock, (char *) &msg,
sizeof(PgStat_Msg), 0);
-#ifdef WIN32
- pgwin32_noblock = 0;
-#endif
-
if (len < 0)
{
if (errno == EAGAIN || errno == EWOULDBLOCK || errno == EINTR)
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index b190cf5..5d32de6 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -896,6 +896,10 @@ PostmasterMain(int argc, char *argv[])
*/
InitializeMaxBackends();
+#ifdef WIN32
+ pgwin32_nonblockset_init();
+#endif
+
/*
* Establish input sockets.
*/
diff --git a/src/include/port/win32.h b/src/include/port/win32.h
index 550c3ec..b0df45e 100644
--- a/src/include/port/win32.h
+++ b/src/include/port/win32.h
@@ -368,6 +368,7 @@ void pg_queue_signal(int signum);
#define select(n, r, w, e, timeout) pgwin32_select(n, r, w, e, timeout)
#define recv(s, buf, len, flags) pgwin32_recv(s, buf, len, flags)
#define send(s, buf, len, flags) pgwin32_send(s, buf, len, flags)
+#define closesocket(s) pgwin32_closesocket(s)
SOCKET pgwin32_socket(int af, int type, int protocol);
SOCKET pgwin32_accept(SOCKET s, struct sockaddr * addr, int *addrlen);
@@ -375,11 +376,12 @@ int pgwin32_connect(SOCKET s, const struct sockaddr * name, int namelen);
int pgwin32_select(int nfds, fd_set *readfs, fd_set *writefds, fd_set *exceptfds, const struct timeval * timeout);
int pgwin32_recv(SOCKET s, char *buf, int len, int flags);
int pgwin32_send(SOCKET s, const void *buf, int len, int flags);
+int pgwin32_closesocket(SOCKET s);
const char *pgwin32_socket_strerror(int err);
int pgwin32_waitforsinglesocket(SOCKET s, int what, int timeout);
-
-extern int pgwin32_noblock;
+void pgwin32_set_socket_nonblock(SOCKET s, int nonblock);
+void pgwin32_nonblockset_init();
/* in backend/port/win32/security.c */
extern int pgwin32_is_admin(void);
diff --git a/src/port/noblock.c b/src/port/noblock.c
index 1da0339..d6cc6a2 100644
--- a/src/port/noblock.c
+++ b/src/port/noblock.c
@@ -25,9 +25,18 @@ pg_set_noblock(pgsocket sock)
#else
unsigned long ioctlsocket_ret = 1;
+#ifndef FRONTEND
+ /*
+ * sockets on non-frontend processes on win32 is wrapped and blocking mode
+ * is controlled there. See socket.c for the details.
+ */
+ pgwin32_set_socket_nonblock(sock, true);
+ return 1;
+#else
/* Returns non-0 on failure, while fcntl() returns -1 on failure */
return (ioctlsocket(sock, FIONBIO, &ioctlsocket_ret) == 0);
-#endif
+#endif /* FRONTEND */
+#endif /* !WIN32 */
}
@@ -41,10 +50,16 @@ pg_set_block(pgsocket sock)
if (flags < 0 || fcntl(sock, F_SETFL, (long) (flags & ~O_NONBLOCK)))
return false;
return true;
-#else
+#else /* !WIN32 */
unsigned long ioctlsocket_ret = 0;
+#ifndef FRONTEND
+ /* See pg_set_noblock */
+ pgwin32_set_socket_nonblock(sock, false);
+ return 1;
+#else
/* Returns non-0 on failure, while fcntl() returns -1 on failure */
return (ioctlsocket(sock, FIONBIO, &ioctlsocket_ret) == 0);
-#endif
+#endif /* FRONTEND */
+#endif /* !WIN32 */
}
0002_Allow_backend_termination_during_write_blocking.patch (text/x-patch; charset=us-ascii)
diff --git a/src/backend/libpq/be-secure.c b/src/backend/libpq/be-secure.c
index 41ec1ad..fbb4c47 100644
--- a/src/backend/libpq/be-secure.c
+++ b/src/backend/libpq/be-secure.c
@@ -34,7 +34,7 @@
#include "libpq/libpq.h"
#include "tcop/tcopprot.h"
#include "utils/memutils.h"
-
+#include "miscadmin.h"
char *ssl_cert_file;
char *ssl_key_file;
@@ -140,6 +140,10 @@ secure_read(Port *port, void *ptr, size_t len)
return n;
}
+/*
+ * Read data from socket.
+ * This emulates blocking behavior using non-blocking sockets.
+ */
ssize_t
secure_raw_read(Port *port, void *ptr, size_t len)
{
@@ -147,8 +151,34 @@ secure_raw_read(Port *port, void *ptr, size_t len)
prepare_for_client_read();
- n = recv(port->sock, ptr, len, 0);
+ if (port->noblock)
+ n = recv(port->sock, ptr, len, 0);
+ else
+ {
+ do
+ {
+ fd_set rfds;
+
+ FD_ZERO(&rfds);
+ FD_SET(port->sock, &rfds);
+ /*
+ * In contrast to secure_raw_write, this section runs with
+ * ImmediateInterruptOK = true so we can wait forever in
+ * select.
+ */
+ n = select(port->sock + 1, &rfds, NULL, NULL, NULL);
+ if (n < 0) break;
+
+ n = recv(port->sock, ptr, len, 0);
+
+ /*
+ * We should have something to read here so EAGAIN/EWOULDBLOCK is
+ * likey not to be seen. But we check them here not to return
+ * these error numbers for blocking sockets for the caller.
+ */
+ } while (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK));
+ }
client_read_ended();
return n;
@@ -178,5 +208,77 @@ secure_write(Port *port, void *ptr, size_t len)
ssize_t
secure_raw_write(Port *port, const void *ptr, size_t len)
{
- return send(port->sock, ptr, len, 0);
+ int ret = 0;
+
+ /*
+ * Port socket is always in non-blocking mode. See StreamConnection for
+ * the details.
+ */
+ ret = send(port->sock, ptr, len, 0);
+
+ /* We can return here regardless of blocking mode in the most cases */
+ if (port->noblock || ret > 0 || len == 0)
+ return ret;
+
+ /* Here, we shold block waiting for the room in send buffer. */
+ while(ret < 1 && !ProcDiePending)
+ {
+ fd_set wfds;
+ struct timeval tv;
+ int i = 0;
+
+ FD_ZERO(&wfds);
+ tv.tv_usec = 0;
+
+ /*
+ * We may get terminate signal (SIGTERM) during write blocking. If we
+ * check ProcDiePending then wait by select indefinitely, SIGTERM
+ * comes after the check and before the select will be pending and we
+ * should wait the second SIGTERM. So we periodically wake up to check
+ * ProcDiePending in order to catch the signal surely. The timeout
+ * for the select is the maximum delay of handling the signal. 1
+ * seconds groundlessly seems to be appropreate.
+ */
+ do
+ {
+ FD_SET(port->sock, &wfds);
+ tv.tv_sec = 1;
+ tv.tv_usec = 0;
+
+ ret = select(port->sock + 1, NULL, &wfds, NULL, &tv);
+ } while (!ProcDiePending && ret == 0);
+
+ if (ProcDiePending || ret < 0)
+ break;
+
+ ret = send(port->sock, ptr, len, 0);
+ if (ProcDiePending)
+ break;
+ if (ret < 0)
+ {
+ if (errno != EAGAIN && errno != EWOULDBLOCK)
+ break;
+
+ /*
+ * This loop might run a busy loop if send(2) returned EAGAIN or
+ * EWOULDBLOCK after select(2) returned normally. Sleep expressly
+ * to avoid the busy loop.
+ */
+ pg_usleep(200000L); /* 200 ms */
+ ret = 0;
+ }
+ }
+
+ if (ProcDiePending)
+ {
+ /*
+ * Allow to terminate this backend. ClientConnectionLost prevents any
+ * more bytes including error messages from being sent to
+ * client. errno is set in order to teach ssl layer not to retry.
+ */
+ ClientConnectionLost = 1;
+ errno = ECONNRESET;
+ }
+
+ return ret;
}
diff --git a/src/backend/libpq/pqcomm.c b/src/backend/libpq/pqcomm.c
index c92851e..8387d6a 100644
--- a/src/backend/libpq/pqcomm.c
+++ b/src/backend/libpq/pqcomm.c
@@ -718,6 +718,17 @@ StreamConnection(pgsocket server_fd, Port *port)
(void) pq_setkeepalivescount(tcp_keepalives_count, port);
}
+ /*
+ * Put this socket to non-blocking mode. Blocking behavior is emulated in
+ * secure_write() and secure_read().
+ * Use COMMERROR on failure, because ERROR would try to send the error to
+ * the client, which might require changing the mode again, leading to
+ * infinite recursion.
+ */
+ if (!pg_set_noblock(port->sock))
+ ereport(COMMERROR,
+ (errmsg("could not set socket to nonblocking mode: %m")));
+
return STATUS_OK;
}
@@ -792,27 +803,6 @@ TouchSocketFiles(void)
static void
pq_set_nonblocking(bool nonblocking)
{
- if (MyProcPort->noblock == nonblocking)
- return;
-
- /*
- * Use COMMERROR on failure, because ERROR would try to send the error to
- * the client, which might require changing the mode again, leading to
- * infinite recursion.
- */
- if (nonblocking)
- {
- if (!pg_set_noblock(MyProcPort->sock))
- ereport(COMMERROR,
- (errmsg("could not set socket to nonblocking mode: %m")));
- }
- else
- {
- if (!pg_set_block(MyProcPort->sock))
- ereport(COMMERROR,
- (errmsg("could not set socket to blocking mode: %m")));
- }
-
MyProcPort->noblock = nonblocking;
}
@@ -1249,34 +1239,38 @@ internal_flush(void)
if (r <= 0)
{
- if (errno == EINTR)
- continue; /* Ok if we were interrupted */
-
- /*
- * Ok if no data writable without blocking, and the socket is in
- * non-blocking mode.
- */
- if (errno == EAGAIN ||
- errno == EWOULDBLOCK)
+ if (!ClientConnectionLost)
{
+ if (errno == EINTR)
+ continue; /* Ok if we were interrupted */
+
+ /*
+ * Ok if no data writable without blocking, and the socket is in
+ * non-blocking mode.
+ */
+ if (errno == EAGAIN ||
+ errno == EWOULDBLOCK)
+ {
return 0;
- }
-
- /*
- * Careful: an ereport() that tries to write to the client would
- * cause recursion to here, leading to stack overflow and core
- * dump! This message must go *only* to the postmaster log.
- *
- * If a client disconnects while we're in the midst of output, we
- * might write quite a bit of data before we get to a safe query
- * abort point. So, suppress duplicate log messages.
- */
- if (errno != last_reported_send_errno)
- {
- last_reported_send_errno = errno;
- ereport(COMMERROR,
- (errcode_for_socket_access(),
- errmsg("could not send data to client: %m")));
+ }
+
+ /*
+ * Careful: an ereport() that tries to write to the client
+ * would cause recursion to here, leading to stack overflow
+ * and core dump! This message must go *only* to the
+ * postmaster log.
+ *
+ * If a client disconnects while we're in the midst of output,
+ * we might write quite a bit of data before we get to a safe
+ * query abort point. So, suppress duplicate log messages.
+ */
+ if (errno != last_reported_send_errno)
+ {
+ last_reported_send_errno = errno;
+ ereport(COMMERROR,
+ (errcode_for_socket_access(),
+ errmsg("could not send data to client: %m")));
+ }
}
/*
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 5d32de6..d979191 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -5789,6 +5789,13 @@ read_inheritable_socket(SOCKET *dest, InheritableSocket *src)
*dest = s;
/*
+ * We didn't inherit emulated blocking mode but port socket should be
+ * always in nonblocking mode. pg_set_noblock() on win32 backend won't
+ * return error.
+ */
+ pg_set_noblock(s);
+
+ /*
* To make sure we don't get two references to the same socket, close
* the original one. (This would happen when inheritance actually
* works..
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 7b5480f..1d252e7 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -2840,8 +2840,16 @@ ProcessInterrupts(void)
ImmediateInterruptOK = false; /* not idle anymore */
DisableNotifyInterrupt();
DisableCatchupInterrupt();
- /* As in quickdie, don't risk sending to client during auth */
- if (ClientAuthInProgress && whereToSendOutput == DestRemote)
+ /*
+ * As in quickdie, don't risk sending to client during auth. In
+ * addition to that, don't try to send any more to client if current
+ * connection is marked as ClientConnectionLost. It will lead to
+ * protocol violation if the truth is that the connection is living
+ * and amid sending data. Such case will occur if this backend was
+ * terminated during waiting for query result to be sent.
+ */
+ if ((ClientAuthInProgress && whereToSendOutput == DestRemote) ||
+ ClientConnectionLost)
whereToSendOutput = DestNone;
if (IsAutoVacuumWorkerProcess())
ereport(FATAL,
On 08/28/2014 03:47 PM, Kyotaro HORIGUCHI wrote:
To make the code mentioned above (Patch 0002) tidy, rewrite the
socket emulation code for win32 backends so that each socket
can have its own non-blocking state. (patch 0001)
The first patch that makes non-blocking sockets behave more sanely on
Windows seems like a good idea, independently of the second patch. I'm
looking at the first patch now, I'll make a separate post about the
second patch.
Some concerns about this patch:
- This patch limits the number of non-blocking sockets to fewer than
64 (FD_SETSIZE) on a win32 backend, but that seems to be sufficient.
Yeah, that's plenty.
- This patch introduces redundant socket emulation for the win32
backend, but the bare win32 socket for Port is already non-blocking
as described, so it doesn't seem to be a serious performance
problem. In addition, since I don't know the reason why
win32/socket.c provides the blocking-mode socket emulation, I
decided to keep the blocking socket emulation in win32/socket.c.
Possibly it can be removed.
On Windows, the backend has an emulation layer for POSIX signals, which
uses threads and Windows events. The reason win32/socket.c always uses
non-blocking mode internally is that it needs to wait for the socket to
become readable/writeable, and for the signal-emulation event, at the
same time. So no, we can't remove it.
The approach taken in the first patch seems sensible. I changed it to
not use FD_SET, though. A custom array seems better; that way we don't
need the pgwin32_nonblockset_init() call, we can just initialize the
variable. It's a little bit more code, but it's well-contained in
win32/socket.c. Please take a look, to double-check that I didn't screw up.
- Heikki
Attachments:
improve-nonblocking-sockets-on-windows-1.patch (text/x-diff)
commit aaaec23b08677baaed900f72db3f9628c0070922
Author: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Tue Sep 2 20:05:47 2014 +0300
Improve non-blocking sockets emulation on Windows.
diff --git a/src/backend/libpq/pqcomm.c b/src/backend/libpq/pqcomm.c
index 605d891..cba79a7 100644
--- a/src/backend/libpq/pqcomm.c
+++ b/src/backend/libpq/pqcomm.c
@@ -795,10 +795,6 @@ pq_set_nonblocking(bool nonblocking)
if (MyProcPort->noblock == nonblocking)
return;
-#ifdef WIN32
- pgwin32_noblock = nonblocking ? 1 : 0;
-#else
-
/*
* Use COMMERROR on failure, because ERROR would try to send the error to
* the client, which might require changing the mode again, leading to
@@ -816,7 +812,6 @@ pq_set_nonblocking(bool nonblocking)
ereport(COMMERROR,
(errmsg("could not set socket to blocking mode: %m")));
}
-#endif
MyProcPort->noblock = nonblocking;
}
diff --git a/src/backend/port/win32/socket.c b/src/backend/port/win32/socket.c
index c981169..51982f0 100644
--- a/src/backend/port/win32/socket.c
+++ b/src/backend/port/win32/socket.c
@@ -3,6 +3,9 @@
* socket.c
* Microsoft Windows Win32 Socket Functions
*
+ * Blocking socket functions implemented so they listen on both the socket
+ * and the signal event, required for signal handling.
+ *
* Portions Copyright (c) 1996-2014, PostgreSQL Global Development Group
*
* IDENTIFICATION
@@ -14,18 +17,19 @@
#include "postgres.h"
/*
- * Indicate if pgwin32_recv() and pgwin32_send() should operate
- * in non-blocking mode.
- *
- * Since the socket emulation layer always sets the actual socket to
- * non-blocking mode in order to be able to deliver signals, we must
- * specify this in a separate flag if we actually need non-blocking
- * operation.
- *
- * This flag changes the behaviour *globally* for all socket operations,
- * so it should only be set for very short periods of time.
+ * Keep track which sockets are in non-blocking mode. Since the socket
+ * emulation layer always sets the actual socket to non-blocking mode in
+ * order to be able to deliver signals, we must track ourselves whether to
+ * present blocking or non-blocking behavior to the callers of pgwin32_recv()
+ * and pgwin32_send(). We expect there to be only a few non-blocking sockets,
+ * so a small array will do.
*/
-int pgwin32_noblock = 0;
+#define MAX_NONBLOCKING_SOCKETS 10
+
+static SOCKET nonblocking_sockets[MAX_NONBLOCKING_SOCKETS];
+static int num_nonblocking_sockets = 0;
+
+static bool pgwin32_socket_is_nonblocking(SOCKET s);
#undef socket
#undef accept
@@ -33,11 +37,57 @@ int pgwin32_noblock = 0;
#undef select
#undef recv
#undef send
+#undef closesocket
/*
- * Blocking socket functions implemented so they listen on both
- * the socket and the signal event, required for signal handling.
+ * Add a socket to the list of non-blocking sockets.
*/
+void
+pgwin32_set_socket_noblock(SOCKET s)
+{
+ if (pgwin32_socket_is_nonblocking(s))
+ return; /* already non-nlocking */
+
+ if (num_nonblocking_sockets >= MAX_NONBLOCKING_SOCKETS)
+ elog(ERROR, "too many non-blocking sockets");
+
+ nonblocking_sockets[num_nonblocking_sockets++] = s;
+}
+
+/*
+ * Remove a socket from the list of non-blocking sockets.
+ */
+void
+pgwin32_set_socket_block(SOCKET s)
+{
+ int i;
+
+ for (i = 0; i < num_nonblocking_sockets; i++)
+ {
+ if (nonblocking_sockets[i] == s)
+ {
+ /* Found. Move the last entry to this slot. */
+ if (i != num_nonblocking_sockets - 1)
+ nonblocking_sockets[i] =
+ nonblocking_sockets[num_nonblocking_sockets - 1];
+ num_nonblocking_sockets--;
+ break;
+ }
+ }
+}
+
+static bool
+pgwin32_socket_is_nonblocking(SOCKET s)
+{
+ int i;
+
+ for (i = 0; i < num_nonblocking_sockets; i++)
+ {
+ if (nonblocking_sockets[i] == s)
+ return true;
+ }
+ return false;
+}
/*
* Convert the last socket error code into errno
@@ -334,7 +384,7 @@ pgwin32_recv(SOCKET s, char *buf, int len, int f)
return -1;
}
- if (pgwin32_noblock)
+ if (pgwin32_socket_is_nonblocking(s))
{
/*
* No data received, and we are in "emulated non-blocking mode", so
@@ -420,7 +470,7 @@ pgwin32_send(SOCKET s, const void *buf, int len, int flags)
return -1;
}
- if (pgwin32_noblock)
+ if (pgwin32_socket_is_nonblocking(s))
{
/*
* No data sent, and we are in "emulated non-blocking mode", so
@@ -645,6 +695,16 @@ pgwin32_select(int nfds, fd_set *readfds, fd_set *writefds, fd_set *exceptfds, c
/*
+ * If the socket is in non-blocking mode, remove the entry from the array.
+ */
+int
+pgwin32_closesocket(SOCKET s)
+{
+ pgwin32_set_socket_block(s);
+ return closesocket(s);
+}
+
+/*
* Return win32 error string, since strerror can't
* handle winsock codes
*/
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index c7f41a5..32be003 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -3255,22 +3255,9 @@ PgstatCollectorMain(int argc, char *argv[])
/*
* Try to receive and process a message. This will not block,
* since the socket is set to non-blocking mode.
- *
- * XXX On Windows, we have to force pgwin32_recv to cooperate,
- * despite the previous use of pg_set_noblock() on the socket.
- * This is extremely broken and should be fixed someday.
*/
-#ifdef WIN32
- pgwin32_noblock = 1;
-#endif
-
len = recv(pgStatSock, (char *) &msg,
sizeof(PgStat_Msg), 0);
-
-#ifdef WIN32
- pgwin32_noblock = 0;
-#endif
-
if (len < 0)
{
if (errno == EAGAIN || errno == EWOULDBLOCK || errno == EINTR)
diff --git a/src/include/port/win32.h b/src/include/port/win32.h
index 550c3ec..e3c0473 100644
--- a/src/include/port/win32.h
+++ b/src/include/port/win32.h
@@ -368,6 +368,7 @@ void pg_queue_signal(int signum);
#define select(n, r, w, e, timeout) pgwin32_select(n, r, w, e, timeout)
#define recv(s, buf, len, flags) pgwin32_recv(s, buf, len, flags)
#define send(s, buf, len, flags) pgwin32_send(s, buf, len, flags)
+#define closesocket(s) pgwin32_closesocket(s)
SOCKET pgwin32_socket(int af, int type, int protocol);
SOCKET pgwin32_accept(SOCKET s, struct sockaddr * addr, int *addrlen);
@@ -375,11 +376,12 @@ int pgwin32_connect(SOCKET s, const struct sockaddr * name, int namelen);
int pgwin32_select(int nfds, fd_set *readfs, fd_set *writefds, fd_set *exceptfds, const struct timeval * timeout);
int pgwin32_recv(SOCKET s, char *buf, int len, int flags);
int pgwin32_send(SOCKET s, const void *buf, int len, int flags);
+int pgwin32_closesocket(SOCKET s);
const char *pgwin32_socket_strerror(int err);
int pgwin32_waitforsinglesocket(SOCKET s, int what, int timeout);
-
-extern int pgwin32_noblock;
+void pgwin32_set_socket_block(SOCKET s);
+void pgwin32_set_socket_noblock(SOCKET s);
/* in backend/port/win32/security.c */
extern int pgwin32_is_admin(void);
diff --git a/src/port/noblock.c b/src/port/noblock.c
index 1da0339..c8b35e0 100644
--- a/src/port/noblock.c
+++ b/src/port/noblock.c
@@ -22,11 +22,21 @@ pg_set_noblock(pgsocket sock)
{
#if !defined(WIN32)
return (fcntl(sock, F_SETFL, O_NONBLOCK) != -1);
-#else
+
+#elif defined(FRONTEND)
unsigned long ioctlsocket_ret = 1;
/* Returns non-0 on failure, while fcntl() returns -1 on failure */
return (ioctlsocket(sock, FIONBIO, &ioctlsocket_ret) == 0);
+
+#else
+ /*
+ * Sockets in Windows backend processes are wrapped, and blocking mode is
+ * handled by the emulation layer. See src/backend/port/win32/socket.c for
+ * details.
+ */
+ pgwin32_set_socket_noblock(sock);
+ return 1;
#endif
}
@@ -41,10 +51,16 @@ pg_set_block(pgsocket sock)
if (flags < 0 || fcntl(sock, F_SETFL, (long) (flags & ~O_NONBLOCK)))
return false;
return true;
-#else
+
+#elif FRONTEND
unsigned long ioctlsocket_ret = 0;
/* Returns non-0 on failure, while fcntl() returns -1 on failure */
return (ioctlsocket(sock, FIONBIO, &ioctlsocket_ret) == 0);
+
+#else
+ /* See pg_set_noblock */
+ pgwin32_set_socket_block(sock);
+ return 1;
#endif
}
On 08/28/2014 03:47 PM, Kyotaro HORIGUCHI wrote:
- Preventing protocol violation.
To prevent a protocol violation, secure_write() sets
ClientConnectionLost when SIGTERM is detected; internal_flush()
and ProcessInterrupts() then act on that flag.
Oh, hang on. Now that I look at pqcomm.c more closely, it already has a
mechanism to avoid writing a message in the middle of another message.
See pq_putmessage and PqCommBusy. Can we rely on that?
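The kind of guard mentioned above can be sketched as a reentrancy flag around message assembly. This is a simplified, self-contained illustration and not the pqcomm.c code: comm_busy plays the role of PqCommBusy, while put_message(), mid_message_hook, the one-byte length framing, and the demo function are hypothetical. The hook simulates an interrupt that tries to send its own message while another one is half written.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define OUT_BUF_SIZE 64

static char   out_buf[OUT_BUF_SIZE];
static size_t out_len = 0;
static bool   comm_busy = false;            /* plays the role of PqCommBusy */

/* Optional hook run in the middle of a message; stands in for an interrupt. */
static void (*mid_message_hook)(void) = NULL;

/*
 * Append one framed message (type byte, one-byte length, payload) to the
 * output buffer.  If another message is already being assembled, the call
 * is silently skipped so that the outer message is never split.
 * Returns 0 on success or when skipped, -1 if the message does not fit.
 */
int
put_message(char type, const char *payload, size_t len)
{
    assert(payload != NULL || len == 0);

    if (comm_busy)
        return 0;
    if (len > 255 || out_len + 2 + len > OUT_BUF_SIZE)
        return -1;

    comm_busy = true;
    out_buf[out_len++] = type;
    out_buf[out_len++] = (char) len;
    if (mid_message_hook != NULL)
        mid_message_hook();                 /* "interrupt" fires here */
    if (len > 0)
        memcpy(out_buf + out_len, payload, len);
    out_len += len;
    comm_busy = false;
    return 0;
}

/* Simulated interrupt path that tries to send an error message. */
static void
interrupt_tries_to_send(void)
{
    (void) put_message('E', "die", 3);
}

/* Returns the number of buffered bytes after an interrupted put_message(). */
size_t
demo_interrupted_message(void)
{
    out_len = 0;
    mid_message_hook = interrupt_tries_to_send;
    (void) put_message('D', "row", 3);
    mid_message_hook = NULL;
    return out_len;
}
```

In the demo only the outer message ends up in the buffer; the nested attempt is dropped instead of being written into the middle of it.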
- Single pg_terminate_backend surely kills the backend.
secure_raw_write() uses a non-blocking socket and a loop of
select() with a timeout so that a received signal (SIGTERM)
is reliably detected.
I was going to suggest using WaitLatchOrSocket instead of sleeping in
1-second increments, but I see that WaitLatchOrSocket() doesn't currently
support waiting for a socket to become writeable, without also waiting
for it to become readable. I wonder how difficult it would be to lift
that restriction.
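For illustration, waiting for "socket writable or wakeup" without also waiting for readability can be sketched with poll() and a self-pipe. This is not how WaitLatchOrSocket() is implemented or what its API looks like; wait_writable_or_wakeup() and the WAKE_* flags are hypothetical names, and wake_fd is assumed to be the read end of a pipe that a signal handler writes a byte to.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <unistd.h>

#define WAKE_SOCKET_WRITABLE 1
#define WAKE_LATCH           2

/*
 * Wait until sock becomes writable or a byte arrives on wake_fd.
 * Returns a bitmask of WAKE_* flags, 0 on timeout, or -1 on error.
 * Note that the timeout starts over if poll() is interrupted by a signal.
 */
int
wait_writable_or_wakeup(int sock, int wake_fd, int timeout_ms)
{
    struct pollfd fds[2];
    int           rc;
    int           result = 0;

    assert(sock >= 0 && wake_fd >= 0);

    fds[0].fd = sock;
    fds[0].events = POLLOUT;        /* writability only, not POLLIN */
    fds[0].revents = 0;
    fds[1].fd = wake_fd;
    fds[1].events = POLLIN;
    fds[1].revents = 0;

    do
    {
        rc = poll(fds, 2, timeout_ms);
    } while (rc < 0 && errno == EINTR);
    if (rc < 0)
        return -1;

    /* Report errors and hangups as "writable" so the caller's send() sees them. */
    if (fds[0].revents & (POLLOUT | POLLERR | POLLHUP))
        result |= WAKE_SOCKET_WRITABLE;
    if (fds[1].revents & POLLIN)
        result |= WAKE_LATCH;
    return result;
}
```

A caller would loop: try send(), and on EAGAIN call this function, then check its termination flag whenever WAKE_LATCH is reported. That removes the fixed 1-second polling delay from the select()-based loop.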
I also wonder if it would be simpler to keep the socket in blocking mode
after all, and just close() in the signal handler if PqCommBusy == true.
If the signal arrives while sleeping in send(), the effect would be the
same as with your patch. If the signal arrives while sending, but not
sleeping, you would not get the chance to send the already-buffered data
to the client. But maybe that's OK, whether or not you're blocked is not
very deterministic anyway.
- Heikki
On 2014-09-02 21:46:29 +0300, Heikki Linnakangas wrote:
I was going to suggest using WaitLatchOrSocket instead of sleeping in
1-second increments, but I see that WaitLatchOrSocket() doesn't currently
support waiting for a socket to become writeable, without also waiting for
it to become readable. I wonder how difficult it would be to lift that
restriction.
It's imo not that difficult. I've a prototype patch for that
somewhere. I tested my poll() implementation and it worked, but didn't
yet get to select().
I also wonder if it would be simpler to keep the socket in blocking mode
after all, and just close() in the signal handler if PqCommBusy == true. If
the signal arrives while sleeping in send(), the effect would be the same as
with your patch. If the signal arrives while sending, but not sleeping, you
would not get the chance to send the already-buffered data to the client.
But maybe that's OK, whether or not you're blocked is not very deterministic
anyway.
I've actually been working on a patch to make the whole interaction with
the client using sockets. The reason I started on this is that we do lots
of far too complex stuff in signal handlers, and using a latch would allow us to
instead interrupt send/recv. While still heavily WIP the reduction in
odd stuff (check e.g. HandleCatchupInterrupt()) made me rather happy.
I'm slightly worried about the added overhead due to the latch code. In
my implementation I only use latches after a nonblocking read, but
still. Every WaitLatchOrSocket() does a drainSelfPipe(). I wonder if
that can be made problematic.
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 2014-09-02 21:01:44 +0200, Andres Freund wrote:
I've actually been working on a patch to make the whole interaction with
the client using sockets. The reason I started on this is that we do lots
of far too complex stuff in signal handlers, and using a latch would allow us to
instead interrupt send/recv. While still heavily WIP the reduction in
odd stuff (check e.g. HandleCatchupInterrupt()) made me rather happy.
Actually, the even more important reason is that that would allow us to
throw errors/fatals sanely in interrupts because we wouldn't possibly
jump through openssl anymore...
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Heikki Linnakangas <hlinnakangas@vmware.com> writes:
I was going to suggest using WaitLatchOrSocket instead of sleeping in 1
second increment, but I see that WaitLatchOrSocket() doesn't currently
support waiting for a socket to become writeable, without also waiting
for it to become readable. I wonder how difficult it would be to lift
that restriction.
My recollection is that there was a reason for that, but I don't recall
details any more.
regards, tom lane
On 2014-09-02 17:21:03 -0400, Tom Lane wrote:
Heikki Linnakangas <hlinnakangas@vmware.com> writes:
I was going to suggest using WaitLatchOrSocket instead of sleeping in 1
second increment, but I see that WaitLatchOrSocket() doesn't currently
support waiting for a socket to become writeable, without also waiting
for it to become readable. I wonder how difficult it would be to lift
that restriction.
My recollection is that there was a reason for that, but I don't recall
details any more.
http://git.postgresql.org/pg/commitdiff/e42a21b9e6c9b9e6346a34b62628d48ff2fc6ddf
In my prototype I've changed the API so that errors set both
READABLE/WRITABLE. Seems to work....
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 09/03/2014 12:23 AM, Andres Freund wrote:
On 2014-09-02 17:21:03 -0400, Tom Lane wrote:
Heikki Linnakangas <hlinnakangas@vmware.com> writes:
I was going to suggest using WaitLatchOrSocket instead of sleeping in 1
second increment, but I see that WaitLatchOrSocket() doesn't currently
support waiting for a socket to become writeable, without also waiting
for it to become readable. I wonder how difficult it would be to lift
that restriction.
My recollection is that there was a reason for that, but I don't recall
details any more.
http://git.postgresql.org/pg/commitdiff/e42a21b9e6c9b9e6346a34b62628d48ff2fc6ddf
In my prototype I've changed the API so that errors set both
READABLE/WRITABLE. Seems to work....
Andres, would you mind posting the WIP patch you have? That could be a
better foundation for this patch.
- Heikki
On Tue, Sep 2, 2014 at 3:01 PM, Andres Freund <andres@2ndquadrant.com> wrote:
I'm slightly worried about the added overhead due to the latch code. In
my implementation I only use latches after a nonblocking read, but
still. Every WaitLatchOrSocket() does a drainSelfPipe(). I wonder if
that can be made problematic.
I think that's not the word you're looking for. Or if it is, then -
it's already problematic. At some point I hacked up a very crude
prototype that made LWLocks use latches to sleep instead of
semaphores. It was slow.
AIUI, the only reason why we need the self-pipe thing is because on
some platforms signals don't interrupt system calls. But my
impression was that those platforms were somewhat obscure. Could we
have a separate latch implementation for platforms where we know that
system calls will get interrupted by signals? Alternatively, should
we consider reimplementing latches using semaphores? I assume having
the signal handler up the semaphore would allow the attempt to down
the semaphore to succeed on return from the handler, so it would
accomplish the same thing as the self-pipe trick.
Basically, it doesn't feel like a good thing that we've got two sets
of primitives for making a backend wait that (1) don't really know
about each other and (2) use different operating system primitives.
Presumably one of the two systems is better; let's figure out which
one it is, use that one all the time, and get rid of the other one.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 09/04/2014 03:49 PM, Robert Haas wrote:
On Tue, Sep 2, 2014 at 3:01 PM, Andres Freund <andres@2ndquadrant.com> wrote:
I'm slightly worried about the added overhead due to the latch code. In
my implementation I only use latches after a nonblocking read, but
still. Every WaitLatchOrSocket() does a drainSelfPipe(). I wonder if
that can be made problematic.
I think that's not the word you're looking for. Or if it is, then -
it's already problematic. At some point I hacked up a very crude
prototype that made LWLocks use latches to sleep instead of
semaphores. It was slow.
Hmm. Perhaps we should call drainSelfPipe() only after poll/select
returns saying that there is something in the self-pipe. That would be a
win assuming it's more common for the self-pipe to be empty.
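
A minimal sketch of that idea, assuming a plain POSIX self-pipe and poll(). The names selfpipe_init, selfpipe_wake, selfpipe_drain and wait_wakeup_or_writable are hypothetical and are not PostgreSQL's latch API; the point is only that the read() calls that drain the pipe happen solely when poll() has reported the pipe readable.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <stdbool.h>
#include <unistd.h>

/* Hypothetical self-pipe descriptors (not PostgreSQL's). */
static int selfpipe_readfd = -1;
static int selfpipe_writefd = -1;

/* Create the pipe and make both ends non-blocking. Returns 0 on success. */
int
selfpipe_init(void)
{
    int fds[2];

    if (pipe(fds) < 0)
        return -1;
    if (fcntl(fds[0], F_SETFL, O_NONBLOCK) < 0 ||
        fcntl(fds[1], F_SETFL, O_NONBLOCK) < 0)
        return -1;
    selfpipe_readfd = fds[0];
    selfpipe_writefd = fds[1];
    return 0;
}

/* Async-signal-safe wakeup: write one byte; a full pipe is not an error. */
void
selfpipe_wake(void)
{
    int save_errno = errno;
    char c = 0;

    if (write(selfpipe_writefd, &c, 1) < 0)
    {
        /* EAGAIN means a wakeup byte is already queued; nothing to do. */
    }
    errno = save_errno;
}

/* Read until the pipe is empty; returns the number of bytes drained. */
int
selfpipe_drain(void)
{
    char buf[16];
    int total = 0;
    ssize_t n;

    while ((n = read(selfpipe_readfd, buf, sizeof(buf))) > 0)
        total += (int) n;
    return total;
}

/*
 * Wait up to timeout_ms for a wakeup or for sock to become writable.
 * The pipe is drained only when poll() reports it readable, so the
 * common "pipe is empty" case costs no extra read() call.
 * Returns true if a wakeup was seen; *writable reports the socket state.
 */
bool
wait_wakeup_or_writable(int sock, int timeout_ms, bool *writable)
{
    struct pollfd pfd[2];
    int rc;

    pfd[0].fd = selfpipe_readfd;
    pfd[0].events = POLLIN;
    pfd[0].revents = 0;
    pfd[1].fd = sock;           /* a negative fd is ignored by poll() */
    pfd[1].events = POLLOUT;
    pfd[1].revents = 0;

    rc = poll(pfd, 2, timeout_ms);
    *writable = (rc > 0 && (pfd[1].revents & POLLOUT) != 0);
    if (rc > 0 && (pfd[0].revents & POLLIN) != 0)
    {
        selfpipe_drain();
        return true;
    }
    return false;
}
```

In this sketch a signal handler would call selfpipe_wake(), and the waiter would call wait_wakeup_or_writable() in its loop. Whether skipping the unconditional drain is a measurable win depends on how often the pipe is actually non-empty.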
AIUI, the only reason why we need the self-pipe thing is because on
some platforms signals don't interrupt system calls.
That's not the only reason. It also eliminates the race condition that
someone might set the latch after we've checked that it's not set, but
before calling poll/select. The same reason that ppoll and pselect exist.
But my
impression was that those platforms were somewhat obscure. Could we
have a separate latch implementation for platforms where we know that
system calls will get interrupted by signals?
... and have ppoll or pselect. Yeah, seems reasonable, assuming that
ppoll/pselect is faster.
Alternatively, should
we consider reimplementing latches using semaphores? I assume having
the signal handler up the semaphore would allow the attempt to down
the semaphore to succeed on return from the handler, so it would
accomplish the same thing as the self-pipe trick.
I don't think there's a function to wait for a file descriptor or
semaphore at the same time.
- Heikki
On Thu, Sep 4, 2014 at 9:05 AM, Heikki Linnakangas
<hlinnakangas@vmware.com> wrote:
Hmm. Perhaps we should call drainSelfPipe() only after poll/select returns
saying that there is something in the self-pipe. That would be a win
assuming it's more common for the self-pipe to be empty.
Couldn't hurt.
But my
impression was that those platforms were somewhat obscure. Could we
have a separate latch implementation for platforms where we know that
system calls will get interrupted by signals?
... and have ppoll or pselect. Yeah, seems reasonable, assuming that
ppoll/pselect is faster.
Hrm. So we'd have to block SIGUSR1, check the flag, then use
pselect() to temporarily unblock SIGUSR1 and wait, then on return
again unblock SIGUSR1? Doesn't seem very appealing. I think changing
the signal mask is fast on Linux, but quite slow on at least some
other UNIX-like platforms. And I've heard that pselect() isn't always
truly atomic, so we might run into platform-specific bugs, too. I
wonder if there's a better way e.g. using memory barriers.
WaitLatch: check is_set. if yes then done. otherwise, set signal_me.
memory barrier. recheck is_set. if not set then wait using
poll/select. memory barrier. clear signal_me.
SetLatch: check is_set. if yes then done. otherwise, set is_set.
memory barrier. check signal_me. if set, then send SIGUSR1.
Alternatively, should
we consider reimplementing latches using semaphores? I assume having
the signal handler up the semaphore would allow the attempt to down
the semaphore to succeed on return from the handler, so it would
accomplish the same thing as the self-pipe trick.
I don't think there's a function to wait for a file descriptor or semaphore
at the same time.
Oh, good point. So that's out, then.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 09/04/2014 04:37 PM, Robert Haas wrote:
Hrm. So we'd have to block SIGUSR1, check the flag, then use
pselect() to temporarily unblock SIGUSR1 and wait, then on return
again unblock SIGUSR1? Doesn't seem very appealing. I think changing
the signal mask is fast on Linux, but quite slow on at least some
other UNIX-like platforms. And I've heard that pselect() isn't always
truly atomic, so we might run into platform-specific bugs, too. I
wonder if there's a better way e.g. using memory barriers.
WaitLatch: check is_set. if yes then done. otherwise, set signal_me.
memory barrier. recheck is_set. if not set then wait using
poll/select. memory barrier. clear signal_me.
SetLatch: check is_set. if yes then done. otherwise, set is_set.
memory barrier. check signal_me. if set, then send SIGUSR1.
Doesn't work. No matter what you do, the process running WaitLatch might
receive the signal immediately before it calls poll/select. The signal
handler will run, and the poll/select call will then go to sleep. There
is no way to do this without support from the kernel, that is why
ppoll/pselect exist.
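
A minimal sketch of the kernel-supported pattern, assuming POSIX pselect() and a flag set only by a SIGUSR1 handler. The names latch_is_set, latch_init and latch_wait are hypothetical, not PostgreSQL code. SIGUSR1 stays blocked except while the process sleeps inside pselect(), so a signal arriving between the flag check and the sleep is held pending and then interrupts the sleep instead of being lost.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <sys/select.h>
#include <time.h>

/* Hypothetical latch flag, set only by the SIGUSR1 handler. */
static volatile sig_atomic_t latch_is_set = 0;

static void
latch_sigusr1_handler(int signo)
{
    (void) signo;
    latch_is_set = 1;
}

/* Install the handler and keep SIGUSR1 blocked outside of pselect(). */
int
latch_init(void)
{
    struct sigaction sa;
    sigset_t block;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = latch_sigusr1_handler;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR1, &sa, NULL) < 0)
        return -1;

    sigemptyset(&block);
    sigaddset(&block, SIGUSR1);
    return sigprocmask(SIG_BLOCK, &block, NULL);
}

/*
 * Wait until the latch is set or timeout_ms elapses.
 * Returns true if the latch was set, and resets it.
 */
bool
latch_wait(int timeout_ms)
{
    /* SIGUSR1 is blocked here, so the handler cannot run during the check. */
    if (!latch_is_set)
    {
        sigset_t waitmask;
        struct timespec ts;

        /* Mask used while sleeping: the current mask minus SIGUSR1. */
        sigprocmask(SIG_BLOCK, NULL, &waitmask);
        sigdelset(&waitmask, SIGUSR1);

        ts.tv_sec = timeout_ms / 1000;
        ts.tv_nsec = (long) (timeout_ms % 1000) * 1000000L;

        /*
         * The kernel swaps the mask and starts sleeping atomically. A
         * SIGUSR1 that was pending is delivered now and ends the sleep
         * with EINTR; the return value is not needed here.
         */
        (void) pselect(0, NULL, NULL, NULL, &ts, &waitmask);
    }

    if (latch_is_set)
    {
        latch_is_set = 0;
        return true;
    }
    return false;
}
```

The cost Robert Haas mentions is visible here: the signal mask has to be managed explicitly, and the approach relies on the platform's pselect() being truly atomic.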
- Heikki
On Thu, Sep 4, 2014 at 9:53 AM, Heikki Linnakangas
<hlinnakangas@vmware.com> wrote:
On 09/04/2014 04:37 PM, Robert Haas wrote:
Hrm. So we'd have to block SIGUSR1, check the flag, then use
pselect() to temporarily unblock SIGUSR1 and wait, then on return
again unblock SIGUSR1? Doesn't seem very appealing. I think changing
the signal mask is fast on Linux, but quite slow on at least some
other UNIX-like platforms. And I've heard that pselect() isn't always
truly atomic, so we might run into platform-specific bugs, too. I
wonder if there's a better way e.g. using memory barriers.
WaitLatch: check is_set. if yes then done. otherwise, set signal_me.
memory barrier. recheck is_set. if not set then wait using
poll/select. memory barrier. clear signal_me.
SetLatch: check is_set. if yes then done. otherwise, set is_set.
memory barrier. check signal_me. if set, then send SIGUSR1.
Doesn't work. No matter what you do, the process running WaitLatch might
receive the signal immediately before it calls poll/select. The signal
handler will run, and the poll/select call will then go to sleep. There is
no way to do this without support from the kernel, that is why ppoll/pselect
exist.
Eesh, I was confused there: ignore me. I was trying to optimize away
the signal handling but assuming we still had the self-pipe byte. But
of course in that case we don't need to change anything at all.
I'm going to go get some more caffeine.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Hello,
- This patch introduces a redundant socket emulation for the win32
backend, but the bare win32 socket for Port is already nonblocking,
as described, so it doesn't seem to be a serious performance
problem. In addition, since I don't know why win32/socket.c
provides the blocking-mode socket emulation, I decided to keep
the blocking socket emulation in win32/socket.c. Possibly it can
be removed.
On Windows, the backend has an emulation layer for POSIX signals,
which uses threads and Windows events. The reason win32/socket.c
always uses non-blocking mode internally is that it needs to wait for
the socket to become readable/writeable, and for the signal-emulation
event, at the same time. So no, we can't remove it.
I see, thank you.
The approach taken in the first patch seems sensible. I changed it to
not use FD_SET, though. A custom array seems better; that way we don't
need the pgwin32_nonblockset_init() call, we can just initialize
the variable. It's a little bit more code, but it's well-contained in
win32/socket.c. Please take a look, to double-check that I didn't
screw up.
Thank you. I felt a bit uneasy about abusing fd_set. A bit more code
is not a problem.
I had a close look at your patch.
Both 'nonblocking' and 'noblock' appear in function names,
pgwin32_set_socket_block/noblock/is_nonblocking(). I prefer the
nonblocking/blocking pair, but I'm satisfied that they are in a
uniform style anyway. (Though I didn't do that either ;p)
pgwin32_set_socket_block() leaves garbage in
nonblocking_sockets[], but that is no problem in practice. You also
removed the code that made sockets blocking again(?), and I agree
that is correct because the fds of sockets that are not closed won't
be reused anyway.
pg_set_block/noblock() made me laugh. Yes, you're correct. Sorry
for the broken (but workable) code.
All in all, the patch looks pretty good.
I'll continue by fitting the other patch onto this one.
regards,
--
Kyotaro Horiguchi
NTT Open Source Software Center
Hello, attached is a new and far simpler version of this
patch. It no longer depends on the win32 nonblocking patch, since
the socket is in blocking mode again.
On 08/28/2014 03:47 PM, Kyotaro HORIGUCHI wrote:
- Preventing protocol violation.
To prevent protocol violation, secure_write sets
ClientConnectionLost when SIGTERM detected, then
internal_flush() and ProcessInterrupts() follow the
instruction.
Oh, hang on. Now that I look at pqcomm.c more closely, it already has
a mechanism to avoid writing a message in the middle of another
message. See pq_putmessage and PqCommBusy. Can we rely on that?
Hmm, it returns gracefully up to ExecProcNode(), and under the
current implementation PqCommBusy is turned off on the way, in
pq_putmessage(). So PqCommBusy is already false before we run into
ProcessInterrupts().
Setting ImmediateInterruptOK so that a signal arriving during send()
is handled at once, and setting whereToSendOutput to DestNone if
PqCommBusy is true, will do. We can also distinguish read from write
by looking at DoingCommandRead. The ImmediateInterruptOK section can
be delimited by prepare_for_client_read/client_read_ended.
- Single pg_terminate_backend surely kills the backend.
secure_raw_write() uses a non-blocking socket and a loop of
select() with a timeout to reliably detect a received
signal (SIGTERM).
I was going to suggest using WaitLatchOrSocket instead of sleeping in
1 second increment, but I see that WaitLatchOrSocket() doesn't
currently support waiting for a socket to become writeable, without
also waiting for it to become readable. I wonder how difficult it
would be to lift that restriction.
Judging from the discussion that followed, that seems quite difficult.
I also wonder if it would be simpler to keep the socket in blocking
mode after all, and just close() in the signal handler if PqCommBusy
== true. If the signal arrives while sleeping in send(), the effect
would be the same as with your patch. If the signal arrives while
sending, but not sleeping, you would not get the chance to send the
already-buffered data to the client. But maybe that's OK, whether or
not you're blocked is not very deterministic anyway.
Hmm. We're back to my first patch, which closes the socket
immediately, and this has become independent of the win32 layer
patch. Anyway, it sounds reasonable.
The attached patch is a quick hack, but at a glance it seems to
work as expected.
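
The close-in-the-signal-handler idea can be sketched outside the backend as follows. This is a simplified illustration under assumptions, not the attached patch: comm_busy, die_pending and client_sock are hypothetical stand-ins for PqCommBusy, ProcDiePending and MyProcPort->sock, and the function returns an error where the real backend would go on to ProcessInterrupts() and exit.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical stand-ins for PqCommBusy, ProcDiePending, MyProcPort->sock. */
static volatile sig_atomic_t comm_busy = 0;
static volatile sig_atomic_t die_pending = 0;
static volatile sig_atomic_t client_sock = -1;

/* Handler: remember the request and cut the socket if we are mid-send. */
void
die_handler(int signo)
{
    int save_errno = errno;

    (void) signo;
    die_pending = 1;
    if (comm_busy && client_sock >= 0)
    {
        close(client_sock);     /* close() is async-signal-safe */
        client_sock = -1;
    }
    errno = save_errno;
}

/* Install die_handler without SA_RESTART so a blocked send() gets EINTR. */
int
install_die_handler(int signo)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = die_handler;
    sigemptyset(&sa.sa_mask);
    return sigaction(signo, &sa, NULL);
}

void
set_client_socket(int sock)
{
    client_sock = sock;
}

/* Blocking send of the whole buffer; -1 once termination was requested. */
ssize_t
send_all_or_die(const char *buf, size_t len)
{
    size_t sent = 0;

    comm_busy = 1;
    while (sent < len)
    {
        ssize_t n;

        if (die_pending)
            break;
        n = send(client_sock, buf + sent, len - sent, 0);
        if (n < 0)
        {
            if (errno == EINTR)
                continue;       /* loop back; die_pending is checked above */
            break;              /* includes EBADF after the handler ran */
        }
        sent += (size_t) n;
    }
    comm_busy = 0;
    return (sent == len) ? (ssize_t) sent : -1;
}
```

If the signal lands between the die_pending check and the send() call, send() runs on an already closed descriptor and fails with EBADF rather than blocking, which is why closing the socket avoids the lost-wakeup race that a flag alone would have. As noted in the thread, data already buffered for the client is simply dropped.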
regards,
--
Kyotaro Horiguchi
NTT Open Source Software Center
Attachments:
0001-Simplly-cutting-off-the-socket-if-signalled-during-s.patch (text/x-patch; charset=us-ascii)
From 7fcb6ef2e66231605e49bd51cd09d275b40cfd57 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi <horiguchi.kyotaro@lab.ntt.co.jp>
Date: Fri, 5 Sep 2014 17:21:48 +0900
Subject: [PATCH] Simplly cutting off the socket if signalled during sending to client.
---
src/backend/libpq/be-secure.c | 14 +++++++++++---
src/backend/libpq/pqcomm.c | 6 ++++++
src/backend/tcop/postgres.c | 40 +++++++++++++++++++++-------------------
src/include/libpq/libpq.h | 1 +
4 files changed, 39 insertions(+), 22 deletions(-)
diff --git a/src/backend/libpq/be-secure.c b/src/backend/libpq/be-secure.c
index 41ec1ad..329812b 100644
--- a/src/backend/libpq/be-secure.c
+++ b/src/backend/libpq/be-secure.c
@@ -145,11 +145,11 @@ secure_raw_read(Port *port, void *ptr, size_t len)
{
ssize_t n;
- prepare_for_client_read();
+ prepare_for_client_comm();
n = recv(port->sock, ptr, len, 0);
- client_read_ended();
+ client_comm_ended();
return n;
}
@@ -178,5 +178,13 @@ secure_write(Port *port, void *ptr, size_t len)
ssize_t
secure_raw_write(Port *port, const void *ptr, size_t len)
{
- return send(port->sock, ptr, len, 0);
+ ssize_t n;
+
+ prepare_for_client_comm();
+
+ n = send(port->sock, ptr, len, 0);
+
+ client_comm_ended();
+
+ return n;
}
diff --git a/src/backend/libpq/pqcomm.c b/src/backend/libpq/pqcomm.c
index 605d891..8f84f67 100644
--- a/src/backend/libpq/pqcomm.c
+++ b/src/backend/libpq/pqcomm.c
@@ -1342,6 +1342,12 @@ pq_is_send_pending(void)
return (PqSendStart < PqSendPointer);
}
+bool
+pq_is_busy(void)
+{
+ return PqCommBusy;
+}
+
/* --------------------------------
* Message-level I/O routines begin here.
*
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 7b5480f..7a4c483 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -303,16 +303,16 @@ InteractiveBackend(StringInfo inBuf)
*
* Even though we are not reading from a "client" process, we still want to
* respond to signals, particularly SIGTERM/SIGQUIT. Hence we must use
- * prepare_for_client_read and client_read_ended.
+ * prepare_for_client_comm and client_comm_ended.
*/
static int
interactive_getc(void)
{
int c;
- prepare_for_client_read();
+ prepare_for_client_comm();
c = getc(stdin);
- client_read_ended();
+ client_comm_ended();
return c;
}
@@ -487,7 +487,7 @@ ReadCommand(StringInfo inBuf)
}
/*
- * prepare_for_client_read -- set up to possibly block on client input
+ * prepare_for_client_comm -- set up to possibly block on client communication
*
* This must be called immediately before any low-level read from the
* client connection. It is necessary to do it at a sufficiently low level
@@ -496,32 +496,29 @@ ReadCommand(StringInfo inBuf)
* In particular there mustn't be use of malloc() or other potentially
* non-reentrant libc functions. This restriction makes it safe for us
* to allow interrupt service routines to execute nontrivial code while
- * we are waiting for input.
+ * we are waiting for input or blocking of output.
*/
void
-prepare_for_client_read(void)
+prepare_for_client_comm(void)
{
- if (DoingCommandRead)
- {
- /* Enable immediate processing of asynchronous signals */
- EnableNotifyInterrupt();
- EnableCatchupInterrupt();
+ /* Enable immediate processing of asynchronous signals */
+ EnableNotifyInterrupt();
+ EnableCatchupInterrupt();
- /* Allow cancel/die interrupts to be processed while waiting */
- ImmediateInterruptOK = true;
+ /* Allow cancel/die interrupts to be processed while waiting */
+ ImmediateInterruptOK = true;
- /* And don't forget to detect one that already arrived */
- CHECK_FOR_INTERRUPTS();
- }
+ /* And don't forget to detect one that already arrived */
+ CHECK_FOR_INTERRUPTS();
}
/*
- * client_read_ended -- get out of the client-input state
+ * client_comm_ended -- get out of the client-communicating state
*
- * This is called just after low-level reads. It must preserve errno!
+ * This is called just after low-level reads/writes. It must preserve errno!
*/
void
-client_read_ended(void)
+client_comm_ended(void)
{
if (DoingCommandRead)
{
@@ -2594,6 +2591,11 @@ die(SIGNAL_ARGS)
if (ImmediateInterruptOK && InterruptHoldoffCount == 0 &&
CritSectionCount == 0)
{
+ if (pq_is_busy() && !DoingCommandRead)
+ {
+ close(MyProcPort->sock);
+ whereToSendOutput = DestNone;
+ }
/* bump holdoff count to make ProcessInterrupts() a no-op */
/* until we are done getting ready for it */
InterruptHoldoffCount++;
diff --git a/src/include/libpq/libpq.h b/src/include/libpq/libpq.h
index 5da9d8d..c3fc5f3 100644
--- a/src/include/libpq/libpq.h
+++ b/src/include/libpq/libpq.h
@@ -62,6 +62,7 @@ extern int pq_putbytes(const char *s, size_t len);
extern int pq_flush(void);
extern int pq_flush_if_writable(void);
extern bool pq_is_send_pending(void);
+extern bool pq_is_busy(void);
extern int pq_putmessage(char msgtype, const char *s, size_t len);
extern void pq_putmessage_noblock(char msgtype, const char *s, size_t len);
extern void pq_startcopyout(void);
--
1.7.1
Sorry, the patch contains a silly bug. Please find the
attached one.
Attachments:
0001-Simplly-cutting-off-the-socket-if-signalled-during-s.patch (text/x-patch; charset=us-ascii)
From 11da4bc3c214490671d27379910a667f06cc35af Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi <horiguchi.kyotaro@lab.ntt.co.jp>
Date: Fri, 5 Sep 2014 17:21:48 +0900
Subject: [PATCH] Simplly cutting off the socket if signalled during sending to client.
---
src/backend/libpq/be-secure.c | 14 ++++++++--
src/backend/libpq/pqcomm.c | 6 ++++
src/backend/tcop/postgres.c | 53 ++++++++++++++++++++---------------------
src/include/libpq/libpq.h | 1 +
4 files changed, 44 insertions(+), 30 deletions(-)
diff --git a/src/backend/libpq/be-secure.c b/src/backend/libpq/be-secure.c
index 41ec1ad..329812b 100644
--- a/src/backend/libpq/be-secure.c
+++ b/src/backend/libpq/be-secure.c
@@ -145,11 +145,11 @@ secure_raw_read(Port *port, void *ptr, size_t len)
{
ssize_t n;
- prepare_for_client_read();
+ prepare_for_client_comm();
n = recv(port->sock, ptr, len, 0);
- client_read_ended();
+ client_comm_ended();
return n;
}
@@ -178,5 +178,13 @@ secure_write(Port *port, void *ptr, size_t len)
ssize_t
secure_raw_write(Port *port, const void *ptr, size_t len)
{
- return send(port->sock, ptr, len, 0);
+ ssize_t n;
+
+ prepare_for_client_comm();
+
+ n = send(port->sock, ptr, len, 0);
+
+ client_comm_ended();
+
+ return n;
}
diff --git a/src/backend/libpq/pqcomm.c b/src/backend/libpq/pqcomm.c
index 605d891..8f84f67 100644
--- a/src/backend/libpq/pqcomm.c
+++ b/src/backend/libpq/pqcomm.c
@@ -1342,6 +1342,12 @@ pq_is_send_pending(void)
return (PqSendStart < PqSendPointer);
}
+bool
+pq_is_busy(void)
+{
+ return PqCommBusy;
+}
+
/* --------------------------------
* Message-level I/O routines begin here.
*
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 7b5480f..15627c3 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -303,16 +303,16 @@ InteractiveBackend(StringInfo inBuf)
*
* Even though we are not reading from a "client" process, we still want to
* respond to signals, particularly SIGTERM/SIGQUIT. Hence we must use
- * prepare_for_client_read and client_read_ended.
+ * prepare_for_client_comm and client_comm_ended.
*/
static int
interactive_getc(void)
{
int c;
- prepare_for_client_read();
+ prepare_for_client_comm();
c = getc(stdin);
- client_read_ended();
+ client_comm_ended();
return c;
}
@@ -487,7 +487,7 @@ ReadCommand(StringInfo inBuf)
}
/*
- * prepare_for_client_read -- set up to possibly block on client input
+ * prepare_for_client_comm -- set up to possibly block on client communication
*
* This must be called immediately before any low-level read from the
* client connection. It is necessary to do it at a sufficiently low level
@@ -496,44 +496,38 @@ ReadCommand(StringInfo inBuf)
* In particular there mustn't be use of malloc() or other potentially
* non-reentrant libc functions. This restriction makes it safe for us
* to allow interrupt service routines to execute nontrivial code while
- * we are waiting for input.
+ * we are waiting for input or blocking of output.
*/
void
-prepare_for_client_read(void)
+prepare_for_client_comm(void)
{
- if (DoingCommandRead)
- {
- /* Enable immediate processing of asynchronous signals */
- EnableNotifyInterrupt();
- EnableCatchupInterrupt();
+ /* Enable immediate processing of asynchronous signals */
+ EnableNotifyInterrupt();
+ EnableCatchupInterrupt();
- /* Allow cancel/die interrupts to be processed while waiting */
- ImmediateInterruptOK = true;
+ /* Allow cancel/die interrupts to be processed while waiting */
+ ImmediateInterruptOK = true;
- /* And don't forget to detect one that already arrived */
- CHECK_FOR_INTERRUPTS();
- }
+ /* And don't forget to detect one that already arrived */
+ CHECK_FOR_INTERRUPTS();
}
/*
- * client_read_ended -- get out of the client-input state
+ * client_comm_ended -- get out of the client-communicating state
*
- * This is called just after low-level reads. It must preserve errno!
+ * This is called just after low-level reads/writes. It must preserve errno!
*/
void
-client_read_ended(void)
+client_comm_ended(void)
{
- if (DoingCommandRead)
- {
- int save_errno = errno;
+ int save_errno = errno;
- ImmediateInterruptOK = false;
+ ImmediateInterruptOK = false;
- DisableNotifyInterrupt();
- DisableCatchupInterrupt();
+ DisableNotifyInterrupt();
+ DisableCatchupInterrupt();
- errno = save_errno;
- }
+ errno = save_errno;
}
@@ -2594,6 +2588,11 @@ die(SIGNAL_ARGS)
if (ImmediateInterruptOK && InterruptHoldoffCount == 0 &&
CritSectionCount == 0)
{
+ if (pq_is_busy() && !DoingCommandRead)
+ {
+ close(MyProcPort->sock);
+ whereToSendOutput = DestNone;
+ }
/* bump holdoff count to make ProcessInterrupts() a no-op */
/* until we are done getting ready for it */
InterruptHoldoffCount++;
diff --git a/src/include/libpq/libpq.h b/src/include/libpq/libpq.h
index 5da9d8d..c3fc5f3 100644
--- a/src/include/libpq/libpq.h
+++ b/src/include/libpq/libpq.h
@@ -62,6 +62,7 @@ extern int pq_putbytes(const char *s, size_t len);
extern int pq_flush(void);
extern int pq_flush_if_writable(void);
extern bool pq_is_send_pending(void);
+extern bool pq_is_busy(void);
extern int pq_putmessage(char msgtype, const char *s, size_t len);
extern void pq_putmessage_noblock(char msgtype, const char *s, size_t len);
extern void pq_startcopyout(void);
--
1.7.1
Hi, I added and edited some comments.
Sorry, the patch contains a silly bug. Please find the
attached one.
regards,
--
Kyotaro Horiguchi
NTT Open Source Software Center
Attachments:
0001-Simplly-cutting-off-the-socket-if-signalled-during_v2.patch (text/x-patch; charset=us-ascii)
From eb91a7c91e1fd3b24bf5bff0eb885f1c3d274637 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi <horiguchi.kyotaro@lab.ntt.co.jp>
Date: Fri, 5 Sep 2014 17:21:48 +0900
Subject: [PATCH] Simplly cutting off the socket if signalled during sending to client.
---
src/backend/libpq/be-secure.c | 21 ++++++++++--
src/backend/libpq/pqcomm.c | 13 +++++++
src/backend/tcop/postgres.c | 71 ++++++++++++++++++++++-------------------
src/include/libpq/libpq.h | 1 +
4 files changed, 70 insertions(+), 36 deletions(-)
diff --git a/src/backend/libpq/be-secure.c b/src/backend/libpq/be-secure.c
index 41ec1ad..3006697 100644
--- a/src/backend/libpq/be-secure.c
+++ b/src/backend/libpq/be-secure.c
@@ -145,11 +145,11 @@ secure_raw_read(Port *port, void *ptr, size_t len)
{
ssize_t n;
- prepare_for_client_read();
+ prepare_for_client_comm();
n = recv(port->sock, ptr, len, 0);
- client_read_ended();
+ client_comm_ended();
return n;
}
@@ -178,5 +178,20 @@ secure_write(Port *port, void *ptr, size_t len)
ssize_t
secure_raw_write(Port *port, const void *ptr, size_t len)
{
- return send(port->sock, ptr, len, 0);
+ ssize_t n;
+
+ /*
+ * If we are interrupted while send() is in progress but not blocked,
+ * processing the interrupt immediately throws away the chance to finish
+ * sending the bytes we were handed. Still, losing one more tuple, or
+ * perhaps the final bytes, matters less than the risk of never being
+ * able to bail out because send() is blocked.
+ */
+ prepare_for_client_comm();
+
+ n = send(port->sock, ptr, len, 0);
+
+ client_comm_ended();
+
+ return n;
}
diff --git a/src/backend/libpq/pqcomm.c b/src/backend/libpq/pqcomm.c
index 605d891..9b08529 100644
--- a/src/backend/libpq/pqcomm.c
+++ b/src/backend/libpq/pqcomm.c
@@ -1343,6 +1343,19 @@ pq_is_send_pending(void)
}
/* --------------------------------
+ * pq_is_busy - is there any I/O command running?
+ *
+ * This function is intended for use within signal handlers to check if
+ * any pqcomm I/O operation is under execution.
+ * --------------------------------
+ */
+bool
+pq_is_busy(void)
+{
+ return PqCommBusy;
+}
+
+/* --------------------------------
* Message-level I/O routines begin here.
*
* These routines understand about the old-style COPY OUT protocol.
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 7b5480f..b29b200 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -303,16 +303,16 @@ InteractiveBackend(StringInfo inBuf)
*
* Even though we are not reading from a "client" process, we still want to
* respond to signals, particularly SIGTERM/SIGQUIT. Hence we must use
- * prepare_for_client_read and client_read_ended.
+ * prepare_for_client_comm and client_comm_ended.
*/
static int
interactive_getc(void)
{
int c;
- prepare_for_client_read();
+ prepare_for_client_comm();
c = getc(stdin);
- client_read_ended();
+ client_comm_ended();
return c;
}
@@ -487,53 +487,47 @@ ReadCommand(StringInfo inBuf)
}
/*
- * prepare_for_client_read -- set up to possibly block on client input
+ * prepare_for_client_comm -- set up to possibly block on client communication
*
- * This must be called immediately before any low-level read from the
- * client connection. It is necessary to do it at a sufficiently low level
- * that there won't be any other operations except the read kernel call
- * itself between this call and the subsequent client_read_ended() call.
+ * This must be called immediately before any low-level read from or write to
+ * the client connection. It is necessary to do it at a sufficiently low
+ * level that there won't be any other operations except the read/write kernel
+ * call itself between this call and the subsequent client_comm_ended() call.
* In particular there mustn't be use of malloc() or other potentially
- * non-reentrant libc functions. This restriction makes it safe for us
- * to allow interrupt service routines to execute nontrivial code while
- * we are waiting for input.
+ * non-reentrant libc functions. This restriction makes it safe for us to
+ * allow interrupt service routines to execute nontrivial code while we are
+ * waiting for input or blocking of output.
*/
void
-prepare_for_client_read(void)
+prepare_for_client_comm(void)
{
- if (DoingCommandRead)
- {
- /* Enable immediate processing of asynchronous signals */
- EnableNotifyInterrupt();
- EnableCatchupInterrupt();
+ /* Enable immediate processing of asynchronous signals */
+ EnableNotifyInterrupt();
+ EnableCatchupInterrupt();
- /* Allow cancel/die interrupts to be processed while waiting */
- ImmediateInterruptOK = true;
+ /* Allow cancel/die interrupts to be processed while waiting */
+ ImmediateInterruptOK = true;
- /* And don't forget to detect one that already arrived */
- CHECK_FOR_INTERRUPTS();
- }
+ /* And don't forget to detect one that already arrived */
+ CHECK_FOR_INTERRUPTS();
}
/*
- * client_read_ended -- get out of the client-input state
+ * client_comm_ended -- get out of the client-communicating state
*
- * This is called just after low-level reads. It must preserve errno!
+ * This is called just after low-level reads/writes. It must preserve errno!
*/
void
-client_read_ended(void)
+client_comm_ended(void)
{
- if (DoingCommandRead)
- {
- int save_errno = errno;
+ int save_errno = errno;
- ImmediateInterruptOK = false;
+ ImmediateInterruptOK = false;
- DisableNotifyInterrupt();
- DisableCatchupInterrupt();
+ DisableNotifyInterrupt();
+ DisableCatchupInterrupt();
- errno = save_errno;
- }
+ errno = save_errno;
}
@@ -2594,6 +2588,17 @@ die(SIGNAL_ARGS)
if (ImmediateInterruptOK && InterruptHoldoffCount == 0 &&
CritSectionCount == 0)
{
+ if (pq_is_busy() && !DoingCommandRead)
+ {
+ /*
+ * Getting here means we were interrupted while a data block was
+ * being sent to the client, so cut off the connection immediately
+ * rather than send any more bytes, which would cause a protocol
+ * violation.
+ */
+ close(MyProcPort->sock);
+ whereToSendOutput = DestNone;
+ }
/* bump holdoff count to make ProcessInterrupts() a no-op */
/* until we are done getting ready for it */
InterruptHoldoffCount++;
diff --git a/src/include/libpq/libpq.h b/src/include/libpq/libpq.h
index 5da9d8d..c3fc5f3 100644
--- a/src/include/libpq/libpq.h
+++ b/src/include/libpq/libpq.h
@@ -62,6 +62,7 @@ extern int pq_putbytes(const char *s, size_t len);
extern int pq_flush(void);
extern int pq_flush_if_writable(void);
extern bool pq_is_send_pending(void);
+extern bool pq_is_busy(void);
extern int pq_putmessage(char msgtype, const char *s, size_t len);
extern void pq_putmessage_noblock(char msgtype, const char *s, size_t len);
extern void pq_startcopyout(void);
--
1.7.1
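The patch above requires that client_comm_ended() "must preserve errno", because the caller inspects errno after send()/recv() returns. A minimal standalone sketch of that bracket follows. It is not PostgreSQL code: comm_begin(), comm_end(), guarded_send() and immediate_ok are hypothetical names, and the close(-1) call only stands in for cleanup work that clobbers errno.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical stand-in for ImmediateInterruptOK. */
static volatile bool immediate_ok = false;

/* Enter the "may block on the client" state; brackets must not nest. */
static void
comm_begin(void)
{
    assert(!immediate_ok);
    immediate_ok = true;
}

/* Leave the state without disturbing the errno of the bracketed call. */
static void
comm_end(void)
{
    int save_errno = errno;

    immediate_ok = false;
    (void) close(-1);           /* fails with EBADF and overwrites errno */

    errno = save_errno;         /* caller still sees the errno from send() */
}

/* send() wrapped in the bracket, in the shape of secure_raw_write(). */
static ssize_t
guarded_send(int fd, const void *buf, size_t len)
{
    ssize_t n;

    comm_begin();
    n = send(fd, buf, len, 0);
    comm_end();

    return n;
}
```

Without the save/restore in comm_end(), a caller testing for EINTR or EAGAIN after guarded_send() would see EBADF from the cleanup step instead.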
Hmm. Sorry, I misunderstood the specification.
Your approach of coloring tokens seems right, but you have
broken the parse logic by adding your code.
Other than the mistakes others pointed out, I found that
- for non-SQL-ident-like tokens the token style, quoted or not,
is ignored, so the following line works.
| "local" All aLL trust
I suppose this is not what you intended. This is because you have
ignored the attribute of a token when comparing it as
non-SQL-ident tokens.
- '+' at the head of the sequence '+"' is treated as the first
character of the *quoted* string. e.g. +"hoge" is tokenized as
"+hoge":special_quoted.
I found this is what was intended. This should be documented in
comments.
|2) users and user-groups only requires special handling and behavior as follows
| Normal user :
| A. unquoted ( USER ) will be treated as user ( downcase ).
| B. quoted ( "USeR" ) will be treated as USeR (case-sensitive).
| C. quoted ( "+USER" ) will be treated as normal user +USER (i.e. will not be considered as user-group) and case-sensitive as string is quoted.
This seems confusing alongside B below. This should probably be
rearranged.
| User Group :
| A. unquoted ( +USERGROUP ) will be treated as +usergroup ( downcase ).
| B. plus quoted ( +"UserGROUP" ) will be treated as +UserGROUP (case-sensitive).
This happens because you simply continued processing for '+"' without
discarding and skipping the '+', and without setting in_quote, so the
following parser code does not work as intended. You should
understand what the original code does, and insert or modify
logic without breaking its assumptions.
regards,
--
Kyotaro Horiguchi
NTT Open Source Software Center
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Wrong thread...
On 09/10/2014 03:04 AM, Kyotaro HORIGUCHI wrote:
Hmm. Sorry, I misunderstood the specification.
Your approach of coloring tokens seems right, but you have
broken the parse logic by adding your code.
Other than the mistakes others pointed out, I found that
- for non-SQL-ident-like tokens the token style, quoted or not,
is ignored, so the following line works.
| "local" All aLL trust
I suppose this is not what you intended. This is because you have
ignored the attribute of a token when comparing it as
non-SQL-ident tokens.
- '+' at the head of the sequence '+"' is treated as the first
character of the *quoted* string. e.g. +"hoge" is tokenized as
"+hoge":special_quoted.
I found this is what was intended. This should be documented in
comments.
|2) users and user-groups only requires special handling and behavior as follows
| Normal user :
| A. unquoted ( USER ) will be treated as user ( downcase ).
| B. quoted ( "USeR" ) will be treated as USeR (case-sensitive).
| C. quoted ( "+USER" ) will be treated as normal user +USER (i.e. will not be considered as user-group) and case-sensitive as string is quoted.
This seems confusing alongside B below. This should probably be
rearranged.
| User Group :
| A. unquoted ( +USERGROUP ) will be treated as +usergroup ( downcase ).
| B. plus quoted ( +"UserGROUP" ) will be treated as +UserGROUP (case-sensitive).
This happens because you simply continued processing for '+"' without
discarding and skipping the '+', and without setting in_quote, so the
following parser code does not work as intended. You should
understand what the original code does, and insert or modify
logic without breaking its assumptions.
regards,
--
- Heikki
Sorry for the mistake...
At Wed, 10 Sep 2014 18:53:03 +0300, Heikki Linnakangas <hlinnakangas@vmware.com> wrote in <541073DF.70902@vmware.com>
Wrong thread...
On 09/10/2014 03:04 AM, Kyotaro HORIGUCHI wrote:
Hmm. Sorry, I misunderstood the specification.
Your approach of coloring tokens seems right, but you have
broken the parse logic by adding your code.
Other than the mistakes others pointed out, I found that
- for non-SQL-ident-like tokens the token style, quoted or not,
is ignored, so the following line works.
| "local" All aLL trust
--
Kyotaro Horiguchi
NTT Open Source Software Center
On 09/09/2014 01:31 PM, Kyotaro HORIGUCHI wrote:
Hi, I added and edited some comments.
Sorry, the patch contains a silly bug. Please find the
attached one.
I must say this scares the heck out of me. The current code goes through
some trouble to not throw an error while in a recv() or send(). For
example, you removed the DoingCommandRead check from
prepare_for_client_read(). There's a comment in postgres.c that says this:
/*
* (2) Allow asynchronous signals to be executed immediately if they
* come in while we are waiting for client input. (This must be
* conditional since we don't want, say, reads on behalf of COPY FROM
* STDIN doing the same thing.)
*/
DoingCommandRead = true;
With the patch, we do allow asynchronous signals to be processed while
blocked in COPY FROM STDIN. Maybe that's OK, but I don't feel
comfortable just changing it. (the comment is now wrong, of course)
This patch also enables processing query cancel signals while blocked,
not just SIGTERM. That's not good; we might be in the middle of sending
a message, and we cannot just error out of that or we might violate the
fe/be protocol. That's OK with a SIGTERM as you're terminating the
connection anyway, and we have the PqCommBusy safeguard in place that
prevents us from sending broken messages to the client, but that's not
good enough if we wanted to keep the backend alive, as we won't be able
to send anything to the client anymore.
BTW, we've been talking about blocking in send(), but this patch also
lets a recv() in e.g. COPY FROM STDIN be interrupted. That's
probably a good thing; surely you have exactly the same issues with that
as with send(). But I didn't realize we had a problem with that too.
In summary, this patch is not ready as it is, but I think we can fix it.
The key question is: is it safe to handle SIGTERM in the signal handler,
calling the exit-handlers and exiting the backend, when blocked in a
recv() or send()? It's a change in the pqcomm.c API; most pqcomm.c
functions have not thrown errors or processed interrupts before. But
looking at the callers, I think it's safe, and there aren't actually any
comments explicitly saying that pqcomm.c will never throw errors.
I propose the attached patch. It adds a new flag ImmediateDieOK, which
is a weaker form of ImmediateInterruptOK that only allows handling a
pending die-signal in the signal handler.
Robert, others, do you see a problem with this?
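To make the intended difference between the two flags concrete, here is a small model of the handler conditions in the attached patch. It is only a sketch: the struct and function names are hypothetical, and it assumes the query-cancel handler keeps its existing ImmediateInterruptOK-only test, which the diff does not touch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical snapshot of the globals the signal handlers examine. */
typedef struct InterruptState
{
    bool     immediate_interrupt_ok;  /* models ImmediateInterruptOK */
    bool     immediate_die_ok;        /* models ImmediateDieOK */
    bool     proc_die_pending;        /* models ProcDiePending */
    unsigned holdoff_count;           /* models InterruptHoldoffCount */
    unsigned crit_section_count;      /* models CritSectionCount */
} InterruptState;

static bool
not_held_off(const InterruptState *s)
{
    assert(s != NULL);
    return s->holdoff_count == 0 && s->crit_section_count == 0;
}

/* Condition from the patched die(): either flag allows immediate service. */
static bool
die_serviced_immediately(const InterruptState *s)
{
    return (s->immediate_interrupt_ok || s->immediate_die_ok) &&
        not_held_off(s);
}

/* Assumed unchanged cancel test: the weaker flag does not enable it. */
static bool
cancel_serviced_immediately(const InterruptState *s)
{
    return s->immediate_interrupt_ok && not_held_off(s);
}

/* Condition from the patched RecoveryConflictInterrupt(). */
static bool
recovery_conflict_serviced_immediately(const InterruptState *s)
{
    return (s->immediate_interrupt_ok ||
            (s->immediate_die_ok && s->proc_die_pending)) &&
        not_held_off(s);
}
```

With only immediate_die_ok set, a die request is serviced at once while a query cancel stays pending, which is the property needed to avoid erroring out in the middle of a protocol message.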
Over IM, Robert pointed out that it's not safe to jump out of a signal
handler with siglongjmp, when we're inside library calls, like in a
callback called by OpenSSL. But even with current master branch, that's
exactly what we do. In secure_raw_read(), we set ImmediateInterruptOK =
true, which means that any incoming signal will be handled directly in
the signal handler, which can mean elog(ERROR). Should we be worried?
OpenSSL might get confused if control never returns to the SSL_read() or
SSL_write() function that called secure_raw_read().
- Heikki
Attachments:
ImmediateDieOk-1.patch (text/x-diff)
diff --git a/src/backend/libpq/be-secure.c b/src/backend/libpq/be-secure.c
index 41ec1ad..049e5b1 100644
--- a/src/backend/libpq/be-secure.c
+++ b/src/backend/libpq/be-secure.c
@@ -178,5 +178,13 @@ secure_write(Port *port, void *ptr, size_t len)
ssize_t
secure_raw_write(Port *port, const void *ptr, size_t len)
{
- return send(port->sock, ptr, len, 0);
+ ssize_t result;
+
+ prepare_for_client_write();
+
+ result = send(port->sock, ptr, len, 0);
+
+ client_write_ended();
+
+ return result;
}
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 61f17bf..138060b 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -497,6 +497,17 @@ ReadCommand(StringInfo inBuf)
* non-reentrant libc functions. This restriction makes it safe for us
* to allow interrupt service routines to execute nontrivial code while
* we are waiting for input.
+ *
+ * When waiting in the main loop, we can process any interrupt immediately
+ * in the signal handler. In any other read from the client, like in a COPY
+ * FROM STDIN, we can't safely process a query cancel signal, because we might
+ * be in the middle of sending a message to the client, and jumping out would
+ * violate the protocol. Or rather, pqcomm.c would detect it and refuse to
+ * send any more messages to the client. But handling a SIGTERM is OK, because
+ * we're terminating the backend and don't need to send any more messages
+ * anyway. That means that we might not be able to send an error message to
+ * the client, but that seems better than waiting indefinitely, in case the
+ * client is not responding.
*/
void
prepare_for_client_read(void)
@@ -513,6 +524,15 @@ prepare_for_client_read(void)
/* And don't forget to detect one that already arrived */
CHECK_FOR_INTERRUPTS();
}
+ else
+ {
+ /* Allow die interrupts to be processed while waiting */
+ ImmediateDieOK = true;
+
+ /* And don't forget to detect one that already arrived */
+ if (ProcDiePending)
+ CHECK_FOR_INTERRUPTS();
+ }
}
/*
@@ -534,6 +554,35 @@ client_read_ended(void)
errno = save_errno;
}
+ else
+ ImmediateDieOK = false;
+}
+
+/*
+ * prepare_for_client_write -- set up to possibly block on client output
+ *
+ * Like prepare_for_client_read, but for writing.
+ */
+void
+prepare_for_client_write(void)
+{
+ /* Allow die interrupts to be processed while waiting */
+ ImmediateDieOK = true;
+
+ /* And don't forget to detect one that already arrived */
+ if (ProcDiePending)
+ CHECK_FOR_INTERRUPTS();
+}
+
+/*
+ * client_write_ended -- get out of the client-output state
+ *
+ * This is called just after low-level writes.
+ */
+void
+client_write_ended(void)
+{
+ ImmediateDieOK = false;
}
@@ -2591,8 +2640,8 @@ die(SIGNAL_ARGS)
* If it's safe to interrupt, and we're waiting for input or a lock,
* service the interrupt immediately
*/
- if (ImmediateInterruptOK && InterruptHoldoffCount == 0 &&
- CritSectionCount == 0)
+ if ((ImmediateInterruptOK || ImmediateDieOK) &&
+ InterruptHoldoffCount == 0 && CritSectionCount == 0)
{
/* bump holdoff count to make ProcessInterrupts() a no-op */
/* until we are done getting ready for it */
@@ -2792,8 +2841,8 @@ RecoveryConflictInterrupt(ProcSignalReason reason)
* If it's safe to interrupt, and we're waiting for input or a lock,
* service the interrupt immediately
*/
- if (ImmediateInterruptOK && InterruptHoldoffCount == 0 &&
- CritSectionCount == 0)
+ if ((ImmediateInterruptOK || (ImmediateDieOK && ProcDiePending)) &&
+ InterruptHoldoffCount == 0 && CritSectionCount == 0)
{
/* bump holdoff count to make ProcessInterrupts() a no-op */
/* until we are done getting ready for it */
diff --git a/src/backend/utils/init/globals.c b/src/backend/utils/init/globals.c
index be74835..24523f9 100644
--- a/src/backend/utils/init/globals.c
+++ b/src/backend/utils/init/globals.c
@@ -31,6 +31,7 @@ volatile bool QueryCancelPending = false;
volatile bool ProcDiePending = false;
volatile bool ClientConnectionLost = false;
volatile bool ImmediateInterruptOK = false;
+volatile bool ImmediateDieOK = false;
volatile uint32 InterruptHoldoffCount = 0;
volatile uint32 CritSectionCount = 0;
diff --git a/src/include/miscadmin.h b/src/include/miscadmin.h
index 2ba9885..9528936 100644
--- a/src/include/miscadmin.h
+++ b/src/include/miscadmin.h
@@ -81,6 +81,7 @@ extern volatile bool ClientConnectionLost;
/* these are marked volatile because they are examined by signal handlers: */
extern PGDLLIMPORT volatile bool ImmediateInterruptOK;
+extern PGDLLIMPORT volatile bool ImmediateDieOK;
extern PGDLLIMPORT volatile uint32 InterruptHoldoffCount;
extern PGDLLIMPORT volatile uint32 CritSectionCount;
diff --git a/src/include/tcop/tcopprot.h b/src/include/tcop/tcopprot.h
index 60f7532..c288bdd 100644
--- a/src/include/tcop/tcopprot.h
+++ b/src/include/tcop/tcopprot.h
@@ -69,6 +69,8 @@ extern void RecoveryConflictInterrupt(ProcSignalReason reason); /* called from S
* handler */
extern void prepare_for_client_read(void);
extern void client_read_ended(void);
+extern void prepare_for_client_write(void);
+extern void client_write_ended(void);
extern void process_postgres_switches(int argc, char *argv[],
GucContext ctx, const char **dbname);
extern void PostgresMain(int argc, char *argv[],
On 2014-09-03 15:09:54 +0300, Heikki Linnakangas wrote:
On 09/03/2014 12:23 AM, Andres Freund wrote:
On 2014-09-02 17:21:03 -0400, Tom Lane wrote:
Heikki Linnakangas <hlinnakangas@vmware.com> writes:
I was going to suggest using WaitLatchOrSocket instead of sleeping in 1
second increment, but I see that WaitLatchOrSocket() doesn't currently
support waiting for a socket to become writeable, without also waiting
for it to become readable. I wonder how difficult it would be to lift
that restriction.
My recollection is that there was a reason for that, but I don't recall
details any more.
http://git.postgresql.org/pg/commitdiff/e42a21b9e6c9b9e6346a34b62628d48ff2fc6ddf
In my prototype I've changed the API that errors set both
READABLE/WRITABLE. Seems to work....
Andres, would you mind posting the WIP patch you have? That could be a
better foundation for this patch.
Sorry, I missed this message and only caught up when reading your CF
status mail. I've attached three patches:
0001: Allows WaitLatchOrSocket(WL_WRITABLE) without WL_READABLE. I've
tested the poll() and select() implementations on linux and
blindly patched up windows.
0002: Put the socket the backend uses to communicate with the client
into nonblocking mode as soon as latches are ready and use latches
to wait. This probably doesn't work correctly without 0003, but
seems easier to review separately.
0003: Don't do sinval catchup and notify processing in signal
handlers. It's quite cool that it worked that well so far, but it
requires some complicated code and is rather fragile. 0002 allows
to move that out of signal handlers and just use a latch
there. This seems remarkably simpler:
4 files changed, 69 insertions(+), 229 deletions(-)
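The reporting rule that 0001 describes (wait for writability alone, and report EOF/error as both readable and writable) can be sketched with plain poll(). This is an illustration only, not the unix_latch.c code: wait_socket(), demo_wait_writable_only() and the EV_* flags are hypothetical stand-ins for WaitLatchOrSocket() and WL_SOCKET_*.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>

#define EV_READABLE  (1 << 0)
#define EV_WRITEABLE (1 << 1)

/*
 * Wait until fd satisfies one of wake_events.  Returns a bitmask of EV_*
 * flags, 0 on timeout, -1 on poll() failure.  A retry after EINTR restarts
 * the full timeout, which is acceptable for a sketch.
 */
static int
wait_socket(int fd, int wake_events, int timeout_ms)
{
    struct pollfd pfd;
    int           rc;
    int           result = 0;

    assert((wake_events & (EV_READABLE | EV_WRITEABLE)) != 0);

    pfd.fd = fd;
    pfd.events = 0;
    pfd.revents = 0;
    if (wake_events & EV_READABLE)
        pfd.events |= POLLIN;
    if (wake_events & EV_WRITEABLE)
        pfd.events |= POLLOUT;

    do
    {
        rc = poll(&pfd, 1, timeout_ms);
    } while (rc < 0 && errno == EINTR);

    if (rc < 0)
        return -1;
    if (rc == 0)
        return 0;               /* timeout */

    if ((wake_events & EV_READABLE) && (pfd.revents & POLLIN))
        result |= EV_READABLE;
    if ((wake_events & EV_WRITEABLE) && (pfd.revents & POLLOUT))
        result |= EV_WRITEABLE;
    /* EOF/error: wake the caller whichever direction it was waiting for */
    if (pfd.revents & (POLLHUP | POLLERR | POLLNVAL))
        result |= EV_READABLE | EV_WRITEABLE;

    return result;
}

/* Usage: wait for writability only on one end of a fresh socket pair. */
static int
demo_wait_writable_only(void)
{
    int sv[2];
    int r;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;
    r = wait_socket(sv[0], EV_WRITEABLE, 1000);
    close(sv[0]);
    close(sv[1]);
    return r;
}
```

Reporting errors in both directions means a caller that waits only for writability still wakes up when the peer goes away, and then learns the details from the failing send().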
These aren't ready for commit, especially not 0003, but I think they are
quite a good foundation for getting rid of the blocking in send(). I
haven't added any interrupt processing after interrupted writes, but
marked the relevant places with XXXs.
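The general idea of replacing in-handler work with a wakeup can be illustrated with the classic self-pipe trick. This is a standalone sketch under that assumption, not PostgreSQL's latch implementation; latch_init(), latch_set(), latch_reset(), wake_handler() and wait_latch_or_writable() are hypothetical names.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <signal.h>
#include <unistd.h>

#define WOKE_LATCH  (1 << 0)
#define WOKE_SOCKET (1 << 1)

static int latch_pipe[2] = {-1, -1};

/* Create the pipe and make both ends nonblocking. */
static int
latch_init(void)
{
    int i;

    if (pipe(latch_pipe) < 0)
        return -1;
    for (i = 0; i < 2; i++)
    {
        int fl = fcntl(latch_pipe[i], F_GETFL);

        if (fl < 0 || fcntl(latch_pipe[i], F_SETFL, fl | O_NONBLOCK) < 0)
            return -1;
    }
    return 0;
}

/* Async-signal-safe: write() is the only call made, and errno is kept. */
static void
latch_set(void)
{
    int     save_errno = errno;
    ssize_t rc = write(latch_pipe[1], "x", 1);

    (void) rc;                  /* a full pipe already means "set" */
    errno = save_errno;
}

/* Drain the pipe so the next wait blocks again. */
static void
latch_reset(void)
{
    char buf[64];

    while (read(latch_pipe[0], buf, sizeof(buf)) > 0)
        ;
}

/* The signal handler does nothing except wake the main line of control. */
static void
wake_handler(int signo)
{
    (void) signo;
    latch_set();
}

/* Block until the latch is set or sock becomes writable (or errors out). */
static int
wait_latch_or_writable(int sock, int timeout_ms)
{
    struct pollfd pfds[2];
    int           rc;
    int           result = 0;

    assert(latch_pipe[0] >= 0);

    pfds[0].fd = latch_pipe[0];
    pfds[0].events = POLLIN;
    pfds[0].revents = 0;
    pfds[1].fd = sock;          /* a negative fd is ignored by poll() */
    pfds[1].events = POLLOUT;
    pfds[1].revents = 0;

    do
    {
        rc = poll(pfds, 2, timeout_ms);
    } while (rc < 0 && errno == EINTR);

    if (rc < 0)
        return -1;
    if (pfds[0].revents & POLLIN)
        result |= WOKE_LATCH;
    if (pfds[1].revents & (POLLOUT | POLLHUP | POLLERR))
        result |= WOKE_SOCKET;
    return result;
}
```

Because the wakeup is a byte sitting in the pipe, a signal that arrives just before poll() is not lost, which is what the flag-only approach cannot guarantee.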
With regard to 0002, I dislike the current need to do interrupt
processing both in be-secure.c and be-secure-openssl.c. I guess we could
solve that by returning something like EINTR from the ssl routines when
they need further reads/writes and do all the processing in one place in
be-secure.c.
There's also some cleanup in 0002/0003 needed:
prepare_for_client_read()/client_read_ended() aren't needed in that form
anymore and should probably rather be something like
CHECK_FOR_READ_INTERRUPT() or similar. Similarly the
EnableCatchupInterrupt()/DisableCatchupInterrupt() in autovacuum.c is
pretty ugly.
Btw, be-secure.c is really not a good name anymore...
What do you think?
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 2014-09-27 21:12:43 +0200, Andres Freund wrote:
On 2014-09-03 15:09:54 +0300, Heikki Linnakangas wrote:
On 09/03/2014 12:23 AM, Andres Freund wrote:
On 2014-09-02 17:21:03 -0400, Tom Lane wrote:
Heikki Linnakangas <hlinnakangas@vmware.com> writes:
I was going to suggest using WaitLatchOrSocket instead of sleeping in 1
second increment, but I see that WaitLatchOrSocket() doesn't currently
support waiting for a socket to become writeable, without also waiting
for it to become readable. I wonder how difficult it would be to lift
that restriction.
My recollection is that there was a reason for that, but I don't recall
details any more.
http://git.postgresql.org/pg/commitdiff/e42a21b9e6c9b9e6346a34b62628d48ff2fc6ddf
In my prototype I've changed the API that errors set both
READABLE/WRITABLE. Seems to work....
Andres, would you mind posting the WIP patch you have? That could be a
better foundation for this patch.
Sorry, I missed this message and only caught up when reading your CF
status mail. I've attached three patches:
0001: Allows WaitLatchOrSocket(WL_WRITABLE) without WL_READABLE. I've
tested the poll() and select() implementations on linux and
blindly patched up windows.
0002: Put the socket the backend uses to communicate with the client
into nonblocking mode as soon as latches are ready and use latches
to wait. This probably doesn't work correctly without 0003, but
seems easier to review separately.
0003: Don't do sinval catchup and notify processing in signal
handlers. It's quite cool that it worked that well so far, but it
requires some complicated code and is rather fragile. 0002 allows
to move that out of signal handlers and just use a latch
there. This seems remarkably simpler:
4 files changed, 69 insertions(+), 229 deletions(-)
These aren't ready for commit, especially not 0003, but I think they are
quite a good foundation for getting rid of the blocking in send(). I
haven't added any interrupt processing after interrupted writes, but
marked the relevant places with XXXs.
With regard to 0002, I dislike the current need to do interrupt
processing both in be-secure.c and be-secure-openssl.c. I guess we could
solve that by returning something like EINTR from the ssl routines when
they need further reads/writes and do all the processing in one place in
be-secure.c.
There's also some cleanup in 0002/0003 needed:
prepare_for_client_read()/client_read_ended() aren't needed in that form
anymore and should probably rather be something like
CHECK_FOR_READ_INTERRUPT() or similar. Similarly the
EnableCatchupInterrupt()/DisableCatchupInterrupt() in autovacuum.c is
pretty ugly.
Btw, be-secure.c is really not a good name anymore...
What do you think?
I've invested some more time in this:
0002 now makes sense on its own and doesn't change anything around the
interrupt handling. Oh, and it compiles without 0003.
0003 Sinval/notify processing got simplified further. There really isn't
any need for DisableNotifyInterrupt/DisableCatchupInterrupt
anymore. Also begin_client_read/client_read_ended don't make much
sense anymore. Instead introduce ProcessClientReadInterrupt (which
wants a better name).
There's also a very WIP
0004 Allows secure_read/write be interrupted when ProcDiePending is
set. All of that happens via the latch mechanism, nothing happens
inside signal handlers. So I do think it's quite an improvement
over what's been discussed in this thread.
But it (and the other approaches) do noticeably increase the
likelihood of clients not getting the error message if the client
isn't actually dead. The likelihood of write() being blocked
*temporarily* due to normal bandwidth constraints is quite high
when you consider COPY FROM and similar. Right now we'll wait till
we can put the error message into the socket afaics.
1-3 need some serious comment work, but I think the approach is
basically sound. I'm much, much less sure about allowing send() to be
interrupted.
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachments:
0001-Allow-to-wait-for-a-writable-socket-in-WaitLatchOrSo.patch (text/x-patch; charset=us-ascii)
From f3d3acb5a97c5856a5f2053850edba74b887d224 Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Sat, 27 Sep 2014 20:34:11 +0200
Subject: [PATCH 1/4] Allow to wait for a writable socket in
WaitLatchOrSocket() without also waiting for reads.
Author: Andres Freund
---
src/backend/port/unix_latch.c | 16 ++++++++++------
src/backend/port/win32_latch.c | 13 ++++++++-----
2 files changed, 18 insertions(+), 11 deletions(-)
diff --git a/src/backend/port/unix_latch.c b/src/backend/port/unix_latch.c
index d0e928f..ec3ffa8 100644
--- a/src/backend/port/unix_latch.c
+++ b/src/backend/port/unix_latch.c
@@ -200,9 +200,8 @@ WaitLatch(volatile Latch *latch, int wakeEvents, long timeout)
* Like WaitLatch, but with an extra socket argument for WL_SOCKET_*
* conditions.
*
- * When waiting on a socket, WL_SOCKET_READABLE *must* be included in
- * 'wakeEvents'; WL_SOCKET_WRITEABLE is optional. The reason for this is
- * that EOF and error conditions are reported only via WL_SOCKET_READABLE.
+ * EOF and error conditions are reported by returning WL_SOCKET_READABLE |
+ * WL_SOCKET_WRITEABLE.
*/
int
WaitLatchOrSocket(volatile Latch *latch, int wakeEvents, pgsocket sock,
@@ -230,8 +229,6 @@ WaitLatchOrSocket(volatile Latch *latch, int wakeEvents, pgsocket sock,
wakeEvents &= ~(WL_SOCKET_READABLE | WL_SOCKET_WRITEABLE);
Assert(wakeEvents != 0); /* must have at least one wake event */
- /* Cannot specify WL_SOCKET_WRITEABLE without WL_SOCKET_READABLE */
- Assert((wakeEvents & (WL_SOCKET_READABLE | WL_SOCKET_WRITEABLE)) != WL_SOCKET_WRITEABLE);
if ((wakeEvents & WL_LATCH_SET) && latch->owner_pid != MyProcPid)
elog(ERROR, "cannot wait on a latch owned by another process");
@@ -346,7 +343,7 @@ WaitLatchOrSocket(volatile Latch *latch, int wakeEvents, pgsocket sock,
{
/* at least one event occurred, so check revents values */
if ((wakeEvents & WL_SOCKET_READABLE) &&
- (pfds[0].revents & (POLLIN | POLLHUP | POLLERR | POLLNVAL)))
+ (pfds[0].revents & POLLIN))
{
/* data available in socket, or EOF/error condition */
result |= WL_SOCKET_READABLE;
@@ -354,8 +351,14 @@ WaitLatchOrSocket(volatile Latch *latch, int wakeEvents, pgsocket sock,
if ((wakeEvents & WL_SOCKET_WRITEABLE) &&
(pfds[0].revents & POLLOUT))
{
+ /* socket is writable */
result |= WL_SOCKET_WRITEABLE;
}
+ if (pfds[0].revents & (POLLHUP | POLLERR | POLLNVAL))
+ {
+ /* EOF/error condition */
+ result |= WL_SOCKET_READABLE | WL_SOCKET_WRITEABLE;
+ }
/*
* We expect a POLLHUP when the remote end is closed, but because
@@ -439,6 +442,7 @@ WaitLatchOrSocket(volatile Latch *latch, int wakeEvents, pgsocket sock,
}
if ((wakeEvents & WL_SOCKET_WRITEABLE) && FD_ISSET(sock, &output_mask))
{
+ /* data available in socket, or EOF */
result |= WL_SOCKET_WRITEABLE;
}
if ((wakeEvents & WL_POSTMASTER_DEATH) &&
diff --git a/src/backend/port/win32_latch.c b/src/backend/port/win32_latch.c
index 6c50dbb..95e4a16 100644
--- a/src/backend/port/win32_latch.c
+++ b/src/backend/port/win32_latch.c
@@ -117,8 +117,6 @@ WaitLatchOrSocket(volatile Latch *latch, int wakeEvents, pgsocket sock,
wakeEvents &= ~(WL_SOCKET_READABLE | WL_SOCKET_WRITEABLE);
Assert(wakeEvents != 0); /* must have at least one wake event */
- /* Cannot specify WL_SOCKET_WRITEABLE without WL_SOCKET_READABLE */
- Assert((wakeEvents & (WL_SOCKET_READABLE | WL_SOCKET_WRITEABLE)) != WL_SOCKET_WRITEABLE);
if ((wakeEvents & WL_LATCH_SET) && latch->owner_pid != MyProcPid)
elog(ERROR, "cannot wait on a latch owned by another process");
@@ -152,10 +150,10 @@ WaitLatchOrSocket(volatile Latch *latch, int wakeEvents, pgsocket sock,
if (wakeEvents & (WL_SOCKET_READABLE | WL_SOCKET_WRITEABLE))
{
/* Need an event object to represent events on the socket */
- int flags = 0;
+ int flags = FD_CLOSE;
if (wakeEvents & WL_SOCKET_READABLE)
- flags |= (FD_READ | FD_CLOSE);
+ flags |= FD_READ;
if (wakeEvents & WL_SOCKET_WRITEABLE)
flags |= FD_WRITE;
@@ -232,7 +230,7 @@ WaitLatchOrSocket(volatile Latch *latch, int wakeEvents, pgsocket sock,
elog(ERROR, "failed to enumerate network events: error code %u",
WSAGetLastError());
if ((wakeEvents & WL_SOCKET_READABLE) &&
- (resEvents.lNetworkEvents & (FD_READ | FD_CLOSE)))
+ (resEvents.lNetworkEvents & FD_READ))
{
result |= WL_SOCKET_READABLE;
}
@@ -241,6 +239,11 @@ WaitLatchOrSocket(volatile Latch *latch, int wakeEvents, pgsocket sock,
{
result |= WL_SOCKET_WRITEABLE;
}
+ if (resEvents.lNetworkEvents & FD_CLOSE)
+ {
+ result |= WL_SOCKET_READABLE | WL_SOCKET_WRITEABLE;
+ }
+
}
else if ((wakeEvents & WL_POSTMASTER_DEATH) &&
rc == WAIT_OBJECT_0 + pmdeath_eventno)
--
1.8.3.251.g1462b67
0002-Make-backends-use-a-nonblocking-socket-to-communicat.patch (text/x-patch; charset=us-ascii)
From 04d18663972cf07bf283ac55b10760bbb34841b5 Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Sat, 27 Sep 2014 20:52:56 +0200
Subject: [PATCH 2/4] Make backends use a nonblocking socket to communicate
with the frontend, block with latches.
Likely doesn't work correctly without the next patch - but it's easier
to review this way.
Author: Andres Freund
---
src/backend/libpq/be-secure-openssl.c | 24 +++++++---------
src/backend/libpq/be-secure.c | 53 ++++++++++++++++++++++++++++++++++-
src/backend/libpq/pqcomm.c | 27 +-----------------
src/backend/tcop/postgres.c | 21 ++++++++++++++
4 files changed, 84 insertions(+), 41 deletions(-)
diff --git a/src/backend/libpq/be-secure-openssl.c b/src/backend/libpq/be-secure-openssl.c
index 8d8f129..31fd004 100644
--- a/src/backend/libpq/be-secure-openssl.c
+++ b/src/backend/libpq/be-secure-openssl.c
@@ -48,6 +48,8 @@
#include "postgres.h"
+#include "miscadmin.h"
+
#include <sys/stat.h>
#include <signal.h>
#include <fcntl.h>
@@ -72,6 +74,7 @@
#include "libpq/libpq.h"
#include "tcop/tcopprot.h"
+#include "storage/proc.h"
#include "utils/memutils.h"
@@ -371,12 +374,11 @@ aloop:
{
case SSL_ERROR_WANT_READ:
case SSL_ERROR_WANT_WRITE:
-#ifdef WIN32
- pgwin32_waitforsinglesocket(SSL_get_fd(port->ssl),
- (err == SSL_ERROR_WANT_READ) ?
- FD_READ | FD_CLOSE | FD_ACCEPT : FD_WRITE | FD_CLOSE,
- INFINITE);
-#endif
+ if (MyProc != NULL)
+ WaitLatchOrSocket(&MyProc->procLatch,
+ err == SSL_ERROR_WANT_READ ?
+ WL_SOCKET_READABLE : WL_SOCKET_WRITEABLE,
+ port->sock, 0);
goto aloop;
case SSL_ERROR_SYSCALL:
if (r < 0)
@@ -522,12 +524,6 @@ rloop:
n = -1;
break;
}
-#ifdef WIN32
- pgwin32_waitforsinglesocket(SSL_get_fd(port->ssl),
- (err == SSL_ERROR_WANT_READ) ?
- FD_READ | FD_CLOSE : FD_WRITE | FD_CLOSE,
- INFINITE);
-#endif
goto rloop;
case SSL_ERROR_SYSCALL:
/* leave it to caller to ereport the value of errno */
@@ -722,7 +718,7 @@ my_sock_read(BIO *h, char *buf, int size)
if (res <= 0)
{
/* If we were interrupted, tell caller to retry */
- if (errno == EINTR)
+ if (errno == EINTR || errno == EWOULDBLOCK || errno == EAGAIN)
{
BIO_set_retry_read(h);
}
@@ -741,7 +737,7 @@ my_sock_write(BIO *h, const char *buf, int size)
BIO_clear_retry_flags(h);
if (res <= 0)
{
- if (errno == EINTR)
+ if (errno == EINTR || errno == EWOULDBLOCK || errno == EAGAIN)
{
BIO_set_retry_write(h);
}
diff --git a/src/backend/libpq/be-secure.c b/src/backend/libpq/be-secure.c
index 41ec1ad..605c2be 100644
--- a/src/backend/libpq/be-secure.c
+++ b/src/backend/libpq/be-secure.c
@@ -18,6 +18,8 @@
#include "postgres.h"
+#include "miscadmin.h"
+
#include <sys/stat.h>
#include <signal.h>
#include <fcntl.h>
@@ -34,6 +36,7 @@
#include "libpq/libpq.h"
#include "tcop/tcopprot.h"
#include "utils/memutils.h"
+#include "storage/proc.h"
char *ssl_cert_file;
@@ -147,8 +150,27 @@ secure_raw_read(Port *port, void *ptr, size_t len)
prepare_for_client_read();
+ /*
+ * Try to read from the socket without blocking. If it succeeds we're
+ * done, otherwise we'll wait for the socket using the latch mechanism.
+ */
+rloop:
n = recv(port->sock, ptr, len, 0);
+ if (!port->noblock && n <= 0 && (errno == EWOULDBLOCK || errno == EAGAIN))
+ {
+ int w;
+
+ Assert(MyProc);
+
+ w = WaitLatchOrSocket(&MyProc->procLatch,
+ WL_SOCKET_READABLE,
+ port->sock, 0);
+
+ if (w & WL_SOCKET_READABLE)
+ goto rloop;
+ }
+
client_read_ended();
return n;
@@ -170,7 +192,9 @@ secure_write(Port *port, void *ptr, size_t len)
}
else
#endif
+ {
n = secure_raw_write(port, ptr, len);
+ }
return n;
}
@@ -178,5 +202,32 @@ secure_write(Port *port, void *ptr, size_t len)
ssize_t
secure_raw_write(Port *port, const void *ptr, size_t len)
{
- return send(port->sock, ptr, len, 0);
+ ssize_t n;
+
+wloop:
+ n = send(port->sock, ptr, len, 0);
+
+ if (!port->noblock && n < 0 && (errno == EWOULDBLOCK || errno == EAGAIN))
+ {
+ int w;
+
+ Assert(MyProc);
+
+ /*
+ * We probably want to check for latches being set at some point
+ * here. That'd allow us to handle interrupts while blocked on
+ * writes. If set we'd not retry directly, but return. That way we
+ * don't do anything while (possibly) inside a ssl library.
+ */
+ w = WaitLatchOrSocket(&MyProc->procLatch,
+ WL_SOCKET_WRITEABLE,
+ port->sock, 0);
+
+ if (w & WL_SOCKET_WRITEABLE)
+ {
+ goto wloop;
+ }
+ }
+
+ return n;
}
diff --git a/src/backend/libpq/pqcomm.c b/src/backend/libpq/pqcomm.c
index 605d891..443a243 100644
--- a/src/backend/libpq/pqcomm.c
+++ b/src/backend/libpq/pqcomm.c
@@ -792,31 +792,6 @@ TouchSocketFiles(void)
static void
pq_set_nonblocking(bool nonblocking)
{
- if (MyProcPort->noblock == nonblocking)
- return;
-
-#ifdef WIN32
- pgwin32_noblock = nonblocking ? 1 : 0;
-#else
-
- /*
- * Use COMMERROR on failure, because ERROR would try to send the error to
- * the client, which might require changing the mode again, leading to
- * infinite recursion.
- */
- if (nonblocking)
- {
- if (!pg_set_noblock(MyProcPort->sock))
- ereport(COMMERROR,
- (errmsg("could not set socket to nonblocking mode: %m")));
- }
- else
- {
- if (!pg_set_block(MyProcPort->sock))
- ereport(COMMERROR,
- (errmsg("could not set socket to blocking mode: %m")));
- }
-#endif
MyProcPort->noblock = nonblocking;
}
@@ -945,7 +920,7 @@ pq_getbyte_if_available(unsigned char *c)
* EINTR really shouldn't happen with a non-blocking socket). Report
* other errors.
*/
- if (errno == EAGAIN || errno == EWOULDBLOCK || errno == EINTR)
+ if (errno == EAGAIN || errno == EWOULDBLOCK)
r = 0;
else
{
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 61f17bf..b3a332e 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -3645,6 +3645,27 @@ PostgresMain(int argc, char *argv[],
* platforms */
}
+ /*
+ * We now always operate the underlying socket in nonblocking mode and use
+ * WaitLatchOrSocket to implement blocking semantics if needed. We can't
+ * put the socket into nonblocking mode earlier because latches need some
+ * setup. We also can't ever do so while bootstrapping.
+ *
+ * Use COMMERROR on failure, because ERROR would try to send the error to
+ * the client, which might require changing the mode again, leading to
+ * infinite recursion.
+ */
+ if (MyProcPort != NULL)
+ {
+#ifdef WIN32
+ pgwin32_noblock = true;
+#else
+ if (!pg_set_noblock(MyProcPort->sock))
+ ereport(COMMERROR,
+ (errmsg("could not set socket to nonblocking mode: %m")));
+#endif
+ }
+
pqinitmask();
if (IsUnderPostmaster)
--
1.8.3.251.g1462b67
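For readers skimming the be-secure.c hunks above: the idea is to keep the socket nonblocking and get blocking behaviour back by waiting explicitly, so that the wait can also be ended by something other than the socket. What follows is only a minimal standalone sketch of that pattern, not the patch's code. It assumes plain POSIX poll() in place of WaitLatchOrSocket(), and the read end of a pipe (wake_fd) as a stand-in for the process latch; set_nonblocking() and send_with_wakeup() are hypothetical names used for illustration only.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Put fd into nonblocking mode; returns 0 on success, -1 on failure. */
int
set_nonblocking(int fd)
{
	int			flags = fcntl(fd, F_GETFL, 0);

	if (flags < 0)
		return -1;
	return fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0 ? -1 : 0;
}

/*
 * Hypothetical helper: send on a nonblocking socket, waiting for
 * writability when the kernel buffer is full. wake_fd is an assumed
 * stand-in for the process latch: it becomes readable when an interrupt
 * is pending. In that case we do not retry but return -1 with
 * errno = EINTR, leaving the interrupt to the caller.
 */
ssize_t
send_with_wakeup(int sock, const void *buf, size_t len, int wake_fd)
{
	for (;;)
	{
		struct pollfd pfd[2];
		ssize_t		n = send(sock, buf, len, 0);

		/* Done on success, or on any error other than "would block". */
		if (n >= 0 || (errno != EAGAIN && errno != EWOULDBLOCK))
			return n;

		pfd[0].fd = sock;
		pfd[0].events = POLLOUT;
		pfd[0].revents = 0;
		pfd[1].fd = wake_fd;
		pfd[1].events = POLLIN;
		pfd[1].revents = 0;

		/* Sleep until the socket is writable or we are woken up. */
		if (poll(pfd, 2, -1) < 0)
			return -1;			/* errno set by poll() */

		if (pfd[1].revents & POLLIN)
		{
			errno = EINTR;
			return -1;
		}
		/* Otherwise the socket changed state; retry the send. */
	}
}
```

Returning instead of retrying when woken follows the comment in secure_raw_write(): the caller gets a chance to handle the interrupt while not (possibly) inside an SSL library.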
0003-Move-sinval-catchup-and-notify-processing-out-of-sig.patch
From 9b9ffc8377bc90075932321eed23636d36a46ce8 Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Sat, 27 Sep 2014 23:49:30 +0200
Subject: [PATCH 3/4] Move sinval catchup and notify processing out of signal
handlers.
Author: Andres Freund
---
src/backend/commands/async.c | 186 ++++++----------------------------
src/backend/libpq/be-secure-openssl.c | 25 +++--
src/backend/libpq/be-secure.c | 28 +++--
src/backend/postmaster/autovacuum.c | 6 +-
src/backend/storage/ipc/sinval.c | 166 ++++--------------------------
src/backend/tcop/postgres.c | 103 ++++++-------------
src/include/commands/async.h | 12 +--
src/include/storage/sinval.h | 7 +-
src/include/tcop/tcopprot.h | 4 +-
9 files changed, 137 insertions(+), 400 deletions(-)
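The change summarized above boils down to one pattern: the signal handler only records that work is pending and wakes the process, and the real work happens later at a point where it is safe. Below is a minimal standalone sketch of that pattern, not code from this patch. It assumes a self-pipe in place of the process latch and SIGUSR1 as the signal; handle_interrupt(), process_pending(), setup_interrupts(), work_pending, wake_pipe and events_processed are hypothetical names, not PostgreSQL identifiers.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Set by the signal handler, cleared by the main-line code. */
volatile sig_atomic_t work_pending = 0;

/* Self-pipe standing in for the process latch (an assumption). */
int			wake_pipe[2] = {-1, -1};

/* Counts how often the deferred work ran; for illustration only. */
int			events_processed = 0;

/*
 * Signal handler portion: do nothing except note that work is pending
 * and wake up the main line. Only async-signal-safe calls are used.
 */
void
handle_interrupt(int signo)
{
	int			save_errno = errno;
	ssize_t		rc;

	(void) signo;
	work_pending = 1;
	if (wake_pipe[1] >= 0)
	{
		rc = write(wake_pipe[1], "x", 1);
		(void) rc;				/* a full pipe already means "woken" */
	}
	errno = save_errno;
}

/*
 * Main-line portion: called at a safe point. The flag is reset before
 * the work is done, so a signal arriving meanwhile causes another pass.
 */
void
process_pending(void)
{
	while (work_pending)
	{
		work_pending = 0;
		events_processed++;		/* the actual work would go here */
	}
}

/* Create the wakeup pipe and install the handler; 0 on success. */
int
setup_interrupts(void)
{
	struct sigaction sa;
	int			i;

	if (pipe(wake_pipe) < 0)
		return -1;

	/* Nonblocking, so the handler can never block on a full pipe. */
	for (i = 0; i < 2; i++)
	{
		int			flags = fcntl(wake_pipe[i], F_GETFL, 0);

		if (flags < 0 || fcntl(wake_pipe[i], F_SETFL, flags | O_NONBLOCK) < 0)
			return -1;
	}

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = handle_interrupt;
	sigemptyset(&sa.sa_mask);
	return sigaction(SIGUSR1, &sa, NULL);
}
```

With this split there is no enable/disable state to keep consistent between the handler and the main line, which is what makes the long comments about the old two-flag dance unnecessary.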
diff --git a/src/backend/commands/async.c b/src/backend/commands/async.c
index 92f2077..05e02c3 100644
--- a/src/backend/commands/async.c
+++ b/src/backend/commands/async.c
@@ -126,6 +126,7 @@
#include "miscadmin.h"
#include "storage/ipc.h"
#include "storage/lmgr.h"
+#include "storage/proc.h"
#include "storage/procarray.h"
#include "storage/procsignal.h"
#include "storage/sinval.h"
@@ -334,17 +335,13 @@ static List *pendingNotifies = NIL; /* list of Notifications */
static List *upperPendingNotifies = NIL; /* list of upper-xact lists */
/*
- * State for inbound notifications consists of two flags: one saying whether
- * the signal handler is currently allowed to call ProcessIncomingNotify
- * directly, and one saying whether the signal has occurred but the handler
- * was not allowed to call ProcessIncomingNotify at the time.
- *
- * NB: the "volatile" on these declarations is critical! If your compiler
- * does not grok "volatile", you'd be best advised to compile this file
- * with all optimization turned off.
+ * Inbound notifications are initially processed by HandleNotifyInterrupt(),
+ * called from inside a signal handler. That just sets the
+ * notifyInterruptPending flag and sets the process
+ * latch. ProcessNotifyInterrupt() will then be called whenever it's safe to
+ * actually deal with the interrupt.
*/
-static volatile sig_atomic_t notifyInterruptEnabled = 0;
-static volatile sig_atomic_t notifyInterruptOccurred = 0;
+volatile sig_atomic_t notifyInterruptPending = 0;
/* True if we've registered an on_shmem_exit cleanup */
static bool unlistenExitRegistered = false;
@@ -1625,11 +1622,10 @@ AtSubAbort_Notify(void)
/*
* HandleNotifyInterrupt
*
- * This is called when PROCSIG_NOTIFY_INTERRUPT is received.
- *
- * If we are idle (notifyInterruptEnabled is set), we can safely invoke
- * ProcessIncomingNotify directly. Otherwise, just set a flag
- * to do it later.
+ * Signal handler portion of interrupt handling. Let the backend know
+ * that there's a pending notify interrupt. If we're currently reading
+ * from the client, this will interrupt the read and
+ * ProcessClientReadInterrupt() will call ProcessNotifyInterrupt().
*/
void
HandleNotifyInterrupt(void)
@@ -1641,148 +1637,35 @@ HandleNotifyInterrupt(void)
* they were ever turned on.
*/
- /* Don't joggle the elbow of proc_exit */
- if (proc_exit_inprogress)
- return;
-
- if (notifyInterruptEnabled)
- {
- bool save_ImmediateInterruptOK = ImmediateInterruptOK;
-
- /*
- * We may be called while ImmediateInterruptOK is true; turn it off
- * while messing with the NOTIFY state. This prevents problems if
- * SIGINT or similar arrives while we're working. Just to be real
- * sure, bump the interrupt holdoff counter as well. That way, even
- * if something inside ProcessIncomingNotify() transiently sets
- * ImmediateInterruptOK (eg while waiting on a lock), we won't get
- * interrupted until we're done with the notify interrupt.
- */
- ImmediateInterruptOK = false;
- HOLD_INTERRUPTS();
-
- /*
- * I'm not sure whether some flavors of Unix might allow another
- * SIGUSR1 occurrence to recursively interrupt this routine. To cope
- * with the possibility, we do the same sort of dance that
- * EnableNotifyInterrupt must do --- see that routine for comments.
- */
- notifyInterruptEnabled = 0; /* disable any recursive signal */
- notifyInterruptOccurred = 1; /* do at least one iteration */
- for (;;)
- {
- notifyInterruptEnabled = 1;
- if (!notifyInterruptOccurred)
- break;
- notifyInterruptEnabled = 0;
- if (notifyInterruptOccurred)
- {
- /* Here, it is finally safe to do stuff. */
- if (Trace_notify)
- elog(DEBUG1, "HandleNotifyInterrupt: perform async notify");
-
- ProcessIncomingNotify();
-
- if (Trace_notify)
- elog(DEBUG1, "HandleNotifyInterrupt: done");
- }
- }
+ /* signal that work needs to be done */
+ notifyInterruptPending = 1;
- /*
- * Restore the holdoff level and ImmediateInterruptOK, and check for
- * interrupts if needed.
- */
- RESUME_INTERRUPTS();
- ImmediateInterruptOK = save_ImmediateInterruptOK;
- if (save_ImmediateInterruptOK)
- CHECK_FOR_INTERRUPTS();
- }
- else
- {
- /*
- * In this path it is NOT SAFE to do much of anything, except this:
- */
- notifyInterruptOccurred = 1;
- }
+ /* make sure the event is processed in due course */
+ if (MyProc != NULL)
+ SetLatch(&MyProc->procLatch);
}
-
/*
- * EnableNotifyInterrupt
- *
- * This is called by the PostgresMain main loop just before waiting
- * for a frontend command. If we are truly idle (ie, *not* inside
- * a transaction block), then process any pending inbound notifies,
- * and enable the signal handler to process future notifies directly.
+ * ProcessNotifyInterrupt
*
- * NOTE: the signal handler starts out disabled, and stays so until
- * PostgresMain calls this the first time.
+ * This is called just before/after waiting for a frontend command. If an
+ * interrupt arrives (via HandleNotifyInterrupt()) while reading, the
+ * read will be interrupted, and this will get called. If we are truly
+ * idle (ie, *not* inside a transaction block), process the incoming
+ * notifies.
*/
+
void
-EnableNotifyInterrupt(void)
+ProcessNotifyInterrupt(void)
{
if (IsTransactionOrTransactionBlock())
return; /* not really idle */
- /*
- * This code is tricky because we are communicating with a signal handler
- * that could interrupt us at any point. If we just checked
- * notifyInterruptOccurred and then set notifyInterruptEnabled, we could
- * fail to respond promptly to a signal that happens in between those two
- * steps. (A very small time window, perhaps, but Murphy's Law says you
- * can hit it...) Instead, we first set the enable flag, then test the
- * occurred flag. If we see an unserviced interrupt has occurred, we
- * re-clear the enable flag before going off to do the service work. (That
- * prevents re-entrant invocation of ProcessIncomingNotify() if another
- * interrupt occurs.) If an interrupt comes in between the setting and
- * clearing of notifyInterruptEnabled, then it will have done the service
- * work and left notifyInterruptOccurred zero, so we have to check again
- * after clearing enable. The whole thing has to be in a loop in case
- * another interrupt occurs while we're servicing the first. Once we get
- * out of the loop, enable is set and we know there is no unserviced
- * interrupt.
- *
- * NB: an overenthusiastic optimizing compiler could easily break this
- * code. Hopefully, they all understand what "volatile" means these days.
- */
- for (;;)
+ while (notifyInterruptPending)
{
- notifyInterruptEnabled = 1;
- if (!notifyInterruptOccurred)
- break;
- notifyInterruptEnabled = 0;
- if (notifyInterruptOccurred)
- {
- if (Trace_notify)
- elog(DEBUG1, "EnableNotifyInterrupt: perform async notify");
-
- ProcessIncomingNotify();
-
- if (Trace_notify)
- elog(DEBUG1, "EnableNotifyInterrupt: done");
- }
+ ProcessIncomingNotify();
}
}
-/*
- * DisableNotifyInterrupt
- *
- * This is called by the PostgresMain main loop just after receiving
- * a frontend command. Signal handler execution of inbound notifies
- * is disabled until the next EnableNotifyInterrupt call.
- *
- * The PROCSIG_CATCHUP_INTERRUPT signal handler also needs to call this,
- * so as to prevent conflicts if one signal interrupts the other. So we
- * must return the previous state of the flag.
- */
-bool
-DisableNotifyInterrupt(void)
-{
- bool result = (notifyInterruptEnabled != 0);
-
- notifyInterruptEnabled = 0;
-
- return result;
-}
/*
* Read all pending notifications from the queue, and deliver appropriate
@@ -2076,9 +1959,10 @@ asyncQueueAdvanceTail(void)
/*
* ProcessIncomingNotify
*
- * Deal with arriving NOTIFYs from other backends.
- * This is called either directly from the PROCSIG_NOTIFY_INTERRUPT
- * signal handler, or the next time control reaches the outer idle loop.
+ * Deal with arriving NOTIFYs from other backends as soon as it's safe to
+ * do so. This used to be called from the PROCSIG_NOTIFY_INTERRUPT
+ * signal handler, but isn't anymore.
+ *
* Scan the queue for arriving notifications and report them to my front
* end.
*
@@ -2087,18 +1971,13 @@ asyncQueueAdvanceTail(void)
static void
ProcessIncomingNotify(void)
{
- bool catchup_enabled;
-
/* We *must* reset the flag */
- notifyInterruptOccurred = 0;
+ notifyInterruptPending = 0;
/* Do nothing else if we aren't actively listening */
if (listenChannels == NIL)
return;
- /* Must prevent catchup interrupt while I am running */
- catchup_enabled = DisableCatchupInterrupt();
-
if (Trace_notify)
elog(DEBUG1, "ProcessIncomingNotify");
@@ -2123,9 +2002,6 @@ ProcessIncomingNotify(void)
if (Trace_notify)
elog(DEBUG1, "ProcessIncomingNotify: done");
-
- if (catchup_enabled)
- EnableCatchupInterrupt();
}
/*
diff --git a/src/backend/libpq/be-secure-openssl.c b/src/backend/libpq/be-secure-openssl.c
index 31fd004..6fc6903 100644
--- a/src/backend/libpq/be-secure-openssl.c
+++ b/src/backend/libpq/be-secure-openssl.c
@@ -374,6 +374,7 @@ aloop:
{
case SSL_ERROR_WANT_READ:
case SSL_ERROR_WANT_WRITE:
+ /* FIXME: interrupt handling? */
if (MyProc != NULL)
WaitLatchOrSocket(&MyProc->procLatch,
err == SSL_ERROR_WANT_READ ?
@@ -518,6 +519,17 @@ rloop:
break;
case SSL_ERROR_WANT_READ:
case SSL_ERROR_WANT_WRITE:
+ /*
+ * We'll, among other situations, get here if the low level
+ * routine doing the actual recv() via the socket got interrupted
+ * by a signal. That's so we can handle interrupts once outside
+ * openssl so we don't jump out from underneath its covers. We can
+ * check this both when reading and when writing, because even a
+ * write here is just openssl's doing, not a 'proper' write
+ * initiated by postgres.
+ */
+ ProcessClientReadInterrupt();
+
if (port->noblock)
{
errno = EWOULDBLOCK;
@@ -626,12 +638,13 @@ wloop:
break;
case SSL_ERROR_WANT_READ:
case SSL_ERROR_WANT_WRITE:
-#ifdef WIN32
- pgwin32_waitforsinglesocket(SSL_get_fd(port->ssl),
- (err == SSL_ERROR_WANT_READ) ?
- FD_READ | FD_CLOSE : FD_WRITE | FD_CLOSE,
- INFINITE);
-#endif
+ /* XXX: We likely will want to process some interrupts here */
+ if (port->noblock)
+ {
+ errno = EWOULDBLOCK;
+ n = -1;
+ break;
+ }
goto wloop;
case SSL_ERROR_SYSCALL:
/* leave it to caller to ereport the value of errno */
diff --git a/src/backend/libpq/be-secure.c b/src/backend/libpq/be-secure.c
index 605c2be..7b5b30f 100644
--- a/src/backend/libpq/be-secure.c
+++ b/src/backend/libpq/be-secure.c
@@ -129,6 +129,7 @@ secure_read(Port *port, void *ptr, size_t len)
{
ssize_t n;
+retry:
#ifdef USE_SSL
if (port->ssl_in_use)
{
@@ -140,6 +141,14 @@ secure_read(Port *port, void *ptr, size_t len)
n = secure_raw_read(port, ptr, len);
}
+ /* Process interrupts that happened while (or before) receiving. */
+ ProcessClientReadInterrupt();
+
+ /* retry after processing interrupts */
+ if (n < 0 && errno == EINTR)
+ {
+ goto retry;
+ }
return n;
}
@@ -148,8 +157,6 @@ secure_raw_read(Port *port, void *ptr, size_t len)
{
ssize_t n;
- prepare_for_client_read();
-
/*
* Try to read from the socket without blocking. If it succeeds we're
* done, otherwise we'll wait for the socket using the latch mechanism.
@@ -164,15 +171,22 @@ rloop:
Assert(MyProc);
w = WaitLatchOrSocket(&MyProc->procLatch,
- WL_SOCKET_READABLE,
+ WL_LATCH_SET | WL_SOCKET_READABLE,
port->sock, 0);
- if (w & WL_SOCKET_READABLE)
+ if (w & WL_LATCH_SET)
+ {
+ ResetLatch(&MyProc->procLatch);
+ /*
+ * Force a return, so interrupts can be processed when not
+ * (possibly) underneath a ssl library.
+ */
+ errno = EINTR;
+ }
+ else if (w & WL_SOCKET_READABLE)
goto rloop;
}
- client_read_ended();
-
return n;
}
@@ -196,6 +210,8 @@ secure_write(Port *port, void *ptr, size_t len)
n = secure_raw_write(port, ptr, len);
}
+ /* XXX: We likely will want to process some interrupts here */
+
return n;
}
diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c
index c240d24..4a43c81 100644
--- a/src/backend/postmaster/autovacuum.c
+++ b/src/backend/postmaster/autovacuum.c
@@ -601,9 +601,6 @@ AutoVacLauncherMain(int argc, char *argv[])
launcher_determine_sleep(!dlist_is_empty(&AutoVacuumShmem->av_freeWorkers),
false, &nap);
- /* Allow sinval catchup interrupts while sleeping */
- EnableCatchupInterrupt();
-
/*
* Wait until naptime expires or we get some type of signal (all the
* signal handlers will wake us by calling SetLatch).
@@ -614,7 +611,8 @@ AutoVacLauncherMain(int argc, char *argv[])
ResetLatch(&MyProc->procLatch);
- DisableCatchupInterrupt();
+ /* Process sinval catchup interrupts that happened while sleeping */
+ ProcessCatchupInterrupt();
/*
* Emergency bailout if postmaster has died. This is to avoid the
diff --git a/src/backend/storage/ipc/sinval.c b/src/backend/storage/ipc/sinval.c
index d7d0406..307be49 100644
--- a/src/backend/storage/ipc/sinval.c
+++ b/src/backend/storage/ipc/sinval.c
@@ -18,6 +18,7 @@
#include "commands/async.h"
#include "miscadmin.h"
#include "storage/ipc.h"
+#include "storage/proc.h"
#include "storage/sinvaladt.h"
#include "utils/inval.h"
@@ -32,17 +33,12 @@ uint64 SharedInvalidMessageCounter;
* through a cache reset exercise. This is done by sending
* PROCSIG_CATCHUP_INTERRUPT to any backend that gets too far behind.
*
- * State for catchup events consists of two flags: one saying whether
- * the signal handler is currently allowed to call ProcessCatchupEvent
- * directly, and one saying whether the signal has occurred but the handler
- * was not allowed to call ProcessCatchupEvent at the time.
- *
- * NB: the "volatile" on these declarations is critical! If your compiler
- * does not grok "volatile", you'd be best advised to compile this file
- * with all optimization turned off.
+ * The signal handler will set an interrupt pending flag and will set the
+ * process's latch. Whenever starting to read from the client, or when
+ * interrupted while doing so, ProcessClientReadInterrupt() will call
+ * ProcessCatchupEvent().
*/
-static volatile int catchupInterruptEnabled = 0;
-static volatile int catchupInterruptOccurred = 0;
+volatile sig_atomic_t catchupInterruptPending = 0;
static void ProcessCatchupEvent(void);
@@ -141,9 +137,9 @@ ReceiveSharedInvalidMessages(
* catchup signal this way avoids creating spikes in system load for what
* should be just a background maintenance activity.
*/
- if (catchupInterruptOccurred)
+ if (catchupInterruptPending)
{
- catchupInterruptOccurred = 0;
+ catchupInterruptPending = 0;
elog(DEBUG4, "sinval catchup complete, cleaning queue");
SICleanupQueue(false, 0);
}
@@ -155,12 +151,9 @@ ReceiveSharedInvalidMessages(
*
* This is called when PROCSIG_CATCHUP_INTERRUPT is received.
*
- * If we are idle (catchupInterruptEnabled is set), we can safely
- * invoke ProcessCatchupEvent directly. Otherwise, just set a flag
- * to do it later. (Note that it's quite possible for normal processing
- * of the current transaction to cause ReceiveSharedInvalidMessages()
- * to be run later on; in that case the flag will get cleared again,
- * since there's no longer any reason to do anything.)
+ * We used to call ProcessCatchupEvent directly when idle. These days
+ * we just set a flag to do it later and notify the process of that fact by
+ * setting the process's latch.
*/
void
HandleCatchupInterrupt(void)
@@ -170,153 +163,37 @@ HandleCatchupInterrupt(void)
* you do here.
*/
- /* Don't joggle the elbow of proc_exit */
- if (proc_exit_inprogress)
- return;
-
- if (catchupInterruptEnabled)
- {
- bool save_ImmediateInterruptOK = ImmediateInterruptOK;
-
- /*
- * We may be called while ImmediateInterruptOK is true; turn it off
- * while messing with the catchup state. This prevents problems if
- * SIGINT or similar arrives while we're working. Just to be real
- * sure, bump the interrupt holdoff counter as well. That way, even
- * if something inside ProcessCatchupEvent() transiently sets
- * ImmediateInterruptOK (eg while waiting on a lock), we won't get
- * interrupted until we're done with the catchup interrupt.
- */
- ImmediateInterruptOK = false;
- HOLD_INTERRUPTS();
-
- /*
- * I'm not sure whether some flavors of Unix might allow another
- * SIGUSR1 occurrence to recursively interrupt this routine. To cope
- * with the possibility, we do the same sort of dance that
- * EnableCatchupInterrupt must do --- see that routine for comments.
- */
- catchupInterruptEnabled = 0; /* disable any recursive signal */
- catchupInterruptOccurred = 1; /* do at least one iteration */
- for (;;)
- {
- catchupInterruptEnabled = 1;
- if (!catchupInterruptOccurred)
- break;
- catchupInterruptEnabled = 0;
- if (catchupInterruptOccurred)
- {
- /* Here, it is finally safe to do stuff. */
- ProcessCatchupEvent();
- }
- }
+ catchupInterruptPending = 1;
- /*
- * Restore the holdoff level and ImmediateInterruptOK, and check for
- * interrupts if needed.
- */
- RESUME_INTERRUPTS();
- ImmediateInterruptOK = save_ImmediateInterruptOK;
- if (save_ImmediateInterruptOK)
- CHECK_FOR_INTERRUPTS();
- }
- else
- {
- /*
- * In this path it is NOT SAFE to do much of anything, except this:
- */
- catchupInterruptOccurred = 1;
- }
+ /* make sure the event is processed in due course */
+ if (MyProc != NULL)
+ SetLatch(&MyProc->procLatch);
}
-/*
- * EnableCatchupInterrupt
- *
- * This is called by the PostgresMain main loop just before waiting
- * for a frontend command. We process any pending catchup events,
- * and enable the signal handler to process future events directly.
- *
- * NOTE: the signal handler starts out disabled, and stays so until
- * PostgresMain calls this the first time.
- */
void
-EnableCatchupInterrupt(void)
+ProcessCatchupInterrupt(void)
{
- /*
- * This code is tricky because we are communicating with a signal handler
- * that could interrupt us at any point. If we just checked
- * catchupInterruptOccurred and then set catchupInterruptEnabled, we could
- * fail to respond promptly to a signal that happens in between those two
- * steps. (A very small time window, perhaps, but Murphy's Law says you
- * can hit it...) Instead, we first set the enable flag, then test the
- * occurred flag. If we see an unserviced interrupt has occurred, we
- * re-clear the enable flag before going off to do the service work. (That
- * prevents re-entrant invocation of ProcessCatchupEvent() if another
- * interrupt occurs.) If an interrupt comes in between the setting and
- * clearing of catchupInterruptEnabled, then it will have done the service
- * work and left catchupInterruptOccurred zero, so we have to check again
- * after clearing enable. The whole thing has to be in a loop in case
- * another interrupt occurs while we're servicing the first. Once we get
- * out of the loop, enable is set and we know there is no unserviced
- * interrupt.
- *
- * NB: an overenthusiastic optimizing compiler could easily break this
- * code. Hopefully, they all understand what "volatile" means these days.
- */
for (;;)
{
- catchupInterruptEnabled = 1;
- if (!catchupInterruptOccurred)
+ if (!catchupInterruptPending)
break;
- catchupInterruptEnabled = 0;
- if (catchupInterruptOccurred)
- ProcessCatchupEvent();
+ ProcessCatchupEvent();
}
}
/*
- * DisableCatchupInterrupt
- *
- * This is called by the PostgresMain main loop just after receiving
- * a frontend command. Signal handler execution of catchup events
- * is disabled until the next EnableCatchupInterrupt call.
- *
- * The PROCSIG_NOTIFY_INTERRUPT signal handler also needs to call this,
- * so as to prevent conflicts if one signal interrupts the other. So we
- * must return the previous state of the flag.
- */
-bool
-DisableCatchupInterrupt(void)
-{
- bool result = (catchupInterruptEnabled != 0);
-
- catchupInterruptEnabled = 0;
-
- return result;
-}
-
-/*
* ProcessCatchupEvent
*
* Respond to a catchup event (PROCSIG_CATCHUP_INTERRUPT) from another
- * backend.
- *
- * This is called either directly from the PROCSIG_CATCHUP_INTERRUPT
- * signal handler, or the next time control reaches the outer idle loop
- * (assuming there's still anything to do by then).
+ * backend once it's safe to do so.
*/
static void
ProcessCatchupEvent(void)
{
- bool notify_enabled;
-
- /* Must prevent notify interrupt while I am running */
- notify_enabled = DisableNotifyInterrupt();
-
/*
* What we need to do here is cause ReceiveSharedInvalidMessages() to run,
* which will do the necessary work and also reset the
- * catchupInterruptOccurred flag. If we are inside a transaction we can
+ * catchupInterruptPending flag. If we are inside a transaction we can
* just call AcceptInvalidationMessages() to do this. If we aren't, we
* start and immediately end a transaction; the call to
* AcceptInvalidationMessages() happens down inside transaction start.
@@ -337,7 +214,4 @@ ProcessCatchupEvent(void)
StartTransactionCommand();
CommitTransactionCommand();
}
-
- if (notify_enabled)
- EnableNotifyInterrupt();
}
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index b3a332e..3a6aa1c 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -302,17 +302,23 @@ InteractiveBackend(StringInfo inBuf)
* interactive_getc -- collect one character from stdin
*
* Even though we are not reading from a "client" process, we still want to
- * respond to signals, particularly SIGTERM/SIGQUIT. Hence we must use
- * prepare_for_client_read and client_read_ended.
+ * respond to signals, particularly SIGTERM/SIGQUIT. FIXME.
*/
static int
interactive_getc(void)
{
int c;
- prepare_for_client_read();
+ /*
+ * FIXME: this will not process catchup interrupts or notifications. But
+ * those can't really be relevant for a standalone backend?
+ */
+ ProcessClientReadInterrupt();
+
c = getc(stdin);
- client_read_ended();
+
+ ProcessClientReadInterrupt();
+
return c;
}
@@ -487,50 +493,30 @@ ReadCommand(StringInfo inBuf)
}
/*
- * prepare_for_client_read -- set up to possibly block on client input
+ * ProcessClientReadInterrupt() - Process interrupts specific to client reads
*
- * This must be called immediately before any low-level read from the
- * client connection. It is necessary to do it at a sufficiently low level
- * that there won't be any other operations except the read kernel call
- * itself between this call and the subsequent client_read_ended() call.
- * In particular there mustn't be use of malloc() or other potentially
- * non-reentrant libc functions. This restriction makes it safe for us
- * to allow interrupt service routines to execute nontrivial code while
- * we are waiting for input.
- */
-void
-prepare_for_client_read(void)
-{
- if (DoingCommandRead)
- {
- /* Enable immediate processing of asynchronous signals */
- EnableNotifyInterrupt();
- EnableCatchupInterrupt();
-
- /* Allow cancel/die interrupts to be processed while waiting */
- ImmediateInterruptOK = true;
-
- /* And don't forget to detect one that already arrived */
- CHECK_FOR_INTERRUPTS();
- }
-}
-
-/*
- * client_read_ended -- get out of the client-input state
+ * This is called just after low-level reads. That might be after the read
+ * finished successfully, or after it was cut short by an interrupt.
*
- * This is called just after low-level reads. It must preserve errno!
+ * Must preserve errno!
*/
void
-client_read_ended(void)
+ProcessClientReadInterrupt(void)
{
if (DoingCommandRead)
{
int save_errno = errno;
- ImmediateInterruptOK = false;
+ /* Check for general interrupts that arrived while reading */
+ CHECK_FOR_INTERRUPTS();
- DisableNotifyInterrupt();
- DisableCatchupInterrupt();
+ /* Process sinval catchup interrupts that happened while reading */
+ if (catchupInterruptPending)
+ ProcessCatchupInterrupt();
+
+ /* Process notify interrupts that happened while reading */
+ if (notifyInterruptPending)
+ ProcessNotifyInterrupt();
errno = save_errno;
}
@@ -2588,8 +2574,8 @@ die(SIGNAL_ARGS)
ProcDiePending = true;
/*
- * If it's safe to interrupt, and we're waiting for input or a lock,
- * service the interrupt immediately
+ * If it's safe to interrupt, and we're waiting for a lock, service
+ * the interrupt immediately
*/
if (ImmediateInterruptOK && InterruptHoldoffCount == 0 &&
CritSectionCount == 0)
@@ -2598,8 +2584,6 @@ die(SIGNAL_ARGS)
/* until we are done getting ready for it */
InterruptHoldoffCount++;
LockErrorCleanup(); /* prevent CheckDeadLock from running */
- DisableNotifyInterrupt();
- DisableCatchupInterrupt();
InterruptHoldoffCount--;
ProcessInterrupts();
}
@@ -2630,8 +2614,8 @@ StatementCancelHandler(SIGNAL_ARGS)
QueryCancelPending = true;
/*
- * If it's safe to interrupt, and we're waiting for input or a lock,
- * service the interrupt immediately
+ * If it's safe to interrupt, and we're waiting for a lock, service
+ * the interrupt immediately
*/
if (ImmediateInterruptOK && InterruptHoldoffCount == 0 &&
CritSectionCount == 0)
@@ -2640,8 +2624,6 @@ StatementCancelHandler(SIGNAL_ARGS)
/* until we are done getting ready for it */
InterruptHoldoffCount++;
LockErrorCleanup(); /* prevent CheckDeadLock from running */
- DisableNotifyInterrupt();
- DisableCatchupInterrupt();
InterruptHoldoffCount--;
ProcessInterrupts();
}
@@ -2789,8 +2771,8 @@ RecoveryConflictInterrupt(ProcSignalReason reason)
RecoveryConflictRetryable = false;
/*
- * If it's safe to interrupt, and we're waiting for input or a lock,
- * service the interrupt immediately
+ * If it's safe to interrupt, and we're waiting for a lock, service
+ * the interrupt immediately
*/
if (ImmediateInterruptOK && InterruptHoldoffCount == 0 &&
CritSectionCount == 0)
@@ -2799,8 +2781,6 @@ RecoveryConflictInterrupt(ProcSignalReason reason)
/* until we are done getting ready for it */
InterruptHoldoffCount++;
LockErrorCleanup(); /* prevent CheckDeadLock from running */
- DisableNotifyInterrupt();
- DisableCatchupInterrupt();
InterruptHoldoffCount--;
ProcessInterrupts();
}
@@ -2838,8 +2818,6 @@ ProcessInterrupts(void)
ProcDiePending = false;
QueryCancelPending = false; /* ProcDie trumps QueryCancel */
ImmediateInterruptOK = false; /* not idle anymore */
- DisableNotifyInterrupt();
- DisableCatchupInterrupt();
/* As in quickdie, don't risk sending to client during auth */
if (ClientAuthInProgress && whereToSendOutput == DestRemote)
whereToSendOutput = DestNone;
@@ -2874,8 +2852,6 @@ ProcessInterrupts(void)
{
QueryCancelPending = false; /* lost connection trumps QueryCancel */
ImmediateInterruptOK = false; /* not idle anymore */
- DisableNotifyInterrupt();
- DisableCatchupInterrupt();
/* don't send to client, we already know the connection to be dead. */
whereToSendOutput = DestNone;
ereport(FATAL,
@@ -2888,8 +2864,6 @@ ProcessInterrupts(void)
if (ClientAuthInProgress)
{
ImmediateInterruptOK = false; /* not idle anymore */
- DisableNotifyInterrupt();
- DisableCatchupInterrupt();
/* As in quickdie, don't risk sending to client during auth */
if (whereToSendOutput == DestRemote)
whereToSendOutput = DestNone;
@@ -2906,8 +2880,6 @@ ProcessInterrupts(void)
{
ImmediateInterruptOK = false; /* not idle anymore */
(void) get_timeout_indicator(STATEMENT_TIMEOUT, true);
- DisableNotifyInterrupt();
- DisableCatchupInterrupt();
ereport(ERROR,
(errcode(ERRCODE_LOCK_NOT_AVAILABLE),
errmsg("canceling statement due to lock timeout")));
@@ -2915,8 +2887,6 @@ ProcessInterrupts(void)
if (get_timeout_indicator(STATEMENT_TIMEOUT, true))
{
ImmediateInterruptOK = false; /* not idle anymore */
- DisableNotifyInterrupt();
- DisableCatchupInterrupt();
ereport(ERROR,
(errcode(ERRCODE_QUERY_CANCELED),
errmsg("canceling statement due to statement timeout")));
@@ -2924,8 +2894,6 @@ ProcessInterrupts(void)
if (IsAutoVacuumWorkerProcess())
{
ImmediateInterruptOK = false; /* not idle anymore */
- DisableNotifyInterrupt();
- DisableCatchupInterrupt();
ereport(ERROR,
(errcode(ERRCODE_QUERY_CANCELED),
errmsg("canceling autovacuum task")));
@@ -2934,8 +2902,6 @@ ProcessInterrupts(void)
{
ImmediateInterruptOK = false; /* not idle anymore */
RecoveryConflictPending = false;
- DisableNotifyInterrupt();
- DisableCatchupInterrupt();
pgstat_report_recovery_conflict(RecoveryConflictReason);
if (DoingCommandRead)
ereport(FATAL,
@@ -2959,13 +2925,12 @@ ProcessInterrupts(void)
if (!DoingCommandRead)
{
ImmediateInterruptOK = false; /* not idle anymore */
- DisableNotifyInterrupt();
- DisableCatchupInterrupt();
ereport(ERROR,
(errcode(ERRCODE_QUERY_CANCELED),
errmsg("canceling statement due to user request")));
}
}
+
/* If we get here, do nothing (probably, QueryCancelPending was reset) */
}
@@ -3853,13 +3818,9 @@ PostgresMain(int argc, char *argv[],
QueryCancelPending = false; /* second to avoid race condition */
/*
- * Turn off these interrupts too. This is only needed here and not in
- * other exception-catching places since these interrupts are only
- * enabled while we wait for client input.
+ * Not reading from the client anymore.
*/
DoingCommandRead = false;
- DisableNotifyInterrupt();
- DisableCatchupInterrupt();
/* Make sure libpq is in a good state */
pq_comm_reset();
diff --git a/src/include/commands/async.h b/src/include/commands/async.h
index 0650e65..520c17b 100644
--- a/src/include/commands/async.h
+++ b/src/include/commands/async.h
@@ -13,6 +13,8 @@
#ifndef ASYNC_H
#define ASYNC_H
+#include <signal.h>
+
#include "fmgr.h"
/*
@@ -21,6 +23,7 @@
#define NUM_ASYNC_BUFFERS 8
extern bool Trace_notify;
+extern volatile sig_atomic_t notifyInterruptPending;
extern Size AsyncShmemSize(void);
extern void AsyncShmemInit(void);
@@ -48,12 +51,7 @@ extern void ProcessCompletedNotifies(void);
/* signal handler for inbound notifies (PROCSIG_NOTIFY_INTERRUPT) */
extern void HandleNotifyInterrupt(void);
-/*
- * enable/disable processing of inbound notifies directly from signal handler.
- * The enable routine first performs processing of any inbound notifies that
- * have occurred since the last disable.
- */
-extern void EnableNotifyInterrupt(void);
-extern bool DisableNotifyInterrupt(void);
+/* process interrupts */
+extern void ProcessNotifyInterrupt(void);
#endif /* ASYNC_H */
diff --git a/src/include/storage/sinval.h b/src/include/storage/sinval.h
index 812ea95..13cd16e 100644
--- a/src/include/storage/sinval.h
+++ b/src/include/storage/sinval.h
@@ -14,8 +14,9 @@
#ifndef SINVAL_H
#define SINVAL_H
-#include "storage/relfilenode.h"
+#include <signal.h>
+#include "storage/relfilenode.h"
/*
* We support several types of shared-invalidation messages:
@@ -123,6 +124,7 @@ typedef union
/* Counter of messages processed; don't worry about overflow. */
extern uint64 SharedInvalidMessageCounter;
+extern volatile sig_atomic_t catchupInterruptPending;
extern void SendSharedInvalidMessages(const SharedInvalidationMessage *msgs,
int n);
@@ -138,8 +140,7 @@ extern void HandleCatchupInterrupt(void);
* The enable routine first performs processing of any catchup events that
* have occurred since the last disable.
*/
-extern void EnableCatchupInterrupt(void);
-extern bool DisableCatchupInterrupt(void);
+extern void ProcessCatchupInterrupt(void);
extern int xactGetCommittedInvalidationMessages(SharedInvalidationMessage **msgs,
bool *RelcacheInitFileInval);
diff --git a/src/include/tcop/tcopprot.h b/src/include/tcop/tcopprot.h
index 60f7532..e4a1a7d 100644
--- a/src/include/tcop/tcopprot.h
+++ b/src/include/tcop/tcopprot.h
@@ -67,8 +67,8 @@ extern void StatementCancelHandler(SIGNAL_ARGS);
extern void FloatExceptionHandler(SIGNAL_ARGS) __attribute__((noreturn));
extern void RecoveryConflictInterrupt(ProcSignalReason reason); /* called from SIGUSR1
* handler */
-extern void prepare_for_client_read(void);
-extern void client_read_ended(void);
+extern void ProcessClientReadInterrupt(void);
+
extern void process_postgres_switches(int argc, char *argv[],
GucContext ctx, const char **dbname);
extern void PostgresMain(int argc, char *argv[],
--
1.8.3.251.g1462b67
Attachment: 0004-Heavily-WIP-Process-die-interrupts-while-reading-wri.patch (text/x-patch; charset=us-ascii)
>From 222254b14be455e750fd84be3a017a3c47d3f384 Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Sun, 28 Sep 2014 00:22:39 +0200
Subject: [PATCH 4/4] Heavily-WIP: Process 'die' interrupts while
reading/writing from a socket.
Per discussion with Kyotaro HORIGUCHI and Heikki Linnakangas
---
src/backend/libpq/be-secure-openssl.c | 7 ++++++-
src/backend/libpq/be-secure.c | 23 ++++++++++++++++++++---
src/backend/tcop/postgres.c | 25 +++++++++++++++++++++++++
src/include/tcop/tcopprot.h | 1 +
4 files changed, 52 insertions(+), 4 deletions(-)
diff --git a/src/backend/libpq/be-secure-openssl.c b/src/backend/libpq/be-secure-openssl.c
index 6fc6903..22e2a7b 100644
--- a/src/backend/libpq/be-secure-openssl.c
+++ b/src/backend/libpq/be-secure-openssl.c
@@ -638,7 +638,12 @@ wloop:
break;
case SSL_ERROR_WANT_READ:
case SSL_ERROR_WANT_WRITE:
- /* XXX: We likely will want to process some interrupts here */
+ /*
+ * Check for interrupts here, in addition to secure_write(),
+ * because a interrupted write in secure_raw_write() will only
+ * return here, not secure_write().
+ */
+ ProcessClientWriteInterrupt(err == SSL_ERROR_WANT_WRITE);
if (port->noblock)
{
errno = EWOULDBLOCK;
diff --git a/src/backend/libpq/be-secure.c b/src/backend/libpq/be-secure.c
index 7b5b30f..6831dd8 100644
--- a/src/backend/libpq/be-secure.c
+++ b/src/backend/libpq/be-secure.c
@@ -199,6 +199,7 @@ secure_write(Port *port, void *ptr, size_t len)
{
ssize_t n;
+retry:
#ifdef USE_SSL
if (port->ssl_in_use)
{
@@ -210,7 +211,14 @@ secure_write(Port *port, void *ptr, size_t len)
n = secure_raw_write(port, ptr, len);
}
- /* XXX: We likely will want to process some interrupts here */
+ /* Process interrupts that happened while (or before) writing. */
+ ProcessClientWriteInterrupt(!port->noblock && n < 0);
+
+ /* retry after processing interrupts */
+ if (n < 0 && errno == EINTR)
+ {
+ goto retry;
+ }
return n;
}
@@ -236,10 +244,19 @@ wloop:
* don't do anything while (possibly) inside a ssl library.
*/
w = WaitLatchOrSocket(&MyProc->procLatch,
- WL_SOCKET_WRITEABLE,
+ WL_LATCH_SET | WL_SOCKET_WRITEABLE,
port->sock, 0);
- if (w & WL_SOCKET_WRITEABLE)
+ if (w & WL_LATCH_SET)
+ {
+ ResetLatch(&MyProc->procLatch);
+ /*
+ * Force a return, so interrupts can be processed when not
+ * (possibly) underneath a ssl library.
+ */
+ errno = EINTR;
+ }
+ else if (w & WL_SOCKET_WRITEABLE)
{
goto wloop;
}
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 3a6aa1c..ffc7822 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -520,8 +520,33 @@ ProcessClientReadInterrupt(void)
errno = save_errno;
}
+ else if (ProcDiePending)
+ {
+ /*
+ * We're dying. It's safe (and sane) to handle that now.
+ */
+ CHECK_FOR_INTERRUPTS();
+ }
}
+void
+ProcessClientWriteInterrupt(bool blocked)
+{
+ /*
+ * We only want to process the interrupt here if socket writes are
+ * blocking to increase the chance to get an error message to the
+ * client. If we're not blocked there'll soon be a
+ * CHECK_FOR_INTERRUPTS(). But if we're blocked we'll never get out of
+ * that situation if the client has died.
+ */
+ if (ProcDiePending && blocked)
+ {
+ /*
+ * We're dying. It's safe (and sane) to handle that now.
+ */
+ CHECK_FOR_INTERRUPTS();
+ }
+}
/*
* Do raw parsing (only).
diff --git a/src/include/tcop/tcopprot.h b/src/include/tcop/tcopprot.h
index e4a1a7d..d3db716 100644
--- a/src/include/tcop/tcopprot.h
+++ b/src/include/tcop/tcopprot.h
@@ -68,6 +68,7 @@ extern void FloatExceptionHandler(SIGNAL_ARGS) __attribute__((noreturn));
extern void RecoveryConflictInterrupt(ProcSignalReason reason); /* called from SIGUSR1
* handler */
extern void ProcessClientReadInterrupt(void);
+extern void ProcessClientWriteInterrupt(bool blocked);
extern void process_postgres_switches(int argc, char *argv[],
GucContext ctx, const char **dbname);
--
1.8.3.251.g1462b67
On 2014-09-28 00:54:21 +0200, Andres Freund wrote:
On 2014-09-27 21:12:43 +0200, Andres Freund wrote:
On 2014-09-03 15:09:54 +0300, Heikki Linnakangas wrote:
Sorry, I missed this message and only caught up when reading your CF
status mail. I've attached three patches:
0001: Allows WaitLatchOrSocket(WL_WRITABLE) without WL_READABLE. I've
tested the poll() and select() implementations on linux and
blindly patched up windows.
0002: Put the socket the backend uses to communicate with the client
into nonblocking mode as soon as latches are ready and use latches
to wait. This probably doesn't work correctly without 0003, but
seems easier to review separately.
0003: Don't do sinval catchup and notify processing in signal
handlers. It's quite cool that it worked that well so far, but it
requires some complicated code and is rather fragile. 0002 allows
to move that out of signal handlers and just use a latch
there. This seems remarkably simpler:
4 files changed, 69 insertions(+), 229 deletions(-)
These aren't ready for commit, especially not 0003, but I think they are
quite a good foundation for getting rid of the blocking in send(). I
haven't added any interrupt processing after interrupted writes, but
marked the relevant places with XXXs.
With regard to 0002, I dislike the current need to do interrupt
processing both in be-secure.c and be-secure-openssl.c. I guess we could
solve that by returning something like EINTR from the ssl routines when
they need further reads/writes and do all the processing in one place in
be-secure.c.
There's also some cleanup in 0002/0003 needed:
prepare_for_client_read()/client_read_ended() aren't needed in that form
anymore and should probably rather be something like
CHECK_FOR_READ_INTERRUPT() or similar. Similarly the
EnableCatchupInterrupt()/DisableCatchupInterrupt() in autovacuum.c is
pretty ugly.
Btw, be-secure.c is really not a good name anymore...
What do you think?
I've invested some more time in this:
0002 now makes sense on its own and doesn't change anything around the
interrupt handling. Oh, and it compiles without 0003.
0003 Sinval/notify processing got simplified further. There really isn't
any need for DisableNotifyInterrupt/DisableCatchupInterrupt
anymore. Also begin_client_read/client_read_ended don't make much
sense anymore. Instead introduce ProcessClientReadInterrupt (which
wants a better name).
There's also a very WIP
0004 Allows secure_read/write be interrupted when ProcDiePending is
set. All of that happens via the latch mechanism, nothing happens
inside signal handlers. So I do think it's quite an improvement
over what's been discussed in this thread.
But it (and the other approaches) do noticeably increase the
likelihood of clients not getting the error message if the client
isn't actually dead. The likelihood of write() being blocked
*temporarily* due to normal bandwidth constraints is quite high
when you consider COPY FROM and similar. Right now we'll wait till
we can put the error message into the socket afaics.
1-3 need some serious comment work, but I think the approach is
basically sound. I'm much, much less sure about allowing send() to be
interrupted.
Kyotaro, could you check whether you can achieve what you want using
0004?
It's imo pretty clear that a fair amount of base work needs to be done
and there's been a fair amount of progress made this fest. I think this
can now be marked returned with feedback.
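To make the idea behind 0004 concrete, here is a minimal standalone sketch of waiting for a socket to become writable while also watching a latch, so that the waiter returns with EINTR instead of staying blocked. It is not the PostgreSQL latch code: ToyLatch, toy_latch_init(), toy_latch_set(), toy_latch_reset() and wait_writable_or_latch() are hypothetical names, and the latch is modelled with the self-pipe trick and poll(2).

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <unistd.h>

/* Hypothetical stand-in for a process latch: a pipe whose read end
 * becomes readable once the latch has been set (self-pipe trick). */
typedef struct
{
    int rfd;
    int wfd;
} ToyLatch;

static int
toy_latch_init(ToyLatch *l)
{
    int fds[2];

    if (pipe(fds) < 0)
        return -1;
    l->rfd = fds[0];
    l->wfd = fds[1];
    return 0;
}

/* Only uses write(2), so it could be called from a signal handler. */
static void
toy_latch_set(ToyLatch *l)
{
    int  save_errno = errno;
    char c = 0;

    if (write(l->wfd, &c, 1) < 0)
    {
        /* nothing useful to do here in a sketch */
    }
    errno = save_errno;
}

/* Consume pending wakeup bytes; call only after poll() reported POLLIN. */
static void
toy_latch_reset(ToyLatch *l)
{
    char buf[16];

    if (read(l->rfd, buf, sizeof(buf)) < 0)
    {
        /* ignored in this sketch */
    }
}

/*
 * Wait until fd is writable or the latch is set.
 * Returns 1 if writable, 0 if the latch was set (errno = EINTR, so the
 * caller can unwind and process interrupts outside any library code),
 * and -1 if poll() itself failed.
 */
static int
wait_writable_or_latch(int fd, ToyLatch *l)
{
    struct pollfd pfd[2];

    pfd[0].fd = fd;
    pfd[0].events = POLLOUT;
    pfd[0].revents = 0;
    pfd[1].fd = l->rfd;
    pfd[1].events = POLLIN;
    pfd[1].revents = 0;

    for (;;)
    {
        int rc = poll(pfd, 2, -1);

        if (rc >= 0)
            break;
        if (errno != EINTR)
            return -1;
    }

    if (pfd[1].revents & POLLIN)
    {
        toy_latch_reset(l);
        errno = EINTR;      /* force a return to the caller */
        return 0;
    }
    return 1;
}
```

As in the patch, the latch is checked before writability, so a pending wakeup wins even when the socket could accept more data.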
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Thank you for reviewing. I'll look closely at the patch tomorrow.
I must say this scares the heck out of me. The current code goes
through some trouble to not throw an error while in a recv() or
send(). For example, you removed the DoingCommandRead check from
prepare_for_client_read(). There's a comment in postgres.c that says
this:
/*
 * (2) Allow asynchronous signals to be executed immediately if they
 * come in while we are waiting for client input. (This must be
 * conditional since we don't want, say, reads on behalf of COPY FROM
 * STDIN doing the same thing.)
 */
DoingCommandRead = true;
Hmm, sorry. It's my fault that I skipped over the issues around
"COPY FROM STDIN".
With the patch, we do allow asynchronous signals to be processed while
blocked in COPY FROM STDIN. Maybe that's OK, but I don't feel
comfortable just changing it. (the comment is now wrong, of course)
I don't see an actual problem, but I agree that the behavior should
not be changed.
This patch also enables processing query cancel signals while blocked,
not just SIGTERM. That's not good; we might be in the middle of
sending a message, and we cannot just error out of that or we might
violate the fe/be protocol. That's OK with a SIGTERM as you're
terminating the connection anyway, and we have the PqCommBusy
safeguard in place that prevents us from sending broken messages to
the client, but that's not good enough if we wanted to keep the
backend alive, as we won't be able to send anything to the client
anymore.
OK. Since what I want is to escape from a blocked send() only on
SIGTERM, it needs a mechanism different from the current
prepare_for_client_read().
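The distinction drawn above (act on a termination request while blocked, but never on a query cancel in the middle of a message) can be illustrated with a small standalone sketch. The names die_pending, cancel_pending, toy_die_handler(), toy_cancel_handler() and toy_write_interrupt() are hypothetical; the decision rule mirrors the "ProcDiePending && blocked" test in the WIP patch, and everything else is an assumption made for illustration.

```c
#include <assert.h>
#include <signal.h>
#include <stdbool.h>

/* Flags set by signal handlers and examined later at a safe point. */
static volatile sig_atomic_t die_pending = 0;
static volatile sig_atomic_t cancel_pending = 0;

/* Handlers only record the request; no real work happens here. */
static void
toy_die_handler(int signo)
{
    (void) signo;
    die_pending = 1;
}

static void
toy_cancel_handler(int signo)
{
    (void) signo;
    cancel_pending = 1;
}

/*
 * Decide whether a write should be abandoned right now.
 *
 * Only a pending termination request counts, and only while the write
 * is blocked: the connection is going away anyway, so a half-sent
 * message does no further harm. A pending cancel is deliberately left
 * alone, because erroring out mid-message would desynchronize a
 * connection that is supposed to stay usable.
 */
static bool
toy_write_interrupt(bool blocked)
{
    return die_pending && blocked;
}
```

A caller would invoke toy_write_interrupt() after each failed or interrupted write attempt and leave cancel requests for the next regular interrupt check.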
BTW, we've been talking about blocking in send(), but this patch also
lets a recv() in e.g. COPY FROM STDIN be interrupted. That's
probably a good thing; surely you have exactly the same issues with
that as with send(). But I didn't realize we had a problem with that
too.
I see. (But it is merely a side-effect of my carelessness, as you know:)
In summary, this patch is not ready as it is, but I think we can fix
it. The key question is: is it safe to handle SIGTERM in the signal
handler, calling the exit-handlers and exiting the backend, when
blocked in a recv() or send()? It's a change in the pqcomm.c API; most
pqcomm.c functions have not thrown errors or processed interrupts
before. But looking at the callers, I think it's safe, and there aren't
actually any comments explicitly saying that pqcomm.c will never throw
errors.
I propose the attached patch. It adds a new flag ImmediateDieOK, which
is a weaker form of ImmediateInterruptOK that only allows handling a
pending die-signal in the signal handler.
Robert, others, do you see a problem with this?
The patch seems to avoid all the problems mentioned in the message,
so I have no objection to it.
Over IM, Robert pointed out that it's not safe to jump out of a signal
handler with siglongjmp, when we're inside library calls, like in a
callback called by OpenSSL. But even with current master branch,
that's exactly what we do. In secure_raw_read(), we set
ImmediateInterruptOK = true, which means that any incoming signal will
be handled directly in the signal handler, which can mean
elog(ERROR). Should we be worried? OpenSSL might get confused if
control never returns to the SSL_read() or SSL_write() function that
called secure_raw_read().
IMHO, the backend will soon die even if OpenSSL is confused. It seems a
bit brutal that a sudden cutoff occurs even when the socket is *not*
blocked, but the backend will soon die and the frontend will
immediately get ECONNRESET (hmm, it is not listed in the man pages of
recv/read(2)) and should safely exit from OpenSSL.
I cannot run this patch right now, but it seems to have no problem.
regards,
--
Kyotaro Horiguchi
NTT Open Source Software Center
Wow, thank you for the patch.
0001: Allows WaitLatchOrSocket(WL_WRITABLE) without WL_READABLE. I've
tested the poll() and select() implementations on linux and
blindly patched up windows.
0002: Put the socket the backend uses to communicate with the client
into nonblocking mode as soon as latches are ready and use latches
to wait. This probably doesn't work correctly without 0003, but
seems easier to review separately.
0003: Don't do sinval catchup and notify processing in signal
handlers. It's quite cool that it worked that well so far, but it
requires some complicated code and is rather fragile. 0002 allows
to move that out of signal handlers and just use a latch
there. This seems remarkably simpler:
4 files changed, 69 insertions(+), 229 deletions(-)
These aren't ready for commit, especially not 0003, but I think they are
quite a good foundation for getting rid of the blocking in send(). I
haven't added any interrupt processing after interrupted writes, but
marked the relevant places with XXXs.
With regard to 0002, I dislike the current need to do interrupt
processing both in be-secure.c and be-secure-openssl.c. I guess we could
solve that by returning something like EINTR from the ssl routines when
they need further reads/writes and do all the processing in one place in
be-secure.c.
There's also some cleanup in 0002/0003 needed:
prepare_for_client_read()/client_read_ended() aren't needed in that form
anymore and should probably rather be something like
CHECK_FOR_READ_INTERRUPT() or similar. Similarly the
EnableCatchupInterrupt()/DisableCatchupInterrupt() in autovacuum.c is
pretty ugly.
Btw, be-secure.c is really not a good name anymore...
What do you think?
I've invested some more time in this:
0002 now makes sense on its own and doesn't change anything around the
interrupt handling. Oh, and it compiles without 0003.
0003 Sinval/notify processing got simplified further. There really isn't
any need for DisableNotifyInterrupt/DisableCatchupInterrupt
anymore. Also begin_client_read/client_read_ended don't make much
sense anymore. Instead introduce ProcessClientReadInterrupt (which
wants a better name).
There's also a very WIP
0004 Allows secure_read/write be interrupted when ProcDiePending is
set. All of that happens via the latch mechanism, nothing happens
inside signal handlers. So I do think it's quite an improvement
over what's been discussed in this thread.
But it (and the other approaches) do noticeably increase the
likelihood of clients not getting the error message if the client
isn't actually dead. The likelihood of write() being blocked
*temporarily* due to normal bandwidth constraints is quite high
when you consider COPY FROM and similar. Right now we'll wait till
we can put the error message into the socket afaics.
1-3 need some serious comment work, but I think the approach is
basically sound. I'm much, much less sure about allowing send() to be
interrupted.
Kyotaro, could you check whether you can achieve what you want using
0004?
It's imo pretty clear that a fair amount of base work needs to be done
and there's been a fair amount of progress made this fest. I think this
can now be marked returned with feedback.
I am satisfied with Heikki's solution, and it seems ready for
commit. But I agree that a temporarily blocked state is seen
often, and that solution breaks even a non-blocked socket. If we want
to (or should) avoid breaking a socket that is only temporarily
blocked, or not blocked at all, even for SIGTERM, this mechanism
should be used.
Which way should we take?
regards,
--
Kyotaro Horiguchi
NTT Open Source Software Center
By the way,
Sorry, I missed this message and only caught up when reading your CF
status mail. I've attached three patches:
Could you let me know how to get the CF status mail?
regards,
--
Kyotaro Horiguchi
NTT Open Source Software Center
On 09/30/2014 10:05 AM, Kyotaro HORIGUCHI wrote:
By the way,
Sorry, I missed this message and only caught up when reading your CF
status mail. I've attached three patches:
Could you let me know how to get the CF status mail?
I think he meant this email I sent last weekend:
/messages/by-id/542672D2.3060708@vmware.com
- Heikki
On 2014-09-26 21:02:16 +0300, Heikki Linnakangas wrote:
I propose the attached patch. It adds a new flag ImmediateDieOK, which is a
weaker form of ImmediateInterruptOK that only allows handling a pending
die-signal in the signal handler.
Robert, others, do you see a problem with this?
Per se I don't have a problem with it. There does exist the problem that
the user doesn't get an error message in more cases though. On the other
hand it's bad if any user can prevent the database from restarting.
Over IM, Robert pointed out that it's not safe to jump out of a signal
handler with siglongjmp, when we're inside library calls, like in a callback
called by OpenSSL. But even with current master branch, that's exactly what
we do. In secure_raw_read(), we set ImmediateInterruptOK = true, which means
that any incoming signal will be handled directly in the signal handler,
which can mean elog(ERROR). Should we be worried? OpenSSL might get confused
if control never returns to the SSL_read() or SSL_write() function that
called secure_raw_read().
But this is imo prohibitive. Yes, we've been doing it for a long while. But
no, that's not ok. It actually prompted me into prototyping the latch
thing (in some other thread). I don't think existing practice justifies
expanding it further.
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Sorry, I missed this message and only caught up when reading your CF
status mail. I've attached three patches:
Could you let me know how to get the CF status mail?
I think he meant this email I sent last weekend:
I see, that's what I also received. Thank you.
regards,
--
Kyotaro Horiguchi
NTT Open Source Software Center
Hello,
I propose the attached patch. It adds a new flag ImmediateDieOK, which is a
weaker form of ImmediateInterruptOK that only allows handling a pending
die-signal in the signal handler.
Robert, others, do you see a problem with this?
Per se I don't have a problem with it. There does exist the problem that
the user doesn't get an error message in more cases though. On the other
hand it's bad if any user can prevent the database from restarting.
Over IM, Robert pointed out that it's not safe to jump out of a signal
handler with siglongjmp, when we're inside library calls, like in a callback
called by OpenSSL. But even with current master branch, that's exactly what
we do. In secure_raw_read(), we set ImmediateInterruptOK = true, which means
that any incoming signal will be handled directly in the signal handler,
which can mean elog(ERROR). Should we be worried? OpenSSL might get confused
if control never returns to the SSL_read() or SSL_write() function that
called secure_raw_read().
But this is imo prohibitive. Yes, we've been doing it for a long while. But
no, that's not ok. It actually prompted me into prototyping the latch
thing (in some other thread). I don't think existing practice justifies
expanding it further.
I see. In that case, this approach seems basically
applicable. But if I understand correctly, this patch does not seem
to return out of the OpenSSL code even when the latch is found to be
set in secure_raw_write/read. I tried setting errno = ECONNRESET
and it worked, but it seems a bad deed.
secure_raw_write(Port *port, const void *ptr, size_t len)
{
n = send(port->sock, ptr, len, 0);
if (!port->noblock && n < 0 && (errno == EWOULDBLOCK || errno == EAGAIN))
{
w = WaitLatchOrSocket(&MyProc->procLatch, ...
if (w & WL_LATCH_SET)
{
ResetLatch(&MyProc->procLatch);
/*
* Force a return, so interrupts can be processed when not
* (possibly) underneath a ssl library.
*/
errno = EINTR;
(return n; // n is negative)
my_sock_write(BIO *h, const char *buf, int size)
{
res = secure_raw_write(((Port *) h->ptr), buf, size);
BIO_clear_retry_flags(h);
if (res <= 0)
{
if (errno == EINTR || errno == EWOULDBLOCK || errno == EAGAIN)
{
BIO_set_retry_write(h);
--
Kyotaro Horiguchi
NTT Open Source Software Center
On 2014-10-02 17:47:39 +0900, Kyotaro HORIGUCHI wrote:
Hello,
I propose the attached patch. It adds a new flag ImmediateDieOK, which is a
weaker form of ImmediateInterruptOK that only allows handling a pending
die-signal in the signal handler.
Robert, others, do you see a problem with this?
Per se I don't have a problem with it. There does exist the problem that
the user doesn't get an error message in more cases though. On the other
hand it's bad if any user can prevent the database from restarting.
Over IM, Robert pointed out that it's not safe to jump out of a signal
handler with siglongjmp, when we're inside library calls, like in a callback
called by OpenSSL. But even with current master branch, that's exactly what
we do. In secure_raw_read(), we set ImmediateInterruptOK = true, which means
that any incoming signal will be handled directly in the signal handler,
which can mean elog(ERROR). Should we be worried? OpenSSL might get confused
if control never returns to the SSL_read() or SSL_write() function that
called secure_raw_read().
But this is imo prohibitive. Yes, we've been doing it for a long while. But
no, that's not ok. It actually prompted me into prototyping the latch
thing (in some other thread). I don't think existing practice justifies
expanding it further.
I see, in that case, this approach seems basically
applicable. But if I understand correctly, this patch seems not
to return out of the openssl code even when latch was found to be
set in secure_raw_write/read.
Correct. That's why I think it's the way forward. There are several
problems now where the inability to do real things while reading/writing
is a problem.
I tried setting errno = ECONNRESET
and it went well but seems a bad deed.
Where and why did you do that?
secure_raw_write(Port *port, const void *ptr, size_t len)
{
n = send(port->sock, ptr, len, 0);
if (!port->noblock && n < 0 && (errno == EWOULDBLOCK || errno == EAGAIN))
{
w = WaitLatchOrSocket(&MyProc->procLatch, ...
if (w & WL_LATCH_SET)
{
ResetLatch(&MyProc->procLatch);
/*
* Force a return, so interrupts can be processed when not
* (possibly) underneath a ssl library.
*/
errno = EINTR;
(return n; // n is negative)
my_sock_write(BIO *h, const char *buf, int size)
{
res = secure_raw_write(((Port *) h->ptr), buf, size);
BIO_clear_retry_flags(h);
if (res <= 0)
{
if (errno == EINTR || errno == EWOULDBLOCK || errno == EAGAIN)
{
BIO_set_retry_write(h);
Hm, this seems, besides one comment, the code from the last patch in my
series. Do you have a particular question about it?
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Hi,
But this is imo prohibitive. Yes, we've been doing it for a long while. But
no, that's not ok. It actually prompted me into prototyping the latch
thing (in some other thread). I don't think existing practice justifies
expanding it further.
I see, in that case, this approach seems basically
applicable. But if I understand correctly, this patch seems not
to return out of the openssl code even when latch was found to be
set in secure_raw_write/read.
Correct. That's why I think it's the way forward. There's several
problems now where the inability to do real things while reading/writing
is a problem.
I tried setting errno = ECONNRESET
and it went well but seems a bad deed.
Where and why did you do that?
The patch of this message.
/messages/by-id/20140828.214704.93968088.horiguchi.kyotaro@lab.ntt.co.jp
The reason for setting errno (instead of a variable for it) is to
trick openssl (or my_sock_write? I've forgotten..) into
behaving as if the underlying send(2) had returned with a
non-continuable error, so it cannot be any of the continuable errnos
(EINTR/EWOULDBLOCK/EAGAIN). In my faint memory, only avoiding
BIO_set_retry_write() in my_sock_write() doesn't work as expected,
but it might be enough that my_sock_write returns -1 and doesn't
call BIO_set_retry_write().
The reason for choosing ECONNRESET is that none of the other errnos
possible for send(2)(*1) seems to fit the real situation, and the
blocked situation seems similar to a reset connection in the
sense that it cannot continue to work due to an external condition,
and it is used in be_tls_write() in a similar way.
Come to think of it, setting ECONNRESET is not so evil?
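The split between "continuable" and "non-continuable" errnos that this argument relies on can be written down as a tiny predicate. This is only an illustration of the classification used by the errno test in the quoted my_sock_write() excerpt; errno_is_retryable() is a hypothetical helper, not an existing PostgreSQL or OpenSSL function.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/*
 * True for errno values after which the write should simply be retried
 * (the case in which the excerpt calls BIO_set_retry_write()); false
 * for anything else, such as ECONNRESET, which the caller treats as a
 * hard failure.
 */
static bool
errno_is_retryable(int e)
{
    /* EAGAIN and EWOULDBLOCK may be the same value on some platforms. */
    return e == EINTR || e == EAGAIN || e == EWOULDBLOCK;
}
```

Under this classification, reporting EINTR after the latch is set sends the write back into the retry path, while ECONNRESET falls through to the failure path, which is the effect described above.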
secure_raw_write(Port *port, const void *ptr, size_t len)
{
n = send(port->sock, ptr, len, 0);
if (!port->noblock && n < 0 && (errno == EWOULDBLOCK || errno == EAGAIN))
{
w = WaitLatchOrSocket(&MyProc->procLatch, ...
if (w & WL_LATCH_SET)
{
ResetLatch(&MyProc->procLatch);
/*
* Force a return, so interrupts can be processed when not
* (possibly) underneath a ssl library.
*/
errno = EINTR;
(return n; // n is negative)
my_sock_write(BIO *h, const char *buf, int size)
{
res = secure_raw_write(((Port *) h->ptr), buf, size);
BIO_clear_retry_flags(h);
if (res <= 0)
{
if (errno == EINTR || errno == EWOULDBLOCK || errno == EAGAIN)
{
BIO_set_retry_write(h);
Hm, this seems, besides one comment, the code from the last patch in my
series. Do you have a particular question about it?
I didn't have a particular question about it. It is cited only
in order to show the route to retrying.
regards,
--
Kyotaro Horiguchi
NTT Open Source Software Center
On 09/28/2014 01:54 AM, Andres Freund wrote:
I've invested some more time in this:
Thanks!
In 0001, the select() codepath will not return (WL_SOCKET_READABLE |
WL_SOCKET_WRITEABLE) on EOF or error, like the comment says and like the
poll() path does. It only sets WL_SOCKET_READABLE if WL_SOCKET_READABLE
was requested as a wake-event, and likewise for writeable, while the
poll() codepath returns (WL_SOCKET_READABLE | WL_SOCKET_WRITEABLE)
regardless of the requested wake-events. I'm not sure which is actually
better - a separate WL_SOCKET_ERROR code might be best - but it's
inconsistent as it is.
0002 now makes sense on its own and doesn't change anything around the
interrupt handling. Oh, and it compiles without 0003.
WaitLatchOrSocket() can throw an error, so it's not totally safe to call
that underneath OpenSSL. Admittedly the cases where it throws an error
are "shouldn't happen" cases like "poll() failed" or "read() on
self-pipe failed", but still. Perhaps those errors should be
reclassified as FATAL; it's not clear you can just roll back and expect
to continue running if any of them happens.
In secure_raw_write(), you need to save and restore errno, as
WaitLatchOrSocket will not preserve it. If secure_raw_write() calls
WaitLatchOrSocket(), and it returns because the latch was set, and we
fall out of secure_raw_write, we will return -1 but the errno might not
be set to anything sensible anymore.
0003 Sinval/notify processing got simplified further. There really isn't
any need for DisableNotifyInterrupt/DisableCatchupInterrupt
anymore. Also begin_client_read/client_read_ended don't make much
sense anymore. Instead introduce ProcessClientReadInterrupt (which
wants a better name).
There's also a very WIP
0004 Allows secure_read/write be interrupted when ProcDiePending is
set. All of that happens via the latch mechanism, nothing happens
inside signal handlers. So I do think it's quite an improvement
over what's been discussed in this thread.
But it (and the other approaches) do noticeably increase the
likelihood of clients not getting the error message if the client
isn't actually dead. The likelihood of write() being blocked
*temporarily* due to normal bandwidth constraints is quite high
when you consider COPY FROM and similar. Right now we'll wait till
we can put the error message into the socket afaics.
1-3 need some serious comment work, but I think the approach is
basically sound. I'm much, much less sure about allowing send() to be
interrupted.
Yeah, 1-3 seem sane. 4 also looks OK to me at a quick glance. It
basically enables handling the "die" interrupt immediately, if we're
blocked in a read or write. It won't be handled in the signal handler,
but within the secure_read/write call anyway.
- Heikki
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Hi,
On 2014-10-03 17:12:18 +0300, Heikki Linnakangas wrote:
On 09/28/2014 01:54 AM, Andres Freund wrote:
I've invested some more time in this:
Thanks!
In 0001, the select() codepath will not return (WL_SOCKET_READABLE |
WL_SOCKET_WRITEABLE) on EOF or error, like the comment says and like the
poll() path does. It only sets WL_SOCKET_READABLE if WL_SOCKET_READABLE was
requested as a wake-event, and likewise for writeable, while the poll()
codepath returns (WL_SOCKET_READABLE | WL_SOCKET_WRITEABLE) regardless of
the requested wake-events. I'm not sure which is actually better - a
separate WL_SOCKET_ERROR code might be best - but it's inconsistent as it
is.
Hm. Right. I think we should only report the requested state. We can't
really discern whether it's a hangup, error or actually readable/writable
with select() - it just returns the socket as readable/writable as soon
as it doesn't block anymore. Where not blocking includes the connection
having gone bad.
It took me a while to figure out whether that's actually guaranteed by
the spec, but I'm pretty sure it is...
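A small sketch of the "only report the requested state" rule for the poll() path. The SKETCH_* flags and report_requested_events() are hypothetical names for this example, not the latch code itself; the point is that errors and hangups are folded into whichever of readable/writable the caller asked for.

```c
#include <assert.h>
#include <poll.h>

/* Hypothetical wake-event flags, named after the ones in the thread. */
#define SKETCH_SOCKET_READABLE  (1 << 0)
#define SKETCH_SOCKET_WRITEABLE (1 << 1)

/*
 * Translate poll() revents into wake events. Errors and hangups are
 * reported only through the events the caller requested.
 */
static int
report_requested_events(int wake_events, short revents)
{
    int         result = 0;
    int         failed = (revents & (POLLERR | POLLHUP | POLLNVAL)) != 0;

    assert((wake_events &
            ~(SKETCH_SOCKET_READABLE | SKETCH_SOCKET_WRITEABLE)) == 0);

    if ((wake_events & SKETCH_SOCKET_READABLE) &&
        ((revents & POLLIN) || failed))
        result |= SKETCH_SOCKET_READABLE;

    if ((wake_events & SKETCH_SOCKET_WRITEABLE) &&
        ((revents & POLLOUT) || failed))
        result |= SKETCH_SOCKET_WRITEABLE;

    return result;
}
```

With this mapping a caller waiting only for writability never sees a readable result, which matches what the select() path can deliver.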
0002 now makes sense on its own and doesn't change anything around the
interrupt handling. Oh, and it compiles without 0003.
WaitLatchOrSocket() can throw an error, so it's not totally safe to call
that underneath OpenSSL.
Hm. Fair point.
Admittedly the cases where it throws an error are
"shouldn't happen" cases like "poll() failed" or "read() on self-pipe
failed", but still. Perhaps those errors should be reclassified as FATAL;
it's not clear you can just roll back and expect to continue running if any
of them happens.
Fine with me.
In secure_raw_write(), you need to save and restore errno, as
WaitLatchOrSocket will not preserve it. If secure_raw_write() calls
WaitLatchOrSocket(), and it returns because the latch was set, and we fall
out of secure_raw_write, we will return -1 but the errno might not be set to
anything sensible anymore.
Oops.
0003 Sinval/notify processing got simplified further. There really isn't
any need for DisableNotifyInterrupt/DisableCatchupInterrupt
anymore. Also begin_client_read/client_read_ended don't make much
sense anymore. Instead introduce ProcessClientReadInterrupt (which
wants a better name).
There's also a very WIP
0004 Allows secure_read/write be interrupted when ProcDiePending is
set. All of that happens via the latch mechanism, nothing happens
inside signal handlers. So I do think it's quite an improvement
over what's been discussed in this thread.
But it (and the other approaches) do noticeably increase the
likelihood of clients not getting the error message if the client
isn't actually dead. The likelihood of write() being blocked
*temporarily* due to normal bandwidth constraints is quite high
when you consider COPY FROM and similar. Right now we'll wait till
we can put the error message into the socket afaics.
1-3 need some serious comment work, but I think the approach is
basically sound. I'm much, much less sure about allowing send() to be
interrupted.
Yeah, 1-3 seem sane.
I think 3 also needs a careful look. Have you looked through it? While
imo much less complex than before, there's some complex interactions in
the touched code. And we have terrible coverage of both catchup
interrupts and notify stuff...
Tom, do you happen to have time to look at that bit?
There's also the concern that using a latch for client communication
increases the number of syscalls for the same work. We should at least
try to quantify that...
4 also looks OK to me at a quick glance. It basically
enables handling the "die" interrupt immediately, if we're blocked in a read
or write. It won't be handled in the signal handler, but within the
secure_read/write call anyway.
What are you thinking about the concern that it'll reduce the likelihood
of transferring the error message to the client? I tried to reduce that
by only allowing errors when write() blocks, but that's not an
infrequent event.
Thanks for looking.
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 10/03/2014 05:26 PM, Andres Freund wrote:
On 2014-10-03 17:12:18 +0300, Heikki Linnakangas wrote:
On 09/28/2014 01:54 AM, Andres Freund wrote:
0003 Sinval/notify processing got simplified further. There really isn't
any need for DisableNotifyInterrupt/DisableCatchupInterrupt
anymore. Also begin_client_read/client_read_ended don't make much
sense anymore. Instead introduce ProcessClientReadInterrupt (which
wants a better name).
There's also a very WIP
0004 Allows secure_read/write be interrupted when ProcDiePending is
set. All of that happens via the latch mechanism, nothing happens
inside signal handlers. So I do think it's quite an improvement
over what's been discussed in this thread.
But it (and the other approaches) do noticeably increase the
likelihood of clients not getting the error message if the client
isn't actually dead. The likelihood of write() being blocked
*temporarily* due to normal bandwidth constraints is quite high
when you consider COPY FROM and similar. Right now we'll wait till
we can put the error message into the socket afaics.
1-3 need some serious comment work, but I think the approach is
basically sound. I'm much, much less sure about allowing send() to be
interrupted.
Yeah, 1-3 seem sane.
I think 3 also needs a careful look. Have you looked through it? While
imo much less complex than before, there's some complex interactions in
the touched code. And we have terrible coverage of both catchup
interrupts and notify stuff...
I only looked at the .patch, I didn't apply it, so I didn't look at the
context much. But I don't see any fundamental problem with it. I would
like to have a closer look before it's committed, though.
There's also the concern that using a latch for client communication
increases the number of syscalls for the same work. We should at least
try to quantify that...
I'm not too concerned about that, since we only do extra syscalls when
the socket isn't immediately available for reading/writing, i.e. when we
have to sleep anyway.
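The argument above (extra syscalls only when we would have slept anyway) can be sketched as a write loop that tries write() first and waits only after it reports that it would block. write_all_nonblocking() is a hypothetical helper for this example; it uses plain poll() where the patches use a latch wait.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Write len bytes to a (normally non-blocking) descriptor. poll() is
 * called only after write() reports that it would block; *waits counts
 * those extra syscalls. Returns 0 on success, -1 on error.
 */
static int
write_all_nonblocking(int fd, const char *buf, size_t len, int *waits)
{
    size_t      done = 0;

    assert(waits != NULL);
    *waits = 0;

    while (done < len)
    {
        struct pollfd pfd;
        ssize_t     n = write(fd, buf + done, len - done);

        if (n > 0)
        {
            done += (size_t) n; /* fast path: no wait needed */
            continue;
        }
        if (n == 0)
            return -1;          /* no progress; avoid spinning */
        if (errno == EINTR)
            continue;
        if (errno != EAGAIN && errno != EWOULDBLOCK)
            return -1;

        /* Only now pay for the additional syscall. */
        pfd.fd = fd;
        pfd.events = POLLOUT;
        pfd.revents = 0;
        (*waits)++;
        if (poll(&pfd, 1, -1) < 0 && errno != EINTR)
            return -1;
    }
    return 0;
}
```

When the socket buffer has room, the count stays at zero and the loop costs exactly one write() per chunk, the same as a blocking socket.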
4 also looks OK to me at a quick glance. It basically
enables handling the "die" interrupt immediately, if we're blocked in a read
or write. It won't be handled in the signal handler, but within the
secure_read/write call anyway.
What are you thinking about the concern that it'll reduce the likelihood
of transferring the error message to the client? I tried to reduce that
by only allowing errors when write() blocks, but that's not an
infrequent event.
I'm not too concerned about that either. I mean, it's probably true that
it reduces the likelihood, but I don't particularly care myself. But if
we care, we could use a timeout there, so that if we receive a SIGTERM
while blocked on a send(), we wait for a few seconds to see if we can
send whatever we were sending, before terminating the backend.
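One possible shape for such a grace period, as a sketch only: after a termination request arrives while send() would block, give the socket a bounded amount of time to become writable before giving up. worth_retrying_send() and its grace_ms parameter are hypothetical and not part of any of the patches.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>

/*
 * Called when a termination request arrived while send() would block.
 * Gives the socket grace_ms milliseconds to become writable.
 * Returns 1 if it is worth retrying send(), 0 if the caller should
 * give up and terminate.
 */
static int
worth_retrying_send(int sock, int grace_ms)
{
    struct pollfd pfd;
    int         rc;

    assert(grace_ms >= 0);

    pfd.fd = sock;
    pfd.events = POLLOUT;

    do
    {
        pfd.revents = 0;
        rc = poll(&pfd, 1, grace_ms);   /* full grace period restarts on EINTR */
    } while (rc < 0 && errno == EINTR);

    /* Timeout, poll failure, or a broken connection: stop trying. */
    if (rc <= 0 || (pfd.revents & (POLLERR | POLLHUP | POLLNVAL)))
        return 0;

    return (pfd.revents & POLLOUT) != 0;
}
```

This is essentially the `select(..., '1 sec') == 0` test from the start of the thread, expressed with poll().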
What should we do with this patch in the commitfest? Are you planning to
clean up and commit these patches?
- Heikki
On 2014-10-03 18:29:23 +0300, Heikki Linnakangas wrote:
On 10/03/2014 05:26 PM, Andres Freund wrote:
On 2014-10-03 17:12:18 +0300, Heikki Linnakangas wrote:
On 09/28/2014 01:54 AM, Andres Freund wrote:
0003 Sinval/notify processing got simplified further. There really isn't
any need for DisableNotifyInterrupt/DisableCatchupInterrupt
anymore. Also begin_client_read/client_read_ended don't make much
sense anymore. Instead introduce ProcessClientReadInterrupt (which
wants a better name).
There's also a very WIP
0004 Allows secure_read/write be interrupted when ProcDiePending is
set. All of that happens via the latch mechanism, nothing happens
inside signal handlers. So I do think it's quite an improvement
over what's been discussed in this thread.
But it (and the other approaches) do noticeably increase the
likelihood of clients not getting the error message if the client
isn't actually dead. The likelihood of write() being blocked
*temporarily* due to normal bandwidth constraints is quite high
when you consider COPY FROM and similar. Right now we'll wait till
we can put the error message into the socket afaics.
1-3 need some serious comment work, but I think the approach is
basically sound. I'm much, much less sure about allowing send() to be
interrupted.
Yeah, 1-3 seem sane.
I think 3 also needs a careful look. Have you looked through it? While
imo much less complex than before, there's some complex interactions in
the touched code. And we have terrible coverage of both catchup
interrupts and notify stuff...
I only looked at the .patch, I didn't apply it, so I didn't look at the
context much. But I don't see any fundamental problem with it. I would like
to have a closer look before it's committed, though.
I'd appreciate that. I don't want to commit it without a careful review
of another committer.
There's also the concern that using a latch for client communication
increases the number of syscalls for the same work. We should at least
try to quantify that...
I'm not too concerned about that, since we only do extra syscalls when the
socket isn't immediately available for reading/writing, i.e. when we have to
sleep anyway.
Well, kernels actually do some nice optimizations for blocking reads -
at least for local sockets. Like switching to the other process
immediately and such.
I'm not super concerned either, but I think we should try to measure
it. And if we're failing, we probably should try to address these
problems - if possible in the latch code.
4 also looks OK to me at a quick glance. It basically
enables handling the "die" interrupt immediately, if we're blocked in a read
or write. It won't be handled in the signal handler, but within the
secure_read/write call anyway.
What are you thinking about the concern that it'll reduce the likelihood
of transferring the error message to the client? I tried to reduce that
by only allowing errors when write() blocks, but that's not an
infrequent event.
I'm not too concerned about that either. I mean, it's probably true that it
reduces the likelihood, but I don't particularly care myself. But if we
care, we could use a timeout there, so that if we receive a SIGTERM while
blocked on a send(), we wait for a few seconds to see if we can send
whatever we were sending, before terminating the backend.
What should we do with this patch in the commitfest?
I think the feature should be declared 'returned with feedback' for this
commitfest. I've done so.
I don't see much of a reason to wait with patch 1 until the next
commitfest. It imo looks uncontroversial and doesn't have any
far-reaching consequences.
Are you planning to clean up and commit these patches?
I plan to do so one by one, yes. If you'd like to pick up any of them,
feel free to do (after telling me, to avoid duplicated efforts). I don't
feel proprietary about any of them. But I guess you have other stuff
you'd like to work on too ;)
I'll try to send out a version with the stuff you mentioned earlier in
the next couple days.
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Hello, simply not setting the retry flag in my_sock_write when
ProcDiePending is set seems to be enough.
But then it returns SSL_ERROR_SYSCALL rather than SSL_ERROR_WANT_WRITE,
so I modified patch 4 as attached.
With that change, the attached patch works as expected on top of
Andres's patches 1-3.
regards,
--
Kyotaro Horiguchi
NTT Open Source Software Center
Attachments:
0004-Process_die_intr_while_writing.patch (text/x-patch; charset=us-ascii)
diff --git a/src/backend/libpq/be-secure-openssl.c b/src/backend/libpq/be-secure-openssl.c
index 6fc6903..2288fe2 100644
--- a/src/backend/libpq/be-secure-openssl.c
+++ b/src/backend/libpq/be-secure-openssl.c
@@ -750,7 +750,8 @@ my_sock_write(BIO *h, const char *buf, int size)
BIO_clear_retry_flags(h);
if (res <= 0)
{
- if (errno == EINTR || errno == EWOULDBLOCK || errno == EAGAIN)
+ if (!ProcDiePending &&
+ (errno == EINTR || errno == EWOULDBLOCK || errno == EAGAIN))
{
BIO_set_retry_write(h);
}
diff --git a/src/backend/libpq/be-secure.c b/src/backend/libpq/be-secure.c
index 7b5b30f..ab9e122 100644
--- a/src/backend/libpq/be-secure.c
+++ b/src/backend/libpq/be-secure.c
@@ -199,6 +199,7 @@ secure_write(Port *port, void *ptr, size_t len)
{
ssize_t n;
+retry:
#ifdef USE_SSL
if (port->ssl_in_use)
{
@@ -210,7 +211,26 @@ secure_write(Port *port, void *ptr, size_t len)
n = secure_raw_write(port, ptr, len);
}
- /* XXX: We likely will want to process some interrupts here */
+ /*
+ * We only want to process the interrupt here if socket writes are
+ * blocking to increase the chance to get an error message to the
+ * client. If we're not blocked there'll soon be a
+ * CHECK_FOR_INTERRUPTS(). But if we're blocked we'll never get out of
+ * that situation if the client has died.
+ */
+ if (ProcDiePending && !port->noblock && n < 0)
+ {
+ /*
+ * We're dying. It's safe (and sane) to handle that now.
+ */
+ CHECK_FOR_INTERRUPTS();
+ }
+
+ /* retry after processing interrupts */
+ if (n < 0 && errno == EINTR)
+ {
+ goto retry;
+ }
return n;
}
@@ -236,10 +256,19 @@ wloop:
* don't do anything while (possibly) inside a ssl library.
*/
w = WaitLatchOrSocket(&MyProc->procLatch,
- WL_SOCKET_WRITEABLE,
+ WL_LATCH_SET | WL_SOCKET_WRITEABLE,
port->sock, 0);
- if (w & WL_SOCKET_WRITEABLE)
+ if (w & WL_LATCH_SET)
+ {
+ ResetLatch(&MyProc->procLatch);
+ /*
+ * Force a return, so interrupts can be processed when not
+ * (possibly) underneath a ssl library.
+ */
+ errno = EINTR;
+ }
+ else if (w & WL_SOCKET_WRITEABLE)
{
goto wloop;
}
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 3a6aa1c..61390aa 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -520,6 +520,13 @@ ProcessClientReadInterrupt(void)
errno = save_errno;
}
+ else if (ProcDiePending)
+ {
+ /*
+ * We're dying. It's safe (and sane) to handle that now.
+ */
+ CHECK_FOR_INTERRUPTS();
+ }
}
On 2014-10-09 14:06:35 +0900, Kyotaro HORIGUCHI wrote:
Hello, simply inhibit set retry flag when ProcDiePending in
my_sock_write seems enough.
But it returns with SSL_ERROR_SYSCALL not SSL_ERROR_WANT_WRITE so
I modified the patch 4 as the attached patch.
Why is that necessary? It seems really rather wrong to make
BIO_set_retry_write() dependent on ProcDiePending? Especially as, at
least in my testing, it's not even required because the be_tls_write()
can just check the error properly?
diff --git a/src/backend/libpq/be-secure-openssl.c b/src/backend/libpq/be-secure-openssl.c
index 6fc6903..2288fe2 100644
--- a/src/backend/libpq/be-secure-openssl.c
+++ b/src/backend/libpq/be-secure-openssl.c
@@ -750,7 +750,8 @@ my_sock_write(BIO *h, const char *buf, int size)
BIO_clear_retry_flags(h);
if (res <= 0)
{
- if (errno == EINTR || errno == EWOULDBLOCK || errno == EAGAIN)
+ if (!ProcDiePending &&
+ (errno == EINTR || errno == EWOULDBLOCK || errno == EAGAIN))
{
BIO_set_retry_write(h);
}
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Hmm.. Sorry for my stupidity.
Why is that necessary? It seems really rather wrong to make
BIO_set_retry_write() dependent on ProcDiePending? Especially as, at
least in my testing, it's not even required because the be_tls_write()
can just check the error properly?
I misread the earlier conversation as saying that it doesn't work as
expected. I have confirmed that it works fine.
So it works as I expected after all. The parameter of
ProcessClientWriteInterrupt() looks somewhat awkward to me, but patch
4 looks fine as a whole. Is there anything in the patch you are still
worried about?
regards,
--
Kyotaro Horiguchi
NTT Open Source Software Center
On 10/03/2014 06:29 PM, Heikki Linnakangas wrote:
On 10/03/2014 05:26 PM, Andres Freund wrote:
On 2014-10-03 17:12:18 +0300, Heikki Linnakangas wrote:
On 09/28/2014 01:54 AM, Andres Freund wrote:
0003 Sinval/notify processing got simplified further. There really isn't
any need for DisableNotifyInterrupt/DisableCatchupInterrupt
anymore. Also begin_client_read/client_read_ended don't make much
sense anymore. Instead introduce ProcessClientReadInterrupt (which
wants a better name).
There's also a very WIP
0004 Allows secure_read/write be interrupted when ProcDiePending is
set. All of that happens via the latch mechanism, nothing happens
inside signal handlers. So I do think it's quite an improvement
over what's been discussed in this thread.
But it (and the other approaches) do noticeably increase the
likelihood of clients not getting the error message if the client
isn't actually dead. The likelihood of write() being blocked
*temporarily* due to normal bandwidth constraints is quite high
when you consider COPY FROM and similar. Right now we'll wait till
we can put the error message into the socket afaics.
1-3 need some serious comment work, but I think the approach is
basically sound. I'm much, much less sure about allowing send() to be
interrupted.
Yeah, 1-3 seem sane.
I think 3 also needs a careful look. Have you looked through it? While
imo much less complex than before, there's some complex interactions in
the touched code. And we have terrible coverage of both catchup
interrupts and notify stuff...
I only looked at the .patch, I didn't apply it, so I didn't look at the
context much. But I don't see any fundamental problem with it. I would
like to have a closer look before it's committed, though.
About patch 3:
Looking closer, this design still looks OK to me. As you said yourself,
the comments need some work (e.g. the step 5. in the top comment in
async.c needs updating). And then there are a couple of FIXME and XXX
comments that need to be addressed.
The comment on PGPROC.procLatch in storage/proc.h says just this:
Latch procLatch; /* generic latch for process */
This needs a lot more explaining. It's now used by signal handlers to
interrupt a read or write to the socket; that should be mentioned. What
else is it used for? (for waking up a backend in synchronous
replication, at least) What are the rules on when to arm it and when to
reset it?
Would it be clearer to use a separate, backend-private latch for
the signals? I guess that won't work, though, because sometimes we
need to wait for a wakeup from a different process or from a signal at
the same time (SyncRepWaitForLSN() in particular). Not without adding a
variant of WaitLatch that can wait on two latches simultaneously, anyway.
The assumption in secure_raw_read that MyProc exists is pretty
surprising. I understand why it's that way, and there's a comment in
PostgresMain explaining why the socket cannot be put into non-blocking
mode earlier, but it's still a bit whacky. Not sure what to do about that.
- Heikki
On 2014-10-30 15:27:13 +0200, Heikki Linnakangas wrote:
The comment on PGPROC.procLatch in storage/proc.h says just this:
Latch procLatch; /* generic latch for process */
This needs a lot more explaining. It's now used by signal handlers to
interrupt a read or write to the socket; that should be mentioned. What else
is it used for? (for waking up a backend in synchronous replication, at
least) What are the rules on when to arm it and when to reset it?
Hm. I agree it could use expanded commentary, but I'm unsure if that much
detail is really warranted. Any such documentation seems almost
guaranteed to go out of date. As evidenced by the fact that there's none
to date...
Would it be more clear to use a separate, backend-private, latch, for the
signals? I guess that won't work, though, because sometimes we need to
wait for a wakeup from a different process or from a signal at the same time
(SyncRepWaitForLSN() in particular). Not without adding a variant of
WaitLatch that can wait on two latches simultaneously, anyway.
I wondered the same, but I don't really see what it'd buy us during
normal running. It seems like it'd just make code more complex without
leading to relevantly fewer wakeups.
The assumption in secure_raw_read that MyProc exists is pretty surprising. I
understand why it's that way, and there's a comment in PostgresMain
explaining why the socket cannot be put into non-blocking mode earlier, but
it's still a bit whacky. Not sure what to do about that.
It makes me quite unhappy too. I looked at what it'd take to make procLatch
available earlier, but it wasn't pretty. I wondered whether we could use
an 'early proc latch' in MyProcLatch that is used until we're fully attached
to shared memory. Then, when attaching to shared memory we'd set the old
latch once, and reset MyProcLatch to the shared memory one.
But that's pretty ugly.
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Andres Freund <andres@2ndquadrant.com> writes:
I've invested some more time in this:
0002 now makes sense on its own and doesn't change anything around the
interrupt handling. Oh, and it compiles without 0003.
In this patch, the endif appears to be misplaced in PostgresMain:
+ if (MyProcPort != NULL)
+ {
+#ifdef WIN32
+ pgwin32_noblock = true;
+#else
+ if (!pg_set_noblock(MyProcPort->sock))
+ ereport(COMMERROR,
+ (errmsg("could not set socket to nonblocking mode: %m")));
+ }
+#endif
+
pqinitmask();
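For comparison, a sketch of how the fragment balances when the closing brace is placed after the #endif. This is an illustration only, not the committed code: Port, MyProcPort and pg_set_noblock() are simplified stand-ins defined here so the fragment compiles outside PostgreSQL, and the ereport() call is replaced by fprintf().

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Simplified stand-ins so the fragment compiles on its own. */
typedef struct Port
{
    int         sock;
} Port;

static Port *MyProcPort = NULL;

#ifdef WIN32
static bool pgwin32_noblock = false;
#else
/* Stand-in: pretends that any non-negative descriptor can be switched. */
static bool
pg_set_noblock(int sock)
{
    return sock >= 0;
}
#endif

/* Returns true if a client socket exists and was switched successfully. */
static bool
set_client_socket_nonblocking(void)
{
    bool        ok = false;

    if (MyProcPort != NULL)
    {
#ifdef WIN32
        pgwin32_noblock = true;
        ok = true;
#else
        ok = pg_set_noblock(MyProcPort->sock);
        if (!ok)
            fprintf(stderr, "could not set socket to nonblocking mode\n");
#endif
    }                           /* closes the if on every platform */

    return ok;
}
```

In the quoted hunk the closing brace sits inside the #else branch, so a WIN32 build would be left with an unbalanced opening brace.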
0003 Sinval/notify processing got simplified further. There really isn't
any need for DisableNotifyInterrupt/DisableCatchupInterrupt
anymore. Also begin_client_read/client_read_ended don't make much
sense anymore. Instead introduce ProcessClientReadInterrupt (which
wants a better name).
There's also a very WIP
0004 Allows secure_read/write be interrupted when ProcDiePending is
set. All of that happens via the latch mechanism, nothing happens
inside signal handlers. So I do think it's quite an improvement
over what's been discussed in this thread.
But it (and the other approaches) do noticeably increase the
likelihood of clients not getting the error message if the client
isn't actually dead. The likelihood of write() being blocked
*temporarily* due to normal bandwidth constraints is quite high
when you consider COPY FROM and similar. Right now we'll wait till
we can put the error message into the socket afaics.
1-3 need some serious comment work, but I think the approach is
basically sound. I'm much, much less sure about allowing send() to be
interrupted.
After re-reading these I no longer see the remaining items I wanted to
inquire about, so it all just makes more sense now.
One thing I did try is sending a NOTICE to the client when in
ProcessInterrupts() and DoingCommandRead is true. I think[1] it was
expected to be delivered instantly, but actually the client (psql) only
displays it after sending the next statement.
While I'm reading up on the FE/BE protocol, someone might want to share
their wisdom on this subject. My guess: psql blocks in a readline/libedit
call and can't effectively poll the server socket before the user has
completed the input.
--
Alex
[1]: /messages/by-id/1262173040.19367.5015.camel@ebony
``AFAIK, NOTICE was suggested because it can be sent at any time,
whereas ERRORs are only associated with statements.''
On Thu, Oct 30, 2014 at 10:27 PM, Heikki Linnakangas
<hlinnakangas@vmware.com> wrote:
On 10/03/2014 06:29 PM, Heikki Linnakangas wrote:
[blah]
About patch 3:
Looking closer, this design still looks OK to me. As you said yourself, the
comments need some work (e.g. the step 5. in the top comment in async.c
needs updating). And then there are a couple of FIXME and XXX comments that
need to be addressed.
Those patches have not been updated in a while, and I am seeing some
feedback from several people, hence marking this as returned with
feedback. Horiguchi-san, feel free to add new entries on the CF app in
2014-12 or move this entry if you feel otherwise.
Regards,
--
Michael
Since I don't have a clear idea of how to move this forward, I will rework
it and come back with a new patch based on Andres' four patches.
Hmm.. Sorry for my stupidity.
Why is that necessary? It seems really rather wrong to make
BIO_set_retry_write() dependent on ProcDiePending? Especially as, at
least in my testing, it's not even required because the be_tls_write()
can just check the error properly?
I mistook the previous conversation as it doesn't work as
expected. I confirmed that it works fine.
After all, it works as I expected. The parameter for
ProcessClientWriteInterrupt() looks somewhat uneasy but the patch
4 looks fine as a whole. Do you have anything to worry about in
the patch?
regards,
--
Kyotaro Horiguchi
NTT Open Source Software Center
On 2014-12-15 18:19:26 +0900, Kyotaro HORIGUCHI wrote:
Since I don't have clear idea how to promote this, I will remake
and be back with new patch based on Andres' for patches.
Do my patches miss any functionality you want?
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Hello,
On 2014-12-15 18:19:26 +0900, Kyotaro HORIGUCHI wrote:
Since I don't have clear idea how to promote this, I will remake
and be back with new patch based on Andres' for patches.
Do my patches miss any functionality you want?
The patch satisfies what I want, as I said upthread. What I don't
know is how to proceed with it in this CF entry; in other words,
what should I do to get it to "ready for committer"?
regards,
--
Kyotaro Horiguchi
NTT Open Source Software Center
On 2014-10-03 16:26:35 +0200, Andres Freund wrote:
On 2014-10-03 17:12:18 +0300, Heikki Linnakangas wrote:
0002 now makes sense on its own and doesn't change anything around the
interrupt handling. Oh, and it compiles without 0003.
WaitLatchOrSocket() can throw an error, so it's not totally safe to call
that underneath OpenSSL.
Hm. Fair point.
I think we should fix this by simply prohibiting
WaitLatch/WaitLatchOrSocket from ERRORing out. The easiest, and imo
acceptable, thing is to simply convert the relevant ERRORs to FATAL. I
think that'd be perfectly fine as it seems very unlikely that we could
continue sanely afterwards.
It would really be nice if we had a simple way to raise a FATAL that
won't go to the client for situations like this. I'd proposed
elog(FATAL | COMMERROR, ...) in the past...
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 2014-11-17 18:22:54 +0300, Alex Shulgin wrote:
Andres Freund <andres@2ndquadrant.com> writes:
I've invested some more time in this:
0002 now makes sense on its own and doesn't change anything around the
interrupt handling. Oh, and it compiles without 0003.
In this patch, the endif appears to be misplaced in PostgresMain:
+ if (MyProcPort != NULL)
+ {
+#ifdef WIN32
+ pgwin32_noblock = true;
+#else
+ if (!pg_set_noblock(MyProcPort->sock))
+ ereport(COMMERROR,
+ (errmsg("could not set socket to nonblocking mode: %m")));
+ }
+#endif
+
Uh. Odd. Anyway, that bit of code is now somewhere else anyway...
One thing I did try is sending a NOTICE to the client when in
ProcessInterrupts() and DoingCommandRead is true. I think[1] it was
expected to be delivered instantly, but actually the client (psql) only
displays it after sending the next statement.
Yea, that should be psql specific though. I hope ;)
While I'm reading on FE/BE protocol someone might want to share his
wisdom on this subject. My guess: psql blocks on readline/libedit call
and can't effectively poll the server socket before complete input from
user.
I'm not sure if it's actually a "can't". It doesn't at least ;)
Greetings,
Andres Freund
On 2014-09-28 00:54:21 +0200, Andres Freund wrote:
I've invested some more time in this:
And yet another round of time spent.
The major change since last round is that I've introduced a local latch
that exists in every process. If InitProcess() is run, that latch is
replaced by the shared process latch and the reverse happens in
ProcKill. To implement this, I decided to remove some code duplication
around process initialization.
The major reason to do so is that this allows us to get rid of the special
case where MyProc isn't available yet during early process startup,
where the socket still had to be in blocking mode. Instead we can now
rely on the latch all the time.
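A rough sketch of the idea described above: a process-local latch that always exists, with a pointer that is switched to the shared latch while the process is attached. The types and function names (SketchLatch, my_latch, switch_to_shared_latch, switch_back_to_local_latch) are invented for this example and are not the names used in the patch; setting the latch after each switch is an assumption made here so that a wakeup delivered to the previous latch is not lost.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Invented minimal latch type for the sketch. */
typedef struct SketchLatch
{
    volatile bool is_set;
    bool        is_shared;
} SketchLatch;

/* Exists from process start, before any shared memory is attached. */
static SketchLatch local_latch = {false, false};

/* Signal handlers and wait loops would always go through this pointer. */
static SketchLatch *my_latch = &local_latch;

static void
set_latch(SketchLatch *latch)
{
    latch->is_set = true;
}

/* The InitProcess() step in the mail: start using the shared latch. */
static void
switch_to_shared_latch(SketchLatch *shared)
{
    assert(shared != NULL && shared->is_shared);
    my_latch = shared;
    /* Assumed here: wake up once in case the old latch was set earlier. */
    set_latch(my_latch);
}

/* The ProcKill step in the mail: fall back to the local latch. */
static void
switch_back_to_local_latch(void)
{
    my_latch = &local_latch;
    set_latch(my_latch);
}
```

Because my_latch is never NULL, socket code can wait on it during early startup as well, which is what removes the special case mentioned above.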
Other than that I've significantly cleaned up and tested the
patchset. Unfortunately that testing has brought a significant number of
SSL bugs to light, but they all turned out to be independent of these
changes. I'll try to detail them in a separate email early next week.
I've done some performance measurements, to verify this doesn't cause a
regression. When testing with 'SELECT 1' as a trivial statement I
couldn't measure any regression at 32 pgbench clients/threads on my 4
core (i7-4800MQ) laptop. At 390 clients I saw a regression of about 1.6%.
With a statement that actually does something, that difference vanished.
Personally I'm ok with that.
The patches are:
0001-Allow-latches-to-wait-for-socket-writability-without.patch
    Imo pretty close to commit and can be committed independently.
0002-Commonalize-process-startup-code.patch
    The above mentioned deduplication. Needs a review (completely new),
    but it's imo a clear improvement and allows for a fair amount of
    future/further deduplication.
0003-Add-a-default-local-latch-for-use-in-signal-handlers.patch
    Adds the default latch + the logic to switch to shared latch while
    attached. I think it's a good idea, but I'd like some feedback.
0004-Use-a-nonblocking-socket-for-FE-BE-communication-and.patch
    The earlier patch that converts the communication to use latches
    with some added cleanup/improvements. Specifically we don't have to
    rely on MyProc->procLatch anymore due to 0003 which gets rid of
    some ugly code in be-secure.c and postgres.c.
    I think this patch might not be safe without 0005 because I can't
    convince myself that it's safe to interrupt latch waits with work
    that actually might also use the same latch (sinval
    interrupts). But it's easier to review this way.
0005-Introduce-and-use-infrastructure-for-interrupt-proce.patch
    Actually move most of sinval.c/async.c's interrupt handling out of
    the signal handlers and use the latches. This is the cleaned up
    version of the earlier commit.
0006-WIP-Process-die-interrupts-while-reading-writing-fro.patch
    Not much has changed, needs at least some comments. I want to get
    the other stuff done first.
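As a rough illustration of the error reporting that 0001 introduces on the poll() path, here is a small standalone sketch. It is not the patch itself: `socket_events_from_revents()` is a hypothetical helper, and the two `WL_SOCKET_*` values are defined locally only so the example compiles on its own. The point is that EOF/error conditions are reported through whichever of readable/writable the caller asked for, so waiting for writability alone no longer misses them.

```c
#include <assert.h>
#include <poll.h>

/* Defined locally for this sketch only. */
#define WL_SOCKET_READABLE	(1 << 1)
#define WL_SOCKET_WRITEABLE	(1 << 2)

/*
 * Hypothetical helper: translate poll()'s revents for one socket into
 * the WL_SOCKET_* bits a caller waiting for 'wakeEvents' would get back.
 */
static int
socket_events_from_revents(int wakeEvents, short revents)
{
	int			result = 0;

	/* This sketch only deals with the two socket flags. */
	assert((wakeEvents & ~(WL_SOCKET_READABLE | WL_SOCKET_WRITEABLE)) == 0);

	/* data available in socket */
	if ((wakeEvents & WL_SOCKET_READABLE) && (revents & POLLIN))
		result |= WL_SOCKET_READABLE;

	/* socket is writable */
	if ((wakeEvents & WL_SOCKET_WRITEABLE) && (revents & POLLOUT))
		result |= WL_SOCKET_WRITEABLE;

	/* EOF/error: report via every socket event the caller waits for */
	if (revents & (POLLHUP | POLLERR | POLLNVAL))
		result |= wakeEvents & (WL_SOCKET_READABLE | WL_SOCKET_WRITEABLE);

	return result;
}
```

With this mapping a caller that passes only WL_SOCKET_WRITEABLE and gets POLLHUP back is woken up with WL_SOCKET_WRITEABLE set. The following write attempt is then expected to surface the actual error.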
There remains one 'FIXME' after all the patches: interactive_getc()
won't react to catchup/notify interrupts. As far as I can see that is
fine, as there are none.
Comments?
Greetings,
Andres Freund
Attachments:
0001-Allow-latches-to-wait-for-socket-writability-without.patch
From ac617c2e94f8e875c72b0f0f443b206126de9102 Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Thu, 8 Jan 2015 14:09:48 +0100
Subject: [PATCH 1/6] Allow latches to wait for socket writability without
waiting for readability.
So far WaitLatchOrSocket() required callers to pass in WL_SOCKET_READABLE,
as only that flag was used to indicate error conditions, like EOF. Waiting
only for WL_SOCKET_WRITEABLE would have meant busy waiting upon socket
errors.
Adjust the API to signal errors by returning the socket as readable,
writable or both, depending on WL_SOCKET_READABLE/WL_SOCKET_WRITEABLE
being specified. It would arguably be nicer to return WL_SOCKET_ERROR
but that's not possible on all platforms and would probably also result in
more complex callsites.
This previously had explicitly been forbidden in e42a21b9e6c9, as
there was no use case at that point. We now are looking into making
FE/BE communication use latches, so it
Discussion: 20140927191243.GD5423@alap3.anarazel.de
Reviewed-By: Heikki Linnakangas
---
src/backend/port/unix_latch.c | 20 ++++++++++++++------
src/backend/port/win32_latch.c | 16 +++++++++++-----
2 files changed, 25 insertions(+), 11 deletions(-)
diff --git a/src/backend/port/unix_latch.c b/src/backend/port/unix_latch.c
index ecf01bd..e18a2e7 100644
--- a/src/backend/port/unix_latch.c
+++ b/src/backend/port/unix_latch.c
@@ -200,9 +200,9 @@ WaitLatch(volatile Latch *latch, int wakeEvents, long timeout)
* Like WaitLatch, but with an extra socket argument for WL_SOCKET_*
* conditions.
*
- * When waiting on a socket, WL_SOCKET_READABLE *must* be included in
- * 'wakeEvents'; WL_SOCKET_WRITEABLE is optional. The reason for this is
- * that EOF and error conditions are reported only via WL_SOCKET_READABLE.
+ * When waiting on a socket, EOF and error conditions are reported by
+ * returning the socket as readable/writable or both, depending on
+ * WL_SOCKET_READABLE/WL_SOCKET_WRITEABLE being specified.
*/
int
WaitLatchOrSocket(volatile Latch *latch, int wakeEvents, pgsocket sock,
@@ -230,8 +230,6 @@ WaitLatchOrSocket(volatile Latch *latch, int wakeEvents, pgsocket sock,
wakeEvents &= ~(WL_SOCKET_READABLE | WL_SOCKET_WRITEABLE);
Assert(wakeEvents != 0); /* must have at least one wake event */
- /* Cannot specify WL_SOCKET_WRITEABLE without WL_SOCKET_READABLE */
- Assert((wakeEvents & (WL_SOCKET_READABLE | WL_SOCKET_WRITEABLE)) != WL_SOCKET_WRITEABLE);
if ((wakeEvents & WL_LATCH_SET) && latch->owner_pid != MyProcPid)
elog(ERROR, "cannot wait on a latch owned by another process");
@@ -346,7 +344,7 @@ WaitLatchOrSocket(volatile Latch *latch, int wakeEvents, pgsocket sock,
{
/* at least one event occurred, so check revents values */
if ((wakeEvents & WL_SOCKET_READABLE) &&
- (pfds[0].revents & (POLLIN | POLLHUP | POLLERR | POLLNVAL)))
+ (pfds[0].revents & POLLIN))
{
/* data available in socket, or EOF/error condition */
result |= WL_SOCKET_READABLE;
@@ -354,8 +352,17 @@ WaitLatchOrSocket(volatile Latch *latch, int wakeEvents, pgsocket sock,
if ((wakeEvents & WL_SOCKET_WRITEABLE) &&
(pfds[0].revents & POLLOUT))
{
+ /* socket is writable */
result |= WL_SOCKET_WRITEABLE;
}
+ if (pfds[0].revents & (POLLHUP | POLLERR | POLLNVAL))
+ {
+ /* EOF/error condition */
+ if (wakeEvents & WL_SOCKET_READABLE)
+ result |= WL_SOCKET_READABLE;
+ if (wakeEvents & WL_SOCKET_WRITEABLE)
+ result |= WL_SOCKET_WRITEABLE;
+ }
/*
* We expect a POLLHUP when the remote end is closed, but because
@@ -439,6 +446,7 @@ WaitLatchOrSocket(volatile Latch *latch, int wakeEvents, pgsocket sock,
}
if ((wakeEvents & WL_SOCKET_WRITEABLE) && FD_ISSET(sock, &output_mask))
{
+ /* socket is writable, or EOF */
result |= WL_SOCKET_WRITEABLE;
}
if ((wakeEvents & WL_POSTMASTER_DEATH) &&
diff --git a/src/backend/port/win32_latch.c b/src/backend/port/win32_latch.c
index 112e60e..fe650b0 100644
--- a/src/backend/port/win32_latch.c
+++ b/src/backend/port/win32_latch.c
@@ -117,8 +117,6 @@ WaitLatchOrSocket(volatile Latch *latch, int wakeEvents, pgsocket sock,
wakeEvents &= ~(WL_SOCKET_READABLE | WL_SOCKET_WRITEABLE);
Assert(wakeEvents != 0); /* must have at least one wake event */
- /* Cannot specify WL_SOCKET_WRITEABLE without WL_SOCKET_READABLE */
- Assert((wakeEvents & (WL_SOCKET_READABLE | WL_SOCKET_WRITEABLE)) != WL_SOCKET_WRITEABLE);
if ((wakeEvents & WL_LATCH_SET) && latch->owner_pid != MyProcPid)
elog(ERROR, "cannot wait on a latch owned by another process");
@@ -152,10 +150,10 @@ WaitLatchOrSocket(volatile Latch *latch, int wakeEvents, pgsocket sock,
if (wakeEvents & (WL_SOCKET_READABLE | WL_SOCKET_WRITEABLE))
{
/* Need an event object to represent events on the socket */
- int flags = 0;
+ int flags = FD_CLOSE; /* always check for errors/EOF */
if (wakeEvents & WL_SOCKET_READABLE)
- flags |= (FD_READ | FD_CLOSE);
+ flags |= FD_READ;
if (wakeEvents & WL_SOCKET_WRITEABLE)
flags |= FD_WRITE;
@@ -232,7 +230,7 @@ WaitLatchOrSocket(volatile Latch *latch, int wakeEvents, pgsocket sock,
elog(ERROR, "failed to enumerate network events: error code %u",
WSAGetLastError());
if ((wakeEvents & WL_SOCKET_READABLE) &&
- (resEvents.lNetworkEvents & (FD_READ | FD_CLOSE)))
+ (resEvents.lNetworkEvents & FD_READ))
{
result |= WL_SOCKET_READABLE;
}
@@ -241,6 +239,14 @@ WaitLatchOrSocket(volatile Latch *latch, int wakeEvents, pgsocket sock,
{
result |= WL_SOCKET_WRITEABLE;
}
+ if (resEvents.lNetworkEvents & FD_CLOSE)
+ {
+ if (wakeEvents & WL_SOCKET_READABLE)
+ result |= WL_SOCKET_READABLE;
+ if (wakeEvents & WL_SOCKET_WRITEABLE)
+ result |= WL_SOCKET_WRITEABLE;
+ }
+
}
else if ((wakeEvents & WL_POSTMASTER_DEATH) &&
rc == WAIT_OBJECT_0 + pmdeath_eventno)
--
2.2.1.212.gc5b9256
0002-Commonalize-process-startup-code.patch
From c7c3f9968243478060ae344614dc38df6b2c8bfb Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Thu, 8 Jan 2015 22:54:58 +0100
Subject: [PATCH 2/6] Commonalize process startup code.
Move common code, that was duplicated in every postmaster child/every
standalone process, into two functions in miscinit.c. Not only does
that already result in a fair amount of net code reduction but it also
makes it much easier to remove more duplication in the future. The
prime motivation wasn't code deduplication though, but easier addition
of new common code.
---
src/backend/bootstrap/bootstrap.c | 25 +++-----------
src/backend/postmaster/autovacuum.c | 48 +++------------------------
src/backend/postmaster/bgworker.c | 17 ----------
src/backend/postmaster/bgwriter.c | 11 -------
src/backend/postmaster/checkpointer.c | 11 -------
src/backend/postmaster/pgarch.c | 20 ++---------
src/backend/postmaster/pgstat.c | 22 ++-----------
src/backend/postmaster/postmaster.c | 60 +++++----------------------------
src/backend/postmaster/startup.c | 9 -----
src/backend/postmaster/syslogger.c | 23 ++-----------
src/backend/postmaster/walwriter.c | 11 -------
src/backend/replication/walreceiver.c | 11 -------
src/backend/tcop/postgres.c | 24 ++------------
src/backend/utils/init/miscinit.c | 62 +++++++++++++++++++++++++++++++++++
src/include/miscadmin.h | 3 ++
15 files changed, 90 insertions(+), 267 deletions(-)
diff --git a/src/backend/bootstrap/bootstrap.c b/src/backend/bootstrap/bootstrap.c
index d33c683..4c650fb 100644
--- a/src/backend/bootstrap/bootstrap.c
+++ b/src/backend/bootstrap/bootstrap.c
@@ -191,19 +191,11 @@ AuxiliaryProcessMain(int argc, char *argv[])
char *userDoption = NULL;
/*
- * initialize globals
+ * Initialize process environment (already done if under postmaster, but
+ * not if standalone).
*/
- MyProcPid = getpid();
-
- MyStartTime = time(NULL);
-
- /* Compute paths, if we didn't inherit them from postmaster */
- if (my_exec_path[0] == '\0')
- {
- if (find_my_exec(progname, my_exec_path) < 0)
- elog(FATAL, "%s: could not locate my own executable path",
- progname);
- }
+ if (!IsUnderPostmaster)
+ InitStandaloneProcess(argv[0]);
/*
* process command arguments
@@ -516,15 +508,6 @@ bootstrap_signals(void)
if (IsUnderPostmaster)
{
/*
- * If possible, make this process a group leader, so that the
- * postmaster can signal any child processes too.
- */
-#ifdef HAVE_SETSID
- if (setsid() < 0)
- elog(FATAL, "setsid() failed: %m");
-#endif
-
- /*
* Properly accept or ignore signals the postmaster might send us
*/
pqsignal(SIGHUP, SIG_IGN);
diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c
index 062b120..2892e7b 100644
--- a/src/backend/postmaster/autovacuum.c
+++ b/src/backend/postmaster/autovacuum.c
@@ -383,12 +383,11 @@ StartAutoVacLauncher(void)
#ifndef EXEC_BACKEND
case 0:
/* in postmaster child ... */
+ InitPostmasterChild();
+
/* Close the postmaster's sockets */
ClosePostmasterPorts(false);
- /* Lose the postmaster's on-exit routines */
- on_exit_reset();
-
AutoVacLauncherMain(0, NULL);
break;
#endif
@@ -408,16 +407,8 @@ AutoVacLauncherMain(int argc, char *argv[])
{
sigjmp_buf local_sigjmp_buf;
- /* we are a postmaster subprocess now */
- IsUnderPostmaster = true;
am_autovacuum_launcher = true;
- /* reset MyProcPid */
- MyProcPid = getpid();
-
- /* record Start Time for logging */
- MyStartTime = time(NULL);
-
/* Identify myself via ps */
init_ps_display("autovacuum launcher process", "", "", "");
@@ -430,17 +421,6 @@ AutoVacLauncherMain(int argc, char *argv[])
SetProcessingMode(InitProcessing);
/*
- * If possible, make this process a group leader, so that the postmaster
- * can signal any child processes too. (autovacuum probably never has any
- * child processes, but for consistency we make all postmaster child
- * processes do this.)
- */
-#ifdef HAVE_SETSID
- if (setsid() < 0)
- elog(FATAL, "setsid() failed: %m");
-#endif
-
- /*
* Set up signal handlers. We operate on databases much like a regular
* backend, so we use the same signal handling. See equivalent code in
* tcop/postgres.c.
@@ -1455,12 +1435,11 @@ StartAutoVacWorker(void)
#ifndef EXEC_BACKEND
case 0:
/* in postmaster child ... */
+ InitPostmasterChild();
+
/* Close the postmaster's sockets */
ClosePostmasterPorts(false);
- /* Lose the postmaster's on-exit routines */
- on_exit_reset();
-
AutoVacWorkerMain(0, NULL);
break;
#endif
@@ -1481,33 +1460,14 @@ AutoVacWorkerMain(int argc, char *argv[])
sigjmp_buf local_sigjmp_buf;
Oid dbid;
- /* we are a postmaster subprocess now */
- IsUnderPostmaster = true;
am_autovacuum_worker = true;
- /* reset MyProcPid */
- MyProcPid = getpid();
-
- /* record Start Time for logging */
- MyStartTime = time(NULL);
-
/* Identify myself via ps */
init_ps_display("autovacuum worker process", "", "", "");
SetProcessingMode(InitProcessing);
/*
- * If possible, make this process a group leader, so that the postmaster
- * can signal any child processes too. (autovacuum probably never has any
- * child processes, but for consistency we make all postmaster child
- * processes do this.)
- */
-#ifdef HAVE_SETSID
- if (setsid() < 0)
- elog(FATAL, "setsid() failed: %m");
-#endif
-
- /*
* Set up signal handlers. We operate on databases much like a regular
* backend, so we use the same signal handling. See equivalent code in
* tcop/postgres.c.
diff --git a/src/backend/postmaster/bgworker.c b/src/backend/postmaster/bgworker.c
index 8a7637b..5012355 100644
--- a/src/backend/postmaster/bgworker.c
+++ b/src/backend/postmaster/bgworker.c
@@ -556,16 +556,8 @@ StartBackgroundWorker(void)
if (worker == NULL)
elog(FATAL, "unable to find bgworker entry");
- /* we are a postmaster subprocess now */
- IsUnderPostmaster = true;
IsBackgroundWorker = true;
- /* reset MyProcPid */
- MyProcPid = getpid();
-
- /* record Start Time for logging */
- MyStartTime = time(NULL);
-
/* Identify myself via ps */
snprintf(buf, MAXPGPATH, "bgworker: %s", worker->bgw_name);
init_ps_display(buf, "", "", "");
@@ -591,15 +583,6 @@ StartBackgroundWorker(void)
pg_usleep(PostAuthDelay * 1000000L);
/*
- * If possible, make this process a group leader, so that the postmaster
- * can signal any child processes too.
- */
-#ifdef HAVE_SETSID
- if (setsid() < 0)
- elog(FATAL, "setsid() failed: %m");
-#endif
-
- /*
* Set up signal handlers.
*/
if (worker->bgw_flags & BGWORKER_BACKEND_DATABASE_CONNECTION)
diff --git a/src/backend/postmaster/bgwriter.c b/src/backend/postmaster/bgwriter.c
index 872c231..e8ceef7 100644
--- a/src/backend/postmaster/bgwriter.c
+++ b/src/backend/postmaster/bgwriter.c
@@ -114,17 +114,6 @@ BackgroundWriterMain(void)
bool prev_hibernate;
/*
- * If possible, make this process a group leader, so that the postmaster
- * can signal any child processes too. (bgwriter probably never has any
- * child processes, but for consistency we make all postmaster child
- * processes do this.)
- */
-#ifdef HAVE_SETSID
- if (setsid() < 0)
- elog(FATAL, "setsid() failed: %m");
-#endif
-
- /*
* Properly accept or ignore signals the postmaster might send us.
*
* bgwriter doesn't participate in ProcSignal signalling, but a SIGUSR1
diff --git a/src/backend/postmaster/checkpointer.c b/src/backend/postmaster/checkpointer.c
index 4d1ba40..3c9c216 100644
--- a/src/backend/postmaster/checkpointer.c
+++ b/src/backend/postmaster/checkpointer.c
@@ -197,17 +197,6 @@ CheckpointerMain(void)
CheckpointerShmem->checkpointer_pid = MyProcPid;
/*
- * If possible, make this process a group leader, so that the postmaster
- * can signal any child processes too. (checkpointer probably never has
- * any child processes, but for consistency we make all postmaster child
- * processes do this.)
- */
-#ifdef HAVE_SETSID
- if (setsid() < 0)
- elog(FATAL, "setsid() failed: %m");
-#endif
-
- /*
* Properly accept or ignore signals the postmaster might send us
*
* Note: we deliberately ignore SIGTERM, because during a standard Unix
diff --git a/src/backend/postmaster/pgarch.c b/src/backend/postmaster/pgarch.c
index 75e7e1b..78dec3a 100644
--- a/src/backend/postmaster/pgarch.c
+++ b/src/backend/postmaster/pgarch.c
@@ -157,12 +157,11 @@ pgarch_start(void)
#ifndef EXEC_BACKEND
case 0:
/* in postmaster child ... */
+ InitPostmasterChild();
+
/* Close the postmaster's sockets */
ClosePostmasterPorts(false);
- /* Lose the postmaster's on-exit routines */
- on_exit_reset();
-
/* Drop our connection to postmaster's shared memory, as well */
dsm_detach_all();
PGSharedMemoryDetach();
@@ -221,21 +220,6 @@ pgarch_forkexec(void)
NON_EXEC_STATIC void
PgArchiverMain(int argc, char *argv[])
{
- IsUnderPostmaster = true; /* we are a postmaster subprocess now */
-
- MyProcPid = getpid(); /* reset MyProcPid */
-
- MyStartTime = time(NULL); /* record Start Time for logging */
-
- /*
- * If possible, make this process a group leader, so that the postmaster
- * can signal any child processes too.
- */
-#ifdef HAVE_SETSID
- if (setsid() < 0)
- elog(FATAL, "setsid() failed: %m");
-#endif
-
InitializeLatchSupport(); /* needed for latch waits */
InitLatch(&mainloop_latch); /* initialize latch used in main loop */
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index ea3bd4b..fa87660 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -695,12 +695,11 @@ pgstat_start(void)
#ifndef EXEC_BACKEND
case 0:
/* in postmaster child ... */
+ InitPostmasterChild();
+
/* Close the postmaster's sockets */
ClosePostmasterPorts(false);
- /* Lose the postmaster's on-exit routines */
- on_exit_reset();
-
/* Drop our connection to postmaster's shared memory, as well */
dsm_detach_all();
PGSharedMemoryDetach();
@@ -3152,23 +3151,6 @@ PgstatCollectorMain(int argc, char *argv[])
PgStat_Msg msg;
int wr;
- IsUnderPostmaster = true; /* we are a postmaster subprocess now */
-
- MyProcPid = getpid(); /* reset MyProcPid */
-
- MyStartTime = time(NULL); /* record Start Time for logging */
-
- /*
- * If possible, make this process a group leader, so that the postmaster
- * can signal any child processes too. (pgstat probably never has any
- * child processes, but for consistency we make all postmaster child
- * processes do this.)
- */
-#ifdef HAVE_SETSID
- if (setsid() < 0)
- elog(FATAL, "setsid() failed: %m");
-#endif
-
InitializeLatchSupport(); /* needed for latch waits */
/* Initialize private latch for use by signal handlers */
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index e2b3b81..216fa21 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -3812,19 +3812,8 @@ BackendStartup(Port *port)
{
free(bn);
- /*
- * Let's clean up ourselves as the postmaster child, and close the
- * postmaster's listen sockets. (In EXEC_BACKEND case this is all
- * done in SubPostmasterMain.)
- */
- IsUnderPostmaster = true; /* we are a postmaster subprocess now */
-
- MyProcPid = getpid(); /* reset MyProcPid */
-
- MyStartTime = time(NULL);
-
- /* We don't want the postmaster's proc_exit() handlers */
- on_exit_reset();
+ /* Detangle from postmaster */
+ InitPostmasterChild();
/* Close the postmaster's sockets */
ClosePostmasterPorts(false);
@@ -3941,7 +3930,6 @@ BackendInitialize(Port *port)
/* save process start time */
port->SessionStartTime = GetCurrentTimestamp();
- MyStartTime = timestamptz_to_time_t(port->SessionStartTime);
/* set these to empty in case they are needed before we set them up */
port->remote_host = "";
@@ -3955,16 +3943,6 @@ BackendInitialize(Port *port)
whereToSendOutput = DestRemote; /* now safe to ereport to client */
/*
- * If possible, make this process a group leader, so that the postmaster
- * can signal any child processes too. (We do this now on the off chance
- * that something might spawn a child process during authentication.)
- */
-#ifdef HAVE_SETSID
- if (setsid() < 0)
- elog(FATAL, "setsid() failed: %m");
-#endif
-
- /*
* We arrange for a simple exit(1) if we receive SIGTERM or SIGQUIT or
* timeout while trying to collect the startup packet. Otherwise the
* postmaster cannot shutdown the database FAST or IMMED cleanly if a
@@ -4516,30 +4494,12 @@ SubPostmasterMain(int argc, char *argv[])
{
Port port;
- /* Do this sooner rather than later... */
- IsUnderPostmaster = true; /* we are a postmaster subprocess now */
-
- MyProcPid = getpid(); /* reset MyProcPid */
-
- MyStartTime = time(NULL);
-
- /*
- * make sure stderr is in binary mode before anything can possibly be
- * written to it, in case it's actually the syslogger pipe, so the pipe
- * chunking protocol isn't disturbed. Non-logpipe data gets translated on
- * redirection (e.g. via pg_ctl -l) anyway.
- */
-#ifdef WIN32
- _setmode(fileno(stderr), _O_BINARY);
-#endif
-
- /* Lose the postmaster's on-exit routines (really a no-op) */
- on_exit_reset();
-
/* In EXEC_BACKEND case we will not have inherited these settings */
IsPostmasterEnvironment = true;
whereToSendOutput = DestNone;
+ InitPostmasterChild();
+
/* Setup essential subsystems (to ensure elog() behaves sanely) */
InitializeGUCOptions();
@@ -4719,6 +4679,8 @@ SubPostmasterMain(int argc, char *argv[])
/* do this as early as possible; in particular, before InitProcess() */
IsBackgroundWorker = true;
+ InitPostmasterChild();
+
/* Close the postmaster's sockets */
ClosePostmasterPorts(false);
@@ -5139,14 +5101,11 @@ StartChildProcess(AuxProcType type)
if (pid == 0) /* child */
{
- IsUnderPostmaster = true; /* we are a postmaster subprocess now */
+ InitPostmasterChild();
/* Close the postmaster's sockets */
ClosePostmasterPorts(false);
- /* Lose the postmaster's on-exit routines and port connections */
- on_exit_reset();
-
/* Release postmaster's working memory context */
MemoryContextSwitchTo(TopMemoryContext);
MemoryContextDelete(PostmasterContext);
@@ -5425,12 +5384,11 @@ do_start_bgworker(RegisteredBgWorker *rw)
#ifndef EXEC_BACKEND
case 0:
/* in postmaster child ... */
+ InitPostmasterChild();
+
/* Close the postmaster's sockets */
ClosePostmasterPorts(false);
- /* Lose the postmaster's on-exit routines */
- on_exit_reset();
-
/* Do NOT release postmaster's working memory context */
MyBgworkerEntry = &rw->rw_worker;
diff --git a/src/backend/postmaster/startup.c b/src/backend/postmaster/startup.c
index 72501e0..581837a1 100644
--- a/src/backend/postmaster/startup.c
+++ b/src/backend/postmaster/startup.c
@@ -178,15 +178,6 @@ void
StartupProcessMain(void)
{
/*
- * If possible, make this process a group leader, so that the postmaster
- * can signal any child processes too.
- */
-#ifdef HAVE_SETSID
- if (setsid() < 0)
- elog(FATAL, "setsid() failed: %m");
-#endif
-
- /*
* Properly accept or ignore signals the postmaster might send us.
*/
pqsignal(SIGHUP, StartupProcSigHupHandler); /* reload config file */
diff --git a/src/backend/postmaster/syslogger.c b/src/backend/postmaster/syslogger.c
index c1b64ac..6e6754e 100644
--- a/src/backend/postmaster/syslogger.c
+++ b/src/backend/postmaster/syslogger.c
@@ -164,11 +164,6 @@ SysLoggerMain(int argc, char *argv[])
int currentLogRotationAge;
pg_time_t now;
- IsUnderPostmaster = true; /* we are a postmaster subprocess now */
-
- MyProcPid = getpid(); /* reset MyProcPid */
-
- MyStartTime = time(NULL); /* set our start time in case we call elog */
now = MyStartTime;
#ifdef EXEC_BACKEND
@@ -236,19 +231,6 @@ SysLoggerMain(int argc, char *argv[])
syslogPipe[1] = 0;
#endif
- /*
- * If possible, make this process a group leader, so that the postmaster
- * can signal any child processes too. (syslogger probably never has any
- * child processes, but for consistency we make all postmaster child
- * processes do this.)
- */
-#ifdef HAVE_SETSID
- if (setsid() < 0)
- elog(FATAL, "setsid() failed: %m");
-#endif
-
- InitializeLatchSupport(); /* needed for latch waits */
-
/* Initialize private latch for use by signal handlers */
InitLatch(&sysLoggerLatch);
@@ -609,12 +591,11 @@ SysLogger_Start(void)
#ifndef EXEC_BACKEND
case 0:
/* in postmaster child ... */
+ InitPostmasterChild();
+
/* Close the postmaster's sockets */
ClosePostmasterPorts(true);
- /* Lose the postmaster's on-exit routines */
- on_exit_reset();
-
/* Drop our connection to postmaster's shared memory, as well */
dsm_detach_all();
PGSharedMemoryDetach();
diff --git a/src/backend/postmaster/walwriter.c b/src/backend/postmaster/walwriter.c
index 2101b45..8ff5005 100644
--- a/src/backend/postmaster/walwriter.c
+++ b/src/backend/postmaster/walwriter.c
@@ -102,17 +102,6 @@ WalWriterMain(void)
bool hibernating;
/*
- * If possible, make this process a group leader, so that the postmaster
- * can signal any child processes too. (walwriter probably never has any
- * child processes, but for consistency we make all postmaster child
- * processes do this.)
- */
-#ifdef HAVE_SETSID
- if (setsid() < 0)
- elog(FATAL, "setsid() failed: %m");
-#endif
-
- /*
* Properly accept or ignore signals the postmaster might send us
*
* We have no particular use for SIGINT at the moment, but seems
diff --git a/src/backend/replication/walreceiver.c b/src/backend/replication/walreceiver.c
index 3dbefc6..bfbc02f 100644
--- a/src/backend/replication/walreceiver.c
+++ b/src/backend/replication/walreceiver.c
@@ -256,17 +256,6 @@ WalReceiverMain(void)
OwnLatch(&walrcv->latch);
- /*
- * If possible, make this process a group leader, so that the postmaster
- * can signal any child processes too. (walreceiver probably never has
- * any child processes, but for consistency we make all postmaster child
- * processes do this.)
- */
-#ifdef HAVE_SETSID
- if (setsid() < 0)
- elog(FATAL, "setsid() failed: %m");
-#endif
-
/* Properly accept or ignore signals the postmaster might send us */
pqsignal(SIGHUP, WalRcvSigHupHandler); /* set flag to read config
* file */
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index c321236..5e6d0f7 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -3536,27 +3536,12 @@ PostgresMain(int argc, char *argv[],
sigjmp_buf local_sigjmp_buf;
volatile bool send_ready_for_query = true;
- /*
- * Initialize globals (already done if under postmaster, but not if
- * standalone).
- */
+ /* Initialize startup process environment if necessary. */
if (!IsUnderPostmaster)
- {
- MyProcPid = getpid();
-
- MyStartTime = time(NULL);
- }
+ InitStandaloneProcess(argv[0]);
SetProcessingMode(InitProcessing);
- /* Compute paths, if we didn't inherit them from postmaster */
- if (my_exec_path[0] == '\0')
- {
- if (find_my_exec(argv[0], my_exec_path) < 0)
- elog(FATAL, "%s: could not locate my own executable path",
- argv[0]);
- }
-
if (pkglib_path[0] == '\0')
get_pkglib_path(my_exec_path, pkglib_path);
@@ -3590,11 +3575,6 @@ PostgresMain(int argc, char *argv[],
}
/*
- * You might expect to see a setsid() call here, but it's not needed,
- * because if we are under a postmaster then BackendInitialize() did it.
- */
-
- /*
* Set up signal handlers and masks.
*
* Note that postmaster blocked all signals before forking child process,
diff --git a/src/backend/utils/init/miscinit.c b/src/backend/utils/init/miscinit.c
index 7f386ae..aea26dd 100644
--- a/src/backend/utils/init/miscinit.c
+++ b/src/backend/utils/init/miscinit.c
@@ -16,6 +16,7 @@
#include <sys/param.h>
#include <signal.h>
+#include <time.h>
#include <sys/file.h>
#include <sys/stat.h>
#include <sys/time.h>
@@ -160,6 +161,67 @@ static int SecurityRestrictionContext = 0;
/* We also remember if a SET ROLE is currently active */
static bool SetRoleIsActive = false;
+/*
+ * Initialize the basic environment for a postmaster child
+ *
+ * Should be called as early as possible after the child's startup.
+ */
+void
+InitPostmasterChild(void)
+{
+ IsUnderPostmaster = true; /* we are a postmaster subprocess now */
+
+ MyProcPid = getpid(); /* reset MyProcPid */
+
+ MyStartTime = time(NULL); /* set our start time in case we call elog */
+
+ /*
+ * make sure stderr is in binary mode before anything can possibly be
+ * written to it, in case it's actually the syslogger pipe, so the pipe
+ * chunking protocol isn't disturbed. Non-logpipe data gets translated on
+ * redirection (e.g. via pg_ctl -l) anyway.
+ */
+#ifdef WIN32
+ _setmode(fileno(stderr), _O_BINARY);
+#endif
+
+ /* We don't want the postmaster's proc_exit() handlers */
+ on_exit_reset();
+
+ /*
+ * If possible, make this process a group leader, so that the postmaster
+ * can signal any child processes too. Not all processes will have
+ * children, but for consistency we make all postmaster child processes
+ * do this.
+ */
+#ifdef HAVE_SETSID
+ if (setsid() < 0)
+ elog(FATAL, "setsid() failed: %m");
+#endif
+}
+
+/*
+ * Initialize the basic environment for a standalone process.
+ *
+ * argv0 has to be suitable to find the program's executable.
+ */
+void
+InitStandaloneProcess(const char *argv0)
+{
+ Assert(!IsPostmasterEnvironment);
+
+ MyProcPid = getpid(); /* reset MyProcPid */
+
+ MyStartTime = time(NULL); /* set our start time in case we call elog */
+
+ /* Compute paths, no postmaster to inherit from */
+ if (my_exec_path[0] == '\0')
+ {
+ if (find_my_exec(argv0, my_exec_path) < 0)
+ elog(FATAL, "%s: could not locate my own executable path",
+ argv0);
+ }
+}
/*
* GetUserId - get the current effective user ID.
diff --git a/src/include/miscadmin.h b/src/include/miscadmin.h
index fc8bd7a..b94a944 100644
--- a/src/include/miscadmin.h
+++ b/src/include/miscadmin.h
@@ -277,6 +277,9 @@ extern int trace_recovery(int trace_level);
extern char *DatabasePath;
/* now in utils/init/miscinit.c */
+extern void InitPostmasterChild(void);
+extern void InitStandaloneProcess(const char *argv0);
+
extern void SetDatabasePath(const char *path);
extern char *GetUserNameFromId(Oid roleid);
--
2.2.1.212.gc5b9256
0003-Add-a-default-local-latch-for-use-in-signal-handlers.patch
From 412664b023638d9aa398ec26bcad98e182d9a959 Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Thu, 8 Jan 2015 22:57:07 +0100
Subject: [PATCH 3/6] Add a default local latch for use in signal handlers.
To do so, move InitializeLatchSupport() into the new common process
initialization functions and add two new global variables,
MyLocalLatch and MyLatch.
MyLocalLatch always exists and points to a newly introduced local
latch that exists in all processes. MyLatch initially points to
MyLocalLatch but is redirected to the shared process latch in
InitProcess/InitAuxiliaryProcess. When (Auxiliary)ProcKill detaches
the process from its process entry it redirects MyLatch back to the
local latch.
This is primarily advantageous for two reasons: For one it simplifies
dealing with the shared process latch, especially in signal handlers,
because instead of having to check for MyProc, MyLatch can be used
unconditionally. The bigger advantage is that a later patch that makes
FE/BE communication use latches now can just rely on the existence of
a latch, even before doing InitProcess.
---
src/backend/postmaster/autovacuum.c | 9 +++------
src/backend/postmaster/bgwriter.c | 6 ++----
src/backend/postmaster/checkpointer.c | 13 +++++--------
src/backend/postmaster/pgarch.c | 21 ++++++--------------
src/backend/postmaster/pgstat.c | 19 ++++++------------
src/backend/postmaster/syslogger.c | 18 +++++++----------
src/backend/postmaster/walwriter.c | 6 ++----
src/backend/storage/lmgr/proc.c | 33 ++++++++++++++------------------
src/backend/tcop/postgres.c | 12 ++++--------
src/backend/utils/init/globals.c | 2 ++
src/backend/utils/init/miscinit.c | 14 ++++++++++++++
src/backend/utils/misc/timeout.c | 6 ++----
src/include/miscadmin.h | 2 ++
src/include/storage/latch.h | 2 +-
src/test/modules/test_shm_mq/worker.c | 3 +--
src/test/modules/worker_spi/worker_spi.c | 6 ++----
16 files changed, 73 insertions(+), 99 deletions(-)
diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c
index 2892e7b..f6c04ba 100644
--- a/src/backend/postmaster/autovacuum.c
+++ b/src/backend/postmaster/autovacuum.c
@@ -1342,8 +1342,7 @@ avl_sighup_handler(SIGNAL_ARGS)
int save_errno = errno;
got_SIGHUP = true;
- if (MyProc)
- SetLatch(&MyProc->procLatch);
+ SetLatch(MyLatch);
errno = save_errno;
}
@@ -1355,8 +1354,7 @@ avl_sigusr2_handler(SIGNAL_ARGS)
int save_errno = errno;
got_SIGUSR2 = true;
- if (MyProc)
- SetLatch(&MyProc->procLatch);
+ SetLatch(MyLatch);
errno = save_errno;
}
@@ -1368,8 +1366,7 @@ avl_sigterm_handler(SIGNAL_ARGS)
int save_errno = errno;
got_SIGTERM = true;
- if (MyProc)
- SetLatch(&MyProc->procLatch);
+ SetLatch(MyLatch);
errno = save_errno;
}
diff --git a/src/backend/postmaster/bgwriter.c b/src/backend/postmaster/bgwriter.c
index e8ceef7..842af12 100644
--- a/src/backend/postmaster/bgwriter.c
+++ b/src/backend/postmaster/bgwriter.c
@@ -427,8 +427,7 @@ BgSigHupHandler(SIGNAL_ARGS)
int save_errno = errno;
got_SIGHUP = true;
- if (MyProc)
- SetLatch(&MyProc->procLatch);
+ SetLatch(MyLatch);
errno = save_errno;
}
@@ -440,8 +439,7 @@ ReqShutdownHandler(SIGNAL_ARGS)
int save_errno = errno;
shutdown_requested = true;
- if (MyProc)
- SetLatch(&MyProc->procLatch);
+ SetLatch(MyLatch);
errno = save_errno;
}
diff --git a/src/backend/postmaster/checkpointer.c b/src/backend/postmaster/checkpointer.c
index 3c9c216..237be12 100644
--- a/src/backend/postmaster/checkpointer.c
+++ b/src/backend/postmaster/checkpointer.c
@@ -360,7 +360,7 @@ CheckpointerMain(void)
int rc;
/* Clear any already-pending wakeups */
- ResetLatch(&MyProc->procLatch);
+ ResetLatch(MyLatch);
/*
* Process any requests or signals received recently.
@@ -559,7 +559,7 @@ CheckpointerMain(void)
cur_timeout = Min(cur_timeout, XLogArchiveTimeout - elapsed_secs);
}
- rc = WaitLatch(&MyProc->procLatch,
+ rc = WaitLatch(MyLatch,
WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
cur_timeout * 1000L /* convert to ms */ );
@@ -832,8 +832,7 @@ ChkptSigHupHandler(SIGNAL_ARGS)
int save_errno = errno;
got_SIGHUP = true;
- if (MyProc)
- SetLatch(&MyProc->procLatch);
+ SetLatch(MyLatch);
errno = save_errno;
}
@@ -845,8 +844,7 @@ ReqCheckpointHandler(SIGNAL_ARGS)
int save_errno = errno;
checkpoint_requested = true;
- if (MyProc)
- SetLatch(&MyProc->procLatch);
+ SetLatch(MyLatch);
errno = save_errno;
}
@@ -869,8 +867,7 @@ ReqShutdownHandler(SIGNAL_ARGS)
int save_errno = errno;
shutdown_requested = true;
- if (MyProc)
- SetLatch(&MyProc->procLatch);
+ SetLatch(MyLatch);
errno = save_errno;
}
diff --git a/src/backend/postmaster/pgarch.c b/src/backend/postmaster/pgarch.c
index 78dec3a..9b689af 100644
--- a/src/backend/postmaster/pgarch.c
+++ b/src/backend/postmaster/pgarch.c
@@ -78,11 +78,6 @@ static volatile sig_atomic_t got_SIGTERM = false;
static volatile sig_atomic_t wakened = false;
static volatile sig_atomic_t ready_to_stop = false;
-/*
- * Latch used by signal handlers to wake up the sleep in the main loop.
- */
-static Latch mainloop_latch;
-
/* ----------
* Local function forward declarations
* ----------
@@ -220,10 +215,6 @@ pgarch_forkexec(void)
NON_EXEC_STATIC void
PgArchiverMain(int argc, char *argv[])
{
- InitializeLatchSupport(); /* needed for latch waits */
-
- InitLatch(&mainloop_latch); /* initialize latch used in main loop */
-
/*
* Ignore all signals usually bound to some action in the postmaster,
* except for SIGHUP, SIGTERM, SIGUSR1, SIGUSR2, and SIGQUIT.
@@ -269,7 +260,7 @@ ArchSigHupHandler(SIGNAL_ARGS)
/* set flag to re-read config file at next convenient time */
got_SIGHUP = true;
- SetLatch(&mainloop_latch);
+ SetLatch(MyLatch);
errno = save_errno;
}
@@ -287,7 +278,7 @@ ArchSigTermHandler(SIGNAL_ARGS)
* archive commands.
*/
got_SIGTERM = true;
- SetLatch(&mainloop_latch);
+ SetLatch(MyLatch);
errno = save_errno;
}
@@ -300,7 +291,7 @@ pgarch_waken(SIGNAL_ARGS)
/* set flag that there is work to be done */
wakened = true;
- SetLatch(&mainloop_latch);
+ SetLatch(MyLatch);
errno = save_errno;
}
@@ -313,7 +304,7 @@ pgarch_waken_stop(SIGNAL_ARGS)
/* set flag to do a final cycle and shut down afterwards */
ready_to_stop = true;
- SetLatch(&mainloop_latch);
+ SetLatch(MyLatch);
errno = save_errno;
}
@@ -344,7 +335,7 @@ pgarch_MainLoop(void)
*/
do
{
- ResetLatch(&mainloop_latch);
+ ResetLatch(MyLatch);
/* When we get SIGUSR2, we do one more archive cycle, then exit */
time_to_stop = ready_to_stop;
@@ -397,7 +388,7 @@ pgarch_MainLoop(void)
{
int rc;
- rc = WaitLatch(&mainloop_latch,
+ rc = WaitLatch(MyLatch,
WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
timeout * 1000L);
if (rc & WL_TIMEOUT)
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index fa87660..9b0c3c6 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -130,8 +130,6 @@ PgStat_MsgBgWriter BgWriterStats;
*/
NON_EXEC_STATIC pgsocket pgStatSock = PGINVALID_SOCKET;
-static Latch pgStatLatch;
-
static struct sockaddr_storage pgStatAddr;
static time_t last_pgstat_start_time;
@@ -3151,15 +3149,10 @@ PgstatCollectorMain(int argc, char *argv[])
PgStat_Msg msg;
int wr;
- InitializeLatchSupport(); /* needed for latch waits */
-
- /* Initialize private latch for use by signal handlers */
- InitLatch(&pgStatLatch);
-
/*
* Ignore all signals usually bound to some action in the postmaster,
* except SIGHUP and SIGQUIT. Note we don't need a SIGUSR1 handler to
- * support latch operations, because pgStatLatch is local not shared.
+ * support latch operations, because we only use a local latch.
*/
pqsignal(SIGHUP, pgstat_sighup_handler);
pqsignal(SIGINT, SIG_IGN);
@@ -3205,7 +3198,7 @@ PgstatCollectorMain(int argc, char *argv[])
for (;;)
{
/* Clear any already-pending wakeups */
- ResetLatch(&pgStatLatch);
+ ResetLatch(MyLocalLatch);
/*
* Quit if we get SIGQUIT from the postmaster.
@@ -3363,7 +3356,7 @@ PgstatCollectorMain(int argc, char *argv[])
/* Sleep until there's something to do */
#ifndef WIN32
- wr = WaitLatchOrSocket(&pgStatLatch,
+ wr = WaitLatchOrSocket(MyLocalLatch,
WL_LATCH_SET | WL_POSTMASTER_DEATH | WL_SOCKET_READABLE,
pgStatSock,
-1L);
@@ -3379,7 +3372,7 @@ PgstatCollectorMain(int argc, char *argv[])
* to not provoke "pgstat wait timeout" complaints from
* backend_read_statsfile.
*/
- wr = WaitLatchOrSocket(&pgStatLatch,
+ wr = WaitLatchOrSocket(MyLocalLatch,
WL_LATCH_SET | WL_POSTMASTER_DEATH | WL_SOCKET_READABLE | WL_TIMEOUT,
pgStatSock,
2 * 1000L /* msec */ );
@@ -3409,7 +3402,7 @@ pgstat_exit(SIGNAL_ARGS)
int save_errno = errno;
need_exit = true;
- SetLatch(&pgStatLatch);
+ SetLatch(MyLocalLatch);
errno = save_errno;
}
@@ -3421,7 +3414,7 @@ pgstat_sighup_handler(SIGNAL_ARGS)
int save_errno = errno;
got_SIGHUP = true;
- SetLatch(&pgStatLatch);
+ SetLatch(MyLocalLatch);
errno = save_errno;
}
diff --git a/src/backend/postmaster/syslogger.c b/src/backend/postmaster/syslogger.c
index 6e6754e..41b8dbb 100644
--- a/src/backend/postmaster/syslogger.c
+++ b/src/backend/postmaster/syslogger.c
@@ -85,7 +85,6 @@ static FILE *csvlogFile = NULL;
NON_EXEC_STATIC pg_time_t first_syslogger_file_time = 0;
static char *last_file_name = NULL;
static char *last_csv_file_name = NULL;
-static Latch sysLoggerLatch;
/*
* Buffers for saving partial messages from different backends.
@@ -231,9 +230,6 @@ SysLoggerMain(int argc, char *argv[])
syslogPipe[1] = 0;
#endif
- /* Initialize private latch for use by signal handlers */
- InitLatch(&sysLoggerLatch);
-
/*
* Properly accept or ignore signals the postmaster might send us
*
@@ -299,7 +295,7 @@ SysLoggerMain(int argc, char *argv[])
#endif
/* Clear any already-pending wakeups */
- ResetLatch(&sysLoggerLatch);
+ ResetLatch(MyLatch);
/*
* Process any requests or signals received recently.
@@ -425,7 +421,7 @@ SysLoggerMain(int argc, char *argv[])
* Sleep until there's something to do
*/
#ifndef WIN32
- rc = WaitLatchOrSocket(&sysLoggerLatch,
+ rc = WaitLatchOrSocket(MyLatch,
WL_LATCH_SET | WL_SOCKET_READABLE | cur_flags,
syslogPipe[0],
cur_timeout);
@@ -477,7 +473,7 @@ SysLoggerMain(int argc, char *argv[])
*/
LeaveCriticalSection(&sysloggerSection);
- (void) WaitLatch(&sysLoggerLatch,
+ (void) WaitLatch(MyLatch,
WL_LATCH_SET | cur_flags,
cur_timeout);
@@ -1058,7 +1054,7 @@ pipeThread(void *arg)
{
if (ftell(syslogFile) >= Log_RotationSize * 1024L ||
(csvlogFile != NULL && ftell(csvlogFile) >= Log_RotationSize * 1024L))
- SetLatch(&sysLoggerLatch);
+ SetLatch(MyLatch);
}
LeaveCriticalSection(&sysloggerSection);
}
@@ -1070,7 +1066,7 @@ pipeThread(void *arg)
flush_pipe_input(logbuffer, &bytes_in_logbuffer);
/* set the latch to waken the main thread, which will quit */
- SetLatch(&sysLoggerLatch);
+ SetLatch(MyLatch);
LeaveCriticalSection(&sysloggerSection);
_endthread();
@@ -1350,7 +1346,7 @@ sigHupHandler(SIGNAL_ARGS)
int save_errno = errno;
got_SIGHUP = true;
- SetLatch(&sysLoggerLatch);
+ SetLatch(MyLatch);
errno = save_errno;
}
@@ -1362,7 +1358,7 @@ sigUsr1Handler(SIGNAL_ARGS)
int save_errno = errno;
rotation_requested = true;
- SetLatch(&sysLoggerLatch);
+ SetLatch(MyLatch);
errno = save_errno;
}
diff --git a/src/backend/postmaster/walwriter.c b/src/backend/postmaster/walwriter.c
index 8ff5005..297faf5 100644
--- a/src/backend/postmaster/walwriter.c
+++ b/src/backend/postmaster/walwriter.c
@@ -347,8 +347,7 @@ WalSigHupHandler(SIGNAL_ARGS)
int save_errno = errno;
got_SIGHUP = true;
- if (MyProc)
- SetLatch(&MyProc->procLatch);
+ SetLatch(MyLatch);
errno = save_errno;
}
@@ -360,8 +359,7 @@ WalShutdownHandler(SIGNAL_ARGS)
int save_errno = errno;
shutdown_requested = true;
- if (MyProc)
- SetLatch(&MyProc->procLatch);
+ SetLatch(MyLatch);
errno = save_errno;
}
diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c
index 777c60b..3ef1d5e 100644
--- a/src/backend/storage/lmgr/proc.c
+++ b/src/backend/storage/lmgr/proc.c
@@ -291,13 +291,6 @@ InitProcess(void)
elog(ERROR, "you already exist");
/*
- * Initialize process-local latch support. This could fail if the kernel
- * is low on resources, and if so we want to exit cleanly before acquiring
- * any shared-memory resources.
- */
- InitializeLatchSupport();
-
- /*
* Try to get a proc struct from the free list. If this fails, we must be
* out of PGPROC structures (not to mention semaphores).
*
@@ -391,10 +384,15 @@ InitProcess(void)
SHMQueueElemInit(&(MyProc->syncRepLinks));
/*
- * Acquire ownership of the PGPROC's latch, so that we can use WaitLatch.
- * Note that there's no particular need to do ResetLatch here.
+ * Acquire ownership of the PGPROC's latch, so that we can use WaitLatch
+ * on it. That allows us to repoint the process latch, which so far
+ * points to process local one, to the shared one. Set the new latch, so a
+ * potentially pending event from the old latch is definitely visible on
+ * the new one.
*/
OwnLatch(&MyProc->procLatch);
+ MyLatch = &MyProc->procLatch;
+ SetLatch(MyLatch);
/*
* We might be reusing a semaphore that belonged to a failed process. So
@@ -475,13 +473,6 @@ InitAuxiliaryProcess(void)
elog(ERROR, "you already exist");
/*
- * Initialize process-local latch support. This could fail if the kernel
- * is low on resources, and if so we want to exit cleanly before acquiring
- * any shared-memory resources.
- */
- InitializeLatchSupport();
-
- /*
* We use the ProcStructLock to protect assignment and releasing of
* AuxiliaryProcs entries.
*
@@ -551,6 +542,8 @@ InitAuxiliaryProcess(void)
* Note that there's no particular need to do ResetLatch here.
*/
OwnLatch(&MyProc->procLatch);
+ MyLatch = &MyProc->procLatch;
+ SetLatch(MyLatch);
/*
* We might be reusing a semaphore that belonged to a failed process. So
@@ -800,10 +793,11 @@ ProcKill(int code, Datum arg)
ReplicationSlotRelease();
/*
- * Clear MyProc first; then disown the process latch. This is so that
- * signal handlers won't try to clear the process latch after it's no
- * longer ours.
+ * Reset MyLatch to the process local one. This is so that signal
+ * handlers won't try to access the shared process latch after it's no
+ * longer ours. After that clear MyProc and disown the shared latch.
*/
+ MyLatch = MyLocalLatch;
proc = MyProc;
MyProc = NULL;
DisownLatch(&proc->procLatch);
@@ -871,6 +865,7 @@ AuxiliaryProcKill(int code, Datum arg)
* signal handlers won't try to clear the process latch after it's no
* longer ours.
*/
+ MyLatch = MyLocalLatch;
proc = MyProc;
MyProc = NULL;
DisownLatch(&proc->procLatch);
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 5e6d0f7..8bf006b 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -2608,8 +2608,7 @@ die(SIGNAL_ARGS)
}
/* If we're still here, waken anything waiting on the process latch */
- if (MyProc)
- SetLatch(&MyProc->procLatch);
+ SetLatch(MyLatch);
errno = save_errno;
}
@@ -2650,8 +2649,7 @@ StatementCancelHandler(SIGNAL_ARGS)
}
/* If we're still here, waken anything waiting on the process latch */
- if (MyProc)
- SetLatch(&MyProc->procLatch);
+ SetLatch(MyLatch);
errno = save_errno;
}
@@ -2676,8 +2674,7 @@ SigHupHandler(SIGNAL_ARGS)
int save_errno = errno;
got_SIGHUP = true;
- if (MyProc)
- SetLatch(&MyProc->procLatch);
+ SetLatch(MyLatch);
errno = save_errno;
}
@@ -2815,8 +2812,7 @@ RecoveryConflictInterrupt(ProcSignalReason reason)
* waiting on that latch, expecting to get interrupted by query cancels et
* al., would also need to set set_latch_on_sigusr1.
*/
- if (MyProc)
- SetLatch(&MyProc->procLatch);
+ SetLatch(MyLatch);
errno = save_errno;
}
diff --git a/src/backend/utils/init/globals.c b/src/backend/utils/init/globals.c
index dc5c8d6..685ad7f 100644
--- a/src/backend/utils/init/globals.c
+++ b/src/backend/utils/init/globals.c
@@ -37,6 +37,8 @@ volatile uint32 CritSectionCount = 0;
int MyProcPid;
pg_time_t MyStartTime;
struct Port *MyProcPort;
+struct Latch *MyLocalLatch;
+struct Latch *MyLatch;
long MyCancelKey;
int MyPMChildSlot;
diff --git a/src/backend/utils/init/miscinit.c b/src/backend/utils/init/miscinit.c
index aea26dd..707348d 100644
--- a/src/backend/utils/init/miscinit.c
+++ b/src/backend/utils/init/miscinit.c
@@ -38,6 +38,7 @@
#include "postmaster/postmaster.h"
#include "storage/fd.h"
#include "storage/ipc.h"
+#include "storage/latch.h"
#include "storage/pg_shmem.h"
#include "storage/proc.h"
#include "storage/procarray.h"
@@ -54,6 +55,7 @@ ProcessingMode Mode = InitProcessing;
/* List of lock files to be removed at proc exit */
static List *lock_files = NIL;
+static Latch LocalLatchData;
/* ----------------------------------------------------------------
* ignoring system indexes support stuff
@@ -188,6 +190,12 @@ InitPostmasterChild(void)
/* We don't want the postmaster's proc_exit() handlers */
on_exit_reset();
+ /* Initialize process-local latch support */
+ InitializeLatchSupport();
+ MyLocalLatch = &LocalLatchData;
+ MyLatch = &LocalLatchData;
+ InitLatch(MyLocalLatch);
+
/*
* If possible, make this process a group leader, so that the postmaster
* can signal any child processes too. Not all processes will have
@@ -214,6 +222,12 @@ InitStandaloneProcess(const char *argv0)
MyStartTime = time(NULL); /* set our start time in case we call elog */
+ /* Initialize process-local latch support */
+ InitializeLatchSupport();
+ MyLocalLatch = &LocalLatchData;
+ MyLatch = &LocalLatchData;
+ InitLatch(MyLocalLatch);
+
/* Compute paths, no postmaster to inherit from */
if (my_exec_path[0] == '\0')
{
diff --git a/src/backend/utils/misc/timeout.c b/src/backend/utils/misc/timeout.c
index 1dec492..ce4bc13 100644
--- a/src/backend/utils/misc/timeout.c
+++ b/src/backend/utils/misc/timeout.c
@@ -284,11 +284,9 @@ handle_sig_alarm(SIGNAL_ARGS)
/*
* SIGALRM is always cause for waking anything waiting on the process
- * latch. Cope with MyProc not being there, as the startup process also
- * uses this signal handler.
+ * latch.
*/
- if (MyProc)
- SetLatch(&MyProc->procLatch);
+ SetLatch(MyLatch);
/*
* Fire any pending timeouts, but only if we're enabled to do so.
diff --git a/src/include/miscadmin.h b/src/include/miscadmin.h
index b94a944..81a5057 100644
--- a/src/include/miscadmin.h
+++ b/src/include/miscadmin.h
@@ -148,6 +148,8 @@ extern int max_worker_processes;
extern PGDLLIMPORT int MyProcPid;
extern PGDLLIMPORT pg_time_t MyStartTime;
extern PGDLLIMPORT struct Port *MyProcPort;
+extern PGDLLIMPORT struct Latch *MyLatch;
+extern PGDLLIMPORT struct Latch *MyLocalLatch;
extern long MyCancelKey;
extern int MyPMChildSlot;
diff --git a/src/include/storage/latch.h b/src/include/storage/latch.h
index 138a748..0d9b14b 100644
--- a/src/include/storage/latch.h
+++ b/src/include/storage/latch.h
@@ -92,7 +92,7 @@
* the public functions. It is defined here to allow embedding Latches as
* part of bigger structs.
*/
-typedef struct
+typedef struct Latch
{
sig_atomic_t is_set;
bool is_shared;
diff --git a/src/test/modules/test_shm_mq/worker.c b/src/test/modules/test_shm_mq/worker.c
index dec058b..a9d9e0e 100644
--- a/src/test/modules/test_shm_mq/worker.c
+++ b/src/test/modules/test_shm_mq/worker.c
@@ -211,8 +211,7 @@ handle_sigterm(SIGNAL_ARGS)
{
int save_errno = errno;
- if (MyProc)
- SetLatch(&MyProc->procLatch);
+ SetLatch(MyLatch);
if (!proc_exit_inprogress)
{
diff --git a/src/test/modules/worker_spi/worker_spi.c b/src/test/modules/worker_spi/worker_spi.c
index ac0f59c..10080ed 100644
--- a/src/test/modules/worker_spi/worker_spi.c
+++ b/src/test/modules/worker_spi/worker_spi.c
@@ -74,8 +74,7 @@ worker_spi_sigterm(SIGNAL_ARGS)
int save_errno = errno;
got_sigterm = true;
- if (MyProc)
- SetLatch(&MyProc->procLatch);
+ SetLatch(MyLatch);
errno = save_errno;
}
@@ -91,8 +90,7 @@ worker_spi_sighup(SIGNAL_ARGS)
int save_errno = errno;
got_sighup = true;
- if (MyProc)
- SetLatch(&MyProc->procLatch);
+ SetLatch(MyLatch);
errno = save_errno;
}
--
2.2.1.212.gc5b9256
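As an illustration of the pattern the patch above converges on (a signal handler that records the event in a flag and then sets a latch that is valid from process start, so no "if (MyProc)" guard is needed), here is a minimal standalone sketch. It is not PostgreSQL code: ToyLatch, the toy_latch_* functions, my_latch and install_local_latch are hypothetical names, and the latch is emulated with the self-pipe trick rather than with the real Latch implementation.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <signal.h>
#include <stdbool.h>
#include <unistd.h>

/* Hypothetical toy latch (self-pipe trick); NOT PostgreSQL's Latch. */
typedef struct ToyLatch
{
	volatile sig_atomic_t is_set;
	int			pipefd[2];
} ToyLatch;

static ToyLatch local_latch;		/* plays the role of LocalLatchData */
static ToyLatch *my_latch = NULL;	/* plays the role of MyLatch */
static volatile sig_atomic_t got_sighup = 0;

static int
toy_latch_init(ToyLatch *latch)
{
	latch->is_set = 0;
	if (pipe(latch->pipefd) != 0)
		return -1;
	/* Both ends nonblocking, so neither the handler nor reset can hang. */
	for (int i = 0; i < 2; i++)
	{
		int			flags = fcntl(latch->pipefd[i], F_GETFL);

		if (flags < 0 || fcntl(latch->pipefd[i], F_SETFL, flags | O_NONBLOCK) < 0)
			return -1;
	}
	return 0;
}

/* Safe to call from a signal handler: only touches a flag and write(). */
static void
toy_latch_set(ToyLatch *latch)
{
	ssize_t		rc;

	if (latch->is_set)
		return;
	latch->is_set = 1;
	rc = write(latch->pipefd[1], "", 1);	/* wake up a pending poll() */
	(void) rc;
}

/* Clear the flag first, then drain; callers re-check their flags afterwards. */
static void
toy_latch_reset(ToyLatch *latch)
{
	char		buf[16];

	latch->is_set = 0;
	while (read(latch->pipefd[0], buf, sizeof(buf)) > 0)
		;
}

/* Returns true if the latch is set, false if timeout_ms elapsed first. */
static bool
toy_latch_wait(ToyLatch *latch, int timeout_ms)
{
	struct pollfd pfd = {.fd = latch->pipefd[0], .events = POLLIN};

	assert(latch != NULL);
	for (;;)
	{
		int			rc;

		if (latch->is_set)
			return true;
		rc = poll(&pfd, 1, timeout_ms);
		if (rc == 0 || (rc < 0 && errno != EINTR))
			return false;
	}
}

static void
sighup_handler(int signo)
{
	int			save_errno = errno;

	(void) signo;
	got_sighup = 1;
	toy_latch_set(my_latch);	/* always valid once installed */
	errno = save_errno;
}

/* Analogous in spirit to the latch setup added to InitPostmasterChild(). */
static int
install_local_latch(void)
{
	if (toy_latch_init(&local_latch) != 0)
		return -1;
	my_latch = &local_latch;
	return (signal(SIGHUP, sighup_handler) == SIG_ERR) ? -1 : 0;
}
```

A main loop built on this would call toy_latch_reset(my_latch), then inspect got_sighup and similar flags, then sleep in toy_latch_wait(); resetting before checking the flags is what keeps a signal that arrives in between from being lost.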
0004-Use-a-nonblocking-socket-for-FE-BE-communication-and.patch (text/x-patch; charset=us-ascii)
From 429673a1a287515608686de5addb873fcbd5b464 Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Thu, 8 Jan 2015 15:50:10 +0100
Subject: [PATCH 4/6] Use a nonblocking socket for FE/BE communication and
block using latches.
This allows introducing more elaborate handling of interrupts while
reading from a socket. Currently some interrupt handlers have to do
significant work from inside signal handlers, and it's very hard to
write such code correctly. Generic signal handler limitations,
combined with the fact that we can't safely jump out of a signal
handler while reading from the client, have prohibited implementation
of features like timeouts for idle-in-transaction.
This commit probably can't actually safely be used without the next
patch in the series, as WaitLatchOrSocket might not be fully signal
safe. Reviewing them separately is easier though, so I'll just fold
them before the final commit.
Author: Andres Freund
Reviewed-By: Heikki Linnakangas
---
src/backend/libpq/be-secure-openssl.c | 32 ++++-----------
src/backend/libpq/be-secure.c | 76 ++++++++++++++++++++++++++++++++++-
src/backend/libpq/pqcomm.c | 41 ++++++++-----------
3 files changed, 99 insertions(+), 50 deletions(-)
diff --git a/src/backend/libpq/be-secure-openssl.c b/src/backend/libpq/be-secure-openssl.c
index 1dd7770..729b746 100644
--- a/src/backend/libpq/be-secure-openssl.c
+++ b/src/backend/libpq/be-secure-openssl.c
@@ -371,12 +371,8 @@ aloop:
{
case SSL_ERROR_WANT_READ:
case SSL_ERROR_WANT_WRITE:
-#ifdef WIN32
- pgwin32_waitforsinglesocket(SSL_get_fd(port->ssl),
- (err == SSL_ERROR_WANT_READ) ?
- FD_READ | FD_CLOSE | FD_ACCEPT : FD_WRITE | FD_CLOSE,
- INFINITE);
-#endif
+ /* not allowed during connection establishment */
+ Assert(!port->noblock);
goto aloop;
case SSL_ERROR_SYSCALL:
if (r < 0)
@@ -516,18 +512,9 @@ rloop:
break;
case SSL_ERROR_WANT_READ:
case SSL_ERROR_WANT_WRITE:
+ /* Don't retry if the socket is in nonblocking mode. */
if (port->noblock)
- {
- errno = EWOULDBLOCK;
- n = -1;
break;
- }
-#ifdef WIN32
- pgwin32_waitforsinglesocket(SSL_get_fd(port->ssl),
- (err == SSL_ERROR_WANT_READ) ?
- FD_READ | FD_CLOSE : FD_WRITE | FD_CLOSE,
- INFINITE);
-#endif
goto rloop;
case SSL_ERROR_SYSCALL:
/* leave it to caller to ereport the value of errno */
@@ -630,12 +617,9 @@ wloop:
break;
case SSL_ERROR_WANT_READ:
case SSL_ERROR_WANT_WRITE:
-#ifdef WIN32
- pgwin32_waitforsinglesocket(SSL_get_fd(port->ssl),
- (err == SSL_ERROR_WANT_READ) ?
- FD_READ | FD_CLOSE : FD_WRITE | FD_CLOSE,
- INFINITE);
-#endif
+ /* Don't retry if the socket is in nonblocking mode. */
+ if (port->noblock)
+ break;
goto wloop;
case SSL_ERROR_SYSCALL:
/* leave it to caller to ereport the value of errno */
@@ -722,7 +706,7 @@ my_sock_read(BIO *h, char *buf, int size)
if (res <= 0)
{
/* If we were interrupted, tell caller to retry */
- if (errno == EINTR)
+ if (errno == EINTR || errno == EWOULDBLOCK || errno == EAGAIN)
{
BIO_set_retry_read(h);
}
@@ -741,7 +725,7 @@ my_sock_write(BIO *h, const char *buf, int size)
BIO_clear_retry_flags(h);
if (res <= 0)
{
- if (errno == EINTR)
+ if (errno == EINTR || errno == EWOULDBLOCK || errno == EAGAIN)
{
BIO_set_retry_write(h);
}
diff --git a/src/backend/libpq/be-secure.c b/src/backend/libpq/be-secure.c
index c592f85..709131f 100644
--- a/src/backend/libpq/be-secure.c
+++ b/src/backend/libpq/be-secure.c
@@ -18,6 +18,8 @@
#include "postgres.h"
+#include "miscadmin.h"
+
#include <sys/stat.h>
#include <signal.h>
#include <fcntl.h>
@@ -34,6 +36,7 @@
#include "libpq/libpq.h"
#include "tcop/tcopprot.h"
#include "utils/memutils.h"
+#include "storage/proc.h"
char *ssl_cert_file;
@@ -147,7 +150,37 @@ secure_raw_read(Port *port, void *ptr, size_t len)
prepare_for_client_read();
+ /*
+ * Try to read from the socket without blocking. If it succeeds we're
+ * done, otherwise we'll wait for the socket using the latch mechanism.
+ */
+rloop:
+#ifdef WIN32
+ pgwin32_noblock = true;
+#endif
n = recv(port->sock, ptr, len, 0);
+#ifdef WIN32
+ pgwin32_noblock = false;
+#endif
+
+ if (!port->noblock && n < 0 && (errno == EWOULDBLOCK || errno == EAGAIN))
+ {
+ int w;
+ int save_errno = errno;
+
+ w = WaitLatchOrSocket(MyLatch,
+ WL_SOCKET_READABLE,
+ port->sock, 0);
+
+ if (w & WL_SOCKET_READABLE)
+ goto rloop;
+
+ /*
+ * Restore errno, clobbered by WaitLatchOrSocket, so the caller can
+ * react properly.
+ */
+ errno = save_errno;
+ }
client_read_ended();
@@ -170,7 +203,9 @@ secure_write(Port *port, void *ptr, size_t len)
}
else
#endif
+ {
n = secure_raw_write(port, ptr, len);
+ }
return n;
}
@@ -178,5 +213,44 @@ secure_write(Port *port, void *ptr, size_t len)
ssize_t
secure_raw_write(Port *port, const void *ptr, size_t len)
{
- return send(port->sock, ptr, len, 0);
+ ssize_t n;
+
+wloop:
+
+#ifdef WIN32
+ pgwin32_noblock = true;
+#endif
+ n = send(port->sock, ptr, len, 0);
+#ifdef WIN32
+ pgwin32_noblock = false;
+#endif
+
+ if (!port->noblock && n < 0 && (errno == EWOULDBLOCK || errno == EAGAIN))
+ {
+ int w;
+ int save_errno = errno;
+
+ /*
+ * We probably want to check for latches being set at some point
+ * here. That'd allow us to handle interrupts while blocked on
+ * writes. If set we'd not retry directly, but return. That way we
+ * don't do anything while (possibly) inside a ssl library.
+ */
+ w = WaitLatchOrSocket(MyLatch,
+ WL_SOCKET_WRITEABLE,
+ port->sock, 0);
+
+ if (w & WL_SOCKET_WRITEABLE)
+ {
+ goto wloop;
+ }
+
+ /*
+ * Restore errno, clobbered by WaitLatchOrSocket, so the caller can
+ * react properly.
+ */
+ errno = save_errno;
+ }
+
+ return n;
}
diff --git a/src/backend/libpq/pqcomm.c b/src/backend/libpq/pqcomm.c
index e3efac3..6f35508 100644
--- a/src/backend/libpq/pqcomm.c
+++ b/src/backend/libpq/pqcomm.c
@@ -179,6 +179,22 @@ pq_init(void)
PqCommBusy = false;
DoingCopyOut = false;
on_proc_exit(socket_close, 0);
+
+ /*
+ * In individual backends we operate the underlying socket in nonblocking
+ * mode and use latches to implement blocking semantics if needed. That
+ * allows us to provide safely interruptible reads.
+ *
+ * Use COMMERROR on failure, because ERROR would try to send the error to
+ * the client, which might require changing the mode again, leading to
+ * infinite recursion.
+ */
+#ifndef WIN32
+ if (!pg_set_noblock(MyProcPort->sock))
+ ereport(COMMERROR,
+ (errmsg("could not set socket to nonblocking mode: %m")));
+#endif
+
}
/* --------------------------------
@@ -818,31 +834,6 @@ socket_set_nonblocking(bool nonblocking)
(errcode(ERRCODE_CONNECTION_DOES_NOT_EXIST),
errmsg("there is no client connection")));
- if (MyProcPort->noblock == nonblocking)
- return;
-
-#ifdef WIN32
- pgwin32_noblock = nonblocking ? 1 : 0;
-#else
-
- /*
- * Use COMMERROR on failure, because ERROR would try to send the error to
- * the client, which might require changing the mode again, leading to
- * infinite recursion.
- */
- if (nonblocking)
- {
- if (!pg_set_noblock(MyProcPort->sock))
- ereport(COMMERROR,
- (errmsg("could not set socket to nonblocking mode: %m")));
- }
- else
- {
- if (!pg_set_block(MyProcPort->sock))
- ereport(COMMERROR,
- (errmsg("could not set socket to blocking mode: %m")));
- }
-#endif
MyProcPort->noblock = nonblocking;
}
--
2.2.1.212.gc5b9256
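The retry loop that the patch above adds to secure_raw_write() can be sketched outside the backend roughly as follows. This is only an approximation under stated assumptions: set_nonblocking and send_with_wait are hypothetical names, plain poll() stands in for WaitLatchOrSocket(), no latch is involved, and an explicit timeout is used where the patch waits indefinitely.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: put a descriptor into nonblocking mode. */
static int
set_nonblocking(int fd)
{
	int			flags = fcntl(fd, F_GETFL);

	if (flags < 0)
		return -1;
	return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/*
 * Hypothetical helper: emulate a blocking send() on a nonblocking socket.
 *
 * Returns the number of bytes sent, or -1 with errno set.  If the socket
 * does not become writable within timeout_ms, errno is restored to the
 * EWOULDBLOCK/EAGAIN value reported by send(), so the caller can react.
 */
static ssize_t
send_with_wait(int fd, const void *buf, size_t len, int timeout_ms)
{
	for (;;)
	{
		struct pollfd pfd = {.fd = fd, .events = POLLOUT};
		ssize_t		n = send(fd, buf, len, 0);
		int			save_errno;
		int			rc;

		/* Success, or a real error: hand the result straight back. */
		if (n >= 0 || (errno != EWOULDBLOCK && errno != EAGAIN))
			return n;

		save_errno = errno;

		/* The send would block; wait until the socket is writable. */
		rc = poll(&pfd, 1, timeout_ms);
		if (rc > 0)
			continue;			/* writable (or failed; send() will tell) */
		if (rc == 0)
			errno = save_errno; /* timed out: report "would block" */
		/* rc < 0: leave poll()'s errno (e.g. EINTR) for the caller */
		return -1;
	}
}
```

The point of structuring it this way is that the wait happens in one well-defined place outside send(), which is where a check for pending interrupts (as the comment in the patch suggests) could later be added.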
0005-Introduce-and-use-infrastructure-for-interrupt-proce.patch (text/x-patch; charset=us-ascii)
From 2a0ac8100d0e5d63f7c10462e4b1cfc4ec71b856 Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Sat, 27 Sep 2014 23:49:30 +0200
Subject: [PATCH 5/6] Introduce and use infrastructure for interrupt processing
during client reads.
Up to now large swathes of backend code ran inside signal handlers
while reading commands from the client, to allow for speedy reaction to
asynchronous events, most prominently shared invalidation and NOTIFY
handling. That means that complex code like the starting/stopping of
transactions is run in signal handlers. The required code was
fragile and verbose, and is likely to contain bugs.
That approach also severely limited what could be done while
communicating with the client. As the read might be from within
openssl it wasn't safely possible to trigger an error, e.g. to cancel
a backend in idle-in-transaction state.
Now that FE/BE communication in the backend employs nonblocking
sockets and latches to block, we can quite simply interrupt reads from
signal handlers by setting the latch. That allows us to signal an
interrupted read, which is supposed to be retried after returning from
within the ssl library.
As signal handlers now only need to set the latch to guarantee timely
interrupt processing, remove a fair amount of complicated & fragile
code from async.c and sinval.c.
This work will hopefully allow cases like being blocked while
sending data, interrupting idle transactions and similar to be
handled without too much effort.
Author: Andres Freund
Reviewed-By: Heikki Linnakangas
---
src/backend/commands/async.c | 199 ++++++------------------------
src/backend/libpq/be-secure-openssl.c | 18 +++
src/backend/libpq/be-secure.c | 51 ++++++--
src/backend/postmaster/autovacuum.c | 6 +-
src/backend/storage/ipc/sinval.c | 223 +++++++---------------------------
src/backend/tcop/postgres.c | 110 ++++++-----------
src/include/commands/async.h | 12 +-
src/include/storage/sinval.h | 7 +-
src/include/tcop/tcopprot.h | 4 +-
9 files changed, 187 insertions(+), 443 deletions(-)
diff --git a/src/backend/commands/async.c b/src/backend/commands/async.c
index b945d9d..81632e6 100644
--- a/src/backend/commands/async.c
+++ b/src/backend/commands/async.c
@@ -79,10 +79,12 @@
* either, but just process the queue directly.
*
* 5. Upon receipt of a PROCSIG_NOTIFY_INTERRUPT signal, the signal handler
- * can call inbound-notify processing immediately if this backend is idle
- * (ie, it is waiting for a frontend command and is not within a transaction
- * block). Otherwise the handler may only set a flag, which will cause the
- * processing to occur just before we next go idle.
+ * sets the process's latch which triggers the event to be processed
+ * immediately if this backend is idle (ie, it is waiting for a frontend
+ * command and is not within a transaction block. C.f.
+ * ProcessClientReadInterrupt()). Otherwise the handler may only set a
+ * flag, which will cause the processing to occur just before we next go
+ * idle.
*
* Inbound-notify processing consists of reading all of the notifications
* that have arrived since scanning last time. We read every notification
@@ -126,6 +128,7 @@
#include "miscadmin.h"
#include "storage/ipc.h"
#include "storage/lmgr.h"
+#include "storage/proc.h"
#include "storage/procarray.h"
#include "storage/procsignal.h"
#include "storage/sinval.h"
@@ -334,17 +337,13 @@ static List *pendingNotifies = NIL; /* list of Notifications */
static List *upperPendingNotifies = NIL; /* list of upper-xact lists */
/*
- * State for inbound notifications consists of two flags: one saying whether
- * the signal handler is currently allowed to call ProcessIncomingNotify
- * directly, and one saying whether the signal has occurred but the handler
- * was not allowed to call ProcessIncomingNotify at the time.
- *
- * NB: the "volatile" on these declarations is critical! If your compiler
- * does not grok "volatile", you'd be best advised to compile this file
- * with all optimization turned off.
+ * Inbound notifications are initially processed by HandleNotifyInterrupt(),
+ * called from inside a signal handler. That just sets the
+ * notifyInterruptPending flag and sets the process
+ * latch. ProcessNotifyInterrupt() will then be called whenever it's safe to
+ * actually deal with the interrupt.
*/
-static volatile sig_atomic_t notifyInterruptEnabled = 0;
-static volatile sig_atomic_t notifyInterruptOccurred = 0;
+volatile sig_atomic_t notifyInterruptPending = false;
/* True if we've registered an on_shmem_exit cleanup */
static bool unlistenExitRegistered = false;
@@ -1625,164 +1624,45 @@ AtSubAbort_Notify(void)
/*
* HandleNotifyInterrupt
*
- * This is called when PROCSIG_NOTIFY_INTERRUPT is received.
- *
- * If we are idle (notifyInterruptEnabled is set), we can safely invoke
- * ProcessIncomingNotify directly. Otherwise, just set a flag
- * to do it later.
+ * Signal handler portion of interrupt handling. Let the backend know
+ * that there's a pending notify interrupt. If we're currently reading
+ * from the client, this will interrupt the read and
+ * ProcessClientReadInterrupt() will call ProcessNotifyInterrupt().
*/
void
HandleNotifyInterrupt(void)
{
/*
* Note: this is called by a SIGNAL HANDLER. You must be very wary what
- * you do here. Some helpful soul had this routine sprinkled with
- * TPRINTFs, which would likely lead to corruption of stdio buffers if
- * they were ever turned on.
+ * you do here.
*/
- /* Don't joggle the elbow of proc_exit */
- if (proc_exit_inprogress)
- return;
-
- if (notifyInterruptEnabled)
- {
- bool save_ImmediateInterruptOK = ImmediateInterruptOK;
-
- /*
- * We may be called while ImmediateInterruptOK is true; turn it off
- * while messing with the NOTIFY state. This prevents problems if
- * SIGINT or similar arrives while we're working. Just to be real
- * sure, bump the interrupt holdoff counter as well. That way, even
- * if something inside ProcessIncomingNotify() transiently sets
- * ImmediateInterruptOK (eg while waiting on a lock), we won't get
- * interrupted until we're done with the notify interrupt.
- */
- ImmediateInterruptOK = false;
- HOLD_INTERRUPTS();
-
- /*
- * I'm not sure whether some flavors of Unix might allow another
- * SIGUSR1 occurrence to recursively interrupt this routine. To cope
- * with the possibility, we do the same sort of dance that
- * EnableNotifyInterrupt must do --- see that routine for comments.
- */
- notifyInterruptEnabled = 0; /* disable any recursive signal */
- notifyInterruptOccurred = 1; /* do at least one iteration */
- for (;;)
- {
- notifyInterruptEnabled = 1;
- if (!notifyInterruptOccurred)
- break;
- notifyInterruptEnabled = 0;
- if (notifyInterruptOccurred)
- {
- /* Here, it is finally safe to do stuff. */
- if (Trace_notify)
- elog(DEBUG1, "HandleNotifyInterrupt: perform async notify");
-
- ProcessIncomingNotify();
-
- if (Trace_notify)
- elog(DEBUG1, "HandleNotifyInterrupt: done");
- }
- }
+ /* signal that work needs to be done */
+ notifyInterruptPending = true;
- /*
- * Restore the holdoff level and ImmediateInterruptOK, and check for
- * interrupts if needed.
- */
- RESUME_INTERRUPTS();
- ImmediateInterruptOK = save_ImmediateInterruptOK;
- if (save_ImmediateInterruptOK)
- CHECK_FOR_INTERRUPTS();
- }
- else
- {
- /*
- * In this path it is NOT SAFE to do much of anything, except this:
- */
- notifyInterruptOccurred = 1;
- }
+ /* make sure the event is processed in due course */
+ SetLatch(MyLatch);
}
/*
- * EnableNotifyInterrupt
- *
- * This is called by the PostgresMain main loop just before waiting
- * for a frontend command. If we are truly idle (ie, *not* inside
- * a transaction block), then process any pending inbound notifies,
- * and enable the signal handler to process future notifies directly.
+ * ProcessNotifyInterrupt
*
- * NOTE: the signal handler starts out disabled, and stays so until
- * PostgresMain calls this the first time.
+ * This is called just after waiting for a frontend command. If a
+ * interrupt arrives (via HandleNotifyInterrupt()) while reading, the
+ * read will be interrupted via the process's latch, and this routine
+ * will get called. If we are truly idle (ie, *not* inside a transaction
+ * block), process the incoming notifies.
*/
void
-EnableNotifyInterrupt(void)
+ProcessNotifyInterrupt(void)
{
if (IsTransactionOrTransactionBlock())
return; /* not really idle */
- /*
- * This code is tricky because we are communicating with a signal handler
- * that could interrupt us at any point. If we just checked
- * notifyInterruptOccurred and then set notifyInterruptEnabled, we could
- * fail to respond promptly to a signal that happens in between those two
- * steps. (A very small time window, perhaps, but Murphy's Law says you
- * can hit it...) Instead, we first set the enable flag, then test the
- * occurred flag. If we see an unserviced interrupt has occurred, we
- * re-clear the enable flag before going off to do the service work. (That
- * prevents re-entrant invocation of ProcessIncomingNotify() if another
- * interrupt occurs.) If an interrupt comes in between the setting and
- * clearing of notifyInterruptEnabled, then it will have done the service
- * work and left notifyInterruptOccurred zero, so we have to check again
- * after clearing enable. The whole thing has to be in a loop in case
- * another interrupt occurs while we're servicing the first. Once we get
- * out of the loop, enable is set and we know there is no unserviced
- * interrupt.
- *
- * NB: an overenthusiastic optimizing compiler could easily break this
- * code. Hopefully, they all understand what "volatile" means these days.
- */
- for (;;)
- {
- notifyInterruptEnabled = 1;
- if (!notifyInterruptOccurred)
- break;
- notifyInterruptEnabled = 0;
- if (notifyInterruptOccurred)
- {
- if (Trace_notify)
- elog(DEBUG1, "EnableNotifyInterrupt: perform async notify");
-
- ProcessIncomingNotify();
-
- if (Trace_notify)
- elog(DEBUG1, "EnableNotifyInterrupt: done");
- }
- }
+ while (notifyInterruptPending)
+ ProcessIncomingNotify();
}
-/*
- * DisableNotifyInterrupt
- *
- * This is called by the PostgresMain main loop just after receiving
- * a frontend command. Signal handler execution of inbound notifies
- * is disabled until the next EnableNotifyInterrupt call.
- *
- * The PROCSIG_CATCHUP_INTERRUPT signal handler also needs to call this,
- * so as to prevent conflicts if one signal interrupts the other. So we
- * must return the previous state of the flag.
- */
-bool
-DisableNotifyInterrupt(void)
-{
- bool result = (notifyInterruptEnabled != 0);
-
- notifyInterruptEnabled = 0;
-
- return result;
-}
/*
* Read all pending notifications from the queue, and deliver appropriate
@@ -2076,9 +1956,10 @@ asyncQueueAdvanceTail(void)
/*
* ProcessIncomingNotify
*
- * Deal with arriving NOTIFYs from other backends.
- * This is called either directly from the PROCSIG_NOTIFY_INTERRUPT
- * signal handler, or the next time control reaches the outer idle loop.
+ * Deal with arriving NOTIFYs from other backends as soon as it's safe to
+ * do so. This used to be called from the PROCSIG_NOTIFY_INTERRUPT
+ * signal handler, but isn't anymore.
+ *
* Scan the queue for arriving notifications and report them to my front
* end.
*
@@ -2087,18 +1968,13 @@ asyncQueueAdvanceTail(void)
static void
ProcessIncomingNotify(void)
{
- bool catchup_enabled;
-
/* We *must* reset the flag */
- notifyInterruptOccurred = 0;
+ notifyInterruptPending = false;
/* Do nothing else if we aren't actively listening */
if (listenChannels == NIL)
return;
- /* Must prevent catchup interrupt while I am running */
- catchup_enabled = DisableCatchupInterrupt();
-
if (Trace_notify)
elog(DEBUG1, "ProcessIncomingNotify");
@@ -2123,9 +1999,6 @@ ProcessIncomingNotify(void)
if (Trace_notify)
elog(DEBUG1, "ProcessIncomingNotify: done");
-
- if (catchup_enabled)
- EnableCatchupInterrupt();
}
/*
diff --git a/src/backend/libpq/be-secure-openssl.c b/src/backend/libpq/be-secure-openssl.c
index 729b746..3a70f43 100644
--- a/src/backend/libpq/be-secure-openssl.c
+++ b/src/backend/libpq/be-secure-openssl.c
@@ -515,6 +515,20 @@ rloop:
/* Don't retry if the socket is in nonblocking mode. */
if (port->noblock)
break;
+
+ /*
+ * We'll, among other situations, get here if the low level
+ * routine doing the actual recv() via the socket got interrupted
+ * by a signal. That's so we can handle interrupts once outside
+ * openssl so we don't jump out from underneath its covers. We can
+ * check this both, when reading and writing, because even when
+ * writing that's just openssl's doing, not a 'proper' write
+ * initiated by postgres.
+ *
+ * Only process interrupts here if we're blocking inside the
+ * function. In the other cases secure_read() will do so.
+ */
+ ProcessClientReadInterrupt(); /* preserves errno */
goto rloop;
case SSL_ERROR_SYSCALL:
/* leave it to caller to ereport the value of errno */
@@ -620,6 +634,10 @@ wloop:
/* Don't retry if the socket is in nonblocking mode. */
if (port->noblock)
break;
+ /*
+ * XXX: We'll, at some later point, likely want to add interrupt
+ * processing here.
+ */
goto wloop;
case SSL_ERROR_SYSCALL:
/* leave it to caller to ereport the value of errno */
diff --git a/src/backend/libpq/be-secure.c b/src/backend/libpq/be-secure.c
index 709131f..71742a6 100644
--- a/src/backend/libpq/be-secure.c
+++ b/src/backend/libpq/be-secure.c
@@ -129,6 +129,7 @@ secure_read(Port *port, void *ptr, size_t len)
{
ssize_t n;
+retry:
#ifdef USE_SSL
if (port->ssl_in_use)
{
@@ -140,6 +141,14 @@ secure_read(Port *port, void *ptr, size_t len)
n = secure_raw_read(port, ptr, len);
}
+ /* Process interrupts that happened while (or before) receiving. */
+ ProcessClientReadInterrupt(); /* preserves errno */
+
+ /* retry after processing interrupts */
+ if (n < 0 && errno == EINTR)
+ {
+ goto retry;
+ }
return n;
}
@@ -148,8 +157,6 @@ secure_raw_read(Port *port, void *ptr, size_t len)
{
ssize_t n;
- prepare_for_client_read();
-
/*
* Try to read from the socket without blocking. If it suceeds we're
* done, otherwise we'll wait for the socket using the latch mechanism.
@@ -168,11 +175,22 @@ rloop:
int w;
int save_errno = errno;
+
w = WaitLatchOrSocket(MyLatch,
- WL_SOCKET_READABLE,
+ WL_LATCH_SET | WL_SOCKET_READABLE,
port->sock, 0);
- if (w & WL_SOCKET_READABLE)
+ if (w & WL_LATCH_SET)
+ {
+ ResetLatch(&MyProc->procLatch);
+ /*
+ * Force a return, so interrupts can be processed when not
+ * (possibly) underneath a ssl library.
+ */
+ errno = EINTR;
+ return -1;
+ }
+ else if (w & WL_SOCKET_READABLE)
goto rloop;
/*
@@ -182,8 +200,6 @@ rloop:
errno = save_errno;
}
- client_read_ended();
-
return n;
}
@@ -196,6 +212,7 @@ secure_write(Port *port, void *ptr, size_t len)
{
ssize_t n;
+retry:
#ifdef USE_SSL
if (port->ssl_in_use)
{
@@ -207,6 +224,21 @@ secure_write(Port *port, void *ptr, size_t len)
n = secure_raw_write(port, ptr, len);
}
+ /*
+ * XXX: We'll, at some later point, likely want to add interrupt
+ * processing here.
+ */
+
+ /*
+ * Retry after processing interrupts. This can be triggered even though we
+ * don't check for latch set's during writing yet, because SSL
+ * renegotiations might have required reading from the socket.
+ */
+ if (n < 0 && errno == EINTR)
+ {
+ goto retry;
+ }
+
return n;
}
@@ -231,10 +263,9 @@ wloop:
int save_errno = errno;
/*
- * We probably want to check for latches being set at some point
- * here. That'd allow us to handle interrupts while blocked on
- * writes. If set we'd not retry directly, but return. That way we
- * don't do anything while (possibly) inside a ssl library.
+ * XXX: We'll, at some later point, likely want to add interrupt
+ * processing here. If set we'd not retry directly, but return. That
+ * way we don't do anything while (possibly) inside a ssl library.
*/
w = WaitLatchOrSocket(MyLatch,
WL_SOCKET_WRITEABLE,
diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c
index f6c04ba..803c40a 100644
--- a/src/backend/postmaster/autovacuum.c
+++ b/src/backend/postmaster/autovacuum.c
@@ -583,9 +583,6 @@ AutoVacLauncherMain(int argc, char *argv[])
launcher_determine_sleep(!dlist_is_empty(&AutoVacuumShmem->av_freeWorkers),
false, &nap);
- /* Allow sinval catchup interrupts while sleeping */
- EnableCatchupInterrupt();
-
/*
* Wait until naptime expires or we get some type of signal (all the
* signal handlers will wake us by calling SetLatch).
@@ -596,7 +593,8 @@ AutoVacLauncherMain(int argc, char *argv[])
ResetLatch(&MyProc->procLatch);
- DisableCatchupInterrupt();
+ /* Process sinval catchup interrupts that happened while sleeping */
+ ProcessCatchupInterrupt();
/*
* Emergency bailout if postmaster has died. This is to avoid the
diff --git a/src/backend/storage/ipc/sinval.c b/src/backend/storage/ipc/sinval.c
index e6b0d49..67ec515 100644
--- a/src/backend/storage/ipc/sinval.c
+++ b/src/backend/storage/ipc/sinval.c
@@ -18,6 +18,7 @@
#include "commands/async.h"
#include "miscadmin.h"
#include "storage/ipc.h"
+#include "storage/proc.h"
#include "storage/sinvaladt.h"
#include "utils/inval.h"
@@ -32,19 +33,12 @@ uint64 SharedInvalidMessageCounter;
* through a cache reset exercise. This is done by sending
* PROCSIG_CATCHUP_INTERRUPT to any backend that gets too far behind.
*
- * State for catchup events consists of two flags: one saying whether
- * the signal handler is currently allowed to call ProcessCatchupEvent
- * directly, and one saying whether the signal has occurred but the handler
- * was not allowed to call ProcessCatchupEvent at the time.
- *
- * NB: the "volatile" on these declarations is critical! If your compiler
- * does not grok "volatile", you'd be best advised to compile this file
- * with all optimization turned off.
+ * The signal handler will set an interrupt pending flag and will set the
+ * process's latch. Whenever starting to read from the client, or when
+ * interrupted while doing so, ProcessClientReadInterrupt() will call
+ * ProcessCatchupEvent().
*/
-static volatile int catchupInterruptEnabled = 0;
-static volatile int catchupInterruptOccurred = 0;
-
-static void ProcessCatchupEvent(void);
+volatile sig_atomic_t catchupInterruptPending = false;
/*
@@ -141,9 +135,9 @@ ReceiveSharedInvalidMessages(
* catchup signal this way avoids creating spikes in system load for what
* should be just a background maintenance activity.
*/
- if (catchupInterruptOccurred)
+ if (catchupInterruptPending)
{
- catchupInterruptOccurred = 0;
+ catchupInterruptPending = false;
elog(DEBUG4, "sinval catchup complete, cleaning queue");
SICleanupQueue(false, 0);
}
@@ -155,12 +149,9 @@ ReceiveSharedInvalidMessages(
*
* This is called when PROCSIG_CATCHUP_INTERRUPT is received.
*
- * If we are idle (catchupInterruptEnabled is set), we can safely
- * invoke ProcessCatchupEvent directly. Otherwise, just set a flag
- * to do it later. (Note that it's quite possible for normal processing
- * of the current transaction to cause ReceiveSharedInvalidMessages()
- * to be run later on; in that case the flag will get cleared again,
- * since there's no longer any reason to do anything.)
+ * We used to call ProcessCatchupEvent directly when idle. These days
+ * we just set a flag to do it later and notify the process of that fact by
+ * setting the process's latch.
*/
void
HandleCatchupInterrupt(void)
@@ -170,174 +161,46 @@ HandleCatchupInterrupt(void)
* you do here.
*/
- /* Don't joggle the elbow of proc_exit */
- if (proc_exit_inprogress)
- return;
-
- if (catchupInterruptEnabled)
- {
- bool save_ImmediateInterruptOK = ImmediateInterruptOK;
-
- /*
- * We may be called while ImmediateInterruptOK is true; turn it off
- * while messing with the catchup state. This prevents problems if
- * SIGINT or similar arrives while we're working. Just to be real
- * sure, bump the interrupt holdoff counter as well. That way, even
- * if something inside ProcessCatchupEvent() transiently sets
- * ImmediateInterruptOK (eg while waiting on a lock), we won't get
- * interrupted until we're done with the catchup interrupt.
- */
- ImmediateInterruptOK = false;
- HOLD_INTERRUPTS();
-
- /*
- * I'm not sure whether some flavors of Unix might allow another
- * SIGUSR1 occurrence to recursively interrupt this routine. To cope
- * with the possibility, we do the same sort of dance that
- * EnableCatchupInterrupt must do --- see that routine for comments.
- */
- catchupInterruptEnabled = 0; /* disable any recursive signal */
- catchupInterruptOccurred = 1; /* do at least one iteration */
- for (;;)
- {
- catchupInterruptEnabled = 1;
- if (!catchupInterruptOccurred)
- break;
- catchupInterruptEnabled = 0;
- if (catchupInterruptOccurred)
- {
- /* Here, it is finally safe to do stuff. */
- ProcessCatchupEvent();
- }
- }
+ catchupInterruptPending = true;
- /*
- * Restore the holdoff level and ImmediateInterruptOK, and check for
- * interrupts if needed.
- */
- RESUME_INTERRUPTS();
- ImmediateInterruptOK = save_ImmediateInterruptOK;
- if (save_ImmediateInterruptOK)
- CHECK_FOR_INTERRUPTS();
- }
- else
- {
- /*
- * In this path it is NOT SAFE to do much of anything, except this:
- */
- catchupInterruptOccurred = 1;
- }
+ /* make sure the event is processed in due course */
+ SetLatch(MyLatch);
}
/*
- * EnableCatchupInterrupt
- *
- * This is called by the PostgresMain main loop just before waiting
- * for a frontend command. We process any pending catchup events,
- * and enable the signal handler to process future events directly.
+ * ProcessCatchupInterrupt
*
- * NOTE: the signal handler starts out disabled, and stays so until
- * PostgresMain calls this the first time.
+ * The portion of catchup interrupt handling that runs outside of the signal
+ * handler, which allows it to actually process pending invalidations.
*/
void
-EnableCatchupInterrupt(void)
+ProcessCatchupInterrupt(void)
{
- /*
- * This code is tricky because we are communicating with a signal handler
- * that could interrupt us at any point. If we just checked
- * catchupInterruptOccurred and then set catchupInterruptEnabled, we could
- * fail to respond promptly to a signal that happens in between those two
- * steps. (A very small time window, perhaps, but Murphy's Law says you
- * can hit it...) Instead, we first set the enable flag, then test the
- * occurred flag. If we see an unserviced interrupt has occurred, we
- * re-clear the enable flag before going off to do the service work. (That
- * prevents re-entrant invocation of ProcessCatchupEvent() if another
- * interrupt occurs.) If an interrupt comes in between the setting and
- * clearing of catchupInterruptEnabled, then it will have done the service
- * work and left catchupInterruptOccurred zero, so we have to check again
- * after clearing enable. The whole thing has to be in a loop in case
- * another interrupt occurs while we're servicing the first. Once we get
- * out of the loop, enable is set and we know there is no unserviced
- * interrupt.
- *
- * NB: an overenthusiastic optimizing compiler could easily break this
- * code. Hopefully, they all understand what "volatile" means these days.
- */
- for (;;)
+ while (catchupInterruptPending)
{
- catchupInterruptEnabled = 1;
- if (!catchupInterruptOccurred)
- break;
- catchupInterruptEnabled = 0;
- if (catchupInterruptOccurred)
- ProcessCatchupEvent();
- }
-}
-
-/*
- * DisableCatchupInterrupt
- *
- * This is called by the PostgresMain main loop just after receiving
- * a frontend command. Signal handler execution of catchup events
- * is disabled until the next EnableCatchupInterrupt call.
- *
- * The PROCSIG_NOTIFY_INTERRUPT signal handler also needs to call this,
- * so as to prevent conflicts if one signal interrupts the other. So we
- * must return the previous state of the flag.
- */
-bool
-DisableCatchupInterrupt(void)
-{
- bool result = (catchupInterruptEnabled != 0);
-
- catchupInterruptEnabled = 0;
-
- return result;
-}
-
-/*
- * ProcessCatchupEvent
- *
- * Respond to a catchup event (PROCSIG_CATCHUP_INTERRUPT) from another
- * backend.
- *
- * This is called either directly from the PROCSIG_CATCHUP_INTERRUPT
- * signal handler, or the next time control reaches the outer idle loop
- * (assuming there's still anything to do by then).
- */
-static void
-ProcessCatchupEvent(void)
-{
- bool notify_enabled;
-
- /* Must prevent notify interrupt while I am running */
- notify_enabled = DisableNotifyInterrupt();
-
- /*
- * What we need to do here is cause ReceiveSharedInvalidMessages() to run,
- * which will do the necessary work and also reset the
- * catchupInterruptOccurred flag. If we are inside a transaction we can
- * just call AcceptInvalidationMessages() to do this. If we aren't, we
- * start and immediately end a transaction; the call to
- * AcceptInvalidationMessages() happens down inside transaction start.
- *
- * It is awfully tempting to just call AcceptInvalidationMessages()
- * without the rest of the xact start/stop overhead, and I think that
- * would actually work in the normal case; but I am not sure that things
- * would clean up nicely if we got an error partway through.
- */
- if (IsTransactionOrTransactionBlock())
- {
- elog(DEBUG4, "ProcessCatchupEvent inside transaction");
- AcceptInvalidationMessages();
- }
- else
- {
- elog(DEBUG4, "ProcessCatchupEvent outside transaction");
- StartTransactionCommand();
- CommitTransactionCommand();
+ /*
+ * What we need to do here is cause ReceiveSharedInvalidMessages() to
+ * run, which will do the necessary work and also reset the
+ * catchupInterruptPending flag. If we are inside a transaction we
+ * can just call AcceptInvalidationMessages() to do this. If we
+ * aren't, we start and immediately end a transaction; the call to
+ * AcceptInvalidationMessages() happens down inside transaction start.
+ *
+ * It is awfully tempting to just call AcceptInvalidationMessages()
+ * without the rest of the xact start/stop overhead, and I think that
+ * would actually work in the normal case; but I am not sure that things
+ * would clean up nicely if we got an error partway through.
+ */
+ if (IsTransactionOrTransactionBlock())
+ {
+ elog(DEBUG4, "ProcessCatchupEvent inside transaction");
+ AcceptInvalidationMessages();
+ }
+ else
+ {
+ elog(DEBUG4, "ProcessCatchupEvent outside transaction");
+ StartTransactionCommand();
+ CommitTransactionCommand();
+ }
}
-
- if (notify_enabled)
- EnableNotifyInterrupt();
}
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 8bf006b..c4e3a61 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -302,17 +302,24 @@ InteractiveBackend(StringInfo inBuf)
* interactive_getc -- collect one character from stdin
*
* Even though we are not reading from a "client" process, we still want to
- * respond to signals, particularly SIGTERM/SIGQUIT. Hence we must use
- * prepare_for_client_read and client_read_ended.
+ * respond to signals, particularly SIGTERM/SIGQUIT. FIXME.
*/
static int
interactive_getc(void)
{
int c;
- prepare_for_client_read();
+ /*
+ * FIXME: this will not process catchup interrupts or notifications while
+ * reading. But those can't really be relevant for a standalone backend
+ * anyway?
+ */
+ ProcessClientReadInterrupt();
+
c = getc(stdin);
- client_read_ended();
+
+ ProcessClientReadInterrupt();
+
return c;
}
@@ -487,53 +494,33 @@ ReadCommand(StringInfo inBuf)
}
/*
- * prepare_for_client_read -- set up to possibly block on client input
+ * ProcessClientReadInterrupt() - Process interrupts specific to client reads
*
- * This must be called immediately before any low-level read from the
- * client connection. It is necessary to do it at a sufficiently low level
- * that there won't be any other operations except the read kernel call
- * itself between this call and the subsequent client_read_ended() call.
- * In particular there mustn't be use of malloc() or other potentially
- * non-reentrant libc functions. This restriction makes it safe for us
- * to allow interrupt service routines to execute nontrivial code while
- * we are waiting for input.
- */
-void
-prepare_for_client_read(void)
-{
- if (DoingCommandRead)
- {
- /* Enable immediate processing of asynchronous signals */
- EnableNotifyInterrupt();
- EnableCatchupInterrupt();
-
- /* Allow cancel/die interrupts to be processed while waiting */
- ImmediateInterruptOK = true;
-
- /* And don't forget to detect one that already arrived */
- CHECK_FOR_INTERRUPTS();
- }
-}
-
-/*
- * client_read_ended -- get out of the client-input state
+ * This is called just after low-level reads. That might be after the read
+ * finished successfully, or after it was interrupted.
*
- * This is called just after low-level reads. It must preserve errno!
+ * Must preserve errno!
*/
void
-client_read_ended(void)
+ProcessClientReadInterrupt(void)
{
+ int save_errno = errno;
+
if (DoingCommandRead)
{
- int save_errno = errno;
-
- ImmediateInterruptOK = false;
+ /* Check for general interrupts that arrived while reading */
+ CHECK_FOR_INTERRUPTS();
- DisableNotifyInterrupt();
- DisableCatchupInterrupt();
+ /* Process sinval catchup interrupts that happened while reading */
+ if (catchupInterruptPending)
+ ProcessCatchupInterrupt();
- errno = save_errno;
+ /* Process notify interrupts that happened while reading */
+ if (notifyInterruptPending)
+ ProcessNotifyInterrupt();
}
+
+ errno = save_errno;
}
@@ -2590,8 +2577,8 @@ die(SIGNAL_ARGS)
ProcDiePending = true;
/*
- * If it's safe to interrupt, and we're waiting for input or a lock,
- * service the interrupt immediately
+ * If it's safe to interrupt, and we're waiting for a lock, service
+ * the interrupt immediately
*/
if (ImmediateInterruptOK && InterruptHoldoffCount == 0 &&
CritSectionCount == 0)
@@ -2600,8 +2587,6 @@ die(SIGNAL_ARGS)
/* until we are done getting ready for it */
InterruptHoldoffCount++;
LockErrorCleanup(); /* prevent CheckDeadLock from running */
- DisableNotifyInterrupt();
- DisableCatchupInterrupt();
InterruptHoldoffCount--;
ProcessInterrupts();
}
@@ -2631,8 +2616,8 @@ StatementCancelHandler(SIGNAL_ARGS)
QueryCancelPending = true;
/*
- * If it's safe to interrupt, and we're waiting for input or a lock,
- * service the interrupt immediately
+ * If it's safe to interrupt, and we're waiting for a lock, service
+ * the interrupt immediately
*/
if (ImmediateInterruptOK && InterruptHoldoffCount == 0 &&
CritSectionCount == 0)
@@ -2641,8 +2626,6 @@ StatementCancelHandler(SIGNAL_ARGS)
/* until we are done getting ready for it */
InterruptHoldoffCount++;
LockErrorCleanup(); /* prevent CheckDeadLock from running */
- DisableNotifyInterrupt();
- DisableCatchupInterrupt();
InterruptHoldoffCount--;
ProcessInterrupts();
}
@@ -2788,8 +2771,8 @@ RecoveryConflictInterrupt(ProcSignalReason reason)
RecoveryConflictRetryable = false;
/*
- * If it's safe to interrupt, and we're waiting for input or a lock,
- * service the interrupt immediately
+ * If it's safe to interrupt, and we're waiting for a lock, service
+ * the interrupt immediately
*/
if (ImmediateInterruptOK && InterruptHoldoffCount == 0 &&
CritSectionCount == 0)
@@ -2798,8 +2781,6 @@ RecoveryConflictInterrupt(ProcSignalReason reason)
/* until we are done getting ready for it */
InterruptHoldoffCount++;
LockErrorCleanup(); /* prevent CheckDeadLock from running */
- DisableNotifyInterrupt();
- DisableCatchupInterrupt();
InterruptHoldoffCount--;
ProcessInterrupts();
}
@@ -2836,8 +2817,6 @@ ProcessInterrupts(void)
ProcDiePending = false;
QueryCancelPending = false; /* ProcDie trumps QueryCancel */
ImmediateInterruptOK = false; /* not idle anymore */
- DisableNotifyInterrupt();
- DisableCatchupInterrupt();
/* As in quickdie, don't risk sending to client during auth */
if (ClientAuthInProgress && whereToSendOutput == DestRemote)
whereToSendOutput = DestNone;
@@ -2872,8 +2851,6 @@ ProcessInterrupts(void)
{
QueryCancelPending = false; /* lost connection trumps QueryCancel */
ImmediateInterruptOK = false; /* not idle anymore */
- DisableNotifyInterrupt();
- DisableCatchupInterrupt();
/* don't send to client, we already know the connection to be dead. */
whereToSendOutput = DestNone;
ereport(FATAL,
@@ -2886,8 +2863,6 @@ ProcessInterrupts(void)
if (ClientAuthInProgress)
{
ImmediateInterruptOK = false; /* not idle anymore */
- DisableNotifyInterrupt();
- DisableCatchupInterrupt();
/* As in quickdie, don't risk sending to client during auth */
if (whereToSendOutput == DestRemote)
whereToSendOutput = DestNone;
@@ -2904,8 +2879,6 @@ ProcessInterrupts(void)
{
ImmediateInterruptOK = false; /* not idle anymore */
(void) get_timeout_indicator(STATEMENT_TIMEOUT, true);
- DisableNotifyInterrupt();
- DisableCatchupInterrupt();
ereport(ERROR,
(errcode(ERRCODE_LOCK_NOT_AVAILABLE),
errmsg("canceling statement due to lock timeout")));
@@ -2913,8 +2886,6 @@ ProcessInterrupts(void)
if (get_timeout_indicator(STATEMENT_TIMEOUT, true))
{
ImmediateInterruptOK = false; /* not idle anymore */
- DisableNotifyInterrupt();
- DisableCatchupInterrupt();
ereport(ERROR,
(errcode(ERRCODE_QUERY_CANCELED),
errmsg("canceling statement due to statement timeout")));
@@ -2922,8 +2893,6 @@ ProcessInterrupts(void)
if (IsAutoVacuumWorkerProcess())
{
ImmediateInterruptOK = false; /* not idle anymore */
- DisableNotifyInterrupt();
- DisableCatchupInterrupt();
ereport(ERROR,
(errcode(ERRCODE_QUERY_CANCELED),
errmsg("canceling autovacuum task")));
@@ -2932,8 +2901,6 @@ ProcessInterrupts(void)
{
ImmediateInterruptOK = false; /* not idle anymore */
RecoveryConflictPending = false;
- DisableNotifyInterrupt();
- DisableCatchupInterrupt();
pgstat_report_recovery_conflict(RecoveryConflictReason);
if (DoingCommandRead)
ereport(FATAL,
@@ -2957,13 +2924,12 @@ ProcessInterrupts(void)
if (!DoingCommandRead)
{
ImmediateInterruptOK = false; /* not idle anymore */
- DisableNotifyInterrupt();
- DisableCatchupInterrupt();
ereport(ERROR,
(errcode(ERRCODE_QUERY_CANCELED),
errmsg("canceling statement due to user request")));
}
}
+
/* If we get here, do nothing (probably, QueryCancelPending was reset) */
}
@@ -3810,13 +3776,9 @@ PostgresMain(int argc, char *argv[],
QueryCancelPending = false; /* second to avoid race condition */
/*
- * Turn off these interrupts too. This is only needed here and not in
- * other exception-catching places since these interrupts are only
- * enabled while we wait for client input.
+ * Not reading from the client anymore.
*/
DoingCommandRead = false;
- DisableNotifyInterrupt();
- DisableCatchupInterrupt();
/* Make sure libpq is in a good state */
pq_comm_reset();
diff --git a/src/include/commands/async.h b/src/include/commands/async.h
index 87c3abb..8491f47 100644
--- a/src/include/commands/async.h
+++ b/src/include/commands/async.h
@@ -13,6 +13,8 @@
#ifndef ASYNC_H
#define ASYNC_H
+#include <signal.h>
+
#include "fmgr.h"
/*
@@ -21,6 +23,7 @@
#define NUM_ASYNC_BUFFERS 8
extern bool Trace_notify;
+extern volatile sig_atomic_t notifyInterruptPending;
extern Size AsyncShmemSize(void);
extern void AsyncShmemInit(void);
@@ -48,12 +51,7 @@ extern void ProcessCompletedNotifies(void);
/* signal handler for inbound notifies (PROCSIG_NOTIFY_INTERRUPT) */
extern void HandleNotifyInterrupt(void);
-/*
- * enable/disable processing of inbound notifies directly from signal handler.
- * The enable routine first performs processing of any inbound notifies that
- * have occurred since the last disable.
- */
-extern void EnableNotifyInterrupt(void);
-extern bool DisableNotifyInterrupt(void);
+/* process interrupts */
+extern void ProcessNotifyInterrupt(void);
#endif /* ASYNC_H */
diff --git a/src/include/storage/sinval.h b/src/include/storage/sinval.h
index 1a6f2df..d9ffd72 100644
--- a/src/include/storage/sinval.h
+++ b/src/include/storage/sinval.h
@@ -14,8 +14,9 @@
#ifndef SINVAL_H
#define SINVAL_H
-#include "storage/relfilenode.h"
+#include <signal.h>
+#include "storage/relfilenode.h"
/*
* We support several types of shared-invalidation messages:
@@ -123,6 +124,7 @@ typedef union
/* Counter of messages processed; don't worry about overflow. */
extern uint64 SharedInvalidMessageCounter;
+extern volatile sig_atomic_t catchupInterruptPending;
extern void SendSharedInvalidMessages(const SharedInvalidationMessage *msgs,
int n);
@@ -138,8 +140,7 @@ extern void HandleCatchupInterrupt(void);
* The enable routine first performs processing of any catchup events that
* have occurred since the last disable.
*/
-extern void EnableCatchupInterrupt(void);
-extern bool DisableCatchupInterrupt(void);
+extern void ProcessCatchupInterrupt(void);
extern int xactGetCommittedInvalidationMessages(SharedInvalidationMessage **msgs,
bool *RelcacheInitFileInval);
diff --git a/src/include/tcop/tcopprot.h b/src/include/tcop/tcopprot.h
index 0a350fd..fe8c725 100644
--- a/src/include/tcop/tcopprot.h
+++ b/src/include/tcop/tcopprot.h
@@ -67,8 +67,8 @@ extern void StatementCancelHandler(SIGNAL_ARGS);
extern void FloatExceptionHandler(SIGNAL_ARGS) __attribute__((noreturn));
extern void RecoveryConflictInterrupt(ProcSignalReason reason); /* called from SIGUSR1
* handler */
-extern void prepare_for_client_read(void);
-extern void client_read_ended(void);
+extern void ProcessClientReadInterrupt(void);
+
extern void process_postgres_switches(int argc, char *argv[],
GucContext ctx, const char **dbname);
extern void PostgresMain(int argc, char *argv[],
--
2.2.1.212.gc5b9256
Attachment: 0006-WIP-Process-die-interrupts-while-reading-writing-fro.patch (text/x-patch; charset=us-ascii)
>From d0887c4db8b38ffb27061e7f7709d0f9e2402392 Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Sun, 28 Sep 2014 00:22:39 +0200
Subject: [PATCH 6/6] WIP: Process 'die' interrupts while reading/writing from
a socket.
Per discussion with Kyotaro HORIGUCHI and Heikki Linnakangas
---
src/backend/libpq/be-secure-openssl.c | 7 +++++--
src/backend/libpq/be-secure.c | 30 ++++++++++++++----------------
src/backend/tcop/postgres.c | 25 +++++++++++++++++++++++++
src/include/tcop/tcopprot.h | 1 +
4 files changed, 45 insertions(+), 18 deletions(-)
diff --git a/src/backend/libpq/be-secure-openssl.c b/src/backend/libpq/be-secure-openssl.c
index 3a70f43..1fc4c76 100644
--- a/src/backend/libpq/be-secure-openssl.c
+++ b/src/backend/libpq/be-secure-openssl.c
@@ -634,10 +634,13 @@ wloop:
/* Don't retry if the socket is in nonblocking mode. */
if (port->noblock)
break;
+
/*
- * XXX: We'll, at some later point, likely want to add interrupt
- * processing here.
+ * Check for interrupts here, in addition to secure_write(),
+ * because an interrupted write in secure_raw_write() will only
+ * return here, not secure_write().
*/
+ ProcessClientWriteInterrupt(true);
goto wloop;
case SSL_ERROR_SYSCALL:
/* leave it to caller to ereport the value of errno */
diff --git a/src/backend/libpq/be-secure.c b/src/backend/libpq/be-secure.c
index 71742a6..cae2955 100644
--- a/src/backend/libpq/be-secure.c
+++ b/src/backend/libpq/be-secure.c
@@ -224,16 +224,10 @@ retry:
n = secure_raw_write(port, ptr, len);
}
- /*
- * XXX: We'll, at some later point, likely want to add interrupt
- * processing here.
- */
+ /* Process interrupts that happened while (or before) writing. */
+ ProcessClientWriteInterrupt(!port->noblock && n < 0);
- /*
- * Retry after processing interrupts. This can be triggered even though we
- * don't check for latch set's during writing yet, because SSL
- * renegotiations might have required reading from the socket.
- */
+ /* retry after processing interrupts */
if (n < 0 && errno == EINTR)
{
goto retry;
@@ -262,16 +256,20 @@ wloop:
int w;
int save_errno = errno;
- /*
- * XXX: We'll, at some later point, likely want to add interrupt
- * processing here. If set we'd not retry directly, but return. That
- * way we don't do anything while (possibly) inside a ssl library.
- */
w = WaitLatchOrSocket(MyLatch,
- WL_SOCKET_WRITEABLE,
+ WL_LATCH_SET | WL_SOCKET_WRITEABLE,
port->sock, 0);
- if (w & WL_SOCKET_WRITEABLE)
+ if (w & WL_LATCH_SET)
+ {
+ ResetLatch(&MyProc->procLatch);
+ /*
+ * Force a return, so interrupts can be processed when not
+ * (possibly) underneath a ssl library.
+ */
+ errno = EINTR;
+ }
+ else if (w & WL_SOCKET_WRITEABLE)
{
goto wloop;
}
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index c4e3a61..0e5b317 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -519,10 +519,35 @@ ProcessClientReadInterrupt(void)
if (notifyInterruptPending)
ProcessNotifyInterrupt();
}
+ else if (ProcDiePending)
+ {
+ /*
+ * We're dying. It's safe (and sane) to handle that now.
+ */
+ CHECK_FOR_INTERRUPTS();
+ }
errno = save_errno;
}
+void
+ProcessClientWriteInterrupt(bool blocked)
+{
+ /*
+ * We only want to process the interrupt here if socket writes are
+ * blocking to increase the chance to get an error message to the
+ * client. If we're not blocked there'll soon be a
+ * CHECK_FOR_INTERRUPTS(). But if we're blocked we'll never get out of
+ * that situation if the client has died.
+ */
+ if (ProcDiePending && blocked)
+ {
+ /*
+ * We're dying. It's safe (and sane) to handle that now.
+ */
+ CHECK_FOR_INTERRUPTS();
+ }
+}
/*
* Do raw parsing (only).
diff --git a/src/include/tcop/tcopprot.h b/src/include/tcop/tcopprot.h
index fe8c725..fc04a3e 100644
--- a/src/include/tcop/tcopprot.h
+++ b/src/include/tcop/tcopprot.h
@@ -68,6 +68,7 @@ extern void FloatExceptionHandler(SIGNAL_ARGS) __attribute__((noreturn));
extern void RecoveryConflictInterrupt(ProcSignalReason reason); /* called from SIGUSR1
* handler */
extern void ProcessClientReadInterrupt(void);
+extern void ProcessClientWriteInterrupt(bool blocked);
extern void process_postgres_switches(int argc, char *argv[],
GucContext ctx, const char **dbname);
--
2.2.1.212.gc5b9256
On 2014-09-04 08:49:22 -0400, Robert Haas wrote:
On Tue, Sep 2, 2014 at 3:01 PM, Andres Freund <andres@2ndquadrant.com> wrote:
I'm slightly worried about the added overhead due to the latch code. In
my implementation I only use latches after a nonblocking read, but
still. Every WaitLatchOrSocket() does a drainSelfPipe(). I wonder if
that can be made problematic.
I think that's not the word you're looking for.
There's a "less" missing...
At some point I hacked up a very crude prototype that made LWLocks use
latches to sleep instead of semaphores. It was slow.
Interesting. I dimly remembered you mentioning this, that's how I
rediscovered this message.
Do you remember any details?
My guess is that's not so much the overhead of the latch itself, but the
lack of the directed wakeup stuff the OS provides for semaphores.
If we could replace all usages of semaphores that set immediate
interrupts to ok, we could quite easily make the deadlock detector
et al. run outside of signal handlers. That would imo make it more
robust, and easier to understand - right now the correctness of locking
done in the deadlock detector isn't obvious. With the infrastructure in
place it'd also allow your new parallelism code to run outside of signal
handlers.
Unfortunately currently semaphores can't be unlocked in a signal handler
(as sysv semaphores aren't signal safe)... It'd also be not so nice to
set both a latch and semaphores in every signal handler.
AIUI, the only reason why we need the self-pipe thing is because on
some platforms signals don't interrupt system calls. But my
impression was that those platforms were somewhat obscure.
To the contrary, I think it's only very obscure platforms where signals
still interrupt syscalls - we set SA_RESTART for pretty much
everything. There's a couple of system calls that ignore SA_RESTART. For
some that's defined in posix, for others it's operating system
specific. E.g. on linux semop(), poll(), select() are defined to always
return EINTR when interrupted.
Anyway, the discussion has since cleared up that we need the self byte to
handle a race, anyway.
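For readers who do not have the self-pipe technique in mind, here is a minimal standalone sketch of it. It is not the unix_latch.c implementation; the names selfpipe_init(), selfpipe_wake() and selfpipe_wait() are made up for this example. The point is that a wakeup posted just before the waiter enters poll() leaves a byte in the pipe, so the wakeup cannot be lost in that window.

```c
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <unistd.h>

static int	selfpipe_rd = -1;
static int	selfpipe_wr = -1;

/* Create the pipe and make both ends non-blocking.  Returns 0 or -1. */
int
selfpipe_init(void)
{
	int			fds[2];

	if (pipe(fds) != 0)
		return -1;
	if (fcntl(fds[0], F_SETFL, O_NONBLOCK) != 0 ||
		fcntl(fds[1], F_SETFL, O_NONBLOCK) != 0)
	{
		close(fds[0]);
		close(fds[1]);
		return -1;
	}
	selfpipe_rd = fds[0];
	selfpipe_wr = fds[1];
	return 0;
}

/*
 * Post a wakeup.  Only write() is used and errno is preserved, so this
 * may be called from a signal handler.
 */
void
selfpipe_wake(void)
{
	int			save_errno = errno;
	char		c = 0;

	if (write(selfpipe_wr, &c, 1) < 0)
	{
		/* EAGAIN: the pipe is full, so a wakeup is already pending. */
	}
	errno = save_errno;
}

/*
 * Wait for a wakeup.  Returns 1 if woken, 0 on timeout, -1 on error.
 * (A retry after EINTR restarts the full timeout; fine for a sketch.)
 */
int
selfpipe_wait(int timeout_ms)
{
	struct pollfd pfd;
	char		buf[16];
	int			rc;

	pfd.fd = selfpipe_rd;
	pfd.events = POLLIN;
	pfd.revents = 0;

	do
	{
		rc = poll(&pfd, 1, timeout_ms);
	} while (rc < 0 && errno == EINTR);

	if (rc <= 0)
		return rc;

	/* Drain every pending byte so the next wait really sleeps. */
	while (read(selfpipe_rd, buf, sizeof(buf)) > 0)
		;
	return 1;
}
```

After selfpipe_init(), a selfpipe_wake() issued before selfpipe_wait() makes the wait return immediately, and a second wait with no new wakeup runs into its timeout.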
Basically, it doesn't feel like a good thing that we've got two sets
of primitives for making a backend wait that (1) don't really know
about each other and (2) use different operating system primitives.
Presumably one of the two systems is better; let's figure out which
one it is, use that one all the time, and get rid of the other one.
I think the latch interface is clearly better for what we use
sema/latches for as it allows to wait for signals (latch sets), a socket
and timeouts. So let's try to figure out how to make it perform
comparably or better than semaphores.
There's imo only one semaphore user that can't trivially be replaced by
latches: the semaphore spinlock emulation. Both proc.c and lwlock.c
can be converted quite easily - in the latter case, it might actually
end up saving some code.
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Sat, Jan 10, 2015 at 03:25:42AM +0100, Andres Freund wrote:
0001-Allow-latches-to-wait-for-socket-writability-without.patch
Imo pretty close to commit and can be committed independently.
The key open question is whether all platforms of interest can reliably detect
end-of-file when poll()ing or select()ing for write only. Older GNU/Linux
select() cannot; see attached test program. We use poll() there anyway, so
the bug in that configuration does not affect PostgreSQL. Is it a bellwether
of similar bugs in other implementations, bugs that will affect PostgreSQL?
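The attached test program is not reproduced in this archive. As a rough stand-in, the question can be sketched as follows; this is an illustration under assumptions, not the original test. It uses an AF_UNIX socketpair instead of a TCP connection, and the function names are invented. The send buffer is filled first, so the socket is not writable for ordinary reasons, then the peer is closed and the socket is polled for writability only.

```c
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Switch fd to non-blocking mode and write until the kernel says "full". */
static int
fill_send_buffer(int fd)
{
	char		chunk[4096];
	int			flags = fcntl(fd, F_GETFL, 0);

	if (flags < 0 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0)
		return -1;
	memset(chunk, 'x', sizeof(chunk));

	for (;;)
	{
		ssize_t		n = send(fd, chunk, sizeof(chunk), 0);

		if (n >= 0 || errno == EINTR)
			continue;
		return (errno == EAGAIN || errno == EWOULDBLOCK) ? 0 : -1;
	}
}

/*
 * Hypothetical probe: returns the revents bits poll() reported for a
 * write-only wait on a socket whose peer has gone away, 0 if poll() timed
 * out (the hang-up went unnoticed), or -1 on error.
 */
int
probe_write_only_eof(int timeout_ms)
{
	int			sv[2];
	struct pollfd pfd;
	int			rc;

	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
		return -1;
	if (fill_send_buffer(sv[0]) != 0)
	{
		close(sv[0]);
		close(sv[1]);
		return -1;
	}

	close(sv[1]);				/* the peer disappears */

	pfd.fd = sv[0];
	pfd.events = POLLOUT;		/* writability only, no POLLIN */
	pfd.revents = 0;
	rc = poll(&pfd, 1, timeout_ms);
	close(sv[0]);

	if (rc < 0)
		return -1;
	if (rc == 0)
		return 0;
	return pfd.revents;
}
```

A return value of 0 from probe_write_only_eof() would be the symptom under discussion: the wait for writability did not notice that the other side is gone. Which of POLLOUT, POLLHUP or POLLERR is set in a non-zero result differs between platforms.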
This previously had explicitly been forbidden in e42a21b9e6c9, as
there was no use case at that point. We now are looking into making
FE/BE communication use latches, so it
Truncated sentence.
+ if (pfds[0].revents & (POLLHUP | POLLERR | POLLNVAL))
+ {
+ /* EOF/error condition */
+ if (wakeEvents & WL_SOCKET_READABLE)
+ result |= WL_SOCKET_READABLE;
+ if (wakeEvents & WL_SOCKET_WRITEABLE)
+ result |= WL_SOCKET_WRITEABLE;
+ }
With some poll() implementations (e.g. OS X), this can wrongly report
WL_SOCKET_WRITEABLE if the peer used shutdown(SHUT_WR). I tentatively think
that's acceptable. libpq does not use shutdown(), and other client interfaces
would do so at their own risk. Should we worry about hostile clients creating
a denial-of-service by causing a server send() to block unexpectedly?
Probably not; a user able to send arbitrary TCP traffic to the postmaster port
can already achieve that.
+ if (resEvents.lNetworkEvents & FD_CLOSE)
+ {
+ if (wakeEvents & WL_SOCKET_READABLE)
+ result |= WL_SOCKET_READABLE;
+ if (wakeEvents & WL_SOCKET_WRITEABLE)
+ result |= WL_SOCKET_WRITEABLE;
+ }
+ }
Extra blank line.
On Sat, Jan 10, 2015 at 11:35 AM, Andres Freund <andres@2ndquadrant.com> wrote:
Interesting. I dimly remembered you mentioning this, that's how I
rediscovered this message.
Do you remember any details?
No, not really.
My guess is that's not so much the overhead of the latch itself, but the
lack of the directed wakeup stuff the OS provides for semaphores.
That's possible.
If we could replace all usages of semaphores that set immediate
interrupts to ok, we could quite easily make the deadlock detector
et al. run outside of signal handlers. That would imo make it more
robust, and easier to understand - right now the correctness of locking
done in the deadlock detector isn't obvious. With the infrastructure in
place it'd also allow your new parallelism code to run outside of signal
handlers.
Yes, I would be very happy to see ImmediateInterruptOK die in a fire.
Unfortunately currently semaphores can't be unlocked in a signal handler
(as sysv semaphores aren't signal safe)... It'd also be not so nice to
set both a latch and semaphores in every signal handler.
Agreed.
AIUI, the only reason why we need the self-pipe thing is because on
some platforms signals don't interrupt system calls. But my
impression was that those platforms were somewhat obscure.
To the contrary, I think it's only very obscure platforms where signals
still interrupt syscalls - we set SA_RESTART for pretty much
everything. There's a couple of system calls that ignore SA_RESTART. For
some that's defined in posix, for others it's operating system
specific. E.g. on linux semop(), poll(), select() are defined to always
return EINTR when interrupted.
The recent problems with test_shm_mq test failing on anole were caused
by the fact that a signal doesn't abort select() on that platform, but
does reset the timer. So a steady stream of signals results in never
reaching the timeout.
Anyway, the discussion has since cleared up that we need the self byte to
handle a race, anyway.
Eh?
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 2015-01-11 16:36:07 -0500, Noah Misch wrote:
On Sat, Jan 10, 2015 at 03:25:42AM +0100, Andres Freund wrote:
0001-Allow-latches-to-wait-for-socket-writability-without.patch
Imo pretty close to commit and can be committed independently.
The key open question is whether all platforms of interest can reliably detect
end-of-file when poll()ing or select()ing for write only. Older GNU/Linux
select() cannot; see attached test program.
Yuck. By my reading that's a violation of posix.
I did test it a bit, and I didn't see problems, but that obviously
doesn't say much about old versions.
Afaics we interestingly don't have any poll-less buildfarm animals that
use unix_latch.c...
We use poll() there anyway, so the bug in that configuration does not
affect PostgreSQL. Is it a bellwether of similar bugs in other
implementations, bugs that will affect PostgreSQL?
Hm. I can think of two stopgap measures we could add:
1) If we're using select() and WL_SOCKET_WRITEABLE is set without
_READABLE, add a timeout of Min(1s, Max(passed_timeout, 1s)). As the
time spent waiting only for writable normally shouldn't be very long,
that shouldn't be noticeably bad for power usage.
2) Add a SIGPIPE handler that just does a SetLatch(MyLatch).
This previously had explicitly been forbidden in e42a21b9e6c9, as
there was no use case at that point. We now are looking into making
FE/BE communication use latches, so it
Truncated sentence.
Fixed in what I've since pushed (as Heikki basically was ok with the
patch sent a couple months back, modulo some fixes)...
+ if (pfds[0].revents & (POLLHUP | POLLERR | POLLNVAL))
+ {
+ /* EOF/error condition */
+ if (wakeEvents & WL_SOCKET_READABLE)
+ result |= WL_SOCKET_READABLE;
+ if (wakeEvents & WL_SOCKET_WRITEABLE)
+ result |= WL_SOCKET_WRITEABLE;
+ }
With some poll() implementations (e.g. OS X), this can wrongly report
WL_SOCKET_WRITEABLE if the peer used shutdown(SHUT_WR). I tentatively think
that's acceptable. libpq does not use shutdown(), and other client interfaces
would do so at their own risk. Should we worry about hostile clients creating
a denial-of-service by causing a server send() to block unexpectedly?
Probably not; a user able to send arbitrary TCP traffic to the postmaster port
can already achieve that.
Yea, this doesn't seem particularly concerning.
a) They can just stop consuming writes and use noticeable amounts of
memory by doing output intensive queries. That uses significant os
resources and is much harder to detect - today.
b) does accepting WL_SOCKET_WRITEABLE without _READABLE change anything
here? We already allow _WRITABLE... What happens if you write/send() in
that state, btw?
Greetings,
Andres Freund
On 2015-01-11 16:47:53 -0500, Robert Haas wrote:
My guess is that's not so much the overhead of the latch itself, but the
lack of the directed wakeup stuff the OS provides for semaphores.
That's possible.
On Sat, Jan 10, 2015 at 11:35 AM, Andres Freund <andres@2ndquadrant.com> wrote:
Interesting. I dimly remembered you mentioning this, that's how I
rediscovered this message.
Do you remember any details?
No, not really.
I've done a hackish conversion of proc.c to use latches and in a
sleeping heavy workload (pgbench -j 390 against a scale 1 db) there
originally was about a 5% regression (fluctuating a fair bit).
I've played with hacking unix_latch.c to
a) use eventfd() for the local latch (no performance difference
anymore), and
b) add an eventfd to struct Latch for latches that are created from
postmaster. That fd can directly be written to by other processes
(combined slightly faster than semaphores).
Not sure how much b) is acceptable, due to using MaxBackend fds. And
both only help linux.
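To make the eventfd() idea concrete, here is a small Linux-only sketch of an eventfd used as a wakeup channel. It is not the hacked unix_latch.c mentioned above, which is not shown in this thread; the wake_fd_* names are invented for the example. Any process holding the descriptor can post a wakeup with a single write(), and the waiter resets the counter with a single read().

```c
#include <poll.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Create a non-blocking eventfd with an initial counter of zero. */
int
wake_fd_create(void)
{
	return eventfd(0, EFD_NONBLOCK | EFD_CLOEXEC);
}

/* Post a wakeup by adding 1 to the counter.  Returns 0 or -1. */
int
wake_fd_set(int efd)
{
	uint64_t	one = 1;

	if (write(efd, &one, sizeof(one)) != (ssize_t) sizeof(one))
		return -1;
	return 0;
}

/*
 * Wait until the counter is non-zero.  Returns 1 if woken, 0 on timeout,
 * -1 on error.  The read() fetches the counter and resets it to zero, so
 * several wakeups posted before the wait collapse into one.
 */
int
wake_fd_wait(int efd, int timeout_ms)
{
	struct pollfd pfd;
	uint64_t	count;
	int			rc;

	pfd.fd = efd;
	pfd.events = POLLIN;
	pfd.revents = 0;

	rc = poll(&pfd, 1, timeout_ms);
	if (rc <= 0)
		return rc;

	if (read(efd, &count, sizeof(count)) != (ssize_t) sizeof(count))
		return -1;
	return 1;
}
```

Compared with a pipe, this needs one descriptor instead of two per wakeup channel, which is relevant to the concern about using MaxBackend fds.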
If we could replace all usages of semaphores that set immediate
interrupts to ok, we could quite easily make the deadlock detector
et al. run outside of signal handlers. That would imo make it more
robust, and easier to understand - right now the correctness of locking
done in the deadlock detector isn't obvious. With the infrastructure in
place it'd also allow your new parallelism code to run outside of signal
handlers.
Yes, I would be very happy to see ImmediateInterruptOK die in a fire.
Yea, I think it's an absolutely horrible idea.
Unfortunately currently semaphores can't be unlocked in a signal handler
(as sysv semaphores aren't signal safe)... It'd also be not so nice to
set both a latch and semaphores in every signal handler.
Agreed.
I've since (re-)realised that we've actually relied on semaphore being
signal safe for years. The PGSemaphoreLock() in proc.c allows
interrupts. And the deadlock detector then uses lwlocks, which in turn
use semaphores. That sucks.
Anyway, the discussion has since cleared up that we need the self byte to
handle a race, anyway.
Eh?
I'm referring to the point Heikki has made in his second paragraph in
5408638C.1080308@vmware.com .
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Mon, Jan 12, 2015 at 12:40:50AM +0100, Andres Freund wrote:
On 2015-01-11 16:36:07 -0500, Noah Misch wrote:
On Sat, Jan 10, 2015 at 03:25:42AM +0100, Andres Freund wrote:
0001-Allow-latches-to-wait-for-socket-writability-without.patch
Imo pretty close to commit and can be committed independently.
The key open question is whether all platforms of interest can reliably detect
end-of-file when poll()ing or select()ing for write only. Older GNU/Linux
select() cannot; see attached test program.
Yuck. By my reading that's a violation of posix.
Agreed.
I did test it a bit, and I didn't see problems, but that obviously
doesn't say much about old versions.
Afaics we interestingly don't have any poll-less buildfarm animals that
use unix_latch.c...
More likely is that some system will have a poll() exhibiting the same bug,
possibly via poll() being a wrapper around select(). Systems without poll()
are hard to find; the gnulib manual lists only Windows, BeOS and HP NonStop
OS. HP NonStop OS is the one possibly of interest here. I have never
personally seen a machine running it.
I recommend either (a) taking no action or (b) adding a regression test
verifying WaitLatchOrSocket() conformance in this scenario. Then we can
decide what more to do if failure evidence emerges.
We use poll() there anyway, so the bug in that configuration does not
affect PostgreSQL. Is it a bellwether of similar bugs in other
implementations, bugs that will affect PostgreSQL?
Hm. I can think of two stopgap measures we could add:
1) If we're using select() and WL_SOCKET_WRITEABLE is set without
_READABLE, add a timeout of Min(1s, Max(passed_timeout, 1s)). As the
time spent waiting only for writable normally shouldn't be very long,
that shouldn't be noticeably bad for power usage.
2) Add a SIGPIPE handler that just does a SetLatch(MyLatch).
I'm having trouble visualizing those proposed measures in detail, but I trust
that a decent workaround would emerge.
+ if (pfds[0].revents & (POLLHUP | POLLERR | POLLNVAL))
+ {
+ /* EOF/error condition */
+ if (wakeEvents & WL_SOCKET_READABLE)
+ result |= WL_SOCKET_READABLE;
+ if (wakeEvents & WL_SOCKET_WRITEABLE)
+ result |= WL_SOCKET_WRITEABLE;
+ }
With some poll() implementations (e.g. OS X), this can wrongly report
WL_SOCKET_WRITEABLE if the peer used shutdown(SHUT_WR). I tentatively think
that's acceptable. libpq does not use shutdown(), and other client interfaces
would do so at their own risk. Should we worry about hostile clients creating
a denial-of-service by causing a server send() to block unexpectedly?
Probably not; a user able to send arbitrary TCP traffic to the postmaster port
can already achieve that.
Yea, this doesn't seem particularly concerning.
a) They can just stop consuming writes and use noticeable amounts of
memory by doing output intensive queries. That uses significant os
resources and is much harder to detect - today.
If there's anything to worry about here (unlikely), it would be with respect
to not-yet-authenticated connections only.
b) does accepting WL_SOCKET_WRITEABLE without _READABLE change anything
here? We already allow _WRITABLE...
Today's code translates POLLHUP to WL_SOCKET_READABLE; it must see POLLOUT to
set WL_SOCKET_WRITEABLE. Your patch changes that.
What happens if you write/send() in
that state, btw?
write() reports EAGAIN.
On 2015-01-11 19:37:53 -0500, Noah Misch wrote:
I recommend either (a) taking no action or (b) adding a regression test
verifying WaitLatchOrSocket() conformance in this scenario.
Do you have a good idea how to test b) save a C function in regress.c
that does what your test does using latches?
Then we can decide what more to do if failure evidence emerges.
Seems fine to me.
Hm. I can think of two stopgap measures we could add:
1) If we're using select() and WL_SOCKET_WRITEABLE is set without
_READABLE, add a timeout of Min(1s, Max(passed_timeout, 1s)). As the
time spent waiting only for writable normally shouldn't be very long,
that shouldn't be noticeably bad for power usage.
2) Add a SIGPIPE handler that just does a SetLatch(MyLatch).
I'm having trouble visualizing those proposed measures in detail, but I trust
that a decent workaround would emerge.
For 1) I'm thinking of just regularly causing a spurious
WL_SOCKET_WRITEABLE event via timeouts if it's the only parameter. Latch
users have to deal with spurious wakeups anyway, so that should be
mostly unproblematic.
For 2) I was thinking that for now the problem only arises for the main
FE/BE socket. So we can install a sigpipe handler that does a SetLatch()
- that will trigger WaitLatch() to return and, after checking for
interrupts, retry the actual send() - which then'd return ECONNRESET.
What happens if you write/send() in
that state, btw?
write() reports EAGAIN.
Grand.
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Mon, Jan 12, 2015 at 01:45:41AM +0100, Andres Freund wrote:
On 2015-01-11 19:37:53 -0500, Noah Misch wrote:
I recommend either (a) taking no action or (b) adding a regression test
verifying WaitLatchOrSocket() conformance in this scenario.
Do you have a good idea how to test b) save a C function in regress.c
that does what your test does using latches?
No, that's what I had in mind. You could probably achieve it with a libpq
program that lets input accumulate, but that's trickier.
On 2015-01-12 00:40:50 +0100, Andres Freund wrote:
Fixed in what I've since pushed (as Heikki basically was ok with the
patch sent a couple months back, modulo some fixes)...
I'd not actually pushed that patch... I had pushed some patches
(barriers, atomics), but had decided to hold off on this. I've now done
so.
I've mentioned the portability concerns over select() bugs in the commit
message & a comment. ATM I'm not inclined to add a relatively elaborate
test for the bug on pretty fringe platforms.
Thanks for looking at this!
I plan to continue with committing
1) Commonalize process startup code
2) Add a default local latch for use in signal handlers
3) Use a nonblocking socket for FE/BE communication and block using latches
pretty soon.
As we already seem to assume that WaitLatch() is signal safe/reentrant
(c.f. walsender.c), I'm fine with committing 3) in isolation, without
4). I need a test that properly exercises catchup interrupts before
committing that.
Once I have that test I plan to commit
4) Introduce and use infrastructure for interrupt processing during
client reads.
I'd like some input from others what they think about the problem that
5) "Process 'die' interrupts while reading/writing from a socket."
can reduce the likelihood of clients getting the error message. I
personally think that's more than outweighed by not having backends
stuck (save quickdie) for a long time when the client is gone/stuck. I
think the middle ground in the patch to only process die events when
actually blocked in writes reduces the likelihood of this sufficiently.
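The "only when actually blocked" rule can be illustrated outside the backend with a small sketch. This is not the patch's secure_write()/ProcessClientWriteInterrupt() code: the function names, the plain flag standing in for ProcDiePending, and the 100 ms poll() timeout standing in for a latch wait are all assumptions made to keep the example self-contained. MSG_NOSIGNAL is assumed to be available, as it is on Linux.

```c
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <signal.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Put a socket into non-blocking mode.  Returns 0 or -1. */
int
set_nonblocking(int fd)
{
	int			flags = fcntl(fd, F_GETFL, 0);

	if (flags < 0)
		return -1;
	return (fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0) ? -1 : 0;
}

/*
 * Send len bytes on a non-blocking socket.  *die_pending plays the role of
 * a termination request flag set elsewhere (e.g. by a signal handler).
 *
 * While the kernel accepts data, the flag is ignored, so a pending message
 * still has a chance to reach the client.  The flag is honoured only once
 * send() reports that it would block.
 *
 * Returns len on success, or -1 on failure; errno is EINTR when the send
 * was abandoned because of the termination request.
 */
ssize_t
send_unless_dying(int sock, const char *buf, size_t len,
				  const volatile sig_atomic_t *die_pending)
{
	size_t		done = 0;

	while (done < len)
	{
		struct pollfd pfd;
		ssize_t		n = send(sock, buf + done, len - done, MSG_NOSIGNAL);

		if (n >= 0)
		{
			done += (size_t) n;
			continue;
		}
		if (errno == EINTR)
			continue;
		if (errno != EAGAIN && errno != EWOULDBLOCK)
			return -1;			/* hard error, errno set by send() */

		/* We are blocked: this is the only place the flag is checked. */
		if (*die_pending)
		{
			errno = EINTR;
			return -1;
		}

		/* Sleep until writable; the timeout bounds how stale the flag is. */
		pfd.fd = sock;
		pfd.events = POLLOUT;
		pfd.revents = 0;
		if (poll(&pfd, 1, 100) < 0 && errno != EINTR)
			return -1;
	}
	return (ssize_t) done;
}
```

With the flag set, a short message to a peer that is not reading still goes out in full because the send never blocks, while a message larger than the socket buffer is abandoned as soon as the kernel refuses more data.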
I have hacks on top of this to get rid of ImmediateInterruptOK altogether,
although I'm not sure how well this will work for some parts of
auth/crypt.c. Everything else, including the deadlock checker, seems
quite doable.
Andres
Hi Heikki,
On 2014-09-02 21:22:29 +0300, Heikki Linnakangas wrote:
On 08/28/2014 03:47 PM, Kyotaro HORIGUCHI wrote:
To make the code mentioned above (Patch 0002) tidy, rewrite the
socket emulation code for win32 backends so that each socket
can have its own non-blocking state. (patch 0001)
The first patch that makes non-blocking sockets behave more sanely on
Windows seems like a good idea, independently of the second patch. I'm
looking at the first patch now, I'll make a separate post about the second
patch.
On Windows, the backend has an emulation layer for POSIX signals, which uses
threads and Windows events. The reason win32/socket.c always uses
non-blocking mode internally is that it needs to wait for the socket to
become readable/writeable, and for the signal-emulation event, at the same
time. So no, we can't remove it.
The approach taken in the first patch seems sensible. I changed it to not
use FD_SET, though. A custom array seems better, that way we don't need the
pgwin32_nonblockset_init() call, we can just initialize the variable.
It's a little bit more code, but it's well-contained in win32/socket.c.
Please take a look, to double-check that I didn't screw up.
Heikki, what's your plan about this patch? Do you plan to commit it?
Greetings,
Andres Freund