PATCH: Keep one postmaster monitoring pipe per process
Hi,
the current implementation of PostmasterIsAlive() uses a pipe to
monitor the existence of the postmaster process.
One end of the pipe is held open in the postmaster, while the other end is
inherited to all the auxiliary and background processes when they fork.
This leads to multiple processes calling select(2), poll(2) and read(2)
on the same end of the pipe.
While this is technically perfectly ok, it has the unfortunate side
effect that it triggers an inefficient behaviour[0] in the select/poll
implementation on some operating systems[1]:
The kernel can only keep track of one pid per select address and
thus has no other choice than to wakeup(9) every process that
is waiting on select/poll.
In our case the system had to wake up ~3000 idle ssh processes
every time postgresql called PostmasterIsAlive.
WalReceiver triggered this at a rate of ~400 calls per second,
with the result that the system performed very badly,
being mostly busy scheduling idle processes.
Attached patch avoids the select contention by using a
separate pipe for each auxiliary and background process.
Since the postmaster has three different ways to create
new processes, the patch got a bit more complicated than
I anticipated :)
For auxiliary processes, pgstat, pgarch and the autovacuum launcher
get a preallocated pipe each. The pipes are held in:
postmaster_alive_fds_own[NUM_AUXPROCTYPES];
postmaster_alive_fds_watch[NUM_AUXPROCTYPES];
Just before we fork a new process we set postmaster_alive_fd
for each process type:
postmaster_alive_fd = postmaster_alive_fds_watch[type];
Since there can be multiple backend processes, BackendStartup()
allocates a pipe on demand and keeps the reference in the Backend
structure. The pipe is closed when the backend terminates.
The patch was developed and tested under OpenBSD using the REL9_4_STABLE
branch. I've merged it to current, compile tested and ran make check
on Ubuntu 14.04.
Marco
[0]: http://man.openbsd.org/OpenBSD-5.9/man2/select.2?manpath=OpenBSD-5.9
BUGS
[...]
"Internally to the kernel, select() and pselect() work poorly if multiple
processes wait on the same file descriptor. Given that, it is rather
surprising to see that many daemons are written that way."
[1]: At least OpenBSD and NetBSD are affected; FreeBSD rewrote their select implementation in 8.0.
Attachments:
multi-pipe-monitor.patch (text/x-diff; charset=us-ascii, +151/-50)
Hi,
On 2016-09-15 15:57:55 +0200, Marco Pfatschbacher wrote:
the current implementation of PostmasterIsAlive() uses a pipe to
monitor the existence of the postmaster process.
One end of the pipe is held open in the postmaster, while the other end is
inherited to all the auxiliary and background processes when they fork.
This leads to multiple processes calling select(2), poll(2) and read(2)
on the same end of the pipe.
While this is technically perfectly ok, it has the unfortunate side
effect that it triggers an inefficient behaviour[0] in the select/poll
implementation on some operating systems[1]:
The kernel can only keep track of one pid per select address and
thus has no other choice than to wakeup(9) every process that
is waiting on select/poll.
Yikes, that's a pretty absurd implementation.
Does openbsd's kqueue implementation have the same issue? There's
http://archives.postgresql.org/message-id/CAEepm%3D37oF84-iXDTQ9MrGjENwVGds%2B5zTr38ca73kWR7ez_tA%40mail.gmail.com
Attached patch avoids the select contention by using a
separate pipe for each auxiliary and background process.
I'm quite unenthusiastic about forcing that many additional file
descriptors onto the postmaster...
I'm not quite sure I understand why this is an issue here - there shouldn't
ever be events on this fd, so why is the kernel waking up all processes?
It'd kinda make sense it'd wake up all processes if there's one
waiting, but ... ?
BUGS
[...]
"Internally to the kernel, select() and pselect() work poorly if multiple
processes wait on the same file descriptor. Given that, it is rather
surprising to see that many daemons are written that way."
Gee. Maybe it's more surprising that that issue isn't being addressed?
Regards,
Andres
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Fri, Sep 16, 2016 at 1:57 AM, Marco Pfatschbacher
<Marco_Pfatschbacher@genua.de> wrote:
the current implementation of PostmasterIsAlive() uses a pipe to
monitor the existence of the postmaster process.
One end of the pipe is held open in the postmaster, while the other end is
inherited to all the auxiliary and background processes when they fork.
This leads to multiple processes calling select(2), poll(2) and read(2)
on the same end of the pipe.
While this is technically perfectly ok, it has the unfortunate side
effect that it triggers an inefficient behaviour[0] in the select/poll
implementation on some operating systems[1]:
The kernel can only keep track of one pid per select address and
thus has no other choice than to wakeup(9) every process that
is waiting on select/poll.[...]
BUGS
[...]
"Internally to the kernel, select() and pselect() work poorly if multiple
processes wait on the same file descriptor. Given that, it is rather
surprising to see that many daemons are written that way."[1]
At least OpenBSD and NetBSD are affected, FreeBSD rewrote
their select implementation in 8.0.
Very interesting. Perhaps that is why NetBSD shows a speedup with the
kqueue patch[1] but FreeBSD doesn't. I guess that if I could get the
kqueue patch to perform better on large FreeBSD systems, it would also
be a solution to this problem.
[1]: /messages/by-id/CAEepm=2i78TOJeV4O0-0meiihiRfVQ29ur7=MBHxsUKaPSWeAg@mail.gmail.com
--
Thomas Munro
http://www.enterprisedb.com
Marco Pfatschbacher <Marco_Pfatschbacher@genua.de> writes:
the current implementation of PostmasterIsAlive() uses a pipe to
monitor the existence of the postmaster process.
One end of the pipe is held open in the postmaster, while the other end is
inherited to all the auxiliary and background processes when they fork.
This leads to multiple processes calling select(2), poll(2) and read(2)
on the same end of the pipe.
While this is technically perfectly ok, it has the unfortunate side
effect that it triggers an inefficient behaviour[0] in the select/poll
implementation on some operating systems[1]:
The kernel can only keep track of one pid per select address and
thus has no other choice than to wakeup(9) every process that
is waiting on select/poll.
That seems like a rather bad kernel bug.
In our case the system had to wake up ~3000 idle ssh processes
every time postgresql called PostmasterIsAlive.
Uh, these are processes not even connected to the pipe in question?
That's *really* a kernel bug.
Attached patch avoids the select contention by using a
separate pipe for each auxiliary and background process.
I think this would likely move the performance problems somewhere else.
In particular, it would mean that every postmaster child would inherit
pipes leading to all the older children. We could close 'em again
I guess, but we have previously found that having to do things that
way is a rather serious performance drag --- see the problems we had
with POSIX named semaphores, here for instance:
/messages/by-id/3830CBEB-F8CE-4EBC-BE16-A415E78A4CBC@apple.com
I really don't want the postmaster to be holding any per-child kernel
resources.
It'd certainly be nice if we could find another solution besides
the pipe-based one, but I don't think "more pipes" is the answer.
There was some discussion of using Linux's prctl(PR_SET_PDEATHSIG)
when available; do the BSDen have anything like that?
regards, tom lane
Thomas Munro <thomas.munro@enterprisedb.com> writes:
Very interesting. Perhaps that is why NetBSD shows a speedup with the
kqueue patch[1] but FreeBSD doesn't. I guess that if I could get the
kqueue patch to perform better on large FreeBSD systems, it would also
be a solution to this problem.
I just noticed that kqueue appears to offer a solution to this problem,
ie one of the things you can wait for is exit of another process (named
by PID, looks like). If that's portable to all kqueue platforms, then
integrating a substitute for the postmaster death pipe might push that
patch over the hump to being a net win.
regards, tom lane
On Thu, Sep 15, 2016 at 04:24:07PM -0400, Tom Lane wrote:
Marco Pfatschbacher <Marco_Pfatschbacher@genua.de> writes:
the current implementation of PostmasterIsAlive() uses a pipe to
monitor the existence of the postmaster process.
One end of the pipe is held open in the postmaster, while the other end is
inherited to all the auxiliary and background processes when they fork.
This leads to multiple processes calling select(2), poll(2) and read(2)
on the same end of the pipe.
While this is technically perfectly ok, it has the unfortunate side
effect that it triggers an inefficient behaviour[0] in the select/poll
implementation on some operating systems[1]:
The kernel can only keep track of one pid per select address and
thus has no other choice than to wakeup(9) every process that
is waiting on select/poll.

That seems like a rather bad kernel bug.
It's a limitation that has been there since at least BSD4.4.
But yeah, I wished things were better.
In our case the system had to wake up ~3000 idle ssh processes
every time postgresql called PostmasterIsAlive.

Uh, these are processes not even connected to the pipe in question?
That's *really* a kernel bug.
The kernel does not know if they were selecting on that pipe,
because the only slot to keep that mapping has been already taken.
It's not that bad of a performance hit if the system doesn't
run many processes.
Attached patch avoids the select contention by using a
separate pipe for each auxiliary and background process.

I think this would likely move the performance problems somewhere else.
In particular, it would mean that every postmaster child would inherit
pipes leading to all the older children.
I kept them at a minimum. (See ClosePostmasterPorts)
We could close 'em again
I guess, but we have previously found that having to do things that
way is a rather serious performance drag --- see the problems we had
I think closing a few files doesn't hurt too much, but I see your point.
with POSIX named semaphores, here for instance:
/messages/by-id/3830CBEB-F8CE-4EBC-BE16-A415E78A4CBC@apple.com
I really don't want the postmaster to be holding any per-child kernel
resources.

It'd certainly be nice if we could find another solution besides
the pipe-based one, but I don't think "more pipes" is the answer.
There was some discussion of using Linux's prctl(PR_SET_PDEATHSIG)
when available; do the BSDen have anything like that?
Not that I know of.
But since the WalReceiver process seemed to be the one calling
PostmasterIsAlive way more often than the rest, maybe we could limit
the performance hit by not calling it on every received wal chunk?
Cheers,
Marco
On Thu, Sep 15, 2016 at 04:40:00PM -0400, Tom Lane wrote:
Thomas Munro <thomas.munro@enterprisedb.com> writes:
Very interesting. Perhaps that is why NetBSD shows a speedup with the
kqueue patch[1] but FreeBSD doesn't. I guess that if I could get the
kqueue patch to perform better on large FreeBSD systems, it would also
be a solution to this problem.

I just noticed that kqueue appears to offer a solution to this problem,
ie one of the things you can wait for is exit of another process (named
by PID, looks like). If that's portable to all kqueue platforms, then
integrating a substitute for the postmaster death pipe might push that
patch over the hump to being a net win.
That sounds plausible.
I could give this a try after I get back from my vacation :)
Marco
On Thu, Sep 15, 2016 at 12:26:16PM -0700, Andres Freund wrote:
Hi,
On 2016-09-15 15:57:55 +0200, Marco Pfatschbacher wrote:
the current implementation of PostmasterIsAlive() uses a pipe to
monitor the existence of the postmaster process.
One end of the pipe is held open in the postmaster, while the other end is
inherited to all the auxiliary and background processes when they fork.
This leads to multiple processes calling select(2), poll(2) and read(2)
on the same end of the pipe.
While this is technically perfectly ok, it has the unfortunate side
effect that it triggers an inefficient behaviour[0] in the select/poll
implementation on some operating systems[1]:
The kernel can only keep track of one pid per select address and
thus has no other choice than to wakeup(9) every process that
is waiting on select/poll.

Yikes, that's a pretty absurd implementation.
Not when you take into account that it's been written over 20 years ago ;)
Does openbsd's kqueue implementation have the same issue? There's
http://archives.postgresql.org/message-id/CAEepm%3D37oF84-iXDTQ9MrGjENwVGds%2B5zTr38ca73kWR7ez_tA%40mail.gmail.com
Looks ok, but your mileage may vary. I've seen strange subtle bugs
using kqueue...
struct kevent {
__uintptr_t ident; /* identifier for this event */
short filter; /* filter for event */
u_short flags;
u_int fflags;
quad_t data;
void *udata; /* opaque user data identifier */
};
Attached patch avoids the select contention by using a
separate pipe for each auxiliary and background process.

I'm quite unenthusiastic about forcing that many additional file
descriptors onto the postmaster...
In our use case it's about 30.
I'm not quite sure I understand why this is an issue here - there shouldn't
ever be events on this fd, so why is the kernel waking up all processes?
It'd kinda make sense it'd wake up all processes if there's one
waiting, but ... ?
Every read is an event, and that's what PostmasterIsAlive does.
Marco
Hi,
On 2016-09-16 09:55:48 +0200, Marco Pfatschbacher wrote:
On Thu, Sep 15, 2016 at 12:26:16PM -0700, Andres Freund wrote:
Yikes, that's a pretty absurd implementation.
Not when you take into account that it's been written over 20 years ago ;)
Well, that doesn't mean it can't be fixed ;)
I'm not quite sure I understand why this is an issue here - there shouldn't
ever be events on this fd, so why is the kernel waking up all processes?
It'd kinda make sense it'd wake up all processes if there's one
waiting, but ... ?

Every read is an event, and that's what PostmasterIsAlive does.
But in most places we only do a PostmasterIsAlive if WaitLatch returns
WL_POSTMASTER_DEATH. The only walreceiver related place that doesn't is
WalRcvWaitForStartPosition(). If that's indeed the cause of your issues
this quite possibly could be fixed by doing the
if (!PostmasterIsAlive())
exit(1);
check not unconditionally, but only after the WaitLatch at the end of
the loop, and only if WL_POSTMASTER_DEATH is returned by WaitLatch()?
That'll be a minor behaviour change for the WALRCV_RESTARTING, but that
seems fine, we'll just loop once more outside (after a quick glance at
least).
Andres
On Fri, Sep 16, 2016 at 2:23 PM, Andres Freund <andres@anarazel.de> wrote:
Every read is an event, and that's what PostmasterIsAlive does.
But in most places we only do a PostmasterIsAlive if WaitLatch returns
WL_POSTMASTER_DEATH. The only walreceiver related place that doesn't is
WalRcvWaitForStartPosition(). If that's indeed the cause of your issues
this quite possibly could be fixed by doing the
if (!PostmasterIsAlive())
exit(1);
check not unconditionally, but only after the WaitLatch at the end of
the loop, and only if WL_POSTMASTER_DEATH is returned by WaitLatch()?
That'll be a minor behaviour change for the WALRCV_RESTARTING, but that
seems fine, we'll just loop once more outside (after a quick glance at
least).
At least some of the latch implementations already check
PostmasterIsAlive() internally to avoid returning spurious events; and
secure_read() at least assumes that the WL_POSTMASTER_DEATH return is
reliable and doesn't need a double-check.
So we can probably just remove the check altogether and instead bail
out if it returns WL_POSTMASTER_DEATH. That probably saves a system
call per loop iteration even on platforms where the kernel doesn't
exhibit any surprising behavior.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Sat, Sep 17, 2016 at 6:23 AM, Andres Freund <andres@anarazel.de> wrote:
On 2016-09-16 09:55:48 +0200, Marco Pfatschbacher wrote:
On Thu, Sep 15, 2016 at 12:26:16PM -0700, Andres Freund wrote:
Yikes, that's a pretty absurd implementation.
Not when you take into account that it's been written over 20 years ago ;)
Well, that doesn't mean it can't be fixed ;)
I'm not quite sure I understand why this is an issue here - there shouldn't
ever be events on this fd, so why is the kernel waking up all processes?
It'd kinda make sense it'd wake up all processes if there's one
waiting, but ... ?

Every read is an event, and that's what PostmasterIsAlive does.
But in most places we only do a PostmasterIsAlive if WaitLatch returns
WL_POSTMASTER_DEATH. The only walreceiver related place that doesn't is
WalRcvWaitForStartPosition(). If that's indeed the cause of your issues
this quite possibly could be fixed by doing the
if (!PostmasterIsAlive())
exit(1);
check not unconditionally, but only after the WaitLatch at the end of
the loop, and only if WL_POSTMASTER_DEATH is returned by WaitLatch()?
That'll be a minor behaviour change for the WALRCV_RESTARTING, but that
seems fine, we'll just loop once more outside (after a quick glance at
least).
Yeah, I wondered why that was different than the pattern established
elsewhere when I was hacking on replication code. There are actually
several places where we call PostmasterIsAlive() unconditionally in a
loop that waits for WL_POSTMASTER_DEATH but ignores the return code:
in pgarch.c, syncrep.c, walsender.c and walreceiver.c. Should we just
change them all to check the return code and exit/break/ereport/etc as
appropriate? That would match the code from autovacuum.c,
checkpointer.c, pgstat.c, be-secure.c and bgworker.c. Something like
the attached.
The code in basebackup.c also waits for WL_POSTMASTER_DEATH but
doesn't check for it in the return value *or* call
PostmasterIsAlive(). I'm not sure what to make of that. I didn't
test it but it looks like maybe it would continue running after
postmaster death but not honour the throttling rate limit because
WaitLatch would keep returning immediately.
--
Thomas Munro
http://www.enterprisedb.com
Attachments:
check-return-code-for-postmaster-death.patch (application/octet-stream, +36/-43)
On 2016-09-20 11:07:03 +1200, Thomas Munro wrote:
Yeah, I wondered why that was different than the pattern established
elsewhere when I was hacking on replication code. There are actually
several places where we call PostmasterIsAlive() unconditionally in a
loop that waits for WL_POSTMASTER_DEATH but ignores the return code:
Note that just changing this implies a behavioural change in at least
some of those: If the loop is busy with work, the WaitLatch might never
be reached. I think that might be ok in most of those, but it does
require examination.
- Andres
On Thu, Sep 15, 2016 at 04:40:00PM -0400, Tom Lane wrote:
Thomas Munro <thomas.munro@enterprisedb.com> writes:
Very interesting. Perhaps that is why NetBSD shows a speedup with the
kqueue patch[1] but FreeBSD doesn't. I guess that if I could get the
kqueue patch to perform better on large FreeBSD systems, it would also
be a solution to this problem.

I just noticed that kqueue appears to offer a solution to this problem,
ie one of the things you can wait for is exit of another process (named
by PID, looks like). If that's portable to all kqueue platforms, then
integrating a substitute for the postmaster death pipe might push that
patch over the hump to being a net win.

regards, tom lane
Hi,
sorry for the long silence.
I forgot about this after being on vacation.
kqueue does indeed support EVFILT_PROC like you said.
But using this effectively would need a separate monitoring process,
because we cannot use poll(2) and kevent(2) together within the same loop.
But how about this:
We just use getppid to see whether the process id of our parent
has changed to 1 (init).
Just like calling read on the pipe, that's just one system call.
I did not bother to weed out the postmaster_alive_fds yet, but
this change works fine for me.
Regards,
Marco
diff --git a/src/backend/storage/ipc/pmsignal.c b/src/backend/storage/ipc/pmsignal.c
index 85db6b21f8..a0596c61fd 100644
--- a/src/backend/storage/ipc/pmsignal.c
+++ b/src/backend/storage/ipc/pmsignal.c
@@ -272,21 +272,16 @@ bool
PostmasterIsAlive(void)
{
#ifndef WIN32
- char c;
- ssize_t rc;
+ pid_t ppid;
- rc = read(postmaster_alive_fds[POSTMASTER_FD_WATCH], &c, 1);
- if (rc < 0)
+ ppid = getppid();
+ if (ppid == 1)
{
- if (errno == EAGAIN || errno == EWOULDBLOCK)
- return true;
- else
- elog(FATAL, "read on postmaster death monitoring pipe failed: %m");
+ elog(FATAL, "postmaster died: our ppid changed to init");
+ return false;
}
- else if (rc > 0)
- elog(FATAL, "unexpected data in postmaster death monitoring pipe");
- return false;
+ return true;
#else /* WIN32 */
return (WaitForSingleObject(PostmasterHandle, 0) == WAIT_TIMEOUT);
#endif /* WIN32 */
On Tue, Dec 5, 2017 at 5:55 AM, Marco Pfatschbacher
<Marco_Pfatschbacher@genua.de> wrote:
On Thu, Sep 15, 2016 at 04:40:00PM -0400, Tom Lane wrote:
I just noticed that kqueue appears to offer a solution to this problem,
ie one of the things you can wait for is exit of another process (named
by PID, looks like). If that's portable to all kqueue platforms, then
integrating a substitute for the postmaster death pipe might push that
patch over the hump to being a net win.

kqueue does indeed support EVFILT_PROC like you said.
But using this effectively, would need a separate monitoring process,
because we cannot use poll(2) and kevent(2) together within the same loop.
FWIW the kqueue patch I posted uses EVFILT_PROC as Tom suggested and
it works fine according to my testing (FreeBSD +macOS). With that
patch, there is no poll, just kqueue everywhere.
But how about this:
We just use getppid to see whether the process id of our parent
has changed to 1 (init).
Just like calling read on the pipe, that's just one systemcall.
I'm still not sure why we have to make any system calls at all. Do we
not trust WaitLatch() to tell us about WL_POSTMASTER_DEATH reliably,
or is it that we don't trust it to tell us about that event when there
are other events happening due to priorities, or...?
--
Thomas Munro
http://www.enterprisedb.com
On Tue, Sep 20, 2016 at 11:26 AM, Andres Freund <andres@anarazel.de> wrote:
On 2016-09-20 11:07:03 +1200, Thomas Munro wrote:
Yeah, I wondered why that was different than the pattern established
elsewhere when I was hacking on replication code. There are actually
several places where we call PostmasterIsAlive() unconditionally in a
loop that waits for WL_POSTMASTER_DEATH but ignores the return code:

Note that just changing this implies a behavioural change in at least
some of those: If the loop is busy with work, the WaitLatch might never
be reached. I think that might be ok in most of those, but it does
require examination.
Rebased, but I don't actually like this patch any more. Over in
another thread[1] I proposed that we should just make exit(1) the
default behaviour built into latch.c for those cases that don't want
to do something special (eg SyncRepWaitForLSN()), so we don't finish
up duplicating the ugly exit(1) code everywhere (or worse, forgetting
to). Tom Lane seemed to think that was an idea worth pursuing.
I think what we need for PG12 is a patch that does that, and also
introduces a reused WaitEventSet object in several of these places.
Then eg SyncRepWaitForLSN() won't be interacting with contended kernel
objects every time (assuming an implementation like epoll or
eventually kqueue, completion ports etc is available).
Then if pgarch_ArchiverCopyLoop() and HandleStartupProcInterrupts()
(ie loops without waiting) adopt a prctl(PR_SET_PDEATHSIG)-based
approach where available as suggested by Andres[2] or fall back to
polling a reusable WaitEventSet (timeout = 0), then there'd be no more
calls to PostmasterIsAlive() outside latch.c.
[1]: /messages/by-id/CAEepm=2LqHzizbe7muD7-2yHUbTOoF7Q+qkSD5Q41kuhttRTwA@mail.gmail.com
[2]: /messages/by-id/7261eb39-0369-f2f4-1bb5-62f3b6083b5e@iki.fi
--
Thomas Munro
http://www.enterprisedb.com
Attachments:
0001-Replace-PostmasterIsAlive-calls-with-WL_POSTMASTER_D.patch (application/octet-stream, +39/-47)
Hi,
On 2018-04-11 11:57:20 +1200, Thomas Munro wrote:
Rebased, but I don't actually like this patch any more. Over in
another thread[1] I proposed that we should just make exit(1) the
default behaviour built into latch.c for those cases that don't want
to do something special (eg SyncRepWaitForLSN()), so we don't finish
up duplicating the ugly exit(1) code everywhere (or worse, forgetting
to). Tom Lane seemed to think that was an idea worth pursuing.

I think what we need for PG12 is a patch that does that, and also
introduces a reused WaitEventSet object in several of these places.
Then eg SyncRepWaitForLSN() won't be interacting with contended kernel
objects every time (assuming an implementation like epoll or
eventually kqueue, completion ports etc is available).

Then if pgarch_ArchiverCopyLoop() and HandleStartupProcInterrupts()
(ie loops without waiting) adopt a prctl(PR_SET_PDEATHSIG)-based
approach where available as suggested by Andres[2] or fall back to
polling a reusable WaitEventSet (timeout = 0), then there'd be no more
calls to PostmasterIsAlive() outside latch.c.
I'm still unconvinced by this. There's good reasons why code might be
busy-looping without checking the latch, and we shouldn't force code to
add more latch checks if unnecessary. Resetting the latch frequently can
actually increase the number of context switches considerably.
- Andres
On Wed, Apr 11, 2018 at 12:03 PM, Andres Freund <andres@anarazel.de> wrote:
On 2018-04-11 11:57:20 +1200, Thomas Munro wrote:
Then if pgarch_ArchiverCopyLoop() and HandleStartupProcInterrupts()
(ie loops without waiting) adopt a prctl(PR_SET_PDEATHSIG)-based
approach where available as suggested by Andres[2] or fall back to
polling a reusable WaitEventSet (timeout = 0), then there'd be no more
calls to PostmasterIsAlive() outside latch.c.

I'm still unconvinced by this. There's good reasons why code might be
busy-looping without checking the latch, and we shouldn't force code to
add more latch checks if unnecessary. Resetting the latch frequently can
actually increase the number of context switches considerably.
What latch? There wouldn't be a latch in a WaitEventSet that is used
only to detect postmaster death.
I arrived at this idea via the realisation that the closest thing to
prctl(PR_SET_PDEATHSIG) on BSD-family systems today is
please-tell-my-kqueue-if-this-process-dies. It so happens that my
kqueue patch already uses that instead of the pipe to detect
postmaster death. The only problem is that you have to ask it, by
calling it kevent(). In a busy loop like those two, where there is no
other kind of waiting going on, you could do that by calling kevent()
with timeout = 0 to check the queue.
You could probably figure out a way to hide the
prctl(PR_SET_PDEATHSIG)-based approach inside the WaitEventSet code,
with a fast path that doesn't make any system calls if the only event
registered is postmaster death (you can just check the global variable
set by your signal handler). But I guess you wouldn't like the extra
function call so I guess you'd prefer to check the global variable
directly in the busy loop, in builds that have
prctl(PR_SET_PDEATHSIG).
--
Thomas Munro
http://www.enterprisedb.com
Hi,
On 2018-04-11 12:17:14 +1200, Thomas Munro wrote:
I arrived at this idea via the realisation that the closest thing to
prctl(PR_SET_PDEATHSIG) on BSD-family systems today is
please-tell-my-kqueue-if-this-process-dies. It so happens that my
kqueue patch already uses that instead of the pipe to detect
postmaster death. The only problem is that you have to ask it, by
calling kevent(). In a busy loop like those two, where there is no
other kind of waiting going on, you could do that by calling kevent()
with timeout = 0 to check the queue.
Which is not cheap.
You could probably figure out a way to hide the
prctl(PR_SET_PDEATHSIG)-based approach inside the WaitEventSet code,
with a fast path that doesn't make any system calls if the only event
registered is postmaster death (you can just check the global variable
set by your signal handler). But I guess you wouldn't like the extra
function call so I guess you'd prefer to check the global variable
directly in the busy loop, in builds that have
prctl(PR_SET_PDEATHSIG).
Yea, I'd really want this to be a inline function of the style
static inline bool
PostmasterIsAlive(void)
{
if (!postmaster_possibly_dead)
return true;
return PostmasterIsAliveInternal();
}
I do not like the WES alternative because a special-cased "just
postmaster death" WES seems to have absolutely no advantage over a
faster PostmasterIsAlive(). And using WES seems to make a cheap check
like the above significantly more complicated.
Greetings,
Andres Freund
On Wed, Apr 11, 2018 at 12:26 PM, Andres Freund <andres@anarazel.de> wrote:
On 2018-04-11 12:17:14 +1200, Thomas Munro wrote:
I arrived at this idea via the realisation that the closest thing to
prctl(PR_SET_PDEATHSIG) on BSD-family systems today is
please-tell-my-kqueue-if-this-process-dies. It so happens that my
kqueue patch already uses that instead of the pipe to detect
postmaster death. The only problem is that you have to ask it, by
calling kevent(). In a busy loop like those two, where there is no
other kind of waiting going on, you could do that by calling kevent()
with timeout = 0 to check the queue.

Which is not cheap.
Certainly, but I am reacting to reports via you of performance
problems coming from read(heavily_contended_fd) by saying: although we
can't avoid a syscall like Linux can in this case, we *can* avoid the
reportedly contended pipe mutex completely by reading only our own
process's kqueue. Combined with Heikki's proposal not to check every
single time through busy loops to reduce the syscall frequency, that
might be the optimal solution until some hypothetical
prctl(PR_SET_PDEATHSIG)-feature-match patch arrives. Just a thought.
--
Thomas Munro
http://www.enterprisedb.com
On Wed, Apr 11, 2018 at 12:47 PM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
On Wed, Apr 11, 2018 at 12:26 PM, Andres Freund <andres@anarazel.de> wrote:
On 2018-04-11 12:17:14 +1200, Thomas Munro wrote:
I arrived at this idea via the realisation that the closest thing to
prctl(PR_SET_PDEATHSIG) on BSD-family systems today is
please-tell-my-kqueue-if-this-process-dies. It so happens that my
kqueue patch already uses that instead of the pipe to detect
postmaster death. The only problem is that you have to ask it, by
calling kevent(). In a busy loop like those two, where there is no
other kind of waiting going on, you could do that by calling kevent()
with timeout = 0 to check the queue.

Which is not cheap.
After bouncing that idea around with a fellow pgsql-hacker off-list, I
take that idea back. It's a lot simpler and as cheap if not cheaper
to check getppid() == 1 or similar every Nth time through the busy
loop.
I heard another interesting idea -- you can use F_SETOWN + O_ASYNC to
ask for SIGIO to be delivered to you when the pipe is closed.
Unfortunately that'd require a different pipe for each backend so
wouldn't work for us.
--
Thomas Munro
http://www.enterprisedb.com