pgsql: Add WL_EXIT_ON_PM_DEATH pseudo-event.
Add WL_EXIT_ON_PM_DEATH pseudo-event.
Users of the WaitEventSet and WaitLatch() APIs can now choose between
asking for WL_POSTMASTER_DEATH and then handling it explicitly, or asking
for WL_EXIT_ON_PM_DEATH to trigger immediate exit on postmaster death.
This reduces code duplication, since almost all callers want the latter.
Repair all code that was previously ignoring postmaster death completely,
or requesting the event but ignoring it, or requesting the event but then
doing an unconditional PostmasterIsAlive() call every time through its
event loop (which is an expensive syscall on platforms for which we don't
have USE_POSTMASTER_DEATH_SIGNAL support).
Assert that callers of WaitLatchXXX() under the postmaster remember to
ask for either WL_POSTMASTER_DEATH or WL_EXIT_ON_PM_DEATH, to prevent
future bugs.
The only process that doesn't handle postmaster death is syslogger. It
waits until all backends holding the write end of the syslog pipe
(including the postmaster) have closed it by exiting, to be sure to
capture any parting messages. By using the WaitEventSet API directly
it avoids the new assertion, and as a by-product it may be slightly
more efficient on platforms that have epoll().
Author: Thomas Munro
Reviewed-by: Kyotaro Horiguchi, Heikki Linnakangas, Tom Lane
Discussion: /messages/by-id/CAEepm=1TCviRykkUb69ppWLr_V697rzd1j3eZsRMmbXvETfqbQ@mail.gmail.com,
/messages/by-id/CAEepm=2LqHzizbe7muD7-2yHUbTOoF7Q+qkSD5Q41kuhttRTwA@mail.gmail.com
Branch
------
master
Details
-------
https://git.postgresql.org/pg/commitdiff/cfdf4dc4fc9635ac8bf6eaaa5dbbcd364ab29f0c
Modified Files
--------------
- | 0
contrib/pg_prewarm/autoprewarm.c | 8 ++--
contrib/postgres_fdw/connection.c | 6 ++-
src/backend/access/transam/parallel.c | 10 ++---
src/backend/access/transam/xlog.c | 23 +++++-----
src/backend/access/transam/xlogfuncs.c | 8 ++--
src/backend/executor/nodeGather.c | 3 +-
src/backend/libpq/be-secure-openssl.c | 4 +-
src/backend/libpq/pqmq.c | 4 +-
src/backend/postmaster/autovacuum.c | 16 ++-----
src/backend/postmaster/bgwriter.c | 11 +----
src/backend/postmaster/checkpointer.c | 16 ++-----
src/backend/postmaster/pgarch.c | 4 +-
src/backend/postmaster/syslogger.c | 38 ++++++++++-------
src/backend/postmaster/walwriter.c | 16 ++-----
src/backend/replication/basebackup.c | 2 +-
.../libpqwalreceiver/libpqwalreceiver.c | 13 +-----
src/backend/replication/logical/launcher.c | 24 ++---------
src/backend/replication/logical/tablesync.c | 30 ++++---------
src/backend/replication/logical/worker.c | 6 +--
src/backend/replication/syncrep.c | 18 ++++----
src/backend/replication/walreceiver.c | 22 ++--------
src/backend/replication/walsender.c | 49 +++++++---------------
src/backend/storage/ipc/latch.c | 38 ++++++++++++++++-
src/backend/storage/ipc/shm_mq.c | 9 ++--
src/backend/storage/lmgr/condition_variable.c | 14 ++-----
src/backend/storage/lmgr/proc.c | 7 ++--
src/backend/utils/adt/misc.c | 2 +-
src/include/storage/latch.h | 3 +-
src/test/modules/test_shm_mq/setup.c | 3 +-
src/test/modules/test_shm_mq/test.c | 3 +-
src/test/modules/worker_spi/worker_spi.c | 6 +--
32 files changed, 174 insertions(+), 242 deletions(-)
Thomas Munro <tmunro@postgresql.org> writes:
Add WL_EXIT_ON_PM_DEATH pseudo-event.
This patch added an empty file at the top of the git tree:
Modified Files
--------------
- | 0
I assume this was a mistake. Please be more careful in future.
regards, tom lane
On Sat, Nov 24, 2018 at 4:20 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Thomas Munro <tmunro@postgresql.org> writes:
- | 0
I assume this was a mistake. Please be more careful in future.
Sorry, somehow this cruft was hiding in plain sight in the last couple
of patch revisions I posted. Thanks for removing it.
--
Thomas Munro
http://www.enterprisedb.com
On Fri, Nov 23, 2018 at 07:54:42AM +0000, Thomas Munro wrote:
Add WL_EXIT_ON_PM_DEATH pseudo-event.
I was not paying much attention to this thread, but that's a nice
cleanup. Thanks, Thomas.
[... switching stuff to the new option ...]
--
Michael
Re: Thomas Munro 2018-11-23 <E1gQ6IU-0002sR-Fm@gemulon.postgresql.org>
Add WL_EXIT_ON_PM_DEATH pseudo-event.
I think this broke something:
TRAP: FailedAssertion(�!(!IsUnderPostmaster || (wakeEvents & (1 << 5)) || (wakeEvents & (1 << 4)))�, Datei: �/build/postgresql-12-JElZNq/postgresql-12-12~~devel~20181124.1158/build/../src/backend/storage/ipc/latch.c�, Zeile: 389)
2018-11-24 15:20:43.193 CET [17834] LOG: Serverprozess (PID 18425) wurde von Signal 6 beendet: Aborted
I can trigger it just by opening an ssl connection, non-ssl tcp
connections are fine.
Debian unstable/amd64.
Christoph
On Sun, Nov 25, 2018 at 3:38 AM Christoph Berg <myon@debian.org> wrote:
Re: Thomas Munro 2018-11-23 <E1gQ6IU-0002sR-Fm@gemulon.postgresql.org>
Add WL_EXIT_ON_PM_DEATH pseudo-event.
I think this broke something:
TRAP: FailedAssertion(»!(!IsUnderPostmaster || (wakeEvents & (1 << 5)) || (wakeEvents & (1 << 4)))«, Datei: »/build/postgresql-12-JElZNq/postgresql-12-12~~devel~20181124.1158/build/../src/backend/storage/ipc/latch.c«, Zeile: 389)
2018-11-24 15:20:43.193 CET [17834] LOG: Serverprozess (PID 18425) wurde von Signal 6 beendet: AbortedI can trigger it just by opening an ssl connection, non-ssl tcp
connections are fine.
Thanks. I was initially surprised that this didn't come up in
check-world, but I see now that I need to go and add
PG_TEST_EXTRA="ssl ldap" to my testing routine (and cfbot's).
Reproduced here, and it's a case where we were not handling postmaster
death, which exactly what this assertion was designed to find. The
following is one way to fix the assertion failure, though I'm not sure
if it would be better to request WL_POSTMASTER_DEATH and generate a
FATAL error like secure_read() does:
--- a/src/backend/libpq/be-secure-openssl.c
+++ b/src/backend/libpq/be-secure-openssl.c
@@ -406,9 +406,9 @@ aloop:
* StartupPacketTimeoutHandler() which
directly exits.
*/
if (err == SSL_ERROR_WANT_READ)
- waitfor = WL_SOCKET_READABLE;
+ waitfor = WL_SOCKET_READABLE |
WL_EXIT_ON_PM_DEATH;
else
- waitfor = WL_SOCKET_WRITEABLE;
+ waitfor = WL_SOCKET_WRITEABLE
| WL_EXIT_ON_PM_DEATH;
--
Thomas Munro
http://www.enterprisedb.com
On Sun, Nov 25, 2018 at 12:59 PM Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
On Sun, Nov 25, 2018 at 3:38 AM Christoph Berg <myon@debian.org> wrote:
TRAP: FailedAssertion(»!(!IsUnderPostmaster || (wakeEvents & (1 << 5)) || (wakeEvents & (1 << 4)))«, Datei: »/build/postgresql-12-JElZNq/postgresql-12-12~~devel~20181124.1158/build/../src/backend/storage/ipc/latch.c«, Zeile: 389)
2018-11-24 15:20:43.193 CET [17834] LOG: Serverprozess (PID 18425) wurde von Signal 6 beendet: Aborted
Fix pushed.
By way of penance, I have now configured PG_TEST_EXTRA="ssl ldap
kerberos" for my build farm animals elver and eelpout. elver should
pass at the next build, as I just tested it with --nosend, but eelpout
is so slow I'll just take my chances see if that works. I'll also
review the firewall config on those VMs, since apparently everyone is
too chicken to run those tests, perhaps for those sorts of reasons.
I've also set those tests up for cfbot, which would have caught this
when draft patches were posted, and also enabled -Werror on cfbot
which would have caught a GCC warning I missed because I usually
develop/test with clang.
--
Thomas Munro
http://www.enterprisedb.com
Thomas Munro <thomas.munro@enterprisedb.com> writes:
Fix pushed.
By way of penance, I have now configured PG_TEST_EXTRA="ssl ldap
kerberos" for my build farm animals elver and eelpout. elver should
pass at the next build, as I just tested it with --nosend, but eelpout
is so slow I'll just take my chances see if that works.
Nope :-(. Looks like something about key length ... probably just
misconfiguration?
I'll also
review the firewall config on those VMs, since apparently everyone is
too chicken to run those tests, perhaps for those sorts of reasons.
I think in many cases the answer is just "it's not in the default
buildfarm configuration". I couldn't think of a strong reason not
to run the ssl check on longfin, so I've just updated that to do so.
regards, tom lane
On Mon, Nov 26, 2018 at 6:56 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Thomas Munro <thomas.munro@enterprisedb.com> writes:
Fix pushed.
By way of penance, I have now configured PG_TEST_EXTRA="ssl ldap
kerberos" for my build farm animals elver and eelpout. elver should
pass at the next build, as I just tested it with --nosend, but eelpout
is so slow I'll just take my chances see if that works.Nope :-(. Looks like something about key length ... probably just
misconfiguration?
It seems that we have keys in our tree that are unacceptable to
OpenSSL 1.1.1 as shipped in Debian buster:
2018-11-25 20:32:22.519 UTC [26882] FATAL: could not load server
certificate file "server-cn-only.crt": ee key too small
That's what you get if you use the libssl-dev package (1.1.1a-1), but
you can still install libssl1.0-dev (which uninstalls 1.1's dev
package). I've done that and it the ssl test passes on that machine,
so fingers crossed for the next build farm run.
I see now that Michael already wrote about this recently[1]/messages/by-id/20180917131340.GE31460@paquier.xyz, but that
thread hasn't yet reached a conclusion.
[1]: /messages/by-id/20180917131340.GE31460@paquier.xyz
--
Thomas Munro
http://www.enterprisedb.com
On Mon, Nov 26, 2018 at 09:53:19AM +1300, Thomas Munro wrote:
I see now that Michael already wrote about this recently[1], but that
thread hasn't yet reached a conclusion.
Yes, I heard nothing but crickets on this one. So what I have been
doing is just to update my SSL configuration when running the tests.
That's annoying... Still not impossible to solve. If there are extra
opinions to move on with a key replacement, I could always do so.
--
Michael