Peripatus/failures

Started by Larry Rosenman · about 7 years ago · 23 messages
#1 Larry Rosenman
ler@lerctr.org

It looks like my upgrade to the current head of FreeBSD 12-to-be, which includes OpenSSL 1.1.1, broke a bunch of our stuff.

In at least the 9.x branches.  Just a heads up.

--

Larry Rosenman http://www.lerctr.org/~ler

Phone: +1 214-642-9640 E-Mail: ler@lerctr.org

US Mail: 5708 Sabbia Drive, Round Rock, TX 78665-2106

#2 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Larry Rosenman (#1)
Re: Peripatus/failures

Larry Rosenman <ler@lerctr.org> writes:

It looks like my upgrade to the current head of FreeBSD 12-to-be, which includes OpenSSL 1.1.1, broke a bunch of our stuff.
In at least the 9.x branches.  Just a heads up.

It looks like configure is drawing the wrong conclusions about OpenSSL's
API options. Since the configure outputs are cached, perhaps clearing
your buildfarm critter's accache subdirectory would be enough to fix that.

regards, tom lane

#3 Larry Rosenman
ler@lerctr.org
In reply to: Tom Lane (#2)
Re: Peripatus/failures

On Wed, Oct 17, 2018 at 01:41:59PM -0400, Tom Lane wrote:

Larry Rosenman <ler@lerctr.org> writes:

It looks like my upgrade to the current head of FreeBSD 12-to-be, which includes OpenSSL 1.1.1, broke a bunch of our stuff.
In at least the 9.x branches.  Just a heads up.

It looks like configure is drawing the wrong conclusions about OpenSSL's
API options. Since the configure outputs are cached, perhaps clearing
your buildfarm critter's accache subdirectory would be enough to fix that.

That got it further, but still fails at PLCheck-C (at least on 9.3).
It's still running the other branches.

--
Larry Rosenman http://www.lerctr.org/~ler
Phone: +1 214-642-9640 E-Mail: ler@lerctr.org
US Mail: 5708 Sabbia Drive, Round Rock, TX 78665-2106

#4 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Larry Rosenman (#3)
DSM robustness failure (was Re: Peripatus/failures)

Larry Rosenman <ler@lerctr.org> writes:

That got it further, but still fails at PLCheck-C (at least on 9.3).
It's still running the other branches.

Hmm. I'm not sure why plpython is crashing for you, but this is exposing
a robustness problem in the DSM logic:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=peripatus&dt=2018-10-17%2018%3A22%3A50

The postmaster is suffering an Assert failure while trying to clean up
after the crash:

2018-10-17 13:43:23.203 CDT [51974:8] pg_regress LOG: statement: SELECT import_succeed();
2018-10-17 13:43:24.228 CDT [46467:2] LOG: server process (PID 51974) was terminated by signal 11: Segmentation fault
2018-10-17 13:43:24.228 CDT [46467:3] DETAIL: Failed process was running: SELECT import_succeed();
2018-10-17 13:43:24.229 CDT [46467:4] LOG: terminating any other active server processes
2018-10-17 13:43:24.229 CDT [46778:2] WARNING: terminating connection because of crash of another server process
2018-10-17 13:43:24.229 CDT [46778:3] DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2018-10-17 13:43:24.229 CDT [46778:4] HINT: In a moment you should be able to reconnect to the database and repeat your command.
2018-10-17 13:43:24.235 CDT [46467:5] LOG: all server processes terminated; reinitializing
2018-10-17 13:43:24.235 CDT [46467:6] LOG: dynamic shared memory control segment is corrupt
TRAP: FailedAssertion("!(dsm_control_mapped_size == 0)", File: "dsm.c", Line: 181)

It looks to me like what's happening is

(1) crashing process corrupts the DSM control segment somehow.

(2) dsm_postmaster_shutdown notices that, bleats to the log, and
figures its job is done.

(3) dsm_postmaster_startup crashes on Assert because
dsm_control_mapped_size isn't 0, because the old seg is still mapped.
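
(To make that sequence concrete, here is a minimal standalone C sketch of
that control flow. It is a toy model only, not the actual dsm.c code: the
shutdown hook gives up on a "corrupt" control segment without unmapping it,
so the startup path's clean-slate assertion fires on the next crash restart.)

/* Toy model of the failure sequence; not actual PostgreSQL code. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

static size_t dsm_control_mapped_size = 0;	/* stand-in for the dsm.c global */

static bool
control_segment_sane(void)
{
	return false;				/* (1) pretend a dying backend corrupted it */
}

static void
postmaster_shutdown(void)
{
	if (!control_segment_sane())
	{
		fprintf(stderr, "LOG: dynamic shared memory control segment is corrupt\n");
		return;					/* (2) bail out, leaving the segment mapped */
	}
	/* ... would normally destroy segments and unmap the control segment ... */
	dsm_control_mapped_size = 0;
}

static void
postmaster_startup(void)
{
	assert(dsm_control_mapped_size == 0);	/* (3) fires, like the TRAP above */
	/* ... would create a fresh control segment here ... */
}

int
main(void)
{
	dsm_control_mapped_size = 4096;		/* the control segment was mapped pre-crash */
	postmaster_shutdown();				/* crash cleanup */
	postmaster_startup();				/* crash restart: assertion fails */
	return 0;
}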

I would argue that both dsm_postmaster_shutdown and dsm_postmaster_startup
are broken here; the former because it makes no attempt to unmap
the old control segment (which it oughta be able to do no matter how badly
broken the contents are), and the latter because it should not let
garbage old state prevent it from establishing a valid new segment.

BTW, the header comment on dsm_postmaster_startup is a lie, which
is probably not unrelated to its failure to consider this situation.

regards, tom lane

#5 Thomas Munro
thomas.munro@enterprisedb.com
In reply to: Tom Lane (#4)
Re: DSM robustness failure (was Re: Peripatus/failures)

On Thu, Oct 18, 2018 at 9:00 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

2018-10-17 13:43:24.235 CDT [46467:6] LOG: dynamic shared memory control segment is corrupt
TRAP: FailedAssertion("!(dsm_control_mapped_size == 0)", File: "dsm.c", Line: 181)

It looks to me like what's happening is

(1) crashing process corrupts the DSM control segment somehow.

I wonder how. Apparently mapped size was tiny (least likely
explanation), control->magic was wrong, or control->maxitems and
control->nitems were inconsistent with each other or the mapped size.
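
(For reference, the sanity test boils down to checks along these lines.
This is a simplified paraphrase with a cut-down header struct and a
placeholder magic value, not the real dsm.c definitions.)

/*
 * Simplified paraphrase of the control-segment sanity test; the struct,
 * field names and magic value here are illustrative, not the real dsm.c
 * definitions.
 */
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CONTROL_MAGIC 0x47f7ead0		/* placeholder magic value */

typedef struct
{
	uint32_t	magic;
	uint32_t	nitems;
	uint32_t	maxitems;
	/* per-segment item array follows in the real thing */
} control_header;

bool
control_segment_sane(const control_header *control, size_t mapped_size)
{
	if (mapped_size < sizeof(control_header))
		return false;			/* "mapped size was tiny" */
	if (control->magic != CONTROL_MAGIC)
		return false;			/* "control->magic was wrong" */
	if (control->nitems > control->maxitems)
		return false;			/* nitems/maxitems inconsistent */
	/* the real test also checks that maxitems' item array fits in mapped_size */
	return true;
}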

(2) dsm_postmaster_shutdown notices that, bleats to the log, and
figures its job is done.

Right, that seems to be the main problem.

(3) dsm_postmaster_startup crashes on Assert because
dsm_control_mapped_size isn't 0, because the old seg is still mapped.

Right.

I would argue that both dsm_postmaster_shutdown and dsm_postmaster_startup
are broken here; the former because it makes no attempt to unmap
the old control segment (which it oughta be able to do no matter how badly
broken the contents are), and the latter because it should not let
garbage old state prevent it from establishing a valid new segment.

Looking.

BTW, the header comment on dsm_postmaster_startup is a lie, which
is probably not unrelated to its failure to consider this situation.

Agreed.

--
Thomas Munro
http://www.enterprisedb.com

#6 Thomas Munro
thomas.munro@enterprisedb.com
In reply to: Thomas Munro (#5)
1 attachment(s)
Re: DSM robustness failure (was Re: Peripatus/failures)

On Thu, Oct 18, 2018 at 9:43 AM Thomas Munro
<thomas.munro@enterprisedb.com> wrote:

On Thu, Oct 18, 2018 at 9:00 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

I would argue that both dsm_postmaster_shutdown and dsm_postmaster_startup
are broken here; the former because it makes no attempt to unmap
the old control segment (which it oughta be able to do no matter how badly
broken the contents are), and the latter because it should not let
garbage old state prevent it from establishing a valid new segment.

Looking.

(CCing Amit Kapila)

To reproduce this, I attached lldb to a backend and did "mem write
&dsm_control->magic 42", and then delivered SIGKILL to the backend.
Here's one way to fix it. I think we have no choice but to leak the
referenced segments, but we can free the control segment. See
comments in the attached patch for rationale.

--
Thomas Munro
http://www.enterprisedb.com

Attachments:

0001-Fix-thinko-in-DSM-control-segment-crash-restart-logi.patch (application/octet-stream)
From f87cf67dd55d8800973cb1eb37f604aef5eb3154 Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.munro@enterprisedb.com>
Date: Thu, 18 Oct 2018 10:48:18 +1300
Subject: [PATCH] Fix thinko in DSM control segment crash restart logic.

If dsm_postmaster_shutdown() determines that the DSM control segment
is corrupted, it should still destroy it, otherwise it is leaked and
dsm_postmaster_startup() fails in an assertion that we are not
currently attached to a control segment.

Reported by build-farm animal peripatus.  The root cause of the
corrupted segment in that case is not yet known, but that's another
matter.

Author: Thomas Munro
Diagnosed-by: Tom Lane
Discussion: https://postgr.es/m/6153.1539806400%40sss.pgh.pa.us
---
 src/backend/storage/ipc/dsm.c | 23 +++++++++++++++--------
 1 file changed, 15 insertions(+), 8 deletions(-)

diff --git a/src/backend/storage/ipc/dsm.c b/src/backend/storage/ipc/dsm.c
index 9629f22f7af..226fdee287a 100644
--- a/src/backend/storage/ipc/dsm.c
+++ b/src/backend/storage/ipc/dsm.c
@@ -138,8 +138,7 @@ static void *dsm_control_impl_private = NULL;
 /*
  * Start up the dynamic shared memory system.
  *
- * This is called just once during each cluster lifetime, at postmaster
- * startup time.
+ * This is called at postmaster startup time, and again during crash restarts.
  */
 void
 dsm_postmaster_startup(PGShmemHeader *shim)
@@ -234,14 +233,22 @@ dsm_cleanup_using_control_segment(dsm_handle old_control_handle)
 	{
 		dsm_impl_op(DSM_OP_DETACH, old_control_handle, 0, &impl_private,
 					&mapped_address, &mapped_size, LOG);
-		return;
+
+		/*
+		 * This control segment doesn't look sane.  We were previously
+		 * attached to it, so we know that we created it and it's
+		 * therefore safe to destroy it.  It's not safe to destroy the list of
+		 * segments that it points to though, because apparently the contents
+		 * are corrupted, so we could inadvertently destroy segments belonging
+		 * to some other software (say, another database cluster) running on
+		 * the machine.  Unfortunately this leaks the segments.
+		 */
+		nitems = 0;
 	}
+	else
+		nitems = old_control->nitems;
 
-	/*
-	 * OK, the control segment looks basically valid, so we can use it to get
-	 * a list of segments that need to be removed.
-	 */
-	nitems = old_control->nitems;
+	/* Get a list of segments that need to be removed. */
 	for (i = 0; i < nitems; ++i)
 	{
 		dsm_handle	handle;
-- 
2.17.1 (Apple Git-112)

#7 Larry Rosenman
ler@lerctr.org
In reply to: Thomas Munro (#6)
Re: DSM robustness failure (was Re: Peripatus/failures)

On Thu, Oct 18, 2018 at 11:08:33AM +1300, Thomas Munro wrote:

On Thu, Oct 18, 2018 at 9:43 AM Thomas Munro
<thomas.munro@enterprisedb.com> wrote:

On Thu, Oct 18, 2018 at 9:00 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

I would argue that both dsm_postmaster_shutdown and dsm_postmaster_startup
are broken here; the former because it makes no attempt to unmap
the old control segment (which it oughta be able to do no matter how badly
broken the contents are), and the latter because it should not let
garbage old state prevent it from establishing a valid new segment.

Looking.

(CCing Amit Kapila)

To reproduce this, I attached lldb to a backend and did "mem write
&dsm_control->magic 42", and then delivered SIGKILL to the backend.
Here's one way to fix it. I think we have no choice but to leak the
referenced segments, but we can free the control segment. See
comments in the attached patch for rationale.

On the original failure, I recompiled and reinstalled the two Pythons I
have on this box, and at least 9.3 went back to OK.

--
Thomas Munro
http://www.enterprisedb.com

--
Larry Rosenman http://www.lerctr.org/~ler
Phone: +1 214-642-9640 E-Mail: ler@lerctr.org
US Mail: 5708 Sabbia Drive, Round Rock, TX 78665-2106

#8 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Larry Rosenman (#7)
Re: DSM robustness failure (was Re: Peripatus/failures)

Larry Rosenman <ler@lerctr.org> writes:

On the original failure, I recompiled and reinstalled the two Pythons I
have on this box, and at least 9.3 went back to OK.

Hmm. I'd just finished pulling down FreeBSD-12.0-ALPHA9 and failing
to reproduce any problem with that ... and then I noticed your box
said it was on ALPHA10, which I'd missed seeing in the clutter of
FreeBSD's download server. I'm about halfway through switching to
that, but I bet it will work too. Was your Python install built
with any special switches? I just used what came from "pkg install".

regards, tom lane

#9 Larry Rosenman
ler@lerctr.org
In reply to: Tom Lane (#8)
Re: DSM robustness failure (was Re: Peripatus/failures)

On Wed, Oct 17, 2018 at 07:07:09PM -0400, Tom Lane wrote:

Larry Rosenman <ler@lerctr.org> writes:

On the original failure, I recompiled and reinstalled the two Pythons I
have on this box, and at least 9.3 went back to OK.

Hmm. I'd just finished pulling down FreeBSD-12.0-ALPHA9 and failing
to reproduce any problem with that ... and then I noticed your box
said it was on ALPHA10, which I'd missed seeing in the clutter of
FreeBSD's download server. I'm about halfway through switching to
that, but I bet it will work too. Was your Python install built
with any special switches? I just used what came from "pkg install".

It had been built on a previous FreeBSD build, I have my own poudriere
infrastructure. I can probably get you the package from my ZFS snaps if
you'd like.

regards, tom lane

--
Larry Rosenman http://www.lerctr.org/~ler
Phone: +1 214-642-9640 E-Mail: ler@lerctr.org
US Mail: 5708 Sabbia Drive, Round Rock, TX 78665-2106

#10 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Larry Rosenman (#9)
Re: DSM robustness failure (was Re: Peripatus/failures)

Larry Rosenman <ler@lerctr.org> writes:

On Wed, Oct 17, 2018 at 07:07:09PM -0400, Tom Lane wrote:

... Was your Python install built
with any special switches? I just used what came from "pkg install".

It had been built on a previous FreeBSD build, I have my own poudriere
infrastructure. I can probably get you the package from my ZFS snaps if
you'd like.

I've now verified that the bog-standard packages for python 2.7 and python
3.6 both build and pass regression cleanly, using ALPHA10. And I see
peripatus is back to green too. So as far as the plpython end of this is
concerned, I think we can write it off as "something wrong with Larry's
custom package build". However, I'm still slightly interested in how it
was that that broke DSM so thoroughly ... I pulled down your version of
python2.7 and will see if that reproduces it.

regards, tom lane

#11 Larry Rosenman
ler@lerctr.org
In reply to: Tom Lane (#10)
Re: DSM robustness failure (was Re: Peripatus/failures)

On Wed, Oct 17, 2018 at 08:10:28PM -0400, Tom Lane wrote:

Larry Rosenman <ler@lerctr.org> writes:

On Wed, Oct 17, 2018 at 07:07:09PM -0400, Tom Lane wrote:

... Was your Python install built
with any special switches? I just used what came from "pkg install".

It had been built on a previous FreeBSD build, I have my own poudriere
infrastructure. I can probably get you the package from my ZFS snaps if
you'd like.

I've now verified that the bog-standard packages for python 2.7 and python
3.6 both build and pass regression cleanly, using ALPHA10. And I see
peripatus is back to green too. So as far as the plpython end of this is
concerned, I think we can write it off as "something wrong with Larry's
custom package build". However, I'm still slightly interested in how it
was that that broke DSM so thoroughly ... I pulled down your version of
python2.7 and will see if that reproduces it.

It was built on a previous alpha, so who knows what the differing
compiler/libs/kernel/etc did. The options used did *NOT* change, just
the userland used to compile it. (I.E. that package, running on
ALPHA10 is what broke).

That's one of the nice things about poudriere: it's a clean-room type
environment that uses canned options as set by the user, and I haven't
changed the Python options in a LONG time (i.e., months or years).

--
Larry Rosenman http://www.lerctr.org/~ler
Phone: +1 214-642-9640 E-Mail: ler@lerctr.org
US Mail: 5708 Sabbia Drive, Round Rock, TX 78665-2106

#12 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Larry Rosenman (#11)
Re: DSM robustness failure (was Re: Peripatus/failures)

Larry Rosenman <ler@lerctr.org> writes:

On Wed, Oct 17, 2018 at 08:10:28PM -0400, Tom Lane wrote:

However, I'm still slightly interested in how it
was that that broke DSM so thoroughly ... I pulled down your version of
python2.7 and will see if that reproduces it.

It was built on a previous alpha, so who knows what the differing
compiler/libs/kernel/etc did. The options used did *NOT* change, just
the userland used to compile it. (I.E. that package, running on
ALPHA10 is what broke).

Hm. I forcibly installed your package over the regular one using
pkg add -f python27-2.7.15.txz
and rebuilt PG, and what I get is a failure in the plpython regression
test, but no crash. So I'm still confused.

*** /usr/home/tgl/pgsql/src/pl/plpython/expected/plpython_import.out    Wed Oct 17 19:48:19 2018
--- /usr/home/tgl/pgsql/src/pl/plpython/results/plpython_import.out     Wed Oct 17 20:50:22 2018
***************
*** 64,79 ****
  -- test import and simple argument handling
  --
  SELECT import_test_one('sha hash of this string');
!              import_test_one              
! ------------------------------------------
!  a04e23cb9b1a09cd1051a04a7c571aae0f90346c
! (1 row)
! 
  -- test import and tuple argument handling
  --
  select import_test_two(users) from users where fname = 'willem';
!                           import_test_two                          
! -------------------------------------------------------------------
!  sha hash of willemdoe is 3cde6b574953b0ca937b4d76ebc40d534d910759
! (1 row)
! 
--- 64,79 ----
  -- test import and simple argument handling
  --
  SELECT import_test_one('sha hash of this string');
! ERROR:  AttributeError: 'module' object has no attribute 'sha1'
! CONTEXT:  Traceback (most recent call last):
!   PL/Python function "import_test_one", line 3, in <module>
!     digest = hashlib.sha1(p.encode("ascii"))
! PL/Python function "import_test_one"
  -- test import and tuple argument handling
  --
  select import_test_two(users) from users where fname = 'willem';
! ERROR:  AttributeError: 'module' object has no attribute 'sha1'
! CONTEXT:  Traceback (most recent call last):
!   PL/Python function "import_test_two", line 4, in <module>
!     digest = hashlib.sha1(plain.encode("ascii"))
! PL/Python function "import_test_two"

======================================================================

regards, tom lane

#13 Thomas Munro
thomas.munro@enterprisedb.com
In reply to: Thomas Munro (#6)
1 attachment(s)
Re: DSM robustness failure (was Re: Peripatus/failures)

On Thu, Oct 18, 2018 at 11:08 AM Thomas Munro
<thomas.munro@enterprisedb.com> wrote:

On Thu, Oct 18, 2018 at 9:43 AM Thomas Munro
<thomas.munro@enterprisedb.com> wrote:

On Thu, Oct 18, 2018 at 9:00 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

I would argue that both dsm_postmaster_shutdown and dsm_postmaster_startup
are broken here; the former because it makes no attempt to unmap
the old control segment (which it oughta be able to do no matter how badly
broken the contents are), and the latter because it should not let
garbage old state prevent it from establishing a valid new segment.

Looking.

(CCing Amit Kapila)

To reproduce this, I attached lldb to a backend and did "mem write
&dsm_control->magic 42", and then delivered SIGKILL to the backend.
Here's one way to fix it. I think we have no choice but to leak the
referenced segments, but we can free the control segment. See
comments in the attached patch for rationale.

I realised that the nearly identical code in dsm_postmaster_shutdown()
might as well destroy a corrupted control segment too. New version
attached.

--
Thomas Munro
http://www.enterprisedb.com

Attachments:

0001-Fix-thinko-in-handling-of-corrupted-DSM-control-area.patch (application/octet-stream)
From b5f7903b97ab9d0539630bab8805fcbcdcef793d Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.munro@enterprisedb.com>
Date: Thu, 18 Oct 2018 10:48:18 +1300
Subject: [PATCH] Fix thinko in handling of corrupted DSM control area.

If dsm_cleanup_using_control_segment() or dsm_postmaster_shutdown()
determine that the DSM control segment is corrupted, they should still
destroy it, otherwise it is leaked.  This fixes an assertion failure
in dsm_postmaster_startup().

Reported by build-farm animal peripatus.  The root cause of the
corrupted segment in that case is not yet known, but that's another
matter.

Author: Thomas Munro
Diagnosed-by: Tom Lane
Discussion: https://postgr.es/m/6153.1539806400%40sss.pgh.pa.us
---
 src/backend/storage/ipc/dsm.c | 20 +++++++++-----------
 1 file changed, 9 insertions(+), 11 deletions(-)

diff --git a/src/backend/storage/ipc/dsm.c b/src/backend/storage/ipc/dsm.c
index 9629f22f7af..f24bf2e74a1 100644
--- a/src/backend/storage/ipc/dsm.c
+++ b/src/backend/storage/ipc/dsm.c
@@ -138,8 +138,7 @@ static void *dsm_control_impl_private = NULL;
 /*
  * Start up the dynamic shared memory system.
  *
- * This is called just once during each cluster lifetime, at postmaster
- * startup time.
+ * This is called at postmaster startup time, and again during crash restarts.
  */
 void
 dsm_postmaster_startup(PGShmemHeader *shim)
@@ -227,21 +226,19 @@ dsm_cleanup_using_control_segment(dsm_handle old_control_handle)
 
 	/*
 	 * We've managed to reattach it, but the contents might not be sane. If
-	 * they aren't, we disregard the segment after all.
+	 * they aren't, we disregard the contents of the segment after all.
 	 */
 	old_control = (dsm_control_header *) mapped_address;
 	if (!dsm_control_segment_sane(old_control, mapped_size))
 	{
 		dsm_impl_op(DSM_OP_DETACH, old_control_handle, 0, &impl_private,
 					&mapped_address, &mapped_size, LOG);
-		return;
+		nitems = 0;
 	}
+	else
+		nitems = old_control->nitems;
 
-	/*
-	 * OK, the control segment looks basically valid, so we can use it to get
-	 * a list of segments that need to be removed.
-	 */
-	nitems = old_control->nitems;
+	/* Get a list of segments that need to be removed. */
 	for (i = 0; i < nitems; ++i)
 	{
 		dsm_handle	handle;
@@ -336,13 +333,14 @@ dsm_postmaster_shutdown(int code, Datum arg)
 	 * stray shared memory segments, but there's not much we can do about that
 	 * if the metadata is gone.
 	 */
-	nitems = dsm_control->nitems;
 	if (!dsm_control_segment_sane(dsm_control, dsm_control_mapped_size))
 	{
 		ereport(LOG,
 				(errmsg("dynamic shared memory control segment is corrupt")));
-		return;
+		nitems = 0;
 	}
+	else
+		nitems = dsm_control->nitems;
 
 	/* Remove any remaining segments. */
 	for (i = 0; i < nitems; ++i)
-- 
2.17.1 (Apple Git-112)

#14 Larry Rosenman
ler@lerctr.org
In reply to: Tom Lane (#12)
Re: DSM robustness failure (was Re: Peripatus/failures)

On Wed, Oct 17, 2018 at 08:55:09PM -0400, Tom Lane wrote:

Larry Rosenman <ler@lerctr.org> writes:

On Wed, Oct 17, 2018 at 08:10:28PM -0400, Tom Lane wrote:

However, I'm still slightly interested in how it
was that that broke DSM so thoroughly ... I pulled down your version of
python2.7 and will see if that reproduces it.

It was built on a previous alpha, so who knows what the differing
compiler/libs/kernel/etc did. The options used did *NOT* change, just
the userland used to compile it. (I.E. that package, running on
ALPHA10 is what broke).

Hm. I forcibly installed your package over the regular one using
pkg add -f python27-2.7.15.txz
and rebuilt PG, and what I get is a failure in the plpython regression
test, but no crash. So I'm still confused.

Hell if I know. I've cleaned up my environment (old, broken, etc.
stuff), but who knows what broke what, where.

Not sure how to get back to the "broken" state :(

--
Larry Rosenman http://www.lerctr.org/~ler
Phone: +1 214-642-9640 E-Mail: ler@lerctr.org
US Mail: 5708 Sabbia Drive, Round Rock, TX 78665-2106

#15 Thomas Munro
thomas.munro@enterprisedb.com
In reply to: Tom Lane (#10)
Re: DSM robustness failure (was Re: Peripatus/failures)

On Thu, Oct 18, 2018 at 1:10 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

... However, I'm still slightly interested in how it
was that that broke DSM so thoroughly ...

Me too. Frustratingly, that vm object might still exist on Larry's
machine if it hasn't been rebooted (since we failed to shm_unlink()
it), so if we knew its name we could write a program to shm_open(),
mmap(), dump out to a file for analysis and then we could work out
which of the sanity tests it failed and maybe get some clues.
Unfortunately it's not in any of our logs AFAIK, and I can't see any
way to get a list of existing shm_open() objects from the kernel.
From sys/kern/uipc_shm.c:

* TODO:
*
* (1) Need to export data to a userland tool via a sysctl. Should ipcs(1)
* and ipcrm(1) be expanded or should new tools to manage both POSIX
* kernel semaphores and POSIX shared memory be written?

Gah. So basically that's hiding in shm_dictionary in the kernel and I
don't know a way to look at it from userspace (other than trying to
open all 2^32 random paths we're capable of generating).
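
(If the name ever did turn up, the dumper itself would only be a few lines.
A quick sketch, taking the shm_open() path, something like
/PostgreSQL.<number> if I recall the naming scheme correctly, and an output
file on the command line; nothing PostgreSQL-specific, just plain POSIX
calls. On some platforms you might need to link with -lrt.)

/*
 * Sketch of a forensic dumper: shm_open() a named segment, mmap() it, and
 * write the raw bytes to a file for offline analysis.
 */
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int
main(int argc, char **argv)
{
	int			fd;
	struct stat st;
	void	   *p;
	FILE	   *out;

	if (argc != 3)
	{
		fprintf(stderr, "usage: %s <shm-name> <output-file>\n", argv[0]);
		return 1;
	}
	fd = shm_open(argv[1], O_RDONLY, 0);
	if (fd < 0)
	{
		perror("shm_open");
		return 1;
	}
	if (fstat(fd, &st) < 0)
	{
		perror("fstat");
		return 1;
	}
	p = mmap(NULL, (size_t) st.st_size, PROT_READ, MAP_SHARED, fd, 0);
	if (p == MAP_FAILED)
	{
		perror("mmap");
		return 1;
	}
	out = fopen(argv[2], "wb");
	if (out == NULL ||
		fwrite(p, 1, (size_t) st.st_size, out) != (size_t) st.st_size)
	{
		perror(argv[2]);
		return 1;
	}
	fclose(out);
	close(fd);
	printf("dumped %lld bytes from %s to %s\n",
		   (long long) st.st_size, argv[1], argv[2]);
	return 0;
}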

--
Thomas Munro
http://www.enterprisedb.com

#16 Larry Rosenman
ler@lerctr.org
In reply to: Thomas Munro (#15)
Re: DSM robustness failure (was Re: Peripatus/failures)

On Thu, Oct 18, 2018 at 02:17:14PM +1300, Thomas Munro wrote:

On Thu, Oct 18, 2018 at 1:10 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

... However, I'm still slightly interested in how it
was that that broke DSM so thoroughly ...

Me too. Frustratingly, that vm object might still exist on Larry's
machine if it hasn't been rebooted (since we failed to shm_unlink()
it), so if we knew its name we could write a program to shm_open(),
mmap(), dump out to a file for analysis and then we could work out
which of the sanity tests it failed and maybe get some clues.
Unfortunately it's not in any of our logs AFAIK, and I can't see any
way to get a list of existing shm_open() objects from the kernel.
From sys/kern/uipc_shm.c:

* TODO:
*
* (1) Need to export data to a userland tool via a sysctl. Should ipcs(1)
* and ipcrm(1) be expanded or should new tools to manage both POSIX
* kernel semaphores and POSIX shared memory be written?

Gah. So basically that's hiding in shm_dictionary in the kernel and I
don't know a way to look at it from userspace (other than trying to
open all 2^32 random paths we're capable of generating).

It has *NOT* been rebooted. I can give y'all id's if you want to go
poking around.

--
Thomas Munro
http://www.enterprisedb.com

--
Larry Rosenman http://www.lerctr.org/~ler
Phone: +1 214-642-9640 E-Mail: ler@lerctr.org
US Mail: 5708 Sabbia Drive, Round Rock, TX 78665-2106

#17 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Thomas Munro (#15)
Re: DSM robustness failure (was Re: Peripatus/failures)

Thomas Munro <thomas.munro@enterprisedb.com> writes:

On Thu, Oct 18, 2018 at 1:10 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

... However, I'm still slightly interested in how it
was that that broke DSM so thoroughly ...

Me too. Frustratingly, that vm object might still exist on Larry's
machine if it hasn't been rebooted (since we failed to shm_unlink()
it), so if we knew its name we could write a program to shm_open(),
mmap(), dump out to a file for analysis and then we could work out
which of the sanity tests it failed and maybe get some clues.

Larry's REL_10_STABLE failure logs are interesting:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=peripatus&dt=2018-10-17%2020%3A42%3A17

2018-10-17 15:48:08.849 CDT [55240:7] LOG: dynamic shared memory control segment is corrupt
2018-10-17 15:48:08.849 CDT [55240:8] LOG: sem_destroy failed: Invalid argument
2018-10-17 15:48:08.850 CDT [55240:9] LOG: sem_destroy failed: Invalid argument
2018-10-17 15:48:08.850 CDT [55240:10] LOG: sem_destroy failed: Invalid argument
2018-10-17 15:48:08.850 CDT [55240:11] LOG: sem_destroy failed: Invalid argument
... lots more ...
2018-10-17 15:48:08.862 CDT [55240:122] LOG: sem_destroy failed: Invalid argument
2018-10-17 15:48:08.862 CDT [55240:123] LOG: sem_destroy failed: Invalid argument
TRAP: FailedAssertion("!(dsm_control_mapped_size == 0)", File: "dsm.c", Line: 182)

So at least in this case, not only did we lose the DSM segment but also
all of our semaphores. Is it conceivable that Python somehow destroyed
those objects, rather than stomping on the contents of the DSM segment?
If not, how do we explain this log?

Also, why is there branch-specific variation? The fact that v11 and HEAD
aren't whinging about lost semaphores is not hard to understand --- we
stopped using SysV semas. But why don't the older branches look like v10
here?

regards, tom lane

#18 Thomas Munro
thomas.munro@enterprisedb.com
In reply to: Tom Lane (#17)
Re: DSM robustness failure (was Re: Peripatus/failures)

On Thu, Oct 18, 2018 at 2:36 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Larry's REL_10_STABLE failure logs are interesting:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=peripatus&amp;dt=2018-10-17%2020%3A42%3A17

2018-10-17 15:48:08.849 CDT [55240:7] LOG: dynamic shared memory control segment is corrupt
2018-10-17 15:48:08.849 CDT [55240:8] LOG: sem_destroy failed: Invalid argument
2018-10-17 15:48:08.850 CDT [55240:9] LOG: sem_destroy failed: Invalid argument
2018-10-17 15:48:08.850 CDT [55240:10] LOG: sem_destroy failed: Invalid argument
2018-10-17 15:48:08.850 CDT [55240:11] LOG: sem_destroy failed: Invalid argument
... lots more ...
2018-10-17 15:48:08.862 CDT [55240:122] LOG: sem_destroy failed: Invalid argument
2018-10-17 15:48:08.862 CDT [55240:123] LOG: sem_destroy failed: Invalid argument
TRAP: FailedAssertion("!(dsm_control_mapped_size == 0)", File: "dsm.c", Line: 182)

So at least in this case, not only did we lose the DSM segment but also
all of our semaphores. Is it conceivable that Python somehow destroyed
those objects, rather than stomping on the contents of the DSM segment?
If not, how do we explain this log?

One idea: In the backend I'm looking at there is a contiguous run of
read/write mappings from the the location of the semaphore array
through to the DSM control segment. That means that a single runaway
loop/memcpy/memset etc could overwrite both of those. Eventually it
would run off the end of contiguously mapped space and SEGV, and we do
indeed see a segfault from that Python code before the trouble begins.

Also, why is there branch-specific variation? The fact that v11 and HEAD
aren't whinging about lost semaphores is not hard to understand --- we
stopped using SysV semas. But why don't the older branches look like v10
here?

I think v10 is where we switched to POSIX unnamed (= sem_destroy()),
so it's 10, 11 and master that should be the same in this respect, no?

--
Thomas Munro
http://www.enterprisedb.com

#19 Amit Kapila
amit.kapila16@gmail.com
In reply to: Thomas Munro (#13)
Re: DSM robustness failure (was Re: Peripatus/failures)

On Thu, Oct 18, 2018 at 6:30 AM Thomas Munro
<thomas.munro@enterprisedb.com> wrote:

On Thu, Oct 18, 2018 at 11:08 AM Thomas Munro
<thomas.munro@enterprisedb.com> wrote:

On Thu, Oct 18, 2018 at 9:43 AM Thomas Munro
<thomas.munro@enterprisedb.com> wrote:

On Thu, Oct 18, 2018 at 9:00 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

I would argue that both dsm_postmaster_shutdown and dsm_postmaster_startup
are broken here; the former because it makes no attempt to unmap
the old control segment (which it oughta be able to do no matter how badly
broken the contents are), and the latter because it should not let
garbage old state prevent it from establishing a valid new segment.

Looking.

(CCing Amit Kapila)

To reproduce this, I attached lldb to a backend and did "mem write
&dsm_control->magic 42", and then delivered SIGKILL to the backend.
Here's one way to fix it. I think we have no choice but to leak the
referenced segments, but we can free the control segment. See
comments in the attached patch for rationale.

I realised that the nearly identical code in dsm_postmaster_shutdown()
might as well destroy a corrupted control segment too. New version
attached.

The below code seems to be problematic:
dsm_cleanup_using_control_segment()
{
..
if (!dsm_control_segment_sane(old_control, mapped_size))
{
dsm_impl_op(DSM_OP_DETACH, old_control_handle, 0, &impl_private,
&mapped_address, &mapped_size, LOG);
..
}

Here, don't we need to use dsm_control_* variables instead of local
variable mapped_* variables? Do we need to reset the dsm_control_*
variables in dsm_cleanup_for_mmap?

@@ -336,13 +333,14 @@ dsm_postmaster_shutdown(int code, Datum arg)
* stray shared memory segments, but there's not much we can do about that
* if the metadata is gone.
*/
- nitems = dsm_control->nitems;
if (!dsm_control_segment_sane(dsm_control, dsm_control_mapped_size))
{
ereport(LOG,
(errmsg("dynamic shared memory control segment is corrupt")));
- return;
+ nitems = 0;

The comment above this code says:
/*
* If some other backend exited uncleanly, it might have corrupted the
* control segment while it was dying. In that case, we warn and ignore
* the contents of the control segment. This may end up leaving behind
* stray shared memory segments, but there's not much we can do about that
* if the metadata is gone.
*/

Don't we need to adjust this comment, if we want to go with what you
have proposed? BTW, it doesn't appear insane to me that if the
control segment got corrupted, then we leave it as it is rather than
trying to destroy it. I am not sure, but if it is corrupted, then is
it guaranteed that destroy will always succeed?

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

#20 Larry Rosenman
ler@lerctr.org
In reply to: Larry Rosenman (#16)
Re: DSM robustness failure (was Re: Peripatus/failures)

On Wed, Oct 17, 2018 at 08:19:52PM -0500, Larry Rosenman wrote:

On Thu, Oct 18, 2018 at 02:17:14PM +1300, Thomas Munro wrote:

On Thu, Oct 18, 2018 at 1:10 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

... However, I'm still slightly interested in how it
was that that broke DSM so thoroughly ...

Me too. Frustratingly, that vm object might still exist on Larry's
machine if it hasn't been rebooted (since we failed to shm_unlink()
it), so if we knew its name we could write a program to shm_open(),
mmap(), dump out to a file for analysis and then we could work out
which of the sanity tests it failed and maybe get some clues.
Unfortunately it's not in any of our logs AFAIK, and I can't see any
way to get a list of existing shm_open() objects from the kernel.
From sys/kern/uipc_shm.c:

* TODO:
*
* (1) Need to export data to a userland tool via a sysctl. Should ipcs(1)
* and ipcrm(1) be expanded or should new tools to manage both POSIX
* kernel semaphores and POSIX shared memory be written?

Gah. So basically that's hiding in shm_dictionary in the kernel and I
don't know a way to look at it from userspace (other than trying to
open all 2^32 random paths we're capable of generating).

It has *NOT* been rebooted. I can give y'all id's if you want to go
poking around.

Let me know soon(ish) if any of you want to poke at this machine, as I'm
likely to forget and reboot it.....

--
Thomas Munro
http://www.enterprisedb.com

--
Larry Rosenman http://www.lerctr.org/~ler
Phone: +1 214-642-9640 E-Mail: ler@lerctr.org
US Mail: 5708 Sabbia Drive, Round Rock, TX 78665-2106

--
Larry Rosenman http://www.lerctr.org/~ler
Phone: +1 214-642-9640 E-Mail: ler@lerctr.org
US Mail: 5708 Sabbia Drive, Round Rock, TX 78665-2106

#21 Thomas Munro
thomas.munro@enterprisedb.com
In reply to: Larry Rosenman (#20)
Re: DSM robustness failure (was Re: Peripatus/failures)

On Thu, Oct 18, 2018 at 6:02 PM Larry Rosenman <ler@lerctr.org> wrote:

Let me know soon(ish) if any of you want to poke at this machine, as I'm
likely to forget and reboot it.....

Hi Larry,

Thanks for the offer but it looks like there is no way to get our
hands on that memory for forensics now. I'll see if there is some way
to improve that situation for next time something like this happens.

--
Thomas Munro
http://www.enterprisedb.com

#22 Thomas Munro
thomas.munro@enterprisedb.com
In reply to: Amit Kapila (#19)
1 attachment(s)
Re: DSM robustness failure (was Re: Peripatus/failures)

On Thu, Oct 18, 2018 at 5:00 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

The below code seems to be problematic:
dsm_cleanup_using_control_segment()
{
..
if (!dsm_control_segment_sane(old_control, mapped_size))
{
dsm_impl_op(DSM_OP_DETACH, old_control_handle, 0, &impl_private,
&mapped_address, &mapped_size, LOG);
..
}

Here, don't we need to use dsm_control_* variables instead of local
variable mapped_* variables?

I was a little fuzzy on when exactly
dsm_cleanup_using_control_segment() and dsm_postmaster_shutdown() run,
but after some more testing I think I have this straight now. You can
test by setting dsm_control->magic to 42 in a debugger and trying
three cases:

1. Happy shutdown: dsm_postmaster_shutdown() complains on shutdown.
2. kill -9 a non-postmaster process: dsm_postmaster_shutdown()
complains during auto-restart.
3. kill -9 the postmaster, manually start up again:
dsm_cleanup_using_control_segment() runs. It ignores the old segment
quietly if it doesn't pass the sanity test.

So to answer your question: no, dsm_cleanup_using_control_segment() is
case 3. This entirely new postmaster process has never had the
segment mapped in, so the dsm_control_* variables are not relevant
here.

Hmm.... but if you're running N other independent clusters on the same
machine that started up after this cluster crashed in case 3, I think
there is an N-in-four-billion chance that the segment with that ID now
belongs to another cluster and happens to be its DSM control segment,
and therefore passes the magic-number sanity test, and then we'll nuke
it and all the segments it references. Am I missing something? Could
we fix that by putting some kind of cluster ID in the control segment?
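
(Purely as a hypothetical sketch of that idea, not an actual proposal or
existing code: stamp the control header with a per-cluster identifier, say
derived from pg_control's system identifier, and refuse to treat a segment
as ours unless both the magic and that identifier match. All names below
are made up for illustration.)

/* Hypothetical illustration only; not the real dsm.c structures. */
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct
{
	uint32_t	magic;
	uint64_t	cluster_id;		/* e.g. copied from the cluster's system identifier */
	uint32_t	nitems;
	uint32_t	maxitems;
} control_header;

/*
 * Only treat an old control segment as ours if both the magic number and the
 * cluster identifier match; otherwise another cluster may own that handle now
 * and we must not destroy the segments it references.
 */
bool
control_segment_is_ours(const control_header *control, size_t mapped_size,
						uint32_t expected_magic, uint64_t my_cluster_id)
{
	if (mapped_size < sizeof(control_header))
		return false;
	if (control->magic != expected_magic)
		return false;
	return control->cluster_id == my_cluster_id;
}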

So on second thoughts, I think I should remove the
dsm_cleanup_using_control_segment() change from the patch, because it
widens the risk of case 3 nuking someone else's stuff to include even
segments that don't pass the sanity test. That's not good. Hunk
removed.

... Do we need to reset the dsm_control_*
variables in dsm_cleanup_for_mmap?

I don't think so -- that unlinks the files in file-backed DSM, but we
also call the other code in that case.

@@ -336,13 +333,14 @@ dsm_postmaster_shutdown(int code, Datum arg)
* stray shared memory segments, but there's not much we can do about that
* if the metadata is gone.
*/
- nitems = dsm_control->nitems;
if (!dsm_control_segment_sane(dsm_control, dsm_control_mapped_size))
{
ereport(LOG,
(errmsg("dynamic shared memory control segment is corrupt")));
- return;
+ nitems = 0;

The comment above this code says:
/*
* If some other backend exited uncleanly, it might have corrupted the
* control segment while it was dying. In that case, we warn and ignore
* the contents of the control segment. This may end up leaving behind
* stray shared memory segments, but there's not much we can do about that
* if the metadata is gone.
*/

Don't we need to adjust this comment, if we want to go with what you
have proposed?

The comment seems correct: we ignore the *contents* of the control
segment. But to be clearer I'll add a sentence to say that we still
attempt to destroy the control segment itself.

BTW, it doesn't appear insane to me that if the
control segment got corrupted, then we leave it as it is rather than
trying to destroy it. I am not sure, but if it is corrupted, then is
it guaranteed that destroy will always succeed?

You're right that it might fail. If the shm_unlink() fails it doesn't
matter, we just produce LOG, but if the unmap() fails we still
consider the segment to be mapped (because it is!), so ... hmm, we'd
better forget about it if we plan to continue running, by clearing
those variables explicitly. That's actually another pre-existing way
for the assertion to fail in current master. (Realistically, that
unmap() could only fail if our global variables are trashed so we try
to unmap() a junk address; AFAIK our model here is that the postmaster
is sane, even if everything else is insane, so that's sort of outside
the model. But this new patch addresses it anyway.)

Thanks for the review!

--
Thomas Munro
http://www.enterprisedb.com

Attachments:

0001-Fix-thinko-in-handling-of-corrupted-DSM-control-a-v2.patch (application/octet-stream)
From ead7cba6708502d79c559c67fd204e8df5b3ee99 Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.munro@enterprisedb.com>
Date: Thu, 18 Oct 2018 10:48:18 +1300
Subject: [PATCH] Fix thinko in handling of corrupted DSM control area.

If dsm_postmaster_shutdown() determines that the DSM control segment
is corrupted, it should still destroy it, otherwise it is leaked.
Leaving it mapped in was causing an assertion failure in
dsm_postmaster_startup() during crash restart.

Reported by build-farm animal peripatus.  The root cause of the
corrupted segment in that case is not yet known, but that's a
separate problem to investigate.

Author: Thomas Munro
Reviewed-by: Amit Kapila
Diagnosed-by: Tom Lane
Discussion: https://postgr.es/m/6153.1539806400%40sss.pgh.pa.us
---
 src/backend/storage/ipc/dsm.c | 21 ++++++++++++++++-----
 1 file changed, 16 insertions(+), 5 deletions(-)

diff --git a/src/backend/storage/ipc/dsm.c b/src/backend/storage/ipc/dsm.c
index 9629f22f7af..3521a00798e 100644
--- a/src/backend/storage/ipc/dsm.c
+++ b/src/backend/storage/ipc/dsm.c
@@ -138,8 +138,7 @@ static void *dsm_control_impl_private = NULL;
 /*
  * Start up the dynamic shared memory system.
  *
- * This is called just once during each cluster lifetime, at postmaster
- * startup time.
+ * This is called at postmaster startup time, and again during crash restarts.
  */
 void
 dsm_postmaster_startup(PGShmemHeader *shim)
@@ -334,15 +333,17 @@ dsm_postmaster_shutdown(int code, Datum arg)
 	 * control segment while it was dying.  In that case, we warn and ignore
 	 * the contents of the control segment.  This may end up leaving behind
 	 * stray shared memory segments, but there's not much we can do about that
-	 * if the metadata is gone.
+	 * if the metadata is gone.  We'll still attempt to destroy the segment
+	 * itself.
 	 */
-	nitems = dsm_control->nitems;
 	if (!dsm_control_segment_sane(dsm_control, dsm_control_mapped_size))
 	{
 		ereport(LOG,
 				(errmsg("dynamic shared memory control segment is corrupt")));
-		return;
+		nitems = 0;
 	}
+	else
+		nitems = dsm_control->nitems;
 
 	/* Remove any remaining segments. */
 	for (i = 0; i < nitems; ++i)
@@ -371,6 +372,16 @@ dsm_postmaster_shutdown(int code, Datum arg)
 	dsm_impl_op(DSM_OP_DESTROY, dsm_control_handle, 0,
 				&dsm_control_impl_private, &dsm_control_address,
 				&dsm_control_mapped_size, LOG);
+
+	/*
+	 * In the unlikely event that we failed to unmap the segment, we'll just
+	 * forget that we have it mapped so that we can proceed without an
+	 * assertion failure, in case of crash restart.
+	 */
+	dsm_control_address = NULL;
+	dsm_control_mapped_size = 0;
+	dsm_control_impl_private = NULL;
+
 	dsm_control = dsm_control_address;
 	shim->dsm_control = 0;
 }
-- 
2.19.1

#23 Amit Kapila
amit.kapila16@gmail.com
In reply to: Thomas Munro (#22)
Re: DSM robustness failure (was Re: Peripatus/failures)

On Thu, Oct 18, 2018 at 2:33 PM Thomas Munro
<thomas.munro@enterprisedb.com> wrote:

On Thu, Oct 18, 2018 at 5:00 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

The below code seems to be problematic:
dsm_cleanup_using_control_segment()
{
..
if (!dsm_control_segment_sane(old_control, mapped_size))
{
dsm_impl_op(DSM_OP_DETACH, old_control_handle, 0, &impl_private,
&mapped_address, &mapped_size, LOG);
..
}

Here, don't we need to use dsm_control_* variables instead of local
variable mapped_* variables?

I was a little fuzzy on when exactly
dsm_cleanup_using_control_segment() and dsm_postmaster_shutdown() run,
but after some more testing I think I have this straight now. You can
test by setting dsm_control->magic to 42 in a debugger and trying
three cases:

1. Happy shutdown: dsm_postmaster_shutdown() complains on shutdown.
2. kill -9 a non-postmaster process: dsm_postmaster_shutdown()
complains during auto-restart.
3. kill -9 the postmaster, manually start up again:
dsm_cleanup_using_control_segment() runs. It ignores the old segment
quietly if it doesn't pass the sanity test.

So to answer your question: no, dsm_cleanup_using_control_segment() is
case 3. This entirely new postmaster process has never had the
segment mapped in, so the dsm_control_* variables are not relevant
here.

Hmm.... but if you're running N other independent clusters on the same
machine that started up after this cluster crashed in case 3, I think
there is an N-in-four-billion chance that the segment with that ID now
belongs to another cluster and happens to be its DSM control segment,
and therefore passes the magic-number sanity test, and then we'll nuke
it and all the segments it references. Am I missing something?

Unless the previous cluster (which crashed) has removed the segment,
how will the new cluster succeed in getting the same segment? Won't it
get EEXIST and retry to get the segment with another ID?

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com