Bug in WaitForBackgroundWorkerShutdown() [REL9_5_STABLE]

Started by Dmitry Ivanovover 9 years ago4 messages
#1Dmitry Ivanov
d.ivanov@postgrespro.ru
1 attachment(s)

Recently I've encountered a strange crash while calling elog(ERROR, "...")
after the WaitForBackgroundWorkerShutdown() function. It turns out that
_returns_ inside the PG_TRY()..PG_CATCH() block are *highly* undesirable,
since they leave PG_exception_stack pointing to a local struct in a dead
frame, which is an obvious UB. I've attached a patch which fixes this behavior
in the aforementioned function, but there might be more errors like that
elsewhere.

--
Dmitry Ivanov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company

Attachments:

fix_WaitForBackgroundWorkerShutdown_v1.patchtext/x-patch; charset=UTF-8; name=fix_WaitForBackgroundWorkerShutdown_v1.patchDownload
diff --git a/src/backend/postmaster/bgworker.c b/src/backend/postmaster/bgworker.c
index 76619e9..5ecb55a 100644
--- a/src/backend/postmaster/bgworker.c
+++ b/src/backend/postmaster/bgworker.c
@@ -1025,13 +1025,14 @@ WaitForBackgroundWorkerShutdown(BackgroundWorkerHandle *handle)
 
 			status = GetBackgroundWorkerPid(handle, &pid);
 			if (status == BGWH_STOPPED)
-				return status;
+				break;
 
 			rc = WaitLatch(&MyProc->procLatch,
 						   WL_LATCH_SET | WL_POSTMASTER_DEATH, 0);
 
 			if (rc & WL_POSTMASTER_DEATH)
-				return BGWH_POSTMASTER_DIED;
+				status = BGWH_POSTMASTER_DIED;
+				break;
 
 			ResetLatch(&MyProc->procLatch);
 		}
#2Dmitry Ivanov
d.ivanov@postgrespro.ru
In reply to: Dmitry Ivanov (#1)
1 attachment(s)
Re: Bug in WaitForBackgroundWorkerShutdown() [REL9_5_STABLE]

Recently I've encountered a strange crash while calling elog(ERROR, "...")
after the WaitForBackgroundWorkerShutdown() function. It turns out that
_returns_ inside the PG_TRY()..PG_CATCH() block are *highly* undesirable,
since they leave PG_exception_stack pointing to a local struct in a dead
frame, which is an obvious UB. I've attached a patch which fixes this
behavior in the aforementioned function, but there might be more errors
like that elsewhere.

Forgot some curly braces, my bad. v2 attached.

--
Dmitry Ivanov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company

Attachments:

fix_WaitForBackgroundWorkerShutdown_v2.patchtext/x-patch; charset=UTF-8; name=fix_WaitForBackgroundWorkerShutdown_v2.patchDownload
diff --git a/src/backend/postmaster/bgworker.c b/src/backend/postmaster/bgworker.c
index 76619e9..fd55262 100644
--- a/src/backend/postmaster/bgworker.c
+++ b/src/backend/postmaster/bgworker.c
@@ -1025,13 +1025,16 @@ WaitForBackgroundWorkerShutdown(BackgroundWorkerHandle *handle)
 
 			status = GetBackgroundWorkerPid(handle, &pid);
 			if (status == BGWH_STOPPED)
-				return status;
+				break;
 
 			rc = WaitLatch(&MyProc->procLatch,
 						   WL_LATCH_SET | WL_POSTMASTER_DEATH, 0);
 
 			if (rc & WL_POSTMASTER_DEATH)
-				return BGWH_POSTMASTER_DIED;
+			{
+				status = BGWH_POSTMASTER_DIED;
+				break;
+			}
 
 			ResetLatch(&MyProc->procLatch);
 		}
#3Yury Zhuravlev
u.zhuravlev@postgrespro.ru
In reply to: Dmitry Ivanov (#2)
Re: Bug in WaitForBackgroundWorkerShutdown() [REL9_5_STABLE]

Dmitry Ivanov wrote:

Recently I've encountered a strange crash while calling elog(ERROR, "...")
after the WaitForBackgroundWorkerShutdown() function. It turns out that
_returns_ inside the PG_TRY()..PG_CATCH() block are *highly* undesirable,
since they leave PG_exception_stack pointing to a local struct in a dead
frame, which is an obvious UB. I've attached a patch which fixes this
behavior in the aforementioned function, but there might be more errors
like that elsewhere.

Forgot some curly braces, my bad. v2 attached.

Good catch. But in 9.6 it has been fixed by
db0f6cad4884bd4c835156d3a720d9a79dbd63a9 commit.

--
Yury Zhuravlev
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#4Tom Lane
tgl@sss.pgh.pa.us
In reply to: Yury Zhuravlev (#3)
1 attachment(s)
Re: Bug in WaitForBackgroundWorkerShutdown() [REL9_5_STABLE]

Yury Zhuravlev <u.zhuravlev@postgrespro.ru> writes:

Dmitry Ivanov wrote:

Recently I've encountered a strange crash while calling elog(ERROR, "...")
after the WaitForBackgroundWorkerShutdown() function. It turns out that
_returns_ inside the PG_TRY()..PG_CATCH() block are *highly* undesirable,
since they leave PG_exception_stack pointing to a local struct in a dead
frame, which is an obvious UB. I've attached a patch which fixes this
behavior in the aforementioned function, but there might be more errors
like that elsewhere.

Good catch. But in 9.6 it has been fixed by
db0f6cad4884bd4c835156d3a720d9a79dbd63a9 commit.

Only accidentally, I'd say. I chose to push this into HEAD as well,
so that WaitForBackgroundWorkerShutdown would look more like
WaitForBackgroundWorkerStartup, and would not have a risk of future
breakage if someone adds code after the for-loop.

I was pretty worried about the "more errors like that" point, because
we recently fixed another similar mistake in plpython, cf 1d2fe56e4.
However, a search using the attached quick-hack script to look for
"return" etc between PG_TRY and PG_CATCH did not find any other
instances.

It'd be nice to have some automatic way of catching future mistakes
like this, but I'm not sure about a reliable way to do that.
One idea is to add an Assert to PG_CATCH:

#define PG_TRY() \
do { \
sigjmp_buf *save_exception_stack = PG_exception_stack; \
ErrorContextCallback *save_context_stack = error_context_stack; \
sigjmp_buf local_sigjmp_buf; \
if (sigsetjmp(local_sigjmp_buf, 0) == 0) \
{ \
PG_exception_stack = &local_sigjmp_buf

#define PG_CATCH() \
+ Assert(PG_exception_stack == &local_sigjmp_buf); \
} \
else \
{ \
PG_exception_stack = save_exception_stack; \
error_context_stack = save_context_stack

but there are a couple of disadvantages to this. One is that you don't
find out about a problem unless the bad code path is actually exercised.
I'm afraid that the most tempting places to do something like this would
be low-probability error cases, and thus that the Assert would do little
to catch them. (Case in point: the postmaster-death path fixed in this
patch would surely never have been caught by testing.) The other and
really worse problem is that when the Assert does fire, it's nowhere near
the scene of the crime, so while it may tell you there's a problem it
does not give much help in locating the bug.

Anyone have a better idea?

regards, tom lane

Attachments:

find_bad_pg_trytext/x-shellscript; charset=us-ascii; name=find_bad_pg_tryDownload