Dereference before NULL check (src/backend/storage/ipc/latch.c)

Started by Ranier Vilelaabout 5 years ago11 messages
#1Ranier Vilela
ranier.vf@gmail.com
1 attachment(s)

Hi,

Per Coverity.

If test set->latch against NULL, is why it can be NULL.
ResetEvent can dereference NULL.

regards,
Ranier Vilela

Attachments:

fix_dereference_before_null_check_latch.patchapplication/octet-stream; name=fix_dereference_before_null_check_latch.patchDownload
diff --git a/src/backend/storage/ipc/latch.c b/src/backend/storage/ipc/latch.c
index 24d44c982d..f876dc7d97 100644
--- a/src/backend/storage/ipc/latch.c
+++ b/src/backend/storage/ipc/latch.c
@@ -1835,15 +1835,18 @@ WaitEventSetWaitBlock(WaitEventSet *set, int cur_timeout,
 
 	if (cur_event->events == WL_LATCH_SET)
 	{
-		if (!ResetEvent(set->latch->event))
-			elog(ERROR, "ResetEvent failed: error code %lu", GetLastError());
-
-		if (set->latch && set->latch->is_set)
+		if (set->latch)
 		{
-			occurred_events->fd = PGINVALID_SOCKET;
-			occurred_events->events = WL_LATCH_SET;
-			occurred_events++;
-			returned_events++;
+		    if (!ResetEvent(set->latch->event))
+			    elog(ERROR, "ResetEvent failed: error code %lu", GetLastError());
+
+		    if (set->latch->is_set)
+		    {
+			    occurred_events->fd = PGINVALID_SOCKET;
+			    occurred_events->events = WL_LATCH_SET;
+			    occurred_events++;
+			    returned_events++;
+			}
 		}
 	}
 	else if (cur_event->events == WL_POSTMASTER_DEATH)
#2Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Ranier Vilela (#1)
Re: Dereference before NULL check (src/backend/storage/ipc/latch.c)

At Sat, 31 Oct 2020 11:40:53 -0300, Ranier Vilela <ranier.vf@gmail.com> wrote in

Hi,

Per Coverity.

If test set->latch against NULL, is why it can be NULL.
ResetEvent can dereference NULL.

If the returned event is WL_LATCH_SET, set->latch cannot be NULL. We
shouldn't inadvertently ignore the unexpected or broken situation.We
could put Assert instead, but I think that we don't need do something
here at all since SIGSEGV would be raised at the right location.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

#3Thomas Munro
thomas.munro@gmail.com
In reply to: Kyotaro Horiguchi (#2)
Re: Dereference before NULL check (src/backend/storage/ipc/latch.c)

On Mon, Nov 2, 2020 at 1:49 PM Kyotaro Horiguchi
<horikyota.ntt@gmail.com> wrote:

At Sat, 31 Oct 2020 11:40:53 -0300, Ranier Vilela <ranier.vf@gmail.com> wrote in

Per Coverity.

If test set->latch against NULL, is why it can be NULL.
ResetEvent can dereference NULL.

If the returned event is WL_LATCH_SET, set->latch cannot be NULL. We
shouldn't inadvertently ignore the unexpected or broken situation.We
could put Assert instead, but I think that we don't need do something
here at all since SIGSEGV would be raised at the right location.

Hmm. I changed that to support set->latch == NULL, so that you can
use the long lived WES in the rare code paths that call WaitLatch()
without a latch (for example the code I proposed at [1]/messages/by-id/CA+hUKGK1607VmtrDUHQXrsooU=ap4g4R2yaoByWOOA3m8xevUQ@mail.gmail.com). The Windows
version leaves the event handle of the most recently used latch in
set->handles[n] (because AFAICS there is no way to have a "hole" in
the handles array). The event can fire while you are waiting on "no
latch". Perhaps it should be changed to
ResetEvent(set->handles[cur_event->pos + 1])?

[1]: /messages/by-id/CA+hUKGK1607VmtrDUHQXrsooU=ap4g4R2yaoByWOOA3m8xevUQ@mail.gmail.com

#4Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Thomas Munro (#3)
1 attachment(s)
Re: Dereference before NULL check (src/backend/storage/ipc/latch.c)

At Mon, 2 Nov 2020 16:22:09 +1300, Thomas Munro <thomas.munro@gmail.com> wrote in

On Mon, Nov 2, 2020 at 1:49 PM Kyotaro Horiguchi
<horikyota.ntt@gmail.com> wrote:

At Sat, 31 Oct 2020 11:40:53 -0300, Ranier Vilela <ranier.vf@gmail.com> wrote in

Per Coverity.

If test set->latch against NULL, is why it can be NULL.
ResetEvent can dereference NULL.

If the returned event is WL_LATCH_SET, set->latch cannot be NULL. We
shouldn't inadvertently ignore the unexpected or broken situation.We
could put Assert instead, but I think that we don't need do something
here at all since SIGSEGV would be raised at the right location.

Hmm. I changed that to support set->latch == NULL, so that you can
use the long lived WES in the rare code paths that call WaitLatch()
without a latch (for example the code I proposed at [1]). The Windows

Ooo. We don't update epoll events in that case. Ok, I understand
WL_LATCH_SET can fire while set->latch == NULL.

(I was confused by WaitEventAdjust* asserts set->latch != NULL for
WL_LATCH_SET. Shouldn't we move the check from ModifyWaitEvent() to
WaitEventAdjust*()?))

version leaves the event handle of the most recently used latch in
set->handles[n] (because AFAICS there is no way to have a "hole" in
the handles array). The event can fire while you are waiting on "no
latch". Perhaps it should be changed to
ResetEvent(set->handles[cur_event->pos + 1])?

[1] /messages/by-id/CA+hUKGK1607VmtrDUHQXrsooU=ap4g4R2yaoByWOOA3m8xevUQ@mail.gmail.com

Seems right. Just doing that *seems* to work fine, but somehow I
cannot build on Windows for now...

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachments:

avoid_segv_on_firing_latch_we_re_not_waiting_on_on_win32.patchtext/x-patch; charset=us-asciiDownload
diff --git a/src/backend/storage/ipc/latch.c b/src/backend/storage/ipc/latch.c
index 24d44c982d..318261962e 100644
--- a/src/backend/storage/ipc/latch.c
+++ b/src/backend/storage/ipc/latch.c
@@ -1835,7 +1835,11 @@ WaitEventSetWaitBlock(WaitEventSet *set, int cur_timeout,
 
 	if (cur_event->events == WL_LATCH_SET)
 	{
-		if (!ResetEvent(set->latch->event))
+		/*
+		 * We cannot use set->latch->event to reset the fired event if we
+		 * actually aren't waiting on this latch now.
+		 */
+		if (!ResetEvent(set->handles[cur_event->pos + 1]))
 			elog(ERROR, "ResetEvent failed: error code %lu", GetLastError());
 
 		if (set->latch && set->latch->is_set)
#5Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Kyotaro Horiguchi (#4)
Re: Dereference before NULL check (src/backend/storage/ipc/latch.c)

At Mon, 02 Nov 2020 14:33:40 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in

At Mon, 2 Nov 2020 16:22:09 +1300, Thomas Munro <thomas.munro@gmail.com> wrote in

On Mon, Nov 2, 2020 at 1:49 PM Kyotaro Horiguchi
<horikyota.ntt@gmail.com> wrote:

At Sat, 31 Oct 2020 11:40:53 -0300, Ranier Vilela <ranier.vf@gmail.com> wrote in

Per Coverity.

If test set->latch against NULL, is why it can be NULL.
ResetEvent can dereference NULL.

If the returned event is WL_LATCH_SET, set->latch cannot be NULL. We
shouldn't inadvertently ignore the unexpected or broken situation.We
could put Assert instead, but I think that we don't need do something
here at all since SIGSEGV would be raised at the right location.

Hmm. I changed that to support set->latch == NULL, so that you can
use the long lived WES in the rare code paths that call WaitLatch()
without a latch (for example the code I proposed at [1]). The Windows

Ooo. We don't update epoll events in that case. Ok, I understand
WL_LATCH_SET can fire while set->latch == NULL.

(I was confused by WaitEventAdjust* asserts set->latch != NULL for
WL_LATCH_SET. Isn't it better if we moved the check on latch from
ModifyWaitEvent() to WaitEventAdjust*()?))

version leaves the event handle of the most recently used latch in
set->handles[n] (because AFAICS there is no way to have a "hole" in
the handles array). The event can fire while you are waiting on "no
latch". Perhaps it should be changed to
ResetEvent(set->handles[cur_event->pos + 1])?

[1] /messages/by-id/CA+hUKGK1607VmtrDUHQXrsooU=ap4g4R2yaoByWOOA3m8xevUQ@mail.gmail.com

Seems right. Just doing that *seems* to work fine, but somehow I
cannot build on Windows for now...

That was caused by a leftover change on config_default.pl I made when
I tried to enable NLS.

I called SetLatch() during WaitLatch(NULL, ) but that doesn't fire
WL_LATCH_SET event for me on Windows. (I got it fired on Linux..) On
Windows, the latch is detected after exiting the WaitLatch()
call. Seems like MyLatch of waiter is different from
peerPGPROC->procLatch. And... an update for Visual Studio broke my
environment... I will investigate this further but everything feel
cumbersome on Windows...

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

#6Ranier Vilela
ranier.vf@gmail.com
In reply to: Kyotaro Horiguchi (#5)
Re: Dereference before NULL check (src/backend/storage/ipc/latch.c)

Em seg., 2 de nov. de 2020 às 05:25, Kyotaro Horiguchi <
horikyota.ntt@gmail.com> escreveu:

At Mon, 02 Nov 2020 14:33:40 +0900 (JST), Kyotaro Horiguchi <
horikyota.ntt@gmail.com> wrote in

At Mon, 2 Nov 2020 16:22:09 +1300, Thomas Munro <thomas.munro@gmail.com>

wrote in

On Mon, Nov 2, 2020 at 1:49 PM Kyotaro Horiguchi
<horikyota.ntt@gmail.com> wrote:

At Sat, 31 Oct 2020 11:40:53 -0300, Ranier Vilela <

ranier.vf@gmail.com> wrote in

Per Coverity.

If test set->latch against NULL, is why it can be NULL.
ResetEvent can dereference NULL.

If the returned event is WL_LATCH_SET, set->latch cannot be NULL. We
shouldn't inadvertently ignore the unexpected or broken situation.We
could put Assert instead, but I think that we don't need do something
here at all since SIGSEGV would be raised at the right location.

Hmm. I changed that to support set->latch == NULL, so that you can
use the long lived WES in the rare code paths that call WaitLatch()
without a latch (for example the code I proposed at [1]). The Windows

Ooo. We don't update epoll events in that case. Ok, I understand
WL_LATCH_SET can fire while set->latch == NULL.

(I was confused by WaitEventAdjust* asserts set->latch != NULL for
WL_LATCH_SET. Isn't it better if we moved the check on latch from
ModifyWaitEvent() to WaitEventAdjust*()?))

version leaves the event handle of the most recently used latch in
set->handles[n] (because AFAICS there is no way to have a "hole" in
the handles array). The event can fire while you are waiting on "no
latch". Perhaps it should be changed to
ResetEvent(set->handles[cur_event->pos + 1])?

[1]

/messages/by-id/CA+hUKGK1607VmtrDUHQXrsooU=ap4g4R2yaoByWOOA3m8xevUQ@mail.gmail.com

Seems right. Just doing that *seems* to work fine, but somehow I
cannot build on Windows for now...

That was caused by a leftover change on config_default.pl I made when
I tried to enable NLS.

I called SetLatch() during WaitLatch(NULL, ) but that doesn't fire
WL_LATCH_SET event for me on Windows. (I got it fired on Linux..) On
Windows, the latch is detected after exiting the WaitLatch()
call. Seems like MyLatch of waiter is different from
peerPGPROC->procLatch. And... an update for Visual Studio broke my
environment... I will investigate this further but everything feel
cumbersome on Windows...

I can build.
vc_regress is it enough to test it?

regards,
Ranier Vilela

#7Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Kyotaro Horiguchi (#5)
1 attachment(s)
Re: Dereference before NULL check (src/backend/storage/ipc/latch.c)

At Mon, 02 Nov 2020 17:25:04 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in

I called SetLatch() during WaitLatch(NULL, ) but that doesn't fire
WL_LATCH_SET event for me on Windows. (I got it fired on Linux..) On
Windows, the latch is detected after exiting the WaitLatch()
call. Seems like MyLatch of waiter is different from
peerPGPROC->procLatch. And... an update for Visual Studio broke my
environment... I will investigate this further but everything feel
cumbersome on Windows...

I managed to reproduce the issue. FWIW the attached modifies
pg_backend_pid() to call "WaitLatch(NULL," and
pg_terminate_backend(pid) to SetLatch() to the process latch of the
pid. (It's minunderstanding that I could reproduce this on Linux.)

Session A:
=# select pg_backend_pid(); -- sleeps for 10 seconds.

Session B:
=# select pg_terminate_backend(A-pid);

[11628]: LOG: terminating any other active server processes
[11628]: LOG: terminating any other active server processes
[11628]: LOG: terminating any other active server processes
[11628]: LOG: terminating any other active server processes
[2948]: WARNING: terminating connection because of crash of another server process 2
2

With the fix patch, it changes to:

[16632]: LOG: FALSE LATCH: 0000000000000000

rebards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachments:

repro.difftext/x-patch; charset=us-asciiDownload
diff --git a/src/backend/storage/ipc/latch.c b/src/backend/storage/ipc/latch.c
index 24d44c98..e900bd3c 100644
--- a/src/backend/storage/ipc/latch.c
+++ b/src/backend/storage/ipc/latch.c
@@ -1845,6 +1845,8 @@ WaitEventSetWaitBlock(WaitEventSet *set, int cur_timeout,
 			occurred_events++;
 			returned_events++;
 		}
+		else if (!set->latch)
+		  elog(LOG, "FALSE LATCH: %p", set->latch);
 	}
 	else if (cur_event->events == WL_POSTMASTER_DEATH)
 	{
diff --git a/src/backend/storage/ipc/signalfuncs.c b/src/backend/storage/ipc/signalfuncs.c
index d822e82c..b7340bf4 100644
--- a/src/backend/storage/ipc/signalfuncs.c
+++ b/src/backend/storage/ipc/signalfuncs.c
@@ -134,7 +134,13 @@ pg_cancel_backend(PG_FUNCTION_ARGS)
 Datum
 pg_terminate_backend(PG_FUNCTION_ARGS)
 {
-	int			r = pg_signal_backend(PG_GETARG_INT32(0), SIGTERM);
+	int r = 0;
+	PGPROC *p = BackendPidGetProc(PG_GETARG_INT32(0));
+	
+	if (p)
+	    SetLatch(&p->procLatch);
+
+	PG_RETURN_BOOL(false);
 
 	if (r == SIGNAL_BACKEND_NOSUPERUSER)
 		ereport(ERROR,
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index a210fc93..1e2b97a9 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -960,6 +960,9 @@ pg_stat_get_activity(PG_FUNCTION_ARGS)
 Datum
 pg_backend_pid(PG_FUNCTION_ARGS)
 {
+	WaitLatch(MyLatch, WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH, 1, WAIT_EVENT_RECOVERY_PAUSE);
+	WaitLatch(NULL, WL_TIMEOUT | WL_EXIT_ON_PM_DEATH, 10000, WAIT_EVENT_RECOVERY_PAUSE);
+
 	PG_RETURN_INT32(MyProcPid);
 }
 
#8Thomas Munro
thomas.munro@gmail.com
In reply to: Kyotaro Horiguchi (#7)
Re: Dereference before NULL check (src/backend/storage/ipc/latch.c)

On Tue, Nov 3, 2020 at 12:50 AM Kyotaro Horiguchi
<horikyota.ntt@gmail.com> wrote:

With the fix patch, it changes to:

[16632] LOG: FALSE LATCH: 0000000000000000

Nice repo. But is it OK to not reset the Win32 event in this case?
Does it still work correctly if you wait on the latch after that
happened, and perhaps after the PG latch is reset?

#9Kyotaro Horiguchi
horikyota.ntt@gmail.com
In reply to: Thomas Munro (#8)
Re: Dereference before NULL check (src/backend/storage/ipc/latch.c)

At Tue, 3 Nov 2020 20:44:23 +1300, Thomas Munro <thomas.munro@gmail.com> wrote in

On Tue, Nov 3, 2020 at 12:50 AM Kyotaro Horiguchi
<horikyota.ntt@gmail.com> wrote:

With the fix patch, it changes to:

[16632] LOG: FALSE LATCH: 0000000000000000

Nice repo. But is it OK to not reset the Win32 event in this case?
Does it still work correctly if you wait on the latch after that
happened, and perhaps after the PG latch is reset?

I'm not sure what is the point of the question, but the previous patch
doesn't omit resetting the Win32-event in that case. In the same way
with other implements of the same function, it resets the trigger that
woke up the process since the trigger is no longer needed even if we
are not waiting on it.

If we call WaitLatch(OrSocket) that waits on the latch, it immediately
returns because the latch is set. If we called ResetLatch before the
next call to WaitLatch(), it correctly waits on a trigger to be
pulled.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

#10Ranier Vilela
ranier.vf@gmail.com
In reply to: Kyotaro Horiguchi (#9)
Re: Dereference before NULL check (src/backend/storage/ipc/latch.c)

Em ter., 3 de nov. de 2020 às 22:09, Kyotaro Horiguchi <
horikyota.ntt@gmail.com> escreveu:

At Tue, 3 Nov 2020 20:44:23 +1300, Thomas Munro <thomas.munro@gmail.com>
wrote in

On Tue, Nov 3, 2020 at 12:50 AM Kyotaro Horiguchi
<horikyota.ntt@gmail.com> wrote:

With the fix patch, it changes to:

[16632] LOG: FALSE LATCH: 0000000000000000

Nice repo. But is it OK to not reset the Win32 event in this case?
Does it still work correctly if you wait on the latch after that
happened, and perhaps after the PG latch is reset?

I'm not sure what is the point of the question, but the previous patch
doesn't omit resetting the Win32-event in that case. In the same way
with other implements of the same function, it resets the trigger that
woke up the process since the trigger is no longer needed even if we
are not waiting on it.

If we call WaitLatch(OrSocket) that waits on the latch, it immediately
returns because the latch is set. If we called ResetLatch before the
next call to WaitLatch(), it correctly waits on a trigger to be
pulled.

+1
The patch for me is syntactically equal to the code changed and
avoids the dereference.

regards,
Ranier Vilela

#11Thomas Munro
thomas.munro@gmail.com
In reply to: Ranier Vilela (#10)
Re: Dereference before NULL check (src/backend/storage/ipc/latch.c)

On Thu, Nov 5, 2020 at 10:47 AM Ranier Vilela <ranier.vf@gmail.com> wrote:

Em ter., 3 de nov. de 2020 às 22:09, Kyotaro Horiguchi <horikyota.ntt@gmail.com> escreveu:

If we call WaitLatch(OrSocket) that waits on the latch, it immediately
returns because the latch is set. If we called ResetLatch before the
next call to WaitLatch(), it correctly waits on a trigger to be
pulled.

+1
The patch for me is syntactically equal to the code changed and
avoids the dereference.

Thanks! Pushed.