ubsan fails on 32bit builds

Started by Andres Freundabout 3 years ago6 messages
#1Andres Freund
andres@anarazel.de

Hi,

I am working on polishing my patch to make CI use sanitizers. Unfortunately
using -fsanitize=alignment,undefined causes tests to fail on 32bit builds.

https://cirrus-ci.com/task/5092504471601152
https://api.cirrus-ci.com/v1/artifact/task/5092504471601152/testrun/build-32/testrun/recovery/022_crash_temp_files/log/022_crash_temp_files_node_crash.log

../src/backend/storage/lmgr/proc.c:1173:2: runtime error: member access within misaligned address 0xf4019e54 for type 'struct PGPROC', which requires 8 byte alignment
0xf4019e54: note: pointer points here
e0 0d 09 f4 54 9e 01 f4 54 9e 01 f4 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
^
==65203==Using libbacktrace symbolizer.
#0 0x57076f46 in ProcSleep ../src/backend/storage/lmgr/proc.c:1173
#1 0x57054cf7 in WaitOnLock ../src/backend/storage/lmgr/lock.c:1859
#2 0x57058e4f in LockAcquireExtended ../src/backend/storage/lmgr/lock.c:1101
#3 0x57058f82 in LockAcquire ../src/backend/storage/lmgr/lock.c:752
#4 0x57051bb8 in XactLockTableWait ../src/backend/storage/lmgr/lmgr.c:702
#5 0x569c31b3 in _bt_doinsert ../src/backend/access/nbtree/nbtinsert.c:225
#6 0x569cff09 in btinsert ../src/backend/access/nbtree/nbtree.c:200
#7 0x569ac19d in index_insert ../src/backend/access/index/indexam.c:193
#8 0x56c72af6 in ExecInsertIndexTuples ../src/backend/executor/execIndexing.c:416
#9 0x56d014c7 in ExecInsert ../src/backend/executor/nodeModifyTable.c:1065
...

I can reproduce this locally.

At first I thought the problem was caused by:
46d6e5f5679 Display the time when the process started waiting for the lock, in pg_locks, take 2

as pg_atomic_uint64 is 8 byte aligned on x86 - otherwise one gets into
terrible terrible performance territory because atomics can be split across
cachelines - but 46d6e5f5679 didn't teach ProcGlobalShmemSize() /
InitProcGlobal() that allocations need to be aligned to a larger
size. However, we've made ShmemAllocRaw() use cacheline alignment, which
should suffice. And indeed - ProcGlobal->allProcs is aligned correctly, and
sizeof(PGPROC) % 8 == 0. It doesn't seem great to rely on that, but ...

Printing out *proc in proc.c:1173 seems indicates clearly that it's not a
valid proc for some reason.

(gdb) p myHeldLocks
$26 = 0
(gdb) p lock->waitProcs
$27 = {links = {prev = 0xf33c4b5c, next = 0xf33c4b5c}, size = 0}
(gdb) p &(waitQueue->links)
$29 = (SHM_QUEUE *) 0xf33c4b5c
(gdb) p proc
$30 = (PGPROC *) 0xf33c4b5c

Afaict the problem is that
proc = (PGPROC *) &(waitQueue->links);

is a gross gross hack - this isn't actually a PGPROC, it's pointing to an
SHM_QUEUE, but *not* one embedded in PGPROC. It kinda works because ->links
is at offset 0 in PGPROC, which means that
SHMQueueInsertBefore(&(proc->links), &(MyProc->links));
will turn &proc->links back into waitQueue->links. Which we then can enqueue
again.

I don't see the point of this hack, even leaving ubsan's valid complaints
aside. Why bother having this, sometimes, fake PGPROC pointer when we could
just use a SHM_QUEUE* to determine the insertion point?

Greetings,

Andres Freund

#2Andres Freund
andres@anarazel.de
In reply to: Andres Freund (#1)
1 attachment(s)
Re: ubsan fails on 32bit builds

Hi,

On 2022-11-16 17:42:30 -0800, Andres Freund wrote:

Afaict the problem is that
proc = (PGPROC *) &(waitQueue->links);

is a gross gross hack - this isn't actually a PGPROC, it's pointing to an
SHM_QUEUE, but *not* one embedded in PGPROC. It kinda works because ->links
is at offset 0 in PGPROC, which means that
SHMQueueInsertBefore(&(proc->links), &(MyProc->links));
will turn &proc->links back into waitQueue->links. Which we then can enqueue
again.

I don't see the point of this hack, even leaving ubsan's valid complaints
aside. Why bother having this, sometimes, fake PGPROC pointer when we could
just use a SHM_QUEUE* to determine the insertion point?

As done in the attached patch. With this ubsan passes both on 32bit and 64bit.

Greetings,

Andres Freund

Attachments:

0001-Fix-mislabeling-of-PROC_QUEUE-links-as-PGPROC-fixing.patchtext/x-diff; charset=us-asciiDownload
From 776f002952f92c764e98dfe8c180a00c1f60ab09 Mon Sep 17 00:00:00 2001
From: Andres Freund <andres@anarazel.de>
Date: Wed, 16 Nov 2022 20:00:59 -0800
Subject: [PATCH 1/3] Fix mislabeling of PROC_QUEUE->links as PGPROC, fixing
 ubsan on 32bit

Author:
Reviewed-by:
Discussion: https://postgr.es/m/20221117014230.op5kmgypdv2dtqsf@awork3.anarazel.de
Backpatch:
---
 src/backend/storage/lmgr/proc.c | 22 +++++++++++++---------
 1 file changed, 13 insertions(+), 9 deletions(-)

diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c
index 13fa07b0ff7..214866502ed 100644
--- a/src/backend/storage/lmgr/proc.c
+++ b/src/backend/storage/lmgr/proc.c
@@ -1050,13 +1050,13 @@ ProcSleep(LOCALLOCK *locallock, LockMethod lockMethodTable)
 	uint32		hashcode = locallock->hashcode;
 	LWLock	   *partitionLock = LockHashPartitionLock(hashcode);
 	PROC_QUEUE *waitQueue = &(lock->waitProcs);
+	SHM_QUEUE  *waitQueuePos;
 	LOCKMASK	myHeldLocks = MyProc->heldLocks;
 	TimestampTz standbyWaitStart = 0;
 	bool		early_deadlock = false;
 	bool		allow_autovacuum_cancel = true;
 	bool		logged_recovery_conflict = false;
 	ProcWaitStatus myWaitStatus;
-	PGPROC	   *proc;
 	PGPROC	   *leader = MyProc->lockGroupLeader;
 	int			i;
 
@@ -1104,13 +1104,16 @@ ProcSleep(LOCALLOCK *locallock, LockMethod lockMethodTable)
 	 * we are only considering the part of the wait queue before my insertion
 	 * point.
 	 */
-	if (myHeldLocks != 0)
+	if (myHeldLocks != 0 && waitQueue->size > 0)
 	{
 		LOCKMASK	aheadRequests = 0;
+		SHM_QUEUE  *proc_node;
 
-		proc = (PGPROC *) waitQueue->links.next;
+		proc_node = waitQueue->links.next;
 		for (i = 0; i < waitQueue->size; i++)
 		{
+			PGPROC	   *proc = (PGPROC *) proc_node;
+
 			/*
 			 * If we're part of the same locking group as this waiter, its
 			 * locks neither conflict with ours nor contribute to
@@ -1118,7 +1121,7 @@ ProcSleep(LOCALLOCK *locallock, LockMethod lockMethodTable)
 			 */
 			if (leader != NULL && leader == proc->lockGroupLeader)
 			{
-				proc = (PGPROC *) proc->links.next;
+				proc_node = proc->links.next;
 				continue;
 			}
 			/* Must he wait for me? */
@@ -1157,20 +1160,21 @@ ProcSleep(LOCALLOCK *locallock, LockMethod lockMethodTable)
 		}
 
 		/*
-		 * If we fall out of loop normally, proc points to waitQueue head, so
-		 * we will insert at tail of queue as desired.
+		 * If we iterated through the whole queue, cur points to the waitQueue
+		 * head, so we will insert at tail of queue as desired.
 		 */
+		waitQueuePos = proc_node;
 	}
 	else
 	{
 		/* I hold no locks, so I can't push in front of anyone. */
-		proc = (PGPROC *) &(waitQueue->links);
+		waitQueuePos = &waitQueue->links;
 	}
 
 	/*
-	 * Insert self into queue, ahead of the given proc (or at tail of queue).
+	 * Insert self into queue, at the position determined above.
 	 */
-	SHMQueueInsertBefore(&(proc->links), &(MyProc->links));
+	SHMQueueInsertBefore(waitQueuePos, &MyProc->links);
 	waitQueue->size++;
 
 	lock->waitMask |= LOCKBIT_ON(lockmode);
-- 
2.38.0

#3Robert Haas
robertmhaas@gmail.com
In reply to: Andres Freund (#1)
Re: ubsan fails on 32bit builds

On Wed, Nov 16, 2022 at 8:42 PM Andres Freund <andres@anarazel.de> wrote:

Afaict the problem is that
proc = (PGPROC *) &(waitQueue->links);

is a gross gross hack - this isn't actually a PGPROC, it's pointing to an
SHM_QUEUE, but *not* one embedded in PGPROC. It kinda works because ->links
is at offset 0 in PGPROC, which means that
SHMQueueInsertBefore(&(proc->links), &(MyProc->links));
will turn &proc->links back into waitQueue->links. Which we then can enqueue
again.

Not that I object to a targeted fix, but it's been 10 years since
slist and dlist were committed, and we really ought to eliminate
SHM_QUEUE entirely in favor of using those. It's basically an
open-coded implementation of something for which we now have a
toolkit. Not that it's impossible to make this kind of mistake with a
toolkit, but in general open-coding the same logic in multiple places
increases the risk of bugs.

--
Robert Haas
EDB: http://www.enterprisedb.com

#4Andres Freund
andres@anarazel.de
In reply to: Robert Haas (#3)
Re: ubsan fails on 32bit builds

Hi,

On 2022-11-17 14:20:47 -0500, Robert Haas wrote:

On Wed, Nov 16, 2022 at 8:42 PM Andres Freund <andres@anarazel.de> wrote:

Afaict the problem is that
proc = (PGPROC *) &(waitQueue->links);

is a gross gross hack - this isn't actually a PGPROC, it's pointing to an
SHM_QUEUE, but *not* one embedded in PGPROC. It kinda works because ->links
is at offset 0 in PGPROC, which means that
SHMQueueInsertBefore(&(proc->links), &(MyProc->links));
will turn &proc->links back into waitQueue->links. Which we then can enqueue
again.

Not that I object to a targeted fix

Should we backpatch this fix? Likely this doesn't cause active breakage
outside of 32bit builds under ubsan, but that's not an unreasonable thing to
want to do in the backbranches.

but it's been 10 years since
slist and dlist were committed, and we really ought to eliminate
SHM_QUEUE entirely in favor of using those. It's basically an
open-coded implementation of something for which we now have a
toolkit. Not that it's impossible to make this kind of mistake with a
toolkit, but in general open-coding the same logic in multiple places
increases the risk of bugs.

Agreed. I had started on a set of patches for some of the SHM_QUEUE uses, but
somehow we ended up a bit stuck on the naming of dlist_delete variant that
afterwards zeroes next/prev so we can replace SHMQueueIsDetached() uses.

Should probably find and rebase those patches...

Greetings,

Andres Freund

#5Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#4)
Re: ubsan fails on 32bit builds

Andres Freund <andres@anarazel.de> writes:

On 2022-11-17 14:20:47 -0500, Robert Haas wrote:

Not that I object to a targeted fix

Should we backpatch this fix? Likely this doesn't cause active breakage
outside of 32bit builds under ubsan, but that's not an unreasonable thing to
want to do in the backbranches.

+1 for backpatching what you showed.

but it's been 10 years since
slist and dlist were committed, and we really ought to eliminate
SHM_QUEUE entirely in favor of using those.

Agreed. I had started on a set of patches for some of the SHM_QUEUE uses, but
somehow we ended up a bit stuck on the naming of dlist_delete variant that
afterwards zeroes next/prev so we can replace SHMQueueIsDetached() uses.

Also +1, but of course for HEAD only.

regards, tom lane

#6Thomas Munro
thomas.munro@gmail.com
In reply to: Andres Freund (#4)
Re: ubsan fails on 32bit builds

On Fri, Nov 18, 2022 at 9:13 AM Andres Freund <andres@anarazel.de> wrote:

Agreed. I had started on a set of patches for some of the SHM_QUEUE uses, but
somehow we ended up a bit stuck on the naming of dlist_delete variant that
afterwards zeroes next/prev so we can replace SHMQueueIsDetached() uses.

Should probably find and rebase those patches...

/messages/by-id/20200211042229.msv23badgqljrdg2@alap3.anarazel.de