[PATCH] Fix Proposal - Deadlock Issue in Single User Mode When IO Failure Occurs
Greetings,
Recently, we hit a few occurrences of a deadlock when an IO failure (including disk full and random remote-disk IO failures) happens in single user mode. We found the issue on both Linux and Windows, in multiple PostgreSQL versions.
Here are the steps to repro on Linux (the Windows repro is similar):
1. Get latest PostgreSQL code, build and install the executables.
$ git clone https://git.postgresql.org/git/postgresql.git
$ cd postgresql
$ PGROOT=$(pwd)
$ git checkout REL_11_STABLE
$ mkdir build
$ cd build
$ ../configure --prefix=/path/to/postgres
$ make && make install
2. Run initdb to initialize a PG database folder.
$ /path/to/postgres/bin/initdb -D /path/to/data
3. Because the "unable to write relation data" scenario is difficult to hit naturally, even when reserved space is turned off, I have prepared a small patch (see attachment "emulate-error.patch") that forces an error when PG tries to write data to relation files. We can just apply the patch; there is no need to spend effort flooding the disk with data any more.
$ cd $PGROOT
$ git apply /path/to/emulate-error.patch
$ cd build
$ make && make install
4. Connect to the newly initialized database cluster in single user mode, create a table, insert some data into it, and then either do a checkpoint or directly give EOF. Either way we hit the deadlock issue, and the process will not exit until we kill it.
Do a checkpoint explicitly:
$ /path/to/postgres/bin/postgres --single -D /path/to/data/ postgres -c exit_on_error=true <<EOF
create table t1(a int);
insert into t1 values (1), (2), (3);
checkpoint;
EOF
PostgreSQL stand-alone backend 11.1
backend> backend> backend> 2018-11-29 02:45:27.891 UTC [18806] FATAL: Emulate exception in mdwrite() when writing to disk
2018-11-29 02:55:27.891 UTC [18806] CONTEXT: writing block 8 of relation base/12368/1247
2018-11-29 02:55:27.891 UTC [18806] STATEMENT: checkpoint;
2018-11-29 02:55:27.900 UTC [18806] FATAL: Emulate exception in mdwrite() when writing to disk
2018-11-29 02:55:27.900 UTC [18806] CONTEXT: writing block 8 of relation base/12368/1247
Or directly give an EOF:
$ /path/to/postgres/bin/postgres --single -D /path/to/data/ postgres -c exit_on_error=true <<EOF
create table t1(a int);
insert into t1 values (1), (2), (3);
EOF
PostgreSQL stand-alone backend 11.1
backend> backend> backend> 2018-11-29 02:55:24.438 UTC [18149] FATAL: Emulate exception in mdwrite() when writing to disk
2018-11-29 02:45:24.438 UTC [18149] CONTEXT: writing block 8 of relation base/12368/1247
5. Moreover, when we try to recover the database in single user mode, we hit the issue again: the process neither brings up the database nor exits.
$ /path/to/postgres/bin/postgres --single -D /path/to/data/ postgres -c exit_on_error=true
2018-11-29 02:59:33.257 UTC [19058] LOG: database system shutdown was interrupted; last known up at 2018-11-29 02:58:49 UTC
2018-11-29 02:59:33.485 UTC [19058] LOG: database system was not properly shut down; automatic recovery in progress
2018-11-29 02:59:33.500 UTC [19058] LOG: redo starts at 0/1672E40
2018-11-29 02:59:33.500 UTC [19058] LOG: invalid record length at 0/1684B90: wanted 24, got 0
2018-11-29 02:59:33.500 UTC [19058] LOG: redo done at 0/1684B68
2018-11-29 02:59:33.500 UTC [19058] LOG: last completed transaction was at log time 2018-11-29 02:58:49.856663+00
2018-11-29 02:59:33.547 UTC [19058] FATAL: Emulate exception in mdwrite() when writing to disk
2018-11-29 02:59:33.547 UTC [19058] CONTEXT: writing block 8 of relation base/12368/1247
Analysis:
So, what happened? There are actually 2 types of deadlock here, both with the same root cause. Let's first look at the scenario in step #5. Here, the deadlock happens when a disk IO failure occurs inside StartupXLOG(). If we attach a debugger to the PG process, we see it is stuck acquiring the buffer's lw-lock in AbortBufferIO().
void
AbortBufferIO(void)
{
    BufferDesc *buf = InProgressBuf;

    if (buf)
    {
        uint32      buf_state;

        /*
         * Since LWLockReleaseAll has already been called, we're not holding
         * the buffer's io_in_progress_lock. We have to re-acquire it so that
         * we can use TerminateBufferIO. Anyone who's executing WaitIO on the
         * buffer will be in a busy spin until we succeed in doing this.
         */
        LWLockAcquire(BufferDescriptorGetIOLock(buf), LW_EXCLUSIVE);
This is because the same lock was already acquired, in StartBufferIO(), before the buffer manager attempted to flush the buffer page.
static bool
StartBufferIO(BufferDesc *buf, bool forInput)
{
    uint32      buf_state;

    Assert(!InProgressBuf);

    for (;;)
    {
        /*
         * Grab the io_in_progress lock so that other processes can wait for
         * me to finish the I/O.
         */
        LWLockAcquire(BufferDescriptorGetIOLock(buf), LW_EXCLUSIVE);

        buf_state = LockBufHdr(buf);

        if (!(buf_state & BM_IO_IN_PROGRESS))
            break;
Reading the code, AtProcExit_Buffers() assumes all lw-locks have already been released. However, in single user mode, at the time StartupXLOG() is executed there is no before_shmem_exit/on_shmem_exit callback registered yet that would release the lw-locks.
And since lw-locks are non-reentrant, the process gets stuck re-acquiring the same lock.
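To illustrate the self-deadlock, here is a standalone analogy only (not PostgreSQL code, just a sketch of the mechanism): an lw-lock is ultimately backed by a process semaphore, and a process that tries to acquire a lock it already holds simply waits on itself.
/*
 * Standalone analogy only (not PostgreSQL code): a non-reentrant lock backed
 * by a binary semaphore, acquired twice by the same process.
 */
#include <errno.h>
#include <semaphore.h>
#include <stdio.h>

int
main(void)
{
    sem_t       lock;

    sem_init(&lock, 0, 1);      /* binary semaphore, like a PGSemaphore */

    sem_wait(&lock);            /* "StartBufferIO": take the IO lock */

    /* "AbortBufferIO": the same process tries to take the same lock again */
    if (sem_trywait(&lock) == -1 && errno == EAGAIN)
        printf("second acquire would block forever: self-deadlock\n");
    /* the real code uses a blocking acquire here, hence the stuck backend */

    sem_destroy(&lock);
    return 0;
}
(The real LWLockAcquire() ends up blocking on the process's semaphore via PGSemaphoreLock(), which is exactly frame #3 in the stack below.)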
Here is the call stack:
(gdb) bt
#0 0x00007f0fdb7cb6d6 in futex_abstimed_wait_cancelable (private=128, abstime=0x0, expected=0, futex_word=0x7f0fd14c81b8) at ../sysdeps/unix/sysv/linux/futex-internal.h:205
#1 do_futex_wait (sem=sem@entry=0x7f0fd14c81b8, abstime=0x0) at sem_waitcommon.c:111
#2 0x00007f0fdb7cb7c8 in __new_sem_wait_slow (sem=0x7f0fd14c81b8, abstime=0x0) at sem_waitcommon.c:181
#3 0x00005630d475658a in PGSemaphoreLock (sema=0x7f0fd14c81b8) at pg_sema.c:316
#4 0x00005630d47f689e in LWLockAcquire (lock=0x7f0fd9ae9c00, mode=LW_EXCLUSIVE) at /path/to/postgres/source/build/../src/backend/storage/lmgr/lwlock.c:1243
#5 0x00005630d47cd087 in AbortBufferIO () at /path/to/postgres/source/build/../src/backend/storage/buffer/bufmgr.c:3988
#6 0x00005630d47cb3f9 in AtProcExit_Buffers (code=1, arg=0) at /path/to/postgres/source/build/../src/backend/storage/buffer/bufmgr.c:2473
#7 0x00005630d47dbc32 in shmem_exit (code=1) at /path/to/postgres/source/build/../src/backend/storage/ipc/ipc.c:272
#8 0x00005630d47dba5e in proc_exit_prepare (code=1) at /path/to/postgres/source/build/../src/backend/storage/ipc/ipc.c:194
#9 0x00005630d47db9c6 in proc_exit (code=1) at /path/to/postgres/source/build/../src/backend/storage/ipc/ipc.c:107
#10 0x00005630d49811bc in errfinish (dummy=0) at /path/to/postgres/source/build/../src/backend/utils/error/elog.c:541
#11 0x00005630d4801f1f in mdwrite (reln=0x5630d6588c68, forknum=MAIN_FORKNUM, blocknum=8, buffer=0x7f0fd1ae9c00 "", skipFsync=false) at /path/to/postgres/source/build/../src/backend/storage/smgr/md.c:843
#12 0x00005630d4804716 in smgrwrite (reln=0x5630d6588c68, forknum=MAIN_FORKNUM, blocknum=8, buffer=0x7f0fd1ae9c00 "", skipFsync=false) at /path/to/postgres/source/build/../src/backend/storage/smgr/smgr.c:650
#13 0x00005630d47cb824 in FlushBuffer (buf=0x7f0fd19e9c00, reln=0x5630d6588c68) at /path/to/postgres/source/build/../src/backend/storage/buffer/bufmgr.c:2751
#14 0x00005630d47cb219 in SyncOneBuffer (buf_id=0, skip_recently_used=false, wb_context=0x7ffccc371a70) at /path/to/postgres/source/build/../src/backend/storage/buffer/bufmgr.c:2394
#15 0x00005630d47cab00 in BufferSync (flags=6) at /path/to/postgres/source/build/../src/backend/storage/buffer/bufmgr.c:1984
#16 0x00005630d47cb57f in CheckPointBuffers (flags=6) at /path/to/postgres/source/build/../src/backend/storage/buffer/bufmgr.c:2578
#17 0x00005630d44a685b in CheckPointGuts (checkPointRedo=23612304, flags=6) at /path/to/postgres/source/build/../src/backend/access/transam/xlog.c:9149
#18 0x00005630d44a62cf in CreateCheckPoint (flags=6) at /path/to/postgres/source/build/../src/backend/access/transam/xlog.c:8937
#19 0x00005630d44a45e3 in StartupXLOG () at /path/to/postgres/source/build/../src/backend/access/transam/xlog.c:7723
#20 0x00005630d4995f88 in InitPostgres (in_dbname=0x5630d65582b0 "postgres", dboid=0, username=0x5630d653d7d0 "chengyu", useroid=0, out_dbname=0x0, override_allow_connections=false)
at /path/to/postgres/source/build/../src/backend/utils/init/postinit.c:636
#21 0x00005630d480b68b in PostgresMain (argc=7, argv=0x5630d6534d20, dbname=0x5630d65582b0 "postgres", username=0x5630d653d7d0 "chengyu") at /path/to/postgres/source/build/../src/backend/tcop/postgres.c:3810
#22 0x00005630d4695e8b in main (argc=7, argv=0x5630d6534d20) at /path/to/postgres/source/build/../src/backend/main/main.c:224
(gdb) p on_shmem_exit_list
$1 = {{function = 0x55801cc68f5f <AnonymousShmemDetach>, arg = 0},
{function = 0x55801cc68a4f <IpcMemoryDelete>, arg = 2490396},
{function = 0x55801cc689f8 <IpcMemoryDetach>, arg = 140602018975744},
{function = 0x55801cc6842e <ReleaseSemaphores>, arg = 0},
{function = 0x55801ccec48a <dsm_postmaster_shutdown>, arg = 140602018975744},
{function = 0x55801cd04053 <ProcKill>, arg = 0},
{function = 0x55801cd0402d <RemoveProcFromArray>, arg = 0},
{function = 0x55801ccf74e8 <CleanupInvalidationState>, arg = 140601991817088},
{function = 0x55801ccf446f <CleanupProcSignalState>, arg = 1},
{function = 0x55801ccdd3e5 <AtProcExit_Buffers>, arg = 0}, {function = 0x0, arg = 0}, {function = 0x0, arg = 0}, {function = 0x0, arg = 0}, {function = 0x0, arg = 0}, {function = 0x0, arg = 0}, {function = 0x0, arg = 0}, {function = 0x0, arg = 0}, {function = 0x0, arg = 0}, {function = 0x0, arg = 0}, {function = 0x0, arg = 0}}
(gdb) p before_shmem_exit_list
$2 = {{function = 0x0, arg = 0} <repeats 20 times>}
The second type is the one in step #4. By the time the "checkpoint" SQL command is executed, PG has already registered the before_shmem_exit callback ShutdownPostgres(), which releases all lw-locks as long as a transaction or sub-transaction is ongoing. So after the first IO error, the buffer page's lw-lock is released successfully. However, ShutdownXLOG() is then invoked from the on_shmem_exit list, and PG tries to flush the buffer pages again, which triggers a second IO error. Unlike the first time, the before/on_shmem_exit callbacks that have already run are not invoked again, because shmem_exit() decrements the list indexes as it goes. So the lw-locks for the buffer pages are not released, PG tries to acquire the same buffer lock in AbortBufferIO(), and the process gets stuck.
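To make the control flow concrete, here is a heavily simplified, self-contained toy model of this second scenario (not PostgreSQL code; the names are only stand-ins for the real callbacks):
/*
 * Toy model only (not PostgreSQL code): exit-callback lists whose indexes are
 * decremented as callbacks run, so a nested exit resumes where the outer one
 * stopped instead of re-running earlier callbacks.
 */
#include <stdbool.h>
#include <stdio.h>

typedef void (*cb_fn) (void);

static cb_fn before_list[20];
static int   before_index = 0;
static cb_fn on_list[20];
static int   on_index = 0;
static bool  io_lock_held = false;

static void
my_shmem_exit(void)
{
    while (--before_index >= 0)
        before_list[before_index]();
    while (--on_index >= 0)
        on_list[on_index]();
}

/* stand-in for ShutdownPostgres(): aborts the transaction, releases lw-locks */
static void
shutdown_postgres(void)
{
    io_lock_held = false;
    puts("ShutdownPostgres: lw-locks released");
}

/* stand-in for ShutdownXLOG(): the shutdown checkpoint hits the IO error too */
static void
shutdown_xlog(void)
{
    puts("ShutdownXLOG: mdwrite fails with the io lock held -> nested proc_exit");
    io_lock_held = true;
    my_shmem_exit();            /* errfinish() -> proc_exit() re-enters here */
}

/* stand-in for AtProcExit_Buffers(): AbortBufferIO re-acquires the io lock */
static void
at_proc_exit_buffers(void)
{
    printf("AtProcExit_Buffers: io lock %s\n",
           io_lock_held ? "still held -> blocks forever" : "free -> exits cleanly");
}

int
main(void)
{
    on_list[on_index++] = at_proc_exit_buffers; /* registered early */
    on_list[on_index++] = shutdown_xlog;        /* registered later, runs first */
    before_list[before_index++] = shutdown_postgres;

    io_lock_held = true;        /* the checkpoint command failed with the io lock held */
    puts("checkpoint command fails with the first IO error -> proc_exit");
    my_shmem_exit();
    return 0;
}
Running it prints ShutdownPostgres, then ShutdownXLOG, then AtProcExit_Buffers with the lock still held, which matches the nested call stack below.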
Here is the call stack:
(gdb) bt
#0 0x00007ff0c0c036d6 in futex_abstimed_wait_cancelable (private=128, abstime=0x0, expected=0, futex_word=0x7ff0b69001b8) at ../sysdeps/unix/sysv/linux/futex-internal.h:205
#1 do_futex_wait (sem=sem@entry=0x7ff0b69001b8, abstime=0x0) at sem_waitcommon.c:111
#2 0x00007ff0c0c037c8 in __new_sem_wait_slow (sem=0x7ff0b69001b8, abstime=0x0) at sem_waitcommon.c:181
#3 0x0000562077cc258a in PGSemaphoreLock (sema=0x7ff0b69001b8) at pg_sema.c:316
#4 0x0000562077d6289e in LWLockAcquire (lock=0x7ff0bef225c0, mode=LW_EXCLUSIVE) at /path/to/postgres/source/build/../src/backend/storage/lmgr/lwlock.c:1243
#5 0x0000562077d39087 in AbortBufferIO () at /path/to/postgres/source/build/../src/backend/storage/buffer/bufmgr.c:3988
#6 0x0000562077d373f9 in AtProcExit_Buffers (code=1, arg=0) at /path/to/postgres/source/build/../src/backend/storage/buffer/bufmgr.c:2473
#7 0x0000562077d47c32 in shmem_exit (code=1) at /path/to/postgres/source/build/../src/backend/storage/ipc/ipc.c:272
#8 0x0000562077d47a5e in proc_exit_prepare (code=1) at /path/to/postgres/source/build/../src/backend/storage/ipc/ipc.c:194
#9 0x0000562077d479c6 in proc_exit (code=1) at /path/to/postgres/source/build/../src/backend/storage/ipc/ipc.c:107
#10 0x0000562077eed1bc in errfinish (dummy=0) at /path/to/postgres/source/build/../src/backend/utils/error/elog.c:541
#11 0x0000562077d6df1f in mdwrite (reln=0x562078a12a18, forknum=MAIN_FORKNUM, blocknum=8, buffer=0x7ff0b6fbdc00 "", skipFsync=false) at /path/to/postgres/source/build/../src/backend/storage/smgr/md.c:843
#12 0x0000562077d70716 in smgrwrite (reln=0x562078a12a18, forknum=MAIN_FORKNUM, blocknum=8, buffer=0x7ff0b6fbdc00 "", skipFsync=false) at /path/to/postgres/source/build/../src/backend/storage/smgr/smgr.c:650
#13 0x0000562077d37824 in FlushBuffer (buf=0x7ff0b6e22f80, reln=0x562078a12a18) at /path/to/postgres/source/build/../src/backend/storage/buffer/bufmgr.c:2751
#14 0x0000562077d37219 in SyncOneBuffer (buf_id=78, skip_recently_used=false, wb_context=0x7fffb0e3e230) at /path/to/postgres/source/build/../src/backend/storage/buffer/bufmgr.c:2394
#15 0x0000562077d36b00 in BufferSync (flags=5) at /path/to/postgres/source/build/../src/backend/storage/buffer/bufmgr.c:1984
#16 0x0000562077d3757f in CheckPointBuffers (flags=5) at /path/to/postgres/source/build/../src/backend/storage/buffer/bufmgr.c:2578
#17 0x0000562077a1285b in CheckPointGuts (checkPointRedo=24049152, flags=5) at /path/to/postgres/source/build/../src/backend/access/transam/xlog.c:9149
#18 0x0000562077a122cf in CreateCheckPoint (flags=5) at /path/to/postgres/source/build/../src/backend/access/transam/xlog.c:8937
#19 0x0000562077a1164f in ShutdownXLOG (code=1, arg=0) at /path/to/postgres/source/build/../src/backend/access/transam/xlog.c:8485
#20 0x0000562077d47c32 in shmem_exit (code=1) at /path/to/postgres/source/build/../src/backend/storage/ipc/ipc.c:272
#21 0x0000562077d47a5e in proc_exit_prepare (code=1) at /path/to/postgres/source/build/../src/backend/storage/ipc/ipc.c:194
#22 0x0000562077d479c6 in proc_exit (code=1) at /path/to/postgres/source/build/../src/backend/storage/ipc/ipc.c:107
#23 0x0000562077eed1bc in errfinish (dummy=0) at /path/to/postgres/source/build/../src/backend/utils/error/elog.c:541
#24 0x0000562077d6df1f in mdwrite (reln=0x562078a12a18, forknum=MAIN_FORKNUM, blocknum=8, buffer=0x7ff0b6fbdc00 "", skipFsync=false) at /path/to/postgres/source/build/../src/backend/storage/smgr/md.c:843
#25 0x0000562077d70716 in smgrwrite (reln=0x562078a12a18, forknum=MAIN_FORKNUM, blocknum=8, buffer=0x7ff0b6fbdc00 "", skipFsync=false) at /path/to/postgres/source/build/../src/backend/storage/smgr/smgr.c:650
#26 0x0000562077d37824 in FlushBuffer (buf=0x7ff0b6e22f80, reln=0x562078a12a18) at /path/to/postgres/source/build/../src/backend/storage/buffer/bufmgr.c:2751
#27 0x0000562077d37219 in SyncOneBuffer (buf_id=78, skip_recently_used=false, wb_context=0x7fffb0e3fb10) at /path/to/postgres/source/build/../src/backend/storage/buffer/bufmgr.c:2394
#28 0x0000562077d36b00 in BufferSync (flags=44) at /path/to/postgres/source/build/../src/backend/storage/buffer/bufmgr.c:1984
#29 0x0000562077d3757f in CheckPointBuffers (flags=44) at /path/to/postgres/source/build/../src/backend/storage/buffer/bufmgr.c:2578
#30 0x0000562077a1285b in CheckPointGuts (checkPointRedo=24049152, flags=44) at /path/to/postgres/source/build/../src/backend/access/transam/xlog.c:9149
#31 0x0000562077a122cf in CreateCheckPoint (flags=44) at /path/to/postgres/source/build/../src/backend/access/transam/xlog.c:8937
#32 0x0000562077cca792 in RequestCheckpoint (flags=44) at /path/to/postgres/source/build/../src/backend/postmaster/checkpointer.c:976
#33 0x0000562077d7bce4 in standard_ProcessUtility (pstmt=0x562078a00b50, queryString=0x562078a00100 "checkpoint;\n", context=PROCESS_UTILITY_TOPLEVEL, params=0x0, queryEnv=0x0,
dest=0x5620783ac5e0 <debugtupDR>, completionTag=0x7fffb0e41520 "") at /path/to/postgres/source/build/../src/backend/tcop/utility.c:769
#34 0x0000562077d7b204 in ProcessUtility (pstmt=0x562078a00b50, queryString=0x562078a00100 "checkpoint;\n", context=PROCESS_UTILITY_TOPLEVEL, params=0x0, queryEnv=0x0, dest=0x5620783ac5e0 <debugtupDR>,
completionTag=0x7fffb0e41520 "") at /path/to/postgres/source/build/../src/backend/tcop/utility.c:360
#35 0x0000562077d7a347 in PortalRunUtility (portal=0x5620789f20c0, pstmt=0x562078a00b50, isTopLevel=true, setHoldSnapshot=false, dest=0x5620783ac5e0 <debugtupDR>, completionTag=0x7fffb0e41520 "")
at /path/to/postgres/source/build/../src/backend/tcop/pquery.c:1178
#36 0x0000562077d7a534 in PortalRunMulti (portal=0x5620789f20c0, isTopLevel=true, setHoldSnapshot=false, dest=0x5620783ac5e0 <debugtupDR>, altdest=0x5620783ac5e0 <debugtupDR>,
completionTag=0x7fffb0e41520 "") at /path/to/postgres/source/build/../src/backend/tcop/pquery.c:1324
#37 0x0000562077d79a61 in PortalRun (portal=0x5620789f20c0, count=9223372036854775807, isTopLevel=true, run_once=true, dest=0x5620783ac5e0 <debugtupDR>, altdest=0x5620783ac5e0 <debugtupDR>,
completionTag=0x7fffb0e41520 "") at /path/to/postgres/source/build/../src/backend/tcop/pquery.c:799
#38 0x0000562077d734c5 in exec_simple_query (query_string=0x562078a00100 "checkpoint;\n") at /path/to/postgres/source/build/../src/backend/tcop/postgres.c:1145
#39 0x0000562077d77bd5 in PostgresMain (argc=7, argv=0x562078980d20, dbname=0x5620789a42b0 "postgres", username=0x5620789897d0 "chengyu") at /path/to/postgres/source/build/../src/backend/tcop/postgres.c:4182
#40 0x0000562077c01e8b in main (argc=7, argv=0x562078980d20) at /path/to/postgres/source/build/../src/backend/main/main.c:224
(gdb) p on_shmem_exit_list
$9 = {{function = 0x562077cc2f5f <AnonymousShmemDetach>, arg = 0},
{function = 0x562077cc2a4f <IpcMemoryDelete>, arg = 2457627},
{function = 0x562077cc29f8 <IpcMemoryDetach>, arg = 140672005165056},
{function = 0x562077cc242e <ReleaseSemaphores>, arg = 0},
{function = 0x562077d4648a <dsm_postmaster_shutdown>, arg = 140672005165056},
{function = 0x562077d5e053 <ProcKill>, arg = 0},
{function = 0x562077d5e02d <RemoveProcFromArray>, arg = 0},
{function = 0x562077d514e8 <CleanupInvalidationState>, arg = 140671978006400},
{function = 0x562077d4e46f <CleanupProcSignalState>, arg = 1},
{function = 0x562077d373e5 <AtProcExit_Buffers>, arg = 0},
{function = 0x562077a1159d <ShutdownXLOG>, arg = 0},
{function = 0x562077cd0637 <pgstat_beshutdown_hook>, arg = 0}, {function = 0x0, arg = 0}, {function = 0x0, arg = 0}, {function = 0x0, arg = 0}, {function = 0x0, arg = 0}, {function = 0x0, arg = 0}, {function = 0x0, arg = 0}, {function = 0x0, arg = 0}, {function = 0x0, arg = 0}}
(gdb) p before_shmem_exit_list
$10 = {{function = 0x562077f02caa <ShutdownPostgres>, arg = 0}, {function = 0x0, arg = 0} <repeats 19 times>}
OK, now we understand the deadlock issue in single user mode. But does it also affect multi-user mode (i.e., processes running under the postmaster)? There are 3 cases to consider:
1. Startup process: by the time StartupXLOG() is invoked, ShutdownAuxiliaryProcess(), which releases all lw-locks, has already been registered in before_shmem_exit_list[]. So this case is safe.
2. Checkpointer process: ShutdownXLOG() is not registered as a before/on_shmem_exit callback; instead, it is only invoked from the main loop. So there is no chance of hitting an IO error a second time during the shared-memory exit callbacks. Also, as with the startup process, ShutdownAuxiliaryProcess() has been registered. So this case is safe as well.
3. Other backend/background processes: these processes do not handle XLOG startup or shutdown, and are protected by ShutdownAuxiliaryProcess(). So they are safe to exit too.
In addition, we have done multiple experiments to confirm these cases.
Affected versions: we found this issue in 9.5, 9.6, 10, 11 and 12devel.
Fix proposal:
Given the 2 types of deadlock in single user mode discussed above, there might be multiple ways to fix this issue. In the proposal we would like to present, we register a new callback that releases all the lw-locks (just like ShutdownAuxiliaryProcess() does), placed after AtProcExit_Buffers() and before ShutdownXLOG() in the registration order. It is also registered before PG enters StartupXLOG(), so it covers the case where ShutdownPostgres() has not been registered yet (see the small sketch after the patch below for the resulting callback order). Here is the content of the proposal:
diff --git a/src/backend/utils/init/postinit.c b/src/backend/utils/init/postinit.c
index 62baaf0ab3..d74e8aa1d5 100644
--- a/src/backend/utils/init/postinit.c
+++ b/src/backend/utils/init/postinit.c
@@ -71,6 +71,7 @@ static HeapTuple GetDatabaseTupleByOid(Oid dboid);
static void PerformAuthentication(Port *port);
static void CheckMyDatabase(const char *name, bool am_superuser, bool override_allow_connections);
static void InitCommunication(void);
+static void ReleaseLWLocks(int code, Datum arg);
static void ShutdownPostgres(int code, Datum arg);
static void StatementTimeoutHandler(void);
static void LockTimeoutHandler(void);
@@ -653,6 +654,7 @@ InitPostgres(const char *in_dbname, Oid dboid, const char *username,
* way, start up the XLOG machinery, and register to have it closed
* down at exit.
*/
+ on_shmem_exit(ReleaseLWLocks, 0);
StartupXLOG();
on_shmem_exit(ShutdownXLOG, 0);
}
@@ -1214,6 +1216,23 @@ process_settings(Oid databaseid, Oid roleid)
heap_close(relsetting, AccessShareLock);
}
+/*
+ * There are 2 cases where buffer locks are still held when AtProcExit_Buffers() is
+ * invoked in a bootstrap process or a standalone backend:
+ * (1) Exceptions thrown during StartupXLOG()
+ * (2) Exceptions thrown during exception-handling in ShutdownXLOG()
+ * So we need this on_shmem_exit callback for single user mode.
+ * For processes under postmaster, ShutdownAuxiliaryProcess() will release
+ * the lw-locks and ShutdownXLOG() is not registered as a callback, so there
+ * is no such issue. Also, please note this callback should be registered in
+ * the order after AtProcExit_Buffers() and before ShutdownXLOG().
+ */
+static void
+ReleaseLWLocks(int code, Datum arg)
+{
+ LWLockReleaseAll();
+}
+
/*
* Backend-shutdown callback. Do cleanup that we want to be sure happens
* before all the supporting modules begin to nail their doors shut via
The fix proposal is also attached to this email in file "fix-deadlock.patch".
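To sanity-check the ordering argument, here is a self-contained toy model of the step #5 recovery case with the proposed callback in place (again heavily simplified, not PostgreSQL code; the names are only stand-ins): the error is raised inside StartupXLOG(), before ShutdownXLOG() or ShutdownPostgres() has been registered, yet the new callback still runs before AtProcExit_Buffers().
/*
 * Toy model only (not PostgreSQL code): with a lock-releasing callback
 * registered just before StartupXLOG(), a failure during recovery no longer
 * leaves the io lock held when AtProcExit_Buffers() runs.
 */
#include <stdbool.h>
#include <stdio.h>

typedef void (*cb_fn) (void);

static cb_fn on_list[20];
static int   on_index = 0;
static bool  io_lock_held = false;

static void
my_shmem_exit(void)
{
    while (--on_index >= 0)
        on_list[on_index]();
}

/* stand-in for AtProcExit_Buffers()/AbortBufferIO() */
static void
at_proc_exit_buffers(void)
{
    printf("AtProcExit_Buffers: io lock %s\n",
           io_lock_held ? "still held -> blocks forever" : "free -> exits cleanly");
}

/* stand-in for the proposed ReleaseLWLocks() callback */
static void
release_lwlocks(void)
{
    io_lock_held = false;
    puts("ReleaseLWLocks: lw-locks released");
}

int
main(void)
{
    on_list[on_index++] = at_proc_exit_buffers; /* registered during buffer pool init */
    on_list[on_index++] = release_lwlocks;      /* registered just before StartupXLOG() */

    /* StartupXLOG(): the end-of-recovery checkpoint fails in mdwrite() */
    io_lock_held = true;
    puts("StartupXLOG: mdwrite fails with the io lock held -> proc_exit");
    my_shmem_exit();
    return 0;
}
Because ShutdownXLOG() is registered after the new callback, the same ordering also covers the checkpoint case in step #4: on the nested exit, the lock release runs between ShutdownXLOG() and AtProcExit_Buffers().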
Please let us know should you have suggestions on this issue and the fix.
Thank you!
Best regards,
--
Chengchao Yu
Software Engineer | Microsoft | Azure Database for PostgreSQL
Attachments:
emulate-error.patch:
diff --git a/src/backend/storage/smgr/md.c b/src/backend/storage/smgr/md.c
index 10e0271473..a9b71058fe 100644
--- a/src/backend/storage/smgr/md.c
+++ b/src/backend/storage/smgr/md.c
@@ -840,6 +840,10 @@ mdwrite(SMgrRelation reln, ForkNumber forknum, BlockNumber blocknum,
nbytes,
BLCKSZ);
+ ereport(ERROR,
+ (errcode(ERRCODE_INTERNAL_ERROR),
+ errmsg("Emulate exception in mdwrite() when writing to disk")));
+
if (nbytes != BLCKSZ)
{
if (nbytes < 0)
fix-deadlock.patch:
diff --git a/src/backend/utils/init/postinit.c b/src/backend/utils/init/postinit.c
index 62baaf0ab3..d74e8aa1d5 100644
--- a/src/backend/utils/init/postinit.c
+++ b/src/backend/utils/init/postinit.c
@@ -71,6 +71,7 @@ static HeapTuple GetDatabaseTupleByOid(Oid dboid);
static void PerformAuthentication(Port *port);
static void CheckMyDatabase(const char *name, bool am_superuser, bool override_allow_connections);
static void InitCommunication(void);
+static void ReleaseLWLocks(int code, Datum arg);
static void ShutdownPostgres(int code, Datum arg);
static void StatementTimeoutHandler(void);
static void LockTimeoutHandler(void);
@@ -653,6 +654,7 @@ InitPostgres(const char *in_dbname, Oid dboid, const char *username,
* way, start up the XLOG machinery, and register to have it closed
* down at exit.
*/
+ on_shmem_exit(ReleaseLWLocks, 0);
StartupXLOG();
on_shmem_exit(ShutdownXLOG, 0);
}
@@ -1214,6 +1216,23 @@ process_settings(Oid databaseid, Oid roleid)
heap_close(relsetting, AccessShareLock);
}
+/*
+ * There are 2 cases where buffer locks are still held when AtProcExit_Buffers() is
+ * invoked in a bootstrap process or a standalone backend:
+ * (1) Exceptions thrown during StartupXLOG()
+ * (2) Exceptions thrown during exception-handling in ShutdownXLOG()
+ * So we need this on_shmem_exit callback for single user mode.
+ * For processes under postmaster, ShutdownAuxiliaryProcess() will release
+ * the lw-locks and ShutdownXLOG() is not registered as a callback, so there
+ * is no such issue. Also, please note this callback should be registered in
+ * the order after AtProcExit_Buffers() and before ShutdownXLOG().
+ */
+static void
+ReleaseLWLocks(int code, Datum arg)
+{
+ LWLockReleaseAll();
+}
+
/*
* Backend-shutdown callback. Do cleanup that we want to be sure happens
* before all the supporting modules begin to nail their doors shut via
Hey Chengyu,
How did you set up your GDB to get "p" to pretty-print a Postgres list?
Cheers,
Jesse
On Fri, Nov 30, 2018 at 1:00 PM Chengchao Yu <chengyu@microsoft.com> wrote:
Hi Jesse,
Actually, before_shmem_exit_list and on_shmem_exit_list are both plain arrays; they are of type "struct ONEXIT [20]".
For the pretty-printing, I just edited the output manually to make it easier for the reader to follow.
But since you asked, there does exist a GDB command, "set print array on". After that, the "p" command will print one array entry per line.
Thanks,
Chengchao
From: Jesse Zhang <sbjesse@gmail.com>
Sent: Saturday, December 1, 2018 9:55 AM
To: Chengchao Yu <chengyu@microsoft.com>
Cc: pgsql-hackers@postgresql.org
Subject: Re: [PATCH] Fix Proposal - Deadlock Issue in Single User Mode When IO Failure Occurs
#26 0x0000562077d37824 in FlushBuffer (buf=0x7ff0b6e22f80, reln=0x562078a12a18) at /path/to/postgres/source/build/../src/backend/storage/buffer/bufmgr.c:2751
#27 0x0000562077d37219 in SyncOneBuffer (buf_id=78, skip_recently_used=false, wb_context=0x7fffb0e3fb10) at /path/to/postgres/source/build/../src/backend/storage/buffer/bufmgr.c:2394
#28 0x0000562077d36b00 in BufferSync (flags=44) at /path/to/postgres/source/build/../src/backend/storage/buffer/bufmgr.c:1984
#29 0x0000562077d3757f in CheckPointBuffers (flags=44) at /path/to/postgres/source/build/../src/backend/storage/buffer/bufmgr.c:2578
#30 0x0000562077a1285b in CheckPointGuts (checkPointRedo=24049152, flags=44) at /path/to/postgres/source/build/../src/backend/access/transam/xlog.c:9149
#31 0x0000562077a122cf in CreateCheckPoint (flags=44) at /path/to/postgres/source/build/../src/backend/access/transam/xlog.c:8937
#32 0x0000562077cca792 in RequestCheckpoint (flags=44) at /path/to/postgres/source/build/../src/backend/postmaster/checkpointer.c:976
#33 0x0000562077d7bce4 in standard_ProcessUtility (pstmt=0x562078a00b50, queryString=0x562078a00100 "checkpoint;\n", context=PROCESS_UTILITY_TOPLEVEL, params=0x0, queryEnv=0x0,
dest=0x5620783ac5e0 <debugtupDR>, completionTag=0x7fffb0e41520 "") at /path/to/postgres/source/build/../src/backend/tcop/utility.c:769
#34 0x0000562077d7b204 in ProcessUtility (pstmt=0x562078a00b50, queryString=0x562078a00100 "checkpoint;\n", context=PROCESS_UTILITY_TOPLEVEL, params=0x0, queryEnv=0x0, dest=0x5620783ac5e0 <debugtupDR>,
completionTag=0x7fffb0e41520 "") at /path/to/postgres/source/build/../src/backend/tcop/utility.c:360
#35 0x0000562077d7a347 in PortalRunUtility (portal=0x5620789f20c0, pstmt=0x562078a00b50, isTopLevel=true, setHoldSnapshot=false, dest=0x5620783ac5e0 <debugtupDR>, completionTag=0x7fffb0e41520 "")
at /path/to/postgres/source/build/../src/backend/tcop/pquery.c:1178
#36 0x0000562077d7a534 in PortalRunMulti (portal=0x5620789f20c0, isTopLevel=true, setHoldSnapshot=false, dest=0x5620783ac5e0 <debugtupDR>, altdest=0x5620783ac5e0 <debugtupDR>,
completionTag=0x7fffb0e41520 "") at /path/to/postgres/source/build/../src/backend/tcop/pquery.c:1324
#37 0x0000562077d79a61 in PortalRun (portal=0x5620789f20c0, count=9223372036854775807, isTopLevel=true, run_once=true, dest=0x5620783ac5e0 <debugtupDR>, altdest=0x5620783ac5e0 <debugtupDR>,
completionTag=0x7fffb0e41520 "") at /path/to/postgres/source/build/../src/backend/tcop/pquery.c:799
#38 0x0000562077d734c5 in exec_simple_query (query_string=0x562078a00100 "checkpoint;\n") at /path/to/postgres/source/build/../src/backend/tcop/postgres.c:1145
#39 0x0000562077d77bd5 in PostgresMain (argc=7, argv=0x562078980d20, dbname=0x5620789a42b0 "postgres", username=0x5620789897d0 "chengyu") at /path/to/postgres/source/build/../src/backend/tcop/postgres.c:4182
#40 0x0000562077c01e8b in main (argc=7, argv=0x562078980d20) at /path/to/postgres/source/build/../src/backend/main/main.c:224
(gdb) p on_shmem_exit_list
$9 = {{function = 0x562077cc2f5f <AnonymousShmemDetach>, arg = 0},
{function = 0x562077cc2a4f <IpcMemoryDelete>, arg = 2457627},
{function = 0x562077cc29f8 <IpcMemoryDetach>, arg = 140672005165056},
{function = 0x562077cc242e <ReleaseSemaphores>, arg = 0},
{function = 0x562077d4648a <dsm_postmaster_shutdown>, arg = 140672005165056},
{function = 0x562077d5e053 <ProcKill>, arg = 0},
{function = 0x562077d5e02d <RemoveProcFromArray>, arg = 0},
{function = 0x562077d514e8 <CleanupInvalidationState>, arg = 140671978006400},
{function = 0x562077d4e46f <CleanupProcSignalState>, arg = 1},
{function = 0x562077d373e5 <AtProcExit_Buffers>, arg = 0},
{function = 0x562077a1159d <ShutdownXLOG>, arg = 0},
{function = 0x562077cd0637 <pgstat_beshutdown_hook>, arg = 0}, {function = 0x0, arg = 0}, {function = 0x0, arg = 0}, {function = 0x0, arg = 0}, {function = 0x0, arg = 0}, {function = 0x0, arg = 0}, {function = 0x0, arg = 0}, {function = 0x0, arg = 0}, {function = 0x0, arg = 0}}
(gdb) p before_shmem_exit_list
$10 = {{function = 0x562077f02caa <ShutdownPostgres>, arg = 0}, {function = 0x0, arg = 0} <repeats 19 times>}
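To make the callback behaviour above concrete, here is a small standalone C sketch (illustration only, not PostgreSQL source; names such as fake_shmem_exit are made up) of the pop-before-call pattern used by the exit-callback lists: the list index is decremented before each callback runs, so a nested exit triggered from inside a callback never re-runs the callbacks that were already popped, including the one that released the lw-locks.
/*
 * Standalone sketch of the pop-before-call pattern (illustration only,
 * not PostgreSQL source; all names here are made up).
 */
#include <stdio.h>
#include <stdlib.h>

typedef void (*exit_cb) (int code);

static exit_cb cb_list[8];
static int cb_index = 0;

static void fake_shmem_exit(int code);

static void
release_locks_cb(int code)
{
	printf("releasing lw-locks (runs only once)\n");
}

static void
shutdown_xlog_cb(int code)
{
	printf("flushing buffers again ... second IO error!\n");
	fake_shmem_exit(code);		/* nested exit, like errfinish() -> proc_exit() */
}

static void
abort_buffer_io_cb(int code)
{
	printf("AbortBufferIO-like step: the lock was never released again -> would block\n");
}

static void
fake_shmem_exit(int code)
{
	/*
	 * The index is decremented before each call, so a nested exit never
	 * re-runs callbacks that were already popped (e.g. the lock release).
	 */
	while (--cb_index >= 0)
		cb_list[cb_index] (code);
	exit(code);
}

int
main(void)
{
	cb_list[cb_index++] = abort_buffer_io_cb;	/* registered first, runs last */
	cb_list[cb_index++] = shutdown_xlog_cb;
	cb_list[cb_index++] = release_locks_cb;		/* registered last, runs first */
	fake_shmem_exit(1);
	return 0;
}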
OK, now we understand the deadlock issue in single user mode. However, does this issue affect multi-user mode (i.e. processes under the postmaster)? There are 3 cases to consider:
a. Startup process: by the time StartupXLOG() is invoked, ShutdownAuxiliaryProcess(), which releases all the lw-locks, has already been registered in before_shmem_exit_list[]. So this case is safe.
b. Checkpointer process: ShutdownXLOG() is not registered as a before/on_shmem_exit callback; instead, it is only invoked in the main loop. So there is no chance to hit an IO error a second time during the shared-memory exit callbacks. Also, as with the startup process, ShutdownAuxiliaryProcess() has been registered. So this case is also safe.
c. Other backend/background processes: these processes do not handle XLOG startup or shutdown, and are protected by ShutdownAuxiliaryProcess(). So they are safe to exit too.
In addition, we have done multiple experiments to confirm these cases.
Affected versions: we found this issue in 9.5, 9.6, 10, 11 and 12devel.
Fix proposal:
Given the 2 types of deadlock in single user mode discussed above, there may be multiple ways to fix this issue. In the fix proposal we would like to present, we register a new callback that releases all the lw-locks (just like ShutdownAuxiliaryProcess() does), ordered after AtProcExit_Buffers() and before ShutdownXLOG(). Also, it is registered before PG enters StartupXLOG(), so it covers the case where ShutdownPostgres() has not yet been registered. Here is the content of the proposal:
diff --git a/src/backend/utils/init/postinit.c b/src/backend/utils/init/postinit.c
index 62baaf0ab3..d74e8aa1d5 100644
--- a/src/backend/utils/init/postinit.c
+++ b/src/backend/utils/init/postinit.c
@@ -71,6 +71,7 @@ static HeapTuple GetDatabaseTupleByOid(Oid dboid);
static void PerformAuthentication(Port *port);
static void CheckMyDatabase(const char *name, bool am_superuser, bool override_allow_connections);
static void InitCommunication(void);
+static void ReleaseLWLocks(int code, Datum arg);
static void ShutdownPostgres(int code, Datum arg);
static void StatementTimeoutHandler(void);
static void LockTimeoutHandler(void);
@@ -653,6 +654,7 @@ InitPostgres(const char *in_dbname, Oid dboid, const char *username,
* way, start up the XLOG machinery, and register to have it closed
* down at exit.
*/
+ on_shmem_exit(ReleaseLWLocks, 0);
StartupXLOG();
on_shmem_exit(ShutdownXLOG, 0);
}
@@ -1214,6 +1216,23 @@ process_settings(Oid databaseid, Oid roleid)
heap_close(relsetting, AccessShareLock);
}
+/*
+ * There are 2 types of buffer locks on-holding when AtProcExit_Buffers() is
+ * invoked in a bootstrap process or a standalone backend:
+ * (1) Exceptions thrown during StartupXLOG()
+ * (2) Exceptions thrown during exception-handling in ShutdownXLOG()
+ * So we need this on_shmem_exit callback for single user mode.
+ * For processes under postmaster, ShutdownAuxiliaryProcess() will release
+ * the lw-locks and ShutdownXLOG() is not registered as a callback, so there
+ * is no such issue. Also, please note this callback should be registered in
+ * the order after AtProcExit_buffers() and before ShutdownXLOG().
+ */
+static void
+ReleaseLWLocks(int code, Datum arg)
+{
+ LWLockReleaseAll();
+}
+
/*
* Backend-shutdown callback. Do cleanup that we want to be sure happens
* before all the supporting modules begin to nail their doors shut via
The fix proposal is also attached to this email in file “fix-deadlock.patch”.
Please let us know should you have suggestions on this issue and the fix.
Thank you!
Best regards,
--
Chengchao Yu
Software Engineer | Microsoft | Azure Database for PostgreSQL
On Sat, Dec 1, 2018 at 2:30 AM Chengchao Yu <chengyu@microsoft.com> wrote:
Recently, we hit a few occurrences of deadlock when IO failure (including disk full, random remote disk IO failures) happens in single user mode. We found the issue exists on both Linux and Windows in multiple postgres versions.
3. Because the unable to write relation data scenario is difficult to hit naturally even reserved space is turned off, I have prepared a small patch (see attachment “emulate-error.patch”) to force an error when PG tries to write data to relation files. We can just apply the patch and there is no need to put efforts flooding data to disk any more.
I have one question related to the way you have tried to emulate the error.
@@ -840,6 +840,10 @@ mdwrite(SMgrRelation reln, ForkNumber forknum,
BlockNumber blocknum,
nbytes,
BLCKSZ);
+ ereport(ERROR,
+ (errcode(ERRCODE_INTERNAL_ERROR),
+ errmsg("Emulate exception in mdwrite() when writing to disk")));
+
We generally reserve the space in a relation before attempting to
write, so not sure how you are able to hit the disk full situation via
mdwrite. If you see the description of the function, that also
indicates same.
/*
* mdwrite() -- Write the supplied block at the appropriate location.
*
* This is to be used only for updating already-existing blocks of a
* relation (ie, those before the current EOF). To extend a relation,
* use mdextend().
*/
I am not telling that mdwrite can never lead to error, but just trying
to understand the issue you actually faced. I haven't read your
proposed solution yet, let's first try to establish the problem you
are facing.
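For illustration only, here is a minimal standalone sketch of that extend-then-overwrite discipline using plain POSIX file I/O (this is not md.c, and the file name is made up): space is normally claimed when a block is first extended with zeroes, so a later overwrite of that block usually needs no new space, although the overwrite can still fail with EIO, or with ENOSPC on copy-on-write filesystems.
/* Standalone illustration only (not md.c): extend with a zero-filled block
 * first, then overwrite the already-allocated block later. */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

#define DEMO_BLCKSZ 8192

int
main(void)
{
	char		block[DEMO_BLCKSZ];
	int			fd = open("demo_relation_segment", O_RDWR | O_CREAT, 0600);	/* made-up file */

	if (fd < 0)
	{
		perror("open");
		return 1;
	}

	/* "extend"-like step: append a zeroed block; a disk-full error would
	 * normally surface here, while space is being allocated. */
	memset(block, 0, sizeof(block));
	if (pwrite(fd, block, DEMO_BLCKSZ, 0) != DEMO_BLCKSZ)
	{
		perror("extend");
		return 1;
	}

	/* "write"-like step: overwrite the existing block; on most filesystems
	 * this needs no new space, but EIO (or ENOSPC on COW filesystems) can
	 * still make it fail. */
	memset(block, 'x', sizeof(block));
	if (pwrite(fd, block, DEMO_BLCKSZ, 0) != DEMO_BLCKSZ)
	{
		perror("overwrite");
		return 1;
	}

	close(fd);
	return 0;
}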
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On Sun, Jan 20, 2019 at 4:45 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Sat, Dec 1, 2018 at 2:30 AM Chengchao Yu <chengyu@microsoft.com> wrote:
Recently, we hit a few occurrences of deadlock when IO failure (including disk full, random remote disk IO failures) happens in single user mode. We found the issue exists on both Linux and Windows in multiple postgres versions.
3. Because the unable to write relation data scenario is difficult to hit naturally even reserved space is turned off, I have prepared a small patch (see attachment “emulate-error.patch”) to force an error when PG tries to write data to relation files. We can just apply the patch and there is no need to put efforts flooding data to disk any more.
I have one question related to the way you have tried to emulate the error.
@@ -840,6 +840,10 @@ mdwrite(SMgrRelation reln, ForkNumber forknum,
BlockNumber blocknum,
nbytes,
BLCKSZ);
+ ereport(ERROR,
+ (errcode(ERRCODE_INTERNAL_ERROR),
+ errmsg("Emulate exception in mdwrite() when writing to disk")));
+
We generally reserve the space in a relation before attempting to
write, so not sure how you are able to hit the disk full situation via
mdwrite. If you see the description of the function, that also
indicates same.
Presumably ZFS or BTRFS or something more exotic could still get
ENOSPC here, and of course any filesystem could give us EIO here
(because the disk is on fire or the remote NFS server is rebooting due
to an automatic Windows update).
--
Thomas Munro
http://www.enterprisedb.com
Hi Amit, Thomas,
Thank you very much for your feedback! Apologies, but I just saw both messages.
We generally reserve the space in a relation before attempting to write, so not sure how you are able to hit the disk full situation via mdwrite. If you see the description of the function, that also indicates same.
Absolutely agree, this isn’t a PG issue. The issue manifests for us at Microsoft due to our own storage layer, which treats mdextend() actions as only setting the length of the file. We have a workaround, and no change is needed in Postgres.
I am not telling that mdwrite can never lead to error, but just trying to understand the issue you actually faced. I haven't read your proposed solution yet, let's first try to establish the problem you are facing.
We see transient IO errors reading a block, after which the PG instance deadlocks in single user mode until we kill it. The stack trace below shows the behavior when mdread() fails while the buffer’s lw-lock is held. This happens because in single user mode there is no callback to release the lock, so when AbortBufferIO() tries to acquire the same lock again, it waits for the lock indefinitely.
Here is the stack trace:
0a 00000004`8080cc30 00000004`80dcf917 postgres!PGSemaphoreLock+0x65 [d:\orcasqlagsea10\14\s\src\backend\port\win32_sema.c @ 158]
0b 00000004`8080cc90 00000004`80db025c postgres!LWLockAcquire+0x137 [d:\orcasqlagsea10\14\s\src\backend\storage\lmgr\lwlock.c @ 1234]
0c 00000004`8080ccd0 00000004`80db25db postgres!AbortBufferIO+0x2c [d:\orcasqlagsea10\14\s\src\backend\storage\buffer\bufmgr.c @ 3995]
0d 00000004`8080cd20 00000004`80dbce36 postgres!AtProcExit_Buffers+0xb [d:\orcasqlagsea10\14\s\src\backend\storage\buffer\bufmgr.c @ 2479]
0e 00000004`8080cd50 00000004`80dbd1bd postgres!shmem_exit+0xf6 [d:\orcasqlagsea10\14\s\src\backend\storage\ipc\ipc.c @ 262]
0f 00000004`8080cd80 00000004`80dbccfd postgres!proc_exit_prepare+0x4d [d:\orcasqlagsea10\14\s\src\backend\storage\ipc\ipc.c @ 188]
10 00000004`8080cdb0 00000004`80ef9e74 postgres!proc_exit+0xd [d:\orcasqlagsea10\14\s\src\backend\storage\ipc\ipc.c @ 141]
11 00000004`8080cde0 00000004`80ddb6ef postgres!errfinish+0x204 [d:\orcasqlagsea10\14\s\src\backend\utils\error\elog.c @ 624]
12 00000004`8080ce50 00000004`80db0f59 postgres!mdread+0x12f [d:\orcasqlagsea10\14\s\src\backend\storage\smgr\md.c @ 806]
13 00000004`8080cea0 00000004`80daeb70 postgres!ReadBuffer_common+0x2c9 [d:\orcasqlagsea10\14\s\src\backend\storage\buffer\bufmgr.c @ 897]
14 00000004`8080cf30 00000004`80b81322 postgres!ReadBufferWithoutRelcache+0x60 [d:\orcasqlagsea10\14\s\src\backend\storage\buffer\bufmgr.c @ 694]
15 00000004`8080cf90 00000004`80db9cbb postgres!XLogReadBufferExtended+0x142 [d:\orcasqlagsea10\14\s\src\backend\access\transam\xlogutils.c @ 513]
16 00000004`8080cff0 00000004`80b2f53a postgres!XLogRecordPageWithFreeSpace+0xbb [d:\orcasqlagsea10\14\s\src\backend\storage\freespace\freespace.c @ 254]
17 00000004`8080d030 00000004`80b6eb94 postgres!heap_xlog_insert+0x36a [d:\orcasqlagsea10\14\s\src\backend\access\heap\heapam.c @ 8491]
18 00000004`8080f0d0 00000004`80f0a13f postgres!StartupXLOG+0x1f84 [d:\orcasqlagsea10\14\s\src\backend\access\transam\xlog.c @ 7480]
19 00000004`8080fbf0 00000004`80de121e postgres!InitPostgres+0x12f [d:\orcasqlagsea10\14\s\src\backend\utils\init\postinit.c @ 656]
1a 00000004`8080fcd0 00000004`80c92c31 postgres!PostgresMain+0x25e [d:\orcasqlagsea10\14\s\src\backend\tcop\postgres.c @ 3881]
1b 00000004`8080fed0 00000004`80f51df3 postgres!main+0x491 [d:\orcasqlagsea10\14\s\src\backend\main\main.c @ 235]
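The trace boils down to the process re-acquiring, at exit time, a non-reentrant lock it already holds. Here is a rough standalone sketch of that pattern (illustration only, not PostgreSQL code; sem_trywait is used so the demo reports the problem instead of actually hanging):
/* Standalone sketch (illustration only): a non-reentrant lock held at error
 * time is re-acquired by the same process during exit cleanup. */
#include <errno.h>
#include <semaphore.h>
#include <stdio.h>

static sem_t io_lock;			/* models a buffer's non-reentrant lw-lock */

static void
cleanup_like_abort_buffer_io(void)
{
	/*
	 * The real cleanup does a blocking acquire; with the lock still held by
	 * this very process, and nobody else around in single user mode, that
	 * wait would never return.
	 */
	if (sem_trywait(&io_lock) != 0 && errno == EAGAIN)
		printf("lock still held by this process: a blocking acquire would never return\n");
}

int
main(void)
{
	sem_init(&io_lock, 0, 1);

	sem_wait(&io_lock);			/* IO starts: lock taken */
	/* ... the read/write fails here; the error path runs exit cleanup
	 * without anything having released the lock ... */
	cleanup_like_abort_buffer_io();

	sem_destroy(&io_lock);
	return 0;
}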
Please let us know should you have more feedback. Thank you!
Best regards,
--
Chengchao Yu
Software Engineer | Microsoft | Azure Database for PostgreSQL
https://azure.microsoft.com/en-us/services/postgresql/
-----Original Message-----
From: Thomas Munro <thomas.munro@enterprisedb.com>
Sent: Thursday, January 24, 2019 2:32 PM
To: Amit Kapila <amit.kapila16@gmail.com>
Cc: Chengchao Yu <chengyu@microsoft.com>; Pg Hackers <pgsql-hackers@postgresql.org>; Prabhat Tripathi <ptrip@microsoft.com>; Sunil Kamath <Sunil.Kamath@microsoft.com>; Michal Primke <mprimke@microsoft.com>
Subject: Re: [PATCH] Fix Proposal - Deadlock Issue in Single User Mode When IO Failure Occurs
On Sun, Jan 20, 2019 at 4:45 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Sat, Dec 1, 2018 at 2:30 AM Chengchao Yu <chengyu@microsoft.com> wrote:
Recently, we hit a few occurrences of deadlock when IO failure (including disk full, random remote disk IO failures) happens in single user mode. We found the issue exists on both Linux and Windows in multiple postgres versions.
3. Because the unable to write relation data scenario is difficult to hit naturally even reserved space is turned off, I have prepared a small patch (see attachment “emulate-error.patch”) to force an error when PG tries to write data to relation files. We can just apply the patch and there is no need to put efforts flooding data to disk any more.
I have one question related to the way you have tried to emulate the error.
@@ -840,6 +840,10 @@ mdwrite(SMgrRelation reln, ForkNumber forknum,
BlockNumber blocknum,
nbytes,
BLCKSZ);
+ ereport(ERROR,
+ (errcode(ERRCODE_INTERNAL_ERROR),
+ errmsg("Emulate exception in mdwrite() when writing to disk")));
+
We generally reserve the space in a relation before attempting to
write, so not sure how you are able to hit the disk full situation via
mdwrite. If you see the description of the function, that also
indicates same.
Presumably ZFS or BTRFS or something more exotic could still get ENOSPC here, and of course any filesystem could give us EIO here (because the disk is on fire or the remote NFS server is rebooting due to an automatic Windows update).
On Sat, Feb 2, 2019 at 4:42 AM Chengchao Yu <chengyu@microsoft.com> wrote:
Hi Amit, Thomas,
Thank you very much for your feedback! Apologies, but I just saw both messages.
We generally reserve the space in a relation before attempting to write, so not sure how you are able to hit the disk full situation via mdwrite. If you see the description of the function, that also indicates same.
Absolutely agree, this isn’t a PG issue. The issue manifests for us at Microsoft due to our own storage layer, which treats mdextend() actions as only setting the length of the file. We have a workaround, and no change is needed in Postgres.
I am not telling that mdwrite can never lead to error, but just trying to understand the issue you actually faced. I haven't read your proposed solution yet, let's first try to establish the problem you are facing.
We see transient IO errors reading a block, after which the PG instance deadlocks in single user mode until we kill it. The stack trace below shows the behavior when mdread() fails while the buffer’s lw-lock is held. This happens because in single user mode there is no callback to release the lock, so when AbortBufferIO() tries to acquire the same lock again, it waits for the lock indefinitely.
I think you can register your patch for next CF [1] so that we don't
forget about it.
[1]: https://commitfest.postgresql.org/22/
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Thank you so much Amit! I have created the patch below:
https://commitfest.postgresql.org/22/2003/
Please let me know should you have more suggestions. Thank you!
Best regards,
--
Chengchao Yu
Software Engineer | Microsoft | Azure Database for PostgreSQL
https://azure.microsoft.com/en-us/services/postgresql/
-----Original Message-----
From: Amit Kapila <amit.kapila16@gmail.com>
Sent: Friday, February 1, 2019 6:58 PM
To: Chengchao Yu <chengyu@microsoft.com>
Cc: Thomas Munro <thomas.munro@enterprisedb.com>; Pg Hackers <pgsql-hackers@postgresql.org>; Prabhat Tripathi <ptrip@microsoft.com>; Sunil Kamath <Sunil.Kamath@microsoft.com>; Michal Primke <mprimke@microsoft.com>; TEJA Mupparti <Tejeswar.Mupparti@microsoft.com>
Subject: Re: [PATCH] Fix Proposal - Deadlock Issue in Single User Mode When IO Failure Occurs
On Sat, Feb 2, 2019 at 4:42 AM Chengchao Yu <chengyu@microsoft.com> wrote:
Hi Amit, Thomas,
Thank you very much for your feedback! Apologies, but I just saw both messages.
We generally reserve the space in a relation before attempting to write, so not sure how you are able to hit the disk full situation via mdwrite. If you see the description of the function, that also indicates same.
Absolutely agree, this isn’t a PG issue. The issue manifests for us at Microsoft due to our own storage layer, which treats mdextend() actions as only setting the length of the file. We have a workaround, and no change is needed in Postgres.
I am not telling that mdwrite can never lead to error, but just trying to understand the issue you actually faced. I haven't read your proposed solution yet, let's first try to establish the problem you are facing.
We see transient IO errors reading a block, after which the PG instance deadlocks in single user mode until we kill it. The stack trace below shows the behavior when mdread() fails while the buffer’s lw-lock is held. This happens because in single user mode there is no callback to release the lock, so when AbortBufferIO() tries to acquire the same lock again, it waits for the lock indefinitely.
I think you can register your patch for next CF [1] https://commitfest.postgresql.org/22/ so that we don't forget about it.
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Attachments:
fix-deadlock.patch (application/octet-stream)
diff --git a/src/backend/utils/init/postinit.c b/src/backend/utils/init/postinit.c
index 62baaf0ab3..d74e8aa1d5 100644
--- a/src/backend/utils/init/postinit.c
+++ b/src/backend/utils/init/postinit.c
@@ -71,6 +71,7 @@ static HeapTuple GetDatabaseTupleByOid(Oid dboid);
static void PerformAuthentication(Port *port);
static void CheckMyDatabase(const char *name, bool am_superuser, bool override_allow_connections);
static void InitCommunication(void);
+static void ReleaseLWLocks(int code, Datum arg);
static void ShutdownPostgres(int code, Datum arg);
static void StatementTimeoutHandler(void);
static void LockTimeoutHandler(void);
@@ -653,6 +654,7 @@ InitPostgres(const char *in_dbname, Oid dboid, const char *username,
* way, start up the XLOG machinery, and register to have it closed
* down at exit.
*/
+ on_shmem_exit(ReleaseLWLocks, 0);
StartupXLOG();
on_shmem_exit(ShutdownXLOG, 0);
}
@@ -1214,6 +1216,23 @@ process_settings(Oid databaseid, Oid roleid)
heap_close(relsetting, AccessShareLock);
}
+/*
+ * There are 2 types of buffer locks on-holding when AtProcExit_Buffers() is
+ * invoked in a bootstrap process or a standalone backend:
+ * (1) Exceptions thrown during StartupXLOG()
+ * (2) Exceptions thrown during exception-handling in ShutdownXLOG()
+ * So we need this on_shmem_exit callback for single user mode.
+ * For processes under postmaster, ShutdownAuxiliaryProcess() will release
+ * the lw-locks and ShutdownXLOG() is not registered as a callback, so there
+ * is no such issue. Also, please note this callback should be registered in
+ * the order after AtProcExit_buffers() and before ShutdownXLOG().
+ */
+static void
+ReleaseLWLocks(int code, Datum arg)
+{
+ LWLockReleaseAll();
+}
+
/*
* Backend-shutdown callback. Do cleanup that we want to be sure happens
* before all the supporting modules begin to nail their doors shut via
Hi Amit,
Greetings! Thank you so much for your previous feedback!
It seems the old patch no longer applies to the latest master branch. So I have rebased the patch; it now applies to the latest master without conflicts.
Btw, since the commitfest entry was created (https://commitfest.postgresql.org/22/2003/),
have you seen any places that could be improved? Could you give some suggestions? Thank you!
Best regards,
--
Chengchao Yu
Software Engineer | Microsoft | Azure Database for PostgreSQL
https://azure.microsoft.com/en-us/services/postgresql/
-----Original Message-----
From: Chengchao Yu
Sent: Monday, February 18, 2019 6:08 PM
To: Amit Kapila <amit.kapila16@gmail.com>
Cc: Thomas Munro <thomas.munro@enterprisedb.com>; Pg Hackers <pgsql-hackers@postgresql.org>; Prabhat Tripathi <ptrip@microsoft.com>; Sunil Kamath <Sunil.Kamath@microsoft.com>; Michal Primke <mprimke@microsoft.com>; TEJA Mupparti <Tejeswar.Mupparti@microsoft.com>
Subject: RE: [PATCH] Fix Proposal - Deadlock Issue in Single User Mode When IO Failure Occurs
Thank you so much Amit! I have created the patch below:
https://commitfest.postgresql.org/22/2003/
Please let me know should you have more suggestions. Thank you!
Best regards,
--
Chengchao Yu
Software Engineer | Microsoft | Azure Database for PostgreSQL https://azure.microsoft.com/en-us/services/postgresql/
Attachments:
fix-deadlock-v2.patch (application/octet-stream)
diff --git a/src/backend/utils/init/postinit.c b/src/backend/utils/init/postinit.c
index a5ee209f91..2f826bc9b5 100644
--- a/src/backend/utils/init/postinit.c
+++ b/src/backend/utils/init/postinit.c
@@ -68,6 +68,7 @@ static HeapTuple GetDatabaseTupleByOid(Oid dboid);
static void PerformAuthentication(Port *port);
static void CheckMyDatabase(const char *name, bool am_superuser, bool override_allow_connections);
static void InitCommunication(void);
+static void ReleaseLWLocks(int code, Datum arg);
static void ShutdownPostgres(int code, Datum arg);
static void StatementTimeoutHandler(void);
static void LockTimeoutHandler(void);
@@ -661,6 +662,7 @@ InitPostgres(const char *in_dbname, Oid dboid, const char *username,
*/
CreateAuxProcessResourceOwner();
+ on_shmem_exit(ReleaseLWLocks, 0);
StartupXLOG();
/* Release (and warn about) any buffer pins leaked in StartupXLOG */
ReleaseAuxProcessResources(true);
@@ -1171,6 +1173,23 @@ process_settings(Oid databaseid, Oid roleid)
table_close(relsetting, AccessShareLock);
}
+/*
+ * There are 2 types of buffer locks on-holding when AtProcExit_Buffers() is
+ * invoked in a bootstrap process or a standalone backend:
+ * (1) Exceptions thrown during StartupXLOG()
+ * (2) Exceptions thrown during exception-handling in ShutdownXLOG()
+ * So we need this on_shmem_exit callback for single user mode.
+ * For processes under postmaster, ShutdownAuxiliaryProcess() will release
+ * the lw-locks and ShutdownXLOG() is not registered as a callback, so there
+ * is no such issue. Also, please note this callback should be registered in
+ * the order after AtProcExit_buffers() and before ShutdownXLOG().
+ */
+static void
+ReleaseLWLocks(int code, Datum arg)
+{
+ LWLockReleaseAll();
+}
+
/*
* Backend-shutdown callback. Do cleanup that we want to be sure happens
* before all the supporting modules begin to nail their doors shut via
Hi Kyotaro,
Thank you so much for your valued feedback and suggestions!
I assume that we have reached a consensus about the problem we are trying to fix here.
0a 00000004`8080cc30 00000004`80dcf917 postgres!PGSemaphoreLock+0x65 [d:\orcasqlagsea10\14\s\src\backend\port\win32_sema.c @ 158]
0b 00000004`8080cc90 00000004`80db025c postgres!LWLockAcquire+0x137 [d:\orcasqlagsea10\14\s\src\backend\storage\lmgr\lwlock.c @ 1234]
0c 00000004`8080ccd0 00000004`80db25db postgres!AbortBufferIO+0x2c [d:\orcasqlagsea10\14\s\src\backend\storage\buffer\bufmgr.c @ 3995]
0d 00000004`8080cd20 00000004`80dbce36 postgres!AtProcExit_Buffers+0xb [d:\orcasqlagsea10\14\s\src\backend\storage\buffer\bufmgr.c @ 2479]
0e 00000004`8080cd50 00000004`80dbd1bd postgres!shmem_exit+0xf6 [d:\orcasqlagsea10\14\s\src\backend\storage\ipc\ipc.c @ 262]
0f 00000004`8080cd80 00000004`80dbccfd postgres!proc_exit_prepare+0x4d [d:\orcasqlagsea10\14\s\src\backend\storage\ipc\ipc.c @ 188]
10 00000004`8080cdb0 00000004`80ef9e74 postgres!proc_exit+0xd [d:\orcasqlagsea10\14\s\src\backend\storage\ipc\ipc.c @ 141]
11 00000004`8080cde0 00000004`80ddb6ef postgres!errfinish+0x204 [d:\orcasqlagsea10\14\s\src\backend\utils\error\elog.c @ 624]
12 00000004`8080ce50 00000004`80db0f59 postgres!mdread+0x12f [d:\orcasqlagsea10\14\s\src\backend\storage\smgr\md.c @ 806]
Yes, this is one of the two situations we want to fix. The other situation is a cascade exception case like the following.
#0 0x00007f0fdb7cb6d6 in futex_abstimed_wait_cancelable (private=128, abstime=0x0, expected=0, futex_word=0x7f0fd14c81b8) at ../sysdeps/unix/sysv/linux/futex-internal.h:205
#1 do_futex_wait (sem=sem(at)entry=0x7f0fd14c81b8, abstime=0x0) at sem_waitcommon.c:111
#2 0x00007f0fdb7cb7c8 in __new_sem_wait_slow (sem=0x7f0fd14c81b8, abstime=0x0) at sem_waitcommon.c:181
#3 0x00005630d475658a in PGSemaphoreLock (sema=0x7f0fd14c81b8) at pg_sema.c:316
#4 0x00005630d47f689e in LWLockAcquire (lock=0x7f0fd9ae9c00, mode=LW_EXCLUSIVE) at /path/to/postgres/source/build/../src/backend/storage/lmgr/lwlock.c:1243
#5 0x00005630d47cd087 in AbortBufferIO () at /path/to/postgres/source/build/../src/backend/storage/buffer/bufmgr.c:3988
#6 0x00005630d47cb3f9 in AtProcExit_Buffers (code=1, arg=0) at /path/to/postgres/source/build/../src/backend/storage/buffer/bufmgr.c:2473
#7 0x00005630d47dbc32 in shmem_exit (code=1) at /path/to/postgres/source/build/../src/backend/storage/ipc/ipc.c:272
#8 0x00005630d47dba5e in proc_exit_prepare (code=1) at /path/to/postgres/source/build/../src/backend/storage/ipc/ipc.c:194
#9 0x00005630d47db9c6 in proc_exit (code=1) at /path/to/postgres/source/build/../src/backend/storage/ipc/ipc.c:107
#10 0x00005630d49811bc in errfinish (dummy=0) at /path/to/postgres/source/build/../src/backend/utils/error/elog.c:541
#11 0x00005630d4801f1f in mdwrite (reln=0x5630d6588c68, forknum=MAIN_FORKNUM, blocknum=8, buffer=0x7f0fd1ae9c00 "", skipFsync=false) at /path/to/postgres/source/build/../src/backend/storage/smgr/md.c:843
#12 0x00005630d4804716 in smgrwrite (reln=0x5630d6588c68, forknum=MAIN_FORKNUM, blocknum=8, buffer=0x7f0fd1ae9c00 "", skipFsync=false) at /path/to/postgres/source/build/../src/backend/storage/smgr/smgr.c:650
#13 0x00005630d47cb824 in FlushBuffer (buf=0x7f0fd19e9c00, reln=0x5630d6588c68) at /path/to/postgres/source/build/../src/backend/storage/buffer/bufmgr.c:2751
#14 0x00005630d47cb219 in SyncOneBuffer (buf_id=0, skip_recently_used=false, wb_context=0x7ffccc371a70) at /path/to/postgres/source/build/../src/backend/storage/buffer/bufmgr.c:2394
#15 0x00005630d47cab00 in BufferSync (flags=6) at /path/to/postgres/source/build/../src/backend/storage/buffer/bufmgr.c:1984
#16 0x00005630d47cb57f in CheckPointBuffers (flags=6) at /path/to/postgres/source/build/../src/backend/storage/buffer/bufmgr.c:2578
#17 0x00005630d44a685b in CheckPointGuts (checkPointRedo=23612304, flags=6) at /path/to/postgres/source/build/../src/backend/access/transam/xlog.c:9149
#18 0x00005630d44a62cf in CreateCheckPoint (flags=6) at /path/to/postgres/source/build/../src/backend/access/transam/xlog.c:8937
#19 0x00005630d44a45e3 in StartupXLOG () at /path/to/postgres/source/build/../src/backend/access/transam/xlog.c:7723
#20 0x00005630d4995f88 in InitPostgres (in_dbname=0x5630d65582b0 "postgres", dboid=0, username=0x5630d653d7d0 "chengyu", useroid=0, out_dbname=0x0, override_allow_connections=false)
at /path/to/postgres/source/build/../src/backend/utils/init/postinit.c:636
#21 0x00005630d480b68b in PostgresMain (argc=7, argv=0x5630d6534d20, dbname=0x5630d65582b0 "postgres", username=0x5630d653d7d0 "chengyu") at /path/to/postgres/source/build/../src/backend/tcop/postgres.c:3810
#22 0x00005630d4695e8b in main (argc=7, argv=0x5630d6534d20) at /path/to/postgres/source/build/../src/backend/main/main.c:224
Though ENOSPC is avoided by reservation in PG, another error code could be returned from the OS and produce this stack.
Ok, we are fixing this. The proposed patch lets LWLockReleaseAll() be called before
InitBufferPoolBackend() by registering the former after the latter in the on_shmem_exit
list. Even if it works, I think it's neither clean nor safe to register multiple
order-sensitive callbacks.
Actually, I think the order of the callbacks mirrors the order in which the components were initialized. In patch v2, the specific registration location was required so that the cascade exception case works as well.
However, I think we can discuss this, as we just want to ensure lw-locks are released before AbortBufferIO().
And the only caller of it is shmem_exit. More than that, all other call sites call
LWLockReleaseAll() just before calling it. If that's the case, why don't we just release
all LWLocks in shmem_exit or in AtProcExit_Buffers before calling AbortBufferIO()? I think
it's sufficient that AtProcExit_Buffers calls it at the beginning. (The comment for the
function needs editing.)
Putting LWLockReleaseAll() in AtProcExit_Buffers() is OK; however, putting it in shmem_exit() does not work for the cascade exception case.
Indeed, putting LWLockReleaseAll() in AtProcExit_Buffers() was considered first, but as other parts of the PG code base prefer putting it in separate callbacks (e.g. the ShutdownAuxiliaryProcess() callback when IsUnderPostmaster is true), I just followed the same style in patch v2.
But after revisiting the decision, I think I agree with you, because:
1. Yes, it looks cleaner in the code.
2. We can avoid the pain of people forgetting to register, or wrongly registering, the additional callback.
3. Calling LWLockReleaseAll() a second time is quite fast, so it will not add noticeable overhead to AtProcExit_Buffers() (see the small sketch below).
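Regarding point 3, a rough standalone sketch (illustration only, not lwlock.c; the names are made up) of why a second release-all is nearly free: it walks a small array of held locks, and the first call already emptied that array.
/* Standalone sketch (illustration only): a release-all that walks a small
 * "held locks" array is nearly a no-op the second time around. */
#include <stdio.h>

#define MAX_HELD 200

static int	held_locks[MAX_HELD];
static int	num_held_locks = 0;

static void
release_all(void)
{
	int			released = 0;

	while (num_held_locks > 0)
	{
		num_held_locks--;
		/* the real code would unlock held_locks[num_held_locks] here */
		released++;
	}
	printf("released %d lock(s)\n", released);
}

int
main(void)
{
	held_locks[num_held_locks++] = 42;	/* pretend we hold one buffer lock */

	release_all();				/* first call: does the real work */
	release_all();				/* second call: empty array, effectively free */
	return 0;
}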
Thus, I have updated the patch to v3 according to your suggestions. Could you help review it again?
Please let me know should you have more suggestions or feedback.
Thank you!
Best regards,
--
Chengchao Yu
Software Engineer | Microsoft | Azure Database for PostgreSQL
https://azure.microsoft.com/en-us/services/postgresql/
Attachments:
fix-deadlock-v3.patch (application/octet-stream)
diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index 6f3a402854..90f600c5fa 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -2472,12 +2472,20 @@ InitBufferPoolBackend(void)
}
/*
- * During backend exit, ensure that we released all shared-buffer locks and
- * assert that we have no remaining pins.
+ * During backend exit, we release all shared-buffer locks before
+ * AbortBufferIO() and assert that we have no remaining pins.
*/
static void
AtProcExit_Buffers(int code, Datum arg)
{
+ /*
+ * AtProcExit_Buffers() could be called in a cascade exception when
+ * the previous callback re-acquired some lw-lock again.
+ * Here we make sure that every time AbortBufferIO() is called the
+ * lw-locks have been already released.
+ */
+ LWLockReleaseAll();
+
AbortBufferIO();
UnlockBuffers();
On Sat, Jul 27, 2019 at 6:22 AM Chengchao Yu <chengyu@microsoft.com> wrote:
Thus, I have updated the patch to v3 according to your suggestions. Could you help review it again?
Please let me know should you have more suggestions or feedback.
I have tried to look into this patch and I don't think it fixes the
problem. Basically, I tried the commands you suggested in single-user
mode: create table, insert, and then checkpoint. What I see is almost
the same behavior as you explained in one of the emails above, with a
slight difference that makes me think the fix you are proposing is not
correct. Below is what you said:
"The second type is in Step #4. At the time when “checkpoint” SQL
command is being executed, PG has already set up the before_shmem_exit
callbackShutdownPostgres(), which releases all lw-locks given
transaction or sub-transaction is on-going. So after the first IO
error, the buffer page’s lw-lock gets released successfully. However,
later ShutdownXLOG() is invoked, and PG tries to flush buffer pages
again, which results in the second IO error. Different from the first
time, this time, all the previous executed before/on_shmem_exit
callbacks are not invoked again due to the decrease of the indexes. So
lw-locks for buffer pages are not released when PG tries to get the
same buffer lock in AbortBufferIO(), and then PG process gets stuck."
The only difference is in the last line where for me it gives
assertion failure when trying to do ReleaseAuxProcessResources. Below
is the callstack:
postgres.exe!ExceptionalCondition(const char * conditionName=0x00db0c78, const char * errorType=0x00db0c68, const char * fileName=0x00db0c18, int lineNumber=1722) Line 55 C
postgres.exe!UnpinBuffer(BufferDesc * buf=0x052a104c, bool fixOwner=true) Line 1722 + 0x2f bytes C
postgres.exe!ReleaseBuffer(int buffer=96) Line 3367 + 0x17 bytes C
postgres.exe!ResourceOwnerReleaseInternal(ResourceOwnerData * owner=0x0141f6e8, <unnamed-enum-RESOURCE_RELEASE_BEFORE_LOCKS> phase=RESOURCE_RELEASE_BEFORE_LOCKS, bool isCommit=false, bool isTopLevel=true) Line 526 + 0x9 bytes C
postgres.exe!ResourceOwnerRelease(ResourceOwnerData * owner=0x0141f6e8, <unnamed-enum-RESOURCE_RELEASE_BEFORE_LOCKS> phase=RESOURCE_RELEASE_BEFORE_LOCKS, bool isCommit=false, bool isTopLevel=true) Line 484 + 0x17 bytes C
postgres.exe!ReleaseAuxProcessResources(bool isCommit=false) Line 861 + 0x15 bytes C
postgres.exe!ReleaseAuxProcessResourcesCallback(int code=1, unsigned int arg=0) Line 881 + 0xa bytes C
postgres.exe!shmem_exit(int code=1) Line 272 + 0x1f bytes C
postgres.exe!proc_exit_prepare(int code=1) Line 194 + 0x9 bytes C
postgres.exe!proc_exit(int code=1) Line 107 + 0x9 bytes C
postgres.exe!errfinish(int dummy=0, ...) Line 538 + 0x7 bytes C
postgres.exe!mdwrite(SMgrRelationData * reln=0x0147e140, ForkNumber forknum=MAIN_FORKNUM, unsigned int blocknum=7, char * buffer=0x0542dd00, bool skipFsync=false) Line 713 + 0x4c bytes C
postgres.exe!smgrwrite(SMgrRelationData * reln=0x0147e140, ForkNumber forknum=MAIN_FORKNUM, unsigned int blocknum=7, char * buffer=0x0542dd00, bool skipFsync=false) Line 587 + 0x24 bytes C
postgres.exe!FlushBuffer(BufferDesc * buf=0x052a104c, SMgrRelationData * reln=0x0147e140) Line 2759 + 0x1d bytes C
postgres.exe!SyncOneBuffer(int buf_id=95, bool skip_recently_used=false, WritebackContext * wb_context=0x012ccea0) Line 2402 + 0xb bytes C
postgres.exe!BufferSync(int flags=5) Line 1992 + 0x15 bytes C
postgres.exe!CheckPointBuffers(int flags=5) Line 2586 + 0x9 bytes C
postgres.exe!CheckPointGuts(unsigned __int64 checkPointRedo=22933176, int flags=5) Line 8991 + 0x9 bytes C
postgres.exe!CreateCheckPoint(int flags=5) Line 8780 + 0x11 bytes C
postgres.exe!ShutdownXLOG(int code=1, unsigned int arg=0) Line 8333 + 0x7 bytes C
postgres.exe!shmem_exit(int code=1) Line 272 + 0x1f bytes C
postgres.exe!proc_exit_prepare(int code=1) Line 194 + 0x9 bytes C
postgres.exe!proc_exit(int code=1) Line 107 + 0x9 bytes C
postgres.exe!errfinish(int dummy=0, ...) Line 538 + 0x7 bytes C
postgres.exe!mdwrite(SMgrRelationData * reln=0x0147e140, ForkNumber forknum=MAIN_FORKNUM, unsigned int blocknum=7, char * buffer=0x0542dd00, bool skipFsync=false) Line 713 + 0x4c bytes C
postgres.exe!smgrwrite(SMgrRelationData * reln=0x0147e140, ForkNumber forknum=MAIN_FORKNUM, unsigned int blocknum=7, char * buffer=0x0542dd00, bool skipFsync=false) Line 587 + 0x24 bytes C
postgres.exe!FlushBuffer(BufferDesc * buf=0x052a104c, SMgrRelationData * reln=0x0147e140) Line 2759 + 0x1d bytes C
postgres.exe!SyncOneBuffer(int buf_id=95, bool skip_recently_used=false, WritebackContext * wb_context=0x012ce580) Line 2402 + 0xb bytes C
postgres.exe!BufferSync(int flags=44) Line 1992 + 0x15 bytes C
postgres.exe!CheckPointBuffers(int flags=44) Line 2586 + 0x9 bytes C
postgres.exe!CheckPointGuts(unsigned __int64 checkPointRedo=22933176, int flags=44) Line 8991 + 0x9 bytes C
postgres.exe!CreateCheckPoint(int flags=44) Line 8780 + 0x11 bytes C
postgres.exe!RequestCheckpoint(int flags=44) Line 967 + 0xc bytes C
postgres.exe!standard_ProcessUtility(PlannedStmt * pstmt=0x0146b738, const char * queryString=0x0146ad98, <unnamed-enum-PROCESS_UTILITY_TOPLEVEL> context=PROCESS_UTILITY_TOPLEVEL, ParamListInfoData * params=0x00000000, QueryEnvironment * queryEnv=0x00000000, _DestReceiver * dest=0x00adc1d8, char * completionTag=0x012cfdbc) Line 769 + 0x28 bytes C
It seems to me there are other things, like
ReleaseAuxProcessResources() before AbortBufferIO(), which expect
LWLocks to be released. I didn't get much time to debug this further,
but I think some more analysis is required for this issue.
I guess you didn't encounter this problem because you are not using an
asserts-enabled build, but there could be some other reason as well.
I have marked this CF entry as "Waiting on Author".
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On Mon, Sep 09, 2019 at 05:34:43PM +0530, Amit Kapila wrote:
The only difference is in the last line where for me it gives
assertion failure when trying to do ReleaseAuxProcessResources. Below
is the callstack:
No need for Windows on this one and I have reproduced easily the same
trace as Amit. The patch has been moved to next CF. Chengchao, could
you provide an update please?
--
Michael
On Mon, Nov 25, 2019 at 1:17 PM Michael Paquier <michael@paquier.xyz> wrote:
On Mon, Sep 09, 2019 at 05:34:43PM +0530, Amit Kapila wrote:
The only difference is in the last line where for me it gives
assertion failure when trying to do ReleaseAuxProcessResources. Below
is the callstack:
No need for Windows on this one and I have reproduced easily the same
trace as Amit. The patch has been moved to next CF. Chengchao, could
you provide an update please?
I have marked this patch as "Returned with feedback" as it has been a
long time since the author responded. Feel free to provide a new patch for
the next CF.
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com