HEAD crashes with assertion and LWLOCK_STATS enabled

Started by Yuto HAYAMIZUover 11 years ago3 messages
#1Yuto HAYAMIZU
y.hayamizu@gmail.com
1 attachment(s)

Hi hackers,

I found a bug that causes a crash when assertion is enabled and LWLOCK_STATS is defined.
I've tested with Debian 7.5 (3.2.0-4-amd64) on VMware fusion 6, but this bug seems to be platform-independent and should reproduce in other environments.
A patch to fix the bug is also attached.

## Reproduing a crash

You can reproduce a crash by this way:

git co a0841ecd2518d4505b96132b764b918ab5d21ad4
git clean -dfx
./configure --enable-cassert CFLAGS='-DLWLOCK_STATS'
make check

In my environment, the following messages appeared.

( omit... )
../../../src/test/regress/pg_regress --inputdir=. --temp-install=./tmp_check --top-builddir=../../.. --dlpath=. --schedule=./parallel_schedule
============== creating temporary installation ==============
============== initializing database system ==============

pg_regress: initdb failed

and initdb.log contained the following messages.

reating directory /tmp/pghead/src/test/regress/./tmp_check/data ... ok
creating subdirectories ... ok
selecting default max_connections ... 100
selecting default shared_buffers ... 128MB
selecting dynamic shared memory implementation ... posix
creating configuration files ... ok
creating template1 database in /tmp/pghead/src/test/regress/./tmp_check/data/base/1 ... PID 48239 lwlock main 142: shacq 0 exacq 1 blk 0 spindelay 0
( omit... )
PID 48247 lwlock main 33058: shacq 0 exacq 1 blk 0 spindelay 0
PID 48247 lwlock main 33005: shacq 0 exacq 48 blk 0 spindelay 0
ok
loading system objects' descriptions ... TRAP: FailedAssertion("!(CritSectionCount == 0 || (context) == ErrorContext || (MyAuxProcType == CheckpointerProcess))", File: "mcxt.c", Line: 594)
Aborted (core dumped)
child process exited with exit code 134
initdb: data directory "/tmp/pghead/src/test/regress/./tmp_check/data" not removed at user's request

## The cause of crash

The failing assertion is for prohibiting memory allocation in a critical section, which is introduced by commit 4a170ee9 on 2014-04-04.

In my understanding, the root cause of the assertion failure is on-demand allocation of lwlock_stats entry. For each LWLock, a lwlock_stats entry is created at the first invocation of LWLockAcquire using MemoryContextAlloc. If the first invocation is in a critical section, the assertion fails.

For 'initdb' case I mentioned above, WALWriteLock locking in XLogFlush function was the problem.
I also confirmed the assertion failure on starting postgres on a correctly initialized database. In this case, locking CheckpointerCommLock in AbsorbFsyncRequests function was the problem.

## A solution

In order to avoid memory allocation during critical sections, lwlock_stats hash table should be populated at the initialization of each process.
The attached patch populate lwlock_stats entries of MainLWLockArray at the end of CreateLWLocks, InitProcess and InitAuxiliaryProcess.

With this patch, all regression tests can be passed so far, but I think this patch is not perfect because it does not cover LWLocks outside of MainLWLockArray. I'm not sure where is the right place to initialize lwlock_stats entries for that locks. So I feel it needs some refinements by you hackers.

Attachments:

0001-pre-populating-lwlock_stats-entries.patchtext/plain; charset=UTF-8; name=0001-pre-populating-lwlock_stats-entries.patch; x-mac-creator=0; x-mac-type=0Download
From f96708d14ab0073abd95c463eaf8d60db42f411d Mon Sep 17 00:00:00 2001
From: Yuto Hayamizu <y.hayamizu@gmail.com>
Date: Tue, 20 May 2014 16:19:56 +0900
Subject: [PATCH] pre-populating lwlock_stats entries

---
 src/backend/storage/lmgr/lwlock.c |   21 +++++++++++++++++++++
 src/backend/storage/lmgr/proc.c   |    8 ++++++++
 src/include/storage/lwlock.h      |    4 ++++
 3 files changed, 33 insertions(+)

diff --git a/src/backend/storage/lmgr/lwlock.c b/src/backend/storage/lmgr/lwlock.c
index d23ac62..fc97ae0 100644
--- a/src/backend/storage/lmgr/lwlock.c
+++ b/src/backend/storage/lmgr/lwlock.c
@@ -359,8 +359,29 @@ CreateLWLocks(void)
 	MainLWLockTranche.array_base = MainLWLockArray;
 	MainLWLockTranche.array_stride = sizeof(LWLockPadded);
 	LWLockRegisterTranche(0, &MainLWLockTranche);
+
+#ifdef LWLOCK_STATS
+	InitializeLWLockStats();
+#endif
 }
 
+#ifdef LWLOCK_STATS
+void
+InitializeLWLockStats(void)
+{
+	int			numLocks = NumLWLocks();
+	int			id;
+	LWLockPadded	*lock;
+
+	if (MyProc == NULL) return;
+
+	/* Initialize all lwlock_stats entries of MainLWLockArray */
+	for (id = 0, lock = MainLWLockArray; id < numLocks; id++, lock++)
+	{
+		get_lwlock_stats_entry(&lock->lock);
+	}
+}
+#endif
 
 /*
  * LWLockAssign - assign a dynamically-allocated LWLock number
diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c
index 266b0da..e09cbf8 100644
--- a/src/backend/storage/lmgr/proc.c
+++ b/src/backend/storage/lmgr/proc.c
@@ -415,6 +415,10 @@ InitProcess(void)
 	 * the deadlock checker.
 	 */
 	InitDeadLockChecking();
+
+#ifdef LWLOCK_STATS
+	InitializeLWLockStats();
+#endif
 }
 
 /*
@@ -566,6 +570,10 @@ InitAuxiliaryProcess(void)
 	 * Arrange to clean up at process exit.
 	 */
 	on_shmem_exit(AuxiliaryProcKill, Int32GetDatum(proctype));
+
+#ifdef LWLOCK_STATS
+	InitializeLWLockStats();
+#endif
 }
 
 /*
diff --git a/src/include/storage/lwlock.h b/src/include/storage/lwlock.h
index 175fae3..f0f1c21 100644
--- a/src/include/storage/lwlock.h
+++ b/src/include/storage/lwlock.h
@@ -209,6 +209,10 @@ extern int	LWLockNewTrancheId(void);
 extern void LWLockRegisterTranche(int, LWLockTranche *);
 extern void LWLockInitialize(LWLock *, int tranche_id);
 
+#ifdef LWLOCK_STATS
+extern void InitializeLWLockStats(void);
+#endif
+
 /*
  * Prior to PostgreSQL 9.4, we used an enum type called LWLockId to refer
  * to LWLocks.  New code should instead use LWLock *.  However, for the
-- 
1.7.10.4

#2Robert Haas
robertmhaas@gmail.com
In reply to: Yuto HAYAMIZU (#1)
Re: HEAD crashes with assertion and LWLOCK_STATS enabled

On Tue, May 20, 2014 at 4:02 AM, Yuto HAYAMIZU <y.hayamizu@gmail.com> wrote:

The failing assertion is for prohibiting memory allocation in a critical section, which is introduced by commit 4a170ee9 on 2014-04-04.

In my understanding, the root cause of the assertion failure is on-demand allocation of lwlock_stats entry. For each LWLock, a lwlock_stats entry is created at the first invocation of LWLockAcquire using MemoryContextAlloc. If the first invocation is in a critical section, the assertion fails.

For 'initdb' case I mentioned above, WALWriteLock locking in XLogFlush function was the problem.
I also confirmed the assertion failure on starting postgres on a correctly initialized database. In this case, locking CheckpointerCommLock in AbsorbFsyncRequests function was the problem.

## A solution

In order to avoid memory allocation during critical sections, lwlock_stats hash table should be populated at the initialization of each process.
The attached patch populate lwlock_stats entries of MainLWLockArray at the end of CreateLWLocks, InitProcess and InitAuxiliaryProcess.

With this patch, all regression tests can be passed so far, but I think this patch is not perfect because it does not cover LWLocks outside of MainLWLockArray. I'm not sure where is the right place to initialize lwlock_stats entries for that locks. So I feel it needs some refinements by you hackers.

Prior to my commit ea9df812d8502fff74e7bc37d61bdc7d66d77a7f, which
introduced LWLockTranche, we used to allocate enough storage for all
of the LWLOCK_STATS entries in the system; that commit changed things
so that we allocate entries for particular LWLocks on an as-needed
basis. Although that wasn't the main point of that patch, I thought
it was a nice idea, since it might save you quite a bit of memory if
you have a lot of backends that don't touch very many LWLocks. But
maybe we need to give up on that in view of this report.

I don't think we should adopt the approach proposed in this patch,
though, because if we're going to preallocate all of the entries
anyway there's little reason to use a hash table instead of an array.
If we're going to go with the approach of preallocating all the
entries, maybe we should change the definition of LWLockTranche to
include the number of lwlocks in the tranche. We could then add
another array parallel to LWLockTrancheArray which would point to an
appropriately-sized array of lwlock_stats objects for each tranche.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#3Fujii Masao
masao.fujii@gmail.com
In reply to: Robert Haas (#2)
Re: HEAD crashes with assertion and LWLOCK_STATS enabled

On Fri, May 23, 2014 at 9:38 PM, Robert Haas <robertmhaas@gmail.com> wrote:

On Tue, May 20, 2014 at 4:02 AM, Yuto HAYAMIZU <y.hayamizu@gmail.com> wrote:

The failing assertion is for prohibiting memory allocation in a critical section, which is introduced by commit 4a170ee9 on 2014-04-04.

This issue is still in open item list for 9.4. But the commit which had
caused this issue was reverted by d27d493. So I removed this from the
Open Item list.

Regards,

--
Fujii Masao

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers