[BUG FIX] Removing NamedLWLockTrancheArray

Started by Kyotaro HORIGUCHIalmost 9 years ago13 messages
#1Kyotaro HORIGUCHI
horiguchi.kyotaro@lab.ntt.co.jp
1 attachment(s)

Hello, some of my collegues found that orafce crashes with
postgresql compliled with dtrace.

=== The cause

The immediate cause was I think that it just did
LWLockNewTrancheId and forget to do LWLockRegisterTranche. But
this is hardly detectable by a module developer.

Typical tracepoint looks like the following.

TRACE_POSTGRESQL_LWLOCK_WAIT_DONE(T_NAME(lock), mode);

Where T_NAME is (LWLockTrancheArray[(lock)->tranche]), as you see
T_NAME assumes that all tranches pointed from lwlocks are
available on the array but the tranches without
LWLockRegisterTranche are not. What is worse this doesn't harm so
much without dtrace (or LWLOCK_STATS or error in LWLockRelease)
. Even if dtrace is activated one or two unregistred tranches
(faultly) have their seat (filled with trash) at the end of the
array with the current code.

As my understanding there are two ways to use lwlocks in
extension. One is using LWLockNewTrancheId and
LWLockRegisterTanche on shared memory provided by the module. The
other is using RequestNamedLWLockTranche in _PG_init and
GetNamedLWLockTranche to acquire locks provided by postgresql.

I couldn't find a documentation about lwlock and trance in
extentions, is there any?

=== How to fix this

The most straightforward way to fix this is chekcing the validity
of tranche id on initilization of a lwlock. Even though I think
that degradation won't be a matter here, NamedLWLockTrancheArray
makes the things very ineffective. After some consideration I
decided to remove NamedLWLockTrancheArray.

=== The patch

The attached patch (is for current master) is aming to fix all
the problem by doing the following two things.

- Remove NameLWLockTrancheArray and all tranches are registered
in LWLockTrancheArray. This seems to work at least for
!EXEC_BAKCEND environment but I haven't tested with EXEC_BACKEND.

- Check tranche id in LWLockInitialize.

The first one required refactoring of CreateLWLocks. It is
changed to register tranches first then initialize lwlokcs.

The problem is found with PG9.6 and this should be backpatched at
least to the version. I haven't tested PG9.5 and 9.4 but it seems
to need different amendment. 9.3 doesn't has tranche.

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachments:

0001-Remove-NamedLWLockTrancheArray.patchtext/x-patch; charset=us-asciiDownload
From e2c4619df6c51dd5b71630de7467b332f7449fc5 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi <horiguchi.kyotaro@lab.ntt.co.jp>
Date: Fri, 3 Mar 2017 15:47:19 +0900
Subject: [PATCH] Remove NamedLWLockTrancheArray

This patch fixes two pattern of server crash of postgresql compled
with --with-dtarce.

One occurs when an extension forgets to call LWLockRegisterTranche for
the tranche id given by LWLockNewTrancheId. For the case, trace point
code of lwlocks looks up to LWLockTrancheArray for tranche names but
it can be outside of valid(mapped) memory. This seems quite common
mistake but hardly detectable by ordinary tests. Range check of
tranche_id in LWLockInitialize can detect the case.

The other occurs when an extension uses RequestNamedfLWLockTranche and
GetNamedLWLockTrasnche. In this case the tranches are stored in
NamedLWLockTrancheArray separatley from LWLockTrancheArray. This also
may cause a crash when a tracepoint is performed. This patch merges
all tranches into LWLockTrancheArray and remove NameLWLockTrancheArray.
---
 src/backend/storage/lmgr/lwlock.c | 89 ++++++++++++++++++++++-----------------
 src/include/storage/lwlock.h      |  8 ----
 2 files changed, 51 insertions(+), 46 deletions(-)

diff --git a/src/backend/storage/lmgr/lwlock.c b/src/backend/storage/lmgr/lwlock.c
index ab81d94..4d2a1fc 100644
--- a/src/backend/storage/lmgr/lwlock.c
+++ b/src/backend/storage/lmgr/lwlock.c
@@ -153,8 +153,6 @@ NamedLWLockTrancheRequest *NamedLWLockTrancheRequestArray = NULL;
 static int	NamedLWLockTrancheRequestsAllocated = 0;
 int			NamedLWLockTrancheRequests = 0;
 
-NamedLWLockTranche *NamedLWLockTrancheArray = NULL;
-
 static bool lock_named_request_allowed = true;
 
 static void InitializeLWLocks(void);
@@ -360,9 +358,6 @@ LWLockShmemSize(void)
 	/* Space for dynamic allocation counter, plus room for alignment. */
 	size = add_size(size, sizeof(int) + LWLOCK_PADDED_SIZE);
 
-	/* space for named tranches. */
-	size = add_size(size, mul_size(NamedLWLockTrancheRequests, sizeof(NamedLWLockTranche)));
-
 	/* space for name of each tranche. */
 	for (i = 0; i < NamedLWLockTrancheRequests; i++)
 		size = add_size(size, strlen(NamedLWLockTrancheRequestArray[i].tranche_name) + 1);
@@ -410,13 +405,16 @@ CreateLWLocks(void)
 		 */
 		LWLockCounter = (int *) ((char *) MainLWLockArray - sizeof(int));
 		*LWLockCounter = LWTRANCHE_FIRST_USER_DEFINED;
-
-		/* Initialize all LWLocks */
-		InitializeLWLocks();
 	}
 
-	/* Register all LWLock tranches */
+	/* Register all LWLock tranches on local memory */
 	RegisterLWLockTranches();
+
+	/* Initialize all LWLocks */
+	if (!IsUnderPostmaster)
+		InitializeLWLocks();
+
+
 }
 
 /*
@@ -425,7 +423,6 @@ CreateLWLocks(void)
 static void
 InitializeLWLocks(void)
 {
-	int			numNamedLocks = NumLWLocksByNamedTranches();
 	int			id;
 	int			i;
 	int			j;
@@ -451,35 +448,22 @@ InitializeLWLocks(void)
 	for (id = 0; id < NUM_PREDICATELOCK_PARTITIONS; id++, lock++)
 		LWLockInitialize(&lock->lock, LWTRANCHE_PREDICATE_LOCK_MANAGER);
 
-	/* Initialize named tranches. */
+	/*
+	 * Initialize LWLokcs in user tranches defined using
+	 * RequestNamedLWLockTranche
+	 */
 	if (NamedLWLockTrancheRequests > 0)
 	{
-		char	   *trancheNames;
+		LWLockPadded   *lock = &MainLWLockArray[NUM_FIXED_LWLOCKS];
 
-		NamedLWLockTrancheArray = (NamedLWLockTranche *)
-			&MainLWLockArray[NUM_FIXED_LWLOCKS + numNamedLocks];
-
-		trancheNames = (char *) NamedLWLockTrancheArray +
-			(NamedLWLockTrancheRequests * sizeof(NamedLWLockTranche));
-		lock = &MainLWLockArray[NUM_FIXED_LWLOCKS];
-
-		for (i = 0; i < NamedLWLockTrancheRequests; i++)
+		for (i = 0 ; i < NamedLWLockTrancheRequests ; i++)
 		{
-			NamedLWLockTrancheRequest *request;
-			NamedLWLockTranche *tranche;
-			char	   *name;
+			NamedLWLockTrancheRequest *request =
+				&NamedLWLockTrancheRequestArray[i];
+			int tranche_id = LWTRANCHE_FIRST_USER_DEFINED + i;
 
-			request = &NamedLWLockTrancheRequestArray[i];
-			tranche = &NamedLWLockTrancheArray[i];
-
-			name = trancheNames;
-			trancheNames += strlen(request->tranche_name) + 1;
-			strcpy(name, request->tranche_name);
-			tranche->trancheId = LWLockNewTrancheId();
-			tranche->trancheName = name;
-
-			for (j = 0; j < request->num_lwlocks; j++, lock++)
-				LWLockInitialize(&lock->lock, tranche->trancheId);
+			for (j = 0 ; j < request->num_lwlocks ; j++, lock++)
+				LWLockInitialize(&lock->lock, tranche_id);
 		}
 	}
 }
@@ -495,6 +479,10 @@ RegisterLWLockTranches(void)
 	if (LWLockTrancheArray == NULL)
 	{
 		LWLockTranchesAllocated = 64;
+		while(LWTRANCHE_FIRST_USER_DEFINED + NamedLWLockTrancheRequests
+			  > LWLockTranchesAllocated)
+			LWLockTranchesAllocated *= 2;
+
 		LWLockTrancheArray = (char **)
 			MemoryContextAllocZero(TopMemoryContext,
 						  LWLockTranchesAllocated * sizeof(char *));
@@ -511,10 +499,29 @@ RegisterLWLockTranches(void)
 	LWLockRegisterTranche(LWTRANCHE_PARALLEL_QUERY_DSA,
 						  "parallel_query_dsa");
 
-	/* Register named tranches. */
-	for (i = 0; i < NamedLWLockTrancheRequests; i++)
-		LWLockRegisterTranche(NamedLWLockTrancheArray[i].trancheId,
-							  NamedLWLockTrancheArray[i].trancheName);
+	if (NamedLWLockTrancheRequests > 0)
+	{
+		char   *trancheNames =
+			(char *) &MainLWLockArray[NUM_FIXED_LWLOCKS +
+									  NumLWLocksByNamedTranches()];
+
+		for (i = 0; i < NamedLWLockTrancheRequests; i++)
+		{
+			char	   *name;
+			int			tranche_id;
+			NamedLWLockTrancheRequest *request;
+
+			tranche_id = LWLockNewTrancheId();
+			request = &NamedLWLockTrancheRequestArray[i];
+
+			/* Copy the name in request to shared memory */
+			name = trancheNames;
+			trancheNames += strlen(request->tranche_name) + 1;
+			strcpy(name, request->tranche_name);
+
+			LWLockRegisterTranche(tranche_id, name);
+		}
+	}
 }
 
 /*
@@ -665,6 +672,12 @@ RequestNamedLWLockTranche(const char *tranche_name, int num_lwlocks)
 void
 LWLockInitialize(LWLock *lock, int tranche_id)
 {
+	if (tranche_id >= LWTRANCHE_FIRST_USER_DEFINED &&
+		tranche_id >= LWLockTranchesAllocated)
+		ereport(ERROR,
+				(errmsg ("tranche for this lock not available"),
+				 errdetail("LWLockRegisterTranche for this tranche required")));
+
 	pg_atomic_init_u32(&lock->state, LW_FLAG_RELEASE_OK);
 #ifdef LOCK_DEBUG
 	pg_atomic_init_u32(&lock->nwaiters, 0);
diff --git a/src/include/storage/lwlock.h b/src/include/storage/lwlock.h
index 8bd93c3..4645414 100644
--- a/src/include/storage/lwlock.h
+++ b/src/include/storage/lwlock.h
@@ -90,14 +90,6 @@ typedef union LWLockMinimallyPadded
 extern PGDLLIMPORT LWLockPadded *MainLWLockArray;
 extern char *MainLWLockNames[];
 
-/* struct for storing named tranche information */
-typedef struct NamedLWLockTranche
-{
-	int			trancheId;
-	char	   *trancheName;
-} NamedLWLockTranche;
-
-extern PGDLLIMPORT NamedLWLockTranche *NamedLWLockTrancheArray;
 extern PGDLLIMPORT int NamedLWLockTrancheRequests;
 
 /* Names for fixed lwlocks */
-- 
2.9.2

#2Kyotaro HORIGUCHI
horiguchi.kyotaro@lab.ntt.co.jp
In reply to: Kyotaro HORIGUCHI (#1)
Re: [BUG FIX] Removing NamedLWLockTrancheArray

At Fri, 03 Mar 2017 17:13:42 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> wrote in <20170303.171342.134582021.horiguchi.kyotaro@lab.ntt.co.jp>

Hello, some of my collegues found that orafce crashes with
postgresql compliled with dtrace.

=== The cause

The immediate cause was I think that it just did
LWLockNewTrancheId and forget to do LWLockRegisterTranche. But
this is hardly detectable by a module developer.

I'm sorry, I found that orafce is calling LWLockRegisterTranche
so this might be a different problem but anyway the problem with
RequestNamedLWLockTranche occurs.

Typical tracepoint looks like the following.

TRACE_POSTGRESQL_LWLOCK_WAIT_DONE(T_NAME(lock), mode);

Where T_NAME is (LWLockTrancheArray[(lock)->tranche]), as you see
T_NAME assumes that all tranches pointed from lwlocks are
available on the array but the tranches without
LWLockRegisterTranche are not. What is worse this doesn't harm so
much without dtrace (or LWLOCK_STATS or error in LWLockRelease)
. Even if dtrace is activated one or two unregistred tranches
(faultly) have their seat (filled with trash) at the end of the
array with the current code.

As my understanding there are two ways to use lwlocks in
extension. One is using LWLockNewTrancheId and
LWLockRegisterTanche on shared memory provided by the module. The
other is using RequestNamedLWLockTranche in _PG_init and
GetNamedLWLockTranche to acquire locks provided by postgresql.

I couldn't find a documentation about lwlock and trance in
extentions, is there any?

=== How to fix this

The most straightforward way to fix this is chekcing the validity
of tranche id on initilization of a lwlock. Even though I think
that degradation won't be a matter here, NamedLWLockTrancheArray
makes the things very ineffective. After some consideration I
decided to remove NamedLWLockTrancheArray.

=== The patch

The attached patch (is for current master) is aming to fix all
the problem by doing the following two things.

- Remove NameLWLockTrancheArray and all tranches are registered
in LWLockTrancheArray. This seems to work at least for
!EXEC_BAKCEND environment but I haven't tested with EXEC_BACKEND.

- Check tranche id in LWLockInitialize.

The first one required refactoring of CreateLWLocks. It is
changed to register tranches first then initialize lwlokcs.

The problem is found with PG9.6 and this should be backpatched at
least to the version. I haven't tested PG9.5 and 9.4 but it seems
to need different amendment. 9.3 doesn't has tranche.

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#3Kyotaro HORIGUCHI
horiguchi.kyotaro@lab.ntt.co.jp
In reply to: Kyotaro HORIGUCHI (#2)
Re: [BUG FIX] Removing NamedLWLockTrancheArray

Sorry for frequent mails..

At Fri, 03 Mar 2017 17:21:21 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> wrote in <20170303.172121.140674354.horiguchi.kyotaro@lab.ntt.co.jp>

At Fri, 03 Mar 2017 17:13:42 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> wrote in <20170303.171342.134582021.horiguchi.kyotaro@lab.ntt.co.jp>

Hello, some of my collegues found that orafce crashes with
postgresql compliled with dtrace.

=== The cause

The immediate cause was I think that it just did
LWLockNewTrancheId and forget to do LWLockRegisterTranche. But
this is hardly detectable by a module developer.

I'm sorry, I found that orafce is calling LWLockRegisterTranche
so this might be a different problem but anyway the problem with
RequestNamedLWLockTranche occurs.

As for orafce, the real cuase seems to be calling
LWLockRagisterTranche with tranche on non-static memory.

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#4Kyotaro HORIGUCHI
horiguchi.kyotaro@lab.ntt.co.jp
In reply to: Kyotaro HORIGUCHI (#3)
Re: [BUG FIX] Removing NamedLWLockTrancheArray

I should be very tired.

This is the last mail today.

At Fri, 03 Mar 2017 17:26:27 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> wrote in <20170303.172627.225604431.horiguchi.kyotaro@lab.ntt.co.jp>

Hello, some of my collegues found that orafce crashes with
postgresql compliled with dtrace.

=== The cause

The immediate cause was I think that it just did
LWLockNewTrancheId and forget to do LWLockRegisterTranche. But
this is hardly detectable by a module developer.

I'm sorry, I found that orafce is calling LWLockRegisterTranche
so this might be a different problem but anyway the problem with
RequestNamedLWLockTranche occurs.

As for orafce, the real cuase seems to be calling
LWLockRagisterTranche with tranche on non-static memory.

No, it is giving static variable. Sorry for the noise.

The problem of postgresql is still exists, though.

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#5Amit Kapila
amit.kapila16@gmail.com
In reply to: Kyotaro HORIGUCHI (#1)
Re: [BUG FIX] Removing NamedLWLockTrancheArray

On Fri, Mar 3, 2017 at 1:43 PM, Kyotaro HORIGUCHI
<horiguchi.kyotaro@lab.ntt.co.jp> wrote:

Hello, some of my collegues found that orafce crashes with
postgresql compliled with dtrace.

=== The cause

The immediate cause was I think that it just did
LWLockNewTrancheId and forget to do LWLockRegisterTranche. But
this is hardly detectable by a module developer.

Typical tracepoint looks like the following.

TRACE_POSTGRESQL_LWLOCK_WAIT_DONE(T_NAME(lock), mode);

Where T_NAME is (LWLockTrancheArray[(lock)->tranche]), as you see
T_NAME assumes that all tranches pointed from lwlocks are
available on the array but the tranches without
LWLockRegisterTranche are not. What is worse this doesn't harm so
much without dtrace (or LWLOCK_STATS or error in LWLockRelease)
. Even if dtrace is activated one or two unregistred tranches
(faultly) have their seat (filled with trash) at the end of the
array with the current code.

As my understanding there are two ways to use lwlocks in
extension. One is using LWLockNewTrancheId and
LWLockRegisterTanche on shared memory provided by the module. The
other is using RequestNamedLWLockTranche in _PG_init and
GetNamedLWLockTranche to acquire locks provided by postgresql.

I couldn't find a documentation about lwlock and trance in
extentions, is there any?

You can read about usage of LWLocks in extensions from below location:
https://www.postgresql.org/docs/devel/static/xfunc-c.html#idp86986416

Alternatively, you can check the commit
c1772ad9225641c921545b35c84ee478c326b95e to see how it is used in
pg_stat_statements.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#6Kyotaro HORIGUCHI
horiguchi.kyotaro@lab.ntt.co.jp
In reply to: Amit Kapila (#5)
Re: [BUG FIX] Removing NamedLWLockTrancheArray

You can read about usage of LWLocks in extensions from below location:
https://www.postgresql.org/docs/devel/static/xfunc-c.html#idp86986416

Thank you for the pointer. I understand that the document describes the
only correct way to use LWLock in extensions and using
LWLockRegisterTranche is a non-standard or prohibit way to do that.

By the way, in the case of orafce, it uses LWLockRegisterTranche directly
but only when !found. So if any backend other than the creator of the shmem
want to access tranche, the puch tranche is not found on the process and
crashes. I think this is it.
If no other modules is installed, registeriing a tranche even if found will
supress the crash but it is not a solution at all.

At least for 9.6 or 10, orafce should do that following the documentation.
But it still can crash from the problem by the separate
NamedLWLockTrancheArray. (ID range check in LWLockInitialize would be
useless if it is not used by extensions)

I'll continue considering this next week

regards,

---
Kyotaro Horiguchi

#7Amit Kapila
amit.kapila16@gmail.com
In reply to: Kyotaro HORIGUCHI (#6)
Re: [BUG FIX] Removing NamedLWLockTrancheArray

On Fri, Mar 3, 2017 at 3:49 PM, Kyotaro HORIGUCHI
<horiguchi.kyotaro@lab.ntt.co.jp> wrote:

You can read about usage of LWLocks in extensions from below location:
https://www.postgresql.org/docs/devel/static/xfunc-c.html#idp86986416

Thank you for the pointer. I understand that the document describes the only
correct way to use LWLock in extensions and using LWLockRegisterTranche is
a non-standard or prohibit way to do that.

By the way, in the case of orafce, it uses LWLockRegisterTranche directly
but only when !found. So if any backend other than the creator of the shmem
want to access tranche, the puch tranche is not found on the process and
crashes. I think this is it.

Yeah and I think that is expected if you use LWLockRegisterTranche.
You might want to read comments in lwlock.h to know the reason behind
the same.

If no other modules is installed, registeriing a tranche even if found will
supress the crash but it is not a solution at all.

At least for 9.6 or 10, orafce should do that following the documentation.

Agreed.

But it still can crash from the problem by the separate
NamedLWLockTrancheArray. (ID range check in LWLockInitialize would be
useless if it is not used by extensions)

What exact problem are you referring here?

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#8Kyotaro HORIGUCHI
horiguchi.kyotaro@lab.ntt.co.jp
In reply to: Amit Kapila (#7)
Re: [BUG FIX] Removing NamedLWLockTrancheArray

Hello,

At Sat, 4 Mar 2017 10:07:50 +0530, Amit Kapila <amit.kapila16@gmail.com> wrote in <CAA4eK1J0DcZun00PwSiftvUjpGfD2zq8CYXv9RYtiJPGbraPTw@mail.gmail.com>

On Fri, Mar 3, 2017 at 3:49 PM, Kyotaro HORIGUCHI
<horiguchi.kyotaro@lab.ntt.co.jp> wrote:

You can read about usage of LWLocks in extensions from below location:
https://www.postgresql.org/docs/devel/static/xfunc-c.html#idp86986416

Thank you for the pointer. I understand that the document describes the only
correct way to use LWLock in extensions and using LWLockRegisterTranche is
a non-standard or prohibit way to do that.

By the way, in the case of orafce, it uses LWLockRegisterTranche directly
but only when !found. So if any backend other than the creator of the shmem
want to access tranche, the puch tranche is not found on the process and
crashes. I think this is it.

(I can't recall what the 'the puch' was...)

Yeah and I think that is expected if you use LWLockRegisterTranche.
You might want to read comments in lwlock.h to know the reason behind
the same.

Ah, yes. I understood that. And even if a module prudently
registers its own tranche, it would be easily broken by some
other modules' omission to call LWLockNewTransactionId() for all
backends. As the result extension modules should not go that way.

If no other modules is installed, registeriing a tranche even if found will
supress the crash but it is not a solution at all.

At least for 9.6 or 10, orafce should do that following the documentation.

Agreed.

But it still can crash from the problem by the separate
NamedLWLockTrancheArray. (ID range check in LWLockInitialize would be
useless if it is not used by extensions)

What exact problem are you referring here?

At many places in lwlock.c, especially on TRACE_POSTGRESQ_*()
macros, T_NAME(lock) is used and the macro just uses tranche id
as index of the main tranche array (LWLockTrancheArray). On the
other hand it is beyond the end of of the array for the "named
tranche"s and can raise SEGV.

I can guess two ways to fix this. One is change the definition of
T_NAME.

| #define T_NAME(l) \
| ((l)->tranche < LWTRANCHE_FIRST_USER_DEFINED ? \
| LWLockTrancheArray[(l)->tranche] : \
| NamedLWLockTrancheArray[(l)->tranche - LWTRANCHE_FIRST_USER_DEFINED]

It makes the patch small but I don't thing the shape is
desirable.

Then, the other way is registering named tranches into the main
tranche array. The number and names of the requested named
tranches are known to postmaster so they can be allocated and
initialized at the time.

The mapping of the shared memory is inherited to backends so
pointing to the addresses in shared memory will work in the
!EXEC_BACKEND case. I confirmed that the behavior is ensured also
in EXEC_BACKEND case.

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#9Kyotaro HORIGUCHI
horiguchi.kyotaro@lab.ntt.co.jp
In reply to: Kyotaro HORIGUCHI (#8)
Re: [BUG FIX] Removing NamedLWLockTrancheArray

At Mon, 06 Mar 2017 15:44:52 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> wrote in <20170306.154452.254472341.horiguchi.kyotaro@lab.ntt.co.jp>

Hello,

At Sat, 4 Mar 2017 10:07:50 +0530, Amit Kapila <amit.kapila16@gmail.com> wrote in <CAA4eK1J0DcZun00PwSiftvUjpGfD2zq8CYXv9RYtiJPGbraPTw@mail.gmail.com>

On Fri, Mar 3, 2017 at 3:49 PM, Kyotaro HORIGUCHI
<horiguchi.kyotaro@lab.ntt.co.jp> wrote:

You can read about usage of LWLocks in extensions from below location:
https://www.postgresql.org/docs/devel/static/xfunc-c.html#idp86986416

Thank you for the pointer. I understand that the document describes the only
correct way to use LWLock in extensions and using LWLockRegisterTranche is
a non-standard or prohibit way to do that.

By the way, in the case of orafce, it uses LWLockRegisterTranche directly
but only when !found. So if any backend other than the creator of the shmem
want to access tranche, the puch tranche is not found on the process and
crashes. I think this is it.

(I can't recall what the 'the puch' was...)

Yeah and I think that is expected if you use LWLockRegisterTranche.
You might want to read comments in lwlock.h to know the reason behind
the same.

Ah, yes. I understood that. And even if a module prudently
registers its own tranche, it would be easily broken by some
other modules' omission to call LWLockNewTransactionId() for all
backends. As the result extension modules should not go that way.

It's wrong. The module knows the allocated tranche id so it can
register the tranche properly.

If no other modules is installed, registeriing a tranche even if found will
supress the crash but it is not a solution at all.

At least for 9.6 or 10, orafce should do that following the documentation.

Agreed.

But it still can crash from the problem by the separate
NamedLWLockTrancheArray. (ID range check in LWLockInitialize would be
useless if it is not used by extensions)

What exact problem are you referring here?

At many places in lwlock.c, especially on TRACE_POSTGRESQ_*()
macros, T_NAME(lock) is used and the macro just uses tranche id
as index of the main tranche array (LWLockTrancheArray). On the
other hand it is beyond the end of of the array for the "named
tranche"s and can raise SEGV.

I can guess two ways to fix this. One is change the definition of
T_NAME.

| #define T_NAME(l) \
| ((l)->tranche < LWTRANCHE_FIRST_USER_DEFINED ? \
| LWLockTrancheArray[(l)->tranche] : \
| NamedLWLockTrancheArray[(l)->tranche - LWTRANCHE_FIRST_USER_DEFINED]

It makes the patch small but I don't thing the shape is
desirable.

Then, the other way is registering named tranches into the main
tranche array. The number and names of the requested named
tranches are known to postmaster so they can be allocated and
initialized at the time.

The mapping of the shared memory is inherited to backends so
pointing to the addresses in shared memory will work in the
!EXEC_BACKEND case. I confirmed that the behavior is ensured also
in EXEC_BACKEND case.

--
Kyotaro Horiguchi
NTT Open Source Software Center

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#10Kyotaro HORIGUCHI
horiguchi.kyotaro@lab.ntt.co.jp
In reply to: Kyotaro HORIGUCHI (#9)
1 attachment(s)
Re: [BUG FIX] Removing NamedLWLockTrancheArray

Ok, I think I understand the complete picture.

At Mon, 06 Mar 2017 15:58:56 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> wrote in <20170306.155856.198084190.horiguchi.kyotaro@lab.ntt.co.jp>

I can guess two ways to fix this. One is change the definition of
T_NAME.

| #define T_NAME(l) \
| ((l)->tranche < LWTRANCHE_FIRST_USER_DEFINED ? \
| LWLockTrancheArray[(l)->tranche] : \
| NamedLWLockTrancheArray[(l)->tranche - LWTRANCHE_FIRST_USER_DEFINED]

It makes the patch small but I don't thing the shape is
desirable.

Then, the other way is registering named tranches into the main
tranche array. The number and names of the requested named
tranches are known to postmaster so they can be allocated and
initialized at the time.

The mapping of the shared memory is inherited to backends so
pointing to the addresses in shared memory will work in the
!EXEC_BACKEND case. I confirmed that the behavior is ensured also
in EXEC_BACKEND case.

But this doesn't work for
LWLockNewTrancheId/LWLockRegisterTranche and it is valuable
interface. So the measure we can take is redefining T_NAME.

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachments:

fix_tranche_name_access.patchtext/x-patch; charset=us-asciiDownload
diff --git a/src/backend/storage/lmgr/lwlock.c b/src/backend/storage/lmgr/lwlock.c
index ab81d94..7c4c8f4 100644
--- a/src/backend/storage/lmgr/lwlock.c
+++ b/src/backend/storage/lmgr/lwlock.c
@@ -115,7 +115,9 @@ static char **LWLockTrancheArray = NULL;
 static int	LWLockTranchesAllocated = 0;
 
 #define T_NAME(lock) \
-	(LWLockTrancheArray[(lock)->tranche])
+	((lock)->tranche < LWTRANCHE_FIRST_USER_DEFINED ? \
+	LWLockTrancheArray[(lock)->tranche] : \
+	 NamedLWLockTrancheArray[(lock)->tranche - LWTRANCHE_FIRST_USER_DEFINED].trancheName)
 
 /*
  * This points to the main array of LWLocks in shared memory.  Backends inherit
#11Kyotaro HORIGUCHI
horiguchi.kyotaro@lab.ntt.co.jp
In reply to: Kyotaro HORIGUCHI (#10)
Re: [BUG FIX] Removing NamedLWLockTrancheArray

By the way,

At Mon, 06 Mar 2017 17:07:55 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> wrote in <20170306.170755.68410634.horiguchi.kyotaro@lab.ntt.co.jp>

Ok, I think I understand the complete picture.

At Mon, 06 Mar 2017 15:58:56 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> wrote in <20170306.155856.198084190.horiguchi.kyotaro@lab.ntt.co.jp>

I can guess two ways to fix this. One is change the definition of
T_NAME.

| #define T_NAME(l) \
| ((l)->tranche < LWTRANCHE_FIRST_USER_DEFINED ? \
| LWLockTrancheArray[(l)->tranche] : \
| NamedLWLockTrancheArray[(l)->tranche - LWTRANCHE_FIRST_USER_DEFINED]

It makes the patch small but I don't thing the shape is
desirable.

Then, the other way is registering named tranches into the main
tranche array. The number and names of the requested named
tranches are known to postmaster so they can be allocated and
initialized at the time.

The mapping of the shared memory is inherited to backends so
pointing to the addresses in shared memory will work in the
!EXEC_BACKEND case. I confirmed that the behavior is ensured also
in EXEC_BACKEND case.

But this doesn't work for
LWLockNewTrancheId/LWLockRegisterTranche and it is valuable
interface. So the measure we can take is redefining T_NAME.

I realized that *I* have used the interface
RequestNamedLWLockTranche in certain extensions but I completely
forgot that and faintly recalled after being pointed by one of my
colleagues..

Sorry for the off-topic.

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#12Amit Kapila
amit.kapila16@gmail.com
In reply to: Kyotaro HORIGUCHI (#10)
Re: [BUG FIX] Removing NamedLWLockTrancheArray

On Mon, Mar 6, 2017 at 1:37 PM, Kyotaro HORIGUCHI
<horiguchi.kyotaro@lab.ntt.co.jp> wrote:

Ok, I think I understand the complete picture.

At Mon, 06 Mar 2017 15:58:56 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> wrote in <20170306.155856.198084190.horiguchi.kyotaro@lab.ntt.co.jp>

I can guess two ways to fix this. One is change the definition of
T_NAME.

| #define T_NAME(l) \
| ((l)->tranche < LWTRANCHE_FIRST_USER_DEFINED ? \
| LWLockTrancheArray[(l)->tranche] : \
| NamedLWLockTrancheArray[(l)->tranche - LWTRANCHE_FIRST_USER_DEFINED]

It makes the patch small but I don't thing the shape is
desirable.

Then, the other way is registering named tranches into the main
tranche array. The number and names of the requested named
tranches are known to postmaster so they can be allocated and
initialized at the time.

The mapping of the shared memory is inherited to backends so
pointing to the addresses in shared memory will work in the
!EXEC_BACKEND case. I confirmed that the behavior is ensured also
in EXEC_BACKEND case.

But this doesn't work for
LWLockNewTrancheId/LWLockRegisterTranche and it is valuable
interface. So the measure we can take is redefining T_NAME.

In RegisterLWLockTranches(), we do register the named tranches as well
which should make it available in LWLockTrancheArray. Do you see any
reason why that shouldn't work?

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#13Kyotaro HORIGUCHI
horiguchi.kyotaro@lab.ntt.co.jp
In reply to: Amit Kapila (#12)
Re: [BUG FIX] Removing NamedLWLockTrancheArray

At Mon, 6 Mar 2017 19:32:25 +0530, Amit Kapila <amit.kapila16@gmail.com> wrote in <CAA4eK1LdjtsN3tpB0HJW0vfahd=Hgoq5LRy=eNh1CJZ=0WmCBg@mail.gmail.com>

In RegisterLWLockTranches(), we do register the named tranches as well
which should make it available in LWLockTrancheArray. Do you see any
reason why that shouldn't work?

Ouch! I mingled the two cases
together. RequestNamedLWLockTranche'd tranches are registerd in
the main tranche array.

Sorry for the noise and thanks a lot for kindly answering me.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers