Refactor GetLockStatusData() by skipping unused backends and groups

Started by Fujii Masaoabout 1 year ago5 messages

masao.fujii@oss.nttdata.com

about 1 year ago

1 attachment(s)

Hi,

While reading the fast-path lock code, I noticed that GetLockStatusData()
checks all slots for every backend to gather fast-path lock data. However,
backends with PID=0 don't hold fast-path locks, right? If so, we can
improve efficiency by having GetLockStatusData() skip those backends early.

Additionally, when GetLockStatusData() checks a backend, it currently
goes through all the slots accross its groups. Each group has 16 slots,
so if a backend has 4 groups (this can change depending on max_locks_per_transaction),
that means checking 64 slots. Instead, we could refactor the function
to skip groups without registered fast-path locks, improving performance.
Since each set of 16 slots is packed into a uint32 variable (PGPROC->fpLockBits[i]),
it’s easy to check if a group has any fast-path locks.

I've attached a patch that implements these changes. This refactoring is
especially useful when max_connections and max_locks_per_transaction are
set high, as it reduces unnecessary checks across numerous slots.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

Attachments:

v1-0001-Refactor-GetLockStatusData-to-skip-backends-group.patchtext/plain; charset=UTF-8; name=v1-0001-Refactor-GetLockStatusData-to-skip-backends-group.patchDownload

From 2f4c1f51ccd32981a1bfefe04468e630cb0bd63c Mon Sep 17 00:00:00 2001
From: Fujii Masao <fujii@postgresql.org>
Date: Sun, 20 Oct 2024 14:51:07 +0900
Subject: [PATCH v1] Refactor GetLockStatusData() to skip backends/groups
 without fast-path locks.

Previously, GetLockStatusData() checked all slots for every backend
to gather fast-path lock data, which could be inefficient. This commit
refactors it by skipping backends with PID=0 (since they don't hold
fast-path locks) and skipping groups with no registered fast-path locks,
improving efficiency.

This refactoring is particularly beneficial, for example when
max_connections and max_locks_per_transaction are set high,
as it reduces unnecessary checks across numerous slots.
---
 src/backend/storage/lmgr/lock.c | 67 +++++++++++++++++++--------------
 1 file changed, 39 insertions(+), 28 deletions(-)

diff --git a/src/backend/storage/lmgr/lock.c b/src/backend/storage/lmgr/lock.c
index 09a8ac1578..4ea47f2d99 100644
--- a/src/backend/storage/lmgr/lock.c
+++ b/src/backend/storage/lmgr/lock.c
@@ -3731,44 +3731,55 @@ GetLockStatusData(void)
 	for (i = 0; i < ProcGlobal->allProcCount; ++i)
 	{
 		PGPROC	   *proc = &ProcGlobal->allProcs[i];
-		uint32		f;
+
+		/* Skip backends with pid=0, as they don't hold fast-path locks */
+		if (proc->pid == 0)
+			continue;
 
 		LWLockAcquire(&proc->fpInfoLock, LW_SHARED);
 
-		for (f = 0; f < FP_LOCK_SLOTS_PER_BACKEND; ++f)
+		for (uint32 g = 0; g < FastPathLockGroupsPerBackend; g++)
 		{
-			LockInstanceData *instance;
-			uint32		lockbits = FAST_PATH_GET_BITS(proc, f);
-
-			/* Skip unallocated slots. */
-			if (!lockbits)
+			/* Skip unallocated groups */
+			if (proc->fpLockBits[g] == 0)
 				continue;
 
-			if (el >= els)
+			for (int j = 0; j < FP_LOCK_SLOTS_PER_GROUP; j++)
 			{
-				els += MaxBackends;
-				data->locks = (LockInstanceData *)
-					repalloc(data->locks, sizeof(LockInstanceData) * els);
-			}
+				LockInstanceData *instance;
+				uint32		f = FAST_PATH_SLOT(g, j);
+				uint32		lockbits = FAST_PATH_GET_BITS(proc, f);
 
-			instance = &data->locks[el];
-			SET_LOCKTAG_RELATION(instance->locktag, proc->databaseId,
-								 proc->fpRelId[f]);
-			instance->holdMask = lockbits << FAST_PATH_LOCKNUMBER_OFFSET;
-			instance->waitLockMode = NoLock;
-			instance->vxid.procNumber = proc->vxid.procNumber;
-			instance->vxid.localTransactionId = proc->vxid.lxid;
-			instance->pid = proc->pid;
-			instance->leaderPid = proc->pid;
-			instance->fastpath = true;
+				/* Skip unallocated slots. */
+				if (!lockbits)
+					continue;
 
-			/*
-			 * Successfully taking fast path lock means there were no
-			 * conflicting locks.
-			 */
-			instance->waitStart = 0;
+				if (el >= els)
+				{
+					els += MaxBackends;
+					data->locks = (LockInstanceData *)
+						repalloc(data->locks, sizeof(LockInstanceData) * els);
+				}
 
-			el++;
+				instance = &data->locks[el];
+				SET_LOCKTAG_RELATION(instance->locktag, proc->databaseId,
+									 proc->fpRelId[f]);
+				instance->holdMask = lockbits << FAST_PATH_LOCKNUMBER_OFFSET;
+				instance->waitLockMode = NoLock;
+				instance->vxid.procNumber = proc->vxid.procNumber;
+				instance->vxid.localTransactionId = proc->vxid.lxid;
+				instance->pid = proc->pid;
+				instance->leaderPid = proc->pid;
+				instance->fastpath = true;
+
+				/*
+				 * Successfully taking fast path lock means there were no
+				 * conflicting locks.
+				 */
+				instance->waitStart = 0;
+
+				el++;
+			}
 		}
 
 		if (proc->fpVXIDLock)
-- 
2.46.2

Bertrand Drouvot

bertranddrouvot.pg@gmail.com

about 1 year ago

In reply to: Fujii Masao (#1)

Re: Refactor GetLockStatusData() by skipping unused backends and groups

Hi,

On Mon, Oct 21, 2024 at 09:19:49AM +0900, Fujii Masao wrote:

Hi,

While reading the fast-path lock code, I noticed that GetLockStatusData()
checks all slots for every backend to gather fast-path lock data. However,
backends with PID=0 don't hold fast-path locks, right?

I think the same as those are not a "regular" backend.

If so, we can
improve efficiency by having GetLockStatusData() skip those backends early.

Agree.

Additionally, when GetLockStatusData() checks a backend, it currently
goes through all the slots accross its groups. Each group has 16 slots,
so if a backend has 4 groups (this can change depending on max_locks_per_transaction),
that means checking 64 slots. Instead, we could refactor the function
to skip groups without registered fast-path locks, improving performance.
Since each set of 16 slots is packed into a uint32 variable (PGPROC->fpLockBits[i]),
it’s easy to check if a group has any fast-path locks.

I've attached a patch that implements these changes. This refactoring is
especially useful when max_connections and max_locks_per_transaction are
set high, as it reduces unnecessary checks across numerous slots.

I think that your refactoring proposal makes sense.

A few random comments on it:

1 ===

+               /* Skip backends with pid=0, as they don't hold fast-path locks */
+               if (proc->pid == 0)
+                       continue;

What about adding a few words in the comment that it represents prepared
transactions? Or maybe add a new macro (say IS_PREPARED_TRANSACTION(proc)) and
use it in the few places where we check for "PGPROC"->pid == 0 or "PGPROC"->pid != 0?

2 ===

-               for (f = 0; f < FP_LOCK_SLOTS_PER_BACKEND; ++f)
+               for (uint32 g = 0; g < FastPathLockGroupsPerBackend; g++)
                {

As FP_LOCK_SLOTS_PER_BACKEND = (FP_LOCK_SLOTS_PER_GROUP * FastPathLockGroupsPerBackend)
then the proposed approach starts with a "smaller" loop which makes sense.

and then the patch:

+ for (int j = 0; j < FP_LOCK_SLOTS_PER_GROUP; j++)

So that we are covering "FP_LOCK_SLOTS_PER_BACKEND" but discarded groups that
do not contain registered fast-path locks:

+                       /* Skip unallocated groups */
+                       if (proc->fpLockBits[g] == 0)

That does make sense to me.

One remark about the comment, what about?

s/Skip unallocated groups/Skip groups without registered fast-path locks./?

or at least add a "." at the end to be consistent with:

"/* Skip unallocated slots. */"

3 ===

One thing that worry me a bit is that we "lost" the FP_LOCK_SLOTS_PER_BACKEND usage,
so that if there is a change on it (for wathever reason) then we probably need to
be careful that the change would be reflected here too.

So, what about to add an Assert to check that we overall iterated over FP_LOCK_SLOTS_PER_BACKEND?

4 ===

Then the patch does move existing code around and just add a call to FAST_PATH_SLOT()
to get fast-path lock slot index based on the group and slot indexes we are iterating
on.

That does make sense to me.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Fujii Masao

masao.fujii@oss.nttdata.com

about 1 year ago

In reply to: Bertrand Drouvot (#2)

1 attachment(s)

Re: Refactor GetLockStatusData() by skipping unused backends and groups

On 2024/10/21 16:32, Bertrand Drouvot wrote:

A few random comments on it:

Thanks for the review!

1 ===
+               /* Skip backends with pid=0, as they don't hold fast-path locks */
+               if (proc->pid == 0)
+                       continue;
What about adding a few words in the comment that it represents prepared
transactions? Or maybe add a new macro (say IS_PREPARED_TRANSACTION(proc)) and
use it in the few places where we check for "PGPROC"->pid == 0 or "PGPROC"->pid != 0?

I understand that PGPROC entries with pid=0 are typically those not yet allocated to
any backends. Yes, as you mentioned, prepared transactions also have pid=0. However,
GetLockStatusData() loops up to ProcGlobal->allProcCount, which is MaxBackends plus
NUM_AUXILIARY_PROCS, excluding prepared transactions. Therefore, GetLockStatusData()
doesn't seem to check PGPROC entries for prepared transactions at all.

In proc.c
--------------
/* XXX allProcCount isn't really all of them; it excludes prepared xacts */
ProcGlobal->allProcCount = MaxBackends + NUM_AUXILIARY_PROCS;
--------------

One remark about the comment, what about?

s/Skip unallocated groups/Skip groups without registered fast-path locks./?

I've updated the source comment accordingly.

or at least add a "." at the end to be consistent with:

"/* Skip unallocated slots. */"

I removed the period at the end to match the usual convention in the codebase
for single-line comment.

I've attached v2 patch.

3 ===

One thing that worry me a bit is that we "lost" the FP_LOCK_SLOTS_PER_BACKEND usage,
so that if there is a change on it (for wathever reason) then we probably need to
be careful that the change would be reflected here too.

So, what about to add an Assert to check that we overall iterated over FP_LOCK_SLOTS_PER_BACKEND?

You mean adding an assertion check to ensure that the slot ID calculated by
FAST_PATH_SLOT() is less than FP_LOCK_SLOTS_PER_BACKEND? But GetLockStatusData()
already calls FAST_PATH_GET_BITS() right after FAST_PATH_SLOT(),
and FAST_PATH_GET_BITS() has an assertion that validates this. So, probably
we can consider that this check is already in place. If it’s still worth adding,
perhaps placing it inside the FAST_PATH_SLOT() macro could be an option...
Or current assertion check is enough? Thought?

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

Attachments:

v2-0001-Refactor-GetLockStatusData-to-skip-backends-group.patchtext/plain; charset=UTF-8; name=v2-0001-Refactor-GetLockStatusData-to-skip-backends-group.patchDownload

From b55620cf73d12ef95dc21f3b42afaa264500612d Mon Sep 17 00:00:00 2001
From: Fujii Masao <fujii@postgresql.org>
Date: Sun, 20 Oct 2024 14:51:07 +0900
Subject: [PATCH v2] Refactor GetLockStatusData() to skip backends/groups
 without fast-path locks.

Previously, GetLockStatusData() checked all slots for every backend
to gather fast-path lock data, which could be inefficient. This commit
refactors it by skipping backends with PID=0 (since they don't hold
fast-path locks) and skipping groups with no registered fast-path locks,
improving efficiency.

This refactoring is particularly beneficial, for example when
max_connections and max_locks_per_transaction are set high,
as it reduces unnecessary checks across numerous slots.
---
 src/backend/storage/lmgr/lock.c | 67 +++++++++++++++++++--------------
 1 file changed, 39 insertions(+), 28 deletions(-)

diff --git a/src/backend/storage/lmgr/lock.c b/src/backend/storage/lmgr/lock.c
index 09a8ac1578..4fccb7277e 100644
--- a/src/backend/storage/lmgr/lock.c
+++ b/src/backend/storage/lmgr/lock.c
@@ -3731,44 +3731,55 @@ GetLockStatusData(void)
 	for (i = 0; i < ProcGlobal->allProcCount; ++i)
 	{
 		PGPROC	   *proc = &ProcGlobal->allProcs[i];
-		uint32		f;
+
+		/* Skip backends with pid=0, as they don't hold fast-path locks */
+		if (proc->pid == 0)
+			continue;
 
 		LWLockAcquire(&proc->fpInfoLock, LW_SHARED);
 
-		for (f = 0; f < FP_LOCK_SLOTS_PER_BACKEND; ++f)
+		for (uint32 g = 0; g < FastPathLockGroupsPerBackend; g++)
 		{
-			LockInstanceData *instance;
-			uint32		lockbits = FAST_PATH_GET_BITS(proc, f);
-
-			/* Skip unallocated slots. */
-			if (!lockbits)
+			/* Skip groups without registered fast-path locks */
+			if (proc->fpLockBits[g] == 0)
 				continue;
 
-			if (el >= els)
+			for (int j = 0; j < FP_LOCK_SLOTS_PER_GROUP; j++)
 			{
-				els += MaxBackends;
-				data->locks = (LockInstanceData *)
-					repalloc(data->locks, sizeof(LockInstanceData) * els);
-			}
+				LockInstanceData *instance;
+				uint32		f = FAST_PATH_SLOT(g, j);
+				uint32		lockbits = FAST_PATH_GET_BITS(proc, f);
 
-			instance = &data->locks[el];
-			SET_LOCKTAG_RELATION(instance->locktag, proc->databaseId,
-								 proc->fpRelId[f]);
-			instance->holdMask = lockbits << FAST_PATH_LOCKNUMBER_OFFSET;
-			instance->waitLockMode = NoLock;
-			instance->vxid.procNumber = proc->vxid.procNumber;
-			instance->vxid.localTransactionId = proc->vxid.lxid;
-			instance->pid = proc->pid;
-			instance->leaderPid = proc->pid;
-			instance->fastpath = true;
+				/* Skip unallocated slots */
+				if (!lockbits)
+					continue;
 
-			/*
-			 * Successfully taking fast path lock means there were no
-			 * conflicting locks.
-			 */
-			instance->waitStart = 0;
+				if (el >= els)
+				{
+					els += MaxBackends;
+					data->locks = (LockInstanceData *)
+						repalloc(data->locks, sizeof(LockInstanceData) * els);
+				}
 
-			el++;
+				instance = &data->locks[el];
+				SET_LOCKTAG_RELATION(instance->locktag, proc->databaseId,
+									 proc->fpRelId[f]);
+				instance->holdMask = lockbits << FAST_PATH_LOCKNUMBER_OFFSET;
+				instance->waitLockMode = NoLock;
+				instance->vxid.procNumber = proc->vxid.procNumber;
+				instance->vxid.localTransactionId = proc->vxid.lxid;
+				instance->pid = proc->pid;
+				instance->leaderPid = proc->pid;
+				instance->fastpath = true;
+
+				/*
+				 * Successfully taking fast path lock means there were no
+				 * conflicting locks.
+				 */
+				instance->waitStart = 0;
+
+				el++;
+			}
 		}
 
 		if (proc->fpVXIDLock)
-- 
2.46.2

Bertrand Drouvot

bertranddrouvot.pg@gmail.com

about 1 year ago

In reply to: Fujii Masao (#3)

Re: Refactor GetLockStatusData() by skipping unused backends and groups

Hi,

On Wed, Oct 23, 2024 at 01:19:37AM +0900, Fujii Masao wrote:

I understand that PGPROC entries with pid=0 are typically those not yet allocated to
any backends. Yes, as you mentioned, prepared transactions also have pid=0. However,
GetLockStatusData() loops up to ProcGlobal->allProcCount, which is MaxBackends plus
NUM_AUXILIARY_PROCS, excluding prepared transactions. Therefore, GetLockStatusData()
doesn't seem to check PGPROC entries for prepared transactions at all.

In proc.c
--------------
/* XXX allProcCount isn't really all of them; it excludes prepared xacts */
ProcGlobal->allProcCount = MaxBackends + NUM_AUXILIARY_PROCS;
--------------

Oh right, thanks for pointing out!

I removed the period at the end to match the usual convention in the codebase
for single-line comment.

I've attached v2 patch.

Thanks for the new version!

You mean adding an assertion check to ensure that the slot ID calculated by
FAST_PATH_SLOT() is less than FP_LOCK_SLOTS_PER_BACKEND?

Yes.

But GetLockStatusData()
already calls FAST_PATH_GET_BITS() right after FAST_PATH_SLOT(),
and FAST_PATH_GET_BITS() has an assertion that validates this.

Oh right, it's "already" in FAST_PATH_GROUP() (and FAST_PATH_INDEX()).

So, probably
we can consider that this check is already in place. If it’s still worth adding,
perhaps placing it inside the FAST_PATH_SLOT() macro could be an option...
Or current assertion check is enough? Thought?

Given that it's already done in FAST_PATH_GET_BITS(), I think that's fine as it
is and v2 LGTM.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Fujii Masao

masao.fujii@oss.nttdata.com

about 1 year ago

In reply to: Bertrand Drouvot (#4)

Re: Refactor GetLockStatusData() by skipping unused backends and groups

On 2024/10/24 11:12, Bertrand Drouvot wrote:

So, probably
we can consider that this check is already in place. If it’s still worth adding,
perhaps placing it inside the FAST_PATH_SLOT() macro could be an option...
Or current assertion check is enough? Thought?

Given that it's already done in FAST_PATH_GET_BITS(), I think that's fine as it
is and v2 LGTM.

Thanks for the review! Pushed.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION