Add “FOR UPDATE NOWAIT” lock details to the log.

Started by Seino Yukiover 1 year ago38 messageshackers

seinoyu@oss.nttdata.com

over 1 year ago

Hello,

I would like to add the information of the PID that caused the failure
when acquiring a lock with "FOR UPDATE NOWAIT".

When "FOR UPDATE" is executed and interrupted by lock_timeout,
xid and PID are output in the logs, but in the case of "FOR UPDATE
NOWAIT",
no information is output, making it impossible to identify the cause of
the lock failure.
Therefore, I would like to output information in the logs in the same
way as
when "FOR UPDATE" is executed and interrupted by lock_timeout.

The patch is attached as well.

Regards,
--
Yuki Seino
NTT DATA CORPORATION

Fujii Masao

masao.fujii@gmail.com

over 1 year ago

In reply to: Seino Yuki (#1)

Re: Add “FOR UPDATE NOWAIT” lock details to the log.

On 2024/09/13 20:49, Seino Yuki wrote:

Hello,

I would like to add the information of the PID that caused the failure
when acquiring a lock with "FOR UPDATE NOWAIT".

When "FOR UPDATE" is executed and interrupted by lock_timeout,
xid and PID are output in the logs,

Could you explain how to reproduce this? In my quick test,
lock waits canceled by lock_timeout didn’t report either xid or PID,
so I might be missing something.

but in the case of "FOR UPDATE NOWAIT",
no information is output, making it impossible to identify the cause of the lock failure.
Therefore, I would like to output information in the logs in the same way as
when "FOR UPDATE" is executed and interrupted by lock_timeout.

I think NOWAIT is often used for lock waits that can fail frequently.
Logging detailed information for each failed NOWAIT lock wait could
lead to excessive and potentially noisy log messages for some users.

On the other hand, I understand that even with NOWAIT, we might want
to investigate why a lock wait failed. In such cases,
detailed information about the locking process would be useful.

Considering both points, I’m leaning toward adding a new GUC parameter
to control whether detailed logs should be generated for failed
NOWAIT locks (and possibly for those canceled by lock_timeout).
Alternatively, if adding a new GUC is not ideal, we could extend
log_lock_waits to accept a new value like 'error,' which would trigger
detailed logging when a lock wait fails due to NOWAIT, lock_timeout,
or cancellation.

The patch is attached as well.

I got the following compiler errors;

proc.c:1240:3: error: call to undeclared function 'CollectLockHoldersAndWaiters'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
1240 | CollectLockHoldersAndWaiters(proclock, lock, &lock_holders_sbuf, &lock_waiters_sbuf, &lockHoldersNum);
| ^
proc.c:1549:4: error: call to undeclared function 'CollectLockHoldersAndWaiters'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
1549 | CollectLockHoldersAndWaiters(NULL, lock, &lock_holders_sbuf, &lock_waiters_sbuf, &lockHoldersNum);
| ^
proc.c:1126:8: warning: unused variable 'first_holder' [-Wunused-variable]
1126 | bool first_holder = true,
| ^~~~~~~~~~~~
proc.c:1127:5: warning: unused variable 'first_waiter' [-Wunused-variable]
1127 | first_waiter = true;
| ^~~~~~~~~~~~
proc.c:1982:1: error: conflicting types for 'CollectLockHoldersAndWaiters'
1982 | CollectLockHoldersAndWaiters(PROCLOCK *waitProcLock, LOCK *lock, StringInfo lock_holders_sbuf, StringInfo lock_waiters_sbuf, int *lockHoldersNum)
| ^
proc.c:1240:3: note: previous implicit declaration is here
1240 | CollectLockHoldersAndWaiters(proclock, lock, &lock_holders_sbuf, &lock_waiters_sbuf, &lockHoldersNum);
| ^
2 warnings and 3 errors generated.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

Seino Yuki

seinoyu@oss.nttdata.com

over 1 year ago

In reply to: Fujii Masao (#2)

Re: Add “FOR UPDATE NOWAIT” lock details to the log.

Thank you for Review.

Could you explain how to reproduce this? In my quick test,
lock waits canceled by lock_timeout didn’t report either xid or PID,
so I might be missing something.

It seems to be outputting on my end, how about you?
===== Console =====
postgres=# SHOW log_lock_waits;
log_lock_waits
----------------
on
(1 row)

postgres=# SHOW lock_timeout;
lock_timeout
--------------
2s
(1 row)

(tx1) postgres=# BEGIN;
BEGIN
postgres=*# SELECT aid FROM pgbench_accounts WHERE aid = 1 FOR
UPDATE;
aid
-----
1
(1 row)
(tx2) postgres=# SELECT aid FROM pgbench_accounts WHERE aid = 1 FOR
UPDATE;
ERROR: canceling statement due to lock timeout
CONTEXT: while locking tuple (0,1) in relation "pgbench_accounts"

===== Log Output =====
# lock start
2024-10-17 17:49:58.157 JST [1443346] LOG: process 1443346 still
waiting for ShareLock on transaction 770 after 1000.096 ms
2024-10-17 17:49:58.157 JST [1443346] DETAIL: Process holding the lock:
1443241. Wait queue: 1443346.
2024-10-17 17:49:58.157 JST [1443346] CONTEXT: while locking tuple
(0,1) in relation "pgbench_accounts"
2024-10-17 17:49:58.157 JST [1443346] STATEMENT: SELECT * FROM
pgbench_accounts WHERE aid = 1 FOR UPDATE;
# lock timeout
2024-10-17 17:49:59.157 JST [1443346] ERROR: canceling statement due to
lock timeout
2024-10-17 17:49:59.157 JST [1443346] CONTEXT: while locking tuple
(0,1) in relation "pgbench_accounts"
2024-10-17 17:49:59.157 JST [1443346] STATEMENT: SELECT * FROM
pgbench_accounts WHERE aid = 1 FOR UPDATE;

I think NOWAIT is often used for lock waits that can fail frequently.
Logging detailed information for each failed NOWAIT lock wait could
lead to excessive and potentially noisy log messages for some users.

On the other hand, I understand that even with NOWAIT, we might want
to investigate why a lock wait failed. In such cases,
detailed information about the locking process would be useful.

Considering both points, I’m leaning toward adding a new GUC parameter
to control whether detailed logs should be generated for failed
NOWAIT locks (and possibly for those canceled by lock_timeout).
Alternatively, if adding a new GUC is not ideal, we could extend
log_lock_waits to accept a new value like 'error,' which would trigger
detailed logging when a lock wait fails due to NOWAIT, lock_timeout,
or cancellation.

That's a good idea. I'll try that.

I got the following compiler errors;

thank you. I missed it.

Regards,
--
Yuki Seino
NTT DATA CORPORATION

Seino Yuki

seinoyu@oss.nttdata.com

over 1 year ago

In reply to: Seino Yuki (#3)

Re: Add “FOR UPDATE NOWAIT” lock details to the log.

Considering both points, I’m leaning toward adding a new GUC parameter
to control whether detailed logs should be generated for failed
NOWAIT locks (and possibly for those canceled by lock_timeout).
Alternatively, if adding a new GUC is not ideal, we could extend
log_lock_waits to accept a new value like 'error,' which would trigger
detailed logging when a lock wait fails due to NOWAIT, lock_timeout,
or cancellation.

That's a good idea. I'll try that.

Sorry, I misunderstood.
The pid and xid are output during deadlock_timeout.

LOG: process 1443346 still waiting for ShareLock on transaction 770
after 1000.096 ms
DETAIL: Process holding the lock: 1443241. Wait queue: 1443346.

This log output is not triggered by lock_timeout or cancellation.
Therefore, it is difficult to support lock_timeout and cancellation.

I am thinking of supporting only NOWAIT. What do you think?

Regards,
--
Yuki Seino
NTT DATA CORPORATION

Fujii Masao

masao.fujii@gmail.com

over 1 year ago

In reply to: Seino Yuki (#4)

Re: Add “FOR UPDATE NOWAIT” lock details to the log.

On 2024/10/17 22:15, Seino Yuki wrote:

Considering both points, I’m leaning toward adding a new GUC parameter
to control whether detailed logs should be generated for failed
NOWAIT locks (and possibly for those canceled by lock_timeout).
Alternatively, if adding a new GUC is not ideal, we could extend
log_lock_waits to accept a new value like 'error,' which would trigger
detailed logging when a lock wait fails due to NOWAIT, lock_timeout,
or cancellation.

That's a good idea. I'll try that.

Sorry, I misunderstood.
The pid and xid are output during deadlock_timeout.

LOG: process 1443346 still waiting for ShareLock on transaction 770 after 1000.096 ms
DETAIL: Process holding the lock: 1443241. Wait queue: 1443346.

This log output is not triggered by lock_timeout or cancellation.

Yes, these log messages with PID or XID are generated by log_lock_waits, not lock_timeout.

Therefore, it is difficult to support lock_timeout and cancellation.

I am thinking of supporting only NOWAIT. What do you think?

I'm not sure why it's challenging to provide detailed log messages for lock waits canceled
by lock_timeout or user cancellation, while it's considered feasible for the NOWAIT case.
However, I think it's reasonable to implement this feature step by step. We can start
by adding support for the NOWAIT case and consider extending it to handle lock_timeout and
cancellation scenarios later if possible.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

Seino Yuki

seinoyu@oss.nttdata.com

over 1 year ago

In reply to: Fujii Masao (#5)

Re: Add “FOR UPDATE NOWAIT” lock details to the log.

Currently, we are discussing two improvements:

1. Log output when NOWAIT fails.
2. Adding control via GUC parameters (NOWAIT, lock_timeout,
cancellation).

I'm not sure why it's challenging to provide detailed log messages for
lock waits canceled
by lock_timeout or user cancellation, while it's considered feasible
for the NOWAIT case.

Does this statement mean that for 2, why can NOWAIT but
lock_timeout,cancellation cannot?

For item 2, the lock_timeout and cancellation will log outputs after the
deadlock_timeout(e.g. 1s) has elapsed (by log_lock_waits).
At the time this log is output, it is unclear whether the lock will be
cancellation or lock_timeout.
This means that the timing at "error is determined" and "output logged"
do not match.

However, I think it's reasonable to implement this feature step by
step. We can start
by adding support for the NOWAIT case and consider extending it to
handle lock_timeout and
cancellation scenarios later if possible.

+1.
I will send the version with the GUC parameter added from the previous
patch.

Regards,
--
Yuki Seino
NTT DATA CORPORATION

Seino Yuki

seinoyu@oss.nttdata.com

over 1 year ago

In reply to: Seino Yuki (#6)

Re: Add “FOR UPDATE NOWAIT” lock details to the log.

I will send the version with the GUC parameter added from the previous
patch.

I made some minor code refactoring.

Regards,
--
Yuki Seino
NTT DATA CORPORATION

Fujii Masao

masao.fujii@gmail.com

over 1 year ago

In reply to: Seino Yuki (#6)

Re: Add “FOR UPDATE NOWAIT” lock details to the log.

On 2024/10/18 9:37, Seino Yuki wrote:

Currently, we are discussing two improvements:

1. Log output when NOWAIT fails.
2. Adding control via GUC parameters (NOWAIT, lock_timeout, cancellation).

I'm not sure why it's challenging to provide detailed log messages for lock waits canceled
by lock_timeout or user cancellation, while it's considered feasible for the NOWAIT case.

Does this statement mean that for 2, why can NOWAIT but lock_timeout,cancellation cannot?

For item 2, the lock_timeout and cancellation will log outputs after the deadlock_timeout(e.g. 1s) has elapsed (by log_lock_waits).

This log message isn't directly related to lock_timeout or cancellations. It appears
if log_lock_waits is enabled and the lock wait time exceeds deadlock_timeout,
regardless of whether lock_timeout is set or if users cancel the operation.

I understood your original proposal as suggesting detailed log output whenever
a NOWAIT lock attempt fails. However, as I mentioned earlier, NOWAIT failures can
happen frequently, so logging every failure in detail could be too noisy for some users.
That’s why I proposed adding a new GUC parameter (or extending log_lock_waits)
to control whether such detailed logs should be generated for NOWAIT failures.

During our discussion, I also thought it would be useful to provide detailed logs
for lock waits canceled by user actions or lock_timeout. This behavior could be
controlled by a similar GUC parameter. However, this is an additional feature
beyond your original proposal, so I think it's fine to leave it out of the initial patch.

At the time this log is output, it is unclear whether the lock will be cancellation or lock_timeout.
This means that the timing at "error is determined" and "output logged" do not match.

However, I think it's reasonable to implement this feature step by step. We can start
by adding support for the NOWAIT case and consider extending it to handle lock_timeout and
cancellation scenarios later if possible.

+1.
I will send the version with the GUC parameter added from the previous patch.

The choice between adding a new GUC or extending the existing one (e.g., log_lock_waits)
is debatable, but I prefer the latter. I'm considering extending log_lock_waits
to accept a value like "fail". If set to "on" (the current behavior),
detailed logs are generated when the lock wait time exceeds deadlock_timeout.
If set to "fail", logs are generated whenever a lock wait fails. If both are
specified, logs would be triggered when the wait time exceeds deadlock_timeout or
when a lock wait fails.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

Seino Yuki

seinoyu@oss.nttdata.com

over 1 year ago

In reply to: Fujii Masao (#8)

Re: Add “FOR UPDATE NOWAIT” lock details to the log.

During our discussion, I also thought it would be useful to provide detailed logs

for lock waits canceled by user actions or lock_timeout.

Thank you. I understand now.
You want to output the logs based on a different trigger than
log_lock_waits.

At first, I found that strange, but I see now that there are cases where
it's necessary to output logs for errors even when log_lock_waits is
disabled.

I have also understood the proposal for improving the GUC parameters.
I will revise the patch.

Regards,
--
Yuki Seino
NTT DATA CORPORATION

#10

Seino Yuki

seinoyu@oss.nttdata.com

over 1 year ago

In reply to: Seino Yuki (#9)

Re: Add “FOR UPDATE NOWAIT” lock details to the log.

The choice between adding a new GUC or extending the existing one
(e.g., log_lock_waits)
is debatable, but I prefer the latter. I'm considering extending
log_lock_waits
to accept a value like "fail". If set to "on" (the current behavior),
detailed logs are generated when the lock wait time exceeds
deadlock_timeout.
If set to "fail", logs are generated whenever a lock wait fails. If
both are
specified, logs would be triggered when the wait time exceeds
deadlock_timeout or
when a lock wait fails.

Thanks for the idea.
Changed log_lock_waits to an enum type and added fail and all.
"off" : No log message(default).
"on" : If over deadlock_timeout(the current behavior).
"fail" : If lock failed.
"all" : All pettern.

I struggled with the name "on," but decided to leave it by
compatibility.

#11

Fujii Masao

masao.fujii@gmail.com

over 1 year ago

In reply to: Seino Yuki (#10)

Re: Add “FOR UPDATE NOWAIT” lock details to the log.

On 2024/10/18 19:07, Seino Yuki wrote:

The choice between adding a new GUC or extending the existing one
(e.g., log_lock_waits)
is debatable, but I prefer the latter. I'm considering extending log_lock_waits
to accept a value like "fail". If set to "on" (the current behavior),
detailed logs are generated when the lock wait time exceeds deadlock_timeout.
If set to "fail", logs are generated whenever a lock wait fails. If both are
specified, logs would be triggered when the wait time exceeds
deadlock_timeout or
when a lock wait fails.

Thanks for the idea.
Changed log_lock_waits to an enum type and added fail and all.
"off" : No log message(default).
"on" : If over deadlock_timeout(the current behavior).
"fail" : If lock failed.
"all" : All pettern.

I'm still thinking about how we should handle logging for lock failures
caused by the nowait option. Extending the existing log_lock_waits parameter
has the advantage of avoiding a new GUC, but it might make configuration
more complicated. I'm starting to think adding a new GUC might be a better option.

Regarding the patch, when I applied it to HEAD, it failed to compile with
the following errors. Could you update the patch to address this?

proc.c:1538:20: error: use of undeclared identifier 'buf'
1538 | initStringInfo(&buf);
| ^
proc.c:1539:20: error: use of undeclared identifier 'lock_waiters_sbuf'
1539 | initStringInfo(&lock_waiters_sbuf);
| ^
proc.c:1540:20: error: use of undeclared identifier 'lock_holders_sbuf'
1540 | initStringInfo(&lock_holders_sbuf);
| ^
....

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

#12

Seino Yuki

seinoyu@oss.nttdata.com

over 1 year ago

In reply to: Fujii Masao (#11)

Re: Add “FOR UPDATE NOWAIT” lock details to the log.

ing a new GUC or extending the existing one
(e.g., log_lock_waits)
is debatable, but I prefer the latter. I'm considering extending
log_lock_waits
to accept a value like "fail". If set to "on" (the current behavior),
detailed logs are generated when the lock wait time exceeds
deadlock_timeout.
If set to "fail", logs are generated whenever a lock wait fails. If
both are
specified, logs would be triggered when the wait time exceeds
deadlock_timeout or
when a lock wait fails.

Thanks for the idea.
Changed log_lock_waits to an enum type and added fail and all.
"off" : No log message(default).
"on" : If over deadlock_timeout(the current behavior).
"fail" : If lock failed.
"all" : All pettern.

I'm still thinking about how we should handle logging for lock failures
caused by the nowait option. Extending the existing log_lock_waits
parameter
has the advantage of avoiding a new GUC, but it might make configuration
more complicated. I'm starting to think adding a new GUC might be a
better option.

Yes. It’s still simple for now, but reusing an existing GUC could
complicate future expansions.

I will design a new GUC while ensuring consistency with 'log_lock_waits'.

Regarding the patch, when I applied it to HEAD, it failed to compile with
the following errors. Could you update the patch to address this?

proc.c:1538:20: error: use of undeclared identifier 'buf'
1538 |                         initStringInfo(&buf);
      |                                         ^
proc.c:1539:20: error: use of undeclared identifier 'lock_waiters_sbuf'
1539 | initStringInfo(&lock_waiters_sbuf);
      |                                         ^
proc.c:1540:20: error: use of undeclared identifier 'lock_holders_sbuf'
1540 | initStringInfo(&lock_holders_sbuf);
      |                                         ^
....

Conflicted with another commit [1]https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=3c0fd64fec8ed6fa3987c33f076fcffbc3f268c3. I'll rebase it later.

[1]: https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=3c0fd64fec8ed6fa3987c33f076fcffbc3f268c3
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=3c0fd64fec8ed6fa3987c33f076fcffbc3f268c3

#13

Seino Yuki

seinoyu@oss.nttdata.com

over 1 year ago

In reply to: Seino Yuki (#12)

Re: Add “FOR UPDATE NOWAIT” lock details to the log.

I will design a new GUC while ensuring consistency with 'log_lock_waits'.

Regarding the patch, when I applied it to HEAD, it failed to compile
with
the following errors. Could you update the patch to address this?

proc.c:1538:20: error: use of undeclared identifier 'buf'
1538 |                         initStringInfo(&buf);
      |                                         ^
proc.c:1539:20: error: use of undeclared identifier 'lock_waiters_sbuf'
1539 | initStringInfo(&lock_waiters_sbuf);
      |                                         ^
proc.c:1540:20: error: use of undeclared identifier 'lock_holders_sbuf'
1540 | initStringInfo(&lock_holders_sbuf);
      |                                         ^
....

Conflicted with another commit [1]. I'll rebase it later.

[1]
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=3c0fd64fec8ed6fa3987c33f076fcffbc3f268c3

Rebased and added new GUC log_lock_failure and minor fixes. Currently
only NOWAIT errors are supported.
I would like to revisit the possibility of extending this GUC to include
client cancellations and lock timeouts at another opportunity.

--
Regards,
Yuki Seino
NTT DATA GROUP CORPORATION

#14

Fujii Masao

masao.fujii@gmail.com

over 1 year ago

In reply to: Seino Yuki (#13)

Re: Add “FOR UPDATE NOWAIT” lock details to the log.

On 2024/12/19 17:21, Yuki Seino wrote:

[1] https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=3c0fd64fec8ed6fa3987c33f076fcffbc3f268c3

Rebased and added new GUC log_lock_failure and minor fixes. Currently only NOWAIT errors are supported.

Thanks for updating the patch!

+        Currently, only lock failures due to <literal>NOWAIT</literal> are
+        supported.  The default is <literal>off</literal>.  Only superusers

This GUC also affects SKIP LOCKED, so that should be documented.

Regarding the GUC name log_lock_failure, I'm not sure it's the best fit,
but I don't have a better suggestion at the moment. I'll keep considering alternatives.

If CollectLockHoldersAndWaiters() is used only in proc.c,
it should be made a static function.

+/*
+ * we loop over the lock's procLocks to gather a list of all
+ * holders and waiters. Thus we will be able to provide more
+ * detailed information for lock debugging purposes.

For the source comment of CollectLockHoldersAndWaiters(),
could you follow the style of other functions in proc.c?

Why is the logic to report lock holders and waiters implemented within JoinWaitQueue()?
Wouldn't it be better placed right after JoinWaitQueue() is called in
LockAcquireExtended(), similar to DeadLockReport()? One issue with
the current implementation is that partitionLock is held in exclusive mode
while collecting and reporting lock holders and waiters. If the logic is moved
after JoinWaitQueue() in LockAcquireExtended(), lock holders and waiters could be
processed after releasing partitionLock. Note, however, that partitionLock might
still need to be taken again in shared mode during the collection.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

#15

Seino Yuki

seinoyu@oss.nttdata.com

about 1 year ago

In reply to: Fujii Masao (#14)

Re: Add “FOR UPDATE NOWAIT” lock details to the log.

Thank you for your comments.

+ Currently, only lock failures due to <literal>NOWAIT</literal> are
+        supported.  The default is <literal>off</literal>.  Only 
superusers

This GUC also affects SKIP LOCKED, so that should be documented.

I overlooked it... I have revised the document with SKIP LOCKED.

If CollectLockHoldersAndWaiters() is used only in proc.c,
it should be made a static function.
+/*
+ * we loop over the lock's procLocks to gather a list of all
+ * holders and waiters. Thus we will be able to provide more
+ * detailed information for lock debugging purposes.
For the source comment of CollectLockHoldersAndWaiters(),
could you follow the style of other functions in proc.c?

Why is the logic to report lock holders and waiters implemented within
JoinWaitQueue()?
Wouldn't it be better placed right after JoinWaitQueue() is called in
LockAcquireExtended(), similar to DeadLockReport()? One issue with
the current implementation is that partitionLock is held in exclusive
mode
while collecting and reporting lock holders and waiters. If the logic
is moved
after JoinWaitQueue() in LockAcquireExtended(), lock holders and
waiters could be
processed after releasing partitionLock. Note, however, that
partitionLock might
still need to be taken again in shared mode during the collection.

Indeed.

I moved the logic to report lock holders and waiters after JoinWaitQueue().

Regards,

#16

Fujii Masao

masao.fujii@gmail.com

about 1 year ago

In reply to: Seino Yuki (#15)

Re: Add “FOR UPDATE NOWAIT” lock details to the log.

On 2025/01/17 18:29, Yuki Seino wrote:

Thank you for your comments.
+ Currently, only lock failures due to <literal>NOWAIT</literal> are
+        supported.  The default is <literal>off</literal>.  Only superusers
This GUC also affects SKIP LOCKED, so that should be documented.
I overlooked it... I have revised the document with SKIP LOCKED.

SELECT FOR UPDATE SKIP LOCKED might skip a large number of rows, leading to
a high volume of log messages from log_lock_failure in your current patch.
Could this be considered unintended behavior? Would it be better to limit
the number of log messages per query?

I moved the logic to report lock holders and waiters after JoinWaitQueue().

Thanks for updating the patch!

+/*
+ * CollectLockHoldersAndWaiters

Why was this function moved to lmgr.c? Wouldn't it make more sense to
keep it in proc.c, as the original code was located there?

+ * we loop over the lock's procLocks to gather a list of all
+ * holders and waiters. Thus we will be able to provide more
+ * detailed information for lock debugging purposes.
+ *
+ * lock->procLocks contains all processes which hold or wait for
+ * this lock.

Since this seems more like an internal comment, it would be better
placed directly above the dlist_foreach(proc_iter, &lock->procLocks) loop.
For the function header, consider a description about interface, such as:

--------------
CollectLockHoldersAndWaiters -- gather details about lock holders and waiters

The lock table's partition lock must be held on entry and remains held on exit.

Fill lock_holders_sbuf and lock_waiters_sbuf with the PIDs of processes holding
and waiting for the lock, and set lockHoldersNum to the number of lock holders.
--------------

To ensure correctness, it would be better to assert that the partition lock
is held on entry. For example, you could add the following assertion in
CollectLockHoldersAndWaiters(). If so, the argument "LOCK *lock" should be
changed to "LOCALLOCK *locallock".

--------------
#ifdef USE_ASSERT_CHECKING
{
uint32 hashcode = locallock->hashcode;
LWLock *partitionLock = LockHashPartitionLock(hashcode);

Assert(!LWLockHeldByMe(partitionLock));
}
#endif
--------------

+ dlist_foreach(proc_iter, &lock->procLocks)

lockHoldersNum should be initialized to zero before the loop to handle cases
where the caller forgets to do so.

+/* GUC variables. */
+int			max_locks_per_xact; /* This configuration variable is used to set the lock table size */
+bool		log_lock_failure = false;

IMO the declaration of log_lock_failure would be more intuitive if placed
immediately after log_lock_waits in proc.c.

+ int lockHoldersNum = 0;
+ initStringInfo(&buf);

Please add a line break before initStringInfo(&buf) for better readability.

+				LWLockAcquire(partitionLock, LW_SHARED);
+				DescribeLockTag(&buf, &locallock->tag.lock);
+				modename = GetLockmodeName(locallock->tag.lock.locktag_lockmethodid,
+											lockmode);

The partition lock is unnecessary for DescribeLockTag() and GetLockmodeName().
It should only be acquired right before calling CollectLockHoldersAndWaiters()
and released immediately afterward.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

#17

Seino Yuki

seinoyu@oss.nttdata.com

about 1 year ago

In reply to: Fujii Masao (#16)

Re: Add “FOR UPDATE NOWAIT” lock details to the log.

Thank you for your comment. Sorry for being late.

SELECT FOR UPDATE SKIP LOCKED might skip a large number of rows,
leading to
a high volume of log messages from log_lock_failure in your current
patch.
Could this be considered unintended behavior? Would it be better to limit
the number of log messages per query?

It is necessary to suppress the generation of a large amount of logs due
to SKIP LOCKED.
But I think that when using SKIP LOCKED, the locks are often
intentionally bypassed.
It seems unnatural to log only the first write or to set a specific
number of samples.

What do you think if we simply don't log anything for SKIP LOCKED?

Regards,

#18

Fujii Masao

masao.fujii@gmail.com

about 1 year ago

In reply to: Seino Yuki (#17)

Re: Add “FOR UPDATE NOWAIT” lock details to the log.

On 2025/02/03 21:30, Yuki Seino wrote:

Thank you for your comment. Sorry for being late.

SELECT FOR UPDATE SKIP LOCKED might skip a large number of rows, leading to
a high volume of log messages from log_lock_failure in your current patch.
Could this be considered unintended behavior? Would it be better to limit
the number of log messages per query?

It is necessary to suppress the generation of a large amount of logs due to SKIP LOCKED.
But I think that when using SKIP LOCKED, the locks are often intentionally bypassed.
It seems unnatural to log only the first write or to set a specific number of samples.

I don't think so. Some users of SKIP LOCKED may want to understand why locks
were skipped and identify the blockers. Even partial information could still
be valuable to them, I think.

What do you think if we simply don't log anything for SKIP LOCKED?

Implementing both NOWAIT and SKIP LOCKED could take time and make the patch
more complex. I'm fine with focusing on the NOWAIT case first as an initial patch.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

#19

Jelte Fennema-Nio

postgres@jeltef.nl

about 1 year ago

In reply to: Fujii Masao (#18)

Re: Add “FOR UPDATE NOWAIT” lock details to the log.

On Wed, 12 Feb 2025 at 12:32, Fujii Masao <masao.fujii@oss.nttdata.com> wrote:

What do you think if we simply don't log anything for SKIP LOCKED?

Implementing both NOWAIT and SKIP LOCKED could take time and make the patch
more complex. I'm fine with focusing on the NOWAIT case first as an initial patch.

I think that makes sense. It's a fairly common pattern to use SKIP
LOCKED to implement a concurrent job queue. Having such a usecase
suddenly create lots of logs seems undesirable, especially since it
created no logs at all before. Since NOWAIT already results in an
error (and thus a log), having it add some additional info seems
totally reasonable.

#20

Seino Yuki

seinoyu@oss.nttdata.com

about 1 year ago

In reply to: Jelte Fennema-Nio (#19)

Re: Add “FOR UPDATE NOWAIT” lock details to the log.

On 2025/02/13 2:31, Jelte Fennema-Nio wrote:

On Wed, 12 Feb 2025 at 12:32, Fujii Masao<masao.fujii@oss.nttdata.com> wrote:

What do you think if we simply don't log anything for SKIP LOCKED?

Implementing both NOWAIT and SKIP LOCKED could take time and make the patch
more complex. I'm fine with focusing on the NOWAIT case first as an initial patch.

I think that makes sense. It's a fairly common pattern to use SKIP
LOCKED to implement a concurrent job queue. Having such a usecase
suddenly create lots of logs seems undesirable, especially since it
created no logs at all before. Since NOWAIT already results in an
error (and thus a log), having it add some additional info seems
totally reasonable.

Thank you for the advice. For now, my goal is to output only NOWAIT.
Since lock.c cannot reference LockWaitPolicy, I believe we need to
extend the IF conditions in LockAcquire, LockAcquireExtended, and their
higher-level functions. This could be a pretty significant modification.
I’ll think about whether there’s a better approach. I welcome any good
ideas from everyone too. As an aside, I also noticed that
dontWait(=true) is routed not only from NOWAIT and SKIP LOCKED but also
from vacuum and other parts. do_autovacuum(autovacuum.c) ->
ConditionalLockDatabaseObject -> LockAcquireExtended Regards,

#21

Fujii Masao

masao.fujii@gmail.com

about 1 year ago

In reply to: Seino Yuki (#20)

#22

Seino Yuki

seinoyu@oss.nttdata.com

about 1 year ago

In reply to: Fujii Masao (#21)

#23

Fujii Masao