adding wait_start column to pg_locks
Hi,
When examining the duration of locks, we often join pg_locks with
pg_stat_activity and use columns such as query_start or state_change.
However, since these columns record the moment when a query started or
when its state changed, we cannot get the exact lock wait duration this
way.
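For example, a query along these lines only approximates the wait time,
because query_start marks when the blocked query started, not when the
lock wait began (just a sketch; the exact join and filters depend on
the use case):

  SELECT l.pid, l.locktype, l.mode,
         now() - a.query_start AS apparent_wait
    FROM pg_locks l
    JOIN pg_stat_activity a ON a.pid = l.pid
   WHERE NOT l.granted;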
So I'm now thinking about adding a new column in pg_locks which
keeps the time at which locks started waiting.
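With such a column, the wait duration could be computed directly, e.g.
(a sketch of the intended usage, assuming the attached patch is
applied):

  SELECT pid, locktype, mode, now() - wait_start AS wait_duration
    FROM pg_locks
   WHERE NOT granted;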
One problem with this idea would be the performance impact of calling
gettimeofday repeatedly.
To avoid it, I reused the result of the gettimeofday call made when the
deadlock_timeout timer is started, as suggested in the previous
discussion[1].
Attached a patch.
BTW, in this patch wait_start is set to zero for fast path locks,
because it seems such locks are never waited for.
If my understanding is wrong, I would appreciate it if someone could
point it out.
Any thoughts?
[1]: /messages/by-id/28804.1407907184@sss.pgh.pa.us
Regards,
--
Atsushi Torikoshi
NTT DATA CORPORATION
Attachments:
v1-0001-Add-wait_start-field-into-pg_locks.patch (text/x-diff)
From 1a6a7377877cc52e4b87a05bbb8ffae92cdb91ab Mon Sep 17 00:00:00 2001
From: Atsushi Torikoshi <torikoshia@oss.nttdata.com>
Date: Tue, 15 Dec 2020 10:55:32 +0900
Subject: [PATCH v1] Add wait_start field into pg_locks.
To examine the duration of locks, we used to join pg_locks with
pg_stat_activity and use columns such as query_start or state_change.
However, since these are the moments when queries started or their
state changed, we could not get the exact lock duration in this way.
This patch adds a new field preserving the time at which locks
started waiting.
---
doc/src/sgml/catalogs.sgml | 9 +++++++++
src/backend/storage/lmgr/lock.c | 10 ++++++++++
src/backend/storage/lmgr/proc.c | 2 ++
src/backend/utils/adt/lockfuncs.c | 9 ++++++++-
src/include/catalog/pg_proc.dat | 6 +++---
src/include/storage/lock.h | 3 +++
src/test/regress/expected/rules.out | 5 +++--
7 files changed, 38 insertions(+), 6 deletions(-)
diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 62711ee83f..19af0e9af4 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -10567,6 +10567,15 @@ SCRAM-SHA-256$<replaceable><iteration count></replaceable>:<replaceable>&l
lock table
</para></entry>
</row>
+
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>wait_start</structfield> <type>timestamptz</type>
+ </para>
+ <para>
+ The time at which lock started waiting
+ </para></entry>
+ </row>
</tbody>
</tgroup>
</table>
diff --git a/src/backend/storage/lmgr/lock.c b/src/backend/storage/lmgr/lock.c
index d86566f455..7b30508f95 100644
--- a/src/backend/storage/lmgr/lock.c
+++ b/src/backend/storage/lmgr/lock.c
@@ -1196,6 +1196,7 @@ SetupLockInTable(LockMethod lockMethodTable, PGPROC *proc,
lock->waitMask = 0;
SHMQueueInit(&(lock->procLocks));
ProcQueueInit(&(lock->waitProcs));
+ lock->wait_start = 0;
lock->nRequested = 0;
lock->nGranted = 0;
MemSet(lock->requested, 0, sizeof(int) * MAX_LOCKMODES);
@@ -3628,6 +3629,12 @@ GetLockStatusData(void)
instance->leaderPid = proc->pid;
instance->fastpath = true;
+ /*
+ * Successfully taking fast path lock means there were no
+ * conflicting locks.
+ */
+ instance->wait_start = 0;
+
el++;
}
@@ -3655,6 +3662,7 @@ GetLockStatusData(void)
instance->pid = proc->pid;
instance->leaderPid = proc->pid;
instance->fastpath = true;
+ instance->wait_start = 0;
el++;
}
@@ -3707,6 +3715,7 @@ GetLockStatusData(void)
instance->pid = proc->pid;
instance->leaderPid = proclock->groupLeader->pid;
instance->fastpath = false;
+ instance->wait_start = lock->wait_start;
el++;
}
@@ -4184,6 +4193,7 @@ lock_twophase_recover(TransactionId xid, uint16 info,
lock->waitMask = 0;
SHMQueueInit(&(lock->procLocks));
ProcQueueInit(&(lock->waitProcs));
+ lock->wait_start = 0;
lock->nRequested = 0;
lock->nGranted = 0;
MemSet(lock->requested, 0, sizeof(int) * MAX_LOCKMODES);
diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c
index 7dc3911590..f3702cc681 100644
--- a/src/backend/storage/lmgr/proc.c
+++ b/src/backend/storage/lmgr/proc.c
@@ -1259,6 +1259,8 @@ ProcSleep(LOCALLOCK *locallock, LockMethod lockMethodTable)
}
else
enable_timeout_after(DEADLOCK_TIMEOUT, DeadlockTimeout);
+
+ lock->wait_start = get_timeout_start_time(DEADLOCK_TIMEOUT);
}
/*
diff --git a/src/backend/utils/adt/lockfuncs.c b/src/backend/utils/adt/lockfuncs.c
index f592292d06..5ee0953305 100644
--- a/src/backend/utils/adt/lockfuncs.c
+++ b/src/backend/utils/adt/lockfuncs.c
@@ -63,7 +63,7 @@ typedef struct
} PG_Lock_Status;
/* Number of columns in pg_locks output */
-#define NUM_LOCK_STATUS_COLUMNS 15
+#define NUM_LOCK_STATUS_COLUMNS 16
/*
* VXIDGetDatum - Construct a text representation of a VXID
@@ -142,6 +142,8 @@ pg_lock_status(PG_FUNCTION_ARGS)
BOOLOID, -1, 0);
TupleDescInitEntry(tupdesc, (AttrNumber) 15, "fastpath",
BOOLOID, -1, 0);
+ TupleDescInitEntry(tupdesc, (AttrNumber) 16, "wait_start",
+ TIMESTAMPTZOID, -1, 0);
funcctx->tuple_desc = BlessTupleDesc(tupdesc);
@@ -336,6 +338,10 @@ pg_lock_status(PG_FUNCTION_ARGS)
values[12] = CStringGetTextDatum(GetLockmodeName(instance->locktag.locktag_lockmethodid, mode));
values[13] = BoolGetDatum(granted);
values[14] = BoolGetDatum(instance->fastpath);
+ if (instance->wait_start != 0)
+ values[15] = TimestampTzGetDatum(instance->wait_start);
+ else
+ nulls[15] = true;
tuple = heap_form_tuple(funcctx->tuple_desc, values, nulls);
result = HeapTupleGetDatum(tuple);
@@ -406,6 +412,7 @@ pg_lock_status(PG_FUNCTION_ARGS)
values[12] = CStringGetTextDatum("SIReadLock");
values[13] = BoolGetDatum(true);
values[14] = BoolGetDatum(false);
+ nulls[15] = true;
tuple = heap_form_tuple(funcctx->tuple_desc, values, nulls);
result = HeapTupleGetDatum(tuple);
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index e6c7b070f6..3e67d50891 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5946,9 +5946,9 @@
{ oid => '1371', descr => 'view system lock information',
proname => 'pg_lock_status', prorows => '1000', proretset => 't',
provolatile => 'v', prorettype => 'record', proargtypes => '',
- proallargtypes => '{text,oid,oid,int4,int2,text,xid,oid,oid,int2,text,int4,text,bool,bool}',
- proargmodes => '{o,o,o,o,o,o,o,o,o,o,o,o,o,o,o}',
- proargnames => '{locktype,database,relation,page,tuple,virtualxid,transactionid,classid,objid,objsubid,virtualtransaction,pid,mode,granted,fastpath}',
+ proallargtypes => '{text,oid,oid,int4,int2,text,xid,oid,oid,int2,text,int4,text,bool,bool,timestamptz}',
+ proargmodes => '{o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o}',
+ proargnames => '{locktype,database,relation,page,tuple,virtualxid,transactionid,classid,objid,objsubid,virtualtransaction,pid,mode,granted,fastpath,wait_start}',
prosrc => 'pg_lock_status' },
{ oid => '2561',
descr => 'get array of PIDs of sessions blocking specified backend PID from acquiring a heavyweight lock',
diff --git a/src/include/storage/lock.h b/src/include/storage/lock.h
index 1c3e9c1999..8b6517aec3 100644
--- a/src/include/storage/lock.h
+++ b/src/include/storage/lock.h
@@ -22,6 +22,7 @@
#include "storage/lockdefs.h"
#include "storage/lwlock.h"
#include "storage/shmem.h"
+#include "utils/timestamp.h"
/* struct PGPROC is declared in proc.h, but must forward-reference it */
typedef struct PGPROC PGPROC;
@@ -304,6 +305,7 @@ typedef struct LOCK
LOCKMASK waitMask; /* bitmask for lock types awaited */
SHM_QUEUE procLocks; /* list of PROCLOCK objects assoc. with lock */
PROC_QUEUE waitProcs; /* list of PGPROC objects waiting on lock */
+ TimestampTz wait_start; /* time at which lock started waiting */
int requested[MAX_LOCKMODES]; /* counts of requested locks */
int nRequested; /* total of requested[] array */
int granted[MAX_LOCKMODES]; /* counts of granted locks */
@@ -445,6 +447,7 @@ typedef struct LockInstanceData
LOCKMODE waitLockMode; /* lock awaited by this PGPROC, if any */
BackendId backend; /* backend ID of this PGPROC */
LocalTransactionId lxid; /* local transaction ID of this PGPROC */
+ TimestampTz wait_start; /* time at which lock started waiting */
int pid; /* pid of this PGPROC */
int leaderPid; /* pid of group leader; = pid if no group */
bool fastpath; /* taken via fastpath? */
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 6293ab57bc..d8f0600a4b 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1394,8 +1394,9 @@ pg_locks| SELECT l.locktype,
l.pid,
l.mode,
l.granted,
- l.fastpath
- FROM pg_lock_status() l(locktype, database, relation, page, tuple, virtualxid, transactionid, classid, objid, objsubid, virtualtransaction, pid, mode, granted, fastpath);
+ l.fastpath,
+ l.wait_start
+ FROM pg_lock_status() l(locktype, database, relation, page, tuple, virtualxid, transactionid, classid, objid, objsubid, virtualtransaction, pid, mode, granted, fastpath, wait_start);
pg_matviews| SELECT n.nspname AS schemaname,
c.relname AS matviewname,
pg_get_userbyid(c.relowner) AS matviewowner,
--
2.18.1
On Tue, Dec 15, 2020 at 11:47:23AM +0900, torikoshia wrote:
So I'm now thinking about adding a new column in pg_locks which
keeps the time at which locks started waiting. Attached a patch.
This is failing make check-world; would you send an updated patch?
I added you as an author so it shows up here.
http://cfbot.cputube.org/atsushi-torikoshi.html
--
Justin
On 2021-01-02 06:49, Justin Pryzby wrote:
On Tue, Dec 15, 2020 at 11:47:23AM +0900, torikoshia wrote:
So I'm now thinking about adding a new column in pg_locks which
keeps the time at which locks started waiting. Attached a patch.
This is failing make check-world; would you send an updated patch?
I added you as an author so it shows up here.
http://cfbot.cputube.org/atsushi-torikoshi.html
Thanks!
Attached an updated patch.
Regards,
Attachments:
v2-0001-To-examine-the-duration-of-locks-we-did-join-on-p.patch (text/x-diff)
From 608bba31da1bc5d15db991662fa858cd4632d849 Mon Sep 17 00:00:00 2001
From: Atsushi Torikoshi <torikoshia@oss.nttdata.com>
Date: Mon, 4 Jan 2021 09:53:17 +0900
Subject: [PATCH v2] To examine the duration of locks, we did join on pg_locks
and pg_stat_activity and used columns such as query_start or state_change.
However, since they are the moment when queries have started or their state
has changed, we could not get the exact lock duration in this way.
This patch adds a new field preserving the time at which locks
started waiting.
---
contrib/amcheck/expected/check_btree.out | 4 ++--
doc/src/sgml/catalogs.sgml | 9 +++++++++
src/backend/storage/lmgr/lock.c | 10 ++++++++++
src/backend/storage/lmgr/proc.c | 2 ++
src/backend/utils/adt/lockfuncs.c | 9 ++++++++-
src/include/catalog/pg_proc.dat | 6 +++---
src/include/storage/lock.h | 3 +++
src/test/regress/expected/rules.out | 5 +++--
8 files changed, 40 insertions(+), 8 deletions(-)
diff --git a/contrib/amcheck/expected/check_btree.out b/contrib/amcheck/expected/check_btree.out
index 13848b7449..c0aecb0288 100644
--- a/contrib/amcheck/expected/check_btree.out
+++ b/contrib/amcheck/expected/check_btree.out
@@ -97,8 +97,8 @@ SELECT bt_index_parent_check('bttest_b_idx');
SELECT * FROM pg_locks
WHERE relation = ANY(ARRAY['bttest_a', 'bttest_a_idx', 'bttest_b', 'bttest_b_idx']::regclass[])
AND pid = pg_backend_pid();
- locktype | database | relation | page | tuple | virtualxid | transactionid | classid | objid | objsubid | virtualtransaction | pid | mode | granted | fastpath
-----------+----------+----------+------+-------+------------+---------------+---------+-------+----------+--------------------+-----+------+---------+----------
+ locktype | database | relation | page | tuple | virtualxid | transactionid | classid | objid | objsubid | virtualtransaction | pid | mode | granted | fastpath | wait_start
+----------+----------+----------+------+-------+------------+---------------+---------+-------+----------+--------------------+-----+------+---------+----------+------------
(0 rows)
COMMIT;
diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 3a2266526c..626e5672bd 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -10578,6 +10578,15 @@ SCRAM-SHA-256$<replaceable><iteration count></replaceable>:<replaceable>&l
lock table
</para></entry>
</row>
+
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>wait_start</structfield> <type>timestamptz</type>
+ </para>
+ <para>
+ The time at which lock started waiting
+ </para></entry>
+ </row>
</tbody>
</tgroup>
</table>
diff --git a/src/backend/storage/lmgr/lock.c b/src/backend/storage/lmgr/lock.c
index 20e50247ea..27969d3772 100644
--- a/src/backend/storage/lmgr/lock.c
+++ b/src/backend/storage/lmgr/lock.c
@@ -1195,6 +1195,7 @@ SetupLockInTable(LockMethod lockMethodTable, PGPROC *proc,
lock->waitMask = 0;
SHMQueueInit(&(lock->procLocks));
ProcQueueInit(&(lock->waitProcs));
+ lock->wait_start = 0;
lock->nRequested = 0;
lock->nGranted = 0;
MemSet(lock->requested, 0, sizeof(int) * MAX_LOCKMODES);
@@ -3627,6 +3628,12 @@ GetLockStatusData(void)
instance->leaderPid = proc->pid;
instance->fastpath = true;
+ /*
+ * Successfully taking fast path lock means there were no
+ * conflicting locks.
+ */
+ instance->wait_start = 0;
+
el++;
}
@@ -3654,6 +3661,7 @@ GetLockStatusData(void)
instance->pid = proc->pid;
instance->leaderPid = proc->pid;
instance->fastpath = true;
+ instance->wait_start = 0;
el++;
}
@@ -3706,6 +3714,7 @@ GetLockStatusData(void)
instance->pid = proc->pid;
instance->leaderPid = proclock->groupLeader->pid;
instance->fastpath = false;
+ instance->wait_start = lock->wait_start;
el++;
}
@@ -4183,6 +4192,7 @@ lock_twophase_recover(TransactionId xid, uint16 info,
lock->waitMask = 0;
SHMQueueInit(&(lock->procLocks));
ProcQueueInit(&(lock->waitProcs));
+ lock->wait_start = 0;
lock->nRequested = 0;
lock->nGranted = 0;
MemSet(lock->requested, 0, sizeof(int) * MAX_LOCKMODES);
diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c
index 57717f666d..56aa8b7f6b 100644
--- a/src/backend/storage/lmgr/proc.c
+++ b/src/backend/storage/lmgr/proc.c
@@ -1259,6 +1259,8 @@ ProcSleep(LOCALLOCK *locallock, LockMethod lockMethodTable)
}
else
enable_timeout_after(DEADLOCK_TIMEOUT, DeadlockTimeout);
+
+ lock->wait_start = get_timeout_start_time(DEADLOCK_TIMEOUT);
}
/*
diff --git a/src/backend/utils/adt/lockfuncs.c b/src/backend/utils/adt/lockfuncs.c
index 9f2c4946c9..85b216a1d6 100644
--- a/src/backend/utils/adt/lockfuncs.c
+++ b/src/backend/utils/adt/lockfuncs.c
@@ -63,7 +63,7 @@ typedef struct
} PG_Lock_Status;
/* Number of columns in pg_locks output */
-#define NUM_LOCK_STATUS_COLUMNS 15
+#define NUM_LOCK_STATUS_COLUMNS 16
/*
* VXIDGetDatum - Construct a text representation of a VXID
@@ -142,6 +142,8 @@ pg_lock_status(PG_FUNCTION_ARGS)
BOOLOID, -1, 0);
TupleDescInitEntry(tupdesc, (AttrNumber) 15, "fastpath",
BOOLOID, -1, 0);
+ TupleDescInitEntry(tupdesc, (AttrNumber) 16, "wait_start",
+ TIMESTAMPTZOID, -1, 0);
funcctx->tuple_desc = BlessTupleDesc(tupdesc);
@@ -336,6 +338,10 @@ pg_lock_status(PG_FUNCTION_ARGS)
values[12] = CStringGetTextDatum(GetLockmodeName(instance->locktag.locktag_lockmethodid, mode));
values[13] = BoolGetDatum(granted);
values[14] = BoolGetDatum(instance->fastpath);
+ if (instance->wait_start != 0)
+ values[15] = TimestampTzGetDatum(instance->wait_start);
+ else
+ nulls[15] = true;
tuple = heap_form_tuple(funcctx->tuple_desc, values, nulls);
result = HeapTupleGetDatum(tuple);
@@ -406,6 +412,7 @@ pg_lock_status(PG_FUNCTION_ARGS)
values[12] = CStringGetTextDatum("SIReadLock");
values[13] = BoolGetDatum(true);
values[14] = BoolGetDatum(false);
+ nulls[15] = true;
tuple = heap_form_tuple(funcctx->tuple_desc, values, nulls);
result = HeapTupleGetDatum(tuple);
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index d7b55f57ea..c453cd5875 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5946,9 +5946,9 @@
{ oid => '1371', descr => 'view system lock information',
proname => 'pg_lock_status', prorows => '1000', proretset => 't',
provolatile => 'v', prorettype => 'record', proargtypes => '',
- proallargtypes => '{text,oid,oid,int4,int2,text,xid,oid,oid,int2,text,int4,text,bool,bool}',
- proargmodes => '{o,o,o,o,o,o,o,o,o,o,o,o,o,o,o}',
- proargnames => '{locktype,database,relation,page,tuple,virtualxid,transactionid,classid,objid,objsubid,virtualtransaction,pid,mode,granted,fastpath}',
+ proallargtypes => '{text,oid,oid,int4,int2,text,xid,oid,oid,int2,text,int4,text,bool,bool,timestamptz}',
+ proargmodes => '{o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o}',
+ proargnames => '{locktype,database,relation,page,tuple,virtualxid,transactionid,classid,objid,objsubid,virtualtransaction,pid,mode,granted,fastpath,wait_start}',
prosrc => 'pg_lock_status' },
{ oid => '2561',
descr => 'get array of PIDs of sessions blocking specified backend PID from acquiring a heavyweight lock',
diff --git a/src/include/storage/lock.h b/src/include/storage/lock.h
index 2e6ef174e9..a944c93774 100644
--- a/src/include/storage/lock.h
+++ b/src/include/storage/lock.h
@@ -22,6 +22,7 @@
#include "storage/lockdefs.h"
#include "storage/lwlock.h"
#include "storage/shmem.h"
+#include "utils/timestamp.h"
/* struct PGPROC is declared in proc.h, but must forward-reference it */
typedef struct PGPROC PGPROC;
@@ -304,6 +305,7 @@ typedef struct LOCK
LOCKMASK waitMask; /* bitmask for lock types awaited */
SHM_QUEUE procLocks; /* list of PROCLOCK objects assoc. with lock */
PROC_QUEUE waitProcs; /* list of PGPROC objects waiting on lock */
+ TimestampTz wait_start; /* time at which lock started waiting */
int requested[MAX_LOCKMODES]; /* counts of requested locks */
int nRequested; /* total of requested[] array */
int granted[MAX_LOCKMODES]; /* counts of granted locks */
@@ -445,6 +447,7 @@ typedef struct LockInstanceData
LOCKMODE waitLockMode; /* lock awaited by this PGPROC, if any */
BackendId backend; /* backend ID of this PGPROC */
LocalTransactionId lxid; /* local transaction ID of this PGPROC */
+ TimestampTz wait_start; /* time at which lock started waiting */
int pid; /* pid of this PGPROC */
int leaderPid; /* pid of group leader; = pid if no group */
bool fastpath; /* taken via fastpath? */
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 6293ab57bc..d8f0600a4b 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1394,8 +1394,9 @@ pg_locks| SELECT l.locktype,
l.pid,
l.mode,
l.granted,
- l.fastpath
- FROM pg_lock_status() l(locktype, database, relation, page, tuple, virtualxid, transactionid, classid, objid, objsubid, virtualtransaction, pid, mode, granted, fastpath);
+ l.fastpath,
+ l.wait_start
+ FROM pg_lock_status() l(locktype, database, relation, page, tuple, virtualxid, transactionid, classid, objid, objsubid, virtualtransaction, pid, mode, granted, fastpath, wait_start);
pg_matviews| SELECT n.nspname AS schemaname,
c.relname AS matviewname,
pg_get_userbyid(c.relowner) AS matviewowner,
--
2.18.1
Hi
On Mon, Jan 4, 2021 at 15:04, torikoshia <torikoshia@oss.nttdata.com> wrote:
On 2021-01-02 06:49, Justin Pryzby wrote:
On Tue, Dec 15, 2020 at 11:47:23AM +0900, torikoshia wrote:
So I'm now thinking about adding a new column in pg_locks which
keeps the time at which locks started waiting. Attached a patch.
This is failing make check-world; would you send an updated patch?
I added you as an author so it shows up here.
http://cfbot.cputube.org/atsushi-torikoshi.html
Thanks!
Attached an updated patch.
I took a look at this patch as it seems useful (and I have an item on
my bucket list labelled "look at the locking code", which I am not at
all familiar with).
I tested the patch by doing the following:
Session 1:
postgres=# CREATE TABLE foo (id int);
CREATE TABLE
postgres=# BEGIN ;
BEGIN
postgres=*# INSERT INTO foo VALUES (1);
INSERT 0 1
Session 2:
postgres=# BEGIN ;
BEGIN
postgres=*# LOCK TABLE foo;
Session 3:
postgres=# SELECT locktype, relation, pid, mode, granted, fastpath,
wait_start
FROM pg_locks
WHERE relation = 'foo'::regclass AND NOT granted\x\g\x
-[ RECORD 1 ]-----------------------------
locktype | relation
relation | 16452
pid | 3513935
mode | AccessExclusiveLock
granted | f
fastpath | f
wait_start | 2021-01-14 12:03:06.683053+09
So far so good, but checking *all* the locks on this relation:
postgres=# SELECT locktype, relation, pid, mode, granted, fastpath,
wait_start
FROM pg_locks
WHERE relation = 'foo'::regclass\x\g\x
-[ RECORD 1 ]-----------------------------
locktype | relation
relation | 16452
pid | 3513824
mode | RowExclusiveLock
granted | t
fastpath | f
wait_start | 2021-01-14 12:03:06.683053+09
-[ RECORD 2 ]-----------------------------
locktype | relation
relation | 16452
pid | 3513935
mode | AccessExclusiveLock
granted | f
fastpath | f
wait_start | 2021-01-14 12:03:06.683053+09
shows the RowExclusiveLock granted in session 1 as apparently waiting
from exactly the same time as session 2's attempt to acquire the lock,
which is clearly not right.
Also, if a further session attempts to acquire a lock, we get:
postgres=# SELECT locktype, relation, pid, mode, granted, fastpath,
wait_start
FROM pg_locks
WHERE relation = 'foo'::regclass\x\g\x
-[ RECORD 1 ]-----------------------------
locktype | relation
relation | 16452
pid | 3513824
mode | RowExclusiveLock
granted | t
fastpath | f
wait_start | 2021-01-14 12:05:53.747309+09
-[ RECORD 2 ]-----------------------------
locktype | relation
relation | 16452
pid | 3514240
mode | AccessExclusiveLock
granted | f
fastpath | f
wait_start | 2021-01-14 12:05:53.747309+09
-[ RECORD 3 ]-----------------------------
locktype | relation
relation | 16452
pid | 3513935
mode | AccessExclusiveLock
granted | f
fastpath | f
wait_start | 2021-01-14 12:05:53.747309+09
i.e. all entries now have "wait_start" set to the start time of the
latest session's lock acquisition attempt.
Looking at the code, this happens as the wait start time is being
recorded in the lock record itself, so it always contains the value
reported by the latest lock acquisition attempt.
It looks like the logical place to store the value is in the PROCLOCK
structure; the attached patch reworks your patch to do that, and given
the above scenario produces the following output:
postgres=# SELECT locktype, relation, pid, mode, granted, fastpath,
wait_start
FROM pg_locks
WHERE relation = 'foo'::regclass\x\g\x
-[ RECORD 1 ]-----------------------------
locktype | relation
relation | 16452
pid | 3516391
mode | RowExclusiveLock
granted | t
fastpath | f
wait_start |
-[ RECORD 2 ]-----------------------------
locktype | relation
relation | 16452
pid | 3516470
mode | AccessExclusiveLock
granted | f
fastpath | f
wait_start | 2021-01-14 12:19:16.217163+09
-[ RECORD 3 ]-----------------------------
locktype | relation
relation | 16452
pid | 3516968
mode | AccessExclusiveLock
granted | f
fastpath | f
wait_start | 2021-01-14 12:18:08.195429+09
As mentioned, I'm not at all familiar with the locking code, so it's
quite possible that it's incomplete, there may be non-obvious
side-effects, or it's the wrong approach altogether, etc., so further
review is necessary.
Regards
Ian Barwick
--
EnterpriseDB: https://www.enterprisedb.com
Attachments:
v3-0001-To-examine-the-duration-of-locks-we-did-join-on-p.patch (text/x-patch)
diff --git a/contrib/amcheck/expected/check_btree.out b/contrib/amcheck/expected/check_btree.out
index 13848b7449..c0aecb0288 100644
--- a/contrib/amcheck/expected/check_btree.out
+++ b/contrib/amcheck/expected/check_btree.out
@@ -97,8 +97,8 @@ SELECT bt_index_parent_check('bttest_b_idx');
SELECT * FROM pg_locks
WHERE relation = ANY(ARRAY['bttest_a', 'bttest_a_idx', 'bttest_b', 'bttest_b_idx']::regclass[])
AND pid = pg_backend_pid();
- locktype | database | relation | page | tuple | virtualxid | transactionid | classid | objid | objsubid | virtualtransaction | pid | mode | granted | fastpath
-----------+----------+----------+------+-------+------------+---------------+---------+-------+----------+--------------------+-----+------+---------+----------
+ locktype | database | relation | page | tuple | virtualxid | transactionid | classid | objid | objsubid | virtualtransaction | pid | mode | granted | fastpath | wait_start
+----------+----------+----------+------+-------+------------+---------------+---------+-------+----------+--------------------+-----+------+---------+----------+------------
(0 rows)
COMMIT;
diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 3a2266526c..7b8a47f438 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -10578,6 +10578,16 @@ SCRAM-SHA-256$<replaceable><iteration count></replaceable>:<replaceable>&l
lock table
</para></entry>
</row>
+
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>wait_start</structfield> <type>timestamptz</type>
+ </para>
+ <para>
+ Lock acquisition wait start time. <literal>NULL</literal> if
+ lock acquired.
+ </para></entry>
+ </row>
</tbody>
</tgroup>
</table>
diff --git a/src/backend/storage/lmgr/lock.c b/src/backend/storage/lmgr/lock.c
index 20e50247ea..f45dbf5dd5 100644
--- a/src/backend/storage/lmgr/lock.c
+++ b/src/backend/storage/lmgr/lock.c
@@ -1268,6 +1268,7 @@ SetupLockInTable(LockMethod lockMethodTable, PGPROC *proc,
proc->lockGroupLeader : proc;
proclock->holdMask = 0;
proclock->releaseMask = 0;
+ proclock->wait_start = 0;
/* Add proclock to appropriate lists */
SHMQueueInsertBefore(&lock->procLocks, &proclock->lockLink);
SHMQueueInsertBefore(&(proc->myProcLocks[partition]),
@@ -3627,6 +3628,12 @@ GetLockStatusData(void)
instance->leaderPid = proc->pid;
instance->fastpath = true;
+ /*
+ * Successfully taking fast path lock means there were no
+ * conflicting locks.
+ */
+ instance->wait_start = 0;
+
el++;
}
@@ -3654,6 +3661,7 @@ GetLockStatusData(void)
instance->pid = proc->pid;
instance->leaderPid = proc->pid;
instance->fastpath = true;
+ instance->wait_start = 0;
el++;
}
@@ -3706,6 +3714,7 @@ GetLockStatusData(void)
instance->pid = proc->pid;
instance->leaderPid = proclock->groupLeader->pid;
instance->fastpath = false;
+ instance->wait_start = proclock->wait_start;
el++;
}
diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c
index c87ffc6549..a06252f245 100644
--- a/src/backend/storage/lmgr/proc.c
+++ b/src/backend/storage/lmgr/proc.c
@@ -1262,6 +1262,8 @@ ProcSleep(LOCALLOCK *locallock, LockMethod lockMethodTable)
}
else
enable_timeout_after(DEADLOCK_TIMEOUT, DeadlockTimeout);
+
+ proclock->wait_start = get_timeout_start_time(DEADLOCK_TIMEOUT);
}
else if (log_recovery_conflict_waits)
{
diff --git a/src/backend/utils/adt/lockfuncs.c b/src/backend/utils/adt/lockfuncs.c
index b1cf5b79a7..b347de4e61 100644
--- a/src/backend/utils/adt/lockfuncs.c
+++ b/src/backend/utils/adt/lockfuncs.c
@@ -63,7 +63,7 @@ typedef struct
} PG_Lock_Status;
/* Number of columns in pg_locks output */
-#define NUM_LOCK_STATUS_COLUMNS 15
+#define NUM_LOCK_STATUS_COLUMNS 16
/*
* VXIDGetDatum - Construct a text representation of a VXID
@@ -142,6 +142,8 @@ pg_lock_status(PG_FUNCTION_ARGS)
BOOLOID, -1, 0);
TupleDescInitEntry(tupdesc, (AttrNumber) 15, "fastpath",
BOOLOID, -1, 0);
+ TupleDescInitEntry(tupdesc, (AttrNumber) 16, "wait_start",
+ TIMESTAMPTZOID, -1, 0);
funcctx->tuple_desc = BlessTupleDesc(tupdesc);
@@ -336,6 +338,10 @@ pg_lock_status(PG_FUNCTION_ARGS)
values[12] = CStringGetTextDatum(GetLockmodeName(instance->locktag.locktag_lockmethodid, mode));
values[13] = BoolGetDatum(granted);
values[14] = BoolGetDatum(instance->fastpath);
+ if (instance->wait_start != 0)
+ values[15] = TimestampTzGetDatum(instance->wait_start);
+ else
+ nulls[15] = true;
tuple = heap_form_tuple(funcctx->tuple_desc, values, nulls);
result = HeapTupleGetDatum(tuple);
@@ -406,6 +412,7 @@ pg_lock_status(PG_FUNCTION_ARGS)
values[12] = CStringGetTextDatum("SIReadLock");
values[13] = BoolGetDatum(true);
values[14] = BoolGetDatum(false);
+ nulls[15] = true;
tuple = heap_form_tuple(funcctx->tuple_desc, values, nulls);
result = HeapTupleGetDatum(tuple);
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index d27336adcd..5cd93c4587 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5948,9 +5948,9 @@
{ oid => '1371', descr => 'view system lock information',
proname => 'pg_lock_status', prorows => '1000', proretset => 't',
provolatile => 'v', prorettype => 'record', proargtypes => '',
- proallargtypes => '{text,oid,oid,int4,int2,text,xid,oid,oid,int2,text,int4,text,bool,bool}',
- proargmodes => '{o,o,o,o,o,o,o,o,o,o,o,o,o,o,o}',
- proargnames => '{locktype,database,relation,page,tuple,virtualxid,transactionid,classid,objid,objsubid,virtualtransaction,pid,mode,granted,fastpath}',
+ proallargtypes => '{text,oid,oid,int4,int2,text,xid,oid,oid,int2,text,int4,text,bool,bool,timestamptz}',
+ proargmodes => '{o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o}',
+ proargnames => '{locktype,database,relation,page,tuple,virtualxid,transactionid,classid,objid,objsubid,virtualtransaction,pid,mode,granted,fastpath,wait_start}',
prosrc => 'pg_lock_status' },
{ oid => '2561',
descr => 'get array of PIDs of sessions blocking specified backend PID from acquiring a heavyweight lock',
diff --git a/src/include/storage/lock.h b/src/include/storage/lock.h
index 2e6ef174e9..319438a61d 100644
--- a/src/include/storage/lock.h
+++ b/src/include/storage/lock.h
@@ -22,6 +22,7 @@
#include "storage/lockdefs.h"
#include "storage/lwlock.h"
#include "storage/shmem.h"
+#include "utils/timestamp.h"
/* struct PGPROC is declared in proc.h, but must forward-reference it */
typedef struct PGPROC PGPROC;
@@ -366,6 +367,7 @@ typedef struct PROCLOCK
LOCKMASK releaseMask; /* bitmask for lock types to be released */
SHM_QUEUE lockLink; /* list link in LOCK's list of proclocks */
SHM_QUEUE procLink; /* list link in PGPROC's list of proclocks */
+ TimestampTz wait_start; /* time at which wait for lock acquisition started */
} PROCLOCK;
#define PROCLOCK_LOCKMETHOD(proclock) \
@@ -445,6 +447,7 @@ typedef struct LockInstanceData
LOCKMODE waitLockMode; /* lock awaited by this PGPROC, if any */
BackendId backend; /* backend ID of this PGPROC */
LocalTransactionId lxid; /* local transaction ID of this PGPROC */
+ TimestampTz wait_start; /* time at which this PGPROC started waiting for lock */
int pid; /* pid of this PGPROC */
int leaderPid; /* pid of group leader; = pid if no group */
bool fastpath; /* taken via fastpath? */
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index a687e99d1e..8ccc859882 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1394,8 +1394,9 @@ pg_locks| SELECT l.locktype,
l.pid,
l.mode,
l.granted,
- l.fastpath
- FROM pg_lock_status() l(locktype, database, relation, page, tuple, virtualxid, transactionid, classid, objid, objsubid, virtualtransaction, pid, mode, granted, fastpath);
+ l.fastpath,
+ l.wait_start
+ FROM pg_lock_status() l(locktype, database, relation, page, tuple, virtualxid, transactionid, classid, objid, objsubid, virtualtransaction, pid, mode, granted, fastpath, wait_start);
pg_matviews| SELECT n.nspname AS schemaname,
c.relname AS matviewname,
pg_get_userbyid(c.relowner) AS matviewowner,
On Wed, Jan 13, 2021 at 10:40 PM Ian Lawrence Barwick <barwick@gmail.com> wrote:
It looks like the logical place to store the value is in the PROCLOCK
structure; ...
That seems surprising, because there's one PROCLOCK for every
combination of a process and a lock. But, a process can't be waiting
for more than one lock at the same time, because once it starts
waiting to acquire the first one, it can't do anything else, and thus
can't begin waiting for a second one. So I would have thought that
this would be recorded in the PROC.
But I haven't looked at the patch so maybe I'm dumb.
--
Robert Haas
EDB: http://www.enterprisedb.com
On Fri, Jan 15, 2021 at 3:45, Robert Haas <robertmhaas@gmail.com> wrote:
On Wed, Jan 13, 2021 at 10:40 PM Ian Lawrence Barwick <barwick@gmail.com>
wrote:
It looks like the logical place to store the value is in the PROCLOCK
structure; ...
That seems surprising, because there's one PROCLOCK for every
combination of a process and a lock. But, a process can't be waiting
for more than one lock at the same time, because once it starts
waiting to acquire the first one, it can't do anything else, and thus
can't begin waiting for a second one. So I would have thought that
this would be recorded in the PROC.
Umm, I think we're at cross-purposes here. The suggestion is to note
the time when the process started waiting for the lock in the process's
PROCLOCK, rather than in the lock itself (which in the original version
of the patch resulted in all processes with an interest in the lock
appearing to have been waiting to acquire it since the time a lock
acquisition was most recently attempted).
As mentioned, I hadn't really ever looked at the lock code before, so I
might be barking up the wrong forest.
Regards
Ian Barwick
--
EnterpriseDB: https://www.enterprisedb.com
Thanks for your review and comments!
On 2021-01-14 12:39, Ian Lawrence Barwick wrote:
Looking at the code, this happens as the wait start time is being
recorded in the lock record itself, so it always contains the value
reported by the latest lock acquisition attempt.
I think you are right and wait_start should not be recorded
in the LOCK.
On 2021-01-15 11:48, Ian Lawrence Barwick wrote:
On Fri, Jan 15, 2021 at 3:45, Robert Haas <robertmhaas@gmail.com> wrote:
On Wed, Jan 13, 2021 at 10:40 PM Ian Lawrence Barwick
<barwick@gmail.com> wrote:
It looks like the logical place to store the value is in the
PROCLOCK
structure; ...
That seems surprising, because there's one PROCLOCK for every
combination of a process and a lock. But, a process can't be waiting
for more than one lock at the same time, because once it starts
waiting to acquire the first one, it can't do anything else, and
thus
can't begin waiting for a second one. So I would have thought that
this would be recorded in the PROC.

Umm, I think we're at cross-purposes here. The suggestion is to note
the time when the process started waiting for the lock in the
process's
PROCLOCK, rather than in the lock itself (which in the original
version
of the patch resulted in all processes with an interest in the lock
appearing
to have been waiting to acquire it since the time a lock acquisition
was most recently attempted).
AFAIU, it seems possible to record wait_start in the PROCLOCK but
redundant, since each process can wait for at most one lock.
To confirm my understanding, I'm going to make another patch that
records wait_start in the PGPROC.
Regards,
--
Atsushi Torikoshi
On 2021-01-15 15:23, torikoshia wrote:
Thanks for your review and comments!
On 2021-01-14 12:39, Ian Lawrence Barwick wrote:
Looking at the code, this happens as the wait start time is being
recorded in the lock record itself, so it always contains the value
reported by the latest lock acquisition attempt.

I think you are right and wait_start should not be recorded in the LOCK.

On 2021-01-15 11:48, Ian Lawrence Barwick wrote:
On Fri, Jan 15, 2021 at 3:45, Robert Haas <robertmhaas@gmail.com> wrote:
On Wed, Jan 13, 2021 at 10:40 PM Ian Lawrence Barwick
<barwick@gmail.com> wrote:
It looks like the logical place to store the value is in the
PROCLOCK
structure; ...
That seems surprising, because there's one PROCLOCK for every
combination of a process and a lock. But, a process can't be waiting
for more than one lock at the same time, because once it starts
waiting to acquire the first one, it can't do anything else, and
thus
can't begin waiting for a second one. So I would have thought that
this would be recorded in the PROC.

Umm, I think we're at cross-purposes here. The suggestion is to note
the time when the process started waiting for the lock in the
process's
PROCLOCK, rather than in the lock itself (which in the original
version
of the patch resulted in all processes with an interest in the lock
appearing
to have been waiting to acquire it since the time a lock acquisition
was most recently attempted.

AFAIU, it seems possible to record wait_start in the PROCLOCK but
redundant, since each process can wait for at most one lock.

To confirm my understanding, I'm going to make another patch that
records wait_start in the PGPROC.
Attached a patch.
I noticed that previous patches left wait_start untouched even after
the lock was acquired.
The patch also fixed it.
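For example, reusing the 'foo' scenario from earlier in the thread,
once the blocking transaction commits, a check like the following (a
sketch against the v4 patch) should show wait_start reverting to NULL
for the now-granted lock:

  SELECT pid, mode, granted, wait_start
    FROM pg_locks
   WHERE relation = 'foo'::regclass;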
Any thoughts?
Regards,
--
Atsushi Torikoshi
Attachments:
v4-0001-To-examine-the-duration-of-locks-we-did-join-on-p.patch (text/x-diff)
From 62ff3e4dba7d45c260a62a33425cb2d1e6b822c9 Mon Sep 17 00:00:00 2001
From: Atsushi Torikoshi <torikoshia@oss.nttdata.com>
Date: Mon, 18 Jan 2021 10:01:35 +0900
Subject: [PATCH v4] To examine the duration of locks, we did join on pg_locks
and pg_stat_activity and used columns such as query_start or state_change.
However, since they are the moment when queries have started or their state
has changed, we could not get the exact lock duration in this way.
This patch adds a new field preserving the time at which locks started
waiting.
---
contrib/amcheck/expected/check_btree.out | 4 ++--
doc/src/sgml/catalogs.sgml | 10 ++++++++++
src/backend/storage/lmgr/lock.c | 8 ++++++++
src/backend/storage/lmgr/proc.c | 4 ++++
src/backend/utils/adt/lockfuncs.c | 9 ++++++++-
src/include/catalog/pg_proc.dat | 6 +++---
src/include/storage/lock.h | 2 ++
src/include/storage/proc.h | 1 +
src/test/regress/expected/rules.out | 5 +++--
9 files changed, 41 insertions(+), 8 deletions(-)
diff --git a/contrib/amcheck/expected/check_btree.out b/contrib/amcheck/expected/check_btree.out
index 13848b7449..c0aecb0288 100644
--- a/contrib/amcheck/expected/check_btree.out
+++ b/contrib/amcheck/expected/check_btree.out
@@ -97,8 +97,8 @@ SELECT bt_index_parent_check('bttest_b_idx');
SELECT * FROM pg_locks
WHERE relation = ANY(ARRAY['bttest_a', 'bttest_a_idx', 'bttest_b', 'bttest_b_idx']::regclass[])
AND pid = pg_backend_pid();
- locktype | database | relation | page | tuple | virtualxid | transactionid | classid | objid | objsubid | virtualtransaction | pid | mode | granted | fastpath
-----------+----------+----------+------+-------+------------+---------------+---------+-------+----------+--------------------+-----+------+---------+----------
+ locktype | database | relation | page | tuple | virtualxid | transactionid | classid | objid | objsubid | virtualtransaction | pid | mode | granted | fastpath | wait_start
+----------+----------+----------+------+-------+------------+---------------+---------+-------+----------+--------------------+-----+------+---------+----------+------------
(0 rows)
COMMIT;
diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 43d7a1ad90..a5ce0835a9 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -10589,6 +10589,16 @@ SCRAM-SHA-256$<replaceable><iteration count></replaceable>:<replaceable>&l
lock table
</para></entry>
</row>
+
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>wait_start</structfield> <type>timestamptz</type>
+ </para>
+ <para>
+ Lock acquisition wait start time. <literal>NULL</literal> if
+ lock acquired.
+ </para></entry>
+ </row>
</tbody>
</tgroup>
</table>
diff --git a/src/backend/storage/lmgr/lock.c b/src/backend/storage/lmgr/lock.c
index 20e50247ea..5b5fb474e0 100644
--- a/src/backend/storage/lmgr/lock.c
+++ b/src/backend/storage/lmgr/lock.c
@@ -3627,6 +3627,12 @@ GetLockStatusData(void)
instance->leaderPid = proc->pid;
instance->fastpath = true;
+ /*
+ * Successfully taking fast path lock means there were no
+ * conflicting locks.
+ */
+ instance->wait_start = 0;
+
el++;
}
@@ -3654,6 +3660,7 @@ GetLockStatusData(void)
instance->pid = proc->pid;
instance->leaderPid = proc->pid;
instance->fastpath = true;
+ instance->wait_start = 0;
el++;
}
@@ -3706,6 +3713,7 @@ GetLockStatusData(void)
instance->pid = proc->pid;
instance->leaderPid = proclock->groupLeader->pid;
instance->fastpath = false;
+ instance->wait_start = proc->wait_start;
el++;
}
diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c
index c87ffc6549..34173a5fdb 100644
--- a/src/backend/storage/lmgr/proc.c
+++ b/src/backend/storage/lmgr/proc.c
@@ -402,6 +402,7 @@ InitProcess(void)
MyProc->lwWaitMode = 0;
MyProc->waitLock = NULL;
MyProc->waitProcLock = NULL;
+ MyProc->wait_start = 0;
#ifdef USE_ASSERT_CHECKING
{
int i;
@@ -1262,6 +1263,8 @@ ProcSleep(LOCALLOCK *locallock, LockMethod lockMethodTable)
}
else
enable_timeout_after(DEADLOCK_TIMEOUT, DeadlockTimeout);
+
+ MyProc->wait_start = get_timeout_start_time(DEADLOCK_TIMEOUT);
}
else if (log_recovery_conflict_waits)
{
@@ -1678,6 +1681,7 @@ ProcWakeup(PGPROC *proc, ProcWaitStatus waitStatus)
proc->waitLock = NULL;
proc->waitProcLock = NULL;
proc->waitStatus = waitStatus;
+ proc->wait_start = 0;
/* And awaken it */
SetLatch(&proc->procLatch);
diff --git a/src/backend/utils/adt/lockfuncs.c b/src/backend/utils/adt/lockfuncs.c
index b1cf5b79a7..e26595bd05 100644
--- a/src/backend/utils/adt/lockfuncs.c
+++ b/src/backend/utils/adt/lockfuncs.c
@@ -63,7 +63,7 @@ typedef struct
} PG_Lock_Status;
/* Number of columns in pg_locks output */
-#define NUM_LOCK_STATUS_COLUMNS 15
+#define NUM_LOCK_STATUS_COLUMNS 16
/*
* VXIDGetDatum - Construct a text representation of a VXID
@@ -142,6 +142,8 @@ pg_lock_status(PG_FUNCTION_ARGS)
BOOLOID, -1, 0);
TupleDescInitEntry(tupdesc, (AttrNumber) 15, "fastpath",
BOOLOID, -1, 0);
+ TupleDescInitEntry(tupdesc, (AttrNumber) 16, "wait_start",
+ TIMESTAMPTZOID, -1, 0);
funcctx->tuple_desc = BlessTupleDesc(tupdesc);
@@ -336,6 +338,10 @@ pg_lock_status(PG_FUNCTION_ARGS)
values[12] = CStringGetTextDatum(GetLockmodeName(instance->locktag.locktag_lockmethodid, mode));
values[13] = BoolGetDatum(granted);
values[14] = BoolGetDatum(instance->fastpath);
+ if (!granted && instance->wait_start != 0)
+ values[15] = TimestampTzGetDatum(instance->wait_start);
+ else
+ nulls[15] = true;
tuple = heap_form_tuple(funcctx->tuple_desc, values, nulls);
result = HeapTupleGetDatum(tuple);
@@ -406,6 +412,7 @@ pg_lock_status(PG_FUNCTION_ARGS)
values[12] = CStringGetTextDatum("SIReadLock");
values[13] = BoolGetDatum(true);
values[14] = BoolGetDatum(false);
+ nulls[15] = true;
tuple = heap_form_tuple(funcctx->tuple_desc, values, nulls);
result = HeapTupleGetDatum(tuple);
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index d27336adcd..5cd93c4587 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5948,9 +5948,9 @@
{ oid => '1371', descr => 'view system lock information',
proname => 'pg_lock_status', prorows => '1000', proretset => 't',
provolatile => 'v', prorettype => 'record', proargtypes => '',
- proallargtypes => '{text,oid,oid,int4,int2,text,xid,oid,oid,int2,text,int4,text,bool,bool}',
- proargmodes => '{o,o,o,o,o,o,o,o,o,o,o,o,o,o,o}',
- proargnames => '{locktype,database,relation,page,tuple,virtualxid,transactionid,classid,objid,objsubid,virtualtransaction,pid,mode,granted,fastpath}',
+ proallargtypes => '{text,oid,oid,int4,int2,text,xid,oid,oid,int2,text,int4,text,bool,bool,timestamptz}',
+ proargmodes => '{o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o}',
+ proargnames => '{locktype,database,relation,page,tuple,virtualxid,transactionid,classid,objid,objsubid,virtualtransaction,pid,mode,granted,fastpath,wait_start}',
prosrc => 'pg_lock_status' },
{ oid => '2561',
descr => 'get array of PIDs of sessions blocking specified backend PID from acquiring a heavyweight lock',
diff --git a/src/include/storage/lock.h b/src/include/storage/lock.h
index 2e6ef174e9..e37b7a66e9 100644
--- a/src/include/storage/lock.h
+++ b/src/include/storage/lock.h
@@ -22,6 +22,7 @@
#include "storage/lockdefs.h"
#include "storage/lwlock.h"
#include "storage/shmem.h"
+#include "utils/timestamp.h"
/* struct PGPROC is declared in proc.h, but must forward-reference it */
typedef struct PGPROC PGPROC;
@@ -445,6 +446,7 @@ typedef struct LockInstanceData
LOCKMODE waitLockMode; /* lock awaited by this PGPROC, if any */
BackendId backend; /* backend ID of this PGPROC */
LocalTransactionId lxid; /* local transaction ID of this PGPROC */
+ TimestampTz wait_start; /* time at which this PGPROC started waiting for lock */
int pid; /* pid of this PGPROC */
int leaderPid; /* pid of group leader; = pid if no group */
bool fastpath; /* taken via fastpath? */
diff --git a/src/include/storage/proc.h b/src/include/storage/proc.h
index 683ab64f76..dfbc34163f 100644
--- a/src/include/storage/proc.h
+++ b/src/include/storage/proc.h
@@ -181,6 +181,7 @@ struct PGPROC
LOCKMODE waitLockMode; /* type of lock we're waiting for */
LOCKMASK heldLocks; /* bitmask for lock types already held on this
* lock object by this backend */
+ TimestampTz wait_start; /* time at which wait for lock acquisition started */
bool delayChkpt; /* true if this proc delays checkpoint start */
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index a687e99d1e..8ccc859882 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1394,8 +1394,9 @@ pg_locks| SELECT l.locktype,
l.pid,
l.mode,
l.granted,
- l.fastpath
- FROM pg_lock_status() l(locktype, database, relation, page, tuple, virtualxid, transactionid, classid, objid, objsubid, virtualtransaction, pid, mode, granted, fastpath);
+ l.fastpath,
+ l.wait_start
+ FROM pg_lock_status() l(locktype, database, relation, page, tuple, virtualxid, transactionid, classid, objid, objsubid, virtualtransaction, pid, mode, granted, fastpath, wait_start);
pg_matviews| SELECT n.nspname AS schemaname,
c.relname AS matviewname,
pg_get_userbyid(c.relowner) AS matviewowner,
--
2.18.1
On 2021/01/18 12:00, torikoshia wrote:
On 2021-01-15 15:23, torikoshia wrote:
Thanks for your review and comments!
On 2021-01-14 12:39, Ian Lawrence Barwick wrote:
Looking at the code, this happens as the wait start time is being
recorded in the lock record itself, so it always contains the value
reported by the latest lock acquisition attempt.

I think you are right and wait_start should not be recorded in the LOCK.

On 2021-01-15 11:48, Ian Lawrence Barwick wrote:
On Fri, Jan 15, 2021 at 3:45, Robert Haas <robertmhaas@gmail.com> wrote:
On Wed, Jan 13, 2021 at 10:40 PM Ian Lawrence Barwick
<barwick@gmail.com> wrote:
It looks like the logical place to store the value is in the
PROCLOCK
structure; ...
That seems surprising, because there's one PROCLOCK for every
combination of a process and a lock. But, a process can't be waiting
for more than one lock at the same time, because once it starts
waiting to acquire the first one, it can't do anything else, and
thus
can't begin waiting for a second one. So I would have thought that
this would be recorded in the PROC.

Umm, I think we're at cross-purposes here. The suggestion is to note
the time when the process started waiting for the lock in the
process's
PROCLOCK, rather than in the lock itself (which in the original
version
of the patch resulted in all processes with an interest in the lock
appearing
to have been waiting to acquire it since the time a lock acquisition
was most recently attempted.

AFAIU, it seems possible to record wait_start in the PROCLOCK but
redundant, since each process can wait for at most one lock.

To confirm my understanding, I'm going to make another patch that
records wait_start in the PGPROC.

Attached a patch.
I noticed that previous patches left wait_start untouched even after
the lock was acquired.
The patch also fixed it.

Any thoughts?
Thanks for updating the patch! I think that this is a really useful feature!!
I have two minor comments.
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>wait_start</structfield> <type>timestamptz</type>
The column name "wait_start" should be "waitstart" for the sake of consistency
with other column names in pg_locks? pg_locks seems to avoid including
an underscore in column names, so "locktype" is used instead of "lock_type",
"virtualtransaction" is used instead of "virtual_transaction", etc.
+ Lock acquisition wait start time. <literal>NULL</literal> if
+ lock acquired.
There seems to be a case where the wait start time is NULL even when
"granted" is false. It might be better to add a note about that case
to the docs? For example, I found that the wait start time is NULL
while the startup process is waiting for the lock. Is this the only
such case?
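For reference, a query like this (a sketch, using the column name as
of the v4 patch) would surface such rows:

  SELECT pid, locktype, mode, wait_start
    FROM pg_locks
   WHERE NOT granted AND wait_start IS NULL;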
Regards,
--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
On 2021-01-21 12:48, Fujii Masao wrote:
Thanks for updating the patch! I think that this is a really useful feature!!
Thanks for reviewing!
I have two minor comments.
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>wait_start</structfield> <type>timestamptz</type>
The column name "wait_start" should be "waitstart" for the sake of consistency
consistency
with other column names in pg_locks? pg_locks seems to avoid including
an underscore in column names, so "locktype" is used instead of
"lock_type",
"virtualtransaction" is used instead of "virtual_transaction", etc.+ Lock acquisition wait start time. <literal>NULL</literal> if + lock acquired.
Agreed.
I also changed the variable name "wait_start" in struct PGPROC and
LockInstanceData to "waitStart" for the same reason.
There seems to be a case where the wait start time is NULL even when
"granted" is false. It might be better to add a note about that case
to the docs? For example, I found that the wait start time is NULL
while the startup process is waiting for the lock. Is this the only
such case?
Thanks, this is because I set 'waitstart' under the following condition:

--- src/backend/storage/lmgr/proc.c
1250         if (!InHotStandby)

As far as I can tell, the startup process would be the only such case.
I also think that in the case of the startup process, it seems possible
to set 'waitstart' in ResolveRecoveryConflictWithLock(), so I did that
in the attached patch.
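With this version, the wait duration of each blocked backend can be
measured directly, e.g. (a sketch using the renamed column):

  SELECT pid, locktype, mode, now() - waitstart AS waiting_for
    FROM pg_locks
   WHERE NOT granted
   ORDER BY waitstart;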
Any thoughts?
Regards,
--
Atsushi Torikoshi
Attachments:
v5-0001-To-examine-the-duration-of-locks-we-did-join-on-p.patch (text/x-diff)
From 6beb1c61e72c797c915427ae4e36d6bab9e0594c Mon Sep 17 00:00:00 2001
From: Atsushi Torikoshi <torikoshia@oss.nttdata.com>
Date: Fri, 22 Jan 2021 13:51:00 +0900
Subject: [PATCH v5] To examine the duration of locks, we did join on pg_locks
and pg_stat_activity and used columns such as query_start or state_change.
However, since they are the moment when queries have started or their state
has changed, we could not get the exact lock duration in this way.
---
contrib/amcheck/expected/check_btree.out | 4 ++--
doc/src/sgml/catalogs.sgml | 10 ++++++++++
src/backend/storage/ipc/standby.c | 8 ++++++--
src/backend/storage/lmgr/lock.c | 8 ++++++++
src/backend/storage/lmgr/proc.c | 4 ++++
src/backend/utils/adt/lockfuncs.c | 9 ++++++++-
src/include/catalog/pg_proc.dat | 6 +++---
src/include/storage/lock.h | 2 ++
src/include/storage/proc.h | 1 +
src/test/regress/expected/rules.out | 5 +++--
10 files changed, 47 insertions(+), 10 deletions(-)
diff --git a/contrib/amcheck/expected/check_btree.out b/contrib/amcheck/expected/check_btree.out
index 13848b7449..5a3f1ef737 100644
--- a/contrib/amcheck/expected/check_btree.out
+++ b/contrib/amcheck/expected/check_btree.out
@@ -97,8 +97,8 @@ SELECT bt_index_parent_check('bttest_b_idx');
SELECT * FROM pg_locks
WHERE relation = ANY(ARRAY['bttest_a', 'bttest_a_idx', 'bttest_b', 'bttest_b_idx']::regclass[])
AND pid = pg_backend_pid();
- locktype | database | relation | page | tuple | virtualxid | transactionid | classid | objid | objsubid | virtualtransaction | pid | mode | granted | fastpath
-----------+----------+----------+------+-------+------------+---------------+---------+-------+----------+--------------------+-----+------+---------+----------
+ locktype | database | relation | page | tuple | virtualxid | transactionid | classid | objid | objsubid | virtualtransaction | pid | mode | granted | fastpath | waitstart
+----------+----------+----------+------+-------+------------+---------------+---------+-------+----------+--------------------+-----+------+---------+----------+-----------
(0 rows)
COMMIT;
diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 43d7a1ad90..ba003ce393 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -10589,6 +10589,16 @@ SCRAM-SHA-256$<replaceable><iteration count></replaceable>:<replaceable>&l
lock table
</para></entry>
</row>
+
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>waitstart</structfield> <type>timestamptz</type>
+ </para>
+ <para>
+ Lock acquisition wait start time. <literal>NULL</literal> if
+ lock acquired
+ </para></entry>
+ </row>
</tbody>
</tgroup>
</table>
diff --git a/src/backend/storage/ipc/standby.c b/src/backend/storage/ipc/standby.c
index 39a30c00f7..819e00e4ab 100644
--- a/src/backend/storage/ipc/standby.c
+++ b/src/backend/storage/ipc/standby.c
@@ -539,13 +539,17 @@ ResolveRecoveryConflictWithDatabase(Oid dbid)
void
ResolveRecoveryConflictWithLock(LOCKTAG locktag, bool logging_conflict)
{
- TimestampTz ltime;
+ TimestampTz ltime, now;
Assert(InHotStandby);
ltime = GetStandbyLimitTime();
+ now = GetCurrentTimestamp();
- if (GetCurrentTimestamp() >= ltime && ltime != 0)
+ if (MyProc->waitStart == 0)
+ MyProc->waitStart = now;
+
+ if (now >= ltime && ltime != 0)
{
/*
* We're already behind, so clear a path as quickly as possible.
diff --git a/src/backend/storage/lmgr/lock.c b/src/backend/storage/lmgr/lock.c
index 20e50247ea..ffad4e94bc 100644
--- a/src/backend/storage/lmgr/lock.c
+++ b/src/backend/storage/lmgr/lock.c
@@ -3627,6 +3627,12 @@ GetLockStatusData(void)
instance->leaderPid = proc->pid;
instance->fastpath = true;
+ /*
+ * Successfully taking fast path lock means there were no
+ * conflicting locks.
+ */
+ instance->waitStart = 0;
+
el++;
}
@@ -3654,6 +3660,7 @@ GetLockStatusData(void)
instance->pid = proc->pid;
instance->leaderPid = proc->pid;
instance->fastpath = true;
+ instance->waitStart = 0;
el++;
}
@@ -3706,6 +3713,7 @@ GetLockStatusData(void)
instance->pid = proc->pid;
instance->leaderPid = proclock->groupLeader->pid;
instance->fastpath = false;
+ instance->waitStart = proc->waitStart;
el++;
}
diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c
index c87ffc6549..725f1d367a 100644
--- a/src/backend/storage/lmgr/proc.c
+++ b/src/backend/storage/lmgr/proc.c
@@ -402,6 +402,7 @@ InitProcess(void)
MyProc->lwWaitMode = 0;
MyProc->waitLock = NULL;
MyProc->waitProcLock = NULL;
+ MyProc->waitStart = 0;
#ifdef USE_ASSERT_CHECKING
{
int i;
@@ -1262,6 +1263,8 @@ ProcSleep(LOCALLOCK *locallock, LockMethod lockMethodTable)
}
else
enable_timeout_after(DEADLOCK_TIMEOUT, DeadlockTimeout);
+
+ MyProc->waitStart = get_timeout_start_time(DEADLOCK_TIMEOUT);
}
else if (log_recovery_conflict_waits)
{
@@ -1678,6 +1681,7 @@ ProcWakeup(PGPROC *proc, ProcWaitStatus waitStatus)
proc->waitLock = NULL;
proc->waitProcLock = NULL;
proc->waitStatus = waitStatus;
+ proc->waitStart = 0;
/* And awaken it */
SetLatch(&proc->procLatch);
diff --git a/src/backend/utils/adt/lockfuncs.c b/src/backend/utils/adt/lockfuncs.c
index b1cf5b79a7..97f0265c12 100644
--- a/src/backend/utils/adt/lockfuncs.c
+++ b/src/backend/utils/adt/lockfuncs.c
@@ -63,7 +63,7 @@ typedef struct
} PG_Lock_Status;
/* Number of columns in pg_locks output */
-#define NUM_LOCK_STATUS_COLUMNS 15
+#define NUM_LOCK_STATUS_COLUMNS 16
/*
* VXIDGetDatum - Construct a text representation of a VXID
@@ -142,6 +142,8 @@ pg_lock_status(PG_FUNCTION_ARGS)
BOOLOID, -1, 0);
TupleDescInitEntry(tupdesc, (AttrNumber) 15, "fastpath",
BOOLOID, -1, 0);
+ TupleDescInitEntry(tupdesc, (AttrNumber) 16, "waitstart",
+ TIMESTAMPTZOID, -1, 0);
funcctx->tuple_desc = BlessTupleDesc(tupdesc);
@@ -336,6 +338,10 @@ pg_lock_status(PG_FUNCTION_ARGS)
values[12] = CStringGetTextDatum(GetLockmodeName(instance->locktag.locktag_lockmethodid, mode));
values[13] = BoolGetDatum(granted);
values[14] = BoolGetDatum(instance->fastpath);
+ if (!granted && instance->waitStart != 0)
+ values[15] = TimestampTzGetDatum(instance->waitStart);
+ else
+ nulls[15] = true;
tuple = heap_form_tuple(funcctx->tuple_desc, values, nulls);
result = HeapTupleGetDatum(tuple);
@@ -406,6 +412,7 @@ pg_lock_status(PG_FUNCTION_ARGS)
values[12] = CStringGetTextDatum("SIReadLock");
values[13] = BoolGetDatum(true);
values[14] = BoolGetDatum(false);
+ nulls[15] = true;
tuple = heap_form_tuple(funcctx->tuple_desc, values, nulls);
result = HeapTupleGetDatum(tuple);
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index b5f52d4e4a..61310781ca 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5986,9 +5986,9 @@
{ oid => '1371', descr => 'view system lock information',
proname => 'pg_lock_status', prorows => '1000', proretset => 't',
provolatile => 'v', prorettype => 'record', proargtypes => '',
- proallargtypes => '{text,oid,oid,int4,int2,text,xid,oid,oid,int2,text,int4,text,bool,bool}',
- proargmodes => '{o,o,o,o,o,o,o,o,o,o,o,o,o,o,o}',
- proargnames => '{locktype,database,relation,page,tuple,virtualxid,transactionid,classid,objid,objsubid,virtualtransaction,pid,mode,granted,fastpath}',
+ proallargtypes => '{text,oid,oid,int4,int2,text,xid,oid,oid,int2,text,int4,text,bool,bool,timestamptz}',
+ proargmodes => '{o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o}',
+ proargnames => '{locktype,database,relation,page,tuple,virtualxid,transactionid,classid,objid,objsubid,virtualtransaction,pid,mode,granted,fastpath,waitstart}',
prosrc => 'pg_lock_status' },
{ oid => '2561',
descr => 'get array of PIDs of sessions blocking specified backend PID from acquiring a heavyweight lock',
diff --git a/src/include/storage/lock.h b/src/include/storage/lock.h
index 2e6ef174e9..cf312bc55b 100644
--- a/src/include/storage/lock.h
+++ b/src/include/storage/lock.h
@@ -22,6 +22,7 @@
#include "storage/lockdefs.h"
#include "storage/lwlock.h"
#include "storage/shmem.h"
+#include "utils/timestamp.h"
/* struct PGPROC is declared in proc.h, but must forward-reference it */
typedef struct PGPROC PGPROC;
@@ -445,6 +446,7 @@ typedef struct LockInstanceData
LOCKMODE waitLockMode; /* lock awaited by this PGPROC, if any */
BackendId backend; /* backend ID of this PGPROC */
LocalTransactionId lxid; /* local transaction ID of this PGPROC */
+ TimestampTz waitStart; /* time at which this PGPROC started waiting for lock */
int pid; /* pid of this PGPROC */
int leaderPid; /* pid of group leader; = pid if no group */
bool fastpath; /* taken via fastpath? */
diff --git a/src/include/storage/proc.h b/src/include/storage/proc.h
index 683ab64f76..83b5ec3bab 100644
--- a/src/include/storage/proc.h
+++ b/src/include/storage/proc.h
@@ -181,6 +181,7 @@ struct PGPROC
LOCKMODE waitLockMode; /* type of lock we're waiting for */
LOCKMASK heldLocks; /* bitmask for lock types already held on this
* lock object by this backend */
+ TimestampTz waitStart; /* time at which wait for lock acquisition started */
bool delayChkpt; /* true if this proc delays checkpoint start */
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 6173473de9..5b86c7d5f2 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1394,8 +1394,9 @@ pg_locks| SELECT l.locktype,
l.pid,
l.mode,
l.granted,
- l.fastpath
- FROM pg_lock_status() l(locktype, database, relation, page, tuple, virtualxid, transactionid, classid, objid, objsubid, virtualtransaction, pid, mode, granted, fastpath);
+ l.fastpath,
+ l.waitstart
+ FROM pg_lock_status() l(locktype, database, relation, page, tuple, virtualxid, transactionid, classid, objid, objsubid, virtualtransaction, pid, mode, granted, fastpath, waitstart);
pg_matviews| SELECT n.nspname AS schemaname,
c.relname AS matviewname,
pg_get_userbyid(c.relowner) AS matviewowner,
--
2.18.1
On 2021/01/22 14:37, torikoshia wrote:
On 2021-01-21 12:48, Fujii Masao wrote:
Thanks for updating the patch! I think that this is a really useful feature!!
Thanks for reviewing!
I have two minor comments.
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>wait_start</structfield> <type>timestamptz</type>

The column name "wait_start" should be "waitstart" for the sake of consistency
with other column names in pg_locks? pg_locks seems to avoid including
an underscore in column names, so "locktype" is used instead of "lock_type",
"virtualtransaction" is used instead of "virtual_transaction", etc.

+       Lock acquisition wait start time. <literal>NULL</literal> if
+       lock acquired.

Agreed.
I also changed the variable name "wait_start" in struct PGPROC and
LockInstanceData to "waitStart" for the same reason.

There seems to be a case where the wait start time is NULL even when "granted"
is false. It's better to add a note about that case into the docs? For example,
I found that the wait start time is NULL while the startup process is waiting
for the lock. Is this the only case?

Thanks, this is because I set 'waitstart' only under the following
condition:

  --- src/backend/storage/lmgr/proc.c
  > 1250        if (!InHotStandby)

As far as I can tell, the startup process would be the only case.

I also think that in the case of the startup process, it seems possible
to set 'waitstart' in ResolveRecoveryConflictWithLock(), so I
did it in the attached patch.
This change seems to cause "waitstart" to be reset every time
ResolveRecoveryConflictWithLock() is called in the do-while loop.
I guess this is not acceptable. Right?
To avoid that issue, IMO the following change is better. Thought?
- else if (log_recovery_conflict_waits)
+ else
{
+ TimestampTz now = GetCurrentTimestamp();
+
+ MyProc->waitStart = now;
+
/*
* Set the wait start timestamp if logging is enabled and in hot
* standby.
*/
- standbyWaitStart = GetCurrentTimestamp();
+ if (log_recovery_conflict_waits)
+             standbyWaitStart = now;
}
This change causes the startup process to call GetCurrentTimestamp()
additionally even when log_recovery_conflict_waits is disabled, which
might decrease the performance of the startup process. But that
performance degradation can happen only when the startup process waits
for an ACCESS EXCLUSIVE lock, so if my understanding is right, IMO it's
almost harmless to call GetCurrentTimestamp() additionally in that
case. Thought?
Regards,
--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
On 2021/01/22 18:11, Fujii Masao wrote:
On 2021/01/22 14:37, torikoshia wrote:
I also think that in the case of the startup process, it seems possible
to set 'waitstart' in ResolveRecoveryConflictWithLock(), so I did it in
the attached patch.

This change seems to cause "waitstart" to be reset every time
ResolveRecoveryConflictWithLock() is called in the do-while loop.
I guess this is not acceptable. Right?
According to the off-list discussion with you, this should not happen because ResolveRecoveryConflictWithLock() sets MyProc->waitStart only when it's not set yet (i.e., = 0). That's good. So I'd withdraw my comment.
+ if (MyProc->waitStart == 0)
+ MyProc->waitStart = now;
<snip>
+ MyProc->waitStart = get_timeout_start_time(DEADLOCK_TIMEOUT);
Another comment is; Doesn't the change of MyProc->waitStart need the lock table's partition lock? If yes, we can do that by moving LWLockRelease(partitionLock) to just after the change of MyProc->waitStart, but that causes the lwlock to be held for a longer time. So maybe we need another way to do that.
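For illustration, a minimal sketch of the ordering trade-off in question; the placement of LWLockRelease() here is hypothetical, not code from any posted patch:

    /* Both variants assume partitionLock is currently held. */

    /* Variant A: record the wait start before releasing the partition
     * lock, so a reader can never see granted = false with waitStart
     * unset, at the price of holding the LWLock across one more store
     * and a gettimeofday call. */
    MyProc->waitStart = GetCurrentTimestamp();
    LWLockRelease(partitionLock);

    /* Variant B: release the partition lock first and record the wait
     * start afterwards, accepting that "waitstart" can briefly read as
     * NULL while "granted" is already false. */
    LWLockRelease(partitionLock);
    MyProc->waitStart = GetCurrentTimestamp();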
Regards,
--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
On 2021-01-25 23:44, Fujii Masao wrote:
Another comment is; Doesn't the change of MyProc->waitStart need the
lock table's partition lock? If yes, we can do that by moving
LWLockRelease(partitionLock) to just after the change of
MyProc->waitStart, but that causes the lwlock to be held for a longer
time. So maybe we need another way to do that.
Thanks for your comments!
It would be ideal for the consistency of the view to record "waitstart"
while holding the lock table's partition lock.
However, as you pointed out, it could have a non-negligible performance
impact.
I may miss something, but as far as I can see, the influence of not
holding the lock is that "waitstart" can be NULL even though "granted"
is false.
I think people want to know the start time of the lock when locks are
held for a long time.
In that case, "waitstart" should have already been recorded.
If this is true, I think the current implementation may be enough on the
condition that users understand it can happen that "waitStart" is NULL
and "granted" is false.
Attached a patch describing this in the doc and comments.
Any thoughts?
Regards,
--
Atsushi Torikoshi
Attachments:
v6-0001-To-examine-the-duration-of-locks-we-did-join-on-p.patchtext/x-diff; name=v6-0001-To-examine-the-duration-of-locks-we-did-join-on-p.patchDownload
From 03c6e1ed6ffa215ee898b5a6a75d77277fb8e672 Mon Sep 17 00:00:00 2001
From: Atsushi Torikoshi <torikoshia@oss.nttdata.com>
Date: Tue, 2 Feb 2021 21:32:36 +0900
Subject: [PATCH v6] To examine the duration of locks, we did join on pg_locks
and pg_stat_activity and used columns such as query_start or state_change.
However, since they are the moment when queries have started or their state
has changed, we could not get the lock duration in this way. This patch adds
a new field "waitstart" preserving lock acquisition wait start time.
Note that updating this field and lock acquisition are not performed
synchronously for performance reasons. Therefore, depending on the
timing, it can happen that waitstart is NULL even though granted is
false.
Author: Atsushi Torikoshi
Reviewed-by: Ian Lawrence Barwick, Robert Haas, Fujii Masao
Discussion: https://postgr.es/m/a96013dc51cdc56b2a2b84fa8a16a993@oss.nttdata.com
---
contrib/amcheck/expected/check_btree.out | 4 ++--
doc/src/sgml/catalogs.sgml | 14 ++++++++++++++
src/backend/storage/ipc/standby.c | 16 ++++++++++++++--
src/backend/storage/lmgr/lock.c | 8 ++++++++
src/backend/storage/lmgr/proc.c | 10 ++++++++++
src/backend/utils/adt/lockfuncs.c | 9 ++++++++-
src/include/catalog/pg_proc.dat | 6 +++---
src/include/storage/lock.h | 2 ++
src/include/storage/proc.h | 1 +
src/test/regress/expected/rules.out | 5 +++--
10 files changed, 65 insertions(+), 10 deletions(-)
diff --git a/contrib/amcheck/expected/check_btree.out b/contrib/amcheck/expected/check_btree.out
index 13848b7449..5a3f1ef737 100644
--- a/contrib/amcheck/expected/check_btree.out
+++ b/contrib/amcheck/expected/check_btree.out
@@ -97,8 +97,8 @@ SELECT bt_index_parent_check('bttest_b_idx');
SELECT * FROM pg_locks
WHERE relation = ANY(ARRAY['bttest_a', 'bttest_a_idx', 'bttest_b', 'bttest_b_idx']::regclass[])
AND pid = pg_backend_pid();
- locktype | database | relation | page | tuple | virtualxid | transactionid | classid | objid | objsubid | virtualtransaction | pid | mode | granted | fastpath
-----------+----------+----------+------+-------+------------+---------------+---------+-------+----------+--------------------+-----+------+---------+----------
+ locktype | database | relation | page | tuple | virtualxid | transactionid | classid | objid | objsubid | virtualtransaction | pid | mode | granted | fastpath | waitstart
+----------+----------+----------+------+-------+------------+---------------+---------+-------+----------+--------------------+-----+------+---------+----------+-----------
(0 rows)
COMMIT;
diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 865e826fb0..d81d6e1c52 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -10592,6 +10592,20 @@ SCRAM-SHA-256$<replaceable><iteration count></replaceable>:<replaceable>&l
lock table
</para></entry>
</row>
+
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>waitstart</structfield> <type>timestamptz</type>
+ </para>
+ <para>
+ Lock acquisition wait start time.
+ Note that updating this field and lock acquisition are not performed
+ synchronously for performance reasons. Therefore, depending on the
+ timing, it can happen that <structfield>waitstart</structfield> is
+ <literal>NULL</literal> even though
+ <structfield>granted</structfield> is false.
+ </para></entry>
+ </row>
</tbody>
</tgroup>
</table>
diff --git a/src/backend/storage/ipc/standby.c b/src/backend/storage/ipc/standby.c
index 39a30c00f7..2282229568 100644
--- a/src/backend/storage/ipc/standby.c
+++ b/src/backend/storage/ipc/standby.c
@@ -539,13 +539,25 @@ ResolveRecoveryConflictWithDatabase(Oid dbid)
void
ResolveRecoveryConflictWithLock(LOCKTAG locktag, bool logging_conflict)
{
- TimestampTz ltime;
+ TimestampTz ltime, now;
Assert(InHotStandby);
ltime = GetStandbyLimitTime();
+ now = GetCurrentTimestamp();
- if (GetCurrentTimestamp() >= ltime && ltime != 0)
+ /*
+ * Record waitStart using the current time obtained for comparison
+ * with ltime.
+ *
+ * It would be ideal if this could be done synchronously with updating
+ * the lock information. However, since holding partitionLock longer
+ * would hurt performance, we do it here asynchronously.
+ */
+ if (MyProc->waitStart == 0)
+ MyProc->waitStart = now;
+
+ if (now >= ltime && ltime != 0)
{
/*
* We're already behind, so clear a path as quickly as possible.
diff --git a/src/backend/storage/lmgr/lock.c b/src/backend/storage/lmgr/lock.c
index 79c1cf9b8b..295c3220e0 100644
--- a/src/backend/storage/lmgr/lock.c
+++ b/src/backend/storage/lmgr/lock.c
@@ -3619,6 +3619,12 @@ GetLockStatusData(void)
instance->leaderPid = proc->pid;
instance->fastpath = true;
+ /*
+ * Successfully taking fast path lock means there were no
+ * conflicting locks.
+ */
+ instance->waitStart = 0;
+
el++;
}
@@ -3646,6 +3652,7 @@ GetLockStatusData(void)
instance->pid = proc->pid;
instance->leaderPid = proc->pid;
instance->fastpath = true;
+ instance->waitStart = 0;
el++;
}
@@ -3698,6 +3705,7 @@ GetLockStatusData(void)
instance->pid = proc->pid;
instance->leaderPid = proclock->groupLeader->pid;
instance->fastpath = false;
+ instance->waitStart = proc->waitStart;
el++;
}
diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c
index c87ffc6549..84401e8430 100644
--- a/src/backend/storage/lmgr/proc.c
+++ b/src/backend/storage/lmgr/proc.c
@@ -402,6 +402,7 @@ InitProcess(void)
MyProc->lwWaitMode = 0;
MyProc->waitLock = NULL;
MyProc->waitProcLock = NULL;
+ MyProc->waitStart = 0;
#ifdef USE_ASSERT_CHECKING
{
int i;
@@ -1262,6 +1263,14 @@ ProcSleep(LOCALLOCK *locallock, LockMethod lockMethodTable)
}
else
enable_timeout_after(DEADLOCK_TIMEOUT, DeadlockTimeout);
+ /*
+ * Record waitStart reusing the deadlock timeout timer.
+ *
+ * It would be ideal if this could be done synchronously with updating
+ * the lock information. However, since holding partitionLock longer
+ * would hurt performance, we do it here asynchronously.
+ */
+ MyProc->waitStart = get_timeout_start_time(DEADLOCK_TIMEOUT);
}
else if (log_recovery_conflict_waits)
{
@@ -1678,6 +1687,7 @@ ProcWakeup(PGPROC *proc, ProcWaitStatus waitStatus)
proc->waitLock = NULL;
proc->waitProcLock = NULL;
proc->waitStatus = waitStatus;
+ proc->waitStart = 0;
/* And awaken it */
SetLatch(&proc->procLatch);
diff --git a/src/backend/utils/adt/lockfuncs.c b/src/backend/utils/adt/lockfuncs.c
index b1cf5b79a7..97f0265c12 100644
--- a/src/backend/utils/adt/lockfuncs.c
+++ b/src/backend/utils/adt/lockfuncs.c
@@ -63,7 +63,7 @@ typedef struct
} PG_Lock_Status;
/* Number of columns in pg_locks output */
-#define NUM_LOCK_STATUS_COLUMNS 15
+#define NUM_LOCK_STATUS_COLUMNS 16
/*
* VXIDGetDatum - Construct a text representation of a VXID
@@ -142,6 +142,8 @@ pg_lock_status(PG_FUNCTION_ARGS)
BOOLOID, -1, 0);
TupleDescInitEntry(tupdesc, (AttrNumber) 15, "fastpath",
BOOLOID, -1, 0);
+ TupleDescInitEntry(tupdesc, (AttrNumber) 16, "waitstart",
+ TIMESTAMPTZOID, -1, 0);
funcctx->tuple_desc = BlessTupleDesc(tupdesc);
@@ -336,6 +338,10 @@ pg_lock_status(PG_FUNCTION_ARGS)
values[12] = CStringGetTextDatum(GetLockmodeName(instance->locktag.locktag_lockmethodid, mode));
values[13] = BoolGetDatum(granted);
values[14] = BoolGetDatum(instance->fastpath);
+ if (!granted && instance->waitStart != 0)
+ values[15] = TimestampTzGetDatum(instance->waitStart);
+ else
+ nulls[15] = true;
tuple = heap_form_tuple(funcctx->tuple_desc, values, nulls);
result = HeapTupleGetDatum(tuple);
@@ -406,6 +412,7 @@ pg_lock_status(PG_FUNCTION_ARGS)
values[12] = CStringGetTextDatum("SIReadLock");
values[13] = BoolGetDatum(true);
values[14] = BoolGetDatum(false);
+ nulls[15] = true;
tuple = heap_form_tuple(funcctx->tuple_desc, values, nulls);
result = HeapTupleGetDatum(tuple);
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index f8174061ef..a1a99c9ef4 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5986,9 +5986,9 @@
{ oid => '1371', descr => 'view system lock information',
proname => 'pg_lock_status', prorows => '1000', proretset => 't',
provolatile => 'v', prorettype => 'record', proargtypes => '',
- proallargtypes => '{text,oid,oid,int4,int2,text,xid,oid,oid,int2,text,int4,text,bool,bool}',
- proargmodes => '{o,o,o,o,o,o,o,o,o,o,o,o,o,o,o}',
- proargnames => '{locktype,database,relation,page,tuple,virtualxid,transactionid,classid,objid,objsubid,virtualtransaction,pid,mode,granted,fastpath}',
+ proallargtypes => '{text,oid,oid,int4,int2,text,xid,oid,oid,int2,text,int4,text,bool,bool,timestamptz}',
+ proargmodes => '{o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o}',
+ proargnames => '{locktype,database,relation,page,tuple,virtualxid,transactionid,classid,objid,objsubid,virtualtransaction,pid,mode,granted,fastpath,waitstart}',
prosrc => 'pg_lock_status' },
{ oid => '2561',
descr => 'get array of PIDs of sessions blocking specified backend PID from acquiring a heavyweight lock',
diff --git a/src/include/storage/lock.h b/src/include/storage/lock.h
index 68a3487d49..846487f94b 100644
--- a/src/include/storage/lock.h
+++ b/src/include/storage/lock.h
@@ -22,6 +22,7 @@
#include "storage/lockdefs.h"
#include "storage/lwlock.h"
#include "storage/shmem.h"
+#include "utils/timestamp.h"
/* struct PGPROC is declared in proc.h, but must forward-reference it */
typedef struct PGPROC PGPROC;
@@ -446,6 +447,7 @@ typedef struct LockInstanceData
LOCKMODE waitLockMode; /* lock awaited by this PGPROC, if any */
BackendId backend; /* backend ID of this PGPROC */
LocalTransactionId lxid; /* local transaction ID of this PGPROC */
+ TimestampTz waitStart; /* time at which this PGPROC started waiting for lock */
int pid; /* pid of this PGPROC */
int leaderPid; /* pid of group leader; = pid if no group */
bool fastpath; /* taken via fastpath? */
diff --git a/src/include/storage/proc.h b/src/include/storage/proc.h
index 683ab64f76..83b5ec3bab 100644
--- a/src/include/storage/proc.h
+++ b/src/include/storage/proc.h
@@ -181,6 +181,7 @@ struct PGPROC
LOCKMODE waitLockMode; /* type of lock we're waiting for */
LOCKMASK heldLocks; /* bitmask for lock types already held on this
* lock object by this backend */
+ TimestampTz waitStart; /* time at which wait for lock acquisition started */
bool delayChkpt; /* true if this proc delays checkpoint start */
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 6173473de9..5b86c7d5f2 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1394,8 +1394,9 @@ pg_locks| SELECT l.locktype,
l.pid,
l.mode,
l.granted,
- l.fastpath
- FROM pg_lock_status() l(locktype, database, relation, page, tuple, virtualxid, transactionid, classid, objid, objsubid, virtualtransaction, pid, mode, granted, fastpath);
+ l.fastpath,
+ l.waitstart
+ FROM pg_lock_status() l(locktype, database, relation, page, tuple, virtualxid, transactionid, classid, objid, objsubid, virtualtransaction, pid, mode, granted, fastpath, waitstart);
pg_matviews| SELECT n.nspname AS schemaname,
c.relname AS matviewname,
pg_get_userbyid(c.relowner) AS matviewowner,
--
2.18.1
On 2021/02/02 22:00, torikoshia wrote:
On 2021-01-25 23:44, Fujii Masao wrote:
Another comment is; Doesn't the change of MyProc->waitStart need the
lock table's partition lock? If yes, we can do that by moving
LWLockRelease(partitionLock) to just after the change of
MyProc->waitStart, but that causes the lwlock to be held for a longer
time. So maybe we need another way to do that.

Thanks for your comments!

It would be ideal for the consistency of the view to record "waitstart"
while holding the lock table's partition lock. However, as you pointed
out, it could have a non-negligible performance impact.

I may miss something, but as far as I can see, the influence of not
holding the lock is that "waitstart" can be NULL even though "granted"
is false. I think people want to know the start time of the lock when
locks are held for a long time. In that case, "waitstart" should have
already been recorded.
Sounds reasonable.
If this is true, I think the current implementation may be enough on the condition that users understand it can happen that "waitStart" is NULL and "granted" is false.
Attached a patch describing this in the doc and comments.
Any Thoughts?
64-bit fetches are not atomic on some platforms. So a spinlock is necessary when updating "waitStart" without holding the partition lock? Also GetLockStatusData() needs a spinlock when reading "waitStart"?
+ Lock acquisition wait start time.
Isn't it better to describe this more clearly? What about the following?
Time when the server process started waiting for this lock,
or null if the lock is held.
+ Note that updating this field and lock acquisition are not performed
+ synchronously for performance reasons. Therefore, depending on the
+ timing, it can happen that <structfield>waitstart</structfield> is
+ <literal>NULL</literal> even though
+ <structfield>granted</structfield> is false.
I agree that it's helpful to add the note about that NULL can be returned even when "granted" is false. But IMO we don't need to document why this behavior can happen internally. So what about the following?
Note that this can be null for a very short period of time after
the wait started even though <structfield>granted</structfield>
is <literal>false</literal>.
Since the document for pg_locks uses "null" instead of <literal>NULL</literal> (I'm not sure why, though), I used "null" for the sake of consistency.
Regards,
--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
On 2021/02/03 1:49, Fujii Masao wrote:
On 2021/02/02 22:00, torikoshia wrote:
Attached a patch describing this in the doc and comments.
Any thoughts?

64-bit fetches are not atomic on some platforms. So a spinlock is
necessary when updating "waitStart" without holding the partition lock?
Also GetLockStatusData() needs a spinlock when reading "waitStart"?
Also it might be worth considering 64-bit atomic operations like
pg_atomic_read_u64() for that.
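For readers following along, a minimal sketch of that suggestion, assuming waitStart in PGPROC becomes a pg_atomic_uint64 (as later versions of the patch do):

    /* Writer side: one atomic store publishes the value; no spinlock
     * or partition lock is needed just for this field. */
    pg_atomic_write_u64(&MyProc->waitStart, (uint64) GetCurrentTimestamp());

    /* Reader side, e.g. GetLockStatusData(): an atomic load cannot
     * return a torn 64-bit value, unlike a plain TimestampTz read on
     * platforms without native 64-bit atomicity. */
    instance->waitStart = (TimestampTz) pg_atomic_read_u64(&proc->waitStart);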
Regards,
--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
On 2021-02-03 11:23, Fujii Masao wrote:
64-bit fetches are not atomic on some platforms. So a spinlock is
necessary when updating "waitStart" without holding the partition
lock? Also GetLockStatusData() needs a spinlock when reading
"waitStart"?

Also it might be worth considering 64-bit atomic operations like
pg_atomic_read_u64() for that.
Thanks for your suggestion and advice!
In the attached patch I used pg_atomic_read_u64() and
pg_atomic_write_u64().
waitStart is TimestampTz, i.e., int64, but it seems pg_atomic_read_xxx
and pg_atomic_write_xxx only support unsigned integers, so I cast the type.
I may be using these functions incorrectly, so if something is wrong,
I would appreciate any comments.
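For what it's worth, the cast itself should be lossless: TimestampTz is int64, and an int64 value round-trips through uint64 unchanged on the two's-complement platforms PostgreSQL supports. A minimal sketch:

    TimestampTz now = GetCurrentTimestamp();

    /* int64 -> uint64 -> int64 preserves the bit pattern, so storing
     * a timestamp in a pg_atomic_uint64 loses no information. */
    pg_atomic_write_u64(&MyProc->waitStart, (uint64) now);
    Assert((TimestampTz) pg_atomic_read_u64(&MyProc->waitStart) == now);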
About the documentation, since your suggestion seems better than v6, I
used it as is.
Regards,
--
Atsushi Torikoshi
Attachments:
v7-0001-To-examine-the-duration-of-locks-we-did-join-on-p.patchtext/x-diff; name=v7-0001-To-examine-the-duration-of-locks-we-did-join-on-p.patchDownload
From 38a3d8996c4b1690cf18cdb1015e270201d34330 Mon Sep 17 00:00:00 2001
From: Atsushi Torikoshi <torikoshia@oss.nttdata.com>
Date: Thu, 4 Feb 2021 23:23:36 +0900
Subject: [PATCH v7] To examine the duration of locks, we did join on pg_locks
and pg_stat_activity and used columns such as query_start or state_change.
However, since they are the moment when queries have started or their state
has changed, we could not get the lock duration in this way. This patch adds
a new field "waitstart" preserving lock acquisition wait start time.
Note that updating this field and lock acquisition are not performed
synchronously for performance reasons. Therefore, depending on the
timing, it can happen that waitstart is NULL even though granted is
false.
Author: Atsushi Torikoshi
Reviewed-by: Ian Lawrence Barwick, Robert Haas, Fujii Masao
Discussion: https://postgr.es/m/a96013dc51cdc56b2a2b84fa8a16a993@oss.nttdata.com
---
contrib/amcheck/expected/check_btree.out | 4 ++--
doc/src/sgml/catalogs.sgml | 13 +++++++++++++
src/backend/storage/ipc/standby.c | 17 +++++++++++++++--
src/backend/storage/lmgr/lock.c | 8 ++++++++
src/backend/storage/lmgr/proc.c | 14 ++++++++++++++
src/backend/utils/adt/lockfuncs.c | 9 ++++++++-
src/include/catalog/pg_proc.dat | 6 +++---
src/include/storage/lock.h | 2 ++
src/include/storage/proc.h | 1 +
src/test/regress/expected/rules.out | 5 +++--
10 files changed, 69 insertions(+), 10 deletions(-)
diff --git a/contrib/amcheck/expected/check_btree.out b/contrib/amcheck/expected/check_btree.out
index 13848b7449..5a3f1ef737 100644
--- a/contrib/amcheck/expected/check_btree.out
+++ b/contrib/amcheck/expected/check_btree.out
@@ -97,8 +97,8 @@ SELECT bt_index_parent_check('bttest_b_idx');
SELECT * FROM pg_locks
WHERE relation = ANY(ARRAY['bttest_a', 'bttest_a_idx', 'bttest_b', 'bttest_b_idx']::regclass[])
AND pid = pg_backend_pid();
- locktype | database | relation | page | tuple | virtualxid | transactionid | classid | objid | objsubid | virtualtransaction | pid | mode | granted | fastpath
-----------+----------+----------+------+-------+------------+---------------+---------+-------+----------+--------------------+-----+------+---------+----------
+ locktype | database | relation | page | tuple | virtualxid | transactionid | classid | objid | objsubid | virtualtransaction | pid | mode | granted | fastpath | waitstart
+----------+----------+----------+------+-------+------------+---------------+---------+-------+----------+--------------------+-----+------+---------+----------+-----------
(0 rows)
COMMIT;
diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 865e826fb0..7df4c30a65 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -10592,6 +10592,19 @@ SCRAM-SHA-256$<replaceable><iteration count></replaceable>:<replaceable>&l
lock table
</para></entry>
</row>
+
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>waitstart</structfield> <type>timestamptz</type>
+ </para>
+ <para>
+ Time when the server process started waiting for this lock,
+ or null if the lock is held.
+ Note that this can be null for a very short period of time after
+ the wait started even though <structfield>granted</structfield>
+ is <literal>false</literal>.
+ </para></entry>
+ </row>
</tbody>
</tgroup>
</table>
diff --git a/src/backend/storage/ipc/standby.c b/src/backend/storage/ipc/standby.c
index 39a30c00f7..1c8135ba74 100644
--- a/src/backend/storage/ipc/standby.c
+++ b/src/backend/storage/ipc/standby.c
@@ -539,13 +539,26 @@ ResolveRecoveryConflictWithDatabase(Oid dbid)
void
ResolveRecoveryConflictWithLock(LOCKTAG locktag, bool logging_conflict)
{
- TimestampTz ltime;
+ TimestampTz ltime, now;
Assert(InHotStandby);
ltime = GetStandbyLimitTime();
+ now = GetCurrentTimestamp();
- if (GetCurrentTimestamp() >= ltime && ltime != 0)
+ /*
+ * Record waitStart using the current time obtained for comparison
+ * with ltime.
+ *
+ * It would be ideal if this could be done synchronously with updating
+ * the lock information. However, since holding partitionLock longer
+ * would hurt performance, we do it here asynchronously.
+ */
+ if (pg_atomic_read_u64(&MyProc->waitStart) == 0)
+ pg_atomic_write_u64(&MyProc->waitStart,
+ pg_atomic_read_u64((pg_atomic_uint64 *) &now));
+
+ if (now >= ltime && ltime != 0)
{
/*
* We're already behind, so clear a path as quickly as possible.
diff --git a/src/backend/storage/lmgr/lock.c b/src/backend/storage/lmgr/lock.c
index 79c1cf9b8b..108b4d9023 100644
--- a/src/backend/storage/lmgr/lock.c
+++ b/src/backend/storage/lmgr/lock.c
@@ -3619,6 +3619,12 @@ GetLockStatusData(void)
instance->leaderPid = proc->pid;
instance->fastpath = true;
+ /*
+ * Successfully taking fast path lock means there were no
+ * conflicting locks.
+ */
+ instance->waitStart = 0;
+
el++;
}
@@ -3646,6 +3652,7 @@ GetLockStatusData(void)
instance->pid = proc->pid;
instance->leaderPid = proc->pid;
instance->fastpath = true;
+ instance->waitStart = 0;
el++;
}
@@ -3698,6 +3705,7 @@ GetLockStatusData(void)
instance->pid = proc->pid;
instance->leaderPid = proclock->groupLeader->pid;
instance->fastpath = false;
+ instance->waitStart = (TimestampTz) pg_atomic_read_u64(&proc->waitStart);
el++;
}
diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c
index c87ffc6549..dc9212cf89 100644
--- a/src/backend/storage/lmgr/proc.c
+++ b/src/backend/storage/lmgr/proc.c
@@ -402,6 +402,7 @@ InitProcess(void)
MyProc->lwWaitMode = 0;
MyProc->waitLock = NULL;
MyProc->waitProcLock = NULL;
+ pg_atomic_init_u64(&MyProc->waitStart, 0);
#ifdef USE_ASSERT_CHECKING
{
int i;
@@ -1248,6 +1249,8 @@ ProcSleep(LOCALLOCK *locallock, LockMethod lockMethodTable)
*/
if (!InHotStandby)
{
+ TimestampTz deadlockStart;
+
if (LockTimeout > 0)
{
EnableTimeoutParams timeouts[2];
@@ -1262,6 +1265,16 @@ ProcSleep(LOCALLOCK *locallock, LockMethod lockMethodTable)
}
else
enable_timeout_after(DEADLOCK_TIMEOUT, DeadlockTimeout);
+ /*
+ * Record waitStart reusing the deadlock timeout timer.
+ *
+ * It would be ideal if this could be done synchronously with updating
+ * the lock information. However, since holding partitionLock longer
+ * would hurt performance, we do it here asynchronously.
+ */
+ deadlockStart = get_timeout_start_time(DEADLOCK_TIMEOUT);
+ pg_atomic_write_u64(&MyProc->waitStart,
+ pg_atomic_read_u64((pg_atomic_uint64 *) &deadlockStart));
}
else if (log_recovery_conflict_waits)
{
@@ -1678,6 +1691,7 @@ ProcWakeup(PGPROC *proc, ProcWaitStatus waitStatus)
proc->waitLock = NULL;
proc->waitProcLock = NULL;
proc->waitStatus = waitStatus;
+ pg_atomic_init_u64(&MyProc->waitStart, 0);
/* And awaken it */
SetLatch(&proc->procLatch);
diff --git a/src/backend/utils/adt/lockfuncs.c b/src/backend/utils/adt/lockfuncs.c
index b1cf5b79a7..97f0265c12 100644
--- a/src/backend/utils/adt/lockfuncs.c
+++ b/src/backend/utils/adt/lockfuncs.c
@@ -63,7 +63,7 @@ typedef struct
} PG_Lock_Status;
/* Number of columns in pg_locks output */
-#define NUM_LOCK_STATUS_COLUMNS 15
+#define NUM_LOCK_STATUS_COLUMNS 16
/*
* VXIDGetDatum - Construct a text representation of a VXID
@@ -142,6 +142,8 @@ pg_lock_status(PG_FUNCTION_ARGS)
BOOLOID, -1, 0);
TupleDescInitEntry(tupdesc, (AttrNumber) 15, "fastpath",
BOOLOID, -1, 0);
+ TupleDescInitEntry(tupdesc, (AttrNumber) 16, "waitstart",
+ TIMESTAMPTZOID, -1, 0);
funcctx->tuple_desc = BlessTupleDesc(tupdesc);
@@ -336,6 +338,10 @@ pg_lock_status(PG_FUNCTION_ARGS)
values[12] = CStringGetTextDatum(GetLockmodeName(instance->locktag.locktag_lockmethodid, mode));
values[13] = BoolGetDatum(granted);
values[14] = BoolGetDatum(instance->fastpath);
+ if (!granted && instance->waitStart != 0)
+ values[15] = TimestampTzGetDatum(instance->waitStart);
+ else
+ nulls[15] = true;
tuple = heap_form_tuple(funcctx->tuple_desc, values, nulls);
result = HeapTupleGetDatum(tuple);
@@ -406,6 +412,7 @@ pg_lock_status(PG_FUNCTION_ARGS)
values[12] = CStringGetTextDatum("SIReadLock");
values[13] = BoolGetDatum(true);
values[14] = BoolGetDatum(false);
+ nulls[15] = true;
tuple = heap_form_tuple(funcctx->tuple_desc, values, nulls);
result = HeapTupleGetDatum(tuple);
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index f8174061ef..a1a99c9ef4 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5986,9 +5986,9 @@
{ oid => '1371', descr => 'view system lock information',
proname => 'pg_lock_status', prorows => '1000', proretset => 't',
provolatile => 'v', prorettype => 'record', proargtypes => '',
- proallargtypes => '{text,oid,oid,int4,int2,text,xid,oid,oid,int2,text,int4,text,bool,bool}',
- proargmodes => '{o,o,o,o,o,o,o,o,o,o,o,o,o,o,o}',
- proargnames => '{locktype,database,relation,page,tuple,virtualxid,transactionid,classid,objid,objsubid,virtualtransaction,pid,mode,granted,fastpath}',
+ proallargtypes => '{text,oid,oid,int4,int2,text,xid,oid,oid,int2,text,int4,text,bool,bool,timestamptz}',
+ proargmodes => '{o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o}',
+ proargnames => '{locktype,database,relation,page,tuple,virtualxid,transactionid,classid,objid,objsubid,virtualtransaction,pid,mode,granted,fastpath,waitstart}',
prosrc => 'pg_lock_status' },
{ oid => '2561',
descr => 'get array of PIDs of sessions blocking specified backend PID from acquiring a heavyweight lock',
diff --git a/src/include/storage/lock.h b/src/include/storage/lock.h
index 68a3487d49..846487f94b 100644
--- a/src/include/storage/lock.h
+++ b/src/include/storage/lock.h
@@ -22,6 +22,7 @@
#include "storage/lockdefs.h"
#include "storage/lwlock.h"
#include "storage/shmem.h"
+#include "utils/timestamp.h"
/* struct PGPROC is declared in proc.h, but must forward-reference it */
typedef struct PGPROC PGPROC;
@@ -446,6 +447,7 @@ typedef struct LockInstanceData
LOCKMODE waitLockMode; /* lock awaited by this PGPROC, if any */
BackendId backend; /* backend ID of this PGPROC */
LocalTransactionId lxid; /* local transaction ID of this PGPROC */
+ TimestampTz waitStart; /* time at which this PGPROC started waiting for lock */
int pid; /* pid of this PGPROC */
int leaderPid; /* pid of group leader; = pid if no group */
bool fastpath; /* taken via fastpath? */
diff --git a/src/include/storage/proc.h b/src/include/storage/proc.h
index 683ab64f76..21aa5afb14 100644
--- a/src/include/storage/proc.h
+++ b/src/include/storage/proc.h
@@ -181,6 +181,7 @@ struct PGPROC
LOCKMODE waitLockMode; /* type of lock we're waiting for */
LOCKMASK heldLocks; /* bitmask for lock types already held on this
* lock object by this backend */
+ pg_atomic_uint64 waitStart; /* time at which wait for lock acquisition started */
bool delayChkpt; /* true if this proc delays checkpoint start */
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 6173473de9..5b86c7d5f2 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1394,8 +1394,9 @@ pg_locks| SELECT l.locktype,
l.pid,
l.mode,
l.granted,
- l.fastpath
- FROM pg_lock_status() l(locktype, database, relation, page, tuple, virtualxid, transactionid, classid, objid, objsubid, virtualtransaction, pid, mode, granted, fastpath);
+ l.fastpath,
+ l.waitstart
+ FROM pg_lock_status() l(locktype, database, relation, page, tuple, virtualxid, transactionid, classid, objid, objsubid, virtualtransaction, pid, mode, granted, fastpath, waitstart);
pg_matviews| SELECT n.nspname AS schemaname,
c.relname AS matviewname,
pg_get_userbyid(c.relowner) AS matviewowner,
--
2.18.1
On 2021/02/05 0:03, torikoshia wrote:
On 2021-02-03 11:23, Fujii Masao wrote:
64-bit fetches are not atomic on some platforms. So a spinlock is
necessary when updating "waitStart" without holding the partition lock?
Also GetLockStatusData() needs a spinlock when reading "waitStart"?

Also it might be worth considering 64-bit atomic operations like
pg_atomic_read_u64() for that.

Thanks for your suggestion and advice!

In the attached patch I used pg_atomic_read_u64() and
pg_atomic_write_u64(). waitStart is TimestampTz, i.e., int64, but it
seems pg_atomic_read_xxx and pg_atomic_write_xxx only support unsigned
integers, so I cast the type. I may be using these functions
incorrectly, so if something is wrong, I would appreciate any comments.

About the documentation, since your suggestion seems better than v6, I
used it as is.
Thanks for updating the patch!
+ if (pg_atomic_read_u64(&MyProc->waitStart) == 0)
+ pg_atomic_write_u64(&MyProc->waitStart,
+ pg_atomic_read_u64((pg_atomic_uint64 *) &now));
pg_atomic_read_u64() is really necessary? I think that
"pg_atomic_write_u64(&MyProc->waitStart, now)" is enough.
+ deadlockStart = get_timeout_start_time(DEADLOCK_TIMEOUT);
+ pg_atomic_write_u64(&MyProc->waitStart,
+ pg_atomic_read_u64((pg_atomic_uint64 *) &deadlockStart));
Same as above.
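That is, since "now" and "deadlockStart" are plain local variables that no other process can access, only the shared field needs the atomic API. A sketch of the simplification:

    /* v7 routes the local variable through the atomic API too: */
    pg_atomic_write_u64(&MyProc->waitStart,
                        pg_atomic_read_u64((pg_atomic_uint64 *) &now));

    /* A local variable cannot be read concurrently, so passing the
     * plain value (converted to uint64) is enough for the store: */
    pg_atomic_write_u64(&MyProc->waitStart, now);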
+ /*
+ * Record waitStart reusing the deadlock timeout timer.
+ *
+ * It would be ideal if this could be done synchronously with updating
+ * the lock information. However, since holding partitionLock longer
+ * would hurt performance, we do it here asynchronously.
+ */
IMO it's better to comment why we reuse the deadlock timeout timer.
proc->waitStatus = waitStatus;
+ pg_atomic_init_u64(&MyProc->waitStart, 0);
pg_atomic_write_u64() should be used instead? Because waitStart can be
accessed concurrently there.
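A sketch of the distinction, assuming the reset targets the awakened process's PGPROC ("proc", as elsewhere in ProcWakeup()):

    /* InitProcess(): this PGPROC is not yet visible to other backends,
     * so plain initialization is fine. */
    pg_atomic_init_u64(&MyProc->waitStart, 0);

    /* ProcWakeup(): other backends may be reading waitStart through
     * GetLockStatusData() at the same moment, so the reset must go
     * through the atomic store. */
    pg_atomic_write_u64(&proc->waitStart, 0);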
I updated the patch and addressed the above review comments. Patch attached.
Barring any objection, I will commit this version.
Regards,
--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
Attachments:
v8.patchtext/plain; charset=UTF-8; name=v8.patch; x-mac-creator=0; x-mac-type=0Download
diff --git a/contrib/amcheck/expected/check_btree.out b/contrib/amcheck/expected/check_btree.out
index 13848b7449..5a3f1ef737 100644
--- a/contrib/amcheck/expected/check_btree.out
+++ b/contrib/amcheck/expected/check_btree.out
@@ -97,8 +97,8 @@ SELECT bt_index_parent_check('bttest_b_idx');
SELECT * FROM pg_locks
WHERE relation = ANY(ARRAY['bttest_a', 'bttest_a_idx', 'bttest_b', 'bttest_b_idx']::regclass[])
AND pid = pg_backend_pid();
- locktype | database | relation | page | tuple | virtualxid | transactionid | classid | objid | objsubid | virtualtransaction | pid | mode | granted | fastpath
-----------+----------+----------+------+-------+------------+---------------+---------+-------+----------+--------------------+-----+------+---------+----------
+ locktype | database | relation | page | tuple | virtualxid | transactionid | classid | objid | objsubid | virtualtransaction | pid | mode | granted | fastpath | waitstart
+----------+----------+----------+------+-------+------------+---------------+---------+-------+----------+--------------------+-----+------+---------+----------+-----------
(0 rows)
COMMIT;
diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index ea222c0464..9085d2e64c 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -10604,6 +10604,19 @@ SCRAM-SHA-256$<replaceable><iteration count></replaceable>:<replaceable>&l
lock table
</para></entry>
</row>
+
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>waitstart</structfield> <type>timestamptz</type>
+ </para>
+ <para>
+ Time when the server process started waiting for this lock,
+ or null if the lock is held.
+ Note that this can be null for a very short period of time after
+ the wait started even though <structfield>granted</structfield>
+ is <literal>false</literal>.
+ </para></entry>
+ </row>
</tbody>
</tgroup>
</table>
diff --git a/src/backend/storage/ipc/standby.c b/src/backend/storage/ipc/standby.c
index 39a30c00f7..5877a60715 100644
--- a/src/backend/storage/ipc/standby.c
+++ b/src/backend/storage/ipc/standby.c
@@ -540,12 +540,34 @@ void
ResolveRecoveryConflictWithLock(LOCKTAG locktag, bool logging_conflict)
{
TimestampTz ltime;
+ TimestampTz now;
Assert(InHotStandby);
ltime = GetStandbyLimitTime();
+ now = GetCurrentTimestamp();
- if (GetCurrentTimestamp() >= ltime && ltime != 0)
+ /*
+ * Update waitStart if first time through after the startup process
+ * started waiting for the lock. It should not be updated every time
+ * ResolveRecoveryConflictWithLock() is called during the wait.
+ *
+ * Use the current time obtained for comparison with ltime as waitStart
+ * (i.e., the time when this process started waiting for the lock). Since
+ * getting the current time anew can cause overhead, we reuse the
+ * already-obtained time to avoid that overhead.
+ *
+ * Note that waitStart is updated without holding the lock table's
+ * partition lock, to avoid the overhead of additional lock acquisition.
+ * This can cause "waitstart" in pg_locks to become NULL for a very short
+ * period of time after the wait started even though "granted" is false.
+ * This is OK in practice because we can assume that users are likely to
+ * look at "waitstart" when waiting for the lock for a long time.
+ */
+ if (pg_atomic_read_u64(&MyProc->waitStart) == 0)
+ pg_atomic_write_u64(&MyProc->waitStart, now);
+
+ if (now >= ltime && ltime != 0)
{
/*
* We're already behind, so clear a path as quickly as possible.
diff --git a/src/backend/storage/lmgr/lock.c b/src/backend/storage/lmgr/lock.c
index 79c1cf9b8b..108b4d9023 100644
--- a/src/backend/storage/lmgr/lock.c
+++ b/src/backend/storage/lmgr/lock.c
@@ -3619,6 +3619,12 @@ GetLockStatusData(void)
instance->leaderPid = proc->pid;
instance->fastpath = true;
+ /*
+ * Successfully taking fast path lock means there were no
+ * conflicting locks.
+ */
+ instance->waitStart = 0;
+
el++;
}
@@ -3646,6 +3652,7 @@ GetLockStatusData(void)
instance->pid = proc->pid;
instance->leaderPid = proc->pid;
instance->fastpath = true;
+ instance->waitStart = 0;
el++;
}
@@ -3698,6 +3705,7 @@ GetLockStatusData(void)
instance->pid = proc->pid;
instance->leaderPid = proclock->groupLeader->pid;
instance->fastpath = false;
+ instance->waitStart = (TimestampTz) pg_atomic_read_u64(&proc->waitStart);
el++;
}
diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c
index c87ffc6549..f059d60a0f 100644
--- a/src/backend/storage/lmgr/proc.c
+++ b/src/backend/storage/lmgr/proc.c
@@ -402,6 +402,7 @@ InitProcess(void)
MyProc->lwWaitMode = 0;
MyProc->waitLock = NULL;
MyProc->waitProcLock = NULL;
+ pg_atomic_init_u64(&MyProc->waitStart, 0);
#ifdef USE_ASSERT_CHECKING
{
int i;
@@ -1262,6 +1263,23 @@ ProcSleep(LOCALLOCK *locallock, LockMethod lockMethodTable)
}
else
enable_timeout_after(DEADLOCK_TIMEOUT, DeadlockTimeout);
+
+ /*
+ * Use the current time obtained for the deadlock timeout timer as
+ * waitStart (i.e., the time when this process started waiting for the
+ * lock). Since getting the current time anew can cause overhead, we
+ * reuse the already-obtained time to avoid that overhead.
+ *
+ * Note that waitStart is updated without holding the lock table's
+ * partition lock, to avoid the overhead of additional lock
+ * acquisition. This can cause "waitstart" in pg_locks to become NULL
+ * for a very short period of time after the wait started even though
+ * "granted" is false. This is OK in practice because we can assume
+ * that users are likely to look at "waitstart" when waiting for the
+ * lock for a long time.
+ */
+ pg_atomic_write_u64(&MyProc->waitStart,
+ get_timeout_start_time(DEADLOCK_TIMEOUT));
}
else if (log_recovery_conflict_waits)
{
@@ -1678,6 +1696,7 @@ ProcWakeup(PGPROC *proc, ProcWaitStatus waitStatus)
proc->waitLock = NULL;
proc->waitProcLock = NULL;
proc->waitStatus = waitStatus;
+ pg_atomic_write_u64(&MyProc->waitStart, 0);
/* And awaken it */
SetLatch(&proc->procLatch);
diff --git a/src/backend/utils/adt/lockfuncs.c b/src/backend/utils/adt/lockfuncs.c
index b1cf5b79a7..97f0265c12 100644
--- a/src/backend/utils/adt/lockfuncs.c
+++ b/src/backend/utils/adt/lockfuncs.c
@@ -63,7 +63,7 @@ typedef struct
} PG_Lock_Status;
/* Number of columns in pg_locks output */
-#define NUM_LOCK_STATUS_COLUMNS 15
+#define NUM_LOCK_STATUS_COLUMNS 16
/*
* VXIDGetDatum - Construct a text representation of a VXID
@@ -142,6 +142,8 @@ pg_lock_status(PG_FUNCTION_ARGS)
BOOLOID, -1, 0);
TupleDescInitEntry(tupdesc, (AttrNumber) 15, "fastpath",
BOOLOID, -1, 0);
+ TupleDescInitEntry(tupdesc, (AttrNumber) 16, "waitstart",
+ TIMESTAMPTZOID, -1, 0);
funcctx->tuple_desc = BlessTupleDesc(tupdesc);
@@ -336,6 +338,10 @@ pg_lock_status(PG_FUNCTION_ARGS)
values[12] = CStringGetTextDatum(GetLockmodeName(instance->locktag.locktag_lockmethodid, mode));
values[13] = BoolGetDatum(granted);
values[14] = BoolGetDatum(instance->fastpath);
+ if (!granted && instance->waitStart != 0)
+ values[15] = TimestampTzGetDatum(instance->waitStart);
+ else
+ nulls[15] = true;
tuple = heap_form_tuple(funcctx->tuple_desc, values, nulls);
result = HeapTupleGetDatum(tuple);
@@ -406,6 +412,7 @@ pg_lock_status(PG_FUNCTION_ARGS)
values[12] = CStringGetTextDatum("SIReadLock");
values[13] = BoolGetDatum(true);
values[14] = BoolGetDatum(false);
+ nulls[15] = true;
tuple = heap_form_tuple(funcctx->tuple_desc, values, nulls);
result = HeapTupleGetDatum(tuple);
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 4e0c9be58c..1487710d59 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5994,9 +5994,9 @@
{ oid => '1371', descr => 'view system lock information',
proname => 'pg_lock_status', prorows => '1000', proretset => 't',
provolatile => 'v', prorettype => 'record', proargtypes => '',
- proallargtypes => '{text,oid,oid,int4,int2,text,xid,oid,oid,int2,text,int4,text,bool,bool}',
- proargmodes => '{o,o,o,o,o,o,o,o,o,o,o,o,o,o,o}',
- proargnames => '{locktype,database,relation,page,tuple,virtualxid,transactionid,classid,objid,objsubid,virtualtransaction,pid,mode,granted,fastpath}',
+ proallargtypes => '{text,oid,oid,int4,int2,text,xid,oid,oid,int2,text,int4,text,bool,bool,timestamptz}',
+ proargmodes => '{o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o}',
+ proargnames => '{locktype,database,relation,page,tuple,virtualxid,transactionid,classid,objid,objsubid,virtualtransaction,pid,mode,granted,fastpath,waitstart}',
prosrc => 'pg_lock_status' },
{ oid => '2561',
descr => 'get array of PIDs of sessions blocking specified backend PID from acquiring a heavyweight lock',
diff --git a/src/include/storage/lock.h b/src/include/storage/lock.h
index 68a3487d49..9b2a421c32 100644
--- a/src/include/storage/lock.h
+++ b/src/include/storage/lock.h
@@ -22,6 +22,7 @@
#include "storage/lockdefs.h"
#include "storage/lwlock.h"
#include "storage/shmem.h"
+#include "utils/timestamp.h"
/* struct PGPROC is declared in proc.h, but must forward-reference it */
typedef struct PGPROC PGPROC;
@@ -446,6 +447,8 @@ typedef struct LockInstanceData
LOCKMODE waitLockMode; /* lock awaited by this PGPROC, if any */
BackendId backend; /* backend ID of this PGPROC */
LocalTransactionId lxid; /* local transaction ID of this PGPROC */
+ TimestampTz waitStart; /* time at which this PGPROC started waiting
+ * for lock */
int pid; /* pid of this PGPROC */
int leaderPid; /* pid of group leader; = pid if no group */
bool fastpath; /* taken via fastpath? */
diff --git a/src/include/storage/proc.h b/src/include/storage/proc.h
index 683ab64f76..a777cb64a1 100644
--- a/src/include/storage/proc.h
+++ b/src/include/storage/proc.h
@@ -181,6 +181,8 @@ struct PGPROC
LOCKMODE waitLockMode; /* type of lock we're waiting for */
LOCKMASK heldLocks; /* bitmask for lock types already held on this
* lock object by this backend */
+ pg_atomic_uint64 waitStart; /* time at which wait for lock acquisition
+ * started */
bool delayChkpt; /* true if this proc delays checkpoint start */
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 6173473de9..5b86c7d5f2 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1394,8 +1394,9 @@ pg_locks| SELECT l.locktype,
l.pid,
l.mode,
l.granted,
- l.fastpath
- FROM pg_lock_status() l(locktype, database, relation, page, tuple, virtualxid, transactionid, classid, objid, objsubid, virtualtransaction, pid, mode, granted, fastpath);
+ l.fastpath,
+ l.waitstart
+ FROM pg_lock_status() l(locktype, database, relation, page, tuple, virtualxid, transactionid, classid, objid, objsubid, virtualtransaction, pid, mode, granted, fastpath, waitstart);
pg_matviews| SELECT n.nspname AS schemaname,
c.relname AS matviewname,
pg_get_userbyid(c.relowner) AS matviewowner,
On 2021-02-05 18:49, Fujii Masao wrote:
Thanks for updating the patch!

I updated the patch and addressed the above review comments. Patch
attached.

Barring any objection, I will commit this version.
Thanks for modifying the patch!
I agree with your comments.
BTW, I ran pgbench several times before and after applying
this patch.
The environment is a virtual machine (CentOS 8), so this is
just for reference, but there was no significant difference
in latency or tps (both are below 1%).
Regards,
--
Atsushi Torikoshi
On 2021/02/09 17:48, torikoshia wrote:
On 2021-02-05 18:49, Fujii Masao wrote:
I updated the patch and addressed the above review comments. Patch
attached.

Barring any objection, I will commit this version.

Thanks for modifying the patch!
I agree with your comments.

BTW, I ran pgbench several times before and after applying this patch.
The environment is a virtual machine (CentOS 8), so this is just for
reference, but there was no significant difference in latency or tps
(both are below 1%).
Thanks for the test! I pushed the patch.
Regards,
--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
On 2021/02/09 18:13, Fujii Masao wrote:
On 2021/02/09 17:48, torikoshia wrote:
BTW, I ran pgbench several times before and after applying this patch.
The environment is a virtual machine (CentOS 8), so this is just for
reference, but there was no significant difference in latency or tps
(both are below 1%).

Thanks for the test! I pushed the patch.
But I reverted the patch because buildfarm members rorqual and
prion don't like the patch. I'm trying to investigate the cause
of these failures.
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=rorqual&dt=2021-02-09%2009%3A20%3A10
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=prion&dt=2021-02-09%2009%3A13%3A16
Regards,
--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
On 2021/02/09 19:11, Fujii Masao wrote:
But I reverted the patch because buildfarm members rorqual and
prion don't like the patch. I'm trying to investigate the cause
of these failures.

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=rorqual&dt=2021-02-09%2009%3A20%3A10
- relation | locktype | mode
------------------+----------+---------------------
- test_prepared_1 | relation | RowExclusiveLock
- test_prepared_1 | relation | AccessExclusiveLock
-(2 rows)
-
+ERROR: invalid spinlock number: 0
"rorqual" reported that the above error happened in the server built with
--disable-atomics --disable-spinlocks when reading pg_locks after
the transaction was prepared. The cause of this issue is that "waitStart"
atomic variable in the dummy proc created at the end of prepare transaction
was not initialized. I updated the patch so that pg_atomic_init_u64() is
called for the "waitStart" in the dummy proc for prepared transaction.
Patch attached. I confirmed that the patched server built with
--disable-atomics --disable-spinlocks passed all the regression tests.
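For reference, the failing sequence looks roughly like this (a minimal
sketch reconstructed from the regression diff above; the actual test SQL
may differ, and max_prepared_transactions must be set above zero):

    CREATE TABLE test_prepared_1 (a int);
    BEGIN;
    INSERT INTO test_prepared_1 VALUES (1);
    LOCK TABLE test_prepared_1 IN ACCESS EXCLUSIVE MODE;
    PREPARE TRANSACTION 'test_prepared_lock';

    -- Reading pg_locks now touches the dummy PGPROC kept for the
    -- prepared transaction. With its waitStart atomic left
    -- uninitialized, a build with --disable-atomics
    -- --disable-spinlocks failed here with
    -- "ERROR: invalid spinlock number: 0".
    SELECT relation::regclass, locktype, mode
      FROM pg_locks
     WHERE relation = 'test_prepared_1'::regclass;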
BTW, while investigating this issue, I found that pg_stat_wal_receiver
could also cause this error even in the current master (without the
patch). I will report that in a separate thread.
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=prion&dt=2021-02-09%2009%3A13%3A16
"prion" reported the following error. But I'm not sure how the changes of
pg_locks caused this error. I found that Heikki also reported at [1]/messages/by-id/f03ea04a-9b77-e371-9ab9-182cb35db1f9@iki.fi that
"prion" failed with the same error but was not sure how it happened.
This makes me think for now that this issue is not directly related to
the pg_locks changes.
-------------------------------------
pg_dump: error: query failed: ERROR: missing chunk number 0 for toast value 14444 in pg_toast_2619
pg_dump: error: query was: SELECT
a.attnum,
a.attname,
a.atttypmod,
a.attstattarget,
a.attstorage,
t.typstorage,
a.attnotnull,
a.atthasdef,
a.attisdropped,
a.attlen,
a.attalign,
a.attislocal,
pg_catalog.format_type(t.oid, a.atttypmod) AS atttypname,
array_to_string(a.attoptions, ', ') AS attoptions,
CASE WHEN a.attcollation <> t.typcollation THEN a.attcollation ELSE 0 END AS attcollation,
pg_catalog.array_to_string(ARRAY(SELECT pg_catalog.quote_ident(option_name) || ' ' || pg_catalog.quote_literal(option_value) FROM pg_catalog.pg_options_to_table(attfdwoptions) ORDER BY option_name), E',
') AS attfdwoptions,
a.attidentity,
CASE WHEN a.atthasmissing AND NOT a.attisdropped THEN a.attmissingval ELSE null END AS attmissingval,
a.attgenerated
FROM pg_catalog.pg_attribute a LEFT JOIN pg_catalog.pg_type t ON a.atttypid = t.oid
WHERE a.attrelid = '35987'::pg_catalog.oid AND a.attnum > 0::pg_catalog.int2
ORDER BY a.attnum
pg_dumpall: error: pg_dump failed on database "regression", exiting
waiting for server to shut down.... done
server stopped
pg_dumpall of post-upgrade database cluster failed
-------------------------------------
[1]: /messages/by-id/f03ea04a-9b77-e371-9ab9-182cb35db1f9@iki.fi
Regards,
--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
Attachments:
v9.patch (text/plain)
diff --git a/contrib/amcheck/expected/check_btree.out b/contrib/amcheck/expected/check_btree.out
index 13848b7449..5a3f1ef737 100644
--- a/contrib/amcheck/expected/check_btree.out
+++ b/contrib/amcheck/expected/check_btree.out
@@ -97,8 +97,8 @@ SELECT bt_index_parent_check('bttest_b_idx');
SELECT * FROM pg_locks
WHERE relation = ANY(ARRAY['bttest_a', 'bttest_a_idx', 'bttest_b', 'bttest_b_idx']::regclass[])
AND pid = pg_backend_pid();
- locktype | database | relation | page | tuple | virtualxid | transactionid | classid | objid | objsubid | virtualtransaction | pid | mode | granted | fastpath
-----------+----------+----------+------+-------+------------+---------------+---------+-------+----------+--------------------+-----+------+---------+----------
+ locktype | database | relation | page | tuple | virtualxid | transactionid | classid | objid | objsubid | virtualtransaction | pid | mode | granted | fastpath | waitstart
+----------+----------+----------+------+-------+------------+---------------+---------+-------+----------+--------------------+-----+------+---------+----------+-----------
(0 rows)
COMMIT;
diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index ea222c0464..9085d2e64c 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -10604,6 +10604,19 @@ SCRAM-SHA-256$<replaceable><iteration count></replaceable>:<replaceable>&l
lock table
</para></entry>
</row>
+
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>waitstart</structfield> <type>timestamptz</type>
+ </para>
+ <para>
+ Time when the server process started waiting for this lock,
+ or null if the lock is held.
+ Note that this can be null for a very short period of time after
+ the wait started even though <structfield>granted</structfield>
+ is <literal>false</literal>.
+ </para></entry>
+ </row>
</tbody>
</tgroup>
</table>
diff --git a/src/backend/access/transam/twophase.c b/src/backend/access/transam/twophase.c
index fc18b77832..70d22577ce 100644
--- a/src/backend/access/transam/twophase.c
+++ b/src/backend/access/transam/twophase.c
@@ -873,8 +873,15 @@ PGPROC *
TwoPhaseGetDummyProc(TransactionId xid, bool lock_held)
{
GlobalTransaction gxact = TwoPhaseGetGXact(xid, lock_held);
+ PGPROC *dummy = &ProcGlobal->allProcs[gxact->pgprocno];
- return &ProcGlobal->allProcs[gxact->pgprocno];
+ /*
+ * Initialize atomic variable in dummy proc so that GetLockStatusData()
+ * can read it later.
+ */
+ pg_atomic_init_u64(&dummy->waitStart, 0);
+
+ return dummy;
}
/************************************************************************/
diff --git a/src/backend/storage/ipc/standby.c b/src/backend/storage/ipc/standby.c
index 39a30c00f7..5877a60715 100644
--- a/src/backend/storage/ipc/standby.c
+++ b/src/backend/storage/ipc/standby.c
@@ -540,12 +540,34 @@ void
ResolveRecoveryConflictWithLock(LOCKTAG locktag, bool logging_conflict)
{
TimestampTz ltime;
+ TimestampTz now;
Assert(InHotStandby);
ltime = GetStandbyLimitTime();
+ now = GetCurrentTimestamp();
- if (GetCurrentTimestamp() >= ltime && ltime != 0)
+ /*
+ * Update waitStart if first time through after the startup process
+ * started waiting for the lock. It should not be updated every time
+ * ResolveRecoveryConflictWithLock() is called during the wait.
+ *
+ * Use the current time obtained for comparison with ltime as waitStart
+ * (i.e., the time when this process started waiting for the lock). Since
+ * getting the current time newly can cause overhead, we reuse the
+ * already-obtained time to avoid that overhead.
+ *
+ * Note that waitStart is updated without holding the lock table's
+ * partition lock, to avoid the overhead by additional lock acquisition.
+ * This can cause "waitstart" in pg_locks to become NULL for a very short
+ * period of time after the wait started even though "granted" is false.
+ * This is OK in practice because we can assume that users are likely to
+ * look at "waitstart" when waiting for the lock for a long time.
+ */
+ if (pg_atomic_read_u64(&MyProc->waitStart) == 0)
+ pg_atomic_write_u64(&MyProc->waitStart, now);
+
+ if (now >= ltime && ltime != 0)
{
/*
* We're already behind, so clear a path as quickly as possible.
diff --git a/src/backend/storage/lmgr/lock.c b/src/backend/storage/lmgr/lock.c
index 79c1cf9b8b..108b4d9023 100644
--- a/src/backend/storage/lmgr/lock.c
+++ b/src/backend/storage/lmgr/lock.c
@@ -3619,6 +3619,12 @@ GetLockStatusData(void)
instance->leaderPid = proc->pid;
instance->fastpath = true;
+ /*
+ * Successfully taking fast path lock means there were no
+ * conflicting locks.
+ */
+ instance->waitStart = 0;
+
el++;
}
@@ -3646,6 +3652,7 @@ GetLockStatusData(void)
instance->pid = proc->pid;
instance->leaderPid = proc->pid;
instance->fastpath = true;
+ instance->waitStart = 0;
el++;
}
@@ -3698,6 +3705,7 @@ GetLockStatusData(void)
instance->pid = proc->pid;
instance->leaderPid = proclock->groupLeader->pid;
instance->fastpath = false;
+ instance->waitStart = (TimestampTz) pg_atomic_read_u64(&proc->waitStart);
el++;
}
diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c
index c87ffc6549..f059d60a0f 100644
--- a/src/backend/storage/lmgr/proc.c
+++ b/src/backend/storage/lmgr/proc.c
@@ -402,6 +402,7 @@ InitProcess(void)
MyProc->lwWaitMode = 0;
MyProc->waitLock = NULL;
MyProc->waitProcLock = NULL;
+ pg_atomic_init_u64(&MyProc->waitStart, 0);
#ifdef USE_ASSERT_CHECKING
{
int i;
@@ -1262,6 +1263,23 @@ ProcSleep(LOCALLOCK *locallock, LockMethod lockMethodTable)
}
else
enable_timeout_after(DEADLOCK_TIMEOUT, DeadlockTimeout);
+
+ /*
+ * Use the current time obtained for the deadlock timeout timer as
+ * waitStart (i.e., the time when this process started waiting for the
+ * lock). Since getting the current time newly can cause overhead, we
+ * reuse the already-obtained time to avoid that overhead.
+ *
+ * Note that waitStart is updated without holding the lock table's
+ * partition lock, to avoid the overhead by additional lock
+ * acquisition. This can cause "waitstart" in pg_locks to become NULL
+ * for a very short period of time after the wait started even though
+ * "granted" is false. This is OK in practice because we can assume
+ * that users are likely to look at "waitstart" when waiting for the
+ * lock for a long time.
+ */
+ pg_atomic_write_u64(&MyProc->waitStart,
+ get_timeout_start_time(DEADLOCK_TIMEOUT));
}
else if (log_recovery_conflict_waits)
{
@@ -1678,6 +1696,7 @@ ProcWakeup(PGPROC *proc, ProcWaitStatus waitStatus)
proc->waitLock = NULL;
proc->waitProcLock = NULL;
proc->waitStatus = waitStatus;
+ pg_atomic_write_u64(&MyProc->waitStart, 0);
/* And awaken it */
SetLatch(&proc->procLatch);
diff --git a/src/backend/utils/adt/lockfuncs.c b/src/backend/utils/adt/lockfuncs.c
index b1cf5b79a7..97f0265c12 100644
--- a/src/backend/utils/adt/lockfuncs.c
+++ b/src/backend/utils/adt/lockfuncs.c
@@ -63,7 +63,7 @@ typedef struct
} PG_Lock_Status;
/* Number of columns in pg_locks output */
-#define NUM_LOCK_STATUS_COLUMNS 15
+#define NUM_LOCK_STATUS_COLUMNS 16
/*
* VXIDGetDatum - Construct a text representation of a VXID
@@ -142,6 +142,8 @@ pg_lock_status(PG_FUNCTION_ARGS)
BOOLOID, -1, 0);
TupleDescInitEntry(tupdesc, (AttrNumber) 15, "fastpath",
BOOLOID, -1, 0);
+ TupleDescInitEntry(tupdesc, (AttrNumber) 16, "waitstart",
+ TIMESTAMPTZOID, -1, 0);
funcctx->tuple_desc = BlessTupleDesc(tupdesc);
@@ -336,6 +338,10 @@ pg_lock_status(PG_FUNCTION_ARGS)
values[12] = CStringGetTextDatum(GetLockmodeName(instance->locktag.locktag_lockmethodid, mode));
values[13] = BoolGetDatum(granted);
values[14] = BoolGetDatum(instance->fastpath);
+ if (!granted && instance->waitStart != 0)
+ values[15] = TimestampTzGetDatum(instance->waitStart);
+ else
+ nulls[15] = true;
tuple = heap_form_tuple(funcctx->tuple_desc, values, nulls);
result = HeapTupleGetDatum(tuple);
@@ -406,6 +412,7 @@ pg_lock_status(PG_FUNCTION_ARGS)
values[12] = CStringGetTextDatum("SIReadLock");
values[13] = BoolGetDatum(true);
values[14] = BoolGetDatum(false);
+ nulls[15] = true;
tuple = heap_form_tuple(funcctx->tuple_desc, values, nulls);
result = HeapTupleGetDatum(tuple);
diff --git a/src/include/catalog/catversion.h b/src/include/catalog/catversion.h
index 638830aaac..c1a5dd4ee7 100644
--- a/src/include/catalog/catversion.h
+++ b/src/include/catalog/catversion.h
@@ -53,6 +53,6 @@
*/
/* yyyymmddN */
-#define CATALOG_VERSION_NO 202102021
+#define CATALOG_VERSION_NO 202102091
#endif
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 4e0c9be58c..1487710d59 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5994,9 +5994,9 @@
{ oid => '1371', descr => 'view system lock information',
proname => 'pg_lock_status', prorows => '1000', proretset => 't',
provolatile => 'v', prorettype => 'record', proargtypes => '',
- proallargtypes => '{text,oid,oid,int4,int2,text,xid,oid,oid,int2,text,int4,text,bool,bool}',
- proargmodes => '{o,o,o,o,o,o,o,o,o,o,o,o,o,o,o}',
- proargnames => '{locktype,database,relation,page,tuple,virtualxid,transactionid,classid,objid,objsubid,virtualtransaction,pid,mode,granted,fastpath}',
+ proallargtypes => '{text,oid,oid,int4,int2,text,xid,oid,oid,int2,text,int4,text,bool,bool,timestamptz}',
+ proargmodes => '{o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o}',
+ proargnames => '{locktype,database,relation,page,tuple,virtualxid,transactionid,classid,objid,objsubid,virtualtransaction,pid,mode,granted,fastpath,waitstart}',
prosrc => 'pg_lock_status' },
{ oid => '2561',
descr => 'get array of PIDs of sessions blocking specified backend PID from acquiring a heavyweight lock',
diff --git a/src/include/storage/lock.h b/src/include/storage/lock.h
index 68a3487d49..9b2a421c32 100644
--- a/src/include/storage/lock.h
+++ b/src/include/storage/lock.h
@@ -22,6 +22,7 @@
#include "storage/lockdefs.h"
#include "storage/lwlock.h"
#include "storage/shmem.h"
+#include "utils/timestamp.h"
/* struct PGPROC is declared in proc.h, but must forward-reference it */
typedef struct PGPROC PGPROC;
@@ -446,6 +447,8 @@ typedef struct LockInstanceData
LOCKMODE waitLockMode; /* lock awaited by this PGPROC, if any */
BackendId backend; /* backend ID of this PGPROC */
LocalTransactionId lxid; /* local transaction ID of this PGPROC */
+ TimestampTz waitStart; /* time at which this PGPROC started waiting
+ * for lock */
int pid; /* pid of this PGPROC */
int leaderPid; /* pid of group leader; = pid if no group */
bool fastpath; /* taken via fastpath? */
diff --git a/src/include/storage/proc.h b/src/include/storage/proc.h
index 683ab64f76..a777cb64a1 100644
--- a/src/include/storage/proc.h
+++ b/src/include/storage/proc.h
@@ -181,6 +181,8 @@ struct PGPROC
LOCKMODE waitLockMode; /* type of lock we're waiting for */
LOCKMASK heldLocks; /* bitmask for lock types already held on this
* lock object by this backend */
+ pg_atomic_uint64 waitStart; /* time at which wait for lock acquisition
+ * started */
bool delayChkpt; /* true if this proc delays checkpoint start */
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index b632d9f2ea..10a1f34ebc 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1394,8 +1394,9 @@ pg_locks| SELECT l.locktype,
l.pid,
l.mode,
l.granted,
- l.fastpath
- FROM pg_lock_status() l(locktype, database, relation, page, tuple, virtualxid, transactionid, classid, objid, objsubid, virtualtransaction, pid, mode, granted, fastpath);
+ l.fastpath,
+ l.waitstart
+ FROM pg_lock_status() l(locktype, database, relation, page, tuple, virtualxid, transactionid, classid, objid, objsubid, virtualtransaction, pid, mode, granted, fastpath, waitstart);
pg_matviews| SELECT n.nspname AS schemaname,
c.relname AS matviewname,
pg_get_userbyid(c.relowner) AS matviewowner,
On 2021-02-09 22:54, Fujii Masao wrote:
"rorqual" reported that the above error happened in a server built with
--disable-atomics --disable-spinlocks when reading pg_locks after
the transaction was prepared. The cause of this issue is that the
"waitStart" atomic variable in the dummy proc created at the end of
PREPARE TRANSACTION was not initialized. I updated the patch so that
pg_atomic_init_u64() is called for "waitStart" in the dummy proc for
prepared transactions. Patch attached. I confirmed that the patched
server built with --disable-atomics --disable-spinlocks passed all the
regression tests.
Thanks for fixing the bug. I also tested v9.patch configured with
--disable-atomics --disable-spinlocks on my environment and confirmed
that all tests passed.
BTW, while investigating this issue, I found that pg_stat_wal_receiver
could also cause this error even in the current master (without the
patch). I will report that in a separate thread.

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=prion&dt=2021-02-09%2009%3A13%3A16

"prion" reported the following error. But I'm not sure how the changes
of pg_locks caused this error. I found that Heikki also reported at [1]
that "prion" failed with the same error but was not sure how it
happened. This makes me think for now that this issue is not directly
related to the pg_locks changes.
Thanks! I was wondering how these errors were related to the commit.
Regards,
--
Atsushi Torikoshi
On 2021/02/09 23:31, torikoshia wrote:
Thanks for fixing the bug. I also tested v9.patch configured with
--disable-atomics --disable-spinlocks on my environment and confirmed
that all tests passed.
Thanks for the test!
I found another bug in the patch. InitProcess() initializes "waitStart",
but InitAuxiliaryProcess() previously did not. This could cause an
"invalid spinlock number" error when reading pg_locks on a standby
server. I fixed that. Attached is the updated version of the patch.
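To sketch how that could surface (my reading, assuming a hot standby
built with --disable-atomics --disable-spinlocks): the startup process
is an auxiliary process, so its PGPROC is set up by
InitAuxiliaryProcess(), and once it holds or awaits a lock during
recovery, reading pg_locks reads that PGPROC's waitStart atomic:

    -- Run on the hot standby; before this fix, the uninitialized
    -- waitStart in the startup process's PGPROC could make this
    -- fail with "ERROR: invalid spinlock number: 0".
    SELECT pid, locktype, mode, granted, waitstart FROM pg_locks;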
Regards,
--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
Attachments:
v10.patch (text/plain)
diff --git a/contrib/amcheck/expected/check_btree.out b/contrib/amcheck/expected/check_btree.out
index 13848b7449..5a3f1ef737 100644
--- a/contrib/amcheck/expected/check_btree.out
+++ b/contrib/amcheck/expected/check_btree.out
@@ -97,8 +97,8 @@ SELECT bt_index_parent_check('bttest_b_idx');
SELECT * FROM pg_locks
WHERE relation = ANY(ARRAY['bttest_a', 'bttest_a_idx', 'bttest_b', 'bttest_b_idx']::regclass[])
AND pid = pg_backend_pid();
- locktype | database | relation | page | tuple | virtualxid | transactionid | classid | objid | objsubid | virtualtransaction | pid | mode | granted | fastpath
-----------+----------+----------+------+-------+------------+---------------+---------+-------+----------+--------------------+-----+------+---------+----------
+ locktype | database | relation | page | tuple | virtualxid | transactionid | classid | objid | objsubid | virtualtransaction | pid | mode | granted | fastpath | waitstart
+----------+----------+----------+------+-------+------------+---------------+---------+-------+----------+--------------------+-----+------+---------+----------+-----------
(0 rows)
COMMIT;
diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index ea222c0464..9085d2e64c 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -10604,6 +10604,19 @@ SCRAM-SHA-256$<replaceable><iteration count></replaceable>:<replaceable>&l
lock table
</para></entry>
</row>
+
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>waitstart</structfield> <type>timestamptz</type>
+ </para>
+ <para>
+ Time when the server process started waiting for this lock,
+ or null if the lock is held.
+ Note that this can be null for a very short period of time after
+ the wait started even though <structfield>granted</structfield>
+ is <literal>false</literal>.
+ </para></entry>
+ </row>
</tbody>
</tgroup>
</table>
diff --git a/src/backend/access/transam/twophase.c b/src/backend/access/transam/twophase.c
index fc18b77832..70d22577ce 100644
--- a/src/backend/access/transam/twophase.c
+++ b/src/backend/access/transam/twophase.c
@@ -873,8 +873,15 @@ PGPROC *
TwoPhaseGetDummyProc(TransactionId xid, bool lock_held)
{
GlobalTransaction gxact = TwoPhaseGetGXact(xid, lock_held);
+ PGPROC *dummy = &ProcGlobal->allProcs[gxact->pgprocno];
- return &ProcGlobal->allProcs[gxact->pgprocno];
+ /*
+ * Initialize atomic variable in dummy proc so that GetLockStatusData()
+ * can read it later.
+ */
+ pg_atomic_init_u64(&dummy->waitStart, 0);
+
+ return dummy;
}
/************************************************************************/
diff --git a/src/backend/storage/ipc/standby.c b/src/backend/storage/ipc/standby.c
index 39a30c00f7..5877a60715 100644
--- a/src/backend/storage/ipc/standby.c
+++ b/src/backend/storage/ipc/standby.c
@@ -540,12 +540,34 @@ void
ResolveRecoveryConflictWithLock(LOCKTAG locktag, bool logging_conflict)
{
TimestampTz ltime;
+ TimestampTz now;
Assert(InHotStandby);
ltime = GetStandbyLimitTime();
+ now = GetCurrentTimestamp();
- if (GetCurrentTimestamp() >= ltime && ltime != 0)
+ /*
+ * Update waitStart if first time through after the startup process
+ * started waiting for the lock. It should not be updated every time
+ * ResolveRecoveryConflictWithLock() is called during the wait.
+ *
+ * Use the current time obtained for comparison with ltime as waitStart
+ * (i.e., the time when this process started waiting for the lock). Since
+ * getting the current time newly can cause overhead, we reuse the
+ * already-obtained time to avoid that overhead.
+ *
+ * Note that waitStart is updated without holding the lock table's
+ * partition lock, to avoid the overhead by additional lock acquisition.
+ * This can cause "waitstart" in pg_locks to become NULL for a very short
+ * period of time after the wait started even though "granted" is false.
+ * This is OK in practice because we can assume that users are likely to
+ * look at "waitstart" when waiting for the lock for a long time.
+ */
+ if (pg_atomic_read_u64(&MyProc->waitStart) == 0)
+ pg_atomic_write_u64(&MyProc->waitStart, now);
+
+ if (now >= ltime && ltime != 0)
{
/*
* We're already behind, so clear a path as quickly as possible.
diff --git a/src/backend/storage/lmgr/lock.c b/src/backend/storage/lmgr/lock.c
index 79c1cf9b8b..108b4d9023 100644
--- a/src/backend/storage/lmgr/lock.c
+++ b/src/backend/storage/lmgr/lock.c
@@ -3619,6 +3619,12 @@ GetLockStatusData(void)
instance->leaderPid = proc->pid;
instance->fastpath = true;
+ /*
+ * Successfully taking fast path lock means there were no
+ * conflicting locks.
+ */
+ instance->waitStart = 0;
+
el++;
}
@@ -3646,6 +3652,7 @@ GetLockStatusData(void)
instance->pid = proc->pid;
instance->leaderPid = proc->pid;
instance->fastpath = true;
+ instance->waitStart = 0;
el++;
}
@@ -3698,6 +3705,7 @@ GetLockStatusData(void)
instance->pid = proc->pid;
instance->leaderPid = proclock->groupLeader->pid;
instance->fastpath = false;
+ instance->waitStart = (TimestampTz) pg_atomic_read_u64(&proc->waitStart);
el++;
}
diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c
index c87ffc6549..0884909a22 100644
--- a/src/backend/storage/lmgr/proc.c
+++ b/src/backend/storage/lmgr/proc.c
@@ -402,6 +402,7 @@ InitProcess(void)
MyProc->lwWaitMode = 0;
MyProc->waitLock = NULL;
MyProc->waitProcLock = NULL;
+ pg_atomic_init_u64(&MyProc->waitStart, 0);
#ifdef USE_ASSERT_CHECKING
{
int i;
@@ -580,6 +581,7 @@ InitAuxiliaryProcess(void)
MyProc->lwWaitMode = 0;
MyProc->waitLock = NULL;
MyProc->waitProcLock = NULL;
+ pg_atomic_init_u64(&MyProc->waitStart, 0);
#ifdef USE_ASSERT_CHECKING
{
int i;
@@ -1262,6 +1264,23 @@ ProcSleep(LOCALLOCK *locallock, LockMethod lockMethodTable)
}
else
enable_timeout_after(DEADLOCK_TIMEOUT, DeadlockTimeout);
+
+ /*
+ * Use the current time obtained for the deadlock timeout timer as
+ * waitStart (i.e., the time when this process started waiting for the
+ * lock). Since getting the current time newly can cause overhead, we
+ * reuse the already-obtained time to avoid that overhead.
+ *
+ * Note that waitStart is updated without holding the lock table's
+ * partition lock, to avoid the overhead by additional lock
+ * acquisition. This can cause "waitstart" in pg_locks to become NULL
+ * for a very short period of time after the wait started even though
+ * "granted" is false. This is OK in practice because we can assume
+ * that users are likely to look at "waitstart" when waiting for the
+ * lock for a long time.
+ */
+ pg_atomic_write_u64(&MyProc->waitStart,
+ get_timeout_start_time(DEADLOCK_TIMEOUT));
}
else if (log_recovery_conflict_waits)
{
@@ -1678,6 +1697,7 @@ ProcWakeup(PGPROC *proc, ProcWaitStatus waitStatus)
proc->waitLock = NULL;
proc->waitProcLock = NULL;
proc->waitStatus = waitStatus;
+ pg_atomic_write_u64(&MyProc->waitStart, 0);
/* And awaken it */
SetLatch(&proc->procLatch);
diff --git a/src/backend/utils/adt/lockfuncs.c b/src/backend/utils/adt/lockfuncs.c
index b1cf5b79a7..97f0265c12 100644
--- a/src/backend/utils/adt/lockfuncs.c
+++ b/src/backend/utils/adt/lockfuncs.c
@@ -63,7 +63,7 @@ typedef struct
} PG_Lock_Status;
/* Number of columns in pg_locks output */
-#define NUM_LOCK_STATUS_COLUMNS 15
+#define NUM_LOCK_STATUS_COLUMNS 16
/*
* VXIDGetDatum - Construct a text representation of a VXID
@@ -142,6 +142,8 @@ pg_lock_status(PG_FUNCTION_ARGS)
BOOLOID, -1, 0);
TupleDescInitEntry(tupdesc, (AttrNumber) 15, "fastpath",
BOOLOID, -1, 0);
+ TupleDescInitEntry(tupdesc, (AttrNumber) 16, "waitstart",
+ TIMESTAMPTZOID, -1, 0);
funcctx->tuple_desc = BlessTupleDesc(tupdesc);
@@ -336,6 +338,10 @@ pg_lock_status(PG_FUNCTION_ARGS)
values[12] = CStringGetTextDatum(GetLockmodeName(instance->locktag.locktag_lockmethodid, mode));
values[13] = BoolGetDatum(granted);
values[14] = BoolGetDatum(instance->fastpath);
+ if (!granted && instance->waitStart != 0)
+ values[15] = TimestampTzGetDatum(instance->waitStart);
+ else
+ nulls[15] = true;
tuple = heap_form_tuple(funcctx->tuple_desc, values, nulls);
result = HeapTupleGetDatum(tuple);
@@ -406,6 +412,7 @@ pg_lock_status(PG_FUNCTION_ARGS)
values[12] = CStringGetTextDatum("SIReadLock");
values[13] = BoolGetDatum(true);
values[14] = BoolGetDatum(false);
+ nulls[15] = true;
tuple = heap_form_tuple(funcctx->tuple_desc, values, nulls);
result = HeapTupleGetDatum(tuple);
diff --git a/src/include/catalog/catversion.h b/src/include/catalog/catversion.h
index 638830aaac..c1a5dd4ee7 100644
--- a/src/include/catalog/catversion.h
+++ b/src/include/catalog/catversion.h
@@ -53,6 +53,6 @@
*/
/* yyyymmddN */
-#define CATALOG_VERSION_NO 202102021
+#define CATALOG_VERSION_NO 202102091
#endif
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 4e0c9be58c..1487710d59 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5994,9 +5994,9 @@
{ oid => '1371', descr => 'view system lock information',
proname => 'pg_lock_status', prorows => '1000', proretset => 't',
provolatile => 'v', prorettype => 'record', proargtypes => '',
- proallargtypes => '{text,oid,oid,int4,int2,text,xid,oid,oid,int2,text,int4,text,bool,bool}',
- proargmodes => '{o,o,o,o,o,o,o,o,o,o,o,o,o,o,o}',
- proargnames => '{locktype,database,relation,page,tuple,virtualxid,transactionid,classid,objid,objsubid,virtualtransaction,pid,mode,granted,fastpath}',
+ proallargtypes => '{text,oid,oid,int4,int2,text,xid,oid,oid,int2,text,int4,text,bool,bool,timestamptz}',
+ proargmodes => '{o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o}',
+ proargnames => '{locktype,database,relation,page,tuple,virtualxid,transactionid,classid,objid,objsubid,virtualtransaction,pid,mode,granted,fastpath,waitstart}',
prosrc => 'pg_lock_status' },
{ oid => '2561',
descr => 'get array of PIDs of sessions blocking specified backend PID from acquiring a heavyweight lock',
diff --git a/src/include/storage/lock.h b/src/include/storage/lock.h
index 68a3487d49..9b2a421c32 100644
--- a/src/include/storage/lock.h
+++ b/src/include/storage/lock.h
@@ -22,6 +22,7 @@
#include "storage/lockdefs.h"
#include "storage/lwlock.h"
#include "storage/shmem.h"
+#include "utils/timestamp.h"
/* struct PGPROC is declared in proc.h, but must forward-reference it */
typedef struct PGPROC PGPROC;
@@ -446,6 +447,8 @@ typedef struct LockInstanceData
LOCKMODE waitLockMode; /* lock awaited by this PGPROC, if any */
BackendId backend; /* backend ID of this PGPROC */
LocalTransactionId lxid; /* local transaction ID of this PGPROC */
+ TimestampTz waitStart; /* time at which this PGPROC started waiting
+ * for lock */
int pid; /* pid of this PGPROC */
int leaderPid; /* pid of group leader; = pid if no group */
bool fastpath; /* taken via fastpath? */
diff --git a/src/include/storage/proc.h b/src/include/storage/proc.h
index 683ab64f76..a777cb64a1 100644
--- a/src/include/storage/proc.h
+++ b/src/include/storage/proc.h
@@ -181,6 +181,8 @@ struct PGPROC
LOCKMODE waitLockMode; /* type of lock we're waiting for */
LOCKMASK heldLocks; /* bitmask for lock types already held on this
* lock object by this backend */
+ pg_atomic_uint64 waitStart; /* time at which wait for lock acquisition
+ * started */
bool delayChkpt; /* true if this proc delays checkpoint start */
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index b632d9f2ea..10a1f34ebc 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1394,8 +1394,9 @@ pg_locks| SELECT l.locktype,
l.pid,
l.mode,
l.granted,
- l.fastpath
- FROM pg_lock_status() l(locktype, database, relation, page, tuple, virtualxid, transactionid, classid, objid, objsubid, virtualtransaction, pid, mode, granted, fastpath);
+ l.fastpath,
+ l.waitstart
+ FROM pg_lock_status() l(locktype, database, relation, page, tuple, virtualxid, transactionid, classid, objid, objsubid, virtualtransaction, pid, mode, granted, fastpath, waitstart);
pg_matviews| SELECT n.nspname AS schemaname,
c.relname AS matviewname,
pg_get_userbyid(c.relowner) AS matviewowner,
On 2021/02/10 10:43, Fujii Masao wrote:
I found another bug in the patch. InitProcess() initializes "waitStart",
but InitAuxiliaryProcess() previously did not. This could cause an
"invalid spinlock number" error when reading pg_locks on a standby
server. I fixed that. Attached is the updated version of the patch.
I pushed this version. Thanks!
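For reference, a minimal example of the kind of query the committed
column enables (adjust the join and filters as needed; note that
waitstart can still be NULL for a very short window right after a wait
begins):

    -- Lock wait durations straight from pg_locks, without relying on
    -- query_start or state_change from pg_stat_activity.
    SELECT a.pid, l.relation::regclass, l.mode,
           now() - l.waitstart AS wait_duration, a.query
      FROM pg_locks l
      JOIN pg_stat_activity a ON a.pid = l.pid
     WHERE NOT l.granted;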
Regards,
--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
On 2021/02/15 15:17, Fujii Masao wrote:
I pushed this version. Thanks!
While reading the patch again, I found two minor things.

1. As discussed in another thread [1], the atomic variable "waitStart"
should be initialized at postmaster startup rather than at the startup
of each child process. I changed "waitStart" so that it's initialized in
InitProcGlobal() and also reset to 0 by using pg_atomic_write_u64() in
InitProcess() and InitAuxiliaryProcess().

2. Thanks to the above change, InitProcGlobal() initializes "waitStart"
even in the PGPROC entries for prepared transactions. But those entries
are zeroed in MarkAsPreparingGuts(), so "waitStart" needs to be
initialized again. Currently TwoPhaseGetDummyProc() initializes
"waitStart" in the PGPROC entry for a prepared transaction. But it's
better to do that in MarkAsPreparingGuts() instead, because that
function initializes the other PGPROC variables. So I moved that
initialization code from TwoPhaseGetDummyProc() to MarkAsPreparingGuts().

Patch attached. Thoughts?
[1]: /messages/by-id/7ef8708c-5b6b-edd3-2cf2-7783f1c7c175@oss.nttdata.com
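A quick way to sanity-check the two-phase path after this change (table
and gid names here are only illustrative, and max_prepared_transactions
must be set above zero):

    CREATE TABLE some_table (a int);
    BEGIN;
    LOCK TABLE some_table IN ACCESS EXCLUSIVE MODE;
    PREPARE TRANSACTION 'p1';

    -- The prepared transaction's locks are held, not awaited, so
    -- waitstart should read back as NULL, with no spinlock error.
    SELECT granted, waitstart FROM pg_locks
     WHERE relation = 'some_table'::regclass;

    ROLLBACK PREPARED 'p1';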
Regards,
--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
Attachments:
bugfix_pg_locks_waitstart.patch (text/plain)
diff --git a/src/backend/access/transam/twophase.c b/src/backend/access/transam/twophase.c
index 70d22577ce..c9b13062b0 100644
--- a/src/backend/access/transam/twophase.c
+++ b/src/backend/access/transam/twophase.c
@@ -475,6 +475,7 @@ MarkAsPreparingGuts(GlobalTransaction gxact, TransactionId xid, const char *gid,
proc->lwWaitMode = 0;
proc->waitLock = NULL;
proc->waitProcLock = NULL;
+ pg_atomic_init_u64(&proc->waitStart, 0);
for (i = 0; i < NUM_LOCK_PARTITIONS; i++)
SHMQueueInit(&(proc->myProcLocks[i]));
/* subxid data must be filled later by GXactLoadSubxactData */
@@ -873,15 +874,8 @@ PGPROC *
TwoPhaseGetDummyProc(TransactionId xid, bool lock_held)
{
GlobalTransaction gxact = TwoPhaseGetGXact(xid, lock_held);
- PGPROC *dummy = &ProcGlobal->allProcs[gxact->pgprocno];
- /*
- * Initialize atomic variable in dummy proc so that GetLockStatusData()
- * can read it later.
- */
- pg_atomic_init_u64(&dummy->waitStart, 0);
-
- return dummy;
+ return &ProcGlobal->allProcs[gxact->pgprocno];
}
/************************************************************************/
diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c
index 0884909a22..8e7d21456a 100644
--- a/src/backend/storage/lmgr/proc.c
+++ b/src/backend/storage/lmgr/proc.c
@@ -281,6 +281,7 @@ InitProcGlobal(void)
*/
pg_atomic_init_u32(&(procs[i].procArrayGroupNext), INVALID_PGPROCNO);
pg_atomic_init_u32(&(procs[i].clogGroupNext), INVALID_PGPROCNO);
+ pg_atomic_init_u64(&(procs[i].waitStart), 0);
}
/*
@@ -402,7 +403,7 @@ InitProcess(void)
MyProc->lwWaitMode = 0;
MyProc->waitLock = NULL;
MyProc->waitProcLock = NULL;
- pg_atomic_init_u64(&MyProc->waitStart, 0);
+ pg_atomic_write_u64(&MyProc->waitStart, 0);
#ifdef USE_ASSERT_CHECKING
{
int i;
@@ -581,7 +582,7 @@ InitAuxiliaryProcess(void)
MyProc->lwWaitMode = 0;
MyProc->waitLock = NULL;
MyProc->waitProcLock = NULL;
- pg_atomic_init_u64(&MyProc->waitStart, 0);
+ pg_atomic_write_u64(&MyProc->waitStart, 0);
#ifdef USE_ASSERT_CHECKING
{
int i;
On 2021-02-16 16:59, Fujii Masao wrote:
On 2021/02/15 15:17, Fujii Masao wrote:
On 2021/02/10 10:43, Fujii Masao wrote:
On 2021/02/09 23:31, torikoshia wrote:
On 2021-02-09 22:54, Fujii Masao wrote:
On 2021/02/09 19:11, Fujii Masao wrote:
On 2021/02/09 18:13, Fujii Masao wrote:
On 2021/02/09 17:48, torikoshia wrote:
On 2021-02-05 18:49, Fujii Masao wrote:
On 2021/02/05 0:03, torikoshia wrote:
On 2021-02-03 11:23, Fujii Masao wrote:
64-bit fetches are not atomic on some platforms. So spinlock
is necessary when updating "waitStart" without holding the
partition lock? Also GetLockStatusData() needs spinlock when
reading "waitStart"?Also it might be worth thinking to use 64-bit atomic
operations like
pg_atomic_read_u64(), for that.Thanks for your suggestion and advice!
In the attached patch I used pg_atomic_read_u64() and
pg_atomic_write_u64().waitStart is TimestampTz i.e., int64, but it seems
pg_atomic_read_xxx and pg_atomic_write_xxx only supports
unsigned int, so I cast the type.I may be using these functions not correctly, so if something
is wrong, I would appreciate any comments.About the documentation, since your suggestion seems better
than v6, I used it as is.Thanks for updating the patch!
+ if (pg_atomic_read_u64(&MyProc->waitStart) == 0) + pg_atomic_write_u64(&MyProc->waitStart, + pg_atomic_read_u64((pg_atomic_uint64 *) &now));pg_atomic_read_u64() is really necessary? I think that
"pg_atomic_write_u64(&MyProc->waitStart, now)" is enough.+ deadlockStart = get_timeout_start_time(DEADLOCK_TIMEOUT); + pg_atomic_write_u64(&MyProc->waitStart, + pg_atomic_read_u64((pg_atomic_uint64 *) &deadlockStart));Same as above.
+ /* + * Record waitStart reusing the deadlock timeout timer. + * + * It would be ideal this can be synchronously done with updating + * lock information. Howerver, since it gives performance impacts + * to hold partitionLock longer time, we do it here asynchronously. + */IMO it's better to comment why we reuse the deadlock timeout
timer.proc->waitStatus = waitStatus;
+ pg_atomic_init_u64(&MyProc->waitStart, 0);pg_atomic_write_u64() should be used instead? Because waitStart
can be
accessed concurrently there.I updated the patch and addressed the above review comments.
Patch attached.
Barring any objection, I will commit this version.Thanks for modifying the patch!
I agree with your comments.BTW, I ran pgbench several times before and after applying
this patch.The environment is virtual machine(CentOS 8), so this is
just for reference, but there were no significant difference
in latency or tps(both are below 1%).Thanks for the test! I pushed the patch.
But I reverted the patch because buildfarm members rorqual and
prion don't like the patch. I'm trying to investigate the cause
of this failures.https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=rorqual&dt=2021-02-09%2009%3A20%3A10
- relation | locktype | mode ------------------+----------+--------------------- - test_prepared_1 | relation | RowExclusiveLock - test_prepared_1 | relation | AccessExclusiveLock -(2 rows) - +ERROR: invalid spinlock number: 0"rorqual" reported that the above error happened in the server
built with
--disable-atomics --disable-spinlocks when reading pg_locks after
the transaction was prepared. The cause of this issue is that
"waitStart"
atomic variable in the dummy proc created at the end of prepare
transaction
was not initialized. I updated the patch so that
pg_atomic_init_u64() is
called for the "waitStart" in the dummy proc for prepared
transaction.
Patch attached. I confirmed that the patched server built with
--disable-atomics --disable-spinlocks passed all the regression
tests.Thanks for fixing the bug, I also tested v9.patch configured with
Thanks for fixing the bug. I also tested v9.patch configured with
--disable-atomics --disable-spinlocks on my environment and confirmed
that all tests have passed.

Thanks for the test!
I found another bug in the patch. InitProcess() initializes "waitStart",
but previously InitAuxiliaryProcess() did not. This could cause an
"invalid spinlock number" error when reading pg_locks on a standby server.
I fixed that. Attached is the updated version of the patch.

I pushed this version. Thanks!
While reading the patch again, I found two minor things.

1. As discussed in another thread [1], the atomic variable "waitStart"
should be initialized at postmaster startup rather than at the startup
of each child process. I changed "waitStart" so that it's initialized in
InitProcGlobal() and also reset to 0 by using pg_atomic_write_u64() in
InitProcess() and InitAuxiliaryProcess().

2. Thanks to the above change, InitProcGlobal() initializes "waitStart"
even in the PGPROC entries for prepared transactions. But those entries
are zeroed in MarkAsPreparingGuts(), so "waitStart" needs to be
initialized again. Currently TwoPhaseGetDummyProc() initializes
"waitStart" in the PGPROC entry for the prepared transaction. But it's
better to do that in MarkAsPreparingGuts() instead, because that
function initializes the other PGPROC variables. So I moved that
initialization code from TwoPhaseGetDummyProc() to MarkAsPreparingGuts().

Patch attached. Thoughts?
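(Schematically, the division of labor after these two changes is as
follows; a sketch reduced to the lines relevant here, with the
surrounding code of each function elided:)

/* InitProcGlobal(): runs once in the postmaster, for every PGPROC slot. */
pg_atomic_init_u64(&procs[i].waitStart, 0);

/* InitProcess() and InitAuxiliaryProcess(): run in each child process;
 * the atomic already exists, so just reset its value. */
pg_atomic_write_u64(&MyProc->waitStart, 0);

/* MarkAsPreparingGuts(): the prepared transaction's PGPROC entry was
 * zeroed, so its atomic must be initialized again here, together with
 * the other PGPROC fields this function sets up. */
pg_atomic_init_u64(&proc->waitStart, 0);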
Thanks for updating the patch!
It seems to me that the modification is right.
I ran some regression tests but didn't find problems.
Regards,
--
Atsushi Torikoshi
On 2021/02/18 16:26, torikoshia wrote:
Thanks for updating the patch!
It seems to me that the modification is right.
I ran some regression tests but didn't find problems.
Thanks for the review and test! I pushed the patch.
Regards,
--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION