Detail part for still waiting for lock log message

Started by Tarvi Pillessaarover 12 years ago7 messages

tarvip@gmail.com

over 12 years ago

1 attachment(s)

Hello!

From time to time when investigating different locking issues using
postgres log i have thought that "process x is still waiting for"
message could be more informative, for example at the moment it is quite
painful to find out which session was actually holding that particular lock.

I would like to add detail part for this message to show information
about the lock holder and also show what the lock holder is actually
doing (yes, i know that i could get the lock holder's statement from the
log, but this not the same, maybe lock holder was idle in transaction).
So, i wrote a small patch, but i am not sure that this is the
best/correct way to do it.
Maybe someone can take a look at my patch and give some feedback.
Even if this idea won't reach to upstream, i would still like to get
feedback.

About patch:
Patch is tested against 9.2.4.
I was not sure that i should check if the lock holder's proclock was
found (as lock holder's proclock should be always there), check is there
to be on the safe side, but maybe it's unnecessary.
If it's not needed then fallback to old behavior (logging without
detail) is not needed as well.
And yes, i know that the lock holding time is not actually correct and
it actually shows milliseconds since transaction start.

Regards,
Tarvi Pillessaar

Attachments:

still_waiting_for_lock-v1.patchtext/x-patch; name=still_waiting_for_lock-v1.patchDownload

diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c
index 3bfdd23..4dc7b37 100644
--- a/src/backend/storage/lmgr/proc.c
+++ b/src/backend/storage/lmgr/proc.c
@@ -49,6 +49,7 @@
 #include "storage/procsignal.h"
 #include "storage/spin.h"
 #include "utils/timestamp.h"
+#include "pgstat.h"
 
 
 /* GUC variables */
@@ -1198,9 +1199,66 @@ ProcSleep(LOCALLOCK *locallock, LockMethod lockMethodTable)
 			}
 
 			if (myWaitStatus == STATUS_WAITING)
-				ereport(LOG,
-						(errmsg("process %d still waiting for %s on %s after %ld.%03d ms",
-							  MyProcPid, modename, buf.data, msecs, usecs)));
+			{
+				PROCLOCK   *proclock2;
+				PGPROC *proc2;
+				PgBackendStatus *be;
+				bool found = false;
+
+				/* find lock holder */
+				proclock2 = (PROCLOCK *) SHMQueueNext(&(lock->procLocks), &(lock->procLocks),
+						offsetof(PROCLOCK, lockLink));
+				while (proclock2)
+				{
+					if (lockMethodTable->conflictTab[lockmode] & proclock2->holdMask)
+						break;
+
+					proclock2 = (PROCLOCK *) SHMQueueNext(&(lock->procLocks), &proclock2->lockLink,
+							offsetof(PROCLOCK, lockLink));
+				}
+
+				if (proclock2)
+				{
+					proc2 = proclock2->tag.myProc;
+
+					/* get lock holder's beentry */
+					int numbackends = pgstat_fetch_stat_numbackends();
+					for (i = 1; i <= numbackends; i++)
+					{
+						be = pgstat_fetch_stat_beentry(i);
+						if (be)
+						{
+							if (be->st_procpid == proc2->pid)
+							{
+								found = true;
+								break;
+							}
+						}
+					}
+				}
+
+				if (found)
+				{
+					long secs2;
+					int usecs2;
+					long msecs2;
+
+					/* calculate lock holder's tx duration */
+					TimestampDifference(be->st_xact_start_timestamp, GetCurrentTimestamp(), &secs2, &usecs2);
+					msecs2 = secs2 * 1000 + usecs2 / 1000;
+					usecs2 = usecs2 % 1000;
+
+					ereport(LOG,
+							(errmsg("process %d still waiting for %s on %s after %ld.%03d ms",
+									MyProcPid, modename, buf.data, msecs, usecs),
+							 errdetail_log("process %d is holding lock for %ld.%03d ms, activity: %s",
+									proc2->pid, msecs2, usecs2, be->st_activity)));
+				}
+				else
+					ereport(LOG,
+							(errmsg("process %d still waiting for %s on %s after %ld.%03d ms",
+									MyProcPid, modename, buf.data, msecs, usecs)));
+			}
 			else if (myWaitStatus == STATUS_OK)
 				ereport(LOG,
 					(errmsg("process %d acquired %s on %s after %ld.%03d ms",

Kevin Grittner

kgrittn@ymail.com

over 12 years ago

In reply to: Tarvi Pillessaar (#1)

Re: Detail part for still waiting for lock log message

Tarvi Pillessaar <tarvip@gmail.com> wrote:

Maybe someone can take a look at my patch and give some feedback.

Even if this idea won't reach to upstream, i would still like to get
feedback.

Please add the patch to the open CommitFest.

You can read about the process here:

http://wiki.postgresql.org/wiki/CommitFest

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Peter Eisentraut

peter_e@gmx.net

over 12 years ago

In reply to: Tarvi Pillessaar (#1)

Re: Detail part for still waiting for lock log message

On Tue, 2013-08-20 at 19:21 +0300, Tarvi Pillessaar wrote:

About patch:
Patch is tested against 9.2.4.
I was not sure that i should check if the lock holder's proclock was
found (as lock holder's proclock should be always there), check is there
to be on the safe side, but maybe it's unnecessary.
If it's not needed then fallback to old behavior (logging without
detail) is not needed as well.
And yes, i know that the lock holding time is not actually correct and
it actually shows milliseconds since transaction start.

Please fix this compiler warning:

proc.c: In function ‘ProcSleep’:
proc.c:1258:6: warning: ISO C90 forbids mixed declarations and code [-Wdeclaration-after-statement]

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Tarvi Pillessaar

tarvip@gmail.com

over 12 years ago

In reply to: Peter Eisentraut (#3)

1 attachment(s)

Re: Detail part for still waiting for lock log message

Fixed patch attached.

Regards,
Tarvi Pillessaar

Show quoted text

On 24.08.2013 17:58, Peter Eisentraut wrote:

On Tue, 2013-08-20 at 19:21 +0300, Tarvi Pillessaar wrote:

About patch:
Patch is tested against 9.2.4.
I was not sure that i should check if the lock holder's proclock was
found (as lock holder's proclock should be always there), check is there
to be on the safe side, but maybe it's unnecessary.
If it's not needed then fallback to old behavior (logging without
detail) is not needed as well.
And yes, i know that the lock holding time is not actually correct and
it actually shows milliseconds since transaction start.

Please fix this compiler warning:

proc.c: In function ‘ProcSleep’:
proc.c:1258:6: warning: ISO C90 forbids mixed declarations and code [-Wdeclaration-after-statement]

Attachments:

still_waiting_for_lock-v2.patchtext/x-patch; name=still_waiting_for_lock-v2.patchDownload

diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c
index 3bfdd23..d75ca93 100644
--- a/src/backend/storage/lmgr/proc.c
+++ b/src/backend/storage/lmgr/proc.c
@@ -49,6 +49,7 @@
 #include "storage/procsignal.h"
 #include "storage/spin.h"
 #include "utils/timestamp.h"
+#include "pgstat.h"
 
 
 /* GUC variables */
@@ -1198,9 +1199,67 @@ ProcSleep(LOCALLOCK *locallock, LockMethod lockMethodTable)
 			}
 
 			if (myWaitStatus == STATUS_WAITING)
-				ereport(LOG,
-						(errmsg("process %d still waiting for %s on %s after %ld.%03d ms",
-							  MyProcPid, modename, buf.data, msecs, usecs)));
+			{
+				PROCLOCK   *proclock2;
+				PGPROC *proc2;
+				PgBackendStatus *be;
+				bool found = false;
+
+				/* find lock holder */
+				proclock2 = (PROCLOCK *) SHMQueueNext(&(lock->procLocks), &(lock->procLocks),
+						offsetof(PROCLOCK, lockLink));
+				while (proclock2)
+				{
+					if (lockMethodTable->conflictTab[lockmode] & proclock2->holdMask)
+						break;
+
+					proclock2 = (PROCLOCK *) SHMQueueNext(&(lock->procLocks), &proclock2->lockLink,
+							offsetof(PROCLOCK, lockLink));
+				}
+
+				if (proclock2)
+				{
+					int numbackends;
+
+					/* get lock holder's beentry */
+					proc2 = proclock2->tag.myProc;
+					numbackends = pgstat_fetch_stat_numbackends();
+					for (i = 1; i <= numbackends; i++)
+					{
+						be = pgstat_fetch_stat_beentry(i);
+						if (be)
+						{
+							if (be->st_procpid == proc2->pid)
+							{
+								found = true;
+								break;
+							}
+						}
+					}
+				}
+
+				if (found)
+				{
+					long secs2;
+					int usecs2;
+					long msecs2;
+
+					/* calculate lock holder's tx duration */
+					TimestampDifference(be->st_xact_start_timestamp, GetCurrentTimestamp(), &secs2, &usecs2);
+					msecs2 = secs2 * 1000 + usecs2 / 1000;
+					usecs2 = usecs2 % 1000;
+
+					ereport(LOG,
+							(errmsg("process %d still waiting for %s on %s after %ld.%03d ms",
+									MyProcPid, modename, buf.data, msecs, usecs),
+							 errdetail_log("process %d is holding lock for %ld.%03d ms, activity: %s",
+									proc2->pid, msecs2, usecs2, be->st_activity)));
+				}
+				else
+					ereport(LOG,
+							(errmsg("process %d still waiting for %s on %s after %ld.%03d ms",
+									MyProcPid, modename, buf.data, msecs, usecs)));
+			}
 			else if (myWaitStatus == STATUS_OK)
 				ereport(LOG,
 					(errmsg("process %d acquired %s on %s after %ld.%03d ms",

Alvaro Herrera

alvherre@2ndquadrant.com

over 12 years ago

In reply to: Tarvi Pillessaar (#4)

Re: Detail part for still waiting for lock log message

Tarvi Pillessaar escribiï¿½:

Fixed patch attached.

1. this assumes there is only one holder, which is not correct.
(Consider two backends holding shared lock on something and another one
stuck trying to acquire exclusive)

2. I think pgstat_get_backend_current_activity() can be helpful.

3. Doesn't this risk excessive overhead? Can the other backends be gone
(or done with the lock) before the report has completed? If this
happens, is there a problem?

--
ï¿½lvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Tarvi Pillessaar

tarvip@gmail.com

over 12 years ago

In reply to: Alvaro Herrera (#5)

Re: Detail part for still waiting for lock log message

Thank you for feedback.

On 26.08.2013 22:20, Alvaro Herrera wrote:

1. this assumes there is only one holder, which is not correct.
(Consider two backends holding shared lock on something and another one
stuck trying to acquire exclusive)

Hmm, true. Then it's not that simple as I thought in first place.
I guess it is possible to find all those backends that are holding that
shared lock, but i'm not sure about the usefulness anymore as this list
can be huge.

2. I think pgstat_get_backend_current_activity() can be helpful.

Yes, I saw that function, but i also wanted tx start time.

3. Doesn't this risk excessive overhead?

About the overhead, i may be wrong, but i was thinking that that
particular backend will be put to sleep anyway and also in normal
workload such log messages are not very common (or at least shouldn't be).

Can the other backends be gone
(or done with the lock) before the report has completed? If this
happens, is there a problem?

That's why i have added check if other backend is found etc, but maybe i
missed something. This is one of reasons why i wanted feedback.

Regards
Tarvi Pillessaar

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Alvaro Herrera

alvherre@2ndquadrant.com

over 12 years ago

In reply to: Tarvi Pillessaar (#6)

Re: Detail part for still waiting for lock log message

Tarvi Pillessaar escribiï¿½:

Thank you for feedback.

On 26.08.2013 22:20, Alvaro Herrera wrote:

1. this assumes there is only one holder, which is not correct.
(Consider two backends holding shared lock on something and another one
stuck trying to acquire exclusive)

Hmm, true. Then it's not that simple as I thought in first place.
I guess it is possible to find all those backends that are holding
that shared lock, but i'm not sure about the usefulness anymore as
this list can be huge.

We already do this for deadlock reports, so I think it's fine.

2. I think pgstat_get_backend_current_activity() can be helpful.

Yes, I saw that function, but i also wanted tx start time.

So add that. But note the existing function has checks for
"changecount" and stuff that your code didn't. I think for robustness
you need to add that too.

3. Doesn't this risk excessive overhead?

About the overhead, i may be wrong, but i was thinking that that
particular backend will be put to sleep anyway and also in normal
workload such log messages are not very common (or at least
shouldn't be).

I guess there's no way to know how long the process is going to sleep
after logging this message.

Can the other backends be gone
(or done with the lock) before the report has completed? If this
happens, is there a problem?

That's why i have added check if other backend is found etc, but
maybe i missed something. This is one of reasons why i wanted
feedback.

Make sure you understand the "changecount" stuff and the reasons to use
it in pgstat_get_backend_current_activity.

--
ï¿½lvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers