Patch: show relation and tuple infos of a lock to acquire

Started by Christian Kruseover 12 years ago63 messageshackers
Jump to latest
#1Christian Kruse
christian@2ndquadrant.com

Hi there,

I created a patch improving the log_lock_wait messages by adding
relation infos (name and OID) as well as tuple infos (if present –
ctid and, if available, the tuple itself) in the context.

Sample output (log_lock_waits=on required):

session 1:
CREATE TABLE foo (val integer);
INSERT INTO foo (val) VALUES (1);
BEGIN;
UPDATE foo SET val = 3;

session 2:
BEGIN;
UPDATE TABLE foo SET val = 2;

Output w/o patch:

LOG: process 24774 acquired ShareLock on transaction 696 after 11688.720 ms

Output with patch:

LOG: process 24774 acquired ShareLock on transaction 696 after 11688.720 ms
CONTEXT: relation name: foo (OID 16385)
tuple (ctid (0,1)): (1)

I am not really sure where to put the functions. Currently they are
located in backend/storage/lmgr/lmgr.c because XactLockTableWait() is
located there, too. What do you think?

I also created a test spec for easy creation of the log output;
however, I was not able to provide an expected file since the process
IDs vary from test run to test run.

Regards,

--
Christian Kruse http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachments:

show_table_name_and_tuple_in_lock_log.patchtext/x-diff; charset=us-asciiDownload+209-1
lock-log-contains-tuple-and-relation.spectext/plain; charset=us-asciiDownload
#2Simon Riggs
simon@2ndQuadrant.com
In reply to: Christian Kruse (#1)
Re: Patch: show relation and tuple infos of a lock to acquire

On 31 December 2013 09:12, Christian Kruse <christian@2ndquadrant.com> wrote:

Output w/o patch:

LOG: process 24774 acquired ShareLock on transaction 696 after 11688.720 ms

Output with patch:

LOG: process 24774 acquired ShareLock on transaction 696 after 11688.720 ms
CONTEXT: relation name: foo (OID 16385)
tuple (ctid (0,1)): (1)

That is useful info.

I think the message should be changed to say this only, without a context line

LOG: process 24774 acquired ShareLock on relation "foo" (OID 16385)
tuple (0,1) after 11688.720 ms

My reason is that pid 24774 was waiting for a *tuple lock* and it was
eventually granted, so thats what it should say. ALL locks wait for
the current transaction that holds the lock to complete, but only
tuple locks currently refer to a lock wait as a wait for a
transaction. That is confusing for users, as well as being
inconsistent with other existing messages.

Replacing the old text with the new info would represent a net
improvement in usefulness and understandability.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#3Tom Lane
tgl@sss.pgh.pa.us
In reply to: Simon Riggs (#2)
Re: Patch: show relation and tuple infos of a lock to acquire

Simon Riggs <simon@2ndQuadrant.com> writes:

On 31 December 2013 09:12, Christian Kruse <christian@2ndquadrant.com> wrote:

Output with patch:

LOG: process 24774 acquired ShareLock on transaction 696 after 11688.720 ms
CONTEXT: relation name: foo (OID 16385)
tuple (ctid (0,1)): (1)

That is useful info.

I think the message should be changed to say this only, without a context line

LOG: process 24774 acquired ShareLock on relation "foo" (OID 16385)
tuple (0,1) after 11688.720 ms

My reason is that pid 24774 was waiting for a *tuple lock* and it was
eventually granted, so thats what it should say.

No, that wasn't what it was waiting for, and lying to the user like that
isn't going to make things better.

Christian's idea of a context line seems plausible to me. I don't
care for this implementation too much --- a global variable? Ick.
Make a wrapper function for XactLockTableWait instead, please.
And I'm not real sure that showing the whole tuple contents is a good
thing; I can see people calling that a security leak, not to mention
that the performance consequences could be dire.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#4Simon Riggs
simon@2ndQuadrant.com
In reply to: Tom Lane (#3)
Re: Patch: show relation and tuple infos of a lock to acquire

On 31 December 2013 16:36, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Simon Riggs <simon@2ndQuadrant.com> writes:

On 31 December 2013 09:12, Christian Kruse <christian@2ndquadrant.com> wrote:

Output with patch:

LOG: process 24774 acquired ShareLock on transaction 696 after 11688.720 ms
CONTEXT: relation name: foo (OID 16385)
tuple (ctid (0,1)): (1)

That is useful info.

I think the message should be changed to say this only, without a context line

LOG: process 24774 acquired ShareLock on relation "foo" (OID 16385)
tuple (0,1) after 11688.720 ms

My reason is that pid 24774 was waiting for a *tuple lock* and it was
eventually granted, so thats what it should say.

No, that wasn't what it was waiting for, and lying to the user like that
isn't going to make things better.

"Like that" is ambiguous and I don't understand you or what you are
objecting to.

When we run SELECT ... FOR SHARE we are attempting to lock rows. Why
is the transaction we are waiting for important when we wait to lock
rows, but when we wait to lock relations it isn't?

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

In reply to: Christian Kruse (#1)
Re: Patch: show relation and tuple infos of a lock to acquire

On Tue, Dec 31, 2013 at 1:12 AM, Christian Kruse
<christian@2ndquadrant.com> wrote:

I created a patch improving the log_lock_wait messages by adding
relation infos (name and OID) as well as tuple infos (if present –
ctid and, if available, the tuple itself) in the context.

I think that this is a worthwhile effort, but like Tom I don't think
that it's universally true that these waiters are waiting on a tuple
lock. Most obviously, the XactLockTableWait() call you've modified
within nbtinsert.c is not. In actuality, it's waiting on what I
informally refer to as a "value lock" for a value being inserted into
a unique index. Unlike the tuple locking code, a LockTuple() call in
advance of the XactLockTableWait() call, to arbitrate ordering, is not
present. No tuple is locked at all.

ISTM that you should be printing just the value and the unique index
there, and not any information about the tuple proper. Doing any less
could be highly confusing. You should do an analysis of where this
kind of exception applies. For better direction about where that new
infrastructure ought to live, you might find some useful insight from
commit 991f3e5ab3f8196d18d5b313c81a5f744f3baaea.

--
Peter Geoghegan

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#6Christian Kruse
christian@2ndquadrant.com
In reply to: Peter Geoghegan (#5)
Re: Patch: show relation and tuple infos of a lock to acquire

Hi,

On 31/12/13 13:56, Peter Geoghegan wrote:

I think that this is a worthwhile effort, but like Tom I don't think
that it's universally true that these waiters are waiting on a tuple
lock. Most obviously, the XactLockTableWait() call you've modified
within nbtinsert.c is not.

This is why I don't set the tuple data in this case:

XactLockTableWaitSetupErrorContextCallback(rel, NULL);

The second argument is the tuple argument. If it is set to NULL in the
error context callback, all output regarding tuple are suppressed.

ISTM that you should be printing just the value and the unique index
there, and not any information about the tuple proper.

Good point, I will have a look at this.

For better direction about where that new
infrastructure ought to live, you might find some useful insight from
commit 991f3e5ab3f8196d18d5b313c81a5f744f3baaea.

Thanks for the pointer!

But, to be honest, I am still unsure where to put this. As far as I
understand this commit has substantial parts in relcache.c and
elog.c – both don't seem to be very good fitting places?

Regards,

--
Christian Kruse http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#7Christian Kruse
christian@2ndquadrant.com
In reply to: Tom Lane (#3)
Re: Patch: show relation and tuple infos of a lock to acquire

On 31/12/13 11:36, Tom Lane wrote:

Christian's idea of a context line seems plausible to me. I don't
care for this implementation too much --- a global variable? Ick.
Make a wrapper function for XactLockTableWait instead, please.

Point taken. You are right.

And I'm not real sure that showing the whole tuple contents is a good
thing; I can see people calling that a security leak, not to mention
that the performance consequences could be dire.

What do you suggest to include? Having some information to identify
the tuple may be very helpful for debugging. This is why I did it this
way.

Regards,

--
Christian Kruse http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#8Simon Riggs
simon@2ndQuadrant.com
In reply to: Simon Riggs (#4)
Re: Patch: show relation and tuple infos of a lock to acquire

On 31 December 2013 17:06, Simon Riggs <simon@2ndquadrant.com> wrote:

On 31 December 2013 16:36, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Simon Riggs <simon@2ndQuadrant.com> writes:

On 31 December 2013 09:12, Christian Kruse <christian@2ndquadrant.com> wrote:

Output with patch:

LOG: process 24774 acquired ShareLock on transaction 696 after 11688.720 ms
CONTEXT: relation name: foo (OID 16385)
tuple (ctid (0,1)): (1)

That is useful info.

I think the message should be changed to say this only, without a context line

LOG: process 24774 acquired ShareLock on relation "foo" (OID 16385)
tuple (0,1) after 11688.720 ms

My reason is that pid 24774 was waiting for a *tuple lock* and it was
eventually granted, so thats what it should say.

No, that wasn't what it was waiting for, and lying to the user like that
isn't going to make things better.

"Like that" is ambiguous and I don't understand you or what you are
objecting to.

When we run SELECT ... FOR SHARE we are attempting to lock rows. Why
is the transaction we are waiting for important when we wait to lock
rows, but when we wait to lock relations it isn't?

My thought is that this message refers to a lock wait for a
transaction to end, which is made as part of a request to lock a row.
But it is not 100% of the lock request and a race condition exists
that means that we might need to wait multiple times in rare
circumstances.

So reporting that tuple lock has been *acquired* would be inaccurate,
since at that point it isn't true. So we need to describe the
situation better, e.g. "as part of waiting for a tuple lock we waited
on a transaction". Not very snappy.

Anyway, the important thing is that we can work out the tie being
waited for. Maybe logging the PK value would be useful as well, but
not the whole row.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#9Andres Freund
andres@anarazel.de
In reply to: Peter Geoghegan (#5)
Re: Patch: show relation and tuple infos of a lock to acquire

On 2013-12-31 13:56:53 -0800, Peter Geoghegan wrote:

ISTM that you should be printing just the value and the unique index
there, and not any information about the tuple proper. Doing any less
could be highly confusing.

I agree that the message needs improvement, but I don't agree that we
shouldn't lock the tuple's location. If you manually investigate the
situation that's where you'll find the conflicting tuple - I don't see
what we gain by not logging the ctid.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#10Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#3)
Re: Patch: show relation and tuple infos of a lock to acquire

On 2013-12-31 11:36:36 -0500, Tom Lane wrote:

Simon Riggs <simon@2ndQuadrant.com> writes:

On 31 December 2013 09:12, Christian Kruse <christian@2ndquadrant.com> wrote:

Output with patch:

LOG: process 24774 acquired ShareLock on transaction 696 after 11688.720 ms
CONTEXT: relation name: foo (OID 16385)
tuple (ctid (0,1)): (1)

That is useful info.

I think the message should be changed to say this only, without a context line

LOG: process 24774 acquired ShareLock on relation "foo" (OID 16385)
tuple (0,1) after 11688.720 ms

My reason is that pid 24774 was waiting for a *tuple lock* and it was
eventually granted, so thats what it should say.

No, that wasn't what it was waiting for, and lying to the user like that
isn't going to make things better.

Agreed.

Christian's idea of a context line seems plausible to me. I don't
care for this implementation too much --- a global variable? Ick.

Yea, the data should be stored in ErrorContextCallback.arg instead.

Make a wrapper function for XactLockTableWait instead, please.

I don't think that'd work out all too nicely - we do the
XactLockTableWait() inside other functions like MultiXactIdWait(),
passing all the detail along for those would end up being ugly. So using
error context callbacks properly seems like the best way in the end.

And I'm not real sure that showing the whole tuple contents is a good
thing; I can see people calling that a security leak, not to mention
that the performance consequences could be dire.

I don't think that the security argument has too much merit given
today's PG - we already log the full tuple for various constraint
violations. And we also accept the performance penalty there. I don't
see any easy way to select a sensible subset of columns to print here,
and printing the columns is what would make investigating issues around
this so much easier. At the very least we'd need to print the pkey
columns and the columns of the unique key that might have been violated.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

In reply to: Andres Freund (#9)
Re: Patch: show relation and tuple infos of a lock to acquire

On Thu, Jan 2, 2014 at 12:56 AM, Andres Freund <andres@2ndquadrant.com> wrote:

I agree that the message needs improvement, but I don't agree that we
shouldn't lock the tuple's location. If you manually investigate the
situation that's where you'll find the conflicting tuple - I don't see
what we gain by not logging the ctid.

I suppose so, but the tuple probably isn't going to be visible anyway,
at least when the message is initially logged.

--
Peter Geoghegan

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#12Andres Freund
andres@anarazel.de
In reply to: Peter Geoghegan (#11)
Re: Patch: show relation and tuple infos of a lock to acquire

On 2014-01-02 01:40:38 -0800, Peter Geoghegan wrote:

On Thu, Jan 2, 2014 at 12:56 AM, Andres Freund <andres@2ndquadrant.com> wrote:

I agree that the message needs improvement, but I don't agree that we
shouldn't lock the tuple's location. If you manually investigate the
situation that's where you'll find the conflicting tuple - I don't see
what we gain by not logging the ctid.

I suppose so, but the tuple probably isn't going to be visible anyway,
at least when the message is initially logged.

But that's the case for all these, otherwise we wouldn't wait on the
row.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#13Christian Kruse
christian@2ndquadrant.com
In reply to: Andres Freund (#10)
Re: Patch: show relation and tuple infos of a lock to acquire

Hi,

On 02/01/14 10:02, Andres Freund wrote:

Christian's idea of a context line seems plausible to me. I don't
care for this implementation too much --- a global variable? Ick.

Yea, the data should be stored in ErrorContextCallback.arg instead.

Fixed.

I also palloc() the ErrorContextCallback itself. But it doesn't feel
right since I could not find a piece of code doing so. What do you
think?

Make a wrapper function for XactLockTableWait instead, please.

I don't think that'd work out all too nicely - we do the
XactLockTableWait() inside other functions like MultiXactIdWait(),
passing all the detail along for those would end up being ugly. So using
error context callbacks properly seems like the best way in the end.

Wrapping XactLockTableWait()/MultiXactIdWait() at least allows the
ErrorContextCallback and ErrorContextCallback.arg to live on the
stack. So what we could do is wrap this in a macro instead of a
function (like e.g. PG_TRY) or write two different functions. And I
don't like the two functions approach since it means duplication of
code.

While writing the response I really think writing a macro in PG_TRY
style for setting up and cleaning the error context callback I begin
to think that this would be the best way to go.

And I'm not real sure that showing the whole tuple contents is a good
thing; I can see people calling that a security leak, not to mention
that the performance consequences could be dire.

I don't think that the security argument has too much merit given
today's PG - we already log the full tuple for various constraint
violations. And we also accept the performance penalty there. I don't
see any easy way to select a sensible subset of columns to print here,
and printing the columns is what would make investigating issues around
this so much easier. At the very least we'd need to print the pkey
columns and the columns of the unique key that might have been violated.

I agree. Even printing only the PK values would be much more
complex. As far as I can see we would even have to gain another lock
for this (see comment for relationHasPrimaryKey() in
src/backend/catalog/index.c).

Regards,

--
Christian Kruse http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachments:

show_table_name_and_tuple_in_lock_log_v2.patchtext/x-diff; charset=us-asciiDownload+215-1
#14Amit Kapila
amit.kapila16@gmail.com
In reply to: Christian Kruse (#13)
Re: Patch: show relation and tuple infos of a lock to acquire

On Thu, Jan 2, 2014 at 4:38 PM, Christian Kruse
<christian@2ndquadrant.com> wrote:

Hi,

On 02/01/14 10:02, Andres Freund wrote:

Christian's idea of a context line seems plausible to me. I don't
care for this implementation too much --- a global variable? Ick.

Yea, the data should be stored in ErrorContextCallback.arg instead.

Fixed.

I have looked into this patch. Below are some points which I
think should be improved in this patch:

1. New context added by this patch is display at wrong place
Session-1
Create table foo(c1 int);
insert into foo values(generate_series(1,10);
Begin;
update foo set c1 =11;

Session-2
postgres=# begin;
BEGIN
postgres=# update foo set val1 = 13 where val1=3;
Cancel request sent
ERROR: canceling statement due to user request
CONTEXT: relation name: foo (OID 57496)
tuple ctid (0,7)

Do this patch expect to print the context at cancel request
as well?
I don't find it useful. I think the reason is that after setup of
context, if any other error happens, it will display the newly setup
context.

2a. About the message:
LOG: process 1936 still waiting for ShareLock on transaction 842
after 1014.000ms
CONTEXT: relation name: foo (OID 57496)
tuple (ctid (0,1)): (1, 2, abc )

Isn't it better to display database name as well, as if we see in
one of related messages as below, it displays database name as well.

"LOG: process 3012 still waiting for ExclusiveLock on tuple (0,1) of
relation 57
499 of database 12045 after 1014.000 ms"

2b. I think there is no need to display *ctid* before tuple offset, it seems
to be obvious as well as consistent with other similar message.

3. About callback
I think rather than calling setup/tear down functions from multiple places,
it will be better to have wrapper functions for
XactTableLockWait() and MultiXactIdWait().

Just check in plpgsql_exec_function(), how the errorcallback is set,
can we do something similar without to avoid palloc?

I think we still need to see how to avoid issue mentioned in point-1.

4. About the values of tuple
I think it might be useful to display some unique part of tuple for
debugging part, but what will we do incase there is no primary
key on table, so may be we can display initial few columns (2 or 3)
or just display tuple offset as is done in other similar message
shown above. More to the point, I have observed that in few cases
displaying values of tuple can be confusing as well. For Example:

Session-1
Create table foo(c1 int);
postgres=# insert into foo values(generate_series(1,3));
INSERT 0 3
postgres=# begin;
BEGIN
postgres=# update foo set c1 = 4;
UPDATE 3

Session-2
postgres=# update foo set c1=6;
LOG: process 1936 still waiting for ShareLock on transaction 884 after 1014.000
ms
CONTEXT: relation name: foo (OID 57499)
tuple (ctid (0,1)): (1)

Session-3
postgres=# begin;
BEGIN
postgres=# update foo set c1=5 where c1 =3;
LOG: process 3012 still waiting for ShareLock on transaction 884 after 1015.000
ms
CONTEXT: relation name: foo (OID 57499)
tuple (ctid (0,3)): (3)

Session-1
Commit;

Session-2
LOG: process 1936 acquired ShareLock on transaction 884 after 25294.000 ms
CONTEXT: relation name: foo (OID 57499)
tuple (ctid (0,1)): (1)
UPDATE 3

Session-3
LOG: process 3012 acquired ShareLock on transaction 884 after 13217.000 ms
CONTEXT: relation name: foo (OID 57499)
tuple (ctid (0,3)): (3)
LOG: process 3012 still waiting for ShareLock on transaction 885 after 1014.000
ms
CONTEXT: relation name: foo (OID 57499)
tuple (ctid (0,6)): (4)

Now here it will be bit confusing to display values with tuple,
as in session-3 statement has asked to update value (3), but we have started
waiting for value (4). Although it is right, but might not make much sense.
Also after session-2 commits the transaction, session-3 will say acquired lock,
however still it will not be able to update it.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#15Christian Kruse
christian@2ndquadrant.com
In reply to: Amit Kapila (#14)
Re: Patch: show relation and tuple infos of a lock to acquire

Hi,

On 19.02.2014 08:11, Amit Kapila wrote:

I have looked into this patch. Below are some points which I
think should be improved in this patch:

Thank you for your review.

1. New context added by this patch is display at wrong place
[…]
Do this patch expect to print the context at cancel request
as well?
I don't find it useful. I think the reason is that after setup of
context, if any other error happens, it will display the newly setup
context.

To be honest, I don't see a problem here. If you cancel the request in
most cases it is because it doesn't finish in an acceptable time. So
displaying the context seems logical to me.

2a. About the message:
LOG: process 1936 still waiting for ShareLock on transaction 842
after 1014.000ms
CONTEXT: relation name: foo (OID 57496)
tuple (ctid (0,1)): (1, 2, abc )

Isn't it better to display database name as well, as if we see in
one of related messages as below, it displays database name as well.

"LOG: process 3012 still waiting for ExclusiveLock on tuple (0,1) of
relation 57
499 of database 12045 after 1014.000 ms"

Maybe. But then we either have duplicate infos in some cases (e.g. on
a table lock) or we have to distinguish error cases and show distinct
contextes. And also getting the database name from a relation is not
really straight forward (according to Andres we would have to look at
rel->rd_node.dbNode) and since other commands dealing with names don't
do so, either, I think we should leave out the database name.

2b. I think there is no need to display *ctid* before tuple offset, it seems
to be obvious as well as consistent with other similar message.

Ok, I'll drop it.

3. About callback I think rather than calling setup/tear down
functions from multiple places, it will be better to have wrapper
functions for XactTableLockWait() and MultiXactIdWait(). Just check
in plpgsql_exec_function(), how the errorcallback is set, can we do
something similar without to avoid palloc?

OK, Attached.

4. About the values of tuple
I think it might be useful to display some unique part of tuple for
debugging part, but what will we do incase there is no primary
key on table, so may be we can display initial few columns (2 or 3)
or just display tuple offset as is done in other similar message
shown above.

Currently I simply display the whole tuple with truncating long
fields. This is plain easy, no distinction necessary. I did some code
reading and it seems somewhat complex to get the PK columns and it
seems that we need another lock for this, too – maybe not the best
idea when we are already in locking trouble, do you disagree?

Also showing just a few columns when no PK is available might not be
helpful since the tuple is not necessarily identified by the columns
showed. And since we drop the CTID there is no alternative solution to
identify the tuple.

IMHO simply displaying the whole tuple is the best solution in this
case, do you agree?

More to the point, I have observed that in few cases
displaying values of tuple can be confusing as well. For Example:
[…]
Now here it will be bit confusing to display values with tuple, as
in session-3 statement has asked to update value (3), but we have
started waiting for value (4). Although it is right, but might not
make much sense.

What do you suggest to avoid confusion? I can see what you are talking
about, but I'm not sure what to do about it. Keep in mind that this
patch should help debugging locking, so IMHO it is essential to be
able to identify a tuple and therefore knowing its values.

Also after session-2 commits the transaction, session-3 will say
acquired lock, however still it will not be able to update it.

To be honest, I don't think that this is a problem of the
patch. Concurrency is not easy and it might lead to confusing
situations. I don't think that we should drop the tuple values because
of this, since they provide useful and precious debugging information.

Attached you will find a new version of the patch, mainly using
wrapper functions for XactLockTableWait() and MultiXactIdWait().

Best regards,

--
Christian Kruse http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachments:

show_table_name_and_tuple_in_lock_log_v4.patchtext/x-diff; charset=us-asciiDownload+200-16
#16Amit Kapila
amit.kapila16@gmail.com
In reply to: Christian Kruse (#15)
Re: Patch: show relation and tuple infos of a lock to acquire

On Fri, Feb 21, 2014 at 4:55 PM, Christian Kruse
<christian@2ndquadrant.com> wrote:

Hi,

1. New context added by this patch is display at wrong place
[...]
Do this patch expect to print the context at cancel request
as well?

To be honest, I don't see a problem here. If you cancel the request in
most cases it is because it doesn't finish in an acceptable time. So
displaying the context seems logical to me.

Isn't it a matter of consistency, we might not be able to display it
on all long running updates, rather only when it goes for update on tuple
on which some other transaction is operating.

Also though this functionality is mainly going to be useful for the case
when log_lock_waits = on, but it will display newly added context even
without that.

Is it possible that we set callback to error_context_stack, only
when we go for displaying this message. The way to do could be that
rather than forming callback in local variable in fuction
XactLockTableWaitWithErrorCallback(), store it in global variable
and then use that at appropriate place to set error_context_stack.

In general, why I am suggesting to restrict display of newly added
context for the case it is added to ensure that it doesn't get started
displaying in other cases where it might make sense or might not
make sense.

I think if it is too tedious to do it or there are some problems
due to which we cannot restrict it for case it has been added, then
we can think of going with it.

2a. About the message:
LOG: process 1936 still waiting for ShareLock on transaction 842
after 1014.000ms
CONTEXT: relation name: foo (OID 57496)
tuple (ctid (0,1)): (1, 2, abc )

Isn't it better to display database name as well, as if we see in
one of related messages as below, it displays database name as well.

"LOG: process 3012 still waiting for ExclusiveLock on tuple (0,1) of
relation 57
499 of database 12045 after 1014.000 ms"

Maybe. But then we either have duplicate infos in some cases (e.g. on
a table lock) or we have to distinguish error cases and show distinct
contextes.

I think if we go with the way suggested above, then this should
not be problem, or do you think it can still create duplicate
info problem.

And also getting the database name from a relation is not
really straight forward

I think there should not be any complication, we can do something like
get_database_name(rel->rd_node.dbNode);

(according to Andres we would have to look at
rel->rd_node.dbNode) and since other commands dealing with names don't
do so, either, I think we should leave out the database name.

For this case as I have shown that in similar message, we already display
database oid, it should not be a problem to display here as well.
It will be more information to user.

2b. I think there is no need to display *ctid* before tuple offset, it seems
to be obvious as well as consistent with other similar message.

Ok, I'll drop it.

I have asked to just remove *ctid* from info not the actual info of
tuple location. It should look like tuple (0,1).

The way you have currently changed the code, it is crashing the backend.

Another thing related to this is that below code should  also not use *ctid*
+ if (geterrlevel() >= ERROR)
+ {
+ errcontext("tuple ctid (%u,%u)",
+   BlockIdGetBlockNumber(&(info->tuple->t_self.ip_blkid)),
+   info->tuple->t_self.ip_posid);
+ }

3. About callback I think rather than calling setup/tear down
functions from multiple places, it will be better to have wrapper
functions for XactTableLockWait() and MultiXactIdWait(). Just check
in plpgsql_exec_function(), how the errorcallback is set, can we do
something similar without to avoid palloc?

OK, Attached.

It looks better than previous, one minor suggestion:
Function name XactLockTableWaitWithErrorCallback() looks bit awkward
to me, shall we name something like XactLockTableWaitWithInfo()?

4. About the values of tuple
I think it might be useful to display some unique part of tuple for
debugging part, but what will we do incase there is no primary
key on table, so may be we can display initial few columns (2 or 3)
or just display tuple offset as is done in other similar message
shown above.

Currently I simply display the whole tuple with truncating long
fields. This is plain easy, no distinction necessary. I did some code
reading and it seems somewhat complex to get the PK columns and it
seems that we need another lock for this, too - maybe not the best
idea when we are already in locking trouble, do you disagree?

I don't think you need to take another lock for this, it must already
have appropriate lock by that time. There should not be any problem
in calling relationHasPrimaryKey except that currently it is static, you
need to expose it.

Also showing just a few columns when no PK is available might not be
helpful since the tuple is not necessarily identified by the columns
showed. And since we drop the CTID there is no alternative solution to
identify the tuple.

We should not drop the CTID value.

IMHO simply displaying the whole tuple is the best solution in this
case, do you agree?

I think what we can do is check if relation has primary or unique
key, then display the value for same, else don't display.

More to the point, I have observed that in few cases
displaying values of tuple can be confusing as well. For Example:
[...]
Now here it will be bit confusing to display values with tuple, as
in session-3 statement has asked to update value (3), but we have
started waiting for value (4). Although it is right, but might not
make much sense.

What do you suggest to avoid confusion? I can see what you are talking
about, but I'm not sure what to do about it.

I think best could have been to just display tuple offset (ctid) to avoid
such confusion, but I think displaying for unique/primary key could be
acceptable way.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#17Andres Freund
andres@anarazel.de
In reply to: Amit Kapila (#16)
Re: Patch: show relation and tuple infos of a lock to acquire

On 2014-02-22 11:53:24 +0530, Amit Kapila wrote:

On Fri, Feb 21, 2014 at 4:55 PM, Christian Kruse
<christian@2ndquadrant.com> wrote:

Hi,

1. New context added by this patch is display at wrong place
[...]
Do this patch expect to print the context at cancel request
as well?

To be honest, I don't see a problem here. If you cancel the request in
most cases it is because it doesn't finish in an acceptable time. So
displaying the context seems logical to me.

Isn't it a matter of consistency, we might not be able to display it
on all long running updates, rather only when it goes for update on tuple
on which some other transaction is operating.

We'll display it when we are waiting for a lock. Which quite possibly is
the reason it's cancelled, so logging the locking information is highly
pertinent.

Is it possible that we set callback to error_context_stack, only
when we go for displaying this message. The way to do could be that
rather than forming callback in local variable in fuction
XactLockTableWaitWithErrorCallback(), store it in global variable
and then use that at appropriate place to set error_context_stack.

That seems like unneccessary complexity.

In general, why I am suggesting to restrict display of newly added
context for the case it is added to ensure that it doesn't get started
displaying in other cases where it might make sense or might not
make sense.

Anything failing while inside an XactLockTableWait() et al. will benefit
from the context. In which scenarios could that be unneccessary?

(according to Andres we would have to look at
rel->rd_node.dbNode) and since other commands dealing with names don't
do so, either, I think we should leave out the database name.

For this case as I have shown that in similar message, we already display
database oid, it should not be a problem to display here as well.
It will be more information to user.

I don't think we do for any where we actually display the relation name,
do we?

Currently I simply display the whole tuple with truncating long
fields. This is plain easy, no distinction necessary. I did some code
reading and it seems somewhat complex to get the PK columns and it
seems that we need another lock for this, too - maybe not the best
idea when we are already in locking trouble, do you disagree?

I don't think you need to take another lock for this, it must already
have appropriate lock by that time. There should not be any problem
in calling relationHasPrimaryKey except that currently it is static, you
need to expose it.

Other locations that deal with similar things (notably
ExecBuildSlotValueDescription()) doesn't either. I don't think this
patch should introduce it then.

Greetings,

Andres Freund

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#18Amit Kapila
amit.kapila16@gmail.com
In reply to: Andres Freund (#17)
Re: Patch: show relation and tuple infos of a lock to acquire

On Sat, Feb 22, 2014 at 3:17 PM, Andres Freund <andres@2ndquadrant.com> wrote:

On 2014-02-22 11:53:24 +0530, Amit Kapila wrote:

In general, why I am suggesting to restrict display of newly added
context for the case it is added to ensure that it doesn't get started
displaying in other cases where it might make sense or might not
make sense.

Anything failing while inside an XactLockTableWait() et al. will benefit
from the context. In which scenarios could that be unneccessary?

This path is quite deep, so I have not verified that whether
this new context can make sense for all error cases.
For example, in below path (error message), it doesn't seem to
make any sense (I have tried to generate it through debugger,
actual message will vary).

XactLockTableWait->SubTransGetParent->SimpleLruReadPage_ReadOnly->SimpleLruReadPage->SlruReportIOError

ERROR: could not access status of transaction 927
DETAIL: Could not open file "pg_subtrans/0000": No error.
CONTEXT: relation "public"."t1"
tuple ctid (0,2)

(according to Andres we would have to look at
rel->rd_node.dbNode) and since other commands dealing with names don't
do so, either, I think we should leave out the database name.

For this case as I have shown that in similar message, we already display
database oid, it should not be a problem to display here as well.
It will be more information to user.

I don't think we do for any where we actually display the relation name,
do we?

I have not checked that, but the reason I mentioned about database name
is due to display of database oid in similar message, please see the
message below. I think in below context we get it from lock tag and
I think for the patch case also, it might be better to display, but not
a mandatory thing. Consider it as a suggestion which if you also feel
good, then do it, else ignore it.

LOG: process 4656 still waiting for ExclusiveLock on tuple (0,1) of relation 57
513 of database 12045 after 1046.000 ms

Currently I simply display the whole tuple with truncating long
fields. This is plain easy, no distinction necessary. I did some code
reading and it seems somewhat complex to get the PK columns and it
seems that we need another lock for this, too - maybe not the best
idea when we are already in locking trouble, do you disagree?

I don't think you need to take another lock for this, it must already
have appropriate lock by that time. There should not be any problem
in calling relationHasPrimaryKey except that currently it is static, you
need to expose it.

Other locations that deal with similar things (notably
ExecBuildSlotValueDescription()) doesn't either. I don't think this
patch should introduce it then.

Displaying whole tuple doesn't seem to be the most right way
to get debug information and especially in this case we are
already displaying tuple offset(ctid) which is unique identity
to identify a tuple. It seems to me that it is sufficient to display
unique value of tuple if present.
I understand that there is no clear issue here, so may be if others also
share their opinion then it will be quite easy to take a call.

PS - I will be out for work assignment for next week, so might not be
able to complete the review, so others are welcome to takeup and
complete it in the meantime.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#19Christian Kruse
christian@2ndquadrant.com
In reply to: Amit Kapila (#18)
Re: Patch: show relation and tuple infos of a lock to acquire

Hi,

attached you will find a new version of the patch, removing the ctid
text but leaving the ctid itself in the message.

On 23/02/14 11:14, Amit Kapila wrote:

In general, why I am suggesting to restrict display of newly added
context for the case it is added to ensure that it doesn't get started
displaying in other cases where it might make sense or might not
make sense.

Anything failing while inside an XactLockTableWait() et al. will benefit
from the context. In which scenarios could that be unneccessary?

This path is quite deep, so I have not verified that whether
this new context can make sense for all error cases.
For example, in below path (error message), it doesn't seem to
make any sense (I have tried to generate it through debugger,
actual message will vary).

XactLockTableWait->SubTransGetParent->SimpleLruReadPage_ReadOnly->SimpleLruReadPage->SlruReportIOError

ERROR: could not access status of transaction 927
DETAIL: Could not open file "pg_subtrans/0000": No error.
CONTEXT: relation "public"."t1"
tuple ctid (0,2)

To be honest, I don't like the idea of setting up this error context
only for log_lock_wait messages. This sounds unnecessary complex to me
and I think that in the few cases where this message doesn't add a
value (and thus is useless) don't justify such complexity.

I have not checked that, but the reason I mentioned about database name
is due to display of database oid in similar message, please see the
message below. I think in below context we get it from lock tag and
I think for the patch case also, it might be better to display, but not
a mandatory thing. Consider it as a suggestion which if you also feel
good, then do it, else ignore it.

LOG: process 4656 still waiting for ExclusiveLock on tuple (0,1) of relation 57
513 of database 12045 after 1046.000 ms

After thinking a bit about this I added the database name to the
context message, see attached patch.

Currently I simply display the whole tuple with truncating long
fields. This is plain easy, no distinction necessary. I did some code
reading and it seems somewhat complex to get the PK columns and it
seems that we need another lock for this, too - maybe not the best
idea when we are already in locking trouble, do you disagree?

I don't think you need to take another lock for this, it must already
have appropriate lock by that time. There should not be any problem
in calling relationHasPrimaryKey except that currently it is static, you
need to expose it.

Other locations that deal with similar things (notably
ExecBuildSlotValueDescription()) doesn't either. I don't think this
patch should introduce it then.

Displaying whole tuple doesn't seem to be the most right way
to get debug information and especially in this case we are
already displaying tuple offset(ctid) which is unique identity
to identify a tuple. It seems to me that it is sufficient to display
unique value of tuple if present.
I understand that there is no clear issue here, so may be if others also
share their opinion then it will be quite easy to take a call.

That would be nice. I didn't change it, yet.

Best regards,

--
Christian Kruse http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachments:

show_table_name_and_tuple_in_lock_log_v5.patchtext/x-diff; charset=us-asciiDownload+204-16
#20Robert Haas
robertmhaas@gmail.com
In reply to: Christian Kruse (#19)
Re: Patch: show relation and tuple infos of a lock to acquire

On Mon, Feb 24, 2014 at 10:13 AM, Christian Kruse
<christian@2ndquadrant.com> wrote:

To be honest, I don't like the idea of setting up this error context
only for log_lock_wait messages. This sounds unnecessary complex to me
and I think that in the few cases where this message doesn't add a
value (and thus is useless) don't justify such complexity.

Reading this over, I'm not sure I understand why this is a CONTEXT at
all and not just a DETAIL for the particular error message that it's
supposed to be decorating. Generally CONTEXT should be used for
information that will be relevant to all errors in a given code path,
and DETAIL for extra information specific to a particular error.

If we're going to stick with CONTEXT, we could rephrase it like this:

CONTEXT: while attempting to lock tuple (1,2) in relation with OID 3456

or when the relation name is known:

CONTEXT: while attempting to lock tuple (1,2) in relation "public"."foo"

Displaying whole tuple doesn't seem to be the most right way
to get debug information and especially in this case we are
already displaying tuple offset(ctid) which is unique identity
to identify a tuple. It seems to me that it is sufficient to display
unique value of tuple if present.
I understand that there is no clear issue here, so may be if others also
share their opinion then it will be quite easy to take a call.

I wouldn't be inclined to dump the whole tuple under any
circumstances. That could be a lot more data than what you want
dumped in your log. The PK could already be somewhat unreasonably
large, but the whole tuple could be a lot more unreasonably large.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#21Christian Kruse
christian@2ndquadrant.com
In reply to: Robert Haas (#20)
#22Amit Kapila
amit.kapila16@gmail.com
In reply to: Christian Kruse (#21)
#23Amit Kapila
amit.kapila16@gmail.com
In reply to: Christian Kruse (#21)
#24Andres Freund
andres@anarazel.de
In reply to: Amit Kapila (#22)
#25Andres Freund
andres@anarazel.de
In reply to: Amit Kapila (#23)
#26Amit Kapila
amit.kapila16@gmail.com
In reply to: Andres Freund (#25)
#27Amit Kapila
amit.kapila16@gmail.com
In reply to: Andres Freund (#24)
#28Simon Riggs
simon@2ndQuadrant.com
In reply to: Amit Kapila (#27)
#29Amit Kapila
amit.kapila16@gmail.com
In reply to: Simon Riggs (#28)
#30Amit Kapila
amit.kapila16@gmail.com
In reply to: Christian Kruse (#21)
#31Christian Kruse
christian@2ndquadrant.com
In reply to: Amit Kapila (#30)
#32Robert Haas
robertmhaas@gmail.com
In reply to: Christian Kruse (#31)
#33Christian Kruse
christian@2ndquadrant.com
In reply to: Robert Haas (#32)
#34Amit Kapila
amit.kapila16@gmail.com
In reply to: Christian Kruse (#31)
#35Christian Kruse
christian@2ndquadrant.com
In reply to: Amit Kapila (#34)
#36Amit Kapila
amit.kapila16@gmail.com
In reply to: Christian Kruse (#35)
#37Robert Haas
robertmhaas@gmail.com
In reply to: Amit Kapila (#34)
#38Amit Kapila
amit.kapila16@gmail.com
In reply to: Robert Haas (#37)
#39Robert Haas
robertmhaas@gmail.com
In reply to: Amit Kapila (#38)
#40Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#39)
#41Amit Kapila
amit.kapila16@gmail.com
In reply to: Robert Haas (#39)
#42Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Christian Kruse (#35)
#43Amit Kapila
amit.kapila16@gmail.com
In reply to: Amit Kapila (#41)
#44Christian Kruse
christian@2ndquadrant.com
In reply to: Amit Kapila (#43)
#45Amit Kapila
amit.kapila16@gmail.com
In reply to: Christian Kruse (#44)
#46Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Christian Kruse (#44)
#47Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Alvaro Herrera (#46)
#48Tom Lane
tgl@sss.pgh.pa.us
In reply to: Alvaro Herrera (#47)
#49Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#48)
#50Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Tom Lane (#48)
#51Amit Kapila
amit.kapila16@gmail.com
In reply to: Tom Lane (#48)
#52Amit Kapila
amit.kapila16@gmail.com
In reply to: Alvaro Herrera (#46)
#53Bruce Momjian
bruce@momjian.us
In reply to: Amit Kapila (#51)
#54Robert Haas
robertmhaas@gmail.com
In reply to: Bruce Momjian (#53)
#55Robert Haas
robertmhaas@gmail.com
In reply to: Amit Kapila (#52)
#56Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Robert Haas (#55)
#57Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#55)
#58Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Tom Lane (#57)
#59Tom Lane
tgl@sss.pgh.pa.us
In reply to: Alvaro Herrera (#58)
#60Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#59)
#61Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Tom Lane (#59)
#62Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Alvaro Herrera (#61)
#63Christian Kruse
christian@2ndquadrant.com
In reply to: Alvaro Herrera (#62)