proposal: lock_time for pg_stat_database

Started by Pavel Stehule, almost 11 years ago. 9 messages.
#1 Pavel Stehule
pavel.stehule@gmail.com

Hi all,

Some time ago, I proposed measuring the lock time related to a query. The
main issue was how to present this information. Today's proposal is a
little simpler, but still useful: we can show the total lock time per
database in the pg_stat_database statistics. A high number can be a signal
of lock issues.
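
For illustration, reading the proposed counter might look like this (a
sketch only - lock_time is the proposed column and does not exist yet;
datname and deadlocks are existing pg_stat_database columns):

    SELECT datname, lock_time, deadlocks
    FROM pg_stat_database          -- lock_time: proposed, hypothetical
    ORDER BY lock_time DESC;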

Comments, ideas, notices?

Regards

Pavel

#2 Jim Nasby
Jim.Nasby@BlueTreble.com
In reply to: Pavel Stehule (#1)
Re: proposal: lock_time for pg_stat_database

On 1/16/15 11:00 AM, Pavel Stehule wrote:

Hi all,

Some time ago, I proposed measuring the lock time related to a query. The main issue was how to present this information. Today's proposal is a little simpler, but still useful: we can show the total lock time per database in the pg_stat_database statistics. A high number can be a signal of lock issues.

Would this not use the existing stats mechanisms? If so, couldn't we do this per table? (I realize that won't handle all cases; we'd still need a "lock_time_other" somewhere).

Also, what do you mean by 'lock'? Heavyweight? We already have some visibility there. What I wish we had was some way to know if we're spending a lot of time in a particular non-heavy lock. Actually measuring time probably wouldn't make sense but we might be able to count how often we fail initial acquisition or something.
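
For reference, the visibility we already have for heavyweight locks is
the pg_locks view - e.g. a quick check for sessions blocked on ungranted
locks (a sketch):

    SELECT locktype, relation::regclass AS relation, mode, pid
    FROM pg_locks
    WHERE NOT granted;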
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com


#3 Pavel Stehule
pavel.stehule@gmail.com
In reply to: Jim Nasby (#2)
Re: proposal: lock_time for pg_stat_database

2015-01-16 18:23 GMT+01:00 Jim Nasby <Jim.Nasby@bluetreble.com>:

On 1/16/15 11:00 AM, Pavel Stehule wrote:

Hi all,

Some time ago, I proposed measuring the lock time related to a query. The
main issue was how to present this information. Today's proposal is a
little simpler, but still useful: we can show the total lock time per
database in the pg_stat_database statistics. A high number can be a signal
of lock issues.

Would this not use the existing stats mechanisms? If so, couldn't we do
this per table? (I realize that won't handle all cases; we'd still need a
"lock_time_other" somewhere).

It can use the current, existing stats mechanisms.

I'm afraid it isn't possible to assign the waiting time to a table,
because it depends on the order of lock acquisition.

Also, what do you mean by 'lock'? Heavyweight? We already have some
visibility there. What I wish we had was some way to know if we're spending
a lot of time in a particular non-heavy lock. Actually measuring time
probably wouldn't make sense but we might be able to count how often we
fail initial acquisition or something.

Now that I am thinking about it, lock_time is not a good name - maybe
"waiting lock time" (the lock time itself is not interesting; the waiting
is). It could be divided into some more categories - at GoodData we use
heavyweight, page, and other categories.


#4 Jim Nasby
Jim.Nasby@BlueTreble.com
In reply to: Pavel Stehule (#3)
Re: proposal: lock_time for pg_stat_database

On 1/16/15 11:35 AM, Pavel Stehule wrote:

2015-01-16 18:23 GMT+01:00 Jim Nasby <Jim.Nasby@bluetreble.com>:

On 1/16/15 11:00 AM, Pavel Stehule wrote:

Hi all,

Some time ago, I proposed measuring the lock time related to a query. The main issue was how to present this information. Today's proposal is a little simpler, but still useful: we can show the total lock time per database in the pg_stat_database statistics. A high number can be a signal of lock issues.

Would this not use the existing stats mechanisms? If so, couldn't we do this per table? (I realize that won't handle all cases; we'd still need a "lock_time_other" somewhere).

It can use the current, existing stats mechanisms.

I'm afraid it isn't possible to assign the waiting time to a table, because it depends on the order of lock acquisition.

Huh? Order of what?

Also, what do you mean by 'lock'? Heavyweight? We already have some visibility there. What I wish we had was some way to know if we're spending a lot of time in a particular non-heavy lock. Actually measuring time probably wouldn't make sense but we might be able to count how often we fail initial acquisition or something.

Now that I am thinking about it, lock_time is not a good name - maybe "waiting lock time" (the lock time itself is not interesting; the waiting is). It could be divided into some more categories - at GoodData we use heavyweight, page, and other categories.

So do you see this somehow encompassing locks other than heavyweight locks? Because I think that's the biggest need here. Basically, something akin to TRACE_POSTGRESQL_LWLOCK_WAIT_START() that doesn't depend on dtrace.
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com


#5 Pavel Stehule
pavel.stehule@gmail.com
In reply to: Jim Nasby (#4)
Re: proposal: lock_time for pg_stat_database

2015-01-16 19:06 GMT+01:00 Jim Nasby <Jim.Nasby@bluetreble.com>:

On 1/16/15 11:35 AM, Pavel Stehule wrote:

2015-01-16 18:23 GMT+01:00 Jim Nasby <Jim.Nasby@bluetreble.com>:

On 1/16/15 11:00 AM, Pavel Stehule wrote:

Hi all,

Some time ago, I proposed measuring the lock time related to a query. The
main issue was how to present this information. Today's proposal is a
little simpler, but still useful: we can show the total lock time per
database in the pg_stat_database statistics. A high number can be a signal
of lock issues.

Would this not use the existing stats mechanisms? If so, couldn't we
do this per table? (I realize that won't handle all cases; we'd still need
a "lock_time_other" somewhere).

It can use the current, existing stats mechanisms.

I'm afraid it isn't possible to assign the waiting time to a table,
because it depends on the order of lock acquisition.

Huh? Order of what?

When you have a SELECT FROM T1, T2, and T1 is locked for time t1 and T2
is locked for time t2 -- but if t2 < t1 then t2 is not important -- then
what should I count as the lock time for T1 and T2?

DDL statements are an exception - there the mapping between relations and
the reason for the lock wait is almost straightforward.
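
A concrete sketch of the ambiguity, with timings invented purely for
illustration:

    -- Suppose another session holds locks on both tables:
    --   T1 is held for t1 = 5 s, T2 is held for t2 = 2 s.
    -- SELECT * FROM T1, T2;
    --   the executor blocks on T1 first and waits 5 s;
    --   by then T2 is already free, so it waits 0 s on T2.
    -- With the opposite acquisition order it would wait 2 s on T2 and
    -- then 3 s more on T1 -- the per-table split depends on the order,
    -- though the total (5 s) is the same.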

Also, what do you mean by 'lock'? Heavyweight? We already have some
visibility there. What I wish we had was some way to know if we're spending
a lot of time in a particular non-heavy lock. Actually measuring time
probably wouldn't make sense but we might be able to count how often we
fail initial acquisition or something.

Now that I am thinking about it, lock_time is not a good name - maybe
"waiting lock time" (the lock time itself is not interesting; the waiting
is). It could be divided into some more categories - at GoodData we use
heavyweight, page, and other categories.

So do you see this somehow encompassing locks other than heavyweight
locks? Because I think that's the biggest need here. Basically, something
akin to TRACE_POSTGRESQL_LWLOCK_WAIT_START() that doesn't depend on
dtrace.

For these global statistics, what I see as important is the common total
waiting time for locks - we could use more detailed granularity, but I am
not sure whether common statistics are the best tool for that.

My motivation is: look at the statistics -- lots of rollbacks -- issue;
lots of deadlocks -- issue; lots of waiting time -- issue too. It is a
tool for people who have no possibility to use dtrace and similar tools,
and for everyday usage - a simple check that locks are not an issue (or
that locking behavior is stable).
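
The everyday check could then be as simple as (a sketch - lock_time is
the proposed column; xact_rollback and deadlocks already exist in
pg_stat_database):

    SELECT datname, xact_rollback, deadlocks, lock_time
    FROM pg_stat_database          -- lock_time: proposed, hypothetical
    WHERE datname = current_database();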


#6 Pavel Stehule
pavel.stehule@gmail.com
In reply to: Pavel Stehule (#5)
Re: proposal: lock_time for pg_stat_database

2015-01-16 19:24 GMT+01:00 Pavel Stehule <pavel.stehule@gmail.com>:

2015-01-16 19:06 GMT+01:00 Jim Nasby <Jim.Nasby@bluetreble.com>:

On 1/16/15 11:35 AM, Pavel Stehule wrote:

2015-01-16 18:23 GMT+01:00 Jim Nasby <Jim.Nasby@bluetreble.com>:

On 1/16/15 11:00 AM, Pavel Stehule wrote:

Hi all,

Some time ago, I proposed measuring the lock time related to a query. The
main issue was how to present this information. Today's proposal is a
little simpler, but still useful: we can show the total lock time per
database in the pg_stat_database statistics. A high number can be a signal
of lock issues.

Would this not use the existing stats mechanisms? If so, couldn't we
do this per table? (I realize that won't handle all cases; we'd still need
a "lock_time_other" somewhere).

It can use the current, existing stats mechanisms.

I'm afraid it isn't possible to assign the waiting time to a table,
because it depends on the order of lock acquisition.

Huh? Order of what?

When you have a SELECT FROM T1, T2, and T1 is locked for time t1 and T2
is locked for time t2 -- but if t2 < t1 then t2 is not important -- then
what should I count as the lock time for T1 and T2?

DDL statements are an exception - there the mapping between relations and
the reason for the lock wait is almost straightforward.

Also, what do you mean by 'lock'? Heavyweight? We already have some
visibility there. What I wish we had was some way to know if we're spending
a lot of time in a particular non-heavy lock. Actually measuring time
probably wouldn't make sense but we might be able to count how often we
fail initial acquisition or something.

Now that I am thinking about it, lock_time is not a good name - maybe
"waiting lock time" (the lock time itself is not interesting; the waiting
is). It could be divided into some more categories - at GoodData we use
heavyweight, page, and other categories.

So do you see this somehow encompassing locks other than heavyweight
locks? Because I think that's the biggest need here. Basically, something
akin to TRACE_POSTGRESQL_LWLOCK_WAIT_START() that doesn't depend on
dtrace.

For these global statistics, what I see as important is the common total
waiting time for locks - we could use more detailed granularity, but I am
not sure whether common statistics are the best tool for that.

My motivation is: look at the statistics -- lots of rollbacks -- issue;
lots of deadlocks -- issue; lots of waiting time -- issue too. It is a
tool for people who have no possibility to use dtrace and similar tools,
and for everyday usage - a simple check that locks are not an issue (or
that locking behavior is stable).

And this proposal makes sense only for heavyweight locks - because the
other kinds of locks are taken everywhere.


#7 Jim Nasby
Jim.Nasby@BlueTreble.com
In reply to: Pavel Stehule (#6)
Re: proposal: lock_time for pg_stat_database

On 1/16/15 12:30 PM, Pavel Stehule wrote:

2015-01-16 19:24 GMT+01:00 Pavel Stehule <pavel.stehule@gmail.com>:

2015-01-16 19:06 GMT+01:00 Jim Nasby <Jim.Nasby@bluetreble.com>:

On 1/16/15 11:35 AM, Pavel Stehule wrote:

2015-01-16 18:23 GMT+01:00 Jim Nasby <Jim.Nasby@bluetreble.com>:

On 1/16/15 11:00 AM, Pavel Stehule wrote:

Hi all,

Some time ago, I proposed measuring the lock time related to a query. The main issue was how to present this information. Today's proposal is a little simpler, but still useful: we can show the total lock time per database in the pg_stat_database statistics. A high number can be a signal of lock issues.

Would this not use the existing stats mechanisms? If so, couldn't we do this per table? (I realize that won't handle all cases; we'd still need a "lock_time_other" somewhere).

It can use the current, existing stats mechanisms.

I'm afraid it isn't possible to assign the waiting time to a table, because it depends on the order of lock acquisition.

Huh? Order of what?

When you have a SELECT FROM T1, T2, and T1 is locked for time t1 and T2 is locked for time t2 -- but if t2 < t1 then t2 is not important -- then what should I count as the lock time for T1 and T2?

If that select is waiting on a lock on t2, then it's waiting on that lock on that table. It doesn't matter who else has the lock.

Also, what do you mean by 'lock'? Heavyweight? We already have some visibility there. What I wish we had was some way to know if we're spending a lot of time in a particular non-heavy lock. Actually measuring time probably wouldn't make sense but we might be able to count how often we fail initial acquisition or something.

Now that I am thinking about it, lock_time is not a good name - maybe "waiting lock time" (the lock time itself is not interesting; the waiting is). It could be divided into some more categories - at GoodData we use heavyweight, page, and other categories.

So do you see this somehow encompassing locks other than heavyweight locks? Because I think that's the biggest need here. Basically, something akin to TRACE_POSTGRESQL_LWLOCK_WAIT_START() that doesn't depend on dtrace.

For these global statistics, what I see as important is the common total waiting time for locks - we could use more detailed granularity, but I am not sure whether common statistics are the best tool for that.

Locks may be global, but what you're waiting for a lock on certainly isn't. It's almost always a lock either on a table or a row in a table. Of course this does mean you can't just blindly report that you're blocked on some XID; that doesn't tell anyone anything.

My motivation is: look at the statistics -- lots of rollbacks -- issue; lots of deadlocks -- issue; lots of waiting time -- issue too. It is a tool for people who have no possibility to use dtrace and similar tools, and for everyday usage - a simple check that locks are not an issue (or that locking behavior is stable).

Meh. SELECT sum(state_change) FROM pg_stat_activity WHERE waiting is just about as useful. Or just turn on lock logging.
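
Spelled out as runnable SQL, that ad-hoc check might look like this (a
sketch; pg_stat_activity still had the boolean waiting column at the
time, and timestamps can't be summed directly, so this sums the
intervals since the last state change instead):

    SELECT sum(now() - state_change) AS total_time_waiting
    FROM pg_stat_activity
    WHERE waiting;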

If you really want to add it at the database level I'm not opposed (so long as it leaves the door open for more granular locking later), but I can't really get excited about it either.

And this proposal makes sense only for heavyweight locks - because the other kinds of locks are taken everywhere.

So what if they're everywhere? Right now if you're spending a lot of time waiting for LWLocks you have no way to know what's going on unless you happen to have dtrace. Obviously we're not going to do something like issue a stats update every time we attempt to acquire an LWLock, but that doesn't mean we can't keep some counters on the locks and periodically report that.
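
If counters like that existed, the periodic report might take a shape
like this - the view and column names are invented purely for
illustration and do not exist:

    SELECT lock_name, acquire_count, sleep_count
    FROM pg_stat_lwlocks           -- hypothetical view, does not exist
    ORDER BY sleep_count DESC;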
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com


#8 Pavel Stehule
pavel.stehule@gmail.com
In reply to: Jim Nasby (#7)
Re: proposal: lock_time for pg_stat_database

2015-01-16 20:33 GMT+01:00 Jim Nasby <Jim.Nasby@bluetreble.com>:

On 1/16/15 12:30 PM, Pavel Stehule wrote:

2015-01-16 19:24 GMT+01:00 Pavel Stehule <pavel.stehule@gmail.com>:

2015-01-16 19:06 GMT+01:00 Jim Nasby <Jim.Nasby@bluetreble.com>:

On 1/16/15 11:35 AM, Pavel Stehule wrote:

2015-01-16 18:23 GMT+01:00 Jim Nasby <Jim.Nasby@bluetreble.com>:

On 1/16/15 11:00 AM, Pavel Stehule wrote:

Hi all,

Some time ago, I proposed measuring the lock time related to a query. The
main issue was how to present this information. Today's proposal is a
little simpler, but still useful: we can show the total lock time per
database in the pg_stat_database statistics. A high number can be a signal
of lock issues.

Would this not use the existing stats mechanisms? If so,
couldn't we do this per table? (I realize that won't handle all cases; we'd
still need a "lock_time_other" somewhere).

It can use the current, existing stats mechanisms.

I'm afraid it isn't possible to assign the waiting time to a table,
because it depends on the order of lock acquisition.

Huh? Order of what?

When you have a SELECT FROM T1, T2, and T1 is locked for time t1 and T2
is locked for time t2 -- but if t2 < t1 then t2 is not important -- then
what should I count as the lock time for T1 and T2?

If that select is waiting on a lock on t2, then it's waiting on that lock
on that table. It doesn't matter who else has the lock.

Also, what do you mean by 'lock'? Heavyweight? We already have some
visibility there. What I wish we had was some way to know if we're spending
a lot of time in a particular non-heavy lock. Actually measuring time
probably wouldn't make sense but we might be able to count how often we
fail initial acquisition or something.

Now that I am thinking about it, lock_time is not a good name - maybe
"waiting lock time" (the lock time itself is not interesting; the waiting
is). It could be divided into some more categories - at GoodData we use
heavyweight, page, and other categories.

So do you see this somehow encompassing locks other than heavyweight
locks? Because I think that's the biggest need here. Basically, something
akin to TRACE_POSTGRESQL_LWLOCK_WAIT_START() that doesn't depend on
dtrace.

For these global statistics, what I see as important is the common total
waiting time for locks - we could use more detailed granularity, but I am
not sure whether common statistics are the best tool for that.

Locks may be global, but what you're waiting for a lock on certainly
isn't. It's almost always a lock either on a table or a row in a table. Of
course this does mean you can't just blindly report that you're blocked on
some XID; that doesn't tell anyone anything.

My motivation is: look at the statistics -- lots of rollbacks -- issue;
lots of deadlocks -- issue; lots of waiting time -- issue too. It is a
tool for people who have no possibility to use dtrace and similar tools,
and for everyday usage - a simple check that locks are not an issue (or
that locking behavior is stable).

Meh. SELECT sum(state_change) FROM pg_stat_activity WHERE waiting is just
about as useful. Or just turn on lock logging.

If you really want to add it at the database level I'm not opposed (so
long as it leaves the door open for more granular locking later), but I
can't really get excited about it either.

And this proposal makes sense only for heavyweight locks - because the
other kinds of locks are taken everywhere.

So what if they're everywhere? Right now if you're spending a lot of time
waiting for LWLocks you have no way to know what's going on unless you
happen to have dtrace. Obviously we're not going to do something like
issue a stats update every time we attempt to acquire an LWLock, but that
doesn't mean we can't keep some counters on the locks and periodically
report that.

My plan is to update the statistics once all necessary locks are
acquired - so it happens once per statement - which is a similar load on
the stats system as now.

Pavel


#9 Pavel Stehule
pavel.stehule@gmail.com
In reply to: Jim Nasby (#7)
Re: proposal: lock_time for pg_stat_database

Hi

2015-01-16 20:33 GMT+01:00 Jim Nasby <Jim.Nasby@bluetreble.com>:

On 1/16/15 12:30 PM, Pavel Stehule wrote:

2015-01-16 19:24 GMT+01:00 Pavel Stehule <pavel.stehule@gmail.com>:

2015-01-16 19:06 GMT+01:00 Jim Nasby <Jim.Nasby@bluetreble.com>:

On 1/16/15 11:35 AM, Pavel Stehule wrote:

2015-01-16 18:23 GMT+01:00 Jim Nasby <Jim.Nasby@bluetreble.com>:

On 1/16/15 11:00 AM, Pavel Stehule wrote:

Hi all,

Some time ago, I proposed measuring the lock time related to a query. The
main issue was how to present this information. Today's proposal is a
little simpler, but still useful: we can show the total lock time per
database in the pg_stat_database statistics. A high number can be a signal
of lock issues.

Would this not use the existing stats mechanisms? If so,
couldn't we do this per table? (I realize that won't handle all cases; we'd
still need a "lock_time_other" somewhere).

It can use the current, existing stats mechanisms.

I'm afraid it isn't possible to assign the waiting time to a table,
because it depends on the order of lock acquisition.

Huh? Order of what?

When you have a SELECT FROM T1, T2, and T1 is locked for time t1 and T2
is locked for time t2 -- but if t2 < t1 then t2 is not important -- then
what should I count as the lock time for T1 and T2?

If that select is waiting on a lock on t2, then it's waiting on that lock
on that table. It doesn't matter who else has the lock.

Also, what do you mean by 'lock'? Heavyweight? We already have some
visibility there. What I wish we had was some way to know if we're spending
a lot of time in a particular non-heavy lock. Actually measuring time
probably wouldn't make sense but we might be able to count how often we
fail initial acquisition or something.

Now that I am thinking about it, lock_time is not a good name - maybe
"waiting lock time" (the lock time itself is not interesting; the waiting
is). It could be divided into some more categories - at GoodData we use
heavyweight, page, and other categories.

So do you see this somehow encompassing locks other than heavyweight
locks? Because I think that's the biggest need here. Basically, something
akin to TRACE_POSTGRESQL_LWLOCK_WAIT_START() that doesn't depend on
dtrace.

For these global statistics, what I see as important is the common total
waiting time for locks - we could use more detailed granularity, but I am
not sure whether common statistics are the best tool for that.

Locks may be global, but what you're waiting for a lock on certainly
isn't. It's almost always a lock either on a table or a row in a table. Of
course this does mean you can't just blindly report that you're blocked on
some XID; that doesn't tell anyone anything.

My motivation is: look at the statistics -- lots of rollbacks -- issue;
lots of deadlocks -- issue; lots of waiting time -- issue too. It is a
tool for people who have no possibility to use dtrace and similar tools,
and for everyday usage - a simple check that locks are not an issue (or
that locking behavior is stable).

Meh. SELECT sum(state_change) FROM pg_stat_activity WHERE waiting is just
about as useful. Or just turn on lock logging.

If you really want to add it at the database level I'm not opposed (so
long as it leaves the door open for more granular locking later), but I
can't really get excited about it either.

And this proposal makes sense only for heavyweight locks - because the
other kinds of locks are taken everywhere.

So what if they're everywhere? Right now if you're spending a lot of time
waiting for LWLocks you have no way to know what's going on unless you
happen to have dtrace. Obviously we're not going to do something like
issue a stats update every time we attempt to acquire an LWLock, but that
doesn't mean we can't keep some counters on the locks and periodically
report that.

I was wrong - it probably is possible to attribute lock waiting time per table.
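
If so, the per-table counterpart might eventually read like this (a
sketch; pg_stat_user_tables exists, but the lock_wait_time column is
invented for illustration):

    SELECT relname, lock_wait_time   -- lock_wait_time: hypothetical
    FROM pg_stat_user_tables
    ORDER BY lock_wait_time DESC;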

Regards

Pavel
