pg_stat_activity.waiting_start

Started by Joel Jacobsonover 9 years ago32 messageshackers
Jump to latest
#1Joel Jacobson
joel@trustly.com

Hi hackers,

We already have xact_start, query_start and backend_start
to get the timestamptz for when different things happened.

I would like to propose adding a fourth such column, "waiting_start",
which would tell how long time a backend has been waiting.

The column would be NULL when waiting=FALSE.

While it's trivial to write a script that just polls pg_stat_activity
every second and keeps tack of when a backend started
waiting by just checking for any new waiting=TRUE rows,
it would be more convenient to just get the information from
pg_stat_activity directly.

The use-case would be e.g. monitoring tools
when you want to know how long time queries are waiting.

--
Joel Jacobson

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#2Joel Jacobson
joel@trustly.com
In reply to: Joel Jacobson (#1)
Re: pg_stat_activity.waiting_start

Actually, "wait_start" is a better name to match the others
("wait_event_type" and "wait_event").

On Sat, Dec 24, 2016 at 8:20 AM, Joel Jacobson <joel@trustly.com> wrote:

Hi hackers,

We already have xact_start, query_start and backend_start
to get the timestamptz for when different things happened.

I would like to propose adding a fourth such column, "waiting_start",
which would tell how long time a backend has been waiting.

The column would be NULL when waiting=FALSE.

While it's trivial to write a script that just polls pg_stat_activity
every second and keeps tack of when a backend started
waiting by just checking for any new waiting=TRUE rows,
it would be more convenient to just get the information from
pg_stat_activity directly.

The use-case would be e.g. monitoring tools
when you want to know how long time queries are waiting.

--
Joel Jacobson

--
Joel Jacobson

Mobile: +46703603801
Trustly.com | Newsroom | LinkedIn | Twitter

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#3Tom Lane
tgl@sss.pgh.pa.us
In reply to: Joel Jacobson (#1)
Re: pg_stat_activity.waiting_start

Joel Jacobson <joel@trustly.com> writes:

We already have xact_start, query_start and backend_start
to get the timestamptz for when different things happened.

I would like to propose adding a fourth such column, "waiting_start",
which would tell how long time a backend has been waiting.

The difficulty with that is it'd require a gettimeofday() call for
every wait start. Even on platforms where those are relatively cheap,
the overhead would be nasty --- and on some platforms, it'd be
astonishingly bad. We sweated quite a lot to get the overhead of
pg_stat_activity wait monitoring down to the point where it would be
tolerable for non-heavyweight locks, but I'm afraid this would push
it back into the not-tolerable range.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#4Stephen Frost
sfrost@snowman.net
In reply to: Tom Lane (#3)
Re: pg_stat_activity.waiting_start

* Tom Lane (tgl@sss.pgh.pa.us) wrote:

Joel Jacobson <joel@trustly.com> writes:

We already have xact_start, query_start and backend_start
to get the timestamptz for when different things happened.

I would like to propose adding a fourth such column, "waiting_start",
which would tell how long time a backend has been waiting.

The difficulty with that is it'd require a gettimeofday() call for
every wait start. Even on platforms where those are relatively cheap,
the overhead would be nasty --- and on some platforms, it'd be
astonishingly bad. We sweated quite a lot to get the overhead of
pg_stat_activity wait monitoring down to the point where it would be
tolerable for non-heavyweight locks, but I'm afraid this would push
it back into the not-tolerable range.

Could we handle this like log_lock_waits..?

Thanks!

Stephen

#5Tom Lane
tgl@sss.pgh.pa.us
In reply to: Stephen Frost (#4)
Re: pg_stat_activity.waiting_start

Stephen Frost <sfrost@snowman.net> writes:

* Tom Lane (tgl@sss.pgh.pa.us) wrote:

The difficulty with that is it'd require a gettimeofday() call for
every wait start. Even on platforms where those are relatively cheap,
the overhead would be nasty --- and on some platforms, it'd be
astonishingly bad. We sweated quite a lot to get the overhead of
pg_stat_activity wait monitoring down to the point where it would be
tolerable for non-heavyweight locks, but I'm afraid this would push
it back into the not-tolerable range.

Could we handle this like log_lock_waits..?

Well, that only applies to heavyweight locks, which do a gettimeofday
anyway in order to schedule the deadlock-check timeout. If you were
willing to populate this new column only for heavyweight locks, maybe it
could be done for minimal overhead. But that would be backsliding
quite a lot compared to what we just did to extend pg_stat_activity's
coverage of lock types.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#6Joel Jacobson
joel@trustly.com
In reply to: Tom Lane (#3)
Re: pg_stat_activity.waiting_start

On Sat, Dec 24, 2016 at 9:00 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

I would like to propose adding a fourth such column, "waiting_start",
which would tell how long time a backend has been waiting.

The difficulty with that is it'd require a gettimeofday() call for
every wait start. Even on platforms where those are relatively cheap,
the overhead would be nasty --- and on some platforms, it'd be
astonishingly bad. We sweated quite a lot to get the overhead of
pg_stat_activity wait monitoring down to the point where it would be
tolerable for non-heavyweight locks, but I'm afraid this would push
it back into the not-tolerable range.

I don't think we need the microsecond resolution provided by
gettimeofday() via GetCurrentTimestamp()
It would be enough to know which second the waiting started, so we
could use time().

gettimeofday() takes 42 cycles.
time() only takes 3 cycles. [1]http://stackoverflow.com/questions/6498972/faster-equivalent-of-gettimeofday

[1]: http://stackoverflow.com/questions/6498972/faster-equivalent-of-gettimeofday

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#7Joel Jacobson
joel@trustly.com
In reply to: Joel Jacobson (#6)
Re: pg_stat_activity.waiting_start

On Sat, Dec 24, 2016 at 9:56 AM, Joel Jacobson <joel@trustly.com> wrote:

The difficulty with that is it'd require a gettimeofday() call for
every wait start. Even on platforms where those are relatively cheap,

I just realized how this can be optimized.
We only need to set wait_start for every new waiting period,
not for every wait start, i.e. not for every call to pgstat_report_wait_start():

Example:

In pgstat_report_wait_start():
if (proc->wait_start == 0)
proc->wait_start = (pg_time_t) time(NULL);

And then in pgstat_report_wait_end():
proc->wait_start = 0;

This means we only need to call time() or gettimeofday() once per
waiting period.

--
Joel Jacobson

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#8Joel Jacobson
joel@trustly.com
In reply to: Joel Jacobson (#7)
Re: pg_stat_activity.waiting_start

Attached is a patch implementing the seconds-resolution wait_start, but
presented as a timestamptz to the user, just like the other *_start fields:

commit c001e5c537e36d2683a7e55c7c8bfcc154de4c9d
Author: Joel Jacobson <joel@trustly.com>
Date: Sat Dec 24 13:20:09 2016 +0700

Add OUT parameter "wait_start" timestamptz to pg_stat_get_activity()
and pg_catalog.pg_stat_activity

This is set to the timestamptz with seconds resolution
when the process started waiting, and reset to NULL
when it's not waiting any longer.

This is useful if you want to know not only what the process is
waiting for, but also how long it has been waiting,
which can be useful in situations when it might be
normal for different users/applications to wait for some amount of time,
but abnormal if they are waiting longer than some threshold.

When such a threshold is exceeded, monitoring applications
could then alert the user or possibly cancel/terminate
the blocking processes.

Without information on how long time processes have been waiting,
the monitoring applications would have no other option than
to cancel/terminate a process as soon as something is waiting,
or keep track of how long time processes have been waiting
by polling and keeping track on a per process basis,
which is less user-friendly than if PostgreSQL would provide
the information directly to the user.

src/backend/catalog/system_views.sql | 3 ++-
src/backend/postmaster/pgstat.c | 1 +
src/backend/storage/lmgr/proc.c | 1 +
src/backend/utils/adt/pgstatfuncs.c | 7 ++++++-
src/include/catalog/pg_proc.h | 2 +-
src/include/pgstat.h | 6 +++++-
src/include/storage/proc.h | 3 +++
7 files changed, 19 insertions(+), 4 deletions(-)

On Sat, Dec 24, 2016 at 12:32 PM, Joel Jacobson <joel@trustly.com> wrote:

On Sat, Dec 24, 2016 at 9:56 AM, Joel Jacobson <joel@trustly.com> wrote:

The difficulty with that is it'd require a gettimeofday() call for
every wait start. Even on platforms where those are relatively cheap,

I just realized how this can be optimized.
We only need to set wait_start for every new waiting period,
not for every wait start, i.e. not for every call to
pgstat_report_wait_start():

Example:

In pgstat_report_wait_start():
if (proc->wait_start == 0)
proc->wait_start = (pg_time_t) time(NULL);

And then in pgstat_report_wait_end():
proc->wait_start = 0;

This means we only need to call time() or gettimeofday() once per
waiting period.

--
Joel Jacobson

--
Joel Jacobson

Mobile: +46703603801
*Trustly.com <http://trustly.com/&gt; | Newsroom
<http://www.mynewsdesk.com/trustly_en&gt; | LinkedIn
<https://www.linkedin.com/company/trustly-group-ab&gt; | **Twitter
<https://twitter.com/Trustly&gt;*

* <https://trustly.com/&gt;*

Attachments:

0001-pg_stat_get_activity_wait_start.patchapplication/octet-stream; name=0001-pg_stat_get_activity_wait_start.patchDownload+23-9
#9Tom Lane
tgl@sss.pgh.pa.us
In reply to: Joel Jacobson (#6)
Re: pg_stat_activity.waiting_start

Joel Jacobson <joel@trustly.com> writes:

On Sat, Dec 24, 2016 at 9:00 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

The difficulty with that is it'd require a gettimeofday() call for
every wait start.

I don't think we need the microsecond resolution provided by
gettimeofday() via GetCurrentTimestamp()
It would be enough to know which second the waiting started, so we
could use time().

For some use-cases, perhaps ...

gettimeofday() takes 42 cycles.
time() only takes 3 cycles. [1]

... on some platforms, perhaps, back in 2011. My own quick testing flatly
contradicts that. On Linux (RHEL6) x86_64, I see gettimeofday() taking
about 40ns (or circa 100 CPU cycles @ 2.4GHz), time() only a shade better
at 38ns. On macOS x86_64 2.7GHz, I see gettimeofday() taking about 37ns,
time() way behind at 178ns. I think we'd need at least an order of
magnitude cheaper to consider putting timing calls into spinlock or lwlock
paths, and that's just not available at all, let alone portably.

This is not an easy problem. See our most recent discussion at
/messages/by-id/31856.1400021891@sss.pgh.pa.us

I'm prepared to consider an argument that wait timing might have weaker
requirements than EXPLAIN ANALYZE (which certainly needs to measure short
durations) but you didn't actually make that argument.

regards, tom lane

Linux:

$ gcc -Wall -O2 -o time-timing-calls time-timing-calls.c
$ time ./time-timing-calls

real 0m37.678s
user 0m37.685s
sys 0m0.002s
$ gcc -Wall -O2 -o time-timing-calls -DUSE_GT time-timing-calls.c
$ time ./time-timing-calls

real 0m39.964s
user 0m39.971s
sys 0m0.001s

macOS:

$ gcc -Wall -O2 -o time-timing-calls time-timing-calls.c
$ time ./time-timing-calls

real 2m58.683s
user 2m58.515s
sys 0m0.131s
$ gcc -Wall -O2 -o time-timing-calls -DUSE_GT time-timing-calls.c
$ time ./time-timing-calls

real 0m37.004s
user 0m36.993s
sys 0m0.006s

Attachments:

time-timing-calls.ctext/x-c; charset=us-ascii; name=time-timing-calls.cDownload
#10Joel Jacobson
joel@trustly.com
In reply to: Tom Lane (#9)
Re: pg_stat_activity.waiting_start

This is not an easy problem. See our most recent discussion at
/messages/by-id/31856.1400021891@sss.pgh.pa.us

Thanks for the small test program.
I tested it on my MacBook Pro and gettimeofday() was way faster than time().

The clock_gettime() used by the patch clock_gettime_1.patch in the
mailing thread from Hari Babu apparently doesn't work on OS X.

Instead, I tested the OS X specific mach_absolute_time() which was the fastest:

gcc -Wall -O2 -o time-timing-calls -DUSE_MACH_ABSOLUTE_TIME time-timing-calls.c
time ./time-timing-calls
real 0m16.806s
user 0m16.781s
sys 0m0.012s

gcc -Wall -O2 -o time-timing-calls -DUSE_GETTIMEOFDAY time-timing-calls.c
time ./time-timing-calls
real 0m35.466s
user 0m35.062s
sys 0m0.393s

Code:

#ifdef __MACH__
#include <mach/mach_time.h>
#endif
#ifdef USE_MACH_ABSOLUTE_TIME
uint64_t tv;
tv = mach_absolute_time();
#endif

I'm prepared to consider an argument that wait timing might have weaker
requirements than EXPLAIN ANALYZE (which certainly needs to measure short
durations) but you didn't actually make that argument.

I can see why timing overload is a problem in EXPLAIN ANALYZE and at
other places,
and that would of course be a great thing to fix.

However, I'm not sure I fully understand how it can be that much of a
problem in pgstat_report_wait_start()?

As far as I can tell from reading the source code, it only appears
pgstat_report_wait_start() is only entered when a process is waiting?

Is it not likely the time spent waiting will vastly exceed the amount
of extra time for the gettimeofday() call?

Is it really a typical real-life scenario that processes can be
waiting extremely often for extremely short periods of time,
where the timing overhead would be significant?

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#11Andres Freund
andres@anarazel.de
In reply to: Joel Jacobson (#10)
Re: pg_stat_activity.waiting_start

On December 25, 2016 1:21:43 AM GMT+01:00, Joel Jacobson <joel@trustly.com> wrote:

Is it really a typical real-life scenario that processes can be
waiting extremely often for extremely short periods of time,
where the timing overhead would be significant?

Yes. Consider WAL insertion, procarray or other similar contended locks.

Andres
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#12Joel Jacobson
joel@trustly.com
In reply to: Andres Freund (#11)
Re: pg_stat_activity.waiting_start

On Sun, Dec 25, 2016 at 8:01 PM, Andres Freund <andres@anarazel.de> wrote:

On December 25, 2016 1:21:43 AM GMT+01:00, Joel Jacobson <joel@trustly.com> wrote:

Is it really a typical real-life scenario that processes can be
waiting extremely often for extremely short periods of time,
where the timing overhead would be significant?

Yes. Consider WAL insertion, procarray or other similar contended locks.

Ah, I see, then I understand it has to be blazingly fast.

Maybe a good tradeoff then would be to let "wait_start" represent the
very first time the txn started waiting?
That way gettimeofday() would only be called once per txn, and the
value would be remembered, but not exposed when the txn is not
waiting.
If the txn is waiting/not waiting multiple times during it's
life-time, the same "wait_start" value would be exposed when it's
waiting, and NULL when it's not. Sounds good?

As long as the documentation is clear on "wait_start" meaning when the
first wait started in the txn, I think that's useful enough to improve
the situation, as one could then ask a query like "select all
processes that have possibly been waiting for at least 5 seconds",
which you cannot do today.

The best you can do today is ask a query like "select all processes
that are waiting and have been running for at least 5 seconds", but
during those 5 seconds they have been running, they might only have
been waiting for the very last few milliseconds, which might not be a
problem at all. If instead knowing they were waiting 5 seconds ago,
and are still waiting, but might have had periods in between where
they were not waiting, I would say that is close enough to what I as a
user want to know, and can use that information for automatic
decision-making on e.g. if I want to terminate other blocking
processes.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#13Tom Lane
tgl@sss.pgh.pa.us
In reply to: Joel Jacobson (#12)
Re: pg_stat_activity.waiting_start

Joel Jacobson <joel@trustly.com> writes:

Maybe a good tradeoff then would be to let "wait_start" represent the
very first time the txn started waiting?
...
As long as the documentation is clear on "wait_start" meaning when the
first wait started in the txn, I think that's useful enough to improve
the situation, as one could then ask a query like "select all
processes that have possibly been waiting for at least 5 seconds",
which you cannot do today.

Meh. You *can* do that now: query pg_stat_activity for waiting processes,
wait a couple seconds, query again, intersect the results. I think really
the only useful improvement on that would be to be able to tell that the
process has been blocked continuously for X seconds, and what you're
proposing here won't do that.

In practice, there should never be waits on LWLocks (much less spinlocks)
that exceed order-of-milliseconds; if there are, either we chose the wrong
lock type or the system is pretty broken in general. So maybe it's
sufficient if we provide a wait start time for heavyweight locks ...
though that still seems kind of ugly. (Also, I don't recall the existing
code factorization there, but getting the start time into pg_stat_activity
without an extra gettimeofday call might be hard. As I said, there is
one being done, but I'm not sure how accessible its result is.)

I did a bit more research over the weekend into the cost of gettimeofday
and clock_gettime, and have some new results that I'll post into that
thread shortly. But the short answer is still "they're damn expensive
on some platforms, and not really cheap anywhere".

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#14Stephen Frost
sfrost@snowman.net
In reply to: Tom Lane (#13)
Re: pg_stat_activity.waiting_start

Tom,

* Tom Lane (tgl@sss.pgh.pa.us) wrote:

In practice, there should never be waits on LWLocks (much less spinlocks)
that exceed order-of-milliseconds; if there are, either we chose the wrong
lock type or the system is pretty broken in general. So maybe it's
sufficient if we provide a wait start time for heavyweight locks ...
though that still seems kind of ugly.

While I agree that it's a bit ugly, if the alternative is "don't have
anything", then I'd argue that it's worth it. The use-case for this
feature, as I see it, is for admins to be able to go look at how long
something has been waiting and monitoring scripts to which fire only
every minute or more, and order-of-milliseconds differences aren't
significant there.

It's terribly ugly, but from a practical standpoint, we could probably
make it "waiting after deadlock timeout" and just set the time when the
deadlock timeout fires and the use-case for this would be satisfied.

Thanks!

Stephen

#15Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#9)
Re: pg_stat_activity.waiting_start

On Dec 24, 2016 5:44 PM, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:

I think we'd need at least an order of
magnitude cheaper to consider putting timing calls into spinlock or lwlock
paths, and that's just not available at all, let alone portably.

For spinlocks we could conceivably just bite the bullet and use a raw rdtsc
or the equivalent for other platforms. It might be pretty easy to
distinguish sane numbers from numbers that result after a process
reschedule and we could just discard data when that happens (or count
occurrences).

That may possibly work for spinlocks but it won't work for anything heavier
where process reschedules are routine.

#16Jim Nasby
Jim.Nasby@BlueTreble.com
In reply to: Bruce Momjian (#15)
Re: pg_stat_activity.waiting_start

On 12/27/16 11:17 AM, Greg Stark wrote:

On Dec 24, 2016 5:44 PM, "Tom Lane" <tgl@sss.pgh.pa.us
<mailto:tgl@sss.pgh.pa.us>> wrote:

I think we'd need at least an order of
magnitude cheaper to consider putting timing calls into spinlock or
lwlock
paths, and that's just not available at all, let alone portably.

For spinlocks we could conceivably just bite the bullet and use a raw
rdtsc or the equivalent for other platforms. It might be pretty easy to
distinguish sane numbers from numbers that result after a process
reschedule and we could just discard data when that happens (or count
occurrences).

That may possibly work for spinlocks but it won't work for anything
heavier where process reschedules are routine.

Anything heavier and a ~50ns gettimeofday() call is probably not that
bad anymore...

I also wonder if setting an alarm is cheap enough to happen during a
spinlock? Set a fairly short alarm on entry, clear it on exit. If the
alarm fires, do the gettimeofday(). Add the alarm time back in and
you're going to be close enough to when the wait started for any
practical use of pg_stat_activity (perhaps this is what Stephen is
suggesting about deadlock timeout...)
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#17Amit Kapila
amit.kapila16@gmail.com
In reply to: Tom Lane (#5)
Re: pg_stat_activity.waiting_start

On Sat, Dec 24, 2016 at 7:46 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Stephen Frost <sfrost@snowman.net> writes:

* Tom Lane (tgl@sss.pgh.pa.us) wrote:

The difficulty with that is it'd require a gettimeofday() call for
every wait start. Even on platforms where those are relatively cheap,
the overhead would be nasty --- and on some platforms, it'd be
astonishingly bad. We sweated quite a lot to get the overhead of
pg_stat_activity wait monitoring down to the point where it would be
tolerable for non-heavyweight locks, but I'm afraid this would push
it back into the not-tolerable range.

Could we handle this like log_lock_waits..?

Well, that only applies to heavyweight locks, which do a gettimeofday
anyway in order to schedule the deadlock-check timeout. If you were
willing to populate this new column only for heavyweight locks, maybe it
could be done for minimal overhead. But that would be backsliding
quite a lot compared to what we just did to extend pg_stat_activity's
coverage of lock types.

Can we think of introducing new guc trace_system_waits or something
like that which will indicate that the sessions will report the value
of wait_start in pg_stat_activity? The default value of such a
parameter can be false which means wait_start will be shown as NULL in
pg_stat_activity and when it is enabled the wait_start can show the
time as proposed in this thread.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#18Jim Nasby
Jim.Nasby@BlueTreble.com
In reply to: Amit Kapila (#17)
Re: pg_stat_activity.waiting_start

On 12/28/16 7:10 AM, Amit Kapila wrote:

Can we think of introducing new guc trace_system_waits or something
like that which will indicate that the sessions will report the value
of wait_start in pg_stat_activity? The default value of such a
parameter can be false which means wait_start will be shown as NULL in
pg_stat_activity and when it is enabled the wait_start can show the
time as proposed in this thread.

In my experience the problem with those kind of settings is that they're
never enabled when you actually need them. I think it'd be much better
to find a way to always capture wait_starts that are over some minimum
duration, where collection overhead won't matter but you still have some
good info about what's going on. For pg_stat_activity I'd think that
threshold would be on the order of 50-100ms, though maybe there's other
places where a tighter tolerance would help.
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#19Tom Lane
tgl@sss.pgh.pa.us
In reply to: Jim Nasby (#18)
Re: pg_stat_activity.waiting_start

Jim Nasby <Jim.Nasby@BlueTreble.com> writes:

On 12/28/16 7:10 AM, Amit Kapila wrote:

Can we think of introducing new guc trace_system_waits or something
like that which will indicate that the sessions will report the value
of wait_start in pg_stat_activity?

In my experience the problem with those kind of settings is that they're
never enabled when you actually need them.

Yeah. And they especially won't be enabled in production situations,
if people find out that they cause lots of overhead.

I think it'd be much better
to find a way to always capture wait_starts that are over some minimum
duration, where collection overhead won't matter but you still have some
good info about what's going on. For pg_stat_activity I'd think that
threshold would be on the order of 50-100ms, though maybe there's other
places where a tighter tolerance would help.

The idea of just capturing the wait start for heavyweight locks, and
not other lock types, still seems superior to any of the alternatives
that have been suggested ...

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#20Jim Nasby
Jim.Nasby@BlueTreble.com
In reply to: Tom Lane (#19)
Re: pg_stat_activity.waiting_start

On 12/28/16 11:25 AM, Tom Lane wrote:

The idea of just capturing the wait start for heavyweight locks, and
not other lock types, still seems superior to any of the alternatives
that have been suggested ...

Is some kind of alarm a viable option for the others? If setting the
alarm is cheap, you could just set one for say 5ms when you have to wait
on a lock. If setting one is too expensive perhaps one could be set at
transaction start that looks at a global (and the global would be set
when a wait started). Obviously that means there's inaccuracy to the
true time spent waiting, but I think that's certainly fine for
pg_stat_activity. Most importantly, it would mean that if something has
gone horribly wrong you'd at least have some kind of relatively accurate
duration to work from.
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#21Tom Lane
tgl@sss.pgh.pa.us
In reply to: Jim Nasby (#20)
#22Amit Kapila
amit.kapila16@gmail.com
In reply to: Tom Lane (#19)
#23Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#21)
#24Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#13)
#25Joel Jacobson
joel@trustly.com
In reply to: Bruce Momjian (#24)
#26Bruce Momjian
bruce@momjian.us
In reply to: Joel Jacobson (#25)
#27Andres Freund
andres@anarazel.de
In reply to: Bruce Momjian (#26)
#28Bruce Momjian
bruce@momjian.us
In reply to: Bruce Momjian (#24)
#29Bruce Momjian
bruce@momjian.us
In reply to: Bruce Momjian (#28)
#30Joel Jacobson
joel@trustly.com
In reply to: Bruce Momjian (#28)
#31Jim Nasby
Jim.Nasby@BlueTreble.com
In reply to: Robert Haas (#23)
#32Jim Nasby
Jim.Nasby@BlueTreble.com
In reply to: Joel Jacobson (#30)