performance statistics monitoring without spamming logs
I'm looking for a way of gathering performance stats in a more usable
way than turning on `log_statement_stats` (or other related modules).
The problem I have with the log_*_stats family of modules is that they
log every single query, which makes them unusable in production. Aside
from consuming space, there's also the problem that the log system
wouldn't be able to keep up with the rate.
There are a couple ideas that pop into mind that would make these stats
more usable:
1. Only log when the statement would otherwise already be logged. Such
as due to the `log_statement` or `log_min_duration_statement` settings.
2. Make stats available in `pg_stat_statements` (or alternate view that
could be joined on). The block stats are already available here, but
others like CPU usage, page faults, and context switches are not.
To answer why I want this data: I want to be able to identify queries
which are consuming large amounts of CPU time so that I can either
optimize the query or optimize the application making the query, and
free up CPU resources on the database. The `pg_stat_statements` view
provides the `total_time` metric, but many things can contribute to
query time other than CPU usage, and CPU usage is my primary concern at
the moment.
Do these seem like reasonable requests? And if so, what's the procedure
for getting them implemented?
Any thoughts on whether they would be hard to implement? I'm unfamiliar
with the PostgreSQL code base, but might be willing to attempt an
implementation if it wouldn't be terribly difficult.
-Patrick
On Tue, Jul 10, 2018 at 01:54:12PM -0400, Patrick Hemmer wrote:
I'm looking for a way of gathering performance stats in a more usable
way than turning on `log_statement_stats` (or other related modules).
The problem I have with the log_*_stats family of modules is that they
log every single query, which makes them unusable in production. Aside
from consuming space, there's also the problem that the log system
wouldn't be able to keep up with the rate.
There are a couple ideas that pop into mind that would make these stats
more usable:
1. Only log when the statement would otherwise already be logged. Such
as due to the `log_statement` or `log_min_duration_statement` settings.
Did you see this? (Added Adrien to Cc)
https://commitfest.postgresql.org/18/1691/
I don't think the existing patch does what you want, but perhaps all that's
needed is this:
    if (save_log_statement_stats)
+       if (log_sample_rate==1 || was_logged)
            ShowUsage("EXECUTE MESSAGE STATISTICS");
In any case, I'm thinking that your request could/should be considered by
whatever future patch implements sampling (if not implemented/included in the
patch itself).
If that doesn't do what's needed, that patch might still be a good crash course
in how to start implementing what you need (perhaps on top of that patch).
2. Make stats available in `pg_stat_statements` (or alternate view that
could be joined on). The block stats are already available here, but
others like CPU usage, page faults, and context switches are not.
pg_stat_statements is ./contrib/pg_stat_statements/pg_stat_statements.c which is 3k LOC.
getrusage stuff and log_*_stat stuff is in src/backend/tcop/postgres.c
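For reference, pg_stat_statements accumulates its per-query numbers in a
Counters struct in that file; a purely hypothetical sketch of the kind of
fields one would add there (the new names are made up for illustration,
not from any real patch):

    typedef struct Counters
    {
        int64       calls;       /* existing: number of executions */
        double      total_time;  /* existing: total execution time in ms */
        /* ... existing block I/O and temp-file counters elided ... */

        /*
         * hypothetical additions, accumulated from getrusage(2) deltas
         * taken at statement start and end
         */
        double      user_time;   /* user CPU time in ms */
        double      system_time; /* system CPU time in ms */
        int64       minflts;     /* minor page faults */
        int64       majflts;     /* major page faults */
        int64       nvcsws;      /* voluntary context switches */
        int64       nivcsws;     /* involuntary context switches */
    } Counters;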
Justin
On Tue, Jul 10, 2018 at 11:38 AM, Justin Pryzby <pryzby@telsasoft.com>
wrote:
2. Make stats available in `pg_stat_statements` (or alternate view that
could be joined on). The block stats are already available here, but
others like CPU usage, page faults, and context switches are not.
pg_stat_statements is ./contrib/pg_stat_statements/pg_stat_statements.c
which is 3k LOC.
getrusage stuff and log_*_stat stuff is in src/backend/tcop/postgres.c
Before you start implementing something here, take a look at
pg_stat_kcache [0], which already aims to collect a few more system
statistics than what pg_stat_statements provides today, and might be a
good basis to extend from.
It might also be worth looking at pg_stat_activity wait event sampling
to determine where a system spends time; see e.g. pg_wait_sampling [1]
for one approach to this.
[0]: https://github.com/powa-team/pg_stat_kcache
[1]: https://github.com/postgrespro/pg_wait_sampling
Best,
Lukas
--
Lukas Fittl
On 07/13/2018 12:25 AM, Lukas Fittl wrote:
On Tue, Jul 10, 2018 at 11:38 AM, Justin Pryzby <pryzby@telsasoft.com> wrote:
2. Make stats available in `pg_stat_statements` (or alternate view that
could be joined on). The block stats are already available here, but
others like CPU usage, page faults, and context switches are not.
pg_stat_statements is ./contrib/pg_stat_statements/pg_stat_statements.c
which is 3k LOC.
getrusage stuff and log_*_stat stuff is in src/backend/tcop/postgres.c
Before you start implementing something here, take a look at
pg_stat_kcache [0], which already aims to collect a few more system
statistics than what pg_stat_statements provides today, and might be a
good basis to extend from.
It might also be worth looking at pg_stat_activity wait event sampling
to determine where a system spends time; see e.g. pg_wait_sampling [1]
for one approach to this.
[0]: https://github.com/powa-team/pg_stat_kcache
[1]: https://github.com/postgrespro/pg_wait_sampling
Hi,
You should have a look at the PoWA stack:
https://github.com/powa-team/powa
PoWA can aggregate metrics from different extensions such as
pg_stat_statements, pg_stat_kcache and, recently, pg_wait_sampling:
https://rjuju.github.io/postgresql/2018/07/09/wait-events-support-for-powa.html
Regards,
On 07/10/2018 08:38 PM, Justin Pryzby wrote:
On Tue, Jul 10, 2018 at 01:54:12PM -0400, Patrick Hemmer wrote:
I'm looking for a way of gathering performance stats in a more usable
way than turning on `log_statement_stats` (or other related modules).
The problem I have with the log_*_stats family of modules is that they
log every single query, which makes them unusable in production. Aside
from consuming space, there's also the problem that the log system
wouldn't be able to keep up with the rate.
There are a couple ideas that pop into mind that would make these stats
more usable:
1. Only log when the statement would otherwise already be logged. Such
as due to the `log_statement` or `log_min_duration_statement` settings.
Did you see this? (Added Adrien to Cc)
https://commitfest.postgresql.org/18/1691/
I don't think the existing patch does what you want, but perhaps all that's
needed is this:
    if (save_log_statement_stats)
+       if (log_sample_rate==1 || was_logged)
            ShowUsage("EXECUTE MESSAGE STATISTICS");
In any case, I'm thinking that your request could/should be considered by
whatever future patch implements sampling (if not implemented/included in the
patch itself).
Hi,
Thanks for the Cc, it seems like a good idea. Will think about it ;)
If that doesn't do what's needed, that patch might still be a good crash course
in how to start implementing what you need (perhaps on top of that patch).
2. Make stats available in `pg_stat_statements` (or alternate view that
could be joined on). The block stats are already available here, but
others like CPU usage, page faults, and context switches are not.
pg_stat_statements is ./contrib/pg_stat_statements/pg_stat_statements.c
which is 3k LOC.
getrusage stuff and log_*_stat stuff is in src/backend/tcop/postgres.c
Justin
On Fri, Jul 13, 2018 at 9:23 AM, Adrien NAYRAT
<adrien.nayrat@anayrat.info> wrote:
On 07/13/2018 12:25 AM, Lukas Fittl wrote:
On Tue, Jul 10, 2018 at 11:38 AM, Justin Pryzby <pryzby@telsasoft.com> wrote:
2. Make stats available in `pg_stat_statements` (or alternate view that
could be joined on). The block stats are already available here, but
others like CPU usage, page faults, and context switches are not.
pg_stat_statements is ./contrib/pg_stat_statements/pg_stat_statements.c
which is 3k LOC.
getrusage stuff and log_*_stat stuff is in src/backend/tcop/postgres.c
Before you start implementing something here, take a look at
pg_stat_kcache [0], which already aims to collect a few more system
statistics than what pg_stat_statements provides today, and might be a
good basis to extend from.
Also, no one asked for it before, but we can definitely add all the
other fields returned by getrusage(2) in pg_stat_kcache. You can also
look at https://github.com/markwkm/pg_proctab.
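For anyone curious which fields those are, here is a tiny standalone demo
(plain POSIX, nothing PostgreSQL-specific; these are also roughly the
numbers ShowUsage() in postgres.c reports as per-statement deltas):

    #include <stdio.h>
    #include <sys/resource.h>

    int
    main(void)
    {
        struct rusage r;

        if (getrusage(RUSAGE_SELF, &r) != 0)
        {
            perror("getrusage");
            return 1;
        }

        /* user and system CPU time consumed by this process */
        printf("user CPU:   %ld.%06ld s\n",
               (long) r.ru_utime.tv_sec, (long) r.ru_utime.tv_usec);
        printf("system CPU: %ld.%06ld s\n",
               (long) r.ru_stime.tv_sec, (long) r.ru_stime.tv_usec);

        /* the counters missing from pg_stat_statements that this
         * thread keeps coming back to */
        printf("page faults:      %ld major, %ld minor\n",
               r.ru_majflt, r.ru_minflt);
        printf("context switches: %ld voluntary, %ld involuntary\n",
               r.ru_nvcsw, r.ru_nivcsw);

        return 0;
    }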
Hi,
I'm replying to an old thread from -performance:
/messages/by-id/7ffb9dbe-c76f-8ca3-12ee-7914ede872e6@stormcloud9.net
I was looking at:
https://commitfest.postgresql.org/20/1691/
"New GUC to sample log queries"
On Tue, Jul 10, 2018 at 01:54:12PM -0400, Patrick Hemmer wrote:
I'm looking for a way of gathering performance stats in a more usable
way than turning on `log_statement_stats` (or other related modules).
The problem I have with the log_*_stats family of modules is that they
log every single query, which makes them unusable in production. Aside
from consuming space, there's also the problem that the log system
wouldn't be able to keep up with the rate.
There are a couple ideas that pop into mind that would make these stats
more usable:
1. Only log when the statement would otherwise already be logged. Such
as due to the `log_statement` or `log_min_duration_statement` settings.
...but instead came back to this parallel thread and concluded that I'm
interested in exactly that behavior: log_statement_stats only if
log_min_duration_statement is exceeded.
If there's agreement that this is desirable behavior, should I change
log_statement_stats from a boolean to an enum GUC (on/off/logged)?
Or, would it be reasonable to instead change the existing behavior of
log_statement_stats=on to mean what I want: only log statement stats if
the statement is otherwise logged? After all, log_statement_stats is
considered a developer option, most likely to be enabled either in a
development or other segregated, non-production environment, or
possibly for only a single session for diagnostics.
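For illustration only, the on/off/logged variant might look roughly like
this on the GUC side, using the usual config_enum_entry machinery in
src/backend/utils/misc/guc.c (every symbol below is made up):

    /* hypothetical three-valued mode replacing the current boolean */
    typedef enum
    {
        LOGSTMTSTATS_OFF,    /* never dump statement stats */
        LOGSTMTSTATS_ON,     /* dump stats for every statement ("on" today) */
        LOGSTMTSTATS_LOGGED  /* dump stats only when the statement is logged */
    } LogStmtStatsMode;

    static const struct config_enum_entry log_statement_stats_options[] = {
        {"off", LOGSTMTSTATS_OFF, false},
        {"on", LOGSTMTSTATS_ON, false},
        {"logged", LOGSTMTSTATS_LOGGED, false},
        {NULL, 0, false}
    };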
My own use case is that I'd like to know if a long-running query was doing lots
of analytics (aggregation/sorting), or possibly spinning away in a nested
loop. But I only care about the longest queries, and then probably look
at the ratio of user CPU/clock time.
Justin
On 11/22/18 6:41 AM, Justin Pryzby wrote:
and then probably look
at the ratio of user CPU/clock time.
Maybe pg_stat_kcache could help you:
https://github.com/powa-team/pg_stat_kcache
https://rjuju.github.io/postgresql/2018/07/17/pg_stat_kcache-2-1-is-out.html