Adding percentile metrics to pg_stat_statements module

Started by Igor Calabriaabout 6 years ago8 messages
#1Igor Calabria
igor.calabria@gmail.com

Hi everyone,

I was taking a look at pg_stat_statements module and noticed that it does
not collect any percentile metrics. I believe that It would be really handy
to have those available and I'd love to contribute with this feature.

The basic idea is to accumulate the the query execution times using an
approximation structure like q-digest or t-digest and add those results to
the pg_stat_statements table as fixed columns. Something like this

p90_time:
p95_time:
p99_time:
p70_time:
...

Another solution is to persist de digest structure in a binary column and
use a function to extract the desired quantile ilke this SELECT
approx_quantile(digest_times, 0.99) FROM pg_stat_statements

What do you guys think?
Cheers,

#2Pavel Stehule
pavel.stehule@gmail.com
In reply to: Igor Calabria (#1)
Re: Adding percentile metrics to pg_stat_statements module

čt 31. 10. 2019 v 16:51 odesílatel Igor Calabria <igor.calabria@gmail.com>
napsal:

Hi everyone,

I was taking a look at pg_stat_statements module and noticed that it does
not collect any percentile metrics. I believe that It would be really handy
to have those available and I'd love to contribute with this feature.

The basic idea is to accumulate the the query execution times using an
approximation structure like q-digest or t-digest and add those results to
the pg_stat_statements table as fixed columns. Something like this

p90_time:
p95_time:
p99_time:
p70_time:
...

Another solution is to persist de digest structure in a binary column and
use a function to extract the desired quantile ilke this SELECT
approx_quantile(digest_times, 0.99) FROM pg_stat_statements

What do you guys think?

+ the idea

But I am not sure about performance and memory overhead

Pavel

Show quoted text

Cheers,

#3Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Igor Calabria (#1)
Re: Adding percentile metrics to pg_stat_statements module

On Thu, Oct 31, 2019 at 12:51:17PM -0300, Igor Calabria wrote:

Hi everyone,

I was taking a look at pg_stat_statements module and noticed that it does
not collect any percentile metrics. I believe that It would be really handy
to have those available and I'd love to contribute with this feature.

The basic idea is to accumulate the the query execution times using an
approximation structure like q-digest or t-digest and add those results to
the pg_stat_statements table as fixed columns. Something like this

p90_time:
p95_time:
p99_time:
p70_time:
...

Another solution is to persist de digest structure in a binary column and
use a function to extract the desired quantile ilke this SELECT
approx_quantile(digest_times, 0.99) FROM pg_stat_statements

IMO having some sort of CDF approximation (being a q-digest or t-digest)
would be useful, although it'd probably need to be optional (mostly
becuase of memory consumption).

I don't see why we would not store the digests themselves. Storing just
some selected percentiles would be pretty problematic due to losing a
lot of information on restart. Also, pg_stat_statements is not a table
but a view on in-memory hash table.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#4Igor Calabria
igor.calabria@gmail.com
In reply to: Tomas Vondra (#3)
Re: Adding percentile metrics to pg_stat_statements module

Yeah, I agree that there's no reason to store the digests themselves and I
really liked the idea of it being optional.
If it turns out that memory consumption on real workloads is small enough,
it could eventually be turned on by default.

I'll start working on patch

Em qui, 31 de out de 2019 às 16:32, Tomas Vondra <
tomas.vondra@2ndquadrant.com> escreveu:

Show quoted text

On Thu, Oct 31, 2019 at 12:51:17PM -0300, Igor Calabria wrote:

Hi everyone,

I was taking a look at pg_stat_statements module and noticed that it does
not collect any percentile metrics. I believe that It would be really

handy

to have those available and I'd love to contribute with this feature.

The basic idea is to accumulate the the query execution times using an
approximation structure like q-digest or t-digest and add those results to
the pg_stat_statements table as fixed columns. Something like this

p90_time:
p95_time:
p99_time:
p70_time:
...

Another solution is to persist de digest structure in a binary column and
use a function to extract the desired quantile ilke this SELECT
approx_quantile(digest_times, 0.99) FROM pg_stat_statements

IMO having some sort of CDF approximation (being a q-digest or t-digest)
would be useful, although it'd probably need to be optional (mostly
becuase of memory consumption).

I don't see why we would not store the digests themselves. Storing just
some selected percentiles would be pretty problematic due to losing a
lot of information on restart. Also, pg_stat_statements is not a table
but a view on in-memory hash table.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#5Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Igor Calabria (#4)
Re: Adding percentile metrics to pg_stat_statements module

On Fri, Nov 01, 2019 at 11:11:13AM -0300, Igor Calabria wrote:

Yeah, I agree that there's no reason to store the digests themselves and I
really liked the idea of it being optional.

That's not what I wrote. My point was that we *should* store the digests
themselves, otherwise we just introduce additional errors into the
estimates, because it discards the weights/frequencies.

If it turns out that memory consumption on real workloads is small enough,
it could eventually be turned on by default.

Maybe, but it's not just about memory consumption. CPU matters too.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#6Igor Calabria
igor.calabria@gmail.com
In reply to: Tomas Vondra (#5)
Re: Adding percentile metrics to pg_stat_statements module

That's not what I wrote. My point was that we *should* store the digests
themselves, otherwise we just introduce additional errors into the
estimates, because it discards the weights/frequencies.

Sorry. I meant to write "no reason to *not* store the digests"

Em sex, 1 de nov de 2019 às 11:17, Tomas Vondra <
tomas.vondra@2ndquadrant.com> escreveu:

Show quoted text

On Fri, Nov 01, 2019 at 11:11:13AM -0300, Igor Calabria wrote:

Yeah, I agree that there's no reason to store the digests themselves and I
really liked the idea of it being optional.

That's not what I wrote. My point was that we *should* store the digests
themselves, otherwise we just introduce additional errors into the
estimates, because it discards the weights/frequencies.

If it turns out that memory consumption on real workloads is small enough,
it could eventually be turned on by default.

Maybe, but it's not just about memory consumption. CPU matters too.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#7Adrien Nayrat
adrien.nayrat@anayrat.info
In reply to: Tomas Vondra (#3)
Re: Adding percentile metrics to pg_stat_statements module

On 10/31/19 8:32 PM, Tomas Vondra wrote:

IMO having some sort of CDF approximation (being a q-digest or t-digest)
would be useful, although it'd probably need to be optional (mostly
becuase of memory consumption).

+1, I like this idea. If we are afraid of CPU cost we could imagine some kind of
sampling or add the possibility to collect only for a specific queryid.

I dreamed of this kind of feature for PoWA. Thus, it could make possible to
compare CDF between two days for example, before and after introducing a change.

Regards,

--
Adrien NAYRAT

#8benoit
benoit@hopsandfork.com
In reply to: Adrien Nayrat (#7)
RE: Adding percentile metrics to pg_stat_statements module

Hello

While looking at slow queries on pg_stat_statements. I was looking for percentile fields..

If we are worried about CPU cost, maybe it could be useful to turn in on when you have a high stddev_exec_time for the query ?

Regards,

________________________________
De : Adrien Nayrat <adrien.nayrat@anayrat.info>
Envoyé : samedi 2 novembre 2019 10:23:49
À : Tomas Vondra; Igor Calabria
Cc : pgsql-hackers@postgresql.org
Objet : Re: Adding percentile metrics to pg_stat_statements module

On 10/31/19 8:32 PM, Tomas Vondra wrote:

IMO having some sort of CDF approximation (being a q-digest or t-digest)
would be useful, although it'd probably need to be optional (mostly
becuase of memory consumption).

+1, I like this idea. If we are afraid of CPU cost we could imagine some kind of
sampling or add the possibility to collect only for a specific queryid.

I dreamed of this kind of feature for PoWA. Thus, it could make possible to
compare CDF between two days for example, before and after introducing a change.

Regards,

--
Adrien NAYRAT