Sample rate added to pg_stat_statements

Started by Ilia Evdokimov · over 1 year ago · 58 messages
#1 Ilia Evdokimov
ilya.evdokimov@tantorlabs.com

Hi hackers,

Under high-load scenarios with a significant number of transactions per
second, pg_stat_statements introduces substantial overhead due to the
collection and storage of statistics. Currently, we are sometimes forced
to disable pg_stat_statements or adjust the size of the statistics using
pg_stat_statements.max, which is not always optimal. One potential
solution to this issue could be query sampling in pg_stat_statements.

A similar approach has been implemented in extensions like auto_explain
and pg_store_plans, and it has proven very useful in high-load systems.
However, this approach has its trade-offs, as it sacrifices statistical
accuracy for improved performance. This patch introduces a new
configuration parameter, pg_stat_statements.sample_rate, which provides
the ability to control the sampling of query statistics collected by
pg_stat_statements.

This patch serves as a proof of concept (POC), and I would like to hear
your thoughts on whether such an approach is viable and applicable.
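To make the proposal concrete, the core sampling decision can be sketched as a tiny standalone C program. The GUC name mirrors the patch, but rand() stands in for PostgreSQL's pg_prng_double(), so treat this as an illustration of the idea rather than the patch itself:

```c
#include <stdlib.h>
#include <stdbool.h>

/* Stand-in for the proposed pg_stat_statements.sample_rate GUC (0.0 .. 1.0). */
static double pgss_sample_rate = 1.0;

/*
 * Decide whether the current query's statistics are collected.  In the
 * real extension the random draw would come from pg_prng_double(); here
 * rand() keeps the sketch self-contained.
 */
static bool
query_is_sampled(void)
{
    double r = (double) rand() / ((double) RAND_MAX + 1.0);  /* [0, 1) */

    return r < pgss_sample_rate;
}
```

With sample_rate = 1.0 every query is tracked (current behavior); lower values skip a corresponding fraction of queries, trading statistical accuracy for less pgss_store() traffic.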

--
Best regards,
Ilia Evdokimov,
Tantor Labs LLC.

Attachments:

0001-PATCH-Allow-setting-sample-ratio-for-pg_stat_stateme.patch (text/x-patch, +38 -2)
#2 Ilia Evdokimov
ilya.evdokimov@tantorlabs.com
In reply to: Ilia Evdokimov (#1)
Re: Sample rate added to pg_stat_statements

Hi everyone,

I believe we should also include this check in the pgss_ExecutorEnd()
function because sampling in pgss_ExecutorEnd() ensures that a query not
initially sampled in pgss_ExecutorStart() can still be logged if it
meets the pg_stat_statements.sample_rate criteria. This approach adds
flexibility by allowing critical queries to be captured while
maintaining efficient sampling.

I attached a new version of the patch.

--
Best regards,
Ilia Evdokimov,
Tantor Labs LLC.

Attachments:

v2-0001-Allow-setting-sample-ratio-for-pg_stat_statements.patch (text/x-patch, +39 -4)
#3 Andrey Borodin
amborodin@acm.org
In reply to: Ilia Evdokimov (#2)
Re: Sample rate added to pg_stat_statements

On 18 Nov 2024, at 23:33, Ilia Evdokimov <ilya.evdokimov@tantorlabs.com> wrote:

Hi hackers,

Under high-load scenarios with a significant number of transactions per second, pg_stat_statements introduces substantial overhead due to the collection and storage of statistics. Currently, we are sometimes forced to disable pg_stat_statements or adjust the size of the statistics using pg_stat_statements.max, which is not always optimal. One potential solution to this issue could be query sampling in pg_stat_statements.

A similar approach has been implemented in extensions like auto_explain and pg_store_plans, and it has proven very useful in high-load systems. However, this approach has its trade-offs, as it sacrifices statistical accuracy for improved performance. This patch introduces a new configuration parameter, pg_stat_statements.sample_rate for the pg_stat_statements extension. The patch provides the ability to control the sampling of query statistics in pg_stat_statements.

This patch serves as a proof of concept (POC), and I would like to hear your thoughts on whether such an approach is viable and applicable.

+1 for the idea. I heard a lot of complaints that pgss is costly. Most of them were using it wrong, though. But at least it could give an easy way to rule out the performance impact of pgss.

On 19 Nov 2024, at 15:09, Ilia Evdokimov <ilya.evdokimov@tantorlabs.com> wrote:

I believe we should also include this check in the pgss_ExecutorEnd() function because sampling in pgss_ExecutorEnd() ensures that a query not initially sampled in pgss_ExecutorStart() can still be logged if it meets the pg_stat_statements.sample_rate criteria. This approach adds flexibility by allowing critical queries to be captured while maintaining efficient sampling.

Is there a reason why pgss_ProcessUtility is excluded?

Best regards, Andrey Borodin.

#4 Ilia Evdokimov
ilya.evdokimov@tantorlabs.com
In reply to: Andrey Borodin (#3)
Re: Sample rate added to pg_stat_statements

On 19.11.2024 15:11, Andrey M. Borodin wrote:

On 18 Nov 2024, at 23:33, Ilia Evdokimov<ilya.evdokimov@tantorlabs.com> wrote:

Hi hackers,

Under high-load scenarios with a significant number of transactions per second, pg_stat_statements introduces substantial overhead due to the collection and storage of statistics. Currently, we are sometimes forced to disable pg_stat_statements or adjust the size of the statistics using pg_stat_statements.max, which is not always optimal. One potential solution to this issue could be query sampling in pg_stat_statements.

A similar approach has been implemented in extensions like auto_explain and pg_store_plans, and it has proven very useful in high-load systems. However, this approach has its trade-offs, as it sacrifices statistical accuracy for improved performance. This patch introduces a new configuration parameter, pg_stat_statements.sample_rate for the pg_stat_statements extension. The patch provides the ability to control the sampling of query statistics in pg_stat_statements.

This patch serves as a proof of concept (POC), and I would like to hear your thoughts on whether such an approach is viable and applicable.

+1 for the idea. I heard a lot of complaints that pgss is costly. Most of them were using it wrong, though. But at least it could give an easy way to rule out the performance impact of pgss.

Thank you for review.

On 19 Nov 2024, at 15:09, Ilia Evdokimov<ilya.evdokimov@tantorlabs.com> wrote:

I believe we should also include this check in the pgss_ExecutorEnd() function because sampling in pgss_ExecutorEnd() ensures that a query not initially sampled in pgss_ExecutorStart() can still be logged if it meets the pg_stat_statements.sample_rate criteria. This approach adds flexibility by allowing critical queries to be captured while maintaining efficient sampling.

Is there a reason why pgss_ProcessUtility is excluded?

Best regards, Andrey Borodin.

Ah, you’re right! Moreover, this check is needed not only in
pgss_ProcessUtility() but in all places where pgss_enabled() is called.
Therefore, it’s better to move the sampling check directly into
pgss_enabled().

However, another issue arises with the initialization of the
'current_query_sample' variable, which records whether the query is
sampled or not. Initializing it in pgss_ExecutorStart() is not
sufficient, because pgss_post_parse_analyze() or pgss_ProcessUtility()
might be called earlier. This could lead to different values of
'current_query_sample' being used in these functions, which is
undesirable. Run the regression tests for pg_stat_statements with the
initialization in pgss_ExecutorStart(), and you'll see this.

To avoid this, we need to find a function that is called earlier than
all the others. In my opinion, pgss_post_parse_analyze() is a good
candidate for this purpose. If you have objections or better
suggestions, feel free to share them. I attached the patch with fixes.
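For illustration, folding the sampling verdict into pgss_enabled() might look like the standalone sketch below. The enum values and variable names mirror pg_stat_statements, but the code is a simplified stand-in, not the patch:

```c
#include <stdbool.h>

/* Simplified stand-ins mirroring pg_stat_statements' tracking modes. */
typedef enum { PGSS_TRACK_NONE, PGSS_TRACK_TOP, PGSS_TRACK_ALL } PGSSTrackLevel;

static PGSSTrackLevel pgss_track = PGSS_TRACK_TOP;

/* Decided once per query, in the earliest hook that runs
 * (pgss_post_parse_analyze() per the message above). */
static bool current_query_sampled = true;

/*
 * Every hook consults this one predicate, so the planner, executor and
 * utility paths all agree on whether the query is tracked.
 */
static bool
pgss_enabled(int nesting_level)
{
    return current_query_sampled &&
           (pgss_track == PGSS_TRACK_ALL ||
            (pgss_track == PGSS_TRACK_TOP && nesting_level == 0));
}
```

The point of the single predicate is that a query skipped by sampling is skipped consistently in every hook, rather than being half-recorded.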

--
Best regards,
Ilia Evdokimov,
Tantor Labs LLC.

Attachments:

v3-0001-Allow-setting-sample-ratio-for-pg_stat_statements.patch (text/x-patch, +38 -2)
#5 Greg Sabino Mullane
greg@turnstep.com
In reply to: Andrey Borodin (#3)
Re: Sample rate added to pg_stat_statements

On Tue, Nov 19, 2024 at 7:12 AM Andrey M. Borodin <x4mmm@yandex-team.ru>
wrote:

+1 for the idea. I heard a lot of complaints about that pgss is costly.
Most of them were using it wrong though.

I'm curious what "using it wrong" means exactly?

Oh, and a +1 in general to the patch, OP, although it would also be nice to
start finding the bottlenecks that cause such performance issues.

Cheers,
Greg

#6 Michael Paquier
michael@paquier.xyz
In reply to: Greg Sabino Mullane (#5)
Re: Sample rate added to pg_stat_statements

On Tue, Nov 19, 2024 at 09:39:21AM -0500, Greg Sabino Mullane wrote:

Oh, and a +1 in general to the patch, OP, although it would also be nice to
start finding the bottlenecks that cause such performance issues.

FWIW, I'm not eager to integrate this proposal without looking at this
exact argument in depth.

One piece of it would be to see how much of such "bottlenecks" we
would be able to get rid of by integrating pg_stat_statements into
the central pgstats with the custom APIs, without pushing the module
into core. This means that we would combine the existing hash of pgss
to shrink to 8 bytes for objid rather than 13 bytes now as the current
code relies on (toplevel, userid, queryid) for the entry lookup (entry
removal is sniped with these three values as well, or dshash seq
scans). The odds of conflicts still play in our favor even if
we have a few million entries, or even ten times that.

This would also get rid of the pgss text file for the queries, which
is a source of one of the bottlenecks, as we could just store query
strings upper-bounded based on a postmaster GUC to control the size of
the entries in the pgstats dshash. More normalization for IN and ANY
clauses would also help a lot here; these are a cause of a lot of
bloat.

This integration is not something I will be able to work on for the
PG18 dev cycle as I'm in full review/commit mode for the rest of this
release, but I got some plans for it in PG19 except if somebody beats
me to it.
--
Michael

#7 Greg Sabino Mullane
greg@turnstep.com
In reply to: Michael Paquier (#6)
Re: Sample rate added to pg_stat_statements

On Tue, Nov 19, 2024 at 5:07 PM Michael Paquier <michael@paquier.xyz> wrote:

One piece of it would be to see how much of such "bottlenecks" we
would be able to get rid of by integrating pg_stat_statements into
the central pgstats with the custom APIs, without pushing the module
into core.

Any particular reason these days we cannot push this into core and allow
disabling on startup? To say this extension is widely used would be an
understatement.

Cheers,
Greg

#8 Ilia Evdokimov
ilya.evdokimov@tantorlabs.com
In reply to: Michael Paquier (#6)
Re: Sample rate added to pg_stat_statements

On 20.11.2024 01:07, Michael Paquier wrote:

On Tue, Nov 19, 2024 at 09:39:21AM -0500, Greg Sabino Mullane wrote:

Oh, and a +1 in general to the patch, OP, although it would also be nice to
start finding the bottlenecks that cause such performance issues.

FWIW, I'm not eager to integrate this proposal without looking at this
exact argument in depth.

One piece of it would be to see how much of such "bottlenecks" we
would be able to get rid of by integrating pg_stat_statements into
the central pgstats with the custom APIs, without pushing the module
into core. This means that we would combine the existing hash of pgss
to shrink to 8 bytes for objid rather than 13 bytes now as the current
code relies on (toplevel, userid, queryid) for the entry lookup (entry
removal is sniped with these three values as well, or dshash seq
scans). The odds of conflicts still play in our favor even if
we have a few million entries, or even ten times that.

This would also get rid of the pgss text file for the queries, which
is a source of one of the bottlenecks, as we could just store query
strings upper-bounded based on a postmaster GUC to control the size of
the entries in the pgstats dshash. More normalization for IN and ANY
clauses would also help a lot here; these are a cause of a lot of
bloat.

This integration is not something I will be able to work on for the
PG18 dev cycle as I'm in full review/commit mode for the rest of this
release, but I got some plans for it in PG19 except if somebody beats
me to it.
--
Michael

I agree. Your proposal can indeed improve performance. Currently, I am
working on these changes and will validate them with benchmarks. Once I
have concrete results, I will open new threads to facilitate further
discussion.

However, in my opinion, the suggested improvements are not enough, and
sampling is essential.

1. I agree with Greg that pgss is widely used. It's quite odd that
sampling exists in 'auto_explain' but not in pgss.

2. If performance issues arise even after these improvements and it
turns out that pgss is the cause, the only painless solution without
restarting the instance is sampling. The current pgss's parameters are
not optimal for achieving this.

BTW, I forgot to handle the case of nested statements: either all of
them are tracked or none. I attached a new version of the patch.

--
Best regards,
Ilia Evdokimov,
Tantor Labs LLC.

Attachments:

v4-0001-Allow-setting-sample-ratio-for-pg_stat_statements.patch (text/x-patch, +40 -2)
#9 Andrey Borodin
amborodin@acm.org
In reply to: Greg Sabino Mullane (#5)
Re: Sample rate added to pg_stat_statements

On 19 Nov 2024, at 17:39, Greg Sabino Mullane <htamfids@gmail.com> wrote:

I'm curious what "using it wrong" means exactly?

Here's an example. pgSCV is querying pgss for every database separately every minute. It makes sense for the project. But when you have ~1000 databases, you have a lot of traffic to pgss alone. Even if you have only one active database, because all pgss results must be put into a tuplestore before filtering. See [0] for details.

Best regards, Andrey Borodin.

[0]: /messages/by-id/1AEEB240-9B68-44D5-8A29-8F9FDB22C801@yandex-team.ru

#10 Alexander Korotkov
aekorotkov@gmail.com
In reply to: Michael Paquier (#6)
Re: Sample rate added to pg_stat_statements

On Wed, Nov 20, 2024 at 12:07 AM Michael Paquier <michael@paquier.xyz> wrote:

On Tue, Nov 19, 2024 at 09:39:21AM -0500, Greg Sabino Mullane wrote:

Oh, and a +1 in general to the patch, OP, although it would also be nice to
start finding the bottlenecks that cause such performance issues.

FWIW, I'm not eager to integrate this proposal without looking at this
exact argument in depth.

One piece of it would be to see how much of such "bottlenecks" we
would be able to get rid of by integrating pg_stat_statements into
the central pgstats with the custom APIs, without pushing the module
into core. This means that we would combine the existing hash of pgss
to shrink to 8 bytes for objid rather than 13 bytes now as the current
code relies on (toplevel, userid, queryid) for the entry lookup (entry
removal is sniped with these three values as well, or dshash seq
scans). The odds of conflicts still play in our favor even if
we have a few million entries, or even ten times that.

If you run "pgbench -S -M prepared" on a pretty large machine with
high concurrency, then the spinlock in pgss_store() could become pretty
much of a bottleneck. And I'm not sure switching all counters to
atomics could somehow improve the situation, given we already have
pretty many counters.

I'm generally +1 for the approach taken in this thread. But I would
suggest introducing a threshold value for a query execution time, and
sampling just everything below that threshold. Slower queries
shouldn't be sampled, because they can't be too frequent, and also
they could be more valuable counted individually (while very fast
queries probably only matter "on average").

------
Regards,
Alexander Korotkov
Supabase

#11 Ilia Evdokimov
ilya.evdokimov@tantorlabs.com
In reply to: Alexander Korotkov (#10)
Re: Sample rate added to pg_stat_statements

On 22.11.2024 09:08, Alexander Korotkov wrote:

On Wed, Nov 20, 2024 at 12:07 AM Michael Paquier<michael@paquier.xyz> wrote:

On Tue, Nov 19, 2024 at 09:39:21AM -0500, Greg Sabino Mullane wrote:

Oh, and a +1 in general to the patch, OP, although it would also be nice to
start finding the bottlenecks that cause such performance issues.

FWIW, I'm not eager to integrate this proposal without looking at this
exact argument in depth.

One piece of it would be to see how much of such "bottlenecks" we
would be able to get rid of by integrating pg_stat_statements into
the central pgstats with the custom APIs, without pushing the module
into core. This means that we would combine the existing hash of pgss
to shrink to 8 bytes for objid rather than 13 bytes now as the current
code relies on (toplevel, userid, queryid) for the entry lookup (entry
removal is sniped with these three values as well, or dshash seq
scans). The odds of conflicts still play in our favor even if
we have a few million entries, or even ten times that.

If you run "pgbench -S -M prepared" on a pretty large machine with
high concurrency, then spin lock in pgss_store() could become pretty
much of a bottleneck. And I'm not sure switching all counters to
atomics could somehow improve the situation given we already have
pretty many counters.

I'm generally +1 for the approach taken in this thread. But I would
suggest introducing a threshold value for a query execution time, and
sampling just everything below that threshold. Slower queries
shouldn't be sampled, because they can't be too frequent, and also
they could be more valuable counted individually (while very fast
queries probably only matter "on average").

------
Regards,
Alexander Korotkov
Supabase

I really liked your idea, and I’d like to propose an enhancement that I
believe improves it further.

Yes, if a query’s execution time exceeds the threshold, it should always
be tracked without sampling. However, for queries with execution times
below the threshold, the sampling logic should prioritize shorter
queries over those closer to the threshold. In my view, the ideal
approach is for shorter queries to have the highest probability of being
sampled, while queries closer to the threshold are less likely to be
sampled.

This behavior can be achieved with the following logic:

pg_stat_statements.sample_exectime_threshold * random(0, 1) < total_time

Here’s how it works:

* As a query’s execution time approaches zero, the probability of it
being sampled approaches one.
* Conversely, as a query’s execution time approaches the threshold,
the probability of it being sampled approaches zero.

In other words, the sampling probability decreases linearly from 1 to 0
as the execution time gets closer to the threshold.

I believe this approach offers an ideal user experience. I have attached
a new patch implementing this logic. Please let me know if you have any
feedback regarding the comments in the code, the naming of variables or
documentation. I’m always open to discussion.
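As a standalone sketch, the comparison proposed above can be written as follows. The GUC name is taken from the message, rand() stands in for random(0, 1), and the code simply implements the stated test, so treat it as an illustration, not the patch:

```c
#include <stdlib.h>
#include <stdbool.h>

/* Hypothetical threshold GUC from the proposal, in milliseconds. */
static double sample_exectime_threshold = 10.0;

/*
 * The proposed test: threshold * random(0, 1) < total_time.  Queries at
 * or above the threshold always satisfy it; below the threshold the
 * outcome depends on the random draw.
 */
static bool
exectime_test(double total_time)
{
    double r = (double) rand() / ((double) RAND_MAX + 1.0);  /* [0, 1) */

    return sample_exectime_threshold * r < total_time;
}
```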

--
Best regards,
Ilia Evdokimov,
Tantor Labs LLC.

Attachments:

v5-0001-Add-time-based-sampling-to-pg_stat_statements.patch (text/x-patch, +64 -14)
#12 Ilia Evdokimov
ilya.evdokimov@tantorlabs.com
In reply to: Ilia Evdokimov (#11)
Re: Sample rate added to pg_stat_statements

On 26.11.2024 01:15, Ilia Evdokimov wrote:

On 22.11.2024 09:08, Alexander Korotkov wrote:

On Wed, Nov 20, 2024 at 12:07 AM Michael Paquier<michael@paquier.xyz> wrote:

On Tue, Nov 19, 2024 at 09:39:21AM -0500, Greg Sabino Mullane wrote:

Oh, and a +1 in general to the patch, OP, although it would also be nice to
start finding the bottlenecks that cause such performance issues.

FWIW, I'm not eager to integrate this proposal without looking at this
exact argument in depth.

One piece of it would be to see how much of such "bottlenecks" we
would be able to get rid of by integrating pg_stat_statements into
the central pgstats with the custom APIs, without pushing the module
into core. This means that we would combine the existing hash of pgss
to shrink to 8 bytes for objid rather than 13 bytes now as the current
code relies on (toplevel, userid, queryid) for the entry lookup (entry
removal is sniped with these three values as well, or dshash seq
scans). The odds of conflicts still play in our favor even if
we have a few million entries, or even ten times that.

If you run "pgbench -S -M prepared" on a pretty large machine with
high concurrency, then spin lock in pgss_store() could become pretty
much of a bottleneck. And I'm not sure switching all counters to
atomics could somehow improve the situation given we already have
pretty many counters.

I'm generally +1 for the approach taken in this thread. But I would
suggest introducing a threshold value for a query execution time, and
sampling just everything below that threshold. Slower queries
shouldn't be sampled, because they can't be too frequent, and also
they could be more valuable counted individually (while very fast
queries probably only matter "on average").

------
Regards,
Alexander Korotkov
Supabase

I really liked your idea, and I’d like to propose an enhancement that
I believe improves it further.

Yes, if a query’s execution time exceeds the threshold, it should
always be tracked without sampling. However, for queries with
execution times below the threshold, the sampling logic should
prioritize shorter queries over those closer to the threshold. In my
view, the ideal approach is for shorter queries to have the highest
probability of being sampled, while queries closer to the threshold
are less likely to be sampled.

This behavior can be achieved with the following logic:

pg_stat_statements.sample_exectime_threshold * random(0, 1) < total_time

Here’s how it works:

* As a query’s execution time approaches zero, the probability of it
being sampled approaches one.
* Conversely, as a query’s execution time approaches the threshold,
the probability of it being sampled approaches zero.

In other words, the sampling probability decreases linearly from 1 to
0 as the execution time gets closer to the threshold.

I believe this approach offers an ideal user experience. I have
attached a new patch implementing this logic. Please let me know if
you have any feedback regarding the comments in the code, the naming
of variables or documentation. I’m always open to discussion.

--
Best regards,
Ilia Evdokimov,
Tantor Labs LLC.

I've been closely reviewing my latest patch (v5-*.patch) implementing
time-based sampling, and I've realized that it might not be the best
approach. Let me explain the reasons.

* We can only perform sampling before the 'pgss_planner()' function.
However, at that point, we don't yet know the query's execution time,
since it only becomes available during 'pgss_ExecutorEnd()' or
'pgss_ProcessUtility()';
* If we wait to sample until the execution completes and we have the
actual execution time, this introduces a problem. By that point, we
might have already recorded the query's statistics into shared
memory from 'pgss_planner()', making it too late to decide
whether to sample the query;
* Delaying sampling until the execution finishes would require waiting
for the execution time, which could introduce performance overhead.
This defeats the purpose of sampling, which aims to reduce the cost
of tracking queries.

I believe we should reconsider the approach and revert to sampling based
on v4-*.patch. If I’ve missed anything or there’s an alternative way to
implement time threshold-based sampling efficiently, I’d be grateful to
hear your insights.

--
Best regards,
Ilia Evdokimov,
Tantor Labs LLC.

#13 Ilia Evdokimov
ilya.evdokimov@tantorlabs.com
In reply to: Ilia Evdokimov (#12)
Re: Sample rate added to pg_stat_statements

Hi everyone,

I attached the previous sampling patch for pg_stat_statements (v4).
Suggestions like sampling based on execution time remain unfeasible, as
pg_stat_statements can track a query during query planning, not execution.

To evaluate the implementation, I ran a benchmark creating 1,000 random
tables and executing randomized JOIN queries on a small machine. With
pg_stat_statements enabled, performance decreases, but reducing the
sampling rate helps mitigate the impact and improves performance.

I’d be interested in hearing your new thoughts. Are there areas where
this patch could be improved, or other ideas worth exploring?

--
Best regards,
Ilia Evdokimov,
Tantor Labs LLC.

Attachments:

v4-0001-Allow-setting-sample-ratio-for-pg_stat_statements.patch (text/x-patch, +40 -2)
#14 Ilia Evdokimov
ilya.evdokimov@tantorlabs.com
In reply to: Ilia Evdokimov (#13)
Re: Sample rate added to pg_stat_statements

On 10.12.2024 17:35, Ilia Evdokimov wrote:

Hi everyone,

I attached the previous sampling patch for pg_stat_statements (v4).
Suggestions like sampling based on execution time remain unfeasible,
as pg_stat_statements can track a query during query planning, not
execution.

To evaluate the implementation, I ran a benchmark creating 1,000
random tables and executing randomized JOIN queries on a small
machine. With pg_stat_statements enabled, performance decreases, but
reducing the sampling rate helps mitigate the impact and improves
performance.

I’d be interested in hearing your new thoughts. Are there areas where
this patch could be improved, or other ideas worth exploring?

--
Best regards,
Ilia Evdokimov,
Tantor Labs LLC.

I've fixed some small typos in the documentation and in the GUC
description in the attached patch. Any suggestions for improvements?

--
Best regards,
Ilia Evdokimov,
Tantor Labs LLC.

Attachments:

v6-0001-Allow-setting-sample-ratio-for-pg_stat_statements.patch (text/x-patch, +40 -2)
#15 Andrey Borodin
amborodin@acm.org
In reply to: Ilia Evdokimov (#14)
Re: Sample rate added to pg_stat_statements

On 6 Jan 2025, at 15:50, Ilia Evdokimov <ilya.evdokimov@tantorlabs.com> wrote:

Any suggestions for improvements?

The patch looks good to me, just a few nits.

0. Perhaps, we could have a test for edge values of 0 and 1. I do not insist, just an idea to think about.

1. This code seems a little different from your patch. It is trying to avoid engaging PRNG. I'm not sure it's a good idea, but still. Also, it uses "<=", not "<".

xact_is_sampled = log_xact_sample_rate != 0 &&
    (log_xact_sample_rate == 1 ||
     pg_prng_double(&pg_global_prng_state) <= log_xact_sample_rate);

Thanks!

Best regards, Andrey Borodin.

#16 Sami Imseih
samimseih@gmail.com
In reply to: Andrey Borodin (#15)
Re: Sample rate added to pg_stat_statements

Hi,

I was looking at this patch, and I was specifically curious about
how this works with prepared statements. The way the patch
works now is that it determines if the statement is to be sampled
at post_parse_analyze time, which could lead to unexpected
behavior.

Let's take an example below in which the
pg_stat_statements.sample_rate is set to 0 ( to mimic
some sampling rate < 1 in which this query does not
get sampled ). At that point, all subsequent executions
of the statement will not get tracked at all. Is this
what is expected for prepared statements? My concern
is that we will lose even more stats than a user
may expect.

This of course will not be an issue for the simple query protocol.

postgres=# set pg_stat_statements.sample_rate = 0;
SET
postgres=# select pg_stat_statements_reset();
pg_stat_statements_reset
-------------------------------
2025-01-06 11:45:23.484793-06
(1 row)

postgres=# SELECT $1 \parse stmt

postgres=#
postgres=# \bind_named stmt 1 \g
?column?
----------
1
(1 row)

postgres=# \bind_named stmt 1 \g
?column?
----------
1
(1 row)

postgres=# set pg_stat_statements.sample_rate = 1;
SET
postgres=# \bind_named stmt 1 \g
?column?
----------
1
(1 row)

postgres=# \bind_named stmt 1 \g
?column?
----------
1
(1 row)

postgres=# SELECT query, calls FROM pg_stat_statements;
query | calls
-------+-------
(0 rows)

Regards,

Sami Imseih
Amazon Web Services (AWS)

#17 Ilia Evdokimov
ilya.evdokimov@tantorlabs.com
In reply to: Sami Imseih (#16)
Re: Sample rate added to pg_stat_statements

On 06.01.2025 18:57, Andrey M. Borodin wrote:

On 6 Jan 2025, at 15:50, Ilia Evdokimov <ilya.evdokimov@tantorlabs.com> wrote:

Any suggestions for improvements?

The patch looks good to me, just a few nits.

0. Perhaps, we could have a test for edge values of 0 and 1. I do not insist, just an idea to think about.

I would add tests, because they are never useless. I've added a simple
test which verifies the contents of the hash table after setting
sample_rate = 0.0 and sample_rate = 1.0.

1. This code seems a little different from your patch. It is trying to avoid engaging PRNG. I'm not sure it's a good idea, but still. Also, it uses "<=", not "<".

xact_is_sampled = log_xact_sample_rate != 0 &&
    (log_xact_sample_rate == 1 ||
     pg_prng_double(&pg_global_prng_state) <= log_xact_sample_rate);

Thanks!

Best regards, Andrey Borodin.

Are we sure we're discussing the same patch? Because these remarks refer
to version 5 of the patch, which I abandoned due to your remarks.

On 06.01.2025 20:51, Sami Imseih wrote:

Hi,

I was looking at this patch, and I was specifically curious about
how this works with prepared statements. The way the patch
works now is that it determines if the statement is to be sampled
at post_parse_analyze time, which could lead to unexpected
behavior.

Let's take an example below in which the
pg_stat_statements.sample_rate is set to 0 ( to mimic
some sampling rate < 1 in which this query does not
get sampled ). At that point, all subsequent executions
of the statement will not get tracked at all. Is this
what is expected for prepared statements? My concern
is that we will lose even more stats than a user
may expect.

This of course will not be an issue for the simple query protocol.

postgres=# set pg_stat_statements.sample_rate = 0;
SET
postgres=# select pg_stat_statements_reset();
pg_stat_statements_reset
-------------------------------
2025-01-06 11:45:23.484793-06
(1 row)

postgres=# SELECT $1 \parse stmt

postgres=#
postgres=# \bind_named stmt 1 \g
?column?
----------
1
(1 row)

postgres=# \bind_named stmt 1 \g
?column?
----------
1
(1 row)

postgres=# set pg_stat_statements.sample_rate = 1;
SET
postgres=# \bind_named stmt 1 \g
?column?
----------
1
(1 row)

postgres=# \bind_named stmt 1 \g
?column?
----------
1
(1 row)

postgres=# SELECT query, calls FROM pg_stat_statements;
query | calls
-------+-------
(0 rows)

Regards,

Sami Imseih
Amazon Web Services (AWS)

You are right. This is absolutely unexpected for users. Thank you for
the review.

To fix this, I suggest storing a random number in the [0, 1) range in a
separate variable, which will be used for the comparison everywhere. We
will compare 'sample_rate' against the random value not only in
pgss_post_parse_analyze(), but also in pgss_ProcessUtility() and
pgss_planner().

I attached a patch with a test and fixes.

If you have any objections or suggestions on how to improve it, I'm
always open to feedback.
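A minimal standalone sketch of that fix, with illustrative names and rand() in place of pg_prng_double():

```c
#include <stdlib.h>
#include <stdbool.h>

/* Stand-in for the pg_stat_statements.sample_rate GUC. */
static double pgss_sample_rate = 1.0;

/* One random draw per statement, stored so that pgss_post_parse_analyze(),
 * pgss_planner() and pgss_ProcessUtility() all compare against the same
 * value for a given execution. */
static double current_query_sample;

/* Called once at the start of each statement's processing. */
static void
draw_query_sample(void)
{
    current_query_sample = (double) rand() / ((double) RAND_MAX + 1.0);  /* [0, 1) */
}

/*
 * Each hook re-applies the same comparison.  Because the comparison is
 * made per execution rather than only at parse time, a prepared
 * statement gets a fresh verdict every time it runs, and a later change
 * to sample_rate takes effect immediately.
 */
static bool
query_is_sampled(void)
{
    return current_query_sample < pgss_sample_rate;
}
```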

--
Best regards,
Ilia Evdokimov,
Tantor Labs LLC.

Attachments:

v7-0001-Allow-setting-sample-ratio-for-pg_stat_statements.patch (text/x-patch, +102 -3)
#18 Ilia Evdokimov
ilya.evdokimov@tantorlabs.com
In reply to: Ilia Evdokimov (#17)
Re: Sample rate added to pg_stat_statements

I completely forgot about ordering pg_stat_statements output in the
test, which is why the test failed in [0]. I've added ORDER BY query
COLLATE "C" to avoid any non-deterministic ordering in the table.

[0]: https://cirrus-ci.com/task/5724458477944832

--
Best regards,
Ilia Evdokimov,
Tantor Labs LLC.

Attachments:

v8-0001-Allow-setting-sample-ratio-for-pg_stat_statements.patch (text/x-patch, +102 -3)
#19 Andrey Borodin
amborodin@acm.org
In reply to: Ilia Evdokimov (#17)
Re: Sample rate added to pg_stat_statements

On 7 Jan 2025, at 16:05, Ilia Evdokimov <ilya.evdokimov@tantorlabs.com> wrote:

1. This code seems a little different from your patch. It is trying to avoid engaging PRNG. I'm not sure it's a good idea, but still. Also, it uses "<=", not "<".

xact_is_sampled = log_xact_sample_rate != 0 &&
(log_xact_sample_rate == 1 ||
pg_prng_double(&pg_global_prng_state) <= log_xact_sample_rate);

Are we sure we're discussing the same patch? Because these remarks refer to version 5 of the patch, which I abandoned due to your earlier comments.

Yes. v6 has this code
+	if (nesting_level == 0)
+		current_query_sampled = (pg_prng_double(&pg_global_prng_state) < pgss_sample_rate);

while upstream has the code I cited, and the logic is slightly different.

Best regards, Andrey Borodin.

#20Sami Imseih
samimseih@gmail.com
In reply to: Andrey Borodin (#19)
Re: Sample rate added to pg_stat_statements

You are right. This is absolutely unexpected for users. Thank you for
the review.

To fix this, I suggest storing a random number from the [0, 1) range in a
separate variable and using that same value for every comparison. We
will compare 'sample_rate' against the stored random value not only in
pgss_post_parse_analyze(), but also in pgss_ProcessUtility() and
pgss_planner().

I attached a patch with a test and fixes.

I still think this is not the correct approach. post_parse_analyze
should not have to deal with checking the sample rate. This is because
post_parse_analyze, which is the only hook with access to jstate, is
only responsible for storing a normalized query text on disk and
creating a not-yet-user-visible entry in the hash; i.e., pgss_store
never increments counters when called from pgss_post_parse_analyze.

/* Increment the counts, except when jstate is not NULL */
if (!jstate)
{

What I think may be a better idea is to add something similar
to auto_explain.c, but it should only be added to the top of
pgss_planner, pgss_ExecutorStart and pgss_ProcessUtility.

if (nesting_level == 0)
{
	if (!IsParallelWorker())
		current_query_sampled =
			pg_prng_double(&pg_global_prng_state) < pgss_sample_rate;
	else
		current_query_sampled = false;
}

This is needed for ExecutorStart and not ExecutorEnd because
ExecutorStart is where the instrumentation is initialized with
queryDesc->totaltime. The above code block could be
turned into a macro and reused in the routines mentioned.

However, with this change we can see a much higher likelihood of
non-normalized query texts being stored, because jstate is only
available during post_parse_analyze. If the first time a statement is
sampled is in ExecutorEnd, you will end up storing a non-normalized
version of the query text; see the example below with the attached v8.

postgres=# set pg_stat_statements.sample_rate = 0;
SET
postgres=# select pg_stat_statements_reset();
pg_stat_statements_reset
-------------------------------
2025-01-07 13:02:11.807952-06
(1 row)

postgres=# SELECT 1 \parse stmt

postgres=# \bind_named stmt \g
?column?
----------
1
(1 row)

postgres=# \bind_named stmt \g
?column?
----------
1
(1 row)

postgres=# SELECT query, calls FROM pg_stat_statements;
query | calls
-------+-------
(0 rows)

postgres=# set pg_stat_statements.sample_rate = 1;
SET
postgres=# \bind_named stmt \g
?column?
----------
1
(1 row)

postgres=# \bind_named stmt \g
?column?
----------
1
(1 row)

postgres=# SELECT query, calls FROM pg_stat_statements;
query | calls
-------+-------
(0 rows)

postgres=# \bind_named stmt \g
?column?
----------
1
(1 row)

postgres=# SELECT query, calls FROM pg_stat_statements;
query | calls
---------------------------------------------+-------
SELECT 1 | 1
SELECT query, calls FROM pg_stat_statements | 1
(2 rows)

postgres=# \bind_named stmt \g
?column?
----------
1
(1 row)

postgres=# SELECT query, calls FROM pg_stat_statements;
query | calls
---------------------------------------------+-------
SELECT 1 | 2
SELECT query, calls FROM pg_stat_statements | 2
(2 rows)

One idea is to make jstate available to all hooks, and completely
remove reliance on post_parse_analyze in pg_stat_statements.
I can't find the thread now, but I know this has come up in past discussions
when troubleshooting gaps in query normalization. My concern is this
feature will greatly increase the likelihood of non-normalized query texts.

Also, regarding the benchmarks: sampling is beneficial for workloads
that touch a small number of entries with high concurrency, which is
why we see a benefit for a standard pgbench workload. Sampling becomes
less beneficial when a large set of entries is being updated. I still
think this is a good approach for workloads that touch a small set of
entries.

Regards,

Sami
