minimizing pg_stat_statements performance overhead

Started by Raymond Martinalmost 7 years ago19 messages
#1Raymond Martin
ramarti@microsoft.com
Hello hackers, 
This email is regarding the Postgres pg_stat_statements extension. 
I noticed that enabling pg_stat_statements can effect performance. I thought that changing the pg_stat_statements.track parameter to 'none' could reduce this overhead without requiring a restart to remove it from shared_preload_libraries. Changing this config did not improve performance as I expected. Looking over the code, I noticed that pg_stat_statements is not checking if it is enabled before executing the post_parse_analyze_hook function. Other hooks that require access to the pg_stat_statements query hash table (through the pgss_store function) check for pgss_enabled. 
Would it make sense to check for pgss_enabled in the post_parse_analyze_hook function?
 
**Patching**
Making this change drastically improved performance while pg_stat_statement.track was set to NONE. This change allows me to more effectively enable/disable pg_stat_statements without requiring a restart. 
Example patch:
@@ -783,8 +783,8 @@ pgss_post_parse_analyze(ParseState *pstate, Query *query)
        /* Assert we didn't do this already */
        Assert(query->queryId == 0);
 
-       /* Safety check... */
-       if (!pgss || !pgss_hash)
+       /* Safety check...and ensure that pgss is enabled before we do any work */
+       if (!pgss || !pgss_hash || !pgss_enabled())
                return;

**Simple Mini Benchmark**
I ran a simple test on my local machine with this spec: 16 core/32 GB memory/Windows Server 2016.
The simple query I used was 'select 1'. I called pg_stat_statements_reset() before each simple query to clear the pg_stat_statements query hash. The majority of the latency happens the first time a query is run.
Median runtime of 100 simple queries in milliseconds:
PGSS loaded (ms) PGSS loaded + this patch (ms)
track = top          0.53 0.55
track = none       0.41 0.20

PGSS not loaded: 0.18ms

--
Raymond Martin
ramarti@microsoft.com
Azure Database for PostgreSQL

#2Fabien COELHO
coelho@cri.ensmp.fr
In reply to: Raymond Martin (#1)
Re: minimizing pg_stat_statements performance overhead

Hello Raymond,

Would it make sense to check for pgss_enabled in the post_parse_analyze_hook function?

Probably.

-������ /* Safety check... */
-������ if (!pgss || !pgss_hash)
+������ /* Safety check...and ensure that pgss is enabled before we do any work */
+������ if (!pgss || !pgss_hash || !pgss_enabled())
��������������� return;

**Simple Mini Benchmark**
I ran a simple test on my local machine with this spec: 16 core/32 GB memory/Windows Server 2016.
The simple query I used was 'select 1'.

I called pg_stat_statements_reset() before each simple query to clear
the pg_stat_statements query hash.

This sentence seems to suggest that reset is called before each 'select
1'? I assume it is before each test run.

The majority of the latency happens the first time a query is run.

Median runtime of 100 simple queries in milliseconds:
PGSS loaded (ms) PGSS loaded + this patch (ms)
track = top��������� 0.53 0.55
track = none������ 0.41 0.20

PGSS not loaded: 0.18ms

This means 0.0018 ms latency per transaction, which seems rather fast, on
my laptop I have typically 0.0XX ms...

I could not reproduce these results on my ubuntu laptop. Could you be more
precise about the test? Did you use pgbench? Did it run in parallel? What
options were used? What is the test script?

I tried With "pgbench" on one thread on a local socket directory
connection on a 11.2 server:

sh> vi one.sql # SELECT 1;
sh> pgbench -n -T 100 -P 1 -M prepared -f one.sql

And I had the following latencies :

pgss not loaded: 0.026 ms
pgss top: 0.026/0.027 ms
pgss none: 0.027 ms

The effect is minimal. More precise per second analysis suggest a few
percent.

Ok, maybe my habit of -M prepared would hide some of the processing cost,
so:

sh> pgbench -n -T 100 -P 1 -f one.sql

pgss top: 0.035 ms
pgss none: 0.035 ms
pgss dropped but loaded: 0.035 ms
pgss not loaded: 0.032 ms

There I have an impact of 10% in these ideal testing conditions wrt
latency where the DB does basically nothing, thus which would not warrant
to disable pg_stat_statements given the great service this extension
brings to performance analysis.

Note that this does not mean that the patch should not be applied, it
looks like an oversight, but really I do not have the performance
degradation you are suggesting.

--
Fabien.

#3legrand legrand
legrand_legrand@hotmail.com
In reply to: Raymond Martin (#1)
Re: minimizing pg_stat_statements performance overhead

Hi,
the part that hurts in terms or performances is:

if (jstate.clocations_count > 0)
pgss_store(pstate->p_sourcetext,
query->queryId,
query->stmt_location,
query->stmt_len,
0,
0,
0,
NULL,
&jstate);

that writes the query text to disk, when it has at less one parameter ...
Computing the QueryId should stay very small and can be very usefull when
used in conjonction with
https://www.postgresql-archive.org/Feature-improvement-can-we-add-queryId-for-pg-catalog-pg-stat-activity-view-td6077275.html#a6077602
for wait events sampling.

I would propose to fix this by
if (jstate.clocations_count > 0 && pgss_enabled())
pgss_store(pstate->p_sourcetext,
...

Regards
PAscal

--
Sent from: http://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html

#4Raymond Martin
ramarti@microsoft.com
In reply to: legrand legrand (#3)
RE: minimizing pg_stat_statements performance overhead

Hi Pascal,
Thanks for your feedback! I like your ideas.

the part that hurts in terms or performances is:

if (jstate.clocations_count > 0)
pgss_store(pstate->p_sourcetext,

I agree that this is the typically the most expensive part, but query normalization and hashing can also start becoming expensive with larger queries.

that writes the query text to disk, when it has at less one parameter ...
Computing the QueryId should stay very small and can be very usefull when used in conjonction with
https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.postgresql-archive.org%2FFeature-improvement-can-we-add-queryId-for-pg-catalog-pg-stat-activity-view-td6077275.html%23a6077602&data=02%7C01%7Cramarti%40microsoft.com%7Cfaa866abf1d5478e9a2208d6b2887cc4%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C636892696615564063&sdata=pWbyVleHceAHoNTMzb5oHGzois5yDaMpEHKmappTIwk%3D&reserved=0
for wait events sampling.

I also agree that the query id can be very useful! In cases where query id is required, pg_stat_statements can be enabled.
My intent of turning tracking off is to minimize the performance impact of pgss as much as possible and the thread above states: "PGSS jumble query logic is not bullet proof and may take time then impact the perf".

I believe your fix is a great step forward, but checking enabled at the very beginning would lead to better performance. This also follows the paradigm set by the rest of the pg_stat_statements codebase.
In the future, if we want only the query ID to be calculated maybe we can add another option for that?

--
Raymond Martin
ramarti@microsoft.com
Azure Database for PostgreSQL

#5legrand legrand
legrand_legrand@hotmail.com
In reply to: Raymond Martin (#4)
RE: minimizing pg_stat_statements performance overhead

Your fix is probably the best one.
Maybe this could be considered as a bug and back ported to previous versions
...

Regards
PAscal

--
Sent from: http://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html

#6Raymond Martin
ramarti@microsoft.com
In reply to: Fabien COELHO (#2)
RE: minimizing pg_stat_statements performance overhead

Hi Fabien,
Thank you for your time. Apologies for not being more specific about my testing methodology.

PGSS not loaded: 0.18ms

This means 0.0018 ms latency per transaction, which seems rather fast, on my laptop I have typically 0.0XX ms...

This actually means 0.18 milliseconds. I agree that this is a bit high, so I instead created an Ubuntu VM to get results that would align with yours.

I could not reproduce these results on my ubuntu laptop. Could you be more precise about the test? Did you use pgbench? Did it run in parallel? What options were used? What is the test script?

I did not use pgbench. It is important to call pg_stat_statements_reset before every query. This simulates a user that is performing distinct and non-repeated queries on their database. If you use prepared statements or the same set of queries each time, you would remove the contention on the pgss query text file.
I re-tested this on an Ubuntu machine with 4cores and 14GB ram. I did not run it in parallel. I used a python script that implements the follow logic:
- select pg_stat_statements_reset() -- this is important because we are making pgss treat the 'select 1' like a new query which it has not cached into pgss_hash.
- time 'select 1'
Repeat 100 times for each configuration.

Here are my Ubuntu results:
pgss unloaded
Mean: 0.076
Standard Deviation: 0.038

pgss.track=none
Mean: 0.099
Standard Deviation: 0.040

pgss.track=top
Mean: 0.098
Standard Deviation: 0.107

pgss.track=none + patch
Mean: 0.078
Standard Deviation: 0.042

The results are less noticeable, but I still see about a 20% performance improvement here.

There I have an impact of 10% in these ideal testing conditions wrt latency where the DB does basically nothing, thus which would not warrant to disable pg_stat_statements given the great service this extension brings to performance analysis.

I agree that pg_stat_statements should not be disabled based on these performance results.

Note that this does not mean that the patch should not be applied, it looks like an oversight, but really I do not have the performance degradation you are suggesting.

I appreciate your input and I want to come up with a canonical test that makes this contention more obvious.
Unfortunately, it is difficult because the criteria that causes this slow down (large query sizes and distinct non-repeated queries) are difficult to reproduce with pgbench. I would be open to any suggestions here.

So even though the performance gains in this specific scenario are not as great, do you still think it would make sense to submit a patch like this?

--
Raymond Martin
ramarti@microsoft.com
Azure Database for PostgreSQL

#7legrand legrand
legrand_legrand@hotmail.com
In reply to: Raymond Martin (#6)
RE: minimizing pg_stat_statements performance overhead

my test case:

drop table a;
create table a ();

DO
$$
DECLARE
i int;
BEGIN
for i in 1..20
loop
execute 'alter table a add column a'||i::text||' int';
end loop;
END
$$;

select pg_stat_statements_reset();
set pg_stat_statements.track='none';

DO
$$
DECLARE
i int;
j int;
BEGIN
for j in 1..20
loop
for i in 1..20
loop
execute 'select a'||i::text||',a'||j::text||' from a where 1=2';
end loop;
end loop;
END
$$;

--
Sent from: http://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html

#8Fabien COELHO
coelho@cri.ensmp.fr
In reply to: Raymond Martin (#6)
RE: minimizing pg_stat_statements performance overhead

Hello Raymond,

Note that this does not mean that the patch should not be applied, it
looks like an oversight, but really I do not have the performance
degradation you are suggesting.

I appreciate your input and I want to come up with a canonical test that
makes this contention more obvious. Unfortunately, it is difficult
because the criteria that causes this slow down (large query sizes and
distinct non-repeated queries) are difficult to reproduce with pgbench.
I would be open to any suggestions here.

So even though the performance gains in this specific scenario are not
as great, do you still think it would make sense to submit a patch like
this?

Sure, it definitely makes sense to reduce the overhead when the extension
is disabled. I wanted to understand the source of performance issue, and
your explanations where not enough for reproducing it.

--
Fabien.

#9Raymond Martin
ramarti@microsoft.com
In reply to: Fabien COELHO (#8)
1 attachment(s)
RE: minimizing pg_stat_statements performance overhead

Hi Fabien,

Sure, it definitely makes sense to reduce the overhead when the extension is disabled. I wanted to understand the source of performance issue, and your explanations where not enough for reproducing it.

Thanks again Fabien. I am attaching the patch to this email in the hope of getting it approved during the next commit fest.
I will continue trying to find a simple performance test to exemplify the performance degradation that I have seen with more complex workloads.

--
Raymond Martin
ramarti@Microsoft.com
Azure Database for PostgreSQL

Attachments:

check_pgss_enabled.patchapplication/octet-stream; name=check_pgss_enabled.patchDownload
��diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c

index 16b8074a00..ec54bee4bb 100644

--- a/contrib/pg_stat_statements/pg_stat_statements.c

+++ b/contrib/pg_stat_statements/pg_stat_statements.c

@@ -786,8 +786,8 @@ pgss_post_parse_analyze(ParseState *pstate, Query *query)

 	/* Assert we didn't do this already */

 	Assert(query->queryId == UINT64CONST(0));

 

-	/* Safety check... */

-	if (!pgss || !pgss_hash)

+	/* Safety check... and check that pgss is enabled*/

+	if (!pgss || !pgss_hash || pgss_enabled())

 		return;

 

 	/*

#10legrand legrand
legrand_legrand@hotmail.com
In reply to: Raymond Martin (#9)
RE: minimizing pg_stat_statements performance overhead

Hi,

it seems that your patch is not readable.
If you want it to be included in a commitfest, you should add it by yourself
in https://commitfest.postgresql.org/

Not sure that there is any room left in pg12 commitfest.

Regard
PAscal

--
Sent from: http://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html

#11Christoph Berg
myon@debian.org
In reply to: Raymond Martin (#9)
Re: minimizing pg_stat_statements performance overhead

Re: Raymond Martin 2019-04-01 <BN8PR21MB121708579A3782866DF1F745B1550@BN8PR21MB1217.namprd21.prod.outlook.com>

Thanks again Fabien. I am attaching the patch to this email in the hope of getting it approved during the next commit fest.

Raymond,

you sent the patch as UTF-16, could you re-send it as plain ascii?

Christoph

#12Robert Haas
robertmhaas@gmail.com
In reply to: Christoph Berg (#11)
Re: minimizing pg_stat_statements performance overhead

On Tue, Apr 2, 2019 at 5:37 AM Christoph Berg <myon@debian.org> wrote:

Re: Raymond Martin 2019-04-01 <BN8PR21MB121708579A3782866DF1F745B1550@BN8PR21MB1217.namprd21.prod.outlook.com>

Thanks again Fabien. I am attaching the patch to this email in the hope of getting it approved during the next commit fest.

you sent the patch as UTF-16, could you re-send it as plain ascii?

One thing that needs some thought here is what happens if the value of
pgss_enabled() changes. For example we don't want a situation where
if the value changes from off to on between one stage of processing
and another, the server crashes.

I don't know whether that's a risk here or not; it's just something to
think about.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#13legrand legrand
legrand_legrand@hotmail.com
In reply to: Robert Haas (#12)
Re: minimizing pg_stat_statements performance overhead

Robert Haas wrote

On Tue, Apr 2, 2019 at 5:37 AM Christoph Berg &lt;

myon@

&gt; wrote:

Re: Raymond Martin 2019-04-01 &lt;

BN8PR21MB121708579A3782866DF1F745B1550@.outlook

&gt;

Thanks again Fabien. I am attaching the patch to this email in the hope

of getting it approved during the next commit fest.

you sent the patch as UTF-16, could you re-send it as plain ascii?

One thing that needs some thought here is what happens if the value of
pgss_enabled() changes. For example we don't want a situation where
if the value changes from off to on between one stage of processing
and another, the server crashes.

I don't know whether that's a risk here or not; it's just something to
think about.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Hi, here is a simple test where I commented that line
in pgss_post_parse_analyze
to force return; (as if pgss_enabled() was disabled)
but kept pgss_enabled() every where else

/* Safety check... */
// if (!pgss || !pgss_hash || !pgss_enabled())
return;

This works without crash as you can see here after:

postgres=# select pg_stat_statements_reset();
pg_stat_statements_reset
--------------------------

(1 row)

postgres=# show pg_stat_statements.track;
pg_stat_statements.track
--------------------------
all
(1 row)

postgres=# create table a(id int);
CREATE TABLE

postgres=# select * from a where id=1;
id
----
(0 rows)

postgres=# select queryid,query,calls from pg_stat_statements;
queryid | query | calls
---------------------+-------------------------------+-------
1033669194118974675 | show pg_stat_statements.track | 1
3022461129400094363 | create table a(id int) | 1
(2 rows)

regards
PAscal

--
Sent from: http://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html

#14Raymond Martin
ramarti@microsoft.com
In reply to: Christoph Berg (#11)
1 attachment(s)
RE: minimizing pg_stat_statements performance overhead

Hi Christoph,

you sent the patch as UTF-16, could you re-send it as plain ascii?

Apologies. I re-attached the plain ascii version.

--
Raymond Martin
ramarti@microsoft.com
Azure Database for PostgreSQL

Attachments:

check_pgss_enabled.patchapplication/octet-stream; name=check_pgss_enabled.patchDownload
diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index 16b8074a00..6646690105 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -786,8 +786,8 @@ pgss_post_parse_analyze(ParseState *pstate, Query *query)
 	/* Assert we didn't do this already */
 	Assert(query->queryId == UINT64CONST(0));
 
-	/* Safety check... */
-	if (!pgss || !pgss_hash)
+	/* Safety check...and check if pgss is enabled */
+	if (!pgss || !pgss_hash || !pgss_enabled())
 		return;
 
 	/*
#15Raymond Martin
ramarti@microsoft.com
In reply to: legrand legrand (#13)
RE: minimizing pg_stat_statements performance overhead

From: Robert Haas

One thing that needs some thought here is what happens if the value of
pgss_enabled() changes. For example we don't want a situation where
if the value changes from off to on between one stage of processing
and another, the server crashes.

I don't know whether that's a risk here or not; it's just something to
think about.

This is definitely an important consideration for this change. A hook could
have the implicit assumption that a query ID is always generated.

From: PAscal

Hi, here is a simple test where I commented that line in pgss_post_parse_analyze
to force return; (as if pgss_enabled() was disabled) but kept pgss_enabled() every where else

/* Safety check... */
// if (!pgss || !pgss_hash || !pgss_enabled())
return;

This works without crash as you can see here after:

In theory, the rest of the hooks look solid.
As mentioned above, I think the major concern would be a hook that depends
on a variable generated in pgss_post_parse_analyze. Two hooks
(pgss_ExecutorStart, pgss_ExecutorEnd) depend on the query ID
generated from pgss_post_parse_analyze. Fortunately, both of these
functions already check for query ID before doing work.

I really appreciate you putting this change into practice.
Great to see your results align with mine.
Thanks Pascal!!!

--
Raymond Martin
ramarti@microsoft.com
Azure Database for PostgreSQL

#16legrand legrand
legrand_legrand@hotmail.com
In reply to: Raymond Martin (#15)
RE: minimizing pg_stat_statements performance overhead
#17Raymond Martin
ramarti@microsoft.com
In reply to: legrand legrand (#16)
Re: minimizing pg_stat_statements performance overhead

Hi,
Apologies, but I had already created a commit here: https://commitfest.postgresql.org/23/2080/ .
Any preference on which to keep?

Thanks,
Raymond Martin
ramarti@microsoft.com

#18Fabien COELHO
coelho@cri.ensmp.fr
In reply to: Raymond Martin (#9)
RE: minimizing pg_stat_statements performance overhead

Hello Raymond,

Sure, it definitely makes sense to reduce the overhead when the extension is disabled. I wanted to understand the source of performance issue, and your explanations where not enough for reproducing it.

Thanks again Fabien. I am attaching the patch to this email in the hope of getting it approved during the next commit fest.
I will continue trying to find a simple performance test to exemplify the performance degradation that I have seen with more complex workloads.

Patch applies and compiles cleanly. Global and local make check ok.

The patch adds an early exit in one of the hook when pgss is not enabled
on a given query. This seems to be a long time oversight of some earlier
additions which only had some (small or large depending) performance
impact.

About the comment "...and" -> "... and" (add a space)

Otherwise all is well.

--
Fabien.

#19Jeff Davis
pgsql@j-davis.com
In reply to: Raymond Martin (#14)
Re: minimizing pg_stat_statements performance overhead

On Wed, 2019-04-03 at 23:20 +0000, Raymond Martin wrote:

Hi Christoph,

you sent the patch as UTF-16, could you re-send it as plain ascii?

Apologies. I re-attached the plain ascii version.

Committed. Thanks!

Regards,
Jeff Davis