[Patch] Add a reset_computed_values function in pg_stat_statements

Started by Pierre Ducroquetover 6 years ago6 messages
#1Pierre Ducroquet
p.psql@pinaraf.info
1 attachment(s)

Hello

pg_stat_statements is a great tool to track performance issue in live
databases, especially when adding interfaces like PoWA on top of it.
But so far, tools like PoWA can not track the min_time, max_time, mean_time
and sum_var_time of queries : these statistics are cumulated over time,
fetching points in time would be of little to no use, especially when looking
at the impact of a DDL change.
This patch thus introduces a simple pg_stat_statements_reset_computed_values
function that reset the computed statistics, leaving all other information
alive, thus allowing the aforementioned scenario.

Regards

Pierre Ducroquet

Attachments:

0001-Add-a-function-to-reset-the-cumulative-statistics.patchtext/x-patch; charset=UTF-8; name=0001-Add-a-function-to-reset-the-cumulative-statistics.patchDownload
From 5e6d16d738e5279968321c5f3695e72ded2432db Mon Sep 17 00:00:00 2001
From: Pierre <pierre.ducroquet@people-doc.com>
Date: Thu, 14 Feb 2019 14:37:48 +0100
Subject: [PATCH] Add a function to reset the cumulative statistics

pg_stat_statements has two parts : raw statistics that are simple 'stable'
counters, and computed statistics (averages, min time, max time...)

When using pg_stat_statements to find and fix performance issues, being
able to reset the computed statistics can help track the issue and the
impact of the fixes.
This would also make it possible for tools like powa to collect these
statistics too and track them over time by resetting them after each
collection.
---
 .../pg_stat_statements--1.7--1.8.sql          | 11 ++++
 .../pg_stat_statements/pg_stat_statements.c   | 52 +++++++++++++++++--
 .../pg_stat_statements.control                |  2 +-
 3 files changed, 61 insertions(+), 4 deletions(-)
 create mode 100644 contrib/pg_stat_statements/pg_stat_statements--1.7--1.8.sql

diff --git a/contrib/pg_stat_statements/pg_stat_statements--1.7--1.8.sql b/contrib/pg_stat_statements/pg_stat_statements--1.7--1.8.sql
new file mode 100644
index 0000000000..7690a9ceba
--- /dev/null
+++ b/contrib/pg_stat_statements/pg_stat_statements--1.7--1.8.sql
@@ -0,0 +1,11 @@
+/* contrib/pg_stat_statements/pg_stat_statements--1.7--1.8.sql */
+
+-- complain if script is sourced in psql, rather than via ALTER EXTENSION
+\echo Use "ALTER EXTENSION pg_stat_statements UPDATE TO '1.8'" to load this file. \quit
+
+CREATE FUNCTION pg_stat_statements_reset_computed_values()
+RETURNS void
+AS 'MODULE_PATHNAME'
+LANGUAGE C PARALLEL SAFE;
+
+
diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index 221b47298c..3a6c227a80 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -150,6 +150,7 @@ typedef struct Counters
 	double		max_time;		/* maximum execution time in msec */
 	double		mean_time;		/* mean execution time in msec */
 	double		sum_var_time;	/* sum of variances in execution time in msec */
+	int64		computed_calls;		/* # of times executed considered for the previous computed values */
 	int64		rows;			/* total # of retrieved or affected rows */
 	int64		shared_blks_hit;	/* # of shared buffer hits */
 	int64		shared_blks_read;	/* # of shared disk blocks read */
@@ -289,6 +290,7 @@ static bool pgss_save;			/* whether to save stats across shutdown */
 void		_PG_init(void);
 void		_PG_fini(void);
 
+PG_FUNCTION_INFO_V1(pg_stat_statements_reset_computed_values);
 PG_FUNCTION_INFO_V1(pg_stat_statements_reset);
 PG_FUNCTION_INFO_V1(pg_stat_statements_reset_1_7);
 PG_FUNCTION_INFO_V1(pg_stat_statements_1_2);
@@ -1252,8 +1254,9 @@ pgss_store(const char *query, uint64 queryId,
 			e->counters.usage = USAGE_INIT;
 
 		e->counters.calls += 1;
+		e->counters.computed_calls += 1;
 		e->counters.total_time += total_time;
-		if (e->counters.calls == 1)
+		if (e->counters.computed_calls == 1)
 		{
 			e->counters.min_time = total_time;
 			e->counters.max_time = total_time;
@@ -1268,7 +1271,7 @@ pgss_store(const char *query, uint64 queryId,
 			double		old_mean = e->counters.mean_time;
 
 			e->counters.mean_time +=
-				(total_time - old_mean) / e->counters.calls;
+				(total_time - old_mean) / e->counters.computed_calls;
 			e->counters.sum_var_time +=
 				(total_time - old_mean) * (total_time - e->counters.mean_time);
 
@@ -1324,7 +1327,7 @@ pg_stat_statements_reset_1_7(PG_FUNCTION_ARGS)
 }
 
 /*
- * Reset statement statistics.
+ * Reset all statement statistics.
  */
 Datum
 pg_stat_statements_reset(PG_FUNCTION_ARGS)
@@ -1334,6 +1337,49 @@ pg_stat_statements_reset(PG_FUNCTION_ARGS)
 	PG_RETURN_VOID();
 }
 
+/*
+ * Reset computed statistics from all statements.
+ */
+Datum
+pg_stat_statements_reset_computed_values(PG_FUNCTION_ARGS)
+{
+	if (!pgss || !pgss_hash)
+		ereport(ERROR,
+				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+				 errmsg("pg_stat_statements must be loaded via shared_preload_libraries")));
+	entry_reset_computed();
+	PG_RETURN_VOID();
+}
+
+
+/*
+ * Reset statement computed statistics.
+ * This takes a shared lock only on the hash table, and a lock per entry
+ */
+static void
+entry_reset_computed(void)
+{
+	HASH_SEQ_STATUS hash_seq;
+	pgssEntry  *entry;
+
+	/* Lookup the hash table entry with shared lock. */
+	LWLockAcquire(pgss->lock, LW_SHARED);
+
+	hash_seq_init(&hash_seq, pgss_hash);
+	while ((entry = hash_seq_search(&hash_seq)) != NULL)
+	{
+		SpinLockAcquire(&entry->mutex);
+		entry->counters.computed_calls = 0;
+		entry->counters.min_time = 0;
+		entry->counters.max_time = 0;
+		entry->counters.mean_time = 0;
+		entry->counters.sum_var_time = 0;
+		SpinLockRelease(&entry->mutex);
+	}
+
+	LWLockRelease(pgss->lock);
+}
+
 /* Number of output arguments (columns) for various API versions */
 #define PG_STAT_STATEMENTS_COLS_V1_0	14
 #define PG_STAT_STATEMENTS_COLS_V1_1	18
diff --git a/contrib/pg_stat_statements/pg_stat_statements.control b/contrib/pg_stat_statements/pg_stat_statements.control
index 14cb422354..7fb20df886 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.control
+++ b/contrib/pg_stat_statements/pg_stat_statements.control
@@ -1,5 +1,5 @@
 # pg_stat_statements extension
 comment = 'track execution statistics of all SQL statements executed'
-default_version = '1.7'
+default_version = '1.8'
 module_pathname = '$libdir/pg_stat_statements'
 relocatable = true
-- 
2.23.0.rc1

#2Euler Taveira
euler@timbira.com.br
In reply to: Pierre Ducroquet (#1)
Re: [Patch] Add a reset_computed_values function in pg_stat_statements

Em dom, 1 de set de 2019 às 06:04, Pierre Ducroquet
<p.psql@pinaraf.info> escreveu:

Hello

pg_stat_statements is a great tool to track performance issue in live
databases, especially when adding interfaces like PoWA on top of it.
But so far, tools like PoWA can not track the min_time, max_time, mean_time
and sum_var_time of queries : these statistics are cumulated over time,
fetching points in time would be of little to no use, especially when looking
at the impact of a DDL change.
This patch thus introduces a simple pg_stat_statements_reset_computed_values
function that reset the computed statistics, leaving all other information
alive, thus allowing the aforementioned scenario.

Pierre, I don't understand why you want to reset part of the statement
statistics. If you want the minimal query time every week, reset all
statistics of that statement (v12 or later). Postgres provides
histogram metrics (min, max, mean, stddev). AFAICS you want a metric
type called timer (combination of histogram and meter). For example,
calls, sum, min, max, mean, standard deviation will be calculated each
hour. If I were to add a new metric to pg_stat_statements, it would be
last_time. Compare histogram metrics with the last statement time is
useful to check if a certain optimization was effective.

Talking about your patch, math is wrong. If you don't reset
total_time, all computed values will be incorrect. You are changing
the actual meaning of those metrics and I think it is unacceptable
because it will break applications. Instead, you should provide (i)
new counters or (ii) add GUC to control this behavior. It is a
trade-off between incompatibility and too many metrics. Though I
wouldn't like to break compatibility (as I said you can always reset
all statement statistics). Why don't you provide a
pg_stat_statements_reset_computed_values(userid Oid, dbid Oid, queryid
bigint)? You forgot to provide documentation for the new function. You
should revoke pg_stat_statements_reset_computed_values from PUBLIC.

Regards,

--
Euler Taveira Timbira -
http://www.timbira.com.br/
PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento

#3Pierre Ducroquet
p.psql@pinaraf.info
In reply to: Euler Taveira (#2)
1 attachment(s)
Re: [Patch] Add a reset_computed_values function in pg_stat_statements

On Monday, September 2, 2019 3:25:00 AM CEST Euler Taveira wrote:

Em dom, 1 de set de 2019 às 06:04, Pierre Ducroquet

<p.psql@pinaraf.info> escreveu:

Hello

pg_stat_statements is a great tool to track performance issue in live
databases, especially when adding interfaces like PoWA on top of it.
But so far, tools like PoWA can not track the min_time, max_time,
mean_time
and sum_var_time of queries : these statistics are cumulated over time,
fetching points in time would be of little to no use, especially when
looking at the impact of a DDL change.
This patch thus introduces a simple
pg_stat_statements_reset_computed_values function that reset the computed
statistics, leaving all other information alive, thus allowing the
aforementioned scenario.

Hello

Thank you for reviewing this patch.

Pierre, I don't understand why you want to reset part of the statement
statistics. If you want the minimal query time every week, reset all
statistics of that statement (v12 or later). Postgres provides
histogram metrics (min, max, mean, stddev). AFAICS you want a metric
type called timer (combination of histogram and meter). For example,
calls, sum, min, max, mean, standard deviation will be calculated each
hour. If I were to add a new metric to pg_stat_statements, it would be
last_time. Compare histogram metrics with the last statement time is
useful to check if a certain optimization was effective.

min/max/mean timing in pg_stat_statements are *computed* fields, not simple
cumulative ones, hence the distinction I do here. For a tool like PoWA, that
(to simplify) copies all the statistics every few minutes to do its own
computation, it makes it impossible to get any sense out of them. The facility
to reset all statistics is an option, but it feels like an heavy solution when
only some statistics are different from the others.

Talking about your patch, math is wrong. If you don't reset
total_time, all computed values will be incorrect. You are changing
the actual meaning of those metrics and I think it is unacceptable
because it will break applications.

I think you misunderstood the way this patch is written.
total_time, in this function, is not a global statistics: pgss_store receive
the total time of the current query. When the computed statistics are reset,
we simply drop them and start them fresh from the first query. There is
absolutely no compatibility impact here.

Instead, you should provide (i) new counters or (ii) add GUC to control this
behavior. It is a trade-off between incompatibility and too many metrics.
Though I wouldn't like to break compatibility (as I said you can always
reset all statement statistics).

As explained: it does not break compatibility, I don't see a need for a new
GUC.

Why don't you provide a pg_stat_statements_reset_computed_values(userid Oid,
dbid Oid, queryid bigint)?

Because I did not need one, and with pg_stat_statements_reset(userid Oid, dbid
Oid, queryid bigint) being newly added, I did not think about it.
I can add one in a new version of the patch of course.

You forgot to provide documentation for the new function.

Indeed, I am not very used to SGML and first wanted a discussion about the
feature.

You should revoke pg_stat_statements_reset_computed_values from PUBLIC.

Indeed, I missed that one.

Attached to this mail is a new version fixing the revoke issue, and adding
documentation.

Regards

Pierre

Attachments:

0001-Add-a-function-to-reset-the-cumulative-statistics.patchtext/x-patch; charset=UTF-8; name=0001-Add-a-function-to-reset-the-cumulative-statistics.patchDownload
From 4e20277aad50e8de322f19d7c18e86dead436e0f Mon Sep 17 00:00:00 2001
From: Pierre <pierre.ducroquet@people-doc.com>
Date: Thu, 14 Feb 2019 14:37:48 +0100
Subject: [PATCH] Add a function to reset the cumulative statistics

pg_stat_statements has two parts : raw statistics that are simple 'stable'
counters, and computed statistics (averages, min time, max time...)

When using pg_stat_statements to find and fix performance issues, being
able to reset the computed statistics can help track the issue and the
impact of the fixes.
This would also make it possible for tools like powa to collect these
statistics too and track them over time by resetting them after each
collection.
---
 .../pg_stat_statements--1.7--1.8.sql          | 12 +++++
 .../pg_stat_statements/pg_stat_statements.c   | 51 +++++++++++++++++--
 .../pg_stat_statements.control                |  2 +-
 doc/src/sgml/pgstatstatements.sgml            | 25 +++++++++
 4 files changed, 86 insertions(+), 4 deletions(-)
 create mode 100644 contrib/pg_stat_statements/pg_stat_statements--1.7--1.8.sql

diff --git a/contrib/pg_stat_statements/pg_stat_statements--1.7--1.8.sql b/contrib/pg_stat_statements/pg_stat_statements--1.7--1.8.sql
new file mode 100644
index 0000000000..351187e0a1
--- /dev/null
+++ b/contrib/pg_stat_statements/pg_stat_statements--1.7--1.8.sql
@@ -0,0 +1,12 @@
+/* contrib/pg_stat_statements/pg_stat_statements--1.7--1.8.sql */
+
+-- complain if script is sourced in psql, rather than via ALTER EXTENSION
+\echo Use "ALTER EXTENSION pg_stat_statements UPDATE TO '1.8'" to load this file. \quit
+
+CREATE FUNCTION pg_stat_statements_reset_computed_values()
+RETURNS void
+AS 'MODULE_PATHNAME'
+LANGUAGE C PARALLEL SAFE;
+
+-- Don't want this to be available to non-superusers.
+REVOKE ALL ON FUNCTION pg_stat_statements_reset_computed_values() FROM PUBLIC;
diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index 221b47298c..fa6bdf57bd 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -150,6 +150,7 @@ typedef struct Counters
 	double		max_time;		/* maximum execution time in msec */
 	double		mean_time;		/* mean execution time in msec */
 	double		sum_var_time;	/* sum of variances in execution time in msec */
+	int64		computed_calls;		/* # of times executed considered for the previous computed values */
 	int64		rows;			/* total # of retrieved or affected rows */
 	int64		shared_blks_hit;	/* # of shared buffer hits */
 	int64		shared_blks_read;	/* # of shared disk blocks read */
@@ -289,6 +290,7 @@ static bool pgss_save;			/* whether to save stats across shutdown */
 void		_PG_init(void);
 void		_PG_fini(void);
 
+PG_FUNCTION_INFO_V1(pg_stat_statements_reset_computed_values);
 PG_FUNCTION_INFO_V1(pg_stat_statements_reset);
 PG_FUNCTION_INFO_V1(pg_stat_statements_reset_1_7);
 PG_FUNCTION_INFO_V1(pg_stat_statements_1_2);
@@ -1252,8 +1254,9 @@ pgss_store(const char *query, uint64 queryId,
 			e->counters.usage = USAGE_INIT;
 
 		e->counters.calls += 1;
+		e->counters.computed_calls += 1;
 		e->counters.total_time += total_time;
-		if (e->counters.calls == 1)
+		if (e->counters.computed_calls == 1)
 		{
 			e->counters.min_time = total_time;
 			e->counters.max_time = total_time;
@@ -1268,7 +1271,7 @@ pgss_store(const char *query, uint64 queryId,
 			double		old_mean = e->counters.mean_time;
 
 			e->counters.mean_time +=
-				(total_time - old_mean) / e->counters.calls;
+				(total_time - old_mean) / e->counters.computed_calls;
 			e->counters.sum_var_time +=
 				(total_time - old_mean) * (total_time - e->counters.mean_time);
 
@@ -1324,7 +1327,7 @@ pg_stat_statements_reset_1_7(PG_FUNCTION_ARGS)
 }
 
 /*
- * Reset statement statistics.
+ * Reset all statement statistics.
  */
 Datum
 pg_stat_statements_reset(PG_FUNCTION_ARGS)
@@ -1334,6 +1337,48 @@ pg_stat_statements_reset(PG_FUNCTION_ARGS)
 	PG_RETURN_VOID();
 }
 
+/*
+ * Reset computed statistics from all statements.
+ */
+Datum
+pg_stat_statements_reset_computed_values(PG_FUNCTION_ARGS)
+{
+	if (!pgss || !pgss_hash)
+		ereport(ERROR,
+				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+				 errmsg("pg_stat_statements must be loaded via shared_preload_libraries")));
+	entry_reset_computed();
+	PG_RETURN_VOID();
+}
+
+
+/*
+ * Reset computed statement statistics - inner function.
+ */
+static void
+entry_reset_computed(void)
+{
+	HASH_SEQ_STATUS hash_seq;
+	pgssEntry  *entry;
+
+	/* Lookup the hash table entry with shared lock. */
+	LWLockAcquire(pgss->lock, LW_SHARED);
+
+	hash_seq_init(&hash_seq, pgss_hash);
+	while ((entry = hash_seq_search(&hash_seq)) != NULL)
+	{
+		SpinLockAcquire(&entry->mutex);
+		entry->counters.computed_calls = 0;
+		entry->counters.min_time = 0;
+		entry->counters.max_time = 0;
+		entry->counters.mean_time = 0;
+		entry->counters.sum_var_time = 0;
+		SpinLockRelease(&entry->mutex);
+	}
+
+	LWLockRelease(pgss->lock);
+}
+
 /* Number of output arguments (columns) for various API versions */
 #define PG_STAT_STATEMENTS_COLS_V1_0	14
 #define PG_STAT_STATEMENTS_COLS_V1_1	18
diff --git a/contrib/pg_stat_statements/pg_stat_statements.control b/contrib/pg_stat_statements/pg_stat_statements.control
index 14cb422354..7fb20df886 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.control
+++ b/contrib/pg_stat_statements/pg_stat_statements.control
@@ -1,5 +1,5 @@
 # pg_stat_statements extension
 comment = 'track execution statistics of all SQL statements executed'
-default_version = '1.7'
+default_version = '1.8'
 module_pathname = '$libdir/pg_stat_statements'
 relocatable = true
diff --git a/doc/src/sgml/pgstatstatements.sgml b/doc/src/sgml/pgstatstatements.sgml
index 26bb82da4a..6958b233f3 100644
--- a/doc/src/sgml/pgstatstatements.sgml
+++ b/doc/src/sgml/pgstatstatements.sgml
@@ -387,6 +387,31 @@
      </para>
     </listitem>
    </varlistentry>
+
+   <varlistentry>
+    <term>
+     <function>pg_stat_statements_reset_computed_values() returns void</function>
+     <indexterm>
+      <primary>pg_stat_statements_reset_computed_values</primary>
+     </indexterm>
+    </term>
+
+    <listitem>
+     <para>
+      <function>pg_stat_statements_reset_computed_values</function> discards the
+      computed part of statistics gathered so far by <filename>pg_stat_statements</filename>,
+      aka the min_time, max_time, mean_time and sum_var_time fields. These fields,
+      unlike the others, are not plain cumulative values and instead the results of
+      an internal computation. When doing statistics on statements execution or
+      graphing them in an external tool, it can thus be usefull to be able to reset
+      only these instead of resetting the whole statistics. By default, this 
+      function can only be executed by superusers.  Access may be granted to others 
+      using <command>GRANT</command>.
+     </para>
+    </listitem>
+   </varlistentry>
+
+
   </variablelist>
  </sect2>
 
-- 
2.23.0.rc1

#4Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Pierre Ducroquet (#3)
Re: [Patch] Add a reset_computed_values function in pg_stat_statements

This patch seems to be failing the contrib build. Please fix.

--
�lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#5Thomas Munro
thomas.munro@gmail.com
In reply to: Alvaro Herrera (#4)
Re: [Patch] Add a reset_computed_values function in pg_stat_statements

On Thu, Sep 26, 2019 at 8:05 AM Alvaro Herrera <alvherre@2ndquadrant.com> wrote:

This patch seems to be failing the contrib build. Please fix.

Hi Pierre,

In contrib/pg_stat_statements/pg_stat_statements.c, you need to
declare or define entry_reset_computed() before you use it. I suppose
your compiler must be warning about that. I personally put
"COPT=-Wall -Werror" into src/Makefile.custom to make sure that I
don't miss any warnings. That's also what http://cfbot.cputube.org/
does (a thing that does a full check-world on all registered patches
to look for such problems).

#6Michael Paquier
michael@paquier.xyz
In reply to: Thomas Munro (#5)
Re: [Patch] Add a reset_computed_values function in pg_stat_statements

On Tue, Nov 05, 2019 at 11:03:58AM +1300, Thomas Munro wrote:

In contrib/pg_stat_statements/pg_stat_statements.c, you need to
declare or define entry_reset_computed() before you use it. I suppose
your compiler must be warning about that. I personally put
"COPT=-Wall -Werror" into src/Makefile.custom to make sure that I
don't miss any warnings. That's also what http://cfbot.cputube.org/
does (a thing that does a full check-world on all registered patches
to look for such problems).

Still the same problem which has not been addressed after a couple of
weeks, so I am marking the entry as returned with feedback.
--
Michael