pg_stat_statements oddity with track = all

Started by Julien Rouhaudabout 5 years ago32 messages
#1Julien Rouhaud
rjuju123@gmail.com

Hi,

Someone raised an interested point recently on pg_stat_kcache extension for
handling nested statements, which also applies to pg_stat_statements.

The root issue is that when pg_stat_statements tracks nested statements,
there's no way to really make sense of the counters, as top level statements
will also account for underlying statements. Using a naive example:

=# CREATE FUNCTION f1() RETURNS VOID AS $$BEGIN PERFORM pg_sleep(5); END; $$ LANGUAGE plpgsql;
CREATE FUNCTION

=# SELECT pg_stat_statements_reset();
pg_stat_statements_reset
--------------------------

(1 row)

=# SELECT f1();
f1
----

(1 row)

=# select sum(total_exec_time) from pg_stat_statements;
sum
--------------
10004.403601
(1 row)

It looks like there was 10s total execution time overall all statements, which
doesn't really make sense.

It's of course possible to avoid that using track = top, but tracking all
nested statements is usually quite useful so it could be better to find a way
to better address that problem.

The only idea I have for that is to add a new field to entry key, for instance
is_toplevel. The immediate cons is obviously that it could amplify quite a lot
the number of entries tracked, so people may need to increase
pg_stat_statements.max to avoid slowdown if that makes them reach frequent
entry eviction.

Should we address the problem, and in that case do you see a better way for
that, or should we just document this behavior?

#2Nikolay Samokhvalov
samokhvalov@gmail.com
In reply to: Julien Rouhaud (#1)
Re: pg_stat_statements oddity with track = all

On Tue, Dec 1, 2020 at 8:05 PM Julien Rouhaud <rjuju123@gmail.com> wrote:

Hi,

Someone raised an interested point recently on pg_stat_kcache extension for
handling nested statements, which also applies to pg_stat_statements.

...

The only idea I have for that is to add a new field to entry key, for
instance
is_toplevel.

This particular problem often bothered me when dealing with
pg_stat_statements contents operating under "track = all" (especially when
performing the aggregated analysis, like you showed).

I think the idea of having a flag to distinguish the top-level entries is
great.

The immediate cons is obviously that it could amplify quite a lot
the number of entries tracked, so people may need to increase
pg_stat_statements.max to avoid slowdown if that makes them reach frequent
entry eviction.

If all top-level records in pg_stat_statements have "true" in the new
column (is_toplevel), how would this lead to the need to increase
pg_stat_statements.max? The number of records would remain the same, as
before extending pg_stat_statements.

#3Julien Rouhaud
rjuju123@gmail.com
In reply to: Nikolay Samokhvalov (#2)
Re: pg_stat_statements oddity with track = all

On Tue, Dec 01, 2020 at 10:08:06PM -0800, Nikolay Samokhvalov wrote:

On Tue, Dec 1, 2020 at 8:05 PM Julien Rouhaud <rjuju123@gmail.com> wrote:

Someone raised an interested point recently on pg_stat_kcache extension for
handling nested statements, which also applies to pg_stat_statements.

...

The only idea I have for that is to add a new field to entry key, for
instance
is_toplevel.

This particular problem often bothered me when dealing with
pg_stat_statements contents operating under "track = all" (especially when
performing the aggregated analysis, like you showed).

I think the idea of having a flag to distinguish the top-level entries is
great.

Ok!

The immediate cons is obviously that it could amplify quite a lot
the number of entries tracked, so people may need to increase
pg_stat_statements.max to avoid slowdown if that makes them reach frequent
entry eviction.

If all top-level records in pg_stat_statements have "true" in the new
column (is_toplevel), how would this lead to the need to increase
pg_stat_statements.max? The number of records would remain the same, as
before extending pg_stat_statements.

If the same query is getting executed both at top level and as a nested
statement, two entries will then be created. That's probably unlikely for
things like RI trigger queries, but I don't know what to expect for client
application queries.

#4Fujii Masao
masao.fujii@oss.nttdata.com
In reply to: Julien Rouhaud (#3)
Re: pg_stat_statements oddity with track = all

On 2020/12/02 15:32, Julien Rouhaud wrote:

On Tue, Dec 01, 2020 at 10:08:06PM -0800, Nikolay Samokhvalov wrote:

On Tue, Dec 1, 2020 at 8:05 PM Julien Rouhaud <rjuju123@gmail.com> wrote:

Someone raised an interested point recently on pg_stat_kcache extension for
handling nested statements, which also applies to pg_stat_statements.

...

The only idea I have for that is to add a new field to entry key, for
instance
is_toplevel.

This particular problem often bothered me when dealing with
pg_stat_statements contents operating under "track = all" (especially when
performing the aggregated analysis, like you showed).

I think the idea of having a flag to distinguish the top-level entries is
great.

Ok!

The immediate cons is obviously that it could amplify quite a lot
the number of entries tracked, so people may need to increase
pg_stat_statements.max to avoid slowdown if that makes them reach frequent
entry eviction.

If all top-level records in pg_stat_statements have "true" in the new
column (is_toplevel), how would this lead to the need to increase
pg_stat_statements.max? The number of records would remain the same, as
before extending pg_stat_statements.

If the same query is getting executed both at top level and as a nested
statement, two entries will then be created. That's probably unlikely for
things like RI trigger queries, but I don't know what to expect for client
application queries.

Just idea; instead of boolean is_toplevel flag, what about
counting the number of times when the statement is executed
in toplevel, and also in nested level?

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

#5Julien Rouhaud
rjuju123@gmail.com
In reply to: Fujii Masao (#4)
Re: pg_stat_statements oddity with track = all

On Wed, Dec 02, 2020 at 03:52:37PM +0900, Fujii Masao wrote:

On 2020/12/02 15:32, Julien Rouhaud wrote:

On Tue, Dec 01, 2020 at 10:08:06PM -0800, Nikolay Samokhvalov wrote:

On Tue, Dec 1, 2020 at 8:05 PM Julien Rouhaud <rjuju123@gmail.com> wrote:

Someone raised an interested point recently on pg_stat_kcache extension for
handling nested statements, which also applies to pg_stat_statements.

...

The only idea I have for that is to add a new field to entry key, for
instance
is_toplevel.

[...]

Just idea; instead of boolean is_toplevel flag, what about
counting the number of times when the statement is executed
in toplevel, and also in nested level?

Ah, indeed that would avoid extraneous entries. FTR we would also need that
for the planning part.

The cons I can see is that it'll make the counters harder to process (unless we
provide a specific view for the top-level statements only for instance), and
that it assumes that doing a simple division is representative enough for the
top level/nested repartition. This might be quite off for in some cases, e.g.
big stored procedures due to lack of autovacuum, but that can't be worse than
what we currently have.

#6legrand legrand
legrand_legrand@hotmail.com
In reply to: Julien Rouhaud (#5)
Re: pg_stat_statements oddity with track = all

Hi,

a crazy idea:
- add a parent_statement_id column that would be NULL for top level queries
- build statement_id for nested queries based on the merge of:
a/ current_statement_id and parent one
or
b/ current_statement_id and nested level.

this would offer the ability to track counters at any depth level ;o)
Regards
PAscal

--
Sent from: https://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html

In reply to: legrand legrand (#6)
Re: pg_stat_statements oddity with track = all

Hello

- add a parent_statement_id column that would be NULL for top level queries

Will generate too much entries... Every FK for each different delete/insert, for example.
But very useful for databases with a lot of stored procedures to find where this query is called. May be new mode track = tree? Use NULL to indicate a top-level query (same as with track=tree) and some constant for any nested queries when track = all.

Also, currently a top statement will account buffers usage for underlying statements?

regards, Sergei

#8Nikolay Samokhvalov
samokhvalov@gmail.com
In reply to: Julien Rouhaud (#3)
Re: pg_stat_statements oddity with track = all

On Tue, Dec 1, 2020 at 10:32 PM Julien Rouhaud <rjuju123@gmail.com> wrote:

On Tue, Dec 01, 2020 at 10:08:06PM -0800, Nikolay Samokhvalov wrote:

If all top-level records in pg_stat_statements have "true" in the new
column (is_toplevel), how would this lead to the need to increase
pg_stat_statements.max? The number of records would remain the same, as
before extending pg_stat_statements.

If the same query is getting executed both at top level and as a nested
statement, two entries will then be created. That's probably unlikely for
things like RI trigger queries, but I don't know what to expect for client
application queries.

Right, but this is how things already work. The extra field you've proposed
won't increase the number of records so it shouldn't affect how users
choose pg_stat_statements.max.

#9Julien Rouhaud
rjuju123@gmail.com
In reply to: Nikolay Samokhvalov (#8)
Re: pg_stat_statements oddity with track = all

On Wed, Dec 02, 2020 at 06:23:54AM -0800, Nikolay Samokhvalov wrote:

On Tue, Dec 1, 2020 at 10:32 PM Julien Rouhaud <rjuju123@gmail.com> wrote:

On Tue, Dec 01, 2020 at 10:08:06PM -0800, Nikolay Samokhvalov wrote:

If all top-level records in pg_stat_statements have "true" in the new
column (is_toplevel), how would this lead to the need to increase
pg_stat_statements.max? The number of records would remain the same, as
before extending pg_stat_statements.

If the same query is getting executed both at top level and as a nested
statement, two entries will then be created. That's probably unlikely for
things like RI trigger queries, but I don't know what to expect for client
application queries.

Right, but this is how things already work. The extra field you've proposed
won't increase the number of records so it shouldn't affect how users
choose pg_stat_statements.max.

The extra field I've proposed would increase the number of records, as it needs
to be a part of the key. The only alternative would be what Fufi-san
mentioned, i.e. to split plans and calls (for instance plans_toplevel,
plans_nested, calls_toplevel, calls_nested) and let users compute an
approximate value for toplevel counters.

#10legrand legrand
legrand_legrand@hotmail.com
In reply to: Julien Rouhaud (#9)
Re: pg_stat_statements oddity with track = all

Hi Julien,

The extra field I've proposed would increase the number of records, as it
needs

to be a part of the key.

To get an increase in the number of records that means that the same
statement
would appear at top level AND nested level. This seems a corner case with
very low
(neglectible) occurence rate. Did I miss something ?

Regards
PAscal

--
Sent from: https://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html

In reply to: legrand legrand (#10)
Re: pg_stat_statements oddity with track = all

Hello

To get an increase in the number of records that means that the same
statement
would appear at top level AND nested level. This seems a corner case with
very low
(neglectible) occurence rate.

+1
I think splitting fields into plans_toplevel / plans_nested will be less convenient. And more code with higher chance of copypaste errors

regards, Sergei

#12Julien Rouhaud
rjuju123@gmail.com
In reply to: Sergei Kornilov (#7)
Re: pg_stat_statements oddity with track = all

On Wed, Dec 02, 2020 at 05:13:56PM +0300, Sergei Kornilov wrote:

Hello

- add a parent_statement_id column that would be NULL for top level queries

Will generate too much entries... Every FK for each different delete/insert, for example.
But very useful for databases with a lot of stored procedures to find where this query is called. May be new mode track = tree? Use NULL to indicate a top-level query (same as with track=tree) and some constant for any nested queries when track = all.

Maybe pg_stat_statements isn't the best tool for that use case. For the record
the profiler in plpgsql_check can now track queryid for each statements inside
a function, so you match pg_stat_statements entries. That's clearly not
perfect as dynamic queries could generate different queryid, but that's a
start.

Also, currently a top statement will account buffers usage for underlying statements?

I think so.

#13Julien Rouhaud
rjuju123@gmail.com
In reply to: Sergei Kornilov (#11)
Re: pg_stat_statements oddity with track = all

On Thu, Dec 03, 2020 at 11:40:22AM +0300, Sergei Kornilov wrote:

Hello

To get an increase in the number of records that means that the same
statement
would appear at top level AND nested level. This seems a corner case with
very low
(neglectible) occurence rate.

+1
I think splitting fields into plans_toplevel / plans_nested will be less convenient. And more code with higher chance of copypaste errors

As I mentioned in a previous message, I really have no idea if that would be a
corner case or not. For instance with native partitioning, the odds to have
many different query executed both at top level and as a nested statement may
be quite higher.

#14Julien Rouhaud
rjuju123@gmail.com
In reply to: Julien Rouhaud (#13)
1 attachment(s)
Re: pg_stat_statements oddity with track = all

On Thu, Dec 03, 2020 at 04:53:59PM +0800, Julien Rouhaud wrote:

On Thu, Dec 03, 2020 at 11:40:22AM +0300, Sergei Kornilov wrote:

Hello

To get an increase in the number of records that means that the same
statement
would appear at top level AND nested level. This seems a corner case with
very low
(neglectible) occurence rate.

+1
I think splitting fields into plans_toplevel / plans_nested will be less convenient. And more code with higher chance of copypaste errors

As I mentioned in a previous message, I really have no idea if that would be a
corner case or not. For instance with native partitioning, the odds to have
many different query executed both at top level and as a nested statement may
be quite higher.

The consensus seems to be adding a new boolean toplevel flag in the entry key,
so PFA a patch implementing that. Note that the key now has padding, so
memset() calls are required.

Attachments:

v1-0001-Add-a-bool-toplevel-column-to-pg_stat_statements.patchtext/plain; charset=us-asciiDownload
From 6c738707abdd72807ec94dbafd346f077e482f74 Mon Sep 17 00:00:00 2001
From: Julien Rouhaud <julien.rouhaud@free.fr>
Date: Fri, 4 Dec 2020 13:33:51 +0800
Subject: [PATCH v1] Add a bool toplevel column to pg_stat_statements.

As top level statements include resource consumption for underlying statements,
it's not possible to compute the total resource consumption accurately.  Fix
that by adding a new toplevel boolean field that indicates whether the counters
were cumulated for queries executed at top level or not.

It can lead to more entries being stored for the same workload if
pg_stat_statements.track is set to all, but it seems unlikely to have the same
queries being executed both at top level and as nested statements.
---
 contrib/pg_stat_statements/Makefile           |  3 +-
 .../expected/pg_stat_statements.out           | 40 +++++++++++++
 .../pg_stat_statements--1.9--1.10.sql         | 57 +++++++++++++++++++
 .../pg_stat_statements/pg_stat_statements.c   | 42 +++++++++++---
 .../pg_stat_statements.control                |  2 +-
 .../sql/pg_stat_statements.sql                | 21 +++++++
 doc/src/sgml/pgstatstatements.sgml            |  9 +++
 7 files changed, 165 insertions(+), 9 deletions(-)
 create mode 100644 contrib/pg_stat_statements/pg_stat_statements--1.9--1.10.sql

diff --git a/contrib/pg_stat_statements/Makefile b/contrib/pg_stat_statements/Makefile
index 3ec627b956..cab4f626ad 100644
--- a/contrib/pg_stat_statements/Makefile
+++ b/contrib/pg_stat_statements/Makefile
@@ -6,7 +6,8 @@ OBJS = \
 	pg_stat_statements.o
 
 EXTENSION = pg_stat_statements
-DATA = pg_stat_statements--1.4.sql pg_stat_statements--1.8--1.9.sql \
+DATA = pg_stat_statements--1.4.sql \
+        pg_stat_statements--1.9--1.10.sql pg_stat_statements--1.8--1.9.sql \
 	pg_stat_statements--1.7--1.8.sql pg_stat_statements--1.6--1.7.sql \
 	pg_stat_statements--1.5--1.6.sql pg_stat_statements--1.4--1.5.sql \
 	pg_stat_statements--1.3--1.4.sql pg_stat_statements--1.2--1.3.sql \
diff --git a/contrib/pg_stat_statements/expected/pg_stat_statements.out b/contrib/pg_stat_statements/expected/pg_stat_statements.out
index 16158525ca..fb97f68737 100644
--- a/contrib/pg_stat_statements/expected/pg_stat_statements.out
+++ b/contrib/pg_stat_statements/expected/pg_stat_statements.out
@@ -876,4 +876,44 @@ SELECT dealloc FROM pg_stat_statements_info;
        0
 (1 row)
 
+--
+-- top level handling
+--
+SET pg_stat_statements.track = 'top';
+DELETE FROM test;
+DO $$
+BEGIN
+    DELETE FROM test;
+END;
+$$ LANGUAGE plpgsql;
+SELECT query, toplevel, plans, calls FROM pg_stat_statements WHERE query LIKE '%DELETE%' ORDER BY query COLLATE "C", toplevel;
+         query         | toplevel | plans | calls 
+-----------------------+----------+-------+-------
+ DELETE FROM test      | t        |     1 |     1
+ DO $$                +| t        |     0 |     1
+ BEGIN                +|          |       | 
+     DELETE FROM test;+|          |       | 
+ END;                 +|          |       | 
+ $$ LANGUAGE plpgsql   |          |       | 
+(2 rows)
+
+SET pg_stat_statements.track = 'all';
+DELETE FROM test;
+DO $$
+BEGIN
+    DELETE FROM test;
+END;
+$$ LANGUAGE plpgsql;
+SELECT query, toplevel, plans, calls FROM pg_stat_statements WHERE query LIKE '%DELETE%' ORDER BY query COLLATE "C", toplevel;
+         query         | toplevel | plans | calls 
+-----------------------+----------+-------+-------
+ DELETE FROM test      | f        |     1 |     1
+ DELETE FROM test      | t        |     2 |     2
+ DO $$                +| t        |     0 |     2
+ BEGIN                +|          |       | 
+     DELETE FROM test;+|          |       | 
+ END;                 +|          |       | 
+ $$ LANGUAGE plpgsql   |          |       | 
+(3 rows)
+
 DROP EXTENSION pg_stat_statements;
diff --git a/contrib/pg_stat_statements/pg_stat_statements--1.9--1.10.sql b/contrib/pg_stat_statements/pg_stat_statements--1.9--1.10.sql
new file mode 100644
index 0000000000..f97d16497d
--- /dev/null
+++ b/contrib/pg_stat_statements/pg_stat_statements--1.9--1.10.sql
@@ -0,0 +1,57 @@
+/* contrib/pg_stat_statements/pg_stat_statements--1.9--1.10.sql */
+
+-- complain if script is sourced in psql, rather than via ALTER EXTENSION
+\echo Use "ALTER EXTENSION pg_stat_statements UPDATE TO '1.10'" to load this file. \quit
+
+/* First we have to remove them from the extension */
+ALTER EXTENSION pg_stat_statements DROP VIEW pg_stat_statements;
+ALTER EXTENSION pg_stat_statements DROP FUNCTION pg_stat_statements(boolean);
+
+/* Then we can drop them */
+DROP VIEW pg_stat_statements;
+DROP FUNCTION pg_stat_statements(boolean);
+
+/* Now redefine */
+CREATE FUNCTION pg_stat_statements(IN showtext boolean,
+    OUT userid oid,
+    OUT dbid oid,
+    OUT toplevel bool,
+    OUT queryid bigint,
+    OUT query text,
+    OUT plans int8,
+    OUT total_plan_time float8,
+    OUT min_plan_time float8,
+    OUT max_plan_time float8,
+    OUT mean_plan_time float8,
+    OUT stddev_plan_time float8,
+    OUT calls int8,
+    OUT total_exec_time float8,
+    OUT min_exec_time float8,
+    OUT max_exec_time float8,
+    OUT mean_exec_time float8,
+    OUT stddev_exec_time float8,
+    OUT rows int8,
+    OUT shared_blks_hit int8,
+    OUT shared_blks_read int8,
+    OUT shared_blks_dirtied int8,
+    OUT shared_blks_written int8,
+    OUT local_blks_hit int8,
+    OUT local_blks_read int8,
+    OUT local_blks_dirtied int8,
+    OUT local_blks_written int8,
+    OUT temp_blks_read int8,
+    OUT temp_blks_written int8,
+    OUT blk_read_time float8,
+    OUT blk_write_time float8,
+    OUT wal_records int8,
+    OUT wal_fpi int8,
+    OUT wal_bytes numeric
+)
+RETURNS SETOF record
+AS 'MODULE_PATHNAME', 'pg_stat_statements_1_10'
+LANGUAGE C STRICT VOLATILE PARALLEL SAFE;
+
+CREATE VIEW pg_stat_statements AS
+  SELECT * FROM pg_stat_statements(true);
+
+GRANT SELECT ON pg_stat_statements TO PUBLIC;
diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index 70cfdb2c9d..c79e60bfde 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -124,7 +124,8 @@ typedef enum pgssVersion
 	PGSS_V1_1,
 	PGSS_V1_2,
 	PGSS_V1_3,
-	PGSS_V1_8
+	PGSS_V1_8,
+	PGSS_V1_10
 } pgssVersion;
 
 typedef enum pgssStoreKind
@@ -145,17 +146,13 @@ typedef enum pgssStoreKind
 /*
  * Hashtable key that defines the identity of a hashtable entry.  We separate
  * queries by user and by database even if they are otherwise identical.
- *
- * Right now, this structure contains no padding.  If you add any, make sure
- * to teach pgss_store() to zero the padding bytes.  Otherwise, things will
- * break, because pgss_hash is created using HASH_BLOBS, and thus tag_hash
- * is used to hash this.
  */
 typedef struct pgssHashKey
 {
 	Oid			userid;			/* user OID */
 	Oid			dbid;			/* database OID */
 	uint64		queryid;		/* query identifier */
+	bool		toplevel;		/* query executed at top level */
 } pgssHashKey;
 
 /*
@@ -335,6 +332,7 @@ PG_FUNCTION_INFO_V1(pg_stat_statements_reset_1_7);
 PG_FUNCTION_INFO_V1(pg_stat_statements_1_2);
 PG_FUNCTION_INFO_V1(pg_stat_statements_1_3);
 PG_FUNCTION_INFO_V1(pg_stat_statements_1_8);
+PG_FUNCTION_INFO_V1(pg_stat_statements_1_10);
 PG_FUNCTION_INFO_V1(pg_stat_statements);
 PG_FUNCTION_INFO_V1(pg_stat_statements_info);
 
@@ -1327,9 +1325,11 @@ pgss_store(const char *query, uint64 queryId,
 	}
 
 	/* Set up key for hashtable search */
+	memset(&key, 0, sizeof(pgssHashKey));
 	key.userid = GetUserId();
 	key.dbid = MyDatabaseId;
 	key.queryid = queryId;
+	key.toplevel = (exec_nested_level == 0);
 
 	/* Lookup the hash table entry with shared lock. */
 	LWLockAcquire(pgss->lock, LW_SHARED);
@@ -1509,7 +1509,8 @@ pg_stat_statements_reset(PG_FUNCTION_ARGS)
 #define PG_STAT_STATEMENTS_COLS_V1_2	19
 #define PG_STAT_STATEMENTS_COLS_V1_3	23
 #define PG_STAT_STATEMENTS_COLS_V1_8	32
-#define PG_STAT_STATEMENTS_COLS			32	/* maximum of above */
+#define PG_STAT_STATEMENTS_COLS_V1_10	33
+#define PG_STAT_STATEMENTS_COLS			33	/* maximum of above */
 
 /*
  * Retrieve statement statistics.
@@ -1521,6 +1522,16 @@ pg_stat_statements_reset(PG_FUNCTION_ARGS)
  * expected API version is identified by embedding it in the C name of the
  * function.  Unfortunately we weren't bright enough to do that for 1.1.
  */
+Datum
+pg_stat_statements_1_10(PG_FUNCTION_ARGS)
+{
+	bool		showtext = PG_GETARG_BOOL(0);
+
+	pg_stat_statements_internal(fcinfo, PGSS_V1_10, showtext);
+
+	return (Datum) 0;
+}
+
 Datum
 pg_stat_statements_1_8(PG_FUNCTION_ARGS)
 {
@@ -1640,6 +1651,10 @@ pg_stat_statements_internal(FunctionCallInfo fcinfo,
 			if (api_version != PGSS_V1_8)
 				elog(ERROR, "incorrect number of output arguments");
 			break;
+		case PG_STAT_STATEMENTS_COLS_V1_10:
+			if (api_version != PGSS_V1_10)
+				elog(ERROR, "incorrect number of output arguments");
+			break;
 		default:
 			elog(ERROR, "incorrect number of output arguments");
 	}
@@ -1731,6 +1746,8 @@ pg_stat_statements_internal(FunctionCallInfo fcinfo,
 
 		values[i++] = ObjectIdGetDatum(entry->key.userid);
 		values[i++] = ObjectIdGetDatum(entry->key.dbid);
+		if (api_version >= PGSS_V1_10)
+			values[i++] = BoolGetDatum(entry->key.toplevel);
 
 		if (is_allowed_role || entry->key.userid == userid)
 		{
@@ -1868,6 +1885,7 @@ pg_stat_statements_internal(FunctionCallInfo fcinfo,
 					 api_version == PGSS_V1_2 ? PG_STAT_STATEMENTS_COLS_V1_2 :
 					 api_version == PGSS_V1_3 ? PG_STAT_STATEMENTS_COLS_V1_3 :
 					 api_version == PGSS_V1_8 ? PG_STAT_STATEMENTS_COLS_V1_8 :
+					 api_version == PGSS_V1_10 ? PG_STAT_STATEMENTS_COLS_V1_10 :
 					 -1 /* fail if you forget to update this assert */ ));
 
 		tuplestore_putvalues(tupstore, tupdesc, values, nulls);
@@ -2519,9 +2537,19 @@ entry_reset(Oid userid, Oid dbid, uint64 queryid)
 	if (userid != 0 && dbid != 0 && queryid != UINT64CONST(0))
 	{
 		/* If all the parameters are available, use the fast path. */
+		memset(&key, 0, sizeof(pgssHashKey));
 		key.userid = userid;
 		key.dbid = dbid;
 		key.queryid = queryid;
+		key.toplevel = false;
+
+		/* Remove the key if exists */
+		entry = (pgssEntry *) hash_search(pgss_hash, &key, HASH_REMOVE, NULL);
+		if (entry)				/* found */
+			num_remove++;
+
+		/* Also remove entries for top level statements */
+		key.toplevel = true;
 
 		/* Remove the key if exists */
 		entry = (pgssEntry *) hash_search(pgss_hash, &key, HASH_REMOVE, NULL);
diff --git a/contrib/pg_stat_statements/pg_stat_statements.control b/contrib/pg_stat_statements/pg_stat_statements.control
index 2f1ce6ed50..0747e48138 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.control
+++ b/contrib/pg_stat_statements/pg_stat_statements.control
@@ -1,5 +1,5 @@
 # pg_stat_statements extension
 comment = 'track planning and execution statistics of all SQL statements executed'
-default_version = '1.9'
+default_version = '1.10'
 module_pathname = '$libdir/pg_stat_statements'
 relocatable = true
diff --git a/contrib/pg_stat_statements/sql/pg_stat_statements.sql b/contrib/pg_stat_statements/sql/pg_stat_statements.sql
index 6f58d9d0f6..56d8526ccf 100644
--- a/contrib/pg_stat_statements/sql/pg_stat_statements.sql
+++ b/contrib/pg_stat_statements/sql/pg_stat_statements.sql
@@ -364,4 +364,25 @@ SELECT query, plans, calls, rows FROM pg_stat_statements ORDER BY query COLLATE
 SELECT pg_stat_statements_reset();
 SELECT dealloc FROM pg_stat_statements_info;
 
+--
+-- top level handling
+--
+SET pg_stat_statements.track = 'top';
+DELETE FROM test;
+DO $$
+BEGIN
+    DELETE FROM test;
+END;
+$$ LANGUAGE plpgsql;
+SELECT query, toplevel, plans, calls FROM pg_stat_statements WHERE query LIKE '%DELETE%' ORDER BY query COLLATE "C", toplevel;
+
+SET pg_stat_statements.track = 'all';
+DELETE FROM test;
+DO $$
+BEGIN
+    DELETE FROM test;
+END;
+$$ LANGUAGE plpgsql;
+SELECT query, toplevel, plans, calls FROM pg_stat_statements WHERE query LIKE '%DELETE%' ORDER BY query COLLATE "C", toplevel;
+
 DROP EXTENSION pg_stat_statements;
diff --git a/doc/src/sgml/pgstatstatements.sgml b/doc/src/sgml/pgstatstatements.sgml
index 81915ea69b..1ad47ce97c 100644
--- a/doc/src/sgml/pgstatstatements.sgml
+++ b/doc/src/sgml/pgstatstatements.sgml
@@ -79,6 +79,15 @@
       </para></entry>
      </row>
 
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>toplevel</structfield> <type>bool</type>
+      </para>
+      <para>
+       True if the query was executed as a top level statement
+      </para></entry>
+     </row>
+
      <row>
       <entry role="catalog_table_entry"><para role="column_definition">
        <structfield>queryid</structfield> <type>bigint</type>
-- 
2.20.1

In reply to: Julien Rouhaud (#14)
Re: pg_stat_statements oddity with track = all

Hello

Seems we need also change PGSS_FILE_HEADER.

regards, Sergei

#16Julien Rouhaud
rjuju123@gmail.com
In reply to: Sergei Kornilov (#15)
1 attachment(s)
Re: pg_stat_statements oddity with track = all

On Fri, Dec 04, 2020 at 12:06:10PM +0300, Sergei Kornilov wrote:

Hello

Seems we need also change PGSS_FILE_HEADER.

Indeed, thanks! v2 attached.

Attachments:

v2-0001-Add-a-bool-toplevel-column-to-pg_stat_statements.patchtext/plain; charset=us-asciiDownload
From 1da24926d9645ee997aabd2907482a29332e3729 Mon Sep 17 00:00:00 2001
From: Julien Rouhaud <julien.rouhaud@free.fr>
Date: Fri, 4 Dec 2020 13:33:51 +0800
Subject: [PATCH v2] Add a bool toplevel column to pg_stat_statements.

As top level statements include resource consumption for underlying statements,
it's not possible to compute the total resource consumption accurately.  Fix
that by adding a new toplevel boolean field that indicates whether the counters
were cumulated for queries executed at top level or not.

It can lead to more entries being stored for the same workload if
pg_stat_statements.track is set to all, but it seems unlikely to have the same
queries being executed both at top level and as nested statements.
---
 contrib/pg_stat_statements/Makefile           |  3 +-
 .../expected/pg_stat_statements.out           | 40 +++++++++++++
 .../pg_stat_statements--1.9--1.10.sql         | 57 +++++++++++++++++++
 .../pg_stat_statements/pg_stat_statements.c   | 44 +++++++++++---
 .../pg_stat_statements.control                |  2 +-
 .../sql/pg_stat_statements.sql                | 21 +++++++
 doc/src/sgml/pgstatstatements.sgml            |  9 +++
 7 files changed, 166 insertions(+), 10 deletions(-)
 create mode 100644 contrib/pg_stat_statements/pg_stat_statements--1.9--1.10.sql

diff --git a/contrib/pg_stat_statements/Makefile b/contrib/pg_stat_statements/Makefile
index 3ec627b956..cab4f626ad 100644
--- a/contrib/pg_stat_statements/Makefile
+++ b/contrib/pg_stat_statements/Makefile
@@ -6,7 +6,8 @@ OBJS = \
 	pg_stat_statements.o
 
 EXTENSION = pg_stat_statements
-DATA = pg_stat_statements--1.4.sql pg_stat_statements--1.8--1.9.sql \
+DATA = pg_stat_statements--1.4.sql \
+        pg_stat_statements--1.9--1.10.sql pg_stat_statements--1.8--1.9.sql \
 	pg_stat_statements--1.7--1.8.sql pg_stat_statements--1.6--1.7.sql \
 	pg_stat_statements--1.5--1.6.sql pg_stat_statements--1.4--1.5.sql \
 	pg_stat_statements--1.3--1.4.sql pg_stat_statements--1.2--1.3.sql \
diff --git a/contrib/pg_stat_statements/expected/pg_stat_statements.out b/contrib/pg_stat_statements/expected/pg_stat_statements.out
index 16158525ca..fb97f68737 100644
--- a/contrib/pg_stat_statements/expected/pg_stat_statements.out
+++ b/contrib/pg_stat_statements/expected/pg_stat_statements.out
@@ -876,4 +876,44 @@ SELECT dealloc FROM pg_stat_statements_info;
        0
 (1 row)
 
+--
+-- top level handling
+--
+SET pg_stat_statements.track = 'top';
+DELETE FROM test;
+DO $$
+BEGIN
+    DELETE FROM test;
+END;
+$$ LANGUAGE plpgsql;
+SELECT query, toplevel, plans, calls FROM pg_stat_statements WHERE query LIKE '%DELETE%' ORDER BY query COLLATE "C", toplevel;
+         query         | toplevel | plans | calls 
+-----------------------+----------+-------+-------
+ DELETE FROM test      | t        |     1 |     1
+ DO $$                +| t        |     0 |     1
+ BEGIN                +|          |       | 
+     DELETE FROM test;+|          |       | 
+ END;                 +|          |       | 
+ $$ LANGUAGE plpgsql   |          |       | 
+(2 rows)
+
+SET pg_stat_statements.track = 'all';
+DELETE FROM test;
+DO $$
+BEGIN
+    DELETE FROM test;
+END;
+$$ LANGUAGE plpgsql;
+SELECT query, toplevel, plans, calls FROM pg_stat_statements WHERE query LIKE '%DELETE%' ORDER BY query COLLATE "C", toplevel;
+         query         | toplevel | plans | calls 
+-----------------------+----------+-------+-------
+ DELETE FROM test      | f        |     1 |     1
+ DELETE FROM test      | t        |     2 |     2
+ DO $$                +| t        |     0 |     2
+ BEGIN                +|          |       | 
+     DELETE FROM test;+|          |       | 
+ END;                 +|          |       | 
+ $$ LANGUAGE plpgsql   |          |       | 
+(3 rows)
+
 DROP EXTENSION pg_stat_statements;
diff --git a/contrib/pg_stat_statements/pg_stat_statements--1.9--1.10.sql b/contrib/pg_stat_statements/pg_stat_statements--1.9--1.10.sql
new file mode 100644
index 0000000000..f97d16497d
--- /dev/null
+++ b/contrib/pg_stat_statements/pg_stat_statements--1.9--1.10.sql
@@ -0,0 +1,57 @@
+/* contrib/pg_stat_statements/pg_stat_statements--1.9--1.10.sql */
+
+-- complain if script is sourced in psql, rather than via ALTER EXTENSION
+\echo Use "ALTER EXTENSION pg_stat_statements UPDATE TO '1.10'" to load this file. \quit
+
+/* First we have to remove them from the extension */
+ALTER EXTENSION pg_stat_statements DROP VIEW pg_stat_statements;
+ALTER EXTENSION pg_stat_statements DROP FUNCTION pg_stat_statements(boolean);
+
+/* Then we can drop them */
+DROP VIEW pg_stat_statements;
+DROP FUNCTION pg_stat_statements(boolean);
+
+/* Now redefine */
+CREATE FUNCTION pg_stat_statements(IN showtext boolean,
+    OUT userid oid,
+    OUT dbid oid,
+    OUT toplevel bool,
+    OUT queryid bigint,
+    OUT query text,
+    OUT plans int8,
+    OUT total_plan_time float8,
+    OUT min_plan_time float8,
+    OUT max_plan_time float8,
+    OUT mean_plan_time float8,
+    OUT stddev_plan_time float8,
+    OUT calls int8,
+    OUT total_exec_time float8,
+    OUT min_exec_time float8,
+    OUT max_exec_time float8,
+    OUT mean_exec_time float8,
+    OUT stddev_exec_time float8,
+    OUT rows int8,
+    OUT shared_blks_hit int8,
+    OUT shared_blks_read int8,
+    OUT shared_blks_dirtied int8,
+    OUT shared_blks_written int8,
+    OUT local_blks_hit int8,
+    OUT local_blks_read int8,
+    OUT local_blks_dirtied int8,
+    OUT local_blks_written int8,
+    OUT temp_blks_read int8,
+    OUT temp_blks_written int8,
+    OUT blk_read_time float8,
+    OUT blk_write_time float8,
+    OUT wal_records int8,
+    OUT wal_fpi int8,
+    OUT wal_bytes numeric
+)
+RETURNS SETOF record
+AS 'MODULE_PATHNAME', 'pg_stat_statements_1_10'
+LANGUAGE C STRICT VOLATILE PARALLEL SAFE;
+
+CREATE VIEW pg_stat_statements AS
+  SELECT * FROM pg_stat_statements(true);
+
+GRANT SELECT ON pg_stat_statements TO PUBLIC;
diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index 70cfdb2c9d..52d5c455b3 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -98,7 +98,7 @@ PG_MODULE_MAGIC;
 #define PGSS_TEXT_FILE	PG_STAT_TMP_DIR "/pgss_query_texts.stat"
 
 /* Magic number identifying the stats file format */
-static const uint32 PGSS_FILE_HEADER = 0x20201126;
+static const uint32 PGSS_FILE_HEADER = 0x20201204;
 
 /* PostgreSQL major version number, changes in which invalidate all entries */
 static const uint32 PGSS_PG_MAJOR_VERSION = PG_VERSION_NUM / 100;
@@ -124,7 +124,8 @@ typedef enum pgssVersion
 	PGSS_V1_1,
 	PGSS_V1_2,
 	PGSS_V1_3,
-	PGSS_V1_8
+	PGSS_V1_8,
+	PGSS_V1_10
 } pgssVersion;
 
 typedef enum pgssStoreKind
@@ -145,17 +146,13 @@ typedef enum pgssStoreKind
 /*
  * Hashtable key that defines the identity of a hashtable entry.  We separate
  * queries by user and by database even if they are otherwise identical.
- *
- * Right now, this structure contains no padding.  If you add any, make sure
- * to teach pgss_store() to zero the padding bytes.  Otherwise, things will
- * break, because pgss_hash is created using HASH_BLOBS, and thus tag_hash
- * is used to hash this.
  */
 typedef struct pgssHashKey
 {
 	Oid			userid;			/* user OID */
 	Oid			dbid;			/* database OID */
 	uint64		queryid;		/* query identifier */
+	bool		toplevel;		/* query executed at top level */
 } pgssHashKey;
 
 /*
@@ -335,6 +332,7 @@ PG_FUNCTION_INFO_V1(pg_stat_statements_reset_1_7);
 PG_FUNCTION_INFO_V1(pg_stat_statements_1_2);
 PG_FUNCTION_INFO_V1(pg_stat_statements_1_3);
 PG_FUNCTION_INFO_V1(pg_stat_statements_1_8);
+PG_FUNCTION_INFO_V1(pg_stat_statements_1_10);
 PG_FUNCTION_INFO_V1(pg_stat_statements);
 PG_FUNCTION_INFO_V1(pg_stat_statements_info);
 
@@ -1327,9 +1325,11 @@ pgss_store(const char *query, uint64 queryId,
 	}
 
 	/* Set up key for hashtable search */
+	memset(&key, 0, sizeof(pgssHashKey));
 	key.userid = GetUserId();
 	key.dbid = MyDatabaseId;
 	key.queryid = queryId;
+	key.toplevel = (exec_nested_level == 0);
 
 	/* Lookup the hash table entry with shared lock. */
 	LWLockAcquire(pgss->lock, LW_SHARED);
@@ -1509,7 +1509,8 @@ pg_stat_statements_reset(PG_FUNCTION_ARGS)
 #define PG_STAT_STATEMENTS_COLS_V1_2	19
 #define PG_STAT_STATEMENTS_COLS_V1_3	23
 #define PG_STAT_STATEMENTS_COLS_V1_8	32
-#define PG_STAT_STATEMENTS_COLS			32	/* maximum of above */
+#define PG_STAT_STATEMENTS_COLS_V1_10	33
+#define PG_STAT_STATEMENTS_COLS			33	/* maximum of above */
 
 /*
  * Retrieve statement statistics.
@@ -1521,6 +1522,16 @@ pg_stat_statements_reset(PG_FUNCTION_ARGS)
  * expected API version is identified by embedding it in the C name of the
  * function.  Unfortunately we weren't bright enough to do that for 1.1.
  */
+Datum
+pg_stat_statements_1_10(PG_FUNCTION_ARGS)
+{
+	bool		showtext = PG_GETARG_BOOL(0);
+
+	pg_stat_statements_internal(fcinfo, PGSS_V1_10, showtext);
+
+	return (Datum) 0;
+}
+
 Datum
 pg_stat_statements_1_8(PG_FUNCTION_ARGS)
 {
@@ -1640,6 +1651,10 @@ pg_stat_statements_internal(FunctionCallInfo fcinfo,
 			if (api_version != PGSS_V1_8)
 				elog(ERROR, "incorrect number of output arguments");
 			break;
+		case PG_STAT_STATEMENTS_COLS_V1_10:
+			if (api_version != PGSS_V1_10)
+				elog(ERROR, "incorrect number of output arguments");
+			break;
 		default:
 			elog(ERROR, "incorrect number of output arguments");
 	}
@@ -1731,6 +1746,8 @@ pg_stat_statements_internal(FunctionCallInfo fcinfo,
 
 		values[i++] = ObjectIdGetDatum(entry->key.userid);
 		values[i++] = ObjectIdGetDatum(entry->key.dbid);
+		if (api_version >= PGSS_V1_10)
+			values[i++] = BoolGetDatum(entry->key.toplevel);
 
 		if (is_allowed_role || entry->key.userid == userid)
 		{
@@ -1868,6 +1885,7 @@ pg_stat_statements_internal(FunctionCallInfo fcinfo,
 					 api_version == PGSS_V1_2 ? PG_STAT_STATEMENTS_COLS_V1_2 :
 					 api_version == PGSS_V1_3 ? PG_STAT_STATEMENTS_COLS_V1_3 :
 					 api_version == PGSS_V1_8 ? PG_STAT_STATEMENTS_COLS_V1_8 :
+					 api_version == PGSS_V1_10 ? PG_STAT_STATEMENTS_COLS_V1_10 :
 					 -1 /* fail if you forget to update this assert */ ));
 
 		tuplestore_putvalues(tupstore, tupdesc, values, nulls);
@@ -2519,9 +2537,19 @@ entry_reset(Oid userid, Oid dbid, uint64 queryid)
 	if (userid != 0 && dbid != 0 && queryid != UINT64CONST(0))
 	{
 		/* If all the parameters are available, use the fast path. */
+		memset(&key, 0, sizeof(pgssHashKey));
 		key.userid = userid;
 		key.dbid = dbid;
 		key.queryid = queryid;
+		key.toplevel = false;
+
+		/* Remove the key if exists */
+		entry = (pgssEntry *) hash_search(pgss_hash, &key, HASH_REMOVE, NULL);
+		if (entry)				/* found */
+			num_remove++;
+
+		/* Also remove entries for top level statements */
+		key.toplevel = true;
 
 		/* Remove the key if exists */
 		entry = (pgssEntry *) hash_search(pgss_hash, &key, HASH_REMOVE, NULL);
diff --git a/contrib/pg_stat_statements/pg_stat_statements.control b/contrib/pg_stat_statements/pg_stat_statements.control
index 2f1ce6ed50..0747e48138 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.control
+++ b/contrib/pg_stat_statements/pg_stat_statements.control
@@ -1,5 +1,5 @@
 # pg_stat_statements extension
 comment = 'track planning and execution statistics of all SQL statements executed'
-default_version = '1.9'
+default_version = '1.10'
 module_pathname = '$libdir/pg_stat_statements'
 relocatable = true
diff --git a/contrib/pg_stat_statements/sql/pg_stat_statements.sql b/contrib/pg_stat_statements/sql/pg_stat_statements.sql
index 6f58d9d0f6..56d8526ccf 100644
--- a/contrib/pg_stat_statements/sql/pg_stat_statements.sql
+++ b/contrib/pg_stat_statements/sql/pg_stat_statements.sql
@@ -364,4 +364,25 @@ SELECT query, plans, calls, rows FROM pg_stat_statements ORDER BY query COLLATE
 SELECT pg_stat_statements_reset();
 SELECT dealloc FROM pg_stat_statements_info;
 
+--
+-- top level handling
+--
+SET pg_stat_statements.track = 'top';
+DELETE FROM test;
+DO $$
+BEGIN
+    DELETE FROM test;
+END;
+$$ LANGUAGE plpgsql;
+SELECT query, toplevel, plans, calls FROM pg_stat_statements WHERE query LIKE '%DELETE%' ORDER BY query COLLATE "C", toplevel;
+
+SET pg_stat_statements.track = 'all';
+DELETE FROM test;
+DO $$
+BEGIN
+    DELETE FROM test;
+END;
+$$ LANGUAGE plpgsql;
+SELECT query, toplevel, plans, calls FROM pg_stat_statements WHERE query LIKE '%DELETE%' ORDER BY query COLLATE "C", toplevel;
+
 DROP EXTENSION pg_stat_statements;
diff --git a/doc/src/sgml/pgstatstatements.sgml b/doc/src/sgml/pgstatstatements.sgml
index 81915ea69b..1ad47ce97c 100644
--- a/doc/src/sgml/pgstatstatements.sgml
+++ b/doc/src/sgml/pgstatstatements.sgml
@@ -79,6 +79,15 @@
       </para></entry>
      </row>
 
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>toplevel</structfield> <type>bool</type>
+      </para>
+      <para>
+       True if the query was executed as a top level statement
+      </para></entry>
+     </row>
+
      <row>
       <entry role="catalog_table_entry"><para role="column_definition">
        <structfield>queryid</structfield> <type>bigint</type>
-- 
2.20.1

#17Julien Rouhaud
rjuju123@gmail.com
In reply to: Julien Rouhaud (#16)
1 attachment(s)
Re: pg_stat_statements oddity with track = all

On Fri, Dec 04, 2020 at 06:09:13PM +0800, Julien Rouhaud wrote:

On Fri, Dec 04, 2020 at 12:06:10PM +0300, Sergei Kornilov wrote:

Hello

Seems we need also change PGSS_FILE_HEADER.

Indeed, thanks! v2 attached.

There was a conflict on PGSS_FILE_HEADER since some recent commit, v3 attached.

Attachments:

v3-0001-Add-a-bool-toplevel-column-to-pg_stat_statements.patchtext/plain; charset=us-asciiDownload
From 832b1a81cfba8a38c6b58b0a9212a6a95fc231a8 Mon Sep 17 00:00:00 2001
From: Julien Rouhaud <julien.rouhaud@free.fr>
Date: Fri, 4 Dec 2020 13:33:51 +0800
Subject: [PATCH v3] Add a bool toplevel column to pg_stat_statements.

As top level statements include resource consumption for underlying statements,
it's not possible to compute the total resource consumption accurately.  Fix
that by adding a new toplevel boolean field that indicates whether the counters
were cumulated for queries executed at top level or not.

It can lead to more entries being stored for the same workload if
pg_stat_statements.track is set to all, but it seems unlikely to have the same
queries being executed both at top level and as nested statements.
---
 contrib/pg_stat_statements/Makefile           |  3 +-
 .../expected/pg_stat_statements.out           | 40 +++++++++++++
 .../pg_stat_statements--1.9--1.10.sql         | 57 +++++++++++++++++++
 .../pg_stat_statements/pg_stat_statements.c   | 44 +++++++++++---
 .../pg_stat_statements.control                |  2 +-
 .../sql/pg_stat_statements.sql                | 21 +++++++
 doc/src/sgml/pgstatstatements.sgml            |  9 +++
 7 files changed, 166 insertions(+), 10 deletions(-)
 create mode 100644 contrib/pg_stat_statements/pg_stat_statements--1.9--1.10.sql

diff --git a/contrib/pg_stat_statements/Makefile b/contrib/pg_stat_statements/Makefile
index 3ec627b956..cab4f626ad 100644
--- a/contrib/pg_stat_statements/Makefile
+++ b/contrib/pg_stat_statements/Makefile
@@ -6,7 +6,8 @@ OBJS = \
 	pg_stat_statements.o
 
 EXTENSION = pg_stat_statements
-DATA = pg_stat_statements--1.4.sql pg_stat_statements--1.8--1.9.sql \
+DATA = pg_stat_statements--1.4.sql \
+        pg_stat_statements--1.9--1.10.sql pg_stat_statements--1.8--1.9.sql \
 	pg_stat_statements--1.7--1.8.sql pg_stat_statements--1.6--1.7.sql \
 	pg_stat_statements--1.5--1.6.sql pg_stat_statements--1.4--1.5.sql \
 	pg_stat_statements--1.3--1.4.sql pg_stat_statements--1.2--1.3.sql \
diff --git a/contrib/pg_stat_statements/expected/pg_stat_statements.out b/contrib/pg_stat_statements/expected/pg_stat_statements.out
index 16158525ca..fb97f68737 100644
--- a/contrib/pg_stat_statements/expected/pg_stat_statements.out
+++ b/contrib/pg_stat_statements/expected/pg_stat_statements.out
@@ -876,4 +876,44 @@ SELECT dealloc FROM pg_stat_statements_info;
        0
 (1 row)
 
+--
+-- top level handling
+--
+SET pg_stat_statements.track = 'top';
+DELETE FROM test;
+DO $$
+BEGIN
+    DELETE FROM test;
+END;
+$$ LANGUAGE plpgsql;
+SELECT query, toplevel, plans, calls FROM pg_stat_statements WHERE query LIKE '%DELETE%' ORDER BY query COLLATE "C", toplevel;
+         query         | toplevel | plans | calls 
+-----------------------+----------+-------+-------
+ DELETE FROM test      | t        |     1 |     1
+ DO $$                +| t        |     0 |     1
+ BEGIN                +|          |       | 
+     DELETE FROM test;+|          |       | 
+ END;                 +|          |       | 
+ $$ LANGUAGE plpgsql   |          |       | 
+(2 rows)
+
+SET pg_stat_statements.track = 'all';
+DELETE FROM test;
+DO $$
+BEGIN
+    DELETE FROM test;
+END;
+$$ LANGUAGE plpgsql;
+SELECT query, toplevel, plans, calls FROM pg_stat_statements WHERE query LIKE '%DELETE%' ORDER BY query COLLATE "C", toplevel;
+         query         | toplevel | plans | calls 
+-----------------------+----------+-------+-------
+ DELETE FROM test      | f        |     1 |     1
+ DELETE FROM test      | t        |     2 |     2
+ DO $$                +| t        |     0 |     2
+ BEGIN                +|          |       | 
+     DELETE FROM test;+|          |       | 
+ END;                 +|          |       | 
+ $$ LANGUAGE plpgsql   |          |       | 
+(3 rows)
+
 DROP EXTENSION pg_stat_statements;
diff --git a/contrib/pg_stat_statements/pg_stat_statements--1.9--1.10.sql b/contrib/pg_stat_statements/pg_stat_statements--1.9--1.10.sql
new file mode 100644
index 0000000000..f97d16497d
--- /dev/null
+++ b/contrib/pg_stat_statements/pg_stat_statements--1.9--1.10.sql
@@ -0,0 +1,57 @@
+/* contrib/pg_stat_statements/pg_stat_statements--1.9--1.10.sql */
+
+-- complain if script is sourced in psql, rather than via ALTER EXTENSION
+\echo Use "ALTER EXTENSION pg_stat_statements UPDATE TO '1.10'" to load this file. \quit
+
+/* First we have to remove them from the extension */
+ALTER EXTENSION pg_stat_statements DROP VIEW pg_stat_statements;
+ALTER EXTENSION pg_stat_statements DROP FUNCTION pg_stat_statements(boolean);
+
+/* Then we can drop them */
+DROP VIEW pg_stat_statements;
+DROP FUNCTION pg_stat_statements(boolean);
+
+/* Now redefine */
+CREATE FUNCTION pg_stat_statements(IN showtext boolean,
+    OUT userid oid,
+    OUT dbid oid,
+    OUT toplevel bool,
+    OUT queryid bigint,
+    OUT query text,
+    OUT plans int8,
+    OUT total_plan_time float8,
+    OUT min_plan_time float8,
+    OUT max_plan_time float8,
+    OUT mean_plan_time float8,
+    OUT stddev_plan_time float8,
+    OUT calls int8,
+    OUT total_exec_time float8,
+    OUT min_exec_time float8,
+    OUT max_exec_time float8,
+    OUT mean_exec_time float8,
+    OUT stddev_exec_time float8,
+    OUT rows int8,
+    OUT shared_blks_hit int8,
+    OUT shared_blks_read int8,
+    OUT shared_blks_dirtied int8,
+    OUT shared_blks_written int8,
+    OUT local_blks_hit int8,
+    OUT local_blks_read int8,
+    OUT local_blks_dirtied int8,
+    OUT local_blks_written int8,
+    OUT temp_blks_read int8,
+    OUT temp_blks_written int8,
+    OUT blk_read_time float8,
+    OUT blk_write_time float8,
+    OUT wal_records int8,
+    OUT wal_fpi int8,
+    OUT wal_bytes numeric
+)
+RETURNS SETOF record
+AS 'MODULE_PATHNAME', 'pg_stat_statements_1_10'
+LANGUAGE C STRICT VOLATILE PARALLEL SAFE;
+
+CREATE VIEW pg_stat_statements AS
+  SELECT * FROM pg_stat_statements(true);
+
+GRANT SELECT ON pg_stat_statements TO PUBLIC;
diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index 196e1e2142..41ae98176c 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -99,7 +99,7 @@ PG_MODULE_MAGIC;
 #define PGSS_TEXT_FILE	PG_STAT_TMP_DIR "/pgss_query_texts.stat"
 
 /* Magic number identifying the stats file format */
-static const uint32 PGSS_FILE_HEADER = 0x20201218;
+static const uint32 PGSS_FILE_HEADER = 0x20201227;
 
 /* PostgreSQL major version number, changes in which invalidate all entries */
 static const uint32 PGSS_PG_MAJOR_VERSION = PG_VERSION_NUM / 100;
@@ -125,7 +125,8 @@ typedef enum pgssVersion
 	PGSS_V1_1,
 	PGSS_V1_2,
 	PGSS_V1_3,
-	PGSS_V1_8
+	PGSS_V1_8,
+	PGSS_V1_10
 } pgssVersion;
 
 typedef enum pgssStoreKind
@@ -146,17 +147,13 @@ typedef enum pgssStoreKind
 /*
  * Hashtable key that defines the identity of a hashtable entry.  We separate
  * queries by user and by database even if they are otherwise identical.
- *
- * Right now, this structure contains no padding.  If you add any, make sure
- * to teach pgss_store() to zero the padding bytes.  Otherwise, things will
- * break, because pgss_hash is created using HASH_BLOBS, and thus tag_hash
- * is used to hash this.
  */
 typedef struct pgssHashKey
 {
 	Oid			userid;			/* user OID */
 	Oid			dbid;			/* database OID */
 	uint64		queryid;		/* query identifier */
+	bool		toplevel;		/* query executed at top level */
 } pgssHashKey;
 
 /*
@@ -337,6 +334,7 @@ PG_FUNCTION_INFO_V1(pg_stat_statements_reset_1_7);
 PG_FUNCTION_INFO_V1(pg_stat_statements_1_2);
 PG_FUNCTION_INFO_V1(pg_stat_statements_1_3);
 PG_FUNCTION_INFO_V1(pg_stat_statements_1_8);
+PG_FUNCTION_INFO_V1(pg_stat_statements_1_10);
 PG_FUNCTION_INFO_V1(pg_stat_statements);
 PG_FUNCTION_INFO_V1(pg_stat_statements_info);
 
@@ -1329,9 +1327,11 @@ pgss_store(const char *query, uint64 queryId,
 	}
 
 	/* Set up key for hashtable search */
+	memset(&key, 0, sizeof(pgssHashKey));
 	key.userid = GetUserId();
 	key.dbid = MyDatabaseId;
 	key.queryid = queryId;
+	key.toplevel = (exec_nested_level == 0);
 
 	/* Lookup the hash table entry with shared lock. */
 	LWLockAcquire(pgss->lock, LW_SHARED);
@@ -1511,7 +1511,8 @@ pg_stat_statements_reset(PG_FUNCTION_ARGS)
 #define PG_STAT_STATEMENTS_COLS_V1_2	19
 #define PG_STAT_STATEMENTS_COLS_V1_3	23
 #define PG_STAT_STATEMENTS_COLS_V1_8	32
-#define PG_STAT_STATEMENTS_COLS			32	/* maximum of above */
+#define PG_STAT_STATEMENTS_COLS_V1_10	33
+#define PG_STAT_STATEMENTS_COLS			33	/* maximum of above */
 
 /*
  * Retrieve statement statistics.
@@ -1523,6 +1524,16 @@ pg_stat_statements_reset(PG_FUNCTION_ARGS)
  * expected API version is identified by embedding it in the C name of the
  * function.  Unfortunately we weren't bright enough to do that for 1.1.
  */
+Datum
+pg_stat_statements_1_10(PG_FUNCTION_ARGS)
+{
+	bool		showtext = PG_GETARG_BOOL(0);
+
+	pg_stat_statements_internal(fcinfo, PGSS_V1_10, showtext);
+
+	return (Datum) 0;
+}
+
 Datum
 pg_stat_statements_1_8(PG_FUNCTION_ARGS)
 {
@@ -1642,6 +1653,10 @@ pg_stat_statements_internal(FunctionCallInfo fcinfo,
 			if (api_version != PGSS_V1_8)
 				elog(ERROR, "incorrect number of output arguments");
 			break;
+		case PG_STAT_STATEMENTS_COLS_V1_10:
+			if (api_version != PGSS_V1_10)
+				elog(ERROR, "incorrect number of output arguments");
+			break;
 		default:
 			elog(ERROR, "incorrect number of output arguments");
 	}
@@ -1733,6 +1748,8 @@ pg_stat_statements_internal(FunctionCallInfo fcinfo,
 
 		values[i++] = ObjectIdGetDatum(entry->key.userid);
 		values[i++] = ObjectIdGetDatum(entry->key.dbid);
+		if (api_version >= PGSS_V1_10)
+			values[i++] = BoolGetDatum(entry->key.toplevel);
 
 		if (is_allowed_role || entry->key.userid == userid)
 		{
@@ -1870,6 +1887,7 @@ pg_stat_statements_internal(FunctionCallInfo fcinfo,
 					 api_version == PGSS_V1_2 ? PG_STAT_STATEMENTS_COLS_V1_2 :
 					 api_version == PGSS_V1_3 ? PG_STAT_STATEMENTS_COLS_V1_3 :
 					 api_version == PGSS_V1_8 ? PG_STAT_STATEMENTS_COLS_V1_8 :
+					 api_version == PGSS_V1_10 ? PG_STAT_STATEMENTS_COLS_V1_10 :
 					 -1 /* fail if you forget to update this assert */ ));
 
 		tuplestore_putvalues(tupstore, tupdesc, values, nulls);
@@ -2537,9 +2555,19 @@ entry_reset(Oid userid, Oid dbid, uint64 queryid)
 	if (userid != 0 && dbid != 0 && queryid != UINT64CONST(0))
 	{
 		/* If all the parameters are available, use the fast path. */
+		memset(&key, 0, sizeof(pgssHashKey));
 		key.userid = userid;
 		key.dbid = dbid;
 		key.queryid = queryid;
+		key.toplevel = false;
+
+		/* Remove the key if exists */
+		entry = (pgssEntry *) hash_search(pgss_hash, &key, HASH_REMOVE, NULL);
+		if (entry)				/* found */
+			num_remove++;
+
+		/* Also remove entries for top level statements */
+		key.toplevel = true;
 
 		/* Remove the key if exists */
 		entry = (pgssEntry *) hash_search(pgss_hash, &key, HASH_REMOVE, NULL);
diff --git a/contrib/pg_stat_statements/pg_stat_statements.control b/contrib/pg_stat_statements/pg_stat_statements.control
index 2f1ce6ed50..0747e48138 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.control
+++ b/contrib/pg_stat_statements/pg_stat_statements.control
@@ -1,5 +1,5 @@
 # pg_stat_statements extension
 comment = 'track planning and execution statistics of all SQL statements executed'
-default_version = '1.9'
+default_version = '1.10'
 module_pathname = '$libdir/pg_stat_statements'
 relocatable = true
diff --git a/contrib/pg_stat_statements/sql/pg_stat_statements.sql b/contrib/pg_stat_statements/sql/pg_stat_statements.sql
index 6f58d9d0f6..56d8526ccf 100644
--- a/contrib/pg_stat_statements/sql/pg_stat_statements.sql
+++ b/contrib/pg_stat_statements/sql/pg_stat_statements.sql
@@ -364,4 +364,25 @@ SELECT query, plans, calls, rows FROM pg_stat_statements ORDER BY query COLLATE
 SELECT pg_stat_statements_reset();
 SELECT dealloc FROM pg_stat_statements_info;
 
+--
+-- top level handling
+--
+SET pg_stat_statements.track = 'top';
+DELETE FROM test;
+DO $$
+BEGIN
+    DELETE FROM test;
+END;
+$$ LANGUAGE plpgsql;
+SELECT query, toplevel, plans, calls FROM pg_stat_statements WHERE query LIKE '%DELETE%' ORDER BY query COLLATE "C", toplevel;
+
+SET pg_stat_statements.track = 'all';
+DELETE FROM test;
+DO $$
+BEGIN
+    DELETE FROM test;
+END;
+$$ LANGUAGE plpgsql;
+SELECT query, toplevel, plans, calls FROM pg_stat_statements WHERE query LIKE '%DELETE%' ORDER BY query COLLATE "C", toplevel;
+
 DROP EXTENSION pg_stat_statements;
diff --git a/doc/src/sgml/pgstatstatements.sgml b/doc/src/sgml/pgstatstatements.sgml
index 464bf0e5ae..cb3bbdf050 100644
--- a/doc/src/sgml/pgstatstatements.sgml
+++ b/doc/src/sgml/pgstatstatements.sgml
@@ -79,6 +79,15 @@
       </para></entry>
      </row>
 
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>toplevel</structfield> <type>bool</type>
+      </para>
+      <para>
+       True if the query was executed as a top level statement
+      </para></entry>
+     </row>
+
      <row>
       <entry role="catalog_table_entry"><para role="column_definition">
        <structfield>queryid</structfield> <type>bigint</type>
-- 
2.20.1

#18Masahiro Ikeda
ikedamsh@oss.nttdata.com
In reply to: Julien Rouhaud (#17)
Re: pg_stat_statements oddity with track = all

Hi,

Thanks for making the patch to add "toplevel" column in
pg_stat_statements.
This is a review comment.

There hasn't been any discussion, at least that I've been able to find.
So, +1 to change the status to "Ready for Committer".

1. submission/feature review
=============================

This patch can be applied cleanly to the current master branch(ed4367).
I tested with `make check-world` and I checked there is no fail.

This patch has reasonable documents and tests.
A "toplevel" column of pg_stat_statements view is documented and
following tests are added.
- pg_stat_statements.track option : 'top' and 'all'
- query type: normal query and nested query(pl/pgsql)

I tested the "update" command can work.
postgres=# ALTER EXTENSION pg_stat_statements UPDATE TO '1.10';

Although the "toplevel" column of all queries which already stored is
'false',
we have to decide the default. I think 'false' is ok.

2. usability review
====================

This patch solves the problem we can't get to know
which query is top-level if pg_stat_statements.track = 'all'.

This leads that we can analyze with aggregated queries.

There is some use-case.
For example, we can know the sum of total_exec_time of queries
even if nested queries are executed.

We can know how efficiently a database can use CPU resource for queries
using the sum of the total_exec_time, and the exec_user_time and
exec_system_time in pg_stat_kcache which is the extension gathering
os resources.

Although one concern is whether only top-level is enough or not,
I couldn't come up with any use-case to use nested level, so I think
it's ok.

3. coding review
=================

Although I had two concerns, I think they are no problem.
So, this patch looks good to me.

Following were my concerns.

A. the risk of too many same queries is duplicate.

Since this patch adds a "top" member in the hash key,
it leads to store duplicated same query which "top" is false and true.

This concern is already discussed and I agreed to the consensus
it seems unlikely to have the same queries being executed both
at the top level and as nested statements.

B. add a argument of the pg_stat_statements_reset().

Now, pg_stat_statements supports a flexible reset feature.

pg_stat_statements_reset(userid Oid, dbid Oid, queryid bigint)

Although I wondered whether we need to add a top-level flag to the
arguments,
I couldn't come up with any use-case to reset only top-level queries or
not top-level queries.

4. others
==========

These comments are not related to this patch.

A. about the topic of commitfests

Since I think this feature is for monitoring,
it's better to change the topic from "System Administration"
to "Monitoring & Control".

B. check logic whether a query is top-level or not in pg_stat_kcache

This patch uses the only exec_nested_level to check whether a query is
top-level or not.
The reason is nested_level is less useful and I agree.

But, pg_stat_kcache uses plan_nested_level too.
Although the check result is the same, it's better to change it
corresponding to this patch after it's committed.

Regards,
--
Masahiro Ikeda
NTT DATA CORPORATION

#19Julien Rouhaud
rjuju123@gmail.com
In reply to: Masahiro Ikeda (#18)
Re: pg_stat_statements oddity with track = all

Hi,

On Tue, Jan 19, 2021 at 4:55 PM Masahiro Ikeda <ikedamsh@oss.nttdata.com> wrote:

Thanks for making the patch to add "toplevel" column in
pg_stat_statements.
This is a review comment.

Thanks a lot for the thorough review!

I tested the "update" command can work.
postgres=# ALTER EXTENSION pg_stat_statements UPDATE TO '1.10';

Although the "toplevel" column of all queries which already stored is
'false',
we have to decide the default. I think 'false' is ok.

Mmm, I'm not sure that I understand this result. The "toplevel" value
is tracked by the C code loaded at startup, so it should have a
correct value even if you used to have the extension in a previous
version, it's just that you can't access the toplevel field before
doing the ALTER EXTENSION pg_stat_statements UPDATE. There's also a
change in the magic number, so you can't use the stored stat file from
a version before this patch.

I also fail to reproduce this behavior. I did the following:

- start postgres with pg_stat_statements v1.10 (so with this patch) in
shared_preload_libraries
- CREATE EXTENSION pg_stat_statements VERSION '1.9';
- execute a few queries
- ALTER EXTENSION pg_stat_statements UPDATE;
- SELECT * FROM pg_stat_statements reports the query with toplvel to TRUE

Can you share a way to reproduce your findings?

2. usability review
====================
[...]
Although one concern is whether only top-level is enough or not,
I couldn't come up with any use-case to use nested level, so I think
it's ok.

I agree, I don't see how tracking statistics per nesting level would
really help. The only additional use case I see would tracking
triggers/FK query execution, but the nesting level won't help with
that.

3. coding review
=================
[...]
B. add a argument of the pg_stat_statements_reset().

Now, pg_stat_statements supports a flexible reset feature.

pg_stat_statements_reset(userid Oid, dbid Oid, queryid bigint)

Although I wondered whether we need to add a top-level flag to the
arguments,
I couldn't come up with any use-case to reset only top-level queries or
not top-level queries.

Ah, I didn't think of the reset function. I also fail to see a
reasonable use case to provide a switch for the reset function.

4. others
==========

These comments are not related to this patch.

A. about the topic of commitfests

Since I think this feature is for monitoring,
it's better to change the topic from "System Administration"
to "Monitoring & Control".

I agree, thanks for the change!

B. check logic whether a query is top-level or not in pg_stat_kcache

This patch uses the only exec_nested_level to check whether a query is
top-level or not.
The reason is nested_level is less useful and I agree.

Do you mean plan_nest_level is less useful?

But, pg_stat_kcache uses plan_nested_level too.
Although the check result is the same, it's better to change it
corresponding to this patch after it's committed.

I agree, we should be consistent here. I'll take care of the needed
changes if and when this patch is commited!

#20Masahiro Ikeda
ikedamsh@oss.nttdata.com
In reply to: Julien Rouhaud (#19)
Re: pg_stat_statements oddity with track = all

On 2021-01-20 18:14, Julien Rouhaud wrote:

On Tue, Jan 19, 2021 at 4:55 PM Masahiro Ikeda
<ikedamsh@oss.nttdata.com> wrote:

I tested the "update" command can work.
postgres=# ALTER EXTENSION pg_stat_statements UPDATE TO '1.10';

Although the "toplevel" column of all queries which already stored is
'false',
we have to decide the default. I think 'false' is ok.

Mmm, I'm not sure that I understand this result. The "toplevel" value
is tracked by the C code loaded at startup, so it should have a
correct value even if you used to have the extension in a previous
version, it's just that you can't access the toplevel field before
doing the ALTER EXTENSION pg_stat_statements UPDATE. There's also a
change in the magic number, so you can't use the stored stat file from
a version before this patch.

I also fail to reproduce this behavior. I did the following:

- start postgres with pg_stat_statements v1.10 (so with this patch) in
shared_preload_libraries
- CREATE EXTENSION pg_stat_statements VERSION '1.9';
- execute a few queries
- ALTER EXTENSION pg_stat_statements UPDATE;
- SELECT * FROM pg_stat_statements reports the query with toplvel to
TRUE

Can you share a way to reproduce your findings?

Sorry for making you confused. I can't reproduce although I tried now.
I think my procedure was something wrong. So, please ignore this
comment, sorry about that.

B. check logic whether a query is top-level or not in pg_stat_kcache

This patch uses the only exec_nested_level to check whether a query is
top-level or not.
The reason is nested_level is less useful and I agree.

Do you mean plan_nest_level is less useful?

I think so. Anyway, it's important to correspond core's implementation
because pg_stat_statements and pg_stat_kcache are used at the same time.

Regards,
--
Masahiro Ikeda
NTT DATA CORPORATION

#21Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Julien Rouhaud (#19)
Re: pg_stat_statements oddity with track = all

On Wed, Jan 20, 2021 at 6:15 PM Julien Rouhaud <rjuju123@gmail.com> wrote:

Hi,

On Tue, Jan 19, 2021 at 4:55 PM Masahiro Ikeda <ikedamsh@oss.nttdata.com> wrote:

Thanks for making the patch to add "toplevel" column in
pg_stat_statements.
This is a review comment.

Thanks a lot for the thorough review!

I tested the "update" command can work.
postgres=# ALTER EXTENSION pg_stat_statements UPDATE TO '1.10';

Although the "toplevel" column of all queries which already stored is
'false',
we have to decide the default. I think 'false' is ok.

Mmm, I'm not sure that I understand this result. The "toplevel" value
is tracked by the C code loaded at startup, so it should have a
correct value even if you used to have the extension in a previous
version, it's just that you can't access the toplevel field before
doing the ALTER EXTENSION pg_stat_statements UPDATE. There's also a
change in the magic number, so you can't use the stored stat file from
a version before this patch.

I also fail to reproduce this behavior. I did the following:

- start postgres with pg_stat_statements v1.10 (so with this patch) in
shared_preload_libraries
- CREATE EXTENSION pg_stat_statements VERSION '1.9';
- execute a few queries
- ALTER EXTENSION pg_stat_statements UPDATE;
- SELECT * FROM pg_stat_statements reports the query with toplvel to TRUE

Can you share a way to reproduce your findings?

2. usability review
====================
[...]
Although one concern is whether only top-level is enough or not,
I couldn't come up with any use-case to use nested level, so I think
it's ok.

I agree, I don't see how tracking statistics per nesting level would
really help. The only additional use case I see would tracking
triggers/FK query execution, but the nesting level won't help with
that.

3. coding review
=================
[...]
B. add a argument of the pg_stat_statements_reset().

Now, pg_stat_statements supports a flexible reset feature.

pg_stat_statements_reset(userid Oid, dbid Oid, queryid bigint)

Although I wondered whether we need to add a top-level flag to the
arguments,
I couldn't come up with any use-case to reset only top-level queries or
not top-level queries.

Ah, I didn't think of the reset function. I also fail to see a
reasonable use case to provide a switch for the reset function.

4. others
==========

These comments are not related to this patch.

A. about the topic of commitfests

Since I think this feature is for monitoring,
it's better to change the topic from "System Administration"
to "Monitoring & Control".

I agree, thanks for the change!

I've changed the topic accordingly.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

#22Magnus Hagander
magnus@hagander.net
In reply to: Julien Rouhaud (#17)
Re: pg_stat_statements oddity with track = all

On Sun, Dec 27, 2020 at 9:39 AM Julien Rouhaud <rjuju123@gmail.com> wrote:

On Fri, Dec 04, 2020 at 06:09:13PM +0800, Julien Rouhaud wrote:

On Fri, Dec 04, 2020 at 12:06:10PM +0300, Sergei Kornilov wrote:

Hello

Seems we need also change PGSS_FILE_HEADER.

Indeed, thanks! v2 attached.

There was a conflict on PGSS_FILE_HEADER since some recent commit, v3 attached.

- *
- * Right now, this structure contains no padding. If you add any, make sure
- * to teach pgss_store() to zero the padding bytes. Otherwise, things will
- * break, because pgss_hash is created using HASH_BLOBS, and thus tag_hash
- * is used to hash this.

I don't think removing this comment completely is a good idea. Right
now it ends up there, yes, but eventually it might reach the same
state again. I think it's better to reword it based on the current
situation while keeping the part about it having to be zeroed for
padding. And maybe along with a comment at the actual memset() sites
as well?

AFAICT, it's going to store two copies of the query in the query text
file (pgss_query_texts.stat)? Can we find a way around that? Maybe by
looking up the entry with the flag set to the other value, and then
reusing that?

--
Magnus Hagander
Me: https://www.hagander.net/
Work: https://www.redpill-linpro.com/

#23Julien Rouhaud
rjuju123@gmail.com
In reply to: Masahiko Sawada (#21)
Re: pg_stat_statements oddity with track = all

On Thu, Jan 21, 2021 at 12:43:22AM +0900, Masahiko Sawada wrote:

On Wed, Jan 20, 2021 at 6:15 PM Julien Rouhaud <rjuju123@gmail.com> wrote:

I agree, thanks for the change!

I've changed the topic accordingly.

Thanks Sawada-san! I thought that I took care of that but I somehow missed it.

#24Julien Rouhaud
rjuju123@gmail.com
In reply to: Magnus Hagander (#22)
Re: pg_stat_statements oddity with track = all

On Sat, Mar 06, 2021 at 06:56:49PM +0100, Magnus Hagander wrote:

On Sun, Dec 27, 2020 at 9:39 AM Julien Rouhaud <rjuju123@gmail.com> wrote:

- *
- * Right now, this structure contains no padding. If you add any, make sure
- * to teach pgss_store() to zero the padding bytes. Otherwise, things will
- * break, because pgss_hash is created using HASH_BLOBS, and thus tag_hash
- * is used to hash this.

I don't think removing this comment completely is a good idea. Right
now it ends up there, yes, but eventually it might reach the same
state again. I think it's better to reword it based on the current
situation while keeping the part about it having to be zeroed for
padding. And maybe along with a comment at the actual memset() sites
as well?

Agreed, I'll take care of that.

AFAICT, it's going to store two copies of the query in the query text
file (pgss_query_texts.stat)? Can we find a way around that? Maybe by
looking up the entry with the flag set to the other value, and then
reusing that?

Yes I was a bit worried about the possible extra text entry. I kept things
simple until now as the general opinion was that entries existing for both top
level and nested level should be very rare so adding extra code and overhead to
spare a few query texts wouldn't be worth it.

I think that using a flag might be a bit expensive, as we would have to make
sure that at least one of the possible two entries has it. So if there are 2
entries and the one with the flag is evicted, we would have to transfer the
flag to the other one, and check the existence of the flag when allocatin a new
entry. And all of that has to be done holding an exclusive lock on pgss->lock.

Maybe having a new hash table (without the toplevel flag) for those query text
might be better, or maybe pgss performance is already so terrible when you have
to regularly evict entries that it wouldn't make any real difference.

Should I try to add some extra code to make sure that we only store the query
text once, or should I document that there might be duplicate, but we expect
that to be very rare?

#25Magnus Hagander
magnus@hagander.net
In reply to: Julien Rouhaud (#24)
Re: pg_stat_statements oddity with track = all

On Sun, Mar 7, 2021 at 8:39 AM Julien Rouhaud <rjuju123@gmail.com> wrote:

On Sat, Mar 06, 2021 at 06:56:49PM +0100, Magnus Hagander wrote:

On Sun, Dec 27, 2020 at 9:39 AM Julien Rouhaud <rjuju123@gmail.com> wrote:

- *
- * Right now, this structure contains no padding. If you add any, make sure
- * to teach pgss_store() to zero the padding bytes. Otherwise, things will
- * break, because pgss_hash is created using HASH_BLOBS, and thus tag_hash
- * is used to hash this.

I don't think removing this comment completely is a good idea. Right
now it ends up there, yes, but eventually it might reach the same
state again. I think it's better to reword it based on the current
situation while keeping the part about it having to be zeroed for
padding. And maybe along with a comment at the actual memset() sites
as well?

Agreed, I'll take care of that.

AFAICT, it's going to store two copies of the query in the query text
file (pgss_query_texts.stat)? Can we find a way around that? Maybe by
looking up the entry with the flag set to the other value, and then
reusing that?

Yes I was a bit worried about the possible extra text entry. I kept things
simple until now as the general opinion was that entries existing for both top
level and nested level should be very rare so adding extra code and overhead to
spare a few query texts wouldn't be worth it.

I think that using a flag might be a bit expensive, as we would have to make
sure that at least one of the possible two entries has it. So if there are 2
entries and the one with the flag is evicted, we would have to transfer the
flag to the other one, and check the existence of the flag when allocatin a new
entry. And all of that has to be done holding an exclusive lock on pgss->lock.

Yeah, we'd certainly want to minimize things. But what if they both
have the flag at that point? Then we wouldn't have to care on
eviction? But yes, for new allications we'd have to look up if the
query existed with the other value of the flag, and copy it over in
that case.

Maybe having a new hash table (without the toplevel flag) for those query text
might be better, or maybe pgss performance is already so terrible when you have
to regularly evict entries that it wouldn't make any real difference.

Should I try to add some extra code to make sure that we only store the query
text once, or should I document that there might be duplicate, but we expect
that to be very rare?

If we expect it to be rare, I think it might be reasonable to just
document that. But do we really have a strong argument for it being
rare?

--
Magnus Hagander
Me: https://www.hagander.net/
Work: https://www.redpill-linpro.com/

#26Julien Rouhaud
rjuju123@gmail.com
In reply to: Magnus Hagander (#25)
Re: pg_stat_statements oddity with track = all

On Mon, Mar 08, 2021 at 06:03:59PM +0100, Magnus Hagander wrote:

On Sun, Mar 7, 2021 at 8:39 AM Julien Rouhaud <rjuju123@gmail.com> wrote:

Yes I was a bit worried about the possible extra text entry. I kept things
simple until now as the general opinion was that entries existing for both top
level and nested level should be very rare so adding extra code and overhead to
spare a few query texts wouldn't be worth it.

I think that using a flag might be a bit expensive, as we would have to make
sure that at least one of the possible two entries has it. So if there are 2
entries and the one with the flag is evicted, we would have to transfer the
flag to the other one, and check the existence of the flag when allocatin a new
entry. And all of that has to be done holding an exclusive lock on pgss->lock.

Yeah, we'd certainly want to minimize things. But what if they both
have the flag at that point? Then we wouldn't have to care on
eviction? But yes, for new allications we'd have to look up if the
query existed with the other value of the flag, and copy it over in
that case.

I think that we might be able to handle that without a flag. The only thing
that would need to be done is when creating an entry, look for an existing
entry with the opposite flag, and if there's simply use the same
(query_offset, query_len) info. This doesn't sound that expensive.

The real pain point will be that the garbage collection phase
will become way more expensive as it will now have to somehow maintain that
knowledge, which will require additional lookups for each entry. I'm a bit
concerned about that, especially with the current heuristic to schedule garbage
collection. For now, need_qc_qtext says that we have to do it if the extent is
more than 512 (B) * pgss_max. This probably doesn't work well for people using
ORM as they tend to generate gigantic SQL queries.

If we implement query text deduplication, should we add another GUC for that
"512" magic value so that people can minimize the gc overhead if they know they
have gigantic queries, or simply don't mind bigger qtext file?

Maybe having a new hash table (without the toplevel flag) for those query text
might be better, or maybe pgss performance is already so terrible when you have
to regularly evict entries that it wouldn't make any real difference.

Should I try to add some extra code to make sure that we only store the query
text once, or should I document that there might be duplicate, but we expect
that to be very rare?

If we expect it to be rare, I think it might be reasonable to just
document that. But do we really have a strong argument for it being
rare?

I don't that think that anyone really had a strong argument, mostly gut
feeling. Note that pg_stat_kcache already implemented that toplevel flags, so
if people are using that extension in a recent version they might have some
figures to show. I'll ping some people that I know are using it.

One good argument would be that gigantic queries generated by ORM should always
be executed as top level statements.

I previously tried with the postgres regression tests, which clearly isn't a
representative workload, and as far as I can see the vast majority of queries
executed bost as top level and nested level are DDL implying recursion (e.g. a
CREATE TABLE with underlying index creation).

#27Magnus Hagander
magnus@hagander.net
In reply to: Julien Rouhaud (#26)
Re: pg_stat_statements oddity with track = all

On Tue, Mar 9, 2021 at 3:39 AM Julien Rouhaud <rjuju123@gmail.com> wrote:

On Mon, Mar 08, 2021 at 06:03:59PM +0100, Magnus Hagander wrote:

On Sun, Mar 7, 2021 at 8:39 AM Julien Rouhaud <rjuju123@gmail.com> wrote:

Yes I was a bit worried about the possible extra text entry. I kept things
simple until now as the general opinion was that entries existing for both top
level and nested level should be very rare so adding extra code and overhead to
spare a few query texts wouldn't be worth it.

I think that using a flag might be a bit expensive, as we would have to make
sure that at least one of the possible two entries has it. So if there are 2
entries and the one with the flag is evicted, we would have to transfer the
flag to the other one, and check the existence of the flag when allocatin a new
entry. And all of that has to be done holding an exclusive lock on pgss->lock.

Yeah, we'd certainly want to minimize things. But what if they both
have the flag at that point? Then we wouldn't have to care on
eviction? But yes, for new allications we'd have to look up if the
query existed with the other value of the flag, and copy it over in
that case.

I think that we might be able to handle that without a flag. The only thing
that would need to be done is when creating an entry, look for an existing
entry with the opposite flag, and if there's simply use the same
(query_offset, query_len) info. This doesn't sound that expensive.

That's basically what I was trying to say :)

The real pain point will be that the garbage collection phase
will become way more expensive as it will now have to somehow maintain that
knowledge, which will require additional lookups for each entry. I'm a bit
concerned about that, especially with the current heuristic to schedule garbage
collection. For now, need_qc_qtext says that we have to do it if the extent is
more than 512 (B) * pgss_max. This probably doesn't work well for people using
ORM as they tend to generate gigantic SQL queries.

Right, the cost would be mostly on the GC side. I've never done any
profiling to see how big of a thing that is in systems today -- have
you?

If we implement query text deduplication, should we add another GUC for that
"512" magic value so that people can minimize the gc overhead if they know they
have gigantic queries, or simply don't mind bigger qtext file?

Maybe having a new hash table (without the toplevel flag) for those query text
might be better, or maybe pgss performance is already so terrible when you have
to regularly evict entries that it wouldn't make any real difference.

Should I try to add some extra code to make sure that we only store the query
text once, or should I document that there might be duplicate, but we expect
that to be very rare?

If we expect it to be rare, I think it might be reasonable to just
document that. But do we really have a strong argument for it being
rare?

I don't that think that anyone really had a strong argument, mostly gut
feeling. Note that pg_stat_kcache already implemented that toplevel flags, so
if people are using that extension in a recent version they might have some
figures to show. I'll ping some people that I know are using it.

Great -- data always wins over gut feelings :)

One good argument would be that gigantic queries generated by ORM should always
be executed as top level statements.

Yes, that's true. And it probably holds as a more generic case as
well, that is the queries that are likely to show up both top-level
and lower-level are more likely to be relatively simple ones. (Except
for example during the development of functions/procs where they're
often executed top level as well to test etc, but that's not the most
important case to optimize for)

I previously tried with the postgres regression tests, which clearly isn't a
representative workload, and as far as I can see the vast majority of queries
executed bost as top level and nested level are DDL implying recursion (e.g. a
CREATE TABLE with underlying index creation).

I think the PostgreSQL regression tests are so far from a real world
workload that the input in this case has a value of exactly zero.

--
Magnus Hagander
Me: https://www.hagander.net/
Work: https://www.redpill-linpro.com/

#28Julien Rouhaud
rjuju123@gmail.com
In reply to: Magnus Hagander (#27)
Re: pg_stat_statements oddity with track = all

On Tue, Mar 16, 2021 at 12:55:45PM +0100, Magnus Hagander wrote:

On Tue, Mar 9, 2021 at 3:39 AM Julien Rouhaud <rjuju123@gmail.com> wrote:

I think that we might be able to handle that without a flag. The only thing
that would need to be done is when creating an entry, look for an existing
entry with the opposite flag, and if there's simply use the same
(query_offset, query_len) info. This doesn't sound that expensive.

That's basically what I was trying to say :)

Oh ok sorry :)

The real pain point will be that the garbage collection phase
will become way more expensive as it will now have to somehow maintain that
knowledge, which will require additional lookups for each entry. I'm a bit
concerned about that, especially with the current heuristic to schedule garbage
collection. For now, need_qc_qtext says that we have to do it if the extent is
more than 512 (B) * pgss_max. This probably doesn't work well for people using
ORM as they tend to generate gigantic SQL queries.

Right, the cost would be mostly on the GC side. I've never done any
profiling to see how big of a thing that is in systems today -- have
you?

I didn't, but I don't see how it could be anything but ridiculously impacting.
it's basically preventing any query from being planned or executed on the whole
instance the time needed to read the previous qtext file, and write all entries
still needed.

I don't that think that anyone really had a strong argument, mostly gut
feeling. Note that pg_stat_kcache already implemented that toplevel flags, so
if people are using that extension in a recent version they might have some
figures to show. I'll ping some people that I know are using it.

Great -- data always wins over gut feelings :)

So I asked some friends that have latest pg_stat_kcache installed on some
preproduction environment configured to track nested queries. There isn't a
high throughput but the activity should still be representative of the
production queries. There are a lot of applications plugged there, around 20
databases and quite a lot of PL code.

After a few days, here are the statistics:

- total of ~ 9500 entries
- ~ 900 entries for nested statements
- ~ 35 entries existing for both top level and nested statements

So the duplicates account for less than 4% of the nested statements, and less
than 0.5% of the whole entries.

I wish I had more reports, but if this one is representative enough then it
seems that trying to avoid storing duplicated queries wouldn't be worth it.

One good argument would be that gigantic queries generated by ORM should always
be executed as top level statements.

Yes, that's true. And it probably holds as a more generic case as
well, that is the queries that are likely to show up both top-level
and lower-level are more likely to be relatively simple ones. (Except
for example during the development of functions/procs where they're
often executed top level as well to test etc, but that's not the most
important case to optimize for)

Agreed.

I previously tried with the postgres regression tests, which clearly isn't a
representative workload, and as far as I can see the vast majority of queries
executed bost as top level and nested level are DDL implying recursion (e.g. a
CREATE TABLE with underlying index creation).

I think the PostgreSQL regression tests are so far from a real world
workload that the input in this case has a value of exactly zero.

I totally agree, but that's the only one I had at that time :) Still it wasn't
entirely useless as I didn't realize before that that some DDL would lead to
duplicated entries.

#29Magnus Hagander
magnus@hagander.net
In reply to: Julien Rouhaud (#28)
Re: pg_stat_statements oddity with track = all

On Tue, Mar 16, 2021 at 4:34 PM Julien Rouhaud <rjuju123@gmail.com> wrote:

On Tue, Mar 16, 2021 at 12:55:45PM +0100, Magnus Hagander wrote:

On Tue, Mar 9, 2021 at 3:39 AM Julien Rouhaud <rjuju123@gmail.com> wrote:

I think that we might be able to handle that without a flag. The only thing
that would need to be done is when creating an entry, look for an existing
entry with the opposite flag, and if there's simply use the same
(query_offset, query_len) info. This doesn't sound that expensive.

That's basically what I was trying to say :)

Oh ok sorry :)

The real pain point will be that the garbage collection phase
will become way more expensive as it will now have to somehow maintain that
knowledge, which will require additional lookups for each entry. I'm a bit
concerned about that, especially with the current heuristic to schedule garbage
collection. For now, need_qc_qtext says that we have to do it if the extent is
more than 512 (B) * pgss_max. This probably doesn't work well for people using
ORM as they tend to generate gigantic SQL queries.

Right, the cost would be mostly on the GC side. I've never done any
profiling to see how big of a thing that is in systems today -- have
you?

I didn't, but I don't see how it could be anything but ridiculously impacting.
it's basically preventing any query from being planned or executed on the whole
instance the time needed to read the previous qtext file, and write all entries
still needed.

I don't that think that anyone really had a strong argument, mostly gut
feeling. Note that pg_stat_kcache already implemented that toplevel flags, so
if people are using that extension in a recent version they might have some
figures to show. I'll ping some people that I know are using it.

Great -- data always wins over gut feelings :)

So I asked some friends that have latest pg_stat_kcache installed on some
preproduction environment configured to track nested queries. There isn't a
high throughput but the activity should still be representative of the
production queries. There are a lot of applications plugged there, around 20
databases and quite a lot of PL code.

After a few days, here are the statistics:

- total of ~ 9500 entries
- ~ 900 entries for nested statements
- ~ 35 entries existing for both top level and nested statements

So the duplicates account for less than 4% of the nested statements, and less
than 0.5% of the whole entries.

I wish I had more reports, but if this one is representative enough then it
seems that trying to avoid storing duplicated queries wouldn't be worth it.

I agree. If those numbers are indeed representable, it seems like
better to pay that overhead than to pay the overhead of trying to
de-dupe it.

Let's hope they are :)

Looking through ti again my feeling said the toplevel column should go
after the queryid and not before, but I'm not going to open up a
bikeshed over that.

I've added in a comment to cover that one that you removed (if you did
send an updated patch as you said, then I missed it -- sorry), and
applied the rest.

Thanks!

--
Magnus Hagander
Me: https://www.hagander.net/
Work: https://www.redpill-linpro.com/

#30Julien Rouhaud
rjuju123@gmail.com
In reply to: Magnus Hagander (#29)
1 attachment(s)
Re: pg_stat_statements oddity with track = all

On Thu, Apr 08, 2021 at 10:30:53AM +0200, Magnus Hagander wrote:

I agree. If those numbers are indeed representable, it seems like
better to pay that overhead than to pay the overhead of trying to
de-dupe it.

Let's hope they are :)

:)

Looking through ti again my feeling said the toplevel column should go
after the queryid and not before, but I'm not going to open up a
bikeshed over that.

I've added in a comment to cover that one that you removed (if you did
send an updated patch as you said, then I missed it -- sorry), and
applied the rest.

Oops, somehow I totally forgot to send the new patch, sorry :(

While looking at the patch, I unfortunately just realize that I unnecessarily
bumped the version to 1.10, as 1.9 was already new as of pg14. Honestly I have
no idea why I used 1.10 at that time. Version numbers are not a scarce
resource but maybe it would be better to keep 1.10 for a future major postgres
version?

If yes, PFA a patch to merge 1.10 in 1.9.

Attachments:

v1-0001-Don-t-bump-pg_stat_statements-to-1.10-in-REL_14_S.patchtext/x-diff; charset=us-asciiDownload
From b2b3102fa16d2b02d1838cf7853d0869dbb966cc Mon Sep 17 00:00:00 2001
From: Julien Rouhaud <julien.rouhaud@free.fr>
Date: Thu, 8 Apr 2021 19:52:33 +0800
Subject: [PATCH v1] Don't bump pg_stat_statements to 1.10 in REL_14_STABLE.

---
 contrib/pg_stat_statements/Makefile           |  3 +-
 .../pg_stat_statements--1.8--1.9.sql          | 53 +++++++++++++++++
 .../pg_stat_statements--1.9--1.10.sql         | 57 -------------------
 .../pg_stat_statements/pg_stat_statements.c   | 18 +++---
 .../pg_stat_statements.control                |  2 +-
 5 files changed, 64 insertions(+), 69 deletions(-)
 delete mode 100644 contrib/pg_stat_statements/pg_stat_statements--1.9--1.10.sql

diff --git a/contrib/pg_stat_statements/Makefile b/contrib/pg_stat_statements/Makefile
index cab4f626ad..3ec627b956 100644
--- a/contrib/pg_stat_statements/Makefile
+++ b/contrib/pg_stat_statements/Makefile
@@ -6,8 +6,7 @@ OBJS = \
 	pg_stat_statements.o
 
 EXTENSION = pg_stat_statements
-DATA = pg_stat_statements--1.4.sql \
-        pg_stat_statements--1.9--1.10.sql pg_stat_statements--1.8--1.9.sql \
+DATA = pg_stat_statements--1.4.sql pg_stat_statements--1.8--1.9.sql \
 	pg_stat_statements--1.7--1.8.sql pg_stat_statements--1.6--1.7.sql \
 	pg_stat_statements--1.5--1.6.sql pg_stat_statements--1.4--1.5.sql \
 	pg_stat_statements--1.3--1.4.sql pg_stat_statements--1.2--1.3.sql \
diff --git a/contrib/pg_stat_statements/pg_stat_statements--1.8--1.9.sql b/contrib/pg_stat_statements/pg_stat_statements--1.8--1.9.sql
index 3504ca7eb1..c45223f888 100644
--- a/contrib/pg_stat_statements/pg_stat_statements--1.8--1.9.sql
+++ b/contrib/pg_stat_statements/pg_stat_statements--1.8--1.9.sql
@@ -16,3 +16,56 @@ CREATE VIEW pg_stat_statements_info AS
   SELECT * FROM pg_stat_statements_info();
 
 GRANT SELECT ON pg_stat_statements_info TO PUBLIC;
+
+/* First we have to remove them from the extension */
+ALTER EXTENSION pg_stat_statements DROP VIEW pg_stat_statements;
+ALTER EXTENSION pg_stat_statements DROP FUNCTION pg_stat_statements(boolean);
+
+/* Then we can drop them */
+DROP VIEW pg_stat_statements;
+DROP FUNCTION pg_stat_statements(boolean);
+
+/* Now redefine */
+CREATE FUNCTION pg_stat_statements(IN showtext boolean,
+    OUT userid oid,
+    OUT dbid oid,
+    OUT toplevel bool,
+    OUT queryid bigint,
+    OUT query text,
+    OUT plans int8,
+    OUT total_plan_time float8,
+    OUT min_plan_time float8,
+    OUT max_plan_time float8,
+    OUT mean_plan_time float8,
+    OUT stddev_plan_time float8,
+    OUT calls int8,
+    OUT total_exec_time float8,
+    OUT min_exec_time float8,
+    OUT max_exec_time float8,
+    OUT mean_exec_time float8,
+    OUT stddev_exec_time float8,
+    OUT rows int8,
+    OUT shared_blks_hit int8,
+    OUT shared_blks_read int8,
+    OUT shared_blks_dirtied int8,
+    OUT shared_blks_written int8,
+    OUT local_blks_hit int8,
+    OUT local_blks_read int8,
+    OUT local_blks_dirtied int8,
+    OUT local_blks_written int8,
+    OUT temp_blks_read int8,
+    OUT temp_blks_written int8,
+    OUT blk_read_time float8,
+    OUT blk_write_time float8,
+    OUT wal_records int8,
+    OUT wal_fpi int8,
+    OUT wal_bytes numeric
+)
+RETURNS SETOF record
+AS 'MODULE_PATHNAME', 'pg_stat_statements_1_9'
+LANGUAGE C STRICT VOLATILE PARALLEL SAFE;
+
+CREATE VIEW pg_stat_statements AS
+  SELECT * FROM pg_stat_statements(true);
+
+GRANT SELECT ON pg_stat_statements TO PUBLIC;
diff --git a/contrib/pg_stat_statements/pg_stat_statements--1.9--1.10.sql b/contrib/pg_stat_statements/pg_stat_statements--1.9--1.10.sql
deleted file mode 100644
index f97d16497d..0000000000
--- a/contrib/pg_stat_statements/pg_stat_statements--1.9--1.10.sql
+++ /dev/null
@@ -1,57 +0,0 @@
-/* contrib/pg_stat_statements/pg_stat_statements--1.9--1.10.sql */
-
--- complain if script is sourced in psql, rather than via ALTER EXTENSION
-\echo Use "ALTER EXTENSION pg_stat_statements UPDATE TO '1.10'" to load this file. \quit
-
-/* First we have to remove them from the extension */
-ALTER EXTENSION pg_stat_statements DROP VIEW pg_stat_statements;
-ALTER EXTENSION pg_stat_statements DROP FUNCTION pg_stat_statements(boolean);
-
-/* Then we can drop them */
-DROP VIEW pg_stat_statements;
-DROP FUNCTION pg_stat_statements(boolean);
-
-/* Now redefine */
-CREATE FUNCTION pg_stat_statements(IN showtext boolean,
-    OUT userid oid,
-    OUT dbid oid,
-    OUT toplevel bool,
-    OUT queryid bigint,
-    OUT query text,
-    OUT plans int8,
-    OUT total_plan_time float8,
-    OUT min_plan_time float8,
-    OUT max_plan_time float8,
-    OUT mean_plan_time float8,
-    OUT stddev_plan_time float8,
-    OUT calls int8,
-    OUT total_exec_time float8,
-    OUT min_exec_time float8,
-    OUT max_exec_time float8,
-    OUT mean_exec_time float8,
-    OUT stddev_exec_time float8,
-    OUT rows int8,
-    OUT shared_blks_hit int8,
-    OUT shared_blks_read int8,
-    OUT shared_blks_dirtied int8,
-    OUT shared_blks_written int8,
-    OUT local_blks_hit int8,
-    OUT local_blks_read int8,
-    OUT local_blks_dirtied int8,
-    OUT local_blks_written int8,
-    OUT temp_blks_read int8,
-    OUT temp_blks_written int8,
-    OUT blk_read_time float8,
-    OUT blk_write_time float8,
-    OUT wal_records int8,
-    OUT wal_fpi int8,
-    OUT wal_bytes numeric
-)
-RETURNS SETOF record
-AS 'MODULE_PATHNAME', 'pg_stat_statements_1_10'
-LANGUAGE C STRICT VOLATILE PARALLEL SAFE;
-
-CREATE VIEW pg_stat_statements AS
-  SELECT * FROM pg_stat_statements(true);
-
-GRANT SELECT ON pg_stat_statements TO PUBLIC;
diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index fc2677643b..24b453adcb 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -120,7 +120,7 @@ typedef enum pgssVersion
 	PGSS_V1_2,
 	PGSS_V1_3,
 	PGSS_V1_8,
-	PGSS_V1_10
+	PGSS_V1_9
 } pgssVersion;
 
 typedef enum pgssStoreKind
@@ -299,7 +299,7 @@ PG_FUNCTION_INFO_V1(pg_stat_statements_reset_1_7);
 PG_FUNCTION_INFO_V1(pg_stat_statements_1_2);
 PG_FUNCTION_INFO_V1(pg_stat_statements_1_3);
 PG_FUNCTION_INFO_V1(pg_stat_statements_1_8);
-PG_FUNCTION_INFO_V1(pg_stat_statements_1_10);
+PG_FUNCTION_INFO_V1(pg_stat_statements_1_9);
 PG_FUNCTION_INFO_V1(pg_stat_statements);
 PG_FUNCTION_INFO_V1(pg_stat_statements_info);
 
@@ -1414,7 +1414,7 @@ pg_stat_statements_reset(PG_FUNCTION_ARGS)
 #define PG_STAT_STATEMENTS_COLS_V1_2	19
 #define PG_STAT_STATEMENTS_COLS_V1_3	23
 #define PG_STAT_STATEMENTS_COLS_V1_8	32
-#define PG_STAT_STATEMENTS_COLS_V1_10	33
+#define PG_STAT_STATEMENTS_COLS_V1_9	33
 #define PG_STAT_STATEMENTS_COLS			33	/* maximum of above */
 
 /*
@@ -1428,11 +1428,11 @@ pg_stat_statements_reset(PG_FUNCTION_ARGS)
  * function.  Unfortunately we weren't bright enough to do that for 1.1.
  */
 Datum
-pg_stat_statements_1_10(PG_FUNCTION_ARGS)
+pg_stat_statements_1_9(PG_FUNCTION_ARGS)
 {
 	bool		showtext = PG_GETARG_BOOL(0);
 
-	pg_stat_statements_internal(fcinfo, PGSS_V1_10, showtext);
+	pg_stat_statements_internal(fcinfo, PGSS_V1_9, showtext);
 
 	return (Datum) 0;
 }
@@ -1556,8 +1556,8 @@ pg_stat_statements_internal(FunctionCallInfo fcinfo,
 			if (api_version != PGSS_V1_8)
 				elog(ERROR, "incorrect number of output arguments");
 			break;
-		case PG_STAT_STATEMENTS_COLS_V1_10:
-			if (api_version != PGSS_V1_10)
+		case PG_STAT_STATEMENTS_COLS_V1_9:
+			if (api_version != PGSS_V1_9)
 				elog(ERROR, "incorrect number of output arguments");
 			break;
 		default:
@@ -1651,7 +1651,7 @@ pg_stat_statements_internal(FunctionCallInfo fcinfo,
 
 		values[i++] = ObjectIdGetDatum(entry->key.userid);
 		values[i++] = ObjectIdGetDatum(entry->key.dbid);
-		if (api_version >= PGSS_V1_10)
+		if (api_version >= PGSS_V1_9)
 			values[i++] = BoolGetDatum(entry->key.toplevel);
 
 		if (is_allowed_role || entry->key.userid == userid)
@@ -1790,7 +1790,7 @@ pg_stat_statements_internal(FunctionCallInfo fcinfo,
 					 api_version == PGSS_V1_2 ? PG_STAT_STATEMENTS_COLS_V1_2 :
 					 api_version == PGSS_V1_3 ? PG_STAT_STATEMENTS_COLS_V1_3 :
 					 api_version == PGSS_V1_8 ? PG_STAT_STATEMENTS_COLS_V1_8 :
-					 api_version == PGSS_V1_10 ? PG_STAT_STATEMENTS_COLS_V1_10 :
+					 api_version == PGSS_V1_9 ? PG_STAT_STATEMENTS_COLS_V1_9 :
 					 -1 /* fail if you forget to update this assert */ ));
 
 		tuplestore_putvalues(tupstore, tupdesc, values, nulls);
diff --git a/contrib/pg_stat_statements/pg_stat_statements.control b/contrib/pg_stat_statements/pg_stat_statements.control
index 0747e48138..2f1ce6ed50 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.control
+++ b/contrib/pg_stat_statements/pg_stat_statements.control
@@ -1,5 +1,5 @@
 # pg_stat_statements extension
 comment = 'track planning and execution statistics of all SQL statements executed'
-default_version = '1.10'
+default_version = '1.9'
 module_pathname = '$libdir/pg_stat_statements'
 relocatable = true
-- 
2.30.1

#31Magnus Hagander
magnus@hagander.net
In reply to: Julien Rouhaud (#30)
Re: pg_stat_statements oddity with track = all

On Thu, Apr 8, 2021 at 2:04 PM Julien Rouhaud <rjuju123@gmail.com> wrote:

On Thu, Apr 08, 2021 at 10:30:53AM +0200, Magnus Hagander wrote:

I agree. If those numbers are indeed representable, it seems like
better to pay that overhead than to pay the overhead of trying to
de-dupe it.

Let's hope they are :)

:)

Looking through ti again my feeling said the toplevel column should go
after the queryid and not before, but I'm not going to open up a
bikeshed over that.

I've added in a comment to cover that one that you removed (if you did
send an updated patch as you said, then I missed it -- sorry), and
applied the rest.

Oops, somehow I totally forgot to send the new patch, sorry :(

While looking at the patch, I unfortunately just realize that I unnecessarily
bumped the version to 1.10, as 1.9 was already new as of pg14. Honestly I have
no idea why I used 1.10 at that time. Version numbers are not a scarce
resource but maybe it would be better to keep 1.10 for a future major postgres
version?

If yes, PFA a patch to merge 1.10 in 1.9.

I actually thought I looked at that, but clearly I was confused one
way or another.

I think you're right, it's cleaner to merge it into 1.9, so applied and pushed.

--
Magnus Hagander
Me: https://www.hagander.net/
Work: https://www.redpill-linpro.com/

#32Julien Rouhaud
rjuju123@gmail.com
In reply to: Magnus Hagander (#31)
Re: pg_stat_statements oddity with track = all

On Thu, Apr 08, 2021 at 03:18:09PM +0200, Magnus Hagander wrote:

On Thu, Apr 8, 2021 at 2:04 PM Julien Rouhaud <rjuju123@gmail.com> wrote:

If yes, PFA a patch to merge 1.10 in 1.9.

I actually thought I looked at that, but clearly I was confused one
way or another.

I think you're right, it's cleaner to merge it into 1.9, so applied and pushed.

Thanks Magnus!