pg_stat_statements: calls under-estimation propagation

Started by Daniel Farinaabout 13 years ago71 messages
#1Daniel Farina
drfarina@acm.org
1 attachment(s)

Hello,

After long delay (sorry) here's a patch implementing what was
hand-waved at in
http://archives.postgresql.org/pgsql-hackers/2012-10/msg00176.php

I am still something at a loss at how to test it besides prodding it
by hand; it seems like it's going to involve infrastructure or
introducing hooks into pg_stat_statements for the express purpose.

The patch can also be sourced from:

https://github.com/fdr/postgres.git error-prop-pg_stat_statements

Without further ado, the cover letter taken from the top of the patch:

This tries to establish a maximum under-estimate of the number of
calls for a given pg_stat_statements entry. That means the number of
calls to the canonical form of the query is between 'calls' and 'calls
+ calls_underest'.

This is useful to determine when accumulating statistics if a given
record is bouncing in and out of the pg_stat_statements table, having
its ncalls reset all the time, but also having calls_underest grow
very rapidly.

Records that always stay in pg_stat_statements will have a
calls_underest that do not change at all.

An interesting case is when a query that usually is called is not
called for a while, and falls out of pg_stat_statements. The result
can be that the query with the most 'calls' can also have more
uncertainty than the query with the second most calls, which is also
exactly the truth in reality.

Unceremoniously bundled into this patch is a reduction in the minimum
table size for pg_stat_statements, from 100 to 1. Using tiny values
is not likely to be seen in production, but makes testing the patch a
lot easier in some situations.

I will add this to the commitfest.

--
fdr

Attachments:

add-pg_stat_statements-calls-underestimation-v1.patchapplication/octet-stream; name=add-pg_stat_statements-calls-underestimation-v1.patchDownload
From e76af0c23888cf8232b59ce8cbb65f6008d544a4 Mon Sep 17 00:00:00 2001
From: Daniel Farina <daniel@fdr.io>
Date: Tue, 2 Oct 2012 23:06:51 -0700
Subject: [PATCH] Add pg_stat_statements calls underestimation propagation

This tries to establish a maximum under-estimate of the number of
calls for a given pg_stat_statements entry.  That means the number of
calls to the canonical form of the query is between 'calls' and 'calls
+ calls_underest'.

This is useful to determine when accumulating statistics if a given
record is bouncing in and out of the pg_stat_statements table, having
its ncalls reset all the time, but also having calls_underest grow
very rapidly.

Records that always stay in pg_stat_statements will have a
calls_underest that do not change at all.

An interesting case is when a query that usually is called is not
called for a while, and falls out of pg_stat_statements.  The result
can be that the query with the most 'calls' can also have more
uncertainty than the query with the second most calls, which is also
exactly the truth in reality.

Unceremoniously bundled into this patch is a reduction in the minimum
table size for pg_stat_statements, from 100 to 1.  Using tiny values
is not likely to be seen in production, but makes testing the patch a
lot easier in some situations.

Signed-off-by: Daniel Farina <daniel@fdr.io>
---
 contrib/pg_stat_statements/Makefile                |    1 +
 .../pg_stat_statements--1.1--1.2.sql               |   43 +++++++++++++++++++
 .../pg_stat_statements/pg_stat_statements--1.2.sql |   44 ++++++++++++++++++++
 contrib/pg_stat_statements/pg_stat_statements.c    |   37 +++++++++++++++-
 .../pg_stat_statements/pg_stat_statements.control  |    2 +-
 5 files changed, 124 insertions(+), 3 deletions(-)
 create mode 100644 contrib/pg_stat_statements/pg_stat_statements--1.1--1.2.sql
 create mode 100644 contrib/pg_stat_statements/pg_stat_statements--1.2.sql

diff --git a/contrib/pg_stat_statements/Makefile b/contrib/pg_stat_statements/Makefile
index e8aed61..8656c17 100644
*** a/contrib/pg_stat_statements/Makefile
--- b/contrib/pg_stat_statements/Makefile
***************
*** 5,10 **** OBJS = pg_stat_statements.o
--- 5,11 ----
  
  EXTENSION = pg_stat_statements
  DATA = pg_stat_statements--1.1.sql pg_stat_statements--1.0--1.1.sql \
+ 	pg_stat_statements--1.2.sql pg_stat_statements--1.1--1.2.sql \
  	pg_stat_statements--unpackaged--1.0.sql
  
  ifdef USE_PGXS
*** /dev/null
--- b/contrib/pg_stat_statements/pg_stat_statements--1.1--1.2.sql
***************
*** 0 ****
--- 1,43 ----
+ /* contrib/pg_stat_statements/pg_stat_statements--1.1--1.2.sql */
+ 
+ -- complain if script is sourced in psql, rather than via ALTER EXTENSION
+ \echo Use "ALTER EXTENSION pg_stat_statements UPDATE" to load this file. \quit
+ 
+ /* First we have to remove them from the extension */
+ ALTER EXTENSION pg_stat_statements DROP VIEW pg_stat_statements;
+ ALTER EXTENSION pg_stat_statements DROP FUNCTION pg_stat_statements();
+ 
+ /* Then we can drop them */
+ DROP VIEW pg_stat_statements;
+ DROP FUNCTION pg_stat_statements();
+ 
+ /* Now redefine */
+ CREATE FUNCTION pg_stat_statements(
+     OUT userid oid,
+     OUT dbid oid,
+     OUT query text,
+     OUT calls int8,
+     OUT calls_underest int8,
+     OUT total_time float8,
+     OUT rows int8,
+     OUT shared_blks_hit int8,
+     OUT shared_blks_read int8,
+     OUT shared_blks_dirtied int8,
+     OUT shared_blks_written int8,
+     OUT local_blks_hit int8,
+     OUT local_blks_read int8,
+     OUT local_blks_dirtied int8,
+     OUT local_blks_written int8,
+     OUT temp_blks_read int8,
+     OUT temp_blks_written int8,
+     OUT blk_read_time float8,
+     OUT blk_write_time float8
+ )
+ RETURNS SETOF record
+ AS 'MODULE_PATHNAME'
+ LANGUAGE C;
+ 
+ CREATE VIEW pg_stat_statements AS
+   SELECT * FROM pg_stat_statements();
+ 
+ GRANT SELECT ON pg_stat_statements TO PUBLIC;
*** /dev/null
--- b/contrib/pg_stat_statements/pg_stat_statements--1.2.sql
***************
*** 0 ****
--- 1,44 ----
+ /* contrib/pg_stat_statements/pg_stat_statements--1.2.sql */
+ 
+ -- complain if script is sourced in psql, rather than via CREATE EXTENSION
+ \echo Use "CREATE EXTENSION pg_stat_statements" to load this file. \quit
+ 
+ -- Register functions.
+ CREATE FUNCTION pg_stat_statements_reset()
+ RETURNS void
+ AS 'MODULE_PATHNAME'
+ LANGUAGE C;
+ 
+ CREATE FUNCTION pg_stat_statements(
+     OUT userid oid,
+     OUT dbid oid,
+     OUT query text,
+     OUT calls int8,
+     OUT calls_underest int8,
+     OUT total_time float8,
+     OUT rows int8,
+     OUT shared_blks_hit int8,
+     OUT shared_blks_read int8,
+     OUT shared_blks_dirtied int8,
+     OUT shared_blks_written int8,
+     OUT local_blks_hit int8,
+     OUT local_blks_read int8,
+     OUT local_blks_dirtied int8,
+     OUT local_blks_written int8,
+     OUT temp_blks_read int8,
+     OUT temp_blks_written int8,
+     OUT blk_read_time float8,
+     OUT blk_write_time float8
+ )
+ RETURNS SETOF record
+ AS 'MODULE_PATHNAME'
+ LANGUAGE C;
+ 
+ -- Register a view on the function for ease of use.
+ CREATE VIEW pg_stat_statements AS
+   SELECT * FROM pg_stat_statements();
+ 
+ GRANT SELECT ON pg_stat_statements TO PUBLIC;
+ 
+ -- Don't want this to be available to non-superusers.
+ REVOKE ALL ON FUNCTION pg_stat_statements_reset() FROM PUBLIC;
*** a/contrib/pg_stat_statements/pg_stat_statements.c
--- b/contrib/pg_stat_statements/pg_stat_statements.c
***************
*** 101,106 **** typedef struct pgssHashKey
--- 101,107 ----
  typedef struct Counters
  {
  	int64		calls;			/* # of times executed */
+ 	int64		calls_underest;	/* max underestimation of # of executions */
  	double		total_time;		/* total execution time, in msec */
  	int64		rows;			/* total # of retrieved or affected rows */
  	int64		shared_blks_hit;	/* # of shared buffer hits */
***************
*** 139,144 **** typedef struct pgssEntry
--- 140,154 ----
  typedef struct pgssSharedState
  {
  	LWLockId	lock;			/* protects hashtable search/modification */
+ 
+ 	/*
+ 	 * cache of maximum calls-counter underestimation in hashtab
+ 	 *
+ 	 * Only accessed and changed along with the hash table, so also protected
+ 	 * by 'lock'.
+ 	 */
+ 	int64		calls_max_underest;
+ 
  	int			query_size;		/* max query length in bytes */
  	double		cur_median_usage;		/* current median usage in hashtable */
  } pgssSharedState;
***************
*** 292,298 **** _PG_init(void)
  							NULL,
  							&pgss_max,
  							1000,
! 							100,
  							INT_MAX,
  							PGC_POSTMASTER,
  							0,
--- 302,308 ----
  							NULL,
  							&pgss_max,
  							1000,
! 							1,
  							INT_MAX,
  							PGC_POSTMASTER,
  							0,
***************
*** 1066,1072 **** pg_stat_statements_reset(PG_FUNCTION_ARGS)
  }
  
  #define PG_STAT_STATEMENTS_COLS_V1_0	14
! #define PG_STAT_STATEMENTS_COLS			18
  
  /*
   * Retrieve statement statistics.
--- 1076,1082 ----
  }
  
  #define PG_STAT_STATEMENTS_COLS_V1_0	14
! #define PG_STAT_STATEMENTS_COLS			19
  
  /*
   * Retrieve statement statistics.
***************
*** 1163,1168 **** pg_stat_statements(PG_FUNCTION_ARGS)
--- 1173,1179 ----
  			continue;
  
  		values[i++] = Int64GetDatumFast(tmp.calls);
+ 		values[i++] = Int64GetDatumFast(tmp.calls_underest);
  		values[i++] = Float8GetDatumFast(tmp.total_time);
  		values[i++] = Int64GetDatumFast(tmp.rows);
  		values[i++] = Int64GetDatumFast(tmp.shared_blks_hit);
***************
*** 1251,1256 **** entry_alloc(pgssHashKey *key, const char *query, int query_len, bool sticky)
--- 1262,1271 ----
  		memset(&entry->counters, 0, sizeof(Counters));
  		/* set the appropriate initial usage count */
  		entry->counters.usage = sticky ? pgss->cur_median_usage : USAGE_INIT;
+ 
+ 		/* propagate calls under-estimation bound */
+ 		entry->counters.calls_underest = pgss->calls_max_underest;
+ 
  		/* re-initialize the mutex each time ... we assume no one using it */
  		SpinLockInit(&entry->mutex);
  		/* ... and don't forget the query text */
***************
*** 1283,1288 **** entry_cmp(const void *lhs, const void *rhs)
--- 1298,1306 ----
  /*
   * Deallocate least used entries.
   * Caller must hold an exclusive lock on pgss->lock.
+  *
+  * Also increases the underestimation maximum in pgss as a side
+  * effect, if necessary.
   */
  static void
  entry_dealloc(void)
***************
*** 1305,1316 **** entry_dealloc(void)
--- 1323,1349 ----
  	hash_seq_init(&hash_seq, pgss_hash);
  	while ((entry = hash_seq_search(&hash_seq)) != NULL)
  	{
+ 		const Counters *cur_counts = &entry->counters;
+ 		int64 cur_underest;
+ 
  		entries[i++] = entry;
+ 
  		/* "Sticky" entries get a different usage decay rate. */
  		if (entry->counters.calls == 0)
  			entry->counters.usage *= STICKY_DECREASE_FACTOR;
  		else
  			entry->counters.usage *= USAGE_DECREASE_FACTOR;
+ 
+ 		/*
+ 		 * Update global calls estimation state, if necessary.
+ 		 *
+ 		 * NB: It is necessary to compute the uncertainty over *all*
+ 		 * entries rather than from just those that will undergo
+ 		 * eviction.  This is because the metric used to choose
+ 		 * eviction is different than the metric reported to the user.
+ 		 */
+ 		cur_underest = cur_counts->calls + cur_counts->calls_underest;
+ 		pgss->calls_max_underest = Max(pgss->calls_max_underest, cur_underest);
  	}
  
  	qsort(entries, i, sizeof(pgssEntry *), entry_cmp);
*** a/contrib/pg_stat_statements/pg_stat_statements.control
--- b/contrib/pg_stat_statements/pg_stat_statements.control
***************
*** 1,5 ****
  # pg_stat_statements extension
  comment = 'track execution statistics of all SQL statements executed'
! default_version = '1.1'
  module_pathname = '$libdir/pg_stat_statements'
  relocatable = true
--- 1,5 ----
  # pg_stat_statements extension
  comment = 'track execution statistics of all SQL statements executed'
! default_version = '1.2'
  module_pathname = '$libdir/pg_stat_statements'
  relocatable = true
#2Peter Geoghegan
peter@2ndquadrant.com
In reply to: Daniel Farina (#1)
Re: pg_stat_statements: calls under-estimation propagation

On 28 December 2012 11:43, Daniel Farina <drfarina@acm.org> wrote:

Without further ado, the cover letter taken from the top of the patch:

This tries to establish a maximum under-estimate of the number of
calls for a given pg_stat_statements entry. That means the number of
calls to the canonical form of the query is between 'calls' and 'calls
+ calls_underest'.

Cool.

One possible issue I see with this is that this code:

+
+ 		/* propagate calls under-estimation bound */
+ 		entry->counters.calls_underest = pgss->calls_max_underest;
+

which appears within entry_alloc(). So if the first execution of the
query results in an error during planning or (much more likely)
execution, as in the event of an integrity constraint violation,
you're going to get a dead sticky entry that no one will ever see.
Now, we currently decay sticky entries much more aggressively, so the
mere fact that we waste an entry isn't a real problem. This was
established by this commit:

http://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=d5375491f8e391224b48e4bb449995a4642183ea

However, with this approach, calls_underest values might appear to the
user in what might be considered an astonishing order. Now, I'm not
suggesting that that's a real problem - just that they may not be the
semantics we want, particularly as we can reasonably defer assigning a
calls_underest until a sticky entry is "unstuck", and an entry becomes
user-visible, within pgss_store().

Also, it seems like you should initialise pgss->calls_max_underest,
within pgss_shmem_startup(). You should probably serialise the value
to disk, and initialise it to 0 if there is no such value to begin
with.

I wonder if the way that pg_stat_statements throws its hands up when
it comes to crash safety (i.e. the contents of the hash table are
completely lost) could be a concern here. In other words, a program
tasked with tracking execution costs over time and graphing
time-series data from snapshots has to take responsibility for
ensuring that there hasn't been a crash (or, indeed, a reset).

Another issue is that I don't think that what you've done here solves
the problem of uniquely identify each entry over time, in the same way
that simply exposing the hash value would. I'm concerned with the
related problem to the problem solved here - simply identifying the
entry uniquely. As we've already discussed, the query string is an
imperfect proxy for each entry, even with constants swapped with '?'
characters (and even when combined with the userid and dbid values -
it's still not the same as the hashtable key, in a way that is likely
to bother people that use pg_stat_statements for long enough).

I think you probably should have created a
PG_STAT_STATEMENTS_COLS_V1_1 macro, since that version of the module
is now legacy, like *V1_0 is in HEAD.

--
Peter Geoghegan http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#3Daniel Farina
drfarina@acm.org
In reply to: Peter Geoghegan (#2)
2 attachment(s)
Re: pg_stat_statements: calls under-estimation propagation

Attached is a cumulative patch attempting to address the below. One
can see the deltas to get there at https://github.com/fdr/postgres.git
error-prop-pg_stat_statements-v2.

On Fri, Dec 28, 2012 at 9:58 AM, Peter Geoghegan <peter@2ndquadrant.com> wrote:

However, with this approach, calls_underest values might appear to the
user in what might be considered an astonishing order. Now, I'm not
suggesting that that's a real problem - just that they may not be the
semantics we want, particularly as we can reasonably defer assigning a
calls_underest until a sticky entry is "unstuck", and an entry becomes
user-visible, within pgss_store().

Fix for this as I understand it:

*** a/contrib/pg_stat_statements/pg_stat_statements.c
--- b/contrib/pg_stat_statements/pg_stat_statements.c
***************
*** 1036,1041 **** pgss_store(const char *query, uint32 queryId,
--- 1036,1042 ----
  			e->counters.usage = USAGE_INIT;

e->counters.calls += 1;
+ e->counters.calls_underest = pgss->calls_max_underest;
e->counters.total_time += total_time;
e->counters.rows += rows;
e->counters.shared_blks_hit += bufusage->shared_blks_hit;
***************
*** 1264,1272 **** entry_alloc(pgssHashKey *key, const char *query,
int query_len, bool sticky)
/* set the appropriate initial usage count */
entry->counters.usage = sticky ? pgss->cur_median_usage : USAGE_INIT;

- 		/* propagate calls under-estimation bound */
- 		entry->counters.calls_underest = pgss->calls_max_underest;
-
  		/* re-initialize the mutex each time ... we assume no one using it */
  		SpinLockInit(&entry->mutex);
  		/* ... and don't forget the query text */
--- 1265,1270 ----

Also, it seems like you should initialise pgss->calls_max_underest,
within pgss_shmem_startup().

Easy enough. Somehow I wrongly thought zero-initialization was a thing
for the shmem functions.

*** a/contrib/pg_stat_statements/pg_stat_statements.c
--- b/contrib/pg_stat_statements/pg_stat_statements.c
***************
*** 426,431 **** pgss_shmem_startup(void)
--- 426,432 ----
  	{
  		/* First time through ... */
  		pgss->lock = LWLockAssign();
+ 		pgss->calls_max_underest = 0;
  		pgss->query_size = pgstat_track_activity_query_size;
  		pgss->cur_median_usage = ASSUMED_MEDIAN_INIT;
  	}

You should probably serialise the value
to disk, and initialise it to 0 if there is no such value to begin
with.

I prefer different approach here: just compute it while loading the
entries from disk, since the calls + underestimation can be used to
find a new pessimum underestimation global value.

*** a/contrib/pg_stat_statements/pg_stat_statements.c
--- b/contrib/pg_stat_statements/pg_stat_statements.c
***************
*** 419,424 **** pgss_shmem_startup(void)
--- 419,425 ----
  	int			query_size;
  	int			buffer_size;
  	char	   *buffer = NULL;
+ 	int64		calls_max_underest = 0;
  	if (prev_shmem_startup_hook)
  		prev_shmem_startup_hook();
***************
*** 440,446 **** pgss_shmem_startup(void)
  	{
  		/* First time through ... */
  		pgss->lock = LWLockAssign();
! 		pgss->calls_max_underest = 0;
  		pgss->query_size = pgstat_track_activity_query_size;
  		pgss->cur_median_usage = ASSUMED_MEDIAN_INIT;
  	}
--- 441,447 ----
  	{
  		/* First time through ... */
  		pgss->lock = LWLockAssign();
! 		pgss->calls_max_underest = calls_max_underest;
  		pgss->query_size = pgstat_track_activity_query_size;
  		pgss->cur_median_usage = ASSUMED_MEDIAN_INIT;
  	}
***************
*** 528,533 **** pgss_shmem_startup(void)
--- 529,545 ----
  												   temp.query_len,
  												   query_size - 1);
+ 		/*
+ 		 * Compute maxima of under-estimation over the read entries
+ 		 * for reinitializing pgss->calls_max_underest.
+ 		 */
+ 		{
+ 			int64 cur_underest;
+ 				
+ 			cur_underest = temp.calls + temp.calls_underest;
+ 			calls_max_underest = Max(calls_max_underest, cur_underest);
+ 		}
+
  		/* make the hashtable entry (discards old entries if too many) */
  		entry = entry_alloc(&temp.key, buffer, temp.query_len, false);
***************
*** 535,540 **** pgss_shmem_startup(void)
--- 547,559 ----
  		entry->counters = temp.counters;
  	}
+ 	/*
+ 	 * Reinitialize global under-estimation information from the
+ 	 * computed maxima, if any.  Otherwise, calls_max_underest should
+ 	 * be zero.
+ 	 */
+ 	pgss->calls_max_underest = calls_max_underest;
+
  	pfree(buffer);
  	FreeFile(file);

I think you probably should have created a
PG_STAT_STATEMENTS_COLS_V1_1 macro, since that version of the module
is now legacy, like *V1_0 is in HEAD.

Indeed. Prepare to scroll, this change is not very complex but a bit
more bloated looking:

--- b/contrib/pg_stat_statements/pg_stat_statements.c
***************
*** 96,101 **** typedef struct pgssHashKey
--- 96,115 ----
  } pgssHashKey;
  /*
+  * Identifies the tuple format detected by pg_stat_statements.
+  *
+  * Used to identify features of newer formats and enable smooth
+  * upgrades: one can install a new pg_stat_statements binary while
+  * running with the old SQL function definitions.
+  */
+ typedef enum pgssTupVersion
+ {
+ 	PGSS_TUP_V1_0 = 1,
+ 	PGSS_TUP_V1_1,
+ 	PGSS_TUP_LATEST
+ } pgssTupVersion;
+
+ /*
   * The actual stats counters kept within pgssEntry.
   */
  typedef struct Counters
***************
*** 1078,1083 **** pg_stat_statements_reset(PG_FUNCTION_ARGS)
--- 1092,1098 ----
  }

#define PG_STAT_STATEMENTS_COLS_V1_0 14
+ #define PG_STAT_STATEMENTS_COLS_V1_1 18
#define PG_STAT_STATEMENTS_COLS 19

/*
***************
*** 1095,1101 **** pg_stat_statements(PG_FUNCTION_ARGS)
bool is_superuser = superuser();
HASH_SEQ_STATUS hash_seq;
pgssEntry *entry;
! bool sql_supports_v1_1_counters = true;

  	if (!pgss || !pgss_hash)
  		ereport(ERROR,
--- 1110,1116 ----
  	bool		is_superuser = superuser();
  	HASH_SEQ_STATUS hash_seq;
  	pgssEntry  *entry;
! 	pgssTupVersion detected_version;

if (!pgss || !pgss_hash)
ereport(ERROR,
***************
*** 1116,1123 **** pg_stat_statements(PG_FUNCTION_ARGS)
/* Build a tuple descriptor for our result type */
if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE)
elog(ERROR, "return type must be a row type");
if (tupdesc->natts == PG_STAT_STATEMENTS_COLS_V1_0)
! sql_supports_v1_1_counters = false;

  	per_query_ctx = rsinfo->econtext->ecxt_per_query_memory;
  	oldcontext = MemoryContextSwitchTo(per_query_ctx);
--- 1131,1158 ----
  	/* Build a tuple descriptor for our result type */
  	if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE)
  		elog(ERROR, "return type must be a row type");
+
+ 	/* Perform version detection */
  	if (tupdesc->natts == PG_STAT_STATEMENTS_COLS_V1_0)
! 		detected_version = PGSS_TUP_V1_0;
! 	else if (tupdesc->natts == PG_STAT_STATEMENTS_COLS_V1_1)
! 		detected_version = PGSS_TUP_V1_1;
! 	else if (tupdesc->natts == PG_STAT_STATEMENTS_COLS)
! 		detected_version = PGSS_TUP_LATEST;
! 	else
! 	{
! 		/*
! 		 * Couldn't identify the tuple format.  Raise error.
! 		 *
! 		 * This is an exceptional case that may only happen in bizarre
! 		 * situations, since it is thought that every released version
! 		 * of pg_stat_statements has a matching schema.
! 		 */
! 		ereport(ERROR,
! 				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
! 				 errmsg("pg_stat_statements schema is not supported "
! 						"by its installed binary")));
! 	}

per_query_ctx = rsinfo->econtext->ecxt_per_query_memory;
oldcontext = MemoryContextSwitchTo(per_query_ctx);
***************
*** 1175,1196 **** pg_stat_statements(PG_FUNCTION_ARGS)
continue;

  		values[i++] = Int64GetDatumFast(tmp.calls);
! 		values[i++] = Int64GetDatumFast(tmp.calls_underest);
  		values[i++] = Float8GetDatumFast(tmp.total_time);
  		values[i++] = Int64GetDatumFast(tmp.rows);
  		values[i++] = Int64GetDatumFast(tmp.shared_blks_hit);
  		values[i++] = Int64GetDatumFast(tmp.shared_blks_read);
! 		if (sql_supports_v1_1_counters)
  			values[i++] = Int64GetDatumFast(tmp.shared_blks_dirtied);
  		values[i++] = Int64GetDatumFast(tmp.shared_blks_written);
  		values[i++] = Int64GetDatumFast(tmp.local_blks_hit);
  		values[i++] = Int64GetDatumFast(tmp.local_blks_read);
! 		if (sql_supports_v1_1_counters)
  			values[i++] = Int64GetDatumFast(tmp.local_blks_dirtied);
  		values[i++] = Int64GetDatumFast(tmp.local_blks_written);
  		values[i++] = Int64GetDatumFast(tmp.temp_blks_read);
  		values[i++] = Int64GetDatumFast(tmp.temp_blks_written);
! 		if (sql_supports_v1_1_counters)
  		{
  			values[i++] = Float8GetDatumFast(tmp.blk_read_time);
  			values[i++] = Float8GetDatumFast(tmp.blk_write_time);
--- 1210,1232 ----
  			continue;

values[i++] = Int64GetDatumFast(tmp.calls);
! if (detected_version >= PGSS_TUP_LATEST)
! values[i++] = Int64GetDatumFast(tmp.calls_underest);
values[i++] = Float8GetDatumFast(tmp.total_time);
values[i++] = Int64GetDatumFast(tmp.rows);
values[i++] = Int64GetDatumFast(tmp.shared_blks_hit);
values[i++] = Int64GetDatumFast(tmp.shared_blks_read);
! if (detected_version >= PGSS_TUP_V1_1)
values[i++] = Int64GetDatumFast(tmp.shared_blks_dirtied);
values[i++] = Int64GetDatumFast(tmp.shared_blks_written);
values[i++] = Int64GetDatumFast(tmp.local_blks_hit);
values[i++] = Int64GetDatumFast(tmp.local_blks_read);
! if (detected_version >= PGSS_TUP_V1_1)
values[i++] = Int64GetDatumFast(tmp.local_blks_dirtied);
values[i++] = Int64GetDatumFast(tmp.local_blks_written);
values[i++] = Int64GetDatumFast(tmp.temp_blks_read);
values[i++] = Int64GetDatumFast(tmp.temp_blks_written);
! if (detected_version >= PGSS_TUP_V1_1)
{
values[i++] = Float8GetDatumFast(tmp.blk_read_time);
values[i++] = Float8GetDatumFast(tmp.blk_write_time);

I wonder if the way that pg_stat_statements throws its hands up when
it comes to crash safety (i.e. the contents of the hash table are
completely lost) could be a concern here. In other words, a program
tasked with tracking execution costs over time and graphing
time-series data from snapshots has to take responsibility for
ensuring that there hasn't been a crash (or, indeed, a reset).

Another issue is that I don't think that what you've done here solves
the problem of uniquely identify each entry over time, in the same way
that simply exposing the hash value would. I'm concerned with the
related problem to the problem solved here - simply identifying the
entry uniquely. As we've already discussed, the query string is an
imperfect proxy for each entry, even with constants swapped with '?'
characters (and even when combined with the userid and dbid values -
it's still not the same as the hashtable key, in a way that is likely
to bother people that use pg_stat_statements for long enough).

These were not express goals of the patch, but so long as you are
inviting features, attached is a bonus patch that exposes the queryid
and also the notion of a "statistics session" that is re-rolled
whenever the stats file could not be read or the stats are reset, able
to fully explain all obvious causes of retrograde motion in
statistics. It too is cumulative, so it includes the under-estimation
field. Notably, I also opted to nullify extra pg_stat_statements
fields when they'd also show "insufficient privileges" (that one is
spared from this censorship), because I feel as though a bit too much
information leaks from pg_stat_statement's statistics to ignore,
especially after adding the query id. Since the common theme here is
identifying queries, I have called it
"pg_stat_statements-identification", and it can be found in the git
repo above under the same name (...-v1).

--
fdr

Attachments:

add-pg_stat_statements-calls-underestimation-v2.patch.gzapplication/x-gzip; name=add-pg_stat_statements-calls-underestimation-v2.patch.gzDownload
����P�Z{w�F��|���'�1���M�[qhmp�H�������XHD;n����;#	���'���$6h�������uf�X�����lbD��_|��8�97��������mv��������J��76x���^�U�@���
���g����������_
�
']�[����4�f��[���ox�l�U�w�����7�%�fZW����L�,s�6�����hBH;6�����_��T���\i�v����N�}�I����V0�y���9Y�;���(HB���gh�����1}v���{�����������"�6�KT�F0u����b�J��d��'n,����0��
g�y�i{��}r����`J��s?r_��N������\�x��G�c�|C�v���;�'.,�����_{%�9��@@�R����P�����.��t�D<tm�v����\}�>��-�!��S���V�<5��!�2��� 6=#v���a��ap��0rLpi\zW������!7�;7�n���=7���V�xx[OGa�����:*
[�=��8d���:�:���������a_c��^��� ��}4�y>8�!Vv�7������g����{����fS�\c�@#���3�Xg����].p:���l{���v1zu�_$�?4v�~N��Ra;�����Z"��"6H���8u��V��5�`���E��F�y�1����u�����&���0$,��d���A��	_�d���:2��$�7���~,�A��3��t=��y���Q2�!��p���v���L��z��K�����:�����U��Q��
(��QV��8L�,D�3r~���@���K�O��#Y:��IMu��q�;vy$'3HZ��1�y��~qy[Vk	��M2w%�[6�f�@�d7>���+pF��m�}��h�#1$�Ih�<�&(�R��Y���2��t}a����Q 	�w�	�qcG�x6�~:����D���7X&L�'S!*=����`V?Rawq�i�>�0�*F���X~���>��x�i	c�C	L���������"M|8d��8�P��.O�PaA�M���Kp��a�#��3������G�J����
"�n%�Y������8"����e\��d,�DB��U� �Z+���r$���

�	��R���qRr%�8)�p~
z���XZ���������k��.��K�����aCAn����d
���P�2'+���S5��L-g?��U���j�J���Y��2rb��-gg�pMK�=K�*�u+�����%w
�
�N-hEoxz��O�|�&h������Bv6��^Pt&f��;��WJ;�Sd����fS>X5/�:71�H��Y����fC�'3��.oc�lnVSn��o$�9I��4D�`r����<.�|Z��Q�Y������4����(R�t�F�������1���0��gJ��j<���������y������c�b���]
�a`d�e��i�3|:O��LX��j���8<r�|J�;�������m����>R��,"���0H&k6���+�s�%�B�t�n�_f��U����Q���D�O�0qhZWb�{���F�*��6�������{�����t��<��������R���W�*�P�`����K� ��t��-U���Z@��_P>a�=���iH�rj���6Y4l�F��[Y+�n�X�V��
�.\+;E��WV��
lD�m�\�k�Oy�w�L�k�=�t����F~�IJ����-����/�\���q<8�DNPQv�N��$O;��/q������OH�])��4�~���&K�/D��8|�*9�����u*)UF�2�J���dY=L2�L��Mo��CS��4����?�������Q�x��!�!L�1�L&<sw���&��L|0�y�CU��Q��(sG��YL4�����2��3P�>��K)�S����EBn�U�}�(
Dq��$D�%^,r�,��t�l�rul��8h0q���@I�j��L��B��a����z������1���2	}y�4Ax�����x�!�@����_�f�$���;�&.�;�i��)�&(=�b���)����S��?���Cl���"
BQ�UP���x.����6����z�$
��@{y�����,f<���]/�|�.��}�P����G�����<������4GO?
 Dt����b����������&cC�q<���f�:�d\��Q�Y�cAj9�I�c")7oQ���p��L4�P��n"c�PD.jbQ\��p}b���d0�hp� �����A��QI���,�ZJ���1�#�.�@�%W�����NTc���X7���q1T��O����I-�$M�Iu��y<1�1KCX��s�
�TT����bE�c�V�	%~�:���f����tw3���a�?�c+���������o��G�)�O�8��6��Og2=�qzi�k�$]����P��P^���K9�!�!,�+"���$��c��
�'R��������!����h{�D�O��P�|,��gY�K��=r>����6e��N�W�Y^����r��\��jb����u�_"��)@���Uc����Y�y�h�R�)MN�{({�{�M���=�����_��+i��~,;p����"�Q���V�2�N��*4��aQk��8k��j�%���`\��6��a�4�6R�0��.�'&�VH�eWO ���K����������X~����x:>���i�%B�	2A�M��Y�����(J����.�(�vs�����Y5�3	LsN	
l1�Ee1IY����vN��R����Mk:K���bfu��2U��e������H�v�	&A2��"���-��<��7vLyx(KS�F���� �k�D[:���I]��vij�z&��E�\��4�4'��1Q�D(�$.�_
*}NS�8�7K���R������^*M�-���t������m/7q1|e�����e]*CN����5Q�SG�6o��b��;�d�MqC6�����}��vi�G�Fi�6�PO��Y>!�����������r���`��;��W�D�w4�x�=��5a����L������l;l�K�Po����s�?��Z0����R9Q�"M8�N�,��v�rtIS#��4[�mI��f6Y��.E]N�)X�(��,,��U�(q���t0��V�-�p�t���P]��X`��Jjb���nfC�EX�o�/EP-��z>x���T-�@����A��'������TcO�����e-���bl������`�bpU�����
�W)�U�<�-�M��6����7�6�������x������S�Y6����HbJ�"���*f��u��r�����U�Kq�� ��0
pg_stat_statements-identification-v1.patch.gzapplication/x-gzip; name=pg_stat_statements-identification-v1.patch.gzDownload
#4Daniel Farina
drfarina@acm.org
In reply to: Daniel Farina (#3)
1 attachment(s)
Re: pg_stat_statements: calls under-estimation propagation

On Sat, Dec 29, 2012 at 4:21 AM, Daniel Farina <drfarina@acm.org> wrote:

These were not express goals of the patch, but so long as you are
inviting features, attached is a bonus patch that exposes the queryid
and also the notion of a "statistics session" that is re-rolled
whenever the stats file could not be read or the stats are reset, able
to fully explain all obvious causes of retrograde motion in
statistics. It too is cumulative, so it includes the under-estimation
field. Notably, I also opted to nullify extra pg_stat_statements
fields when they'd also show "insufficient privileges" (that one is
spared from this censorship), because I feel as though a bit too much
information leaks from pg_stat_statement's statistics to ignore,
especially after adding the query id. Since the common theme here is
identifying queries, I have called it
"pg_stat_statements-identification", and it can be found in the git
repo above under the same name (...-v1).

A small amendment that doesn't really change the spirit of the
narrative is attached.

--
fdr

Attachments:

pg_stat_statements-identification-v2.patch.gzapplication/x-gzip; name=pg_stat_statements-identification-v2.patch.gzDownload
#5Peter Geoghegan
peter@2ndquadrant.com
In reply to: Daniel Farina (#3)
Re: pg_stat_statements: calls under-estimation propagation

On 29 December 2012 12:21, Daniel Farina <drfarina@acm.org> wrote:

These were not express goals of the patch, but so long as you are
inviting features, attached is a bonus patch that exposes the queryid
and also the notion of a "statistics session" that is re-rolled
whenever the stats file could not be read or the stats are reset, able
to fully explain all obvious causes of retrograde motion in
statistics. It too is cumulative, so it includes the under-estimation
field.

Cool.

I had a thought about Tom's objection to exposing the hash value. I
would like to propose a compromise between exposing the hash value and
not doing so at all.

What if we expose the hash value without documenting it, in a way not
apparent to normal users, while letting experts willing to make an
executive decision about its stability use it? What I have in mind is
to expose the hash value from the pg_stat_statements function, and yet
to avoid exposing it within the pg_stat_statements view definition.
The existence of the hash value would not need to be documented, since
the pg_stat_statements function is an undocumented implementation
detail.

Precedent for this exists, I think - the undocumented system hash
functions are exposed via an SQL interface. Some satellite projects
rely on this (apparently the pl/proxy documentation shows the use of
hashtext(), which is a thin wrapper on hash_any(), and there is
chatter about it elsewhere). So it is already the case that people are
using hashtext(), which should not be problematic if the applications
that do so have a reasonable set of expectations about its stability
(i.e. it's not going to change in a point release, because that would
break hash indexes, but may well change across major releases). We've
already in effect promised to not break hashtext() in a point release,
just as we've already in effect promised to not break the hash values
that pg_stat_statements uses internally (to do any less would
invalidate the on-disk representation, and necessitate bumping
PGSS_FILE_HEADER to wipe the stored stats).

Thoughts?

Notably, I also opted to nullify extra pg_stat_statements
fields when they'd also show "insufficient privileges" (that one is
spared from this censorship), because I feel as though a bit too much
information leaks from pg_stat_statement's statistics to ignore,
especially after adding the query id.

That seems sensible.

--
Peter Geoghegan http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#6Daniel Farina
daniel@heroku.com
In reply to: Peter Geoghegan (#5)
Re: pg_stat_statements: calls under-estimation propagation

On Sat, Dec 29, 2012 at 6:37 PM, Peter Geoghegan <peter@2ndquadrant.com> wrote:

On 29 December 2012 12:21, Daniel Farina <drfarina@acm.org> wrote:

These were not express goals of the patch, but so long as you are
inviting features, attached is a bonus patch that exposes the queryid
and also the notion of a "statistics session" that is re-rolled
whenever the stats file could not be read or the stats are reset, able
to fully explain all obvious causes of retrograde motion in
statistics. It too is cumulative, so it includes the under-estimation
field.

Cool.

I had a thought about Tom's objection to exposing the hash value. I
would like to propose a compromise between exposing the hash value and
not doing so at all.

As I recall, the gist of this objection had to do with a false sense
of stability of the hash value, and the desire to enforce the ability
to alter it. Here's an option: xor the hash value with the
'statistics session id', so it's *known* to be unstable between
sessions. That gets you continuity in the common case and sound
deprecation in the less-common cases (crashes, format upgrades, stat
resetting).

A change in hash function should also then necessitate changing the
stat file header.

Another more minor issue is the hard-wiring of the precision of the
id. For that reason, I have thought it reasonable to expose this as a
string also.

--
fdr

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#7Peter Geoghegan
peter@2ndquadrant.com
In reply to: Daniel Farina (#6)
Re: pg_stat_statements: calls under-estimation propagation

On 30 December 2012 02:45, Daniel Farina <daniel@heroku.com> wrote:

As I recall, the gist of this objection had to do with a false sense
of stability of the hash value, and the desire to enforce the ability
to alter it. Here's an option: xor the hash value with the
'statistics session id', so it's *known* to be unstable between
sessions. That gets you continuity in the common case and sound
deprecation in the less-common cases (crashes, format upgrades, stat
resetting).

Hmm. I like the idea, but a concern there would be that you'd
introduce additional scope for collisions in the third-party utility
building time-series data from snapshots. I currently put the
probability of a collision within pg_stat_statements as 1% in the
event of a pg_stat_statements.max of 10,000.

--
Peter Geoghegan http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#8Daniel Farina
daniel@heroku.com
In reply to: Peter Geoghegan (#7)
Re: pg_stat_statements: calls under-estimation propagation

On Sat, Dec 29, 2012 at 7:12 PM, Peter Geoghegan <peter@2ndquadrant.com> wrote:

On 30 December 2012 02:45, Daniel Farina <daniel@heroku.com> wrote:

As I recall, the gist of this objection had to do with a false sense
of stability of the hash value, and the desire to enforce the ability
to alter it. Here's an option: xor the hash value with the
'statistics session id', so it's *known* to be unstable between
sessions. That gets you continuity in the common case and sound
deprecation in the less-common cases (crashes, format upgrades, stat
resetting).

Hmm. I like the idea, but a concern there would be that you'd
introduce additional scope for collisions in the third-party utility
building time-series data from snapshots. I currently put the
probability of a collision within pg_stat_statements as 1% in the
event of a pg_stat_statements.max of 10,000.

We can use a longer session key and duplicate the queryid (effectively
padding) a couple of times to complete the XOR. I think that makes
the cases of collisions introduced by this astronomically low, as an
increase over the base collision rate.

--
fdr

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#9Daniel Farina
daniel@heroku.com
In reply to: Daniel Farina (#8)
1 attachment(s)
Re: pg_stat_statements: calls under-estimation propagation

On Sat, Dec 29, 2012 at 7:16 PM, Daniel Farina <daniel@heroku.com> wrote:

On Sat, Dec 29, 2012 at 7:12 PM, Peter Geoghegan <peter@2ndquadrant.com> wrote:

On 30 December 2012 02:45, Daniel Farina <daniel@heroku.com> wrote:

As I recall, the gist of this objection had to do with a false sense
of stability of the hash value, and the desire to enforce the ability
to alter it. Here's an option: xor the hash value with the
'statistics session id', so it's *known* to be unstable between
sessions. That gets you continuity in the common case and sound
deprecation in the less-common cases (crashes, format upgrades, stat
resetting).

Hmm. I like the idea, but a concern there would be that you'd
introduce additional scope for collisions in the third-party utility
building time-series data from snapshots. I currently put the
probability of a collision within pg_stat_statements as 1% in the
event of a pg_stat_statements.max of 10,000.

We can use a longer session key and duplicate the queryid (effectively
padding) a couple of times to complete the XOR. I think that makes
the cases of collisions introduced by this astronomically low, as an
increase over the base collision rate.

A version implementing that is attached, except I generate an
additional 64-bit session not exposed to the client to prevent even
casual de-leaking of the session state. That may seem absurd, until
someone writes a tool that de-xors things and relies on it and then
nobody feels inclined to break it. It also keeps the public session
number short.

I also opted to save the underestimate since I'm adding a handful of
fixed width fields to the file format anyway.

--
fdr

Attachments:

pg_stat_statements-identification-v3.patch.gzapplication/x-gzip; name=pg_stat_statements-identification-v3.patch.gzDownload
#10Heikki Linnakangas
hlinnakangas@vmware.com
In reply to: Daniel Farina (#9)
Re: pg_stat_statements: calls under-estimation propagation

On 30.12.2012 08:31, Daniel Farina wrote:

A version implementing that is attached, except I generate an
additional 64-bit session not exposed to the client to prevent even
casual de-leaking of the session state. That may seem absurd, until
someone writes a tool that de-xors things and relies on it and then
nobody feels inclined to break it. It also keeps the public session
number short.

I also opted to save the underestimate since I'm adding a handful of
fixed width fields to the file format anyway.

This patch needs documentation. At a minimum, the new calls_underest
field needs to be listed in the description of the pg_stat_statements.

Pardon for not following the discussion: What exactly does the
calls_underest field mean? I couldn't decipher it from the patch. What
can an admin do with the value? How does it compare with just bumping up
pg_stat_statements.max?

- Heikki

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#11samthakur74
samthakur74@gmail.com
In reply to: Heikki Linnakangas (#10)
Re: pg_stat_statements: calls under-estimation propagation

This patch needs documentation. At a minimum, the new calls_underest
field needs to be listed in the description of the pg_stat_statements.

I have attached a version which includes documentation.
pg_stat_statements-identification-v4.patch.gz
<http://postgresql.1045698.n5.nabble.com/file/n5770844/pg_stat_statements-identification-v4.patch.gz&gt;

Pardon for not following the discussion: What exactly does the
calls_underest field mean? I couldn't decipher it from the patch. What
can an admin do with the value? How does it compare with just bumping up
pg_stat_statements.max?

Paraphrasing the documentation.
There is a possibility that,for queries which are tracked intermittently,
could have their statistics silently reset.
The calls_underest field gives an indication that a query has been tracked
intermittently and consequently
have a higher probability of erroneous statistics. Queries tracked
constantly will have a zero value for
calls_underest indicating zero probability of erroneous statistics.
A greater value of calls_underest indicates greater degree of inconsistent
tracking which in
turn means greater possibility of erroneous statistics due to statistics
being reset when
query was not being tracked.

Increasing pg_stat_statements.max reduces the possibility of a query being
tracked intermittently but does not address the problem of indicating the
probability of erroneous statistics if the query is indeed being tracked
intermittently. I do not believe the admin can influence the value of
calls_underest as it is not a GUC parameter.

We have a need to see this patch committed hence the revived interest in
this thread

regards
Sameer

--
View this message in context: http://postgresql.1045698.n5.nabble.com/pg-stat-statements-calls-under-estimation-propagation-tp5738128p5770844.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#12Peter Eisentraut
peter_e@gmx.net
In reply to: samthakur74 (#11)
Re: pg_stat_statements: calls under-estimation propagation

On Sat, 2013-09-14 at 03:51 -0700, samthakur74 wrote:

We have a need to see this patch committed hence the revived interest
in this thread

You have added this email to the commit fest, but it contains no patch.
Please add the email with the actual patch. Maybe the author should be
given a chance to update the patches, though, because they are quite
old.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#13samthakur74
samthakur74@gmail.com
In reply to: Peter Eisentraut (#12)
Re: pg_stat_statements: calls under-estimation propagation

You have added this email to the commit fest, but it contains no patch.

Please add the email with the actual patch.

I hope its attached now!

Maybe the author should be

given a chance to update the patches, though, because they are quite
old.

I did connect with Daniel and he did have some improvement ideas. I am not
sure when they could be implemented. Since we have a interest in the
current version of the patch, which needed documentation, i tried to
complete that.
Thank you,
Sameer

pg_stat_statements-identification-v4.patch.gz (8K) <http://postgresql.1045698.n5.nabble.com/attachment/5770937/0/pg_stat_statements-identification-v4.patch.gz&gt;

--
View this message in context: http://postgresql.1045698.n5.nabble.com/pg-stat-statements-calls-under-estimation-propagation-tp5738128p5770937.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.

#14Fujii Masao
masao.fujii@gmail.com
In reply to: samthakur74 (#13)
Re: pg_stat_statements: calls under-estimation propagation

On Sun, Sep 15, 2013 at 3:54 PM, samthakur74 <samthakur74@gmail.com> wrote:

You have added this email to the commit fest, but it contains no patch.

Please add the email with the actual patch.

I hope its attached now!

You seem to have forgotten to include the pg_stat_statements--1.2.sql
and pg_stat_statements--1.1--1.2.sql in the patch.

Regards,

--
Fujii Masao

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#15samthakur74
samthakur74@gmail.com
In reply to: Fujii Masao (#14)
Re: pg_stat_statements: calls under-estimation propagation

You seem to have forgotten to include the pg_stat_statements--1.2.sql
and pg_stat_statements--1.1--1.2.sql in the patch.

Sorry again. Please find updated patch attached.

<http://www.postgresql.org/mailpref/pgsql-hackers&gt;

NAML<http://postgresql.1045698.n5.nabble.com/template/NamlServlet.jtp?macro=macro_viewer&amp;id=instant_html%21nabble%3Aemail.naml&amp;base=nabble.naml.namespaces.BasicNamespace-nabble.view.web.template.NabbleNamespace-nabble.view.web.template.NodeNamespace&amp;breadcrumbs=notify_subscribers%21nabble%3Aemail.naml-instant_emails%21nabble%3Aemail.naml-send_instant_email%21nabble%3Aemail.naml&gt;

On Tue, Sep 17, 2013 at 5:47 PM, Fujii Masao-2 [via PostgreSQL] <
ml-node+s1045698n5771213h27@n5.nabble.com> wrote:

On Sun, Sep 15, 2013 at 3:54 PM, samthakur74 <[hidden email]<http://user/SendEmail.jtp?type=node&amp;node=5771213&amp;i=0&gt;&gt;
wrote:

You have added this email to the commit fest, but it contains no

patch.

Please add the email with the actual patch.

I hope its attached now!

You seem to have forgotten to include the pg_stat_statements--1.2.sql
and pg_stat_statements--1.1--1.2.sql in the patch.

Regards,

--
Fujii Masao

--
Sent via pgsql-hackers mailing list ([hidden email]<http://user/SendEmail.jtp?type=node&amp;node=5771213&amp;i=1&gt;)

To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

------------------------------
If you reply to this email, your message will be added to the discussion
below:

http://postgresql.1045698.n5.nabble.com/pg-stat-statements-calls-under-estimation-propagation-tp5738128p5771213.html
To unsubscribe from pg_stat_statements: calls under-estimation
propagation, click here<http://postgresql.1045698.n5.nabble.com/template/NamlServlet.jtp?macro=unsubscribe_by_code&amp;node=5738128&amp;code=c2FtdGhha3VyNzRAZ21haWwuY29tfDU3MzgxMjh8ODM4MzYxMTcy&gt;
.
NAML<http://postgresql.1045698.n5.nabble.com/template/NamlServlet.jtp?macro=macro_viewer&amp;id=instant_html%21nabble%3Aemail.naml&amp;base=nabble.naml.namespaces.BasicNamespace-nabble.view.web.template.NabbleNamespace-nabble.view.web.template.NodeNamespace&amp;breadcrumbs=notify_subscribers%21nabble%3Aemail.naml-instant_emails%21nabble%3Aemail.naml-send_instant_email%21nabble%3Aemail.naml&gt;

pg_stat_statements-identification-v4.patch.gz (9K) <http://postgresql.1045698.n5.nabble.com/attachment/5771248/0/pg_stat_statements-identification-v4.patch.gz&gt;

--
View this message in context: http://postgresql.1045698.n5.nabble.com/pg-stat-statements-calls-under-estimation-propagation-tp5738128p5771248.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.

#16Fujii Masao
masao.fujii@gmail.com
In reply to: samthakur74 (#15)
Re: pg_stat_statements: calls under-estimation propagation

On Tue, Sep 17, 2013 at 10:48 PM, samthakur74 <samthakur74@gmail.com> wrote:

You seem to have forgotten to include the pg_stat_statements--1.2.sql
and pg_stat_statements--1.1--1.2.sql in the patch.

Sorry again. Please find updated patch attached.

pg_stat_statements--1.2.sql is missing. Could you confirm that the patch
includes all the changes?

Regards,

--
Fujii Masao

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#17Sameer Thakur
samthakur74@gmail.com
In reply to: Fujii Masao (#16)
1 attachment(s)
Re: pg_stat_statements: calls under-estimation propagation

You seem to have forgotten to include the pg_stat_statements--1.2.sql
and pg_stat_statements--1.1--1.2.sql in the patch.

Sorry again. Please find updated patch attached.

I did not add pg_stat_statements--1.2.sql. I have added that now and
updated the patch again.

The patch attached should contain following file changes
patching file contrib/pg_stat_statements/Makefile
patching file contrib/pg_stat_statements/pg_stat_statements--1.1--1.2.sql
patching file contrib/pg_stat_statements/pg_stat_statements--1.2.sql
patching file contrib/pg_stat_statements/pg_stat_statements.c
patching file contrib/pg_stat_statements/pg_stat_statements.control
patching file doc/src/sgml/pgstatstatements.sgml

regards
Sameer

Attachments:

pg_stat_statements-identification-v4.patch.gzapplication/x-gzip; name=pg_stat_statements-identification-v4.patch.gzDownload
#18Fujii Masao
masao.fujii@gmail.com
In reply to: Sameer Thakur (#17)
Re: pg_stat_statements: calls under-estimation propagation

On Wed, Sep 18, 2013 at 2:41 PM, Sameer Thakur <samthakur74@gmail.com> wrote:

You seem to have forgotten to include the pg_stat_statements--1.2.sql
and pg_stat_statements--1.1--1.2.sql in the patch.

Sorry again. Please find updated patch attached.

I did not add pg_stat_statements--1.2.sql. I have added that now and updated
the patch again.

Thanks!

I got the segmentation fault when I tested the case where the least-executed
query statistics is discarded, i.e., when I executed different queries more than
pg_stat_statements.max times. I guess that the patch might have a bug.

Regards,

--
Fujii Masao

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#19Fujii Masao
masao.fujii@gmail.com
In reply to: Fujii Masao (#18)
Re: pg_stat_statements: calls under-estimation propagation

On Thu, Sep 19, 2013 at 2:41 AM, Fujii Masao <masao.fujii@gmail.com> wrote:

On Wed, Sep 18, 2013 at 2:41 PM, Sameer Thakur <samthakur74@gmail.com> wrote:

You seem to have forgotten to include the pg_stat_statements--1.2.sql
and pg_stat_statements--1.1--1.2.sql in the patch.

Sorry again. Please find updated patch attached.

I did not add pg_stat_statements--1.2.sql. I have added that now and updated
the patch again.

pg_stat_statements--1.1.sql should be removed.

+      <entry><structfield>queryid</structfield></entry>
+      <entry><type>bigint</type></entry>
+      <entry></entry>
+      <entry>Unique value of each representative statement for the
current statistics session.
+       This value will change for each new statistics session.</entry>

What does "statistics session" mean?

Regards,

--
Fujii Masao

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#20samthakur74
samthakur74@gmail.com
In reply to: Fujii Masao (#19)
Re: pg_stat_statements: calls under-estimation propagation

I got the segmentation fault when I tested the case where the

least-executed

query statistics is discarded, i.e., when I executed different queries

more than

pg_stat_statements.max times. I guess that the patch might have a bug.

Thanks, will try to fix it.

pg_stat_statements--1.1.sql should be removed.
Yes will do that

+      <entry><structfield>queryid</structfield></entry>
+      <entry><type>bigint</type></entry>
+      <entry></entry>
+      <entry>Unique value of each representative statement for the
current statistics session.
+       This value will change for each new statistics session.</entry>

What does "statistics session" mean?

The time period when statistics are gathered by statistics collector
without being reset. So the statistics session continues across normal
shutdowns, but in case of abnormal situations like crashes, format upgrades
or statistics being reset for any other reason, a new time period of
statistics collection starts i.e. a new statistics session. The queryid
value generation is linked to statistics session so emphasize the fact that
in case of crashes,format upgrades or any situation of statistics reset,
the queryid for the same queries will also change. Will update
documentation clearly explain the term statistics session in this context

regards

Sameer

--
View this message in context: http://postgresql.1045698.n5.nabble.com/pg-stat-statements-calls-under-estimation-propagation-tp5738128p5771562.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.

#21Fujii Masao
masao.fujii@gmail.com
In reply to: samthakur74 (#20)
Re: pg_stat_statements: calls under-estimation propagation

On Thu, Sep 19, 2013 at 2:25 PM, samthakur74 <samthakur74@gmail.com> wrote:

I got the segmentation fault when I tested the case where the
least-executed
query statistics is discarded, i.e., when I executed different queries more
than
pg_stat_statements.max times. I guess that the patch might have a bug.

Thanks, will try to fix it.

pg_stat_statements--1.1.sql should be removed.

Yes will do that

+      <entry><structfield>queryid</structfield></entry>
+      <entry><type>bigint</type></entry>
+      <entry></entry>
+      <entry>Unique value of each representative statement for the
current statistics session.
+       This value will change for each new statistics session.</entry>

What does "statistics session" mean?

The time period when statistics are gathered by statistics collector
without being reset. So the statistics session continues across normal
shutdowns, but in case of abnormal situations like crashes, format upgrades
or statistics being reset for any other reason, a new time period of
statistics collection starts i.e. a new statistics session. The queryid
value generation is linked to statistics session so emphasize the fact that
in case of crashes,format upgrades or any situation of statistics reset, the
queryid for the same queries will also change.

I'm afraid that this behavior narrows down the use case of queryid very much.
For example, since the queryid of the same query would not be the same in
the master and the standby servers, we cannot associate those two statistics
by using the queryid. The queryid changes through the crash recovery, so
we cannot associate the query statistics generated before the crash with that
generated after the crash recovery even if the query is the same.

This is not directly related to the patch itself, but why does the queryid
need to be calculated based on also the "statistics session"?

Will update documentation
clearly explain the term statistics session in this context

Yep, that's helpful!

Regards,

--
Fujii Masao

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#22samthakur74
samthakur74@gmail.com
In reply to: Fujii Masao (#21)
Re: pg_stat_statements: calls under-estimation propagation

On Thu, Sep 19, 2013 at 11:32 AM, Fujii Masao-2 [via PostgreSQL] <
ml-node+s1045698n5771565h39@n5.nabble.com> wrote:

On Thu, Sep 19, 2013 at 2:25 PM, samthakur74 <[hidden email]<http://user/SendEmail.jtp?type=node&amp;node=5771565&amp;i=0&gt;&gt;
wrote:

I got the segmentation fault when I tested the case where the
least-executed
query statistics is discarded, i.e., when I executed different queries

more

than
pg_stat_statements.max times. I guess that the patch might have a bug.

Thanks, will try to fix it.

pg_stat_statements--1.1.sql should be removed.

Yes will do that

+      <entry><structfield>queryid</structfield></entry>
+      <entry><type>bigint</type></entry>
+      <entry></entry>
+      <entry>Unique value of each representative statement for the
current statistics session.
+       This value will change for each new statistics

session.</entry>

What does "statistics session" mean?

The time period when statistics are gathered by statistics collector
without being reset. So the statistics session continues across normal
shutdowns, but in case of abnormal situations like crashes, format

upgrades

or statistics being reset for any other reason, a new time period of
statistics collection starts i.e. a new statistics session. The queryid
value generation is linked to statistics session so emphasize the fact

that

in case of crashes,format upgrades or any situation of statistics

reset, the

queryid for the same queries will also change.

I'm afraid that this behavior narrows down the use case of queryid very

much.

For example, since the queryid of the same query would not be the same in
the master and the standby servers, we cannot associate those two

statistics

by using the queryid. The queryid changes through the crash recovery, so
we cannot associate the query statistics generated before the crash with

that

generated after the crash recovery even if the query is the same.

Yes, these are limitations in this approach. The other approaches

suggested included
1. Expose query id hash value as is, in the view, but document the fact
that it will be unstable between releases
2. Expose query id hash value via an undocumented function and let more
expert users decided if they want to use it.

The approach of using statistics session id to generate queryid is a
compromise between not exposing it at all and exposing it without warning
the users of unstable hash value from query tree between releases.

This is not directly related to the patch itself, but why does the

queryid

need to be calculated based on also the "statistics session"?

If we expose hash value of query tree, without using statistics session,
it is possible that users might make wrong assumption that this value
remains stable across version upgrades, when in reality it depends on
whether the version has make changes to query tree internals. So to
explicitly ensure that users do not make this wrong assumption, hash value
generation use statistics session id, which is newly created under
situations described above.

Will update documentation
clearly explain the term statistics session in this context

Yep, that's helpful!

Regards,
Sameer

--
View this message in context: http://postgresql.1045698.n5.nabble.com/pg-stat-statements-calls-under-estimation-propagation-tp5738128p5771701.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.

#23Daniel Farina
daniel@fdr.io
In reply to: samthakur74 (#13)
1 attachment(s)
Re: pg_stat_statements: calls under-estimation propagation

On Sat, Sep 14, 2013 at 11:54 PM, samthakur74 <samthakur74@gmail.com> wrote:

You have added this email to the commit fest, but it contains no patch.

Please add the email with the actual patch.

I hope its attached now!

Maybe the author should be

given a chance to update the patches, though, because they are quite
old.

I did connect with Daniel and he did have some improvement ideas. I am
not sure when they could be implemented. Since we have a interest in the
current version of the patch, which needed documentation, i tried to
complete that.
Thank you,
Sameer

Hello,

With regard to the improvements mentioned:

So I took a second look at this to hack on.

I think the n-call underestimation propagation may not be quite precise for
various detailed reasons (having to do with 'sticky' queries) and to make
it precise is probably more work than it's worth. And, on more reflection,
I'm also having a hard time imaging people intuiting that value usefully.
So, here's a version removing that. This is my way of saying that I think
this feature idea of mine is not good, even in spite of the loss of being
able to see when queries bounce in and out of the session. A
non-cumulative diff vs. v4 to speed review is affixed to the bottom of this
mail.

I think a more generally useful approach would be to spiritually re-cast
ncall under-estimation (as "(re)introduced to session time") and and the
session-id as timestamps ("session started"). I think this is prettier,
has more prior art in Postgres, and more useful to most people.

A small spanner in the works is being sensitive to binaries with
non-integral timestamp representation in the statistics file. Any
suggestions there?

The appeal of the randomized session-id is that one can catenate the output
of views from multiple servers directly together without creating
ambiguity, but I have come to think anyone doing such an advanced use case
ought to seed in some of that server information onto the timestamp itself,
and for most inspections the time of the current running statistics session
is probably more useful.

Delta between v4 and v5 begins.

*** a/contrib/pg_stat_statements/pg_stat_statements--1.1--1.2.sql
--- b/contrib/pg_stat_statements/pg_stat_statements--1.1--1.2.sql
***************
*** 19,25 **** CREATE FUNCTION pg_stat_statements(
      OUT query text,
      OUT query_id int4,
      OUT calls int8,
-     OUT calls_underest int8,
      OUT total_time float8,
      OUT rows int8,
      OUT shared_blks_hit int8,
--- 19,24 ----
*** a/contrib/pg_stat_statements/pg_stat_statements--1.2.sql
--- b/contrib/pg_stat_statements/pg_stat_statements--1.2.sql
***************
*** 16,22 **** CREATE FUNCTION pg_stat_statements(
      OUT query text,
      OUT query_id int8,
      OUT calls int8,
-     OUT calls_underest int8,
      OUT total_time float8,
      OUT rows int8,
      OUT shared_blks_hit int8,
--- 16,21 ----
*** a/contrib/pg_stat_statements/pg_stat_statements.c
--- b/contrib/pg_stat_statements/pg_stat_statements.c
***************
*** 68,74 **** PG_MODULE_MAGIC;
  #define PGSS_DUMP_FILE "global/pg_stat_statements.stat"

/* This constant defines the magic number in the stats file header */
! static const uint32 PGSS_FILE_HEADER = 0x20121231;

  /* XXX: Should USAGE_EXEC reflect execution time and/or buffer usage? */
  #define USAGE_EXEC(duration) (1.0)
--- 68,74 ----
  #define PGSS_DUMP_FILE "global/pg_stat_statements.stat"

/* This constant defines the magic number in the stats file header */
! static const uint32 PGSS_FILE_HEADER = 0x20130820;

  /* XXX: Should USAGE_EXEC reflect execution time and/or buffer usage? */
  #define USAGE_EXEC(duration) (1.0)
***************
*** 116,122 **** typedef enum pgssTupVersion
  typedef struct Counters
  {
  int64 calls; /* # of times executed */
- int64 calls_underest; /* max underestimation of # of executions */
  double total_time; /* total execution time, in msec */
  int64 rows; /* total # of retrieved or affected rows */
  int64 shared_blks_hit; /* # of shared buffer hits */
--- 116,121 ----
***************
*** 156,170 **** typedef struct pgssEntry
  typedef struct pgssSharedState
  {
  LWLockId lock; /* protects hashtable search/modification */
-
- /*
-  * cache of maximum calls-counter underestimation in hashtab
-  *
-  * Only accessed and changed along with the hash table, so also protected
-  * by 'lock'.
-  */
- int64 calls_max_underest;
-
  int query_size; /* max query length in bytes */
  double cur_median_usage; /* current median usage in hashtable */
--- 155,160 ----
***************
*** 505,511 **** pgss_shmem_startup(void)
  {
  /* First time through ... */
  pgss->lock = LWLockAssign();
- pgss->calls_max_underest = 0;
  pgss->query_size = pgstat_track_activity_query_size;
  pgss->cur_median_usage = ASSUMED_MEDIAN_INIT;
  }
--- 495,500 ----
***************
*** 560,570 **** pgss_shmem_startup(void)
  header != PGSS_FILE_HEADER)
  goto error;
- /* Restore under-estimation state */
- if (fread(&pgss->calls_max_underest,
-   sizeof pgss->calls_max_underest, 1, file) != 1)
- goto error;
-
  /* Restore saved session key, if possible. */
  if (fread(&pgss->stat_session_key,
    sizeof pgss->stat_session_key, 1, file) != 1)
--- 549,554 ----
***************
*** 685,695 **** pgss_shmem_shutdown(int code, Datum arg)
  if (fwrite(&PGSS_FILE_HEADER, sizeof(uint32), 1, file) != 1)
  goto error;
- /* Save under-estimation state */
- if (fwrite(&pgss->calls_max_underest,
-    sizeof pgss->calls_max_underest, 1, file) != 1)
- goto error;
-
  /* Save stat session key. */
  if (fwrite(&pgss->stat_session_key,
     sizeof pgss->stat_session_key, 1, file) != 1)
--- 669,674 ----
***************
*** 1164,1170 **** pgss_store(const char *query, uint32 queryId,
  e->counters.usage = USAGE_INIT;
  e->counters.calls += 1;
- e->counters.calls_underest = pgss->calls_max_underest;
  e->counters.total_time += total_time;
  e->counters.rows += rows;
  e->counters.shared_blks_hit += bufusage->shared_blks_hit;
--- 1143,1148 ----
***************
*** 1207,1213 **** pg_stat_statements_reset(PG_FUNCTION_ARGS)

#define PG_STAT_STATEMENTS_COLS_V1_0 14
#define PG_STAT_STATEMENTS_COLS_V1_1 18
! #define PG_STAT_STATEMENTS_COLS 21

  /*
   * Retrieve statement statistics.
--- 1185,1191 ----

#define PG_STAT_STATEMENTS_COLS_V1_0 14
#define PG_STAT_STATEMENTS_COLS_V1_1 18
! #define PG_STAT_STATEMENTS_COLS 20

/*
* Retrieve statement statistics.
***************
*** 1349,1356 **** pg_stat_statements(PG_FUNCTION_ARGS)
}

  values[i++] = Int64GetDatumFast(tmp.calls);
- if (detected_version >= PGSS_TUP_LATEST)
- values[i++] = Int64GetDatumFast(tmp.calls_underest);
  values[i++] = Float8GetDatumFast(tmp.total_time);
  values[i++] = Int64GetDatumFast(tmp.rows);
  values[i++] = Int64GetDatumFast(tmp.shared_blks_hit);
--- 1327,1332 ----
***************
*** 1491,1499 **** entry_cmp(const void *lhs, const void *rhs)
  /*
   * Deallocate least used entries.
   * Caller must hold an exclusive lock on pgss->lock.
-  *
-  * Also increases the underestimation maximum in pgss as a side
-  * effect, if necessary.
   */
  static void
  entry_dealloc(void)
--- 1467,1472 ----
***************
*** 1536,1548 **** entry_dealloc(void)

for (i = 0; i < nvictims; i++)
{
- const Counters *cur_counts = &entry->counters;
- int64 cur_underest;
-
- /* Update global calls estimation state, if necessary. */
- cur_underest = cur_counts->calls + cur_counts->calls_underest;
- pgss->calls_max_underest = Max(pgss->calls_max_underest, cur_underest);
-
hash_search(pgss_hash, &entries[i]->key, HASH_REMOVE, NULL);
}

--- 1509,1514 ----

Attachments:

pg_stat_statements-identification-v5.patchapplication/octet-stream; name=pg_stat_statements-identification-v5.patchDownload
*** a/contrib/pg_stat_statements/Makefile
--- b/contrib/pg_stat_statements/Makefile
***************
*** 5,10 **** OBJS = pg_stat_statements.o
--- 5,11 ----
  
  EXTENSION = pg_stat_statements
  DATA = pg_stat_statements--1.1.sql pg_stat_statements--1.0--1.1.sql \
+ 	pg_stat_statements--1.2.sql pg_stat_statements--1.1--1.2.sql \
  	pg_stat_statements--unpackaged--1.0.sql
  
  ifdef USE_PGXS
*** /dev/null
--- b/contrib/pg_stat_statements/pg_stat_statements--1.1--1.2.sql
***************
*** 0 ****
--- 1,44 ----
+ /* contrib/pg_stat_statements/pg_stat_statements--1.1--1.2.sql */
+ 
+ -- complain if script is sourced in psql, rather than via ALTER EXTENSION
+ \echo Use "ALTER EXTENSION pg_stat_statements UPDATE" to load this file. \quit
+ 
+ /* First we have to remove them from the extension */
+ ALTER EXTENSION pg_stat_statements DROP VIEW pg_stat_statements;
+ ALTER EXTENSION pg_stat_statements DROP FUNCTION pg_stat_statements();
+ 
+ /* Then we can drop them */
+ DROP VIEW pg_stat_statements;
+ DROP FUNCTION pg_stat_statements();
+ 
+ /* Now redefine */
+ CREATE FUNCTION pg_stat_statements(
+     OUT userid oid,
+     OUT dbid oid,
+     OUT stat_session_id int8,
+     OUT query text,
+     OUT query_id int4,
+     OUT calls int8,
+     OUT total_time float8,
+     OUT rows int8,
+     OUT shared_blks_hit int8,
+     OUT shared_blks_read int8,
+     OUT shared_blks_dirtied int8,
+     OUT shared_blks_written int8,
+     OUT local_blks_hit int8,
+     OUT local_blks_read int8,
+     OUT local_blks_dirtied int8,
+     OUT local_blks_written int8,
+     OUT temp_blks_read int8,
+     OUT temp_blks_written int8,
+     OUT blk_read_time float8,
+     OUT blk_write_time float8
+ )
+ RETURNS SETOF record
+ AS 'MODULE_PATHNAME'
+ LANGUAGE C;
+ 
+ CREATE VIEW pg_stat_statements AS
+   SELECT * FROM pg_stat_statements();
+ 
+ GRANT SELECT ON pg_stat_statements TO PUBLIC;
*** /dev/null
--- b/contrib/pg_stat_statements/pg_stat_statements--1.2.sql
***************
*** 0 ****
--- 1,45 ----
+ /* contrib/pg_stat_statements/pg_stat_statements--1.2.sql */
+ 
+ -- complain if script is sourced in psql, rather than via CREATE EXTENSION
+ \echo Use "CREATE EXTENSION pg_stat_statements" to load this file. \quit
+ 
+ -- Register functions.
+ CREATE FUNCTION pg_stat_statements_reset()
+ RETURNS void
+ AS 'MODULE_PATHNAME'
+ LANGUAGE C;
+ 
+ CREATE FUNCTION pg_stat_statements(
+     OUT userid oid,
+     OUT dbid oid,
+     OUT stat_session_id int4,
+     OUT query text,
+     OUT query_id int8,
+     OUT calls int8,
+     OUT total_time float8,
+     OUT rows int8,
+     OUT shared_blks_hit int8,
+     OUT shared_blks_read int8,
+     OUT shared_blks_dirtied int8,
+     OUT shared_blks_written int8,
+     OUT local_blks_hit int8,
+     OUT local_blks_read int8,
+     OUT local_blks_dirtied int8,
+     OUT local_blks_written int8,
+     OUT temp_blks_read int8,
+     OUT temp_blks_written int8,
+     OUT blk_read_time float8,
+     OUT blk_write_time float8
+ )
+ RETURNS SETOF record
+ AS 'MODULE_PATHNAME'
+ LANGUAGE C;
+ 
+ -- Register a view on the function for ease of use.
+ CREATE VIEW pg_stat_statements AS
+   SELECT * FROM pg_stat_statements();
+ 
+ GRANT SELECT ON pg_stat_statements TO PUBLIC;
+ 
+ -- Don't want this to be available to non-superusers.
+ REVOKE ALL ON FUNCTION pg_stat_statements_reset() FROM PUBLIC;
*** a/contrib/pg_stat_statements/pg_stat_statements.c
--- b/contrib/pg_stat_statements/pg_stat_statements.c
***************
*** 43,48 ****
--- 43,49 ----
   */
  #include "postgres.h"
  
+ #include <time.h>
  #include <unistd.h>
  
  #include "access/hash.h"
***************
*** 67,73 **** PG_MODULE_MAGIC;
  #define PGSS_DUMP_FILE	"global/pg_stat_statements.stat"
  
  /* This constant defines the magic number in the stats file header */
! static const uint32 PGSS_FILE_HEADER = 0x20120328;
  
  /* XXX: Should USAGE_EXEC reflect execution time and/or buffer usage? */
  #define USAGE_EXEC(duration)	(1.0)
--- 68,74 ----
  #define PGSS_DUMP_FILE	"global/pg_stat_statements.stat"
  
  /* This constant defines the magic number in the stats file header */
! static const uint32 PGSS_FILE_HEADER = 0x20130820;
  
  /* XXX: Should USAGE_EXEC reflect execution time and/or buffer usage? */
  #define USAGE_EXEC(duration)	(1.0)
***************
*** 96,101 **** typedef struct pgssHashKey
--- 97,116 ----
  } pgssHashKey;
  
  /*
+  * Identifies the tuple format detected by pg_stat_statements.
+  *
+  * Used to identify features of newer formats and enable smooth
+  * upgrades: one can install a new pg_stat_statements binary while
+  * running with the old SQL function definitions.
+  */
+ typedef enum pgssTupVersion
+ {
+ 	PGSS_TUP_V1_0 = 1,
+ 	PGSS_TUP_V1_1,
+ 	PGSS_TUP_LATEST
+ } pgssTupVersion;
+ 
+ /*
   * The actual stats counters kept within pgssEntry.
   */
  typedef struct Counters
***************
*** 128,133 **** typedef struct pgssEntry
--- 143,149 ----
  	pgssHashKey key;			/* hash key of entry - MUST BE FIRST */
  	Counters	counters;		/* the statistics for this query */
  	int			query_len;		/* # of valid bytes in query string */
+ 	uint32		query_id;		/* jumble value for this entry */
  	slock_t		mutex;			/* protects the counters only */
  	char		query[1];		/* VARIABLE LENGTH ARRAY - MUST BE LAST */
  	/* Note: the allocated length of query[] is actually pgss->query_size */
***************
*** 141,146 **** typedef struct pgssSharedState
--- 157,171 ----
  	LWLockId	lock;			/* protects hashtable search/modification */
  	int			query_size;		/* max query length in bytes */
  	double		cur_median_usage;		/* current median usage in hashtable */
+ 
+ 	/*
+ 	 * Is set to a new random value when the server restarts or stats
+ 	 * are reset, to de-confuse cases of non-monotonic counters (as
+ 	 * they had reasons to decrease) for tools that aggregate
+ 	 * snapshots of pg_stat_statements.
+ 	 */
+ 	int32		stat_session_key;
+ 	uint64		private_stat_session_key;
  } pgssSharedState;
  
  /*
***************
*** 192,197 **** static ProcessUtility_hook_type prev_ProcessUtility = NULL;
--- 217,226 ----
  static pgssSharedState *pgss = NULL;
  static HTAB *pgss_hash = NULL;
  
+ /* Random number seeding */
+ static unsigned int random_seed = 0;
+ static struct timeval random_start_time;
+ 
  /*---- GUC variables ----*/
  
  typedef enum
***************
*** 230,235 **** Datum		pg_stat_statements(PG_FUNCTION_ARGS);
--- 259,266 ----
  PG_FUNCTION_INFO_V1(pg_stat_statements_reset);
  PG_FUNCTION_INFO_V1(pg_stat_statements);
  
+ static long pgssRandom(void);
+ static void pgss_new_stat_session(pgssSharedState *state);
  static void pgss_shmem_startup(void);
  static void pgss_shmem_shutdown(int code, Datum arg);
  static void pgss_post_parse_analyze(ParseState *pstate, Query *query);
***************
*** 251,257 **** static void pgss_store(const char *query, uint32 queryId,
  		   pgssJumbleState *jstate);
  static Size pgss_memsize(void);
  static pgssEntry *entry_alloc(pgssHashKey *key, const char *query,
! 			int query_len, bool sticky);
  static void entry_dealloc(void);
  static void entry_reset(void);
  static void AppendJumble(pgssJumbleState *jstate,
--- 282,288 ----
  		   pgssJumbleState *jstate);
  static Size pgss_memsize(void);
  static pgssEntry *entry_alloc(pgssHashKey *key, const char *query,
! 							  int query_len, uint32 query_id, bool sticky);
  static void entry_dealloc(void);
  static void entry_reset(void);
  static void AppendJumble(pgssJumbleState *jstate,
***************
*** 291,297 **** _PG_init(void)
  							NULL,
  							&pgss_max,
  							1000,
! 							100,
  							INT_MAX,
  							PGC_POSTMASTER,
  							0,
--- 322,328 ----
  							NULL,
  							&pgss_max,
  							1000,
! 							1,
  							INT_MAX,
  							PGC_POSTMASTER,
  							0,
***************
*** 379,384 **** _PG_fini(void)
--- 410,463 ----
  }
  
  /*
+  * Random numbers -- ported from PostmasterRandom.
+  */
+ static long
+ pgssRandom(void)
+ {
+ 	/*
+ 	 * Select a random seed if one hasn't been selected before.
+ 	 */
+ 	if (random_seed == 0)
+ 	{
+ 		do
+ 		{
+ 			struct timeval random_stop_time;
+ 
+ 			gettimeofday(&random_stop_time, NULL);
+ 
+ 			/*
+ 			 * We are not sure how much precision is in tv_usec, so we swap
+ 			 * the high and low 16 bits of 'random_stop_time' and XOR them
+ 			 * with 'random_start_time'. On the off chance that the result is
+ 			 * 0, we loop until it isn't.
+ 			 */
+ 			random_seed = random_start_time.tv_usec ^
+ 				((random_stop_time.tv_usec << 16) |
+ 				 ((random_stop_time.tv_usec >> 16) & 0xffff));
+ 		}
+ 		while (random_seed == 0);
+ 
+ 		srandom(random_seed);
+ 	}
+ 
+ 	return random();
+ }
+ 
+ /*
+  * Create a new, random stat session key.
+  */
+ static void
+ pgss_new_stat_session(pgssSharedState *state)
+ {
+ 	state->stat_session_key = (int32) pgssRandom();
+ 
+ 	/* Generate 64-bit private session */
+ 	state->private_stat_session_key	 = (uint64) pgssRandom();
+ 	state->private_stat_session_key |= ((uint64) pgssRandom()) << 32;
+ }
+ 
+ /*
   * shmem_startup hook: allocate or attach to shared memory,
   * then load any pre-existing statistics from file.
   */
***************
*** 394,399 **** pgss_shmem_startup(void)
--- 473,479 ----
  	int			query_size;
  	int			buffer_size;
  	char	   *buffer = NULL;
+ 	bool 		log_cannot_read = true;
  
  	if (prev_shmem_startup_hook)
  		prev_shmem_startup_hook();
***************
*** 456,471 **** pgss_shmem_startup(void)
  	if (file == NULL)
  	{
  		if (errno == ENOENT)
! 			return;				/* ignore not-found error */
  		goto error;
  	}
  
  	buffer_size = query_size;
  	buffer = (char *) palloc(buffer_size);
  
  	if (fread(&header, sizeof(uint32), 1, file) != 1 ||
! 		header != PGSS_FILE_HEADER ||
! 		fread(&num, sizeof(int32), 1, file) != 1)
  		goto error;
  
  	for (i = 0; i < num; i++)
--- 536,566 ----
  	if (file == NULL)
  	{
  		if (errno == ENOENT)
! 			log_cannot_read = false;
! 
  		goto error;
  	}
  
  	buffer_size = query_size;
  	buffer = (char *) palloc(buffer_size);
  
+ 	/* Check header existence and magic number match. */
  	if (fread(&header, sizeof(uint32), 1, file) != 1 ||
! 		header != PGSS_FILE_HEADER)
! 		goto error;
! 
! 	/* Restore saved session key, if possible. */
! 	if (fread(&pgss->stat_session_key,
! 			  sizeof pgss->stat_session_key, 1, file) != 1)
! 		goto error;
! 
! 	/* Restore private session key */
! 	if (fread(&pgss->private_stat_session_key,
! 			  sizeof pgss->private_stat_session_key, 1, file) != 1)
! 		goto error;
! 
! 	/* Read how many table entries there are. */
! 	if (fread(&num, sizeof(int32), 1, file) != 1)
  		goto error;
  
  	for (i = 0; i < num; i++)
***************
*** 503,509 **** pgss_shmem_startup(void)
  												   query_size - 1);
  
  		/* make the hashtable entry (discards old entries if too many) */
! 		entry = entry_alloc(&temp.key, buffer, temp.query_len, false);
  
  		/* copy in the actual stats */
  		entry->counters = temp.counters;
--- 598,605 ----
  												   query_size - 1);
  
  		/* make the hashtable entry (discards old entries if too many) */
! 		entry = entry_alloc(&temp.key, buffer, temp.query_len, temp.query_id,
! 							false);
  
  		/* copy in the actual stats */
  		entry->counters = temp.counters;
***************
*** 521,530 **** pgss_shmem_startup(void)
  	return;
  
  error:
! 	ereport(LOG,
! 			(errcode_for_file_access(),
! 			 errmsg("could not read pg_stat_statement file \"%s\": %m",
! 					PGSS_DUMP_FILE)));
  	if (buffer)
  		pfree(buffer);
  	if (file)
--- 617,632 ----
  	return;
  
  error:
! 	/* Ignore not-found error. */
! 	if (log_cannot_read)
! 		ereport(LOG,
! 				(errcode_for_file_access(),
! 				 errmsg("could not read pg_stat_statement file \"%s\": %m",
! 						PGSS_DUMP_FILE)));
! 
! 	/* Need a new session if none could be read from the file */
! 	pgss_new_stat_session(pgss);
! 
  	if (buffer)
  		pfree(buffer);
  	if (file)
***************
*** 563,570 **** pgss_shmem_shutdown(int code, Datum arg)
--- 665,685 ----
  	if (file == NULL)
  		goto error;
  
+ 	/* Save header/magic number.  */
  	if (fwrite(&PGSS_FILE_HEADER, sizeof(uint32), 1, file) != 1)
  		goto error;
+ 
+ 	/* Save stat session key. */
+ 	if (fwrite(&pgss->stat_session_key,
+ 			   sizeof pgss->stat_session_key, 1, file) != 1)
+ 			goto error;
+ 
+ 	/* Save private session key */
+ 	if (fwrite(&pgss->private_stat_session_key,
+ 			   sizeof pgss->private_stat_session_key, 1, file) != 1)
+ 			goto error;
+ 
+ 	/* Write how many table entries there are. */
  	num_entries = hash_get_num_entries(pgss_hash);
  	if (fwrite(&num_entries, sizeof(int32), 1, file) != 1)
  		goto error;
***************
*** 991,997 **** pgss_store(const char *query, uint32 queryId,
  			/* Acquire exclusive lock as required by entry_alloc() */
  			LWLockAcquire(pgss->lock, LW_EXCLUSIVE);
  
! 			entry = entry_alloc(&key, norm_query, query_len, true);
  		}
  		else
  		{
--- 1106,1112 ----
  			/* Acquire exclusive lock as required by entry_alloc() */
  			LWLockAcquire(pgss->lock, LW_EXCLUSIVE);
  
! 			entry = entry_alloc(&key, norm_query, query_len, queryId, true);
  		}
  		else
  		{
***************
*** 1008,1014 **** pgss_store(const char *query, uint32 queryId,
  			/* Acquire exclusive lock as required by entry_alloc() */
  			LWLockAcquire(pgss->lock, LW_EXCLUSIVE);
  
! 			entry = entry_alloc(&key, query, query_len, false);
  		}
  	}
  
--- 1123,1129 ----
  			/* Acquire exclusive lock as required by entry_alloc() */
  			LWLockAcquire(pgss->lock, LW_EXCLUSIVE);
  
! 			entry = entry_alloc(&key, query, query_len, queryId, false);
  		}
  	}
  
***************
*** 1069,1075 **** pg_stat_statements_reset(PG_FUNCTION_ARGS)
  }
  
  #define PG_STAT_STATEMENTS_COLS_V1_0	14
! #define PG_STAT_STATEMENTS_COLS			18
  
  /*
   * Retrieve statement statistics.
--- 1184,1191 ----
  }
  
  #define PG_STAT_STATEMENTS_COLS_V1_0	14
! #define PG_STAT_STATEMENTS_COLS_V1_1	18
! #define PG_STAT_STATEMENTS_COLS			20
  
  /*
   * Retrieve statement statistics.
***************
*** 1086,1092 **** pg_stat_statements(PG_FUNCTION_ARGS)
  	bool		is_superuser = superuser();
  	HASH_SEQ_STATUS hash_seq;
  	pgssEntry  *entry;
! 	bool		sql_supports_v1_1_counters = true;
  
  	if (!pgss || !pgss_hash)
  		ereport(ERROR,
--- 1202,1208 ----
  	bool		is_superuser = superuser();
  	HASH_SEQ_STATUS hash_seq;
  	pgssEntry  *entry;
! 	pgssTupVersion detected_version;
  
  	if (!pgss || !pgss_hash)
  		ereport(ERROR,
***************
*** 1107,1114 **** pg_stat_statements(PG_FUNCTION_ARGS)
  	/* Build a tuple descriptor for our result type */
  	if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE)
  		elog(ERROR, "return type must be a row type");
  	if (tupdesc->natts == PG_STAT_STATEMENTS_COLS_V1_0)
! 		sql_supports_v1_1_counters = false;
  
  	per_query_ctx = rsinfo->econtext->ecxt_per_query_memory;
  	oldcontext = MemoryContextSwitchTo(per_query_ctx);
--- 1223,1250 ----
  	/* Build a tuple descriptor for our result type */
  	if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE)
  		elog(ERROR, "return type must be a row type");
+ 
+ 	/* Perform version detection */
  	if (tupdesc->natts == PG_STAT_STATEMENTS_COLS_V1_0)
! 		detected_version = PGSS_TUP_V1_0;
! 	else if (tupdesc->natts == PG_STAT_STATEMENTS_COLS_V1_1)
! 		detected_version = PGSS_TUP_V1_1;
! 	else if (tupdesc->natts == PG_STAT_STATEMENTS_COLS)
! 		detected_version = PGSS_TUP_LATEST;
! 	else
! 	{
! 		/*
! 		 * Couldn't identify the tuple format.  Raise error.
! 		 *
! 		 * This is an exceptional case that may only happen in bizarre
! 		 * situations, since it is thought that every released version
! 		 * of pg_stat_statements has a matching schema.
! 		 */
! 		ereport(ERROR,
! 				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
! 				 errmsg("pg_stat_statements schema is not supported "
! 						"by its installed binary")));
! 	}
  
  	per_query_ctx = rsinfo->econtext->ecxt_per_query_memory;
  	oldcontext = MemoryContextSwitchTo(per_query_ctx);
***************
*** 1135,1140 **** pg_stat_statements(PG_FUNCTION_ARGS)
--- 1271,1277 ----
  
  		values[i++] = ObjectIdGetDatum(entry->key.userid);
  		values[i++] = ObjectIdGetDatum(entry->key.dbid);
+ 		values[i++] = DatumGetInt32(pgss->stat_session_key);
  
  		if (is_superuser || entry->key.userid == userid)
  		{
***************
*** 1150,1156 **** pg_stat_statements(PG_FUNCTION_ARGS)
  				pfree(qstr);
  		}
  		else
! 			values[i++] = CStringGetTextDatum("<insufficient privilege>");
  
  		/* copy counters to a local variable to keep locking time short */
  		{
--- 1287,1307 ----
  				pfree(qstr);
  		}
  		else
! 		{
! 			/*
! 			 * The role calling this function is unable to see
! 			 * sensitive aspects of this tuple.
! 			 *
! 			 * Nullify everything except the "insufficient privilege"
! 			 * message for this entry
! 			 */
! 			memset(nulls, 1, sizeof nulls);
! 
! 			nulls[i]  = 0;
! 			values[i] = CStringGetTextDatum("<insufficient privilege>");
! 
! 			i += 1;
! 		}
  
  		/* copy counters to a local variable to keep locking time short */
  		{
***************
*** 1165,1193 **** pg_stat_statements(PG_FUNCTION_ARGS)
  		if (tmp.calls == 0)
  			continue;
  
  		values[i++] = Int64GetDatumFast(tmp.calls);
  		values[i++] = Float8GetDatumFast(tmp.total_time);
  		values[i++] = Int64GetDatumFast(tmp.rows);
  		values[i++] = Int64GetDatumFast(tmp.shared_blks_hit);
  		values[i++] = Int64GetDatumFast(tmp.shared_blks_read);
! 		if (sql_supports_v1_1_counters)
  			values[i++] = Int64GetDatumFast(tmp.shared_blks_dirtied);
  		values[i++] = Int64GetDatumFast(tmp.shared_blks_written);
  		values[i++] = Int64GetDatumFast(tmp.local_blks_hit);
  		values[i++] = Int64GetDatumFast(tmp.local_blks_read);
! 		if (sql_supports_v1_1_counters)
  			values[i++] = Int64GetDatumFast(tmp.local_blks_dirtied);
  		values[i++] = Int64GetDatumFast(tmp.local_blks_written);
  		values[i++] = Int64GetDatumFast(tmp.temp_blks_read);
  		values[i++] = Int64GetDatumFast(tmp.temp_blks_written);
! 		if (sql_supports_v1_1_counters)
  		{
  			values[i++] = Float8GetDatumFast(tmp.blk_read_time);
  			values[i++] = Float8GetDatumFast(tmp.blk_write_time);
  		}
  
! 		Assert(i == (sql_supports_v1_1_counters ?
! 					 PG_STAT_STATEMENTS_COLS : PG_STAT_STATEMENTS_COLS_V1_0));
  
  		tuplestore_putvalues(tupstore, tupdesc, values, nulls);
  	}
--- 1316,1369 ----
  		if (tmp.calls == 0)
  			continue;
  
+ 		if (detected_version >= PGSS_TUP_LATEST)
+ 		{
+ 			uint64 qid = pgss->private_stat_session_key;
+ 
+ 			qid ^= (uint64) entry->query_id;
+ 			qid ^= ((uint64) entry->query_id) << 32;
+ 
+ 			values[i++] = Int64GetDatumFast(qid);
+ 		}
+ 
  		values[i++] = Int64GetDatumFast(tmp.calls);
  		values[i++] = Float8GetDatumFast(tmp.total_time);
  		values[i++] = Int64GetDatumFast(tmp.rows);
  		values[i++] = Int64GetDatumFast(tmp.shared_blks_hit);
  		values[i++] = Int64GetDatumFast(tmp.shared_blks_read);
! 		if (detected_version >= PGSS_TUP_V1_1)
  			values[i++] = Int64GetDatumFast(tmp.shared_blks_dirtied);
  		values[i++] = Int64GetDatumFast(tmp.shared_blks_written);
  		values[i++] = Int64GetDatumFast(tmp.local_blks_hit);
  		values[i++] = Int64GetDatumFast(tmp.local_blks_read);
! 		if (detected_version >= PGSS_TUP_V1_1)
  			values[i++] = Int64GetDatumFast(tmp.local_blks_dirtied);
  		values[i++] = Int64GetDatumFast(tmp.local_blks_written);
  		values[i++] = Int64GetDatumFast(tmp.temp_blks_read);
  		values[i++] = Int64GetDatumFast(tmp.temp_blks_written);
! 		if (detected_version >= PGSS_TUP_V1_1)
  		{
  			values[i++] = Float8GetDatumFast(tmp.blk_read_time);
  			values[i++] = Float8GetDatumFast(tmp.blk_write_time);
  		}
  
! #ifdef USE_ASSERT_CHECKING
! 		/* Check that every column appears to be filled */
! 		switch (detected_version)
! 		{
! 				case PGSS_TUP_V1_0:
! 						Assert(i == PG_STAT_STATEMENTS_COLS_V1_0);
! 						break;
! 				case PGSS_TUP_V1_1:
! 						Assert(i == PG_STAT_STATEMENTS_COLS_V1_1);
! 						break;
! 				case PGSS_TUP_LATEST:
! 						Assert(i == PG_STAT_STATEMENTS_COLS);
! 						break;
! 				default:
! 						Assert(false);
! 		}
! #endif
  
  		tuplestore_putvalues(tupstore, tupdesc, values, nulls);
  	}
***************
*** 1234,1240 **** pgss_memsize(void)
   * have made the entry while we waited to get exclusive lock.
   */
  static pgssEntry *
! entry_alloc(pgssHashKey *key, const char *query, int query_len, bool sticky)
  {
  	pgssEntry  *entry;
  	bool		found;
--- 1410,1417 ----
   * have made the entry while we waited to get exclusive lock.
   */
  static pgssEntry *
! entry_alloc(pgssHashKey *key, const char *query, int query_len,
! 			uint32 query_id, bool sticky)
  {
  	pgssEntry  *entry;
  	bool		found;
***************
*** 1254,1259 **** entry_alloc(pgssHashKey *key, const char *query, int query_len, bool sticky)
--- 1431,1437 ----
  		memset(&entry->counters, 0, sizeof(Counters));
  		/* set the appropriate initial usage count */
  		entry->counters.usage = sticky ? pgss->cur_median_usage : USAGE_INIT;
+ 
  		/* re-initialize the mutex each time ... we assume no one using it */
  		SpinLockInit(&entry->mutex);
  		/* ... and don't forget the query text */
***************
*** 1261,1266 **** entry_alloc(pgssHashKey *key, const char *query, int query_len, bool sticky)
--- 1439,1447 ----
  		entry->query_len = query_len;
  		memcpy(entry->query, query, query_len);
  		entry->query[query_len] = '\0';
+ 
+ 		/* Copy in the query id for reporting */
+ 		entry->query_id = query_id;
  	}
  
  	return entry;
***************
*** 1309,1314 **** entry_dealloc(void)
--- 1490,1496 ----
  	while ((entry = hash_seq_search(&hash_seq)) != NULL)
  	{
  		entries[i++] = entry;
+ 
  		/* "Sticky" entries get a different usage decay rate. */
  		if (entry->counters.calls == 0)
  			entry->counters.usage *= STICKY_DECREASE_FACTOR;
***************
*** 1350,1355 **** entry_reset(void)
--- 1532,1543 ----
  		hash_search(pgss_hash, &entry->key, HASH_REMOVE, NULL);
  	}
  
+ 	/*
+ 	 * Counters all reset, so need to generate a new identity for the
+ 	 * session.
+ 	 */
+ 	pgss_new_stat_session(pgss);
+ 
  	LWLockRelease(pgss->lock);
  }
  
*** a/contrib/pg_stat_statements/pg_stat_statements.control
--- b/contrib/pg_stat_statements/pg_stat_statements.control
***************
*** 1,5 ****
  # pg_stat_statements extension
  comment = 'track execution statistics of all SQL statements executed'
! default_version = '1.1'
  module_pathname = '$libdir/pg_stat_statements'
  relocatable = true
--- 1,5 ----
  # pg_stat_statements extension
  comment = 'track execution statistics of all SQL statements executed'
! default_version = '1.2'
  module_pathname = '$libdir/pg_stat_statements'
  relocatable = true
*** a/doc/src/sgml/pgstatstatements.sgml
--- b/doc/src/sgml/pgstatstatements.sgml
***************
*** 57,76 ****
        <entry>OID of database in which the statement was executed</entry>
       </row>
  
!     <row>
        <entry><structfield>query</structfield></entry>
        <entry><type>text</type></entry>
        <entry></entry>
        <entry>Text of a representative statement (up to <xref linkend="guc-track-activity-query-size"> bytes)</entry>
       </row>
! 
       <row>
        <entry><structfield>calls</structfield></entry>
        <entry><type>bigint</type></entry>
        <entry></entry>
        <entry>Number of times executed</entry>
       </row>
! 
       <row>
        <entry><structfield>total_time</structfield></entry>
        <entry><type>double precision</type></entry>
--- 57,87 ----
        <entry>OID of database in which the statement was executed</entry>
       </row>
  
!      <row>
        <entry><structfield>query</structfield></entry>
        <entry><type>text</type></entry>
        <entry></entry>
        <entry>Text of a representative statement (up to <xref linkend="guc-track-activity-query-size"> bytes)</entry>
       </row>
!      <row>
!       <entry><structfield>queryid</structfield></entry>
!       <entry><type>bigint</type></entry>
!       <entry></entry>
!       <entry>Unique value of each representative statement for the current statistics session.
!        This value will change for each new statistics session.</entry>
!      </row>
       <row>
        <entry><structfield>calls</structfield></entry>
        <entry><type>bigint</type></entry>
        <entry></entry>
        <entry>Number of times executed</entry>
       </row>
!      <row>
!       <entry><structfield>calls_underest</structfield></entry>
!       <entry><type>bigint</type></entry>
!       <entry></entry>
!       <entry>Number indicating if the statement statistics has a probability of being erroneous.</entry>
!      </row>
       <row>
        <entry><structfield>total_time</structfield></entry>
        <entry><type>double precision</type></entry>
***************
*** 228,233 ****
--- 239,254 ----
     might appear as separate entries, if they have different meanings as a
     result of factors such as different <varname>search_path</> settings.
    </para>
+ 
+   <para>
+   There is a possibility that,for queries which are tracked intermittently,could have their
+   statistics silently reset while query is not being tracked.The <varname>calls_underest</>
+   field gives an indication that a query has been tracked intermittently and consequently
+   has a probability of erroneous statistics. Queries tracked constantly will have a zero value for
+   <varname>calls_underest</> indicating zero probability of erroneous statistics.
+   A greater value of <varname>calls_underest</> indicates greater degree of intermittent tracking which in
+   turn means greater probability of erroneous statistics.
+   </para>
   </sect2>
  
   <sect2>
#24Daniel Farina
daniel@fdr.io
In reply to: Daniel Farina (#23)
1 attachment(s)
Re: pg_stat_statements: calls under-estimation propagation

On Fri, Sep 20, 2013 at 1:11 AM, Daniel Farina <daniel@fdr.io> wrote:

I think the n-call underestimation propagation may not be quite precise for
various detailed reasons (having to do with 'sticky' queries) and to make it
precise is probably more work than it's worth. And, on more reflection, I'm
also having a hard time imaging people intuiting that value usefully. So,
here's a version removing that.

I forgot about removal of the relevant SGML, amended here in v6.

Attachments:

pg_stat_statements-identification-v6.patchapplication/octet-stream; name=pg_stat_statements-identification-v6.patchDownload
*** a/contrib/pg_stat_statements/Makefile
--- b/contrib/pg_stat_statements/Makefile
***************
*** 5,10 **** OBJS = pg_stat_statements.o
--- 5,11 ----
  
  EXTENSION = pg_stat_statements
  DATA = pg_stat_statements--1.1.sql pg_stat_statements--1.0--1.1.sql \
+ 	pg_stat_statements--1.2.sql pg_stat_statements--1.1--1.2.sql \
  	pg_stat_statements--unpackaged--1.0.sql
  
  ifdef USE_PGXS
*** /dev/null
--- b/contrib/pg_stat_statements/pg_stat_statements--1.1--1.2.sql
***************
*** 0 ****
--- 1,44 ----
+ /* contrib/pg_stat_statements/pg_stat_statements--1.1--1.2.sql */
+ 
+ -- complain if script is sourced in psql, rather than via ALTER EXTENSION
+ \echo Use "ALTER EXTENSION pg_stat_statements UPDATE" to load this file. \quit
+ 
+ /* First we have to remove them from the extension */
+ ALTER EXTENSION pg_stat_statements DROP VIEW pg_stat_statements;
+ ALTER EXTENSION pg_stat_statements DROP FUNCTION pg_stat_statements();
+ 
+ /* Then we can drop them */
+ DROP VIEW pg_stat_statements;
+ DROP FUNCTION pg_stat_statements();
+ 
+ /* Now redefine */
+ CREATE FUNCTION pg_stat_statements(
+     OUT userid oid,
+     OUT dbid oid,
+     OUT stat_session_id int8,
+     OUT query text,
+     OUT query_id int4,
+     OUT calls int8,
+     OUT total_time float8,
+     OUT rows int8,
+     OUT shared_blks_hit int8,
+     OUT shared_blks_read int8,
+     OUT shared_blks_dirtied int8,
+     OUT shared_blks_written int8,
+     OUT local_blks_hit int8,
+     OUT local_blks_read int8,
+     OUT local_blks_dirtied int8,
+     OUT local_blks_written int8,
+     OUT temp_blks_read int8,
+     OUT temp_blks_written int8,
+     OUT blk_read_time float8,
+     OUT blk_write_time float8
+ )
+ RETURNS SETOF record
+ AS 'MODULE_PATHNAME'
+ LANGUAGE C;
+ 
+ CREATE VIEW pg_stat_statements AS
+   SELECT * FROM pg_stat_statements();
+ 
+ GRANT SELECT ON pg_stat_statements TO PUBLIC;
*** /dev/null
--- b/contrib/pg_stat_statements/pg_stat_statements--1.2.sql
***************
*** 0 ****
--- 1,45 ----
+ /* contrib/pg_stat_statements/pg_stat_statements--1.2.sql */
+ 
+ -- complain if script is sourced in psql, rather than via CREATE EXTENSION
+ \echo Use "CREATE EXTENSION pg_stat_statements" to load this file. \quit
+ 
+ -- Register functions.
+ CREATE FUNCTION pg_stat_statements_reset()
+ RETURNS void
+ AS 'MODULE_PATHNAME'
+ LANGUAGE C;
+ 
+ CREATE FUNCTION pg_stat_statements(
+     OUT userid oid,
+     OUT dbid oid,
+     OUT stat_session_id int4,
+     OUT query text,
+     OUT query_id int8,
+     OUT calls int8,
+     OUT total_time float8,
+     OUT rows int8,
+     OUT shared_blks_hit int8,
+     OUT shared_blks_read int8,
+     OUT shared_blks_dirtied int8,
+     OUT shared_blks_written int8,
+     OUT local_blks_hit int8,
+     OUT local_blks_read int8,
+     OUT local_blks_dirtied int8,
+     OUT local_blks_written int8,
+     OUT temp_blks_read int8,
+     OUT temp_blks_written int8,
+     OUT blk_read_time float8,
+     OUT blk_write_time float8
+ )
+ RETURNS SETOF record
+ AS 'MODULE_PATHNAME'
+ LANGUAGE C;
+ 
+ -- Register a view on the function for ease of use.
+ CREATE VIEW pg_stat_statements AS
+   SELECT * FROM pg_stat_statements();
+ 
+ GRANT SELECT ON pg_stat_statements TO PUBLIC;
+ 
+ -- Don't want this to be available to non-superusers.
+ REVOKE ALL ON FUNCTION pg_stat_statements_reset() FROM PUBLIC;
*** a/contrib/pg_stat_statements/pg_stat_statements.c
--- b/contrib/pg_stat_statements/pg_stat_statements.c
***************
*** 43,48 ****
--- 43,49 ----
   */
  #include "postgres.h"
  
+ #include <time.h>
  #include <unistd.h>
  
  #include "access/hash.h"
***************
*** 67,73 **** PG_MODULE_MAGIC;
  #define PGSS_DUMP_FILE	"global/pg_stat_statements.stat"
  
  /* This constant defines the magic number in the stats file header */
! static const uint32 PGSS_FILE_HEADER = 0x20120328;
  
  /* XXX: Should USAGE_EXEC reflect execution time and/or buffer usage? */
  #define USAGE_EXEC(duration)	(1.0)
--- 68,74 ----
  #define PGSS_DUMP_FILE	"global/pg_stat_statements.stat"
  
  /* This constant defines the magic number in the stats file header */
! static const uint32 PGSS_FILE_HEADER = 0x20130820;
  
  /* XXX: Should USAGE_EXEC reflect execution time and/or buffer usage? */
  #define USAGE_EXEC(duration)	(1.0)
***************
*** 96,101 **** typedef struct pgssHashKey
--- 97,116 ----
  } pgssHashKey;
  
  /*
+  * Identifies the tuple format detected by pg_stat_statements.
+  *
+  * Used to identify features of newer formats and enable smooth
+  * upgrades: one can install a new pg_stat_statements binary while
+  * running with the old SQL function definitions.
+  */
+ typedef enum pgssTupVersion
+ {
+ 	PGSS_TUP_V1_0 = 1,
+ 	PGSS_TUP_V1_1,
+ 	PGSS_TUP_LATEST
+ } pgssTupVersion;
+ 
+ /*
   * The actual stats counters kept within pgssEntry.
   */
  typedef struct Counters
***************
*** 128,133 **** typedef struct pgssEntry
--- 143,149 ----
  	pgssHashKey key;			/* hash key of entry - MUST BE FIRST */
  	Counters	counters;		/* the statistics for this query */
  	int			query_len;		/* # of valid bytes in query string */
+ 	uint32		query_id;		/* jumble value for this entry */
  	slock_t		mutex;			/* protects the counters only */
  	char		query[1];		/* VARIABLE LENGTH ARRAY - MUST BE LAST */
  	/* Note: the allocated length of query[] is actually pgss->query_size */
***************
*** 141,146 **** typedef struct pgssSharedState
--- 157,171 ----
  	LWLockId	lock;			/* protects hashtable search/modification */
  	int			query_size;		/* max query length in bytes */
  	double		cur_median_usage;		/* current median usage in hashtable */
+ 
+ 	/*
+ 	 * Is set to a new random value when the server restarts or stats
+ 	 * are reset, to de-confuse cases of non-monotonic counters (as
+ 	 * they had reasons to decrease) for tools that aggregate
+ 	 * snapshots of pg_stat_statements.
+ 	 */
+ 	int32		stat_session_key;
+ 	uint64		private_stat_session_key;
  } pgssSharedState;
  
  /*
***************
*** 192,197 **** static ProcessUtility_hook_type prev_ProcessUtility = NULL;
--- 217,226 ----
  static pgssSharedState *pgss = NULL;
  static HTAB *pgss_hash = NULL;
  
+ /* Random number seeding */
+ static unsigned int random_seed = 0;
+ static struct timeval random_start_time;
+ 
  /*---- GUC variables ----*/
  
  typedef enum
***************
*** 230,235 **** Datum		pg_stat_statements(PG_FUNCTION_ARGS);
--- 259,266 ----
  PG_FUNCTION_INFO_V1(pg_stat_statements_reset);
  PG_FUNCTION_INFO_V1(pg_stat_statements);
  
+ static long pgssRandom(void);
+ static void pgss_new_stat_session(pgssSharedState *state);
  static void pgss_shmem_startup(void);
  static void pgss_shmem_shutdown(int code, Datum arg);
  static void pgss_post_parse_analyze(ParseState *pstate, Query *query);
***************
*** 251,257 **** static void pgss_store(const char *query, uint32 queryId,
  		   pgssJumbleState *jstate);
  static Size pgss_memsize(void);
  static pgssEntry *entry_alloc(pgssHashKey *key, const char *query,
! 			int query_len, bool sticky);
  static void entry_dealloc(void);
  static void entry_reset(void);
  static void AppendJumble(pgssJumbleState *jstate,
--- 282,288 ----
  		   pgssJumbleState *jstate);
  static Size pgss_memsize(void);
  static pgssEntry *entry_alloc(pgssHashKey *key, const char *query,
! 							  int query_len, uint32 query_id, bool sticky);
  static void entry_dealloc(void);
  static void entry_reset(void);
  static void AppendJumble(pgssJumbleState *jstate,
***************
*** 291,297 **** _PG_init(void)
  							NULL,
  							&pgss_max,
  							1000,
! 							100,
  							INT_MAX,
  							PGC_POSTMASTER,
  							0,
--- 322,328 ----
  							NULL,
  							&pgss_max,
  							1000,
! 							1,
  							INT_MAX,
  							PGC_POSTMASTER,
  							0,
***************
*** 379,384 **** _PG_fini(void)
--- 410,463 ----
  }
  
  /*
+  * Random numbers -- ported from PostmasterRandom.
+  */
+ static long
+ pgssRandom(void)
+ {
+ 	/*
+ 	 * Select a random seed if one hasn't been selected before.
+ 	 */
+ 	if (random_seed == 0)
+ 	{
+ 		do
+ 		{
+ 			struct timeval random_stop_time;
+ 
+ 			gettimeofday(&random_stop_time, NULL);
+ 
+ 			/*
+ 			 * We are not sure how much precision is in tv_usec, so we swap
+ 			 * the high and low 16 bits of 'random_stop_time' and XOR them
+ 			 * with 'random_start_time'. On the off chance that the result is
+ 			 * 0, we loop until it isn't.
+ 			 */
+ 			random_seed = random_start_time.tv_usec ^
+ 				((random_stop_time.tv_usec << 16) |
+ 				 ((random_stop_time.tv_usec >> 16) & 0xffff));
+ 		}
+ 		while (random_seed == 0);
+ 
+ 		srandom(random_seed);
+ 	}
+ 
+ 	return random();
+ }
+ 
+ /*
+  * Create a new, random stat session key.
+  */
+ static void
+ pgss_new_stat_session(pgssSharedState *state)
+ {
+ 	state->stat_session_key = (int32) pgssRandom();
+ 
+ 	/* Generate 64-bit private session */
+ 	state->private_stat_session_key	 = (uint64) pgssRandom();
+ 	state->private_stat_session_key |= ((uint64) pgssRandom()) << 32;
+ }
+ 
+ /*
   * shmem_startup hook: allocate or attach to shared memory,
   * then load any pre-existing statistics from file.
   */
***************
*** 394,399 **** pgss_shmem_startup(void)
--- 473,479 ----
  	int			query_size;
  	int			buffer_size;
  	char	   *buffer = NULL;
+ 	bool 		log_cannot_read = true;
  
  	if (prev_shmem_startup_hook)
  		prev_shmem_startup_hook();
***************
*** 456,471 **** pgss_shmem_startup(void)
  	if (file == NULL)
  	{
  		if (errno == ENOENT)
! 			return;				/* ignore not-found error */
  		goto error;
  	}
  
  	buffer_size = query_size;
  	buffer = (char *) palloc(buffer_size);
  
  	if (fread(&header, sizeof(uint32), 1, file) != 1 ||
! 		header != PGSS_FILE_HEADER ||
! 		fread(&num, sizeof(int32), 1, file) != 1)
  		goto error;
  
  	for (i = 0; i < num; i++)
--- 536,566 ----
  	if (file == NULL)
  	{
  		if (errno == ENOENT)
! 			log_cannot_read = false;
! 
  		goto error;
  	}
  
  	buffer_size = query_size;
  	buffer = (char *) palloc(buffer_size);
  
+ 	/* Check header existence and magic number match. */
  	if (fread(&header, sizeof(uint32), 1, file) != 1 ||
! 		header != PGSS_FILE_HEADER)
! 		goto error;
! 
! 	/* Restore saved session key, if possible. */
! 	if (fread(&pgss->stat_session_key,
! 			  sizeof pgss->stat_session_key, 1, file) != 1)
! 		goto error;
! 
! 	/* Restore private session key */
! 	if (fread(&pgss->private_stat_session_key,
! 			  sizeof pgss->private_stat_session_key, 1, file) != 1)
! 		goto error;
! 
! 	/* Read how many table entries there are. */
! 	if (fread(&num, sizeof(int32), 1, file) != 1)
  		goto error;
  
  	for (i = 0; i < num; i++)
***************
*** 503,509 **** pgss_shmem_startup(void)
  												   query_size - 1);
  
  		/* make the hashtable entry (discards old entries if too many) */
! 		entry = entry_alloc(&temp.key, buffer, temp.query_len, false);
  
  		/* copy in the actual stats */
  		entry->counters = temp.counters;
--- 598,605 ----
  												   query_size - 1);
  
  		/* make the hashtable entry (discards old entries if too many) */
! 		entry = entry_alloc(&temp.key, buffer, temp.query_len, temp.query_id,
! 							false);
  
  		/* copy in the actual stats */
  		entry->counters = temp.counters;
***************
*** 521,530 **** pgss_shmem_startup(void)
  	return;
  
  error:
! 	ereport(LOG,
! 			(errcode_for_file_access(),
! 			 errmsg("could not read pg_stat_statement file \"%s\": %m",
! 					PGSS_DUMP_FILE)));
  	if (buffer)
  		pfree(buffer);
  	if (file)
--- 617,632 ----
  	return;
  
  error:
! 	/* Ignore not-found error. */
! 	if (log_cannot_read)
! 		ereport(LOG,
! 				(errcode_for_file_access(),
! 				 errmsg("could not read pg_stat_statement file \"%s\": %m",
! 						PGSS_DUMP_FILE)));
! 
! 	/* Need a new session if none could be read from the file */
! 	pgss_new_stat_session(pgss);
! 
  	if (buffer)
  		pfree(buffer);
  	if (file)
***************
*** 563,570 **** pgss_shmem_shutdown(int code, Datum arg)
--- 665,685 ----
  	if (file == NULL)
  		goto error;
  
+ 	/* Save header/magic number.  */
  	if (fwrite(&PGSS_FILE_HEADER, sizeof(uint32), 1, file) != 1)
  		goto error;
+ 
+ 	/* Save stat session key. */
+ 	if (fwrite(&pgss->stat_session_key,
+ 			   sizeof pgss->stat_session_key, 1, file) != 1)
+ 			goto error;
+ 
+ 	/* Save private session key */
+ 	if (fwrite(&pgss->private_stat_session_key,
+ 			   sizeof pgss->private_stat_session_key, 1, file) != 1)
+ 			goto error;
+ 
+ 	/* Write how many table entries there are. */
  	num_entries = hash_get_num_entries(pgss_hash);
  	if (fwrite(&num_entries, sizeof(int32), 1, file) != 1)
  		goto error;
***************
*** 991,997 **** pgss_store(const char *query, uint32 queryId,
  			/* Acquire exclusive lock as required by entry_alloc() */
  			LWLockAcquire(pgss->lock, LW_EXCLUSIVE);
  
! 			entry = entry_alloc(&key, norm_query, query_len, true);
  		}
  		else
  		{
--- 1106,1112 ----
  			/* Acquire exclusive lock as required by entry_alloc() */
  			LWLockAcquire(pgss->lock, LW_EXCLUSIVE);
  
! 			entry = entry_alloc(&key, norm_query, query_len, queryId, true);
  		}
  		else
  		{
***************
*** 1008,1014 **** pgss_store(const char *query, uint32 queryId,
  			/* Acquire exclusive lock as required by entry_alloc() */
  			LWLockAcquire(pgss->lock, LW_EXCLUSIVE);
  
! 			entry = entry_alloc(&key, query, query_len, false);
  		}
  	}
  
--- 1123,1129 ----
  			/* Acquire exclusive lock as required by entry_alloc() */
  			LWLockAcquire(pgss->lock, LW_EXCLUSIVE);
  
! 			entry = entry_alloc(&key, query, query_len, queryId, false);
  		}
  	}
  
***************
*** 1069,1075 **** pg_stat_statements_reset(PG_FUNCTION_ARGS)
  }
  
  #define PG_STAT_STATEMENTS_COLS_V1_0	14
! #define PG_STAT_STATEMENTS_COLS			18
  
  /*
   * Retrieve statement statistics.
--- 1184,1191 ----
  }
  
  #define PG_STAT_STATEMENTS_COLS_V1_0	14
! #define PG_STAT_STATEMENTS_COLS_V1_1	18
! #define PG_STAT_STATEMENTS_COLS			20
  
  /*
   * Retrieve statement statistics.
***************
*** 1086,1092 **** pg_stat_statements(PG_FUNCTION_ARGS)
  	bool		is_superuser = superuser();
  	HASH_SEQ_STATUS hash_seq;
  	pgssEntry  *entry;
! 	bool		sql_supports_v1_1_counters = true;
  
  	if (!pgss || !pgss_hash)
  		ereport(ERROR,
--- 1202,1208 ----
  	bool		is_superuser = superuser();
  	HASH_SEQ_STATUS hash_seq;
  	pgssEntry  *entry;
! 	pgssTupVersion detected_version;
  
  	if (!pgss || !pgss_hash)
  		ereport(ERROR,
***************
*** 1107,1114 **** pg_stat_statements(PG_FUNCTION_ARGS)
  	/* Build a tuple descriptor for our result type */
  	if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE)
  		elog(ERROR, "return type must be a row type");
  	if (tupdesc->natts == PG_STAT_STATEMENTS_COLS_V1_0)
! 		sql_supports_v1_1_counters = false;
  
  	per_query_ctx = rsinfo->econtext->ecxt_per_query_memory;
  	oldcontext = MemoryContextSwitchTo(per_query_ctx);
--- 1223,1250 ----
  	/* Build a tuple descriptor for our result type */
  	if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE)
  		elog(ERROR, "return type must be a row type");
+ 
+ 	/* Perform version detection */
  	if (tupdesc->natts == PG_STAT_STATEMENTS_COLS_V1_0)
! 		detected_version = PGSS_TUP_V1_0;
! 	else if (tupdesc->natts == PG_STAT_STATEMENTS_COLS_V1_1)
! 		detected_version = PGSS_TUP_V1_1;
! 	else if (tupdesc->natts == PG_STAT_STATEMENTS_COLS)
! 		detected_version = PGSS_TUP_LATEST;
! 	else
! 	{
! 		/*
! 		 * Couldn't identify the tuple format.  Raise error.
! 		 *
! 		 * This is an exceptional case that may only happen in bizarre
! 		 * situations, since it is thought that every released version
! 		 * of pg_stat_statements has a matching schema.
! 		 */
! 		ereport(ERROR,
! 				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
! 				 errmsg("pg_stat_statements schema is not supported "
! 						"by its installed binary")));
! 	}
  
  	per_query_ctx = rsinfo->econtext->ecxt_per_query_memory;
  	oldcontext = MemoryContextSwitchTo(per_query_ctx);
***************
*** 1135,1140 **** pg_stat_statements(PG_FUNCTION_ARGS)
--- 1271,1277 ----
  
  		values[i++] = ObjectIdGetDatum(entry->key.userid);
  		values[i++] = ObjectIdGetDatum(entry->key.dbid);
+ 		values[i++] = DatumGetInt32(pgss->stat_session_key);
  
  		if (is_superuser || entry->key.userid == userid)
  		{
***************
*** 1150,1156 **** pg_stat_statements(PG_FUNCTION_ARGS)
  				pfree(qstr);
  		}
  		else
! 			values[i++] = CStringGetTextDatum("<insufficient privilege>");
  
  		/* copy counters to a local variable to keep locking time short */
  		{
--- 1287,1307 ----
  				pfree(qstr);
  		}
  		else
! 		{
! 			/*
! 			 * The role calling this function is unable to see
! 			 * sensitive aspects of this tuple.
! 			 *
! 			 * Nullify everything except the "insufficient privilege"
! 			 * message for this entry
! 			 */
! 			memset(nulls, 1, sizeof nulls);
! 
! 			nulls[i]  = 0;
! 			values[i] = CStringGetTextDatum("<insufficient privilege>");
! 
! 			i += 1;
! 		}
  
  		/* copy counters to a local variable to keep locking time short */
  		{
***************
*** 1165,1193 **** pg_stat_statements(PG_FUNCTION_ARGS)
  		if (tmp.calls == 0)
  			continue;
  
  		values[i++] = Int64GetDatumFast(tmp.calls);
  		values[i++] = Float8GetDatumFast(tmp.total_time);
  		values[i++] = Int64GetDatumFast(tmp.rows);
  		values[i++] = Int64GetDatumFast(tmp.shared_blks_hit);
  		values[i++] = Int64GetDatumFast(tmp.shared_blks_read);
! 		if (sql_supports_v1_1_counters)
  			values[i++] = Int64GetDatumFast(tmp.shared_blks_dirtied);
  		values[i++] = Int64GetDatumFast(tmp.shared_blks_written);
  		values[i++] = Int64GetDatumFast(tmp.local_blks_hit);
  		values[i++] = Int64GetDatumFast(tmp.local_blks_read);
! 		if (sql_supports_v1_1_counters)
  			values[i++] = Int64GetDatumFast(tmp.local_blks_dirtied);
  		values[i++] = Int64GetDatumFast(tmp.local_blks_written);
  		values[i++] = Int64GetDatumFast(tmp.temp_blks_read);
  		values[i++] = Int64GetDatumFast(tmp.temp_blks_written);
! 		if (sql_supports_v1_1_counters)
  		{
  			values[i++] = Float8GetDatumFast(tmp.blk_read_time);
  			values[i++] = Float8GetDatumFast(tmp.blk_write_time);
  		}
  
! 		Assert(i == (sql_supports_v1_1_counters ?
! 					 PG_STAT_STATEMENTS_COLS : PG_STAT_STATEMENTS_COLS_V1_0));
  
  		tuplestore_putvalues(tupstore, tupdesc, values, nulls);
  	}
--- 1316,1369 ----
  		if (tmp.calls == 0)
  			continue;
  
+ 		if (detected_version >= PGSS_TUP_LATEST)
+ 		{
+ 			uint64 qid = pgss->private_stat_session_key;
+ 
+ 			qid ^= (uint64) entry->query_id;
+ 			qid ^= ((uint64) entry->query_id) << 32;
+ 
+ 			values[i++] = Int64GetDatumFast(qid);
+ 		}
+ 
  		values[i++] = Int64GetDatumFast(tmp.calls);
  		values[i++] = Float8GetDatumFast(tmp.total_time);
  		values[i++] = Int64GetDatumFast(tmp.rows);
  		values[i++] = Int64GetDatumFast(tmp.shared_blks_hit);
  		values[i++] = Int64GetDatumFast(tmp.shared_blks_read);
! 		if (detected_version >= PGSS_TUP_V1_1)
  			values[i++] = Int64GetDatumFast(tmp.shared_blks_dirtied);
  		values[i++] = Int64GetDatumFast(tmp.shared_blks_written);
  		values[i++] = Int64GetDatumFast(tmp.local_blks_hit);
  		values[i++] = Int64GetDatumFast(tmp.local_blks_read);
! 		if (detected_version >= PGSS_TUP_V1_1)
  			values[i++] = Int64GetDatumFast(tmp.local_blks_dirtied);
  		values[i++] = Int64GetDatumFast(tmp.local_blks_written);
  		values[i++] = Int64GetDatumFast(tmp.temp_blks_read);
  		values[i++] = Int64GetDatumFast(tmp.temp_blks_written);
! 		if (detected_version >= PGSS_TUP_V1_1)
  		{
  			values[i++] = Float8GetDatumFast(tmp.blk_read_time);
  			values[i++] = Float8GetDatumFast(tmp.blk_write_time);
  		}
  
! #ifdef USE_ASSERT_CHECKING
! 		/* Check that every column appears to be filled */
! 		switch (detected_version)
! 		{
! 				case PGSS_TUP_V1_0:
! 						Assert(i == PG_STAT_STATEMENTS_COLS_V1_0);
! 						break;
! 				case PGSS_TUP_V1_1:
! 						Assert(i == PG_STAT_STATEMENTS_COLS_V1_1);
! 						break;
! 				case PGSS_TUP_LATEST:
! 						Assert(i == PG_STAT_STATEMENTS_COLS);
! 						break;
! 				default:
! 						Assert(false);
! 		}
! #endif
  
  		tuplestore_putvalues(tupstore, tupdesc, values, nulls);
  	}
***************
*** 1234,1240 **** pgss_memsize(void)
   * have made the entry while we waited to get exclusive lock.
   */
  static pgssEntry *
! entry_alloc(pgssHashKey *key, const char *query, int query_len, bool sticky)
  {
  	pgssEntry  *entry;
  	bool		found;
--- 1410,1417 ----
   * have made the entry while we waited to get exclusive lock.
   */
  static pgssEntry *
! entry_alloc(pgssHashKey *key, const char *query, int query_len,
! 			uint32 query_id, bool sticky)
  {
  	pgssEntry  *entry;
  	bool		found;
***************
*** 1254,1259 **** entry_alloc(pgssHashKey *key, const char *query, int query_len, bool sticky)
--- 1431,1437 ----
  		memset(&entry->counters, 0, sizeof(Counters));
  		/* set the appropriate initial usage count */
  		entry->counters.usage = sticky ? pgss->cur_median_usage : USAGE_INIT;
+ 
  		/* re-initialize the mutex each time ... we assume no one using it */
  		SpinLockInit(&entry->mutex);
  		/* ... and don't forget the query text */
***************
*** 1261,1266 **** entry_alloc(pgssHashKey *key, const char *query, int query_len, bool sticky)
--- 1439,1447 ----
  		entry->query_len = query_len;
  		memcpy(entry->query, query, query_len);
  		entry->query[query_len] = '\0';
+ 
+ 		/* Copy in the query id for reporting */
+ 		entry->query_id = query_id;
  	}
  
  	return entry;
***************
*** 1309,1314 **** entry_dealloc(void)
--- 1490,1496 ----
  	while ((entry = hash_seq_search(&hash_seq)) != NULL)
  	{
  		entries[i++] = entry;
+ 
  		/* "Sticky" entries get a different usage decay rate. */
  		if (entry->counters.calls == 0)
  			entry->counters.usage *= STICKY_DECREASE_FACTOR;
***************
*** 1350,1355 **** entry_reset(void)
--- 1532,1543 ----
  		hash_search(pgss_hash, &entry->key, HASH_REMOVE, NULL);
  	}
  
+ 	/*
+ 	 * Counters all reset, so need to generate a new identity for the
+ 	 * session.
+ 	 */
+ 	pgss_new_stat_session(pgss);
+ 
  	LWLockRelease(pgss->lock);
  }
  
*** a/contrib/pg_stat_statements/pg_stat_statements.control
--- b/contrib/pg_stat_statements/pg_stat_statements.control
***************
*** 1,5 ****
  # pg_stat_statements extension
  comment = 'track execution statistics of all SQL statements executed'
! default_version = '1.1'
  module_pathname = '$libdir/pg_stat_statements'
  relocatable = true
--- 1,5 ----
  # pg_stat_statements extension
  comment = 'track execution statistics of all SQL statements executed'
! default_version = '1.2'
  module_pathname = '$libdir/pg_stat_statements'
  relocatable = true
*** a/doc/src/sgml/pgstatstatements.sgml
--- b/doc/src/sgml/pgstatstatements.sgml
***************
*** 57,76 ****
        <entry>OID of database in which the statement was executed</entry>
       </row>
  
!     <row>
        <entry><structfield>query</structfield></entry>
        <entry><type>text</type></entry>
        <entry></entry>
        <entry>Text of a representative statement (up to <xref linkend="guc-track-activity-query-size"> bytes)</entry>
       </row>
! 
       <row>
        <entry><structfield>calls</structfield></entry>
        <entry><type>bigint</type></entry>
        <entry></entry>
        <entry>Number of times executed</entry>
       </row>
- 
       <row>
        <entry><structfield>total_time</structfield></entry>
        <entry><type>double precision</type></entry>
--- 57,81 ----
        <entry>OID of database in which the statement was executed</entry>
       </row>
  
!      <row>
        <entry><structfield>query</structfield></entry>
        <entry><type>text</type></entry>
        <entry></entry>
        <entry>Text of a representative statement (up to <xref linkend="guc-track-activity-query-size"> bytes)</entry>
       </row>
!      <row>
!       <entry><structfield>queryid</structfield></entry>
!       <entry><type>bigint</type></entry>
!       <entry></entry>
!       <entry>Unique value of each representative statement for the current statistics session.
!        This value will change for each new statistics session.</entry>
!      </row>
       <row>
        <entry><structfield>calls</structfield></entry>
        <entry><type>bigint</type></entry>
        <entry></entry>
        <entry>Number of times executed</entry>
       </row>
       <row>
        <entry><structfield>total_time</structfield></entry>
        <entry><type>double precision</type></entry>
#25samthakur74
samthakur74@gmail.com
In reply to: Daniel Farina (#24)
Re: pg_stat_statements: calls under-estimation propagation

I forgot about removal of the relevant SGML, amended here in v6.

Thank you for this!
i did a quick test with following steps:
1. Applied v6 patch
2. make and make install on pg_stat_statements;
3. Restarted Postgres with pg_stat_statements loaded with
pg_stat_statements.max = 4
4. Dropped and created extension pg_stat_statements.

Executed following:
select * from pg_stat_statements_reset();
select * from pgbench_branches ;
select * from pgbench_history ;
select * from pgbench_tellers ;
select * from pgbench_accounts;

I expected 4 rows in pg_stat_statements view for each of 4 queries
above. But i saw just 2 rows.

select * from pg_stat_statements;

userid | dbid | stat_session_id | query | quer
y_id | calls | total_time | rows | shared_blks_hit | shared_blks_read |
shared_blks_dirtied | shared_blks_written | local_blks_hit | local_blks_read | l
ocal_blks_dirtied | local_blks_written | temp_blks_read | temp_blks_written | bl
k_read_time | blk_write_time
--------+-------+-----------------+---------------------------------+-----------
-----------+-------+------------+--------+-----------------+------------------+-
--------------------+---------------------+----------------+-----------------+--
------------------+--------------------+----------------+-------------------+---
------------+----------------
10 | 12900 | 21595345 | select * from pgbench_accounts; | -803800319
3522943111 | 1 | 108.176 | 100000 | 1640 | 0 |
0 | 0 | 0 | 0 |
0 | 0 | 0 | 0 |
0 | 0
10 | 12900 | 21595345 | select * from pgbench_tellers ; | -149722997
7134331757 | 1 | 0.227 | 10 | 1 | 0 |
0 | 0 | 0 | 0 |
0 | 0 | 0 | 0 |
0 | 0
(2 rows)

Am i doing something wrong?

regards
Sameer

--
View this message in context: http://postgresql.1045698.n5.nabble.com/pg-stat-statements-calls-under-estimation-propagation-tp5738128p5771960.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.

#26Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Daniel Farina (#24)
Re: pg_stat_statements: calls under-estimation propagation

Daniel Farina escribi�:

On Fri, Sep 20, 2013 at 1:11 AM, Daniel Farina <daniel@fdr.io> wrote:

I think the n-call underestimation propagation may not be quite precise for
various detailed reasons (having to do with 'sticky' queries) and to make it
precise is probably more work than it's worth. And, on more reflection, I'm
also having a hard time imaging people intuiting that value usefully. So,
here's a version removing that.

I forgot about removal of the relevant SGML, amended here in v6.

Nice.

You need to remove the --1.1.sql entry (the file itself and the DATA
entry in Makefile).

Also, I think it would be good to have a common "fooRandom()" routine,
instead of having a second copy of PostmasterRandom. Might I suggest
putting it in a new file under src/common/.

--
�lvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#27Sameer Thakur
samthakur74@gmail.com
In reply to: samthakur74 (#25)
Re: pg_stat_statements: calls under-estimation propagation

On Mon, Sep 23, 2013 at 1:26 PM, samthakur74 <samthakur74@gmail.com> wrote:

I forgot about removal of the relevant SGML, amended here in v6.

Thank you for this!
i did a quick test with following steps:
1. Applied v6 patch
2. make and make install on pg_stat_statements;
3. Restarted Postgres with pg_stat_statements loaded with
pg_stat_statements.max = 4
4. Dropped and created extension pg_stat_statements.

Executed following:
select * from pg_stat_statements_reset();
select * from pgbench_branches ;
select * from pgbench_history ;
select * from pgbench_tellers ;
select * from pgbench_accounts;

I expected 4 rows in pg_stat_statements view for each of 4 queries
above. But i saw just 2 rows.

select * from pg_stat_statements;

userid | dbid | stat_session_id | query |
quer
y_id | calls | total_time | rows | shared_blks_hit |
shared_blks_read |
shared_blks_dirtied | shared_blks_written | local_blks_hit | local_blks_read
| l
ocal_blks_dirtied | local_blks_written | temp_blks_read | temp_blks_written
| bl
k_read_time | blk_write_time
--------+-------+-----------------+---------------------------------+-----------
-----------+-------+------------+--------+-----------------+------------------+-
--------------------+---------------------+----------------+-----------------+--
------------------+--------------------+----------------+-------------------+---
------------+----------------
10 | 12900 | 21595345 | select * from pgbench_accounts; |
-803800319
3522943111 | 1 | 108.176 | 100000 | 1640 |
0 |
0 | 0 | 0 | 0
|
0 | 0 | 0 | 0
|
0 | 0
10 | 12900 | 21595345 | select * from pgbench_tellers ; |
-149722997
7134331757 | 1 | 0.227 | 10 | 1 |
0 |
0 | 0 | 0 | 0
|
0 | 0 | 0 | 0
|
0 | 0
(2 rows)

Am i doing something wrong?

Yes i was. Just saw a warning when pg_stat_statements is loaded that
valid values for pg_stat_statements.max is between 100 and 2147483647.
Not sure why though.
regards
Sameer

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#28Daniel Farina
daniel@fdr.io
In reply to: Sameer Thakur (#27)
Re: pg_stat_statements: calls under-estimation propagation

On Sun, Sep 29, 2013 at 10:25 AM, Sameer Thakur <samthakur74@gmail.com> wrote:

Yes i was. Just saw a warning when pg_stat_statements is loaded that
valid values for pg_stat_statements.max is between 100 and 2147483647.
Not sure why though.

I remember hacking that out for testing sake.

I can only justify it as a foot-gun to prevent someone from being
stuck restarting the database to get a reasonable number in there.
Let's CC Peter; maybe he can remember some thoughts about that.

Also, for onlookers, I have changed this patch around to do the
date-oriented stuff but want to look it over before stapling it up and
sending it. If one cannot wait, one can look at
https://github.com/fdr/postgres/tree/queryid. The squashed-version of
that history contains a reasonable patch I think, but a re-read often
finds something for me and I've only just completed it yesterday.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#29Sameer Thakur
samthakur74@gmail.com
In reply to: Daniel Farina (#28)
Re: pg_stat_statements: calls under-estimation propagation

Also, for onlookers, I have changed this patch around to do the
date-oriented stuff but want to look it over before stapling it up and
sending it. If one cannot wait, one can look at
https://github.com/fdr/postgres/tree/queryid. The squashed-version of
that history contains a reasonable patch I think, but a re-read often
finds something for me and I've only just completed it yesterday.

I did the following
1. Forked from fdr/postgres
2. cloned branch queryid
3. squashed
22899c802571a57cfaf0df38e6c5c366b5430c74
d813096e29049667151a49fc5e5cf3d6bbe55702
picked
be2671a4a6aa355c5e8ae646210e6c8e0b84ecb5
4. usual make/make install/create extension pg_stat_statements.
(pg_stat_statements.max=100).
5. select * from pg_stat_statements_reset(), select * from pgbench_tellers.
result below:

userid | dbid | session_start | introduced
| query | query_id
| calls | total_time |
rows | shared_blks_hit | shared_blks_read | shared_blks_dirtied |
shared_blks_written | local_blks_hit | local_blks_read |
local_blks_dirtied | local_blks_written | t
emp_blks_read | temp_blks_written | blk_read_time | blk_write_time
--------+-------+----------------------------------+---------------------------+-------------------------------------------+---------------------+-------+------------+
------+-----------------+------------------+---------------------+---------------------+----------------+-----------------+--------------------+--------------------+--
--------------+-------------------+---------------+----------------
10 | 12900 | 2013-09-30 16:55:22.285113+05:30 | 1970-01-01
05:30:00+05:30 | select * from pg_stat_statements_reset(); |
2531907647060518039 | 1 | 0 |
1 | 0 | 0 | 0 |
0 | 0 | 0 |
0 | 0 |
0 | 0 | 0 | 0
10 | 12900 | 2013-09-30 16:55:22.285113+05:30 | 1970-01-01
05:30:00+05:30 | select * from pgbench_tellers ; |
7580333025384382649 | 1 | 0 |
10 | 1 | 0 | 0 |
0 | 0 | 0 |
0 | 0 |
0 | 0 | 0 | 0
(2 rows)

I understand session_start and verified that it changes with each
database restart to reflect current time. I am not sure why introduced
keeps showing the same "1970-01-01 05:30:00+05:30" value. I thought it
reflected the (most recent) time query statements statistics is added
to hashtable. Is this a bug?
Will continue to test and try and understand the code.

regards
Sameer

--
View this message in context: http://postgresql.1045698.n5.nabble.com/pg-stat-statements-calls-under-estimation-propagation-tp5738128p5772841.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.

#30Daniel Farina
daniel@fdr.io
In reply to: Sameer Thakur (#29)
Re: pg_stat_statements: calls under-estimation propagation

On Sep 30, 2013 4:39 AM, "Sameer Thakur" <samthakur74@gmail.com> wrote:

Also, for onlookers, I have changed this patch around to do the
date-oriented stuff but want to look it over before stapling it up and
sending it. If one cannot wait, one can look at
https://github.com/fdr/postgres/tree/queryid. The squashed-version of
that history contains a reasonable patch I think, but a re-read often
finds something for me and I've only just completed it yesterday.

I did the following
1. Forked from fdr/postgres
2. cloned branch queryid
3. squashed
22899c802571a57cfaf0df38e6c5c366b5430c74
d813096e29049667151a49fc5e5cf3d6bbe55702
picked
be2671a4a6aa355c5e8ae646210e6c8e0b84ecb5
4. usual make/make install/create extension pg_stat_statements.
(pg_stat_statements.max=100).
5. select * from pg_stat_statements_reset(), select * from

pgbench_tellers.

result below:

userid | dbid | session_start | introduced
| query | query_id
| calls | total_time |
rows | shared_blks_hit | shared_blks_read | shared_blks_dirtied |
shared_blks_written | local_blks_hit | local_blks_read |
local_blks_dirtied | local_blks_written | t
emp_blks_read | temp_blks_written | blk_read_time | blk_write_time

--------+-------+----------------------------------+---------------------------+-------------------------------------------+---------------------+-------+------------+

------+-----------------+------------------+---------------------+---------------------+----------------+-----------------+--------------------+--------------------+--

--------------+-------------------+---------------+----------------
10 | 12900 | 2013-09-30 16:55:22.285113+05:30 | 1970-01-01
05:30:00+05:30 | select * from pg_stat_statements_reset(); |
2531907647060518039 | 1 | 0 |
1 | 0 | 0 | 0 |
0 | 0 | 0 |
0 | 0 |
0 | 0 | 0 | 0
10 | 12900 | 2013-09-30 16:55:22.285113+05:30 | 1970-01-01
05:30:00+05:30 | select * from pgbench_tellers ; |
7580333025384382649 | 1 | 0 |
10 | 1 | 0 | 0 |
0 | 0 | 0 |
0 | 0 |
0 | 0 | 0 | 0
(2 rows)

I understand session_start and verified that it changes with each
database restart to reflect current time.

It should only restart when the statistics file cannot be loaded.

I am not sure why introduced

keeps showing the same "1970-01-01 05:30:00+05:30" value. I thought it
reflected the (most recent) time query statements statistics is added
to hashtable. Is this a bug?
Will continue to test and try and understand the code.

Yes, a bug. There are a few calls to pgss store and I must be submitting a
zero value for the introduction time in one of those cases.

Heh, I thought that was fixed, but maybe I broke something. Like I said;
preliminary. At the earliest I can look at this Wednesday, but feel free to
amend and resubmit including my changes if you feel inclined and get to it
first.

#31Sameer Thakur
samthakur74@gmail.com
In reply to: Daniel Farina (#30)
Re: pg_stat_statements: calls under-estimation propagation

On Tue, Oct 1, 2013 at 12:48 AM, Daniel Farina-5 [via PostgreSQL]
<ml-node+s1045698n5772887h21@n5.nabble.com> wrote:

On Sep 30, 2013 4:39 AM, "Sameer Thakur" <[hidden email]> wrote:

Also, for onlookers, I have changed this patch around to do the
date-oriented stuff but want to look it over before stapling it up and
sending it. If one cannot wait, one can look at
https://github.com/fdr/postgres/tree/queryid. The squashed-version of
that history contains a reasonable patch I think, but a re-read often
finds something for me and I've only just completed it yesterday.

I did the following
1. Forked from fdr/postgres
2. cloned branch queryid
3. squashed
22899c802571a57cfaf0df38e6c5c366b5430c74
d813096e29049667151a49fc5e5cf3d6bbe55702
picked
be2671a4a6aa355c5e8ae646210e6c8e0b84ecb5
4. usual make/make install/create extension pg_stat_statements.
(pg_stat_statements.max=100).
5. select * from pg_stat_statements_reset(), select * from
pgbench_tellers.
result below:

userid | dbid | session_start | introduced
| query | query_id
| calls | total_time |
rows | shared_blks_hit | shared_blks_read | shared_blks_dirtied |
shared_blks_written | local_blks_hit | local_blks_read |
local_blks_dirtied | local_blks_written | t
emp_blks_read | temp_blks_written | blk_read_time | blk_write_time

--------+-------+----------------------------------+---------------------------+-------------------------------------------+---------------------+-------+------------+

------+-----------------+------------------+---------------------+---------------------+----------------+-----------------+--------------------+--------------------+--
--------------+-------------------+---------------+----------------
10 | 12900 | 2013-09-30 16:55:22.285113+05:30 | 1970-01-01
05:30:00+05:30 | select * from pg_stat_statements_reset(); |
2531907647060518039 | 1 | 0 |
1 | 0 | 0 | 0 |
0 | 0 | 0 |
0 | 0 |
0 | 0 | 0 | 0
10 | 12900 | 2013-09-30 16:55:22.285113+05:30 | 1970-01-01
05:30:00+05:30 | select * from pgbench_tellers ; |
7580333025384382649 | 1 | 0 |
10 | 1 | 0 | 0 |
0 | 0 | 0 |
0 | 0 |
0 | 0 | 0 | 0
(2 rows)

I understand session_start and verified that it changes with each
database restart to reflect current time.

It should only restart when the statistics file cannot be loaded.

I am not sure why introduced

keeps showing the same "1970-01-01 05:30:00+05:30" value. I thought it
reflected the (most recent) time query statements statistics is added
to hashtable. Is this a bug?
Will continue to test and try and understand the code.

Yes, a bug. There are a few calls to pgss store and I must be submitting a
zero value for the introduction time in one of those cases.

Heh, I thought that was fixed, but maybe I broke something. Like I said;
preliminary. At the earliest I can look at this Wednesday, but feel free to
amend and resubmit including my changes if you feel inclined and get to it
first.

In pg_stat_statements.c line 1440
changed
if (instr == NULL)
to
if (instr == NULL || INSTR_TIME_IS_ZERO(instr->starttime))

This seemed to do the trick. I will continue to test some more.
regards
Sameer

--
View this message in context: http://postgresql.1045698.n5.nabble.com/pg-stat-statements-calls-under-estimation-propagation-tp5738128p5772930.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.

#32Sameer Thakur
samthakur74@gmail.com
In reply to: Daniel Farina (#30)
Re: pg_stat_statements: calls under-estimation propagation

On Tue, Oct 1, 2013 at 12:48 AM, Daniel Farina-5 [via PostgreSQL]
<ml-node+s1045698n5772887h21@n5.nabble.com> wrote:

On Sep 30, 2013 4:39 AM, "Sameer Thakur" <[hidden email]> wrote:

Also, for onlookers, I have changed this patch around to do the
date-oriented stuff but want to look it over before stapling it up and
sending it. If one cannot wait, one can look at
https://github.com/fdr/postgres/tree/queryid. The squashed-version of
that history contains a reasonable patch I think, but a re-read often
finds something for me and I've only just completed it yesterday.

I did the following
1. Forked from fdr/postgres
2. cloned branch queryid
3. squashed
22899c802571a57cfaf0df38e6c5c366b5430c74
d813096e29049667151a49fc5e5cf3d6bbe55702
picked
be2671a4a6aa355c5e8ae646210e6c8e0b84ecb5
4. usual make/make install/create extension pg_stat_statements.
(pg_stat_statements.max=100).
5. select * from pg_stat_statements_reset(), select * from
pgbench_tellers.
result below:

userid | dbid | session_start | introduced
| query | query_id
| calls | total_time |
rows | shared_blks_hit | shared_blks_read | shared_blks_dirtied |
shared_blks_written | local_blks_hit | local_blks_read |
local_blks_dirtied | local_blks_written | t
emp_blks_read | temp_blks_written | blk_read_time | blk_write_time

--------+-------+----------------------------------+---------------------------+-------------------------------------------+---------------------+-------+------------+

------+-----------------+------------------+---------------------+---------------------+----------------+-----------------+--------------------+--------------------+--
--------------+-------------------+---------------+----------------
10 | 12900 | 2013-09-30 16:55:22.285113+05:30 | 1970-01-01
05:30:00+05:30 | select * from pg_stat_statements_reset(); |
2531907647060518039 | 1 | 0 |
1 | 0 | 0 | 0 |
0 | 0 | 0 |
0 | 0 |
0 | 0 | 0 | 0
10 | 12900 | 2013-09-30 16:55:22.285113+05:30 | 1970-01-01
05:30:00+05:30 | select * from pgbench_tellers ; |
7580333025384382649 | 1 | 0 |
10 | 1 | 0 | 0 |
0 | 0 | 0 |
0 | 0 |
0 | 0 | 0 | 0
(2 rows)

I understand session_start and verified that it changes with each
database restart to reflect current time.

It should only restart when the statistics file cannot be loaded.

This seems to work fine.
1. Started the instance
2. Executed pg_stat_statements_reset(), select * from
pgbench_history,select* from pgbench_tellers. Got the following in
pg_stat_statements view
userid | dbid | session_start |
introduced | query |
query_id | calls | tota
l_time | rows | shared_blks_hit | shared_blks_read |
shared_blks_dirtied | shared_blks_written | local_blks_hit |
local_blks_read | local_blks_dirtied | local_blks_wri
tten | temp_blks_read | temp_blks_written | blk_read_time | blk_write_time
--------+-------+----------------------------------+----------------------------------+-------------------------------------------+----------------------+-------+-----
-------+------+-----------------+------------------+---------------------+---------------------+----------------+-----------------+--------------------+---------------
-----+----------------+-------------------+---------------+----------------
10 | 12900 | 2013-10-01 17:43:26.667074+05:30 | 2013-10-01
17:43:43.724301+05:30 | select * from pgbench_history; |
-165801328395488047 | 1 |
0 | 0 | 0 | 0 |
0 | 0 | 0 | 0 |
0 |
0 | 0 | 0 | 0 | 0
10 | 12900 | 2013-10-01 17:43:26.667074+05:30 | 2013-10-01
17:43:37.379785+05:30 | select * from pgbench_tellers; |
8376871363863945311 | 1 |
0 | 10 | 0 | 1 |
0 | 0 | 0 | 0 |
0 |
0 | 0 | 0 | 0 | 0
10 | 12900 | 2013-10-01 17:43:26.667074+05:30 | 2013-10-01
17:43:26.667178+05:30 | select * from pg_stat_statements_reset(); |
-1061018443194138344 | 1 |
0 | 1 | 0 | 0 |
0 | 0 | 0 | 0 |
0 |
0 | 0 | 0 | 0 | 0
(3 rows)

Then restarted the server and saw pg_stat_statements view again.

userid | dbid | session_start |
introduced | query |
query_id | calls | tota
l_time | rows | shared_blks_hit | shared_blks_read |
shared_blks_dirtied | shared_blks_written | local_blks_hit |
local_blks_read | local_blks_dirtied | local_blks_wri
tten | temp_blks_read | temp_blks_written | blk_read_time | blk_write_time
--------+-------+----------------------------------+----------------------------------+-------------------------------------------+----------------------+-------+-----
-------+------+-----------------+------------------+---------------------+---------------------+----------------+-----------------+--------------------+---------------
-----+----------------+-------------------+---------------+----------------
10 | 12900 | 2013-10-01 17:43:26.667074+05:30 | 2013-10-01
17:45:15.130261+05:30 | select * from pgbench_history; |
-165801328395488047 | 1 |
0 | 0 | 0 | 0 |
0 | 0 | 0 | 0 |
0 |
0 | 0 | 0 | 0 | 0
10 | 12900 | 2013-10-01 17:43:26.667074+05:30 | 2013-10-01
17:45:15.130266+05:30 | select * from pg_stat_statements ; |
-247576122750898541 | 1 |
0 | 3 | 0 | 0 |
0 | 0 | 0 | 0 |
0 |
0 | 0 | 0 | 0 | 0
10 | 12900 | 2013-10-01 17:43:26.667074+05:30 | 2013-10-01
17:45:15.130271+05:30 | select * from pgbench_tellers; |
8376871363863945311 | 1 |
0 | 10 | 0 | 1 |
0 | 0 | 0 | 0 |
0 |
0 | 0 | 0 | 0 | 0
10 | 12900 | 2013-10-01 17:43:26.667074+05:30 | 2013-10-01
17:45:15.130276+05:30 | select * from pg_stat_statements_reset(); |
-1061018443194138344 | 1 |
0 | 1 | 0 | 0 |
0 | 0 | 0 | 0 |
0 |
0 | 0 | 0 | 0 | 0
(4 rows)

Correctly, session start remains same after restart for all queries
and introduced time differs slightly reflecting re-introduction of
statistics into hashtable after reading from statistics file. Also,
correctly, queryid remains same for all queries.

Now shutdown and delete pg_stat_statements.stat under data/global.
Restart again and check pg_stat_statements view.

userid | dbid | session_start | introduced | query | query_id | calls
| total_time | rows | shared_blks_hit | shared_blks_read |
shared_blks_dirtied | shared_blks_wri
tten | local_blks_hit | local_blks_read | local_blks_dirtied |
local_blks_written | temp_blks_read | temp_blks_written |
blk_read_time | blk_write_time
--------+------+---------------+------------+-------+----------+-------+------------+------+-----------------+------------------+---------------------+----------------
-----+----------------+-----------------+--------------------+--------------------+----------------+-------------------+---------------+----------------
(0 rows)

Correctly it has been reset.

regards
Sameer

--
View this message in context: http://postgresql.1045698.n5.nabble.com/pg-stat-statements-calls-under-estimation-propagation-tp5738128p5772955.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.

#33Daniel Farina
daniel@heroku.com
In reply to: Sameer Thakur (#32)
Re: pg_stat_statements: calls under-estimation propagation

On Tue, Oct 1, 2013 at 5:32 AM, Sameer Thakur <samthakur74@gmail.com> wrote:

On Tue, Oct 1, 2013 at 12:48 AM, Daniel Farina-5 [via PostgreSQL]
<[hidden email]> wrote:

On Sep 30, 2013 4:39 AM, "Sameer Thakur" <[hidden email]> wrote:

Also, for onlookers, I have changed this patch around to do the
date-oriented stuff but want to look it over before stapling it up and
sending it. If one cannot wait, one can look at
https://github.com/fdr/postgres/tree/queryid. The squashed-version of
that history contains a reasonable patch I think, but a re-read often
finds something for me and I've only just completed it yesterday.

I did the following
1. Forked from fdr/postgres
2. cloned branch queryid
3. squashed
22899c802571a57cfaf0df38e6c5c366b5430c74
d813096e29049667151a49fc5e5cf3d6bbe55702
picked
be2671a4a6aa355c5e8ae646210e6c8e0b84ecb5
4. usual make/make install/create extension pg_stat_statements.
(pg_stat_statements.max=100).
5. select * from pg_stat_statements_reset(), select * from
pgbench_tellers.
result below:

userid | dbid | session_start | introduced
| query | query_id
| calls | total_time |
rows | shared_blks_hit | shared_blks_read | shared_blks_dirtied |
shared_blks_written | local_blks_hit | local_blks_read |
local_blks_dirtied | local_blks_written | t
emp_blks_read | temp_blks_written | blk_read_time | blk_write_time

--------+-------+----------------------------------+---------------------------+-------------------------------------------+---------------------+-------+------------+

------+-----------------+------------------+---------------------+---------------------+----------------+-----------------+--------------------+--------------------+--
--------------+-------------------+---------------+----------------
10 | 12900 | 2013-09-30 16:55:22.285113+05:30 | 1970-01-01
05:30:00+05:30 | select * from pg_stat_statements_reset(); |
2531907647060518039 | 1 | 0 |
1 | 0 | 0 | 0 |
0 | 0 | 0 |
0 | 0 |
0 | 0 | 0 | 0
10 | 12900 | 2013-09-30 16:55:22.285113+05:30 | 1970-01-01
05:30:00+05:30 | select * from pgbench_tellers ; |
7580333025384382649 | 1 | 0 |
10 | 1 | 0 | 0 |
0 | 0 | 0 |
0 | 0 |
0 | 0 | 0 | 0
(2 rows)

I understand session_start and verified that it changes with each
database restart to reflect current time.

It should only restart when the statistics file cannot be loaded.

This seems to work fine.
1. Started the instance
2. Executed pg_stat_statements_reset(), select * from
pgbench_history,select* from pgbench_tellers. Got the following in
pg_stat_statements view
userid | dbid | session_start |
introduced | query |
query_id | calls | tota
l_time | rows | shared_blks_hit | shared_blks_read |
shared_blks_dirtied | shared_blks_written | local_blks_hit |
local_blks_read | local_blks_dirtied | local_blks_wri
tten | temp_blks_read | temp_blks_written | blk_read_time | blk_write_time
--------+-------+----------------------------------+----------------------------------+-------------------------------------------+----------------------+-------+-----
-------+------+-----------------+------------------+---------------------+---------------------+----------------+-----------------+--------------------+---------------
-----+----------------+-------------------+---------------+----------------
10 | 12900 | 2013-10-01 17:43:26.667074+05:30 | 2013-10-01
17:43:43.724301+05:30 | select * from pgbench_history; |
-165801328395488047 | 1 |
0 | 0 | 0 | 0 |
0 | 0 | 0 | 0 |
0 |
0 | 0 | 0 | 0 | 0
10 | 12900 | 2013-10-01 17:43:26.667074+05:30 | 2013-10-01
17:43:37.379785+05:30 | select * from pgbench_tellers; |
8376871363863945311 | 1 |
0 | 10 | 0 | 1 |
0 | 0 | 0 | 0 |
0 |
0 | 0 | 0 | 0 | 0
10 | 12900 | 2013-10-01 17:43:26.667074+05:30 | 2013-10-01
17:43:26.667178+05:30 | select * from pg_stat_statements_reset(); |
-1061018443194138344 | 1 |
0 | 1 | 0 | 0 |
0 | 0 | 0 | 0 |
0 |
0 | 0 | 0 | 0 | 0
(3 rows)

Then restarted the server and saw pg_stat_statements view again.

userid | dbid | session_start |
introduced | query |
query_id | calls | tota
l_time | rows | shared_blks_hit | shared_blks_read |
shared_blks_dirtied | shared_blks_written | local_blks_hit |
local_blks_read | local_blks_dirtied | local_blks_wri
tten | temp_blks_read | temp_blks_written | blk_read_time | blk_write_time
--------+-------+----------------------------------+----------------------------------+-------------------------------------------+----------------------+-------+-----
-------+------+-----------------+------------------+---------------------+---------------------+----------------+-----------------+--------------------+---------------
-----+----------------+-------------------+---------------+----------------
10 | 12900 | 2013-10-01 17:43:26.667074+05:30 | 2013-10-01
17:45:15.130261+05:30 | select * from pgbench_history; |
-165801328395488047 | 1 |
0 | 0 | 0 | 0 |
0 | 0 | 0 | 0 |
0 |
0 | 0 | 0 | 0 | 0
10 | 12900 | 2013-10-01 17:43:26.667074+05:30 | 2013-10-01
17:45:15.130266+05:30 | select * from pg_stat_statements ; |
-247576122750898541 | 1 |
0 | 3 | 0 | 0 |
0 | 0 | 0 | 0 |
0 |
0 | 0 | 0 | 0 | 0
10 | 12900 | 2013-10-01 17:43:26.667074+05:30 | 2013-10-01
17:45:15.130271+05:30 | select * from pgbench_tellers; |
8376871363863945311 | 1 |
0 | 10 | 0 | 1 |
0 | 0 | 0 | 0 |
0 |
0 | 0 | 0 | 0 | 0
10 | 12900 | 2013-10-01 17:43:26.667074+05:30 | 2013-10-01
17:45:15.130276+05:30 | select * from pg_stat_statements_reset(); |
-1061018443194138344 | 1 |
0 | 1 | 0 | 0 |
0 | 0 | 0 | 0 |
0 |
0 | 0 | 0 | 0 | 0
(4 rows)

Correctly, session start remains same after restart for all queries
and introduced time differs slightly reflecting re-introduction of
statistics into hashtable after reading from statistics file. Also,
correctly, queryid remains same for all queries.

Now shutdown and delete pg_stat_statements.stat under data/global.
Restart again and check pg_stat_statements view.

userid | dbid | session_start | introduced | query | query_id | calls
| total_time | rows | shared_blks_hit | shared_blks_read |
shared_blks_dirtied | shared_blks_wri
tten | local_blks_hit | local_blks_read | local_blks_dirtied |
local_blks_written | temp_blks_read | temp_blks_written |
blk_read_time | blk_write_time
--------+------+---------------+------------+-------+----------+-------+------------+------+-----------------+------------------+---------------------+----------------
-----+----------------+-----------------+--------------------+--------------------+----------------+-------------------+---------------+----------------
(0 rows)

Correctly it has been reset.

regards
Sameer

Looks pretty good. Do you want to package up the patch with your
change and do the honors and re-submit it? Thanks for helping out so
much!

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#34Peter Geoghegan
pg@heroku.com
In reply to: Daniel Farina (#28)
Re: pg_stat_statements: calls under-estimation propagation

On Sun, Sep 29, 2013 at 10:58 PM, Daniel Farina <daniel@fdr.io> wrote:

I remember hacking that out for testing sake.

I can only justify it as a foot-gun to prevent someone from being
stuck restarting the database to get a reasonable number in there.
Let's CC Peter; maybe he can remember some thoughts about that.

On reflection I think it was entirely to do with the testing of the
patch. I don't think that the minimum value of pg_stat_statements has
any real significance.

--
Peter Geoghegan

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#35Sameer Thakur
samthakur74@gmail.com
In reply to: Daniel Farina (#33)
Re: pg_stat_statements: calls under-estimation propagation

Looks pretty good. Do you want to package up the patch with your
change and do the honors and re-submit it? Thanks for helping out so
much!

Sure, will do. Need to add a bit of documentation explaining
statistics session as well.
I did some more basic testing around pg_stat_statements.max, now that
we have clarity from Peter about its value being legitimate below 100.
Seems to work fine, with pg_stat_statements =4 the max unique queries
in the view are 4. On the 5th query the view holds just the latest
unique query discarding the previous 4. Fujii had reported a
segmentation fault in this scenario.
Thank you for the patch
regards
Sameer

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#36Sameer Thakur
samthakur74@gmail.com
In reply to: Sameer Thakur (#35)
1 attachment(s)
Re: pg_stat_statements: calls under-estimation propagation

On Wed, Oct 2, 2013 at 6:40 PM, Sameer Thakur <samthakur74@gmail.com> wrote:

Looks pretty good. Do you want to package up the patch with your
change and do the honors and re-submit it? Thanks for helping out so
much!

Sure, will do. Need to add a bit of documentation explaining
statistics session as well.
I did some more basic testing around pg_stat_statements.max, now that
we have clarity from Peter about its value being legitimate below 100.
Seems to work fine, with pg_stat_statements =4 the max unique queries
in the view are 4. On the 5th query the view holds just the latest
unique query discarding the previous 4. Fujii had reported a
segmentation fault in this scenario.
Thank you for the patch

Please find the patch attached

regards
Sameer

Attachments:

pg_stat_statements-identification-v7.patch.gzapplication/x-gzip; name=pg_stat_statements-identification-v7.patch.gzDownload
#37Fujii Masao
masao.fujii@gmail.com
In reply to: Sameer Thakur (#36)
Re: pg_stat_statements: calls under-estimation propagation

On Thu, Oct 3, 2013 at 5:11 PM, Sameer Thakur <samthakur74@gmail.com> wrote:

On Wed, Oct 2, 2013 at 6:40 PM, Sameer Thakur <samthakur74@gmail.com> wrote:

Looks pretty good. Do you want to package up the patch with your
change and do the honors and re-submit it? Thanks for helping out so
much!

Sure, will do. Need to add a bit of documentation explaining
statistics session as well.
I did some more basic testing around pg_stat_statements.max, now that
we have clarity from Peter about its value being legitimate below 100.
Seems to work fine, with pg_stat_statements =4 the max unique queries
in the view are 4. On the 5th query the view holds just the latest
unique query discarding the previous 4. Fujii had reported a
segmentation fault in this scenario.
Thank you for the patch

Please find the patch attached

Thanks for the patch! Here are the review comments:

+ OUT session_start timestamptz,
+ OUT introduced timestamptz,

The patch exposes these columns in pg_stat_statements view.
These should be documented.

I don't think that session_start should be exposed in every
rows in pg_stat_statements because it's updated only when
all statistics are reset, i.e., session_start of all entries
in pg_stat_statements indicate the same.

+ OUT query_id int8,

query_id or queryid? I like the latter. Also the document
uses the latter.

Regards,

--
Fujii Masao

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#38Daniel Farina
daniel@heroku.com
In reply to: Fujii Masao (#37)
Re: pg_stat_statements: calls under-estimation propagation

On Fri, Oct 4, 2013 at 7:22 AM, Fujii Masao <masao.fujii@gmail.com> wrote:

On Thu, Oct 3, 2013 at 5:11 PM, Sameer Thakur <samthakur74@gmail.com> wrote:

On Wed, Oct 2, 2013 at 6:40 PM, Sameer Thakur <samthakur74@gmail.com> wrote:

Looks pretty good. Do you want to package up the patch with your
change and do the honors and re-submit it? Thanks for helping out so
much!

Sure, will do. Need to add a bit of documentation explaining
statistics session as well.
I did some more basic testing around pg_stat_statements.max, now that
we have clarity from Peter about its value being legitimate below 100.
Seems to work fine, with pg_stat_statements =4 the max unique queries
in the view are 4. On the 5th query the view holds just the latest
unique query discarding the previous 4. Fujii had reported a
segmentation fault in this scenario.
Thank you for the patch

Please find the patch attached

Thanks for the patch! Here are the review comments:

+ OUT session_start timestamptz,
+ OUT introduced timestamptz,

The patch exposes these columns in pg_stat_statements view.
These should be documented.

I don't think that session_start should be exposed in every
rows in pg_stat_statements because it's updated only when
all statistics are reset, i.e., session_start of all entries
in pg_stat_statements indicate the same.

Dunno. I agree it'd be less query traffic and noise. Maybe hidden
behind a UDF? I thought "stats_reset" on pg_database may be prior
art, but realized that the statistics there differ depending on stats
resets per database (maybe a name change of 'session' to 'stats_reset'
would be useful to avoid too much in-cohesion, though).

I didn't want to bloat the taxonomy of exposed API/symbols too much
for pg_stat_statements, but perhaps in this instance it is reasonable.
Also, isn't the interlock with the result set is perhaps more
precise/fine-grained with the current solution? Yet, that's awfully
corner-casey.

I'm on the fence because the simplicity and precision of the current
regime for aggregation tools is nice, but avoiding the noise for
inspecting humans in the common case is also nice. I don't see a
reason right now to go strongly either way, so if you feel moderately
strongly that the repetitive column should be stripped then I am happy
to relent there and help out. Let me know of your detailed thoughts
(or modify the patch) with your idea.

+ OUT query_id int8,

query_id or queryid? I like the latter. Also the document
uses the latter.

Not much opinion. I prefer expending the _ character because most
compound words in postgres performance statistics catalogs seem to use
an underscore, though.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#39Sameer Thakur
samthakur74@gmail.com
In reply to: Fujii Masao (#37)
Re: pg_stat_statements: calls under-estimation propagation

Please find the patch attached

Thanks for the patch! Here are the review comments:

+ OUT session_start timestamptz,
+ OUT introduced timestamptz,

The patch exposes these columns in pg_stat_statements view.
These should be documented.

Yes, will add to documentation.

I don't think that session_start should be exposed in every
rows in pg_stat_statements because it's updated only when
all statistics are reset, i.e., session_start of all entries
in pg_stat_statements indicate the same.

I understand. Will remove session_start from view and expose it via a
function pg_stat_statements_current_session_start() ?

+ OUT query_id int8,
query_id or queryid? I like the latter. Also the document
uses the latter.

Will change to queryid

Thank you
Sameer

--
View this message in context: http://postgresql.1045698.n5.nabble.com/pg-stat-statements-calls-under-estimation-propagation-tp5738128p5773448.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.

#40Sameer Thakur
samthakur74@gmail.com
In reply to: Daniel Farina (#38)
Re: pg_stat_statements: calls under-estimation propagation

On Sat, Oct 5, 2013 at 1:38 PM, Daniel Farina <daniel@heroku.com> wrote:

On Fri, Oct 4, 2013 at 7:22 AM, Fujii Masao <masao.fujii@gmail.com> wrote:

On Thu, Oct 3, 2013 at 5:11 PM, Sameer Thakur <samthakur74@gmail.com> wrote:

On Wed, Oct 2, 2013 at 6:40 PM, Sameer Thakur <samthakur74@gmail.com> wrote:

Looks pretty good. Do you want to package up the patch with your
change and do the honors and re-submit it? Thanks for helping out so
much!

Sure, will do. Need to add a bit of documentation explaining
statistics session as well.
I did some more basic testing around pg_stat_statements.max, now that
we have clarity from Peter about its value being legitimate below 100.
Seems to work fine, with pg_stat_statements =4 the max unique queries
in the view are 4. On the 5th query the view holds just the latest
unique query discarding the previous 4. Fujii had reported a
segmentation fault in this scenario.
Thank you for the patch

Please find the patch attached

Thanks for the patch! Here are the review comments:

+ OUT session_start timestamptz,
+ OUT introduced timestamptz,

The patch exposes these columns in pg_stat_statements view.
These should be documented.

I don't think that session_start should be exposed in every
rows in pg_stat_statements because it's updated only when
all statistics are reset, i.e., session_start of all entries
in pg_stat_statements indicate the same.

Dunno. I agree it'd be less query traffic and noise. Maybe hidden
behind a UDF? I thought "stats_reset" on pg_database may be prior
art, but realized that the statistics there differ depending on stats
resets per database (maybe a name change of 'session' to 'stats_reset'
would be useful to avoid too much in-cohesion, though).

I didn't want to bloat the taxonomy of exposed API/symbols too much
for pg_stat_statements, but perhaps in this instance it is reasonable.
Also, isn't the interlock with the result set is perhaps more
precise/fine-grained with the current solution? Yet, that's awfully
corner-casey.

I'm on the fence because the simplicity and precision of the current
regime for aggregation tools is nice, but avoiding the noise for
inspecting humans in the common case is also nice. I don't see a
reason right now to go strongly either way, so if you feel moderately
strongly that the repetitive column should be stripped then I am happy
to relent there and help out. Let me know of your detailed thoughts
(or modify the patch) with your idea.

Thinking a bit more, if its just a question of a repeating value we
have the same situation for userid and dbid. They would be the same
for a user across multiple queries in same statistics session. So
userid,dbid and session_start do repeat across rows. Not sure why
treatment for session_start be different. I also checked pg_stat_plans
@ https://github.com/2ndQuadrant/pg_stat_plans and did not see any
special treatment given for a particular field in terms of access i.e.
the granularity of api wrt pg_stat_statements has been maintained.

regards
Sameer

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#41Noname
pilum.70@uni-muenster.de
In reply to: Sameer Thakur (#40)
Re: pg_stat_statements: calls under-estimation propagation

The V7-Patch applied cleanly and I got no issues in my first tests.

The change from column session_start to a function seems very reasonable for me.

Concernig the usability, I would like to suggest a minor change,
that massively increases the usefulness of the patch for beginners,
who often use this view as a first approach to optimize index structure.

The history of this tool contains a first version without normalization.
This wasn't useful enough except for prepared queries.
The actual version has normalized queries, so calls get
summarized to get a glimpse of bad queries.
But the drawback of this approach is impossibility to use
explain analyze without further substitutions.

The identification patch provides the possibility to summarize calls
by query_id, so that the normalized query string itself is no longer
needed to be exposed in the view for everyone.

I suggest to add a parameter to recover the possibility to
display real queries. The following very minor change
(based on V7) exposes the first real query getting this
query_id if normalization of the exposed string ist deactivated
(The actual behaviour is the default).
This new option is not always the best starting point to discover index shortfalls,
but a huge gain for beginners because it serves the needs
in more than 90% of the normal use cases.

What do you think?

Arne

Date: Mon Oct 7 17:54:08 2013 +0000

Switch to disable normalized query strings

diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index e50dfba..6cc9244 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -234,7 +234,7 @@ static int  pgss_max;                       /* max # statements to track */
  static int     pgss_track;                     /* tracking level */
  static bool pgss_track_utility; /* whether to track utility commands */
  static bool pgss_save;                 /* whether to save stats across shutdown */
-
+static bool pgss_normalize;             /* whether to normalize the query representation shown in the view (otherwise show the first query executed with this query_id) */

#define pgss_enabled() \
(pgss_track == PGSS_TRACK_ALL || \
@@ -356,6 +356,17 @@ _PG_init(void)
NULL,
NULL);

+       DefineCustomBoolVariable("pg_stat_statements.normalize",
+                            "Selects whether the view column contains the query strings in a normalized form.",
+                                                          NULL,
+                                                          &pgss_normalize,
+                                                          true,
+                                                          PGC_SUSET,
+                                                          0,
+                                                          NULL,
+                                                          NULL,
+                                                          NULL);
+
         EmitWarningsOnPlaceholders("pg_stat_statements");

/*
@@ -1084,9 +1095,9 @@ pgss_store(const char *query, uint32 queryId,

query_len = strlen(query);

-               if (jstate)
+               if (jstate && pgss_normalize)
                 {
-                       /* Normalize the string if enabled */
+                       /* Normalize the string is not NULL and normalized query strings are enabled */
                         norm_query = generate_normalized_query(jstate, query,
                                                                                                    &query_len,
                                                                                                    key.encoding);

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#42Noname
pilum.70@uni-muenster.de
In reply to: Noname (#41)
Re: pg_stat_statements: calls under-estimation propagation

The V7-Patch applied cleanly and I got no issues in my first tests.

The change from column session_start to a function seems very reasonable for
me.

Concernig the usability, I would like to suggest a minor change, that massively
increases the usefulness of the patch for beginners, who often use this view as
a first approach to optimize index structure.

The history of this tool contains a first version without normalization.
This wasn't useful enough except for prepared queries.
The actual version has normalized queries, so calls get
summarized to get a glimpse of bad queries.
But the drawback of this approach is impossibility to use
explain analyze without further substitutions.

The identification patch provides the possibility to summarize calls
by query_id, so that the normalized query string itself is no longer needed to
be exposed in the view for everyone.

I suggest to add a parameter to recover the possibility to
display real queries. The following very minor change
(based on V7) exposes the first real query getting this
query_id if normalization of the exposed string ist deactivated (The actual
behaviour is the default). This new option is not always the best starting
point to discover index shortfalls, but a huge gain for beginners because it
serves the needs
in more than 90% of the normal use cases.

What do you think?

Arne

P.S. This message is resent, because it is stalled two days, so sorry for a possible duplicate

Date: Mon Oct 7 17:54:08 2013 +0000

Switch to disable normalized query strings

diff --git a/contrib/pg_stat_statements/pg_stat_statements.c 
b/contrib/pg_stat_statements/pg_stat_statements.c
index e50dfba..6cc9244 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -234,7 +234,7 @@ static int  pgss_max;                       /* max # statements to track */
  static int     pgss_track;                     /* tracking level */
  static bool pgss_track_utility; /* whether to track utility commands */
  static bool pgss_save;                 /* whether to save stats across shutdown */
-
+static bool pgss_normalize;             /* whether to normalize the query representation shown in the view (otherwise show the first query executed with this query_id) */

#define pgss_enabled() \
(pgss_track == PGSS_TRACK_ALL || \
@@ -356,6 +356,17 @@ _PG_init(void)
NULL,
NULL);

+       DefineCustomBoolVariable("pg_stat_statements.normalize",
+                            "Selects whether the view column contains the query strings in a normalized form.",
+                                                          NULL,
+                                                          &pgss_normalize,
+                                                          true,
+                                                          PGC_SUSET,
+                                                          0,
+                                                          NULL,
+                                                          NULL,
+                                                          NULL);
+
         EmitWarningsOnPlaceholders("pg_stat_statements");

/*
@@ -1084,9 +1095,9 @@ pgss_store(const char *query, uint32 queryId,

query_len = strlen(query);

-               if (jstate)
+               if (jstate && pgss_normalize)
                 {
-                       /* Normalize the string if enabled */
+                       /* Normalize the string is not NULL and normalized query strings are enabled */
                         norm_query = generate_normalized_query(jstate, query,
                                                                                                    &query_len,
                                                                                                    key.encoding);

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#43Sameer Thakur
samthakur74@gmail.com
In reply to: Sameer Thakur (#40)
1 attachment(s)
Re: pg_stat_statements: calls under-estimation propagation

Please find patch attached which adds documentation for session_start
and introduced fields and corrects documentation for queryid to be
query_id. session_start remains in the view as agreed.
regards
Sameer

Attachments:

pg_stat_statements-identification-v8.patch.gzapplication/x-gzip; name=pg_stat_statements-identification-v8.patch.gzDownload
#44Peter Geoghegan
pg@heroku.com
In reply to: Noname (#42)
Re: pg_stat_statements: calls under-estimation propagation

On Thu, Oct 10, 2013 at 3:11 AM, <pilum.70@uni-muenster.de> wrote:

But the drawback of this approach is impossibility to use
explain analyze without further substitutions.

You can fairly easily disable the swapping of constants with '?'
symbols, so that the query text stored would match the full originally
executed query. Why would you want to, though? There could be many
actual plans whose costs are aggregated as one query. Seeing one of
them is not necessarily useful at all, and could be misleading.

--
Peter Geoghegan

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#45Noname
pilum.70@uni-muenster.de
In reply to: Peter Geoghegan (#44)
Re: pg_stat_statements: calls under-estimation propagation

Thx for your reply.

On Thu, 10 Oct 2013, Peter Geoghegan wrote:

On Thu, Oct 10, 2013 at 3:11 AM, <pilum.70@uni-muenster.de> wrote:

But the drawback of this approach is impossibility to use
explain analyze without further substitutions.

You can fairly easily disable the swapping of constants with '?'
symbols, so that the query text stored would match the full originally

I thought I did ?! I introduced an additional user parameter
to disable the normalization in the patch shown in my last mail.

If there is already an easier way in the actual distribution,
i simply missed ist.
Where is this behaviour documented?

executed query. Why would you want to, though? There could be many
actual plans whose costs are aggregated as one query. Seeing one of
them is not necessarily useful at all, and could be misleading.

Yeah, (thinking of for example parameter ranges) I mentioned that, I think,
but in the majority of cases beginners can easily conclude missing indices
executing explain analyze, because the queries, that are aggregated
and displayed under one query_id have very similar (or simply the same) query plans.

It's also only an option disabled by default: You can simply do nothing, if you don't like it :-)

VlG

Arne Scheffer

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#46Fujii Masao
masao.fujii@gmail.com
In reply to: Sameer Thakur (#43)
Re: pg_stat_statements: calls under-estimation propagation

On Thu, Oct 10, 2013 at 7:20 PM, Sameer Thakur <samthakur74@gmail.com> wrote:

Please find patch attached which adds documentation for session_start
and introduced fields and corrects documentation for queryid to be
query_id. session_start remains in the view as agreed.

Thanks for updating the document!

I'm not clear about the use case of 'session_start' and 'introduced' yet.
Could you elaborate it? Maybe we should add that explanation into
the document?

In my test, I found that pg_stat_statements.total_time always indicates a zero.
I guess that the patch might handle pg_stat_statements.total_time wrongly.

+        values[i++] = DatumGetTimestamp(
+            instr_get_timestamptz(pgss->session_start));
+        values[i++] = DatumGetTimestamp(
+            instr_get_timestamptz(entry->introduced));

These should be executed only when detected_version >= PGSS_TUP_LATEST?

+      <entry><structfield>session_start</structfield></entry>
+      <entry><type>text</type></entry>
+      <entry></entry>
+      <entry>Start time of a statistics session.</entry>
+     </row>
+
+     <row>
+      <entry><structfield>introduced</structfield></entry>
+      <entry><type>text</type></entry>

The data type of these columns is not text.

+    instr_time  session_start;
+    uint64        private_stat_session_key;

it's better to add the comments explaining these fields.

+    microsec = INSTR_TIME_GET_MICROSEC(t) -
+        ((POSTGRES_EPOCH_JDATE - UNIX_EPOCH_JDATE) * USECS_PER_DAY);

HAVE_INT64_TIMESTAMP should be considered here.

+    if (log_cannot_read)
+        ereport(LOG,
+                (errcode_for_file_access(),
+                 errmsg("could not read pg_stat_statement file \"%s\": %m",
+                        PGSS_DUMP_FILE)));

Is it better to use WARNING instead of LOG as the log level here?

+            /*
+             * The role calling this function is unable to see
+             * sensitive aspects of this tuple.
+             *
+             * Nullify everything except the "insufficient privilege"
+             * message for this entry
+             */
+            memset(nulls, 1, sizeof nulls);
+
+            nulls[i]  = 0;
+            values[i] = CStringGetTextDatum("<insufficient privilege>");

Why do we need to hide *all* the fields in pg_stat_statements, when
it's accessed by improper user? This is a big change of pg_stat_statements
behavior, and it might break the compatibility.

This is not directly related to the patch itself, but why does the
queryid
need to be calculated based on also the "statistics session"?

If we expose hash value of query tree, without using statistics session,
it is possible that users might make wrong assumption that this value
remains stable across version upgrades, when in reality it depends on
whether the version has make changes to query tree internals. So to
explicitly ensure that users do not make this wrong assumption, hash value
generation use statistics session id, which is newly created under
situations described above.

So, ISTM that we can use, for example, the version number to calculate
the query_id. Right?

Regards,

--
Fujii Masao

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#47Daniel Farina
daniel@heroku.com
In reply to: Fujii Masao (#46)
Re: pg_stat_statements: calls under-estimation propagation

On Thu, Oct 10, 2013 at 7:40 AM, Fujii Masao <masao.fujii@gmail.com> wrote:

On Thu, Oct 10, 2013 at 7:20 PM, Sameer Thakur <samthakur74@gmail.com> wrote:

Please find patch attached which adds documentation for session_start
and introduced fields and corrects documentation for queryid to be
query_id. session_start remains in the view as agreed.

Thanks for updating the document!

I'm not clear about the use case of 'session_start' and 'introduced' yet.
Could you elaborate it? Maybe we should add that explanation into
the document?

Probably.

The idea is that without those fields it's, to wit, impossible to
explain non-monotonic movement in metrics of those queries for precise
use in tools that insist on monotonicity of the fields, which are all
cumulative to date I think.

pg_stat_statements_reset() or crashing loses the session, so one
expects "ncalls" may decrease. In addition, a query that is bouncing
in and out of the hash table will have its statistics be lost, so its
statistics will also decrease. This can cause un-resolvable artifact
in analysis tools.

The two fields allow for precise understanding of when the statistics
for a given query_id are continuous since the last time a program
inspected it.

In my test, I found that pg_stat_statements.total_time always indicates a zero.
I guess that the patch might handle pg_stat_statements.total_time wrongly.

+        values[i++] = DatumGetTimestamp(
+            instr_get_timestamptz(pgss->session_start));
+        values[i++] = DatumGetTimestamp(
+            instr_get_timestamptz(entry->introduced));

These should be executed only when detected_version >= PGSS_TUP_LATEST?

Yes. Oversight.

+      <entry><structfield>session_start</structfield></entry>
+      <entry><type>text</type></entry>
+      <entry></entry>
+      <entry>Start time of a statistics session.</entry>
+     </row>
+
+     <row>
+      <entry><structfield>introduced</structfield></entry>
+      <entry><type>text</type></entry>

The data type of these columns is not text.

Oops

+    instr_time  session_start;
+    uint64        private_stat_session_key;

it's better to add the comments explaining these fields.

Yeah.

+    microsec = INSTR_TIME_GET_MICROSEC(t) -
+        ((POSTGRES_EPOCH_JDATE - UNIX_EPOCH_JDATE) * USECS_PER_DAY);

HAVE_INT64_TIMESTAMP should be considered here.

That's not a bad idea.

+    if (log_cannot_read)
+        ereport(LOG,
+                (errcode_for_file_access(),
+                 errmsg("could not read pg_stat_statement file \"%s\": %m",
+                        PGSS_DUMP_FILE)));

Is it better to use WARNING instead of LOG as the log level here?

Is this new code? ....Why did I add it again? Seems reasonable
though to call it a WARNING.

+            /*
+             * The role calling this function is unable to see
+             * sensitive aspects of this tuple.
+             *
+             * Nullify everything except the "insufficient privilege"
+             * message for this entry
+             */
+            memset(nulls, 1, sizeof nulls);
+
+            nulls[i]  = 0;
+            values[i] = CStringGetTextDatum("<insufficient privilege>");

Why do we need to hide *all* the fields in pg_stat_statements, when
it's accessed by improper user? This is a big change of pg_stat_statements
behavior, and it might break the compatibility.

It seems like an information leak that grows more major if query_id is
exposed and, at any point, one can determine the query_id for a query
text.

This is not directly related to the patch itself, but why does the
queryid
need to be calculated based on also the "statistics session"?

If we expose hash value of query tree, without using statistics session,
it is possible that users might make wrong assumption that this value
remains stable across version upgrades, when in reality it depends on
whether the version has make changes to query tree internals. So to
explicitly ensure that users do not make this wrong assumption, hash value
generation use statistics session id, which is newly created under
situations described above.

So, ISTM that we can use, for example, the version number to calculate
the query_id. Right?

That does seem like it may be a more reasonable stability vs. once per
statistics session, because then use case with standbys work somewhat
better.

That said, the general concern was accidental contracts being assumed
by consuming code, so this approach is tuned for making the query_id
as unstable as possible while still being useful: stable, but only in
one statistics gathering section.

I did not raise the objection about over-aggressive contracts being
exposed although I think the concern has merit...but given the use
case involving standbys, I am for now charitable to increasing the
stability to the level you indicate: a guaranteed break every point
release.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#48Fujii Masao
masao.fujii@gmail.com
In reply to: Daniel Farina (#47)
Re: pg_stat_statements: calls under-estimation propagation

On Fri, Oct 11, 2013 at 12:19 AM, Daniel Farina <daniel@heroku.com> wrote:

Probably.

The idea is that without those fields it's, to wit, impossible to
explain non-monotonic movement in metrics of those queries for precise
use in tools that insist on monotonicity of the fields, which are all
cumulative to date I think.

pg_stat_statements_reset() or crashing loses the session, so one
expects "ncalls" may decrease. In addition, a query that is bouncing
in and out of the hash table will have its statistics be lost, so its
statistics will also decrease. This can cause un-resolvable artifact
in analysis tools.

The two fields allow for precise understanding of when the statistics
for a given query_id are continuous since the last time a program
inspected it.

Thanks for elaborating them! Since 'introduced' is reset even when
the statistics is reset, maybe we can do without 'session_start' for
that purpose?

+            /*
+             * The role calling this function is unable to see
+             * sensitive aspects of this tuple.
+             *
+             * Nullify everything except the "insufficient privilege"
+             * message for this entry
+             */
+            memset(nulls, 1, sizeof nulls);
+
+            nulls[i]  = 0;
+            values[i] = CStringGetTextDatum("<insufficient privilege>");

Why do we need to hide *all* the fields in pg_stat_statements, when
it's accessed by improper user? This is a big change of pg_stat_statements
behavior, and it might break the compatibility.

It seems like an information leak that grows more major if query_id is
exposed and, at any point, one can determine the query_id for a query
text.

So hiding only query and query_id is enough?

Regards,

--
Fujii Masao

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#49Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Daniel Farina (#47)
Re: pg_stat_statements: calls under-estimation propagation

Daniel Farina escribi�:

On Thu, Oct 10, 2013 at 7:40 AM, Fujii Masao <masao.fujii@gmail.com> wrote:

In my test, I found that pg_stat_statements.total_time always indicates a zero.
I guess that the patch might handle pg_stat_statements.total_time wrongly.

+        values[i++] = DatumGetTimestamp(
+            instr_get_timestamptz(pgss->session_start));
+        values[i++] = DatumGetTimestamp(
+            instr_get_timestamptz(entry->introduced));

These should be executed only when detected_version >= PGSS_TUP_LATEST?

Yes. Oversight.

Hmm, shouldn't this be conditional on a new PGSS_TUP_V1_2? I mean, if
later somebody patches pgss to have a PGSS_TUP_V2 or V1_3 and that
becomes latest, somebody running the current definition with the updated
.so will no longer get these values. Or is the intention that
PGSS_TUP_LATEST will never be updated again, and future versions will
get PGSS_TUP_REALLY_LATEST and PGSS_TUP_REALLY_LATEST_HONEST and so on?
I mean, surely we can always come up with new symbols to use, but it
seems hard to follow.

There's one other use of PGSS_TUP_LATEST here which I think should also
be changed to >= SOME_SPECIFIC_VERSION:

+       if (detected_version >= PGSS_TUP_LATEST)
+       {
+           uint64 qid = pgss->private_stat_session_key;
+ 
+           qid ^= (uint64) entry->query_id;
+           qid ^= ((uint64) entry->query_id) << 32;
+ 
+           values[i++] = Int64GetDatumFast(qid);
+       }

This paragraph reads a bit strange to me:

+  A statistics session is the time period when statistics are gathered by statistics collector 
+  without being reset. So a statistics session continues across normal shutdowns, 
+  but whenever statistics are reset, like during a crash or upgrade, a new time period 
+  of statistics collection commences i.e. a new statistics session. 
+  The query_id value generation is linked to statistics session to emphasize the fact 
+  that whenever statistics are reset,the query_id for the same queries will also change.

"time period when"? Shouldn't that be "time period during which".
Also, doesn't a new "statistics session" start when a stats reset is
invoked by the user? The bit after "commences" appears correct (to me,
not a native by any means) but seems also a bit strange.

I just noticed that the table describing the output indicates that
session_start and introduced are of type text, but the SQL defines
timestamptz.

The instr_time thingy being used for these times maps to
QueryPerformanceCounter() on Windows, and I'm not sure how useful this
is for long-term time tracking; see
http://stackoverflow.com/questions/5296990/queryperformancecounter-and-overflows#5297163
for instance. I think it'd be better to use TimestampTz and
GetCurrentTimestamp() for this.

Just noticed that you changed the timer to struct Instrumentation. Not
really sure about that change. Since you seem to be using only the
start time and counter, wouldn't it be better to store only those?
Particularly unsure about passing INSTRUMENT_ALL to InstrAlloc().

--
�lvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#50Fujii Masao
masao.fujii@gmail.com
In reply to: Alvaro Herrera (#49)
Re: pg_stat_statements: calls under-estimation propagation

On Fri, Oct 11, 2013 at 2:49 AM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:

Daniel Farina escribió:

On Thu, Oct 10, 2013 at 7:40 AM, Fujii Masao <masao.fujii@gmail.com> wrote:

In my test, I found that pg_stat_statements.total_time always indicates a zero.
I guess that the patch might handle pg_stat_statements.total_time wrongly.

+        values[i++] = DatumGetTimestamp(
+            instr_get_timestamptz(pgss->session_start));
+        values[i++] = DatumGetTimestamp(
+            instr_get_timestamptz(entry->introduced));

These should be executed only when detected_version >= PGSS_TUP_LATEST?

Yes. Oversight.

Hmm, shouldn't this be conditional on a new PGSS_TUP_V1_2?

I was just thinking the same thing. Agreed.

Regards,

--
Fujii Masao

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#51Daniel Farina
daniel@heroku.com
In reply to: Fujii Masao (#48)
Re: pg_stat_statements: calls under-estimation propagation

On Thu, Oct 10, 2013 at 10:12 AM, Fujii Masao <masao.fujii@gmail.com> wrote:

On Fri, Oct 11, 2013 at 12:19 AM, Daniel Farina <daniel@heroku.com> wrote:

Probably.

The idea is that without those fields it's, to wit, impossible to
explain non-monotonic movement in metrics of those queries for precise
use in tools that insist on monotonicity of the fields, which are all
cumulative to date I think.

pg_stat_statements_reset() or crashing loses the session, so one
expects "ncalls" may decrease. In addition, a query that is bouncing
in and out of the hash table will have its statistics be lost, so its
statistics will also decrease. This can cause un-resolvable artifact
in analysis tools.

The two fields allow for precise understanding of when the statistics
for a given query_id are continuous since the last time a program
inspected it.

Thanks for elaborating them! Since 'introduced' is reset even when
the statistics is reset, maybe we can do without 'session_start' for
that purpose?

There is a small loss of precision.

The original reason was that if one wanted to know, given two samples
of pg_stat_statements, when the query_id is going to remain stable for
a given query.

For example:

If the session changes on account of a reset, then it is known that
all query_ids one's external program is tracking are no longer going
to be updated, and the program can take account for the fact that the
same query text is going to have a new query_id.

Given the new idea of mixing in the point release number:

If the code is changed to instead mixing in the full version of
Postgres, as you suggested recently, this can probably be removed.
The caveat there is then the client is going to have to do something a
bit weird like ask for the point release and perhaps even compile
options of Postgres to know when the query_id is going to have a
different value for a given query text. But, maybe this is an
acceptable compromise to remove one field.

+            /*
+             * The role calling this function is unable to see
+             * sensitive aspects of this tuple.
+             *
+             * Nullify everything except the "insufficient privilege"
+             * message for this entry
+             */
+            memset(nulls, 1, sizeof nulls);
+
+            nulls[i]  = 0;
+            values[i] = CStringGetTextDatum("<insufficient privilege>");

Why do we need to hide *all* the fields in pg_stat_statements, when
it's accessed by improper user? This is a big change of pg_stat_statements
behavior, and it might break the compatibility.

It seems like an information leak that grows more major if query_id is
exposed and, at any point, one can determine the query_id for a query
text.

So hiding only query and query_id is enough?

Yeah, I think so. The other fields feel a bit weird to leave hanging
around as well, so I thought I'd just "fix" it in one shot, but doing
the minimum or only applying this idea to new fields is safer. It's
shorter to hit all the fields, though, which is why I was tempted to
do that.

Perhaps not a good economy for potential subtle breaks in the next version.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#52Daniel Farina
daniel@heroku.com
In reply to: Alvaro Herrera (#49)
Re: pg_stat_statements: calls under-estimation propagation

On Thu, Oct 10, 2013 at 10:49 AM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:

Daniel Farina escribió:

On Thu, Oct 10, 2013 at 7:40 AM, Fujii Masao <masao.fujii@gmail.com> wrote:

In my test, I found that pg_stat_statements.total_time always indicates a zero.
I guess that the patch might handle pg_stat_statements.total_time wrongly.

+        values[i++] = DatumGetTimestamp(
+            instr_get_timestamptz(pgss->session_start));
+        values[i++] = DatumGetTimestamp(
+            instr_get_timestamptz(entry->introduced));

These should be executed only when detected_version >= PGSS_TUP_LATEST?

Yes. Oversight.

Hmm, shouldn't this be conditional on a new PGSS_TUP_V1_2? I mean, if
later somebody patches pgss to have a PGSS_TUP_V2 or V1_3 and that
becomes latest, somebody running the current definition with the updated
.so will no longer get these values. Or is the intention that
PGSS_TUP_LATEST will never be updated again, and future versions will
get PGSS_TUP_REALLY_LATEST and PGSS_TUP_REALLY_LATEST_HONEST and so on?
I mean, surely we can always come up with new symbols to use, but it
seems hard to follow.

There's one other use of PGSS_TUP_LATEST here which I think should also
be changed to >= SOME_SPECIFIC_VERSION:

+       if (detected_version >= PGSS_TUP_LATEST)
+       {
+           uint64 qid = pgss->private_stat_session_key;
+
+           qid ^= (uint64) entry->query_id;
+           qid ^= ((uint64) entry->query_id) << 32;
+
+           values[i++] = Int64GetDatumFast(qid);
+       }

I made some confusing mistakes here in using the newer tuple
versioning. Let me try to explain what the motivation was:

I was adding new fields to pg_stat_statements and was afraid that it'd
be hard to get a very clear view that all the set fields are in
alignment and there were no accidental overruns, with the problem
getting more convoluted as more versions are added.

The idea of PGSS_TUP_LATEST is useful to catch common programmer error
by testing some invariants, and it'd be nice not to have to thrash the
invariant checking code every release, which would probably defeat the
point of such "oops" prevention code. But, the fact that I went on to
rampantly do questionable things PGSS_TUP_LATEST is a bad sign.

By example, here are the two uses that have served me very well:

/* Perform version detection */
if (tupdesc->natts == PG_STAT_STATEMENTS_COLS_V1_0)
detected_version = PGSS_TUP_V1_0;
else if (tupdesc->natts == PG_STAT_STATEMENTS_COLS_V1_1)
detected_version = PGSS_TUP_V1_1;
else if (tupdesc->natts == PG_STAT_STATEMENTS_COLS)
detected_version = PGSS_TUP_LATEST;
else
{
/*
* Couldn't identify the tuple format. Raise error.
*
* This is an exceptional case that may only happen in bizarre
* situations, since it is thought that every released version
* of pg_stat_statements has a matching schema.
*/
ereport(ERROR,
(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
errmsg("pg_stat_statements schema is not supported "
"by its installed binary")));
}

And

#ifdef USE_ASSERT_CHECKING
/* Check that every column appears to be filled */
switch (detected_version)
{
case PGSS_TUP_V1_0:
Assert(i == PG_STAT_STATEMENTS_COLS_V1_0);
break;
case PGSS_TUP_V1_1:
Assert(i == PG_STAT_STATEMENTS_COLS_V1_1);
break;
case PGSS_TUP_LATEST:
Assert(i == PG_STAT_STATEMENTS_COLS);
break;
default:
Assert(false);
}
#endif

Given that, perhaps a way to fix this is something like this patch-fragment:

"""
{
PGSS_TUP_V1_0 = 1,
PGSS_TUP_V1_1,
- PGSS_TUP_LATEST
+ PGSS_TUP_V1_2
} pgssTupVersion;

+#define PGSS_TUP_LATEST PGSS_TUP_V1_2
"""

This way when a programmer is making new tuple versions, they are much
more likely to see the immediate need to teach those two sites about
the new tuple size. But, the fact that one does not get the
invariants updated in a completely compulsory way may push the value
of this checking under water.

I'd be sad to see it go, it has saved me a lot of effort when
returning to the code after a while.

What do you think?

The instr_time thingy being used for these times maps to
QueryPerformanceCounter() on Windows, and I'm not sure how useful this
is for long-term time tracking; see
http://stackoverflow.com/questions/5296990/queryperformancecounter-and-overflows#5297163
for instance. I think it'd be better to use TimestampTz and
GetCurrentTimestamp() for this.

Ah. I was going to do that, but thought it'd be nice to merely push
down the already-extant Instr struct in most cases, as to get the
'start' time stored there for "free." But if it can't work on
Windows, maybe trying to spare the code from a new abstraction to
learn or argument list creep is ending in failure this time.

Just noticed that you changed the timer to struct Instrumentation. Not
really sure about that change. Since you seem to be using only the
start time and counter, wouldn't it be better to store only those?
Particularly unsure about passing INSTRUMENT_ALL to InstrAlloc().

Yeah, I was unsure about that too.

The motivation was that I need one more piece of information in
pgss_store (the absolute start time). I was going to widen the
argument list, but it was looking pretty long, so instead I was
thinking it'd be more concise to push the entire, typically extant
Instrumentation struct pointer down.

There is that one ugly site you noticed where I had to make a new,
zero-valued Instrumentation value and its pointer to do that, and I
also found was that I needed guards for NULL Instrumentation values,
so maybe this experiment was a failure.

I considered ginning up a new Instr on the stack as a
micro-optimization, but figured getting a full-blown allocated
zero-valued Instr by re-using the API was clean enough for the
purpose.

It looks like the concern about Windows may demand that re-using the
"start" time out of Instr is not a good idea, and that was the whole
point of that attempt.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#53Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Daniel Farina (#52)
Re: pg_stat_statements: calls under-estimation propagation

Daniel Farina escribi�:

Given that, perhaps a way to fix this is something like this patch-fragment:

"""
{
PGSS_TUP_V1_0 = 1,
PGSS_TUP_V1_1,
- PGSS_TUP_LATEST
+ PGSS_TUP_V1_2
} pgssTupVersion;

+#define PGSS_TUP_LATEST PGSS_TUP_V1_2
"""

This sounds good. I have seen other places that have the LATEST
definition as part of the enum, assigning the previous value to it. I'm
not really sure which of these is harder to miss when updating the code.
I'm happy with either.

Make sure to use the PGSS_TUP_V1_2 in the two places mentioned, in any
case.

Just noticed that you changed the timer to struct Instrumentation. Not
really sure about that change. Since you seem to be using only the
start time and counter, wouldn't it be better to store only those?
Particularly unsure about passing INSTRUMENT_ALL to InstrAlloc().

Yeah, I was unsure about that too.

The motivation was that I need one more piece of information in
pgss_store (the absolute start time). I was going to widen the
argument list, but it was looking pretty long, so instead I was
thinking it'd be more concise to push the entire, typically extant
Instrumentation struct pointer down.

Would it work to define your own struct to pass around?

--
�lvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#54Daniel Farina
daniel@heroku.com
In reply to: Alvaro Herrera (#53)
Re: pg_stat_statements: calls under-estimation propagation

On Thu, Oct 10, 2013 at 1:30 PM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:

Just noticed that you changed the timer to struct Instrumentation. Not
really sure about that change. Since you seem to be using only the
start time and counter, wouldn't it be better to store only those?
Particularly unsure about passing INSTRUMENT_ALL to InstrAlloc().

Yeah, I was unsure about that too.

The motivation was that I need one more piece of information in
pgss_store (the absolute start time). I was going to widen the
argument list, but it was looking pretty long, so instead I was
thinking it'd be more concise to push the entire, typically extant
Instrumentation struct pointer down.

Would it work to define your own struct to pass around?

Absolutely, I was just hoping to spare the code another abstraction if
another was a precise superset.

Looks like that isn't going to happen, though, so a pgss-oriented
struct is likely what will have to be.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#55Peter Eisentraut
peter_e@gmx.net
In reply to: Sameer Thakur (#43)
Re: pg_stat_statements: calls under-estimation propagation

On 10/10/13 6:20 AM, Sameer Thakur wrote:

Please find patch attached which adds documentation for session_start
and introduced fields and corrects documentation for queryid to be
query_id. session_start remains in the view as agreed.

Please fix the tabs in the SGML files.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#56Sameer Thakur
samthakur74@gmail.com
In reply to: Alvaro Herrera (#49)
Re: pg_stat_statements: calls under-estimation propagation

This paragraph reads a bit strange to me:

+  A statistics session is the time period when statistics are gathered by
statistics collector
+  without being reset. So a statistics session continues across normal
shutdowns,
+  but whenever statistics are reset, like during a crash or upgrade, a new
time period
+  of statistics collection commences i.e. a new statistics session.
+  The query_id value generation is linked to statistics session to
emphasize the fact
+  that whenever statistics are reset,the query_id for the same queries will
also change.

"time period when"? Shouldn't that be "time period during which".
Also, doesn't a new "statistics session" start when a stats reset is
invoked by the user? The bit after "commences" appears correct (to me,
not a native by any means) but seems also a bit strange.

I have tried to rephrase this. Hopefully less confusing

A statistics session refers to the time period when statement
statistics are gathered by
statistics collector. A statistics session persists across normal
shutdowns. Whenever statistics are reset like during a crash or upgrade, a new
statistics session starts. The query_id value generation is linked to
statistics session to
emphasize that whenever statistics are reset,the query_id for the same
queries will also change.

regards
Sameer

--
View this message in context: http://postgresql.1045698.n5.nabble.com/pg-stat-statements-calls-under-estimation-propagation-tp5738128p5774365.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.

#57Sameer Thakur
samthakur74@gmail.com
In reply to: Daniel Farina (#54)
1 attachment(s)
Re: pg_stat_statements: calls under-estimation propagation

Hello,
Please find attached pg_stat_statements-identification-v9.patch.
I have tried to address the following review comments
1. Use version PGSS_TUP_V1_2
2.Fixed total time being zero
3. Remove 'session_start' from the view and use point release number
to generate queryid
4. Hide only queryid and query text and not all fields from unauthorized user
5. Removed introduced field from view and code as statistics session
concept is not being used
6. Removed struct Instrumentation usage
7. Updated sgml to reflect changes made. Removed all references to
statistics session, and introduced fields.

regards
Sameer

Attachments:

pg_stat_statements-identification-v9.patch.gzapplication/x-gzip; name=pg_stat_statements-identification-v9.patch.gzDownload
#58Peter Geoghegan
pg@heroku.com
In reply to: Sameer Thakur (#57)
Re: pg_stat_statements: calls under-estimation propagation

On Tue, Nov 5, 2013 at 5:30 AM, Sameer Thakur <samthakur74@gmail.com> wrote:

Hello,
Please find attached pg_stat_statements-identification-v9.patch.

I took a quick look. Observations:

+	/* Making query ID dependent on PG version */
+	query->queryId |= PG_VERSION_NUM << 16;

If you want to do something like this, make the value of
PGSS_FILE_HEADER incorporate (PG_VERSION_NUM / 100) or something.

Why are you doing this?

@@ -128,6 +146,7 @@ typedef struct pgssEntry
 	pgssHashKey key;			/* hash key of entry - MUST BE FIRST */
 	Counters	counters;		/* the statistics for this query */
 	int			query_len;		/* # of valid bytes in query string */
+	uint32		query_id;		/* jumble value for this entry */

query_id is already in "key".

Not sure I like the idea of the new enum at all, but in any case you
shouldn't have a PGSS_TUP_LATEST constant - should someone go update
all usage of that constant only when your version isn't the latest?
Like here:

+ if (detected_version >= PGSS_TUP_LATEST)

I forget why Daniel originally altered the min value of
pg_stat_statements.max to 1 (I just remember that he did), but I don't
think it holds that you should keep it there. Have you considered the
failure modes when it is actually set to 1?

This is what I call a "can't happen" error, or a defensive one:

+	else
+	{
+		/*
+		 * Couldn't identify the tuple format.  Raise error.
+		 *
+		 * This is an exceptional case that may only happen in bizarre
+		 * situations, since it is thought that every released version
+		 * of pg_stat_statements has a matching schema.
+		 */
+		ereport(ERROR,
+				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+				 errmsg("pg_stat_statements schema is not supported "
+						"by its installed binary")));
+	}

I'll generally make these simple elogs(), which are more terse. No one
is going to find all that dressing useful.

Please take a look at this, for future reference:

https://wiki.postgresql.org/wiki/Creating_Clean_Patches

The whitespace changes are distracting.

It probably isn't useful to comment random, unaffected code that isn't
affected by your patch - I don't find this new refactoring useful, and
am surprised to see it in your patch:

+	/* Check header existence and magic number match. */
 	if (fread(&header, sizeof(uint32), 1, file) != 1 ||
-		header != PGSS_FILE_HEADER ||
-		fread(&num, sizeof(int32), 1, file) != 1)
+		header != PGSS_FILE_HEADER)
+		goto error;
+
+	/* Read how many table entries there are. */
+	if (fread(&num, sizeof(int32), 1, file) != 1)
 		goto error;

Did you mean to add all this, or is it left over from Daniel's patch?

@@ -43,6 +43,7 @@
*/
#include "postgres.h"

+#include <time.h>
#include <unistd.h>

#include "access/hash.h"
@@ -59,15 +60,18 @@
#include "storage/spin.h"
#include "tcop/utility.h"
#include "utils/builtins.h"
+#include "utils/timestamp.h"

Final thought: I think the order in the pg_stat_statements view is
wrong. It ought to be like a composite primary key - (userid, dbid,
query_id).

--
Peter Geoghegan

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#59Daniel Farina
daniel@heroku.com
In reply to: Peter Geoghegan (#58)
Re: pg_stat_statements: calls under-estimation propagation

On Thu, Nov 14, 2013 at 7:25 PM, Peter Geoghegan <pg@heroku.com> wrote:

On Tue, Nov 5, 2013 at 5:30 AM, Sameer Thakur <samthakur74@gmail.com> wrote:

Hello,
Please find attached pg_stat_statements-identification-v9.patch.

I took a quick look. Observations:

+       /* Making query ID dependent on PG version */
+       query->queryId |= PG_VERSION_NUM << 16;

If you want to do something like this, make the value of
PGSS_FILE_HEADER incorporate (PG_VERSION_NUM / 100) or something.

Why are you doing this?

The destabilization of the query_id is to attempt to skirt the
objection that the unpredictability of instability in the query_id
would otherwise cause problems and present a false contract,
particular in regards to point release upgrades.

Earliest drafts of mine included destabilizing every session, but this
kills off a nice use case of correlating query ids between primaries
and standbys, if memory serves. This has the semantics of
destabilizing -- for sure -- every point release.

@@ -128,6 +146,7 @@ typedef struct pgssEntry
pgssHashKey key;                        /* hash key of entry - MUST BE FIRST */
Counters        counters;               /* the statistics for this query */
int                     query_len;              /* # of valid bytes in query string */
+       uint32          query_id;               /* jumble value for this entry */

query_id is already in "key".

Not sure I like the idea of the new enum at all, but in any case you
shouldn't have a PGSS_TUP_LATEST constant - should someone go update
all usage of that constant only when your version isn't the latest?
Like here:

+ if (detected_version >= PGSS_TUP_LATEST)

It is roughly modeled after the "column" version of the same thing
that pre-existed in pg_stat_statements. The only reason to have a
LATEST is for some of the invariants though; otherwise, most uses
should use the version-stamped symbol. I did this wrongly a bunch of
times as spotted by Alvaro, I believe.

I forget why Daniel originally altered the min value of
pg_stat_statements.max to 1 (I just remember that he did), but I don't
think it holds that you should keep it there. Have you considered the
failure modes when it is actually set to 1?

Testing. It was very useful to set to small numbers, like two or
three. It's not crucial to the patch at all but having to whack it
back all the time to keep the patch submission devoid of it seemed
impractically tedious during revisions.

This is what I call a "can't happen" error, or a defensive one:

+       else
+       {
+               /*
+                * Couldn't identify the tuple format.  Raise error.
+                *
+                * This is an exceptional case that may only happen in bizarre
+                * situations, since it is thought that every released version
+                * of pg_stat_statements has a matching schema.
+                */
+               ereport(ERROR,
+                               (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+                                errmsg("pg_stat_statements schema is not supported "
+                                               "by its installed binary")));
+       }

I'll generally make these simple elogs(), which are more terse. No one
is going to find all that dressing useful.

That seems reasonable. Yes, it's basically a soft assert. I hit this
one in development a few times, it was handy.

It probably isn't useful to comment random, unaffected code that isn't
affected by your patch - I don't find this new refactoring useful, and
am surprised to see it in your patch:

+       /* Check header existence and magic number match. */
if (fread(&header, sizeof(uint32), 1, file) != 1 ||
-               header != PGSS_FILE_HEADER ||
-               fread(&num, sizeof(int32), 1, file) != 1)
+               header != PGSS_FILE_HEADER)
+               goto error;
+
+       /* Read how many table entries there are. */
+       if (fread(&num, sizeof(int32), 1, file) != 1)
goto error;

Did you mean to add all this, or is it left over from Daniel's patch?

The whitespace changes are not intentional, but I think the comments
help a reader track what's going on better, which becomes more
important if one adds more fields. It can be broken out into a
separate patch, but it didn't seem like that bookkeeping was
necessary.

@@ -43,6 +43,7 @@
*/
#include "postgres.h"

+#include <time.h>
#include <unistd.h>

#include "access/hash.h"
@@ -59,15 +60,18 @@
#include "storage/spin.h"
#include "tcop/utility.h"
#include "utils/builtins.h"
+#include "utils/timestamp.h"

Final thought: I think the order in the pg_stat_statements view is
wrong. It ought to be like a composite primary key - (userid, dbid,
query_id).

I made no design decisions here...no complaints from me.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#60Sameer Thakur
samthakur74@gmail.com
In reply to: Peter Geoghegan (#58)
Re: pg_stat_statements: calls under-estimation propagation

I took a quick look. Observations:

+ /* Making query ID dependent on PG version */
+ query->queryId |= PG_VERSION_NUM << 16;

If you want to do something like this, make the value of
PGSS_FILE_HEADER incorporate (PG_VERSION_NUM / 100) or something.

Why are you doing this?

The thought was queryid should have a different value for the same
query across PG versions, to ensure that clients using
the view,do not assume otherwise.

@@ -128,6 +146,7 @@ typedef struct pgssEntry
pgssHashKey key; /* hash key of entry - MUST BE FIRST */
Counters counters; /* the statistics for this query */
int query_len; /* # of valid bytes in query string */
+ uint32 query_id; /* jumble value for this entry */

query_id is already in "key".

Not sure I like the idea of the new enum at all, but in any case you
shouldn't have a PGSS_TUP_LATEST constant - should someone go update
all usage of that constant only when your version isn't the latest?
Like here:

+ if (detected_version >= PGSS_TUP_LATEST)

There is #define PGSS_TUP_LATEST PGSS_TUP_V1_2
So if an update has to be done, this is the one place to do it.

I forget why Daniel originally altered the min value of
pg_stat_statements.max to 1 (I just remember that he did), but I don't
think it holds that you should keep it there. Have you considered the
failure modes when it is actually set to 1?

Will set it back to the original value and also test for max value = 1

This is what I call a "can't happen" error, or a defensive one:

+ else
+ {
+ /*
+ * Couldn't identify the tuple format.  Raise error.
+ *
+ * This is an exceptional case that may only happen in bizarre
+ * situations, since it is thought that every released version
+ * of pg_stat_statements has a matching schema.
+ */
+ ereport(ERROR,
+ (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("pg_stat_statements schema is not supported "
+ "by its installed binary")));
+ }

I'll generally make these simple elogs(), which are more terse. No one
is going to find all that dressing useful.

Will convert to using elogs

Please take a look at this, for future reference:

https://wiki.postgresql.org/wiki/Creating_Clean_Patches

The whitespace changes are distracting.

Thanks! Still learning the art of clean patch submission.

It probably isn't useful to comment random, unaffected code that isn't
affected by your patch - I don't find this new refactoring useful, and
am surprised to see it in your patch:

+ /* Check header existence and magic number match. */
if (fread(&header, sizeof(uint32), 1, file) != 1 ||
- header != PGSS_FILE_HEADER ||
- fread(&num, sizeof(int32), 1, file) != 1)
+ header != PGSS_FILE_HEADER)
+ goto error;
+
+ /* Read how many table entries there are. */
+ if (fread(&num, sizeof(int32), 1, file) != 1)
goto error;

Did you mean to add all this, or is it left over from Daniel's patch?

I think its a carry over from Daniel's code. I understand the thought.
Will keep patch strictly restricted to functionality implemented

@@ -43,6 +43,7 @@
*/
#include "postgres.h"

+#include <time.h>
#include <unistd.h>

#include "access/hash.h"
@@ -59,15 +60,18 @@
#include "storage/spin.h"
#include "tcop/utility.h"
#include "utils/builtins.h"
+#include "utils/timestamp.h"

Final thought: I think the order in the pg_stat_statements view is
wrong. It ought to be like a composite primary key - (userid, dbid,
query_id).

Will make the change.

--
Peter Geoghegan

Thank you for the review
Sameer

--
View this message in context: http://postgresql.1045698.n5.nabble.com/pg-stat-statements-calls-under-estimation-propagation-tp5738128p5778472.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.

#61Sameer Thakur
samthakur74@gmail.com
In reply to: Sameer Thakur (#60)
1 attachment(s)
Re: pg_stat_statements: calls under-estimation propagation

Hello,
Please find v10 of patch attached. This patch addresses following
review comments
1. Removed errcode and used elogs for error "pg_stat_statements schema
is not supported by its binary"
2. Removed comments and other code formatting not directly relevant to
patch functionality
3. changed position of query_id in view to userid,dbid,query_id..
4 cleaned the patch some more to avoid unnecessary whitespaces, newlines.

I assume the usage of PGSS_TUP_LATEST after explanation given.
Also the mixing of PG_VERSION_NUM with query_id is ok after after
explanation given.

regards
Sameer

Attachments:

pg_stat_statements-identification-v10.patch.gzapplication/x-gzip; name=pg_stat_statements-identification-v10.patch.gzDownload
#62Peter Geoghegan
pg@heroku.com
In reply to: Sameer Thakur (#61)
1 attachment(s)
Re: pg_stat_statements: calls under-estimation propagation

On Mon, Nov 18, 2013 at 1:54 AM, Sameer Thakur <samthakur74@gmail.com> wrote:

Please find v10 of patch attached. This patch addresses following
review comments

I've cleaned this up - revision attached - and marked it "ready for committer".

I decided that queryid should be of type oid, not bigint. This is
arguably a slight abuse of notation, but since ultimately Oids are
just abstract object identifiers (so say the docs), but also because
there is no other convenient, minimal way of representing unsigned
32-bit integers in the view that I'm aware of, I'm inclined to think
that it's appropriate.

In passing, I've made pg_stat_statements invalidate serialized entries
if there is a change in major version. This seems desirable as a
catch-all invalidator of entries.

I note that Tom has objected to exposing the queryid in the past, on
numerous occasions. I'm more confident than ever that it's actually
the right thing to do. I've had people I don't know walk up to me at
conferences and ask me what we don't already expose this at least
twice now. There are very strong indications that many people want
this, and given that I've documented the caveats, I think that we
should trust those calling for this. At the very least, it allows
people to GROUP BY queryid, when they don't want things broken out by
userid.

We're self-evidently already effectively relying on the queryid to be
as stable as it is documented to be in this patch. The hash function
cannot really change in minor releases, because to do so would at the
very least necessitate re-indexing hash indexes, and would of course
invalidate internally managed pg_stat_statements entries, both of
which are undesirable outcomes (and therefore, for these reasons and
more, unlikely). Arguments for not documenting hash_any() do not apply
here -- we're already suffering the full consequences of whatever
queryid instability may exist.

Quite apart from all of that, I think we need to have a way of
identifying particular entries for the purposes of supporting
per-entry "settings". Recent discussion about min/max query time, or
somehow characterizing the distribution of each entry's historic
execution time (or whatever) have not considered one important
questoin: What are you supposed to do when you find out that there is
an outlier (whatever an outlier is)?

I won't go into the details, because there is little point, but I'm
reasonably confident that it will be virtually impossible for
pg_stat_statements itself to usefully classify particular query
executions as outliers (I'm not even sure that we could do it if we
assumed a normal distribution, which would be bogus, and certainly
made very noisy by caching effects and so on. Furthermore, who are we
to say that an outlier is an execution time two sigmas to the right?
Seems useless).

Outliers are typically caused by things like bad plans, or problematic
constant values that appear in the most common values list (and are
therefore just inherently far more expensive to query against), or
lock contention. In all of those cases, with a min/max or something we
probably won't even get to know what the problematic constant values
were when response time suddenly suffers, because of course
pg_stat_statements doesn't help with that. So have we gained much?
Even with detective work, the trail might have gone cold by the time
the outlier is examined. And, monitoring is only one concern -- what
about alerting?

The bigger point is that having this will facilitate being able to
mark entries as "SLA queries" or something like that, where if their
execution exceeds a time (specified by the DBA per entry), that is
assumed to be very bad, and pg_stat_statements complains. That gets
dumped to the logs (which ought to be a rare occurrence when the
facility is used correctly). Of course, the (typically particularly
problematic) constant values *do* appear in the logs, and there is a
greppable keyword, potentially for the benefit of a tool like
tail_n_mail. You could think of this as being like a smart
log_min_duration_statement. I think that the DBA needs to tell
pg_stat_statements what to care about, and what bad looks like. If the
DBA doesn't know where to start specifying such things, the 5 queries
with the most calls can usefully have this set to (mean_execution_time
* 1.5) or something. SLA queries can also be "pinned", perhaps (that
is, given a "stay of execution" when eviction occurs).

--
Peter Geoghegan

Attachments:

pg_stat_statements-identification-v11.patchtext/x-patch; charset=US-ASCII; name=pg_stat_statements-identification-v11.patchDownload
diff --git a/contrib/pg_stat_statements/Makefile b/contrib/pg_stat_statements/Makefile
new file mode 100644
index e8aed61..95a2767
*** a/contrib/pg_stat_statements/Makefile
--- b/contrib/pg_stat_statements/Makefile
*************** MODULE_big = pg_stat_statements
*** 4,11 ****
  OBJS = pg_stat_statements.o
  
  EXTENSION = pg_stat_statements
! DATA = pg_stat_statements--1.1.sql pg_stat_statements--1.0--1.1.sql \
! 	pg_stat_statements--unpackaged--1.0.sql
  
  ifdef USE_PGXS
  PG_CONFIG = pg_config
--- 4,11 ----
  OBJS = pg_stat_statements.o
  
  EXTENSION = pg_stat_statements
! DATA = pg_stat_statements--1.2.sql pg_stat_statements--1.1--1.2.sql \
! 	pg_stat_statements--1.0--1.1.sql pg_stat_statements--unpackaged--1.0.sql
  
  ifdef USE_PGXS
  PG_CONFIG = pg_config
diff --git a/contrib/pg_stat_statements/pg_stat_statements--1.1--1.2.sql b/contrib/pg_stat_statements/pg_stat_statements--1.1--1.2.sql
new file mode 100644
index ...7dd6368
*** a/contrib/pg_stat_statements/pg_stat_statements--1.1--1.2.sql
--- b/contrib/pg_stat_statements/pg_stat_statements--1.1--1.2.sql
***************
*** 0 ****
--- 1,43 ----
+ /* contrib/pg_stat_statements/pg_stat_statements--1.1--1.2.sql */
+ 
+ -- complain if script is sourced in psql, rather than via ALTER EXTENSION
+ \echo Use "ALTER EXTENSION pg_stat_statements UPDATE" to load this file. \quit
+ 
+ /* First we have to remove them from the extension */
+ ALTER EXTENSION pg_stat_statements DROP VIEW pg_stat_statements;
+ ALTER EXTENSION pg_stat_statements DROP FUNCTION pg_stat_statements();
+ 
+ /* Then we can drop them */
+ DROP VIEW pg_stat_statements;
+ DROP FUNCTION pg_stat_statements();
+ 
+ /* Now redefine */
+ CREATE FUNCTION pg_stat_statements(
+     OUT userid oid,
+     OUT dbid oid,
+     OUT queryid oid,
+     OUT query text,
+     OUT calls int8,
+     OUT total_time float8,
+     OUT rows int8,
+     OUT shared_blks_hit int8,
+     OUT shared_blks_read int8,
+     OUT shared_blks_dirtied int8,
+     OUT shared_blks_written int8,
+     OUT local_blks_hit int8,
+     OUT local_blks_read int8,
+     OUT local_blks_dirtied int8,
+     OUT local_blks_written int8,
+     OUT temp_blks_read int8,
+     OUT temp_blks_written int8,
+     OUT blk_read_time float8,
+     OUT blk_write_time float8
+ )
+ RETURNS SETOF record
+ AS 'MODULE_PATHNAME'
+ LANGUAGE C;
+ 
+ CREATE VIEW pg_stat_statements AS
+   SELECT * FROM pg_stat_statements();
+ 
+ GRANT SELECT ON pg_stat_statements TO PUBLIC;
diff --git a/contrib/pg_stat_statements/pg_stat_statements--1.1.sql b/contrib/pg_stat_statements/pg_stat_statements--1.1.sql
new file mode .
index 42e4d68..e69de29
*** a/contrib/pg_stat_statements/pg_stat_statements--1.1.sql
--- b/contrib/pg_stat_statements/pg_stat_statements--1.1.sql
***************
*** 1,43 ****
- /* contrib/pg_stat_statements/pg_stat_statements--1.1.sql */
- 
- -- complain if script is sourced in psql, rather than via CREATE EXTENSION
- \echo Use "CREATE EXTENSION pg_stat_statements" to load this file. \quit
- 
- -- Register functions.
- CREATE FUNCTION pg_stat_statements_reset()
- RETURNS void
- AS 'MODULE_PATHNAME'
- LANGUAGE C;
- 
- CREATE FUNCTION pg_stat_statements(
-     OUT userid oid,
-     OUT dbid oid,
-     OUT query text,
-     OUT calls int8,
-     OUT total_time float8,
-     OUT rows int8,
-     OUT shared_blks_hit int8,
-     OUT shared_blks_read int8,
-     OUT shared_blks_dirtied int8,
-     OUT shared_blks_written int8,
-     OUT local_blks_hit int8,
-     OUT local_blks_read int8,
-     OUT local_blks_dirtied int8,
-     OUT local_blks_written int8,
-     OUT temp_blks_read int8,
-     OUT temp_blks_written int8,
-     OUT blk_read_time float8,
-     OUT blk_write_time float8
- )
- RETURNS SETOF record
- AS 'MODULE_PATHNAME'
- LANGUAGE C;
- 
- -- Register a view on the function for ease of use.
- CREATE VIEW pg_stat_statements AS
-   SELECT * FROM pg_stat_statements();
- 
- GRANT SELECT ON pg_stat_statements TO PUBLIC;
- 
- -- Don't want this to be available to non-superusers.
- REVOKE ALL ON FUNCTION pg_stat_statements_reset() FROM PUBLIC;
--- 0 ----
diff --git a/contrib/pg_stat_statements/pg_stat_statements--1.2.sql b/contrib/pg_stat_statements/pg_stat_statements--1.2.sql
new file mode 100644
index ...83a8454
*** a/contrib/pg_stat_statements/pg_stat_statements--1.2.sql
--- b/contrib/pg_stat_statements/pg_stat_statements--1.2.sql
***************
*** 0 ****
--- 1,44 ----
+ /* contrib/pg_stat_statements/pg_stat_statements--1.2.sql */
+ 
+ -- complain if script is sourced in psql, rather than via CREATE EXTENSION
+ \echo Use "CREATE EXTENSION pg_stat_statements" to load this file. \quit
+ 
+ -- Register functions.
+ CREATE FUNCTION pg_stat_statements_reset()
+ RETURNS void
+ AS 'MODULE_PATHNAME'
+ LANGUAGE C;
+ 
+ CREATE FUNCTION pg_stat_statements(
+     OUT userid oid,
+     OUT dbid oid,
+     OUT queryid oid,
+     OUT query text,
+     OUT calls int8,
+     OUT total_time float8,
+     OUT rows int8,
+     OUT shared_blks_hit int8,
+     OUT shared_blks_read int8,
+     OUT shared_blks_dirtied int8,
+     OUT shared_blks_written int8,
+     OUT local_blks_hit int8,
+     OUT local_blks_read int8,
+     OUT local_blks_dirtied int8,
+     OUT local_blks_written int8,
+     OUT temp_blks_read int8,
+     OUT temp_blks_written int8,
+     OUT blk_read_time float8,
+     OUT blk_write_time float8
+ )
+ RETURNS SETOF record
+ AS 'MODULE_PATHNAME'
+ LANGUAGE C;
+ 
+ -- Register a view on the function for ease of use.
+ CREATE VIEW pg_stat_statements AS
+   SELECT * FROM pg_stat_statements();
+ 
+ GRANT SELECT ON pg_stat_statements TO PUBLIC;
+ 
+ -- Don't want this to be available to non-superusers.
+ REVOKE ALL ON FUNCTION pg_stat_statements_reset() FROM PUBLIC;
diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
new file mode 100644
index ea930af..3e6b146
*** a/contrib/pg_stat_statements/pg_stat_statements.c
--- b/contrib/pg_stat_statements/pg_stat_statements.c
*************** PG_MODULE_MAGIC;
*** 67,73 ****
  #define PGSS_DUMP_FILE	"global/pg_stat_statements.stat"
  
  /* This constant defines the magic number in the stats file header */
! static const uint32 PGSS_FILE_HEADER = 0x20120328;
  
  /* XXX: Should USAGE_EXEC reflect execution time and/or buffer usage? */
  #define USAGE_EXEC(duration)	(1.0)
--- 67,75 ----
  #define PGSS_DUMP_FILE	"global/pg_stat_statements.stat"
  
  /* This constant defines the magic number in the stats file header */
! static const uint32 PGSS_FILE_HEADER = 0x20131115;
! /* Postgres major version number, changes in which invalidate all entries */
! static const uint32 PGSS_PG_MAJOR_VERSION = PG_VERSION_NUM / 100;
  
  /* XXX: Should USAGE_EXEC reflect execution time and/or buffer usage? */
  #define USAGE_EXEC(duration)	(1.0)
*************** static const uint32 PGSS_FILE_HEADER = 0
*** 80,85 ****
--- 82,97 ----
  #define JUMBLE_SIZE				1024	/* query serialization buffer size */
  
  /*
+  * Extension version number, for supporting older extension versions' objects
+  */
+ typedef enum pgssVersion
+ {
+ 	PGSS_V1_0,
+ 	PGSS_V1_1,
+ 	PGSS_V1_2
+ } pgssVersion;
+ 
+ /*
   * Hashtable key that defines the identity of a hashtable entry.  We separate
   * queries by user and by database even if they are otherwise identical.
   *
*************** typedef struct pgssHashKey
*** 92,98 ****
  	Oid			userid;			/* user OID */
  	Oid			dbid;			/* database OID */
  	int			encoding;		/* query encoding */
! 	uint32		queryid;		/* query identifier */
  } pgssHashKey;
  
  /*
--- 104,110 ----
  	Oid			userid;			/* user OID */
  	Oid			dbid;			/* database OID */
  	int			encoding;		/* query encoding */
! 	Oid			queryid;		/* query identifier */
  } pgssHashKey;
  
  /*
*************** static void pgss_ProcessUtility(Node *pa
*** 245,251 ****
  static uint32 pgss_hash_fn(const void *key, Size keysize);
  static int	pgss_match_fn(const void *key1, const void *key2, Size keysize);
  static uint32 pgss_hash_string(const char *str);
! static void pgss_store(const char *query, uint32 queryId,
  		   double total_time, uint64 rows,
  		   const BufferUsage *bufusage,
  		   pgssJumbleState *jstate);
--- 257,263 ----
  static uint32 pgss_hash_fn(const void *key, Size keysize);
  static int	pgss_match_fn(const void *key1, const void *key2, Size keysize);
  static uint32 pgss_hash_string(const char *str);
! static void pgss_store(const char *query, Oid queryId,
  		   double total_time, uint64 rows,
  		   const BufferUsage *bufusage,
  		   pgssJumbleState *jstate);
*************** pgss_shmem_startup(void)
*** 390,395 ****
--- 402,408 ----
  	FILE	   *file;
  	uint32		header;
  	int32		num;
+ 	int32		pgver;
  	int32		i;
  	int			query_size;
  	int			buffer_size;
*************** pgss_shmem_startup(void)
*** 465,470 ****
--- 478,485 ----
  
  	if (fread(&header, sizeof(uint32), 1, file) != 1 ||
  		header != PGSS_FILE_HEADER ||
+ 		fread(&pgver, sizeof(uint32), 1, file) != 1 ||
+ 		pgver != PGSS_PG_MAJOR_VERSION ||
  		fread(&num, sizeof(int32), 1, file) != 1)
  		goto error;
  
*************** pgss_shmem_shutdown(int code, Datum arg)
*** 565,570 ****
--- 580,587 ----
  
  	if (fwrite(&PGSS_FILE_HEADER, sizeof(uint32), 1, file) != 1)
  		goto error;
+ 	if (fwrite(&PGSS_PG_MAJOR_VERSION, sizeof(uint32), 1, file) != 1)
+ 		goto error;
  	num_entries = hash_get_num_entries(pgss_hash);
  	if (fwrite(&num_entries, sizeof(int32), 1, file) != 1)
  		goto error;
*************** pgss_ExecutorFinish(QueryDesc *queryDesc
*** 756,762 ****
  static void
  pgss_ExecutorEnd(QueryDesc *queryDesc)
  {
! 	uint32		queryId = queryDesc->plannedstmt->queryId;
  
  	if (queryId != 0 && queryDesc->totaltime && pgss_enabled())
  	{
--- 773,779 ----
  static void
  pgss_ExecutorEnd(QueryDesc *queryDesc)
  {
! 	Oid		queryId = (Oid) queryDesc->plannedstmt->queryId;
  
  	if (queryId != 0 && queryDesc->totaltime && pgss_enabled())
  	{
*************** pgss_ProcessUtility(Node *parsetree, con
*** 809,815 ****
  		uint64		rows = 0;
  		BufferUsage bufusage_start,
  					bufusage;
! 		uint32		queryId;
  
  		bufusage_start = pgBufferUsage;
  		INSTR_TIME_SET_CURRENT(start);
--- 826,832 ----
  		uint64		rows = 0;
  		BufferUsage bufusage_start,
  					bufusage;
! 		Oid			queryId;
  
  		bufusage_start = pgBufferUsage;
  		INSTR_TIME_SET_CURRENT(start);
*************** pgss_hash_string(const char *str)
*** 942,948 ****
   * query string.  total_time, rows, bufusage are ignored in this case.
   */
  static void
! pgss_store(const char *query, uint32 queryId,
  		   double total_time, uint64 rows,
  		   const BufferUsage *bufusage,
  		   pgssJumbleState *jstate)
--- 959,965 ----
   * query string.  total_time, rows, bufusage are ignored in this case.
   */
  static void
! pgss_store(const char *query, Oid queryId,
  		   double total_time, uint64 rows,
  		   const BufferUsage *bufusage,
  		   pgssJumbleState *jstate)
*************** pg_stat_statements_reset(PG_FUNCTION_ARG
*** 1069,1075 ****
  }
  
  #define PG_STAT_STATEMENTS_COLS_V1_0	14
! #define PG_STAT_STATEMENTS_COLS			18
  
  /*
   * Retrieve statement statistics.
--- 1086,1093 ----
  }
  
  #define PG_STAT_STATEMENTS_COLS_V1_0	14
! #define PG_STAT_STATEMENTS_COLS_V1_1	18
! #define PG_STAT_STATEMENTS_COLS			19
  
  /*
   * Retrieve statement statistics.
*************** pg_stat_statements(PG_FUNCTION_ARGS)
*** 1086,1092 ****
  	bool		is_superuser = superuser();
  	HASH_SEQ_STATUS hash_seq;
  	pgssEntry  *entry;
! 	bool		sql_supports_v1_1_counters = true;
  
  	if (!pgss || !pgss_hash)
  		ereport(ERROR,
--- 1104,1110 ----
  	bool		is_superuser = superuser();
  	HASH_SEQ_STATUS hash_seq;
  	pgssEntry  *entry;
! 	pgssVersion detected_version;
  
  	if (!pgss || !pgss_hash)
  		ereport(ERROR,
*************** pg_stat_statements(PG_FUNCTION_ARGS)
*** 1107,1114 ****
  	/* Build a tuple descriptor for our result type */
  	if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE)
  		elog(ERROR, "return type must be a row type");
! 	if (tupdesc->natts == PG_STAT_STATEMENTS_COLS_V1_0)
! 		sql_supports_v1_1_counters = false;
  
  	per_query_ctx = rsinfo->econtext->ecxt_per_query_memory;
  	oldcontext = MemoryContextSwitchTo(per_query_ctx);
--- 1125,1145 ----
  	/* Build a tuple descriptor for our result type */
  	if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE)
  		elog(ERROR, "return type must be a row type");
! 
! 	switch (tupdesc->natts)
! 	{
! 		case PG_STAT_STATEMENTS_COLS_V1_0:
! 			detected_version = PGSS_V1_0;
! 			break;
! 		case PG_STAT_STATEMENTS_COLS_V1_1:
! 			detected_version = PGSS_V1_1;
! 			break;
! 		case PG_STAT_STATEMENTS_COLS:
! 			detected_version = PGSS_V1_2;
! 			break;
! 		default:
! 			elog(ERROR, "pgss version unrecognized from tuple descriptor");
! 	}
  
  	per_query_ctx = rsinfo->econtext->ecxt_per_query_memory;
  	oldcontext = MemoryContextSwitchTo(per_query_ctx);
*************** pg_stat_statements(PG_FUNCTION_ARGS)
*** 1140,1145 ****
--- 1171,1179 ----
  		{
  			char	   *qstr;
  
+ 			if (detected_version >= PGSS_V1_2)
+ 				values[i++] = ObjectIdGetDatum(entry->key.queryid);
+ 
  			qstr = (char *)
  				pg_do_encoding_conversion((unsigned char *) entry->query,
  										  entry->query_len,
*************** pg_stat_statements(PG_FUNCTION_ARGS)
*** 1150,1156 ****
--- 1184,1195 ----
  				pfree(qstr);
  		}
  		else
+ 		{
+ 			if (detected_version >= PGSS_V1_2)
+ 				nulls[i++] = true;
+ 
  			values[i++] = CStringGetTextDatum("<insufficient privilege>");
+ 		}
  
  		/* copy counters to a local variable to keep locking time short */
  		{
*************** pg_stat_statements(PG_FUNCTION_ARGS)
*** 1170,1193 ****
  		values[i++] = Int64GetDatumFast(tmp.rows);
  		values[i++] = Int64GetDatumFast(tmp.shared_blks_hit);
  		values[i++] = Int64GetDatumFast(tmp.shared_blks_read);
! 		if (sql_supports_v1_1_counters)
  			values[i++] = Int64GetDatumFast(tmp.shared_blks_dirtied);
  		values[i++] = Int64GetDatumFast(tmp.shared_blks_written);
  		values[i++] = Int64GetDatumFast(tmp.local_blks_hit);
  		values[i++] = Int64GetDatumFast(tmp.local_blks_read);
! 		if (sql_supports_v1_1_counters)
  			values[i++] = Int64GetDatumFast(tmp.local_blks_dirtied);
  		values[i++] = Int64GetDatumFast(tmp.local_blks_written);
  		values[i++] = Int64GetDatumFast(tmp.temp_blks_read);
  		values[i++] = Int64GetDatumFast(tmp.temp_blks_written);
! 		if (sql_supports_v1_1_counters)
  		{
  			values[i++] = Float8GetDatumFast(tmp.blk_read_time);
  			values[i++] = Float8GetDatumFast(tmp.blk_write_time);
  		}
  
! 		Assert(i == (sql_supports_v1_1_counters ?
! 					 PG_STAT_STATEMENTS_COLS : PG_STAT_STATEMENTS_COLS_V1_0));
  
  		tuplestore_putvalues(tupstore, tupdesc, values, nulls);
  	}
--- 1209,1235 ----
  		values[i++] = Int64GetDatumFast(tmp.rows);
  		values[i++] = Int64GetDatumFast(tmp.shared_blks_hit);
  		values[i++] = Int64GetDatumFast(tmp.shared_blks_read);
! 		if (detected_version >= PGSS_V1_1)
  			values[i++] = Int64GetDatumFast(tmp.shared_blks_dirtied);
  		values[i++] = Int64GetDatumFast(tmp.shared_blks_written);
  		values[i++] = Int64GetDatumFast(tmp.local_blks_hit);
  		values[i++] = Int64GetDatumFast(tmp.local_blks_read);
! 		if (detected_version >= PGSS_V1_1)
  			values[i++] = Int64GetDatumFast(tmp.local_blks_dirtied);
  		values[i++] = Int64GetDatumFast(tmp.local_blks_written);
  		values[i++] = Int64GetDatumFast(tmp.temp_blks_read);
  		values[i++] = Int64GetDatumFast(tmp.temp_blks_written);
! 		if (detected_version >= PGSS_V1_1)
  		{
  			values[i++] = Float8GetDatumFast(tmp.blk_read_time);
  			values[i++] = Float8GetDatumFast(tmp.blk_write_time);
  		}
  
! 		Assert(i == (detected_version == PGSS_V1_0?
! 						 PG_STAT_STATEMENTS_COLS_V1_0:
! 					 detected_version == PGSS_V1_1?
! 						 PG_STAT_STATEMENTS_COLS_V1_1:
! 					 PG_STAT_STATEMENTS_COLS));
  
  		tuplestore_putvalues(tupstore, tupdesc, values, nulls);
  	}
diff --git a/contrib/pg_stat_statements/pg_stat_statements.control b/contrib/pg_stat_statements/pg_stat_statements.control
new file mode 100644
index 428fbb2..6ecf2b6
*** a/contrib/pg_stat_statements/pg_stat_statements.control
--- b/contrib/pg_stat_statements/pg_stat_statements.control
***************
*** 1,5 ****
  # pg_stat_statements extension
  comment = 'track execution statistics of all SQL statements executed'
! default_version = '1.1'
  module_pathname = '$libdir/pg_stat_statements'
  relocatable = true
--- 1,5 ----
  # pg_stat_statements extension
  comment = 'track execution statistics of all SQL statements executed'
! default_version = '1.2'
  module_pathname = '$libdir/pg_stat_statements'
  relocatable = true
diff --git a/doc/src/sgml/pgstatstatements.sgml b/doc/src/sgml/pgstatstatements.sgml
new file mode 100644
index c02fdf4..f54c6a0
*** a/doc/src/sgml/pgstatstatements.sgml
--- b/doc/src/sgml/pgstatstatements.sgml
***************
*** 23,33 ****
    <title>The <structname>pg_stat_statements</structname> View</title>
  
    <para>
!    The statistics gathered by the module are made available via a system view
!    named <structname>pg_stat_statements</>.  This view contains one row for
!    each distinct query, database ID, and user ID (up to the maximum
!    number of distinct statements that the module can track).  The columns
!    of the view are shown in <xref linkend="pgstatstatements-columns">.
    </para>
  
    <table id="pgstatstatements-columns">
--- 23,34 ----
    <title>The <structname>pg_stat_statements</structname> View</title>
  
    <para>
!    The statistics gathered by the module are made available via a
!    system view named <structname>pg_stat_statements</>.  This view
!    contains one row for each distinct database ID, user ID and query
!    ID (up to the maximum number of distinct statements that the module
!    can track).  The columns of the view are shown in
!    <xref linkend="pgstatstatements-columns">.
    </para>
  
    <table id="pgstatstatements-columns">
***************
*** 57,63 ****
        <entry>OID of database in which the statement was executed</entry>
       </row>
  
!     <row>
        <entry><structfield>query</structfield></entry>
        <entry><type>text</type></entry>
        <entry></entry>
--- 58,71 ----
        <entry>OID of database in which the statement was executed</entry>
       </row>
  
!      <row>
!       <entry><structfield>queryid</structfield></entry>
!       <entry><type>oid</type></entry>
!       <entry></entry>
!       <entry>OID of internal identifier, a hash value computed from the entry's observed post-parse-analysis tree</entry>
!      </row>
! 
!      <row>
        <entry><structfield>query</structfield></entry>
        <entry><type>text</type></entry>
        <entry></entry>
***************
*** 189,197 ****
    </para>
  
    <para>
!    For security reasons, non-superusers are not allowed to see the text of
!    queries executed by other users.  They can see the statistics, however,
!    if the view has been installed in their database.
    </para>
  
    <para>
--- 197,206 ----
    </para>
  
    <para>
!    For security reasons, non-superusers are not allowed to see the SQL
!    text or queryid of queries executed by other users.  They can see
!    the statistics, however, if the view has been installed in their
!    database.
    </para>
  
    <para>
***************
*** 209,216 ****
     When a constant's value has been ignored for purposes of matching the
     query to other queries, the constant is replaced by <literal>?</literal>
     in the <structname>pg_stat_statements</> display.  The rest of the query
!    text is that of the first query that had the particular hash value
!    associated with the <structname>pg_stat_statements</> entry.
    </para>
  
    <para>
--- 218,226 ----
     When a constant's value has been ignored for purposes of matching the
     query to other queries, the constant is replaced by <literal>?</literal>
     in the <structname>pg_stat_statements</> display.  The rest of the query
!    text is that of the first query that had the particular
!    <structfield>queryid</> hash value associated with the
!    <structname>pg_stat_statements</> entry.
    </para>
  
    <para>
***************
*** 223,232 ****
    </para>
  
    <para>
!    Since the hash value is computed on the post-parse-analysis representation
!    of the queries, the opposite is also possible: queries with identical texts
!    might appear as separate entries, if they have different meanings as a
!    result of factors such as different <varname>search_path</> settings.
    </para>
   </sect2>
  
--- 233,277 ----
    </para>
  
    <para>
!    Since the <structfield>queryid</> hash value is computed on the
!    post-parse-analysis representation of the queries, the opposite is
!    also possible: queries with identical texts might appear as
!    separate entries, if they have different meanings as a result of
!    factors such as different <varname>search_path</> settings.
!   </para>
! 
!   <para>
!    Consumers of <literal>pg_stat_statements</> may wish to use
!    <structfield>queryid</> (perhaps in composite with
!    <structfield>dbid</> and <structfield>userid</>) as a more stable
!    and reliable identifier for each entry than its query text.
!    However, it is important to understand that there are only limited
!    guarantees around the stability of the <structfield>queryid</> hash
!    value.  Since the identifier is derived from the
!    post-parse-analysis tree, its value is a function of, among other
!    things, the internal identifiers that comprise this representation.
!    This has some counterintuitive implications.  For example, a query
!    against a table that is fingerprinted by
!    <literal>pg_stat_statements</> will appear distinct to a
!    subsequently executed query that a reasonable observer might judge
!    to be a non-distinct, if in the interim the table was dropped and
!    re-created.  The hashing process is sensitive to difference in
!    machine architecture and other facets of the platform.
!    Furthermore, it is not safe to assume that <structfield>queryid</>
!    will be stable across major versions of <productname>PostgreSQL</>.
!   </para>
! 
!   <para>
!    As a rule of thumb, an assumption of the stability or comparability
!    of <structfield>querid</> values should be predicated on the the
!    underlying catalog metadata and hash function implementation
!    details exactly matching.  Any two servers participating in any
!    variety of replication based on physical WAL-replay can be expected
!    to have identical <structfield>querid</> values for the same query.
!    Logical replication schemes do not have replicas comparable in all
!    relevant regards, and so <structfield>querid</> will not be a
!    useful identifier for accumulating costs for the entire replica set
!    in aggregate.  If in doubt, direct testing is recommended.
    </para>
   </sect2>
  
#63Sameer Thakur
samthakur74@gmail.com
In reply to: Peter Geoghegan (#62)
Re: pg_stat_statements: calls under-estimation propagation

I've cleaned this up - revision attached - and marked it "ready for

committer".
Thank you for this.

I did the basic hygiene test. The patch applies correctly and compiles
with no warnings. Did not find anything broken in basic functionality.
In the documentation i have a minor suggestion of replacing phrase "might
judge to be a non-distinct " with ->" may judge to be non- distinct".

regards
Sameer

--
View this message in context: http://postgresql.1045698.n5.nabble.com/pg-stat-statements-calls-under-estimation-propagation-tp5738128p5781577.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.

#64Fujii Masao
masao.fujii@gmail.com
In reply to: Peter Geoghegan (#62)
Re: pg_stat_statements: calls under-estimation propagation

On Sun, Nov 24, 2013 at 10:58 AM, Peter Geoghegan <pg@heroku.com> wrote:

On Mon, Nov 18, 2013 at 1:54 AM, Sameer Thakur <samthakur74@gmail.com> wrote:

Please find v10 of patch attached. This patch addresses following
review comments

I've cleaned this up - revision attached - and marked it "ready for committer".

I decided that queryid should be of type oid, not bigint. This is
arguably a slight abuse of notation, but since ultimately Oids are
just abstract object identifiers (so say the docs), but also because
there is no other convenient, minimal way of representing unsigned
32-bit integers in the view that I'm aware of, I'm inclined to think
that it's appropriate.

There seems to be no problem even if we use bigint as the type of
unsigned 32-bit integer like queryid. For example, txid_current()
returns the transaction ID, i.e., unsigned 32-bit integer, as bigint.
Could you tell me what the problem is when using bigint for queryid?

Regards,

--
Fujii Masao

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#65Tom Lane
tgl@sss.pgh.pa.us
In reply to: Fujii Masao (#64)
Re: pg_stat_statements: calls under-estimation propagation

Fujii Masao <masao.fujii@gmail.com> writes:

On Sun, Nov 24, 2013 at 10:58 AM, Peter Geoghegan <pg@heroku.com> wrote:

I decided that queryid should be of type oid, not bigint. This is
arguably a slight abuse of notation, but since ultimately Oids are
just abstract object identifiers (so say the docs), but also because
there is no other convenient, minimal way of representing unsigned
32-bit integers in the view that I'm aware of, I'm inclined to think
that it's appropriate.

There seems to be no problem even if we use bigint as the type of
unsigned 32-bit integer like queryid. For example, txid_current()
returns the transaction ID, i.e., unsigned 32-bit integer, as bigint.
Could you tell me what the problem is when using bigint for queryid?

We're talking about the output of some view, right, not internal storage?
+1 for using bigint for that. Using OID is definitely an abuse, because
the value *isn't* an OID. And besides, what if we someday decide we need
64-bit keys not 32-bit?

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#66Peter Geoghegan
pg@heroku.com
In reply to: Tom Lane (#65)
1 attachment(s)
Re: pg_stat_statements: calls under-estimation propagation

On Fri, Dec 6, 2013 at 12:24 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

There seems to be no problem even if we use bigint as the type of
unsigned 32-bit integer like queryid. For example, txid_current()
returns the transaction ID, i.e., unsigned 32-bit integer, as bigint.
Could you tell me what the problem is when using bigint for queryid?

We're talking about the output of some view, right, not internal storage?
+1 for using bigint for that. Using OID is definitely an abuse, because
the value *isn't* an OID. And besides, what if we someday decide we need
64-bit keys not 32-bit?

Fair enough. I was concerned about the cost of external storage of
64-bit integers (unlike query text, they might have to be stored many
times for many distinct intervals or something like that), but in
hindsight that was fairly miserly of me.

Attached revision displays signed 64-bit integers instead.

--
Peter Geoghegan

Attachments:

pg_stat_statements-identification-v12.patchtext/x-patch; charset=US-ASCII; name=pg_stat_statements-identification-v12.patchDownload
diff --git a/contrib/pg_stat_statements/Makefile b/contrib/pg_stat_statements/Makefile
new file mode 100644
index e8aed61..95a2767
*** a/contrib/pg_stat_statements/Makefile
--- b/contrib/pg_stat_statements/Makefile
*************** MODULE_big = pg_stat_statements
*** 4,11 ****
  OBJS = pg_stat_statements.o
  
  EXTENSION = pg_stat_statements
! DATA = pg_stat_statements--1.1.sql pg_stat_statements--1.0--1.1.sql \
! 	pg_stat_statements--unpackaged--1.0.sql
  
  ifdef USE_PGXS
  PG_CONFIG = pg_config
--- 4,11 ----
  OBJS = pg_stat_statements.o
  
  EXTENSION = pg_stat_statements
! DATA = pg_stat_statements--1.2.sql pg_stat_statements--1.1--1.2.sql \
! 	pg_stat_statements--1.0--1.1.sql pg_stat_statements--unpackaged--1.0.sql
  
  ifdef USE_PGXS
  PG_CONFIG = pg_config
diff --git a/contrib/pg_stat_statements/pg_stat_statements--1.1--1.2.sql b/contrib/pg_stat_statements/pg_stat_statements--1.1--1.2.sql
new file mode 100644
index ...7824e3e
*** a/contrib/pg_stat_statements/pg_stat_statements--1.1--1.2.sql
--- b/contrib/pg_stat_statements/pg_stat_statements--1.1--1.2.sql
***************
*** 0 ****
--- 1,43 ----
+ /* contrib/pg_stat_statements/pg_stat_statements--1.1--1.2.sql */
+ 
+ -- complain if script is sourced in psql, rather than via ALTER EXTENSION
+ \echo Use "ALTER EXTENSION pg_stat_statements UPDATE" to load this file. \quit
+ 
+ /* First we have to remove them from the extension */
+ ALTER EXTENSION pg_stat_statements DROP VIEW pg_stat_statements;
+ ALTER EXTENSION pg_stat_statements DROP FUNCTION pg_stat_statements();
+ 
+ /* Then we can drop them */
+ DROP VIEW pg_stat_statements;
+ DROP FUNCTION pg_stat_statements();
+ 
+ /* Now redefine */
+ CREATE FUNCTION pg_stat_statements(
+     OUT userid oid,
+     OUT dbid oid,
+     OUT queryid bigint,
+     OUT query text,
+     OUT calls int8,
+     OUT total_time float8,
+     OUT rows int8,
+     OUT shared_blks_hit int8,
+     OUT shared_blks_read int8,
+     OUT shared_blks_dirtied int8,
+     OUT shared_blks_written int8,
+     OUT local_blks_hit int8,
+     OUT local_blks_read int8,
+     OUT local_blks_dirtied int8,
+     OUT local_blks_written int8,
+     OUT temp_blks_read int8,
+     OUT temp_blks_written int8,
+     OUT blk_read_time float8,
+     OUT blk_write_time float8
+ )
+ RETURNS SETOF record
+ AS 'MODULE_PATHNAME'
+ LANGUAGE C;
+ 
+ CREATE VIEW pg_stat_statements AS
+   SELECT * FROM pg_stat_statements();
+ 
+ GRANT SELECT ON pg_stat_statements TO PUBLIC;
diff --git a/contrib/pg_stat_statements/pg_stat_statements--1.1.sql b/contrib/pg_stat_statements/pg_stat_statements--1.1.sql
new file mode .
index 42e4d68..e69de29
*** a/contrib/pg_stat_statements/pg_stat_statements--1.1.sql
--- b/contrib/pg_stat_statements/pg_stat_statements--1.1.sql
***************
*** 1,43 ****
- /* contrib/pg_stat_statements/pg_stat_statements--1.1.sql */
- 
- -- complain if script is sourced in psql, rather than via CREATE EXTENSION
- \echo Use "CREATE EXTENSION pg_stat_statements" to load this file. \quit
- 
- -- Register functions.
- CREATE FUNCTION pg_stat_statements_reset()
- RETURNS void
- AS 'MODULE_PATHNAME'
- LANGUAGE C;
- 
- CREATE FUNCTION pg_stat_statements(
-     OUT userid oid,
-     OUT dbid oid,
-     OUT query text,
-     OUT calls int8,
-     OUT total_time float8,
-     OUT rows int8,
-     OUT shared_blks_hit int8,
-     OUT shared_blks_read int8,
-     OUT shared_blks_dirtied int8,
-     OUT shared_blks_written int8,
-     OUT local_blks_hit int8,
-     OUT local_blks_read int8,
-     OUT local_blks_dirtied int8,
-     OUT local_blks_written int8,
-     OUT temp_blks_read int8,
-     OUT temp_blks_written int8,
-     OUT blk_read_time float8,
-     OUT blk_write_time float8
- )
- RETURNS SETOF record
- AS 'MODULE_PATHNAME'
- LANGUAGE C;
- 
- -- Register a view on the function for ease of use.
- CREATE VIEW pg_stat_statements AS
-   SELECT * FROM pg_stat_statements();
- 
- GRANT SELECT ON pg_stat_statements TO PUBLIC;
- 
- -- Don't want this to be available to non-superusers.
- REVOKE ALL ON FUNCTION pg_stat_statements_reset() FROM PUBLIC;
--- 0 ----
diff --git a/contrib/pg_stat_statements/pg_stat_statements--1.2.sql b/contrib/pg_stat_statements/pg_stat_statements--1.2.sql
new file mode 100644
index ...80b74a1
*** a/contrib/pg_stat_statements/pg_stat_statements--1.2.sql
--- b/contrib/pg_stat_statements/pg_stat_statements--1.2.sql
***************
*** 0 ****
--- 1,44 ----
+ /* contrib/pg_stat_statements/pg_stat_statements--1.2.sql */
+ 
+ -- complain if script is sourced in psql, rather than via CREATE EXTENSION
+ \echo Use "CREATE EXTENSION pg_stat_statements" to load this file. \quit
+ 
+ -- Register functions.
+ CREATE FUNCTION pg_stat_statements_reset()
+ RETURNS void
+ AS 'MODULE_PATHNAME'
+ LANGUAGE C;
+ 
+ CREATE FUNCTION pg_stat_statements(
+     OUT userid oid,
+     OUT dbid oid,
+     OUT queryid bigint,
+     OUT query text,
+     OUT calls int8,
+     OUT total_time float8,
+     OUT rows int8,
+     OUT shared_blks_hit int8,
+     OUT shared_blks_read int8,
+     OUT shared_blks_dirtied int8,
+     OUT shared_blks_written int8,
+     OUT local_blks_hit int8,
+     OUT local_blks_read int8,
+     OUT local_blks_dirtied int8,
+     OUT local_blks_written int8,
+     OUT temp_blks_read int8,
+     OUT temp_blks_written int8,
+     OUT blk_read_time float8,
+     OUT blk_write_time float8
+ )
+ RETURNS SETOF record
+ AS 'MODULE_PATHNAME'
+ LANGUAGE C;
+ 
+ -- Register a view on the function for ease of use.
+ CREATE VIEW pg_stat_statements AS
+   SELECT * FROM pg_stat_statements();
+ 
+ GRANT SELECT ON pg_stat_statements TO PUBLIC;
+ 
+ -- Don't want this to be available to non-superusers.
+ REVOKE ALL ON FUNCTION pg_stat_statements_reset() FROM PUBLIC;
diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
new file mode 100644
index 12322b8..5890109
*** a/contrib/pg_stat_statements/pg_stat_statements.c
--- b/contrib/pg_stat_statements/pg_stat_statements.c
*************** PG_MODULE_MAGIC;
*** 67,73 ****
  #define PGSS_DUMP_FILE	"global/pg_stat_statements.stat"
  
  /* This constant defines the magic number in the stats file header */
! static const uint32 PGSS_FILE_HEADER = 0x20120328;
  
  /* XXX: Should USAGE_EXEC reflect execution time and/or buffer usage? */
  #define USAGE_EXEC(duration)	(1.0)
--- 67,75 ----
  #define PGSS_DUMP_FILE	"global/pg_stat_statements.stat"
  
  /* This constant defines the magic number in the stats file header */
! static const uint32 PGSS_FILE_HEADER = 0x20131115;
! /* Postgres major version number, changes in which invalidate all entries */
! static const uint32 PGSS_PG_MAJOR_VERSION = PG_VERSION_NUM / 100;
  
  /* XXX: Should USAGE_EXEC reflect execution time and/or buffer usage? */
  #define USAGE_EXEC(duration)	(1.0)
*************** static const uint32 PGSS_FILE_HEADER = 0
*** 80,85 ****
--- 82,97 ----
  #define JUMBLE_SIZE				1024	/* query serialization buffer size */
  
  /*
+  * Extension version number, for supporting older extension versions' objects
+  */
+ typedef enum pgssVersion
+ {
+ 	PGSS_V1_0 = 0,
+ 	PGSS_V1_1,
+ 	PGSS_V1_2
+ } pgssVersion;
+ 
+ /*
   * Hashtable key that defines the identity of a hashtable entry.  We separate
   * queries by user and by database even if they are otherwise identical.
   *
*************** pgss_shmem_startup(void)
*** 390,395 ****
--- 402,408 ----
  	FILE	   *file;
  	uint32		header;
  	int32		num;
+ 	int32		pgver;
  	int32		i;
  	int			query_size;
  	int			buffer_size;
*************** pgss_shmem_startup(void)
*** 465,470 ****
--- 478,485 ----
  
  	if (fread(&header, sizeof(uint32), 1, file) != 1 ||
  		header != PGSS_FILE_HEADER ||
+ 		fread(&pgver, sizeof(uint32), 1, file) != 1 ||
+ 		pgver != PGSS_PG_MAJOR_VERSION ||
  		fread(&num, sizeof(int32), 1, file) != 1)
  		goto error;
  
*************** pgss_shmem_shutdown(int code, Datum arg)
*** 565,570 ****
--- 580,587 ----
  
  	if (fwrite(&PGSS_FILE_HEADER, sizeof(uint32), 1, file) != 1)
  		goto error;
+ 	if (fwrite(&PGSS_PG_MAJOR_VERSION, sizeof(uint32), 1, file) != 1)
+ 		goto error;
  	num_entries = hash_get_num_entries(pgss_hash);
  	if (fwrite(&num_entries, sizeof(int32), 1, file) != 1)
  		goto error;
*************** pg_stat_statements_reset(PG_FUNCTION_ARG
*** 1069,1075 ****
  }
  
  #define PG_STAT_STATEMENTS_COLS_V1_0	14
! #define PG_STAT_STATEMENTS_COLS			18
  
  /*
   * Retrieve statement statistics.
--- 1086,1093 ----
  }
  
  #define PG_STAT_STATEMENTS_COLS_V1_0	14
! #define PG_STAT_STATEMENTS_COLS_V1_1	18
! #define PG_STAT_STATEMENTS_COLS			19
  
  /*
   * Retrieve statement statistics.
*************** pg_stat_statements(PG_FUNCTION_ARGS)
*** 1086,1092 ****
  	bool		is_superuser = superuser();
  	HASH_SEQ_STATUS hash_seq;
  	pgssEntry  *entry;
! 	bool		sql_supports_v1_1_counters = true;
  
  	if (!pgss || !pgss_hash)
  		ereport(ERROR,
--- 1104,1110 ----
  	bool		is_superuser = superuser();
  	HASH_SEQ_STATUS hash_seq;
  	pgssEntry  *entry;
! 	pgssVersion detected_version;
  
  	if (!pgss || !pgss_hash)
  		ereport(ERROR,
*************** pg_stat_statements(PG_FUNCTION_ARGS)
*** 1107,1114 ****
  	/* Build a tuple descriptor for our result type */
  	if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE)
  		elog(ERROR, "return type must be a row type");
! 	if (tupdesc->natts == PG_STAT_STATEMENTS_COLS_V1_0)
! 		sql_supports_v1_1_counters = false;
  
  	per_query_ctx = rsinfo->econtext->ecxt_per_query_memory;
  	oldcontext = MemoryContextSwitchTo(per_query_ctx);
--- 1125,1145 ----
  	/* Build a tuple descriptor for our result type */
  	if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE)
  		elog(ERROR, "return type must be a row type");
! 
! 	switch (tupdesc->natts)
! 	{
! 		case PG_STAT_STATEMENTS_COLS_V1_0:
! 			detected_version = PGSS_V1_0;
! 			break;
! 		case PG_STAT_STATEMENTS_COLS_V1_1:
! 			detected_version = PGSS_V1_1;
! 			break;
! 		case PG_STAT_STATEMENTS_COLS:
! 			detected_version = PGSS_V1_2;
! 			break;
! 		default:
! 			elog(ERROR, "pgss version unrecognized from tuple descriptor");
! 	}
  
  	per_query_ctx = rsinfo->econtext->ecxt_per_query_memory;
  	oldcontext = MemoryContextSwitchTo(per_query_ctx);
*************** pg_stat_statements(PG_FUNCTION_ARGS)
*** 1140,1145 ****
--- 1171,1179 ----
  		{
  			char	   *qstr;
  
+ 			if (detected_version >= PGSS_V1_2)
+ 				values[i++] = Int64GetDatumFast((int64) entry->key.queryid);
+ 
  			qstr = (char *)
  				pg_do_encoding_conversion((unsigned char *) entry->query,
  										  entry->query_len,
*************** pg_stat_statements(PG_FUNCTION_ARGS)
*** 1150,1156 ****
--- 1184,1195 ----
  				pfree(qstr);
  		}
  		else
+ 		{
+ 			if (detected_version >= PGSS_V1_2)
+ 				nulls[i++] = true;
+ 
  			values[i++] = CStringGetTextDatum("<insufficient privilege>");
+ 		}
  
  		/* copy counters to a local variable to keep locking time short */
  		{
*************** pg_stat_statements(PG_FUNCTION_ARGS)
*** 1170,1193 ****
  		values[i++] = Int64GetDatumFast(tmp.rows);
  		values[i++] = Int64GetDatumFast(tmp.shared_blks_hit);
  		values[i++] = Int64GetDatumFast(tmp.shared_blks_read);
! 		if (sql_supports_v1_1_counters)
  			values[i++] = Int64GetDatumFast(tmp.shared_blks_dirtied);
  		values[i++] = Int64GetDatumFast(tmp.shared_blks_written);
  		values[i++] = Int64GetDatumFast(tmp.local_blks_hit);
  		values[i++] = Int64GetDatumFast(tmp.local_blks_read);
! 		if (sql_supports_v1_1_counters)
  			values[i++] = Int64GetDatumFast(tmp.local_blks_dirtied);
  		values[i++] = Int64GetDatumFast(tmp.local_blks_written);
  		values[i++] = Int64GetDatumFast(tmp.temp_blks_read);
  		values[i++] = Int64GetDatumFast(tmp.temp_blks_written);
! 		if (sql_supports_v1_1_counters)
  		{
  			values[i++] = Float8GetDatumFast(tmp.blk_read_time);
  			values[i++] = Float8GetDatumFast(tmp.blk_write_time);
  		}
  
! 		Assert(i == (sql_supports_v1_1_counters ?
! 					 PG_STAT_STATEMENTS_COLS : PG_STAT_STATEMENTS_COLS_V1_0));
  
  		tuplestore_putvalues(tupstore, tupdesc, values, nulls);
  	}
--- 1209,1235 ----
  		values[i++] = Int64GetDatumFast(tmp.rows);
  		values[i++] = Int64GetDatumFast(tmp.shared_blks_hit);
  		values[i++] = Int64GetDatumFast(tmp.shared_blks_read);
! 		if (detected_version >= PGSS_V1_1)
  			values[i++] = Int64GetDatumFast(tmp.shared_blks_dirtied);
  		values[i++] = Int64GetDatumFast(tmp.shared_blks_written);
  		values[i++] = Int64GetDatumFast(tmp.local_blks_hit);
  		values[i++] = Int64GetDatumFast(tmp.local_blks_read);
! 		if (detected_version >= PGSS_V1_1)
  			values[i++] = Int64GetDatumFast(tmp.local_blks_dirtied);
  		values[i++] = Int64GetDatumFast(tmp.local_blks_written);
  		values[i++] = Int64GetDatumFast(tmp.temp_blks_read);
  		values[i++] = Int64GetDatumFast(tmp.temp_blks_written);
! 		if (detected_version >= PGSS_V1_1)
  		{
  			values[i++] = Float8GetDatumFast(tmp.blk_read_time);
  			values[i++] = Float8GetDatumFast(tmp.blk_write_time);
  		}
  
! 		Assert(i == (detected_version == PGSS_V1_0?
! 						 PG_STAT_STATEMENTS_COLS_V1_0:
! 					 detected_version == PGSS_V1_1?
! 						 PG_STAT_STATEMENTS_COLS_V1_1:
! 					 PG_STAT_STATEMENTS_COLS));
  
  		tuplestore_putvalues(tupstore, tupdesc, values, nulls);
  	}
diff --git a/contrib/pg_stat_statements/pg_stat_statements.control b/contrib/pg_stat_statements/pg_stat_statements.control
new file mode 100644
index 428fbb2..6ecf2b6
*** a/contrib/pg_stat_statements/pg_stat_statements.control
--- b/contrib/pg_stat_statements/pg_stat_statements.control
***************
*** 1,5 ****
  # pg_stat_statements extension
  comment = 'track execution statistics of all SQL statements executed'
! default_version = '1.1'
  module_pathname = '$libdir/pg_stat_statements'
  relocatable = true
--- 1,5 ----
  # pg_stat_statements extension
  comment = 'track execution statistics of all SQL statements executed'
! default_version = '1.2'
  module_pathname = '$libdir/pg_stat_statements'
  relocatable = true
diff --git a/doc/src/sgml/pgstatstatements.sgml b/doc/src/sgml/pgstatstatements.sgml
new file mode 100644
index c02fdf4..c607710
*** a/doc/src/sgml/pgstatstatements.sgml
--- b/doc/src/sgml/pgstatstatements.sgml
***************
*** 23,33 ****
    <title>The <structname>pg_stat_statements</structname> View</title>
  
    <para>
!    The statistics gathered by the module are made available via a system view
!    named <structname>pg_stat_statements</>.  This view contains one row for
!    each distinct query, database ID, and user ID (up to the maximum
!    number of distinct statements that the module can track).  The columns
!    of the view are shown in <xref linkend="pgstatstatements-columns">.
    </para>
  
    <table id="pgstatstatements-columns">
--- 23,34 ----
    <title>The <structname>pg_stat_statements</structname> View</title>
  
    <para>
!    The statistics gathered by the module are made available via a
!    system view named <structname>pg_stat_statements</>.  This view
!    contains one row for each distinct database ID, user ID and query
!    ID (up to the maximum number of distinct statements that the module
!    can track).  The columns of the view are shown in
!    <xref linkend="pgstatstatements-columns">.
    </para>
  
    <table id="pgstatstatements-columns">
***************
*** 57,63 ****
        <entry>OID of database in which the statement was executed</entry>
       </row>
  
!     <row>
        <entry><structfield>query</structfield></entry>
        <entry><type>text</type></entry>
        <entry></entry>
--- 58,71 ----
        <entry>OID of database in which the statement was executed</entry>
       </row>
  
!      <row>
!       <entry><structfield>queryid</structfield></entry>
!       <entry><type>bigint</type></entry>
!       <entry></entry>
!       <entry>Internal hash identifier, computed from the entry's post-parse-analysis tree</entry>
!      </row>
! 
!      <row>
        <entry><structfield>query</structfield></entry>
        <entry><type>text</type></entry>
        <entry></entry>
***************
*** 189,197 ****
    </para>
  
    <para>
!    For security reasons, non-superusers are not allowed to see the text of
!    queries executed by other users.  They can see the statistics, however,
!    if the view has been installed in their database.
    </para>
  
    <para>
--- 197,206 ----
    </para>
  
    <para>
!    For security reasons, non-superusers are not allowed to see the SQL
!    text or queryid of queries executed by other users.  They can see
!    the statistics, however, if the view has been installed in their
!    database.
    </para>
  
    <para>
***************
*** 209,216 ****
     When a constant's value has been ignored for purposes of matching the
     query to other queries, the constant is replaced by <literal>?</literal>
     in the <structname>pg_stat_statements</> display.  The rest of the query
!    text is that of the first query that had the particular hash value
!    associated with the <structname>pg_stat_statements</> entry.
    </para>
  
    <para>
--- 218,226 ----
     When a constant's value has been ignored for purposes of matching the
     query to other queries, the constant is replaced by <literal>?</literal>
     in the <structname>pg_stat_statements</> display.  The rest of the query
!    text is that of the first query that had the particular
!    <structfield>queryid</> hash value associated with the
!    <structname>pg_stat_statements</> entry.
    </para>
  
    <para>
***************
*** 223,232 ****
    </para>
  
    <para>
!    Since the hash value is computed on the post-parse-analysis representation
!    of the queries, the opposite is also possible: queries with identical texts
!    might appear as separate entries, if they have different meanings as a
!    result of factors such as different <varname>search_path</> settings.
    </para>
   </sect2>
  
--- 233,277 ----
    </para>
  
    <para>
!    Since the <structfield>queryid</> hash value is computed on the
!    post-parse-analysis representation of the queries, the opposite is
!    also possible: queries with identical texts might appear as
!    separate entries, if they have different meanings as a result of
!    factors such as different <varname>search_path</> settings.
!   </para>
! 
!   <para>
!    Consumers of <literal>pg_stat_statements</> may wish to use
!    <structfield>queryid</> (perhaps in composite with
!    <structfield>dbid</> and <structfield>userid</>) as a more stable
!    and reliable identifier for each entry than its query text.
!    However, it is important to understand that there are only limited
!    guarantees around the stability of the <structfield>queryid</> hash
!    value.  Since the identifier is derived from the
!    post-parse-analysis tree, its value is a function of, among other
!    things, the internal identifiers that comprise this representation.
!    This has some counterintuitive implications.  For example, a query
!    against a table that is fingerprinted by
!    <literal>pg_stat_statements</> will appear distinct to a
!    subsequently executed query that a reasonable observer might judge
!    to be a non-distinct, if in the interim the table was dropped and
!    re-created.  The hashing process is sensitive to difference in
!    machine architecture and other facets of the platform.
!    Furthermore, it is not safe to assume that <structfield>queryid</>
!    will be stable across major versions of <productname>PostgreSQL</>.
!   </para>
! 
!   <para>
!    As a rule of thumb, an assumption of the stability or comparability
!    of <structfield>querid</> values should be predicated on the the
!    underlying catalog metadata and hash function implementation
!    details exactly matching.  Any two servers participating in any
!    variety of replication based on physical WAL-replay can be expected
!    to have identical <structfield>querid</> values for the same query.
!    Logical replication schemes do not have replicas comparable in all
!    relevant regards, and so <structfield>querid</> will not be a
!    useful identifier for accumulating costs for the entire replica
!    set.  If in doubt, direct testing is recommended.
    </para>
   </sect2>
  
#67Fujii Masao
masao.fujii@gmail.com
In reply to: Peter Geoghegan (#66)
Re: pg_stat_statements: calls under-estimation propagation

On Sat, Dec 7, 2013 at 6:31 AM, Peter Geoghegan <pg@heroku.com> wrote:

On Fri, Dec 6, 2013 at 12:24 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

There seems to be no problem even if we use bigint as the type of
unsigned 32-bit integer like queryid. For example, txid_current()
returns the transaction ID, i.e., unsigned 32-bit integer, as bigint.
Could you tell me what the problem is when using bigint for queryid?

We're talking about the output of some view, right, not internal storage?
+1 for using bigint for that. Using OID is definitely an abuse, because
the value *isn't* an OID. And besides, what if we someday decide we need
64-bit keys not 32-bit?

Fair enough. I was concerned about the cost of external storage of
64-bit integers (unlike query text, they might have to be stored many
times for many distinct intervals or something like that), but in
hindsight that was fairly miserly of me.

Attached revision displays signed 64-bit integers instead.

Thanks! Looks good to me. Committed!

Regards,

--
Fujii Masao

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#68Peter Eisentraut
peter_e@gmx.net
In reply to: Fujii Masao (#67)
Re: pg_stat_statements: calls under-estimation propagation

On Sun, 2013-12-08 at 02:08 +0900, Fujii Masao wrote:

Attached revision displays signed 64-bit integers instead.

Thanks! Looks good to me. Committed!

32-bit buildfarm members are having problems with this patch.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#69Peter Geoghegan
pg@heroku.com
In reply to: Peter Eisentraut (#68)
1 attachment(s)
Re: pg_stat_statements: calls under-estimation propagation

On Sat, Dec 7, 2013 at 3:50 PM, Peter Eisentraut <peter_e@gmx.net> wrote:

32-bit buildfarm members are having problems with this patch.

This should fix that problem. Thanks.

--
Peter Geoghegan

Attachments:

fix_32bit_pgss.patchtext/x-patch; charset=US-ASCII; name=fix_32bit_pgss.patchDownload
diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
new file mode 100644
index 4e262b4..9f3e376
*** a/contrib/pg_stat_statements/pg_stat_statements.c
--- b/contrib/pg_stat_statements/pg_stat_statements.c
*************** pg_stat_statements(PG_FUNCTION_ARGS)
*** 1160,1165 ****
--- 1160,1166 ----
  		bool		nulls[PG_STAT_STATEMENTS_COLS];
  		int			i = 0;
  		Counters	tmp;
+ 		int64		queryid = entry->key.queryid;
  
  		memset(values, 0, sizeof(values));
  		memset(nulls, 0, sizeof(nulls));
*************** pg_stat_statements(PG_FUNCTION_ARGS)
*** 1172,1178 ****
  			char	   *qstr;
  
  			if (detected_version >= PGSS_V1_2)
! 				values[i++] = Int64GetDatumFast((int64) entry->key.queryid);
  
  			qstr = (char *)
  				pg_do_encoding_conversion((unsigned char *) entry->query,
--- 1173,1179 ----
  			char	   *qstr;
  
  			if (detected_version >= PGSS_V1_2)
! 				values[i++] = Int64GetDatumFast(queryid);
  
  			qstr = (char *)
  				pg_do_encoding_conversion((unsigned char *) entry->query,
#70Magnus Hagander
magnus@hagander.net
In reply to: Peter Geoghegan (#69)
Re: pg_stat_statements: calls under-estimation propagation

On Sun, Dec 8, 2013 at 1:00 AM, Peter Geoghegan <pg@heroku.com> wrote:

On Sat, Dec 7, 2013 at 3:50 PM, Peter Eisentraut <peter_e@gmx.net> wrote:

32-bit buildfarm members are having problems with this patch.

This should fix that problem. Thanks.

Applied.

I also noted on
http://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=frogmouth&amp;dt=2013-12-08%2007%3A30%3A01&amp;stg=make-contrib
that
there are compiler warnings being generated in pgss. But from a quick look
that looks like something pre-existing and not caused by the latest patch.

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/

#71Peter Eisentraut
peter_e@gmx.net
In reply to: Peter Geoghegan (#69)
Re: pg_stat_statements: calls under-estimation propagation

On Sat, 2013-12-07 at 16:00 -0800, Peter Geoghegan wrote:

On Sat, Dec 7, 2013 at 3:50 PM, Peter Eisentraut <peter_e@gmx.net> wrote:

32-bit buildfarm members are having problems with this patch.

This should fix that problem. Thanks.

This is incidentally the same problem that was reported here about
another pg_stat_statements patch:
/messages/by-id/BF2827DCCE55594C8D7A8F7FFD3AB7713DDAC54F@SZXEML508-MBX.china.huawei.com

Can we make this more robust so that we don't accidentally keep breaking
32-bit systems? Maybe a static assert?

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers