pg_stat_statements: calls under-estimation propagation

Started by Daniel Farinaover 13 years ago71 messageshackers
Jump to latest
#1Daniel Farina
drfarina@acm.org

Hello,

After long delay (sorry) here's a patch implementing what was
hand-waved at in
http://archives.postgresql.org/pgsql-hackers/2012-10/msg00176.php

I am still something at a loss at how to test it besides prodding it
by hand; it seems like it's going to involve infrastructure or
introducing hooks into pg_stat_statements for the express purpose.

The patch can also be sourced from:

https://github.com/fdr/postgres.git error-prop-pg_stat_statements

Without further ado, the cover letter taken from the top of the patch:

This tries to establish a maximum under-estimate of the number of
calls for a given pg_stat_statements entry. That means the number of
calls to the canonical form of the query is between 'calls' and 'calls
+ calls_underest'.

This is useful to determine when accumulating statistics if a given
record is bouncing in and out of the pg_stat_statements table, having
its ncalls reset all the time, but also having calls_underest grow
very rapidly.

Records that always stay in pg_stat_statements will have a
calls_underest that do not change at all.

An interesting case is when a query that usually is called is not
called for a while, and falls out of pg_stat_statements. The result
can be that the query with the most 'calls' can also have more
uncertainty than the query with the second most calls, which is also
exactly the truth in reality.

Unceremoniously bundled into this patch is a reduction in the minimum
table size for pg_stat_statements, from 100 to 1. Using tiny values
is not likely to be seen in production, but makes testing the patch a
lot easier in some situations.

I will add this to the commitfest.

--
fdr

Attachments:

add-pg_stat_statements-calls-underestimation-v1.patchapplication/octet-stream; name=add-pg_stat_statements-calls-underestimation-v1.patchDownload+128-6
In reply to: Daniel Farina (#1)
Re: pg_stat_statements: calls under-estimation propagation

On 28 December 2012 11:43, Daniel Farina <drfarina@acm.org> wrote:

Without further ado, the cover letter taken from the top of the patch:

This tries to establish a maximum under-estimate of the number of
calls for a given pg_stat_statements entry. That means the number of
calls to the canonical form of the query is between 'calls' and 'calls
+ calls_underest'.

Cool.

One possible issue I see with this is that this code:

+
+ 		/* propagate calls under-estimation bound */
+ 		entry->counters.calls_underest = pgss->calls_max_underest;
+

which appears within entry_alloc(). So if the first execution of the
query results in an error during planning or (much more likely)
execution, as in the event of an integrity constraint violation,
you're going to get a dead sticky entry that no one will ever see.
Now, we currently decay sticky entries much more aggressively, so the
mere fact that we waste an entry isn't a real problem. This was
established by this commit:

http://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=d5375491f8e391224b48e4bb449995a4642183ea

However, with this approach, calls_underest values might appear to the
user in what might be considered an astonishing order. Now, I'm not
suggesting that that's a real problem - just that they may not be the
semantics we want, particularly as we can reasonably defer assigning a
calls_underest until a sticky entry is "unstuck", and an entry becomes
user-visible, within pgss_store().

Also, it seems like you should initialise pgss->calls_max_underest,
within pgss_shmem_startup(). You should probably serialise the value
to disk, and initialise it to 0 if there is no such value to begin
with.

I wonder if the way that pg_stat_statements throws its hands up when
it comes to crash safety (i.e. the contents of the hash table are
completely lost) could be a concern here. In other words, a program
tasked with tracking execution costs over time and graphing
time-series data from snapshots has to take responsibility for
ensuring that there hasn't been a crash (or, indeed, a reset).

Another issue is that I don't think that what you've done here solves
the problem of uniquely identify each entry over time, in the same way
that simply exposing the hash value would. I'm concerned with the
related problem to the problem solved here - simply identifying the
entry uniquely. As we've already discussed, the query string is an
imperfect proxy for each entry, even with constants swapped with '?'
characters (and even when combined with the userid and dbid values -
it's still not the same as the hashtable key, in a way that is likely
to bother people that use pg_stat_statements for long enough).

I think you probably should have created a
PG_STAT_STATEMENTS_COLS_V1_1 macro, since that version of the module
is now legacy, like *V1_0 is in HEAD.

--
Peter Geoghegan http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#3Daniel Farina
drfarina@acm.org
In reply to: Peter Geoghegan (#2)
Re: pg_stat_statements: calls under-estimation propagation

Attached is a cumulative patch attempting to address the below. One
can see the deltas to get there at https://github.com/fdr/postgres.git
error-prop-pg_stat_statements-v2.

On Fri, Dec 28, 2012 at 9:58 AM, Peter Geoghegan <peter@2ndquadrant.com> wrote:

However, with this approach, calls_underest values might appear to the
user in what might be considered an astonishing order. Now, I'm not
suggesting that that's a real problem - just that they may not be the
semantics we want, particularly as we can reasonably defer assigning a
calls_underest until a sticky entry is "unstuck", and an entry becomes
user-visible, within pgss_store().

Fix for this as I understand it:

*** a/contrib/pg_stat_statements/pg_stat_statements.c
--- b/contrib/pg_stat_statements/pg_stat_statements.c
***************
*** 1036,1041 **** pgss_store(const char *query, uint32 queryId,
--- 1036,1042 ----
  			e->counters.usage = USAGE_INIT;

e->counters.calls += 1;
+ e->counters.calls_underest = pgss->calls_max_underest;
e->counters.total_time += total_time;
e->counters.rows += rows;
e->counters.shared_blks_hit += bufusage->shared_blks_hit;
***************
*** 1264,1272 **** entry_alloc(pgssHashKey *key, const char *query,
int query_len, bool sticky)
/* set the appropriate initial usage count */
entry->counters.usage = sticky ? pgss->cur_median_usage : USAGE_INIT;

- 		/* propagate calls under-estimation bound */
- 		entry->counters.calls_underest = pgss->calls_max_underest;
-
  		/* re-initialize the mutex each time ... we assume no one using it */
  		SpinLockInit(&entry->mutex);
  		/* ... and don't forget the query text */
--- 1265,1270 ----

Also, it seems like you should initialise pgss->calls_max_underest,
within pgss_shmem_startup().

Easy enough. Somehow I wrongly thought zero-initialization was a thing
for the shmem functions.

*** a/contrib/pg_stat_statements/pg_stat_statements.c
--- b/contrib/pg_stat_statements/pg_stat_statements.c
***************
*** 426,431 **** pgss_shmem_startup(void)
--- 426,432 ----
  	{
  		/* First time through ... */
  		pgss->lock = LWLockAssign();
+ 		pgss->calls_max_underest = 0;
  		pgss->query_size = pgstat_track_activity_query_size;
  		pgss->cur_median_usage = ASSUMED_MEDIAN_INIT;
  	}

You should probably serialise the value
to disk, and initialise it to 0 if there is no such value to begin
with.

I prefer different approach here: just compute it while loading the
entries from disk, since the calls + underestimation can be used to
find a new pessimum underestimation global value.

*** a/contrib/pg_stat_statements/pg_stat_statements.c
--- b/contrib/pg_stat_statements/pg_stat_statements.c
***************
*** 419,424 **** pgss_shmem_startup(void)
--- 419,425 ----
  	int			query_size;
  	int			buffer_size;
  	char	   *buffer = NULL;
+ 	int64		calls_max_underest = 0;
  	if (prev_shmem_startup_hook)
  		prev_shmem_startup_hook();
***************
*** 440,446 **** pgss_shmem_startup(void)
  	{
  		/* First time through ... */
  		pgss->lock = LWLockAssign();
! 		pgss->calls_max_underest = 0;
  		pgss->query_size = pgstat_track_activity_query_size;
  		pgss->cur_median_usage = ASSUMED_MEDIAN_INIT;
  	}
--- 441,447 ----
  	{
  		/* First time through ... */
  		pgss->lock = LWLockAssign();
! 		pgss->calls_max_underest = calls_max_underest;
  		pgss->query_size = pgstat_track_activity_query_size;
  		pgss->cur_median_usage = ASSUMED_MEDIAN_INIT;
  	}
***************
*** 528,533 **** pgss_shmem_startup(void)
--- 529,545 ----
  												   temp.query_len,
  												   query_size - 1);
+ 		/*
+ 		 * Compute maxima of under-estimation over the read entries
+ 		 * for reinitializing pgss->calls_max_underest.
+ 		 */
+ 		{
+ 			int64 cur_underest;
+ 				
+ 			cur_underest = temp.calls + temp.calls_underest;
+ 			calls_max_underest = Max(calls_max_underest, cur_underest);
+ 		}
+
  		/* make the hashtable entry (discards old entries if too many) */
  		entry = entry_alloc(&temp.key, buffer, temp.query_len, false);
***************
*** 535,540 **** pgss_shmem_startup(void)
--- 547,559 ----
  		entry->counters = temp.counters;
  	}
+ 	/*
+ 	 * Reinitialize global under-estimation information from the
+ 	 * computed maxima, if any.  Otherwise, calls_max_underest should
+ 	 * be zero.
+ 	 */
+ 	pgss->calls_max_underest = calls_max_underest;
+
  	pfree(buffer);
  	FreeFile(file);

I think you probably should have created a
PG_STAT_STATEMENTS_COLS_V1_1 macro, since that version of the module
is now legacy, like *V1_0 is in HEAD.

Indeed. Prepare to scroll, this change is not very complex but a bit
more bloated looking:

--- b/contrib/pg_stat_statements/pg_stat_statements.c
***************
*** 96,101 **** typedef struct pgssHashKey
--- 96,115 ----
  } pgssHashKey;
  /*
+  * Identifies the tuple format detected by pg_stat_statements.
+  *
+  * Used to identify features of newer formats and enable smooth
+  * upgrades: one can install a new pg_stat_statements binary while
+  * running with the old SQL function definitions.
+  */
+ typedef enum pgssTupVersion
+ {
+ 	PGSS_TUP_V1_0 = 1,
+ 	PGSS_TUP_V1_1,
+ 	PGSS_TUP_LATEST
+ } pgssTupVersion;
+
+ /*
   * The actual stats counters kept within pgssEntry.
   */
  typedef struct Counters
***************
*** 1078,1083 **** pg_stat_statements_reset(PG_FUNCTION_ARGS)
--- 1092,1098 ----
  }

#define PG_STAT_STATEMENTS_COLS_V1_0 14
+ #define PG_STAT_STATEMENTS_COLS_V1_1 18
#define PG_STAT_STATEMENTS_COLS 19

/*
***************
*** 1095,1101 **** pg_stat_statements(PG_FUNCTION_ARGS)
bool is_superuser = superuser();
HASH_SEQ_STATUS hash_seq;
pgssEntry *entry;
! bool sql_supports_v1_1_counters = true;

  	if (!pgss || !pgss_hash)
  		ereport(ERROR,
--- 1110,1116 ----
  	bool		is_superuser = superuser();
  	HASH_SEQ_STATUS hash_seq;
  	pgssEntry  *entry;
! 	pgssTupVersion detected_version;

if (!pgss || !pgss_hash)
ereport(ERROR,
***************
*** 1116,1123 **** pg_stat_statements(PG_FUNCTION_ARGS)
/* Build a tuple descriptor for our result type */
if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE)
elog(ERROR, "return type must be a row type");
if (tupdesc->natts == PG_STAT_STATEMENTS_COLS_V1_0)
! sql_supports_v1_1_counters = false;

  	per_query_ctx = rsinfo->econtext->ecxt_per_query_memory;
  	oldcontext = MemoryContextSwitchTo(per_query_ctx);
--- 1131,1158 ----
  	/* Build a tuple descriptor for our result type */
  	if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE)
  		elog(ERROR, "return type must be a row type");
+
+ 	/* Perform version detection */
  	if (tupdesc->natts == PG_STAT_STATEMENTS_COLS_V1_0)
! 		detected_version = PGSS_TUP_V1_0;
! 	else if (tupdesc->natts == PG_STAT_STATEMENTS_COLS_V1_1)
! 		detected_version = PGSS_TUP_V1_1;
! 	else if (tupdesc->natts == PG_STAT_STATEMENTS_COLS)
! 		detected_version = PGSS_TUP_LATEST;
! 	else
! 	{
! 		/*
! 		 * Couldn't identify the tuple format.  Raise error.
! 		 *
! 		 * This is an exceptional case that may only happen in bizarre
! 		 * situations, since it is thought that every released version
! 		 * of pg_stat_statements has a matching schema.
! 		 */
! 		ereport(ERROR,
! 				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
! 				 errmsg("pg_stat_statements schema is not supported "
! 						"by its installed binary")));
! 	}

per_query_ctx = rsinfo->econtext->ecxt_per_query_memory;
oldcontext = MemoryContextSwitchTo(per_query_ctx);
***************
*** 1175,1196 **** pg_stat_statements(PG_FUNCTION_ARGS)
continue;

  		values[i++] = Int64GetDatumFast(tmp.calls);
! 		values[i++] = Int64GetDatumFast(tmp.calls_underest);
  		values[i++] = Float8GetDatumFast(tmp.total_time);
  		values[i++] = Int64GetDatumFast(tmp.rows);
  		values[i++] = Int64GetDatumFast(tmp.shared_blks_hit);
  		values[i++] = Int64GetDatumFast(tmp.shared_blks_read);
! 		if (sql_supports_v1_1_counters)
  			values[i++] = Int64GetDatumFast(tmp.shared_blks_dirtied);
  		values[i++] = Int64GetDatumFast(tmp.shared_blks_written);
  		values[i++] = Int64GetDatumFast(tmp.local_blks_hit);
  		values[i++] = Int64GetDatumFast(tmp.local_blks_read);
! 		if (sql_supports_v1_1_counters)
  			values[i++] = Int64GetDatumFast(tmp.local_blks_dirtied);
  		values[i++] = Int64GetDatumFast(tmp.local_blks_written);
  		values[i++] = Int64GetDatumFast(tmp.temp_blks_read);
  		values[i++] = Int64GetDatumFast(tmp.temp_blks_written);
! 		if (sql_supports_v1_1_counters)
  		{
  			values[i++] = Float8GetDatumFast(tmp.blk_read_time);
  			values[i++] = Float8GetDatumFast(tmp.blk_write_time);
--- 1210,1232 ----
  			continue;

values[i++] = Int64GetDatumFast(tmp.calls);
! if (detected_version >= PGSS_TUP_LATEST)
! values[i++] = Int64GetDatumFast(tmp.calls_underest);
values[i++] = Float8GetDatumFast(tmp.total_time);
values[i++] = Int64GetDatumFast(tmp.rows);
values[i++] = Int64GetDatumFast(tmp.shared_blks_hit);
values[i++] = Int64GetDatumFast(tmp.shared_blks_read);
! if (detected_version >= PGSS_TUP_V1_1)
values[i++] = Int64GetDatumFast(tmp.shared_blks_dirtied);
values[i++] = Int64GetDatumFast(tmp.shared_blks_written);
values[i++] = Int64GetDatumFast(tmp.local_blks_hit);
values[i++] = Int64GetDatumFast(tmp.local_blks_read);
! if (detected_version >= PGSS_TUP_V1_1)
values[i++] = Int64GetDatumFast(tmp.local_blks_dirtied);
values[i++] = Int64GetDatumFast(tmp.local_blks_written);
values[i++] = Int64GetDatumFast(tmp.temp_blks_read);
values[i++] = Int64GetDatumFast(tmp.temp_blks_written);
! if (detected_version >= PGSS_TUP_V1_1)
{
values[i++] = Float8GetDatumFast(tmp.blk_read_time);
values[i++] = Float8GetDatumFast(tmp.blk_write_time);

I wonder if the way that pg_stat_statements throws its hands up when
it comes to crash safety (i.e. the contents of the hash table are
completely lost) could be a concern here. In other words, a program
tasked with tracking execution costs over time and graphing
time-series data from snapshots has to take responsibility for
ensuring that there hasn't been a crash (or, indeed, a reset).

Another issue is that I don't think that what you've done here solves
the problem of uniquely identify each entry over time, in the same way
that simply exposing the hash value would. I'm concerned with the
related problem to the problem solved here - simply identifying the
entry uniquely. As we've already discussed, the query string is an
imperfect proxy for each entry, even with constants swapped with '?'
characters (and even when combined with the userid and dbid values -
it's still not the same as the hashtable key, in a way that is likely
to bother people that use pg_stat_statements for long enough).

These were not express goals of the patch, but so long as you are
inviting features, attached is a bonus patch that exposes the queryid
and also the notion of a "statistics session" that is re-rolled
whenever the stats file could not be read or the stats are reset, able
to fully explain all obvious causes of retrograde motion in
statistics. It too is cumulative, so it includes the under-estimation
field. Notably, I also opted to nullify extra pg_stat_statements
fields when they'd also show "insufficient privileges" (that one is
spared from this censorship), because I feel as though a bit too much
information leaks from pg_stat_statement's statistics to ignore,
especially after adding the query id. Since the common theme here is
identifying queries, I have called it
"pg_stat_statements-identification", and it can be found in the git
repo above under the same name (...-v1).

--
fdr

Attachments:

add-pg_stat_statements-calls-underestimation-v2.patch.gzapplication/x-gzip; name=add-pg_stat_statements-calls-underestimation-v2.patch.gzDownload
pg_stat_statements-identification-v1.patch.gzapplication/x-gzip; name=pg_stat_statements-identification-v1.patch.gzDownload
#4Daniel Farina
drfarina@acm.org
In reply to: Daniel Farina (#3)
Re: pg_stat_statements: calls under-estimation propagation

On Sat, Dec 29, 2012 at 4:21 AM, Daniel Farina <drfarina@acm.org> wrote:

These were not express goals of the patch, but so long as you are
inviting features, attached is a bonus patch that exposes the queryid
and also the notion of a "statistics session" that is re-rolled
whenever the stats file could not be read or the stats are reset, able
to fully explain all obvious causes of retrograde motion in
statistics. It too is cumulative, so it includes the under-estimation
field. Notably, I also opted to nullify extra pg_stat_statements
fields when they'd also show "insufficient privileges" (that one is
spared from this censorship), because I feel as though a bit too much
information leaks from pg_stat_statement's statistics to ignore,
especially after adding the query id. Since the common theme here is
identifying queries, I have called it
"pg_stat_statements-identification", and it can be found in the git
repo above under the same name (...-v1).

A small amendment that doesn't really change the spirit of the
narrative is attached.

--
fdr

Attachments:

pg_stat_statements-identification-v2.patch.gzapplication/x-gzip; name=pg_stat_statements-identification-v2.patch.gzDownload
In reply to: Daniel Farina (#3)
Re: pg_stat_statements: calls under-estimation propagation

On 29 December 2012 12:21, Daniel Farina <drfarina@acm.org> wrote:

These were not express goals of the patch, but so long as you are
inviting features, attached is a bonus patch that exposes the queryid
and also the notion of a "statistics session" that is re-rolled
whenever the stats file could not be read or the stats are reset, able
to fully explain all obvious causes of retrograde motion in
statistics. It too is cumulative, so it includes the under-estimation
field.

Cool.

I had a thought about Tom's objection to exposing the hash value. I
would like to propose a compromise between exposing the hash value and
not doing so at all.

What if we expose the hash value without documenting it, in a way not
apparent to normal users, while letting experts willing to make an
executive decision about its stability use it? What I have in mind is
to expose the hash value from the pg_stat_statements function, and yet
to avoid exposing it within the pg_stat_statements view definition.
The existence of the hash value would not need to be documented, since
the pg_stat_statements function is an undocumented implementation
detail.

Precedent for this exists, I think - the undocumented system hash
functions are exposed via an SQL interface. Some satellite projects
rely on this (apparently the pl/proxy documentation shows the use of
hashtext(), which is a thin wrapper on hash_any(), and there is
chatter about it elsewhere). So it is already the case that people are
using hashtext(), which should not be problematic if the applications
that do so have a reasonable set of expectations about its stability
(i.e. it's not going to change in a point release, because that would
break hash indexes, but may well change across major releases). We've
already in effect promised to not break hashtext() in a point release,
just as we've already in effect promised to not break the hash values
that pg_stat_statements uses internally (to do any less would
invalidate the on-disk representation, and necessitate bumping
PGSS_FILE_HEADER to wipe the stored stats).

Thoughts?

Notably, I also opted to nullify extra pg_stat_statements
fields when they'd also show "insufficient privileges" (that one is
spared from this censorship), because I feel as though a bit too much
information leaks from pg_stat_statement's statistics to ignore,
especially after adding the query id.

That seems sensible.

--
Peter Geoghegan http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#6Daniel Farina
daniel@heroku.com
In reply to: Peter Geoghegan (#5)
Re: pg_stat_statements: calls under-estimation propagation

On Sat, Dec 29, 2012 at 6:37 PM, Peter Geoghegan <peter@2ndquadrant.com> wrote:

On 29 December 2012 12:21, Daniel Farina <drfarina@acm.org> wrote:

These were not express goals of the patch, but so long as you are
inviting features, attached is a bonus patch that exposes the queryid
and also the notion of a "statistics session" that is re-rolled
whenever the stats file could not be read or the stats are reset, able
to fully explain all obvious causes of retrograde motion in
statistics. It too is cumulative, so it includes the under-estimation
field.

Cool.

I had a thought about Tom's objection to exposing the hash value. I
would like to propose a compromise between exposing the hash value and
not doing so at all.

As I recall, the gist of this objection had to do with a false sense
of stability of the hash value, and the desire to enforce the ability
to alter it. Here's an option: xor the hash value with the
'statistics session id', so it's *known* to be unstable between
sessions. That gets you continuity in the common case and sound
deprecation in the less-common cases (crashes, format upgrades, stat
resetting).

A change in hash function should also then necessitate changing the
stat file header.

Another more minor issue is the hard-wiring of the precision of the
id. For that reason, I have thought it reasonable to expose this as a
string also.

--
fdr

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

In reply to: Daniel Farina (#6)
Re: pg_stat_statements: calls under-estimation propagation

On 30 December 2012 02:45, Daniel Farina <daniel@heroku.com> wrote:

As I recall, the gist of this objection had to do with a false sense
of stability of the hash value, and the desire to enforce the ability
to alter it. Here's an option: xor the hash value with the
'statistics session id', so it's *known* to be unstable between
sessions. That gets you continuity in the common case and sound
deprecation in the less-common cases (crashes, format upgrades, stat
resetting).

Hmm. I like the idea, but a concern there would be that you'd
introduce additional scope for collisions in the third-party utility
building time-series data from snapshots. I currently put the
probability of a collision within pg_stat_statements as 1% in the
event of a pg_stat_statements.max of 10,000.

--
Peter Geoghegan http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#8Daniel Farina
daniel@heroku.com
In reply to: Peter Geoghegan (#7)
Re: pg_stat_statements: calls under-estimation propagation

On Sat, Dec 29, 2012 at 7:12 PM, Peter Geoghegan <peter@2ndquadrant.com> wrote:

On 30 December 2012 02:45, Daniel Farina <daniel@heroku.com> wrote:

As I recall, the gist of this objection had to do with a false sense
of stability of the hash value, and the desire to enforce the ability
to alter it. Here's an option: xor the hash value with the
'statistics session id', so it's *known* to be unstable between
sessions. That gets you continuity in the common case and sound
deprecation in the less-common cases (crashes, format upgrades, stat
resetting).

Hmm. I like the idea, but a concern there would be that you'd
introduce additional scope for collisions in the third-party utility
building time-series data from snapshots. I currently put the
probability of a collision within pg_stat_statements as 1% in the
event of a pg_stat_statements.max of 10,000.

We can use a longer session key and duplicate the queryid (effectively
padding) a couple of times to complete the XOR. I think that makes
the cases of collisions introduced by this astronomically low, as an
increase over the base collision rate.

--
fdr

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#9Daniel Farina
daniel@heroku.com
In reply to: Daniel Farina (#8)
Re: pg_stat_statements: calls under-estimation propagation

On Sat, Dec 29, 2012 at 7:16 PM, Daniel Farina <daniel@heroku.com> wrote:

On Sat, Dec 29, 2012 at 7:12 PM, Peter Geoghegan <peter@2ndquadrant.com> wrote:

On 30 December 2012 02:45, Daniel Farina <daniel@heroku.com> wrote:

As I recall, the gist of this objection had to do with a false sense
of stability of the hash value, and the desire to enforce the ability
to alter it. Here's an option: xor the hash value with the
'statistics session id', so it's *known* to be unstable between
sessions. That gets you continuity in the common case and sound
deprecation in the less-common cases (crashes, format upgrades, stat
resetting).

Hmm. I like the idea, but a concern there would be that you'd
introduce additional scope for collisions in the third-party utility
building time-series data from snapshots. I currently put the
probability of a collision within pg_stat_statements as 1% in the
event of a pg_stat_statements.max of 10,000.

We can use a longer session key and duplicate the queryid (effectively
padding) a couple of times to complete the XOR. I think that makes
the cases of collisions introduced by this astronomically low, as an
increase over the base collision rate.

A version implementing that is attached, except I generate an
additional 64-bit session not exposed to the client to prevent even
casual de-leaking of the session state. That may seem absurd, until
someone writes a tool that de-xors things and relies on it and then
nobody feels inclined to break it. It also keeps the public session
number short.

I also opted to save the underestimate since I'm adding a handful of
fixed width fields to the file format anyway.

--
fdr

Attachments:

pg_stat_statements-identification-v3.patch.gzapplication/x-gzip; name=pg_stat_statements-identification-v3.patch.gzDownload
#10Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Daniel Farina (#9)
Re: pg_stat_statements: calls under-estimation propagation

On 30.12.2012 08:31, Daniel Farina wrote:

A version implementing that is attached, except I generate an
additional 64-bit session not exposed to the client to prevent even
casual de-leaking of the session state. That may seem absurd, until
someone writes a tool that de-xors things and relies on it and then
nobody feels inclined to break it. It also keeps the public session
number short.

I also opted to save the underestimate since I'm adding a handful of
fixed width fields to the file format anyway.

This patch needs documentation. At a minimum, the new calls_underest
field needs to be listed in the description of the pg_stat_statements.

Pardon for not following the discussion: What exactly does the
calls_underest field mean? I couldn't decipher it from the patch. What
can an admin do with the value? How does it compare with just bumping up
pg_stat_statements.max?

- Heikki

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#11Sameer Thakur
samthakur74@gmail.com
In reply to: Heikki Linnakangas (#10)
Re: pg_stat_statements: calls under-estimation propagation

This patch needs documentation. At a minimum, the new calls_underest
field needs to be listed in the description of the pg_stat_statements.

I have attached a version which includes documentation.
pg_stat_statements-identification-v4.patch.gz
<http://postgresql.1045698.n5.nabble.com/file/n5770844/pg_stat_statements-identification-v4.patch.gz&gt;

Pardon for not following the discussion: What exactly does the
calls_underest field mean? I couldn't decipher it from the patch. What
can an admin do with the value? How does it compare with just bumping up
pg_stat_statements.max?

Paraphrasing the documentation.
There is a possibility that,for queries which are tracked intermittently,
could have their statistics silently reset.
The calls_underest field gives an indication that a query has been tracked
intermittently and consequently
have a higher probability of erroneous statistics. Queries tracked
constantly will have a zero value for
calls_underest indicating zero probability of erroneous statistics.
A greater value of calls_underest indicates greater degree of inconsistent
tracking which in
turn means greater possibility of erroneous statistics due to statistics
being reset when
query was not being tracked.

Increasing pg_stat_statements.max reduces the possibility of a query being
tracked intermittently but does not address the problem of indicating the
probability of erroneous statistics if the query is indeed being tracked
intermittently. I do not believe the admin can influence the value of
calls_underest as it is not a GUC parameter.

We have a need to see this patch committed hence the revived interest in
this thread

regards
Sameer

--
View this message in context: http://postgresql.1045698.n5.nabble.com/pg-stat-statements-calls-under-estimation-propagation-tp5738128p5770844.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#12Peter Eisentraut
peter_e@gmx.net
In reply to: Sameer Thakur (#11)
Re: pg_stat_statements: calls under-estimation propagation

On Sat, 2013-09-14 at 03:51 -0700, samthakur74 wrote:

We have a need to see this patch committed hence the revived interest
in this thread

You have added this email to the commit fest, but it contains no patch.
Please add the email with the actual patch. Maybe the author should be
given a chance to update the patches, though, because they are quite
old.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#13Sameer Thakur
samthakur74@gmail.com
In reply to: Peter Eisentraut (#12)
Re: pg_stat_statements: calls under-estimation propagation

You have added this email to the commit fest, but it contains no patch.

Please add the email with the actual patch.

I hope its attached now!

Maybe the author should be

given a chance to update the patches, though, because they are quite
old.

I did connect with Daniel and he did have some improvement ideas. I am not
sure when they could be implemented. Since we have a interest in the
current version of the patch, which needed documentation, i tried to
complete that.
Thank you,
Sameer

pg_stat_statements-identification-v4.patch.gz (8K) <http://postgresql.1045698.n5.nabble.com/attachment/5770937/0/pg_stat_statements-identification-v4.patch.gz&gt;

--
View this message in context: http://postgresql.1045698.n5.nabble.com/pg-stat-statements-calls-under-estimation-propagation-tp5738128p5770937.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.

#14Fujii Masao
masao.fujii@gmail.com
In reply to: Sameer Thakur (#13)
Re: pg_stat_statements: calls under-estimation propagation

On Sun, Sep 15, 2013 at 3:54 PM, samthakur74 <samthakur74@gmail.com> wrote:

You have added this email to the commit fest, but it contains no patch.

Please add the email with the actual patch.

I hope its attached now!

You seem to have forgotten to include the pg_stat_statements--1.2.sql
and pg_stat_statements--1.1--1.2.sql in the patch.

Regards,

--
Fujii Masao

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#15Sameer Thakur
samthakur74@gmail.com
In reply to: Fujii Masao (#14)
Re: pg_stat_statements: calls under-estimation propagation

You seem to have forgotten to include the pg_stat_statements--1.2.sql
and pg_stat_statements--1.1--1.2.sql in the patch.

Sorry again. Please find updated patch attached.

<http://www.postgresql.org/mailpref/pgsql-hackers&gt;

NAML<http://postgresql.1045698.n5.nabble.com/template/NamlServlet.jtp?macro=macro_viewer&amp;id=instant_html%21nabble%3Aemail.naml&amp;base=nabble.naml.namespaces.BasicNamespace-nabble.view.web.template.NabbleNamespace-nabble.view.web.template.NodeNamespace&amp;breadcrumbs=notify_subscribers%21nabble%3Aemail.naml-instant_emails%21nabble%3Aemail.naml-send_instant_email%21nabble%3Aemail.naml&gt;

On Tue, Sep 17, 2013 at 5:47 PM, Fujii Masao-2 [via PostgreSQL] <
ml-node+s1045698n5771213h27@n5.nabble.com> wrote:

On Sun, Sep 15, 2013 at 3:54 PM, samthakur74 <[hidden email]<http://user/SendEmail.jtp?type=node&amp;node=5771213&amp;i=0&gt;&gt;
wrote:

You have added this email to the commit fest, but it contains no

patch.

Please add the email with the actual patch.

I hope its attached now!

You seem to have forgotten to include the pg_stat_statements--1.2.sql
and pg_stat_statements--1.1--1.2.sql in the patch.

Regards,

--
Fujii Masao

--
Sent via pgsql-hackers mailing list ([hidden email]<http://user/SendEmail.jtp?type=node&amp;node=5771213&amp;i=1&gt;)

To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

------------------------------
If you reply to this email, your message will be added to the discussion
below:

http://postgresql.1045698.n5.nabble.com/pg-stat-statements-calls-under-estimation-propagation-tp5738128p5771213.html
To unsubscribe from pg_stat_statements: calls under-estimation
propagation, click here<http://postgresql.1045698.n5.nabble.com/template/NamlServlet.jtp?macro=unsubscribe_by_code&amp;node=5738128&amp;code=c2FtdGhha3VyNzRAZ21haWwuY29tfDU3MzgxMjh8ODM4MzYxMTcy&gt;
.
NAML<http://postgresql.1045698.n5.nabble.com/template/NamlServlet.jtp?macro=macro_viewer&amp;id=instant_html%21nabble%3Aemail.naml&amp;base=nabble.naml.namespaces.BasicNamespace-nabble.view.web.template.NabbleNamespace-nabble.view.web.template.NodeNamespace&amp;breadcrumbs=notify_subscribers%21nabble%3Aemail.naml-instant_emails%21nabble%3Aemail.naml-send_instant_email%21nabble%3Aemail.naml&gt;

pg_stat_statements-identification-v4.patch.gz (9K) <http://postgresql.1045698.n5.nabble.com/attachment/5771248/0/pg_stat_statements-identification-v4.patch.gz&gt;

--
View this message in context: http://postgresql.1045698.n5.nabble.com/pg-stat-statements-calls-under-estimation-propagation-tp5738128p5771248.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.

#16Fujii Masao
masao.fujii@gmail.com
In reply to: Sameer Thakur (#15)
Re: pg_stat_statements: calls under-estimation propagation

On Tue, Sep 17, 2013 at 10:48 PM, samthakur74 <samthakur74@gmail.com> wrote:

You seem to have forgotten to include the pg_stat_statements--1.2.sql
and pg_stat_statements--1.1--1.2.sql in the patch.

Sorry again. Please find updated patch attached.

pg_stat_statements--1.2.sql is missing. Could you confirm that the patch
includes all the changes?

Regards,

--
Fujii Masao

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#17Sameer Thakur
samthakur74@gmail.com
In reply to: Fujii Masao (#16)
Re: pg_stat_statements: calls under-estimation propagation

You seem to have forgotten to include the pg_stat_statements--1.2.sql
and pg_stat_statements--1.1--1.2.sql in the patch.

Sorry again. Please find updated patch attached.

I did not add pg_stat_statements--1.2.sql. I have added that now and
updated the patch again.

The patch attached should contain following file changes
patching file contrib/pg_stat_statements/Makefile
patching file contrib/pg_stat_statements/pg_stat_statements--1.1--1.2.sql
patching file contrib/pg_stat_statements/pg_stat_statements--1.2.sql
patching file contrib/pg_stat_statements/pg_stat_statements.c
patching file contrib/pg_stat_statements/pg_stat_statements.control
patching file doc/src/sgml/pgstatstatements.sgml

regards
Sameer

Attachments:

pg_stat_statements-identification-v4.patch.gzapplication/x-gzip; name=pg_stat_statements-identification-v4.patch.gzDownload
#18Fujii Masao
masao.fujii@gmail.com
In reply to: Sameer Thakur (#17)
Re: pg_stat_statements: calls under-estimation propagation

On Wed, Sep 18, 2013 at 2:41 PM, Sameer Thakur <samthakur74@gmail.com> wrote:

You seem to have forgotten to include the pg_stat_statements--1.2.sql
and pg_stat_statements--1.1--1.2.sql in the patch.

Sorry again. Please find updated patch attached.

I did not add pg_stat_statements--1.2.sql. I have added that now and updated
the patch again.

Thanks!

I got the segmentation fault when I tested the case where the least-executed
query statistics is discarded, i.e., when I executed different queries more than
pg_stat_statements.max times. I guess that the patch might have a bug.

Regards,

--
Fujii Masao

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#19Fujii Masao
masao.fujii@gmail.com
In reply to: Fujii Masao (#18)
Re: pg_stat_statements: calls under-estimation propagation

On Thu, Sep 19, 2013 at 2:41 AM, Fujii Masao <masao.fujii@gmail.com> wrote:

On Wed, Sep 18, 2013 at 2:41 PM, Sameer Thakur <samthakur74@gmail.com> wrote:

You seem to have forgotten to include the pg_stat_statements--1.2.sql
and pg_stat_statements--1.1--1.2.sql in the patch.

Sorry again. Please find updated patch attached.

I did not add pg_stat_statements--1.2.sql. I have added that now and updated
the patch again.

pg_stat_statements--1.1.sql should be removed.

+      <entry><structfield>queryid</structfield></entry>
+      <entry><type>bigint</type></entry>
+      <entry></entry>
+      <entry>Unique value of each representative statement for the
current statistics session.
+       This value will change for each new statistics session.</entry>

What does "statistics session" mean?

Regards,

--
Fujii Masao

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#20Sameer Thakur
samthakur74@gmail.com
In reply to: Fujii Masao (#19)
Re: pg_stat_statements: calls under-estimation propagation

I got the segmentation fault when I tested the case where the

least-executed

query statistics is discarded, i.e., when I executed different queries

more than

pg_stat_statements.max times. I guess that the patch might have a bug.

Thanks, will try to fix it.

pg_stat_statements--1.1.sql should be removed.
Yes will do that

+      <entry><structfield>queryid</structfield></entry>
+      <entry><type>bigint</type></entry>
+      <entry></entry>
+      <entry>Unique value of each representative statement for the
current statistics session.
+       This value will change for each new statistics session.</entry>

What does "statistics session" mean?

The time period when statistics are gathered by statistics collector
without being reset. So the statistics session continues across normal
shutdowns, but in case of abnormal situations like crashes, format upgrades
or statistics being reset for any other reason, a new time period of
statistics collection starts i.e. a new statistics session. The queryid
value generation is linked to statistics session so emphasize the fact that
in case of crashes,format upgrades or any situation of statistics reset,
the queryid for the same queries will also change. Will update
documentation clearly explain the term statistics session in this context

regards

Sameer

--
View this message in context: http://postgresql.1045698.n5.nabble.com/pg-stat-statements-calls-under-estimation-propagation-tp5738128p5771562.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.

#21Fujii Masao
masao.fujii@gmail.com
In reply to: Sameer Thakur (#20)
#22Sameer Thakur
samthakur74@gmail.com
In reply to: Fujii Masao (#21)
#23Daniel Farina
daniel@fdr.io
In reply to: Sameer Thakur (#13)
#24Daniel Farina
daniel@fdr.io
In reply to: Daniel Farina (#23)
#25Sameer Thakur
samthakur74@gmail.com
In reply to: Daniel Farina (#24)
#26Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Daniel Farina (#24)
#27Sameer Thakur
samthakur74@gmail.com
In reply to: Sameer Thakur (#25)
#28Daniel Farina
daniel@fdr.io
In reply to: Sameer Thakur (#27)
#29Sameer Thakur
samthakur74@gmail.com
In reply to: Daniel Farina (#28)
#30Daniel Farina
daniel@fdr.io
In reply to: Sameer Thakur (#29)
#31Sameer Thakur
samthakur74@gmail.com
In reply to: Daniel Farina (#30)
#32Sameer Thakur
samthakur74@gmail.com
In reply to: Daniel Farina (#30)
#33Daniel Farina
daniel@heroku.com
In reply to: Sameer Thakur (#32)
In reply to: Daniel Farina (#28)
#35Sameer Thakur
samthakur74@gmail.com
In reply to: Daniel Farina (#33)
#36Sameer Thakur
samthakur74@gmail.com
In reply to: Sameer Thakur (#35)
#37Fujii Masao
masao.fujii@gmail.com
In reply to: Sameer Thakur (#36)
#38Daniel Farina
daniel@heroku.com
In reply to: Fujii Masao (#37)
#39Sameer Thakur
samthakur74@gmail.com
In reply to: Fujii Masao (#37)
#40Sameer Thakur
samthakur74@gmail.com
In reply to: Daniel Farina (#38)
#41Noname
pilum.70@uni-muenster.de
In reply to: Sameer Thakur (#40)
#42Noname
pilum.70@uni-muenster.de
In reply to: Noname (#41)
#43Sameer Thakur
samthakur74@gmail.com
In reply to: Sameer Thakur (#40)
In reply to: Noname (#42)
#45Noname
pilum.70@uni-muenster.de
In reply to: Peter Geoghegan (#44)
#46Fujii Masao
masao.fujii@gmail.com
In reply to: Sameer Thakur (#43)
#47Daniel Farina
daniel@heroku.com
In reply to: Fujii Masao (#46)
#48Fujii Masao
masao.fujii@gmail.com
In reply to: Daniel Farina (#47)
#49Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Daniel Farina (#47)
#50Fujii Masao
masao.fujii@gmail.com
In reply to: Alvaro Herrera (#49)
#51Daniel Farina
daniel@heroku.com
In reply to: Fujii Masao (#48)
#52Daniel Farina
daniel@heroku.com
In reply to: Alvaro Herrera (#49)
#53Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Daniel Farina (#52)
#54Daniel Farina
daniel@heroku.com
In reply to: Alvaro Herrera (#53)
#55Peter Eisentraut
peter_e@gmx.net
In reply to: Sameer Thakur (#43)
#56Sameer Thakur
samthakur74@gmail.com
In reply to: Alvaro Herrera (#49)
#57Sameer Thakur
samthakur74@gmail.com
In reply to: Daniel Farina (#54)
In reply to: Sameer Thakur (#57)
#59Daniel Farina
daniel@heroku.com
In reply to: Peter Geoghegan (#58)
#60Sameer Thakur
samthakur74@gmail.com
In reply to: Peter Geoghegan (#58)
#61Sameer Thakur
samthakur74@gmail.com
In reply to: Sameer Thakur (#60)
In reply to: Sameer Thakur (#61)
#63Sameer Thakur
samthakur74@gmail.com
In reply to: Peter Geoghegan (#62)
#64Fujii Masao
masao.fujii@gmail.com
In reply to: Peter Geoghegan (#62)
#65Tom Lane
tgl@sss.pgh.pa.us
In reply to: Fujii Masao (#64)
In reply to: Tom Lane (#65)
#67Fujii Masao
masao.fujii@gmail.com
In reply to: Peter Geoghegan (#66)
#68Peter Eisentraut
peter_e@gmx.net
In reply to: Fujii Masao (#67)
In reply to: Peter Eisentraut (#68)
#70Magnus Hagander
magnus@hagander.net
In reply to: Peter Geoghegan (#69)
#71Peter Eisentraut
peter_e@gmx.net
In reply to: Peter Geoghegan (#69)