Buffer statistics for pg_stat_statements

Started by Takahiro Itagakiabout 16 years ago11 messages
#1Takahiro Itagaki
itagaki.takahiro@oss.ntt.co.jp

We have infrastructure to count numbers buffer access in 8.5 Alpha 3.
I'd like to add per-query buffer usage into contrib/pg_stat_statements.

The pg_stat_statements view will have the same contents with
struct BufferUsage. Fields named shared_blks_{hit|read|written},
local_blks_{hit|read|written}, and temp_blks_{read|written}
will be added to the view.

We can determine slow queries not only based on durations but also
based on I/O or memory access count. Also, queries with non-zero
temp_blks_read means DBA need to consider increasing work_mem. Those
information would be useful to find where the server's bottleneck is.

Additional management costs cannot be avoided, but I think it should be
not so high because we accumulate buffer usage only once per query,
while EXPLAIN BUFFERS is slow because we need per-tuple calculation.

I'll submit this pg_stat_statements enhancement to the next commit fest.
Comments welcome.

Regards,
---
Takahiro Itagaki
NTT Open Source Software Center

#2Cédric Villemain
cedric.villemain@dalibo.com
In reply to: Takahiro Itagaki (#1)
Re: Buffer statistics for pg_stat_statements

Le vendredi 18 décembre 2009 09:44:40, Takahiro Itagaki a écrit :

We have infrastructure to count numbers buffer access in 8.5 Alpha 3.
I'd like to add per-query buffer usage into contrib/pg_stat_statements.

The pg_stat_statements view will have the same contents with
struct BufferUsage. Fields named shared_blks_{hit|read|written},
local_blks_{hit|read|written}, and temp_blks_{read|written}
will be added to the view.

We can determine slow queries not only based on durations but also
based on I/O or memory access count. Also, queries with non-zero
temp_blks_read means DBA need to consider increasing work_mem. Those
information would be useful to find where the server's bottleneck is.

Additional management costs cannot be avoided, but I think it should be
not so high because we accumulate buffer usage only once per query,
while EXPLAIN BUFFERS is slow because we need per-tuple calculation.

I'll submit this pg_stat_statements enhancement to the next commit fest.
Comments welcome.

Very good idea.
Perhaps it can be usefull to have the percentage for hit/read ratio computed
in the view ?

Regards,
---
Takahiro Itagaki
NTT Open Source Software Center

--
Cédric Villemain
Administrateur de Base de Données
Cel: +33 (0)6 74 15 56 53
http://dalibo.com - http://dalibo.org

#3Takahiro Itagaki
itagaki.takahiro@oss.ntt.co.jp
In reply to: Cédric Villemain (#2)
1 attachment(s)
Re: Buffer statistics for pg_stat_statements

Cedric Villemain <cedric.villemain@dalibo.com> wrote:

Le vendredi 18 decembre 2009 09:44:40, Takahiro Itagaki a ecrit :

I'd like to add per-query buffer usage into contrib/pg_stat_statements.

Here is a patch to add buffer usage columns into pg_stat_statements.
It also changes initialzation of the result TupleDesc from manually
coded routines to get_call_result_type().

Perhaps it can be usefull to have the percentage for hit/read ratio computed
in the view ?

I think different DBAs want different derived values; Some of them might want
the buffer hit ratio, but others might request numbers per query. I'd like to
privide only raw values from pg_stat_statements to keep it simple, but I added
a computational expression of hit percentage in the documentation.

! bench=# SELECT query, calls, total_time, rows, 100.0 * shared_blks_hit /
! nullif(shared_blks_hit + shared_blks_read, 0) AS hit_percent
! FROM pg_stat_statements ORDER BY total_time DESC LIMIT 5;
! -[ RECORD 1 ]---------------------------------------------------------------------
! query | UPDATE pgbench_branches SET bbalance = bbalance + $1 WHERE bid = $2;
! calls | 3000
! total_time | 9.60900100000002
! rows | 2836
! hit_percent | 99.9778970000200936

Regards,
---
Takahiro Itagaki
NTT Open Source Software Center

Attachments:

pg_stat_statements_bufusage_20091222.patchapplication/octet-stream; name=pg_stat_statements_bufusage_20091222.patchDownload
diff -cprN head/contrib/pg_stat_statements/pg_stat_statements.c work/contrib/pg_stat_statements/pg_stat_statements.c
*** head/contrib/pg_stat_statements/pg_stat_statements.c	2009-12-16 09:17:22.613136000 +0900
--- work/contrib/pg_stat_statements/pg_stat_statements.c	2009-12-22 17:27:04.203701708 +0900
***************
*** 26,31 ****
--- 26,32 ----
  #include "catalog/pg_type.h"
  #include "executor/executor.h"
  #include "executor/instrument.h"
+ #include "funcapi.h"
  #include "mb/pg_wchar.h"
  #include "miscadmin.h"
  #include "pgstat.h"
*************** PG_MODULE_MAGIC;
*** 44,50 ****
  #define PGSS_DUMP_FILE	"global/pg_stat_statements.stat"
  
  /* This constant defines the magic number in the stats file header */
! static const uint32 PGSS_FILE_HEADER = 0x20081202;
  
  /* XXX: Should USAGE_EXEC reflect execution time and/or buffer usage? */
  #define USAGE_EXEC(duration)	(1.0)
--- 45,51 ----
  #define PGSS_DUMP_FILE	"global/pg_stat_statements.stat"
  
  /* This constant defines the magic number in the stats file header */
! static const uint32 PGSS_FILE_HEADER = 0x20091216;
  
  /* XXX: Should USAGE_EXEC reflect execution time and/or buffer usage? */
  #define USAGE_EXEC(duration)	(1.0)
*************** typedef struct pgssHashKey
*** 75,84 ****
   */
  typedef struct Counters
  {
! 	int64		calls;			/* # of times executed */
! 	double		total_time;		/* total execution time in seconds */
! 	int64		rows;			/* total # of retrieved or affected rows */
! 	double		usage;			/* usage factor */
  } Counters;
  
  /*
--- 76,93 ----
   */
  typedef struct Counters
  {
! 	int64	calls;					/* # of times executed */
! 	double	total_time;				/* total execution time in seconds */
! 	int64	rows;					/* total # of retrieved or affected rows */
! 	int64	shared_blks_hit;		/* # of shared buffer hits */
! 	int64	shared_blks_read;		/* # of shared disk blocks read */
! 	int64	shared_blks_written;	/* # of shared disk blocks written */
! 	int64	local_blks_hit;			/* # of local buffer hits */
! 	int64	local_blks_read;		/* # of local disk blocks read */
! 	int64	local_blks_written;		/* # of local disk blocks written */
! 	int64	temp_blks_read;			/* # of temp blocks read */
! 	int64	temp_blks_written;		/* # of temp blocks written */
! 	double	usage;					/* usage factor */
  } Counters;
  
  /*
*************** static void pgss_ProcessUtility(Node *pa
*** 169,175 ****
  			   DestReceiver *dest, char *completionTag);
  static uint32 pgss_hash_fn(const void *key, Size keysize);
  static int	pgss_match_fn(const void *key1, const void *key2, Size keysize);
! static void pgss_store(const char *query, double total_time, uint64 rows);
  static Size pgss_memsize(void);
  static pgssEntry *entry_alloc(pgssHashKey *key);
  static void entry_dealloc(void);
--- 178,185 ----
  			   DestReceiver *dest, char *completionTag);
  static uint32 pgss_hash_fn(const void *key, Size keysize);
  static int	pgss_match_fn(const void *key1, const void *key2, Size keysize);
! static void pgss_store(const char *query, double total_time, uint64 rows,
! 					   const BufferUsage *bufusage);
  static Size pgss_memsize(void);
  static pgssEntry *entry_alloc(pgssHashKey *key);
  static void entry_dealloc(void);
*************** pgss_ExecutorEnd(QueryDesc *queryDesc)
*** 558,564 ****
  
  		pgss_store(queryDesc->sourceText,
  				   queryDesc->totaltime->total,
! 				   queryDesc->estate->es_processed);
  	}
  
  	if (prev_ExecutorEnd)
--- 568,575 ----
  
  		pgss_store(queryDesc->sourceText,
  				   queryDesc->totaltime->total,
! 				   queryDesc->estate->es_processed,
! 				   &queryDesc->totaltime->bufusage);
  	}
  
  	if (prev_ExecutorEnd)
*************** pgss_ProcessUtility(Node *parsetree, con
*** 580,586 ****
--- 591,599 ----
  		instr_time	start;
  		instr_time	duration;
  		uint64		rows = 0;
+ 		BufferUsage	bufusage;
  
+ 		bufusage = pgBufferUsage;
  		INSTR_TIME_SET_CURRENT(start);
  
  		nested_level++;
*************** pgss_ProcessUtility(Node *parsetree, con
*** 609,615 ****
  			sscanf(completionTag, "COPY " UINT64_FORMAT, &rows) != 1)
  			rows = 0;
  
! 		pgss_store(queryString, INSTR_TIME_GET_DOUBLE(duration), rows);
  	}
  	else
  	{
--- 622,638 ----
  			sscanf(completionTag, "COPY " UINT64_FORMAT, &rows) != 1)
  			rows = 0;
  
! 		/* calc differences of buffer counters. */
! 		bufusage.shared_blks_hit = pgBufferUsage.shared_blks_hit - bufusage.shared_blks_hit;
! 		bufusage.shared_blks_read = pgBufferUsage.shared_blks_read - bufusage.shared_blks_read;
! 		bufusage.shared_blks_written = pgBufferUsage.shared_blks_written - bufusage.shared_blks_written;
! 		bufusage.local_blks_hit = pgBufferUsage.local_blks_hit - bufusage.local_blks_hit;
! 		bufusage.local_blks_read = pgBufferUsage.local_blks_read - bufusage.local_blks_read;
! 		bufusage.local_blks_written = pgBufferUsage.local_blks_written - bufusage.local_blks_written;
! 		bufusage.temp_blks_read = pgBufferUsage.temp_blks_read - bufusage.temp_blks_read;
! 		bufusage.temp_blks_written = pgBufferUsage.temp_blks_written - bufusage.temp_blks_written;
! 
! 		pgss_store(queryString, INSTR_TIME_GET_DOUBLE(duration), rows, &bufusage);
  	}
  	else
  	{
*************** pgss_match_fn(const void *key1, const vo
*** 660,666 ****
   * Store some statistics for a statement.
   */
  static void
! pgss_store(const char *query, double total_time, uint64 rows)
  {
  	pgssHashKey key;
  	double		usage;
--- 683,690 ----
   * Store some statistics for a statement.
   */
  static void
! pgss_store(const char *query, double total_time, uint64 rows,
! 		   const BufferUsage *bufusage)
  {
  	pgssHashKey key;
  	double		usage;
*************** pgss_store(const char *query, double tot
*** 706,711 ****
--- 730,743 ----
  		e->counters.calls += 1;
  		e->counters.total_time += total_time;
  		e->counters.rows += rows;
+ 		e->counters.shared_blks_hit += bufusage->shared_blks_hit;
+ 		e->counters.shared_blks_read += bufusage->shared_blks_read;
+ 		e->counters.shared_blks_written += bufusage->shared_blks_written;
+ 		e->counters.local_blks_hit += bufusage->local_blks_hit;
+ 		e->counters.local_blks_read += bufusage->local_blks_read;
+ 		e->counters.local_blks_written += bufusage->local_blks_written;
+ 		e->counters.temp_blks_read += bufusage->temp_blks_read;
+ 		e->counters.temp_blks_written += bufusage->temp_blks_written;
  		e->counters.usage += usage;
  		SpinLockRelease(&e->mutex);
  	}
*************** pg_stat_statements_reset(PG_FUNCTION_ARG
*** 727,733 ****
  	PG_RETURN_VOID();
  }
  
! #define PG_STAT_STATEMENTS_COLS		6
  
  /*
   * Retrieve statement statistics.
--- 759,765 ----
  	PG_RETURN_VOID();
  }
  
! #define PG_STAT_STATEMENTS_COLS		14
  
  /*
   * Retrieve statement statistics.
*************** pg_stat_statements(PG_FUNCTION_ARGS)
*** 761,783 ****
  				 errmsg("materialize mode required, but it is not " \
  						"allowed in this context")));
  
  	per_query_ctx = rsinfo->econtext->ecxt_per_query_memory;
  	oldcontext = MemoryContextSwitchTo(per_query_ctx);
  
- 	tupdesc = CreateTemplateTupleDesc(PG_STAT_STATEMENTS_COLS, false);
- 	TupleDescInitEntry(tupdesc, (AttrNumber) 1, "userid",
- 					   OIDOID, -1, 0);
- 	TupleDescInitEntry(tupdesc, (AttrNumber) 2, "dbid",
- 					   OIDOID, -1, 0);
- 	TupleDescInitEntry(tupdesc, (AttrNumber) 3, "query",
- 					   TEXTOID, -1, 0);
- 	TupleDescInitEntry(tupdesc, (AttrNumber) 4, "calls",
- 					   INT8OID, -1, 0);
- 	TupleDescInitEntry(tupdesc, (AttrNumber) 5, "total_time",
- 					   FLOAT8OID, -1, 0);
- 	TupleDescInitEntry(tupdesc, (AttrNumber) 6, "rows",
- 					   INT8OID, -1, 0);
- 
  	tupstore = tuplestore_begin_heap(true, false, work_mem);
  	rsinfo->returnMode = SFRM_Materialize;
  	rsinfo->setResult = tupstore;
--- 793,805 ----
  				 errmsg("materialize mode required, but it is not " \
  						"allowed in this context")));
  
+ 	/* Build a tuple descriptor for our result type */
+ 	if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE)
+ 		elog(ERROR, "return type must be a row type");
+ 
  	per_query_ctx = rsinfo->econtext->ecxt_per_query_memory;
  	oldcontext = MemoryContextSwitchTo(per_query_ctx);
  
  	tupstore = tuplestore_begin_heap(true, false, work_mem);
  	rsinfo->returnMode = SFRM_Materialize;
  	rsinfo->setResult = tupstore;
*************** pg_stat_statements(PG_FUNCTION_ARGS)
*** 830,835 ****
--- 852,865 ----
  		values[i++] = Int64GetDatumFast(tmp.calls);
  		values[i++] = Float8GetDatumFast(tmp.total_time);
  		values[i++] = Int64GetDatumFast(tmp.rows);
+ 		values[i++] = Int64GetDatumFast(tmp.shared_blks_hit);
+ 		values[i++] = Int64GetDatumFast(tmp.shared_blks_read);
+ 		values[i++] = Int64GetDatumFast(tmp.shared_blks_written);
+ 		values[i++] = Int64GetDatumFast(tmp.local_blks_hit);
+ 		values[i++] = Int64GetDatumFast(tmp.local_blks_read);
+ 		values[i++] = Int64GetDatumFast(tmp.local_blks_written);
+ 		values[i++] = Int64GetDatumFast(tmp.temp_blks_read);
+ 		values[i++] = Int64GetDatumFast(tmp.temp_blks_written);
  
  		Assert(i == PG_STAT_STATEMENTS_COLS);
  
diff -cprN head/contrib/pg_stat_statements/pg_stat_statements.sql.in work/contrib/pg_stat_statements/pg_stat_statements.sql.in
*** head/contrib/pg_stat_statements/pg_stat_statements.sql.in	2009-01-05 00:22:25.168790000 +0900
--- work/contrib/pg_stat_statements/pg_stat_statements.sql.in	2009-12-22 17:26:49.906739840 +0900
*************** CREATE FUNCTION pg_stat_statements(
*** 15,21 ****
      OUT query text,
      OUT calls int8,
      OUT total_time float8,
!     OUT rows int8
  )
  RETURNS SETOF record
  AS 'MODULE_PATHNAME'
--- 15,29 ----
      OUT query text,
      OUT calls int8,
      OUT total_time float8,
!     OUT rows int8,
!     OUT shared_blks_hit int8,
!     OUT shared_blks_read int8,
!     OUT shared_blks_written int8,
!     OUT local_blks_hit int8,
!     OUT local_blks_read int8,
!     OUT local_blks_written int8,
!     OUT temp_blks_read int8,
!     OUT temp_blks_written int8
  )
  RETURNS SETOF record
  AS 'MODULE_PATHNAME'
diff -cprN head/doc/src/sgml/pgstatstatements.sgml work/doc/src/sgml/pgstatstatements.sgml
*** head/doc/src/sgml/pgstatstatements.sgml	2009-12-16 09:17:22.613136000 +0900
--- work/doc/src/sgml/pgstatstatements.sgml	2009-12-22 17:26:49.907697038 +0900
***************
*** 85,90 ****
--- 85,146 ----
        <entry>Total number of rows retrieved or affected by the statement</entry>
       </row>
  
+      <row>
+       <entry><structfield>shared_blks_hit</structfield></entry>
+       <entry><type>bigint</type></entry>
+       <entry></entry>
+       <entry>Total number of shared blocks hits by the statement</entry>
+      </row>
+ 
+      <row>
+       <entry><structfield>shared_blks_read</structfield></entry>
+       <entry><type>bigint</type></entry>
+       <entry></entry>
+       <entry>Total number of shared blocks reads by the statement</entry>
+      </row>
+ 
+      <row>
+       <entry><structfield>shared_blks_written</structfield></entry>
+       <entry><type>bigint</type></entry>
+       <entry></entry>
+       <entry>Total number of shared blocks writes by the statement</entry>
+      </row>
+ 
+      <row>
+       <entry><structfield>local_blks_hit</structfield></entry>
+       <entry><type>bigint</type></entry>
+       <entry></entry>
+       <entry>Total number of local blocks hits by the statement</entry>
+      </row>
+ 
+      <row>
+       <entry><structfield>local_blks_read</structfield></entry>
+       <entry><type>bigint</type></entry>
+       <entry></entry>
+       <entry>Total number of local blocks reads by the statement</entry>
+      </row>
+ 
+      <row>
+       <entry><structfield>local_blks_written</structfield></entry>
+       <entry><type>bigint</type></entry>
+       <entry></entry>
+       <entry>Total number of local blocks writes by the statement</entry>
+      </row>
+ 
+      <row>
+       <entry><structfield>temp_blks_read</structfield></entry>
+       <entry><type>bigint</type></entry>
+       <entry></entry>
+       <entry>Total number of temp blocks reads by the statement</entry>
+      </row>
+ 
+      <row>
+       <entry><structfield>temp_blks_written</structfield></entry>
+       <entry><type>bigint</type></entry>
+       <entry></entry>
+       <entry>Total number of temp blocks writes by the statement</entry>
+      </row>
+ 
      </tbody>
     </tgroup>
    </table>
*************** pg_stat_statements.track = all
*** 239,273 ****
    <title>Sample output</title>
  
    <programlisting>
! $ pgbench -i bench
! 
! postgres=# SELECT pg_stat_statements_reset();
  
  $ pgbench -c10 -t300 -M prepared bench
  
! postgres=# \x
! postgres=# SELECT * FROM pg_stat_statements ORDER BY total_time DESC LIMIT 3;
! -[ RECORD 1 ]------------------------------------------------------------
! userid     | 10
! dbid       | 63781
! query      | UPDATE branches SET bbalance = bbalance + $1 WHERE bid = $2;
! calls      | 3000
! total_time | 20.716706
! rows       | 3000
! -[ RECORD 2 ]------------------------------------------------------------
! userid     | 10
! dbid       | 63781
! query      | UPDATE tellers SET tbalance = tbalance + $1 WHERE tid = $2;
! calls      | 3000
! total_time | 17.1107649999999
! rows       | 3000
! -[ RECORD 3 ]------------------------------------------------------------
! userid     | 10
! dbid       | 63781
! query      | UPDATE accounts SET abalance = abalance + $1 WHERE aid = $2;
! calls      | 3000
! total_time | 0.645601
! rows       | 3000
    </programlisting>
   </sect2>
  
--- 295,339 ----
    <title>Sample output</title>
  
    <programlisting>
! bench=# SELECT pg_stat_statements_reset();
  
+ $ pgbench -i bench
  $ pgbench -c10 -t300 -M prepared bench
  
! bench=# \x
! bench=# SELECT query, calls, total_time, rows, 100.0 * shared_blks_hit /
!                nullif(shared_blks_hit + shared_blks_read, 0) AS hit_percent
!           FROM pg_stat_statements ORDER BY total_time DESC LIMIT 5;
! -[ RECORD 1 ]---------------------------------------------------------------------
! query       | UPDATE pgbench_branches SET bbalance = bbalance + $1 WHERE bid = $2;
! calls       | 3000
! total_time  | 9.60900100000002
! rows        | 2836
! hit_percent | 99.9778970000200936
! -[ RECORD 2 ]---------------------------------------------------------------------
! query       | UPDATE pgbench_tellers SET tbalance = tbalance + $1 WHERE tid = $2;
! calls       | 3000
! total_time  | 8.015156
! rows        | 2990
! hit_percent | 99.9731126579631345
! -[ RECORD 3 ]---------------------------------------------------------------------
! query       | copy pgbench_accounts from stdin
! calls       | 1
! total_time  | 0.310624
! rows        | 100000
! hit_percent | 0.30395136778115501520
! -[ RECORD 4 ]---------------------------------------------------------------------
! query       | UPDATE pgbench_accounts SET abalance = abalance + $1 WHERE aid = $2;
! calls       | 3000
! total_time  | 0.271741999999997
! rows        | 3000
! hit_percent | 93.7968855088209426
! -[ RECORD 5 ]---------------------------------------------------------------------
! query       | alter table pgbench_accounts add primary key (aid)
! calls       | 1
! total_time  | 0.08142
! rows        | 0
! hit_percent | 34.4947735191637631
    </programlisting>
   </sect2>
  
#4Cédric Villemain
cedric.villemain.debian@gmail.com
In reply to: Takahiro Itagaki (#3)
Re: Buffer statistics for pg_stat_statements

2009/12/22 Takahiro Itagaki <itagaki.takahiro@oss.ntt.co.jp>:

Cedric Villemain <cedric.villemain@dalibo.com> wrote:

Le vendredi 18 decembre 2009 09:44:40, Takahiro Itagaki a ecrit :

I'd like to add per-query buffer usage into contrib/pg_stat_statements.

Here is a patch to add buffer usage columns into pg_stat_statements.
It also changes initialzation of the result TupleDesc from manually
coded routines to get_call_result_type().

Perhaps it can be usefull to have the percentage for hit/read ratio computed
in the view ?

I think different DBAs want different derived values; Some of them might want
the buffer hit ratio, but others might request numbers per query. I'd like to
privide only raw values from pg_stat_statements to keep it simple, but I added
a computational expression of hit percentage in the documentation.

Yes, you are right.

Show quoted text

! bench=# SELECT query, calls, total_time, rows, 100.0 * shared_blks_hit /
!                nullif(shared_blks_hit + shared_blks_read, 0) AS hit_percent
!           FROM pg_stat_statements ORDER BY total_time DESC LIMIT 5;
! -[ RECORD 1 ]---------------------------------------------------------------------
! query       | UPDATE pgbench_branches SET bbalance = bbalance + $1 WHERE bid = $2;
! calls       | 3000
! total_time  | 9.60900100000002
! rows        | 2836
! hit_percent | 99.9778970000200936

Regards,
---
Takahiro Itagaki
NTT Open Source Software Center

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#5Robert Haas
robertmhaas@gmail.com
In reply to: Takahiro Itagaki (#3)
Re: Buffer statistics for pg_stat_statements

On Tue, Dec 22, 2009 at 3:27 AM, Takahiro Itagaki
<itagaki.takahiro@oss.ntt.co.jp> wrote:

Cedric Villemain <cedric.villemain@dalibo.com> wrote:

Le vendredi 18 decembre 2009 09:44:40, Takahiro Itagaki a ecrit :

I'd like to add per-query buffer usage into contrib/pg_stat_statements.

Here is a patch to add buffer usage columns into pg_stat_statements.
It also changes initialzation of the result TupleDesc from manually
coded routines to get_call_result_type().

I have reviewed this patch and I think it looks pretty good. A couple
of minor nits:

- There are needless whitespace changes in the definition of struct
Counters. The changes to the existing four members should be
reverted, and the new members should be made to match the existing
members.

- In the part that reads /* calc differences of buffer counters */,
all the lines go past 80 columns. I wonder if it would be better to
insert a line break just after the equals sign and indent the next
line by an extra tab stop. See, e.g. src/backend/commands/user.c line
338.

Other than that I think this is ready to commit.

...Robert

#6Takahiro Itagaki
itagaki.takahiro@oss.ntt.co.jp
In reply to: Robert Haas (#5)
Re: Buffer statistics for pg_stat_statements

Robert Haas <robertmhaas@gmail.com> wrote:

I have reviewed this patch and I think it looks pretty good. A couple
of minor nits:

- There are needless whitespace changes in the definition of struct
Counters. The changes to the existing four members should be
reverted, and the new members should be made to match the existing
members.

That's because the 'shared_blks_written' field is too long to keep the
existing indentations. Since we still have some rooms in 80 columns,
I'd like to change all of them as the previous patch.

- In the part that reads /* calc differences of buffer counters */,
all the lines go past 80 columns. I wonder if it would be better to
insert a line break just after the equals sign and indent the next
line by an extra tab stop. See, e.g. src/backend/commands/user.c line
338.

Ok, I'll adjust them so.

Regards,
---
Takahiro Itagaki
NTT Open Source Software Center

#7Robert Haas
robertmhaas@gmail.com
In reply to: Takahiro Itagaki (#6)
Re: Buffer statistics for pg_stat_statements

On Sun, Jan 3, 2010 at 10:17 PM, Takahiro Itagaki
<itagaki.takahiro@oss.ntt.co.jp> wrote:

Robert Haas <robertmhaas@gmail.com> wrote:

I have reviewed this patch and I think it looks pretty good.  A couple
of minor nits:

- There are needless whitespace changes in the definition of struct
Counters.  The changes to the existing four members should be
reverted, and the new members should be made to match the existing
members.

That's because the 'shared_blks_written' field is too long to keep the
existing indentations. Since we still have some rooms in 80 columns,
I'd like to change all of them as the previous patch.

I don't necessarily know what the right thing to do with the new ones
is, but I am pretty sure that pg_indent will revert any changes you
make to the existing ones. Assuming that's so, it doesn't make sense
to change them.

...Robert

#8Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#7)
Re: Buffer statistics for pg_stat_statements

Robert Haas <robertmhaas@gmail.com> writes:

On Sun, Jan 3, 2010 at 10:17 PM, Takahiro Itagaki
<itagaki.takahiro@oss.ntt.co.jp> wrote:

Robert Haas <robertmhaas@gmail.com> wrote:

- There are needless whitespace changes in the definition of struct
Counters. �The changes to the existing four members should be
reverted, and the new members should be made to match the existing
members.

That's because the 'shared_blks_written' field is too long to keep the
existing indentations. Since we still have some rooms in 80 columns,
I'd like to change all of them as the previous patch.

I don't necessarily know what the right thing to do with the new ones
is, but I am pretty sure that pg_indent will revert any changes you
make to the existing ones.

That it will. The proposed changes to the existing lines are an
exercise in uselessness; and to the extent that you format the added
lines with this layout in mind, the final result could be worse than
what you'd get if you adapt to pg_indent's rules to start with.

One possibility is to adopt shorter field names than these.

regards, tom lane

#9Takahiro Itagaki
itagaki.takahiro@oss.ntt.co.jp
In reply to: Tom Lane (#8)
1 attachment(s)
Re: Buffer statistics for pg_stat_statements

Tom Lane <tgl@sss.pgh.pa.us> wrote:

I don't necessarily know what the right thing to do with the new ones
is, but I am pretty sure that pg_indent will revert any changes you
make to the existing ones.

That it will. The proposed changes to the existing lines are an
exercise in uselessness; and to the extent that you format the added
lines with this layout in mind, the final result could be worse than
what you'd get if you adapt to pg_indent's rules to start with.

Here is the proposed patch to adjust white spaces.
It does not indent variables, but indents comments of the variables
to adjust other fields. Are those changes ok?

Regards,
---
Takahiro Itagaki
NTT Open Source Software Center

Attachments:

pg_stat_statements_bufusage_20100107.patchapplication/octet-stream; name=pg_stat_statements_bufusage_20100107.patchDownload
diff -cprN head/contrib/pg_stat_statements/pg_stat_statements.c work/contrib/pg_stat_statements/pg_stat_statements.c
*** head/contrib/pg_stat_statements/pg_stat_statements.c	2010-01-04 09:10:26.638773000 +0900
--- work/contrib/pg_stat_statements/pg_stat_statements.c	2010-01-07 17:27:03.299544176 +0900
***************
*** 26,31 ****
--- 26,32 ----
  #include "catalog/pg_type.h"
  #include "executor/executor.h"
  #include "executor/instrument.h"
+ #include "funcapi.h"
  #include "mb/pg_wchar.h"
  #include "miscadmin.h"
  #include "pgstat.h"
*************** PG_MODULE_MAGIC;
*** 44,50 ****
  #define PGSS_DUMP_FILE	"global/pg_stat_statements.stat"
  
  /* This constant defines the magic number in the stats file header */
! static const uint32 PGSS_FILE_HEADER = 0x20081202;
  
  /* XXX: Should USAGE_EXEC reflect execution time and/or buffer usage? */
  #define USAGE_EXEC(duration)	(1.0)
--- 45,51 ----
  #define PGSS_DUMP_FILE	"global/pg_stat_statements.stat"
  
  /* This constant defines the magic number in the stats file header */
! static const uint32 PGSS_FILE_HEADER = 0x20100104;
  
  /* XXX: Should USAGE_EXEC reflect execution time and/or buffer usage? */
  #define USAGE_EXEC(duration)	(1.0)
*************** typedef struct pgssHashKey
*** 75,84 ****
   */
  typedef struct Counters
  {
! 	int64		calls;			/* # of times executed */
! 	double		total_time;		/* total execution time in seconds */
! 	int64		rows;			/* total # of retrieved or affected rows */
! 	double		usage;			/* usage factor */
  } Counters;
  
  /*
--- 76,93 ----
   */
  typedef struct Counters
  {
! 	int64		calls;				/* # of times executed */
! 	double		total_time;			/* total execution time in seconds */
! 	int64		rows;				/* total # of retrieved or affected rows */
! 	int64		shared_blks_hit;	/* # of shared buffer hits */
! 	int64		shared_blks_read;	/* # of shared disk blocks read */
! 	int64		shared_blks_written;/* # of shared disk blocks written */
! 	int64		local_blks_hit;		/* # of local buffer hits */
! 	int64		local_blks_read;	/* # of local disk blocks read */
! 	int64		local_blks_written;	/* # of local disk blocks written */
! 	int64		temp_blks_read;		/* # of temp blocks read */
! 	int64		temp_blks_written;	/* # of temp blocks written */
! 	double		usage;				/* usage factor */
  } Counters;
  
  /*
*************** static void pgss_ProcessUtility(Node *pa
*** 169,175 ****
  			   DestReceiver *dest, char *completionTag);
  static uint32 pgss_hash_fn(const void *key, Size keysize);
  static int	pgss_match_fn(const void *key1, const void *key2, Size keysize);
! static void pgss_store(const char *query, double total_time, uint64 rows);
  static Size pgss_memsize(void);
  static pgssEntry *entry_alloc(pgssHashKey *key);
  static void entry_dealloc(void);
--- 178,185 ----
  			   DestReceiver *dest, char *completionTag);
  static uint32 pgss_hash_fn(const void *key, Size keysize);
  static int	pgss_match_fn(const void *key1, const void *key2, Size keysize);
! static void pgss_store(const char *query, double total_time, uint64 rows,
! 					   const BufferUsage *bufusage);
  static Size pgss_memsize(void);
  static pgssEntry *entry_alloc(pgssHashKey *key);
  static void entry_dealloc(void);
*************** pgss_ExecutorEnd(QueryDesc *queryDesc)
*** 558,564 ****
  
  		pgss_store(queryDesc->sourceText,
  				   queryDesc->totaltime->total,
! 				   queryDesc->estate->es_processed);
  	}
  
  	if (prev_ExecutorEnd)
--- 568,575 ----
  
  		pgss_store(queryDesc->sourceText,
  				   queryDesc->totaltime->total,
! 				   queryDesc->estate->es_processed,
! 				   &queryDesc->totaltime->bufusage);
  	}
  
  	if (prev_ExecutorEnd)
*************** pgss_ProcessUtility(Node *parsetree, con
*** 580,586 ****
--- 591,599 ----
  		instr_time	start;
  		instr_time	duration;
  		uint64		rows = 0;
+ 		BufferUsage	bufusage;
  
+ 		bufusage = pgBufferUsage;
  		INSTR_TIME_SET_CURRENT(start);
  
  		nested_level++;
*************** pgss_ProcessUtility(Node *parsetree, con
*** 609,615 ****
  			sscanf(completionTag, "COPY " UINT64_FORMAT, &rows) != 1)
  			rows = 0;
  
! 		pgss_store(queryString, INSTR_TIME_GET_DOUBLE(duration), rows);
  	}
  	else
  	{
--- 622,647 ----
  			sscanf(completionTag, "COPY " UINT64_FORMAT, &rows) != 1)
  			rows = 0;
  
! 		/* calc differences of buffer counters. */
! 		bufusage.shared_blks_hit =
! 			pgBufferUsage.shared_blks_hit - bufusage.shared_blks_hit;
! 		bufusage.shared_blks_read =
! 			pgBufferUsage.shared_blks_read - bufusage.shared_blks_read;
! 		bufusage.shared_blks_written =
! 			pgBufferUsage.shared_blks_written - bufusage.shared_blks_written;
! 		bufusage.local_blks_hit =
! 			pgBufferUsage.local_blks_hit - bufusage.local_blks_hit;
! 		bufusage.local_blks_read =
! 			pgBufferUsage.local_blks_read - bufusage.local_blks_read;
! 		bufusage.local_blks_written =
! 			pgBufferUsage.local_blks_written - bufusage.local_blks_written;
! 		bufusage.temp_blks_read =
! 			pgBufferUsage.temp_blks_read - bufusage.temp_blks_read;
! 		bufusage.temp_blks_written =
! 			pgBufferUsage.temp_blks_written - bufusage.temp_blks_written;
! 
! 		pgss_store(queryString, INSTR_TIME_GET_DOUBLE(duration), rows,
! 				   &bufusage);
  	}
  	else
  	{
*************** pgss_match_fn(const void *key1, const vo
*** 660,666 ****
   * Store some statistics for a statement.
   */
  static void
! pgss_store(const char *query, double total_time, uint64 rows)
  {
  	pgssHashKey key;
  	double		usage;
--- 692,699 ----
   * Store some statistics for a statement.
   */
  static void
! pgss_store(const char *query, double total_time, uint64 rows,
! 		   const BufferUsage *bufusage)
  {
  	pgssHashKey key;
  	double		usage;
*************** pgss_store(const char *query, double tot
*** 706,711 ****
--- 739,752 ----
  		e->counters.calls += 1;
  		e->counters.total_time += total_time;
  		e->counters.rows += rows;
+ 		e->counters.shared_blks_hit += bufusage->shared_blks_hit;
+ 		e->counters.shared_blks_read += bufusage->shared_blks_read;
+ 		e->counters.shared_blks_written += bufusage->shared_blks_written;
+ 		e->counters.local_blks_hit += bufusage->local_blks_hit;
+ 		e->counters.local_blks_read += bufusage->local_blks_read;
+ 		e->counters.local_blks_written += bufusage->local_blks_written;
+ 		e->counters.temp_blks_read += bufusage->temp_blks_read;
+ 		e->counters.temp_blks_written += bufusage->temp_blks_written;
  		e->counters.usage += usage;
  		SpinLockRelease(&e->mutex);
  	}
*************** pg_stat_statements_reset(PG_FUNCTION_ARG
*** 727,733 ****
  	PG_RETURN_VOID();
  }
  
! #define PG_STAT_STATEMENTS_COLS		6
  
  /*
   * Retrieve statement statistics.
--- 768,774 ----
  	PG_RETURN_VOID();
  }
  
! #define PG_STAT_STATEMENTS_COLS		14
  
  /*
   * Retrieve statement statistics.
*************** pg_stat_statements(PG_FUNCTION_ARGS)
*** 761,783 ****
  				 errmsg("materialize mode required, but it is not " \
  						"allowed in this context")));
  
  	per_query_ctx = rsinfo->econtext->ecxt_per_query_memory;
  	oldcontext = MemoryContextSwitchTo(per_query_ctx);
  
- 	tupdesc = CreateTemplateTupleDesc(PG_STAT_STATEMENTS_COLS, false);
- 	TupleDescInitEntry(tupdesc, (AttrNumber) 1, "userid",
- 					   OIDOID, -1, 0);
- 	TupleDescInitEntry(tupdesc, (AttrNumber) 2, "dbid",
- 					   OIDOID, -1, 0);
- 	TupleDescInitEntry(tupdesc, (AttrNumber) 3, "query",
- 					   TEXTOID, -1, 0);
- 	TupleDescInitEntry(tupdesc, (AttrNumber) 4, "calls",
- 					   INT8OID, -1, 0);
- 	TupleDescInitEntry(tupdesc, (AttrNumber) 5, "total_time",
- 					   FLOAT8OID, -1, 0);
- 	TupleDescInitEntry(tupdesc, (AttrNumber) 6, "rows",
- 					   INT8OID, -1, 0);
- 
  	tupstore = tuplestore_begin_heap(true, false, work_mem);
  	rsinfo->returnMode = SFRM_Materialize;
  	rsinfo->setResult = tupstore;
--- 802,814 ----
  				 errmsg("materialize mode required, but it is not " \
  						"allowed in this context")));
  
+ 	/* Build a tuple descriptor for our result type */
+ 	if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE)
+ 		elog(ERROR, "return type must be a row type");
+ 
  	per_query_ctx = rsinfo->econtext->ecxt_per_query_memory;
  	oldcontext = MemoryContextSwitchTo(per_query_ctx);
  
  	tupstore = tuplestore_begin_heap(true, false, work_mem);
  	rsinfo->returnMode = SFRM_Materialize;
  	rsinfo->setResult = tupstore;
*************** pg_stat_statements(PG_FUNCTION_ARGS)
*** 829,834 ****
--- 860,873 ----
  		values[i++] = Int64GetDatumFast(tmp.calls);
  		values[i++] = Float8GetDatumFast(tmp.total_time);
  		values[i++] = Int64GetDatumFast(tmp.rows);
+ 		values[i++] = Int64GetDatumFast(tmp.shared_blks_hit);
+ 		values[i++] = Int64GetDatumFast(tmp.shared_blks_read);
+ 		values[i++] = Int64GetDatumFast(tmp.shared_blks_written);
+ 		values[i++] = Int64GetDatumFast(tmp.local_blks_hit);
+ 		values[i++] = Int64GetDatumFast(tmp.local_blks_read);
+ 		values[i++] = Int64GetDatumFast(tmp.local_blks_written);
+ 		values[i++] = Int64GetDatumFast(tmp.temp_blks_read);
+ 		values[i++] = Int64GetDatumFast(tmp.temp_blks_written);
  
  		Assert(i == PG_STAT_STATEMENTS_COLS);
  
diff -cprN head/contrib/pg_stat_statements/pg_stat_statements.sql.in work/contrib/pg_stat_statements/pg_stat_statements.sql.in
*** head/contrib/pg_stat_statements/pg_stat_statements.sql.in	2009-01-05 00:22:25.168790000 +0900
--- work/contrib/pg_stat_statements/pg_stat_statements.sql.in	2010-01-07 17:24:24.376766293 +0900
*************** CREATE FUNCTION pg_stat_statements(
*** 15,21 ****
      OUT query text,
      OUT calls int8,
      OUT total_time float8,
!     OUT rows int8
  )
  RETURNS SETOF record
  AS 'MODULE_PATHNAME'
--- 15,29 ----
      OUT query text,
      OUT calls int8,
      OUT total_time float8,
!     OUT rows int8,
!     OUT shared_blks_hit int8,
!     OUT shared_blks_read int8,
!     OUT shared_blks_written int8,
!     OUT local_blks_hit int8,
!     OUT local_blks_read int8,
!     OUT local_blks_written int8,
!     OUT temp_blks_read int8,
!     OUT temp_blks_written int8
  )
  RETURNS SETOF record
  AS 'MODULE_PATHNAME'
diff -cprN head/doc/src/sgml/pgstatstatements.sgml work/doc/src/sgml/pgstatstatements.sgml
*** head/doc/src/sgml/pgstatstatements.sgml	2009-12-16 09:17:22.613136000 +0900
--- work/doc/src/sgml/pgstatstatements.sgml	2010-01-07 17:24:24.376766293 +0900
***************
*** 85,90 ****
--- 85,146 ----
        <entry>Total number of rows retrieved or affected by the statement</entry>
       </row>
  
+      <row>
+       <entry><structfield>shared_blks_hit</structfield></entry>
+       <entry><type>bigint</type></entry>
+       <entry></entry>
+       <entry>Total number of shared blocks hits by the statement</entry>
+      </row>
+ 
+      <row>
+       <entry><structfield>shared_blks_read</structfield></entry>
+       <entry><type>bigint</type></entry>
+       <entry></entry>
+       <entry>Total number of shared blocks reads by the statement</entry>
+      </row>
+ 
+      <row>
+       <entry><structfield>shared_blks_written</structfield></entry>
+       <entry><type>bigint</type></entry>
+       <entry></entry>
+       <entry>Total number of shared blocks writes by the statement</entry>
+      </row>
+ 
+      <row>
+       <entry><structfield>local_blks_hit</structfield></entry>
+       <entry><type>bigint</type></entry>
+       <entry></entry>
+       <entry>Total number of local blocks hits by the statement</entry>
+      </row>
+ 
+      <row>
+       <entry><structfield>local_blks_read</structfield></entry>
+       <entry><type>bigint</type></entry>
+       <entry></entry>
+       <entry>Total number of local blocks reads by the statement</entry>
+      </row>
+ 
+      <row>
+       <entry><structfield>local_blks_written</structfield></entry>
+       <entry><type>bigint</type></entry>
+       <entry></entry>
+       <entry>Total number of local blocks writes by the statement</entry>
+      </row>
+ 
+      <row>
+       <entry><structfield>temp_blks_read</structfield></entry>
+       <entry><type>bigint</type></entry>
+       <entry></entry>
+       <entry>Total number of temp blocks reads by the statement</entry>
+      </row>
+ 
+      <row>
+       <entry><structfield>temp_blks_written</structfield></entry>
+       <entry><type>bigint</type></entry>
+       <entry></entry>
+       <entry>Total number of temp blocks writes by the statement</entry>
+      </row>
+ 
      </tbody>
     </tgroup>
    </table>
*************** pg_stat_statements.track = all
*** 239,273 ****
    <title>Sample output</title>
  
    <programlisting>
! $ pgbench -i bench
! 
! postgres=# SELECT pg_stat_statements_reset();
  
  $ pgbench -c10 -t300 -M prepared bench
  
! postgres=# \x
! postgres=# SELECT * FROM pg_stat_statements ORDER BY total_time DESC LIMIT 3;
! -[ RECORD 1 ]------------------------------------------------------------
! userid     | 10
! dbid       | 63781
! query      | UPDATE branches SET bbalance = bbalance + $1 WHERE bid = $2;
! calls      | 3000
! total_time | 20.716706
! rows       | 3000
! -[ RECORD 2 ]------------------------------------------------------------
! userid     | 10
! dbid       | 63781
! query      | UPDATE tellers SET tbalance = tbalance + $1 WHERE tid = $2;
! calls      | 3000
! total_time | 17.1107649999999
! rows       | 3000
! -[ RECORD 3 ]------------------------------------------------------------
! userid     | 10
! dbid       | 63781
! query      | UPDATE accounts SET abalance = abalance + $1 WHERE aid = $2;
! calls      | 3000
! total_time | 0.645601
! rows       | 3000
    </programlisting>
   </sect2>
  
--- 295,339 ----
    <title>Sample output</title>
  
    <programlisting>
! bench=# SELECT pg_stat_statements_reset();
  
+ $ pgbench -i bench
  $ pgbench -c10 -t300 -M prepared bench
  
! bench=# \x
! bench=# SELECT query, calls, total_time, rows, 100.0 * shared_blks_hit /
!                nullif(shared_blks_hit + shared_blks_read, 0) AS hit_percent
!           FROM pg_stat_statements ORDER BY total_time DESC LIMIT 5;
! -[ RECORD 1 ]---------------------------------------------------------------------
! query       | UPDATE pgbench_branches SET bbalance = bbalance + $1 WHERE bid = $2;
! calls       | 3000
! total_time  | 9.60900100000002
! rows        | 2836
! hit_percent | 99.9778970000200936
! -[ RECORD 2 ]---------------------------------------------------------------------
! query       | UPDATE pgbench_tellers SET tbalance = tbalance + $1 WHERE tid = $2;
! calls       | 3000
! total_time  | 8.015156
! rows        | 2990
! hit_percent | 99.9731126579631345
! -[ RECORD 3 ]---------------------------------------------------------------------
! query       | copy pgbench_accounts from stdin
! calls       | 1
! total_time  | 0.310624
! rows        | 100000
! hit_percent | 0.30395136778115501520
! -[ RECORD 4 ]---------------------------------------------------------------------
! query       | UPDATE pgbench_accounts SET abalance = abalance + $1 WHERE aid = $2;
! calls       | 3000
! total_time  | 0.271741999999997
! rows        | 3000
! hit_percent | 93.7968855088209426
! -[ RECORD 5 ]---------------------------------------------------------------------
! query       | alter table pgbench_accounts add primary key (aid)
! calls       | 1
! total_time  | 0.08142
! rows        | 0
! hit_percent | 34.4947735191637631
    </programlisting>
   </sect2>
  
#10Robert Haas
robertmhaas@gmail.com
In reply to: Takahiro Itagaki (#9)
Re: Buffer statistics for pg_stat_statements

On Thu, Jan 7, 2010 at 3:31 AM, Takahiro Itagaki
<itagaki.takahiro@oss.ntt.co.jp> wrote:

Tom Lane <tgl@sss.pgh.pa.us> wrote:

I don't necessarily know what the right thing to do with the new ones
is, but I am pretty sure that pg_indent will revert any changes you
make to the existing ones.

That it will.  The proposed changes to the existing lines are an
exercise in uselessness; and to the extent that you format the added
lines with this layout in mind, the final result could be worse than
what you'd get if you adapt to pg_indent's rules to start with.

Here is the proposed patch to adjust white spaces.
It does not indent variables, but indents comments of the variables
to adjust other fields. Are those changes ok?

I think so. I'm not sure if it will push out the comment that is
immediately adjacent to the trailing semicolon, but I don't think it
will decrease the indent on the ones you've indented more. I think
this is close enough for now and you should go ahead and commit it.

...Robert

#11Takahiro Itagaki
itagaki.takahiro@oss.ntt.co.jp
In reply to: Robert Haas (#10)
Re: Buffer statistics for pg_stat_statements

Robert Haas <robertmhaas@gmail.com> wrote:

I think so. I'm not sure if it will push out the comment that is
immediately adjacent to the trailing semicolon, but I don't think it
will decrease the indent on the ones you've indented more. I think
this is close enough for now and you should go ahead and commit it.

Thanks for your review. I committed it.

Regards,
---
Takahiro Itagaki
NTT Open Source Software Center