Publish checkpoint timing and sync files summary data to pg_stat_bgwriter

Started by Greg Smithalmost 14 years ago20 messages
#1Greg Smith
greg@2ndQuadrant.com
1 attachment(s)

One of the most useful bits of feedback on how well checkpoint I/O is
going is the amount of time taken to sync files to disk. Right now the
only way to get that is to parse the logs. The attached patch publishes
the most useful three bits of data you could only get from
log_checkpoints before out to pg_stat_bgwriter. Easiest to just show an
example:

$ createdb pgbench
$ psql -c "select pg_stat_reset_shared('bgwriter')"
$ pgbench -i -s 10 pgbench
$ psql -c "checkpoint"

$ tail $PGLOG
LOG: checkpoint complete: wrote 590 buffers (14.4%); 0 transaction log
file(s) added, 0 removed, 0 recycled; write=0.002 s, sync=0.117 s,
total=1.746 s; sync files=31, longest=0.104 s, average=0.003 s

$ psql -c "select write_ms,sync_ms,sync_files from pg_stat_bgwriter"
write_ms | sync_ms | sync_files
----------+---------+------------
2 | 117 | 31

Like the other checkpoint statistics, these are updated in one lump when
the checkpoint finishes.

This includes rough documentation and the code isn't very complicated; a
cat version bump is required. Some open concerns I still have about this:

-The total amount of time in the sync phase is not the same as the total
of all sync wait times. I have published the total elapsed sync time.
The total of the times spent waiting on individual sync calls is also
available at the point when the above data is collected, and it could be
published as well. I don't think that's necessary yet, since they are
almost identical right now. But any feature that staggers sync calls
around might make that distinction important.

-The total checkpoint time is larger than the total of write+sync. You
can see that here: 2ms write + 117ms sync = 119ms; total checkpoint
time is actually 1746ms though. While I don't normally find the total
useful from a tuning perspective, it may be sensible to publish it just
to further reduce any perceived need for log scraping here.

-None of the debug1 level information about individual sync calls can be
published this way usefully. I'd like that to come out somewhere
eventually, but really what's needed there is a performance event
logging history collector, not pg_stats. I'm content that this patch
gets the big picture data out; it's sufficient to let you rule out
checkpoint problems if they aren't happening. If you only have to drill
into the logs when long times start showing up in this data, that's not
so bad to live with.

-I can't tell for sure if this is working properly when log_checkpoints
is off. This now collects checkpoint end time data in all cases,
whereas before it ignored that work if log_checkpoints was off. I think
I will have to fake the logging in this area, make it log some things
even when log_checkpoints is off, to confirm it gives the right
results. I haven't done that yet.

I would like to be able to tell people they need never turn on
log_checkpoints if they monitor pg_stat_bgwriter instead, because that
sets a good precedent for what direction the database is going. It
would be nice for pg_stat_bgwriter's format to settle down for a long
period too. In addition to my small list of concerns here, there seem
to be an increasing number of write path feature patches floating around
here lately. That all works against this being stable. The main
consumer of this data I use regularly is the pg_stat_bgwriter graphs
that Munin produces, and my expectation is that myself and/or Magnus
will make sure it's compatible with any changes made here for 9.2.

--
Greg Smith 2ndQuadrant US greg@2ndQuadrant.com Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com

Attachments:

publish-sync-stats-v1.patchtext/x-patch; name=publish-sync-stats-v1.patchDownload
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index a12a9a2..674091c 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -1182,6 +1182,34 @@ postgres: <replaceable>user</> <replaceable>database</> <replaceable>host</> <re
      </row>
 
      <row>
+      <entry><literal><function>pg_stat_get_bgwriter_write_ms()</function></literal></entry>
+      <entry><type>bigint</type></entry>
+      <entry>
+       Total amount of time that has been spent in the part of checkpoint
+       processing where files are written to disk, in milliseconds.
+      </entry>
+     </row>
+
+
+     <row>
+      <entry><literal><function>pg_stat_get_bgwriter_sync_ms()</function></literal></entry>
+      <entry><type>bigint</type></entry>
+      <entry>
+       Total amount of time that has been spent in the part of checkpoint
+       processing where files are synchronized to disk, in milliseconds.
+      </entry>
+     </row>
+
+     <row>
+      <entry><literal><function>pg_stat_get_bgwriter_sync_files()</function></literal></entry>
+      <entry><type>bigint</type></entry>
+      <entry>
+       Total number of files that have been synchronized to disk during
+       checkpoint processing.
+      </entry>
+     </row>
+
+     <row>
       <entry><literal><function>pg_stat_get_wal_senders()</function></literal></entry>
       <entry><type>setof record</type></entry>
       <entry>
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index ce659ec..4ff7b54 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -7413,10 +7413,10 @@ LogCheckpointStart(int flags, bool restartpoint)
 }
 
 /*
- * Log end of a checkpoint.
+ * Time and potentially log the end of a checkpoint.
  */
 static void
-LogCheckpointEnd(bool restartpoint)
+TimeCheckpointEnd(bool restartpoint)
 {
 	long		write_secs,
 				sync_secs,
@@ -7444,6 +7444,16 @@ LogCheckpointEnd(bool restartpoint)
 						CheckpointStats.ckpt_sync_end_t,
 						&sync_secs, &sync_usecs);
 
+	BgWriterStats.m_write_ms=(uint64) write_secs * 1000 + write_usecs / 1000;
+	BgWriterStats.m_sync_ms=(uint64) sync_secs * 1000 + sync_usecs / 1000;
+	BgWriterStats.m_sync_files=CheckpointStats.ckpt_sync_rels;
+
+	/*
+	 * All of the published timing statistics are accounted for.  Only
+	 * continue if a log message is being written.
+	 */
+	if (!log_checkpoints) return;
+
 	/*
 	 * Timing values returned from CheckpointStats are in microseconds.
 	 * Convert to the second plus microsecond form that TimestampDifference
@@ -7892,8 +7902,7 @@ CreateCheckPoint(int flags)
 		TruncateSUBTRANS(GetOldestXmin(true, false));
 
 	/* All real work is done, but log before releasing lock. */
-	if (log_checkpoints)
-		LogCheckpointEnd(false);
+	TimeCheckpointEnd(false);
 
 	TRACE_POSTGRESQL_CHECKPOINT_DONE(CheckpointStats.ckpt_bufs_written,
 									 NBuffers,
@@ -8158,8 +8167,7 @@ CreateRestartPoint(int flags)
 		TruncateSUBTRANS(GetOldestXmin(true, false));
 
 	/* All real work is done, but log before releasing lock. */
-	if (log_checkpoints)
-		LogCheckpointEnd(true);
+	TimeCheckpointEnd(true);
 
 	xtime = GetLatestXTime();
 	ereport((log_checkpoints ? LOG : DEBUG2),
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 50ba20c..280f563 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -622,7 +622,11 @@ CREATE VIEW pg_stat_bgwriter AS
         pg_stat_get_buf_written_backend() AS buffers_backend,
         pg_stat_get_buf_fsync_backend() AS buffers_backend_fsync,
         pg_stat_get_buf_alloc() AS buffers_alloc,
-        pg_stat_get_bgwriter_stat_reset_time() AS stats_reset;
+        pg_stat_get_bgwriter_stat_reset_time() AS stats_reset,
+        pg_stat_get_bgwriter_write_ms() AS write_ms,
+        pg_stat_get_bgwriter_sync_ms() AS sync_ms,
+        pg_stat_get_bgwriter_sync_files() AS sync_files
+        ;
 
 CREATE VIEW pg_user_mappings AS
     SELECT
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index 323d42b..29c7fb4 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -4360,6 +4360,9 @@ pgstat_recv_bgwriter(PgStat_MsgBgWriter *msg, int len)
 	globalStats.buf_written_backend += msg->m_buf_written_backend;
 	globalStats.buf_fsync_backend += msg->m_buf_fsync_backend;
 	globalStats.buf_alloc += msg->m_buf_alloc;
+	globalStats.write_ms += msg->m_write_ms;
+	globalStats.sync_ms += msg->m_sync_ms;
+	globalStats.sync_files += msg->m_sync_files;
 }
 
 /* ----------
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index b4986d8..69d16d0 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -86,6 +86,9 @@ extern Datum pg_stat_get_bgwriter_buf_written_checkpoints(PG_FUNCTION_ARGS);
 extern Datum pg_stat_get_bgwriter_buf_written_clean(PG_FUNCTION_ARGS);
 extern Datum pg_stat_get_bgwriter_maxwritten_clean(PG_FUNCTION_ARGS);
 extern Datum pg_stat_get_bgwriter_stat_reset_time(PG_FUNCTION_ARGS);
+extern Datum pg_stat_get_bgwriter_write_ms(PG_FUNCTION_ARGS);
+extern Datum pg_stat_get_bgwriter_sync_ms(PG_FUNCTION_ARGS);
+extern Datum pg_stat_get_bgwriter_sync_files(PG_FUNCTION_ARGS);
 extern Datum pg_stat_get_buf_written_backend(PG_FUNCTION_ARGS);
 extern Datum pg_stat_get_buf_fsync_backend(PG_FUNCTION_ARGS);
 extern Datum pg_stat_get_buf_alloc(PG_FUNCTION_ARGS);
@@ -1305,6 +1308,22 @@ pg_stat_get_bgwriter_maxwritten_clean(PG_FUNCTION_ARGS)
 }
 
 Datum
+pg_stat_get_bgwriter_write_ms(PG_FUNCTION_ARGS)
+{
+	PG_RETURN_INT64(pgstat_fetch_global()->write_ms);
+}
+Datum
+pg_stat_get_bgwriter_sync_ms(PG_FUNCTION_ARGS)
+{
+	PG_RETURN_INT64(pgstat_fetch_global()->sync_ms);
+}
+Datum
+pg_stat_get_bgwriter_sync_files(PG_FUNCTION_ARGS)
+{
+	PG_RETURN_INT64(pgstat_fetch_global()->sync_files);
+}
+
+Datum
 pg_stat_get_bgwriter_stat_reset_time(PG_FUNCTION_ARGS)
 {
 	PG_RETURN_TIMESTAMPTZ(pgstat_fetch_global()->stat_reset_timestamp);
diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h
index 1ddf4bf..7f8de4d 100644
--- a/src/include/access/xlog.h
+++ b/src/include/access/xlog.h
@@ -15,6 +15,7 @@
 #include "access/xlogdefs.h"
 #include "datatype/timestamp.h"
 #include "lib/stringinfo.h"
+#include "pgstat.h"
 #include "storage/buf.h"
 #include "storage/latch.h"
 #include "utils/pg_crc.h"
diff --git a/src/include/catalog/pg_proc.h b/src/include/catalog/pg_proc.h
index 355c61a..22c2392 100644
--- a/src/include/catalog/pg_proc.h
+++ b/src/include/catalog/pg_proc.h
@@ -2645,13 +2645,18 @@ DATA(insert OID = 2773 ( pg_stat_get_bgwriter_maxwritten_clean PGNSP PGUID 12 1
 DESCR("statistics: number of times the bgwriter stopped processing when it had written too many buffers while cleaning");
 DATA(insert OID = 3075 ( pg_stat_get_bgwriter_stat_reset_time PGNSP PGUID 12 1 0 0 0 f f f t f s 0 0 1184 "" _null_ _null_ _null_ _null_	pg_stat_get_bgwriter_stat_reset_time _null_ _null_ _null_ ));
 DESCR("statistics: last reset for the bgwriter");
+DATA(insert OID = 3144 ( pg_stat_get_bgwriter_write_ms PGNSP PGUID 12 1 0 0 0 f f f t f s 0 0 20 "" _null_ _null_ _null_ _null_ pg_stat_get_bgwriter_write_ms _null_ _null_ _null_ ));
+DESCR("statistics: total amount of time the bgwriter has spent writing buffers to disk");
+DATA(insert OID = 3145 ( pg_stat_get_bgwriter_sync_ms PGNSP PGUID 12 1 0 0 0 f f f t f s 0 0 20 "" _null_ _null_ _null_ _null_ pg_stat_get_bgwriter_sync_ms _null_ _null_ _null_ ));
+DESCR("statistics: total amount of time the bgwriter has spent syncing buffers to disk");
+DATA(insert OID = 3146 ( pg_stat_get_bgwriter_sync_files PGNSP PGUID 12 1 0 0 0 f f f t f s 0 0 20 "" _null_ _null_ _null_ _null_ pg_stat_get_bgwriter_sync_files _null_ _null_ _null_ ));
+DESCR("statistics: total number of files the bgwriter has synced to disk");
 DATA(insert OID = 2775 ( pg_stat_get_buf_written_backend PGNSP PGUID 12 1 0 0 0 f f f t f s 0 0 20 "" _null_ _null_ _null_ _null_ pg_stat_get_buf_written_backend _null_ _null_ _null_ ));
 DESCR("statistics: number of buffers written by backends");
 DATA(insert OID = 3063 ( pg_stat_get_buf_fsync_backend PGNSP PGUID 12 1 0 0 0 f f f t f s 0 0 20 "" _null_ _null_ _null_ _null_ pg_stat_get_buf_fsync_backend _null_ _null_ _null_ ));
 DESCR("statistics: number of backend buffer writes that did their own fsync");
 DATA(insert OID = 2859 ( pg_stat_get_buf_alloc			PGNSP PGUID 12 1 0 0 0 f f f t f s 0 0 20 "" _null_ _null_ _null_ _null_ pg_stat_get_buf_alloc _null_ _null_ _null_ ));
 DESCR("statistics: number of buffer allocations");
-
 DATA(insert OID = 2978 (  pg_stat_get_function_calls		PGNSP PGUID 12 1 0 0 0 f f f t f s 1 0 20 "26" _null_ _null_ _null_ _null_ pg_stat_get_function_calls _null_ _null_ _null_ ));
 DESCR("statistics: number of function calls");
 DATA(insert OID = 2979 (  pg_stat_get_function_time			PGNSP PGUID 12 1 0 0 0 f f f t f s 1 0 20 "26" _null_ _null_ _null_ _null_ pg_stat_get_function_time _null_ _null_ _null_ ));
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index b8c6d82..311a95a 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -362,6 +362,9 @@ typedef struct PgStat_MsgBgWriter
 	PgStat_Counter m_buf_written_backend;
 	PgStat_Counter m_buf_fsync_backend;
 	PgStat_Counter m_buf_alloc;
+	PgStat_Counter m_write_ms;
+	PgStat_Counter m_sync_ms;
+	PgStat_Counter m_sync_files;
 } PgStat_MsgBgWriter;
 
 /* ----------
@@ -584,6 +587,9 @@ typedef struct PgStat_GlobalStats
 	PgStat_Counter buf_written_backend;
 	PgStat_Counter buf_fsync_backend;
 	PgStat_Counter buf_alloc;
+	PgStat_Counter write_ms;
+	PgStat_Counter sync_ms;
+	PgStat_Counter sync_files;
 	TimestampTz stat_reset_timestamp;
 } PgStat_GlobalStats;
 
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 454e1f9..d4bf9ce 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1295,7 +1295,7 @@ SELECT viewname, definition FROM pg_views WHERE schemaname <> 'information_schem
  pg_stat_activity                | SELECT s.datid, d.datname, s.procpid, s.usesysid, u.rolname AS usename, s.application_name, s.client_addr, s.client_hostname, s.client_port, s.backend_start, s.xact_start, s.query_start, s.waiting, s.current_query FROM pg_database d, pg_stat_get_activity(NULL::integer) s(datid, procpid, usesysid, application_name, current_query, waiting, xact_start, query_start, backend_start, client_addr, client_hostname, client_port), pg_authid u WHERE ((s.datid = d.oid) AND (s.usesysid = u.oid));
  pg_stat_all_indexes             | SELECT c.oid AS relid, i.oid AS indexrelid, n.nspname AS schemaname, c.relname, i.relname AS indexrelname, pg_stat_get_numscans(i.oid) AS idx_scan, pg_stat_get_tuples_returned(i.oid) AS idx_tup_read, pg_stat_get_tuples_fetched(i.oid) AS idx_tup_fetch FROM (((pg_class c JOIN pg_index x ON ((c.oid = x.indrelid))) JOIN pg_class i ON ((i.oid = x.indexrelid))) LEFT JOIN pg_namespace n ON ((n.oid = c.relnamespace))) WHERE (c.relkind = ANY (ARRAY['r'::"char", 't'::"char"]));
  pg_stat_all_tables              | SELECT c.oid AS relid, n.nspname AS schemaname, c.relname, pg_stat_get_numscans(c.oid) AS seq_scan, pg_stat_get_tuples_returned(c.oid) AS seq_tup_read, (sum(pg_stat_get_numscans(i.indexrelid)))::bigint AS idx_scan, ((sum(pg_stat_get_tuples_fetched(i.indexrelid)))::bigint + pg_stat_get_tuples_fetched(c.oid)) AS idx_tup_fetch, pg_stat_get_tuples_inserted(c.oid) AS n_tup_ins, pg_stat_get_tuples_updated(c.oid) AS n_tup_upd, pg_stat_get_tuples_deleted(c.oid) AS n_tup_del, pg_stat_get_tuples_hot_updated(c.oid) AS n_tup_hot_upd, pg_stat_get_live_tuples(c.oid) AS n_live_tup, pg_stat_get_dead_tuples(c.oid) AS n_dead_tup, pg_stat_get_last_vacuum_time(c.oid) AS last_vacuum, pg_stat_get_last_autovacuum_time(c.oid) AS last_autovacuum, pg_stat_get_last_analyze_time(c.oid) AS last_analyze, pg_stat_get_last_autoanalyze_time(c.oid) AS last_autoanalyze, pg_stat_get_vacuum_count(c.oid) AS vacuum_count, pg_stat_get_autovacuum_count(c.oid) AS autovacuum_count, pg_stat_get_analyze_count(c.oid) AS analyze_count, pg_stat_get_autoanalyze_count(c.oid) AS autoanalyze_count FROM ((pg_class c LEFT JOIN pg_index i ON ((c.oid = i.indrelid))) LEFT JOIN pg_namespace n ON ((n.oid = c.relnamespace))) WHERE (c.relkind = ANY (ARRAY['r'::"char", 't'::"char"])) GROUP BY c.oid, n.nspname, c.relname;
- pg_stat_bgwriter                | SELECT pg_stat_get_bgwriter_timed_checkpoints() AS checkpoints_timed, pg_stat_get_bgwriter_requested_checkpoints() AS checkpoints_req, pg_stat_get_bgwriter_buf_written_checkpoints() AS buffers_checkpoint, pg_stat_get_bgwriter_buf_written_clean() AS buffers_clean, pg_stat_get_bgwriter_maxwritten_clean() AS maxwritten_clean, pg_stat_get_buf_written_backend() AS buffers_backend, pg_stat_get_buf_fsync_backend() AS buffers_backend_fsync, pg_stat_get_buf_alloc() AS buffers_alloc, pg_stat_get_bgwriter_stat_reset_time() AS stats_reset;
+ pg_stat_bgwriter                | SELECT pg_stat_get_bgwriter_timed_checkpoints() AS checkpoints_timed, pg_stat_get_bgwriter_requested_checkpoints() AS checkpoints_req, pg_stat_get_bgwriter_buf_written_checkpoints() AS buffers_checkpoint, pg_stat_get_bgwriter_buf_written_clean() AS buffers_clean, pg_stat_get_bgwriter_maxwritten_clean() AS maxwritten_clean, pg_stat_get_buf_written_backend() AS buffers_backend, pg_stat_get_buf_fsync_backend() AS buffers_backend_fsync, pg_stat_get_buf_alloc() AS buffers_alloc, pg_stat_get_bgwriter_stat_reset_time() AS stats_reset, pg_stat_get_bgwriter_write_ms() AS write_ms, pg_stat_get_bgwriter_sync_ms() AS sync_ms, pg_stat_get_bgwriter_sync_files() AS sync_files;
  pg_stat_database                | SELECT d.oid AS datid, d.datname, pg_stat_get_db_numbackends(d.oid) AS numbackends, pg_stat_get_db_xact_commit(d.oid) AS xact_commit, pg_stat_get_db_xact_rollback(d.oid) AS xact_rollback, (pg_stat_get_db_blocks_fetched(d.oid) - pg_stat_get_db_blocks_hit(d.oid)) AS blks_read, pg_stat_get_db_blocks_hit(d.oid) AS blks_hit, pg_stat_get_db_tuples_returned(d.oid) AS tup_returned, pg_stat_get_db_tuples_fetched(d.oid) AS tup_fetched, pg_stat_get_db_tuples_inserted(d.oid) AS tup_inserted, pg_stat_get_db_tuples_updated(d.oid) AS tup_updated, pg_stat_get_db_tuples_deleted(d.oid) AS tup_deleted, pg_stat_get_db_conflict_all(d.oid) AS conflicts, pg_stat_get_db_stat_reset_time(d.oid) AS stats_reset FROM pg_database d;
  pg_stat_database_conflicts      | SELECT d.oid AS datid, d.datname, pg_stat_get_db_conflict_tablespace(d.oid) AS confl_tablespace, pg_stat_get_db_conflict_lock(d.oid) AS confl_lock, pg_stat_get_db_conflict_snapshot(d.oid) AS confl_snapshot, pg_stat_get_db_conflict_bufferpin(d.oid) AS confl_bufferpin, pg_stat_get_db_conflict_startup_deadlock(d.oid) AS confl_deadlock FROM pg_database d;
  pg_stat_replication             | SELECT s.procpid, s.usesysid, u.rolname AS usename, s.application_name, s.client_addr, s.client_hostname, s.client_port, s.backend_start, w.state, w.sent_location, w.write_location, w.flush_location, w.replay_location, w.sync_priority, w.sync_state FROM pg_stat_get_activity(NULL::integer) s(datid, procpid, usesysid, application_name, current_query, waiting, xact_start, query_start, backend_start, client_addr, client_hostname, client_port), pg_authid u, pg_stat_get_wal_senders() w(procpid, state, sent_location, write_location, flush_location, replay_location, sync_priority, sync_state) WHERE ((s.usesysid = u.oid) AND (s.procpid = w.procpid));
#2Greg Smith
greg@2ndQuadrant.com
In reply to: Greg Smith (#1)
Re: Publish checkpoint timing and sync files summary data to pg_stat_bgwriter

On 01/16/2012 01:28 AM, Greg Smith wrote:

-I can't tell for sure if this is working properly when
log_checkpoints is off. This now collects checkpoint end time data in
all cases, whereas before it ignored that work if log_checkpoints was off.

...and there's at least one I missed located already: inside of md.c.
I'd forgotten how many spots where timing calls are optimized out are
floating around this code path.

--
Greg Smith 2ndQuadrant US greg@2ndQuadrant.com Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com

#3Hitoshi Harada
umi.tanuki@gmail.com
In reply to: Greg Smith (#2)
Re: Publish checkpoint timing and sync files summary data to pg_stat_bgwriter

On Sun, Jan 15, 2012 at 10:46 PM, Greg Smith <greg@2ndquadrant.com> wrote:

On 01/16/2012 01:28 AM, Greg Smith wrote:

-I can't tell for sure if this is working properly when log_checkpoints is
off.  This now collects checkpoint end time data in all cases, whereas
before it ignored that work if log_checkpoints was off.

...and there's at least one I missed located already:  inside of md.c.  I'd
forgotten how many spots where timing calls are optimized out are floating
around this code path.

I think CF app page for this patch has missing link with wrong message-id.

Thanks,
--
Hitoshi Harada

#4Alvaro Herrera
alvherre@commandprompt.com
In reply to: Hitoshi Harada (#3)
Re: Publish checkpoint timing and sync files summary data to pg_stat_bgwriter

Excerpts from Hitoshi Harada's message of jue ene 19 07:08:52 -0300 2012:

On Sun, Jan 15, 2012 at 10:46 PM, Greg Smith <greg@2ndquadrant.com> wrote:

On 01/16/2012 01:28 AM, Greg Smith wrote:

-I can't tell for sure if this is working properly when log_checkpoints is
off.  This now collects checkpoint end time data in all cases, whereas
before it ignored that work if log_checkpoints was off.

...and there's at least one I missed located already:  inside of md.c.  I'd
forgotten how many spots where timing calls are optimized out are floating
around this code path.

I think CF app page for this patch has missing link with wrong message-id.

Fixed, thanks

--
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

#5Robert Haas
robertmhaas@gmail.com
In reply to: Greg Smith (#1)
Re: Publish checkpoint timing and sync files summary data to pg_stat_bgwriter

On Mon, Jan 16, 2012 at 1:28 AM, Greg Smith <greg@2ndquadrant.com> wrote:

One of the most useful bits of feedback on how well checkpoint I/O is going
is the amount of time taken to sync files to disk.  Right now the only way
to get that is to parse the logs.  The attached patch publishes the most
useful three bits of data you could only get from log_checkpoints before out
to pg_stat_bgwriter.

It's not quite clear from your email, but I gather that the way that
this is intended to work is that these values increment every time we
checkpoint?

Also, forgive for asking this possibly-stupid question, but of what
use is this information? I can't imagine why I'd care about a running
total of the number of files fsync'd to disk. I also can't really
imagine why I'd care about the length of the write phase, which surely
will almost always be a function of checkpoint_completion_target and
checkpoint_timeout unless I manage to overrun the number of
checkpoint_segments I've allocated. The only number that really seems
useful to me is the time spent syncing. I have a clear idea what to
look for there: smaller numbers are better than bigger ones. For the
rest I'm mystified.

And, it doesn't seem like it's necessarily going to safe me a whole
lot either, because if it turns out that my sync phases are long, the
first question out of my mouth is going to be "what percentage of my
total sync time is accounted for by the longest sync?". And so right
there I'm back to the logs. It's not clear how such information could
be usefully exposed in pg_stat_bgwriter either, since you probably
want to know only the last few values, not a total over all time.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#6Simon Riggs
simon@2ndQuadrant.com
In reply to: Robert Haas (#5)
Re: Publish checkpoint timing and sync files summary data to pg_stat_bgwriter

On Thu, Jan 19, 2012 at 3:52 PM, Robert Haas <robertmhaas@gmail.com> wrote:

And, it doesn't seem like it's necessarily going to safe me a whole
lot either, because if it turns out that my sync phases are long, the
first question out of my mouth is going to be "what percentage of my
total sync time is accounted for by the longest sync?".  And so right
there I'm back to the logs.

It seems clear that the purpose of this is to quickly and easily
decide whether there is a potential problem.

If you decide there is a potential problem, you may wish to look at
more detailed information.

So there is a huge time saving from avoiding the need to look at the
detail when its unnecessary and possibly not even enabled.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

#7Greg Smith
greg@2ndQuadrant.com
In reply to: Robert Haas (#5)
Re: Publish checkpoint timing and sync files summary data to pg_stat_bgwriter

On 01/19/2012 10:52 AM, Robert Haas wrote:

It's not quite clear from your email, but I gather that the way that
this is intended to work is that these values increment every time we
checkpoint?

Right--they get updated in the same atomic bump that moves up things
like buffers_checkpoint

Also, forgive for asking this possibly-stupid question, but of what
use is this information? I can't imagine why I'd care about a running
total of the number of files fsync'd to disk. I also can't really
imagine why I'd care about the length of the write phase, which surely
will almost always be a function of checkpoint_completion_target and
checkpoint_timeout unless I manage to overrun the number of
checkpoint_segments I've allocated. The only number that really seems
useful to me is the time spent syncing. I have a clear idea what to
look for there: smaller numbers are better than bigger ones. For the
rest I'm mystified.

Priority #1 here is to reduce (but, admittedly, not always eliminate)
the need for log file parsing of this particular area, so including all
the major bits from the existing log message that can be published this
way would include the write phase time. You mentioned one reason why
the write phase time might be interesting; there could be others. One
of the things expected here is that Munin will expand its graphing of
values from pg_stat_bgwriter to include all these fields. Most of the
time the graph of time spent in the write phase will be boring and
useless. Making it easy for a look at a graph to spot those rare times
when it isn't is one motivation for including it.

As for why to include the number of files being sync'd, one reason is
again simply wanting to include everything that can easily be
published. A second is that it helps support ideas like my "Checkpoint
sync pause" one; that's untunable in any reasonable way without some
easy way of monitoring the number of files typically sync'd. Sometimes
when I'm investigating checkpoint spikes during sync, I wonder whether
they were because more files than usual were synced, or if it's instead
just because of more churn on a smaller number. Making this easy to
graph pulls that data out to where I can compare it with disk I/O
trends. And there's precedent now proving that an always incrementing
number in pg_stat_bgwriter can be turned into such a graph easily by
monitoring tools.

And, it doesn't seem like it's necessarily going to safe me a whole
lot either, because if it turns out that my sync phases are long, the
first question out of my mouth is going to be "what percentage of my
total sync time is accounted for by the longest sync?". And so right
there I'm back to the logs. It's not clear how such information could
be usefully exposed in pg_stat_bgwriter either, since you probably
want to know only the last few values, not a total over all time.

This isn't ideal yet. I mentioned how some future "performance event
logging history collector" was really needed as a place to push longest
sync times into, and we don't have it yet. This is the best thing to
instrument that I'm sure is useful, and that I can stick onto with the
existing infrastructure.

The idea is that this change makes it possible to trigger a "sync times
are too long" alert out of a tool that's based solely on database
queries. When that goes off, yes you're possibly back to the logs again
for more details about the longest individual sync time. But the rest
of the time, what's hopefully the normal state of things, you can ignore
the logs and just track the pg_stat_bgwriter numbers.

--
Greg Smith 2ndQuadrant US greg@2ndQuadrant.com Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com

#8Jeff Janes
jeff.janes@gmail.com
In reply to: Greg Smith (#1)
Re: Publish checkpoint timing and sync files summary data to pg_stat_bgwriter

On Sun, Jan 15, 2012 at 10:28 PM, Greg Smith <greg@2ndquadrant.com> wrote:

I would like to be able to tell people they need never turn on
log_checkpoints if they monitor pg_stat_bgwriter instead, because that sets
a good precedent for what direction the database is going.  It would be nice
for pg_stat_bgwriter's format to settle down for a long period too.

While on the topic of the format of pg_stat_bgwriter and it converging
on stability:

I'm finding the backend_writes column pretty unfortunate. The only
use I know of for it is to determine if the bgwriter is lagging
behind. Yet it doesn't serve even this purpose because it lumps
together the backend writes due to lagging background writes, and the
backend writes "by design" due to the use buffer access strategy
during bulk inserts.

If I'm running a tightly controlled benchmark on an otherwise unused
server and I know that no BAS is being used, then I can meaningfully
use backend_writes. That is a pretty limiting limit.

I think we should either create a separate column to segregate BAS
backend_writes, or just don't report them at all and report only the
non-BAS ones into pg_stat_bgwriter.

I know you've argued against this before
(http://archives.postgresql.org/pgsql-performance/2011-03/msg00340.php),
but I think the assumption that all writes are absorbed by the OS
without blocking is assuming an awful lot. If the server is not
behaving well, then it very well may be blocking on writes. If we are
confident that the OS can always efficiently cache writes, why bother
with a background writer at all?

I think the argument that it is not important to know which strategy
motivated a backend write could just as validly be reformulated as an
argument that we don't need to know how many backend writes there were
in the first place.

Knowing only an aggregation of BAS-motivated backend writes and
straggling-bgwriter-motivated backend writes just doesn't seem useful
to me. Am I missing a use for it? Not only is it probably not
useful, it seems to be actively harmful as I've seen several times
where it leads people down false paths (including me several times, as
about once a year I forget why it is useless and spend time
rediscovering it).

If we are not willing to separate the two types of backend writes, or
just omit the BAS ones altogether, then I would suggest we drop the
column entirely.

Would a patch to separate the backend writes into two columns be
entertained for 9.3?

Cheers,

Jeff

#9Greg Smith
greg@2ndquadrant.com
In reply to: Jeff Janes (#8)
Re: Publish checkpoint timing and sync files summary data to pg_stat_bgwriter

Jeff Janes wrote:

I'm finding the backend_writes column pretty unfortunate. The only
use I know of for it is to determine if the bgwriter is lagging
behind. Yet it doesn't serve even this purpose because it lumps
together the backend writes due to lagging background writes, and the
backend writes "by design" due to the use buffer access strategy
during bulk inserts.

If I'm running a tightly controlled benchmark on an otherwise unused
server and I know that no BAS is being used, then I can meaningfully
use backend_writes. That is a pretty limiting limit.

I don't think it's quite that bad in general; this presumes a moderate
amount of BAS writes relative to other activity. But I understand your
concern better now. I don't think the sorts of workloads you seem to
have a lot of were considered very carefully before here.

I think we should either create a separate column to segregate BAS
backend_writes, or just don't report them at all and report only the
non-BAS ones into pg_stat_bgwriter.

We can't not report them. One of the goals of pg_stat_bgwriter is to
account for all writes out of the buffer cache. If there enough BAS
writes on your system that them being lumped together is a serious
problem, having them go missing altogether would be even worse. And any
whacking around of pg_stat_bgwriter might as well fix that too.

Do you think you could put together a quick test case that's similar to
the ones you're seeing unfair accounting for here? This isn't quite
buggy behavior, but a solid example I could test against showing it's a
sketchy approach would be enough for me to incorporate a fix for it into
this suggested redesign.

#10Robert Haas
robertmhaas@gmail.com
In reply to: Jeff Janes (#8)
Re: Publish checkpoint timing and sync files summary data to pg_stat_bgwriter

On Sat, Jan 21, 2012 at 6:32 PM, Jeff Janes <jeff.janes@gmail.com> wrote:

I'm finding the backend_writes column pretty unfortunate.  The only
use I know of for it is to determine if the bgwriter is lagging
behind.  Yet it doesn't serve even this purpose because it lumps
together the backend writes due to lagging background writes, and the
backend writes "by design" due to the use buffer access strategy
during bulk inserts.

+1 for separating those.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#11Greg Smith
greg@2ndquadrant.com
In reply to: Robert Haas (#10)
Re: Publish checkpoint timing and sync files summary data to pg_stat_bgwriter

Robert Haas wrote:

On Sat, Jan 21, 2012 at 6:32 PM, Jeff Janes <jeff.janes@gmail.com> wrote:

I'm finding the backend_writes column pretty unfortunate. The only
use I know of for it is to determine if the bgwriter is lagging
behind. Yet it doesn't serve even this purpose because it lumps
together the backend writes due to lagging background writes, and the
backend writes "by design" due to the use buffer access strategy
during bulk inserts.

+1 for separating those.

I'm tied up with some on-site things until Friday, can rev the patch
with this in mind (and clear some of my overall review work pile) then.

--
Greg Smith 2ndQuadrant US greg@2ndQuadrant.com Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com

#12Peter Geoghegan
peter@2ndquadrant.com
In reply to: Greg Smith (#1)
1 attachment(s)
Re: Publish checkpoint timing and sync files summary data to pg_stat_bgwriter

On 16 January 2012 06:28, Greg Smith <greg@2ndquadrant.com> wrote:

One of the most useful bits of feedback on how well checkpoint I/O is going
is the amount of time taken to sync files to disk.  Right now the only way
to get that is to parse the logs.  The attached patch publishes the most
useful three bits of data you could only get from log_checkpoints before out
to pg_stat_bgwriter.  Easiest to just show an example:

This is a preliminary review of the patch. Attached is a revision of
the patch that clears up some minor bit-rot and style issues, just as
a convenience to everyone else who might like to try it out against
master who doesn't want to repeat my work. There were oid collisions
against master, and an upstream change in the pg_proc format with the
introduction of the "proleakproof" column. Regression tests pass (as
they must have in the prior revision, as the original authors modified
"expected/rules.out"). Note that I have deliberately avoided changing
the patch further than that, though I believe that further changes
will be necessary.

One beef that I have with the variable name m_write_ms is that "ms"
could equally well refer to microseconds or milliseconds, and these
mistakes are very common. Therefore, you might think about making the
fields of type "instr_time", which is consistent with some other such
fields used within the statistics collector - this takes care of
issues with overflows, portability and confusion over units, to as
great an extent as is possible. I'm not even confident that this is
the right thing though, because PgStat_StatFuncEntry, for example,
*does* use PgStat_Countuer as the type for time values, in its case
apparently *microseconds* rather than milliseconds, even though the
pg_stat_user_functions view displays values in millisecond (it's
incorrectly commented). Even the use of instr_time within some
pgstat.h structs sees the value converted to microseconds for
transmission to the collector. It might be that since we don't really
accumulate time values here, it's better to just directly represent
that value as PgStat_Counter, but in that case I'd still clearly mark
the value as being milliseconds. In any case, I would have the columns
in milliseconds, but not ending with *_ms within pg_stat_bgwriter,
for consistency with pg_stat_user_functions et al.

There is no reason to cast to uint64 when accumulating milliseconds,
as the lvalue is currently a PgStat_Counter.

Details of existing comment bug: The docs show that
pg_stat_*_functions accumulates time in units of milliseconds.
However, a PgStat_FunctionEntry payload is commented as sending the
time/self time to the stats collector in microseconds. So this
comment, in the existing code, is actually wrong:

PgStat_Counter f_time; /* times in microseconds */
PgStat_Counter f_time_self;
} PgStat_StatFuncEntry;

Can someone commit a fix for that please? It could easily result in a
user-visible bug. Mixing units, and representing them as simple
integers and totally eschewing type safety is sort of a pet peeve of
mine, but consistency with the existing code is more important.

My general impression of the patch's code quality is that it is more
or less idomatic, and, as Greg said, simple; it generalises from the
style of existing code in LogCheckpointEnd(), as well as existing
fields within the PgStat_MsgBgWriter and PgStat_GlobalStats structs .

As for the substance of the patch, I am in general agreement that this
is a good idea. Storing the statistics in pg_stat_bgwriter is a more
flexible approach than is immediately apparent. Users can usefully
calculate delta values, as part of the same process through which
checkpoint tuning is performed by comparing output from "select now(),
* from pg_stat_bgwriter" at different intervals. This also puts this
information within easy reach of monitoring tools.

So, I'd ask Greg and/or Jaime to produce a revision of the patch with
those concerns in mind, as well as fixing the md.c usage of
log_checkpoints. That variable is currently used in these places only:

src/backend/access/transam/xlog.c
80:bool log_checkpoints = false;
7694: * Note: because it is possible for log_checkpoints to change while a
7696: * log_checkpoints is currently off.
7835: if (log_checkpoints)
8035: if (log_checkpoints)
8229: * Note: because it is possible for log_checkpoints to change while a
8231: * log_checkpoints is currently off.
8236: if (log_checkpoints)
8301: if (log_checkpoints)
8305: ereport((log_checkpoints ? LOG : DEBUG2),

src/backend/storage/smgr/md.c
1097: if (log_checkpoints)
1105: if (log_checkpoints && (!INSTR_TIME_IS_ZERO(sync_start)))

I also agree with separating backend write times due to lagging
background writes, and the backend writes "by design", as described by
Jeff. I'm not inclined to worry too much about the usability issues
surrounding the relative values of things like the total checkpoint
time and total write and sync phase times matching up.

What do you intend to do about testing the log_checkpoints = 'off'
case? Can I help in some way? It seems pretty obvious to me that the
changes made are correct, apart from the fact that you omitted to
modify md.c . In particular, there are no pre or post conditions
involved with calling TimeCheckpointEnd() (formerly LogCheckpointEnd()
). This is evident from the following comment at transam/xlog.c:7709:

* Note: because it is possible for log_checkpoints to change while a
* checkpoint proceeds, we always accumulate stats, even if
* log_checkpoints is currently off.

It seems fairly silly to me to not expose this information if it's
invariably being collected anyway.

Cheers
--
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services

Attachments:

publish-sync-stats-v2.patchtext/x-patch; charset=US-ASCII; name=publish-sync-stats-v2.patchDownload
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
new file mode 100644
index cb13c8e..56a3ca4
*** a/doc/src/sgml/monitoring.sgml
--- b/doc/src/sgml/monitoring.sgml
*************** postgres: <replaceable>user</> <replacea
*** 1409,1414 ****
--- 1409,1442 ----
       </row>
  
       <row>
+       <entry><literal><function>pg_stat_get_bgwriter_write_ms()</function></literal></entry>
+       <entry><type>bigint</type></entry>
+       <entry>
+        Total amount of time that has been spent in the part of checkpoint
+        processing where files are written to disk, in milliseconds.
+       </entry>
+      </row>
+ 
+ 
+      <row>
+       <entry><literal><function>pg_stat_get_bgwriter_sync_ms()</function></literal></entry>
+       <entry><type>bigint</type></entry>
+       <entry>
+        Total amount of time that has been spent in the part of checkpoint
+        processing where files are synchronized to disk, in milliseconds.
+       </entry>
+      </row>
+ 
+      <row>
+       <entry><literal><function>pg_stat_get_bgwriter_sync_files()</function></literal></entry>
+       <entry><type>bigint</type></entry>
+       <entry>
+        Total number of files that have been synchronized to disk during
+        checkpoint processing.
+       </entry>
+      </row>
+ 
+      <row>
        <entry><literal><function>pg_stat_get_wal_senders()</function></literal></entry>
        <entry><type>setof record</type></entry>
        <entry>
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
new file mode 100644
index 266c0de..23f7cf4
*** a/src/backend/access/transam/xlog.c
--- b/src/backend/access/transam/xlog.c
*************** LogCheckpointStart(int flags, bool resta
*** 7552,7561 ****
  }
  
  /*
!  * Log end of a checkpoint.
   */
  static void
! LogCheckpointEnd(bool restartpoint)
  {
  	long		write_secs,
  				sync_secs,
--- 7552,7561 ----
  }
  
  /*
!  * Time and potentially log the end of a checkpoint.
   */
  static void
! TimeCheckpointEnd(bool restartpoint)
  {
  	long		write_secs,
  				sync_secs,
*************** LogCheckpointEnd(bool restartpoint)
*** 7583,7588 ****
--- 7583,7599 ----
  						CheckpointStats.ckpt_sync_end_t,
  						&sync_secs, &sync_usecs);
  
+ 	BgWriterStats.m_write_ms = (uint64) write_secs * 1000 + write_usecs / 1000;
+ 	BgWriterStats.m_sync_ms = (uint64) sync_secs * 1000 + sync_usecs / 1000;
+ 	BgWriterStats.m_sync_files = CheckpointStats.ckpt_sync_rels;
+ 
+ 	/*
+ 	 * All of the published timing statistics are accounted for.  Only
+ 	 * continue if a log message is being written.
+ 	 */
+ 	if (!log_checkpoints)
+ 		return;
+ 
  	/*
  	 * Timing values returned from CheckpointStats are in microseconds.
  	 * Convert to the second plus microsecond form that TimestampDifference
*************** CreateCheckPoint(int flags)
*** 8032,8039 ****
  		TruncateSUBTRANS(GetOldestXmin(true, false));
  
  	/* All real work is done, but log before releasing lock. */
! 	if (log_checkpoints)
! 		LogCheckpointEnd(false);
  
  	TRACE_POSTGRESQL_CHECKPOINT_DONE(CheckpointStats.ckpt_bufs_written,
  									 NBuffers,
--- 8043,8049 ----
  		TruncateSUBTRANS(GetOldestXmin(true, false));
  
  	/* All real work is done, but log before releasing lock. */
! 	TimeCheckpointEnd(false);
  
  	TRACE_POSTGRESQL_CHECKPOINT_DONE(CheckpointStats.ckpt_bufs_written,
  									 NBuffers,
*************** CreateRestartPoint(int flags)
*** 8298,8305 ****
  		TruncateSUBTRANS(GetOldestXmin(true, false));
  
  	/* All real work is done, but log before releasing lock. */
! 	if (log_checkpoints)
! 		LogCheckpointEnd(true);
  
  	xtime = GetLatestXTime();
  	ereport((log_checkpoints ? LOG : DEBUG2),
--- 8308,8314 ----
  		TruncateSUBTRANS(GetOldestXmin(true, false));
  
  	/* All real work is done, but log before releasing lock. */
! 	TimeCheckpointEnd(true);
  
  	xtime = GetLatestXTime();
  	ereport((log_checkpoints ? LOG : DEBUG2),
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
new file mode 100644
index 30b0bd0..3a5e1cd
*** a/src/backend/catalog/system_views.sql
--- b/src/backend/catalog/system_views.sql
*************** CREATE VIEW pg_stat_bgwriter AS
*** 627,633 ****
          pg_stat_get_buf_written_backend() AS buffers_backend,
          pg_stat_get_buf_fsync_backend() AS buffers_backend_fsync,
          pg_stat_get_buf_alloc() AS buffers_alloc,
!         pg_stat_get_bgwriter_stat_reset_time() AS stats_reset;
  
  CREATE VIEW pg_user_mappings AS
      SELECT
--- 627,636 ----
          pg_stat_get_buf_written_backend() AS buffers_backend,
          pg_stat_get_buf_fsync_backend() AS buffers_backend_fsync,
          pg_stat_get_buf_alloc() AS buffers_alloc,
!         pg_stat_get_bgwriter_stat_reset_time() AS stats_reset,
!         pg_stat_get_bgwriter_write_ms() AS write_ms,
!         pg_stat_get_bgwriter_sync_ms() AS sync_ms,
!         pg_stat_get_bgwriter_sync_files() AS sync_files;
  
  CREATE VIEW pg_user_mappings AS
      SELECT
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
new file mode 100644
index a53fc52..ab58334
*** a/src/backend/postmaster/pgstat.c
--- b/src/backend/postmaster/pgstat.c
*************** pgstat_recv_bgwriter(PgStat_MsgBgWriter
*** 4449,4454 ****
--- 4449,4457 ----
  	globalStats.buf_written_backend += msg->m_buf_written_backend;
  	globalStats.buf_fsync_backend += msg->m_buf_fsync_backend;
  	globalStats.buf_alloc += msg->m_buf_alloc;
+ 	globalStats.write_ms += msg->m_write_ms;
+ 	globalStats.sync_ms += msg->m_sync_ms;
+ 	globalStats.sync_files += msg->m_sync_files;
  }
  
  /* ----------
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
new file mode 100644
index 68b2527..e71d641
*** a/src/backend/utils/adt/pgstatfuncs.c
--- b/src/backend/utils/adt/pgstatfuncs.c
*************** extern Datum pg_stat_get_bgwriter_buf_wr
*** 89,94 ****
--- 89,97 ----
  extern Datum pg_stat_get_bgwriter_buf_written_clean(PG_FUNCTION_ARGS);
  extern Datum pg_stat_get_bgwriter_maxwritten_clean(PG_FUNCTION_ARGS);
  extern Datum pg_stat_get_bgwriter_stat_reset_time(PG_FUNCTION_ARGS);
+ extern Datum pg_stat_get_bgwriter_write_ms(PG_FUNCTION_ARGS);
+ extern Datum pg_stat_get_bgwriter_sync_ms(PG_FUNCTION_ARGS);
+ extern Datum pg_stat_get_bgwriter_sync_files(PG_FUNCTION_ARGS);
  extern Datum pg_stat_get_buf_written_backend(PG_FUNCTION_ARGS);
  extern Datum pg_stat_get_buf_fsync_backend(PG_FUNCTION_ARGS);
  extern Datum pg_stat_get_buf_alloc(PG_FUNCTION_ARGS);
*************** pg_stat_get_bgwriter_maxwritten_clean(PG
*** 1388,1393 ****
--- 1391,1414 ----
  }
  
  Datum
+ pg_stat_get_bgwriter_write_ms(PG_FUNCTION_ARGS)
+ {
+ 	PG_RETURN_INT64(pgstat_fetch_global()->write_ms);
+ }
+ 
+ Datum
+ pg_stat_get_bgwriter_sync_ms(PG_FUNCTION_ARGS)
+ {
+ 	PG_RETURN_INT64(pgstat_fetch_global()->sync_ms);
+ }
+ 
+ Datum
+ pg_stat_get_bgwriter_sync_files(PG_FUNCTION_ARGS)
+ {
+ 	PG_RETURN_INT64(pgstat_fetch_global()->sync_files);
+ }
+ 
+ Datum
  pg_stat_get_bgwriter_stat_reset_time(PG_FUNCTION_ARGS)
  {
  	PG_RETURN_TIMESTAMPTZ(pgstat_fetch_global()->stat_reset_timestamp);
diff --git a/src/include/catalog/pg_proc.h b/src/include/catalog/pg_proc.h
new file mode 100644
index 8700d0d..2be0e63
*** a/src/include/catalog/pg_proc.h
--- b/src/include/catalog/pg_proc.h
*************** DATA(insert OID = 2773 ( pg_stat_get_bgw
*** 2666,2671 ****
--- 2666,2677 ----
  DESCR("statistics: number of times the bgwriter stopped processing when it had written too many buffers while cleaning");
  DATA(insert OID = 3075 ( pg_stat_get_bgwriter_stat_reset_time PGNSP PGUID 12 1 0 0 0 f f f f t f s 0 0 1184 "" _null_ _null_ _null_ _null_	pg_stat_get_bgwriter_stat_reset_time _null_ _null_ _null_ ));
  DESCR("statistics: last reset for the bgwriter");
+ DATA(insert OID = 3160 ( pg_stat_get_bgwriter_write_ms PGNSP PGUID 12 1 0 0 0 f f f f t f s 0 0 20 "" _null_ _null_ _null_ _null_ pg_stat_get_bgwriter_write_ms _null_ _null_ _null_ ));
+ DESCR("statistics: total amount of time the bgwriter has spent writing buffers to disk");
+ DATA(insert OID = 3161 ( pg_stat_get_bgwriter_sync_ms PGNSP PGUID 12 1 0 0 0 f f f f t f s 0 0 20 "" _null_ _null_ _null_ _null_ pg_stat_get_bgwriter_sync_ms _null_ _null_ _null_ ));
+ DESCR("statistics: total amount of time the bgwriter has spent syncing buffers to disk");
+ DATA(insert OID = 3162 ( pg_stat_get_bgwriter_sync_files PGNSP PGUID 12 1 0 0 0 f f f f t f s 0 0 20 "" _null_ _null_ _null_ _null_ pg_stat_get_bgwriter_sync_files _null_ _null_ _null_ ));
+ DESCR("statistics: total number of files the bgwriter has synced to disk");
  DATA(insert OID = 2775 ( pg_stat_get_buf_written_backend PGNSP PGUID 12 1 0 0 0 f f f f t f s 0 0 20 "" _null_ _null_ _null_ _null_ pg_stat_get_buf_written_backend _null_ _null_ _null_ ));
  DESCR("statistics: number of buffers written by backends");
  DATA(insert OID = 3063 ( pg_stat_get_buf_fsync_backend PGNSP PGUID 12 1 0 0 0 f f f f t f s 0 0 20 "" _null_ _null_ _null_ _null_ pg_stat_get_buf_fsync_backend _null_ _null_ _null_ ));
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
new file mode 100644
index 1281bd8..983013b
*** a/src/include/pgstat.h
--- b/src/include/pgstat.h
*************** typedef struct PgStat_MsgBgWriter
*** 364,369 ****
--- 364,372 ----
  	PgStat_Counter m_buf_written_backend;
  	PgStat_Counter m_buf_fsync_backend;
  	PgStat_Counter m_buf_alloc;
+ 	PgStat_Counter m_write_ms;
+ 	PgStat_Counter m_sync_ms;
+ 	PgStat_Counter m_sync_files;
  } PgStat_MsgBgWriter;
  
  /* ----------
*************** typedef struct PgStat_GlobalStats
*** 614,619 ****
--- 617,625 ----
  	PgStat_Counter buf_written_backend;
  	PgStat_Counter buf_fsync_backend;
  	PgStat_Counter buf_alloc;
+ 	PgStat_Counter write_ms;
+ 	PgStat_Counter sync_ms;
+ 	PgStat_Counter sync_files;
  	TimestampTz stat_reset_timestamp;
  } PgStat_GlobalStats;
  
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
new file mode 100644
index 0275a0e..e665052
*** a/src/test/regress/expected/rules.out
--- b/src/test/regress/expected/rules.out
*************** SELECT viewname, definition FROM pg_view
*** 1295,1301 ****
   pg_stat_activity                | SELECT s.datid, d.datname, s.pid, s.usesysid, u.rolname AS usename, s.application_name, s.client_addr, s.client_hostname, s.client_port, s.backend_start, s.xact_start, s.query_start, s.state_change, s.waiting, s.state, s.query FROM pg_database d, pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, waiting, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port), pg_authid u WHERE ((s.datid = d.oid) AND (s.usesysid = u.oid));
   pg_stat_all_indexes             | SELECT c.oid AS relid, i.oid AS indexrelid, n.nspname AS schemaname, c.relname, i.relname AS indexrelname, pg_stat_get_numscans(i.oid) AS idx_scan, pg_stat_get_tuples_returned(i.oid) AS idx_tup_read, pg_stat_get_tuples_fetched(i.oid) AS idx_tup_fetch FROM (((pg_class c JOIN pg_index x ON ((c.oid = x.indrelid))) JOIN pg_class i ON ((i.oid = x.indexrelid))) LEFT JOIN pg_namespace n ON ((n.oid = c.relnamespace))) WHERE (c.relkind = ANY (ARRAY['r'::"char", 't'::"char"]));
   pg_stat_all_tables              | SELECT c.oid AS relid, n.nspname AS schemaname, c.relname, pg_stat_get_numscans(c.oid) AS seq_scan, pg_stat_get_tuples_returned(c.oid) AS seq_tup_read, (sum(pg_stat_get_numscans(i.indexrelid)))::bigint AS idx_scan, ((sum(pg_stat_get_tuples_fetched(i.indexrelid)))::bigint + pg_stat_get_tuples_fetched(c.oid)) AS idx_tup_fetch, pg_stat_get_tuples_inserted(c.oid) AS n_tup_ins, pg_stat_get_tuples_updated(c.oid) AS n_tup_upd, pg_stat_get_tuples_deleted(c.oid) AS n_tup_del, pg_stat_get_tuples_hot_updated(c.oid) AS n_tup_hot_upd, pg_stat_get_live_tuples(c.oid) AS n_live_tup, pg_stat_get_dead_tuples(c.oid) AS n_dead_tup, pg_stat_get_last_vacuum_time(c.oid) AS last_vacuum, pg_stat_get_last_autovacuum_time(c.oid) AS last_autovacuum, pg_stat_get_last_analyze_time(c.oid) AS last_analyze, pg_stat_get_last_autoanalyze_time(c.oid) AS last_autoanalyze, pg_stat_get_vacuum_count(c.oid) AS vacuum_count, pg_stat_get_autovacuum_count(c.oid) AS autovacuum_count, pg_stat_get_analyze_count(c.oid) AS analyze_count, pg_stat_get_autoanalyze_count(c.oid) AS autoanalyze_count FROM ((pg_class c LEFT JOIN pg_index i ON ((c.oid = i.indrelid))) LEFT JOIN pg_namespace n ON ((n.oid = c.relnamespace))) WHERE (c.relkind = ANY (ARRAY['r'::"char", 't'::"char"])) GROUP BY c.oid, n.nspname, c.relname;
!  pg_stat_bgwriter                | SELECT pg_stat_get_bgwriter_timed_checkpoints() AS checkpoints_timed, pg_stat_get_bgwriter_requested_checkpoints() AS checkpoints_req, pg_stat_get_bgwriter_buf_written_checkpoints() AS buffers_checkpoint, pg_stat_get_bgwriter_buf_written_clean() AS buffers_clean, pg_stat_get_bgwriter_maxwritten_clean() AS maxwritten_clean, pg_stat_get_buf_written_backend() AS buffers_backend, pg_stat_get_buf_fsync_backend() AS buffers_backend_fsync, pg_stat_get_buf_alloc() AS buffers_alloc, pg_stat_get_bgwriter_stat_reset_time() AS stats_reset;
   pg_stat_database                | SELECT d.oid AS datid, d.datname, pg_stat_get_db_numbackends(d.oid) AS numbackends, pg_stat_get_db_xact_commit(d.oid) AS xact_commit, pg_stat_get_db_xact_rollback(d.oid) AS xact_rollback, (pg_stat_get_db_blocks_fetched(d.oid) - pg_stat_get_db_blocks_hit(d.oid)) AS blks_read, pg_stat_get_db_blocks_hit(d.oid) AS blks_hit, pg_stat_get_db_tuples_returned(d.oid) AS tup_returned, pg_stat_get_db_tuples_fetched(d.oid) AS tup_fetched, pg_stat_get_db_tuples_inserted(d.oid) AS tup_inserted, pg_stat_get_db_tuples_updated(d.oid) AS tup_updated, pg_stat_get_db_tuples_deleted(d.oid) AS tup_deleted, pg_stat_get_db_conflict_all(d.oid) AS conflicts, pg_stat_get_db_temp_files(d.oid) AS temp_files, pg_stat_get_db_temp_bytes(d.oid) AS temp_bytes, pg_stat_get_db_deadlocks(d.oid) AS deadlocks, pg_stat_get_db_stat_reset_time(d.oid) AS stats_reset FROM pg_database d;
   pg_stat_database_conflicts      | SELECT d.oid AS datid, d.datname, pg_stat_get_db_conflict_tablespace(d.oid) AS confl_tablespace, pg_stat_get_db_conflict_lock(d.oid) AS confl_lock, pg_stat_get_db_conflict_snapshot(d.oid) AS confl_snapshot, pg_stat_get_db_conflict_bufferpin(d.oid) AS confl_bufferpin, pg_stat_get_db_conflict_startup_deadlock(d.oid) AS confl_deadlock FROM pg_database d;
   pg_stat_replication             | SELECT s.pid, s.usesysid, u.rolname AS usename, s.application_name, s.client_addr, s.client_hostname, s.client_port, s.backend_start, w.state, w.sent_location, w.write_location, w.flush_location, w.replay_location, w.sync_priority, w.sync_state FROM pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, waiting, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port), pg_authid u, pg_stat_get_wal_senders() w(pid, state, sent_location, write_location, flush_location, replay_location, sync_priority, sync_state) WHERE ((s.usesysid = u.oid) AND (s.pid = w.pid));
--- 1295,1301 ----
   pg_stat_activity                | SELECT s.datid, d.datname, s.pid, s.usesysid, u.rolname AS usename, s.application_name, s.client_addr, s.client_hostname, s.client_port, s.backend_start, s.xact_start, s.query_start, s.state_change, s.waiting, s.state, s.query FROM pg_database d, pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, waiting, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port), pg_authid u WHERE ((s.datid = d.oid) AND (s.usesysid = u.oid));
   pg_stat_all_indexes             | SELECT c.oid AS relid, i.oid AS indexrelid, n.nspname AS schemaname, c.relname, i.relname AS indexrelname, pg_stat_get_numscans(i.oid) AS idx_scan, pg_stat_get_tuples_returned(i.oid) AS idx_tup_read, pg_stat_get_tuples_fetched(i.oid) AS idx_tup_fetch FROM (((pg_class c JOIN pg_index x ON ((c.oid = x.indrelid))) JOIN pg_class i ON ((i.oid = x.indexrelid))) LEFT JOIN pg_namespace n ON ((n.oid = c.relnamespace))) WHERE (c.relkind = ANY (ARRAY['r'::"char", 't'::"char"]));
   pg_stat_all_tables              | SELECT c.oid AS relid, n.nspname AS schemaname, c.relname, pg_stat_get_numscans(c.oid) AS seq_scan, pg_stat_get_tuples_returned(c.oid) AS seq_tup_read, (sum(pg_stat_get_numscans(i.indexrelid)))::bigint AS idx_scan, ((sum(pg_stat_get_tuples_fetched(i.indexrelid)))::bigint + pg_stat_get_tuples_fetched(c.oid)) AS idx_tup_fetch, pg_stat_get_tuples_inserted(c.oid) AS n_tup_ins, pg_stat_get_tuples_updated(c.oid) AS n_tup_upd, pg_stat_get_tuples_deleted(c.oid) AS n_tup_del, pg_stat_get_tuples_hot_updated(c.oid) AS n_tup_hot_upd, pg_stat_get_live_tuples(c.oid) AS n_live_tup, pg_stat_get_dead_tuples(c.oid) AS n_dead_tup, pg_stat_get_last_vacuum_time(c.oid) AS last_vacuum, pg_stat_get_last_autovacuum_time(c.oid) AS last_autovacuum, pg_stat_get_last_analyze_time(c.oid) AS last_analyze, pg_stat_get_last_autoanalyze_time(c.oid) AS last_autoanalyze, pg_stat_get_vacuum_count(c.oid) AS vacuum_count, pg_stat_get_autovacuum_count(c.oid) AS autovacuum_count, pg_stat_get_analyze_count(c.oid) AS analyze_count, pg_stat_get_autoanalyze_count(c.oid) AS autoanalyze_count FROM ((pg_class c LEFT JOIN pg_index i ON ((c.oid = i.indrelid))) LEFT JOIN pg_namespace n ON ((n.oid = c.relnamespace))) WHERE (c.relkind = ANY (ARRAY['r'::"char", 't'::"char"])) GROUP BY c.oid, n.nspname, c.relname;
!  pg_stat_bgwriter                | SELECT pg_stat_get_bgwriter_timed_checkpoints() AS checkpoints_timed, pg_stat_get_bgwriter_requested_checkpoints() AS checkpoints_req, pg_stat_get_bgwriter_buf_written_checkpoints() AS buffers_checkpoint, pg_stat_get_bgwriter_buf_written_clean() AS buffers_clean, pg_stat_get_bgwriter_maxwritten_clean() AS maxwritten_clean, pg_stat_get_buf_written_backend() AS buffers_backend, pg_stat_get_buf_fsync_backend() AS buffers_backend_fsync, pg_stat_get_buf_alloc() AS buffers_alloc, pg_stat_get_bgwriter_stat_reset_time() AS stats_reset, pg_stat_get_bgwriter_write_ms() AS write_ms, pg_stat_get_bgwriter_sync_ms() AS sync_ms, pg_stat_get_bgwriter_sync_files() AS sync_files;
   pg_stat_database                | SELECT d.oid AS datid, d.datname, pg_stat_get_db_numbackends(d.oid) AS numbackends, pg_stat_get_db_xact_commit(d.oid) AS xact_commit, pg_stat_get_db_xact_rollback(d.oid) AS xact_rollback, (pg_stat_get_db_blocks_fetched(d.oid) - pg_stat_get_db_blocks_hit(d.oid)) AS blks_read, pg_stat_get_db_blocks_hit(d.oid) AS blks_hit, pg_stat_get_db_tuples_returned(d.oid) AS tup_returned, pg_stat_get_db_tuples_fetched(d.oid) AS tup_fetched, pg_stat_get_db_tuples_inserted(d.oid) AS tup_inserted, pg_stat_get_db_tuples_updated(d.oid) AS tup_updated, pg_stat_get_db_tuples_deleted(d.oid) AS tup_deleted, pg_stat_get_db_conflict_all(d.oid) AS conflicts, pg_stat_get_db_temp_files(d.oid) AS temp_files, pg_stat_get_db_temp_bytes(d.oid) AS temp_bytes, pg_stat_get_db_deadlocks(d.oid) AS deadlocks, pg_stat_get_db_stat_reset_time(d.oid) AS stats_reset FROM pg_database d;
   pg_stat_database_conflicts      | SELECT d.oid AS datid, d.datname, pg_stat_get_db_conflict_tablespace(d.oid) AS confl_tablespace, pg_stat_get_db_conflict_lock(d.oid) AS confl_lock, pg_stat_get_db_conflict_snapshot(d.oid) AS confl_snapshot, pg_stat_get_db_conflict_bufferpin(d.oid) AS confl_bufferpin, pg_stat_get_db_conflict_startup_deadlock(d.oid) AS confl_deadlock FROM pg_database d;
   pg_stat_replication             | SELECT s.pid, s.usesysid, u.rolname AS usename, s.application_name, s.client_addr, s.client_hostname, s.client_port, s.backend_start, w.state, w.sent_location, w.write_location, w.flush_location, w.replay_location, w.sync_priority, w.sync_state FROM pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, waiting, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port), pg_authid u, pg_stat_get_wal_senders() w(pid, state, sent_location, write_location, flush_location, replay_location, sync_priority, sync_state) WHERE ((s.usesysid = u.oid) AND (s.pid = w.pid));
#13Robert Haas
robertmhaas@gmail.com
In reply to: Peter Geoghegan (#12)
Re: Publish checkpoint timing and sync files summary data to pg_stat_bgwriter

On Wed, Feb 22, 2012 at 2:11 PM, Peter Geoghegan <peter@2ndquadrant.com> wrote:

One beef that I have with the variable name m_write_ms is that "ms"
could equally well refer to microseconds or milliseconds, and these
mistakes are very common.

I would expect ms to mean milliseconds and us to mean microseconds.

Details of existing comment bug: The docs show that
pg_stat_*_functions accumulates time in units of milliseconds.
However, a PgStat_FunctionEntry payload is commented as sending the
time/self time to the stats collector in microseconds. So this
comment, in the existing code, is actually wrong:

       PgStat_Counter f_time;          /* times in microseconds */
       PgStat_Counter f_time_self;
} PgStat_StatFuncEntry;

I think that the comment is not wrong. The code that populates it
looks like this:

m_ent->f_time = INSTR_TIME_GET_MICROSEC(entry->f_counts.f_time);
m_ent->f_time_self = INSTR_TIME_GET_MICROSEC(entry->f_counts.f_time_self

If that's not producing microseconds, those macros are badly misnamed.
Note that the pg_stat_user_functions view divides the return value of
the function by 1000, which is why the view output is in milliseconds.

As for the substance of the patch, I am in general agreement that this
is a good idea. Storing the statistics in pg_stat_bgwriter is a more
flexible approach than is immediately apparent.  Users can usefully
calculate delta values, as part of the same process through which
checkpoint tuning is performed by comparing output from "select now(),
* from pg_stat_bgwriter" at different intervals. This also puts this
information within easy reach of monitoring tools.

On further thought, I'll revise the position I took upthread and
concede that write_ms and sync_ms are useful. Given values of the
stats counters at time A and time B, you can calculate what percentage
of that time was spent in the write phase of some checkpoint, the sync
phase of some checkpoint, and not in any checkpoint. But I'm still
pretty sketchy on sync_files. I can't see how you can do anything
useful with that.

So, I'd ask Greg and/or Jaime to produce a revision of the patch with
those concerns in mind, as well as fixing the md.c usage of
log_checkpoints.

Seems we still are waiting for an update of this patch.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#14Peter Geoghegan
peter@2ndquadrant.com
In reply to: Robert Haas (#10)
1 attachment(s)
Re: Publish checkpoint timing and sync files summary data to pg_stat_bgwriter

On 23 January 2012 02:08, Robert Haas <robertmhaas@gmail.com> wrote:

On Sat, Jan 21, 2012 at 6:32 PM, Jeff Janes <jeff.janes@gmail.com> wrote:

I'm finding the backend_writes column pretty unfortunate.  The only
use I know of for it is to determine if the bgwriter is lagging
behind.  Yet it doesn't serve even this purpose because it lumps
together the backend writes due to lagging background writes, and the
backend writes "by design" due to the use buffer access strategy
during bulk inserts.

+1 for separating those.

I decided to have a go myself. Attached patch breaks out strategy
allocations in pg_stat_bgwriter, but not strategy writes. My thinking
is that this may serve to approximate non-BAS_NORMAL writes, with the
considerable advantage of not requiring that I work backwards to
figure out strategy from some block when backend-local syncing (yes,
syncing, not writing) a buffer to work out which strategy object
references the buffer. The bookkeeping that that would likely entail
seems to make it infeasible.

Incidentally, it seems Postgres doesn't currently record backend
writes when the buffer doesn't go on to be sync'd. That seems
problematic to me, or at the very least a misrepresentation, since
temporary tables will be written out by the backend for example. Not
sure if it's worth fixing, though I've added a comment to that effect
at the site of where backend_writes is bumped.

I have corrected the md.c bug. This previously would have prevented
the sync_files (number of relation segments synced) value from being
valid in non-log_checkpoints configurations.

I'm not currently confident that the strategy_alloc filed is a very
useful proxy for a strategy_backend_writes field. I think that rather
than bumping the strategy allocation count analogously to the way the
overall count is bumped (within StrategyGetBuffer()), I should have
bumped earlier within BufferAlloc() so that it'd count if the buffer
was requested with non-BAS_NORMAL strategy but was found in
shared_buffers (so control never got as far as StrategyGetBuffer() ).
That might make the value more closely line-up to the would-be value
of a strategy_backend_writes column. What do you think?

--
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services

Attachments:

publish-sync-stats-v3.patchtext/x-patch; charset=US-ASCII; name=publish-sync-stats-v3.patchDownload
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
new file mode 100644
index 840e54a..f714cb8
*** a/doc/src/sgml/monitoring.sgml
--- b/doc/src/sgml/monitoring.sgml
*************** SELECT pg_stat_get_backend_pid(s.backend
*** 806,811 ****
--- 806,818 ----
       the <function>pg_stat_get_buf_alloc</function> function.</entry>
       </row>
       <row>
+       <entry>buffers_strat_alloc</entry>
+       <entry><type>bigint</type></entry>
+       <entry>Number of buffers allocated with non-default buffer access strategy.
+      This value can also be returned by directly calling
+      the <function>pg_stat_get_buf_strat_alloc</function> function.</entry>
+      </row>
+      <row>
        <entry>stats_reset</entry>
        <entry><type>bigint</type></entry>
        <entry>The last time these statistics were reset.
*************** SELECT pg_stat_get_backend_pid(s.backend
*** 1703,1708 ****
--- 1710,1741 ----
        </entry>
       </row>
  
+      <row>
+       <entry><literal><function>pg_stat_get_bgwriter_write_time()</function></literal></entry>
+       <entry><type>bigint</type></entry>
+       <entry>
+ 	   Total amount of time that has been spent in the part of checkpoint
+ 	   processing where files are written to disk, in milliseconds.
+       </entry>
+      </row>
+ 
+      <row>
+       <entry><literal><function>pg_stat_get_bgwriter_sync_time()</function></literal></entry>
+       <entry><type>bigint</type></entry>
+       <entry>
+ 	   Total amount of time that has been spent in the part of checkpoint
+ 	   processing where files are synchronized to disk, in milliseconds.
+       </entry>
+      </row>
+ 
+      <row>
+       <entry><literal><function>pg_stat_get_bgwriter_sync_files()</function></literal></entry>
+       <entry><type>bigint</type></entry>
+       <entry>
+ 	   Total number of files that have been synchronized to disk during
+ 	   checkpoint processing.
+       </entry>
+      </row>
  
       <row>
        <entry><literal><function>pg_stat_get_wal_senders()</function></literal></entry>
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
new file mode 100644
index ff7f521..e481be3
*** a/src/backend/access/transam/xlog.c
--- b/src/backend/access/transam/xlog.c
*************** LogCheckpointStart(int flags, bool resta
*** 7492,7501 ****
  }
  
  /*
!  * Log end of a checkpoint.
   */
  static void
! LogCheckpointEnd(bool restartpoint)
  {
  	long		write_secs,
  				sync_secs,
--- 7492,7501 ----
  }
  
  /*
!  * Time and potentially log the end of a checkpoint.
   */
  static void
! TimeCheckpointEnd(bool restartpoint)
  {
  	long		write_secs,
  				sync_secs,
*************** LogCheckpointEnd(bool restartpoint)
*** 7511,7519 ****
  
  	CheckpointStats.ckpt_end_t = GetCurrentTimestamp();
  
- 	TimestampDifference(CheckpointStats.ckpt_start_t,
- 						CheckpointStats.ckpt_end_t,
- 						&total_secs, &total_usecs);
  
  	TimestampDifference(CheckpointStats.ckpt_write_t,
  						CheckpointStats.ckpt_sync_t,
--- 7511,7516 ----
*************** LogCheckpointEnd(bool restartpoint)
*** 7523,7528 ****
--- 7520,7541 ----
  						CheckpointStats.ckpt_sync_end_t,
  						&sync_secs, &sync_usecs);
  
+ 	/* Record checkpoint timing summary data. */
+ 	BgWriterStats.m_write_time = write_secs * 1000 + write_usecs / 1000;
+ 	BgWriterStats.m_sync_time = sync_secs * 1000 + sync_usecs / 1000;
+ 	BgWriterStats.m_sync_files = CheckpointStats.ckpt_sync_rels;
+ 
+ 	/*
+ 	 * All of the published timing statistics are accounted for.  Only
+ 	 * continue if a log message is to be written.
+ 	 */
+ 	if (!log_checkpoints)
+ 		return;
+ 
+ 	TimestampDifference(CheckpointStats.ckpt_start_t,
+ 						CheckpointStats.ckpt_end_t,
+ 						&total_secs, &total_usecs);
+ 
  	/*
  	 * Timing values returned from CheckpointStats are in microseconds.
  	 * Convert to the second plus microsecond form that TimestampDifference
*************** CreateCheckPoint(int flags)
*** 7971,7979 ****
  	if (!RecoveryInProgress())
  		TruncateSUBTRANS(GetOldestXmin(true, false));
  
! 	/* All real work is done, but log before releasing lock. */
! 	if (log_checkpoints)
! 		LogCheckpointEnd(false);
  
  	TRACE_POSTGRESQL_CHECKPOINT_DONE(CheckpointStats.ckpt_bufs_written,
  									 NBuffers,
--- 7984,7994 ----
  	if (!RecoveryInProgress())
  		TruncateSUBTRANS(GetOldestXmin(true, false));
  
! 	/*
! 	 * All real work is done, but record timings and potentially log before
! 	 * releasing lock.
! 	 */
! 	TimeCheckpointEnd(false);
  
  	TRACE_POSTGRESQL_CHECKPOINT_DONE(CheckpointStats.ckpt_bufs_written,
  									 NBuffers,
*************** CreateRestartPoint(int flags)
*** 8237,8245 ****
  	if (EnableHotStandby)
  		TruncateSUBTRANS(GetOldestXmin(true, false));
  
! 	/* All real work is done, but log before releasing lock. */
! 	if (log_checkpoints)
! 		LogCheckpointEnd(true);
  
  	xtime = GetLatestXTime();
  	ereport((log_checkpoints ? LOG : DEBUG2),
--- 8252,8262 ----
  	if (EnableHotStandby)
  		TruncateSUBTRANS(GetOldestXmin(true, false));
  
! 	/*
! 	 * All real work is done, but record timings and potentially log before
! 	 * releasing lock.
! 	 */
! 	TimeCheckpointEnd(true);
  
  	xtime = GetLatestXTime();
  	ereport((log_checkpoints ? LOG : DEBUG2),
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
new file mode 100644
index ab594eb..23e7566
*** a/src/backend/catalog/system_views.sql
--- b/src/backend/catalog/system_views.sql
*************** CREATE VIEW pg_stat_bgwriter AS
*** 652,658 ****
          pg_stat_get_buf_written_backend() AS buffers_backend,
          pg_stat_get_buf_fsync_backend() AS buffers_backend_fsync,
          pg_stat_get_buf_alloc() AS buffers_alloc,
!         pg_stat_get_bgwriter_stat_reset_time() AS stats_reset;
  
  CREATE VIEW pg_user_mappings AS
      SELECT
--- 652,662 ----
          pg_stat_get_buf_written_backend() AS buffers_backend,
          pg_stat_get_buf_fsync_backend() AS buffers_backend_fsync,
          pg_stat_get_buf_alloc() AS buffers_alloc,
!         pg_stat_get_buf_strat_alloc() AS buffers_strategy_alloc,
!         pg_stat_get_bgwriter_stat_reset_time() AS stats_reset,
!         pg_stat_get_bgwriter_write_time() AS write_time,
!         pg_stat_get_bgwriter_sync_time() AS sync_time,
!         pg_stat_get_bgwriter_sync_files() AS sync_files;
  
  CREATE VIEW pg_user_mappings AS
      SELECT
diff --git a/src/backend/postmaster/checkpointer.c b/src/backend/postmaster/checkpointer.c
new file mode 100644
index c9473f7..9bdabbe
*** a/src/backend/postmaster/checkpointer.c
--- b/src/backend/postmaster/checkpointer.c
*************** ForwardFsyncRequest(RelFileNodeBackend r
*** 1064,1070 ****
  
  	LWLockAcquire(BgWriterCommLock, LW_EXCLUSIVE);
  
! 	/* Count all backend writes regardless of if they fit in the queue */
  	BgWriterShmem->num_backend_writes++;
  
  	/*
--- 1064,1074 ----
  
  	LWLockAcquire(BgWriterCommLock, LW_EXCLUSIVE);
  
! 	/*
! 	 * Count all backend writes regardless of if they fit in the queue.
! 	 * Technically this figure isn't quite correct, as some buffers will have
! 	 * been written without needing to be synced, but it hardly matters.
! 	 */
  	BgWriterShmem->num_backend_writes++;
  
  	/*
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
new file mode 100644
index a53fc52..9e5c4d2
*** a/src/backend/postmaster/pgstat.c
--- b/src/backend/postmaster/pgstat.c
*************** pgstat_recv_bgwriter(PgStat_MsgBgWriter
*** 4449,4454 ****
--- 4449,4458 ----
  	globalStats.buf_written_backend += msg->m_buf_written_backend;
  	globalStats.buf_fsync_backend += msg->m_buf_fsync_backend;
  	globalStats.buf_alloc += msg->m_buf_alloc;
+ 	globalStats.buf_strat_alloc += msg->m_buf_strat_alloc;
+ 	globalStats.bgwriter_write += msg->m_write_time;
+ 	globalStats.bgwriter_sync += msg->m_sync_time;
+ 	globalStats.sync_files += msg->m_sync_files;
  }
  
  /* ----------
diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
new file mode 100644
index 3924a51..30d0f97
*** a/src/backend/storage/buffer/bufmgr.c
--- b/src/backend/storage/buffer/bufmgr.c
*************** BufferSync(int flags)
*** 1250,1256 ****
  	 * Note that we don't read the buffer alloc count here --- that should be
  	 * left untouched till the next BgBufferSync() call.
  	 */
! 	buf_id = StrategySyncStart(NULL, NULL);
  	num_to_scan = NBuffers;
  	num_written = 0;
  	while (num_to_scan-- > 0)
--- 1250,1256 ----
  	 * Note that we don't read the buffer alloc count here --- that should be
  	 * left untouched till the next BgBufferSync() call.
  	 */
! 	buf_id = StrategySyncStart(NULL, NULL, NULL);
  	num_to_scan = NBuffers;
  	num_written = 0;
  	while (num_to_scan-- > 0)
*************** BgBufferSync(void)
*** 1327,1332 ****
--- 1327,1333 ----
  	int			strategy_buf_id;
  	uint32		strategy_passes;
  	uint32		recent_alloc;
+ 	uint32		recent_strat_alloc;
  
  	/*
  	 * Information saved between calls so we can determine the strategy
*************** BgBufferSync(void)
*** 1364,1373 ****
  	 * Find out where the freelist clock sweep currently is, and how many
  	 * buffer allocations have happened since our last call.
  	 */
! 	strategy_buf_id = StrategySyncStart(&strategy_passes, &recent_alloc);
  
  	/* Report buffer alloc counts to pgstat */
  	BgWriterStats.m_buf_alloc += recent_alloc;
  
  	/*
  	 * If we're not running the LRU scan, just stop after doing the stats
--- 1365,1376 ----
  	 * Find out where the freelist clock sweep currently is, and how many
  	 * buffer allocations have happened since our last call.
  	 */
! 	strategy_buf_id = StrategySyncStart(&strategy_passes, &recent_alloc,
! 											&recent_strat_alloc);
  
  	/* Report buffer alloc counts to pgstat */
  	BgWriterStats.m_buf_alloc += recent_alloc;
+ 	BgWriterStats.m_buf_strat_alloc += recent_strat_alloc;
  
  	/*
  	 * If we're not running the LRU scan, just stop after doing the stats
diff --git a/src/backend/storage/buffer/freelist.c b/src/backend/storage/buffer/freelist.c
new file mode 100644
index 3e62448..e37d83c
*** a/src/backend/storage/buffer/freelist.c
--- b/src/backend/storage/buffer/freelist.c
*************** typedef struct
*** 41,46 ****
--- 41,47 ----
  	 */
  	uint32		completePasses; /* Complete cycles of the clock sweep */
  	uint32		numBufferAllocs;	/* Buffers allocated since last reset */
+ 	uint32		numStratBufferAllocs;	/* of which are non-BAS_NORMAL */
  } BufferStrategyControl;
  
  /* Pointers to shared state */
*************** StrategyGetBuffer(BufferAccessStrategy s
*** 134,139 ****
--- 135,143 ----
  	 */
  	StrategyControl->numBufferAllocs++;
  
+ 	if (strategy && strategy->btype != BAS_NORMAL)
+ 		StrategyControl->numStratBufferAllocs++;
+ 
  	/*
  	 * Try to get a buffer from the freelist.  Note that the freeNext fields
  	 * are considered to be protected by the BufFreelistLock not the
*************** StrategyFreeBuffer(volatile BufferDesc *
*** 252,258 ****
   * being read.
   */
  int
! StrategySyncStart(uint32 *complete_passes, uint32 *num_buf_alloc)
  {
  	int			result;
  
--- 256,263 ----
   * being read.
   */
  int
! StrategySyncStart(uint32 *complete_passes, uint32 *num_buf_alloc,
! 					uint32 *num_buf_strat_alloc)
  {
  	int			result;
  
*************** StrategySyncStart(uint32 *complete_passe
*** 265,270 ****
--- 270,280 ----
  		*num_buf_alloc = StrategyControl->numBufferAllocs;
  		StrategyControl->numBufferAllocs = 0;
  	}
+ 	if (num_buf_strat_alloc)
+ 	{
+ 		*num_buf_strat_alloc = StrategyControl->numStratBufferAllocs;
+ 		StrategyControl->numStratBufferAllocs= 0;
+ 	}
  	LWLockRelease(BufFreelistLock);
  	return result;
  }
diff --git a/src/backend/storage/smgr/md.c b/src/backend/storage/smgr/md.c
new file mode 100644
index bfc9f06..eeb20fd
*** a/src/backend/storage/smgr/md.c
--- b/src/backend/storage/smgr/md.c
*************** mdsync(void)
*** 1094,1120 ****
  							  entry->tag.segno * ((BlockNumber) RELSEG_SIZE),
  								   false, EXTENSION_RETURN_NULL);
  
! 				if (log_checkpoints)
! 					INSTR_TIME_SET_CURRENT(sync_start);
! 				else
! 					INSTR_TIME_SET_ZERO(sync_start);
  
  				if (seg != NULL &&
  					FileSync(seg->mdfd_vfd) >= 0)
  				{
! 					if (log_checkpoints && (!INSTR_TIME_IS_ZERO(sync_start)))
! 					{
! 						INSTR_TIME_SET_CURRENT(sync_end);
! 						sync_diff = sync_end;
! 						INSTR_TIME_SUBTRACT(sync_diff, sync_start);
! 						elapsed = INSTR_TIME_GET_MICROSEC(sync_diff);
! 						if (elapsed > longest)
! 							longest = elapsed;
! 						total_elapsed += elapsed;
! 						processed++;
  						elog(DEBUG1, "checkpoint sync: number=%d file=%s time=%.3f msec",
  							 processed, FilePathName(seg->mdfd_vfd), (double) elapsed / 1000);
- 					}
  
  					break;		/* success; break out of retry loop */
  				}
--- 1094,1115 ----
  							  entry->tag.segno * ((BlockNumber) RELSEG_SIZE),
  								   false, EXTENSION_RETURN_NULL);
  
! 				INSTR_TIME_SET_CURRENT(sync_start);
  
  				if (seg != NULL &&
  					FileSync(seg->mdfd_vfd) >= 0)
  				{
! 					INSTR_TIME_SET_CURRENT(sync_end);
! 					sync_diff = sync_end;
! 					INSTR_TIME_SUBTRACT(sync_diff, sync_start);
! 					elapsed = INSTR_TIME_GET_MICROSEC(sync_diff);
! 					if (elapsed > longest)
! 						longest = elapsed;
! 					total_elapsed += elapsed;
! 					processed++;
! 					if (log_checkpoints)
  						elog(DEBUG1, "checkpoint sync: number=%d file=%s time=%.3f msec",
  							 processed, FilePathName(seg->mdfd_vfd), (double) elapsed / 1000);
  
  					break;		/* success; break out of retry loop */
  				}
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
new file mode 100644
index 68b2527..83b49af
*** a/src/backend/utils/adt/pgstatfuncs.c
--- b/src/backend/utils/adt/pgstatfuncs.c
*************** extern Datum pg_stat_get_bgwriter_buf_wr
*** 89,97 ****
--- 89,101 ----
  extern Datum pg_stat_get_bgwriter_buf_written_clean(PG_FUNCTION_ARGS);
  extern Datum pg_stat_get_bgwriter_maxwritten_clean(PG_FUNCTION_ARGS);
  extern Datum pg_stat_get_bgwriter_stat_reset_time(PG_FUNCTION_ARGS);
+ extern Datum pg_stat_get_bgwriter_write_time(PG_FUNCTION_ARGS);
+ extern Datum pg_stat_get_bgwriter_sync_time(PG_FUNCTION_ARGS);
+ extern Datum pg_stat_get_bgwriter_sync_files(PG_FUNCTION_ARGS);
  extern Datum pg_stat_get_buf_written_backend(PG_FUNCTION_ARGS);
  extern Datum pg_stat_get_buf_fsync_backend(PG_FUNCTION_ARGS);
  extern Datum pg_stat_get_buf_alloc(PG_FUNCTION_ARGS);
+ extern Datum pg_stat_get_buf_strat_alloc(PG_FUNCTION_ARGS);
  
  extern Datum pg_stat_get_xact_numscans(PG_FUNCTION_ARGS);
  extern Datum pg_stat_get_xact_tuples_returned(PG_FUNCTION_ARGS);
*************** pg_stat_get_bgwriter_maxwritten_clean(PG
*** 1388,1393 ****
--- 1392,1415 ----
  }
  
  Datum
+ pg_stat_get_bgwriter_write_time(PG_FUNCTION_ARGS)
+ {
+ 	PG_RETURN_INT64(pgstat_fetch_global()->bgwriter_write);
+ }
+ 
+ Datum
+ pg_stat_get_bgwriter_sync_time(PG_FUNCTION_ARGS)
+ {
+ 	PG_RETURN_INT64(pgstat_fetch_global()->bgwriter_sync);
+ }
+ 
+ Datum
+ pg_stat_get_bgwriter_sync_files(PG_FUNCTION_ARGS)
+ {
+ 	PG_RETURN_INT64(pgstat_fetch_global()->sync_files);
+ }
+ 
+ Datum
  pg_stat_get_bgwriter_stat_reset_time(PG_FUNCTION_ARGS)
  {
  	PG_RETURN_TIMESTAMPTZ(pgstat_fetch_global()->stat_reset_timestamp);
*************** pg_stat_get_buf_alloc(PG_FUNCTION_ARGS)
*** 1412,1417 ****
--- 1434,1445 ----
  }
  
  Datum
+ pg_stat_get_buf_strat_alloc(PG_FUNCTION_ARGS)
+ {
+ 	PG_RETURN_INT64(pgstat_fetch_global()->buf_strat_alloc);
+ }
+ 
+ Datum
  pg_stat_get_xact_numscans(PG_FUNCTION_ARGS)
  {
  	Oid			relid = PG_GETARG_OID(0);
diff --git a/src/include/catalog/pg_proc.h b/src/include/catalog/pg_proc.h
new file mode 100644
index 2db8489..8320f27
*** a/src/include/catalog/pg_proc.h
--- b/src/include/catalog/pg_proc.h
*************** DATA(insert OID = 2773 ( pg_stat_get_bgw
*** 2674,2685 ****
--- 2674,2693 ----
  DESCR("statistics: number of times the bgwriter stopped processing when it had written too many buffers while cleaning");
  DATA(insert OID = 3075 ( pg_stat_get_bgwriter_stat_reset_time PGNSP PGUID 12 1 0 0 0 f f f f t f s 0 0 1184 "" _null_ _null_ _null_ _null_	pg_stat_get_bgwriter_stat_reset_time _null_ _null_ _null_ ));
  DESCR("statistics: last reset for the bgwriter");
+ DATA(insert OID = 3160 ( pg_stat_get_bgwriter_write_time PGNSP PGUID 12 1 0 0 0 f f f f t f s 0 0 20 "" _null_ _null_ _null_ _null_ pg_stat_get_bgwriter_write_time _null_ _null_ _null_ ));
+ DESCR("statistics: total amount of time the bgwriter has spent writing buffers to disk");
+ DATA(insert OID = 3161 ( pg_stat_get_bgwriter_sync_time PGNSP PGUID 12 1 0 0 0 f f f f t f s 0 0 20 "" _null_ _null_ _null_ _null_ pg_stat_get_bgwriter_sync_time _null_ _null_ _null_ ));
+ DESCR("statistics: total amount of time the bgwriter has spent syncing buffers to disk");
+ DATA(insert OID = 3166 ( pg_stat_get_bgwriter_sync_files PGNSP PGUID 12 1 0 0 0 f f f f t f s 0 0 20 "" _null_ _null_ _null_ _null_ pg_stat_get_bgwriter_sync_files _null_ _null_ _null_ ));
+ DESCR("statistics: total number of files the bgwriter has synced to disk");
  DATA(insert OID = 2775 ( pg_stat_get_buf_written_backend PGNSP PGUID 12 1 0 0 0 f f f f t f s 0 0 20 "" _null_ _null_ _null_ _null_ pg_stat_get_buf_written_backend _null_ _null_ _null_ ));
  DESCR("statistics: number of buffers written by backends");
  DATA(insert OID = 3063 ( pg_stat_get_buf_fsync_backend PGNSP PGUID 12 1 0 0 0 f f f f t f s 0 0 20 "" _null_ _null_ _null_ _null_ pg_stat_get_buf_fsync_backend _null_ _null_ _null_ ));
  DESCR("statistics: number of backend buffer writes that did their own fsync");
  DATA(insert OID = 2859 ( pg_stat_get_buf_alloc			PGNSP PGUID 12 1 0 0 0 f f f f t f s 0 0 20 "" _null_ _null_ _null_ _null_ pg_stat_get_buf_alloc _null_ _null_ _null_ ));
  DESCR("statistics: number of buffer allocations");
+ DATA(insert OID = 3167 ( pg_stat_get_buf_strat_alloc			PGNSP PGUID 12 1 0 0 0 f f f f t f s 0 0 20 "" _null_ _null_ _null_ _null_ pg_stat_get_buf_strat_alloc _null_ _null_ _null_ ));
+ DESCR("statistics: number of strategy buffer allocations");
  
  DATA(insert OID = 2978 (  pg_stat_get_function_calls		PGNSP PGUID 12 1 0 0 0 f f f f t f s 1 0 20 "26" _null_ _null_ _null_ _null_ pg_stat_get_function_calls _null_ _null_ _null_ ));
  DESCR("statistics: number of function calls");
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
new file mode 100644
index 1281bd8..161eea2
*** a/src/include/pgstat.h
--- b/src/include/pgstat.h
*************** typedef struct PgStat_MsgBgWriter
*** 364,369 ****
--- 364,373 ----
  	PgStat_Counter m_buf_written_backend;
  	PgStat_Counter m_buf_fsync_backend;
  	PgStat_Counter m_buf_alloc;
+ 	PgStat_Counter m_buf_strat_alloc;
+ 	PgStat_Counter m_write_time;
+ 	PgStat_Counter m_sync_time;
+ 	PgStat_Counter m_sync_files;
  } PgStat_MsgBgWriter;
  
  /* ----------
*************** typedef struct PgStat_GlobalStats
*** 614,619 ****
--- 618,628 ----
  	PgStat_Counter buf_written_backend;
  	PgStat_Counter buf_fsync_backend;
  	PgStat_Counter buf_alloc;
+ 	PgStat_Counter buf_strat_alloc;
+ 	PgStat_Counter bgwriter_write;	/* times in milliseconds */
+ 	PgStat_Counter bgwriter_sync;
+ 
+ 	PgStat_Counter sync_files;
  	TimestampTz stat_reset_timestamp;
  } PgStat_GlobalStats;
  
diff --git a/src/include/storage/buf_internals.h b/src/include/storage/buf_internals.h
new file mode 100644
index e43719e..9f73b30
*** a/src/include/storage/buf_internals.h
--- b/src/include/storage/buf_internals.h
*************** extern void StrategyFreeBuffer(volatile
*** 187,193 ****
  extern bool StrategyRejectBuffer(BufferAccessStrategy strategy,
  					 volatile BufferDesc *buf);
  
! extern int	StrategySyncStart(uint32 *complete_passes, uint32 *num_buf_alloc);
  extern Size StrategyShmemSize(void);
  extern void StrategyInitialize(bool init);
  
--- 187,195 ----
  extern bool StrategyRejectBuffer(BufferAccessStrategy strategy,
  					 volatile BufferDesc *buf);
  
! extern int	StrategySyncStart(uint32 *complete_passes, uint32 *num_buf_alloc,
! 						uint32 *num_buf_strat_alloc);
! 
  extern Size StrategyShmemSize(void);
  extern void StrategyInitialize(bool init);
  
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
new file mode 100644
index aaf0cca..a0fd6e7
*** a/src/test/regress/expected/rules.out
--- b/src/test/regress/expected/rules.out
*************** SELECT viewname, definition FROM pg_view
*** 1295,1301 ****
   pg_stat_activity                | SELECT s.datid, d.datname, s.pid, s.usesysid, u.rolname AS usename, s.application_name, s.client_addr, s.client_hostname, s.client_port, s.backend_start, s.xact_start, s.query_start, s.state_change, s.waiting, s.state, s.query FROM pg_database d, pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, waiting, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port), pg_authid u WHERE ((s.datid = d.oid) AND (s.usesysid = u.oid));
   pg_stat_all_indexes             | SELECT c.oid AS relid, i.oid AS indexrelid, n.nspname AS schemaname, c.relname, i.relname AS indexrelname, pg_stat_get_numscans(i.oid) AS idx_scan, pg_stat_get_tuples_returned(i.oid) AS idx_tup_read, pg_stat_get_tuples_fetched(i.oid) AS idx_tup_fetch FROM (((pg_class c JOIN pg_index x ON ((c.oid = x.indrelid))) JOIN pg_class i ON ((i.oid = x.indexrelid))) LEFT JOIN pg_namespace n ON ((n.oid = c.relnamespace))) WHERE (c.relkind = ANY (ARRAY['r'::"char", 't'::"char"]));
   pg_stat_all_tables              | SELECT c.oid AS relid, n.nspname AS schemaname, c.relname, pg_stat_get_numscans(c.oid) AS seq_scan, pg_stat_get_tuples_returned(c.oid) AS seq_tup_read, (sum(pg_stat_get_numscans(i.indexrelid)))::bigint AS idx_scan, ((sum(pg_stat_get_tuples_fetched(i.indexrelid)))::bigint + pg_stat_get_tuples_fetched(c.oid)) AS idx_tup_fetch, pg_stat_get_tuples_inserted(c.oid) AS n_tup_ins, pg_stat_get_tuples_updated(c.oid) AS n_tup_upd, pg_stat_get_tuples_deleted(c.oid) AS n_tup_del, pg_stat_get_tuples_hot_updated(c.oid) AS n_tup_hot_upd, pg_stat_get_live_tuples(c.oid) AS n_live_tup, pg_stat_get_dead_tuples(c.oid) AS n_dead_tup, pg_stat_get_last_vacuum_time(c.oid) AS last_vacuum, pg_stat_get_last_autovacuum_time(c.oid) AS last_autovacuum, pg_stat_get_last_analyze_time(c.oid) AS last_analyze, pg_stat_get_last_autoanalyze_time(c.oid) AS last_autoanalyze, pg_stat_get_vacuum_count(c.oid) AS vacuum_count, pg_stat_get_autovacuum_count(c.oid) AS autovacuum_count, pg_stat_get_analyze_count(c.oid) AS analyze_count, pg_stat_get_autoanalyze_count(c.oid) AS autoanalyze_count FROM ((pg_class c LEFT JOIN pg_index i ON ((c.oid = i.indrelid))) LEFT JOIN pg_namespace n ON ((n.oid = c.relnamespace))) WHERE (c.relkind = ANY (ARRAY['r'::"char", 't'::"char"])) GROUP BY c.oid, n.nspname, c.relname;
!  pg_stat_bgwriter                | SELECT pg_stat_get_bgwriter_timed_checkpoints() AS checkpoints_timed, pg_stat_get_bgwriter_requested_checkpoints() AS checkpoints_req, pg_stat_get_bgwriter_buf_written_checkpoints() AS buffers_checkpoint, pg_stat_get_bgwriter_buf_written_clean() AS buffers_clean, pg_stat_get_bgwriter_maxwritten_clean() AS maxwritten_clean, pg_stat_get_buf_written_backend() AS buffers_backend, pg_stat_get_buf_fsync_backend() AS buffers_backend_fsync, pg_stat_get_buf_alloc() AS buffers_alloc, pg_stat_get_bgwriter_stat_reset_time() AS stats_reset;
   pg_stat_database                | SELECT d.oid AS datid, d.datname, pg_stat_get_db_numbackends(d.oid) AS numbackends, pg_stat_get_db_xact_commit(d.oid) AS xact_commit, pg_stat_get_db_xact_rollback(d.oid) AS xact_rollback, (pg_stat_get_db_blocks_fetched(d.oid) - pg_stat_get_db_blocks_hit(d.oid)) AS blks_read, pg_stat_get_db_blocks_hit(d.oid) AS blks_hit, pg_stat_get_db_tuples_returned(d.oid) AS tup_returned, pg_stat_get_db_tuples_fetched(d.oid) AS tup_fetched, pg_stat_get_db_tuples_inserted(d.oid) AS tup_inserted, pg_stat_get_db_tuples_updated(d.oid) AS tup_updated, pg_stat_get_db_tuples_deleted(d.oid) AS tup_deleted, pg_stat_get_db_conflict_all(d.oid) AS conflicts, pg_stat_get_db_temp_files(d.oid) AS temp_files, pg_stat_get_db_temp_bytes(d.oid) AS temp_bytes, pg_stat_get_db_deadlocks(d.oid) AS deadlocks, pg_stat_get_db_stat_reset_time(d.oid) AS stats_reset FROM pg_database d;
   pg_stat_database_conflicts      | SELECT d.oid AS datid, d.datname, pg_stat_get_db_conflict_tablespace(d.oid) AS confl_tablespace, pg_stat_get_db_conflict_lock(d.oid) AS confl_lock, pg_stat_get_db_conflict_snapshot(d.oid) AS confl_snapshot, pg_stat_get_db_conflict_bufferpin(d.oid) AS confl_bufferpin, pg_stat_get_db_conflict_startup_deadlock(d.oid) AS confl_deadlock FROM pg_database d;
   pg_stat_replication             | SELECT s.pid, s.usesysid, u.rolname AS usename, s.application_name, s.client_addr, s.client_hostname, s.client_port, s.backend_start, w.state, w.sent_location, w.write_location, w.flush_location, w.replay_location, w.sync_priority, w.sync_state FROM pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, waiting, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port), pg_authid u, pg_stat_get_wal_senders() w(pid, state, sent_location, write_location, flush_location, replay_location, sync_priority, sync_state) WHERE ((s.usesysid = u.oid) AND (s.pid = w.pid));
--- 1295,1301 ----
   pg_stat_activity                | SELECT s.datid, d.datname, s.pid, s.usesysid, u.rolname AS usename, s.application_name, s.client_addr, s.client_hostname, s.client_port, s.backend_start, s.xact_start, s.query_start, s.state_change, s.waiting, s.state, s.query FROM pg_database d, pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, waiting, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port), pg_authid u WHERE ((s.datid = d.oid) AND (s.usesysid = u.oid));
   pg_stat_all_indexes             | SELECT c.oid AS relid, i.oid AS indexrelid, n.nspname AS schemaname, c.relname, i.relname AS indexrelname, pg_stat_get_numscans(i.oid) AS idx_scan, pg_stat_get_tuples_returned(i.oid) AS idx_tup_read, pg_stat_get_tuples_fetched(i.oid) AS idx_tup_fetch FROM (((pg_class c JOIN pg_index x ON ((c.oid = x.indrelid))) JOIN pg_class i ON ((i.oid = x.indexrelid))) LEFT JOIN pg_namespace n ON ((n.oid = c.relnamespace))) WHERE (c.relkind = ANY (ARRAY['r'::"char", 't'::"char"]));
   pg_stat_all_tables              | SELECT c.oid AS relid, n.nspname AS schemaname, c.relname, pg_stat_get_numscans(c.oid) AS seq_scan, pg_stat_get_tuples_returned(c.oid) AS seq_tup_read, (sum(pg_stat_get_numscans(i.indexrelid)))::bigint AS idx_scan, ((sum(pg_stat_get_tuples_fetched(i.indexrelid)))::bigint + pg_stat_get_tuples_fetched(c.oid)) AS idx_tup_fetch, pg_stat_get_tuples_inserted(c.oid) AS n_tup_ins, pg_stat_get_tuples_updated(c.oid) AS n_tup_upd, pg_stat_get_tuples_deleted(c.oid) AS n_tup_del, pg_stat_get_tuples_hot_updated(c.oid) AS n_tup_hot_upd, pg_stat_get_live_tuples(c.oid) AS n_live_tup, pg_stat_get_dead_tuples(c.oid) AS n_dead_tup, pg_stat_get_last_vacuum_time(c.oid) AS last_vacuum, pg_stat_get_last_autovacuum_time(c.oid) AS last_autovacuum, pg_stat_get_last_analyze_time(c.oid) AS last_analyze, pg_stat_get_last_autoanalyze_time(c.oid) AS last_autoanalyze, pg_stat_get_vacuum_count(c.oid) AS vacuum_count, pg_stat_get_autovacuum_count(c.oid) AS autovacuum_count, pg_stat_get_analyze_count(c.oid) AS analyze_count, pg_stat_get_autoanalyze_count(c.oid) AS autoanalyze_count FROM ((pg_class c LEFT JOIN pg_index i ON ((c.oid = i.indrelid))) LEFT JOIN pg_namespace n ON ((n.oid = c.relnamespace))) WHERE (c.relkind = ANY (ARRAY['r'::"char", 't'::"char"])) GROUP BY c.oid, n.nspname, c.relname;
!  pg_stat_bgwriter                | SELECT pg_stat_get_bgwriter_timed_checkpoints() AS checkpoints_timed, pg_stat_get_bgwriter_requested_checkpoints() AS checkpoints_req, pg_stat_get_bgwriter_buf_written_checkpoints() AS buffers_checkpoint, pg_stat_get_bgwriter_buf_written_clean() AS buffers_clean, pg_stat_get_bgwriter_maxwritten_clean() AS maxwritten_clean, pg_stat_get_buf_written_backend() AS buffers_backend, pg_stat_get_buf_fsync_backend() AS buffers_backend_fsync, pg_stat_get_buf_alloc() AS buffers_alloc, pg_stat_get_buf_strat_alloc() AS buffers_strategy_alloc, pg_stat_get_bgwriter_stat_reset_time() AS stats_reset, pg_stat_get_bgwriter_write_time() AS write_time, pg_stat_get_bgwriter_sync_time() AS sync_time, pg_stat_get_bgwriter_sync_files() AS sync_files;
   pg_stat_database                | SELECT d.oid AS datid, d.datname, pg_stat_get_db_numbackends(d.oid) AS numbackends, pg_stat_get_db_xact_commit(d.oid) AS xact_commit, pg_stat_get_db_xact_rollback(d.oid) AS xact_rollback, (pg_stat_get_db_blocks_fetched(d.oid) - pg_stat_get_db_blocks_hit(d.oid)) AS blks_read, pg_stat_get_db_blocks_hit(d.oid) AS blks_hit, pg_stat_get_db_tuples_returned(d.oid) AS tup_returned, pg_stat_get_db_tuples_fetched(d.oid) AS tup_fetched, pg_stat_get_db_tuples_inserted(d.oid) AS tup_inserted, pg_stat_get_db_tuples_updated(d.oid) AS tup_updated, pg_stat_get_db_tuples_deleted(d.oid) AS tup_deleted, pg_stat_get_db_conflict_all(d.oid) AS conflicts, pg_stat_get_db_temp_files(d.oid) AS temp_files, pg_stat_get_db_temp_bytes(d.oid) AS temp_bytes, pg_stat_get_db_deadlocks(d.oid) AS deadlocks, pg_stat_get_db_stat_reset_time(d.oid) AS stats_reset FROM pg_database d;
   pg_stat_database_conflicts      | SELECT d.oid AS datid, d.datname, pg_stat_get_db_conflict_tablespace(d.oid) AS confl_tablespace, pg_stat_get_db_conflict_lock(d.oid) AS confl_lock, pg_stat_get_db_conflict_snapshot(d.oid) AS confl_snapshot, pg_stat_get_db_conflict_bufferpin(d.oid) AS confl_bufferpin, pg_stat_get_db_conflict_startup_deadlock(d.oid) AS confl_deadlock FROM pg_database d;
   pg_stat_replication             | SELECT s.pid, s.usesysid, u.rolname AS usename, s.application_name, s.client_addr, s.client_hostname, s.client_port, s.backend_start, w.state, w.sent_location, w.write_location, w.flush_location, w.replay_location, w.sync_priority, w.sync_state FROM pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, waiting, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port), pg_authid u, pg_stat_get_wal_senders() w(pid, state, sent_location, write_location, flush_location, replay_location, sync_priority, sync_state) WHERE ((s.usesysid = u.oid) AND (s.pid = w.pid));
#15Robert Haas
robertmhaas@gmail.com
In reply to: Peter Geoghegan (#14)
Re: Publish checkpoint timing and sync files summary data to pg_stat_bgwriter

On Thu, Mar 22, 2012 at 6:07 PM, Peter Geoghegan <peter@2ndquadrant.com> wrote:

On 23 January 2012 02:08, Robert Haas <robertmhaas@gmail.com> wrote:

On Sat, Jan 21, 2012 at 6:32 PM, Jeff Janes <jeff.janes@gmail.com> wrote:

I'm finding the backend_writes column pretty unfortunate.  The only
use I know of for it is to determine if the bgwriter is lagging
behind.  Yet it doesn't serve even this purpose because it lumps
together the backend writes due to lagging background writes, and the
backend writes "by design" due to the use buffer access strategy
during bulk inserts.

+1 for separating those.

I decided to have a go myself. Attached patch breaks out strategy
allocations in pg_stat_bgwriter, but not strategy writes. My thinking
is that this may serve to approximate non-BAS_NORMAL writes, with the
considerable advantage of not requiring that I work backwards to
figure out strategy from some block when backend-local syncing (yes,
syncing, not writing) a buffer to work out which strategy object
references the buffer. The bookkeeping that that would likely entail
seems to make it infeasible.

Incidentally, it seems Postgres doesn't currently record backend
writes when the buffer doesn't go on to be sync'd. That seems
problematic to me, or at the very least a misrepresentation, since
temporary tables will be written out by the backend for example. Not
sure if it's worth fixing, though I've added a comment to that effect
at the site of where backend_writes is bumped.

I have corrected the md.c bug. This previously would have prevented
the sync_files (number of relation segments synced) value from being
valid in non-log_checkpoints configurations.

I'm not currently confident that the strategy_alloc filed is a very
useful proxy for a strategy_backend_writes field. I think that rather
than bumping the strategy allocation count analogously to the way the
overall count is bumped (within StrategyGetBuffer()), I should have
bumped earlier within BufferAlloc() so that it'd count if the buffer
was requested with non-BAS_NORMAL strategy but was found in
shared_buffers (so control never got as far as StrategyGetBuffer() ).
That might make the value more closely line-up to the would-be value
of a strategy_backend_writes column. What do you think?

Although I liked the idea of separating this out, I wasn't thinking we
should do it as part of this patch. It seems like an independent
project. At any rate, I strongly agree that counting the number of
strategy allocations is not really a viable proxy for counting the
number of backend writes. You can't know how many of those actually
got dirtied.

Since any backend write is necessarily the result of that backend
trying to allocate a buffer, I think maybe we should just count
whether the number of times it was trying to allocate a buffer *using
a BAS* vs. the number of times it was trying to allocate a buffer *not
using a BAS*. That is, decide whether or not it's a "strategy write"
not based on whether the buffer came in via a strategy allocation, but
rather based on whether it's going out as a result of a strategy
allocation.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#16Peter Geoghegan
peter@2ndquadrant.com
In reply to: Robert Haas (#15)
Re: Publish checkpoint timing and sync files summary data to pg_stat_bgwriter

On 28 March 2012 15:23, Robert Haas <robertmhaas@gmail.com> wrote:

 At any rate, I strongly agree that counting the number of
strategy allocations is not really a viable proxy for counting the
number of backend writes.  You can't know how many of those actually
got dirtied.

Sure.

Since any backend write is necessarily the result of that backend
trying to allocate a buffer, I think maybe we should just count
whether the number of times it was trying to allocate a buffer *using
a BAS* vs. the number of times it was trying to allocate a buffer *not
using a BAS*.  That is, decide whether or not it's a "strategy write"
not based on whether the buffer came in via a strategy allocation, but
rather based on whether it's going out as a result of a strategy
allocation.

I'm not quite sure I understand what you mean here. Are you suggesting
that I produce a revision that bumps beside FlushBuffer() in
BufferAlloc(), as a dirty page is evicted/written, while breaking the
figure out into != BAS_NORMAL and == BAS_NORMAL figures? Would both
figures be presented as separate columns within pg_stat_bgwriter?
--
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services

#17Robert Haas
robertmhaas@gmail.com
In reply to: Peter Geoghegan (#16)
Re: Publish checkpoint timing and sync files summary data to pg_stat_bgwriter

On Sat, Mar 31, 2012 at 5:34 PM, Peter Geoghegan <peter@2ndquadrant.com> wrote:

Since any backend write is necessarily the result of that backend
trying to allocate a buffer, I think maybe we should just count
whether the number of times it was trying to allocate a buffer *using
a BAS* vs. the number of times it was trying to allocate a buffer *not
using a BAS*.  That is, decide whether or not it's a "strategy write"
not based on whether the buffer came in via a strategy allocation, but
rather based on whether it's going out as a result of a strategy
allocation.

I'm not quite sure I understand what you mean here. Are you suggesting
that I produce a revision that bumps beside FlushBuffer() in
BufferAlloc(), as a dirty page is evicted/written, while breaking the
figure out into != BAS_NORMAL and == BAS_NORMAL figures? Would both
figures be presented as separate columns within pg_stat_bgwriter?

Well, for 9.2, I am asking that you rip all that stuff out of the
patch altogether so we can focus on the stuff that was in the original
patch.

For 9.3, yes, that's what I'm suggesting, although I'll defer to you,
Jeff, and others on whether it's a *good* suggestion.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#18Peter Geoghegan
peter@2ndquadrant.com
In reply to: Robert Haas (#17)
1 attachment(s)
Re: Publish checkpoint timing and sync files summary data to pg_stat_bgwriter

On 3 April 2012 12:11, Robert Haas <robertmhaas@gmail.com> wrote:

Well, for 9.2, I am asking that you rip all that stuff out of the
patch altogether so we can focus on the stuff that was in the original
patch.

Given how we're pushed for time, I'm inclined to agree that that is
the best course of action.

Attached revision does not attempt to do anything with strategy
writes/allocations.

--
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services

Attachments:

bgwriter_stats_2012_04_05.patchapplication/octet-stream; name=bgwriter_stats_2012_04_05.patchDownload
diff doc/src/sgml/monitoring.sgml
index 840e54a..a85b043
*** a/doc/src/sgml/monitoring.sgml
--- b/doc/src/sgml/monitoring.sgml
*************** SELECT pg_stat_get_backend_pid(s.backend
*** 1703,1708 ****
--- 1703,1734 ----
        </entry>
       </row>

+      <row>
+       <entry><literal><function>pg_stat_get_bgwriter_write_time()</function></literal></entry>
+       <entry><type>bigint</type></entry>
+       <entry>
+        Total amount of time that has been spent in the portion of checkpoint
+        processing where files are written to disk, in milliseconds.
+       </entry>
+      </row>
+
+      <row>
+       <entry><literal><function>pg_stat_get_bgwriter_sync_time()</function></literal></entry>
+       <entry><type>bigint</type></entry>
+       <entry>
+        Total amount of time that has been spent in the portion of checkpoint
+        processing where files are synchronized to disk, in milliseconds.
+       </entry>
+      </row>
+
+      <row>
+       <entry><literal><function>pg_stat_get_bgwriter_sync_files()</function></literal></entry>
+       <entry><type>bigint</type></entry>
+       <entry>
+        Total number of files that have been synchronized to disk during
+        checkpoint processing.
+       </entry>
+      </row>

       <row>
        <entry><literal><function>pg_stat_get_wal_senders()</function></literal></entry>
diff src/backend/access/transam/xlog.c
index d268014..388cb66
*** a/src/backend/access/transam/xlog.c
--- b/src/backend/access/transam/xlog.c
*************** LogCheckpointStart(int flags, bool resta
*** 7492,7501 ****
  }

  /*
!  * Log end of a checkpoint.
   */
  static void
! LogCheckpointEnd(bool restartpoint)
  {
  	long		write_secs,
  				sync_secs,
--- 7492,7501 ----
  }

  /*
!  * Time the end of a checkpoint, potentially logging it.
   */
  static void
! TimeCheckpointEnd(bool restartpoint)
  {
  	long		write_secs,
  				sync_secs,
*************** LogCheckpointEnd(bool restartpoint)
*** 7511,7519 ****

  	CheckpointStats.ckpt_end_t = GetCurrentTimestamp();

- 	TimestampDifference(CheckpointStats.ckpt_start_t,
- 						CheckpointStats.ckpt_end_t,
- 						&total_secs, &total_usecs);

  	TimestampDifference(CheckpointStats.ckpt_write_t,
  						CheckpointStats.ckpt_sync_t,
--- 7511,7516 ----
*************** LogCheckpointEnd(bool restartpoint)
*** 7523,7528 ****
--- 7520,7541 ----
  						CheckpointStats.ckpt_sync_end_t,
  						&sync_secs, &sync_usecs);

+ 	/* Record checkpoint timing summary data. */
+ 	BgWriterStats.m_write_time = write_secs * 1000 + write_usecs / 1000;
+ 	BgWriterStats.m_sync_time = sync_secs * 1000 + sync_usecs / 1000;
+ 	BgWriterStats.m_sync_files = CheckpointStats.ckpt_sync_rels;
+
+ 	/*
+ 	 * All of the published timing statistics are accounted for.  Only
+ 	 * continue if a log message is to be written.
+ 	 */
+ 	if (!log_checkpoints)
+ 		return;
+
+ 	TimestampDifference(CheckpointStats.ckpt_start_t,
+ 						CheckpointStats.ckpt_end_t,
+ 						&total_secs, &total_usecs);
+
  	/*
  	 * Timing values returned from CheckpointStats are in microseconds.
  	 * Convert to the second plus microsecond form that TimestampDifference
*************** CreateCheckPoint(int flags)
*** 7633,7639 ****
  	 *
  	 * Note: because it is possible for log_checkpoints to change while a
  	 * checkpoint proceeds, we always accumulate stats, even if
! 	 * log_checkpoints is currently off.
  	 */
  	MemSet(&CheckpointStats, 0, sizeof(CheckpointStats));
  	CheckpointStats.ckpt_start_t = GetCurrentTimestamp();
--- 7646,7653 ----
  	 *
  	 * Note: because it is possible for log_checkpoints to change while a
  	 * checkpoint proceeds, we always accumulate stats, even if
! 	 * log_checkpoints is currently off. Furthermore, the statistics are
! 	 * required by the statistics collector.
  	 */
  	MemSet(&CheckpointStats, 0, sizeof(CheckpointStats));
  	CheckpointStats.ckpt_start_t = GetCurrentTimestamp();
*************** CreateCheckPoint(int flags)
*** 7971,7979 ****
  	if (!RecoveryInProgress())
  		TruncateSUBTRANS(GetOldestXmin(true, false));

! 	/* All real work is done, but log before releasing lock. */
! 	if (log_checkpoints)
! 		LogCheckpointEnd(false);

  	TRACE_POSTGRESQL_CHECKPOINT_DONE(CheckpointStats.ckpt_bufs_written,
  									 NBuffers,
--- 7985,7995 ----
  	if (!RecoveryInProgress())
  		TruncateSUBTRANS(GetOldestXmin(true, false));

! 	/*
! 	 * All real work is done, but record timings and potentially log before
! 	 * releasing lock.
! 	 */
! 	TimeCheckpointEnd(false);

  	TRACE_POSTGRESQL_CHECKPOINT_DONE(CheckpointStats.ckpt_bufs_written,
  									 NBuffers,
*************** CreateRestartPoint(int flags)
*** 8237,8245 ****
  	if (EnableHotStandby)
  		TruncateSUBTRANS(GetOldestXmin(true, false));

! 	/* All real work is done, but log before releasing lock. */
! 	if (log_checkpoints)
! 		LogCheckpointEnd(true);

  	xtime = GetLatestXTime();
  	ereport((log_checkpoints ? LOG : DEBUG2),
--- 8253,8263 ----
  	if (EnableHotStandby)
  		TruncateSUBTRANS(GetOldestXmin(true, false));

! 	/*
! 	 * All real work is done, but record timings and potentially log before
! 	 * releasing lock.
! 	 */
! 	TimeCheckpointEnd(true);

  	xtime = GetLatestXTime();
  	ereport((log_checkpoints ? LOG : DEBUG2),
diff src/backend/catalog/system_views.sql
index ab594eb..1ab56bf
*** a/src/backend/catalog/system_views.sql
--- b/src/backend/catalog/system_views.sql
*************** CREATE VIEW pg_stat_bgwriter AS
*** 652,658 ****
          pg_stat_get_buf_written_backend() AS buffers_backend,
          pg_stat_get_buf_fsync_backend() AS buffers_backend_fsync,
          pg_stat_get_buf_alloc() AS buffers_alloc,
!         pg_stat_get_bgwriter_stat_reset_time() AS stats_reset;

  CREATE VIEW pg_user_mappings AS
      SELECT
--- 652,661 ----
          pg_stat_get_buf_written_backend() AS buffers_backend,
          pg_stat_get_buf_fsync_backend() AS buffers_backend_fsync,
          pg_stat_get_buf_alloc() AS buffers_alloc,
!         pg_stat_get_bgwriter_stat_reset_time() AS stats_reset,
!         pg_stat_get_bgwriter_write_time() AS write_time,
!         pg_stat_get_bgwriter_sync_time() AS sync_time,
!         pg_stat_get_bgwriter_sync_files() AS sync_files;

  CREATE VIEW pg_user_mappings AS
      SELECT
diff src/backend/postmaster/checkpointer.c
index c9473f7..9bdabbe
*** a/src/backend/postmaster/checkpointer.c
--- b/src/backend/postmaster/checkpointer.c
*************** ForwardFsyncRequest(RelFileNodeBackend r
*** 1064,1070 ****

  	LWLockAcquire(BgWriterCommLock, LW_EXCLUSIVE);

! 	/* Count all backend writes regardless of if they fit in the queue */
  	BgWriterShmem->num_backend_writes++;

  	/*
--- 1064,1074 ----

  	LWLockAcquire(BgWriterCommLock, LW_EXCLUSIVE);

! 	/*
! 	 * Count all backend writes regardless of if they fit in the queue.
! 	 * Technically this figure isn't quite correct, as some buffers will have
! 	 * been written without needing to be synced, but it hardly matters.
! 	 */
  	BgWriterShmem->num_backend_writes++;

  	/*
diff src/backend/postmaster/pgstat.c
index a53fc52..7121b0d
*** a/src/backend/postmaster/pgstat.c
--- b/src/backend/postmaster/pgstat.c
*************** pgstat_recv_bgwriter(PgStat_MsgBgWriter
*** 4449,4454 ****
--- 4449,4457 ----
  	globalStats.buf_written_backend += msg->m_buf_written_backend;
  	globalStats.buf_fsync_backend += msg->m_buf_fsync_backend;
  	globalStats.buf_alloc += msg->m_buf_alloc;
+ 	globalStats.bgwriter_write += msg->m_write_time;
+ 	globalStats.bgwriter_sync += msg->m_sync_time;
+ 	globalStats.sync_files += msg->m_sync_files;
  }

  /* ----------
diff src/backend/storage/smgr/md.c
index bfc9f06..eeb20fd
*** a/src/backend/storage/smgr/md.c
--- b/src/backend/storage/smgr/md.c
*************** mdsync(void)
*** 1094,1120 ****
  							  entry->tag.segno * ((BlockNumber) RELSEG_SIZE),
  								   false, EXTENSION_RETURN_NULL);

! 				if (log_checkpoints)
! 					INSTR_TIME_SET_CURRENT(sync_start);
! 				else
! 					INSTR_TIME_SET_ZERO(sync_start);

  				if (seg != NULL &&
  					FileSync(seg->mdfd_vfd) >= 0)
  				{
! 					if (log_checkpoints && (!INSTR_TIME_IS_ZERO(sync_start)))
! 					{
! 						INSTR_TIME_SET_CURRENT(sync_end);
! 						sync_diff = sync_end;
! 						INSTR_TIME_SUBTRACT(sync_diff, sync_start);
! 						elapsed = INSTR_TIME_GET_MICROSEC(sync_diff);
! 						if (elapsed > longest)
! 							longest = elapsed;
! 						total_elapsed += elapsed;
! 						processed++;
  						elog(DEBUG1, "checkpoint sync: number=%d file=%s time=%.3f msec",
  							 processed, FilePathName(seg->mdfd_vfd), (double) elapsed / 1000);
- 					}

  					break;		/* success; break out of retry loop */
  				}
--- 1094,1115 ----
  							  entry->tag.segno * ((BlockNumber) RELSEG_SIZE),
  								   false, EXTENSION_RETURN_NULL);

! 				INSTR_TIME_SET_CURRENT(sync_start);

  				if (seg != NULL &&
  					FileSync(seg->mdfd_vfd) >= 0)
  				{
! 					INSTR_TIME_SET_CURRENT(sync_end);
! 					sync_diff = sync_end;
! 					INSTR_TIME_SUBTRACT(sync_diff, sync_start);
! 					elapsed = INSTR_TIME_GET_MICROSEC(sync_diff);
! 					if (elapsed > longest)
! 						longest = elapsed;
! 					total_elapsed += elapsed;
! 					processed++;
! 					if (log_checkpoints)
  						elog(DEBUG1, "checkpoint sync: number=%d file=%s time=%.3f msec",
  							 processed, FilePathName(seg->mdfd_vfd), (double) elapsed / 1000);

  					break;		/* success; break out of retry loop */
  				}
diff src/backend/utils/adt/pgstatfuncs.c
index 68b2527..24e147f
*** a/src/backend/utils/adt/pgstatfuncs.c
--- b/src/backend/utils/adt/pgstatfuncs.c
*************** extern Datum pg_stat_get_bgwriter_buf_wr
*** 89,94 ****
--- 89,97 ----
  extern Datum pg_stat_get_bgwriter_buf_written_clean(PG_FUNCTION_ARGS);
  extern Datum pg_stat_get_bgwriter_maxwritten_clean(PG_FUNCTION_ARGS);
  extern Datum pg_stat_get_bgwriter_stat_reset_time(PG_FUNCTION_ARGS);
+ extern Datum pg_stat_get_bgwriter_write_time(PG_FUNCTION_ARGS);
+ extern Datum pg_stat_get_bgwriter_sync_time(PG_FUNCTION_ARGS);
+ extern Datum pg_stat_get_bgwriter_sync_files(PG_FUNCTION_ARGS);
  extern Datum pg_stat_get_buf_written_backend(PG_FUNCTION_ARGS);
  extern Datum pg_stat_get_buf_fsync_backend(PG_FUNCTION_ARGS);
  extern Datum pg_stat_get_buf_alloc(PG_FUNCTION_ARGS);
*************** pg_stat_get_bgwriter_maxwritten_clean(PG
*** 1388,1393 ****
--- 1391,1414 ----
  }

  Datum
+ pg_stat_get_bgwriter_write_time(PG_FUNCTION_ARGS)
+ {
+ 	PG_RETURN_INT64(pgstat_fetch_global()->bgwriter_write);
+ }
+
+ Datum
+ pg_stat_get_bgwriter_sync_time(PG_FUNCTION_ARGS)
+ {
+ 	PG_RETURN_INT64(pgstat_fetch_global()->bgwriter_sync);
+ }
+
+ Datum
+ pg_stat_get_bgwriter_sync_files(PG_FUNCTION_ARGS)
+ {
+ 	PG_RETURN_INT64(pgstat_fetch_global()->sync_files);
+ }
+
+ Datum
  pg_stat_get_bgwriter_stat_reset_time(PG_FUNCTION_ARGS)
  {
  	PG_RETURN_TIMESTAMPTZ(pgstat_fetch_global()->stat_reset_timestamp);
diff src/include/catalog/pg_proc.h
index 49b0754..51b83ac
*** a/src/include/catalog/pg_proc.h
--- b/src/include/catalog/pg_proc.h
*************** DATA(insert OID = 2773 ( pg_stat_get_bgw
*** 2674,2679 ****
--- 2674,2685 ----
  DESCR("statistics: number of times the bgwriter stopped processing when it had written too many buffers while cleaning");
  DATA(insert OID = 3075 ( pg_stat_get_bgwriter_stat_reset_time PGNSP PGUID 12 1 0 0 0 f f f f t f s 0 0 1184 "" _null_ _null_ _null_ _null_	pg_stat_get_bgwriter_stat_reset_time _null_ _null_ _null_ ));
  DESCR("statistics: last reset for the bgwriter");
+ DATA(insert OID = 3160 ( pg_stat_get_bgwriter_write_time PGNSP PGUID 12 1 0 0 0 f f f f t f s 0 0 20 "" _null_ _null_ _null_ _null_ pg_stat_get_bgwriter_write_time _null_ _null_ _null_ ));
+ DESCR("statistics: total amount of time the bgwriter has spent writing buffers to disk");
+ DATA(insert OID = 3161 ( pg_stat_get_bgwriter_sync_time PGNSP PGUID 12 1 0 0 0 f f f f t f s 0 0 20 "" _null_ _null_ _null_ _null_ pg_stat_get_bgwriter_sync_time _null_ _null_ _null_ ));
+ DESCR("statistics: total amount of time the bgwriter has spent syncing buffers to disk");
+ DATA(insert OID = 3166 ( pg_stat_get_bgwriter_sync_files PGNSP PGUID 12 1 0 0 0 f f f f t f s 0 0 20 "" _null_ _null_ _null_ _null_ pg_stat_get_bgwriter_sync_files _null_ _null_ _null_ ));
+ DESCR("statistics: total number of files the bgwriter has synced to disk");
  DATA(insert OID = 2775 ( pg_stat_get_buf_written_backend PGNSP PGUID 12 1 0 0 0 f f f f t f s 0 0 20 "" _null_ _null_ _null_ _null_ pg_stat_get_buf_written_backend _null_ _null_ _null_ ));
  DESCR("statistics: number of buffers written by backends");
  DATA(insert OID = 3063 ( pg_stat_get_buf_fsync_backend PGNSP PGUID 12 1 0 0 0 f f f f t f s 0 0 20 "" _null_ _null_ _null_ _null_ pg_stat_get_buf_fsync_backend _null_ _null_ _null_ ));
diff src/include/pgstat.h
index 1281bd8..ec1aa8c
*** a/src/include/pgstat.h
--- b/src/include/pgstat.h
*************** typedef struct PgStat_MsgBgWriter
*** 364,369 ****
--- 364,372 ----
  	PgStat_Counter m_buf_written_backend;
  	PgStat_Counter m_buf_fsync_backend;
  	PgStat_Counter m_buf_alloc;
+ 	PgStat_Counter m_write_time;
+ 	PgStat_Counter m_sync_time;
+ 	PgStat_Counter m_sync_files;
  } PgStat_MsgBgWriter;

  /* ----------
*************** typedef struct PgStat_GlobalStats
*** 614,619 ****
--- 617,626 ----
  	PgStat_Counter buf_written_backend;
  	PgStat_Counter buf_fsync_backend;
  	PgStat_Counter buf_alloc;
+ 	PgStat_Counter bgwriter_write;	/* times in milliseconds */
+ 	PgStat_Counter bgwriter_sync;
+
+ 	PgStat_Counter sync_files;
  	TimestampTz stat_reset_timestamp;
  } PgStat_GlobalStats;

diff src/test/regress/expected/rules.out
index aaf0cca..219be7c
*** a/src/test/regress/expected/rules.out
--- b/src/test/regress/expected/rules.out
*************** SELECT viewname, definition FROM pg_view
*** 1295,1301 ****
   pg_stat_activity                | SELECT s.datid, d.datname, s.pid, s.usesysid, u.rolname AS usename, s.application_name, s.client_addr, s.client_hostname, s.client_port, s.backend_start, s.xact_start, s.query_start, s.state_change, s.waiting, s.state, s.query FROM pg_database d, pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, waiting, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port), pg_authid u WHERE ((s.datid = d.oid) AND (s.usesysid = u.oid));
   pg_stat_all_indexes             | SELECT c.oid AS relid, i.oid AS indexrelid, n.nspname AS schemaname, c.relname, i.relname AS indexrelname, pg_stat_get_numscans(i.oid) AS idx_scan, pg_stat_get_tuples_returned(i.oid) AS idx_tup_read, pg_stat_get_tuples_fetched(i.oid) AS idx_tup_fetch FROM (((pg_class c JOIN pg_index x ON ((c.oid = x.indrelid))) JOIN pg_class i ON ((i.oid = x.indexrelid))) LEFT JOIN pg_namespace n ON ((n.oid = c.relnamespace))) WHERE (c.relkind = ANY (ARRAY['r'::"char", 't'::"char"]));
   pg_stat_all_tables              | SELECT c.oid AS relid, n.nspname AS schemaname, c.relname, pg_stat_get_numscans(c.oid) AS seq_scan, pg_stat_get_tuples_returned(c.oid) AS seq_tup_read, (sum(pg_stat_get_numscans(i.indexrelid)))::bigint AS idx_scan, ((sum(pg_stat_get_tuples_fetched(i.indexrelid)))::bigint + pg_stat_get_tuples_fetched(c.oid)) AS idx_tup_fetch, pg_stat_get_tuples_inserted(c.oid) AS n_tup_ins, pg_stat_get_tuples_updated(c.oid) AS n_tup_upd, pg_stat_get_tuples_deleted(c.oid) AS n_tup_del, pg_stat_get_tuples_hot_updated(c.oid) AS n_tup_hot_upd, pg_stat_get_live_tuples(c.oid) AS n_live_tup, pg_stat_get_dead_tuples(c.oid) AS n_dead_tup, pg_stat_get_last_vacuum_time(c.oid) AS last_vacuum, pg_stat_get_last_autovacuum_time(c.oid) AS last_autovacuum, pg_stat_get_last_analyze_time(c.oid) AS last_analyze, pg_stat_get_last_autoanalyze_time(c.oid) AS last_autoanalyze, pg_stat_get_vacuum_count(c.oid) AS vacuum_count, pg_stat_get_autovacuum_count(c.oid) AS autovacuum_count, pg_stat_get_analyze_count(c.oid) AS analyze_count, pg_stat_get_autoanalyze_count(c.oid) AS autoanalyze_count FROM ((pg_class c LEFT JOIN pg_index i ON ((c.oid = i.indrelid))) LEFT JOIN pg_namespace n ON ((n.oid = c.relnamespace))) WHERE (c.relkind = ANY (ARRAY['r'::"char", 't'::"char"])) GROUP BY c.oid, n.nspname, c.relname;
!  pg_stat_bgwriter                | SELECT pg_stat_get_bgwriter_timed_checkpoints() AS checkpoints_timed, pg_stat_get_bgwriter_requested_checkpoints() AS checkpoints_req, pg_stat_get_bgwriter_buf_written_checkpoints() AS buffers_checkpoint, pg_stat_get_bgwriter_buf_written_clean() AS buffers_clean, pg_stat_get_bgwriter_maxwritten_clean() AS maxwritten_clean, pg_stat_get_buf_written_backend() AS buffers_backend, pg_stat_get_buf_fsync_backend() AS buffers_backend_fsync, pg_stat_get_buf_alloc() AS buffers_alloc, pg_stat_get_bgwriter_stat_reset_time() AS stats_reset;
   pg_stat_database                | SELECT d.oid AS datid, d.datname, pg_stat_get_db_numbackends(d.oid) AS numbackends, pg_stat_get_db_xact_commit(d.oid) AS xact_commit, pg_stat_get_db_xact_rollback(d.oid) AS xact_rollback, (pg_stat_get_db_blocks_fetched(d.oid) - pg_stat_get_db_blocks_hit(d.oid)) AS blks_read, pg_stat_get_db_blocks_hit(d.oid) AS blks_hit, pg_stat_get_db_tuples_returned(d.oid) AS tup_returned, pg_stat_get_db_tuples_fetched(d.oid) AS tup_fetched, pg_stat_get_db_tuples_inserted(d.oid) AS tup_inserted, pg_stat_get_db_tuples_updated(d.oid) AS tup_updated, pg_stat_get_db_tuples_deleted(d.oid) AS tup_deleted, pg_stat_get_db_conflict_all(d.oid) AS conflicts, pg_stat_get_db_temp_files(d.oid) AS temp_files, pg_stat_get_db_temp_bytes(d.oid) AS temp_bytes, pg_stat_get_db_deadlocks(d.oid) AS deadlocks, pg_stat_get_db_stat_reset_time(d.oid) AS stats_reset FROM pg_database d;
   pg_stat_database_conflicts      | SELECT d.oid AS datid, d.datname, pg_stat_get_db_conflict_tablespace(d.oid) AS confl_tablespace, pg_stat_get_db_conflict_lock(d.oid) AS confl_lock, pg_stat_get_db_conflict_snapshot(d.oid) AS confl_snapshot, pg_stat_get_db_conflict_bufferpin(d.oid) AS confl_bufferpin, pg_stat_get_db_conflict_startup_deadlock(d.oid) AS confl_deadlock FROM pg_database d;
   pg_stat_replication             | SELECT s.pid, s.usesysid, u.rolname AS usename, s.application_name, s.client_addr, s.client_hostname, s.client_port, s.backend_start, w.state, w.sent_location, w.write_location, w.flush_location, w.replay_location, w.sync_priority, w.sync_state FROM pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, waiting, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port), pg_authid u, pg_stat_get_wal_senders() w(pid, state, sent_location, write_location, flush_location, replay_location, sync_priority, sync_state) WHERE ((s.usesysid = u.oid) AND (s.pid = w.pid));
--- 1295,1301 ----
   pg_stat_activity                | SELECT s.datid, d.datname, s.pid, s.usesysid, u.rolname AS usename, s.application_name, s.client_addr, s.client_hostname, s.client_port, s.backend_start, s.xact_start, s.query_start, s.state_change, s.waiting, s.state, s.query FROM pg_database d, pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, waiting, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port), pg_authid u WHERE ((s.datid = d.oid) AND (s.usesysid = u.oid));
   pg_stat_all_indexes             | SELECT c.oid AS relid, i.oid AS indexrelid, n.nspname AS schemaname, c.relname, i.relname AS indexrelname, pg_stat_get_numscans(i.oid) AS idx_scan, pg_stat_get_tuples_returned(i.oid) AS idx_tup_read, pg_stat_get_tuples_fetched(i.oid) AS idx_tup_fetch FROM (((pg_class c JOIN pg_index x ON ((c.oid = x.indrelid))) JOIN pg_class i ON ((i.oid = x.indexrelid))) LEFT JOIN pg_namespace n ON ((n.oid = c.relnamespace))) WHERE (c.relkind = ANY (ARRAY['r'::"char", 't'::"char"]));
   pg_stat_all_tables              | SELECT c.oid AS relid, n.nspname AS schemaname, c.relname, pg_stat_get_numscans(c.oid) AS seq_scan, pg_stat_get_tuples_returned(c.oid) AS seq_tup_read, (sum(pg_stat_get_numscans(i.indexrelid)))::bigint AS idx_scan, ((sum(pg_stat_get_tuples_fetched(i.indexrelid)))::bigint + pg_stat_get_tuples_fetched(c.oid)) AS idx_tup_fetch, pg_stat_get_tuples_inserted(c.oid) AS n_tup_ins, pg_stat_get_tuples_updated(c.oid) AS n_tup_upd, pg_stat_get_tuples_deleted(c.oid) AS n_tup_del, pg_stat_get_tuples_hot_updated(c.oid) AS n_tup_hot_upd, pg_stat_get_live_tuples(c.oid) AS n_live_tup, pg_stat_get_dead_tuples(c.oid) AS n_dead_tup, pg_stat_get_last_vacuum_time(c.oid) AS last_vacuum, pg_stat_get_last_autovacuum_time(c.oid) AS last_autovacuum, pg_stat_get_last_analyze_time(c.oid) AS last_analyze, pg_stat_get_last_autoanalyze_time(c.oid) AS last_autoanalyze, pg_stat_get_vacuum_count(c.oid) AS vacuum_count, pg_stat_get_autovacuum_count(c.oid) AS autovacuum_count, pg_stat_get_analyze_count(c.oid) AS analyze_count, pg_stat_get_autoanalyze_count(c.oid) AS autoanalyze_count FROM ((pg_class c LEFT JOIN pg_index i ON ((c.oid = i.indrelid))) LEFT JOIN pg_namespace n ON ((n.oid = c.relnamespace))) WHERE (c.relkind = ANY (ARRAY['r'::"char", 't'::"char"])) GROUP BY c.oid, n.nspname, c.relname;
!  pg_stat_bgwriter                | SELECT pg_stat_get_bgwriter_timed_checkpoints() AS checkpoints_timed, pg_stat_get_bgwriter_requested_checkpoints() AS checkpoints_req, pg_stat_get_bgwriter_buf_written_checkpoints() AS buffers_checkpoint, pg_stat_get_bgwriter_buf_written_clean() AS buffers_clean, pg_stat_get_bgwriter_maxwritten_clean() AS maxwritten_clean, pg_stat_get_buf_written_backend() AS buffers_backend, pg_stat_get_buf_fsync_backend() AS buffers_backend_fsync, pg_stat_get_buf_alloc() AS buffers_alloc, pg_stat_get_bgwriter_stat_reset_time() AS stats_reset, pg_stat_get_bgwriter_write_time() AS write_time, pg_stat_get_bgwriter_sync_time() AS sync_time, pg_stat_get_bgwriter_sync_files() AS sync_files;
   pg_stat_database                | SELECT d.oid AS datid, d.datname, pg_stat_get_db_numbackends(d.oid) AS numbackends, pg_stat_get_db_xact_commit(d.oid) AS xact_commit, pg_stat_get_db_xact_rollback(d.oid) AS xact_rollback, (pg_stat_get_db_blocks_fetched(d.oid) - pg_stat_get_db_blocks_hit(d.oid)) AS blks_read, pg_stat_get_db_blocks_hit(d.oid) AS blks_hit, pg_stat_get_db_tuples_returned(d.oid) AS tup_returned, pg_stat_get_db_tuples_fetched(d.oid) AS tup_fetched, pg_stat_get_db_tuples_inserted(d.oid) AS tup_inserted, pg_stat_get_db_tuples_updated(d.oid) AS tup_updated, pg_stat_get_db_tuples_deleted(d.oid) AS tup_deleted, pg_stat_get_db_conflict_all(d.oid) AS conflicts, pg_stat_get_db_temp_files(d.oid) AS temp_files, pg_stat_get_db_temp_bytes(d.oid) AS temp_bytes, pg_stat_get_db_deadlocks(d.oid) AS deadlocks, pg_stat_get_db_stat_reset_time(d.oid) AS stats_reset FROM pg_database d;
   pg_stat_database_conflicts      | SELECT d.oid AS datid, d.datname, pg_stat_get_db_conflict_tablespace(d.oid) AS confl_tablespace, pg_stat_get_db_conflict_lock(d.oid) AS confl_lock, pg_stat_get_db_conflict_snapshot(d.oid) AS confl_snapshot, pg_stat_get_db_conflict_bufferpin(d.oid) AS confl_bufferpin, pg_stat_get_db_conflict_startup_deadlock(d.oid) AS confl_deadlock FROM pg_database d;
   pg_stat_replication             | SELECT s.pid, s.usesysid, u.rolname AS usename, s.application_name, s.client_addr, s.client_hostname, s.client_port, s.backend_start, w.state, w.sent_location, w.write_location, w.flush_location, w.replay_location, w.sync_priority, w.sync_state FROM pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, waiting, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port), pg_authid u, pg_stat_get_wal_senders() w(pid, state, sent_location, write_location, flush_location, replay_location, sync_priority, sync_state) WHERE ((s.usesysid = u.oid) AND (s.pid = w.pid));
#19Jeff Janes
jeff.janes@gmail.com
In reply to: Robert Haas (#15)
Re: Publish checkpoint timing and sync files summary data to pg_stat_bgwriter

On Wed, Mar 28, 2012 at 7:23 AM, Robert Haas <robertmhaas@gmail.com> wrote:

Although I liked the idea of separating this out, I wasn't thinking we
should do it as part of this patch.  It seems like an independent
project.  At any rate, I strongly agree that counting the number of
strategy allocations is not really a viable proxy for counting the
number of backend writes.  You can't know how many of those actually
got dirtied.

I was thinking that it would be part of this patch, but only because
Greg mentioned the view format stabilizing. If it is going to be
frozen, I wanted this in it. But I guess stabilize and fossilize are
different things...

Since any backend write is necessarily the result of that backend
trying to allocate a buffer, I think maybe we should just count
whether the number of times it was trying to allocate a buffer *using
a BAS* vs. the number of times it was trying to allocate a buffer *not
using a BAS*.  That is, decide whether or not it's a "strategy write"
not based on whether the buffer came in via a strategy allocation, but
rather based on whether it's going out as a result of a strategy
allocation.

Yes, exactly.

Cheers,

Jeff

#20Robert Haas
robertmhaas@gmail.com
In reply to: Peter Geoghegan (#18)
Re: Publish checkpoint timing and sync files summary data to pg_stat_bgwriter

On Thu, Apr 5, 2012 at 9:05 AM, Peter Geoghegan <peter@2ndquadrant.com> wrote:

On 3 April 2012 12:11, Robert Haas <robertmhaas@gmail.com> wrote:

Well, for 9.2, I am asking that you rip all that stuff out of the
patch altogether so we can focus on the stuff that was in the original
patch.

Given how we're pushed for time, I'm inclined to agree that that is
the best course of action.

Attached revision does not attempt to do anything with strategy
writes/allocations.

I have committed this after extensive revisions. I removed the
sync_files count, since no one ever explained what that was good for.
I renamed a bunch of things so that it was clear that these stats were
referring to checkpoints rather than anything else. I moved the
documentation to what I believe to be the correct place. I fixed a
few other assorted things, too, and reverted a couple hunks that
didn't seem to me to be adding anything.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company