Displaying accumulated autovacuum cost

Started by Greg Smithover 14 years ago44 messages

greg@2ndQuadrant.com

over 14 years ago

2 attachment(s)

Attached is a patch that tracks and displays the accumulated cost when
autovacuum is running. Code by Noah Misch and myself. I hope this idea
will bring a formal process to vacuum tuning, which is currently too
hard to do. I was about to add "without..." to that, but I then
realized it needs no disclaimer; it's just too hard, period. Vacuum
issues are enemy #1 at all the terabyte scale customer sites I've been
fighting with lately.

The patch updates the command string just before the workers sleep to
show how much work they've done so far. And at the end, it adds a few
new lines to the information written to the logs, when the autovacuum is
notable enough to be logged at all. The overhead it adds is at most a
few integer operations per buffer processed and a slower title string
update once per sleep. It's trivial compared to both the vacuum itself,
and to the instrumentation's value to sites with vacuum issues.

To demonstrate the patch in action, here's a test case using a 6.4GB
pgbench_accounts table:

$ createdb pgbench
$ pgbench -i -s 500 pgbench
$ psql -d pgbench -c "select pg_relation_size('pgbench_accounts');"
pg_relation_size
------------------
6714761216
$ psql -d pgbench -c "select relname,relpages from pg_class where
relname='pgbench_accounts';"
relname | relpages
------------------+----------
pgbench_accounts | 819673
$psql -d pgbench -c "delete from pgbench_accounts where aid<20000000"

You can see the new information in the command string with ps and grep:

$ while [ 1 ] ; do (ps -eaf | grep "[a]utovacuum worker" && sleep 60) ; done
gsmith 2687 17718 0 15:44 ? 00:00:00 postgres: autovacuum
worker process h=19 m=14196 d=14185
...
gsmith 2687 17718 0 15:44 ? 00:00:09 postgres: autovacuum
worker process h=182701 m=301515 d=321345
...
gsmith 2687 17718 1 15:44 ? 00:00:23 postgres: autovacuum
worker process h=740359 m=679987 d=617559
...

That's accumulated hit/miss/dirty counts, the raw numbers. When the
autovacuum is finished, those totals appear as a new line in the log entry:

LOG: automatic vacuum of table "pgbench.public.pgbench_accounts": index
scans: 1
pages: 0 removed, 819673 remain
tuples: 19999999 removed, 30000022 remain
buffer usage: 809537 hits, 749340 misses, 686660 dirtied
system usage: CPU 5.70s/19.73u sec elapsed 2211.60 sec

To check if this makes sense, we need the relevant parameters at the
time, which were the defaults (I only tweaked some basic config bits
here, including shared_buffers=400MB so a bit more was cached):

vacuum_cost_page_hit = 1 # 0-10000 credits
vacuum_cost_page_miss = 10 # 0-10000 credits
vacuum_cost_page_dirty = 20 # 0-10000 credits
vacuum_cost_limit = 200 # 1-10000 credits
autovacuum_vacuum_cost_delay = 20ms

Every 20ms equals 50 times/second. That means the cost accumulation
should be 200 * 50 = 10000 / second, or 600K/minute. That's how fast
the cost should be increasing here. Given a runtime of 2211.60 seconds,
that's a total estimated cost of 2209.15 * 10000 = 22,091,500. Now we
check that against the totals printed at the end of the vacuum:

1 * 809537 hits=809,537
10 * 749340 misses=7,493,400
20 * 686607 dirtied=13,732,140

And that gives a directly computed total of 22,035,077. Close enough to
show this is working as expected. And how I computed all that should
give you an idea how you might use these numbers to extract other useful
statistics, if you'd like to tune the balance of various cost_page_*
parameters as one example. I have no idea how anyone could ever set
those relative to one another without this data, it would take epic
guessing skills.

What else can do you do with this data?

-Figure out if the VACUUM is still making progress when it appears stuck
-Estimate how long it will take to finish, based on current progress and
whatever total cost was logged last time VACUUM ran against this relation.
-Compute approximate hit rate on the read side. OS caching issues and
the ring buffer are obviously a problem with that, this isn't too valuable.
-Can see the cost split when multiple vacuums are running. This problem
is why sites can't just use "total time to vacuum" as a useful proxy to
estimate how long one will take to run.
-Easy to track the read/write ratio
-Directly measure the write rate

That last one is I think the part people are most perplexed by right
now, and this makes it trivial. How do you turn all these cost figures
into real-world read/write rates? It's been hard to do.

Now, you can take a bunch of samples of the data at 1 minute intervals,
like my little "ps | grep" example above does. The delta in the
"dirty=" column is how much was written per minute, in units of 8K
(usually) buffers. Multiply that by 8192/(60*1024*1024), and you get
MB/s out of there. I collected that data for a cleanup run of the
pgbench_accounts damage done above, CSV file with all the statistics is
attached.

I also collected OS level stats from Linux about the actual read/write
rate of the process, converted into "Write Mbps" (those are actually in
MB/s, sloppy capitalization is via OpenOffice "autocorrect"). Those
numbers are close enough to make me confident the dirty buffer totals
tracked here do turn into useful MB/s values. Sample of the most
interesting part:

Cost Delta Dirty Mbps Write Mbps
589,890 2.56 2.73
591,151 2.57 2.73
589,035 2.56 2.72
593,775 3.14 0.20
599,420 2.05 0.00
598,503 2.05 0.00
599,421 2.05 0.00
574,046 0.60 0.01
574,779 0.64 0.67
609,140 2.56 2.68
612,397 2.57 2.69
611,744 2.57 2.69
610,008 2.56 2.68

This shows the expected 600K/minute cost accumulation. And using the
dirty= numbers to compute MB/s of write speed closely matches the total
write speed of this process. Some of the difference might be I/O to
other things besides the main table here, some of it is just because OS
write caching will influence the write rate. In the spots where the OS
value and what's derived from the dirty rate diverge most, it appears to
be because vacuum is filling Linux's write cache. Actual writes
accumulated against the process them block for a while. It's a small
difference most of the time.

I'd be willing to accept a "Dirty MB/s" figure computed this way as
accurate enough for most purposes. And this patch lets you get that
data, currently unavailable without poking into the OS statistics (if at
all), just by doing a little log file and/or command string scraping.
Total at the end or real-time monitoring, based on how much work you
want to put into it. For a busy site where one or more autovacuum
processes are running most of the time, being able to monitor the vacuum
portion of the I/O this way will be a huge improvement over the current
state of things. I already have a stack of tools built on top of this
data I'm working on, and they're making it much easier to come up with
an iterative tuning process for autovacuum.

--
Greg Smith 2ndQuadrant US greg@2ndQuadrant.com Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us

Attachments:

pgbench-vacuum-stats.csvtext/csv; name=pgbench-vacuum-stats.csvDownload

vacuum_stats_v2.patchtext/x-patch; name=vacuum_stats_v2.patchDownload

diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index 889737e..cc2d148 100644
--- a/src/backend/commands/vacuum.c
+++ b/src/backend/commands/vacuum.c
@@ -43,6 +43,7 @@
 #include "utils/fmgroids.h"
 #include "utils/guc.h"
 #include "utils/memutils.h"
+#include "utils/ps_status.h"
 #include "utils/snapmgr.h"
 #include "utils/syscache.h"
 #include "utils/tqual.h"
@@ -214,6 +215,9 @@ vacuum(VacuumStmt *vacstmt, Oid relid, bool do_toast,
 
 		VacuumCostActive = (VacuumCostDelay > 0);
 		VacuumCostBalance = 0;
+		VacuumPageHit = 0;
+		VacuumPageMiss = 0;
+		VacuumPageDirty = 0;
 
 		/*
 		 * Loop to process each selected relation.
@@ -1155,11 +1159,17 @@ vacuum_delay_point(void)
 		VacuumCostBalance >= VacuumCostLimit)
 	{
 		int			msec;
+		StringInfoData ps;
 
 		msec = VacuumCostDelay * VacuumCostBalance / VacuumCostLimit;
 		if (msec > VacuumCostDelay * 4)
 			msec = VacuumCostDelay * 4;
 
+		initStringInfo(&ps);
+		appendStringInfo(&ps, "h=%d m=%d d=%d",
+						 VacuumPageHit, VacuumPageMiss, VacuumPageDirty);
+		set_ps_display(ps.data, false);
+
 		pg_usleep(msec * 1000L);
 
 		VacuumCostBalance = 0;
diff --git a/src/backend/commands/vacuumlazy.c b/src/backend/commands/vacuumlazy.c
index c5bf32e..a6094fe 100644
--- a/src/backend/commands/vacuumlazy.c
+++ b/src/backend/commands/vacuumlazy.c
@@ -226,20 +226,42 @@ lazy_vacuum_rel(Relation onerel, VacuumStmt *vacstmt,
 		if (Log_autovacuum_min_duration == 0 ||
 			TimestampDifferenceExceeds(starttime, GetCurrentTimestamp(),
 									   Log_autovacuum_min_duration))
-			ereport(LOG,
-					(errmsg("automatic vacuum of table \"%s.%s.%s\": index scans: %d\n"
-							"pages: %d removed, %d remain\n"
-							"tuples: %.0f removed, %.0f remain\n"
-							"system usage: %s",
-							get_database_name(MyDatabaseId),
-							get_namespace_name(RelationGetNamespace(onerel)),
-							RelationGetRelationName(onerel),
-							vacrelstats->num_index_scans,
-							vacrelstats->pages_removed,
-							vacrelstats->rel_pages,
-							vacrelstats->tuples_deleted,
-							vacrelstats->new_rel_tuples,
-							pg_rusage_show(&ru0))));
+		{
+			if (VacuumCostActive)
+				ereport(LOG,
+						(errmsg("automatic vacuum of table \"%s.%s.%s\": index scans: %d\n"
+								"pages: %d removed, %d remain\n"
+								"tuples: %.0f removed, %.0f remain\n"
+								"buffer usage: %d hits, %d misses, %d dirtied\n"
+								"system usage: %s",
+								get_database_name(MyDatabaseId),
+								get_namespace_name(RelationGetNamespace(onerel)),
+								RelationGetRelationName(onerel),
+								vacrelstats->num_index_scans,
+								vacrelstats->pages_removed,
+								vacrelstats->rel_pages,
+								vacrelstats->tuples_deleted,
+								vacrelstats->new_rel_tuples,
+								VacuumPageHit,
+								VacuumPageMiss,
+								VacuumPageDirty,
+								pg_rusage_show(&ru0))));
+			else
+				ereport(LOG,
+						(errmsg("automatic vacuum of table \"%s.%s.%s\": index scans: %d\n"
+								"pages: %d removed, %d remain\n"
+								"tuples: %.0f removed, %.0f remain\n"
+								"system usage: %s",
+								get_database_name(MyDatabaseId),
+								get_namespace_name(RelationGetNamespace(onerel)),
+								RelationGetRelationName(onerel),
+								vacrelstats->num_index_scans,
+								vacrelstats->pages_removed,
+								vacrelstats->rel_pages,
+								vacrelstats->tuples_deleted,
+								vacrelstats->new_rel_tuples,
+								pg_rusage_show(&ru0))));
+		}
 	}
 }
 
diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index 4c7cfb0..d62adbc 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -341,7 +341,10 @@ ReadBuffer_common(SMgrRelation smgr, char relpersistence, ForkNumber forkNum,
 			*hit = true;
 
 			if (VacuumCostActive)
+			{
+				VacuumPageHit++;
 				VacuumCostBalance += VacuumCostPageHit;
+			}
 
 			TRACE_POSTGRESQL_BUFFER_READ_DONE(forkNum, blockNum,
 											  smgr->smgr_rnode.node.spcNode,
@@ -471,7 +474,10 @@ ReadBuffer_common(SMgrRelation smgr, char relpersistence, ForkNumber forkNum,
 	}
 
 	if (VacuumCostActive)
+	{
+		VacuumPageMiss++;
 		VacuumCostBalance += VacuumCostPageMiss;
+	}
 
 	TRACE_POSTGRESQL_BUFFER_READ_DONE(forkNum, blockNum,
 									  smgr->smgr_rnode.node.spcNode,
@@ -974,7 +980,10 @@ MarkBufferDirty(Buffer buffer)
 	 * If the buffer was not dirty already, do vacuum cost accounting.
 	 */
 	if (!(bufHdr->flags & BM_DIRTY) && VacuumCostActive)
+	{
+		VacuumPageDirty++;
 		VacuumCostBalance += VacuumCostPageDirty;
+	}
 
 	bufHdr->flags |= (BM_DIRTY | BM_JUST_DIRTIED);
 
@@ -2299,7 +2308,10 @@ SetBufferCommitInfoNeedsSave(Buffer buffer)
 		LockBufHdr(bufHdr);
 		Assert(bufHdr->refcount > 0);
 		if (!(bufHdr->flags & BM_DIRTY) && VacuumCostActive)
+		{
+			VacuumPageDirty++;
 			VacuumCostBalance += VacuumCostPageDirty;
+		}
 		bufHdr->flags |= (BM_DIRTY | BM_JUST_DIRTIED);
 		UnlockBufHdr(bufHdr);
 	}
diff --git a/src/backend/utils/init/globals.c b/src/backend/utils/init/globals.c
index c4c4154..9ce64e6 100644
--- a/src/backend/utils/init/globals.c
+++ b/src/backend/utils/init/globals.c
@@ -115,6 +115,10 @@ int			VacuumCostPageDirty = 20;
 int			VacuumCostLimit = 200;
 int			VacuumCostDelay = 0;
 
+int			VacuumPageHit = 0;
+int			VacuumPageMiss = 0;
+int			VacuumPageDirty = 0;
+
 int			VacuumCostBalance = 0;		/* working state for vacuum */
 bool		VacuumCostActive = false;
 
diff --git a/src/include/miscadmin.h b/src/include/miscadmin.h
index 9d19417..4ee08fe 100644
--- a/src/include/miscadmin.h
+++ b/src/include/miscadmin.h
@@ -230,6 +230,10 @@ extern int	VacuumCostPageDirty;
 extern int	VacuumCostLimit;
 extern int	VacuumCostDelay;
 
+extern int	VacuumPageHit;
+extern int	VacuumPageMiss;
+extern int	VacuumPageDirty;
+
 extern int	VacuumCostBalance;
 extern bool VacuumCostActive;

Euler Taveira de Oliveira

euler@timbira.com

over 14 years ago

In reply to: Greg Smith (#1)

Re: Displaying accumulated autovacuum cost

Em 17-08-2011 18:04, Greg Smith escreveu:

Attached is a patch that tracks and displays the accumulated cost when
autovacuum is running. Code by Noah Misch and myself. I hope this idea
will bring a formal process to vacuum tuning, which is currently too
hard to do. I was about to add "without..." to that, but I then realized
it needs no disclaimer; it's just too hard, period. Vacuum issues are
enemy #1 at all the terabyte scale customer sites I've been fighting
with lately.

Interesting patch. I drafted a similar idea but didn't have a chance to
publish it. It is a complement to the idea about autovacuum tuning [1]http://archives.postgresql.org/pgsql-hackers/2011-06/msg00678.php. Hope I
will have time to post something for the next CF. And, of course, I will
review this patch.

The patch updates the command string just before the workers sleep to
show how much work they've done so far. And at the end, it adds a few
new lines to the information written to the logs, when the autovacuum is
notable enough to be logged at all. The overhead it adds is at most a
few integer operations per buffer processed and a slower title string
update once per sleep. It's trivial compared to both the vacuum itself,
and to the instrumentation's value to sites with vacuum issues.

I don't like exposing this information only on title processes. It would be
difficult for client apps (for example, PGAdmin) to track this kind of
information and it is restricted to local access. I'm not objecting to display
this information in process title; I'm just saying that that information
should be exposed in functions (say pg_stat_get_vacuum_[hit|miss|dirty]) too.
I'm not sure about adding this information to incremental counters but that
would be useful to trace a vacuum work pattern.

[1]: http://archives.postgresql.org/pgsql-hackers/2011-06/msg00678.php

--
Euler Taveira de Oliveira - Timbira http://www.timbira.com.br/
PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento

Greg Smith

greg@2ndQuadrant.com

over 14 years ago

In reply to: Euler Taveira de Oliveira (#2)

Re: Displaying accumulated autovacuum cost

On 08/17/2011 07:42 PM, Euler Taveira de Oliveira wrote:

I don't like exposing this information only on title processes. It
would be difficult for client apps (for example, PGAdmin) to track
this kind of information and it is restricted to local access. I'm not
objecting to display this information in process title; I'm just
saying that that information should be exposed in functions (say
pg_stat_get_vacuum_[hit|miss|dirty]) too.

I tend to build the simplest possible thing that is useful enough to
work. The data is getting stored and shown now, where it wasn't
before. If it's possible to expose that in additional ways later too,
great. The big step up for this information is to go from
"unobtainable" to "obtainable". I'd prefer not to add a quest for
"easily obtainable" to the requirements until that big jump is made, for
fear it will cause nothing to get delivered.

--
Greg Smith 2ndQuadrant US greg@2ndQuadrant.com Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us

Magnus Hagander

magnus@hagander.net

over 14 years ago

In reply to: Greg Smith (#3)

Re: Displaying accumulated autovacuum cost

On Thu, Aug 18, 2011 at 03:23, Greg Smith <greg@2ndquadrant.com> wrote:

On 08/17/2011 07:42 PM, Euler Taveira de Oliveira wrote:

I don't like exposing this information only on title processes. It would
be difficult for client apps (for example, PGAdmin) to track this kind of
information and it is restricted to local access. I'm not objecting to
display this information in process title; I'm just saying that that
information should be exposed in functions (say
pg_stat_get_vacuum_[hit|miss|dirty]) too.

I tend to build the simplest possible thing that is useful enough to work.
The data is getting stored and shown now, where it wasn't before. If it's
possible to expose that in additional ways later too, great. The big step
up for this information is to go from "unobtainable" to "obtainable". I'd
prefer not to add a quest for "easily obtainable" to the requirements until
that big jump is made, for fear it will cause nothing to get delivered.

By only putting it in the ps display, you exclude all the users who
don't have an easy way to look at that information. The big group
there is Windows, but it's not necessarily easy on all other platforms
as well, afaik. And possibliy even more importantly, it makes it
impossible to view it from tools like pgadmin. I think it's definitely
worthwhile to add support to view it through the stats collector as
well from the beginnig. The question there is if it's enough to just
show it in the current_query (kind of like it's done in the ps
output), or if we want a completely separate view with this info.

Also, unrelated to that, wouldn't this information be interesting for
non-autovacuum queries as well?

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/

Dimitri Fontaine

dimitri@2ndQuadrant.fr

over 14 years ago

In reply to: Magnus Hagander (#4)

Re: Displaying accumulated autovacuum cost

Magnus Hagander <magnus@hagander.net> writes:

Also, unrelated to that, wouldn't this information be interesting for
non-autovacuum queries as well?

I was about to say that I would like to see it for normal queries too,
but I guess we already have it:

=> explain (analyze, buffers, costs off)
select * from pg_attribute a join pg_class c on a.attrelid = c.oid;
QUERY PLAN
--------------------------------------------------------------------------------
Hash Join (actual time=0.569..4.255 rows=2158 loops=1)
Hash Cond: (a.attrelid = c.oid)
Buffers: shared hit=48
-> Seq Scan on pg_attribute a (actual time=0.008..0.462 rows=2158 loops=1)
Buffers: shared hit=40
-> Hash (actual time=0.541..0.541 rows=282 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 54kB
Buffers: shared hit=8
-> Seq Scan on pg_class c (actual time=0.010..0.269 rows=282 loops=1)
Buffers: shared hit=8
Total runtime: 4.551 ms
(11 rows)

Also, from where I sit the ps title update for normal queries is about
useless, as I see loads of IDLE postgresql backends in top that are
consuming 20% and more CPU time. The refresh rate is way to low to be
useful, and having the title it updated more frequently would probably
consume enough CPU that it would defeat its purpose (going from
instrumenting to slowing down enough that you can see what's happening
is not where I'd want to go).

Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support

Euler Taveira de Oliveira

euler@timbira.com

over 14 years ago

In reply to: Magnus Hagander (#4)

Re: Displaying accumulated autovacuum cost

Em 18-08-2011 03:39, Magnus Hagander escreveu:

Also, unrelated to that, wouldn't this information be interesting for
non-autovacuum queries as well?

Yes, it would. AFAICS, the patch will display that message in process titles.
However, analyze code also uses the vacuum_delay_point(). How do you handle it?

It would be another patch... autovacuum has an option to display summarized
information but vacuum don't. Isn't it time to be symmetrical here?

--
Euler Taveira de Oliveira - Timbira http://www.timbira.com.br/
PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento

Tom Lane

tgl@sss.pgh.pa.us

over 14 years ago

In reply to: Magnus Hagander (#4)

Re: Displaying accumulated autovacuum cost

Magnus Hagander <magnus@hagander.net> writes:

On Thu, Aug 18, 2011 at 03:23, Greg Smith <greg@2ndquadrant.com> wrote:

On 08/17/2011 07:42 PM, Euler Taveira de Oliveira wrote:

I don't like exposing this information only on title processes.

I tend to build the simplest possible thing that is useful enough to work.

By only putting it in the ps display, you exclude all the users who
don't have an easy way to look at that information. The big group
there is Windows, but it's not necessarily easy on all other platforms
as well, afaik.

Yeah. Also, process title updates are friggin expensive on some
platforms --- so much so, that we have a GUC setting to disable them.
So I don't think we should use that technique at all. Put the info
into the stats collector instead (not "also").

regards, tom lane

Robert Haas

robertmhaas@gmail.com

over 14 years ago

In reply to: Greg Smith (#3)

Re: Displaying accumulated autovacuum cost

On Wed, Aug 17, 2011 at 9:23 PM, Greg Smith <greg@2ndquadrant.com> wrote:

On 08/17/2011 07:42 PM, Euler Taveira de Oliveira wrote:

I don't like exposing this information only on title processes. It would
be difficult for client apps (for example, PGAdmin) to track this kind of
information and it is restricted to local access. I'm not objecting to
display this information in process title; I'm just saying that that
information should be exposed in functions (say
pg_stat_get_vacuum_[hit|miss|dirty]) too.

I tend to build the simplest possible thing that is useful enough to work.
The data is getting stored and shown now, where it wasn't before. If it's
possible to expose that in additional ways later too, great. The big step
up for this information is to go from "unobtainable" to "obtainable". I'd
prefer not to add a quest for "easily obtainable" to the requirements until
that big jump is made, for fear it will cause nothing to get delivered.

Perhaps a reasonable way to break up the patch would be:

- Part 1: Gather the information and display it in the
log_autovacuum_min_duration output.
- Part 2: Add the ability to see the information incrementally (via
some mechanism yet to be agreed upon).

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Peter Eisentraut

peter_e@gmx.net

over 14 years ago

In reply to: Magnus Hagander (#4)

Re: Displaying accumulated autovacuum cost

On tor, 2011-08-18 at 08:39 +0200, Magnus Hagander wrote:

Also, unrelated to that, wouldn't this information be interesting for
non-autovacuum queries as well?

Last year we were discussing some details on progress reporting, and
some people suggested that instead of printing a single percentage, we
should let each type of activity print out whatever metrics it has that
would allow an experienced DBA to track the progress. Effectively, this
is what this patch is trying to do.

So how about adding a column to pg_stat_activity, progress_metrics or
something like that, and add that information there.

#10

Robert Haas

robertmhaas@gmail.com

over 14 years ago

In reply to: Peter Eisentraut (#9)

Re: Displaying accumulated autovacuum cost

On Thu, Aug 18, 2011 at 10:54 AM, Peter Eisentraut <peter_e@gmx.net> wrote:

On tor, 2011-08-18 at 08:39 +0200, Magnus Hagander wrote:

Also, unrelated to that, wouldn't this information be interesting for
non-autovacuum queries as well?

Last year we were discussing some details on progress reporting, and
some people suggested that instead of printing a single percentage, we
should let each type of activity print out whatever metrics it has that
would allow an experienced DBA to track the progress. Effectively, this
is what this patch is trying to do.

So how about adding a column to pg_stat_activity, progress_metrics or
something like that, and add that information there.

That might be a good way to go. I don't think we want something like
pg_stat_all_tables for this, because it seems that Greg's use case is
to be able to see how a *particular* autovacuum process is doing
*while it's running*, not to look at aggregate statistics over time.
Putting it in pg_stat_activity would be good for that.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#11

Magnus Hagander

magnus@hagander.net

over 14 years ago

In reply to: Robert Haas (#10)

Re: Displaying accumulated autovacuum cost

On Thu, Aug 18, 2011 at 17:13, Robert Haas <robertmhaas@gmail.com> wrote:

On Thu, Aug 18, 2011 at 10:54 AM, Peter Eisentraut <peter_e@gmx.net> wrote:

On tor, 2011-08-18 at 08:39 +0200, Magnus Hagander wrote:

Also, unrelated to that, wouldn't this information be interesting for
non-autovacuum queries as well?

Last year we were discussing some details on progress reporting, and
some people suggested that instead of printing a single percentage, we
should let each type of activity print out whatever metrics it has that
would allow an experienced DBA to track the progress. Effectively, this
is what this patch is trying to do.

So how about adding a column to pg_stat_activity, progress_metrics or
something like that, and add that information there.

That might be a good way to go. I don't think we want something like
pg_stat_all_tables for this, because it seems that Greg's use case is
to be able to see how a *particular* autovacuum process is doing
*while it's running*, not to look at aggregate statistics over time.
Putting it in pg_stat_activity would be good for that.

It's also good to have it broken down into multiple columns, and not
just a freetext column in the view - if tools should be able to parse
it as well.

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/

#12

Robert Haas

robertmhaas@gmail.com

over 14 years ago

In reply to: Magnus Hagander (#11)

Re: Displaying accumulated autovacuum cost

On Thu, Aug 18, 2011 at 11:14 AM, Magnus Hagander <magnus@hagander.net> wrote:

On Thu, Aug 18, 2011 at 17:13, Robert Haas <robertmhaas@gmail.com> wrote:

On Thu, Aug 18, 2011 at 10:54 AM, Peter Eisentraut <peter_e@gmx.net> wrote:

On tor, 2011-08-18 at 08:39 +0200, Magnus Hagander wrote:

Also, unrelated to that, wouldn't this information be interesting for
non-autovacuum queries as well?

Last year we were discussing some details on progress reporting, and
some people suggested that instead of printing a single percentage, we
should let each type of activity print out whatever metrics it has that
would allow an experienced DBA to track the progress. Effectively, this
is what this patch is trying to do.

So how about adding a column to pg_stat_activity, progress_metrics or
something like that, and add that information there.

That might be a good way to go. I don't think we want something like
pg_stat_all_tables for this, because it seems that Greg's use case is
to be able to see how a *particular* autovacuum process is doing
*while it's running*, not to look at aggregate statistics over time.
Putting it in pg_stat_activity would be good for that.

It's also good to have it broken down into multiple columns, and not
just a freetext column in the view - if tools should be able to parse
it as well.

True. We could have a separate system view that only shows the status
of currently-running vacuum proceses. That wouldn't bother me either.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#13

Magnus Hagander

magnus@hagander.net

over 14 years ago

In reply to: Robert Haas (#12)

Re: Displaying accumulated autovacuum cost

On Thu, Aug 18, 2011 at 17:23, Robert Haas <robertmhaas@gmail.com> wrote:

On Thu, Aug 18, 2011 at 11:14 AM, Magnus Hagander <magnus@hagander.net> wrote:

On Thu, Aug 18, 2011 at 17:13, Robert Haas <robertmhaas@gmail.com> wrote:

On Thu, Aug 18, 2011 at 10:54 AM, Peter Eisentraut <peter_e@gmx.net> wrote:

On tor, 2011-08-18 at 08:39 +0200, Magnus Hagander wrote:

Also, unrelated to that, wouldn't this information be interesting for
non-autovacuum queries as well?

Last year we were discussing some details on progress reporting, and
some people suggested that instead of printing a single percentage, we
should let each type of activity print out whatever metrics it has that
would allow an experienced DBA to track the progress. Effectively, this
is what this patch is trying to do.

So how about adding a column to pg_stat_activity, progress_metrics or
something like that, and add that information there.

That might be a good way to go. I don't think we want something like
pg_stat_all_tables for this, because it seems that Greg's use case is
to be able to see how a *particular* autovacuum process is doing
*while it's running*, not to look at aggregate statistics over time.
Putting it in pg_stat_activity would be good for that.

It's also good to have it broken down into multiple columns, and not
just a freetext column in the view - if tools should be able to parse
it as well.

True. We could have a separate system view that only shows the status
of currently-running vacuum proceses. That wouldn't bother me either.

That's what I'd like to have. We could also have aggregate counters on
the table/database level of course.

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/

#14

Greg Smith

greg@2ndQuadrant.com

over 14 years ago

In reply to: Robert Haas (#8)

Re: Displaying accumulated autovacuum cost

On 08/18/2011 10:12 AM, Robert Haas wrote:

Perhaps a reasonable way to break up the patch would be:
- Part 1: Gather the information and display it in the
log_autovacuum_min_duration output.
- Part 2: Add the ability to see the information incrementally (via
some mechanism yet to be agreed upon).

My reaction to all the suggestions for redesign is just that: pull out
the part that does the incremental updates altogether, improve the part
that dumps the info into the logs, and resubmit without any incremental
progress for now. This is much more valuable to me if the first commit
that hits is something I can backport trivially. I'm seeing enough
production servers running into this problem right now on earlier
versions to be worried about that, and the log dump at the end would be
a huge help even if that was all they got. I'm going to add directly
computing the write MB/s figure from the dirty data written too, since
that ends up being the thing that I keep deriving by hand anyway.

--
Greg Smith 2ndQuadrant US greg@2ndQuadrant.com Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us

#15

Greg Smith

greg@2ndQuadrant.com

over 14 years ago

In reply to: Peter Eisentraut (#9)

Re: Displaying accumulated autovacuum cost

On 08/18/2011 10:54 AM, Peter Eisentraut wrote:

So how about adding a column to pg_stat_activity, progress_metrics or
something like that, and add that information there.

Adding a field here (I'd go for the simpler "progress") and updating it
regularly would be a reasonable way to go here. This data doesn't
really need to go into the traditional statistics infrastructure to be
useful. I didn't start there because I was already getting pushback on
overloading the stats collector with constantly updated metrics last
time I did something in this area. I wasn't going to try and argue why
it was worth it in this case, just like I'm not going to argue about the
complaint over the command string overhead being too high--just going to
not do that instead. If the bikeshed I built doesn't look fancy enough
to hold the bike I put in there, I'm not going to build a better one
right now--I'll just put a cheaper bike in there instead.

I was hoping to eventually take the useful summary bits at the end, the
totals, and save those into statistics somewhere each time a VACUUM of
either sort finishes. It would fit with the information shown in
pg_stat_tables, but that's obviously getting too wide. Breaking out a
pg_stat_autovacuum view that contains all the relevant bits currently
shown in that view, plus these 3 new fields, would be a reasonable start.

--
Greg Smith 2ndQuadrant US greg@2ndQuadrant.com Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us

#16

Magnus Hagander

magnus@hagander.net

over 14 years ago

In reply to: Greg Smith (#15)

Re: Displaying accumulated autovacuum cost

On Thu, Aug 18, 2011 at 17:54, Greg Smith <greg@2ndquadrant.com> wrote:

On 08/18/2011 10:54 AM, Peter Eisentraut wrote:

So how about adding a column to pg_stat_activity, progress_metrics or
something like that, and add that information there.

Adding a field here (I'd go for the simpler "progress") and updating it
regularly would be a reasonable way to go here. This data doesn't really
need to go into the traditional statistics infrastructure to be useful. I
didn't start there because I was already getting pushback on overloading the
stats collector with constantly updated metrics last time I did something in
this area. I wasn't going to try and argue why it was worth it in this
case, just like I'm not going to argue about the complaint over the command
string overhead being too high--just going to not do that instead. If the
bikeshed I built doesn't look fancy enough to hold the bike I put in there,
I'm not going to build a better one right now--I'll just put a cheaper bike
in there instead.

The "current values per-backend" thing can go in shared memory. The
reason the per table ones can't is obviously that they go away when
the backend disconnects..

I was hoping to eventually take the useful summary bits at the end, the
totals, and save those into statistics somewhere each time a VACUUM of
either sort finishes. It would fit with the information shown in
pg_stat_tables, but that's obviously getting too wide. Breaking out a
pg_stat_autovacuum view that contains all the relevant bits currently shown
in that view, plus these 3 new fields, would be a reasonable start.

That depends on what you mea nby too wide. If it's intended to be
consumed with "SELECT *" or not...

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/

#17

Robert Haas

robertmhaas@gmail.com

over 14 years ago

In reply to: Greg Smith (#14)

Re: Displaying accumulated autovacuum cost

On Thu, Aug 18, 2011 at 11:41 AM, Greg Smith <greg@2ndquadrant.com> wrote:

On 08/18/2011 10:12 AM, Robert Haas wrote:

Perhaps a reasonable way to break up the patch would be:
- Part 1: Gather the information and display it in the
log_autovacuum_min_duration output.
- Part 2: Add the ability to see the information incrementally (via
some mechanism yet to be agreed upon).

My reaction to all the suggestions for redesign is just that: pull out the
part that does the incremental updates altogether, improve the part that
dumps the info into the logs, and resubmit without any incremental progress
for now. This is much more valuable to me if the first commit that hits is
something I can backport trivially. I'm seeing enough production servers
running into this problem right now on earlier versions to be worried about
that, and the log dump at the end would be a huge help even if that was all
they got. I'm going to add directly computing the write MB/s figure from
the dirty data written too, since that ends up being the thing that I keep
deriving by hand anyway.

By the way, since I forgot to say it earlier, I think it's great that
you are working on some of this instrumentation stuff, so +1 for the
basic concept here.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#18

Euler Taveira de Oliveira

euler@timbira.com

over 14 years ago

In reply to: Greg Smith (#15)

Re: Displaying accumulated autovacuum cost

Em 18-08-2011 12:54, Greg Smith escreveu:

I was hoping to eventually take the useful summary bits at the end, the
totals, and save those into statistics somewhere each time a VACUUM of
either sort finishes. It would fit with the information shown in
pg_stat_tables, but that's obviously getting too wide. Breaking out a
pg_stat_autovacuum view that contains all the relevant bits currently
shown in that view, plus these 3 new fields, would be a reasonable start.

IMHO the useful summary bits belongs to log. If you want to add it to stats
collector go for it. But if you go to the latter road, it is recommended to
move some fields (time-related fields) from pg_stat_*_tables to this new view
(pg_stat_maintenance?). I don't know how generic you want to go but have in
mind I would like to cover automatic and manual maintenance commands.

Besides that another view will cover the maintenance activity. This new view
could contain at least datname, schemaname, relname, command_start,
command_schedule, operation, progress (?), procpid, and current_command. The
name has to be generic to cover all maintenance commands (perhaps
pg_maintenance_activity).

--
Euler Taveira de Oliveira - Timbira http://www.timbira.com.br/
PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento

#19

Jim Nasby

jim@nasby.net

over 14 years ago

In reply to: Greg Smith (#14)

Re: Displaying accumulated autovacuum cost

On Aug 18, 2011, at 10:41 AM, Greg Smith wrote:

that was all they got. I'm going to add directly computing the write MB/s figure from the dirty data written too, since that ends up being the thing that I keep deriving by hand anyway.

I know folks have talked about progress, but I haven't seen anything specific... could you add info about what table/index vacuum is working on, and how far along it is? I realize that's not very close to an actual % completion, but it's far better than what we have right now.

FWIW, the number I end up caring about isn't so much write traffic as read. Thanks to a good amount of battery-backed write cache (and possibly some iSCSI misconfiguration), our writes are generally much cheaper than our reads.
--
Jim C. Nasby, Database Architect jim@nasby.net
512.569.9461 (cell) http://jim.nasby.net

#20

Greg Smith

greg@2ndQuadrant.com

over 14 years ago

In reply to: Jim Nasby (#19)

Re: Displaying accumulated autovacuum cost

On 08/22/2011 05:54 PM, Jim Nasby wrote:

I know folks have talked about progress, but I haven't seen anything
specific... could you add info about what table/index vacuum is
working on, and how far along it is? I realize that's not very close
to an actual % completion, but it's far better than what we have right
now.

Due to complaints about the mechanism the first version used to inform
the user of the progress, I'm yanking that from the next patch
altogether. The goal for now is to get a good report into the logs, and
then maybe that gets extended later with a progress report. (All of the
proposed alternate mechanisms are way more complicated than anything I
have time to do right now)

FWIW, the number I end up caring about isn't so much write traffic as read. Thanks to a good amount of battery-backed write cache (and possibly some iSCSI misconfiguration), our writes are generally much cheaper than our reads.

VACUUM can't really know its true read rate from what's inside the
database. I can add a summary of the accumulated read amounts into the
logs, in more useful figures than what is provided so far, which is
better than nothing. But those will be kind of deceptive, which is one
reason I wasn't so focused on them yet. If the relation is largely in
the OS cache, but not the PostgreSQL one, the summary can show a read
rate even when that isn't actually doing any reads at all. That was
exactly the case in the sample data I posted. VACUUM thought it was
reading anywhere from 2.5 to 6MB/s. But at the OS level, it was
actually reading zero bytes, since the whole thing was in cache already.

What you actually want is a count of the accumulated read counters at
the OS level. I've recently figured out how to track those, too, but
that code is something that lives outside the database. If this is
something useful to you, I think you're about to sign up to be my next
beta tester for that program.

--
Greg Smith 2ndQuadrant US greg@2ndQuadrant.com Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us

#21

Greg Smith

greg@2ndQuadrant.com

over 14 years ago

In reply to: Greg Smith (#20)

1 attachment(s)

Re: Displaying accumulated autovacuum cost

Attached patch includes "math is hard" reworking, so it displays the
average write rate in the log output automatically:

LOG: automatic vacuum of table "pgbench.public.pgbench_accounts": index
scans: 1
pages: 0 removed, 163935 remain
tuples: 2000000 removed, 4625165 remain
buffer usage: 111901 hits, 123348 misses, 102351 dirtied, 23.365
MiB/s write rate
system usage: CPU 1.84s/4.22u sec elapsed 34.22 sec

All of the updates to the process title are gone, in favor of some
progress report mechanism TBD. The summary is much more important than
the progress tracking part as far as I'm concerned, I don't mind
splitting things apart to try and get this part in earlier.

--
Greg Smith 2ndQuadrant US greg@2ndQuadrant.com Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us

Attachments:

vacuum_stats_v3.patchtext/x-patch; name=vacuum_stats_v3.patchDownload

diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index 889737e..fa15b2e 100644
*** a/src/backend/commands/vacuum.c
--- b/src/backend/commands/vacuum.c
***************
*** 43,48 ****
--- 43,49 ----
  #include "utils/fmgroids.h"
  #include "utils/guc.h"
  #include "utils/memutils.h"
+ #include "utils/ps_status.h"
  #include "utils/snapmgr.h"
  #include "utils/syscache.h"
  #include "utils/tqual.h"
*************** vacuum(VacuumStmt *vacstmt, Oid relid, b
*** 214,219 ****
--- 215,223 ----
  
  		VacuumCostActive = (VacuumCostDelay > 0);
  		VacuumCostBalance = 0;
+ 		VacuumPageHit = 0;
+ 		VacuumPageMiss = 0;
+ 		VacuumPageDirty = 0;
  
  		/*
  		 * Loop to process each selected relation.
*************** vacuum_delay_point(void)
*** 1160,1167 ****
  		if (msec > VacuumCostDelay * 4)
  			msec = VacuumCostDelay * 4;
  
- 		pg_usleep(msec * 1000L);
- 
  		VacuumCostBalance = 0;
  
  		/* update balance values for workers */
--- 1164,1169 ----
diff --git a/src/backend/commands/vacuumlazy.c b/src/backend/commands/vacuumlazy.c
index b5547c5..a41f1cd 100644
*** a/src/backend/commands/vacuumlazy.c
--- b/src/backend/commands/vacuumlazy.c
*************** lazy_vacuum_rel(Relation onerel, VacuumS
*** 151,165 ****
  	int			nindexes;
  	BlockNumber possibly_freeable;
  	PGRUsage	ru0;
! 	TimestampTz starttime = 0;
  	bool		scan_all;
  	TransactionId freezeTableLimit;
  
  	/* measure elapsed time iff autovacuum logging requires it */
  	if (IsAutoVacuumWorkerProcess() && Log_autovacuum_min_duration >= 0)
  	{
  		pg_rusage_init(&ru0);
! 		if (Log_autovacuum_min_duration > 0)
  			starttime = GetCurrentTimestamp();
  	}
  
--- 151,168 ----
  	int			nindexes;
  	BlockNumber possibly_freeable;
  	PGRUsage	ru0;
! 	TimestampTz starttime = 0, endtime;
  	bool		scan_all;
  	TransactionId freezeTableLimit;
+ 	long		secs;
+ 	int			usecs;
+ 	double		write_rate;
  
  	/* measure elapsed time iff autovacuum logging requires it */
  	if (IsAutoVacuumWorkerProcess() && Log_autovacuum_min_duration >= 0)
  	{
  		pg_rusage_init(&ru0);
! 		if (Log_autovacuum_min_duration > 0 || VacuumCostActive)
  			starttime = GetCurrentTimestamp();
  	}
  
*************** lazy_vacuum_rel(Relation onerel, VacuumS
*** 225,247 ****
  	/* and log the action if appropriate */
  	if (IsAutoVacuumWorkerProcess() && Log_autovacuum_min_duration >= 0)
  	{
  		if (Log_autovacuum_min_duration == 0 ||
! 			TimestampDifferenceExceeds(starttime, GetCurrentTimestamp(),
  									   Log_autovacuum_min_duration))
! 			ereport(LOG,
! 					(errmsg("automatic vacuum of table \"%s.%s.%s\": index scans: %d\n"
! 							"pages: %d removed, %d remain\n"
! 							"tuples: %.0f removed, %.0f remain\n"
! 							"system usage: %s",
! 							get_database_name(MyDatabaseId),
! 							get_namespace_name(RelationGetNamespace(onerel)),
! 							RelationGetRelationName(onerel),
! 							vacrelstats->num_index_scans,
! 							vacrelstats->pages_removed,
! 							vacrelstats->rel_pages,
! 							vacrelstats->tuples_deleted,
! 							vacrelstats->new_rel_tuples,
! 							pg_rusage_show(&ru0))));
  	}
  }
  
--- 228,282 ----
  	/* and log the action if appropriate */
  	if (IsAutoVacuumWorkerProcess() && Log_autovacuum_min_duration >= 0)
  	{
+ 		endtime = GetCurrentTimestamp();
  		if (Log_autovacuum_min_duration == 0 ||
! 			TimestampDifferenceExceeds(starttime, endtime,
  									   Log_autovacuum_min_duration))
! 		{
! 			if (VacuumCostActive)
! 			{
! 				TimestampDifference(starttime, endtime, &secs, &usecs);
! 				write_rate = 0;
! 				if ((secs > 0) || (usecs > 0))
! 					write_rate = (double) BLCKSZ * VacuumPageDirty / (1024 * 1024) /
! 						(secs + usecs / 1000000.0);
! 
! 				ereport(LOG,
! 						(errmsg("automatic vacuum of table \"%s.%s.%s\": index scans: %d\n"
! 								"pages: %d removed, %d remain\n"
! 								"tuples: %.0f removed, %.0f remain\n"
! 								"buffer usage: %d hits, %d misses, %d dirtied, %.3f MiB/s write rate\n"
! 								"system usage: %s",
! 								get_database_name(MyDatabaseId),
! 								get_namespace_name(RelationGetNamespace(onerel)),
! 								RelationGetRelationName(onerel),
! 								vacrelstats->num_index_scans,
! 								vacrelstats->pages_removed,
! 								vacrelstats->rel_pages,
! 								vacrelstats->tuples_deleted,
! 								vacrelstats->new_rel_tuples,
! 								VacuumPageHit,
! 								VacuumPageMiss,
! 								VacuumPageDirty,
! 								write_rate,
! 								pg_rusage_show(&ru0))));
! 			}
! 			else
! 				ereport(LOG,
! 						(errmsg("automatic vacuum of table \"%s.%s.%s\": index scans: %d\n"
! 								"pages: %d removed, %d remain\n"
! 								"tuples: %.0f removed, %.0f remain\n"
! 								"system usage: %s",
! 								get_database_name(MyDatabaseId),
! 								get_namespace_name(RelationGetNamespace(onerel)),
! 								RelationGetRelationName(onerel),
! 								vacrelstats->num_index_scans,
! 								vacrelstats->pages_removed,
! 								vacrelstats->rel_pages,
! 								vacrelstats->tuples_deleted,
! 								vacrelstats->new_rel_tuples,
! 								pg_rusage_show(&ru0))));
! 		}
  	}
  }
  
diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index 4c7cfb0..d62adbc 100644
*** a/src/backend/storage/buffer/bufmgr.c
--- b/src/backend/storage/buffer/bufmgr.c
*************** ReadBuffer_common(SMgrRelation smgr, cha
*** 341,347 ****
--- 341,350 ----
  			*hit = true;
  
  			if (VacuumCostActive)
+ 			{
+ 				VacuumPageHit++;
  				VacuumCostBalance += VacuumCostPageHit;
+ 			}
  
  			TRACE_POSTGRESQL_BUFFER_READ_DONE(forkNum, blockNum,
  											  smgr->smgr_rnode.node.spcNode,
*************** ReadBuffer_common(SMgrRelation smgr, cha
*** 471,477 ****
--- 474,483 ----
  	}
  
  	if (VacuumCostActive)
+ 	{
+ 		VacuumPageMiss++;
  		VacuumCostBalance += VacuumCostPageMiss;
+ 	}
  
  	TRACE_POSTGRESQL_BUFFER_READ_DONE(forkNum, blockNum,
  									  smgr->smgr_rnode.node.spcNode,
*************** MarkBufferDirty(Buffer buffer)
*** 974,980 ****
--- 980,989 ----
  	 * If the buffer was not dirty already, do vacuum cost accounting.
  	 */
  	if (!(bufHdr->flags & BM_DIRTY) && VacuumCostActive)
+ 	{
+ 		VacuumPageDirty++;
  		VacuumCostBalance += VacuumCostPageDirty;
+ 	}
  
  	bufHdr->flags |= (BM_DIRTY | BM_JUST_DIRTIED);
  
*************** SetBufferCommitInfoNeedsSave(Buffer buff
*** 2299,2305 ****
--- 2308,2317 ----
  		LockBufHdr(bufHdr);
  		Assert(bufHdr->refcount > 0);
  		if (!(bufHdr->flags & BM_DIRTY) && VacuumCostActive)
+ 		{
+ 			VacuumPageDirty++;
  			VacuumCostBalance += VacuumCostPageDirty;
+ 		}
  		bufHdr->flags |= (BM_DIRTY | BM_JUST_DIRTIED);
  		UnlockBufHdr(bufHdr);
  	}
diff --git a/src/backend/utils/init/globals.c b/src/backend/utils/init/globals.c
index c4c4154..9ce64e6 100644
*** a/src/backend/utils/init/globals.c
--- b/src/backend/utils/init/globals.c
*************** int			VacuumCostPageDirty = 20;
*** 115,120 ****
--- 115,124 ----
  int			VacuumCostLimit = 200;
  int			VacuumCostDelay = 0;
  
+ int			VacuumPageHit = 0;
+ int			VacuumPageMiss = 0;
+ int			VacuumPageDirty = 0;
+ 
  int			VacuumCostBalance = 0;		/* working state for vacuum */
  bool		VacuumCostActive = false;
  
diff --git a/src/include/miscadmin.h b/src/include/miscadmin.h
index 9d19417..4ee08fe 100644
*** a/src/include/miscadmin.h
--- b/src/include/miscadmin.h
*************** extern int	VacuumCostPageDirty;
*** 230,235 ****
--- 230,239 ----
  extern int	VacuumCostLimit;
  extern int	VacuumCostDelay;
  
+ extern int	VacuumPageHit;
+ extern int	VacuumPageMiss;
+ extern int	VacuumPageDirty;
+ 
  extern int	VacuumCostBalance;
  extern bool VacuumCostActive;

#22

Greg Smith

greg@2ndQuadrant.com

over 14 years ago

In reply to: Greg Smith (#21)

1 attachment(s)

Re: Displaying accumulated autovacuum cost

Updated patch cleans up two diff mistakes made when backing out the
progress report feature. The tip-off I screwed up should have been the
absurdly high write rate shown. The usleep was accidentally deleted, so
it was running without cost limits even applying. Here's a good one
instead:

LOG: automatic vacuum of table "pgbench.public.pgbench_accounts": index
scans: 1
pages: 0 removed, 163935 remain
tuples: 2000000 removed, 2928356 remain
buffer usage: 117393 hits, 123351 misses, 102684 dirtied, 2.168
MiB/s write rate
system usage: CPU 2.54s/6.27u sec elapsed 369.99 sec

--
Greg Smith 2ndQuadrant US greg@2ndQuadrant.com Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us

Attachments:

vacuum_stats_v4.patchtext/x-patch; name=vacuum_stats_v4.patchDownload

diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index 889737e..c9890b4 100644
*** a/src/backend/commands/vacuum.c
--- b/src/backend/commands/vacuum.c
*************** vacuum(VacuumStmt *vacstmt, Oid relid, b
*** 214,219 ****
--- 214,222 ----
  
  		VacuumCostActive = (VacuumCostDelay > 0);
  		VacuumCostBalance = 0;
+ 		VacuumPageHit = 0;
+ 		VacuumPageMiss = 0;
+ 		VacuumPageDirty = 0;
  
  		/*
  		 * Loop to process each selected relation.
diff --git a/src/backend/commands/vacuumlazy.c b/src/backend/commands/vacuumlazy.c
index b5547c5..a41f1cd 100644
*** a/src/backend/commands/vacuumlazy.c
--- b/src/backend/commands/vacuumlazy.c
*************** lazy_vacuum_rel(Relation onerel, VacuumS
*** 151,165 ****
  	int			nindexes;
  	BlockNumber possibly_freeable;
  	PGRUsage	ru0;
! 	TimestampTz starttime = 0;
  	bool		scan_all;
  	TransactionId freezeTableLimit;
  
  	/* measure elapsed time iff autovacuum logging requires it */
  	if (IsAutoVacuumWorkerProcess() && Log_autovacuum_min_duration >= 0)
  	{
  		pg_rusage_init(&ru0);
! 		if (Log_autovacuum_min_duration > 0)
  			starttime = GetCurrentTimestamp();
  	}
  
--- 151,168 ----
  	int			nindexes;
  	BlockNumber possibly_freeable;
  	PGRUsage	ru0;
! 	TimestampTz starttime = 0, endtime;
  	bool		scan_all;
  	TransactionId freezeTableLimit;
+ 	long		secs;
+ 	int			usecs;
+ 	double		write_rate;
  
  	/* measure elapsed time iff autovacuum logging requires it */
  	if (IsAutoVacuumWorkerProcess() && Log_autovacuum_min_duration >= 0)
  	{
  		pg_rusage_init(&ru0);
! 		if (Log_autovacuum_min_duration > 0 || VacuumCostActive)
  			starttime = GetCurrentTimestamp();
  	}
  
*************** lazy_vacuum_rel(Relation onerel, VacuumS
*** 225,247 ****
  	/* and log the action if appropriate */
  	if (IsAutoVacuumWorkerProcess() && Log_autovacuum_min_duration >= 0)
  	{
  		if (Log_autovacuum_min_duration == 0 ||
! 			TimestampDifferenceExceeds(starttime, GetCurrentTimestamp(),
  									   Log_autovacuum_min_duration))
! 			ereport(LOG,
! 					(errmsg("automatic vacuum of table \"%s.%s.%s\": index scans: %d\n"
! 							"pages: %d removed, %d remain\n"
! 							"tuples: %.0f removed, %.0f remain\n"
! 							"system usage: %s",
! 							get_database_name(MyDatabaseId),
! 							get_namespace_name(RelationGetNamespace(onerel)),
! 							RelationGetRelationName(onerel),
! 							vacrelstats->num_index_scans,
! 							vacrelstats->pages_removed,
! 							vacrelstats->rel_pages,
! 							vacrelstats->tuples_deleted,
! 							vacrelstats->new_rel_tuples,
! 							pg_rusage_show(&ru0))));
  	}
  }
  
--- 228,282 ----
  	/* and log the action if appropriate */
  	if (IsAutoVacuumWorkerProcess() && Log_autovacuum_min_duration >= 0)
  	{
+ 		endtime = GetCurrentTimestamp();
  		if (Log_autovacuum_min_duration == 0 ||
! 			TimestampDifferenceExceeds(starttime, endtime,
  									   Log_autovacuum_min_duration))
! 		{
! 			if (VacuumCostActive)
! 			{
! 				TimestampDifference(starttime, endtime, &secs, &usecs);
! 				write_rate = 0;
! 				if ((secs > 0) || (usecs > 0))
! 					write_rate = (double) BLCKSZ * VacuumPageDirty / (1024 * 1024) /
! 						(secs + usecs / 1000000.0);
! 
! 				ereport(LOG,
! 						(errmsg("automatic vacuum of table \"%s.%s.%s\": index scans: %d\n"
! 								"pages: %d removed, %d remain\n"
! 								"tuples: %.0f removed, %.0f remain\n"
! 								"buffer usage: %d hits, %d misses, %d dirtied, %.3f MiB/s write rate\n"
! 								"system usage: %s",
! 								get_database_name(MyDatabaseId),
! 								get_namespace_name(RelationGetNamespace(onerel)),
! 								RelationGetRelationName(onerel),
! 								vacrelstats->num_index_scans,
! 								vacrelstats->pages_removed,
! 								vacrelstats->rel_pages,
! 								vacrelstats->tuples_deleted,
! 								vacrelstats->new_rel_tuples,
! 								VacuumPageHit,
! 								VacuumPageMiss,
! 								VacuumPageDirty,
! 								write_rate,
! 								pg_rusage_show(&ru0))));
! 			}
! 			else
! 				ereport(LOG,
! 						(errmsg("automatic vacuum of table \"%s.%s.%s\": index scans: %d\n"
! 								"pages: %d removed, %d remain\n"
! 								"tuples: %.0f removed, %.0f remain\n"
! 								"system usage: %s",
! 								get_database_name(MyDatabaseId),
! 								get_namespace_name(RelationGetNamespace(onerel)),
! 								RelationGetRelationName(onerel),
! 								vacrelstats->num_index_scans,
! 								vacrelstats->pages_removed,
! 								vacrelstats->rel_pages,
! 								vacrelstats->tuples_deleted,
! 								vacrelstats->new_rel_tuples,
! 								pg_rusage_show(&ru0))));
! 		}
  	}
  }
  
diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index 4c7cfb0..d62adbc 100644
*** a/src/backend/storage/buffer/bufmgr.c
--- b/src/backend/storage/buffer/bufmgr.c
*************** ReadBuffer_common(SMgrRelation smgr, cha
*** 341,347 ****
--- 341,350 ----
  			*hit = true;
  
  			if (VacuumCostActive)
+ 			{
+ 				VacuumPageHit++;
  				VacuumCostBalance += VacuumCostPageHit;
+ 			}
  
  			TRACE_POSTGRESQL_BUFFER_READ_DONE(forkNum, blockNum,
  											  smgr->smgr_rnode.node.spcNode,
*************** ReadBuffer_common(SMgrRelation smgr, cha
*** 471,477 ****
--- 474,483 ----
  	}
  
  	if (VacuumCostActive)
+ 	{
+ 		VacuumPageMiss++;
  		VacuumCostBalance += VacuumCostPageMiss;
+ 	}
  
  	TRACE_POSTGRESQL_BUFFER_READ_DONE(forkNum, blockNum,
  									  smgr->smgr_rnode.node.spcNode,
*************** MarkBufferDirty(Buffer buffer)
*** 974,980 ****
--- 980,989 ----
  	 * If the buffer was not dirty already, do vacuum cost accounting.
  	 */
  	if (!(bufHdr->flags & BM_DIRTY) && VacuumCostActive)
+ 	{
+ 		VacuumPageDirty++;
  		VacuumCostBalance += VacuumCostPageDirty;
+ 	}
  
  	bufHdr->flags |= (BM_DIRTY | BM_JUST_DIRTIED);
  
*************** SetBufferCommitInfoNeedsSave(Buffer buff
*** 2299,2305 ****
--- 2308,2317 ----
  		LockBufHdr(bufHdr);
  		Assert(bufHdr->refcount > 0);
  		if (!(bufHdr->flags & BM_DIRTY) && VacuumCostActive)
+ 		{
+ 			VacuumPageDirty++;
  			VacuumCostBalance += VacuumCostPageDirty;
+ 		}
  		bufHdr->flags |= (BM_DIRTY | BM_JUST_DIRTIED);
  		UnlockBufHdr(bufHdr);
  	}
diff --git a/src/backend/utils/init/globals.c b/src/backend/utils/init/globals.c
index c4c4154..9ce64e6 100644
*** a/src/backend/utils/init/globals.c
--- b/src/backend/utils/init/globals.c
*************** int			VacuumCostPageDirty = 20;
*** 115,120 ****
--- 115,124 ----
  int			VacuumCostLimit = 200;
  int			VacuumCostDelay = 0;
  
+ int			VacuumPageHit = 0;
+ int			VacuumPageMiss = 0;
+ int			VacuumPageDirty = 0;
+ 
  int			VacuumCostBalance = 0;		/* working state for vacuum */
  bool		VacuumCostActive = false;
  
diff --git a/src/include/miscadmin.h b/src/include/miscadmin.h
index 9d19417..4ee08fe 100644
*** a/src/include/miscadmin.h
--- b/src/include/miscadmin.h
*************** extern int	VacuumCostPageDirty;
*** 230,235 ****
--- 230,239 ----
  extern int	VacuumCostLimit;
  extern int	VacuumCostDelay;
  
+ extern int	VacuumPageHit;
+ extern int	VacuumPageMiss;
+ extern int	VacuumPageDirty;
+ 
  extern int	VacuumCostBalance;
  extern bool VacuumCostActive;

#23

Robert Haas

robertmhaas@gmail.com

over 14 years ago

In reply to: Greg Smith (#22)

Re: Displaying accumulated autovacuum cost

On Sun, Aug 28, 2011 at 5:54 AM, Greg Smith <greg@2ndquadrant.com> wrote:

Updated patch cleans up two diff mistakes made when backing out the progress
report feature. The tip-off I screwed up should have been the absurdly high
write rate shown. The usleep was accidentally deleted, so it was running
without cost limits even applying. Here's a good one instead:

LOG: automatic vacuum of table "pgbench.public.pgbench_accounts": index
scans: 1
pages: 0 removed, 163935 remain
tuples: 2000000 removed, 2928356 remain
buffer usage: 117393 hits, 123351 misses, 102684 dirtied, 2.168 MiB/s
write rate
system usage: CPU 2.54s/6.27u sec elapsed 369.99 sec

I postulate that this change isn't needed until you get to the part
where we do some kind of intermediate progress reporting:

! if (Log_autovacuum_min_duration > 0 || VacuumCostActive)

Instead of doing this only when vacuum costing is active, could we
drive it off of the pgBufferUsage stuff (maybe with a few tweaks...)
and do it unconditionally?

To me it seems like it would better to say "write rate xyz MB/s"
rather than "xyz MB/s write rate", but maybe I'm in the minority on
that one.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#24

Greg Smith

greg@2ndQuadrant.com

over 14 years ago

In reply to: Robert Haas (#23)

Re: Displaying accumulated autovacuum cost

On 08/29/2011 11:03 AM, Robert Haas wrote:

Instead of doing this only when vacuum costing is active, could we
drive it off of the pgBufferUsage stuff (maybe with a few tweaks...)
and do it unconditionally?

Sure. I've wondered about an ever larger refactoring, to reorient
vacuum costing around completely: drive it all from the pgBufferUsage
side rather than running its own totals. I didn't even start wandering
down that path yet because of time constraints, plus the desire to have
something I could backport to installs having VACUUM issues on earlier
versions. This code I'd backport without hesitation; something that
wanders toward a more complicated rearrangement becomes harder to deal with.

To me it seems like it would better to say "write rate xyz MB/s"
rather than "xyz MB/s write rate", but maybe I'm in the minority on
that one.

I was just trying to mimic the style of the logging already there as
closely as I could. I don't like the way the existing log message looks
either. I wasn't going to ignore its style over that though.

--
Greg Smith 2ndQuadrant US greg@2ndQuadrant.com Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us

#25

Shigeru Hanada

shigeru.hanada@gmail.com

over 14 years ago

In reply to: Greg Smith (#22)

1 attachment(s)

Re: Displaying accumulated autovacuum cost

Hi Greg,

(2011/08/28 18:54), Greg Smith wrote:

Updated patch cleans up two diff mistakes made when backing out the
progress report feature. The tip-off I screwed up should have been the
absurdly high write rate shown. The usleep was accidentally deleted, so
it was running without cost limits even applying. Here's a good one
instead:

LOG: automatic vacuum of table "pgbench.public.pgbench_accounts": index
scans: 1
pages: 0 removed, 163935 remain
tuples: 2000000 removed, 2928356 remain
buffer usage: 117393 hits, 123351 misses, 102684 dirtied, 2.168 MiB/s
write rate
system usage: CPU 2.54s/6.27u sec elapsed 369.99 sec

I took a look at your patch, and it seems fine about fundamental
functionality, though the patch needed to be rebased against current
HEAD. Please see attached patch which I used for review.

IIUC, this patch provides:
* Three counters, which are used to keep # of buffers which were (1)
Hits: found in shared buffers, (2) Missed: not found in shared buffers,
and (3) Dirtied: marked as dirty, in an autovacuum of a relation.
These counters are used only when cost-based autovacuum is enabled by
setting vacuum_cost_delay to non-zero.
* Capability to report # of buffers above, and buffer write rate
(MiB/sec) in the existing autovacuum logging message, only when actual
duration > autovacuum_min_duration, and cost-based autovacuum is enabled.

I think one concern is the way showing statistics. If showing summary
of statistics (at the end of an autovacuum) in server log is enough,
current implementation is fine. Also showing progress report in server
log would be easy to achieve. In contrast, reporting progress via
another backend would require shared memory or statistics collector,
rather than per-process global variables. Anyway, this patch can be the
base of such enhancement.

There are some trivial comments:
* Local variables added by the patch (secs, usecs, write_rate and
endtime) can be moved into narrower scope.
* Initializing starttime to zero seems unnecessary.

In addition, documents about how to use the statistics would be
necessary, maybe in "23.1.5. The Autovacuum Daemon".

I'll set the status of this patch to waiting-on-author. Would you rebase
the patch and post it again?

Regards,
--
Shigeru Hanada

Attachments:

vacuum_stats_v4_rebased.patchtext/plain; name=vacuum_stats_v4_rebased.patchDownload

diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index 7fe787e..768c658 100644
*** a/src/backend/commands/vacuum.c
--- b/src/backend/commands/vacuum.c
*************** vacuum(VacuumStmt *vacstmt, Oid relid, b
*** 214,219 ****
--- 214,222 ----
  
  		VacuumCostActive = (VacuumCostDelay > 0);
  		VacuumCostBalance = 0;
+ 		VacuumPageHit = 0;
+ 		VacuumPageMiss = 0;
+ 		VacuumPageDirty = 0;
  
  		/*
  		 * Loop to process each selected relation.
diff --git a/src/backend/commands/vacuumlazy.c b/src/backend/commands/vacuumlazy.c
index cf8337b..d9bb272 100644
*** a/src/backend/commands/vacuumlazy.c
--- b/src/backend/commands/vacuumlazy.c
*************** lazy_vacuum_rel(Relation onerel, VacuumS
*** 153,170 ****
  	int			nindexes;
  	BlockNumber possibly_freeable;
  	PGRUsage	ru0;
! 	TimestampTz starttime = 0;
  	bool		scan_all;
  	TransactionId freezeTableLimit;
  	BlockNumber new_rel_pages;
  	double		new_rel_tuples;
  	TransactionId new_frozen_xid;
  
  	/* measure elapsed time iff autovacuum logging requires it */
  	if (IsAutoVacuumWorkerProcess() && Log_autovacuum_min_duration >= 0)
  	{
  		pg_rusage_init(&ru0);
! 		if (Log_autovacuum_min_duration > 0)
  			starttime = GetCurrentTimestamp();
  	}
  
--- 153,173 ----
  	int			nindexes;
  	BlockNumber possibly_freeable;
  	PGRUsage	ru0;
! 	TimestampTz starttime = 0, endtime;
  	bool		scan_all;
  	TransactionId freezeTableLimit;
  	BlockNumber new_rel_pages;
  	double		new_rel_tuples;
  	TransactionId new_frozen_xid;
+ 	long		secs;
+ 	int			usecs;
+ 	double		write_rate;
  
  	/* measure elapsed time iff autovacuum logging requires it */
  	if (IsAutoVacuumWorkerProcess() && Log_autovacuum_min_duration >= 0)
  	{
  		pg_rusage_init(&ru0);
! 		if (Log_autovacuum_min_duration > 0 || VacuumCostActive)
  			starttime = GetCurrentTimestamp();
  	}
  
*************** lazy_vacuum_rel(Relation onerel, VacuumS
*** 250,272 ****
  	/* and log the action if appropriate */
  	if (IsAutoVacuumWorkerProcess() && Log_autovacuum_min_duration >= 0)
  	{
  		if (Log_autovacuum_min_duration == 0 ||
! 			TimestampDifferenceExceeds(starttime, GetCurrentTimestamp(),
  									   Log_autovacuum_min_duration))
! 			ereport(LOG,
! 					(errmsg("automatic vacuum of table \"%s.%s.%s\": index scans: %d\n"
! 							"pages: %d removed, %d remain\n"
! 							"tuples: %.0f removed, %.0f remain\n"
! 							"system usage: %s",
! 							get_database_name(MyDatabaseId),
! 							get_namespace_name(RelationGetNamespace(onerel)),
! 							RelationGetRelationName(onerel),
! 							vacrelstats->num_index_scans,
! 							vacrelstats->pages_removed,
! 							vacrelstats->rel_pages,
! 							vacrelstats->tuples_deleted,
! 							new_rel_tuples,
! 							pg_rusage_show(&ru0))));
  	}
  }
  
--- 253,309 ----
  	/* and log the action if appropriate */
  	if (IsAutoVacuumWorkerProcess() && Log_autovacuum_min_duration >= 0)
  	{
+ 		endtime = GetCurrentTimestamp();
  		if (Log_autovacuum_min_duration == 0 ||
! 			TimestampDifferenceExceeds(starttime, endtime,
  									   Log_autovacuum_min_duration))
! 		{
! 			if (VacuumCostActive)
! 			{
! 				TimestampDifference(starttime, endtime, &secs, &usecs);
! 				write_rate = 0;
! 				if ((secs > 0) || (usecs > 0))
! 					write_rate = (double) BLCKSZ * VacuumPageDirty / (1024 * 1024) /
! 						(secs + usecs / 1000000.0);
! 
! 				ereport(LOG,
! 						(errmsg("automatic vacuum of table \"%s.%s.%s\": index scans: %d\n"
! 								"pages: %d removed, %d remain\n"
! 								"tuples: %.0f removed, %.0f remain\n"
! 								"buffer usage: %d hits, %d misses, %d dirtied, %.3f MiB/s write rate\n"
! 								"system usage: %s",
! 								get_database_name(MyDatabaseId),
! 								get_namespace_name(RelationGetNamespace(onerel)),
! 								RelationGetRelationName(onerel),
! 								vacrelstats->num_index_scans,
! 								vacrelstats->pages_removed,
! 								vacrelstats->rel_pages,
! 								vacrelstats->tuples_deleted,
! 								new_rel_tuples,
! 								VacuumPageHit,
! 								VacuumPageMiss,
! 								VacuumPageDirty,
! 								write_rate,
! 								pg_rusage_show(&ru0))));
! 			}
! 			else
! 			{
! 				ereport(LOG,
! 						(errmsg("automatic vacuum of table \"%s.%s.%s\": index scans: %d\n"
! 								"pages: %d removed, %d remain\n"
! 								"tuples: %.0f removed, %.0f remain\n"
! 								"system usage: %s",
! 								get_database_name(MyDatabaseId),
! 								get_namespace_name(RelationGetNamespace(onerel)),
! 								RelationGetRelationName(onerel),
! 								vacrelstats->num_index_scans,
! 								vacrelstats->pages_removed,
! 								vacrelstats->rel_pages,
! 								vacrelstats->tuples_deleted,
! 								new_rel_tuples,
! 								pg_rusage_show(&ru0))));
! 			}
! 		}
  	}
  }
  
diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index 8647edd..1b0415c 100644
*** a/src/backend/storage/buffer/bufmgr.c
--- b/src/backend/storage/buffer/bufmgr.c
*************** ReadBuffer_common(SMgrRelation smgr, cha
*** 342,348 ****
--- 342,351 ----
  			*hit = true;
  
  			if (VacuumCostActive)
+ 			{
+ 				VacuumPageHit++;
  				VacuumCostBalance += VacuumCostPageHit;
+ 			}
  
  			TRACE_POSTGRESQL_BUFFER_READ_DONE(forkNum, blockNum,
  											  smgr->smgr_rnode.node.spcNode,
*************** ReadBuffer_common(SMgrRelation smgr, cha
*** 472,478 ****
--- 475,484 ----
  	}
  
  	if (VacuumCostActive)
+ 	{
+ 		VacuumPageMiss++;
  		VacuumCostBalance += VacuumCostPageMiss;
+ 	}
  
  	TRACE_POSTGRESQL_BUFFER_READ_DONE(forkNum, blockNum,
  									  smgr->smgr_rnode.node.spcNode,
*************** MarkBufferDirty(Buffer buffer)
*** 975,981 ****
--- 981,990 ----
  	 * If the buffer was not dirty already, do vacuum cost accounting.
  	 */
  	if (!(bufHdr->flags & BM_DIRTY) && VacuumCostActive)
+ 	{
+ 		VacuumPageDirty++;
  		VacuumCostBalance += VacuumCostPageDirty;
+ 	}
  
  	bufHdr->flags |= (BM_DIRTY | BM_JUST_DIRTIED);
  
*************** SetBufferCommitInfoNeedsSave(Buffer buff
*** 2300,2306 ****
--- 2309,2318 ----
  		LockBufHdr(bufHdr);
  		Assert(bufHdr->refcount > 0);
  		if (!(bufHdr->flags & BM_DIRTY) && VacuumCostActive)
+ 		{
+ 			VacuumPageDirty++;
  			VacuumCostBalance += VacuumCostPageDirty;
+ 		}
  		bufHdr->flags |= (BM_DIRTY | BM_JUST_DIRTIED);
  		UnlockBufHdr(bufHdr);
  	}
diff --git a/src/backend/utils/init/globals.c b/src/backend/utils/init/globals.c
index c4c4154..9ce64e6 100644
*** a/src/backend/utils/init/globals.c
--- b/src/backend/utils/init/globals.c
*************** int			VacuumCostPageDirty = 20;
*** 115,120 ****
--- 115,124 ----
  int			VacuumCostLimit = 200;
  int			VacuumCostDelay = 0;
  
+ int			VacuumPageHit = 0;
+ int			VacuumPageMiss = 0;
+ int			VacuumPageDirty = 0;
+ 
  int			VacuumCostBalance = 0;		/* working state for vacuum */
  bool		VacuumCostActive = false;
  
diff --git a/src/include/miscadmin.h b/src/include/miscadmin.h
index 9d19417..4ee08fe 100644
*** a/src/include/miscadmin.h
--- b/src/include/miscadmin.h
*************** extern int	VacuumCostPageDirty;
*** 230,235 ****
--- 230,239 ----
  extern int	VacuumCostLimit;
  extern int	VacuumCostDelay;
  
+ extern int	VacuumPageHit;
+ extern int	VacuumPageMiss;
+ extern int	VacuumPageDirty;
+ 
  extern int	VacuumCostBalance;
  extern bool VacuumCostActive;

#26

Alvaro Herrera

alvherre@commandprompt.com

over 14 years ago

In reply to: Shigeru Hanada (#25)

Re: Displaying accumulated autovacuum cost

I reviewed this patch. My question for you is: does it make sense to
enable to reporting of write rate even when vacuum cost accounting is
enabled? In my opinion it would be useful to do so. If you agree,
please submit an updated patch.

--
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

#27

Greg Smith

greg@2ndQuadrant.com

over 14 years ago

In reply to: Shigeru Hanada (#25)

Re: Displaying accumulated autovacuum cost

On 09/26/2011 05:58 AM, Shigeru Hanada wrote:

* Local variables added by the patch (secs, usecs, write_rate and
endtime) can be moved into narrower scope.
* Initializing starttime to zero seems unnecessary.

Setting starttime to 0 is already in the code; the change made to that
line was to add endtime, which is not initialized. You may be right that
initializing it isn't necessary, but I'm sure not going to touch that
part of the working code.

You're right about the the local variables; they were placed to look
like the surrounding code rather than to be as local as possible. I'm
not sure if all the PostgreSQL code is completely consistent here; a
quick survey shows examples of both "put all the variables at the top"
and "make variables as local to blocks as possible" styles. I don't know
that it really makes any difference with modern compilers, either. I'm
sure someone else will have a stronger opinion on this subject now that
I've trolled the list for one by writing this.

In addition, documents about how to use the statistics would be
necessary, maybe in "23.1.5. The Autovacuum Daemon".
I'll set the status of this patch to waiting-on-author. Would you rebase
the patch and post it again?

I didn't do that because there's not really much documentation at this
level of detail yet--how to interpret all the information in the logs.
That's an open-ended bit of work; there's a lot more that could be
written on this topic than the docs have right now. It's probably worth
pointing out that this specific info is now in the logs, though, given
that people might not notice it otherwise. I'll see what I can do about
that.

As a general FYI, rebasing is normally requested only when the existing
patch doesn't apply anymore. If "patch" or "git apply" can consume it,
complaints about code shifting around isn't by itself enough reason to
ask for an updated patch.

--
Greg Smith 2ndQuadrant US greg@2ndQuadrant.com Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us

#28

Greg Smith

greg@2ndQuadrant.com

over 14 years ago

In reply to: Alvaro Herrera (#26)

Re: Displaying accumulated autovacuum cost

On 09/29/2011 10:40 AM, Alvaro Herrera wrote:

I reviewed this patch. My question for you is: does it make sense to
enable to reporting of write rate even when vacuum cost accounting is
enabled? In my opinion it would be useful to do so. If you agree,
please submit an updated patch.

Presumably you meant to ask if this makes sense to show when cost
accounting isn't enabled, because the code doesn't do that right now.
No cost accounting, no buffer usage/write rate data as this was submitted.

Looks like making this work even in cases where cost accounting isn't on
will make the patch a bit larger obtrusive, but it's not unreasonable.
Now that you mention it, people who do a manual, full-speed VACUUM would
certainly appreciate some feedback on the rate it ran at. I'll include
that in the next update.

--
Greg Smith 2ndQuadrant US greg@2ndQuadrant.com Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us

#29

Alvaro Herrera

alvherre@commandprompt.com

over 14 years ago

In reply to: Greg Smith (#28)

Re: Displaying accumulated autovacuum cost

Excerpts from Greg Smith's message of mié oct 05 04:02:12 -0300 2011:

On 09/29/2011 10:40 AM, Alvaro Herrera wrote:

I reviewed this patch. My question for you is: does it make sense to
enable to reporting of write rate even when vacuum cost accounting is
enabled? In my opinion it would be useful to do so. If you agree,
please submit an updated patch.

Presumably you meant to ask if this makes sense to show when cost
accounting isn't enabled, because the code doesn't do that right now.
No cost accounting, no buffer usage/write rate data as this was submitted.

Yes, sorry, that's what I meant.

Looks like making this work even in cases where cost accounting isn't on
will make the patch a bit larger obtrusive, but it's not unreasonable.
Now that you mention it, people who do a manual, full-speed VACUUM would
certainly appreciate some feedback on the rate it ran at. I'll include
that in the next update.

Thanks.

--
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

#30

Greg Smith

greg@2ndQuadrant.com

about 14 years ago

In reply to: Greg Smith (#28)

1 attachment(s)

Re: Displaying accumulated autovacuum cost

On 10/05/2011 03:02 AM, Greg Smith wrote:

Presumably you meant to ask if this makes sense to show when cost
accounting isn't enabled, because the code doesn't do that right now.
No cost accounting, no buffer usage/write rate data as this was
submitted.

This is done in the attached update. I just made the page accounting
happen all the time, regardless of whether the costs were being
accumulated. Added a read rate too, which is how fast reads happened
from the OS cache to shared_buffers. Simple test case generates a 600MB
pgbench_accounts database and wipes out enough to take a while to clean
up; it needs log_autovacuum_min_duration = 0 and then:

$ createdb pgbench
$ pgbench -i -s 10 pgbench
$ psql -d pgbench -c "delete from pgbench_accounts where aid<200000"

LOG: automatic vacuum of table "pgbench.public.pgbench_accounts": index
scans: 1
pages: 0 removed, 16394 remain
tuples: 199999 removed, 640011 remain
buffer usage: 13742 hits, 2708 misses, 1058 dirtied
avg read rate: 3.067 MiB/s, avg write rate: 1.198 MiB/s
system usage: CPU 0.05s/0.61u sec elapsed 6.89 sec

Now that you mention it, people who do a manual, full-speed VACUUM
would certainly appreciate some feedback on the rate it ran at.

This is more of a pain because this whole code path is only active when
IsAutoVacuumWorkerProcess. I have some larger refactoring in mind to
perhaps make that more feasible. I didn't want to hold this update
aiming at the more valuable autovac case for that though, can always
layer it on later.

--
Greg Smith 2ndQuadrant US greg@2ndQuadrant.com Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us

Attachments:

vacuum_stats_v5.patchtext/x-patch; name=vacuum_stats_v5.patchDownload

diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index f42504c..6ef85dd 100644
*** a/src/backend/commands/vacuum.c
--- b/src/backend/commands/vacuum.c
*************** vacuum(VacuumStmt *vacstmt, Oid relid, b
*** 214,219 ****
--- 214,222 ----
  
  		VacuumCostActive = (VacuumCostDelay > 0);
  		VacuumCostBalance = 0;
+ 		VacuumPageHit = 0;
+ 		VacuumPageMiss = 0;
+ 		VacuumPageDirty = 0;
  
  		/*
  		 * Loop to process each selected relation.
diff --git a/src/backend/commands/vacuumlazy.c b/src/backend/commands/vacuumlazy.c
index 38deddc..c59fceb 100644
*** a/src/backend/commands/vacuumlazy.c
--- b/src/backend/commands/vacuumlazy.c
*************** lazy_vacuum_rel(Relation onerel, VacuumS
*** 154,160 ****
  	int			nindexes;
  	BlockNumber possibly_freeable;
  	PGRUsage	ru0;
! 	TimestampTz starttime = 0;
  	bool		scan_all;
  	TransactionId freezeTableLimit;
  	BlockNumber new_rel_pages;
--- 154,163 ----
  	int			nindexes;
  	BlockNumber possibly_freeable;
  	PGRUsage	ru0;
! 	TimestampTz starttime = 0, endtime;
!  	long		secs;
!  	int			usecs;
!  	double		read_rate, write_rate;
  	bool		scan_all;
  	TransactionId freezeTableLimit;
  	BlockNumber new_rel_pages;
*************** lazy_vacuum_rel(Relation onerel, VacuumS
*** 166,173 ****
  	if (IsAutoVacuumWorkerProcess() && Log_autovacuum_min_duration >= 0)
  	{
  		pg_rusage_init(&ru0);
! 		if (Log_autovacuum_min_duration > 0)
! 			starttime = GetCurrentTimestamp();
  	}
  
  	if (vacstmt->options & VACOPT_VERBOSE)
--- 169,175 ----
  	if (IsAutoVacuumWorkerProcess() && Log_autovacuum_min_duration >= 0)
  	{
  		pg_rusage_init(&ru0);
! 		starttime = GetCurrentTimestamp();
  	}
  
  	if (vacstmt->options & VACOPT_VERBOSE)
*************** lazy_vacuum_rel(Relation onerel, VacuumS
*** 262,274 ****
  	/* and log the action if appropriate */
  	if (IsAutoVacuumWorkerProcess() && Log_autovacuum_min_duration >= 0)
  	{
  		if (Log_autovacuum_min_duration == 0 ||
! 			TimestampDifferenceExceeds(starttime, GetCurrentTimestamp(),
  									   Log_autovacuum_min_duration))
  			ereport(LOG,
  					(errmsg("automatic vacuum of table \"%s.%s.%s\": index scans: %d\n"
  							"pages: %d removed, %d remain\n"
  							"tuples: %.0f removed, %.0f remain\n"
  							"system usage: %s",
  							get_database_name(MyDatabaseId),
  							get_namespace_name(RelationGetNamespace(onerel)),
--- 264,290 ----
  	/* and log the action if appropriate */
  	if (IsAutoVacuumWorkerProcess() && Log_autovacuum_min_duration >= 0)
  	{
+ 		endtime = GetCurrentTimestamp();
  		if (Log_autovacuum_min_duration == 0 ||
! 			TimestampDifferenceExceeds(starttime, endtime,
  									   Log_autovacuum_min_duration))
+ 		{
+ 			TimestampDifference(starttime, endtime, &secs, &usecs);
+ 			read_rate = 0;
+ 			write_rate = 0;
+ 			if ((secs > 0) || (usecs > 0))
+ 			{
+ 				read_rate = (double) BLCKSZ * VacuumPageMiss / (1024 * 1024) /
+ 					(secs + usecs / 1000000.0);
+ 				write_rate = (double) BLCKSZ * VacuumPageDirty / (1024 * 1024) /
+  					(secs + usecs / 1000000.0);
+ 			}
  			ereport(LOG,
  					(errmsg("automatic vacuum of table \"%s.%s.%s\": index scans: %d\n"
  							"pages: %d removed, %d remain\n"
  							"tuples: %.0f removed, %.0f remain\n"
+ 							"buffer usage: %d hits, %d misses, %d dirtied\n"
+ 							"avg read rate: %.3f MiB/s, avg write rate: %.3f MiB/s\n"
  							"system usage: %s",
  							get_database_name(MyDatabaseId),
  							get_namespace_name(RelationGetNamespace(onerel)),
*************** lazy_vacuum_rel(Relation onerel, VacuumS
*** 277,284 ****
  							vacrelstats->pages_removed,
  							vacrelstats->rel_pages,
  							vacrelstats->tuples_deleted,
! 							new_rel_tuples,
  							pg_rusage_show(&ru0))));
  	}
  }
  
--- 293,305 ----
  							vacrelstats->pages_removed,
  							vacrelstats->rel_pages,
  							vacrelstats->tuples_deleted,
! 							vacrelstats->new_rel_tuples,
! 							VacuumPageHit,
! 							VacuumPageMiss,
! 							VacuumPageDirty,
! 							read_rate,write_rate,
  							pg_rusage_show(&ru0))));
+ 		}
  	}
  }
  
diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index e59af33..5306cb4 100644
*** a/src/backend/storage/buffer/bufmgr.c
--- b/src/backend/storage/buffer/bufmgr.c
*************** ReadBuffer_common(SMgrRelation smgr, cha
*** 340,345 ****
--- 340,346 ----
  		{
  			/* Just need to update stats before we exit */
  			*hit = true;
+ 			VacuumPageHit++;
  
  			if (VacuumCostActive)
  				VacuumCostBalance += VacuumCostPageHit;
*************** ReadBuffer_common(SMgrRelation smgr, cha
*** 471,476 ****
--- 472,478 ----
  		TerminateBufferIO(bufHdr, false, BM_VALID);
  	}
  
+ 	VacuumPageMiss++;
  	if (VacuumCostActive)
  		VacuumCostBalance += VacuumCostPageMiss;
  
*************** MarkBufferDirty(Buffer buffer)
*** 972,981 ****
  	Assert(bufHdr->refcount > 0);
  
  	/*
! 	 * If the buffer was not dirty already, do vacuum cost accounting.
  	 */
! 	if (!(bufHdr->flags & BM_DIRTY) && VacuumCostActive)
! 		VacuumCostBalance += VacuumCostPageDirty;
  
  	bufHdr->flags |= (BM_DIRTY | BM_JUST_DIRTIED);
  
--- 974,987 ----
  	Assert(bufHdr->refcount > 0);
  
  	/*
! 	 * If the buffer was not dirty already, do vacuum accounting.
  	 */
! 	if (!(bufHdr->flags & BM_DIRTY))
! 	{
! 		VacuumPageDirty++;
! 		if (VacuumCostActive)
! 			VacuumCostBalance += VacuumCostPageDirty;
! 	}
  
  	bufHdr->flags |= (BM_DIRTY | BM_JUST_DIRTIED);
  
*************** SetBufferCommitInfoNeedsSave(Buffer buff
*** 2326,2333 ****
  	{
  		LockBufHdr(bufHdr);
  		Assert(bufHdr->refcount > 0);
! 		if (!(bufHdr->flags & BM_DIRTY) && VacuumCostActive)
! 			VacuumCostBalance += VacuumCostPageDirty;
  		bufHdr->flags |= (BM_DIRTY | BM_JUST_DIRTIED);
  		UnlockBufHdr(bufHdr);
  	}
--- 2332,2343 ----
  	{
  		LockBufHdr(bufHdr);
  		Assert(bufHdr->refcount > 0);
! 		if (!(bufHdr->flags & BM_DIRTY))
! 		{
! 			VacuumPageDirty++;
! 			if (VacuumCostActive)
! 				VacuumCostBalance += VacuumCostPageDirty;
! 		}
  		bufHdr->flags |= (BM_DIRTY | BM_JUST_DIRTIED);
  		UnlockBufHdr(bufHdr);
  	}
diff --git a/src/backend/utils/init/globals.c b/src/backend/utils/init/globals.c
index c4c4154..9ce64e6 100644
*** a/src/backend/utils/init/globals.c
--- b/src/backend/utils/init/globals.c
*************** int			VacuumCostPageDirty = 20;
*** 115,120 ****
--- 115,124 ----
  int			VacuumCostLimit = 200;
  int			VacuumCostDelay = 0;
  
+ int			VacuumPageHit = 0;
+ int			VacuumPageMiss = 0;
+ int			VacuumPageDirty = 0;
+ 
  int			VacuumCostBalance = 0;		/* working state for vacuum */
  bool		VacuumCostActive = false;
  
diff --git a/src/include/miscadmin.h b/src/include/miscadmin.h
index 9d19417..4ee08fe 100644
*** a/src/include/miscadmin.h
--- b/src/include/miscadmin.h
*************** extern int	VacuumCostPageDirty;
*** 230,235 ****
--- 230,239 ----
  extern int	VacuumCostLimit;
  extern int	VacuumCostDelay;
  
+ extern int	VacuumPageHit;
+ extern int	VacuumPageMiss;
+ extern int	VacuumPageDirty;
+ 
  extern int	VacuumCostBalance;
  extern bool VacuumCostActive;

#31

Alvaro Herrera

alvherre@commandprompt.com

about 14 years ago

In reply to: Greg Smith (#30)

Re: Displaying accumulated autovacuum cost

Excerpts from Greg Smith's message of mié nov 16 04:37:43 -0300 2011:

On 10/05/2011 03:02 AM, Greg Smith wrote:

Presumably you meant to ask if this makes sense to show when cost
accounting isn't enabled, because the code doesn't do that right now.
No cost accounting, no buffer usage/write rate data as this was
submitted.

This is done in the attached update. I just made the page accounting
happen all the time, regardless of whether the costs were being
accumulated. Added a read rate too, which is how fast reads happened
from the OS cache to shared_buffers. Simple test case generates a 600MB
pgbench_accounts database and wipes out enough to take a while to clean
up; it needs log_autovacuum_min_duration = 0 and then:

$ createdb pgbench
$ pgbench -i -s 10 pgbench
$ psql -d pgbench -c "delete from pgbench_accounts where aid<200000"

LOG: automatic vacuum of table "pgbench.public.pgbench_accounts": index
scans: 1
pages: 0 removed, 16394 remain
tuples: 199999 removed, 640011 remain
buffer usage: 13742 hits, 2708 misses, 1058 dirtied
avg read rate: 3.067 MiB/s, avg write rate: 1.198 MiB/s
system usage: CPU 0.05s/0.61u sec elapsed 6.89 sec

I was about to commit this when I noticed that the avg values may not be
all that representative of reality after all; consider that it's
computed across the whole duration of the vacuuming operation, including
the index scans ... it'd be possibly useful to keep separate timings for
the heap scan (which is likely to use I/O more quickly) from index
scans. That way you can tune for the max, not a possibly misleading
average. That's a much larger change though, so I'm not going to get
into it.

Does anybody else think this would be worthwhile? If so we can stick it
into the TODO with an "easy" tag for someone to tackle -- seems like a
useful first project.

One funny thing in the test I did was that the buffer count might add to
a much larger amount than total disk pages:

LOG: automatic vacuum of table "alvherre.public.foo": index scans: 4
pages: 0 removed, 8850 remain
tuples: 666668 removed, 1333332 remain
buffer usage: 14675 hits, 33857 misses, 20274 dirtied
avg read rate: 2.823 MiB/s, avg write rate: 1.690 MiB/s
system usage: CPU 1.26s/8.08u sec elapsed 93.69 sec

The table and index:

alvherre=# select relname, relpages from pg_class where relname like 'foo%';
relname | relpages
---------+----------
foo | 8850
foo_idx | 5487
(2 filas)

My guess is that this is roughly counting three heap scans plus the four
index scans mentioned in the log report (there were so many because I
reduced maintenance_work_mem to its 1 MB minimum):

alvherre=# select 5487 * 4 + 8850 * 3;
?column?
----------
48498
(1 fila)

alvherre=# select 14675 + 33857;
?column?
----------
48532
(1 fila)

My test case was

create table foo (a int);
insert into foo select * from generate_series (1, 2000000);
create index foo_idx on foo (a);
delete from foo where a % 6 in (1,2);

I then checkpointed before autovac had the chance to process the table,
just to see a higher number of pages dirtied. Note how the number of
pages dirtied is also much higher the total number of existing pages!

I'm going to push this now anyway, thanks.

--
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

#32

Robert Haas

robertmhaas@gmail.com

about 14 years ago

In reply to: Alvaro Herrera (#31)

1 attachment(s)

Re: Displaying accumulated autovacuum cost

On Fri, Nov 25, 2011 at 11:39 AM, Alvaro Herrera
<alvherre@commandprompt.com> wrote:

I'm going to push this now anyway, thanks.

This patch adds a count of the number of buffers dirtied to VACUUM,
but it strikes me that it would be useful to add similar tracking to
pgBufferUsage. Attached is a patch for that. You can see the new
counters through pg_stat_statements or with EXPLAIN (ANALYZE,
BUFFERS). This is useful because the number of buffers that a query
*writes* doesn't necessarily have much to do with anything - it may
end up writing buffers dirtied by other queries while being read-only
itself, or conversely it may not write anything at all even though it
dirties quite a bit.

Thoughts? Comments? Objections?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachments:

dirty_buffers.patchapplication/octet-stream; name=dirty_buffers.patchDownload

diff --git a/contrib/pg_stat_statements/Makefile b/contrib/pg_stat_statements/Makefile
index e086fd8..e8aed61 100644
--- a/contrib/pg_stat_statements/Makefile
+++ b/contrib/pg_stat_statements/Makefile
@@ -4,7 +4,8 @@ MODULE_big = pg_stat_statements
 OBJS = pg_stat_statements.o
 
 EXTENSION = pg_stat_statements
-DATA = pg_stat_statements--1.0.sql pg_stat_statements--unpackaged--1.0.sql
+DATA = pg_stat_statements--1.1.sql pg_stat_statements--1.0--1.1.sql \
+	pg_stat_statements--unpackaged--1.0.sql
 
 ifdef USE_PGXS
 PG_CONFIG = pg_config
diff --git a/contrib/pg_stat_statements/pg_stat_statements--1.0--1.1.sql b/contrib/pg_stat_statements/pg_stat_statements--1.0--1.1.sql
new file mode 100644
index 0000000..0086338
--- /dev/null
+++ b/contrib/pg_stat_statements/pg_stat_statements--1.0--1.1.sql
@@ -0,0 +1,37 @@
+/* contrib/pg_stat_statements/pg_stat_statements--1.0--1.1.sql */
+
+/* First we have to remove them from the extension */
+ALTER EXTENSION pg_stat_statements DROP VIEW pg_stat_statements;
+ALTER EXTENSION pg_stat_statements DROP FUNCTION pg_stat_statements();
+
+/* Then we can drop them */
+DROP VIEW pg_stat_statements;
+DROP FUNCTION pg_stat_statements();
+
+/* Now redefine */
+CREATE FUNCTION pg_stat_statements(
+    OUT userid oid,
+    OUT dbid oid,
+    OUT query text,
+    OUT calls int8,
+    OUT total_time float8,
+    OUT rows int8,
+    OUT shared_blks_hit int8,
+    OUT shared_blks_read int8,
+    OUT shared_blks_dirtied int8,
+    OUT shared_blks_written int8,
+    OUT local_blks_hit int8,
+    OUT local_blks_read int8,
+    OUT local_blks_dirtied int8,
+    OUT local_blks_written int8,
+    OUT temp_blks_read int8,
+    OUT temp_blks_written int8
+)
+RETURNS SETOF record
+AS 'MODULE_PATHNAME'
+LANGUAGE C;
+
+CREATE VIEW pg_stat_statements AS
+  SELECT * FROM pg_stat_statements();
+
+GRANT SELECT ON pg_stat_statements TO PUBLIC;
diff --git a/contrib/pg_stat_statements/pg_stat_statements--1.1.sql b/contrib/pg_stat_statements/pg_stat_statements--1.1.sql
new file mode 100644
index 0000000..77eff6b
--- /dev/null
+++ b/contrib/pg_stat_statements/pg_stat_statements--1.1.sql
@@ -0,0 +1,41 @@
+/* contrib/pg_stat_statements/pg_stat_statements--1.0.sql */
+
+-- complain if script is sourced in psql, rather than via CREATE EXTENSION
+\echo Use "CREATE EXTENSION pg_stat_statements" to load this file. \quit
+
+-- Register functions.
+CREATE FUNCTION pg_stat_statements_reset()
+RETURNS void
+AS 'MODULE_PATHNAME'
+LANGUAGE C;
+
+CREATE FUNCTION pg_stat_statements(
+    OUT userid oid,
+    OUT dbid oid,
+    OUT query text,
+    OUT calls int8,
+    OUT total_time float8,
+    OUT rows int8,
+    OUT shared_blks_hit int8,
+    OUT shared_blks_read int8,
+    OUT shared_blks_dirtied int8,
+    OUT shared_blks_written int8,
+    OUT local_blks_hit int8,
+    OUT local_blks_read int8,
+    OUT local_blks_dirtied int8,
+    OUT local_blks_written int8,
+    OUT temp_blks_read int8,
+    OUT temp_blks_written int8
+)
+RETURNS SETOF record
+AS 'MODULE_PATHNAME'
+LANGUAGE C;
+
+-- Register a view on the function for ease of use.
+CREATE VIEW pg_stat_statements AS
+  SELECT * FROM pg_stat_statements();
+
+GRANT SELECT ON pg_stat_statements TO PUBLIC;
+
+-- Don't want this to be available to non-superusers.
+REVOKE ALL ON FUNCTION pg_stat_statements_reset() FROM PUBLIC;
diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index 8dc3054..7b312e0 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -77,9 +77,11 @@ typedef struct Counters
 	int64		rows;			/* total # of retrieved or affected rows */
 	int64		shared_blks_hit;	/* # of shared buffer hits */
 	int64		shared_blks_read;		/* # of shared disk blocks read */
+	int64		shared_blks_dirtied;	/* # of shared disk blocks dirtied */
 	int64		shared_blks_written;	/* # of shared disk blocks written */
 	int64		local_blks_hit; /* # of local buffer hits */
 	int64		local_blks_read;	/* # of local disk blocks read */
+	int64		local_blks_dirtied;		/* # of local disk blocks dirtied */
 	int64		local_blks_written;		/* # of local disk blocks written */
 	int64		temp_blks_read; /* # of temp blocks read */
 	int64		temp_blks_written;		/* # of temp blocks written */
@@ -652,12 +654,16 @@ pgss_ProcessUtility(Node *parsetree, const char *queryString,
 			pgBufferUsage.shared_blks_hit - bufusage.shared_blks_hit;
 		bufusage.shared_blks_read =
 			pgBufferUsage.shared_blks_read - bufusage.shared_blks_read;
+		bufusage.shared_blks_dirtied =
+			pgBufferUsage.shared_blks_dirtied - bufusage.shared_blks_dirtied;
 		bufusage.shared_blks_written =
 			pgBufferUsage.shared_blks_written - bufusage.shared_blks_written;
 		bufusage.local_blks_hit =
 			pgBufferUsage.local_blks_hit - bufusage.local_blks_hit;
 		bufusage.local_blks_read =
 			pgBufferUsage.local_blks_read - bufusage.local_blks_read;
+		bufusage.local_blks_dirtied =
+			pgBufferUsage.local_blks_dirtied - bufusage.local_blks_dirtied;
 		bufusage.local_blks_written =
 			pgBufferUsage.local_blks_written - bufusage.local_blks_written;
 		bufusage.temp_blks_read =
@@ -766,9 +772,11 @@ pgss_store(const char *query, double total_time, uint64 rows,
 		e->counters.rows += rows;
 		e->counters.shared_blks_hit += bufusage->shared_blks_hit;
 		e->counters.shared_blks_read += bufusage->shared_blks_read;
+		e->counters.shared_blks_dirtied += bufusage->shared_blks_dirtied;
 		e->counters.shared_blks_written += bufusage->shared_blks_written;
 		e->counters.local_blks_hit += bufusage->local_blks_hit;
 		e->counters.local_blks_read += bufusage->local_blks_read;
+		e->counters.local_blks_dirtied += bufusage->local_blks_dirtied;
 		e->counters.local_blks_written += bufusage->local_blks_written;
 		e->counters.temp_blks_read += bufusage->temp_blks_read;
 		e->counters.temp_blks_written += bufusage->temp_blks_written;
@@ -793,7 +801,8 @@ pg_stat_statements_reset(PG_FUNCTION_ARGS)
 	PG_RETURN_VOID();
 }
 
-#define PG_STAT_STATEMENTS_COLS		14
+#define PG_STAT_STATEMENTS_COLS_V1_0	14
+#define PG_STAT_STATEMENTS_COLS			16
 
 /*
  * Retrieve statement statistics.
@@ -810,6 +819,7 @@ pg_stat_statements(PG_FUNCTION_ARGS)
 	bool		is_superuser = superuser();
 	HASH_SEQ_STATUS hash_seq;
 	pgssEntry  *entry;
+	bool		sql_supports_dirty_counters = true;
 
 	if (!pgss || !pgss_hash)
 		ereport(ERROR,
@@ -830,6 +840,8 @@ pg_stat_statements(PG_FUNCTION_ARGS)
 	/* Build a tuple descriptor for our result type */
 	if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE)
 		elog(ERROR, "return type must be a row type");
+	if (tupdesc->natts == PG_STAT_STATEMENTS_COLS_V1_0)
+		sql_supports_dirty_counters = false;
 
 	per_query_ctx = rsinfo->econtext->ecxt_per_query_memory;
 	oldcontext = MemoryContextSwitchTo(per_query_ctx);
@@ -887,14 +899,19 @@ pg_stat_statements(PG_FUNCTION_ARGS)
 		values[i++] = Int64GetDatumFast(tmp.rows);
 		values[i++] = Int64GetDatumFast(tmp.shared_blks_hit);
 		values[i++] = Int64GetDatumFast(tmp.shared_blks_read);
+		if (sql_supports_dirty_counters)
+			values[i++] = Int64GetDatumFast(tmp.shared_blks_dirtied);
 		values[i++] = Int64GetDatumFast(tmp.shared_blks_written);
 		values[i++] = Int64GetDatumFast(tmp.local_blks_hit);
 		values[i++] = Int64GetDatumFast(tmp.local_blks_read);
+		if (sql_supports_dirty_counters)
+			values[i++] = Int64GetDatumFast(tmp.local_blks_dirtied);
 		values[i++] = Int64GetDatumFast(tmp.local_blks_written);
 		values[i++] = Int64GetDatumFast(tmp.temp_blks_read);
 		values[i++] = Int64GetDatumFast(tmp.temp_blks_written);
 
-		Assert(i == PG_STAT_STATEMENTS_COLS);
+		Assert(i == sql_supports_dirty_counters ? \
+			PG_STAT_STATEMENTS_COLS : PG_STAT_STATEMENTS_COLS_V1_0);
 
 		tuplestore_putvalues(tupstore, tupdesc, values, nulls);
 	}
diff --git a/contrib/pg_stat_statements/pg_stat_statements.control b/contrib/pg_stat_statements/pg_stat_statements.control
index 6f9a947..428fbb2 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.control
+++ b/contrib/pg_stat_statements/pg_stat_statements.control
@@ -1,5 +1,5 @@
 # pg_stat_statements extension
 comment = 'track execution statistics of all SQL statements executed'
-default_version = '1.0'
+default_version = '1.1'
 module_pathname = '$libdir/pg_stat_statements'
 relocatable = true
diff --git a/doc/src/sgml/pgstatstatements.sgml b/doc/src/sgml/pgstatstatements.sgml
index 5a0230c..ab34ca1 100644
--- a/doc/src/sgml/pgstatstatements.sgml
+++ b/doc/src/sgml/pgstatstatements.sgml
@@ -100,6 +100,13 @@
      </row>
 
      <row>
+      <entry><structfield>shared_blks_dirtied</structfield></entry>
+      <entry><type>bigint</type></entry>
+      <entry></entry>
+      <entry>Total number of shared blocks dirtied by the statement</entry>
+     </row>
+
+     <row>
       <entry><structfield>shared_blks_written</structfield></entry>
       <entry><type>bigint</type></entry>
       <entry></entry>
@@ -121,6 +128,13 @@
      </row>
 
      <row>
+      <entry><structfield>local_blks_dirtied</structfield></entry>
+      <entry><type>bigint</type></entry>
+      <entry></entry>
+      <entry>Total number of local blocks dirtied by the statement</entry>
+     </row>
+
+     <row>
       <entry><structfield>local_blks_written</structfield></entry>
       <entry><type>bigint</type></entry>
       <entry></entry>
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index e38de5c..b2c4bfb 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -1144,12 +1144,14 @@ ExplainNode(PlanState *planstate, List *ancestors,
 		{
 			bool		has_shared = (usage->shared_blks_hit > 0 ||
 									  usage->shared_blks_read > 0 ||
-									  usage->shared_blks_written);
+									  usage->shared_blks_dirtied > 0 ||
+									  usage->shared_blks_written > 0);
 			bool		has_local = (usage->local_blks_hit > 0 ||
 									 usage->local_blks_read > 0 ||
-									 usage->local_blks_written);
+									 usage->local_blks_dirtied > 0 ||
+									 usage->local_blks_written > 0);
 			bool		has_temp = (usage->temp_blks_read > 0 ||
-									usage->temp_blks_written);
+									usage->temp_blks_written > 0);
 
 			/* Show only positive counter values. */
 			if (has_shared || has_local || has_temp)
@@ -1166,6 +1168,9 @@ ExplainNode(PlanState *planstate, List *ancestors,
 					if (usage->shared_blks_read > 0)
 						appendStringInfo(es->str, " read=%ld",
 										 usage->shared_blks_read);
+					if (usage->shared_blks_dirtied > 0)
+						appendStringInfo(es->str, " dirtied=%ld",
+										 usage->shared_blks_dirtied);
 					if (usage->shared_blks_written > 0)
 						appendStringInfo(es->str, " written=%ld",
 										 usage->shared_blks_written);
@@ -1181,6 +1186,9 @@ ExplainNode(PlanState *planstate, List *ancestors,
 					if (usage->local_blks_read > 0)
 						appendStringInfo(es->str, " read=%ld",
 										 usage->local_blks_read);
+					if (usage->local_blks_dirtied > 0)
+						appendStringInfo(es->str, " dirtied=%ld",
+										 usage->local_blks_dirtied);
 					if (usage->local_blks_written > 0)
 						appendStringInfo(es->str, " written=%ld",
 										 usage->local_blks_written);
diff --git a/src/backend/executor/instrument.c b/src/backend/executor/instrument.c
index 9d30200..b34df2a 100644
--- a/src/backend/executor/instrument.c
+++ b/src/backend/executor/instrument.c
@@ -129,9 +129,11 @@ BufferUsageAccumDiff(BufferUsage *dst,
 {
 	dst->shared_blks_hit += add->shared_blks_hit - sub->shared_blks_hit;
 	dst->shared_blks_read += add->shared_blks_read - sub->shared_blks_read;
+	dst->shared_blks_dirtied += add->shared_blks_dirtied - sub->shared_blks_dirtied;
 	dst->shared_blks_written += add->shared_blks_written - sub->shared_blks_written;
 	dst->local_blks_hit += add->local_blks_hit - sub->local_blks_hit;
 	dst->local_blks_read += add->local_blks_read - sub->local_blks_read;
+	dst->local_blks_dirtied += add->local_blks_dirtied - sub->local_blks_dirtied;
 	dst->local_blks_written += add->local_blks_written - sub->local_blks_written;
 	dst->temp_blks_read += add->temp_blks_read - sub->temp_blks_read;
 	dst->temp_blks_written += add->temp_blks_written - sub->temp_blks_written;
diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index 71fe8c6..fbbeb22 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -979,6 +979,7 @@ MarkBufferDirty(Buffer buffer)
 	if (!(bufHdr->flags & BM_DIRTY))
 	{
 		VacuumPageDirty++;
+		pgBufferUsage.shared_blks_dirtied++;
 		if (VacuumCostActive)
 			VacuumCostBalance += VacuumCostPageDirty;
 	}
diff --git a/src/backend/storage/buffer/localbuf.c b/src/backend/storage/buffer/localbuf.c
index 6acb6eb..1806cd6 100644
--- a/src/backend/storage/buffer/localbuf.c
+++ b/src/backend/storage/buffer/localbuf.c
@@ -276,6 +276,10 @@ MarkLocalBufferDirty(Buffer buffer)
 	Assert(LocalRefCount[bufid] > 0);
 
 	bufHdr = &LocalBufferDescriptors[bufid];
+
+	if (!(bufHdr->flags & BM_DIRTY))
+		pgBufferUsage.local_blks_dirtied++;
+
 	bufHdr->flags |= BM_DIRTY;
 }
 
diff --git a/src/include/executor/instrument.h b/src/include/executor/instrument.h
index 22c3106..1eb53d5 100644
--- a/src/include/executor/instrument.h
+++ b/src/include/executor/instrument.h
@@ -20,9 +20,11 @@ typedef struct BufferUsage
 {
 	long		shared_blks_hit;	/* # of shared buffer hits */
 	long		shared_blks_read;		/* # of shared disk blocks read */
+	long		shared_blks_dirtied;	/* # of shared blocks dirtied */
 	long		shared_blks_written;	/* # of shared disk blocks written */
 	long		local_blks_hit; /* # of local buffer hits */
 	long		local_blks_read;	/* # of local disk blocks read */
+	long		local_blks_dirtied;		/* # of shared blocks dirtied */
 	long		local_blks_written;		/* # of local disk blocks written */
 	long		temp_blks_read; /* # of temp blocks read */
 	long		temp_blks_written;		/* # of temp blocks written */

#33

Greg Smith

greg@2ndQuadrant.com

about 14 years ago

In reply to: Alvaro Herrera (#31)

Re: Displaying accumulated autovacuum cost

On 11/25/2011 08:39 AM, Alvaro Herrera wrote:

I was about to commit this when I noticed that the avg values may not be
all that representative of reality after all; consider that it's
computed across the whole duration of the vacuuming operation, including
the index scans ... it'd be possibly useful to keep separate timings for
the heap scan (which is likely to use I/O more quickly) from index
scans. That way you can tune for the max, not a possibly misleading
average. That's a much larger change though, so I'm not going to get
into it.

Yes, that's one of the interesting additional things to track. At the
point I was writing this originally, things like "Single Pass VACUUM"
were still on the table; maybe it still is. Didn't seem worthwhile to
try and get any more detailed than the average when the underlying work
might still be whacked around. Just having some useful numbers on both
costs and speeds for the first time was such an improvement I didn't
want to get too far lost on chasing perfection here, at the risk of not
getting anything done.

One funny thing in the test I did was that the buffer count might add to
a much larger amount than total disk pages:

Good catch, I didn't think about the ramifications of multi-pass work
here. I'll revisit that and see if I can do a bit better before the
final 9.2 CommitFest; I'm not done with this area yet. I kind of needed
this basic logging patch in place before it's easy to experiment with
behavior changes and see what they do though.

#34

Greg Smith

greg@2ndQuadrant.com

about 14 years ago

In reply to: Robert Haas (#32)

Re: Displaying accumulated autovacuum cost

On 11/25/2011 05:10 PM, Robert Haas wrote:

This patch adds a count of the number of buffers dirtied to VACUUM,
but it strikes me that it would be useful to add similar tracking to
pgBufferUsage. Attached is a patch for that. You can see the new
counters through pg_stat_statements or with EXPLAIN (ANALYZE,
BUFFERS).

This looks pretty useful to me. I just threw it into the current
CommitFest, on the basis that there's already so many other thing trying
to whack around pg_stat_statements right now we might as well keep them
together. Let me clear my queue of patches submitted on time I need to
do something with (re-review Scott Mead's pg_stat_activity change,
respond to Jeff Janes on relation free space) and I'll take a quick spin
on this one.

#35

Fujii Masao

masao.fujii@gmail.com

almost 14 years ago

In reply to: Robert Haas (#32)

Re: Displaying accumulated autovacuum cost

On Sat, Nov 26, 2011 at 10:10 AM, Robert Haas <robertmhaas@gmail.com> wrote:

On Fri, Nov 25, 2011 at 11:39 AM, Alvaro Herrera
<alvherre@commandprompt.com> wrote:

I'm going to push this now anyway, thanks.

This patch adds a count of the number of buffers dirtied to VACUUM,
but it strikes me that it would be useful to add similar tracking to
pgBufferUsage. Attached is a patch for that. You can see the new
counters through pg_stat_statements or with EXPLAIN (ANALYZE,
BUFFERS). This is useful because the number of buffers that a query
*writes* doesn't necessarily have much to do with anything - it may
end up writing buffers dirtied by other queries while being read-only
itself, or conversely it may not write anything at all even though it
dirties quite a bit.

Thoughts? Comments? Objections?

Here are review comments:

The document about EXPLAIN needs to be updated.

You forgot to add the long-integer-valued property of shared/local_blks_dirtied.
So when I ran EXPLAIN and used json as a format, no information about
blks_dirtied
was reported.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

#36

Robert Haas

robertmhaas@gmail.com

almost 14 years ago

In reply to: Fujii Masao (#35)

1 attachment(s)

Re: Displaying accumulated autovacuum cost

On Fri, Feb 17, 2012 at 5:04 AM, Fujii Masao <masao.fujii@gmail.com> wrote:

Here are review comments:

The document about EXPLAIN needs to be updated.

You forgot to add the long-integer-valued property of shared/local_blks_dirtied.
So when I ran EXPLAIN and used json as a format, no information about
blks_dirtied
was reported.

Thanks for the review. Updated patch attached.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachments:

dirty_buffers_v2.patchapplication/octet-stream; name=dirty_buffers_v2.patchDownload

diff --git a/contrib/pg_stat_statements/Makefile b/contrib/pg_stat_statements/Makefile
index e086fd8..e8aed61 100644
--- a/contrib/pg_stat_statements/Makefile
+++ b/contrib/pg_stat_statements/Makefile
@@ -4,7 +4,8 @@ MODULE_big = pg_stat_statements
 OBJS = pg_stat_statements.o
 
 EXTENSION = pg_stat_statements
-DATA = pg_stat_statements--1.0.sql pg_stat_statements--unpackaged--1.0.sql
+DATA = pg_stat_statements--1.1.sql pg_stat_statements--1.0--1.1.sql \
+	pg_stat_statements--unpackaged--1.0.sql
 
 ifdef USE_PGXS
 PG_CONFIG = pg_config
diff --git a/contrib/pg_stat_statements/pg_stat_statements--1.0--1.1.sql b/contrib/pg_stat_statements/pg_stat_statements--1.0--1.1.sql
new file mode 100644
index 0000000..0086338
--- /dev/null
+++ b/contrib/pg_stat_statements/pg_stat_statements--1.0--1.1.sql
@@ -0,0 +1,37 @@
+/* contrib/pg_stat_statements/pg_stat_statements--1.0--1.1.sql */
+
+/* First we have to remove them from the extension */
+ALTER EXTENSION pg_stat_statements DROP VIEW pg_stat_statements;
+ALTER EXTENSION pg_stat_statements DROP FUNCTION pg_stat_statements();
+
+/* Then we can drop them */
+DROP VIEW pg_stat_statements;
+DROP FUNCTION pg_stat_statements();
+
+/* Now redefine */
+CREATE FUNCTION pg_stat_statements(
+    OUT userid oid,
+    OUT dbid oid,
+    OUT query text,
+    OUT calls int8,
+    OUT total_time float8,
+    OUT rows int8,
+    OUT shared_blks_hit int8,
+    OUT shared_blks_read int8,
+    OUT shared_blks_dirtied int8,
+    OUT shared_blks_written int8,
+    OUT local_blks_hit int8,
+    OUT local_blks_read int8,
+    OUT local_blks_dirtied int8,
+    OUT local_blks_written int8,
+    OUT temp_blks_read int8,
+    OUT temp_blks_written int8
+)
+RETURNS SETOF record
+AS 'MODULE_PATHNAME'
+LANGUAGE C;
+
+CREATE VIEW pg_stat_statements AS
+  SELECT * FROM pg_stat_statements();
+
+GRANT SELECT ON pg_stat_statements TO PUBLIC;
diff --git a/contrib/pg_stat_statements/pg_stat_statements--1.1.sql b/contrib/pg_stat_statements/pg_stat_statements--1.1.sql
new file mode 100644
index 0000000..77eff6b
--- /dev/null
+++ b/contrib/pg_stat_statements/pg_stat_statements--1.1.sql
@@ -0,0 +1,41 @@
+/* contrib/pg_stat_statements/pg_stat_statements--1.0.sql */
+
+-- complain if script is sourced in psql, rather than via CREATE EXTENSION
+\echo Use "CREATE EXTENSION pg_stat_statements" to load this file. \quit
+
+-- Register functions.
+CREATE FUNCTION pg_stat_statements_reset()
+RETURNS void
+AS 'MODULE_PATHNAME'
+LANGUAGE C;
+
+CREATE FUNCTION pg_stat_statements(
+    OUT userid oid,
+    OUT dbid oid,
+    OUT query text,
+    OUT calls int8,
+    OUT total_time float8,
+    OUT rows int8,
+    OUT shared_blks_hit int8,
+    OUT shared_blks_read int8,
+    OUT shared_blks_dirtied int8,
+    OUT shared_blks_written int8,
+    OUT local_blks_hit int8,
+    OUT local_blks_read int8,
+    OUT local_blks_dirtied int8,
+    OUT local_blks_written int8,
+    OUT temp_blks_read int8,
+    OUT temp_blks_written int8
+)
+RETURNS SETOF record
+AS 'MODULE_PATHNAME'
+LANGUAGE C;
+
+-- Register a view on the function for ease of use.
+CREATE VIEW pg_stat_statements AS
+  SELECT * FROM pg_stat_statements();
+
+GRANT SELECT ON pg_stat_statements TO PUBLIC;
+
+-- Don't want this to be available to non-superusers.
+REVOKE ALL ON FUNCTION pg_stat_statements_reset() FROM PUBLIC;
diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index 434aa71..914fbf2 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -77,9 +77,11 @@ typedef struct Counters
 	int64		rows;			/* total # of retrieved or affected rows */
 	int64		shared_blks_hit;	/* # of shared buffer hits */
 	int64		shared_blks_read;		/* # of shared disk blocks read */
+	int64		shared_blks_dirtied;	/* # of shared disk blocks dirtied */
 	int64		shared_blks_written;	/* # of shared disk blocks written */
 	int64		local_blks_hit; /* # of local buffer hits */
 	int64		local_blks_read;	/* # of local disk blocks read */
+	int64		local_blks_dirtied;		/* # of local disk blocks dirtied */
 	int64		local_blks_written;		/* # of local disk blocks written */
 	int64		temp_blks_read; /* # of temp blocks read */
 	int64		temp_blks_written;		/* # of temp blocks written */
@@ -652,12 +654,16 @@ pgss_ProcessUtility(Node *parsetree, const char *queryString,
 			pgBufferUsage.shared_blks_hit - bufusage.shared_blks_hit;
 		bufusage.shared_blks_read =
 			pgBufferUsage.shared_blks_read - bufusage.shared_blks_read;
+		bufusage.shared_blks_dirtied =
+			pgBufferUsage.shared_blks_dirtied - bufusage.shared_blks_dirtied;
 		bufusage.shared_blks_written =
 			pgBufferUsage.shared_blks_written - bufusage.shared_blks_written;
 		bufusage.local_blks_hit =
 			pgBufferUsage.local_blks_hit - bufusage.local_blks_hit;
 		bufusage.local_blks_read =
 			pgBufferUsage.local_blks_read - bufusage.local_blks_read;
+		bufusage.local_blks_dirtied =
+			pgBufferUsage.local_blks_dirtied - bufusage.local_blks_dirtied;
 		bufusage.local_blks_written =
 			pgBufferUsage.local_blks_written - bufusage.local_blks_written;
 		bufusage.temp_blks_read =
@@ -766,9 +772,11 @@ pgss_store(const char *query, double total_time, uint64 rows,
 		e->counters.rows += rows;
 		e->counters.shared_blks_hit += bufusage->shared_blks_hit;
 		e->counters.shared_blks_read += bufusage->shared_blks_read;
+		e->counters.shared_blks_dirtied += bufusage->shared_blks_dirtied;
 		e->counters.shared_blks_written += bufusage->shared_blks_written;
 		e->counters.local_blks_hit += bufusage->local_blks_hit;
 		e->counters.local_blks_read += bufusage->local_blks_read;
+		e->counters.local_blks_dirtied += bufusage->local_blks_dirtied;
 		e->counters.local_blks_written += bufusage->local_blks_written;
 		e->counters.temp_blks_read += bufusage->temp_blks_read;
 		e->counters.temp_blks_written += bufusage->temp_blks_written;
@@ -793,7 +801,8 @@ pg_stat_statements_reset(PG_FUNCTION_ARGS)
 	PG_RETURN_VOID();
 }
 
-#define PG_STAT_STATEMENTS_COLS		14
+#define PG_STAT_STATEMENTS_COLS_V1_0	14
+#define PG_STAT_STATEMENTS_COLS			16
 
 /*
  * Retrieve statement statistics.
@@ -810,6 +819,7 @@ pg_stat_statements(PG_FUNCTION_ARGS)
 	bool		is_superuser = superuser();
 	HASH_SEQ_STATUS hash_seq;
 	pgssEntry  *entry;
+	bool		sql_supports_dirty_counters = true;
 
 	if (!pgss || !pgss_hash)
 		ereport(ERROR,
@@ -830,6 +840,8 @@ pg_stat_statements(PG_FUNCTION_ARGS)
 	/* Build a tuple descriptor for our result type */
 	if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE)
 		elog(ERROR, "return type must be a row type");
+	if (tupdesc->natts == PG_STAT_STATEMENTS_COLS_V1_0)
+		sql_supports_dirty_counters = false;
 
 	per_query_ctx = rsinfo->econtext->ecxt_per_query_memory;
 	oldcontext = MemoryContextSwitchTo(per_query_ctx);
@@ -887,14 +899,19 @@ pg_stat_statements(PG_FUNCTION_ARGS)
 		values[i++] = Int64GetDatumFast(tmp.rows);
 		values[i++] = Int64GetDatumFast(tmp.shared_blks_hit);
 		values[i++] = Int64GetDatumFast(tmp.shared_blks_read);
+		if (sql_supports_dirty_counters)
+			values[i++] = Int64GetDatumFast(tmp.shared_blks_dirtied);
 		values[i++] = Int64GetDatumFast(tmp.shared_blks_written);
 		values[i++] = Int64GetDatumFast(tmp.local_blks_hit);
 		values[i++] = Int64GetDatumFast(tmp.local_blks_read);
+		if (sql_supports_dirty_counters)
+			values[i++] = Int64GetDatumFast(tmp.local_blks_dirtied);
 		values[i++] = Int64GetDatumFast(tmp.local_blks_written);
 		values[i++] = Int64GetDatumFast(tmp.temp_blks_read);
 		values[i++] = Int64GetDatumFast(tmp.temp_blks_written);
 
-		Assert(i == PG_STAT_STATEMENTS_COLS);
+		Assert(i == sql_supports_dirty_counters ? \
+			PG_STAT_STATEMENTS_COLS : PG_STAT_STATEMENTS_COLS_V1_0);
 
 		tuplestore_putvalues(tupstore, tupdesc, values, nulls);
 	}
diff --git a/contrib/pg_stat_statements/pg_stat_statements.control b/contrib/pg_stat_statements/pg_stat_statements.control
index 6f9a947..428fbb2 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.control
+++ b/contrib/pg_stat_statements/pg_stat_statements.control
@@ -1,5 +1,5 @@
 # pg_stat_statements extension
 comment = 'track execution statistics of all SQL statements executed'
-default_version = '1.0'
+default_version = '1.1'
 module_pathname = '$libdir/pg_stat_statements'
 relocatable = true
diff --git a/doc/src/sgml/pgstatstatements.sgml b/doc/src/sgml/pgstatstatements.sgml
index 5a0230c..ab34ca1 100644
--- a/doc/src/sgml/pgstatstatements.sgml
+++ b/doc/src/sgml/pgstatstatements.sgml
@@ -100,6 +100,13 @@
      </row>
 
      <row>
+      <entry><structfield>shared_blks_dirtied</structfield></entry>
+      <entry><type>bigint</type></entry>
+      <entry></entry>
+      <entry>Total number of shared blocks dirtied by the statement</entry>
+     </row>
+
+     <row>
       <entry><structfield>shared_blks_written</structfield></entry>
       <entry><type>bigint</type></entry>
       <entry></entry>
@@ -121,6 +128,13 @@
      </row>
 
      <row>
+      <entry><structfield>local_blks_dirtied</structfield></entry>
+      <entry><type>bigint</type></entry>
+      <entry></entry>
+      <entry>Total number of local blocks dirtied by the statement</entry>
+     </row>
+
+     <row>
       <entry><structfield>local_blks_written</structfield></entry>
       <entry><type>bigint</type></entry>
       <entry></entry>
diff --git a/doc/src/sgml/ref/explain.sgml b/doc/src/sgml/ref/explain.sgml
index 419b72c..1f35a1d 100644
--- a/doc/src/sgml/ref/explain.sgml
+++ b/doc/src/sgml/ref/explain.sgml
@@ -155,14 +155,20 @@ ROLLBACK;
     <listitem>
      <para>
       Include information on buffer usage. Specifically, include the number of
-      shared blocks hits, reads, and writes, the number of local blocks hits,
-      reads, and writes, and the number of temp blocks reads and writes.
-      A <quote>hit</> means that a read was avoided because the block was
+      shared blocks hit, read, dirtied, and written, the number of local blocks
+      hit, read, dirtied, and written, and the number of temp blocks read and
+      written.
+      A <emphasis>hit</> means that a read was avoided because the block was
       found already in cache when needed.
       Shared blocks contain data from regular tables and indexes;
       local blocks contain data from temporary tables and indexes;
       while temp blocks contain short-term working data used in sorts, hashes,
       Materialize plan nodes, and similar cases.
+      The number of blocks <emphasis>dirtied</> indicates the number of
+      previously unmodified blocks that were changed by this query; while the
+      number of blocks <emphasis>written</> indicates the number of
+      previously-dirtied blocks evicted from cache by this backend during
+      query processing.
       The number of blocks shown for an
       upper-level node includes those used by all its child nodes.  In text
       format, only non-zero values are printed.  This parameter may only be
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index a1692f8..93b1f34 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -1183,12 +1183,14 @@ ExplainNode(PlanState *planstate, List *ancestors,
 		{
 			bool		has_shared = (usage->shared_blks_hit > 0 ||
 									  usage->shared_blks_read > 0 ||
-									  usage->shared_blks_written);
+									  usage->shared_blks_dirtied > 0 ||
+									  usage->shared_blks_written > 0);
 			bool		has_local = (usage->local_blks_hit > 0 ||
 									 usage->local_blks_read > 0 ||
-									 usage->local_blks_written);
+									 usage->local_blks_dirtied > 0 ||
+									 usage->local_blks_written > 0);
 			bool		has_temp = (usage->temp_blks_read > 0 ||
-									usage->temp_blks_written);
+									usage->temp_blks_written > 0);
 
 			/* Show only positive counter values. */
 			if (has_shared || has_local || has_temp)
@@ -1205,6 +1207,9 @@ ExplainNode(PlanState *planstate, List *ancestors,
 					if (usage->shared_blks_read > 0)
 						appendStringInfo(es->str, " read=%ld",
 										 usage->shared_blks_read);
+					if (usage->shared_blks_dirtied > 0)
+						appendStringInfo(es->str, " dirtied=%ld",
+										 usage->shared_blks_dirtied);
 					if (usage->shared_blks_written > 0)
 						appendStringInfo(es->str, " written=%ld",
 										 usage->shared_blks_written);
@@ -1220,6 +1225,9 @@ ExplainNode(PlanState *planstate, List *ancestors,
 					if (usage->local_blks_read > 0)
 						appendStringInfo(es->str, " read=%ld",
 										 usage->local_blks_read);
+					if (usage->local_blks_dirtied > 0)
+						appendStringInfo(es->str, " dirtied=%ld",
+										 usage->local_blks_dirtied);
 					if (usage->local_blks_written > 0)
 						appendStringInfo(es->str, " written=%ld",
 										 usage->local_blks_written);
@@ -1243,9 +1251,11 @@ ExplainNode(PlanState *planstate, List *ancestors,
 		{
 			ExplainPropertyLong("Shared Hit Blocks", usage->shared_blks_hit, es);
 			ExplainPropertyLong("Shared Read Blocks", usage->shared_blks_read, es);
+			ExplainPropertyLong("Shared Dirtied Blocks", usage->shared_blks_dirtied, es);
 			ExplainPropertyLong("Shared Written Blocks", usage->shared_blks_written, es);
 			ExplainPropertyLong("Local Hit Blocks", usage->local_blks_hit, es);
 			ExplainPropertyLong("Local Read Blocks", usage->local_blks_read, es);
+			ExplainPropertyLong("Local Dirtied Blocks", usage->local_blks_dirtied, es);
 			ExplainPropertyLong("Local Written Blocks", usage->local_blks_written, es);
 			ExplainPropertyLong("Temp Read Blocks", usage->temp_blks_read, es);
 			ExplainPropertyLong("Temp Written Blocks", usage->temp_blks_written, es);
diff --git a/src/backend/executor/instrument.c b/src/backend/executor/instrument.c
index 2c749b1..6e9f450 100644
--- a/src/backend/executor/instrument.c
+++ b/src/backend/executor/instrument.c
@@ -137,9 +137,11 @@ BufferUsageAccumDiff(BufferUsage *dst,
 {
 	dst->shared_blks_hit += add->shared_blks_hit - sub->shared_blks_hit;
 	dst->shared_blks_read += add->shared_blks_read - sub->shared_blks_read;
+	dst->shared_blks_dirtied += add->shared_blks_dirtied - sub->shared_blks_dirtied;
 	dst->shared_blks_written += add->shared_blks_written - sub->shared_blks_written;
 	dst->local_blks_hit += add->local_blks_hit - sub->local_blks_hit;
 	dst->local_blks_read += add->local_blks_read - sub->local_blks_read;
+	dst->local_blks_dirtied += add->local_blks_dirtied - sub->local_blks_dirtied;
 	dst->local_blks_written += add->local_blks_written - sub->local_blks_written;
 	dst->temp_blks_read += add->temp_blks_read - sub->temp_blks_read;
 	dst->temp_blks_written += add->temp_blks_written - sub->temp_blks_written;
diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index 1adb6d3..3924a51 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -988,6 +988,7 @@ MarkBufferDirty(Buffer buffer)
 	if (dirtied)
 	{
 		VacuumPageDirty++;
+		pgBufferUsage.shared_blks_dirtied++;
 		if (VacuumCostActive)
 			VacuumCostBalance += VacuumCostPageDirty;
 		if (ProcGlobal->bgwriterLatch)
diff --git a/src/backend/storage/buffer/localbuf.c b/src/backend/storage/buffer/localbuf.c
index 096d36a..63c14f7 100644
--- a/src/backend/storage/buffer/localbuf.c
+++ b/src/backend/storage/buffer/localbuf.c
@@ -276,6 +276,10 @@ MarkLocalBufferDirty(Buffer buffer)
 	Assert(LocalRefCount[bufid] > 0);
 
 	bufHdr = &LocalBufferDescriptors[bufid];
+
+	if (!(bufHdr->flags & BM_DIRTY))
+		pgBufferUsage.local_blks_dirtied++;
+
 	bufHdr->flags |= BM_DIRTY;
 }
 
diff --git a/src/include/executor/instrument.h b/src/include/executor/instrument.h
index 084302e..066f684 100644
--- a/src/include/executor/instrument.h
+++ b/src/include/executor/instrument.h
@@ -20,9 +20,11 @@ typedef struct BufferUsage
 {
 	long		shared_blks_hit;	/* # of shared buffer hits */
 	long		shared_blks_read;		/* # of shared disk blocks read */
+	long		shared_blks_dirtied;	/* # of shared blocks dirtied */
 	long		shared_blks_written;	/* # of shared disk blocks written */
 	long		local_blks_hit; /* # of local buffer hits */
 	long		local_blks_read;	/* # of local disk blocks read */
+	long		local_blks_dirtied;		/* # of shared blocks dirtied */
 	long		local_blks_written;		/* # of local disk blocks written */
 	long		temp_blks_read; /* # of temp blocks read */
 	long		temp_blks_written;		/* # of temp blocks written */

#37

Fujii Masao

masao.fujii@gmail.com

almost 14 years ago

In reply to: Robert Haas (#36)

Re: Displaying accumulated autovacuum cost

On Sat, Feb 18, 2012 at 2:16 AM, Robert Haas <robertmhaas@gmail.com> wrote:

On Fri, Feb 17, 2012 at 5:04 AM, Fujii Masao <masao.fujii@gmail.com> wrote:

Here are review comments:

The document about EXPLAIN needs to be updated.

You forgot to add the long-integer-valued property of shared/local_blks_dirtied.
So when I ran EXPLAIN and used json as a format, no information about
blks_dirtied
was reported.

Thanks for the review. Updated patch attached.

Thanks for updating the patch!

The patch looks good to me. But I have three minor comments:

In pg_stat_statements--1.1.sql
+/* contrib/pg_stat_statements/pg_stat_statements--1.0.sql */

Typo: s/1.0/1.1

In pg_stat_statements--1.0--1.1.sql, we should complain if script is sourced
in psql, as follows?

\echo Use "ALTER EXTENSION pg_stat_statements UPDATE TO '1.1'" to
load this file. \quit

+DATA = pg_stat_statements--1.1.sql pg_stat_statements--1.0--1.1.sql \
+	pg_stat_statements--unpackaged--1.0.sql

Though I'm not familiar with CREATE EXTENSION. Why did you exclude 1.0.sql
from DATA? In hstore/Makefile, 1.0.sql is included. You think we should prevent
old version (i.e., 1.0) of pg_stat_statements from being used in 9.2?

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

#38

Robert Haas

robertmhaas@gmail.com

almost 14 years ago

In reply to: Fujii Masao (#37)

Re: Displaying accumulated autovacuum cost

On Mon, Feb 20, 2012 at 2:49 AM, Fujii Masao <masao.fujii@gmail.com> wrote:

In pg_stat_statements--1.0--1.1.sql, we should complain if script is sourced
in psql, as follows?

\echo Use "ALTER EXTENSION pg_stat_statements UPDATE TO '1.1'" to
load this file. \quit

Yeah, maybe. I don't know if we want to put that in every file
forever, but I guess it probably makes sense to do it here.

+DATA = pg_stat_statements--1.1.sql pg_stat_statements--1.0--1.1.sql \
+       pg_stat_statements--unpackaged--1.0.sql
Though I'm not familiar with CREATE EXTENSION. Why did you exclude 1.0.sql
from DATA? In hstore/Makefile, 1.0.sql is included. You think we should prevent
old version (i.e., 1.0) of pg_stat_statements from being used in 9.2?

I'm not sure. My feeling is that we probably don't want to ship all
the old scripts forever. People should install the latest version,
and use the upgrade scripts to get there if they have an older one.
So my gut feeling here is to change hstore to exclude that file rather
than adding it here. Any other opinions?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#39

Fujii Masao

masao.fujii@gmail.com

almost 14 years ago

In reply to: Robert Haas (#38)

Re: Displaying accumulated autovacuum cost

On Mon, Feb 20, 2012 at 11:00 PM, Robert Haas <robertmhaas@gmail.com> wrote:

On Mon, Feb 20, 2012 at 2:49 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
+DATA = pg_stat_statements--1.1.sql pg_stat_statements--1.0--1.1.sql \
+       pg_stat_statements--unpackaged--1.0.sql
Though I'm not familiar with CREATE EXTENSION. Why did you exclude 1.0.sql
from DATA? In hstore/Makefile, 1.0.sql is included. You think we should prevent
old version (i.e., 1.0) of pg_stat_statements from being used in 9.2?
I'm not sure. My feeling is that we probably don't want to ship all
the old scripts forever. People should install the latest version,
and use the upgrade scripts to get there if they have an older one.
So my gut feeling here is to change hstore to exclude that file rather
than adding it here. Any other opinions?

Agreed. But I wonder why VERSION option is usable in CREATE EXTENSION
if people always should use the latest version. Maybe I'm missing something..
Anyway, shipping v1.0 of pg_stat_statement seems less useful in 9.2, so
I agree to exclude it.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

#40

Dimitri Fontaine

dimitri@2ndQuadrant.fr

almost 14 years ago

In reply to: Fujii Masao (#39)

1 attachment(s)

Re: Displaying accumulated autovacuum cost

Fujii Masao <masao.fujii@gmail.com> writes:

Though I'm not familiar with CREATE EXTENSION. Why did you exclude 1.0.sql
from DATA? In hstore/Makefile, 1.0.sql is included. You think we should prevent
old version (i.e., 1.0) of pg_stat_statements from being used in 9.2?

I'm not sure. My feeling is that we probably don't want to ship all
the old scripts forever. People should install the latest version,
and use the upgrade scripts to get there if they have an older one.
So my gut feeling here is to change hstore to exclude that file rather
than adding it here. Any other opinions?

The problem with the hstore scripts is that you had to copy the 1.0
script, change a couple of lines, and call that 1.1, and you also had to
provide the 1.0--1.1 script file.

The solution would be to be able to create hstore 1.1 from 1.0
automatically and I sent over a very simple patch to do that, albeit
after the deadline for the current CF (that's why it's not listed).

Maybe that's simple enough to be considered? (re-attaching here)

b/contrib/hstore/Makefile | 2
b/contrib/hstore/hstore.control | 1
b/src/backend/commands/extension.c | 83 +++--
contrib/hstore/hstore--1.1.sql | 524 -------------------------------------
4 files changed, 57 insertions(+), 553 deletions(-)

Agreed. But I wonder why VERSION option is usable in CREATE EXTENSION
if people always should use the latest version. Maybe I'm missing something..

I think not that many people are using 9.1 in production already. Also
bear in mind that the mechanism is not made only for contrib, it makes
sense to ship in-house procedure code as an extension too.

Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support

Attachments:

extension-default-full-version.v0.patchtext/x-patchDownload

diff --git a/contrib/hstore/Makefile b/contrib/hstore/Makefile
index e9e5e53..2d624d3 100644
--- a/contrib/hstore/Makefile
+++ b/contrib/hstore/Makefile
@@ -5,7 +5,7 @@ OBJS = hstore_io.o hstore_op.o hstore_gist.o hstore_gin.o hstore_compat.o \
 	crc32.o
 
 EXTENSION = hstore
-DATA = hstore--1.0.sql hstore--1.1.sql hstore--1.0--1.1.sql \
+DATA = hstore--1.0.sql hstore--1.1.sql.orig hstore--1.0--1.1.sql \
 	hstore--unpackaged--1.0.sql
 
 REGRESS = hstore
diff --git a/contrib/hstore/hstore--1.1.sql b/contrib/hstore/hstore--1.1.sql
deleted file mode 100644
index e95ad32..0000000
--- a/contrib/hstore/hstore--1.1.sql
+++ /dev/null
@@ -1,524 +0,0 @@
-/* contrib/hstore/hstore--1.1.sql */
-
--- complain if script is sourced in psql, rather than via CREATE EXTENSION
-\echo Use "CREATE EXTENSION hstore" to load this file. \quit
-
-CREATE TYPE hstore;
-
-CREATE FUNCTION hstore_in(cstring)
-RETURNS hstore
-AS 'MODULE_PATHNAME'
-LANGUAGE C STRICT IMMUTABLE;
-
-CREATE FUNCTION hstore_out(hstore)
-RETURNS cstring
-AS 'MODULE_PATHNAME'
-LANGUAGE C STRICT IMMUTABLE;
-
-CREATE FUNCTION hstore_recv(internal)
-RETURNS hstore
-AS 'MODULE_PATHNAME'
-LANGUAGE C STRICT IMMUTABLE;
-
-CREATE FUNCTION hstore_send(hstore)
-RETURNS bytea
-AS 'MODULE_PATHNAME'
-LANGUAGE C STRICT IMMUTABLE;
-
-CREATE TYPE hstore (
-        INTERNALLENGTH = -1,
-        INPUT = hstore_in,
-        OUTPUT = hstore_out,
-        RECEIVE = hstore_recv,
-        SEND = hstore_send,
-        STORAGE = extended
-);
-
-CREATE FUNCTION hstore_version_diag(hstore)
-RETURNS integer
-AS 'MODULE_PATHNAME','hstore_version_diag'
-LANGUAGE C STRICT IMMUTABLE;
-
-CREATE FUNCTION fetchval(hstore,text)
-RETURNS text
-AS 'MODULE_PATHNAME','hstore_fetchval'
-LANGUAGE C STRICT IMMUTABLE;
-
-CREATE OPERATOR -> (
-	LEFTARG = hstore,
-	RIGHTARG = text,
-	PROCEDURE = fetchval
-);
-
-CREATE FUNCTION slice_array(hstore,text[])
-RETURNS text[]
-AS 'MODULE_PATHNAME','hstore_slice_to_array'
-LANGUAGE C STRICT IMMUTABLE;
-
-CREATE OPERATOR -> (
-	LEFTARG = hstore,
-	RIGHTARG = text[],
-	PROCEDURE = slice_array
-);
-
-CREATE FUNCTION slice(hstore,text[])
-RETURNS hstore
-AS 'MODULE_PATHNAME','hstore_slice_to_hstore'
-LANGUAGE C STRICT IMMUTABLE;
-
-CREATE FUNCTION isexists(hstore,text)
-RETURNS bool
-AS 'MODULE_PATHNAME','hstore_exists'
-LANGUAGE C STRICT IMMUTABLE;
-
-CREATE FUNCTION exist(hstore,text)
-RETURNS bool
-AS 'MODULE_PATHNAME','hstore_exists'
-LANGUAGE C STRICT IMMUTABLE;
-
-CREATE OPERATOR ? (
-	LEFTARG = hstore,
-	RIGHTARG = text,
-	PROCEDURE = exist,
-	RESTRICT = contsel,
-	JOIN = contjoinsel
-);
-
-CREATE FUNCTION exists_any(hstore,text[])
-RETURNS bool
-AS 'MODULE_PATHNAME','hstore_exists_any'
-LANGUAGE C STRICT IMMUTABLE;
-
-CREATE OPERATOR ?| (
-	LEFTARG = hstore,
-	RIGHTARG = text[],
-	PROCEDURE = exists_any,
-	RESTRICT = contsel,
-	JOIN = contjoinsel
-);
-
-CREATE FUNCTION exists_all(hstore,text[])
-RETURNS bool
-AS 'MODULE_PATHNAME','hstore_exists_all'
-LANGUAGE C STRICT IMMUTABLE;
-
-CREATE OPERATOR ?& (
-	LEFTARG = hstore,
-	RIGHTARG = text[],
-	PROCEDURE = exists_all,
-	RESTRICT = contsel,
-	JOIN = contjoinsel
-);
-
-CREATE FUNCTION isdefined(hstore,text)
-RETURNS bool
-AS 'MODULE_PATHNAME','hstore_defined'
-LANGUAGE C STRICT IMMUTABLE;
-
-CREATE FUNCTION defined(hstore,text)
-RETURNS bool
-AS 'MODULE_PATHNAME','hstore_defined'
-LANGUAGE C STRICT IMMUTABLE;
-
-CREATE FUNCTION delete(hstore,text)
-RETURNS hstore
-AS 'MODULE_PATHNAME','hstore_delete'
-LANGUAGE C STRICT IMMUTABLE;
-
-CREATE FUNCTION delete(hstore,text[])
-RETURNS hstore
-AS 'MODULE_PATHNAME','hstore_delete_array'
-LANGUAGE C STRICT IMMUTABLE;
-
-CREATE FUNCTION delete(hstore,hstore)
-RETURNS hstore
-AS 'MODULE_PATHNAME','hstore_delete_hstore'
-LANGUAGE C STRICT IMMUTABLE;
-
-CREATE OPERATOR - (
-	LEFTARG = hstore,
-	RIGHTARG = text,
-	PROCEDURE = delete
-);
-
-CREATE OPERATOR - (
-	LEFTARG = hstore,
-	RIGHTARG = text[],
-	PROCEDURE = delete
-);
-
-CREATE OPERATOR - (
-	LEFTARG = hstore,
-	RIGHTARG = hstore,
-	PROCEDURE = delete
-);
-
-CREATE FUNCTION hs_concat(hstore,hstore)
-RETURNS hstore
-AS 'MODULE_PATHNAME','hstore_concat'
-LANGUAGE C STRICT IMMUTABLE;
-
-CREATE OPERATOR || (
-	LEFTARG = hstore,
-	RIGHTARG = hstore,
-	PROCEDURE = hs_concat
-);
-
-CREATE FUNCTION hs_contains(hstore,hstore)
-RETURNS bool
-AS 'MODULE_PATHNAME','hstore_contains'
-LANGUAGE C STRICT IMMUTABLE;
-
-CREATE FUNCTION hs_contained(hstore,hstore)
-RETURNS bool
-AS 'MODULE_PATHNAME','hstore_contained'
-LANGUAGE C STRICT IMMUTABLE;
-
-CREATE OPERATOR @> (
-	LEFTARG = hstore,
-	RIGHTARG = hstore,
-	PROCEDURE = hs_contains,
-	COMMUTATOR = '<@',
-	RESTRICT = contsel,
-	JOIN = contjoinsel
-);
-
-CREATE OPERATOR <@ (
-	LEFTARG = hstore,
-	RIGHTARG = hstore,
-	PROCEDURE = hs_contained,
-	COMMUTATOR = '@>',
-	RESTRICT = contsel,
-	JOIN = contjoinsel
-);
-
--- obsolete:
-CREATE OPERATOR @ (
-	LEFTARG = hstore,
-	RIGHTARG = hstore,
-	PROCEDURE = hs_contains,
-	COMMUTATOR = '~',
-	RESTRICT = contsel,
-	JOIN = contjoinsel
-);
-
-CREATE OPERATOR ~ (
-	LEFTARG = hstore,
-	RIGHTARG = hstore,
-	PROCEDURE = hs_contained,
-	COMMUTATOR = '@',
-	RESTRICT = contsel,
-	JOIN = contjoinsel
-);
-
-CREATE FUNCTION tconvert(text,text)
-RETURNS hstore
-AS 'MODULE_PATHNAME','hstore_from_text'
-LANGUAGE C IMMUTABLE; -- not STRICT; needs to allow (key,NULL)
-
-CREATE FUNCTION hstore(text,text)
-RETURNS hstore
-AS 'MODULE_PATHNAME','hstore_from_text'
-LANGUAGE C IMMUTABLE; -- not STRICT; needs to allow (key,NULL)
-
-CREATE FUNCTION hstore(text[],text[])
-RETURNS hstore
-AS 'MODULE_PATHNAME', 'hstore_from_arrays'
-LANGUAGE C IMMUTABLE; -- not STRICT; allows (keys,null)
-
-CREATE FUNCTION hstore(text[])
-RETURNS hstore
-AS 'MODULE_PATHNAME', 'hstore_from_array'
-LANGUAGE C IMMUTABLE STRICT;
-
-CREATE CAST (text[] AS hstore)
-  WITH FUNCTION hstore(text[]);
-
-CREATE FUNCTION hstore(record)
-RETURNS hstore
-AS 'MODULE_PATHNAME', 'hstore_from_record'
-LANGUAGE C IMMUTABLE; -- not STRICT; allows (null::recordtype)
-
-CREATE FUNCTION hstore_to_array(hstore)
-RETURNS text[]
-AS 'MODULE_PATHNAME','hstore_to_array'
-LANGUAGE C STRICT IMMUTABLE;
-
-CREATE OPERATOR %% (
-       RIGHTARG = hstore,
-       PROCEDURE = hstore_to_array
-);
-
-CREATE FUNCTION hstore_to_matrix(hstore)
-RETURNS text[]
-AS 'MODULE_PATHNAME','hstore_to_matrix'
-LANGUAGE C STRICT IMMUTABLE;
-
-CREATE OPERATOR %# (
-       RIGHTARG = hstore,
-       PROCEDURE = hstore_to_matrix
-);
-
-CREATE FUNCTION akeys(hstore)
-RETURNS text[]
-AS 'MODULE_PATHNAME','hstore_akeys'
-LANGUAGE C STRICT IMMUTABLE;
-
-CREATE FUNCTION avals(hstore)
-RETURNS text[]
-AS 'MODULE_PATHNAME','hstore_avals'
-LANGUAGE C STRICT IMMUTABLE;
-
-CREATE FUNCTION skeys(hstore)
-RETURNS setof text
-AS 'MODULE_PATHNAME','hstore_skeys'
-LANGUAGE C STRICT IMMUTABLE;
-
-CREATE FUNCTION svals(hstore)
-RETURNS setof text
-AS 'MODULE_PATHNAME','hstore_svals'
-LANGUAGE C STRICT IMMUTABLE;
-
-CREATE FUNCTION each(IN hs hstore,
-    OUT key text,
-    OUT value text)
-RETURNS SETOF record
-AS 'MODULE_PATHNAME','hstore_each'
-LANGUAGE C STRICT IMMUTABLE;
-
-CREATE FUNCTION populate_record(anyelement,hstore)
-RETURNS anyelement
-AS 'MODULE_PATHNAME', 'hstore_populate_record'
-LANGUAGE C IMMUTABLE; -- not STRICT; allows (null::rectype,hstore)
-
-CREATE OPERATOR #= (
-	LEFTARG = anyelement,
-	RIGHTARG = hstore,
-	PROCEDURE = populate_record
-);
-
--- btree support
-
-CREATE FUNCTION hstore_eq(hstore,hstore)
-RETURNS boolean
-AS 'MODULE_PATHNAME','hstore_eq'
-LANGUAGE C STRICT IMMUTABLE;
-
-CREATE FUNCTION hstore_ne(hstore,hstore)
-RETURNS boolean
-AS 'MODULE_PATHNAME','hstore_ne'
-LANGUAGE C STRICT IMMUTABLE;
-
-CREATE FUNCTION hstore_gt(hstore,hstore)
-RETURNS boolean
-AS 'MODULE_PATHNAME','hstore_gt'
-LANGUAGE C STRICT IMMUTABLE;
-
-CREATE FUNCTION hstore_ge(hstore,hstore)
-RETURNS boolean
-AS 'MODULE_PATHNAME','hstore_ge'
-LANGUAGE C STRICT IMMUTABLE;
-
-CREATE FUNCTION hstore_lt(hstore,hstore)
-RETURNS boolean
-AS 'MODULE_PATHNAME','hstore_lt'
-LANGUAGE C STRICT IMMUTABLE;
-
-CREATE FUNCTION hstore_le(hstore,hstore)
-RETURNS boolean
-AS 'MODULE_PATHNAME','hstore_le'
-LANGUAGE C STRICT IMMUTABLE;
-
-CREATE FUNCTION hstore_cmp(hstore,hstore)
-RETURNS integer
-AS 'MODULE_PATHNAME','hstore_cmp'
-LANGUAGE C STRICT IMMUTABLE;
-
-CREATE OPERATOR = (
-       LEFTARG = hstore,
-       RIGHTARG = hstore,
-       PROCEDURE = hstore_eq,
-       COMMUTATOR = =,
-       NEGATOR = <>,
-       RESTRICT = eqsel,
-       JOIN = eqjoinsel,
-       MERGES,
-       HASHES
-);
-CREATE OPERATOR <> (
-       LEFTARG = hstore,
-       RIGHTARG = hstore,
-       PROCEDURE = hstore_ne,
-       COMMUTATOR = <>,
-       NEGATOR = =,
-       RESTRICT = neqsel,
-       JOIN = neqjoinsel
-);
-
--- the comparison operators have funky names (and are undocumented)
--- in an attempt to discourage anyone from actually using them. they
--- only exist to support the btree opclass
-
-CREATE OPERATOR #<# (
-       LEFTARG = hstore,
-       RIGHTARG = hstore,
-       PROCEDURE = hstore_lt,
-       COMMUTATOR = #>#,
-       NEGATOR = #>=#,
-       RESTRICT = scalarltsel,
-       JOIN = scalarltjoinsel
-);
-CREATE OPERATOR #<=# (
-       LEFTARG = hstore,
-       RIGHTARG = hstore,
-       PROCEDURE = hstore_le,
-       COMMUTATOR = #>=#,
-       NEGATOR = #>#,
-       RESTRICT = scalarltsel,
-       JOIN = scalarltjoinsel
-);
-CREATE OPERATOR #># (
-       LEFTARG = hstore,
-       RIGHTARG = hstore,
-       PROCEDURE = hstore_gt,
-       COMMUTATOR = #<#,
-       NEGATOR = #<=#,
-       RESTRICT = scalargtsel,
-       JOIN = scalargtjoinsel
-);
-CREATE OPERATOR #>=# (
-       LEFTARG = hstore,
-       RIGHTARG = hstore,
-       PROCEDURE = hstore_ge,
-       COMMUTATOR = #<=#,
-       NEGATOR = #<#,
-       RESTRICT = scalargtsel,
-       JOIN = scalargtjoinsel
-);
-
-CREATE OPERATOR CLASS btree_hstore_ops
-DEFAULT FOR TYPE hstore USING btree
-AS
-	OPERATOR	1	#<# ,
-	OPERATOR	2	#<=# ,
-	OPERATOR	3	= ,
-	OPERATOR	4	#>=# ,
-	OPERATOR	5	#># ,
-	FUNCTION	1	hstore_cmp(hstore,hstore);
-
--- hash support
-
-CREATE FUNCTION hstore_hash(hstore)
-RETURNS integer
-AS 'MODULE_PATHNAME','hstore_hash'
-LANGUAGE C STRICT IMMUTABLE;
-
-CREATE OPERATOR CLASS hash_hstore_ops
-DEFAULT FOR TYPE hstore USING hash
-AS
-	OPERATOR	1	= ,
-	FUNCTION	1	hstore_hash(hstore);
-
--- GiST support
-
-CREATE TYPE ghstore;
-
-CREATE FUNCTION ghstore_in(cstring)
-RETURNS ghstore
-AS 'MODULE_PATHNAME'
-LANGUAGE C STRICT IMMUTABLE;
-
-CREATE FUNCTION ghstore_out(ghstore)
-RETURNS cstring
-AS 'MODULE_PATHNAME'
-LANGUAGE C STRICT IMMUTABLE;
-
-CREATE TYPE ghstore (
-        INTERNALLENGTH = -1,
-        INPUT = ghstore_in,
-        OUTPUT = ghstore_out
-);
-
-CREATE FUNCTION ghstore_compress(internal)
-RETURNS internal
-AS 'MODULE_PATHNAME'
-LANGUAGE C IMMUTABLE STRICT;
-
-CREATE FUNCTION ghstore_decompress(internal)
-RETURNS internal
-AS 'MODULE_PATHNAME'
-LANGUAGE C IMMUTABLE STRICT;
-
-CREATE FUNCTION ghstore_penalty(internal,internal,internal)
-RETURNS internal
-AS 'MODULE_PATHNAME'
-LANGUAGE C IMMUTABLE STRICT;
-
-CREATE FUNCTION ghstore_picksplit(internal, internal)
-RETURNS internal
-AS 'MODULE_PATHNAME'
-LANGUAGE C IMMUTABLE STRICT;
-
-CREATE FUNCTION ghstore_union(internal, internal)
-RETURNS internal
-AS 'MODULE_PATHNAME'
-LANGUAGE C IMMUTABLE STRICT;
-
-CREATE FUNCTION ghstore_same(internal, internal, internal)
-RETURNS internal
-AS 'MODULE_PATHNAME'
-LANGUAGE C IMMUTABLE STRICT;
-
-CREATE FUNCTION ghstore_consistent(internal,internal,int,oid,internal)
-RETURNS bool
-AS 'MODULE_PATHNAME'
-LANGUAGE C IMMUTABLE STRICT;
-
-CREATE OPERATOR CLASS gist_hstore_ops
-DEFAULT FOR TYPE hstore USING gist
-AS
-	OPERATOR        7       @> ,
-	OPERATOR        9       ?(hstore,text) ,
-	OPERATOR        10      ?|(hstore,text[]) ,
-	OPERATOR        11      ?&(hstore,text[]) ,
-        --OPERATOR        8       <@ ,
-        OPERATOR        13      @ ,
-        --OPERATOR        14      ~ ,
-        FUNCTION        1       ghstore_consistent (internal, internal, int, oid, internal),
-        FUNCTION        2       ghstore_union (internal, internal),
-        FUNCTION        3       ghstore_compress (internal),
-        FUNCTION        4       ghstore_decompress (internal),
-        FUNCTION        5       ghstore_penalty (internal, internal, internal),
-        FUNCTION        6       ghstore_picksplit (internal, internal),
-        FUNCTION        7       ghstore_same (internal, internal, internal),
-        STORAGE         ghstore;
-
--- GIN support
-
-CREATE FUNCTION gin_extract_hstore(internal, internal)
-RETURNS internal
-AS 'MODULE_PATHNAME'
-LANGUAGE C IMMUTABLE STRICT;
-
-CREATE FUNCTION gin_extract_hstore_query(internal, internal, int2, internal, internal)
-RETURNS internal
-AS 'MODULE_PATHNAME'
-LANGUAGE C IMMUTABLE STRICT;
-
-CREATE FUNCTION gin_consistent_hstore(internal, int2, internal, int4, internal, internal)
-RETURNS bool
-AS 'MODULE_PATHNAME'
-LANGUAGE C IMMUTABLE STRICT;
-
-CREATE OPERATOR CLASS gin_hstore_ops
-DEFAULT FOR TYPE hstore USING gin
-AS
-	OPERATOR        7       @>,
-	OPERATOR        9       ?(hstore,text),
-	OPERATOR        10      ?|(hstore,text[]),
-	OPERATOR        11      ?&(hstore,text[]),
-	FUNCTION        1       bttextcmp(text,text),
-	FUNCTION        2       gin_extract_hstore(internal, internal),
-	FUNCTION        3       gin_extract_hstore_query(internal, internal, int2, internal, internal),
-	FUNCTION        4       gin_consistent_hstore(internal, int2, internal, int4, internal, internal),
-	STORAGE         text;
diff --git a/contrib/hstore/hstore.control b/contrib/hstore/hstore.control
index 4104e17..0c49ade 100644
--- a/contrib/hstore/hstore.control
+++ b/contrib/hstore/hstore.control
@@ -1,5 +1,6 @@
 # hstore extension
 comment = 'data type for storing sets of (key, value) pairs'
 default_version = '1.1'
+default_full_version = '1.0'
 module_pathname = '$libdir/hstore'
 relocatable = true
diff --git a/src/backend/commands/extension.c b/src/backend/commands/extension.c
index 6ecbbc7..2fb1658 100644
--- a/src/backend/commands/extension.c
+++ b/src/backend/commands/extension.c
@@ -66,6 +66,7 @@ typedef struct ExtensionControlFile
 	char	   *name;			/* name of the extension */
 	char	   *directory;		/* directory for script files */
 	char	   *default_version;	/* default install target version, if any */
+	char	   *default_full_version;	/* default install source version, if any */
 	char	   *module_pathname;	/* string to substitute for MODULE_PATHNAME */
 	char	   *comment;		/* comment, if any */
 	char	   *schema;			/* target schema (allowed if !relocatable) */
@@ -505,6 +506,10 @@ parse_extension_control_file(ExtensionControlFile *control,
 
 			control->default_version = pstrdup(item->value);
 		}
+		else if (strcmp(item->name, "default_full_version") == 0)
+		{
+			control->default_full_version = pstrdup(item->value);
+		}
 		else if (strcmp(item->name, "module_pathname") == 0)
 		{
 			control->module_pathname = pstrdup(item->value);
@@ -1288,9 +1293,15 @@ CreateExtension(CreateExtensionStmt *stmt)
 	 * Determine the (unpackaged) version to update from, if any, and then
 	 * figure out what sequence of update scripts we need to apply.
 	 */
-	if (d_old_version && d_old_version->arg)
+	if ((d_old_version && d_old_version->arg) || pcontrol->default_full_version)
 	{
-		oldVersionName = strVal(d_old_version->arg);
+		bool unpackaged = (d_old_version && d_old_version->arg);
+
+		if (unpackaged)
+			oldVersionName = strVal(d_old_version->arg);
+		else
+			oldVersionName = pcontrol->default_full_version;
+
 		check_valid_version_name(oldVersionName);
 
 		if (strcmp(oldVersionName, versionName) == 0)
@@ -1303,24 +1314,28 @@ CreateExtension(CreateExtensionStmt *stmt)
 											  oldVersionName,
 											  versionName);
 
-		if (list_length(updateVersions) == 1)
-		{
-			/*
-			 * Simple case where there's just one update script to run. We
-			 * will not need any follow-on update steps.
-			 */
-			Assert(strcmp((char *) linitial(updateVersions), versionName) == 0);
-			updateVersions = NIL;
-		}
-		else
+		/* in the create from unpackaged case, redude the update list */
+		if (unpackaged)
 		{
-			/*
-			 * Multi-step sequence.  We treat this as installing the version
-			 * that is the target of the first script, followed by successive
-			 * updates to the later versions.
-			 */
-			versionName = (char *) linitial(updateVersions);
-			updateVersions = list_delete_first(updateVersions);
+			if (list_length(updateVersions) == 1)
+			{
+				/*
+				 * Simple case where there's just one update script to run. We
+				 * will not need any follow-on update steps.
+				 */
+				Assert(strcmp((char *) linitial(updateVersions), versionName) == 0);
+				updateVersions = NIL;
+			}
+			else
+			{
+				/*
+				 * Multi-step sequence.  We treat this as installing the version
+				 * that is the target of the first script, followed by successive
+				 * updates to the later versions.
+				 */
+				versionName = (char *) linitial(updateVersions);
+				updateVersions = list_delete_first(updateVersions);
+			}
 		}
 	}
 	else
@@ -1455,18 +1470,30 @@ CreateExtension(CreateExtensionStmt *stmt)
 
 	/*
 	 * Execute the installation script file
-	 */
-	execute_extension_script(extensionOid, control,
-							 oldVersionName, versionName,
-							 requiredSchemas,
-							 schemaName, schemaOid);
-
-	/*
+	 *
 	 * If additional update scripts have to be executed, apply the updates as
 	 * though a series of ALTER EXTENSION UPDATE commands were given
 	 */
-	ApplyExtensionUpdates(extensionOid, pcontrol,
-						  versionName, updateVersions);
+	if (pcontrol->default_full_version)
+	{
+		execute_extension_script(extensionOid, control,
+								 NULL, oldVersionName,
+								 requiredSchemas,
+								 schemaName, schemaOid);
+
+		ApplyExtensionUpdates(extensionOid, pcontrol,
+							  oldVersionName, updateVersions);
+	}
+	else
+	{
+		execute_extension_script(extensionOid, control,
+								 oldVersionName, versionName,
+								 requiredSchemas,
+								 schemaName, schemaOid);
+
+		ApplyExtensionUpdates(extensionOid, pcontrol,
+							  versionName, updateVersions);
+	}
 }
 
 /*

#41

Greg Smith

greg@2ndQuadrant.com

almost 14 years ago

In reply to: Dimitri Fontaine (#40)

Re: Displaying accumulated autovacuum cost

On 02/21/2012 04:44 PM, Dimitri Fontaine wrote:

The solution would be to be able to create hstore 1.1 from 1.0
automatically and I sent over a very simple patch to do that, albeit
after the deadline for the current CF (that's why it's not listed).

Maybe that's simple enough to be considered? (re-attaching here)

I can't find any message that matches this description--from you,
touches extensions in this way, and was submitted after the CF
deadline. Help? If that's something that ripples out to impacting how
changes to upgraded extensions should look, that should get its own
thread I think. This one is pretty deep and already far off its titled
topic.

--
Greg Smith 2ndQuadrant US greg@2ndQuadrant.com Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com

#42

Greg Smith

greg@2ndQuadrant.com

almost 14 years ago

In reply to: Robert Haas (#36)

Re: Displaying accumulated autovacuum cost

I just took this for spin. Everything I tried worked, docs built and
read fine. The description of how "dirty" differs from "written" is a
bit cryptic, but I don't see an easy way to do better without a whole
new section on that topic. Once the extension upgrade questions are
sorted out, I'd say this is ready to commit. Example I have at the
bottom here shows a case where this is a big improvement over the
existing tracking. I think this is a must-have improvement if we're
going to advocate using pg_stat_statements for more things.

This works as expected in all of the EXPLAIN forms, I tried all of the
supported formats. Sample of the text one:

$ psql -d pgbench -c "EXPLAIN (ANALYZE,BUFFERS,FORMAT text) UPDATE
pgbench_accounts SET aid=aid+0 WHERE aid<1000"
QUERY PLAN
----------
Update on pgbench_accounts (cost=0.00..86.09 rows=860 width=103)
(actual time=8.587..8.587 rows=0 loops=1)
Buffers: shared hit=8315 read=70 dirtied=16
-> Index Scan using pgbench_accounts_pkey on pgbench_accounts
(cost=0.00..86.09 rows=860 width=103) (actual time=0.017..2.086 rows=999
loops=1)
Index Cond: (aid < 1000)
Buffers: shared hit=1828 read=28
Total runtime: 8.654 ms

Also ran just the UPDATE statement alone, then retrieved the counts from
pg_stat_statements:

Note that there are no blocks shown as written there. That is also
demonstrated by the results after some pgbench "-M prepared" stress
testing against a small database. The pgbench tables are structured
such that the number of branches < tellers << accounts. On a small
scale database (I used 10 here), there might only be a single page of
branch data. That shows up clearly in the different amount of dirtied
blocks in each update:

$ psql -x -c "select
query,shared_blks_hit,shared_blks_read,shared_blks_dirtied,shared_blks_written
from pg_stat_statements order by calls desc limit 7"

Note how in the branches and tellers case, the existing "written"
counter shows 0. Those hot pages stay in cache the whole time with a
high usage count, backends never get to write them out; only the
checkpointer does. Only this new "dirtied" one reflects a useful write
count for frequently used pages like that, and it does show that more
pages are being touched by pgbench_tellers than pgbench_branches.

I'd never ran into this before because I normally test against larger
databases. But once I tried to find an example of this form, it was
easy to do so. Systems where much of the database fits into
shared_buffers in particular are likely to see a deceptively small write
count.

--
Greg Smith 2ndQuadrant US greg@2ndQuadrant.com Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com

#43

Dimitri Fontaine

dimitri@2ndQuadrant.fr

almost 14 years ago

In reply to: Greg Smith (#41)

Extensions default full version (was: Displaying accumulated autovacuum cost)

Greg Smith <greg@2ndQuadrant.com> writes:

On 02/21/2012 04:44 PM, Dimitri Fontaine wrote:

The solution would be to be able to create hstore 1.1 from 1.0
automatically and I sent over a very simple patch to do that, albeit
after the deadline for the current CF (that's why it's not listed).

Maybe that's simple enough to be considered? (re-attaching here)

I can't find any message that matches this description--from you, touches
extensions in this way, and was submitted after the CF deadline. Help? If

http://archives.postgresql.org/message-id/m2k44m9oyo.fsf@2ndQuadrant.fr

that's something that ripples out to impacting how changes to upgraded
extensions should look, that should get its own thread I think. This one is
pretty deep and already far off its titled topic.

Let's do that then. Well I'm just changing the subject here.
--
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support

#44

Robert Haas

robertmhaas@gmail.com

almost 14 years ago

In reply to: Greg Smith (#42)

Re: Displaying accumulated autovacuum cost

On Tue, Feb 21, 2012 at 11:55 PM, Greg Smith <greg@2ndquadrant.com> wrote:

I just took this for spin. Everything I tried worked, docs built and read
fine. The description of how "dirty" differs from "written" is a bit
cryptic, but I don't see an easy way to do better without a whole new
section on that topic. Once the extension upgrade questions are sorted out,
I'd say this is ready to commit. Example I have at the bottom here shows a
case where this is a big improvement over the existing tracking. I think
this is a must-have improvement if we're going to advocate using
pg_stat_statements for more things.

Glad you like it; committed.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company