widen vacuum buffer counters

Started by Alvaro Herreraalmost 6 years ago6 messages
#1Alvaro Herrera
alvherre@2ndquadrant.com
1 attachment(s)

We recently noticed that vacuum buffer counters wraparound in extreme
cases, with ridiculous results. Example:

2020-01-06 16:38:38.010 EST [45625-1] app= LOG: automatic vacuum of table "somtab.sf.foobar": index scans: 17
pages: 0 removed, 207650641 remain, 0 skipped due to pins, 13419403 skipped frozen
tuples: 141265419 removed, 3186614627 remain, 87783760 are dead but not yet removable
buffer usage: -2022059267 hits, -17141881 misses, 1252507767 dirtied
avg read rate: -0.043 MB/s, avg write rate: 3.146 MB/s
system usage: CPU 107819.92s/2932957.75u sec elapsed 3110498.10 sec

That's to be expected, as tables exist that are large enough for 4 billion
buffer accesses to be a possibility. Let's widen the counters, as in the
attached patch.

I propose to backpatch this.

--
�lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachments:

widen-vacuum-counts.patchtext/x-diff; charset=us-asciiDownload
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 8ce501151e..049ec2703b 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -614,7 +614,7 @@ heap_vacuum_rel(Relation onerel, VacuumParams *params,
 							 vacrelstats->new_dead_tuples,
 							 OldestXmin);
 			appendStringInfo(&buf,
-							 _("buffer usage: %d hits, %d misses, %d dirtied\n"),
+							 _("buffer usage: %zd hits, %zd misses, %zd dirtied\n"),
 							 VacuumPageHit,
 							 VacuumPageMiss,
 							 VacuumPageDirty);
diff --git a/src/backend/utils/init/globals.c b/src/backend/utils/init/globals.c
index b1f6291b99..eb19644419 100644
--- a/src/backend/utils/init/globals.c
+++ b/src/backend/utils/init/globals.c
@@ -140,9 +140,9 @@ int			VacuumCostPageDirty = 20;
 int			VacuumCostLimit = 200;
 double		VacuumCostDelay = 0;
 
-int			VacuumPageHit = 0;
-int			VacuumPageMiss = 0;
-int			VacuumPageDirty = 0;
+int64		VacuumPageHit = 0;
+int64		VacuumPageMiss = 0;
+int64		VacuumPageDirty = 0;
 
 int			VacuumCostBalance = 0;	/* working state for vacuum */
 bool		VacuumCostActive = false;
diff --git a/src/include/miscadmin.h b/src/include/miscadmin.h
index 62d64aa0a1..f985453ec3 100644
--- a/src/include/miscadmin.h
+++ b/src/include/miscadmin.h
@@ -252,9 +252,9 @@ extern int	VacuumCostPageDirty;
 extern int	VacuumCostLimit;
 extern double VacuumCostDelay;
 
-extern int	VacuumPageHit;
-extern int	VacuumPageMiss;
-extern int	VacuumPageDirty;
+extern int64 VacuumPageHit;
+extern int64 VacuumPageMiss;
+extern int64 VacuumPageDirty;
 
 extern int	VacuumCostBalance;
 extern bool VacuumCostActive;
#2Julien Rouhaud
rjuju123@gmail.com
In reply to: Alvaro Herrera (#1)
Re: widen vacuum buffer counters

On Fri, Jan 31, 2020 at 9:59 PM Alvaro Herrera <alvherre@2ndquadrant.com> wrote:

We recently noticed that vacuum buffer counters wraparound in extreme
cases, with ridiculous results. Example:

2020-01-06 16:38:38.010 EST [45625-1] app= LOG: automatic vacuum of table "somtab.sf.foobar": index scans: 17
pages: 0 removed, 207650641 remain, 0 skipped due to pins, 13419403 skipped frozen
tuples: 141265419 removed, 3186614627 remain, 87783760 are dead but not yet removable
buffer usage: -2022059267 hits, -17141881 misses, 1252507767 dirtied
avg read rate: -0.043 MB/s, avg write rate: 3.146 MB/s
system usage: CPU 107819.92s/2932957.75u sec elapsed 3110498.10 sec

That's to be expected, as tables exist that are large enough for 4 billion
buffer accesses to be a possibility. Let's widen the counters, as in the
attached patch.

I propose to backpatch this.

+1, and patch LGTM.

#3Tom Lane
tgl@sss.pgh.pa.us
In reply to: Alvaro Herrera (#1)
Re: widen vacuum buffer counters

Alvaro Herrera <alvherre@2ndquadrant.com> writes:

We recently noticed that vacuum buffer counters wraparound in extreme
cases, with ridiculous results.

Ugh.

I propose to backpatch this.

+1 for widening these counters, but since they're global variables, -0.2
or so for back-patching. I don't know of any reason that an extension
would be touching these, but I feel like the problem isn't severe enough
to justify taking an ABI-break risk.

Also, %zd is the wrong format code for int64. Recommended practice
these days is to use "%lld" with an explicit cast of the printf argument
to long long (just to be sure). That doesn't work safely before v12,
and if you did insist on back-patching further, you'd need to jump
through hoops to avoid having platform-specific format codes in a
translatable string. (The side-effects for translation seem like
an independent argument against back-patching.)

regards, tom lane

#4Michael Paquier
michael@paquier.xyz
In reply to: Tom Lane (#3)
Re: widen vacuum buffer counters

On Fri, Jan 31, 2020 at 05:13:53PM -0500, Tom Lane wrote:

+1 for widening these counters, but since they're global variables, -0.2
or so for back-patching. I don't know of any reason that an extension
would be touching these, but I feel like the problem isn't severe enough
to justify taking an ABI-break risk.

I would not recommend doing a back-patch because of that. I don't
think that's worth taking any risk. Extension authors can have a lot
of imagination.

Also, %zd is the wrong format code for int64. Recommended practice
these days is to use "%lld" with an explicit cast of the printf argument
to long long (just to be sure). That doesn't work safely before v12,
and if you did insist on back-patching further, you'd need to jump
through hoops to avoid having platform-specific format codes in a
translatable string. (The side-effects for translation seem like
an independent argument against back-patching.)

Surely you meant INT64_FORMAT here? Anyway, looking at the patch,
couldn't we just use uint64?
--
Michael

#5Tom Lane
tgl@sss.pgh.pa.us
In reply to: Michael Paquier (#4)
Re: widen vacuum buffer counters

Michael Paquier <michael@paquier.xyz> writes:

On Fri, Jan 31, 2020 at 05:13:53PM -0500, Tom Lane wrote:

Also, %zd is the wrong format code for int64. Recommended practice
these days is to use "%lld" with an explicit cast of the printf argument
to long long (just to be sure). That doesn't work safely before v12,
and if you did insist on back-patching further, you'd need to jump
through hoops to avoid having platform-specific format codes in a
translatable string. (The side-effects for translation seem like
an independent argument against back-patching.)

Surely you meant INT64_FORMAT here?

No, because that varies depending on platform, so using it in a
translatable string is a bad idea. See e.g. 6a1cd8b92.

Anyway, looking at the patch,
couldn't we just use uint64?

Yeah, I was wondering if those counters shouldn't be unsigned, too.
Probably doesn't matter once we widen them to 64 bits though.

regards, tom lane

#6Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Tom Lane (#3)
Re: widen vacuum buffer counters

On 2020-Jan-31, Tom Lane wrote:

Also, %zd is the wrong format code for int64. Recommended practice
these days is to use "%lld" with an explicit cast of the printf argument
to long long (just to be sure). That doesn't work safely before v12,
and if you did insist on back-patching further, you'd need to jump
through hoops to avoid having platform-specific format codes in a
translatable string. (The side-effects for translation seem like
an independent argument against back-patching.)

Pushed with that change; did not backpatch, because I don't think it's
really worth the possible breakage :-)

--
�lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services