widen vacuum buffer counters
We recently noticed that vacuum buffer counters wraparound in extreme
cases, with ridiculous results. Example:
2020-01-06 16:38:38.010 EST [45625-1] app= LOG: automatic vacuum of table "somtab.sf.foobar": index scans: 17
pages: 0 removed, 207650641 remain, 0 skipped due to pins, 13419403 skipped frozen
tuples: 141265419 removed, 3186614627 remain, 87783760 are dead but not yet removable
buffer usage: -2022059267 hits, -17141881 misses, 1252507767 dirtied
avg read rate: -0.043 MB/s, avg write rate: 3.146 MB/s
system usage: CPU 107819.92s/2932957.75u sec elapsed 3110498.10 sec
That's to be expected, as tables exist that are large enough for 4 billion
buffer accesses to be a possibility. Let's widen the counters, as in the
attached patch.
I propose to backpatch this.
--
�lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachments:
widen-vacuum-counts.patchtext/x-diff; charset=us-asciiDownload
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 8ce501151e..049ec2703b 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -614,7 +614,7 @@ heap_vacuum_rel(Relation onerel, VacuumParams *params,
vacrelstats->new_dead_tuples,
OldestXmin);
appendStringInfo(&buf,
- _("buffer usage: %d hits, %d misses, %d dirtied\n"),
+ _("buffer usage: %zd hits, %zd misses, %zd dirtied\n"),
VacuumPageHit,
VacuumPageMiss,
VacuumPageDirty);
diff --git a/src/backend/utils/init/globals.c b/src/backend/utils/init/globals.c
index b1f6291b99..eb19644419 100644
--- a/src/backend/utils/init/globals.c
+++ b/src/backend/utils/init/globals.c
@@ -140,9 +140,9 @@ int VacuumCostPageDirty = 20;
int VacuumCostLimit = 200;
double VacuumCostDelay = 0;
-int VacuumPageHit = 0;
-int VacuumPageMiss = 0;
-int VacuumPageDirty = 0;
+int64 VacuumPageHit = 0;
+int64 VacuumPageMiss = 0;
+int64 VacuumPageDirty = 0;
int VacuumCostBalance = 0; /* working state for vacuum */
bool VacuumCostActive = false;
diff --git a/src/include/miscadmin.h b/src/include/miscadmin.h
index 62d64aa0a1..f985453ec3 100644
--- a/src/include/miscadmin.h
+++ b/src/include/miscadmin.h
@@ -252,9 +252,9 @@ extern int VacuumCostPageDirty;
extern int VacuumCostLimit;
extern double VacuumCostDelay;
-extern int VacuumPageHit;
-extern int VacuumPageMiss;
-extern int VacuumPageDirty;
+extern int64 VacuumPageHit;
+extern int64 VacuumPageMiss;
+extern int64 VacuumPageDirty;
extern int VacuumCostBalance;
extern bool VacuumCostActive;
On Fri, Jan 31, 2020 at 9:59 PM Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
We recently noticed that vacuum buffer counters wraparound in extreme
cases, with ridiculous results. Example:2020-01-06 16:38:38.010 EST [45625-1] app= LOG: automatic vacuum of table "somtab.sf.foobar": index scans: 17
pages: 0 removed, 207650641 remain, 0 skipped due to pins, 13419403 skipped frozen
tuples: 141265419 removed, 3186614627 remain, 87783760 are dead but not yet removable
buffer usage: -2022059267 hits, -17141881 misses, 1252507767 dirtied
avg read rate: -0.043 MB/s, avg write rate: 3.146 MB/s
system usage: CPU 107819.92s/2932957.75u sec elapsed 3110498.10 secThat's to be expected, as tables exist that are large enough for 4 billion
buffer accesses to be a possibility. Let's widen the counters, as in the
attached patch.I propose to backpatch this.
+1, and patch LGTM.
Alvaro Herrera <alvherre@2ndquadrant.com> writes:
We recently noticed that vacuum buffer counters wraparound in extreme
cases, with ridiculous results.
Ugh.
I propose to backpatch this.
+1 for widening these counters, but since they're global variables, -0.2
or so for back-patching. I don't know of any reason that an extension
would be touching these, but I feel like the problem isn't severe enough
to justify taking an ABI-break risk.
Also, %zd is the wrong format code for int64. Recommended practice
these days is to use "%lld" with an explicit cast of the printf argument
to long long (just to be sure). That doesn't work safely before v12,
and if you did insist on back-patching further, you'd need to jump
through hoops to avoid having platform-specific format codes in a
translatable string. (The side-effects for translation seem like
an independent argument against back-patching.)
regards, tom lane
On Fri, Jan 31, 2020 at 05:13:53PM -0500, Tom Lane wrote:
+1 for widening these counters, but since they're global variables, -0.2
or so for back-patching. I don't know of any reason that an extension
would be touching these, but I feel like the problem isn't severe enough
to justify taking an ABI-break risk.
I would not recommend doing a back-patch because of that. I don't
think that's worth taking any risk. Extension authors can have a lot
of imagination.
Also, %zd is the wrong format code for int64. Recommended practice
these days is to use "%lld" with an explicit cast of the printf argument
to long long (just to be sure). That doesn't work safely before v12,
and if you did insist on back-patching further, you'd need to jump
through hoops to avoid having platform-specific format codes in a
translatable string. (The side-effects for translation seem like
an independent argument against back-patching.)
Surely you meant INT64_FORMAT here? Anyway, looking at the patch,
couldn't we just use uint64?
--
Michael
Michael Paquier <michael@paquier.xyz> writes:
On Fri, Jan 31, 2020 at 05:13:53PM -0500, Tom Lane wrote:
Also, %zd is the wrong format code for int64. Recommended practice
these days is to use "%lld" with an explicit cast of the printf argument
to long long (just to be sure). That doesn't work safely before v12,
and if you did insist on back-patching further, you'd need to jump
through hoops to avoid having platform-specific format codes in a
translatable string. (The side-effects for translation seem like
an independent argument against back-patching.)
Surely you meant INT64_FORMAT here?
No, because that varies depending on platform, so using it in a
translatable string is a bad idea. See e.g. 6a1cd8b92.
Anyway, looking at the patch,
couldn't we just use uint64?
Yeah, I was wondering if those counters shouldn't be unsigned, too.
Probably doesn't matter once we widen them to 64 bits though.
regards, tom lane
On 2020-Jan-31, Tom Lane wrote:
Also, %zd is the wrong format code for int64. Recommended practice
these days is to use "%lld" with an explicit cast of the printf argument
to long long (just to be sure). That doesn't work safely before v12,
and if you did insist on back-patching further, you'd need to jump
through hoops to avoid having platform-specific format codes in a
translatable string. (The side-effects for translation seem like
an independent argument against back-patching.)
Pushed with that change; did not backpatch, because I don't think it's
really worth the possible breakage :-)
--
�lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services