widen vacuum buffer counters

Started by Alvaro Herreraabout 6 years ago6 messageshackers
Jump to latest
#1Alvaro Herrera
alvherre@2ndquadrant.com

We recently noticed that vacuum buffer counters wraparound in extreme
cases, with ridiculous results. Example:

2020-01-06 16:38:38.010 EST [45625-1] app= LOG: automatic vacuum of table "somtab.sf.foobar": index scans: 17
pages: 0 removed, 207650641 remain, 0 skipped due to pins, 13419403 skipped frozen
tuples: 141265419 removed, 3186614627 remain, 87783760 are dead but not yet removable
buffer usage: -2022059267 hits, -17141881 misses, 1252507767 dirtied
avg read rate: -0.043 MB/s, avg write rate: 3.146 MB/s
system usage: CPU 107819.92s/2932957.75u sec elapsed 3110498.10 sec

That's to be expected, as tables exist that are large enough for 4 billion
buffer accesses to be a possibility. Let's widen the counters, as in the
attached patch.

I propose to backpatch this.

--
�lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachments:

widen-vacuum-counts.patchtext/x-diff; charset=us-asciiDownload+7-7
#2Julien Rouhaud
rjuju123@gmail.com
In reply to: Alvaro Herrera (#1)
Re: widen vacuum buffer counters

On Fri, Jan 31, 2020 at 9:59 PM Alvaro Herrera <alvherre@2ndquadrant.com> wrote:

We recently noticed that vacuum buffer counters wraparound in extreme
cases, with ridiculous results. Example:

2020-01-06 16:38:38.010 EST [45625-1] app= LOG: automatic vacuum of table "somtab.sf.foobar": index scans: 17
pages: 0 removed, 207650641 remain, 0 skipped due to pins, 13419403 skipped frozen
tuples: 141265419 removed, 3186614627 remain, 87783760 are dead but not yet removable
buffer usage: -2022059267 hits, -17141881 misses, 1252507767 dirtied
avg read rate: -0.043 MB/s, avg write rate: 3.146 MB/s
system usage: CPU 107819.92s/2932957.75u sec elapsed 3110498.10 sec

That's to be expected, as tables exist that are large enough for 4 billion
buffer accesses to be a possibility. Let's widen the counters, as in the
attached patch.

I propose to backpatch this.

+1, and patch LGTM.

#3Tom Lane
tgl@sss.pgh.pa.us
In reply to: Alvaro Herrera (#1)
Re: widen vacuum buffer counters

Alvaro Herrera <alvherre@2ndquadrant.com> writes:

We recently noticed that vacuum buffer counters wraparound in extreme
cases, with ridiculous results.

Ugh.

I propose to backpatch this.

+1 for widening these counters, but since they're global variables, -0.2
or so for back-patching. I don't know of any reason that an extension
would be touching these, but I feel like the problem isn't severe enough
to justify taking an ABI-break risk.

Also, %zd is the wrong format code for int64. Recommended practice
these days is to use "%lld" with an explicit cast of the printf argument
to long long (just to be sure). That doesn't work safely before v12,
and if you did insist on back-patching further, you'd need to jump
through hoops to avoid having platform-specific format codes in a
translatable string. (The side-effects for translation seem like
an independent argument against back-patching.)

regards, tom lane

#4Michael Paquier
michael@paquier.xyz
In reply to: Tom Lane (#3)
Re: widen vacuum buffer counters

On Fri, Jan 31, 2020 at 05:13:53PM -0500, Tom Lane wrote:

+1 for widening these counters, but since they're global variables, -0.2
or so for back-patching. I don't know of any reason that an extension
would be touching these, but I feel like the problem isn't severe enough
to justify taking an ABI-break risk.

I would not recommend doing a back-patch because of that. I don't
think that's worth taking any risk. Extension authors can have a lot
of imagination.

Also, %zd is the wrong format code for int64. Recommended practice
these days is to use "%lld" with an explicit cast of the printf argument
to long long (just to be sure). That doesn't work safely before v12,
and if you did insist on back-patching further, you'd need to jump
through hoops to avoid having platform-specific format codes in a
translatable string. (The side-effects for translation seem like
an independent argument against back-patching.)

Surely you meant INT64_FORMAT here? Anyway, looking at the patch,
couldn't we just use uint64?
--
Michael

#5Tom Lane
tgl@sss.pgh.pa.us
In reply to: Michael Paquier (#4)
Re: widen vacuum buffer counters

Michael Paquier <michael@paquier.xyz> writes:

On Fri, Jan 31, 2020 at 05:13:53PM -0500, Tom Lane wrote:

Also, %zd is the wrong format code for int64. Recommended practice
these days is to use "%lld" with an explicit cast of the printf argument
to long long (just to be sure). That doesn't work safely before v12,
and if you did insist on back-patching further, you'd need to jump
through hoops to avoid having platform-specific format codes in a
translatable string. (The side-effects for translation seem like
an independent argument against back-patching.)

Surely you meant INT64_FORMAT here?

No, because that varies depending on platform, so using it in a
translatable string is a bad idea. See e.g. 6a1cd8b92.

Anyway, looking at the patch,
couldn't we just use uint64?

Yeah, I was wondering if those counters shouldn't be unsigned, too.
Probably doesn't matter once we widen them to 64 bits though.

regards, tom lane

#6Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Tom Lane (#3)
Re: widen vacuum buffer counters

On 2020-Jan-31, Tom Lane wrote:

Also, %zd is the wrong format code for int64. Recommended practice
these days is to use "%lld" with an explicit cast of the printf argument
to long long (just to be sure). That doesn't work safely before v12,
and if you did insist on back-patching further, you'd need to jump
through hoops to avoid having platform-specific format codes in a
translatable string. (The side-effects for translation seem like
an independent argument against back-patching.)

Pushed with that change; did not backpatch, because I don't think it's
really worth the possible breakage :-)

--
�lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services