auto-sizing wal_buffers

Started by Robert Haasover 15 years ago37 messageshackers
Jump to latest
#1Robert Haas
robertmhaas@gmail.com

On Thu, Jan 6, 2011 at 11:37 PM, Greg Smith <greg@2ndquadrant.com> wrote:

If it defaulted to 3% of shared_buffers, min 64K & max 16MB for the auto
setting, it would for the most part become an autotuned parameter.  That
would make it 0.75 to 1MB at the standard anemic Linux default kernel
parameters.  Maybe more than some would like, but dropping shared_buffers
from 24MB to 23MB to keep this from being ridiculously undersized is
probably a win.  That percentage would reach 16MB by the time shared_buffers
was increased to 533MB, which also seems about right to me.  On a really bad
setup (brief pause to flip off Apple) with only 4MB to work with total,
you'd end up with wal_buffers between 64 and 128K, so very close to the
status quo.

Code that up, and we could probably even remove the parameter as a tunable
altogether.  Very few would see a downside relative to any sensible
configuration under the current situation, and many people would notice
better automagic performance with one less parameter to tweak.  Given the
recent investigations about the serious downsides of tiny wal_buffers values
on new Linux kernels when using open_datasync, a touch more aggression about
this setting seems particularly appropriate to consider now.  That's been
swapped out as the default, but it's still possible people will switch to
it.

Would anyone like to argue vigorously for or against the above proposal?

I'll start: I think this is a good idea. I don't have a strong
opinion on whether the exact details of Greg proposes above are
precisely optimal, but I think they're in the right ballpark.
Furthermore, we already have other things that are tuned in somewhat
similar ways (e.g. the size of the fsync request queue defaults to the
number of shared buffers) so there's precedent for it. It's one less
parameter that you have to set to make things just work.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#2Magnus Hagander
magnus@hagander.net
In reply to: Robert Haas (#1)
Re: auto-sizing wal_buffers

On Thu, Jan 13, 2011 at 23:19, Robert Haas <robertmhaas@gmail.com> wrote:

On Thu, Jan 6, 2011 at 11:37 PM, Greg Smith <greg@2ndquadrant.com> wrote:

If it defaulted to 3% of shared_buffers, min 64K & max 16MB for the auto
setting, it would for the most part become an autotuned parameter.  That
would make it 0.75 to 1MB at the standard anemic Linux default kernel
parameters.  Maybe more than some would like, but dropping shared_buffers
from 24MB to 23MB to keep this from being ridiculously undersized is
probably a win.  That percentage would reach 16MB by the time shared_buffers
was increased to 533MB, which also seems about right to me.  On a really bad
setup (brief pause to flip off Apple) with only 4MB to work with total,
you'd end up with wal_buffers between 64 and 128K, so very close to the
status quo.

Code that up, and we could probably even remove the parameter as a tunable
altogether.  Very few would see a downside relative to any sensible
configuration under the current situation, and many people would notice
better automagic performance with one less parameter to tweak.  Given the
recent investigations about the serious downsides of tiny wal_buffers values
on new Linux kernels when using open_datasync, a touch more aggression about
this setting seems particularly appropriate to consider now.  That's been
swapped out as the default, but it's still possible people will switch to
it.

Would anyone like to argue vigorously for or against the above proposal?

I'll start: I think this is a good idea.  I don't have a strong
opinion on whether the exact details of Greg proposes above are
precisely optimal, but I think they're in the right ballpark.
Furthermore, we already have other things that are tuned in somewhat
similar ways (e.g. the size of the fsync request queue defaults to the
number of shared buffers) so there's precedent for it.  It's one less
parameter that you have to set to make things just work.

+1, I like the idea. Would it still be there to override if necessary?

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

#3Robert Haas
robertmhaas@gmail.com
In reply to: Magnus Hagander (#2)
Re: auto-sizing wal_buffers

On Thu, Jan 13, 2011 at 5:29 PM, Magnus Hagander <magnus@hagander.net> wrote:

+1, I like the idea. Would it still be there to override if necessary?

Depends what people want to do. We could make the default "0kB", and
define that to mean "auto-tune", or we could remove the parameter
altogether. I think I was envisioning the latter, but if people are
hesitant to do that we could do the former instead.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#4Josh Berkus
josh@agliodbs.com
In reply to: Robert Haas (#3)
Re: auto-sizing wal_buffers

Depends what people want to do. We could make the default "0kB", and
define that to mean "auto-tune", or we could remove the parameter
altogether. I think I was envisioning the latter, but if people are
hesitant to do that we could do the former instead.

Unfortunately, we might still need a manual parameter for override
because of the interaction between wal_buffers and
synchronous_commit=off, since it sets the max size of the unflushed data
buffer. Discuss?

And the "auto" setting should be -1, not 0kB. We use -1 for "use
default" for several other GUCs.

Other than that, I think Greg's numbers are fine, and strongly support
having one less thing to tune.

--
-- Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com

#5Robert Haas
robertmhaas@gmail.com
In reply to: Josh Berkus (#4)
Re: auto-sizing wal_buffers

On Thu, Jan 13, 2011 at 6:02 PM, Josh Berkus <josh@agliodbs.com> wrote:

Depends what people want to do.  We could make the default "0kB", and
define that to mean "auto-tune", or we could remove the parameter
altogether.  I think I was envisioning the latter, but if people are
hesitant to do that we could do the former instead.

Unfortunately, we might still need a manual parameter for override
because of the interaction between wal_buffers and
synchronous_commit=off, since it sets the max size of the unflushed data
buffer.  Discuss?

Do we have any evidence there's actually a problem in that case, or
that a larger value of wal_buffers solves it? I mean, the background
writer is going to start a background flush as quickly as it can...

And the "auto" setting should be -1, not 0kB.  We use -1 for "use
default" for several other GUCs.

No can do. Gotta have things in the same units.

Other than that, I think Greg's numbers are fine, and strongly support
having one less thing to tune.

OK.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#6Josh Berkus
josh@agliodbs.com
In reply to: Robert Haas (#5)
Re: auto-sizing wal_buffers

Robert,

Unfortunately, we might still need a manual parameter for override
because of the interaction between wal_buffers and
synchronous_commit=off, since it sets the max size of the unflushed data
buffer. Discuss?

Do we have any evidence there's actually a problem in that case, or
that a larger value of wal_buffers solves it? I mean, the background
writer is going to start a background flush as quickly as it can...

I don't think anyone has done any testing. However, the setting is
there and some users might be convinced that they need it.

And the "auto" setting should be -1, not 0kB. We use -1 for "use
default" for several other GUCs.

No can do. Gotta have things in the same units.

That's certainly not true with, for example, log_temp_files.

--
-- Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com

#7Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Robert Haas (#1)
Re: auto-sizing wal_buffers

Robert Haas <robertmhaas@gmail.com> wrote:

Would anyone like to argue vigorously for or against the above
proposal?

Greg's numbers look reasonable to me, and there's nobody I'd trust
more to come up with reasonable numbers for this. One less tunable
is a good thing, especially since this designed to scale from
someone slapping it on his laptop for a first quick try, all the way
up to industrial strength production environments. I guess a manual
override doesn't bother me too much, but I am a bit dubious of its
value, and there is value in keeping the GUC count down....

-Kevin

#8Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#3)
Re: auto-sizing wal_buffers

Robert Haas <robertmhaas@gmail.com> writes:

On Thu, Jan 13, 2011 at 5:29 PM, Magnus Hagander <magnus@hagander.net> wrote:

+1, I like the idea. Would it still be there to override if necessary?

Depends what people want to do. We could make the default "0kB", and
define that to mean "auto-tune", or we could remove the parameter
altogether. I think I was envisioning the latter, but if people are
hesitant to do that we could do the former instead.

I think we need to keep the override capability until the autotune
algorithm has proven itself in the field for a couple of years.

I agree with Josh that a negative value should be used to select the
autotune method.

regards, tom lane

#9Greg Smith
gsmith@gregsmith.com
In reply to: Tom Lane (#8)
Re: auto-sizing wal_buffers

Tom Lane wrote:

I think we need to keep the override capability until the autotune
algorithm has proven itself in the field for a couple of years.

I agree with Josh that a negative value should be used to select the
autotune method.

Agreed on both fronts. Attached patch does the magic. Also available
in branch "walbuffers" from git://github.com/greg2ndQuadrant/postgres.git

By changing only shared_buffers I get the following quite reasonable
automatic behavior:

$ psql -c "SELECT name,unit,boot_val,setting,current_setting(name) FROM
pg_settings WHERE name IN ('wal_buffers','shared_buffers')"
name | unit | boot_val | setting | current_setting
----------------+------+----------+---------+-----------------
shared_buffers | 8kB | 1024 | 3072 | 24MB
wal_buffers | 8kB | -1 | 96 | 768kB

shared_buffers | 8kB | 1024 | 4096 | 32MB
wal_buffers | 8kB | -1 | 128 | 1MB

shared_buffers | 8kB | 1024 | 16384 | 128MB
wal_buffers | 8kB | -1 | 512 | 4MB

shared_buffers | 8kB | 1024 | 131072 | 1GB
wal_buffers | 8kB | -1 | 2048 | 16MB

shared_buffers | 8kB | 1024 | 262144 | 2GB
wal_buffers | 8kB | -1 | 2048 | 16MB

If you've set it to the auto-tuning behavior, you don't see that setting
of -1 in the SHOW output; you see the value it's actually been set to.
The only way to know that was set automatically is to look at boot_val
as I've shown here. I consider this what admins would prefer, as the
easy way to expose the value that was used. I would understand if
people considered it a little odd though. Since you can't change it
without a postgresql.conf edit and a server start anyway, and it's
tersely documented in the sample postgresql.conf what -1 does, I don't
see this being a problem for anyone in the field.

To try and clear up some of the confusion around how the earlier
documentation suggests larger values of this aren't needed, I added the
following updated description of how this has been observed to work for
admins in practice:

! Since the data is written out to disk at every transaction commit,
! the setting many only need to be be large enough to hold the
amount
! of WAL data generated by one typical transaction. Larger values,
! typically at least a few megabytes, can improve write performance
! on a busy server where many clients are committing at once.
! Extremely large settings are unlikely to provide additional
benefit.

And to make this easy as possible to apply if I got this right, here's
some proposed commit text:

Automatically set wal_buffers to be proportional
to the size of shared_buffers. Make it 1/32
as large when the auto-tuned behavior, which
is the default and set with a value of -1,
is used. The previous default of 64kB is still
enforced as a minimum value. The maximum
automatic value is limited to 16MB.

(Note that this not exactly what I put in my own commit message if you
grab from my repo, that had a typo)

--
Greg Smith 2ndQuadrant US greg@2ndQuadrant.com Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books

Attachments:

auto-wal-buffers-v1.patchtext/x-patch; name=auto-wal-buffers-v1.patchDownload+52-29
#10Greg Smith
gsmith@gregsmith.com
In reply to: Kevin Grittner (#7)
Re: auto-sizing wal_buffers

Kevin Grittner wrote:

I guess a manual override doesn't bother me too much, but I am a bit dubious of its
value, and there is value in keeping the GUC count down...

It's a risk-reward thing really. The reward for removing it is that a
few lines of code and a small section of the documentation go away.
It's not very big. The risk seems low, but it's not zero. Let's say
this goes in, we get to 9.2 or later, and a survey suggests that no one
has needed to ever set wal_buffers when deploying 9.1. At that point I
think everyone would feel much better considering to nuke it
altogether. I just looked at the code again when developing the patch,
and there's really not enough benefit to removing it to worry about
taking any risk right now.

--
Greg Smith 2ndQuadrant US greg@2ndQuadrant.com Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books

#11Fujii Masao
masao.fujii@gmail.com
In reply to: Greg Smith (#9)
Re: auto-sizing wal_buffers

On Sat, Jan 15, 2011 at 3:51 PM, Greg Smith <greg@2ndquadrant.com> wrote:

Agreed on both fronts.  Attached patch does the magic.  Also available in
branch "walbuffers" from git://github.com/greg2ndQuadrant/postgres.git

+int XLOGbuffersMin = 8;

XLOGbuffersMin is a fixed value. I think that defining it as a macro
rather than a variable seems better.

+ if (XLOGbuffers > 2048)
+ XLOGbuffers = 2048;

Using "XLOG_SEG_SIZE/XLOG_BLCKSZ" rather than 2048 seems
better.

+#wal_buffers = -1 # min 32kB, -1 sets based on shared_buffers

Typo: s/32kB/64kB

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

#12Greg Smith
gsmith@gregsmith.com
In reply to: Fujii Masao (#11)
Re: auto-sizing wal_buffers

Fujii Masao wrote:

+int XLOGbuffersMin = 8;

XLOGbuffersMin is a fixed value. I think that defining it as a macro
rather than a variable seems better.

+ if (XLOGbuffers > 2048)
+ XLOGbuffers = 2048;

Using "XLOG_SEG_SIZE/XLOG_BLCKSZ" rather than 2048 seems
better.

+#wal_buffers = -1 # min 32kB, -1 sets based on shared_buffers

Typo: s/32kB/64kB

Thanks, I've fixed all these issues and attached a new full patch,
pushed to github, etc. Tests give same results back, and it's nice that
it scale to reasonable behavior if someone changes their XLOG segment size.

It should be possible to set the value back to the older minimum value
of 32kB too. That's doesn't actually seem to work though; when I try it
I get:

$ psql -c "SELECT name,unit,boot_val,setting,current_setting(name) FROM
pg_settings WHERE name IN ('wal_buffers','shared_buffers')"
name | unit | boot_val | setting | current_setting
----------------+------+----------+---------+-----------------
shared_buffers | 8kB | 1024 | 131072 | 1GB
wal_buffers | 8kB | -1 | 8 | 64kB

Where I was expecting that setting to be "4" instead for 32kB. So
there's probably some minor bug left in where I inserted this into the
initialization sequence.

--
Greg Smith 2ndQuadrant US greg@2ndQuadrant.com Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books

Attachments:

auto-wal-buffers-v2.patchtext/x-patch; name=auto-wal-buffers-v2.patchDownload+61-29
#13Josh Berkus
josh@agliodbs.com
In reply to: Greg Smith (#9)
Re: auto-sizing wal_buffers

On 1/14/11 10:51 PM, Greg Smith wrote:

! Since the data is written out to disk at every transaction
commit,
! the setting many only need to be be large enough to hold the
amount
! of WAL data generated by one typical transaction. Larger values,
! typically at least a few megabytes, can improve write performance
! on a busy server where many clients are committing at once.
! Extremely large settings are unlikely to provide additional
benefit.

I think we can be more specific on that last sentence; is there even any
*theoretical* benefit to settings above 16MB, the size of a WAL segment?
Certainly there have been no test results to show any.

If we don't know, keep it vague, but otherwise I suggest:

"Settings larger than the size of a single WAL segment (16MB by default)
are unlikely to produce any benefit."

--
-- Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com

#14Tom Lane
tgl@sss.pgh.pa.us
In reply to: Josh Berkus (#13)
Re: auto-sizing wal_buffers

Josh Berkus <josh@agliodbs.com> writes:

I think we can be more specific on that last sentence; is there even any
*theoretical* benefit to settings above 16MB, the size of a WAL segment?

IIRC there's a forced fsync at WAL segment switch, so no.

regards, tom lane

#15Marti Raudsepp
marti@juffo.org
In reply to: Josh Berkus (#13)
Re: auto-sizing wal_buffers

On Sun, Jan 16, 2011 at 00:34, Josh Berkus <josh@agliodbs.com> wrote:

I think we can be more specific on that last sentence; is there even any
*theoretical* benefit to settings above 16MB, the size of a WAL segment?
 Certainly there have been no test results to show any.

I don't know if it's applicable to real workloads in any way, but it
did make a measurable difference in one of my tests.

Back when benchmarking different wal_sync_methods, I found that when
doing massive INSERTs from generate_series, the INSERT time kept
improving even after increasing wal_buffers from 16MB to 32, 64 and
128MB; especially with wal_sync_method=open_datasync. The total
INSERT+COMMIT time remained constant, however.

More details here:
http://archives.postgresql.org/pgsql-performance/2010-11/msg00094.php

Regards,
Marti

#16Fujii Masao
masao.fujii@gmail.com
In reply to: Greg Smith (#12)
Re: auto-sizing wal_buffers

On Sun, Jan 16, 2011 at 1:52 AM, Greg Smith <greg@2ndquadrant.com> wrote:

Fujii Masao wrote:

+int                    XLOGbuffersMin = 8;

XLOGbuffersMin is a fixed value. I think that defining it as a macro
rather than a variable seems better.

+               if (XLOGbuffers > 2048)
+                       XLOGbuffers = 2048;

Using "XLOG_SEG_SIZE/XLOG_BLCKSZ" rather than 2048 seems
better.

+#wal_buffers = -1                      # min 32kB, -1 sets based on
shared_buffers

Typo: s/32kB/64kB

Thanks, I've fixed all these issues and attached a new full patch, pushed to
github, etc.  Tests give same results back, and it's nice that it scale to
reasonable behavior if someone changes their XLOG segment size.

Thanks for the update.

+/* Minimum setting used for a lower bound on wal_buffers */
+#define XLOG_BUFFER_MIN			4

Why didn't you use XLOG_BUFFER_MIN instead of XLOGbuffersMin?
XLOG_BUFFER_MIN is not used anywhere for now.

+		if (XLOGbuffers < (XLOGbuffersMin * 2))
+			XLOGbuffers = XLOGbuffersMin * 2;
+		}

Why is the minimum value 64kB only when wal_buffers is set to
-1? This seems confusing for users.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

#17Fujii Masao
masao.fujii@gmail.com
In reply to: Josh Berkus (#13)
Re: auto-sizing wal_buffers

On Sun, Jan 16, 2011 at 7:34 AM, Josh Berkus <josh@agliodbs.com> wrote:

I think we can be more specific on that last sentence; is there even any
*theoretical* benefit to settings above 16MB, the size of a WAL segment?
 Certainly there have been no test results to show any.

If the workload generates 16MB or more WAL for wal_writer_delay,
16MB or more of wal_buffers would be effective. In that case,
wal_buffers is likely to be filled up with unwritten WAL, then you have
to write buffers while holding WALInsert lock. This is obviously not
good.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

#18Jeff Janes
jeff.janes@gmail.com
In reply to: Josh Berkus (#13)
Re: auto-sizing wal_buffers

On Sat, Jan 15, 2011 at 2:34 PM, Josh Berkus <josh@agliodbs.com> wrote:

On 1/14/11 10:51 PM, Greg Smith wrote:

!         Since the data is written out to disk at every transaction
commit,
!         the setting many only need to be be large enough to hold the
amount
!         of WAL data generated by one typical transaction.  Larger values,
!         typically at least a few megabytes, can improve write performance
!         on a busy server where many clients are committing at once.
!         Extremely large settings are unlikely to provide additional
benefit.

I think we can be more specific on that last sentence; is there even any
*theoretical* benefit to settings above 16MB, the size of a WAL segment?

I would turn it around and ask if there is any theoretical reason it
would not benefit?
(And if so, can they be cured soon?)

Certainly there have been no test results to show any.

Did the tests show steady improvement up to 16MB and then suddenly
hit a wall? (And in which case, were they recompiled at a larger segment
size and repeated?) Or did improvement just peter out because 16MB is really
quite a bit and there was just no need for it to be larger independent
of segment size?

Cheers,

Jeff

#19Jeff Janes
jeff.janes@gmail.com
In reply to: Tom Lane (#14)
Re: auto-sizing wal_buffers

On Sun, Jan 16, 2011 at 9:32 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Josh Berkus <josh@agliodbs.com> writes:

I think we can be more specific on that last sentence; is there even any
*theoretical* benefit to settings above 16MB, the size of a WAL segment?

IIRC there's a forced fsync at WAL segment switch, so no.

However other backends can still do WAL inserts while that fsync
takes place, as long as they can find available buffers to write into.
So that should not be too limiting--a larger wal_buffers make it more
likely they will find available buffers.

However if the background writer does not keep up under bulk loading
conditions, then the end of segment fsync will probably happen via
AdvanceXLInsertBuffer, which will be sitting on the WALInsertLock. So
that is obviously bad news.

Cheers,

Jeff

#20Greg Smith
gsmith@gregsmith.com
In reply to: Fujii Masao (#16)
Re: auto-sizing wal_buffers

Fujii Masao wrote:

+/* Minimum setting used for a lower bound on wal_buffers */
+#define XLOG_BUFFER_MIN			4

Why didn't you use XLOG_BUFFER_MIN instead of XLOGbuffersMin?
XLOG_BUFFER_MIN is not used anywhere for now.

That's a typo; will fix.

+		if (XLOGbuffers < (XLOGbuffersMin * 2))
+			XLOGbuffers = XLOGbuffersMin * 2;
+		}

Why is the minimum value 64kB only when wal_buffers is set to
-1? This seems confusing for users.

That's because the current default on older versions is 64kB. Since the
automatic selection is going to be the new default, I hope, I don't want
it to be possible it will pick a number smaller than the default of
older versions. So the automatic lower limit is 64kB, while the actual
manually set lower limit remains 32kB, as before.

--
Greg Smith 2ndQuadrant US greg@2ndQuadrant.com Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books

#21Greg Smith
gsmith@gregsmith.com
In reply to: Josh Berkus (#13)
#22Fujii Masao
masao.fujii@gmail.com
In reply to: Greg Smith (#20)
#23Robert Haas
robertmhaas@gmail.com
In reply to: Greg Smith (#12)
#24Jaime Casanova
jcasanov@systemguards.com.ec
In reply to: Robert Haas (#23)
#25Robert Haas
robertmhaas@gmail.com
In reply to: Jaime Casanova (#24)
#26Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#23)
#27Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#26)
#28Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#27)
#29Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#28)
#30Robert Haas
robertmhaas@gmail.com
In reply to: Robert Haas (#28)
#31Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#30)
#32Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#31)
#33Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#32)
#34Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#33)
#35Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#34)
#36Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#35)
#37Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#36)