increasing the default WAL segment size
Hi,
I'd like to propose that we increase the default WAL segment size,
which is currently 16MB. It was first set to that value in commit
47937403676d913c0e740eec6b85113865c6c8ab in October of 1999; prior to
that, it was 64MB. Between 1999 and now, there have been three
significant changes that make me think it might be time to rethink
this value:
1. Transaction rates are vastly higher these days. In 1999, I think
we were still limited to ~2^32 transactions during the entire lifetime
of the server; transaction ID wraparound hadn't been invented yet.[1]
Today, some installations do that many write transactions in under a
week. The practical consequence of this is that WAL files fill up in
extremely short periods of time. Some users generate multiple
terabytes of WAL per day, which means they are generating - and very
likely archiving - WAL files at a rate of greater than 1 per second!
That poses multiple problems. For example, if your archive command
happens to involve ssh, you might run into trouble because of this
sort of thing:
[rhaas pgsql]$ /usr/bin/time ssh hydra true
1.57 real 0.00 user 0.00 sys
Also, your operating system's implementation of directories and the
commands to work with them (like ls) don't necessarily scale well to
tens or hundreds of thousands of archived files.
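As a sanity check on that file rate, here is a quick back-of-the-envelope sketch (the daily WAL volume is an assumed figure, not a measurement from any particular system):

```python
SEGMENT_SIZE = 16 * 1024**2   # current 16MB default, in bytes
WAL_PER_DAY = 2 * 1024**4     # assume a busy system writing 2TB of WAL/day

segments_per_day = WAL_PER_DAY // SEGMENT_SIZE
seconds_per_segment = 86400 / segments_per_day

print(segments_per_day)               # 131072 files per day
print(round(seconds_per_segment, 2))  # 0.66 - a new file every ~2/3 second
```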
Furthermore, there is an enforced, synchronous fsync at the end of
every segment, which actually does hurt performance on write-heavy
workloads.[2] Of course, if that were the only reason to consider
increasing the segment size, it would probably make more sense to just
try to push that extra fsync into the background, but that's not
really the case. From what I hear, the gigantic number of files is a
bigger pain point.
2. Disks are a bit larger these days. In the worst case, we waste
just under twice as much space as whatever the segment size is: you
might need 1 byte from the oldest segment you're keeping and 1 byte
from the newest segment that you are keeping, but not the remaining
contents of either file. In 1999, trying to limit disk wastage to
<32MB probably seemed reasonable, but today that's very little disk
space. I think at that time typical hard drive sizes were around 10
GB, whereas today they are around 1 TB.[3] I'm not sure whether the
size of the sorts of high-performance storage that is likely to be
used for pg_xlog has grown as fast as hard drives generally, but even
so it seems pretty clear to me that trying to limit disk wastage to
32MB is excessively conservative on modern hardware.
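The "just under twice the segment size" bound works out as follows (a sketch; the off-by-one detail is the only subtlety):

```python
def worst_case_waste(segment_size):
    # You may need only 1 byte from the oldest retained segment and
    # 1 byte from the newest, yet both files are kept in full, so the
    # retained-but-unneeded space tops out just below 2 * segment_size.
    return 2 * (segment_size - 1)

print(worst_case_waste(16 * 1024**2))  # just under 32MB today
print(worst_case_waste(64 * 1024**2))  # just under 128MB at a 64MB default
```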
3. archive_timeout is no longer a frequently used option. Obviously,
if you are frequently archiving partial segments, you don't want the
segment size to be too large, because if it is, each forced segment
switch potentially wastes a large amount of space (and bandwidth).
But given streaming replication and pg_receivexlog, the use case for
archiving partial segments is, at least according to my understanding,
a lot narrower than it used to be. So, I think we don't have to worry
as much about keeping forced segment switches cheap as we did during
the 8.x series.
Considering those three factors, I think we should consider pushing
the default value up somewhat higher for v10. Reverting to the 64MB
size that we had prior to 47937403676d913c0e740eec6b85113865c6c8ab
sounds pretty reasonable. Users with really high transaction rates
might even prefer a higher value (e.g. 256MB, 1GB) but that's hardly
practical for small installs given our default of max_wal_size = 1GB.
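The interaction with the max_wal_size default that makes the larger sizes awkward can be seen directly (numbers in MB; fewer segments per max_wal_size means coarser granularity for checkpoint spacing):

```python
MAX_WAL_SIZE = 1024  # MB, the current default

# How many whole segments fit under the default max_wal_size at each
# candidate segment size.
for seg_mb in (16, 64, 256, 1024):
    print(f"{seg_mb}MB segments: {MAX_WAL_SIZE // seg_mb} per max_wal_size")
```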
Possibly it would make sense for this to be configurable at initdb
time instead of requiring a recompile; we probably don't save any
significant number of cycles by compiling this into the server.
Thoughts?
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
[1]: I believe at that time we consumed an XID even for a read-only
transaction, too; today, we can do 2^32 read transactions in a few
hours.
[2]: Amit did some benchmarking on this, I believe, but I don't have
the numbers handy.
[3]: https://commons.wikimedia.org/wiki/File:Hard_drive_capacity_over_time.png
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Wed, Aug 24, 2016 at 10:31 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> 1. Transaction rates are vastly higher these days. In 1999, I think
> we were still limited to ~2^32 transactions during the entire lifetime
> of the server; transaction ID wraparound hadn't been invented yet.[1]
> Today, some installations do that many write transactions in under a
> week. The practical consequence of this is that WAL files fill up in
> extremely short periods of time. Some users generate multiple
> terabytes of WAL per day, which means they are generating - and very
> likely archiving - WAL files at a rate of greater than 1 per second!
> That poses multiple problems. For example, if your archive command
> happens to involve ssh, you might run into trouble because of this
> sort of thing:
>
> [rhaas pgsql]$ /usr/bin/time ssh hydra true
> 1.57 real 0.00 user 0.00 sys
>
> ...
>
> Considering those three factors, I think we should consider pushing
> the default value up somewhat higher for v10. Reverting to the 64MB
> size that we had prior to 47937403676d913c0e740eec6b85113865c6c8ab
> sounds pretty reasonable. Users with really high transaction rates
> might even prefer a higher value (e.g. 256MB, 1GB) but that's hardly
> practical for small installs given our default of max_wal_size = 1GB.
> Possibly it would make sense for this to be configurable at initdb
> time instead of requiring a recompile; we probably don't save any
> significant number of cycles by compiling this into the server.
FWIW, +1
We're already hurt by the small segments due to a similar phenomenon
as the ssh case: TCP slow start. Designing the archive/recovery
command to work around TCP slow start is quite complex, and bigger
segments would just be a better thing.
Not to mention that bigger segments compress better.
Robert Haas <robertmhaas@gmail.com> writes:
> I'd like to propose that we increase the default WAL segment size,
> which is currently 16MB.
That seems like a reasonable thing to consider ...
> Possibly it would make sense for this to be configurable at initdb
> time instead of requiring a recompile;
... but I think this is just folly. You'd have to do major amounts
of work to keep, eg, slave servers on the same page as the master
about what the segment size is. Better to keep treating it like
BLCKSZ, as a fixed parameter of a build. (There's a reason why we
keep this number in pg_control.)
regards, tom lane
From: pgsql-hackers-owner@postgresql.org
[mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Robert Haas
> Considering those three factors, I think we should consider pushing the
> default value up somewhat higher for v10. Reverting to the 64MB size that
> we had prior to 47937403676d913c0e740eec6b85113865c6c8ab
> sounds pretty reasonable.
+1
The other downside is that the response time of transactions may degrade when they have to wait for a new WAL segment to be created. That might show up as occasionally slow queries or a higher maximum response time, which is a mystery to users. Maybe it's time to use posix_fallocate() to create WAL segments.
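The posix_fallocate() idea can be sketched like so (a Python stand-in for the C call; the file name is a made-up WAL segment name, and os.posix_fallocate is POSIX-only):

```python
import os

SEGMENT_SIZE = 64 * 1024 * 1024  # hypothetical 64MB segment

# Rather than writing 64MB of zero blocks at segment-switch time, ask
# the filesystem to reserve the whole extent up front; the transaction
# waiting on the new segment then doesn't pay for streaming the zeroes
# through the page cache.
fd = os.open("000000010000000000000001", os.O_CREAT | os.O_WRONLY, 0o600)
try:
    os.posix_fallocate(fd, 0, SEGMENT_SIZE)  # reserve the full segment
    os.fsync(fd)                             # make the allocation durable
finally:
    os.close(fd)
```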
> Possibly it would make sense for this to be configurable at initdb time
> instead of requiring a recompile; we probably don't save any significant
> number of cycles by compiling this into the server.
+1
> 3. archive_timeout is no longer a frequently used option. Obviously, if
> you are frequently archiving partial segments, you don't want the segment
> size to be too large, because if it is, each forced segment switch
> potentially wastes a large amount of space (and bandwidth).
> But given streaming replication and pg_receivexlog, the use case for
> archiving partial segments is, at least according to my understanding, a
> lot narrower than it used to be. So, I think we don't have to worry as
> much about keeping forced segment switches cheap as we did during the 8.x
> series.
I'm not sure about this. I know that (many or not) users use continuous archiving with archive_command and archive_timeout for backups, and don't want to use streaming replication, because the system is not worth the cost and trouble of HA. I heard from a few users that they were surprised to learn that PostgreSQL generates WAL even when no update transaction is happening. Is this still true?
Regards
Takayuki Tsunakawa
On 2016-08-24 22:33:49 -0400, Tom Lane wrote:
>> Possibly it would make sense for this to be configurable at initdb
>> time instead of requiring a recompile;
>
> ... but I think this is just folly. You'd have to do major amounts
> of work to keep, eg, slave servers on the same page as the master
> about what the segment size is.
Don't think it'd actually be all that complicated, we already verify
the compatibility of some things. But I'm doubtful it's worth it, and
I'm also rather doubtful that it's actually without overhead.
Andres
Andres Freund <andres@anarazel.de> writes:
> On 2016-08-24 22:33:49 -0400, Tom Lane wrote:
>> ... but I think this is just folly. You'd have to do major amounts
>> of work to keep, eg, slave servers on the same page as the master
>> about what the segment size is.
>
> Don't think it'd actually be all that complicated, we already verify
> the compatibility of some things. But I'm doubtful it's worth it, and
> I'm also rather doubtful that it's actually without overhead.
My point is basically that it'll introduce failure modes that we don't
currently concern ourselves with. Yes, you can do configure
--with-wal-segsize, but it's on your own head whether the resulting build
will interoperate with anything else --- and I'm quite sure nobody tests,
eg, walsender or walreceiver to see if they fail sanely in such cases.
I don't think we'd get to take such a laissez-faire position with respect
to an initdb option.
regards, tom lane
On Wed, Aug 24, 2016 at 10:33 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> ... but I think this is just folly. You'd have to do major amounts
> of work to keep, eg, slave servers on the same page as the master
> about what the segment size is.
I said an initdb-time parameter, meaning not capable of being changed
within the lifetime of the cluster. So I don't see how the slave
servers would get out of sync?
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Wed, Aug 24, 2016 at 10:54 PM, Andres Freund <andres@anarazel.de> wrote:
> On 2016-08-24 22:33:49 -0400, Tom Lane wrote:
>>> Possibly it would make sense for this to be configurable at initdb
>>> time instead of requiring a recompile;
>>
>> ... but I think this is just folly. You'd have to do major amounts
>> of work to keep, eg, slave servers on the same page as the master
>> about what the segment size is.
>
> Don't think it'd actually be all that complicated, we already verify
> the compatibility of some things. But I'm doubtful it's worth it, and
> I'm also rather doubtful that it's actually without overhead.
Really? Where do you think the overhead would come from? What sort
of test would you run to try to detect it?
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Wed, Aug 24, 2016 at 11:02 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Andres Freund <andres@anarazel.de> writes:
>> On 2016-08-24 22:33:49 -0400, Tom Lane wrote:
>>> ... but I think this is just folly. You'd have to do major amounts
>>> of work to keep, eg, slave servers on the same page as the master
>>> about what the segment size is.
>>
>> Don't think it'd actually be all that complicated, we already verify
>> the compatibility of some things. But I'm doubtful it's worth it, and
>> I'm also rather doubtful that it's actually without overhead.
>
> My point is basically that it'll introduce failure modes that we don't
> currently concern ourselves with. Yes, you can do configure
> --with-wal-segsize, but it's on your own head whether the resulting build
> will interoperate with anything else --- and I'm quite sure nobody tests,
> eg, walsender or walreceiver to see if they fail sanely in such cases.
> I don't think we'd get to take such a laissez-faire position with respect
> to an initdb option.
I am really confused by this. If you connect a slave to a master
other than the one that you cloned to create the slave, of course
that's going to fail. But if the slave is cloned from the master,
then the segment size is going to match. It seems like the only thing
we need to do to make this work is make sure to get the segment size
from the control file rather than anywhere else, which doesn't seem
very difficult. What am I missing?
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes:
> On Wed, Aug 24, 2016 at 10:33 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> ... but I think this is just folly. You'd have to do major amounts
>> of work to keep, eg, slave servers on the same page as the master
>> about what the segment size is.
>
> I said an initdb-time parameter, meaning not capable of being changed
> within the lifetime of the cluster. So I don't see how the slave
> servers would get out of sync?
The point is that that now becomes something to worry about. I do not
think I have to exhibit a live bug within five minutes' thought before
saying that it's a risk area. It's something that we simply have not
worried about before, and IME that generally means there's some squishy
things there.
regards, tom lane
Robert Haas <robertmhaas@gmail.com> writes:
> What am I missing?
Maybe nothing. But I'll point out that of the things that can currently
be configured at initdb time, such as LC_COLLATE, there is not one single
one that matters to walsender/walreceiver. If you think there is zero
risk involved in introducing a parameter that will matter at that level,
you have a different concept of risk than I do.
If you'd presented some positive reason why we ought to be taking some
risk here, I'd be on board. But you haven't really. The current default
value for this parameter is nearly old enough to vote; how is it that
we suddenly need to make it easily configurable? Let's just change
the value and be happy.
regards, tom lane
On 2016-08-24 23:26:51 -0400, Robert Haas wrote:
> On Wed, Aug 24, 2016 at 10:54 PM, Andres Freund <andres@anarazel.de> wrote:
>> and I'm also rather doubtful that it's actually without overhead.
>
> Really? Where do you think the overhead would come from?
ATM we do math involving XLOG_BLCKSZ in a bunch of places (including
doing a lot of %). Some of that happens with exclusive lwlocks held, and
some even with a spinlock held IIRC. Making that variable won't be
free. Whether it's actually measurable - hard to say. I do remember
Heikki fighting hard to simplify some parts of the critical code during
the xlog scalability stuff, and that that even involved moving minor
amounts of math out of critical sections.
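To make the concern concrete: with a compile-time power-of-two constant, the compiler can strength-reduce the % into a bit mask, while a run-time variable generally forces a genuine divide. A toy illustration (Python stand-in for the C hot path; the function names are made up):

```python
SEG_SIZE = 16 * 1024 * 1024  # power of two, as WAL sizes are

def offset_in_segment_mod(xlog_ptr):
    # What the hot path computes: a modulo by a constant, which a C
    # compiler can strength-reduce because SEG_SIZE is known at build time.
    return xlog_ptr % SEG_SIZE

def offset_in_segment_mask(xlog_ptr):
    # Equivalent bit mask, valid only because SEG_SIZE is a power of two;
    # with a run-time segment size this rewrite is no longer automatic.
    return xlog_ptr & (SEG_SIZE - 1)

assert offset_in_segment_mod(123456789) == offset_in_segment_mask(123456789)
```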
> What sort of test would you run to try to detect it?
Xlog scalability tests (parallel copy, parallel inserts...), and
decoding speed (pg_xlogdump --stats?)
On Wed, Aug 24, 2016 at 11:41 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> What am I missing?
>
> Maybe nothing. But I'll point out that of the things that can currently
> be configured at initdb time, such as LC_COLLATE, there is not one single
> one that matters to walsender/walreceiver. If you think there is zero
> risk involved in introducing a parameter that will matter at that level,
> you have a different concept of risk than I do.
>
> If you'd presented some positive reason why we ought to be taking some
> risk here, I'd be on board. But you haven't really. The current default
> value for this parameter is nearly old enough to vote; how is it that
> we suddenly need to make it easily configurable? Let's just change
> the value and be happy.
I certainly think that's a good first cut. As I said before, I think
that increasing the value from 16MB to 64MB won't really hurt people
with mostly-default configurations. max_wal_size=1GB currently means
64 16-MB segments; if it starts meaning 16 64-MB segments, I don't
think that will have much impact on people one way or the other.
Meanwhile, we'll significantly help people who are currently
generating painfully large but not totally insane numbers of WAL
segments. Someone who is currently generating 32,768 WAL segments per
day - about one every 2.6 seconds - will have a significantly easier
time if they start generating 8,192 WAL segments per day - about one
every 10.5 seconds - instead. It's just much easier for a reasonably
simple archive command to keep up, "ls" doesn't have as many directory
entries to sort, etc.
However, for people who have really high velocity systems - say
300,000 WAL segments per day - a fourfold increase in the segment size
only gets them down to 75,000 WAL segments per day, which is still
pretty nuts. High tens of thousands of segments per day is, surely,
easier to manage than low hundreds of thousands, but it still puts
really tight requirements on how fast your archive_command has to run.
On that kind of system, you really want a segment size of maybe 1GB.
In this example that gets you down to ~4700 WAL files per day, or
about one every 18 seconds. But 1GB is clearly too large to be the
default.
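The per-segment intervals quoted above fall straight out of the arithmetic:

```python
SECONDS_PER_DAY = 86400

# (segments per day) -> rough interval between segment switches
for segments_per_day in (32768, 8192, 300000, 75000, 4700):
    print(segments_per_day, round(SECONDS_PER_DAY / segments_per_day, 1))
```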
I think we're going to run into this issue more and more as people
start running PostgreSQL on larger databases. In current releases,
the cost of wraparound autovacuums can easily be the limiting factor
here: the I/O cost is proportional to the XID burn rate multiplied by
the entire size of the database. So mostly read-only databases or
databases that only take batch loads can be fine even if they are
really big, but it's hard to scale databases that do lots of
transaction processing beyond a certain size because you just end up
running continuous wraparound vacuums and eventually you can't even do
that fast enough. The freeze map changes in 9.6 should help with this
problem, though, at least for databases that have hot spots rather
than uniform access, which is of course very common. I think the
result of that is likely to be that people try to scale up PostgreSQL
to larger databases than ever before. New techniques for indexing
large amounts of data (like BRIN) and for querying it (like parallel
query, especially once we support having the driving scan be a bitmap
heap scan) are going to encourage people in that direction, too.
You're asking why we suddenly need to make this configurable as if it
were a surprising need, but I think it would be more surprising if
scaling up didn't create some new needs. I can't think of any reason
why a 100TB database and a 100MB database should both want to use the
same WAL segment size, and I think we want to support both of those
things.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Wed, Aug 24, 2016 at 11:52 PM, Andres Freund <andres@anarazel.de> wrote:
> On 2016-08-24 23:26:51 -0400, Robert Haas wrote:
>> On Wed, Aug 24, 2016 at 10:54 PM, Andres Freund <andres@anarazel.de> wrote:
>>> and I'm also rather doubtful that it's actually without overhead.
>>
>> Really? Where do you think the overhead would come from?
>
> ATM we do math involving XLOG_BLCKSZ in a bunch of places (including
> doing a lot of %). Some of that happens with exclusive lwlocks held, and
> some even with a spinlock held IIRC. Making that variable won't be
> free. Whether it's actually measurable - hard to say. I do remember
> Heikki fighting hard to simplify some parts of the critical code during
> xlog scalability stuff, and that that even involved moving minor amounts
> of math out of critical sections.
OK, that's helpful context.
>> What sort of test would you run to try to detect it?
>
> Xlog scalability tests (parallel copy, parallel inserts...), and
> decoding speed (pg_xlogdump --stats?)
Thanks; that's helpful, too.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 2016-08-25 00:28:58 -0400, Robert Haas wrote:
> On Wed, Aug 24, 2016 at 11:52 PM, Andres Freund <andres@anarazel.de> wrote:
>> On 2016-08-24 23:26:51 -0400, Robert Haas wrote:
>>> On Wed, Aug 24, 2016 at 10:54 PM, Andres Freund <andres@anarazel.de> wrote:
>>>> and I'm also rather doubtful that it's actually without overhead.
>>>
>>> Really? Where do you think the overhead would come from?
>>
>> ATM we do math involving XLOG_BLCKSZ in a bunch of places (including
>> doing a lot of %). Some of that happens with exclusive lwlocks held, and
>> some even with a spinlock held IIRC. Making that variable won't be
>> free. Whether it's actually measurable - hard to say. I do remember
>> Heikki fighting hard to simplify some parts of the critical code during
>> xlog scalability stuff, and that that even involved moving minor amounts
>> of math out of critical sections.
>
> OK, that's helpful context.
>
>>> What sort of test would you run to try to detect it?
>>
>> Xlog scalability tests (parallel copy, parallel inserts...), and
>> decoding speed (pg_xlogdump --stats?)
>
> Thanks; that's helpful, too.
FWIW, I'm also doubtful that investing time into making this initdb
configurable is a good use of time: The number of users that'll adjust
initdb time parameters is going to be fairly small.
On Thu, Aug 25, 2016 at 12:35 AM, Andres Freund <andres@anarazel.de> wrote:
> FWIW, I'm also doubtful that investing time into making this initdb
> configurable is a good use of time: The number of users that'll adjust
> initdb time parameters is going to be fairly small.
I have to admit that I was skeptical about the idea of doing anything
about this at all the first few times it came up. 16MB ought to be
good enough for anyone! However, the time between beatings has now
gotten short enough that the bruises don't have time to heal before
the next beating arrives from a completely different customer. I try
not to hold my views so firmly as to be impervious to contrary
evidence.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Hello hackers,
I'm no PG hacker, so maybe I'm completely wrong; sorry if I have wasted your time. I'll try to make the best of Tom Lane's comment.
What would happen if there's a database on a server with an initdb (or whatever) parameter -with-wal-size=64MB, and later someone decides to make it the master in a replicated system with a slave that lacks that parameter? Would the slave work with the "different" WAL size of the master? How could it be guaranteed that in such a scenario the replication either works correctly or fails with a meaningful error message?
But in general I think a more flexible WAL size is a good idea.
To answer Andres: You have found one of the (few?) users to adjust initdb parameters.
Regards
Robert Haas <robertmhaas@gmail.com> wrote on Thursday, 25 August 2016 at 6:43:
> On Thu, Aug 25, 2016 at 12:35 AM, Andres Freund <andres@anarazel.de> wrote:
>> FWIW, I'm also doubtful that investing time into making this initdb
>> configurable is a good use of time: The number of users that'll adjust
>> initdb time parameters is going to be fairly small.
>
> I have to admit that I was skeptical about the idea of doing anything
> about this at all the first few times it came up. 16MB ought to be
> good enough for anyone! However, the time between beatings has now
> gotten short enough that the bruises don't have time to heal before
> the next beating arrives from a completely different customer. I try
> not to hold my views so firmly as to be impervious to contrary
> evidence.
>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
On Wed, Aug 24, 2016 at 10:40:06PM -0300, Claudio Freire wrote:
>> time instead of requiring a recompile; we probably don't save any
>> significant number of cycles by compiling this into the server.
>
> FWIW, +1
>
> We're already hurt by the small segments due to a similar phenomenon
> as the ssh case: TCP slow start. Designing the archive/recovery
> command to work around TCP slow start is quite complex, and bigger
> segments would just be a better thing.
>
> Not to mention that bigger segments compress better.
This would be good time to rename pg_xlog and pg_clog directories too.
--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com
+ As you are, so once was I. As I am, so you will be. +
+ Ancient Roman grave inscription +
On Thu, Aug 25, 2016 at 1:04 AM, Wolfgang Wilhelm
<wolfgang20121964@yahoo.de> wrote:
> What would happen if there's a database on a server with initdb (or
> whatever) parameter -with-wal-size=64MB and later someone decides to make it
> the master in a replicated system and has a slave without that parameter?
> Would the slave work with the "different" WAL size of the master? How could
> it be guaranteed that in such a scenario the replication either works
> correctly or fails with a meaningful error message?
You make reference to an "initdb (or whatever) parameter" but actually
there is a big difference between the "initdb" case and the "whatever"
case. If the parameter is fixed at initdb time, then the master and
the slave will definitely agree: the slave had to be created by
copying the master, and that means the control file that contains the
size was also copied. Neither can have been changed afterwards.
That's what an initdb-time parameter means. On the other hand, if the
parameter is, say, a GUC, then you would have exactly the kinds of
problems that you are talking about here. I am not keen to solve any
of those problems, which is why I am not proposing to go any further
than an initdb-time parameter.
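For what it's worth, the value Robert describes is already visible from outside the server: pg_controldata reports it. A rough sketch of reading it programmatically (the helper names are mine; the demo parses a captured sample line rather than querying a live cluster):

```python
import re
import subprocess

def parse_segment_size(controldata_output):
    # pg_controldata prints a line like:
    #   Bytes per WAL segment:                16777216
    m = re.search(r"Bytes per WAL segment:\s+(\d+)", controldata_output)
    if m is None:
        raise ValueError("segment size not found in pg_controldata output")
    return int(m.group(1))

def wal_segment_size(pgdata):
    # Ask pg_controldata about a real data directory (not run here).
    out = subprocess.run(["pg_controldata", pgdata],
                         capture_output=True, text=True, check=True).stdout
    return parse_segment_size(out)

# Offline demo against a sample line:
sample = "Bytes per WAL segment:                16777216\n"
print(parse_segment_size(sample))  # 16777216
```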
> But in general I think a more flexible WAL size is a good idea.
>
> To answer Andres: You have found one of the (few?) users to adjust initdb
> parameters.
Good to know, thanks.
In further defense of the idea that making this more configurable
isn't nuts, it's worth noting that the history here is:
* When Vadim originally added XLogSegSize in
30659d43eb73272e20f2eb1d785a07ba3b553ed8 (September 1999), it was a
constant.
* In c3c09be34b6b0d7892f1087a23fc6eb93f3c4f04 (February 2004), this
became configurable via pg_config_manual.h.
* In cf9f6c8d8e9df28f3fbe1850ca7f042b2c01252e (May 2008), Tom made
this configurable via configure.
So there's a well-established history of making this gradually easier
for users to change.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Thu, Aug 25, 2016 at 5:32 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Wed, Aug 24, 2016 at 10:33 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> ... but I think this is just folly. You'd have to do major amounts
>>> of work to keep, eg, slave servers on the same page as the master
>>> about what the segment size is.
>>
>> I said an initdb-time parameter, meaning not capable of being changed
>> within the lifetime of the cluster. So I don't see how the slave
>> servers would get out of sync?
>
> The point is that that now becomes something to worry about. I do not
> think I have to exhibit a live bug within five minutes' thought before
> saying that it's a risk area. It's something that we simply have not
> worried about before, and IME that generally means there's some squishy
> things there.
If we ignore the possible performance implications (which we shouldn't, of
course, but for the sake of argument), I think having it as a configurable
parameter in initdb would make it *less* of something to worry about.
Because it comes with the cluster during replication. I think it's more
likely that you accidentally end up with two instances compiled with
different values than that you get an issue from this.
That said, I think it also has to be a *very* bad pain point for somebody to
care about changing it if it requires recompilation. The vast majority of
users run the packaged versions, and they don't want to run anything else.
So you will have whatever the RPMs or the DEBs or installers pick for you.
Anything that is a ./configure-time option is something we should expect
almost nobody to change.
Changing the default will of course help/hurt those as well. But if we
change the default to something high and say "hey those of you who just run
it on a smaller system should recompile with a different --configure", we
are being *very* user-unfriendly. Or the other way around.
That doesn't mean we shouldn't change the default. We just need to be a lot
more careful about what we change it to if it takes ./configure to reset it.
--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/