max_wal_senders must die
Hackers,
What purpose is served, exactly, by max_wal_senders?
In order for a standby to connect, it must have a superuser login, and
replication connections must be enabled in pg_hba.conf. How is having
one more setting in one more file you have to enable on the master
benefitting anyone?
Under what bizarre set of circumstances would anyone have runaway
connections from replicas to the master?
Proposed that we simply remove this setting in 9.1. The real maximum
wal senders should be whatever max_connections is.
--
-- Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com
Josh Berkus wrote:
Hackers,
What purpose is served, exactly, by max_wal_senders?
In order for a standby to connect, it must have a superuser login, and
replication connections must be enabled in pg_hba.conf. How is having
one more setting in one more file you have to enable on the master
benefitting anyone?
Under what bizarre set of circumstances would anyone have runaway
connections from replicas to the master?
Proposed that we simply remove this setting in 9.1. The real maximum
wal senders should be whatever max_connections is.
I disagree - limiting the maximum number of replication connections is
important for my use cases.
Replication connections are significantly more heavyweight than a
normal connection, and right now I simply use this setting to
prevent stupid mistakes (especially in virtualized/cloud-style environments).
What we really should look into is using a less privileged role - or a
dedicated replication role - and use the existing per-role connection
limit feature. That feature is unlimited by default, people can change
it just like for every other role, and we can get rid of that GUC.
Stefan
On Tue, Oct 19, 2010 at 13:14, Stefan Kaltenbrunner
<stefan@kaltenbrunner.cc> wrote:
Josh Berkus wrote:
Hackers,
What purpose is served, exactly, by max_wal_senders?
In order for a standby to connect, it must have a superuser login, and
replication connections must be enabled in pg_hba.conf. How is having one
more setting in one more file you have to enable on the master benefitting
anyone?
Under what bizarre set of circumstances would anyone have runaway
connections from replicas to the master?
Proposed that we simply remove this setting in 9.1. The real maximum wal
senders should be whatever max_connections is.
I disagree - limiting the maximum number of replication connections is
important for my use cases.
Replication connections are significantly more heavyweight than a normal
connection, and right now I simply use this setting to prevent
stupid mistakes (especially in virtualized/cloud-style environments).
What we really should look into is using a less privileged role - or a
dedicated replication role - and use the existing per-role connection limit
feature. That feature is unlimited by default, people can change it just
like for every other role, and we can get rid of that GUC.
+1 for being able to control it that way - that should keep it simple
for the newbie use case, while retaining the ability for fine-grained
control for those who need it.
I think it's already on the TODO for 9.1 to use a separate role for it...
If we want something fixed *now*, should we perhaps just bump the
*default* value for max_wal_senders to 5 or something?
--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/
Magnus Hagander wrote:
On Tue, Oct 19, 2010 at 13:14, Stefan Kaltenbrunner
<stefan@kaltenbrunner.cc> wrote:
Josh Berkus wrote:
Hackers,
What purpose is served, exactly, by max_wal_senders?
In order for a standby to connect, it must have a superuser login, and
replication connections must be enabled in pg_hba.conf. How is having one
more setting in one more file you have to enable on the master benefitting
anyone?
Under what bizarre set of circumstances would anyone have runaway
connections from replicas to the master?
Proposed that we simply remove this setting in 9.1. The real maximum wal
senders should be whatever max_connections is.
I disagree - limiting the maximum number of replication connections is
important for my use cases.
Replication connections are significantly more heavyweight than a normal
connection, and right now I simply use this setting to prevent
stupid mistakes (especially in virtualized/cloud-style environments).
What we really should look into is using a less privileged role - or a
dedicated replication role - and use the existing per-role connection limit
feature. That feature is unlimited by default, people can change it just
like for every other role, and we can get rid of that GUC.
+1 for being able to control it that way - that should keep it simple
for the newbie use case, while retaining the ability for fine-grained
control for those who need it.
I think it's already on the TODO for 9.1 to use a separate role for it...
I think we had some plans to do that - I wonder how hard it would be to
just do the dedicated role thing for now (maybe with the only constraint
that it can only be used on a replication connection) and look into
making it (technically) less privileged later?
If we want something fixed *now*, should we perhaps just bump the
*default* value for max_wal_senders to 5 or something?
Or accept -1 for "unlimited" and use that by default; that would fix part of
the complaint from Josh, but you would still have to restart the master
to implement a limit...
Stefan
Stefan, Dimitri,
I disagree - limiting the maximum number of replication connections is
important for my use cases.
Can you explain more? I clearly don't understand your use case.
If we want something fixed *now*, should we perhaps just bump the
*default* value for max_wal_senders to 5 or something?
If we're not going to remove it, then the default should be -1, which
should mean "same as max_connections".
--
-- Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com
Josh Berkus wrote:
Under what bizarre set of circumstances would anyone have runaway
connections from replicas to the master?
Cloud computing deployments where additional replicas are brought up
automatically in response to demand. It's easy to imagine a situation
where a standby instance is spawned, starts to sync, and that additional
load triggers *another* standby to come on board; repeat until the
master is doing nothing but servicing standby sync requests.
max_wal_senders provides a safety valve for that.
I think Magnus's idea to bump the default to 5 triages the worst of the
annoyance here, without dropping the feature (which has uses) or waiting
for new development to complete. I'd be in favor of just committing
that change right now, before it gets forgotten about, and then if
nobody else gets around to further work at least something improved here
for 9.1.
--
Greg Smith, 2ndQuadrant US greg@2ndQuadrant.com Baltimore, MD
PostgreSQL Training, Services and Support www.2ndQuadrant.us
On 10/19/2010 09:06 AM, Greg Smith wrote:
I think Magnus's idea to bump the default to 5 triages the worst of the
annoyance here, without dropping the feature (which has uses) or waiting
for new development to complete. I'd be in favor of just committing
that change right now, before it gets forgotten about, and then if
nobody else gets around to further work at least something improved here
for 9.1.
Heck, even *I* could write that patch, if we're agreed. Although you
can commit it.
--
-- Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com
On Tue, Oct 19, 2010 at 12:18 PM, Josh Berkus <josh@agliodbs.com> wrote:
On 10/19/2010 09:06 AM, Greg Smith wrote:
I think Magnus's idea to bump the default to 5 triages the worst of the
annoyance here, without dropping the feature (which has uses) or waiting
for new development to complete. I'd be in favor of just committing
that change right now, before it gets forgotten about, and then if
nobody else gets around to further work at least something improved here
for 9.1.
Heck, even *I* could write that patch, if we're agreed. Although you can
commit it.
Setting max_wal_senders to a non-zero value causes additional work to
be done every time a transaction commits, aborts, or is prepared.
It's possible that this overhead is too trivial to be worth worrying
about; I haven't looked at it closely.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes:
On Tue, Oct 19, 2010 at 12:18 PM, Josh Berkus <josh@agliodbs.com> wrote:
On 10/19/2010 09:06 AM, Greg Smith wrote:
I think Magnus's idea to bump the default to 5 triages the worst of the
annoyance here, without dropping the feature (which has uses) or waiting
for new development to complete.
Setting max_wal_senders to a non-zero value causes additional work to
be done every time a transaction commits, aborts, or is prepared.
Yes. This isn't just a numeric parameter; it's also a boolean
indicating "do I want to pay the overhead to be prepared to be a
replication master?". Josh has completely failed to make a case that
that should be the default. In fact, the system would fail to start
at all if we just changed the default for max_wal_senders and not the
default for wal_level.
regards, tom lane
On Tue, Oct 19, 2010 at 12:32 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Robert Haas <robertmhaas@gmail.com> writes:
On Tue, Oct 19, 2010 at 12:18 PM, Josh Berkus <josh@agliodbs.com> wrote:
On 10/19/2010 09:06 AM, Greg Smith wrote:
I think Magnus's idea to bump the default to 5 triages the worst of the
annoyance here, without dropping the feature (which has uses) or waiting
for new development to complete.
Setting max_wal_senders to a non-zero value causes additional work to
be done every time a transaction commits, aborts, or is prepared.
Yes. This isn't just a numeric parameter; it's also a boolean
indicating "do I want to pay the overhead to be prepared to be a
replication master?". Josh has completely failed to make a case that
that should be the default. In fact, the system would fail to start
at all if we just changed the default for max_wal_senders and not the
default for wal_level.
regards, tom lane
If the variable is altered such that it is dynamic, could it not be
updated by the postmaster when a connection attempts to begin
replicating?
--
Rob Wultsch
wultsch@gmail.com
On Tue, Oct 19, 2010 at 3:32 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Robert Haas <robertmhaas@gmail.com> writes:
On Tue, Oct 19, 2010 at 12:18 PM, Josh Berkus <josh@agliodbs.com> wrote:
On 10/19/2010 09:06 AM, Greg Smith wrote:
I think Magnus's idea to bump the default to 5 triages the worst of the
annoyance here, without dropping the feature (which has uses) or waiting
for new development to complete.
Setting max_wal_senders to a non-zero value causes additional work to
be done every time a transaction commits, aborts, or is prepared.
Yes. This isn't just a numeric parameter; it's also a boolean
indicating "do I want to pay the overhead to be prepared to be a
replication master?". Josh has completely failed to make a case that
that should be the default. In fact, the system would fail to start
at all if we just changed the default for max_wal_senders and not the
default for wal_level.
On a slightly tangential note, it would be really nice to be able to
change things like wal_level and max_wal_senders on the fly. ISTM
that needing to change the setting is actually the smaller portion of
the gripe; the bigger annoyance is that you bring the system down,
change one setting, bring it back up, take a hot backup, and then
realize that you have to shut the system down again to change the
other setting, because you forgot to do it the first time. Since the
error message you get on the slave side is pretty explicit, the sheer
fact of needing to change max_wal_senders is only a minor
inconvenience; but the possible need to take down the system a second
time is a major one.
One of the problems with making these and similar settings changeable
on the fly is that, to be safe, we can't declare that the value is
increased until we can verify that every still-running backend has
picked up the increased value and begun acting upon it. We have no
architecture for knowing when that has happened; the postmaster just
signals its children and forgets about it. If you had a shared memory
array (similar to the ProcArray, but perhaps with a separate set of
locks) in which backends wrote their last-known values of GUCs that
fall into this category, it would be possible to write a function that
returns the answer to the question "What is the least value of
wal_level that exists in any backend in the system?". Maybe that's
not the right architecture; I'm not sure. But we should see if we can
figure out something that will work.
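A toy model of that shared-array idea (purely illustrative Python, not PostgreSQL internals; all names here are assumptions): each backend publishes the GUC value it last applied, and the conservative system-wide answer is the minimum across all of them.

```python
# Illustrative model only. Each backend publishes the wal_level it is
# currently acting on; until every backend has picked up an increase,
# the safe system-wide value is the least one still in force anywhere.
WAL_LEVELS = {"minimal": 0, "archive": 1, "hot_standby": 2}

def effective_wal_level(published):
    """Return the lowest wal_level reported by any running backend."""
    return min(published, key=WAL_LEVELS.get)
```

Only once such a query reports the new value for every backend could the master safely start relying on the increased setting.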
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Yes. This isn't just a numeric parameter; it's also a boolean
indicating "do I want to pay the overhead to be prepared to be a
replication master?".
Since this is the first time I've heard of the overhead, it would be
hard for me to have taken that into consideration. If there was
discussion about this ever, I missed it. That explains why we changed
the default in RC1 though, which was a surprise to me.
What is the overhead exactly? What specific work do we do? Link?
Josh has completely failed to make a case that
that should be the default. In fact, the system would fail to start
at all if we just changed the default for max_wal_senders and not the
default for wal_level.
Well, now that you mention it, I also think that "hot standby" should be
the default. Yes, I know about the overhead, but I also think that the
number of our users who want easy replication *far* outnumber the users
who care about an extra 10% WAL overhead.
On a slightly tangential note, it would be really nice to be able to
change things like wal_level and max_wal_senders on the fly.
This would certainly help solve the problem. Heck, if we could even
just change them with a reload (rather than a restart) it would be a
vast improvement.
ISTM
that needing to change the setting is actually the smaller portion of
the gripe; the bigger annoyance is that you bring the system down,
change one setting, bring it back up, take a hot backup, and then
realize that you have to shut the system down again to change the
other setting, because you forgot to do it the first time. Since the
error message you get on the slave side is pretty explicit, the sheer
fact of needing to change max_wal_senders is only a minor
inconvenience; but the possible need to take down the system a second
time is a major one.
You've summed up the problem nicely. I'll note that even though I've
already set up 3 production systems using SR, I still mess this up about
1/3 the time and have to restart and reclone. There's just too many
settings.
--
-- Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com
Josh Berkus wrote:
Well, now that you mention it, I also think that "hot standby" should be
the default. Yes, I know about the overhead, but I also think that the
number of our users who want easy replication *far* outnumber the users
who care about an extra 10% WAL overhead.
I think this whole situation is similar to the resistance to increasing
default_statistics_target. There's additional overhead added by
enabling both of these settings, in return for making it more likely for
the average person to see useful behavior by default. If things are
rejiggered so the advanced user has to turn things off in order to get
optimal performance when not using these features, in return for the
newbie being more likely to get things working in basic form, that's
probably a net win for PostgreSQL advocacy. But much like
default_statistics_target, there needs to be some more formal work done
on quantifying just how bad each of these overheads really are first.
We lost 3-7% on multiple simple benchmarks between 8.3 and 8.4[1]
because of that retuning for ease of use on real-world workloads, and
that's not something you want to repeat too often.
[1]: http://suckit.blog.hu/2009/09/29/postgresql_history and the tests
Jignesh did while at Sun
--
Greg Smith, 2ndQuadrant US greg@2ndQuadrant.com Baltimore, MD
PostgreSQL Training, Services and Support www.2ndQuadrant.us
On Wed, Oct 20, 2010 at 1:06 AM, Greg Smith <greg@2ndquadrant.com> wrote:
Josh Berkus wrote:
Well, now that you mention it, I also think that "hot standby" should be
the default. Yes, I know about the overhead, but I also think that the
number of our users who want easy replication *far* outnumber the users
who care about an extra 10% WAL overhead.
I think this whole situation is similar to the resistance to increasing
default_statistics_target. There's additional overhead added by enabling
both of these settings, in return for making it more likely for the average
person to see useful behavior by default. If things are rejiggered so the
advanced user has to turn things off in order to get optimal performance
when not using these features, in return for the newbie being more likely to
get things working in basic form, that's probably a net win for PostgreSQL
advocacy. But much like default_statistics_target, there needs to be some
more formal work done on quantifying just how bad each of these overheads
really are first. We lost 3-7% on multiple simple benchmarks between 8.3
and 8.4[1] because of that retuning for ease of use on real-world workloads,
and that's not something you want to repeat too often.
Exactly. It doesn't take many 3-7% slowdowns to add up to being 50%
or 100% slower, and that sucks. In fact, I'm still not convinced that
we were wise to boost default_statistics_target as much as we did. I
argued for a smaller boost at the time.
Actually, I think the best thing for default_statistics_target might
be to scale the target based on the number of rows in the table, e.g.
given N rows:
10 + (N / 1000), if N < 40,000
46 + (N / 10000), if 50,000 < N < 3,540,000
400, if N > 3,540,000
Consider a table with 2,000 rows. With default_statistics_target =
100, we can store up to 100 MCVs; and we break the remaining ~1900
values up into 100 buckets with 19 values/bucket. In most cases, that
is probably overkill. Where you tend to run into problems with
inadequate statistics is with the values that are not quite common
enough to be an MCV, but are still significantly more common than
their companions in the same bucket. However, with only 19 values in
a bucket, you're probably not going to have that problem. If you
scale the table down to 1000 rows you now have 9 values in a bucket,
which makes it *really* unlikely you're going to have that problem.
On the other hand, on a table with 4 million rows, it is entirely
likely that there could be more than 100 values whose frequencies are
worth tracking individually, and odds are good also that even if the
planning time is a little longer to no purpose, it'll still be small
relatively to the query execution time. It's unfortunately
impractical for the size of the MCV list to track linearly with the
size of the table, because there are O(n^2) algorithms in use, but I
think some kind of graduated scheme might enable us to buy back some
of that lost performance without damaging real workloads very much.
Possibly even helping real workloads, because you may very well join
large fact tables against small dimension tables, and odds are good
that under the present scheme the fact tables have more statistics
than they really need.
As to replication, I don't believe the contention that most people
will want to use replication. Many will, and that is fine, but many
also won't. The world is full of development and test machines where
replication is a non-issue, and some people won't even run it in
production because the nature of their application makes the data on
that box non-essential, or because they replicate with Bucardo or
Slony. I completely agree that we should make it easier to get
replication set up without (multiple) server restarts, but imposing a
performance overhead on untuned systems is not the right way to do it.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Greg Smith <greg@2ndquadrant.com> writes:
Josh Berkus wrote:
Well, now that you mention it, I also think that "hot standby" should be
the default. Yes, I know about the overhead, but I also think that the
number of our users who want easy replication *far* outnumber the users
who care about an extra 10% WAL overhead.
... But much like
default_statistics_target, there needs to be some more formal work done
on quantifying just how bad each of these overheads really are first.
Quite. Josh, have you got any evidence showing that the penalty is
only 10%? There are cases, such as COPY and ALTER TABLE, where
you'd be looking at 2X or worse penalties, because of the existing
optimizations that avoid writing WAL at all for operations where a
single final fsync can serve the purpose. I'm not sure what the
penalty for "typical" workloads is, partly because I'm not sure what
should be considered a "typical" workload for this purpose.
regards, tom lane
On 20.10.2010 17:19, Tom Lane wrote:
Greg Smith<greg@2ndquadrant.com> writes:
Josh Berkus wrote:
Well, now that you mention it, I also think that "hot standby" should be
the default. Yes, I know about the overhead, but I also think that the
number of our users who want easy replication *far* outnumber the users
who care about an extra 10% WAL overhead.
... But much like
default_statistics_target, there needs to be some more formal work done
on quantifying just how bad each of these overheads really are first.
Quite. Josh, have you got any evidence showing that the penalty is
only 10%? There are cases, such as COPY and ALTER TABLE, where
you'd be looking at 2X or worse penalties, because of the existing
optimizations that avoid writing WAL at all for operations where a
single final fsync can serve the purpose. I'm not sure what the
penalty for "typical" workloads is, partly because I'm not sure what
should be considered a "typical" workload for this purpose.
Going from wal_level='minimal' to 'archive' incurs the penalty on
WAL-logging COPY etc. That's a big penalty. However, the difference
between wal_level='archive' and wal_level='hot_standby' should be tiny.
The big reason for separating those two in 9.0 was that it's all new
code with new ways to fail and, yes, new bugs. It's not smart to expose
people who are not interested in using hot standby to those issues. But
maybe we feel more comfortable merging 'archive' and 'hot_standby'
levels in 9.1.
--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
Going from wal_level='minimal' to 'archive' incurs the penalty on
WAL-logging COPY etc. That's a big penalty. However, the difference
between wal_level='archive' and wal_level='hot_standby' should be tiny.
I'm not sure I believe that either, because of the costs associated with
logging lock acquisitions.
We really need some actual benchmarks in this area, rather than
handwaving ...
regards, tom lane
Excerpts from Robert Haas's message of Wed Oct 20 10:29:04 -0300 2010:
Actually, I think the best thing for default_statistics_target might
be to scale the target based on the number of rows in the table, e.g.
given N rows:
10 + (N / 1000), if N < 40,000
46 + (N / 10000), if 50,000 < N < 3,540,000
400, if N > 3,540,000
Consider a table with 2,000 rows. With default_statistics_target =
100, we can store up to 100 MCVs; and we break the remaining ~1900
values up into 100 buckets with 19 values/bucket.
Maybe what should be done about this is to have separate sizes for the
MCV list and the histogram, where the MCV list is automatically sized
during ANALYZE.
--
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
On Wed, Oct 20, 2010 at 10:53 AM, Alvaro Herrera
<alvherre@commandprompt.com> wrote:
Excerpts from Robert Haas's message of Wed Oct 20 10:29:04 -0300 2010:
Actually, I think the best thing for default_statistics_target might
be to scale the target based on the number of rows in the table, e.g.
given N rows:
10 + (N / 1000), if N < 40,000
46 + (N / 10000), if 50,000 < N < 3,540,000
400, if N > 3,540,000
Consider a table with 2,000 rows. With default_statistics_target =
100, we can store up to 100 MCVs; and we break the remaining ~1900
values up into 100 buckets with 19 values/bucket.
Maybe what should be done about this is to have separate sizes for the
MCV list and the histogram, where the MCV list is automatically sized
during ANALYZE.
I thought about that, but I'm not sure there's any particular
advantage. Automatically scaling the histogram seems just as useful
as automatically scaling the MCV list - both things will tend to
reduce the estimation error. For a table with 2,000,000 rows,
automatically setting the statistics target from 100 to the value that
would be computed by the above formula, which happens to be 246, will
help the 101st-246th most common values, because they will now be
MCVs. It will also help all the remaining values, both because
you've pulled 146 fairly common values out of the histogram buckets
and also because each bucket now contains ~8130 values rather than
~20,000.
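A quick sanity check of the bucket arithmetic above (a sketch; `bucket_size` is an assumed helper, modeling the `target` most common values being set aside as MCVs and the remainder split evenly across `target` histogram buckets):

```python
def bucket_size(n_rows, target):
    """Approximate values per histogram bucket once the `target`
    most common values have been pulled out as MCVs."""
    return (n_rows - target) // target

# 2,000 rows at target 100: 19 values/bucket (the upthread example).
# 2,000,000 rows at target 100 vs. the scaled target of 246:
#   roughly 20,000 vs. roughly 8,130 values/bucket, as quoted above.
```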
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Wed, Oct 20, 2010 at 6:29 AM, Robert Haas <robertmhaas@gmail.com> wrote:
Exactly. It doesn't take many 3-7% slowdowns to add up to being 50%
or 100% slower, and that sucks. In fact, I'm still not convinced that
we were wise to boost default_statistics_target as much as we did. I
argued for a smaller boost at the time.
Well we don't want to let ourselves be paralyzed by FUD so it was
important to identify specific concerns and then tackle those
concerns. Once we identified the worst-case planning cases we profiled
them and found that the inflection point of the curve was fairly
clearly above 100 but that there were cases where values below 1,000
caused problems. So I'm pretty happy with the evidence-based approach.
The problem with being overly conservative is that it gives free rein
to the folks who were shouting that we should just set the default to
1,000. They weren't wrong that the 10 was overly conservative and in
the absence of evidence 1,000 was just as reasonable.
Actually, I think the best thing for default_statistics_target might
be to scale the target based on the number of rows in the table, e.g.
given N rows:
The number of buckets needed isn't related to the population size --
it's related to how wide the ranges you'll be estimating selectivity
for are. That is, with our current code, if you're selecting tuples
within a range a..b and that range happens to be the same size as the
bucket size then you'll get an accurate estimate with a fixed 95th
percentile precision regardless of the size of the table (to an
approximation).
I'm not sure how our selectivity works at all for the degenerate case
of selecting for specific values. I don't understand how histograms
are useful for such estimates at all. I think the MCV lists are
basically an attempt to overcome this problem and as you point out I'm
not sure the statistics target is really the right thing to control
them -- but since I don't think there's any real statistics behind
them I'm not sure there's any right way to control them.
--
greg