Simplifying replication

Started by Josh Berkus · over 15 years ago · 50 messages · pgsql-hackers
#1Josh Berkus
josh@agliodbs.com

Robert asked me to write this up, so here it is.

It is critical that we make replication easier to set up, administrate
and monitor than it currently is. In my conversations with people, this
is more important to our users and the adoption of PostgreSQL than
synchronous replication is.

First, I'm finding myself constantly needing to tutor people on how to
set up replication. The mere fact that it requires a minimum 1-hour
class to explain how to use it, or a 10-page tutorial, tells us it's too
complex. As further evidence, Bruce and I explained binary replication
to several MySQL geeks at OpenSQLCamp last weekend, and they were
horrified at the number and complexity of the steps required. As it
currently is, binary replication is not going to win us a lot of new
users from the web development or virtualization world.

I had to write it up a couple of times; I started with a critique of the
various current commands and options, but that seemed to miss the point.
So instead, let me lay out how I think replication should work in my
dream world 9.1:

1. Any postgresql standalone server can become a replication master
simply by enabling replication connections in pg_hba.conf. No other
configuration is required, and no server restart is required.

2. Should I choose to adjust master configuration, for, say, performance
reasons, most replication variables (including ones like
wal_keep_segments) should be changeable without a server restart.

3. I can configure a standby by copying the postgresql.conf from the
master. I only have to change a single configuration variable (the
primary_conninfo, or maybe a replication_mode setting) in order to start
the server in standby mode. GUCs which apply only to masters are ignored.

4. I can start a new replica off the master by running a single
command-line utility on the standby and giving it connection information
to the master. Using this connection, it should be able to start a
backup snapshot, copy the entire database and any required logs, and
then come up in standby mode. All that should be required for this is
one or two highport connections to the master. No recovery.conf file is
required, or exists.

5. I can cause the standby to fail over with a single command to the
failover server. If this is a trigger file, then it already has a
default path to the trigger file in postgresql.conf, so that this does
not require reconfiguration and restart of the standby at crisis time.
Ideally, I use a "pg_failover" command or something similar.

6. Should I decide to make the standby the new master, this should also
be possible with a single command and a one-line configuration on the
other standbys. To aid this, we have an easy way to tell which standbys
in a group are most "caught up". If I try to promote the wrong standby
(it's behind or somehow incompatible), it should fail with an
appropriate message.

7. Should I choose to use archive files as well as streaming
replication, the utilities to manage them (such as pg_archivecleanup and
pg_standby) are built and installed with PostgreSQL by default, and do
not require complex settings with escape codes.

That's my vision of "simple replication". It is also 100% achievable.
We just have to prioritize ease-of-use over having, and requiring the
user to set, 1,000 little knobs.
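To make the dream concrete, the entire standby-side delta from the
master's configuration might look like this (a sketch only;
replication_mode is the hypothetical setting from point 3, and
primary_conninfo living in postgresql.conf is part of the wish, not
current behavior):

```ini
# postgresql.conf on the standby: identical to the master's, except for:
replication_mode = 'standby'        # hypothetical GUC; master-only GUCs are ignored
primary_conninfo = 'host=master.example.com port=5432 user=replicator'
```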

Speaking of knobs .... (next message)

--
-- Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com

#2Brendan Jurd
direvus@gmail.com
In reply to: Josh Berkus (#1)
Re: Simplifying replication

On 19 October 2010 11:16, Josh Berkus <josh@agliodbs.com> wrote:

4. I can start a new replica off the master by running a single command-line
utility on the standby and giving it connection information to the master.
 Using this connection, it should be able to start a backup snapshot, copy
the entire database and any required logs, and then come up in standby mode.
 All that should be required for this is one or two highport connections to
the master.  No recovery.conf file is required, or exists.

Having just configured replication for the first time with 9.0, I
agree in general with your whole message, but in particular, I want to
give a double-thumbs-up to the above.

Resolving this one item would subtract a great deal of pain -- and
potential for error -- from the process.

Cheers,
BJ

#3Dimitri Fontaine
dimitri@2ndQuadrant.fr
In reply to: Josh Berkus (#1)
Re: Simplifying replication

Hi,

Josh Berkus <josh@agliodbs.com> writes:

It is critical that we make replication easier to set up, administrate and
monitor than it currently is. In my conversations with people, this is more
important to our users and the adoption of PostgreSQL than synchronous
replication is.

I want to say a big big +1 here. The way replication and PITR setup are
implemented now is a very good prototype; it's time to consolidate and
get to something usable by normal people, as opposed to PostgreSQL
full-time geeks.

Well, the current setup offers lots of flexibility which we'd better not
lose in the process, but the simple setup simply does not exist yet.

1. Any postgresql standalone server can become a replication master simply
by enabling replication connections in pg_hba.conf. No other configuration
is required, and no server restart is required.

That sounds as simple as changing the default wal_level to hot_standby,
and the default max_wal_senders to non-zero.
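Concretely, that would just mean shipping different defaults, along
these lines (the exact max_wal_senders value is debatable):

```ini
wal_level = hot_standby   # current default is 'minimal'
max_wal_senders = 5       # current default is 0, which disables walsenders
```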

2. Should I choose to adjust master configuration, for say performance
reasons, most replication variables (including ones like wal_keep_segments)
should be changeable without a server restart.

Anybody know how difficult that is without having to spend lots of time
studying the source code with the question in mind?

3. I can configure a standby by copying the same postgresql.conf on the
master. I only have to change a single configuration variable (the
primary_conninfo, or maybe a replication_mode setting) in order to start the
server in standby mode. GUCs which apply only to masters are ignored.

4. I can start a new replica off the master by running a single command-line
utility on the standby and giving it connection information to the master.
Using this connection, it should be able to start a backup snapshot, copy
the entire database and any required logs, and then come up in standby mode.
All that should be required for this is one or two highport connections to
the master. No recovery.conf file is required, or exists.

There's a prototype to stream a base backup from a libpq connection, I
think someone here wanted to integrate that into the replication
protocol itself. It should be doable with a simple libpq connection and
all automated.

The pg_basebackup python client software is 100 lines of code. It's
mainly a recursive query to get the list of files within the master,
then two server side functions to get binary file chunks,
compressed. Then client side, a loop to decompress and write the chunks
at the right place. That's it.

http://github.com/dimitri/pg_basebackup/blob/master/pg_basebackup.py
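For reference, the client-side loop described above (fetch compressed
chunks, decompress, write them at the right place) amounts to something
like the sketch below. The chunk source is simulated locally here, since
the real script pulls its chunks over a libpq connection; the chunk size
is made up.

```python
import os
import zlib

CHUNK_SIZE = 8192  # hypothetical; the real script chooses its own chunk size


def compressed_chunks(path, chunk_size=CHUNK_SIZE):
    """Stand-in for the server-side function that returns compressed
    binary chunks of one file from the master."""
    with open(path, "rb") as f:
        while True:
            data = f.read(chunk_size)
            if not data:
                break
            yield zlib.compress(data)


def restore_file(chunks, dest_path):
    """Client-side loop: decompress each chunk and write it out in order,
    reassembling the file under the standby's data directory."""
    parent = os.path.dirname(dest_path)
    if parent:
        os.makedirs(parent, exist_ok=True)
    with open(dest_path, "wb") as out:
        for chunk in chunks:
            out.write(zlib.decompress(chunk))
```

Run once per file in the master's file list and you have the shape of a
minimal base-backup client.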

I could prepare a patch given some advice on the replication protocol
integration. For one, is streaming a base backup something that
walsender should care about?

5. I can cause the standby to fail over with a single command to the
failover server. If this is a trigger file, then it already has a default
path to the trigger file in postgresql.conf, so that this does not require
reconfiguration and restart of the standby at crisis time. Ideally, I use a
"pg_failover" command or something similar.

This feature is in walmgr.py from Skytools and it's something necessary
to have in -core now that we have failover standby capacity. Much
agreed, and the pg_failover command is a good idea.

BTW, do we have a clear idea of how to implement pg_ping, and should it
report the current WAL location(s) of a standby?

6. Should I decide to make the standby the new master, this should also be
possible with a single command and a one-line configuration on the other
standbys. To aid this, we have an easy way to tell which standbys in a group
are most "caught up". If I try to promote the wrong standby (it's behind or
somehow incompatible), it should fail with an appropriate message.

That needs a way to define a group of standbys. There's nothing there
that makes them know about each other. That could fall out of the
automated registration of them in a shared catalog on the master, with
this shared catalog spread over (hard-coded) asynchronous replication
(sync == disaster here). But there's no agreement on this feature yet.

Then you need a way to organise them in groups in this shared catalog,
and you need to ask your network admins to make it so that they can
communicate with each other.

Now say we have pg_ping (or another tool) returning the current recv,
applied and synced LSNs, it would be possible for any standby to figure
out which other ones must be shot if you fail over here. The failover
command could list those other standbys in the group that you're behind,
and with a force command allow you to still fail over to this one. Then
you have to STONITH the ones listed, but that's your problem after all.

Then, of course, any standby that's not in the same group as the one
that you failed over to has to be checked and resynced.
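For what it's worth, the "most caught up" comparison reduces to plain
LSN comparisons once the standbys report their positions; a toy sketch,
with LSNs as bare integers and made-up reporting:

```python
def pick_promotion_candidate(standbys):
    """Given a mapping of standby name -> applied LSN, return the name of
    the standby with the most WAL applied, i.e. the one least behind."""
    return max(standbys, key=standbys.get)


def safe_to_remaster(standby_lsn, new_master_lsn):
    """A standby can only follow a new master that is not behind it;
    otherwise the remastering attempt should fail with an error."""
    return new_master_lsn >= standby_lsn
```

Real LSNs are of course XLogRecPtr pairs, not integers, but the ordering
logic is the same.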

7. Should I choose to use archive files as well as streaming replication,
the utilities to manage them (such as pg_archivecleanup and pg_standby) are
built and installed with PostgreSQL by default, and do not require complex
settings with escape codes.

Now that PITR has been in for a long enough time, we *need* to take it
to the next step integration-wise. By that I mean that we have to
support internal commands and provide reasonable default implementations
of the different scripts needed (in portable C, hence "internal").

There are too many pitfalls in this part of the setup for us to
seriously document them all and expect people to come up with bash or
perl implementations that avoid them all. That used to be good enough,
but Josh is right, we need to get even better!

Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support

#4Greg Smith
gsmith@gregsmith.com
In reply to: Josh Berkus (#1)
Re: Simplifying replication

Josh Berkus wrote:

It is critical that we make replication easier to set up, administrate
and monitor than it currently is. In my conversations with people,
this is more important to our users and the adoption of PostgreSQL
than synchronous replication is.

You should enjoy one of the patches we're furiously working on then,
which is aiming at some of the administration and monitoring pieces
here. I have my own grand vision of how easy replication should be to
setup too. Visions and plans are nice, but building functional pieces
of them and delivering them to the community is what actually moves
PostgreSQL forward. So far, multiple people have done that for sync
rep, and what we're supposed to be focused on at this stage in the
development cycle is finishing the work related to the open CommitFest
item that includes that.

I find this launch into a new round of bike-shedding a bit distracting.
If you want this to be easier to use, which it's obvious to any observer
it should be because what's delivered in 9.0 is way too complicated,
please work on finding development resources to assign to that problem.
Because that's the bottleneck on simplifying things, not ideas about
what to do. I would recommend finding or assigning a developer to work
on integrating base backup into the streaming protocol as the biggest
single thing that would improve the built-in replication. All of the
rest of the trivia about what knobs to set and such are tiny details
that make for only a minor improvement until that's taken care of.

--
Greg Smith, 2ndQuadrant US greg@2ndQuadrant.com Baltimore, MD
PostgreSQL Training, Services and Support www.2ndQuadrant.us

#5Josh Berkus
josh@agliodbs.com
In reply to: Greg Smith (#4)
Re: Simplifying replication

You should enjoy one of the patches we're furiously working on then,
which is aiming at some of the administration and monitoring pieces
here.

Great, glad to hear it! Would you be willing to go into detail?

I have my own grand vision of how easy replication should be to
setup too.

So, share it. I'd look forward to hearing it, especially since your
vision probably takes synch rep and quorum commit into account, which
mine doesn't. If not here, then on your blog.

Visions and plans are nice, but building functional pieces of
them and delivering them to the community is what actually moves
PostgreSQL forward.

*shrug*. Robert asked me to write it up for the list based on the
discussions around synch rep. Now you're going to bash me for doing so?

Many of the goals I described will mean removing knobs and changing
defaults, or even foregoing fine-grained control entirely. If we don't
have agreement that simplifying replication is a high-priority goal,
then it won't happen; anyone submitting a patch will be
this-or-that-use-cased to death and will give up.

For that matter, I'm not sure that everyone agrees that simplification
is a worthwhile goal. For example, somewhere between 9.0beta4 and final
release, someone changed the defaults for max_wal_senders and
hot_standby to "0" and "off". I don't remember there even being
discussion about it.

The discussion around synch rep certainly showed that the "natural"
tendency of this list is to add complexity with each incarnation of a
feature. It's the easiest way to accommodate conflicting use cases, but
it's not the best way.

--
-- Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com

#6Greg Smith
gsmith@gregsmith.com
In reply to: Josh Berkus (#5)
Re: Simplifying replication

Josh Berkus wrote:

*shrug*. Robert asked me to write it up for the list based on the
discussions around synch rep. Now you're going to bash me for doing so?

Sorry, next time I'll make sure to bash Robert too. I don't have any
problems with the basic ideas you're proposing, just concerns about when
the right time to get into that whole giant subject is and who is going
to work on it.

--
Greg Smith, 2ndQuadrant US greg@2ndQuadrant.com Baltimore, MD
PostgreSQL Training, Services and Support www.2ndQuadrant.us

#7Josh Berkus
josh@agliodbs.com
In reply to: Dimitri Fontaine (#3)
Re: Simplifying replication

Dimitri, Greg,

I want to say a big big +1 here. The way replication and PITR setup are
implemented now is a very good prototype; it's time to consolidate and
get to something usable by normal people, as opposed to PostgreSQL
full-time geeks.

Well, one thing to be addressed is separating the PITR functionality
from replication. PITR needs a lot of features -- timelines, recovery
stop points, etc. -- which replication doesn't need or want. I think
that focussing on streaming replication functionality and ignoring the
archive logs case is probably the best way to logically separate these
two. Presumably anyone who needs archive logs as well will be a
professional DBA.

I could prepare a patch given some advice on the replication protocol
integration. For one, is streaming a base backup something that
walsender should care about?

Yeah, I thought there was a prototype for this somewhere. From a user
perspective, using a 2nd pgport connection for the initial clone is
fine. I don't know if we want to worry about it otherwise from a
resource management perspective; presumably the cloning process is going
to be a pretty big performance hit on the master.

BTW, do we have a clear idea of how to implement pg_ping, and should it
reports current WAL location(s) of a standby?

pg_ping?

That needs a way to define a group of standby. There's nothing there
that makes them know about each other.

Let me clarify. I meant that if I try to make a *single* standby point
to a new master, and that new master was behind the standby when it
failed over, then the attempt to remaster should fail with an error.

I do *not* want to get into standby groups. That way lies madness. ;-)

Now say we have pg_ping (or another tool) returning the current recv,
applied and synced LSNs, it would be possible for any standby to figure
out which other ones must be shot in case you failover here. The
failover command could list those other standby in the group that you're
behind of, and with a force command allow you to still failover to this
one. Now you have to STONITH the one listed, but that's your problem
after all.

The LSN isn't enough; as others have pointed out, we have a fairly
serious failure case if a standby comes up as a master, accepts
transactions, and then we try to remaster a 2nd standby which was
actually ahead of the first standby at the time of master failure. I
haven't seen a solution posted to that yet; maybe I missed it?

Sorry, next time I'll make sure to bash Robert too. I don't have any
problems with the basic ideas you're proposing, just concerns about when
the right time to get into that whole giant subject is and who is going
to work on it.

If not now, when? The 2nd CommitFest is almost complete. If we're
going to make any substantial changes, we need to have patches for the
3rd commitfest. And I didn't see anyone discussing simplification until
I brought it up.

I don't realistically think that we're going to get 100% simplification
for 9.1. But it would be nice to at least get some components, which
means getting agreement on how things should work, at least roughly.

--
-- Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com

#8Greg Stark
gsstark@mit.edu
In reply to: Josh Berkus (#7)
Re: Simplifying replication

On Tue, Oct 19, 2010 at 9:16 AM, Josh Berkus <josh@agliodbs.com> wrote:

Well, one thing to be addressed is separating the PITR functionality from
replication.  PITR needs a lot of features -- timelines, recovery stop
points, etc. -- which replication doesn't need or want.  I think that
focussing on streaming replication functionality and ignoring the archive
logs case is probably the best way to logically separate these two.
 Presumably anyone who needs archive logs as well will be a professional
DBA.

The way things stand you *always* need archived logs. Even if you have
streaming set up it might try to use archived logs if it falls too far
behind.

Also all the features PITR needs are needed by replication as well.
Recovery stop points are absolutely critical. Otherwise if your
replica crashed it would have to start over from the original clone
time and replay all logs since then.

Timelines are not as obvious but perhaps that's our own mistake. When
you fail over to your replica shouldn't the new master get a new
timelineid? Isn't that the answer to the failure case when a slave
finds it's ahead of the master? If it has already replayed logs from a
different timelineid in the same lsn range then it can't switch
timelines to follow the new master. But if it hasn't then it can.
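The rule in the paragraph above can be stated as a one-line check; a
sketch with LSNs as plain integers (the real comparison would consult
the timeline history to find the fork point):

```python
def can_follow_new_timeline(replayed_lsn, fork_lsn):
    """A standby can switch to the new master's timeline only if it has
    not already replayed WAL from the old timeline past the LSN at which
    the new timeline forked off."""
    return replayed_lsn <= fork_lsn
```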

--
greg

#9Josh Berkus
josh@agliodbs.com
In reply to: Greg Stark (#8)
Re: Simplifying replication

Greg,

The way things stand you *always* need archived logs. Even if you have
streaming set up it might try to use archived logs if it falls too far
behind.

Actually, you don't. If you're willing to accept possible
desynchronization and recloning of the standbys, then you can skip the
archive logs.

Timelines are not as obvious but perhaps that's our own mistake. When
you fail over to your replica shouldn't the new master get a new
timelineid? Isn't that the answer to the failure case when a slave
finds it's ahead of the master? If it has already replayed logs from a
different timelineid in the same lsn range then it can't switch
timelines to follow the new master. But if it hasn't then it can.

Oh? Do we have this information (i.e. what LSNs are associated with
which timeline)?

--
-- Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com

#10Dimitri Fontaine
dimitri@2ndQuadrant.fr
In reply to: Josh Berkus (#7)
Re: Simplifying replication

Josh Berkus <josh@agliodbs.com> writes:

Well, one thing to be addressed is separating the PITR functionality from
replication. PITR needs a lot of features -- timelines, recovery stop
points, etc. -- which replication doesn't need or want. I think that
focussing on streaming replication functionality and ignoring the archive
logs case is probably the best way to logically separate these two.
Presumably anyone who needs archive logs as well will be a professional DBA.

So, I've been thinking some more about this.

We now have two modes of operation when starting up a PostgreSQL
cluster: either it's a primary/master, or it's a standby (in recovery or
hot_standby). What I think would make sense here would be to add another
mode of operation, archiving.

A cluster that is archiving will accept replication connections and will
accept WAL files streamed to it, which it will put in its archive
directory, defaulting to $PGDATA/pg_xlog_archive. It should also be able
to take a base backup from its primary server, maybe with a new pg_ctl
command. The base backup location defaults to
$PGDATA/pg_basebackup/$label.

Then, it would also accept replication connections the other way around,
starting a walsender to publish its archive or its base backup, so that
you could prepare a new standby against the archive server, then switch
the primary_conninfo to the master and that's it.

Now, the archive and restore commands in this setup would be internal
commands pg_walsender and pg_walreceiver. That would mean we need to
extend those special backends to be able to send and receive a full WAL
file but it sounds simple enough, right?

Then we could add support for those to compress the WAL files before
streaming; as we're talking "internal commands" here it sounds easy
enough (we're already linking against zlib, I think).

The base backup streaming support would be there for the archiving mode
but also for the normal standby "empty $PGDATA" initial start up.

Comments?
--
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support

#11Robert Treat
xzilla@users.sourceforge.net
In reply to: Greg Smith (#4)
Re: Simplifying replication

On Tue, Oct 19, 2010 at 11:16 AM, Greg Smith <greg@2ndquadrant.com> wrote:

Josh Berkus wrote:

It is critical that we make replication easier to set up, administrate and
monitor than it currently is. In my conversations with people, this is more
important to our users and the adoption of PostgreSQL than synchronous
replication is.

<snip>

I find this launch into a new round of bike-shedding a bit distracting. If
you want this to be easier to use, which it's obvious to any observer it
should be because what's delivered in 9.0 is way too complicated, please
work on finding development resources to assign to that problem. Because
that's the bottleneck on simplifying things, not ideas about what to do. I
would recommend finding or assigning a developer to work on integrating base
backup into the streaming protocol as the biggest single thing that would
improve the built-in replication. All of the rest of the trivia about what
knobs to set and such are tiny details that make for only a minor
improvement until that's taken care of.

Yeah, I'm sure we all think it should be easier, but figuring out what that
means is certainly a moving target. The idea of being able to create a base
backup automagically sounds good, but comparatively it's
not significantly more difficult than what many other systems make you do,
and actually if done incorrectly could be something rather limiting. On the
whole the customers we are talking with are far more concerned about things
like managing failover scenarios when you have multiple slaves, and it's the
lack of capabilities around those kinds of things that hurt postgres
adoption much more than it being hard to set up.

Robert Treat
play: http://www.xzilla.net
work: http://omniti.com/is/hiring

#12Mark Kirkwood
mark.kirkwood@catalyst.net.nz
In reply to: Josh Berkus (#1)
Re: Simplifying replication

On 19/10/10 13:16, Josh Berkus wrote:

Robert asked me to write this up, so here it is.

It is critical that we make replication easier to set up, administrate
and monitor than it currently is. In my conversations with people,
this is more important to our users and the adoption of PostgreSQL
than synchronous replication is.

First, I'm finding myself constantly needing to tutor people on how to
set up replication. The mere fact that it requires a minimum 1-hour
class to explain how to use it, or a 10-page tutoral, tells us it's
too complex. As further evidence, Bruce and I explained binary
replication to several MySQL geeks at OpenSQLCamp last weekend, and
they were horrified at the number and complexity of the steps
required. As it currently is, binary replication is not going to win
us a lot of new users from the web development or virtualization world.

+1

I've been having the same experience: how to set this up and do failover
and failback occupies quite a bit of time in courses I've been teaching
here in NZ and Australia. Making this whole replication business much
simpler is definitely the way to go.

A good example of how simple it can be is mongodb, where it is
essentially one command to set up a two-replica system with a voting arbiter:

$ mongo
> rs.initiate(
    {
      _id : "replication_set0",
      members : [
        { _id : 0, host : "192.168.2.100" },
        { _id : 1, host : "192.168.2.101" },
        { _id : 2, host : "192.168.2.103", arbiterOnly : true }
      ]
    }
  )

#13Bruce Momjian
bruce@momjian.us
In reply to: Josh Berkus (#9)
Re: Simplifying replication

Josh Berkus wrote:

Greg,

The way things stand you *always* need archived logs. Even if you have
streaming set up it might try to use archived logs if it falls too far
behind.

Actually, you don't. If you're willing to accept possible
desynchronization and recloning of the standbys, then you can skip the
archive logs.

Agreed, but as a reality check: when I proposed that wal_keep_segments
= -1 would keep all WAL segments (for use while the file system was
being backed up), I was told administrators should compute how much free
disk space they had. Obviously ease of use is not our #1 priority.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +

#14Robert Haas
robertmhaas@gmail.com
In reply to: Bruce Momjian (#13)
Re: Simplifying replication

On Thu, Oct 21, 2010 at 8:22 PM, Bruce Momjian <bruce@momjian.us> wrote:

Josh Berkus wrote:

Greg,

The way things stand you *always* need archived logs. Even if you have
streaming set up it might try to use archived logs if it falls too far
behind.

Actually, you don't.  If you're willing to accept possible
desynchronization and recloning of the standbys, then you can skip the
archive logs.

Agreed, but as a reality check:  when I proposed that wal_keep_segments
= -1 would keep all WAL segments (for use while the file system was
being backed up), I was told administrators should compute how much free
disk space they had.  Obviously ease of use is not our #1 priority.

Amen.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#15Josh Berkus
josh@agliodbs.com
In reply to: Bruce Momjian (#13)
Re: Simplifying replication

Agreed, but as a reality check: when I proposed that wal_keep_segments
= -1 would keep all WAL segments (for use while the file system was
being backed up), I was told administrators should compute how much free
disk space they had. Obviously ease of use is not our #1 priority.

Depends. Running out of disk space isn't exactly user-friendly either.
And detecting how much free space is available would be a painful bit
of platform-dependent code ...

--
-- Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com

#16Greg Stark
gsstark@mit.edu
In reply to: Josh Berkus (#15)
Re: Simplifying replication

On Thu, Oct 21, 2010 at 5:46 PM, Josh Berkus <josh@agliodbs.com> wrote:

Agreed, but as a reality check:  when I proposed that wal_keep_segments
= -1 would keep all WAL segments (for use while the file system was
being backed up), I was told administrators should compute how much free
disk space they had.  Obviously ease of use is not our #1 priority.

Depends.  Running out of disk space isn't exactly user-friendly either.
 And detecting how much free space is available would be a painful bit
of platform-dependent code ...

Nor can we assume we're the only thing using disk space.

However the user-unfriendliness isn't the fact that administrators
need to determine how much disk they're willing to dedicate to
Postgres. The user-unfriendliness is that they then have to specify
this in terms of WAL log files and also have to know that we sometimes
keep more than that and so on.

We've done a good job in the past of converting GUC variables to
meaningful units for administrators and users but it's an ongoing
effort. If we need a GUC to control the amount of disk space we use it
should be in units of MB/GB/TB. If we need a GUC for controlling how
much WAL history to keep for recovering standbys or replicas then it
should be specified in units of time.

Units like "number of wal files" or worse in the case of
checkpoint_segments "number of wal files / 2 - 1" or something like
that.... are terrible. They require arcane knowledge for the
administrator to have a clue how to set.

--
greg

#17Robert Haas
robertmhaas@gmail.com
In reply to: Greg Stark (#16)
Re: Simplifying replication

On Thu, Oct 21, 2010 at 8:52 PM, Greg Stark <gsstark@mit.edu> wrote:

On Thu, Oct 21, 2010 at 5:46 PM, Josh Berkus <josh@agliodbs.com> wrote:

Agreed, but as a reality check:  when I proposed that wal_keep_segments
= -1 would keep all WAL segments (for use while the file system was
being backed up), I was told administrators should compute how much free
disk space they had.  Obviously ease of use is not our #1 priority.

Depends.  Running out of disk space isn't exactly user-friendly either.
 And detecting how much free space is available would be a painful bit
of platform-dependent code ...

Nor can we assume we're the only thing using disk space.

However the user-unfriendliness isn't the fact that administrators
need to determine how much disk they're willing to dedicate to
Postgres. The user-unfriendliness is that they then have to specify
this in terms of WAL log files and also have to know that we sometimes
keep more than that and so on.

We've done a good job in the past of converting GUC variables to
meaningful units for administrators and users but it's an ongoing
effort. If we need a GUC to control the amount of disk space we use it
should be in units of MB/GB/TB. If we need a GUC for controlling how
much WAL history to keep for recovering standbys or replicas then it
should be specified in units of time.

Units like "number of wal files" or worse in the case of
checkpoint_segments "number of wal files / 2 - 1" or something like
that.... are terrible. They require arcane knowledge for the
administrator to have a clue how to set.

Very true. But the lack of a -1 setting for wal_keep_segments means
that if you would like to take a backup without archiving, you must
set wal_keep_segments to a value greater than or equal to the rate at
which you generate WAL segments multiplied by the time it takes you to
run a backup. If that doesn't qualify as requiring arcane knowledge,
I'm mystified as to what ever could.
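Spelled out, the arithmetic described above looks something like this
(16MB is the standard WAL segment size; the generation rate and the
padding factor are made-up figures, and the padding is exactly the
problem: the rate is only ever an estimate):

```python
WAL_SEGMENT_MB = 16  # standard WAL segment size


def min_wal_keep_segments(wal_mb_per_minute, backup_minutes, slack=1.5):
    """Segments generated during the backup window, padded with slack
    because the WAL generation rate can only be estimated."""
    segments = (wal_mb_per_minute * backup_minutes) / WAL_SEGMENT_MB
    return int(segments * slack) + 1
```

For example, at 32MB of WAL per minute and a one-hour backup, you would
need wal_keep_segments of at least 181, and a bad estimate either wastes
disk or breaks the backup.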

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#18Bruce Momjian
bruce@momjian.us
In reply to: Robert Haas (#17)
Re: Simplifying replication

Robert Haas wrote:

However the user-unfriendliness isn't the fact that administrators
need to determine how much disk they're willing to dedicate to
Postgres. The user-unfriendliness is that they then have to specify
this in terms of WAL log files and also have to know that we sometimes
keep more than that and so on.

We've done a good job in the past of converting GUC variables to
meaningful units for administrators and users but it's an ongoing
effort. If we need a GUC to control the amount of disk space we use it
should be in units of MB/GB/TB. If we need a GUC for controlling how
much WAL history to keep for recovering standbys or replicas then it
should be specified in units of time.

Units like "number of wal files" or worse in the case of
checkpoint_segments "number of wal files / 2 - 1" or something like
that.... are terrible. They require arcane knowledge for the
administrator to have a clue how to set.

Very true. But the lack of a -1 setting for wal_keep_segments means
that if you would like to take a backup without archiving, you must
set wal_keep_segments to a value greater than or equal to the rate at
which you generate WAL segments multiplied by the time it takes you to
run a backup. If that doesn't qualify as requiring arcane knowledge,
I'm mystified as to what ever could.

LOL. Time machine required (both forward and backward time options).

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +

#19Josh Berkus
josh@agliodbs.com
In reply to: Bruce Momjian (#18)
Re: Simplifying replication

Very true. But the lack of a -1 setting for wal_keep_segments means
that if you would like to take a backup without archiving, you must
set wal_keep_segments to a value greater than or equal to the rate at
which you generate WAL segments multiplied by the time it takes you to
run a backup. If that doesn't qualify as requiring arcane knowledge,
I'm mystified as to what ever could.

Speaking of which, what's the relationship between checkpoint_segments
and wal_keep_segments? PG seems perfectly willing to let me set the
latter higher than the former, and it's not documented.

If checkpoint_segments were a hard limit, then we could let admins set
wal_keep_segments to -1, knowing that they'd set checkpoint_segments to
the max space they had available.

Although we might want to rename those.

--
-- Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com

#20Robert Haas
robertmhaas@gmail.com
In reply to: Josh Berkus (#19)
Re: Simplifying replication

On Thu, Oct 21, 2010 at 9:09 PM, Josh Berkus <josh@agliodbs.com> wrote:

Very true.  But the lack of a -1 setting for wal_keep_segments means
that if you would like to take a backup without archiving, you must
set wal_keep_segments to a value greater than or equal to the rate at
which you generate WAL segments multiplied by the time it takes you to
run a backup.  If that doesn't qualify as requiring arcane knowledge,
I'm mystified as to what ever could.

Speaking of which, what's the relationship between checkpoint_segments
and wal_keep_segments?  PG seems perfectly willing to let me set the
latter higher than the former, and it's not documented.

I think it's pretty well explained in the fine manual.

http://www.postgresql.org/docs/current/static/runtime-config-wal.html#GUC-WAL-KEEP-SEGMENTS
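As I read that page, the two settings interact as a simple maximum: the
server retains whichever is larger, the segments needed for crash
recovery (driven by checkpoint_segments) or wal_keep_segments. A sketch
of my reading, with an illustrative function name:

```python
# Approximate interaction of checkpoint_segments and wal_keep_segments,
# per the manual: old WAL files are recycled only once they are needed
# by neither crash recovery nor a lagging standby.

def retained_wal_files(checkpoint_segments, wal_keep_segments):
    """Approximate upper bound on old WAL files kept in pg_xlog."""
    crash_recovery_files = 2 * checkpoint_segments + 1
    return max(crash_recovery_files, wal_keep_segments + 1)


# Setting wal_keep_segments above checkpoint_segments simply raises
# the floor on retained files:
print(retained_wal_files(3, 32))
```

So the two are independent floors on retention, which is why the server
happily accepts wal_keep_segments higher than checkpoint_segments.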

If checkpoint_segments were a hard limit, then we could let admins set
wal_keep_segments to -1, knowing that they'd set checkpoint_segments to
the max space they had available.

This assumes that more checkpoint segments is always better, which
isn't true. I might have 100 GB of disk space free, but not want to
replay WAL for 4 days if I have a crash.

I do think that the current default of checkpoint_segments=3 is
pathologically insane, but that's another can of worms.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
