Feature request: pg_basebackup --force
Magnus, all:
It seems a bit annoying to have to do an rm -rf * $PGDATA/ before resynching a standby using pg_basebackup. This means that I still need to wrap basebackup in a shell script, instead of having it do everything for me ... especially if I have multiple tablespaces.
Couldn't we have a --force option which would clear all data and tablespace directories before resynching?
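For reference, the wrapper being described might look roughly like the sketch below. It is only an illustration of the manual steps, not a proposed implementation of --force: the function names, the host name and the `run` callback are made up for the example, the standby is assumed to be stopped already, and the tablespace directories have to be supplied by the caller.

```python
import shutil
from pathlib import Path


def clear_directory(path):
    """Remove everything inside `path` but keep the directory itself."""
    root = Path(path)
    for entry in root.iterdir():
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry)
        else:
            entry.unlink()  # regular files and symlinks (e.g. pg_tblspc links)


def basebackup_command(host, pgdata):
    """Build (not run) a plain-format pg_basebackup invocation."""
    return ["pg_basebackup", "-h", host, "-D", str(pgdata), "-F", "p", "-P"]


def force_resync(host, pgdata, tablespace_dirs, run):
    """Clear the data and tablespace directories, then take a base backup.

    `run` is whatever executes the command, e.g. subprocess.check_call.
    """
    for directory in [pgdata, *tablespace_dirs]:
        clear_directory(directory)
    run(basebackup_command(host, pgdata))
```

Passing `subprocess.check_call` as `run` would actually execute the backup; passing a list's `append` method just records the command, which is handy for checking what would be run.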
--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
San Francisco
On Sat, Apr 9, 2011 at 20:26, Joshua Berkus <josh@agliodbs.com> wrote:
Magnus, all:
It seems a bit annoying to have to do an rm -rf * $PGDATA/ before resynching a standby using pg_basebackup. This means that I still need to wrap basebackup in a shell script, instead of having it do everything for me ... especially if I have multiple tablespaces.
Couldn't we have a --force option which would clear all data and tablespace directories before resynching?
That could certainly be useful, yes. But I have a feeling whomever
tries to get that into 9.1 will be killed - but it's certainly good to
put on the list of things for 9.2.
--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/
Magnus,
That could certainly be useful, yes. But I have a feeling whomever
tries to get that into 9.1 will be killed - but it's certainly good to
put on the list of things for 9.2.
Oh, no question. At some point in 9.2 we should also discuss how basebackup considers "empty" directories. Because the other thing I find myself constantly scripting is replacing the conf files on the replica after the base backup sync.
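The conf-file scripting mentioned above usually amounts to something like the following sketch: stash the replica's own configuration before the sync and copy it back afterwards. The helper names and the list of files are assumptions for illustration; the set of files to keep depends on the installation.

```python
import shutil
import tempfile
from pathlib import Path

# Assumed set of files to keep; adjust for the installation.
CONF_FILES = ("postgresql.conf", "pg_hba.conf", "recovery.conf")


def stash_conf_files(pgdata, names=CONF_FILES):
    """Copy existing config files out of pgdata; return the stash directory."""
    stash = Path(tempfile.mkdtemp(prefix="pgconf-"))
    for name in names:
        source = Path(pgdata) / name
        if source.is_file():
            shutil.copy2(source, stash / name)
    return stash


def restore_conf_files(stash, pgdata):
    """Copy stashed files back, overwriting what the base backup brought over."""
    for saved in Path(stash).iterdir():
        shutil.copy2(saved, Path(pgdata) / saved.name)
```

The stash is taken before the data directory is cleared and restored once the base backup has finished, so the replica comes back up with its own settings rather than the master's.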
--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
San Francisco
On Sat, Apr 9, 2011 at 2:26 PM, Joshua Berkus <josh@agliodbs.com> wrote:
It seems a bit annoying to have to do an rm -rf * $PGDATA/ before resynching a standby using pg_basebackup. This means that I still need to wrap basebackup in a shell script, instead of having it do everything for me ... especially if I have multiple tablespaces.
Couldn't we have a --force option which would clear all data and tablespace directories before resynching?
What would be even more useful is some kind of support for
differential copy, a la rsync.
(Now I'm waiting for someone to tell me this is a pipe dream.)
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes:
On Sat, Apr 9, 2011 at 2:26 PM, Joshua Berkus <josh@agliodbs.com> wrote:
Couldn't we have a --force option which would clear all data and tablespace directories before resynching?
What would be even more useful is some kind of support for
differential copy, a la rsync.
(Now I'm waiting for someone to tell me this is a pipe dream.)
Not so much a pipe dream as reinventing the wheel. Why not use rsync?
regards, tom lane
On Sun, Apr 10, 2011 at 12:35 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Robert Haas <robertmhaas@gmail.com> writes:
On Sat, Apr 9, 2011 at 2:26 PM, Joshua Berkus <josh@agliodbs.com> wrote:
Couldn't we have a --force option which would clear all data and tablespace directories before resynching?
What would be even more useful is some kind of support for
differential copy, a la rsync.
(Now I'm waiting for someone to tell me this is a pipe dream.)
Not so much a pipe dream as reinventing the wheel. Why not use rsync?
It's not integrated and I doubt it's conveniently available on Windows.
One of the biggest problems with our replication functionality right
now is that it's hard to set up. We've actually done a good job
making the very simplest case (one slave, no archive) reasonably
simple, but how many PostgreSQL users do you think can manage to set
up SR + HS + archiving, with two slaves that can use the archive if
they fall too far behind the master, but with the archive regularly
trimmed to the farthest-back segment that is still needed?
We have pg_archivecleanup, but AIUI that's only smart enough to handle
the one-standby case.
Admittedly, the above is a slightly different problem, but I think it
all points in the direction of needing more automation and more ease
of use.
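To make the multi-standby trimming problem concrete, here is a rough sketch of the bookkeeping someone has to do by hand today: find the oldest WAL segment that any standby still needs and hand that to pg_archivecleanup. How each standby's oldest needed segment is obtained is left out, the standby names are invented, and the sketch assumes all segments are on the same timeline so that comparing file names as strings matches WAL order.

```python
def oldest_needed_segment(standby_segments):
    """Return the oldest WAL segment name any standby still needs.

    Assumes all names are on the same timeline, so that comparing the
    24-character hexadecimal file names as strings matches WAL order.
    """
    if not standby_segments:
        raise ValueError("no standby positions known; refusing to trim")
    return min(standby_segments.values())


def archivecleanup_command(archive_dir, standby_segments, dry_run=True):
    """Build a pg_archivecleanup command that keeps what every standby needs."""
    command = ["pg_archivecleanup"]
    if dry_run:
        command.append("-n")  # only list the files that would be removed
    command += [archive_dir, oldest_needed_segment(standby_segments)]
    return command
```

With two standbys at segments 000000010000000000000012 and 00000001000000000000000A, the command keeps everything from ...0A onward, so the standby that is further behind can still catch up from the archive.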
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Sun, Apr 10, 2011 at 12:41 PM, Robert Haas <robertmhaas@gmail.com> wrote:
It's not integrated and I doubt it's conveniently available on Windows.
One of the biggest problems with our replication functionality right
now is that it's hard to set up. We've actually done a good job
making the very simplest case (one slave, no archive) reasonably
simple, but how many PostgreSQL users do you think can manage to set
up SR + HS + archiving, with two slaves that can use the archive if
they fall too far behind the master, but with the archive regularly
trimmed to the farthest-back segment that is still needed?
We have pg_archivecleanup, but AIUI that's only smart enough to handle
the one-standby case.
Admittedly, the above is a slightly different problem, but I think it
all points in the direction of needing more automation and more ease
of use.
And let me also note that the difficulty of getting this all exactly
right is one of the things that causes people to come up with creative
solutions like this:
http://archives.postgresql.org/pgsql-hackers/2010-12/msg02514.php
That's why we need to put it in a box, tie a bow around it, and put up
a big sign that says "do not look into laser with remaining eye".
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 10.04.2011 20:06, Robert Haas wrote:
On Sun, Apr 10, 2011 at 12:41 PM, Robert Haas <robertmhaas@gmail.com> wrote:
Admittedly, the above is a slightly different problem, but I think it
all points in the direction of needing more automation and more ease
of use.
And let me also note that the difficulty of getting this all exactly
right is one of the things that causes people to come up with creative
solutions like this:
http://archives.postgresql.org/pgsql-hackers/2010-12/msg02514.php
That's why we need to put it in a box, tie a bow around it, and put up
a big sign that says "do not look into laser with remaining eye".
That's exactly what pg_basebackup does. Once you move into more
complicated scenarios with multiple standbys and WAL archiving, it's
inevitably going to be more complicated to set up.
That doesn't mean that we can't make it easier - we can and we should -
but I don't think the common complaint that replication is hard to set
up is true anymore.
--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote:
That's exactly what pg_basebackup does. Once you move into more
complicated scenarios with multiple standbys and WAL archiving,
it's inevitably going to be more complicated to set up.
That doesn't mean that we can't make it easier - we can and we
should - but I don't think the common complaint that replication
is hard to set up is true anymore.
Getting back to the rsync-like behavior, which is what led the
conversation in this direction, I think -- the point of that seemed
to be to allow similar ease of use for those activating a replicated
node as the master, without requiring that the entire data directory
be sent over a slow WAN or Internet path when the delta needed to
modify what was already at the remote end to match the new master
might be orders of magnitude less data than that.
The intelligence to support that would be a fraction of what is in
rsync. In fact, since we might want to ignore hint bit differences
where possible, rsync might not work nearly as well as a home-grown
solution.
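As a rough illustration of how little is needed for the basic case, the sketch below compares two copies of a file block by block and reports which blocks would have to be sent. It is an assumption-laden toy: the function names are invented, it uses the default 8 kB page size, it does not handle a file that has shrunk, and it does not attempt to ignore hint bit differences, which would require looking inside each page.

```python
import hashlib

BLOCK_SIZE = 8192  # PostgreSQL's default page size


def block_digests(data, block_size=BLOCK_SIZE):
    """Return one SHA-256 digest per fixed-size block of `data`."""
    return [
        hashlib.sha256(data[offset:offset + block_size]).digest()
        for offset in range(0, len(data), block_size)
    ]


def changed_blocks(local_digests, remote_digests):
    """Return the block numbers the remote side must receive."""
    changed = []
    for number, digest in enumerate(local_digests):
        if number >= len(remote_digests) or remote_digests[number] != digest:
            changed.append(number)
    return changed
```

Only the digests and the changed blocks would need to cross the slow link, rather than the whole data directory.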
-Kevin