Feature request: pg_basebackup --force

Started by Joshua Berkus, almost 15 years ago, 9 messages

josh@agliodbs.com

almost 15 years ago

Magnus, all:

It seems a bit annoying to have to do an rm -rf $PGDATA/* before resynching a standby using pg_basebackup. This means that I still need to wrap basebackup in a shell script, instead of having it do everything for me ... especially if I have multiple tablespaces.

Couldn't we have a --force option which would clear all data and tablespace directories before resynching?
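Until something like --force exists, the wrapper Josh describes might look roughly like the Python sketch below. It is an illustration under stated assumptions, not a proposed implementation. The function names, the host name, and the idea of passing the tablespace directories in explicitly are all hypothetical. Only the `pg_basebackup -h HOST -D DIR` invocation is a real command line. Each directory is emptied rather than deleted so that its ownership and permissions survive, because pg_basebackup wants an empty (or nonexistent) target. Symlinks, such as the ones under pg_tblspc, are unlinked without being followed.

```python
import shutil
import subprocess
from pathlib import Path


def empty_directory(path: Path) -> None:
    """Remove everything inside `path` but keep the directory itself."""
    for entry in path.iterdir():
        if entry.is_dir() and not entry.is_symlink():
            # Real subdirectory: delete it recursively (rmtree does not
            # follow symlinks found inside the tree).
            shutil.rmtree(entry)
        else:
            # Plain files and symlinks are unlinked; a symlink's target
            # is left untouched.
            entry.unlink()


def resync_standby(pgdata: Path, tablespace_dirs: list[Path],
                   primary_host: str, run: bool = True) -> list[str]:
    """Hypothetical stand-in for the proposed --force: empty the data
    directory and every tablespace directory, then take a fresh base backup."""
    for d in [pgdata, *tablespace_dirs]:
        if d.exists():
            empty_directory(d)
    cmd = ["pg_basebackup", "-h", primary_host, "-D", str(pgdata)]
    if run:
        # Raises CalledProcessError if pg_basebackup fails.
        subprocess.run(cmd, check=True)
    return cmd
```

With `run=False` the directories are still emptied, but pg_basebackup is not launched. The command line is returned instead, so another tool can run it.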

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
San Francisco


Magnus Hagander

magnus@hagander.net

almost 15 years ago

In reply to: Joshua Berkus (#1)

Re: Feature request: pg_basebackup --force

On Sat, Apr 9, 2011 at 20:26, Joshua Berkus <josh@agliodbs.com> wrote:

> Magnus, all:
>
> It seems a bit annoying to have to do an rm -rf $PGDATA/* before resynching a standby using pg_basebackup. This means that I still need to wrap basebackup in a shell script, instead of having it do everything for me ... especially if I have multiple tablespaces.
>
> Couldn't we have a --force option which would clear all data and tablespace directories before resynching?

That could certainly be useful, yes. But I have a feeling whoever
tries to get that into 9.1 will be killed - but it's certainly good to
put on the list of things for 9.2.

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/

Joshua Berkus

josh@agliodbs.com

almost 15 years ago

In reply to: Magnus Hagander (#2)

Re: Feature request: pg_basebackup --force

Magnus,

> That could certainly be useful, yes. But I have a feeling whoever
> tries to get that into 9.1 will be killed - but it's certainly good to
> put on the list of things for 9.2.

Oh, no question. At some point in 9.2 we should also discuss how basebackup treats "empty" directories, because the other thing I find myself constantly scripting is replacing the conf files on the replica after the base backup sync.
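Here is a minimal sketch of the conf-file juggling Josh mentions. It assumes the standby keeps its own postgresql.conf, pg_hba.conf and recovery.conf (the 9.x-era standby settings file) inside the data directory. The file list and function names are illustrative and are not anything pg_basebackup provides. The files are stashed before the directory is emptied, then copied back over whatever the base backup brought from the primary.

```python
import shutil
from pathlib import Path

# Hypothetical list of standby-specific files worth keeping across a resync.
PRESERVED_FILES = ["postgresql.conf", "pg_hba.conf", "recovery.conf"]


def save_conf_files(pgdata: Path, stash: Path) -> list[str]:
    """Copy the standby's own config files out of pgdata before it is wiped.
    Returns the names that actually existed and were saved."""
    saved = []
    for name in PRESERVED_FILES:
        src = pgdata / name
        if src.is_file():
            shutil.copy2(src, stash / name)  # copy2 keeps timestamps/mode
            saved.append(name)
    return saved


def restore_conf_files(pgdata: Path, stash: Path, names: list[str]) -> None:
    """Put the saved files back, overwriting the primary's copies that the
    base backup brought over."""
    for name in names:
        shutil.copy2(stash / name, pgdata / name)
```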

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
San Francisco

Robert Haas

robertmhaas@gmail.com

almost 15 years ago

In reply to: Joshua Berkus (#1)

Re: Feature request: pg_basebackup --force

On Sat, Apr 9, 2011 at 2:26 PM, Joshua Berkus <josh@agliodbs.com> wrote:

> It seems a bit annoying to have to do an rm -rf $PGDATA/* before resynching a standby using pg_basebackup. This means that I still need to wrap basebackup in a shell script, instead of having it do everything for me ... especially if I have multiple tablespaces.
>
> Couldn't we have a --force option which would clear all data and tablespace directories before resynching?

What would be even more useful is some kind of support for
differential copy, a la rsync.

(Now I'm waiting for someone to tell me this is a pipe dream.)
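Robert's differential-copy idea can be sketched at the page level. Relation files are sequences of fixed-size pages that are updated in place, so this sketch hashes each block on both sides and sends only the blocks whose digests differ. Real rsync is more general: it uses a rolling checksum so it can also find matching data at shifted offsets. The 8192-byte block size is PostgreSQL's default BLCKSZ, a compile-time setting. The function names are hypothetical. A plain byte comparison like this would also flag pages that differ only in hint bits, which is the wrinkle Kevin raises later in the thread.

```python
import hashlib
from pathlib import Path

# PostgreSQL's default page size (BLCKSZ); assumed, since it is configurable
# at build time.
BLOCK_SIZE = 8192


def block_digests(path: Path, block_size: int = BLOCK_SIZE) -> list[bytes]:
    """Hash a file one fixed-size block at a time (last block may be short)."""
    digests = []
    with path.open("rb") as f:
        while chunk := f.read(block_size):
            digests.append(hashlib.sha256(chunk).digest())
    return digests


def blocks_to_send(source: list[bytes], target: list[bytes]) -> list[int]:
    """Indices of source blocks that the target lacks or holds a different
    version of. Only these would need to cross the network; if the target
    is longer than the source, it would simply be truncated."""
    return [i for i, d in enumerate(source)
            if i >= len(target) or target[i] != d]
```

In a real protocol, the standby would compute its digests locally and send only those to the primary. The primary would reply with just the blocks listed by `blocks_to_send`.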

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Tom Lane

tgl@sss.pgh.pa.us

almost 15 years ago

In reply to: Robert Haas (#4)

Re: Feature request: pg_basebackup --force

Robert Haas <robertmhaas@gmail.com> writes:

> On Sat, Apr 9, 2011 at 2:26 PM, Joshua Berkus <josh@agliodbs.com> wrote:
>
>> Couldn't we have a --force option which would clear all data and tablespace directories before resynching?
>
> What would be even more useful is some kind of support for
> differential copy, a la rsync.
>
> (Now I'm waiting for someone to tell me this is a pipe dream.)

Not so much a pipe dream as reinventing the wheel. Why not use rsync?

regards, tom lane

Robert Haas

robertmhaas@gmail.com

almost 15 years ago

In reply to: Tom Lane (#5)

Re: Feature request: pg_basebackup --force

On Sun, Apr 10, 2011 at 12:35 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

> Robert Haas <robertmhaas@gmail.com> writes:
>
>> On Sat, Apr 9, 2011 at 2:26 PM, Joshua Berkus <josh@agliodbs.com> wrote:
>>
>>> Couldn't we have a --force option which would clear all data and tablespace directories before resynching?
>>
>> What would be even more useful is some kind of support for
>> differential copy, a la rsync.
>>
>> (Now I'm waiting for someone to tell me this is a pipe dream.)
>
> Not so much a pipe dream as reinventing the wheel. Why not use rsync?

It's not integrated and I doubt it's conveniently available on Windows.

One of the biggest problems with our replication functionality right
now is that it's hard to set up. We've actually done a good job
making the very simplest case (one slave, no archive) reasonably
simple, but how many PostgreSQL users do you think can manage to set
up SR + HS + archiving, with two slaves that can use the archive if
they fall too far behind the master, but with the archive regularly
trimmed to the farthest-back segment that is still needed?

We have pg_archivecleanup, but AIUI that's only smart enough to handle
the one-standby case.

Admittedly, the above is a slightly different problem, but I think it
all points in the direction of needing more automation and more ease
of use.
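For the multi-standby archiving case Robert describes, the missing piece is choosing the cutoff. The archive may only be trimmed up to the oldest segment that any standby still needs. The hypothetical sketch below assumes you can somehow learn each standby's oldest required WAL file name; how to obtain it is left out. It picks the minimum and lists which archived segments fall before it. Like pg_archivecleanup, it ignores the 8-digit timeline prefix when comparing names, but it is only a rough model of that tool's behavior, and all names here are illustrative.

```python
import re

# A WAL segment file name is 24 uppercase hex digits:
# timeline (8) + log (8) + segment (8).
WAL_NAME = re.compile(r"^[0-9A-F]{24}$")


def oldest_needed_segment(needed_by_standby: dict[str, str]) -> str:
    """Return the oldest WAL file still needed by any standby, comparing
    only the log/segment part (characters 8 onward), not the timeline."""
    return min(needed_by_standby.values(), key=lambda name: name[8:])


def removable(archive_files: list[str], oldest_kept: str) -> list[str]:
    """Archived WAL segments strictly older than oldest_kept. Anything that
    is not a plain segment name (e.g. timeline .history files) is kept."""
    return sorted(f for f in archive_files
                  if WAL_NAME.match(f) and f[8:] < oldest_kept[8:])
```

The chosen name is what one would pass as the last argument of `pg_archivecleanup ARCHIVE_DIR OLDEST_KEPT_WAL_FILE`. The command would be run from one place that knows about every standby, rather than from each standby's archive_cleanup_command.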

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Robert Haas

robertmhaas@gmail.com

almost 15 years ago

In reply to: Robert Haas (#6)

Re: Feature request: pg_basebackup --force

On Sun, Apr 10, 2011 at 12:41 PM, Robert Haas <robertmhaas@gmail.com> wrote:

> It's not integrated and I doubt it's conveniently available on Windows.
>
> One of the biggest problems with our replication functionality right
> now is that it's hard to set up. We've actually done a good job
> making the very simplest case (one slave, no archive) reasonably
> simple, but how many PostgreSQL users do you think can manage to set
> up SR + HS + archiving, with two slaves that can use the archive if
> they fall too far behind the master, but with the archive regularly
> trimmed to the farthest-back segment that is still needed?
>
> We have pg_archivecleanup, but AIUI that's only smart enough to handle
> the one-standby case.
>
> Admittedly, the above is a slightly different problem, but I think it
> all points in the direction of needing more automation and more ease
> of use.

And let me also note that the difficulty of getting this all exactly
right is one of the things that causes people to come up with creative
solutions like this:

http://archives.postgresql.org/pgsql-hackers/2010-12/msg02514.php

That's why we need to put it in a box, tie a bow around it, and put up
a big sign that says "do not look into laser with remaining eye".

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Heikki Linnakangas

heikki.linnakangas@enterprisedb.com

almost 15 years ago

In reply to: Robert Haas (#7)

Re: Feature request: pg_basebackup --force

On 10.04.2011 20:06, Robert Haas wrote:

> On Sun, Apr 10, 2011 at 12:41 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>
>> Admittedly, the above is a slightly different problem, but I think it
>> all points in the direction of needing more automation and more ease
>> of use.
>
> And let me also note that the difficulty of getting this all exactly
> right is one of the things that causes people to come up with creative
> solutions like this:
>
> http://archives.postgresql.org/pgsql-hackers/2010-12/msg02514.php
>
> That's why we need to put it in a box, tie a bow around it, and put up
> a big sign that says "do not look into laser with remaining eye".

That's exactly what pg_basebackup does. Once you move into more
complicated scenarios with multiple standbys and WAL archiving, it's
inevitably going to be more complicated to set up.

That doesn't mean that we can't make it easier - we can and we should -
but I don't think the common complaint that replication is hard to set
up is true anymore.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

Kevin Grittner

Kevin.Grittner@wicourts.gov

almost 15 years ago

In reply to: Heikki Linnakangas (#8)

Re: Feature request: pg_basebackup --force

Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote:

> That's exactly what pg_basebackup does. Once you move into more
> complicated scenarios with multiple standbys and WAL archiving,
> it's inevitably going to be more complicated to set up.
>
> That doesn't mean that we can't make it easier - we can and we
> should - but I don't think the common complaint that replication
> is hard to set up is true anymore.

Getting back to the rsync-like behavior, which is what led the
conversation in this direction, I think -- the point of that seemed
to be to allow similar ease of use for those activating a replicated
node as the master, without requiring that the entire data directory
be sent over a slow WAN or Internet path when the delta needed to
modify what was already at the remote end to match the new master
might be orders of magnitude less data than that.

The intelligence to support that would be a fraction of what is in
rsync. In fact, since we might want to ignore hint bit differences
where possible, rsync might not work nearly as well as a home-grown
solution.

-Kevin