Streaming a base backup from master

Started by Heikki Linnakangas over 15 years ago. 41 messages. hackers.
#1 Heikki Linnakangas
heikki.linnakangas@enterprisedb.com

It's been discussed before that it would be cool if you could stream a
new base backup from the master server, via libpq. That way you would
not need low-level filesystem access to initialize a new standby.

Magnus mentioned today that he started hacking on that, and
coincidentally I just started experimenting with it yesterday as well
:-). So let's get this out on the mailing list.

Here's a WIP patch. It adds a new "TAKE_BACKUP" command to the
replication command set. Upon receiving that command, the master starts
a COPY, and streams a tarred copy of the data directory to the client.
The patch includes a simple command-line tool, pg_streambackup, to
connect to a server and request a backup that you can then redirect to a
.tar file or pipe to "tar x".

TODO:

* We need a smarter way to do pg_start/stop_backup() with this. At the
moment, you can only have one backup running at a time, but we shouldn't
have that limitation with this built-in mechanism.

* The streamed backup archive should contain all the necessary WAL files
too, so that you don't need to set up archiving to use this. You could
just point the tiny client tool to the server, and get a backup archive
containing everything that's necessary to restore correctly.
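(For illustration only, and in Python rather than the patch's C: the tar-stream round trip in miniature. The file names below are hypothetical; the real thing streams the archive over a libpq COPY rather than through memory.)

```python
import io
import tarfile

def make_tar_stream(files):
    """Pack a {name: bytes} mapping into an in-memory tar archive,
    standing in for the server side writing tar data into COPY."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()

def extract_tar_stream(blob):
    """Unpack a tar byte stream back into {name: bytes},
    standing in for the client piping to "tar x"."""
    files = {}
    with tarfile.open(fileobj=io.BytesIO(blob), mode="r") as tar:
        for member in tar.getmembers():
            files[member.name] = tar.extractfile(member).read()
    return files

backup = make_tar_stream({"global/pg_control": b"\x00" * 16,
                          "base/1/1259": b"catalog data"})
restored = extract_tar_stream(backup)
```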

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

Attachments:

basebackup-1.patch (text/x-diff, +488/-2)
#2 Thom Brown
thom@linux.com
In reply to: Heikki Linnakangas (#1)
Re: Streaming a base backup from master

On 3 September 2010 12:19, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:

TODO:

* We need a smarter way to do pg_start/stop_backup() with this. At the
moment, you can only have one backup running at a time, but we shouldn't
have that limitation with this built-in mechanism.

Would it be possible to not require pg_start/stop_backup() for this
new feature? (yes, I'm probably missing something obvious here)

--
Thom Brown
Twitter: @darkixion
IRC (freenode): dark_ixion
Registered Linux user: #516935

#3 Dave Page
dpage@pgadmin.org
In reply to: Heikki Linnakangas (#1)
Re: Streaming a base backup from master

On Fri, Sep 3, 2010 at 12:19 PM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:

Here's a WIP patch. It adds a new "TAKE_BACKUP" command to the replication
command set. Upon receiving that command, the master starts a COPY, and
streams a tarred copy of the data directory to the client. The patch
includes a simple command-line tool, pg_streambackup, to connect to a server
and request a backup that you can then redirect to a .tar file or pipe to
"tar x".

Cool. Can you add a TODO to build in code to un-tar the archive? tar
is not usually found on Windows systems, and as we already have tar
extraction code in pg_restore it could presumably be added relatively
painlessly.

--
Dave Page
Blog: http://pgsnake.blogspot.com
Twitter: @pgsnake

EnterpriseDB UK: http://www.enterprisedb.com
The Enterprise Postgres Company

#4 Magnus Hagander
magnus@hagander.net
In reply to: Heikki Linnakangas (#1)
Re: Streaming a base backup from master

On Fri, Sep 3, 2010 at 13:19, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:

It's been discussed before that it would be cool if you could stream a new
base backup from the master server, via libpq. That way you would not need
low-level filesystem access to initialize a new standby.

Magnus mentioned today that he started hacking on that, and coincidentally I
just started experimenting with it yesterday as well :-). So let's get this
out on the mailing list.

Here's a WIP patch. It adds a new "TAKE_BACKUP" command to the replication
command set. Upon receiving that command, the master starts a COPY, and
streams a tarred copy of the data directory to the client. The patch
includes a simple command-line tool, pg_streambackup, to connect to a server
and request a backup that you can then redirect to a .tar file or pipe to
"tar x".

TODO:

* We need a smarter way to do pg_start/stop_backup() with this. At the
moment, you can only have one backup running at a time, but we shouldn't
have that limitation with this built-in mechanism.

* The streamed backup archive should contain all the necessary WAL files
too, so that you don't need to set up archiving to use this. You could just
point the tiny client tool to the server, and get a backup archive
containing everything that's necessary to restore correctly.

For this last point, this should of course be *optional*, but it would
be very good to have that option (and probably on by default).

A couple of quick comments on points where this differs from the code I
have :-) We chatted about some of it already, but it should be written
down for others...

* It should be possible to pass the backup label through, not just
hardcode it to basebackup

* Needs support for tablespaces. We should either follow the symlinks
and pick up the files, or throw an error if tablespaces are present. Silently
delivering an incomplete backup is not a good thing :-)

* Is there a point in adapting the chunk size to the size of the libpq buffers?

FWIW, my implementation was as a user-defined function, which has the
advantage it can run on 9.0. But most likely this code can be ripped
out and provided as a separate backport project for 9.0 if necessary -
no need to have separate codebases.

Other than that, our code is remarkably similar.

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

#5 Magnus Hagander
magnus@hagander.net
In reply to: Thom Brown (#2)
Re: Streaming a base backup from master

On Fri, Sep 3, 2010 at 13:25, Thom Brown <thom@linux.com> wrote:

On 3 September 2010 12:19, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:

TODO:

* We need a smarter way to do pg_start/stop_backup() with this. At the
moment, you can only have one backup running at a time, but we shouldn't
have that limitation with this built-in mechanism.

Would it be possible to not require pg_start/stop_backup() for this
new feature? (yes, I'm probably missing something obvious here)

You don't need to run it *manually*, but the process needs to run it
automatically in the background for you. Which it does already in the
suggested patch.

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

#6 Thom Brown
thom@linux.com
In reply to: Magnus Hagander (#5)
Re: Streaming a base backup from master

On 3 September 2010 12:30, Magnus Hagander <magnus@hagander.net> wrote:

On Fri, Sep 3, 2010 at 13:25, Thom Brown <thom@linux.com> wrote:

On 3 September 2010 12:19, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:

TODO:

* We need a smarter way to do pg_start/stop_backup() with this. At the
moment, you can only have one backup running at a time, but we shouldn't
have that limitation with this built-in mechanism.

Would it be possible to not require pg_start/stop_backup() for this
new feature? (yes, I'm probably missing something obvious here)

You don't need to run it *manually*, but the process needs to run it
automatically in the background for you. Which it does already in the
suggested patch.

Ah, clearly I didn't read the patch in any detail. Thanks :)

--
Thom Brown
Twitter: @darkixion
IRC (freenode): dark_ixion
Registered Linux user: #516935

#7 Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Thom Brown (#2)
Re: Streaming a base backup from master

On 03/09/10 14:25, Thom Brown wrote:

On 3 September 2010 12:19, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:

TODO:

* We need a smarter way to do pg_start/stop_backup() with this. At the
moment, you can only have one backup running at a time, but we shouldn't
have that limitation with this built-in mechanism.

Would it be possible to not require pg_start/stop_backup() for this
new feature? (yes, I'm probably missing something obvious here)

Well, pg_start_backup() does several things:

1. It sets the forceFullPageWrites flag, so that we don't get partial
pages in the restored database.
2. It performs a checkpoint
3. It creates a backup label file

We certainly need 1 and 2. We don't necessarily need to write the backup
label file to the data directory when we're streaming the backup
directly to the client; we can just include it in the streamed archive.

pg_stop_backup() also does several things:
1. It clears the forceFullPageWrites flag.
2. It writes an end-of-backup WAL record
3. It switches to new WAL segment, to get the final WAL segment archived.
4. It writes a backup history file
5. It removes the backup label file.
6. It waits for all the required WAL files to be archived.

We need 1, but the rest we could do in a smarter way. When we have more
control of the backup process, I don't think we need the end-of-backup
WAL record or the backup label anymore. We can add the pg_control file
as the last file in the archive, and set minRecoveryPoint in it to the
last WAL record needed to recover.

So no, we don't really need pg_start/stop_backup() per se, but we'll
need something similar...

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

#8 Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Dave Page (#3)
Re: Streaming a base backup from master

On 03/09/10 14:28, Dave Page wrote:

On Fri, Sep 3, 2010 at 12:19 PM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:

Here's a WIP patch. It adds a new "TAKE_BACKUP" command to the replication
command set. Upon receiving that command, the master starts a COPY, and
streams a tarred copy of the data directory to the client. The patch
includes a simple command-line tool, pg_streambackup, to connect to a server
and request a backup that you can then redirect to a .tar file or pipe to
"tar x".

Cool. Can you add a TODO to build in code to un-tar the archive? tar
is not usually found on Windows systems, and as we already have tar
extraction code in pg_restore it could presumably be added relatively
painlessly.

Ok. Another obvious thing that people will want is to gzip the tar file
while sending it, to reduce network traffic.
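(As a back-of-the-envelope illustration, in Python rather than the zlib-in-C a server-side implementation would presumably use: compression is just a filter on the byte stream, so the archive contents are unchanged on the other end.)

```python
import gzip

# Sketch: the sender compresses the tar byte stream, the receiver
# decompresses it; what comes out is byte-identical to what went in.
payload = b"base backup bytes, highly repetitive " * 1024

compressed = gzip.compress(payload)
restored = gzip.decompress(compressed)

assert restored == payload
assert len(compressed) < len(payload)  # repetitive data compresses well
```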

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

#9 Magnus Hagander
magnus@hagander.net
In reply to: Heikki Linnakangas (#8)
Re: Streaming a base backup from master

On Fri, Sep 3, 2010 at 13:48, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:

On 03/09/10 14:28, Dave Page wrote:

On Fri, Sep 3, 2010 at 12:19 PM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com>  wrote:

Here's a WIP patch. It adds a new "TAKE_BACKUP" command to the replication
command set. Upon receiving that command, the master starts a COPY, and
streams a tarred copy of the data directory to the client. The patch
includes a simple command-line tool, pg_streambackup, to connect to a server
and request a backup that you can then redirect to a .tar file or pipe to
"tar x".

Cool. Can you add a TODO to build in code to un-tar the archive? tar
is not usually found on Windows systems, and as we already have tar
extraction code in pg_restore it could presumably be added relatively
painlessly.

Ok. Another obvious thing that people will want is to gzip the tar file
while sending it, to reduce network traffic.

Not necessarily obvious, needs to be configurable. There are a lot of
cases where you might not want it.

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

#10 Greg Stark
In reply to: Heikki Linnakangas (#1)
Re: Streaming a base backup from master

On Fri, Sep 3, 2010 at 12:19 PM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:

* We need a smarter way to do pg_start/stop_backup() with this. At the
moment, you can only have one backup running at a time, but we shouldn't
have that limitation with this built-in mechanism.

Well, there's no particular reason we couldn't support having multiple
pg_start_backup() calls pending either. It's just not something
people have usually needed so far.

--
greg

#11 Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Greg Stark (#10)
Re: Streaming a base backup from master

On 03/09/10 15:16, Greg Stark wrote:

On Fri, Sep 3, 2010 at 12:19 PM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:

* We need a smarter way to do pg_start/stop_backup() with this. At the
moment, you can only have one backup running at a time, but we shouldn't
have that limitation with this built-in mechanism.

Well, there's no particular reason we couldn't support having multiple
pg_start_backup() calls pending either. It's just not something
people have usually needed so far.

The backup label file makes that hard. There can be only one at a time.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

#12 Robert Haas
robertmhaas@gmail.com
In reply to: Dave Page (#3)
Re: Streaming a base backup from master

On Fri, Sep 3, 2010 at 7:28 AM, Dave Page <dpage@pgadmin.org> wrote:

On Fri, Sep 3, 2010 at 12:19 PM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:

Here's a WIP patch. It adds a new "TAKE_BACKUP" command to the replication
command set. Upon receiving that command, the master starts a COPY, and
streams a tarred copy of the data directory to the client. The patch
includes a simple command-line tool, pg_streambackup, to connect to a server
and request a backup that you can then redirect to a .tar file or pipe to
"tar x".

Cool. Can you add a TODO to build in code to un-tar the archive? tar
is not usually found on Windows systems, and as we already have tar
extraction code in pg_restore it could presumably be added relatively
painlessly.

It seems like the elephant in the room here is updating an existing
backup without recopying the entire data directory. Perhaps that's
phase two, but worth keeping in mind...

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

#13 Magnus Hagander
magnus@hagander.net
In reply to: Robert Haas (#12)
Re: Streaming a base backup from master

On Fri, Sep 3, 2010 at 15:24, Robert Haas <robertmhaas@gmail.com> wrote:

On Fri, Sep 3, 2010 at 7:28 AM, Dave Page <dpage@pgadmin.org> wrote:

On Fri, Sep 3, 2010 at 12:19 PM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:

Here's a WIP patch. It adds a new "TAKE_BACKUP" command to the replication
command set. Upon receiving that command, the master starts a COPY, and
streams a tarred copy of the data directory to the client. The patch
includes a simple command-line tool, pg_streambackup, to connect to a server
and request a backup that you can then redirect to a .tar file or pipe to
"tar x".

Cool. Can you add a TODO to build in code to un-tar the archive? tar
is not usually found on Windows systems, and as we already have tar
extraction code in pg_restore it could presumably be added relatively
painlessly.

It seems like the elephant in the room here is updating an existing
backup without recopying the entire data directory.  Perhaps that's
phase two, but worth keeping in mind...

I'd say that's a very different use-case, but still a very useful one
of course. It's probably going to be a lot more complex (it would
require bi-directional traffic, I think)...

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

#14 Dave Page
dpage@pgadmin.org
In reply to: Robert Haas (#12)
Re: Streaming a base backup from master

On Fri, Sep 3, 2010 at 2:24 PM, Robert Haas <robertmhaas@gmail.com> wrote:

On Fri, Sep 3, 2010 at 7:28 AM, Dave Page <dpage@pgadmin.org> wrote:

On Fri, Sep 3, 2010 at 12:19 PM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:

Here's a WIP patch. It adds a new "TAKE_BACKUP" command to the replication
command set. Upon receiving that command, the master starts a COPY, and
streams a tarred copy of the data directory to the client. The patch
includes a simple command-line tool, pg_streambackup, to connect to a server
and request a backup that you can then redirect to a .tar file or pipe to
"tar x".

Cool. Can you add a TODO to build in code to un-tar the archive? tar
is not usually found on Windows systems, and as we already have tar
extraction code in pg_restore it could presumably be added relatively
painlessly.

It seems like the elephant in the room here is updating an existing
backup without recopying the entire data directory.  Perhaps that's
phase two, but worth keeping in mind...

rsync? Might be easier to use that from day 1 (well, day 2) than to
retrofit later.

--
Dave Page
Blog: http://pgsnake.blogspot.com
Twitter: @pgsnake

EnterpriseDB UK: http://www.enterprisedb.com
The Enterprise Postgres Company

#15 Robert Haas
robertmhaas@gmail.com
In reply to: Dave Page (#14)
Re: Streaming a base backup from master

On Fri, Sep 3, 2010 at 9:26 AM, Dave Page <dpage@pgadmin.org> wrote:

rsync? Might be easier to use that from day 1 (well, day 2) than to
retrofit later.

I'm not sure we want to depend on an external utility like that,
particularly one that users may not have installed. And I'm not sure
if that can be made to work over a libpq channel, either. But
certainly something with that functionality would be nice to have,
whether it ends up sharing code or not.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

#16 Dave Page
dpage@pgadmin.org
In reply to: Robert Haas (#15)
Re: Streaming a base backup from master

On Fri, Sep 3, 2010 at 2:29 PM, Robert Haas <robertmhaas@gmail.com> wrote:

On Fri, Sep 3, 2010 at 9:26 AM, Dave Page <dpage@pgadmin.org> wrote:

rsync? Might be easier to use that from day 1 (well, day 2) than to
retrofit later.

I'm not sure we want to depend on an external utility like that,
particularly one that users may not have installed.  And I'm not sure
if that can be made to work over a libpq channel, either.  But
certainly something with that functionality would be nice to have,
whether it ends up sharing code or not.

No, I agree we don't want an external dependency (I was just bleating
about needing tar on Windows). I was assuming/hoping there's a
librsync somewhere...

--
Dave Page
Blog: http://pgsnake.blogspot.com
Twitter: @pgsnake

EnterpriseDB UK: http://www.enterprisedb.com
The Enterprise Postgres Company

#17 Robert Haas
robertmhaas@gmail.com
In reply to: Dave Page (#16)
Re: Streaming a base backup from master

On Fri, Sep 3, 2010 at 9:32 AM, Dave Page <dpage@pgadmin.org> wrote:

No, I agree we don't want an external dependency (I was just bleating
about needing tar on Windows). I was assuming/hoping there's a
librsync somewhere...

The rsync code itself is not modular, I believe. I think the author
thereof kind of took the approach of placing efficiency before all.
See:

http://www.samba.org/rsync/how-rsync-works.html ... especially the
section on "The Rsync Protocol"

I Googled librsync and got a hit, but that code is a rewrite of the
source base and seems to have little or no activity since 2004.

http://librsync.sourceforge.net/

That page says: "librsync is not wire-compatible with rsync 2.x, and
is not likely to be in the future." The current version of rsync is
3.0.7.
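(For the curious, the weak rolling checksum at the heart of that protocol is easy to sketch. This is a toy Python version of the Adler-32-style sum described on that page; the names are mine, not rsync's.)

```python
# Toy sketch of rsync's weak rolling checksum. The point is that
# sliding the window one byte to the right costs O(1), which is what
# lets rsync look for matching blocks at *any* byte offset cheaply.

def weak_checksum(block):
    """Compute the weak checksum of a whole block from scratch."""
    a = sum(block) & 0xFFFF
    b = sum((len(block) - i) * byte for i, byte in enumerate(block)) & 0xFFFF
    return (b << 16) | a

def roll(csum, old_byte, new_byte, block_len):
    """Slide the window one byte to the right in constant time."""
    a = csum & 0xFFFF
    b = (csum >> 16) & 0xFFFF
    a = (a - old_byte + new_byte) & 0xFFFF
    b = (b - block_len * old_byte + a) & 0xFFFF
    return (b << 16) | a

# Rolling from data[0:16] to data[1:17] matches a fresh computation.
data = bytes(range(64))
csum = weak_checksum(data[0:16])
assert roll(csum, data[0], data[16], 16) == weak_checksum(data[1:17])
```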

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

#18 Stephen Frost
sfrost@snowman.net
In reply to: Robert Haas (#17)
Re: Streaming a base backup from master

* Robert Haas (robertmhaas@gmail.com) wrote:

The rsync code itself is not modular, I believe. I think the author
thereof kind of took the approach of placing efficiency before all.

Yeah, I looked into this when discussing this same concept at PGCon with
folks. There doesn't appear to be a good librsync and, even if there
were, there's a heck of a lot of complexity there that we *don't* need.
rsync is a great tool, don't get me wrong, but let's not try to go over
our heads here.

We don't need permissions handling, as an example. I also don't think
we need the binary diff/partial file transfer capability; we already
break relations into 1G chunks (when/if they reach that size), so you
won't necessarily be copying the entire relation if you're just doing
mtime-based or per-file-checksum-based detection. We don't need device
node handling, we don't need auto-ignoring of files, or pattern
exclusion/inclusion, we don't really need a progress bar (though it'd be
nice.. :), etc, etc, etc.

Thanks,

Stephen

#19 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Heikki Linnakangas (#11)
Re: Streaming a base backup from master

Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:

On 03/09/10 15:16, Greg Stark wrote:

On Fri, Sep 3, 2010 at 12:19 PM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:

* We need a smarter way to do pg_start/stop_backup() with this. At the
moment, you can only have one backup running at a time, but we shouldn't
have that limitation with this built-in mechanism.

Well, there's no particular reason we couldn't support having multiple
pg_start_backup() calls pending either. It's just not something
people have usually needed so far.

The backup label file makes that hard. There can be only one at a time.

I don't actually see a use-case for streaming multiple concurrent
backups. How many people are going to be able to afford that kind of
load on the master's I/O bandwidth?

Certainly for version 1, it would be sufficient to throw an error if
someone tries to start a backup while another one is in progress.
*Maybe*, down the road, we'd want to relax it.

regards, tom lane

#20 Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Stephen Frost (#18)
Re: Streaming a base backup from master

Stephen Frost <sfrost@snowman.net> wrote:

there's a heck of a lot of complexity there that we *don't* need.
rsync is a great tool, don't get me wrong, but let's not try to go
over our heads here.

Right -- among other things, it checks for portions of a new file
which match the old file at a different location. For example, if
you have a very large text file, and insert a line or two at the
start, it will wind up only sending the new lines. (Well, that and
all the checksums which help it determine that the rest of the file
matches at a shifted location.) I would think that PostgreSQL could
just check whether *corresponding* portions of a file matched, which
is much simpler.

we already break relations into 1G chunks (when/if they reach that
size), so you won't necessarily be copying the entire relation if
you're just doing mtime-based or per-file-checksum-based
detection.

While 1GB granularity would be OK, I doubt it's optimal; I think CRC
checks for smaller chunks might be worthwhile. My gut feel is that
somewhere in the 64kB to 1MB range would probably be optimal for us,
although the "sweet spot" will depend on how the database is used.
A configurable or self-adjusting size would be cool.
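(The corresponding-chunk comparison is simple to sketch. This is hypothetical Python, with zlib.crc32 standing in for whatever checksum the real implementation would pick, and 64 kB as an arbitrary chunk size; a real implementation would likely want a stronger hash, since CRC32 collisions are cheap to hit.)

```python
import zlib

CHUNK = 64 * 1024  # the knob under discussion; 64 kB is an assumption

def changed_chunks(old, new, chunk=CHUNK):
    """Compare *corresponding* fixed-size chunks of two file images by
    CRC and return the indexes of the chunks that need resending.
    Unlike rsync, no attempt is made to match data at shifted offsets."""
    changed = []
    n = max(len(old), len(new))
    for i in range(0, n, chunk):
        if zlib.crc32(old[i:i + chunk]) != zlib.crc32(new[i:i + chunk]):
            changed.append(i // chunk)
    return changed

# Only the middle chunk was modified, so only chunk 1 would be resent.
old = b"A" * (3 * CHUNK)
new = old[:CHUNK] + b"B" * CHUNK + old[2 * CHUNK:]
assert changed_chunks(old, new) == [1]
```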

-Kevin

#21 Stephen Frost <sfrost@snowman.net>, in reply to Kevin Grittner (#20)
#22 Thom Brown <thom@linux.com>, in reply to Tom Lane (#19)
#23 Kevin Grittner <Kevin.Grittner@wicourts.gov>, in reply to Stephen Frost (#21)
#24 Heikki Linnakangas <heikki.linnakangas@enterprisedb.com>, in reply to Tom Lane (#19)
#25 Robert Haas <robertmhaas@gmail.com>, in reply to Stephen Frost (#21)
#26 Tom Lane <tgl@sss.pgh.pa.us>, in reply to Kevin Grittner (#23)
#27 Robert Haas <robertmhaas@gmail.com>, in reply to Tom Lane (#26)
#28 David Blewett <david@dawninglight.net>, in reply to Tom Lane (#26)
#29 Stephen Frost <sfrost@snowman.net>, in reply to Tom Lane (#26)
#30 Kevin Grittner <Kevin.Grittner@wicourts.gov>, in reply to Tom Lane (#26)
#31 Heikki Linnakangas <heikki.linnakangas@enterprisedb.com>, in reply to Stephen Frost (#29)
#32 Heikki Linnakangas <heikki.linnakangas@enterprisedb.com>, in reply to David Blewett (#28)
#33 Martijn van Oosterhout <kleptog@svana.org>, in reply to Stephen Frost (#18)
#34 David Blewett <david@dawninglight.net>, in reply to Heikki Linnakangas (#32)
#35 Bruce Momjian <bruce@momjian.us>, in reply to Martijn van Oosterhout (#33)
#36 Thom Brown <thom@linux.com>, in reply to Bruce Momjian (#35)
#37 Robert Haas <robertmhaas@gmail.com>, in reply to Bruce Momjian (#35)
#38 Martijn van Oosterhout <kleptog@svana.org>, in reply to Bruce Momjian (#35)
#39 Bruce Momjian <bruce@momjian.us>, in reply to Martijn van Oosterhout (#38)
#40 Robert Haas <robertmhaas@gmail.com>, in reply to Bruce Momjian (#39)
#41 Bruce Momjian <bruce@momjian.us>, in reply to Bruce Momjian (#35)