Streaming base backups
Attached is an updated streaming base backup patch, based off the work that
Heikki started. It includes support for tablespaces, permissions, progress
reporting and some actual documentation of the protocol changes (user
interface documentation is going to depend on exactly what the frontend
client will look like, so I'm waiting with that one a while).
The basic implementation is: Add a new command to the replication mode called
BASE_BACKUP, that will initiate a base backup, stream the contents (in tar
compatible format) of the data directory and all tablespaces, and then end
the base backup in a single operation.
Other than the basic implementation, there is a small refactoring done of
pg_start_backup() and pg_stop_backup() splitting them into a "backend function"
that is easier to call internally and a "user facing function" that remains
identical to the previous one, and I've also added a pg_abort_backup()
internal-only function to get out of crashes while in backup mode in a safer
way (so it can be called from error handlers). Also, the walsender needs a
resource owner in order to call pg_start_backup().
I've implemented a frontend for this in pg_streamrecv, based on the assumption
that we wanted to include this in bin/ for 9.1 - and that it seems like a
reasonable place to put it. This can obviously be moved elsewhere if we want to.
That code needs a lot more cleanup, but I wanted to make sure I got the backend
patch out for review quickly. You can find the current WIP branch for
pg_streamrecv on my github page at https://github.com/mhagander/pg_streamrecv,
in the branch "baserecv". I'll be posting that as a separate patch once it's
been a bit more cleaned up (it does work now if you want to test it, though).
Some remaining thoughts and must-dos:
* Compression: Do we want to be able to compress the backups server-side? Or
defer that to whenever we get compression in libpq? (you can still tunnel it
through for example SSH to get compression if you want to) My thinking is
defer it.
* Compression: We could still implement compression of the tar files in
pg_streamrecv (probably easier, possibly more useful?)
* Windows support (need to implement readlink)
* Tar code is copied from pg_dump and modified. Should we try to factor it out
into port/? There are changes in the middle of it so it can't be done with
the current calling points, it would need a refactor. I think it's not worth
it, given how simple it is.
Improvements I want to add, but that aren't required for basic operation:
* Stefan mentioned it might be useful to put some
posix_fadvise(POSIX_FADV_DONTNEED)
in the process that streams all the files out. Seems useful, as long as that
doesn't kick them out of the cache *completely*, for other backends as well.
Do we know if that is the case?
* include all the necessary WAL files in the backup. This way we could generate
a tar file that would work on its own - right now, you still need to set up
log archiving (or use streaming repl) to get the remaining logfiles from the
master. This is fine for replication setups, but not for backups.
This would also require us to block recycling of WAL files during the backup,
of course.
* Suggestion from Heikki: don't put backup_label in $PGDATA during the backup.
Rather, include it just in the tar file. That way if you crash during the
backup, the master doesn't start recovery from the backup_label, leading
to failure to start up in the worst case.
* Suggestion from Heikki: perhaps at some point we're going to need a full
bison grammar for walsender commands.
* Relocation of tablespaces (can at least partially be done client-side)
--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/
Attachments:
basebackup.patch (text/x-patch; charset=US-ASCII), +676 -40
On 01/05/2011 02:54 PM, Magnus Hagander wrote:
[..]
Some remaining thoughts and must-dos:
* Compression: Do we want to be able to compress the backups server-side? Or
defer that to whenever we get compression in libpq? (you can still tunnel it
through for example SSH to get compression if you want to) My thinking is
defer it.
* Compression: We could still implement compression of the tar files in
pg_streamrecv (probably easier, possibly more useful?)
Hmm, compression would be nice, but I don't think it is required for this
initial implementation.
* Windows support (need to implement readlink)
* Tar code is copied from pg_dump and modified. Should we try to factor it out
into port/? There are changes in the middle of it so it can't be done with
the current calling points, it would need a refactor. I think it's not worth
it, given how simple it is.
Improvements I want to add, but that aren't required for basic operation:
* Stefan mentioned it might be useful to put some
posix_fadvise(POSIX_FADV_DONTNEED)
in the process that streams all the files out. Seems useful, as long as that
doesn't kick them out of the cache *completely*, for other backends as well.
Do we know if that is the case?
Well, my main concern is that a basebackup done that way might blow up
the OS buffer cache, causing temporary performance issues.
This might be more serious with an in-core solution than with what
people use now because a number of backup software and tools (like some
of the commercial backup solutions) employ various tricks to avoid that.
One interesting tidbit i found was:
http://insights.oetiker.ch/linux/fadvise/
which is very Linux specific but interesting nevertheless...
Stefan
Magnus Hagander <magnus@hagander.net> writes:
Attached is an updated streaming base backup patch, based off the work
Thanks! :)
* Compression: Do we want to be able to compress the backups server-side? Or
defer that to whenever we get compression in libpq? (you can still tunnel it
through for example SSH to get compression if you want to) My thinking is
defer it.
Compression in libpq would be a nice way to solve it, later.
* Compression: We could still implement compression of the tar files in
pg_streamrecv (probably easier, possibly more useful?)
What about pg_streamrecv | gzip > …, which has the big advantage of
being friendly to *any* compression command line tool, whatever patents
and licenses?
* Stefan mentioned it might be useful to put some
posix_fadvise(POSIX_FADV_DONTNEED)
in the process that streams all the files out. Seems useful, as long as that
doesn't kick them out of the cache *completely*, for other backends as well.
Do we know if that is the case?
Maybe have a look at pgfincore to only tag DONTNEED for blocks that are
not already in SHM?
* include all the necessary WAL files in the backup. This way we could generate
a tar file that would work on its own - right now, you still need to set up
log archiving (or use streaming repl) to get the remaining logfiles from the
master. This is fine for replication setups, but not for backups.
This would also require us to block recycling of WAL files during the backup,
of course.
Well, I would guess that if you're streaming the WAL files in parallel
while the base backup is taken, then you're able to have it all without
an archiving setup, and the server could still recycle them.
Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
On Wed, Jan 5, 2011 at 22:58, Dimitri Fontaine <dimitri@2ndquadrant.fr> wrote:
Magnus Hagander <magnus@hagander.net> writes:
Attached is an updated streaming base backup patch, based off the work
Thanks! :)
* Compression: Do we want to be able to compress the backups server-side? Or
defer that to whenever we get compression in libpq? (you can still tunnel it
through for example SSH to get compression if you want to) My thinking is
defer it.
Compression in libpq would be a nice way to solve it, later.
Yeah, I'm pretty much set on postponing that one.
* Compression: We could still implement compression of the tar files in
pg_streamrecv (probably easier, possibly more useful?)
What about pg_streamrecv | gzip > …, which has the big advantage of
being friendly to *any* compression command line tool, whatever patents
and licenses?
That's part of what I meant with "easier and more useful".
Right now though, pg_streamrecv will output one tar file for each
tablespace, so you can't get it on stdout. But that can be changed of
course. The easiest step 1 is to just use gzopen() from zlib on the
files and use the same code as now :-)
* Stefan mentioned it might be useful to put some
posix_fadvise(POSIX_FADV_DONTNEED)
in the process that streams all the files out. Seems useful, as long as that
doesn't kick them out of the cache *completely*, for other backends as well.
Do we know if that is the case?
Maybe have a look at pgfincore to only tag DONTNEED for blocks that are
not already in SHM?
I think that's way more complex than we want to go here.
* include all the necessary WAL files in the backup. This way we could generate
a tar file that would work on it's own - right now, you still need to set up
log archiving (or use streaming repl) to get the remaining logfiles from the
master. This is fine for replication setups, but not for backups.
This would also require us to block recycling of WAL files during the backup,
of course.
Well, I would guess that if you're streaming the WAL files in parallel
while the base backup is taken, then you're able to have it all without
an archiving setup, and the server could still recycle them.
Yes, this was mostly for the use-case of "getting a single tarfile
that you can actually use to restore from without needing the log
archive at all".
--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/
Magnus Hagander <magnus@hagander.net> writes:
Compression in libpq would be a nice way to solve it, later.
Yeah, I'm pretty much set on postponing that one.
+1, in case it was not clear for whoever's counting the votes :)
What about pg_streamrecv | gzip > …, which has the big advantage of
That's part of what I meant with "easier and more useful".
Well…
Right now though, pg_streamrecv will output one tar file for each
tablespace, so you can't get it on stdout. But that can be changed of
course. The easiest step 1 is to just use gzopen() from zlib on the
files and use the same code as now :-)
Oh if integrating it is easier :)
Maybe have a look at pgfincore to only tag DONTNEED for blocks that are
not already in SHM?
I think that's way more complex than we want to go here.
Yeah.
Well, I would guess that if you're streaming the WAL files in parallel
while the base backup is taken, then you're able to have it all without
an archiving setup, and the server could still recycle them.
Yes, this was mostly for the use-case of "getting a single tarfile
that you can actually use to restore from without needing the log
archive at all".
It also allows for a simpler kick-start procedure for preparing a
standby, and lets you stop worrying too much about wal_keep_segments
and archive servers.
When does the standby launch its walreceiver? It would be extra-nice for
the base backup tool to optionally continue streaming WALs until the
standby starts doing it itself, so that wal_keep_segments is really
deprecated. No idea how feasible that is, though.
Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
On 06.01.2011 00:27, Dimitri Fontaine wrote:
Magnus Hagander <magnus@hagander.net> writes:
What about pg_streamrecv | gzip > …, which has the big advantage of
That's part of what I meant with "easier and more useful".
Well…
One thing to keep in mind is that if you do compression in libpq for the
transfer, and gzip the tar file in the client, that's quite inefficient.
You compress the data once in the server, decompress in the client, then
compress it again in the client. If you're going to write the backup to
a compressed file, and you want to transfer it compressed to save
bandwidth, you want to gzip it in the server to begin with.
--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
On Wed, Jan 5, 2011 at 23:58, Dimitri Fontaine <dimitri@2ndquadrant.fr> wrote:
* Stefan mentioned it might be useful to put some
posix_fadvise(POSIX_FADV_DONTNEED)
in the process that streams all the files out. Seems useful, as long as that
doesn't kick them out of the cache *completely*, for other backends as well.
Do we know if that is the case?
Maybe have a look at pgfincore to only tag DONTNEED for blocks that are
not already in SHM?
It's not much of an improvement. For pages that we already have in
shared memory, OS cache is mostly useless. OS cache matters for pages
that *aren't* in shared memory.
Regards,
Marti
On Wed, Jan 5, 2011 at 23:27, Dimitri Fontaine <dimitri@2ndquadrant.fr> wrote:
Magnus Hagander <magnus@hagander.net> writes:
Well, I would guess that if you're streaming the WAL files in parallel
while the base backup is taken, then you're able to have it all without
an archiving setup, and the server could still recycle them.
Yes, this was mostly for the use-case of "getting a single tarfile
that you can actually use to restore from without needing the log
archive at all".
It also allows for a simpler kick-start procedure for preparing a
standby, and lets you stop worrying too much about wal_keep_segments
and archive servers.
When does the standby launch its walreceiver? It would be extra-nice for
the base backup tool to optionally continue streaming WALs until the
standby starts doing it itself, so that wal_keep_segments is really
deprecated. No idea how feasible that is, though.
I think we're inventing a whole lot of complexity here that may not
be necessary at all. Let's do it the simple way and see how far we can
get with that - we can always improve this for 9.2.
--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/
On 05.01.2011 15:54, Magnus Hagander wrote:
Attached is an updated streaming base backup patch, based off the work
that Heikki started.
...
I've implemented a frontend for this in pg_streamrecv, based on the assumption
that we wanted to include this in bin/ for 9.1 - and that it seems like a
reasonable place to put it. This can obviously be moved elsewhere if we want to.
Hmm, is there any point in keeping the two functionalities in the same
binary, taking the base backup and streaming WAL to an archive
directory? Looks like the only common option between the two modes is
passing the connection string, and the verbose flag. A separate
pg_basebackup binary would probably make more sense.
That code needs a lot more cleanup, but I wanted to make sure I got the backend
patch out for review quickly. You can find the current WIP branch for
pg_streamrecv on my github page at https://github.com/mhagander/pg_streamrecv,
in the branch "baserecv". I'll be posting that as a separate patch once it's
been a bit more cleaned up (it does work now if you want to test it, though).
Looks like pg_streamrecv creates the pg_xlog and pg_tblspc directories,
because they're not included in the streamed tar. Wouldn't it be better
to include them in the tar as empty directories at the server-side?
Otherwise if you write the tar file to disk and untar it later, you have
to manually create them.
It would be nice to have an option in pg_streamrecv to specify the
backup label to use.
An option to stream the tar to stdout instead of a file would be very
handy too, so that you could pipe it directly to gzip for example. I
realize you get multiple tar files if tablespaces are used, but even if
you just throw an error in that case, it would be handy.
* Suggestion from Heikki: perhaps at some point we're going to need a full
bison grammar for walsender commands.
Maybe we should at least start using the lexer; we're not quite there to
need a full-blown grammar yet, but even a lexer might help.
BTW, looking at the WAL-streaming side of pg_streamrecv, if you start it
from scratch with an empty target directory, it needs to connect to
"postgres" database, to run pg_current_xlog_location(), and then
reconnect in replication mode. That's a bit awkward, there might not be
a "postgres" database, and even if there is, you might not have the
permission to connect to it. It would be much better to have a variant
of the START_REPLICATION command at the server-side that begins
streaming from the current location. Maybe just by leaving out the
start-location parameter.
--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
On Thu, Jan 6, 2011 at 23:57, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
On 05.01.2011 15:54, Magnus Hagander wrote:
Attached is an updated streaming base backup patch, based off the work
that Heikki started.
...
I've implemented a frontend for this in pg_streamrecv, based on the
assumption that we wanted to include this in bin/ for 9.1 - and that it
seems like a reasonable place to put it. This can obviously be moved
elsewhere if we want to.
Hmm, is there any point in keeping the two functionalities in the same
binary, taking the base backup and streaming WAL to an archive directory?
Looks like the only common option between the two modes is passing the
connection string, and the verbose flag. A separate pg_basebackup binary
would probably make more sense.
Yeah, once I broke things apart for better readability, I started
leaning in that direction as well.
However, if you consider the things that Dimitri mentioned about
streaming at the same time as downloading, having them in the same one
would make more sense. I don't think that's something for now,
though.
That code needs a lot more cleanup, but I wanted to make sure I got the
backend patch out for review quickly. You can find the current WIP branch
for pg_streamrecv on my github page at
https://github.com/mhagander/pg_streamrecv, in the branch "baserecv". I'll
be posting that as a separate patch once it's been a bit more cleaned up
(it does work now if you want to test it, though).
Looks like pg_streamrecv creates the pg_xlog and pg_tblspc directories,
because they're not included in the streamed tar. Wouldn't it be better to
include them in the tar as empty directories at the server-side? Otherwise
if you write the tar file to disk and untar it later, you have to manually
create them.
Yeah, good point. Originally, the tar code (your tar code, btw :P)
didn't create *any* directories, so I stuck it in there. I agree it
should be moved to the backend patch now.
It would be nice to have an option in pg_streamrecv to specify the backup
label to use.
Agreed.
An option to stream the tar to stdout instead of a file would be very handy
too, so that you could pipe it directly to gzip for example. I realize you
get multiple tar files if tablespaces are used, but even if you just throw
an error in that case, it would be handy.
Makes sense.
* Suggestion from Heikki: perhaps at some point we're going to need a full
bison grammar for walsender commands.
Maybe we should at least start using the lexer; we're not quite there to
need a full-blown grammar yet, but even a lexer might help.
Might. I don't speak flex very well, so I'm not really sure what that
would mean.
BTW, looking at the WAL-streaming side of pg_streamrecv, if you start it
from scratch with an empty target directory, it needs to connect to
"postgres" database, to run pg_current_xlog_location(), and then reconnect
in replication mode. That's a bit awkward, there might not be a "postgres"
database, and even if there is, you might not have the permission to connect
to it. It would be much better to have a variant of the START_REPLICATION
command at the server-side that begins streaming from the current location.
Maybe just by leaving out the start-location parameter.
Agreed. That part is unchanged from the one that runs against 9.0
though, where that wasn't a possibility. But adding something like
that to the walsender in 9.1 would be good.
--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/
2011/1/5 Magnus Hagander <magnus@hagander.net>:
On Wed, Jan 5, 2011 at 22:58, Dimitri Fontaine <dimitri@2ndquadrant.fr> wrote:
Magnus Hagander <magnus@hagander.net> writes:
* Stefan mentioned it might be useful to put some
posix_fadvise(POSIX_FADV_DONTNEED)
in the process that streams all the files out. Seems useful, as long as that
doesn't kick them out of the cache *completely*, for other backends as well.
Do we know if that is the case?
Maybe have a look at pgfincore to only tag DONTNEED for blocks that are
not already in SHM?
I think that's way more complex than we want to go here.
DONTNEED will remove the block from the OS buffer cache every time.
It should not be that hard to implement a snapshot (it needs mincore())
and to restore the previous state. I don't know how basebackup is
performed exactly... so perhaps I am wrong.
posix_fadvise support is already in postgresql core... we can start by
just doing a snapshot of the files before starting, or at some point
in the basebackup; it will need only 256kB per GB of data...
--
Cédric Villemain 2ndQuadrant
http://2ndQuadrant.fr/ PostgreSQL : Expertise, Formation et Support
On Wed, 2011-01-05 at 14:54 +0100, Magnus Hagander wrote:
The basic implementation is: Add a new command to the replication mode called
BASE_BACKUP, that will initiate a base backup, stream the contents (in tar
compatible format) of the data directory and all tablespaces, and then end
the base backup in a single operation.
I'm a little dubious of the performance of that approach for some users,
though it does seem a popular idea.
One very useful feature will be some way of confirming the number and
size of files to transfer, so that the base backup client can find out
the progress.
It would also be good to avoid writing a backup_label file at all on the
master, so there was no reason why multiple concurrent backups could not
be taken. The current coding allows for the idea that the start and stop
might be in different sessions, whereas here we know we are in one
session.
--
Simon Riggs http://www.2ndQuadrant.com/books/
PostgreSQL Development, 24x7 Support, Training and Services
On Fri, Jan 7, 2011 at 02:15, Simon Riggs <simon@2ndquadrant.com> wrote:
On Wed, 2011-01-05 at 14:54 +0100, Magnus Hagander wrote:
The basic implementation is: Add a new command to the replication mode called
BASE_BACKUP, that will initiate a base backup, stream the contents (in tar
compatible format) of the data directory and all tablespaces, and then end
the base backup in a single operation.I'm a little dubious of the performance of that approach for some users,
though it does seem a popular idea.
Well, it's of course only going to be an *option*. We should keep our
flexibility and allow the current ways as well.
One very useful feature will be some way of confirming the number and
size of files to transfer, so that the base backup client can find out
the progress.
The patch already does this. Or rather, as it's coded it does this
once per tablespace.
It'll give you an approximation only of course, that can change, but
it should be enough for the purposes of a progress indication.
It would also be good to avoid writing a backup_label file at all on the
master, so there was no reason why multiple concurrent backups could not
be taken. The current coding allows for the idea that the start and stop
might be in different sessions, whereas here we know we are in one
session.
Yeah, I have that on the todo list suggested by Heikki. I consider it
a later phase though.
--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/
On Fri, Jan 7, 2011 at 01:47, Cédric Villemain
<cedric.villemain.debian@gmail.com> wrote:
2011/1/5 Magnus Hagander <magnus@hagander.net>:
On Wed, Jan 5, 2011 at 22:58, Dimitri Fontaine <dimitri@2ndquadrant.fr> wrote:
Magnus Hagander <magnus@hagander.net> writes:
* Stefan mentioned it might be useful to put some
posix_fadvise(POSIX_FADV_DONTNEED)
in the process that streams all the files out. Seems useful, as long as that
doesn't kick them out of the cache *completely*, for other backends as well.
Do we know if that is the case?
Maybe have a look at pgfincore to only tag DONTNEED for blocks that are
not already in SHM?
I think that's way more complex than we want to go here.
DONTNEED will remove the block from the OS buffer cache every time.
Then we definitely don't want to use it - because some other backend
might well want the file. Better leave it up to the standard logic in
the kernel.
It should not be that hard to implement a snapshot (it needs mincore())
and to restore previous state. I don't know how basebackup is
performed exactly...so perhaps I am wrong.
Uh, it just reads the files out of the filesystem. Just like you'd do
today, except it's now integrated and streams the data across a
regular libpq connection.
--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/
On Thu, Jan 06, 2011 at 07:47:39PM -0500, Cédric Villemain wrote:
2011/1/5 Magnus Hagander <magnus@hagander.net>:
On Wed, Jan 5, 2011 at 22:58, Dimitri Fontaine <dimitri@2ndquadrant.fr> wrote:
Magnus Hagander <magnus@hagander.net> writes:
* Stefan mentioned it might be useful to put some
posix_fadvise(POSIX_FADV_DONTNEED)
in the process that streams all the files out. Seems useful, as long as that
doesn't kick them out of the cache *completely*, for other backends as well.
Do we know if that is the case?
Maybe have a look at pgfincore to only tag DONTNEED for blocks that are
not already in SHM?
I think that's way more complex than we want to go here.
DONTNEED will remove the block from the OS buffer cache every time.
It should not be that hard to implement a snapshot (it needs mincore())
and to restore the previous state. I don't know how basebackup is
performed exactly... so perhaps I am wrong.
posix_fadvise support is already in postgresql core... we can start by
just doing a snapshot of the files before starting, or at some point
in the basebackup, it will need only 256kB per GB of data...
It is actually possible to be more scalable than the simple solution you
outline here (although that solution works pretty well).
I've written a program that synchronizes the OS cache state using
mmap()/mincore() between two computers. I haven't actually tested its
impact on performance yet, but I was surprised by how fast it actually runs
and how compact the cache maps can be.
If one encodes the data so one remembers the number of zeros between 1s
one, storage scale by the amount of memory in each size rather than the
dataset size. I actually played with doing that, then doing huffman
encoding of that. I get around 1.2-1.3 bits / page of _physical memory_
on my tests.
I don't have my notes handy, but here are some numbers from memory...
The obvious worst cases are 1 bit per page of _dataset_ or 19 bits per page
of physical memory in the machine. The latter limit gets better, however,
since there are < 1024 symbols possible for the encoder (since in this
case symbols are spans of zeros that need to fit in a file that is 1 GB in
size). So the actual worst case is much closer to 1 bit per page of
the dataset or ~10 bits per page of physical memory. The real performance
I see with huffman is more like 1.3 bits per page of physical memory. All the
encoding/decoding is actually very fast. zlib would actually compress even
better than huffman, but the huffman encoder/decoder is actually pretty good
and very straightforward code.
I would like to integrate something like this into PG or perhaps even into
something like rsync, but it was written as a proof of concept and I haven't
had time to work on it recently.
Garick
--
Cédric Villemain               2ndQuadrant
http://2ndQuadrant.fr/     PostgreSQL : Expertise, Formation et Support
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Fri, Jan 07, 2011 at 10:26:29AM -0500, Garick Hamlin wrote:
On Thu, Jan 06, 2011 at 07:47:39PM -0500, Cédric Villemain wrote:
2011/1/5 Magnus Hagander <magnus@hagander.net>:
On Wed, Jan 5, 2011 at 22:58, Dimitri Fontaine <dimitri@2ndquadrant.fr> wrote:
Magnus Hagander <magnus@hagander.net> writes:
* Stefan mentioned it might be useful to put some
posix_fadvise(POSIX_FADV_DONTNEED)
in the process that streams all the files out. Seems useful, as long as that
doesn't kick them out of the cache *completely*, for other backends as well.
Do we know if that is the case?
Maybe have a look at pgfincore to only tag DONTNEED for blocks that are
not already in SHM?
I think that's way more complex than we want to go here.
DONTNEED will remove the block from the OS buffer cache every time.
It should not be that hard to implement a snapshot (it needs mincore())
and to restore the previous state. I don't know how basebackup is
performed exactly... so perhaps I am wrong.
posix_fadvise support is already in postgresql core... we can start by
just doing a snapshot of the files before starting, or at some point
in the basebackup; it will need only 256kB per GB of data...
It is actually possible to be more scalable than the simple solution you
outline here (although that solution works pretty well).
I've written a program that synchronizes the OS cache state using
mmap()/mincore() between two computers. I haven't actually tested its
impact on performance yet, but I was surprised by how fast it actually runs
and how compact cache maps can be.
If one encodes the data so one remembers the number of zeros between 1s
one, storage scale by the amount of memory in each size rather than the
Sorry for the typos, that should read:
the storage scales by the number of pages resident in memory rather than the
total dataset size.
dataset size. I actually played with doing that, then doing huffman
encoding of that. I get around 1.2-1.3 bits / page of _physical memory_
on my tests.
I don't have my notes handy, but here are some numbers from memory...
The obvious worst cases are 1 bit per page of _dataset_ or 19 bits per page
of physical memory in the machine. The latter limit gets better, however,
since there are < 1024 symbols possible for the encoder (since in this
case symbols are spans of zeros that need to fit in a file that is 1 GB in
size). So the actual worst case is much closer to 1 bit per page of
the dataset or ~10 bits per page of physical memory. The real performance
I see with huffman is more like 1.3 bits per page of physical memory. All the
encoding/decoding is actually very fast. zlib would actually compress even
better than huffman, but the huffman encoder/decoder is actually pretty good
and very straightforward code.
I would like to integrate something like this into PG or perhaps even into
something like rsync, but it was written as a proof of concept and I haven't
had time to work on it recently.
Garick
--
Cédric Villemain               2ndQuadrant
http://2ndQuadrant.fr/     PostgreSQL : Expertise, Formation et Support
On 05.01.2011 15:54, Magnus Hagander wrote:
* Suggestion from Heikki: perhaps at some point we're going to need a full
bison grammar for walsender commands.
Here's a patch for this (Also available at
git@github.com:hlinnaka/postgres.git, branch "streaming_base"). I
thought I knew our bison/flex magic pretty well by now, but it turned
out to take much longer than I thought. But here it is.
I'm not 100% sure if this is worth the trouble quite yet. It adds quite
a lot of boilerplate code.. OTOH, having a bison grammar file makes it
easier to see what exactly the grammar is, and I like that. It's not too
bad with three commands yet, but if it expands much further a bison
grammar is a must.
At first I tried using the backend lexer for this, but it couldn't parse
the xlog-start location in the "START_REPLICATION 0/47000000" command.
In hindsight that may have been a badly chosen syntax. But as you
pointed out on IM, the lexer needed to handle this limited set of
commands is very small, so I wrote a dedicated flex lexer instead that
can handle it.
--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
Attachments:
replication-grammar-1.patch (text/x-diff), +591 -130
On 05.01.2011 15:54, Magnus Hagander wrote:
I've implemented a frontend for this in pg_streamrecv, based on the assumption
that we wanted to include this in bin/ for 9.1 - and that it seems like a
reasonable place to put it. This can obviously be moved elsewhere if we want to.
That code needs a lot more cleanup, but I wanted to make sure I got the backend
patch out for review quickly. You can find the current WIP branch for
pg_streamrecv on my github page at https://github.com/mhagander/pg_streamrecv,
in the branch "baserecv". I'll be posting that as a separate patch once it's
been a bit more cleaned up (it does work now if you want to test it, though).
One more thing, now that I've played a bit with pg_streamrecv:
I find it strange that the data directory must exist when you call
pg_streamrecv in base-backup mode. I would expect it to work like
initdb, and create the directory if it doesn't exist.
--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
On Thu, Jan 6, 2011 at 23:57, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
Looks like pg_streamrecv creates the pg_xlog and pg_tblspc directories,
because they're not included in the streamed tar. Wouldn't it be better to
include them in the tar as empty directories at the server-side? Otherwise
if you write the tar file to disk and untar it later, you have to manually
create them.
Attached is an updated patch that does this.
It also collects all the header records as a single resultset at the
beginning. This made for cleaner code, but more importantly makes it
possible to get the total size of the backup even if there are
multiple tablespaces.
It also changes the tar members to use relative paths instead of
absolute ones - since we send the root of the directory in the header
anyway. That also takes away the "./" portion in all tar members.
git branch on github updated as well, of course.
--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/
Attachments:
basebackup.patch (text/x-patch; charset=US-ASCII), +727 -40
On 7.1.2011 15:45, Magnus Hagander wrote:
On Fri, Jan 7, 2011 at 02:15, Simon Riggs <simon@2ndquadrant.com> wrote:
One very useful feature will be some way of confirming the number and
size of files to transfer, so that the base backup client can find out
the progress.
The patch already does this. Or rather, as it's coded it does this
once per tablespace. It'll give you an approximation only of course,
that can change, but it should be enough for the purposes of a progress
indication.
In this case you actually could send exact numbers, as you need to only
transfer the files up to the size they were when starting the base
backup. The rest will be taken care of by WAL replay.
It would also be good to avoid writing a backup_label file at all on the
master, so there was no reason why multiple concurrent backups could not
be taken. The current coding allows for the idea that the start and stop
might be in different sessions, whereas here we know we are in one
session.
Yeah, I have that on the todo list suggested by Heikki. I consider it
a later phase though.
--
--------------------------------------------
Hannu Krosing
Senior Consultant,
Infinite Scalability & Performance
http://www.2ndQuadrant.com/books/