pg_basebackup for streaming base backups
This patch creates pg_basebackup in bin/, being a client program for
the streaming base backup feature.
I think it's more or less done now. I've again split it out of
pg_streamrecv, because it had very little shared code with that
(basically just the PQconnectdb() wrapper).
One thing I'm thinking about - right now the tool just takes -c
<conninfo> to connect to the database. Should it instead be taught to
take the connection parameters that for example pg_dump does - one for
each of host, port, user, password? (shouldn't be hard to do..)
--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/
Attachments:
pg_basebackup.patch (text/x-patch, +1200 -1)
Hi,
I have an unexpected 5-minute window to do a first reading of the patch,
so here goes the quick doc and comments proof reading of it. :)
Magnus Hagander <magnus@hagander.net> writes:
This patch creates pg_basebackup in bin/, being a client program for
the streaming base backup feature.
Great! We have pg_ctl init[db], I think we want pg_ctl clone or some
other command here to call the binary for us. What do you think?
One thing I'm thinking about - right now the tool just takes -c
<conninfo> to connect to the database. Should it instead be taught to
take the connection parameters that for example pg_dump does - one for
each of host, port, user, password? (shouldn't be hard to do..)
Consistency is good.
Now, basic first patch reading level review:
I think doc/src/sgml/backup.sgml should include some notes about how
libpq base backup streaming compares to rsync and the like in terms of
efficiency or typical performance, when to prefer which, etc. I'll see
about doing some tests next week.
+ <term><option>--basedir=<replaceable class="parameter">directory</replaceable></option></term>
That should be -D --pgdata, for consistency with pg_dump.
On a quick reading it's unclear from the docs alone how -d and -t live
together. It seems like the options are exclusive but I'd have to ask…
+ * The file will be named base.tar[.gz] if it's for the main data directory
+ * or <tablespaceoid>.tar[.gz] if it's for another tablespace.
Well we have UNIQUE, btree (spcname), so maybe we can use that here?
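A minimal C sketch of the naming rule quoted from the patch comment, for readers following the tablespace-naming discussion. The helper name `tar_file_name()` and the convention that an OID of 0 stands for the main data directory are assumptions for illustration, not code from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not taken from the patch): build the output file
 * name described in the patch comment:
 *   base.tar[.gz]            for the main data directory
 *   <tablespaceoid>.tar[.gz] for any other tablespace
 * Assumption: spcoid == 0 means "main data directory".
 */
static void
tar_file_name(char *buf, size_t buflen, unsigned int spcoid, bool compress)
{
    const char *suffix = compress ? ".tar.gz" : ".tar";

    assert(buf != NULL && buflen > 0);
    if (spcoid == 0)
        snprintf(buf, buflen, "base%s", suffix);
    else
        snprintf(buf, buflen, "%u%s", spcoid, suffix);
}
```

For example, `tar_file_name(buf, sizeof(buf), 16385, true)` yields `16385.tar.gz`, while OID 0 without compression yields `base.tar`.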
Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL: Expertise, Training and Support
On Sat, Jan 15, 2011 at 21:16, Dimitri Fontaine <dimitri@2ndquadrant.fr> wrote:
Hi,
I have an unexpected 5-minute window to do a first reading of the patch,
so here goes the quick doc and comments proof reading of it. :)
:-)
Magnus Hagander <magnus@hagander.net> writes:
This patch creates pg_basebackup in bin/, being a client program for
the streaming base backup feature.
Great! We have pg_ctl init[db], I think we want pg_ctl clone or some
other command here to call the binary for us. What do you think?
That might be useful, but I think we need to settle on the
pg_basebackup contents itself first.
Not sure pg_ctl clone would be the proper name, since it's not
actually a clone at this point (it might be with the second patch I
just posted that includes the WAL files)
One thing I'm thinking about - right now the tool just takes -c
<conninfo> to connect to the database. Should it instead be taught to
take the connection parameters that for example pg_dump does - one for
each of host, port, user, password? (shouldn't be hard to do..)
Consistency is good.
Now, basic first patch reading level review:
I think doc/src/sgml/backup.sgml should include some notes about how
libpq base backup streaming compares to rsync and the like in terms of
efficiency or typical performance, when to prefer which, etc. I'll see
about doing some tests next week.
Yeah, the whole backup chapter may well need some more work after this.
+ <term><option>--basedir=<replaceable class="parameter">directory</replaceable></option></term>
That should be -D --pgdata, for consistency with pg_dump.
pg_dump doesn't have a -D. I assume you mean pg_ctl / initdb?
On a quick reading it's unclear from the docs alone how -d and -t live
together. It seems like the options are exclusive but I'd have to ask…
They are. The docs clearly say "Only one of <literal>-d</> and
<literal>-t</> can be specified"
+ * The file will be named base.tar[.gz] if it's for the main data directory
+ * or <tablespaceoid>.tar[.gz] if it's for another tablespace.
Well we have UNIQUE, btree (spcname), so maybe we can use that here?
We could, but that would make it more likely to run into encoding
issues and such - do we restrict what can be in a tablespace name?
Also with a tar named by the oid, you *can* untar it into a directory
in pg_tblspc to recover from if you have to.
Another option, I think Heikki mentioned this on IM at some point, is
to do something like name it <oid>-<name>.tar. That would give us best
of both worlds?
--
Magnus Hagander
On 15-01-2011 15:10, Magnus Hagander wrote:
One thing I'm thinking about - right now the tool just takes -c
<conninfo> to connect to the database. Should it instead be taught to
take the connection parameters that for example pg_dump does - one for
each of host, port, user, password? (shouldn't be hard to do..)
+1.
--
Euler Taveira de Oliveira
http://www.timbira.com/
Magnus Hagander <magnus@hagander.net> writes:
Not sure pg_ctl clone would be the proper name, since it's not
actually a clone at this point (it might be with the second patch I
just posted that includes the WAL files)
Let's keep the clone name for the client that makes it all then :)
That should be -D --pgdata, for consistency with pg_dump.
pg_dump doesn't have a -D. I assume you mean pg_ctl / initdb?
Yes, sorry, been too fast.
They are. The docs clearly say "Only one of <literal>-d</> and
<literal>-t</> can be specified"
Really too fast…
Another option, I think Heikki mentioned this on IM at some point, is
to do something like name it <oid>-<name>.tar. That would give us best
of both worlds?
Well I'd think we know the encoding of the pg_tablespace columns, so the
problem might be the filesystem encodings, right? Well there's also the
option of creating <oid>.tar and have a symlink to it called <name>.tar
but that's pushing it. I don't think naming after OIDs is a good
service for users, but if that's all we can reasonably do…
Will continue reviewing and post something more polished and
comprehensive next week — mainly wanted to see if you wanted to include
pg_ctl <command> in the patch already.
Regards,
--
Dimitri Fontaine
On Sat, Jan 15, 2011 at 23:10, Dimitri Fontaine <dimitri@2ndquadrant.fr> wrote:
That should be -D --pgdata, for consistency with pg_dump.
pg_dump doesn't have a -D. I assume you mean pg_ctl / initdb?
Yes, sorry, been too fast.
Ok. Updated patch that includes this change attached. I also changed
the tar directory from -t to -T, for consistency.
It also includes the change to take -h host, -U user, -w/-W for
password and -p port instead of a conninfo string.
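A rough sketch of what replacing a single -c conninfo string with pg_dump-style options can look like on the libpq side: the options are collected into the NULL-terminated keyword/value arrays taken by `PQconnectdbParams()`. The helper `build_conn_params()` is hypothetical, the `dbname=replication` / `replication=true` pair is an assumption modelled on what the walreceiver appends to primary_conninfo, and password handling for -w/-W is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_CONN_PARAMS 8

/*
 * Hypothetical helper: turn -h/-p/-U style options into the keyword/value
 * arrays accepted by PQconnectdbParams().  Options that were not given
 * (NULL) are left out, so libpq falls back to its usual defaults such as
 * PGHOST, PGPORT and PGUSER.  Returns the number of pairs filled in; both
 * arrays are NULL-terminated as libpq requires.
 */
static int
build_conn_params(const char *host, const char *port, const char *user,
                  const char *keywords[MAX_CONN_PARAMS],
                  const char *values[MAX_CONN_PARAMS])
{
    int         n = 0;

    if (host != NULL)
    {
        keywords[n] = "host";
        values[n++] = host;
    }
    if (port != NULL)
    {
        keywords[n] = "port";
        values[n++] = port;
    }
    if (user != NULL)
    {
        keywords[n] = "user";
        values[n++] = user;
    }
    /* Assumption: ask for a walsender connection the way walreceiver does. */
    keywords[n] = "dbname";
    values[n++] = "replication";
    keywords[n] = "replication";
    values[n++] = "true";

    assert(n < MAX_CONN_PARAMS);
    keywords[n] = NULL;
    values[n] = NULL;
    return n;
}
```

The arrays would then be handed to `PQconnectdbParams(keywords, values, 0)`. One benefit of this design is that the tool never has to quote user-supplied values into a conninfo string itself.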
Another option, I think Heikki mentioned this on IM at some point, is
to do something like name it <oid>-<name>.tar. That would give us best
of both worlds?
Well I'd think we know the encoding of the pg_tablespace columns, so the
problem might be the filesystem encodings, right?
Do we really? That's one of the global catalogs that don't really have
an encoding, isn't it?
Well there's also the option of creating <oid>.tar and have a symlink
to it called <name>.tar but that's pushing it. I don't think naming
after OIDs is a good service for users, but if that's all we can
reasonably do…
Yeah, symlink seems to be making things way too complex. <oid>-<name>
seems like a reasonable compromise?
Will continue reviewing and post something more polished and
comprehensive next week — mainly wanted to see if you wanted to include
pg_ctl <command> in the patch already.
Ok, thanks.
--
Magnus Hagander
Attachments:
pg_basebackup.patch (text/x-patch, +1360 -1)
Magnus Hagander <magnus@hagander.net> writes:
+ * The file will be named base.tar[.gz] if it's for the main data directory
+ * or <tablespaceoid>.tar[.gz] if it's for another tablespace.
Well we have UNIQUE, btree (spcname), so maybe we can use that here?
We could, but that would make it more likely to run into encoding
issues and such - do we restrict what can be in a tablespace name?
No. Don't even think of going there --- we got rid of user-accessible
names in the filesystem years ago and we're not going back. Consider
CREATE TABLESPACE "/foo/bar" LOCATION '/foo/bar';
regards, tom lane
On Sun, Jan 16, 2011 at 18:18, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Magnus Hagander <magnus@hagander.net> writes:
+ * The file will be named base.tar[.gz] if it's for the main data directory
+ * or <tablespaceoid>.tar[.gz] if it's for another tablespace.
Well we have UNIQUE, btree (spcname), so maybe we can use that here?
We could, but that would make it more likely to run into encoding
issues and such - do we restrict what can be in a tablespace name?
No. Don't even think of going there --- we got rid of user-accessible
names in the filesystem years ago and we're not going back. Consider
CREATE TABLESPACE "/foo/bar" LOCATION '/foo/bar';
Well, we'd try to name the file for that "<oid>-/foo/bar.tar", which I
guess would break badly, yes.
I guess we could normalize the tablespace name into [a-zA-Z0-9] or so,
which would still be useful for the majority of cases, I think?
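To make the normalization proposal concrete, here is a small C sketch of the "<oid>-<name>.tar" variant that keeps only ASCII [a-zA-Z0-9] from the tablespace name. The function name and the fallback to plain "<oid>.tar" when nothing usable survives are illustrative assumptions; the idea itself is not adopted later in the thread.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical sketch of the "<oid>-<name>.tar" naming idea.  Only ASCII
 * letters and digits from the tablespace name are kept, so the result can
 * never contain a path separator or bytes in an unknown encoding.  The OID
 * prefix keeps file names unique even when two tablespace names normalize
 * to the same string.
 */
static void
oid_name_tar_file(char *buf, size_t buflen, unsigned int spcoid,
                  const char *spcname)
{
    char        clean[64];
    size_t      j = 0;

    assert(buf != NULL && buflen > 0);
    for (size_t i = 0; spcname[i] != '\0' && j < sizeof(clean) - 1; i++)
    {
        char        c = spcname[i];

        /* explicit ASCII ranges: isalnum() would depend on the locale */
        if ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') ||
            (c >= '0' && c <= '9'))
            clean[j++] = c;
    }
    clean[j] = '\0';

    if (j == 0)
        snprintf(buf, buflen, "%u.tar", spcoid);    /* nothing usable left */
    else
        snprintf(buf, buflen, "%u-%s.tar", spcoid, clean);
}
```

With Tom's example, a tablespace named "/foo/bar" with OID 16385 would become `16385-foobar.tar`: safe on the filesystem, but "foobar" is no longer the tablespace's real name.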
--
Magnus Hagander
Magnus Hagander <magnus@hagander.net> writes:
On Sun, Jan 16, 2011 at 18:18, Tom Lane <tgl@sss.pgh.pa.us> wrote:
No. Don't even think of going there --- we got rid of user-accessible
names in the filesystem years ago and we're not going back. Consider
CREATE TABLESPACE "/foo/bar" LOCATION '/foo/bar';
Well, we'd try to name the file for that "<oid>-/foo/bar.tar", which I
guess would break badly, yes.
I guess we could normalize the tablespace name into [a-zA-Z0-9] or so,
which would still be useful for the majority of cases, I think?
Well if we're not using user names, there's no good choice except for
system name, and the one you're making up here isn't the "true" one…
Now I think the unfriendliness is around the fact that you need to
prepare (untar, unzip) and start a cluster from the backup to be able to
know what file contains what. Is it possible to offer a tool that lists
the logical objects contained in each tar file?
Maybe by adding a special section at the beginning of each. That would be
logically like the pg_dump "catalog", but implemented as a simple "noise"
file that you simply `cat` with some command.
Once more, I'm still unclear how important that is, but it's an itch.
Regards,
--
Dimitri Fontaine
Magnus Hagander <magnus@hagander.net> writes:
Well, we'd try to name the file for that "<oid>-/foo/bar.tar", which I
guess would break badly, yes.
I guess we could normalize the tablespace name into [a-zA-Z0-9] or so,
which would still be useful for the majority of cases, I think?
Just stick with the OID. There's no reason that I can see to have
"friendly" names for these tarfiles --- in most cases, the DBA will
never even deal with them, no?
regards, tom lane
On Sun, Jan 16, 2011 at 18:59, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Magnus Hagander <magnus@hagander.net> writes:
Well, we'd try to name the file for that "<oid>-/foo/bar.tar", which I
guess would break badly, yes.
I guess we could normalize the tablespace name into [a-zA-Z0-9] or so,
which would still be useful for the majority of cases, I think?
Just stick with the OID. There's no reason that I can see to have
"friendly" names for these tarfiles --- in most cases, the DBA will
never even deal with them, no?
No, this is the output mode where the DBA chooses to get the output in
the form of tarfiles. So if chosen, he will definitely deal with it.
When we unpack the tars right away to a directory, they have no name,
so that doesn't apply here.
--
Magnus Hagander
Magnus Hagander <magnus@hagander.net> writes:
On Sun, Jan 16, 2011 at 18:59, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Just stick with the OID. There's no reason that I can see to have
"friendly" names for these tarfiles --- in most cases, the DBA will
never even deal with them, no?
No, this is the output mode where the DBA chooses to get the output in
the form of tarfiles. So if chosen, he will definitely deal with it.
Mph. How big a use-case has that got? Offhand I can't see a reason to
use it at all, ever. If you're trying to set up a clone you want the
files unpacked.
regards, tom lane
On Sun, Jan 16, 2011 at 19:03, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Magnus Hagander <magnus@hagander.net> writes:
On Sun, Jan 16, 2011 at 18:59, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Just stick with the OID. There's no reason that I can see to have
"friendly" names for these tarfiles --- in most cases, the DBA will
never even deal with them, no?
No, this is the output mode where the DBA chooses to get the output in
the form of tarfiles. So if chosen, he will definitely deal with it.
Mph. How big a use-case has that got? Offhand I can't see a reason to
use it at all, ever. If you're trying to set up a clone you want the
files unpacked.
Yes, but the tool isn't just for setting up a clone.
If you're doing a regular base backup, that's *not* for replication,
you might want them in files.
--
Magnus Hagander
Magnus Hagander <magnus@hagander.net> writes:
If you're doing a regular base backup, that's *not* for replication,
you might want them in files.
+1
So, is that pg_restore -l idea feasible with your current tar format? I
guess that would translate to pg_basebackup -l <directory>|<oid>.tar.
Regards,
--
Dimitri Fontaine
On Sun, Jan 16, 2011 at 19:21, Dimitri Fontaine <dimitri@2ndquadrant.fr> wrote:
Magnus Hagander <magnus@hagander.net> writes:
If you're doing a regular base backup, that's *not* for replication,
you might want them in files.
+1
So, is that pg_restore -l idea feasible with your current tar format? I
guess that would translate to pg_basebackup -l <directory>|<oid>.tar.
Um, not easily if you want to translate it to names. Just like you
don't have access to the oid->name mapping without the server started.
The walsender can't read pg_class for example, so it can't generate
that mapping file.
--
Magnus Hagander
On Sun, Jan 16, 2011 at 11:31 PM, Magnus Hagander <magnus@hagander.net> wrote:
Ok. Updated patch that includes this change attached.
I could not apply the patch cleanly against the git master.
Do you know what the cause is?
$ patch -p1 -d. < /hoge/pg_basebackup.patch
patching file doc/src/sgml/backup.sgml
patching file doc/src/sgml/ref/allfiles.sgml
patching file doc/src/sgml/ref/pg_basebackup.sgml
patching file doc/src/sgml/reference.sgml
patching file src/bin/Makefile
patching file src/bin/pg_basebackup/Makefile
patching file src/bin/pg_basebackup/nls.mk
patching file src/bin/pg_basebackup/pg_basebackup.c
patch: **** malformed patch at line 1428: diff --git a/src/tools/msvc/Mkvcbuild.pm b/src/tools/msvc/Mkvcbuild.pm
Regards,
--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
On Jan 17, 2011 9:16 AM, "Fujii Masao" <masao.fujii@gmail.com> wrote:
On Sun, Jan 16, 2011 at 11:31 PM, Magnus Hagander <magnus@hagander.net> wrote:
Ok. Updated patch that includes this change attached.
I could not apply the patch cleanly against the git master.
Do you know what the cause is?
$ patch -p1 -d. < /hoge/pg_basebackup.patch
patching file doc/src/sgml/backup.sgml
patching file doc/src/sgml/ref/allfiles.sgml
patching file doc/src/sgml/ref/pg_basebackup.sgml
patching file doc/src/sgml/reference.sgml
patching file src/bin/Makefile
patching file src/bin/pg_basebackup/Makefile
patching file src/bin/pg_basebackup/nls.mk
patching file src/bin/pg_basebackup/pg_basebackup.c
patch: **** malformed patch at line 1428: diff --git a/src/tools/msvc/Mkvcbuild.pm b/src/tools/msvc/Mkvcbuild.pm
Weird, no idea. Will have to look into that later - meanwhile you can grab
the branch tip from my github repo if you want to review it.
/Magnus
On Mon, Jan 17, 2011 at 5:44 PM, Magnus Hagander <magnus@hagander.net> wrote:
Weird, no idea. Will have to look into that later - meanwhile you can grab
the branch tip from my github repo if you want to review it.
Which repo should I grab? You seem to have many repos :)
http://git.postgresql.org/gitweb
Regards,
--
Fujii Masao
On Mon, Jan 17, 2011 at 09:50, Fujii Masao <masao.fujii@gmail.com> wrote:
On Mon, Jan 17, 2011 at 5:44 PM, Magnus Hagander <magnus@hagander.net> wrote:
Weird, no idea. Will have to look into that later - meanwhile you can grab
the branch tip from my github repo if you want to review it.
Which repo should I grab? You seem to have many repos :)
http://git.postgresql.org/gitweb
Oh, sorry about that. There is only one that contains postgresql though :P
http://github.com/mhagander/postgres, branch streaming_base.
--
Magnus Hagander
Magnus Hagander <magnus@hagander.net> writes:
The walsender can't read pg_class for example, so it can't generate
that mapping file.
I don't see any way out here. So let's call <oid>.tar good enough for now…
Regards,
--
Dimitri Fontaine