pg_basebackup vs. Windows and tablespaces

Started by Noah Mischover 12 years ago9 messages

noah@leadboat.com

over 12 years ago

A "pg_basebackup -Fp" running on the same system as the target cluster will
fail in the presence of tablespaces; it would backup each tablespace to its
original path, and those paths are in use locally for the very originals we're
copying. "pg_basebackup -Ft" does not exhibit that hazard, and I typically
recommend it for folks using tablespaces.

On Windows, we populate pg_tblspc with NTFS junction points. "pg_basebackup
-Fp" reproduces them, and "pg_basebackup -Ft" stores them in the tar archive
as symbolic links. Trouble arises for -Ft backups: no Windows tar expander
that I've found will recreate the junction points. While -Fp backups are
basically usable, commands that copy files on Windows are inconsistent about
their support for junction points; duplicating a base backup after the fact is
error-prone. Windows users of tablespaces are left with limited options: use
"pg_basebackup -Fp" on a different system, or use -Ft but manually recreate
the junction points. We can do better; I see a few options:

1. Include in the base backup a file listing symbolic links/junction points,
then have archive recovery recreate them. This file would be managed like the
backup label file; exclusive backups would actually write it to the master
data directory, and non-exclusive backups would incorporate it on the fly.
pg_basebackup could also omit the actual links from its backup. Nearly any
tar or file copy utility would then suffice.

2. Add a pg_basebackup option like "--destdir" or "--sysroot", meaningful only
with -Fp; tablespace backups will be stored relative to it. So if the actual
tablespace path is c:/foo, --destdir=c:/backups/today would backup that
tablespace to c:/backups/today/c/foo. This facilitates same-server use of -Fp
on all platforms.

3. Use path concatenation instead of symbolic links/junction points for
tablespaces. More invasive, no doubt. For example, we would need to devise a
way for recovery to get the tablespace path.

I think #1 is a good bet; it's self-contained and fully heals the situation
for Windows users. By itself, #2 helps less than #1 on Windows. It may have
independent value. Other ideas, opinions?

Thanks,
nm

--
Noah Misch
EnterpriseDB http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Dimitri Fontaine

dimitri@2ndQuadrant.fr

over 12 years ago

In reply to: Noah Misch (#1)

Re: pg_basebackup vs. Windows and tablespaces

Noah Misch <noah@leadboat.com> writes:

A "pg_basebackup -Fp" running on the same system as the target cluster will
fail in the presence of tablespaces; it would backup each tablespace to its

I'd like to see that fixed, +1.

1. Include in the base backup a file listing symbolic links/junction points,
then have archive recovery recreate them. This file would be managed like the
backup label file; exclusive backups would actually write it to the master
data directory, and non-exclusive backups would incorporate it on the fly.
pg_basebackup could also omit the actual links from its backup. Nearly any
tar or file copy utility would then suffice.

2. Add a pg_basebackup option like "--destdir" or "--sysroot", meaningful only
with -Fp; tablespace backups will be stored relative to it. So if the actual
tablespace path is c:/foo, --destdir=c:/backups/today would backup that
tablespace to c:/backups/today/c/foo. This facilitates same-server use of -Fp
on all platforms.

My understanding is that the second option here would be useful also
when you want to create a standby with a different file layout than the
master, which in some cases is what you want to do (not HA strictly).

Another defect of pg_basebackup is its lack of shandling of tablespaces
mounted within $PGDATA, which happens often enough at customers sites,
whatever we think about that option. Would your work be extended to
cover that too?

Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Andrew Dunstan

andrew@dunslane.net

over 12 years ago

In reply to: Noah Misch (#1)

Re: pg_basebackup vs. Windows and tablespaces

On 08/01/2013 12:15 PM, Noah Misch wrote:

A "pg_basebackup -Fp" running on the same system as the target cluster will
fail in the presence of tablespaces; it would backup each tablespace to its
original path, and those paths are in use locally for the very originals we're
copying. "pg_basebackup -Ft" does not exhibit that hazard, and I typically
recommend it for folks using tablespaces.

On Windows, we populate pg_tblspc with NTFS junction points. "pg_basebackup
-Fp" reproduces them, and "pg_basebackup -Ft" stores them in the tar archive
as symbolic links. Trouble arises for -Ft backups: no Windows tar expander
that I've found will recreate the junction points. While -Fp backups are
basically usable, commands that copy files on Windows are inconsistent about
their support for junction points; duplicating a base backup after the fact is
error-prone. Windows users of tablespaces are left with limited options: use
"pg_basebackup -Fp" on a different system, or use -Ft but manually recreate
the junction points. We can do better; I see a few options:

1. Include in the base backup a file listing symbolic links/junction points,
then have archive recovery recreate them. This file would be managed like the
backup label file; exclusive backups would actually write it to the master
data directory, and non-exclusive backups would incorporate it on the fly.
pg_basebackup could also omit the actual links from its backup. Nearly any
tar or file copy utility would then suffice.

2. Add a pg_basebackup option like "--destdir" or "--sysroot", meaningful only
with -Fp; tablespace backups will be stored relative to it. So if the actual
tablespace path is c:/foo, --destdir=c:/backups/today would backup that
tablespace to c:/backups/today/c/foo. This facilitates same-server use of -Fp
on all platforms.

3. Use path concatenation instead of symbolic links/junction points for
tablespaces. More invasive, no doubt. For example, we would need to devise a
way for recovery to get the tablespace path.

I think #1 is a good bet; it's self-contained and fully heals the situation
for Windows users. By itself, #2 helps less than #1 on Windows. It may have
independent value. Other ideas, opinions?

Thanks for raising this. I agree it's an area that needs work.

I like #1, it seems nice and workable.

I also like the concept of #2, but I think we need to think about it a
bit more. One of the things I like about barman backups is that on
recovery you can map where tablespaces go, on a per tablespace basis
(it's not very well documented, or wasn't when I last looked, but it
does work). I think something like that would be awesome to have for
pg_basebackup. So allowing multiple options of the form

--map-tablespace c:/foo/bar=d:/baz/blurfl

or some such would be great.

cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Noah Misch

noah@leadboat.com

over 12 years ago

In reply to: Dimitri Fontaine (#2)

Re: pg_basebackup vs. Windows and tablespaces

On Thu, Aug 01, 2013 at 06:44:41PM +0200, Dimitri Fontaine wrote:

Noah Misch <noah@leadboat.com> writes:

2. Add a pg_basebackup option like "--destdir" or "--sysroot", meaningful only
with -Fp; tablespace backups will be stored relative to it. So if the actual
tablespace path is c:/foo, --destdir=c:/backups/today would backup that
tablespace to c:/backups/today/c/foo. This facilitates same-server use of -Fp
on all platforms.

My understanding is that the second option here would be useful also
when you want to create a standby with a different file layout than the
master, which in some cases is what you want to do (not HA strictly).

The way I was envisioning it, you would still need to place the tablespace
directories in their ordinary locations before recovering the base backup.
This was just a way to relocate the backup itself. I can see value in both
capabilities, though.

Another defect of pg_basebackup is its lack of shandling of tablespaces
mounted within $PGDATA, which happens often enough at customers sites,
whatever we think about that option. Would your work be extended to
cover that too?

Not that I had in mind. My latest thinking on that topic is along the lines
of helping folks stop doing it, not making it work better:

/messages/by-id/20121205010442.GA16472@tornado.leadboat.com

Thanks,
nm

--
Noah Misch
EnterpriseDB http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Noah Misch

noah@leadboat.com

over 12 years ago

In reply to: Andrew Dunstan (#3)

Re: pg_basebackup vs. Windows and tablespaces

On Thu, Aug 01, 2013 at 01:04:42PM -0400, Andrew Dunstan wrote:

On 08/01/2013 12:15 PM, Noah Misch wrote:

1. Include in the base backup a file listing symbolic links/junction points,
then have archive recovery recreate them. This file would be managed like the
backup label file; exclusive backups would actually write it to the master
data directory, and non-exclusive backups would incorporate it on the fly.
pg_basebackup could also omit the actual links from its backup. Nearly any
tar or file copy utility would then suffice.

2. Add a pg_basebackup option like "--destdir" or "--sysroot", meaningful only
with -Fp; tablespace backups will be stored relative to it. So if the actual
tablespace path is c:/foo, --destdir=c:/backups/today would backup that
tablespace to c:/backups/today/c/foo. This facilitates same-server use of -Fp
on all platforms.

I like #1, it seems nice and workable.

Agreed. I'll lean in that direction for resolving the proximate problem.

I also like the concept of #2, but I think we need to think about it a
bit more. One of the things I like about barman backups is that on
recovery you can map where tablespaces go, on a per tablespace basis
(it's not very well documented, or wasn't when I last looked, but it
does work). I think something like that would be awesome to have for
pg_basebackup. So allowing multiple options of the form

--map-tablespace c:/foo/bar=d:/baz/blurfl

or some such would be great.

Good point. I see now that the syntax I floated covered just one slice of a
whole range of things folks might want in that area.

Thanks,
nm

--
Noah Misch
EnterpriseDB http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Magnus Hagander

magnus@hagander.net

over 12 years ago

In reply to: Noah Misch (#5)

Re: pg_basebackup vs. Windows and tablespaces

On Mon, Aug 5, 2013 at 10:03 PM, Noah Misch <noah@leadboat.com> wrote:

On Thu, Aug 01, 2013 at 01:04:42PM -0400, Andrew Dunstan wrote:

On 08/01/2013 12:15 PM, Noah Misch wrote:

1. Include in the base backup a file listing symbolic links/junction points,
then have archive recovery recreate them. This file would be managed like the
backup label file; exclusive backups would actually write it to the master
data directory, and non-exclusive backups would incorporate it on the fly.
pg_basebackup could also omit the actual links from its backup. Nearly any
tar or file copy utility would then suffice.

2. Add a pg_basebackup option like "--destdir" or "--sysroot", meaningful only
with -Fp; tablespace backups will be stored relative to it. So if the actual
tablespace path is c:/foo, --destdir=c:/backups/today would backup that
tablespace to c:/backups/today/c/foo. This facilitates same-server use of -Fp
on all platforms.

I like #1, it seems nice and workable.

Agreed. I'll lean in that direction for resolving the proximate problem.

+1.

I also like the concept of #2, but I think we need to think about it a
bit more. One of the things I like about barman backups is that on
recovery you can map where tablespaces go, on a per tablespace basis
(it's not very well documented, or wasn't when I last looked, but it
does work). I think something like that would be awesome to have for
pg_basebackup. So allowing multiple options of the form

--map-tablespace c:/foo/bar=d:/baz/blurfl

or some such would be great.

Good point. I see now that the syntax I floated covered just one slice of a
whole range of things folks might want in that area.

I think I have an old patch around for doing just the map-tablespace
thing that I never quite finished. That one was mostly useful for
setting up replicas really, since that's when you know at backup time
where you want to the new tablespaces to go. For a regular backup, you
want it to happen at restore time. And in this case, you're quite
likely working off the tarfiles, and we don't really have a program
dealing with the restore of those at all - you're just supposed to do
it manually.

A trivial tool that worked off the directory of tarfiles and allowed
remapping of the tablespace locations (by updating the
symlinks/junctions restored out of the base.tar flie) might be an
easier and less invasive way to do it than to put it in the actual
recovery code?

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Andrew Dunstan

andrew@dunslane.net

over 12 years ago

In reply to: Magnus Hagander (#6)

Re: pg_basebackup vs. Windows and tablespaces

On 08/12/2013 01:40 PM, Magnus Hagander wrote:

I also like the concept of #2, but I think we need to think about it a
bit more. One of the things I like about barman backups is that on
recovery you can map where tablespaces go, on a per tablespace basis
(it's not very well documented, or wasn't when I last looked, but it
does work). I think something like that would be awesome to have for
pg_basebackup. So allowing multiple options of the form

--map-tablespace c:/foo/bar=d:/baz/blurfl

or some such would be great.

Good point. I see now that the syntax I floated covered just one slice of a
whole range of things folks might want in that area.

I think I have an old patch around for doing just the map-tablespace
thing that I never quite finished. That one was mostly useful for
setting up replicas really, since that's when you know at backup time
where you want to the new tablespaces to go. For a regular backup, you
want it to happen at restore time. And in this case, you're quite
likely working off the tarfiles, and we don't really have a program
dealing with the restore of those at all - you're just supposed to do
it manually.

A trivial tool that worked off the directory of tarfiles and allowed
remapping of the tablespace locations (by updating the
symlinks/junctions restored out of the base.tar flie) might be an
easier and less invasive way to do it than to put it in the actual
recovery code?

What barman does is to dissolve the symlink when making its backup (i.e
pg_tblspc/12345 becomes a directory in the backup instead of a symlink),
and store the info relating to the source symlink in its metadata file.
On restore it recreates the symlink. It's at that stage that you can
modify its default behaviour by specifying where the link should go.

cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Magnus Hagander

magnus@hagander.net

over 12 years ago

In reply to: Andrew Dunstan (#7)

Re: pg_basebackup vs. Windows and tablespaces

On Mon, Aug 12, 2013 at 8:14 PM, Andrew Dunstan <andrew@dunslane.net> wrote:

On 08/12/2013 01:40 PM, Magnus Hagander wrote:

I also like the concept of #2, but I think we need to think about it a
bit more. One of the things I like about barman backups is that on
recovery you can map where tablespaces go, on a per tablespace basis
(it's not very well documented, or wasn't when I last looked, but it
does work). I think something like that would be awesome to have for
pg_basebackup. So allowing multiple options of the form

--map-tablespace c:/foo/bar=d:/baz/blurfl

or some such would be great.

Good point. I see now that the syntax I floated covered just one slice
of a
whole range of things folks might want in that area.

I think I have an old patch around for doing just the map-tablespace
thing that I never quite finished. That one was mostly useful for
setting up replicas really, since that's when you know at backup time
where you want to the new tablespaces to go. For a regular backup, you
want it to happen at restore time. And in this case, you're quite
likely working off the tarfiles, and we don't really have a program
dealing with the restore of those at all - you're just supposed to do
it manually.

A trivial tool that worked off the directory of tarfiles and allowed
remapping of the tablespace locations (by updating the
symlinks/junctions restored out of the base.tar flie) might be an
easier and less invasive way to do it than to put it in the actual
recovery code?

What barman does is to dissolve the symlink when making its backup (i.e
pg_tblspc/12345 becomes a directory in the backup instead of a symlink), and
store the info relating to the source symlink in its metadata file. On
restore it recreates the symlink. It's at that stage that you can modify its
default behaviour by specifying where the link should go.

Something like that makes sense for a plain format dump - but maybe
not for a tar one. And in either case, that also requires there to be
a pg_baserestore (or whatever you'd want to call it, please come up
with a better name :D) to do the restore process, to make sure it's
properly mapped.

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Andrew Dunstan

andrew@dunslane.net

over 12 years ago

In reply to: Magnus Hagander (#8)

Re: pg_basebackup vs. Windows and tablespaces

On 08/12/2013 02:22 PM, Magnus Hagander wrote:

On Mon, Aug 12, 2013 at 8:14 PM, Andrew Dunstan <andrew@dunslane.net> wrote:

On 08/12/2013 01:40 PM, Magnus Hagander wrote:

I also like the concept of #2, but I think we need to think about it a
bit more. One of the things I like about barman backups is that on
recovery you can map where tablespaces go, on a per tablespace basis
(it's not very well documented, or wasn't when I last looked, but it
does work). I think something like that would be awesome to have for
pg_basebackup. So allowing multiple options of the form

--map-tablespace c:/foo/bar=d:/baz/blurfl

or some such would be great.

Good point. I see now that the syntax I floated covered just one slice
of a
whole range of things folks might want in that area.

I think I have an old patch around for doing just the map-tablespace
thing that I never quite finished. That one was mostly useful for
setting up replicas really, since that's when you know at backup time
where you want to the new tablespaces to go. For a regular backup, you
want it to happen at restore time. And in this case, you're quite
likely working off the tarfiles, and we don't really have a program
dealing with the restore of those at all - you're just supposed to do
it manually.

A trivial tool that worked off the directory of tarfiles and allowed
remapping of the tablespace locations (by updating the
symlinks/junctions restored out of the base.tar flie) might be an
easier and less invasive way to do it than to put it in the actual
recovery code?

What barman does is to dissolve the symlink when making its backup (i.e
pg_tblspc/12345 becomes a directory in the backup instead of a symlink), and
store the info relating to the source symlink in its metadata file. On
restore it recreates the symlink. It's at that stage that you can modify its
default behaviour by specifying where the link should go.

Something like that makes sense for a plain format dump - but maybe
not for a tar one. And in either case, that also requires there to be
a pg_baserestore (or whatever you'd want to call it, please come up
with a better name :D) to do the restore process, to make sure it's
properly mapped.

I'm not saying to do it exactly as barman does. I'm just describing how
they do it.

I'd be quite happy if we just provided, at least to begin with,
tablespace mapping for plain format pg_basebackup dumps.

cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers