cross-platform pg_basebackup

Started by Robert Haasabout 3 years ago10 messages
#1Robert Haas
robertmhaas@gmail.com

Suppose that, for some reason, you want to use pg_basebackup on a
Linux machine to back up a database cluster on a Windows machine.
Suppose further that you attempt to use the -T option. Then you might
run afoul of this check:

/*
* This check isn't absolutely necessary. But all tablespaces are created
* with absolute directories, so specifying a non-absolute path here would
* just never match, possibly confusing users. It's also good to be
* consistent with the new_dir check.
*/
if (!is_absolute_path(cell->old_dir))
pg_fatal("old directory is not an absolute path in tablespace
mapping: %s",
cell->old_dir);

The problem is that the definition of is_absolute_path() here differs
depending on whether you are on Windows or not. So this code is, I
think, subtly incorrect. What it is testing is whether the
user-specified pathname is an absolute pathname *on the local machine*
whereas what it should be testing is whether the user-specified
pathname is an absolute pathname *on the remote machine*. There's no
problem if both sides are Windows or neither side is Windows, but if
the remote side is and the local side isn't, then something like
-TC:\foo=/backup/foo will fail. As far as I know, there's no reason
why that shouldn't be permitted to work.

What this check is actually intending to prevent, I believe, is
something like -T../mytablespace=/bkp/ts1, because that wouldn't
actually work: the value in the list will be an absolute path. The
tablespace wouldn't get remapped, and the user might be confused about
why it didn't, so it is good that we tell them what they did wrong.
However, I think we could relax the check a little bit, something
along the lines of !is_nonwindows_absolute_path(cell->old_dir) &&
!is_windows_absolute_path(dir). We can't actually know whether the
remote side is Windows or non-Windows, but if the string we're given
is plausibly an absolute path under either set of conventions, it's
probably fine to just search the list for it and see if it shows up.

This would have the disadvantage that if a Linux user creates a
tablespace directory inside $PGDATA and gives it a name like
/home/rhaas/pgdata/C:\Program Files\PostgreSQL\Data, and then attempts
a backup with '-TC:\Program Files\PostgreSQL\Data=/tmp/ts1' it will
not relocate the tablespace, yet the user won't get a message
explaining why. I'm prepared to dismiss that scenario as "not a real
use case".

Thoughts?

--
Robert Haas
EDB: http://www.enterprisedb.com

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#1)
Re: cross-platform pg_basebackup

Robert Haas <robertmhaas@gmail.com> writes:

However, I think we could relax the check a little bit, something
along the lines of !is_nonwindows_absolute_path(cell->old_dir) &&
!is_windows_absolute_path(dir). We can't actually know whether the
remote side is Windows or non-Windows, but if the string we're given
is plausibly an absolute path under either set of conventions, it's
probably fine to just search the list for it and see if it shows up.

Seems reasonable.

This would have the disadvantage that if a Linux user creates a
tablespace directory inside $PGDATA and gives it a name like
/home/rhaas/pgdata/C:\Program Files\PostgreSQL\Data, and then attempts
a backup with '-TC:\Program Files\PostgreSQL\Data=/tmp/ts1' it will
not relocate the tablespace, yet the user won't get a message
explaining why. I'm prepared to dismiss that scenario as "not a real
use case".

Agreed.

regards, tom lane

#3Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#2)
1 attachment(s)
Re: cross-platform pg_basebackup

On Thu, Oct 20, 2022 at 12:17 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Robert Haas <robertmhaas@gmail.com> writes:

However, I think we could relax the check a little bit, something
along the lines of !is_nonwindows_absolute_path(cell->old_dir) &&
!is_windows_absolute_path(dir). We can't actually know whether the
remote side is Windows or non-Windows, but if the string we're given
is plausibly an absolute path under either set of conventions, it's
probably fine to just search the list for it and see if it shows up.

Seems reasonable.

Cool. Here's a patch.

--
Robert Haas
EDB: http://www.enterprisedb.com

Attachments:

v1-0001-pg_basebackup-When-Tx-y-is-used-weaken-absolute-p.patchapplication/octet-stream; name=v1-0001-pg_basebackup-When-Tx-y-is-used-weaken-absolute-p.patchDownload
From b64c81d0d0f388aed10a543ed8d0a2f6fff478b9 Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Thu, 20 Oct 2022 12:53:55 -0400
Subject: [PATCH v1] pg_basebackup: When -Tx=y is used, weaken absolute path
 check on x.

Specifically, if x looks like it could be an absolute pathname on
either Windows or Linux, assume it is. The previous code checked
using the local machine's rules, but the remote server need not be
running the same OS.
---
 src/bin/pg_basebackup/pg_basebackup.c | 20 ++++++++++++----
 src/include/port.h                    | 34 +++++++++++++++------------
 2 files changed, 34 insertions(+), 20 deletions(-)

diff --git a/src/bin/pg_basebackup/pg_basebackup.c b/src/bin/pg_basebackup/pg_basebackup.c
index 7d17f8b33e..22836ca01a 100644
--- a/src/bin/pg_basebackup/pg_basebackup.c
+++ b/src/bin/pg_basebackup/pg_basebackup.c
@@ -340,12 +340,22 @@ tablespace_list_append(const char *arg)
 		pg_fatal("invalid tablespace mapping format \"%s\", must be \"OLDDIR=NEWDIR\"", arg);
 
 	/*
-	 * This check isn't absolutely necessary.  But all tablespaces are created
-	 * with absolute directories, so specifying a non-absolute path here would
-	 * just never match, possibly confusing users.  It's also good to be
-	 * consistent with the new_dir check.
+	 * All tablespaces are created with absolute directories, so specifying a
+	 * non-absolute path here would just never match, possibly confusing users.
+	 * Since we don't know whether the remote side is Windows or not, and it
+	 * might be different than the local side, permit any path that could be
+	 * absolute under either set of rules.
+	 *
+	 * (There is little practical risk of confusion here, because someone
+	 * running entirely on Linux isn't likely to have a relative path that
+	 * begins with a backslash or something that looks like a drive
+	 * specification. If they do, and they also incorrectly believe that
+	 * a relative path is acceptable here, we'll silently fail to warn them
+	 * of their mistake, and the -T option will just not get applied, same
+	 * as if they'd specified -T for a nonexistent tablespace.)
 	 */
-	if (!is_absolute_path(cell->old_dir))
+	if (!is_nonwindows_absolute_path(cell->old_dir) &&
+		!is_windows_absolute_path(cell->old_dir))
 		pg_fatal("old directory is not an absolute path in tablespace mapping: %s",
 				 cell->old_dir);
 
diff --git a/src/include/port.h b/src/include/port.h
index 69d8818d61..79a41b4c7f 100644
--- a/src/include/port.h
+++ b/src/include/port.h
@@ -78,28 +78,32 @@ extern void get_parent_directory(char *path);
 extern char **pgfnames(const char *path);
 extern void pgfnames_cleanup(char **filenames);
 
-/*
- *	is_absolute_path
- *
- *	By making this a macro we avoid needing to include path.c in libpq.
- */
-#ifndef WIN32
-#define IS_DIR_SEP(ch)	((ch) == '/')
-
-#define is_absolute_path(filename) \
+#define IS_NONWINDOWS_DIR_SEP(ch)	((ch) == '/')
+#define is_nonwindows_absolute_path(filename) \
 ( \
-	IS_DIR_SEP((filename)[0]) \
+	IS_NONWINDOWS_DIR_SEP((filename)[0]) \
 )
-#else
-#define IS_DIR_SEP(ch)	((ch) == '/' || (ch) == '\\')
 
+#define IS_WINDOWS_DIR_SEP(ch)	((ch) == '/' || (ch) == '\\')
 /* See path_is_relative_and_below_cwd() for how we handle 'E:abc'. */
-#define is_absolute_path(filename) \
+#define is_windows_absolute_path(filename) \
 ( \
-	IS_DIR_SEP((filename)[0]) || \
+	IS_WINDOWS_DIR_SEP((filename)[0]) || \
 	(isalpha((unsigned char) ((filename)[0])) && (filename)[1] == ':' && \
-	 IS_DIR_SEP((filename)[2])) \
+	 IS_WINDOWS_DIR_SEP((filename)[2])) \
 )
+
+/*
+ *	is_absolute_path and IS_DIR_SEP
+ *
+ *	By using macros here we avoid needing to include path.c in libpq.
+ */
+#ifndef WIN32
+#define IS_DIR_SEP IS_NONWINDOWS_DIR_SEP
+#define is_absolute_path is_nonwindows_absolute_path
+#else
+#define IS_DIR_SEP IS_WINDOWS_DIR_SEP
+#define is_absolute_path is_windows_absolute_path
 #endif
 
 /*
-- 
2.24.3 (Apple Git-128)

#4Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#3)
Re: cross-platform pg_basebackup

Robert Haas <robertmhaas@gmail.com> writes:

Cool. Here's a patch.

LGTM, except I'd be inclined to ensure that all the macros
are function-style, ie

+#define IS_DIR_SEP(ch) IS_NONWINDOWS_DIR_SEP(ch)

not just

+#define IS_DIR_SEP IS_NONWINDOWS_DIR_SEP

I don't recall the exact rules, but I know that the second style
can lead to expanding the macro in more cases, which we likely
don't want. It also seems like better documentation to show
the expected arguments.

regards, tom lane

#5Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#4)
1 attachment(s)
Re: cross-platform pg_basebackup

On Thu, Oct 20, 2022 at 1:28 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Robert Haas <robertmhaas@gmail.com> writes:

Cool. Here's a patch.

LGTM, except I'd be inclined to ensure that all the macros
are function-style, ie

+#define IS_DIR_SEP(ch) IS_NONWINDOWS_DIR_SEP(ch)

not just

+#define IS_DIR_SEP IS_NONWINDOWS_DIR_SEP

I don't recall the exact rules, but I know that the second style
can lead to expanding the macro in more cases, which we likely
don't want. It also seems like better documentation to show
the expected arguments.

OK, thanks. v2 attached.

--
Robert Haas
EDB: http://www.enterprisedb.com

Attachments:

v2-0001-pg_basebackup-Fix-cross-platform-tablespace-reloc.patchapplication/octet-stream; name=v2-0001-pg_basebackup-Fix-cross-platform-tablespace-reloc.patchDownload
From 70b19137161416f62324f8da43dbbd5c99d38373 Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Thu, 20 Oct 2022 12:53:55 -0400
Subject: [PATCH v2] pg_basebackup: Fix cross-platform tablespace relocation.

Specifically, when pg_basebackup is invoked with -Tx=y, don't error
out if x could plausibly be an absolute path either on Windows or on
non-Windows systems. We don't know whether the remote system is
running the same OS as the local system, so it's not appropriate to
assume that our local rule about absolute pathnames is the same as
the rule on the remote system.

Patch by me, reviewed by Tom Lane.

Discussion: http://postgr.es/m/CA+TgmoY+jC3YiskomvYKDPK3FbrmsDU7_8+wMHt02HOdJeRb0g@mail.gmail.com
---
 src/bin/pg_basebackup/pg_basebackup.c | 20 ++++++++++++----
 src/include/port.h                    | 34 +++++++++++++++------------
 2 files changed, 34 insertions(+), 20 deletions(-)

diff --git a/src/bin/pg_basebackup/pg_basebackup.c b/src/bin/pg_basebackup/pg_basebackup.c
index 7d17f8b33e..22836ca01a 100644
--- a/src/bin/pg_basebackup/pg_basebackup.c
+++ b/src/bin/pg_basebackup/pg_basebackup.c
@@ -340,12 +340,22 @@ tablespace_list_append(const char *arg)
 		pg_fatal("invalid tablespace mapping format \"%s\", must be \"OLDDIR=NEWDIR\"", arg);
 
 	/*
-	 * This check isn't absolutely necessary.  But all tablespaces are created
-	 * with absolute directories, so specifying a non-absolute path here would
-	 * just never match, possibly confusing users.  It's also good to be
-	 * consistent with the new_dir check.
+	 * All tablespaces are created with absolute directories, so specifying a
+	 * non-absolute path here would just never match, possibly confusing users.
+	 * Since we don't know whether the remote side is Windows or not, and it
+	 * might be different than the local side, permit any path that could be
+	 * absolute under either set of rules.
+	 *
+	 * (There is little practical risk of confusion here, because someone
+	 * running entirely on Linux isn't likely to have a relative path that
+	 * begins with a backslash or something that looks like a drive
+	 * specification. If they do, and they also incorrectly believe that
+	 * a relative path is acceptable here, we'll silently fail to warn them
+	 * of their mistake, and the -T option will just not get applied, same
+	 * as if they'd specified -T for a nonexistent tablespace.)
 	 */
-	if (!is_absolute_path(cell->old_dir))
+	if (!is_nonwindows_absolute_path(cell->old_dir) &&
+		!is_windows_absolute_path(cell->old_dir))
 		pg_fatal("old directory is not an absolute path in tablespace mapping: %s",
 				 cell->old_dir);
 
diff --git a/src/include/port.h b/src/include/port.h
index 69d8818d61..02189d59b1 100644
--- a/src/include/port.h
+++ b/src/include/port.h
@@ -78,28 +78,32 @@ extern void get_parent_directory(char *path);
 extern char **pgfnames(const char *path);
 extern void pgfnames_cleanup(char **filenames);
 
-/*
- *	is_absolute_path
- *
- *	By making this a macro we avoid needing to include path.c in libpq.
- */
-#ifndef WIN32
-#define IS_DIR_SEP(ch)	((ch) == '/')
-
-#define is_absolute_path(filename) \
+#define IS_NONWINDOWS_DIR_SEP(ch)	((ch) == '/')
+#define is_nonwindows_absolute_path(filename) \
 ( \
-	IS_DIR_SEP((filename)[0]) \
+	IS_NONWINDOWS_DIR_SEP((filename)[0]) \
 )
-#else
-#define IS_DIR_SEP(ch)	((ch) == '/' || (ch) == '\\')
 
+#define IS_WINDOWS_DIR_SEP(ch)	((ch) == '/' || (ch) == '\\')
 /* See path_is_relative_and_below_cwd() for how we handle 'E:abc'. */
-#define is_absolute_path(filename) \
+#define is_windows_absolute_path(filename) \
 ( \
-	IS_DIR_SEP((filename)[0]) || \
+	IS_WINDOWS_DIR_SEP((filename)[0]) || \
 	(isalpha((unsigned char) ((filename)[0])) && (filename)[1] == ':' && \
-	 IS_DIR_SEP((filename)[2])) \
+	 IS_WINDOWS_DIR_SEP((filename)[2])) \
 )
+
+/*
+ *	is_absolute_path and IS_DIR_SEP
+ *
+ *	By using macros here we avoid needing to include path.c in libpq.
+ */
+#ifndef WIN32
+#define IS_DIR_SEP(ch) IS_NONWINDOWS_DIR_SEP(ch)
+#define is_absolute_path(filename) is_nonwindows_absolute_path(filename)
+#else
+#define IS_DIR_SEP(ch) IS_WINDOWS_DIR_SEP(ch)
+#define is_absolute_path(filename) is_windows_absolute_path(filename)
 #endif
 
 /*
-- 
2.24.3 (Apple Git-128)

#6Andrew Dunstan
andrew@dunslane.net
In reply to: Robert Haas (#5)
Re: cross-platform pg_basebackup

On 2022-10-20 Th 14:47, Robert Haas wrote:

On Thu, Oct 20, 2022 at 1:28 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Robert Haas <robertmhaas@gmail.com> writes:

Cool. Here's a patch.

LGTM, except I'd be inclined to ensure that all the macros
are function-style, ie

+#define IS_DIR_SEP(ch) IS_NONWINDOWS_DIR_SEP(ch)

not just

+#define IS_DIR_SEP IS_NONWINDOWS_DIR_SEP

I don't recall the exact rules, but I know that the second style
can lead to expanding the macro in more cases, which we likely
don't want. It also seems like better documentation to show
the expected arguments.

OK, thanks. v2 attached.

Looks good.

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com

#7davinder singh
davindersingh2692@gmail.com
In reply to: Andrew Dunstan (#6)
Re: cross-platform pg_basebackup

Hi,
Patch v2 looks good to me, I have tested it, and pg_basebackup works fine
across the platforms (Windows to Linux and Linux to Windows).
Syntax used for testing
$ pg_basebackup -h remote_server_ip -p 5432 -U user_name -D backup/data -T
olddir=newdir

I have also tested with non-absolute paths, it behaves as expected.

On Fri, Oct 21, 2022 at 12:42 AM Andrew Dunstan <andrew@dunslane.net> wrote:

On 2022-10-20 Th 14:47, Robert Haas wrote:

On Thu, Oct 20, 2022 at 1:28 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Robert Haas <robertmhaas@gmail.com> writes:

Cool. Here's a patch.

LGTM, except I'd be inclined to ensure that all the macros
are function-style, ie

+#define IS_DIR_SEP(ch) IS_NONWINDOWS_DIR_SEP(ch)

not just

+#define IS_DIR_SEP IS_NONWINDOWS_DIR_SEP

I don't recall the exact rules, but I know that the second style
can lead to expanding the macro in more cases, which we likely
don't want. It also seems like better documentation to show
the expected arguments.

OK, thanks. v2 attached.

Looks good.

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com

--
Regards,
Davinder
EnterpriseDB: http://www.enterprisedb.com

#8Robert Haas
robertmhaas@gmail.com
In reply to: davinder singh (#7)
Re: cross-platform pg_basebackup

On Fri, Oct 21, 2022 at 4:14 AM davinder singh
<davindersingh2692@gmail.com> wrote:

Hi,
Patch v2 looks good to me, I have tested it, and pg_basebackup works fine across the platforms (Windows to Linux and Linux to Windows).
Syntax used for testing
$ pg_basebackup -h remote_server_ip -p 5432 -U user_name -D backup/data -T olddir=newdir

I have also tested with non-absolute paths, it behaves as expected.

Cool. Thanks to you, Andrew, and Tom for reviewing.

Committed and back-patched to all supported branches.

--
Robert Haas
EDB: http://www.enterprisedb.com

#9Julien Rouhaud
rjuju123@gmail.com
In reply to: Robert Haas (#8)
Re: cross-platform pg_basebackup

Hi,

On Fri, Oct 21, 2022 at 09:15:39AM -0400, Robert Haas wrote:

Committed and back-patched to all supported branches.

Is there any additional things to be taken care of or should
https://commitfest.postgresql.org/40/3954/ be closed?

#10Robert Haas
robertmhaas@gmail.com
In reply to: Julien Rouhaud (#9)
Re: cross-platform pg_basebackup

On Sun, Oct 23, 2022 at 10:44 PM Julien Rouhaud <rjuju123@gmail.com> wrote:

On Fri, Oct 21, 2022 at 09:15:39AM -0400, Robert Haas wrote:

Committed and back-patched to all supported branches.

Is there any additional things to be taken care of or should
https://commitfest.postgresql.org/40/3954/ be closed?

As far as I know we're done. I have closed that entry.

--
Robert Haas
EDB: http://www.enterprisedb.com