Yet another failure mode in pg_upgrade

Started by Tom Laneover 13 years ago28 messages
#1Tom Lane
tgl@sss.pgh.pa.us

I've been experimenting with moving the Unix socket directory to
/var/run/postgresql for the Fedora distribution (don't ask :-().
It's mostly working, but I found out yet another way that pg_upgrade
can crash and burn: it doesn't consider the possibility that the
old or new postmaster is compiled with a different default
unix_socket_directory than what is compiled into the libpq it's using
or that pg_dump is using.

This is another hazard that we could forget about if we had some way for
pg_upgrade to run standalone backends instead of starting a postmaster.
But in the meantime, I suggest it'd be a good idea for pg_upgrade to
explicitly set unix_socket_directory (or unix_socket_directories in
HEAD) when starting the postmasters, and also explicitly set PGHOST
to ensure that the client-side code plays along.

regards, tom lane

#2Magnus Hagander
magnus@hagander.net
In reply to: Tom Lane (#1)
Re: Yet another failure mode in pg_upgrade

On Mon, Aug 13, 2012 at 4:34 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

I've been experimenting with moving the Unix socket directory to
/var/run/postgresql for the Fedora distribution (don't ask :-().
It's mostly working, but I found out yet another way that pg_upgrade
can crash and burn: it doesn't consider the possibility that the
old or new postmaster is compiled with a different default
unix_socket_directory than what is compiled into the libpq it's using
or that pg_dump is using.

This is another hazard that we could forget about if we had some way for
pg_upgrade to run standalone backends instead of starting a postmaster.

Yeah, that would be nice.

But in the meantime, I suggest it'd be a good idea for pg_upgrade to
explicitly set unix_socket_directory (or unix_socket_directories in
HEAD) when starting the postmasters, and also explicitly set PGHOST
to ensure that the client-side code plays along.

That sounds like a good idea for other reasons as well - manual
connections attempting to get in during an upgrade will just fail with
a "no connection" error, which makes sense...

So, +1.

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/

#3Bruce Momjian
bruce@momjian.us
In reply to: Magnus Hagander (#2)
1 attachment(s)
Re: Yet another failure mode in pg_upgrade

On Mon, Aug 13, 2012 at 12:46:43PM +0200, Magnus Hagander wrote:

On Mon, Aug 13, 2012 at 4:34 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

I've been experimenting with moving the Unix socket directory to
/var/run/postgresql for the Fedora distribution (don't ask :-().
It's mostly working, but I found out yet another way that pg_upgrade
can crash and burn: it doesn't consider the possibility that the
old or new postmaster is compiled with a different default
unix_socket_directory than what is compiled into the libpq it's using
or that pg_dump is using.

This is another hazard that we could forget about if we had some way for
pg_upgrade to run standalone backends instead of starting a postmaster.

Yeah, that would be nice.

But in the meantime, I suggest it'd be a good idea for pg_upgrade to
explicitly set unix_socket_directory (or unix_socket_directories in
HEAD) when starting the postmasters, and also explicitly set PGHOST
to ensure that the client-side code plays along.

That sounds like a good idea for other reasons as well - manual
connections attempting to get in during an upgrade will just fail with
a "no connection" error, which makes sense...

So, +1.

OK, I looked this over, and I have a patch, attached.

Because we are already playing with socket directories, this patch creates
the socket files in the current directory for upgrades and non-live
checks, but not live checks. This eliminates the "someone accidentally
connects" problem, at least on Unix, plus we are using port 50432
already. This also turns off TCP connections for unix domain socket
systems.

For "live check" operation, you are checking a running server, so
assuming the socket is in the current directory is not going to work.
What the code does is to read the 5th line from the running server's
postmaster.pid file, which has the socket directory in PG >= 9.1. For
pre-9.1, pg_upgrade uses the compiled-in defaults for socket directory.
If the defaults are different between the two servers, the new binaries,
e.g. pg_dump, will not work. The fix is for the user to set pg_upgrade
-O to match the old socket directory, and set PGHOST before running
pg_upgrade. I could not find a good way to generate a proper error
message because we are blind to the socket directory in pre-9.1.
Frankly, this is a problem if the old pre-9.1 server is running in a
user-configured socket directory too, so a documentation addition seems
right here.

So, in summary, this patch moves the socket directory to the current
directory all but live check operation, and handles different socket
directories for old cluster >= 9.1. I have added a documentation
mention of how to make this work for for pre-9.1 old servers.

Thus completes another "surgery on a moving train" that is pg_upgrade
development.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +

Attachments:

socket.difftext/x-diff; charset=us-asciiDownload
diff --git a/contrib/pg_upgrade/option.c b/contrib/pg_upgrade/option.c
new file mode 100644
index 94bce50..8b35d8f
*** a/contrib/pg_upgrade/option.c
--- b/contrib/pg_upgrade/option.c
*************** adjust_data_dir(ClusterInfo *cluster)
*** 376,378 ****
--- 376,431 ----
  
  	check_ok();
  }
+ 
+ 
+ #if HAVE_UNIX_SOCKETS
+ /*
+  * set_sockdir_and_pghost
+  *
+  * Set the socket directory to either the configured location (live check)
+  * or the current directory.
+  */
+ void
+ set_sockdir_and_pghost(bool live_check)
+ {
+ 	if (!live_check)
+ 	{
+ 		/* Use the current directory for the socket */
+ 		if (!getcwd(os_info.sockdir, MAXPGPATH))
+ 			pg_log(PG_FATAL, "cannot find current directory\n");
+ 	}
+ 	else
+ 	{
+ 		/*
+ 		 *	If we are doing a live check, we will use the old cluster's Unix
+ 		 *	domain socket directory so we can connect to the live server.
+ 		 */
+ 
+ 		/* sockdir added to the 5th line of postmaster.pid in PG 9.1 */
+ 		if (GET_MAJOR_VERSION(old_cluster.major_version) >= 901)
+ 		{
+ 			char		filename[MAXPGPATH];
+ 			FILE		*fp;
+ 			int			i;
+ 
+ 			snprintf(filename, sizeof(filename), "%s/postmaster.pid", old_cluster.pgdata);
+ 			if ((fp = fopen(filename, "r")) == NULL)
+ 				pg_log(PG_FATAL, "Could not get socket directory of the old server\n");
+ 
+ 			for (i = 0; i < 5; i++)
+ 				if (fgets(os_info.sockdir, sizeof(os_info.sockdir), fp) == NULL)
+ 					pg_log(PG_FATAL, "Could not get socket directory of the old server\n");
+ 			fclose(fp);
+ 		}
+ 		else
+ 			return;		/* Can't get live sockdir, so use the default. */
+ 
+ 		/* Remove trailing newline */
+ 		if (strchr(os_info.sockdir, '\n') != NULL)
+ 			*strchr(os_info.sockdir, '\n') = '\0';
+ 	}
+ 
+ 	/* For client communication */
+ 	pg_putenv("PGHOST", os_info.sockdir);
+ }
+ #endif
diff --git a/contrib/pg_upgrade/pg_upgrade.c b/contrib/pg_upgrade/pg_upgrade.c
new file mode 100644
index c47c8bb..8cad5c3
*** a/contrib/pg_upgrade/pg_upgrade.c
--- b/contrib/pg_upgrade/pg_upgrade.c
*************** main(int argc, char **argv)
*** 88,93 ****
--- 88,97 ----
  	check_cluster_versions();
  	check_cluster_compatibility(live_check);
  
+ #if HAVE_UNIX_SOCKETS
+ 	set_sockdir_and_pghost(live_check);
+ #endif
+ 
  	check_old_cluster(live_check, &sequence_script_file_name);
  
  
diff --git a/contrib/pg_upgrade/pg_upgrade.h b/contrib/pg_upgrade/pg_upgrade.h
new file mode 100644
index fa4c6c0..a712e21
*** a/contrib/pg_upgrade/pg_upgrade.h
--- b/contrib/pg_upgrade/pg_upgrade.h
*************** typedef struct
*** 267,272 ****
--- 267,275 ----
  	const char *progname;		/* complete pathname for this program */
  	char	   *exec_path;		/* full path to my executable */
  	char	   *user;			/* username for clusters */
+ #if HAVE_UNIX_SOCKETS
+ 	char		sockdir[NAMEDATALEN];	/* directory for Unix Domain socket */
+ #endif
  	char	  **tablespaces;	/* tablespaces */
  	int			num_tablespaces;
  	char	  **libraries;		/* loadable libraries */
*************** void print_maps(FileNameMap *maps, int n
*** 387,392 ****
--- 390,398 ----
  
  void		parseCommandLine(int argc, char *argv[]);
  void		adjust_data_dir(ClusterInfo *cluster);
+ #if HAVE_UNIX_SOCKETS
+ void		set_sockdir_and_pghost(bool live_check);
+ #endif
  
  /* relfilenode.c */
  
diff --git a/contrib/pg_upgrade/server.c b/contrib/pg_upgrade/server.c
new file mode 100644
index 1fb0d6c..c042fb4
*** a/contrib/pg_upgrade/server.c
--- b/contrib/pg_upgrade/server.c
*************** start_postmaster(ClusterInfo *cluster)
*** 144,149 ****
--- 144,150 ----
  	PGconn	   *conn;
  	bool		exit_hook_registered = false;
  	bool		pg_ctl_return = false;
+ 	char		socket_string[MAXPGPATH + 100];
  
  	if (!exit_hook_registered)
  	{
*************** start_postmaster(ClusterInfo *cluster)
*** 151,156 ****
--- 152,178 ----
  		exit_hook_registered = true;
  	}
  
+ 	socket_string[0] = '\0';
+ 
+ #ifdef HAVE_UNIX_SOCKETS
+ 	/*
+ 	 *	Report the Unix domain socket directory location to the postmaster.
+ 	 */
+ 
+ #define LISTEN_STR	" -c listen_addresses=''"
+ 
+ 	/* Have a sockdir to use? */
+ 	if (strlen(os_info.sockdir) != 0)
+ 		snprintf(socket_string, sizeof(socket_string) - strlen(LISTEN_STR),
+ 			" -c %s='%s'",
+ 			(GET_MAJOR_VERSION(cluster->major_version) < 903) ?
+ 				"unix_socket_directory" : "unix_socket_directories",
+ 			os_info.sockdir);
+ 	
+ 	/* prevent TCP/IP connections */
+ 	strcat(socket_string, LISTEN_STR);
+ #endif
+ 
  	/*
  	 * Using autovacuum=off disables cleanup vacuum and analyze, but freeze
  	 * vacuums can still happen, so we set autovacuum_freeze_max_age to its
*************** start_postmaster(ClusterInfo *cluster)
*** 159,170 ****
  	 * not touch them.
  	 */
  	snprintf(cmd, sizeof(cmd),
! 			 "\"%s/pg_ctl\" -w -l \"%s\" -D \"%s\" -o \"-p %d %s %s\" start",
  		  cluster->bindir, SERVER_LOG_FILE, cluster->pgconfig, cluster->port,
  			 (cluster->controldata.cat_ver >=
  			  BINARY_UPGRADE_SERVER_FLAG_CAT_VER) ? "-b" :
  			 "-c autovacuum=off -c autovacuum_freeze_max_age=2000000000",
! 			 cluster->pgopts ? cluster->pgopts : "");
  
  	/*
  	 * Don't throw an error right away, let connecting throw the error because
--- 181,192 ----
  	 * not touch them.
  	 */
  	snprintf(cmd, sizeof(cmd),
! 			 "\"%s/pg_ctl\" -w -l \"%s\" -D \"%s\" -o \"-p %d %s %s%s\" start",
  		  cluster->bindir, SERVER_LOG_FILE, cluster->pgconfig, cluster->port,
  			 (cluster->controldata.cat_ver >=
  			  BINARY_UPGRADE_SERVER_FLAG_CAT_VER) ? "-b" :
  			 "-c autovacuum=off -c autovacuum_freeze_max_age=2000000000",
! 			 cluster->pgopts ? cluster->pgopts : "", socket_string);
  
  	/*
  	 * Don't throw an error right away, let connecting throw the error because
diff --git a/doc/src/sgml/pgupgrade.sgml b/doc/src/sgml/pgupgrade.sgml
new file mode 100644
index b62aba2..b621911
*** a/doc/src/sgml/pgupgrade.sgml
--- b/doc/src/sgml/pgupgrade.sgml
*************** psql --username postgres --file script.s
*** 520,525 ****
--- 520,532 ----
    </para>
  
    <para>
+    If running check on an old pre-9.1 Unix-like running server, and the
+    old and new servers use different Unix-domain socket directories,
+    use the <option>-O</> option so the new server uses the same socket
+    directory as the old server, and set <envar>PGHOST</> similarly.
+   </para>
+ 
+   <para>
     A Log-Shipping Standby Server (<xref linkend="warm-standby">) cannot
     be upgraded because the server must allow writes.  The simplest way
     is to upgrade the primary and use rsync to rebuild the standbys.
#4Bruce Momjian
bruce@momjian.us
In reply to: Bruce Momjian (#3)
Re: Yet another failure mode in pg_upgrade

On Sat, Sep 1, 2012 at 11:45:58AM -0400, Bruce Momjian wrote:

So, in summary, this patch moves the socket directory to the current
directory all but live check operation, and handles different socket
directories for old cluster >= 9.1. I have added a documentation
mention of how to make this work for for pre-9.1 old servers.

Thus completes another "surgery on a moving train" that is pg_upgrade
development.

Oh, one more thing. We have talked about creating some special pipe for
pg_upgrade to communicate the a backend directly, but live check mode
hightlights that we will _still_ need traditional connection abilities
even if we add the pipe ability.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +

#5Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#4)
Re: Yet another failure mode in pg_upgrade

Bruce Momjian <bruce@momjian.us> writes:

On Sat, Sep 1, 2012 at 11:45:58AM -0400, Bruce Momjian wrote:

So, in summary, this patch moves the socket directory to the current
directory all but live check operation, and handles different socket
directories for old cluster >= 9.1. I have added a documentation
mention of how to make this work for for pre-9.1 old servers.

Thus completes another "surgery on a moving train" that is pg_upgrade
development.

Oh, one more thing. We have talked about creating some special pipe for
pg_upgrade to communicate the a backend directly, but live check mode
hightlights that we will _still_ need traditional connection abilities
even if we add the pipe ability.

So? By definition, the live check mode is not guaranteed to produce
correct answers, since other connections could be changing the
database's contents. The problem we are interested in solving here is
preventing other connections from occurring when we're doing the upgrade
"for real". All this stuff with moving sockets around is nothing but
security by obscurity; it cannot positively guarantee that there's
nobody else connecting to the database while pg_upgrade runs. (Most
notably, on Windows there's no guarantee at all.)

regards, tom lane

#6Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#3)
Re: Yet another failure mode in pg_upgrade

Bruce Momjian <bruce@momjian.us> writes:

+ 	/*
+ 	 *	Report the Unix domain socket directory location to the postmaster.
+ 	 */

"Report" seems entirely the wrong verb there.

+ #define LISTEN_STR	" -c listen_addresses=''"
+ 
+ 	/* Have a sockdir to use? */
+ 	if (strlen(os_info.sockdir) != 0)
+ 		snprintf(socket_string, sizeof(socket_string) - strlen(LISTEN_STR),
+ 			" -c %s='%s'",
+ 			(GET_MAJOR_VERSION(cluster->major_version) < 903) ?
+ 				"unix_socket_directory" : "unix_socket_directories",
+ 			os_info.sockdir);
+ 	
+ 	/* prevent TCP/IP connections */
+ 	strcat(socket_string, LISTEN_STR);

IMO this would be simpler and more readable if you got rid of the LISTEN_STR
#define and just included -c listen_addresses='' in the snprintf format
string. The comment for the whole thing should be something like
"If we have a socket directory to use, command the postmaster to use it,
and disable TCP/IP connections altogether".

regards, tom lane

#7Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#6)
Re: Yet another failure mode in pg_upgrade

On Sat, Sep 1, 2012 at 02:23:22PM -0400, Tom Lane wrote:

Bruce Momjian <bruce@momjian.us> writes:

+ 	/*
+ 	 *	Report the Unix domain socket directory location to the postmaster.
+ 	 */

"Report" seems entirely the wrong verb there.

+ #define LISTEN_STR	" -c listen_addresses=''"
+ 
+ 	/* Have a sockdir to use? */
+ 	if (strlen(os_info.sockdir) != 0)
+ 		snprintf(socket_string, sizeof(socket_string) - strlen(LISTEN_STR),
+ 			" -c %s='%s'",
+ 			(GET_MAJOR_VERSION(cluster->major_version) < 903) ?
+ 				"unix_socket_directory" : "unix_socket_directories",
+ 			os_info.sockdir);
+ 	
+ 	/* prevent TCP/IP connections */
+ 	strcat(socket_string, LISTEN_STR);

IMO this would be simpler and more readable if you got rid of the LISTEN_STR
#define and just included -c listen_addresses='' in the snprintf format
string. The comment for the whole thing should be something like
"If we have a socket directory to use, command the postmaster to use it,
and disable TCP/IP connections altogether".

Well, you only want the unix_socket* if sockdir is defined, but you want
LISTEN_STR unconditionally, even if there is no sockdir. Not sure how
that could cleanly be in a single snprintf.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +

#8Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#7)
Re: Yet another failure mode in pg_upgrade

Bruce Momjian <bruce@momjian.us> writes:

Well, you only want the unix_socket* if sockdir is defined, but you want
LISTEN_STR unconditionally, even if there is no sockdir.

Really? What will happen when the installation's default is to not
listen on any Unix socket? (unix_socket_directories = '' in
postgresql.conf.)

I'm inclined to think that the "no sockdir" case is broken and you
should get rid of it. If you're starting a postmaster, you can and
should tell it a sockdir, period. If you're running a live check this
code is all moot anyway.

regards, tom lane

#9Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#5)
Re: Yet another failure mode in pg_upgrade

On Sat, Sep 1, 2012 at 02:18:59PM -0400, Tom Lane wrote:

Bruce Momjian <bruce@momjian.us> writes:

On Sat, Sep 1, 2012 at 11:45:58AM -0400, Bruce Momjian wrote:

So, in summary, this patch moves the socket directory to the current
directory all but live check operation, and handles different socket
directories for old cluster >= 9.1. I have added a documentation
mention of how to make this work for for pre-9.1 old servers.

Thus completes another "surgery on a moving train" that is pg_upgrade
development.

Oh, one more thing. We have talked about creating some special pipe for
pg_upgrade to communicate the a backend directly, but live check mode
hightlights that we will _still_ need traditional connection abilities
even if we add the pipe ability.

So? By definition, the live check mode is not guaranteed to produce
correct answers, since other connections could be changing the
database's contents. The problem we are interested in solving here is

True.

preventing other connections from occurring when we're doing the upgrade
"for real". All this stuff with moving sockets around is nothing but
security by obscurity; it cannot positively guarantee that there's
nobody else connecting to the database while pg_upgrade runs. (Most
notably, on Windows there's no guarantee at all.)

My point is that we are still going to need traditional connections for
live checks. If we could find a solution for Windows, the socket in
current directory might be enough to lock things down, especially if we
put the socket in a new subdirectory that only we can read/write to.
Should I persue that in my patch?

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +

#10Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#8)
Re: Yet another failure mode in pg_upgrade

On Sat, Sep 1, 2012 at 02:43:35PM -0400, Tom Lane wrote:

Bruce Momjian <bruce@momjian.us> writes:

Well, you only want the unix_socket* if sockdir is defined, but you want
LISTEN_STR unconditionally, even if there is no sockdir.

Really? What will happen when the installation's default is to not
listen on any Unix socket? (unix_socket_directories = '' in
postgresql.conf.)

Well, don't do that then. Locking out TCP seems like a big win.

I'm inclined to think that the "no sockdir" case is broken and you
should get rid of it. If you're starting a postmaster, you can and
should tell it a sockdir, period. If you're running a live check this
code is all moot anyway.

I don't think you understand. The "no sockdir" case is only for live
checks of pre-9.1 old servers, because we can't find the socket
directory being used. Everything else uses the local directory for the
socket. If we remove that case, we can't do live checks on pre-9.1
servers.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +

#11Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#9)
Re: Yet another failure mode in pg_upgrade

Bruce Momjian <bruce@momjian.us> writes:

My point is that we are still going to need traditional connections for
live checks.

Yes, but that's not terribly relevant, IMO. All it means is that we
don't want to invent some solution that doesn't go through libpq.

If we could find a solution for Windows, the socket in
current directory might be enough to lock things down, especially if we
put the socket in a new subdirectory that only we can read/write to.

Who is "we"? Somebody else logged in under the postgres userid could
still connect.

Should I persue that in my patch?

I think this is just a band-aid, and we shouldn't be putting more
effort into it than needed to ensure that unexpected configuration
settings won't break it. The right fix is a better form of
standalone-backend mode. Maybe I will go pursue that, since nobody
else seems to want to.

regards, tom lane

#12Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#10)
Re: Yet another failure mode in pg_upgrade

Bruce Momjian <bruce@momjian.us> writes:

On Sat, Sep 1, 2012 at 02:43:35PM -0400, Tom Lane wrote:

I'm inclined to think that the "no sockdir" case is broken and you
should get rid of it. If you're starting a postmaster, you can and
should tell it a sockdir, period. If you're running a live check this
code is all moot anyway.

I don't think you understand. The "no sockdir" case is only for live
checks of pre-9.1 old servers, because we can't find the socket
directory being used. Everything else uses the local directory for the
socket. If we remove that case, we can't do live checks on pre-9.1
servers.

If it's a live check, then (a) you aren't restarting the postmaster,
and (b) you wouldn't want to lock out TCP anyway. So adding
--listen-addresses to the string seems pointless and/or wrong.

regards, tom lane

#13Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#12)
Re: Yet another failure mode in pg_upgrade

On Sat, Sep 1, 2012 at 03:06:57PM -0400, Tom Lane wrote:

Bruce Momjian <bruce@momjian.us> writes:

On Sat, Sep 1, 2012 at 02:43:35PM -0400, Tom Lane wrote:

I'm inclined to think that the "no sockdir" case is broken and you
should get rid of it. If you're starting a postmaster, you can and
should tell it a sockdir, period. If you're running a live check this
code is all moot anyway.

I don't think you understand. The "no sockdir" case is only for live
checks of pre-9.1 old servers, because we can't find the socket
directory being used. Everything else uses the local directory for the
socket. If we remove that case, we can't do live checks on pre-9.1
servers.

If it's a live check, then (a) you aren't restarting the postmaster,
and (b) you wouldn't want to lock out TCP anyway. So adding
--listen-addresses to the string seems pointless and/or wrong.

What about the new server? That is still started and stopped. You are
right that this code is never going to be called for the check of a
running old server.

Let's walk through the options:

non-live check:
uses current directory, start/stop old/new servers

live check, old server >= 9.1:
only new server started/stopped, new server uses old server's
socket directory and PGHOST set so clients use the same directory

live check, old server < 9.1:
only new server started/stopped, old/new servers use their
default/configured socket directory

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +

#14Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#11)
Re: Yet another failure mode in pg_upgrade

On Sat, Sep 1, 2012 at 03:05:01PM -0400, Tom Lane wrote:

Bruce Momjian <bruce@momjian.us> writes:

My point is that we are still going to need traditional connections for
live checks.

Yes, but that's not terribly relevant, IMO. All it means is that we
don't want to invent some solution that doesn't go through libpq.

If we could find a solution for Windows, the socket in
current directory might be enough to lock things down, especially if we
put the socket in a new subdirectory that only we can read/write to.

Who is "we"? Somebody else logged in under the postgres userid could
still connect.

But they have to find the current directory to do that; seems unlikely.
They could kill -9 pg_upgrade too if they are the same user id.

Should I persue that in my patch?

I think this is just a band-aid, and we shouldn't be putting more
effort into it than needed to ensure that unexpected configuration
settings won't break it. The right fix is a better form of
standalone-backend mode. Maybe I will go pursue that, since nobody
else seems to want to.

I am worried that is going to be a complex solution to a very minor
problem. Also, how is that going to get backpatched?

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +

#15Robert Haas
robertmhaas@gmail.com
In reply to: Bruce Momjian (#3)
Re: Yet another failure mode in pg_upgrade

On Sat, Sep 1, 2012 at 11:45 AM, Bruce Momjian <bruce@momjian.us> wrote:

On Mon, Aug 13, 2012 at 12:46:43PM +0200, Magnus Hagander wrote:

On Mon, Aug 13, 2012 at 4:34 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

I've been experimenting with moving the Unix socket directory to
/var/run/postgresql for the Fedora distribution (don't ask :-().
It's mostly working, but I found out yet another way that pg_upgrade
can crash and burn: it doesn't consider the possibility that the
old or new postmaster is compiled with a different default
unix_socket_directory than what is compiled into the libpq it's using
or that pg_dump is using.

This is another hazard that we could forget about if we had some way for
pg_upgrade to run standalone backends instead of starting a postmaster.

Yeah, that would be nice.

But in the meantime, I suggest it'd be a good idea for pg_upgrade to
explicitly set unix_socket_directory (or unix_socket_directories in
HEAD) when starting the postmasters, and also explicitly set PGHOST
to ensure that the client-side code plays along.

That sounds like a good idea for other reasons as well - manual
connections attempting to get in during an upgrade will just fail with
a "no connection" error, which makes sense...

So, +1.

OK, I looked this over, and I have a patch, attached.

Because we are already playing with socket directories, this patch creates
the socket files in the current directory for upgrades and non-live
checks, but not live checks. This eliminates the "someone accidentally
connects" problem, at least on Unix, plus we are using port 50432
already. This also turns off TCP connections for unix domain socket
systems.

For "live check" operation, you are checking a running server, so
assuming the socket is in the current directory is not going to work.
What the code does is to read the 5th line from the running server's
postmaster.pid file, which has the socket directory in PG >= 9.1. For
pre-9.1, pg_upgrade uses the compiled-in defaults for socket directory.
If the defaults are different between the two servers, the new binaries,
e.g. pg_dump, will not work. The fix is for the user to set pg_upgrade
-O to match the old socket directory, and set PGHOST before running
pg_upgrade. I could not find a good way to generate a proper error
message because we are blind to the socket directory in pre-9.1.
Frankly, this is a problem if the old pre-9.1 server is running in a
user-configured socket directory too, so a documentation addition seems
right here.

So, in summary, this patch moves the socket directory to the current
directory all but live check operation, and handles different socket
directories for old cluster >= 9.1. I have added a documentation
mention of how to make this work for for pre-9.1 old servers.

I don't think this is reducing the number of failure modes; it's just
changing it from one set of obscure cases to a slightly different set
of obscure cases.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#16Bruce Momjian
bruce@momjian.us
In reply to: Robert Haas (#15)
Re: Yet another failure mode in pg_upgrade

On Sun, Sep 2, 2012 at 01:06:28AM -0400, Robert Haas wrote:

For "live check" operation, you are checking a running server, so
assuming the socket is in the current directory is not going to work.
What the code does is to read the 5th line from the running server's
postmaster.pid file, which has the socket directory in PG >= 9.1. For
pre-9.1, pg_upgrade uses the compiled-in defaults for socket directory.
If the defaults are different between the two servers, the new binaries,
e.g. pg_dump, will not work. The fix is for the user to set pg_upgrade
-O to match the old socket directory, and set PGHOST before running
pg_upgrade. I could not find a good way to generate a proper error
message because we are blind to the socket directory in pre-9.1.
Frankly, this is a problem if the old pre-9.1 server is running in a
user-configured socket directory too, so a documentation addition seems
right here.

So, in summary, this patch moves the socket directory to the current
directory all but live check operation, and handles different socket
directories for old cluster >= 9.1. I have added a documentation
mention of how to make this work for for pre-9.1 old servers.

I don't think this is reducing the number of failure modes; it's just
changing it from one set of obscure cases to a slightly different set
of obscure cases.

Tom reported problems with having old/new with different default socket
locations. This fixes that, and reduces the possibility of acciental
connections. What problems does this add?

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +

#17Bruce Momjian
bruce@momjian.us
In reply to: Bruce Momjian (#7)
1 attachment(s)
Re: Yet another failure mode in pg_upgrade

On Sat, Sep 1, 2012 at 02:35:06PM -0400, Bruce Momjian wrote:

On Sat, Sep 1, 2012 at 02:23:22PM -0400, Tom Lane wrote:

Bruce Momjian <bruce@momjian.us> writes:

+ 	/*
+ 	 *	Report the Unix domain socket directory location to the postmaster.
+ 	 */

"Report" seems entirely the wrong verb there.

Fixed.

+ #define LISTEN_STR	" -c listen_addresses=''"
+ 
+ 	/* Have a sockdir to use? */
+ 	if (strlen(os_info.sockdir) != 0)
+ 		snprintf(socket_string, sizeof(socket_string) - strlen(LISTEN_STR),
+ 			" -c %s='%s'",
+ 			(GET_MAJOR_VERSION(cluster->major_version) < 903) ?
+ 				"unix_socket_directory" : "unix_socket_directories",
+ 			os_info.sockdir);
+ 	
+ 	/* prevent TCP/IP connections */
+ 	strcat(socket_string, LISTEN_STR);

IMO this would be simpler and more readable if you got rid of the LISTEN_STR
#define and just included -c listen_addresses='' in the snprintf format
string. The comment for the whole thing should be something like
"If we have a socket directory to use, command the postmaster to use it,
and disable TCP/IP connections altogether".

Well, you only want the unix_socket* if sockdir is defined, but you want
LISTEN_STR unconditionally, even if there is no sockdir. Not sure how
that could cleanly be in a single snprintf.

I restructured the code to add the listen_addresses string first,
allowing the removal of the #define, as Tom suggested. I also added
unix_socket_permissions=0700 to further restrict socket access.

Updated patch attached.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +

Attachments:

socket.difftext/x-diff; charset=us-asciiDownload
diff --git a/contrib/pg_upgrade/option.c b/contrib/pg_upgrade/option.c
new file mode 100644
index 94bce50..8b35d8f
*** a/contrib/pg_upgrade/option.c
--- b/contrib/pg_upgrade/option.c
*************** adjust_data_dir(ClusterInfo *cluster)
*** 376,378 ****
--- 376,431 ----
  
  	check_ok();
  }
+ 
+ 
+ #if HAVE_UNIX_SOCKETS
+ /*
+  * set_sockdir_and_pghost
+  *
+  * Set the socket directory to either the configured location (live check)
+  * or the current directory.
+  */
+ void
+ set_sockdir_and_pghost(bool live_check)
+ {
+ 	if (!live_check)
+ 	{
+ 		/* Use the current directory for the socket */
+ 		if (!getcwd(os_info.sockdir, MAXPGPATH))
+ 			pg_log(PG_FATAL, "cannot find current directory\n");
+ 	}
+ 	else
+ 	{
+ 		/*
+ 		 *	If we are doing a live check, we will use the old cluster's Unix
+ 		 *	domain socket directory so we can connect to the live server.
+ 		 */
+ 
+ 		/* sockdir added to the 5th line of postmaster.pid in PG 9.1 */
+ 		if (GET_MAJOR_VERSION(old_cluster.major_version) >= 901)
+ 		{
+ 			char		filename[MAXPGPATH];
+ 			FILE		*fp;
+ 			int			i;
+ 
+ 			snprintf(filename, sizeof(filename), "%s/postmaster.pid", old_cluster.pgdata);
+ 			if ((fp = fopen(filename, "r")) == NULL)
+ 				pg_log(PG_FATAL, "Could not get socket directory of the old server\n");
+ 
+ 			for (i = 0; i < 5; i++)
+ 				if (fgets(os_info.sockdir, sizeof(os_info.sockdir), fp) == NULL)
+ 					pg_log(PG_FATAL, "Could not get socket directory of the old server\n");
+ 			fclose(fp);
+ 		}
+ 		else
+ 			return;		/* Can't get live sockdir, so use the default. */
+ 
+ 		/* Remove trailing newline */
+ 		if (strchr(os_info.sockdir, '\n') != NULL)
+ 			*strchr(os_info.sockdir, '\n') = '\0';
+ 	}
+ 
+ 	/* For client communication */
+ 	pg_putenv("PGHOST", os_info.sockdir);
+ }
+ #endif
diff --git a/contrib/pg_upgrade/pg_upgrade.c b/contrib/pg_upgrade/pg_upgrade.c
new file mode 100644
index c47c8bb..8cad5c3
*** a/contrib/pg_upgrade/pg_upgrade.c
--- b/contrib/pg_upgrade/pg_upgrade.c
*************** main(int argc, char **argv)
*** 88,93 ****
--- 88,97 ----
  	check_cluster_versions();
  	check_cluster_compatibility(live_check);
  
+ #if HAVE_UNIX_SOCKETS
+ 	set_sockdir_and_pghost(live_check);
+ #endif
+ 
  	check_old_cluster(live_check, &sequence_script_file_name);
  
  
diff --git a/contrib/pg_upgrade/pg_upgrade.h b/contrib/pg_upgrade/pg_upgrade.h
new file mode 100644
index fa4c6c0..a712e21
*** a/contrib/pg_upgrade/pg_upgrade.h
--- b/contrib/pg_upgrade/pg_upgrade.h
*************** typedef struct
*** 267,272 ****
--- 267,275 ----
  	const char *progname;		/* complete pathname for this program */
  	char	   *exec_path;		/* full path to my executable */
  	char	   *user;			/* username for clusters */
+ #if HAVE_UNIX_SOCKETS
+ 	char		sockdir[NAMEDATALEN];	/* directory for Unix Domain socket */
+ #endif
  	char	  **tablespaces;	/* tablespaces */
  	int			num_tablespaces;
  	char	  **libraries;		/* loadable libraries */
*************** void print_maps(FileNameMap *maps, int n
*** 387,392 ****
--- 390,398 ----
  
  void		parseCommandLine(int argc, char *argv[]);
  void		adjust_data_dir(ClusterInfo *cluster);
+ #if HAVE_UNIX_SOCKETS
+ void		set_sockdir_and_pghost(bool live_check);
+ #endif
  
  /* relfilenode.c */
  
diff --git a/contrib/pg_upgrade/server.c b/contrib/pg_upgrade/server.c
new file mode 100644
index 1fb0d6c..3d84f83
*** a/contrib/pg_upgrade/server.c
--- b/contrib/pg_upgrade/server.c
*************** start_postmaster(ClusterInfo *cluster)
*** 144,149 ****
--- 144,150 ----
  	PGconn	   *conn;
  	bool		exit_hook_registered = false;
  	bool		pg_ctl_return = false;
+ 	char		socket_string[MAXPGPATH + 200];
  
  	if (!exit_hook_registered)
  	{
*************** start_postmaster(ClusterInfo *cluster)
*** 151,156 ****
--- 152,174 ----
  		exit_hook_registered = true;
  	}
  
+ 	socket_string[0] = '\0';
+ 
+ #ifdef HAVE_UNIX_SOCKETS
+ 	/* prevent TCP/IP connections, restrict socket access */
+ 	strcat(socket_string,
+ 		   " -c listen_addresses='' -c unix_socket_permissions=0700");
+ 
+ 	/* Have a sockdir?  Tell the postmaster. */
+ 	if (strlen(os_info.sockdir) != 0)
+ 		snprintf(socket_string + strlen(socket_string),
+ 				 sizeof(socket_string) - strlen(socket_string),
+ 				 " -c %s='%s'",
+ 				 (GET_MAJOR_VERSION(cluster->major_version) < 903) ?
+ 					"unix_socket_directory" : "unix_socket_directories",
+ 				 os_info.sockdir);
+ #endif
+ 
  	/*
  	 * Using autovacuum=off disables cleanup vacuum and analyze, but freeze
  	 * vacuums can still happen, so we set autovacuum_freeze_max_age to its
*************** start_postmaster(ClusterInfo *cluster)
*** 159,170 ****
  	 * not touch them.
  	 */
  	snprintf(cmd, sizeof(cmd),
! 			 "\"%s/pg_ctl\" -w -l \"%s\" -D \"%s\" -o \"-p %d %s %s\" start",
  		  cluster->bindir, SERVER_LOG_FILE, cluster->pgconfig, cluster->port,
  			 (cluster->controldata.cat_ver >=
  			  BINARY_UPGRADE_SERVER_FLAG_CAT_VER) ? "-b" :
  			 "-c autovacuum=off -c autovacuum_freeze_max_age=2000000000",
! 			 cluster->pgopts ? cluster->pgopts : "");
  
  	/*
  	 * Don't throw an error right away, let connecting throw the error because
--- 177,188 ----
  	 * not touch them.
  	 */
  	snprintf(cmd, sizeof(cmd),
! 			 "\"%s/pg_ctl\" -w -l \"%s\" -D \"%s\" -o \"-p %d %s %s%s\" start",
  		  cluster->bindir, SERVER_LOG_FILE, cluster->pgconfig, cluster->port,
  			 (cluster->controldata.cat_ver >=
  			  BINARY_UPGRADE_SERVER_FLAG_CAT_VER) ? "-b" :
  			 "-c autovacuum=off -c autovacuum_freeze_max_age=2000000000",
! 			 cluster->pgopts ? cluster->pgopts : "", socket_string);
  
  	/*
  	 * Don't throw an error right away, let connecting throw the error because
diff --git a/doc/src/sgml/pgupgrade.sgml b/doc/src/sgml/pgupgrade.sgml
new file mode 100644
index b62aba2..b621911
*** a/doc/src/sgml/pgupgrade.sgml
--- b/doc/src/sgml/pgupgrade.sgml
*************** psql --username postgres --file script.s
*** 520,525 ****
--- 520,532 ----
    </para>
  
    <para>
+    If running check on an old pre-9.1 Unix-like running server, and the
+    old and new servers use different Unix-domain socket directories,
+    use the <option>-O</> option so the new server uses the same socket
+    directory as the old server, and set <envar>PGHOST</> similarly.
+   </para>
+ 
+   <para>
     A Log-Shipping Standby Server (<xref linkend="warm-standby">) cannot
     be upgraded because the server must allow writes.  The simplest way
     is to upgrade the primary and use rsync to rebuild the standbys.
#18Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#16)
Re: Yet another failure mode in pg_upgrade

Bruce Momjian <bruce@momjian.us> writes:

On Sun, Sep 2, 2012 at 01:06:28AM -0400, Robert Haas wrote:

I don't think this is reducing the number of failure modes; it's just
changing it from one set of obscure cases to a slightly different set
of obscure cases.

Tom reported problems with having old/new with different default socket
locations. This fixes that, and reduces the possibility of acciental
connections. What problems does this add?

I'm going to be needing some fix in this area in any case, though
whether it's exactly Bruce's current patch is not clear to me. I found
out last night while making a test build of 9.2rc1 as a Fedora package
that pg_upgrade's regression test fails in the Fedora build environment,
if the postmaster has been patched so that its default socket directory
is /var/run/postgresql. That happens because /var/run/postgresql
doesn't exist in the build environment (it is only going to exist once
the postgresql-server package is installed), so the postmaster fails
to start because it can't create a socket where it expects to.
I have a patch to pg_regress that instructs the temporary postmaster
to use /tmp as unix_socket_directory regardless of its built-in default,
so that "make check" works for the regular core and contrib regression
tests. However, that doesn't affect pg_upgrade's regression test case.

It looks rather messy to persuade pg_upgrade to do things differently
for regression testing and normal use, not to mention that it would make
the test even less representative of normal use. So I'm thinking that
we do need the pg_upgrade feature Bruce is suggesting of forcing the
socket directory to be the current directory. What's more, if that's
not back-patched to 9.2, I'm going to have to carry it as a Fedora patch
anyway.

Alternatively, I can prevent "make check" from testing pg_upgrade
(which is what I did so I could carry on with package testing).
I'd just as soon not ship it like that, though.

regards, tom lane

#19Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#18)
Re: Yet another failure mode in pg_upgrade

On Sun, Sep 2, 2012 at 01:13:52PM -0400, Tom Lane wrote:

Bruce Momjian <bruce@momjian.us> writes:

On Sun, Sep 2, 2012 at 01:06:28AM -0400, Robert Haas wrote:

I don't think this is reducing the number of failure modes; it's just
changing it from one set of obscure cases to a slightly different set
of obscure cases.

Tom reported problems with having old/new with different default socket
locations. This fixes that, and reduces the possibility of acciental
connections. What problems does this add?

I'm going to be needing some fix in this area in any case, though
whether it's exactly Bruce's current patch is not clear to me. I found
out last night while making a test build of 9.2rc1 as a Fedora package
that pg_upgrade's regression test fails in the Fedora build environment,
if the postmaster has been patched so that its default socket directory
is /var/run/postgresql. That happens because /var/run/postgresql
doesn't exist in the build environment (it is only going to exist once
the postgresql-server package is installed), so the postmaster fails
to start because it can't create a socket where it expects to.
I have a patch to pg_regress that instructs the temporary postmaster
to use /tmp as unix_socket_directory regardless of its built-in default,
so that "make check" works for the regular core and contrib regression
tests. However, that doesn't affect pg_upgrade's regression test case.

It looks rather messy to persuade pg_upgrade to do things differently
for regression testing and normal use, not to mention that it would make
the test even less representative of normal use. So I'm thinking that
we do need the pg_upgrade feature Bruce is suggesting of forcing the
socket directory to be the current directory. What's more, if that's
not back-patched to 9.2, I'm going to have to carry it as a Fedora patch
anyway.

Alternatively, I can prevent "make check" from testing pg_upgrade
(which is what I did so I could carry on with package testing).
I'd just as soon not ship it like that, though.

Well, I don't know of any known problems with the patch. On the other
hand, I don't know our policy in pushing patches into RC releases at the
request of packagers.

If you want to stand behind the patch, it might be OK. I think that's
how we handle such requests --- someone has to put their neck out for
it. Fortunately the patch is not very large so is easy to review.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +

#20Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#17)
Re: Yet another failure mode in pg_upgrade

Bruce Momjian <bruce@momjian.us> writes:

Updated patch attached.

[ looks at that for a bit... ] Now I see why you were on about that:
the method you used here requires both clusters to have the same socket
directory. Which is silly and unnecessary. Revised patch attached.

regards, tom lane

#21Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#20)
1 attachment(s)
Re: Yet another failure mode in pg_upgrade

On Sun, Sep 2, 2012 at 11:47:06PM -0400, Tom Lane wrote:

Bruce Momjian <bruce@momjian.us> writes:

Updated patch attached.

[ looks at that for a bit... ] Now I see why you were on about that:
the method you used here requires both clusters to have the same socket
directory. Which is silly and unnecessary. Revised patch attached.

I was trying to avoid setting the host for every client database
application, so I set PGHOST, but your use of get_db_conn() has
simplified that, which I had not considered. Also, you will need to
update the comment above cluster_conn_opts() to match your new function
name.

I am working on an additional enhancement that also pulls the live
cluster's port number from the postmaster.pid file. I am attaching the
part of my patch that was modified to add that feature. This allows
live checks without requiring any port numbers to be specified. Let me
know if you would like me to email you that merged into your patch, if
you want it for 9.2.

Also, I don't see my doc addition on your patch; was that intentional?

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +

Attachments:

sock.difftext/x-diff; charset=us-asciiDownload
diff --git a/contrib/pg_upgrade/check.c b/contrib/pg_upgrade/check.c
new file mode 100644
index 0fec73e..e7574a2
*** a/contrib/pg_upgrade/check.c
--- b/contrib/pg_upgrade/check.c
*************** output_check_banner(bool *live_check)
*** 29,40 ****
  	if (user_opts.check && is_server_running(old_cluster.pgdata))
  	{
  		*live_check = true;
- 		if (old_cluster.port == DEF_PGUPORT)
- 			pg_log(PG_FATAL, "When checking a live old server, "
- 				   "you must specify the old server's port number.\n");
- 		if (old_cluster.port == new_cluster.port)
- 			pg_log(PG_FATAL, "When checking a live server, "
- 				   "the old and new port numbers must be different.\n");
  		pg_log(PG_REPORT, "Performing Consistency Checks on Old Live Server\n");
  		pg_log(PG_REPORT, "------------------------------------------------\n");
  	}
--- 29,34 ----
*************** check_cluster_compatibility(bool live_ch
*** 291,296 ****
--- 285,300 ----
  		new_cluster.controldata.cat_ver < TABLE_SPACE_SUBDIRS_CAT_VER)
  		pg_log(PG_FATAL, "This utility can only upgrade to PostgreSQL version 9.0 after 2010-01-11\n"
  			   "because of backend API changes made during development.\n");
+ 
+ 	/* We read the real port number for PG >= 9.1 */
+ 	if (GET_MAJOR_VERSION(old_cluster.major_version) < 901 &&
+ 		old_cluster.port == DEF_PGUPORT)
+ 			pg_log(PG_FATAL, "When checking a pre-PG 9.1 live old server, "
+ 				   "you must specify the old server's port number.\n");
+ 
+ 	if (old_cluster.port == new_cluster.port)
+ 		pg_log(PG_FATAL, "When checking a live server, "
+ 			   "the old and new port numbers must be different.\n");
  }
  
  
diff --git a/contrib/pg_upgrade/option.c b/contrib/pg_upgrade/option.c
new file mode 100644
index 94bce50..e511839
*** a/contrib/pg_upgrade/option.c
--- b/contrib/pg_upgrade/option.c
*************** adjust_data_dir(ClusterInfo *cluster)
*** 376,378 ****
--- 376,439 ----
  
  	check_ok();
  }
+ 
+ 
+ /*
+  * get_sockdir_and_live_port
+  *
+  * Set the socket directory to either the configured location (live check)
+  * or the current directory, and live port number.
+  */
+ void
+ get_sockdir_and_live_port(bool live_check)
+ {
+ 	if (!live_check)
+ 	{
+ #ifdef HAVE_UNIX_SOCKETS
+ 		/* Use the current directory for the socket */
+ 		if (!getcwd(os_info.sockdir, MAXPGPATH))
+ 			pg_log(PG_FATAL, "cannot find current directory\n");
+ #endif
+ 	}
+ 	/* port/sockdir added to the 4th/5th line of postmaster.pid in PG 9.1 */
+ 	else if (GET_MAJOR_VERSION(old_cluster.major_version) >= 901)
+ 	{
+ 		/* get live port number and sockdir */
+ 
+ 		char		filename[MAXPGPATH], line[MAXPGPATH];
+ 		FILE		*fp;
+ 		int			i;
+ 		unsigned short orig_old_port = old_cluster.port;
+ 
+ 		snprintf(filename, sizeof(filename), "%s/postmaster.pid", old_cluster.pgdata);
+ 		if ((fp = fopen(filename, "r")) == NULL)
+ 			pg_log(PG_FATAL, "Cannot open file %s: %m\n", filename);
+ 	
+ 		for (i = 0; i < 5; i++)
+ 		{
+ 			if (fgets(line, sizeof(line), fp) == NULL)
+ 				pg_log(PG_FATAL, "Cannot read line %d from %s: %m\n", i + 1, filename);
+ 			/* potentially overwrite user-supplied value */
+ 			if (i == 3)
+ 				sscanf(line, "%hu", &old_cluster.port);
+ #ifdef HAVE_UNIX_SOCKETS
+ 			if (i == 4)
+ 				/* strip off newline */
+ 				sscanf(line, "%s\n", os_info.sockdir);
+ #endif
+ 		}
+ 
+ 		/* warn of port number correction */
+ 		if (old_cluster.port != orig_old_port && orig_old_port != DEF_PGUPORT)
+ 			pg_log(PG_WARNING, "User-supplied old port number %hu corrected to %hu\n",
+ 			orig_old_port, old_cluster.port);
+ 			
+ 		fclose(fp);
+ 	}
+ 
+ #ifdef HAVE_UNIX_SOCKETS
+ 	if (strlen(os_info.sockdir) != 0)
+ 		/* For client communication */
+ 		pg_putenv("PGHOST", os_info.sockdir);
+ #endif
+ }
diff --git a/contrib/pg_upgrade/pg_upgrade.c b/contrib/pg_upgrade/pg_upgrade.c
new file mode 100644
index c47c8bb..6b41559
*** a/contrib/pg_upgrade/pg_upgrade.c
--- b/contrib/pg_upgrade/pg_upgrade.c
*************** main(int argc, char **argv)
*** 86,96 ****
  	setup(argv[0], live_check);
  
  	check_cluster_versions();
  	check_cluster_compatibility(live_check);
  
  	check_old_cluster(live_check, &sequence_script_file_name);
  
- 
  	/* -- NEW -- */
  	start_postmaster(&new_cluster);
  
--- 86,97 ----
  	setup(argv[0], live_check);
  
  	check_cluster_versions();
+ 	get_sockdir_and_live_port(live_check);
+ 
  	check_cluster_compatibility(live_check);
  
  	check_old_cluster(live_check, &sequence_script_file_name);
  
  	/* -- NEW -- */
  	start_postmaster(&new_cluster);
  
#22Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#21)
Re: Yet another failure mode in pg_upgrade

Bruce Momjian <bruce@momjian.us> writes:

Also, I don't see my doc addition on your patch; was that intentional?

It's not necessary, no? The code now gets socket directory right
without help.

regards, tom lane

#23Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#22)
Re: Yet another failure mode in pg_upgrade

On Mon, Sep 3, 2012 at 10:07:43AM -0400, Tom Lane wrote:

Bruce Momjian <bruce@momjian.us> writes:

Also, I don't see my doc addition on your patch; was that intentional?

It's not necessary, no? The code now gets socket directory right
without help.

Well, the doc comment is:

+    If running check on an old pre-9.1 Unix-like running server, and the
+    old and new servers use different Unix-domain socket directories,
+    use the <option>-O</> option so the new server uses the same socket
+    directory as the old server, and set <envar>PGHOST</> similarly.

Remember, we can't get the socket directory for pre-9.1 servers. When
you and I added that for 9.1, we did it for pg_ctl, which was needed by
pg_upgrade. We were not aware we would eventually be using it for
pg_upgrade directly, but it is the logical progression of pg_ctl not
working cleanly for non-standard socket and port numbers.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +

#24Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#23)
Re: Yet another failure mode in pg_upgrade

Bruce Momjian <bruce@momjian.us> writes:

On Mon, Sep 3, 2012 at 10:07:43AM -0400, Tom Lane wrote:

It's not necessary, no? The code now gets socket directory right
without help.

Well, the doc comment is:

+    If running check on an old pre-9.1 Unix-like running server, and the
+    old and new servers use different Unix-domain socket directories,
+    use the <option>-O</> option so the new server uses the same socket
+    directory as the old server, and set <envar>PGHOST</> similarly.

Remember, we can't get the socket directory for pre-9.1 servers.

Yeah, but even if you assume that pg_upgrade needs help for that, this
doc is wrong and overcomplicated. The case that would be problematic
is where the old server is running with a unix_socket_directory that
is not the same as the default built into pg_upgrade's libpq. It is
not necessary (given my patch) that the new server match that socket
directory. What is necessary is that pg_upgrade be able to contact
the old server. I think, but haven't tested, that it's sufficient
to set PGHOST to make that work. The patched code will override
PGHOST for the new cluster anyway, since it will always specify
--host as the new-cluster socket directory.

regards, tom lane

#25Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#24)
Re: Yet another failure mode in pg_upgrade

On Mon, Sep 3, 2012 at 10:42:38AM -0400, Tom Lane wrote:

Bruce Momjian <bruce@momjian.us> writes:

On Mon, Sep 3, 2012 at 10:07:43AM -0400, Tom Lane wrote:

It's not necessary, no? The code now gets socket directory right
without help.

Well, the doc comment is:

+    If running check on an old pre-9.1 Unix-like running server, and the
+    old and new servers use different Unix-domain socket directories,
+    use the <option>-O</> option so the new server uses the same socket
+    directory as the old server, and set <envar>PGHOST</> similarly.

Remember, we can't get the socket directory for pre-9.1 servers.

Yeah, but even if you assume that pg_upgrade needs help for that, this
doc is wrong and overcomplicated. The case that would be problematic
is where the old server is running with a unix_socket_directory that
is not the same as the default built into pg_upgrade's libpq. It is
not necessary (given my patch) that the new server match that socket
directory. What is necessary is that pg_upgrade be able to contact
the old server. I think, but haven't tested, that it's sufficient
to set PGHOST to make that work. The patched code will override
PGHOST for the new cluster anyway, since it will always specify
--host as the new-cluster socket directory.

Yes, agreed. With your change, the new server doesn't have to match the
old server's sockdir anymore; it is always going to use the current
directory.

So the text should be:

If running check on an old pre-9.1 Unix-like running server, and
the old running server is using a Unix-domain socket directory
different from the default on the new server, set <envar>PGHOST</>
to point to the socket location of the old server.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +

#26Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#21)
Re: Yet another failure mode in pg_upgrade

Bruce Momjian <bruce@momjian.us> writes:

I am working on an additional enhancement that also pulls the live
cluster's port number from the postmaster.pid file. I am attaching the
part of my patch that was modified to add that feature. This allows
live checks without requiring any port numbers to be specified. Let me
know if you would like me to email you that merged into your patch, if
you want it for 9.2.

That seems like a reasonable idea, but how are we going to go about
merging the various proposed patches here? Do you want me to commit
what I've got, and then you can rebase this port change on that?

As far as 9.2 vs HEAD goes, I think we should try to converge 9.2 and
HEAD as closely as possible, just for our own sanity in future
back-patching. Right now I think the only necessary difference is
the int64-XLogRecPtr changes in HEAD.

regards, tom lane

#27Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#26)
Re: Yet another failure mode in pg_upgrade

On Mon, Sep 3, 2012 at 12:16:04PM -0400, Tom Lane wrote:

Bruce Momjian <bruce@momjian.us> writes:

I am working on an additional enhancement that also pulls the live
cluster's port number from the postmaster.pid file. I am attaching the
part of my patch that was modified to add that feature. This allows
live checks without requiring any port numbers to be specified. Let me
know if you would like me to email you that merged into your patch, if
you want it for 9.2.

That seems like a reasonable idea, but how are we going to go about
merging the various proposed patches here? Do you want me to commit
what I've got, and then you can rebase this port change on that?

Sure, that is easy. As you can see, it pulls the port from the pid
file, and it just moves a few functions around, and moves the port
number check later in the process. It looks larger than it is.

As far as 9.2 vs HEAD goes, I think we should try to converge 9.2 and
HEAD as closely as possible, just for our own sanity in future
back-patching. Right now I think the only necessary difference is
the int64-XLogRecPtr changes in HEAD.

Sure, understand.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +

#28Bruce Momjian
bruce@momjian.us
In reply to: Bruce Momjian (#21)
1 attachment(s)
Re: Yet another failure mode in pg_upgrade

On Mon, Sep 3, 2012 at 09:52:22AM -0400, Bruce Momjian wrote:

On Sun, Sep 2, 2012 at 11:47:06PM -0400, Tom Lane wrote:

Bruce Momjian <bruce@momjian.us> writes:

Updated patch attached.

[ looks at that for a bit... ] Now I see why you were on about that:
the method you used here requires both clusters to have the same socket
directory. Which is silly and unnecessary. Revised patch attached.

I was trying to avoid setting the host for every client database
application, so I set PGHOST, but your use of get_db_conn() has
simplified that, which I had not considered. Also, you will need to
update the comment above cluster_conn_opts() to match your new function
name.

I am working on an additional enhancement that also pulls the live
cluster's port number from the postmaster.pid file. I am attaching the
part of my patch that was modified to add that feature. This allows
live checks without requiring any port numbers to be specified. Let me
know if you would like me to email you that merged into your patch, if
you want it for 9.2.

Also, I don't see my doc addition on your patch; was that intentional?

I have applied a modified version of the above patch, attached, to head
and 9.2. One additional change in this patch is that the current
directory is not used for pre-9.1 servers, in live check and non-live
check mode, because pre-9.1 pg_ctl -w can't handle sockets in
non-default locations; the info isn't in postmaster.pid for it to use.
I found this in testing the patch and using a script that tests all
possible major version combinations.

I am concerned this will hamper Tom's attempt to allow for changes to
the compiled-in socket location for pre-9.1 servers, but I don't see a
solution. FYI, all the binaries will be from the new server, and will
have the new compiled-in socket location. I suggest using -o to pass in
the socket location. The new server will use the current directory just
fine. Tom's patch allowing the old and new servers to use different
socket directories helps in this regard.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +

Attachments:

pg_upgrade.difftext/x-diff; charset=us-asciiDownload
diff --git a/contrib/pg_upgrade/check.c b/contrib/pg_upgrade/check.c
new file mode 100644
index d965fa8..c89b2df
*** a/contrib/pg_upgrade/check.c
--- b/contrib/pg_upgrade/check.c
*************** output_check_banner(bool *live_check)
*** 58,69 ****
  	if (user_opts.check && is_server_running(old_cluster.pgdata))
  	{
  		*live_check = true;
- 		if (old_cluster.port == DEF_PGUPORT)
- 			pg_log(PG_FATAL, "When checking a live old server, "
- 				   "you must specify the old server's port number.\n");
- 		if (old_cluster.port == new_cluster.port)
- 			pg_log(PG_FATAL, "When checking a live server, "
- 				   "the old and new port numbers must be different.\n");
  		pg_log(PG_REPORT, "Performing Consistency Checks on Old Live Server\n");
  		pg_log(PG_REPORT, "------------------------------------------------\n");
  	}
--- 58,63 ----
*************** check_cluster_compatibility(bool live_ch
*** 320,325 ****
--- 314,329 ----
  		new_cluster.controldata.cat_ver < TABLE_SPACE_SUBDIRS_CAT_VER)
  		pg_log(PG_FATAL, "This utility can only upgrade to PostgreSQL version 9.0 after 2010-01-11\n"
  			   "because of backend API changes made during development.\n");
+ 
+ 	/* We read the real port number for PG >= 9.1 */
+ 	if (live_check && GET_MAJOR_VERSION(old_cluster.major_version) < 901 &&
+ 		old_cluster.port == DEF_PGUPORT)
+ 			pg_log(PG_FATAL, "When checking a pre-PG 9.1 live old server, "
+ 				   "you must specify the old server's port number.\n");
+ 
+ 	if (live_check && old_cluster.port == new_cluster.port)
+ 		pg_log(PG_FATAL, "When checking a live server, "
+ 			   "the old and new port numbers must be different.\n");
  }
  
  
diff --git a/contrib/pg_upgrade/option.c b/contrib/pg_upgrade/option.c
new file mode 100644
index 6d5a93a..19053fa
*** a/contrib/pg_upgrade/option.c
--- b/contrib/pg_upgrade/option.c
*************** void
*** 392,440 ****
  get_sock_dir(ClusterInfo *cluster, bool live_check)
  {
  #ifdef HAVE_UNIX_SOCKETS
! 	if (!live_check)
! 	{
! 		/* Use the current directory for the socket */
! 		cluster->sockdir = pg_malloc(MAXPGPATH);
! 		if (!getcwd(cluster->sockdir, MAXPGPATH))
! 			pg_log(PG_FATAL, "cannot find current directory\n");
! 	}
! 	else
  	{
! 		/*
! 		 *	If we are doing a live check, we will use the old cluster's Unix
! 		 *	domain socket directory so we can connect to the live server.
! 		 */
! 
! 		/* sockdir was added to postmaster.pid in PG 9.1 */
! 		if (GET_MAJOR_VERSION(cluster->major_version) >= 901)
  		{
! 			char		filename[MAXPGPATH];
  			FILE		*fp;
! 			int			i;
! 
  			snprintf(filename, sizeof(filename), "%s/postmaster.pid",
  					 cluster->pgdata);
  			if ((fp = fopen(filename, "r")) == NULL)
! 				pg_log(PG_FATAL, "Could not get socket directory of the old server\n");
! 
! 			cluster->sockdir = pg_malloc(MAXPGPATH);
! 			for (i = 0; i < LOCK_FILE_LINE_SOCKET_DIR; i++)
! 				if (fgets(cluster->sockdir, MAXPGPATH, fp) == NULL)
! 					pg_log(PG_FATAL, "Could not get socket directory of the old server\n");
! 
  			fclose(fp);
! 
! 			/* Remove trailing newline */
! 			if (strchr(cluster->sockdir, '\n') != NULL)
! 				*strchr(cluster->sockdir, '\n') = '\0';
! 		}
! 		else
! 		{
! 			/* Can't get live sockdir, so assume the default is OK. */
! 			cluster->sockdir = NULL;
  		}
  	}
  #else /* !HAVE_UNIX_SOCKETS */
  	cluster->sockdir = NULL;
  #endif
--- 392,456 ----
  get_sock_dir(ClusterInfo *cluster, bool live_check)
  {
  #ifdef HAVE_UNIX_SOCKETS
! 	/*
! 	 *	sockdir and port were added to postmaster.pid in PG 9.1.
! 	 *	Pre-9.1 cannot process pg_ctl -w for sockets in non-default
! 	 *	locations.
! 	 */
! 	if (GET_MAJOR_VERSION(cluster->major_version) >= 901)
  	{
! 		if (!live_check)
  		{
! 			/* Use the current directory for the socket */
! 			cluster->sockdir = pg_malloc(MAXPGPATH);
! 			if (!getcwd(cluster->sockdir, MAXPGPATH))
! 				pg_log(PG_FATAL, "cannot find current directory\n");
! 		}
! 		else
! 		{
! 			/*
! 			 *	If we are doing a live check, we will use the old cluster's Unix
! 			 *	domain socket directory so we can connect to the live server.
! 			 */
! 			unsigned short orig_port = cluster->port;
! 			char		filename[MAXPGPATH], line[MAXPGPATH];
  			FILE		*fp;
! 			int			lineno;
! 	
  			snprintf(filename, sizeof(filename), "%s/postmaster.pid",
  					 cluster->pgdata);
  			if ((fp = fopen(filename, "r")) == NULL)
! 				pg_log(PG_FATAL, "Cannot open file %s: %m\n", filename);
! 	
! 			for (lineno = 1;
! 				 lineno <= Max(LOCK_FILE_LINE_PORT, LOCK_FILE_LINE_SOCKET_DIR);
! 				 lineno++)
! 			{
! 				if (fgets(line, sizeof(line), fp) == NULL)
! 					pg_log(PG_FATAL, "Cannot read line %d from %s: %m\n", lineno, filename);
! 	
! 				/* potentially overwrite user-supplied value */
! 				if (lineno == LOCK_FILE_LINE_PORT)
! 					sscanf(line, "%hu", &old_cluster.port);
! 				if (lineno == LOCK_FILE_LINE_SOCKET_DIR)
! 				{
! 					cluster->sockdir = pg_malloc(MAXPGPATH);
! 					/* strip off newline */
! 					sscanf(line, "%s\n", cluster->sockdir);
! 				}
! 			}
  			fclose(fp);
! 	
! 			/* warn of port number correction */
! 			if (orig_port != DEF_PGUPORT && old_cluster.port != orig_port)
! 				pg_log(PG_WARNING, "User-supplied old port number %hu corrected to %hu\n",
! 				orig_port, cluster->port);
  		}
  	}
+ 	else
+ 		/* Can't get sockdir and pg_ctl -w can't use a non-default, use default */
+ 		cluster->sockdir = NULL;
+ 
  #else /* !HAVE_UNIX_SOCKETS */
  	cluster->sockdir = NULL;
  #endif
diff --git a/contrib/pg_upgrade/pg_upgrade.c b/contrib/pg_upgrade/pg_upgrade.c
new file mode 100644
index ee3a151..4d2e79c
*** a/contrib/pg_upgrade/pg_upgrade.c
--- b/contrib/pg_upgrade/pg_upgrade.c
*************** main(int argc, char **argv)
*** 86,96 ****
  	setup(argv[0], live_check);
  
  	check_cluster_versions();
- 	check_cluster_compatibility(live_check);
  
  	get_sock_dir(&old_cluster, live_check);
  	get_sock_dir(&new_cluster, false);
  
  	check_old_cluster(live_check, &sequence_script_file_name);
  
  
--- 86,97 ----
  	setup(argv[0], live_check);
  
  	check_cluster_versions();
  
  	get_sock_dir(&old_cluster, live_check);
  	get_sock_dir(&new_cluster, false);
  
+ 	check_cluster_compatibility(live_check);
+ 
  	check_old_cluster(live_check, &sequence_script_file_name);