pg_upgrade from 9.4 to 10.4

Started by Vimalraj A over 7 years ago, 14 messages
#1Vimalraj A
vimal1805@gmail.com

Hi,

After pg_upgrade, the data in Postgres is corrupted.

1. Postgres 9.4, stop db with "immediate" mode
2. Run pg_upgrade, to upgrade to Postgres 10.4
3. Start Postgres 10 and run vacuum full, got a bunch of "concurrent insert
in progress". Looks like the data is corrupted. (Loading the old data back
on Postgres 9.4 works just fine)

But if I stop the 9.4 DB with "smart" mode, the data after pg_upgrade looks
fine.

pg_upgrade doesn't stop or throw warnings if the upgraded db gets into
a corrupted state.

I would like to understand why this happens.
1. What transient state corrupts the db?
2. Is it a known issue with pg_upgrade?
3. Is there a way to get the data from pg_upgrade after "immediate" mode
stop of previous version?

Thank you.

Regards,
Vimal.

#2Bruce Momjian
bruce@momjian.us
In reply to: Vimalraj A (#1)
1 attachment(s)
Re: pg_upgrade from 9.4 to 10.4

On Fri, Jun 15, 2018 at 03:01:37PM -0700, Vimalraj A wrote:

Hi,

After pg_upgrade, the data in Postgres is corrupted.

1. Postgres 9.4, stop db with "immediate" mode
2. Run pg_upgrade, to upgrade to Postgres 10.4
3. Start Postgres 10 and run vacuum full, got a bunch of "concurrent insert in
progress". Looks like the data is corrupted. (Loading the old data back on
Postgres 9.4 works just fine)

But if I stop the 9.4 DB with "smart" mode, the data after pg_upgrade looks
fine.

pg_upgrade doesn't stop or throw warnings if the upgraded db gets into
a corrupted state.

I would like to understand why this happens.
1. What transient state corrupts the db?
2. Is it a known issue with pg_upgrade?
3. Is there a way to get the data from pg_upgrade after "immediate" mode stop
of previous version?

Well, that's interesting. We document to shut down the old and new
server with pg_ctl stop, but don't specify to avoid immediate mode.

The reason you are having problems is that pg_upgrade does not copy the
WAL from the old cluster to the new one, so there is no way to replay
the needed WAL during startup of the new server, which leads to
corruption. Did you find this out in testing or in actual use?

What is also interesting is how pg_upgrade tries to avoid problems with
_crash_ shutdowns --- if it sees a postmaster lock file, it tries to
start the server, and if that works, it then stops it, causing the WAL
to be replayed and the server to be shut down cleanly. What it _doesn't_ handle is pg_ctl
-m immediate, which removes the lock file, but does leave WAL in need of
replay. Oops!
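
For reference, the lock file handling I am describing lives in
pg_upgrade.c's setup(), and is roughly shaped like the sketch below
(paraphrased from memory, so take the names and details as approximate
rather than verbatim):

    if (pid_lock_file_exists(old_cluster.pgdata))
    {
        /*
         * A postmaster.pid file is present: either a server is still
         * running, or the old cluster crashed.  Try to start it; if that
         * works, stop it again so any pending WAL gets replayed and the
         * shutdown is clean.
         */
        if (start_postmaster(&old_cluster, false))
            stop_postmaster(false);
        else if (!user_opts.check)
            pg_fatal("There seems to be a postmaster servicing the old cluster.\n"
                     "Please shutdown that postmaster and try again.\n");
        else
            *live_check = true;
    }

Since "pg_ctl stop -m immediate" removes postmaster.pid, none of that
code fires in the immediate-mode case, and the un-replayed WAL goes
undetected.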

Ideally we could detect this before we check pg_controldata and then do
the start/stop trick to fix the WAL, but the ordering of the code makes
that hard. Instead, I have developed the attached patch which does a
check for the cluster state at the same time we are checking
pg_controldata, and reports an error if there is not a clean shutdown.
Based on how rare this is, this is probably the cleanest solution, and I
think it can be backpatched.

Patch attached.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +

Attachments:

upgrade.diff (text/x-diff; charset=us-ascii)
diff --git a/src/bin/pg_upgrade/controldata.c b/src/bin/pg_upgrade/controldata.c
new file mode 100644
index 0fe98a5..bba3b1b
*** a/src/bin/pg_upgrade/controldata.c
--- b/src/bin/pg_upgrade/controldata.c
*************** get_control_data(ClusterInfo *cluster, b
*** 58,63 ****
--- 58,64 ----
  	bool		got_large_object = false;
  	bool		got_date_is_int = false;
  	bool		got_data_checksum_version = false;
+ 	bool		got_cluster_state = false;
  	char	   *lc_collate = NULL;
  	char	   *lc_ctype = NULL;
  	char	   *lc_monetary = NULL;
*************** get_control_data(ClusterInfo *cluster, b
*** 423,428 ****
--- 424,487 ----
  	pclose(output);
  
  	/*
+ 	 * Check for clean shutdown
+ 	 */
+ 
+ 	/* only pg_controldata outputs the cluster state */
+ 	snprintf(cmd, sizeof(cmd), "\"%s/pg_controldata\" \"%s\"",
+ 			 cluster->bindir, cluster->pgdata);
+ 	fflush(stdout);
+ 	fflush(stderr);
+ 
+ 	if ((output = popen(cmd, "r")) == NULL)
+ 		pg_fatal("could not get control data using %s: %s\n",
+ 				 cmd, strerror(errno));
+ 
+ 	/* we have the result of cmd in "output". so parse it line by line now */
+ 	while (fgets(bufin, sizeof(bufin), output))
+ 	{
+ 		if ((!live_check || cluster == &new_cluster) &&
+ 			(p = strstr(bufin, "Database cluster state:")) != NULL)
+ 		{
+ 			p = strchr(p, ':');
+ 
+ 			if (p == NULL || strlen(p) <= 1)
+ 				pg_fatal("%d: database cluster state problem\n", __LINE__);
+ 
+ 			p++;				/* remove ':' char */
+ 
+ 			/*
+ 			 * We checked earlier for a postmaster lock file, and if we found
+ 			 * one, we tried to start/stop the server to replay the WAL.  However,
+ 			 * pg_ctl -m immediate doesn't leave a lock file, but does require
+ 			 * WAL replay, so we check here that the server was shut down cleanly,
+ 			 * from the controldata perspective.
+ 			 */
+ 			/* remove leading spaces */
+ 			while (*p == ' ')
+ 				p++;
+ 			if (strcmp(p, "shut down\n") != 0)
+ 			{
+ 				if (cluster == &old_cluster)
+ 					pg_fatal("The source cluster was not shut down cleanly.\n");
+ 				else
+ 					pg_fatal("The target cluster was not shut down cleanly.\n");
+ 			}
+ 			got_cluster_state = true;
+ 		}
+ 	}
+ 
+ 	pclose(output);
+ 
+ 	if (!got_cluster_state)
+ 	{
+ 		if (cluster == &old_cluster)
+ 			pg_fatal("The source cluster lacks cluster state information:\n");
+ 		else
+ 			pg_fatal("The target cluster lacks cluster state information:\n");
+ 	}
+ 
+ 	/*
  	 * Restore environment variables
  	 */
  	pg_putenv("LC_COLLATE", lc_collate);
#3Bruce Momjian
bruce@momjian.us
In reply to: Bruce Momjian (#2)
Re: pg_upgrade from 9.4 to 10.4

On Sat, Jun 23, 2018 at 10:29:49PM -0400, Bruce Momjian wrote:

On Fri, Jun 15, 2018 at 03:01:37PM -0700, Vimalraj A wrote:

I would like to understand why this happens.
1. What transient state corrupts the db?
2. Is it a known issue with pg_upgrade?
3. Is there a way to get the data from pg_upgrade after "immediate" mode stop
of previous version?

Well, that's interesting. We document to shut down the old and new
server with pg_ctl stop, but don't specify to avoid immediate mode.

The reason you are having problems is that pg_upgrade does not copy the
WAL from the old cluster to the new one, so there is no way to replay
the needed WAL during startup of the new server, which leads to
corruption. Did you find this out in testing or in actual use?

What is also interesting is how pg_upgrade tries to avoid problems with
_crash_ shutdowns --- if it sees a postmaster lock file, it tries to
start the server, and if that works, it then stops it, causing the WAL
to be replayed and the server to be shut down cleanly. What it _doesn't_ handle is pg_ctl
-m immediate, which removes the lock file, but does leave WAL in need of
replay. Oops!

Ideally we could detect this before we check pg_controldata and then do
the start/stop trick to fix the WAL, but the ordering of the code makes
that hard. Instead, I have developed the attached patch which does a
check for the cluster state at the same time we are checking
pg_controldata, and reports an error if there is not a clean shutdown.
Based on how rare this is, this is probably the cleanest solution, and I
think it can be backpatched.

Updated patch applied through 9.3:

https://git.postgresql.org/pg/commitdiff/260fe9f2b02b67de1e5ff29faf123e4220586c43

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +
#4Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#3)
Re: pg_upgrade from 9.4 to 10.4

Bruce Momjian <bruce@momjian.us> writes:

Updated patch applied through 9.3:
https://git.postgresql.org/pg/commitdiff/260fe9f2b02b67de1e5ff29faf123e4220586c43

This patch evidently broke buildfarm member jacana, although only
in the 9.3 branch. It's been failing with

Jul 28 23:22:29 Performing Consistency Checks
Jul 28 23:22:29 -----------------------------
Jul 28 23:22:29 Checking cluster versions ok
Jul 28 23:22:30 The system cannot find the path specified.
Jul 28 23:22:30
Jul 28 23:22:30 The source cluster lacks cluster state information:
Jul 28 23:22:30 Failure, exiting
Jul 28 23:22:30 make: *** [check] Error 1

regards, tom lane

#5Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#4)
Re: pg_upgrade from 9.4 to 10.4

On Tue, Jul 31, 2018 at 03:26:47PM -0400, Tom Lane wrote:

Bruce Momjian <bruce@momjian.us> writes:

Updated patch applied through 9.3:
https://git.postgresql.org/pg/commitdiff/260fe9f2b02b67de1e5ff29faf123e4220586c43

This patch evidently broke buildfarm member jacana, although only
in the 9.3 branch. It's been failing with

Jul 28 23:22:29 Performing Consistency Checks
Jul 28 23:22:29 -----------------------------
Jul 28 23:22:29 Checking cluster versions ok
Jul 28 23:22:30 The system cannot find the path specified.
Jul 28 23:22:30
Jul 28 23:22:30 The source cluster lacks cluster state information:
Jul 28 23:22:30 Failure, exiting
Jul 28 23:22:30 make: *** [check] Error 1

Well, that's interesting. I have a test script that upgrades all
version combinations of Postgres from 9.3 to head and it worked for me.
However, I now see that the tests for checking a live server were wrong,
and 9.3 must have a slightly different test than the others. Fixes
applied. Thanks for finding this so quickly.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +
#6Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#5)
Re: pg_upgrade from 9.4 to 10.4

Bruce Momjian <bruce@momjian.us> writes:

On Tue, Jul 31, 2018 at 03:26:47PM -0400, Tom Lane wrote:

This patch evidently broke buildfarm member jacana, although only
in the 9.3 branch. It's been failing with
Jul 28 23:22:30 The system cannot find the path specified.

Well, that's interesting. I have a test script that upgrades all
version combinations of Postgres from 9.3 to head and it worked for me.
However, I now see that the tests for checking a live server were wrong,
and 9.3 must have a slightly different test than the others. Fixes
applied. Thanks for finding this so quickly.

After poking around a bit more, I think the more likely explanation is
that 9.3 was still using the SYSTEMQUOTE macro in commands to be popen'd,
and you did not make that adjustment when back-patching into that branch.
Note the adjacent pre-existing invocation of pg_controldata:

snprintf(cmd, sizeof(cmd), SYSTEMQUOTE "\"%s/%s \"%s\"" SYSTEMQUOTE,
         cluster->bindir,
         live_check ? "pg_controldata\"" : "pg_resetxlog\" -n",
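
The new pg_controldata call you added would presumably need the same
wrapping in that branch, i.e. something along the lines of (untested
sketch):

    snprintf(cmd, sizeof(cmd), SYSTEMQUOTE "\"%s/pg_controldata\" \"%s\"" SYSTEMQUOTE,
             cluster->bindir, cluster->pgdata);

so that the whole command string survives the extra level of quote
stripping applied on Windows.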

regards, tom lane

#7Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#6)
Re: pg_upgrade from 9.4 to 10.4

On Tue, Jul 31, 2018 at 06:29:34PM -0400, Tom Lane wrote:

Bruce Momjian <bruce@momjian.us> writes:

On Tue, Jul 31, 2018 at 03:26:47PM -0400, Tom Lane wrote:

This patch evidently broke buildfarm member jacana, although only
in the 9.3 branch. It's been failing with
Jul 28 23:22:30 The system cannot find the path specified.

Well, that's interesting. I have a test script that upgrades all
version combinations of Postgres from 9.3 to head and it worked for me.
However, I now see that the tests for checking a live server were wrong,
and 9.3 must have a slightly different test than the others. Fixes
applied. Thanks for finding this so quickly.

After poking around a bit more, I think the more likely explanation is
that 9.3 was still using the SYSTEMQUOTE macro in commands to be popen'd,
and you did not make that adjustment when back-patching into that branch.
Note the adjacent pre-existing invocation of pg_controldata:

snprintf(cmd, sizeof(cmd), SYSTEMQUOTE "\"%s/%s \"%s\"" SYSTEMQUOTE,
         cluster->bindir,
         live_check ? "pg_controldata\"" : "pg_resetxlog\" -n",

Oh, jacana must be a Windows server with spaces in the file name paths.
Fixed. Thanks again. I really didn't want to backpatch the original
fix but had to since it could produce corrupt upgrades.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +
#8Andrew Dunstan
andrew.dunstan@2ndquadrant.com
In reply to: Bruce Momjian (#7)
Re: pg_upgrade from 9.4 to 10.4

On 07/31/2018 07:08 PM, Bruce Momjian wrote:

On Tue, Jul 31, 2018 at 06:29:34PM -0400, Tom Lane wrote:

Bruce Momjian <bruce@momjian.us> writes:

On Tue, Jul 31, 2018 at 03:26:47PM -0400, Tom Lane wrote:

This patch evidently broke buildfarm member jacana, although only
in the 9.3 branch. It's been failing with
Jul 28 23:22:30 The system cannot find the path specified.

Well, that's interesting. I have a test script that upgrades all
version combinations of Postgres from 9.3 to head and it worked for me.
However, I now see that the tests for checking a live server were wrong,
and 9.3 must have a slightly different test than the others. Fixes
applied. Thanks for finding this so quickly.

After poking around a bit more, I think the more likely explanation is
that 9.3 was still using the SYSTEMQUOTE macro in commands to be popen'd,
and you did not make that adjustment when back-patching into that branch.
Note the adjacent pre-existing invocation of pg_controldata:

snprintf(cmd, sizeof(cmd), SYSTEMQUOTE "\"%s/%s \"%s\"" SYSTEMQUOTE,
         cluster->bindir,
         live_check ? "pg_controldata\"" : "pg_resetxlog\" -n",

Oh, jacana must be a Windows server with spaces in the file name paths.
Fixed. Thanks again. I really didn't want to backpatch the original
fix but had to since it could produce corrupt upgrades.

It is a Windows machine, but there shouldn't be any spaces in the paths.

cheers

andrew

--
Andrew Dunstan https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#9Bruce Momjian
bruce@momjian.us
In reply to: Andrew Dunstan (#8)
Re: pg_upgrade from 9.4 to 10.4

On Tue, Jul 31, 2018 at 08:05:06PM -0400, Andrew Dunstan wrote:

On 07/31/2018 07:08 PM, Bruce Momjian wrote:

Oh, jacana must be a Windows server with spaces in the file name paths.
Fixed. Thanks again. I really didn't want to backpatch the original
fix but had to since it could produce corrupt upgrades.

It is a Windows machine, but there shouldn't be any spaces in the paths.

The comment at the top of src/port/system.c explains why we need those
quotes. Spaces were not the issue.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +
#10Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#9)
Re: pg_upgrade from 9.4 to 10.4

Bruce Momjian <bruce@momjian.us> writes:

The comment at the top of src/port/system.c explains why we need those
quotes. Spaces were not the issue.

So, while starting to prepare the release notes, I looked at this patch
again and I'm still wondering why it's designed this way. AFAICS,
the current state of affairs is:

1. previous server was shut down nicely: all good.
2. previous server was shut down with "-m immediate": we complain and die.
3. previous server was shut down with "kill -9": we clean up and press on.

I am not really sure why case 2 deserves a wrist slap and making the user
clean it up manually when we're willing to clean up automatically in
case 3. If we're going to treat them differently, that's backwards.

Right now is probably not a good time to fix this, but it seems like
something that could be improved. I'd be kind of inclined to remove
the pidfile checking business altogether in favor of inspecting the
state in pg_control; or at least do them both in the same place with
the same recovery attempt if we don't like what we see.
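
Very roughly, I'm imagining something like this (untested sketch;
get_db_cluster_state() here is a hypothetical helper that pulls the
"Database cluster state:" line out of pg_controldata output, much like
the parsing the committed patch already does):

    /* drive the recovery decision off pg_control, not postmaster.pid */
    const char *state = get_db_cluster_state(cluster);    /* hypothetical helper */

    if (strcmp(state, "shut down") != 0)
    {
        /* same start/stop trick currently used for a leftover postmaster.pid */
        if (start_postmaster(cluster, false))
            stop_postmaster(false);
        else
            pg_fatal("cluster was not shut down cleanly and could not be recovered\n");
    }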

regards, tom lane

#11Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#10)
Re: pg_upgrade from 9.4 to 10.4

On Fri, Aug 3, 2018 at 01:55:04PM -0400, Tom Lane wrote:

Bruce Momjian <bruce@momjian.us> writes:

The comment at the top of src/port/system.c explains why we need those
quotes. Spaces were not the issue.

So, while starting to prepare the release notes, I looked at this patch
again and I'm still wondering why it's designed this way. AFAICS,
the current state of affairs is:

1. previous server was shut down nicely: all good.
2. previous server was shut down with "-m immediate": we complain and die.
3. previous server was shut down with "kill -9": we clean up and press on.

I am not really sure why case 2 deserves a wrist slap and making the user
clean it up manually when we're willing to clean up automatically in
case 3. If we're going to treat them differently, that's backwards.

Right now is probably not a good time to fix this, but it seems like
something that could be improved. I'd be kind of inclined to remove
the pidfile checking business altogether in favor of inspecting the
state in pg_control; or at least do them both in the same place with
the same recovery attempt if we don't like what we see.

Yes, I realize this was inconsistent. It was done this way purely based
on how easy it would be to check each item in each place. I am fine
with removing the #3 cleanup so we are consistent. We can also add docs
to say it should be a "clean" shutdown.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +
#12Bruce Momjian
bruce@momjian.us
In reply to: Bruce Momjian (#11)
Re: pg_upgrade from 9.4 to 10.4

On Fri, Aug 3, 2018 at 04:56:32PM -0400, Bruce Momjian wrote:

On Fri, Aug 3, 2018 at 01:55:04PM -0400, Tom Lane wrote:

Bruce Momjian <bruce@momjian.us> writes:

The comment at the top of src/port/system.c explains why we need those
quotes. Spaces were not the issue.

So, while starting to prepare the release notes, I looked at this patch
again and I'm still wondering why it's designed this way. AFAICS,
the current state of affairs is:

1. previous server was shut down nicely: all good.
2. previous server was shut down with "-m immediate": we complain and die.
3. previous server was shut down with "kill -9": we clean up and press on.

I am not really sure why case 2 deserves a wrist slap and making the user
clean it up manually when we're willing to clean up automatically in
case 3. If we're going to treat them differently, that's backwards.

Right now is probably not a good time to fix this, but it seems like
something that could be improved. I'd be kind of inclined to remove
the pidfile checking business altogether in favor of inspecting the
state in pg_control; or at least do them both in the same place with
the same recovery attempt if we don't like what we see.

Yes, I realize this was inconsistent. It was done this way purely based
on how easy it would be to check each item in each place. I am fine
with removing the #3 cleanup so we are consistent. We can also add docs
to say it should be a "clean" shutdown.

How do you want me to handle this, considering it has to be backpatched?

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +
#13Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#12)
Re: pg_upgrade from 9.4 to 10.4

Bruce Momjian <bruce@momjian.us> writes:

On Fri, Aug 3, 2018 at 04:56:32PM -0400, Bruce Momjian wrote:

On Fri, Aug 3, 2018 at 01:55:04PM -0400, Tom Lane wrote:

Right now is probably not a good time to fix this, but it seems like
something that could be improved. I'd be kind of inclined to remove
the pidfile checking business altogether in favor of inspecting the
state in pg_control; or at least do them both in the same place with
the same recovery attempt if we don't like what we see.

Yes, I realize this was inconsistent. It was done this way purely based
on how easy it would be to check each item in each place. I am fine
with removing the #3 cleanup so we are consistent. We can also add docs
to say it should be a "clean" shutdown.

How do you want me to handle this, considering it has to be backpatched?

Well, removing the check entirely is certainly not a good idea.

I looked at the code briefly and concur that making it work "nicely" isn't
as simple as one could wish; the code you added has some dependencies on
the context it's in, so that moving it elsewhere would require code
duplication or refactoring. Maybe it's not something to try to shoehorn
into v11. We can always improve it later.

(I'd also say that the weekend before a release wrap is no time
to be committing anything noncritical, so even if you're feeling
motivated to fix it in v11, best to hold off till after the wrap.)

regards, tom lane

#14Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#13)
Re: pg_upgrade from 9.4 to 10.4

On Sat, Aug 4, 2018 at 11:26:06PM -0400, Tom Lane wrote:

Bruce Momjian <bruce@momjian.us> writes:

On Fri, Aug 3, 2018 at 04:56:32PM -0400, Bruce Momjian wrote:

On Fri, Aug 3, 2018 at 01:55:04PM -0400, Tom Lane wrote:

Right now is probably not a good time to fix this, but it seems like
something that could be improved. I'd be kind of inclined to remove
the pidfile checking business altogether in favor of inspecting the
state in pg_control; or at least do them both in the same place with
the same recovery attempt if we don't like what we see.

Yes, I realize this was inconsistent. It was done this way purely based
on how easy it would be to check each item in each place. I am fine
with removing the #3 cleanup so we are consistent. We can also add docs
to say it should be a "clean" shutdown.

How do you want me to handle this, considering it has to be backpatched?

Well, removing the check entirely is certainly not a good idea.

I looked at the code briefly and concur that making it work "nicely" isn't
as simple as one could wish; the code you added has some dependencies on
the context it's in, so that moving it elsewhere would require code
duplication or refactoring. Maybe it's not something to try to shoehorn
into v11. We can always improve it later.

(I'd also say that the weekend before a release wrap is no time
to be committing anything noncritical, so even if you're feeling
motivated to fix it in v11, best to hold off till after the wrap.)

Yeah, I felt that what we have is probably as good as we reasonably
could get in the short term.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +