[bug fix] PostgreSQL fails to start on Windows if it crashes after tablespace creation

Started by MauMauabout 12 years ago19 messages
#1MauMau
maumau307@gmail.com
1 attachment(s)

Hello,

I've found and fixed a bug that causes recovery (crash recovery, PITR) to
fail. Please find attached the patch against HEAD.

[Bug]
To reproduce the problem, do the following on Windows:

1. pg_ctl start
2. CREATE TABLESPACE tbs LOCATION 'some_tblspc_path';
3. pg_ctl stop -mi
4. pg_ctl start

The database server fails to start, leaving the below messages:

LOG: database system was interrupted; last known up at 2013-10-31 20:24:07
JST
LOG: database system was not properly shut down; automatic recovery in
progress
LOG: redo starts at 0/1788938
FATAL: could not remove symbolic link "pg_tblspc/16385": Permission denied
CONTEXT: xlog redo create tablespace: 16385 "d:/tbs"
LOG: startup process (PID 2724) exited with exit code 1
LOG: aborting startup due to startup process failure

[Cause]
In redo, create_tablespace_directories() tries to remove the symbolic link
for the tablespace using unlink(). However, unlink() on Windows fails with
errno=13 (Permission denied). This is because junction points are
directories on Windows.

[Fix]
Follow destroy_tablespace_directories() and use rmdir() to remove the
junction point.

I've tested the patch. Could you review it and commit? I wish it to be
backported to all major releases.

Regards
MauMau

Attachments:

remove_tblspc_symlink.patchapplication/octet-stream; name=remove_tblspc_symlink.patchDownload
diff -rpcd a/src/backend/commands/tablespace.c b/src/backend/commands/tablespace.c
*** a/src/backend/commands/tablespace.c	2013-10-04 13:17:03.000000000 +0900
--- b/src/backend/commands/tablespace.c	2013-10-31 21:17:39.508000000 +0900
*************** create_tablespace_directories(const char
*** 544,549 ****
--- 544,550 ----
  	char	   *linkloc = palloc(OIDCHARS + OIDCHARS + 1);
  	char	   *location_with_version_dir = palloc(strlen(location) + 1 +
  								   strlen(TABLESPACE_VERSION_DIRECTORY) + 1);
+ 	struct stat st;
  
  	sprintf(linkloc, "pg_tblspc/%u", tablespaceoid);
  	sprintf(location_with_version_dir, "%s/%s", location,
*************** create_tablespace_directories(const char
*** 606,618 ****
  	}
  
  	/* Remove old symlink in recovery, in case it points to the wrong place */
  	if (InRecovery)
  	{
! 		if (unlink(linkloc) < 0 && errno != ENOENT)
! 			ereport(ERROR,
! 					(errcode_for_file_access(),
! 					 errmsg("could not remove symbolic link \"%s\": %m",
! 							linkloc)));
  	}
  
  	/*
--- 607,631 ----
  	}
  
  	/* Remove old symlink in recovery, in case it points to the wrong place */
+ 	/* On Windows, lstat() reports junction points as directories */
  	if (InRecovery)
  	{
! 		if (lstat(linkloc, &st) == 0 && S_ISDIR(st.st_mode))
! 		{
! 			if (rmdir(linkloc) < 0)
! 				ereport(ERROR,
! 						(errcode_for_file_access(),
! 						 errmsg("could not remove directory \"%s\": %m",
! 								linkloc)));
! 		}
! 		else
! 		{
! 			if (unlink(linkloc) < 0 && errno != ENOENT)
! 				ereport(ERROR,
! 						(errcode_for_file_access(),
! 						 errmsg("could not remove symbolic link \"%s\": %m",
! 								linkloc)));
! 		}
  	}
  
  	/*
#2Andrew Dunstan
andrew@dunslane.net
In reply to: MauMau (#1)
Re: [bug fix] PostgreSQL fails to start on Windows if it crashes after tablespace creation

On 10/31/2013 08:40 AM, MauMau wrote:

Hello,

I've found and fixed a bug that causes recovery (crash recovery, PITR)
to fail. Please find attached the patch against HEAD.

[Bug]
To reproduce the problem, do the following on Windows:

1. pg_ctl start
2. CREATE TABLESPACE tbs LOCATION 'some_tblspc_path';
3. pg_ctl stop -mi
4. pg_ctl start

The database server fails to start, leaving the below messages:

LOG: database system was interrupted; last known up at 2013-10-31
20:24:07 JST
LOG: database system was not properly shut down; automatic recovery in
progress
LOG: redo starts at 0/1788938
FATAL: could not remove symbolic link "pg_tblspc/16385": Permission
denied
CONTEXT: xlog redo create tablespace: 16385 "d:/tbs"
LOG: startup process (PID 2724) exited with exit code 1
LOG: aborting startup due to startup process failure

[Cause]
In redo, create_tablespace_directories() tries to remove the symbolic
link for the tablespace using unlink(). However, unlink() on Windows
fails with errno=13 (Permission denied). This is because junction
points are directories on Windows.

[Fix]
Follow destroy_tablespace_directories() and use rmdir() to remove the
junction point.

I've tested the patch. Could you review it and commit? I wish it to be
backported to all major releases.

Regards
MauMau

Why are you making this a runtime check instead of a compile time check?

cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#3MauMau
maumau307@gmail.com
In reply to: Andrew Dunstan (#2)
1 attachment(s)
Re: [bug fix] PostgreSQL fails to start on Windows if it crashes after tablespace creation

From: "Andrew Dunstan" <andrew@dunslane.net>

Why are you making this a runtime check instead of a compile time check?

I thought the same situation could happen as in
destroy_tablespace_directories(), but it doesn't seem to apply on second
thought. Revised patch attached, which is very simple based on compile time
check.

Regards
MauMau

Attachments:

remove_tblspc_symlink_v2.patchapplication/octet-stream; name=remove_tblspc_symlink_v2.patchDownload
diff -rpcd a/src/backend/commands/tablespace.c b/src/backend/commands/tablespace.c
*** a/src/backend/commands/tablespace.c	2013-10-04 13:17:03.000000000 +0900
--- b/src/backend/commands/tablespace.c	2013-10-31 23:01:35.228000000 +0900
*************** create_tablespace_directories(const char
*** 606,614 ****
--- 606,619 ----
  	}
  
  	/* Remove old symlink in recovery, in case it points to the wrong place */
+ 	/* On Windows, lstat() reports junction points as directories */
  	if (InRecovery)
  	{
+ #ifdef WIN32
+ 		if (rmdir(linkloc) < 0 && errno != ENOENT)
+ #else
  		if (unlink(linkloc) < 0 && errno != ENOENT)
+ #endif
  			ereport(ERROR,
  					(errcode_for_file_access(),
  					 errmsg("could not remove symbolic link \"%s\": %m",
#4MauMau
maumau307@gmail.com
In reply to: MauMau (#3)
1 attachment(s)
Re: [bug fix] PostgreSQL fails to start on Windows if it crashes after tablespace creation

From: "MauMau" <maumau307@gmail.com>

I thought the same situation could happen as in
destroy_tablespace_directories(), but it doesn't seem to apply on second
thought. Revised patch attached, which is very simple based on compile
time
check.

Sorry, I was careless to leave an old comment. Please use this version.

Regards
MauMau

Attachments:

remove_tblspc_symlink_v3.patchapplication/octet-stream; name=remove_tblspc_symlink_v3.patchDownload
diff -rpcd a/src/backend/commands/tablespace.c b/src/backend/commands/tablespace.c
*** a/src/backend/commands/tablespace.c	2013-10-04 13:17:03.000000000 +0900
--- b/src/backend/commands/tablespace.c	2013-10-31 23:56:50.223000000 +0900
*************** create_tablespace_directories(const char
*** 606,614 ****
--- 606,619 ----
  	}
  
  	/* Remove old symlink in recovery, in case it points to the wrong place */
+ 	/* On Windows, junction points are directories */
  	if (InRecovery)
  	{
+ #ifdef WIN32
+ 		if (rmdir(linkloc) < 0 && errno != ENOENT)
+ #else
  		if (unlink(linkloc) < 0 && errno != ENOENT)
+ #endif
  			ereport(ERROR,
  					(errcode_for_file_access(),
  					 errmsg("could not remove symbolic link \"%s\": %m",
#5Asif Naeem
anaeem.it@gmail.com
In reply to: MauMau (#4)
Re: [bug fix] PostgreSQL fails to start on Windows if it crashes after tablespace creation

I did worked on testing the patch on Windows, test scenario mentioned above
thread is reproducible and the provided patch resolve the issue. In case of
junction or directory unlink function
(deprecated<http://msdn.microsoft.com/en-us/library/ms235350.aspx&gt;)
returns -1 with errno “EACCES” (i.e. Permission denied). As you have
followed destroy_tablespace_directories() function, Is there any specific
reason not to use same logic to detect type of the file/link i.e.
“(lstat(linkloc, &st) == 0 && S_ISDIR(st.st_mode))”, It also seems have
more appropriate error message i.e.

src/backend/commands/tablespace.c

….
745 /*
746 * Try to remove the symlink. We must however deal with
the possibility
747 * that it's a directory instead of a symlink --- this
could happen during
748 * WAL replay (see TablespaceCreateDbspace), and it is
also the case on
749 * Windows where junction points lstat() as directories.
750 *
751 * Note: in the redo case, we'll return true if this final
step fails;
752 * there's no point in retrying it. Also, ENOENT should
provoke no more
753 * than a warning.
754 */
755 remove_symlink:
756 linkloc = pstrdup(linkloc_with_version_dir);
757 get_parent_directory(linkloc);
758 if (lstat(linkloc, &st) == 0 && S_ISDIR(st.st_mode))
759 {
760 if (rmdir(linkloc) < 0)
761 ereport(redo ? LOG : ERROR,
762 (errcode_for_file_access(),
763 errmsg("could not remove
directory \"%s\": %m",
764 linkloc)));
765 }
766 else
767 {
768 if (unlink(linkloc) < 0)
769 ereport(redo ? LOG : (errno == ENOENT ?
WARNING : ERROR),
770 (errcode_for_file_access(),
771 errmsg("could not remove
symbolic link \"%s\": %m",
772 linkloc)));
773 }
….

Other than this patch looks good to me.

Regards,
Muhammad Asif Naeem

On Thu, Oct 31, 2013 at 8:03 PM, MauMau <maumau307@gmail.com> wrote:

Show quoted text

From: "MauMau" <maumau307@gmail.com>

I thought the same situation could happen as in

destroy_tablespace_directories(), but it doesn't seem to apply on second
thought. Revised patch attached, which is very simple based on compile
time
check.

Sorry, I was careless to leave an old comment. Please use this version.

Regards
MauMau

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#6MauMau
maumau307@gmail.com
In reply to: Asif Naeem (#5)
Re: [bug fix] PostgreSQL fails to start on Windows if it crashes after tablespace creation

From: "Asif Naeem" <anaeem.it@gmail.com>

As you have

followed destroy_tablespace_directories() function, Is there any specific
reason not to use same logic to detect type of the file/link i.e.
�(lstat(linkloc, &st) == 0 && S_ISDIR(st.st_mode))�, It also seems have
more appropriate error message i.e.

Thanks for reviewing and testing the patch. Yes, at first I did what you
mentioned, but modified the patch according to some advice in the mail
thread. During redo, create_tablespace_directories() needs to handle the
case where the $PGDATA/pg_tblspc/xxx is not a symlink but a directory even
on UNIX/Linux. Please see TablespaceCreateDbspace is().
destroy_tablespace_directories() doesn't have to handle such situation.

Regards
MauMau

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#7Asif Naeem
anaeem.it@gmail.com
In reply to: MauMau (#6)
Re: [bug fix] PostgreSQL fails to start on Windows if it crashes after tablespace creation

Hi MauMau,

Ah. Sorry, I missed that part. As NTFS junctions and symbolic links are
different (although they behave similarly), there seems only a minor
inconvenience related to misleading error message i.e.

+ #ifdef WIN32

+  if (rmdir(linkloc) < 0 && errno != ENOENT)
+ #else
if (unlink(linkloc) < 0 && errno != ENOENT)
+ #endif
ereport(ERROR,
(errcode_for_file_access(),
errmsg("could not remove symbolic link \"%s\": %m",

What is your opinion about it, Is it not worth changing ? . Thanks.

On Wed, Jan 15, 2014 at 7:42 PM, MauMau <maumau307@gmail.com> wrote:

Show quoted text

From: "Asif Naeem" <anaeem.it@gmail.com>

As you have

followed destroy_tablespace_directories() function, Is there any specific
reason not to use same logic to detect type of the file/link i.e.
“(lstat(linkloc, &st) == 0 && S_ISDIR(st.st_mode))”, It also seems have
more appropriate error message i.e.

Thanks for reviewing and testing the patch. Yes, at first I did what you
mentioned, but modified the patch according to some advice in the mail
thread. During redo, create_tablespace_directories() needs to handle the
case where the $PGDATA/pg_tblspc/xxx is not a symlink but a directory even
on UNIX/Linux. Please see TablespaceCreateDbspace is(). destroy_tablespace_directories()
doesn't have to handle such situation.

Regards
MauMau

#8MauMau
maumau307@gmail.com
In reply to: MauMau (#6)
1 attachment(s)
Re: [bug fix] PostgreSQL fails to start on Windows if it crashes after tablespace creation

I'm sorry, I'm replying to an older mail, because I lost your latest mail by
mistake.

Ah. Sorry, I missed that part. As NTFS junctions and symbolic links are
different (although they behave similarly), there seems only a minor
inconvenience related to misleading error message i.e.

You are right. Fixed.

Regards
MauMau

Attachments:

remove_tblspc_symlink_v4.patchapplication/octet-stream; name=remove_tblspc_symlink_v4.patchDownload
diff -rpcd a/src/backend/commands/tablespace.c b/src/backend/commands/tablespace.c
*** a/src/backend/commands/tablespace.c	2014-02-03 09:17:02.000000000 +0900
--- b/src/backend/commands/tablespace.c	2014-02-04 09:05:50.000000000 +0900
*************** create_tablespace_directories(const char
*** 621,633 ****
--- 621,642 ----
  	}
  
  	/* Remove old symlink in recovery, in case it points to the wrong place */
+ 	/* On Windows, lstat() reports junction points as directories */
  	if (InRecovery)
  	{
+ #ifdef WIN32
+ 		if (rmdir(linkloc) < 0 && errno != ENOENT)
+ 			ereport(ERROR,
+ 					(errcode_for_file_access(),
+ 					 errmsg("could not remove directory \"%s\": %m",
+ 							linkloc)));
+ #else
  		if (unlink(linkloc) < 0 && errno != ENOENT)
  			ereport(ERROR,
  					(errcode_for_file_access(),
  					 errmsg("could not remove symbolic link \"%s\": %m",
  							linkloc)));
+ #endif
  	}
  
  	/*
#9Amit Kapila
amit.kapila16@gmail.com
In reply to: MauMau (#6)
Re: [bug fix] PostgreSQL fails to start on Windows if it crashes after tablespace creation

On Wed, Jan 15, 2014 at 8:12 PM, MauMau <maumau307@gmail.com> wrote:

From: "Asif Naeem" <anaeem.it@gmail.com>

As you have

followed destroy_tablespace_directories() function, Is there any specific
reason not to use same logic to detect type of the file/link i.e.
"(lstat(linkloc, &st) == 0 && S_ISDIR(st.st_mode))", It also seems have
more appropriate error message i.e.

Thanks for reviewing and testing the patch. Yes, at first I did what you
mentioned, but modified the patch according to some advice in the mail
thread. During redo, create_tablespace_directories() needs to handle the
case where the $PGDATA/pg_tblspc/xxx is not a symlink but a directory even
on UNIX/Linux. Please see TablespaceCreateDbspace is().
destroy_tablespace_directories() doesn't have to handle such situation.

If create_tablespace_directories() needs to handle with directory both on
Windows/Linux, then shouldn't it be a runtime check as in your first
version rather than compile time check?
Also isn't that the reason why destroy_tablespace_directories() have similar
check?

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#10MauMau
maumau307@gmail.com
In reply to: Amit Kapila (#9)
Re: [bug fix] PostgreSQL fails to start on Windows if it crashes after tablespace creation

From: "Amit Kapila" <amit.kapila16@gmail.com>

On Wed, Jan 15, 2014 at 8:12 PM, MauMau <maumau307@gmail.com> wrote:

Thanks for reviewing and testing the patch. Yes, at first I did what you
mentioned, but modified the patch according to some advice in the mail
thread. During redo, create_tablespace_directories() needs to handle the
case where the $PGDATA/pg_tblspc/xxx is not a symlink but a directory
even
on UNIX/Linux. Please see TablespaceCreateDbspace is().
destroy_tablespace_directories() doesn't have to handle such situation.

If create_tablespace_directories() needs to handle with directory both on
Windows/Linux, then shouldn't it be a runtime check as in your first
version rather than compile time check?
Also isn't that the reason why destroy_tablespace_directories() have
similar
check?

I see..., and you are correct. The first version of my patch should be the
right fix. It seems that my head went somewhere when I submitted the second
revision.

What should I do? Should I re-submit the first revision as the latest fifth
revision and link the email from the CommitFest newest entry?

Regards
MauMau

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#11Amit Kapila
amit.kapila16@gmail.com
In reply to: MauMau (#10)
Re: [bug fix] PostgreSQL fails to start on Windows if it crashes after tablespace creation

On Fri, Mar 21, 2014 at 12:24 PM, MauMau <maumau307@gmail.com> wrote:

From: "Amit Kapila" <amit.kapila16@gmail.com>

If create_tablespace_directories() needs to handle with directory both on
Windows/Linux, then shouldn't it be a runtime check as in your first
version rather than compile time check?
Also isn't that the reason why destroy_tablespace_directories() have
similar
check?

I see..., and you are correct. The first version of my patch should be the
right fix. It seems that my head went somewhere when I submitted the second
revision.

What should I do? Should I re-submit the first revision as the latest fifth
revision and link the email from the CommitFest newest entry?

The comments in your first version needs to be improved, as there
you just mentioned a Windows specific comment:
+ /* On Windows, lstat()

I think you can change comments (make it somewhat similar to
destroy_tablespace_directories) and then submit it as a new version.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#12MauMau
maumau307@gmail.com
In reply to: Amit Kapila (#11)
1 attachment(s)
Re: [bug fix] PostgreSQL fails to start on Windows if it crashes after tablespace creation

From: "Amit Kapila" <amit.kapila16@gmail.com>

The comments in your first version needs to be improved, as there
you just mentioned a Windows specific comment:
+ /* On Windows, lstat()

I think you can change comments (make it somewhat similar to
destroy_tablespace_directories) and then submit it as a new version.

OK, done. Please find the attached patch. I also rebased the patch to
HEAD.

I'll update the CommitFest entry soon.
Regards
MauMau

Attachments:

remove_tblspc_symlink_v5.patchapplication/octet-stream; name=remove_tblspc_symlink_v5.patchDownload
diff -rpcd a/src/backend/commands/tablespace.c b/src/backend/commands/tablespace.c
*** a/src/backend/commands/tablespace.c	2014-03-21 05:17:03.000000000 +0900
--- b/src/backend/commands/tablespace.c	2014-03-21 17:26:38.348000000 +0900
*************** create_tablespace_directories(const char
*** 559,564 ****
--- 559,565 ----
  {
  	char	   *linkloc;
  	char	   *location_with_version_dir;
+ 	struct stat st;
  
  	linkloc = psprintf("pg_tblspc/%u", tablespaceoid);
  	location_with_version_dir = psprintf("%s/%s", location,
*************** create_tablespace_directories(const char
*** 620,633 ****
  							location_with_version_dir)));
  	}
  
! 	/* Remove old symlink in recovery, in case it points to the wrong place */
  	if (InRecovery)
  	{
! 		if (unlink(linkloc) < 0 && errno != ENOENT)
! 			ereport(ERROR,
! 					(errcode_for_file_access(),
! 					 errmsg("could not remove symbolic link \"%s\": %m",
! 							linkloc)));
  	}
  
  	/*
--- 621,651 ----
  							location_with_version_dir)));
  	}
  
! 	/*
! 	 * Remove old symlink in recovery, in case it points to the wrong place.
! 	 * We must however deal with the possibility that it's a directory
! 	 * instead of a symlink --- this could happen during WAL replay, and
! 	 * it is also the case on Windows where junction points lstat() as
! 	 * directories.
! 	 */
  	if (InRecovery)
  	{
! 		if (lstat(linkloc, &st) == 0 && S_ISDIR(st.st_mode))
! 		{
! 			if (rmdir(linkloc) < 0)
! 				ereport(ERROR,
! 						(errcode_for_file_access(),
! 						 errmsg("could not remove directory \"%s\": %m",
! 								linkloc)));
! 		}
! 		else
! 		{
! 			if (unlink(linkloc) < 0 && errno != ENOENT)
! 				ereport(ERROR,
! 						(errcode_for_file_access(),
! 						 errmsg("could not remove symbolic link \"%s\": %m",
! 								linkloc)));
! 		}
  	}
  
  	/*
#13Amit Kapila
amit.kapila16@gmail.com
In reply to: MauMau (#10)
Re: [bug fix] PostgreSQL fails to start on Windows if it crashes after tablespace creation

On Fri, Mar 21, 2014 at 12:24 PM, MauMau <maumau307@gmail.com> wrote:

From: "Amit Kapila" <amit.kapila16@gmail.com>

On Wed, Jan 15, 2014 at 8:12 PM, MauMau <maumau307@gmail.com> wrote:

Thanks for reviewing and testing the patch. Yes, at first I did what you
mentioned, but modified the patch according to some advice in the mail
thread. During redo, create_tablespace_directories() needs to handle the
case where the $PGDATA/pg_tblspc/xxx is not a symlink but a directory
even
on UNIX/Linux. Please see TablespaceCreateDbspace is().
destroy_tablespace_directories() doesn't have to handle such situation.

If create_tablespace_directories() needs to handle with directory both on
Windows/Linux, then shouldn't it be a runtime check as in your first
version rather than compile time check?
Also isn't that the reason why destroy_tablespace_directories() have
similar
check?

I see..., and you are correct.

I have thought about the above point again and it seems to me that
the above claim (create_tablespace_directories() needs to handle a
path which is not a symlink but directory) might not be true.
For Example, I could easily think of case where it is required for
destroy_tablespace_directories().

1. Assume a tablespace tbs already exists.
2. Create table t1(c1 int) tablespace tbs;
3. drop table t1;
4. Drop tablespace tbs;
5. Do immediate shutdown (pg_ctl stop -mi);
6. During recovery it will create a table in directory (in function
TablespaceCreateDbspace) which needs to be removed by
destroy_tablespace_directories().

I am neither aware of, nor could think of such a case for
create_tablespace_directories(). Do you have any such case in mind
which I might be missing?

By saying above, I don't mean that your current patch has any
problem; even if there is no such scenario, I think your code is
right as stat/isdir check seems to be okay to identify junction
points and it avoids ifdef WIN32 check (which I personally think
is bit annoying and we should try to avoid such code unless it
is must or provides any significant advantage).

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#14MauMau
maumau307@gmail.com
In reply to: Amit Kapila (#13)
Re: [bug fix] PostgreSQL fails to start on Windows if it crashes after tablespace creation

From: "Amit Kapila" <amit.kapila16@gmail.com>

1. Assume a tablespace tbs already exists.
2. Create table t1(c1 int) tablespace tbs;
3. drop table t1;
4. Drop tablespace tbs;
5. Do immediate shutdown (pg_ctl stop -mi);
6. During recovery it will create a table in directory (in function
TablespaceCreateDbspace) which needs to be removed by
destroy_tablespace_directories().

I am neither aware of, nor could think of such a case for
create_tablespace_directories(). Do you have any such case in mind
which I might be missing?

A bit contrived example is:

1. After the directory is created by TablespaceCreateDbspace(), recovery is
stopped (e.g. due to power outage). The directory remains.
2. Restart the server, redoing CREATE TABLESPACE during recovery, which
executes create_tablespace_directories().

By saying above, I don't mean that your current patch has any
problem; even if there is no such scenario, I think your code is
right as stat/isdir check seems to be okay to identify junction
points and it avoids ifdef WIN32 check (which I personally think
is bit annoying and we should try to avoid such code unless it
is must or provides any significant advantage).

I think so, too.

Regards
MauMau

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#15Amit Kapila
amit.kapila16@gmail.com
In reply to: MauMau (#14)
Re: [bug fix] PostgreSQL fails to start on Windows if it crashes after tablespace creation

On Mon, Mar 24, 2014 at 7:49 PM, MauMau <maumau307@gmail.com> wrote:

A bit contrived example is:

1. After the directory is created by TablespaceCreateDbspace(), recovery is
stopped (e.g. due to power outage). The directory remains.
2. Restart the server, redoing CREATE TABLESPACE during recovery, which
executes create_tablespace_directories().

I don't understand how after step-1, step-2 can occur in recovery for same
tablespace path.

Function TablespaceCreateDbspace() would have called for CREATE TABLE.
Now Step-2 can only occur if there is a Drop Tablespace command in-between
step-1 and step-2, else CREATE TABLESPACE can't be successful during
command execution so will not get recorded in WAL. Basically Create Table
cannot happen on a particular directory without having some
CREATE TABLESPACE before it, so in the above example taken by you,
there must be some Create TableSpace before step-1 or it's on default
tablespace location which means you cannot perform step-2 for same
tablespace path as step-1 without having DROP TABLESPACE in-between
step-1 and step-2.

If you think that above scenario is not possible, then you just need to
modify comment:
"! * Remove old symlink in recovery...."

One more minor point about patch:
+ struct stat st;

if (InRecovery)
{
struct stat st;

Defining stat struct two times in same function in different ways doesn't
seem to be good, we can do the same way for new usage as is already
done in code or may be declare it once.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#16MauMau
maumau307@gmail.com
In reply to: Amit Kapila (#15)
1 attachment(s)
Re: [bug fix] PostgreSQL fails to start on Windows if it crashes after tablespace creation

From: "Amit Kapila" <amit.kapila16@gmail.com>

If you think that above scenario is not possible, then you just need to
modify comment:
"! * Remove old symlink in recovery...."

Hm, my scenario seems impossible. I reverted the comment.

One more minor point about patch:
+ struct stat st;

if (InRecovery)
{
struct stat st;

Defining stat struct two times in same function in different ways doesn't
seem to be good, we can do the same way for new usage as is already
done in code or may be declare it once.

OK, I removed the second (existing) definition of st.

Regards
MauMau

Attachments:

remove_tblspc_symlink_v6.patchapplication/octet-stream; name=remove_tblspc_symlink_v6.patchDownload
diff -rpcd a/src/backend/commands/tablespace.c b/src/backend/commands/tablespace.c
*** a/src/backend/commands/tablespace.c	2014-03-21 05:17:03.000000000 +0900
--- b/src/backend/commands/tablespace.c	2014-03-25 20:58:11.758000000 +0900
*************** create_tablespace_directories(const char
*** 559,564 ****
--- 559,565 ----
  {
  	char	   *linkloc;
  	char	   *location_with_version_dir;
+ 	struct stat st;
  
  	linkloc = psprintf("pg_tblspc/%u", tablespaceoid);
  	location_with_version_dir = psprintf("%s/%s", location,
*************** create_tablespace_directories(const char
*** 585,592 ****
  
  	if (InRecovery)
  	{
- 		struct stat st;
- 
  		/*
  		 * Our theory for replaying a CREATE is to forcibly drop the target
  		 * subdirectory if present, and then recreate it. This may be more
--- 586,591 ----
*************** create_tablespace_directories(const char
*** 620,633 ****
  							location_with_version_dir)));
  	}
  
! 	/* Remove old symlink in recovery, in case it points to the wrong place */
  	if (InRecovery)
  	{
! 		if (unlink(linkloc) < 0 && errno != ENOENT)
! 			ereport(ERROR,
! 					(errcode_for_file_access(),
! 					 errmsg("could not remove symbolic link \"%s\": %m",
! 							linkloc)));
  	}
  
  	/*
--- 619,646 ----
  							location_with_version_dir)));
  	}
  
! 	/*
! 	 * Remove old symlink in recovery, in case it points to the wrong place.
! 	 * On Windows, lstat() reports junction points as directories.
! 	 */
  	if (InRecovery)
  	{
! 		if (lstat(linkloc, &st) == 0 && S_ISDIR(st.st_mode))
! 		{
! 			if (rmdir(linkloc) < 0)
! 				ereport(ERROR,
! 						(errcode_for_file_access(),
! 						 errmsg("could not remove directory \"%s\": %m",
! 								linkloc)));
! 		}
! 		else
! 		{
! 			if (unlink(linkloc) < 0 && errno != ENOENT)
! 				ereport(ERROR,
! 						(errcode_for_file_access(),
! 						 errmsg("could not remove symbolic link \"%s\": %m",
! 								linkloc)));
! 		}
  	}
  
  	/*
#17Amit Kapila
amit.kapila16@gmail.com
In reply to: MauMau (#16)
Re: [bug fix] PostgreSQL fails to start on Windows if it crashes after tablespace creation

On Tue, Mar 25, 2014 at 5:39 PM, MauMau <maumau307@gmail.com> wrote:

OK, I removed the second (existing) definition of st.

This patch version looks fine, I have verified the issue, regression tests
passed.

I had attached the latest version of patch in CF app and marked it as
"Ready For Committer".

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#18Andres Freund
andres@2ndquadrant.com
In reply to: MauMau (#16)
Re: [bug fix] PostgreSQL fails to start on Windows if it crashes after tablespace creation

On 2014-03-25 21:09:13 +0900, MauMau wrote:

! /*
! * Remove old symlink in recovery, in case it points to the wrong place.
! * On Windows, lstat() reports junction points as directories.
! */
if (InRecovery)
{
! if (lstat(linkloc, &st) == 0 && S_ISDIR(st.st_mode))
! {
! if (rmdir(linkloc) < 0)
! ereport(ERROR,
! (errcode_for_file_access(),
! errmsg("could not remove directory \"%s\": %m",
! linkloc)));
! }
! else
! {
! if (unlink(linkloc) < 0 && errno != ENOENT)
! ereport(ERROR,
! (errcode_for_file_access(),
! errmsg("could not remove symbolic link \"%s\": %m",
! linkloc)));
! }
}

if (..)
...
else
{
if (...)
...
}

is pretty odd code layout.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#19Tom Lane
tgl@sss.pgh.pa.us
In reply to: MauMau (#16)
Re: [bug fix] PostgreSQL fails to start on Windows if it crashes after tablespace creation

"MauMau" <maumau307@gmail.com> writes:

[ remove_tblspc_symlink_v6.patch ]

Committed, thanks.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers