Path case sensitivity on windows

Started by Magnus Hagander almost 17 years ago, 7 messages
#1Magnus Hagander
magnus@hagander.net
1 attachment(s)

Bug #4694
(http://archives.postgresql.org/message-id/200903050848.n258mVgm046178@wwwmaster.postgresql.org)
shows very strange behaviour on Windows when you use a differently-cased PATH.

From what I can tell, this is because dir_strcmp() is case sensitive,
but paths on windows are really case-insensitive.

Attached patch fixes this in my testcase. Can anybody spot something
wrong with it? If not, I'll apply once I've finished my test runs :-)

//Magnus

Attachments:

path_case.patch (text/x-diff)
diff --git a/src/port/path.c b/src/port/path.c
index 708306d..d7bd353 100644
--- a/src/port/path.c
+++ b/src/port/path.c
@@ -427,7 +427,12 @@ dir_strcmp(const char *s1, const char *s2)
 {
 	while (*s1 && *s2)
 	{
+#ifndef WIN32
 		if (*s1 != *s2 &&
+#else
+			/* On windows, paths are case-insensitive */
+		if (tolower(*s1) != tolower(*s2) &&
+#endif
 			!(IS_DIR_SEP(*s1) && IS_DIR_SEP(*s2)))
 			return (int) *s1 - (int) *s2;
 		s1++, s2++;
#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Magnus Hagander (#1)
Re: Path case sensitivity on windows

Magnus Hagander <magnus@hagander.net> writes:

Attached patch fixes this in my testcase. Can anybody spot something
wrong with it?

It depends on tolower(), which is going to have LC_CTYPE-dependent
behavior, which is surely wrong?

regards, tom lane

#3Magnus Hagander
magnus@hagander.net
In reply to: Tom Lane (#2)
Re: Path case sensitivity on windows

Tom Lane wrote:

Magnus Hagander <magnus@hagander.net> writes:

Attached patch fixes this in my testcase. Can anybody spot something
wrong with it?

It depends on tolower(), which is going to have LC_CTYPE-dependent
behavior, which is surely wrong?

Not sure, really :) That's the encoding we'd get the paths in in the
first place, is it not?

Or are you just saying we should be using pg_tolower()? (which I forgot
about yet again)

//Magnus

#4Tom Lane
tgl@sss.pgh.pa.us
In reply to: Magnus Hagander (#3)
Re: Path case sensitivity on windows

Magnus Hagander <magnus@hagander.net> writes:

Tom Lane wrote:

It depends on tolower(), which is going to have LC_CTYPE-dependent
behavior, which is surely wrong?

Or are you just saying we should be using pg_tolower()? (which I forgot
about yet again)

Well, I'd be happier with pg_tolower, because I know what it does.
But the real question here is what does "case insensitivity" on
file names actually mean in Windows --- ie, what happens to non-ASCII
letters?

regards, tom lane
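
To make the LC_CTYPE concern concrete, here is a minimal sketch (not code from the patch) that contrasts tolower() with an ASCII-only fold. The names ascii_lower() and tolower_in_locale() are hypothetical helpers invented for this illustration; ascii_lower() is only a stand-in for the "I know what it does" behaviour, not a copy of pg_tolower().

```c
#include <ctype.h>
#include <locale.h>
#include <stdio.h>

/* Hypothetical helper: folds only ASCII 'A'..'Z', whatever the locale is. */
unsigned char
ascii_lower(unsigned char ch)
{
	if (ch >= 'A' && ch <= 'Z')
		ch += 'a' - 'A';
	return ch;
}

/*
 * Hypothetical helper: lower-case one byte with tolower() under the named
 * LC_CTYPE locale, then restore the previous locale.  Returns -1 if the
 * requested locale is not available on this system.
 */
int
tolower_in_locale(const char *locale_name, unsigned char ch)
{
	char		saved[256];
	const char *current = setlocale(LC_CTYPE, NULL);
	int			result;

	/* Copy the name: later setlocale() calls may overwrite the returned string. */
	snprintf(saved, sizeof(saved), "%s", current ? current : "C");

	if (setlocale(LC_CTYPE, locale_name) == NULL)
		return -1;

	result = tolower(ch);
	setlocale(LC_CTYPE, saved);
	return result;
}
```

In the "C" locale, tolower() leaves the byte 0xC4 alone, exactly as ascii_lower() does. Under a single-byte locale where 0xC4 is an upper-case letter, tolower_in_locale() may return a different byte, so the same comparison could give different answers depending on how the process was started.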

#5Magnus Hagander
magnus@hagander.net
In reply to: Tom Lane (#4)
Re: Path case sensitivity on windows

Tom Lane wrote:

Magnus Hagander <magnus@hagander.net> writes:

Tom Lane wrote:

It depends on tolower(), which is going to have LC_CTYPE-dependent
behavior, which is surely wrong?

Or are you just saying we should be using pg_tolower()? (which I forgot
about yet again)

Well, I'd be happier with pg_tolower, because I know what it does.
But the real question here is what does "case insensitivity" on
file names actually mean in Windows --- ie, what happens to non-ASCII
letters?

The filesystem itself is UTF-16. I would assume the "system default"
locale controls the case insensitivity, but I'm not sure about that.

Reading up some, it seems the collation is actually stored in a hidden
file on the NTFS volume... It seems to differ between different versions
of windows from what I can tell, but since this is written to the fs,
it's ok.

I have not found a way to actually *get* the locale... Or even to compare
two filenames. There is a function called GetFullPathName(), but I'm not
sure how to use it for this.

However. I don't think it's really critical that we deal with all corner
cases for this. It's not likely that the user would be using any really
weird locale-specific combinations *differently* in the PATH variable vs
the commandline, or something like that...

And this only shows up when the binary is found in the PATH and not
through a fully specified directory. This is, AFAICT, the only case
where they can differ. This is the reason why we haven't had any reports
of this before - nobody using the installer, or doing even a "normal
style" install would ever end up in this situation.

//Magnus

#6Tom Lane
tgl@sss.pgh.pa.us
In reply to: Magnus Hagander (#5)
Re: Path case sensitivity on windows

Magnus Hagander <magnus@hagander.net> writes:

And this only shows up when the binary is found in the PATH and not
through a fully specified directory. This is, AFAICT, the only case
where they can differ. This is the reason why we haven't had any reports
of this before - nobody using the installer, or doing even a "normal
style" install would ever end up in this situation.

Hmm. Well, if we use pg_tolower then it will only do the right thing
for ASCII letters, but it seems like non-ASCII in the path leading to
the postgres binaries would be pretty dang unusual. (And I am not
convinced tolower() would get it right either --- it certainly won't
if the encoding is multibyte.)

On balance I'd suggest just using pg_tolower and figuring it's close
enough.

regards, tom lane
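
For reference, the suggestion above could look roughly like the following self-contained sketch. It is an illustration, not the committed change: ascii_tolower() is a hypothetical stand-in for pg_tolower(), dir_strcmp_ci() is a hypothetical name, IS_DIR_SEP is redefined locally so the example compiles on its own, and the handling of unequal lengths after the loop is an assumption because the posted hunk is cut off before that point.

```c
/* Local stand-in: accept both separators, as one would on Windows. */
#define IS_DIR_SEP(ch) ((ch) == '/' || (ch) == '\\')

/* Hypothetical ASCII-only fold; bytes outside 'A'..'Z' pass through unchanged. */
unsigned char
ascii_tolower(unsigned char ch)
{
	if (ch >= 'A' && ch <= 'Z')
		ch += 'a' - 'A';
	return ch;
}

/*
 * Compare two directory names, ignoring ASCII case and treating any two
 * directory separators as equal.  Returns 0 when the names match.
 */
int
dir_strcmp_ci(const char *s1, const char *s2)
{
	while (*s1 && *s2)
	{
		if (ascii_tolower((unsigned char) *s1) != ascii_tolower((unsigned char) *s2) &&
			!(IS_DIR_SEP(*s1) && IS_DIR_SEP(*s2)))
			return (int) *s1 - (int) *s2;
		s1++, s2++;
	}
	if (*s1)
		return 1;				/* s1 is longer (assumed tail handling) */
	if (*s2)
		return -1;				/* s2 is longer (assumed tail handling) */
	return 0;
}
```

With this sketch, "C:\PGSQL\bin" and "c:/pgsql/BIN" compare as equal, while paths that differ in anything other than ASCII case or separator style still compare as different.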

#7Peter Eisentraut
peter_e@gmx.net
In reply to: Tom Lane (#6)
Re: Path case sensitivity on windows

On Thursday 02 April 2009 18:29:45 Tom Lane wrote:

Hmm. Well, if we use pg_tolower then it will only do the right thing
for ASCII letters, but it seems like non-ASCII in the path leading to
the postgres binaries would be pretty dang unusual.

Well, Windows localizes the directory names like C:\Program Files, so it is
entirely plausible to have non-ASCII path names across the board in certain
locales.
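
The limitation being discussed can be shown with a small sketch. It assumes the paths arrive as UTF-8 bytes, which is an assumption made for this example only, and ascii_ci_equal() is a hypothetical helper. An ASCII-only, byte-wise fold matches "Program Files" in any case, but it cannot see that "Ü" (bytes 0xC3 0x9C) and "ü" (bytes 0xC3 0xBC) are the same letter.

```c
#include <stdbool.h>

/* Hypothetical ASCII-only fold applied to a single byte. */
unsigned char
fold_ascii_byte(unsigned char ch)
{
	if (ch >= 'A' && ch <= 'Z')
		ch += 'a' - 'A';
	return ch;
}

/*
 * Hypothetical helper: byte-wise comparison that ignores ASCII case only.
 * Multibyte sequences are compared byte for byte, so non-ASCII letters
 * that differ in case are reported as different.
 */
bool
ascii_ci_equal(const char *s1, const char *s2)
{
	while (*s1 && *s2)
	{
		if (fold_ascii_byte((unsigned char) *s1) != fold_ascii_byte((unsigned char) *s2))
			return false;
		s1++, s2++;
	}
	/* Equal only if both strings ended at the same position. */
	return *s1 == '\0' && *s2 == '\0';
}
```

Here ascii_ci_equal("C:\\Program Files", "c:\\PROGRAM FILES") is true, whereas a made-up directory such as "C:\\B" "\xc3\x9c" "RO" does not match "c:\\b" "\xc3\xbc" "ro". Whether that gap matters in practice is exactly the judgment call made in this thread.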