Path case sensitivity on windows
Bug #4694
(http://archives.postgresql.org/message-id/200903050848.n258mVgm046178@wwwmaster.postgresql.org)
shows a very strange behaviour on Windows when you use a differently cased PATH.
From what I can tell, this is because dir_strcmp() is case sensitive,
but paths on Windows are really case-insensitive.
Attached patch fixes this in my testcase. Can anybody spot something
wrong with it? If not, I'll apply once I've finished my test runs :-)
//Magnus
Attachment: path_case.patch (text/x-diff)
diff --git a/src/port/path.c b/src/port/path.c
index 708306d..d7bd353 100644
--- a/src/port/path.c
+++ b/src/port/path.c
@@ -427,7 +427,12 @@ dir_strcmp(const char *s1, const char *s2)
{
while (*s1 && *s2)
{
+#ifndef WIN32
if (*s1 != *s2 &&
+#else
+ /* On windows, paths are case-insensitive */
+ if (tolower(*s1) != tolower(*s2) &&
+#endif
!(IS_DIR_SEP(*s1) && IS_DIR_SEP(*s2)))
return (int) *s1 - (int) *s2;
s1++, s2++;
Magnus Hagander <magnus@hagander.net> writes:
Attached patch fixes this in my testcase. Can anybody spot something
wrong with it?
It depends on tolower(), which is going to have LC_CTYPE-dependent
behavior, which is surely wrong?
regards, tom lane
Tom Lane wrote:
Magnus Hagander <magnus@hagander.net> writes:
Attached patch fixes this in my testcase. Can anybody spot something
wrong with it?
It depends on tolower(), which is going to have LC_CTYPE-dependent
behavior, which is surely wrong?
Not sure, really :) That's the encoding we'd get the paths in in the
first place, is it not?
Or are you just saying we should be using pg_tolower()? (which I forgot
about yet again)
//Magnus
Magnus Hagander <magnus@hagander.net> writes:
Tom Lane wrote:
It depends on tolower(), which is going to have LC_CTYPE-dependent
behavior, which is surely wrong?
Or are you just saying we should be using pg_tolower()? (which I forgot
about yet again)
Well, I'd be happier with pg_tolower, because I know what it does.
But the real question here is what does "case insensitivity" on
file names actually mean in Windows --- ie, what happens to non-ASCII
letters?
regards, tom lane
Tom Lane wrote:
Magnus Hagander <magnus@hagander.net> writes:
Tom Lane wrote:
It depends on tolower(), which is going to have LC_CTYPE-dependent
behavior, which is surely wrong?
Or are you just saying we should be using pg_tolower()? (which I forgot
about yet again)
Well, I'd be happier with pg_tolower, because I know what it does.
But the real question here is what does "case insensitivity" on
file names actually mean in Windows --- ie, what happens to non-ASCII
letters?
The filesystem itself is UTF-16. I would assume the "system default"
locale controls the case insensitivity, but I'm not sure about that.
Reading up some, it seems the collation is actually stored in a hidden
file on the NTFS volume... It seems to differ between different versions
of windows from what I can tell, but since this is written to the fs,
it's ok.
I have not found a way to actually *get* the locale.. Or even to compare
two filenames. There is a function called GetFullPathName(), but I'm not
sure how to use it for this.
However, I don't think it's really critical that we deal with all corner
cases for this. It's not likely that the user would be using any really
weird locale-specific combinations *differently* in the PATH variable vs
the commandline, or something like that...
And this only shows up when the binary is found in the PATH and not
through a fully specified directory. This is, AFAICT, the only case
where they can differ. This is the reason why we haven't had any reports
of this before - nobody using the installer, or doing even a "normal
style" install would ever end up in this situation.
//Magnus
Magnus Hagander <magnus@hagander.net> writes:
And this only shows up when the binary is found in the PATH and not
through a fully specified directory. This is, AFAICT, the only case
where they can differ. This is the reason why we haven't had any reports
of this before - nobody using the installer, or doing even a "normal
style" install would ever end up in this situation.
Hmm. Well, if we use pg_tolower then it will only do the right thing
for ASCII letters, but it seems like non-ASCII in the path leading to
the postgres binaries would be pretty dang unusual. (And I am not
convinced tolower() would get it right either --- it certainly won't
if the encoding is multibyte.)
On balance I'd suggest just using pg_tolower and figuring it's close
enough.
regards, tom lane
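
For illustration, the "close enough" approach suggested above could look roughly like the sketch below. It is not the committed code: ascii_tolower(), is_dir_sep() and dir_strcmp_ci() are hypothetical names standing in for pg_tolower(), IS_DIR_SEP() and dir_strcmp(). The sketch assumes only ASCII letters are folded, so the result does not depend on LC_CTYPE.

```c
#include <assert.h>

/* Hypothetical helper: fold only ASCII 'A'..'Z', independent of LC_CTYPE. */
static unsigned char
ascii_tolower(unsigned char ch)
{
	if (ch >= 'A' && ch <= 'Z')
		ch += 'a' - 'A';
	return ch;
}

/* Assumption: '/' and '\\' are both directory separators, as on Windows. */
static int
is_dir_sep(char ch)
{
	return ch == '/' || ch == '\\';
}

/*
 * Sketch of a directory comparison that ignores ASCII case and treats
 * the two separator characters as equal. Returns 0 on a match.
 */
int
dir_strcmp_ci(const char *s1, const char *s2)
{
	while (*s1 && *s2)
	{
		/* Cast through unsigned char so high-bit bytes are never negative. */
		unsigned char c1 = ascii_tolower((unsigned char) *s1);
		unsigned char c2 = ascii_tolower((unsigned char) *s2);

		if (c1 != c2 && !(is_dir_sep(*s1) && is_dir_sep(*s2)))
			return (int) c1 - (int) c2;
		s1++, s2++;
	}
	if (*s1)
		return 1;				/* s1 is longer */
	if (*s2)
		return -1;				/* s2 is longer */
	return 0;
}

/* Usage examples; the paths are made up for illustration. */
void
dir_strcmp_ci_examples(void)
{
	assert(dir_strcmp_ci("C:\\PgSQL\\bin", "c:/pgsql/BIN") == 0);
	assert(dir_strcmp_ci("C:\\pgsql\\bin", "C:\\pgsql\\lib") != 0);
}
```

Casting to unsigned char before folding also sidesteps the undefined behaviour of passing a negative char value to the ctype functions.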
On Thursday 02 April 2009 18:29:45 Tom Lane wrote:
Hmm. Well, if we use pg_tolower then it will only do the right thing
for ASCII letters, but it seems like non-ASCII in the path leading to
the postgres binaries would be pretty dang unusual.
Well, Windows localizes the directory names like C:\Program Files, so it is
entirely plausible to have non-ASCII path names across the board in certain
locales.
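
To make the concern concrete, here is a minimal sketch of what ASCII-only folding does with a non-ASCII path. It assumes the paths arrive as UTF-8 bytes, which may not match what Windows actually hands to the program. fold_ascii(), ascii_fold_equal() and the "Données" directory name are hypothetical.

```c
#include <assert.h>

/* Hypothetical helper: lower-case ASCII letters only, leave other bytes alone. */
static unsigned char
fold_ascii(unsigned char ch)
{
	if (ch >= 'A' && ch <= 'Z')
		ch += 'a' - 'A';
	return ch;
}

/* Returns 1 if the two strings are equal after ASCII-only case folding. */
int
ascii_fold_equal(const char *a, const char *b)
{
	while (*a && *b)
	{
		if (fold_ascii((unsigned char) *a) != fold_ascii((unsigned char) *b))
			return 0;
		a++, b++;
	}
	/* Equal only if both strings ended at the same point. */
	return *a == *b;
}

void
ascii_fold_examples(void)
{
	/* Pure ASCII: differences in case are ignored. */
	assert(ascii_fold_equal("C:\\Program Files", "c:\\PROGRAM FILES"));

	/*
	 * UTF-8 "Données" vs "DONNÉES": the second byte of the accented
	 * letter differs (0xA9 vs 0x89), so the names do not match even
	 * though a case-insensitive filesystem might treat them as the
	 * same directory.
	 */
	assert(!ascii_fold_equal("C:\\Donn\xC3\xA9" "es", "C:\\DONN\xC3\x89" "ES"));
}
```

So an ASCII-only comparison is safe in the sense that it never reports a false match, but it can miss a match when the non-ASCII part of the path differs in case.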