BUG #18219: libpq does not take into consideration UNICODE define
The following bug has been logged on the website:
Bug reference: 18219
Logged by: Jan Březina
Email address: 2janbrezina@gmail.com
PostgreSQL version: 16.0
Operating system: Windows
Description:
I use libpqxx via vcpkg in Windows project build for x64 system using Visual
Studio. There is a UNICODE define, which causes **W Windows functions to be
used and hence string parameters should be passed in using L"...".
I have sslrootcert file used in my connection string. During the connection
initialization stat() function is invoked for the file. It is implemented in
win32stat.c, which then invokes pgwin32_open_handle() (open.c) and then
initialize_ntdll() (win32ntdll.c) is invoked. There is a use of
LoadLibraryEx which is just a define above LoadLibraryExA or LoadLibraryExW.
Using UNICODE implies LoadLibraryExW to be used, but the parameter passed in
is char*, not wchar_t*. This leads to failed initialization returning error
code 126 - The specified module could not be found.
There should be a switch if UNICODE is defined, then L"ntdll.dll" should be
passed in. The same applies to all other Windows functions consuming
strings.
On Fri, Dec 1, 2023 at 4:50 AM PG Bug reporting form
<noreply@postgresql.org> wrote:
I use libpqxx via vcpkg in Windows project build for x64 system using Visual
Studio. There is a UNICODE define, which causes **W Windows functions to be
used and hence string parameters should be passed in using L"...".
I have sslrootcert file used in my connection string. During the connection
initialization stat() function is invoked for the file. It is implemented in
win32stat.c, which then invokes pgwin32_open_handle() (open.c) and then
initialize_ntdll() (win32ntdll.c) is invoked. There is a use of
LoadLibraryEx which is just a define above LoadLibraryExA or LoadLibraryExW.
Using UNICODE implies LoadLibraryExW to be used, but the parameter passed in
is char*, not wchar_t*. This leads to failed initialization returning error
code 126 - The specified module could not be found.
There should be a switch if UNICODE is defined, then L"ntdll.dll" should be
passed in. The same applies to all other Windows functions consuming
strings.
I don't understand vcpkg so I don't even know where to look, but who
defined UNICODE when building libpq?
I defined UNICODE for the whole project and it's propagated also to the
dependencies. Vcpkg is just the package manager. You can see it's port in
the public repository:
https://github.com/Microsoft/vcpkg/blob/master/ports/libpq/portfile.cmake
Dne čt 30. 11. 2023 20:20 uživatel Thomas Munro <thomas.munro@gmail.com>
napsal:
Show quoted text
On Fri, Dec 1, 2023 at 4:50 AM PG Bug reporting form
<noreply@postgresql.org> wrote:I use libpqxx via vcpkg in Windows project build for x64 system using
Visual
Studio. There is a UNICODE define, which causes **W Windows functions to
be
used and hence string parameters should be passed in using L"...".
I have sslrootcert file used in my connection string. During theconnection
initialization stat() function is invoked for the file. It is
implemented in
win32stat.c, which then invokes pgwin32_open_handle() (open.c) and then
initialize_ntdll() (win32ntdll.c) is invoked. There is a use of
LoadLibraryEx which is just a define above LoadLibraryExA orLoadLibraryExW.
Using UNICODE implies LoadLibraryExW to be used, but the parameter
passed in
is char*, not wchar_t*. This leads to failed initialization returning
error
code 126 - The specified module could not be found.
There should be a switch if UNICODE is defined, then L"ntdll.dll" shouldbe
passed in. The same applies to all other Windows functions consuming
strings.I don't understand vcpkg so I don't even know where to look, but who
defined UNICODE when building libpq?
On Fri, Dec 1, 2023 at 10:10 AM Jan Březina <2janbrezina@gmail.com> wrote:
I defined UNICODE for the whole project and it's propagated also to the dependencies. Vcpkg is just the package manager. You can see it's port in the public repository: https://github.com/Microsoft/vcpkg/blob/master/ports/libpq/portfile.cmake
I see. Well, generally speaking PostgreSQL just isn't going to work
if some external thing #defines a bunch of stuff that changes the
meaning of system headers. For example on Unix systems there are
various headers that will tell some systems to use various ancient
standards instead of the modern POSIX semantics, and if you #define
those, surprise!, PostgreSQL will not work. But I'm not a Windows
person so I have no real opinion on whether that particular thing
would be reasonably expected to work and is something that people
typically go around doing. What would you suggest? Should we
explicitly #undef it, is that what people do? My guess is that there
is 0% chance that we'll ever modify our OWN behaviour due to UNICODE
(that is, our C functions are not going to have a wchar_t version that
UNICODE silently selects that you can call, you'll always have to talk
to our code with char strings AFAICS), but as for how our own .c files
internally talk to the system headers, that's kinda private to our
source tree, so I guess it would at least be theoretically possible
for us to nail that down with a well placed #undef. I just don't know
how sensible or typical that would be.
On Fri, Dec 1, 2023 at 10:20 AM Thomas Munro <thomas.munro@gmail.com> wrote:
various headers that will tell some systems to use various ancient
s/headers/macros/
It's pretty common to handle UNICODE define on Windows AFAIK. I suggest two
ways of handling the issue.
1. pass the appropriate type if UNICODE is defined:
#ifdef UNICODE
LoadLibraryEx(L"ntdll.dll", nullptr, 0);
#else
LoadLibraryEx("ntdll.dll", nullptr, 0);
#endif
2. Use **A functions no matter what (everywhere)
LoadLibraryExA("ntdll.dll", nullptr, 0);
https://learn.microsoft.com/en-us/windows/win32/api/libloaderapi/nf-libloaderapi-loadlibraryexa
See, LoadLibraryEx is affected by the UNICODE define within the Windows
header files. It's LoadLibraryExW if it's defined, or LoadLibraryExA if
it's not:
#ifdef UNICODE
#define LoadLibraryEx LoadLibraryExW
#else
#define LoadLibraryEx LoadLibraryExA
#endif
Dne čt 30. 11. 2023 22:21 uživatel Thomas Munro <thomas.munro@gmail.com>
napsal:
Show quoted text
On Fri, Dec 1, 2023 at 10:10 AM Jan Březina <2janbrezina@gmail.com> wrote:
I defined UNICODE for the whole project and it's propagated also to the
dependencies. Vcpkg is just the package manager. You can see it's port in
the public repository:
https://github.com/Microsoft/vcpkg/blob/master/ports/libpq/portfile.cmakeI see. Well, generally speaking PostgreSQL just isn't going to work
if some external thing #defines a bunch of stuff that changes the
meaning of system headers. For example on Unix systems there are
various headers that will tell some systems to use various ancient
standards instead of the modern POSIX semantics, and if you #define
those, surprise!, PostgreSQL will not work. But I'm not a Windows
person so I have no real opinion on whether that particular thing
would be reasonably expected to work and is something that people
typically go around doing. What would you suggest? Should we
explicitly #undef it, is that what people do? My guess is that there
is 0% chance that we'll ever modify our OWN behaviour due to UNICODE
(that is, our C functions are not going to have a wchar_t version that
UNICODE silently selects that you can call, you'll always have to talk
to our code with char strings AFAICS), but as for how our own .c files
internally talk to the system headers, that's kinda private to our
source tree, so I guess it would at least be theoretically possible
for us to nail that down with a well placed #undef. I just don't know
how sensible or typical that would be.
On Fri, Dec 1, 2023 at 10:50 AM Jan Březina <2janbrezina@gmail.com> wrote:
It's pretty common to handle UNICODE define on Windows AFAIK. I suggest two ways of handling the issue.
1. pass the appropriate type if UNICODE is defined:
#ifdef UNICODE
LoadLibraryEx(L"ntdll.dll", nullptr, 0);
#else
LoadLibraryEx("ntdll.dll", nullptr, 0);
#endif
But this would apply to many, many places where we call system
functions, no? I guess you've just picked on this one case because it
was the first thing to break. In some cases the string to be provided
is not a constant, so would require passing through char/wchar_t
conversion routines. Seems like a non-starter.
2. Use **A functions no matter what (everywhere)
LoadLibraryExA("ntdll.dll", nullptr, 0);https://learn.microsoft.com/en-us/windows/win32/api/libloaderapi/nf-libloaderapi-loadlibraryexa
See, LoadLibraryEx is affected by the UNICODE define within the Windows header files. It's LoadLibraryExW if it's defined, or LoadLibraryExA if it's not:
#ifdef UNICODE
#define LoadLibraryEx LoadLibraryExW
#else
#define LoadLibraryEx LoadLibraryExA
#endif
Right, but since some of the functions that are affected in this way
are also standardised functions that we call on POSIX systems, I think
we'd finish up scattering more #ifdef #else stuff all through the code
to explicitly select the 'ANSI' (sic) Windows variant or the
standardised-function-with-no-suffix variant. Also seems like a
non-starter.
On Fri, Dec 1, 2023 at 10:58 AM Thomas Munro <thomas.munro@gmail.com> wrote:
Seems like a non-starter.
... and I should probably explain why. This project has difficulty
maintaining Windows support, because of a general lack of PostgreSQL
developers working on that platform (but we welcome more! patches
welcome!) and a disproportionately high amount of maintenance work
that it generates. One factor is that there are several different
Windows configurations (different compilers, C runtimes, build
systems, file system semantics, partial POSIX emulations like MSYS and
Cygwin...) to consider. I doubt we'd want to add another dimension to
that problem space by saying we have to maintain and test the UNICODE
and non-UNICODE code paths in our libraries.
Hence my intuition that we should be figuring out where the right
place is to suppress or undefine UNICODE.
Thomas Munro <thomas.munro@gmail.com> writes:
Seems like a non-starter.
... and I should probably explain why. This project has difficulty
maintaining Windows support, because of a general lack of PostgreSQL
developers working on that platform (but we welcome more! patches
welcome!) and a disproportionately high amount of maintenance work
that it generates. One factor is that there are several different
Windows configurations (different compilers, C runtimes, build
systems, file system semantics, partial POSIX emulations like MSYS and
Cygwin...) to consider. I doubt we'd want to add another dimension to
that problem space by saying we have to maintain and test the UNICODE
and non-UNICODE code paths in our libraries.
Yeah, I think there's no chance that we'd accept such a patch even if
one were submitted. Quite aside from the initial development effort,
the ongoing maintenance cost would be large, and the value is just not
there (as evidenced by the approximately zero previous requests we've
had for this).
Hence my intuition that we should be figuring out where the right
place is to suppress or undefine UNICODE.
I'm not even excited about that. Would we accept patches to #undef
the random other system-API-changing symbols that exist on other
platforms? We've not done so in the past and I don't see a great
argument to start here. IMO the submitter has simply misconfigured
his project. He should not assume that he can build other peoples'
code with whatever configuration suits his own code.
regards, tom lane
The project is not misconfigured. Windows applications use MBCS (multi byte
character set) or UNICODE defines to specify what Windows API they are
using and there are good reasons not to mix the two.
You should read the following documentation:
https://learn.microsoft.com/en-us/windows/win32/intl/unicode-in-the-windows-api
and mainly
https://learn.microsoft.com/en-us/windows/win32/intl/conventions-for-function-prototypes
So if you are not planning to support UNICODE define, then you should be
concrete in what Windows API you are using and call the **A (ANSI) API
directly, as I suggested. It is a correct way, doesn't conflict with any
defines and doesn't conflict with any standard API, as UNICODE define
affects only the behavior of Windows API.
I bet there won't be so much work to fix this, as Windows specific
implementations are in separate *win32.c files and hence it should not
affect any other platforms and it has to work with every compiler, as it is
the standard Windows API.
I'm surprised no one raised this problem up until now and I believe there
must be a lot of workarounds to make libpq library work "correctly".
If the patch would be accepted, I can help with it's implementation.
Dne čt 30. 11. 2023 23:32 uživatel Tom Lane <tgl@sss.pgh.pa.us> napsal:
Show quoted text
Thomas Munro <thomas.munro@gmail.com> writes:
Seems like a non-starter.
... and I should probably explain why. This project has difficulty
maintaining Windows support, because of a general lack of PostgreSQL
developers working on that platform (but we welcome more! patches
welcome!) and a disproportionately high amount of maintenance work
that it generates. One factor is that there are several different
Windows configurations (different compilers, C runtimes, build
systems, file system semantics, partial POSIX emulations like MSYS and
Cygwin...) to consider. I doubt we'd want to add another dimension to
that problem space by saying we have to maintain and test the UNICODE
and non-UNICODE code paths in our libraries.Yeah, I think there's no chance that we'd accept such a patch even if
one were submitted. Quite aside from the initial development effort,
the ongoing maintenance cost would be large, and the value is just not
there (as evidenced by the approximately zero previous requests we've
had for this).Hence my intuition that we should be figuring out where the right
place is to suppress or undefine UNICODE.I'm not even excited about that. Would we accept patches to #undef
the random other system-API-changing symbols that exist on other
platforms? We've not done so in the past and I don't see a great
argument to start here. IMO the submitter has simply misconfigured
his project. He should not assume that he can build other peoples'
code with whatever configuration suits his own code.regards, tom lane
On Mon, Dec 4, 2023 at 8:56 AM Jan Březina <2janbrezina@gmail.com> wrote:
The project is not misconfigured. Windows applications use MBCS (multi byte character set) or UNICODE defines to specify what Windows API they are using and there are good reasons not to mix the two.
You should read the following documentation:
https://learn.microsoft.com/en-us/windows/win32/intl/unicode-in-the-windows-api
and mainly
https://learn.microsoft.com/en-us/windows/win32/intl/conventions-for-function-prototypes
We know about that stuff at least in general terms, but we are arguing
that *we* should be in control of whether our .c files are compiled
with UNICODE defined, not someone compiling our library. That advice
is about applications. libpq is a shared library.
But let's actually check the cost of this:
So if you are not planning to support UNICODE define, then you should be concrete in what Windows API you are using and call the **A (ANSI) API directly, as I suggested. It is a correct way, doesn't conflict with any defines and doesn't conflict with any standard API, as UNICODE define affects only the behavior of Windows API.
I bet there won't be so much work to fix this, as Windows specific implementations are in separate *win32.c files and hence it should not affect any other platforms and it has to work with every compiler, as it is the standard Windows API.
What symbols is libpq.dll referencing? Maybe "dumpbin /imports
libpq.dll" can tell you? I guess that'd show us how many *A /*W
functions are reached, and would be changed to call *A directly. If
it's really just Win32 APIs that are in Windows-only code paths that'd
be one thing. I wasn't sure if it might also include standard C-ish
code that is shared with Unix.
Looks like there is a mix to me…
c:\...mumble…> dumpbin.exe /imports libpq.dll | grep A$
23 InitializeSecurityContextA
1 AcquireCredentialsHandleA
26A GetModuleHandleA
3A8 LoadLibraryA
237 GetFileAttributesA
BA CreateFileA
3A9 LoadLibraryExA
19F FormatMessageA
179 GetUserNameA
CC SHGetFolderPathA
c:\mumble…> dumpbin.exe /imports libpq.dll | grep W$
2C5 GetStartupInfoW
26D GetModuleHandleW
From: Thomas Munro <thomas.munro@gmail.com>
Sent: Monday, December 4, 2023 9:20 AM
To: Jan Březina <2janbrezina@gmail.com>
Cc: Tom Lane <tgl@sss.pgh.pa.us>; pgsql-bugs@lists.postgresql.org
Subject: Re: BUG #18219: libpq does not take into consideration UNICODE define
On Mon, Dec 4, 2023 at 8: 56 AM Jan Březina <2janbrezina@ gmail. com> wrote: > The project is not misconfigured. Windows applications use MBCS (multi byte character set) or UNICODE defines to specify what Windows API they are using and
On Mon, Dec 4, 2023 at 8:56 AM Jan Březina <2janbrezina@gmail.com<mailto:2janbrezina@gmail.com>> wrote:
The project is not misconfigured. Windows applications use MBCS (multi byte character set) or UNICODE defines to specify what Windows API they are using and there are good reasons not to mix the two.
You should read the following documentation:
and mainly
We know about that stuff at least in general terms, but we are arguing
that *we* should be in control of whether our .c files are compiled
with UNICODE defined, not someone compiling our library. That advice
is about applications. libpq is a shared library.
But let's actually check the cost of this:
So if you are not planning to support UNICODE define, then you should be concrete in what Windows API you are using and call the **A (ANSI) API directly, as I suggested. It is a correct way, doesn't conflict with any defines and doesn't conflict with any standard API, as UNICODE define affects only the behavior of Windows API.
I bet there won't be so much work to fix this, as Windows specific implementations are in separate *win32.c files and hence it should not affect any other platforms and it has to work with every compiler, as it is the standard Windows API.
What symbols is libpq.dll referencing? Maybe "dumpbin /imports
libpq.dll" can tell you? I guess that'd show us how many *A /*W
functions are reached, and would be changed to call *A directly. If
it's really just Win32 APIs that are in Windows-only code paths that'd
be one thing. I wasn't sure if it might also include standard C-ish
code that is shared with Unix.