BUG #18219: libpq does not take into consideration UNICODE define

Started by PG Bug reporting formover 2 years ago12 messagesbugs
Jump to latest
#1PG Bug reporting form
noreply@postgresql.org

The following bug has been logged on the website:

Bug reference: 18219
Logged by: Jan Březina
Email address: 2janbrezina@gmail.com
PostgreSQL version: 16.0
Operating system: Windows
Description:

I use libpqxx via vcpkg in Windows project build for x64 system using Visual
Studio. There is a UNICODE define, which causes **W Windows functions to be
used and hence string parameters should be passed in using L"...".
I have sslrootcert file used in my connection string. During the connection
initialization stat() function is invoked for the file. It is implemented in
win32stat.c, which then invokes pgwin32_open_handle() (open.c) and then
initialize_ntdll() (win32ntdll.c) is invoked. There is a use of
LoadLibraryEx which is just a define above LoadLibraryExA or LoadLibraryExW.
Using UNICODE implies LoadLibraryExW to be used, but the parameter passed in
is char*, not wchar_t*. This leads to failed initialization returning error
code 126 - The specified module could not be found.
There should be a switch if UNICODE is defined, then L"ntdll.dll" should be
passed in. The same applies to all other Windows functions consuming
strings.

#2Thomas Munro
thomas.munro@gmail.com
In reply to: PG Bug reporting form (#1)
Re: BUG #18219: libpq does not take into consideration UNICODE define

On Fri, Dec 1, 2023 at 4:50 AM PG Bug reporting form
<noreply@postgresql.org> wrote:

I use libpqxx via vcpkg in Windows project build for x64 system using Visual
Studio. There is a UNICODE define, which causes **W Windows functions to be
used and hence string parameters should be passed in using L"...".
I have sslrootcert file used in my connection string. During the connection
initialization stat() function is invoked for the file. It is implemented in
win32stat.c, which then invokes pgwin32_open_handle() (open.c) and then
initialize_ntdll() (win32ntdll.c) is invoked. There is a use of
LoadLibraryEx which is just a define above LoadLibraryExA or LoadLibraryExW.
Using UNICODE implies LoadLibraryExW to be used, but the parameter passed in
is char*, not wchar_t*. This leads to failed initialization returning error
code 126 - The specified module could not be found.
There should be a switch if UNICODE is defined, then L"ntdll.dll" should be
passed in. The same applies to all other Windows functions consuming
strings.

I don't understand vcpkg so I don't even know where to look, but who
defined UNICODE when building libpq?

#3Jan Březina
2janbrezina@gmail.com
In reply to: Thomas Munro (#2)
Re: BUG #18219: libpq does not take into consideration UNICODE define

I defined UNICODE for the whole project and it's propagated also to the
dependencies. Vcpkg is just the package manager. You can see it's port in
the public repository:
https://github.com/Microsoft/vcpkg/blob/master/ports/libpq/portfile.cmake

Dne čt 30. 11. 2023 20:20 uživatel Thomas Munro <thomas.munro@gmail.com>
napsal:

Show quoted text

On Fri, Dec 1, 2023 at 4:50 AM PG Bug reporting form
<noreply@postgresql.org> wrote:

I use libpqxx via vcpkg in Windows project build for x64 system using

Visual

Studio. There is a UNICODE define, which causes **W Windows functions to

be

used and hence string parameters should be passed in using L"...".
I have sslrootcert file used in my connection string. During the

connection

initialization stat() function is invoked for the file. It is

implemented in

win32stat.c, which then invokes pgwin32_open_handle() (open.c) and then
initialize_ntdll() (win32ntdll.c) is invoked. There is a use of
LoadLibraryEx which is just a define above LoadLibraryExA or

LoadLibraryExW.

Using UNICODE implies LoadLibraryExW to be used, but the parameter

passed in

is char*, not wchar_t*. This leads to failed initialization returning

error

code 126 - The specified module could not be found.
There should be a switch if UNICODE is defined, then L"ntdll.dll" should

be

passed in. The same applies to all other Windows functions consuming
strings.

I don't understand vcpkg so I don't even know where to look, but who
defined UNICODE when building libpq?

#4Thomas Munro
thomas.munro@gmail.com
In reply to: Jan Březina (#3)
Re: BUG #18219: libpq does not take into consideration UNICODE define

On Fri, Dec 1, 2023 at 10:10 AM Jan Březina <2janbrezina@gmail.com> wrote:

I defined UNICODE for the whole project and it's propagated also to the dependencies. Vcpkg is just the package manager. You can see it's port in the public repository: https://github.com/Microsoft/vcpkg/blob/master/ports/libpq/portfile.cmake

I see. Well, generally speaking PostgreSQL just isn't going to work
if some external thing #defines a bunch of stuff that changes the
meaning of system headers. For example on Unix systems there are
various headers that will tell some systems to use various ancient
standards instead of the modern POSIX semantics, and if you #define
those, surprise!, PostgreSQL will not work. But I'm not a Windows
person so I have no real opinion on whether that particular thing
would be reasonably expected to work and is something that people
typically go around doing. What would you suggest? Should we
explicitly #undef it, is that what people do? My guess is that there
is 0% chance that we'll ever modify our OWN behaviour due to UNICODE
(that is, our C functions are not going to have a wchar_t version that
UNICODE silently selects that you can call, you'll always have to talk
to our code with char strings AFAICS), but as for how our own .c files
internally talk to the system headers, that's kinda private to our
source tree, so I guess it would at least be theoretically possible
for us to nail that down with a well placed #undef. I just don't know
how sensible or typical that would be.

#5Thomas Munro
thomas.munro@gmail.com
In reply to: Thomas Munro (#4)
Re: BUG #18219: libpq does not take into consideration UNICODE define

On Fri, Dec 1, 2023 at 10:20 AM Thomas Munro <thomas.munro@gmail.com> wrote:

various headers that will tell some systems to use various ancient

s/headers/macros/

#6Jan Březina
2janbrezina@gmail.com
In reply to: Thomas Munro (#4)
Re: BUG #18219: libpq does not take into consideration UNICODE define

It's pretty common to handle UNICODE define on Windows AFAIK. I suggest two
ways of handling the issue.
1. pass the appropriate type if UNICODE is defined:
#ifdef UNICODE
LoadLibraryEx(L"ntdll.dll", nullptr, 0);
#else
LoadLibraryEx("ntdll.dll", nullptr, 0);
#endif

2. Use **A functions no matter what (everywhere)
LoadLibraryExA("ntdll.dll", nullptr, 0);

https://learn.microsoft.com/en-us/windows/win32/api/libloaderapi/nf-libloaderapi-loadlibraryexa

See, LoadLibraryEx is affected by the UNICODE define within the Windows
header files. It's LoadLibraryExW if it's defined, or LoadLibraryExA if
it's not:

#ifdef UNICODE
#define LoadLibraryEx LoadLibraryExW
#else
#define LoadLibraryEx LoadLibraryExA
#endif

Dne čt 30. 11. 2023 22:21 uživatel Thomas Munro <thomas.munro@gmail.com>
napsal:

Show quoted text

On Fri, Dec 1, 2023 at 10:10 AM Jan Březina <2janbrezina@gmail.com> wrote:

I defined UNICODE for the whole project and it's propagated also to the

dependencies. Vcpkg is just the package manager. You can see it's port in
the public repository:
https://github.com/Microsoft/vcpkg/blob/master/ports/libpq/portfile.cmake

I see. Well, generally speaking PostgreSQL just isn't going to work
if some external thing #defines a bunch of stuff that changes the
meaning of system headers. For example on Unix systems there are
various headers that will tell some systems to use various ancient
standards instead of the modern POSIX semantics, and if you #define
those, surprise!, PostgreSQL will not work. But I'm not a Windows
person so I have no real opinion on whether that particular thing
would be reasonably expected to work and is something that people
typically go around doing. What would you suggest? Should we
explicitly #undef it, is that what people do? My guess is that there
is 0% chance that we'll ever modify our OWN behaviour due to UNICODE
(that is, our C functions are not going to have a wchar_t version that
UNICODE silently selects that you can call, you'll always have to talk
to our code with char strings AFAICS), but as for how our own .c files
internally talk to the system headers, that's kinda private to our
source tree, so I guess it would at least be theoretically possible
for us to nail that down with a well placed #undef. I just don't know
how sensible or typical that would be.

#7Thomas Munro
thomas.munro@gmail.com
In reply to: Jan Březina (#6)
Re: BUG #18219: libpq does not take into consideration UNICODE define

On Fri, Dec 1, 2023 at 10:50 AM Jan Březina <2janbrezina@gmail.com> wrote:

It's pretty common to handle UNICODE define on Windows AFAIK. I suggest two ways of handling the issue.
1. pass the appropriate type if UNICODE is defined:
#ifdef UNICODE
LoadLibraryEx(L"ntdll.dll", nullptr, 0);
#else
LoadLibraryEx("ntdll.dll", nullptr, 0);
#endif

But this would apply to many, many places where we call system
functions, no? I guess you've just picked on this one case because it
was the first thing to break. In some cases the string to be provided
is not a constant, so would require passing through char/wchar_t
conversion routines. Seems like a non-starter.

2. Use **A functions no matter what (everywhere)
LoadLibraryExA("ntdll.dll", nullptr, 0);

https://learn.microsoft.com/en-us/windows/win32/api/libloaderapi/nf-libloaderapi-loadlibraryexa

See, LoadLibraryEx is affected by the UNICODE define within the Windows header files. It's LoadLibraryExW if it's defined, or LoadLibraryExA if it's not:

#ifdef UNICODE
#define LoadLibraryEx LoadLibraryExW
#else
#define LoadLibraryEx LoadLibraryExA
#endif

Right, but since some of the functions that are affected in this way
are also standardised functions that we call on POSIX systems, I think
we'd finish up scattering more #ifdef #else stuff all through the code
to explicitly select the 'ANSI' (sic) Windows variant or the
standardised-function-with-no-suffix variant. Also seems like a
non-starter.

#8Thomas Munro
thomas.munro@gmail.com
In reply to: Thomas Munro (#7)
Re: BUG #18219: libpq does not take into consideration UNICODE define

On Fri, Dec 1, 2023 at 10:58 AM Thomas Munro <thomas.munro@gmail.com> wrote:

Seems like a non-starter.

... and I should probably explain why. This project has difficulty
maintaining Windows support, because of a general lack of PostgreSQL
developers working on that platform (but we welcome more! patches
welcome!) and a disproportionately high amount of maintenance work
that it generates. One factor is that there are several different
Windows configurations (different compilers, C runtimes, build
systems, file system semantics, partial POSIX emulations like MSYS and
Cygwin...) to consider. I doubt we'd want to add another dimension to
that problem space by saying we have to maintain and test the UNICODE
and non-UNICODE code paths in our libraries.

Hence my intuition that we should be figuring out where the right
place is to suppress or undefine UNICODE.

#9Tom Lane
tgl@sss.pgh.pa.us
In reply to: Thomas Munro (#8)
Re: BUG #18219: libpq does not take into consideration UNICODE define

Thomas Munro <thomas.munro@gmail.com> writes:

Seems like a non-starter.

... and I should probably explain why. This project has difficulty
maintaining Windows support, because of a general lack of PostgreSQL
developers working on that platform (but we welcome more! patches
welcome!) and a disproportionately high amount of maintenance work
that it generates. One factor is that there are several different
Windows configurations (different compilers, C runtimes, build
systems, file system semantics, partial POSIX emulations like MSYS and
Cygwin...) to consider. I doubt we'd want to add another dimension to
that problem space by saying we have to maintain and test the UNICODE
and non-UNICODE code paths in our libraries.

Yeah, I think there's no chance that we'd accept such a patch even if
one were submitted. Quite aside from the initial development effort,
the ongoing maintenance cost would be large, and the value is just not
there (as evidenced by the approximately zero previous requests we've
had for this).

Hence my intuition that we should be figuring out where the right
place is to suppress or undefine UNICODE.

I'm not even excited about that. Would we accept patches to #undef
the random other system-API-changing symbols that exist on other
platforms? We've not done so in the past and I don't see a great
argument to start here. IMO the submitter has simply misconfigured
his project. He should not assume that he can build other peoples'
code with whatever configuration suits his own code.

regards, tom lane

#10Jan Březina
2janbrezina@gmail.com
In reply to: Tom Lane (#9)
Re: BUG #18219: libpq does not take into consideration UNICODE define

The project is not misconfigured. Windows applications use MBCS (multi byte
character set) or UNICODE defines to specify what Windows API they are
using and there are good reasons not to mix the two.
You should read the following documentation:
https://learn.microsoft.com/en-us/windows/win32/intl/unicode-in-the-windows-api
and mainly
https://learn.microsoft.com/en-us/windows/win32/intl/conventions-for-function-prototypes
So if you are not planning to support UNICODE define, then you should be
concrete in what Windows API you are using and call the **A (ANSI) API
directly, as I suggested. It is a correct way, doesn't conflict with any
defines and doesn't conflict with any standard API, as UNICODE define
affects only the behavior of Windows API.
I bet there won't be so much work to fix this, as Windows specific
implementations are in separate *win32.c files and hence it should not
affect any other platforms and it has to work with every compiler, as it is
the standard Windows API.
I'm surprised no one raised this problem up until now and I believe there
must be a lot of workarounds to make libpq library work "correctly".
If the patch would be accepted, I can help with it's implementation.

Dne čt 30. 11. 2023 23:32 uživatel Tom Lane <tgl@sss.pgh.pa.us> napsal:

Show quoted text

Thomas Munro <thomas.munro@gmail.com> writes:

Seems like a non-starter.

... and I should probably explain why. This project has difficulty
maintaining Windows support, because of a general lack of PostgreSQL
developers working on that platform (but we welcome more! patches
welcome!) and a disproportionately high amount of maintenance work
that it generates. One factor is that there are several different
Windows configurations (different compilers, C runtimes, build
systems, file system semantics, partial POSIX emulations like MSYS and
Cygwin...) to consider. I doubt we'd want to add another dimension to
that problem space by saying we have to maintain and test the UNICODE
and non-UNICODE code paths in our libraries.

Yeah, I think there's no chance that we'd accept such a patch even if
one were submitted. Quite aside from the initial development effort,
the ongoing maintenance cost would be large, and the value is just not
there (as evidenced by the approximately zero previous requests we've
had for this).

Hence my intuition that we should be figuring out where the right
place is to suppress or undefine UNICODE.

I'm not even excited about that. Would we accept patches to #undef
the random other system-API-changing symbols that exist on other
platforms? We've not done so in the past and I don't see a great
argument to start here. IMO the submitter has simply misconfigured
his project. He should not assume that he can build other peoples'
code with whatever configuration suits his own code.

regards, tom lane

#11Thomas Munro
thomas.munro@gmail.com
In reply to: Jan Březina (#10)
Re: BUG #18219: libpq does not take into consideration UNICODE define

On Mon, Dec 4, 2023 at 8:56 AM Jan Březina <2janbrezina@gmail.com> wrote:

The project is not misconfigured. Windows applications use MBCS (multi byte character set) or UNICODE defines to specify what Windows API they are using and there are good reasons not to mix the two.
You should read the following documentation:
https://learn.microsoft.com/en-us/windows/win32/intl/unicode-in-the-windows-api
and mainly
https://learn.microsoft.com/en-us/windows/win32/intl/conventions-for-function-prototypes

We know about that stuff at least in general terms, but we are arguing
that *we* should be in control of whether our .c files are compiled
with UNICODE defined, not someone compiling our library. That advice
is about applications. libpq is a shared library.

But let's actually check the cost of this:

So if you are not planning to support UNICODE define, then you should be concrete in what Windows API you are using and call the **A (ANSI) API directly, as I suggested. It is a correct way, doesn't conflict with any defines and doesn't conflict with any standard API, as UNICODE define affects only the behavior of Windows API.
I bet there won't be so much work to fix this, as Windows specific implementations are in separate *win32.c files and hence it should not affect any other platforms and it has to work with every compiler, as it is the standard Windows API.

What symbols is libpq.dll referencing? Maybe "dumpbin /imports
libpq.dll" can tell you? I guess that'd show us how many *A /*W
functions are reached, and would be changed to call *A directly. If
it's really just Win32 APIs that are in Windows-only code paths that'd
be one thing. I wasn't sure if it might also include standard C-ish
code that is shared with Unix.

#12Jeff Laing
Jeff.Laing@synchronoss.com
In reply to: Thomas Munro (#11)
RE: BUG #18219: libpq does not take into consideration UNICODE define

Looks like there is a mix to me…

c:\...mumble…> dumpbin.exe /imports libpq.dll | grep A$
23 InitializeSecurityContextA
1 AcquireCredentialsHandleA
26A GetModuleHandleA
3A8 LoadLibraryA
237 GetFileAttributesA
BA CreateFileA
3A9 LoadLibraryExA
19F FormatMessageA
179 GetUserNameA
CC SHGetFolderPathA

c:\mumble…> dumpbin.exe /imports libpq.dll | grep W$
2C5 GetStartupInfoW
26D GetModuleHandleW

From: Thomas Munro <thomas.munro@gmail.com>
Sent: Monday, December 4, 2023 9:20 AM
To: Jan Březina <2janbrezina@gmail.com>
Cc: Tom Lane <tgl@sss.pgh.pa.us>; pgsql-bugs@lists.postgresql.org
Subject: Re: BUG #18219: libpq does not take into consideration UNICODE define

On Mon, Dec 4, 2023 at 8: 56 AM Jan Březina <2janbrezina@ gmail. com> wrote: > The project is not misconfigured. Windows applications use MBCS (multi byte character set) or UNICODE defines to specify what Windows API they are using and

On Mon, Dec 4, 2023 at 8:56 AM Jan Březina <2janbrezina@gmail.com<mailto:2janbrezina@gmail.com>> wrote:

The project is not misconfigured. Windows applications use MBCS (multi byte character set) or UNICODE defines to specify what Windows API they are using and there are good reasons not to mix the two.

You should read the following documentation:

https://urldefense.com/v3/__https://learn.microsoft.com/en-us/windows/win32/intl/unicode-in-the-windows-api__;!!LlG_G4lo9h6Y!NGBMDMpFtxNbtb155g1HCcZB7if4EEXqwlnDws8-7_cM4q7QKLQtHy960vxoB2eM1FGocuofqDDQHEKE9HLunNjlOeY$&lt;https://urldefense.com/v3/__https:/learn.microsoft.com/en-us/windows/win32/intl/unicode-in-the-windows-api__;!!LlG_G4lo9h6Y!NGBMDMpFtxNbtb155g1HCcZB7if4EEXqwlnDws8-7_cM4q7QKLQtHy960vxoB2eM1FGocuofqDDQHEKE9HLunNjlOeY$&gt;

and mainly

https://urldefense.com/v3/__https://learn.microsoft.com/en-us/windows/win32/intl/conventions-for-function-prototypes__;!!LlG_G4lo9h6Y!NGBMDMpFtxNbtb155g1HCcZB7if4EEXqwlnDws8-7_cM4q7QKLQtHy960vxoB2eM1FGocuofqDDQHEKE9HLupjSS5eI$&lt;https://urldefense.com/v3/__https:/learn.microsoft.com/en-us/windows/win32/intl/conventions-for-function-prototypes__;!!LlG_G4lo9h6Y!NGBMDMpFtxNbtb155g1HCcZB7if4EEXqwlnDws8-7_cM4q7QKLQtHy960vxoB2eM1FGocuofqDDQHEKE9HLupjSS5eI$&gt;

We know about that stuff at least in general terms, but we are arguing

that *we* should be in control of whether our .c files are compiled

with UNICODE defined, not someone compiling our library. That advice

is about applications. libpq is a shared library.

But let's actually check the cost of this:

So if you are not planning to support UNICODE define, then you should be concrete in what Windows API you are using and call the **A (ANSI) API directly, as I suggested. It is a correct way, doesn't conflict with any defines and doesn't conflict with any standard API, as UNICODE define affects only the behavior of Windows API.

I bet there won't be so much work to fix this, as Windows specific implementations are in separate *win32.c files and hence it should not affect any other platforms and it has to work with every compiler, as it is the standard Windows API.

What symbols is libpq.dll referencing? Maybe "dumpbin /imports

libpq.dll" can tell you? I guess that'd show us how many *A /*W

functions are reached, and would be changed to call *A directly. If

it's really just Win32 APIs that are in Windows-only code paths that'd

be one thing. I wasn't sure if it might also include standard C-ish

code that is shared with Unix.