Reject Windows command lines not convertible to system locale

Started by Noah Misch, about 1 year ago, 4 messages
#1 Noah Misch
noah@leadboat.com
2 attachment(s)

The security team received a report about misbehavior when a Windows program
executes a PostgreSQL program, passing a wide character on the command line.
Every such character gets translated to something else. Of note, a U+FF02
FULLWIDTH QUOTATION MARK in the caller-provided command line becomes U+0022
QUOTATION MARK in GetCommandLineA(). Since U+0022 is a metacharacter for
splitting the command line into argv, the argv argument boundaries may be
contrary to caller intentions. This isn't a vulnerability in PostgreSQL, but
it may be a vulnerability in Windows programs that construct CreateProcessW()
arguments to execute PostgreSQL programs. We don't know of any affected
programs, but we didn't search diligently.
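
To illustrate, this is the conversion the system effectively applies when an
ANSI-API child reads a command line that a CreateProcessW() caller provided
(untested sketch, not part of the patch; assumes a conventional ANSI code page
such as 1252):

#include <windows.h>
#include <stdio.h>

int
main(void)
{
	const wchar_t wide[] = L"\uFF02";	/* U+FF02 FULLWIDTH QUOTATION MARK */
	char		narrow[8];

	WideCharToMultiByte(CP_ACP, 0, wide, -1, narrow, sizeof(narrow),
						NULL, NULL);

	/* On code page 1252 this prints 0x22: a plain, argv-splitting quote. */
	printf("0x%02X\n", (unsigned char) narrow[0]);
	return 0;
}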

The report (reporter bcc'd) proposed a fix of using non-ANSI APIs (UNICODE
APIs) to process the command line. That would create the worse problem that
both the caller (runs CreateProcessW() or CreateProcessA()) and callee (runs
GetCommandLineA() or GetCommandLineW()) would need to change simultaneously or
suffer mojibake. Consider the argument to vacuumdb's "--schema" option. Under
PGCLIENTENCODING=UTF8, a caller wanting YEN SIGN
runs CreateProcessA(... 0xC2 0xA5 ...) or
CreateProcessW(... MultiByteToWideChar(CP_ACP, 0, ... 0xC2 0xA5 ...), ...).
Suppose vacuumdb switched to wmain(int argc, wchar_t *argv[]) and interpreted
argv as UTF-16. The same two CreateProcess calls would then get a
two-character schema name instead, yielding mojibake until callers update to
CreateProcessW(... 0x00A5 ...). That would create as many vulnerabilities as
it solves.
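
To make that concrete, here is what the 0xC2 0xA5 bytes look like after the
system's conversion to the wide command line (untested sketch; the code page
is pinned to 1252 so the asserts are deterministic):

#include <windows.h>
#include <assert.h>

int
main(void)
{
	const char	utf8_yen[] = "\xC2\xA5";	/* YEN SIGN in UTF-8 */
	wchar_t		wide[4];
	int			n;

	n = MultiByteToWideChar(1252, 0, utf8_yen, -1, wide, 4);

	/* Two code units plus the terminator: a wmain() callee would see "Â¥". */
	assert(n == 3);
	assert(wide[0] == 0x00C2 && wide[1] == 0x00A5);
	return 0;
}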

What we can do safely is exit if we could not convert the command line to the
Windows ANSI code page (losslessly). Patch attached.

As I mention in a code comment, I decided not to do anything about the similar
hazard in GetEnvironmentStringsW(). A possible alternative would be to check
each var referenced in PostgreSQL code or each var where the name begins with
"PG". Should we use one of those or another alternative?

The Perl testing calls powershell, which is new for our source tree. The
Internet thinks Windows Server 2008 R1 is the newest Windows where powershell
was optional. The buildfarm's oldest Windows is Windows Server 2008 R2. If
powershell is bad, there's likely a hairy way to write the test in any of
cmd.exe, cscript, Perl Win32::API, or C. My first choice was Perl
Win32::Process or IPC::Run. This test needs CreateProcessW(), but those two
use CreateProcessA().

I subjected the postmaster to program_options_handling_ok() and the other
tests we apply to most executables.

I didn't test a CP_UTF8 Windows system. The new test coverage has a condition
intended to avoid failure there. (My Windows development environments are all
too old, and I stopped short of building a new one for this.) CP_UTF8 Windows
raises broader issues. I'll start a separate thread about them.

I wanted to put the check in some function that most main() instances already
call, instead of adding ~40 new calls for this esoteric topic. Options:

1. Call in pg_logging_init() and the backend. Add pg_logging_init() calls to
pg_config, ecpg (the ecpg preprocessor), and pg_test_timing. I picked
this, for two minor benefits. First, it facilitates the backend
initializing LC_MESSAGES before this check. Second, every frontend may
need pg_logging_init() eventually, so it's fair to add calls where missing.

2. Call in set_pglocale_pgservice() and the three programs that don't call
set_pglocale_pgservice(): oid2name, vacuumlo, pgbench.

Thanks,
nm

Attachments:

bestfit05-postmaster-test-basic-v1.patch (text/plain; charset=us-ascii)
Author:     Noah Misch <noah@leadboat.com>
Commit:     Noah Misch <noah@leadboat.com>

    Test postmaster with program_options_handling_ok() et al.
    
    Most executables already get that testing.  To occupy the customary
    001_basic.pl name, this renumbers the new-in-October tests of
    src/test/postmaster/t.
    
    Reviewed by FIXME.
    
    Discussion: https://postgr.es/m/FIXME

diff --git a/src/test/postmaster/meson.build b/src/test/postmaster/meson.build
index 2d89adf..5b909fe 100644
--- a/src/test/postmaster/meson.build
+++ b/src/test/postmaster/meson.build
@@ -6,8 +6,9 @@ tests += {
   'bd': meson.current_build_dir(),
   'tap': {
     'tests': [
-      't/001_connection_limits.pl',
-      't/002_start_stop.pl',
+      't/001_basic.pl',
+      't/002_connection_limits.pl',
+      't/003_start_stop.pl',
     ],
   },
 }
diff --git a/src/test/postmaster/t/001_basic.pl b/src/test/postmaster/t/001_basic.pl
new file mode 100644
index 0000000..13f411d
--- /dev/null
+++ b/src/test/postmaster/t/001_basic.pl
@@ -0,0 +1,13 @@
+
+# Copyright (c) 2021-2024, PostgreSQL Global Development Group
+
+use strict;
+use warnings FATAL => 'all';
+use PostgreSQL::Test::Utils;
+use Test::More;
+
+program_help_ok('postgres');
+program_version_ok('postgres');
+program_options_handling_ok('postgres');
+
+done_testing();
diff --git a/src/test/postmaster/t/001_connection_limits.pl b/src/test/postmaster/t/001_connection_limits.pl
deleted file mode 100644
index 7e64a82..0000000
--- a/src/test/postmaster/t/001_connection_limits.pl
+++ /dev/null
@@ -1,123 +0,0 @@
-
-# Copyright (c) 2021-2024, PostgreSQL Global Development Group
-
-# Test connection limits, i.e. max_connections, reserved_connections
-# and superuser_reserved_connections.
-
-use strict;
-use warnings FATAL => 'all';
-use PostgreSQL::Test::Cluster;
-use PostgreSQL::Test::Utils;
-use Test::More;
-
-# Initialize the server with specific low connection limits
-my $node = PostgreSQL::Test::Cluster->new('primary');
-$node->init(
-	'auth_extra' => [
-		'--create-role', 'regress_regular,regress_reserved,regress_superuser'
-	]);
-$node->append_conf('postgresql.conf', "max_connections = 6");
-$node->append_conf('postgresql.conf', "reserved_connections = 2");
-$node->append_conf('postgresql.conf', "superuser_reserved_connections = 1");
-$node->append_conf('postgresql.conf', "log_connections = on");
-$node->start;
-
-$node->safe_psql(
-	'postgres', qq{
-CREATE USER regress_regular LOGIN;
-CREATE USER regress_reserved LOGIN;
-GRANT pg_use_reserved_connections TO regress_reserved;
-CREATE USER regress_superuser LOGIN SUPERUSER;
-});
-
-# With the limits we set in postgresql.conf, we can establish:
-# - 3 connections for any user with no special privileges
-# - 2 more connections for users belonging to "pg_use_reserved_connections"
-# - 1 more connection for superuser
-
-sub background_psql_as_user
-{
-	my $user = shift;
-
-	return $node->background_psql(
-		'postgres',
-		on_error_die => 1,
-		extra_params => [ '-U', $user ]);
-}
-
-my @sessions = ();
-my @raw_connections = ();
-
-push(@sessions, background_psql_as_user('regress_regular'));
-push(@sessions, background_psql_as_user('regress_regular'));
-push(@sessions, background_psql_as_user('regress_regular'));
-$node->connect_fails(
-	"dbname=postgres user=regress_regular",
-	"reserved_connections limit",
-	expected_stderr =>
-	  qr/FATAL:  remaining connection slots are reserved for roles with privileges of the "pg_use_reserved_connections" role/
-);
-
-push(@sessions, background_psql_as_user('regress_reserved'));
-push(@sessions, background_psql_as_user('regress_reserved'));
-$node->connect_fails(
-	"dbname=postgres user=regress_regular",
-	"reserved_connections limit",
-	expected_stderr =>
-	  qr/FATAL:  remaining connection slots are reserved for roles with the SUPERUSER attribute/
-);
-
-push(@sessions, background_psql_as_user('regress_superuser'));
-$node->connect_fails(
-	"dbname=postgres user=regress_superuser",
-	"superuser_reserved_connections limit",
-	expected_stderr => qr/FATAL:  sorry, too many clients already/);
-
-# We can still open TCP (or Unix domain socket) connections, but
-# beyond a certain number (roughly 2x max_connections), they will be
-# "dead-end backends".
-SKIP:
-{
-	skip "this test requires working raw_connect()"
-	  unless $node->raw_connect_works();
-
-	for (my $i = 0; $i <= 20; $i++)
-	{
-		my $sock = $node->raw_connect();
-
-		# On a busy system, the server might reject connections if
-		# postmaster cannot accept() them fast enough. The exact limit
-		# and behavior depends on the platform. To make this reliable,
-		# we attempt SSL negotiation on each connection before opening
-		# next one. The server will reject the SSL negotiations, but
-		# when it does so, we know that the backend has been launched
-		# and we should be able to open another connection.
-
-		# SSLRequest packet consists of packet length followed by
-		# NEGOTIATE_SSL_CODE.
-		my $negotiate_ssl_code = pack("Nnn", 8, 1234, 5679);
-		my $sent = $sock->send($negotiate_ssl_code);
-
-		# Read reply. We expect the server to reject it with 'N'
-		my $reply = "";
-		$sock->recv($reply, 1);
-		is($reply, "N", "dead-end connection $i");
-
-		push(@raw_connections, $sock);
-	}
-}
-
-# TODO: test that query cancellation is still possible. A dead-end
-# backend can process a query cancellation packet.
-
-# Clean up
-foreach my $session (@sessions)
-{
-	$session->quit;
-}
-foreach my $socket (@raw_connections)
-{
-	$socket->close();
-}
-
-done_testing();
diff --git a/src/test/postmaster/t/002_connection_limits.pl b/src/test/postmaster/t/002_connection_limits.pl
new file mode 100644
index 0000000..7e64a82
--- /dev/null
+++ b/src/test/postmaster/t/002_connection_limits.pl
@@ -0,0 +1,123 @@
+
+# Copyright (c) 2021-2024, PostgreSQL Global Development Group
+
+# Test connection limits, i.e. max_connections, reserved_connections
+# and superuser_reserved_connections.
+
+use strict;
+use warnings FATAL => 'all';
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More;
+
+# Initialize the server with specific low connection limits
+my $node = PostgreSQL::Test::Cluster->new('primary');
+$node->init(
+	'auth_extra' => [
+		'--create-role', 'regress_regular,regress_reserved,regress_superuser'
+	]);
+$node->append_conf('postgresql.conf', "max_connections = 6");
+$node->append_conf('postgresql.conf', "reserved_connections = 2");
+$node->append_conf('postgresql.conf', "superuser_reserved_connections = 1");
+$node->append_conf('postgresql.conf', "log_connections = on");
+$node->start;
+
+$node->safe_psql(
+	'postgres', qq{
+CREATE USER regress_regular LOGIN;
+CREATE USER regress_reserved LOGIN;
+GRANT pg_use_reserved_connections TO regress_reserved;
+CREATE USER regress_superuser LOGIN SUPERUSER;
+});
+
+# With the limits we set in postgresql.conf, we can establish:
+# - 3 connections for any user with no special privileges
+# - 2 more connections for users belonging to "pg_use_reserved_connections"
+# - 1 more connection for superuser
+
+sub background_psql_as_user
+{
+	my $user = shift;
+
+	return $node->background_psql(
+		'postgres',
+		on_error_die => 1,
+		extra_params => [ '-U', $user ]);
+}
+
+my @sessions = ();
+my @raw_connections = ();
+
+push(@sessions, background_psql_as_user('regress_regular'));
+push(@sessions, background_psql_as_user('regress_regular'));
+push(@sessions, background_psql_as_user('regress_regular'));
+$node->connect_fails(
+	"dbname=postgres user=regress_regular",
+	"reserved_connections limit",
+	expected_stderr =>
+	  qr/FATAL:  remaining connection slots are reserved for roles with privileges of the "pg_use_reserved_connections" role/
+);
+
+push(@sessions, background_psql_as_user('regress_reserved'));
+push(@sessions, background_psql_as_user('regress_reserved'));
+$node->connect_fails(
+	"dbname=postgres user=regress_regular",
+	"reserved_connections limit",
+	expected_stderr =>
+	  qr/FATAL:  remaining connection slots are reserved for roles with the SUPERUSER attribute/
+);
+
+push(@sessions, background_psql_as_user('regress_superuser'));
+$node->connect_fails(
+	"dbname=postgres user=regress_superuser",
+	"superuser_reserved_connections limit",
+	expected_stderr => qr/FATAL:  sorry, too many clients already/);
+
+# We can still open TCP (or Unix domain socket) connections, but
+# beyond a certain number (roughly 2x max_connections), they will be
+# "dead-end backends".
+SKIP:
+{
+	skip "this test requires working raw_connect()"
+	  unless $node->raw_connect_works();
+
+	for (my $i = 0; $i <= 20; $i++)
+	{
+		my $sock = $node->raw_connect();
+
+		# On a busy system, the server might reject connections if
+		# postmaster cannot accept() them fast enough. The exact limit
+		# and behavior depends on the platform. To make this reliable,
+		# we attempt SSL negotiation on each connection before opening
+		# next one. The server will reject the SSL negotiations, but
+		# when it does so, we know that the backend has been launched
+		# and we should be able to open another connection.
+
+		# SSLRequest packet consists of packet length followed by
+		# NEGOTIATE_SSL_CODE.
+		my $negotiate_ssl_code = pack("Nnn", 8, 1234, 5679);
+		my $sent = $sock->send($negotiate_ssl_code);
+
+		# Read reply. We expect the server to reject it with 'N'
+		my $reply = "";
+		$sock->recv($reply, 1);
+		is($reply, "N", "dead-end connection $i");
+
+		push(@raw_connections, $sock);
+	}
+}
+
+# TODO: test that query cancellation is still possible. A dead-end
+# backend can process a query cancellation packet.
+
+# Clean up
+foreach my $session (@sessions)
+{
+	$session->quit;
+}
+foreach my $socket (@raw_connections)
+{
+	$socket->close();
+}
+
+done_testing();
diff --git a/src/test/postmaster/t/002_start_stop.pl b/src/test/postmaster/t/002_start_stop.pl
deleted file mode 100644
index 993d855..0000000
--- a/src/test/postmaster/t/002_start_stop.pl
+++ /dev/null
@@ -1,98 +0,0 @@
-
-# Copyright (c) 2021-2024, PostgreSQL Global Development Group
-
-# Test postmaster start and stop state machine.
-
-use strict;
-use warnings FATAL => 'all';
-use PostgreSQL::Test::Cluster;
-use PostgreSQL::Test::Utils;
-use Test::More;
-
-#
-# Test that dead-end backends don't prevent the server from shutting
-# down.
-#
-# Dead-end backends can linger until they reach
-# authentication_timeout. We use a long authentication_timeout and a
-# much shorter timeout for the "pg_ctl stop" operation, to test that
-# if dead-end backends are killed at fast shut down. If they're not,
-# "pg_ctl stop" will error out before the authentication timeout kicks
-# in and cleans up the dead-end backends.
-my $authentication_timeout = $PostgreSQL::Test::Utils::timeout_default;
-my $stop_timeout = $authentication_timeout / 2;
-
-# Initialize the server with low connection limits, to test dead-end backends
-my $node = PostgreSQL::Test::Cluster->new('main');
-$node->init;
-$node->append_conf('postgresql.conf', "max_connections = 5");
-$node->append_conf('postgresql.conf', "max_wal_senders = 0");
-$node->append_conf('postgresql.conf', "autovacuum_max_workers = 1");
-$node->append_conf('postgresql.conf', "max_worker_processes = 1");
-$node->append_conf('postgresql.conf', "log_connections = on");
-$node->append_conf('postgresql.conf', "log_min_messages = debug2");
-$node->append_conf('postgresql.conf',
-	"authentication_timeout = '$authentication_timeout s'");
-$node->append_conf('postgresql.conf', 'trace_connection_negotiation=on');
-$node->start;
-
-if (!$node->raw_connect_works())
-{
-	plan skip_all => "this test requires working raw_connect()";
-}
-
-my @raw_connections = ();
-
-# Open a lot of TCP (or Unix domain socket) connections to use up all
-# the connection slots. Beyond a certain number (roughly 2x
-# max_connections), they will be "dead-end backends".
-for (my $i = 0; $i <= 20; $i++)
-{
-	my $sock = $node->raw_connect();
-
-	# On a busy system, the server might reject connections if
-	# postmaster cannot accept() them fast enough. The exact limit
-	# and behavior depends on the platform. To make this reliable,
-	# we attempt SSL negotiation on each connection before opening
-	# next one. The server will reject the SSL negotations, but
-	# when it does so, we know that the backend has been launched
-	# and we should be able to open another connection.
-
-	# SSLRequest packet consists of packet length followed by
-	# NEGOTIATE_SSL_CODE.
-	my $negotiate_ssl_code = pack("Nnn", 8, 1234, 5679);
-	my $sent = $sock->send($negotiate_ssl_code);
-
-	# Read reply. We expect the server to reject it with 'N'
-	my $reply = "";
-	$sock->recv($reply, 1);
-	is($reply, "N", "dead-end connection $i");
-
-	push(@raw_connections, $sock);
-}
-
-# When all the connection slots are in use, new connections will fail
-# before even looking up the user. Hence you now get "sorry, too many
-# clients already" instead of "role does not exist" error. Test that
-# to ensure that we have used up all the slots.
-$node->connect_fails("dbname=postgres user=invalid_user",
-	"connect ",
-	expected_stderr => qr/FATAL:  sorry, too many clients already/);
-
-# Open one more connection, to really ensure that we have at least one
-# dead-end backend.
-my $sock = $node->raw_connect();
-
-# Test that the dead-end backends don't prevent the server from stopping.
-$node->stop('fast', timeout => $stop_timeout);
-
-$node->start();
-$node->connect_ok("dbname=postgres", "works after restart");
-
-# Clean up
-foreach my $socket (@raw_connections)
-{
-	$socket->close();
-}
-
-done_testing();
diff --git a/src/test/postmaster/t/003_start_stop.pl b/src/test/postmaster/t/003_start_stop.pl
new file mode 100644
index 0000000..993d855
--- /dev/null
+++ b/src/test/postmaster/t/003_start_stop.pl
@@ -0,0 +1,98 @@
+
+# Copyright (c) 2021-2024, PostgreSQL Global Development Group
+
+# Test postmaster start and stop state machine.
+
+use strict;
+use warnings FATAL => 'all';
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More;
+
+#
+# Test that dead-end backends don't prevent the server from shutting
+# down.
+#
+# Dead-end backends can linger until they reach
+# authentication_timeout. We use a long authentication_timeout and a
+# much shorter timeout for the "pg_ctl stop" operation, to test that
+# if dead-end backends are killed at fast shut down. If they're not,
+# "pg_ctl stop" will error out before the authentication timeout kicks
+# in and cleans up the dead-end backends.
+my $authentication_timeout = $PostgreSQL::Test::Utils::timeout_default;
+my $stop_timeout = $authentication_timeout / 2;
+
+# Initialize the server with low connection limits, to test dead-end backends
+my $node = PostgreSQL::Test::Cluster->new('main');
+$node->init;
+$node->append_conf('postgresql.conf', "max_connections = 5");
+$node->append_conf('postgresql.conf', "max_wal_senders = 0");
+$node->append_conf('postgresql.conf', "autovacuum_max_workers = 1");
+$node->append_conf('postgresql.conf', "max_worker_processes = 1");
+$node->append_conf('postgresql.conf', "log_connections = on");
+$node->append_conf('postgresql.conf', "log_min_messages = debug2");
+$node->append_conf('postgresql.conf',
+	"authentication_timeout = '$authentication_timeout s'");
+$node->append_conf('postgresql.conf', 'trace_connection_negotiation=on');
+$node->start;
+
+if (!$node->raw_connect_works())
+{
+	plan skip_all => "this test requires working raw_connect()";
+}
+
+my @raw_connections = ();
+
+# Open a lot of TCP (or Unix domain socket) connections to use up all
+# the connection slots. Beyond a certain number (roughly 2x
+# max_connections), they will be "dead-end backends".
+for (my $i = 0; $i <= 20; $i++)
+{
+	my $sock = $node->raw_connect();
+
+	# On a busy system, the server might reject connections if
+	# postmaster cannot accept() them fast enough. The exact limit
+	# and behavior depends on the platform. To make this reliable,
+	# we attempt SSL negotiation on each connection before opening
+	# next one. The server will reject the SSL negotations, but
+	# when it does so, we know that the backend has been launched
+	# and we should be able to open another connection.
+
+	# SSLRequest packet consists of packet length followed by
+	# NEGOTIATE_SSL_CODE.
+	my $negotiate_ssl_code = pack("Nnn", 8, 1234, 5679);
+	my $sent = $sock->send($negotiate_ssl_code);
+
+	# Read reply. We expect the server to reject it with 'N'
+	my $reply = "";
+	$sock->recv($reply, 1);
+	is($reply, "N", "dead-end connection $i");
+
+	push(@raw_connections, $sock);
+}
+
+# When all the connection slots are in use, new connections will fail
+# before even looking up the user. Hence you now get "sorry, too many
+# clients already" instead of "role does not exist" error. Test that
+# to ensure that we have used up all the slots.
+$node->connect_fails("dbname=postgres user=invalid_user",
+	"connect ",
+	expected_stderr => qr/FATAL:  sorry, too many clients already/);
+
+# Open one more connection, to really ensure that we have at least one
+# dead-end backend.
+my $sock = $node->raw_connect();
+
+# Test that the dead-end backends don't prevent the server from stopping.
+$node->stop('fast', timeout => $stop_timeout);
+
+$node->start();
+$node->connect_ok("dbname=postgres", "works after restart");
+
+# Clean up
+foreach my $socket (@raw_connections)
+{
+	$socket->close();
+}
+
+done_testing();
bestfit10-cmdline-fatal-v2.patch (text/plain; charset=us-ascii)
Author:     Noah Misch <noah@leadboat.com>
Commit:     Noah Misch <noah@leadboat.com>

    Reject Windows command lines not convertible to Windows ANSI code page.
    
    PostgreSQL can't interpret them correctly without causing mojibake for
    other arguments.  This isn't fixing a vulnerability in PostgreSQL, but
    Windows programs that form arbitrary CreateProcessW() command lines to
    execute PostgreSQL programs may have a vulnerability.  If those programs
    exist, this change would mitigate their associated vulnerabilities.
    
    A program reaching the new error could use the following steps.
    Construct a char[] representing the intended PostgreSQL child program
    command line bytes.  Filter that through MultiByteToWideChar(CP_ACP, 0,
    ...), and pass the result as the CreateProcessW() command line argument.
    
    This is both a compatibility break and a bug fix.  While PostgreSQL has
    been misinterpreting the arguments this rejects, the application may be
    compensating.  For example, an inconvertible character in the argument
    to psql's "--output" has been causing a write to a differently-named
    file, determined by Windows "best fit" behavior.  If other file accesses
    filter the name through "best fit", the application may be functioning
    and oblivious to the actual file name written.  Hence, no back-patch.
    
    Reported by Splitline Huang.  Reviewed by FIXME.
    
    Discussion: https://postgr.es/m/FIXME

diff --git a/src/backend/main/main.c b/src/backend/main/main.c
index e286810..20bad17 100644
--- a/src/backend/main/main.c
+++ b/src/backend/main/main.c
@@ -150,6 +150,8 @@ main(int argc, char *argv[])
 	 */
 	unsetenv("LC_ALL");
 
+	validate_startup();			/* exit if parent called us incorrectly */
+
 	/*
 	 * Catch standard options before doing much else, in particular before we
 	 * insist on not being root.
diff --git a/src/bin/pg_config/pg_config.c b/src/bin/pg_config/pg_config.c
index 504e6c5..f588fdd 100644
--- a/src/bin/pg_config/pg_config.c
+++ b/src/bin/pg_config/pg_config.c
@@ -25,6 +25,7 @@
 #include "postgres_fe.h"
 
 #include "common/config_info.h"
+#include "common/logging.h"
 
 static const char *progname;
 
@@ -134,8 +135,8 @@ main(int argc, char **argv)
 	int			i;
 	int			j;
 
+	pg_logging_init(argv[0]);
 	set_pglocale_pgservice(argv[0], PG_TEXTDOMAIN("pg_config"));
-
 	progname = get_progname(argv[0]);
 
 	/* check for --help */
diff --git a/src/bin/pg_test_timing/pg_test_timing.c b/src/bin/pg_test_timing/pg_test_timing.c
index ce7aad4..eede8dd 100644
--- a/src/bin/pg_test_timing/pg_test_timing.c
+++ b/src/bin/pg_test_timing/pg_test_timing.c
@@ -8,6 +8,7 @@
 
 #include <limits.h>
 
+#include "common/logging.h"
 #include "getopt_long.h"
 #include "portability/instr_time.h"
 
@@ -27,6 +28,7 @@ main(int argc, char *argv[])
 {
 	uint64		loop_count;
 
+	pg_logging_init(argv[0]);
 	set_pglocale_pgservice(argv[0], PG_TEXTDOMAIN("pg_test_timing"));
 	progname = get_progname(argv[0]);
 
diff --git a/src/common/exec.c b/src/common/exec.c
index 32fd565..7a5b035 100644
--- a/src/common/exec.c
+++ b/src/common/exec.c
@@ -42,6 +42,9 @@
 #endif
 #endif
 
+#ifdef FRONTEND
+#include "common/logging.h"
+#endif
 #include "common/string.h"
 
 /* Inhibit mingw CRT's auto-globbing of command line arguments */
@@ -466,6 +469,178 @@ set_pglocale_pgservice(const char *argv0, const char *app)
 	}
 }
 
+/*
+ * Exit if parent process designated unacceptable conditions.
+ *
+ * Every installed executable shall call this, most via pg_logging_init().
+ *
+ * Currently, the only unacceptable condition is a Windows command line
+ * containing characters not surviving conversion to Windows ANSI code page.
+ * Splitting such an unacceptable command line into "char *argv[]" uses "best
+ * fit" behavior
+ * (unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WindowsBestFit/readme.txt).
+ * For example, U+FF02 FULLWIDTH QUOTATION MARK becomes U+0022 QUOTATION MARK.
+ * Since U+0022 is a metacharacter that affects argument boundaries, the
+ * resulting argument boundaries could differ from what they would be if the
+ * executable were to use wmain(int argc, wchar_t *argv[]).  That isn't a
+ * vulnerability in PostgreSQL, but it may be a vulnerability in Windows
+ * programs that construct CreateProcessW() arguments to execute PostgreSQL
+ * programs.  While most "best fit" substitutes aren't metacharacters, they
+ * still lose parent process intent.
+ *
+ * ==== Background ====
+ *
+ * Windows API functions taking string arguments come in two variants.  The
+ * "wide" variant (e.g. CreateProcessW()) takes wchar_t[] strings in UTF-16
+ * encoding.  The "ANSI" variant (e.g. CreateProcessA()) takes char[] strings
+ * in the "Windows ANSI code page" encoding.  Changing the Control Panel
+ * "system locale" changes the Windows ANSI code page.
+ *
+ * POSIX handles process arguments as an array of encoding-oblivious sequences
+ * of bytes [0x1,0xFF].  Windows handles any number of arguments through a
+ * single wchar_t[] "command line".  When both parent and child use ANSI APIs,
+ * process arguments arrive via steps equivalent to the following:
+ *
+ * - Parent starts with the CreateProcessA() lpCommandLine argument.
+ * - Parent's MultiByteToWideChar(CP_ACP, 0, lpCommandLine) produces the
+ *   child-accessible command line (the child's GetCommandLineW()).
+ * - Child retrieves GetCommandLineA(), which matches from
+ *   WideCharToMultiByte(CP_ACP, 0, GetCommandLineW(), ...).
+ * - Child parses GetCommandLineA() to construct argv (see
+ *   https://msdn.microsoft.com/en-us/library/17w5ykft.aspx).
+ *
+ * For code pages other than GetACP()==CP_UTF8, the WideCharToMultiByte()
+ * losslessly reverses the MultiByteToWideChar().  Hence, the double
+ * conversion is imperceptible for those code pages.  It also follows that a
+ * CreateProcessW() caller can make arbitrary GetCommandLineA() sequences of
+ * bytes [0x1,0xFF].  It does that by filtering the intended GetCommandLineA()
+ * through MultiByteToWideChar() to get the equivalent CreateProcessW()
+ * lpCommandLine argument.
+ *
+ * With CP_UTF8, GetCommandLineW() will contain U+FFFD REPLACEMENT CHARACTER
+ * for each byte of lpCommandLine not forming valid UTF-8.  Hence,
+ * GetCommandLineA() is always valid UTF8.  Other GetCommandLineA() byte
+ * sequences are infeasible.  An administrator elects CP_UTF8 with non-default
+ * Windows setting "Beta: Use Unicode UTF-8 for worldwide language support".
+ *
+ * ==== Design decisions ====
+ *
+ * Conversion from U+0081 to 0x81 is also a "best fit" conversion in that most
+ * code pages do not have a character named HIGH OCTET PRESET.  However, the
+ * conversion is bidirectional, so it loses no parent process intent.  This
+ * applies to some other characters in [U+0080,U+00FF].  Allowing these
+ * characters is valuable, because it avoids rejecting any CreateProcessA()
+ * lpCommandLine value.  Only CreateProcessW() reaches failure here.
+ * PostgreSQL-supplied programs don't use CreateProcessW().
+ *
+ * Environment variables, too, see "best fit" behavior.  In PGOPTIONS, an
+ * unescaped U+FF07 FULLWIDTH APOSTROPHE or U+FF3C FULLWIDTH REVERSE SOLIDUS
+ * would defeat quoting.  In PGDATA, any inconvertible character would defeat
+ * finding the directory.  Even so, we don't check GetEnvironmentStringsW().
+ * We don't know which variables this process will retrieve.  It doesn't help
+ * to reject variables outside those.
+ *
+ * One can always interconvert UTF-8 and UTF-16 without loss.  Hence, one
+ * could skip this check for CP_UTF8.  Since GetACP()==CP_UTF8 is still in
+ * beta, don't rely on that.
+ */
+void
+validate_startup(void)
+{
+#ifdef WIN32
+	wchar_t    *cmd_w = GetCommandLineW();
+	wchar_t    *cmd_w_cursor = cmd_w;
+	wchar_t		ch;
+	bool		all_ascii = true;
+	char	   *cmd_a;
+	wchar_t    *cmd_w_from_a;
+	int			result;
+	int			n_cmd_w_from_a;
+	int			n_cmd_w;
+
+	/* Return early for all-ASCII args. */
+	while ((ch = *cmd_w_cursor++))
+		if (ch > 0x7f)
+		{
+			all_ascii = false;
+			break;
+		}
+	if (all_ascii)
+		return;
+
+	/* non-ASCII: check conversion reversibility */
+	cmd_a = GetCommandLineA();
+	result = MultiByteToWideChar(CP_ACP, 0, cmd_a, -1, NULL, 0);
+	if (result != 0)
+	{
+		n_cmd_w_from_a = result;
+		cmd_w_from_a = palloc(sizeof(wchar_t) * n_cmd_w_from_a);
+		result = MultiByteToWideChar(CP_ACP, 0, cmd_a, -1,
+									 cmd_w_from_a, n_cmd_w_from_a);
+	}
+
+	/*
+	 * Since the system fills GetCommandLineA() by converting
+	 * GetCommandLineW() from UTF-16, there's no known way for those
+	 * MultiByteToWideChar() calls to fail.  This failure message doesn't
+	 * deserve translation, but logging.h has no errmsg_internal() equivalent
+	 * to suppress translation. Once we translate the string for logging.h, we
+	 * may as well use the translation here.
+	 */
+	if (result == 0)
+	{
+#ifndef FRONTEND
+		ereport(FATAL,
+				errcode(ERRCODE_INTERNAL_ERROR),
+				errmsg("could not convert command line to UTF-16: error code %lu",
+					   GetLastError()));
+#else
+		pg_log_error("could not convert command line to UTF-16: error code %lu",
+					 GetLastError());
+		exit(1);
+#endif
+	}
+
+	/*
+	 * Since the system fills GetCommandLineA() by converting
+	 * GetCommandLineW() from UTF-16, there's no known way for the reverse
+	 * conversion to yield a different number of code units.
+	 */
+	n_cmd_w = 1 /* terminator */ + wcslen(cmd_w);
+	if (n_cmd_w_from_a != n_cmd_w)
+	{
+#ifndef FRONTEND
+		ereport(FATAL,
+				errcode(ERRCODE_INTERNAL_ERROR),
+				errmsg("command line converted length (%d) did not match wide character command line length (%d)",
+					   n_cmd_w_from_a, n_cmd_w));
+#else
+		pg_log_error("command line converted length (%d) did not match wide character command line length (%d)",
+					 n_cmd_w_from_a, n_cmd_w);
+		exit(1);
+#endif
+	}
+	for (int i = 0; i < n_cmd_w; i++)
+	{
+		if (cmd_w[i] == cmd_w_from_a[i])
+			continue;
+#ifndef FRONTEND
+		ereport(FATAL,
+				errcode(ERRCODE_SYSTEM_ERROR),
+				errmsg("could not convert command line to Windows ANSI code page"),
+				errdetail("First inconvertible UTF-16 code unit was 0x%04X at offset %d.",
+						  cmd_w[i], i));
+#else
+		pg_log_error("could not convert command line to Windows ANSI code page");
+		pg_log_error_detail("First inconvertible UTF-16 code unit was 0x%04X at offset %d.",
+							cmd_w[i], i);
+		exit(1);
+#endif
+	}
+	/* wide strings were equal, so args are fine */
+#endif
+}
+
 #ifdef EXEC_BACKEND
 /*
  * For the benefit of PostgreSQL developers testing EXEC_BACKEND on Unix
diff --git a/src/common/logging.c b/src/common/logging.c
index 3cf1190..b577d06 100644
--- a/src/common/logging.c
+++ b/src/common/logging.c
@@ -77,7 +77,8 @@ enable_vt_processing(void)
 #endif							/* WIN32 */
 
 /*
- * This should be called before any output happens.
+ * Every installed frontend should call this before any output happens.  Exits
+ * on error.
  */
 void
 pg_logging_init(const char *argv0)
@@ -157,6 +158,8 @@ pg_logging_init(const char *argv0)
 			sgr_locus = SGR_LOCUS_DEFAULT;
 		}
 	}
+
+	validate_startup();			/* convenient place to affect all frontends */
 }
 
 /*
diff --git a/src/include/port.h b/src/include/port.h
index ba9ab0d..380c46e 100644
--- a/src/include/port.h
+++ b/src/include/port.h
@@ -132,6 +132,9 @@ extern void pgfnames_cleanup(char **filenames);
 /* Portable locale initialization (in exec.c) */
 extern void set_pglocale_pgservice(const char *argv0, const char *app);
 
+/* Exit if parent process designated unacceptable conditions (in exec.c) */
+extern void validate_startup(void);
+
 /* Portable way to find and execute binaries (in exec.c) */
 extern int	validate_exec(const char *path);
 extern int	find_my_exec(const char *argv0, char *retpath);
diff --git a/src/interfaces/ecpg/preproc/ecpg.c b/src/interfaces/ecpg/preproc/ecpg.c
index 2fcc6f8..36be896 100644
--- a/src/interfaces/ecpg/preproc/ecpg.c
+++ b/src/interfaces/ecpg/preproc/ecpg.c
@@ -7,6 +7,7 @@
 
 #include <unistd.h>
 
+#include "common/logging.h"
 #include "getopt_long.h"
 
 #include "preproc_extern.h"
@@ -143,8 +144,8 @@ main(int argc, char *const argv[])
 	char		my_exec_path[MAXPGPATH];
 	char		include_path[MAXPGPATH];
 
+	pg_logging_init(argv[0]);
 	set_pglocale_pgservice(argv[0], PG_TEXTDOMAIN("ecpg"));
-
 	progname = get_progname(argv[0]);
 
 	if (find_my_exec(argv[0], my_exec_path) < 0)
diff --git a/src/test/perl/PostgreSQL/Test/Utils.pm b/src/test/perl/PostgreSQL/Test/Utils.pm
index 022b44b..7266f01 100644
--- a/src/test/perl/PostgreSQL/Test/Utils.pm
+++ b/src/test/perl/PostgreSQL/Test/Utils.pm
@@ -936,6 +936,7 @@ sub program_options_handling_ok
 {
 	local $Test::Builder::Level = $Test::Builder::Level + 1;
 	my ($cmd) = @_;
+
 	my ($stdout, $stderr);
 	print("# Running: $cmd --not-a-valid-option\n");
 	my $result = IPC::Run::run [ $cmd, '--not-a-valid-option' ], '>',
@@ -943,6 +944,42 @@ sub program_options_handling_ok
 	  '2>', \$stderr;
 	ok(!$result, "$cmd with invalid option nonzero exit code");
 	isnt($stderr, '', "$cmd with invalid option prints error message");
+
+	# On Windows, confirm (1) rejection of characters we can't convert to the
+	# Windows ANSI code page and (2) no rejection of environment variables.
+	# The $unicode_only test character, U+1FBCC RAISED SMALL LEFT SQUARE
+	# BRACKET, could have been any character found only in CP_UTF8 and code
+	# pages ineligible to be GetACP().  Do accept the wide equivalent of
+	# CreateProcessA(0x80 0x81), per comments at validate_startup().
+  SKIP:
+	{
+		skip "wide characters are a special case to Windows only", 1
+		  if !$windows_os;
+		local $ENV{PG_TEST_EXE} = $cmd;
+		my $script = <<'EOPOWERSHELL';
+$want_exit_unicode_only = if ([System.Text.Encoding]::Default -eq
+							  [System.Text.Encoding]::UTF8) { 0 } else { 1 }
+$exe = $Env:PG_TEST_EXE
+$any_encoding =
+	[System.Text.Encoding]::Default.GetString([byte[]](0x80, 0x81))
+$unicode_only = [Char]::ConvertFromUtf32(0x1FBCC)
+$Env:PG_TEST_WIDE_ENV = $unicode_only
+& $exe --help ($any_encoding) > $null
+$exit_any_encoding = $LASTEXITCODE
+& $exe --help ($unicode_only) > $null
+$exit_unicode_only = $LASTEXITCODE
+if ($exit_any_encoding -eq 0 -and
+	$exit_unicode_only -eq $want_exit_unicode_only)
+{
+	exit 0
+}
+exit 1
+EOPOWERSHELL
+		$result = IPC::Run::run [ qw(powershell -NoProfile -NonInteractive),
+			'-Command' => $script ];
+		ok($result, 'command line rejected iff cannot convert to GetACP()');
+	}
+
 	return;
 }
 
#2 Thomas Munro
thomas.munro@gmail.com
In reply to: Noah Misch (#1)
Re: Reject Windows command lines not convertible to system locale

On Sun, Dec 15, 2024 at 3:27 PM Noah Misch <noah@leadboat.com> wrote:

yielding mojibake

Thank you for this magnificent terminology.

What we can do safely is exit if we could not convert the command line to the
Windows ANSI code page (losslessly). Patch attached.

This seems to make sense generally, since (based only on googling, I
am not a Windows person and I don't even play one on TV), it doesn't
sound like there is a way for the system to tell you that the
conversion was lossy.
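
(An explicit conversion can at least report lossiness via WC_NO_BEST_FIT_CHARS
plus lpUsedDefaultChar, roughly as in the untested sketch below, but nothing
like that is reported for the conversion behind GetCommandLineA(), which is
presumably why the patch compares a round trip instead. The CP_UTF8
short-circuit is my assumption.)

#include <stdbool.h>
#include <windows.h>

/* Sketch: does "s" convert to the Windows ANSI code page without loss? */
static bool
converts_losslessly(const wchar_t *s)
{
	char		buf[4096];		/* fixed size only to keep the sketch short */
	BOOL		used_default = FALSE;

	/* CP_UTF8 rejects these flags/arguments; treat it as lossless here. */
	if (GetACP() == CP_UTF8)
		return true;
	return WideCharToMultiByte(CP_ACP, WC_NO_BEST_FIT_CHARS, s, -1,
							   buf, sizeof(buf), NULL, &used_default) != 0 &&
		!used_default;
}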

For the all-ASCII test, why not use pg_is_ascii()?

I wonder if there are any legitimate uses that would fail spuriously
due to the minutiae of normalisation, surrogates, etc. Or perhaps
that type of thing is part of the problem you're talking about.

As I mention in a code comment, I decided not to do anything about the similar
hazard in GetEnvironmentStringsW(). A possible alternative would be to check
each var referenced in PostgreSQL code or each var where the name begins with
"PG". Should we use one of those or another alternative?

I would vote for having a table of variables we know about and
checking them too, just on principle (it's not the characters the user
intended). Checking every variable might reject legitimate UTF-16
communication between parent and sub-processes other than ours, and
even assuming that we own everything beginning "PG" seems an
unnecessary overreach, which seems to leave only the known list idea.
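
Very roughly, something like the following untested sketch; the variable
names, buffer sizes, and error reporting are placeholders, not taken from any
patch:

#include <windows.h>
#include <stdio.h>
#include <stdlib.h>
#include <wchar.h>

static const wchar_t *const check_vars[] = {
	L"PGDATA", L"PGOPTIONS", L"PGCLIENTENCODING", L"PGUSER", NULL
};

/* Does "value" survive a round trip through the Windows ANSI code page? */
static int
value_round_trips(const wchar_t *value)
{
	char		narrow[4096];
	wchar_t		wide_again[4096];

	if (WideCharToMultiByte(CP_ACP, 0, value, -1,
							narrow, sizeof(narrow), NULL, NULL) == 0)
		return 0;
	if (MultiByteToWideChar(CP_ACP, 0, narrow, -1, wide_again, 4096) == 0)
		return 0;
	return wcscmp(value, wide_again) == 0;
}

static void
validate_environment(void)
{
	wchar_t		value[4096];

	for (int i = 0; check_vars[i] != NULL; i++)
	{
		DWORD		n = GetEnvironmentVariableW(check_vars[i], value, 4096);

		if (n == 0 || n >= 4096)
			continue;			/* unset, or too long for this sketch */
		if (!value_round_trips(value))
		{
			fwprintf(stderr, L"inconvertible value in %ls\n", check_vars[i]);
			exit(1);
		}
	}
}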

The Perl testing calls powershell, which is new for our source tree. The
Internet thinks Windows Server 2008 R1 is the newest Windows where powershell
was optional. The buildfarm's oldest Windows is Windows Server 2008 R2. If
powershell is bad, there's likely a hairy way to write the test in any of
cmd.exe, cscript, Perl Win32::API, or C. My first choice was Perl
Win32::Process or IPC::Run. This test needs CreateProcessW(), but those two
use CreateProcessA().

Sounds OK from here if it's the only sane way, and you could always skip?

I didn't test a CP_UTF8 Windows system. The new test coverage has a condition
intended to avoid failure there. (My Windows development environments are all
too old, and I stopped short of building a new one for this.) CP_UTF8 Windows
raises broader issues. I'll start a separate thread about them.

Is it really still in beta? Yeah, I can't immediately think of a good
reason not to treat it the same as everything else and create more
code paths, even if we think it should always succeed (but I'll look
out for your thread, I'm quite interested in this general topic of
"can we make a lot of Windows encoding troubles go away that way").
It's a few extra CPU cycles but I guess it doesn't matter much.

I wanted to put the check in some function that most main() instances already
call, instead of adding ~40 new calls for this esoteric topic. Options:

1. Call in pg_logging_init() and the backend. Add pg_logging_init() calls to
pg_config, ecpg (the ecpg preprocessor), and pg_test_timing. I picked
this, for two minor benefits. First, it facilitates the backend
initializing LC_MESSAGES before this check. Second, every frontend may
need pg_logging_init() eventually, so it's fair to add calls where missing.

2. Call in set_pglocale_pgservice() and the three programs that don't call
set_pglocale_pgservice(): oid2name, vacuumlo, pgbench.

+1

FWIW, coincidentally, I assume, I wrote about environ and argv
encoding problems last week, including a test program showing wchar_t
<-> ACP weirdness before main() among other interesting related
effects[1]. (My focus has been on how we sort out ACP vs database
encoding and block stuff that is b0rked in *our* code, ie below
main(), because we don't understand the platform's encoding model, not
so much the stuff that is already b0rked in all non-UTF8, non-wmain()
Windows programs in general before even laying hands on the program
counter. I had got as far as wondering out loud if forcing or
strongly recommending UTF-8 ACP should be on the agenda to head off
various related problems and generally make Windows not so different
from Unix, and also supporting it as a native encoding properly on
that platform as experimented with in CF #3772, but much more study is
required there.)

[1]: /messages/by-id/CA+hUKGL=F0pSLLf3nLpA_-sBwYsLg7s=FD6YFo_PDvS84FE_hw@mail.gmail.com

#3 Noah Misch
noah@leadboat.com
In reply to: Thomas Munro (#2)
Re: Reject Windows command lines not convertible to system locale

On Sun, Dec 15, 2024 at 04:50:57PM +1300, Thomas Munro wrote:

On Sun, Dec 15, 2024 at 3:27 PM Noah Misch <noah@leadboat.com> wrote:

yielding mojibake

Thank you for this magnificent terminology.

Thank you for reviewing.

What we can do safely is exit if we could not convert the command line to the
Windows ANSI code page (losslessly). Patch attached.

This seems to make sense generally, since (based only on googling, I
am not a Windows person and I don't even play one on TV), it doesn't
sound like there is a way for the system to tell you that the
conversion was lossy.

For the all-ASCII test, why not use pg_is_ascii()?

pg_is_ascii() takes a char[], but this has a wchar_t[]. Maybe I should wrap
that part in a pg_wcs_is_ascii()? (We can't use GetCommandLineA(), because
"best fit" may have converted non-ASCII to ASCII.)

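Untested sketch of what I have in mind, mirroring pg_is_ascii():

#include <stdbool.h>
#include <wchar.h>

/* Hypothetical wide-character counterpart of pg_is_ascii() */
static bool
pg_wcs_is_ascii(const wchar_t *str)
{
	for (const wchar_t *p = str; *p; p++)
	{
		if (*p > 0x7F)
			return false;
	}
	return true;
}
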
I wonder if there are any legitimate uses that would fail spuriously
due to the minutiae of normalisation, surrogates, etc. Or perhaps
that type of thing is part of the problem you're talking about.

I wouldn't rule it out. My original plan was just to reject any wchar_t above
0xFF. That breaks innocent command lines. If GetACP()==1252, then
CreateProcessA(... 0x80 ...) puts 0x20ac (UTF-16 EURO SIGN) in
GetCommandLineW(). The conversion to GetCommandLineA() arrives back at 0x80,
so there's no reason to reject it.
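
Illustrated as an untested sketch, with the code page pinned to 1252 so the
values are deterministic:

#include <windows.h>
#include <assert.h>

int
main(void)
{
	wchar_t		wide[2];
	char		narrow[2];

	/* CreateProcessA(... 0x80 ...) effectively does this conversion ... */
	MultiByteToWideChar(1252, 0, "\x80", -1, wide, 2);
	assert(wide[0] == 0x20AC);	/* EURO SIGN, well above 0xFF */

	/* ... and GetCommandLineA() effectively does the reverse, losslessly. */
	WideCharToMultiByte(1252, 0, wide, -1, narrow, 2, NULL, NULL);
	assert((unsigned char) narrow[0] == 0x80);
	return 0;
}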

As I mention in a code comment, I decided not to do anything about the similar
hazard in GetEnvironmentStringsW(). A possible alternative would be to check
each var referenced in PostgreSQL code or each var where the name begins with
"PG". Should we use one of those or another alternative?

I would vote for having a table of variables we know about and
checking them too, just on principle (it's not the characters the user
intended). Checking every variable might reject legitimate UTF-16
communication between parent and sub-processes other than ours, and
even assuming that we own everything beginning "PG" seems an
unnecessary overreach, which seems to leave only the known list idea.

Works for me. It's a little weird for psql to complain about a PGDATA value,
but the condition should be rare and easy enough for the user to fix. If
nobody else dislikes that plan, I'll give it a try. Perhaps I'd add a
src/tools/RELEASE_CHANGES task to check for new vars. Alternatively, some
check-world test could verify names appearing in getenv calls or
PQconninfoOptions.envvar fields. I don't think we have a source-tree-scanning
test like that today, but I wouldn't object to starting. We've talked about a
pgindent test, so visiting source code from a test is plausible.

The Perl testing calls powershell, which is new for our source tree. The
Internet thinks Windows Server 2008 R1 is the newest Windows where powershell
was optional. The buildfarm's oldest Windows is Windows Server 2008 R2. If
powershell is bad, there's likely a hairy way to write the test in any of
cmd.exe, cscript, Perl Win32::API, or C. My first choice was Perl
Win32::Process or IPC::Run. This test needs CreateProcessW(), but those two
use CreateProcessA().

Sounds OK from here if it's the only sane way, and you could always skip?

A C program would be another sane way, but I would indeed opt to skip on
16-year-old Windows before rewriting in C.
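
If it came to that, the C version would be something like this untested
sketch; the program path and extra argument are placeholders:

#include <windows.h>
#include <stdio.h>
#include <string.h>

int
main(void)
{
	/* CreateProcessW() may modify the command line, so use an array. */
	/* \U0001FBCC becomes a UTF-16 surrogate pair in the wide literal. */
	wchar_t		cmdline[] = L"postgres --help \U0001FBCC";
	STARTUPINFOW si;
	PROCESS_INFORMATION pi;
	DWORD		exit_code = 0;

	memset(&si, 0, sizeof(si));
	si.cb = sizeof(si);
	if (!CreateProcessW(NULL, cmdline, NULL, NULL, FALSE, 0,
						NULL, NULL, &si, &pi))
		return 2;
	WaitForSingleObject(pi.hProcess, INFINITE);
	GetExitCodeProcess(pi.hProcess, &exit_code);

	/* Expect nonzero under a non-UTF8 ACP once the patch rejects this. */
	printf("exit code: %lu\n", (unsigned long) exit_code);
	return 0;
}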

I didn't test a CP_UTF8 Windows system. The new test coverage has a condition
intended to avoid failure there. (My Windows development environments are all
too old, and I stopped short of building a new one for this.) CP_UTF8 Windows
raises broader issues. I'll start a separate thread about them.

Is it really still in beta?

The word "beta" appears next to its checkbox. That's all I know!

Yeah, I can't immediately think of a good
reason not to treat it the same as everything else and create more
code paths, even if we think it should always succeed (but I'll look
out for your thread, I'm quite interested in this general topic of
"can we make a lot of Windows encoding troubles go away that way").
It's a few extra CPU cycles but I guess it doesn't matter much.

postgr.es/m/flat/20241215023221.4d.nmisch@google.com is that thread. So far,
it's only about new breakage.

FWIW, coincidentally, I assume, I wrote about environ and argv
encoding problems last week, including a test program showing wchar_t
<-> ACP weirdness before main() among other interesting related
effects[1].

Nice. I had made a note to look at that thread eventually, but I didn't know
Windows had come up there. My observations have been consistent with your
Windows findings in [1].

(My focus has been on how we sort out ACP vs database
encoding and block stuff that is b0rked in *our* code, ie below
main(), because we don't understand the platform's encoding model, not
so much the stuff that is already b0rked in all non-UTF8, non-wmain()
Windows programs in general before even laying hands on the program
counter. I had got as far as wondering out loud if forcing or
strongly recommending UTF-8 ACP should be on the agenda to head off
various related problems and generally make Windows not so different

My initial impression is not great, since it's painful with any database
having encoding!=UTF8. The nice thing about every other ACP is that you can
treat it as a POSIX-style, encoding-oblivious sequence of bytes. (Side note:
I don't understand how that is the case for code page 932 and other variable
length, non-UTF* code pages. It seems to work, but maybe I just haven't found
the byte sequences that break.) My hypothesis is that GetACP()==CP_UTF8 would
make PostgreSQL cooperate better with the rest of the Windows world, but it would
make PostgreSQL cooperate worse with other POSIX transplants. Again, that's
just an initial impression.

[1] /messages/by-id/CA+hUKGL=F0pSLLf3nLpA_-sBwYsLg7s=FD6YFo_PDvS84FE_hw@mail.gmail.com

#4 Thomas Munro
thomas.munro@gmail.com
In reply to: Noah Misch (#3)
Re: Reject Windows command lines not convertible to system locale

On Sun, Dec 15, 2024 at 6:05 PM Noah Misch <noah@leadboat.com> wrote:

On Sun, Dec 15, 2024 at 04:50:57PM +1300, Thomas Munro wrote:

For the all-ASCII test, why not use pg_is_ascii()?

pg_is_ascii() takes a char[], but this has a wchar_t[]. Maybe I should wrap
that part in a pg_wcs_is_ascii()? (We can't use GetCommandLineA(), because
"best fit" may have converted non-ASCII to ASCII.)

Right, sorry, brain fade. No issue with the open-coding here if you
want an early exit (I'd be tempted to just not have it for simplicity
but no strong opinion).