Encoding issues in console and eventlog on win32
We can choose a database encoding different from the platform-dependent one,
but postgres writes server logs in the database encoding.
As a result, server logs are filled with broken characters.
The problem can occur on all platforms; however, there is a solution
for win32. Since Windows supports wide characters for writing logs, we can
convert log text => UTF-8 => UTF-16 and pass it to WriteConsoleW()
and ReportEventW().
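On Windows the last step, UTF-8 => UTF-16, is what MultiByteToWideChar(CP_UTF8, ...) does. As a portable illustration only, the sketch below performs the same decoding in plain C; the function name utf8_to_utf16 is hypothetical and is not part of PostgreSQL or the Win32 API. It also shows why sizing a UTF-16 buffer by the UTF-8 byte count is sufficient, and that this step can fail on malformed input.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical portable stand-in for MultiByteToWideChar(CP_UTF8, ...):
 * decode UTF-8 bytes and emit UTF-16 code units.  Returns the number of
 * units written, or -1 on malformed input or if dst is too small.
 *
 * A 1-, 2- or 3-byte UTF-8 sequence yields one UTF-16 unit and a 4-byte
 * sequence yields two, so the output never has more units than the
 * input has bytes.
 */
static int
utf8_to_utf16(const unsigned char *src, size_t len,
              uint16_t *dst, size_t dstlen)
{
    /* smallest code point that needs 'extra' continuation bytes */
    static const uint32_t min_cp[] = {0, 0x80, 0x800, 0x10000};
    size_t      i = 0;
    size_t      n = 0;

    while (i < len)
    {
        unsigned char c = src[i];
        uint32_t    cp;
        size_t      extra;
        size_t      k;

        extra = (c < 0x80) ? 0 :
                ((c & 0xE0) == 0xC0) ? 1 :
                ((c & 0xF0) == 0xE0) ? 2 :
                ((c & 0xF8) == 0xF0) ? 3 : 4;
        if (extra == 4)
            return -1;          /* invalid lead byte */
        if (extra >= len - i)
            return -1;          /* truncated sequence */

        cp = (extra == 0) ? c : (c & (0x3F >> extra));
        for (k = 1; k <= extra; k++)
        {
            if ((src[i + k] & 0xC0) != 0x80)
                return -1;      /* not a continuation byte */
            cp = (cp << 6) | (src[i + k] & 0x3F);
        }
        if (cp < min_cp[extra] || cp > 0x10FFFF ||
            (cp >= 0xD800 && cp <= 0xDFFF))
            return -1;          /* overlong, out of range, or surrogate */

        if (cp >= 0x10000)
        {
            if (dstlen - n < 2)
                return -1;
            cp -= 0x10000;      /* encode as a surrogate pair */
            dst[n++] = (uint16_t) (0xD800 | (cp >> 10));
            dst[n++] = (uint16_t) (0xDC00 | (cp & 0x3FF));
        }
        else
        {
            if (dstlen - n < 1)
                return -1;
            dst[n++] = (uint16_t) cp;
        }
        i += extra + 1;
    }
    return (int) n;
}
```

A Windows caller would then pass the resulting buffer and unit count to WriteConsoleW() or ReportEventW(); that part is omitted because those APIs exist only on Windows.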
This matters especially in Japan, where encoding trouble on Windows is unavoidable
because postgres doesn't support Shift-JIS, the native encoding of the
Japanese edition of Windows, as a database encoding.
If we also want to support the same functionality on non-win32 platforms,
we might need a non-throwing version of pg_do_encoding_conversion():
log_message_to_write = pg_do_encoding_conversion_nothrow(
log_message_in_database_encoding,
GetDatabaseEncoding() /* as src_encoding */,
GetPlatformEncoding() /* as dst_encoding */);
and pass the result to stderr and syslog. But that requires major rewrites
of the conversion functions, so I'd like to submit a solution only for win32
for now. Also, the issue is less serious on non-win32 platforms because
we can choose UTF-8 or EUC_* on those platforms.
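A minimal sketch of what "non-throwing" could mean, assuming '?' substitution: the hypothetical utf8_to_latin1_nothrow() below converts UTF-8 to LATIN1 and replaces anything LATIN1 cannot represent, as well as malformed bytes, with '?'. It is not an existing PostgreSQL function, and a real implementation would have to cover every encoding pair.

```c
#include <stddef.h>

/*
 * Hypothetical non-throwing conversion, UTF-8 -> LATIN1: characters
 * LATIN1 cannot represent (code points above U+00FF) and malformed
 * bytes become '?' instead of raising an error.  Each step emits one
 * byte for one or more input bytes, so dst needs at most len + 1 bytes.
 */
static size_t
utf8_to_latin1_nothrow(const unsigned char *src, size_t len, char *dst)
{
    size_t      i = 0;
    size_t      n = 0;

    while (i < len)
    {
        unsigned char c = src[i];
        size_t      seqlen;
        size_t      k;

        if (c < 0x80)
        {
            dst[n++] = (char) c;        /* ASCII passes through */
            i++;
            continue;
        }
        seqlen = ((c & 0xE0) == 0xC0) ? 2 :
                 ((c & 0xF0) == 0xE0) ? 3 :
                 ((c & 0xF8) == 0xF0) ? 4 : 0;

        /* count the continuation bytes that follow the lead byte */
        for (k = 1; k < seqlen && i + k < len; k++)
            if ((src[i + k] & 0xC0) != 0x80)
                break;

        if (seqlen == 0 || k < seqlen)
        {
            dst[n++] = '?';             /* stray or truncated: skip one byte */
            i++;
            continue;
        }
        if (seqlen == 2)
        {
            unsigned int cp = ((c & 0x1Fu) << 6) | (src[i + 1] & 0x3Fu);

            /* U+0080..U+00FF map 1:1 onto LATIN1; anything else does not */
            dst[n++] = (cp >= 0x80 && cp <= 0xFF) ? (char) cp : '?';
        }
        else
            dst[n++] = '?';             /* 3- and 4-byte forms: never LATIN1 */
        i += seqlen;
    }
    dst[n] = '\0';
    return n;
}
```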
Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center
Attachments:
eventlog-20090817.patch (application/octet-stream)
diff -cpr head/src/backend/utils/error/elog.c eventlog/src/backend/utils/error/elog.c
*** head/src/backend/utils/error/elog.c Tue Jul 7 16:06:35 2009
--- eventlog/src/backend/utils/error/elog.c Mon Aug 17 10:49:42 2009
*************** static int syslog_facility = LOG_LOCAL0;
*** 111,116 ****
--- 111,118 ----
static void write_syslog(int level, const char *line);
#endif
+ static void write_console(const char *line, int len);
+
#ifdef WIN32
static void write_eventlog(int level, const char *line);
#endif
*************** write_eventlog(int level, const char *li
*** 1606,1613 ****
break;
}
!
! ReportEvent(evtHandle,
eventlevel,
0,
0, /* All events are Id 0 */
--- 1608,1616 ----
break;
}
! if (GetDatabaseEncoding() == GetPlatformEncoding())
! {
! ReportEventA(evtHandle,
eventlevel,
0,
0, /* All events are Id 0 */
*************** write_eventlog(int level, const char *li
*** 1616,1624 ****
--- 1619,1696 ----
0,
&line,
NULL);
+ }
+ else
+ {
+ char *utf8;
+ WCHAR *utf16;
+ int len;
+
+ len = strlen(line);
+ utf8 = (char *) pg_do_encoding_conversion((unsigned char *) line,
+ len, GetDatabaseEncoding(), PG_UTF8);
+ if (utf8 != line)
+ len = strlen(utf8);
+
+ utf16 = (WCHAR *) palloc(sizeof(WCHAR) * (len + 1));
+ len = MultiByteToWideChar(CP_UTF8, 0, utf8, len, utf16, len);
+ utf16[len] = L'\0';
+
+ ReportEventW(evtHandle,
+ eventlevel,
+ 0,
+ 0, /* All events are Id 0 */
+ NULL,
+ 1,
+ 0,
+ (LPCWSTR *) &utf16,
+ NULL);
+
+ if (utf8 != line)
+ pfree(utf8);
+ pfree(utf16);
+ }
}
#endif /* WIN32 */
+ static void
+ write_console(const char *line, int len)
+ {
+ if (GetDatabaseEncoding() == GetPlatformEncoding())
+ write(fileno(stderr), line, len);
+ else
+ {
+ #ifdef WIN32
+ char *utf8;
+ WCHAR *utf16;
+ HANDLE stderrHandle;
+ DWORD written;
+
+ utf8 = (char *) pg_do_encoding_conversion((unsigned char *) line,
+ len, GetDatabaseEncoding(), PG_UTF8);
+
+ if (utf8 != line)
+ len = strlen(utf8);
+ utf16 = (WCHAR *) palloc(sizeof(WCHAR) * len);
+ len = MultiByteToWideChar(CP_UTF8, 0, utf8, len, utf16, len);
+
+ stderrHandle = GetStdHandle(STD_ERROR_HANDLE);
+ WriteConsoleW(stderrHandle, utf16, len, &written, NULL);
+
+ if (utf8 != line)
+ pfree(utf8);
+ pfree(utf16);
+ #else
+ /*
+ * Conversion on non-win32 platform is not implemented yet.
+ * It requires non-throw version of pg_do_encoding_conversion(),
+ * that converts unconvertable characters to '?' without errors.
+ */
+ write(fileno(stderr), line, len);
+ #endif
+ }
+ }
+
/*
* setup formatted_log_time, for consistent times between CSV and regular logs
*/
*************** send_message_to_server_log(ErrorData *ed
*** 2233,2239 ****
write_eventlog(edata->elevel, buf.data);
#endif
else
! write(fileno(stderr), buf.data, buf.len);
}
/* If in the syslogger process, try to write messages direct to file */
--- 2305,2311 ----
write_eventlog(edata->elevel, buf.data);
#endif
else
! write_console(buf.data, buf.len);
}
/* If in the syslogger process, try to write messages direct to file */
*************** send_message_to_server_log(ErrorData *ed
*** 2256,2267 ****
{
const char *msg = _("Not safe to send CSV data\n");
! write(fileno(stderr), msg, strlen(msg));
if (!(Log_destination & LOG_DESTINATION_STDERR) &&
whereToSendOutput != DestDebug)
{
/* write message to stderr unless we just sent it above */
! write(fileno(stderr), buf.data, buf.len);
}
pfree(buf.data);
}
--- 2328,2339 ----
{
const char *msg = _("Not safe to send CSV data\n");
! write_console(msg, strlen(msg));
if (!(Log_destination & LOG_DESTINATION_STDERR) &&
whereToSendOutput != DestDebug)
{
/* write message to stderr unless we just sent it above */
! write_console(buf.data, buf.len);
}
pfree(buf.data);
}
*************** void
*** 2642,2647 ****
--- 2714,2722 ----
write_stderr(const char *fmt,...)
{
va_list ap;
+ #ifdef WIN32
+ char errbuf[2048]; /* Arbitrary size? */
+ #endif
fmt = _(fmt);
*************** write_stderr(const char *fmt,...)
*** 2651,2656 ****
--- 2726,2732 ----
vfprintf(stderr, fmt, ap);
fflush(stderr);
#else
+ vsnprintf(errbuf, sizeof(errbuf), fmt, ap);
/*
* On Win32, we print to stderr if running on a console, or write to
*************** write_stderr(const char *fmt,...)
*** 2658,2673 ****
*/
if (pgwin32_is_service()) /* Running as a service */
{
- char errbuf[2048]; /* Arbitrary size? */
-
- vsnprintf(errbuf, sizeof(errbuf), fmt, ap);
-
write_eventlog(ERROR, errbuf);
}
else
{
/* Not running as service, write to stderr */
! vfprintf(stderr, fmt, ap);
fflush(stderr);
}
#endif
--- 2734,2745 ----
*/
if (pgwin32_is_service()) /* Running as a service */
{
write_eventlog(ERROR, errbuf);
}
else
{
/* Not running as service, write to stderr */
! write_console(errbuf, strlen(errbuf));
fflush(stderr);
}
#endif
diff -cpr head/src/backend/utils/mb/mbutils.c eventlog/src/backend/utils/mb/mbutils.c
*** head/src/backend/utils/mb/mbutils.c Thu Jul 9 11:16:13 2009
--- eventlog/src/backend/utils/mb/mbutils.c Mon Aug 17 10:49:42 2009
*************** static FmgrInfo *ToClientConvProc = NULL
*** 58,63 ****
--- 58,64 ----
*/
static pg_enc2name *ClientEncoding = &pg_enc2name_tbl[PG_SQL_ASCII];
static pg_enc2name *DatabaseEncoding = &pg_enc2name_tbl[PG_SQL_ASCII];
+ static pg_enc2name *PlatformEncoding = NULL;
/*
* During backend startup we can't set client encoding because we (a)
*************** pg_client_encoding(PG_FUNCTION_ARGS)
*** 977,980 ****
--- 978,989 ----
{
Assert(ClientEncoding);
return DirectFunctionCall1(namein, CStringGetDatum(ClientEncoding->name));
+ }
+
+ int
+ GetPlatformEncoding(void)
+ {
+ if (PlatformEncoding == NULL)
+ PlatformEncoding = &pg_enc2name_tbl[pg_get_encoding_from_locale("")];
+ return PlatformEncoding->encoding;
}
diff -cpr head/src/include/mb/pg_wchar.h eventlog/src/include/mb/pg_wchar.h
*** head/src/include/mb/pg_wchar.h Fri Jun 12 09:52:43 2009
--- eventlog/src/include/mb/pg_wchar.h Mon Aug 17 10:49:42 2009
*************** extern const char *pg_get_client_encodin
*** 402,407 ****
--- 402,408 ----
extern void SetDatabaseEncoding(int encoding);
extern int GetDatabaseEncoding(void);
extern const char *GetDatabaseEncodingName(void);
+ extern int GetPlatformEncoding(void);
extern void pg_bind_textdomain_codeset(const char *domainname);
extern int pg_valid_client_encoding(const char *name);
Itagaki Takahiro wrote:
We can choose different encodings from platform-dependent one
for database, but postgres writes serverlogs in the database encoding.
As the result, serverlogs are filled with broken characters.
The problem could occur on all platforms, however, there is a solution
for win32. Since Windows supports wide characters to write logs, we can
convert log texts => UTF-8 => UTF-16 and pass them to WriteConsoleW()
and ReportEventW().
Especially in Japan, encoding troubles on Windows are unavoidable
because postgres doesn't support Shift-JIS for database encoding,
that is the native encoding for Windows Japanese edition.
If we also want to support the same functionality on non-win32 platform,
we might need non-throwable version of pg_do_encoding_conversion():
log_message_to_write = pg_do_encoding_conversion_nothrow(
log_message_in_database_encoding,
GetDatabaseEncoding() /* as src_encoding */,
GetPlatformEncoding() /* as dst_encoding */)
and pass the result to stderr and syslog. But it requires major rewrites
of conversion functions, so I'd like to submit a solution only for win32
for now. Also, the issue is not so serious on non-win32 platforms because
we can choose UTF-8 or EUC_* on those platforms.
Something like that seems reasonable for the Windows event log; that is
clearly supposed to be written using a specific encoding. With the log
files, we're more free to do what we want, and IMHO we shouldn't put a
Windows-specific hack there because as you say we have the same problem
on all platforms.
There's no guarantee that conversion to UTF-8 won't fail, so this isn't
totally risk-free on Windows either. Theoretically, MultiByteToWideChar
could fail too (the patch neglects to check for that), although I
suppose it can't really happen for UTF-8 -> UTF-16 conversion.
Can't we use MultiByteToWideChar() to convert directly to the required
encoding, avoiding the double conversion?
--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote:
Can't we use MultiByteToWideChar() to convert directly to the required
encoding, avoiding the double conversion?
Here is an updated version of the patch.
I now use direct conversion in pgwin32_toUTF16() if a corresponding codepage
is available. If not, I still use the double conversion.
pgwin32_toUTF16() is now exported from mbutils.c. I used the function
in the following places, although the main target of the patch is the eventlog:
* WriteConsoleW() - write non-redirected stderr log.
* ReportEventW() - write eventlog.
* CreateFileW() - open non-ASCII file names (e.g. COPY TO/FROM 'mb-path').
This approach is only available on Windows because other platforms
don't support locale-independent, wide-character-based system calls.
Other platforms require a different approach, but even then we'd still
be better off having win32-specific routines because UTF-16 is the native
encoding on Windows.
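The choice pgwin32_toUTF16() makes between direct and double conversion can be modeled as a table lookup. The sketch below is a simplified, hypothetical model: the enum, struct and function names are illustrative, only a few encodings are listed, and the codepage numbers are those in the patch's encoding_to_codepage table (65001 is the numeric value of CP_UTF8).

```c
/* Hypothetical subset of encoding ids; enc_table is indexed by these
   values, so the two must stay in the same order. */
typedef enum
{
    ENC_SQL_ASCII,
    ENC_EUC_JP,
    ENC_EUC_TW,
    ENC_UTF8,
    ENC_SJIS
} enc_id;

typedef struct
{
    const char *name;
    enc_id      encoding;
    unsigned int codepage;      /* 0 = no corresponding Windows codepage */
} enc_entry;

#define CODEPAGE_UTF8 65001     /* numeric value of CP_UTF8 */

/* Codepage values as listed in the patch's encoding_to_codepage table. */
static const enc_entry enc_table[] =
{
    {"SQL_ASCII", ENC_SQL_ASCII, 0},
    {"EUC_JP", ENC_EUC_JP, 20932},
    {"EUC_TW", ENC_EUC_TW, 0},
    {"UTF8", ENC_UTF8, CODEPAGE_UTF8},
    {"SJIS", ENC_SJIS, 932},
};

typedef enum
{
    PATH_DIRECT,                /* a single MultiByteToWideChar call */
    PATH_VIA_UTF8               /* convert to UTF-8 first, then CP_UTF8 */
} conv_path;

/* Report which route would be taken for database encoding 'enc' and
   which codepage the final MultiByteToWideChar call would receive. */
static conv_path
choose_path(enc_id enc, unsigned int *codepage)
{
    unsigned int cp = enc_table[enc].codepage;

    if (cp != 0)
    {
        *codepage = cp;
        return PATH_DIRECT;
    }
    *codepage = CODEPAGE_UTF8;
    return PATH_VIA_UTF8;
}
```

Keeping the codepage next to the encoding id is the same idea as the later suggestion to store it as a field of pg_enc2name_tbl.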
Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center
Attachments:
eventlog-20090915.patch (application/octet-stream)
diff -cprN head/src/backend/utils/error/elog.c eventlog/src/backend/utils/error/elog.c
*** head/src/backend/utils/error/elog.c 2009-07-04 04:14:25.000000000 +0900
--- eventlog/src/backend/utils/error/elog.c 2009-09-15 12:31:24.555451172 +0900
*************** static int syslog_facility = LOG_LOCAL0;
*** 111,118 ****
static void write_syslog(int level, const char *line);
#endif
#ifdef WIN32
! static void write_eventlog(int level, const char *line);
#endif
/* We provide a small stack of ErrorData records for re-entrant cases */
--- 111,120 ----
static void write_syslog(int level, const char *line);
#endif
+ static void write_console(const char *line, int len);
+
#ifdef WIN32
! static void write_eventlog(int level, const char *line, int len);
#endif
/* We provide a small stack of ErrorData records for re-entrant cases */
*************** write_syslog(int level, const char *line
*** 1567,1573 ****
* Write a message line to the windows event log
*/
static void
! write_eventlog(int level, const char *line)
{
int eventlevel = EVENTLOG_ERROR_TYPE;
static HANDLE evtHandle = INVALID_HANDLE_VALUE;
--- 1569,1575 ----
* Write a message line to the windows event log
*/
static void
! write_eventlog(int level, const char *line, int len)
{
int eventlevel = EVENTLOG_ERROR_TYPE;
static HANDLE evtHandle = INVALID_HANDLE_VALUE;
*************** write_eventlog(int level, const char *li
*** 1606,1613 ****
break;
}
!
! ReportEvent(evtHandle,
eventlevel,
0,
0, /* All events are Id 0 */
--- 1608,1616 ----
break;
}
! if (GetDatabaseEncoding() == GetPlatformEncoding())
! {
! ReportEventA(evtHandle,
eventlevel,
0,
0, /* All events are Id 0 */
*************** write_eventlog(int level, const char *li
*** 1616,1624 ****
--- 1619,1685 ----
0,
&line,
NULL);
+ }
+ else
+ {
+ WCHAR *utf16;
+ int utf16len;
+
+ utf16 = pgwin32_toUTF16(line, len, &utf16len);
+ ReportEventW(evtHandle,
+ eventlevel,
+ 0,
+ 0, /* All events are Id 0 */
+ NULL,
+ 1,
+ 0,
+ (LPCWSTR *) &utf16,
+ NULL);
+
+ pfree(utf16);
+ }
}
+
#endif /* WIN32 */
+ static void
+ write_console(const char *line, int len)
+ {
+ if (GetDatabaseEncoding() != GetPlatformEncoding())
+ {
+ #ifdef WIN32
+ static bool redirected = false;
+
+ if (!redirected)
+ {
+ WCHAR *utf16;
+ int utf16len;
+ HANDLE stderrHandle;
+ DWORD written;
+
+ utf16 = pgwin32_toUTF16(line, len, &utf16len);
+ stderrHandle = GetStdHandle(STD_ERROR_HANDLE);
+ if (WriteConsoleW(stderrHandle, utf16, utf16len, &written, NULL))
+ {
+ pfree(utf16);
+ return;
+ }
+
+ pfree(utf16);
+ redirected = true; /* stderr might be redirected */
+ }
+ #else
+ /*
+ * Conversion on non-win32 platform is not implemented yet.
+ * It requires non-throw version of pg_do_encoding_conversion(),
+ * that converts unconvertable characters to '?' without errors.
+ */
+ #endif
+ }
+
+ write(fileno(stderr), line, len);
+ }
+
/*
* setup formatted_log_time, for consistent times between CSV and regular logs
*/
*************** send_message_to_server_log(ErrorData *ed
*** 2206,2212 ****
/* Write to eventlog, if enabled */
if (Log_destination & LOG_DESTINATION_EVENTLOG)
{
! write_eventlog(edata->elevel, buf.data);
}
#endif /* WIN32 */
--- 2267,2273 ----
/* Write to eventlog, if enabled */
if (Log_destination & LOG_DESTINATION_EVENTLOG)
{
! write_eventlog(edata->elevel, buf.data, buf.len);
}
#endif /* WIN32 */
*************** send_message_to_server_log(ErrorData *ed
*** 2230,2239 ****
* because that's really a pipe to the syslogger process.
*/
else if (pgwin32_is_service())
! write_eventlog(edata->elevel, buf.data);
#endif
else
! write(fileno(stderr), buf.data, buf.len);
}
/* If in the syslogger process, try to write messages direct to file */
--- 2291,2300 ----
* because that's really a pipe to the syslogger process.
*/
else if (pgwin32_is_service())
! write_eventlog(edata->elevel, buf.data, buf.len);
#endif
else
! write_console(buf.data, buf.len);
}
/* If in the syslogger process, try to write messages direct to file */
*************** send_message_to_server_log(ErrorData *ed
*** 2256,2267 ****
{
const char *msg = _("Not safe to send CSV data\n");
! write(fileno(stderr), msg, strlen(msg));
if (!(Log_destination & LOG_DESTINATION_STDERR) &&
whereToSendOutput != DestDebug)
{
/* write message to stderr unless we just sent it above */
! write(fileno(stderr), buf.data, buf.len);
}
pfree(buf.data);
}
--- 2317,2328 ----
{
const char *msg = _("Not safe to send CSV data\n");
! write_console(msg, strlen(msg));
if (!(Log_destination & LOG_DESTINATION_STDERR) &&
whereToSendOutput != DestDebug)
{
/* write message to stderr unless we just sent it above */
! write_console(buf.data, buf.len);
}
pfree(buf.data);
}
*************** void
*** 2642,2647 ****
--- 2703,2711 ----
write_stderr(const char *fmt,...)
{
va_list ap;
+ #ifdef WIN32
+ char errbuf[2048]; /* Arbitrary size? */
+ #endif
fmt = _(fmt);
*************** write_stderr(const char *fmt,...)
*** 2651,2656 ****
--- 2715,2721 ----
vfprintf(stderr, fmt, ap);
fflush(stderr);
#else
+ vsnprintf(errbuf, sizeof(errbuf), fmt, ap);
/*
* On Win32, we print to stderr if running on a console, or write to
*************** write_stderr(const char *fmt,...)
*** 2658,2673 ****
*/
if (pgwin32_is_service()) /* Running as a service */
{
! char errbuf[2048]; /* Arbitrary size? */
!
! vsnprintf(errbuf, sizeof(errbuf), fmt, ap);
!
! write_eventlog(ERROR, errbuf);
}
else
{
/* Not running as service, write to stderr */
! vfprintf(stderr, fmt, ap);
fflush(stderr);
}
#endif
--- 2723,2734 ----
*/
if (pgwin32_is_service()) /* Running as a service */
{
! write_eventlog(ERROR, errbuf, strlen(errbuf));
}
else
{
/* Not running as service, write to stderr */
! write_console(errbuf, strlen(errbuf));
fflush(stderr);
}
#endif
diff -cprN head/src/backend/utils/mb/mbutils.c eventlog/src/backend/utils/mb/mbutils.c
*** head/src/backend/utils/mb/mbutils.c 2009-07-08 04:28:56.000000000 +0900
--- eventlog/src/backend/utils/mb/mbutils.c 2009-09-15 12:31:24.556451161 +0900
*************** static FmgrInfo *ToClientConvProc = NULL
*** 58,63 ****
--- 58,64 ----
*/
static pg_enc2name *ClientEncoding = &pg_enc2name_tbl[PG_SQL_ASCII];
static pg_enc2name *DatabaseEncoding = &pg_enc2name_tbl[PG_SQL_ASCII];
+ static pg_enc2name *PlatformEncoding = NULL;
/*
* During backend startup we can't set client encoding because we (a)
*************** pg_client_encoding(PG_FUNCTION_ARGS)
*** 978,980 ****
--- 979,1083 ----
Assert(ClientEncoding);
return DirectFunctionCall1(namein, CStringGetDatum(ClientEncoding->name));
}
+
+ int
+ GetPlatformEncoding(void)
+ {
+ if (PlatformEncoding == NULL)
+ PlatformEncoding = &pg_enc2name_tbl[pg_get_encoding_from_locale("")];
+ return PlatformEncoding->encoding;
+ }
+
+ #ifdef WIN32
+
+ static const UINT encoding_to_codepage[] =
+ {
+ 0, /* PG_SQL_ASCII */
+ 20932, /* PG_EUC_JP */
+ 20936, /* PG_EUC_CN */
+ 51949, /* PG_EUC_KR */
+ 0, /* PG_EUC_TW */
+ 20932, /* PG_EUC_JIS_2004 */
+ CP_UTF8, /* PG_UTF8 */
+ 0, /* PG_MULE_INTERNAL */
+ 28591, /* PG_LATIN1 */
+ 28592, /* PG_LATIN2 */
+ 28593, /* PG_LATIN3 */
+ 28594, /* PG_LATIN4 */
+ 28599, /* PG_LATIN5 */
+ 0, /* PG_LATIN6 */
+ 0, /* PG_LATIN7 */
+ 0, /* PG_LATIN8 */
+ 28605, /* PG_LATIN9 */
+ 0, /* PG_LATIN10 */
+ 1256, /* PG_WIN1256 */
+ 1258, /* PG_WIN1258 */
+ 866, /* PG_WIN866 */
+ 874, /* PG_WIN874 */
+ 20866, /* PG_KOI8R */
+ 1251, /* PG_WIN1251 */
+ 1252, /* PG_WIN1252 */
+ 28595, /* PG_ISO_8859_5 */
+ 28596, /* PG_ISO_8859_6 */
+ 28597, /* PG_ISO_8859_7 */
+ 28598, /* PG_ISO_8859_8 */
+ 1250, /* PG_WIN1250 */
+ 1253, /* PG_WIN1253 */
+ 1254, /* PG_WIN1254 */
+ 1255, /* PG_WIN1255 */
+ 1257, /* PG_WIN1257 */
+ 21866, /* PG_KOI8U */
+ 932, /* PG_SJIS */
+ 950, /* PG_BIG5 */
+ 936, /* PG_GBK */
+ 0, /* PG_UHC */
+ 54936, /* PG_GB18030 */
+ 0, /* PG_JOHAB */
+ 932 /* PG_SHIFT_JIS_2004 */
+ };
+
+ /*
+ * Result is palloc'ed null-terminated utf16 string. The character length
+ * is also passed to utf16len if not null.
+ */
+ WCHAR *
+ pgwin32_toUTF16(const char *str, int len, int *utf16len)
+ {
+ WCHAR *utf16;
+ UINT codepage;
+
+ codepage = encoding_to_codepage[GetDatabaseEncoding()];
+
+ /*
+ * Use MultiByteToWideChar directly if there is a corresponding codepage,
+ * or double conversion through UTF8.
+ */
+ if (codepage != 0)
+ {
+ utf16 = (WCHAR *) palloc(sizeof(WCHAR) * (len + 1));
+ len = MultiByteToWideChar(codepage, 0, str, len, utf16, len);
+ utf16[len] = L'\0';
+ }
+ else
+ {
+ char *utf8;
+
+ utf8 = (char *) pg_do_encoding_conversion((unsigned char *) str,
+ len, GetDatabaseEncoding(), PG_UTF8);
+ if (utf8 != str)
+ len = strlen(utf8);
+
+ utf16 = (WCHAR *) palloc(sizeof(WCHAR) * (len + 1));
+ len = MultiByteToWideChar(CP_UTF8, 0, utf8, len, utf16, len);
+ utf16[len] = L'\0';
+
+ if (utf8 != str)
+ pfree(utf8);
+ }
+
+ if (utf16len)
+ *utf16len = len;
+ return utf16;
+ }
+
+ #endif
diff -cprN head/src/include/mb/pg_wchar.h eventlog/src/include/mb/pg_wchar.h
*** head/src/include/mb/pg_wchar.h 2009-06-11 23:49:11.000000000 +0900
--- eventlog/src/include/mb/pg_wchar.h 2009-09-15 12:31:24.556451161 +0900
*************** extern const char *pg_get_client_encodin
*** 402,407 ****
--- 402,408 ----
extern void SetDatabaseEncoding(int encoding);
extern int GetDatabaseEncoding(void);
extern const char *GetDatabaseEncodingName(void);
+ extern int GetPlatformEncoding(void);
extern void pg_bind_textdomain_codeset(const char *domainname);
extern int pg_valid_client_encoding(const char *name);
*************** extern void mic2latin_with_table(const u
*** 458,461 ****
--- 459,466 ----
extern bool pg_utf8_islegal(const unsigned char *source, int length);
+ #ifdef WIN32
+ extern WCHAR *pgwin32_toUTF16(const char *str, int len, int *utf16len);
+ #endif
+
#endif /* PG_WCHAR_H */
diff -cprN head/src/port/open.c eventlog/src/port/open.c
*** head/src/port/open.c 2009-06-11 23:49:15.000000000 +0900
--- eventlog/src/port/open.c 2009-09-15 12:31:24.556451161 +0900
***************
*** 23,28 ****
--- 23,31 ----
#include <fcntl.h>
#include <assert.h>
+ #ifndef FRONTEND
+ #include "mb/pg_wchar.h"
+ #endif
static int
openFlagsToCreateFileFlags(int openFlags)
*************** pgwin32_open(const char *fileName, int f
*** 65,70 ****
--- 68,78 ----
HANDLE h = INVALID_HANDLE_VALUE;
SECURITY_ATTRIBUTES sa;
int loops = 0;
+ DWORD dwDesiredAccess;
+ DWORD dwShareMode;
+ DWORD dwCreationDisposition;
+ DWORD dwFlagsAndAttributes;
+ WCHAR *wFileName = NULL;
/* Check that we can handle the request */
assert((fileFlags & ((O_RDONLY | O_WRONLY | O_RDWR) | O_APPEND |
*************** pgwin32_open(const char *fileName, int f
*** 72,97 ****
_O_SHORT_LIVED | O_DSYNC | O_DIRECT |
(O_CREAT | O_TRUNC | O_EXCL) | (O_TEXT | O_BINARY))) == fileFlags);
- sa.nLength = sizeof(sa);
- sa.bInheritHandle = TRUE;
- sa.lpSecurityDescriptor = NULL;
-
- while ((h = CreateFile(fileName,
/* cannot use O_RDONLY, as it == 0 */
! (fileFlags & O_RDWR) ? (GENERIC_WRITE | GENERIC_READ) :
! ((fileFlags & O_WRONLY) ? GENERIC_WRITE : GENERIC_READ),
/* These flags allow concurrent rename/unlink */
! (FILE_SHARE_READ | FILE_SHARE_WRITE | FILE_SHARE_DELETE),
! &sa,
! openFlagsToCreateFileFlags(fileFlags),
! FILE_ATTRIBUTE_NORMAL |
((fileFlags & O_RANDOM) ? FILE_FLAG_RANDOM_ACCESS : 0) |
((fileFlags & O_SEQUENTIAL) ? FILE_FLAG_SEQUENTIAL_SCAN : 0) |
((fileFlags & _O_SHORT_LIVED) ? FILE_ATTRIBUTE_TEMPORARY : 0) |
((fileFlags & O_TEMPORARY) ? FILE_FLAG_DELETE_ON_CLOSE : 0) |
((fileFlags & O_DIRECT) ? FILE_FLAG_NO_BUFFERING : 0) |
! ((fileFlags & O_DSYNC) ? FILE_FLAG_WRITE_THROUGH : 0),
! NULL)) == INVALID_HANDLE_VALUE)
{
/*
* Sharing violation or locking error can indicate antivirus, backup
--- 80,126 ----
_O_SHORT_LIVED | O_DSYNC | O_DIRECT |
(O_CREAT | O_TRUNC | O_EXCL) | (O_TEXT | O_BINARY))) == fileFlags);
/* cannot use O_RDONLY, as it == 0 */
! dwDesiredAccess = (fileFlags & O_RDWR) ? (GENERIC_WRITE | GENERIC_READ) :
! ((fileFlags & O_WRONLY) ? GENERIC_WRITE : GENERIC_READ);
/* These flags allow concurrent rename/unlink */
! dwShareMode = (FILE_SHARE_READ | FILE_SHARE_WRITE | FILE_SHARE_DELETE);
! dwCreationDisposition = openFlagsToCreateFileFlags(fileFlags);
! dwFlagsAndAttributes = FILE_ATTRIBUTE_NORMAL |
((fileFlags & O_RANDOM) ? FILE_FLAG_RANDOM_ACCESS : 0) |
((fileFlags & O_SEQUENTIAL) ? FILE_FLAG_SEQUENTIAL_SCAN : 0) |
((fileFlags & _O_SHORT_LIVED) ? FILE_ATTRIBUTE_TEMPORARY : 0) |
((fileFlags & O_TEMPORARY) ? FILE_FLAG_DELETE_ON_CLOSE : 0) |
((fileFlags & O_DIRECT) ? FILE_FLAG_NO_BUFFERING : 0) |
! ((fileFlags & O_DSYNC) ? FILE_FLAG_WRITE_THROUGH : 0);
!
! sa.nLength = sizeof(sa);
! sa.bInheritHandle = TRUE;
! sa.lpSecurityDescriptor = NULL;
!
! #ifndef FRONTEND
! /*
! * Use wide-character file name only if the database encoding doesn't match
! * to the platform encoding and the path contains any multi-byte characters.
! */
! if (GetDatabaseEncoding() != GetPlatformEncoding())
! {
! int len;
! bool hasMBChar = false;
!
! for (len = 0; fileName[len]; len++)
! hasMBChar |= IS_HIGHBIT_SET(fileName[len]);
! if (hasMBChar)
! wFileName = pgwin32_toUTF16(fileName, len, NULL);
! }
! #endif
!
! while ((h = (wFileName != NULL
! ? CreateFileW(wFileName, dwDesiredAccess, dwShareMode, &sa,
! dwCreationDisposition, dwFlagsAndAttributes, NULL)
! : CreateFileA(fileName, dwDesiredAccess, dwShareMode, &sa,
! dwCreationDisposition, dwFlagsAndAttributes, NULL))
! ) == INVALID_HANDLE_VALUE)
{
/*
* Sharing violation or locking error can indicate antivirus, backup
*************** pgwin32_open(const char *fileName, int f
*** 119,128 ****
--- 148,166 ----
continue;
}
+ #ifndef FRONTEND
+ if (wFileName)
+ pfree(wFileName);
+ #endif
_dosmaperr(err);
return -1;
}
+ #ifndef FRONTEND
+ if (wFileName)
+ pfree(wFileName);
+ #endif
+
/* _open_osfhandle will, on error, set errno accordingly */
if ((fd = _open_osfhandle((long) h, fileFlags & O_APPEND)) < 0)
CloseHandle(h); /* will not affect errno */
On Tue, 2009-09-15 at 12:49 +0900, Itagaki Takahiro wrote:
Here is an updated version of the patch.
This is a review of the Eventlog encoding on Windows patch:
http://archives.postgresql.org/message-id/20090915123243.9C59.52131E4D@oss.ntt.co.jp
Purpose & Format
================
This patch is designed to coerce log messages to a specific encoding.
It's currently only targeted at the win32 port, where the logs are
written in UTF-16.
The patch applies cleanly. It doesn't include any documentation updates
or additional regression tests. A note in the documentation saying that
log messages on Windows go through an encoding conversion where
appropriate might be nice, though.
Initial Run
===========
To (hopefully) test this properly, I initdb'd a couple of directories under
different locales. I then ran a few statements designed to generate
event log messages showing characters in a different encoding:
SELECT E'\xF0'::int;
The unpatched backend generated event log messages showing only the raw
byte value, interpreted as the same character each time in the system
default encoding.
With the patch in place, the event log messages showed the character
correctly for each of the different encodings.
I haven't tried any performance testing against it.
Concurrent Development Issues
=============================
On a hunch, I tried applying the "syslogger infrastructure changes" at the
same time. They conflict on elog.c. I'm not sure if we're supposed to
check for that, but thought I'd point it out. :)
Editorial
=========
The problem seems to stem from PG and Windows each having a few
encodings the other won't understand, or at least doesn't immediately
support. So, from the system's perspective, the log messages it receives
contain incorrect or broken characters. I'm not sure this is as much of
a problem on other platforms, though, where the database encoding
typically has no trouble matching the system's; would it be
worth pursuing beyond the win32 port?
I'm not too familiar with alternate character sets... I would assume if
there's a code page supported on win32 it'll naturally support
conversion to UTF-16 on the platform, but is there any time this could
fail? What about the few encodings that it doesn't directly support,
which need a conversion to UTF-8 first?
Maybe someone with more familiarity with encoding conversion issues
could comment on that? Otherwise I think this is ready to be bumped up
for committer review.
- Josh Williams
2009/9/15 Itagaki Takahiro <itagaki.takahiro@oss.ntt.co.jp>:
Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote:
Can't we use MultiByteToWideChar() to convert directly to the required
encoding, avoiding the double conversion?
Here is an updated version of the patch.
I use direct conversion in pgwin32_toUTF16() if a corresponding codepage
is available. If not available, I still use double conversion.
Now pgwin32_toUTF16() is exported from mbutil.c. I used the function
in following parts, although the main target of the patch is eventlog.
* WriteConsoleW() - write unredirected stderr log.
* ReportEventW() - write evenlog.
* CreateFileW() - open non-ascii filename (ex. COPY TO/FROM 'mb-path').
This approach is only available for Windows because any other platform
don't support locale-independent and wide-character-based system calls.
Other platforms require a different approach, but even then we'd still
better have win32-specific routines because UTF16 is the native encoding
in Windows.
I did a quick check of this, and here are the things I would like to
have changed:
First of all, the change to port/open.c seems to be unrelated to the
rest, and should be a separate patch, correct? I'm sure there's a
use case for it, but it's not actually included in the patch's
description, so I assume this was a mistake?
Per your own comments earlier, and in the code, what will happen if
pg_do_encoding_conversion() calls ereport()? Didn't you say we need a
non-throwing version of it?
pgwin32_toUTF16() needs error checking on the API calls, and needs to
do something reasonable if it fails. For example, it can fail because
of out of memory error. I suggest just returning the error code in
some way in that case, and have the callers fall back to logging in
the incorrect encoding - in a lot of cases that will produce an at
least partially readable message. A second message should also be
logged saying that the conversion failed - this needs to be done
directly with the eventlog API functions and not ereport, so we don't
end up in infinite recursion.
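A minimal sketch of the suggested control flow, with hypothetical stand-ins for the event log API (a counting "sink") and for a converter that reports failure by returning NULL. The notice text and all function names are illustrative assumptions, not PostgreSQL code.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the event log: counts calls made through
   the wide (ReportEventW-like) and narrow (ReportEventA-like) paths. */
typedef struct
{
    int         wide_calls;
    int         narrow_calls;
    char        last_narrow[128];
} log_sink;

static void
sink_wide(log_sink *sink, const unsigned short *msg)
{
    (void) msg;
    sink->wide_calls++;
}

static void
sink_narrow(log_sink *sink, const char *msg)
{
    sink->narrow_calls++;
    strncpy(sink->last_narrow, msg, sizeof(sink->last_narrow) - 1);
    sink->last_narrow[sizeof(sink->last_narrow) - 1] = '\0';
}

/* Hypothetical converter that fails (returns NULL) on any non-ASCII
   byte or on allocation failure, instead of throwing an error. */
static unsigned short *
to_utf16_ascii_only(const char *s, size_t len)
{
    unsigned short *out;
    size_t      i;

    for (i = 0; i < len; i++)
        if ((unsigned char) s[i] >= 0x80)
            return NULL;
    out = malloc((len + 1) * sizeof(*out));
    if (out == NULL)
        return NULL;            /* out of memory is also a failure */
    for (i = 0; i < len; i++)
        out[i] = (unsigned char) s[i];
    out[len] = 0;
    return out;
}

/*
 * Prefer the UTF-16 path; if conversion fails, write the original bytes
 * (often still partly readable) and then a fixed notice.  The notice goes
 * straight to the sink, not through the error-reporting machinery, so a
 * conversion failure cannot recurse.  'line' must be NUL-terminated.
 */
static void
log_line(log_sink *sink, const char *line, size_t len)
{
    unsigned short *wide = to_utf16_ascii_only(line, len);

    if (wide != NULL)
    {
        sink_wide(sink, wide);
        free(wide);
        return;
    }
    sink_narrow(sink, line);
    sink_narrow(sink, "could not convert log message to UTF-16");
}
```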
The encoding_to_codepage array needs to go in encnames.c, where other
such tables are. Perhaps it can even be integrated in pg_enc2name_tbl
as a separate field?
I don't have the time to clean this up right now, so if you have,
please do so and resubmit. If not, I can clean it up later and apply.
--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/
Magnus Hagander <magnus@hagander.net> wrote:
First of all, the change to port/open.c seems to be unrelated to the
rest, and should be a separate patch, correct? I'm sure there's a
usecase for it, but it's not actually included in the patches
description, so I assume this was a mistake?
It was just a demo for pgwin32_toUTF16(). I'll remove this part from
the patch, but I think we also need to fix the encoding mismatch issue
in path strings. I'll re-submit for the next commitfest.
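For path strings, the open.c part of the previous patch only switched to the wide-character API when the encodings differ and the path contains bytes with the high bit set. A standalone sketch of that test follows; path_needs_wide_api() is a hypothetical name, the encodings are plain ints for illustration, and IS_HIGHBIT_SET is redefined here the way PostgreSQL's c.h defines it.

```c
#include <stdbool.h>

/* Same definitions as PostgreSQL's c.h. */
#define HIGHBIT (0x80)
#define IS_HIGHBIT_SET(ch) ((unsigned char)(ch) & HIGHBIT)

/*
 * Decide whether a path must be converted to UTF-16 and opened with the
 * wide-character API (CreateFileW in the patch).  Server encodings are
 * ASCII supersets, so a pure-ASCII path means the same thing in both
 * encodings and the narrow API is safe even when they differ.
 */
static bool
path_needs_wide_api(const char *path, int db_encoding, int platform_encoding)
{
    const char *p;

    if (db_encoding == platform_encoding)
        return false;
    for (p = path; *p; p++)
        if (IS_HIGHBIT_SET(*p))
            return true;
    return false;
}
```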
Per your own comments earlier, and in the code, what will happen if
pg_do_encoding_conversion() calls ereport()? Didn't you say we need a
non-throwing version of it?
It is hard to use the encoding conversion functions in logging routines
because they can throw errors if there are unconvertible characters.
A non-throwing version would convert such characters into '?' or an escaped form
(something like \888 or \xFF). If there were such infrastructure, we could
support a "log_encoding" setting and convert messages into the platform-dependent
encoding before writing them to syslog or the console.
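The escaped-form fallback could look like the sketch below, which assumes the simplest policy: every byte with the high bit set is written as \xNN, so the result is pure ASCII and therefore valid in any ASCII-compatible encoding. escape_nonascii() is a hypothetical helper, not an existing PostgreSQL function.

```c
#include <stddef.h>

/*
 * Hypothetical never-failing fallback: copy 7-bit ASCII unchanged and
 * write every other byte as the four characters \xNN.  In the worst
 * case dst needs 4 * len + 1 bytes.  Returns the output length.
 */
static size_t
escape_nonascii(const char *src, size_t len, char *dst)
{
    static const char hex[] = "0123456789ABCDEF";
    size_t      n = 0;
    size_t      i;

    for (i = 0; i < len; i++)
    {
        unsigned char c = (unsigned char) src[i];

        if (c < 0x80)
            dst[n++] = (char) c;
        else
        {
            dst[n++] = '\\';
            dst[n++] = 'x';
            dst[n++] = hex[c >> 4];
            dst[n++] = hex[c & 0x0F];
        }
    }
    dst[n] = '\0';
    return n;
}
```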
pgwin32_toUTF16() needs error checking on the API calls, and needs to
do something reasonable if it fails.
Now it returns NULL on failure, and the caller writes the message in the original encoding.
I also added the following check before calling pgwin32_toUTF16()
(errordata_stack_depth < ERRORDATA_STACK_SIZE - 1)
to avoid recursive errors, but I'm not sure it is really meaningful.
Please remove or rewrite this part if it is not the right approach.
The encoding_to_codepage array needs to go in encnames.c, where other
such tables are. Perhaps it can even be integrated in pg_enc2name_tbl
as a separate field?
I added pg_enc2name.codepage. Note that this field is needed only
on Windows, but it is now exported for all platforms. If you don't like
the unused field, the following macro could help:
#ifdef WIN32
#define def_enc2name(name, codepage) { #name, PG_##name, codepage }
#else
#define def_enc2name(name, codepage) { #name, PG_##name }	/* codepage ignored */
#endif
pg_enc2name pg_enc2name_tbl[] =
{
def_enc2name(SQL_ASCII, 0),
def_enc2name(EUC_JP, 20932),
...
I don't have the time to clean this up right now, so if you have,
please do so and resubmit. If not, I can clean it up later and apply.
Patch attached.
Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center
Attachments:
eventlog_20091007.patch (application/octet-stream)
diff -cprN head/src/backend/utils/error/elog.c work/src/backend/utils/error/elog.c
*** head/src/backend/utils/error/elog.c 2009-07-04 04:14:25.000000000 +0900
--- work/src/backend/utils/error/elog.c 2009-10-07 11:15:16.251326894 +0900
*************** static int syslog_facility = LOG_LOCAL0;
*** 111,118 ****
static void write_syslog(int level, const char *line);
#endif
#ifdef WIN32
! static void write_eventlog(int level, const char *line);
#endif
/* We provide a small stack of ErrorData records for re-entrant cases */
--- 111,120 ----
static void write_syslog(int level, const char *line);
#endif
+ static void write_console(const char *line, int len);
+
#ifdef WIN32
! static void write_eventlog(int level, const char *line, int len);
#endif
/* We provide a small stack of ErrorData records for re-entrant cases */
*************** write_syslog(int level, const char *line
*** 1567,1576 ****
* Write a message line to the windows event log
*/
static void
! write_eventlog(int level, const char *line)
{
! int eventlevel = EVENTLOG_ERROR_TYPE;
! static HANDLE evtHandle = INVALID_HANDLE_VALUE;
if (evtHandle == INVALID_HANDLE_VALUE)
{
--- 1569,1579 ----
* Write a message line to the windows event log
*/
static void
! write_eventlog(int level, const char *line, int len)
{
! WCHAR *utf16;
! int eventlevel = EVENTLOG_ERROR_TYPE;
! static HANDLE evtHandle = INVALID_HANDLE_VALUE;
if (evtHandle == INVALID_HANDLE_VALUE)
{
*************** write_eventlog(int level, const char *li
*** 1606,1613 ****
break;
}
! ReportEvent(evtHandle,
eventlevel,
0,
0, /* All events are Id 0 */
--- 1609,1637 ----
break;
}
+ /*
+ * Convert message to UTF16 text and write it with ReportEventW,
+ * but fall-back into ReportEventA if conversion failed.
+ */
+ if (errordata_stack_depth < ERRORDATA_STACK_SIZE - 1 &&
+ GetDatabaseEncoding() != GetPlatformEncoding() &&
+ (utf16 = pgwin32_toUTF16(line, len, NULL)) != NULL)
+ {
+ ReportEventW(evtHandle,
+ eventlevel,
+ 0,
+ 0, /* All events are Id 0 */
+ NULL,
+ 1,
+ 0,
+ (LPCWSTR *) &utf16,
+ NULL);
! pfree(utf16);
! }
! else
! {
! ReportEventA(evtHandle,
eventlevel,
0,
0, /* All events are Id 0 */
*************** write_eventlog(int level, const char *li
*** 1616,1624 ****
--- 1640,1690 ----
0,
&line,
NULL);
+ }
}
+
#endif /* WIN32 */
+ static void
+ write_console(const char *line, int len)
+ {
+ #ifdef WIN32
+ if (errordata_stack_depth < ERRORDATA_STACK_SIZE - 1 &&
+ GetDatabaseEncoding() != GetPlatformEncoding())
+ {
+ static bool redirected = false;
+ WCHAR *utf16;
+ int utf16len;
+
+ if (!redirected &&
+ (utf16 = pgwin32_toUTF16(line, len, &utf16len)) != NULL)
+ {
+ HANDLE stdHandle;
+ DWORD written;
+
+ stdHandle = GetStdHandle(STD_ERROR_HANDLE);
+ if (WriteConsoleW(stdHandle, utf16, utf16len, &written, NULL))
+ {
+ pfree(utf16);
+ return;
+ }
+
+ /* WriteConsoleW could fail if stderr is redirected. */
+ pfree(utf16);
+ redirected = true;
+ }
+ }
+ #else
+ /*
+ * Conversion on non-win32 platform is not implemented yet.
+ * It requires non-throw version of pg_do_encoding_conversion(),
+ * that converts unconvertable characters to '?' without errors.
+ */
+ #endif
+
+ write(fileno(stderr), line, len);
+ }
+
/*
* setup formatted_log_time, for consistent times between CSV and regular logs
*/
*************** send_message_to_server_log(ErrorData *ed
*** 2206,2212 ****
/* Write to eventlog, if enabled */
if (Log_destination & LOG_DESTINATION_EVENTLOG)
{
! write_eventlog(edata->elevel, buf.data);
}
#endif /* WIN32 */
--- 2272,2278 ----
/* Write to eventlog, if enabled */
if (Log_destination & LOG_DESTINATION_EVENTLOG)
{
! write_eventlog(edata->elevel, buf.data, buf.len);
}
#endif /* WIN32 */
*************** send_message_to_server_log(ErrorData *ed
*** 2230,2239 ****
* because that's really a pipe to the syslogger process.
*/
else if (pgwin32_is_service())
! write_eventlog(edata->elevel, buf.data);
#endif
else
! write(fileno(stderr), buf.data, buf.len);
}
/* If in the syslogger process, try to write messages direct to file */
--- 2296,2305 ----
* because that's really a pipe to the syslogger process.
*/
else if (pgwin32_is_service())
! write_eventlog(edata->elevel, buf.data, buf.len);
#endif
else
! write_console(buf.data, buf.len);
}
/* If in the syslogger process, try to write messages direct to file */
*************** send_message_to_server_log(ErrorData *ed
*** 2256,2267 ****
{
const char *msg = _("Not safe to send CSV data\n");
! write(fileno(stderr), msg, strlen(msg));
if (!(Log_destination & LOG_DESTINATION_STDERR) &&
whereToSendOutput != DestDebug)
{
/* write message to stderr unless we just sent it above */
! write(fileno(stderr), buf.data, buf.len);
}
pfree(buf.data);
}
--- 2322,2333 ----
{
const char *msg = _("Not safe to send CSV data\n");
! write_console(msg, strlen(msg));
if (!(Log_destination & LOG_DESTINATION_STDERR) &&
whereToSendOutput != DestDebug)
{
/* write message to stderr unless we just sent it above */
! write_console(buf.data, buf.len);
}
pfree(buf.data);
}
*************** void
*** 2642,2647 ****
--- 2708,2716 ----
write_stderr(const char *fmt,...)
{
va_list ap;
+ #ifdef WIN32
+ char errbuf[2048]; /* Arbitrary size? */
+ #endif
fmt = _(fmt);
*************** write_stderr(const char *fmt,...)
*** 2651,2656 ****
--- 2720,2726 ----
vfprintf(stderr, fmt, ap);
fflush(stderr);
#else
+ vsnprintf(errbuf, sizeof(errbuf), fmt, ap);
/*
* On Win32, we print to stderr if running on a console, or write to
*************** write_stderr(const char *fmt,...)
*** 2658,2673 ****
*/
if (pgwin32_is_service()) /* Running as a service */
{
! char errbuf[2048]; /* Arbitrary size? */
!
! vsnprintf(errbuf, sizeof(errbuf), fmt, ap);
!
! write_eventlog(ERROR, errbuf);
}
else
{
/* Not running as service, write to stderr */
! vfprintf(stderr, fmt, ap);
fflush(stderr);
}
#endif
--- 2728,2739 ----
*/
if (pgwin32_is_service()) /* Running as a service */
{
! write_eventlog(ERROR, errbuf, strlen(errbuf));
}
else
{
/* Not running as service, write to stderr */
! write_console(errbuf, strlen(errbuf));
fflush(stderr);
}
#endif
diff -cprN head/src/backend/utils/mb/encnames.c work/src/backend/utils/mb/encnames.c
*** head/src/backend/utils/mb/encnames.c 2009-04-24 17:43:50.000000000 +0900
--- work/src/backend/utils/mb/encnames.c 2009-10-07 11:04:14.224280217 +0900
*************** sizeof(pg_encname_tbl) / sizeof(pg_encna
*** 303,432 ****
pg_enc2name pg_enc2name_tbl[] =
{
{
! "SQL_ASCII", PG_SQL_ASCII
},
{
! "EUC_JP", PG_EUC_JP
},
{
! "EUC_CN", PG_EUC_CN
},
{
! "EUC_KR", PG_EUC_KR
},
{
! "EUC_TW", PG_EUC_TW
},
{
! "EUC_JIS_2004", PG_EUC_JIS_2004
},
{
! "UTF8", PG_UTF8
},
{
! "MULE_INTERNAL", PG_MULE_INTERNAL
},
{
! "LATIN1", PG_LATIN1
},
{
! "LATIN2", PG_LATIN2
},
{
! "LATIN3", PG_LATIN3
},
{
! "LATIN4", PG_LATIN4
},
{
! "LATIN5", PG_LATIN5
},
{
! "LATIN6", PG_LATIN6
},
{
! "LATIN7", PG_LATIN7
},
{
! "LATIN8", PG_LATIN8
},
{
! "LATIN9", PG_LATIN9
},
{
! "LATIN10", PG_LATIN10
},
{
! "WIN1256", PG_WIN1256
},
{
! "WIN1258", PG_WIN1258
},
{
! "WIN866", PG_WIN866
},
{
! "WIN874", PG_WIN874
},
{
! "KOI8R", PG_KOI8R
},
{
! "WIN1251", PG_WIN1251
},
{
! "WIN1252", PG_WIN1252
},
{
! "ISO_8859_5", PG_ISO_8859_5
},
{
! "ISO_8859_6", PG_ISO_8859_6
},
{
! "ISO_8859_7", PG_ISO_8859_7
},
{
! "ISO_8859_8", PG_ISO_8859_8
},
{
! "WIN1250", PG_WIN1250
},
{
! "WIN1253", PG_WIN1253
},
{
! "WIN1254", PG_WIN1254
},
{
! "WIN1255", PG_WIN1255
},
{
! "WIN1257", PG_WIN1257
},
{
! "KOI8U", PG_KOI8U
},
{
! "SJIS", PG_SJIS
},
{
! "BIG5", PG_BIG5
},
{
! "GBK", PG_GBK
},
{
! "UHC", PG_UHC
},
{
! "GB18030", PG_GB18030
},
{
! "JOHAB", PG_JOHAB
},
{
! "SHIFT_JIS_2004", PG_SHIFT_JIS_2004
}
};
--- 303,432 ----
pg_enc2name pg_enc2name_tbl[] =
{
{
! "SQL_ASCII", PG_SQL_ASCII, 0
},
{
! "EUC_JP", PG_EUC_JP, 20932
},
{
! "EUC_CN", PG_EUC_CN, 20936
},
{
! "EUC_KR", PG_EUC_KR, 51949
},
{
! "EUC_TW", PG_EUC_TW, 0
},
{
! "EUC_JIS_2004", PG_EUC_JIS_2004, 20932
},
{
! "UTF8", PG_UTF8, 65001
},
{
! "MULE_INTERNAL", PG_MULE_INTERNAL, 0
},
{
! "LATIN1", PG_LATIN1, 28591
},
{
! "LATIN2", PG_LATIN2, 28592
},
{
! "LATIN3", PG_LATIN3, 28593
},
{
! "LATIN4", PG_LATIN4, 28594
},
{
! "LATIN5", PG_LATIN5, 28599
},
{
! "LATIN6", PG_LATIN6, 0
},
{
! "LATIN7", PG_LATIN7, 0
},
{
! "LATIN8", PG_LATIN8, 0
},
{
! "LATIN9", PG_LATIN9, 28605
},
{
! "LATIN10", PG_LATIN10, 0
},
{
! "WIN1256", PG_WIN1256, 1256
},
{
! "WIN1258", PG_WIN1258, 1258
},
{
! "WIN866", PG_WIN866, 866
},
{
! "WIN874", PG_WIN874, 874
},
{
! "KOI8R", PG_KOI8R, 20866
},
{
! "WIN1251", PG_WIN1251, 1251
},
{
! "WIN1252", PG_WIN1252, 1252
},
{
! "ISO_8859_5", PG_ISO_8859_5, 28595
},
{
! "ISO_8859_6", PG_ISO_8859_6, 28596
},
{
! "ISO_8859_7", PG_ISO_8859_7, 28597
},
{
! "ISO_8859_8", PG_ISO_8859_8, 28598
},
{
! "WIN1250", PG_WIN1250, 1250
},
{
! "WIN1253", PG_WIN1253, 1253
},
{
! "WIN1254", PG_WIN1254, 1254
},
{
! "WIN1255", PG_WIN1255, 1255
},
{
! "WIN1257", PG_WIN1257, 1257
},
{
! "KOI8U", PG_KOI8U, 21866
},
{
! "SJIS", PG_SJIS, 932
},
{
! "BIG5", PG_BIG5, 950
},
{
! "GBK", PG_GBK, 936
},
{
! "UHC", PG_UHC, 0
},
{
! "GB18030", PG_GB18030, 54936
},
{
! "JOHAB", PG_JOHAB, 0
},
{
! "SHIFT_JIS_2004", PG_SHIFT_JIS_2004, 932
}
};
diff -cprN head/src/backend/utils/mb/mbutils.c work/src/backend/utils/mb/mbutils.c
*** head/src/backend/utils/mb/mbutils.c 2009-07-08 04:28:56.000000000 +0900
--- work/src/backend/utils/mb/mbutils.c 2009-10-07 11:04:14.224280217 +0900
*************** static FmgrInfo *ToClientConvProc = NULL
*** 58,63 ****
--- 58,64 ----
*/
static pg_enc2name *ClientEncoding = &pg_enc2name_tbl[PG_SQL_ASCII];
static pg_enc2name *DatabaseEncoding = &pg_enc2name_tbl[PG_SQL_ASCII];
+ static pg_enc2name *PlatformEncoding = NULL;
/*
* During backend startup we can't set client encoding because we (a)
*************** pg_client_encoding(PG_FUNCTION_ARGS)
*** 978,980 ****
--- 979,1041 ----
Assert(ClientEncoding);
return DirectFunctionCall1(namein, CStringGetDatum(ClientEncoding->name));
}
+
+ int
+ GetPlatformEncoding(void)
+ {
+ if (PlatformEncoding == NULL)
+ PlatformEncoding = &pg_enc2name_tbl[pg_get_encoding_from_locale("")];
+ return PlatformEncoding->encoding;
+ }
+
+ #ifdef WIN32
+
+ /*
+ * Result is palloc'ed null-terminated utf16 string. The character length
+ * is also passed to utf16len if not null. Returns NULL iff failed.
+ */
+ WCHAR *
+ pgwin32_toUTF16(const char *str, int len, int *utf16len)
+ {
+ WCHAR *utf16;
+ int dstlen;
+ UINT codepage;
+
+ codepage = pg_enc2name_tbl[GetDatabaseEncoding()].codepage;
+
+ /*
+ * Use MultiByteToWideChar directly if there is a corresponding codepage,
+ * or double conversion through UTF8.
+ */
+ if (codepage != 0)
+ {
+ utf16 = (WCHAR *) palloc(sizeof(WCHAR) * (len + 1));
+ dstlen = MultiByteToWideChar(codepage, 0, str, len, utf16, len);
+ utf16[dstlen] = L'\0';
+ }
+ else
+ {
+ char *utf8;
+
+ utf8 = (char *) pg_do_encoding_conversion((unsigned char *) str,
+ len, GetDatabaseEncoding(), PG_UTF8);
+ if (utf8 != str)
+ len = strlen(utf8);
+
+ utf16 = (WCHAR *) palloc(sizeof(WCHAR) * (len + 1));
+ dstlen = MultiByteToWideChar(CP_UTF8, 0, utf8, len, utf16, len);
+ utf16[dstlen] = L'\0';
+
+ if (utf8 != str)
+ pfree(utf8);
+ }
+
+ if (dstlen == 0 && len > 0)
+ return NULL; /* error */
+
+ if (utf16len)
+ *utf16len = len;
+ return utf16;
+ }
+
+ #endif
diff -cprN head/src/include/mb/pg_wchar.h work/src/include/mb/pg_wchar.h
*** head/src/include/mb/pg_wchar.h 2009-06-11 23:49:11.000000000 +0900
--- work/src/include/mb/pg_wchar.h 2009-10-07 11:04:14.225327260 +0900
*************** typedef struct pg_enc2name
*** 257,262 ****
--- 257,263 ----
{
char *name;
pg_enc encoding;
+ unsigned codepage; /* codepage for WIN32 */
} pg_enc2name;
extern pg_enc2name pg_enc2name_tbl[];
*************** extern const char *pg_get_client_encodin
*** 402,407 ****
--- 403,409 ----
extern void SetDatabaseEncoding(int encoding);
extern int GetDatabaseEncoding(void);
extern const char *GetDatabaseEncodingName(void);
+ extern int GetPlatformEncoding(void);
extern void pg_bind_textdomain_codeset(const char *domainname);
extern int pg_valid_client_encoding(const char *name);
*************** extern void mic2latin_with_table(const u
*** 458,461 ****
--- 460,467 ----
extern bool pg_utf8_islegal(const unsigned char *source, int length);
+ #ifdef WIN32
+ extern WCHAR *pgwin32_toUTF16(const char *str, int len, int *utf16len);
+ #endif
+
#endif /* PG_WCHAR_H */
2009/10/7 Itagaki Takahiro <itagaki.takahiro@oss.ntt.co.jp>:
Magnus Hagander <magnus@hagander.net> wrote:
Per your own comments earlier, and in the code, what will happen if
pg_do_encoding_conversion() calls ereport()? Didn't you say we need a
non-throwing version of it?
It is hard to use the encoding conversion functions in logging routines
because they could throw errors if there are unconvertible characters.
A non-throwing version would convert such characters into '?' or an escaped
form (something like \888 or \xFF). If there were such infrastructure, we
could support a "log_encoding" setting and convert messages to the
platform-dependent encoding before writing them to syslog or the console.
Right, which we don't have at this point. That would be very useful on
Unix, I believe.
pgwin32_toUTF16() needs error checking on the API calls, and needs to
do something reasonable if it fails.
Now it returns NULL, and the caller writes the message in the original encoding.
Seems reasonable. If encoding fails, I think that's the best we can do.
Also, I added the following check before calling pgwin32_toUTF16()
(errordata_stack_depth < ERRORDATA_STACK_SIZE - 1)
to avoid recursive errors, but I'm not sure it is really meaningful.
Please remove or rewrite this part if it is not the right approach.
I'm not entirely sure either, but it looks like it could protect us
from getting into a tight loop on an error here. Tom (or someone else
who knows that for sure :P), comments?
The encoding_to_codepage array needs to go in encnames.c, where other
such tables are. Perhaps it can even be integrated in pg_enc2name_tbl
as a separate field?
I added pg_enc2name.codepage. Note that this field is needed only
on Windows, but it is now exported on all platforms. If you don't like
the unused field, the following macro could help.
#ifdef WIN32
#define def_enc2name(name, codepage) { #name, PG_##name, codepage }
#else
#define def_enc2name(name, codepage) { #name, PG_##name }
#endif
pg_enc2name pg_enc2name_tbl[] =
{
def_enc2name(SQL_ASCII, 0),
def_enc2name(EUC_JP, 20932),
...
Yeah, I think that makes sense. It's not much data, but it's
completely unnecessary :-) I can make that change at commit.
One other question - you note that WriteConsoleW() "could fail if
stderr is redirected". Are you saying that it will always fail when
stderr is redirected, or only sometimes? If only sometimes, do you know
under which conditions it happens?
If it's always, I assume this just means that the logfile will be in
the database encoding and not in UTF16? Is this what we want, or would
we like the logfile to also be in UTF16? If we can convert it to
UTF16, that would fix the case when you have different databases in
different encodings, wouldn't it? (Even if your editor, unlike the
console subsystem, can view the individual encoding you need, I bet it
can't deal with multiple encodings in the same file)
--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/
Magnus Hagander <magnus@hagander.net> writes:
2009/10/7 Itagaki Takahiro <itagaki.takahiro@oss.ntt.co.jp>:
Also, I added the following check before calling pgwin32_toUTF16()
(errordata_stack_depth < ERRORDATA_STACK_SIZE - 1)
to avoid recursive errors, but I'm not sure it is really meaningful.
Please remove or rewrite this part if it is not the right approach.
I'm not entirely sure either, but it looks like it could protect us
from getting into a tight loop on an error here. Tom (or someone else
who knows that for sure :P), comments?
I haven't read the patch, but I'd suggest making any behavior changes
dependent on in_error_recursion_trouble(), rather than getting in bed
with internal implementation variables.
regards, tom lane
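The shape of Tom's suggestion can be sketched without elog.c internals. The real check is in_error_recursion_trouble(). In this sketch, a hypothetical counter `recursion_depth` and helper `in_recursion_trouble()` stand in for it, and `convert_or_null()` is a hypothetical converter. The point is only the control flow: attempt the risky conversion when not deeply nested, and otherwise fall back to the original bytes.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for elog.c's private nesting counter. */
static int  recursion_depth = 0;

/* Hypothetical analogue of in_error_recursion_trouble(). */
static bool
in_recursion_trouble(void)
{
    /* Allow one level of nesting; bail out on deeper recursion. */
    return recursion_depth > 1;
}

/* Hypothetical converter: returns NULL when it cannot convert the text. */
static const char *
convert_or_null(const char *msg)
{
    return (strcmp(msg, "bad") == 0) ? NULL : "converted";
}

/*
 * Pick the text to emit: the converted form when it is safe to try,
 * otherwise the original bytes unchanged.
 */
static const char *
choose_log_text(const char *msg)
{
    const char *converted;

    assert(msg != NULL);

    if (in_recursion_trouble())
        return msg;             /* never risk another error while nested */

    recursion_depth++;          /* a report raised inside conversion sees this */
    converted = convert_or_null(msg);
    recursion_depth--;

    return (converted != NULL) ? converted : msg;
}
```

Using the existing elog.c predicate rather than a private counter, as Tom suggests, keeps the logging code from depending on how the error stack is implemented.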
Magnus Hagander <magnus@hagander.net> wrote:
One other question - you note that WriteConsoleW() "could fail if
stderr is redirected". Are you saying that it will always fail when
stderr is redirected, or only sometimes? If only sometimes, do you know
under which conditions it happens?
It will always fail if redirected. We can test the conditions using:
pg_ctl start > result.log
So, the comment should be:
/* WriteConsoleW always fails if stderr is redirected. */
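write_console() in the patch records the first WriteConsoleW() failure in a static flag, so later messages go straight to write(). Here is a portable sketch of that pattern. The console writer and the byte writer are passed in as hypothetical callbacks so the logic can be exercised off Windows. On Windows, the first would wrap WriteConsoleW() on GetStdHandle(STD_ERROR_HANDLE) and the second would wrap write(). The two stand-in callbacks at the end simulate a redirected stderr.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical writer callbacks, not real APIs. */
typedef bool (*console_writer) (const char *line, size_t len);
typedef void (*byte_writer) (const char *line, size_t len);

/*
 * Try the console writer until it fails once; after that, assume the
 * stream is redirected and use the byte writer for every later message.
 */
static void
write_console_sketch(const char *line, size_t len,
                     console_writer try_console, byte_writer fallback)
{
    static bool redirected = false;

    assert(line != NULL);

    if (!redirected)
    {
        if (try_console(line, len))
            return;
        redirected = true;      /* don't retry a call that will keep failing */
    }
    fallback(line, len);
}

/* Stand-ins: a console that always fails, as when stderr is redirected. */
static int  console_attempts = 0;
static int  fallback_writes = 0;

static bool
redirected_console(const char *line, size_t len)
{
    (void) line;
    (void) len;
    console_attempts++;
    return false;
}

static void
count_fallback(const char *line, size_t len)
{
    (void) line;
    (void) len;
    fallback_writes++;
}
```

Because the failure is permanent for the life of the process, caching it costs nothing and avoids a doomed system call on every log line.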
I cleaned up the patch per your comments. I hope this will be the final one ;-).
* Use in_error_recursion_trouble() instead of my own implementation.
* Use def_enc2name() macro to avoid adding the codepage field
on non-win32 platforms.
* Fix a bug in the calculation of the result length.
* Fix a memory leak on the error handling path in pgwin32_toUTF16().
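The length and memory-leak fixes listed above are easiest to see in a portable sketch. Here, a hand-written UTF-8 decoder stands in for MultiByteToWideChar(CP_UTF8, ...), malloc()/free() stand in for palloc()/pfree(), and `uint16_t` stands in for WCHAR. The function name is hypothetical. The two points it illustrates are that the reported length is the number of UTF-16 code units produced (not the input byte count), and that the buffer is freed before returning NULL.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical portable analogue of pgwin32_toUTF16() for UTF-8 input.
 * Returns a malloc'd, NUL-terminated UTF-16 string, or NULL if the input
 * is not valid UTF-8.  *utf16len (if non-NULL) receives the number of
 * UTF-16 code units written, which can differ from the input byte count.
 */
static uint16_t *
utf8_to_utf16_sketch(const unsigned char *str, size_t len, size_t *utf16len)
{
    static const uint32_t min_cp[4] = {0, 0x80, 0x800, 0x10000};
    /* len bytes never yield more than len code units (4 bytes -> 2 units). */
    uint16_t   *utf16 = malloc(sizeof(uint16_t) * (len + 1));
    size_t      i = 0;
    size_t      dstlen = 0;

    assert(str != NULL || len == 0);
    if (utf16 == NULL)
        return NULL;

    while (i < len)
    {
        unsigned char c = str[i];
        uint32_t    cp;
        size_t      need;
        size_t      k;

        if (c < 0x80) { cp = c; need = 0; }
        else if ((c & 0xE0) == 0xC0) { cp = c & 0x1F; need = 1; }
        else if ((c & 0xF0) == 0xE0) { cp = c & 0x0F; need = 2; }
        else if ((c & 0xF8) == 0xF0) { cp = c & 0x07; need = 3; }
        else
            goto fail;

        if (len - i <= need)
            goto fail;          /* truncated sequence */

        for (k = 1; k <= need; k++)
        {
            if ((str[i + k] & 0xC0) != 0x80)
                goto fail;
            cp = (cp << 6) | (str[i + k] & 0x3F);
        }
        i += need + 1;

        /* Reject overlong forms, surrogates and out-of-range values. */
        if (cp < min_cp[need] || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            goto fail;

        if (cp >= 0x10000)
        {
            cp -= 0x10000;      /* encode as a surrogate pair */
            utf16[dstlen++] = (uint16_t) (0xD800 + (cp >> 10));
            utf16[dstlen++] = (uint16_t) (0xDC00 + (cp & 0x3FF));
        }
        else
            utf16[dstlen++] = (uint16_t) cp;
    }

    utf16[dstlen] = 0;
    if (utf16len)
        *utf16len = dstlen;     /* code units produced, not input bytes */
    return utf16;

fail:
    free(utf16);                /* don't leak the buffer on the error path */
    return NULL;
}
```

For two kanji, the input is 6 bytes but the output is 2 code units. Reporting `len` instead of `dstlen` would make the caller pass 4 bytes of garbage to WriteConsoleW().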
If it's always, I assume this just means that the logfile will be in
the database encoding and not in UTF16? Is this what we want, or would
we like the logfile to also be in UTF16? If we can convert it to
UTF16, that would fix the case when you have different databases in
different encodings, wouldn't it? (Even if your editor, unlike the
console subsystem, can view the individual encoding you need, I bet it
can't deal with multiple encodings in the same file)
Sure, the logfile will be filled with strings in mixed encodings, but
that can also happen in the logfile and syslog on non-win32 platforms.
I think UTF8 is a better logfile encoding than UTF16 because some
text editors do not support wide characters.
At any rate, the logfile encoding feature will come in a separate patch,
which might add a "log_encoding" variable and work on all platforms.
Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center
Attachments:
eventlog_20091013.patch (application/octet-stream)
diff -cprN head/src/backend/utils/error/elog.c work/src/backend/utils/error/elog.c
*** head/src/backend/utils/error/elog.c 2009-07-04 04:14:25.000000000 +0900
--- work/src/backend/utils/error/elog.c 2009-10-13 10:11:34.310337321 +0900
*************** static int syslog_facility = LOG_LOCAL0;
*** 111,118 ****
static void write_syslog(int level, const char *line);
#endif
#ifdef WIN32
! static void write_eventlog(int level, const char *line);
#endif
/* We provide a small stack of ErrorData records for re-entrant cases */
--- 111,120 ----
static void write_syslog(int level, const char *line);
#endif
+ static void write_console(const char *line, int len);
+
#ifdef WIN32
! static void write_eventlog(int level, const char *line, int len);
#endif
/* We provide a small stack of ErrorData records for re-entrant cases */
*************** write_syslog(int level, const char *line
*** 1567,1576 ****
* Write a message line to the windows event log
*/
static void
! write_eventlog(int level, const char *line)
{
! int eventlevel = EVENTLOG_ERROR_TYPE;
! static HANDLE evtHandle = INVALID_HANDLE_VALUE;
if (evtHandle == INVALID_HANDLE_VALUE)
{
--- 1569,1579 ----
* Write a message line to the windows event log
*/
static void
! write_eventlog(int level, const char *line, int len)
{
! WCHAR *utf16;
! int eventlevel = EVENTLOG_ERROR_TYPE;
! static HANDLE evtHandle = INVALID_HANDLE_VALUE;
if (evtHandle == INVALID_HANDLE_VALUE)
{
*************** write_eventlog(int level, const char *li
*** 1606,1613 ****
break;
}
! ReportEvent(evtHandle,
eventlevel,
0,
0, /* All events are Id 0 */
--- 1609,1637 ----
break;
}
+ /*
+ * Convert message to UTF16 text and write it with ReportEventW,
+ * but fall-back into ReportEventA if conversion failed.
+ */
+ if (!in_error_recursion_trouble() &&
+ GetDatabaseEncoding() != GetPlatformEncoding() &&
+ (utf16 = pgwin32_toUTF16(line, len, NULL)) != NULL)
+ {
+ ReportEventW(evtHandle,
+ eventlevel,
+ 0,
+ 0, /* All events are Id 0 */
+ NULL,
+ 1,
+ 0,
+ (LPCWSTR *) &utf16,
+ NULL);
! pfree(utf16);
! }
! else
! {
! ReportEventA(evtHandle,
eventlevel,
0,
0, /* All events are Id 0 */
*************** write_eventlog(int level, const char *li
*** 1616,1624 ****
--- 1640,1690 ----
0,
&line,
NULL);
+ }
}
+
#endif /* WIN32 */
+ static void
+ write_console(const char *line, int len)
+ {
+ #ifdef WIN32
+ if (!in_error_recursion_trouble() &&
+ GetDatabaseEncoding() != GetPlatformEncoding())
+ {
+ static bool redirected = false;
+ WCHAR *utf16;
+ int utf16len;
+
+ if (!redirected &&
+ (utf16 = pgwin32_toUTF16(line, len, &utf16len)) != NULL)
+ {
+ HANDLE stdHandle;
+ DWORD written;
+
+ stdHandle = GetStdHandle(STD_ERROR_HANDLE);
+ if (WriteConsoleW(stdHandle, utf16, utf16len, &written, NULL))
+ {
+ pfree(utf16);
+ return;
+ }
+
+ /* WriteConsoleW always fails if stderr is redirected. */
+ pfree(utf16);
+ redirected = true;
+ }
+ }
+ #else
+ /*
+ * Conversion on non-win32 platform is not implemented yet.
+ * It requires non-throw version of pg_do_encoding_conversion(),
+ * that converts unconvertable characters to '?' without errors.
+ */
+ #endif
+
+ write(fileno(stderr), line, len);
+ }
+
/*
* setup formatted_log_time, for consistent times between CSV and regular logs
*/
*************** send_message_to_server_log(ErrorData *ed
*** 2206,2212 ****
/* Write to eventlog, if enabled */
if (Log_destination & LOG_DESTINATION_EVENTLOG)
{
! write_eventlog(edata->elevel, buf.data);
}
#endif /* WIN32 */
--- 2272,2278 ----
/* Write to eventlog, if enabled */
if (Log_destination & LOG_DESTINATION_EVENTLOG)
{
! write_eventlog(edata->elevel, buf.data, buf.len);
}
#endif /* WIN32 */
*************** send_message_to_server_log(ErrorData *ed
*** 2230,2239 ****
* because that's really a pipe to the syslogger process.
*/
else if (pgwin32_is_service())
! write_eventlog(edata->elevel, buf.data);
#endif
else
! write(fileno(stderr), buf.data, buf.len);
}
/* If in the syslogger process, try to write messages direct to file */
--- 2296,2305 ----
* because that's really a pipe to the syslogger process.
*/
else if (pgwin32_is_service())
! write_eventlog(edata->elevel, buf.data, buf.len);
#endif
else
! write_console(buf.data, buf.len);
}
/* If in the syslogger process, try to write messages direct to file */
*************** send_message_to_server_log(ErrorData *ed
*** 2256,2267 ****
{
const char *msg = _("Not safe to send CSV data\n");
! write(fileno(stderr), msg, strlen(msg));
if (!(Log_destination & LOG_DESTINATION_STDERR) &&
whereToSendOutput != DestDebug)
{
/* write message to stderr unless we just sent it above */
! write(fileno(stderr), buf.data, buf.len);
}
pfree(buf.data);
}
--- 2322,2333 ----
{
const char *msg = _("Not safe to send CSV data\n");
! write_console(msg, strlen(msg));
if (!(Log_destination & LOG_DESTINATION_STDERR) &&
whereToSendOutput != DestDebug)
{
/* write message to stderr unless we just sent it above */
! write_console(buf.data, buf.len);
}
pfree(buf.data);
}
*************** void
*** 2642,2647 ****
--- 2708,2716 ----
write_stderr(const char *fmt,...)
{
va_list ap;
+ #ifdef WIN32
+ char errbuf[2048]; /* Arbitrary size? */
+ #endif
fmt = _(fmt);
*************** write_stderr(const char *fmt,...)
*** 2651,2656 ****
--- 2720,2726 ----
vfprintf(stderr, fmt, ap);
fflush(stderr);
#else
+ vsnprintf(errbuf, sizeof(errbuf), fmt, ap);
/*
* On Win32, we print to stderr if running on a console, or write to
*************** write_stderr(const char *fmt,...)
*** 2658,2673 ****
*/
if (pgwin32_is_service()) /* Running as a service */
{
! char errbuf[2048]; /* Arbitrary size? */
!
! vsnprintf(errbuf, sizeof(errbuf), fmt, ap);
!
! write_eventlog(ERROR, errbuf);
}
else
{
/* Not running as service, write to stderr */
! vfprintf(stderr, fmt, ap);
fflush(stderr);
}
#endif
--- 2728,2739 ----
*/
if (pgwin32_is_service()) /* Running as a service */
{
! write_eventlog(ERROR, errbuf, strlen(errbuf));
}
else
{
/* Not running as service, write to stderr */
! write_console(errbuf, strlen(errbuf));
fflush(stderr);
}
#endif
diff -cprN head/src/backend/utils/mb/encnames.c work/src/backend/utils/mb/encnames.c
*** head/src/backend/utils/mb/encnames.c 2009-04-24 17:43:50.000000000 +0900
--- work/src/backend/utils/mb/encnames.c 2009-10-13 10:08:42.641335971 +0900
*************** sizeof(pg_encname_tbl) / sizeof(pg_encna
*** 300,433 ****
* XXX must be sorted by the same order as enum pg_enc (in mb/pg_wchar.h)
* ----------
*/
pg_enc2name pg_enc2name_tbl[] =
{
! {
! "SQL_ASCII", PG_SQL_ASCII
! },
! {
! "EUC_JP", PG_EUC_JP
! },
! {
! "EUC_CN", PG_EUC_CN
! },
! {
! "EUC_KR", PG_EUC_KR
! },
! {
! "EUC_TW", PG_EUC_TW
! },
! {
! "EUC_JIS_2004", PG_EUC_JIS_2004
! },
! {
! "UTF8", PG_UTF8
! },
! {
! "MULE_INTERNAL", PG_MULE_INTERNAL
! },
! {
! "LATIN1", PG_LATIN1
! },
! {
! "LATIN2", PG_LATIN2
! },
! {
! "LATIN3", PG_LATIN3
! },
! {
! "LATIN4", PG_LATIN4
! },
! {
! "LATIN5", PG_LATIN5
! },
! {
! "LATIN6", PG_LATIN6
! },
! {
! "LATIN7", PG_LATIN7
! },
! {
! "LATIN8", PG_LATIN8
! },
! {
! "LATIN9", PG_LATIN9
! },
! {
! "LATIN10", PG_LATIN10
! },
! {
! "WIN1256", PG_WIN1256
! },
! {
! "WIN1258", PG_WIN1258
! },
! {
! "WIN866", PG_WIN866
! },
! {
! "WIN874", PG_WIN874
! },
! {
! "KOI8R", PG_KOI8R
! },
! {
! "WIN1251", PG_WIN1251
! },
! {
! "WIN1252", PG_WIN1252
! },
! {
! "ISO_8859_5", PG_ISO_8859_5
! },
! {
! "ISO_8859_6", PG_ISO_8859_6
! },
! {
! "ISO_8859_7", PG_ISO_8859_7
! },
! {
! "ISO_8859_8", PG_ISO_8859_8
! },
! {
! "WIN1250", PG_WIN1250
! },
! {
! "WIN1253", PG_WIN1253
! },
! {
! "WIN1254", PG_WIN1254
! },
! {
! "WIN1255", PG_WIN1255
! },
! {
! "WIN1257", PG_WIN1257
! },
! {
! "KOI8U", PG_KOI8U
! },
! {
! "SJIS", PG_SJIS
! },
! {
! "BIG5", PG_BIG5
! },
! {
! "GBK", PG_GBK
! },
! {
! "UHC", PG_UHC
! },
! {
! "GB18030", PG_GB18030
! },
! {
! "JOHAB", PG_JOHAB
! },
! {
! "SHIFT_JIS_2004", PG_SHIFT_JIS_2004
! }
};
/* ----------
--- 300,354 ----
* XXX must be sorted by the same order as enum pg_enc (in mb/pg_wchar.h)
* ----------
*/
+ #ifdef WIN32
+ #define def_enc2name(name, codepage) { #name, PG_##name, codepage }
+ #else
+ #define def_enc2name(name, codepage) { #name, PG_##name }
+ #endif
pg_enc2name pg_enc2name_tbl[] =
{
! def_enc2name(SQL_ASCII, 0),
! def_enc2name(EUC_JP, 20932),
! def_enc2name(EUC_CN, 20936),
! def_enc2name(EUC_KR, 51949),
! def_enc2name(EUC_TW, 0),
! def_enc2name(EUC_JIS_2004, 20932),
! def_enc2name(UTF8, 65001),
! def_enc2name(MULE_INTERNAL, 0),
! def_enc2name(LATIN1, 28591),
! def_enc2name(LATIN2, 28592),
! def_enc2name(LATIN3, 28593),
! def_enc2name(LATIN4, 28594),
! def_enc2name(LATIN5, 28599),
! def_enc2name(LATIN6, 0),
! def_enc2name(LATIN7, 0),
! def_enc2name(LATIN8, 0),
! def_enc2name(LATIN9, 28605),
! def_enc2name(LATIN10, 0),
! def_enc2name(WIN1256, 1256),
! def_enc2name(WIN1258, 1258),
! def_enc2name(WIN866, 866),
! def_enc2name(WIN874, 874),
! def_enc2name(KOI8R, 20866),
! def_enc2name(WIN1251, 1251),
! def_enc2name(WIN1252, 1252),
! def_enc2name(ISO_8859_5, 28595),
! def_enc2name(ISO_8859_6, 28596),
! def_enc2name(ISO_8859_7, 28597),
! def_enc2name(ISO_8859_8, 28598),
! def_enc2name(WIN1250, 1250),
! def_enc2name(WIN1253, 1253),
! def_enc2name(WIN1254, 1254),
! def_enc2name(WIN1255, 1255),
! def_enc2name(WIN1257, 1257),
! def_enc2name(KOI8U, 21866),
! def_enc2name(SJIS, 932),
! def_enc2name(BIG5, 950),
! def_enc2name(GBK, 936),
! def_enc2name(UHC, 0),
! def_enc2name(GB18030, 54936),
! def_enc2name(JOHAB, 0),
! def_enc2name(SHIFT_JIS_2004, 932)
};
/* ----------
diff -cprN head/src/backend/utils/mb/mbutils.c work/src/backend/utils/mb/mbutils.c
*** head/src/backend/utils/mb/mbutils.c 2009-07-08 04:28:56.000000000 +0900
--- work/src/backend/utils/mb/mbutils.c 2009-10-13 10:08:42.642363445 +0900
*************** static FmgrInfo *ToClientConvProc = NULL
*** 58,63 ****
--- 58,64 ----
*/
static pg_enc2name *ClientEncoding = &pg_enc2name_tbl[PG_SQL_ASCII];
static pg_enc2name *DatabaseEncoding = &pg_enc2name_tbl[PG_SQL_ASCII];
+ static pg_enc2name *PlatformEncoding = NULL;
/*
* During backend startup we can't set client encoding because we (a)
*************** pg_client_encoding(PG_FUNCTION_ARGS)
*** 978,980 ****
--- 979,1044 ----
Assert(ClientEncoding);
return DirectFunctionCall1(namein, CStringGetDatum(ClientEncoding->name));
}
+
+ int
+ GetPlatformEncoding(void)
+ {
+ if (PlatformEncoding == NULL)
+ PlatformEncoding = &pg_enc2name_tbl[pg_get_encoding_from_locale("")];
+ return PlatformEncoding->encoding;
+ }
+
+ #ifdef WIN32
+
+ /*
+ * Result is palloc'ed null-terminated utf16 string. The character length
+ * is also passed to utf16len if not null. Returns NULL iff failed.
+ */
+ WCHAR *
+ pgwin32_toUTF16(const char *str, int len, int *utf16len)
+ {
+ WCHAR *utf16;
+ int dstlen;
+ UINT codepage;
+
+ codepage = pg_enc2name_tbl[GetDatabaseEncoding()].codepage;
+
+ /*
+ * Use MultiByteToWideChar directly if there is a corresponding codepage,
+ * or double conversion through UTF8.
+ */
+ if (codepage != 0)
+ {
+ utf16 = (WCHAR *) palloc(sizeof(WCHAR) * (len + 1));
+ dstlen = MultiByteToWideChar(codepage, 0, str, len, utf16, len);
+ utf16[dstlen] = L'\0';
+ }
+ else
+ {
+ char *utf8;
+
+ utf8 = (char *) pg_do_encoding_conversion((unsigned char *) str,
+ len, GetDatabaseEncoding(), PG_UTF8);
+ if (utf8 != str)
+ len = strlen(utf8);
+
+ utf16 = (WCHAR *) palloc(sizeof(WCHAR) * (len + 1));
+ dstlen = MultiByteToWideChar(CP_UTF8, 0, utf8, len, utf16, len);
+ utf16[dstlen] = L'\0';
+
+ if (utf8 != str)
+ pfree(utf8);
+ }
+
+ if (dstlen == 0 && len > 0)
+ {
+ pfree(utf16);
+ return NULL; /* error */
+ }
+
+ if (utf16len)
+ *utf16len = dstlen;
+ return utf16;
+ }
+
+ #endif
diff -cprN head/src/include/mb/pg_wchar.h work/src/include/mb/pg_wchar.h
*** head/src/include/mb/pg_wchar.h 2009-06-11 23:49:11.000000000 +0900
--- work/src/include/mb/pg_wchar.h 2009-10-13 10:08:42.642363445 +0900
*************** typedef struct pg_enc2name
*** 257,262 ****
--- 257,265 ----
{
char *name;
pg_enc encoding;
+ #ifdef WIN32
+ unsigned codepage; /* codepage for WIN32 */
+ #endif
} pg_enc2name;
extern pg_enc2name pg_enc2name_tbl[];
*************** extern const char *pg_get_client_encodin
*** 402,407 ****
--- 405,411 ----
extern void SetDatabaseEncoding(int encoding);
extern int GetDatabaseEncoding(void);
extern const char *GetDatabaseEncodingName(void);
+ extern int GetPlatformEncoding(void);
extern void pg_bind_textdomain_codeset(const char *domainname);
extern int pg_valid_client_encoding(const char *name);
*************** extern void mic2latin_with_table(const u
*** 458,461 ****
--- 462,469 ----
extern bool pg_utf8_islegal(const unsigned char *source, int length);
+ #ifdef WIN32
+ extern WCHAR *pgwin32_toUTF16(const char *str, int len, int *utf16len);
+ #endif
+
#endif /* PG_WCHAR_H */
On Mon, Oct 12, 2009 at 9:13 PM, Itagaki Takahiro
<itagaki.takahiro@oss.ntt.co.jp> wrote:
Magnus Hagander <magnus@hagander.net> wrote:
One other question - you note that WriteConsoleW() "could fail if
stderr is redirected". Are you saying that it will always fail when
stderr is redirected, or only sometimes? If only sometimes, do you know
under which conditions it happens?
It will always fail if redirected. We can test the conditions using:
pg_ctl start > result.log
So, the comment should be:
/* WriteConsoleW always fails if stderr is redirected. */
I cleaned up the patch per your comments. I hope this will be the final one ;-).
* Use in_error_recursion_trouble() instead of my own implementation.
* Use def_enc2name() macro to avoid adding the codepage field
on non-win32 platforms.
* Fix a bug in the calculation of the result length.
* Fix a memory leak on the error handling path in pgwin32_toUTF16().
If it's always, I assume this just means that the logfile will be in
the database encoding and not in UTF16? Is this what we want, or would
we like the logfile to also be in UTF16? If we can convert it to
UTF16, that would fix the case when you have different databases in
different encodings, wouldn't it? (Even if your editor, unlike the
console subsystem, can view the individual encoding you need, I bet it
can't deal with multiple encodings in the same file)
Sure, the logfile will be filled with strings in mixed encodings, but
that can also happen in the logfile and syslog on non-win32 platforms.
I think UTF8 is a better logfile encoding than UTF16 because some
text editors do not support wide characters.
At any rate, the logfile encoding feature will come in a separate patch,
which might add a "log_encoding" variable and work on all platforms.
Magnus has promised me on a stack of instant messages that he will
review this soon, but as he hasn't gotten to it yet, I am moving it to
the next CommitFest.
...Robert
Robert Haas wrote:
Sure, the logfile will be filled with strings in mixed encodings, which
can also happen in the logfile and syslog on non-win32 platforms.
I think UTF8 is better than UTF16 for logfile encoding because
there are some text editors that do not support wide characters.
At any rate, the logfile encoding feature will come from another patch,
which might add a "log_encoding" variable and work on all platforms.
Magnus has promised me on a stack of instant messages that he will
review this soon, but as he hasn't gotten to it yet, I am moving it to
the next CommitFest.
I am with Magnus today and will make sure it gets done.
--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com
+ If your life is a hard drive, Christ can be your backup. +
2009/10/13 Itagaki Takahiro <itagaki.takahiro@oss.ntt.co.jp>:
Magnus Hagander <magnus@hagander.net> wrote:
One other question - you note that WriteConsoleW() "could fail if
stderr is redirected". Are you saying that it will always fail when
stderr is redirected, or only sometimes? If only sometimes, do you know
under which conditions it happens?
It will always fail if redirected. We can test the conditions using:
pg_ctl start > result.log
So, the comment should be:
/* WriteConsoleW always fails if stderr is redirected. */
Ok, fair enough. We already have a variable for that though - it's
called redirection_done. I think it does what's necessary - I have
used that one in my version of the patch. Please verify that it works
in your environment.
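The resulting console-write decision can be sketched portably as follows. All names here (choose_console_path, write_console_sketch, wide_writer) are hypothetical, and the actual Win32 step (pgwin32_toUTF16() followed by WriteConsoleW()) is represented by a caller-supplied callback. The sketch also assumes conversion is only attempted when the database and platform encodings differ. It skips the wide-character path when in_error_recursion_trouble() reports trouble or when redirection_done is set, and falls back to writing the unconverted bytes if the wide write fails.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical: which path a console log line takes. */
typedef enum
{
    LOG_PATH_RAW_BYTES,     /* write the line unconverted */
    LOG_PATH_WIDE_CONSOLE   /* convert to UTF-16 and use the wide console API */
} log_path;

/*
 * Hypothetical callback standing in for pgwin32_toUTF16() + WriteConsoleW();
 * returns true only if the wide write succeeded.
 */
typedef bool (*wide_writer)(const char *line, int len);

static log_path
choose_console_path(bool encodings_differ, bool in_recursion_trouble,
                    bool redirection_done)
{
    /* Assumption: no conversion is needed when the encodings already agree. */
    if (!encodings_differ)
        return LOG_PATH_RAW_BYTES;
    /* Avoid a conversion that might itself report another error. */
    if (in_recursion_trouble)
        return LOG_PATH_RAW_BYTES;
    /* The wide console write always fails once stderr is redirected. */
    if (redirection_done)
        return LOG_PATH_RAW_BYTES;
    return LOG_PATH_WIDE_CONSOLE;
}

static log_path
write_console_sketch(const char *line, int len, bool encodings_differ,
                     bool in_recursion_trouble, bool redirection_done,
                     wide_writer try_wide)
{
    /* try_wide is called only when the wide path is chosen. */
    if (choose_console_path(encodings_differ, in_recursion_trouble,
                            redirection_done) == LOG_PATH_WIDE_CONSOLE &&
        try_wide != NULL && try_wide(line, len))
        return LOG_PATH_WIDE_CONSOLE;

    /* Fallback: write the message bytes unconverted to stderr. */
    fwrite(line, 1, (size_t) len, stderr);
    return LOG_PATH_RAW_BYTES;
}
```

Keeping the decision in a separate function makes each skip condition easy to test on its own, independent of any console being attached.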
I cleaned up the patch per comments. I hope this will be the final one ;-).
* Use in_error_recursion_trouble() instead of our own implementation.
* Use the def_enc2name() macro to avoid adding the codepage field
on non-win32 platforms.
Per previous email, I had done this in my version of the patch, so it
looks slightly different than yours, but it has the same
functionality.
* Fix a bug in the calculation of the result length.
Where exactly is this one? I can't find it when comparing against my code, but
that could just be out-of-timezone-brain speaking :-)
* Fix a memory leak on the error-handling path in pgwin32_toUTF16().
Missed that one, thanks!
If it's always, I assume this just means that the logfile will be in
the database encoding and not in UTF16? Is this what we want, or would
we like the logfile to also be in UTF16? If we can convert it to
UTF16, that would fix the case when you have different databases in
different encodings, wouldn't it? (Even if your editor, unlike the
console subsystem, can view the individual encoding you need, I bet it
can't deal with multiple encodings in the same file)
Sure, the logfile will be filled with strings in mixed encodings, which
can also happen in the logfile and syslog on non-win32 platforms.
I think UTF8 is better than UTF16 for logfile encoding because
there are some text editors that do not support wide characters.
Don't most text editors on Windows handle UTF16? In particular, I'd expect
more of them to support UTF16 than UTF8, but I could be wrong.
At any rate, the logfile encoding feature will come from another patch,
which might add a "log_encoding" variable and work on all platforms.
Ok, good. The "other platforms" point in particular is the winning argument.
So, I have applied what I believe is the latest version of the patch. Please
point out if I made a mistake in my changes against yours.
Sorry about the delay :(
--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/