BUG #15772: Some messages in log files are in ANSI encoding while server encoding is UTF8
The following bug has been logged on the website:
Bug reference: 15772
Logged by: Eugene Podshivalov
Email address: yaugenka@gmail.com
PostgreSQL version: 11.2
Operating system: Windows 10
Description:
My postgresql.conf has the following locale settings
----
#client_encoding = sql_ascii # actually, defaults to database encoding
# These settings are initialized by initdb, but they can be changed.
lc_messages = 'Russian_Russia.1251' # locale for system error message strings
lc_monetary = 'Russian_Russia.1251' # locale for monetary formatting
lc_numeric = 'Russian_Russia.1251' # locale for number formatting
lc_time = 'Russian_Russia.1251' # locale for time formatting
----
Server encoding is "UTF8".
Messages in the log file are usually in UTF8, but some messages are logged
in ANSI encoding.
Here are some example cases (in the Russian language) when ANSI is used
instead of UTF8
--
СООБЩЕНИЕ: контрольные точки происходят слишком часто (через 19 сек.)
ПОДСКАЗКА: Возможно, стоит увеличить параметр "max_wal_size".
--
СООБЩЕНИЕ: получен запрос на быстрое выключение
СООБЩЕНИЕ: прерывание всех активных транзакций
--
СООБЩЕНИЕ: система БД была выключена:
СООБЩЕНИЕ: система БД готова принимать подключения
On Thu, Apr 18, 2019 at 01:53:18PM +0000, PG Bug reporting form wrote:
I am kind of confused since all the messages look like Russian to me,
except for the mention of "max_wal_size". When you say ANSI, do you
mean ISO-8859-5 - Cyrillic, or ASCII?
--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com
+ As you are, so once was I. As I am, so you will be. +
+ Ancient Roman grave inscription +
Bruce,
Here is a screenshot of how it looks when I open the log file in
Notepad++ and switch the encoding from UTF8 to ANSI.
[image: image.png]
Regards,
Eugene
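A minimal Python sketch of what switching the editor encoding does to the same bytes, assuming "ANSI" here means Windows-1251 (cp1251), which is only a guess in this thread. The helper names `view_as` and `is_valid_utf8` are hypothetical; the sample message is one of the lines from the report.

```python
# Illustration only: not PostgreSQL code, just byte-level behaviour.
MESSAGE = "СООБЩЕНИЕ: система БД готова принимать подключения"

# The same text stored in the two encodings discussed in the report.
UTF8_BYTES = MESSAGE.encode("utf-8")
CP1251_BYTES = MESSAGE.encode("cp1251")


def view_as(data: bytes, encoding: str) -> str:
    """Decode bytes the way an editor would when forced to `encoding`.

    errors="replace" is needed because byte 0x98 is undefined in cp1251
    and does occur in UTF-8 Cyrillic text (for example in the letter 'И').
    """
    return data.decode(encoding, errors="replace")


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True
```

With these helpers, `view_as(UTF8_BYTES, "cp1251")` yields garbled text, while `view_as(CP1251_BYTES, "cp1251")` yields the readable message. A cp1251-encoded Russian line is also not valid UTF-8, which is why a file mixing both can never look right in a single editor encoding.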
On Thu, Apr 18, 2019 at 05:40:59PM +0300, Eugene Podshivalov wrote:
Uh, I see what you mean. Can you give us one message that is OK and one
that is messed up, in their English versions? I still don't
know what ANSI is. What does the output look like in UTF8 mode?
On 2019-Apr-18, Eugene Podshivalov wrote:
I suppose you have databases with the single-byte encoding amidst your
UTF8 ones. AFAIK the log file registers the log entries in the same
encoding that the database uses. Different databases can use different
encodings.
That's pretty broken, but it's how it is.
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Alvaro Herrera <alvherre@2ndquadrant.com> writes:
Yeah, and it's not easy to improve on. If we tried to convert all
log messages to the same encoding, which one would that be?
(Please, no nonsense about UTF8 being a universal solution.
The Japanese don't think so, for instance.)
Also, what do you do if you get an encoding conversion failure?
That's even before you get into implementation-dependent problems,
like what to do early in process startup before the encoding
conversion machinery is operational.
A more realistic idea might be to have separate log files for
different encodings, though that has a bunch of management issues
to solve as well.
regards, tom lane
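As an aside on the conversion-failure question, here is a hedged Python sketch of an after-the-fact workaround outside the server: reading a mixed-encoding log file line by line and rewriting it as UTF-8. It assumes every non-UTF-8 line is Windows-1251, which is an assumption for this report and not something PostgreSQL guarantees. The function names are hypothetical, and undecodable bytes are replaced with U+FFFD instead of aborting.

```python
from typing import Tuple


def decode_log_line(raw: bytes, fallback: str = "cp1251") -> Tuple[str, str]:
    """Return (text, encoding_used) for one raw log line.

    Try UTF-8 first; if that fails, assume the fallback single-byte
    encoding. Bytes that the fallback cannot map become U+FFFD.
    """
    try:
        return raw.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        return raw.decode(fallback, errors="replace"), fallback


def normalize_log(raw_log: bytes, fallback: str = "cp1251") -> bytes:
    """Re-encode a mixed-encoding log as UTF-8, line by line."""
    lines = raw_log.splitlines(keepends=True)  # keep original line endings
    text = "".join(decode_log_line(line, fallback)[0] for line in lines)
    return text.encode("utf-8")
```

This only sidesteps the problem for one known fallback encoding. It does not answer which encoding the server should pick, and it cannot tell two different single-byte encodings apart.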
I guess that the issue is related to this setting in the postgresql.conf
file:
lc_messages = 'Russian_Russia.1251' # locale for system error message
I tried changing it to 'en_US.UTF-8' and all new messages in the log file are
in English and look good regardless of whether I view them in UTF-8 or ANSI
encoding.
I don't know what ANSI stands for either, but it comes first in the list of
encodings in the Notepad++ Encoding menu.
I guess it refers to Windows-1251 in my case.
The English variants of the messed-up messages in the UTF8 section of the
screenshot above are
LOG: database system was shut down at ...
LOG: database system is ready to accept connections
All my databases have encoding=UTF8, collate=Russian_Russia.1251,
ctype=Russian_Russia.1251
Regards,
Eugene
Could the issue be that not all messages take the lc_messages setting into
account?
I.e. in my case all messages should be in ANSI (Windows-1251) instead of
UTF-8.
Regards,
Eugene
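One way to check this hypothesis against an actual log file is to count how many lines fall into each encoding class. The Python sketch below is a rough diagnostic under the assumption that anything that is neither pure ASCII nor valid UTF-8 comes from a single-byte code page such as Windows-1251; `classify_line` and `tally_encodings` are hypothetical names.

```python
from collections import Counter


def classify_line(raw: bytes) -> str:
    """Classify one raw log line as 'ascii', 'utf-8' or 'not-utf8'."""
    if raw.isascii():
        return "ascii"  # identical bytes in UTF-8 and Windows-1251
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return "not-utf8"  # likely a single-byte code page
    return "utf-8"


def tally_encodings(raw_log: bytes) -> Counter:
    """Count log lines per encoding class."""
    return Counter(classify_line(line) for line in raw_log.splitlines())
```

Reading the log in binary mode and passing its contents to `tally_encodings` shows whether the non-UTF-8 lines are the exception or the rule; printing a few lines of each class would then show which kinds of messages end up in which encoding.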