BUG #15772: Some messages in log files are in ANSI encoding while server encoding is UTF8

Started by PG Bug reporting form about 7 years ago, 8 messages, bugs

noreply@postgresql.org

about 7 years ago

The following bug has been logged on the website:

Bug reference: 15772
Logged by: Eugene Podshivalov
Email address: yaugenka@gmail.com
PostgreSQL version: 11.2
Operating system: Windows 10
Description:

My postgresql.conf has the following locale settings
----
#client_encoding = sql_ascii # actually, defaults to database encoding

# These settings are initialized by initdb, but they can be changed.
lc_messages = 'Russian_Russia.1251' # locale for system error message strings
lc_monetary = 'Russian_Russia.1251' # locale for monetary formatting
lc_numeric = 'Russian_Russia.1251' # locale for number formatting
lc_time = 'Russian_Russia.1251' # locale for time formatting
----
Server encoding is "UTF8".
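Messages in the log file are usually in UTF8, but some messages are logged
in ANSI encoding.
Here are some example cases (in Russian) where ANSI is used instead of
UTF8:
--
СООБЩЕНИЕ: контрольные точки происходят слишком часто (через 19 сек.)
ПОДСКАЗКА: Возможно, стоит увеличить параметр "max_wal_size".
(English: LOG: checkpoints are occurring too frequently (19 seconds apart)
HINT: Consider increasing the configuration parameter "max_wal_size".)
--
СООБЩЕНИЕ: получен запрос на быстрое выключение
СООБЩЕНИЕ: прерывание всех активных транзакций
(English: LOG: received fast shutdown request
LOG: aborting any active transactions)
--
СООБЩЕНИЕ: система БД была выключена:
СООБЩЕНИЕ: система БД готова принимать подключения
(English: LOG: database system was shut down at ...
LOG: database system is ready to accept connections)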

Bruce Momjian

bruce@momjian.us

about 7 years ago

In reply to: PG Bug reporting form (#1)

Re: BUG #15772: Some messages in log files are in ANSI encoding while server encoding is UTF8

On Thu, Apr 18, 2019 at 01:53:18PM +0000, PG Bug reporting form wrote:


I am kind of confused since all the messages look like Russian to me,
except for the mention of "max_wal_size". When you say ANSI, do you
mean ISO-8859-5 - Cyrillic, or ASCII?

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +

Eugene Podshivalov

yaugenka@gmail.com

about 7 years ago

In reply to: Bruce Momjian (#2)

Re: BUG #15772: Some messages in log files are in ANSI encoding while server encoding is UTF8

Bruce,
Here is a screenshot of how it looks when I open the log file in
Notepad++ and switch the encoding from UTF8 to ANSI.
[image: image.png]

Regards,
Eugene

On Thu, 18 Apr 2019 at 17:31, Bruce Momjian <bruce@momjian.us> wrote:


Bruce Momjian

bruce@momjian.us

about 7 years ago

In reply to: Eugene Podshivalov (#3)

Re: BUG #15772: Some messages in log files are in ANSI encoding while server encoding is UTF8

On Thu, Apr 18, 2019 at 05:40:59PM +0300, Eugene Podshivalov wrote:

Bruce,
Here is a screenshot of how it looks when I open the log file in Notepad++
and switch the encoding from UTF8 to ANSI.
[image: image.png]

Uh, I see what you mean. Can you give us the English versions of one
message that is OK and one that is messed up? I still don't know what
ANSI is. What does the output look like in UTF8 mode?


Alvaro Herrera

alvherre@2ndquadrant.com

about 7 years ago

In reply to: Eugene Podshivalov (#3)

Re: BUG #15772: Some messages in log files are in ANSI encoding while server encoding is UTF8

On 2019-Apr-18, Eugene Podshivalov wrote:

Bruce,
Here is a screenshot of how it looks when I open the log file in
Notepad++ and switch the encoding from UTF8 to ANSI.
[image: image.png]

I suppose you have databases with the single-byte encoding amidst your
UTF8 ones. AFAIK the log file registers the log entries in the same
encoding that the database uses. Different databases can use different
encodings.

That's pretty broken, but it's how it is.

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
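Whatever the root cause turns out to be, a quick way to see which lines of a mixed log file are not valid UTF-8 is to read the file as raw bytes and try to decode each line. The following is a minimal Python sketch, not a PostgreSQL tool; the function names `classify_line` and `scan_log` are hypothetical, and treating every non-UTF-8 line as Windows-1251 is an assumption that only fits a setup like the one in this report.

```python
def classify_line(raw: bytes) -> str:
    """Classify one raw log line as 'ascii', 'utf-8', or 'non-utf-8'."""
    try:
        raw.decode("ascii")
        # Pure ASCII reads the same in UTF-8 and in CP1251.
        return "ascii"
    except UnicodeDecodeError:
        pass
    try:
        raw.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        # Not valid UTF-8. With lc_messages set to a Windows-1251 locale this
        # is probably CP1251 text, but that is an assumption, not a check.
        return "non-utf-8"


def scan_log(path: str):
    """Yield (line_number, classification) for every line of a log file."""
    # Open in binary mode so nothing is decoded before we classify it.
    with open(path, "rb") as f:
        for lineno, raw in enumerate(f, start=1):
            yield lineno, classify_line(raw.rstrip(b"\r\n"))
```

For example, iterating `scan_log(...)` over a (hypothetical) log file path and printing the line numbers tagged `"non-utf-8"` lists exactly the lines that render badly when the file is viewed as UTF-8. English messages land in the `"ascii"` bucket, which is consistent with them looking fine in either viewer mode.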

Tom Lane

tgl@sss.pgh.pa.us

about 7 years ago

In reply to: Alvaro Herrera (#5)

Re: BUG #15772: Some messages in log files are in ANSI encoding while server encoding is UTF8

Alvaro Herrera <alvherre@2ndquadrant.com> writes:

I suppose you have databases with the single-byte encoding amidst your
UTF8 ones. AFAIK the log file registers the log entries in the same
encoding that the database uses. Different databases can use different
encodings.

That's pretty broken, but it's how it is.

Yeah, and it's not easy to improve on. If we tried to convert all
log messages to the same encoding, which one would that be?
(Please, no nonsense about UTF8 being a universal solution.
The Japanese don't think so, for instance.)

Also, what do you do if you get an encoding conversion failure?

That's even before you get into implementation-dependent problems,
like what to do early in process startup before the encoding
conversion machinery is operational.

A more realistic idea might be to have separate log files for
different encodings, though that has a bunch of management issues
to solve as well.

regards, tom lane
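The conversion-failure question above can be made concrete with a short Python sketch. This is an illustration only, not PostgreSQL's server code. It picks Windows-1251 as a hypothetical common log encoding and shows that Russian text converts cleanly while Japanese text cannot, so some failure policy has to be chosen. The function name `to_common_encoding` and the backslash-escape fallback are assumptions made for the example.

```python
def to_common_encoding(message: str, target: str, strict: bool = True) -> bytes:
    """Encode a log message into one shared target encoding.

    strict=True: a character the target cannot represent raises
    UnicodeEncodeError (the "conversion failure" case).
    strict=False: such characters are written as backslash escapes,
    which is only one of several possible fallback policies.
    """
    errors = "strict" if strict else "backslashreplace"
    return message.encode(target, errors=errors)


# Russian text fits in Windows-1251 ...
ok = to_common_encoding("система БД готова", "cp1251")

# ... but Japanese text does not, so a policy decision is unavoidable.
try:
    to_common_encoding("データベース", "cp1251")
    conversion_failed = False
except UnicodeEncodeError:
    conversion_failed = True

# With the lenient policy, unrepresentable characters become \uXXXX escapes.
escaped = to_common_encoding("БД データ", "cp1251", strict=False)
```

The escape fallback keeps the log line intact but makes it unreadable to anyone expecting Japanese, which is the kind of trade-off the message above is pointing at.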

Eugene Podshivalov

yaugenka@gmail.com

about 7 years ago

In reply to: Tom Lane (#6)

Re: BUG #15772: Some messages in log files are in ANSI encoding while server encoding is UTF8

I guess that the issue is related to this setting in the postgresql.conf
file:
lc_messages = 'Russian_Russia.1251' # locale for system error message

I tried changing it to 'en_US.UTF-8', and all new messages in the log file
are in English and look good regardless of whether I view them in UTF-8 or
ANSI encoding.

I don't know what ANSI stands for either, but it is the first entry in the
Encoding menu of Notepad++.
I guess it refers to Windows-1251 in my case.
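To see why the same file looks partly garbled in either viewer mode, a minimal Python sketch can encode a message with one codec and decode the bytes with another. It assumes, as guessed above, that Notepad++'s "ANSI" means the Windows code page 1251 (`cp1251` in Python); `misread` is a hypothetical helper, not part of any tool mentioned in this thread.

```python
def misread(text: str, written_as: str, read_as: str) -> str:
    """Encode text with one codec and decode the bytes with another,
    the way a viewer set to the wrong encoding would display it."""
    # errors="replace" turns undecodable bytes into U+FFFD instead of failing.
    return text.encode(written_as).decode(read_as, errors="replace")


msg = "СООБЩЕНИЕ: система БД готова принимать подключения"

# CP1251 bytes shown as UTF-8: Cyrillic letters turn into U+FFFD characters.
as_utf8_view = misread(msg, "cp1251", "utf-8")

# UTF-8 bytes shown as ANSI (CP1251): mojibake built from Р/С-led pairs.
as_ansi_view = misread(msg, "utf-8", "cp1251")

# Pure ASCII (e.g. English messages) survives either way.
english = misread("LOG:  database system is ready to accept connections", "cp1251", "utf-8")
```

Because English PostgreSQL messages are pure ASCII, they decode identically under both codecs, which matches the observation that switching lc_messages to English made every new line readable in both modes.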

The English versions of the garbled messages in the UTF8 section of the
screenshot above are:
LOG: database system was shut down at ...
LOG: database system is ready to accept connections

All my databases have encoding=UTF8, collate=Russian_Russia.1251,
ctype=Russian_Russia.1251

Regards,
Eugene

On Thu, 18 Apr 2019 at 19:20, Tom Lane <tgl@sss.pgh.pa.us> wrote:


Eugene Podshivalov

yaugenka@gmail.com

about 7 years ago

In reply to: Eugene Podshivalov (#7)

Re: BUG #15772: Some messages in log files are in ANSI encoding while server encoding is UTF8

Could the issue be that not all messages take the lc_messages setting into
account?
That is, in my case all messages should be in ANSI (Windows-1251) instead
of UTF-8.

Regards,
Eugene

On Thu, 18 Apr 2019 at 19:26, Eugene Podshivalov <yaugenka@gmail.com> wrote:

