BUG #15230: "Logical decoding" is not sensitive to client encoding setting

Started by PG Bug reporting form, almost 8 years ago, 3 messages, bugs
#1 PG Bug reporting form
noreply@postgresql.org

The following bug has been logged on the website:

Bug reference: 15230
Logged by: Hillel Eilat
Email address: hillel.eilat@attunity.com
PostgreSQL version: 9.4.4
Operating system: Windows 7
Description:

Logical Decoding is not sensitive to Client character encoding setting

My project uses Logical Decoding by interacting with the WAL backend via
the native non-SQL interface.
The plugin used is the common "test_decoding", which is shipped together
with the kit.
There is a Japanese database whose encoding is defined as "EUC_JP".
Ordinarily we process the streamed data in UTF8 client encoding, thus
maintaining a common set of general "consumer" functions.
Consequently, prior to issuing PQconnectdbParams(keywords, values, true), a
{"client_encoding","UTF8"} pair is introduced.
To be on the safe side, PQclientEncoding(pConn) and
pg_encoding_to_char(iClientEncoding) are called thereafter
to confirm that UTF8 was properly set.

Despite the above setting, the data which is streamed in does not show up in
UTF8.
It preserves the backend server's EUC_JP encoding.
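
A consumer that wants to confirm this symptom on the bytes it receives could
run each payload through a strict UTF-8 validator. The following is only a
minimal sketch under stated assumptions: is_valid_utf8 is a hypothetical
helper name, not part of libpq or of the reporter's code, and it checks
well-formedness only (it cannot tell which other encoding the bytes are in).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper (not a libpq function): returns true if
 * buf[0..len) is well-formed UTF-8, false otherwise.
 */
bool
is_valid_utf8(const unsigned char *buf, size_t len)
{
    size_t i = 0;

    assert(buf != NULL || len == 0);

    while (i < len)
    {
        unsigned char c = buf[i];
        size_t        need;     /* number of continuation bytes expected */
        unsigned int  cp;       /* code point being assembled */

        if (c < 0x80)
        {
            i++;                /* plain ASCII */
            continue;
        }
        else if ((c & 0xE0) == 0xC0) { need = 1; cp = c & 0x1F; }
        else if ((c & 0xF0) == 0xE0) { need = 2; cp = c & 0x0F; }
        else if ((c & 0xF8) == 0xF0) { need = 3; cp = c & 0x07; }
        else
            return false;       /* stray continuation or invalid lead byte */

        if (len - i <= need)
            return false;       /* sequence is truncated */

        for (size_t k = 1; k <= need; k++)
        {
            if ((buf[i + k] & 0xC0) != 0x80)
                return false;   /* not a continuation byte */
            cp = (cp << 6) | (buf[i + k] & 0x3F);
        }

        /* Reject overlong forms, UTF-16 surrogates and out-of-range values. */
        if ((need == 1 && cp < 0x80) ||
            (need == 2 && cp < 0x800) ||
            (need == 3 && cp < 0x10000))
            return false;
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return false;
        if (cp > 0x10FFFF)
            return false;

        i += need + 1;
    }
    return true;
}
```

As an illustration (sample bytes chosen here, not taken from the report), the
EUC_JP bytes C6 FC CB DC B8 EC are rejected because FC is not a valid
continuation byte, while the UTF-8 bytes E6 97 A5 E6 9C AC E8 AA 9E for the
same three characters are accepted.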

This must be a bug.
One would expect that decoded data which is streamed in should be subject
to the client encoding.

Your assistance will be appreciated.

Regards

Hillel.

#2 Euler Taveira de Oliveira
euler@timbira.com.br
In reply to: PG Bug reporting form (#1)
Re: BUG #15230: "Logical decoding" is not sensitive to client encoding setting

2018-06-05 5:29 GMT-03:00 PG Bug reporting form <noreply@postgresql.org>:

The plugin used is the common "test_decoding", which is shipped together
with the kit.

What is the test_decoding output mode? By default, it uses textual
mode [1]. Did you set binary mode (force-binary=1)?

There is a Japanese database whose encoding is defined as "EUC_JP".
Ordinarily we process the streamed data in UTF8 client encoding, thus
maintaining a common set of general "consumer" functions.
Consequently, prior to issuing PQconnectdbParams(keywords, values, true), a
{"client_encoding","UTF8"} pair is introduced.
To be on the safe side, PQclientEncoding(pConn) and
pg_encoding_to_char(iClientEncoding) are called thereafter
to confirm that UTF8 was properly set.

client_encoding should be set in the replication connection because if
you set it later it won't be passed down to libpqwalreceiver.

[1]: https://www.postgresql.org/docs/9.4/static/logicaldecoding-output-plugin.html#LOGICALDECODING-OUTPUT-MODE

--
Euler Taveira
Timbira - http://www.timbira.com.br/
PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento

#3 Hillel Eilat
Hillel.Eilat@attunity.com
In reply to: Euler Taveira de Oliveira (#2)
RE: BUG #15230: "Logical decoding" is not sensitive to client encoding setting

Thanks.

1. As per your question: the default (textual) decoding mode is used.

2. In fact, client_encoding is set on the replication connection.
The problem is that it does not help.
The data which is streamed in is represented in the server_encoding (Japanese in this case), while we expect UTF8, which was set as client_encoding.

To be more specific, here is the essence of the C code which is used for establishing the connection via PQconnectdbParams(keywords, values, true).
This is the REPLICATION connection on which START_REPLICATION SLOT "XXXXXXX" LOGICAL LLL/SSS is executed later.
One would expect that data fetched via PQgetCopyData(...) thereafter will show up in the client_encoding representation.
But this is not the case...
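
Separately from the connection code referred to above, it may help to see
where the plugin output sits inside a buffer returned by PQgetCopyData(...).
In the streaming replication protocol, an XLogData message consists of the
byte 'w', three big-endian 64-bit fields (WAL start, WAL end, server send
time) and then the payload. The sketch below only splits such a buffer;
XLogDataView and parse_xlogdata are hypothetical names, and it performs no
encoding conversion, so the payload bytes are whatever the server sent.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 'w' + WAL start (8) + WAL end (8) + send time (8) */
#define XLOGDATA_HDR_LEN (1 + 8 + 8 + 8)

/* Hypothetical view over one XLogData message; points into the caller's buffer. */
typedef struct
{
    uint64_t    wal_start;      /* starting LSN of the WAL data */
    uint64_t    wal_end;        /* current end of WAL on the server */
    int64_t     send_time;      /* microseconds since 2000-01-01 */
    const char *payload;        /* output plugin data (not NUL-terminated) */
    size_t      payload_len;
} XLogDataView;

/* Read a big-endian (network order) 64-bit integer. */
uint64_t
read_be64(const unsigned char *p)
{
    uint64_t v = 0;

    for (int i = 0; i < 8; i++)
        v = (v << 8) | p[i];
    return v;
}

/*
 * Split a CopyData buffer into header fields and payload.
 * Returns 0 on success, -1 if the buffer is not an XLogData message
 * (for example a 'k' keepalive) or is too short.
 */
int
parse_xlogdata(const char *buf, size_t len, XLogDataView *out)
{
    const unsigned char *p = (const unsigned char *) buf;

    assert(buf != NULL && out != NULL);

    if (len < XLOGDATA_HDR_LEN || buf[0] != 'w')
        return -1;

    out->wal_start = read_be64(p + 1);
    out->wal_end = read_be64(p + 9);
    out->send_time = (int64_t) read_be64(p + 17);
    out->payload = buf + XLOGDATA_HDR_LEN;
    out->payload_len = len - XLOGDATA_HDR_LEN;
    return 0;
}
```

The payload and payload_len fields are the bytes whose encoding is in
question in this report; the header fields are binary and unaffected by any
encoding setting.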

Your clarifications will be appreciated.

Thanks
Hillel.

char *pszClientEncoding = "UTF8";   // Set client encoding

i = 0;                              // Initial array index

keywords[i] = "dbname";
values[i] = pszDbName == NULL ? "replication" : pszDbName;
i++;
keywords[i] = "replication";
values[i] = pszDbName == NULL ? "true" : "database";
i++;
keywords[i] = "fallback_application_name";
values[i] = pszProgName;
i++;

if (pszDbHost)
{
    keywords[i] = "host";
    values[i] = pszDbHost;
    i++;
}
if (pszDbUser)
{
    keywords[i] = "user";
    values[i] = pszDbUser;
    i++;
}
if (pszDbPort)
{
    keywords[i] = "port";
    values[i] = pszDbPort;
    i++;
}

if (pszClientEncoding)              // Set client encoding
{
    keywords[i] = "client_encoding";
    values[i] = pszClientEncoding;
    i++;
}

/* Prompting for password here is not a matter of interest (the "-W" command option) */
//need_password = (dbgetpassword == 1 && dbpassword == NULL);

need_password = 0;                  // No point in this mechanism here

//do
{
    if (pszDbPassword)
    {
        /* NOTE: PQconnectdbParams needs NULL-terminated arrays, so keywords[i + 1] must already be NULL here */
        keywords[i] = "password";
        values[i] = pszDecryptedPassword;
    }
    else
    {
        keywords[i] = NULL;
        values[i] = NULL;
    }

    tmpconn = PQconnectdbParams(keywords, values, true);

    if (!tmpconn)
    {
        pSetup->config.logger_error((char *)pszLoggingOrg, __LINE__, kPG_LOGGER_SEVERITY_ERROR, "PQconnectdbParams(...) - Could not connect to the server.");
        return NULL;
    }

    if (PQstatus(tmpconn) == CONNECTION_BAD && PQconnectionNeedsPassword(tmpconn) && dbgetpassword != -1)
    {
        AT_STR->snprintf(szMsg, sizeof(szMsg), "Could not connect to server. Missing or improper password: %s", ar_PQerrorMessage(tmpconn));
        pSetup->config.logger_error((char *)pszLoggingOrg, __LINE__, kPG_LOGGER_SEVERITY_ERROR, szMsg);
        ar_PQfinish(tmpconn);
        return NULL;
    }
}
//while (need_password);
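
The keyword/value bookkeeping in the fragment above can be sketched as a small
self-contained helper that keeps both arrays NULL-terminated after every
addition, which is what PQconnectdbParams expects. This is a hedged
illustration, not the reporter's code: ConnParams, conn_params_add,
conn_params_get and build_replication_params are hypothetical names, and the
sketch leaves out fallback_application_name and all logging.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_CONN_PARAMS 16

/* Hypothetical container; one extra slot is reserved for the NULL terminator. */
typedef struct
{
    const char *keywords[MAX_CONN_PARAMS + 1];
    const char *values[MAX_CONN_PARAMS + 1];
    size_t      count;
} ConnParams;

/*
 * Append one keyword/value pair and re-terminate both arrays.
 * A NULL value means "option not set" and is skipped.
 * Returns 0 on success, -1 if the arrays are full.
 */
int
conn_params_add(ConnParams *p, const char *keyword, const char *value)
{
    assert(p != NULL && keyword != NULL);

    if (value == NULL)
        return 0;
    if (p->count >= MAX_CONN_PARAMS)
        return -1;

    p->keywords[p->count] = keyword;
    p->values[p->count] = value;
    p->count++;
    p->keywords[p->count] = NULL;
    p->values[p->count] = NULL;
    return 0;
}

/* Look up the value stored for a keyword, or NULL if it is absent. */
const char *
conn_params_get(const ConnParams *p, const char *keyword)
{
    assert(p != NULL && keyword != NULL);

    for (size_t n = 0; p->keywords[n] != NULL; n++)
        if (strcmp(p->keywords[n], keyword) == 0)
            return p->values[n];
    return NULL;
}

/* Fill in the parameters for a replication connection, as in the thread. */
void
build_replication_params(ConnParams *p, const char *dbname, const char *host,
                         const char *user, const char *port,
                         const char *client_encoding, const char *password)
{
    assert(p != NULL);

    p->count = 0;
    p->keywords[0] = NULL;
    p->values[0] = NULL;

    conn_params_add(p, "dbname", dbname ? dbname : "replication");
    conn_params_add(p, "replication", dbname ? "database" : "true");
    conn_params_add(p, "host", host);
    conn_params_add(p, "user", user);
    conn_params_add(p, "port", port);
    conn_params_add(p, "client_encoding", client_encoding);
    conn_params_add(p, "password", password);
}
```

With libpq available, the resulting arrays could then be passed as
PQconnectdbParams(p.keywords, p.values, true). Whether the server applies
client_encoding to logical decoding output on such a connection is exactly
the open question of this thread; the helper only guarantees that the option
is present and the arrays are terminated.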

-----Original Message-----
From: Euler Taveira [mailto:euler@timbira.com.br]
Sent: Thursday, June 14, 2018 5:28 PM
To: Hillel Eilat <Hillel.Eilat@attunity.com>; pgsql-bugs@lists.postgresql.org
Subject: Re: BUG #15230: "Logical decoding" is not sensitive to client encoding setting

2018-06-05 5:29 GMT-03:00 PG Bug reporting form <noreply@postgresql.org>:

The plugin used is the common "test_decoding", which is shipped
together with the kit.

What is the test_decoding output mode? By default, it uses textual mode [1]. Did you set binary mode (force-binary=1)?

There is a Japanese database whose encoding is defined as "EUC_JP".
Ordinarily we process the streamed data in UTF8 client encoding, thus
maintaining a common set of general "consumer" functions.
Consequently, prior to issuing PQconnectdbParams(keywords, values,
true), a {"client_encoding","UTF8"} pair is introduced.
To be on the safe side, PQclientEncoding(pConn) and
pg_encoding_to_char(iClientEncoding) are called thereafter to
confirm that UTF8 was properly set.

client_encoding should be set in the replication connection because if you set it later it won't be passed down to libpqwalreceiver.

[1]: https://www.postgresql.org/docs/9.4/static/logicaldecoding-output-plugin.html#LOGICALDECODING-OUTPUT-MODE

--
Euler Taveira
Timbira - http://www.timbira.com.br/
PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento