BUG #7913: TO_CHAR Function & Turkish collate

Started by Adnan DURSUN | about 13 years ago | 7 messages | bugs

a_dursun@hotmail.com

about 13 years ago

The following bug has been logged on the website:

Bug reference: 7913
Logged by: TO_CHAR Function & Turkish collate
Email address: a_dursun@hotmail.com
PostgreSQL version: 9.2.0
Operating system: Linux
Description:

prod=# SELECT TO_CHAR('2013-03-01'::date,'DAY');
to_char
----------
FRİDAY
(1 row)
But it should return FRIDAY.
Our database's lc_collate is tr_TR.UTF-8 and its encoding is UTF8.

Best regards,
Adnan DURSUN
Ankara/TURKEY

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Tom Lane

tgl@sss.pgh.pa.us

about 13 years ago

In reply to: Adnan DURSUN (#1)

Re: BUG #7913: TO_CHAR Function & Turkish collate

a_dursun@hotmail.com writes:

prod=# SELECT TO_CHAR('2013-03-01'::date,'DAY');
to_char
----------
FRİDAY
(1 row)
But it should return FRIDAY.
Our database's lc_collate is tr_TR.UTF-8 and its encoding is UTF8.

It looks like the cause of this is that the result is computed as
str_toupper("Friday"), and str_toupper() applies a collation-sensitive
upcasing rule.

I think the use of str_toupper() is appropriate when processing the
locale-specific string for a TMDAY specification; but plain DAY is not
supposed to be locale-dependent, so we probably should use an ASCII-only
upcasing rule in the non-TM code path.

Anybody have an opinion on whether to back-patch such a fix? It seems
conceivable that somebody out there is relying on the current behavior.
OTOH, I believe that only Turkish UTF8 locales exhibit this behavior
(the single-byte-encoding code path in str_toupper acts differently for
historical reasons). So it's pretty inconsistent as it stands.

regards, tom lane
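
The difference between the two upcasing rules described in the message above can be modelled with a minimal Python sketch. This is only an illustration of the behavior, not PostgreSQL's C code: ascii_upper and turkish_upper are hypothetical names, and the Turkish mapping is hard-coded here rather than taken from a real locale.

```python
# Hypothetical model of the two upcasing rules; not PostgreSQL's implementation.

def ascii_upper(s: str) -> str:
    """Upcase only a-z; every other character passes through unchanged."""
    return "".join(chr(ord(c) - 32) if "a" <= c <= "z" else c for c in s)


def turkish_upper(s: str) -> str:
    """Upcase with the Turkish rule: i -> U+0130 (dotted I), U+0131 -> I."""
    return "".join(
        "\u0130" if c == "i" else "I" if c == "\u0131" else c.upper()
        for c in s
    )


day = "Friday"
print(ascii_upper(day))    # FRIDAY
print(turkish_upper(day))  # FRİDAY
```

Under this model, a fixed English string such as "Friday" keeps its expected spelling only with the ASCII-only rule, which matches the proposal to reserve locale-aware upcasing for the TM (translated) patterns.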

Peter Eisentraut

peter_e@gmx.net

about 13 years ago

In reply to: Tom Lane (#2)

Re: BUG #7913: TO_CHAR Function & Turkish collate

On Sun, 2013-03-03 at 10:42 -0500, Tom Lane wrote:

I think the use of str_toupper() is appropriate when processing the
locale-specific string for a TMDAY specification; but plain DAY is not
supposed to be locale-dependent, so we probably should use an
ASCII-only upcasing rule in the non-TM code path.

Agreed.

Anybody have an opinion on whether to back-patch such a fix?

I think it's a bug that should be backpatched.

Euler Taveira de Oliveira

euler@timbira.com

about 13 years ago

In reply to: Tom Lane (#2)

Re: BUG #7913: TO_CHAR Function & Turkish collate

On 03-03-2013 12:42, Tom Lane wrote:

Anybody have an opinion on whether to back-patch such a fix? It seems
conceivable that somebody out there is relying on the current behavior.
OTOH, I believe that only Turkish UTF8 locales exhibit this behavior
(the single-byte-encoding code path in str_toupper acts differently for
historical reasons). So it's pretty inconsistent as it stands.

Nope. I wasn't aware of the odd Turkish casing rules. Mea culpa. :(

As you suggested, s/str_toupper/pg_toupper/ in the else block (no TM) is the
right fix. I'm not aware of another locale that would break if we apply such a
change in a stable branch. Do you want me to post a fix?

--
Euler Taveira de Oliveira - Timbira http://www.timbira.com.br/
PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento

Tom Lane

tgl@sss.pgh.pa.us

about 13 years ago

In reply to: Euler Taveira de Oliveira (#4)

Re: BUG #7913: TO_CHAR Function & Turkish collate

Euler Taveira <euler@timbira.com> writes:

As you suggested, s/str_toupper/pg_toupper/ in the else block (no TM) is the
right fix. I'm not aware of another locale that would break if we apply such a
change in a stable branch. Do you want me to post a fix?

Thanks, but I have a fix mostly written already.

regards, tom lane

Tom Lane

tgl@sss.pgh.pa.us

about 13 years ago

In reply to: Peter Eisentraut (#3)

Re: BUG #7913: TO_CHAR Function & Turkish collate

Peter Eisentraut <peter_e@gmx.net> writes:

On Sun, 2013-03-03 at 10:42 -0500, Tom Lane wrote:

Anybody have an opinion on whether to back-patch such a fix?

I think it's a bug that should be backpatched.

Done. In addition to day/month names, I found that there were
case-folding hazards for timezone abbreviations ('tz' format)
and Roman numerals for numbers ('rn' format) ... though, curiously,
not for Roman numerals for months.

regards, tom lane
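
The same kind of hazard exists in the downcasing direction, which is what lowercase patterns such as 'tz' and 'rn' rely on. The Python sketch below models it under stated assumptions: ascii_lower and turkish_lower are hypothetical helpers, the Turkish mapping is hard-coded, and the sample strings are illustrative inputs rather than output captured from PostgreSQL.

```python
# Hypothetical model of ASCII-only vs. Turkish downcasing; not PostgreSQL code.

def ascii_lower(s: str) -> str:
    """Downcase only A-Z; every other character passes through unchanged."""
    return "".join(chr(ord(c) + 32) if "A" <= c <= "Z" else c for c in s)


def turkish_lower(s: str) -> str:
    """Downcase with the Turkish rule: I -> U+0131 (dotless i), U+0130 -> i."""
    return "".join(
        "\u0131" if c == "I" else "i" if c == "\u0130" else c.lower()
        for c in s
    )


roman = "VIII"  # an uppercase Roman numeral, as an 'rn'-style pattern would downcase
print(ascii_lower(roman))    # viii
print(turkish_lower(roman))  # vııı

zone = "IST"    # sample abbreviation containing a capital I
print(ascii_lower(zone))     # ist
print(turkish_lower(zone))   # ıst
```

Any fixed ASCII string containing a capital I is affected in this model, which is why an ASCII-only rule is the safer choice for non-localized output.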

Devrim GÜNDÜZ

devrim@gunduz.org

about 13 years ago

In reply to: Tom Lane (#6)

Re: BUG #7913: TO_CHAR Function & Turkish collate

Hi,

On Tue, 2013-03-05 at 13:08 -0500, Tom Lane wrote:

I think it's a bug that should be backpatched.

Done. In addition to day/month names, I found that there were
case-folding hazards for timezone abbreviations ('tz' format)
and Roman numerals for numbers ('rn' format) ... though, curiously,
not for Roman numerals for months.

Thanks!

Regards,
--
Devrim GÜNDÜZ
Principal Systems Engineer @ EnterpriseDB: http://www.enterprisedb.com
PostgreSQL Danışmanı/Consultant, Red Hat Certified Engineer
Community: devrim~PostgreSQL.org, devrim.gunduz~linux.org.tr
http://www.gunduz.org Twitter: http://twitter.com/devrimgunduz