postgreSQL UPPER Method is converting the character "µ" into "M"

Started by Sai Tejaover 2 years ago15 messagesgeneral
Jump to latest
#1Sai Teja
saitejasaichintalapudi@gmail.com

Hi team,

I am using UPPER document name for converting the text from lower case into
upper case.
But here for the below example
Command:- select UPPER(testµ)
Input :- testµ
Output:- TESTM
Expected output :- TESTµ

Here, that character is converting into M which should not be case
The postgreSQL is hosted in Linux Machine which is using Locale_ctype:-
en_US_utf.8

Would request you to please suggest any ideas to resolve this issue.

It'll would be very helpful and appreciated.

Thanks,
Sai Teja

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Sai Teja (#1)
Re: postgreSQL UPPER Method is converting the character "µ" into "M"

Sai Teja <saitejasaichintalapudi@gmail.com> writes:

I am using UPPER document name for converting the text from lower case into
upper case.
But here for the below example
Command:- select UPPER(testµ)
Input :- testµ
Output:- TESTM
Expected output :- TESTµ

According to https://en.wikipedia.org/wiki/Greek_alphabet
the upper-case form of µ (mu) is in fact M, so I'm not sure
what problem you have with this. But in any case, you are
complaining in the wrong place. This conversion is determined
by the rules of the selected locale. If you don't like it,
choose another locale, or make your own, or try to persuade
the glibc maintainers that they are wrong.

regards, tom lane

#3Sai Teja
saitejasaichintalapudi@gmail.com
In reply to: Tom Lane (#2)
Re: postgreSQL UPPER Method is converting the character "µ" into "M"

Hi Tom Lane, Thank you for your reply!

Actually we are migrating the data from DB2 to postgreSQL. So in DB2 the
upper method is converting µ as µ only but not as M.

So, while validating the data we have encountered this problem.

We want to have the data in both DB's should be similar (db2 and postgreSQL)

In my local it is windows OS and locale is English_united_states.1252 and
in local it is converting as expected ( testµ into TESTµ)

So, even I want to change locale i cannot change now. Since already db is
created. BTW this db is hosted in azure.

Would request to please suggest if any alternatives are there to resolve
this issue.

Thanks,
Sai Teja

On Wed, 6 Sep, 2023, 7:23 pm Tom Lane, <tgl@sss.pgh.pa.us> wrote:

Show quoted text

Sai Teja <saitejasaichintalapudi@gmail.com> writes:

I am using UPPER document name for converting the text from lower case

into

upper case.
But here for the below example
Command:- select UPPER(testµ)
Input :- testµ
Output:- TESTM
Expected output :- TESTµ

According to https://en.wikipedia.org/wiki/Greek_alphabet
the upper-case form of µ (mu) is in fact M, so I'm not sure
what problem you have with this. But in any case, you are
complaining in the wrong place. This conversion is determined
by the rules of the selected locale. If you don't like it,
choose another locale, or make your own, or try to persuade
the glibc maintainers that they are wrong.

regards, tom lane

#4Laurenz Albe
laurenz.albe@cybertec.at
In reply to: Sai Teja (#3)
Re: postgreSQL UPPER Method is converting the character "µ" into "M"

On Wed, 2023-09-06 at 19:34 +0530, Sai Teja wrote:

Actually we are migrating the data from DB2 to postgreSQL. So in DB2 the upper method is converting µ as µ only but not as M.

Then you are using a different collation in DB2, or "upper" (which is not standardized,
as far as I know), is defined differently there.

So, while validating the data we have encountered this problem.

We want to have the data in both DB's should be similar (db2 and postgreSQL)

In my local it is windows OS and locale is English_united_states.1252 and in local it is converting as expected ( testµ into TESTµ)

So, even I want to change locale i cannot change now. Since already db is created. BTW this db is hosted in azure.

Would request to please suggest if any alternatives are there to resolve this issue. 

Specify a collation that does what you want explicitly:

SELECT upper('testµ' COLLATE "C");

upper
═══════
TESTµ
(1 row)

But then don't complain if your "ö" and "é" are not uppercased either.

Yours,
Laurenz Albe

#5Erik Wienhold
ewie@ewie.name
In reply to: Sai Teja (#1)
Re: postgreSQL UPPER Method is converting the character "µ" into "M"

On 06/09/2023 15:42 CEST Sai Teja <saitejasaichintalapudi@gmail.com> wrote:

I am using UPPER document name for converting the text from lower case into
upper case.
But here for the below example
Command:- select UPPER(testµ)
Input :- testµ
Output:- TESTM
Expected output :- TESTµ

Here, that character is converting into M which should not be case
The postgreSQL is hosted in Linux Machine which is using Locale_ctype:- en_US_utf.8

Use can use collation C or ucs_basic if you want to uppercase ASCII only:

postgres=# SELECT upper('testµ' COLLATE "C");
upper
-------
TESTµ
(1 row)

--
Erik

#6Helmut Bender
hab@helmut-bender.de
In reply to: Sai Teja (#3)
Re: postgreSQL UPPER Method is converting the character "µ" into "M"

Am 06.09.23 um 16:04 schrieb Sai Teja:

In my local it is windows OS and locale is English_united_states.1252
and in local it is converting as expected ( testµ into TESTµ)

You wrote PostgreSQL is hosted on an utf-8 system. That's why it's
working different.

And be careful: the M isn't the uppercase latin m, but in fact the
uppercase µ (U+039C in unicode)! If you would compare it to latin
alphabet 'TESTM' it would be different.

--
Gruß Helmut

#7Francisco Olarte
folarte@peoplecall.com
In reply to: Sai Teja (#3)
Re: postgreSQL UPPER Method is converting the character "µ" into "M"

On Wed, 6 Sept 2023 at 16:40, Sai Teja <saitejasaichintalapudi@gmail.com> wrote:

Actually we are migrating the data from DB2 to postgreSQL. So in DB2 the upper method is converting µ as µ only but not as M.
So, while validating the data we have encountered this problem.

We want to have the data in both DB's should be similar (db2 and postgreSQL)

Data seems to be the same, function behaviour is what seems different.

In my local it is windows OS and locale is English_united_states.1252 and in local it is converting as expected ( testµ into TESTµ)

Maybe because unicode has GREEK CAPITAL LETTER MU ( 924, which looks
like capital M ), GREEK SMALL LETTER MU ( 956 ) and MICRO SIGN ( 181,
which looks like small mu ) while windows-1252 only has 0xB6 as micro.
OTOH Windows-1253 ( greek ansi ) has all three.

If your small mu are really micro-sign ( which is suspected if youused
1252 ) maybe changing them to that helps ( but I do not have the
resources to test that on hand ).

Francisco Olarte.

#8Sai Teja
saitejasaichintalapudi@gmail.com
In reply to: Francisco Olarte (#7)
Re: postgreSQL UPPER Method is converting the character "µ" into "M"

Thank you all for your response.
Got one more query,

When I'm trying the below command
Select UPPER('Mass')
I'm getting MASS as output .
But when I created the column with generated always constraint
For example
Alter table xyz add column xyz varchar(800) generated always as
(UPPER(content)) stored

Here content is original string coming from the table
The output is coming as 'µass" when I am selecting the data from the table

Even When I try to use collat C or ucs_basics I'm getting the same when I
select the data from the table
But when I select the data explicitly through UPPER netword like select
UPPER('Mass') then I'm getting expected output such as MASS

Would request you to please suggest the ideas to resolve this issue.

Thanks,
Sai Teja

On Wed, 6 Sep, 2023, 8:59 pm Francisco Olarte, <folarte@peoplecall.com>
wrote:

Show quoted text

On Wed, 6 Sept 2023 at 16:40, Sai Teja <saitejasaichintalapudi@gmail.com>
wrote:

Actually we are migrating the data from DB2 to postgreSQL. So in DB2 the

upper method is converting µ as µ only but not as M.

So, while validating the data we have encountered this problem.

We want to have the data in both DB's should be similar (db2 and

postgreSQL)
Data seems to be the same, function behaviour is what seems different.

In my local it is windows OS and locale is English_united_states.1252

and in local it is converting as expected ( testµ into TESTµ)

Maybe because unicode has GREEK CAPITAL LETTER MU ( 924, which looks
like capital M ), GREEK SMALL LETTER MU ( 956 ) and MICRO SIGN ( 181,
which looks like small mu ) while windows-1252 only has 0xB6 as micro.
OTOH Windows-1253 ( greek ansi ) has all three.

If your small mu are really micro-sign ( which is suspected if youused
1252 ) maybe changing them to that helps ( but I do not have the
resources to test that on hand ).

Francisco Olarte.

#9Erik Wienhold
ewie@ewie.name
In reply to: Sai Teja (#8)
Re: postgreSQL UPPER Method is converting the character "µ" into "M"

On 06/09/2023 18:04 CEST Sai Teja <saitejasaichintalapudi@gmail.com> wrote:

When I'm trying the below command
Select UPPER('Mass')
I'm getting MASS as output .
But when I created the column with generated always constraint
For example
Alter table xyz add column xyz varchar(800) generated always as (UPPER(content)) stored

Here content is original string coming from the table
The output is coming as 'µass" when I am selecting the data from the table

Please provide the statements (CREATE TABLE, INSERT, and SELECT) to reproduce
it otherwise it's hard to analyze what's going on.

Homoglyphs are one explanation if you get 'µass' from the generated column as
described. Another explanation is that you just selected a different column.

--
Erik

#10Erik Wienhold
ewie@ewie.name
In reply to: Erik Wienhold (#9)
Re: postgreSQL UPPER Method is converting the character "µ" into "M"

On 06/09/2023 18:37 CEST Erik Wienhold <ewie@ewie.name> wrote:

Homoglyphs are one explanation if you get 'µass' from the generated column as
described.

postgres=# SELECT upper('𝝻𝚊𝚜𝚜');
upper
-------
𝝻𝚊𝚜𝚜
(1 row)

The codepoints I picked are:

* MATHEMATICAL SANS-SERIF BOLD SMALL MU
* MATHEMATICAL MONOSPACE SMALL A
* MATHEMATICAL MONOSPACE SMALL S

--
Erik

#11Sai Teja
saitejasaichintalapudi@gmail.com
In reply to: Erik Wienhold (#10)
Re: postgreSQL UPPER Method is converting the character "µ" into "M"

I added one column with generated always column with UPPER CASE like below:-

Alter table table_name t add column data varchar(8000) generated always as
(UPPER(t.content)) stored

Data column is generated always constraint here

This column has many sentences for each row in which some of the characters
are in Greek language.
Like µ, ë, ä, Ä etc..
So, for the example testµ when I choose
1. Select UPPER('testµ')
Output :- TESTM

But as per mail conversation I have used COLLATE ucs_basic like
2. Select UPPER('testµ' collate "ucs_basic")
Output :- TESTµ (which is correct)

3. SELECT UPPER('Mass' collate "ucs_basic")
Output :- MASS (which is correct)

4. Select data from table (here data is the column which is created with
generated always column like mentioned above)

For some of the rows which contains Greek characters I'm getting wrong
output.

For ex:- for the word 'MASS' I'm getting 'µASS' when I select the data from
the table

Summary:- I'm getting wrong output when I use upper keyword with collation
for the table
But when I explicitly call upper keyword with collation like mentioned in
above I'm getting the results as expected.

Even I tried to add collation in the column itself but it didn't worked.

Alter table table_name t add column data varchar(8000) generated always as
(UPPER(t.content, collation "ucs_basic")) stored
Or
Alter table table_name t add column data varchar(8000) generated always as
(UPPER(t.content) collation "ucs_basic") stored

Both didn't worked. As I got wrong output when I selected the data from the
table.

On Wed, 6 Sep, 2023, 10:18 pm Erik Wienhold, <ewie@ewie.name> wrote:

Show quoted text

On 06/09/2023 18:37 CEST Erik Wienhold <ewie@ewie.name> wrote:

Homoglyphs are one explanation if you get 'µass' from the generated

column as

described.

postgres=# SELECT upper('𝝻𝚊𝚜𝚜');
upper
-------
𝝻𝚊𝚜𝚜
(1 row)

The codepoints I picked are:

* MATHEMATICAL SANS-SERIF BOLD SMALL MU
* MATHEMATICAL MONOSPACE SMALL A
* MATHEMATICAL MONOSPACE SMALL S

--
Erik

In reply to: Sai Teja (#3)
Re: postgreSQL UPPER Method is converting the character "µ" into "M"

On Wed, 6 Sep 2023 19:34:40 +0530
Sai Teja <saitejasaichintalapudi@gmail.com> wrote:

In my local it is windows OS and locale is English_united_states.1252 and
in local it is converting as expected ( testµ into TESTµ)

I'm not familiar with PostgreSQL under Windows. Maybe collation from locale
"English_united_states.1252" is available in PostgreSQL? Could you try creating
a database/table/column using this collation?

Or just look at available collations using:

select collname from pg_collation

++

#13Joe Conway
mail@joeconway.com
In reply to: Helmut Bender (#6)
Re: postgreSQL UPPER Method is converting the character "µ" into "M"

On 9/6/23 11:22, Helmut Bender wrote:

Am 06.09.23 um 16:04 schrieb Sai Teja:

In my local it is windows OS and locale is English_united_states.1252
and in local it is converting as expected ( testµ into TESTµ)

You wrote PostgreSQL is hosted on an utf-8 system. That's why it's
working different.

And be careful: the M isn't the uppercase latin m, but in fact the
uppercase µ (U+039C in unicode)! If you would compare it to latin
alphabet 'TESTM' it would be different.

Yep, there are interactions between the encoding used by your terminal,
the server encoding, and the client encoding.

You have to be particularly careful when cutting and pasting that the
client encoding and your terminal encoding match or you can get
surprising results.

--
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

#14Peter J. Holzer
hjp-pgsql@hjp.at
In reply to: Francisco Olarte (#7)
Re: postgreSQL UPPER Method is converting the character "µ" into "M"

On 2023-09-06 17:29:21 +0200, Francisco Olarte wrote:

On Wed, 6 Sept 2023 at 16:40, Sai Teja <saitejasaichintalapudi@gmail.com> wrote:

Actually we are migrating the data from DB2 to postgreSQL. So in DB2
the upper method is converting µ as µ only but not as M.
So, while validating the data we have encountered this problem.

[...]

Maybe because unicode has GREEK CAPITAL LETTER MU ( 924, which looks
like capital M ), GREEK SMALL LETTER MU ( 956 ) and MICRO SIGN ( 181,
which looks like small mu ) while windows-1252 only has 0xB6 as micro.
OTOH Windows-1253 ( greek ansi ) has all three.

If your small mu are really micro-sign ( which is suspected if youused
1252 ) maybe changing them to that helps ( but I do not have the
resources to test that on hand ).

Nope, it doesn't:

hjp=> select chr(181) as micro, chr(956) as mu;
╔═══════╤════╗
║ micro │ mu ║
╟───────┼────╢
║ µ │ μ ║
╚═══════╧════╝
(1 row)

hjp=> select upper(chr(181)) as micro, upper(chr(956)) as mu;
╔═══════╤════╗
║ micro │ mu ║
╟───────┼────╢
║ Μ │ Μ ║
╚═══════╧════╝
(1 row)

(At least not on an Ubuntu 22.04 system using the en_US.UTF-8 locale).

IMHO uppercasing MICRO SIGN doesn't make much sense, but that was the
decision that either the libc maintainers ore the Unicode committee
made.

hp

--
_ | Peter J. Holzer | Story must make more sense than reality.
|_|_) | |
| | | hjp@hjp.at | -- Charles Stross, "Creative writing
__/ | http://www.hjp.at/ | challenge!"

#15Peter J. Holzer
hjp-pgsql@hjp.at
In reply to: Sai Teja (#11)
Re: postgreSQL UPPER Method is converting the character "µ" into "M"

On 2023-09-06 22:39:42 +0530, Sai Teja wrote:

This column has many sentences for each row in which some of the
characters are in Greek language.
Like µ, ë, ä, Ä etc..

Except for the µ[1]and even that's debatable, since it's a micro sign, not a mu., those ̣aren't Greek letters.

hp

[1]: and even that's debatable, since it's a micro sign, not a mu.

--
_ | Peter J. Holzer | Story must make more sense than reality.
|_|_) | |
| | | hjp@hjp.at | -- Charles Stross, "Creative writing
__/ | http://www.hjp.at/ | challenge!"