postgreSQL UPPER Method is converting the character "µ" into "M"
Hi team,
I am using UPPER document name for converting the text from lower case into
upper case.
But here for the below example
Command:- select UPPER(testµ)
Input :- testµ
Output:- TESTM
Expected output :- TESTµ
Here, that character is converting into M which should not be case
The postgreSQL is hosted in Linux Machine which is using Locale_ctype:-
en_US_utf.8
Would request you to please suggest any ideas to resolve this issue.
It'll would be very helpful and appreciated.
Thanks,
Sai Teja
Sai Teja <saitejasaichintalapudi@gmail.com> writes:
I am using UPPER document name for converting the text from lower case into
upper case.
But here for the below example
Command:- select UPPER(testµ)
Input :- testµ
Output:- TESTM
Expected output :- TESTµ
According to https://en.wikipedia.org/wiki/Greek_alphabet
the upper-case form of µ (mu) is in fact M, so I'm not sure
what problem you have with this. But in any case, you are
complaining in the wrong place. This conversion is determined
by the rules of the selected locale. If you don't like it,
choose another locale, or make your own, or try to persuade
the glibc maintainers that they are wrong.
regards, tom lane
Hi Tom Lane, Thank you for your reply!
Actually we are migrating the data from DB2 to postgreSQL. So in DB2 the
upper method is converting µ as µ only but not as M.
So, while validating the data we have encountered this problem.
We want to have the data in both DB's should be similar (db2 and postgreSQL)
In my local it is windows OS and locale is English_united_states.1252 and
in local it is converting as expected ( testµ into TESTµ)
So, even I want to change locale i cannot change now. Since already db is
created. BTW this db is hosted in azure.
Would request to please suggest if any alternatives are there to resolve
this issue.
Thanks,
Sai Teja
On Wed, 6 Sep, 2023, 7:23 pm Tom Lane, <tgl@sss.pgh.pa.us> wrote:
Show quoted text
Sai Teja <saitejasaichintalapudi@gmail.com> writes:
I am using UPPER document name for converting the text from lower case
into
upper case.
But here for the below example
Command:- select UPPER(testµ)
Input :- testµ
Output:- TESTM
Expected output :- TESTµAccording to https://en.wikipedia.org/wiki/Greek_alphabet
the upper-case form of µ (mu) is in fact M, so I'm not sure
what problem you have with this. But in any case, you are
complaining in the wrong place. This conversion is determined
by the rules of the selected locale. If you don't like it,
choose another locale, or make your own, or try to persuade
the glibc maintainers that they are wrong.regards, tom lane
On Wed, 2023-09-06 at 19:34 +0530, Sai Teja wrote:
Actually we are migrating the data from DB2 to postgreSQL. So in DB2 the upper method is converting µ as µ only but not as M.
Then you are using a different collation in DB2, or "upper" (which is not standardized,
as far as I know), is defined differently there.
So, while validating the data we have encountered this problem.
We want to have the data in both DB's should be similar (db2 and postgreSQL)
In my local it is windows OS and locale is English_united_states.1252 and in local it is converting as expected ( testµ into TESTµ)
So, even I want to change locale i cannot change now. Since already db is created. BTW this db is hosted in azure.
Would request to please suggest if any alternatives are there to resolve this issue.
Specify a collation that does what you want explicitly:
SELECT upper('testµ' COLLATE "C");
upper
═══════
TESTµ
(1 row)
But then don't complain if your "ö" and "é" are not uppercased either.
Yours,
Laurenz Albe
On 06/09/2023 15:42 CEST Sai Teja <saitejasaichintalapudi@gmail.com> wrote:
I am using UPPER document name for converting the text from lower case into
upper case.
But here for the below example
Command:- select UPPER(testµ)
Input :- testµ
Output:- TESTM
Expected output :- TESTµHere, that character is converting into M which should not be case
The postgreSQL is hosted in Linux Machine which is using Locale_ctype:- en_US_utf.8
Use can use collation C or ucs_basic if you want to uppercase ASCII only:
postgres=# SELECT upper('testµ' COLLATE "C");
upper
-------
TESTµ
(1 row)
--
Erik
Am 06.09.23 um 16:04 schrieb Sai Teja:
In my local it is windows OS and locale is English_united_states.1252
and in local it is converting as expected ( testµ into TESTµ)
You wrote PostgreSQL is hosted on an utf-8 system. That's why it's
working different.
And be careful: the M isn't the uppercase latin m, but in fact the
uppercase µ (U+039C in unicode)! If you would compare it to latin
alphabet 'TESTM' it would be different.
--
Gruß Helmut
On Wed, 6 Sept 2023 at 16:40, Sai Teja <saitejasaichintalapudi@gmail.com> wrote:
Actually we are migrating the data from DB2 to postgreSQL. So in DB2 the upper method is converting µ as µ only but not as M.
So, while validating the data we have encountered this problem.We want to have the data in both DB's should be similar (db2 and postgreSQL)
Data seems to be the same, function behaviour is what seems different.
In my local it is windows OS and locale is English_united_states.1252 and in local it is converting as expected ( testµ into TESTµ)
Maybe because unicode has GREEK CAPITAL LETTER MU ( 924, which looks
like capital M ), GREEK SMALL LETTER MU ( 956 ) and MICRO SIGN ( 181,
which looks like small mu ) while windows-1252 only has 0xB6 as micro.
OTOH Windows-1253 ( greek ansi ) has all three.
If your small mu are really micro-sign ( which is suspected if youused
1252 ) maybe changing them to that helps ( but I do not have the
resources to test that on hand ).
Francisco Olarte.
Thank you all for your response.
Got one more query,
When I'm trying the below command
Select UPPER('Mass')
I'm getting MASS as output .
But when I created the column with generated always constraint
For example
Alter table xyz add column xyz varchar(800) generated always as
(UPPER(content)) stored
Here content is original string coming from the table
The output is coming as 'µass" when I am selecting the data from the table
Even When I try to use collat C or ucs_basics I'm getting the same when I
select the data from the table
But when I select the data explicitly through UPPER netword like select
UPPER('Mass') then I'm getting expected output such as MASS
Would request you to please suggest the ideas to resolve this issue.
Thanks,
Sai Teja
On Wed, 6 Sep, 2023, 8:59 pm Francisco Olarte, <folarte@peoplecall.com>
wrote:
Show quoted text
On Wed, 6 Sept 2023 at 16:40, Sai Teja <saitejasaichintalapudi@gmail.com>
wrote:Actually we are migrating the data from DB2 to postgreSQL. So in DB2 the
upper method is converting µ as µ only but not as M.
So, while validating the data we have encountered this problem.
We want to have the data in both DB's should be similar (db2 and
postgreSQL)
Data seems to be the same, function behaviour is what seems different.In my local it is windows OS and locale is English_united_states.1252
and in local it is converting as expected ( testµ into TESTµ)
Maybe because unicode has GREEK CAPITAL LETTER MU ( 924, which looks
like capital M ), GREEK SMALL LETTER MU ( 956 ) and MICRO SIGN ( 181,
which looks like small mu ) while windows-1252 only has 0xB6 as micro.
OTOH Windows-1253 ( greek ansi ) has all three.If your small mu are really micro-sign ( which is suspected if youused
1252 ) maybe changing them to that helps ( but I do not have the
resources to test that on hand ).Francisco Olarte.
On 06/09/2023 18:04 CEST Sai Teja <saitejasaichintalapudi@gmail.com> wrote:
When I'm trying the below command
Select UPPER('Mass')
I'm getting MASS as output .
But when I created the column with generated always constraint
For example
Alter table xyz add column xyz varchar(800) generated always as (UPPER(content)) storedHere content is original string coming from the table
The output is coming as 'µass" when I am selecting the data from the table
Please provide the statements (CREATE TABLE, INSERT, and SELECT) to reproduce
it otherwise it's hard to analyze what's going on.
Homoglyphs are one explanation if you get 'µass' from the generated column as
described. Another explanation is that you just selected a different column.
--
Erik
On 06/09/2023 18:37 CEST Erik Wienhold <ewie@ewie.name> wrote:
Homoglyphs are one explanation if you get 'µass' from the generated column as
described.
postgres=# SELECT upper('𝝻𝚊𝚜𝚜');
upper
-------
𝝻𝚊𝚜𝚜
(1 row)
The codepoints I picked are:
* MATHEMATICAL SANS-SERIF BOLD SMALL MU
* MATHEMATICAL MONOSPACE SMALL A
* MATHEMATICAL MONOSPACE SMALL S
--
Erik
I added one column with generated always column with UPPER CASE like below:-
Alter table table_name t add column data varchar(8000) generated always as
(UPPER(t.content)) stored
Data column is generated always constraint here
This column has many sentences for each row in which some of the characters
are in Greek language.
Like µ, ë, ä, Ä etc..
So, for the example testµ when I choose
1. Select UPPER('testµ')
Output :- TESTM
But as per mail conversation I have used COLLATE ucs_basic like
2. Select UPPER('testµ' collate "ucs_basic")
Output :- TESTµ (which is correct)
3. SELECT UPPER('Mass' collate "ucs_basic")
Output :- MASS (which is correct)
4. Select data from table (here data is the column which is created with
generated always column like mentioned above)
For some of the rows which contains Greek characters I'm getting wrong
output.
For ex:- for the word 'MASS' I'm getting 'µASS' when I select the data from
the table
Summary:- I'm getting wrong output when I use upper keyword with collation
for the table
But when I explicitly call upper keyword with collation like mentioned in
above I'm getting the results as expected.
Even I tried to add collation in the column itself but it didn't worked.
Alter table table_name t add column data varchar(8000) generated always as
(UPPER(t.content, collation "ucs_basic")) stored
Or
Alter table table_name t add column data varchar(8000) generated always as
(UPPER(t.content) collation "ucs_basic") stored
Both didn't worked. As I got wrong output when I selected the data from the
table.
On Wed, 6 Sep, 2023, 10:18 pm Erik Wienhold, <ewie@ewie.name> wrote:
Show quoted text
On 06/09/2023 18:37 CEST Erik Wienhold <ewie@ewie.name> wrote:
Homoglyphs are one explanation if you get 'µass' from the generated
column as
described.
postgres=# SELECT upper('𝝻𝚊𝚜𝚜');
upper
-------
𝝻𝚊𝚜𝚜
(1 row)The codepoints I picked are:
* MATHEMATICAL SANS-SERIF BOLD SMALL MU
* MATHEMATICAL MONOSPACE SMALL A
* MATHEMATICAL MONOSPACE SMALL S--
Erik
On Wed, 6 Sep 2023 19:34:40 +0530
Sai Teja <saitejasaichintalapudi@gmail.com> wrote:
In my local it is windows OS and locale is English_united_states.1252 and
in local it is converting as expected ( testµ into TESTµ)
I'm not familiar with PostgreSQL under Windows. Maybe collation from locale
"English_united_states.1252" is available in PostgreSQL? Could you try creating
a database/table/column using this collation?
Or just look at available collations using:
select collname from pg_collation
++
On 9/6/23 11:22, Helmut Bender wrote:
Am 06.09.23 um 16:04 schrieb Sai Teja:
In my local it is windows OS and locale is English_united_states.1252
and in local it is converting as expected ( testµ into TESTµ)You wrote PostgreSQL is hosted on an utf-8 system. That's why it's
working different.And be careful: the M isn't the uppercase latin m, but in fact the
uppercase µ (U+039C in unicode)! If you would compare it to latin
alphabet 'TESTM' it would be different.
Yep, there are interactions between the encoding used by your terminal,
the server encoding, and the client encoding.
You have to be particularly careful when cutting and pasting that the
client encoding and your terminal encoding match or you can get
surprising results.
--
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
On 2023-09-06 17:29:21 +0200, Francisco Olarte wrote:
On Wed, 6 Sept 2023 at 16:40, Sai Teja <saitejasaichintalapudi@gmail.com> wrote:
Actually we are migrating the data from DB2 to postgreSQL. So in DB2
the upper method is converting µ as µ only but not as M.
So, while validating the data we have encountered this problem.
[...]
Maybe because unicode has GREEK CAPITAL LETTER MU ( 924, which looks
like capital M ), GREEK SMALL LETTER MU ( 956 ) and MICRO SIGN ( 181,
which looks like small mu ) while windows-1252 only has 0xB6 as micro.
OTOH Windows-1253 ( greek ansi ) has all three.If your small mu are really micro-sign ( which is suspected if youused
1252 ) maybe changing them to that helps ( but I do not have the
resources to test that on hand ).
Nope, it doesn't:
hjp=> select chr(181) as micro, chr(956) as mu;
╔═══════╤════╗
║ micro │ mu ║
╟───────┼────╢
║ µ │ μ ║
╚═══════╧════╝
(1 row)
hjp=> select upper(chr(181)) as micro, upper(chr(956)) as mu;
╔═══════╤════╗
║ micro │ mu ║
╟───────┼────╢
║ Μ │ Μ ║
╚═══════╧════╝
(1 row)
(At least not on an Ubuntu 22.04 system using the en_US.UTF-8 locale).
IMHO uppercasing MICRO SIGN doesn't make much sense, but that was the
decision that either the libc maintainers ore the Unicode committee
made.
hp
--
_ | Peter J. Holzer | Story must make more sense than reality.
|_|_) | |
| | | hjp@hjp.at | -- Charles Stross, "Creative writing
__/ | http://www.hjp.at/ | challenge!"
On 2023-09-06 22:39:42 +0530, Sai Teja wrote:
This column has many sentences for each row in which some of the
characters are in Greek language.
Like µ, ë, ä, Ä etc..
Except for the µ[1]and even that's debatable, since it's a micro sign, not a mu., those ̣aren't Greek letters.
hp
[1]: and even that's debatable, since it's a micro sign, not a mu.
--
_ | Peter J. Holzer | Story must make more sense than reality.
|_|_) | |
| | | hjp@hjp.at | -- Charles Stross, "Creative writing
__/ | http://www.hjp.at/ | challenge!"