unexpected character used as group separator by to_char
Hi,
I'm having a problem with to_char() inserting unexpected characters as group separator.
For the record below, I wish to display 'credit' properly formated in an html form, using to_char(credit, 'FM999G990D00')
SELECT *, to_char(credit, 'FM999G990D00') as yo from tblprospect_balance where id_item = 33;
id_prospect | id_item | date_item | libelle | debit | credit | lettrage | id_facture | yo
-------------+---------+------------+---------+-------+---------+----------+------------+----------
385 | 33 | 2021-03-09 | yo man | 0.00 | 2345.10 | | 8 | 2 345,10
The numbers are properly displayed in the html form. However, updating the record requires the user to manually delete the space in '2 345,10', otherwise the last digit is lost. Typing numbers including a group separator using the space bar works fine.
I exported the record to a text file and inspected it with emacs in hexadecimal mode :
3338 3509 3333 0932 3032 312d 3033 2d30 385.33.2021-03-0
00000010: 3909 796f 206d 616e 0930 2e30 3009 3233 9.yo man.0.00.23
00000020: 3435 2e31 3009 0938 0932 e280 af33 3435 45.10..8.2...345
00000030: 2c31 300a ,10.
As you can see, the space in field 'libelle' (yo man) is different from the one in field 'yo' (2...345,10)
The difference is also apparent in a link built using the record's fields as parameters :
/base/prospect?id_prospect=385&balance=0&update_item=0&id_item=33&id_facture=8&date_item=2021-03-09&libelle=yo%20man&debit=0,00&credit=2%E2%80%AF345%2C10&lettrage=
^^^^^^^^^^^^^ ^^^^^^^^^^^^
What can I do to get a standard space as group separator for my numbers?
#### system information ########
I use postgresql 11.9 on Debian buster
My settings are :
show lc_numeric;
lc_numeric
-------------
fr_FR.UTF-8
locale
LANG=C.UTF-8
LANGUAGE=
LC_CTYPE="C.UTF-8"
LC_NUMERIC="C.UTF-8"
LC_TIME="C.UTF-8"
LC_COLLATE="C.UTF-8"
LC_MONETARY="C.UTF-8"
LC_MESSAGES="C.UTF-8"
LC_PAPER="C.UTF-8"
LC_NAME="C.UTF-8"
LC_ADDRESS="C.UTF-8"
LC_TELEPHONE="C.UTF-8"
LC_MEASUREMENT="C.UTF-8"
LC_IDENTIFICATION="C.UTF-8"
LC_ALL=
\d tblprospect_balance
Table "public.tblprospect_balance"
Column | Type | Collation | Nullable | Default
-------------+---------------+-----------+----------+------------------------------------------------------
id_prospect | integer | | not null |
id_item | integer | | not null | nextval('tblprospect_balance_id_item_seq'::regclass)
date_item | date | | not null | CURRENT_DATE
libelle | text | | |
debit | numeric(10,2) | | not null | 0
credit | numeric(10,2) | | not null | 0
lettrage | text | | |
id_facture | integer | | not null | 0
--
Bien à vous, Vincent Veyron
https://marica.fr/
Logiciel de gestion des sinistres assurances, des dossiers contentieux et des contrats pour le service juridique
On Tue, Mar 9, 2021 at 1:28 PM Vincent Veyron <vv.lists@wanadoo.fr> wrote:
For the record below, I wish to display 'credit' properly formated in an
html form, using to_char(credit, 'FM999G990D00')^^^^^^^^^^^^^ ^^^^^^^^^^^^
What can I do to get a standard space as group separator for my numbers?
By using "G" you are giving up control and letting your locale settings
decide what gets output. You can continue to use to_char but take control
back by being explicit, or pass actual numbers into and out of the database
and let your front-end deal with presentation concerns. I suggest the
latter.
David J.
On Tue, 9 Mar 2021 13:33:19 -0700
"David G. Johnston" <david.g.johnston@gmail.com> wrote:
By using "G" you are giving up control and letting your locale settings
decide what gets output. You can continue to use to_char but take control
back by being explicit, or pass actual numbers into and out of the database
and let your front-end deal with presentation concerns. I suggest the
latter.
Well, that was a fast answer :-)
Indeed, replacing "G" with \s works. However, the application is destined to be multi-lingual, and number formats vary widely between countries :
1,000.25 (US)
1'000,25 (CH)
1 000,25 (FR)
1.000,25 (GER)
etc...
So, I intended to follow this advice by Tom Lane :
https://www.postgresql-archive.org/GENERAL-setting-LC-NUMERIC-td1857521.html
Having a database with the proper lc_numeric setting for each country, and using to_char/to_number to manipulate numbers is much more appealing than writing my own parser in my front end. But this weird space is getting in my way.
--
Bien à vous, Vincent Veyron
https://marica.fr/
Logiciel de gestion des sinistres assurances, des dossiers contentieux et des contrats pour le service juridique
Vincent Veyron <vv.lists@wanadoo.fr> writes:
Having a database with the proper lc_numeric setting for each country, and using to_char/to_number to manipulate numbers is much more appealing than writing my own parser in my front end. But this weird space is getting in my way.
The point here is that 'G' and related format codes act as specified by
your LC_MONETARY locale. If you don't like the results, you need to use
a different locale.
(I suppose you could also use regexp_replace to convert random forms
of whitespace to plain ASCII space.)
regards, tom lane
On 3/9/21 2:22 PM, Tom Lane wrote:
Vincent Veyron <vv.lists@wanadoo.fr> writes:
Having a database with the proper lc_numeric setting for each country, and using to_char/to_number to manipulate numbers is much more appealing than writing my own parser in my front end. But this weird space is getting in my way.
The point here is that 'G' and related format codes act as specified by
your LC_MONETARY locale. If you don't like the results, you need to use
a different locale.(I suppose you could also use regexp_replace to convert random forms
of whitespace to plain ASCII space.)regards, tom lane
I'm in David's camp on this one: there are plenty of currency format
conversion libraries from which to choose.
rjs
On Tue, 09 Mar 2021 16:22:07 -0500
Tom Lane <tgl@sss.pgh.pa.us> wrote:
The point here is that 'G' and related format codes act as specified by
your LC_MONETARY locale. If you don't like the results, you need to use
a different locale.
This is a numeric(10,2) type field though. I tried casting it to money type, with lc_monetary set to "fr_FR.UTF-8", same weird space
(I suppose you could also use regexp_replace to convert random forms
of whitespace to plain ASCII space.)
No dice. 'G' formatting looks like a whitespace, but is different (it appears to be slightly narrower when displayed in html, too) :
select regexp_replace(to_char(1234.56, 'FM999 990D00'), E'[\\s]', 'x');
regexp_replace
----------------
1x234,56
(1 row)
select regexp_replace(to_char(1234.56, 'FM999G990D00'), E'[\\s]', 'x');
regexp_replace
----------------
1 234,56
(1 row)
On 3/9/21 4:03 PM, Vincent Veyron wrote:
No dice. 'G' formatting looks like a whitespace, but is different (it appears to be slightly narrower when displayed in html, too) :
An "n-space", no?
Vincent Veyron <vv.lists@wanadoo.fr> writes:
Tom Lane <tgl@sss.pgh.pa.us> wrote:
(I suppose you could also use regexp_replace to convert random forms
of whitespace to plain ASCII space.)
No dice. 'G' formatting looks like a whitespace, but is different (it appears to be slightly narrower when displayed in html, too) :
Huh ... do you have lc_monetary and lc_ctype set the same? The
latter is (I believe) what determines the behavior of regex \s.
In any case, you could force the issue with a pattern like '[\s\uNNNN]'
for whatever the code point of that character is.
regards, tom lane
On 2021-Mar-10, Vincent Veyron wrote:
On Tue, 09 Mar 2021 16:22:07 -0500
Tom Lane <tgl@sss.pgh.pa.us> wrote:
(I suppose you could also use regexp_replace to convert random forms
of whitespace to plain ASCII space.)No dice. 'G' formatting looks like a whitespace, but is different (it
appears to be slightly narrower when displayed in html, too) :
That space (0xe280af) is U+202F, which appears to be used for French and
Mongolian languages (exclusively?). It is quite possible that in the
future some other language will end up using some different whitespace
character, possibly breaking any code you write today -- the use of
U+202F appears to be quite recent.
Maybe it'd be better to have the query return two fields: one destined
for printing, the other you can use to feed the database back.
--
�lvaro Herrera Valdivia, Chile
"Porque Kim no hac�a nada, pero, eso s�,
con extraordinario �xito" ("Kim", Kipling)
On Tue, 2021-03-09 at 21:28 +0100, Vincent Veyron wrote:
I'm having a problem with to_char() inserting unexpected characters as group separator.
For the record below, I wish to display 'credit' properly formated in an html form, using to_char(credit, 'FM999G990D00')
SELECT *, to_char(credit, 'FM999G990D00') as yo from tblprospect_balance where id_item = 33;
id_prospect | id_item | date_item | libelle | debit | credit | lettrage | id_facture | yo
-------------+---------+------------+---------+-------+---------+----------+------------+----------
385 | 33 | 2021-03-09 | yo man | 0.00 | 2345.10 | | 8 | 2 345,10The numbers are properly displayed in the html form. However, updating the record requires
the user to manually delete the space in '2 345,10', otherwise the last digit is lost.
Typing numbers including a group separator using the space bar works fine.I exported the record to a text file and inspected it with emacs in hexadecimal mode :
3338 3509 3333 0932 3032 312d 3033 2d30 385.33.2021-03-0
00000010: 3909 796f 206d 616e 0930 2e30 3009 3233 9.yo man.0.00.23
00000020: 3435 2e31 3009 0938 0932 e280 af33 3435 45.10..8.2...345
00000030: 2c31 300a ,10.As you can see, the space in field 'libelle' (yo man) is different from the one in field 'yo' (2...345,10)
What can I do to get a standard space as group separator for my numbers?
#### system information ########
I use postgresql 11.9 on Debian busterMy settings are :
show lc_numeric;
lc_numeric
-------------
fr_FR.UTF-8
That "space" is UNICODE U+202F ("Narrow No-Break Space"), and that's what your
C library thinks to be the correct group separator for the French language.
You can either replace the character:
SELECT replace(to_char(2345.10, 'FM999G990D00'), E'\u202F', ' ');
or you can choose a different locale that uses a different group separator
(not sure if that exists).
Yours,
Laurenz Albe
--
Cybertec | https://www.cybertec-postgresql.com
On 10 Mar 2021, at 16:24, Alvaro Herrera wrote:
That space (0xe280af) is U+202F, which appears to be used for French
and
Mongolian languages (exclusively?). It is quite possible that in the
future some other language will end up using some different whitespace
character, possibly breaking any code you write today -- the use of
U+202F appears to be quite recent.
Drifting off topic a little. That a proper code point for things that
will benefit from the whitespace but should still stay together.
Also it’s not that new, added in 1999 —
https://codepoints.net/U+202F
The other use case is between the number and its ISO symbol e.g.,
20 °C — and the non-breaking form is important here
As for who uses a thin space in money… it is a normal in European
notation — https://www.languageediting.com/format-numbers-eu-vs-us/
And the thin space is part of the international standard for breaking up
large numbers (from 1948), specifically no dots or commas should be used
in this role. The dot or comma is only to be used for the decimal point!
All of which just highlights problems in localisation and begs the
question whether there is an ISO locale setting — it should work
everywhere ;)
Gavan Schneider
——
Gavan Schneider, Sodwalls, NSW, Australia
Explanations exist; they have existed for all time; there is always a
well-known solution to every human problem — neat, plausible, and
wrong.
— H. L. Mencken, 1920
On Wed, 10 Mar 2021 02:24:57 -0300
Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
That space (0xe280af) is U+202F, which appears to be used for French and
Mongolian languages (exclusively?)
On Tue, 09 Mar 2021 18:57:05 -0500
Tom Lane <tgl@sss.pgh.pa.us> wrote:
In any case, you could force the issue with a pattern like '[\s\uNNNN]'
for whatever the code point of that character is.
On Wed, 10 Mar 2021 09:41:19 +0100
Laurenz Albe <laurenz.albe@cybertec.at> wrote:
SELECT replace(to_char(2345.10, 'FM999G990D00'), E'\u202F', ' ');
Thank you all for the helpful answers, I can work with that.
Strange that to_char() and to_number() would use a different separator, though :
select to_number(to_char(1234.56, 'FM999G990D00'), 'FM999G990D00');
to_number
-----------
1234.5
select to_number(replace(to_char(1234.56, 'FM999G990D00'),E'\u202F', ' '), 'FM999G990D00');
to_number
-----------
1234.56
--
Bien à vous, Vincent Veyron
https://compta.libremen.com
Logiciel libre de comptabilité générale en partie double
On 2021-Mar-10, Gavan Schneider wrote:
On 10 Mar 2021, at 16:24, Alvaro Herrera wrote:
That space (0xe280af) is U+202F, which appears to be used for French and
Mongolian languages (exclusively?). It is quite possible that in the
future some other language will end up using some different whitespace
character, possibly breaking any code you write today -- the use of
U+202F appears to be quite recent.Drifting off topic a little. That a proper code point for things that will
benefit from the whitespace but should still stay together.
Also it’s not that new, added in 1999 — https://codepoints.net/U+202F
I probably got misled on this whole thing by these change proposals.
https://www.unicode.org/L2/L2019/19116-clarify-nnbsp.pdf
https://www.unicode.org/L2/L2020/20008-core-text.pdf
Apparently prior to this, they (?) had been using/recommending
THIN SPACE U+2009 as separator, which is not non-breaking.
Anyway, it reinforces my point that it's not impossible that some other
locale definition could use U+2009 when printing numbers, or even some
other kind of spacing entity in non-Latin languages etc. So I think
that for truly robust handling you should separate the thing you use for
display from the thing you use to talk to the database.
And the thin space is part of the international standard for breaking up
large numbers (from 1948), specifically no dots or commas should be used in
this role. The dot or comma is only to be used for the decimal point!
Interesting U+2014 EM DASH I didn't know this.
--
Álvaro Herrera Valdivia, Chile
"This is a foot just waiting to be shot" (Andrew Dunstan)