characters converted to ??? in postgres

Started by armand pirvuabout 8 years ago5 messagesgeneral
Jump to latest
#1armand pirvu
armand.pirvu@gmail.com

Hi all,

Got the following thing : ≠, ≤, and ≥ store in the database as question marks according to one of my developers.

I have postgres installed on both MAC OS X and Centos 7

All locale on both point to UTF8

LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=

insert into jt1 values ('≤') ;

This I can run either copy/paste which is case 1 (which does reproduce the developer issue) , or have it in an sql script which case 2

On OS X:
- case 1 fails
testdb=# insert into jt1 values ('??') ;
ERROR: invalid byte sequence for encoding "UTF8": 0xe2 0xa4 0x27
Note that at paste time ≤ changed in ??
- case 2 is fine
- echo -n '≤' |hexdump -C
00000000 e2 89 a4 |...|
00000003

On Centos:
- Both cases are fine
- echo -n '≤' |hexdump -C
00000000 e2 89 a4 |...|
00000003

http://www.fileformat.info/info/unicode/char/2264/index.htm
UTF-8 (hex) 0xE2 0x89 0xA4 (e289a4)

So to me the representation is fine in all cases. Also in all cases my encoding is UTF8.
I am trying to understand in OS X where does the change occur ? What is causing the failure ?
In the bigger picture a developer complained about this failure and I am fairly sure this is not a postgres issue but I need to prove it

Many thanks for help

-- Armand

#2Peter Eisentraut
peter_e@gmx.net
In reply to: armand pirvu (#1)
Re: characters converted to ??? in postgres

On 1/11/18 16:34, armand pirvu wrote:

On OS X:
- case 1 fails
testdb=# insert into jt1 values ('??') ;
ERROR: invalid byte sequence for encoding "UTF8": 0xe2 0xa4 0x27
Note that at paste time ≤ changed in ??

This looks like something is wrong with your libedit library. Try
running psql with the -n option. If that helps, then look into building
psql with libreadline instead. Because libedit is terrible.

- case 2 is fine
- echo -n '≤' |hexdump -C
00000000 e2 89 a4 |...|
00000003

--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#3armand pirvu
armand.pirvu@gmail.com
In reply to: Peter Eisentraut (#2)
Re: characters converted to ??? in postgres

Hi Peter

The -n flag worked fine on OS X. I don’t have this issue on Centos. As a side question I wonder why was postgres built with libedit instead of libreadline , just curious.
Back to my developer issue , he is using what he calls a data object in java. Apparently this is the place where this bad conversion happens, in other words it passes to the backend the ?? characters
Any similar trick I could use on the postgres jdbc driver ?

Will report more once I find more from him

Many thanks
Armand

Show quoted text

On Jan 11, 2018, at 4:12 PM, Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote:

On 1/11/18 16:34, armand pirvu wrote:

On OS X:
- case 1 fails
testdb=# insert into jt1 values ('??') ;
ERROR: invalid byte sequence for encoding "UTF8": 0xe2 0xa4 0x27
Note that at paste time ≤ changed in ??

This looks like something is wrong with your libedit library. Try
running psql with the -n option. If that helps, then look into building
psql with libreadline instead. Because libedit is terrible.

- case 2 is fine
- echo -n '≤' |hexdump -C
00000000 e2 89 a4 |...|
00000003

--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#4Peter Eisentraut
peter_e@gmx.net
In reply to: armand pirvu (#3)
Re: characters converted to ??? in postgres

On 1/11/18 20:52, armand pirvu wrote:

The -n flag worked fine on OS X. I don’t have this issue on Centos. As a side question I wonder why was postgres built with libedit instead of libreadline , just curious.

You'll have to ask that of whoever built the binaries you are using.
There are different sources.

libedit is part of the macOS operating system, whereas libreadline has
to be obtained separately, so that's probably a reason.

Back to my developer issue , he is using what he calls a data object in java. Apparently this is the place where this bad conversion happens, in other words it passes to the backend the ?? characters
Any similar trick I could use on the postgres jdbc driver ?

That appears to be a completely separate issue.

--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#5armand pirvu
armand.pirvu@gmail.com
In reply to: Peter Eisentraut (#4)
Re: characters converted to ??? in postgres

Yes it appears to be a separate issue but using OS X is the only way I could reproduce trying to see with my own eyes so to speak
I did though a quick java program and that confirmed to me (at least so far) that the whole issue resides in the so called data object

Thank you so much for help and insight

— Armand

Show quoted text

On Jan 12, 2018, at 8:10 AM, Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote:

On 1/11/18 20:52, armand pirvu wrote:

The -n flag worked fine on OS X. I don’t have this issue on Centos. As a side question I wonder why was postgres built with libedit instead of libreadline , just curious.

You'll have to ask that of whoever built the binaries you are using.
There are different sources.

libedit is part of the macOS operating system, whereas libreadline has
to be obtained separately, so that's probably a reason.

Back to my developer issue , he is using what he calls a data object in java. Apparently this is the place where this bad conversion happens, in other words it passes to the backend the ?? characters
Any similar trick I could use on the postgres jdbc driver ?

That appears to be a completely separate issue.

--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services