ascii() for utf8

Started by Stuart over 18 years ago, 7 messages
#1Stuart
smcg2297@frii.com

Does PostgreSQL have a function like ascii() that will
return the Unicode code point value for a utf8 character?
(And, symmetrically, the same question for chr(), of course.)

I didn't find anything in the docs, so I think the answer
is no, which leads me to ask... Why not? (It is hard to believe
there is no need for it without concluding that either ascii()
is not needed or utf8 text is little used.)

Are there technical problems in implementing such a
function? Has anyone else already done this (i.e., is
there somewhere I could get it from)?

Is there some other non-obvious way to get the code point
value for a utf8 character?

I think I could use plperl or plpython for this but
this seems like an awful lot of overhead for such a
basic task.
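
As an illustration of what is being asked for (this is not code from the thread, and not how PostgreSQL implements anything), here is a minimal Python sketch of the mapping between a UTF-8 byte sequence and its code point. The function names `codepoint_from_utf8` and `utf8_from_codepoint` are hypothetical, and the decoder deliberately skips checks for overlong forms and surrogates.

```python
def codepoint_from_utf8(data: bytes) -> int:
    """Decode the first UTF-8 encoded character in `data` to its code point.

    Sketch only: overlong encodings and surrogate code points are not rejected.
    """
    if not data:
        raise ValueError("empty input")
    lead = data[0]
    if lead < 0x80:               # 0xxxxxxx: single byte, plain ASCII
        return lead
    elif lead >> 5 == 0b110:      # 110xxxxx: 2-byte sequence
        length, cp = 2, lead & 0x1F
    elif lead >> 4 == 0b1110:     # 1110xxxx: 3-byte sequence
        length, cp = 3, lead & 0x0F
    elif lead >> 3 == 0b11110:    # 11110xxx: 4-byte sequence
        length, cp = 4, lead & 0x07
    else:
        raise ValueError("invalid UTF-8 lead byte")
    if len(data) < length:
        raise ValueError("truncated UTF-8 sequence")
    for b in data[1:length]:
        if b >> 6 != 0b10:        # continuation bytes look like 10xxxxxx
            raise ValueError("invalid UTF-8 continuation byte")
        cp = (cp << 6) | (b & 0x3F)  # each continuation byte adds 6 bits
    return cp


def utf8_from_codepoint(cp: int) -> bytes:
    """Inverse direction: encode a code point as UTF-8 bytes."""
    return chr(cp).encode("utf-8")


if __name__ == "__main__":
    for ch in ("A", "\u00e9", "\u20ac"):
        print(hex(codepoint_from_utf8(ch.encode("utf-8"))))
```

In Python itself the built-ins `ord()` and `chr()` already do this for a decoded string; the manual decoder is shown only to make the byte-level work visible.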

#2Decibel!
decibel@decibel.org
In reply to: Stuart (#1)
Re: ascii() for utf8

Moving to -hackers.

On Jul 27, 2007, at 1:22 PM, Stuart wrote:

> Does PostgreSQL have a function like ascii() that will
> return the Unicode code point value for a utf8 character?
> (And, symmetrically, the same question for chr(), of course.)
>
> I didn't find anything in the docs, so I think the answer
> is no, which leads me to ask... Why not? (It is hard to believe
> there is no need for it without concluding that either ascii()
> is not needed or utf8 text is little used.)
>
> Are there technical problems in implementing such a
> function? Has anyone else already done this (i.e., is
> there somewhere I could get it from)?
>
> Is there some other non-obvious way to get the code point
> value for a utf8 character?
>
> I think I could use plperl or plpython for this but
> this seems like an awful lot of overhead for such a
> basic task.

I suspect that this is just a matter of no one scratching the itch. I
suspect a patch would be accepted, or you could possibly put
something on pgFoundry. I'd set it up so that ascii() and chr() act
according to the appropriate locale setting (I'm not sure which one
would be appropriate).
--
Decibel!, aka Jim Nasby decibel@decibel.org
EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)

#3Alvaro Herrera
alvherre@commandprompt.com
In reply to: Decibel! (#2)
Re: [GENERAL] ascii() for utf8

Decibel! wrote:

> Moving to -hackers.
>
> On Jul 27, 2007, at 1:22 PM, Stuart wrote:
>
> > Does PostgreSQL have a function like ascii() that will
> > return the Unicode code point value for a utf8 character?
> > (And, symmetrically, the same question for chr(), of course.)
>
> I suspect that this is just a matter of no one scratching the itch. I
> suspect a patch would be accepted, or you could possibly put something on
> pgFoundry.

Nay; there were some discussions about this not long ago, and I think
one conclusion you could draw from them is that many people want these
functions in the backend.

> I'd set it up so that ascii() and chr() act according to the
> appropriate locale setting (I'm not sure which one would be appropriate).

I don't see why any of them would react to the locale, but they surely
must honor client encoding.

--
Alvaro Herrera http://www.PlanetPostgreSQL.org/
"I dream about dreams about dreams", sang the nightingale
under the pale moon (Sandman)

#4Stuart McGraw
smcg2297@frii.com
In reply to: Alvaro Herrera (#3)
Re: [GENERAL] ascii() for utf8

From: Alvaro Herrera

> Decibel! wrote:
>
> > Moving to -hackers.
> >
> > On Jul 27, 2007, at 1:22 PM, Stuart wrote:
> >
> > > Does PostgreSQL have a function like ascii() that will
> > > return the Unicode code point value for a utf8 character?
> > > (And, symmetrically, the same question for chr(), of course.)
> >
> > I suspect that this is just a matter of no one scratching the itch. I
> > suspect a patch would be accepted, or you could possibly put something on
> > pgFoundry.
>
> Nay; there were some discussions about this not long ago, and I think
> one conclusion you could draw from them is that many people want these
> functions in the backend.

That would certainly be my preference. I will be distributing an
application, the database part of which may (not sure yet) require
this function, to multiple platforms including Windows, and (though
I have never done it) I anticipate it will be significantly harder
if I have to worry about the recipient compiling an external function
or making sure a DLL goes in the right place, gets updated, etc.

> > I'd set it up so that ascii() and chr() act according to the
> > appropriate locale setting (I'm not sure which one would be appropriate).
>
> I don't see why any of them would react to the locale, but they surely
> must honor client encoding.

Wouldn't this be the database encoding? (I have been using
strictly utf-8 and admit I am pretty fuzzy on encoding issues.)

If one had written an external function, how much more effort
would it be to make it acceptable for inclusion in the backend?
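
To make the encoding question concrete, here is a small Python illustration (not from the thread; the helper name `first_byte_value` is hypothetical, and nothing here describes PostgreSQL's actual behavior). It shows that a byte-oriented ascii() would give different answers for the same character depending on which encoding the bytes are in, while the Unicode code point stays the same.

```python
def first_byte_value(ch: str, encoding: str) -> int:
    """Return the numeric value of the first byte of `ch` in `encoding`.

    This mimics what a purely byte-oriented ascii() would report.
    """
    return ch.encode(encoding)[0]


ch = "\u00e9"  # LATIN SMALL LETTER E WITH ACUTE, code point U+00E9

print(first_byte_value(ch, "latin-1"))  # single byte 0xE9
print(first_byte_value(ch, "utf-8"))    # lead byte 0xC3 of a 2-byte sequence
print(ord(ch))                          # code point, independent of encoding
```

Under these assumptions, a function that returns code points only needs to know how the stored bytes are encoded in order to decode them; the locale does not enter into it.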

#5Bruce Momjian
bruce@momjian.us
In reply to: Alvaro Herrera (#3)
Re: [GENERAL] ascii() for utf8

This has been saved for the 8.4 release:

http://momjian.postgresql.org/cgi-bin/pgpatches_hold


--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

#6Andrew Dunstan
andrew@dunslane.net
In reply to: Bruce Momjian (#5)
Re: [GENERAL] ascii() for utf8

Actually, I am working on this as part of the fixes for invalid encoding
stuff, as recently discussed.

cheers

andrew


#7Bruce Momjian
bruce@momjian.us
In reply to: Andrew Dunstan (#6)
Re: [GENERAL] ascii() for utf8

Andrew Dunstan wrote:

> Actually, I am working on this as part of the fixes for invalid encoding
> stuff, as recently discussed.

OK, I have moved the item into the 8.3 queue.

