UTF-8 for SGML docs?

Started by Tatsuo Ishii, about 18 years ago, 5 messages
#1 Tatsuo Ishii
ishii@postgresql.org

Hi,

Is it possible to use UTF-8 for SGML docs?

I would like to enhance the full text search docs, especially table 12-1,
which includes only ASCII and LATIN1 examples. I find that the word,
numword, and similar token aliases allow not only Latin characters but
also Asian ones. This fact is not stated in the docs, and I'm afraid it
might discourage Japanese users from using full text search.

To address this, I would like to add Japanese examples to the table,
using UTF-8 encoding for the doc text.
--
Tatsuo Ishii
SRA OSS, Inc. Japan

#2 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tatsuo Ishii (#1)
Re: UTF-8 for SGML docs?

Tatsuo Ishii <ishii@postgresql.org> writes:

Is it possible to use UTF-8 for SGML docs?

No :-(. We've been through this already, see discussions awhile back
about spelling non-English names correctly. Unless there's a recognized
HTML entity for the character, you can't use it.

regards, tom lane

#3 Tatsuo Ishii
ishii@postgresql.org
In reply to: Tom Lane (#2)
Re: UTF-8 for SGML docs?

Tatsuo Ishii <ishii@postgresql.org> writes:

Is it possible to use UTF-8 for SGML docs?

No :-(. We've been through this already, see discussions awhile back
about spelling non-English names correctly. Unless there's a recognized
HTML entity for the character, you can't use it.

OK. So I will just add a comment that Japanese characters can also be used for word etc.
--
Tatsuo Ishii
SRA OSS, Inc. Japan

#4 Gregory Stark
stark@enterprisedb.com
In reply to: Tom Lane (#2)
Re: UTF-8 for SGML docs?

"Tom Lane" <tgl@sss.pgh.pa.us> writes:

Tatsuo Ishii <ishii@postgresql.org> writes:

Is it possible to use UTF-8 for SGML docs?

No :-(. We've been through this already, see discussions awhile back
about spelling non-English names correctly. Unless there's a recognized
HTML entity for the character, you can't use it.

Are entities like &#x3041; ok?
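
For reference, &#x3041; is a hexadecimal numeric character reference rather
than a named entity. A minimal Python sketch (not from the thread; it only
uses the standard library, and it says nothing about whether the docs
toolchain accepts such references) that shows which character it denotes:

```python
import html
import unicodedata

# The numeric character reference asked about above.
ref = "&#x3041;"

# html.unescape resolves named and numeric character references.
char = html.unescape(ref)

# Code point, Unicode name, and UTF-8 bytes of the decoded character.
code_point = f"U+{ord(char):04X}"
name = unicodedata.name(char)
utf8_bytes = char.encode("utf-8")

print(code_point, name, utf8_bytes)

# Going the other way: build the reference from the character.
rebuilt = f"&#x{ord(char):X};"
print(rebuilt)
```

The reference and the raw UTF-8 bytes name the same code point; they differ
only in whether the source file itself has to contain non-ASCII bytes.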

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Ask me about EnterpriseDB's Slony Replication support!

#5 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Gregory Stark (#4)
Re: UTF-8 for SGML docs?

Gregory Stark <stark@enterprisedb.com> writes:

No :-(. We've been through this already, see discussions awhile back
about spelling non-English names correctly. Unless there's a recognized
HTML entity for the character, you can't use it.

Are entities like &#x3041; ok?

No. See prior thread.

regards, tom lane