UTF-8 docs

Started by Alexander Lakhin over 9 years ago | 13 messages | docs
#1 Alexander Lakhin
exclusion@gmail.com

Hello,
I've just seen a discussion about docs encoding on pgsql-hackers.

/messages/by-id/20160822.141645.655870136709055853.t-ishii@sraoss.co.jp
Can we continue the discussion in this mailing list?
We (at Postgres Pro) have developed the whole build chain (with support
for l10n) so we can just share it.

Best regards,
Alexander

--
Sent via pgsql-docs mailing list (pgsql-docs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-docs

#2 Tatsuo Ishii
t-ishii@sra.co.jp
In reply to: Alexander Lakhin (#1)
Re: UTF-8 docs

From: Alexander Law <exclusion@gmail.com>
Subject: UTF-8 docs
Date: Mon, 22 Aug 2016 16:36:14 +0300
Message-ID: <7fbf2e80-9507-0521-d0e9-913ab81a58df@gmail.com>

Hello,
I've just seen a discussion about docs encoding on pgsql-hackers.

/messages/by-id/20160822.141645.655870136709055853.t-ishii@sraoss.co.jp
Can we continue the discussion in this mailing list?
We (at Postgres Pro) have developed the whole build chain (with
support for l10n) so we can just share it.

I have just subscribed to the pgsql-docs list.
Here is the last conversation with Peter on pgsql-hackers.

On 8/22/16 9:32 AM, Tatsuo Ishii wrote:

I don't know what kind of problem you are seeing with encoding
handling, but at least UTF-8 is working for Japanese, French and
Russian.

Those translations are using DocBook XML.

But in the meantime I can create UTF-8 HTML files like this:

make html
[snip]
/bin/mkdir -p html
SP_CHARSET_FIXED=1 SP_ENCODING=UTF-8 openjade -wall -wno-unused-param -wno-empty -wfully-tagged -D . -D . -c /usr/share/sgml/docbook/stylesheet/dsssl/modular/catalog -d stylesheet.dsl -t sgml -i output-html -i include-index postgres.sgml
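
One way to confirm that a build like the one above really produced UTF-8 output is to try decoding every generated file. The following is only a minimal sketch: the helper name find_non_utf8_files is made up for illustration, and it assumes the HTML files sit directly in the html directory created by the build.

```python
from pathlib import Path


def find_non_utf8_files(html_dir):
    """Return the *.html paths in html_dir whose bytes are not valid UTF-8.

    Hypothetical helper for illustration; not part of the PostgreSQL build.
    """
    bad = []
    for path in sorted(Path(html_dir).glob("*.html")):
        try:
            # Decoding raises UnicodeDecodeError on any invalid byte sequence.
            path.read_bytes().decode("utf-8")
        except UnicodeDecodeError:
            bad.append(path)
    return bad
```

Calling find_non_utf8_files("html") after the build returns an empty list if every file decodes cleanly. Note that plain ASCII output also passes this check, so it only rules out files written in some other multibyte encoding.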

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp


#3 Jürgen Purtz
juergen@purtz.de
In reply to: Tatsuo Ishii (#2)
Re: UTF-8 docs

The previous mails contain some statements about the source format of
the PostgreSQL documentation and others about the formats derived from
it. In the following I'm speaking only about the source format. On that
premise, I want to second Victor Wagner, who wrote on pgsql-hackers:

Really, what change we need, it is conversion from SGML to XML format.
It would solve some real problems, such as ability to include diagrams
in the docs, and also let everyone to explicitly specify encoding in
XML declaration (and probably cause switch to UTF-8 as side effect,
because most XML-based tools use UTF-8 as default).

The real fundamental step is the switch from SGML to XML. It consists
not only of a change in the markup format (omittag, shorttag). We must
also replace the SGML tools for parsing, validating and generating the
various output formats, such as HTML or PDF, with modern XML tools. And
we need additional XSLT steps or modifications of the CSS files to
replace the DSSSL scripts. This work is in progress.

Once we have got rid of all SGML-related parts, we can profit from
current XML tools and standards, e.g.:

- DocBook itself is moving from 4.x to 5.x on the basis of XML.
(Actually, I don't recommend this additional step because of some
incompatibilities in the migration to 5.x, see:
https://lists.oasis-open.org/archives/docbook/201606/msg00007.html )

- The common attribute "xml:lang" for translations

- Extensions like XInclude, SVG, MathML, ...

- ...
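
To illustrate two of the points above (an encoding named in the XML declaration and the common "xml:lang" attribute), here is a minimal sketch using Python's standard library. The sample chapter, its id and its Japanese text are invented for this example and are not taken from the PostgreSQL sources.

```python
import xml.etree.ElementTree as ET

# The "xml" prefix is always bound to this namespace by the XML standard.
XML_NS = "http://www.w3.org/XML/1998/namespace"

# Hypothetical DocBook-like fragment; the declaration names the encoding.
SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<chapter id="tutorial-start" xml:lang="ja">
  <title>はじめに</title>
  <para>PostgreSQLへようこそ。</para>
</chapter>
""".encode("utf-8")


def describe(document_bytes):
    """Return (language, title) of a document given as raw bytes."""
    # The parser reads the encoding from the XML declaration itself.
    root = ET.fromstring(document_bytes)
    lang = root.get(f"{{{XML_NS}}}lang")
    title = root.findtext("title")
    return lang, title
```

Because the encoding travels inside the file, describe(SAMPLE) needs no external setting comparable to SP_ENCODING to read the text correctly.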


#4 Tatsuo Ishii
t-ishii@sra.co.jp
In reply to: Jürgen Purtz (#3)
Re: UTF-8 docs

Really, what change we need, it is conversion from SGML to XML format.
It would solve some real problems, such as ability to include diagrams
in the docs,

This argument sounds weak to me. The last time I proposed including
diagrams in the docs on the pgsql-hackers list, some developers were
against the idea because a binary diagram is hard to maintain in git.
However, up to now there has been no consensus on which text-based
diagram source (one from which real diagrams can be generated) is good
for our purpose. I don't see why just migrating to XML solves the
problem.

(the discussion on diagrams stopped in 2011, as far as I know)
/messages/by-id/1307972167.2862.518.camel@core2

Don't get me wrong. I am not against migrating to XML. I just want to
say: let's not pretend that migrating to XML would solve all the
problems we have.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp


#5 Alexander Lakhin
exclusion@gmail.com
In reply to: Tatsuo Ishii (#4)
Re: UTF-8 docs

Hello,
If I understand you correctly, your goal in changing the docs encoding
is to allow for translation of the docs.
But I don't think that the translation should be done that way (by
replacing the sgml/xml/whatever contents).
Just as we don't translate server messages by copying all the source
files and replacing strings in them, we shouldn't translate the docs by
replacing sgml contents.
We at Postgres Pro are using the gettext technologies for this. And we
have a complete working toolchain (including a modified Makefile) for
translating the docs.
(If someone is interested, we can share our results and provide
everything needed to get started.)
Those who translated sgml/xml before are moving to gettext/po (the
FreeBSD Documentation Project is the most remarkable example) or
similar approaches.
So I think a broader view of the evolution of the PostgreSQL
documentation is needed.

Best regards,
Alexander


#6 Jürgen Purtz
juergen@purtz.de
In reply to: Tatsuo Ishii (#4)
Re: UTF-8 docs

Arguments for and against diagrams are not the central focus of the
SGML to XML conversion. Nevertheless: "diagrams" didn't mean any binary
format; only SVG or another text format is acceptable. And if the SVG
source is generated by a program like Inkscape, it tends to become
unreadable. But if we develop an SVG library with our own predefined
graphical elements, the SVG source becomes very clear. The 2011
discussion you mentioned was continued in 2016:
/messages/by-id/5690218B.9060103@purtz.de.

Regards, Jürgen Purtz


#7 Tatsuo Ishii
t-ishii@sra.co.jp
In reply to: Jürgen Purtz (#6)
Re: UTF-8 docs

Arguments for and against diagrams are not the central focus of the
SGML to XML conversion. Nevertheless: "diagrams" didn't mean any binary
format; only SVG or another text format is acceptable. And if the SVG
source is generated by a program like Inkscape, it tends to become
unreadable. But if we develop an SVG library with our own predefined
graphical elements, the SVG source becomes very clear. The 2011
discussion you mentioned was continued in 2016:
/messages/by-id/5690218B.9060103@purtz.de.

Oh I didn't know that. Thank you for pointing it out. I hope this work
will be included in 10.0.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp


#8 Tatsuo Ishii
t-ishii@sra.co.jp
In reply to: Alexander Lakhin (#5)
Re: UTF-8 docs

Hello,
If I understand you correctly, your goal in changing the docs encoding
is to allow for translation of the docs.
But I don't think that the translation should be done that way (by
replacing the sgml/xml/whatever contents).
Just as we don't translate server messages by copying all the source
files and replacing strings in them, we shouldn't translate the docs
by replacing sgml contents.
We at Postgres Pro are using the gettext technologies for this. And we
have a complete working toolchain (including a modified Makefile) for
translating the docs.

Sounds great but I am not sure if the technique can be applied to any
language including Japanese.

(If someone is interested, we can share our results and provide
everything needed to get started.)
Those who translated sgml/xml before are moving to gettext/po (the
FreeBSD Documentation Project is the most remarkable example) or
similar approaches.

As far as I know, FreeBSD's Japanese document project does not use
the gettext technologies.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp


#9 Ioseph Kim
pgsql-kr@postgresql.kr
In reply to: Tatsuo Ishii (#2)
Re: UTF-8 docs

Hi.

I welcome this discussion. The Korean PgDoc encoding is UTF-8 too.

The Korean SGML has been encoded in UTF-8 for 10 years.

Currently, PgDoc is very inefficient for L10N work.

But we have no clear idea for a workaround.

I think the encoding of the main English documentation should stay as
it is now.

As for the encoding of the documentation in each other language, I
would leave that to the translators.

Regards, Ioseph.


#10 Alexander Lakhin
exclusion@gmail.com
In reply to: Tatsuo Ishii (#8)
Re: UTF-8 docs

Hello,
24.08.2016 02:19, Tatsuo Ishii wrote:

We at Postgres Pro are using the gettext technologies for this. And we
have a complete working toolchain (including a modified Makefile) for
translating the docs.

Sounds great but I am not sure if the technique can be applied to any
language including Japanese.

I don't think that gettext is Russian-focused. At
https://babel.postgresql.org/ I see a number of languages, including
Japanese.
Please look at the attached .pot for one of the doc files. You can use
any translation software that supports .po (e.g. poedit, Lokalize, ...)
to translate it into any language.
And you can look at all the docs converted to .po for translation:
http://oc.postgrespro.ru/index.php/s/puEJKoUwbZ3dia5/download

(If someone is interested, we can share our results and provide
everything needed to get started.)
Those who translated sgml/xml before are moving to gettext/po (the
FreeBSD Documentation Project is the most remarkable example) or
similar approaches.

As far as I know, FreeBSD's Japanese document project does not use
the gettext technologies.

Unfortunately I can't read Japanese and don't understand how the
Japanese FreeBSD team works (and why it can't use the gettext
technologies), but the main FreeBSD Documentation team is moving to PO
(see https://www.bsdcan.org/2016/schedule/track/Hacking/680.en.html).

Best regards,
Alexander Lakhin
https://postgrespro.com/

Attachments:

advanced.pot (application/vnd.ms-powerpoint; name=advanced.pot)

#11 Tatsuo Ishii
t-ishii@sra.co.jp
In reply to: Alexander Lakhin (#10)
Re: UTF-8 docs

Hello,
24.08.2016 02:19, Tatsuo Ishii wrote:

We at Postgres Pro are using the gettext technologies for this. And we
have a complete working toolchain (including a modified Makefile) for
translating the docs.

Sounds great but I am not sure if the technique can be applied to any
language including Japanese.

I don't think that gettext is Russian-focused. At
https://babel.postgresql.org/ I see a number of languages, including
Japanese.

Yes, I know gettext can be used for translating PostgreSQL messages
into Japanese. I just couldn't imagine how the technique could be
extended to the docs.

Please look at the attached .pot for one of the doc files. You can use
any translation software that supports .po (e.g. poedit, Lokalize, ...)
to translate it into any language.
And you can look at all the docs converted to .po for translation:
http://oc.postgrespro.ru/index.php/s/puEJKoUwbZ3dia5/download

OK, the idea is that each sentence in the docs is regarded as a "very
long message". Interesting. It looks better than "replacing SGML
contents with translated contents" (which is what the Japanese doc
project is doing for now).


#12 Alexander Lakhin
exclusion@gmail.com
In reply to: Tatsuo Ishii (#11)
Re: UTF-8 docs

24.08.2016 09:58, Tatsuo Ishii wrote:

Please look at the attached .pot for one of the doc files. You can use
any translation software that supports .po (e.g. poedit, Lokalize, ...)
to translate it into any language.
And you can look at all the docs converted to .po for translation:
http://oc.postgrespro.ru/index.php/s/puEJKoUwbZ3dia5/download

OK, the idea is that each sentence in the docs is regarded as a "very
long message". Interesting. It looks better than "replacing SGML
contents with translated contents" (which is what the Japanese doc
project is doing for now).

Yes, the DocBook format has natural blocks such as <para>, <title>,
<indexterm>, ... which can be translated as separate text fragments.
So it's a question of mapping those blocks to .po fragments and back
as accurately as possible. (We use a customized xml2po from
gnome-doc-utils for this, but there are other programs too.) All the
other text processing can be done with the rich gettext toolset.
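
The block-to-fragment mapping described above can be sketched roughly as follows. This is an illustration only, not the Postgres Pro toolchain and not xml2po: every function name is hypothetical, the sample section is invented, and inline markup inside a block is simply flattened to text, which a real tool would have to preserve.

```python
import xml.etree.ElementTree as ET

# Block-level elements treated as translation units (assumption for this sketch).
TRANSLATABLE = {"para", "title", "indexterm"}

# Invented sample input, not taken from the PostgreSQL docs.
SAMPLE = """<sect1 id="demo">
  <title>Getting Started</title>
  <para>PostgreSQL is a   relational
  database.</para>
</sect1>"""


def inner_text(elem):
    """Flatten an element's text and collapse runs of whitespace."""
    return " ".join("".join(elem.itertext()).split())


def extract_messages(xml_text):
    """Collect the unique, non-empty texts of translatable blocks, in order."""
    seen, messages = set(), []
    for elem in ET.fromstring(xml_text).iter():
        if elem.tag in TRANSLATABLE:
            text = inner_text(elem)
            if text and text not in seen:
                seen.add(text)
                messages.append(text)
    return messages


def to_pot(messages):
    """Render messages as PO template entries with empty translations."""
    def escape(text):
        return text.replace("\\", "\\\\").replace('"', '\\"')
    return "\n".join(f'msgid "{escape(m)}"\nmsgstr ""\n' for m in messages)


def apply_translations(xml_text, catalog):
    """Replace each block's text with its translation; keep untranslated ones."""
    root = ET.fromstring(xml_text)
    for elem in list(root.iter()):
        if elem.tag in TRANSLATABLE:
            translated = catalog.get(inner_text(elem))
            if translated:
                for child in list(elem):
                    elem.remove(child)  # inline markup is dropped in this sketch
                elem.text = translated
    return ET.tostring(root, encoding="unicode")
```

Here extract_messages(SAMPLE) yields one msgid per block, to_pot turns them into a template a translator can open in any .po editor, and apply_translations merges a finished catalog (a plain dict in this sketch) back into the XML, leaving blocks without a translation in the original language.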

Unfortunately I can't read Japanese and don't understand how the
Japanese FreeBSD team works (and why it can't use the gettext
technologies), but the main FreeBSD Documentation team is moving to PO
(see https://www.bsdcan.org/2016/schedule/track/Hacking/680.en.html).

Please look at the presentation and at
http://wonkity.com/~wblock/translation/translation.pdf; these
documents explain very well all the challenges of the pre-existing
approach and the solutions.


#13 Oleg Bartunov
oleg@sai.msu.su
In reply to: Tatsuo Ishii (#7)
Re: UTF-8 docs

On Wed, Aug 24, 2016 at 2:04 AM, Tatsuo Ishii <ishii@sraoss.co.jp> wrote:

Arguments for and against diagrams are not the central focus of the
SGML to XML conversion. Nevertheless: "diagrams" didn't mean any binary
format; only SVG or another text format is acceptable. And if the SVG
source is generated by a program like Inkscape, it tends to become
unreadable. But if we develop an SVG library with our own predefined
graphical elements, the SVG source becomes very clear. The 2011
discussion you mentioned was continued in 2016:
/messages/by-id/5690218B.9060103@purtz.de.

Oh I didn't know that. Thank you for pointing it out. I hope this work
will be included in 10.0.

We discussed diagrams in the docs at PGCon-2016. Heikki's attempt is here:
https://wiki.postgresql.org/wiki/Figures_%26_Pics_in_Docs and Emre
made an even better one using markdeep https://casual-effects.com/markdeep/
I have attached his version and it looks very interesting (open it in Firefox).

Diagrams made this way don't require learning any new tools.

Oleg


Attachments:

gin-ascii-v2.md(1).html (text/html; charset=US-ASCII; name="gin-ascii-v2.md(1).html")