Docbook 5.x

Started by Jürgen Purtz almost 10 years ago, 86 messages, list: docs
#1Jürgen Purtz
juergen@purtz.de

Hi,
currently we use DocBook V4.2 for the PostgreSQL manuals. I suggest an
upgrade to DocBook 5.x. This sounds simple, but it will be a long
process with many sub-tasks.

Rationale:

* Sooner or later we MUST migrate, as the 4.x series is outdated: V4.2
dates back to 2002, and the 4.x series has not been actively developed
since 2006. See: http://www.docbook.org/tdg5/en/html/ch01.html "In
October 2006, the DocBook Technical Committee released DocBook V4.5,
the last release planned in the 4.x series."
* V5.0 has been available since 2009. See:
http://www.docbook.org/tdg5/en/html/ch01.html: "DocBook V5.0 became
an official Committee Specification in June 2009 and became an
official OASIS Standard in October 2009."
* The technical committee is currently at the third Candidate Release
for V5.1.

PROs:

* The formal part of the migration is supported by existing tools:
http://docbook.org/docs/howto/#convert4to5 (nevertheless, some
scripts of our own will be necessary).
* The normative schema for DocBook 5.x is written in RELAX NG.
Additionally, the technical committee converts this normative schema
into an XSD schema and a DTD, which are not normative but are very
close to the RELAX NG schema and suffice for most applications. Hence
we can choose between three schema syntaxes, and everybody can use
their favourite one.
* Our source file format will switch from SGML to XML. This gives us
access to all XML technologies such as XLink, XPath, XSLT, XSL-FO,
SVG, MathML, namespaces, etc.

CONs:

* The migration from 4.x to 5.x implies major changes at 3 different
levels.
o DocBook structure: previously it was defined in SGML syntax
(DTD). Now it is defined in the RELAX NG schema language plus
Schematron rules.
o DocBook files: previously we used SGML syntax for our files. We
must convert them to valid XML syntax, e.g. by resolving tag omission.
o Tools and style sheets: all tools that operate at the native
SGML level (editors, conversions, ...) must be replaced by
XML-conforming tools. As valid XML is also valid SGML syntax, this
step may be accomplished by reconfiguring some of the tools,
e.g. .emacs.

What I have done so far is:

* Conversion of SGML files to valid XML syntax with a Perl script. I
did not manage to use 'osx' or 'spam' for this.
* Conversion of these XML files to DocBook 5.x format using xsltproc
and DocBook's XSLT migration scripts.
* Creation of HTML files using xsltproc and DocBook's XSLT stylesheets.
* Creation of FO files using xsltproc and DocBook's XSLT stylesheets.
* Creation of PDF files using fop.
* The conversions need less than 10 minutes on an Intel i5 processor.
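The SGML --> XML step has to deal with syntax that SGML permits but XML does not. Two easy cases are attribute values without quotes and elements declared EMPTY in the DTD, which SGML writes without an end tag (e.g. `<xref linkend=...>`). The following Python sketch handles only those two cases with regular expressions. It is not the Perl script mentioned above. The set of EMPTY elements is an assumed subset of DocBook 4. Omitted end tags of non-empty elements, marked sections and entities still need a real SGML parser.

```python
import re

# Assumption: a small subset of DocBook 4 elements declared EMPTY in the DTD.
# SGML writes them without an end tag; XML needs e.g. <xref .../>.
EMPTY_ELEMENTS = {"xref", "anchor", "colspec", "spanspec", "footnoteref", "co", "sbr"}

# A start tag: element name, optional attribute text (lazy), optional "/".
TAG_RE = re.compile(r"<([A-Za-z][\w.-]*)((?:\s+[^<>]*?)?)\s*(/?)>")
# name=value where value is double-quoted, single-quoted or bare.
ATTR_RE = re.compile(r"""([A-Za-z_][\w.:-]*)=("[^"]*"|'[^']*'|[^\s"'>]+)""")


def _quote_attr(match):
    name, value = match.group(1), match.group(2)
    if value[0] in "\"'":
        return match.group(0)  # already quoted: keep unchanged
    return f'{name}="{value}"'  # bare SGML value: add quotes


def _fix_start_tag(match):
    name, attrs, slash = match.group(1), match.group(2), match.group(3)
    attrs = ATTR_RE.sub(_quote_attr, attrs)
    if name.lower() in EMPTY_ELEMENTS:
        slash = "/"  # self-close elements that have no end tag in SGML
    return f"<{name}{attrs}{slash}>"


def sgml_to_xml(text):
    """Apply the two fixes to start tags; end tags and comments are untouched."""
    return TAG_RE.sub(_fix_start_tag, text)
```

For example, `sgml_to_xml('See <xref linkend=sql-select>.')` yields `See <xref linkend="sql-select"/>.` and leaves already-valid XML unchanged.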

This is a very first, raw round trip with one output file per SGML file
and output type. Not yet supported: entities (__gt__ is used as a
surrogate), <![CDATA[ and similar SGML constructs, PostgreSQL-specific
style sheets, the Makefile; additional errors occur, etc. I have attached
one file in each new format for the chapter "Advanced Features": XML
(the new source), HTML, FO and PDF.

Any ideas or suggestions? Should we continue along this path? Does
anybody have more experience with SGML --> XML conversions or
DocBook 4.x --> 5.x conversions?

Kind regards
Jürgen Purtz

Attachments:

advanced.xml (text/xml)
advanced.html (text/html)
advanced.fo (text/x-xslfo)
advanced.pdf (application/pdf)
#2Alexander Lakhin
exclusion@gmail.com
In reply to: Jürgen Purtz (#1)
Re: Docbook 5.x

Hello Jürgen,

Please look at the discussion that we had some time ago:
/messages/by-id/56337365.2080104@postgrespro.ru

And we (postgrespro) still have plans to migrate to XML as soon as we
get documentation translated.
We had no issues with the SGML->XML conversion; "make postgres.xml"
creates XML (with entities and the like), which we use.

When you talk about "conversion of html, fo, pdf, ..." do you mean
using docs/sgml/Makefile or some other scripts?

As for the SGML to XML conversion, we need to decide whether to generate a
single XML file or a set of XML files (corresponding to the current SGML files).
In the latter case - how to include XML-fragments into the main document
(as entities or with xi:include)?

Could you please explain what the "Docbooks xslt-migration scripts" are?
Is DocBook 4.x incompatible with DocBook 5.x, so that we need an
additional conversion step?

Best regards,
Alexander

-----
Alexander Lakhin
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

On 20.04.2016 17:30, Jürgen Purtz wrote:

#3Simon Riggs
simon@2ndQuadrant.com
In reply to: Jürgen Purtz (#1)
Re: Docbook 5.x

On 20 April 2016 at 15:30, Jürgen Purtz <juergen@purtz.de> wrote:

What I have done so far is:

- Conversion of sgml files to valid xml syntax with a perl skript. I
failed to use 'osx' or 'spam'.
- Conversion of these xml files to Docbook5.x format using xsltproc
and Docbooks xslt-migration skripts.
- Creation of html files using xsltproc and Docbooks xslt skripts.
- Creation of fo files using xsltproc and Docbooks xslt skripts.
- Creation of pdf files using fop.
- The conversions needs less than 10 minutes on a Intel i5 processor.

So you believe you have/can convert between the two formats accurately, so
we can change things in a single commit?

What verification is offered? Possible?

And that is ready to go now? Will you post your perl script, or the patch?
Other projects use the same file formats, e.g. Slony, XL etc

If an automatic migration is possible do we need to change at all?

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#4Jürgen Purtz
juergen@purtz.de
In reply to: Simon Riggs (#3)
Re: Docbook 5.x

On 20.04.2016 20:41, Simon Riggs wrote:

So you believe you have/can convert between the two formats
accurately, so we can change things in a single commit?

What verification is offered? Possible?

And that is ready to go now? Will you post your perl script, or the
patch? Other projects use the same file formats, e.g. Slony, XL etc

If an automatic migration is possible do we need to change at all?

Hi,

so far I have done only a first, raw round trip to check that there is
no showstopper for my plans. If the community agrees that this work is
valuable for the Postgres documentation, I will continue working on it
in the near future. To answer your questions:

* "do we need to change at all?". This question has to be discussed in
the community. I tried to use the recommended tools like 'osx' and
'spam' and failed (not completely, but in details like newline
processing). This may be my fault, or it may result from the fact
that we still use SGML instead of XML. But over time this task will
get harder and harder: SGML knowledge gets lost, SGML tools are no
longer actively developed, XML moves forward, ...
* Currently I don't see any showstopper. Therefore I believe that the
conversion from DocBook 4 to 5 is manageable. The plan is to have one
XML file in DB5 format for every SGML file in DB4 format.
* To keep the repository history continuous, we should do something
like 'git mv file.sgml file.xml', put the new content into 'file.xml'
and 'git commit'. Additionally, the line breaks must be preserved
during all conversion steps.
* Maybe some very specific (manual) steps will be necessary, but it
should be possible to script these as well. The conversion would then
run fast, and a single commit could cover the complete documentation.
* There are no special "Postgres" tasks in the Perl script or anywhere
else. It depends on DocBook only. Therefore other projects can use it
in the same way. Of course I will publish all sources.
* Currently I am generating well-formed XML. Validation against the
DocBook 5 schema will follow.
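The rename step from the list above can be scripted. A minimal Python sketch (the helper name and the example paths are hypothetical) that only builds one `git mv` argument list per `.sgml` file and does not touch the repository:

```python
from pathlib import PurePosixPath


def git_mv_commands(paths):
    """Return one ['git', 'mv', old, new] list per .sgml path, renaming it to .xml."""
    commands = []
    for path in paths:
        p = PurePosixPath(path)
        if p.suffix == ".sgml":  # leave Makefiles, .xsl files, etc. alone
            commands.append(["git", "mv", str(p), str(p.with_suffix(".xml"))])
    return commands
```

Each list could then be run with `subprocess.run(cmd, check=True)` from the top of the source tree, before the converted content is written into the renamed files and committed.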

Alexander Law posted additional suggestions and questions:

My answers:

* DocBook 4 and 5 are not compatible. There are new elements; others
have been removed and replaced by more generic ones. But the DocBook
project offers XSLT stylesheets to convert DocBook 4 XML files to
DocBook 5 XML files.
* There are pros and cons to using postgres.xml as a starting point. PRO:
well-formed (and valid?) XML; entities are kept; no more
"<![CDATA[", "<![%include" and similar SGML constructs. CON: only
one file; ugly line-break algorithm.
* Currently I don't use the existing Makefile. I run Perl, xsltproc
and fop from a separate script. If I continue this work, I will have
to change the Makefile.
* "how to include XML-fragments into the main document (as entities or
with xi:include) ?". As described above, I prefer one file per
existing SGML file. But some of those SGML files have more than one
root element. In such situations (and without further processing)
the resulting XML files will be fragments rather than well-formed
documents. In general it is more "DocBook 5 compliant" to use
xi:include instead of entities.
* "Docbooks xslt-migration scripts": see:
http://docbook.org/docs/howto/#convert4to5
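To make "not compatible" concrete, here is a Python sketch of three changes a 4 --> 5 conversion involves: all elements move into the DocBook 5 namespace `http://docbook.org/ns/docbook`, the `id` attribute becomes `xml:id`, and `<ulink url=...>` becomes `<link xlink:href=...>`. It is a deliberately tiny subset of what the DocBook project's migration stylesheet does, and the function name `upgrade` is hypothetical.

```python
import xml.etree.ElementTree as ET

DB5_NS = "http://docbook.org/ns/docbook"
XLINK_NS = "http://www.w3.org/1999/xlink"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def upgrade(elem):
    """Rewrite a DocBook 4 XML element tree in place (tiny subset of DB4 -> DB5)."""
    if not isinstance(elem.tag, str):  # skip comments / processing instructions
        return
    for child in elem:
        upgrade(child)
    if elem.tag == "ulink":  # ulink is gone in DocBook 5; link + xlink:href replaces it
        elem.tag = "link"
        url = elem.attrib.pop("url", None)
        if url is not None:
            elem.set(f"{{{XLINK_NS}}}href", url)
    if "id" in elem.attrib:  # DocBook 5 uses xml:id instead of id
        elem.set(XML_ID, elem.attrib.pop("id"))
    elem.tag = f"{{{DB5_NS}}}{elem.tag}"  # move the element into the DB5 namespace
```

A real conversion would also set `version="5.0"` on the root element (e.g. `root.set("version", "5.0")`) and handle the many other renamed and removed elements.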

Kind regards
Jürgen Purtz

#5Jürgen Purtz
juergen@purtz.de
In reply to: Simon Riggs (#3)
Re: Docbook 5.x

Hello,

the conversion of PostgreSQL documentation from Docbook 4.x to 5.x
consists of the following steps:

1. pure sgml --> xml conversion (done, Perl script)
2. 4.x markup --> 5.x markup (done, Docbook standard migration script)
3. post-processing of 5.x files (done, Perl: xi:include, entities, ...)
4. generate the complete file postgres_all.xml with xmllint (done)
5. generate online documentation (html, man, text)
6. generate print documentation (rtf, pdf)
7. adapt the Makefile to the new situation
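Step 4 merges the per-file XML into one document. With xi:include this is the kind of resolution `xmllint --xinclude` performs. The same idea can be sketched in Python with the standard library's `xml.etree.ElementInclude`. In this sketch the file contents are held in a dictionary, and the file names are only examples. A real build would let the loader read the files from disk.

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

DB = "{http://docbook.org/ns/docbook}"


def assemble(master_text, files):
    """Resolve xi:include elements in master_text against an in-memory {href: text} map."""

    def loader(href, parse, encoding=None):
        text = files[href]
        # parse="xml" (the default) includes a parsed element; "text" includes raw text
        return ET.fromstring(text) if parse == "xml" else text

    book = ET.fromstring(master_text)
    ElementInclude.include(book, loader=loader)  # replaces each xi:include in place
    return book
```

Because each included file must be a single well-formed element here, SGML files with several root elements would first have to be split or wrapped.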

After step 3 we have well-formed XML files, most of which are valid
against the DocBook 5.0 XSD. Currently the following validity problems occur:

* many unknown xref targets, as the target exists in a different file
* 4 remaining SGML entities: standalone-xxx and include-xxx
* some markup that is not valid in 5.x, mostly involving <synopsis> and
<function>. This must be resolved manually (5.x offers comprehensive
options for very detailed markup with <funcsynopsis> and
<cmdsynopsis>).
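The first item can be narrowed down by checking link targets across all files at once. That separates xrefs that merely point into another file from genuinely broken ones. A minimal Python sketch follows. It assumes that each file parses on its own, with no unresolved entities. It also assumes that targets carry `id` (DocBook 4) or `xml:id` (DocBook 5). All names in it are hypothetical.

```python
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"
LINK_ELEMENTS = {"xref", "link"}  # elements that point to a target via linkend


def local_name(tag):
    """Strip a '{namespace}' prefix; comments and PIs have non-string tags."""
    return tag.rsplit("}", 1)[-1] if isinstance(tag, str) else ""


def collect_ids(root):
    """All target ids in one file (DocBook 4 'id' or DocBook 5 'xml:id')."""
    return {el.get(key) for el in root.iter() for key in (XML_ID, "id") if el.get(key)}


def check_xrefs(docs):
    """docs maps file name -> XML text.

    Returns (cross_file, missing): linkends that only resolve in another
    file, and linkends that resolve nowhere.
    """
    parsed = {name: ET.fromstring(text) for name, text in docs.items()}
    local_ids = {name: collect_ids(root) for name, root in parsed.items()}
    all_ids = set().union(*local_ids.values())
    cross_file, missing = [], []
    for name, root in parsed.items():
        for el in root.iter():
            target = el.get("linkend")
            if local_name(el.tag) not in LINK_ELEMENTS or target is None:
                continue
            if target in local_ids[name]:
                continue  # resolvable within the same file
            (cross_file if target in all_ids else missing).append((name, target))
    return cross_file, missing
```

Only the `missing` list would need manual attention; the `cross_file` entries should disappear once the files are validated as one assembled document.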

Steps 5 and 6 imply replacing our DSSSL stylesheet with XSLT
stylesheets. I expect this to be much more difficult and time-consuming
than everything else I have done in this project so far. Furthermore, I
have no knowledge of DSSSL, and that is why I am writing this mail. Is
there anybody out there who can help me with the DSSSL --> XSLT
conversion, or at least answer questions like:

* Was our stylesheet.dsl written from scratch, or is it derived
from a generic DocBook 1/2/3/4.x stylesheet?
* Who developed this file?
* What is the role of the *.xsl files in the sgml directory, and how do
they interact with stylesheet.dsl?

Regards, Jürgen

#6Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Jürgen Purtz (#1)
Re: Docbook 5.x

Jürgen Purtz wrote:

Hi,
actually we use DocBook V4.2 for the PostgreSQL manuals. I suggest an
upgrade to DocBook 5.x. This sounds simple, but it will be a long process
with many sub-tasks.

Yes, agreed. The killer objection placed last time was that it took
something like 10x longer to generate the HTML using the XML-based
toolchain than the SGML-based ones. If this is not fixed, let's forget
about this whole thing until it is. So, would you time the process
using both toolchains and report back?

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-docs mailing list (pgsql-docs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-docs

#7Joshua D. Drake
jd@commandprompt.com
In reply to: Alvaro Herrera (#6)
Re: Docbook 5.x

On 05/03/2016 12:34 PM, Alvaro Herrera wrote:

Jürgen Purtz wrote:

Hi,
actually we use DocBook V4.2 for the PostgreSQL manuals. I suggest an
upgrade to DocBook 5.x. This sounds simple, but it will be a long process
with many sub-tasks.

Yes, agreed. The killer objection placed last time was that it took
something like 10x longer to generate the HTML using the XML-based
toolchain than the SGML-based ones. If this is not fixed, let's forget
about this whole thing until it is. So, would you time the process
using both toolchains and report back?

IIRC:

TGL submitted a patch for the openjade bug way back when that caused
that issue.

TGL, do you know what happened there?

Sincerely,

JD

--
Command Prompt, Inc. http://the.postgres.company/
+1-503-667-4564
PostgreSQL Centered full stack support, consulting and development.
Everyone appreciates your honesty, until you are honest with them.

#8Oleg Bartunov
oleg@sai.msu.su
In reply to: Alvaro Herrera (#6)
Re: Docbook 5.x

On Tue, May 3, 2016 at 10:34 PM, Alvaro Herrera <alvherre@2ndquadrant.com>
wrote:

Jürgen Purtz wrote:

Hi,
actually we use DocBook V4.2 for the PostgreSQL manuals. I suggest an
upgrade to DocBook 5.x. This sounds simple, but it will be a long process
with many sub-tasks.

Yes, agreed. The killer objection placed last time was that it took
something like 10x longer to generate the HTML using the XML-based
toolchain than the SGML-based ones. If this is not fixed, let's forget
about this whole thing until it is. So, would you time the process
using both toolchains and report back?

As stated in
/messages/by-id/562E061B.1090809@postgrespro.ru
the XML performance can be greatly improved. Alexander, what is the current
state of your patch? How slow is XML compared to SGML?

#9Tom Lane
tgl@sss.pgh.pa.us
In reply to: Joshua D. Drake (#7)
Re: Docbook 5.x

"Joshua D. Drake" <jd@commandprompt.com> writes:

On 05/03/2016 12:34 PM, Alvaro Herrera wrote:

Yes, agreed. The killer objection placed last time was that it took
something like 10x longer to generate the HTML using the XML-based
toolchain than the SGML-based ones. If this is not fixed, let's forget
about this whole thing until it is. So, would you time the process
using both toolchains and report back?

IIRC:
TGL submitted a patch for the openjade bug way back when that caused
that issue.

I think you're thinking of this:
/messages/by-id/24388.1166800682@sss.pgh.pa.us

I do not recall just when/how that got resolved upstream, or if they
ever even responded to me. But it must have been resolved, because the
performance before that was patched was untenable even then, and would be
far more so now considering how much our docs have grown since 2006.
I have not heard anyone complaining lately that PDF output takes three
days to build.

In short, I doubt that that's relevant anymore. If it was, it would
certainly not be favorable to the XML toolchain.

BTW, the thread that that message is embedded in is pretty relevant,
because it was all about yet another lets-switch-to-XML proposal...

regards, tom lane

#10Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#9)
Re: Docbook 5.x

I wrote:

"Joshua D. Drake" <jd@commandprompt.com> writes:

IIRC:
TGL submitted a patch for the openjade bug way back when that caused
that issue.

I think you're thinking of this:
/messages/by-id/24388.1166800682@sss.pgh.pa.us

I do not recall just when/how that got resolved upstream, or if they
ever even responded to me. But it must have been resolved, because the
performance before that was patched was untenable even then, and would be
far more so now considering how much our docs have grown since 2006.

Actually, further digging suggests that Peter found a way to hack our
stylesheets to avoid that openjade bug:

/messages/by-id/200612100315.47269.peter_e@gmx.net
http://git.postgresql.org/gitweb/?p=postgresql.git&a=commitdiff&h=465269b8a

So it's possible that the openjade bug is still there, but has been
defanged for our purposes. In any case, there's still little reason
to think that it would apply to a different toolchain.

regards, tom lane

#11Peter Eisentraut
peter_e@gmx.net
In reply to: Oleg Bartunov (#8)
Re: Docbook 5.x

On 5/3/16 4:13 PM, Oleg Bartunov wrote:

As stated in
/messages/by-id/562E061B.1090809@postgrespro.ru
the XML performance can be greatly improved. Alexander, what is the current
state of your patch? How slow is XML compared to SGML?

Please make sure the patch is registered in the next commit fest.

--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#12Jürgen Purtz
juergen@purtz.de
In reply to: Oleg Bartunov (#8)
Re: Docbook 5.x

Hello,

I measured the following elapsed times on an Intel i5 processor:

1. generate all HTML files with dsl script (make html): 0:48 min.
2. generate all HTML files with xslt script (make xslthtml): 16:01 min.
3. generate all HTML files with xslt script in the new environment
(pure Docbook5): 4:07 min.
4. Generating output via dsl scripts in the new environment
may be possible. But the changelog of the DocBook 5 dsl stylesheets
shows that the last modification occurred in 2004, so this path is a
dead end.
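Expressed relative to the dsl build, these measurements give the following slowdown factors. The small Python sketch below computes them; the dictionary keys are just labels for the three builds.

```python
def to_seconds(m_ss):
    """Convert an 'M:SS' duration such as '16:01' into seconds."""
    minutes, seconds = m_ss.split(":")
    return int(minutes) * 60 + int(seconds)


timings = {
    "dsl (make html)": "0:48",
    "xslt, DocBook 4 (make xslthtml)": "16:01",
    "xslt, pure DocBook 5": "4:07",
}
baseline = to_seconds(timings["dsl (make html)"])  # 48 s
factors = {label: round(to_seconds(t) / baseline, 1) for label, t in timings.items()}

for label, factor in factors.items():
    print(f"{label}: {factor}x")
# dsl (make html): 1.0x
# xslt, DocBook 4 (make xslthtml): 20.0x
# xslt, pure DocBook 5: 5.1x
```

So the pure DocBook 5 XSLT build takes roughly 5 times as long as the current dsl build, and the DocBook 4 XSLT build roughly 20 times as long.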

There is one fundamental difference and many minor differences between 2 and 3.
Solution 2 is based on an XML file and XSLT stylesheets that are based on
DocBook 4. The fundamental difference to 3 is that in 3 everything is DocBook 5
compliant: there are only DocBook 5 XML and XSLT files (my workflow
is: db4 --> xml --> db5 -- (db5 xslt) --> html). The minor differences
concern the fact that there are currently errors in my XML files and
that I applied only a few parameter settings to the standard DocBook 5 XSLT
stylesheets, with no optimization at all.

I used the following tools: Perl, xmllint and xsltproc. osx and OpenJade are
obsolete in the new environment (so far; there is much more work to do).

Jürgen Purtz

On 03.05.2016 22:13, Oleg Bartunov wrote:

#13Tom Lane
tgl@sss.pgh.pa.us
In reply to: Jürgen Purtz (#12)
Re: Docbook 5.x

Jürgen Purtz <juergen@purtz.de> writes:

I measured following elapsed times on an Intel i5 processor:

1. generate all HTML files with dsl script (make html): 0:48 min.
2. generate all HTML files with xslt script (make xslthtml): 16:01 min.
3. generate all HTML files with xslt script in the new environment
(pure Docbook5): 4:07 min.
4. Generating different things via dsl scripts in the new environment
may be possible. But the changelog of the Docbook5 dsl scripts
shows, that the last modification occurred in 2004 - this way is a
dead end.

Ouch. What about output to PDF? While we don't care as much about
that as HTML for day-to-day use, it has to be feasible (ie, not hours).

regards, tom lane

#14Alexander Lakhin
exclusion@gmail.com
In reply to: Jürgen Purtz (#12)
Re: Docbook 5.x

Hello Jürgen,

As was stated in the aforementioned thread, solution 2 can be made much (8x)
faster with some XSLT optimizations, but I think we should now outline
a roadmap before we start preparing patches and so on.
Maybe we should convert to XML with DocBook 4 as a first step?
Then, once we get everything stabilized, we can upgrade to DocBook 5.
Shouldn't we decompose the conversion procedure, so that we could first
perform a fully automatic conversion without any manual changes, and then fix
the invalid constructs you described before?

One more question: is conversion to DocBook 5 your final goal? Or
do you have further plans regarding the documentation, such as
translating it into German?

Best regards,
Alexander

On 04.05.2016 17:44, Jürgen Purtz wrote:

#15Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Jürgen Purtz (#12)
Re: Docbook 5.x

Jürgen Purtz wrote:

I measured following elapsed times on an Intel i5 processor:

1. generate all HTML files with dsl script (make html): 0:48 min.
2. generate all HTML files with xslt script (make xslthtml): 16:01 min.
3. generate all HTML files with xslt script in the new environment
(pure Docbook5): 4:07 min.
4. Generating different things via dsl scripts in the new environment
may be possible. But the changelog of the Docbook5 dsl scripts
shows, that the last modification occurred in 2004 - this way is a
dead end.

Thanks.

The dsl toolchain has a "make html" format which creates the index and a
"make draft" that doesn't. You timed the former only. What's the
timing for an equivalent of "make draft" in the xslt chain? If it
exists and is short enough, it seems acceptable to me that the complete
(with index) build takes ~4x as long as today; the draft timing is more
critical, I would think.

Man pages are already generated using xslt, so I suppose that wouldn't
change. PDF creation timing is also critical.

FWIW, in my laptop "make draft" takes 1m18.788s and a "make html"
takes 1m26.676s. So it's just 8 seconds to generate the SGML file for
the index, and no reruns required ... hmm. I think I'm gonna forget
about "make draft" in the future.

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#16Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Alexander Lakhin (#14)
Re: Docbook 5.x

Alexander Law wrote:

Hello Jürgen,

As was stated in the aforementioned thread, solution 2 can be much (8x)
faster with some xslt optimizations, but I think now we should outline some
roadmap before we start to prepare patches and so.

Can the Docbook5 build be sped up with similar hacks?

If the stylesheet tweaks you did are universally useful, why not
contribute them back to upstream Docbook?

Maybe we should convert to XML with DocBook4 at first step?
Then, once we get everything stabilized, we can upgrade to DocBook5.

Not sure there's much point in having an intermediate step in the
repository that makes the doc build so much slower. I'd rather go to
Docbook5 straight away.

Shouldn't we decompose the conversion procedure, so we could perform fully
automatic conversion without any manual changes, and then fix the invalid
constructs you described before?

I don't think so -- this means leaving a state in the repo in which the
docs don't actually build.

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#17Jürgen Purtz
juergen@purtz.de
In reply to: Tom Lane (#13)
Re: Docbook 5.x

On 04.05.2016 16:51, Tom Lane wrote:

Ouch. What about output to PDF? While we don't care as much about
that as HTML for day-to-day use, it has to be feasible (ie, not hours).

regards, tom lane

I have actually tested fop on single files (the converted sgml
files). This works within seconds, and in my very first mail from
2016-04-20 I added the results for the 'advanced.xml' file. When I try
to convert the complete 'postgres_all.xml' file, fop crashes after some
minutes. As fop is a Java application, it is possible that the memory
assigned to the JVM is insufficient (-Xms, -Xmx, ...), or the crash comes
from some other Java-specific issue. I will work on this in the next few days.
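If the crash is indeed a heap limit, one way to test that is to rerun fop with a larger maximum heap. The Python sketch below assumes the standard `fop` launcher script, which passes the `FOP_OPTS` environment variable on to the JVM; the heap size and the file names (`postgres.fo`, `postgres.pdf`, an FO file produced beforehand by the XSLT step) are hypothetical values chosen for illustration.

```python
import os
import subprocess


def fop_command(fo_file: str, pdf_file: str, max_heap: str = "2g"):
    """Build the fop command line and environment for one PDF run.

    Assumption: the `fop` launcher forwards FOP_OPTS to the JVM.
    Any FOP_OPTS already set in the environment is replaced.
    """
    env = dict(os.environ)
    env["FOP_OPTS"] = f"-Xmx{max_heap}"  # maximum Java heap size
    cmd = ["fop", "-fo", fo_file, "-pdf", pdf_file]
    return cmd, env


def run_fop(fo_file: str, pdf_file: str, max_heap: str = "2g") -> None:
    """Run fop; raises CalledProcessError if fop exits with an error."""
    cmd, env = fop_command(fo_file, pdf_file, max_heap)
    subprocess.run(cmd, env=env, check=True)


# Hypothetical usage:
# run_fop("postgres.fo", "postgres.pdf", max_heap="3g")
```

If the full build still crashes with a generous heap, that would point towards one of the other Java-specific causes rather than memory.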

Jürgen Purtz

#18Tom Lane
tgl@sss.pgh.pa.us
In reply to: Alvaro Herrera (#15)
Re: Docbook 5.x

Alvaro Herrera <alvherre@2ndquadrant.com> writes:

The dsl toolchain has a "make html" format which creates the index and a
"make draft" that doesn't. You timed the former only. What's the
timing for an equivalent of "make draft" in the xslt chain? If it
exists and is short enough, it seems acceptable to me that the complete
(with index) build takes ~4x as long as today; the draft timing is more
critical, I would think.

I would object to that; I don't ever use "make draft", in part because
I frequently want to look at whether the index entries look sensible.
Also, as you noted, the time savings is pretty minimal at present.

regards, tom lane

#19Alexander Lakhin
exclusion@gmail.com
In reply to: Alvaro Herrera (#16)
Re: Docbook 5.x

Hello Alvaro,
04.05.2016 18:21, Alvaro Herrera wrote:

Alexander Law wrote:

Hello Jürgen,

As was stated in the aforementioned thread, solution 2 can be much (8x)
faster with some xslt optimizations, but I think now we should outline a
roadmap before we start to prepare patches and so on.

Can the Docbook5 build be sped up with similar hacks?

If the stylesheet tweaks you did are universally useful, why not
contribute them back to upstream Docbook?

I can't guarantee that these tweaks will work for all DocBook
documents, though I've made sure that the result is the same for the
PostgreSQL HTML docs (as I stated in
/messages/by-id/562E061B.1090809@postgrespro.ru).

Maybe we should convert to XML with DocBook4 as a first step?
Then, once we get everything stabilized, we can upgrade to DocBook5.

Not sure there's much point in having an intermediate step in the
repository that makes the doc build so much slower. I'd rather go to
Docbook5 straight away.

Shouldn't we decompose the conversion procedure, so we could perform fully
automatic conversion without any manual changes, and then fix the invalid
constructs you described before?

I don't think so -- this means leaving a state in the repo in which the
docs don't actually build.

I mean we could build the docs just as we do now (as DocBook4), so we
can continue to use the existing toolchain (the Makefile can already
generate HTML and PDF from XML) and just change the source format for now.

#20Jürgen Purtz
juergen@purtz.de
In reply to: Alexander Lakhin (#14)
Re: Docbook 5.x

On 04.05.2016 17:08, Alexander Law wrote:

As was stated in the aforementioned thread, solution 2 can be much
(8x) faster with some xslt optimizations, but I think now we should
outline a roadmap before we start to prepare patches and so on.
Maybe we should convert to XML with DocBook4 as a first step?
Then, once we get everything stabilized, we can upgrade to DocBook5.
Shouldn't we decompose the conversion procedure, so we could perform
fully automatic conversion without any manual changes, and then fix
the invalid constructs you described before?

Hello Alexander,

I haven't seen your xslt optimization so far. What have you done? Where
can I find the optimized script or a description?

"Divide and conquer" is a good strategy and people use it in many cases.
As you have stated, there are two major steps: from db4-sgml to db4-xml
and from there to db5-xml. In parallel with the second one we have to
migrate from dsl scripts to db5-xslt scripts. Your idea to go step by
step and stabilise at the intermediate level is good in general, but in
this case it may be unnecessary. The first step is very small: it
consists mainly of the elimination of shorttags and SGML-style empty
elements. This is a purely formal act without risk. If we stopped at
this point, people would be forced to switch their environment (e.g.
.emacs) from db4-sgml to db4-xml, and after the second step to db5-xml.
This is possible, but changing twice would probably bring more confusion
than advantages. The real challenge is the second step, as it implies
some manual modifications (entities, markup that is not valid against
the db5 schema) and a switch to a different output chain. Maybe we can
live for a while with some files that are not valid against the db5
schema, as long as the output chain produces correct results.

Jürgen Purtz

#21Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Jürgen Purtz (#20)
#22Alexander Lakhin
exclusion@gmail.com
In reply to: Jürgen Purtz (#20)
#23Jürgen Purtz
juergen@purtz.de
In reply to: Alvaro Herrera (#21)
#24Jürgen Purtz
juergen@purtz.de
In reply to: Alvaro Herrera (#15)
#25Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Jürgen Purtz (#23)
#26Peter Eisentraut
peter_e@gmx.net
In reply to: Alexander Lakhin (#14)
#27Alexander Lakhin
exclusion@gmail.com
In reply to: Peter Eisentraut (#11)
#28Alexander Lakhin
exclusion@gmail.com
In reply to: Peter Eisentraut (#26)
#29Jürgen Purtz
juergen@purtz.de
In reply to: Jürgen Purtz (#17)
#30Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Jürgen Purtz (#29)
#31Jürgen Purtz
juergen@purtz.de
In reply to: Peter Eisentraut (#26)
#32Peter Eisentraut
peter_e@gmx.net
In reply to: Jürgen Purtz (#31)
#33Alexander Lakhin
exclusion@gmail.com
In reply to: Peter Eisentraut (#32)
#34Jürgen Purtz
juergen@purtz.de
In reply to: Peter Eisentraut (#32)
#35Peter Eisentraut
peter_e@gmx.net
In reply to: Jürgen Purtz (#34)
#36Peter Eisentraut
peter_e@gmx.net
In reply to: Alexander Lakhin (#27)
#37Alexander Lakhin
exclusion@gmail.com
In reply to: Peter Eisentraut (#36)
#38Peter Eisentraut
peter_e@gmx.net
In reply to: Alexander Lakhin (#37)
#39Alexander Lakhin
exclusion@gmail.com
In reply to: Peter Eisentraut (#38)
#40Peter Eisentraut
peter_e@gmx.net
In reply to: Alexander Lakhin (#39)
#41Jürgen Purtz
juergen@purtz.de
In reply to: Peter Eisentraut (#26)
#42Alexander Lakhin
exclusion@gmail.com
In reply to: Jürgen Purtz (#41)
#43Jürgen Purtz
juergen@purtz.de
In reply to: Peter Eisentraut (#26)
#44Peter Eisentraut
peter_e@gmx.net
In reply to: Peter Eisentraut (#38)
#45Alexander Lakhin
exclusion@gmail.com
In reply to: Peter Eisentraut (#44)
#46Peter Eisentraut
peter_e@gmx.net
In reply to: Alexander Lakhin (#45)
#47Alexander Lakhin
exclusion@gmail.com
In reply to: Peter Eisentraut (#46)
#48Peter Eisentraut
peter_e@gmx.net
In reply to: Alexander Lakhin (#47)
#49Alexander Lakhin
exclusion@gmail.com
In reply to: Alexander Lakhin (#47)
#50Jürgen Purtz
juergen@purtz.de
In reply to: Alexander Lakhin (#49)
#51Alexander Lakhin
exclusion@gmail.com
In reply to: Jürgen Purtz (#50)
#52Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Jürgen Purtz (#50)
#53Alexander Lakhin
exclusion@gmail.com
In reply to: Alvaro Herrera (#52)
#54Tom Lane
tgl@sss.pgh.pa.us
In reply to: Alexander Lakhin (#53)
#55Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Alexander Lakhin (#53)
#56Alexander Lakhin
exclusion@gmail.com
In reply to: Alvaro Herrera (#55)
#57Alexander Lakhin
exclusion@gmail.com
In reply to: Alvaro Herrera (#52)
#58Jürgen Purtz
juergen@purtz.de
In reply to: Alexander Lakhin (#57)
#59Peter Eisentraut
peter_e@gmx.net
In reply to: Alexander Lakhin (#49)
#60Tom Lane
tgl@sss.pgh.pa.us
In reply to: Peter Eisentraut (#59)
#61Peter Eisentraut
peter_e@gmx.net
In reply to: Tom Lane (#60)
#62Alexander Lakhin
exclusion@gmail.com
In reply to: Peter Eisentraut (#59)
#63Alexander Lakhin
exclusion@gmail.com
In reply to: Alexander Lakhin (#62)
#64Jürgen Purtz
juergen@purtz.de
In reply to: Alexander Lakhin (#63)
#65Alexander Lakhin
exclusion@gmail.com
In reply to: Alexander Lakhin (#63)
#66Jürgen Purtz
juergen@purtz.de
In reply to: Alexander Lakhin (#65)
#67Peter Eisentraut
peter_e@gmx.net
In reply to: Alexander Lakhin (#65)
#68Peter Eisentraut
peter_e@gmx.net
In reply to: Alexander Lakhin (#65)
#69Peter Eisentraut
peter_e@gmx.net
In reply to: Alexander Lakhin (#65)
#70Alexander Lakhin
exclusion@gmail.com
In reply to: Peter Eisentraut (#69)
#71Thomas Munro
thomas.munro@gmail.com
In reply to: Alexander Lakhin (#70)
#72Peter Eisentraut
peter_e@gmx.net
In reply to: Alexander Lakhin (#70)
#73Jürgen Purtz
juergen@purtz.de
In reply to: Peter Eisentraut (#72)
#74Peter Eisentraut
peter_e@gmx.net
In reply to: Jürgen Purtz (#73)
#75Alexander Lakhin
exclusion@gmail.com
In reply to: Jürgen Purtz (#73)
#76Peter Eisentraut
peter_e@gmx.net
In reply to: Alexander Lakhin (#75)
#77Alexander Lakhin
exclusion@gmail.com
In reply to: Peter Eisentraut (#76)
#78Peter Eisentraut
peter_e@gmx.net
In reply to: Alexander Lakhin (#77)
#79Alexander Lakhin
exclusion@gmail.com
In reply to: Peter Eisentraut (#78)
#80Alexander Lakhin
exclusion@gmail.com
In reply to: Alexander Lakhin (#79)
#81Alexander Lakhin
exclusion@gmail.com
In reply to: Jürgen Purtz (#1)
#82Peter Eisentraut
peter_e@gmx.net
In reply to: Alexander Lakhin (#81)
#83Peter Eisentraut
peter_e@gmx.net
In reply to: Jürgen Purtz (#1)
#84Alexander Lakhin
exclusion@gmail.com
In reply to: Peter Eisentraut (#83)
#85Peter Eisentraut
peter_e@gmx.net
In reply to: Alexander Lakhin (#84)
#86Alexander Lakhin
exclusion@gmail.com
In reply to: Peter Eisentraut (#85)