
# Migration of PG's documentation from DocBook 4.5 to DocBook 5.2

The migration from DocBook 4.x to 5.x is a huge step that changes most
of PG's sgml files. DocBook supports the migration with some scripts;
see https://docbook.org/docs/howto/howto.html.

However, PG's documentation does not meet all prerequisites for using DocBook's
scripts directly. Therefore, db4-upgrade.xsl is slightly modified (see the comments
starting with 'jup'), and additional bash, Perl, and sed commands solve generic and
file-specific problems. This is more or less manual work. To be able to repeat these
changes at any point in time, all of them are done within scripts.


## Major DocBook changes

- The DOCTYPE declaration is discontinued. Instead, an XML namespace
(http://docbook.org/ns/docbook) uniquely identifies DocBook elements.
- DTDs (and the XSD schema) are discontinued. Instead, documents are validated against
a RELAX NG schema.
- Some element names change, mainly to adopt XML conventions and standards; for example,
ulink is replaced by link with an xlink:href attribute.
The content models of some elements are narrowed down and defined more precisely.
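
Because DocBook 5 files are identified by the namespace rather than by a DOCTYPE, a quick text search can show which files still look like DocBook 4.5. The following is a minimal heuristic sketch, not part of the migration scripts: the function name `docbook_version` is hypothetical, and it only greps for the two markers without parsing or validating any XML. Files that carry neither marker, such as included chapter fragments without their own DOCTYPE, report `unknown`.

```shell
#!/bin/sh
# Hypothetical helper: report which DocBook generation a file appears to use.
# Heuristic only: it looks for the DocBook 5 namespace URI first, then for a
# DocBook 4.x DOCTYPE public identifier; it does not parse or validate XML.
docbook_version() {  # $1 = file to inspect
    if grep -q 'xmlns="http://docbook.org/ns/docbook"' "$1"; then
        echo 5
    elif grep -q '<!DOCTYPE.*DocBook XML V4' "$1"; then
        echo 4
    else
        echo unknown
    fi
}
```

For example, `for f in "$ToSgmlDir"/*.sgml; do echo "$f: $(docbook_version "$f")"; done` lists the detected generation per file.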


## Migration steps

The migration is driven by conv.sh. The script uses three directories: all scripts and
other necessary files are located in **$ToolDir**, the existing sgml files in
**$FromSgmlDir**, and the migrated ones in **$ToSgmlDir**.

1. Preparation: The git tree of the complete PG source is copied to a different location,
   so that git can be used after any intermediate step to inspect the changes made so far.
2. Migration: Apply some standard modifications to every file to make it XML conformant,
   plus a few individual changes per file. Then perform the standard DocBook migration.
3. Changes: Apply some standard changes to a few files and many individual changes
   to many files.
4. Validation: Validate against the RELAX NG schema. This is done with jing because
   the error messages produced by xmllint are not helpful.
5. Comparison: Check the results by comparing the old and new sgml and html files with diff.
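
The steps above could be organized roughly as in the following sketch. It is not the actual conv.sh: apart from **$ToolDir**, **$FromSgmlDir**, **$ToSgmlDir**, and db4-upgrade.xsl, all names are hypothetical, the default paths are placeholders, and the schema file name `docbook.rng` inside **$ToolDir** is an assumption. The upgrade step assumes that each input file can be parsed on its own, and step 3 (the individual sed/Perl edits) is left out.

```shell
#!/bin/sh
# Hypothetical skeleton of a conv.sh-style driver; NOT the real conv.sh.
# Default paths are placeholders; override them via the environment.
ToolDir=${ToolDir:-./tools}
FromSgmlDir=${FromSgmlDir:-./pg/doc/src/sgml}
ToSgmlDir=${ToSgmlDir:-./pg-db5/doc/src/sgml}

# Step 1: copy the complete source tree (including .git) to a new location,
# so that "git diff" there shows the changes after every intermediate step.
prepare() {  # $1 = source tree, $2 = target tree (must not exist yet)
    if [ -e "$2" ]; then
        echo "target already exists: $2" >&2
        return 1
    fi
    cp -R "$1" "$2"
}

# Step 2: run the (modified) DocBook upgrade stylesheet on one file.
upgrade_file() {  # $1 = input file, $2 = output file
    xsltproc -o "$2" "$ToolDir/db4-upgrade.xsl" "$1"
}

# Step 4: validate one file against the DocBook 5.2 RELAX NG schema (assumed path).
validate_file() {  # $1 = file to validate
    jing "$ToolDir/docbook.rng" "$1"
}

# Step 5: compare old and new trees; diff exits 0 only if they are identical.
compare_trees() {  # $1 = old directory, $2 = new directory
    diff -r "$1" "$2"
}

# Example: upgrade_file "$FromSgmlDir/intro.sgml" "$ToSgmlDir/intro.sgml"
```

Keeping each step in its own function makes it possible to rerun a single step after fixing a script, which matches the goal of performing all changes reproducibly.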


## Introduction of a new tool

In the past, we used the tool **xmllint** to validate the sgml files against the DocBook
DTD. This worked well. Its validation against a RELAX NG schema also works well as long
as the file is valid. But if an sgml file violates the RELAX NG schema, the resulting
error messages are more confusing than helpful.

Therefore, we should consider introducing another validator. During the migration phase,
we used **jing**: it is written in Java, it is fast, and its error messages are very precise.
But there are many others (https://relaxng.org/#validators). Should doc/src/sgml/Makefile
perhaps support multiple validators?
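
One possible shape for supporting multiple validators is a small wrapper that prefers jing and falls back to xmllint. This is a hedged sketch only: the function `validate_rng` and the `VALIDATOR` override variable are hypothetical and do not exist in doc/src/sgml/Makefile; only the tool invocations themselves (`jing schema file` and `xmllint --noout --relaxng schema file`) are standard usage.

```shell
#!/bin/sh
# Hypothetical wrapper: validate an XML file against a RELAX NG schema with
# whichever validator is available. jing is preferred for its precise error
# messages; xmllint is the fallback. VALIDATOR=jing|xmllint forces a choice.
validate_rng() {  # $1 = RELAX NG schema (.rng), $2 = XML file
    validator=${VALIDATOR:-}
    if [ -z "$validator" ]; then
        if command -v jing >/dev/null 2>&1; then
            validator=jing
        elif command -v xmllint >/dev/null 2>&1; then
            validator=xmllint
        else
            echo "no RELAX NG validator found" >&2
            return 2
        fi
    fi
    case $validator in
        jing)    jing "$1" "$2" ;;
        xmllint) xmllint --noout --relaxng "$1" "$2" ;;
        *)       echo "unknown validator: $validator" >&2; return 2 ;;
    esac
}
```

Both tools exit with a non-zero status on validation errors, so a make rule calling such a wrapper would fail in either case; only the quality of the messages differs.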


### Installation of **jing** on Ubuntu

    sudo apt-get install jing
    sudo apt-get install libavalon-framework-java  # possibly more packages are needed
    export JAVA_HOME="....."                       # adapt to your system
    export JAVA_CMD="$JAVA_HOME/bin/java"          # path of the java executable


## Problems

Single-page and multi-page HTML output can be generated.

However, the generation of **pdf** and **epub** files currently has unacceptable
runtime behavior. An intentionally reduced postgres.sgml file (producing up to
about 100 pages of output) creates the expected pdf and epub output.


## ToDo

- Adaptation of doc/src/sgml/Makefile
- Additional CSS definitions ???
- Adaptation of Appendix J: Documentation
- Adaptation of README.link
- Old release notes ???


## Forecast

Entities: We use **character entities** (e.g. \&mdash;) as well as **parameter entities**
(e.g. %filelist;). Using character entities instead of hexadecimal character references or
literal Unicode characters is helpful because it makes the source more readable for authors.
In theory, parameter entities could be replaced by the more XML-conformant XInclude
mechanism. But this is not possible without major changes to most files:
 - Every xml/sgml file must be well-formed XML on its own; in particular, it needs a single
   root element.
 - Every xml/sgml file must redeclare the namespace(s) it uses. The reason is that parameter
   entities perform plain text substitution, whereas xi:include parses each file into a tree
   of its own and combines these trees. When the subtrees are combined, namespace declarations
   are intentionally not inherited: each file knows only the namespaces it declares itself.
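
To illustrate both requirements, the following sketch writes a hypothetical two-file book (`book.xml` and `intro.xml`; the function and file names are invented for this example). The included file is a complete XML document with a single root element and its own DocBook namespace declaration. Without that declaration, its elements would end up in no namespace after inclusion, even though the including `book` element declares the DocBook namespace.

```shell
#!/bin/sh
# Hypothetical demo: with XInclude, every included file is a complete XML
# document with one root element and its own namespace declaration, unlike
# a parameter entity, which is pasted into the including file as plain text.
make_xinclude_demo() {  # $1 = existing directory to write the demo files into
    cat > "$1/intro.xml" <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<!-- single root element; the DocBook namespace must be declared again here -->
<chapter xmlns="http://docbook.org/ns/docbook" version="5.2">
  <title>Introduction</title>
  <para>Text.</para>
</chapter>
EOF
    cat > "$1/book.xml" <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<book xmlns="http://docbook.org/ns/docbook"
      xmlns:xi="http://www.w3.org/2001/XInclude" version="5.2">
  <title>Demo</title>
  <xi:include href="intro.xml"/>
</book>
EOF
}
```

For example, `make_xinclude_demo /tmp/demo && xmllint --xinclude --noout /tmp/demo/book.xml` resolves the include and checks that the combined document is well-formed.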


