fixing bookindex.html bloat
Hi,
Sometime last year I was surprised to see (not on a public list unfortunately)
that bookindex.html is 657kB, with > 200kB just being repetitions of
xmlns="http://www.w3.org/1999/xhtml" xmlns:xlink="http://www.w3.org/1999/xlink"
Reminded of this, due to a proposal to automatically generate docs as part of
cfbot runs (which'd be fairly likely to update bookindex.html), I spent a few
painful hours last night trying to track this down.
The reasons for the two xmlns= are different. The
xmlns="http://www.w3.org/1999/xhtml" is afaict caused by confusion on our
part.
Some of our stylesheets use
xmlns="http://www.w3.org/TR/xhtml1/transitional"
others use
xmlns="http://www.w3.org/1999/xhtml"
It's noteworthy that the docbook xsl stylesheets end up with
<html xmlns="http://www.w3.org/1999/xhtml">
so it's a bit pointless to reference http://www.w3.org/TR/xhtml1/transitional
afaict.
Adding xmlns="http://www.w3.org/1999/xhtml" to stylesheet-html-common.xsl gets
rid of xmlns="http://www.w3.org/TR/xhtml1/transitional" in bookindex specific
content.
Changing stylesheet.xsl from transitional to http://www.w3.org/1999/xhtml gets
rid of xmlns="http://www.w3.org/TR/xhtml1/transitional" in navigation/footer.
Of course we should likely change all http://www.w3.org/TR/xhtml1/transitional
references, rather than just the one necessary to get rid of the xmlns= spam.
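To illustrate why the mismatch matters: namespace URIs are compared as plain strings, so an element declared with the transitional URI that sits inside docbook's http://www.w3.org/1999/xhtml output is in a different namespace, and a serializer has to write a fresh xmlns= at every switch. A minimal sketch using Python's standard xml.etree.ElementTree; the sample markup (including class="navfooter" and the link) is made up for illustration and is not taken from the generated docs:

```python
import xml.etree.ElementTree as ET

XHTML = "http://www.w3.org/1999/xhtml"
TRANSITIONAL = "http://www.w3.org/TR/xhtml1/transitional"

# Hypothetical fragment: outer element as the docbook stylesheets would emit it,
# inner div as a customization declaring the other URI would emit it, and a
# child that switches back to the docbook namespace.
sample = (
    f'<html xmlns="{XHTML}">'
    f'<div xmlns="{TRANSITIONAL}" class="navfooter">'
    f'<a xmlns="{XHTML}" href="index.html">Home</a>'
    '</div>'
    '</html>'
)

root = ET.fromstring(sample)
div = root[0]
link = div[0]

def namespace_of(element):
    """Return the namespace URI from ElementTree's '{uri}local' tag form."""
    return element.tag[1:].split("}", 1)[0]

# Each element reports the namespace it actually ended up in.
for element in (root, div, link):
    print(element.tag.split("}", 1)[1], namespace_of(element))
```

The div and its parent report different URIs, which is why neither can inherit the other's default namespace declaration.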
So far, so easy. It took me way longer to understand what's causing all the
xmlns:xlink= appearances.
For a long time I was misdirected because if I remove the <xsl:template
name="generate-basic-index"> in stylesheet-html-common.xsl, the number of
xmlns:xlink drastically reduces to a handful. Which made me think that their
existence is somehow our fault. And I tried and tried to find the cause.
But it turns out that this originally is caused by a still existing buglet in
the docbook xsl stylesheets, specifically autoidx.xsl. It doesn't list xlink
in exclude-result-prefixes, but uses ids etc from xlink.
The reason that we end up with so many more xmlns:xlink is just that without
our customization there ends up being a single
<div xmlns:xlink="http://www.w3.org/1999/xlink" class="index">
and then everything below that doesn't need the xmlns:xlink anymore. But
because stylesheet-html-common.xsl emits the div, the xmlns:xlink is emitted
for each element that autoidx.xsl has "control" over.
Waiting for docbook to fix this seems a bit futile, I eventually found a
bugreport about this, from 2016: https://sourceforge.net/p/docbook/bugs/1384/
But we can easily reduce the "impact" of the issue, by just adding a single
xmlns:xlink to <div class="index">, which is sufficient to convince xsltproc
to not repeat it.
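The effect described above can be sketched with a toy serializer. This is only a model of the general XML rule that a namespace needs declaring only where no ancestor has already declared it; it is not libxslt's actual code, and the Node class, the dt children and the entry count are hypothetical stand-ins for what autoidx.xsl generates:

```python
XLINK = ("xlink", "http://www.w3.org/1999/xlink")

class Node:
    """Toy result-tree element: a name, its namespace nodes, and children."""
    def __init__(self, name, namespaces=(), children=()):
        self.name = name
        self.namespaces = frozenset(namespaces)  # (prefix, uri) pairs
        self.children = list(children)

def serialize(node, in_scope=frozenset()):
    """Write xmlns:prefix only for namespaces no ancestor has declared yet."""
    new = sorted(node.namespaces - in_scope)
    decls = "".join(f' xmlns:{prefix}="{uri}"' for prefix, uri in new)
    scope = in_scope | node.namespaces
    inner = "".join(serialize(child, scope) for child in node.children)
    return f"<{node.name}{decls}>{inner}</{node.name}>"

def index(div_namespaces, entries=3):
    # The children stand in for elements produced by autoidx.xsl, which all
    # carry the xlink namespace; the div stands in for the one we emit.
    kids = [Node("dt", [XLINK]) for _ in range(entries)]
    return Node("div", div_namespaces, kids)

without_fix = serialize(index([]))       # our div does not declare xlink
with_fix = serialize(index([XLINK]))     # our div declares xlink once

print(without_fix.count("xmlns:xlink"), with_fix.count("xmlns:xlink"))
```

In this model the declaration count goes from one per child to one in total, which matches the behaviour reported for xsltproc above.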
Before:
-rw-r--r-- 1 andres andres 683139 Feb 13 04:31 html-broken/bookindex.html
After:
-rw-r--r-- 1 andres andres 442923 Feb 13 12:03 html/bookindex.html
While most of the savings are in bookindex, the rest of the files are reduced
by another ~100kB.
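For anyone wanting to check numbers like these on their own build, here is a rough sketch that counts the xmlns declarations in a chunk of HTML and the bytes they take up. The regular expression and the helper name xmlns_overhead are assumptions of this sketch, not part of the patch, and the sample string is made up; only the two file sizes come from the listing above:

```python
import re

# Matches xmlns="..." and xmlns:prefix="...", including the leading whitespace.
XMLNS_RE = re.compile(r'\s+xmlns(?::[A-Za-z_][\w.-]*)?="[^"]*"')

def xmlns_overhead(html):
    """Return (number of declarations, bytes they occupy) for an HTML string."""
    matches = XMLNS_RE.findall(html)
    return len(matches), sum(len(m.encode("utf-8")) for m in matches)

# Made-up sample; for a real check, read html/bookindex.html instead.
sample = (
    '<html xmlns="http://www.w3.org/1999/xhtml">'
    '<dt xmlns:xlink="http://www.w3.org/1999/xlink">a</dt>'
    '<dt xmlns:xlink="http://www.w3.org/1999/xlink">b</dt>'
    '</html>'
)
count, size = xmlns_overhead(sample)
print(count, size)

# File sizes reported above, in bytes.
before, after = 683139, 442923
saved = before - after
print(saved, round(100 * saved / before, 1))
```

Running xmlns_overhead over the before and after versions of bookindex.html should show how much of the difference is due to the declarations alone.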
WIP patch attached. For now I just adjusted the minimal set of
xmlns="http://www.w3.org/TR/xhtml1/transitional", but I think we should update
all.
Greetings,
Andres Freund
Attachments:
pg-html-stylesheet.diff (text/x-diff; charset=us-ascii)
diff --git i/doc/src/sgml/stylesheet-html-common.xsl w/doc/src/sgml/stylesheet-html-common.xsl
index d9961089c65..9f69af40a94 100644
--- i/doc/src/sgml/stylesheet-html-common.xsl
+++ w/doc/src/sgml/stylesheet-html-common.xsl
@@ -4,6 +4,7 @@
%common.entities;
]>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
+ xmlns="http://www.w3.org/1999/xhtml"
version="1.0">
<!--
@@ -126,7 +127,11 @@ set toc,title
&uppercase;),
substring(&primary;, 1, 1)))]"/>
- <div class="index">
+ <!-- pgsql-docs: added xmlns:xlink, autoidx.xsl doesn't include xlink in
+ exclude-result-prefixes. Without our customization that just leads to a
+ single xmlns:xlink in this div, but because we emit it it otherwise
+ gets pushed down to the elements output by autoidx.xsl -->
+ <div class="index" xmlns:xlink="http://www.w3.org/1999/xlink">
<!-- pgsql-docs: begin added stuff -->
<p class="indexdiv-quicklinks">
<a href="#indexdiv-Symbols">
diff --git i/doc/src/sgml/stylesheet.xsl w/doc/src/sgml/stylesheet.xsl
index 0eac594f0cc..24a9481fd49 100644
--- i/doc/src/sgml/stylesheet.xsl
+++ w/doc/src/sgml/stylesheet.xsl
@@ -1,7 +1,7 @@
<?xml version='1.0'?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
version='1.0'
- xmlns="http://www.w3.org/TR/xhtml1/transitional"
+ xmlns="http://www.w3.org/1999/xhtml"
exclude-result-prefixes="#default">
<xsl:import href="http://docbook.sourceforge.net/release/xsl/current/xhtml/chunk.xsl"/>
Hi,
On 2022-02-13 12:16:18 -0800, Andres Freund wrote:
Waiting for docbook to fix this seems a bit futile, I eventually found a
bugreport about this, from 2016:
https://sourceforge.net/p/docbook/bugs/1384/
Now also reported to the current repo:
https://github.com/docbook/xslt10-stylesheets/issues/239
While there have been semi-regular changes / fixes, they've not done a release in
years... So even if they fix it, it'll likely not trickle down into distro
packages etc anytime soon, if ever.
Greetings,
Andres Freund
On 13.02.22 21:16, Andres Freund wrote:
The reasons for the two xmlns= are different. The
xmlns="http://www.w3.org/1999/xhtml" is afaict caused by confusion on our
part.
Some of our stylesheets use
xmlns="http://www.w3.org/TR/xhtml1/transitional"
others use
xmlns="http://www.w3.org/1999/xhtml"
It's noteworthy that the docbook xsl stylesheets end up with
<html xmlns="http://www.w3.org/1999/xhtml">
so it's a bit pointless to reference http://www.w3.org/TR/xhtml1/transitional
afaict.
Adding xmlns="http://www.w3.org/1999/xhtml" to stylesheet-html-common.xsl gets
rid of xmlns="http://www.w3.org/TR/xhtml1/transitional" in bookindex specific
content.
Changing stylesheet.xsl from transitional to http://www.w3.org/1999/xhtml gets
rid of xmlns="http://www.w3.org/TR/xhtml1/transitional" in navigation/footer.
Of course we should likely change all http://www.w3.org/TR/xhtml1/transitional
references, rather than just the one necessary to get rid of the xmlns= spam.
Yeah, that is currently clearly wrong. It appears I originally copied
the wrong namespace declarations from examples that show how to
customize the DocBook stylesheets, but those examples were apparently
wrong or outdated in this respect. It seems we also lack some namespace
declarations altogether, as shown by your need to add it to
stylesheet-html-common.xsl. This appears to need some careful cleanup.
The reason that we end up with so many more xmlns:xlink is just that without
our customization there ends up being a single
<div xmlns:xlink="http://www.w3.org/1999/xlink" class="index">
and then everything below that doesn't need the xmlns:xlink anymore. But
because stylesheet-html-common.xsl emits the div, the xmlns:xlink is emitted
for each element that autoidx.xsl has "control" over.
Waiting for docbook to fix this seems a bit futile, I eventually found a
bugreport about this, from 2016: https://sourceforge.net/p/docbook/bugs/1384/
But we can easily reduce the "impact" of the issue, by just adding a single
xmlns:xlink to <div class="index">, which is sufficient to convince xsltproc
to not repeat it.
I haven't fully wrapped my head around this. I tried adding xlink to
our own exclude-result-prefixes, but that didn't seem to have the right
effect.
Hi,
On 2022-02-14 18:31:25 +0100, Peter Eisentraut wrote:
The reason that we end up with so many more xmlns:xlink is just that without
our customization there ends up being a single
<div xmlns:xlink="http://www.w3.org/1999/xlink" class="index">
and then everything below that doesn't need the xmlns:xlink anymore. But
because stylesheet-html-common.xsl emits the div, the xmlns:xlink is emitted
for each element that autoidx.xsl has "control" over.
Waiting for docbook to fix this seems a bit futile, I eventually found a
bugreport about this, from 2016: https://sourceforge.net/p/docbook/bugs/1384/
But we can easily reduce the "impact" of the issue, by just adding a single
xmlns:xlink to <div class="index">, which is sufficient to convince xsltproc
to not repeat it.
I haven't fully wrapped my head around this. I tried adding xlink to our
own exclude-result-prefixes, but that didn't seem to have the right effect.
It can't, because it's not one of our stylesheets that causes the xlink: stuff
to be included. It's autoidx.xsl - just adding xlink to autoidx's
exclude-result-prefixes fixes the problem "properly", but we can't really
modify it.
The reason adding xmlns:xlink to our div (or even higher up) helps is that
then nodes below it don't need to include it again (when output by autoidx),
which drastically reduces the number of xmlns:xlink. So it's just a somewhat
ugly workaround, but for >100kB it seems better than the alternative.
Greetings,
Andres Freund
On 14.02.22 18:31, Peter Eisentraut wrote:
Yeah, that is currently clearly wrong. It appears I originally copied
the wrong namespace declarations from examples that show how to
customize the DocBook stylesheets, but those examples were apparently
wrong or outdated in this respect. It seems we also lack some namespace
declarations altogether, as shown by your need to add it to
stylesheet-html-common.xsl. This appears to need some careful cleanup.
The attached patch cleans up the xhtml namespace declarations properly,
I think.
For the xlink business, I don't have a better idea than you, so your
workaround proposal seems fine.
Attachments:
0001-Fix-XML-namespace-declarations.patch (text/plain; charset=UTF-8)
From 45d361e0bc7bd89b41880eb83cdccabf5626b71c Mon Sep 17 00:00:00 2001
From: Peter Eisentraut <peter@eisentraut.org>
Date: Mon, 14 Feb 2022 22:56:11 +0100
Subject: [PATCH] Fix XML namespace declarations
---
doc/src/sgml/stylesheet-hh.xsl | 4 +---
doc/src/sgml/stylesheet-html-common.xsl | 3 ++-
doc/src/sgml/stylesheet-html-nochunk.xsl | 4 +---
doc/src/sgml/stylesheet-text.xsl | 3 +--
doc/src/sgml/stylesheet.xsl | 3 +--
5 files changed, 6 insertions(+), 11 deletions(-)
diff --git a/doc/src/sgml/stylesheet-hh.xsl b/doc/src/sgml/stylesheet-hh.xsl
index 1b1ab4bbe9..6f4b706dac 100644
--- a/doc/src/sgml/stylesheet-hh.xsl
+++ b/doc/src/sgml/stylesheet-hh.xsl
@@ -1,8 +1,6 @@
<?xml version='1.0'?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
- version='1.0'
- xmlns="http://www.w3.org/TR/xhtml1/transitional"
- exclude-result-prefixes="#default">
+ version='1.0'>
<xsl:import href="http://docbook.sourceforge.net/release/xsl/current/htmlhelp/htmlhelp.xsl"/>
<xsl:include href="stylesheet-common.xsl" />
diff --git a/doc/src/sgml/stylesheet-html-common.xsl b/doc/src/sgml/stylesheet-html-common.xsl
index d9961089c6..96dd2cc038 100644
--- a/doc/src/sgml/stylesheet-html-common.xsl
+++ b/doc/src/sgml/stylesheet-html-common.xsl
@@ -4,7 +4,8 @@
%common.entities;
]>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
- version="1.0">
+ version="1.0"
+ xmlns="http://www.w3.org/1999/xhtml">
<!--
This file contains XSLT stylesheet customizations that are common to
diff --git a/doc/src/sgml/stylesheet-html-nochunk.xsl b/doc/src/sgml/stylesheet-html-nochunk.xsl
index 78add26a9f..8167127b93 100644
--- a/doc/src/sgml/stylesheet-html-nochunk.xsl
+++ b/doc/src/sgml/stylesheet-html-nochunk.xsl
@@ -1,8 +1,6 @@
<?xml version='1.0'?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
- version='1.0'
- xmlns="http://www.w3.org/TR/xhtml1/transitional"
- exclude-result-prefixes="#default">
+ version='1.0'>
<xsl:import href="http://docbook.sourceforge.net/release/xsl/current/xhtml/docbook.xsl"/>
<xsl:include href="stylesheet-common.xsl" />
diff --git a/doc/src/sgml/stylesheet-text.xsl b/doc/src/sgml/stylesheet-text.xsl
index 476b871870..529cc9ec38 100644
--- a/doc/src/sgml/stylesheet-text.xsl
+++ b/doc/src/sgml/stylesheet-text.xsl
@@ -1,8 +1,7 @@
<?xml version='1.0'?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
version='1.0'
- xmlns="http://www.w3.org/TR/xhtml1/transitional"
- exclude-result-prefixes="#default">
+ xmlns="http://www.w3.org/1999/xhtml">
<xsl:import href="http://docbook.sourceforge.net/release/xsl/current/xhtml/docbook.xsl"/>
<xsl:import href="stylesheet-common.xsl" />
diff --git a/doc/src/sgml/stylesheet.xsl b/doc/src/sgml/stylesheet.xsl
index 0eac594f0c..b6141303ab 100644
--- a/doc/src/sgml/stylesheet.xsl
+++ b/doc/src/sgml/stylesheet.xsl
@@ -1,8 +1,7 @@
<?xml version='1.0'?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
version='1.0'
- xmlns="http://www.w3.org/TR/xhtml1/transitional"
- exclude-result-prefixes="#default">
+ xmlns="http://www.w3.org/1999/xhtml">
<xsl:import href="http://docbook.sourceforge.net/release/xsl/current/xhtml/chunk.xsl"/>
<xsl:include href="stylesheet-common.xsl" />
--
2.35.1
Hi,
On 2022-02-14 23:06:20 +0100, Peter Eisentraut wrote:
The attached patch cleans up the xhtml namespace declarations properly, I
think.
Looks good to me.
For the xlink business, I don't have a better idea than you, so your
workaround proposal seems fine.
K. Will you apply your patch, and then I'll push mine on top?
Greetings,
Andres Freund
On 15.02.22 00:06, Andres Freund wrote:
On 2022-02-14 23:06:20 +0100, Peter Eisentraut wrote:
The attached patch cleans up the xhtml namespace declarations properly, I
think.
Looks good to me.
For the xlink business, I don't have a better idea than you, so your
workaround proposal seems fine.
K. Will you apply your patch, and then I'll push mine on top?
done
On 2022-02-15 11:16:12 +0100, Peter Eisentraut wrote:
On 15.02.22 00:06, Andres Freund wrote:
On 2022-02-14 23:06:20 +0100, Peter Eisentraut wrote:
For the xlink business, I don't have a better idea than you, so your
workaround proposal seems fine.
K. Will you apply your patch, and then I'll push mine on top?
done
done as well.