split func.sgml to separated individual sgml files

Started by jian he about 1 year ago, 44 messages
#1jian he
jian.universality@gmail.com
2 attachment(s)

hi.

moving this to a new thread,
since in the old thread [1] many things have become mixed together.

we are going to split func.sgml into 31 individual sgml files.
the new file names use "func-" as the prefix.
all the func-*.sgml files are stored in doc/src/sgml/func.
each new file corresponds to a begin line and an end line
in the original func.sgml,
which is why the python script is long.

python3 v1-0001-split_func_sgml.py
git apply v1-0001-all-filelist-for-directory-doc-src-sgml-func.patch
execute the above two commands, and we should be good to go.

The following is step-by-step logic.

#--step0 get line number info, and validate it.
in func.sgml, we have 62 "sect1"; these correspond to the begin line
and end line of each new sgml file.
so in func.sgml, we locate and validate them.
later we use the sed command to do the copy, which needs this line
number information.

#--step1. create doc/src/sgml/func directory, move func.sgml to there.
create each new empty individual sgml file.

#---step2 construct sed copy commands and execute them.
This will roughly copy func.sgml content from line 52 to line 31655 to
the corresponding new individual sgml files, based on
the line number information.

#-----step3 validate that each new file has only 2 "sect1". also
validate that "<sect1 id.*" is unique across the new files.
just to make sure the output works fine.

#---step4 truncate func.sgml beginning from line 52.

#---step5 append place-holder string to func.sgml
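
For illustration only, the step3 check could look roughly like the following Python sketch. It is not taken from the attached script; the function name check_split_files and the in-memory dict of file contents are assumptions made for this example.

```python
import re
from collections import Counter

# Captures the id attribute of an opening sect1 tag.
SECT1_ID = re.compile(r'<sect1 id="([^"]+)"')

def check_split_files(files):
    """files maps a file name to its text; return a list of problems (empty if OK)."""
    problems = []
    ids = Counter()
    for name, text in files.items():
        # One opening <sect1 ...> plus one closing </sect1> gives two "sect1" hits.
        hits = text.count("sect1")
        if hits != 2:
            problems.append(f"{name}: expected 2 'sect1', found {hits}")
        ids.update(SECT1_ID.findall(text))
    for sect_id, count in ids.items():
        if count > 1:
            problems.append(f"duplicate id {sect_id} ({count} times)")
    return problems
```

An empty result would mean every file has exactly one opening and one closing sect1 and no id is reused.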

[1]: /messages/by-id/CA+TgmoZ2F+K0j=6BOJLD=YfpJMdJRXC7sWmtXGRjx1Rq0x8PUA@mail.gmail.com

Attachments:

v1-0001-split_func_sgml.py (text/x-python; charset=US-ASCII)
v1-0001-all-filelist-for-directory-doc-src-sgml-func.patch (text/x-patch; charset=US-ASCII)
From d123a7c9ef6ad45e3b697aa20bcfc831f594b45d Mon Sep 17 00:00:00 2001
From: jian he <jian.universality@gmail.com>
Date: Sun, 21 Jul 2024 20:43:45 +0800
Subject: [PATCH v6 1/1] all filelist for directory doc/src/sgml/func

---
 doc/src/sgml/filelist.sgml      |  5 ++++-
 doc/src/sgml/func/allfiles.sgml | 40 +++++++++++++++++++++++++++++++++
 2 files changed, 44 insertions(+), 1 deletion(-)
 create mode 100644 doc/src/sgml/func/allfiles.sgml

diff --git a/doc/src/sgml/filelist.sgml b/doc/src/sgml/filelist.sgml
index a7ff5f82..d9f36933 100644
--- a/doc/src/sgml/filelist.sgml
+++ b/doc/src/sgml/filelist.sgml
@@ -17,7 +17,10 @@
 <!ENTITY datatype   SYSTEM "datatype.sgml">
 <!ENTITY ddl        SYSTEM "ddl.sgml">
 <!ENTITY dml        SYSTEM "dml.sgml">
-<!ENTITY func       SYSTEM "func.sgml">
+
+<!ENTITY % allfiles_func   SYSTEM "func/allfiles.sgml">
+%allfiles_func;
+
 <!ENTITY indices    SYSTEM "indices.sgml">
 <!ENTITY json       SYSTEM "json.sgml">
 <!ENTITY mvcc       SYSTEM "mvcc.sgml">
diff --git a/doc/src/sgml/func/allfiles.sgml b/doc/src/sgml/func/allfiles.sgml
new file mode 100644
index 00000000..34eec608
--- /dev/null
+++ b/doc/src/sgml/func/allfiles.sgml
@@ -0,0 +1,40 @@
+<!--
+doc/src/sgml/func/allfiles.sgml
+PostgreSQL documentation
+Complete list of usable sgml source files in this directory.
+-->
+
+<!-- function references -->
+
+<!ENTITY func                       SYSTEM "func.sgml">
+<!ENTITY func-logical               SYSTEM "func-logical.sgml">
+<!ENTITY func-comparison            SYSTEM "func-comparison.sgml">
+<!ENTITY func-math                  SYSTEM "func-math.sgml">
+<!ENTITY func-string                SYSTEM "func-string.sgml">
+<!ENTITY func-binarystring          SYSTEM "func-binarystring.sgml">
+<!ENTITY func-bitstring             SYSTEM "func-bitstring.sgml">
+<!ENTITY func-matching              SYSTEM "func-matching.sgml">
+<!ENTITY func-formatting            SYSTEM "func-formatting.sgml">
+<!ENTITY func-datetime              SYSTEM "func-datetime.sgml">
+<!ENTITY func-enum                  SYSTEM "func-enum.sgml">
+<!ENTITY func-geometry              SYSTEM "func-geometry.sgml">
+<!ENTITY func-net                   SYSTEM "func-net.sgml">
+<!ENTITY func-textsearch            SYSTEM "func-textsearch.sgml">
+<!ENTITY func-uuid                  SYSTEM "func-uuid.sgml">
+<!ENTITY func-xml                   SYSTEM "func-xml.sgml">
+<!ENTITY func-json                  SYSTEM "func-json.sgml">
+<!ENTITY func-sequence              SYSTEM "func-sequence.sgml">
+<!ENTITY func-conditional           SYSTEM "func-conditional.sgml">
+<!ENTITY func-array                 SYSTEM "func-array.sgml">
+<!ENTITY func-range                 SYSTEM "func-range.sgml">
+<!ENTITY func-aggregate             SYSTEM "func-aggregate.sgml">
+<!ENTITY func-window                SYSTEM "func-window.sgml">
+<!ENTITY func-merge-support         SYSTEM "func-merge-support.sgml">
+<!ENTITY func-subquery              SYSTEM "func-subquery.sgml">
+<!ENTITY func-comparisons           SYSTEM "func-comparisons.sgml">
+<!ENTITY func-srf                   SYSTEM "func-srf.sgml">
+<!ENTITY func-info                  SYSTEM "func-info.sgml">
+<!ENTITY func-admin                 SYSTEM "func-admin.sgml">
+<!ENTITY func-trigger               SYSTEM "func-trigger.sgml">
+<!ENTITY func-event-triggers        SYSTEM "func-event-triggers.sgml">
+<!ENTITY func-statistics            SYSTEM "func-statistics.sgml">
\ No newline at end of file
-- 
2.34.1

#2Corey Huinker
corey.huinker@gmail.com
In reply to: jian he (#1)
Re: split func.sgml to separated individual sgml files

The following is step-by-step logic.

The end result (one file per section) seems good to me.

I suspect that reviewer burden may be the biggest barrier to going forward.
Perhaps break up the changes so that each new sect1 file gets its own
commit, allowing the reviewer to more easily (if not programmatically)
verify that the text that moved out of func.sgml moved into
func-sect-foo.sgml.

Granted, the committer will likely squash all of those commits down into
one big one, but the hard work of reviewing is done by then.

#3David G. Johnston
david.g.johnston@gmail.com
In reply to: Corey Huinker (#2)
Re: split func.sgml to separated individual sgml files

On Wed, Nov 13, 2024 at 1:11 PM Corey Huinker <corey.huinker@gmail.com>
wrote:

The following is step-by-step logic.

The end result (one file per section) seems good to me.

I suspect that reviewer burden may be the biggest barrier to going
forward. Perhaps breaking up the changes so that each new sect1 file gets
its own commit, allowing the reviewer to more easily (if not
programmatically) verify that the text that moved out of func.sgml moved
into func-sect-foo.sgml.

Granted, the committer will likely squash all of those commits down into
one big one, but the hard work of reviewing is done by then.

Validation is pretty trivial. I just built the before and after HTML files
and confirmed they are exactly the same size.

I suppose we might have lost some comments or something that wouldn't end
up visible in the HTML (seems unlikely) but this is basically one-and-done
so long as you don't let other commits happen (that touch this area) while
you extract and build HEAD and then compare it to the patched build
results. The git diff will let us know the script didn't affect any source
files it wasn't supposed to.

In short, ready to commit (see last paragraph below however), but the
committer will need to run the python script at the time of commit on the
then-current tree.

In my recent patch touching filelist.sgml I would be placing this new
%allfiles_func; line pairing at the top just beneath %allfiles; which is
the first child element. But the choice made here makes sense should this
go in first.

There is little downside, though, to renaming the existing %allfiles; to
%allfiles_ref; It's a local-only name.

David J.

#4jian he
jian.universality@gmail.com
In reply to: David G. Johnston (#3)
2 attachment(s)
Re: split func.sgml to separated individual sgml files

On Thu, Mar 20, 2025 at 10:16 AM David G. Johnston
<david.g.johnston@gmail.com> wrote:

In short, ready to commit (see last paragraph below however), but the committer will need to run the python script at the time of commit on the then-current tree.

hi.
more explanation, since the python script seems quite large...

each <sect1 id="functions-XXX"> in doc/src/sgml/func.sgml
corresponds to one individual section in [1].

each <sect1 id="functions-XXX"> within func.sgml is unique.
if you rename one so that there are two <sect1 id="functions-logical">,
the build will error out saying something like:
../../Desktop/pg_src/src6/postgres/doc/src/sgml/postgres.sgml:199:
element sect1: validity error : ID functions-logical already defined
see also [2].

Based on this, we can use the literal string <sect1 id="functions-XXX"> to
perform pattern matching and identify the line numbers that mark the start and
end of each <sect1> section.

The polished v2 python script uses the following steps for splitting func.sgml
into several pieces:

0. For each 9.X section listed in [1], create an empty SGML file to hold the
corresponding content.

1. Use the pattern <sect1 id="functions-XXX"> to locate the starting and ending
line number of each section in func.sgml

2. Copy each content block (<sect1>) of func.sgml

<sect1 id="functions-XXX">
...main content
</sect1>

into the newly created SGML files.

3. Remove the copied content from func.sgml.
4. In func.sgml, insert general entity references [3] to include the newly
created SGML files.
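
A minimal sketch of step 1, assuming each <sect1 id="functions-XXX"> opening tag and each </sect1> closing tag can be found by a plain line scan. The helper name sect1_ranges is hypothetical, and this is not the code from the attached script.

```python
import re

# Matches the opening tag and captures the section id, e.g. functions-logical.
SECT1_OPEN = re.compile(r'<sect1 id="(functions-[^"]+)"')

def sect1_ranges(lines):
    """Map each sect1 id to its (first_line, last_line), 1-based and inclusive."""
    ranges = {}
    current_id = start = None
    for lineno, line in enumerate(lines, start=1):
        match = SECT1_OPEN.search(line)
        if match:
            current_id, start = match.group(1), lineno
        if "</sect1>" in line and current_id is not None:
            ranges[current_id] = (start, lineno)
            current_id = None
    return ranges
```

Because the ids are unique within func.sgml, a dict keyed by id loses nothing; the resulting line ranges are what the later copy and removal steps would need.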

because Chapter 9. Functions and Operators has the same number of
sections (31) in PG18 and PG17,
v1-0001-split_func_sgml.py will still work just fine.
but I made some minor changes, therefore v2 is attached.

----------------------------------------------------
I used the sed --in-place option [3] to modify and truncate the original large
func.sgml file directly.
I also used the -n option together with the p command of sed to extract lines
from func.sgml between line X and line Y, as shown in reference [4].
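
As a rough Python equivalent of sed -n 'X,Yp' (print only lines X through Y), the sketch below selects a 1-based, inclusive line range. select_lines is a hypothetical helper used only for illustration; it is not part of the attached script.

```python
from itertools import islice

def select_lines(lines, first, last):
    """Return lines first..last (1-based, inclusive), like sed -n 'first,lastp'."""
    # islice uses a 0-based start and an exclusive stop, hence first - 1 and last.
    return list(islice(lines, first - 1, last))
```

Passing an open file object as lines and writing the result out with writelines would mimic the copy into a new SGML file.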

for the attached files:
first run ``python3 v2-0001-split_func_sgml.py``
then run ``git apply v2-0001-update-filelist.sgml-allfiles.sgml.no-cfbot``
(`git am` won't work; use `git apply`).

[1]: https://www.postgresql.org/docs/current/functions.html
[2]: https://en.wikipedia.org/wiki/Document_type_definition
[3]: https://www.gnu.org/software/sed/manual/html_node/Command_002dLine-Options.html#index-_002di
[4]: https://www.gnu.org/software/sed/manual/html_node/Common-Commands.html#index-n-_0028next_002dline_0029

Attachments:

v2-0001-update-filelist.sgml-allfiles.sgml.no-cfbot (application/octet-stream)
From 1406e3c443326726dc9bc304907304b6210e1d7c Mon Sep 17 00:00:00 2001
From: jian he <jian.universality@gmail.com>
Date: Tue, 24 Jun 2025 11:14:00 +0800
Subject: [PATCH v2 1/1] update filelist.sgml allfiles.sgml

---
 doc/src/sgml/filelist.sgml      |  5 ++++-
 doc/src/sgml/func/allfiles.sgml | 40 +++++++++++++++++++++++++++++++++
 2 files changed, 44 insertions(+), 1 deletion(-)
 create mode 100644 doc/src/sgml/func/allfiles.sgml

diff --git a/doc/src/sgml/filelist.sgml b/doc/src/sgml/filelist.sgml
index fef9584f908..0b5fa7ea74b 100644
--- a/doc/src/sgml/filelist.sgml
+++ b/doc/src/sgml/filelist.sgml
@@ -17,7 +17,10 @@
 <!ENTITY datatype   SYSTEM "datatype.sgml">
 <!ENTITY ddl        SYSTEM "ddl.sgml">
 <!ENTITY dml        SYSTEM "dml.sgml">
-<!ENTITY func       SYSTEM "func.sgml">
+
+<!ENTITY % allfiles_func   SYSTEM "func/allfiles.sgml">
+%allfiles_func;
+
 <!ENTITY indices    SYSTEM "indices.sgml">
 <!ENTITY json       SYSTEM "json.sgml">
 <!ENTITY mvcc       SYSTEM "mvcc.sgml">
diff --git a/doc/src/sgml/func/allfiles.sgml b/doc/src/sgml/func/allfiles.sgml
new file mode 100644
index 00000000000..ce11ef1d5d8
--- /dev/null
+++ b/doc/src/sgml/func/allfiles.sgml
@@ -0,0 +1,40 @@
+<!--
+doc/src/sgml/func/allfiles.sgml
+PostgreSQL documentation
+Complete list of usable sgml source files in this directory.
+-->
+
+<!-- function references -->
+
+<!ENTITY func                       SYSTEM "func.sgml">
+<!ENTITY func-logical               SYSTEM "func-logical.sgml">
+<!ENTITY func-comparison            SYSTEM "func-comparison.sgml">
+<!ENTITY func-math                  SYSTEM "func-math.sgml">
+<!ENTITY func-string                SYSTEM "func-string.sgml">
+<!ENTITY func-binarystring          SYSTEM "func-binarystring.sgml">
+<!ENTITY func-bitstring             SYSTEM "func-bitstring.sgml">
+<!ENTITY func-matching              SYSTEM "func-matching.sgml">
+<!ENTITY func-formatting            SYSTEM "func-formatting.sgml">
+<!ENTITY func-datetime              SYSTEM "func-datetime.sgml">
+<!ENTITY func-enum                  SYSTEM "func-enum.sgml">
+<!ENTITY func-geometry              SYSTEM "func-geometry.sgml">
+<!ENTITY func-net                   SYSTEM "func-net.sgml">
+<!ENTITY func-textsearch            SYSTEM "func-textsearch.sgml">
+<!ENTITY func-uuid                  SYSTEM "func-uuid.sgml">
+<!ENTITY func-xml                   SYSTEM "func-xml.sgml">
+<!ENTITY func-json                  SYSTEM "func-json.sgml">
+<!ENTITY func-sequence              SYSTEM "func-sequence.sgml">
+<!ENTITY func-conditional           SYSTEM "func-conditional.sgml">
+<!ENTITY func-array                 SYSTEM "func-array.sgml">
+<!ENTITY func-range                 SYSTEM "func-range.sgml">
+<!ENTITY func-aggregate             SYSTEM "func-aggregate.sgml">
+<!ENTITY func-window                SYSTEM "func-window.sgml">
+<!ENTITY func-merge-support         SYSTEM "func-merge-support.sgml">
+<!ENTITY func-subquery              SYSTEM "func-subquery.sgml">
+<!ENTITY func-comparisons           SYSTEM "func-comparisons.sgml">
+<!ENTITY func-srf                   SYSTEM "func-srf.sgml">
+<!ENTITY func-info                  SYSTEM "func-info.sgml">
+<!ENTITY func-admin                 SYSTEM "func-admin.sgml">
+<!ENTITY func-trigger               SYSTEM "func-trigger.sgml">
+<!ENTITY func-event-triggers        SYSTEM "func-event-triggers.sgml">
+<!ENTITY func-statistics            SYSTEM "func-statistics.sgml">
-- 
2.34.1

v2-0001-split_func_sgml.py (text/x-python; charset=US-ASCII)

#5jian he
jian.universality@gmail.com
In reply to: jian he (#4)
Re: split func.sgml to separated individual sgml files

hi.

after running the v2 python script and ``git apply
v2-0001-update-filelist.sgml-allfiles.sgml.no-cfbot``,
git status -u
shows:

Changes not staged for commit:
(use "git add/rm <file>..." to update what will be committed)
(use "git restore <file>..." to discard changes in working directory)
modified: doc/src/sgml/filelist.sgml
deleted: doc/src/sgml/func.sgml

That means to verify the changes, we only need to verify the html files
related to "functions".

I use GNU diff to compare the HTML output of doc/src/sgml/func.sgml generated
from the master branch against the HTML files produced by the patch.
For example, $DOC9 is the html file directory of the PATCH (split func.sgml)
build and $DOC5 is the html file directory of the master branch build.
No message was produced while running diff, which means the output produced
by the patch (with the script) is the same as on the master branch.

diff $DOC5/functions.html $DOC9/functions.html
diff $DOC5/functions-logical.html $DOC9/functions-logical.html
diff $DOC5/functions-comparison.html $DOC9/functions-comparison.html
diff $DOC5/functions-math.html $DOC9/functions-math.html
diff $DOC5/functions-string.html $DOC9/functions-string.html
diff $DOC5/functions-binarystring.html $DOC9/functions-binarystring.html
diff $DOC5/functions-matching.html $DOC9/functions-matching.html
diff $DOC5/functions-formatting.html $DOC9/functions-formatting.html
diff $DOC5/functions-datetime.html $DOC9/functions-datetime.html
diff $DOC5/functions-enum.html $DOC9/functions-enum.html
diff $DOC5/functions-geometry.html $DOC9/functions-geometry.html
diff $DOC5/functions-net.html $DOC9/functions-net.html
diff $DOC5/functions-textsearch.html $DOC9/functions-textsearch.html
diff $DOC5/functions-uuid.html $DOC9/functions-uuid.html
diff $DOC5/functions-xml.html $DOC9/functions-xml.html
diff $DOC5/functions-json.html $DOC9/functions-json.html
diff $DOC5/functions-sequence.html $DOC9/functions-sequence.html
diff $DOC5/functions-conditional.html $DOC9/functions-conditional.html
diff $DOC5/functions-array.html $DOC9/functions-array.html
diff $DOC5/functions-range.html $DOC9/functions-range.html
diff $DOC5/functions-aggregate.html $DOC9/functions-aggregate.html
diff $DOC5/functions-window.html $DOC9/functions-window.html
diff $DOC5/functions-merge-support.html $DOC9/functions-merge-support.html
diff $DOC5/functions-subquery.html $DOC9/functions-subquery.html
diff $DOC5/functions-comparisons.html $DOC9/functions-comparisons.html
diff $DOC5/functions-srf.html $DOC9/functions-srf.html
diff $DOC5/functions-info.html $DOC9/functions-info.html
diff $DOC5/functions-admin.html $DOC9/functions-admin.html
diff $DOC5/functions-trigger.html $DOC9/functions-trigger.html
diff $DOC5/functions-event-triggers.html $DOC9/functions-event-triggers.html
diff $DOC5/functions-statistics.html $DOC9/functions-statistics.html
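
The same comparison can be sketched in Python with the standard filecmp module instead of one diff call per page. compare_html_dirs and its pattern argument are names made up for this example; the two directories play the role of $DOC5 and $DOC9 above.

```python
import filecmp
from pathlib import Path

def compare_html_dirs(master_dir, patched_dir, pattern="functions*.html"):
    """Return (mismatched, not_comparable) file names for pages matching pattern."""
    names = sorted(p.name for p in Path(master_dir).glob(pattern))
    # shallow=False compares file contents rather than trusting os.stat() data.
    _match, mismatch, errors = filecmp.cmpfiles(
        master_dir, patched_dir, names, shallow=False
    )
    return mismatch, errors
```

Two empty lists would correspond to diff printing nothing for every page; a name in the second list means the page could not be compared, for example because it is missing from the patched build.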
#6Andrew Dunstan
andrew@dunslane.net
In reply to: jian he (#5)
Re: split func.sgml to separated individual sgml files

On 2025-07-29 Tu 2:15 AM, jian he wrote:

hi.

after run the v2 python script and ``git apply
v2-0001-update-filelist.sgml-allfiles.sgml.no-cfbot``
git status -u
shows:

Changes not staged for commit:
(use "git add/rm <file>..." to update what will be committed)
(use "git restore <file>..." to discard changes in working directory)
modified: doc/src/sgml/filelist.sgml
deleted: doc/src/sgml/func.sgml

That means to verify the changes, we only need to verify html files
related to "functions".

I use GNU diff to compare the HTML output of doc/src/sgml/func.sgml generated
from the master branch against the HTML file produced by the patch.
For example, $DOC9 is the PATCH (split func.sgml) html file directory, $DOC5 is
the master branch html file directory. and no message produced while running
diff, which means the patch (with the script) produced output is the
same as the master branch.

[snip]

OK. I'm inclined to do this after the CF finishes, to avoid collisions
with other patches. I assume it's going to make the CFbot fairly unhappy.

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com

#7Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andrew Dunstan (#6)
Re: split func.sgml to separated individual sgml files

Andrew Dunstan <andrew@dunslane.net> writes:

OK. I'm inclined to do this after the CF finishes, to avoid collisions
with other patches. I assume it's going to make the CFbot fairly unhappy.

+1 for proceeding that way. (I did not look at whether the proposed
changes are sane, but I agree that this'll inevitably break a lot of
pending patches.)

regards, tom lane

#8Andrew Dunstan
andrew@dunslane.net
In reply to: Tom Lane (#7)
Re: split func.sgml to separated individual sgml files

On 2025-07-29 Tu 11:40 AM, Tom Lane wrote:

Andrew Dunstan <andrew@dunslane.net> writes:

OK. I'm inclined to do this after the CF finishes, to avoid collisions
with other patches. I assume it's going to make the CFbot fairly unhappy.

+1 for proceeding that way. (I did not look at whether the proposed
changes are sane, but I agree that this'll inevitably break a lot of
pending patches.)

Done.

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com

#9Florents Tselai
florents.tselai@gmail.com
In reply to: Andrew Dunstan (#8)
1 attachment(s)
Re: split func.sgml to separated individual sgml files

On 4 Aug 2025, at 4:09 PM, Andrew Dunstan <andrew@dunslane.net> wrote:

On 2025-07-29 Tu 11:40 AM, Tom Lane wrote:

Andrew Dunstan <andrew@dunslane.net> writes:

OK. I'm inclined to do this after the CF finishes, to avoid collisions
with other patches. I assume it's going to make the CFbot fairly unhappy.

+1 for proceeding that way. (I did not look at whether the proposed
changes are sane, but I agree that this'll inevitably break a lot of
pending patches.)

Done.

While working on https://commitfest.postgresql.org/patch/6020/
I discovered that after changing func/func-aggregate.sgml, the HTML
wasn’t marked for update.

IIUC doc/src/sgml/Makefile should be updated as attached, right?

Attachments:

sgml-func-Makefile.patch (application/octet-stream)
diff --git a/doc/src/sgml/Makefile b/doc/src/sgml/Makefile
index 11aac913812..71798b0b213 100644
--- a/doc/src/sgml/Makefile
+++ b/doc/src/sgml/Makefile
@@ -59,7 +59,7 @@ GENERATED_SGML = version.sgml \
 	features-supported.sgml features-unsupported.sgml errcodes-table.sgml \
 	keywords-table.sgml targets-meson.sgml wait_event_types.sgml
 
-ALL_SGML := $(wildcard $(srcdir)/*.sgml $(srcdir)/ref/*.sgml) $(GENERATED_SGML)
+ALL_SGML := $(wildcard $(srcdir)/*.sgml $(srcdir)/ref/*.sgml $(srcdir)/func/*.sgml) $(GENERATED_SGML)
 
 ALL_IMAGES := $(wildcard $(srcdir)/images/*.svg)
 
#10Euler Taveira
euler@eulerto.com
In reply to: Florents Tselai (#9)
1 attachment(s)
Re: split func.sgml to separated individual sgml files

On Mon, Sep 1, 2025, at 7:35 AM, Florents Tselai wrote:

While working on this https://commitfest.postgresql.org/patch/6020/
I discovered that when changing for func/func-aggregate.sgml, the HTML
wasn’t marked for update.

IIUC the doc/Makefile should be updated as attached, right ?

Good catch.

However, your patch doesn't fix all issues. The check target (check-tabs and
check-nbsp) is broken; these targets should also include the func files.
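
For readers unfamiliar with these targets: check-tabs greps the source files for tab characters, and check-nbsp looks for the UTF-8 non-breaking space bytes 0xC2 0xA0. A minimal Python sketch of the same two checks follows; find_tabs_and_nbsp is a hypothetical helper, not something that exists in the Makefile.

```python
# UTF-8 encoding of U+00A0 NO-BREAK SPACE, the byte pair check-nbsp searches for.
NBSP_UTF8 = b"\xc2\xa0"

def find_tabs_and_nbsp(name, data):
    """Yield (name, line_number, kind) for each offending line in data (bytes)."""
    for lineno, line in enumerate(data.splitlines(), start=1):
        if b"\t" in line:
            yield (name, lineno, "tab")
        if NBSP_UTF8 in line:
            yield (name, lineno, "nbsp")
```

Whatever list of files is fed to such a check decides its coverage, which is why the func/*.sgml files need to be part of that list.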

--
Euler Taveira
EDB https://www.enterprisedb.com/

Attachments:

v2-0001-doc-fix-Makefile-after-func.sgml-split.patch (text/x-patch)
From d18faaed8afab8494afbab4208a709c8ddb3d624 Mon Sep 17 00:00:00 2001
From: Euler Taveira <euler@eulerto.com>
Date: Mon, 1 Sep 2025 10:33:32 -0300
Subject: [PATCH v2] doc: fix Makefile after func.sgml split

The commit 4e23c9ef65a forgot to add dependencies to some targets. It
should build if any func/*.sgml file is modified. The check target
should inspect all func/*.sgml files.
---
 doc/src/sgml/Makefile | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/doc/src/sgml/Makefile b/doc/src/sgml/Makefile
index 11aac913812..b53b2694a6b 100644
--- a/doc/src/sgml/Makefile
+++ b/doc/src/sgml/Makefile
@@ -59,7 +59,7 @@ GENERATED_SGML = version.sgml \
 	features-supported.sgml features-unsupported.sgml errcodes-table.sgml \
 	keywords-table.sgml targets-meson.sgml wait_event_types.sgml
 
-ALL_SGML := $(wildcard $(srcdir)/*.sgml $(srcdir)/ref/*.sgml) $(GENERATED_SGML)
+ALL_SGML := $(wildcard $(srcdir)/*.sgml $(srcdir)/func/*.sgml $(srcdir)/ref/*.sgml) $(GENERATED_SGML)
 
 ALL_IMAGES := $(wildcard $(srcdir)/images/*.svg)
 
@@ -263,14 +263,14 @@ endif # sqlmansectnum != 7
 
 # tabs are harmless, but it is best to avoid them in SGML files
 check-tabs:
-	@( ! grep '	' $(wildcard $(srcdir)/*.sgml $(srcdir)/ref/*.sgml $(srcdir)/*.xsl) ) || \
+	@( ! grep '	' $(wildcard $(srcdir)/*.sgml $(srcdir)/func/*.sgml $(srcdir)/ref/*.sgml $(srcdir)/*.xsl) ) || \
 	(echo "Tabs appear in SGML/XML files" 1>&2;  exit 1)
 
 # Non-breaking spaces are harmless, but it is best to avoid them in SGML files.
 # Use perl command because non-GNU grep or sed could not have hex escape sequence.
 check-nbsp:
 	@ ( $(PERL) -ne '/\xC2\xA0/ and print("$$ARGV:$$_"),$$n++; END {exit($$n>0)}' \
-	  $(wildcard $(srcdir)/*.sgml $(srcdir)/ref/*.sgml $(srcdir)/*.xsl $(srcdir)/images/*.xsl) ) || \
+	  $(wildcard $(srcdir)/*.sgml $(srcdir)/func/*.sgml $(srcdir)/ref/*.sgml $(srcdir)/*.xsl $(srcdir)/images/*.xsl) ) || \
 	(echo "Non-breaking spaces appear in SGML/XML files" 1>&2;  exit 1)
 
 ##
-- 
2.39.5

#11Florents Tselai
florents.tselai@gmail.com
In reply to: Euler Taveira (#10)
1 attachment(s)
Re: split func.sgml to separated individual sgml files

On 1 Sep 2025, at 4:35 PM, Euler Taveira <euler@eulerto.com> wrote:

On Mon, Sep 1, 2025, at 7:35 AM, Florents Tselai wrote:

While working on this https://commitfest.postgresql.org/patch/6020/
I discovered that when changing for func/func-aggregate.sgml, the HTML
wasn’t marked for update.

IIUC the doc/Makefile should be updated as attached, right ?

Good catch.

However, your patch doesn't fix all issues. The check target (check-tabs and
check-nbsp) is broken; these targets should also include the func files.

Ah, you’re right, but then again, I’d expect ALL_SGML to be used
consistently, but it isn't and I didn't check.
v3 does that.
Note that the GENERATED_SGML files weren't included in these two targets,
but I think there's no harm in checking them too.

Attachments:

v3-0001-The-commit-4e23c9ef65a-forgot-to-add-dependencies.patch (application/octet-stream)
From df0aff79789ef742e80820ee93cdbf5d99b305d4 Mon Sep 17 00:00:00 2001
From: Florents Tselai <florents.tselai@gmail.com>
Date: Mon, 1 Sep 2025 18:40:35 +0300
Subject: [PATCH v3] The commit 4e23c9ef65a forgot to add dependencies to some
 targets. It should build if any func/*.sgml file is modified. The check
 target should inspect all func/*.sgml files.

---
 doc/src/sgml/Makefile | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/doc/src/sgml/Makefile b/doc/src/sgml/Makefile
index 11aac913812..13a23dc334f 100644
--- a/doc/src/sgml/Makefile
+++ b/doc/src/sgml/Makefile
@@ -59,7 +59,8 @@ GENERATED_SGML = version.sgml \
 	features-supported.sgml features-unsupported.sgml errcodes-table.sgml \
 	keywords-table.sgml targets-meson.sgml wait_event_types.sgml
 
-ALL_SGML := $(wildcard $(srcdir)/*.sgml $(srcdir)/ref/*.sgml) $(GENERATED_SGML)
+ALL_SGML := $(wildcard $(srcdir)/*.sgml $(srcdir)/func/*.sgml $(srcdir)/ref/*.sgml) $(GENERATED_SGML)
+ALL_XSL := $(wildcard $(srcdir)/*.xsl)
 
 ALL_IMAGES := $(wildcard $(srcdir)/images/*.svg)
 
@@ -263,14 +264,14 @@ endif # sqlmansectnum != 7
 
 # tabs are harmless, but it is best to avoid them in SGML files
 check-tabs:
-	@( ! grep '	' $(wildcard $(srcdir)/*.sgml $(srcdir)/ref/*.sgml $(srcdir)/*.xsl) ) || \
-	(echo "Tabs appear in SGML/XML files" 1>&2;  exit 1)
+	@( ! grep '	' $(ALL_SGML) $(ALL_XSL) ) || \
+	(echo "Tabs appear in SGML/XML files" 1>&2; exit 1)
 
 # Non-breaking spaces are harmless, but it is best to avoid them in SGML files.
 # Use perl command because non-GNU grep or sed could not have hex escape sequence.
 check-nbsp:
 	@ ( $(PERL) -ne '/\xC2\xA0/ and print("$$ARGV:$$_"),$$n++; END {exit($$n>0)}' \
-	  $(wildcard $(srcdir)/*.sgml $(srcdir)/ref/*.sgml $(srcdir)/*.xsl $(srcdir)/images/*.xsl) ) || \
+  	$(ALL_SGML) $(ALL_XSL) ) || \
 	(echo "Non-breaking spaces appear in SGML/XML files" 1>&2;  exit 1)
 
 ##
-- 
2.49.0

#12Andrew Dunstan
andrew@dunslane.net
In reply to: Florents Tselai (#11)
Re: split func.sgml to separated individual sgml files

On 2025-09-01 Mo 11:44 AM, Florents Tselai wrote:

On 1 Sep 2025, at 4:35 PM, Euler Taveira <euler@eulerto.com> wrote:

On Mon, Sep 1, 2025, at 7:35 AM, Florents Tselai wrote:

While working on this https://commitfest.postgresql.org/patch/6020/
I discovered that when changing for func/func-aggregate.sgml, the HTML
wasn’t marked for update.

IIUC the doc/Makefile should be updated as attached, right ?

Good catch.

However, your patch doesn't fix all issues. The check target
(check-tabs and
check-nbsp) is broken; these targets should also include the func files.

Ah, you’re right, but then again,  I’d expect ALL_SGML to be used
consistently, but it isn't and I didn't check.
v3 does that.
Note that the GENERATED_SGML files weren't included in these two targets,
but I think there's no harm in checking them too.

Do we actually care about those? I don't want to add needless cycles
anywhere. I note that the meson.build doesn't appear to have a check
target at all, or anything that looks for hard tabs or nbsps. Those
checks were added to the Makefile back in October in commit 5b7da5c261d,
but that got missed even though Daniel had mentioned it in the
discussion thread. [1]

cheers

andrew

[1]: /messages/by-id/F7102912-0BDA-42A3-BDCF-8A4CBD1CC688@yesql.se

--
Andrew Dunstan
EDB: https://www.enterprisedb.com

#13Florents Tselai
florents.tselai@gmail.com
In reply to: Andrew Dunstan (#12)
Re: split func.sgml to separated individual sgml files

On Tue, Sep 2, 2025 at 5:54 PM Andrew Dunstan <andrew@dunslane.net> wrote:

On 2025-09-01 Mo 11:44 AM, Florents Tselai wrote:

On 1 Sep 2025, at 4:35 PM, Euler Taveira <euler@eulerto.com> wrote:

On Mon, Sep 1, 2025, at 7:35 AM, Florents Tselai wrote:

While working on this https://commitfest.postgresql.org/patch/6020/
I discovered that when changing for func/func-aggregate.sgml, the HTML
wasn’t marked for update.

IIUC the doc/Makefile should be updated as attached, right ?

Good catch.

However, your patch doesn't fix all issues. The check target (check-tabs
and
check-nbsp) is broken; these targets should also include the func files.

Ah, you’re right, but then again, I’d expect ALL_SGML to be used
consistently, but it isn't and I didn't check.
v3 does that.
Note that the GENERATED_SGML files weren't included in these two targets,
but I think there's no harm in checking them too.

Do we actually care about those? I don't want to add needless cycles
anywhere. I note that the meson.build doesn't appear to have a check target
at all, or anything that looks for hard tabs or nbsps. Those checks were
added to the Makefile back in October in commit 5b7da5c261d, but that got
missed even though Daniel had mentioned it in the discussion thread. [1]

From the message and discussion in 5b7da5c261d it looks like we do,
and I've seen some messages here and there saying that people do indeed have
trouble applying patches due to spurious whitespace
and special chars.
So I assume the better solution would be having such checks in meson too.

#14Nazir Bilal Yavuz
byavuz81@gmail.com
In reply to: Andrew Dunstan (#12)
1 attachment(s)
Re: split func.sgml to separated individual sgml files

Hi,

On Tue, 2 Sept 2025 at 17:54, Andrew Dunstan <andrew@dunslane.net> wrote:

Ah, you’re right, but then again, I’d expect ALL_SGML to be used consistently, but it isn't and I didn't check.
v3 does that.
Note that the GENERATED_SGML files weren't included in these two targets, but I think there's no harm in checking them too.

Do we actually care about those? I don't want to add needless cycles anywhere. I note that the meson.build doesn't appear to have a check target at all, or anything that looks for hard tabs or nbsps. Those checks were added to the Makefile back in October in commit 5b7da5c261d, but that got missed even though Daniel had mentioned it in the discussion thread. [1]

I have been working on running these checks under the Meson build
system. To do this, I converted the checks into a Perl script
(sgml_syntax_check) and ran it against both the Makefile and Meson.
The test is named 'sgml_syntax_check' in Meson. One difference I
noticed: I could not find a way in Meson to create a test that does
not run by default. As a result, this syntax test runs every time you
run 'meson test'. This behaviour differs from Autoconf, but I
think it is acceptable.
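
For readers following along, here is a rough Python sketch of how the xmllint command line described above can be assembled from the --xmllint, --srcdir and --builddir arguments. It is only an illustration, not the attached Perl script; the function name build_xmllint_cmd and the example directory names are made up, while the xmllint options (--path, --noout, --valid, --nonet) are the ones that appear in the patch.

```python
import shlex


def build_xmllint_cmd(xmllint, srcdir, builddir=None, doc="postgres.sgml"):
    """Assemble the xmllint argument list (hypothetical helper, for illustration)."""
    # The --xmllint value may carry extra options, e.g. "xmllint --nonet",
    # so split it the way a shell would rather than treating it as one word.
    cmd = shlex.split(xmllint)
    # Search the current directory, the source dir and, if given, the build dir.
    cmd += ["--path", ".", "--path", srcdir]
    if builddir is not None:
        cmd += ["--path", builddir]
    # Validate against the DTD without printing the parsed document.
    cmd += ["--noout", "--valid", doc]
    return cmd


if __name__ == "__main__":
    # Example directory names are placeholders.
    print(build_xmllint_cmd("xmllint --nonet", "doc/src/sgml", "build/doc/src/sgml"))
```

Building a list instead of one long string means the command can be run without going through a shell, so paths containing spaces would not need quoting.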

Additionally, some of the CI OSes were missing docbook-xml; but it has
now been installed.

I did not create a new thread for that; I can create one if you think
that it would be better.

CI run with the attached patch applied:
https://cirrus-ci.com/build/6610354173640704

--
Regards,
Nazir Bilal Yavuz
Microsoft

Attachments:

Add-sgml_syntax_check-test-to-the-Meson-build.txt (text/plain; charset=US-ASCII)
From 27ab61775945d837e37ed6a0ce0c301697d183a1 Mon Sep 17 00:00:00 2001
From: Nazir Bilal Yavuz <byavuz81@gmail.com>
Date: Mon, 8 Sep 2025 17:16:05 +0300
Subject: [PATCH v1] Add sgml_syntax_check test to the Meson build

The 'sgml' check from the Makefile has been converted into a Perl script
(sgml_syntax_check) and integrated into meson.build. Unlike Autoconf,
Meson does not provide a way to mark tests as non-default, so this
script runs on every 'meson test'. While this differs from the previous
behavior, it is considered acceptable.
---
 doc/src/sgml/Makefile               |  16 +---
 doc/src/sgml/meson.build            |  23 ++++++
 doc/src/sgml/t/sgml_syntax_check.pl | 118 ++++++++++++++++++++++++++++
 .cirrus.tasks.yml                   |   3 +
 4 files changed, 146 insertions(+), 14 deletions(-)
 create mode 100755 doc/src/sgml/t/sgml_syntax_check.pl

diff --git a/doc/src/sgml/Makefile b/doc/src/sgml/Makefile
index 11aac913812..3256340a5b2 100644
--- a/doc/src/sgml/Makefile
+++ b/doc/src/sgml/Makefile
@@ -200,8 +200,8 @@ MAKEINFO = makeinfo
 ##
 
 # Quick syntax check without style processing
-check: postgres.sgml $(ALL_SGML) check-tabs check-nbsp
-	$(XMLLINT) $(XMLINCLUDE) --noout --valid $<
+check: postgres.sgml $(ALL_SGML)
+	$(PERL) $(srcdir)/t/sgml_syntax_check.pl --xmllint "$(XMLLINT)" --srcdir $(srcdir)
 
 
 ##
@@ -261,18 +261,6 @@ clean-man:
 
 endif # sqlmansectnum != 7
 
-# tabs are harmless, but it is best to avoid them in SGML files
-check-tabs:
-	@( ! grep '	' $(wildcard $(srcdir)/*.sgml $(srcdir)/ref/*.sgml $(srcdir)/*.xsl) ) || \
-	(echo "Tabs appear in SGML/XML files" 1>&2;  exit 1)
-
-# Non-breaking spaces are harmless, but it is best to avoid them in SGML files.
-# Use perl command because non-GNU grep or sed could not have hex escape sequence.
-check-nbsp:
-	@ ( $(PERL) -ne '/\xC2\xA0/ and print("$$ARGV:$$_"),$$n++; END {exit($$n>0)}' \
-	  $(wildcard $(srcdir)/*.sgml $(srcdir)/ref/*.sgml $(srcdir)/*.xsl $(srcdir)/images/*.xsl) ) || \
-	(echo "Non-breaking spaces appear in SGML/XML files" 1>&2;  exit 1)
-
 ##
 ## Clean
 ##
diff --git a/doc/src/sgml/meson.build b/doc/src/sgml/meson.build
index 6ae192eac68..89d8b01c944 100644
--- a/doc/src/sgml/meson.build
+++ b/doc/src/sgml/meson.build
@@ -306,3 +306,26 @@ endif
 if alldocs.length() != 0
   alias_target('alldocs', alldocs)
 endif
+
+sgml_syntax_check = files(
+  't/sgml_syntax_check.pl'
+)
+
+test(
+  'sgml_syntax_check',
+  perl,
+  protocol: 'exitcode',
+  suite: 'doc',
+  args: [
+    sgml_syntax_check,
+    '--xmllint',
+      '@0@ --nonet'.format(xmllint_bin.full_path()),
+    '--srcdir',
+      meson.current_source_dir(),
+    '--builddir',
+      meson.current_build_dir(),
+  ],
+  depends: doc_generated
+)
+
+testprep_targets += doc_generated
diff --git a/doc/src/sgml/t/sgml_syntax_check.pl b/doc/src/sgml/t/sgml_syntax_check.pl
new file mode 100755
index 00000000000..7ff1d9a7c26
--- /dev/null
+++ b/doc/src/sgml/t/sgml_syntax_check.pl
@@ -0,0 +1,118 @@
+#!/usr/bin/perl
+
+# doc/src/sgml/t/sgml_syntax_check.pl
+
+use strict;
+use warnings FATAL => 'all';
+use Getopt::Long;
+
+use File::Find;
+
+my $xmllint;
+my $srcdir;
+my $builddir;
+
+GetOptions(
+	'xmllint:s' => \$xmllint,
+	'srcdir:s' => \$srcdir,
+	'builddir:s' => \$builddir) or die "$0: wrong arguments";
+
+die "$0: --srcdir must be specified\n" unless defined $srcdir;
+
+my $postgres_sgml = "postgres.sgml";
+my $xmlinclude = "--path . --path $srcdir";
+$xmlinclude .= " --path $builddir" if defined $builddir;
+
+# find files to process in check_tabs, check_nbsp will use additional files
+my @files_to_process;
+my @dirs_to_search = ($srcdir);
+push @dirs_to_search, $builddir if defined $builddir;
+find(
+	sub {
+		return unless -f $_;
+		return if $_ !~ /\.xsl$/;
+		push @files_to_process, $File::Find::name;
+	},
+	@dirs_to_search,);
+
+push @dirs_to_search, "$srcdir/ref";
+find(
+	sub {
+		return unless -f $_;
+		return unless /\.sgml$/;
+		push @files_to_process, $File::Find::name;
+	},
+	@dirs_to_search,);
+
+
+# tabs are harmless, but it is best to avoid them in SGML files
+sub check_tabs
+{
+	my @files = @files_to_process;
+
+	my $errors = 0;
+	for my $f (@files)
+	{
+		open my $fh, "<:encoding(UTF-8)", $f or die "Can't open $f: $!";
+		while (<$fh>)
+		{
+			if (/\t/)
+			{
+				warn "Tab found in $f:$_";
+				$errors++;
+			}
+		}
+	}
+
+	if ($errors)
+	{
+		die "Tabs appear in SGML/XML files\n";
+	}
+}
+
+# non-breaking spaces are harmless, but it is best to avoid them in SGML files
+sub check_nbsp
+{
+	my @files;
+
+	# find additional '$srcdir/images/*.xsl' files to process in check_nbsp
+	find(
+		sub {
+			return unless -f $_;
+			return if $_ !~ /\.xsl$/;
+			push @files, $File::Find::name;
+		},
+		"$srcdir/images",);
+	push @files, @files_to_process;
+
+	my $errors = 0;
+	for my $f (@files)
+	{
+		open my $fh, "<:raw", $f or die "Can't open $f: $!";
+		my $line_no = 0;
+		while (<$fh>)
+		{
+			$line_no++;
+			if (/\xC2\xA0/)
+			{
+				warn "$f:$line_no: contains non-breaking space\n";
+				$errors++;
+			}
+		}
+	}
+
+	if ($errors)
+	{
+		die "Non-breaking spaces appear in SGML/XML files\n";
+	}
+}
+
+sub run_xmllint
+{
+	my $cmd = "$xmllint $xmlinclude --noout --valid $postgres_sgml";
+	system($cmd) == 0 or die "xmllint validation failed\n";
+}
+
+run_xmllint();
+check_tabs();
+check_nbsp();
diff --git a/.cirrus.tasks.yml b/.cirrus.tasks.yml
index eca9d62fc22..1c937247a9a 100644
--- a/.cirrus.tasks.yml
+++ b/.cirrus.tasks.yml
@@ -627,6 +627,8 @@ task:
     TEST_JOBS: 8
     IMAGE: ghcr.io/cirruslabs/macos-runner:sonoma
 
+    XML_CATALOG_FILES: /opt/local/share/xml/docbook/4.5/catalog.xml
+
     CIRRUS_WORKING_DIR: ${HOME}/pgsql/
     CCACHE_DIR: ${HOME}/ccache
     MACPORTS_CACHE: ${HOME}/macports-cache
@@ -641,6 +643,7 @@ task:
 
     MACOS_PACKAGE_LIST: >-
       ccache
+      docbook-xml-4.5
       icu
       kerberos5
       lz4
-- 
2.51.0

#15Andrew Dunstan
andrew@dunslane.net
In reply to: Nazir Bilal Yavuz (#14)
Re: split func.sgml to separated individual sgml files

On 2025-09-12 Fr 10:12 AM, Nazir Bilal Yavuz wrote:

Hi,

On Tue, 2 Sept 2025 at 17:54, Andrew Dunstan <andrew@dunslane.net> wrote:

Ah, you’re right, but then again, I’d expect ALL_SGML to be used consistently, but it isn't and I didn't check.
v3 does that.
Note that the GENERATED_SGML files weren't included in these two targets but I think there's no harm in checking them too.

Do we actually care about those? I don't want to add needless cycles anywhere. I note that the meson.build doesn't appear to have a check target at all, or anything that looks for hard tabs or nbsps. Those checks were added to the Makefile back in October in commit 5b7da5c261d, but that got missed even though Daniel had mentioned it in the discussion thread.[1]

I have been working on running these checks under the Meson build
system.

Thanks for this!

To do this, I converted the checks into a Perl script
(sgml_syntax_check) and ran it against both the Makefile and Meson.
The test is named 'sgml_syntax_check' in Meson. One difference I
noticed: I could not find a way in Meson to create a test that does
not run by default. As a result, this syntax test runs every time you
run 'meson test'. This behaviour differs from Autoconf, but I
think it is acceptable.

Yes, I think so too.

Additionally, some of the CI OSes were missing docbook-xml; but it has
now been installed.

I did not create a new thread for that; I can create one if you think
that it would be better.

CI run with the attached patch applied:
https://cirrus-ci.com/build/6610354173640704

I am away this coming week, will check it out in detail when I return.

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com

#16Andrew Dunstan
andrew@dunslane.net
In reply to: Nazir Bilal Yavuz (#14)
1 attachment(s)
Re: split func.sgml to separated individual sgml files

On 2025-09-12 Fr 10:12 AM, Nazir Bilal Yavuz wrote:

Hi,

On Tue, 2 Sept 2025 at 17:54, Andrew Dunstan <andrew@dunslane.net> wrote:

Ah, you’re right, but then again, I’d expect ALL_SGML to be used consistently, but it isn't and I didn't check.
v3 does that.
Note that the GENERATED_SGML files weren't included in these two targets but I think there's no harm in checking them too.

Do we actually care about those? I don't want to add needless cycles anywhere. I note that the meson.build doesn't appear to have a check target at all, or anything that looks for hard tabs or nbsps. Those checks were added to the Makefile back in October in commit 5b7da5c261d, but that got missed even though Daniel had mentioned it in the discussion thread.[1]

I have been working on running these checks under the Meson build
system. To do this, I converted the checks into a Perl script
(sgml_syntax_check) and ran it against both the Makefile and Meson.
The test is named 'sgml_syntax_check' in Meson. One difference I
noticed: I could not find a way in Meson to create a test that does
not run by default. As a result, this syntax test runs every time you
run 'meson test'. This behaviour differs from Autoconf, but I
think it is acceptable.

Additionally, some of the CI OSes were missing docbook-xml; but it has
now been installed.

I did not create a new thread for that; I can create one if you think
that it would be better.

CI run with the attached patch applied:
https://cirrus-ci.com/build/6610354173640704

Hi Bilal,

This got preempted slightly by Tom's commit 170a8a3f460, but I think
it's worth doing. I tried to simplify it some. See attached. There
doesn't seem to me to be any point in using a different set of files for
the tab tests and the NBSP tests. If we use the same set of files we can
improve the efficiency easily by opening them only once. Here we just
look for all the sgml files and all the xsl files and process them all.
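
To make the idea concrete, the "one set of files, opened once" approach can be sketched in Python roughly as below. This is an illustration only, not the attached Perl script; the function names find_doc_files and check_tabs_and_nbsp_once and the message wording are invented for the example.

```python
import os


def find_doc_files(*roots):
    """Yield every .sgml or .xsl file below the given directories."""
    for root in roots:
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in sorted(filenames):
                if name.endswith((".sgml", ".xsl")):
                    yield os.path.join(dirpath, name)


def check_tabs_and_nbsp_once(paths):
    """Return 'file:line: problem' strings, reading each file only once."""
    problems = []
    for path in paths:
        # Text mode: a non-breaking space is the single character U+00A0.
        with open(path, encoding="utf-8") as fh:
            for line_no, line in enumerate(fh, start=1):
                if "\t" in line:
                    problems.append(f"{path}:{line_no}: contains tab")
                if "\u00a0" in line:
                    problems.append(f"{path}:{line_no}: contains non-breaking space")
    return problems


if __name__ == "__main__":
    # "." stands in for the source and build directories.
    for problem in check_tabs_and_nbsp_once(find_doc_files(".")):
        print(problem)
```

Both checks run inside the same loop over lines, so each file is opened and read a single time regardless of how many kinds of stray characters are being looked for.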

WDYT?

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com

Attachments:

0001-Improve-docs-syntax-checking.patch (text/x-patch; charset=UTF-8)
From db67ff2c1dda3f358f578a0a9d8d09795ee09f73 Mon Sep 17 00:00:00 2001
From: Andrew Dunstan <andrew@dunslane.net>
Date: Tue, 30 Sep 2025 15:39:15 -0400
Subject: [PATCH] Improve docs syntax checking

Move the checks out of the Makefile into a perl script that can be
called from both the Makefile and meson.build. The set of files checked
is simplified, so it is just all the sgml and xsl files found in the
doc/src/sgml directory tree.

Along the way make some adjustments to .cirrus.tasks,yml to support this
better in CI.

Author: Nazir Bilal Yavuz <byavuz81@gmail.com>
Co-Author: Andrew Dunstan <andrew@dunslane.net>

Discussion: https://postgr.es/m/CAN55FZ3BnM+0twT-ZWL8As9oBEte_b+SBU==cz6Hk8JUCM_5Wg@mail.gmail.com
---
 .cirrus.tasks.yml                 |  3 ++
 doc/src/sgml/Makefile             | 16 +------
 doc/src/sgml/meson.build          | 23 ++++++++++
 doc/src/sgml/sgml_syntax_check.pl | 75 +++++++++++++++++++++++++++++++
 4 files changed, 103 insertions(+), 14 deletions(-)
 create mode 100755 doc/src/sgml/sgml_syntax_check.pl

diff --git a/.cirrus.tasks.yml b/.cirrus.tasks.yml
index eca9d62fc22..1c937247a9a 100644
--- a/.cirrus.tasks.yml
+++ b/.cirrus.tasks.yml
@@ -627,6 +627,8 @@ task:
     TEST_JOBS: 8
     IMAGE: ghcr.io/cirruslabs/macos-runner:sonoma
 
+    XML_CATALOG_FILES: /opt/local/share/xml/docbook/4.5/catalog.xml
+
     CIRRUS_WORKING_DIR: ${HOME}/pgsql/
     CCACHE_DIR: ${HOME}/ccache
     MACPORTS_CACHE: ${HOME}/macports-cache
@@ -641,6 +643,7 @@ task:
 
     MACOS_PACKAGE_LIST: >-
       ccache
+      docbook-xml-4.5
       icu
       kerberos5
       lz4
diff --git a/doc/src/sgml/Makefile b/doc/src/sgml/Makefile
index b53b2694a6b..24670204cbc 100644
--- a/doc/src/sgml/Makefile
+++ b/doc/src/sgml/Makefile
@@ -200,8 +200,8 @@ MAKEINFO = makeinfo
 ##
 
 # Quick syntax check without style processing
-check: postgres.sgml $(ALL_SGML) check-tabs check-nbsp
-	$(XMLLINT) $(XMLINCLUDE) --noout --valid $<
+check: postgres.sgml $(ALL_SGML)
+	$(PERL) $(srcdir)/sgml_syntax_check.pl --xmllint "$(XMLLINT)" --srcdir $(srcdir)
 
 
 ##
@@ -261,18 +261,6 @@ clean-man:
 
 endif # sqlmansectnum != 7
 
-# tabs are harmless, but it is best to avoid them in SGML files
-check-tabs:
-	@( ! grep '	' $(wildcard $(srcdir)/*.sgml $(srcdir)/func/*.sgml $(srcdir)/ref/*.sgml $(srcdir)/*.xsl) ) || \
-	(echo "Tabs appear in SGML/XML files" 1>&2;  exit 1)
-
-# Non-breaking spaces are harmless, but it is best to avoid them in SGML files.
-# Use perl command because non-GNU grep or sed could not have hex escape sequence.
-check-nbsp:
-	@ ( $(PERL) -ne '/\xC2\xA0/ and print("$$ARGV:$$_"),$$n++; END {exit($$n>0)}' \
-	  $(wildcard $(srcdir)/*.sgml $(srcdir)/func/*.sgml $(srcdir)/ref/*.sgml $(srcdir)/*.xsl $(srcdir)/images/*.xsl) ) || \
-	(echo "Non-breaking spaces appear in SGML/XML files" 1>&2;  exit 1)
-
 ##
 ## Clean
 ##
diff --git a/doc/src/sgml/meson.build b/doc/src/sgml/meson.build
index 6ae192eac68..ce0dea587cd 100644
--- a/doc/src/sgml/meson.build
+++ b/doc/src/sgml/meson.build
@@ -306,3 +306,26 @@ endif
 if alldocs.length() != 0
   alias_target('alldocs', alldocs)
 endif
+
+sgml_syntax_check = files(
+  'sgml_syntax_check.pl'
+)
+
+test(
+  'sgml_syntax_check',
+  perl,
+  protocol: 'exitcode',
+  suite: 'doc',
+  args: [
+    sgml_syntax_check,
+    '--xmllint',
+      '@0@ --nonet'.format(xmllint_bin.full_path()),
+    '--srcdir',
+      meson.current_source_dir(),
+    '--builddir',
+      meson.current_build_dir(),
+  ],
+  depends: doc_generated
+)
+
+testprep_targets += doc_generated
diff --git a/doc/src/sgml/sgml_syntax_check.pl b/doc/src/sgml/sgml_syntax_check.pl
new file mode 100755
index 00000000000..548769cd7eb
--- /dev/null
+++ b/doc/src/sgml/sgml_syntax_check.pl
@@ -0,0 +1,75 @@
+#!/usr/bin/perl
+
+# Copyright (c) 2025, PostgreSQL Global Development Group
+
+# doc/src/sgml/sgml_syntax_check.pl
+
+use strict;
+use warnings FATAL => 'all';
+use Getopt::Long;
+
+use File::Find;
+
+my $xmllint;
+my $srcdir;
+my $builddir;
+
+GetOptions(
+	'xmllint:s' => \$xmllint,
+	'srcdir:s' => \$srcdir,
+	'builddir:s' => \$builddir) or die "$0: wrong arguments";
+
+die "$0: --srcdir must be specified\n" unless defined $srcdir;
+
+my $xmlinclude = "--path . --path $srcdir";
+$xmlinclude .= " --path $builddir" if defined $builddir;
+
+# find files to process - all the sgml and xsl files (including in subdirectories)
+my @files_to_process;
+my @dirs_to_search = ($srcdir);
+push @dirs_to_search, $builddir if defined $builddir;
+find(
+	sub {
+		return unless -f $_;
+		return if $_ !~ /\.(sgml|xsl)$/;
+		push @files_to_process, $File::Find::name;
+	},
+	@dirs_to_search,);
+
+# tabs and non-breaking spaces are harmless, but it is best to avoid them in SGML files
+sub check_tabs_and_nbsp
+{
+	my $errors = 0;
+	for my $f (@files_to_process)
+	{
+		open my $fh, "<:encoding(UTF-8)", $f or die "Can't open $f: $!";
+		while (<$fh>)
+		{
+			if (/\t/)
+			{
+				print STDERR "Tab found in $f:$_";
+				$errors++;
+			}
+			if (/\xC2\xA0/)
+			{
+				print STDERR "$f:$line_no: contains non-breaking space\n";
+				$errors++;
+			}
+		}
+		close($fh);
+	}
+
+	if ($errors)
+	{
+		die "Tabsand/or non-breaking spaces appear in SGML/XML files\n";
+	}
+}
+
+sub run_xmllint
+{
+	my $cmd = "$xmllint $xmlinclude --noout --valid postgres.sgml";
+	system($cmd) == 0 or die "xmllint validation failed\n";
+}
+
+run_xmllint();
+check_tabs_and_nbsp();
-- 
2.34.1

#17Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andrew Dunstan (#16)
Re: split func.sgml to separated individual sgml files

Andrew Dunstan <andrew@dunslane.net> writes:

On 2025-09-12 Fr 10:12 AM, Nazir Bilal Yavuz wrote:

The test is named 'sgml_syntax_check' in Meson. One difference I
noticed: I could not find a way in Meson to create a test that does
not run by default. As a result, this syntax test runs every time you
run 'meson test'. This behaviour differs from Autoconf, but I
think it is acceptable.

Actually, I've been meaning to complain about the fact that these
checks aren't run by the default Makefile target. I never remember
that there is a separate "check" target, and even if I did remember
it's mostly useless to me because I always want to look at the
rendered HTML. So when I'm working on the docs I always just say
"make" in the doc/src/sgml directory. It'd be helpful, at least to
me, if the default target ran the tabs and nbsp checks. It already
does run xmllint, so that change could probably be integrated with
what you've done here without too much trouble.

This got preempted slightly by Tom's commit 170a8a3f460, but I think
it's worth doing. I tried to simplify it some. See attached. There
doesn't seem to me to be any point in using a different set of files for
the tab tests and the NBSP tests. If we use the same set of files we can
improve the efficiency easily by opening them only once. Here we just
look for all the sgml files and all the xsl files and process them all.

+1 for merging those two checks into one pass, especially if we're
to run them by default.

regards, tom lane

#18Nazir Bilal Yavuz
byavuz81@gmail.com
In reply to: Andrew Dunstan (#16)
1 attachment(s)
Re: split func.sgml to separated individual sgml files

Hi,

On Tue, 30 Sept 2025 at 22:48, Andrew Dunstan <andrew@dunslane.net> wrote:

Hi Bilal,

This got preempted slightly by Tom's commit 170a8a3f460, but I think
it's worth doing. I tried to simplify it some. See attached. There
doesn't seem to me to be any point in using a different set of files for
the tab tests and the NBSP tests. If we use the same set of files we can
improve the efficiency easily by opening them only once. Here we just
look for all the sgml files and all the xsl files and process them all.

WDYT?

It looks good to me. I made 2 changes to your patch:

1- The declaration of $line_no was lost; I re-added it.
2- s/.cirrus.tasks,yml/.cirrus.tasks.yml/ in the commit message.

--
Regards,
Nazir Bilal Yavuz
Microsoft

Attachments:

v2-0001-Improve-docs-syntax-checking.patch (text/x-patch; charset=US-ASCII)
From 4b079e4384f0934d9425635168c2acd399eca579 Mon Sep 17 00:00:00 2001
From: Andrew Dunstan <andrew@dunslane.net>
Date: Tue, 30 Sep 2025 15:39:15 -0400
Subject: [PATCH v2] Improve docs syntax checking

Move the checks out of the Makefile into a perl script that can be
called from both the Makefile and meson.build. The set of files checked
is simplified, so it is just all the sgml and xsl files found in the
doc/src/sgml directory tree.

Along the way make some adjustments to .cirrus.tasks.yml to support this
better in CI.

Author: Nazir Bilal Yavuz <byavuz81@gmail.com>
Co-Author: Andrew Dunstan <andrew@dunslane.net>

Discussion: https://postgr.es/m/CAN55FZ3BnM+0twT-ZWL8As9oBEte_b+SBU==cz6Hk8JUCM_5Wg@mail.gmail.com
---
 doc/src/sgml/Makefile             | 16 +------
 doc/src/sgml/meson.build          | 23 +++++++++
 doc/src/sgml/sgml_syntax_check.pl | 77 +++++++++++++++++++++++++++++++
 .cirrus.tasks.yml                 |  3 ++
 4 files changed, 105 insertions(+), 14 deletions(-)
 create mode 100755 doc/src/sgml/sgml_syntax_check.pl

diff --git a/doc/src/sgml/Makefile b/doc/src/sgml/Makefile
index b53b2694a6b..24670204cbc 100644
--- a/doc/src/sgml/Makefile
+++ b/doc/src/sgml/Makefile
@@ -200,8 +200,8 @@ MAKEINFO = makeinfo
 ##
 
 # Quick syntax check without style processing
-check: postgres.sgml $(ALL_SGML) check-tabs check-nbsp
-	$(XMLLINT) $(XMLINCLUDE) --noout --valid $<
+check: postgres.sgml $(ALL_SGML)
+	$(PERL) $(srcdir)/sgml_syntax_check.pl --xmllint "$(XMLLINT)" --srcdir $(srcdir)
 
 
 ##
@@ -261,18 +261,6 @@ clean-man:
 
 endif # sqlmansectnum != 7
 
-# tabs are harmless, but it is best to avoid them in SGML files
-check-tabs:
-	@( ! grep '	' $(wildcard $(srcdir)/*.sgml $(srcdir)/func/*.sgml $(srcdir)/ref/*.sgml $(srcdir)/*.xsl) ) || \
-	(echo "Tabs appear in SGML/XML files" 1>&2;  exit 1)
-
-# Non-breaking spaces are harmless, but it is best to avoid them in SGML files.
-# Use perl command because non-GNU grep or sed could not have hex escape sequence.
-check-nbsp:
-	@ ( $(PERL) -ne '/\xC2\xA0/ and print("$$ARGV:$$_"),$$n++; END {exit($$n>0)}' \
-	  $(wildcard $(srcdir)/*.sgml $(srcdir)/func/*.sgml $(srcdir)/ref/*.sgml $(srcdir)/*.xsl $(srcdir)/images/*.xsl) ) || \
-	(echo "Non-breaking spaces appear in SGML/XML files" 1>&2;  exit 1)
-
 ##
 ## Clean
 ##
diff --git a/doc/src/sgml/meson.build b/doc/src/sgml/meson.build
index 6ae192eac68..ce0dea587cd 100644
--- a/doc/src/sgml/meson.build
+++ b/doc/src/sgml/meson.build
@@ -306,3 +306,26 @@ endif
 if alldocs.length() != 0
   alias_target('alldocs', alldocs)
 endif
+
+sgml_syntax_check = files(
+  'sgml_syntax_check.pl'
+)
+
+test(
+  'sgml_syntax_check',
+  perl,
+  protocol: 'exitcode',
+  suite: 'doc',
+  args: [
+    sgml_syntax_check,
+    '--xmllint',
+      '@0@ --nonet'.format(xmllint_bin.full_path()),
+    '--srcdir',
+      meson.current_source_dir(),
+    '--builddir',
+      meson.current_build_dir(),
+  ],
+  depends: doc_generated
+)
+
+testprep_targets += doc_generated
diff --git a/doc/src/sgml/sgml_syntax_check.pl b/doc/src/sgml/sgml_syntax_check.pl
new file mode 100755
index 00000000000..1e1fa5d8245
--- /dev/null
+++ b/doc/src/sgml/sgml_syntax_check.pl
@@ -0,0 +1,77 @@
+#!/usr/bin/perl
+
+# Copyright (c) 2025, PostgreSQL Global Development Group
+
+# doc/src/sgml/sgml_syntax_check.pl
+
+use strict;
+use warnings FATAL => 'all';
+use Getopt::Long;
+
+use File::Find;
+
+my $xmllint;
+my $srcdir;
+my $builddir;
+
+GetOptions(
+	'xmllint:s' => \$xmllint,
+	'srcdir:s' => \$srcdir,
+	'builddir:s' => \$builddir) or die "$0: wrong arguments";
+
+die "$0: --srcdir must be specified\n" unless defined $srcdir;
+
+my $xmlinclude = "--path . --path $srcdir";
+$xmlinclude .= " --path $builddir" if defined $builddir;
+
+# find files to process - all the sgml and xsl files (including in subdirectories)
+my @files_to_process;
+my @dirs_to_search = ($srcdir);
+push @dirs_to_search, $builddir if defined $builddir;
+find(
+	sub {
+		return unless -f $_;
+		return if $_ !~ /\.(sgml|xsl)$/;
+		push @files_to_process, $File::Find::name;
+	},
+	@dirs_to_search,);
+
+# tabs and non-breaking spaces are harmless, but it is best to avoid them in SGML files
+sub check_tabs_and_nbsp
+{
+	my $errors = 0;
+	for my $f (@files_to_process)
+	{
+		open my $fh, "<:encoding(UTF-8)", $f or die "Can't open $f: $!";
+		my $line_no = 0;
+		while (<$fh>)
+		{
+			$line_no++;
+			if (/\t/)
+			{
+				print STDERR "Tab found in $f:$_";
+				$errors++;
+			}
+			if (/\xC2\xA0/)
+			{
+				print STDERR "$f:$line_no: contains non-breaking space\n";
+				$errors++;
+			}
+		}
+		close($fh);
+	}
+
+	if ($errors)
+	{
+		die "Tabsand/or non-breaking spaces appear in SGML/XML files\n";
+	}
+}
+
+sub run_xmllint
+{
+	my $cmd = "$xmllint $xmlinclude --noout --valid postgres.sgml";
+	system($cmd) == 0 or die "xmllint validation failed\n";
+}
+
+run_xmllint();
+check_tabs_and_nbsp();
diff --git a/.cirrus.tasks.yml b/.cirrus.tasks.yml
index eca9d62fc22..1c937247a9a 100644
--- a/.cirrus.tasks.yml
+++ b/.cirrus.tasks.yml
@@ -627,6 +627,8 @@ task:
     TEST_JOBS: 8
     IMAGE: ghcr.io/cirruslabs/macos-runner:sonoma
 
+    XML_CATALOG_FILES: /opt/local/share/xml/docbook/4.5/catalog.xml
+
     CIRRUS_WORKING_DIR: ${HOME}/pgsql/
     CCACHE_DIR: ${HOME}/ccache
     MACPORTS_CACHE: ${HOME}/macports-cache
@@ -641,6 +643,7 @@ task:
 
     MACOS_PACKAGE_LIST: >-
       ccache
+      docbook-xml-4.5
       icu
       kerberos5
       lz4
-- 
2.51.0

#19Nazir Bilal Yavuz
byavuz81@gmail.com
In reply to: Nazir Bilal Yavuz (#18)
1 attachment(s)
Re: split func.sgml to separated individual sgml files

Hi,

On Wed, 1 Oct 2025 at 15:09, Nazir Bilal Yavuz <byavuz81@gmail.com> wrote:

On Tue, 30 Sept 2025 at 22:48, Andrew Dunstan <andrew@dunslane.net> wrote:

Hi Bilal,

This got preempted slightly by Tom's commit 170a8a3f460, but I think
it's worth doing. I tried to simplify it some. See attached. There
doesn't seem to me to be any point in using a different set of files for
the tab tests and the NBSP tests. If we use the same set of files we can
improve the efficiency easily by opening them only once. Here we just
look for all the sgml files and all the xsl files and process them all.

WDYT?

It looks good to me. I made 2 changes to your patch:

1- The declaration of $line_no was lost; I re-added it.
2- s/.cirrus.tasks,yml/.cirrus.tasks.yml/ in the commit message.

Two more minor changes that I missed in the v2:

1- I added $line_no and removed $_ from the tab check's warning
message. I think it is better this way; otherwise, if the line contains
only a tab character, $_ prints an empty-looking line.
2- s/Tabsand/Tabs and/

--
Regards,
Nazir Bilal Yavuz
Microsoft

Attachments:

v3-0001-Improve-docs-syntax-checking.patch (text/x-patch; charset=US-ASCII)
From e12da65a6ada9a394907cbfaf448f978691e9dba Mon Sep 17 00:00:00 2001
From: Andrew Dunstan <andrew@dunslane.net>
Date: Tue, 30 Sep 2025 15:39:15 -0400
Subject: [PATCH v3] Improve docs syntax checking

Move the checks out of the Makefile into a perl script that can be
called from both the Makefile and meson.build. The set of files checked
is simplified, so it is just all the sgml and xsl files found in the
doc/src/sgml directory tree.

Along the way make some adjustments to .cirrus.tasks.yml to support this
better in CI.

Author: Nazir Bilal Yavuz <byavuz81@gmail.com>
Co-Author: Andrew Dunstan <andrew@dunslane.net>

Discussion: https://postgr.es/m/CAN55FZ3BnM+0twT-ZWL8As9oBEte_b+SBU==cz6Hk8JUCM_5Wg@mail.gmail.com
---
 doc/src/sgml/Makefile             | 16 +------
 doc/src/sgml/meson.build          | 23 +++++++++
 doc/src/sgml/sgml_syntax_check.pl | 77 +++++++++++++++++++++++++++++++
 .cirrus.tasks.yml                 |  3 ++
 4 files changed, 105 insertions(+), 14 deletions(-)
 create mode 100755 doc/src/sgml/sgml_syntax_check.pl

diff --git a/doc/src/sgml/Makefile b/doc/src/sgml/Makefile
index b53b2694a6b..24670204cbc 100644
--- a/doc/src/sgml/Makefile
+++ b/doc/src/sgml/Makefile
@@ -200,8 +200,8 @@ MAKEINFO = makeinfo
 ##
 
 # Quick syntax check without style processing
-check: postgres.sgml $(ALL_SGML) check-tabs check-nbsp
-	$(XMLLINT) $(XMLINCLUDE) --noout --valid $<
+check: postgres.sgml $(ALL_SGML)
+	$(PERL) $(srcdir)/sgml_syntax_check.pl --xmllint "$(XMLLINT)" --srcdir $(srcdir)
 
 
 ##
@@ -261,18 +261,6 @@ clean-man:
 
 endif # sqlmansectnum != 7
 
-# tabs are harmless, but it is best to avoid them in SGML files
-check-tabs:
-	@( ! grep '	' $(wildcard $(srcdir)/*.sgml $(srcdir)/func/*.sgml $(srcdir)/ref/*.sgml $(srcdir)/*.xsl) ) || \
-	(echo "Tabs appear in SGML/XML files" 1>&2;  exit 1)
-
-# Non-breaking spaces are harmless, but it is best to avoid them in SGML files.
-# Use perl command because non-GNU grep or sed could not have hex escape sequence.
-check-nbsp:
-	@ ( $(PERL) -ne '/\xC2\xA0/ and print("$$ARGV:$$_"),$$n++; END {exit($$n>0)}' \
-	  $(wildcard $(srcdir)/*.sgml $(srcdir)/func/*.sgml $(srcdir)/ref/*.sgml $(srcdir)/*.xsl $(srcdir)/images/*.xsl) ) || \
-	(echo "Non-breaking spaces appear in SGML/XML files" 1>&2;  exit 1)
-
 ##
 ## Clean
 ##
diff --git a/doc/src/sgml/meson.build b/doc/src/sgml/meson.build
index 6ae192eac68..ce0dea587cd 100644
--- a/doc/src/sgml/meson.build
+++ b/doc/src/sgml/meson.build
@@ -306,3 +306,26 @@ endif
 if alldocs.length() != 0
   alias_target('alldocs', alldocs)
 endif
+
+sgml_syntax_check = files(
+  'sgml_syntax_check.pl'
+)
+
+test(
+  'sgml_syntax_check',
+  perl,
+  protocol: 'exitcode',
+  suite: 'doc',
+  args: [
+    sgml_syntax_check,
+    '--xmllint',
+      '@0@ --nonet'.format(xmllint_bin.full_path()),
+    '--srcdir',
+      meson.current_source_dir(),
+    '--builddir',
+      meson.current_build_dir(),
+  ],
+  depends: doc_generated
+)
+
+testprep_targets += doc_generated
diff --git a/doc/src/sgml/sgml_syntax_check.pl b/doc/src/sgml/sgml_syntax_check.pl
new file mode 100755
index 00000000000..2264700a453
--- /dev/null
+++ b/doc/src/sgml/sgml_syntax_check.pl
@@ -0,0 +1,77 @@
+#!/usr/bin/perl
+
+# Copyright (c) 2025, PostgreSQL Global Development Group
+
+# doc/src/sgml/sgml_syntax_check.pl
+
+use strict;
+use warnings FATAL => 'all';
+use Getopt::Long;
+
+use File::Find;
+
+my $xmllint;
+my $srcdir;
+my $builddir;
+
+GetOptions(
+	'xmllint:s' => \$xmllint,
+	'srcdir:s' => \$srcdir,
+	'builddir:s' => \$builddir) or die "$0: wrong arguments";
+
+die "$0: --srcdir must be specified\n" unless defined $srcdir;
+
+my $xmlinclude = "--path . --path $srcdir";
+$xmlinclude .= " --path $builddir" if defined $builddir;
+
+# find files to process - all the sgml and xsl files (including in subdirectories)
+my @files_to_process;
+my @dirs_to_search = ($srcdir);
+push @dirs_to_search, $builddir if defined $builddir;
+find(
+	sub {
+		return unless -f $_;
+		return if $_ !~ /\.(sgml|xsl)$/;
+		push @files_to_process, $File::Find::name;
+	},
+	@dirs_to_search,);
+
+# tabs and non-breaking spaces are harmless, but it is best to avoid them in SGML files
+sub check_tabs_and_nbsp
+{
+	my $errors = 0;
+	for my $f (@files_to_process)
+	{
+		open my $fh, "<:encoding(UTF-8)", $f or die "Can't open $f: $!";
+		my $line_no = 0;
+		while (<$fh>)
+		{
+			$line_no++;
+			if (/\t/)
+			{
+				print STDERR "Tab found in $f:$line_no\n";
+				$errors++;
+			}
+			if (/\xC2\xA0/)
+			{
+				print STDERR "$f:$line_no: contains non-breaking space\n";
+				$errors++;
+			}
+		}
+		close($fh);
+	}
+
+	if ($errors)
+	{
+		die "Tabs and/or non-breaking spaces appear in SGML/XML files\n";
+	}
+}
+
+sub run_xmllint
+{
+	my $cmd = "$xmllint $xmlinclude --noout --valid postgres.sgml";
+	system($cmd) == 0 or die "xmllint validation failed\n";
+}
+
+run_xmllint();
+check_tabs_and_nbsp();
diff --git a/.cirrus.tasks.yml b/.cirrus.tasks.yml
index eca9d62fc22..1c937247a9a 100644
--- a/.cirrus.tasks.yml
+++ b/.cirrus.tasks.yml
@@ -627,6 +627,8 @@ task:
     TEST_JOBS: 8
     IMAGE: ghcr.io/cirruslabs/macos-runner:sonoma
 
+    XML_CATALOG_FILES: /opt/local/share/xml/docbook/4.5/catalog.xml
+
     CIRRUS_WORKING_DIR: ${HOME}/pgsql/
     CCACHE_DIR: ${HOME}/ccache
     MACPORTS_CACHE: ${HOME}/macports-cache
@@ -641,6 +643,7 @@ task:
 
     MACOS_PACKAGE_LIST: >-
       ccache
+      docbook-xml-4.5
       icu
       kerberos5
       lz4
-- 
2.51.0
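
One detail in the combined check may be worth double-checking. The earlier check_nbsp opened files with "<:raw", whereas check_tabs_and_nbsp opens them with "<:encoding(UTF-8)" and still matches /\xC2\xA0/. Once a line has been decoded, a non-breaking space is the single character U+00A0, so a pattern written as the two UTF-8 bytes would presumably no longer match it. The Python sketch below only illustrates that bytes-versus-characters distinction by analogy; it is not the Perl script, and the helper names are hypothetical.

```python
# UTF-8 encodes U+00A0 (non-breaking space) as the two bytes C2 A0.
NBSP_BYTES = b"\xc2\xa0"

raw_line = "a\u00a0b\n".encode("utf-8")  # what a raw (byte) read would return
text_line = raw_line.decode("utf-8")     # what a decoding read would return


def has_nbsp_raw(data: bytes) -> bool:
    """Byte pattern applied to undecoded bytes: matches."""
    return NBSP_BYTES in data


def has_nbsp_decoded(text: str) -> bool:
    """Single-character pattern applied to decoded text: matches."""
    return "\u00a0" in text


def has_nbsp_mismatched(text: str) -> bool:
    """Byte-style pattern applied to decoded text: this looks for the two
    characters U+00C2 U+00A0, so a lone non-breaking space is missed."""
    return "\u00c2\u00a0" in text


if __name__ == "__main__":
    print(has_nbsp_raw(raw_line), has_nbsp_decoded(text_line),
          has_nbsp_mismatched(text_line))
```

If the same holds for the Perl script, either reading the files raw or matching the single character \x{A0} would keep the non-breaking-space check effective.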

#20Andrew Dunstan
andrew@dunslane.net
In reply to: Nazir Bilal Yavuz (#19)
Re: split func.sgml to separated individual sgml files

On 2025-10-01 We 8:27 AM, Nazir Bilal Yavuz wrote:

Hi,

On Wed, 1 Oct 2025 at 15:09, Nazir Bilal Yavuz <byavuz81@gmail.com> wrote:

On Tue, 30 Sept 2025 at 22:48, Andrew Dunstan <andrew@dunslane.net> wrote:

Hi Bilal,

This got preempted slightly by Tom's commit 170a8a3f460, but I think
it's worth doing. I tried to simplify it some. See attached. There
doesn't seem to me to be any point in using a different set of files for
the tab tests and the NBSP tests. If we use the same set of files we can
improve the efficiency easily by opening them only once. Here we just
look for all the sgml files and all the xsl files and process them all.

WDYT?

It looks good to me. I made 2 changes to your patch:

1- The declaration of $line_no was lost; I re-added it.
2- s/.cirrus.tasks,yml/.cirrus.tasks.yml/ in the commit message.

Two more minor changes that I missed in the v2:

1- I added $line_no and removed $_ from the tab check's warning
message. I think it is better this way; otherwise, if the line contains
only a tab character, $_ prints an empty-looking line.
2- s/Tabsand/Tabs and/

OK, thanks, looks good. How do we go about doing what Tom wants (i.e.
running the tests by default) under meson. I think in the Makefile we
could just add it to the html target.

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com

#21Nazir Bilal Yavuz
byavuz81@gmail.com
In reply to: Andrew Dunstan (#20)
Re: split func.sgml to separated individual sgml files

Hi,

On Wed, 1 Oct 2025 at 23:02, Andrew Dunstan <andrew@dunslane.net> wrote:

On 2025-10-01 We 8:27 AM, Nazir Bilal Yavuz wrote:

Hi,

On Wed, 1 Oct 2025 at 15:09, Nazir Bilal Yavuz <byavuz81@gmail.com> wrote:

On Tue, 30 Sept 2025 at 22:48, Andrew Dunstan <andrew@dunslane.net> wrote:

Hi Bilal,

This got preempted slightly by Tom's commit 170a8a3f460, but I think
it's worth doing. I tried to simplify it some. See attached. There
doesn't seem to me to be any point in using a different set of files for
the tab tests and the NBSP tests. If we use the same set of files we can
improve the efficiency easily by opening them only once. Here we just
look for all the sgml files and all the xsl files and process them all.

WDYT?

It looks good to me. I made 2 changes to your patch:

1- Declaration of $line_no is lost, I re-added it.
2- s/.cirrus.tasks,yml/.cirrus.tasks.yml/ in the commit message.

Two more minor changes that I missed in the v2:

1- I added $line_no and removed $_ from the tab check's warning
message. I think it is better this way; otherwise, if the line only
contains a tab character, $_ will print an empty-looking line.
2- s/Tabsand/Tabs and/

OK, thanks, looks good. How do we go about doing what Tom wants (i.e.
running the tests by default) under meson. I think in the Makefile we
could just add it to the html target.

I might be misunderstanding, but these syntax checks already run by
default under meson build with this patch. Would we just need to add
this test to the HTML target in the Makefile?

--
Regards,
Nazir Bilal Yavuz
Microsoft

#22Andrew Dunstan
andrew@dunslane.net
In reply to: Nazir Bilal Yavuz (#21)
Re: split func.sgml to separated individual sgml files

On 2025-10-02 Th 2:58 AM, Nazir Bilal Yavuz wrote:

Hi,

On Wed, 1 Oct 2025 at 23:02, Andrew Dunstan <andrew@dunslane.net> wrote:

On 2025-10-01 We 8:27 AM, Nazir Bilal Yavuz wrote:

Hi,

On Wed, 1 Oct 2025 at 15:09, Nazir Bilal Yavuz <byavuz81@gmail.com> wrote:

On Tue, 30 Sept 2025 at 22:48, Andrew Dunstan <andrew@dunslane.net> wrote:

Hi Bilal,

This got preempted slightly by Tom's commit 170a8a3f460, but I think
it's worth doing. I tried to simplify it some. See attached. There
doesn't seem to me to be any point in using a different set of files for
the tab tests and the NBSP tests. If we use the same set of files we can
improve the efficiency easily by opening them only once. Here we just
look for all the sgml files and all the xsl files and process them all.

WDYT?

It looks good to me. I made 2 changes to your patch:

1- Declaration of $line_no is lost, I re-added it.
2- s/.cirrus.tasks,yml/.cirrus.tasks.yml/ in the commit message.

Two more minor changes that I missed in the v2:

1- I added $line_no and removed $_ from the tab check's warning
message. I think it is better this way; otherwise, if the line only
contains a tab character, $_ will print an empty-looking line.
2- s/Tabsand/Tabs and/

OK, thanks, looks good. How do we go about doing what Tom wants (i.e.
running the tests by default) under meson. I think in the Makefile we
could just add it to the html target.

I might be misunderstanding, but these syntax checks already run by
default under meson build with this patch. Would we just need to add
this test to the HTML target in the Makefile?

Oh, ok, I missed that about meson. I will adjust the Makefile.

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com

#23Nazir Bilal Yavuz
byavuz81@gmail.com
In reply to: Andrew Dunstan (#22)
Re: split func.sgml to separated individual sgml files

Hi,

On Thu, 2 Oct 2025 at 15:27, Andrew Dunstan <andrew@dunslane.net> wrote:

Oh, ok, I missed that about meson. I will adjust the Makefile.

I think there is one more problem that we need to think about. This
test runs when the xmllint is enabled but it also requires docbook
(docbook-xml on some OSes) to be installed, otherwise the test fails
with 'I/O error : Attempt to load network entity
http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd'. I think that
we need to skip this test if the docbook can not be found in the
system. Otherwise that would be a hassle for most of the people and
buildfarm members. What do you think about this?

--
Regards,
Nazir Bilal Yavuz
Microsoft

#24Andrew Dunstan
andrew@dunslane.net
In reply to: Nazir Bilal Yavuz (#23)
Re: split func.sgml to separated individual sgml files

On 2025-10-02 Th 8:52 AM, Nazir Bilal Yavuz wrote:

Hi,

On Thu, 2 Oct 2025 at 15:27, Andrew Dunstan <andrew@dunslane.net> wrote:

Oh, ok, I missed that about meson. I will adjust the Makefile.

I think there is one more problem that we need to think about. This
test runs when the xmllint is enabled but it also requires docbook
(docbook-xml on some OSes) to be installed, otherwise the test fails
with 'I/O error : Attempt to load network entity
http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd'. I think that
we need to skip this test if the docbook can not be found in the
system. Otherwise that would be a hassle for most of the people and
buildfarm members. What do you think about this?

Oops, missed seeing this earlier. Yes, I think we need to skip the test
in the meson case. Probably nothing more needed for the Makefile.

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com

#25Peter Eisentraut
peter@eisentraut.org
In reply to: Andrew Dunstan (#20)
Re: split func.sgml to separated individual sgml files

On 01.10.25 22:02, Andrew Dunstan wrote:

(Maybe these discussions could have been in a new thread and not hidden
under some unrelated thing.)

OK, thanks, looks good. How do we go about doing what Tom wants (i.e.
running the tests by default) under meson. I think in the Makefile we
could just add it to the html target.

-html: html-stamp
+html: check html-stamp

This is not a good solution. This means the html target is never up to
date. Compare PostgreSQL 18:

$ make html
make: Nothing to be done for 'html'.
$ make -q html; echo $?
0

And master:

$ make html
perl ...
$ make -q html; echo $?
1

Also, consider the postgres-full.xml target:

# Run validation only once, common to all subsequent targets. While
# we're at it, also resolve all entities (that is, copy all included
# files into one big file). This helps tools that don't understand
# vpath builds (such as dbtoepub).
postgres-full.xml: postgres.sgml $(ALL_SGML)
$(XMLLINT) $(XMLINCLUDE) --output $@ --noent --valid $<

Note that this already does validation. The way this is structured now
is that it runs the validation once when you create postgres-full.xml,
which is then later input into the HTML generation, and then you run the
validation again, on the already-processed input files, which doesn't
make any sense.

I suspect what you're really after here is the functionality of the
check-tabs and check-nbsp targets. So the new Perl script really just
has to cover those two and doesn't have to bother with xmllint. And
then you just call that script as part of the postgres-full.xml target.
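
For illustration, the tab/NBSP part of such a check can be quite small. The following is only a minimal sketch in Python, not the Perl script from the patch; the function name and the (line number, kind) return format are assumptions made for this example. It reports line numbers rather than echoing the offending line, since a line consisting only of a tab would print as an empty-looking line.

```python
NBSP = "\u00a0"  # non-breaking space


def find_whitespace_problems(text):
    """Return (line_no, kind) pairs for lines containing a tab or an NBSP.

    Hypothetical helper for illustration; line numbers are 1-based.
    """
    problems = []
    for line_no, line in enumerate(text.split("\n"), start=1):
        if "\t" in line:
            problems.append((line_no, "tab"))
        if NBSP in line:
            problems.append((line_no, "nbsp"))
    return problems
```

A caller would read each SGML/XSL file once, pass its contents to this function, and fail the build if the returned list is non-empty.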

#26Tom Lane
tgl@sss.pgh.pa.us
In reply to: Peter Eisentraut (#25)
Re: split func.sgml to separated individual sgml files

Peter Eisentraut <peter@eisentraut.org> writes:

I suspect what you're really after here is the functionality of the
check-tabs and check-nbsp targets. So the new Perl script really just
has to cover those two and doesn't have to bother with xmllint. And
then you just call that script as part of the postgres-full.xml target.

Yeah, that's what I was imagining: replace the xmllint call in
postgres-full.xml with this new script that will also run the
tab/nbsp checks.

regards, tom lane

#27Nazir Bilal Yavuz
byavuz81@gmail.com
In reply to: Andrew Dunstan (#24)
1 attachment(s)
Re: split func.sgml to separated individual sgml files

Hi,

On Thu, 2 Oct 2025 at 21:43, Andrew Dunstan <andrew@dunslane.net> wrote:

On 2025-10-02 Th 8:52 AM, Nazir Bilal Yavuz wrote:

I think there is one more problem that we need to think about. This
test runs when the xmllint is enabled but it also requires docbook
(docbook-xml on some OSes) to be installed, otherwise the test fails
with 'I/O error : Attempt to load network entity
http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd'. I think that
we need to skip this test if the docbook can not be found in the
system. Otherwise that would be a hassle for most of the people and
buildfarm members. What do you think about this?

Oops, missed seeing this earlier. Yes, I think we need to skip the test in the meson case. Probably nothing more needed for the Makefile.

Here is the patch which does that. It has a basic check for the
docbook and if the docbook can not be found, then meson skips the
test.

--
Regards,
Nazir Bilal Yavuz
Microsoft

Attachments:

v4-0001-meson-Skip-sgml_syntax_check-test-if-DocBook-DTD-.patch (text/x-patch; charset=US-ASCII)
From e18b5c4901d0e671017b58417b57d81a7ff4931a Mon Sep 17 00:00:00 2001
From: Nazir Bilal Yavuz <byavuz81@gmail.com>
Date: Fri, 3 Oct 2025 12:56:58 +0300
Subject: [PATCH v4] meson: Skip sgml_syntax_check test if DocBook DTD can not
 be found

Add a simple DocBook test file (docbook-test.sgml) and update
sgml_syntax_check.pl to validate that the DocBook DTD is available.

The script now runs xmllint on the test file and exits with code 77 if
the DocBook DTD can not be found. This allows Meson to skip the syntax
check gracefully instead of failing when DocBook is missing.

Author: Nazir Bilal Yavuz <byavuz81@gmail.com>
Discussion: https://postgr.es/m/CAN55FZ3BnM+0twT-ZWL8As9oBEte_b+SBU==cz6Hk8JUCM_5Wg@mail.gmail.com
---
 doc/src/sgml/docbook-test.sgml    |  9 +++++++++
 doc/src/sgml/sgml_syntax_check.pl | 12 ++++++++++++
 2 files changed, 21 insertions(+)
 create mode 100644 doc/src/sgml/docbook-test.sgml

diff --git a/doc/src/sgml/docbook-test.sgml b/doc/src/sgml/docbook-test.sgml
new file mode 100644
index 00000000000..242a52676e0
--- /dev/null
+++ b/doc/src/sgml/docbook-test.sgml
@@ -0,0 +1,9 @@
+<!-- doc/src/sgml/docbook-test.sgml -->
+
+<!-- This file is used to check if the DocBook can be found -->
+
+<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN"
+  "http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd">
+<book>
+  <title>DocBook Test</title>
+</book>
diff --git a/doc/src/sgml/sgml_syntax_check.pl b/doc/src/sgml/sgml_syntax_check.pl
index 2264700a453..96b1c755bc5 100755
--- a/doc/src/sgml/sgml_syntax_check.pl
+++ b/doc/src/sgml/sgml_syntax_check.pl
@@ -73,5 +73,17 @@ sub run_xmllint
 	system($cmd) == 0 or die "xmllint validation failed\n";
 }
 
+sub test_docbook
+{
+	my $cmd = "$xmllint $xmlinclude --noout --valid docbook-test.sgml";
+	if (system($cmd) != 0)
+	{
+		print STDERR "DocBook DTD file can not be found\n";
+		# Meson uses exit code 77 to skip the test instead of failing it
+		exit(77);
+	}
+}
+
+test_docbook();
 run_xmllint();
 check_tabs_and_nbsp();
-- 
2.51.0
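
The skip logic in the patch above amounts to mapping a failed probe to exit status 77, which Meson reports as a skipped test rather than a failed one. A minimal sketch of that mapping in Python follows, assuming the probe command is passed in by the caller. The function name, the injectable `run` parameter, and the choice to treat a missing xmllint binary as a skip are assumptions of this example and are not part of the patch.

```python
import subprocess

SKIP_EXIT_CODE = 77  # Meson treats this exit status as "test skipped"


def docbook_check_status(cmd, run=subprocess.call):
    """Return 0 if the probe command succeeds, otherwise the skip code.

    `cmd` is an argument list such as
    ["xmllint", "--noout", "--valid", "docbook-test.sgml"].
    `run` is injectable so the logic can be exercised without xmllint.
    """
    try:
        rc = run(cmd)
    except OSError:
        # Assumption: a probe that cannot be started counts as "not found".
        rc = 1
    return 0 if rc == 0 else SKIP_EXIT_CODE
```

A wrapper script would call `sys.exit(docbook_check_status(cmd))` before running the real validation, so that a missing DTD skips the test instead of failing it.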

#28Nazir Bilal Yavuz
byavuz81@gmail.com
In reply to: Tom Lane (#26)
Re: split func.sgml to separated individual sgml files

Hi,

On Thu, 2 Oct 2025 at 23:16, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Peter Eisentraut <peter@eisentraut.org> writes:

I suspect what you're really after here is the functionality of the
check-tabs and check-nbsp targets. So the new Perl script really just
has to cover those two and doesn't have to bother with xmllint. And
then you just call that script as part of the postgres-full.xml target.

Yeah, that's what I was imagining: replace the xmllint call in
postgres-full.xml with this new script that will also run the
tab/nbsp checks.

Doesn't this mean we cannot run the syntax check by itself in the
make builds? If I understand correctly, we need to create
postgres-full.xml each time we want to run the syntax check, right?

I was under the impression that the sgml_syntax_check.pl test would be
a lightweight way to do a syntax check, so that we could easily use it
by itself or in the CI.

--
Regards,
Nazir Bilal Yavuz
Microsoft

#29Peter Eisentraut
peter@eisentraut.org
In reply to: Nazir Bilal Yavuz (#28)
Re: split func.sgml to separated individual sgml files

On 03.10.25 13:48, Nazir Bilal Yavuz wrote:

On Thu, 2 Oct 2025 at 23:16, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Peter Eisentraut <peter@eisentraut.org> writes:

I suspect what you're really after here is the functionality of the
check-tabs and check-nbsp targets. So the new Perl script really just
has to cover those two and doesn't have to bother with xmllint. And
then you just call that script as part of the postgres-full.xml target.

Yeah, that's what I was imagining: replace the xmllint call in
postgres-full.xml with this new script that will also run the
tab/nbsp checks.

Doesn't this mean we cannot run the syntax check by itself in the
make builds? If I understand correctly, we need to create
postgres-full.xml each time we want to run the syntax check, right?

If you look at this more closely, creating postgres-full.xml and running
the syntax check perform the same operations, except that the latter
throws away the output. So it seems redundant to build a whole new code
path for this. I think you can make the check target dependent on
postgres-full.xml and be done, kind of like this (starting from
pre-b2922562726):

diff --git a/doc/src/sgml/Makefile b/doc/src/sgml/Makefile
index b53b2694a6b..574ae7b3984 100644
--- a/doc/src/sgml/Makefile
+++ b/doc/src/sgml/Makefile
@@ -69,8 +69,12 @@ ALL_IMAGES := $(wildcard $(srcdir)/images/*.svg)
  # files into one big file).  This helps tools that don't understand
  # vpath builds (such as dbtoepub).
  postgres-full.xml: postgres.sgml $(ALL_SGML)
+	$(MAKE) check-tabs check-nbsp
  	$(XMLLINT) $(XMLINCLUDE) --output $@ --noent --valid $<
+# Quick syntax check without style processing
+check: postgres-full.xml
+

##
## Man pages
@@ -195,15 +199,6 @@ MAKEINFO = makeinfo
$(MAKEINFO) --enable-encoding --no-split --no-validate $< -o $@

-##
-## Check
-##
-
-# Quick syntax check without style processing
-check: postgres.sgml $(ALL_SGML) check-tabs check-nbsp
- $(XMLLINT) $(XMLINCLUDE) --noout --valid $<
-
-
##
## Install
##

#30Tom Lane
tgl@sss.pgh.pa.us
In reply to: Peter Eisentraut (#29)
Re: split func.sgml to separated individual sgml files

Peter Eisentraut <peter@eisentraut.org> writes:

If you look at this more closely, creating postgres-full.xml and running
the syntax check perform the same operations, except that the latter
throws away the output. So it seems redundant to build a whole new code
path for this. I think you can make the check target dependent on
postgres-full.xml and be done, kind of like this (starting from
pre-b2922562726):

Would it be unreasonable to discard the "check" target altogether?
It made sense back in the day when actually building the html docs
took many minutes. But I haven't used it in years, so I wonder
if anyone else has either.

regards, tom lane

#31Andrew Dunstan
andrew@dunslane.net
In reply to: Tom Lane (#30)
Re: split func.sgml to separated individual sgml files

On 2025-10-03 Fr 10:41 AM, Tom Lane wrote:

Peter Eisentraut <peter@eisentraut.org> writes:

If you look at this more closely, creating postgres-full.xml and running
the syntax check perform the same operations, except that the latter
throws away the output. So it seems redundant to build a whole new code
path for this. I think you can make the check target dependent on
postgres-full.xml and be done, kind of like this (starting from
pre-b2922562726):

Would it be unreasonable to discard the "check" target altogether?
It made sense back in the day when actually building the html docs
took many minutes. But I haven't used it in years, so I wonder
if anyone else has either.

I have no objection. We'll need to work out what we're doing on the
meson side, which is kinda where we came in ...

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com

#32Nazir Bilal Yavuz
byavuz81@gmail.com
In reply to: Andrew Dunstan (#31)
Re: split func.sgml to separated individual sgml files

Hi,

On Fri, 3 Oct 2025 at 18:47, Andrew Dunstan <andrew@dunslane.net> wrote:

On 2025-10-03 Fr 10:41 AM, Tom Lane wrote:

Peter Eisentraut <peter@eisentraut.org> writes:

If you look at this more closely, creating postgres-full.xml and running
the syntax check perform the same operations, except that the latter
throws away the output. So it seems redundant to build a whole new code
path for this. I think you can make the check target dependent on
postgres-full.xml and be done, kind of like this (starting from
pre-b2922562726):

Would it be unreasonable to discard the "check" target altogether?
It made sense back in the day when actually building the html docs
took many minutes. But I haven't used it in years, so I wonder
if anyone else has either.

I have no objection. We'll need to work out what we're doing on the meson side, which is kinda where we came in ...

I can work on this but I want to clarify it first. Which one do you prefer:

1- We won't have any command to do syntax checks (including tab and
nbsp), these checks will automatically run when we generate docs.

2- We will have a 'check' target but it will only do tab and nbsp
checks; xmllint will run only when generating the docs.

--
Regards,
Nazir Bilal Yavuz
Microsoft

#33Peter Eisentraut
peter@eisentraut.org
In reply to: Nazir Bilal Yavuz (#32)
Re: split func.sgml to separated individual sgml files

On 06.10.25 10:29, Nazir Bilal Yavuz wrote:

Hi,

On Fri, 3 Oct 2025 at 18:47, Andrew Dunstan <andrew@dunslane.net> wrote:

On 2025-10-03 Fr 10:41 AM, Tom Lane wrote:

Peter Eisentraut <peter@eisentraut.org> writes:

If you look at this more closely, creating postgres-full.xml and running
the syntax check perform the same operations, except that the latter
throws away the output. So it seems redundant to build a whole new code
path for this. I think you can make the check target dependent on
postgres-full.xml and be done, kind of like this (starting from
pre-b2922562726):

Would it be unreasonable to discard the "check" target altogether?
It made sense back in the day when actually building the html docs
took many minutes. But I haven't used it in years, so I wonder
if anyone else has either.

I have no objection. We'll need to work out what we're doing on the meson side, which is kinda where we came in ...

I can work on this but I want to clarify it first. Which one do you prefer:

1- We won't have any command to do syntax checks (including tab and
nbsp), these checks will automatically run when we generate docs.

2- We will have a 'check' target but it will only do tab and nbsp
checks; xmllint will run only when generating the docs.

I don't know, people have a lot of individual workflows, and they are
not reading this thread. I still don't know what we are actually trying
to fix here, I just noticed that what was committed is flawed.

I would prefer that b2922562726 be reverted, and then someone start a
new thread with a descriptive change proposal.

#34Nazir Bilal Yavuz
byavuz81@gmail.com
In reply to: Peter Eisentraut (#33)
Re: split func.sgml to separated individual sgml files

Hi,

On Mon, 6 Oct 2025 at 11:54, Peter Eisentraut <peter@eisentraut.org> wrote:

On 06.10.25 10:29, Nazir Bilal Yavuz wrote:

I can work on this but I want to clarify it first. Which one do you prefer:

1- We won't have any command to do syntax checks (including tab and
nbsp), these checks will automatically run when we generate docs.

2- We will have a 'check' target but it will only do tab and nbsp
checks; xmllint will run only when generating the docs.

I don't know, people have a lot of individual workflows, and they are
not reading this thread. I still don't know what we are actually trying
to fix here, I just noticed that what was committed is flawed.

The problem was meson build doesn't have tab and nbsp checks [1]. We
were trying to enable these checks on meson build by moving these
checks to the perl script so that we can run this script on both build
systems.

I would prefer that b2922562726 be reverted, and then someone start a
new thread with a descriptive change proposal.

Sounds good to me. I can create a new thread if it gets reverted.

[1]: /messages/by-id/7020df24-1d5f-41e5-8948-2e8d5da57935@dunslane.net

--
Regards,
Nazir Bilal Yavuz
Microsoft

#35Andrew Dunstan
andrew@dunslane.net
In reply to: Nazir Bilal Yavuz (#34)
Re: split func.sgml to separated individual sgml files

On 2025-10-06 Mo 6:44 AM, Nazir Bilal Yavuz wrote:

Hi,

On Mon, 6 Oct 2025 at 11:54, Peter Eisentraut <peter@eisentraut.org> wrote:

On 06.10.25 10:29, Nazir Bilal Yavuz wrote:

I can work on this but I want to clarify it first. Which one do you prefer:

1- We won't have any command to do syntax checks (including tab and
nbsp), these checks will automatically run when we generate docs.

2- We will have a 'check' target but it will only do tab and nbsp
checks; xmllint will run only when generating the docs.

I don't know, people have a lot of individual workflows, and they are
not reading this thread. I still don't know what we are actually trying
to fix here, I just noticed that what was committed is flawed.

The problem was meson build doesn't have tab and nbsp checks [1]. We
were trying to enable these checks on meson build by moving these
checks to the perl script so that we can run this script on both build
systems.

I would prefer that b2922562726 be reverted, and then someone start a
new thread with a descriptive change proposal.

Sounds good to me. I can create a new thread if it gets reverted.

[1] /messages/by-id/7020df24-1d5f-41e5-8948-2e8d5da57935@dunslane.net

OK, reverted.

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com

#36Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#30)
Re: split func.sgml to separated individual sgml files

On Fri, Oct 3, 2025 at 10:41:56AM -0400, Tom Lane wrote:

Peter Eisentraut <peter@eisentraut.org> writes:

If you look at this more closely, creating postgres-full.xml and running
the syntax check perform the same operations, except that the latter
throws away the output. So it seems redundant to build a whole new code
path for this. I think you can make the check target dependent on
postgres-full.xml and be done, kind of like this (starting from
pre-b2922562726):

Would it be unreasonable to discard the "check" target altogether?
It made sense back in the day when actually building the html docs
took many minutes. But I haven't used it in years, so I wonder
if anyone else has either.

I run 'make check' on the SGML every time I build the C code.

--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com

Do not let urgent matters crowd out time for investment in the future.

#37Bruce Momjian
bruce@momjian.us
In reply to: Bruce Momjian (#36)
Re: split func.sgml to separated individual sgml files

On Mon, Oct 6, 2025 at 10:55:53AM -0400, Bruce Momjian wrote:

On Fri, Oct 3, 2025 at 10:41:56AM -0400, Tom Lane wrote:

Peter Eisentraut <peter@eisentraut.org> writes:

If you look at this more closely, creating postgres-full.xml and running
the syntax check perform the same operations, except that the latter
throws away the output. So it seems redundant to build a whole new code
path for this. I think you can make the check target dependent on
postgres-full.xml and be done, kind of like this (starting from
pre-b2922562726):

Would it be unreasonable to discard the "check" target altogether?
It made sense back in the day when actually building the html docs
took many minutes. But I haven't used it in years, so I wonder
if anyone else has either.

I run 'make check' on the SGML every time I build the C code.

Uh, more accurately I run:

make --silent postgres.sgml
make --silent check
make check-tabs

--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com

Do not let urgent matters crowd out time for investment in the future.

#38Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#37)
Re: split func.sgml to separated individual sgml files

Bruce Momjian <bruce@momjian.us> writes:

Uh, more accurately I run:

make --silent postgres.sgml
make --silent check
make check-tabs

If we included the tabs/nbsp checks in the normal build, then the
first of those would cover everything. Even as it is, I don't
think the "make check" step is adding anything.

regards, tom lane

#39Álvaro Herrera
alvherre@kurilemu.de
In reply to: Tom Lane (#30)
Re: split func.sgml to separated individual sgml files

On 2025-Oct-03, Tom Lane wrote:

Would it be unreasonable to discard the "check" target altogether?
It made sense back in the day when actually building the html docs
took many minutes. But I haven't used it in years, so I wonder
if anyone else has either.

I wouldn't particularly appreciate that. Doing "make check" takes 0.6
seconds for me, while the HTML build is 28 seconds. It's quite a
difference.

--
Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/
"This is a foot just waiting to be shot" (Andrew Dunstan)

#40Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#38)
Re: split func.sgml to separated individual sgml files

On Mon, Oct 6, 2025 at 11:13:24AM -0400, Tom Lane wrote:

Bruce Momjian <bruce@momjian.us> writes:

Uh, more accurately I run:

make --silent postgres.sgml
make --silent check
make check-tabs

If we included the tabs/nbsp checks in the normal build, then the
first of those would cover everything. Even as it is, I don't
think the "make check" step is adding anything.

Looking at my test code, I do

$ make postgres.sgml
make: Nothing to be done for 'postgres.sgml'.

and my shell comment says it is there so that configure runs and I can
check that it works first, but it looks like it now does nothing.

I agree the "make --silent check-tabs" doesn't add anything because that
is already part of 'make check'.

I still would like to run checks without building the HTML.

--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com

Do not let urgent matters crowd out time for investment in the future.

#41Andrew Dunstan
andrew@dunslane.net
In reply to: Álvaro Herrera (#39)
Re: split func.sgml to separated individual sgml files

On 2025-10-06 Mo 12:00 PM, Álvaro Herrera wrote:

On 2025-Oct-03, Tom Lane wrote:

Would it be unreasonable to discard the "check" target altogether?
It made sense back in the day when actually building the html docs
took many minutes. But I haven't used it in years, so I wonder
if anyone else has either.

I wouldn't particularly appreciate that. Doing "make check" takes 0.6
seconds for me, while the HTML build is 28 seconds. It's quite a
difference.

OK, so I think that one's not going to fly. We could keep the check
target and also run the checks as part of building postgres-full.sgml.

It's less clear to me how to do that in meson, though, since you can
only have a single command in a custom target.

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com

#42Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andrew Dunstan (#41)
Re: split func.sgml to separated individual sgml files

Andrew Dunstan <andrew@dunslane.net> writes:

OK, so I think that one's not going to fly. We could keep the check
target and also run the checks as part of building postgres-full.sgml.

Works for me.

regards, tom lane

#43Andres Freund
andres@anarazel.de
In reply to: Andrew Dunstan (#41)
Re: split func.sgml to separated individual sgml files

Hi,

On 2025-10-07 14:39:44 -0400, Andrew Dunstan wrote:

It's less clear to me how to do that in meson, though, since you can only
have a single command in a custom target.

Create a stamp file for the check success and make that a dependency of
the main build too.

Greetings,

Andres Freund
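
The stamp-file idea can be sketched independently of any build system. The Python below is only an illustration of the pattern, not Meson syntax: the function name, its return strings, and the `check` callback are hypothetical. The check writes a stamp on success, the main build would depend on that stamp, and the check is skipped while the stamp is at least as new as every input.

```python
from pathlib import Path


def run_check_with_stamp(inputs, stamp, check):
    """Run check(inputs) unless the stamp is up to date; touch it on success.

    Returns "up-to-date", "checked", or "failed" (illustrative values).
    """
    stamp = Path(stamp)
    if stamp.exists():
        stamp_mtime = stamp.stat().st_mtime
        if all(Path(p).stat().st_mtime <= stamp_mtime for p in inputs):
            return "up-to-date"
    if not check(inputs):
        # Remove any stale stamp so dependents are not built after a failure.
        if stamp.exists():
            stamp.unlink()
        return "failed"
    stamp.touch()
    return "checked"
```

In a real build the up-to-date comparison is done by the build tool itself; the script only needs to create the stamp file when the check passes.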

#44Nazir Bilal Yavuz
byavuz81@gmail.com
In reply to: Nazir Bilal Yavuz (#34)
Re: split func.sgml to separated individual sgml files

Hi,

On Mon, 6 Oct 2025 at 13:44, Nazir Bilal Yavuz <byavuz81@gmail.com> wrote:

Sounds good to me. I can create a new thread if it gets reverted.

I created a new thread [1] and tried to apply recent feedback on this thread.

[1]: /messages/by-id/CAN55FZ1qzoDcaKqsR3DwE=X6FL+wpm+=KLvH6ahrRXNhjU53DQ@mail.gmail.com

--
Regards,
Nazir Bilal Yavuz
Microsoft