split func.sgml to separated individual sgml files

Started by jian he over 1 year ago, 44 messages
#1jian he
jian.universality@gmail.com

hi.

Moving this to a new thread, since many things have become entangled in
the old thread [1].

We are going to split func.sgml into 31 individual sgml files.
The new file names use "func-" as the prefix.
All the func-*.sgml files are stored in doc/src/sgml/func.
Each new file corresponds to a beginning line and an ending line
in the original func.sgml, and the script records those line numbers,
which is why the python script is long.

python3 v1-0001-split_func_sgml.py
git apply v1-0001-all-filelist-for-directory-doc-src-sgml-func.patch
After executing the above two commands, we should be good to go.

The following is step-by-step logic.

#--step0 get line number info, and validate it.
In func.sgml we have 62 "sect1" occurrences (an opening and a closing
tag for each of the 31 sections); these correspond to the first and
last line of each new sgml file.
So we locate and validate them in func.sgml.
Later we use the sed command to do the copy, which needs this line
number information.

#--step1. create the doc/src/sgml/func directory and move func.sgml there.
Create each new, empty individual sgml file.

#---step2 construct the sed copy commands and execute them.
This roughly copies the content of func.sgml from line 52 to line 31655
into the corresponding new individual sgml files, based on the
line number information.

#-----step3 validate that each new file has only 2 "sect1" occurrences. Also
validate that each "<sect1 id.*" in the new files is unique,
just to make sure the output works fine.

#---step4 truncate func.sgml beginning from line 52.

#---step5 append place-holder string to func.sgml
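
The line-number logic in step0 through step3 can be sketched roughly as follows. This is not the attached script; it is a minimal illustration that assumes each section opens with <sect1 id="functions-XXX"> and closes with </sect1> on separate lines. The sample text, the function names, and the rule for deriving output file names are made up for the example.

```python
import re

# Hypothetical miniature stand-in for func.sgml; the real file is far longer.
SAMPLE = """\
<chapter id="functions">
 <title>Functions and Operators</title>
 <sect1 id="functions-logical">
  <title>Logical Operators</title>
 </sect1>
 <sect1 id="functions-comparison">
  <title>Comparison Functions and Operators</title>
 </sect1>
</chapter>
""".splitlines(keepends=True)

OPEN_RE = re.compile(r'<sect1 id="(functions-[^"]+)"')


def find_sections(lines):
    """Return (section_id, first_line, last_line) tuples, 1-based and inclusive."""
    sections = []
    current = None  # (id, start line) of the sect1 we are inside, if any
    for lineno, line in enumerate(lines, start=1):
        m = OPEN_RE.search(line)
        if m:
            if current is not None:
                raise ValueError(f"nested or unclosed sect1 at line {lineno}")
            current = (m.group(1), lineno)
        elif "</sect1>" in line:
            if current is None:
                raise ValueError(f"stray </sect1> at line {lineno}")
            sections.append((current[0], current[1], lineno))
            current = None
    if current is not None:
        raise ValueError(f"sect1 {current[0]} is never closed")
    return sections


def split_sections(lines):
    """Map an assumed output file name to the text of its section."""
    return {
        f"func-{sid.removeprefix('functions-')}.sgml": "".join(lines[start - 1:end])
        for sid, start, end in find_sections(lines)
    }


sections = find_sections(SAMPLE)
files = split_sections(SAMPLE)
print(sections)
```

Each extracted piece contains exactly two "sect1" occurrences, which is the property step3 checks.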

[1]: /messages/by-id/CA+TgmoZ2F+K0j=6BOJLD=YfpJMdJRXC7sWmtXGRjx1Rq0x8PUA@mail.gmail.com

Attachments:

v1-0001-split_func_sgml.py (text/x-python)
v1-0001-all-filelist-for-directory-doc-src-sgml-func.patch (text/x-patch, +44 -2)
#2Corey Huinker
corey.huinker@gmail.com
In reply to: jian he (#1)
Re: split func.sgml to separated individual sgml files

The following is step-by-step logic.

The end result (one file per section) seems good to me.

I suspect that reviewer burden may be the biggest barrier to going forward.
Perhaps break up the changes so that each new sect1 file gets its own
commit, allowing the reviewer to more easily (if not programmatically)
verify that the text that moved out of func.sgml moved into
func-sect-foo.sgml.

Granted, the committer will likely squash all of those commits down into
one big one, but the hard work of reviewing is done by then.

#3David G. Johnston
david.g.johnston@gmail.com
In reply to: Corey Huinker (#2)
Re: split func.sgml to separated individual sgml files

On Wed, Nov 13, 2024 at 1:11 PM Corey Huinker <corey.huinker@gmail.com>
wrote:

The following is step-by-step logic.

The end result (one file per section) seems good to me.

I suspect that reviewer burden may be the biggest barrier to going
forward. Perhaps breaking up the changes so that each new sect1 file gets
its own commit, allowing the reviewer to more easily (if not
programmatically) verify that the text that moved out of func.sgml moved
into func-sect-foo.sgml.

Granted, the committer will likely squash all of those commits down into
one big one, but by the the hard work of reviewing is done by then.

Validation is pretty trivial. I just built the before and after HTML files
and confirmed they are exactly the same size.

I suppose we might have lost some comments or something that wouldn't end
up visible in the HTML (seems unlikely) but this is basically one-and-done
so long as you don't let other commits happen (that touch this area) while
you extract and build HEAD and then compare it to the patched build
results. The git diff will let us know the script didn't affect any source
files it wasn't supposed to.

In short, ready to commit (see last paragraph below however), but the
committer will need to run the python script at the time of commit on the
then-current tree.

In my recent patch touching filelist.sgml I would be placing this new
%allfiles_func; line pairing at the top just beneath %allfiles; which is
the first child element. But the choice made here makes sense should this
go in first.

There is little downside, though, to renaming the existing %allfiles; to
%allfiles_ref; It's a local-only name.

David J.

#4jian he
jian.universality@gmail.com
In reply to: David G. Johnston (#3)
Re: split func.sgml to separated individual sgml files

On Thu, Mar 20, 2025 at 10:16 AM David G. Johnston
<david.g.johnston@gmail.com> wrote:

In short, ready to commit (see last paragraph below however), but the committer will need to run the python script at the time of commit on the then-current tree.

hi.
more explanation, since the python script seems quite large...

Each <sect1 id="functions-XXX"> in doc/src/sgml/func.sgml
corresponds to an individual section in [1].

Each <sect1 id="functions-XXX"> within func.sgml is unique.
If you rename one so that there are two <sect1 id="functions-logical">,
the build will error out with something like:
../../Desktop/pg_src/src6/postgres/doc/src/sgml/postgres.sgml:199:
element sect1: validity error : ID functions-logical already defined
See also [2].

Based on this, we can use the literal string <sect1 id="functions-XXX"> to
perform pattern matching and identify the line numbers that mark the start and
end of each <sect1> section.
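
As a rough illustration of why the uniqueness matters for the pattern matching, here is a minimal sketch that collects every sect1 id and reports duplicates. The helper name and the sample strings are hypothetical and not taken from the attached script.

```python
import re
from collections import Counter

SECT1_ID_RE = re.compile(r'<sect1 id="(functions-[^"]+)"')


def duplicate_sect1_ids(text):
    """Return the sorted list of sect1 ids that appear more than once."""
    counts = Counter(SECT1_ID_RE.findall(text))
    return sorted(sid for sid, n in counts.items() if n > 1)


# Made-up inputs: one with unique ids, one with a repeated id.
good = (
    '<sect1 id="functions-logical">\n</sect1>\n'
    '<sect1 id="functions-math">\n</sect1>\n'
)
bad = good + '<sect1 id="functions-logical">\n</sect1>\n'

print(duplicate_sect1_ids(good))
print(duplicate_sect1_ids(bad))
```

An empty result means each id can safely be used as a section boundary marker.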

The polished v2 python script uses the following steps for splitting func.sgml
into several pieces:

0. For each 9.X section listed in [1], create an empty SGML file to hold the
corresponding content.

1. Use the pattern <sect1 id="functions-XXX"> to locate the starting and ending
line number of each section in func.sgml

2. Copy each content block (<sect1>) from func.sgml

<sect1 id="functions-XXX">
...main content
</sect1>

into the newly created SGML files.

3. Remove the copied content from func.sgml.
4. In func.sgml, insert general entity references [3] to include the newly
created SGML files.
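
For step 4, a general entity needs a declaration and a reference. The sketch below only shows the shape of the two strings; the entity naming rule and the directory argument are assumptions made for illustration, and the thread does not show the exact text the script writes.

```python
def entity_name(section_id):
    """Assumed naming rule: functions-logical -> func-logical."""
    return "func-" + section_id.removeprefix("functions-")


def entity_declaration(section_id, directory="func"):
    """Declaration line of the kind that would live in an entity list file."""
    name = entity_name(section_id)
    return f'<!ENTITY {name} SYSTEM "{directory}/{name}.sgml">'


def entity_reference(section_id):
    """Reference that would replace the removed <sect1> block in func.sgml."""
    return f"&{entity_name(section_id)};"


decl = entity_declaration("functions-logical")
ref = entity_reference("functions-logical")
print(decl)
print(ref)
```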

Because Chapter 9, Functions and Operators, has the same number of
sections (31) in PG18 and PG17, v1-0001-split_func_sgml.py will still
work just fine.
But I made some minor changes, therefore v2 is attached.

----------------------------------------------------
I used the sed --in-place option [3] to modify and truncate the original large
func.sgml file directly.
I also used the -n and -p options with sed to extract lines from func.sgml
between line X and line Y, as shown in reference [4].
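
For readers less familiar with sed, the two operations can be approximated in plain Python as below. This is a sketch, not what the script does (the script shells out to sed); the function names and the demo file names are hypothetical.

```python
import itertools
import tempfile
from pathlib import Path


def copy_line_range(src, dst, first, last):
    """Copy lines first..last (1-based, inclusive), like: sed -n 'first,lastp' src > dst."""
    with open(src, encoding="utf-8") as fin, open(dst, "w", encoding="utf-8") as fout:
        fout.writelines(itertools.islice(fin, first - 1, last))


def truncate_from_line(path, first):
    """Drop line `first` and everything after it, rewriting the file in place."""
    p = Path(path)
    lines = p.read_text(encoding="utf-8").splitlines(keepends=True)
    p.write_text("".join(lines[: first - 1]), encoding="utf-8")


with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "func.sgml"        # stand-in for the real file
    dst = Path(tmp) / "func-demo.sgml"   # hypothetical output file
    src.write_text("".join(f"line {i}\n" for i in range(1, 11)), encoding="utf-8")

    copy_line_range(src, dst, 3, 5)
    copied = dst.read_text(encoding="utf-8")

    truncate_from_line(src, 3)
    remaining = src.read_text(encoding="utf-8")

print(copied, remaining, sep="---\n")
```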

For the attached files:
first run ``python3 v2-0001-split_func_sgml.py``
then run ``git apply v2-0001-update-filelist.sgml-allfiles.sgml.no-cfbot``
(`git am` won't work; you need to use `git apply`).

[1]: https://www.postgresql.org/docs/current/functions.html
[2]: https://en.wikipedia.org/wiki/Document_type_definition
[3]: https://www.gnu.org/software/sed/manual/html_node/Command_002dLine-Options.html#index-_002di
[4]: https://www.gnu.org/software/sed/manual/html_node/Common-Commands.html#index-n-_0028next_002dline_0029

Attachments:

v2-0001-update-filelist.sgml-allfiles.sgml.no-cfbot (application/octet-stream, +44 -2)
v2-0001-split_func_sgml.py (text/x-python)
#5jian he
jian.universality@gmail.com
In reply to: jian he (#4)
Re: split func.sgml to separated individual sgml files

hi.

After running the v2 python script and ``git apply
v2-0001-update-filelist.sgml-allfiles.sgml.no-cfbot``,
git status -u
shows:

Changes not staged for commit:
(use "git add/rm <file>..." to update what will be committed)
(use "git restore <file>..." to discard changes in working directory)
modified: doc/src/sgml/filelist.sgml
deleted: doc/src/sgml/func.sgml

That means that, to verify the changes, we only need to verify the HTML files
related to "functions".

I used GNU diff to compare the HTML output generated from doc/src/sgml/func.sgml
on the master branch against the HTML files produced with the patch.
In the commands below, $DOC9 is the HTML directory of the patched tree
(split func.sgml) and $DOC5 is the HTML directory of the master branch.
diff printed no output, which means the patch (with the script) produces
the same output as the master branch.

diff $DOC5/functions.html $DOC9/functions.html
diff $DOC5/functions-logical.html $DOC9/functions-logical.html
diff $DOC5/functions-comparison.html $DOC9/functions-comparison.html
diff $DOC5/functions-math.html $DOC9/functions-math.html
diff $DOC5/functions-string.html $DOC9/functions-string.html
diff $DOC5/functions-binarystring.html $DOC9/functions-binarystring.html
diff $DOC5/functions-matching.html $DOC9/functions-matching.html
diff $DOC5/functions-formatting.html $DOC9/functions-formatting.html
diff $DOC5/functions-datetime.html $DOC9/functions-datetime.html
diff $DOC5/functions-enum.html $DOC9/functions-enum.html
diff $DOC5/functions-geometry.html $DOC9/functions-geometry.html
diff $DOC5/functions-net.html $DOC9/functions-net.html
diff $DOC5/functions-textsearch.html $DOC9/functions-textsearch.html
diff $DOC5/functions-uuid.html $DOC9/functions-uuid.html
diff $DOC5/functions-xml.html $DOC9/functions-xml.html
diff $DOC5/functions-json.html $DOC9/functions-json.html
diff $DOC5/functions-sequence.html $DOC9/functions-sequence.html
diff $DOC5/functions-conditional.html $DOC9/functions-conditional.html
diff $DOC5/functions-array.html $DOC9/functions-array.html
diff $DOC5/functions-range.html $DOC9/functions-range.html
diff $DOC5/functions-aggregate.html $DOC9/functions-aggregate.html
diff $DOC5/functions-window.html $DOC9/functions-window.html
diff $DOC5/functions-merge-support.html $DOC9/functions-merge-support.html
diff $DOC5/functions-subquery.html $DOC9/functions-subquery.html
diff $DOC5/functions-comparisons.html $DOC9/functions-comparisons.html
diff $DOC5/functions-srf.html $DOC9/functions-srf.html
diff $DOC5/functions-info.html $DOC9/functions-info.html
diff $DOC5/functions-admin.html $DOC9/functions-admin.html
diff $DOC5/functions-trigger.html $DOC9/functions-trigger.html
diff $DOC5/functions-event-triggers.html $DOC9/functions-event-triggers.html
diff $DOC5/functions-statistics.html $DOC9/functions-statistics.html
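
The same comparison can be done in one pass without listing every page by hand. The sketch below is an assumption-laden alternative to the diff commands above: the function name and the glob pattern are made up, and the two temporary directories merely stand in for $DOC5 and $DOC9.

```python
import filecmp
import tempfile
from pathlib import Path


def differing_pages(master_dir, patched_dir, pattern="functions*.html"):
    """Return names of matching pages that are missing on one side or differ in content."""
    master_dir, patched_dir = Path(master_dir), Path(patched_dir)
    names = {p.name for p in master_dir.glob(pattern)}
    names |= {p.name for p in patched_dir.glob(pattern)}
    bad = []
    for name in sorted(names):
        a, b = master_dir / name, patched_dir / name
        # shallow=False forces a byte-by-byte comparison instead of trusting stat().
        if not (a.is_file() and b.is_file() and filecmp.cmp(a, b, shallow=False)):
            bad.append(name)
    return bad


with tempfile.TemporaryDirectory() as doc5, tempfile.TemporaryDirectory() as doc9:
    for d in (doc5, doc9):
        (Path(d) / "functions.html").write_text("<html>same</html>\n", encoding="utf-8")
    (Path(doc5) / "functions-math.html").write_text("<html>old</html>\n", encoding="utf-8")
    (Path(doc9) / "functions-math.html").write_text("<html>new</html>\n", encoding="utf-8")
    result = differing_pages(doc5, doc9)

print(result)
```

An empty list corresponds to diff printing nothing for every page.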
#6Andrew Dunstan
andrew@dunslane.net
In reply to: jian he (#5)
Re: split func.sgml to separated individual sgml files

On 2025-07-29 Tu 2:15 AM, jian he wrote:

hi.

after run the v2 python script and ``git apply
v2-0001-update-filelist.sgml-allfiles.sgml.no-cfbot``
git status -u
shows:

Changes not staged for commit:
(use "git add/rm <file>..." to update what will be committed)
(use "git restore <file>..." to discard changes in working directory)
modified: doc/src/sgml/filelist.sgml
deleted: doc/src/sgml/func.sgml

That means to verify the changes, we only need to verify html files
related to "functions".

I use GNU diff to compare the HTML output of doc/src/sgml/func.sgml generated
from the master branch against the HTML file produced by the patch.
For example, $DOC9 is the PATCH (split func.sgml) html file directory, $DOC5 is
the master branch html file directory. and no message produced while running
diff, which means the patch (with the script) produced output is the
same as the master branch.

[snip]

OK. I'm inclined to do this after the CF finishes, to avoid collisions
with other patches. I assume it's going to make the CFbot fairly unhappy.

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com

#7Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andrew Dunstan (#6)
Re: split func.sgml to separated individual sgml files

Andrew Dunstan <andrew@dunslane.net> writes:

OK. I'm inclined to do this after the CF finishes, to avoid collisions
with other patches. I assume it's going to make the CFbot fairly unhappy.

+1 for proceeding that way. (I did not look at whether the proposed
changes are sane, but I agree that this'll inevitably break a lot of
pending patches.)

regards, tom lane

#8Andrew Dunstan
andrew@dunslane.net
In reply to: Tom Lane (#7)
Re: split func.sgml to separated individual sgml files

On 2025-07-29 Tu 11:40 AM, Tom Lane wrote:

Andrew Dunstan <andrew@dunslane.net> writes:

OK. I'm inclined to do this after the CF finishes, to avoid collisions
with other patches. I assume it's going to make the CFbot fairly unhappy.

+1 for proceeding that way. (I did not look at whether the proposed
changes are sane, but I agree that this'll inevitably break a lot of
pending patches.)

Done.

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com

#9Florents Tselai
florents.tselai@gmail.com
In reply to: Andrew Dunstan (#8)
Re: split func.sgml to separated individual sgml files

On 4 Aug 2025, at 4:09 PM, Andrew Dunstan <andrew@dunslane.net> wrote:

On 2025-07-29 Tu 11:40 AM, Tom Lane wrote:

Andrew Dunstan <andrew@dunslane.net> writes:

OK. I'm inclined to do this after the CF finishes, to avoid collisions
with other patches. I assume it's going to make the CFbot fairly unhappy.

+1 for proceeding that way. (I did not look at whether the proposed
changes are sane, but I agree that this'll inevitably break a lot of
pending patches.)

Done.

While working on https://commitfest.postgresql.org/patch/6020/
I discovered that after changing func/func-aggregate.sgml, the HTML
wasn’t marked for update.

IIUC doc/Makefile should be updated as attached, right?

Attachments:

sgml-func-Makefile.patch (application/octet-stream, +1 -1)
#10Euler Taveira
euler@eulerto.com
In reply to: Florents Tselai (#9)
Re: split func.sgml to separated individual sgml files

On Mon, Sep 1, 2025, at 7:35 AM, Florents Tselai wrote:

While working on this https://commitfest.postgresql.org/patch/6020/
I discovered that when changing for func/func-aggregate.sgml, the HTML
wasn’t marked for update.

IIUC the doc/Makefile should be updated as attached, right ?

Good catch.

However, your patch doesn't fix all issues. The check target (check-tabs and
check-nbsp) is broken; these targets should also include the func files.

--
Euler Taveira
EDB https://www.enterprisedb.com/

Attachments:

v2-0001-doc-fix-Makefile-after-func.sgml-split.patch (text/x-patch, +3 -4)
#11Florents Tselai
florents.tselai@gmail.com
In reply to: Euler Taveira (#10)
Re: split func.sgml to separated individual sgml files

On 1 Sep 2025, at 4:35 PM, Euler Taveira <euler@eulerto.com> wrote:

On Mon, Sep 1, 2025, at 7:35 AM, Florents Tselai wrote:

While working on this https://commitfest.postgresql.org/patch/6020/
I discovered that when changing for func/func-aggregate.sgml, the HTML
wasn’t marked for update.

IIUC the doc/Makefile should be updated as attached, right ?

Good catch.

However, your patch doesn't fix all issues. The check target (check-tabs and
check-nbsp) is broken; these targets should also include the func files.

Ah, you’re right. Then again, I’d have expected ALL_SGML to be used
consistently, but it isn't, and I didn't check.
v3 does that.
Note that GENERATED_SGML weren't included in these two targets, but I think
there's no harm in checking them too.

Attachments:

v3-0001-The-commit-4e23c9ef65a-forgot-to-add-dependencies.patch (application/octet-stream, +5 -5)
#12Andrew Dunstan
andrew@dunslane.net
In reply to: Florents Tselai (#11)
Re: split func.sgml to separated individual sgml files

On 2025-09-01 Mo 11:44 AM, Florents Tselai wrote:

On 1 Sep 2025, at 4:35 PM, Euler Taveira <euler@eulerto.com> wrote:

On Mon, Sep 1, 2025, at 7:35 AM, Florents Tselai wrote:

While working on this https://commitfest.postgresql.org/patch/6020/
I discovered that when changing for func/func-aggregate.sgml, the HTML
wasn’t marked for update.

IIUC the doc/Makefile should be updated as attached, right ?

Good catch.

However, your patch doesn't fix all issues. The check target
(check-tabs and
check-nbsp) is broken; these targets should also include the func files.

Ah, you’re right, but then again,  I’d expect ALL_SGML to be used
consistently, but it isn't and I didn't check.
v3 does that.
Note that GENERATED_SGML where'te included in these two targets but I
think there's no harm in checking them too.

Do we actually care about those? I don't want to add needless cycles
anywhere. I note that the meson.build doesn't appear to have a check
target at all, or anything that looks for hard tabs or nbsps. Those
checks were added to the Makefile back in October in commit 5b7da5c261d,
but that got missed even though Daniel had mentioned it in the
discussion thread. [1]

cheers

andrew

[1]: /messages/by-id/F7102912-0BDA-42A3-BDCF-8A4CBD1CC688@yesql.se

--
Andrew Dunstan
EDB: https://www.enterprisedb.com

#13Florents Tselai
florents.tselai@gmail.com
In reply to: Andrew Dunstan (#12)
Re: split func.sgml to separated individual sgml files

On Tue, Sep 2, 2025 at 5:54 PM Andrew Dunstan <andrew@dunslane.net> wrote:

On 2025-09-01 Mo 11:44 AM, Florents Tselai wrote:

On 1 Sep 2025, at 4:35 PM, Euler Taveira <euler@eulerto.com> wrote:

On Mon, Sep 1, 2025, at 7:35 AM, Florents Tselai wrote:

While working on this https://commitfest.postgresql.org/patch/6020/
I discovered that when changing for func/func-aggregate.sgml, the HTML
wasn’t marked for update.

IIUC the doc/Makefile should be updated as attached, right ?

Good catch.

However, your patch doesn't fix all issues. The check target (check-tabs
and
check-nbsp) is broken; these targets should also include the func files.

Ah, you’re right, but then again, I’d expect ALL_SGML to be used
consistently, but it isn't and I didn't check.
v3 does that.
Note that GENERATED_SGML where'te included in these two targets but I
think there's no harm in checking them too.

Do we actually care about those? I don't want to add needless cycles
anywhere. I note that the meson.build doesn't appear to have a check target
at all, or anything that looks for hard tabs or nbsps.Those checks were
added to the Makefile back in October in commit 5b7da5c261d, but that got
missed even though Daniel had mentioned it in the discussion thread.[1]

From the commit message and discussion of 5b7da5c261d it looks like we do,
and I've seen some messages here and there showing that people do indeed
have trouble applying patches due to spurious whitespace
and special characters.
So I assume the better solution would be to have such checks in meson too.

#14Nazir Bilal Yavuz
byavuz81@gmail.com
In reply to: Andrew Dunstan (#12)
Re: split func.sgml to separated individual sgml files

Hi,

On Tue, 2 Sept 2025 at 17:54, Andrew Dunstan <andrew@dunslane.net> wrote:

Ah, you’re right, but then again, I’d expect ALL_SGML to be used consistently, but it isn't and I didn't check.
v3 does that.
Note that GENERATED_SGML where'te included in these two targets but I think there's no harm in checking them too.

Do we actually care about those? I don't want to add needless cycles anywhere. I note that the meson.build doesn't appear to have a check target at all, or anything that looks for hard tabs or nbsps.Those checks were added to the Makefile back in October in commit 5b7da5c261d, but that got missed even though Daniel had mentioned it in the discussion thread.[1]

I have been working on running these checks under the Meson build
system. To do this, I converted the checks into a Perl script
(sgml_syntax_check) and ran it from both the Makefile and Meson.
The test is named 'sgml_syntax_check' in Meson. One difference I
noticed: I could not find a way in Meson to create a test that does
not run by default. As a result, this syntax test runs every time you
run 'meson test'. This behaviour differs from Autoconf, but I
think it is acceptable.

Additionally, some of the CI OSes were missing docbook-xml; but it has
now been installed.

I did not create a new thread for that, but I can create one if you think
that would be better.

CI run with the attached patch applied:
https://cirrus-ci.com/build/6610354173640704

--
Regards,
Nazir Bilal Yavuz
Microsoft

Attachments:

Add-sgml_syntax_check-test-to-the-Meson-build.txt (text/plain, +146 -15)
#15Andrew Dunstan
andrew@dunslane.net
In reply to: Nazir Bilal Yavuz (#14)
Re: split func.sgml to separated individual sgml files

On 2025-09-12 Fr 10:12 AM, Nazir Bilal Yavuz wrote:

Hi,

On Tue, 2 Sept 2025 at 17:54, Andrew Dunstan <andrew@dunslane.net> wrote:

Ah, you’re right, but then again, I’d expect ALL_SGML to be used consistently, but it isn't and I didn't check.
v3 does that.
Note that GENERATED_SGML where'te included in these two targets but I think there's no harm in checking them too.

Do we actually care about those? I don't want to add needless cycles anywhere. I note that the meson.build doesn't appear to have a check target at all, or anything that looks for hard tabs or nbsps.Those checks were added to the Makefile back in October in commit 5b7da5c261d, but that got missed even though Daniel had mentioned it in the discussion thread.[1]

I have been working on running these checks under the Meson build
system.

Thanks for this!

To do this, I converted the checks into a Perl script
(sgml_syntax_check) and ran it against both the Makefile and Meson.
Test's name is 'sgml_syntax_check' in the Meson. One difference I
noticed: I could not find a way in Meson to create a test that does
not run by default. As a result, this syntax test runs every time you
run the 'meson test'. This behaviour differs from Autoconf, but I
think it is acceptable.

Yes, I think so too.

Additionally, some of the CI OSes were missing docbook-xml; but it has
now been installed.

I did not create a new thread for that, I can create one if you think
that it would be better.

CI run with the attached patch applied:
https://cirrus-ci.com/build/6610354173640704

I am away this coming week, will check it out in detail when I return.

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com

#16Andrew Dunstan
andrew@dunslane.net
In reply to: Nazir Bilal Yavuz (#14)
Re: split func.sgml to separated individual sgml files

On 2025-09-12 Fr 10:12 AM, Nazir Bilal Yavuz wrote:

Hi,

On Tue, 2 Sept 2025 at 17:54, Andrew Dunstan <andrew@dunslane.net> wrote:

Ah, you’re right, but then again, I’d expect ALL_SGML to be used consistently, but it isn't and I didn't check.
v3 does that.
Note that GENERATED_SGML where'te included in these two targets but I think there's no harm in checking them too.

Do we actually care about those? I don't want to add needless cycles anywhere. I note that the meson.build doesn't appear to have a check target at all, or anything that looks for hard tabs or nbsps.Those checks were added to the Makefile back in October in commit 5b7da5c261d, but that got missed even though Daniel had mentioned it in the discussion thread.[1]

I have been working on running these checks under the Meson build
system. To do this, I converted the checks into a Perl script
(sgml_syntax_check) and ran it against both the Makefile and Meson.
Test's name is 'sgml_syntax_check' in the Meson. One difference I
noticed: I could not find a way in Meson to create a test that does
not run by default. As a result, this syntax test runs every time you
run the 'meson test'. This behaviour differs from Autoconf, but I
think it is acceptable.

Additionally, some of the CI OSes were missing docbook-xml; but it has
now been installed.

I did not create a new thread for that, I can create one if you think
that it would be better.

CI run with the attached patch applied:
https://cirrus-ci.com/build/6610354173640704

Hi Bilal,

This got preempted slightly by Tom's commit 170a8a3f460, but I think
it's worth doing. I tried to simplify it some. See attached. There
doesn't seem to me to be any point in using a different set of files for
the tab tests and the NBSP tests. If we use the same set of files we can
improve the efficiency easily by opening them only once. Here we just
look for all the sgml files and all the xsl files and process them all.

WDYT?
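
To make the "open each file only once" idea concrete, here is a rough Python rendering of a single-pass check. The actual patch is a Perl script whose contents are not shown in this thread, so the function names, message wording, and file-walking helper below are all assumptions.

```python
from pathlib import Path

NBSP = "\u00a0"  # non-breaking space


def check_lines(name, lines):
    """Scan each line once and report tabs and non-breaking spaces with line numbers."""
    problems = []
    for line_no, line in enumerate(lines, start=1):
        if "\t" in line:
            problems.append(f"{name}:{line_no}: tab character")
        if NBSP in line:
            problems.append(f"{name}:{line_no}: non-breaking space")
    return problems


def check_tree(root):
    """Apply both checks to every .sgml and .xsl file under root, one open per file."""
    problems = []
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix in (".sgml", ".xsl"):
            with open(path, encoding="utf-8") as f:
                problems.extend(check_lines(str(path), f))
    return problems


# Made-up input lines for demonstration.
problems = check_lines("demo.sgml", ["ok\n", "\tindented\n", "a" + NBSP + "b\n"])
print("\n".join(problems))
```

Reporting the line number rather than the line text avoids printing an empty-looking line when the offending line holds only whitespace.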

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com

Attachments:

0001-Improve-docs-syntax-checking.patch (text/x-patch, +103 -15)
#17Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andrew Dunstan (#16)
Re: split func.sgml to separated individual sgml files

Andrew Dunstan <andrew@dunslane.net> writes:

On 2025-09-12 Fr 10:12 AM, Nazir Bilal Yavuz wrote:

Test's name is 'sgml_syntax_check' in the Meson. One difference I
noticed: I could not find a way in Meson to create a test that does
not run by default. As a result, this syntax test runs every time you
run the 'meson test'. This behaviour differs from Autoconf, but I
think it is acceptable.

Actually, I've been meaning to complain about the fact that these
checks aren't run by the default Makefile target. I never remember
that there is a separate "check" target, and even if I did remember
it's mostly useless to me because I always want to look at the
rendered HTML. So when I'm working on the docs I always just say
"make" in the doc/src/sgml directory. It'd be helpful, at least to
me, if the default target ran the tabs and nbsp checks. It already
does run xmllint, so that change could probably be integrated with
what you've done here without too much trouble.

This got preempted slightly by Tom's commit 170a8a3f460, but I think
it's worth doing. I tried to simplify it some. See attached. There
doesn't seem to me to be any point in using a different set of files for
the tab tests and the NBSP tests. If we use the same set of files we can
improve the efficiency easily by opening them only once. Here we just
look for all the sgml files and all the xsl files and process them all.

+1 for merging those two checks into one pass, especially if we're
to run them by default.

regards, tom lane

#18Nazir Bilal Yavuz
byavuz81@gmail.com
In reply to: Andrew Dunstan (#16)
Re: split func.sgml to separated individual sgml files

Hi,

On Tue, 30 Sept 2025 at 22:48, Andrew Dunstan <andrew@dunslane.net> wrote:

Hi Bilal,

This got preempted slightly by Tom's commit 170a8a3f460, but I think
it's worth doing. I tried to simplify it some. See attached. There
doesn't seem to me to be any point in using a different set of files for
the tab tests and the NBSP tests. If we use the same set of files we can
improve the efficiency easily by opening them only once. Here we just
look for all the sgml files and all the xsl files and process them all.

WDYT?

It looks good to me. I made 2 changes to your patch:

1- The declaration of $line_no was lost; I re-added it.
2- s/.cirrus.tasks,yml/.cirrus.tasks.yml/ in the commit message.

--
Regards,
Nazir Bilal Yavuz
Microsoft

Attachments:

v2-0001-Improve-docs-syntax-checking.patch (text/x-patch, +105 -15)
#19Nazir Bilal Yavuz
byavuz81@gmail.com
In reply to: Nazir Bilal Yavuz (#18)
Re: split func.sgml to separated individual sgml files

Hi,

On Wed, 1 Oct 2025 at 15:09, Nazir Bilal Yavuz <byavuz81@gmail.com> wrote:

On Tue, 30 Sept 2025 at 22:48, Andrew Dunstan <andrew@dunslane.net> wrote:

Hi Bilal,

This got preempted slightly by Tom's commit 170a8a3f460, but I think
it's worth doing. I tried to simplify it some. See attached. There
doesn't seem to me to be any point in using a different set of files for
the tab tests and the NBSP tests. If we use the same set of files we can
improve the efficiency easily by opening them only once. Here we just
look for all the sgml files and all the xsl files and process them all.

WDYT?

It looks good to me. I made 2 changes to your patch:

1- Declaration of $line_no is lost, I re-added it.
2- s/.cirrus.tasks,yml/.cirrus.tasks.yml/ in the commit message.

Two more minor changes that I missed in the v2:

1- I added $line_no to the tab check's warning message and removed $_
from it. I think it is better this way; otherwise, if the line contains
only a tab character, $_ will print an empty-looking line.
2- s/Tabsand/Tabs and/

--
Regards,
Nazir Bilal Yavuz
Microsoft

Attachments:

v3-0001-Improve-docs-syntax-checking.patch (text/x-patch, +105 -15)
#20Andrew Dunstan
andrew@dunslane.net
In reply to: Nazir Bilal Yavuz (#19)
Re: split func.sgml to separated individual sgml files

On 2025-10-01 We 8:27 AM, Nazir Bilal Yavuz wrote:

Hi,

On Wed, 1 Oct 2025 at 15:09, Nazir Bilal Yavuz <byavuz81@gmail.com> wrote:

On Tue, 30 Sept 2025 at 22:48, Andrew Dunstan <andrew@dunslane.net> wrote:

Hi Bilal,

This got preempted slightly by Tom's commit 170a8a3f460, but I think
it's worth doing. I tried to simplify it some. See attached. There
doesn't seem to me to be any point in using a different set of files for
the tab tests and the NBSP tests. If we use the same set of files we can
improve the efficiency easily by opening them only once. Here we just
look for all the sgml files and all the xsl files and process them all.

WDYT?

It looks good to me. I made 2 changes to your patch:

1- Declaration of $line_no is lost, I re-added it.
2- s/.cirrus.tasks,yml/.cirrus.tasks.yml/ in the commit message.

Two more minor changes that I missed in the v2:

1- I added $line_no and removed $_ from the tab check's warning
message. I think it is better this way, otherwise if the line only
contains tab character; $_ will print an empty looking line.
2- s/Tabsand/Tabs and/

OK, thanks, looks good. How do we go about doing what Tom wants (i.e.
running the tests by default) under meson? I think in the Makefile we
could just add it to the html target.

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com

#21Nazir Bilal Yavuz
byavuz81@gmail.com
In reply to: Andrew Dunstan (#20)
#22Andrew Dunstan
andrew@dunslane.net
In reply to: Nazir Bilal Yavuz (#21)
#23Nazir Bilal Yavuz
byavuz81@gmail.com
In reply to: Andrew Dunstan (#22)
#24Andrew Dunstan
andrew@dunslane.net
In reply to: Nazir Bilal Yavuz (#23)
#25Peter Eisentraut
peter_e@gmx.net
In reply to: Andrew Dunstan (#20)
#26Tom Lane
tgl@sss.pgh.pa.us
In reply to: Peter Eisentraut (#25)
#27Nazir Bilal Yavuz
byavuz81@gmail.com
In reply to: Andrew Dunstan (#24)
#28Nazir Bilal Yavuz
byavuz81@gmail.com
In reply to: Tom Lane (#26)
#29Peter Eisentraut
peter_e@gmx.net
In reply to: Nazir Bilal Yavuz (#28)
#30Tom Lane
tgl@sss.pgh.pa.us
In reply to: Peter Eisentraut (#29)
#31Andrew Dunstan
andrew@dunslane.net
In reply to: Tom Lane (#30)
#32Nazir Bilal Yavuz
byavuz81@gmail.com
In reply to: Andrew Dunstan (#31)
#33Peter Eisentraut
peter_e@gmx.net
In reply to: Nazir Bilal Yavuz (#32)
#34Nazir Bilal Yavuz
byavuz81@gmail.com
In reply to: Peter Eisentraut (#33)
#35Andrew Dunstan
andrew@dunslane.net
In reply to: Nazir Bilal Yavuz (#34)
#36Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#30)
#37Bruce Momjian
bruce@momjian.us
In reply to: Bruce Momjian (#36)
#38Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#37)
#39Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Tom Lane (#30)
#40Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#38)
#41Andrew Dunstan
andrew@dunslane.net
In reply to: Alvaro Herrera (#39)
#42Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andrew Dunstan (#41)
#43Andres Freund
andres@anarazel.de
In reply to: Andrew Dunstan (#41)
#44Nazir Bilal Yavuz
byavuz81@gmail.com
In reply to: Nazir Bilal Yavuz (#34)