Docs Build in CI failing with "failed to load external entity"

Started by Melanie Plageman, about 1 year ago, 7 messages
#1 Melanie Plageman
melanieplageman@gmail.com

Hi,

I know in the past docs builds failing with "failed to load external
entity" have happened on macos. But, recently I've noticed this
failure for docs build on CI (not on macos) -- docs build is one of
the jobs run under the "Compiler Warnings" task.

See an example of this on CI for the github mirror [1].

Anyone know what the story is here? I couldn't find an existing thread
on this specific issue.

- Melanie

[1]: https://github.com/postgres/postgres/runs/32028560196

#2 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Melanie Plageman (#1)
Re: Docs Build in CI failing with "failed to load external entity"

Melanie Plageman <melanieplageman@gmail.com> writes:

> I know in the past docs builds failing with "failed to load external
> entity" have happened on macos. But, recently I've noticed this
> failure for docs build on CI (not on macos) -- docs build is one of
> the jobs run under the "Compiler Warnings" task.

It looks to me like a broken docbook installation on (one of?)
the CI machines. Note that the *first* complaint is

[19:23:20.590] file:///etc/xml/catalog:1: parser error : Document is empty

I suspect that the subsequent "failed to load external entity"
complaints happen because the XML processor doesn't find any DTD
objects in the local catalog, so it tries to go out to the net for
them, and is foiled by the --no-net switch we use.

regards, tom lane
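
The failure chain described above can be modelled in a few lines. This is a toy sketch, not how libxml2 actually implements catalog resolution: the function name `resolve_entity`, the dictionary standing in for the catalog, and the local DTD path are all assumptions made for illustration. It shows why an empty catalog turns every DTD lookup into a network fetch, which then fails once network access is disabled.

```python
def resolve_entity(system_id, catalog, allow_network=False):
    """Toy model of external-entity resolution (hypothetical helper).

    catalog: dict mapping a system identifier (URL) to a local file path,
             standing in for the entries of /etc/xml/catalog.
    allow_network: False models the build running with network access off.
    """
    # 1. A populated catalog maps the well-known URL to a local file.
    local_path = catalog.get(system_id)
    if local_path is not None:
        return local_path

    # 2. With no catalog entry, the only remaining source is the network.
    if allow_network and system_id.startswith(("http://", "https://")):
        return system_id

    # 3. No local copy and no network: the lookup fails.
    raise LookupError(f'failed to load external entity "{system_id}"')


# Illustrative values; the local path is an assumption, not taken from the thread.
DOCBOOK_DTD = "http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd"
HEALTHY_CATALOG = {DOCBOOK_DTD: "/usr/share/xml/docbook/schema/dtd/4.5/docbookx.dtd"}
EMPTY_CATALOG = {}  # what a zero-byte /etc/xml/catalog amounts to
```

With `HEALTHY_CATALOG` the lookup returns the local path without touching the network; with `EMPTY_CATALOG` and `allow_network=False` it raises, which matches the pattern in the CI log where the catalog parse error comes first and the entity failures follow.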

#3 Thomas Munro
thomas.munro@gmail.com
In reply to: Tom Lane (#2)
Re: Docs Build in CI failing with "failed to load external entity"

On Fri, Oct 25, 2024 at 4:44 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

> Melanie Plageman <melanieplageman@gmail.com> writes:
>> I know in the past docs builds failing with "failed to load external
>> entity" have happened on macos. But, recently I've noticed this
>> failure for docs build on CI (not on macos) -- docs build is one of
>> the jobs run under the "Compiler Warnings" task.
>
> It looks to me like a broken docbook installation on (one of?)
> the CI machines. Note that the *first* complaint is
>
> [19:23:20.590] file:///etc/xml/catalog:1: parser error : Document is empty

Yeah. That CI job runs on a canned Debian image that is rebuilt and
republished every couple of days to make sure it's using up to date
packages and kernel etc. Perhaps the package installation silently
corrupted /etc/xml/catalog, given that multiple packages probably mess
with it, though I don't have a specific theory for how that could
happen, given that package installation seems to be serial... The
installation log doesn't seem to show anything suspicious.

https://github.com/anarazel/pg-vm-images/
https://cirrus-ci.com/github/anarazel/pg-vm-images
https://cirrus-ci.com/build/5427240429682688
https://api.cirrus-ci.com/v1/task/6621385303261184/logs/build_image.log

I tried simply reinstalling docbook-xml in my own github account
(which is showing the problem), and it cleared the error:

   setup_additional_packages_script: |
-    #apt-get update
-    #DEBIAN_FRONTEND=noninteractive apt-get -y install ...
+    apt-get update
+    DEBIAN_FRONTEND=noninteractive apt-get -y install --reinstall docbook-xml

https://cirrus-ci.com/task/6458406242877440

I wonder if this will magically fix itself when the next CI image
build cron job kicks off. I have no idea what time zone this page is
showing but it should happen in another day or so, unless Andres is
around to kick it sooner:

https://cirrus-ci.com/github/anarazel/pg-vm-images

#4 Andres Freund
andres@anarazel.de
In reply to: Thomas Munro (#3)
Re: Docs Build in CI failing with "failed to load external entity"

Hi,

On 2024-10-25 08:22:42 +0300, Thomas Munro wrote:

> On Fri, Oct 25, 2024 at 4:44 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Melanie Plageman <melanieplageman@gmail.com> writes:
>>> I know in the past docs builds failing with "failed to load external
>>> entity" have happened on macos. But, recently I've noticed this
>>> failure for docs build on CI (not on macos) -- docs build is one of
>>> the jobs run under the "Compiler Warnings" task.
>>
>> It looks to me like a broken docbook installation on (one of?)
>> the CI machines. Note that the *first* complaint is
>>
>> [19:23:20.590] file:///etc/xml/catalog:1: parser error : Document is empty
>
> Yeah. That CI job runs on a canned Debian image that is rebuilt and
> republished every couple of days to make sure it's using up to date
> packages and kernel etc. Perhaps the package installation silently
> corrupted /etc/xml/catalog, given that multiple packages probably mess
> with it, though I don't have a specific theory for how that could
> happen, given that package installation seems to be serial... The
> installation log doesn't seem to show anything suspicious.

Yea, it's clearly corrupted - the file is empty. I don't understand how that
can happen, particularly without any visible error. I certainly can't
reproduce it when installing the packages exactly the same way it happens for
the image.

I also don't think this happened before, despite the recipe for building the
images not having meaningfully changed in quite a while. So it must be some
rare edge case.

> I wonder if this will magically fix itself when the next CI image
> build cron job kicks off. I have no idea what time zone this page is
> showing but it should happen in another day or so, unless Andres is
> around to kick it sooner:
>
> https://cirrus-ci.com/github/anarazel/pg-vm-images

I did trigger a rebuild of the image just now. Hopefully that'll fix it.

Greetings,

Andres Freund
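
Since the empty catalog produced no visible error during the image build, a sanity check at the end of the build could make this kind of corruption fail loudly. The following is a minimal sketch, assuming Python is available in the image; the function name `catalog_problem` and the particular checks are hypothetical and are not part of pg-vm-images.

```python
import os
import xml.etree.ElementTree as ET


def catalog_problem(path="/etc/xml/catalog"):
    """Return a short description of what is wrong with the XML catalog
    at `path`, or None if it looks usable. Hypothetical helper."""
    if not os.path.isfile(path):
        return "missing"
    # The failure in this thread: the file existed but was zero bytes long.
    if os.path.getsize(path) == 0:
        return "empty"
    try:
        root = ET.parse(path).getroot()
    except ET.ParseError as exc:
        return f"not well-formed XML: {exc}"
    # Drop the "{namespace}" prefix ElementTree puts on namespaced tags.
    local_name = root.tag.rsplit("}", 1)[-1]
    if local_name != "catalog":
        return f"unexpected root element: {local_name}"
    # A catalog with no child entries cannot resolve any DTD locally.
    if len(root) == 0:
        return "no entries"
    return None
```

Called with no argument, `catalog_problem()` inspects /etc/xml/catalog. A build script could exit with a non-zero status whenever the result is not None, so that a broken image is never published.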

#5 Andres Freund
andres@anarazel.de
In reply to: Andres Freund (#4)
Re: Docs Build in CI failing with "failed to load external entity"

Hi,

On 2024-10-25 04:14:03 -0400, Andres Freund wrote:

> On 2024-10-25 08:22:42 +0300, Thomas Munro wrote:
>> I wonder if this will magically fix itself when the next CI image
>> build cron job kicks off. I have no idea what time zone this page is
>> showing but it should happen in another day or so, unless Andres is
>> around to kick it sooner:
>>
>> https://cirrus-ci.com/github/anarazel/pg-vm-images
>
> I did trigger a rebuild of the image just now. Hopefully that'll fix it.

It did.

Greetings,

Andres Freund

#6 Melanie Plageman
melanieplageman@gmail.com
In reply to: Andres Freund (#5)
Re: Docs Build in CI failing with "failed to load external entity"

On Fri, Oct 25, 2024 at 4:31 AM Andres Freund <andres@anarazel.de> wrote:

> On 2024-10-25 04:14:03 -0400, Andres Freund wrote:
>> On 2024-10-25 08:22:42 +0300, Thomas Munro wrote:
>>> I wonder if this will magically fix itself when the next CI image
>>> build cron job kicks off. I have no idea what time zone this page is
>>> showing but it should happen in another day or so, unless Andres is
>>> around to kick it sooner:
>>>
>>> https://cirrus-ci.com/github/anarazel/pg-vm-images
>>
>> I did trigger a rebuild of the image just now. Hopefully that'll fix it.
>
> It did.

I noticed that CI for my fork of Postgres, which had been failing on
docs build and on test-running on the injection points test only on
freebsd, started working as expected again this morning. All of this
is a bit of magic to me -- are the CI images you build used by all of
our CIs?

- Melanie

#7 Andres Freund
andres@anarazel.de
In reply to: Melanie Plageman (#6)
Re: Docs Build in CI failing with "failed to load external entity"

On 2024-10-25 09:34:41 -0400, Melanie Plageman wrote:

> I noticed that CI for my fork of Postgres, which had been failing on
> docs build and on test-running on the injection points test only on
> freebsd, started working as expected again this morning. All of this
> is a bit of magic to me -- are the CI images you build used by all of
> our CIs?

Yes. Installing the packages every time would be far, far too time-consuming.