what's going on with lapwing?
lapwing has been failing sepgsql-check for over a month, but there's
no log file:
I feel like something must be broken with logfile collection in the
buildfarm client, because I keep seeing failures like this where
there's no log to view. It's hard to know whether this is a
configuration issue with this BF member or whether something actually
needs fixing. In any case, we should try to figure out what's going on
here.
Apologies if this has been discussed elsewhere and I have missed it.
Pointers appreciated.
--
Robert Haas
EDB: http://www.enterprisedb.com
Robert Haas <robertmhaas@gmail.com> writes:
lapwing has been failing sepgsql-check for over a month, but there's
no log file:
I believe it hasn't been updated with the buildfarm client changes
needed to run sepgsql-check since aeb8ea361.
regards, tom lane
On Mon, Mar 3, 2025 at 2:53 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Robert Haas <robertmhaas@gmail.com> writes:
lapwing has been failing sepgsql-check for over a month, but there's
no log file:
I believe it hasn't been updated with the buildfarm client changes
needed to run sepgsql-check since aeb8ea361.
OK, thanks. The owner is pgbuildfarm@rjuju.net, which I'm guessing
might be Julien Rouhaud, but not sure? It would be good to get that
updated, especially with the end of the release cycle hard upon us.
--
Robert Haas
EDB: http://www.enterprisedb.com
On Mon, Mar 03, 2025 at 03:52:30PM -0500, Robert Haas wrote:
On Mon, Mar 3, 2025 at 2:53 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Robert Haas <robertmhaas@gmail.com> writes:
lapwing has been failing sepgsql-check for over a month, but there's
no log file:
I believe it hasn't been updated with the buildfarm client changes
needed to run sepgsql-check since aeb8ea361.
Well, AFAIK the usual habit when something is broken and a buildfarm client
upgrade is needed is to warn the buildfarm owners. There was an email
yesterday for installing libcurl which I did. There was an email before last
release for possibly stuck tests which I checked. There was no such email to
ask to update the client, so I'm not sure why you expected me to do so?
Apart from that, commit aeb8ea361 is from January 24, and the latest buildfarm
client release (18) is from November, did I miss something?
OK, thanks. The owner is pgbuildfarm@rjuju.net, which I'm guessing
might be Julien Rouhaud, but not sure?
Yes it's me. I'm not sure why it's such a mystery as I've been answering for
it for many years. Having an alias is useful to me as I can redirect it to
some private mailbox with less traffic and therefore react promptly in case of
any problem.
Julien Rouhaud <rjuju123@gmail.com> writes:
On Mon, Mar 3, 2025 at 2:53 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
I believe it hasn't been updated with the buildfarm client changes
needed to run sepgsql-check since aeb8ea361.
Well, AFAIK the usual habit when something is broken and a buildfarm client
upgrade is needed is to warn the buildfarm owners. There was an email
yesterday for installing libcurl which I did. There was an email before last
release for possibly stuck tests which I checked. There was no such email to
ask to update the client, so I'm not sure why you expected me to do so?
Yeah, I think a new buildfarm release is overdue. We have this issue
affecting sepgsql-check, and we have the TestUpgradeXversion changes
that are necessary for that still to work, and it's not great to
expect owners to run hand-patched scripts.
regards, tom lane
On Mon, Mar 3, 2025 at 6:09 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
Well, AFAIK the usual habit when something is broken and a buildfarm client
upgrade is needed is to warn the buildfarm owners. There was an email
yesterday for installing libcurl which I did. There was an email before last
release for possibly stuck tests which I checked. There was no such email to
ask to update the client, so I'm not sure why you expected me to do so?
Apart from that, commit aeb8ea361 is from January 24, and the latest buildfarm
client release (18) is from November, did I miss something?
Honestly, I just noticed that the buildfarm member in question had
been red for over a month and I figured it was a setup issue of some
kind. I guess I was wrong. It didn't cross my mind that a commit over
a month ago had broken things and there had been no subsequent revert
or buildfarm client release. I'm frankly quite astonished because Tom
recently told me that I needed to fix something RIGHT NOW or else
revert when the commit had been in the tree for two hours. Given that
context, maybe you can understand why I thought it was unlikely that
we were just chilling about this being broken after 38 days.
Yes it's me. I'm not sure why it's such a mystery as I've been answering for
it for many years. Having an alias is useful to me as I can redirect it to
some private mailbox with less traffic and therefore react promptly in case of
any problem.
Buildfarm members are labelled with an email address, not a name. I
don't magically know the names of all the people who correspond to
those email addresses. I've learned some of them over time, but I
didn't recognize rjuju.net.
--
Robert Haas
EDB: http://www.enterprisedb.com
On 2025-03-03 Mo 6:12 PM, Tom Lane wrote:
Julien Rouhaud <rjuju123@gmail.com> writes:
On Mon, Mar 3, 2025 at 2:53 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
I believe it hasn't been updated with the buildfarm client changes
needed to run sepgsql-check since aeb8ea361.
Well, AFAIK the usual habit when something is broken and a buildfarm client
upgrade is needed is to warn the buildfarm owners. There was an email
yesterday for installing libcurl which I did. There was an email before last
release for possibly stuck tests which I checked. There was no such email to
ask to update the client, so I'm not sure why you expected me to do so?
Yeah, I think a new buildfarm release is overdue. We have this issue
affecting sepgsql-check, and we have the TestUpgradeXversion changes
that are necessary for that still to work, and it's not great to
expect owners to run hand-patched scripts.
Yeah, I try to avoid making too many releases, but I agree it's time to
push one.
cheers
andrew
--
Andrew Dunstan
EDB: https://www.enterprisedb.com
On Tue, Mar 04, 2025 at 08:33:46AM -0500, Robert Haas wrote:
On Mon, Mar 3, 2025 at 6:09 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
Well, AFAIK the usual habit when something is broken and a buildfarm client
upgrade is needed is to warn the buildfarm owners. There was an email
yesterday for installing libcurl which I did. There was an email before last
release for possibly stuck tests which I checked. There was no such email to
ask to update the client, so I'm not sure why you expected me to do so?
Apart from that, commit aeb8ea361 is from January 24, and the latest buildfarm
client release (18) is from November, did I miss something?
Honestly, I just noticed that the buildfarm member in question had
been red for over a month and I figured it was a setup issue of some
kind. I guess I was wrong. It didn't cross my mind that a commit over
a month ago had broken things and there had been no subsequent revert
or buildfarm client release. I'm frankly quite astonished because Tom
recently told me that I needed to fix something RIGHT NOW or else
revert when the commit had been in the tree for two hours. Given that
context, maybe you can understand why I thought it was unlikely that
we were just chilling about this being broken after 38 days.
Yes, I do remember the 2 threads and I totally understand. On my side I know
that I don't have as much time as I'd like to contribute but I at least make
sure that I don't leave my animal broken because of something on my side. So I
did try to check when I received the first failure notifications, also saw that
there were no logs, found a few other animals with the same symptoms and
concluded that it would eventually be resolved either with a fix or some
instructions on the buildfarm client side or something. I did check again a
few days ago when I saw that nothing was pushed anymore on that branch only,
saw that it was the same on other animals, got confused and gave up.
Yes it's me. I'm not sure why it's such a mystery as I've been answering for
it for many years. Having an alias is useful to me as I can redirect it to
some private mailbox with less traffic and therefore react promptly in case of
any problem.
Buildfarm members are labelled with an email address, not a name. I
don't magically know the names of all the people who correspond to
those email addresses. I've learned some of them over time, but I
didn't recognize rjuju.net.
Ah sorry, I thought that the "rjuju" was a poor enough alias (which I more or
less use everywhere) that it would be recognisable. In any case I do receive
the emails so I can reply, just not from that address.
On Tue, Mar 4, 2025 at 9:26 AM Julien Rouhaud <rjuju123@gmail.com> wrote:
Yes, I do remember the 2 threads and I totally understand. On my side I know
instructions on the buildfarm client side or something. I did check again a
few days ago when I saw that nothing was pushed anymore on that branch only,
saw that it was the same on other animals, got confused and gave up.
Sure, makes sense.
Buildfarm members are labelled with an email address, not a name. I
don't magically know the names of all the people who correspond to
those email addresses. I've learned some of them over time, but I
didn't recognize rjuju.net.
Ah sorry, I thought that the "rjuju" was a poor enough alias (which I more or
less use everywhere) that it would be recognisable. In any case I do receive
the emails so I can reply, just not from that address.
I think honestly at some point in the past I did have that alias <=>
real name mapping in my brain, but somewhere along the way it fell
out. On the list, I tend to pay more attention to people's advertised
real names than to their nicknames/handles, because that's what goes
in release notes, what people's name tag will say at a conference,
etc.
--
Robert Haas
EDB: http://www.enterprisedb.com
Robert Haas <robertmhaas@gmail.com> writes:
Honestly, I just noticed that the buildfarm member in question had
been red for over a month and I figured it was a setup issue of some
kind. I guess I was wrong. It didn't cross my mind that a commit over
a month ago had broken things and there had been no subsequent revert
or buildfarm client release. I'm frankly quite astonished because Tom
recently told me that I needed to fix something RIGHT NOW or else
revert when the commit had been in the tree for two hours. Given that
context, maybe you can understand why I thought it was unlikely that
we were just chilling about this being broken after 38 days.
As far as that goes, there are degrees of buildfarm brokenness.
Right now we have three remaining animals that are affected by the
sepgsql issue (two of which seem to have been stopped a while ago by
their owners), and the error is perfectly predictable/repeatable.
So it's easy to ignore.
At the point where I complained to you about that other problem,
it was looking like it might cause a quarter or a third of the
buildfarm to fail intermittently. Maybe I overestimated the
frequency of the failure, but if that was accurate it would have
resulted in a lot of fruitless double-checking of failures to
see if there was anything real underneath the noise. So I find
that sort of case much more painful.
But yeah, I thought we were overdue for a buildfarm release.
I'm pleased to see that Andrew just pushed one.
regards, tom lane
On Tue, Mar 4, 2025 at 10:18 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
At the point where I complained to you about that other problem,
it was looking like it might cause a quarter or a third of the
buildfarm to fail intermittently. Maybe I overestimated the
frequency of the failure, but if that was accurate it would have
resulted in a lot of fruitless double-checking of failures to
see if there was anything real underneath the noise. So I find
that sort of case much more painful.
I think that's actually totally fair. I was not upset that you wanted
it fixed, or even that you wanted it fixed relatively quickly. The
things that I was upset about were:
1. There's no real way for me to avoid this kind of pain. That's not
your fault, but it is something that I think we need to address as a
community. As Jelte said on the other thread, other projects have
infrastructure that allows them to avoid these kinds of problems by
being able to do pre-commit testing. Having modern infrastructure for
stuff like this is an important part of attracting and retaining
developers.
2. Two hours is just not enough time. Never mind that people like to
have evenings and weekends off -- this is supposed to be a community
that operates by consensus. It doesn't seem right to spend months or
years discussing the design before committing, and then after commit,
boom, you have to make a unilateral decision about what to change
within -- not even hours, but minutes. Because you also need time for
BF results to show up, and then you need time to code and test
whatever you decided. I would actually be quite sympathetic to the
time frame here if we were immediately before a feature or release
freeze when there is no tolerance for error, but not in this
situation.
3. You ignored the substantive questions that I asked you, commenting
only on the procedural issue of fixing the BF, even though you were a
previous participant in the discussion on that patch.
Maybe my tolerance for reverts is just lower than yours. I think it's
bad when somebody has a problem like this, insta-reverts, then tries
it again later after changing the patch, then maybe the same thing
happens again, gets insta-reverted a second time, then maybe the third
time the commit actually sticks. I think that clutters up the commit
history with a bunch of junk, and that junk is permanent. The
buildfarm being red is bad, but after it's fixed it will be green
again and the time for which it was red will have little enduring
impact. But everybody who tries to find stuff in the commit log is
potentially inconvenienced by reverts, forever. Commands like 'git log
FILENAME' or 'git log -Gstring' will now return extra, spurious hits.
If I absolutely have to choose between the BF being red for a couple
of days and revert ping-pong, I would prefer the former.
--
Robert Haas
EDB: http://www.enterprisedb.com
On Tue, Mar 04, 2025 at 10:18:49AM -0500, Tom Lane wrote:
But yeah, I thought we were overdue for a buildfarm release.
I'm pleased to see that Andrew just pushed one.
FWIW I installed the client version 19.1 this morning and forced a run on HEAD
and lapwing is back to green.
On Wed, Mar 5, 2025 at 9:49 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
FWIW I installed the client version 19.1 this morning and forced a run on HEAD
and lapwing is back to green.
Thanks, appreciate it.
By the way, is there a particular reason why we're keeping Debian 7
coverage in the buildfarm? I don't want to be in a huge rush to kill
platforms people still care about, but it was pointed out to me
off-list that this is quite an old release -- it seems Debian 7 was
first released in 2013, last released in 2016, EOL in 2018. I assume
that not too many people are going to install a PostgreSQL release
that comes out in 2025 on an OS that's been EOL for 7 years (or 12
years if the BF page is correct that this is actually Debian 7.0).
Somewhat oddly, I see that we have coverage for Debian 9, 11, 12, and
13, but not 8 or 10. Is there a theory behind all of this or is the
current situation somewhat accidental?
--
Robert Haas
EDB: http://www.enterprisedb.com
On 2025-03-06 Th 10:45 AM, Robert Haas wrote:
On Wed, Mar 5, 2025 at 9:49 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
FWIW I installed the client version 19.1 this morning and forced a run on HEAD
and lapwing is back to green.
Thanks, appreciate it.
By the way, is there a particular reason why we're keeping Debian 7
coverage in the buildfarm? I don't want to be in a huge rush to kill
platforms people still care about, but it was pointed out to me
off-list that this is quite an old release -- it seems Debian 7 was
first released in 2013, last released in 2016, EOL in 2018. I assume
that not too many people are going to install a PostgreSQL release
that comes out in 2025 on an OS that's been EOL for 7 years (or 12
years if the BF page is correct that this is actually Debian 7.0).
Somewhat oddly, I see that we have coverage for Debian 9, 11, 12, and
13, but not 8 or 10. Is there a theory behind all of this or is the
current situation somewhat accidental?
Fairly accidental, I think.
We do have a project at EDB to fill in certain gaps in buildfarm
coverage, so maybe we can reduce the incidence of such accidents.
cheers
andrew
--
Andrew Dunstan
EDB: https://www.enterprisedb.com
Hi,
On 2025-03-06 11:57:05 -0500, Andrew Dunstan wrote:
On 2025-03-06 Th 10:45 AM, Robert Haas wrote:
By the way, is there a particular reason why we're keeping Debian 7
coverage in the buildfarm? I don't want to be in a huge rush to kill
platforms people still care about, but it was pointed out to me
off-list that this is quite an old release -- it seems Debian 7 was
first released in 2013, last released in 2016, EOL in 2018. I assume
that not too many people are going to install a PostgreSQL release
that comes out in 2025 on an OS that's been EOL for 7 years (or 12
years if the BF page is correct that this is actually Debian 7.0).
Somewhat oddly, I see that we have coverage for Debian 9, 11, 12, and
13, but not 8 or 10. Is there a theory behind all of this or is the
current situation somewhat accidental?
Fairly accidental, I think.
We do have a project at EDB to fill in certain gaps in buildfarm coverage,
so maybe we can reduce the incidence of such accidents.
I think the way to fix the gap is to drop the buildfarm animal running an OS
that has been unsupported for 7 years / without security fixes for 9 years,
not to add an animal running an OS that has been unsupported for 4 years /
without security fixes for 6 years (i.e. Debian 8).
Debian 9 has been out of support for 2 years / without security fixes for 4.
Debian 10 is also out of LTS support, albeit more recently (30 June 2024), and
has been out of security support for 2 1/2 years.
Keeping this old stuff around is a burden on everyone that commits stuff and
probably on some contributors too.
I'd not necessarily fight hard to drop a perfectly working Debian 10 animal,
but adding a new one at this point makes no sense whatsoever.
Greetings,
Andres
Andrew Dunstan <andrew@dunslane.net> writes:
On 2025-03-06 Th 10:45 AM, Robert Haas wrote:
By the way, is there a particular reason why we're keeping Debian 7
coverage in the buildfarm? I don't want to be in a huge rush to kill
platforms people still care about, but it was pointed out to me
off-list that this is quite an old release -- it seems Debian 7 was
first released in 2013, last released in 2016, EOL in 2018. I assume
that not too many people are going to install a PostgreSQL release
that comes out in 2025 on an OS that's been EOL for 7 years (or 12
years if the BF page is correct that this is actually Debian 7.0).
I don't think that's the way to think about old buildfarm members.
Sure, nobody is very likely to be putting PG 18 on a Debian 7 box,
but the odds are much higher that they might have PG 13 on it and
wish to update to 13.latest. So what you need to compare OS EOL
dates to is not current development but our oldest supported branch.
Having said that, PG 13 came out in 2020, so yeah it'd be reasonable
to retire Debian 7 buildfarm animals now. But the gap isn't nearly
as large as you make it sound.
We do have a project at EDB to fill in certain gaps in buildfarm
coverage, so maybe we can reduce the incidence of such accidents.
+1
regards, tom lane
On Thu, Mar 6, 2025 at 12:22 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
I don't think that's the way to think about old buildfarm members.
Sure, nobody is very likely to be putting PG 18 on a Debian 7 box,
but the odds are much higher that they might have PG 13 on it and
wish to update to 13.latest. So what you need to compare OS EOL
dates to is not current development but our oldest supported branch.
But the work it's creating is mostly because it's still testing
master. If it were only testing a gradually-decreasing set of older
branches, that wouldn't seem weird to me.
--
Robert Haas
EDB: http://www.enterprisedb.com
Robert Haas <robertmhaas@gmail.com> writes:
On Thu, Mar 6, 2025 at 12:22 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
I don't think that's the way to think about old buildfarm members.
Sure, nobody is very likely to be putting PG 18 on a Debian 7 box,
but the odds are much higher that they might have PG 13 on it and
wish to update to 13.latest. So what you need to compare OS EOL
dates to is not current development but our oldest supported branch.
But the work it's creating is mostly because it's still testing
master. If it were only testing a gradually-decreasing set of older
branches, that wouldn't seem weird to me.
I don't think it's reasonable to ask buildfarm owners to set up their
animals like that, because (AFAIK) it requires tedious, error-prone
configuration of moving parts that we don't supply, like cron scripts.
If there were some trivial way to do that, it'd be more acceptable.
Maybe invent a build-farm.conf option like "newest_branch_to_build"?
branches_to_build covers some adjacent territory, but its filtering
options go the wrong way (only branches newer than X, whereas what
we want here is only branches older than X); probably we could
also address this with more options there.
regards, tom lane
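A minimal sketch of the "only branches older than X" filtering Tom describes above. It is written in Python purely for illustration: the buildfarm client itself is Perl, and no such option exists as of this thread. The names branch_key and cap_branches, and the newest argument standing in for the proposed "newest_branch_to_build" setting, are hypothetical; branch names are assumed to follow the REL_NN_STABLE / HEAD pattern.

```python
import re


def branch_key(branch: str) -> float:
    """Return a sortable number for a branch name (hypothetical helper).

    HEAD sorts after every release branch; REL_13_STABLE maps to 13.
    """
    if branch == "HEAD":
        return float("inf")
    match = re.fullmatch(r"REL_(\d+)_STABLE", branch)
    if match is None:
        raise ValueError(f"unrecognized branch name: {branch}")
    return float(match.group(1))


def cap_branches(branches: list[str], newest: str) -> list[str]:
    """Keep only branches at or below the 'newest' cutoff, preserving order."""
    limit = branch_key(newest)
    return [b for b in branches if branch_key(b) <= limit]


if __name__ == "__main__":
    all_branches = ["REL_13_STABLE", "REL_14_STABLE", "REL_17_STABLE", "HEAD"]
    # An animal capped at PG 14 would stop testing REL_17_STABLE and HEAD.
    print(cap_branches(all_branches, "REL_14_STABLE"))
```

The point of a single cutoff value is that the owner sets it once and the set of tested branches then shrinks on its own as old branches go out of support, with no cron scripting involved.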
On Thu, Mar 6, 2025 at 1:07 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
If there were some trivial way to do that, it'd be more acceptable.
Maybe invent a build-farm.conf option like "newest_branch_to_build"?
branches_to_build covers some adjacent territory, but its filtering
options go the wrong way (only branches newer than X, whereas what
we want here is only branches older than X); probably we could
also address this with more options there.
Yes, that would be nice. I also think we should mandate the use of
that option for OS versions that are EOL for more than X years, for
some to-be-determined value of X, like maybe 3 or something. Right
now, in the absence of any policy, and in the absence also of any
agreement on what the policy should be, we have to have a huge mailing
list thread every time somebody wants to get rid of an OS. I think it
should just be automatic. We don't need to give up -- and IMHO
shouldn't give up -- on an OS the moment the vendor pulls the plug
either on a certain release or on the system in general, but we
shouldn't have to individually litigate every case, either. Right now
half of when we desupport an OS seems to boil down to when the hard
drive on the last remaining server anybody has access to finally dies,
but that leads to weird outcomes where some operating systems are not
tested even though they are still in active use and others continue to
get tested long after they are not. To me, it all just feels a bit too
random, and I think we would be well-served by being more intentional
about it.
--
Robert Haas
EDB: http://www.enterprisedb.com
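The automatic rule Robert proposes above could be as simple as the following sketch. Python is used only for illustration; the function name, the parameter names, and the default threshold of 3 years (taken from his "maybe 3 or something") are assumptions, not an agreed policy.

```python
def should_cap_branches(os_eol_year: int, current_year: int,
                        max_eol_years: int = 3) -> bool:
    """Return True when an OS has been EOL for more than max_eol_years.

    Hypothetical policy check: an animal on such an OS would stop
    testing master and keep testing only older branches.
    """
    return current_year - os_eol_year > max_eol_years


if __name__ == "__main__":
    # Debian 7: EOL in 2018 according to the thread, evaluated in 2025.
    print(should_cap_branches(2018, 2025))
    # An OS whose support ended in 2024 would not yet be affected.
    print(should_cap_branches(2024, 2025))
```

Under these assumptions lapwing's OS (7 years past EOL) would be capped, while a Debian 10 animal, whose LTS support ended in 2024, would not.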
Robert Haas <robertmhaas@gmail.com> writes:
On Thu, Mar 6, 2025 at 1:07 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Maybe invent a build-farm.conf option like "newest_branch_to_build"?
Yes, that would be nice. I also think we should mandate the use of
that option for OS versions that are EOL for more than X years, for
some to-be-determined value of X, like maybe 3 or something.
It's hard to "mandate" anything in a distributed project like this.
I don't really see a need to either, at least for cases where an
old animal isn't causing us extra work. When it does, though,
it'd be nice to be able to decide "we're not gonna support that
OS version beyond PG nn", and then have a simple recipe to give
the BF owner that's less drastic than "shut it down".
regards, tom lane