More efficient build farm animal wakeup?

Started by Thomas Munro over 3 years ago, 28 messages, hackers
#1 Thomas Munro
thomas.munro@gmail.com

Hi,

Is there a way to find out about new git commits that is more
efficient and timely than running N git fetches or whatever every
minute in a cron job? Maybe some kind of long polling where you send
an HTTP request that says "I think the tips of branches x, y, z are at
111, 222, 333" and the server responds when that ceases to be true?

#2 Magnus Hagander
magnus@hagander.net
In reply to: Thomas Munro (#1)
Re: More efficient build farm animal wakeup?

On Sat, Nov 19, 2022 at 4:13 AM Thomas Munro <thomas.munro@gmail.com> wrote:

Hi,

Is there a way to find out about new git commits that is more
efficient and timely than running N git fetches or whatever every
minute in a cron job? Maybe some kind of long polling where you send
an HTTP request that says "I think the tips of branches x, y, z are at
111, 222, 333" and the server responds when that ceases to be true?

I'm not aware of any such thing standardized for git, but it wouldn't be
hard to build one for that (I'm talking primarily about the server side
here, not how to integrate that into the buildfarm side of things).

We could also set something up whereby we could fire off webhooks when
branches change (easy enough for registered servers in the buildfarm as we
can easily avoid abuse there -- it would take more work to make something
like that a public service, due to the risk of abuse). But that may
actually be worse off, since I bet a lot of buildfarm animals (most even?)
are probably sitting behind a NAT gateway of some kind, meaning consuming
webhooks is hard.

I did something similar for how we did things on borka (using some internal
pginfra webhooks that are not available to the public at this time), but I
had to revert that because of issues with concurrent buildfarm runs in the
environment that we had set up. But we are using it for the snapshots docs
builder, to make sure the website for that gets updated immediately after a
commit on master. But the principle definitely works.

Another thing to consider would be that something like this would cause all
buildfarm clients to start git pulling down changes at more or less
exactly the same time. Though in total that would probably still mean a lot
less load than those that "git pull" very frequently today, it could
potentially lead to some nets with lots of bf clients experiencing some
level of bandwidth filling or something. Could probably be solved pretty
easily with a random delay (which doesn't have to be long, as for most git
pulls it will be a very quick operation), just something that's worth
considering.
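A minimal sketch of the random-delay idea, assuming the client is handed a `pull` callable that does the actual fetch. `jittered_delay`, `pull_after_jitter` and the 30-second default are illustrative choices, not anything the buildfarm client provides:

```python
import random
import time


def jittered_delay(max_delay_seconds, rng=random):
    """Pick a uniformly random delay in [0, max_delay_seconds] seconds."""
    return rng.uniform(0, max_delay_seconds)


def pull_after_jitter(pull, max_delay_seconds=30.0, rng=random, sleep=time.sleep):
    """Wait a random amount of time, then run the (hypothetical) pull callable.

    Spreading clients over a short window keeps one notification from
    turning into a burst of simultaneous git fetches on the same network.
    """
    sleep(jittered_delay(max_delay_seconds, rng))
    return pull()
```

Injecting `rng` and `sleep` only exists to make the function testable; a real client would use the defaults.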

tl;dr: it's not there now, but yes, if we can find a smart way for the bf
clients to consume it, it is something we could build and deploy fairly
easily.

--
Magnus Hagander
Me: https://www.hagander.net/
Work: https://www.redpill-linpro.com/

#3 Andrew Dunstan
andrew@dunslane.net
In reply to: Thomas Munro (#1)
Re: More efficient build farm animal wakeup?

On 2022-11-18 Fr 22:12, Thomas Munro wrote:

Hi,

Is there a way to find out about new git commits that is more
efficient and timely than running N git fetches or whatever every
minute in a cron job? Maybe some kind of long polling where you send
an HTTP request that says "I think the tips of branches x, y, z are at
111, 222, 333" and the server responds when that ceases to be true?

It might not suit your use case, but one of the things I do to reduce
fetch load is to run a local mirror which runs

   git fetch -q --prune

every 5 minutes. It also runs a git daemon, and several of my animals
point at that.

If there's a better git API I'll be happy to try to use it.

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com

#4 Thomas Munro
thomas.munro@gmail.com
In reply to: Magnus Hagander (#2)
Re: More efficient build farm animal wakeup?

On Sun, Nov 20, 2022 at 1:35 AM Magnus Hagander <magnus@hagander.net> wrote:

tl;dr: it's not there now, but yes, if we can find a smart way for the bf clients to consume it, it is something we could build and deploy fairly easily.

Cool -- it sounds a lot like you've thought about this already :-)

About the client: currently run_branches.pl makes an HTTP request for
the "branches of interest" list. Seems like a candidate point for a
long poll? I don't think it'd have to be much smarter than it is
today, it'd just have to POST the commits it already has, I think.

Perhaps as a first step, the server could immediately report which
branches to bother fetching, considering the client's existing
commits. That'd almost always be none, but ~11.7 times per day a new
commit shows up, and once a year there's a new interesting branch.
That would avoid the need for the 6 git fetches that usually follow in
the common case, which admittedly might not be a change worth making
on its own. After all, the git fetches are probably quite similar
HTTP requests themselves, except that there are 6 of them, one per branch,
and they hit the public git server instead of some hypothetical
buildfarm endpoint.
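The "which branches should I fetch?" check can be sketched as a comparison of two branch-to-commit maps, assuming the server knows the current tips and the client reports the tips it last fetched. The function name is hypothetical. Branches the client did not mention count as new, which also covers the once-a-year new stable branch:

```python
def branches_to_fetch(client_tips, server_tips):
    """Return the branches (in server order) whose tip differs from what the
    client reported, including branches the client did not know about."""
    return [branch for branch, sha in server_tips.items()
            if client_tips.get(branch) != sha]


# Single letters stand in for commit ids. The client reports 14, 15 and
# HEAD; 14 has moved and the server also has a new 16 branch.
server = {"REL_14_STABLE": "b", "REL_15_STABLE": "c",
          "REL_16_STABLE": "d", "HEAD": "e"}
client = {"REL_14_STABLE": "a", "REL_15_STABLE": "c", "HEAD": "e"}
print(branches_to_fetch(client, server))  # ['REL_14_STABLE', 'REL_16_STABLE']
```

In the common case the list is empty and the client can skip git entirely.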

Then you could switch to long polling by letting the client say "if
currently none, I'm prepared to wait up to X seconds for a different
answer", assuming you know how to build the server side of that
(insert magic here). Of course, you can't make it too long or your
session might be dropped in the badlands between client and server,
but that's just a reason to make X configurable. I think RFC6202 says
that 120 seconds probably works fine across most kinds of links, which
means that you lower the total poll rate hitting the server, but--more
interestingly for me as a client--you minimise latency when something
finally happens. (With various keepalive tricks and/or heartbeat
streaming tricks you could possibly make it much higher, who knows...
but you'd have to set it very very low to do worse than what we're
doing today in total request count). Or maybe there is some existing
easy perl library that could be used for this (joke answer: cpan
install Twitter::API and follow @pg_commits).
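On the server side, the "insert magic here" part of a long poll can be as small as a condition variable: a request thread waits until the tips differ from what the client sent, or until its timeout runs out. This is a sketch using Python's `threading.Condition`; `BranchTips` and its methods are hypothetical, and something else (a push hook, say) is assumed to call `update`:

```python
import threading


class BranchTips:
    """Current branch tips, with a blocking wait for changes (long poll)."""

    def __init__(self, tips):
        self._tips = dict(tips)
        self._cond = threading.Condition()

    def update(self, branch, sha):
        """Record a new tip and wake every waiting request."""
        with self._cond:
            self._tips[branch] = sha
            self._cond.notify_all()

    def wait_for_change(self, client_tips, timeout):
        """Block for up to `timeout` seconds until some tip differs from
        client_tips; return the differing tips, or {} on timeout."""
        def changed():
            return {b: s for b, s in self._tips.items()
                    if client_tips.get(b) != s}

        with self._cond:
            # wait_for re-checks the predicate after each wakeup and
            # returns its last value when the timeout expires.
            return self._cond.wait_for(changed, timeout)
```

Each waiting client ties up a thread here; with many animals an event-loop server would be more economical, but the waiting logic is the same.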

By the way, the reason I wrote this is because I've just been
re-establishing my animal elver. It's set to run every minute by
cron, and spends nearly *half of each minute* running various git
commands when nothing is happening. Actually it's more than 6
connections to the server, because I see there's a fetch and an
ls-remote, so it's at least 12 (being unfamiliar with git plumbing, it
could be much more for all I know, and I kinda suspect so based on the
total run time). Admittedly network packets take a little while to
fly to my South Pacific location so maybe this looks less insane from
over there.

However, when I started this thread I was half expecting such a thing
to exist already, somewhere, I just haven't been able to find it
myself... Don't other people have this problem? Maybe everybody who
has this problem uses webhooks (git server post commit hook opens
connection to client) as you mentioned, but as you also mentioned
that'd never fly for our topology.

#5 Andres Freund
andres@anarazel.de
In reply to: Thomas Munro (#1)
Re: More efficient build farm animal wakeup?

Hi,

On 2022-11-19 16:12:24 +1300, Thomas Munro wrote:

Is there a way to find out about new git commits that is more
efficient and timely than running N git fetches or whatever every
minute in a cron job? Maybe some kind of long polling where you send
an HTTP request that says "I think the tips of branches x, y, z are at
111, 222, 333" and the server responds when that ceases to be true?

I think a git fetch is actually ok for that - it doesn't take a whole lot of
resources. However run_branches.pl is more heavyweight. For one, it starts one
run_build.pl for each branch, which each then fetches from git separately. But
more importantly, each run_build.pl seems to actually do a fair bit of work
before discovering nothing has changed.

A typical log I see:

Nov 20 06:08:17 bf-valgrind-v4 run_branches.pl[3289916]: Sun Nov 20 06:08:17 2022: buildfarm run for grassquit:REL_14_STABLE starting
Nov 20 06:08:17 bf-valgrind-v4 run_branches.pl[3289916]: grassquit:REL_14_STABLE [06:08:17] checking out source ...
Nov 20 06:08:20 bf-valgrind-v4 run_branches.pl[3289916]: grassquit:REL_14_STABLE [06:08:20] checking if build run needed ...
Nov 20 06:08:20 bf-valgrind-v4 run_branches.pl[3289916]: grassquit:REL_14_STABLE [06:08:20] No build required: last status = Sat Nov 19 23:54:38 2022 GMT, cur>

So we spend three seconds in the "checking out source" stage, just to then see
that nothing has actually changed.

Greetings,

Andres Freund

#6 Magnus Hagander
magnus@hagander.net
In reply to: Thomas Munro (#4)
Re: More efficient build farm animal wakeup?

On Sun, Nov 20, 2022 at 4:56 AM Thomas Munro <thomas.munro@gmail.com> wrote:

On Sun, Nov 20, 2022 at 1:35 AM Magnus Hagander <magnus@hagander.net> wrote:

tl;dr: it's not there now, but yes, if we can find a smart way for the bf
clients to consume it, it is something we could build and deploy fairly
easily.

Cool -- it sounds a lot like you've thought about this already :-)

About the client: currently run_branches.pl makes an HTTP request for
the "branches of interest" list. Seems like a candidate point for a
long poll? I don't think it'd have to be much smarter than it is
today, it'd just have to POST the commits it already has, I think.

Um, branches of interest will only pick up when it gets a new *branch*, not
a new *commit*, so I think that would be a very different problem to solve.
And I don't think we have new branches *that* often...

Perhaps as a first step, the server could immediately report which
branches to bother fetching, considering the client's existing
commits. That'd almost always be none, but ~11.7 times per day a new
commit shows up, and once a year there's a new interesting branch.
That would avoid the need for the 6 git fetches that usually follow in
the common case, which admittedly might not be a change worth making
on its own. After all, the git fetches are probably quite similar
HTTP requests themselves, except that there are 6 of them, one per branch,
and they hit the public git server instead of some hypothetical
buildfarm endpoint.

As Andres mentioned downthread, that's not a lot more lightweight than what
"git fetch" does.

The thing we'd want to avoid is having to do that so much and often. And
getting to that is going to require modification of the buildfarm client to
make it more "smart" regardless. In particular, making it do this "right"
in the face of multiple branches is probably going to be a big win.

Then you could switch to long polling by letting the client say "if
currently none, I'm prepared to wait up to X seconds for a different
answer", assuming you know how to build the server side of that
(insert magic here). Of course, you can't make it too long or your
session might be dropped in the badlands between client and server,
but that's just a reason to make X configurable. I think RFC6202 says
that 120 seconds probably works fine across most kinds of links, which
means that you lower the total poll rate hitting the server, but--more
interestingly for me as a client--you minimise latency when something
finally happens. (With various keepalive tricks and/or heartbeat
streaming tricks you could possibly make it much higher, who knows...
but you'd have to set it very very low to do worse than what we're
doing today in total request count). Or maybe there is some existing
easy perl library that could be used for this (joke answer: cpan
install Twitter::API and follow @pg_commits).

I also honestly wonder how big a problem a much longer than 120 seconds
timeout would be in practice. Since we own both the client and the server
in this case, we'd only be at mercy of network equipment in between and I
think we're much less exposed to weirdness there than "the average
browser". Thus, as long as it's configurable, I think we could go for
something much longer by default.

I'd imagine something like a

   GET https://git.postgresql.org/buildfarm-branchtips
   X-branch-master: a4adc31f69
   X-branch-REL_14_STABLE: b33283cbd3
   X-longpoll: 120

For that one it would check branch master and rel 14, and if either
branchtip doesn't match what was in the header, it'd return immediately
with a textfile that's basically

   master:<whateveritis>

if master has changed and not REL_14.

If nothing has changed, go into longpoll for 120 seconds based on the
header, and if nothing at all has changed in that time, return a 304.
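A sketch of the request handling described above, assuming the server keeps full commit ids while clients may send abbreviated ones (hence the prefix match). `parse_branchtip_headers`, `changed_lines`, `respond`, `MAX_LONGPOLL` and the `wait_for_push` hook are hypothetical names; the header names and the `branch:sha` response lines follow the proposal:

```python
MAX_LONGPOLL = 600  # hypothetical server-side cap on X-longpoll, in seconds
PREFIX = "x-branch-"


def parse_branchtip_headers(headers):
    """Split request headers into {branch: sha} and a long-poll duration."""
    tips, longpoll = {}, 0
    for name, value in headers.items():
        lower = name.lower()
        if lower.startswith(PREFIX):
            tips[name[len(PREFIX):]] = value.strip()
        elif lower == "x-longpoll":
            longpoll = int(value)
    return tips, longpoll


def changed_lines(client_tips, server_tips):
    """Render 'branch:sha' lines for branches whose tip moved.

    Header names are case-insensitive, so branch names are matched
    case-insensitively and reported in the server's spelling.
    """
    canonical = {name.lower(): name for name in server_tips}
    lines = []
    for branch, client_sha in client_tips.items():
        name = canonical.get(branch.lower())
        if name is not None and not server_tips[name].startswith(client_sha):
            lines.append(f"{name}:{server_tips[name]}\n")
    return "".join(lines)


def respond(headers, current_tips, wait_for_push):
    """Return (status, body): 200 with changes, or 304 if nothing moved."""
    client_tips, longpoll = parse_branchtip_headers(headers)
    body = changed_lines(client_tips, current_tips())
    if not body and longpoll > 0:
        # Hypothetical hook: returns on the next push or after the timeout.
        wait_for_push(min(longpoll, MAX_LONGPOLL))
        body = changed_lines(client_tips, current_tips())
    return (200, body) if body else (304, "")
```

Matching case-insensitively is not just pedantry: HTTP/2 requires header field names to be sent in lowercase, and some client libraries normalize their case too, so a branch name like `REL_14_STABLE` embedded in a header name may not arrive as written.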

We could also use something like a websocket to just stream the changes out
over.

In either case it would also need to change the buildfarm client to run as
a daemon rather than a cronjob I think? (obviously optional, we don't have
to remove the current abilities)

However, when I started this thread I was half expecting such a thing
to exist already, somewhere, I just haven't been able to find it
myself... Don't other people have this problem? Maybe everybody who
has this problem uses webhooks (git server post commit hook opens
connection to client) as you mentioned, but as you also mentioned
that'd never fly for our topology.

Yeah, webhook seems to be what most people use.

FWIW, an implementation for us would be a small daemon that receives such
webhooks from our git server and redistributes it for the long polling.
That's still the easiest way to get the data out of git itself...

//Magnus

#7 Thomas Munro
thomas.munro@gmail.com
In reply to: Magnus Hagander (#6)
Re: More efficient build farm animal wakeup?

On Mon, Nov 21, 2022 at 10:31 AM Magnus Hagander <magnus@hagander.net> wrote:

Um, branches of interest will only pick up when it gets a new *branch*, not a new *commit*, so I think that would be a very different problem to solve. And I don't think we have new branches *that* often...

Sure, could be done with an extra different request you make from time
to time or keeping the existing list. No strong opinions on that, I
was just observing that it could also be combined, something like:

Client: I have 14@1234, 15@1234, HEAD@1234; what should I do now, boss?
Server: You should fetch 14 (it has a new commit) and 16 (it's a new
branch you didn't mention).

I'd imagine something like a

   GET https://git.postgresql.org/buildfarm-branchtips
   X-branch-master: a4adc31f69
   X-branch-REL_14_STABLE: b33283cbd3
   X-longpoll: 120

For that one it would check branch master and rel 14, and if either branchtip doesn't match what was in the header, it'd return immediately with a textfile that's basically

   master:<whateveritis>

if master has changed and not REL_14.

If nothing has changed, go into longpoll for 120 seconds based on the header, and if nothing at all has changed in that time, return a 304.

LGTM, that's exactly the sort of thing I was imagining.

We could also use something like a websocket to just stream the changes out over.

True. The reason I started on about long polling instead of
websockets is that I was imagining that the simpler, dumber protocol
where the client doesn't even really know it's participating in a new
kind of magic would be more cromulent in ye olde perl script (no new
cpan dependencies).

In either case it would also need to change the buildfarm client to run as a daemon rather than a cronjob I think? (obviously optional, we don't have to remove the current abilities)

Given that the point of the build farm is (these days) to test on
weird computers and operating systems, I expect that proper 'run like
a service' support would be painful or not get done. It'd be nice if
there were some way to make this work with simple crontab entries...
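One way to square long polling with plain crontab entries, sketched here as an assumption rather than anything the buildfarm client does: each cron run makes a single long-poll request whose wait is kept shorter than the cron interval, so one invocation has finished before the next starts. The endpoint and headers follow the proposal upthread; `poll_once` and the other helpers are hypothetical:

```python
import urllib.error
import urllib.request


def longpoll_budget(cron_interval_s, margin_s=10):
    """Seconds to long-poll so that a run ends before cron starts the next."""
    return max(0, cron_interval_s - margin_s)


def parse_changes(body):
    """Parse 'branch:sha' lines from the server into a dict."""
    changes = {}
    for line in body.splitlines():
        if line.strip():
            branch, _, sha = line.partition(":")
            changes[branch] = sha
    return changes


def poll_once(url, tips, cron_interval_s=60, opener=urllib.request.urlopen):
    """Make one long-poll request; return changed branches, or {} if none."""
    wait = longpoll_budget(cron_interval_s)
    req = urllib.request.Request(url)
    for branch, sha in tips.items():
        # urllib normalizes header-name case (REL_14_STABLE becomes
        # rel_14_stable), so the server must match names case-insensitively.
        req.add_header("X-branch-" + branch, sha)
    req.add_header("X-longpoll", str(wait))
    try:
        # Allow a few seconds beyond the long poll before giving up.
        with opener(req, timeout=wait + 5) as resp:
            return parse_changes(resp.read().decode())
    except urllib.error.HTTPError as err:
        if err.code == 304:  # nothing changed during the long poll
            return {}
        raise
```

With a one-minute crontab entry this waits 50 seconds per run, so a push that lands during the wait is seen almost at once, and an idle animal makes one request per minute instead of a series of git calls.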

#8 Thomas Munro
thomas.munro@gmail.com
In reply to: Andrew Dunstan (#3)
Re: More efficient build farm animal wakeup?

On Sun, Nov 20, 2022 at 2:44 AM Andrew Dunstan <andrew@dunslane.net> wrote:

It might not suit your use case, but one of the things I do to reduce
fetch load is to run a local mirror which runs

git fetch -q --prune

every 5 minutes. It also runs a git daemon, and several of my animals
point at that.

Thanks. I understand now that my configuration without a local mirror
is super inefficient (it spends the first ~25s of each minute running
git commands). Still, even though that can be improved by me setting
up more stuff, I'd like something event-driven rather than short
polling-based for lower latency.

If there's a better git API I'll be happy to try to use it.

Cool. Seems like we just have to invent something first...

FWIW I'm also trying to chase the short polling out of cfbot. It
regularly harasses the git servers at one end (could be fixed with
this approach), and wastes a percentage of our allotted CPU slots on
the other end by scheduling periodically (could be fixed with webhooks
from Cirrus).

#9 Andrew Dunstan
andrew@dunslane.net
In reply to: Thomas Munro (#8)
Re: More efficient build farm animal wakeup?

On 2022-11-20 Su 17:32, Thomas Munro wrote:

On Sun, Nov 20, 2022 at 2:44 AM Andrew Dunstan <andrew@dunslane.net> wrote:

It might not suit your use case, but one of the things I do to reduce
fetch load is to run a local mirror which runs

git fetch -q --prune

every 5 minutes. It also runs a git daemon, and several of my animals
point at that.

Thanks. I understand now that my configuration without a local mirror
is super inefficient (it spends the first ~25s of each minute running
git commands). Still, even though that can be improved by me setting
up more stuff, I'd like something event-driven rather than short
polling-based for lower latency.

If there's a better git API I'll be happy to try to use it.

Cool. Seems like we just have to invent something first...

FWIW I'm also trying to chase the short polling out of cfbot. It
regularly harasses the git servers at one end (could be fixed with
this approach), and wastes a percentage of our allotted CPU slots on
the other end by scheduling periodically (could be fixed with webhooks
from Cirrus).

I think I have solved most of the actual issues without getting too complex.

Here's how:

The buildfarm server now creates a companion to branches_of_interest.txt
called branches_of_interest.json which looks like this:

[
   {
      "REL_11_STABLE" : "140c803723"
   },
   {
      "REL_12_STABLE" : "4cbcb7ed85"
   },
   {
      "REL_13_STABLE" : "c13667b518"
   },
   {
      "REL_14_STABLE" : "5cda142bb9"
   },
   {
      "REL_15_STABLE" : "ff9d27ee2b"
   },
   {
      "HEAD" : "51b5834cd5"
   }
]

It updates this every time it does a git fetch, currently every 5 minutes.

run_branches.pl fetches this file instead of the plain list of branches,
and before running run_build.pl checks if the given commit was the
latest one tested, and if so and a build isn't being forced, skips the
branch. Thus, in the case where all the branches are up to date there
will be no git calls whatsoever.
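The client-side skip check can be sketched like this, assuming the client remembers the full commit id it last tested on each branch (the JSON carries 10-character abbreviations, hence the prefix match). The function names are illustrative; the real logic lives in run_branches.pl:

```python
import json


def parse_branches(text):
    """Turn the list-of-single-key-objects format into ordered (branch, sha) pairs."""
    pairs = []
    for entry in json.loads(text):
        # Each element is assumed to hold exactly one branch; the one-element
        # unpacking below raises ValueError otherwise.
        (branch, sha), = entry.items()
        pairs.append((branch, sha))
    return pairs


def branches_to_build(text, last_tested, force=False):
    """Branches, in file order, whose tip is not the commit last tested."""
    return [branch for branch, sha in parse_branches(text)
            if force or not last_tested.get(branch, "").startswith(sha)]
```

Wrapping each branch in its own object keeps the order explicit: the JSON specification treats object members as unordered, so a parser (or a Perl hash) is free to lose key order, whereas array order is preserved.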

You can try it out by getting run_branches.pl from
<https://raw.githubusercontent.com/PGBuildFarm/client-code/main/run_branches.pl>

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com

#10 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andrew Dunstan (#9)
Re: More efficient build farm animal wakeup?

Andrew Dunstan <andrew@dunslane.net> writes:

The buildfarm server now creates a companion to branches_of_interest.txt
called branches_of_interest.json which looks like this:

... okay ...

It updates this every time it does a git fetch, currently every 5 minutes.

That up-to-five-minute delay, on top of whatever cronjob delay one has
on one's animals, seems kind of sad. I've gotten kind of spoiled maybe
by seeing first buildfarm results typically within 15 minutes of a push.
But if we're trying to improve matters in this area, this doesn't seem
like quite the way to go.

But it does seem like this eliminates one expense. Now that you have
that bit, maybe we could arrange a webhook or something that allows
branches_of_interest.json to get updated immediately after a push?

regards, tom lane

#11 Magnus Hagander
magnus@hagander.net
In reply to: Tom Lane (#10)
Re: More efficient build farm animal wakeup?

On Mon, Nov 21, 2022 at 9:58 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Andrew Dunstan <andrew@dunslane.net> writes:

The buildfarm server now creates a companion to branches_of_interest.txt
called branches_of_interest.json which looks like this:

... okay ...

Yeah, it's not as efficient as something like long polling or web sockets,
but it is most definitely a lot simpler!

If we're going to have a lot of animals do pulls of this file every minute
or more, it's certainly a lot better to pull this small file than to make
multiple git calls.

It could trivially be made even more efficient by making the request with
either an If-None-Match or If-Modified-Since header. While it's still small, that
cuts the size approximately in half, and would allow you to skip even more
processing if nothing has changed.
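A conditional fetch of branches_of_interest.json could look like the sketch below, assuming the web server emits `ETag` and/or `Last-Modified` for the file, as typical static-file servers do (an assumption about this particular setup). Python's `urllib` reports a 304 as an `HTTPError`; the `cache` dict and the function name are hypothetical:

```python
import urllib.error
import urllib.request


def fetch_if_changed(url, cache, opener=urllib.request.urlopen):
    """Conditionally GET url; return the new body, or None if unchanged (304).

    cache holds the validators from the previous response ("etag",
    "last_modified") plus the last "body"; it is updated in place.
    """
    req = urllib.request.Request(url)
    if cache.get("etag"):
        req.add_header("If-None-Match", cache["etag"])
    if cache.get("last_modified"):
        req.add_header("If-Modified-Since", cache["last_modified"])
    try:
        resp = opener(req, timeout=30)
    except urllib.error.HTTPError as err:
        if err.code == 304:  # Not Modified: the cached copy is still current
            return None
        raise
    with resp:
        body = resp.read()
        cache["etag"] = resp.headers.get("ETag")
        cache["last_modified"] = resp.headers.get("Last-Modified")
    cache["body"] = body
    return body
```

A `None` result means the client can skip parsing and comparing entirely. Since animals run from cron, `cache` would in practice be persisted between runs, for example in a small file.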

It updates this every time it does a git fetch, currently every 5 minutes.

That up-to-five-minute delay, on top of whatever cronjob delay one has
on one's animals, seems kind of sad. I've gotten kind of spoiled maybe
by seeing first buildfarm results typically within 15 minutes of a push.
But if we're trying to improve matters in this area, this doesn't seem
like quite the way to go.

But it does seem like this eliminates one expense. Now that you have
that bit, maybe we could arrange a webhook or something that allows
branches_of_interest.json to get updated immediately after a push?

Webhooks are definitely a lot easier to implement in between our servers
yeah, so that shouldn't be too hard. We could use the same hooks that we
use for borka to build the docs, but have it just run whatever script it is
the buildfarm needs. I assume it's just something trivial to run there,
Andrew?

//Magnus

#12 Magnus Hagander
magnus@hagander.net
In reply to: Andrew Dunstan (#9)
Re: More efficient build farm animal wakeup?

On Mon, Nov 21, 2022 at 9:51 PM Andrew Dunstan <andrew@dunslane.net> wrote:

On 2022-11-20 Su 17:32, Thomas Munro wrote:

On Sun, Nov 20, 2022 at 2:44 AM Andrew Dunstan <andrew@dunslane.net> wrote:

It might not suit your use case, but one of the things I do to reduce
fetch load is to run a local mirror which runs

git fetch -q --prune

every 5 minutes. It also runs a git daemon, and several of my animals
point at that.

Thanks. I understand now that my configuration without a local mirror
is super inefficient (it spends the first ~25s of each minute running
git commands). Still, even though that can be improved by me setting
up more stuff, I'd like something event-driven rather than short
polling-based for lower latency.

If there's a better git API I'll be happy to try to use it.

Cool. Seems like we just have to invent something first...

FWIW I'm also trying to chase the short polling out of cfbot. It
regularly harasses the git servers at one end (could be fixed with
this approach), and wastes a percentage of our allotted CPU slots on
the other end by scheduling periodically (could be fixed with webhooks
from Cirrus).

I think I have solved most of the actual issues without getting too
complex.

Here's how:

The buildfarm server now creates a companion to branches_of_interest.txt
called branches_of_interest.json which looks like this:

[
   {
      "REL_11_STABLE" : "140c803723"
   },
   {
      "REL_12_STABLE" : "4cbcb7ed85"
   },
   {
      "REL_13_STABLE" : "c13667b518"
   },
   {
      "REL_14_STABLE" : "5cda142bb9"
   },
   {
      "REL_15_STABLE" : "ff9d27ee2b"
   },
   {
      "HEAD" : "51b5834cd5"
   }
]

Is there a reason this file is a list of hashes each hash with a single
value in it? Would it make more sense if it was:
{
  "REL_11_STABLE": "140c803723",
  "REL_12_STABLE": "4cbcb7ed85",
  "REL_13_STABLE": "c13667b518",
  "REL_14_STABLE": "5cda142bb9",
  "REL_15_STABLE": "ff9d27ee2b",
  "HEAD": "51b5834cd5"
}

--
Magnus Hagander
Me: https://www.hagander.net/
Work: https://www.redpill-linpro.com/

#13 Andrew Dunstan
andrew@dunslane.net
In reply to: Magnus Hagander (#12)
Re: More efficient build farm animal wakeup?

On 2022-11-21 Mo 16:26, Magnus Hagander wrote:

Is there a reason this file is a list of hashes each hash with a
single value in it? Would it make more sense if it was:
{
  "REL_11_STABLE": "140c803723",
  "REL_12_STABLE": "4cbcb7ed85",
  "REL_13_STABLE": "c13667b518",
  "REL_14_STABLE": "5cda142bb9",
  "REL_15_STABLE": "ff9d27ee2b",
  "HEAD": "51b5834cd5"
}
 

No. It's the way it is because the client relies on their being in the
right order. JSON hashes are conceptually unordered.

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com

#14 Magnus Hagander
magnus@hagander.net
In reply to: Andrew Dunstan (#13)
Re: More efficient build farm animal wakeup?

On Mon, Nov 21, 2022 at 11:27 PM Andrew Dunstan <andrew@dunslane.net> wrote:

On 2022-11-21 Mo 16:26, Magnus Hagander wrote:

Is there a reason this file is a list of hashes each hash with a
single value in it? Would it make more sense if it was:
{
  "REL_11_STABLE": "140c803723",
  "REL_12_STABLE": "4cbcb7ed85",
  "REL_13_STABLE": "c13667b518",
  "REL_14_STABLE": "5cda142bb9",
  "REL_15_STABLE": "ff9d27ee2b",
  "HEAD": "51b5834cd5"
}

No. It's the way it is because the client relies on their being in the
right order. JSON hashes are conceptually unordered.

Ah yeah, if they need to be ordered that certainly makes more sense.

--
Magnus Hagander
Me: https://www.hagander.net/
Work: https://www.redpill-linpro.com/

#15 Andrew Dunstan
andrew@dunslane.net
In reply to: Tom Lane (#10)
Re: More efficient build farm animal wakeup?

On 2022-11-21 Mo 15:58, Tom Lane wrote:

Andrew Dunstan <andrew@dunslane.net> writes:

The buildfarm server now creates a companion to branches_of_interest.txt
called branches_of_interest.json which looks like this:

... okay ...

It updates this every time it does a git fetch, currently every 5 minutes.

That up-to-five-minute delay, on top of whatever cronjob delay one has
on one's animals, seems kind of sad. I've gotten kind of spoiled maybe
by seeing first buildfarm results typically within 15 minutes of a push.
But if we're trying to improve matters in this area, this doesn't seem
like quite the way to go.

Well, 5 minutes was originally chosen because it was sufficient for the
purpose for which up to now the server used its mirror. Now we have
added a new purpose we can certainly revisit that. Shall I try 2 minutes
or go down to 1?
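Back-of-the-envelope arithmetic for the intervals under discussion, ignoring how long the fetches themselves take: in the worst case a push lands just after the server's refresh, and the animal polls just before the JSON file is regenerated, so the two intervals add up. The helper is only illustrative:

```python
def worst_case_delay(server_refresh_min, client_poll_min):
    """Upper bound, in minutes, from a push until a polling animal notices it."""
    return server_refresh_min + client_poll_min


for server_refresh in (5, 2, 1, 0):  # 0 approximates a push-triggered refresh
    print(server_refresh, "->", worst_case_delay(server_refresh, 1))
# 5 -> 6
# 2 -> 3
# 1 -> 2
# 0 -> 1
```

With a one-minute cron on the animal, going from 5 to 1 minute on the server cuts the bound from 6 to 2 minutes; refreshing on each push removes the server term altogether.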

But it does seem like this eliminates one expense. Now that you have
that bit, maybe we could arrange a webhook or something that allows
branches_of_interest.json to get updated immediately after a push?

Sure, if you think an extra few seconds is worth saving.

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com

#16 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andrew Dunstan (#15)
Re: More efficient build farm animal wakeup?

Andrew Dunstan <andrew@dunslane.net> writes:

On 2022-11-21 Mo 15:58, Tom Lane wrote:

But if we're trying to improve matters in this area, this doesn't seem
like quite the way to go.

Well, 5 minutes was originally chosen because it was sufficient for the
purpose for which up to now the server used its mirror. Now we have
added a new purpose we can certainly revisit that. Shall I try 2 minutes
or go down to 1?

Actually, if we implement a webhook to update this, the server could
stop doing speculative git pulls too, no?

regards, tom lane

#17 Andrew Dunstan
andrew@dunslane.net
In reply to: Magnus Hagander (#11)
Re: More efficient build farm animal wakeup?

On 2022-11-21 Mo 16:20, Magnus Hagander wrote:

On Mon, Nov 21, 2022 at 9:58 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Andrew Dunstan <andrew@dunslane.net> writes:

The buildfarm server now creates a companion to branches_of_interest.txt
called branches_of_interest.json which looks like this:

... okay ...

Yeah, it's not as efficient as something like long polling or web
sockets, but it is most definitely a lot simpler!

If we're going to have a lot of animals do pulls of this file every
minute or more, it's certainly a lot better to pull this small file
than to make multiple git calls.

It could trivially be made even more efficient by making the request
with either an If-None-Match or If-Modified-Since header. While it's still
small, that cuts the size approximately in half, and would allow you
to skip even more processing if nothing has changed.

I'll look at that.

It updates this every time it does a git fetch, currently every 5 minutes.

That up-to-five-minute delay, on top of whatever cronjob delay one has
on one's animals, seems kind of sad. I've gotten kind of spoiled maybe
by seeing first buildfarm results typically within 15 minutes of a push.
But if we're trying to improve matters in this area, this doesn't seem
like quite the way to go.

But it does seem like this eliminates one expense.  Now that you have
that bit, maybe we could arrange a webhook or something that allows
branches_of_interest.json to get updated immediately after a push?

Webhooks are definitely a lot easier to implement in between our
servers yeah, so that shouldn't be too hard. We could use the same
hooks that we use for borka to build the docs, but have it just run
whatever script it is the buildfarm needs. I assume it's just
something trivial to run there, Andrew?

Yes, I think much better between servers. Currently the cron job looks
something like this:

   */5 * * * * cd $HOME/postgresql.git && git fetch -q && $HOME/website/bin/branches_of_interest.pl

That script is what sets up the json files.

I know nothing about git webhooks though, someone will have to point me
in the right direction.
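For readers unfamiliar with webhooks: the git server makes an HTTP request to a small listener after each push, and the listener runs the same two steps as the cron job. A minimal sketch with Python's standard library, assuming the paths from the crontab line above, a hypothetical port 8099, and no authentication (a real deployment would at least restrict the source address or check a shared secret). This is not the pginfra tooling mentioned in the thread:

```python
import os
import subprocess
from http.server import BaseHTTPRequestHandler, HTTPServer

# Paths taken from the crontab entry; adjust for the real host.
REPO = os.path.expanduser("~/postgresql.git")
SCRIPT = os.path.expanduser("~/website/bin/branches_of_interest.pl")


def refresh(run=subprocess.run):
    """Fetch the mirror, then regenerate the branches_of_interest files.

    Mirrors the cron line: the script only runs if the fetch succeeded.
    """
    if run(["git", "fetch", "-q"], cwd=REPO).returncode != 0:
        return False
    return run([SCRIPT], cwd=REPO).returncode == 0


class PushHook(BaseHTTPRequestHandler):
    def do_POST(self):
        # The payload is ignored: we simply refetch, as the cron job did.
        self.rfile.read(int(self.headers.get("Content-Length", 0)))
        ok = refresh()
        self.send_response(204 if ok else 500)
        self.end_headers()


def serve(port=8099):
    """Listen for push notifications on a (hypothetical) port."""
    # HTTPServer handles one request at a time, so overlapping pushes
    # queue up instead of running two fetches concurrently.
    HTTPServer(("", port), PushHook).serve_forever()
```

Calling `serve()` starts the listener; the cron entry could then stay as a slow fallback in case a notification is ever lost.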

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com

#18 Magnus Hagander
magnus@hagander.net
In reply to: Tom Lane (#16)
Re: More efficient build farm animal wakeup?

On Mon, Nov 21, 2022 at 11:41 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Andrew Dunstan <andrew@dunslane.net> writes:

On 2022-11-21 Mo 15:58, Tom Lane wrote:

But if we're trying to improve matters in this area, this doesn't seem
like quite the way to go.

Well, 5 minutes was originally chosen because it was sufficient for the
purpose for which up to now the server used its mirror. Now we have
added a new purpose we can certainly revisit that. Shall I try 2 minutes
or go down to 1?

Actually, if we implement a webhook to update this, the server could
stop doing speculative git pulls too, no?

That would be the main point, yes. Saves a few hundred (or thousand)
wasteful git pulls *and* reacts quicker to actual pushes. As long as you
have a clear line of communications between the machines, it's basically
win/win I think. That's probably why, as Thomas noticed earlier, that's
what "everybody" does.

--
Magnus Hagander
Me: https://www.hagander.net/
Work: https://www.redpill-linpro.com/

#19 Magnus Hagander
magnus@hagander.net
In reply to: Andrew Dunstan (#17)
Re: More efficient build farm animal wakeup?

On Mon, Nov 21, 2022 at 11:42 PM Andrew Dunstan <andrew@dunslane.net> wrote:

On 2022-11-21 Mo 16:20, Magnus Hagander wrote:

On Mon, Nov 21, 2022 at 9:58 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Andrew Dunstan <andrew@dunslane.net> writes:

The buildfarm server now creates a companion to branches_of_interest.txt
called branches_of_interest.json which looks like this:

... okay ...

Yeah, it's not as efficient as something like long polling or web
sockets, but it is most definitely a lot simpler!

If we're going to have a lot of animals do pulls of this file every
minute or more, it's certainly a lot better to pull this small file
than to make multiple git calls.

It could trivially be made even more efficient by making the request
with either an If-None-Match or If-Modified-Since header. While it's still
small, that cuts the size approximately in half, and would allow you
to skip even more processing if nothing has changed.

I'll look at that.
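A minimal sketch of the conditional request suggested above, using only Python's standard library. It assumes the server sends an ETag and/or Last-Modified header with the file (web servers commonly do for static files); CachedCopy and fetch_if_changed are hypothetical names. A 304 Not Modified reply carries no body, which is where the saving comes from.

```python
import urllib.error
import urllib.request
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class CachedCopy:
    """Last successfully fetched body plus the validators the server sent."""
    body: bytes
    etag: Optional[str] = None
    last_modified: Optional[str] = None


def conditional_headers(cached: Optional[CachedCopy]) -> Dict[str, str]:
    """Build If-None-Match / If-Modified-Since from whatever validators we have."""
    headers: Dict[str, str] = {}
    if cached is not None:
        if cached.etag:
            headers["If-None-Match"] = cached.etag
        if cached.last_modified:
            headers["If-Modified-Since"] = cached.last_modified
    return headers


def fetch_if_changed(url: str, cached: Optional[CachedCopy]) -> Tuple[bool, CachedCopy]:
    """Return (changed, copy); on 304 Not Modified the cached copy is returned."""
    req = urllib.request.Request(url, headers=conditional_headers(cached))
    try:
        with urllib.request.urlopen(req, timeout=30) as resp:
            return True, CachedCopy(
                body=resp.read(),
                etag=resp.headers.get("ETag"),
                last_modified=resp.headers.get("Last-Modified"),
            )
    except urllib.error.HTTPError as err:
        # urllib reports 304 as an HTTPError rather than as a normal response.
        if err.code == 304 and cached is not None:
            return False, cached
        raise
```

An animal would keep the returned CachedCopy between runs and only go on to compare branch tips when changed is True.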

It updates this every time it does a git fetch, currently every 5 minutes.

That up-to-five-minute delay, on top of whatever cronjob delay one has
on one's animals, seems kind of sad. I've gotten kind of spoiled maybe
by seeing first buildfarm results typically within 15 minutes of a push.
But if we're trying to improve matters in this area, this doesn't seem
like quite the way to go.

But it does seem like this eliminates one expense. Now that you have
that bit, maybe we could arrange a webhook or something that allows
branches_of_interest.json to get updated immediately after a push?

Webhooks are definitely a lot easier to implement between our servers,
yeah, so that shouldn't be too hard. We could use the same hooks that we
use for borka to build the docs, but have them just run whatever script
the buildfarm needs. I assume it's just something trivial to run there,
Andrew?

Yes, I think much better between servers. Currently the cron job looks
something like this:

*/5 * * * * cd $HOME/postgresql.git && git fetch -q && $HOME/website/bin/branches_of_interest.pl

That script is what sets up the json files.

I know nothing about git webhooks, though; someone will have to point me
in the right direction.

I can set that up for you -- we have ready-made packages for 95% of what's
needed for that one as we use it elsewhere in the infra. So I'll just set
something up that will run that exact script (as the correct user of
course) and comment out the cronjob, and then send you the details of what
is set up where (I don't recall it offhand, but as it's the same setup we
have elsewhere I'll find it quickly once I look into it).
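For illustration only, a sketch of what a push-triggered replacement for the cron job could look like: a small HTTP endpoint that runs the same two steps (git fetch, then branches_of_interest.pl) when a push notification arrives. The HMAC-SHA256 signature check, the BF_WEBHOOK_SECRET variable, the X-Hook-Signature header and the port are all assumptions; the actual pginfra webhook packages mentioned above are not described in this thread.

```python
import hashlib
import hmac
import os
import subprocess
from http.server import BaseHTTPRequestHandler, HTTPServer

# HYPOTHETICAL configuration: secret variable, header name and port are assumptions.
SECRET = os.environ.get("BF_WEBHOOK_SECRET", "").encode()
SIGNATURE_HEADER = "X-Hook-Signature"
HOME = os.path.expanduser("~")


def signature_ok(secret: bytes, body: bytes, received_hex: str) -> bool:
    """Constant-time comparison of the body's hex HMAC-SHA256 with the sender's."""
    if not secret:
        return False  # with no secret configured, refuse every request
    expected = hmac.new(secret, body, hashlib.sha256).hexdigest().encode("ascii")
    return hmac.compare_digest(expected, received_hex.encode("utf-8", "replace"))


def refresh_branches_of_interest(home: str) -> None:
    """Run the same two steps as the old cron line, from inside the mirror."""
    repo = os.path.join(home, "postgresql.git")
    subprocess.run(["git", "fetch", "-q"], cwd=repo, check=True)
    script = os.path.join(home, "website", "bin", "branches_of_interest.pl")
    subprocess.run([script], cwd=repo, check=True)


class PushHook(BaseHTTPRequestHandler):
    def do_POST(self) -> None:
        length = int(self.headers.get("Content-Length", "0"))
        body = self.rfile.read(length)
        if not signature_ok(SECRET, body, self.headers.get(SIGNATURE_HEADER, "")):
            self.send_response(403)
            self.end_headers()
            return
        try:
            refresh_branches_of_interest(HOME)
        except (OSError, subprocess.CalledProcessError):
            self.send_response(500)
            self.end_headers()
            return
        self.send_response(204)  # refreshed; nothing to return
        self.end_headers()


def serve(port: int = 8080) -> None:
    # A plain HTTPServer handles one request at a time, so refreshes never overlap.
    HTTPServer(("127.0.0.1", port), PushHook).serve_forever()
```

Calling serve() as the user that owns the mirror would replace the */5 cron entry; because requests are handled one at a time, two pushes in quick succession run the refresh back to back rather than concurrently.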


#20 Magnus Hagander
magnus@hagander.net
In reply to: Magnus Hagander (#19)
Re: More efficient build farm animal wakeup?

On Tue, Nov 22, 2022 at 12:10 AM Magnus Hagander <magnus@hagander.net> wrote:

I can set that up for you -- we have ready-made packages for 95% of what's
needed for that one as we use it elsewhere in the infra. So I'll just set
something up that will run that exact script (as the correct user of
course) and comment out the cronjob, and then send you the details of what
is set up where (I don't recall it offhand, but as it's the same setup we
have elsewhere I'll find it quickly once I look into it).

Hi!

This should now be set up, and Andrew has been sent the instructions for
how to access that setup on the buildfarm server. So hopefully it will now
update the buildfarm server side of things within a couple of seconds of
a commit, without doing any speculative pulls. But we'll keep an extra
eye on it for a bit, of course, as it's entirely possible I got something
wrong :)

(This only covers the git -> bf server part, of course; since that step
doesn't need any client changes, it was easier to do quickly.)

//Magnus

#21 Andrew Dunstan
andrew@dunslane.net
In reply to: Magnus Hagander (#20)
#22 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andrew Dunstan (#21)
#23 Andres Freund
andres@anarazel.de
In reply to: Andrew Dunstan (#21)
#24 Thomas Munro
thomas.munro@gmail.com
In reply to: Andres Freund (#23)
#25 Magnus Hagander
magnus@hagander.net
In reply to: Thomas Munro (#24)
#26 Thomas Munro
thomas.munro@gmail.com
In reply to: Magnus Hagander (#25)
#27 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Thomas Munro (#26)
#28 Andrew Dunstan
andrew@dunslane.net
In reply to: Tom Lane (#27)