The buildfarm is in a pretty bad way, folks

Started by Tom Lane, almost 8 years ago, 9 messages
#1Tom Lane
tgl@sss.pgh.pa.us

It sure looks like there's been a frantic push to commit stuff that
maybe wasn't quite fully baked. I'm not terribly on board with that,
because it's likely to be hard to disentangle who broke what.
But in particular, it's clear that partition_prune and
isolation/checksum_cancel are showing big problems.

regards, tom lane

#2Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#1)
Re: The buildfarm is in a pretty bad way, folks

Hi,

On 2018-04-06 16:59:11 -0400, Tom Lane wrote:

It sure looks like there's been a frantic push to commit stuff that
maybe wasn't quite fully baked. I'm not terribly on board with that,
because it's likely to be hard to disentangle who broke what.
But in particular, it's clear that partition_prune and
isolation/checksum_cancel are showing big problems.

While I'm obviously also unhappy about the frantic push to push semi
baked stuff, I'm not sure the two issues you point to above are that
good examples of carelessness. At least the latter seems mostly a pretty
normal portability thing around orderedness?

Greetings,

Andres Freund

#3Magnus Hagander
magnus@hagander.net
In reply to: Tom Lane (#1)
Re: The buildfarm is in a pretty bad way, folks

On Fri, Apr 6, 2018 at 10:59 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

It sure looks like there's been a frantic push to commit stuff that
maybe wasn't quite fully baked. I'm not terribly on board with that,
because it's likely to be hard to disentangle who broke what.
But in particular, it's clear that partition_prune and
isolation/checksum_cancel are showing big problems.

Daniel is working on investigating the isolationtester thing. See a mail on
one of the threads where initial indications were the "atomics with no real
atomics" (or whatever you'd call it) were to blame. We could redo that
thing without atomics to get rid of that (and possibly should), but it
would be good to figure out if it's actually broken first, so that part can
get fixed if it is.

--
Magnus Hagander
Me: https://www.hagander.net/
Work: https://www.redpill-linpro.com/

#4Andres Freund
andres@anarazel.de
In reply to: Magnus Hagander (#3)
Re: The buildfarm is in a pretty bad way, folks

On 2018-04-06 23:12:19 +0200, Magnus Hagander wrote:

Daniel is working on investigating the isolationtester thing. See a mail on
one of the threads where initial indications were the "atomics with no real
atomics" (or whatever you'd call it) were to blame. We could redo that
thing without atomics to get rid of that (and possibly should), but it
would be good to figure out if it's actually broken first, so that part can
get fixed if it is.

Is that an explanation for
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=gharial&dt=2018-04-06%2019%3A18%3A11
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=lousyjack&dt=2018-04-06%2016%3A03%3A01
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=sungazer&dt=2018-04-06%2015%3A46%3A16
? Those all don't seem to fall under that? Having proper atomics?

Greetings,

Andres Freund

#5Magnus Hagander
magnus@hagander.net
In reply to: Andres Freund (#4)
Re: The buildfarm is in a pretty bad way, folks

On Fri, Apr 6, 2018 at 11:19 PM, Andres Freund <andres@anarazel.de> wrote:

On 2018-04-06 23:12:19 +0200, Magnus Hagander wrote:

Daniel is working on investigating the isolationtester thing. See a mail on
one of the threads where initial indications were the "atomics with no real
atomics" (or whatever you'd call it) were to blame. We could redo that
thing without atomics to get rid of that (and possibly should), but it
would be good to figure out if it's actually broken first, so that part can
get fixed if it is.

Is that an explanation for
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=gharial&dt=2018-04-06%2019%3A18%3A11
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=lousyjack&dt=2018-04-06%2016%3A03%3A01
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=sungazer&dt=2018-04-06%2015%3A46%3A16
? Those all don't seem to fall under that? Having proper atomics?

No, sorry, bad wording. Those were the initial indications, but not the
*only* ones. There is possibly/probably more than one thing going on.

--
Magnus Hagander
Me: https://www.hagander.net/
Work: https://www.redpill-linpro.com/

#6Alvaro Herrera
alvherre@alvh.no-ip.org
In reply to: Tom Lane (#1)
Re: The buildfarm is in a pretty bad way, folks

Tom Lane wrote:

It sure looks like there's been a frantic push to commit stuff that
maybe wasn't quite fully baked. I'm not terribly on board with that,
because it's likely to be hard to disentangle who broke what.
But in particular, it's clear that partition_prune and
isolation/checksum_cancel are showing big problems.

The partition_prune failure is clearly a minor portability issue which
I'll investigate after I pick up the kids. From where I sit, if we let
that patch bake any more, it will burn in the oven.

Partition prune also broke the sepgsql test -- I think because one
partition is no longer scanned. Seems a reasonable thing to me, just
need to update the expected file. But I'll look closer.

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#7Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#2)
Re: The buildfarm is in a pretty bad way, folks

Andres Freund <andres@anarazel.de> writes:

On 2018-04-06 16:59:11 -0400, Tom Lane wrote:

But in particular, it's clear that partition_prune and
isolation/checksum_cancel are showing big problems.

While I'm obviously also unhappy about the frantic push to push semi
baked stuff, I'm not sure the two issues you point to above are that
good examples of carelessness. At least the latter seems mostly a pretty
normal portability thing around orderedness?

I'm just venting, perhaps, but if there's a good reason for that
to have been left broken for ~24 hours, I don't know what it is.
It's getting in the way of testing other recent commits.

(I'm also not real happy about the amount of time the checksum-xxx
tests consume.)

regards, tom lane

#8Magnus Hagander
magnus@hagander.net
In reply to: Tom Lane (#7)
Re: The buildfarm is in a pretty bad way, folks

On Fri, Apr 6, 2018 at 11:44 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Andres Freund <andres@anarazel.de> writes:

On 2018-04-06 16:59:11 -0400, Tom Lane wrote:

But in particular, it's clear that partition_prune and
isolation/checksum_cancel are showing big problems.

While I'm obviously also unhappy about the frantic push to push semi
baked stuff, I'm not sure the two issues you point to above are that
good examples of carelessness. At least the latter seems mostly a pretty
normal portability thing around orderedness?

I'm just venting, perhaps, but if there's a good reason for that
to have been left broken for ~24 hours, I don't know what it is.
It's getting in the way of testing other recent commits.

(I'm also not real happy about the amount of time the checksum-xxx
tests consume.)

The isolationtester ones, or the regular ones? The regular ones finish in
<< 30 seconds here, so I'm wondering whether that actually counts as too
time-consuming for this type of test.

--
Magnus Hagander
Me: https://www.hagander.net/
Work: https://www.redpill-linpro.com/

#9Tom Lane
tgl@sss.pgh.pa.us
In reply to: Magnus Hagander (#8)
Re: The buildfarm is in a pretty bad way, folks

Magnus Hagander <magnus@hagander.net> writes:

On Fri, Apr 6, 2018 at 11:44 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

(I'm also not real happy about the amount of time the checksum-xxx
tests consume.)

The isolationtester ones, or the regular ones? The regular ones finish in
<< 30 seconds here, so I'm wondering whether that actually counts as too
time-consuming for this type of test.

The isolationtester ones. Looking at longfin, which while not a speed
demon isn't real slow either, the isolation-check step was taking 2:05
two days ago and now it's at 2:48. That's a pretty big incremental
jump for one feature.

regards, tom lane