Three animals fail test-decoding-check on REL_10_STABLE

Started by Thomas Munro about 7 years ago, 10 messages
#1Thomas Munro
thomas.munro@enterprisedb.com

Hi,

Only gaur shows useful logs:

SELECT 'init' FROM
pg_create_logical_replication_slot('regression_slot',
'test_decoding');
! ERROR: could not access file "test_decoding": No such file or directory

Does this mean it didn't build the test_decoding module?

Of the failing animals, damselfly builds with the highest frequency,
and it reports the following 4 commits between the first failure[1]
and the preceding success (and has been failing ever since):

962da60591 Tue Jan 1 01:39:34 2019 UTC Fix generation of padding
message before encrypting Elgamal in pgcrypto
bedda9fbb7 Mon Dec 31 21:57:57 2018 UTC Process EXTRA_INSTALL
serially, during the first temp-install.
e7ebc8c285 Mon Dec 31 21:55:04 2018 UTC Send EXTRA_INSTALL errors to
install.log, not stderr.
7c97b0f55e Mon Dec 31 21:51:18 2018 UTC pg_regress: Promptly detect
failed postmaster startup.

[1]: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=damselfly&dt=2019-01-01%2010%3A39%3A41

--
Thomas Munro
http://www.enterprisedb.com

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Thomas Munro (#1)
Re: Three animals fail test-decoding-check on REL_10_STABLE

Thomas Munro <thomas.munro@enterprisedb.com> writes:

Only gaur shows useful logs:

SELECT 'init' FROM
pg_create_logical_replication_slot('regression_slot',
'test_decoding');
! ERROR: could not access file "test_decoding": No such file or directory

Does this mean it didn't build the test_decoding module?

I'm wondering if it built it but didn't install it, as a result of
some problem with

bedda9fbb7 Mon Dec 31 21:57:57 2018 UTC Process EXTRA_INSTALL
serially, during the first temp-install.

Will take a look later, but since gaur is so slow, it may be a while
before I have any answers.

regards, tom lane

#3Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#2)
Re: Three animals fail test-decoding-check on REL_10_STABLE

I wrote:

Thomas Munro <thomas.munro@enterprisedb.com> writes:

Does this mean it didn't build the test_decoding module?

I'm wondering if it built it but didn't install it, as a result of
some problem with

bedda9fbb7 Mon Dec 31 21:57:57 2018 UTC Process EXTRA_INSTALL
serially, during the first temp-install.

So it appears that in v10,

./configure ... --enable-tap-tests ...
make
make install
cd contrib/test_decoding
make check

fails due to failure to install test_decoding into the tmp_install
tree, while it works in v11. Moreover, that's not specific to
gaur: it happens on my Linux box too. I'm not very sure why only
three buildfarm animals are unhappy --- maybe in the buildfarm
context it requires a specific combination of options to show the
problem.

There's no obvious difference between bedda9fbb and 6dd690be3,
so I surmise that that patch depended somehow on some previous
work that only went into v11 not v10. Haven't found what, yet.

regards, tom lane

#4Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#3)
Re: Three animals fail test-decoding-check on REL_10_STABLE

I wrote:

There's no obvious difference between bedda9fbb and 6dd690be3,
so I surmise that that patch depended somehow on some previous
work that only went into v11 not v10. Haven't found what, yet.

Ah, looks like it was 42e61c774. I'll push a fix shortly.

regards, tom lane

#5Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#3)
Re: Three animals fail test-decoding-check on REL_10_STABLE

I wrote:

So it appears that in v10,
./configure ... --enable-tap-tests ...
make
make install
cd contrib/test_decoding
make check
fails due to failure to install test_decoding into the tmp_install
tree, while it works in v11. Moreover, that's not specific to
gaur: it happens on my Linux box too. I'm not very sure why only
three buildfarm animals are unhappy --- maybe in the buildfarm
context it requires a specific combination of options to show the
problem.

While I think I've fixed this bug, I'm still quite confused about why
only some buildfarm animals showed the problem. Comparing log files,
it seems that the ones that were working were relying on having
done a complete temp-install at a higher level, while the ones that
were failing were trying to make a temp install from scratch in
contrib/test_decoding and hence seeing the bug. For example,
longfin's test-decoding-check log starts out

Snapshot: 2019-01-11 21:12:17

/Applications/Xcode.app/Contents/Developer/usr/bin/make -C ../../src/test/regress all
/Applications/Xcode.app/Contents/Developer/usr/bin/make -C ../../../src/port all
/Applications/Xcode.app/Contents/Developer/usr/bin/make -C ../backend submake-errcodes
make[3]: Nothing to be done for `submake-errcodes'.

while gaur's starts out

Snapshot: 2019-01-11 07:30:45

rm -rf '/home/bfarm/bf-data/REL_10_STABLE/pgsql.build'/tmp_install
/bin/sh ../../config/install-sh -c -d '/home/bfarm/bf-data/REL_10_STABLE/pgsql.build'/tmp_install/log
make -C '../..' DESTDIR='/home/bfarm/bf-data/REL_10_STABLE/pgsql.build'/tmp_install install >'/home/bfarm/bf-data/REL_10_STABLE/pgsql.build'/tmp_install/log/install.log 2>&1
make -j1 checkprep >>'/home/bfarm/bf-data/REL_10_STABLE/pgsql.build'/tmp_install/log/install.log 2>&1
make -C ../../src/test/regress all
make[1]: Entering directory `/home/bfarm/bf-data/REL_10_STABLE/pgsql.build/src/test/regress'
make -C ../../../src/port all
make[2]: Entering directory `/home/bfarm/bf-data/REL_10_STABLE/pgsql.build/src/port'
make -C ../backend submake-errcodes
make[3]: Entering directory `/home/bfarm/bf-data/REL_10_STABLE/pgsql.build/src/backend'
make[3]: Nothing to be done for `submake-errcodes'.

These two animals are running the same buildfarm client version,
and I don't see any relevant difference in their configurations,
so why are they behaving differently? Andrew, any ideas?
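
A rough way to tell the two kinds of log apart, assuming the check log has been saved to a file: a from-scratch temp install leaves a "make ... DESTDIR=.../tmp_install install" line, as in gaur's excerpt, while a run that reuses a higher-level temp install does not. The function name below is hypothetical, and the pattern is inferred only from the two excerpts quoted here.

```shell
# Hypothetical helper: succeed if a check log shows a local temp install.
# The pattern is inferred from gaur's log excerpt, nothing more.
did_local_temp_install() {
    logfile=$1
    grep -q "DESTDIR=.*tmp_install install" "$logfile"
}
```

For example, `did_local_temp_install test-decoding-check.log && echo "built its own tmp_install"` (the log file name is also an assumption).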

regards, tom lane

#6Andrew Dunstan
andrew@dunslane.net
In reply to: Tom Lane (#5)
Re: Three animals fail test-decoding-check on REL_10_STABLE

On 1/11/19 6:33 PM, Tom Lane wrote:

I wrote:

So it appears that in v10,
./configure ... --enable-tap-tests ...
make
make install
cd contrib/test_decoding
make check
fails due to failure to install test_decoding into the tmp_install
tree, while it works in v11. Moreover, that's not specific to
gaur: it happens on my Linux box too. I'm not very sure why only
three buildfarm animals are unhappy --- maybe in the buildfarm
context it requires a specific combination of options to show the
problem.

While I think I've fixed this bug, I'm still quite confused about why
only some buildfarm animals showed the problem. Comparing log files,
it seems that the ones that were working were relying on having
done a complete temp-install at a higher level, while the ones that
were failing were trying to make a temp install from scratch in
contrib/test_decoding and hence seeing the bug. For example,
longfin's test-decoding-check log starts out

Snapshot: 2019-01-11 21:12:17

/Applications/Xcode.app/Contents/Developer/usr/bin/make -C ../../src/test/regress all
/Applications/Xcode.app/Contents/Developer/usr/bin/make -C ../../../src/port all
/Applications/Xcode.app/Contents/Developer/usr/bin/make -C ../backend submake-errcodes
make[3]: Nothing to be done for `submake-errcodes'.

while gaur's starts out

Snapshot: 2019-01-11 07:30:45

rm -rf '/home/bfarm/bf-data/REL_10_STABLE/pgsql.build'/tmp_install
/bin/sh ../../config/install-sh -c -d '/home/bfarm/bf-data/REL_10_STABLE/pgsql.build'/tmp_install/log
make -C '../..' DESTDIR='/home/bfarm/bf-data/REL_10_STABLE/pgsql.build'/tmp_install install >'/home/bfarm/bf-data/REL_10_STABLE/pgsql.build'/tmp_install/log/install.log 2>&1
make -j1 checkprep >>'/home/bfarm/bf-data/REL_10_STABLE/pgsql.build'/tmp_install/log/install.log 2>&1
make -C ../../src/test/regress all
make[1]: Entering directory `/home/bfarm/bf-data/REL_10_STABLE/pgsql.build/src/test/regress'
make -C ../../../src/port all
make[2]: Entering directory `/home/bfarm/bf-data/REL_10_STABLE/pgsql.build/src/port'
make -C ../backend submake-errcodes
make[3]: Entering directory `/home/bfarm/bf-data/REL_10_STABLE/pgsql.build/src/backend'
make[3]: Nothing to be done for `submake-errcodes'.

These two animals are running the same buildfarm client version,
and I don't see any relevant difference in their configurations,
so why are they behaving differently? Andrew, any ideas?

Possibly an error in 
https://github.com/PGBuildFarm/client-code/commit/3026438dcefebcc6fe2d44eb7b60812e257a0614

It looks like longfin detects that it has all it needs to proceed, and
so calls make with "NO_TEMP_INSTALL=yes", but gaur doesn't.  Not sure why
that would be - if anything I'd expect the test to fail on OSX rather
than HP-UX. Is there something weird about naming of library files on HP-UX?

cheers

andrew

#7Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andrew Dunstan (#6)
Re: Three animals fail test-decoding-check on REL_10_STABLE

Andrew Dunstan <andrew@dunslane.net> writes:

On 1/11/19 6:33 PM, Tom Lane wrote:

While I think I've fixed this bug, I'm still quite confused about why
only some buildfarm animals showed the problem.

... Is there something weird about naming of library files on HP-UX?

Doh! I looked right at this code last night, but it failed to click:

# these files should be present if we've temp_installed everything,
# and not if we haven't. They represent core, contrib and test_modules.
return ( (-d $tmp_loc)
      && (-f "$bindir/postgres" || -f "$bindir/postgres.exe")
      && (-f "$libdir/hstore.so" || -f "$libdir/hstore.dll")
      && (-f "$libdir/test_parser.so" || -f "$libdir/test_parser.dll"));

On HPUX (at least the version gaur is running), the extension for
shared libraries is ".sl" not ".so".
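
To illustrate the mismatch, here is a shell sketch (not the buildfarm client's Perl) of the same kind of existence test with ".sl" added to the candidate suffixes. The function name and the suffix list are assumptions made for this example, and hard-coding suffixes like this is at best a stopgap.

```shell
# Hypothetical helper: succeed if <libdir>/<name> exists with any of the
# loadable-module suffixes mentioned in this thread.
module_present() {
    libdir=$1
    name=$2
    for suffix in .so .sl .dll; do
        if [ -f "$libdir/$name$suffix" ]; then
            return 0
        fi
    done
    return 1
}
```

With this, `module_present "$libdir" hstore` would succeed on a machine where the module was installed as hstore.sl, which the two-suffix test misses.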

That doesn't explain the failures on damselfly and koreaceratops,
but they're both running very old buildfarm clients, which most
likely just don't have the optimization to share a temp-install.

I wonder if it's practical to scrape DLSUFFIX out of src/Makefile.port
instead of listing all the possibilities here. But I'm not sure how
you'd deal with this bit in Makefile.hpux:

ifeq ($(host_cpu), ia64)
DLSUFFIX = .so
else
DLSUFFIX = .sl
endif

Anyway, the bigger picture here is that the shared-temp-install
optimization is masking bugs in local "make check" rules. Not
sure how much we care about that, though. Any such bug is only
of interest to developers, and it only matters if someone actually
stumbles over it.

regards, tom lane

#8Andrew Dunstan
andrew.dunstan@2ndquadrant.com
In reply to: Tom Lane (#7)
Re: Three animals fail test-decoding-check on REL_10_STABLE

On 1/12/19 2:03 PM, Tom Lane wrote:

Andrew Dunstan <andrew@dunslane.net> writes:

On 1/11/19 6:33 PM, Tom Lane wrote:

While I think I've fixed this bug, I'm still quite confused about why
only some buildfarm animals showed the problem.

... Is there something weird about naming of library files on HP-UX?

Doh! I looked right at this code last night, but it failed to click:

# these files should be present if we've temp_installed everything,
# and not if we haven't. They represent core, contrib and test_modules.
return ( (-d $tmp_loc)
      && (-f "$bindir/postgres" || -f "$bindir/postgres.exe")
      && (-f "$libdir/hstore.so" || -f "$libdir/hstore.dll")
      && (-f "$libdir/test_parser.so" || -f "$libdir/test_parser.dll"));

On HPUX (at least the version gaur is running), the extension for
shared libraries is ".sl" not ".so".

That doesn't explain the failures on damselfly and koreaceratops,
but they're both running very old buildfarm clients, which most
likely just don't have the optimization to share a temp-install.

Yes, they are on an older version that doesn't use the NO_TEMP_INSTALL
flag at all.

I wonder if it's practical to scrape DLSUFFIX out of src/Makefile.port
instead of listing all the possibilities here. But I'm not sure how
you'd deal with this bit in Makefile.hpux:

ifeq ($(host_cpu), ia64)
DLSUFFIX = .so
else
DLSUFFIX = .sl
endif

I'd rather get make to tell us directly, something like:

.PHONY: show_dl_suffix
show_dl_suffix:
    @echo $(DLSUFFIX)

I can arrange something like that in the buildfarm code if we think the
use case is too narrow.
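
From the caller's side, that might look like the sketch below. It assumes a show_dl_suffix target like the one above is reachable from the given build directory (at this point it is only a proposal); the function name and the MAKE override are likewise assumptions.

```shell
# Hypothetical helper: ask make for DLSUFFIX rather than guessing it.
# Requires the proposed show_dl_suffix target; -s keeps make from
# printing "Entering directory" lines into the captured output.
query_dlsuffix() {
    builddir=$1
    "${MAKE:-make}" -s -C "$builddir" show_dl_suffix
}
```

A caller could then run `suffix=$(query_dlsuffix "$builddir")` and test for "$libdir/hstore$suffix" without listing every platform's suffix.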

Anyway, the bigger picture here is that the shared-temp-install
optimization is masking bugs in local "make check" rules. Not
sure how much we care about that, though. Any such bug is only
of interest to developers, and it only matters if someone actually
stumbles over it.

right.

cheers

andrew

--
Andrew Dunstan https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#9Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andrew Dunstan (#8)
Re: Three animals fail test-decoding-check on REL_10_STABLE

Andrew Dunstan <andrew.dunstan@2ndquadrant.com> writes:

On 1/12/19 2:03 PM, Tom Lane wrote:

I wonder if it's practical to scrape DLSUFFIX out of src/Makefile.port
instead of listing all the possibilities here.

I'd rather get make to tell us directly, something like:
.PHONY: show_dl_suffix
show_dl_suffix:
    @echo $(DLSUFFIX)

No objection here, but of course you'd have to back-patch that into
all active branches.

(The Darwin case is slightly exciting, but it looks like you'd get
the right answer as long as Makefile.shlib doesn't get involved.)

regards, tom lane

#10Andrew Dunstan
andrew.dunstan@2ndquadrant.com
In reply to: Tom Lane (#9)
Re: Three animals fail test-decoding-check on REL_10_STABLE

On 1/13/19 9:24 AM, Tom Lane wrote:

Andrew Dunstan <andrew.dunstan@2ndquadrant.com> writes:

On 1/12/19 2:03 PM, Tom Lane wrote:

I wonder if it's practical to scrape DLSUFFIX out of src/Makefile.port
instead of listing all the possibilities here.

I'd rather get make to tell us directly, something like:
.PHONY: show_dl_suffix
show_dl_suffix:
    @echo $(DLSUFFIX)

No objection here, but of course you'd have to back-patch that into
all active branches.

(The Darwin case is slightly exciting, but it looks like you'd get
the right answer as long as Makefile.shlib doesn't get involved.)

OK, I'll make that happen.

cheers

andrew

--
Andrew Dunstan https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services