The xversion-upgrade test fails to stop server

Started by Alexander Lakhinover 1 year ago9 messages
#1Alexander Lakhin
exclusion@gmail.com

Hello Andrew,

While reviewing recent buildfarm failures, I came across this one:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=crake&dt=2024-05-23%2004%3A11%3A03

upgrade.crake/REL_16_STABLE/REL9_5_STABLE-ctl4.log
waiting for server to shut
down...........................................................................................................................
failed
pg_ctl: server does not shut down

Looking at:
https://github.com/PGBuildFarm/client-code/blob/05014d50e/PGBuild/Modules/TestUpgradeXversion.pm#L641

I see that ctl4.log is created after updating extensions and
REL9_5_STABLE-update_extensions.log contains:
You are now connected to database "contrib_regression_redis_fdw" as user "buildfarm".
ALTER EXTENSION "hstore" UPDATE;
ALTER EXTENSION
You are now connected to database "contrib_regression_btree_gin" as user "buildfarm".
ALTER EXTENSION "btree_gin" UPDATE;
ALTER EXTENSION
...
but I see no corresponding server log file containing these commands in the
failure log.

When running the same test locally, I find these in inst/upgrade_log.

Maybe uploading this log file too would help to understand what is the
cause of the failure...

Best regards,
Alexander

#2Andrew Dunstan
andrew@dunslane.net
In reply to: Alexander Lakhin (#1)
Re: The xversion-upgrade test fails to stop server

Sent from my iPhone

On May 30, 2024, at 8:00 AM, Alexander Lakhin <exclusion@gmail.com> wrote:

Hello Andrew,

While reviewing recent buildfarm failures, I came across this one:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=crake&amp;dt=2024-05-23%2004%3A11%3A03

upgrade.crake/REL_16_STABLE/REL9_5_STABLE-ctl4.log
waiting for server to shut down........................................................................................................................... failed
pg_ctl: server does not shut down

Looking at:
https://github.com/PGBuildFarm/client-code/blob/05014d50e/PGBuild/Modules/TestUpgradeXversion.pm#L641

I see that ctl4.log is created after updating extensions and
REL9_5_STABLE-update_extensions.log contains:
You are now connected to database "contrib_regression_redis_fdw" as user "buildfarm".
ALTER EXTENSION "hstore" UPDATE;
ALTER EXTENSION
You are now connected to database "contrib_regression_btree_gin" as user "buildfarm".
ALTER EXTENSION "btree_gin" UPDATE;
ALTER EXTENSION
...
but I see no corresponding server log file containing these commands in the
failure log.

When running the same test locally, I find these in inst/upgrade_log.

Maybe uploading this log file too would help to understand what is the
cause of the failure...

Will investigate after I return from pgconf

Cheers

Andrew

#3Andrew Dunstan
andrew@dunslane.net
In reply to: Alexander Lakhin (#1)
Re: The xversion-upgrade test fails to stop server

On 2024-05-30 Th 11:00, Alexander Lakhin wrote:

Hello Andrew,

While reviewing recent buildfarm failures, I came across this one:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=crake&amp;dt=2024-05-23%2004%3A11%3A03

upgrade.crake/REL_16_STABLE/REL9_5_STABLE-ctl4.log
waiting for server to shut
down...........................................................................................................................
failed
pg_ctl: server does not shut down

Looking at:
https://github.com/PGBuildFarm/client-code/blob/05014d50e/PGBuild/Modules/TestUpgradeXversion.pm#L641

I see that ctl4.log is created after updating extensions and
REL9_5_STABLE-update_extensions.log contains:
You are now connected to database "contrib_regression_redis_fdw" as
user "buildfarm".
ALTER EXTENSION "hstore" UPDATE;
ALTER EXTENSION
You are now connected to database "contrib_regression_btree_gin" as
user "buildfarm".
ALTER EXTENSION "btree_gin" UPDATE;
ALTER EXTENSION
...
but I see no corresponding server log file containing these commands
in the
failure log.

When running the same test locally, I find these in inst/upgrade_log.

Maybe uploading this log file too would help to understand what is the
cause of the failure...

Yeah, I'll fix that.

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com

#4Alexander Lakhin
exclusion@gmail.com
In reply to: Andrew Dunstan (#3)
Re: The xversion-upgrade test fails to stop server

02.06.2024 21:39, Andrew Dunstan wrote:

Maybe uploading this log file too would help to understand what is the
cause of the failure...

Yeah, I'll fix that.

Thank you, Andrew!

Could you also take a look at:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=sungazer&amp;dt=2024-04-21%2014%3A09%3A53

This log contains:
test sto_using_select             ... FAILED    27556 ms

but I can't see ../snapshot_too_old/output_iso/regression.diff and
.../snapshot_too_old/output_iso/log/postmaster.log in the log.

Best regards,
Alexander

#5Andrew Dunstan
andrew@dunslane.net
In reply to: Alexander Lakhin (#4)
Re: The xversion-upgrade test fails to stop server

On 2024-06-02 Su 16:00, Alexander Lakhin wrote:

02.06.2024 21:39, Andrew Dunstan wrote:

Maybe uploading this log file too would help to understand what is the
cause of the failure...

Yeah, I'll fix that.

Thank you, Andrew!

Could you also take a look at:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=sungazer&amp;dt=2024-04-21%2014%3A09%3A53

This log contains:
test sto_using_select             ... FAILED    27556 ms

but I can't see ../snapshot_too_old/output_iso/regression.diff and
.../snapshot_too_old/output_iso/log/postmaster.log in the log.

will do.

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com

#6Alexander Lakhin
exclusion@gmail.com
In reply to: Alexander Lakhin (#1)
Re: The xversion-upgrade test fails to stop server

Hello,

30.05.2024 18:00, Alexander Lakhin wrote:

While reviewing recent buildfarm failures, I came across this one:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=crake&amp;dt=2024-05-23%2004%3A11%3A03

upgrade.crake/REL_16_STABLE/REL9_5_STABLE-ctl4.log
waiting for server to shut
down...........................................................................................................................
failed
pg_ctl: server does not shut down

I've grepped through logs of the last 167
xversion-upgrade-REL9_5_STABLE-REL_16_STABLE/*ctl4.log on crake and got
the following results:
waiting for server to shut down........ done
waiting for server to shut down............................... done
waiting for server to shut down.............. done
waiting for server to shut down........ done
waiting for server to shut down....................... done
waiting for server to shut down..... done
waiting for server to shut down...... done
waiting for server to shut down............................... done
waiting for server to shut down.... done
waiting for server to shut down....................................... done
waiting for server to shut down....................... done
waiting for server to shut down............................................ done
waiting for server to shut down..... done
waiting for server to shut down....................... done
waiting for server to shut down.... done
waiting for server to shut down..... done
waiting for server to shut down................... done
waiting for server to shut down...................................... done
waiting for server to shut down........ done
waiting for server to shut down................ done
waiting for server to shut down............ done
waiting for server to shut down........... done
waiting for server to shut down........ done
waiting for server to shut down.... done
waiting for server to shut down.................................................. done
waiting for server to shut down........................................................... done
waiting for server to shut down..... done
waiting for server to shut down.............. done
waiting for server to shut down......... done
waiting for server to shut down........................................................... done
waiting for server to shut down................ done
waiting for server to shut down.................................... done
waiting for server to shut down........ done
waiting for server to shut down............ done
waiting for server to shut down................................................................... done
waiting for server to shut down..... done
waiting for server to shut down.............................................. done
waiting for server to shut down............. done
waiting for server to shut down............. done
waiting for server to shut down..... done
waiting for server to shut down.............................................. done
waiting for server to shut down...... done
waiting for server to shut down....... done
waiting for server to shut down....... done
waiting for server to shut down.......................... done
waiting for server to shut down............ done
waiting for server to shut down.................... done
waiting for server to shut down..... done
waiting for server to shut down........ done
waiting for server to shut down....... done
waiting for server to shut down....................... done
waiting for server to shut down........... done
waiting for server to shut down.............. done
waiting for server to shut down.................... done
waiting for server to shut down............. done
waiting for server to shut down...................................... done
waiting for server to shut down............................................... done
waiting for server to shut down........................................ done
waiting for server to shut down.......... done
waiting for server to shut down......................................... done
waiting for server to shut down......................................... done
waiting for server to shut down........ done
waiting for server to shut down..... done
waiting for server to shut down................................................................... done
waiting for server to shut down............................. done
waiting for server to shut down.......... done
waiting for server to shut down..... done
waiting for server to shut down......................... done
waiting for server to shut down...... done
waiting for server to shut down..... done
waiting for server to shut down......... done
waiting for server to shut down.......................................................... done
waiting for server to shut down............. done
waiting for server to shut down..... done
waiting for server to shut down........ done
waiting for server to shut down........................................... done
waiting for server to shut down.... done
waiting for server to shut down..... done
waiting for server to shut down...... done
waiting for server to shut down............. done
waiting for server to shut down.......................................................... done
waiting for server to shut down........................... done
waiting for server to shut down..... done
waiting for server to shut down....... done
waiting for server to shut down...... done
waiting for server to shut down......................................... done
waiting for server to shut down.......................... done
waiting for server to shut down.......................... done
waiting for server to shut down................................... done
waiting for server to shut down..... done
waiting for server to shut down.................. done
waiting for server to shut down....................... done
waiting for server to shut down................. done
waiting for server to shut down........................................... done
waiting for server to shut down........ done
waiting for server to shut down...................................................................... done
waiting for server to shut down............. done
waiting for server to shut down.................................................................... done
waiting for server to shut down................... done
waiting for server to shut down.............. done
waiting for server to shut down..................... done
waiting for server to shut
down.................................................................................................... done
waiting for server to shut down..... done
waiting for server to shut down....................... done
waiting for server to shut down................................ done
waiting for server to shut down....... done
waiting for server to shut down....... done
waiting for server to shut down.............. done
waiting for server to shut down................................................ done
waiting for server to shut down...... done
waiting for server to shut down..................... done
waiting for server to shut down...... done
waiting for server to shut down........ done
waiting for server to shut down..... done
waiting for server to shut down.... done
waiting for server to shut down......... done
waiting for server to shut down...... done
waiting for server to shut down........... done
waiting for server to shut down...................... done
waiting for server to shut down............ done
waiting for server to shut down................. done
waiting for server to shut down.... done
waiting for server to shut down................................... done
waiting for server to shut down.............................. done
waiting for server to shut down....................................................................................... done
waiting for server to shut
down................................................................................................... done
waiting for server to shut down...... done
waiting for server to shut down.... done
waiting for server to shut down........... done
waiting for server to shut down..... done
waiting for server to shut down......... done
waiting for server to shut down.................... done
waiting for server to shut down..... done
waiting for server to shut down....................... done
waiting for server to shut down................. done
waiting for server to shut down........................................ done
waiting for server to shut down...... done
waiting for server to shut down.... done
waiting for server to shut down.................................... done
waiting for server to shut down.............................. done
waiting for server to shut down.................................. done
waiting for server to shut down.............................................................. done
waiting for server to shut down.................................................. done
waiting for server to shut down............................................................. done
waiting for server to shut down............ done
waiting for server to shut down............ done
waiting for server to shut down...................................... done
waiting for server to shut down.......................................... done
waiting for server to shut down................................. done
waiting for server to shut down................... done
waiting for server to shut
down...........................................................................................................................
failed
waiting for server to shut down................................................. done
waiting for server to shut down............ done
waiting for server to shut down................................................................... done
waiting for server to shut down......... done
waiting for server to shut down......... done
waiting for server to shut down.................... done
waiting for server to shut down........ done
waiting for server to shut down............................... done
waiting for server to shut down......... done
waiting for server to shut down......................................................... done
waiting for server to shut down................................................... done
waiting for server to shut down............ done
waiting for server to shut down...... done
waiting for server to shut down........................................... done
waiting for server to shut
down.......................................................................................... done
waiting for server to shut down............................................................................... done

Thus, pg_ctl stopped waiting after 120 seconds timeout, but we can see
"allowed" duration around 100 seconds.

A similar failure have occurred today:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=opaleye&amp;amp;dt=2024-06-08%2001%3A41%3A41
waiting for server to shut
down...........................................................................................................................
failed
pg_ctl: server does not shut down

and the last StopDb-C:4 runs on opaleye show the following timings:
waiting for server to shut down............. done
waiting for server to shut down......................... done
waiting for server to shut down......................... done
waiting for server to shut down.......................... done
waiting for server to shut down.............. done
waiting for server to shut down...................... done
waiting for server to shut down................................. done
waiting for server to shut down........................... done
waiting for server to shut down............ done
waiting for server to shut down................ done
waiting for server to shut down.................. done
waiting for server to shut down............................... done
waiting for server to shut down............................ done
waiting for server to shut down.................. done
waiting for server to shut down..................... done
waiting for server to shut down........................ done
waiting for server to shut down.......................... done
waiting for server to shut down......................................... done
waiting for server to shut down.................... done
waiting for server to shut down..................................... done
waiting for server to shut down............................................. done
waiting for server to shut down........................... done
waiting for server to shut down................... done
waiting for server to shut down............. done
waiting for server to shut down........................ done
waiting for server to shut down......................... done
waiting for server to shut down................... done
waiting for server to shut down................. done
waiting for server to shut down..................... done
waiting for server to shut down......................... done
waiting for server to shut down............................... done
waiting for server to shut down.................................................. done
waiting for server to shut down............... done
waiting for server to shut down................. done
waiting for server to shut down..................... done
waiting for server to shut down................ done
waiting for server to shut down.............. done
waiting for server to shut down................ done
waiting for server to shut down.......................... done
waiting for server to shut down.................. done
waiting for server to shut down.................... done
waiting for server to shut down................ done
waiting for server to shut down....................... done
waiting for server to shut down................ done
waiting for server to shut down...................... done
waiting for server to shut down............... done
waiting for server to shut down.............. done
waiting for server to shut down........................... done
waiting for server to shut down............. done
waiting for server to shut down..................... done
waiting for server to shut down................. done
waiting for server to shut down........................... done
waiting for server to shut down..................... done
waiting for server to shut down................ done
waiting for server to shut down........................................... done
waiting for server to shut down....................... done
waiting for server to shut down.............................................................................. done
waiting for server to shut
down...........................................................................................................................
failed

So maybe it would make sense to increase default PGCTLTIMEOUT for
buildfarm clients, say, to 180 seconds?

Best regards,
Alexander

#7Andrew Dunstan
andrew@dunslane.net
In reply to: Alexander Lakhin (#6)
Re: The xversion-upgrade test fails to stop server

On 2024-06-08 Sa 10:00, Alexander Lakhin wrote:

Hello,

30.05.2024 18:00, Alexander Lakhin wrote:

While reviewing recent buildfarm failures, I came across this one:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=crake&amp;dt=2024-05-23%2004%3A11%3A03

upgrade.crake/REL_16_STABLE/REL9_5_STABLE-ctl4.log
waiting for server to shut
down...........................................................................................................................
failed
pg_ctl: server does not shut down

I've grepped through logs of the last 167
xversion-upgrade-REL9_5_STABLE-REL_16_STABLE/*ctl4.log on crake and got
the following results:
waiting for server to shut down........ done

[...]

So maybe it would make sense to increase default PGCTLTIMEOUT for
buildfarm clients, say, to 180 seconds?

Sure. For now I have added it to the config on crake, but we can make it
a default.

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com

#8Alexander Lakhin
exclusion@gmail.com
In reply to: Andrew Dunstan (#5)
Re: The xversion-upgrade test fails to stop server

Hello Andrew,

04.06.2024 14:56, Andrew Dunstan wrote:

but I can't see ../snapshot_too_old/output_iso/regression.diff and
.../snapshot_too_old/output_iso/log/postmaster.log in the log.

will do.

I've discovered a couple of other failures where the interesting log files
are not saved.

[1]: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=gull&amp;dt=2024-08-06%2014%3A56%3A39
src/test/modules.

I've emulated the failure (not trying to guess the real cause) with:
--- a/src/test/modules/test_custom_rmgrs/Makefile
+++ b/src/test/modules/test_custom_rmgrs/Makefile
@@ -14,0 +15,4 @@ TAP_TESTS = 1
+remove:
+    make uninstall
+install: remove

and saw mostly the same with the buildfarm client REL_17.

I've tried the following addition to run_build.pl:
@@ -2194,6 +2194,8 @@ sub make_testmodules_install_check
         my @logs = glob("$pgsql/src/test/modules/*/regression.diffs");
         push(@logs, "inst/logfile");
         $log->add_log($_) foreach (@logs);
+       @logs = glob("$pgsql/src/test/modules/*/tmp_check/log/*");
+       $log->add_log($_) foreach (@logs);
and it works as expected for me.

The output of another failure ([2]https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=culicidae&amp;dt=2024-08-17%2001%3A12%3A50) contains only:
\342\226\266 1/1 + partition_prune                          3736 ms FAIL
but no regression.diffs
(Fortunately, in this particular case I found "FATAL:  incorrect checksum
in control file" in inst/logfile, so that can explain the failure.)

Probably, run_build.pl needs something like:
@@ -1848,7 +1848,7 @@ sub make_install_check
                 @checklog = run_log("cd $pgsql/src/test/regress && $make $chktarget");
         }
         my $status = $? >> 8;
-       my @logfiles = ("$pgsql/src/test/regress/regression.diffs", "inst/logfile");
+       my @logfiles = ("$pgsql/src/test/regress/regression.diffs", 
"$pgsql/testrun/regress-running/regress/regression.diffs", "inst/logfile");

A similar failure have occurred today:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=opaleye&amp;amp;dt=2024-06-08%2001%3A41%3A41

So maybe it would make sense to increase default PGCTLTIMEOUT for
buildfarm clients, say, to 180 seconds?

Sure. For now I have added it to the config on crake, but we can make it a default.

By the way, opaleye failed once again (with 120 seconds timeout):
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=opaleye&amp;dt=2024-08-13%2002%3A04%3A07

[1]: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=gull&amp;dt=2024-08-06%2014%3A56%3A39
[2]: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=culicidae&amp;dt=2024-08-17%2001%3A12%3A50

Best regards,
Alexander

#9Andrew Dunstan
andrew@dunslane.net
In reply to: Alexander Lakhin (#8)
Re: The xversion-upgrade test fails to stop server

On 2024-08-18 Su 12:00 PM, Alexander Lakhin wrote:

Hello Andrew,

04.06.2024 14:56, Andrew Dunstan wrote:

but I can't see ../snapshot_too_old/output_iso/regression.diff and
.../snapshot_too_old/output_iso/log/postmaster.log in the log.

will do.

I've discovered a couple of other failures where the interesting log
files
are not saved.

[1] has only inst/logfile saved and that's not enough for TAP tests in
src/test/modules.

I've emulated the failure (not trying to guess the real cause) with:
--- a/src/test/modules/test_custom_rmgrs/Makefile
+++ b/src/test/modules/test_custom_rmgrs/Makefile
@@ -14,0 +15,4 @@ TAP_TESTS = 1
+remove:
+    make uninstall
+install: remove

and saw mostly the same with the buildfarm client REL_17.

I've tried the following addition to run_build.pl:
@@ -2194,6 +2194,8 @@ sub make_testmodules_install_check
        my @logs = glob("$pgsql/src/test/modules/*/regression.diffs");
        push(@logs, "inst/logfile");
        $log->add_log($_) foreach (@logs);
+       @logs = glob("$pgsql/src/test/modules/*/tmp_check/log/*");
+       $log->add_log($_) foreach (@logs);
and it works as expected for me.

The output of another failure ([2]) contains only:
\342\226\266 1/1 + partition_prune                          3736 ms FAIL
but no regression.diffs
(Fortunately, in this particular case I found "FATAL:  incorrect checksum
in control file" in inst/logfile, so that can explain the failure.)

Probably, run_build.pl needs something like:
@@ -1848,7 +1848,7 @@ sub make_install_check
                @checklog = run_log("cd $pgsql/src/test/regress && 
$make $chktarget");
        }
        my $status = $? >> 8;
-       my @logfiles = ("$pgsql/src/test/regress/regression.diffs", 
"inst/logfile");
+       my @logfiles = ("$pgsql/src/test/regress/regression.diffs", 
"$pgsql/testrun/regress-running/regress/regression.diffs", 
"inst/logfile");

Thanks, I have added these tweaks.

A similar failure have occurred today:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=opaleye&amp;amp;dt=2024-06-08%2001%3A41%3A41

So maybe it would make sense to increase default PGCTLTIMEOUT for
buildfarm clients, say, to 180 seconds?

Sure. For now I have added it to the config on crake, but we can make
it a default.

By the way, opaleye failed once again (with 120 seconds timeout):
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=opaleye&amp;dt=2024-08-13%2002%3A04%3A07

[1]
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=gull&amp;dt=2024-08-06%2014%3A56%3A39
[2]
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=culicidae&amp;dt=2024-08-17%2001%3A12%3A50

Yeah. In the next release the default will increase to 180.

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com