Bogus cleanup code in PostgresNode.pm
I noticed that even when they are successful, buildfarm members bowerbird
and jacana tend to spew a lot of messages like this in their bin-check
steps:
Can't remove directory /home/pgrunner/bf/root/HEAD/pgsql.build/src/bin/scripts/tmp_check/data_main_DdUf/pgdata/global: Directory not empty at /usr/lib/perl5/5.8/File/Temp.pm line 898
Can't remove directory /home/pgrunner/bf/root/HEAD/pgsql.build/src/bin/scripts/tmp_check/data_main_DdUf/pgdata/pg_xlog: Directory not empty at /usr/lib/perl5/5.8/File/Temp.pm line 898
Can't remove directory /home/pgrunner/bf/root/HEAD/pgsql.build/src/bin/scripts/tmp_check/data_main_DdUf/pgdata: Permission denied at /usr/lib/perl5/5.8/File/Temp.pm line 898
Can't remove directory /home/pgrunner/bf/root/HEAD/pgsql.build/src/bin/scripts/tmp_check/data_main_DdUf: Directory not empty at /usr/lib/perl5/5.8/File/Temp.pm line 898
### Signalling QUIT to 9156 for node "main"
# Running: pg_ctl kill QUIT 9156
What is happening here is that the test script is not bothering to do an
explicit $node->stop operation, and if it doesn't, the automatic cleanup
steps happen in the wrong order: the File::Temp destructor for the temp
data directory runs before PostgresNode.pm's DESTROY function, which is
what's issuing the "pg_ctl kill" command. On Unix that's just messy,
but on Windows it fails because you can't delete a process's working
directory. I am not sure whether this is guaranteed wrong or just
sometimes wrong; the Perl docs I can find say that destructors are run in
unspecified order once interpreter shutdown begins. But by adding some
debug printout I was able to verify on my own machine that the data
directory was already gone when DESTROY runs.
I believe we can fix this by forcing postmaster shutdown in an END
routine instead of a DESTROY routine, and hence propose the attached
patch, which does things in the right order for me. I'm a pretty
poor Perl programmer, so I'd appreciate somebody vetting this.
regards, tom lane
Attachments:
properly-timed-PostgresNode-shutdown.patchtext/x-diff; charset=us-ascii; name=properly-timed-PostgresNode-shutdown.patchDownload+14-13
On Mon, Apr 25, 2016 at 11:51 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
I noticed that even when they are successful, buildfarm members bowerbird
and jacana tend to spew a lot of messages like this in their bin-check
steps:Can't remove directory /home/pgrunner/bf/root/HEAD/pgsql.build/src/bin/scripts/tmp_check/data_main_DdUf/pgdata/global: Directory not empty at /usr/lib/perl5/5.8/File/Temp.pm line 898
Can't remove directory /home/pgrunner/bf/root/HEAD/pgsql.build/src/bin/scripts/tmp_check/data_main_DdUf/pgdata/pg_xlog: Directory not empty at /usr/lib/perl5/5.8/File/Temp.pm line 898
Can't remove directory /home/pgrunner/bf/root/HEAD/pgsql.build/src/bin/scripts/tmp_check/data_main_DdUf/pgdata: Permission denied at /usr/lib/perl5/5.8/File/Temp.pm line 898
Can't remove directory /home/pgrunner/bf/root/HEAD/pgsql.build/src/bin/scripts/tmp_check/data_main_DdUf: Directory not empty at /usr/lib/perl5/5.8/File/Temp.pm line 898
### Signalling QUIT to 9156 for node "main"
# Running: pg_ctl kill QUIT 9156What is happening here is that the test script is not bothering to do an
explicit $node->stop operation, and if it doesn't, the automatic cleanup
steps happen in the wrong order: the File::Temp destructor for the temp
data directory runs before PostgresNode.pm's DESTROY function, which is
what's issuing the "pg_ctl kill" command. On Unix that's just messy,
but on Windows it fails because you can't delete a process's working
directory. I am not sure whether this is guaranteed wrong or just
sometimes wrong; the Perl docs I can find say that destructors are run in
unspecified order once interpreter shutdown begins. But by adding some
debug printout I was able to verify on my own machine that the data
directory was already gone when DESTROY runs.
The docs say regarding File::Temp that he object is removed once the
object goes out of scope in the parent:
http://search.cpan.org/~dagolden/File-Temp-0.2304/lib/File/Temp.pm
So basically it means that when we enter in PostgresNode's DESTROY the
temporary folder just "went out of scope" and has been removed?
DESTROY is run once per object, END is a global destructor, and END is
called really at the end of the execution. And actually one reason why
a DESTROY block instead of END is given by Alvaro here:
/messages/by-id/20151201231121.GI2763@alvherre.pgsql
"
- I changed start/stop/restart so that they keep track of the postmaster
PID; also added a DESTROY sub to PostgresNode that sends SIGQUIT.
This means that when the test finishes, the server gets an immediate
stop signal. We were getting a lot of errors in the server log about
failing to write to the stats file otherwise, until the node noticed
that the datadir was gone.
"
I believe we can fix this by forcing postmaster shutdown in an END
routine instead of a DESTROY routine, and hence propose the attached
patch, which does things in the right order for me. I'm a pretty
poor Perl programmer, so I'd appreciate somebody vetting this.
Another, perhaps more solid approach, would be put the DESTROY method
in charge of removing PGDATA and extend TestLib::tempdir with an
argument to be able to switch to CLEANUP => 0 at will. Then we use
this argument for PGDATA after sending SIGQUIT.
--
Michael
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Michael Paquier <michael.paquier@gmail.com> writes:
On Mon, Apr 25, 2016 at 11:51 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
I believe we can fix this by forcing postmaster shutdown in an END
routine instead of a DESTROY routine, and hence propose the attached
patch, which does things in the right order for me. I'm a pretty
poor Perl programmer, so I'd appreciate somebody vetting this.
Another, perhaps more solid approach, would be put the DESTROY method
in charge of removing PGDATA and extend TestLib::tempdir with an
argument to be able to switch to CLEANUP => 0 at will. Then we use
this argument for PGDATA after sending SIGQUIT.
Bearing in mind that I'm not a Perl expert --- this bothers me because
of what I read about the order of global destructor calls being
unspecified. See http://perldoc.perl.org/perlobj.html#Destructors
specifically:
When the last reference to an object goes away, the object is
destroyed. If you only have one reference to an object stored in a
lexical scalar, the object is destroyed when that scalar goes out of
scope. If you store the object in a package global, that object may not
go out of scope until the program exits.
(the last sentence being the one that applies here) and
The order in which objects are destroyed during the global destruction
before the program exits is unpredictable.
I do not think it's a good idea to have destructors with
externally-visible effects happening in an undefined order. The present
bug is exactly because those things are happening in the "wrong" order.
Doubling down on using the DESTROY method won't make that better.
Now, whether using END is really an improvement is a separate question.
I have the impression that END calls happen in a better-defined order,
but I'm not a perl monk.
regards, tom lane
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Tue, Apr 26, 2016 at 2:24 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Now, whether using END is really an improvement is a separate question.
I have the impression that END calls happen in a better-defined order,
but I'm not a perl monk.
For the archive's sake, 08af9219 is the result commit.
--
Michael
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Michael Paquier wrote:
On Tue, Apr 26, 2016 at 2:24 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Now, whether using END is really an improvement is a separate question.
I have the impression that END calls happen in a better-defined order,
but I'm not a perl monk.For the archive's sake, 08af9219 is the result commit.
Thanks!
--
�lvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers