pgsql: Add test for postmaster crash restarts.

Started by Andres Freundover 8 years ago22 messages

andres@anarazel.de

over 8 years ago

Add test for postmaster crash restarts.

Given that I managed to break this... We probably should extend the
tests to also cover other sub-processes dying, but that's something
for later.

Author: Andres Freund
Discussion: /messages/by-id/20170917080752.rcmihzfmgbeuqjk2@alap3.anarazel.de

Branch
------
master

Details
-------
https://git.postgresql.org/pg/commitdiff/a1924a4ea29399111e5155532ca24c9c51d3c82d

Modified Files
--------------
src/test/recovery/t/013_crash_restart.pl | 192 +++++++++++++++++++++++++++++++
1 file changed, 192 insertions(+)

--
Sent via pgsql-committers mailing list (pgsql-committers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-committers

Tom Lane

tgl@sss.pgh.pa.us

over 8 years ago

In reply to: Andres Freund (#1)

Re: pgsql: Add test for postmaster crash restarts.

Andres Freund <andres@anarazel.de> writes:

Add test for postmaster crash restarts.

Hm, calliphoridae doesn't like this.

regards, tom lane

--
Sent via pgsql-committers mailing list (pgsql-committers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-committers

Andres Freund

andres@anarazel.de

over 8 years ago

In reply to: Tom Lane (#2)

Re: pgsql: Add test for postmaster crash restarts.

On September 18, 2017 8:55:35 PM PDT, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Andres Freund <andres@anarazel.de> writes:

Add test for postmaster crash restarts.

Hm, calliphoridae doesn't like this.

Yea. Not clear to me why yet. The machine ran a number of instances with nearly the same config successfully. Can't imagine that copyparse makes a difference here. I suspect it's somehow load related... Ran a good number of iterations locally, didn't reproduce, even under high load. Think I'll add bit more error reporting.

Andres
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.

--
Sent via pgsql-committers mailing list (pgsql-committers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-committers

Tom Lane

tgl@sss.pgh.pa.us

over 8 years ago

In reply to: Andres Freund (#3)

Re: pgsql: Add test for postmaster crash restarts.

I discovered that prairiedog has been hung up for many hours in the
013_crash_restart.pl. It looks to me like the explanation is that
the test has a race condition, because what I find in the postmaster
log is

2017-09-19 00:31:48.194 EDT [27839] [unknown] LOG: connection received: host=[local]
2017-09-19 00:31:48.203 EDT [27839] [unknown] LOG: connection authorized: user=buildfarm database=postgres
2017-09-19 00:31:48.218 EDT [27839] t/013_crash_restart.pl LOG: statement: CREATE TABLE alive(status text);
2017-09-19 00:31:48.266 EDT [27839] t/013_crash_restart.pl LOG: statement: INSERT INTO alive VALUES($$committed-before-sigquit$$);
2017-09-19 00:31:48.271 EDT [27839] t/013_crash_restart.pl LOG: statement: SELECT pg_backend_pid();
2017-09-19 00:31:48.278 EDT [27839] t/013_crash_restart.pl LOG: statement: BEGIN;
2017-09-19 00:31:48.280 EDT [27839] t/013_crash_restart.pl LOG: statement: INSERT INTO alive VALUES($$in-progress-before-sigquit$$) RETURNING status;
2017-09-19 00:31:48.292 EDT [27839] t/013_crash_restart.pl WARNING: terminating connection because of crash of another server process
2017-09-19 00:31:48.292 EDT [27839] t/013_crash_restart.pl DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted
shared memory.
2017-09-19 00:31:48.292 EDT [27839] t/013_crash_restart.pl HINT: In a moment you should be able to reconnect to the database and repeat your command.
2017-09-19 00:31:48.299 EDT [27827] LOG: server process (PID 27839) exited with exit code 2
2017-09-19 00:31:48.299 EDT [27827] DETAIL: Failed process was running: INSERT INTO alive VALUES($$in-progress-before-sigquit$$) RETURNING status;
2017-09-19 00:31:48.300 EDT [27827] LOG: terminating any other active server processes
2017-09-19 00:31:48.307 EDT [27832] WARNING: terminating connection because of crash of another server process
2017-09-19 00:31:48.307 EDT [27832] DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2017-09-19 00:31:48.307 EDT [27832] HINT: In a moment you should be able to reconnect to the database and repeat your command.
2017-09-19 00:31:48.317 EDT [27827] LOG: all server processes terminated; reinitializing
2017-09-19 00:31:48.333 EDT [27840] LOG: database system was interrupted; last known up at 2017-09-19 00:31:47 EDT
2017-09-19 00:31:48.338 EDT [27840] LOG: database system was not properly shut down; automatic recovery in progress
2017-09-19 00:31:48.346 EDT [27840] LOG: redo starts at 0/15A89EC
2017-09-19 00:31:48.361 EDT [27840] LOG: invalid record length at 0/15C6D74: wanted 24, got 0
2017-09-19 00:31:48.362 EDT [27840] LOG: redo done at 0/15C6D50
2017-09-19 00:31:48.362 EDT [27840] LOG: last completed transaction was at log time 2017-09-19 00:31:48.270076-04
2017-09-19 00:31:48.474 EDT [27827] LOG: database system is ready to accept connections
2017-09-19 00:31:48.492 EDT [27847] [unknown] LOG: connection received: host=[local]
2017-09-19 00:31:48.499 EDT [27847] [unknown] LOG: connection authorized: user=buildfarm database=postgres
2017-09-19 00:31:48.578 EDT [27847] t/013_crash_restart.pl LOG: statement: SELECT pg_sleep(3600);

IOW, the "$monitor" instance of psql did not complete making its
connection until after the crash/restart cycle had occurred.
So we're just sitting there waiting for a crash report that won't
come. Which is another very serious deficiency in this test:
lacking any sort of timeout, it will just freeze indefinitely
if anything doesn't happen exactly the way it expects. From a
buildfarm owner's standpoint, that's pretty damn unfriendly.
It means having to manually unwedge your animals from time to time.

I'd like to ask you to revert this test, at least pending making
it a whole lot more bulletproof. We don't really need crash
recovery testing in the buildfarm IMO --- we hackers crash the
system plenty often enough to notice problems there.

regards, tom lane

--
Sent via pgsql-committers mailing list (pgsql-committers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-committers

Andres Freund

andres@anarazel.de

over 8 years ago

In reply to: Tom Lane (#4)

Re: pgsql: Add test for postmaster crash restarts.

Hi,

On 2017-09-19 12:13:54 -0400, Tom Lane wrote:

IOW, the "$monitor" instance of psql did not complete making its
connection until after the crash/restart cycle had occurred.

That'd be easy enough to fix...

Just something like

$monitor_stdin .= q[
SELECT $$am-i-up$$;
];
$monitor->pump until $monitor_stdout =~ /am-i-up/;
$monitor_stdout = '';

So we're just sitting there waiting for a crash report that won't
come. Which is another very serious deficiency in this test:
lacking any sort of timeout, it will just freeze indefinitely
if anything doesn't happen exactly the way it expects. From a
buildfarm owner's standpoint, that's pretty damn unfriendly.
It means having to manually unwedge your animals from time to time.

Note that I just copied the code for that from another test - this is
isn't unique to this test. I agree that it'd be good to add a timeout to
those pump calls.

I'd like to ask you to revert this test, at least pending making
it a whole lot more bulletproof.

Hm. Ok. That seems like an overreaction to me - the failure rate isn't
actualy that high so far. I'm happy to add both timeouts and "earlier
startup" of the $monitor, but I'd prefer to do so in-tree - I'd run the
test through 100+ iterations locally, without any of this showing up.

We don't really need crash recovery testing in the buildfarm IMO ---
we hackers crash the system plenty often enough to notice problems
there.

I for one don't exercise that kind of crash restarts, my development
scripts all work with restart_after_crash = false. What I find more
concerning however is coverage of EXEC_BACKEND, which has far fewer
developers actively running it constantly.

Greetings,

Andres Freund

--
Sent via pgsql-committers mailing list (pgsql-committers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-committers

Tom Lane

tgl@sss.pgh.pa.us

over 8 years ago

In reply to: Andres Freund (#5)

Re: pgsql: Add test for postmaster crash restarts.

Andres Freund <andres@anarazel.de> writes:

On 2017-09-19 12:13:54 -0400, Tom Lane wrote:

IOW, the "$monitor" instance of psql did not complete making its
connection until after the crash/restart cycle had occurred.

That'd be easy enough to fix...

Just something like

$monitor_stdin .= q[
SELECT $$am-i-up$$;
];
$monitor->pump until $monitor_stdout =~ /am-i-up/;
$monitor_stdout = '';

Hm, do we care whether the sleep call has started? Maybe not.

I'd like to ask you to revert this test, at least pending making
it a whole lot more bulletproof.

Hm. Ok. That seems like an overreaction to me - the failure rate isn't
actualy that high so far.

It's not that it's high, it's that having to ask buildfarm owners to
manually unwedge their critters is not very cool. Maybe prairiedog
will be the only one that gets stuck, but I kinda doubt it.

I'm happy to add both timeouts and "earlier
startup" of the $monitor, but I'd prefer to do so in-tree - I'd run the
test through 100+ iterations locally, without any of this showing up.

Well, please fix it ASAP, if you don't want to take it out pending
the fixes.

regards, tom lane

--
Sent via pgsql-committers mailing list (pgsql-committers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-committers

Andres Freund

andres@anarazel.de

over 8 years ago

In reply to: Tom Lane (#6)

Re: pgsql: Add test for postmaster crash restarts.

On September 19, 2017 9:53:28 AM PDT, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Well, please fix it ASAP, if you don't want to take it out pending
the fixes.

Will as soon as I finished my morning coffee. Uncaffeinated, which my phone fittingly autocorrects to unvaccinated, commits aren't a good idea.

Andres
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.

--
Sent via pgsql-committers mailing list (pgsql-committers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-committers

Andres Freund

andres@anarazel.de

over 8 years ago

In reply to: Andres Freund (#7)

Re: pgsql: Add test for postmaster crash restarts.

On 2017-09-19 09:58:26 -0700, Andres Freund wrote:

On September 19, 2017 9:53:28 AM PDT, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Well, please fix it ASAP, if you don't want to take it out pending
the fixes.

Will as soon as I finished my morning coffee. Uncaffeinated, which my phone fittingly autocorrects to unvaccinated, commits aren't a good idea.

Done. Survived ~100 cycles here locally, while running make -j16 -s
check-world in two parallel branches. But I have a fast disk, so it's
not comparable.

If the buildfarm doesn't complain about the use of IPC::Run's timeout
functionality, we should probably patch that into the other use of
IPC::Run as well, but especially into the other user of the pump() until
... scheme.

Greetings,

Andres Freund

--
Sent via pgsql-committers mailing list (pgsql-committers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-committers

Tom Lane

tgl@sss.pgh.pa.us

over 8 years ago

In reply to: Andres Freund (#8)

Re: pgsql: Add test for postmaster crash restarts.

Andres Freund <andres@anarazel.de> writes:

If the buildfarm doesn't complain about the use of IPC::Run's timeout
functionality, we should probably patch that into the other use of
IPC::Run as well, but especially into the other user of the pump() until
... scheme.

jacana hasn't passed this regression test yet, but at least that proves
the timeout stuff works ;-)

I think jacana's problem is simply that "kill QUIT" is never gonna work
on Windows, there being no such animal there. It looks like "kill KILL"
doesn't work either, which surprises me slightly more but not much.

regards, tom lane

--
Sent via pgsql-committers mailing list (pgsql-committers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-committers

#10

Andrew Dunstan

andrew.dunstan@2ndquadrant.com

over 8 years ago

In reply to: Andres Freund (#1)

Re: pgsql: Add test for postmaster crash restarts.

[re-adding commiters which I inadvertently left off]

On 09/30/2017 06:10 PM, Andres Freund wrote:

I was just looking at this. Why aren't we using "pg_ctl kill" to
terminate the backend? That's supposed to be portable.

Because pg_ctl can't do that for any process but postmaster, no? The
test is supposed to find issues with backend death (and has
defficiencies in error reporting already, and would have caught a bug
I'd introduced previously).

No, I don't think so. That's not what the docs say. That's why you give
it a pid argument" "pg_ctl kill signal_name process_id"

cheers

andrew

--
Andrew Dunstan https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-committers mailing list (pgsql-committers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-committers

Import Notes

Reply to msg id not found: 20170930221043.iay4s7rmwm5l7wiw@alap3.anarazel.de

#11

Andres Freund

andres@anarazel.de

over 8 years ago

In reply to: Andrew Dunstan (#10)

Re: pgsql: Add test for postmaster crash restarts.

On 2017-09-30 18:21:33 -0400, Andrew Dunstan wrote:

[re-adding commiters which I inadvertently left off]

On 09/30/2017 06:10 PM, Andres Freund wrote:

I was just looking at this. Why aren't we using "pg_ctl kill" to
terminate the backend? That's supposed to be portable.

Because pg_ctl can't do that for any process but postmaster, no? The
test is supposed to find issues with backend death (and has
defficiencies in error reporting already, and would have caught a bug
I'd introduced previously).

No, I don't think so. That's not what the docs say. That's why you give
it a pid argument" "pg_ctl kill signal_name process_id"

Oh, cool. Didn't know that one. So the answer is:
"Because Andres doesn't know squat.".

But even after fixing that, there unfortunately is:

static void
set_sig(char *signame)
{
…
#if 0
/* probably should NOT provide SIGKILL */
else if (strcmp(signame, "KILL") == 0)
sig = SIGKILL;
#endif

I'm unclear on what that provision is achieving? If you can kill with
pg_ctl you can do other nasty stuff too (like just use kill instead of
pg_ctl)?

Greetings,

Andres Freund

--
Sent via pgsql-committers mailing list (pgsql-committers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-committers

#12

Andres Freund

andres@anarazel.de

over 8 years ago

In reply to: Andres Freund (#11)

1 attachment(s)

Re: pgsql: Add test for postmaster crash restarts.

On 2017-09-30 15:27:12 -0700, Andres Freund wrote:

On 2017-09-30 18:21:33 -0400, Andrew Dunstan wrote:

[re-adding commiters which I inadvertently left off]

On 09/30/2017 06:10 PM, Andres Freund wrote:

I was just looking at this. Why aren't we using "pg_ctl kill" to
terminate the backend? That's supposed to be portable.

Because pg_ctl can't do that for any process but postmaster, no? The
test is supposed to find issues with backend death (and has
defficiencies in error reporting already, and would have caught a bug
I'd introduced previously).

No, I don't think so. That's not what the docs say. That's why you give
it a pid argument" "pg_ctl kill signal_name process_id"

Oh, cool. Didn't know that one. So the answer is:
"Because Andres doesn't know squat.".

But even after fixing that, there unfortunately is:

static void
set_sig(char *signame)
{
…
#if 0
/* probably should NOT provide SIGKILL */
else if (strcmp(signame, "KILL") == 0)
sig = SIGKILL;
#endif

I'm unclear on what that provision is achieving? If you can kill with
pg_ctl you can do other nasty stuff too (like just use kill instead of
pg_ctl)?

Could you perhaps test whether windows likes things after the following
patch? I don't think the kill_kill guarantees are really needed here,
so we might even be able to allow this on msvc.

Greetings,

Andres Freund

Attachments:

windowsify.patchtext/x-diff; charset=us-asciiDownload

diff --git a/src/bin/pg_ctl/pg_ctl.c b/src/bin/pg_ctl/pg_ctl.c
index 4e02c4cea1..0990647297 100644
--- a/src/bin/pg_ctl/pg_ctl.c
+++ b/src/bin/pg_ctl/pg_ctl.c
@@ -1961,11 +1961,8 @@ set_sig(char *signame)
 		sig = SIGQUIT;
 	else if (strcmp(signame, "ABRT") == 0)
 		sig = SIGABRT;
-#if 0
-	/* probably should NOT provide SIGKILL */
 	else if (strcmp(signame, "KILL") == 0)
 		sig = SIGKILL;
-#endif
 	else if (strcmp(signame, "TERM") == 0)
 		sig = SIGTERM;
 	else if (strcmp(signame, "USR1") == 0)
diff --git a/src/test/recovery/t/013_crash_restart.pl b/src/test/recovery/t/013_crash_restart.pl
index ca02054ff0..91a8ef90c1 100644
--- a/src/test/recovery/t/013_crash_restart.pl
+++ b/src/test/recovery/t/013_crash_restart.pl
@@ -16,16 +16,8 @@ use Test::More;
 use Config;
 use Time::HiRes qw(usleep);
 
-if ($Config{osname} eq 'MSWin32')
-{
-	# some Windows Perls at least don't like IPC::Run's
-	# start/kill_kill regime.
-	plan skip_all => "Test fails on Windows perl";
-}
-else
-{
-	plan tests => 18;
-}
+plan tests => 18;
+
 
 # To avoid hanging while expecting some specific input from a psql
 # instance being driven by us, add a timeout high enough that it
@@ -106,10 +98,10 @@ $monitor_stdout = '';
 $monitor_stderr = '';
 
 # kill once with QUIT - we expect psql to exit, while emitting error message first
-my $cnt = kill 'QUIT', $pid;
+my $ret = TestLib::system_log('pg_ctl', 'kill', 'QUIT', $pid);
 
 # Exactly process should have been alive to be killed
-is($cnt, 1, "exactly one process killed with SIGQUIT");
+is($ret, 0, "killed process with SIGQUIT");
 
 # Check that psql sees the killed backend as having been terminated
 $killme_stdin .= q[
@@ -119,14 +111,14 @@ ok(pump_until($killme, \$killme_stderr, qr/WARNING:  terminating connection beca
    "psql query died successfully after SIGQUIT");
 $killme_stderr = '';
 $killme_stdout = '';
-$killme->kill_kill;
+$killme->finish;
 
 # Wait till server restarts - we should get the WARNING here, but
 # sometimes the server is unable to send that, if interrupted while
 # sending.
 ok(pump_until($monitor, \$monitor_stderr, qr/WARNING:  terminating connection because of crash of another server process|server closed the connection unexpectedly/m),
    "psql monitor died successfully after SIGQUIT");
-$monitor->kill_kill;
+$monitor->finish;
 
 # Wait till server restarts
 is($node->poll_query_until('postgres', 'SELECT $$restarted after sigquit$$;', 'restarted after sigquit'),
@@ -179,8 +171,8 @@ $monitor_stderr = '';
 
 # kill with SIGKILL this time - we expect the backend to exit, without
 # being able to emit an error error message
-$cnt = kill 'KILL', $pid;
-is($cnt, 1, "exactly one process killed with KILL");
+$ret = TestLib::system_log('pg_ctl', 'kill', 'KILL', $pid);
+is($ret, 0, "killed process with KILL");
 
 # Check that psql sees the server as being terminated. No WARNING,
 # because signal handlers aren't being run on SIGKILL.
@@ -189,14 +181,14 @@ SELECT 1;
 ];
 ok(pump_until($killme, \$killme_stderr, qr/server closed the connection unexpectedly/m),
    "psql query died successfully after SIGKILL");
-$killme->kill_kill;
+$killme->finish;
 
 # Wait till server restarts - we should get the WARNING here, but
 # sometimes the server is unable to send that, if interrupted while
 # sending.
 ok(pump_until($monitor, \$monitor_stderr, qr/WARNING:  terminating connection because of crash of another server process|server closed the connection unexpectedly/m),
    "psql monitor died successfully after SIGKILL");
-$monitor->kill_kill;
+$monitor->finish;
 
 # Wait till server restarts
 is($node->poll_query_until('postgres', 'SELECT 1', '1'), "1", "reconnected after SIGKILL");

#13

Andrew Dunstan

andrew.dunstan@2ndquadrant.com

over 8 years ago

In reply to: Andres Freund (#12)

Re: pgsql: Add test for postmaster crash restarts.

On 09/30/2017 06:44 PM, Andres Freund wrote:

On 2017-09-30 15:27:12 -0700, Andres Freund wrote:

On 2017-09-30 18:21:33 -0400, Andrew Dunstan wrote:

[re-adding commiters which I inadvertently left off]

On 09/30/2017 06:10 PM, Andres Freund wrote:

I was just looking at this. Why aren't we using "pg_ctl kill" to
terminate the backend? That's supposed to be portable.

Because pg_ctl can't do that for any process but postmaster, no? The
test is supposed to find issues with backend death (and has
defficiencies in error reporting already, and would have caught a bug
I'd introduced previously).

No, I don't think so. That's not what the docs say. That's why you give
it a pid argument" "pg_ctl kill signal_name process_id"

Oh, cool. Didn't know that one. So the answer is:
"Because Andres doesn't know squat.".

But even after fixing that, there unfortunately is:

static void
set_sig(char *signame)
{
ï¿½
#if 0
/* probably should NOT provide SIGKILL */
else if (strcmp(signame, "KILL") == 0)
sig = SIGKILL;
#endif

I'm unclear on what that provision is achieving? If you can kill with
pg_ctl you can do other nasty stuff too (like just use kill instead of
pg_ctl)?

I put it in when we rewrote pg_ctl in C many years ago, possibly out of
a superabundance of caution. I agree it's worth revisiting. I think the
idea was that there's a difference between an ordinary footgun and an
officially sanctioned footgun :-)

Could you perhaps test whether windows likes things after the following
patch? I don't think the kill_kill guarantees are really needed here,
so we might even be able to allow this on msvc.

Haven't tested on MSVC but with this patch it passes on jacana (mingw).

cheers

andrew

--
Andrew Dunstan https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-committers mailing list (pgsql-committers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-committers

#14

Andres Freund

andres@anarazel.de

over 8 years ago

In reply to: Andrew Dunstan (#13)

Re: pgsql: Add test for postmaster crash restarts.

Hi,

On 2017-09-30 22:28:39 -0400, Andrew Dunstan wrote:

But even after fixing that, there unfortunately is:

static void
set_sig(char *signame)
{
…
#if 0
/* probably should NOT provide SIGKILL */
else if (strcmp(signame, "KILL") == 0)
sig = SIGKILL;
#endif

I'm unclear on what that provision is achieving? If you can kill with
pg_ctl you can do other nasty stuff too (like just use kill instead of
pg_ctl)?

I put it in when we rewrote pg_ctl in C many years ago, possibly out of
a superabundance of caution. I agree it's worth revisiting. I think the
idea was that there's a difference between an ordinary footgun and an
officially sanctioned footgun :-)

Heh. I'm inclined to take it out. We could add a --use-the-force-luke
type parameter, but it doesn't seem worth it.

Haven't tested on MSVC but with this patch it passes on jacana (mingw).

Yay! Thanks for testing.

Greetings,

Andres Freund

--
Sent via pgsql-committers mailing list (pgsql-committers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-committers

#15

Andrew Dunstan

andrew.dunstan@2ndquadrant.com

over 8 years ago

In reply to: Andres Freund (#14)

pg_ctl kill support for KILL signal was Re: [COMMITTERS] pgsql: Add test for postmaster crash restarts.

On 09/30/2017 10:32 PM, Andres Freund wrote:

Hi,

On 2017-09-30 22:28:39 -0400, Andrew Dunstan wrote:

But even after fixing that, there unfortunately is:

static void
set_sig(char *signame)
{
…
#if 0
/* probably should NOT provide SIGKILL */
else if (strcmp(signame, "KILL") == 0)
sig = SIGKILL;
#endif

I'm unclear on what that provision is achieving? If you can kill with
pg_ctl you can do other nasty stuff too (like just use kill instead of
pg_ctl)?

I put it in when we rewrote pg_ctl in C many years ago, possibly out of
a superabundance of caution. I agree it's worth revisiting. I think the
idea was that there's a difference between an ordinary footgun and an
officially sanctioned footgun :-)

Heh. I'm inclined to take it out. We could add a --use-the-force-luke
type parameter, but it doesn't seem worth it.

I agree, but I think we need this discussed on -hackers. Does anyone
have an objection to allowing "pg_ctl kill KILL somepid"? As Andres
points out, in most places you can just call kill from the command line
anyway, so disallowing it is not really a security feature. Having it
would let us have portable crash restart tests.

cheers

andrew

Andrew Dunstan https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#16

Andrew Dunstan

andrew.dunstan@2ndquadrant.com

over 8 years ago

In reply to: Andres Freund (#14)

Re: pgsql: Add test for postmaster crash restarts.

On 09/30/2017 10:32 PM, Andres Freund wrote:

Haven't tested on MSVC but with this patch it passes on jacana (mingw).

Yay! Thanks for testing.

I have now tested on bowerbird (MSVC) and it passes. This suggests that
we can run tests there in cases where we can use IPC::Run's finish()
instead of kill_kill(). Perhaps someone would like to look at the the
two other cases where we do that (recovery/t/011_crash_recovery.pl and
recovery/t/006_logical_decoding.pl) and see if they are amenable to this
treatment. It woould be nice to be able to run all the tests on all
platforms.

cheers

andrew

--
Andrew Dunstan https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-committers mailing list (pgsql-committers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-committers

#17

Tom Lane

tgl@sss.pgh.pa.us

over 8 years ago

In reply to: Andrew Dunstan (#15)

Re: pg_ctl kill support for KILL signal was Re: [COMMITTERS] pgsql: Add test for postmaster crash restarts.

Andrew Dunstan <andrew.dunstan@2ndquadrant.com> writes:

On 09/30/2017 10:32 PM, Andres Freund wrote:

Heh. I'm inclined to take it out. We could add a --use-the-force-luke
type parameter, but it doesn't seem worth it.

I agree, but I think we need this discussed on -hackers. Does anyone
have an objection to allowing "pg_ctl kill KILL somepid"? As Andres
points out, in most places you can just call kill from the command line
anyway, so disallowing it is not really a security feature. Having it
would let us have portable crash restart tests.

+1 for portable tests, but it still seems like something we don't want
to encourage users to use. What do you think of leaving it out of the
documentation?

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#18

Andres Freund

andres@anarazel.de

over 8 years ago

In reply to: Tom Lane (#17)

Re: pg_ctl kill support for KILL signal was Re: [COMMITTERS] pgsql: Add test for postmaster crash restarts.

On 2017-10-01 16:42:44 -0400, Tom Lane wrote:

Andrew Dunstan <andrew.dunstan@2ndquadrant.com> writes:

On 09/30/2017 10:32 PM, Andres Freund wrote:

Heh. I'm inclined to take it out. We could add a --use-the-force-luke
type parameter, but it doesn't seem worth it.

I agree, but I think we need this discussed on -hackers. Does anyone
have an objection to allowing "pg_ctl kill KILL somepid"? As Andres
points out, in most places you can just call kill from the command line
anyway, so disallowing it is not really a security feature. Having it
would let us have portable crash restart tests.

+1 for portable tests, but it still seems like something we don't want
to encourage users to use. What do you think of leaving it out of the
documentation?

As far as I can tell we've not documented the set of acceptable signals
anywhere but the source. I think we can just keep it that way?

Greetings,

Andres Freund

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#19

Andrew Dunstan

andrew.dunstan@2ndquadrant.com

over 8 years ago

In reply to: Andres Freund (#18)

Re: pg_ctl kill support for KILL signal was Re: [COMMITTERS] pgsql: Add test for postmaster crash restarts.

On 10/01/2017 04:48 PM, Andres Freund wrote:

On 2017-10-01 16:42:44 -0400, Tom Lane wrote:

Andrew Dunstan <andrew.dunstan@2ndquadrant.com> writes:

On 09/30/2017 10:32 PM, Andres Freund wrote:

Heh. I'm inclined to take it out. We could add a --use-the-force-luke
type parameter, but it doesn't seem worth it.

I agree, but I think we need this discussed on -hackers. Does anyone
have an objection to allowing "pg_ctl kill KILL somepid"? As Andres
points out, in most places you can just call kill from the command line
anyway, so disallowing it is not really a security feature. Having it
would let us have portable crash restart tests.

+1 for portable tests, but it still seems like something we don't want
to encourage users to use. What do you think of leaving it out of the
documentation?

As far as I can tell we've not documented the set of acceptable signals
anywhere but the source. I think we can just keep it that way?

As documented it's in the help text:

printf(_("\nAllowed signal names for kill:\n"));
printf(" ABRT HUP INT QUIT TERM USR1 USR2\n");

So we can leave it out of there. OTOH I'm not a huge fan of security by
obscurity. I guess this wouldn't be too bad a case.

cheers

andrew

Andrew Dunstan https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#20

Andres Freund

andres@anarazel.de

over 8 years ago

In reply to: Andrew Dunstan (#19)

Re: pg_ctl kill support for KILL signal was Re: [COMMITTERS] pgsql: Add test for postmaster crash restarts.

On 2017-10-01 17:47:52 -0400, Andrew Dunstan wrote:

On 10/01/2017 04:48 PM, Andres Freund wrote:

On 2017-10-01 16:42:44 -0400, Tom Lane wrote:

Andrew Dunstan <andrew.dunstan@2ndquadrant.com> writes:

On 09/30/2017 10:32 PM, Andres Freund wrote:

Heh. I'm inclined to take it out. We could add a --use-the-force-luke
type parameter, but it doesn't seem worth it.

I agree, but I think we need this discussed on -hackers. Does anyone
have an objection to allowing "pg_ctl kill KILL somepid"? As Andres
points out, in most places you can just call kill from the command line
anyway, so disallowing it is not really a security feature. Having it
would let us have portable crash restart tests.

+1 for portable tests, but it still seems like something we don't want
to encourage users to use. What do you think of leaving it out of the
documentation?

As far as I can tell we've not documented the set of acceptable signals
anywhere but the source. I think we can just keep it that way?

As documented it's in the help text:

printf(_("\nAllowed signal names for kill:\n"));
printf("ï¿½ ABRT HUP INT QUIT TERM USR1 USR2\n");

Oh, hm. I'd looked above.

So we can leave it out of there. OTOH I'm not a huge fan of security by
obscurity. I guess this wouldn't be too bad a case.

I'd personally include it, given that we already allow and document
ABRT. There's no meaningful difference between the two.

Greetings,

Andres Freund

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#21

Tom Lane

tgl@sss.pgh.pa.us

over 8 years ago

In reply to: Andres Freund (#20)

Re: pg_ctl kill support for KILL signal was Re: [COMMITTERS] pgsql: Add test for postmaster crash restarts.

Andres Freund <andres@anarazel.de> writes:

On 2017-10-01 17:47:52 -0400, Andrew Dunstan wrote:

So we can leave it out of there. OTOH I'm not a huge fan of security by
obscurity. I guess this wouldn't be too bad a case.

I'd personally include it, given that we already allow and document
ABRT. There's no meaningful difference between the two.

True.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#22

Andres Freund

andres@anarazel.de

over 8 years ago

In reply to: Tom Lane (#21)

Re: pg_ctl kill support for KILL signal was Re: [COMMITTERS] pgsql: Add test for postmaster crash restarts.

On 2017-10-01 18:01:27 -0400, Tom Lane wrote:

Andres Freund <andres@anarazel.de> writes:

On 2017-10-01 17:47:52 -0400, Andrew Dunstan wrote:

So we can leave it out of there. OTOH I'm not a huge fan of security by
obscurity. I guess this wouldn't be too bad a case.

I'd personally include it, given that we already allow and document
ABRT. There's no meaningful difference between the two.

True.

I'll push it that way then. Adapting a help text later is fairly
harmless.

Greetings,

Andres Freund

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers