Testing pg_terminate_backend()

Started by Bruce Momjian over 17 years ago, 11 messages
#1 Bruce Momjian
bruce@momjian.us
2 attachment(s)

bruce wrote:

Tom Lane wrote:

Bruce Momjian <bruce@momjian.us> writes:

Tom Lane wrote:

The closest thing I can think of to an automated test is to run repeated
sets of the parallel regression tests, and each time SIGTERM a randomly
chosen backend at a randomly chosen time. Then see if anything "funny"

Yep, that was my plan; plus, by running the parallel regression tests,
you get the possibility of >2 backends.

I was intentionally suggesting only one kill per test cycle. Multiple
kills will probably create an O(N^2) explosion in the set of possible
downstream-failure deltas. I doubt you'd really get any improvement
in testing coverage to justify the much larger amount of hand validation
needed.

It also strikes me that you could make some simple alterations to the
regression tests to reduce the set of observable downstream deltas.
For example, anyplace where a test loads a table with successive INSERTs
and that table is used by later tests, wrap the INSERT sequence with
BEGIN/END. Then there is only one possible downstream delta (empty
table) and not N different possibilities for an N-row table.
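Tom's BEGIN/END suggestion could be applied mechanically to a test script. The sketch below is a hypothetical helper (`wrap_inserts` is not part of the PostgreSQL tree) that wraps each run of consecutive INSERT lines in one transaction block. It assumes one statement per line with INSERT at the start of the line, and the corresponding expected output would also have to change to include the new BEGIN/END lines.

```bash
# Hypothetical helper: wrap each run of consecutive INSERT lines in
# BEGIN;/END; so a SIGTERM leaves the table either fully loaded or empty.
# Assumes one SQL statement per line, with INSERT at the start of the line.
wrap_inserts() {
  awk '
    /^INSERT / {
      if (!in_block) { print "BEGIN;"; in_block = 1 }
      print
      next
    }
    {
      if (in_block) { print "END;"; in_block = 0 }
      print
    }
    END { if (in_block) print "END;" }
  '
}

# Example (file names are hypothetical):
#   wrap_inserts < sql/mytable.sql > sql/mytable.wrapped.sql
```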

I have added pg_terminate_backend() to use SIGTERM and will start
running tests as discussed with Tom. I will post my scripts too.

Attached is my test script. I ran it for 14 hours (asserts on),
running 450 regression tests, with up to seven backends killed per
regression test.

I have processed the combined regression.diffs files by picking out
all the new error messages. I don't see anything unusual in there.
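One way to do that picking-out step with a count attached to each message, so rare messages stand out, is sketched below. It is a variant of the grep in the script's header comment; the function name is hypothetical, and it also matches lines starting with "!", which context-format diffs use for changed lines.

```bash
# Hypothetical helper: list each distinct new message line in the combined
# diffs with a count, most frequent first. Lines starting with "+" are
# added output; "!" marks changed lines in context-format diffs.
summarize_new_messages() {
  grep -E '^[+!] *[A-Z]+:' "$@" \
    | sed -E 's/^[+!] *//' \
    | sort | uniq -c | sort -rn
}

# Example: summarize_new_messages /rtmp/regression.sigterm | less
```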

Should I run it differently?

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

Attachments:

/root/sigtest (text/plain)
/rtmp/diff (text/x-diff)
#2 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#1)
Re: Testing pg_terminate_backend()

Bruce Momjian <bruce@momjian.us> writes:

Attached is my test script. I ran it for 14 hours (asserts on),
running 450 regression tests, with up to seven backends killed per
regression test.

Hmm, there are something on the order of 10000 SQL commands in our
regression tests, so even assuming perfect randomness you've exercised
SIGTERM on maybe 10% of them --- and of course there's multiple places
in a complex DDL command where SIGTERM might conceivably be a problem.
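Tom's estimate can be sanity-checked with a simple model, assumed here rather than taken from the thread: each effective kill lands on one of N commands chosen uniformly and independently, so the expected fraction of distinct commands hit at least once after k kills is 1 - (1 - 1/N)^k. With N = 10000 and 450 runs, one effective kill per run gives about 4.4%, and seven per run (k = 3150) about 27%, which brackets the 10% figure. The model ignores kills that land while no command is running, as well as the multiple vulnerable points inside a single command.

```bash
# Expected fraction (as a percentage) of N distinct commands hit at least once
# by k kills, assuming each kill picks a command uniformly and independently.
coverage_pct() {
  awk -v n="$1" -v k="$2" 'BEGIN { printf "%.1f\n", 100 * (1 - (1 - 1 / n) ^ k) }'
}

coverage_pct 10000 450    # one effective kill per run: prints 4.4
coverage_pct 10000 3150   # seven effective kills per run: prints 27.0
```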

Who was volunteering to run this 24x7 for awhile?

SLEEP=`expr $RANDOM \* $REGRESSION_DURATION / 32767`

Uh, where's the randomness coming from?

regards, tom lane

#3 Magnus Hagander
magnus@hagander.net
In reply to: Tom Lane (#2)
Re: Testing pg_terminate_backend()

Tom Lane wrote:

Bruce Momjian <bruce@momjian.us> writes:

Attached is my test script. I ran it for 14 hours (asserts on),
running 450 regression tests, with up to seven backends killed per
regression test.

Hmm, there are something on the order of 10000 SQL commands in our
regression tests, so even assuming perfect randomness you've exercised
SIGTERM on maybe 10% of them --- and of course there's multiple places
in a complex DDL command where SIGTERM might conceivably be a problem.

Who was volunteering to run this 24x7 for awhile?

That was me. As long as the script runs properly on Linux, I can get
that started as soon as I'm fed instructions on how to do it :-) Do I
just fix the paths and set it running, or do I need to prepare
something else?

SLEEP=`expr $RANDOM \* $REGRESSION_DURATION / 32767`

Uh, where's the randomness coming from?

... but I should probably wait until that one is answered or fixed, I
guess :-)

//Magnus

#4 Alvaro Herrera
alvherre@commandprompt.com
In reply to: Magnus Hagander (#3)
Re: Testing pg_terminate_backend()

Magnus Hagander wrote:

Tom Lane wrote:

SLEEP=`expr $RANDOM \* $REGRESSION_DURATION / 32767`

Uh, where's the randomness coming from?

... but I should probably wait until that one is answered or fixed, I
guess :-)

bash.

RANDOM Each time this parameter is referenced, a random integer between
0 and 32767 is generated. The sequence of random numbers may be
initialized by assigning a value to RANDOM. If RANDOM is unset,
it loses its special properties, even if it is subsequently
reset.
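A small bash sketch of what the script's `expr` line computes, written with shell arithmetic instead, plus the seeding behavior the man page text describes: assigning a fixed value to RANDOM replays the same sequence, which could make a suspicious kill schedule reproducible. SEED is a hypothetical variable, and the delays are computed inline because bash may reseed RANDOM inside a `$(...)` subshell.

```bash
REGRESSION_DURATION=80   # same value as in the script

# Shell-arithmetic form of: SLEEP=`expr $RANDOM \* $REGRESSION_DURATION / 32767`
# RANDOM ranges over 0..32767, so the delay ranges over 0..REGRESSION_DURATION.
SLEEP=$(( RANDOM * REGRESSION_DURATION / 32767 ))
echo "Sleeping $SLEEP seconds"

# Assigning to RANDOM seeds the sequence, so a fixed (hypothetical) SEED
# replays the same delay. Computed inline, not in $(...), because bash may
# reseed RANDOM inside a subshell.
SEED=12345
RANDOM=$SEED; first=$(( RANDOM * REGRESSION_DURATION / 32767 ))
RANDOM=$SEED; second=$(( RANDOM * REGRESSION_DURATION / 32767 ))
echo "first=$first second=$second"   # the two values match
```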

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

#5 Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#2)
Re: Testing pg_terminate_backend()

Tom Lane wrote:

Bruce Momjian <bruce@momjian.us> writes:

Attached is my test script. I ran it for 14 hours (asserts on),
running 450 regression tests, with up to seven backends killed per
regression test.

Hmm, there are something on the order of 10000 SQL commands in our
regression tests, so even assuming perfect randomness you've exercised
SIGTERM on maybe 10% of them --- and of course there's multiple places
in a complex DDL command where SIGTERM might conceivably be a problem.

Who was volunteering to run this 24x7 for awhile?

Yes, that is what it needs.

SLEEP=`expr $RANDOM \* $REGRESSION_DURATION / 32767`

Uh, where's the randomness coming from?

In bash, $RANDOM returns a random integer from 0 to 32767 each time it
is referenced; #!/bin/bash is specified on the top line.

#6 Bruce Momjian
bruce@momjian.us
In reply to: Magnus Hagander (#3)
Re: Testing pg_terminate_backend()

Magnus Hagander wrote:

Tom Lane wrote:

Bruce Momjian <bruce@momjian.us> writes:

Attached is my test script. I ran it for 14 hours (asserts on),
running 450 regression tests, with up to seven backends killed per
regression test.

Hmm, there are something on the order of 10000 SQL commands in our
regression tests, so even assuming perfect randomness you've exercised
SIGTERM on maybe 10% of them --- and of course there's multiple places
in a complex DDL command where SIGTERM might conceivably be a problem.

Who was volunteering to run this 24x7 for awhile?

That was me. As long as the script runs properly on Linux, I can get
that started as soon as I'm fed instructions on how to do it :-) Do I
just fix the paths and set it running, or do I need to prepare
something else?

Nothing special to prepare. Compile with asserts enabled, and run the
script. The comment at the top explains how to analyze the log for
interesting error messages.

#7 Bruce Momjian
bruce@momjian.us
In reply to: Bruce Momjian (#1)
Re: [PATCHES] Testing pg_terminate_backend()

Magnus, others, how is the SIGTERM testing going?

---------------------------------------------------------------------------

Bruce Momjian wrote:

bruce wrote:

Tom Lane wrote:

Bruce Momjian <bruce@momjian.us> writes:

Tom Lane wrote:

The closest thing I can think of to an automated test is to run repeated
sets of the parallel regression tests, and each time SIGTERM a randomly
chosen backend at a randomly chosen time. Then see if anything "funny"

Yep, that was my plan; plus, by running the parallel regression tests,
you get the possibility of >2 backends.

I was intentionally suggesting only one kill per test cycle. Multiple
kills will probably create an O(N^2) explosion in the set of possible
downstream-failure deltas. I doubt you'd really get any improvement
in testing coverage to justify the much larger amount of hand validation
needed.

It also strikes me that you could make some simple alterations to the
regression tests to reduce the set of observable downstream deltas.
For example, anyplace where a test loads a table with successive INSERTs
and that table is used by later tests, wrap the INSERT sequence with
BEGIN/END. Then there is only one possible downstream delta (empty
table) and not N different possibilities for an N-row table.

I have added pg_terminate_backend() to use SIGTERM and will start
running tests as discussed with Tom. I will post my scripts too.

Attached is my test script. I ran it for 14 hours (asserts on),
running 450 regression tests, with up to seven backends killed per
regression test.

I have processed the combined regression.diffs files by picking out
all the new error messages. I don't see anything unusual in there.

Should I run it differently?

#!/bin/bash

REGRESSION_DURATION=80 # average duration of regression test in seconds
OUTFILE=/rtmp/regression.sigterm

# To analyze output, use:
# grep '^\+ *[A-Z][A-Z]*:' /rtmp/regression.sigterm | sort | uniq | less

cd /pg/test/regress || exit 1

while :
do
    (
        # wait a random 0..REGRESSION_DURATION seconds into the test run
        SLEEP=`expr $RANDOM \* $REGRESSION_DURATION / 32767`
        echo "Sleeping $SLEEP seconds"
        sleep "$SLEEP"
        echo "Trying kill"
        # send up to 7 kill signals, 5 seconds apart
        for X in 1 2 3 4 5 6 7
        do
            # SIGTERM one randomly chosen backend other than this psql
            # session's own (procpid is the 8.x column name)
            if psql -p 55432 -qt -c "
                SELECT pg_terminate_backend(stat.procpid)
                FROM (SELECT procpid FROM pg_stat_activity
                      WHERE procpid <> pg_backend_pid()
                      ORDER BY random() LIMIT 1) AS stat
                " template1 2> /dev/null
            then
                echo "Kill sent"
            fi
            sleep 5
        done
    ) &
    gmake check
    # let the background kill loop finish before collecting the diffs
    wait
    [ -s regression.diffs ] && cat regression.diffs >> "$OUTFILE"
done

--
Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-patches

#8 Magnus Hagander
magnus@hagander.net
In reply to: Bruce Momjian (#7)
Re: [PATCHES] Testing pg_terminate_backend()

It looks pretty good from here. I have an output of about 50 million
lines, and the only FATAL stuff is the "terminating due to admin
command". All other errors look consistent with things like the backend
that creates a table gets killed, so anybody trying to access that
table later will fail with a does not exist error.
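The triage described here, setting aside the FATAL from the terminated session and the "does not exist" fallout, could be scripted roughly as follows. This is a hypothetical filter; the first pattern assumes PostgreSQL's full wording of the termination message ("terminating connection due to administrator command"), and filtering on "does not exist" is coarse enough that it could also hide a genuine problem, so the removed lines deserve a skim as well.

```bash
# Hypothetical triage filter: print new message lines from the combined diffs,
# minus the two classes described as expected fallout of a SIGTERM.
# The wording of the termination message is assumed, not taken from the thread.
unexpected_messages() {
  grep -E '^[+!] *[A-Z]+:' "$@" \
    | grep -v -e 'terminating connection due to administrator command' \
              -e 'does not exist'
}

# Example: unexpected_messages /rtmp/regression.sigterm | sort | uniq -c
```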

//Magnus

#9 Bruce Momjian
bruce@momjian.us
In reply to: Magnus Hagander (#8)
Re: [PATCHES] Testing pg_terminate_backend()

Magnus Hagander wrote:

It looks pretty good from here. I have an output of about 50 million
lines, and the only FATAL stuff is the "terminating due to admin
command". All other errors look consistent with things like the backend
that creates a table gets killed, so anybody trying to access that
table later will fail with a does not exist error.

OK, how long does a regression test take to run, and how long did you
run the script? Then please compute the number of regression runs.
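The number asked for here is elapsed time divided by time per cycle. As a check, the figures from the first message (14 hours, 450 runs) imply about 14 * 3600 / 450 = 112 seconds per cycle. With the script as posted, each cycle waits for both `gmake check` and the background kill loop (a random sleep of up to REGRESSION_DURATION seconds plus seven 5-second pauses), so a cycle can take longer than the regression tests alone. `estimate_runs` is a hypothetical helper.

```bash
# Hypothetical helper: completed cycles given wall-clock hours and average
# seconds per cycle (shell integer arithmetic, so the result is truncated).
estimate_runs() {
  local hours=$1 secs_per_run=$2
  echo $(( hours * 3600 / secs_per_run ))
}

estimate_runs 14 112   # first 14-hour run: prints 450
```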

#10 Magnus Hagander
magnus@hagander.net
In reply to: Bruce Momjian (#9)
Re: [PATCHES] Testing pg_terminate_backend()

Bruce Momjian wrote:

Magnus Hagander wrote:

It looks pretty good from here. I have an output of about 50 million
lines, and the only FATAL stuff is the "terminating due to admin
command". All other errors look consistent with things like the
backend that creates a table gets killed, so anybody trying to
access that table later will fail with a does not exist error.

OK, how long does a regression test take to run, and how long did you
run the script? Then please compute the number of regression runs.

Hmm. This looks like somewhere between 10,000 and 20,000 runs.

//Magnus

#11 Bruce Momjian
bruce@momjian.us
In reply to: Magnus Hagander (#10)
Re: [PATCHES] Testing pg_terminate_backend()

Can we conclude this has been tested enough for 8.4?

---------------------------------------------------------------------------

Magnus Hagander wrote:

Bruce Momjian wrote:

Magnus Hagander wrote:

It looks pretty good from here. I have an output of about 50 million
lines, and the only FATAL stuff is the "terminating due to admin
command". All other errors look consistent with things like the
backend that creates a table gets killed, so anybody trying to
access that table later will fail with a does not exist error.

OK, how long does a regression test take to run, and how long did you
run the script? Then please compute the number of regression runs.

Hmm. This looks like somewhere between 10,000 and 20,000 runs.

//Magnus
