pgsql: Modify the isolation tester so that multiple sessions can wait.
Modify the isolation tester so that multiple sessions can wait.
This allows testing of deadlock scenarios. Scenarios that would
previously have been considered invalid are now simply taken as a
scenario in which more than one backend will wait.
Branch
------
master
Details
-------
http://git.postgresql.org/pg/commitdiff/38f8bdcac4982215beb9f65a19debecaf22fd470
Modified Files
--------------
src/test/isolation/README | 9 +-
src/test/isolation/isolationtester.c | 330 ++++++++++++++++++++---------------
2 files changed, 194 insertions(+), 145 deletions(-)
--
Sent via pgsql-committers mailing list (pgsql-committers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-committers
Robert Haas wrote:
Modify the isolation tester so that multiple sessions can wait.
Mikael was kind enough to set up an OpenBSD 5.9 buildfarm member, and
it's been failing in isolationtester. (I kindly accepted my suggestion
to put it to run even though it is failing so that we could look at the
logs). I think this commit is the problem; see the backtrace he sent me
on private email.
#0 0x00001e18c62af87a in thrkill () at <stdin>:2
2 <stdin>: No such file or directory.
in <stdin>
(gdb) bt
#0 0x00001e18c62af87a in thrkill () at <stdin>:2
#1 0x00001e18c62aaf39 in *_libc_abort () at
/usr/src/lib/libc/stdlib/abort.c:52
#2 0x00001e18c62a7838 in *_libc_memcpy (dst0=0x0, src0=0x6, length=0) at
/usr/src/lib/libc/string/memcpy.c:65
#3 0x00001e16a7904a7e in run_permutation (testspec=0x1e16a7d08b20,
nsteps=11, steps=Variable "steps" is not available.
) at isolationtester.c:617
#4 0x00001e16a7904d7c in run_testspec (testspec=0x1e16a7d08b20) at
isolationtester.c:369
#5 0x00001e16a79053ce in main (argc=Variable "argc" is not available.
) at isolationtester.c:251
--
�lvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Wed, Apr 27, 2016 at 1:51 PM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:
Robert Haas wrote:
Modify the isolation tester so that multiple sessions can wait.
Mikael was kind enough to set up an OpenBSD 5.9 buildfarm member, and
it's been failing in isolationtester. (I kindly accepted my suggestion
to put it to run even though it is failing so that we could look at the
logs). I think this commit is the problem; see the backtrace he sent me
on private email.#0 0x00001e18c62af87a in thrkill () at <stdin>:2
2 <stdin>: No such file or directory.
in <stdin>
(gdb) bt
#0 0x00001e18c62af87a in thrkill () at <stdin>:2
#1 0x00001e18c62aaf39 in *_libc_abort () at
/usr/src/lib/libc/stdlib/abort.c:52
#2 0x00001e18c62a7838 in *_libc_memcpy (dst0=0x0, src0=0x6, length=0) at
/usr/src/lib/libc/string/memcpy.c:65
#3 0x00001e16a7904a7e in run_permutation (testspec=0x1e16a7d08b20,
nsteps=11, steps=Variable "steps" is not available.
) at isolationtester.c:617
#4 0x00001e16a7904d7c in run_testspec (testspec=0x1e16a7d08b20) at
isolationtester.c:369
#5 0x00001e16a79053ce in main (argc=Variable "argc" is not available.
) at isolationtester.c:251
That looks like malloc() returned NULL. I noticed when writing that
patch that isolationtester has never had any checks for malloc
returning NULL, which is bad, and probably worth fixing, but I didn't
choose to stop and fix it at that time.
I don't know off-hand why you see that problem starting at this commit
and not before, or why it doesn't show up on other machines.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Robert Haas wrote:
That looks like malloc() returned NULL. I noticed when writing that
patch that isolationtester has never had any checks for malloc
returning NULL, which is bad, and probably worth fixing, but I didn't
choose to stop and fix it at that time.
I didn't actually check closely but I wondered whether the pointer
arithmetic is actually correct. Note that the memcpy length is zero.
I doubt malloc returning null is the problem; how could it happen
exactly at the same spot every time the suite has run?
I don't know off-hand why you see that problem starting at this commit
and not before, or why it doesn't show up on other machines.
Perhaps it's only a problem for OpenBSD's libc and not for glibc which
is the most common. The only other openbsd machine in buildfarm doesn't
run the isolation tests.
--
�lvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Thu, Apr 28, 2016 at 7:02 AM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:
Robert Haas wrote:
That looks like malloc() returned NULL. I noticed when writing that
patch that isolationtester has never had any checks for malloc
returning NULL, which is bad, and probably worth fixing, but I didn't
choose to stop and fix it at that time.I didn't actually check closely but I wondered whether the pointer
arithmetic is actually correct. Note that the memcpy length is zero.
I doubt malloc returning null is the problem; how could it happen
exactly at the same spot every time the suite has run?I don't know off-hand why you see that problem starting at this commit
and not before, or why it doesn't show up on other machines.Perhaps it's only a problem for OpenBSD's libc and not for glibc which
is the most common. The only other openbsd machine in buildfarm doesn't
run the isolation tests.
Also happens on OpenBSD 5.8. Isn't this a classic case where memmove
is called for? Replacing the memcpy at line 617 with memmove makes
the tests run successfully, but at first glance the other two
instances of memcpy in run_permutation should also be changed to
memmove, no?
--
Thomas Munro
http://www.enterprisedb.com
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Thu, Apr 28, 2016 at 8:46 AM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
On Thu, Apr 28, 2016 at 7:02 AM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:Robert Haas wrote:
That looks like malloc() returned NULL. I noticed when writing that
patch that isolationtester has never had any checks for malloc
returning NULL, which is bad, and probably worth fixing, but I didn't
choose to stop and fix it at that time.I didn't actually check closely but I wondered whether the pointer
arithmetic is actually correct. Note that the memcpy length is zero.
I doubt malloc returning null is the problem; how could it happen
exactly at the same spot every time the suite has run?I don't know off-hand why you see that problem starting at this commit
and not before, or why it doesn't show up on other machines.Perhaps it's only a problem for OpenBSD's libc and not for glibc which
is the most common. The only other openbsd machine in buildfarm doesn't
run the isolation tests.Also happens on OpenBSD 5.8. Isn't this a classic case where memmove
is called for? Replacing the memcpy at line 617 with memmove makes
the tests run successfully, but at first glance the other two
instances of memcpy in run_permutation should also be changed to
memmove, no?
That abort at memcpy.c:65 is indeed a check for overlapping regions:
That leaves the question of why the backtrace shows arguments that
wouldn't trigger it (for example memcpy exits early for length == 0).
But it appears that those stack arguments are mangled by the time gdb
dumps them, like this:
#include <string.h>
int main(int argc, char *argv[])
{
char buffer[] = "hello world";
memcpy(&buffer[0], &buffer[1], 2);
return 0;
}
#0 0x00000bce27eab90a in kill () at <stdin>:2
#1 0x00000bce27ee5b19 in abort () at /usr/src/lib/libc/stdlib/abort.c:53
#2 0x00000bce27ebcde8 in memcpy (dst0=0xf7ec5, src0=0x6, length=0) at
/usr/src/lib/libc/string/memcpy.c:65
#3 0x00000bcb5fc00c7a in main (argc=1, argv=0x7f7fffff6818) at foo.c:6
Suggested patch attached.
--
Thomas Munro
http://www.enterprisedb.com
Attachments:
memmove.patchapplication/octet-stream; name=memmove.patchDownload
diff --git a/src/test/isolation/isolationtester.c b/src/test/isolation/isolationtester.c
index 2969ce9..908a7ce 100644
--- a/src/test/isolation/isolationtester.c
+++ b/src/test/isolation/isolationtester.c
@@ -554,8 +554,8 @@ run_permutation(TestSpec *testspec, int nsteps, Step **steps)
/* Remove that step from the waiting[] array. */
if (w + 1 < nwaiting)
- memcpy(&waiting[w], &waiting[w + 1],
- (nwaiting - (w + 1)) * sizeof(Step *));
+ memmove(&waiting[w], &waiting[w + 1],
+ (nwaiting - (w + 1)) * sizeof(Step *));
nwaiting--;
break;
@@ -582,8 +582,8 @@ run_permutation(TestSpec *testspec, int nsteps, Step **steps)
/* This one finished, too! */
errorstep[nerrorstep++] = waiting[w];
if (w + 1 < nwaiting)
- memcpy(&waiting[w], &waiting[w + 1],
- (nwaiting - (w + 1)) * sizeof(Step *));
+ memmove(&waiting[w], &waiting[w + 1],
+ (nwaiting - (w + 1)) * sizeof(Step *));
nwaiting--;
}
}
@@ -614,8 +614,8 @@ run_permutation(TestSpec *testspec, int nsteps, Step **steps)
{
errorstep[nerrorstep++] = waiting[w];
if (w + 1 < nwaiting)
- memcpy(&waiting[w], &waiting[w + 1],
- (nwaiting - (w + 1)) * sizeof(Step *));
+ memmove(&waiting[w], &waiting[w + 1],
+ (nwaiting - (w + 1)) * sizeof(Step *));
nwaiting--;
}
}
Thomas Munro <thomas.munro@enterprisedb.com> writes:
Also happens on OpenBSD 5.8. Isn't this a classic case where memmove
is called for? Replacing the memcpy at line 617 with memmove makes
the tests run successfully, but at first glance the other two
instances of memcpy in run_permutation should also be changed to
memmove, no?
Yeah, that's clearly busted. Surprising that it has not failed on
any other platforms.
Suggested patch attached.
Will push in a moment.
regards, tom lane
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 2016-04-28 00:15, Tom Lane wrote:
Thomas Munro <thomas.munro@enterprisedb.com> writes:
Also happens on OpenBSD 5.8. Isn't this a classic case where memmove
is called for? Replacing the memcpy at line 617 with memmove makes
the tests run successfully, but at first glance the other two
instances of memcpy in run_permutation should also be changed to
memmove, no?Yeah, that's clearly busted. Surprising that it has not failed on
any other platforms.Suggested patch attached.
Will push in a moment.
And that seems to have done the trick. I started a manual run of HEAD
on curculio and now it's green on the build farm.
Thanks!
/Mikael
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
=?UTF-8?Q?Mikael_Kjellstr=c3=b6m?= <mikael.kjellstrom@mksoft.nu> writes:
And that seems to have done the trick. I started a manual run of HEAD
on curculio and now it's green on the build farm.
Thanks!
Thank *you*, for putting up a buildfarm member that caught this bug.
regards, tom lane
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers