pgsql: Add isolationtester spec for old heapam.c bug

Started by Alvaro Herreraabout 10 years ago6 messageshackers
Jump to latest
#1Alvaro Herrera
alvherre@2ndquadrant.com

Add isolationtester spec for old heapam.c bug

In 0e5680f4737a, I fixed a bug in heapam that caused spurious deadlocks
when multiple updates concurrently attempted to modify the old version
of an updated tuple whose new version was key-share locked. I proposed
an isolationtester spec file that reproduced the bug, but back then
isolationtester wasn't mature enough to be able to run it. Now that
38f8bdcac498 is in the tree, we can have this spec file too.

Discussion: /messages/by-id/20141212205254.GC1768@alvh.no-ip.org

Branch
------
master

Details
-------
http://git.postgresql.org/pg/commitdiff/c9578135f769072e2597b88402f256a398279c91

Modified Files
--------------
src/test/isolation/expected/tuplelock-update.out | 24 ++++++++++++++++++++
src/test/isolation/isolation_schedule | 1 +
src/test/isolation/specs/tuplelock-update.spec | 28 ++++++++++++++++++++++++
3 files changed, 53 insertions(+)

--
Sent via pgsql-committers mailing list (pgsql-committers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-committers

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Alvaro Herrera (#1)
Re: pgsql: Add isolationtester spec for old heapam.c bug

Alvaro Herrera <alvherre@alvh.no-ip.org> writes:

Add isolationtester spec for old heapam.c bug

Hmmm ....

http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=spoonbill&amp;dt=2016-02-27%2000%3A00%3A06

This failure looks a lot like the timing-related problems I was chasing
last week with deadlock-hard on the CLOBBER_CACHE_ALWAYS critters.
spoonbill isn't CLOBBER_CACHE_ALWAYS, but it uses some weird malloc debug
stuff that slows it down by similar orders of magnitude. You seem to need
to think about how to make this test less timing-dependent.

regards, tom lane

--
Sent via pgsql-committers mailing list (pgsql-committers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-committers

#3Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#2)
Re: pgsql: Add isolationtester spec for old heapam.c bug

I wrote:

Alvaro Herrera <alvherre@alvh.no-ip.org> writes:

Add isolationtester spec for old heapam.c bug

Hmmm ....

http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=spoonbill&amp;dt=2016-02-27%2000%3A00%3A06

This failure looks a lot like the timing-related problems I was chasing
last week with deadlock-hard on the CLOBBER_CACHE_ALWAYS critters.
spoonbill isn't CLOBBER_CACHE_ALWAYS, but it uses some weird malloc debug
stuff that slows it down by similar orders of magnitude. You seem to need
to think about how to make this test less timing-dependent.

The plot thickens:

http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=guaibasaurus&amp;dt=2016-02-29%2016%3A17%3A01

guaibasaurus is not a particularly slow machine, and it's not using any
special build flags AFAICT. So I'm not sure what to make of this case,
except that it proves the timing problem can manifest on normal builds.

regards, tom lane

--
Sent via pgsql-committers mailing list (pgsql-committers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-committers

#4Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Tom Lane (#3)
Re: pgsql: Add isolationtester spec for old heapam.c bug

Tom Lane wrote:

I wrote:

Alvaro Herrera <alvherre@alvh.no-ip.org> writes:

Add isolationtester spec for old heapam.c bug

Hmmm ....

http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=spoonbill&amp;dt=2016-02-27%2000%3A00%3A06

This failure looks a lot like the timing-related problems I was chasing
last week with deadlock-hard on the CLOBBER_CACHE_ALWAYS critters.
spoonbill isn't CLOBBER_CACHE_ALWAYS, but it uses some weird malloc debug
stuff that slows it down by similar orders of magnitude. You seem to need
to think about how to make this test less timing-dependent.

The plot thickens:

http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=guaibasaurus&amp;dt=2016-02-29%2016%3A17%3A01

guaibasaurus is not a particularly slow machine, and it's not using any
special build flags AFAICT. So I'm not sure what to make of this case,
except that it proves the timing problem can manifest on normal builds.

Hmm, I suppose I could fix this by using three different advisory locks
rather than a single one. (My assumption is that the timing dependency
is the order in which the backends are awakened when the advisory lock
is released.) I would release the locks one by one rather than all
together.

--
�lvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-committers mailing list (pgsql-committers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-committers

#5Tom Lane
tgl@sss.pgh.pa.us
In reply to: Alvaro Herrera (#4)
Re: pgsql: Add isolationtester spec for old heapam.c bug

Alvaro Herrera <alvherre@2ndquadrant.com> writes:

Tom Lane wrote:

guaibasaurus is not a particularly slow machine, and it's not using any
special build flags AFAICT. So I'm not sure what to make of this case,
except that it proves the timing problem can manifest on normal builds.

Hmm, I suppose I could fix this by using three different advisory locks
rather than a single one. (My assumption is that the timing dependency
is the order in which the backends are awakened when the advisory lock
is released.) I would release the locks one by one rather than all
together.

Sounds plausible. You would probably need several seconds' pg_sleep() in
between the lock releases to ensure that even on slow/overloaded machines,
there's enough time for all wakened backends to do what they're supposed
to do.

regards, tom lane

--
Sent via pgsql-committers mailing list (pgsql-committers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-committers

#6Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Tom Lane (#5)
Re: [COMMITTERS] pgsql: Add isolationtester spec for old heapam.c bug

Tom Lane wrote:

Alvaro Herrera <alvherre@2ndquadrant.com> writes:

Tom Lane wrote:

guaibasaurus is not a particularly slow machine, and it's not using any
special build flags AFAICT. So I'm not sure what to make of this case,
except that it proves the timing problem can manifest on normal builds.

Hmm, I suppose I could fix this by using three different advisory locks
rather than a single one. (My assumption is that the timing dependency
is the order in which the backends are awakened when the advisory lock
is released.) I would release the locks one by one rather than all
together.

Sounds plausible.

Pushed. We'll see how they behave now.

You would probably need several seconds' pg_sleep() in between the
lock releases to ensure that even on slow/overloaded machines, there's
enough time for all wakened backends to do what they're supposed to
do.

I don't really like sleeps in tests, because instead of 0.01 seconds in
the normal case it now takes 10.01 seconds. But fixing it is more
trouble than it's probably worth, so I added 5s sleeps.

If someone wants to get rid of that idle time (i.e. have a mechanism
that causes it to stop sleeping when the update is done) I would be
happy to give it a look.

--
�lvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers