parallel restore item dependencies

Started by Andrew Dunstan, almost 17 years ago, 4 messages
#1 Andrew Dunstan
andrew@dunslane.net

OK, I've worked out why I am seeing deadlocks etc. from parallel restore
on FK items.

In my original patch, I looked at all the dependencies of a candidate
item and compared them with the dependencies of the running items to
see if there was a potential locking clash. However, Tom, in his
admirable reworking of my patch, restricted the list of potential
clashing items (lockDeps) to "TABLE" items, if any. This would probably
have been ok if we hadn't just beforehand transferred all TABLE
dependencies in POST_DATA items to the corresponding TABLE DATA item.
The result is that we get empty lockDeps lists on all items - I'm
surprised we haven't had more complaints about deadlock or failing locks.
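
To make the clash check concrete, here is a minimal C sketch of the comparison described above: a candidate is held back if something it needs to lock is also a dependency of a running item, or the other way round. The ItemSketch struct and both function names are hypothetical, invented for illustration; they are not pg_restore's actual data structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a restore item; not pg_restore's real struct. */
typedef struct
{
    const int  *deps;       /* dump ids this item depends on */
    int         ndeps;
    const int  *lockDeps;   /* subset of deps that need an exclusive lock */
    int         nlockDeps;
} ItemSketch;

/* True if any id in a[] also appears in b[]. */
static bool
ids_overlap(const int *a, int na, const int *b, int nb)
{
    assert(na == 0 || a != NULL);
    assert(nb == 0 || b != NULL);

    for (int i = 0; i < na; i++)
        for (int j = 0; j < nb; j++)
            if (a[i] == b[j])
                return true;
    return false;
}

/*
 * True if the candidate should wait: one side wants an exclusive lock
 * on something the other side depends on.
 */
static bool
has_lock_clash(const ItemSketch *cand,
               const ItemSketch *running, int nrunning)
{
    for (int i = 0; i < nrunning; i++)
    {
        if (ids_overlap(cand->lockDeps, cand->nlockDeps,
                        running[i].deps, running[i].ndeps) ||
            ids_overlap(running[i].lockDeps, running[i].nlockDeps,
                        cand->deps, cand->ndeps))
            return true;
    }
    return false;
}
```

With every lockDeps list empty, has_lock_clash() can never return true in this sketch, which matches the symptom: nothing is ever held back, so items that lock the same table can run at the same time.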

A simple fix that would probably work would be to adjust the filter to
include TABLE DATA items, so the relevant statement would read:

    if (tocsByDumpId[depid - 1] &&
        (strcmp(tocsByDumpId[depid - 1]->desc, "TABLE") == 0 ||
         strcmp(tocsByDumpId[depid - 1]->desc, "TABLE DATA") == 0))
        lockids[nlockids++] = depid;
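
For context, the statement could sit inside a loop like the one in the following sketch, which is self-contained so it can be compiled on its own. The TocSketch struct, the collect_lock_deps() name, and its parameters are assumptions made for illustration; only the if-condition is taken from the statement itself.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, pared-down stand-in for a TOC entry. */
typedef struct
{
    int         dumpId;     /* 1-based id, matching the depid - 1 indexing */
    const char *desc;       /* e.g. "TABLE", "TABLE DATA", "INDEX" */
} TocSketch;

/*
 * Copy into lockids[] those dependency ids whose entry is a TABLE or
 * TABLE DATA item.  lockids[] must have room for ndeps ids.
 * Returns the number of ids collected.
 */
static int
collect_lock_deps(TocSketch **tocsByDumpId, int maxDumpId,
                  const int *deps, int ndeps, int *lockids)
{
    int         nlockids = 0;

    for (int i = 0; i < ndeps; i++)
    {
        int         depid = deps[i];

        assert(depid >= 1 && depid <= maxDumpId);

        /* Slots may be NULL when no entry exists for that dump id. */
        if (tocsByDumpId[depid - 1] &&
            (strcmp(tocsByDumpId[depid - 1]->desc, "TABLE") == 0 ||
             strcmp(tocsByDumpId[depid - 1]->desc, "TABLE DATA") == 0))
            lockids[nlockids++] = depid;
    }
    return nlockids;
}
```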

Perhaps a better fix would move the code that sets up the lockDeps so
that it runs before we adjust the dependencies.
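
The ordering issue can be reproduced in a toy model. The sketch below assumes a hypothetical EntrySketch array indexed by dump id minus one, where each TABLE entry records the id of its TABLE DATA item; neither the struct nor the two functions are the real pg_restore code.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical TOC entry for illustration only. */
typedef struct
{
    const char *desc;           /* "TABLE", "TABLE DATA", ... */
    int         tableDataId;    /* id of the matching TABLE DATA item, 0 if none */
} EntrySketch;

/* Replace each dependency on a TABLE with its TABLE DATA item, if any. */
static void
repoint_table_deps(const EntrySketch *entries, int nentries,
                   int *deps, int ndeps)
{
    for (int i = 0; i < ndeps; i++)
    {
        assert(deps[i] >= 1 && deps[i] <= nentries);
        const EntrySketch *e = &entries[deps[i] - 1];

        if (strcmp(e->desc, "TABLE") == 0 && e->tableDataId != 0)
            deps[i] = e->tableDataId;
    }
}

/* Count the dependencies whose entry has the given desc. */
static int
count_deps_with_desc(const EntrySketch *entries, int nentries,
                     const int *deps, int ndeps, const char *desc)
{
    int         n = 0;

    for (int i = 0; i < ndeps; i++)
    {
        assert(deps[i] >= 1 && deps[i] <= nentries);
        if (strcmp(entries[deps[i] - 1].desc, desc) == 0)
            n++;
    }
    return n;
}
```

In this model, counting "TABLE" dependencies before repoint_table_deps() finds the table, while counting them afterwards finds nothing, because the id now points at the TABLE DATA entry. Either running the filter first or matching on "TABLE DATA" avoids the empty result.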

I'm moderately confident that either of these fixes will work, but I
think this demonstrates the need for lots of testing, especially with
complex data sets that have lots of dependencies and potentially
deadlocking items.

thoughts?

cheers

andrew

#2 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andrew Dunstan (#1)
Re: parallel restore item dependencies

Andrew Dunstan <andrew@dunslane.net> writes:

> OK, I've worked out why I am seeing deadlocks etc. from parallel restore
> on FK items.

> In my original patch, I looked at all the dependencies of a candidate
> item and compared them with the dependencies of the running items to
> see if there was a potential locking clash. However, Tom, in his
> admirable reworking of my patch, restricted the list of potential
> clashing items (lockDeps) to "TABLE" items, if any. This would probably
> have been ok if we hadn't just beforehand transferred all TABLE
> dependencies in POST_DATA items to the corresponding TABLE DATA item.
> The result is that we get empty lockDeps lists on all items - I'm
> surprised we haven't had more complaints about deadlock or failing locks.

[ scratches head... ] I coulda sworn I tested that when I was hacking
it. I'm running low on steam tonight but will think more about this
tomorrow.

regards, tom lane

#3 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#2)
Re: parallel restore item dependencies

I wrote:

> Andrew Dunstan <andrew@dunslane.net> writes:

>> In my original patch, I looked at all the dependencies of a candidate
>> item and compared them with the dependencies of the running items to
>> see if there was a potential locking clash. However, Tom, in his
>> admirable reworking of my patch, restricted the list of potential
>> clashing items (lockDeps) to "TABLE" items, if any. This would probably
>> have been ok if we hadn't just beforehand transferred all TABLE
>> dependencies in POST_DATA items to the corresponding TABLE DATA item.
>> The result is that we get empty lockDeps lists on all items - I'm
>> surprised we haven't had more complaints about deadlock or failing locks.

> [ scratches head... ] I coulda sworn I tested that when I was hacking
> it. I'm running low on steam tonight but will think more about this
> tomorrow.

I think I have reconstructed what happened: I tested this code before
I decided that repointing the dependencies was a good idea, or else
reordered the sequence of operations in fix_dependencies after that.
It looks to me like the correct fix is just to look for TABLE DATA
not TABLE while setting up lockDeps[], since all the entry types we
care about are POST_DATA items. Anyway, I've committed that, please
try it.

regards, tom lane

#4 Andrew Dunstan
andrew@dunslane.net
In reply to: Tom Lane (#3)
Re: parallel restore item dependencies

Tom Lane wrote:

> I wrote:

>> Andrew Dunstan <andrew@dunslane.net> writes:

>>> In my original patch, I looked at all the dependencies of a candidate
>>> item and compared them with the dependencies of the running items to
>>> see if there was a potential locking clash. However, Tom, in his
>>> admirable reworking of my patch, restricted the list of potential
>>> clashing items (lockDeps) to "TABLE" items, if any. This would probably
>>> have been ok if we hadn't just beforehand transferred all TABLE
>>> dependencies in POST_DATA items to the corresponding TABLE DATA item.
>>> The result is that we get empty lockDeps lists on all items - I'm
>>> surprised we haven't had more complaints about deadlock or failing locks.

>> [ scratches head... ] I coulda sworn I tested that when I was hacking
>> it. I'm running low on steam tonight but will think more about this
>> tomorrow.

> I think I have reconstructed what happened: I tested this code before
> I decided that repointing the dependencies was a good idea, or else
> reordered the sequence of operations in fix_dependencies after that.
> It looks to me like the correct fix is just to look for TABLE DATA
> not TABLE while setting up lockDeps[], since all the entry types we
> care about are POST_DATA items. Anyway, I've committed that, please
> try it.

Passes test.

Thanks.

andrew