Duplicate-key-detection failure case found in btree

Started by Tom Laneabout 24 years ago4 messages
#1Tom Lane
tgl@sss.pgh.pa.us

I have found a case in nbtree where it is possible for two concurrent
transactions to insert the same key value in a unique index :-(

The reason is that _bt_check_unique can (of course) only detect keys
that are already in the index. To prevent concurrent insertion of
duplicate keys, we have to rely on locking. Two would-be inserters
of the same key must lock the same target page in _bt_doinsert, so
they cannot run the check at the same time. The second to obtain
the lock will see the first one's already-inserted key.

However, suppose that the index contains many existing instances
of the same key (presumably pointing at deleted-but-not-yet-vacuumed
tuples). If the existing instances span several index pages, it is
likely that _bt_insertonpg will decide to insert the new key on one
of the pages to the right of the original target key. As presently
coded, _bt_insertonpg drops the write lock it has before acquiring
write lock on the next page to the right. This creates a window
wherein another process could perform _bt_check_unique, fail to
find any non-dead instances of the key, and decide that it's okay
to insert the same key.

The fix is to acquire the next page's write lock before we drop the
current page's. This is deadlock-free since no writer ever tries
to chain write-locks to the left (in fact the same thing is done in
the page split logic).

I am not convinced that this explains any of the field reports of
duplicate rows that we've seen, but it's clearly a bug. Will commit
the fix shortly.

regards, tom lane

#2Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Tom Lane (#1)
Re: Duplicate-key-detection failure case found in btree

The fix is to acquire the next page's write lock before we drop the
current page's. This is deadlock-free since no writer ever tries
to chain write-locks to the left (in fact the same thing is done in
the page split logic).

I am not convinced that this explains any of the field reports of
duplicate rows that we've seen, but it's clearly a bug. Will commit
the fix shortly.

Sounds good to me. Good find.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
#3Jan Wieck
janwieck@yahoo.com
In reply to: Bruce Momjian (#2)
Re: Duplicate-key-detection failure case found in btree

Bruce Momjian wrote:

The fix is to acquire the next page's write lock before we drop the
current page's. This is deadlock-free since no writer ever tries
to chain write-locks to the left (in fact the same thing is done in
the page split logic).

I am not convinced that this explains any of the field reports of
duplicate rows that we've seen, but it's clearly a bug. Will commit
the fix shortly.

Sounds good to me. Good find.

It always scares me that bugs like this can live untriggered
for multiple releases. What else is lingering under the
surface ...

And I absolutely agree with Tom, there is no connection
between this bug and the duplicate rows reported. Good catch
anyway.

Jan

--

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me. #
#================================================== JanWieck@Yahoo.com #

_________________________________________________________
Do You Yahoo!?
Get your free @yahoo.com address at http://mail.yahoo.com

#4Tom Lane
tgl@sss.pgh.pa.us
In reply to: Jan Wieck (#3)
Re: Duplicate-key-detection failure case found in btree

Jan Wieck <janwieck@yahoo.com> writes:

It always scares me that bugs like this can live untriggered
for multiple releases. What else is lingering under the
surface ...

FWIW, I don't think that this bug did survive for multiple releases
(though it very nearly made it into 7.2). The present form of
bt_check_unique and bt_insertonpg was new code in 7.1, replacing
the BTP_CHAIN method of handling identical keys. That code had a
lot of bugs of its own, but I don't think it had this one. I will
take the blame for introducing this bug into 7.1 ...

regards, tom lane