_hash_addovflpage has a bug
It looks to to me like the recent hash index changes have left
_hash_addovflpage slightly broken. I think that if that function
reaches the point where it calls _hash_getbuf() to fetch the next page
in the bucket chain, we also need to clear retain_pin. Otherwise,
we'll erroneously think that we're supposed to retain a pin even when
the current page is an overflow page rather than the primary bucket
page, which is wrong.
A larger question, I suppose, is why this function wants to be certain
of adding a new page even if someone else has already done so. It
seems like it might be smarter for it to just return the newly added
page to the caller and let the caller try to use it. _hash_doinsert
has an Assert() that the tuple fits on the returned page, but that
could be deleted. As it stands, if two backends try to insert a tuple
into the same full page at the same time, both of them will add an
overflow page and we'll end up with 2 overflow pages each containing 1
tuple.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Sat, Jan 7, 2017 at 2:33 AM, Robert Haas <robertmhaas@gmail.com> wrote:
It looks to to me like the recent hash index changes have left
_hash_addovflpage slightly broken. I think that if that function
reaches the point where it calls _hash_getbuf() to fetch the next page
in the bucket chain, we also need to clear retain_pin. Otherwise,
we'll erroneously think that we're supposed to retain a pin even when
the current page is an overflow page rather than the primary bucket
page, which is wrong.
How? I think we are ensuring that we retain pin only if it is a
bucket page. The check is as below:
if ((pageopaque->hasho_flag & LH_BUCKET_PAGE) && retain_pin)
A larger question, I suppose, is why this function wants to be certain
of adding a new page even if someone else has already done so. It
seems like it might be smarter for it to just return the newly added
page to the caller and let the caller try to use it. _hash_doinsert
has an Assert() that the tuple fits on the returned page, but that
could be deleted. As it stands, if two backends try to insert a tuple
into the same full page at the same time, both of them will add an
overflow page and we'll end up with 2 overflow pages each containing 1
tuple.
I think it is because in the current algorithm we first get an
overflow page and then add it to the end. Now we can change it such
that first, we acquire the lock on the tail page, check if we still
need an overflow page, if so, then proceed, else return the already
added overflow page.
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Fri, Jan 6, 2017 at 11:32 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
On Sat, Jan 7, 2017 at 2:33 AM, Robert Haas <robertmhaas@gmail.com> wrote:
It looks to to me like the recent hash index changes have left
_hash_addovflpage slightly broken. I think that if that function
reaches the point where it calls _hash_getbuf() to fetch the next page
in the bucket chain, we also need to clear retain_pin. Otherwise,
we'll erroneously think that we're supposed to retain a pin even when
the current page is an overflow page rather than the primary bucket
page, which is wrong.How? I think we are ensuring that we retain pin only if it is a
bucket page. The check is as below:
if ((pageopaque->hasho_flag & LH_BUCKET_PAGE) && retain_pin)
Oh, right. So I guess there's no bug, but I still think that's pretty
ugly. How about:
if (retain_pin)
LockBuffer(buf, BUFFER_LOCK_UNLOCK);
else
_hash_relbuf(rel, buf);
retain_pin = false;
instead?
A larger question, I suppose, is why this function wants to be certain
of adding a new page even if someone else has already done so. It
seems like it might be smarter for it to just return the newly added
page to the caller and let the caller try to use it. _hash_doinsert
has an Assert() that the tuple fits on the returned page, but that
could be deleted. As it stands, if two backends try to insert a tuple
into the same full page at the same time, both of them will add an
overflow page and we'll end up with 2 overflow pages each containing 1
tuple.I think it is because in the current algorithm we first get an
overflow page and then add it to the end. Now we can change it such
that first, we acquire the lock on the tail page, check if we still
need an overflow page, if so, then proceed, else return the already
added overflow page.
For the WAL patch, this is all going to need to be atomic anyway,
right? Like, you have to allocate the overflow page and add it to the
bucket chain as a single logged operation? Not sure exactly how that
works out in terms of locking.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Mon, Jan 9, 2017 at 11:59 PM, Robert Haas <robertmhaas@gmail.com> wrote:
On Fri, Jan 6, 2017 at 11:32 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
On Sat, Jan 7, 2017 at 2:33 AM, Robert Haas <robertmhaas@gmail.com> wrote:
It looks to to me like the recent hash index changes have left
_hash_addovflpage slightly broken. I think that if that function
reaches the point where it calls _hash_getbuf() to fetch the next page
in the bucket chain, we also need to clear retain_pin. Otherwise,
we'll erroneously think that we're supposed to retain a pin even when
the current page is an overflow page rather than the primary bucket
page, which is wrong.How? I think we are ensuring that we retain pin only if it is a
bucket page. The check is as below:
if ((pageopaque->hasho_flag & LH_BUCKET_PAGE) && retain_pin)Oh, right. So I guess there's no bug, but I still think that's pretty
ugly. How about:if (retain_pin)
LockBuffer(buf, BUFFER_LOCK_UNLOCK);
else
_hash_relbuf(rel, buf);
retain_pin = false;instead?
Yeah, we can write code that way, but then it is better to rely just
on retain_pin variable in the function and add an Assert for bucket
page whenever we are retaining pin. How about doing something like
attached patch?
A larger question, I suppose, is why this function wants to be certain
of adding a new page even if someone else has already done so. It
seems like it might be smarter for it to just return the newly added
page to the caller and let the caller try to use it. _hash_doinsert
has an Assert() that the tuple fits on the returned page, but that
could be deleted. As it stands, if two backends try to insert a tuple
into the same full page at the same time, both of them will add an
overflow page and we'll end up with 2 overflow pages each containing 1
tuple.I think it is because in the current algorithm we first get an
overflow page and then add it to the end. Now we can change it such
that first, we acquire the lock on the tail page, check if we still
need an overflow page, if so, then proceed, else return the already
added overflow page.For the WAL patch, this is all going to need to be atomic anyway,
right? Like, you have to allocate the overflow page and add it to the
bucket chain as a single logged operation?
Yes.
Not sure exactly how that
works out in terms of locking.
We have to change the locking order as mentioned above by me. This
change is already present in that patch, so maybe we add the check as
suggested by you along with that patch. Now, another thing we could
do is to extract those changes from WAL patch, but I am not sure if it
is worth the effort.
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Attachments:
improve_code_hash_add_ovfl_page_v1.patchapplication/octet-stream; name=improve_code_hash_add_ovfl_page_v1.patchDownload
diff --git a/src/backend/access/hash/hashovfl.c b/src/backend/access/hash/hashovfl.c
index d2e8f64..e8928ef 100644
--- a/src/backend/access/hash/hashovfl.c
+++ b/src/backend/access/hash/hashovfl.c
@@ -128,11 +128,17 @@ _hash_addovflpage(Relation rel, Buffer metabuf, Buffer buf, bool retain_pin)
break;
/* we assume we do not need to write the unmodified page */
- if ((pageopaque->hasho_flag & LH_BUCKET_PAGE) && retain_pin)
+ if (retain_pin)
+ {
+ /* pin will be retained only for the primary bucket page */
+ Assert(pageopaque->hasho_flag & LH_BUCKET_PAGE);
LockBuffer(buf, BUFFER_LOCK_UNLOCK);
+ }
else
_hash_relbuf(rel, buf);
+ retain_pin = false;
+
buf = _hash_getbuf(rel, nextblkno, HASH_WRITE, LH_OVERFLOW_PAGE);
}
@@ -150,8 +156,12 @@ _hash_addovflpage(Relation rel, Buffer metabuf, Buffer buf, bool retain_pin)
/* logically chain overflow page to previous page */
pageopaque->hasho_nextblkno = BufferGetBlockNumber(ovflbuf);
MarkBufferDirty(buf);
- if ((pageopaque->hasho_flag & LH_BUCKET_PAGE) && retain_pin)
+ if (retain_pin)
+ {
+ /* pin will be retained only for the primary bucket page */
+ Assert(pageopaque->hasho_flag & LH_BUCKET_PAGE);
LockBuffer(buf, BUFFER_LOCK_UNLOCK);
+ }
else
_hash_relbuf(rel, buf);
On Mon, Jan 9, 2017 at 10:46 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
Yeah, we can write code that way, but then it is better to rely just
on retain_pin variable in the function and add an Assert for bucket
page whenever we are retaining pin. How about doing something like
attached patch?
Committed.
Not sure exactly how that
works out in terms of locking.We have to change the locking order as mentioned above by me. This
change is already present in that patch, so maybe we add the check as
suggested by you along with that patch. Now, another thing we could
do is to extract those changes from WAL patch, but I am not sure if it
is worth the effort.
I'm not sure at this point, either.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers