Assertion failure in hash_kill_items()
I bumped into an assertion failure, while playing with variants of the
test case that Alexander Kuzmenkov wrote to exercise hash index page
cleanup [1]/messages/by-id/CALzhyqxrc1ZHYmf5V8NE+yMboqVg7xZrQM7K2c7VS0p1v8z42w@mail.gmail.com. This is master-only, related to the recent changes in how
buffers are marked dirty.
CREATE TABLE hash_cleanup_heap(keycol INT);
CREATE INDEX hash_cleanup_index on hash_cleanup_heap USING HASH (keycol);
BEGIN;
INSERT INTO hash_cleanup_heap SELECT 1 FROM generate_series(1, 500) as i;
ROLLBACK;
SET enable_seqscan = off;
SET enable_bitmapscan = off;
SELECT count(*) FROM hash_cleanup_heap WHERE keycol = 1;
TRAP: failed Assert("BufferIsValid(buffer)"), File:
"../src/backend/storage/buffer/bufmgr.c", Line: 509, PID: 1275145
postgres: heikki postgres [local] SELECT(ExceptionalCondition+0x84)
[0xaaaaecab8650]
postgres: heikki postgres [local] SELECT(+0x20498b4) [0xaaaaec4698b4]
postgres: heikki postgres [local] SELECT(+0x205db78) [0xaaaaec47db78]
postgres: heikki postgres [local] SELECT(BufferBeginSetHintBits+0x44)
[0xaaaaec47df58]
postgres: heikki postgres [local] SELECT(_hash_kill_items+0xa8c)
[0xaaaaeb6a51c8]
postgres: heikki postgres [local] SELECT(_hash_next+0x2a8) [0xaaaaeb69d780]
postgres: heikki postgres [local] SELECT(hashgettuple+0x5a4)
[0xaaaaeb682920]
postgres: heikki postgres [local] SELECT(index_getnext_tid+0x4b4)
[0xaaaaeb73322c]
postgres: heikki postgres [local] SELECT(index_getnext_slot+0x90)
[0xaaaaeb733ebc]
postgres: heikki postgres [local] SELECT(+0x19507d8) [0xaaaaebd707d8]
postgres: heikki postgres [local] SELECT(+0x189a47c) [0xaaaaebcba47c]
postgres: heikki postgres [local] SELECT(+0x189a588) [0xaaaaebcba588]
postgres: heikki postgres [local] SELECT(ExecScan+0x18c) [0xaaaaebcbab24]
postgres: heikki postgres [local] SELECT(+0x1953a00) [0xaaaaebd73a00]
postgres: heikki postgres [local] SELECT(+0x188bf9c) [0xaaaaebcabf9c]
postgres: heikki postgres [local] SELECT(+0x18cb754) [0xaaaaebceb754]
postgres: heikki postgres [local] SELECT(+0x18ccd70) [0xaaaaebcecd70]
postgres: heikki postgres [local] SELECT(+0x18dae40) [0xaaaaebcfae40]
postgres: heikki postgres [local] SELECT(+0x18d98d8) [0xaaaaebcf98d8]
postgres: heikki postgres [local] SELECT(+0x188bf9c) [0xaaaaebcabf9c]
postgres: heikki postgres [local] SELECT(+0x1858814) [0xaaaaebc78814]
postgres: heikki postgres [local] SELECT(+0x1863318) [0xaaaaebc83318]
postgres: heikki postgres [local] SELECT(standard_ExecutorRun+0x594)
[0xaaaaebc7a034]
The first attached patch fixes it. It's pretty straightforward: the
function was using so->currPos.buf, but that's only valid if the page
was already pinned on entry to the function. It should be using the
local 'buf' variable instead.
The second patch simplifies the condition in the 'unlock_page' part.
This isn't new, and isn't needed to fix the bug, it just caught my eye
while looking at this. I don't understand why the condition was the way
it was, checking just 'havePin' seems sufficient and more correct to me.
Am I missing something?
[1]: /messages/by-id/CALzhyqxrc1ZHYmf5V8NE+yMboqVg7xZrQM7K2c7VS0p1v8z42w@mail.gmail.com
/messages/by-id/CALzhyqxrc1ZHYmf5V8NE+yMboqVg7xZrQM7K2c7VS0p1v8z42w@mail.gmail.com
- Heikki
Attachments:
0001-fix-crash-on-killing-items-on-hash-overflow-pages.patchtext/x-patch; charset=UTF-8; name=0001-fix-crash-on-killing-items-on-hash-overflow-pages.patchDownload+2-3
0002-simplify-the-condition-in-unlock_page.patchtext/x-patch; charset=UTF-8; name=0002-simplify-the-condition-in-unlock_page.patchDownload+1-3
Hi,
On 2026-03-17 19:15:10 +0200, Heikki Linnakangas wrote:
I bumped into an assertion failure, while playing with variants of the test
case that Alexander Kuzmenkov wrote to exercise hash index page cleanup [1].
This is master-only, related to the recent changes in how buffers are marked
dirty.
Sorry, I had hoped to push a fix for that already, after it was reported in
/messages/by-id/vjtmvwvbxt7w5uyacxpzibpj65ewcb7uqaqbhd4arvnjbp5jqz@5ksdh6fsyqve
but real life intervened.
I was planning to commit it together with an addition to
src/test/modules/index/specs/killtuples.spec
Unfortunately that made the change a good bit more verbose, as a naive
addition would report a number of buffer accesses that seemed not necessarily
reliable to me. So I updated the 'result' step to just return true/false
depending on whether there were any accesses.
I'll go and work on pushing that.
The first attached patch fixes it. It's pretty straightforward: the function
was using so->currPos.buf, but that's only valid if the page was already
pinned on entry to the function. It should be using the local 'buf' variable
instead.
Yea, that was a stupid bug on my part. No idea how I ended up with it. At
first I thought it might have been a rebase issue, but I didn't see any
relevant change that could have caused that.
The second patch simplifies the condition in the 'unlock_page' part. This
isn't new, and isn't needed to fix the bug, it just caught my eye while
looking at this. I don't understand why the condition was the way it was,
checking just 'havePin' seems sufficient and more correct to me. Am I
missing something?
I can't see anything either, quite odd. Most likely explanation seems to be
that something changed during the development of 7c75ef571579.
Indeed, the first version of the patch from
/messages/by-id/CAE9k0Pm3KTx93K8_5j6VMzG4h5F+SyknxUwXrN-zqSZ9X8ZS3w@mail.gmail.com
was using "if (so->hashso_bucket_buf == so->currPos.buf)" both at the start
and end of _hash_kill_items(). So likely it was just an accident during patch
revisions.
Greetings,
Andres Freund
Hi,
On 2026-03-17 13:40:10 -0400, Andres Freund wrote:
On 2026-03-17 19:15:10 +0200, Heikki Linnakangas wrote:
I bumped into an assertion failure, while playing with variants of the test
case that Alexander Kuzmenkov wrote to exercise hash index page cleanup [1].
This is master-only, related to the recent changes in how buffers are marked
dirty.Sorry, I had hoped to push a fix for that already, after it was reported in
/messages/by-id/vjtmvwvbxt7w5uyacxpzibpj65ewcb7uqaqbhd4arvnjbp5jqz@5ksdh6fsyqve
but real life intervened.I was planning to commit it together with an addition to
src/test/modules/index/specs/killtuples.specUnfortunately that made the change a good bit more verbose, as a naive
addition would report a number of buffer accesses that seemed not necessarily
reliable to me. So I updated the 'result' step to just return true/false
depending on whether there were any accesses.I'll go and work on pushing that.
Done, as of f5eb854ab6d.
Greetings,
Andres
On 17/03/2026 19:40, Andres Freund wrote:
On 2026-03-17 19:15:10 +0200, Heikki Linnakangas wrote:
The second patch simplifies the condition in the 'unlock_page' part. This
isn't new, and isn't needed to fix the bug, it just caught my eye while
looking at this. I don't understand why the condition was the way it was,
checking just 'havePin' seems sufficient and more correct to me. Am I
missing something?I can't see anything either, quite odd. Most likely explanation seems to be
that something changed during the development of 7c75ef571579.Indeed, the first version of the patch from
/messages/by-id/CAE9k0Pm3KTx93K8_5j6VMzG4h5F+SyknxUwXrN-zqSZ9X8ZS3w@mail.gmail.com
was using "if (so->hashso_bucket_buf == so->currPos.buf)" both at the start
and end of _hash_kill_items(). So likely it was just an accident during patch
revisions.
Thanks for archeological excavation; pushed this second patch now.
- Heikki