PANIC: invalid index offnum: 186 when processing BRIN indexes in VACUUM

Started by Tomas Vondra over 8 years ago | 42 messages | pgsql-hackers
#1 Tomas Vondra <tomas.vondra@2ndquadrant.com>

Hi,

while doing some weekend hacking & testing on the BRIN patches I posted
recently, I ran into PANICs in VACUUM, when it summarizes data inserted
concurrently (in another session):

PANIC: invalid index offnum: 186

Initially I thought it was a bug in my patches, but apparently it's not,
and I can reproduce it easily on current master (846fcc8516).

I'm not sure if/how this is related to the BRIN autosummarization issue
reported by Justin Pryzby about two weeks ago (thread [1]), but I don't
see any segfaults and the messages are always exactly the same.

[1]: /messages/by-id/20171014035732.GB31726@telsasoft.com

Reproducing the issue is simple:

create table brin_test (a int, b bigint, c float,
d double precision, e uuid);
create index on brin_test using brin (a);
create index on brin_test using brin (b);
create index on brin_test using brin (c);
create index on brin_test using brin (d);
create index on brin_test using brin (e);

and then run

insert into brin_test select
mod(i,100000)/25,
mod(i,100000)/25,
mod(i,100000)/25,
mod(i,100000)/25,
md5((mod(i,100000)/25)::text)::uuid
from generate_series(1,100000) s(i) \watch 0.1

vacuum brin_test \watch 1

And sooner or later (for me it only takes a minute or two in most cases)
the VACUUM session should fail with the PANIC message mentioned above.
It always fails with the message, only the value (offset) changes.

The stack trace always looks exactly the same - see the attachment.

At first it seemed the idxrel is always the index on 'e' (i.e. the UUID
column), but it seems I also got failures on the other indexes.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachments:

brin-vacuum.txt (text/plain)
#2 Tomas Vondra <tomas.vondra@2ndquadrant.com>
In reply to: Tomas Vondra (#1)
Re: PANIC: invalid index offnum: 186 when processing BRIN indexes in VACUUM

FWIW I can reproduce this on REL_10_STABLE, but not on REL9_6_STABLE. So
it seems to be due to something that changed in the last release.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#3 Alvaro Herrera <alvherre@2ndquadrant.com>
In reply to: Tomas Vondra (#2)
Re: PANIC: invalid index offnum: 186 when processing BRIN indexes in VACUUM

Tomas Vondra wrote:

FWIW I can reproduce this on REL_10_STABLE, but not on REL9_6_STABLE. So
it seems to be due to something that changed in the last release.

I've been trying to reproduce it for half an hour with no success (I
think my laptop is just too slow). I bet this is related to the
addition of PageIndexTupleOverwrite (commit b1328d78f) though I find it
more likely that this was not *caused* by that commit but rather just
made it easier to hit.

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


#4 Michael Paquier <michael@paquier.xyz>
In reply to: Alvaro Herrera (#3)
Re: Re: PANIC: invalid index offnum: 186 when processing BRIN indexes in VACUUM

On Mon, Oct 30, 2017 at 9:42 AM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

Tomas Vondra wrote:

FWIW I can reproduce this on REL_10_STABLE, but not on REL9_6_STABLE. So
it seems to be due to something that changed in the last release.

I've been trying to reproduce it for half an hour with no success (I
think my laptop is just too slow). I bet this is related to the
addition of PageIndexTupleOverwrite (commit b1328d78f) though I find it
more likely that this was not *caused* by that commit but rather just
made it easier to hit.

b1328d78f is close enough, but per what I see that's actually not
true. I have been able to reproduce the problem, which shows up within
a window of 2-4 minutes. Still sometimes it can take way longer, and I
spotted one test where it took 15 minutes to show up... So, bisecting
with a test that looks for core files for 20 minutes, I am seeing that
the following commit is actually at fault:
commit 24992c6db9fd40f342db1f22747ec9e56483796d
Author: Tom Lane <tgl@sss.pgh.pa.us>
Date: Fri Sep 9 19:00:59 2016 -0400

Rewrite PageIndexDeleteNoCompact into a form that only deletes 1 tuple.

The full generality of deleting an arbitrary number of tuples is no longer
needed, so let's save some code and cycles by replacing the original coding
with an implementation based on PageIndexTupleDelete.
--
Michael


#5 Tom Lane <tgl@sss.pgh.pa.us>
In reply to: Michael Paquier (#4)
Re: Re: PANIC: invalid index offnum: 186 when processing BRIN indexes in VACUUM

Michael Paquier <michael.paquier@gmail.com> writes:

b1328d78f is close enough, but per what I see that's actually not
true. I have been able to reproduce the problem, which shows up within
a window of 2-4 minutes. Still sometimes it can take way longer, and I
spotted one test where it took 15 minutes to show up... So, bisecting
with a test that looks for core files for 20 minutes, I am seeing that
the following commit is actually at fault:

commit 24992c6db9fd40f342db1f22747ec9e56483796d
Author: Tom Lane <tgl@sss.pgh.pa.us>
Date: Fri Sep 9 19:00:59 2016 -0400

Rewrite PageIndexDeleteNoCompact into a form that only deletes 1 tuple.

[ scratches head ... ] Not sure how that could've introduced any
problems? Will look.

regards, tom lane


#6 Tomas Vondra <tomas.vondra@2ndquadrant.com>
In reply to: Tom Lane (#5)
Re: Re: PANIC: invalid index offnum: 186 when processing BRIN indexes in VACUUM

On 10/30/2017 09:03 PM, Tom Lane wrote:

Michael Paquier <michael.paquier@gmail.com> writes:

b1328d78f is close enough, but per what I see that's actually not
true. I have been able to reproduce the problem, which shows up within
a window of 2-4 minutes. Still sometimes it can take way longer, and I
spotted one test where it took 15 minutes to show up... So, bisecting
with a test that looks for core files for 20 minutes, I am seeing that
the following commit is actually at fault:

commit 24992c6db9fd40f342db1f22747ec9e56483796d
Author: Tom Lane <tgl@sss.pgh.pa.us>
Date: Fri Sep 9 19:00:59 2016 -0400

Rewrite PageIndexDeleteNoCompact into a form that only deletes 1 tuple.

[ scratches head ... ] Not sure how that could've introduced any
problems? Will look.

Not sure, but I can confirm Michael's findings - I've been unable to
reproduce the issue on 1a4be103a5 even after 20 minutes, and on
24992c6db9 it failed after only 2.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


#7 Tom Lane <tgl@sss.pgh.pa.us>
In reply to: Tomas Vondra (#6)
Re: Re: PANIC: invalid index offnum: 186 when processing BRIN indexes in VACUUM

Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:

On 10/30/2017 09:03 PM, Tom Lane wrote:

[ scratches head ... ] Not sure how that could've introduced any
problems? Will look.

Not sure, but I can confirm Michael's findings - I've been unable to
reproduce the issue on 1a4be103a5 even after 20 minutes, and on
24992c6db9 it failed after only 2.

Hmm. The index offnum being complained of is one past the end of the
lp array. I think I see what about that commit changed the behavior:
the old code for PageIndexDeleteNoCompact never changed the length
of the lp array, except in the corner case where the page is becoming
completely empty. The new code will shorten the lp array (decrease
phdr->pd_lower) if it's told to remove the last item. So if you make
two successive calls specifying the same offnum, and it's the last one
on the page, the second one will fail with the symptoms we see here.
However, so far as I can see, a sequence like that would have failed
before too, just with a different error message, because once the
first call had marked the item unused, the second call would not see
it as a candidate to match. So I'm not quite sure how that's related
... but it seems like it must be.

Anyway, my opinion ATM is that PageIndexTupleDeleteNoCompact is fine,
and what we're looking at is some kind of bug in summarize_range
or brin_doupdate, causing them to pass an offset beyond the end of
the page in some corner case.
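The stale-offnum failure described above can be sketched with a toy model (plain Python, not PostgreSQL code; `pd_lower` mirrors the real page-header field, but the layout is simplified):

```python
ITEM_ID_SIZE = 4   # stand-in for sizeof(ItemIdData)
HEADER_SIZE = 24   # page header preceding the line-pointer (lp) array

class ToyPage:
    """Toy model of a page's line-pointer array: pd_lower marks the end
    of the lp array, so max offset = (pd_lower - header) / item size."""
    def __init__(self, n_items):
        self.pd_lower = HEADER_SIZE + n_items * ITEM_ID_SIZE

    def max_offset(self):
        return (self.pd_lower - HEADER_SIZE) // ITEM_ID_SIZE

    def delete_no_compact(self, offnum):
        # Post-24992c6db9 behavior: removing the last item shortens the
        # lp array, so a stale offnum can now point past its end.
        if offnum > self.max_offset():
            raise RuntimeError(f"invalid index offnum: {offnum}")
        if offnum == self.max_offset():
            self.pd_lower -= ITEM_ID_SIZE

page = ToyPage(186)
page.delete_no_compact(186)       # first delete of the last item succeeds
assert page.max_offset() == 185   # lp array is now one entry shorter
try:
    page.delete_no_compact(186)   # second delete with a stale offnum
except RuntimeError as e:
    print(e)                      # invalid index offnum: 186
```

Two callers using the same cached offnum for the last item thus hit the error on the second delete, matching the PANIC seen in the logs.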

regards, tom lane


#8 Tom Lane <tgl@sss.pgh.pa.us>
In reply to: Tom Lane (#7)
Re: Re: PANIC: invalid index offnum: 186 when processing BRIN indexes in VACUUM

I wrote:

Hmm. The index offnum being complained of is one past the end of the
lp array. I think I see what about that commit changed the behavior:
the old code for PageIndexDeleteNoCompact never changed the length
of the lp array, except in the corner case where the page is becoming
completely empty. The new code will shorten the lp array (decrease
phdr->pd_lower) if it's told to remove the last item. So if you make
two successive calls specifying the same offnum, and it's the last one
on the page, the second one will fail with the symptoms we see here.
However, so far as I can see, a sequence like that would have failed
before too, just with a different error message, because once the
first call had marked the item unused, the second call would not see
it as a candidate to match. So I'm not quite sure how that's related
... but it seems like it must be.

I'm still confused about why it didn't fail before, but after adding
some additional code to log each call of PageIndexTupleDeleteNoCompact,
I think I've got a smoking gun:

2017-10-30 18:18:44.321 EDT [10932] LOG: deleting tuple 292 (of 292) in rel brin_test_c_idx page 2
2017-10-30 18:18:44.321 EDT [10932] STATEMENT: vacuum brin_test
2017-10-30 18:18:44.393 EDT [10932] LOG: deleting tuple 292 (of 292) in rel brin_test_d_idx page 2
2017-10-30 18:18:44.393 EDT [10932] STATEMENT: vacuum brin_test
2017-10-30 18:18:53.428 EDT [10932] LOG: deleting tuple 186 (of 186) in rel brin_test_e_idx page 3
2017-10-30 18:18:53.428 EDT [10932] STATEMENT: vacuum brin_test
2017-10-30 18:19:13.794 EDT [10938] LOG: deleting tuple 186 (of 186) in rel brin_test_e_idx page 4
2017-10-30 18:19:13.794 EDT [10938] STATEMENT: insert into brin_test select
mod(i,100000)/25,
mod(i,100000)/25,
mod(i,100000)/25,
mod(i,100000)/25,
md5((mod(i,100000)/25)::text)::uuid
from generate_series(1,100000) s(i)
2017-10-30 18:19:13.795 EDT [10932] LOG: deleting tuple 186 (of 185) in rel brin_test_e_idx page 4
2017-10-30 18:19:13.795 EDT [10932] STATEMENT: vacuum brin_test
2017-10-30 18:19:13.795 EDT [10932] PANIC: invalid index offnum: 186
2017-10-30 18:19:13.795 EDT [10932] STATEMENT: vacuum brin_test

So what happened here is that the inserting process decided to
summarize concurrently with the VACUUM process, and the inserting
process deleted (or maybe just updated/moved?) the placeholder tuple
that VACUUM was expecting to delete, and then we get the PANIC because
the tuple we're expecting to delete is already gone.

So: I put the blame on the fact that summarize_range() thinks that
the tuple offset it has for the placeholder tuple is guaranteed to
hold good, even across possibly-long intervals where it's holding
no lock on the containing buffer.

Fixing this without creating new problems is beyond my familiarity
with the BRIN code. It looks like it might be nontrivial :-(

regards, tom lane

Attachments:

hack-to-log-PageIndexTupleDeleteNoCompact.patch (text/x-diff, +28/-14)
#9 Alvaro Herrera <alvherre@2ndquadrant.com>
In reply to: Alvaro Herrera (#3)
Re: Re: PANIC: invalid index offnum: 186 when processing BRIN indexes in VACUUM

Thanks everyone for the analysis downthread. Here's a test case that
provokes the crash faster. Initialize with

create table brin_test (a text);
create index on brin_test using brin (a) with (pages_per_range = 1);

Then in one psql, run this:
select brin_summarize_new_values('brin_test_a_idx') \watch 0.1

and a pgbench running a script with this line (one client is enough):

insert into brin_test values (repeat(chr(ascii('A') + int4(random() * 61)), int4(random() * 200)));

It crashes in 10-20 seconds for me.

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


#10 Alvaro Herrera <alvherre@2ndquadrant.com>
In reply to: Tom Lane (#8)
Re: Re: PANIC: invalid index offnum: 186 when processing BRIN indexes in VACUUM

Tom Lane wrote:

So: I put the blame on the fact that summarize_range() thinks that
the tuple offset it has for the placeholder tuple is guaranteed to
hold good, even across possibly-long intervals where it's holding
no lock on the containing buffer.

Yeah, I think this is a pretty reasonable explanation for the problem.
I don't understand why it doesn't fail in 9.6.

Fixing this without creating new problems is beyond my familiarity
with the BRIN code. It looks like it might be nontrivial :-(

Looking.

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


#11 Tom Lane <tgl@sss.pgh.pa.us>
In reply to: Alvaro Herrera (#10)
Re: Re: PANIC: invalid index offnum: 186 when processing BRIN indexes in VACUUM

Alvaro Herrera <alvherre@alvh.no-ip.org> writes:

Tom Lane wrote:

So: I put the blame on the fact that summarize_range() thinks that
the tuple offset it has for the placeholder tuple is guaranteed to
hold good, even across possibly-long intervals where it's holding
no lock on the containing buffer.

Yeah, I think this is a pretty reasonable explanation for the problem.
I don't understand why it doesn't fail in 9.6.

Yeah, we're still missing an understanding of why we didn't see it
before; the inadequate locking was surely there before. I'm guessing
that somehow the previous behavior of PageIndexDeleteNoCompact managed
to mask the problem (perhaps only by not throwing an error, which doesn't
imply that the index state was good afterwards). But I don't see quite
how it did that.

One thing I think we do know now is that the bug is triggered by two
concurrent executions of summarize_range. So I'd look for edge cases
like whether the placeholder tuple can be deleted and then reinserted
into the same lp index.

regards, tom lane


#12 Michael Paquier <michael@paquier.xyz>
In reply to: Tom Lane (#11)
Re: Re: PANIC: invalid index offnum: 186 when processing BRIN indexes in VACUUM

On Tue, Oct 31, 2017 at 4:56 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Alvaro Herrera <alvherre@alvh.no-ip.org> writes:

Tom Lane wrote:

So: I put the blame on the fact that summarize_range() thinks that
the tuple offset it has for the placeholder tuple is guaranteed to
hold good, even across possibly-long intervals where it's holding
no lock on the containing buffer.

Yeah, I think this is a pretty reasonable explanation for the problem.
I don't understand why it doesn't fail in 9.6.

Yeah, we're still missing an understanding of why we didn't see it
before; the inadequate locking was surely there before. I'm guessing
that somehow the previous behavior of PageIndexDeleteNoCompact managed
to mask the problem (perhaps only by not throwing an error, which doesn't
imply that the index state was good afterwards). But I don't see quite
how it did that.

Because 24992c6d has added a check on the offset number by using
PageIndexTupleDeleteNoCompact() in brin_doupdate() making checks
tighter, no? I have not tested, and I lack knowledge about the brin
code, but it seems to me that if we had a similar check then things
could likely blow up.
--
Michael


#13 Tom Lane <tgl@sss.pgh.pa.us>
In reply to: Michael Paquier (#12)
Re: Re: PANIC: invalid index offnum: 186 when processing BRIN indexes in VACUUM

Michael Paquier <michael.paquier@gmail.com> writes:

On Tue, Oct 31, 2017 at 4:56 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Yeah, we're still missing an understanding of why we didn't see it
before; the inadequate locking was surely there before.

Because 24992c6d has added a check on the offset number by using
PageIndexTupleDeleteNoCompact() in brin_doupdate() making checks
tighter, no?

No, I don't see how it's tighter. The old code matched supplied
offnum(s) against the indexes of not-unused items, and then after
that loop it complained if they weren't all matched. So it should
also have failed, albeit with a different error message, if it were
passed an offnum corresponding to a no-longer-live tuple.

regards, tom lane


#14 Tom Lane <tgl@sss.pgh.pa.us>
In reply to: Tom Lane (#13)
Re: Re: PANIC: invalid index offnum: 186 when processing BRIN indexes in VACUUM

So in a few more runs this morning using Alvaro's simplified test case,
I have seen the following behaviors not previously reported:

1. Crashes in PageIndexTupleOverwrite, which has the same "invalid index
offnum: %u" error report as PageIndexTupleDeleteNoCompact. I note the
same message appears in plain PageIndexTupleDelete as well.
I think we've been too hasty to assume all instances of this came out of
PageIndexTupleDeleteNoCompact.

2. Crashes in the data-insertion process, not only the process running
summarize_range:

#1 0x0000003b78a33c75 in abort () at abort.c:92
#2 0x000000000086e527 in errfinish (dummy=<value optimized out>) at elog.c:557
#3 0x000000000086f334 in elog_finish (elevel=<value optimized out>,
fmt=<value optimized out>) at elog.c:1378
#4 0x000000000075d48f in PageIndexTupleDeleteNoCompact (
rel=<value optimized out>, buf=2772, page=0x7f0188438080 "\002", offnum=39)
at bufpage.c:995
#5 0x0000000000476fd2 in brin_doupdate (idxrel=0x7f0186633e90,
pagesPerRange=1, revmap=0xba, heapBlk=2542, oldbuf=2772, oldoff=39,
origtup=0x2784010, origsz=8, newtup=0x2784098, newsz=400,
samepage=0 '\000') at brin_pageops.c:270
#6 0x0000000000475430 in brininsert (idxRel=0x7f0186633e90,
values=0x7ffce7b827f0, nulls=0x7ffce7b828f0 "", heaptid=0x279fb3c,
heapRel=<value optimized out>, checkUnique=<value optimized out>,
indexInfo=0x279e370) at brin.c:292
#7 0x000000000061da18 in ExecInsertIndexTuples (slot=0x279e638,
tupleid=0x279fb3c, estate=0x279dd10, noDupErr=0 '\000', specConflict=0x0,
arbiterIndexes=0x0) at execIndexing.c:387

The postmaster log for this, with even more debug logging thrown in:

2017-10-31 11:58:03.466 EDT [23506] LOG: summarize_range(brin_test_a_idx,2542,2543) created placeholder at 68,39
2017-10-31 11:58:03.466 EDT [23506] STATEMENT: select brin_summarize_new_values('brin_test_a_idx')
2017-10-31 11:58:03.466 EDT [23506] LOG: brin_doupdate(brin_test_a_idx) at 68,39: getinsertbuffer returned page 70
2017-10-31 11:58:03.466 EDT [23506] STATEMENT: select brin_summarize_new_values('brin_test_a_idx')
2017-10-31 11:58:03.466 EDT [23506] LOG: deleting tuple 39 (of 39) in rel brin_test_a_idx page 68
2017-10-31 11:58:03.466 EDT [23506] STATEMENT: select brin_summarize_new_values('brin_test_a_idx')
2017-10-31 11:58:03.466 EDT [23506] LOG: brin_doupdate(brin_test_a_idx) at 68,39: placed at 70,1
2017-10-31 11:58:03.466 EDT [23506] STATEMENT: select brin_summarize_new_values('brin_test_a_idx')
2017-10-31 11:58:03.466 EDT [23509] LOG: brin_doupdate(brin_test_a_idx) at 68,39: getinsertbuffer returned page 70
2017-10-31 11:58:03.466 EDT [23509] STATEMENT: insert into brin_test values (repeat(chr(ascii('A') + int4(random() * 61)), int4(random() * 200)));
2017-10-31 11:58:03.466 EDT [23509] LOG: deleting tuple 39 (of 38) in rel brin_test_a_idx page 68
2017-10-31 11:58:03.466 EDT [23509] STATEMENT: insert into brin_test values (repeat(chr(ascii('A') + int4(random() * 61)), int4(random() * 200)));
2017-10-31 11:58:03.466 EDT [23509] PANIC: invalid index offnum: 39
2017-10-31 11:58:03.466 EDT [23509] STATEMENT: insert into brin_test values (repeat(chr(ascii('A') + int4(random() * 61)), int4(random() * 200)));
2017-10-31 11:58:03.466 EDT [23506] LOG: summarize_range(brin_test_a_idx,2542,2543): update at 68,39 succeeded
2017-10-31 11:58:03.466 EDT [23506] STATEMENT: select brin_summarize_new_values('brin_test_a_idx')
2017-10-31 11:58:04.517 EDT [23493] LOG: server process (PID 23509) was terminated by signal 6: Aborted

So what evidently happened here is that brininsert decided it needed to
insert into the placeholder tuple that summarize_range had just created,
but by the time it got lock on the buffer, summarize_range had deleted
the placeholder and instead created a new tuple on another page.

HAH: I see how the connection to PageIndexTupleDeleteNoCompact's
change applies. After re-locking the buffer, brin_doupdate tries
to check whether the tuple's been deleted, but it needs to do it
more like this, else it might be accessing off the end of the
lp array:

	/*
	 * Check that the old tuple wasn't updated concurrently: it might have
	 * moved someplace else entirely ...
	 */
-	if (!ItemIdIsNormal(oldlp))
+	if (oldoff > PageGetMaxOffsetNumber(oldpage) ||
+	    !ItemIdIsNormal(oldlp))
	{
		LockBuffer(oldbuf, BUFFER_LOCK_UNLOCK);

This is a pre-existing bug, but before 24992c6d, it would only
have manifested in the (very?) unusual corner case that
PageIndexDeleteNoCompact was able to reset the page to empty.

However, when I fix that and then run Alvaro's test case, I get
ERROR: corrupted BRIN index: inconsistent range map
so there's more creepy-crawlies around here somewhere.

I really don't understand how any of this "let's release the buffer
lock and then take it back later" logic is supposed to work reliably.

BTW, while I'm bitching, it seems fairly insane from a concurrency
standpoint that brin_getinsertbuffer is calling RecordPageWithFreeSpace
while holding at least one and possibly two buffer locks. Shouldn't
that be done someplace else?

regards, tom lane


#15 Alvaro Herrera <alvherre@2ndquadrant.com>
In reply to: Tom Lane (#14)
Re: Re: PANIC: invalid index offnum: 186 when processing BRIN indexes in VACUUM

Tom Lane wrote:

So in a few more runs this morning using Alvaro's simplified test case,
I have seen the following behaviors not previously reported:

1. Crashes in PageIndexTupleOverwrite, which has the same "invalid index
offnum: %u" error report as PageIndexTupleDeleteNoCompact. I note the
same message appears in plain PageIndexTupleDelete as well.
I think we've been too hasty to assume all instances of this came out of
PageIndexTupleDeleteNoCompact.

Ah, I wasn't paying close attention to the originator routine of the
message, but you're right, I see this one too.

2. Crashes in the data-insertion process, not only the process running
summarize_range:

Yeah, I saw these. I was expecting it, since the two routines
(brininsert and summarize_range) pretty much share the insertion
protocol.

I really don't understand how any of this "let's release the buffer
lock and then take it back later" logic is supposed to work reliably.

Yeah, evidently that was way too optimistic and I'll need to figure out
a better mechanism to handle this.

The intention was to avoid deadlocks while locking the target page for
the insertion: by having both pages start unlocked we can simply lock
them in block number order. If we keep the page containing the tuple
locked, I don't see how to reliably avoid a deadlock while acquiring a
buffer to insert the new tuple.
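The deadlock-avoidance idea, locking both buffers in block-number order, can be sketched like this (a toy Python illustration of the locking protocol, not the actual brin_getinsertbuffer code):

```python
import threading

# One lock per buffer, keyed by block number (toy stand-ins for
# buffer content locks).
buffer_locks = {blkno: threading.Lock() for blkno in range(8)}

def lock_pair_in_order(blk_a, blk_b):
    """Acquire two buffer locks in block-number order, so that two
    backends locking the same pair can never deadlock."""
    first, second = sorted((blk_a, blk_b))
    buffer_locks[first].acquire()
    if second != first:
        buffer_locks[second].acquire()
    return first, second

def unlock_pair(first, second):
    if second != first:
        buffer_locks[second].release()
    buffer_locks[first].release()

# Both callers acquire in the same order regardless of which page they
# start from, which is what removes the deadlock.
order1 = lock_pair_in_order(5, 2); unlock_pair(*order1)
order2 = lock_pair_in_order(2, 5); unlock_pair(*order2)
assert order1 == order2 == (2, 5)
```

The cost of this scheme is exactly what the thread is wrestling with: both pages must start out unlocked, so any state read earlier (such as the placeholder tuple's offset) can go stale before the locks are taken.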

BTW, while I'm bitching, it seems fairly insane from a concurrency
standpoint that brin_getinsertbuffer is calling RecordPageWithFreeSpace
while holding at least one and possibly two buffer locks. Shouldn't
that be done someplace else?

Hmm. I spent a lot of effort (commit ccc4c074994d) to avoid leaving
pages uninitialized / unrecorded in FSM. I left this on purpose on the
rationale that trying to fix it would make the callsites more convoluted
(the retry logic doesn't help). But as I recall this was supposed to be
done only in the rare case where the buffer could not be returned to
caller ... but that's not what the current code does, so there is
something wrong there.

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


#16 Alvaro Herrera <alvherre@2ndquadrant.com>
In reply to: Tom Lane (#14)
Re: Re: PANIC: invalid index offnum: 186 when processing BRIN indexes in VACUUM

Tom Lane wrote:

I really don't understand how any of this "let's release the buffer
lock and then take it back later" logic is supposed to work reliably.

So summarize_range first inserts the placeholder tuple, which is there
purposefully for other processes to update concurrently; then, without
blocking any other process, it scans the page range and updates the
placeholder tuple (this could take a long time, so it'd be a bad idea
to hold a buffer lock for that long).

I think what we should do is rethink the locking considerations in
brin_doupdate vs. brinGetTupleForHeapBlock, and how they are used in
summarize_range and brininsert. In summarize_range, instead of hoping
that in some cases we will not need to re-obtain the placeholder tuple,
just do that in all cases keeping the buffer locked until the tuple is
updated.

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


#17 Tom Lane <tgl@sss.pgh.pa.us>
In reply to: Alvaro Herrera (#16)
Re: Re: PANIC: invalid index offnum: 186 when processing BRIN indexes in VACUUM

Alvaro Herrera <alvherre@alvh.no-ip.org> writes:

Tom Lane wrote:

I really don't understand how any of this "let's release the buffer
lock and then take it back later" logic is supposed to work reliably.

So summarize_range first inserts the placeholder tuple, which is there
purposefully for other processes to update concurrently; then, without
blocking any other process, scan the page range and update the
placeholder tuple (this could take a long time, so it'd be a bad idea
to hold buffer lock for that long).

Yeah, agreed, and your upthread point about avoiding deadlock when we
need to take two buffer locks at the same time is also something that
it's hard to see any other way around. Maybe we just have to find a
way to make the existing structure work. The sticking point is that
not only might the tuple we expected have been deleted, but someone
could have put an unrelated tuple in its place. Are you confident
that you can detect that and recover from it? If you're sure that
brin_tuples_equal() is trustworthy for this purpose, then maybe
we just have some run-of-the-mill bugs to find, like the off-the-end
bug I spotted in brin_doupdate. There's apparently at least one
more, but given the error message it must be something like not
checking for a page to have turned into a revmap page. Shouldn't
be too hard to find...

regards, tom lane


#18 Tom Lane <tgl@sss.pgh.pa.us>
In reply to: Tom Lane (#17)
Re: Re: PANIC: invalid index offnum: 186 when processing BRIN indexes in VACUUM

I wrote:

maybe
we just have some run-of-the-mill bugs to find, like the off-the-end
bug I spotted in brin_doupdate. There's apparently at least one
more, but given the error message it must be something like not
checking for a page to have turned into a revmap page. Shouldn't
be too hard to find...

Actually, I think it might be as simple as the attached.
brin_getinsertbuffer checks for the old page having turned into revmap,
but the "samepage" path in brin_doupdate does not :-(

With this applied, Alvaro's version of the test case has survived
without error for quite a bit longer than its former MTBF. There
might still be some issues though in other code paths.

regards, tom lane

Attachments:

fix-brin_doupdate.patch (text/x-diff, +10/-10)
#19 Tomas Vondra <tomas.vondra@2ndquadrant.com>
In reply to: Tom Lane (#18)
Re: Re: PANIC: invalid index offnum: 186 when processing BRIN indexes in VACUUM

Hi,

On 10/31/2017 08:46 PM, Tom Lane wrote:

I wrote:

maybe
we just have some run-of-the-mill bugs to find, like the off-the-end
bug I spotted in brin_doupdate. There's apparently at least one
more, but given the error message it must be something like not
checking for a page to have turned into a revmap page. Shouldn't
be too hard to find...

Actually, I think it might be as simple as the attached.
brin_getinsertbuffer checks for the old page having turned into revmap,
but the "samepage" path in brin_doupdate does not :-(

With this applied, Alvaro's version of the test case has survived
without error for quite a bit longer than its former MTBF. There
might still be some issues though in other code paths.

That does fix the crashes for me - I've been unable to reproduce any
even after one hour (it took a couple of minutes to crash before).

Unfortunately, I think we still have a problem ... I've been wondering
if we end up producing correct indexes, so I've done a simple test.

1) create the table as before

2) let the insert + vacuum run for some time, to see if there are
crashes (result: no crashes after one hour, inserting ~92M rows)

3) do a bunch of random updates on the data (while still doing the
concurrent vacuum in another session)

4) run a bunch of simple queries to compare the results, essentially

-- BRIN index
SET enable_bitmapscan = on;
SELECT COUNT(*) FROM brin_test WHERE a = $1;

-- seq scan
SET enable_bitmapscan = off;
SELECT COUNT(*) FROM brin_test WHERE a = $1;

and unfortunately what I get is not particularly pleasant:

test=# set enable_bitmapscan = on;
SET
test=# select count(*) from brin_test where a = 0;
count
-------
9062
(1 row)

test=# set enable_bitmapscan = off;
SET
test=# select count(*) from brin_test where a = 0;
count
-------
9175
(1 row)

Attached is a SQL script with the commands I used. You'll need to copy
the commands into multiple psql sessions, though, to simulate concurrent
activity.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachments:

brin-test.sql (application/sql)
#20 Tomas Vondra <tomas.vondra@2ndquadrant.com>
In reply to: Tomas Vondra (#19)
Re: Re: PANIC: invalid index offnum: 186 when processing BRIN indexes in VACUUM

On 10/31/2017 11:44 PM, Tomas Vondra wrote:

...
Unfortunately, I think we still have a problem ... I've been wondering
if we end up producing correct indexes, so I've done a simple test.

1) create the table as before

2) let the insert + vacuum run for some time, to see if there are
crashes (result: no crashes after one hour, inserting ~92M rows)

3) do a bunch of random updates on the data (while still doing the
concurrent vacuum in another session)

4) run a bunch of simple queries to compare the results, essentially

-- BRIN index
SET enable_bitmapscan = on;
SELECT COUNT(*) FROM brin_test WHERE a = $1;

-- seq scan
SET enable_bitmapscan = off;
SELECT COUNT(*) FROM brin_test WHERE a = $1;

and unfortunately what I get is not particularly pleasant:

test=# set enable_bitmapscan = on;
SET
test=# select count(*) from brin_test where a = 0;
count
-------
9062
(1 row)

test=# set enable_bitmapscan = off;
SET
test=# select count(*) from brin_test where a = 0;
count
-------
9175
(1 row)

Attached is a SQL script with the commands I used. You'll need to copy
the commands into multiple psql sessions, though, to simulate concurrent
activity.

FWIW I can reproduce this on 9.5, and I don't even need to run the
UPDATE part. That is, INSERT + VACUUM running concurrently is enough to
produce broken BRIN indexes :-(

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


#21 Alvaro Herrera <alvherre@2ndquadrant.com>
In reply to: Tomas Vondra (#20)
#22 Alvaro Herrera <alvherre@2ndquadrant.com>
In reply to: Alvaro Herrera (#21)
#23 Tom Lane <tgl@sss.pgh.pa.us>
In reply to: Alvaro Herrera (#22)
#24 Alvaro Herrera <alvherre@2ndquadrant.com>
In reply to: Tom Lane (#23)
#25 Tom Lane <tgl@sss.pgh.pa.us>
In reply to: Alvaro Herrera (#24)
#26 Alvaro Herrera <alvherre@2ndquadrant.com>
In reply to: Tomas Vondra (#19)
#27 Tom Lane <tgl@sss.pgh.pa.us>
In reply to: Alvaro Herrera (#26)
#28 Alvaro Herrera <alvherre@2ndquadrant.com>
In reply to: Tom Lane (#27)
#29 Tom Lane <tgl@sss.pgh.pa.us>
In reply to: Alvaro Herrera (#28)
#30 Alvaro Herrera <alvherre@2ndquadrant.com>
In reply to: Tom Lane (#29)
#31 Tom Lane <tgl@sss.pgh.pa.us>
In reply to: Alvaro Herrera (#30)
#32 Alvaro Herrera <alvherre@2ndquadrant.com>
In reply to: Tom Lane (#31)
#33 Tomas Vondra <tomas.vondra@2ndquadrant.com>
In reply to: Alvaro Herrera (#26)
#34 Alvaro Herrera <alvherre@2ndquadrant.com>
In reply to: Tomas Vondra (#33)
#35 Tom Lane <tgl@sss.pgh.pa.us>
In reply to: Alvaro Herrera (#34)
#36 Alvaro Herrera <alvherre@2ndquadrant.com>
In reply to: Tom Lane (#35)
#37 Tom Lane <tgl@sss.pgh.pa.us>
In reply to: Alvaro Herrera (#36)
#38 Alvaro Herrera <alvherre@2ndquadrant.com>
In reply to: Tom Lane (#37)
#39 Alvaro Herrera <alvherre@2ndquadrant.com>
In reply to: Alvaro Herrera (#38)
#40 Jeff Janes <jeff.janes@gmail.com>
In reply to: Alvaro Herrera (#36)
#41 Tom Lane <tgl@sss.pgh.pa.us>
In reply to: Jeff Janes (#40)
#42 Jeff Janes <jeff.janes@gmail.com>
In reply to: Tom Lane (#41)