Failed Assert in pgstat_assoc_relation

Started by vignesh Cover 3 years ago17 messageshackers
Jump to latest
#1vignesh C
vignesh21@gmail.com

Hi,

While reviewing/testing one of the patches I found the following Assert:
#0 __pthread_kill_implementation (no_tid=0, signo=6,
threadid=139624429171648) at ./nptl/pthread_kill.c:44
#1 __pthread_kill_internal (signo=6, threadid=139624429171648) at
./nptl/pthread_kill.c:78
#2 __GI___pthread_kill (threadid=139624429171648,
signo=signo@entry=6) at ./nptl/pthread_kill.c:89
#3 0x00007efcda6e3476 in __GI_raise (sig=sig@entry=6) at
../sysdeps/posix/raise.c:26
#4 0x00007efcda6c97f3 in __GI_abort () at ./stdlib/abort.c:79
#5 0x00005590bf283139 in ExceptionalCondition
(conditionName=0x5590bf468170 "rel->pgstat_info->relation == NULL",
fileName=0x5590bf46812b "pgstat_relation.c", lineNumber=143) at
assert.c:66
#6 0x00005590bf0ce5f8 in pgstat_assoc_relation (rel=0x7efcce996a48)
at pgstat_relation.c:143
#7 0x00005590beb83046 in initscan (scan=0x5590bfbf4af8, key=0x0,
keep_startblock=false) at heapam.c:343
#8 0x00005590beb8466f in heap_beginscan (relation=0x7efcce996a48,
snapshot=0x5590bfb5a520, nkeys=0, key=0x0, parallel_scan=0x0,
flags=449) at heapam.c:1223
#9 0x00005590bf02af39 in table_beginscan (rel=0x7efcce996a48,
snapshot=0x5590bfb5a520, nkeys=0, key=0x0) at
../../../src/include/access/tableam.h:891
#10 0x00005590bf02bf8a in DefineQueryRewrite (rulename=0x5590bfb281d0
"_RETURN", event_relid=16387, event_qual=0x0, event_type=CMD_SELECT,
is_instead=true, replace=false, action=0x5590bfbf4aa8)
at rewriteDefine.c:447
#11 0x00005590bf02b5ab in DefineRule (stmt=0x5590bfb285c0,
queryString=0x5590bfb277a8 "CREATE RULE \"_RETURN\" AS ON SELECT TO t
DO INSTEAD SELECT * FROM t1;") at rewriteDefine.c:213

I could reproduce this issue with the following steps:
create table t1(c int);
BEGIN;
CREATE TABLE t (c int);
SAVEPOINT q;
CREATE RULE "_RETURN" AS ON SELECT TO t DO INSTEAD SELECT * FROM t1;
select * from t;
ROLLBACK TO q;
CREATE RULE "_RETURN" AS ON SELECT TO t DO INSTEAD SELECT * FROM t1;
ROLLBACK;

Regards,
Vignesh

#2Ajin Cherian
itsajin@gmail.com
In reply to: vignesh C (#1)
Re: Failed Assert in pgstat_assoc_relation

On Mon, Nov 28, 2022 at 8:10 PM vignesh C <vignesh21@gmail.com> wrote:

Hi,

While reviewing/testing one of the patches I found the following Assert:
#0 __pthread_kill_implementation (no_tid=0, signo=6,
threadid=139624429171648) at ./nptl/pthread_kill.c:44
#1 __pthread_kill_internal (signo=6, threadid=139624429171648) at
./nptl/pthread_kill.c:78
#2 __GI___pthread_kill (threadid=139624429171648,
signo=signo@entry=6) at ./nptl/pthread_kill.c:89
#3 0x00007efcda6e3476 in __GI_raise (sig=sig@entry=6) at
../sysdeps/posix/raise.c:26
#4 0x00007efcda6c97f3 in __GI_abort () at ./stdlib/abort.c:79
#5 0x00005590bf283139 in ExceptionalCondition
(conditionName=0x5590bf468170 "rel->pgstat_info->relation == NULL",
fileName=0x5590bf46812b "pgstat_relation.c", lineNumber=143) at
assert.c:66
#6 0x00005590bf0ce5f8 in pgstat_assoc_relation (rel=0x7efcce996a48)
at pgstat_relation.c:143
#7 0x00005590beb83046 in initscan (scan=0x5590bfbf4af8, key=0x0,
keep_startblock=false) at heapam.c:343
#8 0x00005590beb8466f in heap_beginscan (relation=0x7efcce996a48,
snapshot=0x5590bfb5a520, nkeys=0, key=0x0, parallel_scan=0x0,
flags=449) at heapam.c:1223
#9 0x00005590bf02af39 in table_beginscan (rel=0x7efcce996a48,
snapshot=0x5590bfb5a520, nkeys=0, key=0x0) at
../../../src/include/access/tableam.h:891
#10 0x00005590bf02bf8a in DefineQueryRewrite (rulename=0x5590bfb281d0
"_RETURN", event_relid=16387, event_qual=0x0, event_type=CMD_SELECT,
is_instead=true, replace=false, action=0x5590bfbf4aa8)
at rewriteDefine.c:447
#11 0x00005590bf02b5ab in DefineRule (stmt=0x5590bfb285c0,
queryString=0x5590bfb277a8 "CREATE RULE \"_RETURN\" AS ON SELECT TO t
DO INSTEAD SELECT * FROM t1;") at rewriteDefine.c:213

I could reproduce this issue with the following steps:
create table t1(c int);
BEGIN;
CREATE TABLE t (c int);
SAVEPOINT q;
CREATE RULE "_RETURN" AS ON SELECT TO t DO INSTEAD SELECT * FROM t1;
select * from t;
ROLLBACK TO q;
CREATE RULE "_RETURN" AS ON SELECT TO t DO INSTEAD SELECT * FROM t1;
ROLLBACK;

Regards,
Vignesh

I think what is happening here is that the previous relation is not
unlinked when pgstat_init_relation() is called
because the relation is now a view and for relations without storage
the relation is not unlinked in pgstat_init_relation()

void
pgstat_init_relation(Relation rel)
{
char relkind = rel->rd_rel->relkind;

/*
* We only count stats for relations with storage and partitioned tables
*/
if (!RELKIND_HAS_STORAGE(relkind) && relkind != RELKIND_PARTITIONED_TABLE)
{
rel->pgstat_enabled = false;
rel->pgstat_info = NULL;
return;
}

There is a logic in DefineQueryRewrite() which converts a relation to
a view when you create such a rule like the test case does.
So initially the relation had storage, the pgstat_info is linked,
then table is converted to a view, but in init, the previous
relation is not unlinked but when it tries to link a new relation, the
assert fails saying a previous relation is already linked to
pgstat_info

I have made a small patch with a fix, but I am not sure if this is the
right way to fix this.

regards,
Ajin Cherian
Fujitsu Australia

Attachments:

pgstat_assoc_fix_for_views.patchapplication/octet-stream; name=pgstat_assoc_fix_for_views.patchDownload+4-11
#3Tom Lane
tgl@sss.pgh.pa.us
In reply to: vignesh C (#1)
Re: Failed Assert in pgstat_assoc_relation

vignesh C <vignesh21@gmail.com> writes:

I could reproduce this issue with the following steps:
create table t1(c int);
BEGIN;
CREATE TABLE t (c int);
SAVEPOINT q;
CREATE RULE "_RETURN" AS ON SELECT TO t DO INSTEAD SELECT * FROM t1;
select * from t;
ROLLBACK TO q;
CREATE RULE "_RETURN" AS ON SELECT TO t DO INSTEAD SELECT * FROM t1;
ROLLBACK;

Uh-huh. I've not bothered to trace this in detail, but presumably
what is happening is that the first CREATE RULE converts the table
to a view, and then the ROLLBACK undoes that so far as the catalogs
are concerned, but probably doesn't undo related pg_stats state
changes fully. Then we're in a bad state that will cause problems.
(It still crashes if you replace the second CREATE RULE with
"select * from t".)

As far as HEAD is concerned, maybe it's time to nuke the whole
convert-table-to-view kluge entirely? Only pg_dump older than
9.4 will emit such code, so we're really about out of reasons
to keep on maintaining it.

However, I'm not sure that removing that code in v15 will fly,
so maybe we need to make the new pg_stats code a little more
robust against the possibility of a relkind change.

regards, tom lane

#4Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#3)
Re: Failed Assert in pgstat_assoc_relation

Hi,

On 2022-11-28 13:37:16 -0500, Tom Lane wrote:

vignesh C <vignesh21@gmail.com> writes:

I could reproduce this issue with the following steps:
create table t1(c int);
BEGIN;
CREATE TABLE t (c int);
SAVEPOINT q;
CREATE RULE "_RETURN" AS ON SELECT TO t DO INSTEAD SELECT * FROM t1;
select * from t;
ROLLBACK TO q;
CREATE RULE "_RETURN" AS ON SELECT TO t DO INSTEAD SELECT * FROM t1;
ROLLBACK;

Uh-huh. I've not bothered to trace this in detail, but presumably
what is happening is that the first CREATE RULE converts the table
to a view, and then the ROLLBACK undoes that so far as the catalogs
are concerned, but probably doesn't undo related pg_stats state
changes fully. Then we're in a bad state that will cause problems.
(It still crashes if you replace the second CREATE RULE with
"select * from t".)

Yea. I haven't yet fully traced through this, but presumably relcache inval
doesn't fix this because we don't want to loose pending stats after DDL.

Perhaps we need to add a rule about not swapping pgstat* in
RelationClearRelation() when relkind changes?

As far as HEAD is concerned, maybe it's time to nuke the whole
convert-table-to-view kluge entirely? Only pg_dump older than
9.4 will emit such code, so we're really about out of reasons
to keep on maintaining it.

Sounds good to me.

However, I'm not sure that removing that code in v15 will fly,

Agreed, at the very least that'd increase memory usage.

so maybe we need to make the new pg_stats code a little more
robust against the possibility of a relkind change.

Possibly via the relcache code.

Greetings,

Andres Freund

#5Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#4)
Re: Failed Assert in pgstat_assoc_relation

Andres Freund <andres@anarazel.de> writes:

On 2022-11-28 13:37:16 -0500, Tom Lane wrote:

As far as HEAD is concerned, maybe it's time to nuke the whole
convert-table-to-view kluge entirely? Only pg_dump older than
9.4 will emit such code, so we're really about out of reasons
to keep on maintaining it.

Sounds good to me.

Here's a draft patch for that. If we apply this to HEAD then
we only need that klugery in relcache for v15.

regards, tom lane

Attachments:

remove-conversion-of-tables-to-views-1.patchtext/x-diff; charset=us-ascii; name=remove-conversion-of-tables-to-views-1.patchDownload+45-292
#6Andres Freund
andres@anarazel.de
In reply to: Andres Freund (#4)
Re: Failed Assert in pgstat_assoc_relation

Hi,

On 2022-11-28 10:50:13 -0800, Andres Freund wrote:

On 2022-11-28 13:37:16 -0500, Tom Lane wrote:

Uh-huh. I've not bothered to trace this in detail, but presumably
what is happening is that the first CREATE RULE converts the table
to a view, and then the ROLLBACK undoes that so far as the catalogs
are concerned, but probably doesn't undo related pg_stats state
changes fully. Then we're in a bad state that will cause problems.
(It still crashes if you replace the second CREATE RULE with
"select * from t".)

Yea. I haven't yet fully traced through this, but presumably relcache inval
doesn't fix this because we don't want to loose pending stats after DDL.

Perhaps we need to add a rule about not swapping pgstat* in
RelationClearRelation() when relkind changes?

Something like the attached. Still needs a bit of polish, e.g. adding the test
case from above.

I'm a bit uncomfortable adding a function call below
* Perform swapping of the relcache entry contents. Within this
* process the old entry is momentarily invalid, so there *must* be no
* possibility of CHECK_FOR_INTERRUPTS within this sequence. Do it in
* all-in-line code for safety.
but it's not the first, see MemoryContextSetParent().

Greetings,

Andres Freund

Attachments:

relcache_assoc.difftext/x-diff; charset=us-asciiDownload+13-3
#7Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#6)
Re: Failed Assert in pgstat_assoc_relation

Andres Freund <andres@anarazel.de> writes:

Something like the attached. Still needs a bit of polish, e.g. adding the test
case from above.

I'm a bit uncomfortable adding a function call below
* Perform swapping of the relcache entry contents. Within this
* process the old entry is momentarily invalid, so there *must* be no
* possibility of CHECK_FOR_INTERRUPTS within this sequence. Do it in
* all-in-line code for safety.

Ugh. I don't know what pgstat_unlink_relation does, but assuming
that it can never throw an error seems like a pretty bad idea,
especially when you aren't adding that to its API spec (contrast
the comments for MemoryContextSetParent).

Can't that part be done outside the critical section?

regards, tom lane

#8Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#7)
Re: Failed Assert in pgstat_assoc_relation

Hi,

On 2022-11-28 16:33:20 -0500, Tom Lane wrote:

Andres Freund <andres@anarazel.de> writes:

Something like the attached. Still needs a bit of polish, e.g. adding the test
case from above.

I'm a bit uncomfortable adding a function call below
* Perform swapping of the relcache entry contents. Within this
* process the old entry is momentarily invalid, so there *must* be no
* possibility of CHECK_FOR_INTERRUPTS within this sequence. Do it in
* all-in-line code for safety.

Ugh. I don't know what pgstat_unlink_relation does, but assuming
that it can never throw an error seems like a pretty bad idea,

I don't think it'd be an issue - it just resets the pointer from a pgstat
entry to the relcache entry.

But you're right:

Can't that part be done outside the critical section?

we can do that. See the attached.

Do we have any cases of relcache entries changing their relkind?

Greetings,

Andres Freund

Attachments:

relcache_assoc_v2.difftext/x-diff; charset=us-asciiDownload+71-3
#9Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#8)
Re: Failed Assert in pgstat_assoc_relation

Andres Freund <andres@anarazel.de> writes:

Do we have any cases of relcache entries changing their relkind?

Just the table-to-view hack. I'm not aware that there are any other
cases, and it seems hard to credit that there ever will be any.
I think we could get rid of table-to-view in HEAD, and use your patch
only in v15.

regards, tom lane

#10Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#9)
Re: Failed Assert in pgstat_assoc_relation

Hi,

On 2022-12-02 00:08:20 -0500, Tom Lane wrote:

Andres Freund <andres@anarazel.de> writes:

Do we have any cases of relcache entries changing their relkind?

Just the table-to-view hack. I'm not aware that there are any other
cases, and it seems hard to credit that there ever will be any.

I can see some halfway credible scenarios. E.g. converting a view to a
matview, or a table into a partition. I kind of wonder if it's worth keeping
the change, just in case we do - it's not that easy to hit...

I think we could get rid of table-to-view in HEAD, and use your patch
only in v15.

WFM. I'll push it to 15 tomorrow.

Greetings,

Andres Freund

#11Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#10)
Re: Failed Assert in pgstat_assoc_relation

Andres Freund <andres@anarazel.de> writes:

On 2022-12-02 00:08:20 -0500, Tom Lane wrote:

Just the table-to-view hack. I'm not aware that there are any other
cases, and it seems hard to credit that there ever will be any.

I can see some halfway credible scenarios. E.g. converting a view to a
matview, or a table into a partition. I kind of wonder if it's worth keeping
the change, just in case we do - it's not that easy to hit...

I'd suggest putting in an assertion that the relkind isn't changing,
instead. When and if somebody makes a credible feature patch that'd
require relaxing that, we can see what to do.

(There's a couple of places in rewriteHandler.c that could
perhaps be simplified, too.)

regards, tom lane

#12Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#11)
Re: Failed Assert in pgstat_assoc_relation

Hi,

On December 1, 2022 9:48:48 PM PST, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Andres Freund <andres@anarazel.de> writes:

On 2022-12-02 00:08:20 -0500, Tom Lane wrote:

Just the table-to-view hack. I'm not aware that there are any other
cases, and it seems hard to credit that there ever will be any.

I can see some halfway credible scenarios. E.g. converting a view to a
matview, or a table into a partition. I kind of wonder if it's worth keeping
the change, just in case we do - it's not that easy to hit...

I'd suggest putting in an assertion that the relkind isn't changing,
instead.

Sounds like a plan. Will you do that when you remove the table-to-view hack?

Andres
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.

#13Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#12)
Re: Failed Assert in pgstat_assoc_relation

Andres Freund <andres@anarazel.de> writes:

On December 1, 2022 9:48:48 PM PST, Tom Lane <tgl@sss.pgh.pa.us> wrote:

I'd suggest putting in an assertion that the relkind isn't changing,
instead.

Sounds like a plan. Will you do that when you remove the table-to-view hack?

I'd suggest committing it concurrently with the v15 fix, instead,
so that there's a cross-reference to what some future hacker might
need to install if they remove the assertion.

I guess that means that the table-to-view removal has to go in
first. I should be able to take care of that tomorrow, or if
you're in a hurry I don't mind if you commit it for me.

regards, tom lane

#14Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#13)
Re: Failed Assert in pgstat_assoc_relation

Hi,

On 2022-12-02 01:03:35 -0500, Tom Lane wrote:

Andres Freund <andres@anarazel.de> writes:

On December 1, 2022 9:48:48 PM PST, Tom Lane <tgl@sss.pgh.pa.us> wrote:

I'd suggest putting in an assertion that the relkind isn't changing,
instead.

Sounds like a plan. Will you do that when you remove the table-to-view hack?

I'd suggest committing it concurrently with the v15 fix, instead,
so that there's a cross-reference to what some future hacker might
need to install if they remove the assertion.

Good idea.

I guess that means that the table-to-view removal has to go in
first. I should be able to take care of that tomorrow, or if
you're in a hurry I don't mind if you commit it for me.

No particular hurry from my end.

Greetings,

Andres Freund

#15Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#13)
Re: Failed Assert in pgstat_assoc_relation

I wrote:

I guess that means that the table-to-view removal has to go in
first. I should be able to take care of that tomorrow, or if
you're in a hurry I don't mind if you commit it for me.

Done now, feel free to deal with the pgstat problem.

regards, tom lane

#16Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#15)
Re: Failed Assert in pgstat_assoc_relation

Hi,

On 2022-12-02 12:15:37 -0500, Tom Lane wrote:

I wrote:

I guess that means that the table-to-view removal has to go in
first. I should be able to take care of that tomorrow, or if
you're in a hurry I don't mind if you commit it for me.

Done now, feel free to deal with the pgstat problem.

Thanks. I'm out for a few hours without proper computer access, couldn't
quite get it finished inbetween your push and now. Will deal with it once I
get back.

Greetings,

Andres Freund

#17Andres Freund
andres@anarazel.de
In reply to: Andres Freund (#16)
Re: Failed Assert in pgstat_assoc_relation

Hi,

On 2022-12-02 09:51:39 -0800, Andres Freund wrote:

On 2022-12-02 12:15:37 -0500, Tom Lane wrote:

I wrote:

I guess that means that the table-to-view removal has to go in
first. I should be able to take care of that tomorrow, or if
you're in a hurry I don't mind if you commit it for me.

Done now, feel free to deal with the pgstat problem.

Thanks. I'm out for a few hours without proper computer access, couldn't
quite get it finished inbetween your push and now. Will deal with it once I
get back.

Pushed that now. I debated for a bit whether to backpatch the test all the
way, but after it took me a while to convince myself that there's no active
problem in the older branches, I decided it's a good idea.

Thanks Vignesh for the bugreports!

Greetings,

Andres Freund