Segfault when creating partition with a primary key and sql_drop trigger exists

Started by Marco Slotover 7 years ago29 messageshackers
Jump to latest
#1Marco Slot
marco@citusdata.com

We're seeing a segmentation fault when creating a partition of a
partitioned table with a primary key when there is a sql_drop trigger on
Postgres 11beta4.

We discovered it because the Citus extension creates a sql_drop trigger,
but it's otherwise unrelated to the Citus extension:
https://github.com/citusdata/citus/issues/2390

To reproduce:

CREATE OR REPLACE FUNCTION on_drop()
RETURNS event_trigger AS $ondrop$
BEGIN
RAISE NOTICE 'drop_trigger';
END;
$ondrop$ LANGUAGE plpgsql;

CREATE EVENT TRIGGER fail_drop_table ON sql_drop
EXECUTE PROCEDURE on_drop();

CREATE TABLE collections_list (
key bigint,
ts timestamptz,
collection_id integer,
value numeric,
PRIMARY KEY(collection_id)
) PARTITION BY LIST ( collection_id );

CREATE TABLE collections_list_1
PARTITION OF collections_list (key, ts, collection_id, value)
FOR VALUES IN (1);
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.

Marco

#2Justin Pryzby
pryzby@telsasoft.com
In reply to: Marco Slot (#1)
Re: Segfault when creating partition with a primary key and sql_drop trigger exists

On Thu, Sep 20, 2018 at 12:00:18PM +0200, Marco Slot wrote:

We're seeing a segmentation fault when creating a partition of a
partitioned table with a primary key when there is a sql_drop trigger on
Postgres 11beta4.

Thanks for reporting ; I reproduced easily so added to open items list, since
indices on partitioned talbes is a feature new in PG11.

Core was generated by `postgres: pryzbyj ts [local] CREATE TABLE '.
Program terminated with signal 11, Segmentation fault.
#0 0x000000000059d186 in EventTriggerAlterTableRelid (objectId=40108800) at event_trigger.c:1745
1745 event_trigger.c: No such file or directory.
in event_trigger.c

(gdb) bt
#0 0x000000000059d186 in EventTriggerAlterTableRelid (objectId=40108800) at event_trigger.c:1745
#1 0x00000000005dfbd3 in AlterTableInternal (relid=40108800, cmds=0x21c39a8, recurse=true) at tablecmds.c:3328
#2 0x00000000005b5b7b in DefineIndex (relationId=40108800, stmt=0x21a7350, indexRelationId=0, parentIndexId=40084714, parentConstraintId=40084715,
is_alter_table=false, check_rights=false, check_not_in_use=false, skip_build=false, quiet=false) at indexcmds.c:669
#3 0x00000000005dcfee in DefineRelation (stmt=0x2116690, relkind=114 'r', ownerId=17609, typaddress=0x0,
queryString=0x20f1e60 "CREATE TABLE collections_list_1\nPARTITION OF collections_list (key, ts, collection_id, value)\nFOR VALUES IN (1);") at tablecmds.c:946
[...]

Justin

#3Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Marco Slot (#1)
Re: Segfault when creating partition with a primary key and sql_drop trigger exists

On 2018-Sep-20, Marco Slot wrote:

We're seeing a segmentation fault when creating a partition of a
partitioned table with a primary key when there is a sql_drop trigger on
Postgres 11beta4.

We discovered it because the Citus extension creates a sql_drop trigger,
but it's otherwise unrelated to the Citus extension:
https://github.com/citusdata/citus/issues/2390

Thanks for the reproducer. Will research.

--
�lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#4Dmitry Dolgov
9erthalion6@gmail.com
In reply to: Alvaro Herrera (#3)
Re: Segfault when creating partition with a primary key and sql_drop trigger exists

On Mon, 24 Sep 2018 at 17:58, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:

On 2018-Sep-20, Marco Slot wrote:

We're seeing a segmentation fault when creating a partition of a
partitioned table with a primary key when there is a sql_drop trigger on
Postgres 11beta4.

We discovered it because the Citus extension creates a sql_drop trigger,
but it's otherwise unrelated to the Citus extension:
https://github.com/citusdata/citus/issues/2390

Thanks for the reproducer. Will research.

Shouldn't it be fixed by adding EventTriggerAlterTableStart? Judging from the
following call of ATController, we can just pass NULL as parsetree.

Attachments:

sql_drop_on_partition.patchapplication/octet-stream; name=sql_drop_on_partition.patchDownload+1-0
#5Michael Paquier
michael@paquier.xyz
In reply to: Dmitry Dolgov (#4)
Re: Segfault when creating partition with a primary key and sql_drop trigger exists

On Tue, Sep 25, 2018 at 01:39:59PM +0200, Dmitry Dolgov wrote:

Shouldn't it be fixed by adding EventTriggerAlterTableStart? Judging from the
following call of ATController, we can just pass NULL as parsetree.

Hmm. I don't think that this is correct as this data could always be
used to fetch a command tag, right? It seems to me instead that we
should pass down IndexStmt and handle things like the attached.
Thoughts?
--
Michael

Attachments:

event-partition-v1.patchtext/x-diff; charset=us-asciiDownload+11-4
#6Dmitry Dolgov
9erthalion6@gmail.com
In reply to: Michael Paquier (#5)
Re: Segfault when creating partition with a primary key and sql_drop trigger exists

On Wed, 26 Sep 2018 at 05:33, Michael Paquier <michael@paquier.xyz> wrote:

On Tue, Sep 25, 2018 at 01:39:59PM +0200, Dmitry Dolgov wrote:

Shouldn't it be fixed by adding EventTriggerAlterTableStart? Judging from the
following call of ATController, we can just pass NULL as parsetree.

Hmm. I don't think that this is correct as this data could always be
used to fetch a command tag, right? It seems to me instead that we
should pass down IndexStmt and handle things like the attached.
Thoughts?

Yes, you're right. Although probably it's not great that the sequence of
EventTriggerAlterTableStart, EventTriggerAlterTableRelid and
EventTriggerAlterTableEnd is distributed between different functions (which
most likely is worth a comment in AlterTableInternal), but it's a minor
concern.

#7Michael Paquier
michael@paquier.xyz
In reply to: Dmitry Dolgov (#6)
Re: Segfault when creating partition with a primary key and sql_drop trigger exists

On Thu, Sep 27, 2018 at 11:02:06PM +0200, Dmitry Dolgov wrote:

Yes, you're right. Although probably it's not great that the sequence of
EventTriggerAlterTableStart, EventTriggerAlterTableRelid and
EventTriggerAlterTableEnd is distributed between different functions (which
most likely is worth a comment in AlterTableInternal), but it's a minor
concern.

I think that Alvaro should definitely look at this patch to be sure, or
I could do it, but I would need to spend way more time on this and check
event trigger interactions.

Anyway, I was struggling a bit regarding the location where adding a
regression test. event_trigger.sql makes the most sense but in tests
for drops the objects are created before the event trigger is defined,
so that would need to move around so as the original problem is
reproducible. Perhaps you have an idea for that?
--
Michael

#8Michael Paquier
michael@paquier.xyz
In reply to: Michael Paquier (#7)
Re: Segfault when creating partition with a primary key and sql_drop trigger exists

On Fri, Sep 28, 2018 at 12:17:00PM +0900, Michael Paquier wrote:

I think that Alvaro should definitely look at this patch to be sure, or
I could do it, but I would need to spend way more time on this and check
event trigger interactions.

Anyway, I was struggling a bit regarding the location where adding a
regression test. event_trigger.sql makes the most sense but in tests
for drops the objects are created before the event trigger is defined,
so that would need to move around so as the original problem is
reproducible. Perhaps you have an idea for that?

Okay. I have spent more time on this issue, and I have been able to
integrate a test in the existing event_trigger.sql which is able to
reproduce the reported failure. Attached is what I am finishing with.

I still want to do more testing on it, and the day is ending here.

Thoughts?
--
Michael

Attachments:

event-partition-v2.patchtext/x-diff; charset=us-asciiDownload+43-5
#9Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Michael Paquier (#8)
Re: Segfault when creating partition with a primary key and sql_drop trigger exists

I admit I'm surprised that your patch fixes the bug. sql_drop was added
before the command-stashing was added for pg_event_trigger_ddl_commands
was added, and sql_drop only processes objects from the list passed to
performMultipleDeletions, so adding the EventTriggerAlterTableStart() /
End() calls should not affect it ... evidently I must be missing
something here.

Still looking.

--
�lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#10Dmitry Dolgov
9erthalion6@gmail.com
In reply to: Alvaro Herrera (#9)
Re: Segfault when creating partition with a primary key and sql_drop trigger exists

On Wed, 3 Oct 2018 at 09:53, Michael Paquier <michael@paquier.xyz> wrote:

On Fri, Sep 28, 2018 at 12:17:00PM +0900, Michael Paquier wrote:

I think that Alvaro should definitely look at this patch to be sure, or
I could do it, but I would need to spend way more time on this and check
event trigger interactions.

Anyway, I was struggling a bit regarding the location where adding a
regression test. event_trigger.sql makes the most sense but in tests
for drops the objects are created before the event trigger is defined,
so that would need to move around so as the original problem is
reproducible. Perhaps you have an idea for that?

Okay. I have spent more time on this issue, and I have been able to
integrate a test in the existing event_trigger.sql which is able to
reproduce the reported failure. Attached is what I am finishing with.

I still want to do more testing on it, and the day is ending here.

Thoughts?

Sorry, couldn't answer your previous message, since was away. So far I don't
see any problems with your proposed patch.

On Thu, 4 Oct 2018 at 17:22, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:

I admit I'm surprised that your patch fixes the bug. sql_drop was added
before the command-stashing was added for pg_event_trigger_ddl_commands
was added, and sql_drop only processes objects from the list passed to
performMultipleDeletions, so adding the EventTriggerAlterTableStart() /
End() calls should not affect it ... evidently I must be missing
something here.

Still looking.

I also find strange another part of this problem, namely why we ended up doing
AlterTableInternal at all, since I assumed that after MergeAttributes all
attnotnull should be merged with OR. But for example these would work:

CREATE TABLE collections_list_1
PARTITION OF collections_list
FOR VALUES IN (1);

CREATE TABLE collections_list_1
PARTITION OF collections_list (key, ts, collection_id not null, value)
FOR VALUES IN (1);

Looks like in MergeAttributes at the branch:

if (is_partition && list_length(saved_schema) > 0)

we override not null property of ColumnDef:

if (coldef->is_from_parent)
{
coldef->is_not_null = restdef->is_not_null;

It looks a bit confusing, so I wonder if it's how it should be?

#11Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Alvaro Herrera (#9)
Re: Segfault when creating partition with a primary key and sql_drop trigger exists

On 2018-Oct-04, Alvaro Herrera wrote:

I admit I'm surprised that your patch fixes the bug. sql_drop was added
before the command-stashing was added for pg_event_trigger_ddl_commands
was added, and sql_drop only processes objects from the list passed to
performMultipleDeletions, so adding the EventTriggerAlterTableStart() /
End() calls should not affect it ... evidently I must be missing
something here.

I think the explanation for this is that the problem has nothing to do
with sql_drop per se -- it's only that having a sql_drop trigger causes
the event trigger stuff to get invoked, and the bogus code involving
ddl_command_end (the one that's affected by the
EventTriggerAlterTableStart dance) is what crashes.

--
�lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#12Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Michael Paquier (#8)
Re: Segfault when creating partition with a primary key and sql_drop trigger exists

On 2018-Oct-03, Michael Paquier wrote:

Okay. I have spent more time on this issue, and I have been able to
integrate a test in the existing event_trigger.sql which is able to
reproduce the reported failure. Attached is what I am finishing with.

I still want to do more testing on it, and the day is ending here.

I looked at this, and I think that this particular crash you're fixing
is just a symptom of a larger problem in the event trigger alter table
handling -- I noticed while playing a bit with
src/test/modules/test_ddl_decoding (modify sql/alter_table.sql to create
a partitioned table with nullable columns and a partition, then alter
table to add a primary key).

I'm tied up in something else at the moment so can't spend more time on
it, but I hope to have time to give it a look over the weekend.

Thanks

--
�lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#13Tom Lane
tgl@sss.pgh.pa.us
In reply to: Alvaro Herrera (#12)
Re: Segfault when creating partition with a primary key and sql_drop trigger exists

Alvaro Herrera <alvherre@2ndquadrant.com> writes:

I'm tied up in something else at the moment so can't spend more time on
it, but I hope to have time to give it a look over the weekend.

Keep in mind that RC1 is scheduled to wrap Monday afternoon ...

regards, tom lane

#14Michael Paquier
michael@paquier.xyz
In reply to: Tom Lane (#13)
Re: Segfault when creating partition with a primary key and sql_drop trigger exists

On Thu, Oct 04, 2018 at 06:04:49PM -0400, Tom Lane wrote:

Alvaro Herrera <alvherre@2ndquadrant.com> writes:

I'm tied up in something else at the moment so can't spend more time on
it, but I hope to have time to give it a look over the weekend.

Keep in mind that RC1 is scheduled to wrap Monday afternoon ...

... Which is little time to work on a complete fix. Surely we could
wait for GA for a fix? I don't find nice to release v11 with this bug.
--
Michael

#15Andres Freund
andres@anarazel.de
In reply to: Michael Paquier (#14)
Re: Segfault when creating partition with a primary key and sql_drop trigger exists

Hi,

On 2018-10-05 08:29:29 +0900, Michael Paquier wrote:

On Thu, Oct 04, 2018 at 06:04:49PM -0400, Tom Lane wrote:

Alvaro Herrera <alvherre@2ndquadrant.com> writes:

I'm tied up in something else at the moment so can't spend more time on
it, but I hope to have time to give it a look over the weekend.

Keep in mind that RC1 is scheduled to wrap Monday afternoon ...

... Which is little time to work on a complete fix. Surely we could
wait for GA for a fix? I don't find nice to release v11 with this bug.

Are you suggesting we fix after RC1, or delay RC1? I'm not 100% sure
I'm parsing your sentence correctly.

Greetings,

Andres Freund

#16Michael Paquier
michael@paquier.xyz
In reply to: Andres Freund (#15)
Re: Segfault when creating partition with a primary key and sql_drop trigger exists

On Thu, Oct 04, 2018 at 04:54:45PM -0700, Andres Freund wrote:

Are you suggesting we fix after RC1, or delay RC1? I'm not 100% sure
I'm parsing your sentence correctly.

I am suggesting to fix the issue after RC1 is released, but before GA.
--
Michael

#17Jonathan S. Katz
jkatz@postgresql.org
In reply to: Michael Paquier (#16)
Re: Segfault when creating partition with a primary key and sql_drop trigger exists

On 10/4/18 8:34 PM, Michael Paquier wrote:

On Thu, Oct 04, 2018 at 04:54:45PM -0700, Andres Freund wrote:

Are you suggesting we fix after RC1, or delay RC1? I'm not 100% sure
I'm parsing your sentence correctly.

I am suggesting to fix the issue after RC1 is released, but before GA.

That approach would mean we would require an RC2, which would further
delay the GA.

Based on our release process and various schedules, any delays to the GA
date at this point would push the release well into mid-November. Part
of the reason that we selected the current proposed date for the GA is
that we would have the .1 available 3 weeks after the release during the
regularly scheduled cumulative update.

Ideally it would be great if this was fixed for RC1. However, given the
choice of pushing the release out further vs. saving the fix for .1
which would be relatively soon, I would vote for the latter.

Jonathan

#18Tom Lane
tgl@sss.pgh.pa.us
In reply to: Jonathan S. Katz (#17)
Re: Segfault when creating partition with a primary key and sql_drop trigger exists

"Jonathan S. Katz" <jkatz@postgresql.org> writes:

On 10/4/18 8:34 PM, Michael Paquier wrote:

I am suggesting to fix the issue after RC1 is released, but before GA.

That approach would mean we would require an RC2, which would further
delay the GA.

Not sure about that. Alvaro seems to think there's a generic problem
in event trigger processing, which if true, was likely there pre-v11.
I don't think that patches that get back-patched further than 11
need to restart the RC clock.

Ideally it would be great if this was fixed for RC1. However, given the
choice of pushing the release out further vs. saving the fix for .1
which would be relatively soon, I would vote for the latter.

There are definitely good calendar reasons to hold to the schedule of
RC1 next week and GA the week after. I'd only want to blow up that
plan if we hit something that is both very bad and very hard to fix.

However, if we have a fix that we believe in that's available post-RC1,
I'm not sure I see why it's better to sit on it till after GA.

regards, tom lane

#19Jonathan S. Katz
jkatz@postgresql.org
In reply to: Tom Lane (#18)
Re: Segfault when creating partition with a primary key and sql_drop trigger exists

On 10/4/18 11:37 PM, Tom Lane wrote:

"Jonathan S. Katz" <jkatz@postgresql.org> writes:

On 10/4/18 8:34 PM, Michael Paquier wrote:

I am suggesting to fix the issue after RC1 is released, but before GA.

That approach would mean we would require an RC2, which would further
delay the GA.

Not sure about that. Alvaro seems to think there's a generic problem
in event trigger processing, which if true, was likely there pre-v11.
I don't think that patches that get back-patched further than 11
need to restart the RC clock.

Well, unless we are targeting it for the release? AIUI the RCs are
should be equivalent to GA[1]https://www.postgresql.org/about/news/1783/ (and yes I see the qualifier of "should be").

Ideally it would be great if this was fixed for RC1. However, given the
choice of pushing the release out further vs. saving the fix for .1
which would be relatively soon, I would vote for the latter.

There are definitely good calendar reasons to hold to the schedule of
RC1 next week and GA the week after. I'd only want to blow up that
plan if we hit something that is both very bad and very hard to fix.

However, if we have a fix that we believe in that's available post-RC1,
I'm not sure I see why it's better to sit on it till after GA.

I'm not opposed; after seeing the "should be" qualifier I feel more
comfortable with the above proposal.

Jonathan

[1]: https://www.postgresql.org/about/news/1783/

#20Andres Freund
andres@anarazel.de
In reply to: Jonathan S. Katz (#19)
Re: Segfault when creating partition with a primary key and sql_drop trigger exists

Hi,

On 2018-10-05 15:31:37 -0400, Jonathan S. Katz wrote:

On 10/4/18 11:37 PM, Tom Lane wrote:

"Jonathan S. Katz" <jkatz@postgresql.org> writes:

On 10/4/18 8:34 PM, Michael Paquier wrote:

I am suggesting to fix the issue after RC1 is released, but before GA.

That approach would mean we would require an RC2, which would further
delay the GA.

Not sure about that. Alvaro seems to think there's a generic problem
in event trigger processing, which if true, was likely there pre-v11.
I don't think that patches that get back-patched further than 11
need to restart the RC clock.

Well, unless we are targeting it for the release? AIUI the RCs are
should be equivalent to GA[1] (and yes I see the qualifier of "should be").

FWIW, I think that's a pretty pointless restriction. We release
bugfixes in minor releases all the time, so there's imo absolutely no
point in having a blanket restriction that a fix that we'd put in a
minor release shouldn't be slipped in between RC and GA.

Greetings,

Andres Freund

#21Jonathan S. Katz
jkatz@postgresql.org
In reply to: Andres Freund (#20)
#22Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Tom Lane (#18)
#23Tom Lane
tgl@sss.pgh.pa.us
In reply to: Alvaro Herrera (#22)
#24Michael Paquier
michael@paquier.xyz
In reply to: Tom Lane (#23)
#25Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Michael Paquier (#24)
#26Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Alvaro Herrera (#25)
#27Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Alvaro Herrera (#26)
#28Michael Paquier
michael@paquier.xyz
In reply to: Alvaro Herrera (#27)
#29Arthur Zakirov
a.zakirov@postgrespro.ru
In reply to: Alvaro Herrera (#26)