BUG #14657: Server process segmentation fault in v10, May 10th dev snapshot

Started by Sveinn Sveinssonalmost 9 years ago8 messageshackersbugs
Jump to latest
#1Sveinn Sveinsson
sveinn.sveinsson@gmail.com
hackersbugs

The following bug has been logged on the website:

Bug reference: 14657
Logged by: Sveinn Sveinsson
Email address: sveinn.sveinsson@gmail.com
PostgreSQL version: Unsupported/Unknown
Operating system: Linux x86_64
Description:

The following causes segmentation fault in v10, May 10th development
snapshot:

create table test (sd timestamp,anb varchar(16)) partition by range (sd);

create table test_1 partition of test for values from ('2017-01-01') to
('2017-02-01');

create index test_1_a on test_1 (anb,sd);

insert into test values ('2017-01-01','12345');

select min(sd), max(sd) from test where anb='12345';

The server log file shows:

2017-05-16 09:36:13.243 UTC [1503] LOG: server process (PID 3474) was
terminated by signal 11: Segmentation fault
2017-05-16 09:36:13.243 UTC [1503] DETAIL: Failed process was running:
select min(sd), max(sd) from test where anb='12345';
2017-05-16 09:36:13.244 UTC [1503] LOG: terminating any other active server
processes
2017-05-16 09:36:13.245 UTC [3463] WARNING: terminating connection because
of crash of another server process

The stack trace is (no debug info):

Program terminated with signal 11, Segmentation fault.
#0 0x000000000061ab1b in list_nth ()
(gdb) bt
#0 0x000000000061ab1b in list_nth ()
#1 0x00000000005e4081 in ExecLockNonLeafAppendTables ()
#2 0x00000000005f4d52 in ExecInitMergeAppend ()
#3 0x00000000005e0365 in ExecInitNode ()
#4 0x00000000005f35a7 in ExecInitLimit ()
#5 0x00000000005e00f3 in ExecInitNode ()
#6 0x00000000005dd207 in standard_ExecutorStart ()
#7 0x00000000006f96d2 in PortalStart ()
#8 0x00000000006f5c7f in exec_simple_query ()
#9 0x00000000006f6fac in PostgresMain ()
#10 0x0000000000475cdc in ServerLoop ()
#11 0x0000000000692ffa in PostmasterMain ()
#12 0x0000000000476600 in main ()

Regards,
Sveinn Sveinsson.

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

#2Dilip Kumar
dilipbalaut@gmail.com
In reply to: Sveinn Sveinsson (#1)
hackersbugs
Re: BUG #14657: Server process segmentation fault in v10, May 10th dev snapshot

On Wed, May 17, 2017 at 7:41 PM, <sveinn.sveinsson@gmail.com> wrote:

(gdb) bt
#0 0x000000000061ab1b in list_nth ()
#1 0x00000000005e4081 in ExecLockNonLeafAppendTables ()
#2 0x00000000005f4d52 in ExecInitMergeAppend ()
#3 0x00000000005e0365 in ExecInitNode ()
#4 0x00000000005f35a7 in ExecInitLimit ()
#5 0x00000000005e00f3 in ExecInitNode ()
#6 0x00000000005dd207 in standard_ExecutorStart ()
#7 0x00000000006f96d2 in PortalStart ()
#8 0x00000000006f5c7f in exec_simple_query ()
#9 0x00000000006f6fac in PostgresMain ()
#10 0x0000000000475cdc in ServerLoop ()
#11 0x0000000000692ffa in PostmasterMain ()
#12 0x0000000000476600 in main ()

Seems like the issue is that the plans under multiple subroots are
pointing to the same partitioned_rels.

If I am not getting it wrong "set_plan_refs(PlannerInfo *root, Plan
*plan, int rtoffset)" the rtoffset is specific to the subroot. Now,
problem is that set_plan_refs called for different subroot is updating
the same partition_rel info and make this value completely wrong which
will ultimately make ExecLockNonLeafAppendTables to access the out of
bound "rte" index.

set_plan_refs
{
[clipped]
case T_MergeAppend:
{
[clipped]

foreach(l, splan->partitioned_rels)
{
lfirst_int(l) += rtoffset;

I think the solution should be that create_merge_append_path make the
copy of partitioned_rels list?

Attached patch fixes the problem but I am not completely sure about the fix.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

Attachments:

fix_merge_append.patchapplication/octet-stream; name=fix_merge_append.patchDownload+1-1
#3Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Dilip Kumar (#2)
hackersbugs
Re: BUG #14657: Server process segmentation fault in v10, May 10th dev snapshot

On 2017/05/18 2:14, Dilip Kumar wrote:

On Wed, May 17, 2017 at 7:41 PM, <sveinn.sveinsson@gmail.com> wrote:

(gdb) bt
#0 0x000000000061ab1b in list_nth ()
#1 0x00000000005e4081 in ExecLockNonLeafAppendTables ()
#2 0x00000000005f4d52 in ExecInitMergeAppend ()
#3 0x00000000005e0365 in ExecInitNode ()
#4 0x00000000005f35a7 in ExecInitLimit ()
#5 0x00000000005e00f3 in ExecInitNode ()
#6 0x00000000005dd207 in standard_ExecutorStart ()
#7 0x00000000006f96d2 in PortalStart ()
#8 0x00000000006f5c7f in exec_simple_query ()
#9 0x00000000006f6fac in PostgresMain ()
#10 0x0000000000475cdc in ServerLoop ()
#11 0x0000000000692ffa in PostmasterMain ()
#12 0x0000000000476600 in main ()

Thanks for the test case Sveinn and thanks Dilip for analyzing.

Seems like the issue is that the plans under multiple subroots are
pointing to the same partitioned_rels.

That's correct.

If I am not getting it wrong "set_plan_refs(PlannerInfo *root, Plan
*plan, int rtoffset)" the rtoffset is specific to the subroot. Now,
problem is that set_plan_refs called for different subroot is updating
the same partition_rel info and make this value completely wrong which
will ultimately make ExecLockNonLeafAppendTables to access the out of
bound "rte" index.

Yes.

set_plan_refs
{
[clipped]
case T_MergeAppend:
{
[clipped]

foreach(l, splan->partitioned_rels)
{
lfirst_int(l) += rtoffset;

I think the solution should be that create_merge_append_path make the
copy of partitioned_rels list?

Yes, partitioned_rels should be copied.

Attached patch fixes the problem but I am not completely sure about the fix.

Thanks for creating the patch, although I think a better fix would be to
make get_partitioned_child_rels() do the list_copy. That way, any other
users of partitioned_rels will not suffer the same issue. Attached patch
implements that, along with a regression test.

Added to the open items.

Thanks,
Amit

Attachments:

0001-Fix-crash-when-partitioned-table-Append-referenced-i.patchtext/x-diff; name=0001-Fix-crash-when-partitioned-table-Append-referenced-i.patchDownload+42-2
#4Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Amit Langote (#3)
hackersbugs
Re: BUG #14657: Server process segmentation fault in v10, May 10th dev snapshot

On 2017/05/18 10:49, Amit Langote wrote:

On 2017/05/18 2:14, Dilip Kumar wrote:

On Wed, May 17, 2017 at 7:41 PM, <sveinn.sveinsson@gmail.com> wrote:

(gdb) bt
#0 0x000000000061ab1b in list_nth ()
#1 0x00000000005e4081 in ExecLockNonLeafAppendTables ()
#2 0x00000000005f4d52 in ExecInitMergeAppend ()
#3 0x00000000005e0365 in ExecInitNode ()
#4 0x00000000005f35a7 in ExecInitLimit ()
#5 0x00000000005e00f3 in ExecInitNode ()
#6 0x00000000005dd207 in standard_ExecutorStart ()
#7 0x00000000006f96d2 in PortalStart ()
#8 0x00000000006f5c7f in exec_simple_query ()
#9 0x00000000006f6fac in PostgresMain ()
#10 0x0000000000475cdc in ServerLoop ()
#11 0x0000000000692ffa in PostmasterMain ()
#12 0x0000000000476600 in main ()

Thanks for the test case Sveinn and thanks Dilip for analyzing.

Seems like the issue is that the plans under multiple subroots are
pointing to the same partitioned_rels.

That's correct.

If I am not getting it wrong "set_plan_refs(PlannerInfo *root, Plan
*plan, int rtoffset)" the rtoffset is specific to the subroot. Now,
problem is that set_plan_refs called for different subroot is updating
the same partition_rel info and make this value completely wrong which
will ultimately make ExecLockNonLeafAppendTables to access the out of
bound "rte" index.

Yes.

set_plan_refs
{
[clipped]
case T_MergeAppend:
{
[clipped]

foreach(l, splan->partitioned_rels)
{
lfirst_int(l) += rtoffset;

I think the solution should be that create_merge_append_path make the
copy of partitioned_rels list?

Yes, partitioned_rels should be copied.

Attached patch fixes the problem but I am not completely sure about the fix.

Thanks for creating the patch, although I think a better fix would be to
make get_partitioned_child_rels() do the list_copy. That way, any other
users of partitioned_rels will not suffer the same issue. Attached patch
implements that, along with a regression test.

Added to the open items.

Oops, forgot to cc -hackers. Patch attached again.

Thanks,
Amit

Attachments:

0001-Fix-crash-when-partitioned-table-Append-referenced-i.patchtext/x-diff; name=0001-Fix-crash-when-partitioned-table-Append-referenced-i.patchDownload+42-2
#5Dilip Kumar
dilipbalaut@gmail.com
In reply to: Amit Langote (#3)
hackersbugs
Re: BUG #14657: Server process segmentation fault in v10, May 10th dev snapshot

On Thu, May 18, 2017 at 7:19 AM, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:

Thanks for creating the patch, although I think a better fix would be to
make get_partitioned_child_rels() do the list_copy. That way, any other
users of partitioned_rels will not suffer the same issue. Attached patch
implements that, along with a regression test.

Correct! This is generic fix.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

#6Sveinn Sveinsson
sveinn.sveinsson@gmail.com
In reply to: Amit Langote (#4)
hackersbugs
Re: [BUGS] BUG #14657: Server process segmentation fault in v10, May 10th dev snapshot

The patch fixed the problem, thanks a lot.
Regards,
Sveinn.

On fim 18.maí 2017 01:53, Amit Langote wrote:

On 2017/05/18 10:49, Amit Langote wrote:

On 2017/05/18 2:14, Dilip Kumar wrote:

On Wed, May 17, 2017 at 7:41 PM, <sveinn.sveinsson@gmail.com> wrote:

(gdb) bt
#0 0x000000000061ab1b in list_nth ()
#1 0x00000000005e4081 in ExecLockNonLeafAppendTables ()
#2 0x00000000005f4d52 in ExecInitMergeAppend ()
#3 0x00000000005e0365 in ExecInitNode ()
#4 0x00000000005f35a7 in ExecInitLimit ()
#5 0x00000000005e00f3 in ExecInitNode ()
#6 0x00000000005dd207 in standard_ExecutorStart ()
#7 0x00000000006f96d2 in PortalStart ()
#8 0x00000000006f5c7f in exec_simple_query ()
#9 0x00000000006f6fac in PostgresMain ()
#10 0x0000000000475cdc in ServerLoop ()
#11 0x0000000000692ffa in PostmasterMain ()
#12 0x0000000000476600 in main ()

Thanks for the test case Sveinn and thanks Dilip for analyzing.

Seems like the issue is that the plans under multiple subroots are
pointing to the same partitioned_rels.

That's correct.

If I am not getting it wrong "set_plan_refs(PlannerInfo *root, Plan
*plan, int rtoffset)" the rtoffset is specific to the subroot. Now,
problem is that set_plan_refs called for different subroot is updating
the same partition_rel info and make this value completely wrong which
will ultimately make ExecLockNonLeafAppendTables to access the out of
bound "rte" index.

Yes.

set_plan_refs
{
[clipped]
case T_MergeAppend:
{
[clipped]

foreach(l, splan->partitioned_rels)
{
lfirst_int(l) += rtoffset;

I think the solution should be that create_merge_append_path make the
copy of partitioned_rels list?

Yes, partitioned_rels should be copied.

Attached patch fixes the problem but I am not completely sure about the fix.

Thanks for creating the patch, although I think a better fix would be to
make get_partitioned_child_rels() do the list_copy. That way, any other
users of partitioned_rels will not suffer the same issue. Attached patch
implements that, along with a regression test.

Added to the open items.

Oops, forgot to cc -hackers. Patch attached again.

Thanks,
Amit

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#7Ashutosh Bapat
ashutosh.bapat@enterprisedb.com
In reply to: Amit Langote (#4)
hackersbugs
Re: Re: [BUGS] BUG #14657: Server process segmentation fault in v10, May 10th dev snapshot

On Thu, May 18, 2017 at 7:23 AM, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:

On 2017/05/18 10:49, Amit Langote wrote:

On 2017/05/18 2:14, Dilip Kumar wrote:

On Wed, May 17, 2017 at 7:41 PM, <sveinn.sveinsson@gmail.com> wrote:

(gdb) bt
#0 0x000000000061ab1b in list_nth ()
#1 0x00000000005e4081 in ExecLockNonLeafAppendTables ()
#2 0x00000000005f4d52 in ExecInitMergeAppend ()
#3 0x00000000005e0365 in ExecInitNode ()
#4 0x00000000005f35a7 in ExecInitLimit ()
#5 0x00000000005e00f3 in ExecInitNode ()
#6 0x00000000005dd207 in standard_ExecutorStart ()
#7 0x00000000006f96d2 in PortalStart ()
#8 0x00000000006f5c7f in exec_simple_query ()
#9 0x00000000006f6fac in PostgresMain ()
#10 0x0000000000475cdc in ServerLoop ()
#11 0x0000000000692ffa in PostmasterMain ()
#12 0x0000000000476600 in main ()

Thanks for the test case Sveinn and thanks Dilip for analyzing.

Seems like the issue is that the plans under multiple subroots are
pointing to the same partitioned_rels.

That's correct.

If I am not getting it wrong "set_plan_refs(PlannerInfo *root, Plan
*plan, int rtoffset)" the rtoffset is specific to the subroot. Now,
problem is that set_plan_refs called for different subroot is updating
the same partition_rel info and make this value completely wrong which
will ultimately make ExecLockNonLeafAppendTables to access the out of
bound "rte" index.

Yes.

set_plan_refs
{
[clipped]
case T_MergeAppend:
{
[clipped]

foreach(l, splan->partitioned_rels)
{
lfirst_int(l) += rtoffset;

I think the solution should be that create_merge_append_path make the
copy of partitioned_rels list?

Yes, partitioned_rels should be copied.

Attached patch fixes the problem but I am not completely sure about the fix.

Thanks for creating the patch, although I think a better fix would be to
make get_partitioned_child_rels() do the list_copy. That way, any other
users of partitioned_rels will not suffer the same issue. Attached patch
implements that, along with a regression test.

Added to the open items.

Oops, forgot to cc -hackers. Patch attached again.

May be we should add a comment as to why the copy is needed.

We still have the same copy shared across multiple append paths and
set_plan_refs would change change it underneath those. May not be a
problem right now but may be a problem in the future. Another option,
which consumes a bit less memory is to make a copy at the time of
planning if the path gets selected as the cheapest path.

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#8Robert Haas
robertmhaas@gmail.com
In reply to: Ashutosh Bapat (#7)
hackersbugs
Re: [HACKERS] Re: BUG #14657: Server process segmentation fault in v10, May 10th dev snapshot

On Fri, May 19, 2017 at 6:07 AM, Ashutosh Bapat
<ashutosh.bapat@enterprisedb.com> wrote:

We still have the same copy shared across multiple append paths and
set_plan_refs would change change it underneath those. May not be a
problem right now but may be a problem in the future.

I agree. I think it's better for the path-creation functions to copy
the list, so that there is no surprising sharing of substructure.
set_plan_refs() obviously expects this data to be unshared, and this
seems like the best way to ensure that's true in all cases.

Committed that way.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs