Parallel bt build crashes when DSM_NONE
Hello.
I happend to find that server crashes during regtest when
DSM_NONE is enforced. The attached patch fixes that.
The cause is the fact that _bt_spools_heapscan runs
_bt_begin_parallel() even if dynamic_shared_memory_type is
DSM_NONE. It is because plan_create_index_workers() is ignoring
dynamic_shared_memory_type.
We can reproduce this by letting initdb set
dynamic_shared_memory_type=none regardless of actual
availability. (Second attached) and just "make check".
regards,
--
Kyotaro Horiguchi
NTT Open Source Software Center
Attachments:
enfoce-dsm_none_in_regtest.patchtext/x-patch; charset=us-asciiDownload
diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c
index 2efd3b7..876e153 100644
--- a/src/bin/initdb/initdb.c
+++ b/src/bin/initdb/initdb.c
@@ -871,6 +871,7 @@ choose_dsm_implementation(void)
#ifdef HAVE_SHM_OPEN
int ntries = 10;
+ return "none";
while (ntries > 0)
{
uint32 handle;
let-plan_create_index_workers-honor_dsm_none.patchtext/x-patch; charset=us-asciiDownload
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 740de49..3e8cd14 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -5825,7 +5825,8 @@ plan_create_index_workers(Oid tableOid, Oid indexOid)
double allvisfrac;
/* Return immediately when parallelism disabled */
- if (max_parallel_maintenance_workers == 0)
+ if (dynamic_shared_memory_type == DSM_IMPL_NONE ||
+ max_parallel_maintenance_workers == 0)
return 0;
/* Set up largely-dummy planner state */
On Fri, Feb 09, 2018 at 05:06:35PM +0900, Kyotaro HORIGUCHI wrote:
I happend to find that server crashes during regtest when
DSM_NONE is enforced. The attached patch fixes that.The cause is the fact that _bt_spools_heapscan runs
_bt_begin_parallel() even if dynamic_shared_memory_type is
DSM_NONE. It is because plan_create_index_workers() is ignoring
dynamic_shared_memory_type.
Adding Peter Geoghegan as the author and Robert as the committer in CC,
as that's a mistake from 9da0cc35.
We can reproduce this by letting initdb set
dynamic_shared_memory_type=none regardless of actual
availability. (Second attached) and just "make check".
Or more simply you can just setup an instance with this configuration
and run installcheck. No need to patch initdb for that.
4 regression tests fail when using dynamic_shared_memory_type=none:
join, aggregates, select_parallel and write_parallel. test_shm_mq of
course blows up. Could that justify getting rid of DSM_IMPL_NONE? As
far as I can see there is an alternative on Windows, and we fallback to
sysv in the worst case. So I am wondering what's actually the use case
for "none". And it is good to keep alternate outputs at a minimum,
those tend to rot easily.
Except for those mind errands your patch looks good to me.
--
Michael
On Fri, Feb 9, 2018 at 12:06 AM, Kyotaro HORIGUCHI
<horiguchi.kyotaro@lab.ntt.co.jp> wrote:
I happend to find that server crashes during regtest when
DSM_NONE is enforced. The attached patch fixes that.The cause is the fact that _bt_spools_heapscan runs
_bt_begin_parallel() even if dynamic_shared_memory_type is
DSM_NONE. It is because plan_create_index_workers() is ignoring
dynamic_shared_memory_type.
I think that your patch does the right thing.
plan_create_index_workers() is supposed to take care of parallel
safety, and this is a parallel safety issue.
Thanks
--
Peter Geoghegan
On Mon, Feb 12, 2018 at 12:26 PM, Peter Geoghegan <pg@bowt.ie> wrote:
I think that your patch does the right thing.
plan_create_index_workers() is supposed to take care of parallel
safety, and this is a parallel safety issue.
Committed the patch.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
At Mon, 12 Feb 2018 15:00:54 -0500, Robert Haas <robertmhaas@gmail.com> wrote in <CA+TgmoZot0Pm9JqLs=_jspm78YCjeq=2u+0Oa4kg+gGVvO7Urg@mail.gmail.com>
On Mon, Feb 12, 2018 at 12:26 PM, Peter Geoghegan <pg@bowt.ie> wrote:
I think that your patch does the right thing.
plan_create_index_workers() is supposed to take care of parallel
safety, and this is a parallel safety issue.Committed the patch.
Thank you Robert for Committing, and Peter and Michael for the
comments and supplement.
regards,
--
Kyotaro Horiguchi
NTT Open Source Software Center
At Mon, 12 Feb 2018 22:05:36 +0900, Michael Paquier <michael@paquier.xyz> wrote in <20180212130536.GA18625@paquier.xyz>
On Fri, Feb 09, 2018 at 05:06:35PM +0900, Kyotaro HORIGUCHI wrote:
4 regression tests fail when using dynamic_shared_memory_type=none:
join, aggregates, select_parallel and write_parallel. test_shm_mq of
course blows up. Could that justify getting rid of DSM_IMPL_NONE? As
A query runs into the fallback route in the case, which even
though the regtest doesn't care about. So it alone doesn't
justify that.
far as I can see there is an alternative on Windows, and we fallback to
sysv in the worst case. So I am wondering what's actually the use case
for "none". And it is good to keep alternate outputs at a minimum,
those tend to rot easily.
Agreed. As Tom mentioned, no bf animal haven't complained with
that since 9.4 and I belive they won't. initdb doesn't create a
database with DSM_IMPL_NONE. Although it can be manually
deactivated, the fact that we haven't have a complain about that
can justify to remove it to some extent. I'll post a patch that
eliminate DSM_IMPL_NONE after this. I'd like it to be shipped on
11.
Except for those mind errands your patch looks good to me.
regards,
--
Kyotaro Horiguchi
NTT Open Source Software Center
On Wed, Feb 14, 2018 at 10:38:58AM +0900, Kyotaro HORIGUCHI wrote:
I'll post a patch that eliminate DSM_IMPL_NONE after this. I'd like it
to be shipped on 11.
Feel free to. Be just careful that the connection attempts in
test_config_settings() should try to use the DSM implementation defined
by choose_dsm_implementation().
--
Michael