parallel.c is not marked as test covered

Started by Clément Prévostalmost 10 years ago71 messageshackers
Jump to latest
#1Clément Prévost
prevostclement@gmail.com

The entire parallel.c reported test coverage is zero:
http://coverage.postgresql.org/src/backend/access/transam/parallel.c.gcov.html

It seem that it's not covered by the original 924bcf4f commit but I don't
know if it's on purpose. This feature being really new that would be
understandable.

If it's not, my first idea would be to fix this by adding a simple sql test
in /src/test/regress, in the "sql" and "expected" subdirectories that
explain (cost off) some queries.
I'm also guessing here that we can force a query to have a parallel plan
with some cost magic to avoid dealing with sufficiently large datasets.

#2David Rowley
dgrowleyml@gmail.com
In reply to: Clément Prévost (#1)
Re: parallel.c is not marked as test covered

On 9 May 2016 at 09:12, Clément Prévost <prevostclement@gmail.com> wrote:

The entire parallel.c reported test coverage is zero:
http://coverage.postgresql.org/src/backend/access/transam/parallel.c.gcov.html

It seem that it's not covered by the original 924bcf4f commit but I don't
know if it's on purpose. This feature being really new that would be
understandable.

If it's not, my first idea would be to fix this by adding a simple sql test
in /src/test/regress, in the "sql" and "expected" subdirectories that
explain (cost off) some queries.
I'm also guessing here that we can force a query to have a parallel plan
with some cost magic to avoid dealing with sufficiently large datasets.

I'm not entirely sure which machine generates that coverage output,
but the problem with it is that it's just one machine. We do have at
least one buildfarm member which runs with force_parallel_mode =
regress. See http://buildfarm.postgresql.org/cgi-bin/show_history.pl?nm=mandrill&amp;br=HEAD

If the coverage was run from that machine then you'd see something
higher than 0% on parallel.c.

It would be nice to improve this a bit and have the planner generate a
handful of parallel plans even without force_parallel_mode = regress.
I do believe we can now do this without having to create large tables
in the regression tests if we set parallel_setup_cost to something
low, perhaps 0 and set the table's parallel_degree to something higher
than 1. The problem with that is knowing what should actually be
tested. It seems like pretty much any plan which can generate a
parallel query is a good candidate for writing a test for, which makes
the choices for which tests to write quite hard. It makes you realise
why Robert when down the force_parallel_mode = regress path instead.

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#3Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: David Rowley (#2)
Re: parallel.c is not marked as test covered

David Rowley wrote:

I'm not entirely sure which machine generates that coverage output,
but the problem with it is that it's just one machine. We do have at
least one buildfarm member which runs with force_parallel_mode =
regress.

It's not a buildfarm machine, but a machine setup specifically for
coverage reports. It runs "make check-world" only. I can add some
additional command(s) to run, if somebody can suggest something.

--
�lvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#4David Rowley
dgrowleyml@gmail.com
In reply to: Alvaro Herrera (#3)
Re: parallel.c is not marked as test covered

On 9 May 2016 at 13:20, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:

David Rowley wrote:

I'm not entirely sure which machine generates that coverage output,
but the problem with it is that it's just one machine. We do have at
least one buildfarm member which runs with force_parallel_mode =
regress.

It's not a buildfarm machine, but a machine setup specifically for
coverage reports. It runs "make check-world" only. I can add some
additional command(s) to run, if somebody can suggest something.

I'm not familiar with what's possible with make check-world, but would
it be reasonable to make that just do another regression test run with
force_parallel_mode set to regress?

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#5Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: David Rowley (#4)
Re: parallel.c is not marked as test covered

David Rowley wrote:

On 9 May 2016 at 13:20, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:

It's not a buildfarm machine, but a machine setup specifically for
coverage reports. It runs "make check-world" only. I can add some
additional command(s) to run, if somebody can suggest something.

I'm not familiar with what's possible with make check-world, but would
it be reasonable to make that just do another regression test run with
force_parallel_mode set to regress?

My suggestion isn't to change what "make check-world" does; it's just to
change the script in the coverage machine so that it runs some other
command after "make check-world" and before "make coverage-html".

--
�lvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#6David Rowley
dgrowleyml@gmail.com
In reply to: Alvaro Herrera (#5)
Re: parallel.c is not marked as test covered

On 9 May 2016 at 14:26, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:

David Rowley wrote:

On 9 May 2016 at 13:20, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:

It's not a buildfarm machine, but a machine setup specifically for
coverage reports. It runs "make check-world" only. I can add some
additional command(s) to run, if somebody can suggest something.

I'm not familiar with what's possible with make check-world, but would
it be reasonable to make that just do another regression test run with
force_parallel_mode set to regress?

My suggestion isn't to change what "make check-world" does; it's just to
change the script in the coverage machine so that it runs some other
command after "make check-world" and before "make coverage-html".

I understand what you meant. I was just taking the suggestion one step
further as I thought that if you were adding that for the coverage
test then it might actually also be a good idea that it occurred in
make check-world. The parallel tests are quite thin as it is, so
perhaps its a good idea to get more machines running through the
parallel code.

pg_regress seems to support --temp-config, so we could just have a
config file which has; max_parallel_degree = 2 and force_parallel_mode
= regress.

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#7Andres Freund
andres@anarazel.de
In reply to: Alvaro Herrera (#3)
Re: parallel.c is not marked as test covered

On 2016-05-08 22:20:55 -0300, Alvaro Herrera wrote:

David Rowley wrote:

I'm not entirely sure which machine generates that coverage output,
but the problem with it is that it's just one machine. We do have at
least one buildfarm member which runs with force_parallel_mode =
regress.

It's not a buildfarm machine, but a machine setup specifically for
coverage reports. It runs "make check-world" only. I can add some
additional command(s) to run, if somebody can suggest something.

I think it's a good idea to run a force-parallel run on some buildfarm
members. But I'm rather convinced that the core tests run by all animals
need some minimal coverage of parallel queries. Both because otherwise
it'll be hard to get some coverage of unusual platforms, and because
it's imo something rather relevant to test during development.

- Andres

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#8Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#7)
Re: parallel.c is not marked as test covered

Andres Freund <andres@anarazel.de> writes:

I think it's a good idea to run a force-parallel run on some buildfarm
members.

Noah's already doing that on at least one of his critters. But some more
wouldn't hurt.

But I'm rather convinced that the core tests run by all animals
need some minimal coverage of parallel queries. Both because otherwise
it'll be hard to get some coverage of unusual platforms, and because
it's imo something rather relevant to test during development.

+1. Experimenting with what we might do, it seems like it's harder to get
the planner to use a parallel plan than you would think.

regression=# explain select count(*) from tenk1;
QUERY PLAN

--------------------------------------------------------------------------------
-------------------
Aggregate (cost=295.29..295.30 rows=1 width=8)
-> Index Only Scan using tenk1_thous_tenthous on tenk1 (cost=0.29..270.29 r
ows=10000 width=0)
(2 rows)

regression=# set enable_indexscan TO 0;
SET
regression=# explain select count(*) from tenk1;
QUERY PLAN
-----------------------------------------------------------------
Aggregate (cost=483.00..483.01 rows=1 width=8)
-> Seq Scan on tenk1 (cost=0.00..458.00 rows=10000 width=0)
(2 rows)

regression=# set force_parallel_mode TO on;
SET
regression=# explain select count(*) from tenk1;
QUERY PLAN
-----------------------------------------------------------------
Aggregate (cost=483.00..483.01 rows=1 width=8)
-> Seq Scan on tenk1 (cost=0.00..458.00 rows=10000 width=0)
(2 rows)

Methinks force_parallel_mode is a bit broken.

Also, once you *do* get it to make a parallel plan:

regression=# create table foo as select generate_series(1,1000000) g;
SELECT 1000000
regression=# analyze foo;
ANALYZE
regression=# explain select count(*) from foo;
QUERY PLAN
--------------------------------------------------------------------------------------
Finalize Aggregate (cost=10633.55..10633.56 rows=1 width=8)
-> Gather (cost=10633.33..10633.54 rows=2 width=8)
Workers Planned: 2
-> Partial Aggregate (cost=9633.33..9633.34 rows=1 width=8)
-> Parallel Seq Scan on foo (cost=0.00..8591.67 rows=416667 width=0)
(5 rows)

regression=# explain (costs off) select count(*) from foo;
QUERY PLAN
--------------------------------------------
Finalize Aggregate
-> Gather
Workers Planned: 2
-> Partial Aggregate
-> Parallel Seq Scan on foo
(5 rows)

That's not going to do for a regression-test case because it will break
anytime somebody changes the value of max_parallel_degree. Maybe we
should suppress the "Workers Planned" field in costs-off display mode?

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#9Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#8)
Re: parallel.c is not marked as test covered

On Mon, May 9, 2016 at 11:11 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Andres Freund <andres@anarazel.de> writes:

I think it's a good idea to run a force-parallel run on some buildfarm
members.

Noah's already doing that on at least one of his critters. But some more
wouldn't hurt.

I agree.

But I'm rather convinced that the core tests run by all animals
need some minimal coverage of parallel queries. Both because otherwise
it'll be hard to get some coverage of unusual platforms, and because
it's imo something rather relevant to test during development.

+1. Experimenting with what we might do, it seems like it's harder to get
the planner to use a parallel plan than you would think.

regression=# explain select count(*) from tenk1;
QUERY PLAN

--------------------------------------------------------------------------------
-------------------
Aggregate (cost=295.29..295.30 rows=1 width=8)
-> Index Only Scan using tenk1_thous_tenthous on tenk1 (cost=0.29..270.29 r
ows=10000 width=0)
(2 rows)

regression=# set enable_indexscan TO 0;
SET
regression=# explain select count(*) from tenk1;
QUERY PLAN
-----------------------------------------------------------------
Aggregate (cost=483.00..483.01 rows=1 width=8)
-> Seq Scan on tenk1 (cost=0.00..458.00 rows=10000 width=0)
(2 rows)

regression=# set force_parallel_mode TO on;
SET
regression=# explain select count(*) from tenk1;
QUERY PLAN
-----------------------------------------------------------------
Aggregate (cost=483.00..483.01 rows=1 width=8)
-> Seq Scan on tenk1 (cost=0.00..458.00 rows=10000 width=0)
(2 rows)

Methinks force_parallel_mode is a bit broken.

Hmm, that is strange. I would have expected that to stuff a Gather on
top of the Aggregate. I wonder why it's not doing that.

Also, once you *do* get it to make a parallel plan:

regression=# create table foo as select generate_series(1,1000000) g;
SELECT 1000000
regression=# analyze foo;
ANALYZE
regression=# explain select count(*) from foo;
QUERY PLAN
--------------------------------------------------------------------------------------
Finalize Aggregate (cost=10633.55..10633.56 rows=1 width=8)
-> Gather (cost=10633.33..10633.54 rows=2 width=8)
Workers Planned: 2
-> Partial Aggregate (cost=9633.33..9633.34 rows=1 width=8)
-> Parallel Seq Scan on foo (cost=0.00..8591.67 rows=416667 width=0)
(5 rows)

regression=# explain (costs off) select count(*) from foo;
QUERY PLAN
--------------------------------------------
Finalize Aggregate
-> Gather
Workers Planned: 2
-> Partial Aggregate
-> Parallel Seq Scan on foo
(5 rows)

That's not going to do for a regression-test case because it will break
anytime somebody changes the value of max_parallel_degree. Maybe we
should suppress the "Workers Planned" field in costs-off display mode?

That seems reasonable to me.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#10Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#9)
Re: parallel.c is not marked as test covered

Robert Haas <robertmhaas@gmail.com> writes:

On Mon, May 9, 2016 at 11:11 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

regression=# set force_parallel_mode TO on;
SET
regression=# explain select count(*) from tenk1;
QUERY PLAN
-----------------------------------------------------------------
Aggregate (cost=483.00..483.01 rows=1 width=8)
-> Seq Scan on tenk1 (cost=0.00..458.00 rows=10000 width=0)
(2 rows)

Methinks force_parallel_mode is a bit broken.

Hmm, that is strange. I would have expected that to stuff a Gather on
top of the Aggregate. I wonder why it's not doing that.

The reason is that create_plain_partial_paths() contains a hard-wired
decision not to generate any partial paths for relations smaller than
1000 blocks, which means that no partial path will ever be generated
for *any* relation in the standard regression tests, force_parallel_mode
or no.

I would just go fix this, along the lines of

*************** create_plain_partial_paths(PlannerInfo *
*** 702,708 ****
* with all of its inheritance siblings it may well pay off.
*/
if (rel->pages < parallel_threshold &&
! rel->reloptkind == RELOPT_BASEREL)
return;

        /*
--- 703,710 ----
         * with all of its inheritance siblings it may well pay off.
         */
        if (rel->pages < parallel_threshold &&
!           rel->reloptkind == RELOPT_BASEREL &&
!           force_parallel_mode == FORCE_PARALLEL_OFF)
            return;

/*

except that doing so and running the regression tests with
force_parallel_mode = on results in core dumps. Some debugging seems
indicated ... and we should at this point assume that there has been no
useful testing of parallel query in the buildfarm, not even on Noah's
machines.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#11Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#10)
Re: parallel.c is not marked as test covered

On Wed, May 11, 2016 at 12:34 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Hmm, that is strange. I would have expected that to stuff a Gather on
top of the Aggregate. I wonder why it's not doing that.

The reason is that create_plain_partial_paths() contains a hard-wired
decision not to generate any partial paths for relations smaller than
1000 blocks, which means that no partial path will ever be generated
for *any* relation in the standard regression tests, force_parallel_mode
or no.

Well that's an interesting theory, except that you've completely
missed the point of force_parallel_mode. force_parallel_mode pushes a
special Gather node on top of any plan that is not already parallel in
some way but which is parallel-safe. That special Gather node runs
only in the worker, not the leader, and always uses just one worker.
The point is to test that queries run in the worker in the same way
that they do in the leader. This has already found lots of bugs, so
it's clearly useful, despite all the confusion it's created.

I would just go fix this, along the lines of

*************** create_plain_partial_paths(PlannerInfo *
*** 702,708 ****
* with all of its inheritance siblings it may well pay off.
*/
if (rel->pages < parallel_threshold &&
! rel->reloptkind == RELOPT_BASEREL)
return;

/*
--- 703,710 ----
* with all of its inheritance siblings it may well pay off.
*/
if (rel->pages < parallel_threshold &&
!           rel->reloptkind == RELOPT_BASEREL &&
!           force_parallel_mode == FORCE_PARALLEL_OFF)
return;

/*

except that doing so and running the regression tests with
force_parallel_mode = on results in core dumps.

Nonetheless, that is a bug. (I only get one crash - do you get more?)

Some debugging seems
indicated ... and we should at this point assume that there has been no
useful testing of parallel query in the buildfarm, not even on Noah's
machines.

Yes and no. The force_parallel_mode stuff is primarily there to tell
us about things like "you marked a function as parallel-safe, but it
actually blows up when run in a worker". It won't catch a case where
a function is marked parallel-safe but silently does the wrong thing
instead of failing, but it will catch quite a few mistakes of that
kind, and I think that's good.

What it won't do, though, is actually run "real" parallel queries -
that is, multiple workers doing a Parallel Seq Scan with some other
stuff pushed on top, and then a Gather. And we should have test
coverage for that, too, so that we're testing real concurrent
behavior, and not just our ability to run plans in a worker.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#12David G. Johnston
david.g.johnston@gmail.com
In reply to: Robert Haas (#11)
Re: parallel.c is not marked as test covered

On Wed, May 11, 2016 at 10:38 AM, Robert Haas <robertmhaas@gmail.com> wrote:

On Wed, May 11, 2016 at 12:34 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Hmm, that is strange. I would have expected that to stuff a Gather on
top of the Aggregate. I wonder why it's not doing that.

The reason is that create_plain_partial_paths() contains a hard-wired
decision not to generate any partial paths for relations smaller than
1000 blocks, which means that no partial path will ever be generated
for *any* relation in the standard regression tests, force_parallel_mode
or no.

Well that's an interesting theory, except that you've completely
missed the point of force_parallel_mode. force_parallel_mode pushes a
special Gather node on top of any plan that is not already parallel in
some way but which is parallel-safe. That special Gather node runs
only in the worker, not the leader, and always uses just one worker.

​What happens when there are no workers available due to
max_worker_processes ​already being assigned?

Related question, if max_parallel_degree is >1 and "the requested number of
workers may not actually be available at runtime" is true, does the degree
of parallelism minimize at 1 worker + leader or will the leader simply run
the query by itself?

David J.

#13Robert Haas
robertmhaas@gmail.com
In reply to: Robert Haas (#11)
Re: parallel.c is not marked as test covered

On Wed, May 11, 2016 at 1:38 PM, Robert Haas <robertmhaas@gmail.com> wrote:

I would just go fix this, along the lines of

*************** create_plain_partial_paths(PlannerInfo *
*** 702,708 ****
* with all of its inheritance siblings it may well pay off.
*/
if (rel->pages < parallel_threshold &&
! rel->reloptkind == RELOPT_BASEREL)
return;

/*
--- 703,710 ----
* with all of its inheritance siblings it may well pay off.
*/
if (rel->pages < parallel_threshold &&
!           rel->reloptkind == RELOPT_BASEREL &&
!           force_parallel_mode == FORCE_PARALLEL_OFF)
return;

/*

except that doing so and running the regression tests with
force_parallel_mode = on results in core dumps.

Nonetheless, that is a bug. (I only get one crash - do you get more?)

This looks like a bug in the parallel aggregate code.

#0 make_partialgroup_input_target (root=0x7faa5f002d20,
final_target=0x7faa5f004270) at planner.c:4308
4308 Index sgref = final_target->sortgrouprefs[i];
(gdb) p debug_query_string
$1 = 0x7faa5d00b638 "SELECT SUM(COUNT(f1)) OVER () FROM int4_tbl WHERE f1=42;"
Current language: auto; currently minimal
(gdb) bt
#0 make_partialgroup_input_target (root=0x7faa5f002d20,
final_target=0x7faa5f004270) at planner.c:4308
#1 0x0000000101df4889 in create_grouping_paths (root=0x7faa5f002d20,
input_rel=0x7faa5f0034b0, target=0x7faa5f004270, rollup_lists=0x0,
rollup_groupclauses=0x0) at planner.c:3421
#2 0x0000000101df19a0 in grouping_planner (root=0x7faa5f002d20,
inheritance_update=0 '\0', tuple_fraction=0) at planner.c:1796
#3 0x0000000101def276 in subquery_planner (glob=0x7faa5f002b90,
parse=0x7faa5d00cb80, parent_root=0x0, hasRecursion=0 '\0',
tuple_fraction=0) at planner.c:758
#4 0x0000000101dee2de in standard_planner (parse=0x7faa5d00cb80,
cursorOptions=256, boundParams=0x0) at planner.c:307
#5 0x0000000101dedf81 in planner (parse=0x7faa5d00cb80,
cursorOptions=256, boundParams=0x0) at planner.c:177
#6 0x0000000101eed7b6 in pg_plan_query (querytree=0x7faa5d00cb80,
cursorOptions=256, boundParams=0x0) at postgres.c:798
#7 0x0000000101eed8a3 in pg_plan_queries (querytrees=0x7faa5f002cc0,
cursorOptions=256, boundParams=0x0) at postgres.c:857
#8 0x0000000101ef05ad in exec_simple_query
(query_string=0x7faa5d00b638 "SELECT SUM(COUNT(f1)) OVER () FROM
int4_tbl WHERE f1=42;") at postgres.c:1022
#9 0x0000000101eefa8f in PostgresMain (argc=1, argv=0x7faa5b005be0,
dbname=0x7faa5b005940 "regression", username=0x7faa5b005920 "rhaas")
at postgres.c:4059
#10 0x0000000101e45209 in BackendRun (port=0x7faa5ae01770) at postmaster.c:4258
#11 0x0000000101e44449 in BackendStartup (port=0x7faa5ae01770) at
postmaster.c:3932
#12 0x0000000101e433ef in ServerLoop () at postmaster.c:1690
#13 0x0000000101e40a23 in PostmasterMain (argc=8, argv=0x7faa5ac099b0)
at postmaster.c:1298
#14 0x0000000101d6e160 in main (argc=8, argv=0x7faa5ac099b0) at main.c:228
(gdb) p final_target->sortgrouprefs
$2 = (Index *) 0x0

I don't immediately understand what's going wrong here. It looks to
me like make_group_input_target() already called, and that worked OK,
but now make_partialgroup_input_target() is failing using more-or-less
the same logic. Presumably that's because make_group_input_target()
was called on final_target as returned by create_pathtarget(root,
tlist), but make_partialgroup_input_target() is being called on
grouping_target, which I'm guessing came from
make_window_input_target, which somehow lacks sortgroupref labeling.
But I don't immediately see how that would happen, so there's
obviously something I'm missing here.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#14Robert Haas
robertmhaas@gmail.com
In reply to: David G. Johnston (#12)
Re: parallel.c is not marked as test covered

On Wed, May 11, 2016 at 1:48 PM, David G. Johnston
<david.g.johnston@gmail.com> wrote:

What happens when there are no workers available due to max_worker_processes
already being assigned?

Then the leader runs the plan after all.

Related question, if max_parallel_degree is >1 and "the requested number of
workers may not actually be available at runtime" is true, does the degree
of parallelism minimize at 1 worker + leader or will the leader simply run
the query by itself?

If the leader can get no workers at all, it will simply run the query
by itself. Of course, it tries to get as many as it can.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#15Robert Haas
robertmhaas@gmail.com
In reply to: Robert Haas (#13)
Re: parallel.c is not marked as test covered

On Wed, May 11, 2016 at 1:57 PM, Robert Haas <robertmhaas@gmail.com> wrote:

I don't immediately understand what's going wrong here. It looks to
me like make_group_input_target() already called, and that worked OK,
but now make_partialgroup_input_target() is failing using more-or-less
the same logic. Presumably that's because make_group_input_target()
was called on final_target as returned by create_pathtarget(root,
tlist), but make_partialgroup_input_target() is being called on
grouping_target, which I'm guessing came from
make_window_input_target, which somehow lacks sortgroupref labeling.
But I don't immediately see how that would happen, so there's
obviously something I'm missing here.

So, it turns out you can reproduce this bug pretty easily without
force_parallel_mode, like this:

alter table int4_tbl set (parallel_degree = 4);
SELECT SUM(COUNT(f1)) OVER () FROM int4_tbl WHERE f1=42;

Or you can just a query that involves a window function on a table
large enough for parallelism to be considered:

SELECT SUM(COUNT(aid)) OVER () FROM pgbench_accounts;

The crash goes way if the target list involves at least one plain
column that uses no aggregate or window function, because then the
PathTarget list has a sortgrouprefs array. Trivial fix patch
attached, although some more review from you (Tom Lane, PathTarget
inventor and planner whiz) and David Rowley (author of this function)
would be appreciated in case there are deeper issues here.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachments:

avoid-null-sortgroups-crash.patchtext/x-diff; charset=US-ASCII; name=avoid-null-sortgroups-crash.patchDownload+4-1
#16David Rowley
dgrowleyml@gmail.com
In reply to: Robert Haas (#15)
Re: parallel.c is not marked as test covered

On 12 May 2016 at 07:04, Robert Haas <robertmhaas@gmail.com> wrote:

On Wed, May 11, 2016 at 1:57 PM, Robert Haas <robertmhaas@gmail.com>

wrote:

I don't immediately understand what's going wrong here. It looks to
me like make_group_input_target() already called, and that worked OK,
but now make_partialgroup_input_target() is failing using more-or-less
the same logic. Presumably that's because make_group_input_target()
was called on final_target as returned by create_pathtarget(root,
tlist), but make_partialgroup_input_target() is being called on
grouping_target, which I'm guessing came from
make_window_input_target, which somehow lacks sortgroupref labeling.
But I don't immediately see how that would happen, so there's
obviously something I'm missing here.

So, it turns out you can reproduce this bug pretty easily without
force_parallel_mode, like this:

alter table int4_tbl set (parallel_degree = 4);
SELECT SUM(COUNT(f1)) OVER () FROM int4_tbl WHERE f1=42;

Or you can just a query that involves a window function on a table
large enough for parallelism to be considered:

SELECT SUM(COUNT(aid)) OVER () FROM pgbench_accounts;

The crash goes way if the target list involves at least one plain
column that uses no aggregate or window function, because then the
PathTarget list has a sortgrouprefs array. Trivial fix patch
attached, although some more review from you (Tom Lane, PathTarget
inventor and planner whiz) and David Rowley (author of this function)
would be appreciated in case there are deeper issues here.

The problem is make_group_input_target() only calls
add_column_to_pathtarget() (which allocates this array) when there's a
GROUP BY, otherwise it just appends to the non_group_col list. Since your
query has no GROUP BY it means that add_column_to_pathtarget() is never
called with a non-zero sortgroupref.

It looks like Tom has intended that PathTarget->sortgrouprefs can be NULL
going by both the comment /* corresponding sort/group refnos, or 0 */, and
the coding inside add_column_to_pathtarget(), which does not allocate the
array if given a 0 sortgroupref.

It looks like make_sort_input_target(), make_window_input_target() and
make_group_input_target() all get away without this check because they're
all using final_target, which was built by make_pathtarget_from_tlist()
which *always* allocates the sortgrouprefs array, even if it's left filled
with zeros.

It might be better if this was all consistent. Perhaps it would be worth
modifying make_pathtarget_from_tlist() to only allocate the sortgrouprefs
array if there's any non-zero tle->ressortgroupref, then modify the other
make_*_input_target() functions to handle a NULL array, similar to the fix
that's in your patch. This saves an allocation which is likely much more
expensive than the NULL check later. Alternatively
add_column_to_pathtarget() could be modified to allocate the array even if
sortgroupref is zero.

I think consistency is good here, as if this had been consistent this would
not be a bug.

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#17Clément Prévost
prevostclement@gmail.com
In reply to: Andres Freund (#7)
Re: parallel.c is not marked as test covered

On Mon, May 9, 2016 at 4:50 PM Andres Freund <andres@anarazel.de> wrote:

I think it's a good idea to run a force-parallel run on some buildfarm
members. But I'm rather convinced that the core tests run by all animals
need some minimal coverage of parallel queries. Both because otherwise
it'll be hard to get some coverage of unusual platforms, and because
it's imo something rather relevant to test during development.

Good point.

After some experiments, I found out that, for my setup (9b7bfc3a88ef7b), a
parallel seq scan is used given both parallel_setup_cost
and parallel_tuple_cost are set to 0 and given that the table is at least 3
times as large as the biggest test table tenk1.

The attached patch is a regression test using this method that is
reasonably small and fast to run. I also hid the workers count from the
explain output when costs are disabled as suggested by Tom Lane and Robert
Haas on this same thread (
/messages/by-id/CA+TgmobBQS4ss3+CwoZOKgbsBqKfRndwc=hLiALAep5aXQCTDA@mail.gmail.com
).

Testing under these conditions does not test the planner job at all but at
least some parallel code can be run on the build farm and the test suite
gets 2643 more lines and 188 more function covered.

I don't know however if this test will be reliable on other platforms, some
more feedback is needed here.

Attachments:

select_parallel_regress.patchapplication/octet-stream; name=select_parallel_regress.patchDownload+56-3
#18Noah Misch
noah@leadboat.com
In reply to: Clément Prévost (#17)
Re: parallel.c is not marked as test covered

On Sun, May 15, 2016 at 12:53:13PM +0000, Cl�ment Pr�vost wrote:

On Mon, May 9, 2016 at 4:50 PM Andres Freund <andres@anarazel.de> wrote:

I think it's a good idea to run a force-parallel run on some buildfarm
members. But I'm rather convinced that the core tests run by all animals
need some minimal coverage of parallel queries. Both because otherwise
it'll be hard to get some coverage of unusual platforms, and because
it's imo something rather relevant to test during development.

Good point.

After some experiments, I found out that, for my setup (9b7bfc3a88ef7b), a
parallel seq scan is used given both parallel_setup_cost
and parallel_tuple_cost are set to 0 and given that the table is at least 3
times as large as the biggest test table tenk1.

The attached patch is a regression test using this method that is
reasonably small and fast to run. I also hid the workers count from the
explain output when costs are disabled as suggested by Tom Lane and Robert
Haas on this same thread (
/messages/by-id/CA+TgmobBQS4ss3+CwoZOKgbsBqKfRndwc=hLiALAep5aXQCTDA@mail.gmail.com
).

Testing under these conditions does not test the planner job at all but at
least some parallel code can be run on the build farm and the test suite
gets 2643 more lines and 188 more function covered.

I don't know however if this test will be reliable on other platforms, some
more feedback is needed here.

[This is a generic notification.]

The above-described topic is currently a PostgreSQL 9.6 open item. Robert,
since you committed the patch believed to have created it, you own this open
item. If some other commit is more relevant or if this does not belong as a
9.6 open item, please let us know. Otherwise, please observe the policy on
open item ownership[1]/messages/by-id/20160527025039.GA447393@tornado.leadboat.com and send a status update within 72 hours of this
message. Include a date for your subsequent status update. Testers may
discover new open items at any time, and I want to plan to get them all fixed
well in advance of shipping 9.6rc1. Consequently, I will appreciate your
efforts toward speedy resolution. Thanks.

[1]: /messages/by-id/20160527025039.GA447393@tornado.leadboat.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#19Peter Eisentraut
peter_e@gmx.net
In reply to: Andres Freund (#7)
Re: parallel.c is not marked as test covered

On 5/9/16 10:50 AM, Andres Freund wrote:

I think it's a good idea to run a force-parallel run on some buildfarm
members. But I'm rather convinced that the core tests run by all animals
need some minimal coverage of parallel queries. Both because otherwise
it'll be hard to get some coverage of unusual platforms, and because
it's imo something rather relevant to test during development.

I can confirm that with force_parallel_mode = on, parallel.c does get
good test coverage. But I agree that it is a problem that this does not
happen in the default test setup.

I think we might want to have a new test file that sets
force_parallel_mode = on and just runs a few queries to give parallel.c
a small workout. For example, I find that the following test file gives
almost the same coverage as running the full test suite with f_p_m=on:

"""
SET force_parallel_mode = on;

EXPLAIN SELECT * FROM tenk1 WHERE unique1 = 1;
SELECT * FROM tenk1 WHERE unique1 = 1;

-- provoke error in worker
SELECT stringu1::int2 FROM tenk1;
"""

While we're at it, one of the things that force_parallel_mode = regress
does is remove an error context that is otherwise added to all messages:

errcontext("parallel worker, PID %d", *(int32 *) arg);

I think we should get rid of that error context altogether, except
possibly as a debugging tool, because it's an implementation detail that
does not need to be shown to users by default.

--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#20Noah Misch
noah@leadboat.com
In reply to: Noah Misch (#18)
Re: parallel.c is not marked as test covered

On Sun, May 29, 2016 at 01:31:24AM -0400, Noah Misch wrote:

On Sun, May 15, 2016 at 12:53:13PM +0000, Cl�ment Pr�vost wrote:

On Mon, May 9, 2016 at 4:50 PM Andres Freund <andres@anarazel.de> wrote:

I think it's a good idea to run a force-parallel run on some buildfarm
members. But I'm rather convinced that the core tests run by all animals
need some minimal coverage of parallel queries. Both because otherwise
it'll be hard to get some coverage of unusual platforms, and because
it's imo something rather relevant to test during development.

Good point.

After some experiments, I found out that, for my setup (9b7bfc3a88ef7b), a
parallel seq scan is used given both parallel_setup_cost
and parallel_tuple_cost are set to 0 and given that the table is at least 3
times as large as the biggest test table tenk1.

The attached patch is a regression test using this method that is
reasonably small and fast to run. I also hid the workers count from the
explain output when costs are disabled as suggested by Tom Lane and Robert
Haas on this same thread (
/messages/by-id/CA+TgmobBQS4ss3+CwoZOKgbsBqKfRndwc=hLiALAep5aXQCTDA@mail.gmail.com
).

Testing under these conditions does not test the planner job at all but at
least some parallel code can be run on the build farm and the test suite
gets 2643 more lines and 188 more function covered.

I don't know however if this test will be reliable on other platforms, some
more feedback is needed here.

[This is a generic notification.]

The above-described topic is currently a PostgreSQL 9.6 open item. Robert,
since you committed the patch believed to have created it, you own this open
item. If some other commit is more relevant or if this does not belong as a
9.6 open item, please let us know. Otherwise, please observe the policy on
open item ownership[1] and send a status update within 72 hours of this
message. Include a date for your subsequent status update. Testers may
discover new open items at any time, and I want to plan to get them all fixed
well in advance of shipping 9.6rc1. Consequently, I will appreciate your
efforts toward speedy resolution. Thanks.

[1] /messages/by-id/20160527025039.GA447393@tornado.leadboat.com

I enjoy reviewing automated test patches, so I persuaded Robert to transfer
ownership of this open item to me. I will update this thread no later than
2016-06-07 09:00 UTC. There is an 85% chance I will have reviewed the
proposed patch by then.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#21Noah Misch
noah@leadboat.com
In reply to: Clément Prévost (#17)
#22Peter Eisentraut
peter_e@gmx.net
In reply to: Noah Misch (#21)
#23Clément Prévost
prevostclement@gmail.com
In reply to: Peter Eisentraut (#22)
#24Noah Misch
noah@leadboat.com
In reply to: Clément Prévost (#23)
#25Robert Haas
robertmhaas@gmail.com
In reply to: Noah Misch (#24)
#26Peter Eisentraut
peter_e@gmx.net
In reply to: Noah Misch (#24)
#27Robert Haas
robertmhaas@gmail.com
In reply to: Peter Eisentraut (#26)
#28Robert Haas
robertmhaas@gmail.com
In reply to: Robert Haas (#27)
#29Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#28)
#30Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#29)
#31Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#30)
#32Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#31)
#33Amit Kapila
amit.kapila16@gmail.com
In reply to: Robert Haas (#32)
#34Noah Misch
noah@leadboat.com
In reply to: Amit Kapila (#33)
#35Amit Kapila
amit.kapila16@gmail.com
In reply to: Noah Misch (#34)
#36Amit Kapila
amit.kapila16@gmail.com
In reply to: Noah Misch (#34)
#37Robert Haas
robertmhaas@gmail.com
In reply to: Amit Kapila (#36)
#38Peter Eisentraut
peter_e@gmx.net
In reply to: Robert Haas (#28)
#39Robert Haas
robertmhaas@gmail.com
In reply to: Peter Eisentraut (#38)
#40Amit Kapila
amit.kapila16@gmail.com
In reply to: Robert Haas (#37)
#41Robert Haas
robertmhaas@gmail.com
In reply to: Amit Kapila (#40)
#42Robert Haas
robertmhaas@gmail.com
In reply to: Robert Haas (#39)
#43Peter Eisentraut
peter_e@gmx.net
In reply to: Robert Haas (#42)
#44Tom Lane
tgl@sss.pgh.pa.us
In reply to: Peter Eisentraut (#43)
#45Amit Kapila
amit.kapila16@gmail.com
In reply to: Tom Lane (#44)
#46Tom Lane
tgl@sss.pgh.pa.us
In reply to: Amit Kapila (#45)
#47Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#46)
#48Peter Eisentraut
peter_e@gmx.net
In reply to: Robert Haas (#47)
#49Tom Lane
tgl@sss.pgh.pa.us
In reply to: Peter Eisentraut (#48)
#50Robert Haas
robertmhaas@gmail.com
In reply to: Peter Eisentraut (#48)
#51Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#50)
#52Amit Kapila
amit.kapila16@gmail.com
In reply to: Robert Haas (#41)
#53Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#51)
#54Robert Haas
robertmhaas@gmail.com
In reply to: Amit Kapila (#52)
#55Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#53)
#56David G. Johnston
david.g.johnston@gmail.com
In reply to: Robert Haas (#53)
#57Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#55)
#58Robert Haas
robertmhaas@gmail.com
In reply to: David G. Johnston (#56)
#59David G. Johnston
david.g.johnston@gmail.com
In reply to: Robert Haas (#58)
#60Robert Haas
robertmhaas@gmail.com
In reply to: David G. Johnston (#59)
#61Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#49)
#62Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Tom Lane (#61)
#63David G. Johnston
david.g.johnston@gmail.com
In reply to: Robert Haas (#60)
#64Tom Lane
tgl@sss.pgh.pa.us
In reply to: Alvaro Herrera (#62)
#65Peter Eisentraut
peter_e@gmx.net
In reply to: Tom Lane (#49)
#66Tom Lane
tgl@sss.pgh.pa.us
In reply to: Peter Eisentraut (#65)
#67Amit Kapila
amit.kapila16@gmail.com
In reply to: Robert Haas (#57)
#68Peter Eisentraut
peter_e@gmx.net
In reply to: Robert Haas (#50)
#69Robert Haas
robertmhaas@gmail.com
In reply to: Peter Eisentraut (#68)
#70Peter Eisentraut
peter_e@gmx.net
In reply to: Tom Lane (#66)
#71Amit Kapila
amit.kapila16@gmail.com
In reply to: Peter Eisentraut (#70)