select_parallel test fails with nonstandard block size

Started by Peter Eisentraut over 9 years ago, 8 messages, hackers
#1 Peter Eisentraut
peter_e@gmx.net

When building with --with-blocksize=16, the select_parallel test fails
with this difference:

 explain (costs off)
        select  sum(parallel_restricted(unique1)) from tenk1
        group by(parallel_restricted(unique1));
-                     QUERY PLAN
-----------------------------------------------------
+                QUERY PLAN
+-------------------------------------------
  HashAggregate
    Group Key: parallel_restricted(unique1)
-   ->  Index Only Scan using tenk1_unique1 on tenk1
-(3 rows)
+   ->  Gather
+         Workers Planned: 4
+         ->  Parallel Seq Scan on tenk1
+(5 rows)

 set force_parallel_mode=1;
 explain (costs off)

We know that different block sizes cause some test failures, mainly
because of row ordering differences. But this looked a bit different.

The size of the tenk1 table is very similar under either block size:

16k: tenk1 = 2883584
8k: tenk1 = 2932736

Is there an explanation for this difference, or is there something wrong
in the cost estimation somewhere?
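
To make the comparison concrete, the byte sizes above can be converted to block counts. This is a minimal sketch: the helper name `blocks` is made up for illustration, and it assumes the reported sizes are exact multiples of the block size.

```python
# Table sizes in bytes, keyed by block size in bytes, as reported above.
TENK1_BYTES = {16 * 1024: 2883584, 8 * 1024: 2932736}


def blocks(size_bytes: int, blcksz: int) -> int:
    """Number of whole blocks occupied by a relation of the given size."""
    # Hypothetical helper; assumes the size is a whole multiple of the block size.
    assert size_bytes % blcksz == 0
    return size_bytes // blcksz


block_counts = {blcksz: blocks(size, blcksz) for blcksz, size in TENK1_BYTES.items()}
for blcksz, n in block_counts.items():
    print(f"BLCKSZ={blcksz}: {n} blocks")
```

So the table is nearly the same size in bytes, but it occupies roughly half as many blocks at 16k.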

--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#2 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Peter Eisentraut (#1)
Re: select_parallel test fails with nonstandard block size

Peter Eisentraut <peter.eisentraut@2ndquadrant.com> writes:

> When building with --with-blocksize=16, the select_parallel test fails
> with this difference:
>
>  explain (costs off)
>         select  sum(parallel_restricted(unique1)) from tenk1
>         group by(parallel_restricted(unique1));
> -                     QUERY PLAN
> -----------------------------------------------------
> +                QUERY PLAN
> +-------------------------------------------
>   HashAggregate
>     Group Key: parallel_restricted(unique1)
> -   ->  Index Only Scan using tenk1_unique1 on tenk1
> -(3 rows)
> +   ->  Gather
> +         Workers Planned: 4
> +         ->  Parallel Seq Scan on tenk1
> +(5 rows)
>
>  set force_parallel_mode=1;
>  explain (costs off)
>
> We know that different block sizes cause some test failures, mainly
> because of row ordering differences. But this looked a bit different.

I suspect what is happening is that min_parallel_relation_size is
being interpreted differently (because the default is set at 1024
blocks, regardless of what BLCKSZ is) and that's affecting the
cost estimate for the parallel seqscan. The direction of change
seems a bit surprising though; if the table is now half as big
blocks-wise, how did that make the parallel scan look cheaper?
Please step through create_plain_partial_paths and see what
is being done differently.

Possibly we ought to change things so that the default value of
min_parallel_relation_size is a fixed number of bytes rather
than a fixed number of blocks. Not sure though.

regards, tom lane

#3 Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#2)
Re: select_parallel test fails with nonstandard block size

On Thu, Sep 15, 2016 at 9:59 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

> Possibly we ought to change things so that the default value of
> min_parallel_relation_size is a fixed number of bytes rather
> than a fixed number of blocks. Not sure though.

The reason why this was originally reckoned in blocks is because the
data is divided between the workers on the basis of a block number.
In the degenerate case where blocks < workers, the extra workers will
get no blocks at all, and thus no rows at all. It seemed best to
insist that the relation had a reasonable number of blocks so that we
could hope for a reasonably even distribution of work among a pool of
workers. I'm not altogether sure that's the right way of thinking
about this problem but I'm not sure it's wrong, either; anyway, it's
as far as my thought process had progressed at the time I wrote the
code.
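
The degenerate case can be illustrated with a toy model. This is only a sketch: it deals blocks out round-robin, which is an assumption made for simplicity and not a description of how the executor actually hands out blocks, and `assign_blocks` is a hypothetical name.

```python
from typing import List


def assign_blocks(nblocks: int, nworkers: int) -> List[List[int]]:
    """Toy model: deal block numbers out to workers one at a time."""
    per_worker: List[List[int]] = [[] for _ in range(nworkers)]
    for blkno in range(nblocks):
        per_worker[blkno % nworkers].append(blkno)
    return per_worker


# Fewer blocks than workers: the extra workers get no blocks, hence no rows.
tiny = assign_blocks(nblocks=2, nworkers=4)
idle_workers = sum(1 for blks in tiny if not blks)

# A reasonable number of blocks: the work divides evenly.
big = assign_blocks(nblocks=1024, nworkers=4)
sizes = [len(blks) for blks in big]

print(f"idle workers with 2 blocks: {idle_workers}")
print(f"blocks per worker with 1024 blocks: {sizes}")
```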

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#4 Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Robert Haas (#3)
Re: select_parallel test fails with nonstandard block size

Robert Haas wrote:

> On Thu, Sep 15, 2016 at 9:59 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Possibly we ought to change things so that the default value of
>> min_parallel_relation_size is a fixed number of bytes rather
>> than a fixed number of blocks. Not sure though.
>
> The reason why this was originally reckoned in blocks is because the
> data is divided between the workers on the basis of a block number.

Maybe the solution is to fill the table to a given number of blocks
rather than a number of rows.

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#5 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#3)
Re: select_parallel test fails with nonstandard block size

Robert Haas <robertmhaas@gmail.com> writes:

> On Thu, Sep 15, 2016 at 9:59 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Possibly we ought to change things so that the default value of
>> min_parallel_relation_size is a fixed number of bytes rather
>> than a fixed number of blocks. Not sure though.
>
> The reason why this was originally reckoned in blocks is because the
> data is divided between the workers on the basis of a block number.
> In the degenerate case where blocks < workers, the extra workers will
> get no blocks at all, and thus no rows at all.

Well, sure, but at any reasonable value of min_parallel_relation_size
that won't be a factor. The question here is whether we want the default
value to be platform-independent. I notice that both config.sgml and
postgresql.conf.sample claim that the default value is 8MB, which this
discussion reveals to be a lie. If you want to keep the default expressed
as "1024" and not "(8 * 1024 * 1024) / BLCKSZ", we need to change the
documentation.
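
The two ways of expressing the default can be compared directly. A minimal sketch: the function names are hypothetical, and only the 1024-block and 8MB figures come from this thread.

```python
MB = 1024 * 1024


def fixed_blocks_default_in_bytes(blcksz: int, nblocks: int = 1024) -> int:
    """Byte size of a default expressed as a fixed number of blocks."""
    return nblocks * blcksz


def fixed_bytes_default_in_blocks(blcksz: int, nbytes: int = 8 * MB) -> int:
    """Block count of a default expressed as a fixed number of bytes."""
    return nbytes // blcksz


for kb in (4, 8, 16, 32):
    blcksz = kb * 1024
    as_mb = fixed_blocks_default_in_bytes(blcksz) // MB
    as_blocks = fixed_bytes_default_in_blocks(blcksz)
    print(f"BLCKSZ={kb}kB: '1024' means {as_mb}MB; '8MB' means {as_blocks} blocks")
```

Only at the standard 8kB block size do the two forms agree, which is why the documented "8MB" is wrong for other builds.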

regards, tom lane

#6 Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#5)
Re: select_parallel test fails with nonstandard block size

On Thu, Sep 15, 2016 at 10:44 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

> Robert Haas <robertmhaas@gmail.com> writes:
>> On Thu, Sep 15, 2016 at 9:59 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> Possibly we ought to change things so that the default value of
>>> min_parallel_relation_size is a fixed number of bytes rather
>>> than a fixed number of blocks. Not sure though.
>>
>> The reason why this was originally reckoned in blocks is because the
>> data is divided between the workers on the basis of a block number.
>> In the degenerate case where blocks < workers, the extra workers will
>> get no blocks at all, and thus no rows at all.
>
> Well, sure, but at any reasonable value of min_parallel_relation_size
> that won't be a factor. The question here is whether we want the default
> value to be platform-independent. I notice that both config.sgml and
> postgresql.conf.sample claim that the default value is 8MB, which this
> discussion reveals to be a lie. If you want to keep the default expressed
> as "1024" and not "(8 * 1024 * 1024) / BLCKSZ", we need to change the
> documentation.

I don't particularly care about that. Changing it to 8MB always would
be fine with me.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#7 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#6)
Re: select_parallel test fails with nonstandard block size

Robert Haas <robertmhaas@gmail.com> writes:

> On Thu, Sep 15, 2016 at 10:44 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Well, sure, but at any reasonable value of min_parallel_relation_size
>> that won't be a factor. The question here is whether we want the default
>> value to be platform-independent. I notice that both config.sgml and
>> postgresql.conf.sample claim that the default value is 8MB, which this
>> discussion reveals to be a lie. If you want to keep the default expressed
>> as "1024" and not "(8 * 1024 * 1024) / BLCKSZ", we need to change the
>> documentation.
>
> I don't particularly care about that. Changing it to 8MB always would
> be fine with me.

OK, I'll take care of it (since I now realize that the inconsistency
is my own fault --- I committed that GUC not you). It's unclear what
this will do for Peter's complaint though.

regards, tom lane

#8 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#7)
Re: select_parallel test fails with nonstandard block size

I wrote:

> OK, I'll take care of it (since I now realize that the inconsistency
> is my own fault --- I committed that GUC not you). It's unclear what
> this will do for Peter's complaint though.

On closer inspection, the answer is "nothing", because the select_parallel
test overrides the default value of min_parallel_relation_size anyway.
(Without that, I don't think tenk1 is large enough to trigger
consideration of parallel scan at all.)
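
As a rough illustration of that parenthetical remark: with the block counts implied by the table sizes reported at the start of the thread, tenk1 falls below a 1024-block threshold at either block size. This is a sketch only. `large_enough_for_parallel_scan` is a hypothetical name, the plain comparison is a simplification of whatever the planner really does, and the override value of 0 is an assumption chosen for illustration.

```python
DEFAULT_MIN_PARALLEL_RELATION_SIZE = 1024  # blocks; the default discussed in this thread
ASSUMED_TEST_OVERRIDE = 0                  # assumption: any sufficiently low value behaves the same


def large_enough_for_parallel_scan(rel_pages: int, min_blocks: int) -> bool:
    """Simplified size gate: consider a parallel scan only at or above the threshold."""
    return rel_pages >= min_blocks


# Block counts derived from the byte sizes reported in the first message.
TENK1_PAGES = {"8K": 2932736 // 8192, "16K": 2883584 // 16384}

with_default = {
    k: large_enough_for_parallel_scan(p, DEFAULT_MIN_PARALLEL_RELATION_SIZE)
    for k, p in TENK1_PAGES.items()
}
with_override = {
    k: large_enough_for_parallel_scan(p, ASSUMED_TEST_OVERRIDE)
    for k, p in TENK1_PAGES.items()
}
print(with_default, with_override)
```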

I find that at BLCKSZ 8K, the planner thinks the best plan is

 HashAggregate (cost=5320.28..7920.28 rows=10000 width=12)
   Group Key: parallel_restricted(unique1)
   -> Index Only Scan using tenk1_unique1 on tenk1 (cost=0.29..2770.28 rows=10000 width=8)

which is what the regression test script expects. Forcing the parallel
plan to be chosen, we get this using the cost parameters set up by
select_parallel:

 HashAggregate (cost=5433.00..8033.00 rows=10000 width=12)
   Group Key: parallel_restricted(unique1)
   -> Gather (cost=0.00..2883.00 rows=10000 width=8)
        Workers Planned: 4
        -> Parallel Seq Scan on tenk1 (cost=0.00..383.00 rows=2500 width=4)

However, at BLCKSZ 16K, we get these numbers instead:

 HashAggregate (cost=5264.28..7864.28 rows=10000 width=12)
   Group Key: parallel_restricted(unique1)
   -> Index Only Scan using tenk1_unique1 on tenk1 (cost=0.29..2714.28 rows=10000 width=8)

 HashAggregate (cost=5251.00..7851.00 rows=10000 width=12)
   Group Key: parallel_restricted(unique1)
   -> Gather (cost=0.00..2701.00 rows=10000 width=8)
        Workers Planned: 4
        -> Parallel Seq Scan on tenk1 (cost=0.00..201.00 rows=2500 width=4)

so the planner goes for the second one.
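
The flip can be seen by lining up the top-level costs quoted above. A small sketch: the cost figures are copied from the plans in this message, while picking the lowest total cost is a simplification of how the planner compares paths, and `cheapest` is a hypothetical helper.

```python
# (startup_cost, total_cost) of the top HashAggregate node in each plan above.
PLANS = {
    "8K": {
        "index_only_scan": (5320.28, 7920.28),
        "parallel_seq_scan": (5433.00, 8033.00),
    },
    "16K": {
        "index_only_scan": (5264.28, 7864.28),
        "parallel_seq_scan": (5251.00, 7851.00),
    },
}


def cheapest(candidates: dict) -> str:
    """Pick the candidate with the lowest total cost (simplified comparison)."""
    return min(candidates, key=lambda name: candidates[name][1])


winners = {blcksz: cheapest(candidates) for blcksz, candidates in PLANS.items()}
for blcksz, name in winners.items():
    print(f"BLCKSZ {blcksz}: {name}")
```

The margins are small relative to the totals (about 113 cost units at 8K and about 13 at 16K), which fits the observation that the choice is sensitive to block size.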

I don't think there's anything particularly broken here. The seqscan
cost estimate is largely dependent on the number of blocks, and there are
about half as many blocks at 16K. The indexscan estimate is also reduced,
but not as much, so it stops looking like the cheaper alternative.
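
The dependence on block count can be checked against the numbers above. The sketch below assumes the stock settings seq_page_cost = 1.0 and cpu_tuple_cost = 0.01, and assumes the per-tuple CPU cost is split four ways across the planned workers. The thread states none of this, and the real cost model has more terms. The block counts come from the table sizes given at the start of the thread. Under those assumptions, the simplified formula reproduces the 383.00 and 201.00 figures for the Parallel Seq Scan node.

```python
SEQ_PAGE_COST = 1.0    # assumed: PostgreSQL's default setting
CPU_TUPLE_COST = 0.01  # assumed: PostgreSQL's default setting
PARALLEL_DIVISOR = 4   # assumed: CPU work divided among the 4 planned workers


def parallel_seqscan_cost(pages: int, tuples: int) -> float:
    """Simplified total cost: every page is read; tuple CPU cost is shared."""
    disk_cost = SEQ_PAGE_COST * pages
    cpu_cost = CPU_TUPLE_COST * tuples / PARALLEL_DIVISOR
    return disk_cost + cpu_cost


# Block counts derived from the reported byte sizes of tenk1 (10000 rows).
cost_8k = parallel_seqscan_cost(pages=2932736 // 8192, tuples=10000)
cost_16k = parallel_seqscan_cost(pages=2883584 // 16384, tuples=10000)
print(f"8K: {cost_8k:.2f}  16K: {cost_16k:.2f}")
```

In this model the page term dominates (358 versus 176), so halving the block count nearly halves the scan cost, while the CPU term stays at 25 in both cases.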

We could maybe twiddle the cost parameters select_parallel uses so that
the same plan is chosen at both block sizes, but it seems like it would
be very fragile, and I'm not sure there's much point.

regards, tom lane
