Suspicious call of initial_cost_hashjoin()
try_partial_hashjoin_path() passes constant true to for the parallel_hash
argument of initial_cost_hashjoin(). Shouldn't it instead pass the
parallel_hash argument that it receives?
This is related to commit 1804284042e659e7d16904e7bbb0ad546394b6a3.
--
Antonin Houska
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de, http://www.cybertec.at
On Fri, Dec 22, 2017 at 10:45 PM, Antonin Houska <ah@cybertec.at> wrote:
try_partial_hashjoin_path() passes constant true to for the parallel_hash
argument of initial_cost_hashjoin(). Shouldn't it instead pass the
parallel_hash argument that it receives?
Thanks. Yeah. When initial_cost_hashjoin() calls
get_parallel_divisor() on a non-partial inner path I think it would
return 1.0, so no damage was done there, but when
ExecChooseHashTableSize() receives try_combined_work_mem == true it
might underestimate the number of batches required for a partial hash
join without parallel hash, because it would incorrectly assume that a
single batch join could use the combined work_mem budget. This was
quite well hidden because ExecHashTableCreate() calls
ExecChooseHashTableSize() again (rather than reusing the results from
planning time), so the bad nbatch estimate doesn't show up anywhere.
--
Thomas Munro
http://www.enterprisedb.com
Attachments:
fix.patchapplication/octet-stream; name=fix.patchDownload+1-2
Hi Antonin,
On 12/22/17 6:13 AM, Thomas Munro wrote:
On Fri, Dec 22, 2017 at 10:45 PM, Antonin Houska <ah@cybertec.at> wrote:
try_partial_hashjoin_path() passes constant true to for the parallel_hash
argument of initial_cost_hashjoin(). Shouldn't it instead pass the
parallel_hash argument that it receives?Thanks. Yeah. When initial_cost_hashjoin() calls
get_parallel_divisor() on a non-partial inner path I think it would
return 1.0, so no damage was done there, but when
ExecChooseHashTableSize() receives try_combined_work_mem == true it
might underestimate the number of batches required for a partial hash
join without parallel hash, because it would incorrectly assume that a
single batch join could use the combined work_mem budget. This was
quite well hidden because ExecHashTableCreate() calls
ExecChooseHashTableSize() again (rather than reusing the results from
planning time), so the bad nbatch estimate doesn't show up anywhere.
Does this look right to you? If so, can you sign up as a reviewer and
mark it Ready for Committer?
Thanks,
--
-David
david@pgmasters.net
David Steele <david@pgmasters.net> wrote:
On 12/22/17 6:13 AM, Thomas Munro wrote:
On Fri, Dec 22, 2017 at 10:45 PM, Antonin Houska <ah@cybertec.at> wrote:
try_partial_hashjoin_path() passes constant true to for the parallel_hash
argument of initial_cost_hashjoin(). Shouldn't it instead pass the
parallel_hash argument that it receives?Thanks. Yeah. When initial_cost_hashjoin() calls
get_parallel_divisor() on a non-partial inner path I think it would
return 1.0, so no damage was done there, but when
ExecChooseHashTableSize() receives try_combined_work_mem == true it
might underestimate the number of batches required for a partial hash
join without parallel hash, because it would incorrectly assume that a
single batch join could use the combined work_mem budget. This was
quite well hidden because ExecHashTableCreate() calls
ExecChooseHashTableSize() again (rather than reusing the results from
planning time), so the bad nbatch estimate doesn't show up anywhere.
Does this look right to you?
Yes, this is what I meant. The patch applies cleanly and the code compiles
well.
If so, can you sign up as a reviewer and mark it Ready for Committer?
Done.
Actually I think it'd be nice if the "parallel_hash" argument was mentioned in
the header comment of initial_cost_hashjoin() function, but not sure this is
worth returning the patch to the author.
--
Antonin Houska
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26, A-2700 Wiener Neustadt
Web: https://www.cybertec-postgresql.com
On Fri, Mar 2, 2018 at 9:06 PM, Antonin Houska <ah@cybertec.at> wrote:
David Steele <david@pgmasters.net> wrote:
Does this look right to you?
Yes, this is what I meant. The patch applies cleanly and the code compiles
well.If so, can you sign up as a reviewer and mark it Ready for Committer?
Done.
Thanks.
Actually I think it'd be nice if the "parallel_hash" argument was mentioned in
the header comment of initial_cost_hashjoin() function, but not sure this is
worth returning the patch to the author.
Done.
--
Thomas Munro
http://www.enterprisedb.com
Attachments:
0001-Fix-costing-of-parallel-hash-joins-v2.patchapplication/octet-stream; name=0001-Fix-costing-of-parallel-hash-joins-v2.patchDownload+3-2
On 3/2/18 05:01, Thomas Munro wrote:
On Fri, Mar 2, 2018 at 9:06 PM, Antonin Houska <ah@cybertec.at> wrote:
David Steele <david@pgmasters.net> wrote:
Does this look right to you?
Yes, this is what I meant. The patch applies cleanly and the code compiles
well.If so, can you sign up as a reviewer and mark it Ready for Committer?
Done.
Thanks.
Actually I think it'd be nice if the "parallel_hash" argument was mentioned in
the header comment of initial_cost_hashjoin() function, but not sure this is
worth returning the patch to the author.Done.
committed
--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services