Questions about MergeAppend

Started by Ronan Dunklaualmost 9 years ago2 messages
#1Ronan Dunklau
ronan.dunklau@dalibo.com
1 attachment(s)

Hello,

Looking into the MergeAppendPath generation, I'm a bit surprised by several
things:

- When generating the mergeappendpath, we use a dummy path to take the sort
cost into account for non-sorted subpaths. This is called with the base
relation tuples instead of the subpath estimated number of rows. This tends to
overestimate the sorting cost drastically, since the base relation can be
filtered thus reducing the number of input tuples for the sorting routine.
Please find attached a trivial patch fixing this.
- Why does it only generate a "fake" SortPath for sorting purpose, not
adding it to the subpath, and posptone the creation of Sort plan node until
later ? This also adds a bit of complexity when fixing the sort cost node
later for explain output.
- Why do we only consider generating MergeAppendPath for PathKeys for which
a sorted Path exists in any of the child relation ? It seems to me there could
be an advantage in using a MergeAppend of explicitly sorted relations over
sorting an Append, in particular if every existing subpath can be sorted in
work_mem.

--
Ronan Dunklau
http://dalibo.com - http://dalibo.org

Attachments:

fix_mergeappend_costsort_estimates.patchtext/x-patch; charset=UTF-8; name=fix_mergeappend_costsort_estimates.patchDownload
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index 324829690d..a99a3aeceb 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -1313,7 +1313,7 @@ create_merge_append_path(PlannerInfo *root,
 					  root,
 					  pathkeys,
 					  subpath->total_cost,
-					  subpath->parent->tuples,
+					  subpath->rows,
 					  subpath->pathtarget->width,
 					  0.0,
 					  work_mem,
#2Ashutosh Bapat
ashutosh.bapat@enterprisedb.com
In reply to: Ronan Dunklau (#1)
Re: Questions about MergeAppend

On Thu, Mar 2, 2017 at 8:00 PM, Ronan Dunklau <ronan.dunklau@dalibo.com> wrote:

Hello,

Looking into the MergeAppendPath generation, I'm a bit surprised by several
things:

- When generating the mergeappendpath, we use a dummy path to take the sort
cost into account for non-sorted subpaths. This is called with the base
relation tuples instead of the subpath estimated number of rows. This tends to
overestimate the sorting cost drastically, since the base relation can be
filtered thus reducing the number of input tuples for the sorting routine.
Please find attached a trivial patch fixing this.

Yes, that makes sense. It's weird that the function
create_merge_append_path() sums up subpath->rows as an estimate for
the total rows in append relation, but uses subpath->parent->tuples to
estimate cost of sort. The patch looks right to me.

- Why does it only generate a "fake" SortPath for sorting purpose, not
adding it to the subpath, and posptone the creation of Sort plan node until
later ? This also adds a bit of complexity when fixing the sort cost node
later for explain output.

I think, the reason for that is not to waste space for SortPath in
case MergeAppendPath does not emerge as the cheapest path. If there
are many possible pathkeys combinations where only half children have
readily sorted paths, we would be wasting a lot of space in SortPaths
for the unsorted children.

- Why do we only consider generating MergeAppendPath for PathKeys for which
a sorted Path exists in any of the child relation ?

For any base relation a path with pathkeys implies that there exists
corresponding index on that base relation. These pathkeys then bubble
up the join tree, allowing the executor to use the indexes wherever
possible. We just extend that logic to an append relation.

It seems to me there could
be an advantage in using a MergeAppend of explicitly sorted relations over
sorting an Append, in particular if every existing subpath can be sorted in
work_mem.

I think this is a case, when we have to add a sorting step on top of
an append relation, for say merge join. Probably, we should probably
try to cost two kinds of paths one with Sort->Append and the other a
MergeAppend with list of Sort->subpath and choose cheaper among those.
Probably for a very large number of partitions, MergeAppend on
in-memory sort wouldn't be efficient and might require a large
work_mem.

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers