explain and PARAM_EXEC

Started by Robert Haas · about 16 years ago · 30 messages · pgsql-hackers
#1Robert Haas
robertmhaas@gmail.com

Consider the following (rather lame) query:

rhaas=# explain (verbose) select (select oid from pg_class a where
a.oid = b.relfilenode) from pg_class b;
                                               QUERY PLAN
--------------------------------------------------------------------------------------------------------
 Seq Scan on pg_catalog.pg_class b  (cost=0.00..2250.22 rows=271 width=4)
   Output: (SubPlan 1)
   SubPlan 1
     ->  Index Scan using pg_class_oid_index on pg_catalog.pg_class a  (cost=0.00..8.27 rows=1 width=4)
           Output: a.oid
           Index Cond: (a.oid = $0)

It seems odd to me that we don't display any information about where
$0 comes from or how it's initialized. Should we? I believe what's
happening is that the sequential scan of b kicks out b.oid, and that
then gets yanked into $0 when we invoke the subplan. But you can't
really see what's happening. Interestingly, if you contrive to make
the sequential scan not the toplevel plan node, then you actually do
get to see what it's kicking out:

rhaas=# explain (verbose) select (select oid from pg_class a where
a.oid = b.oid::integer) from pg_class b, generate_series(1,5);
                                               QUERY PLAN
--------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.00..2245943.89 rows=271000 width=4)
   Output: (SubPlan 1)
   ->  Seq Scan on pg_catalog.pg_class b  (cost=0.00..9.71 rows=271 width=4)
         Output: b.oid
   ->  Function Scan on pg_catalog.generate_series  (cost=0.00..10.00 rows=1000 width=0)
         Output: generate_series.generate_series
   SubPlan 1
     ->  Index Scan using pg_class_oid_index on pg_catalog.pg_class a  (cost=0.00..8.27 rows=1 width=4)
           Output: a.oid
           Index Cond: (a.oid = ($0)::oid)
(10 rows)

We can even make it kick out two things:

rhaas=# explain (verbose) select (select oid from pg_class a where
a.oid = b.oid::integer + b.relfilenode::integer) from pg_class b,
generate_series(1,5);
                                               QUERY PLAN
--------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.00..2246621.39 rows=271000 width=8)
   Output: (SubPlan 1)
   ->  Seq Scan on pg_catalog.pg_class b  (cost=0.00..9.71 rows=271 width=8)
         Output: b.oid, b.relfilenode
   ->  Function Scan on pg_catalog.generate_series  (cost=0.00..10.00 rows=1000 width=0)
         Output: generate_series.generate_series
   SubPlan 1
     ->  Index Scan using pg_class_oid_index on pg_catalog.pg_class a  (cost=0.00..8.27 rows=1 width=4)
           Output: a.oid
           Index Cond: (a.oid = ((($0)::integer + ($1)::integer))::oid)
(10 rows)

But if we drop the generate_series call we're back in the dark -
where's the node that's emitting oid and relfilenode?

rhaas=# explain (verbose) select (select oid from pg_class a where
a.oid = b.oid::integer + b.relfilenode::integer) from pg_class b;
                                               QUERY PLAN
--------------------------------------------------------------------------------------------------------
 Seq Scan on pg_catalog.pg_class b  (cost=0.00..2250.90 rows=271 width=8)
   Output: (SubPlan 1)
   SubPlan 1
     ->  Index Scan using pg_class_oid_index on pg_catalog.pg_class a  (cost=0.00..8.27 rows=1 width=4)
           Output: a.oid
           Index Cond: (a.oid = ((($0)::integer + ($1)::integer))::oid)
(6 rows)

So I guess there are two issues here: (1) somehow I feel like we
should be telling the user what expression is being used to initialize
$0, $1, etc. when they are PARAM_EXEC parameters; and (2) where does
the output list for the sequential scan "go" when there's only one
table involved?

This is when you all start explaining to me why I'm woefully confused...

...Robert

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#1)
Re: explain and PARAM_EXEC

Robert Haas <robertmhaas@gmail.com> writes:

So I guess there are two issues here: (1) somehow I feel like we
should be telling the user what expression is being used to initialize
$0, $1, etc. when they are PARAM_EXEC parameters;

Maybe, but the only reasonable place to put it would be within the
(SubPlan N) reference, which is not a place where verboseness would be
appreciated, I think. In principle it could look something like

(SubPlan N ($0 := b.oid))

but with a few parameters and a bunch of other stuff on the same line
that would get out of hand.

and (2) where does
the output list for the sequential scan "go" when there's only one
table involved?

The (SubPlan N) reference is meant to be understood as an expression
element yielding the output of the subplan. One of us is confused,
because I don't see any material difference between your examples;
they're all calling the subplan in the same way.

regards, tom lane

#3Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#2)
Re: explain and PARAM_EXEC

On Fri, Feb 19, 2010 at 10:22 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Robert Haas <robertmhaas@gmail.com> writes:

So I guess there are two issues here: (1) somehow I feel like we
should be telling the user what expression is being used to initialize
$0, $1, etc. when they are PARAM_EXEC parameters;

Maybe, but the only reasonable place to put it would be within the
(SubPlan N) reference, which is not a place where verboseness would be
appreciated, I think.  In principle it could look something like

       (SubPlan N ($0 := b.oid))

but with a few parameters and a bunch of other stuff on the same line
that would get out of hand.

I thought maybe it could do something like this:

SubPlan 1
Parameters: $0 := b.oid
-> Index Scan etc.

...but maybe that doesn't work if it can be called with different
parameters from different places? Not sure if that's possible.

and (2) where does
the output list for the sequential scan "go" when there's only one
table involved?

The (SubPlan N) reference is meant to be understood as an expression
element yielding the output of the subplan.  One of us is confused,
because I don't see any material difference between your examples;
they're all calling the subplan in the same way.

I don't think it's a stretch to say that I'm the one who is confused.
I am under the (perhaps faulty) impression that when evaluating an
expression there can only ever be three tuples in score: inner, outer,
and scan. So when we go to evaluate the expression whose result will
be assigned to $0, where do we get those inner and/or outer and/or
scan tuples from? IOW, I understand where the subplan is putting its
OUTPUT, what I don't understand is what context is being used to set
its input parameters.

...Robert

#4Robert Haas
robertmhaas@gmail.com
In reply to: Robert Haas (#3)
Re: explain and PARAM_EXEC

On Fri, Feb 19, 2010 at 11:05 PM, Robert Haas <robertmhaas@gmail.com> wrote:

I don't think it's a stretch to say that I'm the one who is confused.
I am under the (perhaps faulty) impression that when evaluating an
expression there can only ever be three tuples in score: inner, outer,

s/score/scope.

and scan.  So when we go to evaluate the expression whose result will
be assigned to $0, where do we get those inner and/or outer and/or
scan tuples from?  IOW, I understand where the subplan is putting its
OUTPUT, what I don't understand is what context is being used to set
its input parameters.

...Robert

#5Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#3)
Re: explain and PARAM_EXEC

Robert Haas <robertmhaas@gmail.com> writes:

On Fri, Feb 19, 2010 at 10:22 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Maybe, but the only reasonable place to put it would be within the
(SubPlan N) reference,

I thought maybe it could do something like this:

SubPlan 1
Parameters: $0 := b.oid
-> Index Scan etc.

No, that's the wrong end of the stick --- that's like trying to annotate
a function definition with the actual parameter values being passed to
it from somewhere else. You haven't got the info there, and even if you
did, it's assuming that there is exactly one call site for any subplan.

I am under the (perhaps faulty) impression that when evaluating an
expression there can only ever be three tuples in score: inner, outer,
and scan. So when we go to evaluate the expression whose result will
be assigned to $0, where do we get those inner and/or outer and/or
scan tuples from? IOW, I understand where the subplan is putting its
OUTPUT, what I don't understand is what context is being used to set
its input parameters.

Consider this small mod on your example:

regression=# explain (verbose) select oid::int + 1,(select oid from pg_class a where a.oid = b.relfilenode) from pg_class b;
                                               QUERY PLAN
--------------------------------------------------------------------------------------------------------
 Seq Scan on pg_catalog.pg_class b  (cost=0.00..5573.04 rows=671 width=8)
   Output: ((b.oid)::integer + 1), (SubPlan 1)
   SubPlan 1
     ->  Index Scan using pg_class_oid_index on pg_catalog.pg_class a  (cost=0.00..8.27 rows=1 width=4)
           Output: a.oid
           Index Cond: (a.oid = $0)

When we are evaluating the output targetlist of the seqscan node, we
have a scan tuple of pg_class b in scope. We can fetch that tuple's
oid and use it in the first expression. We can also fetch that tuple's
relfilenode and pass it to the subplan, which we do by setting the $0
Param value before invoking the subplan. The subplan runs an indexscan
and returns a single scalar value (to wit, a.oid from some row of
pg_class a), which becomes the value of the (SubPlan 1) reference
back at the evaluation of the seqscan's targetlist.

It's really not much different from a function call with subplans as
functions. The PARAM_EXEC stuff looks just like 1950's era
non-reentrant function parameter passing mechanisms, back before anybody
had thought of recursive functions and they passed a function's
parameters in fixed storage locations. It's okay for this because
subplan trees are never recursive ...
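[Editor's note: Tom's fixed-slot description can be sketched in a few lines of Python. This is a toy model, not PostgreSQL source; all names and data are invented for illustration.]

```python
# Toy sketch of PARAM_EXEC-style parameter passing: the outer plan writes a
# value into a numbered slot, then invokes the subplan, which reads the slot
# by number ($0, $1, ...). The slots are fixed storage locations shared by
# the whole plan tree -- non-reentrant, which is fine because subplan trees
# are never recursive.
param_exec_slots = {}

def run_subplan(table):
    """Toy 'index scan': return rows of `table` whose oid equals $0."""
    target = param_exec_slots[0]               # read Param $0
    return [row for row in table if row["oid"] == target]

def scan_outer(outer_table, inner_table):
    """Toy seq scan whose targetlist contains a (SubPlan 1) reference."""
    results = []
    for b in outer_table:                      # scan tuple of b is in scope here
        param_exec_slots[0] = b["relfilenode"]  # set $0 before the call
        matches = run_subplan(inner_table)      # invoke SubPlan 1
        results.append(matches[0]["oid"] if matches else None)
    return results

pg_class = [{"oid": 1259, "relfilenode": 1259},
            {"oid": 1247, "relfilenode": 0}]
print(scan_outer(pg_class, pg_class))          # [1259, None]
```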

regards, tom lane

#6Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#5)
Re: explain and PARAM_EXEC

On Fri, Feb 19, 2010 at 11:33 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Robert Haas <robertmhaas@gmail.com> writes:

On Fri, Feb 19, 2010 at 10:22 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Maybe, but the only reasonable place to put it would be within the
(SubPlan N) reference,

I thought maybe it could do something like this:

SubPlan 1
  Parameters: $0 := b.oid
  -> Index Scan etc.

No, that's the wrong end of the stick --- that's like trying to annotate
a function definition with the actual parameter values being passed to
it from somewhere else.  You haven't got the info there, and even if you
did, it's assuming that there is exactly one call site for any subplan.

OK. Will have to think this one over.

I am under the (perhaps faulty) impression that when evaluating an
expression there can only ever be three tuples in score: inner, outer,
and scan.  So when we go to evaluate the expression whose result will
be assigned to $0, where do we get those inner and/or outer and/or
scan tuples from?  IOW, I understand where the subplan is putting its
OUTPUT, what I don't understand is what context is being used to set
its input parameters.

Consider this small mod on your example:

regression=# explain (verbose) select oid::int + 1,(select oid from pg_class a where a.oid = b.relfilenode) from pg_class b;
                                              QUERY PLAN
--------------------------------------------------------------------------------------------------------
 Seq Scan on pg_catalog.pg_class b  (cost=0.00..5573.04 rows=671 width=8)
  Output: ((b.oid)::integer + 1), (SubPlan 1)
  SubPlan 1
    ->  Index Scan using pg_class_oid_index on pg_catalog.pg_class a  (cost=0.00..8.27 rows=1 width=4)
          Output: a.oid
          Index Cond: (a.oid = $0)
(6 rows)

When we are evaluating the output targetlist of the seqscan node, we
have a scan tuple of pg_class b in scope.  We can fetch that tuple's
oid and use it in the first expression.

OK.

We can also fetch that tuple's
relfilenode and pass it to the subplan, which we do by setting the $0
Param value before invoking the subplan.

Are the same tuples in scope when evaluating the expression that sets
$0 as were in scope when evaluating ((b.oid)::integer + 1)?

The subplan runs an indexscan
and returns a single scalar value (to wit, a.oid from some row of
pg_class a), which becomes the value of the (SubPlan 1) reference
back at the evaluation of the seqscan's targetlist.

This part I get, 100%.

...Robert

#7Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#6)
Re: explain and PARAM_EXEC

Robert Haas <robertmhaas@gmail.com> writes:

On Fri, Feb 19, 2010 at 11:33 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

We can also fetch that tuple's
relfilenode and pass it to the subplan, which we do by setting the $0
Param value before invoking the subplan.

Are the same tuples in scope when evaluating the expression that sets
$0 as were in scope when evaluating ((b.oid)::integer + 1)?

Yes, exactly the same. The parameter-value expression is just like any
other scalar expression that could appear where the SubPlan reference
is. It doesn't know anything about the subplan, really.

regards, tom lane

#8Dimitri Fontaine
dimitri@2ndQuadrant.fr
In reply to: Tom Lane (#5)
Re: explain and PARAM_EXEC

Tom Lane <tgl@sss.pgh.pa.us> writes:

It's really not much different from a function call with subplans as
functions. The PARAM_EXEC stuff looks just like 1950's era
non-reentrant function parameter passing mechanisms, back before anybody
had thought of recursive functions and they passed a function's
parameters in fixed storage locations. It's okay for this because
subplan trees are never recursive ...

<hand waving alert>

How much of this stuff depends on the current state of the backend? If
not that much, maybe the planner could consider the cost of starting
another backend to handle the subplan. We'd need a tuplestore or some
other place to exchange results (a Unix-style pipe maybe, but we do
support Windows as a platform), and a special executor mode for running
a subplan, maybe more than once. Oh, and a way to share the same
snapshot across more than one backend too, but that's being worked on,
I think.

Ok, that's a far cry from the question at hand, but would that be a
plausible approach to having parallel queries in PostgreSQL?

Regards,
--
dim

#9Greg Stark
gsstark@mit.edu
In reply to: Tom Lane (#5)
Re: explain and PARAM_EXEC

On Sat, Feb 20, 2010 at 4:33 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

It's really not much different from a function call with subplans as
functions.

Perhaps it would be clearer to display the "(Subplan 1)" in a
function-call style format, like Subplan1(b.oid).

--
greg

#10Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#7)
Re: explain and PARAM_EXEC

On Fri, Feb 19, 2010 at 11:58 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Robert Haas <robertmhaas@gmail.com> writes:

On Fri, Feb 19, 2010 at 11:33 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

We can also fetch that tuple's
relfilenode and pass it to the subplan, which we do by setting the $0
Param value before invoking the subplan.

Are the same tuples in scope when evaluating the expression that sets
$0 as were in scope when evaluating  ((b.oid)::integer + 1)?

Yes, exactly the same.  The parameter-value expression is just like any
other scalar expression that could appear where the SubPlan reference
is.  It doesn't know anything about the subplan, really.

OK, thanks for the explanation. That makes more sense now. I've got
to go study this some more before I ask my next question...

...Robert

#11Robert Haas
robertmhaas@gmail.com
In reply to: Dimitri Fontaine (#8)
Re: explain and PARAM_EXEC

On Sat, Feb 20, 2010 at 6:57 AM, Dimitri Fontaine
<dfontaine@hi-media.com> wrote:

Tom Lane <tgl@sss.pgh.pa.us> writes:

It's really not much different from a function call with subplans as
functions.  The PARAM_EXEC stuff looks just like 1950's era
non-reentrant function parameter passing mechanisms, back before anybody
had thought of recursive functions and they passed a function's
parameters in fixed storage locations.  It's okay for this because
subplan trees are never recursive ...

<hand waving alert>

How much of this stuff depends on the current state of the
backend?

A whole lot.

Ok that's a far stretch from the question at hand, but would that be a
plausible approach to have parallel queries in PostgreSQL ?

This is really a topic for another thread, but at 100,000 feet it
seems to me that the hardest question is - how will you decide which
operations to parallelize in the first place? Actually making it
happen is really hard, too, of course, but even to get to that point
you have to have some model for what types of operations it makes
sense to parallelize and how you're going to decide when it's a win.

...Robert

#12Robert Haas
robertmhaas@gmail.com
In reply to: Greg Stark (#9)
Re: explain and PARAM_EXEC

On Sat, Feb 20, 2010 at 7:53 AM, Greg Stark <gsstark@mit.edu> wrote:

On Sat, Feb 20, 2010 at 4:33 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

It's really not much different from a function call with subplans as
functions.

Perhaps it would be clearer to display the "(Subplan 1)" in a function
call style format like Subplan1(b.oid)

I thought about that, too... maybe for 9.1 we should consider it. It
might be nice to add some sort of glyph to make the user less likely
to think that Subplan1 is in fact a function. <Subplan1>(b.oid)?
${Subplan1}(b.oid)? I dunno.

...Robert

#13Dimitri Fontaine
dimitri@2ndQuadrant.fr
In reply to: Robert Haas (#11)
parallelizing subplan execution (was: explain and PARAM_EXEC)

Robert Haas <robertmhaas@gmail.com> writes:

On Sat, Feb 20, 2010 at 6:57 AM, Dimitri Fontaine
<dfontaine@hi-media.com> wrote:

How much of this stuff depends on the current state of the
backend?

A whole lot.

Bad news.

Ok, that's a far cry from the question at hand, but would that be a
plausible approach to having parallel queries in PostgreSQL?

This is really a topic for another thread, but at 100,000 feet it
seems to me that the hardest question is - how will you decide which
operations to parallelize in the first place? Actually making it
happen is really hard, too, of course, but even to get to that point
you have to have some model for what types of operations it makes
sense to parallelize and how you're going to decide when it's a win.

My naive thought would be to add some cost parameters: first the cost
of fork()ing another backend, then, for each supported subplan type (we
will want to add more, or maybe have a special rendezvous-materialize
node), some model of the data exchange cost.

The planner would then, as usual, try to find the least costly plan,
and would be able to compare plans with and without distributing the
work.

Overly naive ?
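[Editor's note: Dimitri's proposal amounts to comparing two cost formulas. A back-of-envelope sketch in Python follows; all constants, names, and the overlap assumption are invented for illustration and reflect nothing in the real planner.]

```python
# Hypothetical cost model: charge a fixed startup cost for forking a worker
# plus a per-tuple transfer cost, and let the planner pick whichever total
# is cheaper. All numbers are invented.
FORK_STARTUP_COST = 1000.0      # cost of starting another backend
TUPLE_TRANSFER_COST = 0.01      # cost per tuple exchanged between backends

def serial_cost(subplan_cost, n_calls):
    # Run the subplan n_calls times in the same backend.
    return subplan_cost * n_calls

def parallel_cost(subplan_cost, n_calls, tuples_exchanged):
    # Ship the subplan to a worker: pay startup once, transfer per tuple,
    # and (optimistically) overlap the subplan work with the outer plan.
    return (FORK_STARTUP_COST
            + tuples_exchanged * TUPLE_TRANSFER_COST
            + subplan_cost * n_calls / 2)   # assume perfect 2-way overlap

# Cheap subplan, few calls: forking never pays off.
print(parallel_cost(8.27, 10, 10) > serial_cost(8.27, 10))               # True
# Expensive subplan, many calls: the worker wins despite startup cost.
print(parallel_cost(8.27, 100000, 100000) < serial_cost(8.27, 100000))   # True
```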

Regards,
--
dim

#14Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#11)
Re: explain and PARAM_EXEC

Robert Haas <robertmhaas@gmail.com> writes:

On Sat, Feb 20, 2010 at 6:57 AM, Dimitri Fontaine

Ok, that's a far cry from the question at hand, but would that be a
plausible approach to having parallel queries in PostgreSQL?

This is really a topic for another thread, but at 100,000 feet it
seems to me that the hardest question is - how will you decide which
operations to parallelize in the first place? Actually making it
happen is really hard, too, of course, but even to get to that point
you have to have some model for what types of operations it makes
sense to parallelize and how you're going to decide when it's a win.

Indeed, and if I were setting out to parallelize queries in PG (which
I am not), subplans would be the last thing I would think about. You
could put an enormous amount of work in there and have nothing much
to show for it, because the construct doesn't even arise in many
queries. Even where the user wrote something that looks like a subplan,
the planner will do its best to get rid of it by turning it into a join.

So if you want to parallelize queries, start someplace else. The past
discussions of this have revolved around splitting the node tree of an
ordinary query plan into separately executable parts. Maybe a subplan
could be one of the cut points for such an approach, but if it's the
only one or even the main one, you're wasting your time.

regards, tom lane

#15Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#12)
Re: explain and PARAM_EXEC

Robert Haas <robertmhaas@gmail.com> writes:

On Sat, Feb 20, 2010 at 7:53 AM, Greg Stark <gsstark@mit.edu> wrote:

Perhaps it would be clearer to display the "(Subplan 1)" in a function
call style format like Subplan1(b.oid)

I thought about that, too... maybe for 9.1 we should consider it. It
might be nice to add some sort of glyph to make the user less likely
to think that Subplan1 is in fact a function. <Subplan1>(b.oid)?
${Subplan1}(b.oid)? I dunno.

You really can't escape the need to identify which $N symbol is
associated with which parameter value. As soon as you've got more than
one subplan in a query, that becomes a nontrivial thing for a user to
guess.

regards, tom lane

#16Jaime Casanova
jcasanov@systemguards.com.ec
In reply to: Tom Lane (#2)
Re: explain and PARAM_EXEC

On Fri, Feb 19, 2010 at 10:22 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Robert Haas <robertmhaas@gmail.com> writes:

So I guess there are two issues here: (1) somehow I feel like we
should be telling the user what expression is being used to initialize
$0, $1, etc. when they are PARAM_EXEC parameters;

Maybe, but the only reasonable place to put it would be within the
(SubPlan N) reference, which is not a place where verboseness would be
appreciated, I think.  In principle it could look something like

       (SubPlan N ($0 := b.oid))

What if we put fully qualified names on every reference used by a
subplan? Something like:

                                               QUERY PLAN
--------------------------------------------------------------------------------------------------------
 Seq Scan on pg_catalog.pg_class b  (cost=0.00..2250.22 rows=271 width=4)
   Output: (SubPlan 1)
   SubPlan 1
     ->  Index Scan using pg_class_oid_index on pg_catalog.pg_class a  (cost=0.00..8.27 rows=1 width=4)
           Output: a.oid
           Index Cond: (a.oid = subplan1.$0)

--
Atentamente,
Jaime Casanova
Soporte y capacitación de PostgreSQL
Asesoría y desarrollo de sistemas
Guayaquil - Ecuador
Cel. +59387171157

#17Dimitri Fontaine
dimitri@2ndQuadrant.fr
In reply to: Tom Lane (#14)
Re: explain and PARAM_EXEC

Tom Lane <tgl@sss.pgh.pa.us> writes:

Indeed, and if I were setting out to parallelize queries in PG (which
I am not), subplans would be the last thing I would think about. You
could put an enormous amount of work in there and have nothing much
to show for it, because the construct doesn't even arise in many
queries. Even where the user wrote something that looks like a subplan,
the planner will do its best to get rid of it by turning it into a
join.

I guess that's because subplans cost more than their alternatives. The
idea was to provide a parallel implementation of them, so that they
gain some benefit and compare better against plain joins.

But I can see that's an entirely wrong approach, and I'm happy to know
that and glad I asked, thanks :)

So if you want to parallelize queries, start someplace else. The past
discussions of this have revolved around splitting the node tree of an
ordinary query plan into separately executable parts. Maybe a subplan
could be one of the cut points for such an approach, but if it's the
only one or even the main one, you're wasting your time.

Unless you arrange for the planner to have good (new) reasons to prefer
subplans, or provide subplan-based joins?

Ok, once you've done that, maybe you're back to the main problem and
have just changed its name.

Regards,
--
dim

#18Robert Haas
robertmhaas@gmail.com
In reply to: Dimitri Fontaine (#13)
Re: parallelizing subplan execution (was: explain and PARAM_EXEC)

On Sat, Feb 20, 2010 at 8:31 AM, Dimitri Fontaine
<dfontaine@hi-media.com> wrote:

This is really a topic for another thread, but at 100,000 feet it
seems to me that the hardest question is - how will you decide which
operations to parallelize in the first place?  Actually making it
happen is really hard, too, of course, but even to get to that point
you have to have some model for what types of operations it makes
sense to parallelize and how you're going to decide when it's a win.

My naive thought would be to add some cost parameters: first the cost
of fork()ing another backend, then, for each supported subplan type (we
will want to add more, or maybe have a special rendezvous-materialize
node), some model of the data exchange cost.

The planner would then, as usual, try to find the least costly plan,
and would be able to compare plans with and without distributing the
work.

Overly naive ?

Probably. For one thing, you can't use fork(), because it won't work
on Windows.

It seems to me that you need to start by thinking about what kinds of
queries could be usefully parallelized. What I think you're proposing
here, modulo large amounts of hand-waving, is that we should basically
find a branch of the query tree, cut it off, and make that branch the
responsibility of a subprocess. What kinds of things would be
sensible to hand off in this way? Well, you'd want to find nodes that
are not likely to be repeatedly re-executed with different parameters,
like subplans or inner-indexscans, because otherwise you'll get
pipeline stalls handing the new parameters back and forth. And you
want to find nodes that are expensive for the same reason. So maybe
this would work for something like a merge join on top of two sorts -
one backend could perform each sort, and then whichever one was the
child would stream the tuples to the parent for the final merge. Of
course, this assumes the I/O subsystem can keep up, which is not a
given - if both tables are fed by the same, single spindle, it might
be worse than if you just did the sorts consecutively.
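[Editor's note: the merge-join-over-two-sorts shape described above can be sketched as follows. This is a toy Python model; threads stand in for separate backends and an in-memory list for the tuple stream, and nothing here reflects real PostgreSQL code.]

```python
# Each input of a merge join is sorted by its own worker, and the parent
# merges the two sorted streams, as a merge join would consume its two
# sorted inputs.
import heapq
from concurrent.futures import ThreadPoolExecutor

def sort_worker(rows):
    """One 'backend' performing a Sort node over its input."""
    return sorted(rows)

def parallel_merge(left_rows, right_rows):
    # Hand each sort to its own worker, then merge the sorted outputs.
    with ThreadPoolExecutor(max_workers=2) as pool:
        left = pool.submit(sort_worker, left_rows)
        right = pool.submit(sort_worker, right_rows)
        return list(heapq.merge(left.result(), right.result()))

print(parallel_merge([3, 1, 2], [5, 0, 4]))   # [0, 1, 2, 3, 4, 5]
```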

This approach might also benefit queries that are very CPU-intensive,
on a multi-core system with spare cycles. Suppose you have a big tall
stack of hash joins, each with a small inner rel. The child process
does about half the joins and then pipelines the results into the
parent, which does the other half and returns the results.

But there's at least one other totally different way of thinking about
this problem, which is that you might want two processes to cooperate
in executing the SAME query node - imagine, for example, a big
sequential scan with an expensive but highly selective filter
condition, or an enormous sort. You have all the same problems of
figuring out when it's actually going to help, of course, but the
details will likely be quite different.
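[Editor's note: the second shape, two workers cooperating on the SAME node, might look like the following for a sequential scan with an expensive, highly selective filter. Again a toy Python sketch with invented names; threads stand in for backends.]

```python
# Split one scan into contiguous chunks, one per worker, and concatenate
# the filtered results in order.
from concurrent.futures import ThreadPoolExecutor

def expensive_filter(row):
    # Stand-in for a costly qual; keeps only multiples of 1000.
    return row % 1000 == 0

def scan_chunk(rows):
    return [r for r in rows if expensive_filter(r)]

def parallel_seqscan(table, n_workers=2):
    chunk = (len(table) + n_workers - 1) // n_workers
    parts = [table[i:i + chunk] for i in range(0, len(table), chunk)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        results = pool.map(scan_chunk, parts)
    return [r for part in results for r in part]

print(parallel_seqscan(list(range(5000))))   # [0, 1000, 2000, 3000, 4000]
```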

I'm not really sure which one of these would be more useful in
practice - or maybe there are even other strategies. What does
$COMPETITOR do?

I'm also ignoring the difficulties of getting hold of a second backend
in the right state - same database, same snapshot, etc. It seems to
me unlikely that this will work very well for a substantial number of
real-world applications if we have to actually start a new backend
every time we want to parallelize a query. IOW, we're going to need,
well, a connection pool in core.
*ducks, runs for cover*

...Robert

#19Mark Kirkwood
mark.kirkwood@catalyst.net.nz
In reply to: Robert Haas (#18)
Re: parallelizing subplan execution

Robert Haas wrote:

It seems to me that you need to start by thinking about what kinds of
queries could be usefully parallelized. What I think you're proposing
here, modulo large amounts of hand-waving, is that we should basically
find a branch of the query tree, cut it off, and make that branch the
responsibility of a subprocess. What kinds of things would be
sensible to hand off in this way? Well, you'd want to find nodes that
are not likely to be repeatedly re-executed with different parameters,
like subplans or inner-indexscans, because otherwise you'll get
pipeline stalls handing the new parameters back and forth. And you
want to find nodes that are expensive for the same reason. So maybe
this would work for something like a merge join on top of two sorts -
one backend could perform each sort, and then whichever one was the
child would stream the tuples to the parent for the final merge. Of
course, this assumes the I/O subsystem can keep up, which is not a
given - if both tables are fed by the same, single spindle, it might
be worse than if you just did the sorts consecutively.

This approach might also benefit queries that are very CPU-intensive,
on a multi-core system with spare cycles. Suppose you have a big tall
stack of hash joins, each with a small inner rel. The child process
does about half the joins and then pipelines the results into the
parent, which does the other half and returns the results.

But there's at least one other totally different way of thinking about
this problem, which is that you might want two processes to cooperate
in executing the SAME query node - imagine, for example, a big
sequential scan with an expensive but highly selective filter
condition, or an enormous sort. You have all the same problems of
figuring out when it's actually going to help, of course, but the
details will likely be quite different.

I'm not really sure which one of these would be more useful in
practice - or maybe there are even other strategies. What does
$COMPETITOR do?

I'm also ignoring the difficulties of getting hold of a second backend
in the right state - same database, same snapshot, etc. It seems to
me unlikely that this will work very well for a substantial number of
real-world applications if we have to actually start a new backend
every time we want to parallelize a query. IOW, we're going to need,
well, a connection pool in core.
*ducks, runs for cover*

One thing that might work quite well is slicing up by partition
(properly implemented partitioning would go along with this nicely too...)

regards

Mark

#20Dimitri Fontaine
dimitri@2ndQuadrant.fr
In reply to: Robert Haas (#18)
Re: parallelizing subplan execution

Robert Haas <robertmhaas@gmail.com> writes:

Probably. For one thing, you can't use fork(), because it won't work
on Windows.

[...]

query. IOW, we're going to need, well, a connection pool in core.
*ducks, runs for cover*

Well, in fact, you're slowly getting to the interesting^W crazy part of
it.

Now that you have a connection pool in core and a way to share the same
snapshot across more than one backend, wouldn't you want any Hot
Standby slave to be able to share that snapshot too? And run the
subplan there?

And while we're at it, you'd obviously (ahem) want the slave to run the
pooler too, and to be able to ask its master whether it still has a
given snapshot available, so that any transaction (session?) that turns
out not to be read-only can be transparently run on the master instead.
That gives the "snapshot too old" error some more reasons to exist.

Oh, of course, the next step after that is to have a single cluster be
both a slave and a master, so that we can talk about distributing the
data. Multi-node, multi-TB (make it PB) is the future, didn't they say?

We now have nodes holding only some of the data (perhaps only some
partitions), a way to hand them subplans over the network, and a way
for them to run write queries on other hosts without the client
connection noticing. Sounds fun, eh?

Regards,
--
dim

And I don't do drugs, not even caffeine. :)

#21Bruce Momjian
bruce@momjian.us
In reply to: Robert Haas (#18)
#22Yeb Havinga
yebhavinga@gmail.com
In reply to: Tom Lane (#2)
#23Yeb Havinga
yebhavinga@gmail.com
In reply to: Yeb Havinga (#22)
#24Tom Lane
tgl@sss.pgh.pa.us
In reply to: Yeb Havinga (#23)
#25Yeb Havinga
yebhavinga@gmail.com
In reply to: Tom Lane (#24)
#26Yeb Havinga
yebhavinga@gmail.com
In reply to: Yeb Havinga (#25)
#27Mark Wong
markw@osdl.org
In reply to: Robert Haas (#18)
#28Robert Haas
robertmhaas@gmail.com
In reply to: Mark Wong (#27)
#29Simon Riggs
simon@2ndQuadrant.com
In reply to: Robert Haas (#28)
#30Mark Wong
markw@osdl.org
In reply to: Robert Haas (#28)