Window functions review
I've been slicing and dicing this patch for the last few days. There's a
lot of code in there, but here's some initial comments:
The code to initialize, advance, and finalize an aggregate should be
shared between Agg and Window nodes.
I'm a bit disappointed that we need so much code to support the window
aggregates. I figured we'd just have a special window function that
calls the initialize/advance/finalize functions for the right aggregate,
working with the WindowObject object like other window functions. But we
have in fact very separate code path for aggregates. I feel we should
implement the aggregates as just a thin wrapper window function that I
just described. Or, perhaps the aggregate functionality should be moved
to Agg node, although if we do that, we'd probably have to change that
again when we implement the window frames. Or perhaps it should be
completely new node type, though you'd really want to share the
tuplestore with the Window node if there's any window functions in the
same query, using the same window specification.
I don't like that the window functions are passed Var and Const nodes as
arguments. A user-defined function shouldn't see those internal data
structures, even though they're just passed as opaque arguments to
WinRowGetArg. You could pass WinRowGetArg just argument numbers, and
have it fetch the Expr nodes.
The incomplete support for window frames should be removed. It was good
that you did it, so that we have a sketch of how it'd work and got some
confidence that we won't have to rip out and rewrite all of this in the
future to support the window frames. But if it's not going to go into
this release, we should take it out.
The buffering strategies are very coarse. For aggregates, as long as we
don't support window frames, row buffering is enough. Even when
partition-buffering is needed, we shouldn't need to fetch the whole
partition into the tuplestore before we start processing. lead/lag for
example can start returning values earlier. That's just an optimization,
though, so maybe we can live with that for now.
--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
Thank you for your reviewing my code.
2008/11/12 Heikki Linnakangas <heikki.linnakangas@enterprisedb.com>:
I've been slicing and dicing this patch for the last few days. There's a lot
of code in there, but here's some initial comments:The code to initialize, advance, and finalize an aggregate should be shared
between Agg and Window nodes.I'm a bit disappointed that we need so much code to support the window
aggregates. I figured we'd just have a special window function that calls
the initialize/advance/finalize functions for the right aggregate, working
with the WindowObject object like other window functions. But we have in
fact very separate code path for aggregates. I feel we should implement the
aggregates as just a thin wrapper window function that I just described. Or,
perhaps the aggregate functionality should be moved to Agg node, although if
we do that, we'd probably have to change that again when we implement the
window frames. Or perhaps it should be completely new node type, though
you'd really want to share the tuplestore with the Window node if there's
any window functions in the same query, using the same window specification.
I am disappointed at that, too. I was thinking as you do, that special
thin wrapper would do for pure aggregates. It seems, however,
impossible to me. Aggregates have two separate functions then
initialize value, store intermediate results, pass if each function is
strict, and so on. My rough sketch of wrapper for pure aggregates was
eval_windowaggregate() but actually they need to initialize, advance,
and finalize. As long as we have the same structure for aggregates,
they're always needed.
So the next choice is to share those three with Agg. This sounds sane.
I've not touched any code in nodeAgg.c. If we really need it I will do
it. Then we can remove initialize_windowaggregate(),
advance_windowaggregate(), finalize_windowaggregate(), and
initialize_peragg() from nodeWindow.c. They are almost same
correspoinding functions in Agg, except advance requires window
function API in Window. And if we do that, PerAgg and PerGroup should
be shared also.
The reason I didn't touch nodeAgg.c and didn't share them is that
there are only two nodes that use the share code. I wasn't sure if it
is general enough. Or I thought it'd be better to seperate code than
to share it so we keep them simple. What do you think?
I don't like that the window functions are passed Var and Const nodes as
arguments. A user-defined function shouldn't see those internal data
structures, even though they're just passed as opaque arguments to
WinRowGetArg. You could pass WinRowGetArg just argument numbers, and have it
fetch the Expr nodes.
I understand what you point. The current signature of WinRowGetArg is
Datum WinRowGetArg(WindowObject winobj, ExprState *argstate, bool *isnull);
And if we use a number as arguments instead of argstate, we need
fcinfo. How do you think we should pass it? WindowObject may hold it
but it is data related with the window logically, not with each
function. The alternative is to add one more argument to
WinRowGetArg(). I don't like long argument list. But if we really need
it, no way to deny it.
The incomplete support for window frames should be removed. It was good that
you did it, so that we have a sketch of how it'd work and got some
confidence that we won't have to rip out and rewrite all of this in the
future to support the window frames. But if it's not going to go into this
release, we should take it out.
Agreed. I'll remove it. Until recently I was wondering if I could try
to implement the FRAME clause. But actually it's too late for 8.4. Or
it would have to be after the current patch is commited.
The buffering strategies are very coarse. For aggregates, as long as we
don't support window frames, row buffering is enough. Even when
partition-buffering is needed, we shouldn't need to fetch the whole
partition into the tuplestore before we start processing. lead/lag for
example can start returning values earlier. That's just an optimization,
though, so maybe we can live with that for now.
Aggregates() RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW doesn't
require frame buffering but RANGE BETEWEEN UNBOUNDED PRECEDING AND
UNBOUNDED FOLLOWING (that is OVER() ) needs it. You're telling me to
switch the strategy depending on cases? Hmmm, let me see.
--
Hitoshi Harada
Just to let you know, I'm hacking this patch again, the executor part in
particular. I've got a stripped out the unfinished window frame stuff,
refactored the Window object API so that the window functions don't get
to, and don't need to, access tuple slots directly. And a bunch of other
simplifications as well. I'll post as soon as it's in a more readable state.
--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
2008/11/22 Heikki Linnakangas <heikki.linnakangas@enterprisedb.com>:
Just to let you know, I'm hacking this patch again, the executor part in
particular. I've got a stripped out the unfinished window frame stuff,
refactored the Window object API so that the window functions don't get to,
and don't need to, access tuple slots directly. And a bunch of other
simplifications as well. I'll post as soon as it's in a more readable state.--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
I'm so glad to hear that. Currently I don't have time to refactor much
in executor, but only fixes such like what David reported. Feel free
to ask me about any cut of the patch.
Regards,
--
Hitoshi Harada