Partial aggregates pushdown

Started by Alexander Pyhalovover 4 years ago198 messageshackers

a.pyhalov@postgrespro.ru

over 4 years ago

Hi.

One of the issues when we try to use sharding in PostgreSQL is absence
of partial aggregates pushdown.

I see several opportunities to alleviate this issue.
If we look at Citus, it implements aggregate, calculating internal state
of an arbitrary agregate function and exporting it as text. So we could
calculate internal states independently on all data sources and then
finalize it, which allows to compute arbitrary aggregate.

But, as mentioned in [1]/messages/by-id/9998c3af9fdb5f7d62a6c7ad0fcd9142@postgrespro.ru thread, for some functions (like
count/max/min/sum) we can just push down them. It seems easy and covers
a lot of cases.
For now there are still issues - for example you can't handle functions
as avg() as we should somehow get its internal state or sum() variants,
which need aggserialfn/aggdeserialfn. Preliminary version is attached.

Is someone else working on the issue? Does suggested approach make
sense?

[1]: /messages/by-id/9998c3af9fdb5f7d62a6c7ad0fcd9142@postgrespro.ru
/messages/by-id/9998c3af9fdb5f7d62a6c7ad0fcd9142@postgrespro.ru

--
Best regards,
Alexander Pyhalov,
Postgres Professional

Tomas Vondra

tomas.vondra@2ndquadrant.com

over 4 years ago

In reply to: Alexander Pyhalov (#1)

Re: Partial aggregates pushdown

Hi Alexander,

On 10/15/21 15:15, Alexander Pyhalov wrote:

Hi.

One of the issues when we try to use sharding in PostgreSQL is absence
of partial aggregates pushdown.

I see several opportunities to alleviate this issue.
If we look at Citus, it implements aggregate, calculating internal state
of an arbitrary agregate function and exporting it as text. So we could
calculate internal states independently on all data sources and then
finalize it, which allows to compute arbitrary aggregate.

But, as mentioned in [1] thread, for some functions (like
count/max/min/sum) we can just push down them. It seems easy and covers
a lot of cases.
For now there are still issues - for example you can't handle functions
as avg() as we should somehow get its internal state or sum() variants,
which need aggserialfn/aggdeserialfn. Preliminary version is attached.

Is someone else working on the issue? Does suggested approach make sense?

I think a couple people worked on this (or something similar/related) in
the past, but I don't recall any recent patches.

IMHO being able to push-down parts of an aggregation to other nodes is a
very desirable feature, that might result in huge improvements for some
analytical workloads.

As for the proposed approach, it's probably good enough for the first
version to restrict this to aggregates where the aggregate result is
sufficient, i.e. we don't need any new export/import procedures.

But it's very unlikely we'd want to restrict it the way the patch does
it, i.e. based on aggregate name. That's both fragile (people can create
new aggregates with such name) and against the PostgreSQL extensibility
(people may implement custom aggregates, but won't be able to benefit
from this just because of name).

So for v0 maybe, but I think there neeeds to be a way to relax this in
some way, for example we could add a new flag to pg_aggregate to mark
aggregates supporting this.

And then we should extend this for aggregates with more complex internal
states (e.g. avg), by supporting a function that "exports" the aggregate
state - similar to serial/deserial functions, but needs to be portable.

I think the trickiest thing here is rewriting the remote query to call
this export function, but maybe we could simply instruct the remote node
to use a different final function for the top-level node?

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Alexander Pyhalov

a.pyhalov@postgrespro.ru

over 4 years ago

In reply to: Tomas Vondra (#2)

Re: Partial aggregates pushdown

Tomas Vondra писал 2021-10-15 17:56:

Hi Alexander,

Hi.

And then we should extend this for aggregates with more complex
internal states (e.g. avg), by supporting a function that "exports"
the aggregate state - similar to serial/deserial functions, but needs
to be portable.

I think the trickiest thing here is rewriting the remote query to call
this export function, but maybe we could simply instruct the remote
node to use a different final function for the top-level node?

If we have some special export function, how should we find out that
remote server supports this? Should it be server property or should it
somehow find out it while connecting to the server?

--
Best regards,
Alexander Pyhalov,
Postgres Professional

Tomas Vondra

tomas.vondra@2ndquadrant.com

over 4 years ago

In reply to: Alexander Pyhalov (#3)

Re: Partial aggregates pushdown

On 10/15/21 17:05, Alexander Pyhalov wrote:

Tomas Vondra писал 2021-10-15 17:56:

Hi Alexander,

Hi.

And then we should extend this for aggregates with more complex
internal states (e.g. avg), by supporting a function that "exports"
the aggregate state - similar to serial/deserial functions, but needs
to be portable.

I think the trickiest thing here is rewriting the remote query to call
this export function, but maybe we could simply instruct the remote
node to use a different final function for the top-level node?

If we have some special export function, how should we find out that
remote server supports this? Should it be server property or should it
somehow find out it while connecting to the server?

Good question. I guess there could be some initial negotiation based on
remote node version etc. And we could also disable this pushdown for
older server versions, etc.

But after that, I think we can treat this just like other definitions
between local/remote node - we'd assume they match (i.e. the remote
server has the export function), and then we'd get an error if it does
not. If you need to use remote nodes without an export function, you'd
have to disable the pushdown.

AFAICS this works both for case with explicit query rewrite (i.e. we
send SQL with calls to the export function) and implicit query rewrite
(where the remote node uses a different finalize function based on mode,
specified by GUC).

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Stephen Frost

sfrost@snowman.net

over 4 years ago

In reply to: Tomas Vondra (#4)

Re: Partial aggregates pushdown

Greetings,

* Tomas Vondra (tomas.vondra@enterprisedb.com) wrote:

On 10/15/21 17:05, Alexander Pyhalov wrote:

Tomas Vondra писал 2021-10-15 17:56:

And then we should extend this for aggregates with more complex
internal states (e.g. avg), by supporting a function that "exports"
the aggregate state - similar to serial/deserial functions, but needs
to be portable.

I think the trickiest thing here is rewriting the remote query to call
this export function, but maybe we could simply instruct the remote
node to use a different final function for the top-level node?

If we have some special export function, how should we find out that
remote server supports this? Should it be server property or should it
somehow find out it while connecting to the server?

Good question. I guess there could be some initial negotiation based on
remote node version etc. And we could also disable this pushdown for older
server versions, etc.

Yeah, I'd think we would just only support it on versions where we know
it's available. That doesn't seem terribly difficult.

But after that, I think we can treat this just like other definitions
between local/remote node - we'd assume they match (i.e. the remote server
has the export function), and then we'd get an error if it does not. If you
need to use remote nodes without an export function, you'd have to disable
the pushdown.

AFAICS this works both for case with explicit query rewrite (i.e. we send
SQL with calls to the export function) and implicit query rewrite (where the
remote node uses a different finalize function based on mode, specified by
GUC).

Not quite sure where to drop this, but I've always figured we'd find a
way to use the existing PartialAgg / FinalizeAggregate bits which are
used for parallel query when it comes to pushing down to foreign servers
to perform aggregates. That also gives us how to serialize the results,
though we'd have to make sure that works across different
architectures.. I've not looked to see if that's the case today.

Then again, being able to transform an aggregate into a partial
aggregate that runs as an actual SQL query would mean we do partial
aggregate push-down against non-PG FDWs and that'd be pretty darn neat,
so maybe that's a better way to go, if we can figure out how.

(I mean, for avg it's pretty easy to just turn that into a SELECT that
grabs the sum and the count and use that.. other aggregates are more
complicated though and that doesn't work, maybe we need both?)

Thanks,

Stephen

Tomas Vondra

tomas.vondra@2ndquadrant.com

over 4 years ago

In reply to: Stephen Frost (#5)

Re: Partial aggregates pushdown

On 10/15/21 21:31, Stephen Frost wrote:

Greetings,

* Tomas Vondra (tomas.vondra@enterprisedb.com) wrote:

On 10/15/21 17:05, Alexander Pyhalov wrote:

Tomas Vondra писал 2021-10-15 17:56:

And then we should extend this for aggregates with more complex
internal states (e.g. avg), by supporting a function that "exports"
the aggregate state - similar to serial/deserial functions, but needs
to be portable.

I think the trickiest thing here is rewriting the remote query to call
this export function, but maybe we could simply instruct the remote
node to use a different final function for the top-level node?

If we have some special export function, how should we find out that
remote server supports this? Should it be server property or should it
somehow find out it while connecting to the server?

Good question. I guess there could be some initial negotiation based on
remote node version etc. And we could also disable this pushdown for older
server versions, etc.

Yeah, I'd think we would just only support it on versions where we know
it's available. That doesn't seem terribly difficult.

Yeah.

But maybe Alexander was concerned about cases where the nodes disagree
on the aggregate definition, so one node might have the export function
and the other would not. E.g. the remote node may have older version of
an extension implementing the aggregate, without the export function
(although the server version supports it). I don't think we can do much
about that, it's just one of many issues that may be caused by
mismatching schemas.

I wonder if this might get more complex, though. Imagine for example a
partitioned table on node A with a FDW partition, pointing to a node B.
But on B, the object is partitioned again, with one partition placed on
C. So it's like

A -> partition on B -> partition on C

When planning on A, we can consider server version on B. But what if C
is an older version, not supporting the export function?

Bot sure if this makes any difference, though ... in the worst case it
will error out, and we should have a way to disable the feature on A.

But after that, I think we can treat this just like other definitions
between local/remote node - we'd assume they match (i.e. the remote server
has the export function), and then we'd get an error if it does not. If you
need to use remote nodes without an export function, you'd have to disable
the pushdown.

AFAICS this works both for case with explicit query rewrite (i.e. we send
SQL with calls to the export function) and implicit query rewrite (where the
remote node uses a different finalize function based on mode, specified by
GUC).

Not quite sure where to drop this, but I've always figured we'd find a
way to use the existing PartialAgg / FinalizeAggregate bits which are
used for parallel query when it comes to pushing down to foreign servers
to perform aggregates. That also gives us how to serialize the results,
though we'd have to make sure that works across different
architectures.. I've not looked to see if that's the case today.

It sure is similar to what serial/deserial functions do for partial
aggs, but IIRC the functions were not designed to be portable. I think
we don't even require compatibility across minor releases, because we
only use this to copy data between workers running at the same time. Not
saying it can't be made to work, of course.

Then again, being able to transform an aggregate into a partial
aggregate that runs as an actual SQL query would mean we do partial
aggregate push-down against non-PG FDWs and that'd be pretty darn neat,
so maybe that's a better way to go, if we can figure out how.

(I mean, for avg it's pretty easy to just turn that into a SELECT that
grabs the sum and the count and use that.. other aggregates are more
complicated though and that doesn't work, maybe we need both?)

Maybe, but that seems like a very different concept - transforming the
SQL so that it calculates different set of aggregates that we know can
be pushed down easily. But I don't recall any other practical example
beyond the AVG() -> SUM()/COUNT(). Well, VAR() can be translated into
SUM(X), SUM(X^2).

Another thing is how many users would actually benefit from this. I
mean, for this to matter you need partitioned table with partitions
placed on a non-PG FDW, right? Seems like a pretty niche use case.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Alexander Pyhalov

a.pyhalov@postgrespro.ru

over 4 years ago

In reply to: Tomas Vondra (#2)

Re: Partial aggregates pushdown

Hi.

Tomas Vondra писал 2021-10-15 17:56:

As for the proposed approach, it's probably good enough for the first
version to restrict this to aggregates where the aggregate result is
sufficient, i.e. we don't need any new export/import procedures.

But it's very unlikely we'd want to restrict it the way the patch does
it, i.e. based on aggregate name. That's both fragile (people can
create new aggregates with such name) and against the PostgreSQL
extensibility (people may implement custom aggregates, but won't be
able to benefit from this just because of name).

So for v0 maybe, but I think there neeeds to be a way to relax this in
some way, for example we could add a new flag to pg_aggregate to mark
aggregates supporting this.

Updated patch to mark aggregates as pushdown-safe in pg_aggregates.

So far have no solution for aggregates with internal aggtranstype.
--
Best regards,
Alexander Pyhalov,
Postgres Professional

Tomas Vondra

tomas.vondra@2ndquadrant.com

over 4 years ago

In reply to: Alexander Pyhalov (#7)

Re: Partial aggregates pushdown

On 10/19/21 08:56, Alexander Pyhalov wrote:

Hi.

Tomas Vondra писал 2021-10-15 17:56:

As for the proposed approach, it's probably good enough for the first
version to restrict this to aggregates where the aggregate result is
sufficient, i.e. we don't need any new export/import procedures.

But it's very unlikely we'd want to restrict it the way the patch does
it, i.e. based on aggregate name. That's both fragile (people can
create new aggregates with such name) and against the PostgreSQL
extensibility (people may implement custom aggregates, but won't be
able to benefit from this just because of name).

So for v0 maybe, but I think there neeeds to be a way to relax this in
some way, for example we could add a new flag to pg_aggregate to mark
aggregates supporting this.

Updated patch to mark aggregates as pushdown-safe in pg_aggregates.

So far have no solution for aggregates with internal aggtranstype.

Thanks. Please add it to the next CF, so that we don't lose track of it.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Alexander Pyhalov

a.pyhalov@postgrespro.ru

over 4 years ago

In reply to: Tomas Vondra (#8)

Re: Partial aggregates pushdown

Tomas Vondra писал 2021-10-19 16:25:

On 10/19/21 08:56, Alexander Pyhalov wrote:

Hi.

Tomas Vondra писал 2021-10-15 17:56:

As for the proposed approach, it's probably good enough for the first
version to restrict this to aggregates where the aggregate result is
sufficient, i.e. we don't need any new export/import procedures.

But it's very unlikely we'd want to restrict it the way the patch
does
it, i.e. based on aggregate name. That's both fragile (people can
create new aggregates with such name) and against the PostgreSQL
extensibility (people may implement custom aggregates, but won't be
able to benefit from this just because of name).

So for v0 maybe, but I think there neeeds to be a way to relax this
in
some way, for example we could add a new flag to pg_aggregate to mark
aggregates supporting this.

Updated patch to mark aggregates as pushdown-safe in pg_aggregates.

So far have no solution for aggregates with internal aggtranstype.

Hi. Updated patch.
Now aggregates with internal states can be pushed down, if they are
marked as pushdown safe (this flag is set to true for min/max/sum),
have internal states and associated converters. Converters are called
locally, they transform aggregate result to serialized internal
representation.
As converters don't have access to internal aggregate state, partial
aggregates like avg() are still not pushable.

For now the overall logic is quite simple. We now also call
add_foreign_grouping_paths() for partial aggregation. In
foreign_expr_walker() we check if aggregate is pushable (which means
that it is simple, marked as pushable and if has 'internal' as
aggtranstype, has associated converter).
If it is pushable, we proceed as with usual aggregates (but forbid
having pushdown). During postgresGetForeignPlan() we produce list of
converters for aggregates. As converters has different input argument
type from their result (bytea), we have to generate alternative
metadata, which is used by make_tuple_from_result_row().
If make_tuple_from_result_row() encounters field with converter, it
calls converter and returns result. For now we expect converter to have
only one input and output argument. Existing converters just transform
input value to internal representation and return its serialized form.

--
Best regards,
Alexander Pyhalov,
Postgres Professional

#10

Zhihong Yu

zyu@yugabyte.com

over 4 years ago

In reply to: Alexander Pyhalov (#9)

Re: Partial aggregates pushdown

Hi,
w.r.t. 0001-Partial-aggregates-push-down-v03.patch

For partial_agg_ok(),

+   if (agg->aggdistinct || agg->aggvariadic || agg->aggkind !=
AGGKIND_NORMAL || agg->aggorder != NIL)
+       ok = false;

Since SearchSysCache1() is not called yet, you can return false directly.

+ if (aggform->aggpartialpushdownsafe != true)

The above can be written as:

if (!aggform->aggpartialpushdownsafe)

For build_conv_list():

+           Oid         converter_oid = InvalidOid;
+
+           if (IsA(tlentry->expr, Aggref))
...
+           }
+           convlist = lappend_oid(convlist, converter_oid);

Do you intend to append InvalidOid to convlist (when tlentry->expr is
not Aggref) ?

Cheers

Import Notes

Resolved by subject fallback

#11

Alexander Pyhalov

a.pyhalov@postgrespro.ru

over 4 years ago

In reply to: Zhihong Yu (#10)

Re: Partial aggregates pushdown

Zhihong Yu писал 2021-10-22 00:43:

Hi,
w.r.t. 0001-Partial-aggregates-push-down-v03.patch

Hi.

For partial_agg_ok(),
+   if (agg->aggdistinct || agg->aggvariadic || agg->aggkind !=
AGGKIND_NORMAL || agg->aggorder != NIL)
+       ok = false;
Since SearchSysCache1() is not called yet, you can return false
directly.

Fixed.

+ if (aggform->aggpartialpushdownsafe != true)

The above can be written as:

if (!aggform->aggpartialpushdownsafe)

Fixed.

For build_conv_list():
+           Oid         converter_oid = InvalidOid;
+
+           if (IsA(tlentry->expr, Aggref))
...
+           }
+           convlist = lappend_oid(convlist, converter_oid);
Do you intend to append InvalidOid to convlist (when tlentry->expr is
not Aggref) ?

Yes, for each tlist member (which matches fpinfo->grouped_tlist in case
when foreignrel is UPPER_REL) we need to find corresponding converter.
If we don't append InvalidOid, we can't find convlist member,
corresponding to tlist member. Added comments to build_conv_list.

Also fixed error in pg_dump.c (we selected '0' when
aggpartialconverterfn was not defined in schema, but checked for '-').

--
Best regards,
Alexander Pyhalov,
Postgres Professional

#12

Peter Eisentraut

peter_e@gmx.net

over 4 years ago

In reply to: Alexander Pyhalov (#9)

Re: Partial aggregates pushdown

On 21.10.21 12:55, Alexander Pyhalov wrote:

Now aggregates with internal states can be pushed down, if they are
marked as pushdown safe (this flag is set to true for min/max/sum),
have internal states and associated converters. Converters are called
locally, they transform aggregate result to serialized internal
representation.
As converters don't have access to internal aggregate state, partial
aggregates like avg() are still not pushable.

It seems to me that the system should be able to determine from the
existing aggregate catalog entry whether an aggregate can be pushed
down. For example, it could check aggtranstype != internal and similar.
A separate boolean flag should not be necessary. Or if it is, the
patch should provide some guidance about how an aggregate function
author should set it.

#13

Alexander Pyhalov

a.pyhalov@postgrespro.ru

over 4 years ago

In reply to: Peter Eisentraut (#12)

Re: Partial aggregates pushdown

Peter Eisentraut писал 2021-11-01 12:47:

On 21.10.21 12:55, Alexander Pyhalov wrote:

Now aggregates with internal states can be pushed down, if they are
marked as pushdown safe (this flag is set to true for min/max/sum),
have internal states and associated converters. Converters are called
locally, they transform aggregate result to serialized internal
representation.
As converters don't have access to internal aggregate state, partial
aggregates like avg() are still not pushable.

It seems to me that the system should be able to determine from the
existing aggregate catalog entry whether an aggregate can be pushed
down. For example, it could check aggtranstype != internal and
similar. A separate boolean flag should not be necessary.

Hi.
I think we can't infer this property from existing flags. For example,
if I have avg() with bigint[] argtranstype, it doesn't mean we can push
down it. We couldn't also decide if partial aggregete is safe to push
down based on aggfinalfn presence (for example, it is defined for
sum(numeric), but we can push it down.

Or if it
is, the patch should provide some guidance about how an aggregate
function author should set it.

Where should it be provided?

--
Best regards,
Alexander Pyhalov,
Postgres Professional

#14

Ilya Gladyshev

i.gladyshev@postgrespro.ru

over 4 years ago

In reply to: Alexander Pyhalov (#9)

Re: Partial aggregates pushdown

Hi,

On 21.10.2021 13:55, Alexander Pyhalov wrote:

Hi. Updated patch.
Now aggregates with internal states can be pushed down, if they are
marked as pushdown safe (this flag is set to true for min/max/sum),
have internal states and associated converters.

I don't quite understand why this is restricted only to aggregates that
have 'internal' state, I feel like that should be possible for any
aggregate that has a function to convert its final result back to
aggregate state to be pushed down. While I couldn't come up with a
useful example for this, except maybe for an aggregate whose aggfinalfn
is used purely for cosmetic purposes (e.g. format the result into a
string), I still feel that it is an unnecessary restriction.

A few minor review notes to the patch:

+static List *build_conv_list(RelOptInfo *foreignrel);

this should probably be up top among other declarations.

@@ -1433,6 +1453,48 @@ postgresGetForeignPlan(PlannerInfo *root,
outer_plan);
}

+/*
+ * Generate attinmeta if there are some converters:
+ * they are expecxted to return BYTEA, but real input type is likely 
different.
+ */

typo in word "expec*x*ted".

@@ -139,10 +147,13 @@ typedef struct PgFdwScanState
                                   * for a foreign join scan. */
      TupleDesc    tupdesc;        /* tuple descriptor of scan */
      AttInMetadata *attinmeta;    /* attribute datatype conversion 
metadata */
+    AttInMetadata *rcvd_attinmeta;    /* metadata for received tuples, 
NULL if
+                                     * there's no converters */

Looks like rcvd_attinmeta is redundant and you could use attinmeta for
conversion metadata.

The last thing - the patch needs to be rebased, it doesn't apply cleanly
on top of current master.

Thanks,

Ilya Gladyshev

#15

Ilya Gladyshev

i.gladyshev@postgrespro.ru

over 4 years ago

In reply to: Alexander Pyhalov (#13)

Re: Partial aggregates pushdown

On 01.11.2021 13:30, Alexander Pyhalov wrote:

Peter Eisentraut писал 2021-11-01 12:47:

On 21.10.21 12:55, Alexander Pyhalov wrote:

Now aggregates with internal states can be pushed down, if they are
marked as pushdown safe (this flag is set to true for min/max/sum),
have internal states and associated converters. Converters are
called locally, they transform aggregate result to serialized
internal representation.
As converters don't have access to internal aggregate state, partial
aggregates like avg() are still not pushable.

It seems to me that the system should be able to determine from the
existing aggregate catalog entry whether an aggregate can be pushed
down. For example, it could check aggtranstype != internal and
similar. A separate boolean flag should not be necessary.

Hi.
I think we can't infer this property from existing flags. For example,
if I have avg() with bigint[] argtranstype, it doesn't mean we can
push down it. We couldn't also decide if partial aggregete is safe to
push down based on aggfinalfn presence (for example, it is defined for
sum(numeric), but we can push it down.

I think one potential way to do it would be to allow pushing down
aggregates that EITHER have state of the same type as their return type,
OR have a conversion function that converts their return value to the
type of their state.

#16

Tomas Vondra

tomas.vondra@2ndquadrant.com

over 4 years ago

In reply to: Ilya Gladyshev (#14)

Re: Partial aggregates pushdown

On 11/1/21 22:31, Ilya Gladyshev wrote:

Hi,

On 21.10.2021 13:55, Alexander Pyhalov wrote:

Hi. Updated patch.
Now aggregates with internal states can be pushed down, if they are
marked as pushdown safe (this flag is set to true for min/max/sum),
have internal states and associated converters.

I don't quite understand why this is restricted only to aggregates that
have 'internal' state, I feel like that should be possible for any
aggregate that has a function to convert its final result back to
aggregate state to be pushed down. While I couldn't come up with a
useful example for this, except maybe for an aggregate whose aggfinalfn
is used purely for cosmetic purposes (e.g. format the result into a
string), I still feel that it is an unnecessary restriction.

But it's *not* restricted to aggregates with internal state. The patch
merely requires aggregates with "internal" state to have an extra
"converter" function.

That being said, I don't think the approach used to deal with internal
state is the right one. AFAICS it simply runs the aggregate on the
remote node, finalizes is there, and then uses the converter function to
"expand" the partial result back into the internal state.

Unfortunately that only works for aggregates like "sum" where the result
is enough to rebuild the internal state, but it fails for anything more
complex (like "avg" or "var").

Earlier in this thread I mentioned this to serial/deserial functions,
and I think we need to do something like that for internal state. I.e.
we need to call the "serial" function on the remote node, and which
dumps the whole internal state, and then "deserial" on the local node.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#17

Tomas Vondra

tomas.vondra@2ndquadrant.com

over 4 years ago

In reply to: Ilya Gladyshev (#15)

Re: Partial aggregates pushdown

On 11/1/21 22:53, Ilya Gladyshev wrote:

On 01.11.2021 13:30, Alexander Pyhalov wrote:

Peter Eisentraut писал 2021-11-01 12:47:

On 21.10.21 12:55, Alexander Pyhalov wrote:

Now aggregates with internal states can be pushed down, if they are
marked as pushdown safe (this flag is set to true for min/max/sum),
have internal states and associated converters. Converters are
called locally, they transform aggregate result to serialized
internal representation.
As converters don't have access to internal aggregate state, partial
aggregates like avg() are still not pushable.

It seems to me that the system should be able to determine from the
existing aggregate catalog entry whether an aggregate can be pushed
down. For example, it could check aggtranstype != internal and
similar. A separate boolean flag should not be necessary.

Hi.
I think we can't infer this property from existing flags. For example,
if I have avg() with bigint[] argtranstype, it doesn't mean we can
push down it. We couldn't also decide if partial aggregete is safe to
push down based on aggfinalfn presence (for example, it is defined for
sum(numeric), but we can push it down.

I think one potential way to do it would be to allow pushing down
aggregates that EITHER have state of the same type as their return type,
OR have a conversion function that converts their return value to the
type of their state.

IMO just checking (aggtranstype == result type) entirely ignores the
issue of portability - we've never required the aggregate state to be
portable in any meaningful way (between architectures, minor/major
versions, ...) and it seems foolish to just start relying on it here.

Imagine for example an aggregate using bytea state, storing some complex
C struct in it. You can't just copy that between architectures.

It's a bit like why we don't simply copy data types to network, but pass
them through input/output or send/receive functions. The new flag is a
way to mark aggregates where this is safe, and I don't think we can do
away without it.

The more I think about this, the more I'm convinced the proper way to do
this would be adding export/import functions, similar to serial/deserial
functions, with the extra portability guarantees. And we'd need to do
that for all aggregates, not just those with (aggtranstype == internal).

I get it - the idea of the patch is that keeping the data types the same
makes it much simpler to pass the aggregate state (compared to having to
export/import it). But I'm not sure it's the right approach.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#18

Alexander Pyhalov

a.pyhalov@postgrespro.ru

over 4 years ago

In reply to: Ilya Gladyshev (#14)

Re: Partial aggregates pushdown

Hi.

Updated and rebased patch.

Ilya Gladyshev писал 2021-11-02 00:31:

Hi,
On 21.10.2021 13:55, Alexander Pyhalov wrote:

Hi. Updated patch.
Now aggregates with internal states can be pushed down, if they are
marked as pushdown safe (this flag is set to true for min/max/sum),
have internal states and associated converters.

I don't quite understand why this is restricted only to aggregates
that have 'internal' state, I feel like that should be possible for
any aggregate that has a function to convert its final result back to
aggregate state to be pushed down. While I couldn't come up with a
useful example for this, except maybe for an aggregate whose
aggfinalfn is used purely for cosmetic purposes (e.g. format the
result into a string), I still feel that it is an unnecessary
restriction.

I don't feel comfortable with it for the following reasons.
- Now partial converters translate aggregate result to serialized
internal representation.
In case when aggregate type is different from internal state,
we'd have to translate it to non-serialized internal representation,
so converters should skip serialization step. This seems like
introducing two
kind of converters.
- I don't see any system aggregates which would benefit from this.

However, it doesn't seem to be complex, and if it seems to be desirable,
it can be done.
For now introduced check that transtype matches aggregate type (or is
internal)
in partial_agg_ok().

A few minor review notes to the patch:

+static List *build_conv_list(RelOptInfo *foreignrel);

this should probably be up top among other declarations.

Moved it upper.

@@ -1433,6 +1453,48 @@ postgresGetForeignPlan(PlannerInfo *root,
outer_plan);
}
+/*
+ * Generate attinmeta if there are some converters:
+ * they are expecxted to return BYTEA, but real input type is likely
different.
+ */
typo in word "expecxted".

Fixed.

@@ -139,10 +147,13 @@ typedef struct PgFdwScanState
* for a foreign join scan. */
TupleDesc    tupdesc;        /* tuple descriptor of scan */
AttInMetadata *attinmeta;    /* attribute datatype conversion
metadata */
+    AttInMetadata *rcvd_attinmeta;    /* metadata for received
tuples, NULL if
+                                     * there's no converters */

Looks like rcvd_attinmeta is redundant and you could use attinmeta for
conversion metadata.

Seems so, removed it.

--
Best regards,
Alexander Pyhalov,
Postgres Professional

#19

Daniel Gustafsson

daniel@yesql.se

over 4 years ago

In reply to: Alexander Pyhalov (#18)

Re: Partial aggregates pushdown

On 2 Nov 2021, at 10:12, Alexander Pyhalov <a.pyhalov@postgrespro.ru> wrote:

Updated and rebased patch.

+	state = (Int128AggState *) palloc0(sizeof(Int128AggState));
+	state->calcSumX2 = false;
+
+	if (!PG_ARGISNULL(0))
+	{
+#ifdef HAVE_INT128
+		do_int128_accum(state, (int128) PG_GETARG_INT64(0));
+#else
+		do_numeric_accum(state, int64_to_numeric(PG_GETARG_INT64(0)));
+#endif

This fails on non-INT128 platforms as state cannot be cast to Int128AggState
outside of HAVE_INT128; it's not defined there. This needs to be a
PolyNumAggState no?

--
Daniel Gustafsson https://vmware.com/

#20

Alexander Pyhalov

a.pyhalov@postgrespro.ru

over 4 years ago

In reply to: Daniel Gustafsson (#19)

Re: Partial aggregates pushdown

Daniel Gustafsson писал 2021-11-03 16:45:

On 2 Nov 2021, at 10:12, Alexander Pyhalov <a.pyhalov@postgrespro.ru>
wrote:

Updated and rebased patch.
+	state = (Int128AggState *) palloc0(sizeof(Int128AggState));
+	state->calcSumX2 = false;
+
+	if (!PG_ARGISNULL(0))
+	{
+#ifdef HAVE_INT128
+		do_int128_accum(state, (int128) PG_GETARG_INT64(0));
+#else
+		do_numeric_accum(state, int64_to_numeric(PG_GETARG_INT64(0)));
+#endif
This fails on non-INT128 platforms as state cannot be cast to
Int128AggState
outside of HAVE_INT128; it's not defined there. This needs to be a
PolyNumAggState no?

Hi.
Thank you for noticing this. It's indeed fails with
pgac_cv__128bit_int=no.
Updated patch.
--
Best regards,
Alexander Pyhalov,
Postgres Professional