Partial Mode in Aggregate Functions

Started by Marcos Pegoraro2 months ago19 messageshackers

marcos@f10.com.br

2 months ago

My understanding reading [0]https://www.postgresql.org/docs/current/functions-aggregate.html was that Partial Mode would be related to
FILTER (WHERE ...). And I'm not alone thinking that, see [1]https://www.cybertec-postgresql.com/en/partial-aggregation-the-beautiful-way/, Hans calls
Partial Aggregation.

Then I see that jsonb_agg does not have Partial Mode, but testing, it works
select jsonb_agg(V) filter (where V>1) from (Values(1),(2)) x(V)

Then reading more carefully on the header of this page says
Aggregate functions that support Partial Mode are eligible to participate
in various optimizations, such as parallel aggregation.

But what does Partial Mode mean to the user ? Is it related only to
parallel or any other optimization ? And all aggregate functions can have
FILTER ?
I'm not exactly sure how to explain this case better, but I think it needs
better wording.

Additionally, on this page there is no mention to FILTER (WHERE ...).
Wouldn't it be better to explain something like it is on [2]https://www.postgresql.org/docs/current/tutorial-agg.html.

[0]: https://www.postgresql.org/docs/current/functions-aggregate.html
[1]: https://www.cybertec-postgresql.com/en/partial-aggregation-the-beautiful-way/
https://www.cybertec-postgresql.com/en/partial-aggregation-the-beautiful-way/
[2]: https://www.postgresql.org/docs/current/tutorial-agg.html

regards
Marcos

David G. Johnston

david.g.johnston@gmail.com

2 months ago

In reply to: Marcos Pegoraro (#1)

On Saturday, February 21, 2026, Marcos Pegoraro <marcos@f10.com.br> wrote:

Is it related only to parallel or any other optimization ?

Yes, see create aggregate for what is required when setting partial
aggregation to true. The explanation on [0] (aggregate function reference
page) regarding why json creation functions show false is also informative.

And all aggregate functions can have FILTER ?

Yes, the executor handles filter and simply omits invoking the aggregate if
the row does not pass the filter. The aggregates have no awareness they
have been filtered. Yes, one must reason their way to this, it isn’t
spelled out anywhere that I recall.

I'm not exactly sure how to explain this case better, but I think it needs
better wording.

I would agree. Or a cross-reference at least. It’s also another instance
of weasel-wording that seems to just invite doubt/questions. But it does
say optimizations which means it does not affect the rows returned, which
filter does.

Additionally, on this page there is no mention to FILTER (WHERE ...).
Wouldn't it be better to explain something like it is on [2].

We already cross-reference to the authoritative sections for this. Could
entertain a rephrasing of those cross-references but I would not go and
make a big thing of it here. That said, maybe a separate “since all
aggregates accept a filter clause its syntax has been o tied here for
brevity” sentence would work and match up with the existing one for order
by.

David J.

Marcos Pegoraro

marcos@f10.com.br

2 months ago

In reply to: David G. Johnston (#2)

Re: Partial Mode in Aggregate Functions

Em sáb., 21 de fev. de 2026 às 12:01, David G. Johnston <
david.g.johnston@gmail.com> escreveu:

Yes, see create aggregate for what is required when setting partial
aggregation to true. The explanation on [0] (aggregate function reference
page) regarding why json creation functions show false is also informative.

Okay, looking at the documentation in depth, it's possible to understand tha
t. The problem is, for a user who only reads this page, and often only the
table titled "Partial Mode", what do you want them to understand ?
I think it's more reasonable for him to think that Partial refers to
filtered data rather than imagining that it is data processed in parallel.

We already cross-reference to the authoritative sections for this. Could
entertain a rephrasing of those cross-references but I would not go and
make a big thing of it here. That said, maybe a separate “since all
aggregates accept a filter clause its syntax has been o tied here for
brevity” sentence would work and match up with the existing one for order
by.

Why not an introduction for aggregate functions describing not only FILTER
but ORDER BY and DISTINCT too ?
agg_function([ DISTINCT ] expression [ , ... ] [ ORDER BY order_by_clause
]) [ FILTER ( WHERE filter_clause ) ])

regards
Marcos

David G. Johnston

david.g.johnston@gmail.com

2 months ago

In reply to: Marcos Pegoraro (#3)

Re: Partial Mode in Aggregate Functions

On Sat, Feb 21, 2026, 10:32 Marcos Pegoraro <marcos@f10.com.br> wrote:

Em sáb., 21 de fev. de 2026 às 12:01, David G. Johnston <
david.g.johnston@gmail.com> escreveu:

Yes, see create aggregate for what is required when setting partial
aggregation to true. The explanation on [0] (aggregate function reference
page) regarding why json creation functions show false is also informative.

Okay, looking at the documentation in depth, it's possible to understand th
at. The problem is, for a user who only reads this page, and often only
the table titled "Partial Mode", what do you want them to understand ?
I think it's more reasonable for him to think that Partial refers to
filtered data rather than imagining that it is data processed in parallel.

Fair, people do guess when exposed to an unknown technical term instead of
searching out its definition. Rename the column to "Parallel Execution?"
to avoid that. I agree unless you are writing a new aggregate the exact
technical feature name we used is unimportant. Whether it can be totally
suppressed here or left to a footnote would need to be decided.

We already cross-reference to the authoritative sections for this. Could
entertain a rephrasing of those cross-references but I would not go and
make a big thing of it here. That said, maybe a separate “since all
aggregates accept a filter clause its syntax has been o tied here for
brevity” sentence would work and match up with the existing one for order
by.

Why not an introduction for aggregate functions describing not only FILTER
but ORDER BY and DISTINCT too ?
agg_function([ DISTINCT ] expression [ , ... ] [ ORDER BY order_by_clause
]) [ FILTER ( WHERE filter_clause ) ])

That is what the syntax chapter covers; though I have had the same thoughts
regarding the learn ability of all this. I'd definitely review any
proposal to make improvements here. It wouldn't bother me if the consensus
was to add some repetition in docs around this.

David J.

Marcos Pegoraro

marcos@f10.com.br

about 2 months ago

In reply to: David G. Johnston (#4)

Re: Partial Mode in Aggregate Functions

Em sáb., 21 de fev. de 2026 às 14:56, David G. Johnston <
david.g.johnston@gmail.com> escreveu:

Fair, people do guess when exposed to an unknown technical term instead of
searching out its definition. Rename the column to "Parallel Execution?"
to avoid that. I agree unless you are writing a new aggregate the exact
technical feature name we used is unimportant. Whether it can be totally
suppressed here or left to a footnote would need to be decided.

Perhaps this way we will have a better understanding of this case.

regards
Marcos

David Rowley

dgrowleyml@gmail.com

about 2 months ago

In reply to: Marcos Pegoraro (#5)

Re: Partial Mode in Aggregate Functions

On Thu, 26 Feb 2026 at 02:00, Marcos Pegoraro <marcos@f10.com.br> wrote:

Perhaps this way we will have a better understanding of this case.

I'm not sure how you thought partial mode was related to FILTER. Did
you read something in the documentation that led you to believe
they're somehow related? I imagine you just wrongly assumed that and
that's why you're confused. FILTER is implemented by filtering rows
according to the FILTER's WHERE clause before the aggregate's
transition function is called. That does not require any functionality
provided by the partial aggregate code.

As for the proposed patch, I'm strongly against it. If people are
confused about what partial aggregation is, then let's modify the
documentation to explain what it means.

Currently, [1]https://www.postgresql.org/docs/current/functions-aggregate.html says:

"Aggregate functions that support Partial Mode are eligible to
participate in various optimizations, such as parallel aggregation.".

We could replace that with something like:

"Partial Mode allows input values which belong to the same logical
group to be aggregated separately and later combined to form a single
aggregate state per group. These aggregate states must then be
finalized, which will produce a result equivalent to if all input
values had been aggregated together. Parallel aggregation uses this
so that each parallel worker can aggregate a subset of input values
and form an aggregate state per group. We say these aggregate states
are "partial" as other parallel workers may have aggregated input
values which logically belong to the same group. In the leader
process, the partial aggregate states generated by the parallel
workers are combined to form a single aggregate state per logical
group. The leader finalizes these aggregate states to produce the
final result."

Partial aggregates are also used for combining groups partially
aggregated at the partition level in partitionwise aggregates. Just
look at:

create table hp (a int, b int) partition by hash (a,b);
create table hp0 partition of hp for values with (modulus 2, remainder 0);
create table hp1 partition of hp for values with (modulus 2, remainder 1);
set enable_partitionwise_aggregate=1;
explain (costs off) select a,count(*) from hp group by a;
QUERY PLAN
----------------------------------------
Finalize HashAggregate
Group Key: hp.a
-> Append
-> Partial HashAggregate
Group Key: hp.a
-> Seq Scan on hp0 hp
-> Partial HashAggregate
Group Key: hp_1.a
-> Seq Scan on hp1 hp_1

David

[1]: https://www.postgresql.org/docs/current/functions-aggregate.html

Marcos Pegoraro

marcos@f10.com.br

about 2 months ago

In reply to: David Rowley (#6)

Re: Partial Mode in Aggregate Functions

Em qui., 26 de fev. de 2026 às 01:57, David Rowley <dgrowleyml@gmail.com>
escreveu:

As for the proposed patch, I'm strongly against it. If people are
confused about what partial aggregation is, then let's modify the
documentation to explain what it means.

Well, I started this because the way it's written is confusing, so much so
that I showed a link [1]https://www.cybertec-postgresql.com/en/partial-aggregation-the-beautiful-way/ from someone who speaks English better than me
saying that PARTIAL is FILTER.
And from the user point of view, who is reading just that page, partial
means incomplete, a subset of, etc. So I think it's easy for the reader to
misunderstand. Whether Parallel Mode is the best term to use here, I don't
know, but I'm convinced that the way it is written is easy to confuse.

[1]: https://www.cybertec-postgresql.com/en/partial-aggregation-the-beautiful-way/
https://www.cybertec-postgresql.com/en/partial-aggregation-the-beautiful-way/

regards
Marcos

David Rowley

dgrowleyml@gmail.com

about 2 months ago

In reply to: Marcos Pegoraro (#7)

Re: Partial Mode in Aggregate Functions

On Fri, 27 Feb 2026 at 00:44, Marcos Pegoraro <marcos@f10.com.br> wrote:

Em qui., 26 de fev. de 2026 às 01:57, David Rowley <dgrowleyml@gmail.com> escreveu:

As for the proposed patch, I'm strongly against it. If people are
confused about what partial aggregation is, then let's modify the
documentation to explain what it means.

Well, I started this because the way it's written is confusing, so much so that I showed a link [1] from someone who speaks English better than me saying that PARTIAL is FILTER.
And from the user point of view, who is reading just that page, partial means incomplete, a subset of, etc. So I think it's easy for the reader to misunderstand. Whether Parallel Mode is the best term to use here, I don't know, but I'm convinced that the way it is written is easy to confuse.

Does the wording I proposed make it easier to understand why the word
"partial" is in the name?

[1] - https://www.cybertec-postgresql.com/en/partial-aggregation-the-beautiful-way/

I've never heard anyone using "partial" for FILTER before. The blog is
from 2015, and partial aggregates got committed in 2016. I hope nobody
would use the partial terminology for FILTER today, as we now have
something else using that name.

David

Marcos Pegoraro

marcos@f10.com.br

about 2 months ago

In reply to: David Rowley (#8)

Re: Partial Mode in Aggregate Functions

Em qui., 26 de fev. de 2026 às 10:03, David Rowley <dgrowleyml@gmail.com>
escreveu:

I've never heard anyone using "partial" for FILTER before. The blog is

from 2015, and partial aggregates got committed in 2016.

I'm not reading based on this or that commit.
I'm reading the word "partial" in english, and I see partial as a subset
of.
Look at this example [1]https://www.postgresql.org/docs/current/indexes-partial.html, what does "partial" mean there ? Is it not a
filtered subset ?
So, if someone reads "Partial Mode" for the first time, what do they think?

[1]: https://www.postgresql.org/docs/current/indexes-partial.html

regards
Marcos

#10

David G. Johnston

david.g.johnston@gmail.com

about 2 months ago

In reply to: Marcos Pegoraro (#9)

Re: Partial Mode in Aggregate Functions

On Thursday, February 26, 2026, Marcos Pegoraro <marcos@f10.com.br> wrote:

So, if someone reads "Partial Mode" for the first time, what do they think?

I should go back and read the paragraph preceding this table and see if it
explains what this terminology means.

David J.

#11

David G. Johnston

david.g.johnston@gmail.com

about 2 months ago

In reply to: David Rowley (#6)

On Wednesday, February 25, 2026, David Rowley <dgrowleyml@gmail.com> wrote:

As for the proposed patch, I'm strongly against it. If people are
confused about what partial aggregation is, then let's modify the
documentation to explain what it means.

Overall that helps a lot - given its application to partitioning, “parallel
mode” indeed is not a good choice here. However, I think It reads a bit too
DBA and custom function author oriented which, IMO, is too technically
precise and detailed for someone who just wants to use these functions and
understand how the “false” subset is limited. A bit shorter and more use
of colloquial terminology would be nice. I may give that a shot.

I was pondering whether Batch Mode as more commonly used term would be
sufficiently accurate here. But I’m not going to try hard to avoid Partial
Mode.

I do question whether the column is even particularly valuable to keep
though. I would rate function volatility and strictness much higher in the
list of things people want to know about our functions and we don’t
document that, at least not in table form. Unless we advise people to
write their queries differently based upon this optimization property if
they need the feature they will use the function regardless. Showing it
likely just creates noise in their mind for something they cannot control
anyway.

David J.

#12

David Rowley

dgrowleyml@gmail.com

about 2 months ago

In reply to: David G. Johnston (#11)

Re: Partial Mode in Aggregate Functions

On Fri, 27 Feb 2026 at 07:39, David G. Johnston
<david.g.johnston@gmail.com> wrote:

I was pondering whether Batch Mode as more commonly used term would be sufficiently accurate here. But I’m not going to try hard to avoid Partial Mode.

I expect that would confuse more people. I'd expect a "batch mode" in
relation to aggregation to be transitioning multiple input values in a
single call to a transition function. I expect we'll one day have
that, and likely that transition function will be called "Batch
transition function"

Also, I just found [1]https://techcommunity.microsoft.com/blog/sqlserver/partial-aggregation/383338, so looks like SQL Server also calls these
Partial Aggregates too.

I'll stand by what I said earlier, that if people don't know what
partial aggregation is that we should aim to better explain what it
is. Renaming it won't magically inform people what it is and will
likely just confuse all the people who already know what it is. I
think you're about 10 years too late to bikeshed names for this.

David

[1]: https://techcommunity.microsoft.com/blog/sqlserver/partial-aggregation/383338

#13

Andreas Karlsson

andreas.karlsson@percona.com

about 2 months ago

In reply to: David Rowley (#12)

Re: Partial Mode in Aggregate Functions

On 2/27/26 5:34 AM, David Rowley wrote:

I'll stand by what I said earlier, that if people don't know what
partial aggregation is that we should aim to better explain what it
is. Renaming it won't magically inform people what it is and will
likely just confuse all the people who already know what it is. I
think you're about 10 years too late to bikeshed names for this.

There is nothing wrong with calling them partial aggregates. No name is
likely to never confuse anyone and I do not see any big issue with
partial in particular. Both parallel and batch, as suggested in this
thread, would be worse.

--
Andreas Karlsson
Percona

#14

David G. Johnston

david.g.johnston@gmail.com

about 2 months ago

In reply to: David Rowley (#6)

Re: Partial Mode in Aggregate Functions

On Wed, Feb 25, 2026 at 9:57 PM David Rowley <dgrowleyml@gmail.com> wrote:

On Thu, 26 Feb 2026 at 02:00, Marcos Pegoraro <marcos@f10.com.br> wrote:

Perhaps this way we will have a better understanding of this case.

I'm not sure how you thought partial mode was related to FILTER. Did
you read something in the documentation that led you to believe
they're somehow related? I imagine you just wrongly assumed that and
that's why you're confused. FILTER is implemented by filtering rows
according to the FILTER's WHERE clause before the aggregate's
transition function is called. That does not require any functionality
provided by the partial aggregate code.

As for the proposed patch, I'm strongly against it. If people are
confused about what partial aggregation is, then let's modify the
documentation to explain what it means.

Currently, [1] says:

"Aggregate functions that support Partial Mode are eligible to
participate in various optimizations, such as parallel aggregation.".

We could replace that with something like:

"Partial Mode allows input values which belong to the same logical
group to be aggregated separately and later combined to form a single
aggregate state per group. These aggregate states must then be
finalized, which will produce a result equivalent to if all input
values had been aggregated together. Parallel aggregation uses this
so that each parallel worker can aggregate a subset of input values
and form an aggregate state per group. We say these aggregate states
are "partial" as other parallel workers may have aggregated input
values which logically belong to the same group. In the leader
process, the partial aggregate states generated by the parallel
workers are combined to form a single aggregate state per logical
group. The leader finalizes these aggregate states to produce the
final result."

I commented about the phrasing for this being not ideal for the target
audience of the functions reference page. Here's some other wording to
consider:

--This first paragraph covers the same material, and then some, just a bit
differently:

"Partial Mode communicates that the computation of an aggregate value
can possibly be done piecemeal - where multiple intermediate computations
are performed on a subset of the data which are then combined into a
final aggregate value. Parallel aggregation uses this feature
to assign each parallel worker (and optionally the leader) to aggregate its
subset
of input values. The leader then accepts all these parital aggregations
and computes the final aggregate value for the row. For partitioned tables,
each partition's data is aggregated individually and then finalized into the
combined value for the entire partitioned table."

--This paragraph explains why the reader should care:

"This optimization mode can only be used if applying the aggregate function
to the output (including backend state, not just the scalar result) of the
partial aggregates is guaranteed to produce the same outcome as applying the
function to all the original inputs. This is not the case if the order
of those original inputs is important. The benefit of enabling parallelism
is obvious, but even in the non-parallel partitioned table scenario, not
having
to push the entire input set into memory at once to compute the aggregate
usually compensates for the extra handful of processing steps needed during
finalizing."

I am a bit unconfident regarding the second paragraph in terms of
correctness and adding all this here versus possibly elsewhere. Especially
since it isn't all that actionable.

Are there any special considerations for this as it pertains to executing
these functions in a window context, versus a normal group by, that should
be mentioned here (or the possible elsewhere)?

David J.

#15

Marcos Pegoraro

marcos@f10.com.br

about 2 months ago

In reply to: David G. Johnston (#14)

Re: Partial Mode in Aggregate Functions

Em sex., 27 de fev. de 2026 às 22:58, David G. Johnston <
david.g.johnston@gmail.com> escreveu:

I commented about the phrasing for this being not ideal for the target

audience of the functions reference page. Here's some other wording to
consider:

I sent this link to some colleagues and asked, "Tell me what 'Partial Mode'
means." Some are developers, some are DBAs, and some are just starting out.
Those who responded in less than 20 seconds obviously didn't read the
headers; they just deduced it by looking at the table title. And yes, some
said it related to aggregating data on fewer records. So, putting an
explanation in the header probably won't completely solve the problem,
since many people only look at the table, and if they can draw any
conclusions from it, they ignore the rest of the text.

regards
Marcos

#16

David G. Johnston

david.g.johnston@gmail.com

about 2 months ago

In reply to: Marcos Pegoraro (#15)

Re: Partial Mode in Aggregate Functions

On Sunday, March 1, 2026, Marcos Pegoraro <marcos@f10.com.br> wrote:

Em sex., 27 de fev. de 2026 às 22:58, David G. Johnston <
david.g.johnston@gmail.com> escreveu:

I commented about the phrasing for this being not ideal for the target

audience of the functions reference page. Here's some other wording to
consider:

I sent this link to some colleagues and asked, "Tell me what 'Partial
Mode' means." Some are developers, some are DBAs, and some are just
starting out. Those who responded in less than 20 seconds obviously didn't
read the headers; they just deduced it by looking at the table title. And
yes, some said it related to aggregating data on fewer records. So, putting
an explanation in the header probably won't completely solve the problem,
since many people only look at the table, and if they can draw any
conclusions from it, they ignore the rest of the text.

I can’t come up with a better header than Partial Mode. Given the fairly
benign outcome for such a misunderstanding - this isn’t critical to
understand knowledge for writing queries - I can live with people who don’t
read the documentation being misinformed. Though it does make my idea of
just ripping it out, and putting it elsewhere where optimizations are
talked about, more appealing.

David J.

#17

Marcos Pegoraro

marcos@f10.com.br

about 2 months ago

In reply to: David G. Johnston (#16)

Re: Partial Mode in Aggregate Functions

Em dom., 1 de mar. de 2026 às 11:29, David G. Johnston <
david.g.johnston@gmail.com> escreveu:

I can’t come up with a better header than Partial Mode.

The problem is that if they conclude that there is no FILTER for that
aggregate,
they decide to run two queries, not the most readable or the most
performant one.
with Data as (select V from MyTable where F1 = 100)
select (select jsonb_agg(V) from Data) Total,
(select jsonb_agg(V) from MyTable where V>1) Filtered;

regards
Marcos

#18

Marcos Pegoraro

marcos@f10.com.br

about 2 months ago

In reply to: Marcos Pegoraro (#17)

Re: Partial Mode in Aggregate Functions

Em dom., 1 de mar. de 2026 às 11:54, Marcos Pegoraro <marcos@f10.com.br>
escreveu:

with Data as (select V from MyTable where F1 = 100)
select (select jsonb_agg(V) from Data) Total,
(select jsonb_agg(V) from Data where V>1) Filtered;

This one should be.
Regards
Marcos

#19

David G. Johnston

david.g.johnston@gmail.com

about 2 months ago

In reply to: Marcos Pegoraro (#17)

Re: Partial Mode in Aggregate Functions

On Sun, Mar 1, 2026 at 7:55 AM Marcos Pegoraro <marcos@f10.com.br> wrote:

Em dom., 1 de mar. de 2026 às 11:29, David G. Johnston <
david.g.johnston@gmail.com> escreveu:

I can’t come up with a better header than Partial Mode.

The problem is that if they conclude that there is no FILTER for that
aggregate,
they decide to run two queries, not the most readable or the most
performant one.
with Data as (select V from MyTable where F1 = 100)
select (select jsonb_agg(V) from Data) Total,
(select jsonb_agg(V) from MyTable where V>1) Filtered;

I'm willing to entertain ideas, but for the proposed scenario I'm quite
content to take the "not our problem" attitude here. As you say, it's
suboptimal, not wrong. And it seems to me an unlikely real world outcome
worth bending over backwards to accommodate based on theory alone.

I'm much more amenable to trying to make clear that FILTER exists, and in
the process point out how/that it differs from Partial Mode.

David J.

Partial Mode in Aggregate Functions

Attachments: