Error from array_agg when table has many rows

Started by Kirill Zdornyyabout 1 year ago5 messagesbugs
Jump to latest
#1Kirill Zdornyy
kirill@dineserve.com

Hello,

After upgrading from PostgreSQL 12.19 to PostgreSQL 16.3 the function "array_agg" gives me the following error under certain conditions.

ERROR: input of anonymous composite types is not implemented

I was also able reproduce the issue on PostgreSQL 17.4 via the latest currently available Docker image.

Please see the following SQL script for the minimum reproduction.

```
drop table if exists my_table;

create table my_table
(
my_text text
);

-- Insert a minimum of 200,000 rows.
do
$$
declare
counter integer := 0;
begin
while counter < 200000
loop
INSERT INTO my_table (my_text) VALUES ('A simple sentence.');
counter := counter + 1;
end loop;
end;
$$;

select array_agg(t)
from (select my_text from my_table WHERE my_text != '') t;
```

On my machine the issue appears if 200,000 or more rows exist. I used EXPLAIN and noticed that the query plan is different if this condition is met.

This is the query plan if fewer than 200,000 rows exist.

```
Aggregate (cost=1935.40..1935.41 rows=1 width=32)
-> Seq Scan on my_table (cost=0.00..1719.90 rows=86199 width=32)
Filter: (my_text <> ''::text)
```

This is the query plan if more than 200,000 rows exist.

```
Finalize Aggregate (cost=3801.65..3801.66 rows=1 width=32)
-> Gather (cost=3801.53..3801.64 rows=1 width=32)
Workers Planned: 1
-> Partial Aggregate (cost=2801.53..2801.54 rows=1 width=32)
-> Parallel Seq Scan on my_table (cost=0.00..2548.00 rows=101411 width=32)
Filter: (my_text <> ''::text)
```

I am also able to reproduce the issue on AWS RDS PostgreSQL 16.3.

Thank you!

#2Richard Guo
guofenglinux@gmail.com
In reply to: Kirill Zdornyy (#1)
Re: Error from array_agg when table has many rows

On Sat, Mar 8, 2025 at 8:10 AM Kirill Zdornyy <kirill@dineserve.com> wrote:

After upgrading from PostgreSQL 12.19 to PostgreSQL 16.3 the function "array_agg" gives me the following error under certain conditions.

ERROR: input of anonymous composite types is not implemented

I was also able reproduce the issue on PostgreSQL 17.4 via the latest currently available Docker image.

Thanks for the report! I can reproduce it on master with the query
below.

create table t (a int);
insert into t values (1);

set parallel_setup_cost=0;
set parallel_tuple_cost=0;
set min_parallel_table_scan_size=0;

# select array_agg(s) from (select * from t) s;
ERROR: input of anonymous composite types is not implemented

And the plan for this query is:

explain (verbose, costs off)
select array_agg(s) from (select * from t) s;
QUERY PLAN
---------------------------------------------------
Finalize Aggregate
Output: array_agg(ROW(t.a))
-> Gather
Output: (PARTIAL array_agg(ROW(t.a)))
Workers Planned: 2
-> Partial Aggregate
Output: PARTIAL array_agg(ROW(t.a))
-> Parallel Seq Scan on public.t
Output: t.a
(9 rows)

We are performing deserialization during the final phase of the
aggregation on data of type RECORD but we fail to provide a valid
typmod (array_agg_deserialize() uses -1 as the typmod when calling the
receiveproc).

I haven't verified it, but I suspect it's related to 16fd03e95.

Thanks
Richard

#3Tom Lane
tgl@sss.pgh.pa.us
In reply to: Richard Guo (#2)
Re: Error from array_agg when table has many rows

Richard Guo <guofenglinux@gmail.com> writes:

We are performing deserialization during the final phase of the
aggregation on data of type RECORD but we fail to provide a valid
typmod (array_agg_deserialize() uses -1 as the typmod when calling the
receiveproc).

I haven't verified it, but I suspect it's related to 16fd03e95.

Yeah. I don't think there is any way for array_agg_deserialize to
know the correct typmod, so what we have to do is disable using
partial aggregation in this case. Fortunately there's a
policy-setting function that can be taught that, as attached.

regards, tom lane

Attachments:

disable_partial_aggregation_on_RECORD.patchtext/x-diff; charset=us-ascii; name=disable_partial_aggregation_on_RECORD.patchDownload+10-1
#4David Rowley
dgrowleyml@gmail.com
In reply to: Tom Lane (#3)
Re: Error from array_agg when table has many rows

On Sun, 9 Mar 2025 at 04:50, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Richard Guo <guofenglinux@gmail.com> writes:

We are performing deserialization during the final phase of the
aggregation on data of type RECORD but we fail to provide a valid
typmod (array_agg_deserialize() uses -1 as the typmod when calling the
receiveproc).

I haven't verified it, but I suspect it's related to 16fd03e95.

Yeah. I don't think there is any way for array_agg_deserialize to
know the correct typmod, so what we have to do is disable using
partial aggregation in this case. Fortunately there's a
policy-setting function that can be taught that, as attached.

The only way I can think of to get that would be to special-case
array_agg_serialize() to have it serialize the typmod when the send
function is record_send(), then add a similar special-case to
array_agg_deserialize() to check for a record_recv() and deserialize
the typmod there. That doesn't seem very pretty, so I'm happy to go
with your fix to disable parallel aggregates for this case.

David

#5Tom Lane
tgl@sss.pgh.pa.us
In reply to: David Rowley (#4)
Re: Error from array_agg when table has many rows

David Rowley <dgrowleyml@gmail.com> writes:

On Sun, 9 Mar 2025 at 04:50, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Yeah. I don't think there is any way for array_agg_deserialize to
know the correct typmod, so what we have to do is disable using
partial aggregation in this case. Fortunately there's a
policy-setting function that can be taught that, as attached.

The only way I can think of to get that would be to special-case
array_agg_serialize() to have it serialize the typmod when the send
function is record_send(), then add a similar special-case to
array_agg_deserialize() to check for a record_recv() and deserialize
the typmod there. That doesn't seem very pretty, so I'm happy to go
with your fix to disable parallel aggregates for this case.

Yeah ... we could probably make that work if we had to, but the
ugliness would likely metastasize outside array_agg_[de]serialize.
Since it took more than a year to get a field report, I'm content
to just disable the optimization instead. Pushed that way.

regards, tom lane