Reject Foreign Tables from MIN/MAX indexscan Optimization?

Started by David E. Wheeler6 days ago5 messages
#1David E. Wheeler
david@justatheory.com
1 attachment(s)

Hackers,

In working on a foreign data wrapper, I noticed that a simple `MIN()` or `MAX()` query doesn’t push down, even though the remote server supports it and it will push down for a more complicated query.

To demonstrate, I added this query to the postgres_fdw tests:

```
explain (verbose, costs off)
select min(c2) from ft4;
```

And the output:

```
QUERY PLAN
-----------------------------------------------------------------------------------
Result
Output: (InitPlan minmax_1).col1
Replaces: MinMaxAggregate
InitPlan minmax_1
-> Limit
Output: ft4.c2
-> Foreign Scan on public.ft4
Output: ft4.c2
Remote SQL: SELECT c2 FROM "S 1"."T 3" ORDER BY c2 ASC NULLS LAST
(9 rows)
```

I don’t think this is what we want, but should let the remote server handle the optimization; besides the LIMIT isn’t pushed down!

The attached patch fixes the issue by disabling the MIN/MAX indexscan optimization for foreign tables. It simply does

```
/*
* Reject foreign tables. They have their own optimizations, so just let
* them have it.
*/
if (rte->relkind == RELKIND_FOREIGN_TABLE)
return;
```

With that change, the test out emits:

```
QUERY PLAN
-----------------------------------------------
Foreign Scan
Output: (min(c2))
Relations: Aggregate on (public.ft4)
Remote SQL: SELECT min(c2) FROM "S 1"."T 3"
(4 rows)
```

Leaving the optimization up to the remote server.

What do you think?

Best,

David

Attachments:

v1-0001-Disable-MIN-MAX-optimization-for-foreign-tables.patchapplication/octet-stream; name=v1-0001-Disable-MIN-MAX-optimization-for-foreign-tables.patch; x-unix-mode=0644Download
From 6a2510cc9efd163565d61e6fa511427a6784b729 Mon Sep 17 00:00:00 2001
From: "David E. Wheeler" <david@justatheory.com>
Date: Mon, 5 Jan 2026 18:53:11 -0500
Subject: [PATCH v1] Disable MIN/MAX optimization for foreign tables

Without this patch, the postgres_fdw tests case emits this plan:

```
                                    QUERY PLAN
-----------------------------------------------------------------------------------
 Result
   Output: (InitPlan minmax_1).col1
   Replaces: MinMaxAggregate
   InitPlan minmax_1
     ->  Limit
           Output: ft4.c2
           ->  Foreign Scan on public.ft4
                 Output: ft4.c2
                 Remote SQL: SELECT c2 FROM "S 1"."T 3" ORDER BY c2 ASC NULLS LAST
(9 rows)
```

Which seems sub-optimal, since it performs a table scan and fetches all
the rows back from the remote server. It would be useful to push down
the `LIMIT`, but better is to let the remote server handle the
optimization, especially since non-Postgres FDWs may not be able to take
advantage of the optimization.

So disable the min/max indexscan optimization for for foreign tables.
---
 .../postgres_fdw/expected/postgres_fdw.out    | 33 +++++++++++++++++++
 contrib/postgres_fdw/sql/postgres_fdw.sql     |  9 +++++
 src/backend/optimizer/plan/planagg.c          |  8 +++++
 3 files changed, 50 insertions(+)

diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 6066510c7c0..3a0c4e2048a 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -2940,6 +2940,39 @@ select count(c6), sum(c1), avg(c1), min(c2), max(c1), stddev(c2), sum(c1) * (ran
    100 | 49600 | 496.0000000000000000 |   1 | 991 |      0 | 49600
 (1 row)
 
+-- Simple MIN and MAX aggregates should push down
+explain (verbose, costs off)
+select min(c2) from ft4;
+                  QUERY PLAN                   
+-----------------------------------------------
+ Foreign Scan
+   Output: (min(c2))
+   Relations: Aggregate on (public.ft4)
+   Remote SQL: SELECT min(c2) FROM "S 1"."T 3"
+(4 rows)
+
+select min(c2) from ft4;
+ min 
+-----
+   3
+(1 row)
+
+explain (verbose, costs off)
+select max(c2) from ft4;
+                  QUERY PLAN                   
+-----------------------------------------------
+ Foreign Scan
+   Output: (max(c2))
+   Relations: Aggregate on (public.ft4)
+   Remote SQL: SELECT max(c2) FROM "S 1"."T 3"
+(4 rows)
+
+select max(c2) from ft4;
+ max 
+-----
+ 101
+(1 row)
+
 -- Aggregate is not pushed down as aggregation contains random()
 explain (verbose, costs off)
 select sum(c1 * (random() <= 1)::int) as sum, avg(c1) from ft1;
diff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql b/contrib/postgres_fdw/sql/postgres_fdw.sql
index 4f7ab2ed0ac..7535c51f296 100644
--- a/contrib/postgres_fdw/sql/postgres_fdw.sql
+++ b/contrib/postgres_fdw/sql/postgres_fdw.sql
@@ -801,6 +801,15 @@ explain (verbose, costs off)
 select count(c6), sum(c1), avg(c1), min(c2), max(c1), stddev(c2), sum(c1) * (random() <= 1)::int as sum2 from ft1 where c2 < 5 group by c2 order by 1, 2 limit 1;
 select count(c6), sum(c1), avg(c1), min(c2), max(c1), stddev(c2), sum(c1) * (random() <= 1)::int as sum2 from ft1 where c2 < 5 group by c2 order by 1, 2 limit 1;
 
+-- Simple MIN and MAX aggregates should push down
+explain (verbose, costs off)
+select min(c2) from ft4;
+select min(c2) from ft4;
+
+explain (verbose, costs off)
+select max(c2) from ft4;
+select max(c2) from ft4;
+
 -- Aggregate is not pushed down as aggregation contains random()
 explain (verbose, costs off)
 select sum(c1 * (random() <= 1)::int) as sum, avg(c1) from ft1;
diff --git a/src/backend/optimizer/plan/planagg.c b/src/backend/optimizer/plan/planagg.c
index 09b38b2c378..9c27771e2b1 100644
--- a/src/backend/optimizer/plan/planagg.c
+++ b/src/backend/optimizer/plan/planagg.c
@@ -28,6 +28,7 @@
  */
 #include "postgres.h"
 
+#include "catalog/pg_class.h"
 #include "access/htup_details.h"
 #include "catalog/pg_aggregate.h"
 #include "catalog/pg_type.h"
@@ -137,6 +138,13 @@ preprocess_minmax_aggregates(PlannerInfo *root)
 	else
 		return;
 
+	/*
+	 * Reject foreign tables. They have their own optimizations, so just let
+	 * them have it.
+	 */
+	if (rte->relkind == RELKIND_FOREIGN_TABLE)
+		return;
+
 	/*
 	 * Examine all the aggregates and verify all are MIN/MAX aggregates.  Stop
 	 * as soon as we find one that isn't.
-- 
2.51.0

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: David E. Wheeler (#1)
Re: Reject Foreign Tables from MIN/MAX indexscan Optimization?

"David E. Wheeler" <david@justatheory.com> writes:

In working on a foreign data wrapper, I noticed that a simple `MIN()` or `MAX()` query doesn’t push down, even though the remote server supports it and it will push down for a more complicated query.

Are you sure this is bad? I would hope that it'd only choose such
a plan if the remote server has a supporting index, else cost
considerations should cause us to reject it. Yeah, it looks a little
odd to ship the query in this form rather than as a MIN()/MAX(),
but I'm unconvinced that it's harmful per se. Maybe there is a
costing problem that needs to be fixed, but if so we should fix it
not lobotomize the planner.

The attached patch fixes the issue by disabling the MIN/MAX
indexscan optimization for foreign tables.

I don't like that too much. The original intention of
preprocess_minmax_aggregates was that it would only create indexscan
plans, which would automatically eliminate FDW relations. Some of
the comments still say that :-(, but in reality we got rid of that
restriction back in 9.1 (see 034967bdc and 8df08c848) because we
realized that the optimization could be applicable even when the
underlying plan isn't a simple indexscan. I think that the same
applies here, and so we might be throwing away useful cases.
In particular, your "let the remote side do the optimization"
argument is only valid for postgres_fdw; other FDWs might not
have equivalent smarts underneath them.

It would be good to look under the hood a little more and understand
why the MinMaxAgg path is winning, if it's not actually fast.

regards, tom lane

#3David E. Wheeler
david@justatheory.com
In reply to: Tom Lane (#2)
Re: Reject Foreign Tables from MIN/MAX indexscan Optimization?

On Jan 5, 2026, at 19:41, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Are you sure this is bad? I would hope that it'd only choose such
a plan if the remote server has a supporting index, else cost
considerations should cause us to reject it.

How does one tell it that there is a supporting index? I’m working with ClickHouse, where there are no indexes to speak of; this query relies on the inherent ordering of the columns.

Yeah, it looks a little
odd to ship the query in this form rather than as a MIN()/MAX(),
but I'm unconvinced that it's harmful per se.

It would be less harmful if the LIMIT pushed down; otherwise it’s pulling the entire table back into Postgres, no?

Maybe there is a
costing problem that needs to be fixed, but if so we should fix it
not lobotomize the planner.

I would love to better understand how that all works, esp. WRT foreign tables. With an analytics database like ClickHouse, though, we’re trying to push down as much as possible.

I don't like that too much. The original intention of
preprocess_minmax_aggregates was that it would only create indexscan
plans, which would automatically eliminate FDW relations. Some of
the comments still say that :-(, but in reality we got rid of that
restriction back in 9.1 (see 034967bdc and 8df08c848) because we
realized that the optimization could be applicable even when the
underlying plan isn't a simple indexscan. I think that the same
applies here, and so we might be throwing away useful cases.

But it needs the LIMIT, no?

In particular, your "let the remote side do the optimization"
argument is only valid for postgres_fdw; other FDWs might not
have equivalent smarts underneath them.

Okay, but it’d be great to have a way to tell it “don’t do that” for those that do.

It would be good to look under the hood a little more and understand
why the MinMaxAgg path is winning, if it's not actually fast.

Where does one start?

D

#4Tom Lane
tgl@sss.pgh.pa.us
In reply to: David E. Wheeler (#3)
3 attachment(s)
Re: Reject Foreign Tables from MIN/MAX indexscan Optimization?

"David E. Wheeler" <david@justatheory.com> writes:

On Jan 5, 2026, at 19:41, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Yeah, it looks a little
odd to ship the query in this form rather than as a MIN()/MAX(),
but I'm unconvinced that it's harmful per se.

It would be less harmful if the LIMIT pushed down; otherwise it’s pulling the entire table back into Postgres, no?

Yeah, there might be some gold to be mined there. IIRC we already
have some ability for LIMIT to tell an immediately-below Sort that
only a limited number of rows will be retrieved. Could be useful
if ForeignScan could be in on that.

It would be good to look under the hood a little more and understand
why the MinMaxAgg path is winning, if it's not actually fast.

Where does one start?

I made the attached test case (based on postgres_fdw.sql, but with
a few more rows in the table). The idea is to check the behavior
for both an indexed column (c1) and an unindexed one (c2). To
eliminate cacheing/startup effects, I ran each test case three times
but considered only the lowest execution time of the three (often
but not always the last run).

The first attached results are from current HEAD; the second are
after applying your patch. What I find interesting is that
with use_remote_estimate enabled, it always goes for shipping the
MIN as-is. Your patch changes the behavior with use_remote_estimate
disabled, and we can see that the reason is that that mode estimates
the query cost at 100.24 with the minmax optimization and 146.78 when
shipping the aggregate. I've not dug into where we are getting
those numbers without use_remote_estimate, but perhaps that could
use refinement. On the other hand, there is not that much difference
between the actual runtimes, so perhaps it doesn't matter for
postgres_fdw.

I can't speak to your results with ClickHouse, since I have no idea
how the cost estimates are derived for that.

regards, tom lane

Attachments:

mmtest.sqltext/plain; charset=us-ascii; name=mmtest.sqlDownload
mmtest-head.outtext/plain; charset=us-ascii; name=mmtest-head.outDownload
mmtest-patched.outtext/plain; charset=us-ascii; name=mmtest-patched.outDownload
#5David E. Wheeler
david@justatheory.com
In reply to: Tom Lane (#4)
Re: Reject Foreign Tables from MIN/MAX indexscan Optimization?

On Jan 5, 2026, at 23:04, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Yeah, there might be some gold to be mined there. IIRC we already
have some ability for LIMIT to tell an immediately-below Sort that
only a limited number of rows will be retrieved. Could be useful
if ForeignScan could be in on that.

I’m surprised the limit part of the plan isn’t there already. Is that something that needs to be tweaked in the paths passed to the FDW?

The first attached results are from current HEAD; the second are
after applying your patch. What I find interesting is that
with use_remote_estimate enabled, it always goes for shipping the
MIN as-is. Your patch changes the behavior with use_remote_estimate
disabled, and we can see that the reason is that that mode estimates
the query cost at 100.24 with the minmax optimization and 146.78 when
shipping the aggregate. I've not dug into where we are getting
those numbers without use_remote_estimate, but perhaps that could
use refinement.

Odd. They should be the same, no?

On the other hand, there is not that much difference
between the actual runtimes, so perhaps it doesn't matter for
postgres_fdw.

Maybe not with a loopback connection, but some of these queries send all 10000 values over the network, no? That could be a lot of additional latency in some situations. ISTM that the code should either:

1. Push down the `LIMIT 1`; or
2. Push down the min/max and let the optimization happen remotely

I can't speak to your results with ClickHouse, since I have no idea
how the cost estimates are derived for that.

ClickHouse is column-oriented, and all the columns are sorted, so min/max optimizations are completely different and never require an index scan or table scan: they just read the first or last value in the relevant column. This is why I started looking into this in the first place.

But your finding of the difference between use_remote_estimate true and false helped me to figure out the right lever to pull. pg_clickhouse has some remnants of use_remote_estimate from forking postgres_fdw back in 2019; they don’t do anything, but it led me to a function that sets more-or-less hard-coded estimates. In `estimate_path_cost_size()` I changed:

p_total_cost = 5.0 + coef

To

p_total_cost = 0.1 + coef

And that makes the min and max push down. Of course this impacts *all* pg_clickhouse queries, but that’s mainly what we want for analytics-oriented workloads.

Anyway, some comments on plans generated by your testing:

explain (analyze, verbose) select min(c1) from "S 1"."T 3";
explain (analyze, verbose) select min(c1) from "S 1"."T 3";
explain (analyze, verbose) select min(c1) from "S 1"."T 3";

Index-only scan, 1 row (though estimates 10000 for some reason).

explain (analyze, verbose) select min(c2) from "S 1"."T 3";
explain (analyze, verbose) select min(c2) from "S 1"."T 3";
explain (analyze, verbose) select min(c2) from "S 1"."T 3";

Table scan, 10000 rows, as expected.

ALTER FOREIGN TABLE ft4 OPTIONS (use_remote_estimate 'false');

explain (analyze, verbose) select min(c1) from ft4;
explain (analyze, verbose) select min(c1) from ft4;
explain (analyze, verbose) select min(c1) from ft4;

The Foreign Scan here confuses me:

-> Foreign Scan on public.ft4 (cost=100.00..761.35 rows=2925 width=4) (actual time=0.144..0.144 rows=1.00 loops=1)
Output: ft4.c1
Remote SQL: SELECT c1 FROM "S 1"."T 3" ORDER BY c1 ASC NULLS LAST

How does it result in only one row? Does that mean it gets all of them back from the seq scan but only reads the first? Are they still delivered over the network and buffered? Seems like it would be good to have `LIMIT 1` pushed down.

explain (analyze, verbose) select min(c2) from ft4;
explain (analyze, verbose) select min(c2) from ft4;
explain (analyze, verbose) select min(c2) from ft4;

Same confusion about the foreign scan plan:

-> Foreign Scan on public.ft4 (cost=100.00..761.35 rows=2925 width=4) (actual time=0.831..0.831 rows=1.00 loops=1)
Output: ft4.c2
Remote SQL: SELECT c2 FROM "S 1"."T 3" ORDER BY c2 ASC NULLS LAST

But also its execution time is 0.900 ms, which implies a remote full table scan. This makes sense, and `LIMIT 1` would reduce the network overhead, but not the remote execution time.

ALTER FOREIGN TABLE ft4 OPTIONS (SET use_remote_estimate 'true');

explain (analyze, verbose) select min(c1) from ft4;
explain (analyze, verbose) select min(c1) from ft4;
explain (analyze, verbose) select min(c1) from ft4;

explain (analyze, verbose) select min(c2) from ft4;
explain (analyze, verbose) select min(c2) from ft4;
explain (analyze, verbose) select min(c2) from ft4;

Interesting that this configuration pushes down min/max, but of course then the same optimizations happen remotely as locally, but now we know only one row is returned, right?

My patch doesn’t change this much, because the remote optimization still takes place, it’s just that in that case it never returns more than a single row. Setting `LIMIT` pushdown aside, I guess the question is, how ought an FDW author to properly tweak cost estimates to encourage or discourage the min/max optimization?

Best,

David