[PATCH] Add a GUC parameter to control LIMIT-clause path cost adjustment
Hi all,
When a query contains LIMIT/OFFSET clauses, the optimizer uses this
information to influence path generation. For example, it may prefer the
path with the lowest startup cost and scale path costs accordingly.
In most cases these behaviors are beneficial and lead to a better plan.
However, there are scenarios where they produce a bad plan. A typical one
is a query containing both ORDER BY and LIMIT clauses for which the
optimizer chooses an unsuitable index scan: PostgreSQL makes very
optimistic predictions about how quickly an ordered index scan will
satisfy the row-count requirement, and when reality differs from the
prediction it may generate a bad plan. One case provided by Lukas [1]
hits this issue, and another case I ran into recently is:
```
-- bad plan
                                                                            QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=1.00..611.58 rows=200 width=419) (actual time=3848.321..3867.253 rows=200 loops=1)
   Buffers: shared hit=5957080 (main=5957080 vm=0 fsm=0) read=1 (main=1 vm=0 fsm=0)
   I/O Timings: shared/local read=0.518
   Read Timings: total=0.534 ms buffer alloc=0.010 read io=0.519 page replay=0.002
   ->  Nested Loop  (cost=1.00..1556085.69 rows=509704 width=419) (actual time=3848.320..3867.231 rows=200 loops=1)
         Buffers: shared hit=5957080 (main=5957080 vm=0 fsm=0) read=1 (main=1 vm=0 fsm=0)
         I/O Timings: shared/local read=0.518
         Read Timings: total=0.534 ms buffer alloc=0.010 read io=0.519 page replay=0.002
         ->  Index Scan using "idx_outDetail_channel_id" on pmc_outcome_detail d  (cost=0.43..369798.26 rows=641998 width=411) (actual time=0.042..758.725 rows=1172313 loops=1)
               Index Cond: ((channel_id = ANY ('{7000,2000,4000}'::numeric[])) AND (channel_id = '4000'::numeric))
               Filter: ((voucher_file_name IS NULL) AND (payment_type = ANY ('{7,8}'::numeric[])) AND (status = '3'::numeric))
               Rows Removed by Filter: 586208
               Buffers: shared hit=95516 (main=95516 vm=0 fsm=0)
         ->  Index Scan using pmc_outcome_pkey on pmc_outcome o  (cost=0.56..1.85 rows=1 width=18) (actual time=0.002..0.002 rows=0 loops=1172313)
               Index Cond: (id = d.outcome_id)
               Filter: (user_type = '1'::numeric)
               Rows Removed by Filter: 1
               Buffers: shared hit=5861564 (main=5861564 vm=0 fsm=0) read=1 (main=1 vm=0 fsm=0)
               I/O Timings: shared/local read=0.518
               Read Timings: total=0.534 ms buffer alloc=0.010 read io=0.519 page replay=0.002
 Planning:
   Buffers: shared hit=20 (main=16 vm=4 fsm=0)
 Planning Time: 0.269 ms
 Execution Time: 3867.322 ms
(24 rows)

-- good plan
                                                                          QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=15964.44..595103.67 rows=200 width=419) (actual time=809.243..818.407 rows=200 loops=1)
   Buffers: shared hit=5957257 (main=5957257 vm=0 fsm=0) read=5 (main=5 vm=0 fsm=0)
   I/O Timings: shared/local read=1.219
   Read Timings: total=1.243 ms buffer alloc=0.015 read io=1.219 page replay=0.004
   ->  Gather  (cost=15964.44..595103.67 rows=509704 width=419) (actual time=809.242..818.381 rows=200 loops=1)
         Workers Planned: 6
         Workers Launched: 6
         Buffers: shared hit=5957257 (main=5957257 vm=0 fsm=0) read=5 (main=5 vm=0 fsm=0)
         I/O Timings: shared/local read=1.219
         Read Timings: total=1.243 ms buffer alloc=0.015 read io=1.219 page replay=0.004
         ->  Nested Loop  (cost=14964.44..543133.27 rows=84951 width=419) (actual time=801.757..803.199 rows=31 loops=7)
               Buffers: shared hit=5957257 (main=5957257 vm=0 fsm=0) read=5 (main=5 vm=0 fsm=0)
               I/O Timings: shared/local read=1.219
               Read Timings: total=1.243 ms buffer alloc=0.015 read io=1.219 page replay=0.004
               ->  Parallel Bitmap Heap Scan on pmc_outcome_detail d  (cost=14963.87..345418.09 rows=107000 width=411) (actual time=107.837..210.957 rows=167477 loops=7)
                     Recheck Cond: ((channel_id = ANY ('{7000,2000,4000}'::numeric[])) AND (channel_id = '4000'::numeric))
                     Filter: ((voucher_file_name IS NULL) AND (payment_type = ANY ('{7,8}'::numeric[])) AND (status = '3'::numeric))
                     Rows Removed by Filter: 83745
                     Heap Blocks: exact=13247
                     Buffers: shared hit=95541 (main=95541 vm=0 fsm=0) read=5 (main=5 vm=0 fsm=0)
                     I/O Timings: shared/local read=1.219
                     Read Timings: total=1.243 ms buffer alloc=0.015 read io=1.219 page replay=0.004
                     ->  Bitmap Index Scan on "idx_outDetail_channel_id"  (cost=0.00..14803.38 rows=1072436 width=0) (actual time=101.764..101.764 rows=1820498 loops=1)
                           Index Cond: ((channel_id = ANY ('{7000,2000,4000}'::numeric[])) AND (channel_id = '4000'::numeric))
                           Buffers: shared hit=6016 (main=6016 vm=0 fsm=0) read=5 (main=5 vm=0 fsm=0)
                           I/O Timings: shared/local read=1.219
                           Read Timings: total=1.243 ms buffer alloc=0.015 read io=1.219 page replay=0.004
               ->  Index Scan using pmc_outcome_pkey on pmc_outcome o  (cost=0.56..1.85 rows=1 width=18) (actual time=0.003..0.003 rows=0 loops=1172342)
                     Index Cond: (id = d.outcome_id)
                     Filter: (user_type = '1'::numeric)
                     Rows Removed by Filter: 1
                     Buffers: shared hit=5861716 (main=5861716 vm=0 fsm=0)
 Planning:
   Buffers: shared hit=20 (main=16 vm=4 fsm=0)
 Planning Time: 0.299 ms
 Execution Time: 818.499 ms
(36 rows)
```
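To make the generic ORDER BY + LIMIT pattern concrete, here is a minimal synthetic
sketch (hypothetical table and data, not the production schema above) in which the
rows that pass the filter all sit at the far end of the ORDER BY column, so an
ordered index scan that the planner expects to stop early may end up reading almost
the whole index:
```
CREATE TABLE events (id bigserial PRIMARY KEY, created_at timestamptz, status int);
INSERT INTO events (created_at, status)
SELECT now() - g * interval '1 second',
       CASE WHEN g > 999000 THEN 1 ELSE 0 END   -- matching rows only among the oldest
FROM generate_series(1, 1000000) AS g;
CREATE INDEX ON events (created_at);
ANALYZE events;

-- The planner may choose to walk the created_at index backward, expecting to
-- find 10 matching rows quickly; in fact all status = 1 rows are at the other
-- end, so nearly the entire index is scanned before the LIMIT is satisfied.
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM events WHERE status = 1 ORDER BY created_at DESC LIMIT 10;
```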
In such cases, users have no good way to force the optimizer to behave differently.
To address this, I have added a GUC parameter that controls whether the
optimizer uses LIMIT clauses to adjust path costs, providing a way to
intervene when needed. Similar scenarios, such as MIN/MAX optimization,
are covered as well. Note that the optimizer still uses LIMIT clauses to
adjust the estimated row counts, as that is always reasonable. The patch
is attached (no test cases added yet).
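As a usage sketch (placeholder table and query, assuming the attached patch is
applied), the knob can be turned off for a single statement:
```
BEGIN;
SET LOCAL enable_limit_adjust_cost = off;  -- cost paths as if no LIMIT were present
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM t WHERE status = 3 ORDER BY created_at LIMIT 200;
COMMIT;
```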
Any thoughts?
Regards,
Haiyang Li
[1] https://pganalyze.com/blog/5mins-postgres-planner-order-by-limit
Attachment: v01_enable_limit_adjust_cost_guc.patch
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index 60b0fcfb6be..8e8be605aed 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -163,7 +163,7 @@ bool enable_parallel_hash = true;
bool enable_partition_pruning = true;
bool enable_presorted_aggregate = true;
bool enable_async_append = true;
-
+bool enable_limit_adjust_cost = true;
typedef struct
{
PlannerInfo *root;
diff --git a/src/backend/optimizer/plan/planagg.c b/src/backend/optimizer/plan/planagg.c
index 64605be3178..510941feac0 100644
--- a/src/backend/optimizer/plan/planagg.c
+++ b/src/backend/optimizer/plan/planagg.c
@@ -412,12 +412,20 @@ build_minmax_path(PlannerInfo *root, MinMaxAggInfo *mminfo,
Int64GetDatum(1), false,
FLOAT8PASSBYVAL);
- /*
- * Generate the best paths for this query, telling query_planner that we
- * have LIMIT 1.
- */
- subroot->tuple_fraction = 1.0;
- subroot->limit_tuples = 1.0;
+ if (enable_limit_adjust_cost)
+ {
+ /*
+ * Generate the best paths for this query, telling query_planner that
+ * we have LIMIT 1.
+ */
+ subroot->tuple_fraction = 1.0;
+ subroot->limit_tuples = 1.0;
+ }
+ else
+ {
+ subroot->tuple_fraction = 0.0;
+ subroot->limit_tuples = -1.0;
+ }
final_rel = query_planner(subroot, minmax_qp_callback, NULL);
@@ -432,9 +440,11 @@ build_minmax_path(PlannerInfo *root, MinMaxAggInfo *mminfo,
/*
* Get the best presorted path, that being the one that's cheapest for
- * fetching just one row. If there's no such path, fail.
+ * fetching just one row. If there's no such path, fail. If path cost
+ * adjustment is disabled, we also force path_fraction to 1.0.
*/
- if (final_rel->rows > 1.0)
+ if (enable_limit_adjust_cost && final_rel->rows > 1.0)
path_fraction = 1.0 / final_rel->rows;
else
path_fraction = 1.0;
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 566ce5b3cb4..bb6fc47be61 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -1401,12 +1401,25 @@ grouping_planner(PlannerInfo *root, double tuple_fraction,
tuple_fraction = preprocess_limit(root, tuple_fraction,
&offset_est, &count_est);
- /*
- * If we have a known LIMIT, and don't have an unknown OFFSET, we can
- * estimate the effects of using a bounded sort.
- */
- if (count_est > 0 && offset_est >= 0)
- limit_tuples = (double) count_est + (double) offset_est;
+ if (enable_limit_adjust_cost)
+ {
+ /*
+ * If we have a known LIMIT, and don't have an unknown OFFSET, we
+ * can estimate the effects of using a bounded sort.
+ */
+ if (count_est > 0 && offset_est >= 0)
+ limit_tuples = (double) count_est + (double) offset_est;
+ }
+ else
+ {
+ /*
+ * LIMIT-based cost adjustment is disabled, so plan for retrieving
+ * all tuples. offset_est and count_est are still needed to adjust
+ * the estimated rows of the Limit path, so this must be done after
+ * calling preprocess_limit().
+ */
+ tuple_fraction = 0.0;
+ }
}
/* Make tuple_fraction accessible to lower-level routines */
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index 93e73cb44db..2f8465d0435 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -4079,6 +4079,13 @@ adjust_limit_rows_costs(double *rows, /* in/out parameter */
if (*rows < 1)
*rows = 1;
}
+
+ /* Cost adjustment disabled: keep the input path's costs */
+ if (!enable_limit_adjust_cost)
+ {
+ *startup_cost = input_startup_cost;
+ *total_cost = input_total_cost;
+ }
}
diff --git a/src/backend/utils/misc/guc_tables.c b/src/backend/utils/misc/guc_tables.c
index 60b12446a1c..3a46ca17a80 100644
--- a/src/backend/utils/misc/guc_tables.c
+++ b/src/backend/utils/misc/guc_tables.c
@@ -1036,6 +1036,16 @@ struct config_bool ConfigureNamesBool[] =
true,
NULL, NULL, NULL
},
+ {
+ {"enable_limit_adjust_cost", PGC_USERSET, QUERY_TUNING_METHOD,
+ gettext_noop("Enable the planner's use of limit information adjust cost."),
+ NULL,
+ GUC_EXPLAIN
+ },
+ &enable_limit_adjust_cost,
+ true,
+ NULL, NULL, NULL
+ },
{
{"geqo", PGC_USERSET, QUERY_TUNING_GEQO,
gettext_noop("Enables genetic query optimization."),
diff --git a/src/include/optimizer/cost.h b/src/include/optimizer/cost.h
index c5987440817..7cdac2ec0e6 100644
--- a/src/include/optimizer/cost.h
+++ b/src/include/optimizer/cost.h
@@ -71,6 +71,7 @@ extern PGDLLIMPORT bool enable_partition_pruning;
extern PGDLLIMPORT bool enable_presorted_aggregate;
extern PGDLLIMPORT bool enable_async_append;
extern PGDLLIMPORT int constraint_exclusion;
+extern PGDLLIMPORT bool enable_limit_adjust_cost;
extern double index_pages_fetched(double tuples_fetched, BlockNumber pages,
double index_pages, PlannerInfo *root);
"Haiyang Li" <mohen.lhy@alibaba-inc.com> writes:
> To address this, I have added a GUC parameter that controls whether the
> optimizer uses LIMIT clauses to adjust path costs, providing a way to
> intervene when needed. Similar scenarios, such as MIN/MAX optimization,
> are covered as well. Note that the optimizer still uses LIMIT clauses to
> adjust the estimated row counts, as that is always reasonable. The patch
> is attached (no test cases added yet).
> Any thoughts?
I think this is a pretty bad solution as given. A global GUC switch
is an extremely blunt instrument and hard to use in production without
breaking cases you didn't want to break. The proposed patch seems to
have that problem in spades, as it looks like you've turned off
*every* place that has any consideration of LIMIT effects without
concern for whether that place is known to have problems, and
furthermore done so at the greatest possible distance from where the
estimates actually get made.
We have discussed fixes with a bit more finesse, such as adjusting
LIMIT cost estimates with the understanding that the tuples being
sought may not be uniformly distributed in the input data (which is
usually the ultimate cause of such plans performing badly). Nobody's
carried such ideas through to a complete proposal as yet, though.
regards, tom lane
On 11/2/2025 11:28 AM, Haiyang Li wrote:
I am new to this area, but isn't the problem the filter selectivity
estimation? The filter

    Filter: ((voucher_file_name IS NULL) AND (payment_type = ANY ('{7,8}'::numeric[])) AND (status = '3'::numeric))
    Rows Removed by Filter: 586208
    Buffers: shared hit=95516 (main=95516 vm=0 fsm=0)

removes the most rows. The optimizer shouldn't have selected the index
scan path even with the LIMIT present. Maybe the benefit of the LIMIT
should be adjusted downward based on the filter selectivity? Calculate
a new limit by dividing the LIMIT by the total filter selectivity?
Forgive me if I am completely off base with this discussion.

Bryan Green
Bryan Green <dbryan.green@gmail.com> writes:
> I am new to this area, but isn't the problem the filter selectivity
> estimation? The filter
>     Filter: ((voucher_file_name IS NULL) AND (payment_type = ANY ('{7,8}'::numeric[])) AND (status = '3'::numeric))
>     Rows Removed by Filter: 586208
>     Buffers: shared hit=95516 (main=95516 vm=0 fsm=0)
> removes the most rows. The optimizer shouldn't have selected the index
> scan path even with the LIMIT present.
I hadn't looked closely at the two plans being compared, but yeah,
they *both* suck. Why is it choosing a nestloop join when it knows
the outside will yield many many rows? The "good" plan is better
only because it decided to parallelize the outer table scan, which
moves the goalposts not very far and frankly seems like a chance
result anyway. I wonder if we are looking at the behavior of a
planner that's been locally modified in other ways, or is using
strange cost settings.
> Maybe the benefit of the LIMIT
> should be adjusted downward based on the filter selectivity?
Yeah, ideas similar to that are what I was alluding to in my
other response. The typical issue is that the filter clause's
selectivity is underestimated (so that there are fewer matching
rows than we thought) and/or the matching rows preferentially
appear closer to the end of the index scan than the beginning.
I'd be happier with a proposal that would inflate the LIMIT's
minimum cost by some factor based on the likelihood of that
sort of error, rather than just kneecapping the calculation
altogether.
regards, tom lane
There were some formatting issues with my previous reply, so I am
sending this message from another email account.
"Tom Lane" <tgl@sss.pgh.pa.us> Sun, 02 Nov 2025 12:44:13 -0500 write:
> I think this is a pretty bad solution as given. A global GUC switch
> is an extremely blunt instrument and hard to use in production without
> breaking cases you didn't want to break.
Yeah. Disabling this globally in a production environment is not advisable.
A more acceptable approach would be to use the pg_hint_plan extension
and apply hints only to the specific SQL statements that exhibit the issue,
for example:
```
/*+ Set(enable_limit_adjust_cost off) */ SELECT xxx;
```
> The proposed patch seems to
> have that problem in spades, as it looks like you've turned off
> *every* place that has any consideration of LIMIT effects without
> concern for whether that place is known to have problems, and
> furthermore done so at the greatest possible distance from where the
> estimates actually get made.
The patch works by making the cost estimator behave as if the query had
no LIMIT clause, except for the row-count estimation performed at the
Limit node itself.
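For example, the effect can be observed by comparing the Limit node's costs with
the GUC on and off (a minimal sketch with a hypothetical table t, assuming the
attached patch is applied):
```
-- With the GUC on, the Limit node's total cost is a fraction of its input
-- path's cost; with it off, the Limit node keeps the input path's startup
-- and total costs, while its row estimate is still clamped to the LIMIT count.
SET enable_limit_adjust_cost = on;
EXPLAIN SELECT * FROM t ORDER BY created_at LIMIT 10;

SET enable_limit_adjust_cost = off;
EXPLAIN SELECT * FROM t ORDER BY created_at LIMIT 10;
```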
> We have discussed fixes with a bit more finesse, such as adjusting
> LIMIT cost estimates with the understanding that the tuples being
> sought may not be uniformly distributed in the input data (which is
> usually the ultimate cause of such plans performing badly). Nobody's
> carried such ideas through to a complete proposal as yet, though.
I have also considered more refined solutions, but haven't yet found one
that seems appropriate; this issue appears to have been under discussion
for quite some time. That is why I proposed a dedicated GUC: it may not
be elegant, but it is straightforward and it works.
regards,
Haiyang Li