postgres_fdw: estimate_path_cost_size fails to re-use cached costs

Started by Etsuro Fujitaover 7 years ago3 messageshackers
Jump to latest
#1Etsuro Fujita
fujita.etsuro@lab.ntt.co.jp

Hi,

I noticed yet another thing while updating the patch for pushing down
ORDER BY LIMIT. Let me explain. When costing foreign paths on the
basis of local statistics, we calculate/cache the costs of an unsorted
foreign path, and re-use them to estimate the costs of presorted foreign
paths, as shown below. BUT: we fail to re-use them for some typical
queries, such as "select * from ft1 order by a", due to
fpinfo->rel_startup_cost=0, leading to doing the same cost calculation
repeatedly.

/*
* We will come here again and again with different set of pathkeys
* that caller wants to cost. We don't need to calculate the cost of
* bare scan each time. Instead, use the costs if we have cached
them
* already.
*/
if (fpinfo->rel_startup_cost > 0 && fpinfo->rel_total_cost > 0)
{
startup_cost = fpinfo->rel_startup_cost;
run_cost = fpinfo->rel_total_cost - fpinfo->rel_startup_cost;
}

I think we should use "fpinfo->rel_startup_cost >= 0" here, not
"fpinfo->rel_startup_cost > 0". Also, it would be possible that the
total cost calculated is zero in corner cases (eg, seq_page_cost=0 and
cpu_tuple_cost=0 for the example), so I think we should change the total
cost part as well. Attached is a patch for that.

Best regards,
Etsuro Fujita

Attachments:

yet-another-fix-for-estimate_path_cost_size.patchtext/x-patch; name=yet-another-fix-for-estimate_path_cost_size.patchDownload+2-2
#2Etsuro Fujita
fujita.etsuro@lab.ntt.co.jp
In reply to: Etsuro Fujita (#1)
Re: postgres_fdw: estimate_path_cost_size fails to re-use cached costs

(2019/01/25 20:33), Etsuro Fujita wrote:

I noticed yet another thing while updating the patch for pushing down
ORDER BY LIMIT. Let me explain. When costing foreign paths on the
basis of local statistics, we calculate/cache the costs of an unsorted
foreign path, and re-use them to estimate the costs of presorted foreign
paths, as shown below. BUT: we fail to re-use them for some typical
queries, such as "select * from ft1 order by a", due to
fpinfo->rel_startup_cost=0, leading to doing the same cost calculation
repeatedly.

/*
* We will come here again and again with different set of pathkeys
* that caller wants to cost. We don't need to calculate the cost of
* bare scan each time. Instead, use the costs if we have cached
them
* already.
*/
if (fpinfo->rel_startup_cost> 0&& fpinfo->rel_total_cost> 0)
{
startup_cost = fpinfo->rel_startup_cost;
run_cost = fpinfo->rel_total_cost - fpinfo->rel_startup_cost;
}

I think we should use "fpinfo->rel_startup_cost>= 0" here, not
"fpinfo->rel_startup_cost> 0". Also, it would be possible that the
total cost calculated is zero in corner cases (eg, seq_page_cost=0 and
cpu_tuple_cost=0 for the example), so I think we should change the total
cost part as well. Attached is a patch for that.

I added the commit message. Updated patch attached. If no objections,
I'll apply that to HEAD only as there are no reports of actual trouble
from this, as far as I know.

Best regards,
Etsuro Fujita

Attachments:

0001-postgres_fdw-Fix-test-for-cached-costs-in-estimate_p.patchtext/x-patch; name=0001-postgres_fdw-Fix-test-for-cached-costs-in-estimate_p.patchDownload+1-2
#3Etsuro Fujita
fujita.etsuro@lab.ntt.co.jp
In reply to: Etsuro Fujita (#2)
Re: postgres_fdw: estimate_path_cost_size fails to re-use cached costs

(2019/01/28 19:37), Etsuro Fujita wrote:

(2019/01/25 20:33), Etsuro Fujita wrote:

I noticed yet another thing while updating the patch for pushing down
ORDER BY LIMIT. Let me explain. When costing foreign paths on the
basis of local statistics, we calculate/cache the costs of an unsorted
foreign path, and re-use them to estimate the costs of presorted foreign
paths, as shown below. BUT: we fail to re-use them for some typical
queries, such as "select * from ft1 order by a", due to
fpinfo->rel_startup_cost=0, leading to doing the same cost calculation
repeatedly.

/*
* We will come here again and again with different set of pathkeys
* that caller wants to cost. We don't need to calculate the cost of
* bare scan each time. Instead, use the costs if we have cached
them
* already.
*/
if (fpinfo->rel_startup_cost> 0&& fpinfo->rel_total_cost> 0)
{
startup_cost = fpinfo->rel_startup_cost;
run_cost = fpinfo->rel_total_cost - fpinfo->rel_startup_cost;
}

I think we should use "fpinfo->rel_startup_cost>= 0" here, not
"fpinfo->rel_startup_cost> 0". Also, it would be possible that the
total cost calculated is zero in corner cases (eg, seq_page_cost=0 and
cpu_tuple_cost=0 for the example), so I think we should change the total
cost part as well. Attached is a patch for that.

I added the commit message. Updated patch attached. If no objections,
I'll apply that to HEAD only as there are no reports of actual trouble
from this, as far as I know.

Pushed.

Best regards,
Etsuro Fujita