PassDownLimitBound for ForeignScan/CustomScan [take-2]
On Mon, Nov 21, 2016 at 1:59 PM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:
Hello,
The attached patch is a revised version of pass-down LIMIT to FDW/CSP.
Below is the updates from the last version.
'ps_numTuples' of PlanState was declared as uint64, instead of long
to avoid problems on 32bits machine when a large LIMIT clause is
supplied.'ps_numTuples' is re-interpreted; 0 means that its upper node wants
to fetch all the tuples. It allows to eliminate a boring initialization
on ExecInit handler for each executor node.Even though it was not suggested, estimate_path_cost_size() of postgres_fdw
adjusts number of rows if foreign-path is located on top-level of
the base-relations and LIMIT clause takes a constant value.
It will make more adequate plan as follows:* WITHOUT this patch
--------------------
postgres=# explain verbose select * from t_a, t_b where t_a.id = t_b.id
and t_a.x < t_b.x LIMIT 100;
QUERY PLAN
------------------------------------------------------------
----------------------------
Limit (cost=261.17..274.43 rows=100 width=88)
Output: t_a.id, t_a.x, t_a.y, t_b.id, t_b.x, t_b.y
-> Hash Join (cost=261.17..581.50 rows=2416 width=88)
Output: t_a.id, t_a.x, t_a.y, t_b.id, t_b.x, t_b.y
Hash Cond: (t_a.id = t_b.id)
Join Filter: (t_a.x < t_b.x)
-> Foreign Scan on public.t_a (cost=100.00..146.12 rows=1204
width=44)
Output: t_a.id, t_a.x, t_a.y
Remote SQL: SELECT id, x, y FROM public.t
-> Hash (cost=146.12..146.12 rows=1204 width=44)
Output: t_b.id, t_b.x, t_b.y
-> Foreign Scan on public.t_b (cost=100.00..146.12
rows=1204 width=44)
Output: t_b.id, t_b.x, t_b.y
Remote SQL: SELECT id, x, y FROM public.t
(14 rows)* WITH this patch
-----------------
postgres=# explain verbose select * from t_a, t_b where t_a.id = t_b.id
and t_a.x < t_b.x LIMIT 100;QUERY PLAN
------------------------------------------------------------
--------------------------------------------------
Limit (cost=100.00..146.58 rows=100 width=88)
Output: t_a.id, t_a.x, t_a.y, t_b.id, t_b.x, t_b.y
-> Foreign Scan (cost=100.00..146.58 rows=100 width=88)
Output: t_a.id, t_a.x, t_a.y, t_b.id, t_b.x, t_b.y
Relations: (public.t_a) INNER JOIN (public.t_b)
Remote SQL: SELECT r1.id, r1.x, r1.y, r2.id, r2.x, r2.y FROM
(public.t r1 INNER JOIN public.t r2 ON (((r1.x < r2.x)) AND ((r1.id =
r2.id))))
(6 rows)That's nice.
On the other hands, I noticed it is not safe to attach LIMIT clause at
the planner stage because root->limit_tuples is declared as double.
Even if LIMIT clause takes a constant value, it is potentially larger
than 2^53 which is the limitation we can represent accurately with
float64 data type but LIMIT clause allows up to 2^63-1.
So, postgres_fdw now attaches LIMIT clause on the remote query on
execution time only.
I think, it's OK.
Here are few comments on latest patch:
1.
make/make check is fine, however I am getting regression failure in
postgres_fdw contrib module (attached regression.diff).
Please investigate and fix.
2.
+ *
+ * MEMO: root->limit_tuples is not attached when query contains
+ * grouping-clause or aggregate functions. So, we don's adjust
+ * rows even if LIMIT <const> is supplied.
Can you please explain why you are not doing this for grouping-clause or
aggregate functions.
3.
Typo:
don's => don't
Rest of the changes look good to me.
Thanks
Thanks,
----
PG-Strom Project / NEC OSS Promotion Center
KaiGai Kohei <kaigai@ak.jp.nec.com>-----Original Message-----
From: Robert Haas [mailto:robertmhaas@gmail.com]
Sent: Thursday, November 10, 2016 3:08 AM
To: Kaigai Kouhei(海外 浩平) <kaigai@ak.jp.nec.com>
Cc: pgsql-hackers@postgresql.org; Jeevan Chalke
<jeevan.chalke@enterprisedb.com>; Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp>; Andres Freund <andres@anarazel.de>
Subject: ##freemail## Re: PassDownLimitBound for ForeignScan/CustomScan
[take-2]On Mon, Oct 31, 2016 at 10:20 AM, Kouhei Kaigai <kaigai@ak.jp.nec.com>
wrote:As an example, I enhanced postgres_fdw to understand the ps_numTuples
if it is set. If and when remote ORDER BY is pushed down, the latest
code tries to sort the entire remote table because it does not know
how many rows to be returned. Thus, it took larger execution time.
On the other hands, the patched one runs the remote query with LIMIT
clause according to the ps_numTuples; which is informed by the Limit
node on top of the ForeignScan node.So there are two cases here. If the user says LIMIT 12, we could in
theory
know that at planner time and optimize accordingly. If the user says
LIMIT
twelve(), however, we will need to wait until execution time unless
twelve()
happens to be capable of being simplified to a constant by the planner.
Therefore, it's possible to imagine having two mechanisms here. In the
simple case where the LIMIT and OFFSET values are constants, we could
implement a system to get hold of that information during planning and
use it for whatever we like. In addition, we can have an
execution-time system that optimizes based on values available atexecution
(regardless of whether those values were also available during planning).
Those are, basically, two separate things, and this patch has enough to
do just focusing on one of them.--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL
Company
--
Jeevan Chalke
Principal Software Engineer, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company
Attachments:
regression.diffsapplication/octet-stream; name=regression.diffsDownload+253-253
Hello,
Sorry for my late response.
The attached patch reflects your comments.
Here are few comments on latest patch:
1.
make/make check is fine, however I am getting regression failure in
postgres_fdw contrib module (attached regression.diff).
Please investigate and fix.
It was an incorrect interaction when postgres_fdw tries to push down
sorting to the remote side. We cannot attach LIMIT clause on the plain
scan path across SORT, however, the previous version estimated the cost
for the plain scan with LIMIT clause even if local sorting is needed.
If remote scan may return just 10 rows, estimated cost of the local sort
is very lightweight, thus, this unreasonable path was chosen.
(On the other hands, its query execution results were correct because
ps_numTuples is not delivered across Sort node, so ForeignScan eventually
scanned all the remote tuples. It made correct results but not optimal
from the viewpoint of performance.)
The v3 patch estimates the cost with remote LIMIT clause only if supplied
pathkey strictly matches with the final output order of the query, thus,
no local sorting is expected.
Some of the regression test cases still have different plans but due to
the new optimization by remote LIMIT clause.
Without remote LIMIT clause, some of regression test cases preferred
remote-JOIN + local-SORT then local-LIMIT.
Once we have remote-LIMIT option, it allows to discount the cost for
remote-SORT by choice of top-k heap sorting.
It changed the optimizer's decision on some test cases.
Potential one big change is the test case below.
-- CROSS JOIN, not pushed down
EXPLAIN (VERBOSE, COSTS OFF)
SELECT t1.c1, t2.c1 FROM ft1 t1 CROSS JOIN ft2 t2 ORDER BY t1.c1, t2.c1 OFFSET 100 LIMIT 10;
It assumed CROSS JOIN was not pushed down due to the cost for network
traffic, however, remote LIMIT reduced the estimated number of tuples
to be moved. So, all of the CROSS JOIN + ORDER BY + LIMIT became to run
on the remote side.
2. + * + * MEMO: root->limit_tuples is not attached when query contains + * grouping-clause or aggregate functions. So, we don's adjust + * rows even if LIMIT <const> is supplied.Can you please explain why you are not doing this for grouping-clause or
aggregate functions.
See grouping_planner() at optimizer/plan/planner.c
It puts an invalid value on the root->limit_tuples if query has GROUP BY clause,
so we cannot know number of tuples to be returned even if we have upper limit
actually.
/*
* Figure out whether there's a hard limit on the number of rows that
* query_planner's result subplan needs to return. Even if we know a
* hard limit overall, it doesn't apply if the query has any
* grouping/aggregation operations, or SRFs in the tlist.
*/
if (parse->groupClause ||
parse->groupingSets ||
parse->distinctClause ||
parse->hasAggs ||
parse->hasWindowFuncs ||
parse->hasTargetSRFs ||
root->hasHavingQual)
root->limit_tuples = -1.0;
else
root->limit_tuples = limit_tuples;
3.
Typo:don's => don't
Fixed,
best regards,
----
PG-Strom Project / NEC OSS Promotion Center
KaiGai Kohei <kaigai@ak.jp.nec.com>
Attachments:
passdown-limit-fdw.v3.patchapplication/octet-stream; name=passdown-limit-fdw.v3.patchDownload+118-98
Oops, I oversight this patch was marked as "returned with feedback",
not "moved to the next CF".
Its status has not been changed since the last update. (Code was
revised according to the last
comment by Jeevan, but CF-Nov was time up at that time.)
How do I handle the patch?
2016-12-05 16:49 GMT+09:00 Kouhei Kaigai <kaigai@ak.jp.nec.com>:
Hello,
Sorry for my late response.
The attached patch reflects your comments.Here are few comments on latest patch:
1.
make/make check is fine, however I am getting regression failure in
postgres_fdw contrib module (attached regression.diff).
Please investigate and fix.It was an incorrect interaction when postgres_fdw tries to push down
sorting to the remote side. We cannot attach LIMIT clause on the plain
scan path across SORT, however, the previous version estimated the cost
for the plain scan with LIMIT clause even if local sorting is needed.
If remote scan may return just 10 rows, estimated cost of the local sort
is very lightweight, thus, this unreasonable path was chosen.
(On the other hands, its query execution results were correct because
ps_numTuples is not delivered across Sort node, so ForeignScan eventually
scanned all the remote tuples. It made correct results but not optimal
from the viewpoint of performance.)The v3 patch estimates the cost with remote LIMIT clause only if supplied
pathkey strictly matches with the final output order of the query, thus,
no local sorting is expected.Some of the regression test cases still have different plans but due to
the new optimization by remote LIMIT clause.
Without remote LIMIT clause, some of regression test cases preferred
remote-JOIN + local-SORT then local-LIMIT.
Once we have remote-LIMIT option, it allows to discount the cost for
remote-SORT by choice of top-k heap sorting.
It changed the optimizer's decision on some test cases.Potential one big change is the test case below.
-- CROSS JOIN, not pushed down
EXPLAIN (VERBOSE, COSTS OFF)
SELECT t1.c1, t2.c1 FROM ft1 t1 CROSS JOIN ft2 t2 ORDER BY t1.c1, t2.c1 OFFSET 100 LIMIT 10;It assumed CROSS JOIN was not pushed down due to the cost for network
traffic, however, remote LIMIT reduced the estimated number of tuples
to be moved. So, all of the CROSS JOIN + ORDER BY + LIMIT became to run
on the remote side.2. + * + * MEMO: root->limit_tuples is not attached when query contains + * grouping-clause or aggregate functions. So, we don's adjust + * rows even if LIMIT <const> is supplied.Can you please explain why you are not doing this for grouping-clause or
aggregate functions.See grouping_planner() at optimizer/plan/planner.c
It puts an invalid value on the root->limit_tuples if query has GROUP BY clause,
so we cannot know number of tuples to be returned even if we have upper limit
actually./*
* Figure out whether there's a hard limit on the number of rows that
* query_planner's result subplan needs to return. Even if we know a
* hard limit overall, it doesn't apply if the query has any
* grouping/aggregation operations, or SRFs in the tlist.
*/
if (parse->groupClause ||
parse->groupingSets ||
parse->distinctClause ||
parse->hasAggs ||
parse->hasWindowFuncs ||
parse->hasTargetSRFs ||
root->hasHavingQual)
root->limit_tuples = -1.0;
else
root->limit_tuples = limit_tuples;3.
Typo:don's => don't
Fixed,
best regards,
----
PG-Strom Project / NEC OSS Promotion Center
KaiGai Kohei <kaigai@ak.jp.nec.com>--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
--
KaiGai Kohei <kaigai@kaigai.gr.jp>
Attachments:
passdown-limit-fdw.v3.patchapplication/octet-stream; name=passdown-limit-fdw.v3.patchDownload+118-98
On Mon, Jan 2, 2017 at 10:07 PM, Kohei KaiGai <kaigai@kaigai.gr.jp> wrote:
Oops, I oversight this patch was marked as "returned with feedback",
not "moved to the next CF".Its status has not been changed since the last update. (Code was
revised according to the last
comment by Jeevan, but CF-Nov was time up at that time.)How do I handle the patch?
Can you just change the status to "Moved to Next CF" and then make it
"Needs Review"?
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
-----Original Message-----
From: pgsql-hackers-owner@postgresql.org
[mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Robert Haas
Sent: Thursday, January 05, 2017 5:29 AM
To: Kohei KaiGai <kaigai@kaigai.gr.jp>
Cc: Kaigai Kouhei(海外 浩平) <kaigai@ak.jp.nec.com>; Jeevan Chalke
<jeevan.chalke@enterprisedb.com>; pgsql-hackers@postgresql.org; Etsuro
Fujita <fujita.etsuro@lab.ntt.co.jp>; Andres Freund <andres@anarazel.de>
Subject: Re: [HACKERS] PassDownLimitBound for ForeignScan/CustomScan
[take-2]On Mon, Jan 2, 2017 at 10:07 PM, Kohei KaiGai <kaigai@kaigai.gr.jp> wrote:
Oops, I oversight this patch was marked as "returned with feedback",
not "moved to the next CF".Its status has not been changed since the last update. (Code was
revised according to the last comment by Jeevan, but CF-Nov was time
up at that time.)How do I handle the patch?
Can you just change the status to "Moved to Next CF" and then make it "Needs
Review"?
Unfortunately, it was not possible. Probably, administrator privilege will be
needed for this operation.
May I add it to the CF:2017-03?
----
PG-Strom Project / NEC OSS Promotion Center
KaiGai Kohei <kaigai@ak.jp.nec.com>
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Thu, Jan 5, 2017 at 8:58 AM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:
Unfortunately, it was not possible. Probably, administrator privilege will be
needed for this operation.
Adding a patch to a CF in progress indeed requires administrator privileges,
May I add it to the CF:2017-03?
I can notice that the previous CFM has switched this patch to
"returned with feedback" without mentioning the new status on this
thread. Perhaps that was not appropriate.
--
Michael
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
The attached patch is rebased version of pass-down LIMIT clause patch,
which was forgotten to register on the last CF.
It allows to inform required number of rows to the sub-plans not only
ones we have individually handled at pass_down_bound().
Its primary target is control of number of remote tuple transfer over
the network connection by postgres_fdw.
According to the past discussion, we add a new field @ps_numTuples on
the PlanState to represent the required number of tuples.
Limit node assign a particular number on the field of sub-plan, then
this sub-plan can know its upper node does not require entire tuples,
and adjust its execution storategy.
Like MergeAppend, the sub-plan can also pass down the bounds to its
sub-plans again, if it makes sense and works correctly.
This feature is potentially a basis of GPU-based sorting on top of
CustomScan, because it has advantage for a workload to pick up the
top-N tuples if its data-size is enough small to load onto GPU-RAM.
Thanks,
----
PG-Strom Project / NEC OSS Promotion Center
KaiGai Kohei <kaigai@ak.jp.nec.com>
Show quoted text
-----Original Message-----
From: pgsql-hackers-owner@postgresql.org
[mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Kohei KaiGai
Sent: Tuesday, January 03, 2017 12:07 PM
To: Kaigai Kouhei(海外 浩平) <kaigai@ak.jp.nec.com>
Cc: Jeevan Chalke <jeevan.chalke@enterprisedb.com>; Robert Haas
<robertmhaas@gmail.com>; pgsql-hackers@postgresql.org; Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp>; Andres Freund <andres@anarazel.de>
Subject: Re: [HACKERS] PassDownLimitBound for ForeignScan/CustomScan
[take-2]Oops, I oversight this patch was marked as "returned with feedback", not
"moved to the next CF".Its status has not been changed since the last update. (Code was revised
according to the last comment by Jeevan, but CF-Nov was time up at that
time.)How do I handle the patch?
2016-12-05 16:49 GMT+09:00 Kouhei Kaigai <kaigai@ak.jp.nec.com>:
Hello,
Sorry for my late response.
The attached patch reflects your comments.Here are few comments on latest patch:
1.
make/make check is fine, however I am getting regression failure in
postgres_fdw contrib module (attached regression.diff).
Please investigate and fix.It was an incorrect interaction when postgres_fdw tries to push down
sorting to the remote side. We cannot attach LIMIT clause on the plain
scan path across SORT, however, the previous version estimated the
cost for the plain scan with LIMIT clause even if local sorting is needed.
If remote scan may return just 10 rows, estimated cost of the local
sort is very lightweight, thus, this unreasonable path was chosen.
(On the other hands, its query execution results were correct because
ps_numTuples is not delivered across Sort node, so ForeignScan
eventually scanned all the remote tuples. It made correct results but
not optimal from the viewpoint of performance.)The v3 patch estimates the cost with remote LIMIT clause only if
supplied pathkey strictly matches with the final output order of the
query, thus, no local sorting is expected.Some of the regression test cases still have different plans but due
to the new optimization by remote LIMIT clause.
Without remote LIMIT clause, some of regression test cases preferred
remote-JOIN + local-SORT then local-LIMIT.
Once we have remote-LIMIT option, it allows to discount the cost for
remote-SORT by choice of top-k heap sorting.
It changed the optimizer's decision on some test cases.Potential one big change is the test case below.
-- CROSS JOIN, not pushed down
EXPLAIN (VERBOSE, COSTS OFF)
SELECT t1.c1, t2.c1 FROM ft1 t1 CROSS JOIN ft2 t2 ORDER BY t1.c1,
t2.c1 OFFSET 100 LIMIT 10;It assumed CROSS JOIN was not pushed down due to the cost for network
traffic, however, remote LIMIT reduced the estimated number of tuples
to be moved. So, all of the CROSS JOIN + ORDER BY + LIMIT became to
run on the remote side.2. + * + * MEMO: root->limit_tuples is not attached when query contains + * grouping-clause or aggregate functions. So, we don'sadjust
+ * rows even if LIMIT <const> is supplied.
Can you please explain why you are not doing this for grouping-clause
or aggregate functions.See grouping_planner() at optimizer/plan/planner.c It puts an invalid
value on the root->limit_tuples if query has GROUP BY clause, so we
cannot know number of tuples to be returned even if we have upper
limit actually./*
* Figure out whether there's a hard limit on the number of rowsthat
* query_planner's result subplan needs to return. Even if we
know a
* hard limit overall, it doesn't apply if the query has any
* grouping/aggregation operations, or SRFs in the tlist.
*/
if (parse->groupClause ||
parse->groupingSets ||
parse->distinctClause ||
parse->hasAggs ||
parse->hasWindowFuncs ||
parse->hasTargetSRFs ||
root->hasHavingQual)
root->limit_tuples = -1.0;
else
root->limit_tuples = limit_tuples;3.
Typo:don's => don't
Fixed,
best regards,
----
PG-Strom Project / NEC OSS Promotion Center KaiGai Kohei
<kaigai@ak.jp.nec.com>--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To
make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers--
KaiGai Kohei <kaigai@kaigai.gr.jp>
Attachments:
passdown-limit-fdw.v4.patchapplication/octet-stream; name=passdown-limit-fdw.v4.patchDownload+118-92
Hello all,
as this is my first mail to pgsql-hackers, please be gentle :)
I've looked at the patch, and as I'm not that familiar with the
pg-sourcecode, customs and so on, this isn't a review, more like food for
thought and all should be taken with a grain of salt. :)
So here are a few questions and remarks:
+ double limit_tuples = -1.0;
Surely the limit cannot be fractional, and must be an integer. So wouldn't
it be better the same type as say:
+ if (root->limit_tuples >= 0.0 &&
Than you could also compare with ">= 0", not ">= 0.0".
node->ss.ps.ps_numTuples is f.i. an uint64.
Or is there a specific reason the limit must be a double?
And finally:
+ if (node->ss.ps.ps_numTuples > 0)
+ appendStringInfo(&buf, " LIMIT %ld", node->ss.ps.ps_numTuples);
vs.
+ appendStringInfo(&buf, "%s LIMIT %lu", + sql, node->ss.ps.ps_numTuples);
It seems odd to have two different format strings here for the same variable.
A few comments miss "." at the end, like these:
+ * Also, pass down the required number of tuples
+ * Pass down the number of required tuples by the upper node
And this comment might be better "were we already called?"
+ bool rs_started; /* are we already called? */
Hope this helps, and thank you for working on this issue.
Regards,
Tels
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Hello all,
as this is my first mail to pgsql-hackers, please be gentle :)
Welcome to pgsql-hackers,
I've looked at the patch, and as I'm not that familiar with the pg-sourcecode,
customs and so on, this isn't a review, more like food for thought and all
should be taken with a grain of salt. :)So here are a few questions and remarks:
+ double limit_tuples = -1.0;
Surely the limit cannot be fractional, and must be an integer. So wouldn't
it be better the same type as say:+ if (root->limit_tuples >= 0.0 &&
Than you could also compare with ">= 0", not ">= 0.0".
The above variable represents the "estimated" number of rows at the
planning stage, not execution time.
You may be able to see Path structure has "rows" field declared as
double type. It makes sense to consider stupid paths during planning,
even if it is eventually rejected. For example, if a cross join with
two large tables appear during planning, 64bit integer will make overflow
easily.
node->ss.ps.ps_numTuples is f.i. an uint64.
Or is there a specific reason the limit must be a double?
The above variable represents "actual" number of rows at the execution
stage. Likely, hardware replacement cycle will come before int64 overflow.
And finally:
+ if (node->ss.ps.ps_numTuples > 0)
+ appendStringInfo(&buf, " LIMIT %ld",
node->ss.ps.ps_numTuples);
vs.
+ appendStringInfo(&buf, "%s LIMIT %lu", + sql,node->ss.ps.ps_numTuples);
It seems odd to have two different format strings here for the same variable.
Ah, yes, %lu is right because ps_numTuples is uint64.
A few comments miss "." at the end, like these:
+ * Also, pass down the required number of tuples
+ * Pass down the number of required tuples by the upper node
OK,
And this comment might be better "were we already called?"
+ bool rs_started; /* are we already
called? */
Other variables in ResultState uses present form, like:
+ bool rs_started; /* are we already called? */
bool rs_done; /* are we done? */
bool rs_checkqual; /* do we need to check the qual? */
} ResultState;
Thanks,
----
PG-Strom Project / NEC OSS Promotion Center
KaiGai Kohei <kaigai@ak.jp.nec.com>
Attachments:
passdown-limit-fdw.v5.patchapplication/octet-stream; name=passdown-limit-fdw.v5.patchDownload+118-92
Hello,
On Wed, March 1, 2017 7:21 pm, Kouhei Kaigai wrote:
I've looked at the patch, and as I'm not that familiar with the
pg-sourcecode,
customs and so on, this isn't a review, more like food for thought and
all
should be taken with a grain of salt. :)So here are a few questions and remarks:
+ double limit_tuples = -1.0;
Surely the limit cannot be fractional, and must be an integer. So
wouldn't
it be better the same type as say:+ if (root->limit_tuples >= 0.0 &&
Than you could also compare with ">= 0", not ">= 0.0".
The above variable represents the "estimated" number of rows at the
planning stage, not execution time.
You may be able to see Path structure has "rows" field declared as
double type. It makes sense to consider stupid paths during planning,
even if it is eventually rejected. For example, if a cross join with
two large tables appear during planning, 64bit integer will make overflow
easily.
Hm, ok. Not related to your patch, just curious: Is there a mechanism in
place that automatically rejects plans where the limit would overflow the
double to uint64 conversation? Or is this more of a "there would be
hopefully a plan with a better limit so we do not use the bad one"?
Would it possible to force a plan where such overflow would occur?
And this comment might be better "were we already called?"
+ bool rs_started; /* are we already
called? */
Other variables in ResultState uses present form, like:
+ bool rs_started; /* are we already called? */
bool rs_done; /* are we done? */
bool rs_checkqual; /* do we need to check the qual? */
} ResultState;
Yes, I noted that, but still "are" and "called" and "already" don't read
well together for me:
are - present form
called - past form like "were we called?", or "are we called bob?" an
ongoing process
already - it has started
So "are we already called" reads like someone is waiting for being called.
Maybe to mirror the comment on "rs_done":
/* have we started yet? */
Also, maybe it's easier for the comment to describe what is happening in
the code because of the flag, not just to the flag itself:
/* To do things once when we are called */
Anyway, it is a minor point and don't let me distract you from your work,
I do like the feature and the patch :)
Best wishes,
Tels
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Hello,
On Wed, March 1, 2017 7:21 pm, Kouhei Kaigai wrote:
I've looked at the patch, and as I'm not that familiar with the
pg-sourcecode, customs and so on, this isn't a review, more like food
for thought and all should be taken with a grain of salt. :)So here are a few questions and remarks:
+ double limit_tuples = -1.0;
Surely the limit cannot be fractional, and must be an integer. So
wouldn't it be better the same type as say:+ if (root->limit_tuples >= 0.0 &&
Than you could also compare with ">= 0", not ">= 0.0".
The above variable represents the "estimated" number of rows at the
planning stage, not execution time.
You may be able to see Path structure has "rows" field declared as
double type. It makes sense to consider stupid paths during planning,
even if it is eventually rejected. For example, if a cross join with
two large tables appear during planning, 64bit integer will make
overflow easily.Hm, ok. Not related to your patch, just curious: Is there a mechanism in
place that automatically rejects plans where the limit would overflow the
double to uint64 conversation? Or is this more of a "there would be hopefully
a plan with a better limit so we do not use the bad one"?Would it possible to force a plan where such overflow would occur?
We have no such mechanism, and less necessity.
Estimated number of rows in plan time is stored in the plan_rows field of
the Plan structure, as FP64. Once plan-tree gets constructed, estimated
number of rows shall not affect to the execution. (Some plan might use it
for estimation of resource consumption on execution time.)
On the other hands, the actual number of rows that was processed is saved
on the instrument field of the PlanState structure. It is counted up from
the zero by one. So, people wants to replace the hardware prior to uint64
overflow. If 1B rows are processed per sec, uint64 overflow happen at 26th
century(!).
And this comment might be better "were we already called?"
+ bool rs_started; /* are we already
called? */
Other variables in ResultState uses present form, like:
+ bool rs_started; /* are we already called? */
bool rs_done; /* are we done? */
bool rs_checkqual; /* do we need to check the qual? */
} ResultState;Yes, I noted that, but still "are" and "called" and "already" don't read
well together for me:are - present form
called - past form like "were we called?", or "are we called bob?" an
ongoing process
already - it has startedSo "are we already called" reads like someone is waiting for being called.
Maybe to mirror the comment on "rs_done":
/* have we started yet? */
Also, maybe it's easier for the comment to describe what is happening in
the code because of the flag, not just to the flag itself:/* To do things once when we are called */
Anyway, it is a minor point and don't let me distract you from your work,
I do like the feature and the patch :)
Fixed to "have we started yet?"
----
PG-Strom Project / NEC OSS Promotion Center
KaiGai Kohei <kaigai@ak.jp.nec.com>
Attachments:
passdown-limit-fdw.v6.patchapplication/octet-stream; name=passdown-limit-fdw.v6.patchDownload+118-92
Hi,
I have reviewed this patch further and here are my comments:
1.
Will it be better to use compare_pathkeys() instead of
equal(root->query_pathkeys, pathkeys)?
2.
I think it will be good if we add a simple test-case in postgres_fdw.sql
which shows that LIMIT is passed to remote server. We might convert one of
the affected EXPLAIN to EXPLAIN ANALYZE.
With this patch, we do push LIMIT on foreign server but not at planning
time.
Thus we still show remote query in EXPLAIN output without LIMIT clause but
we
actually push that at execution time. Such explain plan confuses the reader.
So I think we better convert them to EXPLAIN ANALYZE or put some comments
before the query explaining this. I prefer to put comments as EXPLAIN
ANALYZE
is costly.
3.
"-- CROSS JOIN, not pushed down"
Above comment need to be changed now.
As you explained already, due to limit clause, CROSS-JOIN is pushed down to
remote server, thus need to update this comment.
Rest of the changes look good to me.
Thanks
--
Jeevan Chalke
Principal Software Engineer, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company
Hi,
On 3/13/17 3:25 AM, Jeevan Chalke wrote:
I have reviewed this patch further and here are my comments:
This thread has been idle for over a week. Please respond and/or post a
new patch by 2017-03-24 00:00 AoE (UTC-12) or this submission will be
marked "Returned with Feedback".
Thanks,
--
-David
david@pgmasters.net
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Hi Kaigai,
On 3/21/17 1:11 PM, David Steele wrote:
On 3/13/17 3:25 AM, Jeevan Chalke wrote:
I have reviewed this patch further and here are my comments:
This thread has been idle for over a week. Please respond and/or post a
new patch by 2017-03-24 00:00 AoE (UTC-12) or this submission will be
marked "Returned with Feedback".
This submission has been marked "Returned with Feedback". Please feel
free to resubmit to a future commitfest.
Regards,
--
-David
david@pgmasters.net
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers