PassDownLimitBound for ForeignScan/CustomScan [take-2]

Started by Jeevan Chalkeabout 9 years ago14 messages
#1Jeevan Chalke
jeevan.chalke@enterprisedb.com
1 attachment(s)

On Mon, Nov 21, 2016 at 1:59 PM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:

Hello,

The attached patch is a revised version of pass-down LIMIT to FDW/CSP.

Below is the updates from the last version.

'ps_numTuples' of PlanState was declared as uint64, instead of long
to avoid problems on 32bits machine when a large LIMIT clause is
supplied.

'ps_numTuples' is re-interpreted; 0 means that its upper node wants
to fetch all the tuples. It allows to eliminate a boring initialization
on ExecInit handler for each executor node.

Even though it was not suggested, estimate_path_cost_size() of postgres_fdw
adjusts number of rows if foreign-path is located on top-level of
the base-relations and LIMIT clause takes a constant value.
It will make more adequate plan as follows:

* WITHOUT this patch
--------------------
postgres=# explain verbose select * from t_a, t_b where t_a.id = t_b.id
and t_a.x < t_b.x LIMIT 100;
QUERY PLAN
------------------------------------------------------------
----------------------------
Limit (cost=261.17..274.43 rows=100 width=88)
Output: t_a.id, t_a.x, t_a.y, t_b.id, t_b.x, t_b.y
-> Hash Join (cost=261.17..581.50 rows=2416 width=88)
Output: t_a.id, t_a.x, t_a.y, t_b.id, t_b.x, t_b.y
Hash Cond: (t_a.id = t_b.id)
Join Filter: (t_a.x < t_b.x)
-> Foreign Scan on public.t_a (cost=100.00..146.12 rows=1204
width=44)
Output: t_a.id, t_a.x, t_a.y
Remote SQL: SELECT id, x, y FROM public.t
-> Hash (cost=146.12..146.12 rows=1204 width=44)
Output: t_b.id, t_b.x, t_b.y
-> Foreign Scan on public.t_b (cost=100.00..146.12
rows=1204 width=44)
Output: t_b.id, t_b.x, t_b.y
Remote SQL: SELECT id, x, y FROM public.t
(14 rows)

* WITH this patch
-----------------
postgres=# explain verbose select * from t_a, t_b where t_a.id = t_b.id
and t_a.x < t_b.x LIMIT 100;

QUERY PLAN
------------------------------------------------------------
--------------------------------------------------
Limit (cost=100.00..146.58 rows=100 width=88)
Output: t_a.id, t_a.x, t_a.y, t_b.id, t_b.x, t_b.y
-> Foreign Scan (cost=100.00..146.58 rows=100 width=88)
Output: t_a.id, t_a.x, t_a.y, t_b.id, t_b.x, t_b.y
Relations: (public.t_a) INNER JOIN (public.t_b)
Remote SQL: SELECT r1.id, r1.x, r1.y, r2.id, r2.x, r2.y FROM
(public.t r1 INNER JOIN public.t r2 ON (((r1.x < r2.x)) AND ((r1.id =
r2.id))))
(6 rows)

That's nice.

On the other hands, I noticed it is not safe to attach LIMIT clause at
the planner stage because root->limit_tuples is declared as double.
Even if LIMIT clause takes a constant value, it is potentially larger
than 2^53 which is the limitation we can represent accurately with
float64 data type but LIMIT clause allows up to 2^63-1.
So, postgres_fdw now attaches LIMIT clause on the remote query on
execution time only.

I think, it's OK.

Here are few comments on latest patch:

1.
make/make check is fine, however I am getting regression failure in
postgres_fdw contrib module (attached regression.diff).
Please investigate and fix.

2.
+             *
+             * MEMO: root->limit_tuples is not attached when query contains
+             * grouping-clause or aggregate functions. So, we don's adjust
+             * rows even if LIMIT <const> is supplied.

Can you please explain why you are not doing this for grouping-clause or
aggregate functions.

3.
Typo:

don's => don't

Rest of the changes look good to me.

Thanks

Thanks,
----
PG-Strom Project / NEC OSS Promotion Center
KaiGai Kohei <kaigai@ak.jp.nec.com>

-----Original Message-----
From: Robert Haas [mailto:robertmhaas@gmail.com]
Sent: Thursday, November 10, 2016 3:08 AM
To: Kaigai Kouhei(海外 浩平) <kaigai@ak.jp.nec.com>
Cc: pgsql-hackers@postgresql.org; Jeevan Chalke
<jeevan.chalke@enterprisedb.com>; Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp>; Andres Freund <andres@anarazel.de>
Subject: ##freemail## Re: PassDownLimitBound for ForeignScan/CustomScan
[take-2]

On Mon, Oct 31, 2016 at 10:20 AM, Kouhei Kaigai <kaigai@ak.jp.nec.com>
wrote:

As an example, I enhanced postgres_fdw to understand the ps_numTuples
if it is set. If and when remote ORDER BY is pushed down, the latest
code tries to sort the entire remote table because it does not know
how many rows to be returned. Thus, it took larger execution time.
On the other hands, the patched one runs the remote query with LIMIT
clause according to the ps_numTuples; which is informed by the Limit
node on top of the ForeignScan node.

So there are two cases here. If the user says LIMIT 12, we could in

theory

know that at planner time and optimize accordingly. If the user says

LIMIT

twelve(), however, we will need to wait until execution time unless

twelve()

happens to be capable of being simplified to a constant by the planner.

Therefore, it's possible to imagine having two mechanisms here. In the
simple case where the LIMIT and OFFSET values are constants, we could
implement a system to get hold of that information during planning and
use it for whatever we like. In addition, we can have an
execution-time system that optimizes based on values available at

execution

(regardless of whether those values were also available during planning).
Those are, basically, two separate things, and this patch has enough to
do just focusing on one of them.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL
Company

--
Jeevan Chalke
Principal Software Engineer, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company

Attachments:

regression.diffsapplication/octet-stream; name=regression.diffsDownload
*** /home/jeevan/work/pg_master/contrib/postgres_fdw/expected/postgres_fdw.out	2016-11-07 18:03:33.373390410 +0530
--- /home/jeevan/work/pg_master/contrib/postgres_fdw/results/postgres_fdw.out	2016-11-24 15:42:35.477445111 +0530
***************
*** 975,989 ****
  -- join two tables
  EXPLAIN (VERBOSE, COSTS OFF)
  SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10;
!                                                                                         QUERY PLAN                                                                                        
! ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
   Limit
     Output: t1.c1, t2.c1, t1.c3
!    ->  Foreign Scan
           Output: t1.c1, t2.c1, t1.c3
!          Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
!          Remote SQL: SELECT r1."C 1", r1.c3, r2."C 1" FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1")))) ORDER BY r1.c3 ASC NULLS LAST, r1."C 1" ASC NULLS LAST
! (6 rows)
  
  SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10;
   c1  | c1  
--- 975,992 ----
  -- join two tables
  EXPLAIN (VERBOSE, COSTS OFF)
  SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10;
!                                                                QUERY PLAN                                                                
! -----------------------------------------------------------------------------------------------------------------------------------------
   Limit
     Output: t1.c1, t2.c1, t1.c3
!    ->  Sort
           Output: t1.c1, t2.c1, t1.c3
!          Sort Key: t1.c3, t1.c1
!          ->  Foreign Scan
!                Output: t1.c1, t2.c1, t1.c3
!                Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
!                Remote SQL: SELECT r1."C 1", r1.c3, r2."C 1" FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1"))))
! (9 rows)
  
  SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10;
   c1  | c1  
***************
*** 1477,1494 ****
  -- full outer join + WHERE clause, only matched rows
  EXPLAIN (VERBOSE, COSTS OFF)
  SELECT t1.c1, t2.c1 FROM ft4 t1 FULL JOIN ft5 t2 ON (t1.c1 = t2.c1) WHERE (t1.c1 = t2.c1 OR t1.c1 IS NULL) ORDER BY t1.c1, t2.c1 OFFSET 10 LIMIT 10;
!                                                                             QUERY PLAN                                                                            
! ------------------------------------------------------------------------------------------------------------------------------------------------------------------
   Limit
     Output: t1.c1, t2.c1
!    ->  Sort
           Output: t1.c1, t2.c1
!          Sort Key: t1.c1, t2.c1
!          ->  Foreign Scan
!                Output: t1.c1, t2.c1
!                Relations: (public.ft4 t1) FULL JOIN (public.ft5 t2)
!                Remote SQL: SELECT r1.c1, r2.c1 FROM ("S 1"."T 3" r1 FULL JOIN "S 1"."T 4" r2 ON (((r1.c1 = r2.c1)))) WHERE (((r1.c1 = r2.c1) OR (r1.c1 IS NULL)))
! (9 rows)
  
  SELECT t1.c1, t2.c1 FROM ft4 t1 FULL JOIN ft5 t2 ON (t1.c1 = t2.c1) WHERE (t1.c1 = t2.c1 OR t1.c1 IS NULL) ORDER BY t1.c1, t2.c1 OFFSET 10 LIMIT 10;
   c1 | c1 
--- 1480,1494 ----
  -- full outer join + WHERE clause, only matched rows
  EXPLAIN (VERBOSE, COSTS OFF)
  SELECT t1.c1, t2.c1 FROM ft4 t1 FULL JOIN ft5 t2 ON (t1.c1 = t2.c1) WHERE (t1.c1 = t2.c1 OR t1.c1 IS NULL) ORDER BY t1.c1, t2.c1 OFFSET 10 LIMIT 10;
!                                                                                                    QUERY PLAN                                                                                                   
! ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
   Limit
     Output: t1.c1, t2.c1
!    ->  Foreign Scan
           Output: t1.c1, t2.c1
!          Relations: (public.ft4 t1) FULL JOIN (public.ft5 t2)
!          Remote SQL: SELECT r1.c1, r2.c1 FROM ("S 1"."T 3" r1 FULL JOIN "S 1"."T 4" r2 ON (((r1.c1 = r2.c1)))) WHERE (((r1.c1 = r2.c1) OR (r1.c1 IS NULL))) ORDER BY r1.c1 ASC NULLS LAST, r2.c1 ASC NULLS LAST
! (6 rows)
  
  SELECT t1.c1, t2.c1 FROM ft4 t1 FULL JOIN ft5 t2 ON (t1.c1 = t2.c1) WHERE (t1.c1 = t2.c1 OR t1.c1 IS NULL) ORDER BY t1.c1, t2.c1 OFFSET 10 LIMIT 10;
   c1 | c1 
***************
*** 1509,1540 ****
  -- tests whole-row reference for row marks
  EXPLAIN (VERBOSE, COSTS OFF)
  SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10 FOR UPDATE OF t1;
!                                                                                                                                                                                                                QUERY PLAN                                                                                                                                                                                                                
! -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
   Limit
     Output: t1.c1, t2.c1, t1.c3, t1.*, t2.*
     ->  LockRows
           Output: t1.c1, t2.c1, t1.c3, t1.*, t2.*
!          ->  Foreign Scan
                 Output: t1.c1, t2.c1, t1.c3, t1.*, t2.*
!                Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
!                Remote SQL: SELECT r1."C 1", r1.c3, CASE WHEN (r1.*)::text IS NOT NULL THEN ROW(r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c7, r1.c8) END, r2."C 1", CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8) END FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1")))) ORDER BY r1.c3 ASC NULLS LAST, r1."C 1" ASC NULLS LAST FOR UPDATE OF r1
!                ->  Merge Join
!                      Output: t1.c1, t1.c3, t1.*, t2.c1, t2.*
!                      Merge Cond: (t1.c1 = t2.c1)
!                      ->  Sort
!                            Output: t1.c1, t1.c3, t1.*
!                            Sort Key: t1.c1
!                            ->  Foreign Scan on public.ft1 t1
                                   Output: t1.c1, t1.c3, t1.*
!                                  Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" FOR UPDATE
!                      ->  Sort
!                            Output: t2.c1, t2.*
!                            Sort Key: t2.c1
!                            ->  Foreign Scan on public.ft2 t2
                                   Output: t2.c1, t2.*
!                                  Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1"
! (23 rows)
  
  SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10 FOR UPDATE OF t1;
   c1  | c1  
--- 1509,1543 ----
  -- tests whole-row reference for row marks
  EXPLAIN (VERBOSE, COSTS OFF)
  SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10 FOR UPDATE OF t1;
!                                                                                                                                                                                        QUERY PLAN                                                                                                                                                                                       
! ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
   Limit
     Output: t1.c1, t2.c1, t1.c3, t1.*, t2.*
     ->  LockRows
           Output: t1.c1, t2.c1, t1.c3, t1.*, t2.*
!          ->  Sort
                 Output: t1.c1, t2.c1, t1.c3, t1.*, t2.*
!                Sort Key: t1.c3, t1.c1
!                ->  Foreign Scan
!                      Output: t1.c1, t2.c1, t1.c3, t1.*, t2.*
!                      Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
!                      Remote SQL: SELECT r1."C 1", r1.c3, CASE WHEN (r1.*)::text IS NOT NULL THEN ROW(r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c7, r1.c8) END, r2."C 1", CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8) END FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1")))) FOR UPDATE OF r1
!                      ->  Merge Join
!                            Output: t1.c1, t1.c3, t1.*, t2.c1, t2.*
!                            Merge Cond: (t1.c1 = t2.c1)
!                            ->  Sort
                                   Output: t1.c1, t1.c3, t1.*
!                                  Sort Key: t1.c1
!                                  ->  Foreign Scan on public.ft1 t1
!                                        Output: t1.c1, t1.c3, t1.*
!                                        Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" FOR UPDATE
!                            ->  Sort
                                   Output: t2.c1, t2.*
!                                  Sort Key: t2.c1
!                                  ->  Foreign Scan on public.ft2 t2
!                                        Output: t2.c1, t2.*
!                                        Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1"
! (26 rows)
  
  SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10 FOR UPDATE OF t1;
   c1  | c1  
***************
*** 1553,1584 ****
  
  EXPLAIN (VERBOSE, COSTS OFF)
  SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10 FOR UPDATE;
!                                                                                                                                                                                                                         QUERY PLAN                                                                                                                                                                                                                        
! ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
   Limit
     Output: t1.c1, t2.c1, t1.c3, t1.*, t2.*
     ->  LockRows
           Output: t1.c1, t2.c1, t1.c3, t1.*, t2.*
!          ->  Foreign Scan
                 Output: t1.c1, t2.c1, t1.c3, t1.*, t2.*
!                Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
!                Remote SQL: SELECT r1."C 1", r1.c3, CASE WHEN (r1.*)::text IS NOT NULL THEN ROW(r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c7, r1.c8) END, r2."C 1", CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8) END FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1")))) ORDER BY r1.c3 ASC NULLS LAST, r1."C 1" ASC NULLS LAST FOR UPDATE OF r1 FOR UPDATE OF r2
!                ->  Merge Join
!                      Output: t1.c1, t1.c3, t1.*, t2.c1, t2.*
!                      Merge Cond: (t1.c1 = t2.c1)
!                      ->  Sort
!                            Output: t1.c1, t1.c3, t1.*
!                            Sort Key: t1.c1
!                            ->  Foreign Scan on public.ft1 t1
                                   Output: t1.c1, t1.c3, t1.*
!                                  Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" FOR UPDATE
!                      ->  Sort
!                            Output: t2.c1, t2.*
!                            Sort Key: t2.c1
!                            ->  Foreign Scan on public.ft2 t2
                                   Output: t2.c1, t2.*
!                                  Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" FOR UPDATE
! (23 rows)
  
  SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10 FOR UPDATE;
   c1  | c1  
--- 1556,1590 ----
  
  EXPLAIN (VERBOSE, COSTS OFF)
  SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10 FOR UPDATE;
!                                                                                                                                                                                                QUERY PLAN                                                                                                                                                                                                
! ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
   Limit
     Output: t1.c1, t2.c1, t1.c3, t1.*, t2.*
     ->  LockRows
           Output: t1.c1, t2.c1, t1.c3, t1.*, t2.*
!          ->  Sort
                 Output: t1.c1, t2.c1, t1.c3, t1.*, t2.*
!                Sort Key: t1.c3, t1.c1
!                ->  Foreign Scan
!                      Output: t1.c1, t2.c1, t1.c3, t1.*, t2.*
!                      Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
!                      Remote SQL: SELECT r1."C 1", r1.c3, CASE WHEN (r1.*)::text IS NOT NULL THEN ROW(r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c7, r1.c8) END, r2."C 1", CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8) END FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1")))) FOR UPDATE OF r1 FOR UPDATE OF r2
!                      ->  Merge Join
!                            Output: t1.c1, t1.c3, t1.*, t2.c1, t2.*
!                            Merge Cond: (t1.c1 = t2.c1)
!                            ->  Sort
                                   Output: t1.c1, t1.c3, t1.*
!                                  Sort Key: t1.c1
!                                  ->  Foreign Scan on public.ft1 t1
!                                        Output: t1.c1, t1.c3, t1.*
!                                        Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" FOR UPDATE
!                            ->  Sort
                                   Output: t2.c1, t2.*
!                                  Sort Key: t2.c1
!                                  ->  Foreign Scan on public.ft2 t2
!                                        Output: t2.c1, t2.*
!                                        Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" FOR UPDATE
! (26 rows)
  
  SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10 FOR UPDATE;
   c1  | c1  
***************
*** 1598,1629 ****
  -- join two tables with FOR SHARE clause
  EXPLAIN (VERBOSE, COSTS OFF)
  SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10 FOR SHARE OF t1;
!                                                                                                                                                                                                                QUERY PLAN                                                                                                                                                                                                               
! ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
   Limit
     Output: t1.c1, t2.c1, t1.c3, t1.*, t2.*
     ->  LockRows
           Output: t1.c1, t2.c1, t1.c3, t1.*, t2.*
!          ->  Foreign Scan
                 Output: t1.c1, t2.c1, t1.c3, t1.*, t2.*
!                Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
!                Remote SQL: SELECT r1."C 1", r1.c3, CASE WHEN (r1.*)::text IS NOT NULL THEN ROW(r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c7, r1.c8) END, r2."C 1", CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8) END FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1")))) ORDER BY r1.c3 ASC NULLS LAST, r1."C 1" ASC NULLS LAST FOR SHARE OF r1
!                ->  Merge Join
!                      Output: t1.c1, t1.c3, t1.*, t2.c1, t2.*
!                      Merge Cond: (t1.c1 = t2.c1)
!                      ->  Sort
!                            Output: t1.c1, t1.c3, t1.*
!                            Sort Key: t1.c1
!                            ->  Foreign Scan on public.ft1 t1
                                   Output: t1.c1, t1.c3, t1.*
!                                  Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" FOR SHARE
!                      ->  Sort
!                            Output: t2.c1, t2.*
!                            Sort Key: t2.c1
!                            ->  Foreign Scan on public.ft2 t2
                                   Output: t2.c1, t2.*
!                                  Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1"
! (23 rows)
  
  SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10 FOR SHARE OF t1;
   c1  | c1  
--- 1604,1638 ----
  -- join two tables with FOR SHARE clause
  EXPLAIN (VERBOSE, COSTS OFF)
  SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10 FOR SHARE OF t1;
!                                                                                                                                                                                       QUERY PLAN                                                                                                                                                                                       
! ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
   Limit
     Output: t1.c1, t2.c1, t1.c3, t1.*, t2.*
     ->  LockRows
           Output: t1.c1, t2.c1, t1.c3, t1.*, t2.*
!          ->  Sort
                 Output: t1.c1, t2.c1, t1.c3, t1.*, t2.*
!                Sort Key: t1.c3, t1.c1
!                ->  Foreign Scan
!                      Output: t1.c1, t2.c1, t1.c3, t1.*, t2.*
!                      Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
!                      Remote SQL: SELECT r1."C 1", r1.c3, CASE WHEN (r1.*)::text IS NOT NULL THEN ROW(r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c7, r1.c8) END, r2."C 1", CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8) END FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1")))) FOR SHARE OF r1
!                      ->  Merge Join
!                            Output: t1.c1, t1.c3, t1.*, t2.c1, t2.*
!                            Merge Cond: (t1.c1 = t2.c1)
!                            ->  Sort
                                   Output: t1.c1, t1.c3, t1.*
!                                  Sort Key: t1.c1
!                                  ->  Foreign Scan on public.ft1 t1
!                                        Output: t1.c1, t1.c3, t1.*
!                                        Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" FOR SHARE
!                            ->  Sort
                                   Output: t2.c1, t2.*
!                                  Sort Key: t2.c1
!                                  ->  Foreign Scan on public.ft2 t2
!                                        Output: t2.c1, t2.*
!                                        Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1"
! (26 rows)
  
  SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10 FOR SHARE OF t1;
   c1  | c1  
***************
*** 1642,1673 ****
  
  EXPLAIN (VERBOSE, COSTS OFF)
  SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10 FOR SHARE;
!                                                                                                                                                                                                                        QUERY PLAN                                                                                                                                                                                                                       
! --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
   Limit
     Output: t1.c1, t2.c1, t1.c3, t1.*, t2.*
     ->  LockRows
           Output: t1.c1, t2.c1, t1.c3, t1.*, t2.*
!          ->  Foreign Scan
                 Output: t1.c1, t2.c1, t1.c3, t1.*, t2.*
!                Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
!                Remote SQL: SELECT r1."C 1", r1.c3, CASE WHEN (r1.*)::text IS NOT NULL THEN ROW(r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c7, r1.c8) END, r2."C 1", CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8) END FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1")))) ORDER BY r1.c3 ASC NULLS LAST, r1."C 1" ASC NULLS LAST FOR SHARE OF r1 FOR SHARE OF r2
!                ->  Merge Join
!                      Output: t1.c1, t1.c3, t1.*, t2.c1, t2.*
!                      Merge Cond: (t1.c1 = t2.c1)
!                      ->  Sort
!                            Output: t1.c1, t1.c3, t1.*
!                            Sort Key: t1.c1
!                            ->  Foreign Scan on public.ft1 t1
                                   Output: t1.c1, t1.c3, t1.*
!                                  Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" FOR SHARE
!                      ->  Sort
!                            Output: t2.c1, t2.*
!                            Sort Key: t2.c1
!                            ->  Foreign Scan on public.ft2 t2
                                   Output: t2.c1, t2.*
!                                  Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" FOR SHARE
! (23 rows)
  
  SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10 FOR SHARE;
   c1  | c1  
--- 1651,1685 ----
  
  EXPLAIN (VERBOSE, COSTS OFF)
  SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10 FOR SHARE;
!                                                                                                                                                                                               QUERY PLAN                                                                                                                                                                                               
! -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
   Limit
     Output: t1.c1, t2.c1, t1.c3, t1.*, t2.*
     ->  LockRows
           Output: t1.c1, t2.c1, t1.c3, t1.*, t2.*
!          ->  Sort
                 Output: t1.c1, t2.c1, t1.c3, t1.*, t2.*
!                Sort Key: t1.c3, t1.c1
!                ->  Foreign Scan
!                      Output: t1.c1, t2.c1, t1.c3, t1.*, t2.*
!                      Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
!                      Remote SQL: SELECT r1."C 1", r1.c3, CASE WHEN (r1.*)::text IS NOT NULL THEN ROW(r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c7, r1.c8) END, r2."C 1", CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8) END FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1")))) FOR SHARE OF r1 FOR SHARE OF r2
!                      ->  Merge Join
!                            Output: t1.c1, t1.c3, t1.*, t2.c1, t2.*
!                            Merge Cond: (t1.c1 = t2.c1)
!                            ->  Sort
                                   Output: t1.c1, t1.c3, t1.*
!                                  Sort Key: t1.c1
!                                  ->  Foreign Scan on public.ft1 t1
!                                        Output: t1.c1, t1.c3, t1.*
!                                        Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" FOR SHARE
!                            ->  Sort
                                   Output: t2.c1, t2.*
!                                  Sort Key: t2.c1
!                                  ->  Foreign Scan on public.ft2 t2
!                                        Output: t2.c1, t2.*
!                                        Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" FOR SHARE
! (26 rows)
  
  SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10 FOR SHARE;
   c1  | c1  
***************
*** 1721,1735 ****
  -- ctid with whole-row reference
  EXPLAIN (VERBOSE, COSTS OFF)
  SELECT t1.ctid, t1, t2, t1.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10;
!                                                                                                                                                                                                    QUERY PLAN                                                                                                                                                                                                    
! -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
   Limit
     Output: t1.ctid, t1.*, t2.*, t1.c1, t1.c3
!    ->  Foreign Scan
           Output: t1.ctid, t1.*, t2.*, t1.c1, t1.c3
!          Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
!          Remote SQL: SELECT r1.ctid, CASE WHEN (r1.*)::text IS NOT NULL THEN ROW(r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c7, r1.c8) END, r1."C 1", r1.c3, CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8) END FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1")))) ORDER BY r1.c3 ASC NULLS LAST, r1."C 1" ASC NULLS LAST
! (6 rows)
  
  -- SEMI JOIN, not pushed down
  EXPLAIN (VERBOSE, COSTS OFF)
--- 1733,1750 ----
  -- ctid with whole-row reference
  EXPLAIN (VERBOSE, COSTS OFF)
  SELECT t1.ctid, t1, t2, t1.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10;
!                                                                                                                                                                            QUERY PLAN                                                                                                                                                                           
! ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
   Limit
     Output: t1.ctid, t1.*, t2.*, t1.c1, t1.c3
!    ->  Sort
           Output: t1.ctid, t1.*, t2.*, t1.c1, t1.c3
!          Sort Key: t1.c3, t1.c1
!          ->  Foreign Scan
!                Output: t1.ctid, t1.*, t2.*, t1.c1, t1.c3
!                Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
!                Remote SQL: SELECT r1.ctid, CASE WHEN (r1.*)::text IS NOT NULL THEN ROW(r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c7, r1.c8) END, r1."C 1", r1.c3, CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8) END FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1"))))
! (9 rows)
  
  -- SEMI JOIN, not pushed down
  EXPLAIN (VERBOSE, COSTS OFF)
***************
*** 1804,1827 ****
  -- CROSS JOIN, not pushed down
  EXPLAIN (VERBOSE, COSTS OFF)
  SELECT t1.c1, t2.c1 FROM ft1 t1 CROSS JOIN ft2 t2 ORDER BY t1.c1, t2.c1 OFFSET 100 LIMIT 10;
!                              QUERY PLAN                              
! ---------------------------------------------------------------------
   Limit
     Output: t1.c1, t2.c1
     ->  Sort
           Output: t1.c1, t2.c1
           Sort Key: t1.c1, t2.c1
!          ->  Nested Loop
                 Output: t1.c1, t2.c1
!                ->  Foreign Scan on public.ft1 t1
!                      Output: t1.c1
!                      Remote SQL: SELECT "C 1" FROM "S 1"."T 1"
!                ->  Materialize
!                      Output: t2.c1
!                      ->  Foreign Scan on public.ft2 t2
!                            Output: t2.c1
!                            Remote SQL: SELECT "C 1" FROM "S 1"."T 1"
! (15 rows)
  
  SELECT t1.c1, t2.c1 FROM ft1 t1 CROSS JOIN ft2 t2 ORDER BY t1.c1, t2.c1 OFFSET 100 LIMIT 10;
   c1 | c1  
--- 1819,1836 ----
  -- CROSS JOIN, not pushed down
  EXPLAIN (VERBOSE, COSTS OFF)
  SELECT t1.c1, t2.c1 FROM ft1 t1 CROSS JOIN ft2 t2 ORDER BY t1.c1, t2.c1 OFFSET 100 LIMIT 10;
!                                                   QUERY PLAN                                                   
! ---------------------------------------------------------------------------------------------------------------
   Limit
     Output: t1.c1, t2.c1
     ->  Sort
           Output: t1.c1, t2.c1
           Sort Key: t1.c1, t2.c1
!          ->  Foreign Scan
                 Output: t1.c1, t2.c1
!                Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
!                Remote SQL: SELECT r1."C 1", r2."C 1" FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (TRUE))
! (9 rows)
  
  SELECT t1.c1, t2.c1 FROM ft1 t1 CROSS JOIN ft2 t2 ORDER BY t1.c1, t2.c1 OFFSET 100 LIMIT 10;
   c1 | c1  
***************
*** 5476,5489 ****
  
  -- ORDER BY DESC NULLS FIRST options
  EXPLAIN (VERBOSE, COSTS OFF) SELECT * FROM ft1 ORDER BY c6 DESC NULLS FIRST, c1 OFFSET 15 LIMIT 10;
!                                                             QUERY PLAN                                                            
! ----------------------------------------------------------------------------------------------------------------------------------
   Limit
     Output: c1, c2, c3, c4, c5, c6, c7, c8
!    ->  Foreign Scan on public.ft1
           Output: c1, c2, c3, c4, c5, c6, c7, c8
!          Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" ORDER BY c6 DESC NULLS FIRST, "C 1" ASC NULLS LAST
! (5 rows)
  
  SELECT * FROM ft1 ORDER BY c6 DESC NULLS FIRST, c1 OFFSET 15 LIMIT 10;
    c1  | c2  |       c3        |              c4              |            c5            | c6 |     c7     | c8  
--- 5485,5501 ----
  
  -- ORDER BY DESC NULLS FIRST options
  EXPLAIN (VERBOSE, COSTS OFF) SELECT * FROM ft1 ORDER BY c6 DESC NULLS FIRST, c1 OFFSET 15 LIMIT 10;
!                                      QUERY PLAN                                      
! -------------------------------------------------------------------------------------
   Limit
     Output: c1, c2, c3, c4, c5, c6, c7, c8
!    ->  Sort
           Output: c1, c2, c3, c4, c5, c6, c7, c8
!          Sort Key: ft1.c6 DESC, ft1.c1
!          ->  Foreign Scan on public.ft1
!                Output: c1, c2, c3, c4, c5, c6, c7, c8
!                Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1"
! (8 rows)
  
  SELECT * FROM ft1 ORDER BY c6 DESC NULLS FIRST, c1 OFFSET 15 LIMIT 10;
    c1  | c2  |       c3        |              c4              |            c5            | c6 |     c7     | c8  
***************
*** 5502,5515 ****
  
  -- ORDER BY ASC NULLS FIRST options
  EXPLAIN (VERBOSE, COSTS OFF) SELECT * FROM ft1 ORDER BY c6 ASC NULLS FIRST, c1 OFFSET 15 LIMIT 10;
!                                                            QUERY PLAN                                                            
! ---------------------------------------------------------------------------------------------------------------------------------
   Limit
     Output: c1, c2, c3, c4, c5, c6, c7, c8
!    ->  Foreign Scan on public.ft1
           Output: c1, c2, c3, c4, c5, c6, c7, c8
!          Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" ORDER BY c6 ASC NULLS FIRST, "C 1" ASC NULLS LAST
! (5 rows)
  
  SELECT * FROM ft1 ORDER BY c6 ASC NULLS FIRST, c1 OFFSET 15 LIMIT 10;
    c1  | c2  |        c3         |              c4              |            c5            |  c6  |     c7     | c8  
--- 5514,5530 ----
  
  -- ORDER BY ASC NULLS FIRST options
  EXPLAIN (VERBOSE, COSTS OFF) SELECT * FROM ft1 ORDER BY c6 ASC NULLS FIRST, c1 OFFSET 15 LIMIT 10;
!                                      QUERY PLAN                                      
! -------------------------------------------------------------------------------------
   Limit
     Output: c1, c2, c3, c4, c5, c6, c7, c8
!    ->  Sort
           Output: c1, c2, c3, c4, c5, c6, c7, c8
!          Sort Key: ft1.c6 NULLS FIRST, ft1.c1
!          ->  Foreign Scan on public.ft1
!                Output: c1, c2, c3, c4, c5, c6, c7, c8
!                Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1"
! (8 rows)
  
  SELECT * FROM ft1 ORDER BY c6 ASC NULLS FIRST, c1 OFFSET 15 LIMIT 10;
    c1  | c2  |        c3         |              c4              |            c5            |  c6  |     c7     | c8  

======================================================================

#2Kouhei Kaigai
kaigai@ak.jp.nec.com
In reply to: Jeevan Chalke (#1)
1 attachment(s)
Re: PassDownLimitBound for ForeignScan/CustomScan [take-2]

Hello,

Sorry for my late response.
The attached patch reflects your comments.

Here are few comments on latest patch:

1.
make/make check is fine, however I am getting regression failure in
postgres_fdw contrib module (attached regression.diff).
Please investigate and fix.

It was an incorrect interaction when postgres_fdw tries to push down
sorting to the remote side. We cannot attach LIMIT clause on the plain
scan path across SORT, however, the previous version estimated the cost
for the plain scan with LIMIT clause even if local sorting is needed.
If remote scan may return just 10 rows, estimated cost of the local sort
is very lightweight, thus, this unreasonable path was chosen.
(On the other hands, its query execution results were correct because
ps_numTuples is not delivered across Sort node, so ForeignScan eventually
scanned all the remote tuples. It made correct results but not optimal
from the viewpoint of performance.)

The v3 patch estimates the cost with remote LIMIT clause only if supplied
pathkey strictly matches with the final output order of the query, thus,
no local sorting is expected.

Some of the regression test cases still have different plans but due to
the new optimization by remote LIMIT clause.
Without remote LIMIT clause, some of regression test cases preferred
remote-JOIN + local-SORT then local-LIMIT.
Once we have remote-LIMIT option, it allows to discount the cost for
remote-SORT by choice of top-k heap sorting.
It changed the optimizer's decision on some test cases.

Potential one big change is the test case below.

-- CROSS JOIN, not pushed down
EXPLAIN (VERBOSE, COSTS OFF)
SELECT t1.c1, t2.c1 FROM ft1 t1 CROSS JOIN ft2 t2 ORDER BY t1.c1, t2.c1 OFFSET 100 LIMIT 10;

It assumed CROSS JOIN was not pushed down due to the cost for network
traffic, however, remote LIMIT reduced the estimated number of tuples
to be moved. So, all of the CROSS JOIN + ORDER BY + LIMIT became to run
on the remote side.

2.
+             *
+             * MEMO: root->limit_tuples is not attached when query
contains
+             * grouping-clause or aggregate functions. So, we don's adjust
+             * rows even if LIMIT <const> is supplied.

Can you please explain why you are not doing this for grouping-clause or
aggregate functions.

See grouping_planner() at optimizer/plan/planner.c
It puts an invalid value on the root->limit_tuples if query has GROUP BY clause,
so we cannot know number of tuples to be returned even if we have upper limit
actually.

/*
* Figure out whether there's a hard limit on the number of rows that
* query_planner's result subplan needs to return. Even if we know a
* hard limit overall, it doesn't apply if the query has any
* grouping/aggregation operations, or SRFs in the tlist.
*/
if (parse->groupClause ||
parse->groupingSets ||
parse->distinctClause ||
parse->hasAggs ||
parse->hasWindowFuncs ||
parse->hasTargetSRFs ||
root->hasHavingQual)
root->limit_tuples = -1.0;
else
root->limit_tuples = limit_tuples;

3.
Typo:

don's => don't

Fixed,

best regards,
----
PG-Strom Project / NEC OSS Promotion Center
KaiGai Kohei <kaigai@ak.jp.nec.com>

Attachments:

passdown-limit-fdw.v3.patchapplication/octet-stream; name=passdown-limit-fdw.v3.patchDownload
 contrib/postgres_fdw/expected/postgres_fdw.out | 51 +++++++------------
 contrib/postgres_fdw/postgres_fdw.c            | 56 ++++++++++++++++++++-
 src/backend/executor/nodeLimit.c               | 68 ++------------------------
 src/backend/executor/nodeMergeAppend.c         |  6 ++-
 src/backend/executor/nodeResult.c              | 25 ++++++++++
 src/backend/executor/nodeSort.c                | 10 ++++
 src/include/nodes/execnodes.h                  |  3 ++
 7 files changed, 118 insertions(+), 101 deletions(-)

diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 785f520..c3f9433 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -1003,18 +1003,15 @@ SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t
 -- join three tables
 EXPLAIN (VERBOSE, COSTS OFF)
 SELECT t1.c1, t2.c2, t3.c3 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) JOIN ft4 t3 ON (t3.c1 = t1.c1) ORDER BY t1.c3, t1.c1 OFFSET 10 LIMIT 10;
-                                                                                            QUERY PLAN                                                                                             
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+                                                                                                                     QUERY PLAN                                                                                                                     
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  Limit
    Output: t1.c1, t2.c2, t3.c3, t1.c3
-   ->  Sort
+   ->  Foreign Scan
          Output: t1.c1, t2.c2, t3.c3, t1.c3
-         Sort Key: t1.c3, t1.c1
-         ->  Foreign Scan
-               Output: t1.c1, t2.c2, t3.c3, t1.c3
-               Relations: ((public.ft1 t1) INNER JOIN (public.ft2 t2)) INNER JOIN (public.ft4 t3)
-               Remote SQL: SELECT r1."C 1", r1.c3, r2.c2, r4.c3 FROM (("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1")))) INNER JOIN "S 1"."T 3" r4 ON (((r1."C 1" = r4.c1))))
-(9 rows)
+         Relations: ((public.ft1 t1) INNER JOIN (public.ft2 t2)) INNER JOIN (public.ft4 t3)
+         Remote SQL: SELECT r1."C 1", r1.c3, r2.c2, r4.c3 FROM (("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1")))) INNER JOIN "S 1"."T 3" r4 ON (((r1."C 1" = r4.c1)))) ORDER BY r1.c3 ASC NULLS LAST, r1."C 1" ASC NULLS LAST
+(6 rows)
 
 SELECT t1.c1, t2.c2, t3.c3 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) JOIN ft4 t3 ON (t3.c1 = t1.c1) ORDER BY t1.c3, t1.c1 OFFSET 10 LIMIT 10;
  c1 | c2 |   c3   
@@ -1477,18 +1474,15 @@ SELECT t1.c1, t2.c2, t3.c3 FROM ft2 t1 LEFT JOIN ft2 t2 ON (t1.c1 = t2.c1) RIGHT
 -- full outer join + WHERE clause, only matched rows
 EXPLAIN (VERBOSE, COSTS OFF)
 SELECT t1.c1, t2.c1 FROM ft4 t1 FULL JOIN ft5 t2 ON (t1.c1 = t2.c1) WHERE (t1.c1 = t2.c1 OR t1.c1 IS NULL) ORDER BY t1.c1, t2.c1 OFFSET 10 LIMIT 10;
-                                                                            QUERY PLAN                                                                            
-------------------------------------------------------------------------------------------------------------------------------------------------------------------
+                                                                                                   QUERY PLAN                                                                                                   
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  Limit
    Output: t1.c1, t2.c1
-   ->  Sort
+   ->  Foreign Scan
          Output: t1.c1, t2.c1
-         Sort Key: t1.c1, t2.c1
-         ->  Foreign Scan
-               Output: t1.c1, t2.c1
-               Relations: (public.ft4 t1) FULL JOIN (public.ft5 t2)
-               Remote SQL: SELECT r1.c1, r2.c1 FROM ("S 1"."T 3" r1 FULL JOIN "S 1"."T 4" r2 ON (((r1.c1 = r2.c1)))) WHERE (((r1.c1 = r2.c1) OR (r1.c1 IS NULL)))
-(9 rows)
+         Relations: (public.ft4 t1) FULL JOIN (public.ft5 t2)
+         Remote SQL: SELECT r1.c1, r2.c1 FROM ("S 1"."T 3" r1 FULL JOIN "S 1"."T 4" r2 ON (((r1.c1 = r2.c1)))) WHERE (((r1.c1 = r2.c1) OR (r1.c1 IS NULL))) ORDER BY r1.c1 ASC NULLS LAST, r2.c1 ASC NULLS LAST
+(6 rows)
 
 SELECT t1.c1, t2.c1 FROM ft4 t1 FULL JOIN ft5 t2 ON (t1.c1 = t2.c1) WHERE (t1.c1 = t2.c1 OR t1.c1 IS NULL) ORDER BY t1.c1, t2.c1 OFFSET 10 LIMIT 10;
  c1 | c1 
@@ -1804,24 +1798,15 @@ SELECT t1.c1 FROM ft1 t1 WHERE NOT EXISTS (SELECT 1 FROM ft2 t2 WHERE t1.c1 = t2
 -- CROSS JOIN, not pushed down
 EXPLAIN (VERBOSE, COSTS OFF)
 SELECT t1.c1, t2.c1 FROM ft1 t1 CROSS JOIN ft2 t2 ORDER BY t1.c1, t2.c1 OFFSET 100 LIMIT 10;
-                             QUERY PLAN                              
----------------------------------------------------------------------
+                                                                            QUERY PLAN                                                                             
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------
  Limit
    Output: t1.c1, t2.c1
-   ->  Sort
+   ->  Foreign Scan
          Output: t1.c1, t2.c1
-         Sort Key: t1.c1, t2.c1
-         ->  Nested Loop
-               Output: t1.c1, t2.c1
-               ->  Foreign Scan on public.ft1 t1
-                     Output: t1.c1
-                     Remote SQL: SELECT "C 1" FROM "S 1"."T 1"
-               ->  Materialize
-                     Output: t2.c1
-                     ->  Foreign Scan on public.ft2 t2
-                           Output: t2.c1
-                           Remote SQL: SELECT "C 1" FROM "S 1"."T 1"
-(15 rows)
+         Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
+         Remote SQL: SELECT r1."C 1", r2."C 1" FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (TRUE)) ORDER BY r1."C 1" ASC NULLS LAST, r2."C 1" ASC NULLS LAST
+(6 rows)
 
 SELECT t1.c1, t2.c1 FROM ft1 t1 CROSS JOIN ft2 t2 ORDER BY t1.c1, t2.c1 OFFSET 100 LIMIT 10;
  c1 | c1  
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index fbe6929..4631f63 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -2428,6 +2428,17 @@ postgresExplainForeignScan(ForeignScanState *node, ExplainState *es)
 	if (es->verbose)
 	{
 		sql = strVal(list_nth(fdw_private, FdwScanPrivateSelectSql));
+
+		/* LIMIT clause is attached on execution time */
+		if (node->ss.ps.ps_numTuples > 0)
+		{
+			StringInfoData	buf;
+
+			initStringInfo(&buf);
+			appendStringInfo(&buf, "%s LIMIT %lu",
+							 sql, node->ss.ps.ps_numTuples);
+			sql = buf.data;
+		}
 		ExplainPropertyText("Remote SQL", sql, es);
 	}
 }
@@ -2493,6 +2504,7 @@ estimate_path_cost_size(PlannerInfo *root,
 						Cost *p_startup_cost, Cost *p_total_cost)
 {
 	PgFdwRelationInfo *fpinfo = (PgFdwRelationInfo *) foreignrel->fdw_private;
+	double		limit_tuples = -1.0;
 	double		rows;
 	double		retrieved_rows;
 	int			width;
@@ -2501,6 +2513,18 @@ estimate_path_cost_size(PlannerInfo *root,
 	Cost		cpu_per_tuple;
 
 	/*
+	 * LIMIT clause can be pushed down only when this foreign-path does not
+	 * need to join any other base relations, and the supplied path-keys
+	 * strictly match with the final output order.
+	 * Elsewhere, pushdown of LIMIT clause makes incorrect results, thus,
+	 * estimated cost don't assume remote LIMIT execution.
+	 */
+	if (root->limit_tuples >= 0.0 &&
+		equal(root->query_pathkeys, pathkeys) &&
+		bms_equal(root->all_baserels, foreignrel->relids))
+		limit_tuples = root->limit_tuples;
+
+	/*
 	 * If the table or the server is configured to use remote estimates,
 	 * connect to the foreign server and execute EXPLAIN to estimate the
 	 * number of rows selected by the restriction+join clauses.  Otherwise,
@@ -2553,6 +2577,12 @@ estimate_path_cost_size(PlannerInfo *root,
 		deparseSelectStmtForRel(&sql, root, foreignrel, fdw_scan_tlist,
 								remote_conds, pathkeys, &retrieved_attrs,
 								NULL);
+		/*
+		 * Attach LIMIT if this path is top-level and constant value is
+		 * specified.
+		 */
+		if (limit_tuples >= 0.0)
+			appendStringInfo(&sql, " LIMIT %.0f", limit_tuples);
 
 		/* Get the remote estimate */
 		conn = GetConnection(fpinfo->user, false);
@@ -2627,6 +2657,12 @@ estimate_path_cost_size(PlannerInfo *root,
 
 			/* Estimate of number of rows in cross product */
 			nrows = fpinfo_i->rows * fpinfo_o->rows;
+			if (limit_tuples >= 0.0)
+			{
+				nrows = Min(nrows, limit_tuples);
+				rows = Min(rows, limit_tuples);
+			}
+
 			/* Clamp retrieved rows estimate to at most size of cross product */
 			retrieved_rows = Min(retrieved_rows, nrows);
 
@@ -2724,6 +2760,10 @@ estimate_path_cost_size(PlannerInfo *root,
 			/*
 			 * Number of rows expected from foreign server will be same as
 			 * that of number of groups.
+			 *
+			 * MEMO: root->limit_tuples is not attached when query contains
+			 * grouping-clause or aggregate functions. So, we don't adjust
+			 * rows even if LIMIT <const> is supplied.
 			 */
 			rows = retrieved_rows = numGroups;
 
@@ -2754,8 +2794,16 @@ estimate_path_cost_size(PlannerInfo *root,
 		}
 		else
 		{
+			double		nrows = foreignrel->tuples;
+
+			if (limit_tuples >= 0.0)
+			{
+				nrows = Min(nrows, limit_tuples);
+				rows = Min(rows, limit_tuples);
+			}
+
 			/* Clamp retrieved rows estimates to at most foreignrel->tuples. */
-			retrieved_rows = Min(retrieved_rows, foreignrel->tuples);
+			retrieved_rows = Min(retrieved_rows, nrows);
 
 			/*
 			 * Cost as though this were a seqscan, which is pessimistic.  We
@@ -2768,7 +2816,7 @@ estimate_path_cost_size(PlannerInfo *root,
 
 			startup_cost += foreignrel->baserestrictcost.startup;
 			cpu_per_tuple = cpu_tuple_cost + foreignrel->baserestrictcost.per_tuple;
-			run_cost += cpu_per_tuple * foreignrel->tuples;
+			run_cost += cpu_per_tuple * nrows;
 		}
 
 		/*
@@ -2942,6 +2990,10 @@ create_cursor(ForeignScanState *node)
 	appendStringInfo(&buf, "DECLARE c%u CURSOR FOR\n%s",
 					 fsstate->cursor_number, fsstate->query);
 
+	/* Append LIMIT if numTuples were passed-down */
+	if (node->ss.ps.ps_numTuples > 0)
+		appendStringInfo(&buf, " LIMIT %ld", node->ss.ps.ps_numTuples);
+
 	/*
 	 * Notice that we pass NULL for paramTypes, thus forcing the remote server
 	 * to infer types for all parameters.  Since we explicitly cast every
diff --git a/src/backend/executor/nodeLimit.c b/src/backend/executor/nodeLimit.c
index faf32e1..df3fe01 100644
--- a/src/backend/executor/nodeLimit.c
+++ b/src/backend/executor/nodeLimit.c
@@ -26,7 +26,6 @@
 #include "nodes/nodeFuncs.h"
 
 static void recompute_limits(LimitState *node);
-static void pass_down_bound(LimitState *node, PlanState *child_node);
 
 
 /* ----------------------------------------------------------------
@@ -232,6 +231,7 @@ static void
 recompute_limits(LimitState *node)
 {
 	ExprContext *econtext = node->ps.ps_ExprContext;
+	uint64		tuples_needed;
 	Datum		val;
 	bool		isNull;
 
@@ -296,70 +296,8 @@ recompute_limits(LimitState *node)
 	node->lstate = LIMIT_RESCAN;
 
 	/* Notify child node about limit, if useful */
-	pass_down_bound(node, outerPlanState(node));
-}
-
-/*
- * If we have a COUNT, and our input is a Sort node, notify it that it can
- * use bounded sort.  Also, if our input is a MergeAppend, we can apply the
- * same bound to any Sorts that are direct children of the MergeAppend,
- * since the MergeAppend surely need read no more than that many tuples from
- * any one input.  We also have to be prepared to look through a Result,
- * since the planner might stick one atop MergeAppend for projection purposes.
- *
- * This is a bit of a kluge, but we don't have any more-abstract way of
- * communicating between the two nodes; and it doesn't seem worth trying
- * to invent one without some more examples of special communication needs.
- *
- * Note: it is the responsibility of nodeSort.c to react properly to
- * changes of these parameters.  If we ever do redesign this, it'd be a
- * good idea to integrate this signaling with the parameter-change mechanism.
- */
-static void
-pass_down_bound(LimitState *node, PlanState *child_node)
-{
-	if (IsA(child_node, SortState))
-	{
-		SortState  *sortState = (SortState *) child_node;
-		int64		tuples_needed = node->count + node->offset;
-
-		/* negative test checks for overflow in sum */
-		if (node->noCount || tuples_needed < 0)
-		{
-			/* make sure flag gets reset if needed upon rescan */
-			sortState->bounded = false;
-		}
-		else
-		{
-			sortState->bounded = true;
-			sortState->bound = tuples_needed;
-		}
-	}
-	else if (IsA(child_node, MergeAppendState))
-	{
-		MergeAppendState *maState = (MergeAppendState *) child_node;
-		int			i;
-
-		for (i = 0; i < maState->ms_nplans; i++)
-			pass_down_bound(node, maState->mergeplans[i]);
-	}
-	else if (IsA(child_node, ResultState))
-	{
-		/*
-		 * An extra consideration here is that if the Result is projecting a
-		 * targetlist that contains any SRFs, we can't assume that every input
-		 * tuple generates an output tuple, so a Sort underneath might need to
-		 * return more than N tuples to satisfy LIMIT N. So we cannot use
-		 * bounded sort.
-		 *
-		 * If Result supported qual checking, we'd have to punt on seeing a
-		 * qual, too.  Note that having a resconstantqual is not a
-		 * showstopper: if that fails we're not getting any rows at all.
-		 */
-		if (outerPlanState(child_node) &&
-			!expression_returns_set((Node *) child_node->plan->targetlist))
-			pass_down_bound(node, outerPlanState(child_node));
-	}
+	tuples_needed = (node->noCount ? 0 : node->count + node->offset);
+	outerPlanState(node)->ps_numTuples = tuples_needed;
 }
 
 /* ----------------------------------------------------------------
diff --git a/src/backend/executor/nodeMergeAppend.c b/src/backend/executor/nodeMergeAppend.c
index e271927..91707c6 100644
--- a/src/backend/executor/nodeMergeAppend.c
+++ b/src/backend/executor/nodeMergeAppend.c
@@ -174,10 +174,14 @@ ExecMergeAppend(MergeAppendState *node)
 		/*
 		 * First time through: pull the first tuple from each subplan, and set
 		 * up the heap.
+		 * Also, pass down the required number of tuples
 		 */
 		for (i = 0; i < node->ms_nplans; i++)
 		{
-			node->ms_slots[i] = ExecProcNode(node->mergeplans[i]);
+			PlanState  *pstate = node->mergeplans[i];
+
+			pstate->ps_numTuples = node->ps.ps_numTuples;
+			node->ms_slots[i] = ExecProcNode(pstate);
 			if (!TupIsNull(node->ms_slots[i]))
 				binaryheap_add_unordered(node->ms_heap, Int32GetDatum(i));
 		}
diff --git a/src/backend/executor/nodeResult.c b/src/backend/executor/nodeResult.c
index 4007b76..9e83eb2 100644
--- a/src/backend/executor/nodeResult.c
+++ b/src/backend/executor/nodeResult.c
@@ -47,6 +47,7 @@
 
 #include "executor/executor.h"
 #include "executor/nodeResult.h"
+#include "nodes/nodeFuncs.h"
 #include "utils/memutils.h"
 
 
@@ -75,6 +76,28 @@ ExecResult(ResultState *node)
 	econtext = node->ps.ps_ExprContext;
 
 	/*
+	 * Pass down the number of required tuples by the upper node
+	 *
+	 * An extra consideration here is that if the Result is projecting a
+	 * targetlist that contains any SRFs, we can't assume that every input
+	 * tuple generates an output tuple, so a Sort underneath might need to
+	 * return more than N tuples to satisfy LIMIT N. So we cannot use
+	 * bounded sort.
+	 *
+	 * If Result supported qual checking, we'd have to punt on seeing a
+	 * qual, too.  Note that having a resconstantqual is not a
+	 * showstopper: if that fails we're not getting any rows at all.
+	 */
+	if (!node->rs_started)
+	{
+		if (outerPlanState(node) &&
+			!expression_returns_set((Node *) node->ps.plan->targetlist))
+			outerPlanState(node)->ps_numTuples = node->ps.ps_numTuples;
+
+		node->rs_started = true;
+	}
+
+	/*
 	 * check constant qualifications like (2 > 1), if not already done
 	 */
 	if (node->rs_checkqual)
@@ -218,6 +241,7 @@ ExecInitResult(Result *node, EState *estate, int eflags)
 	resstate->ps.plan = (Plan *) node;
 	resstate->ps.state = estate;
 
+	resstate->rs_started = false;
 	resstate->rs_done = false;
 	resstate->rs_checkqual = (node->resconstantqual == NULL) ? false : true;
 
@@ -294,6 +318,7 @@ ExecEndResult(ResultState *node)
 void
 ExecReScanResult(ResultState *node)
 {
+	node->rs_started = false;
 	node->rs_done = false;
 	node->ps.ps_TupFromTlist = false;
 	node->rs_checkqual = (node->resconstantqual == NULL) ? false : true;
diff --git a/src/backend/executor/nodeSort.c b/src/backend/executor/nodeSort.c
index a34dcc5..6ec7abb 100644
--- a/src/backend/executor/nodeSort.c
+++ b/src/backend/executor/nodeSort.c
@@ -66,6 +66,16 @@ ExecSort(SortState *node)
 
 		SO1_printf("ExecSort: %s\n",
 				   "sorting subplan");
+		/*
+		 * Check bounds according to the required number of tuples
+		 */
+		if (node->ss.ps.ps_numTuples == 0)
+			node->bounded = false;
+		else
+		{
+			node->bounded = true;
+			node->bound = node->ss.ps.ps_numTuples;
+		}
 
 		/*
 		 * Want to scan subplan in the forward direction while creating the
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 8004d85..9b3aed8 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -1066,6 +1066,8 @@ typedef struct PlanState
 	ProjectionInfo *ps_ProjInfo;	/* info for doing tuple projection */
 	bool		ps_TupFromTlist;/* state flag for processing set-valued
 								 * functions in targetlist */
+	uint64		ps_numTuples;	/* number of tuples required by the upper
+								 * node if any. 0 means all the tuples */
 } PlanState;
 
 /* ----------------
@@ -1114,6 +1116,7 @@ typedef struct ResultState
 {
 	PlanState	ps;				/* its first field is NodeTag */
 	ExprState  *resconstantqual;
+	bool		rs_started;		/* are we already called? */
 	bool		rs_done;		/* are we done? */
 	bool		rs_checkqual;	/* do we need to check the qual? */
 } ResultState;
#3Kohei KaiGai
kaigai@kaigai.gr.jp
In reply to: Kouhei Kaigai (#2)
1 attachment(s)
Re: PassDownLimitBound for ForeignScan/CustomScan [take-2]

Oops, I oversight this patch was marked as "returned with feedback",
not "moved to the next CF".

Its status has not been changed since the last update. (Code was
revised according to the last
comment by Jeevan, but CF-Nov was time up at that time.)

How do I handle the patch?

2016-12-05 16:49 GMT+09:00 Kouhei Kaigai <kaigai@ak.jp.nec.com>:

Hello,

Sorry for my late response.
The attached patch reflects your comments.

Here are few comments on latest patch:

1.
make/make check is fine, however I am getting regression failure in
postgres_fdw contrib module (attached regression.diff).
Please investigate and fix.

It was an incorrect interaction when postgres_fdw tries to push down
sorting to the remote side. We cannot attach LIMIT clause on the plain
scan path across SORT, however, the previous version estimated the cost
for the plain scan with LIMIT clause even if local sorting is needed.
If remote scan may return just 10 rows, estimated cost of the local sort
is very lightweight, thus, this unreasonable path was chosen.
(On the other hands, its query execution results were correct because
ps_numTuples is not delivered across Sort node, so ForeignScan eventually
scanned all the remote tuples. It made correct results but not optimal
from the viewpoint of performance.)

The v3 patch estimates the cost with remote LIMIT clause only if supplied
pathkey strictly matches with the final output order of the query, thus,
no local sorting is expected.

Some of the regression test cases still have different plans but due to
the new optimization by remote LIMIT clause.
Without remote LIMIT clause, some of regression test cases preferred
remote-JOIN + local-SORT then local-LIMIT.
Once we have remote-LIMIT option, it allows to discount the cost for
remote-SORT by choice of top-k heap sorting.
It changed the optimizer's decision on some test cases.

Potential one big change is the test case below.

-- CROSS JOIN, not pushed down
EXPLAIN (VERBOSE, COSTS OFF)
SELECT t1.c1, t2.c1 FROM ft1 t1 CROSS JOIN ft2 t2 ORDER BY t1.c1, t2.c1 OFFSET 100 LIMIT 10;

It assumed CROSS JOIN was not pushed down due to the cost for network
traffic, however, remote LIMIT reduced the estimated number of tuples
to be moved. So, all of the CROSS JOIN + ORDER BY + LIMIT became to run
on the remote side.

2.
+             *
+             * MEMO: root->limit_tuples is not attached when query
contains
+             * grouping-clause or aggregate functions. So, we don's adjust
+             * rows even if LIMIT <const> is supplied.

Can you please explain why you are not doing this for grouping-clause or
aggregate functions.

See grouping_planner() at optimizer/plan/planner.c
It puts an invalid value on the root->limit_tuples if query has GROUP BY clause,
so we cannot know number of tuples to be returned even if we have upper limit
actually.

/*
* Figure out whether there's a hard limit on the number of rows that
* query_planner's result subplan needs to return. Even if we know a
* hard limit overall, it doesn't apply if the query has any
* grouping/aggregation operations, or SRFs in the tlist.
*/
if (parse->groupClause ||
parse->groupingSets ||
parse->distinctClause ||
parse->hasAggs ||
parse->hasWindowFuncs ||
parse->hasTargetSRFs ||
root->hasHavingQual)
root->limit_tuples = -1.0;
else
root->limit_tuples = limit_tuples;

3.
Typo:

don's => don't

Fixed,

best regards,
----
PG-Strom Project / NEC OSS Promotion Center
KaiGai Kohei <kaigai@ak.jp.nec.com>

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

--
KaiGai Kohei <kaigai@kaigai.gr.jp>

Attachments:

passdown-limit-fdw.v3.patchapplication/octet-stream; name=passdown-limit-fdw.v3.patchDownload
 contrib/postgres_fdw/expected/postgres_fdw.out | 51 +++++++------------
 contrib/postgres_fdw/postgres_fdw.c            | 56 ++++++++++++++++++++-
 src/backend/executor/nodeLimit.c               | 68 ++------------------------
 src/backend/executor/nodeMergeAppend.c         |  6 ++-
 src/backend/executor/nodeResult.c              | 25 ++++++++++
 src/backend/executor/nodeSort.c                | 10 ++++
 src/include/nodes/execnodes.h                  |  3 ++
 7 files changed, 118 insertions(+), 101 deletions(-)

diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 785f520..c3f9433 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -1003,18 +1003,15 @@ SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t
 -- join three tables
 EXPLAIN (VERBOSE, COSTS OFF)
 SELECT t1.c1, t2.c2, t3.c3 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) JOIN ft4 t3 ON (t3.c1 = t1.c1) ORDER BY t1.c3, t1.c1 OFFSET 10 LIMIT 10;
-                                                                                            QUERY PLAN                                                                                             
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+                                                                                                                     QUERY PLAN                                                                                                                     
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  Limit
    Output: t1.c1, t2.c2, t3.c3, t1.c3
-   ->  Sort
+   ->  Foreign Scan
          Output: t1.c1, t2.c2, t3.c3, t1.c3
-         Sort Key: t1.c3, t1.c1
-         ->  Foreign Scan
-               Output: t1.c1, t2.c2, t3.c3, t1.c3
-               Relations: ((public.ft1 t1) INNER JOIN (public.ft2 t2)) INNER JOIN (public.ft4 t3)
-               Remote SQL: SELECT r1."C 1", r1.c3, r2.c2, r4.c3 FROM (("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1")))) INNER JOIN "S 1"."T 3" r4 ON (((r1."C 1" = r4.c1))))
-(9 rows)
+         Relations: ((public.ft1 t1) INNER JOIN (public.ft2 t2)) INNER JOIN (public.ft4 t3)
+         Remote SQL: SELECT r1."C 1", r1.c3, r2.c2, r4.c3 FROM (("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1")))) INNER JOIN "S 1"."T 3" r4 ON (((r1."C 1" = r4.c1)))) ORDER BY r1.c3 ASC NULLS LAST, r1."C 1" ASC NULLS LAST
+(6 rows)
 
 SELECT t1.c1, t2.c2, t3.c3 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) JOIN ft4 t3 ON (t3.c1 = t1.c1) ORDER BY t1.c3, t1.c1 OFFSET 10 LIMIT 10;
  c1 | c2 |   c3   
@@ -1477,18 +1474,15 @@ SELECT t1.c1, t2.c2, t3.c3 FROM ft2 t1 LEFT JOIN ft2 t2 ON (t1.c1 = t2.c1) RIGHT
 -- full outer join + WHERE clause, only matched rows
 EXPLAIN (VERBOSE, COSTS OFF)
 SELECT t1.c1, t2.c1 FROM ft4 t1 FULL JOIN ft5 t2 ON (t1.c1 = t2.c1) WHERE (t1.c1 = t2.c1 OR t1.c1 IS NULL) ORDER BY t1.c1, t2.c1 OFFSET 10 LIMIT 10;
-                                                                            QUERY PLAN                                                                            
-------------------------------------------------------------------------------------------------------------------------------------------------------------------
+                                                                                                   QUERY PLAN                                                                                                   
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  Limit
    Output: t1.c1, t2.c1
-   ->  Sort
+   ->  Foreign Scan
          Output: t1.c1, t2.c1
-         Sort Key: t1.c1, t2.c1
-         ->  Foreign Scan
-               Output: t1.c1, t2.c1
-               Relations: (public.ft4 t1) FULL JOIN (public.ft5 t2)
-               Remote SQL: SELECT r1.c1, r2.c1 FROM ("S 1"."T 3" r1 FULL JOIN "S 1"."T 4" r2 ON (((r1.c1 = r2.c1)))) WHERE (((r1.c1 = r2.c1) OR (r1.c1 IS NULL)))
-(9 rows)
+         Relations: (public.ft4 t1) FULL JOIN (public.ft5 t2)
+         Remote SQL: SELECT r1.c1, r2.c1 FROM ("S 1"."T 3" r1 FULL JOIN "S 1"."T 4" r2 ON (((r1.c1 = r2.c1)))) WHERE (((r1.c1 = r2.c1) OR (r1.c1 IS NULL))) ORDER BY r1.c1 ASC NULLS LAST, r2.c1 ASC NULLS LAST
+(6 rows)
 
 SELECT t1.c1, t2.c1 FROM ft4 t1 FULL JOIN ft5 t2 ON (t1.c1 = t2.c1) WHERE (t1.c1 = t2.c1 OR t1.c1 IS NULL) ORDER BY t1.c1, t2.c1 OFFSET 10 LIMIT 10;
  c1 | c1 
@@ -1804,24 +1798,15 @@ SELECT t1.c1 FROM ft1 t1 WHERE NOT EXISTS (SELECT 1 FROM ft2 t2 WHERE t1.c1 = t2
 -- CROSS JOIN, not pushed down
 EXPLAIN (VERBOSE, COSTS OFF)
 SELECT t1.c1, t2.c1 FROM ft1 t1 CROSS JOIN ft2 t2 ORDER BY t1.c1, t2.c1 OFFSET 100 LIMIT 10;
-                             QUERY PLAN                              
----------------------------------------------------------------------
+                                                                            QUERY PLAN                                                                             
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------
  Limit
    Output: t1.c1, t2.c1
-   ->  Sort
+   ->  Foreign Scan
          Output: t1.c1, t2.c1
-         Sort Key: t1.c1, t2.c1
-         ->  Nested Loop
-               Output: t1.c1, t2.c1
-               ->  Foreign Scan on public.ft1 t1
-                     Output: t1.c1
-                     Remote SQL: SELECT "C 1" FROM "S 1"."T 1"
-               ->  Materialize
-                     Output: t2.c1
-                     ->  Foreign Scan on public.ft2 t2
-                           Output: t2.c1
-                           Remote SQL: SELECT "C 1" FROM "S 1"."T 1"
-(15 rows)
+         Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
+         Remote SQL: SELECT r1."C 1", r2."C 1" FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (TRUE)) ORDER BY r1."C 1" ASC NULLS LAST, r2."C 1" ASC NULLS LAST
+(6 rows)
 
 SELECT t1.c1, t2.c1 FROM ft1 t1 CROSS JOIN ft2 t2 ORDER BY t1.c1, t2.c1 OFFSET 100 LIMIT 10;
  c1 | c1  
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index fbe6929..4631f63 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -2428,6 +2428,17 @@ postgresExplainForeignScan(ForeignScanState *node, ExplainState *es)
 	if (es->verbose)
 	{
 		sql = strVal(list_nth(fdw_private, FdwScanPrivateSelectSql));
+
+		/* LIMIT clause is attached on execution time */
+		if (node->ss.ps.ps_numTuples > 0)
+		{
+			StringInfoData	buf;
+
+			initStringInfo(&buf);
+			appendStringInfo(&buf, "%s LIMIT %lu",
+							 sql, node->ss.ps.ps_numTuples);
+			sql = buf.data;
+		}
 		ExplainPropertyText("Remote SQL", sql, es);
 	}
 }
@@ -2493,6 +2504,7 @@ estimate_path_cost_size(PlannerInfo *root,
 						Cost *p_startup_cost, Cost *p_total_cost)
 {
 	PgFdwRelationInfo *fpinfo = (PgFdwRelationInfo *) foreignrel->fdw_private;
+	double		limit_tuples = -1.0;
 	double		rows;
 	double		retrieved_rows;
 	int			width;
@@ -2501,6 +2513,18 @@ estimate_path_cost_size(PlannerInfo *root,
 	Cost		cpu_per_tuple;
 
 	/*
+	 * LIMIT clause can be pushed down only when this foreign-path does not
+	 * need to join any other base relations, and the supplied path-keys
+	 * strictly match with the final output order.
+	 * Elsewhere, pushdown of LIMIT clause makes incorrect results, thus,
+	 * estimated cost don't assume remote LIMIT execution.
+	 */
+	if (root->limit_tuples >= 0.0 &&
+		equal(root->query_pathkeys, pathkeys) &&
+		bms_equal(root->all_baserels, foreignrel->relids))
+		limit_tuples = root->limit_tuples;
+
+	/*
 	 * If the table or the server is configured to use remote estimates,
 	 * connect to the foreign server and execute EXPLAIN to estimate the
 	 * number of rows selected by the restriction+join clauses.  Otherwise,
@@ -2553,6 +2577,12 @@ estimate_path_cost_size(PlannerInfo *root,
 		deparseSelectStmtForRel(&sql, root, foreignrel, fdw_scan_tlist,
 								remote_conds, pathkeys, &retrieved_attrs,
 								NULL);
+		/*
+		 * Attach LIMIT if this path is top-level and constant value is
+		 * specified.
+		 */
+		if (limit_tuples >= 0.0)
+			appendStringInfo(&sql, " LIMIT %.0f", limit_tuples);
 
 		/* Get the remote estimate */
 		conn = GetConnection(fpinfo->user, false);
@@ -2627,6 +2657,12 @@ estimate_path_cost_size(PlannerInfo *root,
 
 			/* Estimate of number of rows in cross product */
 			nrows = fpinfo_i->rows * fpinfo_o->rows;
+			if (limit_tuples >= 0.0)
+			{
+				nrows = Min(nrows, limit_tuples);
+				rows = Min(rows, limit_tuples);
+			}
+
 			/* Clamp retrieved rows estimate to at most size of cross product */
 			retrieved_rows = Min(retrieved_rows, nrows);
 
@@ -2724,6 +2760,10 @@ estimate_path_cost_size(PlannerInfo *root,
 			/*
 			 * Number of rows expected from foreign server will be same as
 			 * that of number of groups.
+			 *
+			 * MEMO: root->limit_tuples is not attached when query contains
+			 * grouping-clause or aggregate functions. So, we don't adjust
+			 * rows even if LIMIT <const> is supplied.
 			 */
 			rows = retrieved_rows = numGroups;
 
@@ -2754,8 +2794,16 @@ estimate_path_cost_size(PlannerInfo *root,
 		}
 		else
 		{
+			double		nrows = foreignrel->tuples;
+
+			if (limit_tuples >= 0.0)
+			{
+				nrows = Min(nrows, limit_tuples);
+				rows = Min(rows, limit_tuples);
+			}
+
 			/* Clamp retrieved rows estimates to at most foreignrel->tuples. */
-			retrieved_rows = Min(retrieved_rows, foreignrel->tuples);
+			retrieved_rows = Min(retrieved_rows, nrows);
 
 			/*
 			 * Cost as though this were a seqscan, which is pessimistic.  We
@@ -2768,7 +2816,7 @@ estimate_path_cost_size(PlannerInfo *root,
 
 			startup_cost += foreignrel->baserestrictcost.startup;
 			cpu_per_tuple = cpu_tuple_cost + foreignrel->baserestrictcost.per_tuple;
-			run_cost += cpu_per_tuple * foreignrel->tuples;
+			run_cost += cpu_per_tuple * nrows;
 		}
 
 		/*
@@ -2942,6 +2990,10 @@ create_cursor(ForeignScanState *node)
 	appendStringInfo(&buf, "DECLARE c%u CURSOR FOR\n%s",
 					 fsstate->cursor_number, fsstate->query);
 
+	/* Append LIMIT if numTuples were passed-down */
+	if (node->ss.ps.ps_numTuples > 0)
+		appendStringInfo(&buf, " LIMIT %ld", node->ss.ps.ps_numTuples);
+
 	/*
 	 * Notice that we pass NULL for paramTypes, thus forcing the remote server
 	 * to infer types for all parameters.  Since we explicitly cast every
diff --git a/src/backend/executor/nodeLimit.c b/src/backend/executor/nodeLimit.c
index faf32e1..df3fe01 100644
--- a/src/backend/executor/nodeLimit.c
+++ b/src/backend/executor/nodeLimit.c
@@ -26,7 +26,6 @@
 #include "nodes/nodeFuncs.h"
 
 static void recompute_limits(LimitState *node);
-static void pass_down_bound(LimitState *node, PlanState *child_node);
 
 
 /* ----------------------------------------------------------------
@@ -232,6 +231,7 @@ static void
 recompute_limits(LimitState *node)
 {
 	ExprContext *econtext = node->ps.ps_ExprContext;
+	uint64		tuples_needed;
 	Datum		val;
 	bool		isNull;
 
@@ -296,70 +296,8 @@ recompute_limits(LimitState *node)
 	node->lstate = LIMIT_RESCAN;
 
 	/* Notify child node about limit, if useful */
-	pass_down_bound(node, outerPlanState(node));
-}
-
-/*
- * If we have a COUNT, and our input is a Sort node, notify it that it can
- * use bounded sort.  Also, if our input is a MergeAppend, we can apply the
- * same bound to any Sorts that are direct children of the MergeAppend,
- * since the MergeAppend surely need read no more than that many tuples from
- * any one input.  We also have to be prepared to look through a Result,
- * since the planner might stick one atop MergeAppend for projection purposes.
- *
- * This is a bit of a kluge, but we don't have any more-abstract way of
- * communicating between the two nodes; and it doesn't seem worth trying
- * to invent one without some more examples of special communication needs.
- *
- * Note: it is the responsibility of nodeSort.c to react properly to
- * changes of these parameters.  If we ever do redesign this, it'd be a
- * good idea to integrate this signaling with the parameter-change mechanism.
- */
-static void
-pass_down_bound(LimitState *node, PlanState *child_node)
-{
-	if (IsA(child_node, SortState))
-	{
-		SortState  *sortState = (SortState *) child_node;
-		int64		tuples_needed = node->count + node->offset;
-
-		/* negative test checks for overflow in sum */
-		if (node->noCount || tuples_needed < 0)
-		{
-			/* make sure flag gets reset if needed upon rescan */
-			sortState->bounded = false;
-		}
-		else
-		{
-			sortState->bounded = true;
-			sortState->bound = tuples_needed;
-		}
-	}
-	else if (IsA(child_node, MergeAppendState))
-	{
-		MergeAppendState *maState = (MergeAppendState *) child_node;
-		int			i;
-
-		for (i = 0; i < maState->ms_nplans; i++)
-			pass_down_bound(node, maState->mergeplans[i]);
-	}
-	else if (IsA(child_node, ResultState))
-	{
-		/*
-		 * An extra consideration here is that if the Result is projecting a
-		 * targetlist that contains any SRFs, we can't assume that every input
-		 * tuple generates an output tuple, so a Sort underneath might need to
-		 * return more than N tuples to satisfy LIMIT N. So we cannot use
-		 * bounded sort.
-		 *
-		 * If Result supported qual checking, we'd have to punt on seeing a
-		 * qual, too.  Note that having a resconstantqual is not a
-		 * showstopper: if that fails we're not getting any rows at all.
-		 */
-		if (outerPlanState(child_node) &&
-			!expression_returns_set((Node *) child_node->plan->targetlist))
-			pass_down_bound(node, outerPlanState(child_node));
-	}
+	tuples_needed = (node->noCount ? 0 : node->count + node->offset);
+	outerPlanState(node)->ps_numTuples = tuples_needed;
 }
 
 /* ----------------------------------------------------------------
diff --git a/src/backend/executor/nodeMergeAppend.c b/src/backend/executor/nodeMergeAppend.c
index e271927..91707c6 100644
--- a/src/backend/executor/nodeMergeAppend.c
+++ b/src/backend/executor/nodeMergeAppend.c
@@ -174,10 +174,14 @@ ExecMergeAppend(MergeAppendState *node)
 		/*
 		 * First time through: pull the first tuple from each subplan, and set
 		 * up the heap.
+		 * Also, pass down the required number of tuples
 		 */
 		for (i = 0; i < node->ms_nplans; i++)
 		{
-			node->ms_slots[i] = ExecProcNode(node->mergeplans[i]);
+			PlanState  *pstate = node->mergeplans[i];
+
+			pstate->ps_numTuples = node->ps.ps_numTuples;
+			node->ms_slots[i] = ExecProcNode(pstate);
 			if (!TupIsNull(node->ms_slots[i]))
 				binaryheap_add_unordered(node->ms_heap, Int32GetDatum(i));
 		}
diff --git a/src/backend/executor/nodeResult.c b/src/backend/executor/nodeResult.c
index 4007b76..9e83eb2 100644
--- a/src/backend/executor/nodeResult.c
+++ b/src/backend/executor/nodeResult.c
@@ -47,6 +47,7 @@
 
 #include "executor/executor.h"
 #include "executor/nodeResult.h"
+#include "nodes/nodeFuncs.h"
 #include "utils/memutils.h"
 
 
@@ -75,6 +76,28 @@ ExecResult(ResultState *node)
 	econtext = node->ps.ps_ExprContext;
 
 	/*
+	 * Pass down the number of required tuples by the upper node
+	 *
+	 * An extra consideration here is that if the Result is projecting a
+	 * targetlist that contains any SRFs, we can't assume that every input
+	 * tuple generates an output tuple, so a Sort underneath might need to
+	 * return more than N tuples to satisfy LIMIT N. So we cannot use
+	 * bounded sort.
+	 *
+	 * If Result supported qual checking, we'd have to punt on seeing a
+	 * qual, too.  Note that having a resconstantqual is not a
+	 * showstopper: if that fails we're not getting any rows at all.
+	 */
+	if (!node->rs_started)
+	{
+		if (outerPlanState(node) &&
+			!expression_returns_set((Node *) node->ps.plan->targetlist))
+			outerPlanState(node)->ps_numTuples = node->ps.ps_numTuples;
+
+		node->rs_started = true;
+	}
+
+	/*
 	 * check constant qualifications like (2 > 1), if not already done
 	 */
 	if (node->rs_checkqual)
@@ -218,6 +241,7 @@ ExecInitResult(Result *node, EState *estate, int eflags)
 	resstate->ps.plan = (Plan *) node;
 	resstate->ps.state = estate;
 
+	resstate->rs_started = false;
 	resstate->rs_done = false;
 	resstate->rs_checkqual = (node->resconstantqual == NULL) ? false : true;
 
@@ -294,6 +318,7 @@ ExecEndResult(ResultState *node)
 void
 ExecReScanResult(ResultState *node)
 {
+	node->rs_started = false;
 	node->rs_done = false;
 	node->ps.ps_TupFromTlist = false;
 	node->rs_checkqual = (node->resconstantqual == NULL) ? false : true;
diff --git a/src/backend/executor/nodeSort.c b/src/backend/executor/nodeSort.c
index a34dcc5..6ec7abb 100644
--- a/src/backend/executor/nodeSort.c
+++ b/src/backend/executor/nodeSort.c
@@ -66,6 +66,16 @@ ExecSort(SortState *node)
 
 		SO1_printf("ExecSort: %s\n",
 				   "sorting subplan");
+		/*
+		 * Check bounds according to the required number of tuples
+		 */
+		if (node->ss.ps.ps_numTuples == 0)
+			node->bounded = false;
+		else
+		{
+			node->bounded = true;
+			node->bound = node->ss.ps.ps_numTuples;
+		}
 
 		/*
 		 * Want to scan subplan in the forward direction while creating the
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 8004d85..9b3aed8 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -1066,6 +1066,8 @@ typedef struct PlanState
 	ProjectionInfo *ps_ProjInfo;	/* info for doing tuple projection */
 	bool		ps_TupFromTlist;/* state flag for processing set-valued
 								 * functions in targetlist */
+	uint64		ps_numTuples;	/* number of tuples required by the upper
+								 * node if any. 0 means all the tuples */
 } PlanState;
 
 /* ----------------
@@ -1114,6 +1116,7 @@ typedef struct ResultState
 {
 	PlanState	ps;				/* its first field is NodeTag */
 	ExprState  *resconstantqual;
+	bool		rs_started;		/* are we already called? */
 	bool		rs_done;		/* are we done? */
 	bool		rs_checkqual;	/* do we need to check the qual? */
 } ResultState;
#4Robert Haas
robertmhaas@gmail.com
In reply to: Kohei KaiGai (#3)
Re: PassDownLimitBound for ForeignScan/CustomScan [take-2]

On Mon, Jan 2, 2017 at 10:07 PM, Kohei KaiGai <kaigai@kaigai.gr.jp> wrote:

Oops, I oversight this patch was marked as "returned with feedback",
not "moved to the next CF".

Its status has not been changed since the last update. (Code was
revised according to the last
comment by Jeevan, but CF-Nov was time up at that time.)

How do I handle the patch?

Can you just change the status to "Moved to Next CF" and then make it
"Needs Review"?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#5Kouhei Kaigai
kaigai@ak.jp.nec.com
In reply to: Robert Haas (#4)
Re: PassDownLimitBound for ForeignScan/CustomScan [take-2]

-----Original Message-----
From: pgsql-hackers-owner@postgresql.org
[mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Robert Haas
Sent: Thursday, January 05, 2017 5:29 AM
To: Kohei KaiGai <kaigai@kaigai.gr.jp>
Cc: Kaigai Kouhei(海外 浩平) <kaigai@ak.jp.nec.com>; Jeevan Chalke
<jeevan.chalke@enterprisedb.com>; pgsql-hackers@postgresql.org; Etsuro
Fujita <fujita.etsuro@lab.ntt.co.jp>; Andres Freund <andres@anarazel.de>
Subject: Re: [HACKERS] PassDownLimitBound for ForeignScan/CustomScan
[take-2]

On Mon, Jan 2, 2017 at 10:07 PM, Kohei KaiGai <kaigai@kaigai.gr.jp> wrote:

Oops, I oversight this patch was marked as "returned with feedback",
not "moved to the next CF".

Its status has not been changed since the last update. (Code was
revised according to the last comment by Jeevan, but CF-Nov was time
up at that time.)

How do I handle the patch?

Can you just change the status to "Moved to Next CF" and then make it "Needs
Review"?

Unfortunately, it was not possible. Probably, administrator privilege will be
needed for this operation.
May I add it to the CF:2017-03?
----
PG-Strom Project / NEC OSS Promotion Center
KaiGai Kohei <kaigai@ak.jp.nec.com>

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#6Michael Paquier
michael.paquier@gmail.com
In reply to: Kouhei Kaigai (#5)
Re: PassDownLimitBound for ForeignScan/CustomScan [take-2]

On Thu, Jan 5, 2017 at 8:58 AM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:

Unfortunately, it was not possible. Probably, administrator privilege will be
needed for this operation.

Adding a patch to a CF in progress indeed requires administrator privileges,

May I add it to the CF:2017-03?

I can notice that the previous CFM has switched this patch to
"returned with feedback" without mentioning the new status on this
thread. Perhaps that was not appropriate.
--
Michael

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#7Kouhei Kaigai
kaigai@ak.jp.nec.com
In reply to: Kohei KaiGai (#3)
1 attachment(s)
Re: PassDownLimitBound for ForeignScan/CustomScan [take-2]

The attached patch is rebased version of pass-down LIMIT clause patch,
which was forgotten to register on the last CF.

It allows to inform required number of rows to the sub-plans not only
ones we have individually handled at pass_down_bound().
Its primary target is control of number of remote tuple transfer over
the network connection by postgres_fdw.

According to the past discussion, we add a new field @ps_numTuples on
the PlanState to represent the required number of tuples.
Limit node assign a particular number on the field of sub-plan, then
this sub-plan can know its upper node does not require entire tuples,
and adjust its execution storategy.
Like MergeAppend, the sub-plan can also pass down the bounds to its
sub-plans again, if it makes sense and works correctly.

This feature is potentially a basis of GPU-based sorting on top of
CustomScan, because it has advantage for a workload to pick up the
top-N tuples if its data-size is enough small to load onto GPU-RAM.

Thanks,
----
PG-Strom Project / NEC OSS Promotion Center
KaiGai Kohei <kaigai@ak.jp.nec.com>

Show quoted text

-----Original Message-----
From: pgsql-hackers-owner@postgresql.org
[mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Kohei KaiGai
Sent: Tuesday, January 03, 2017 12:07 PM
To: Kaigai Kouhei(海外 浩平) <kaigai@ak.jp.nec.com>
Cc: Jeevan Chalke <jeevan.chalke@enterprisedb.com>; Robert Haas
<robertmhaas@gmail.com>; pgsql-hackers@postgresql.org; Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp>; Andres Freund <andres@anarazel.de>
Subject: Re: [HACKERS] PassDownLimitBound for ForeignScan/CustomScan
[take-2]

Oops, I oversight this patch was marked as "returned with feedback", not
"moved to the next CF".

Its status has not been changed since the last update. (Code was revised
according to the last comment by Jeevan, but CF-Nov was time up at that
time.)

How do I handle the patch?

2016-12-05 16:49 GMT+09:00 Kouhei Kaigai <kaigai@ak.jp.nec.com>:

Hello,

Sorry for my late response.
The attached patch reflects your comments.

Here are few comments on latest patch:

1.
make/make check is fine, however I am getting regression failure in
postgres_fdw contrib module (attached regression.diff).
Please investigate and fix.

It was an incorrect interaction when postgres_fdw tries to push down
sorting to the remote side. We cannot attach LIMIT clause on the plain
scan path across SORT, however, the previous version estimated the
cost for the plain scan with LIMIT clause even if local sorting is needed.
If remote scan may return just 10 rows, estimated cost of the local
sort is very lightweight, thus, this unreasonable path was chosen.
(On the other hands, its query execution results were correct because
ps_numTuples is not delivered across Sort node, so ForeignScan
eventually scanned all the remote tuples. It made correct results but
not optimal from the viewpoint of performance.)

The v3 patch estimates the cost with remote LIMIT clause only if
supplied pathkey strictly matches with the final output order of the
query, thus, no local sorting is expected.

Some of the regression test cases still have different plans but due
to the new optimization by remote LIMIT clause.
Without remote LIMIT clause, some of regression test cases preferred
remote-JOIN + local-SORT then local-LIMIT.
Once we have remote-LIMIT option, it allows to discount the cost for
remote-SORT by choice of top-k heap sorting.
It changed the optimizer's decision on some test cases.

Potential one big change is the test case below.

-- CROSS JOIN, not pushed down
EXPLAIN (VERBOSE, COSTS OFF)
SELECT t1.c1, t2.c1 FROM ft1 t1 CROSS JOIN ft2 t2 ORDER BY t1.c1,
t2.c1 OFFSET 100 LIMIT 10;

It assumed CROSS JOIN was not pushed down due to the cost for network
traffic, however, remote LIMIT reduced the estimated number of tuples
to be moved. So, all of the CROSS JOIN + ORDER BY + LIMIT became to
run on the remote side.

2.
+             *
+             * MEMO: root->limit_tuples is not attached when query
contains
+             * grouping-clause or aggregate functions. So, we don's

adjust

+ * rows even if LIMIT <const> is supplied.

Can you please explain why you are not doing this for grouping-clause
or aggregate functions.

See grouping_planner() at optimizer/plan/planner.c It puts an invalid
value on the root->limit_tuples if query has GROUP BY clause, so we
cannot know number of tuples to be returned even if we have upper
limit actually.

/*
* Figure out whether there's a hard limit on the number of rows

that

* query_planner's result subplan needs to return. Even if we

know a

* hard limit overall, it doesn't apply if the query has any
* grouping/aggregation operations, or SRFs in the tlist.
*/
if (parse->groupClause ||
parse->groupingSets ||
parse->distinctClause ||
parse->hasAggs ||
parse->hasWindowFuncs ||
parse->hasTargetSRFs ||
root->hasHavingQual)
root->limit_tuples = -1.0;
else
root->limit_tuples = limit_tuples;

3.
Typo:

don's => don't

Fixed,

best regards,
----
PG-Strom Project / NEC OSS Promotion Center KaiGai Kohei
<kaigai@ak.jp.nec.com>

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To
make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

--
KaiGai Kohei <kaigai@kaigai.gr.jp>

Attachments:

passdown-limit-fdw.v4.patchapplication/octet-stream; name=passdown-limit-fdw.v4.patchDownload
 contrib/postgres_fdw/expected/postgres_fdw.out | 51 ++++++++-------------
 contrib/postgres_fdw/postgres_fdw.c            | 56 ++++++++++++++++++++++-
 src/backend/executor/nodeLimit.c               | 62 ++------------------------
 src/backend/executor/nodeMergeAppend.c         |  6 ++-
 src/backend/executor/nodeResult.c              | 25 +++++++++++
 src/backend/executor/nodeSort.c                | 10 +++++
 src/include/nodes/execnodes.h                  |  3 ++
 7 files changed, 118 insertions(+), 95 deletions(-)

diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 0b9e3e4..08ead31 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -1003,18 +1003,15 @@ SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t
 -- join three tables
 EXPLAIN (VERBOSE, COSTS OFF)
 SELECT t1.c1, t2.c2, t3.c3 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) JOIN ft4 t3 ON (t3.c1 = t1.c1) ORDER BY t1.c3, t1.c1 OFFSET 10 LIMIT 10;
-                                                                                            QUERY PLAN                                                                                             
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+                                                                                                                     QUERY PLAN                                                                                                                     
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  Limit
    Output: t1.c1, t2.c2, t3.c3, t1.c3
-   ->  Sort
+   ->  Foreign Scan
          Output: t1.c1, t2.c2, t3.c3, t1.c3
-         Sort Key: t1.c3, t1.c1
-         ->  Foreign Scan
-               Output: t1.c1, t2.c2, t3.c3, t1.c3
-               Relations: ((public.ft1 t1) INNER JOIN (public.ft2 t2)) INNER JOIN (public.ft4 t3)
-               Remote SQL: SELECT r1."C 1", r1.c3, r2.c2, r4.c3 FROM (("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1")))) INNER JOIN "S 1"."T 3" r4 ON (((r1."C 1" = r4.c1))))
-(9 rows)
+         Relations: ((public.ft1 t1) INNER JOIN (public.ft2 t2)) INNER JOIN (public.ft4 t3)
+         Remote SQL: SELECT r1."C 1", r1.c3, r2.c2, r4.c3 FROM (("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1")))) INNER JOIN "S 1"."T 3" r4 ON (((r1."C 1" = r4.c1)))) ORDER BY r1.c3 ASC NULLS LAST, r1."C 1" ASC NULLS LAST
+(6 rows)
 
 SELECT t1.c1, t2.c2, t3.c3 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) JOIN ft4 t3 ON (t3.c1 = t1.c1) ORDER BY t1.c3, t1.c1 OFFSET 10 LIMIT 10;
  c1 | c2 |   c3   
@@ -1477,18 +1474,15 @@ SELECT t1.c1, t2.c2, t3.c3 FROM ft2 t1 LEFT JOIN ft2 t2 ON (t1.c1 = t2.c1) RIGHT
 -- full outer join + WHERE clause, only matched rows
 EXPLAIN (VERBOSE, COSTS OFF)
 SELECT t1.c1, t2.c1 FROM ft4 t1 FULL JOIN ft5 t2 ON (t1.c1 = t2.c1) WHERE (t1.c1 = t2.c1 OR t1.c1 IS NULL) ORDER BY t1.c1, t2.c1 OFFSET 10 LIMIT 10;
-                                                                            QUERY PLAN                                                                            
-------------------------------------------------------------------------------------------------------------------------------------------------------------------
+                                                                                                   QUERY PLAN                                                                                                   
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  Limit
    Output: t1.c1, t2.c1
-   ->  Sort
+   ->  Foreign Scan
          Output: t1.c1, t2.c1
-         Sort Key: t1.c1, t2.c1
-         ->  Foreign Scan
-               Output: t1.c1, t2.c1
-               Relations: (public.ft4 t1) FULL JOIN (public.ft5 t2)
-               Remote SQL: SELECT r1.c1, r2.c1 FROM ("S 1"."T 3" r1 FULL JOIN "S 1"."T 4" r2 ON (((r1.c1 = r2.c1)))) WHERE (((r1.c1 = r2.c1) OR (r1.c1 IS NULL)))
-(9 rows)
+         Relations: (public.ft4 t1) FULL JOIN (public.ft5 t2)
+         Remote SQL: SELECT r1.c1, r2.c1 FROM ("S 1"."T 3" r1 FULL JOIN "S 1"."T 4" r2 ON (((r1.c1 = r2.c1)))) WHERE (((r1.c1 = r2.c1) OR (r1.c1 IS NULL))) ORDER BY r1.c1 ASC NULLS LAST, r2.c1 ASC NULLS LAST
+(6 rows)
 
 SELECT t1.c1, t2.c1 FROM ft4 t1 FULL JOIN ft5 t2 ON (t1.c1 = t2.c1) WHERE (t1.c1 = t2.c1 OR t1.c1 IS NULL) ORDER BY t1.c1, t2.c1 OFFSET 10 LIMIT 10;
  c1 | c1 
@@ -1804,24 +1798,15 @@ SELECT t1.c1 FROM ft1 t1 WHERE NOT EXISTS (SELECT 1 FROM ft2 t2 WHERE t1.c1 = t2
 -- CROSS JOIN, not pushed down
 EXPLAIN (VERBOSE, COSTS OFF)
 SELECT t1.c1, t2.c1 FROM ft1 t1 CROSS JOIN ft2 t2 ORDER BY t1.c1, t2.c1 OFFSET 100 LIMIT 10;
-                             QUERY PLAN                              
----------------------------------------------------------------------
+                                                                            QUERY PLAN                                                                             
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------
  Limit
    Output: t1.c1, t2.c1
-   ->  Sort
+   ->  Foreign Scan
          Output: t1.c1, t2.c1
-         Sort Key: t1.c1, t2.c1
-         ->  Nested Loop
-               Output: t1.c1, t2.c1
-               ->  Foreign Scan on public.ft1 t1
-                     Output: t1.c1
-                     Remote SQL: SELECT "C 1" FROM "S 1"."T 1"
-               ->  Materialize
-                     Output: t2.c1
-                     ->  Foreign Scan on public.ft2 t2
-                           Output: t2.c1
-                           Remote SQL: SELECT "C 1" FROM "S 1"."T 1"
-(15 rows)
+         Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
+         Remote SQL: SELECT r1."C 1", r2."C 1" FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (TRUE)) ORDER BY r1."C 1" ASC NULLS LAST, r2."C 1" ASC NULLS LAST
+(6 rows)
 
 SELECT t1.c1, t2.c1 FROM ft1 t1 CROSS JOIN ft2 t2 ORDER BY t1.c1, t2.c1 OFFSET 100 LIMIT 10;
  c1 | c1  
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 5d270b9..e4cdbe5 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -2426,6 +2426,17 @@ postgresExplainForeignScan(ForeignScanState *node, ExplainState *es)
 	if (es->verbose)
 	{
 		sql = strVal(list_nth(fdw_private, FdwScanPrivateSelectSql));
+
+		/* LIMIT clause is attached on execution time */
+		if (node->ss.ps.ps_numTuples > 0)
+		{
+			StringInfoData	buf;
+
+			initStringInfo(&buf);
+			appendStringInfo(&buf, "%s LIMIT %lu",
+							 sql, node->ss.ps.ps_numTuples);
+			sql = buf.data;
+		}
 		ExplainPropertyText("Remote SQL", sql, es);
 	}
 }
@@ -2491,6 +2502,7 @@ estimate_path_cost_size(PlannerInfo *root,
 						Cost *p_startup_cost, Cost *p_total_cost)
 {
 	PgFdwRelationInfo *fpinfo = (PgFdwRelationInfo *) foreignrel->fdw_private;
+	double		limit_tuples = -1.0;
 	double		rows;
 	double		retrieved_rows;
 	int			width;
@@ -2499,6 +2511,18 @@ estimate_path_cost_size(PlannerInfo *root,
 	Cost		cpu_per_tuple;
 
 	/*
+	 * LIMIT clause can be pushed down only when this foreign-path does not
+	 * need to join any other base relations, and the supplied path-keys
+	 * strictly match with the final output order.
+	 * Elsewhere, pushdown of LIMIT clause makes incorrect results, thus,
+	 * estimated cost don't assume remote LIMIT execution.
+	 */
+	if (root->limit_tuples >= 0.0 &&
+		equal(root->query_pathkeys, pathkeys) &&
+		bms_equal(root->all_baserels, foreignrel->relids))
+		limit_tuples = root->limit_tuples;
+
+	/*
 	 * If the table or the server is configured to use remote estimates,
 	 * connect to the foreign server and execute EXPLAIN to estimate the
 	 * number of rows selected by the restriction+join clauses.  Otherwise,
@@ -2551,6 +2575,12 @@ estimate_path_cost_size(PlannerInfo *root,
 		deparseSelectStmtForRel(&sql, root, foreignrel, fdw_scan_tlist,
 								remote_conds, pathkeys, &retrieved_attrs,
 								NULL);
+		/*
+		 * Attach LIMIT if this path is top-level and constant value is
+		 * specified.
+		 */
+		if (limit_tuples >= 0.0)
+			appendStringInfo(&sql, " LIMIT %.0f", limit_tuples);
 
 		/* Get the remote estimate */
 		conn = GetConnection(fpinfo->user, false);
@@ -2625,6 +2655,12 @@ estimate_path_cost_size(PlannerInfo *root,
 
 			/* Estimate of number of rows in cross product */
 			nrows = fpinfo_i->rows * fpinfo_o->rows;
+			if (limit_tuples >= 0.0)
+			{
+				nrows = Min(nrows, limit_tuples);
+				rows = Min(rows, limit_tuples);
+			}
+
 			/* Clamp retrieved rows estimate to at most size of cross product */
 			retrieved_rows = Min(retrieved_rows, nrows);
 
@@ -2722,6 +2758,10 @@ estimate_path_cost_size(PlannerInfo *root,
 			/*
 			 * Number of rows expected from foreign server will be same as
 			 * that of number of groups.
+			 *
+			 * MEMO: root->limit_tuples is not attached when query contains
+			 * grouping-clause or aggregate functions. So, we don't adjust
+			 * rows even if LIMIT <const> is supplied.
 			 */
 			rows = retrieved_rows = numGroups;
 
@@ -2752,8 +2792,16 @@ estimate_path_cost_size(PlannerInfo *root,
 		}
 		else
 		{
+			double		nrows = foreignrel->tuples;
+
+			if (limit_tuples >= 0.0)
+			{
+				nrows = Min(nrows, limit_tuples);
+				rows = Min(rows, limit_tuples);
+			}
+
 			/* Clamp retrieved rows estimates to at most foreignrel->tuples. */
-			retrieved_rows = Min(retrieved_rows, foreignrel->tuples);
+			retrieved_rows = Min(retrieved_rows, nrows);
 
 			/*
 			 * Cost as though this were a seqscan, which is pessimistic.  We
@@ -2766,7 +2814,7 @@ estimate_path_cost_size(PlannerInfo *root,
 
 			startup_cost += foreignrel->baserestrictcost.startup;
 			cpu_per_tuple = cpu_tuple_cost + foreignrel->baserestrictcost.per_tuple;
-			run_cost += cpu_per_tuple * foreignrel->tuples;
+			run_cost += cpu_per_tuple * nrows;
 		}
 
 		/*
@@ -2940,6 +2988,10 @@ create_cursor(ForeignScanState *node)
 	appendStringInfo(&buf, "DECLARE c%u CURSOR FOR\n%s",
 					 fsstate->cursor_number, fsstate->query);
 
+	/* Append LIMIT if numTuples were passed-down */
+	if (node->ss.ps.ps_numTuples > 0)
+		appendStringInfo(&buf, " LIMIT %ld", node->ss.ps.ps_numTuples);
+
 	/*
 	 * Notice that we pass NULL for paramTypes, thus forcing the remote server
 	 * to infer types for all parameters.  Since we explicitly cast every
diff --git a/src/backend/executor/nodeLimit.c b/src/backend/executor/nodeLimit.c
index aaec132..3cbc97a 100644
--- a/src/backend/executor/nodeLimit.c
+++ b/src/backend/executor/nodeLimit.c
@@ -26,7 +26,6 @@
 #include "nodes/nodeFuncs.h"
 
 static void recompute_limits(LimitState *node);
-static void pass_down_bound(LimitState *node, PlanState *child_node);
 
 
 /* ----------------------------------------------------------------
@@ -232,6 +231,7 @@ static void
 recompute_limits(LimitState *node)
 {
 	ExprContext *econtext = node->ps.ps_ExprContext;
+	uint64		tuples_needed;
 	Datum		val;
 	bool		isNull;
 
@@ -293,64 +293,8 @@ recompute_limits(LimitState *node)
 	/* Set state-machine state */
 	node->lstate = LIMIT_RESCAN;
 
-	/* Notify child node about limit, if useful */
-	pass_down_bound(node, outerPlanState(node));
-}
-
-/*
- * If we have a COUNT, and our input is a Sort node, notify it that it can
- * use bounded sort.  Also, if our input is a MergeAppend, we can apply the
- * same bound to any Sorts that are direct children of the MergeAppend,
- * since the MergeAppend surely need read no more than that many tuples from
- * any one input.  We also have to be prepared to look through a Result,
- * since the planner might stick one atop MergeAppend for projection purposes.
- *
- * This is a bit of a kluge, but we don't have any more-abstract way of
- * communicating between the two nodes; and it doesn't seem worth trying
- * to invent one without some more examples of special communication needs.
- *
- * Note: it is the responsibility of nodeSort.c to react properly to
- * changes of these parameters.  If we ever do redesign this, it'd be a
- * good idea to integrate this signaling with the parameter-change mechanism.
- */
-static void
-pass_down_bound(LimitState *node, PlanState *child_node)
-{
-	if (IsA(child_node, SortState))
-	{
-		SortState  *sortState = (SortState *) child_node;
-		int64		tuples_needed = node->count + node->offset;
-
-		/* negative test checks for overflow in sum */
-		if (node->noCount || tuples_needed < 0)
-		{
-			/* make sure flag gets reset if needed upon rescan */
-			sortState->bounded = false;
-		}
-		else
-		{
-			sortState->bounded = true;
-			sortState->bound = tuples_needed;
-		}
-	}
-	else if (IsA(child_node, MergeAppendState))
-	{
-		MergeAppendState *maState = (MergeAppendState *) child_node;
-		int			i;
-
-		for (i = 0; i < maState->ms_nplans; i++)
-			pass_down_bound(node, maState->mergeplans[i]);
-	}
-	else if (IsA(child_node, ResultState))
-	{
-		/*
-		 * If Result supported qual checking, we'd have to punt on seeing a
-		 * qual.  Note that having a resconstantqual is not a showstopper: if
-		 * that fails we're not getting any rows at all.
-		 */
-		if (outerPlanState(child_node))
-			pass_down_bound(node, outerPlanState(child_node));
-	}
+	tuples_needed = (node->noCount ? 0 : node->count + node->offset);
+	outerPlanState(node)->ps_numTuples = tuples_needed;
 }
 
 /* ----------------------------------------------------------------
diff --git a/src/backend/executor/nodeMergeAppend.c b/src/backend/executor/nodeMergeAppend.c
index 7a20bf0..6e76e31 100644
--- a/src/backend/executor/nodeMergeAppend.c
+++ b/src/backend/executor/nodeMergeAppend.c
@@ -174,10 +174,14 @@ ExecMergeAppend(MergeAppendState *node)
 		/*
 		 * First time through: pull the first tuple from each subplan, and set
 		 * up the heap.
+		 * Also, pass down the required number of tuples
 		 */
 		for (i = 0; i < node->ms_nplans; i++)
 		{
-			node->ms_slots[i] = ExecProcNode(node->mergeplans[i]);
+			PlanState  *pstate = node->mergeplans[i];
+
+			pstate->ps_numTuples = node->ps.ps_numTuples;
+			node->ms_slots[i] = ExecProcNode(pstate);
 			if (!TupIsNull(node->ms_slots[i]))
 				binaryheap_add_unordered(node->ms_heap, Int32GetDatum(i));
 		}
diff --git a/src/backend/executor/nodeResult.c b/src/backend/executor/nodeResult.c
index b5b50b2..be94b42 100644
--- a/src/backend/executor/nodeResult.c
+++ b/src/backend/executor/nodeResult.c
@@ -47,6 +47,7 @@
 
 #include "executor/executor.h"
 #include "executor/nodeResult.h"
+#include "nodes/nodeFuncs.h"
 #include "utils/memutils.h"
 
 
@@ -73,6 +74,28 @@ ExecResult(ResultState *node)
 	econtext = node->ps.ps_ExprContext;
 
 	/*
+	 * Pass down the number of required tuples by the upper node
+	 *
+	 * An extra consideration here is that if the Result is projecting a
+	 * targetlist that contains any SRFs, we can't assume that every input
+	 * tuple generates an output tuple, so a Sort underneath might need to
+	 * return more than N tuples to satisfy LIMIT N. So we cannot use
+	 * bounded sort.
+	 *
+	 * If Result supported qual checking, we'd have to punt on seeing a
+	 * qual, too.  Note that having a resconstantqual is not a
+	 * showstopper: if that fails we're not getting any rows at all.
+	 */
+	if (!node->rs_started)
+	{
+		if (outerPlanState(node) &&
+			!expression_returns_set((Node *) node->ps.plan->targetlist))
+			outerPlanState(node)->ps_numTuples = node->ps.ps_numTuples;
+
+		node->rs_started = true;
+	}
+
+	/*
 	 * check constant qualifications like (2 > 1), if not already done
 	 */
 	if (node->rs_checkqual)
@@ -191,6 +214,7 @@ ExecInitResult(Result *node, EState *estate, int eflags)
 	resstate->ps.plan = (Plan *) node;
 	resstate->ps.state = estate;
 
+	resstate->rs_started = false;
 	resstate->rs_done = false;
 	resstate->rs_checkqual = (node->resconstantqual == NULL) ? false : true;
 
@@ -265,6 +289,7 @@ ExecEndResult(ResultState *node)
 void
 ExecReScanResult(ResultState *node)
 {
+	node->rs_started = false;
 	node->rs_done = false;
 	node->rs_checkqual = (node->resconstantqual == NULL) ? false : true;
 
diff --git a/src/backend/executor/nodeSort.c b/src/backend/executor/nodeSort.c
index 591a31a..b6b72f1 100644
--- a/src/backend/executor/nodeSort.c
+++ b/src/backend/executor/nodeSort.c
@@ -66,6 +66,16 @@ ExecSort(SortState *node)
 
 		SO1_printf("ExecSort: %s\n",
 				   "sorting subplan");
+		/*
+		 * Check bounds according to the required number of tuples
+		 */
+		if (node->ss.ps.ps_numTuples == 0)
+			node->bounded = false;
+		else
+		{
+			node->bounded = true;
+			node->bound = node->ss.ps.ps_numTuples;
+		}
 
 		/*
 		 * Want to scan subplan in the forward direction while creating the
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 6332ea0..2e3c37a 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -1077,6 +1077,8 @@ typedef struct PlanState
 	TupleTableSlot *ps_ResultTupleSlot; /* slot for my result tuples */
 	ExprContext *ps_ExprContext;	/* node's expression-evaluation context */
 	ProjectionInfo *ps_ProjInfo;	/* info for doing tuple projection */
+	uint64		ps_numTuples;	/* number of tuples required by the upper
+								 * node if any. 0 means all the tuples */
 } PlanState;
 
 /* ----------------
@@ -1125,6 +1127,7 @@ typedef struct ResultState
 {
 	PlanState	ps;				/* its first field is NodeTag */
 	ExprState  *resconstantqual;
+	bool		rs_started;		/* are we already called? */
 	bool		rs_done;		/* are we done? */
 	bool		rs_checkqual;	/* do we need to check the qual? */
 } ResultState;
#8Tels
nospam-pg-abuse@bloodgate.com
In reply to: Kouhei Kaigai (#7)
Re: PassDownLimitBound for ForeignScan/CustomScan [take-2]

Hello all,

as this is my first mail to pgsql-hackers, please be gentle :)

I've looked at the patch, and as I'm not that familiar with the
pg-sourcecode, customs and so on, this isn't a review, more like food for
thought and all should be taken with a grain of salt. :)

So here are a few questions and remarks:

+ double limit_tuples = -1.0;

Surely the limit cannot be fractional, and must be an integer. So wouldn't
it be better the same type as say:

+ if (root->limit_tuples >= 0.0 &&

Than you could also compare with ">= 0", not ">= 0.0".

node->ss.ps.ps_numTuples is f.i. an uint64.

Or is there a specific reason the limit must be a double?

And finally:

+ if (node->ss.ps.ps_numTuples > 0)

+ appendStringInfo(&buf, " LIMIT %ld", node->ss.ps.ps_numTuples);

vs.

+			appendStringInfo(&buf, "%s LIMIT %lu",
+							 sql, node->ss.ps.ps_numTuples);

It seems odd to have two different format strings here for the same variable.

A few comments miss "." at the end, like these:

+ * Also, pass down the required number of tuples

+ * Pass down the number of required tuples by the upper node

And this comment might be better "were we already called?"

+ bool rs_started; /* are we already called? */

Hope this helps, and thank you for working on this issue.

Regards,

Tels

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#9Kouhei Kaigai
kaigai@ak.jp.nec.com
In reply to: Tels (#8)
1 attachment(s)
Re: PassDownLimitBound for ForeignScan/CustomScan [take-2]

Hello all,

as this is my first mail to pgsql-hackers, please be gentle :)

Welcome to pgsql-hackers,

I've looked at the patch, and as I'm not that familiar with the pg-sourcecode,
customs and so on, this isn't a review, more like food for thought and all
should be taken with a grain of salt. :)

So here are a few questions and remarks:

+ double limit_tuples = -1.0;

Surely the limit cannot be fractional, and must be an integer. So wouldn't
it be better the same type as say:

+ if (root->limit_tuples >= 0.0 &&

Than you could also compare with ">= 0", not ">= 0.0".

The above variable represents the "estimated" number of rows at the
planning stage, not execution time.
You may be able to see Path structure has "rows" field declared as
double type. It makes sense to consider stupid paths during planning,
even if it is eventually rejected. For example, if a cross join with
two large tables appear during planning, 64bit integer will make overflow
easily.

node->ss.ps.ps_numTuples is f.i. an uint64.

Or is there a specific reason the limit must be a double?

The above variable represents "actual" number of rows at the execution
stage. Likely, hardware replacement cycle will come before int64 overflow.

And finally:

+ if (node->ss.ps.ps_numTuples > 0)

+ appendStringInfo(&buf, " LIMIT %ld",

node->ss.ps.ps_numTuples);

vs.

+			appendStringInfo(&buf, "%s LIMIT %lu",
+							 sql,

node->ss.ps.ps_numTuples);

It seems odd to have two different format strings here for the same variable.

Ah, yes, %lu is right because ps_numTuples is uint64.

A few comments miss "." at the end, like these:

+ * Also, pass down the required number of tuples

+ * Pass down the number of required tuples by the upper node

OK,

And this comment might be better "were we already called?"

+ bool rs_started; /* are we already

called? */

Other variables in ResultState uses present form, like:

+ bool rs_started; /* are we already called? */
bool rs_done; /* are we done? */
bool rs_checkqual; /* do we need to check the qual? */
} ResultState;

Thanks,
----
PG-Strom Project / NEC OSS Promotion Center
KaiGai Kohei <kaigai@ak.jp.nec.com>

Attachments:

passdown-limit-fdw.v5.patchapplication/octet-stream; name=passdown-limit-fdw.v5.patchDownload
 contrib/postgres_fdw/expected/postgres_fdw.out | 51 ++++++++-------------
 contrib/postgres_fdw/postgres_fdw.c            | 56 ++++++++++++++++++++++-
 src/backend/executor/nodeLimit.c               | 62 ++------------------------
 src/backend/executor/nodeMergeAppend.c         |  6 ++-
 src/backend/executor/nodeResult.c              | 25 +++++++++++
 src/backend/executor/nodeSort.c                | 10 +++++
 src/include/nodes/execnodes.h                  |  3 ++
 7 files changed, 118 insertions(+), 95 deletions(-)

diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 0b9e3e4..08ead31 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -1003,18 +1003,15 @@ SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t
 -- join three tables
 EXPLAIN (VERBOSE, COSTS OFF)
 SELECT t1.c1, t2.c2, t3.c3 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) JOIN ft4 t3 ON (t3.c1 = t1.c1) ORDER BY t1.c3, t1.c1 OFFSET 10 LIMIT 10;
-                                                                                            QUERY PLAN                                                                                             
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+                                                                                                                     QUERY PLAN                                                                                                                     
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  Limit
    Output: t1.c1, t2.c2, t3.c3, t1.c3
-   ->  Sort
+   ->  Foreign Scan
          Output: t1.c1, t2.c2, t3.c3, t1.c3
-         Sort Key: t1.c3, t1.c1
-         ->  Foreign Scan
-               Output: t1.c1, t2.c2, t3.c3, t1.c3
-               Relations: ((public.ft1 t1) INNER JOIN (public.ft2 t2)) INNER JOIN (public.ft4 t3)
-               Remote SQL: SELECT r1."C 1", r1.c3, r2.c2, r4.c3 FROM (("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1")))) INNER JOIN "S 1"."T 3" r4 ON (((r1."C 1" = r4.c1))))
-(9 rows)
+         Relations: ((public.ft1 t1) INNER JOIN (public.ft2 t2)) INNER JOIN (public.ft4 t3)
+         Remote SQL: SELECT r1."C 1", r1.c3, r2.c2, r4.c3 FROM (("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1")))) INNER JOIN "S 1"."T 3" r4 ON (((r1."C 1" = r4.c1)))) ORDER BY r1.c3 ASC NULLS LAST, r1."C 1" ASC NULLS LAST
+(6 rows)
 
 SELECT t1.c1, t2.c2, t3.c3 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) JOIN ft4 t3 ON (t3.c1 = t1.c1) ORDER BY t1.c3, t1.c1 OFFSET 10 LIMIT 10;
  c1 | c2 |   c3   
@@ -1477,18 +1474,15 @@ SELECT t1.c1, t2.c2, t3.c3 FROM ft2 t1 LEFT JOIN ft2 t2 ON (t1.c1 = t2.c1) RIGHT
 -- full outer join + WHERE clause, only matched rows
 EXPLAIN (VERBOSE, COSTS OFF)
 SELECT t1.c1, t2.c1 FROM ft4 t1 FULL JOIN ft5 t2 ON (t1.c1 = t2.c1) WHERE (t1.c1 = t2.c1 OR t1.c1 IS NULL) ORDER BY t1.c1, t2.c1 OFFSET 10 LIMIT 10;
-                                                                            QUERY PLAN                                                                            
-------------------------------------------------------------------------------------------------------------------------------------------------------------------
+                                                                                                   QUERY PLAN                                                                                                   
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  Limit
    Output: t1.c1, t2.c1
-   ->  Sort
+   ->  Foreign Scan
          Output: t1.c1, t2.c1
-         Sort Key: t1.c1, t2.c1
-         ->  Foreign Scan
-               Output: t1.c1, t2.c1
-               Relations: (public.ft4 t1) FULL JOIN (public.ft5 t2)
-               Remote SQL: SELECT r1.c1, r2.c1 FROM ("S 1"."T 3" r1 FULL JOIN "S 1"."T 4" r2 ON (((r1.c1 = r2.c1)))) WHERE (((r1.c1 = r2.c1) OR (r1.c1 IS NULL)))
-(9 rows)
+         Relations: (public.ft4 t1) FULL JOIN (public.ft5 t2)
+         Remote SQL: SELECT r1.c1, r2.c1 FROM ("S 1"."T 3" r1 FULL JOIN "S 1"."T 4" r2 ON (((r1.c1 = r2.c1)))) WHERE (((r1.c1 = r2.c1) OR (r1.c1 IS NULL))) ORDER BY r1.c1 ASC NULLS LAST, r2.c1 ASC NULLS LAST
+(6 rows)
 
 SELECT t1.c1, t2.c1 FROM ft4 t1 FULL JOIN ft5 t2 ON (t1.c1 = t2.c1) WHERE (t1.c1 = t2.c1 OR t1.c1 IS NULL) ORDER BY t1.c1, t2.c1 OFFSET 10 LIMIT 10;
  c1 | c1 
@@ -1804,24 +1798,15 @@ SELECT t1.c1 FROM ft1 t1 WHERE NOT EXISTS (SELECT 1 FROM ft2 t2 WHERE t1.c1 = t2
 -- CROSS JOIN, not pushed down
 EXPLAIN (VERBOSE, COSTS OFF)
 SELECT t1.c1, t2.c1 FROM ft1 t1 CROSS JOIN ft2 t2 ORDER BY t1.c1, t2.c1 OFFSET 100 LIMIT 10;
-                             QUERY PLAN                              
----------------------------------------------------------------------
+                                                                            QUERY PLAN                                                                             
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------
  Limit
    Output: t1.c1, t2.c1
-   ->  Sort
+   ->  Foreign Scan
          Output: t1.c1, t2.c1
-         Sort Key: t1.c1, t2.c1
-         ->  Nested Loop
-               Output: t1.c1, t2.c1
-               ->  Foreign Scan on public.ft1 t1
-                     Output: t1.c1
-                     Remote SQL: SELECT "C 1" FROM "S 1"."T 1"
-               ->  Materialize
-                     Output: t2.c1
-                     ->  Foreign Scan on public.ft2 t2
-                           Output: t2.c1
-                           Remote SQL: SELECT "C 1" FROM "S 1"."T 1"
-(15 rows)
+         Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
+         Remote SQL: SELECT r1."C 1", r2."C 1" FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (TRUE)) ORDER BY r1."C 1" ASC NULLS LAST, r2."C 1" ASC NULLS LAST
+(6 rows)
 
 SELECT t1.c1, t2.c1 FROM ft1 t1 CROSS JOIN ft2 t2 ORDER BY t1.c1, t2.c1 OFFSET 100 LIMIT 10;
  c1 | c1  
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 5d270b9..7a11f1a 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -2426,6 +2426,17 @@ postgresExplainForeignScan(ForeignScanState *node, ExplainState *es)
 	if (es->verbose)
 	{
 		sql = strVal(list_nth(fdw_private, FdwScanPrivateSelectSql));
+
+		/* LIMIT clause is attached on execution time */
+		if (node->ss.ps.ps_numTuples > 0)
+		{
+			StringInfoData	buf;
+
+			initStringInfo(&buf);
+			appendStringInfo(&buf, "%s LIMIT %lu",
+							 sql, node->ss.ps.ps_numTuples);
+			sql = buf.data;
+		}
 		ExplainPropertyText("Remote SQL", sql, es);
 	}
 }
@@ -2491,6 +2502,7 @@ estimate_path_cost_size(PlannerInfo *root,
 						Cost *p_startup_cost, Cost *p_total_cost)
 {
 	PgFdwRelationInfo *fpinfo = (PgFdwRelationInfo *) foreignrel->fdw_private;
+	double		limit_tuples = -1.0;
 	double		rows;
 	double		retrieved_rows;
 	int			width;
@@ -2499,6 +2511,18 @@ estimate_path_cost_size(PlannerInfo *root,
 	Cost		cpu_per_tuple;
 
 	/*
+	 * LIMIT clause can be pushed down only when this foreign-path does not
+	 * need to join any other base relations, and the supplied path-keys
+	 * strictly match with the final output order.
+	 * Elsewhere, pushdown of LIMIT clause makes incorrect results, thus,
+	 * estimated cost don't assume remote LIMIT execution.
+	 */
+	if (root->limit_tuples >= 0.0 &&
+		equal(root->query_pathkeys, pathkeys) &&
+		bms_equal(root->all_baserels, foreignrel->relids))
+		limit_tuples = root->limit_tuples;
+
+	/*
 	 * If the table or the server is configured to use remote estimates,
 	 * connect to the foreign server and execute EXPLAIN to estimate the
 	 * number of rows selected by the restriction+join clauses.  Otherwise,
@@ -2551,6 +2575,12 @@ estimate_path_cost_size(PlannerInfo *root,
 		deparseSelectStmtForRel(&sql, root, foreignrel, fdw_scan_tlist,
 								remote_conds, pathkeys, &retrieved_attrs,
 								NULL);
+		/*
+		 * Attach LIMIT if this path is top-level and constant value is
+		 * specified.
+		 */
+		if (limit_tuples >= 0.0)
+			appendStringInfo(&sql, " LIMIT %.0f", limit_tuples);
 
 		/* Get the remote estimate */
 		conn = GetConnection(fpinfo->user, false);
@@ -2625,6 +2655,12 @@ estimate_path_cost_size(PlannerInfo *root,
 
 			/* Estimate of number of rows in cross product */
 			nrows = fpinfo_i->rows * fpinfo_o->rows;
+			if (limit_tuples >= 0.0)
+			{
+				nrows = Min(nrows, limit_tuples);
+				rows = Min(rows, limit_tuples);
+			}
+
 			/* Clamp retrieved rows estimate to at most size of cross product */
 			retrieved_rows = Min(retrieved_rows, nrows);
 
@@ -2722,6 +2758,10 @@ estimate_path_cost_size(PlannerInfo *root,
 			/*
 			 * Number of rows expected from foreign server will be same as
 			 * that of number of groups.
+			 *
+			 * MEMO: root->limit_tuples is not attached when query contains
+			 * grouping-clause or aggregate functions. So, we don't adjust
+			 * rows even if LIMIT <const> is supplied.
 			 */
 			rows = retrieved_rows = numGroups;
 
@@ -2752,8 +2792,16 @@ estimate_path_cost_size(PlannerInfo *root,
 		}
 		else
 		{
+			double		nrows = foreignrel->tuples;
+
+			if (limit_tuples >= 0.0)
+			{
+				nrows = Min(nrows, limit_tuples);
+				rows = Min(rows, limit_tuples);
+			}
+
 			/* Clamp retrieved rows estimates to at most foreignrel->tuples. */
-			retrieved_rows = Min(retrieved_rows, foreignrel->tuples);
+			retrieved_rows = Min(retrieved_rows, nrows);
 
 			/*
 			 * Cost as though this were a seqscan, which is pessimistic.  We
@@ -2766,7 +2814,7 @@ estimate_path_cost_size(PlannerInfo *root,
 
 			startup_cost += foreignrel->baserestrictcost.startup;
 			cpu_per_tuple = cpu_tuple_cost + foreignrel->baserestrictcost.per_tuple;
-			run_cost += cpu_per_tuple * foreignrel->tuples;
+			run_cost += cpu_per_tuple * nrows;
 		}
 
 		/*
@@ -2940,6 +2988,10 @@ create_cursor(ForeignScanState *node)
 	appendStringInfo(&buf, "DECLARE c%u CURSOR FOR\n%s",
 					 fsstate->cursor_number, fsstate->query);
 
+	/* Append LIMIT if numTuples were passed-down */
+	if (node->ss.ps.ps_numTuples > 0)
+		appendStringInfo(&buf, " LIMIT %lu", node->ss.ps.ps_numTuples);
+
 	/*
 	 * Notice that we pass NULL for paramTypes, thus forcing the remote server
 	 * to infer types for all parameters.  Since we explicitly cast every
diff --git a/src/backend/executor/nodeLimit.c b/src/backend/executor/nodeLimit.c
index aaec132..3cbc97a 100644
--- a/src/backend/executor/nodeLimit.c
+++ b/src/backend/executor/nodeLimit.c
@@ -26,7 +26,6 @@
 #include "nodes/nodeFuncs.h"
 
 static void recompute_limits(LimitState *node);
-static void pass_down_bound(LimitState *node, PlanState *child_node);
 
 
 /* ----------------------------------------------------------------
@@ -232,6 +231,7 @@ static void
 recompute_limits(LimitState *node)
 {
 	ExprContext *econtext = node->ps.ps_ExprContext;
+	uint64		tuples_needed;
 	Datum		val;
 	bool		isNull;
 
@@ -293,64 +293,8 @@ recompute_limits(LimitState *node)
 	/* Set state-machine state */
 	node->lstate = LIMIT_RESCAN;
 
-	/* Notify child node about limit, if useful */
-	pass_down_bound(node, outerPlanState(node));
-}
-
-/*
- * If we have a COUNT, and our input is a Sort node, notify it that it can
- * use bounded sort.  Also, if our input is a MergeAppend, we can apply the
- * same bound to any Sorts that are direct children of the MergeAppend,
- * since the MergeAppend surely need read no more than that many tuples from
- * any one input.  We also have to be prepared to look through a Result,
- * since the planner might stick one atop MergeAppend for projection purposes.
- *
- * This is a bit of a kluge, but we don't have any more-abstract way of
- * communicating between the two nodes; and it doesn't seem worth trying
- * to invent one without some more examples of special communication needs.
- *
- * Note: it is the responsibility of nodeSort.c to react properly to
- * changes of these parameters.  If we ever do redesign this, it'd be a
- * good idea to integrate this signaling with the parameter-change mechanism.
- */
-static void
-pass_down_bound(LimitState *node, PlanState *child_node)
-{
-	if (IsA(child_node, SortState))
-	{
-		SortState  *sortState = (SortState *) child_node;
-		int64		tuples_needed = node->count + node->offset;
-
-		/* negative test checks for overflow in sum */
-		if (node->noCount || tuples_needed < 0)
-		{
-			/* make sure flag gets reset if needed upon rescan */
-			sortState->bounded = false;
-		}
-		else
-		{
-			sortState->bounded = true;
-			sortState->bound = tuples_needed;
-		}
-	}
-	else if (IsA(child_node, MergeAppendState))
-	{
-		MergeAppendState *maState = (MergeAppendState *) child_node;
-		int			i;
-
-		for (i = 0; i < maState->ms_nplans; i++)
-			pass_down_bound(node, maState->mergeplans[i]);
-	}
-	else if (IsA(child_node, ResultState))
-	{
-		/*
-		 * If Result supported qual checking, we'd have to punt on seeing a
-		 * qual.  Note that having a resconstantqual is not a showstopper: if
-		 * that fails we're not getting any rows at all.
-		 */
-		if (outerPlanState(child_node))
-			pass_down_bound(node, outerPlanState(child_node));
-	}
+	tuples_needed = (node->noCount ? 0 : node->count + node->offset);
+	outerPlanState(node)->ps_numTuples = tuples_needed;
 }
 
 /* ----------------------------------------------------------------
diff --git a/src/backend/executor/nodeMergeAppend.c b/src/backend/executor/nodeMergeAppend.c
index 7a20bf0..bd7e0ce 100644
--- a/src/backend/executor/nodeMergeAppend.c
+++ b/src/backend/executor/nodeMergeAppend.c
@@ -174,10 +174,14 @@ ExecMergeAppend(MergeAppendState *node)
 		/*
 		 * First time through: pull the first tuple from each subplan, and set
 		 * up the heap.
+		 * Also, pass down the required number of tuples.
 		 */
 		for (i = 0; i < node->ms_nplans; i++)
 		{
-			node->ms_slots[i] = ExecProcNode(node->mergeplans[i]);
+			PlanState  *pstate = node->mergeplans[i];
+
+			pstate->ps_numTuples = node->ps.ps_numTuples;
+			node->ms_slots[i] = ExecProcNode(pstate);
 			if (!TupIsNull(node->ms_slots[i]))
 				binaryheap_add_unordered(node->ms_heap, Int32GetDatum(i));
 		}
diff --git a/src/backend/executor/nodeResult.c b/src/backend/executor/nodeResult.c
index b5b50b2..be94b42 100644
--- a/src/backend/executor/nodeResult.c
+++ b/src/backend/executor/nodeResult.c
@@ -47,6 +47,7 @@
 
 #include "executor/executor.h"
 #include "executor/nodeResult.h"
+#include "nodes/nodeFuncs.h"
 #include "utils/memutils.h"
 
 
@@ -73,6 +74,28 @@ ExecResult(ResultState *node)
 	econtext = node->ps.ps_ExprContext;
 
 	/*
+	 * Pass down the number of required tuples by the upper node
+	 *
+	 * An extra consideration here is that if the Result is projecting a
+	 * targetlist that contains any SRFs, we can't assume that every input
+	 * tuple generates an output tuple, so a Sort underneath might need to
+	 * return more than N tuples to satisfy LIMIT N. So we cannot use
+	 * bounded sort.
+	 *
+	 * If Result supported qual checking, we'd have to punt on seeing a
+	 * qual, too.  Note that having a resconstantqual is not a
+	 * showstopper: if that fails we're not getting any rows at all.
+	 */
+	if (!node->rs_started)
+	{
+		if (outerPlanState(node) &&
+			!expression_returns_set((Node *) node->ps.plan->targetlist))
+			outerPlanState(node)->ps_numTuples = node->ps.ps_numTuples;
+
+		node->rs_started = true;
+	}
+
+	/*
 	 * check constant qualifications like (2 > 1), if not already done
 	 */
 	if (node->rs_checkqual)
@@ -191,6 +214,7 @@ ExecInitResult(Result *node, EState *estate, int eflags)
 	resstate->ps.plan = (Plan *) node;
 	resstate->ps.state = estate;
 
+	resstate->rs_started = false;
 	resstate->rs_done = false;
 	resstate->rs_checkqual = (node->resconstantqual == NULL) ? false : true;
 
@@ -265,6 +289,7 @@ ExecEndResult(ResultState *node)
 void
 ExecReScanResult(ResultState *node)
 {
+	node->rs_started = false;
 	node->rs_done = false;
 	node->rs_checkqual = (node->resconstantqual == NULL) ? false : true;
 
diff --git a/src/backend/executor/nodeSort.c b/src/backend/executor/nodeSort.c
index 591a31a..b6b72f1 100644
--- a/src/backend/executor/nodeSort.c
+++ b/src/backend/executor/nodeSort.c
@@ -66,6 +66,16 @@ ExecSort(SortState *node)
 
 		SO1_printf("ExecSort: %s\n",
 				   "sorting subplan");
+		/*
+		 * Check bounds according to the required number of tuples
+		 */
+		if (node->ss.ps.ps_numTuples == 0)
+			node->bounded = false;
+		else
+		{
+			node->bounded = true;
+			node->bound = node->ss.ps.ps_numTuples;
+		}
 
 		/*
 		 * Want to scan subplan in the forward direction while creating the
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 6332ea0..2e3c37a 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -1077,6 +1077,8 @@ typedef struct PlanState
 	TupleTableSlot *ps_ResultTupleSlot; /* slot for my result tuples */
 	ExprContext *ps_ExprContext;	/* node's expression-evaluation context */
 	ProjectionInfo *ps_ProjInfo;	/* info for doing tuple projection */
+	uint64		ps_numTuples;	/* number of tuples required by the upper
+								 * node if any. 0 means all the tuples */
 } PlanState;
 
 /* ----------------
@@ -1125,6 +1127,7 @@ typedef struct ResultState
 {
 	PlanState	ps;				/* its first field is NodeTag */
 	ExprState  *resconstantqual;
+	bool		rs_started;		/* are we already called? */
 	bool		rs_done;		/* are we done? */
 	bool		rs_checkqual;	/* do we need to check the qual? */
 } ResultState;
#10Tels
nospam-pg-abuse@bloodgate.com
In reply to: Kouhei Kaigai (#9)
Re: PassDownLimitBound for ForeignScan/CustomScan [take-2]

Hello,

On Wed, March 1, 2017 7:21 pm, Kouhei Kaigai wrote:

I've looked at the patch, and as I'm not that familiar with the
pg-sourcecode,
customs and so on, this isn't a review, more like food for thought and
all
should be taken with a grain of salt. :)

So here are a few questions and remarks:

+ double limit_tuples = -1.0;

Surely the limit cannot be fractional, and must be an integer. So
wouldn't
it be better the same type as say:

+ if (root->limit_tuples >= 0.0 &&

Than you could also compare with ">= 0", not ">= 0.0".

The above variable represents the "estimated" number of rows at the
planning stage, not execution time.
You may be able to see Path structure has "rows" field declared as
double type. It makes sense to consider stupid paths during planning,
even if it is eventually rejected. For example, if a cross join with
two large tables appear during planning, 64bit integer will make overflow
easily.

Hm, ok. Not related to your patch, just curious: Is there a mechanism in
place that automatically rejects plans where the limit would overflow the
double to uint64 conversation? Or is this more of a "there would be
hopefully a plan with a better limit so we do not use the bad one"?

Would it possible to force a plan where such overflow would occur?

And this comment might be better "were we already called?"

+ bool rs_started; /* are we already

called? */

Other variables in ResultState uses present form, like:

+ bool rs_started; /* are we already called? */
bool rs_done; /* are we done? */
bool rs_checkqual; /* do we need to check the qual? */
} ResultState;

Yes, I noted that, but still "are" and "called" and "already" don't read
well together for me:

are - present form
called - past form like "were we called?", or "are we called bob?" an
ongoing process
already - it has started

So "are we already called" reads like someone is waiting for being called.

Maybe to mirror the comment on "rs_done":

/* have we started yet? */

Also, maybe it's easier for the comment to describe what is happening in
the code because of the flag, not just to the flag itself:

/* To do things once when we are called */

Anyway, it is a minor point and don't let me distract you from your work,
I do like the feature and the patch :)

Best wishes,

Tels

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#11Kouhei Kaigai
kaigai@ak.jp.nec.com
In reply to: Tels (#10)
1 attachment(s)
Re: PassDownLimitBound for ForeignScan/CustomScan [take-2]

Hello,

On Wed, March 1, 2017 7:21 pm, Kouhei Kaigai wrote:

I've looked at the patch, and as I'm not that familiar with the
pg-sourcecode, customs and so on, this isn't a review, more like food
for thought and all should be taken with a grain of salt. :)

So here are a few questions and remarks:

+ double limit_tuples = -1.0;

Surely the limit cannot be fractional, and must be an integer. So
wouldn't it be better the same type as say:

+ if (root->limit_tuples >= 0.0 &&

Than you could also compare with ">= 0", not ">= 0.0".

The above variable represents the "estimated" number of rows at the
planning stage, not execution time.
You may be able to see Path structure has "rows" field declared as
double type. It makes sense to consider stupid paths during planning,
even if it is eventually rejected. For example, if a cross join with
two large tables appear during planning, 64bit integer will make
overflow easily.

Hm, ok. Not related to your patch, just curious: Is there a mechanism in
place that automatically rejects plans where the limit would overflow the
double to uint64 conversation? Or is this more of a "there would be hopefully
a plan with a better limit so we do not use the bad one"?

Would it possible to force a plan where such overflow would occur?

We have no such mechanism, and less necessity.
Estimated number of rows in plan time is stored in the plan_rows field of
the Plan structure, as FP64. Once plan-tree gets constructed, estimated
number of rows shall not affect to the execution. (Some plan might use it
for estimation of resource consumption on execution time.)
On the other hands, the actual number of rows that was processed is saved
on the instrument field of the PlanState structure. It is counted up from
the zero by one. So, people wants to replace the hardware prior to uint64
overflow. If 1B rows are processed per sec, uint64 overflow happen at 26th
century(!).

And this comment might be better "were we already called?"

+ bool rs_started; /* are we already

called? */

Other variables in ResultState uses present form, like:

+ bool rs_started; /* are we already called? */
bool rs_done; /* are we done? */
bool rs_checkqual; /* do we need to check the qual? */
} ResultState;

Yes, I noted that, but still "are" and "called" and "already" don't read
well together for me:

are - present form
called - past form like "were we called?", or "are we called bob?" an
ongoing process
already - it has started

So "are we already called" reads like someone is waiting for being called.

Maybe to mirror the comment on "rs_done":

/* have we started yet? */

Also, maybe it's easier for the comment to describe what is happening in
the code because of the flag, not just to the flag itself:

/* To do things once when we are called */

Anyway, it is a minor point and don't let me distract you from your work,
I do like the feature and the patch :)

Fixed to "have we started yet?"

----
PG-Strom Project / NEC OSS Promotion Center
KaiGai Kohei <kaigai@ak.jp.nec.com>

Attachments:

passdown-limit-fdw.v6.patchapplication/octet-stream; name=passdown-limit-fdw.v6.patchDownload
 contrib/postgres_fdw/expected/postgres_fdw.out | 51 ++++++++-------------
 contrib/postgres_fdw/postgres_fdw.c            | 56 ++++++++++++++++++++++-
 src/backend/executor/nodeLimit.c               | 62 ++------------------------
 src/backend/executor/nodeMergeAppend.c         |  6 ++-
 src/backend/executor/nodeResult.c              | 25 +++++++++++
 src/backend/executor/nodeSort.c                | 10 +++++
 src/include/nodes/execnodes.h                  |  3 ++
 7 files changed, 118 insertions(+), 95 deletions(-)

diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 0b9e3e4..08ead31 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -1003,18 +1003,15 @@ SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t
 -- join three tables
 EXPLAIN (VERBOSE, COSTS OFF)
 SELECT t1.c1, t2.c2, t3.c3 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) JOIN ft4 t3 ON (t3.c1 = t1.c1) ORDER BY t1.c3, t1.c1 OFFSET 10 LIMIT 10;
-                                                                                            QUERY PLAN                                                                                             
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+                                                                                                                     QUERY PLAN                                                                                                                     
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  Limit
    Output: t1.c1, t2.c2, t3.c3, t1.c3
-   ->  Sort
+   ->  Foreign Scan
          Output: t1.c1, t2.c2, t3.c3, t1.c3
-         Sort Key: t1.c3, t1.c1
-         ->  Foreign Scan
-               Output: t1.c1, t2.c2, t3.c3, t1.c3
-               Relations: ((public.ft1 t1) INNER JOIN (public.ft2 t2)) INNER JOIN (public.ft4 t3)
-               Remote SQL: SELECT r1."C 1", r1.c3, r2.c2, r4.c3 FROM (("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1")))) INNER JOIN "S 1"."T 3" r4 ON (((r1."C 1" = r4.c1))))
-(9 rows)
+         Relations: ((public.ft1 t1) INNER JOIN (public.ft2 t2)) INNER JOIN (public.ft4 t3)
+         Remote SQL: SELECT r1."C 1", r1.c3, r2.c2, r4.c3 FROM (("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1")))) INNER JOIN "S 1"."T 3" r4 ON (((r1."C 1" = r4.c1)))) ORDER BY r1.c3 ASC NULLS LAST, r1."C 1" ASC NULLS LAST
+(6 rows)
 
 SELECT t1.c1, t2.c2, t3.c3 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) JOIN ft4 t3 ON (t3.c1 = t1.c1) ORDER BY t1.c3, t1.c1 OFFSET 10 LIMIT 10;
  c1 | c2 |   c3   
@@ -1477,18 +1474,15 @@ SELECT t1.c1, t2.c2, t3.c3 FROM ft2 t1 LEFT JOIN ft2 t2 ON (t1.c1 = t2.c1) RIGHT
 -- full outer join + WHERE clause, only matched rows
 EXPLAIN (VERBOSE, COSTS OFF)
 SELECT t1.c1, t2.c1 FROM ft4 t1 FULL JOIN ft5 t2 ON (t1.c1 = t2.c1) WHERE (t1.c1 = t2.c1 OR t1.c1 IS NULL) ORDER BY t1.c1, t2.c1 OFFSET 10 LIMIT 10;
-                                                                            QUERY PLAN                                                                            
-------------------------------------------------------------------------------------------------------------------------------------------------------------------
+                                                                                                   QUERY PLAN                                                                                                   
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  Limit
    Output: t1.c1, t2.c1
-   ->  Sort
+   ->  Foreign Scan
          Output: t1.c1, t2.c1
-         Sort Key: t1.c1, t2.c1
-         ->  Foreign Scan
-               Output: t1.c1, t2.c1
-               Relations: (public.ft4 t1) FULL JOIN (public.ft5 t2)
-               Remote SQL: SELECT r1.c1, r2.c1 FROM ("S 1"."T 3" r1 FULL JOIN "S 1"."T 4" r2 ON (((r1.c1 = r2.c1)))) WHERE (((r1.c1 = r2.c1) OR (r1.c1 IS NULL)))
-(9 rows)
+         Relations: (public.ft4 t1) FULL JOIN (public.ft5 t2)
+         Remote SQL: SELECT r1.c1, r2.c1 FROM ("S 1"."T 3" r1 FULL JOIN "S 1"."T 4" r2 ON (((r1.c1 = r2.c1)))) WHERE (((r1.c1 = r2.c1) OR (r1.c1 IS NULL))) ORDER BY r1.c1 ASC NULLS LAST, r2.c1 ASC NULLS LAST
+(6 rows)
 
 SELECT t1.c1, t2.c1 FROM ft4 t1 FULL JOIN ft5 t2 ON (t1.c1 = t2.c1) WHERE (t1.c1 = t2.c1 OR t1.c1 IS NULL) ORDER BY t1.c1, t2.c1 OFFSET 10 LIMIT 10;
  c1 | c1 
@@ -1804,24 +1798,15 @@ SELECT t1.c1 FROM ft1 t1 WHERE NOT EXISTS (SELECT 1 FROM ft2 t2 WHERE t1.c1 = t2
 -- CROSS JOIN, not pushed down
 EXPLAIN (VERBOSE, COSTS OFF)
 SELECT t1.c1, t2.c1 FROM ft1 t1 CROSS JOIN ft2 t2 ORDER BY t1.c1, t2.c1 OFFSET 100 LIMIT 10;
-                             QUERY PLAN                              
----------------------------------------------------------------------
+                                                                            QUERY PLAN                                                                             
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------
  Limit
    Output: t1.c1, t2.c1
-   ->  Sort
+   ->  Foreign Scan
          Output: t1.c1, t2.c1
-         Sort Key: t1.c1, t2.c1
-         ->  Nested Loop
-               Output: t1.c1, t2.c1
-               ->  Foreign Scan on public.ft1 t1
-                     Output: t1.c1
-                     Remote SQL: SELECT "C 1" FROM "S 1"."T 1"
-               ->  Materialize
-                     Output: t2.c1
-                     ->  Foreign Scan on public.ft2 t2
-                           Output: t2.c1
-                           Remote SQL: SELECT "C 1" FROM "S 1"."T 1"
-(15 rows)
+         Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
+         Remote SQL: SELECT r1."C 1", r2."C 1" FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (TRUE)) ORDER BY r1."C 1" ASC NULLS LAST, r2."C 1" ASC NULLS LAST
+(6 rows)
 
 SELECT t1.c1, t2.c1 FROM ft1 t1 CROSS JOIN ft2 t2 ORDER BY t1.c1, t2.c1 OFFSET 100 LIMIT 10;
  c1 | c1  
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 990313a..53e6ebd 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -2427,6 +2427,17 @@ postgresExplainForeignScan(ForeignScanState *node, ExplainState *es)
 	if (es->verbose)
 	{
 		sql = strVal(list_nth(fdw_private, FdwScanPrivateSelectSql));
+
+		/* LIMIT clause is attached on execution time */
+		if (node->ss.ps.ps_numTuples > 0)
+		{
+			StringInfoData	buf;
+
+			initStringInfo(&buf);
+			appendStringInfo(&buf, "%s LIMIT %lu",
+							 sql, node->ss.ps.ps_numTuples);
+			sql = buf.data;
+		}
 		ExplainPropertyText("Remote SQL", sql, es);
 	}
 }
@@ -2492,6 +2503,7 @@ estimate_path_cost_size(PlannerInfo *root,
 						Cost *p_startup_cost, Cost *p_total_cost)
 {
 	PgFdwRelationInfo *fpinfo = (PgFdwRelationInfo *) foreignrel->fdw_private;
+	double		limit_tuples = -1.0;
 	double		rows;
 	double		retrieved_rows;
 	int			width;
@@ -2500,6 +2512,18 @@ estimate_path_cost_size(PlannerInfo *root,
 	Cost		cpu_per_tuple;
 
 	/*
+	 * LIMIT clause can be pushed down only when this foreign-path does not
+	 * need to join any other base relations, and the supplied path-keys
+	 * strictly match with the final output order.
+	 * Elsewhere, pushdown of LIMIT clause makes incorrect results, thus,
+	 * estimated cost don't assume remote LIMIT execution.
+	 */
+	if (root->limit_tuples >= 0.0 &&
+		equal(root->query_pathkeys, pathkeys) &&
+		bms_equal(root->all_baserels, foreignrel->relids))
+		limit_tuples = root->limit_tuples;
+
+	/*
 	 * If the table or the server is configured to use remote estimates,
 	 * connect to the foreign server and execute EXPLAIN to estimate the
 	 * number of rows selected by the restriction+join clauses.  Otherwise,
@@ -2552,6 +2576,12 @@ estimate_path_cost_size(PlannerInfo *root,
 		deparseSelectStmtForRel(&sql, root, foreignrel, fdw_scan_tlist,
 								remote_conds, pathkeys, &retrieved_attrs,
 								NULL);
+		/*
+		 * Attach LIMIT if this path is top-level and constant value is
+		 * specified.
+		 */
+		if (limit_tuples >= 0.0)
+			appendStringInfo(&sql, " LIMIT %.0f", limit_tuples);
 
 		/* Get the remote estimate */
 		conn = GetConnection(fpinfo->user, false);
@@ -2626,6 +2656,12 @@ estimate_path_cost_size(PlannerInfo *root,
 
 			/* Estimate of number of rows in cross product */
 			nrows = fpinfo_i->rows * fpinfo_o->rows;
+			if (limit_tuples >= 0.0)
+			{
+				nrows = Min(nrows, limit_tuples);
+				rows = Min(rows, limit_tuples);
+			}
+
 			/* Clamp retrieved rows estimate to at most size of cross product */
 			retrieved_rows = Min(retrieved_rows, nrows);
 
@@ -2723,6 +2759,10 @@ estimate_path_cost_size(PlannerInfo *root,
 			/*
 			 * Number of rows expected from foreign server will be same as
 			 * that of number of groups.
+			 *
+			 * MEMO: root->limit_tuples is not attached when query contains
+			 * grouping-clause or aggregate functions. So, we don't adjust
+			 * rows even if LIMIT <const> is supplied.
 			 */
 			rows = retrieved_rows = numGroups;
 
@@ -2753,8 +2793,16 @@ estimate_path_cost_size(PlannerInfo *root,
 		}
 		else
 		{
+			double		nrows = foreignrel->tuples;
+
+			if (limit_tuples >= 0.0)
+			{
+				nrows = Min(nrows, limit_tuples);
+				rows = Min(rows, limit_tuples);
+			}
+
 			/* Clamp retrieved rows estimates to at most foreignrel->tuples. */
-			retrieved_rows = Min(retrieved_rows, foreignrel->tuples);
+			retrieved_rows = Min(retrieved_rows, nrows);
 
 			/*
 			 * Cost as though this were a seqscan, which is pessimistic.  We
@@ -2767,7 +2815,7 @@ estimate_path_cost_size(PlannerInfo *root,
 
 			startup_cost += foreignrel->baserestrictcost.startup;
 			cpu_per_tuple = cpu_tuple_cost + foreignrel->baserestrictcost.per_tuple;
-			run_cost += cpu_per_tuple * foreignrel->tuples;
+			run_cost += cpu_per_tuple * nrows;
 		}
 
 		/*
@@ -2941,6 +2989,10 @@ create_cursor(ForeignScanState *node)
 	appendStringInfo(&buf, "DECLARE c%u CURSOR FOR\n%s",
 					 fsstate->cursor_number, fsstate->query);
 
+	/* Append LIMIT if numTuples were passed-down */
+	if (node->ss.ps.ps_numTuples > 0)
+		appendStringInfo(&buf, " LIMIT %lu", node->ss.ps.ps_numTuples);
+
 	/*
 	 * Notice that we pass NULL for paramTypes, thus forcing the remote server
 	 * to infer types for all parameters.  Since we explicitly cast every
diff --git a/src/backend/executor/nodeLimit.c b/src/backend/executor/nodeLimit.c
index aaec132..3cbc97a 100644
--- a/src/backend/executor/nodeLimit.c
+++ b/src/backend/executor/nodeLimit.c
@@ -26,7 +26,6 @@
 #include "nodes/nodeFuncs.h"
 
 static void recompute_limits(LimitState *node);
-static void pass_down_bound(LimitState *node, PlanState *child_node);
 
 
 /* ----------------------------------------------------------------
@@ -232,6 +231,7 @@ static void
 recompute_limits(LimitState *node)
 {
 	ExprContext *econtext = node->ps.ps_ExprContext;
+	uint64		tuples_needed;
 	Datum		val;
 	bool		isNull;
 
@@ -293,64 +293,8 @@ recompute_limits(LimitState *node)
 	/* Set state-machine state */
 	node->lstate = LIMIT_RESCAN;
 
-	/* Notify child node about limit, if useful */
-	pass_down_bound(node, outerPlanState(node));
-}
-
-/*
- * If we have a COUNT, and our input is a Sort node, notify it that it can
- * use bounded sort.  Also, if our input is a MergeAppend, we can apply the
- * same bound to any Sorts that are direct children of the MergeAppend,
- * since the MergeAppend surely need read no more than that many tuples from
- * any one input.  We also have to be prepared to look through a Result,
- * since the planner might stick one atop MergeAppend for projection purposes.
- *
- * This is a bit of a kluge, but we don't have any more-abstract way of
- * communicating between the two nodes; and it doesn't seem worth trying
- * to invent one without some more examples of special communication needs.
- *
- * Note: it is the responsibility of nodeSort.c to react properly to
- * changes of these parameters.  If we ever do redesign this, it'd be a
- * good idea to integrate this signaling with the parameter-change mechanism.
- */
-static void
-pass_down_bound(LimitState *node, PlanState *child_node)
-{
-	if (IsA(child_node, SortState))
-	{
-		SortState  *sortState = (SortState *) child_node;
-		int64		tuples_needed = node->count + node->offset;
-
-		/* negative test checks for overflow in sum */
-		if (node->noCount || tuples_needed < 0)
-		{
-			/* make sure flag gets reset if needed upon rescan */
-			sortState->bounded = false;
-		}
-		else
-		{
-			sortState->bounded = true;
-			sortState->bound = tuples_needed;
-		}
-	}
-	else if (IsA(child_node, MergeAppendState))
-	{
-		MergeAppendState *maState = (MergeAppendState *) child_node;
-		int			i;
-
-		for (i = 0; i < maState->ms_nplans; i++)
-			pass_down_bound(node, maState->mergeplans[i]);
-	}
-	else if (IsA(child_node, ResultState))
-	{
-		/*
-		 * If Result supported qual checking, we'd have to punt on seeing a
-		 * qual.  Note that having a resconstantqual is not a showstopper: if
-		 * that fails we're not getting any rows at all.
-		 */
-		if (outerPlanState(child_node))
-			pass_down_bound(node, outerPlanState(child_node));
-	}
+	tuples_needed = (node->noCount ? 0 : node->count + node->offset);
+	outerPlanState(node)->ps_numTuples = tuples_needed;
 }
 
 /* ----------------------------------------------------------------
diff --git a/src/backend/executor/nodeMergeAppend.c b/src/backend/executor/nodeMergeAppend.c
index 7a20bf0..bd7e0ce 100644
--- a/src/backend/executor/nodeMergeAppend.c
+++ b/src/backend/executor/nodeMergeAppend.c
@@ -174,10 +174,14 @@ ExecMergeAppend(MergeAppendState *node)
 		/*
 		 * First time through: pull the first tuple from each subplan, and set
 		 * up the heap.
+		 * Also, pass down the required number of tuples.
 		 */
 		for (i = 0; i < node->ms_nplans; i++)
 		{
-			node->ms_slots[i] = ExecProcNode(node->mergeplans[i]);
+			PlanState  *pstate = node->mergeplans[i];
+
+			pstate->ps_numTuples = node->ps.ps_numTuples;
+			node->ms_slots[i] = ExecProcNode(pstate);
 			if (!TupIsNull(node->ms_slots[i]))
 				binaryheap_add_unordered(node->ms_heap, Int32GetDatum(i));
 		}
diff --git a/src/backend/executor/nodeResult.c b/src/backend/executor/nodeResult.c
index b5b50b2..be94b42 100644
--- a/src/backend/executor/nodeResult.c
+++ b/src/backend/executor/nodeResult.c
@@ -47,6 +47,7 @@
 
 #include "executor/executor.h"
 #include "executor/nodeResult.h"
+#include "nodes/nodeFuncs.h"
 #include "utils/memutils.h"
 
 
@@ -73,6 +74,28 @@ ExecResult(ResultState *node)
 	econtext = node->ps.ps_ExprContext;
 
 	/*
+	 * Pass down the number of required tuples by the upper node
+	 *
+	 * An extra consideration here is that if the Result is projecting a
+	 * targetlist that contains any SRFs, we can't assume that every input
+	 * tuple generates an output tuple, so a Sort underneath might need to
+	 * return more than N tuples to satisfy LIMIT N. So we cannot use
+	 * bounded sort.
+	 *
+	 * If Result supported qual checking, we'd have to punt on seeing a
+	 * qual, too.  Note that having a resconstantqual is not a
+	 * showstopper: if that fails we're not getting any rows at all.
+	 */
+	if (!node->rs_started)
+	{
+		if (outerPlanState(node) &&
+			!expression_returns_set((Node *) node->ps.plan->targetlist))
+			outerPlanState(node)->ps_numTuples = node->ps.ps_numTuples;
+
+		node->rs_started = true;
+	}
+
+	/*
 	 * check constant qualifications like (2 > 1), if not already done
 	 */
 	if (node->rs_checkqual)
@@ -191,6 +214,7 @@ ExecInitResult(Result *node, EState *estate, int eflags)
 	resstate->ps.plan = (Plan *) node;
 	resstate->ps.state = estate;
 
+	resstate->rs_started = false;
 	resstate->rs_done = false;
 	resstate->rs_checkqual = (node->resconstantqual == NULL) ? false : true;
 
@@ -265,6 +289,7 @@ ExecEndResult(ResultState *node)
 void
 ExecReScanResult(ResultState *node)
 {
+	node->rs_started = false;
 	node->rs_done = false;
 	node->rs_checkqual = (node->resconstantqual == NULL) ? false : true;
 
diff --git a/src/backend/executor/nodeSort.c b/src/backend/executor/nodeSort.c
index 591a31a..b6b72f1 100644
--- a/src/backend/executor/nodeSort.c
+++ b/src/backend/executor/nodeSort.c
@@ -66,6 +66,16 @@ ExecSort(SortState *node)
 
 		SO1_printf("ExecSort: %s\n",
 				   "sorting subplan");
+		/*
+		 * Check bounds according to the required number of tuples
+		 */
+		if (node->ss.ps.ps_numTuples == 0)
+			node->bounded = false;
+		else
+		{
+			node->bounded = true;
+			node->bound = node->ss.ps.ps_numTuples;
+		}
 
 		/*
 		 * Want to scan subplan in the forward direction while creating the
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index f856f60..19630a2 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -1079,6 +1079,8 @@ typedef struct PlanState
 	TupleTableSlot *ps_ResultTupleSlot; /* slot for my result tuples */
 	ExprContext *ps_ExprContext;	/* node's expression-evaluation context */
 	ProjectionInfo *ps_ProjInfo;	/* info for doing tuple projection */
+	uint64		ps_numTuples;	/* number of tuples required by the upper
+								 * node if any. 0 means all the tuples */
 } PlanState;
 
 /* ----------------
@@ -1127,6 +1129,7 @@ typedef struct ResultState
 {
 	PlanState	ps;				/* its first field is NodeTag */
 	ExprState  *resconstantqual;
+	bool		rs_started;		/* have we started yet? */
 	bool		rs_done;		/* are we done? */
 	bool		rs_checkqual;	/* do we need to check the qual? */
 } ResultState;
#12Jeevan Chalke
jeevan.chalke@enterprisedb.com
In reply to: Kouhei Kaigai (#11)
Re: PassDownLimitBound for ForeignScan/CustomScan [take-2]

Hi,

I have reviewed this patch further and here are my comments:

1.
Will it be better to use compare_pathkeys() instead of
equal(root->query_pathkeys, pathkeys)?

2.
I think it will be good if we add a simple test-case in postgres_fdw.sql
which shows that LIMIT is passed to remote server. We might convert one of
the affected EXPLAIN to EXPLAIN ANALYZE.
With this patch, we do push LIMIT on foreign server but not at planning
time.
Thus we still show remote query in EXPLAIN output without LIMIT clause but
we
actually push that at execution time. Such explain plan confuses the reader.
So I think we better convert them to EXPLAIN ANALYZE or put some comments
before the query explaining this. I prefer to put comments as EXPLAIN
ANALYZE
is costly.

3.
"-- CROSS JOIN, not pushed down"
Above comment need to be changed now.
As you explained already, due to limit clause, CROSS-JOIN is pushed down to
remote server, thus need to update this comment.

Rest of the changes look good to me.

Thanks

--
Jeevan Chalke
Principal Software Engineer, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company

#13David Steele
david@pgmasters.net
In reply to: Jeevan Chalke (#12)
Re: PassDownLimitBound for ForeignScan/CustomScan [take-2]

Hi,

On 3/13/17 3:25 AM, Jeevan Chalke wrote:

I have reviewed this patch further and here are my comments:

This thread has been idle for over a week. Please respond and/or post a
new patch by 2017-03-24 00:00 AoE (UTC-12) or this submission will be
marked "Returned with Feedback".

Thanks,
--
-David
david@pgmasters.net

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#14David Steele
david@pgmasters.net
In reply to: David Steele (#13)
Re: PassDownLimitBound for ForeignScan/CustomScan [take-2]

Hi Kaigai,

On 3/21/17 1:11 PM, David Steele wrote:

On 3/13/17 3:25 AM, Jeevan Chalke wrote:

I have reviewed this patch further and here are my comments:

This thread has been idle for over a week. Please respond and/or post a
new patch by 2017-03-24 00:00 AoE (UTC-12) or this submission will be
marked "Returned with Feedback".

This submission has been marked "Returned with Feedback". Please feel
free to resubmit to a future commitfest.

Regards,
--
-David
david@pgmasters.net

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers