[POC] FETCH limited by bytes.

Started by Kyotaro HORIGUCHIalmost 11 years ago36 messages

horiguchi.kyotaro@lab.ntt.co.jp

almost 11 years ago

1 attachment(s)

Hello, as the discuttion on async fetching on postgres_fdw, FETCH
with data-size limitation would be useful to get memory usage
stability of postgres_fdw.

Is such a feature and syntax could be allowed to be added?

Postgres_fdw fetches tuples from remote servers using cursor. The
transfer gets faster as the number of fetch decreases. On the
other hand buffer size for the fetched tuples widely varies
according to their average length. 100 tuples per fetch is quite
small for short tuples but larger fetch size will easily cause
memory exhaustion. However, there's no way to know it in advance.

One means to settle the contradiction would be a FETCH which
sends result limiting by size, not the number of tuples. So I'd
like to propose this.

This patch is a POC for the feature. For exapmle,

FETCH 10000 LIMIT 1000000 FROM c1;

This FETCH retrieves up to 10000 tuples but cut out just after
the total tuple length exceeds 1MB. (It does not literally
"LIMIT" in that sense)

The syntax added by this patch is described as following.

FETCH [FORWARD|BACKWARD] <ALL|SignedIconst> LIMIT Iconst [FROM|IN] curname

The "data size" to be compared with the LIMIT size is the
summation of the result of the following expression. The
appropriateness of it should be arguable.

[if tupleslot has tts_tuple]
HEAPTUPLESIZE + slot->tts_tuple->t_len
[else]
HEAPTUPLESIZE +
heap_compute_data_size(slot->tts_tupleDescriptor,
slot->tts_values,
slot->tts_isnull);

========================

This patch does following changes,

- This patch adds the parameter "size" to following functions
(standard_)ExecutorRun / ExecutePlan / RunFromStore
PortalRun / PortalRunSelect / PortalRunFetch / DoPortalRunFetch

- The core is in StandardExecutorRun and RunFromStore. Simplly
sum up the sent tuple length and compare against the given
limit.

- struct FetchStmt and EState has new member.

- The modifications in gram.y affects on ecpg parser. I think I
could fix them but with no confidence :(

- Modified the corespondence parts of the changes above in
auto_explain and pg_stat_statments only in parameter list.

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachments:

0001-Size-limitation-feature-of-FETCH-v0.patchtext/x-patch; charset=us-asciiDownload

>From 6f1dd6998ba312c3552f137365e3a3118b7935be Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi <horiguchi.kyotaro@lab.ntt.co.jp>
Date: Wed, 21 Jan 2015 17:18:09 +0900
Subject: [PATCH] Size limitation feature of FETCH v0

---
 contrib/auto_explain/auto_explain.c             |  8 +--
 contrib/pg_stat_statements/pg_stat_statements.c |  8 +--
 src/backend/commands/copy.c                     |  2 +-
 src/backend/commands/createas.c                 |  2 +-
 src/backend/commands/explain.c                  |  2 +-
 src/backend/commands/extension.c                |  2 +-
 src/backend/commands/matview.c                  |  2 +-
 src/backend/commands/portalcmds.c               |  3 +-
 src/backend/commands/prepare.c                  |  2 +-
 src/backend/executor/execMain.c                 | 35 +++++++--
 src/backend/executor/execUtils.c                |  1 +
 src/backend/executor/functions.c                |  2 +-
 src/backend/executor/spi.c                      |  4 +-
 src/backend/parser/gram.y                       | 59 +++++++++++++++
 src/backend/tcop/postgres.c                     |  2 +
 src/backend/tcop/pquery.c                       | 95 +++++++++++++++++--------
 src/include/executor/executor.h                 |  8 +--
 src/include/nodes/execnodes.h                   |  1 +
 src/include/nodes/parsenodes.h                  |  1 +
 src/include/tcop/pquery.h                       |  3 +-
 src/interfaces/ecpg/preproc/Makefile            |  2 +-
 src/interfaces/ecpg/preproc/ecpg.addons         | 63 ++++++++++++++++
 22 files changed, 248 insertions(+), 59 deletions(-)

diff --git a/contrib/auto_explain/auto_explain.c b/contrib/auto_explain/auto_explain.c
index 2a184ed..f121a33 100644
--- a/contrib/auto_explain/auto_explain.c
+++ b/contrib/auto_explain/auto_explain.c
@@ -57,7 +57,7 @@ void		_PG_fini(void);
 static void explain_ExecutorStart(QueryDesc *queryDesc, int eflags);
 static void explain_ExecutorRun(QueryDesc *queryDesc,
 					ScanDirection direction,
-					long count);
+					long count, long size);
 static void explain_ExecutorFinish(QueryDesc *queryDesc);
 static void explain_ExecutorEnd(QueryDesc *queryDesc);
 
@@ -232,15 +232,15 @@ explain_ExecutorStart(QueryDesc *queryDesc, int eflags)
  * ExecutorRun hook: all we need do is track nesting depth
  */
 static void
-explain_ExecutorRun(QueryDesc *queryDesc, ScanDirection direction, long count)
+explain_ExecutorRun(QueryDesc *queryDesc, ScanDirection direction, long count, long size)
 {
 	nesting_level++;
 	PG_TRY();
 	{
 		if (prev_ExecutorRun)
-			prev_ExecutorRun(queryDesc, direction, count);
+			prev_ExecutorRun(queryDesc, direction, count, size);
 		else
-			standard_ExecutorRun(queryDesc, direction, count);
+			standard_ExecutorRun(queryDesc, direction, count, size);
 		nesting_level--;
 	}
 	PG_CATCH();
diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index 2629bfc..a68c11d 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -282,7 +282,7 @@ static void pgss_post_parse_analyze(ParseState *pstate, Query *query);
 static void pgss_ExecutorStart(QueryDesc *queryDesc, int eflags);
 static void pgss_ExecutorRun(QueryDesc *queryDesc,
 				 ScanDirection direction,
-				 long count);
+				 long count, long size);
 static void pgss_ExecutorFinish(QueryDesc *queryDesc);
 static void pgss_ExecutorEnd(QueryDesc *queryDesc);
 static void pgss_ProcessUtility(Node *parsetree, const char *queryString,
@@ -863,15 +863,15 @@ pgss_ExecutorStart(QueryDesc *queryDesc, int eflags)
  * ExecutorRun hook: all we need do is track nesting depth
  */
 static void
-pgss_ExecutorRun(QueryDesc *queryDesc, ScanDirection direction, long count)
+pgss_ExecutorRun(QueryDesc *queryDesc, ScanDirection direction, long count, long size)
 {
 	nested_level++;
 	PG_TRY();
 	{
 		if (prev_ExecutorRun)
-			prev_ExecutorRun(queryDesc, direction, count);
+			prev_ExecutorRun(queryDesc, direction, count, size);
 		else
-			standard_ExecutorRun(queryDesc, direction, count);
+			standard_ExecutorRun(queryDesc, direction, count, size);
 		nested_level--;
 	}
 	PG_CATCH();
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 0e604b7..b6e6523 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -1915,7 +1915,7 @@ CopyTo(CopyState cstate)
 	else
 	{
 		/* run the plan --- the dest receiver will send tuples */
-		ExecutorRun(cstate->queryDesc, ForwardScanDirection, 0L);
+		ExecutorRun(cstate->queryDesc, ForwardScanDirection, 0L, 0L);
 		processed = ((DR_copy *) cstate->queryDesc->dest)->processed;
 	}
 
diff --git a/src/backend/commands/createas.c b/src/backend/commands/createas.c
index abc0fe8..c5c4478 100644
--- a/src/backend/commands/createas.c
+++ b/src/backend/commands/createas.c
@@ -192,7 +192,7 @@ ExecCreateTableAs(CreateTableAsStmt *stmt, const char *queryString,
 		dir = ForwardScanDirection;
 
 	/* run the plan */
-	ExecutorRun(queryDesc, dir, 0L);
+	ExecutorRun(queryDesc, dir, 0L, 0L);
 
 	/* save the rowcount if we're given a completionTag to fill */
 	if (completionTag)
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 7cfc9bb..2c23e9b 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -489,7 +489,7 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
 			dir = ForwardScanDirection;
 
 		/* run the plan */
-		ExecutorRun(queryDesc, dir, 0L);
+		ExecutorRun(queryDesc, dir, 0L, 0L);
 
 		/* run cleanup too */
 		ExecutorFinish(queryDesc);
diff --git a/src/backend/commands/extension.c b/src/backend/commands/extension.c
index 3b95552..f624567 100644
--- a/src/backend/commands/extension.c
+++ b/src/backend/commands/extension.c
@@ -736,7 +736,7 @@ execute_sql_string(const char *sql, const char *filename)
 										dest, NULL, 0);
 
 				ExecutorStart(qdesc, 0);
-				ExecutorRun(qdesc, ForwardScanDirection, 0);
+				ExecutorRun(qdesc, ForwardScanDirection, 0, 0);
 				ExecutorFinish(qdesc);
 				ExecutorEnd(qdesc);
 
diff --git a/src/backend/commands/matview.c b/src/backend/commands/matview.c
index 74415b8..6530ecb 100644
--- a/src/backend/commands/matview.c
+++ b/src/backend/commands/matview.c
@@ -360,7 +360,7 @@ refresh_matview_datafill(DestReceiver *dest, Query *query,
 	ExecutorStart(queryDesc, EXEC_FLAG_WITHOUT_OIDS);
 
 	/* run the plan */
-	ExecutorRun(queryDesc, ForwardScanDirection, 0L);
+	ExecutorRun(queryDesc, ForwardScanDirection, 0L, 0L);
 
 	/* and clean up */
 	ExecutorFinish(queryDesc);
diff --git a/src/backend/commands/portalcmds.c b/src/backend/commands/portalcmds.c
index 2794537..255c86e 100644
--- a/src/backend/commands/portalcmds.c
+++ b/src/backend/commands/portalcmds.c
@@ -177,6 +177,7 @@ PerformPortalFetch(FetchStmt *stmt,
 	nprocessed = PortalRunFetch(portal,
 								stmt->direction,
 								stmt->howMany,
+								stmt->howLarge,
 								dest);
 
 	/* Return command status if wanted */
@@ -375,7 +376,7 @@ PersistHoldablePortal(Portal portal)
 										true);
 
 		/* Fetch the result set into the tuplestore */
-		ExecutorRun(queryDesc, ForwardScanDirection, 0L);
+		ExecutorRun(queryDesc, ForwardScanDirection, 0L, 0L);
 
 		(*queryDesc->dest->rDestroy) (queryDesc->dest);
 		queryDesc->dest = NULL;
diff --git a/src/backend/commands/prepare.c b/src/backend/commands/prepare.c
index 71b08f0..31799f5 100644
--- a/src/backend/commands/prepare.c
+++ b/src/backend/commands/prepare.c
@@ -291,7 +291,7 @@ ExecuteQuery(ExecuteStmt *stmt, IntoClause *intoClause,
 	 */
 	PortalStart(portal, paramLI, eflags, GetActiveSnapshot());
 
-	(void) PortalRun(portal, count, false, dest, dest, completionTag);
+	(void) PortalRun(portal, count, 0L, false, dest, dest, completionTag);
 
 	PortalDrop(portal, false);
 
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index b9f21c5..9cecc1d 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -78,6 +78,7 @@ static void ExecutePlan(EState *estate, PlanState *planstate,
 			CmdType operation,
 			bool sendTuples,
 			long numberTuples,
+			long sizeTuples,
 			ScanDirection direction,
 			DestReceiver *dest);
 static bool ExecCheckRTEPerms(RangeTblEntry *rte);
@@ -248,17 +249,17 @@ standard_ExecutorStart(QueryDesc *queryDesc, int eflags)
  */
 void
 ExecutorRun(QueryDesc *queryDesc,
-			ScanDirection direction, long count)
+			ScanDirection direction, long count, long size)
 {
 	if (ExecutorRun_hook)
-		(*ExecutorRun_hook) (queryDesc, direction, count);
+		(*ExecutorRun_hook) (queryDesc, direction, count, size);
 	else
-		standard_ExecutorRun(queryDesc, direction, count);
+		standard_ExecutorRun(queryDesc, direction, count, size);
 }
 
 void
 standard_ExecutorRun(QueryDesc *queryDesc,
-					 ScanDirection direction, long count)
+					 ScanDirection direction, long count, long size)
 {
 	EState	   *estate;
 	CmdType		operation;
@@ -310,6 +311,7 @@ standard_ExecutorRun(QueryDesc *queryDesc,
 					operation,
 					sendTuples,
 					count,
+					size,
 					direction,
 					dest);
 
@@ -1450,22 +1452,26 @@ ExecutePlan(EState *estate,
 			CmdType operation,
 			bool sendTuples,
 			long numberTuples,
+			long sizeTuples,
 			ScanDirection direction,
 			DestReceiver *dest)
 {
 	TupleTableSlot *slot;
 	long		current_tuple_count;
+	long		sent_size;
 
 	/*
 	 * initialize local variables
 	 */
 	current_tuple_count = 0;
-
+	sent_size = 0;
 	/*
 	 * Set the direction.
 	 */
 	estate->es_direction = direction;
 
+	estate->es_stoppedbysize = false;
+
 	/*
 	 * Loop until we've processed the proper number of tuples from the plan.
 	 */
@@ -1520,6 +1526,25 @@ ExecutePlan(EState *estate,
 		current_tuple_count++;
 		if (numberTuples && numberTuples == current_tuple_count)
 			break;
+
+		/* Count the size of tuples we've sent */
+		if (slot->tts_tuple)
+			sent_size += HEAPTUPLESIZE + slot->tts_tuple->t_len;
+		else
+		{
+			sent_size += HEAPTUPLESIZE +
+				heap_compute_data_size(slot->tts_tupleDescriptor,
+									   slot->tts_values,
+									   slot->tts_isnull);
+		}
+
+		/* Quit when the size limit will be exceeded by this tuple */
+		if (sizeTuples > 0 && sizeTuples < sent_size)
+		{
+			estate->es_stoppedbysize = true;
+			break;
+		}
+
 	}
 }
 
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index 32697dd..ff2c395 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -133,6 +133,7 @@ CreateExecutorState(void)
 	estate->es_rowMarks = NIL;
 
 	estate->es_processed = 0;
+	estate->es_stoppedbysize = false;
 	estate->es_lastoid = InvalidOid;
 
 	estate->es_top_eflags = 0;
diff --git a/src/backend/executor/functions.c b/src/backend/executor/functions.c
index 84be37c..d64e908 100644
--- a/src/backend/executor/functions.c
+++ b/src/backend/executor/functions.c
@@ -850,7 +850,7 @@ postquel_getnext(execution_state *es, SQLFunctionCachePtr fcache)
 		/* Run regular commands to completion unless lazyEval */
 		long		count = (es->lazyEval) ? 1L : 0L;
 
-		ExecutorRun(es->qd, ForwardScanDirection, count);
+		ExecutorRun(es->qd, ForwardScanDirection, count, 0L);
 
 		/*
 		 * If we requested run to completion OR there was no tuple returned,
diff --git a/src/backend/executor/spi.c b/src/backend/executor/spi.c
index 4b86e91..cb30cfb 100644
--- a/src/backend/executor/spi.c
+++ b/src/backend/executor/spi.c
@@ -2369,7 +2369,7 @@ _SPI_pquery(QueryDesc *queryDesc, bool fire_triggers, long tcount)
 
 	ExecutorStart(queryDesc, eflags);
 
-	ExecutorRun(queryDesc, ForwardScanDirection, tcount);
+	ExecutorRun(queryDesc, ForwardScanDirection, tcount, 0L);
 
 	_SPI_current->processed = queryDesc->estate->es_processed;
 	_SPI_current->lastoid = queryDesc->estate->es_lastoid;
@@ -2447,7 +2447,7 @@ _SPI_cursor_operation(Portal portal, FetchDirection direction, long count,
 	/* Run the cursor */
 	nfetched = PortalRunFetch(portal,
 							  direction,
-							  count,
+							  count, 0L,
 							  dest);
 
 	/*
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 36dac29..3a139ed 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -520,6 +520,8 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 %type <str>		opt_existing_window_name
 %type <boolean> opt_if_not_exists
 
+%type <ival>	opt_fetch_limit
+
 /*
  * Non-keyword token types.  These are hard-wired into the "flex" lexer.
  * They must be listed first so that their numeric codes do not depend on
@@ -6021,6 +6023,15 @@ fetch_args:	cursor_name
 					n->howMany = $1;
 					$$ = (Node *)n;
 				}
+			| SignedIconst LIMIT Iconst opt_from_in cursor_name
+				{
+					FetchStmt *n = makeNode(FetchStmt);
+					n->portalname = $5;
+					n->direction = FETCH_FORWARD;
+					n->howMany = $1;
+					n->howLarge = $3;
+					$$ = (Node *)n;
+				}
 			| ALL opt_from_in cursor_name
 				{
 					FetchStmt *n = makeNode(FetchStmt);
@@ -6029,6 +6040,15 @@ fetch_args:	cursor_name
 					n->howMany = FETCH_ALL;
 					$$ = (Node *)n;
 				}
+			| ALL LIMIT Iconst opt_from_in cursor_name
+				{
+					FetchStmt *n = makeNode(FetchStmt);
+					n->portalname = $5;
+					n->direction = FETCH_FORWARD;
+					n->howMany = FETCH_ALL;
+					n->howLarge = $3;
+					$$ = (Node *)n;
+				}
 			| FORWARD opt_from_in cursor_name
 				{
 					FetchStmt *n = makeNode(FetchStmt);
@@ -6045,6 +6065,15 @@ fetch_args:	cursor_name
 					n->howMany = $2;
 					$$ = (Node *)n;
 				}
+			| FORWARD SignedIconst LIMIT Iconst opt_from_in cursor_name
+				{
+					FetchStmt *n = makeNode(FetchStmt);
+					n->portalname = $6;
+					n->direction = FETCH_FORWARD;
+					n->howMany = $2;
+					n->howLarge = $4;
+					$$ = (Node *)n;
+				}
 			| FORWARD ALL opt_from_in cursor_name
 				{
 					FetchStmt *n = makeNode(FetchStmt);
@@ -6053,6 +6082,15 @@ fetch_args:	cursor_name
 					n->howMany = FETCH_ALL;
 					$$ = (Node *)n;
 				}
+			| FORWARD ALL LIMIT Iconst opt_from_in cursor_name
+				{
+					FetchStmt *n = makeNode(FetchStmt);
+					n->portalname = $6;
+					n->direction = FETCH_FORWARD;
+					n->howMany = FETCH_ALL;
+					n->howLarge = $4;
+					$$ = (Node *)n;
+				}
 			| BACKWARD opt_from_in cursor_name
 				{
 					FetchStmt *n = makeNode(FetchStmt);
@@ -6069,6 +6107,15 @@ fetch_args:	cursor_name
 					n->howMany = $2;
 					$$ = (Node *)n;
 				}
+			| BACKWARD SignedIconst LIMIT Iconst opt_from_in cursor_name
+				{
+					FetchStmt *n = makeNode(FetchStmt);
+					n->portalname = $6;
+					n->direction = FETCH_BACKWARD;
+					n->howMany = $2;
+					n->howLarge = $4;
+					$$ = (Node *)n;
+				}
 			| BACKWARD ALL opt_from_in cursor_name
 				{
 					FetchStmt *n = makeNode(FetchStmt);
@@ -6077,6 +6124,15 @@ fetch_args:	cursor_name
 					n->howMany = FETCH_ALL;
 					$$ = (Node *)n;
 				}
+			| BACKWARD ALL LIMIT Iconst opt_from_in cursor_name
+				{
+					FetchStmt *n = makeNode(FetchStmt);
+					n->portalname = $6;
+					n->direction = FETCH_BACKWARD;
+					n->howMany = FETCH_ALL;
+					n->howLarge = $4;
+					$$ = (Node *)n;
+				}
 		;
 
 from_in:	FROM									{}
@@ -6087,6 +6143,9 @@ opt_from_in:	from_in								{}
 			| /* EMPTY */							{}
 		;
 
+opt_fetch_limit:	LIMIT Iconst					{ $$ = $2;}
+			| /* EMPTY */							{ $$ = 0; }
+		;
 
 /*****************************************************************************
  *
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 8f74353..55f062b 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -1043,6 +1043,7 @@ exec_simple_query(const char *query_string)
 		 */
 		(void) PortalRun(portal,
 						 FETCH_ALL,
+						 0,
 						 isTopLevel,
 						 receiver,
 						 receiver,
@@ -1928,6 +1929,7 @@ exec_execute_message(const char *portal_name, long max_rows)
 
 	completed = PortalRun(portal,
 						  max_rows,
+						  0,
 						  true, /* always top level */
 						  receiver,
 						  receiver,
diff --git a/src/backend/tcop/pquery.c b/src/backend/tcop/pquery.c
index 9c14e8a..5f68cc7 100644
--- a/src/backend/tcop/pquery.c
+++ b/src/backend/tcop/pquery.c
@@ -16,6 +16,7 @@
 #include "postgres.h"
 
 #include "access/xact.h"
+#include "access/htup_details.h"
 #include "commands/prepare.h"
 #include "executor/tstoreReceiver.h"
 #include "miscadmin.h"
@@ -39,9 +40,10 @@ static void ProcessQuery(PlannedStmt *plan,
 			 DestReceiver *dest,
 			 char *completionTag);
 static void FillPortalStore(Portal portal, bool isTopLevel);
-static uint32 RunFromStore(Portal portal, ScanDirection direction, long count,
+static uint32 RunFromStore(Portal portal, ScanDirection direction,
+		     long count, long size, bool *stoppedbysize,
 			 DestReceiver *dest);
-static long PortalRunSelect(Portal portal, bool forward, long count,
+static long PortalRunSelect(Portal portal, bool forward, long count, long size,
 				DestReceiver *dest);
 static void PortalRunUtility(Portal portal, Node *utilityStmt, bool isTopLevel,
 				 DestReceiver *dest, char *completionTag);
@@ -51,6 +53,7 @@ static void PortalRunMulti(Portal portal, bool isTopLevel,
 static long DoPortalRunFetch(Portal portal,
 				 FetchDirection fdirection,
 				 long count,
+				 long size,
 				 DestReceiver *dest);
 static void DoPortalRewind(Portal portal);
 
@@ -182,7 +185,7 @@ ProcessQuery(PlannedStmt *plan,
 	/*
 	 * Run the plan to completion.
 	 */
-	ExecutorRun(queryDesc, ForwardScanDirection, 0L);
+	ExecutorRun(queryDesc, ForwardScanDirection, 0L, 0L);
 
 	/*
 	 * Build command completion status string, if caller wants one.
@@ -703,7 +706,7 @@ PortalSetResultFormat(Portal portal, int nFormats, int16 *formats)
  * suspended due to exhaustion of the count parameter.
  */
 bool
-PortalRun(Portal portal, long count, bool isTopLevel,
+PortalRun(Portal portal, long count, long size, bool isTopLevel,
 		  DestReceiver *dest, DestReceiver *altdest,
 		  char *completionTag)
 {
@@ -787,7 +790,7 @@ PortalRun(Portal portal, long count, bool isTopLevel,
 				/*
 				 * Now fetch desired portion of results.
 				 */
-				nprocessed = PortalRunSelect(portal, true, count, dest);
+				nprocessed = PortalRunSelect(portal, true, count, size, dest);
 
 				/*
 				 * If the portal result contains a command tag and the caller
@@ -892,11 +895,13 @@ static long
 PortalRunSelect(Portal portal,
 				bool forward,
 				long count,
+				long size,
 				DestReceiver *dest)
 {
 	QueryDesc  *queryDesc;
 	ScanDirection direction;
 	uint32		nprocessed;
+	bool		stoppedbysize;
 
 	/*
 	 * NB: queryDesc will be NULL if we are fetching from a held cursor or a
@@ -939,12 +944,14 @@ PortalRunSelect(Portal portal,
 			count = 0;
 
 		if (portal->holdStore)
-			nprocessed = RunFromStore(portal, direction, count, dest);
+			nprocessed = RunFromStore(portal, direction, count,
+									  size, &stoppedbysize, dest);
 		else
 		{
 			PushActiveSnapshot(queryDesc->snapshot);
-			ExecutorRun(queryDesc, direction, count);
+			ExecutorRun(queryDesc, direction, count, size);
 			nprocessed = queryDesc->estate->es_processed;
+			stoppedbysize = queryDesc->estate->es_stoppedbysize;
 			PopActiveSnapshot();
 		}
 
@@ -954,8 +961,9 @@ PortalRunSelect(Portal portal,
 
 			if (nprocessed > 0)
 				portal->atStart = false;		/* OK to go backward now */
-			if (count == 0 ||
-				(unsigned long) nprocessed < (unsigned long) count)
+			if ((count == 0 ||
+				 (unsigned long) nprocessed < (unsigned long) count) &&
+				!stoppedbysize)
 				portal->atEnd = true;	/* we retrieved 'em all */
 			oldPos = portal->portalPos;
 			portal->portalPos += nprocessed;
@@ -982,12 +990,14 @@ PortalRunSelect(Portal portal,
 			count = 0;
 
 		if (portal->holdStore)
-			nprocessed = RunFromStore(portal, direction, count, dest);
+			nprocessed = RunFromStore(portal, direction, count,
+									  size, &stoppedbysize, dest);
 		else
 		{
 			PushActiveSnapshot(queryDesc->snapshot);
-			ExecutorRun(queryDesc, direction, count);
+			ExecutorRun(queryDesc, direction, count, size);
 			nprocessed = queryDesc->estate->es_processed;
+			stoppedbysize = queryDesc->estate->es_stoppedbysize;
 			PopActiveSnapshot();
 		}
 
@@ -998,8 +1008,9 @@ PortalRunSelect(Portal portal,
 				portal->atEnd = false;	/* OK to go forward now */
 				portal->portalPos++;	/* adjust for endpoint case */
 			}
-			if (count == 0 ||
-				(unsigned long) nprocessed < (unsigned long) count)
+			if ((count == 0 ||
+				 (unsigned long) nprocessed < (unsigned long) count) &&
+				!stoppedbysize)
 			{
 				portal->atStart = true; /* we retrieved 'em all */
 				portal->portalPos = 0;
@@ -1089,10 +1100,13 @@ FillPortalStore(Portal portal, bool isTopLevel)
  */
 static uint32
 RunFromStore(Portal portal, ScanDirection direction, long count,
-			 DestReceiver *dest)
+			 long size_limit, bool *stoppedbysize, DestReceiver *dest)
 {
 	long		current_tuple_count = 0;
 	TupleTableSlot *slot;
+	long			sent_size = 0;
+
+	*stoppedbysize = false;
 
 	slot = MakeSingleTupleTableSlot(portal->tupDesc);
 
@@ -1133,6 +1147,25 @@ RunFromStore(Portal portal, ScanDirection direction, long count,
 			current_tuple_count++;
 			if (count && count == current_tuple_count)
 				break;
+
+			/* Count the size of tuples we've sent */
+			if (slot->tts_tuple)
+				sent_size += HEAPTUPLESIZE + slot->tts_tuple->t_len;
+			else
+			{
+				sent_size += HEAPTUPLESIZE +
+					heap_compute_data_size(slot->tts_tupleDescriptor,
+										   slot->tts_values,
+										   slot->tts_isnull);
+			}
+
+			/* Quit when the size limit will be exceeded by this tuple */
+			if (current_tuple_count > 0 &&
+				size_limit > 0 && size_limit < sent_size)
+			{
+				*stoppedbysize = true;
+				break;
+			}
 		}
 	}
 
@@ -1385,6 +1418,7 @@ long
 PortalRunFetch(Portal portal,
 			   FetchDirection fdirection,
 			   long count,
+			   long size,
 			   DestReceiver *dest)
 {
 	long		result;
@@ -1422,7 +1456,7 @@ PortalRunFetch(Portal portal,
 		switch (portal->strategy)
 		{
 			case PORTAL_ONE_SELECT:
-				result = DoPortalRunFetch(portal, fdirection, count, dest);
+				result = DoPortalRunFetch(portal, fdirection, count, size, dest);
 				break;
 
 			case PORTAL_ONE_RETURNING:
@@ -1439,7 +1473,7 @@ PortalRunFetch(Portal portal,
 				/*
 				 * Now fetch desired portion of results.
 				 */
-				result = DoPortalRunFetch(portal, fdirection, count, dest);
+				result = DoPortalRunFetch(portal, fdirection, count, size, dest);
 				break;
 
 			default:
@@ -1484,6 +1518,7 @@ static long
 DoPortalRunFetch(Portal portal,
 				 FetchDirection fdirection,
 				 long count,
+				 long size,
 				 DestReceiver *dest)
 {
 	bool		forward;
@@ -1526,7 +1561,7 @@ DoPortalRunFetch(Portal portal,
 				{
 					DoPortalRewind(portal);
 					if (count > 1)
-						PortalRunSelect(portal, true, count - 1,
+						PortalRunSelect(portal, true, count - 1, 0L,
 										None_Receiver);
 				}
 				else
@@ -1536,13 +1571,13 @@ DoPortalRunFetch(Portal portal,
 					if (portal->atEnd)
 						pos++;	/* need one extra fetch if off end */
 					if (count <= pos)
-						PortalRunSelect(portal, false, pos - count + 1,
+						PortalRunSelect(portal, false, pos - count + 1, 0L,
 										None_Receiver);
 					else if (count > pos + 1)
-						PortalRunSelect(portal, true, count - pos - 1,
+						PortalRunSelect(portal, true, count - pos - 1, 0L,
 										None_Receiver);
 				}
-				return PortalRunSelect(portal, true, 1L, dest);
+				return PortalRunSelect(portal, true, 1L, 0L, dest);
 			}
 			else if (count < 0)
 			{
@@ -1553,17 +1588,17 @@ DoPortalRunFetch(Portal portal,
 				 * (Is it worth considering case where count > half of size of
 				 * query?  We could rewind once we know the size ...)
 				 */
-				PortalRunSelect(portal, true, FETCH_ALL, None_Receiver);
+				PortalRunSelect(portal, true, FETCH_ALL, 0L, None_Receiver);
 				if (count < -1)
-					PortalRunSelect(portal, false, -count - 1, None_Receiver);
-				return PortalRunSelect(portal, false, 1L, dest);
+					PortalRunSelect(portal, false, -count - 1, 0, None_Receiver);
+				return PortalRunSelect(portal, false, 1L, 0L, dest);
 			}
 			else
 			{
 				/* count == 0 */
 				/* Rewind to start, return zero rows */
 				DoPortalRewind(portal);
-				return PortalRunSelect(portal, true, 0L, dest);
+				return PortalRunSelect(portal, true, 0L, 0L, dest);
 			}
 			break;
 		case FETCH_RELATIVE:
@@ -1573,8 +1608,8 @@ DoPortalRunFetch(Portal portal,
 				 * Definition: advance count-1 rows, return next row (if any).
 				 */
 				if (count > 1)
-					PortalRunSelect(portal, true, count - 1, None_Receiver);
-				return PortalRunSelect(portal, true, 1L, dest);
+					PortalRunSelect(portal, true, count - 1, 0L, None_Receiver);
+				return PortalRunSelect(portal, true, 1L, 0L, dest);
 			}
 			else if (count < 0)
 			{
@@ -1583,8 +1618,8 @@ DoPortalRunFetch(Portal portal,
 				 * any).
 				 */
 				if (count < -1)
-					PortalRunSelect(portal, false, -count - 1, None_Receiver);
-				return PortalRunSelect(portal, false, 1L, dest);
+					PortalRunSelect(portal, false, -count - 1, 0L, None_Receiver);
+				return PortalRunSelect(portal, false, 1L, 0L, dest);
 			}
 			else
 			{
@@ -1630,7 +1665,7 @@ DoPortalRunFetch(Portal portal,
 			 */
 			if (on_row)
 			{
-				PortalRunSelect(portal, false, 1L, None_Receiver);
+				PortalRunSelect(portal, false, 1L, 0L, None_Receiver);
 				/* Set up to fetch one row forward */
 				count = 1;
 				forward = true;
@@ -1652,7 +1687,7 @@ DoPortalRunFetch(Portal portal,
 		return result;
 	}
 
-	return PortalRunSelect(portal, forward, count, dest);
+	return PortalRunSelect(portal, forward, count, size, dest);
 }
 
 /*
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index 40fde83..64a02c3 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -80,8 +80,8 @@ extern PGDLLIMPORT ExecutorStart_hook_type ExecutorStart_hook;
 
 /* Hook for plugins to get control in ExecutorRun() */
 typedef void (*ExecutorRun_hook_type) (QueryDesc *queryDesc,
-												   ScanDirection direction,
-												   long count);
+									   ScanDirection direction,
+									   long count, long size);
 extern PGDLLIMPORT ExecutorRun_hook_type ExecutorRun_hook;
 
 /* Hook for plugins to get control in ExecutorFinish() */
@@ -176,9 +176,9 @@ extern TupleTableSlot *ExecFilterJunk(JunkFilter *junkfilter,
 extern void ExecutorStart(QueryDesc *queryDesc, int eflags);
 extern void standard_ExecutorStart(QueryDesc *queryDesc, int eflags);
 extern void ExecutorRun(QueryDesc *queryDesc,
-			ScanDirection direction, long count);
+			ScanDirection direction, long count, long size);
 extern void standard_ExecutorRun(QueryDesc *queryDesc,
-					 ScanDirection direction, long count);
+		    ScanDirection direction, long count, long size);
 extern void ExecutorFinish(QueryDesc *queryDesc);
 extern void standard_ExecutorFinish(QueryDesc *queryDesc);
 extern void ExecutorEnd(QueryDesc *queryDesc);
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 41288ed..d963286 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -376,6 +376,7 @@ typedef struct EState
 	List	   *es_rowMarks;	/* List of ExecRowMarks */
 
 	uint32		es_processed;	/* # of tuples processed */
+	bool		es_stoppedbysize; /* true if processing stopped by size */
 	Oid			es_lastoid;		/* last oid processed (by INSERT) */
 
 	int			es_top_eflags;	/* eflags passed to ExecutorStart */
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index b1dfa85..9e18331 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -2223,6 +2223,7 @@ typedef struct FetchStmt
 	NodeTag		type;
 	FetchDirection direction;	/* see above */
 	long		howMany;		/* number of rows, or position argument */
+	long		howLarge;		/* total bytes of rows */
 	char	   *portalname;		/* name of portal (cursor) */
 	bool		ismove;			/* TRUE if MOVE */
 } FetchStmt;
diff --git a/src/include/tcop/pquery.h b/src/include/tcop/pquery.h
index 8073a6e..afffe86 100644
--- a/src/include/tcop/pquery.h
+++ b/src/include/tcop/pquery.h
@@ -33,13 +33,14 @@ extern void PortalStart(Portal portal, ParamListInfo params,
 extern void PortalSetResultFormat(Portal portal, int nFormats,
 					  int16 *formats);
 
-extern bool PortalRun(Portal portal, long count, bool isTopLevel,
+extern bool PortalRun(Portal portal, long count, long size, bool isTopLevel,
 		  DestReceiver *dest, DestReceiver *altdest,
 		  char *completionTag);
 
 extern long PortalRunFetch(Portal portal,
 			   FetchDirection fdirection,
 			   long count,
+			   long size,
 			   DestReceiver *dest);
 
 #endif   /* PQUERY_H */
diff --git a/src/interfaces/ecpg/preproc/Makefile b/src/interfaces/ecpg/preproc/Makefile
index 1ecc405..b492fa7 100644
--- a/src/interfaces/ecpg/preproc/Makefile
+++ b/src/interfaces/ecpg/preproc/Makefile
@@ -48,7 +48,7 @@ ecpg: $(OBJS) | submake-libpgport
 preproc.o: pgc.c
 
 preproc.h: preproc.c ;
-preproc.c: BISONFLAGS += -d
+preproc.c: BISONFLAGS += -r all -d
 
 preproc.y: ../../../backend/parser/gram.y parse.pl ecpg.addons ecpg.header ecpg.tokens ecpg.trailer ecpg.type
 	$(PERL) $(srcdir)/parse.pl $(srcdir) < $< > $@
diff --git a/src/interfaces/ecpg/preproc/ecpg.addons b/src/interfaces/ecpg/preproc/ecpg.addons
index b3b36cf..bdccb68 100644
--- a/src/interfaces/ecpg/preproc/ecpg.addons
+++ b/src/interfaces/ecpg/preproc/ecpg.addons
@@ -220,13 +220,46 @@ ECPG: fetch_argsNEXTopt_from_incursor_name addon
 ECPG: fetch_argsPRIORopt_from_incursor_name addon
 ECPG: fetch_argsFIRST_Popt_from_incursor_name addon
 ECPG: fetch_argsLAST_Popt_from_incursor_name addon
+		add_additional_variables($3, false);
+		if ($3[0] == ':')
+		{
+			free($3);
+			$3 = mm_strdup("$0");
+		}
 ECPG: fetch_argsALLopt_from_incursor_name addon
+ECPG: fetch_argsFORWARDopt_from_incursor_name addon
+ECPG: fetch_argsBACKWARDopt_from_incursor_name addon
 		add_additional_variables($3, false);
 		if ($3[0] == ':')
 		{
 			free($3);
 			$3 = mm_strdup("$0");
 		}
+ECPG: fetch_argsALLLIMITIconstopt_from_incursor_name addon
+		add_additional_variables($5, false);
+		if ($5[0] == ':')
+		{
+			free($5);
+			$5 = mm_strdup("$0");
+		}
+		if ($3[0] == '$')
+		{
+			free($3);
+			$3 = mm_strdup("$0");
+		}
+ECPG: fetch_argsFORWARDALLLIMITIconstopt_from_incursor_name addon
+ECPG: fetch_argsBACKWARDALLLIMITIconstopt_from_incursor_name addon
+		add_additional_variables($6, false);
+		if ($6[0] == ':')
+		{
+			free($6);
+			$6 = mm_strdup("$0");
+		}
+		if ($4[0] == '$')
+		{
+			free($4);
+			$4 = mm_strdup("$0");
+		}
 ECPG: fetch_argsSignedIconstopt_from_incursor_name addon
 		add_additional_variables($3, false);
 		if ($3[0] == ':')
@@ -234,11 +267,41 @@ ECPG: fetch_argsSignedIconstopt_from_incursor_name addon
 			free($3);
 			$3 = mm_strdup("$0");
 		}
+ECPG: fetch_argsSignedIconstLIMITIconstopt_from_incursor_name addon
+		add_additional_variables($5, false);
+		if ($5[0] == ':')
+		{
+			free($5);
+			$5 = mm_strdup("$0");
+		}
 		if ($1[0] == '$')
 		{
 			free($1);
 			$1 = mm_strdup("$0");
 		}
+		if ($3[0] == '$')
+		{
+			free($3);
+			$3 = mm_strdup("$0");
+		}
+ECPG: fetch_argsFORWARDSignedIconstLIMITIconstopt_from_incursor_name addon
+ECPG: fetch_argsBACKWARDSignedIconstLIMITIconstopt_from_incursor_name addon
+		add_additional_variables($6, false);
+		if ($6[0] == ':')
+		{
+			free($6);
+			$6 = mm_strdup("$0");
+		}
+		if ($2[0] == '$')
+		{
+			free($2);
+			$2 = mm_strdup("$0");
+		}
+		if ($4[0] == '$')
+		{
+			free($4);
+			$4 = mm_strdup("$0");
+		}
 ECPG: fetch_argsFORWARDALLopt_from_incursor_name addon
 ECPG: fetch_argsBACKWARDALLopt_from_incursor_name addon
 		add_additional_variables($4, false);
-- 
2.1.0.GIT

Tom Lane

tgl@sss.pgh.pa.us

almost 11 years ago

In reply to: Kyotaro HORIGUCHI (#1)

Re: [POC] FETCH limited by bytes.

Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> writes:

Hello, as the discuttion on async fetching on postgres_fdw, FETCH
with data-size limitation would be useful to get memory usage
stability of postgres_fdw.

Is such a feature and syntax could be allowed to be added?

This seems like a lot of work, and frankly an incredibly ugly API,
for a benefit that is entirely hypothetical. Have you got numbers
showing any actual performance win for postgres_fdw?

Even if we wanted to do something like this, I strongly object to
measuring size by heap_compute_data_size. That's not a number that users
would normally have any direct knowledge of; nor does it have anything
at all to do with the claimed use-case, where what you'd really need to
measure is bytes transmitted down the wire. (The difference is not small:
for instance, toasted values would likely still be toasted at the point
where you're measuring.)

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Kyotaro HORIGUCHI

horiguchi.kyotaro@lab.ntt.co.jp

almost 11 years ago

In reply to: Tom Lane (#2)

2 attachment(s)

Re: [POC] FETCH limited by bytes.

Thank you for the comment.

The automatic way to determin the fetch_size looks become too
much for the purpose. An example of non-automatic way is a new
foreign table option like 'fetch_size' but this exposes the
inside too much... Which do you think is preferable?

Thu, 22 Jan 2015 11:17:52 -0500, Tom Lane <tgl@sss.pgh.pa.us> wrote in <24503.1421943472@sss.pgh.pa.us>

Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> writes:

Hello, as the discuttion on async fetching on postgres_fdw, FETCH
with data-size limitation would be useful to get memory usage
stability of postgres_fdw.

Is such a feature and syntax could be allowed to be added?

This seems like a lot of work, and frankly an incredibly ugly API,
for a benefit that is entirely hypothetical. Have you got numbers
showing any actual performance win for postgres_fdw?

The API is a rush work to make the path for the new parameter
(but, yes, I did too much for the purpose that use from
postgres_fdw..) and it can be any saner syntax but it's not the
time to do so yet.

The data-size limitation, any size to limit, would give
significant gain especially for small sized rows.

This patch began from the fact that it runs about twice faster
when fetch size = 10000 than 100.

/messages/by-id/20150116.171849.109146500.horiguchi.kyotaro@lab.ntt.co.jp

I took exec times to get 1M rows from localhost via postgres_fdw
and it showed the following numbers.

=# SELECT a from ft1;
fetch_size, avg row size(*1), time, alloced_mem/fetch(Mbytes)(*1)
(local) 0.75s
100 60 6.2s 6000 (0.006)
10000 60 2.7s 600000 (0.6 )
33333 60 2.2s 1999980 (2.0 )
66666 60 2.4s 3999960 (4.0 )

=# SELECT a, b, c from ft1;
fetch_size, avg row size(*1), time, alloced_mem/fetch(Mbytes)(*1)
(local) 0.8s
100 204 12 s 20400 (0.02 )
1000 204 10 s 204000 (0.2 )
10000 204 5.8s 2040000 (2 )
20000 204 5.9s 4080000 (4 )

=# SELECT a, b, d from ft1;
fetch_size, avg row size(*1), time, alloced_mem/fetch(Mbytes)(*1)
(local) 0.8s
100 1356 17 s 135600 (0.136)
1000 1356 15 s 1356000 (1.356)
1475 1356 13 s 2000100 (2.0 )
2950 1356 13 s 4000200 (4.0 )

The definitions of the environment are the following.

CREATE SERVER sv1 FOREIGN DATA WRAPPER postgres_fdw OPTIONS (host 'localhost', dbname 'postgres');
CREATE USER MAPPING FOR PUBLIC SERVER sv1;
CREATE TABLE lt1 (a int, b timestamp, c text, d text);
CREATE FOREIGN TABLE ft1 (a int, b timestamp, c text, d text) SERVER sv1 OPTIONS (table_name 'lt1');
INSERT INTO lt1 (SELECT a, now(), repeat('x', 128), repeat('x', 1280) FROM generate_series(0, 999999) a);

The "avg row size" is alloced_mem/fetch_size and the alloced_mem
is the sum of HeapTuple[fetch_size] and (HEAPTUPLESIZE +
tup->t_len) for all stored tuples in the receiver side,
fetch_more_data() in postgres_fdw.

They are about 50% gain for the smaller tuple size and 25% for
the larger. They looks to be optimal at where alloced_mem is
around 2MB by the reason unknown to me. Anyway the difference
seems to be significant.

Even if we wanted to do something like this, I strongly object to
measuring size by heap_compute_data_size. That's not a number that users
would normally have any direct knowledge of; nor does it have anything
at all to do with the claimed use-case, where what you'd really need to
measure is bytes transmitted down the wire. (The difference is not small:
for instance, toasted values would likely still be toasted at the point
where you're measuring.)

Sure. Finally, the attached patch #1 which does the following
things.

- Sender limits the number of tuples using the sum of the net
length of the column values to be sent, not including protocol
overhead. It is calculated in the added function
slot_compute_attr_size(), using raw length for compressed
values.

- postgres_fdw calculates fetch limit bytes by the following
formula,

MAX_FETCH_MEM - MAX_FETCH_SIZE * (estimated overhead per tuple);

The result of the patch is as follows. MAX_FETCH_MEM = 2MiB and
MAX_FETCH_SIZE = 30000.

fetch_size, avg row size(*1), time, max alloced_mem/fetch(Mbytes)
(auto) 60 2.4s 1080000 ( 1.08)
(auto) 204 7.3s 536400 ( 0.54)
(auto) 1356 15 s 430236 ( 0.43)

This is meaningfully fast but the patch looks too big and the
meaning of the new parameter is hard to understand..:(

On the other hand the cause of the displacements of alloced_mem
shown above is per-tuple overhead, the sum of which is unknown
before execution. The second patch makes FETCH accept the tuple
overhead bytes. The result seems pretty good, but I think this
might be too spcialized to this usage.

MAX_FETCH_SIZE = 30000 and MAX_FETCH_MEM = 2MiB,
max_fetch_size, avg row size(*1), time, max alloced_mem/fetch(MiBytes)
30000 60 2.3s 1080000 ( 1.0)
9932 204 5.7s 1787760 ( 1.7)
1376 1356 13 s 1847484 ( 1.8)

MAX_FETCH_SIZE = 25000 and MAX_FETCH_MEM = 1MiB,
max_fetch_size, avg row size(*1), time, max alloced_mem/fetch(MiBytes)
25000 60 2.4s 900000 ( 0.86)
4358 204 6.6s 816840 ( 0.78)
634 1356 16 s 844488 ( 0.81)

MAX_FETCH_SIZE = 10000 and MAX_FETCH_MEM = 0.5MiB,
max_fetch_size, avg row size(*1), time, max alloced_mem/fetch(MiBytes)
10000 60 2.8s 360000 ( 0.35)
2376 204 7.8s 427680 ( 0.41)
332 1356 17 s 442224 ( 0.42)

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachments:

0001-Size-limitation-feature-of-FETCH-v1.patchtext/x-patch; charset=us-asciiDownload

>From 4e5937d33e92d908462d567fa3264ae11404ecac Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi <horiguchi.kyotaro@lab.ntt.co.jp>
Date: Wed, 21 Jan 2015 17:18:09 +0900
Subject: [PATCH 1/2] Size limitation feature of FETCH v1

- Row size calculation is based on detoasted size.
---
 contrib/auto_explain/auto_explain.c             |   8 +-
 contrib/pg_stat_statements/pg_stat_statements.c |   8 +-
 contrib/postgres_fdw/postgres_fdw.c             | 103 +++++++++++++++++++-----
 src/backend/access/common/heaptuple.c           |  36 +++++++++
 src/backend/commands/copy.c                     |   2 +-
 src/backend/commands/createas.c                 |   2 +-
 src/backend/commands/explain.c                  |   2 +-
 src/backend/commands/extension.c                |   2 +-
 src/backend/commands/matview.c                  |   2 +-
 src/backend/commands/portalcmds.c               |   3 +-
 src/backend/commands/prepare.c                  |   2 +-
 src/backend/executor/execMain.c                 |  33 ++++++--
 src/backend/executor/execUtils.c                |   1 +
 src/backend/executor/functions.c                |   2 +-
 src/backend/executor/spi.c                      |   4 +-
 src/backend/parser/gram.y                       |  54 +++++++++++++
 src/backend/tcop/postgres.c                     |   2 +
 src/backend/tcop/pquery.c                       |  87 +++++++++++++-------
 src/include/access/htup_details.h               |   2 +
 src/include/executor/executor.h                 |   8 +-
 src/include/nodes/execnodes.h                   |   1 +
 src/include/nodes/parsenodes.h                  |   1 +
 src/include/tcop/pquery.h                       |   3 +-
 src/interfaces/ecpg/preproc/Makefile            |   2 +-
 src/interfaces/ecpg/preproc/ecpg.addons         |  63 +++++++++++++++
 25 files changed, 353 insertions(+), 80 deletions(-)

diff --git a/contrib/auto_explain/auto_explain.c b/contrib/auto_explain/auto_explain.c
index 2a184ed..f121a33 100644
--- a/contrib/auto_explain/auto_explain.c
+++ b/contrib/auto_explain/auto_explain.c
@@ -57,7 +57,7 @@ void		_PG_fini(void);
 static void explain_ExecutorStart(QueryDesc *queryDesc, int eflags);
 static void explain_ExecutorRun(QueryDesc *queryDesc,
 					ScanDirection direction,
-					long count);
+					long count, long size);
 static void explain_ExecutorFinish(QueryDesc *queryDesc);
 static void explain_ExecutorEnd(QueryDesc *queryDesc);
 
@@ -232,15 +232,15 @@ explain_ExecutorStart(QueryDesc *queryDesc, int eflags)
  * ExecutorRun hook: all we need do is track nesting depth
  */
 static void
-explain_ExecutorRun(QueryDesc *queryDesc, ScanDirection direction, long count)
+explain_ExecutorRun(QueryDesc *queryDesc, ScanDirection direction, long count, long size)
 {
 	nesting_level++;
 	PG_TRY();
 	{
 		if (prev_ExecutorRun)
-			prev_ExecutorRun(queryDesc, direction, count);
+			prev_ExecutorRun(queryDesc, direction, count, size);
 		else
-			standard_ExecutorRun(queryDesc, direction, count);
+			standard_ExecutorRun(queryDesc, direction, count, size);
 		nesting_level--;
 	}
 	PG_CATCH();
diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index 2629bfc..a68c11d 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -282,7 +282,7 @@ static void pgss_post_parse_analyze(ParseState *pstate, Query *query);
 static void pgss_ExecutorStart(QueryDesc *queryDesc, int eflags);
 static void pgss_ExecutorRun(QueryDesc *queryDesc,
 				 ScanDirection direction,
-				 long count);
+				 long count, long size);
 static void pgss_ExecutorFinish(QueryDesc *queryDesc);
 static void pgss_ExecutorEnd(QueryDesc *queryDesc);
 static void pgss_ProcessUtility(Node *parsetree, const char *queryString,
@@ -863,15 +863,15 @@ pgss_ExecutorStart(QueryDesc *queryDesc, int eflags)
  * ExecutorRun hook: all we need do is track nesting depth
  */
 static void
-pgss_ExecutorRun(QueryDesc *queryDesc, ScanDirection direction, long count)
+pgss_ExecutorRun(QueryDesc *queryDesc, ScanDirection direction, long count, long size)
 {
 	nested_level++;
 	PG_TRY();
 	{
 		if (prev_ExecutorRun)
-			prev_ExecutorRun(queryDesc, direction, count);
+			prev_ExecutorRun(queryDesc, direction, count, size);
 		else
-			standard_ExecutorRun(queryDesc, direction, count);
+			standard_ExecutorRun(queryDesc, direction, count, size);
 		nested_level--;
 	}
 	PG_CATCH();
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index d76e739..b3bf27e 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -46,6 +46,11 @@ PG_MODULE_MAGIC;
 /* Default CPU cost to process 1 row (above and beyond cpu_tuple_cost). */
 #define DEFAULT_FDW_TUPLE_COST		0.01
 
+/* Maximum tuples per fetch */
+#define MAX_FETCH_SIZE				30000
+
+/* Maximum memory usable for retrieved data  */
+#define MAX_FETCH_MEM				(2 * 1024 * 1024)
 /*
  * FDW-specific planner information kept in RelOptInfo.fdw_private for a
  * foreign table.  This information is collected by postgresGetForeignRelSize.
@@ -156,6 +161,8 @@ typedef struct PgFdwScanState
 	/* working memory contexts */
 	MemoryContext batch_cxt;	/* context holding current batch of tuples */
 	MemoryContext temp_cxt;		/* context for per-tuple temporary data */
+
+	long		max_palloced_mem; /* For test, remove me later */
 } PgFdwScanState;
 
 /*
@@ -321,6 +328,8 @@ static int postgresAcquireSampleRowsFunc(Relation relation, int elevel,
 							  double *totaldeadrows);
 static void analyze_row_processor(PGresult *res, int row,
 					  PgFdwAnalyzeState *astate);
+static Size estimate_tuple_overhead(TupleDesc tupDesc,
+									List *retrieved_attrs);
 static HeapTuple make_tuple_from_result_row(PGresult *res,
 						   int row,
 						   Relation rel,
@@ -1095,6 +1104,7 @@ postgresEndForeignScan(ForeignScanState *node)
 	if (fsstate == NULL)
 		return;
 
+	elog(LOG, "Max memory fo tuple store = %ld", fsstate->max_palloced_mem);
 	/* Close the cursor if open, to prevent accumulation of cursors */
 	if (fsstate->cursor_exists)
 		close_cursor(fsstate->conn, fsstate->cursor_number);
@@ -2029,12 +2039,18 @@ fetch_more_data(ForeignScanState *node)
 		int			fetch_size;
 		int			numrows;
 		int			i;
+		long		alloc_size = 0;
 
 		/* The fetch size is arbitrary, but shouldn't be enormous. */
-		fetch_size = 100;
-
-		snprintf(sql, sizeof(sql), "FETCH %d FROM c%u",
-				 fetch_size, fsstate->cursor_number);
+		fetch_size = MAX_FETCH_MEM -
+			MAX_FETCH_SIZE *
+				estimate_tuple_overhead(fsstate->attinmeta->tupdesc,
+										fsstate->retrieved_attrs);
+
+		snprintf(sql, sizeof(sql), "FETCH %d LIMIT %ld FROM c%u",
+				 MAX_FETCH_SIZE,
+				 fetch_size,
+				 fsstate->cursor_number);
 
 		res = PQexec(conn, sql);
 		/* On error, report the original query, not the FETCH. */
@@ -2043,27 +2059,34 @@ fetch_more_data(ForeignScanState *node)
 
 		/* Convert the data into HeapTuples */
 		numrows = PQntuples(res);
-		fsstate->tuples = (HeapTuple *) palloc0(numrows * sizeof(HeapTuple));
-		fsstate->num_tuples = numrows;
-		fsstate->next_tuple = 0;
-
-		for (i = 0; i < numrows; i++)
+		if (numrows == 0)
+			fsstate->eof_reached;
+		else
 		{
-			fsstate->tuples[i] =
-				make_tuple_from_result_row(res, i,
-										   fsstate->rel,
-										   fsstate->attinmeta,
-										   fsstate->retrieved_attrs,
-										   fsstate->temp_cxt);
-		}
+			alloc_size = numrows * sizeof(HeapTuple);
+			fsstate->tuples = (HeapTuple *) palloc0(alloc_size);
+			fsstate->num_tuples = numrows;
+			fsstate->next_tuple = 0;
 
-		/* Update fetch_ct_2 */
-		if (fsstate->fetch_ct_2 < 2)
-			fsstate->fetch_ct_2++;
+			for (i = 0; i < numrows; i++)
+			{
+				fsstate->tuples[i] =
+					make_tuple_from_result_row(res, i,
+											   fsstate->rel,
+											   fsstate->attinmeta,
+											   fsstate->retrieved_attrs,
+											   fsstate->temp_cxt);
+				alloc_size += fsstate->tuples[i]->t_len;
+			}
 
-		/* Must be EOF if we didn't get as many tuples as we asked for. */
-		fsstate->eof_reached = (numrows < fetch_size);
+			if (alloc_size > fsstate->max_palloced_mem)
+				fsstate->max_palloced_mem = alloc_size;
 
+			/* Update fetch_ct_2 */
+			if (fsstate->fetch_ct_2 < 2)
+				fsstate->fetch_ct_2++;
+		}
+			
 		PQclear(res);
 		res = NULL;
 	}
@@ -2835,6 +2858,44 @@ postgresImportForeignSchema(ImportForeignSchemaStmt *stmt, Oid serverOid)
 }
 
 /*
+ * Compute the estimated overhead of the result tuples
+ * See heap_form_tuple for the details of this calculation.
+ */
+static Size
+estimate_tuple_overhead(TupleDesc tupDesc,
+						List *retrieved_attrs)
+{
+	Size size = 0;
+	int	 ncol = list_length(retrieved_attrs);
+	int  nadded = 0;
+	ListCell	*lc;
+
+	size += offsetof(HeapTupleHeaderData, t_bits);
+	size += BITMAPLEN(ncol);
+
+	if (tupDesc->tdhasoid)
+		size += sizeof(Oid);
+
+	size = MAXALIGN(size);
+
+	size += sizeof(Datum) * ncol;
+	size += sizeof(bool) * ncol;
+
+	foreach (lc, retrieved_attrs)
+	{
+		int i = lfirst_int(lc);
+
+		if (i > 0)
+		{
+			if (tupDesc->attrs[i - 1]->attbyval)
+				size -= (sizeof(Datum) - tupDesc->attrs[i - 1]->attlen);
+		}
+	}
+
+	return size;
+}
+
+/*
  * Create a tuple from the specified row of the PGresult.
  *
  * rel is the local representation of the foreign table, attinmeta is
diff --git a/src/backend/access/common/heaptuple.c b/src/backend/access/common/heaptuple.c
index 867035d..2a577e5 100644
--- a/src/backend/access/common/heaptuple.c
+++ b/src/backend/access/common/heaptuple.c
@@ -60,6 +60,7 @@
 #include "access/sysattr.h"
 #include "access/tuptoaster.h"
 #include "executor/tuptable.h"
+#include "utils/pg_lzcompress.h"
 
 
 /* Does att's datatype allow packing into the 1-byte-header varlena format? */
@@ -120,6 +121,41 @@ heap_compute_data_size(TupleDesc tupleDesc,
 	return data_length;
 }
 
+Size
+slot_compute_attr_size(TupleTableSlot *slot)
+{
+	TupleDesc	 tupleDesc = slot->tts_tupleDescriptor;
+	Datum		*values = slot->tts_values;
+	bool		*isnull = slot->tts_isnull;
+	int			 nattrs = tupleDesc->natts;
+	int i;
+	Size		 sumattlen = 0;
+
+	if (slot->tts_nvalid < nattrs)
+	{
+		/*  We need all attributes deformed */
+		slot_getallattrs(slot);
+	}
+	for (i = 0 ; i < nattrs ; i++)
+	{
+		Form_pg_attribute thisatt = tupleDesc->attrs[i];
+
+		if (isnull[i]) continue;
+
+		if (thisatt->attbyval)
+			sumattlen += thisatt->attlen;
+		else if (VARATT_IS_COMPRESSED(values[i]))
+		{
+			sumattlen += PGLZ_RAW_SIZE((PGLZ_Header *)values[i]);
+		}
+		else if (VARATT_IS_SHORT(values[i]))
+			sumattlen += VARSIZE_SHORT(values[i]) - VARHDRSZ_SHORT;
+		else
+			sumattlen += VARSIZE(values[i]) - VARHDRSZ;
+	}
+	return sumattlen;
+}
+
 /*
  * heap_fill_tuple
  *		Load data portion of a tuple from values/isnull arrays
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 0e604b7..b6e6523 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -1915,7 +1915,7 @@ CopyTo(CopyState cstate)
 	else
 	{
 		/* run the plan --- the dest receiver will send tuples */
-		ExecutorRun(cstate->queryDesc, ForwardScanDirection, 0L);
+		ExecutorRun(cstate->queryDesc, ForwardScanDirection, 0L, 0L);
 		processed = ((DR_copy *) cstate->queryDesc->dest)->processed;
 	}
 
diff --git a/src/backend/commands/createas.c b/src/backend/commands/createas.c
index abc0fe8..c5c4478 100644
--- a/src/backend/commands/createas.c
+++ b/src/backend/commands/createas.c
@@ -192,7 +192,7 @@ ExecCreateTableAs(CreateTableAsStmt *stmt, const char *queryString,
 		dir = ForwardScanDirection;
 
 	/* run the plan */
-	ExecutorRun(queryDesc, dir, 0L);
+	ExecutorRun(queryDesc, dir, 0L, 0L);
 
 	/* save the rowcount if we're given a completionTag to fill */
 	if (completionTag)
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 7cfc9bb..2c23e9b 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -489,7 +489,7 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
 			dir = ForwardScanDirection;
 
 		/* run the plan */
-		ExecutorRun(queryDesc, dir, 0L);
+		ExecutorRun(queryDesc, dir, 0L, 0L);
 
 		/* run cleanup too */
 		ExecutorFinish(queryDesc);
diff --git a/src/backend/commands/extension.c b/src/backend/commands/extension.c
index 3b95552..f624567 100644
--- a/src/backend/commands/extension.c
+++ b/src/backend/commands/extension.c
@@ -736,7 +736,7 @@ execute_sql_string(const char *sql, const char *filename)
 										dest, NULL, 0);
 
 				ExecutorStart(qdesc, 0);
-				ExecutorRun(qdesc, ForwardScanDirection, 0);
+				ExecutorRun(qdesc, ForwardScanDirection, 0, 0);
 				ExecutorFinish(qdesc);
 				ExecutorEnd(qdesc);
 
diff --git a/src/backend/commands/matview.c b/src/backend/commands/matview.c
index 74415b8..6530ecb 100644
--- a/src/backend/commands/matview.c
+++ b/src/backend/commands/matview.c
@@ -360,7 +360,7 @@ refresh_matview_datafill(DestReceiver *dest, Query *query,
 	ExecutorStart(queryDesc, EXEC_FLAG_WITHOUT_OIDS);
 
 	/* run the plan */
-	ExecutorRun(queryDesc, ForwardScanDirection, 0L);
+	ExecutorRun(queryDesc, ForwardScanDirection, 0L, 0L);
 
 	/* and clean up */
 	ExecutorFinish(queryDesc);
diff --git a/src/backend/commands/portalcmds.c b/src/backend/commands/portalcmds.c
index 2794537..255c86e 100644
--- a/src/backend/commands/portalcmds.c
+++ b/src/backend/commands/portalcmds.c
@@ -177,6 +177,7 @@ PerformPortalFetch(FetchStmt *stmt,
 	nprocessed = PortalRunFetch(portal,
 								stmt->direction,
 								stmt->howMany,
+								stmt->howLarge,
 								dest);
 
 	/* Return command status if wanted */
@@ -375,7 +376,7 @@ PersistHoldablePortal(Portal portal)
 										true);
 
 		/* Fetch the result set into the tuplestore */
-		ExecutorRun(queryDesc, ForwardScanDirection, 0L);
+		ExecutorRun(queryDesc, ForwardScanDirection, 0L, 0L);
 
 		(*queryDesc->dest->rDestroy) (queryDesc->dest);
 		queryDesc->dest = NULL;
diff --git a/src/backend/commands/prepare.c b/src/backend/commands/prepare.c
index 71b08f0..31799f5 100644
--- a/src/backend/commands/prepare.c
+++ b/src/backend/commands/prepare.c
@@ -291,7 +291,7 @@ ExecuteQuery(ExecuteStmt *stmt, IntoClause *intoClause,
 	 */
 	PortalStart(portal, paramLI, eflags, GetActiveSnapshot());
 
-	(void) PortalRun(portal, count, false, dest, dest, completionTag);
+	(void) PortalRun(portal, count, 0L, false, dest, dest, completionTag);
 
 	PortalDrop(portal, false);
 
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index b9f21c5..d976bf3 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -78,6 +78,7 @@ static void ExecutePlan(EState *estate, PlanState *planstate,
 			CmdType operation,
 			bool sendTuples,
 			long numberTuples,
+			long sizeTuples,
 			ScanDirection direction,
 			DestReceiver *dest);
 static bool ExecCheckRTEPerms(RangeTblEntry *rte);
@@ -248,17 +249,17 @@ standard_ExecutorStart(QueryDesc *queryDesc, int eflags)
  */
 void
 ExecutorRun(QueryDesc *queryDesc,
-			ScanDirection direction, long count)
+			ScanDirection direction, long count, long size)
 {
 	if (ExecutorRun_hook)
-		(*ExecutorRun_hook) (queryDesc, direction, count);
+		(*ExecutorRun_hook) (queryDesc, direction, count, size);
 	else
-		standard_ExecutorRun(queryDesc, direction, count);
+		standard_ExecutorRun(queryDesc, direction, count, size);
 }
 
 void
 standard_ExecutorRun(QueryDesc *queryDesc,
-					 ScanDirection direction, long count)
+					 ScanDirection direction, long count, long size)
 {
 	EState	   *estate;
 	CmdType		operation;
@@ -310,6 +311,7 @@ standard_ExecutorRun(QueryDesc *queryDesc,
 					operation,
 					sendTuples,
 					count,
+					size,
 					direction,
 					dest);
 
@@ -1450,22 +1452,26 @@ ExecutePlan(EState *estate,
 			CmdType operation,
 			bool sendTuples,
 			long numberTuples,
+			long sizeTuples,
 			ScanDirection direction,
 			DestReceiver *dest)
 {
 	TupleTableSlot *slot;
 	long		current_tuple_count;
+	long		sent_size;
 
 	/*
 	 * initialize local variables
 	 */
 	current_tuple_count = 0;
-
+	sent_size = 0;
 	/*
 	 * Set the direction.
 	 */
 	estate->es_direction = direction;
 
+	estate->es_stoppedbysize = false;
+
 	/*
 	 * Loop until we've processed the proper number of tuples from the plan.
 	 */
@@ -1520,6 +1526,23 @@ ExecutePlan(EState *estate,
 		current_tuple_count++;
 		if (numberTuples && numberTuples == current_tuple_count)
 			break;
+
+		if (sizeTuples > 0)
+		{
+			/*
+			 * Count the size of tuples we've sent
+			 *
+			 * This needs all attributes deformed so a bit slow on some cases.
+			 */
+			sent_size += slot_compute_attr_size(slot);
+
+			/* Quit when the size limit will be exceeded by this tuple */
+			if (sizeTuples < sent_size)
+			{
+				estate->es_stoppedbysize = true;
+				break;
+			}
+		}
 	}
 }
 
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index 32697dd..ff2c395 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -133,6 +133,7 @@ CreateExecutorState(void)
 	estate->es_rowMarks = NIL;
 
 	estate->es_processed = 0;
+	estate->es_stoppedbysize = false;
 	estate->es_lastoid = InvalidOid;
 
 	estate->es_top_eflags = 0;
diff --git a/src/backend/executor/functions.c b/src/backend/executor/functions.c
index 84be37c..d64e908 100644
--- a/src/backend/executor/functions.c
+++ b/src/backend/executor/functions.c
@@ -850,7 +850,7 @@ postquel_getnext(execution_state *es, SQLFunctionCachePtr fcache)
 		/* Run regular commands to completion unless lazyEval */
 		long		count = (es->lazyEval) ? 1L : 0L;
 
-		ExecutorRun(es->qd, ForwardScanDirection, count);
+		ExecutorRun(es->qd, ForwardScanDirection, count, 0L);
 
 		/*
 		 * If we requested run to completion OR there was no tuple returned,
diff --git a/src/backend/executor/spi.c b/src/backend/executor/spi.c
index 4b86e91..cb30cfb 100644
--- a/src/backend/executor/spi.c
+++ b/src/backend/executor/spi.c
@@ -2369,7 +2369,7 @@ _SPI_pquery(QueryDesc *queryDesc, bool fire_triggers, long tcount)
 
 	ExecutorStart(queryDesc, eflags);
 
-	ExecutorRun(queryDesc, ForwardScanDirection, tcount);
+	ExecutorRun(queryDesc, ForwardScanDirection, tcount, 0L);
 
 	_SPI_current->processed = queryDesc->estate->es_processed;
 	_SPI_current->lastoid = queryDesc->estate->es_lastoid;
@@ -2447,7 +2447,7 @@ _SPI_cursor_operation(Portal portal, FetchDirection direction, long count,
 	/* Run the cursor */
 	nfetched = PortalRunFetch(portal,
 							  direction,
-							  count,
+							  count, 0L,
 							  dest);
 
 	/*
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 36dac29..e559d1a 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -6021,6 +6021,15 @@ fetch_args:	cursor_name
 					n->howMany = $1;
 					$$ = (Node *)n;
 				}
+			| SignedIconst LIMIT Iconst opt_from_in cursor_name
+				{
+					FetchStmt *n = makeNode(FetchStmt);
+					n->portalname = $5;
+					n->direction = FETCH_FORWARD;
+					n->howMany = $1;
+					n->howLarge = $3;
+					$$ = (Node *)n;
+				}
 			| ALL opt_from_in cursor_name
 				{
 					FetchStmt *n = makeNode(FetchStmt);
@@ -6029,6 +6038,15 @@ fetch_args:	cursor_name
 					n->howMany = FETCH_ALL;
 					$$ = (Node *)n;
 				}
+			| ALL LIMIT Iconst opt_from_in cursor_name
+				{
+					FetchStmt *n = makeNode(FetchStmt);
+					n->portalname = $5;
+					n->direction = FETCH_FORWARD;
+					n->howMany = FETCH_ALL;
+					n->howLarge = $3;
+					$$ = (Node *)n;
+				}
 			| FORWARD opt_from_in cursor_name
 				{
 					FetchStmt *n = makeNode(FetchStmt);
@@ -6045,6 +6063,15 @@ fetch_args:	cursor_name
 					n->howMany = $2;
 					$$ = (Node *)n;
 				}
+			| FORWARD SignedIconst LIMIT Iconst opt_from_in cursor_name
+				{
+					FetchStmt *n = makeNode(FetchStmt);
+					n->portalname = $6;
+					n->direction = FETCH_FORWARD;
+					n->howMany = $2;
+					n->howLarge = $4;
+					$$ = (Node *)n;
+				}
 			| FORWARD ALL opt_from_in cursor_name
 				{
 					FetchStmt *n = makeNode(FetchStmt);
@@ -6053,6 +6080,15 @@ fetch_args:	cursor_name
 					n->howMany = FETCH_ALL;
 					$$ = (Node *)n;
 				}
+			| FORWARD ALL LIMIT Iconst opt_from_in cursor_name
+				{
+					FetchStmt *n = makeNode(FetchStmt);
+					n->portalname = $6;
+					n->direction = FETCH_FORWARD;
+					n->howMany = FETCH_ALL;
+					n->howLarge = $4;
+					$$ = (Node *)n;
+				}
 			| BACKWARD opt_from_in cursor_name
 				{
 					FetchStmt *n = makeNode(FetchStmt);
@@ -6069,6 +6105,15 @@ fetch_args:	cursor_name
 					n->howMany = $2;
 					$$ = (Node *)n;
 				}
+			| BACKWARD SignedIconst LIMIT Iconst opt_from_in cursor_name
+				{
+					FetchStmt *n = makeNode(FetchStmt);
+					n->portalname = $6;
+					n->direction = FETCH_BACKWARD;
+					n->howMany = $2;
+					n->howLarge = $4;
+					$$ = (Node *)n;
+				}
 			| BACKWARD ALL opt_from_in cursor_name
 				{
 					FetchStmt *n = makeNode(FetchStmt);
@@ -6077,6 +6122,15 @@ fetch_args:	cursor_name
 					n->howMany = FETCH_ALL;
 					$$ = (Node *)n;
 				}
+			| BACKWARD ALL LIMIT Iconst opt_from_in cursor_name
+				{
+					FetchStmt *n = makeNode(FetchStmt);
+					n->portalname = $6;
+					n->direction = FETCH_BACKWARD;
+					n->howMany = FETCH_ALL;
+					n->howLarge = $4;
+					$$ = (Node *)n;
+				}
 		;
 
 from_in:	FROM									{}
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 8f74353..55f062b 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -1043,6 +1043,7 @@ exec_simple_query(const char *query_string)
 		 */
 		(void) PortalRun(portal,
 						 FETCH_ALL,
+						 0,
 						 isTopLevel,
 						 receiver,
 						 receiver,
@@ -1928,6 +1929,7 @@ exec_execute_message(const char *portal_name, long max_rows)
 
 	completed = PortalRun(portal,
 						  max_rows,
+						  0,
 						  true, /* always top level */
 						  receiver,
 						  receiver,
diff --git a/src/backend/tcop/pquery.c b/src/backend/tcop/pquery.c
index 9c14e8a..1456c5a 100644
--- a/src/backend/tcop/pquery.c
+++ b/src/backend/tcop/pquery.c
@@ -16,6 +16,7 @@
 #include "postgres.h"
 
 #include "access/xact.h"
+#include "access/htup_details.h"
 #include "commands/prepare.h"
 #include "executor/tstoreReceiver.h"
 #include "miscadmin.h"
@@ -39,9 +40,10 @@ static void ProcessQuery(PlannedStmt *plan,
 			 DestReceiver *dest,
 			 char *completionTag);
 static void FillPortalStore(Portal portal, bool isTopLevel);
-static uint32 RunFromStore(Portal portal, ScanDirection direction, long count,
+static uint32 RunFromStore(Portal portal, ScanDirection direction,
+		     long count, long size, bool *stoppedbysize,
 			 DestReceiver *dest);
-static long PortalRunSelect(Portal portal, bool forward, long count,
+static long PortalRunSelect(Portal portal, bool forward, long count, long size,
 				DestReceiver *dest);
 static void PortalRunUtility(Portal portal, Node *utilityStmt, bool isTopLevel,
 				 DestReceiver *dest, char *completionTag);
@@ -51,6 +53,7 @@ static void PortalRunMulti(Portal portal, bool isTopLevel,
 static long DoPortalRunFetch(Portal portal,
 				 FetchDirection fdirection,
 				 long count,
+				 long size,
 				 DestReceiver *dest);
 static void DoPortalRewind(Portal portal);
 
@@ -182,7 +185,7 @@ ProcessQuery(PlannedStmt *plan,
 	/*
 	 * Run the plan to completion.
 	 */
-	ExecutorRun(queryDesc, ForwardScanDirection, 0L);
+	ExecutorRun(queryDesc, ForwardScanDirection, 0L, 0L);
 
 	/*
 	 * Build command completion status string, if caller wants one.
@@ -703,7 +706,7 @@ PortalSetResultFormat(Portal portal, int nFormats, int16 *formats)
  * suspended due to exhaustion of the count parameter.
  */
 bool
-PortalRun(Portal portal, long count, bool isTopLevel,
+PortalRun(Portal portal, long count, long size, bool isTopLevel,
 		  DestReceiver *dest, DestReceiver *altdest,
 		  char *completionTag)
 {
@@ -787,7 +790,7 @@ PortalRun(Portal portal, long count, bool isTopLevel,
 				/*
 				 * Now fetch desired portion of results.
 				 */
-				nprocessed = PortalRunSelect(portal, true, count, dest);
+				nprocessed = PortalRunSelect(portal, true, count, size, dest);
 
 				/*
 				 * If the portal result contains a command tag and the caller
@@ -892,11 +895,13 @@ static long
 PortalRunSelect(Portal portal,
 				bool forward,
 				long count,
+				long size,
 				DestReceiver *dest)
 {
 	QueryDesc  *queryDesc;
 	ScanDirection direction;
 	uint32		nprocessed;
+	bool		stoppedbysize;
 
 	/*
 	 * NB: queryDesc will be NULL if we are fetching from a held cursor or a
@@ -939,12 +944,14 @@ PortalRunSelect(Portal portal,
 			count = 0;
 
 		if (portal->holdStore)
-			nprocessed = RunFromStore(portal, direction, count, dest);
+			nprocessed = RunFromStore(portal, direction, count,
+									  size, &stoppedbysize, dest);
 		else
 		{
 			PushActiveSnapshot(queryDesc->snapshot);
-			ExecutorRun(queryDesc, direction, count);
+			ExecutorRun(queryDesc, direction, count, size);
 			nprocessed = queryDesc->estate->es_processed;
+			stoppedbysize = queryDesc->estate->es_stoppedbysize;
 			PopActiveSnapshot();
 		}
 
@@ -954,8 +961,9 @@ PortalRunSelect(Portal portal,
 
 			if (nprocessed > 0)
 				portal->atStart = false;		/* OK to go backward now */
-			if (count == 0 ||
-				(unsigned long) nprocessed < (unsigned long) count)
+			if ((count == 0 ||
+				 (unsigned long) nprocessed < (unsigned long) count) &&
+				!stoppedbysize)
 				portal->atEnd = true;	/* we retrieved 'em all */
 			oldPos = portal->portalPos;
 			portal->portalPos += nprocessed;
@@ -982,12 +990,14 @@ PortalRunSelect(Portal portal,
 			count = 0;
 
 		if (portal->holdStore)
-			nprocessed = RunFromStore(portal, direction, count, dest);
+			nprocessed = RunFromStore(portal, direction, count,
+									  size, &stoppedbysize, dest);
 		else
 		{
 			PushActiveSnapshot(queryDesc->snapshot);
-			ExecutorRun(queryDesc, direction, count);
+			ExecutorRun(queryDesc, direction, count, size);
 			nprocessed = queryDesc->estate->es_processed;
+			stoppedbysize = queryDesc->estate->es_stoppedbysize;
 			PopActiveSnapshot();
 		}
 
@@ -998,8 +1008,9 @@ PortalRunSelect(Portal portal,
 				portal->atEnd = false;	/* OK to go forward now */
 				portal->portalPos++;	/* adjust for endpoint case */
 			}
-			if (count == 0 ||
-				(unsigned long) nprocessed < (unsigned long) count)
+			if ((count == 0 ||
+				 (unsigned long) nprocessed < (unsigned long) count) &&
+				!stoppedbysize)
 			{
 				portal->atStart = true; /* we retrieved 'em all */
 				portal->portalPos = 0;
@@ -1089,10 +1100,13 @@ FillPortalStore(Portal portal, bool isTopLevel)
  */
 static uint32
 RunFromStore(Portal portal, ScanDirection direction, long count,
-			 DestReceiver *dest)
+			 long size_limit, bool *stoppedbysize, DestReceiver *dest)
 {
 	long		current_tuple_count = 0;
 	TupleTableSlot *slot;
+	long			sent_size = 0;
+
+	*stoppedbysize = false;
 
 	slot = MakeSingleTupleTableSlot(portal->tupDesc);
 
@@ -1123,6 +1137,9 @@ RunFromStore(Portal portal, ScanDirection direction, long count,
 
 			(*dest->receiveSlot) (slot, dest);
 
+			/* Count the size of tuples we've sent */
+			sent_size += slot_compute_attr_size(slot);
+
 			ExecClearTuple(slot);
 
 			/*
@@ -1133,6 +1150,14 @@ RunFromStore(Portal portal, ScanDirection direction, long count,
 			current_tuple_count++;
 			if (count && count == current_tuple_count)
 				break;
+
+			/* Quit when the size limit will be exceeded by this tuple */
+			if (current_tuple_count > 0 &&
+				size_limit > 0 && size_limit < sent_size)
+			{
+				*stoppedbysize = true;
+				break;
+			}
 		}
 	}
 
@@ -1385,6 +1410,7 @@ long
 PortalRunFetch(Portal portal,
 			   FetchDirection fdirection,
 			   long count,
+			   long size,
 			   DestReceiver *dest)
 {
 	long		result;
@@ -1422,7 +1448,7 @@ PortalRunFetch(Portal portal,
 		switch (portal->strategy)
 		{
 			case PORTAL_ONE_SELECT:
-				result = DoPortalRunFetch(portal, fdirection, count, dest);
+				result = DoPortalRunFetch(portal, fdirection, count, size, dest);
 				break;
 
 			case PORTAL_ONE_RETURNING:
@@ -1439,7 +1465,7 @@ PortalRunFetch(Portal portal,
 				/*
 				 * Now fetch desired portion of results.
 				 */
-				result = DoPortalRunFetch(portal, fdirection, count, dest);
+				result = DoPortalRunFetch(portal, fdirection, count, size, dest);
 				break;
 
 			default:
@@ -1484,6 +1510,7 @@ static long
 DoPortalRunFetch(Portal portal,
 				 FetchDirection fdirection,
 				 long count,
+				 long size,
 				 DestReceiver *dest)
 {
 	bool		forward;
@@ -1526,7 +1553,7 @@ DoPortalRunFetch(Portal portal,
 				{
 					DoPortalRewind(portal);
 					if (count > 1)
-						PortalRunSelect(portal, true, count - 1,
+						PortalRunSelect(portal, true, count - 1, 0L,
 										None_Receiver);
 				}
 				else
@@ -1536,13 +1563,13 @@ DoPortalRunFetch(Portal portal,
 					if (portal->atEnd)
 						pos++;	/* need one extra fetch if off end */
 					if (count <= pos)
-						PortalRunSelect(portal, false, pos - count + 1,
+						PortalRunSelect(portal, false, pos - count + 1, 0L,
 										None_Receiver);
 					else if (count > pos + 1)
-						PortalRunSelect(portal, true, count - pos - 1,
+						PortalRunSelect(portal, true, count - pos - 1, 0L,
 										None_Receiver);
 				}
-				return PortalRunSelect(portal, true, 1L, dest);
+				return PortalRunSelect(portal, true, 1L, 0L, dest);
 			}
 			else if (count < 0)
 			{
@@ -1553,17 +1580,17 @@ DoPortalRunFetch(Portal portal,
 				 * (Is it worth considering case where count > half of size of
 				 * query?  We could rewind once we know the size ...)
 				 */
-				PortalRunSelect(portal, true, FETCH_ALL, None_Receiver);
+				PortalRunSelect(portal, true, FETCH_ALL, 0L, None_Receiver);
 				if (count < -1)
-					PortalRunSelect(portal, false, -count - 1, None_Receiver);
-				return PortalRunSelect(portal, false, 1L, dest);
+					PortalRunSelect(portal, false, -count - 1, 0, None_Receiver);
+				return PortalRunSelect(portal, false, 1L, 0L, dest);
 			}
 			else
 			{
 				/* count == 0 */
 				/* Rewind to start, return zero rows */
 				DoPortalRewind(portal);
-				return PortalRunSelect(portal, true, 0L, dest);
+				return PortalRunSelect(portal, true, 0L, 0L, dest);
 			}
 			break;
 		case FETCH_RELATIVE:
@@ -1573,8 +1600,8 @@ DoPortalRunFetch(Portal portal,
 				 * Definition: advance count-1 rows, return next row (if any).
 				 */
 				if (count > 1)
-					PortalRunSelect(portal, true, count - 1, None_Receiver);
-				return PortalRunSelect(portal, true, 1L, dest);
+					PortalRunSelect(portal, true, count - 1, 0L, None_Receiver);
+				return PortalRunSelect(portal, true, 1L, 0L, dest);
 			}
 			else if (count < 0)
 			{
@@ -1583,8 +1610,8 @@ DoPortalRunFetch(Portal portal,
 				 * any).
 				 */
 				if (count < -1)
-					PortalRunSelect(portal, false, -count - 1, None_Receiver);
-				return PortalRunSelect(portal, false, 1L, dest);
+					PortalRunSelect(portal, false, -count - 1, 0L, None_Receiver);
+				return PortalRunSelect(portal, false, 1L, 0L, dest);
 			}
 			else
 			{
@@ -1630,7 +1657,7 @@ DoPortalRunFetch(Portal portal,
 			 */
 			if (on_row)
 			{
-				PortalRunSelect(portal, false, 1L, None_Receiver);
+				PortalRunSelect(portal, false, 1L, 0L, None_Receiver);
 				/* Set up to fetch one row forward */
 				count = 1;
 				forward = true;
@@ -1652,7 +1679,7 @@ DoPortalRunFetch(Portal portal,
 		return result;
 	}
 
-	return PortalRunSelect(portal, forward, count, dest);
+	return PortalRunSelect(portal, forward, count, size, dest);
 }
 
 /*
diff --git a/src/include/access/htup_details.h b/src/include/access/htup_details.h
index d2ad910..2eeba00 100644
--- a/src/include/access/htup_details.h
+++ b/src/include/access/htup_details.h
@@ -20,6 +20,7 @@
 #include "access/transam.h"
 #include "storage/bufpage.h"
 
+#include "executor/tuptable.h"
 /*
  * MaxTupleAttributeNumber limits the number of (user) columns in a tuple.
  * The key limit on this value is that the size of the fixed overhead for
@@ -723,6 +724,7 @@ extern Datum fastgetattr(HeapTuple tup, int attnum, TupleDesc tupleDesc,
 /* prototypes for functions in common/heaptuple.c */
 extern Size heap_compute_data_size(TupleDesc tupleDesc,
 					   Datum *values, bool *isnull);
+extern Size slot_compute_attr_size(TupleTableSlot *slot);
 extern void heap_fill_tuple(TupleDesc tupleDesc,
 				Datum *values, bool *isnull,
 				char *data, Size data_size,
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index 40fde83..64a02c3 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -80,8 +80,8 @@ extern PGDLLIMPORT ExecutorStart_hook_type ExecutorStart_hook;
 
 /* Hook for plugins to get control in ExecutorRun() */
 typedef void (*ExecutorRun_hook_type) (QueryDesc *queryDesc,
-												   ScanDirection direction,
-												   long count);
+									   ScanDirection direction,
+									   long count, long size);
 extern PGDLLIMPORT ExecutorRun_hook_type ExecutorRun_hook;
 
 /* Hook for plugins to get control in ExecutorFinish() */
@@ -176,9 +176,9 @@ extern TupleTableSlot *ExecFilterJunk(JunkFilter *junkfilter,
 extern void ExecutorStart(QueryDesc *queryDesc, int eflags);
 extern void standard_ExecutorStart(QueryDesc *queryDesc, int eflags);
 extern void ExecutorRun(QueryDesc *queryDesc,
-			ScanDirection direction, long count);
+			ScanDirection direction, long count, long size);
 extern void standard_ExecutorRun(QueryDesc *queryDesc,
-					 ScanDirection direction, long count);
+		    ScanDirection direction, long count, long size);
 extern void ExecutorFinish(QueryDesc *queryDesc);
 extern void standard_ExecutorFinish(QueryDesc *queryDesc);
 extern void ExecutorEnd(QueryDesc *queryDesc);
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 41288ed..d963286 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -376,6 +376,7 @@ typedef struct EState
 	List	   *es_rowMarks;	/* List of ExecRowMarks */
 
 	uint32		es_processed;	/* # of tuples processed */
+	bool		es_stoppedbysize; /* true if processing stopped by size */
 	Oid			es_lastoid;		/* last oid processed (by INSERT) */
 
 	int			es_top_eflags;	/* eflags passed to ExecutorStart */
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index b1dfa85..9e18331 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -2223,6 +2223,7 @@ typedef struct FetchStmt
 	NodeTag		type;
 	FetchDirection direction;	/* see above */
 	long		howMany;		/* number of rows, or position argument */
+	long		howLarge;		/* total bytes of rows */
 	char	   *portalname;		/* name of portal (cursor) */
 	bool		ismove;			/* TRUE if MOVE */
 } FetchStmt;
diff --git a/src/include/tcop/pquery.h b/src/include/tcop/pquery.h
index 8073a6e..afffe86 100644
--- a/src/include/tcop/pquery.h
+++ b/src/include/tcop/pquery.h
@@ -33,13 +33,14 @@ extern void PortalStart(Portal portal, ParamListInfo params,
 extern void PortalSetResultFormat(Portal portal, int nFormats,
 					  int16 *formats);
 
-extern bool PortalRun(Portal portal, long count, bool isTopLevel,
+extern bool PortalRun(Portal portal, long count, long size, bool isTopLevel,
 		  DestReceiver *dest, DestReceiver *altdest,
 		  char *completionTag);
 
 extern long PortalRunFetch(Portal portal,
 			   FetchDirection fdirection,
 			   long count,
+			   long size,
 			   DestReceiver *dest);
 
 #endif   /* PQUERY_H */
diff --git a/src/interfaces/ecpg/preproc/Makefile b/src/interfaces/ecpg/preproc/Makefile
index 1ecc405..b492fa7 100644
--- a/src/interfaces/ecpg/preproc/Makefile
+++ b/src/interfaces/ecpg/preproc/Makefile
@@ -48,7 +48,7 @@ ecpg: $(OBJS) | submake-libpgport
 preproc.o: pgc.c
 
 preproc.h: preproc.c ;
-preproc.c: BISONFLAGS += -d
+preproc.c: BISONFLAGS += -r all -d
 
 preproc.y: ../../../backend/parser/gram.y parse.pl ecpg.addons ecpg.header ecpg.tokens ecpg.trailer ecpg.type
 	$(PERL) $(srcdir)/parse.pl $(srcdir) < $< > $@
diff --git a/src/interfaces/ecpg/preproc/ecpg.addons b/src/interfaces/ecpg/preproc/ecpg.addons
index b3b36cf..bdccb68 100644
--- a/src/interfaces/ecpg/preproc/ecpg.addons
+++ b/src/interfaces/ecpg/preproc/ecpg.addons
@@ -220,13 +220,46 @@ ECPG: fetch_argsNEXTopt_from_incursor_name addon
 ECPG: fetch_argsPRIORopt_from_incursor_name addon
 ECPG: fetch_argsFIRST_Popt_from_incursor_name addon
 ECPG: fetch_argsLAST_Popt_from_incursor_name addon
+		add_additional_variables($3, false);
+		if ($3[0] == ':')
+		{
+			free($3);
+			$3 = mm_strdup("$0");
+		}
 ECPG: fetch_argsALLopt_from_incursor_name addon
+ECPG: fetch_argsFORWARDopt_from_incursor_name addon
+ECPG: fetch_argsBACKWARDopt_from_incursor_name addon
 		add_additional_variables($3, false);
 		if ($3[0] == ':')
 		{
 			free($3);
 			$3 = mm_strdup("$0");
 		}
+ECPG: fetch_argsALLLIMITIconstopt_from_incursor_name addon
+		add_additional_variables($5, false);
+		if ($5[0] == ':')
+		{
+			free($5);
+			$5 = mm_strdup("$0");
+		}
+		if ($3[0] == '$')
+		{
+			free($3);
+			$3 = mm_strdup("$0");
+		}
+ECPG: fetch_argsFORWARDALLLIMITIconstopt_from_incursor_name addon
+ECPG: fetch_argsBACKWARDALLLIMITIconstopt_from_incursor_name addon
+		add_additional_variables($6, false);
+		if ($6[0] == ':')
+		{
+			free($6);
+			$6 = mm_strdup("$0");
+		}
+		if ($4[0] == '$')
+		{
+			free($4);
+			$4 = mm_strdup("$0");
+		}
 ECPG: fetch_argsSignedIconstopt_from_incursor_name addon
 		add_additional_variables($3, false);
 		if ($3[0] == ':')
@@ -234,11 +267,41 @@ ECPG: fetch_argsSignedIconstopt_from_incursor_name addon
 			free($3);
 			$3 = mm_strdup("$0");
 		}
+ECPG: fetch_argsSignedIconstLIMITIconstopt_from_incursor_name addon
+		add_additional_variables($5, false);
+		if ($5[0] == ':')
+		{
+			free($5);
+			$5 = mm_strdup("$0");
+		}
 		if ($1[0] == '$')
 		{
 			free($1);
 			$1 = mm_strdup("$0");
 		}
+		if ($3[0] == '$')
+		{
+			free($3);
+			$3 = mm_strdup("$0");
+		}
+ECPG: fetch_argsFORWARDSignedIconstLIMITIconstopt_from_incursor_name addon
+ECPG: fetch_argsBACKWARDSignedIconstLIMITIconstopt_from_incursor_name addon
+		add_additional_variables($6, false);
+		if ($6[0] == ':')
+		{
+			free($6);
+			$6 = mm_strdup("$0");
+		}
+		if ($2[0] == '$')
+		{
+			free($2);
+			$2 = mm_strdup("$0");
+		}
+		if ($4[0] == '$')
+		{
+			free($4);
+			$4 = mm_strdup("$0");
+		}
 ECPG: fetch_argsFORWARDALLopt_from_incursor_name addon
 ECPG: fetch_argsBACKWARDALLopt_from_incursor_name addon
 		add_additional_variables($4, false);
-- 
2.1.0.GIT

0002-Make-FETCH-can-accept-per-tuple-memory-overhead.patchtext/x-patch; charset=us-asciiDownload

>From 1b595089408f47bafa96f8c86ddbc7a5728f0e5e Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi <horiguchi.kyotaro@lab.ntt.co.jp>
Date: Tue, 27 Jan 2015 11:15:17 +0900
Subject: [PATCH 2/2] Make FETCH can accept per-tuple memory overhead.

---
 contrib/postgres_fdw/postgres_fdw.c     | 25 ++++++-----
 src/backend/commands/copy.c             |  2 +-
 src/backend/commands/createas.c         |  2 +-
 src/backend/commands/explain.c          |  2 +-
 src/backend/commands/extension.c        |  2 +-
 src/backend/commands/matview.c          |  2 +-
 src/backend/commands/portalcmds.c       |  3 +-
 src/backend/commands/prepare.c          |  2 +-
 src/backend/executor/execMain.c         | 16 ++++---
 src/backend/executor/functions.c        |  2 +-
 src/backend/executor/spi.c              |  4 +-
 src/backend/parser/gram.y               | 35 ++++++++++-----
 src/backend/tcop/postgres.c             |  4 +-
 src/backend/tcop/pquery.c               | 79 ++++++++++++++++++++-------------
 src/include/executor/executor.h         |  6 +--
 src/include/nodes/parsenodes.h          |  1 +
 src/include/tcop/pquery.h               |  6 +--
 src/interfaces/ecpg/preproc/Makefile    |  2 +-
 src/interfaces/ecpg/preproc/ecpg.addons | 64 +++++++++++++++++---------
 19 files changed, 159 insertions(+), 100 deletions(-)

diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index b3bf27e..6633912 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -47,10 +47,10 @@ PG_MODULE_MAGIC;
 #define DEFAULT_FDW_TUPLE_COST		0.01
 
 /* Maximum tuples per fetch */
-#define MAX_FETCH_SIZE				30000
+#define MAX_FETCH_SIZE				10000
 
 /* Maximum memory usable for retrieved data  */
-#define MAX_FETCH_MEM				(2 * 1024 * 1024)
+#define MAX_FETCH_MEM				(512 * 1024)
 /*
  * FDW-specific planner information kept in RelOptInfo.fdw_private for a
  * foreign table.  This information is collected by postgresGetForeignRelSize.
@@ -163,6 +163,7 @@ typedef struct PgFdwScanState
 	MemoryContext temp_cxt;		/* context for per-tuple temporary data */
 
 	long		max_palloced_mem; /* For test, remove me later */
+	int			max_numrows;
 } PgFdwScanState;
 
 /*
@@ -1104,7 +1105,7 @@ postgresEndForeignScan(ForeignScanState *node)
 	if (fsstate == NULL)
 		return;
 
-	elog(LOG, "Max memory fo tuple store = %ld", fsstate->max_palloced_mem);
+	elog(LOG, "Max memory for tuple store = %ld, max numrows = %d", fsstate->max_palloced_mem, fsstate->max_numrows);
 	/* Close the cursor if open, to prevent accumulation of cursors */
 	if (fsstate->cursor_exists)
 		close_cursor(fsstate->conn, fsstate->cursor_number);
@@ -2037,19 +2038,18 @@ fetch_more_data(ForeignScanState *node)
 		PGconn	   *conn = fsstate->conn;
 		char		sql[64];
 		int			fetch_size;
+		int			tuple_overhead;
 		int			numrows;
 		int			i;
 		long		alloc_size = 0;
 
 		/* The fetch size is arbitrary, but shouldn't be enormous. */
-		fetch_size = MAX_FETCH_MEM -
-			MAX_FETCH_SIZE *
-				estimate_tuple_overhead(fsstate->attinmeta->tupdesc,
-										fsstate->retrieved_attrs);
-
-		snprintf(sql, sizeof(sql), "FETCH %d LIMIT %ld FROM c%u",
+		tuple_overhead = estimate_tuple_overhead(fsstate->attinmeta->tupdesc,
+												 fsstate->retrieved_attrs);
+		fetch_size = MAX_FETCH_MEM - MAX_FETCH_SIZE * sizeof(HeapTuple);
+		snprintf(sql, sizeof(sql), "FETCH %d LIMIT %ld (%d) FROM c%u",
 				 MAX_FETCH_SIZE,
-				 fetch_size,
+				 fetch_size, tuple_overhead,
 				 fsstate->cursor_number);
 
 		res = PQexec(conn, sql);
@@ -2059,6 +2059,9 @@ fetch_more_data(ForeignScanState *node)
 
 		/* Convert the data into HeapTuples */
 		numrows = PQntuples(res);
+		if (fsstate->max_numrows < numrows)
+			fsstate->max_numrows = numrows;
+
 		if (numrows == 0)
 			fsstate->eof_reached;
 		else
@@ -2079,7 +2082,7 @@ fetch_more_data(ForeignScanState *node)
 				alloc_size += fsstate->tuples[i]->t_len;
 			}
 
-			if (alloc_size > fsstate->max_palloced_mem)
+			if (fsstate->max_palloced_mem < alloc_size)
 				fsstate->max_palloced_mem = alloc_size;
 
 			/* Update fetch_ct_2 */
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index b6e6523..6ddae82 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -1915,7 +1915,7 @@ CopyTo(CopyState cstate)
 	else
 	{
 		/* run the plan --- the dest receiver will send tuples */
-		ExecutorRun(cstate->queryDesc, ForwardScanDirection, 0L, 0L);
+		ExecutorRun(cstate->queryDesc, ForwardScanDirection, 0L, 0L, 0);
 		processed = ((DR_copy *) cstate->queryDesc->dest)->processed;
 	}
 
diff --git a/src/backend/commands/createas.c b/src/backend/commands/createas.c
index c5c4478..1644f86 100644
--- a/src/backend/commands/createas.c
+++ b/src/backend/commands/createas.c
@@ -192,7 +192,7 @@ ExecCreateTableAs(CreateTableAsStmt *stmt, const char *queryString,
 		dir = ForwardScanDirection;
 
 	/* run the plan */
-	ExecutorRun(queryDesc, dir, 0L, 0L);
+	ExecutorRun(queryDesc, dir, 0L, 0L, 0);
 
 	/* save the rowcount if we're given a completionTag to fill */
 	if (completionTag)
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 2c23e9b..1b423ee 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -489,7 +489,7 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
 			dir = ForwardScanDirection;
 
 		/* run the plan */
-		ExecutorRun(queryDesc, dir, 0L, 0L);
+		ExecutorRun(queryDesc, dir, 0L, 0L, 0);
 
 		/* run cleanup too */
 		ExecutorFinish(queryDesc);
diff --git a/src/backend/commands/extension.c b/src/backend/commands/extension.c
index f624567..2360ffa 100644
--- a/src/backend/commands/extension.c
+++ b/src/backend/commands/extension.c
@@ -736,7 +736,7 @@ execute_sql_string(const char *sql, const char *filename)
 										dest, NULL, 0);
 
 				ExecutorStart(qdesc, 0);
-				ExecutorRun(qdesc, ForwardScanDirection, 0, 0);
+				ExecutorRun(qdesc, ForwardScanDirection, 0L, 0L, 0);
 				ExecutorFinish(qdesc);
 				ExecutorEnd(qdesc);
 
diff --git a/src/backend/commands/matview.c b/src/backend/commands/matview.c
index 6530ecb..54669c5 100644
--- a/src/backend/commands/matview.c
+++ b/src/backend/commands/matview.c
@@ -360,7 +360,7 @@ refresh_matview_datafill(DestReceiver *dest, Query *query,
 	ExecutorStart(queryDesc, EXEC_FLAG_WITHOUT_OIDS);
 
 	/* run the plan */
-	ExecutorRun(queryDesc, ForwardScanDirection, 0L, 0L);
+	ExecutorRun(queryDesc, ForwardScanDirection, 0L, 0L, 0);
 
 	/* and clean up */
 	ExecutorFinish(queryDesc);
diff --git a/src/backend/commands/portalcmds.c b/src/backend/commands/portalcmds.c
index 255c86e..85fffc1 100644
--- a/src/backend/commands/portalcmds.c
+++ b/src/backend/commands/portalcmds.c
@@ -178,6 +178,7 @@ PerformPortalFetch(FetchStmt *stmt,
 								stmt->direction,
 								stmt->howMany,
 								stmt->howLarge,
+								stmt->tupoverhead,
 								dest);
 
 	/* Return command status if wanted */
@@ -376,7 +377,7 @@ PersistHoldablePortal(Portal portal)
 										true);
 
 		/* Fetch the result set into the tuplestore */
-		ExecutorRun(queryDesc, ForwardScanDirection, 0L, 0L);
+		ExecutorRun(queryDesc, ForwardScanDirection, 0L, 0L, 0);
 
 		(*queryDesc->dest->rDestroy) (queryDesc->dest);
 		queryDesc->dest = NULL;
diff --git a/src/backend/commands/prepare.c b/src/backend/commands/prepare.c
index 31799f5..e46367a 100644
--- a/src/backend/commands/prepare.c
+++ b/src/backend/commands/prepare.c
@@ -291,7 +291,7 @@ ExecuteQuery(ExecuteStmt *stmt, IntoClause *intoClause,
 	 */
 	PortalStart(portal, paramLI, eflags, GetActiveSnapshot());
 
-	(void) PortalRun(portal, count, 0L, false, dest, dest, completionTag);
+	(void) PortalRun(portal, count, 0L, 0, false, dest, dest, completionTag);
 
 	PortalDrop(portal, false);
 
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index d976bf3..b40702c 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -79,6 +79,7 @@ static void ExecutePlan(EState *estate, PlanState *planstate,
 			bool sendTuples,
 			long numberTuples,
 			long sizeTuples,
+			int  tupleOverhead,
 			ScanDirection direction,
 			DestReceiver *dest);
 static bool ExecCheckRTEPerms(RangeTblEntry *rte);
@@ -249,17 +250,20 @@ standard_ExecutorStart(QueryDesc *queryDesc, int eflags)
  */
 void
 ExecutorRun(QueryDesc *queryDesc,
-			ScanDirection direction, long count, long size)
+			ScanDirection direction, long count, long size, int tupoverhead)
 {
 	if (ExecutorRun_hook)
-		(*ExecutorRun_hook) (queryDesc, direction, count, size);
+		(*ExecutorRun_hook) (queryDesc, direction,
+							 count, size, tupoverhead);
 	else
-		standard_ExecutorRun(queryDesc, direction, count, size);
+		standard_ExecutorRun(queryDesc, direction,
+							 count, size, tupoverhead);
 }
 
 void
 standard_ExecutorRun(QueryDesc *queryDesc,
-					 ScanDirection direction, long count, long size)
+					 ScanDirection direction,
+					 long count, long size, int tupoverhead)
 {
 	EState	   *estate;
 	CmdType		operation;
@@ -312,6 +316,7 @@ standard_ExecutorRun(QueryDesc *queryDesc,
 					sendTuples,
 					count,
 					size,
+					tupoverhead,
 					direction,
 					dest);
 
@@ -1453,6 +1458,7 @@ ExecutePlan(EState *estate,
 			bool sendTuples,
 			long numberTuples,
 			long sizeTuples,
+			int  tupleOverhead,
 			ScanDirection direction,
 			DestReceiver *dest)
 {
@@ -1534,7 +1540,7 @@ ExecutePlan(EState *estate,
 			 *
 			 * This needs all attributes deformed so a bit slow on some cases.
 			 */
-			sent_size += slot_compute_attr_size(slot);
+			sent_size += slot_compute_attr_size(slot) + tupleOverhead;
 
 			/* Quit when the size limit will be exceeded by this tuple */
 			if (sizeTuples < sent_size)
diff --git a/src/backend/executor/functions.c b/src/backend/executor/functions.c
index d64e908..9b46c95 100644
--- a/src/backend/executor/functions.c
+++ b/src/backend/executor/functions.c
@@ -850,7 +850,7 @@ postquel_getnext(execution_state *es, SQLFunctionCachePtr fcache)
 		/* Run regular commands to completion unless lazyEval */
 		long		count = (es->lazyEval) ? 1L : 0L;
 
-		ExecutorRun(es->qd, ForwardScanDirection, count, 0L);
+		ExecutorRun(es->qd, ForwardScanDirection, count, 0L, 0);
 
 		/*
 		 * If we requested run to completion OR there was no tuple returned,
diff --git a/src/backend/executor/spi.c b/src/backend/executor/spi.c
index cb30cfb..889a65c 100644
--- a/src/backend/executor/spi.c
+++ b/src/backend/executor/spi.c
@@ -2369,7 +2369,7 @@ _SPI_pquery(QueryDesc *queryDesc, bool fire_triggers, long tcount)
 
 	ExecutorStart(queryDesc, eflags);
 
-	ExecutorRun(queryDesc, ForwardScanDirection, tcount, 0L);
+	ExecutorRun(queryDesc, ForwardScanDirection, tcount, 0L, 0);
 
 	_SPI_current->processed = queryDesc->estate->es_processed;
 	_SPI_current->lastoid = queryDesc->estate->es_lastoid;
@@ -2447,7 +2447,7 @@ _SPI_cursor_operation(Portal portal, FetchDirection direction, long count,
 	/* Run the cursor */
 	nfetched = PortalRunFetch(portal,
 							  direction,
-							  count, 0L,
+							  count, 0L, 0,
 							  dest);
 
 	/*
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index e559d1a..4507ea2 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -520,6 +520,8 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 %type <str>		opt_existing_window_name
 %type <boolean> opt_if_not_exists
 
+%type <ival>	opt_overhead
+
 /*
  * Non-keyword token types.  These are hard-wired into the "flex" lexer.
  * They must be listed first so that their numeric codes do not depend on
@@ -6021,13 +6023,14 @@ fetch_args:	cursor_name
 					n->howMany = $1;
 					$$ = (Node *)n;
 				}
-			| SignedIconst LIMIT Iconst opt_from_in cursor_name
+			| SignedIconst LIMIT Iconst opt_overhead opt_from_in cursor_name
 				{
 					FetchStmt *n = makeNode(FetchStmt);
-					n->portalname = $5;
+					n->portalname = $6;
 					n->direction = FETCH_FORWARD;
 					n->howMany = $1;
 					n->howLarge = $3;
+					n->tupoverhead = $4;
 					$$ = (Node *)n;
 				}
 			| ALL opt_from_in cursor_name
@@ -6038,13 +6041,14 @@ fetch_args:	cursor_name
 					n->howMany = FETCH_ALL;
 					$$ = (Node *)n;
 				}
-			| ALL LIMIT Iconst opt_from_in cursor_name
+			| ALL LIMIT Iconst opt_overhead opt_from_in cursor_name
 				{
 					FetchStmt *n = makeNode(FetchStmt);
-					n->portalname = $5;
+					n->portalname = $6;
 					n->direction = FETCH_FORWARD;
 					n->howMany = FETCH_ALL;
 					n->howLarge = $3;
+					n->tupoverhead = $4;
 					$$ = (Node *)n;
 				}
 			| FORWARD opt_from_in cursor_name
@@ -6063,13 +6067,14 @@ fetch_args:	cursor_name
 					n->howMany = $2;
 					$$ = (Node *)n;
 				}
-			| FORWARD SignedIconst LIMIT Iconst opt_from_in cursor_name
+			| FORWARD SignedIconst LIMIT Iconst opt_overhead opt_from_in cursor_name
 				{
 					FetchStmt *n = makeNode(FetchStmt);
-					n->portalname = $6;
+					n->portalname = $7;
 					n->direction = FETCH_FORWARD;
 					n->howMany = $2;
 					n->howLarge = $4;
+					n->tupoverhead = $5;
 					$$ = (Node *)n;
 				}
 			| FORWARD ALL opt_from_in cursor_name
@@ -6080,13 +6085,14 @@ fetch_args:	cursor_name
 					n->howMany = FETCH_ALL;
 					$$ = (Node *)n;
 				}
-			| FORWARD ALL LIMIT Iconst opt_from_in cursor_name
+			| FORWARD ALL LIMIT Iconst  opt_overhead  opt_from_in cursor_name
 				{
 					FetchStmt *n = makeNode(FetchStmt);
-					n->portalname = $6;
+					n->portalname = $7;
 					n->direction = FETCH_FORWARD;
 					n->howMany = FETCH_ALL;
 					n->howLarge = $4;
+					n->tupoverhead = $5;
 					$$ = (Node *)n;
 				}
 			| BACKWARD opt_from_in cursor_name
@@ -6105,13 +6111,14 @@ fetch_args:	cursor_name
 					n->howMany = $2;
 					$$ = (Node *)n;
 				}
-			| BACKWARD SignedIconst LIMIT Iconst opt_from_in cursor_name
+			| BACKWARD SignedIconst LIMIT Iconst  opt_overhead opt_from_in cursor_name
 				{
 					FetchStmt *n = makeNode(FetchStmt);
-					n->portalname = $6;
+					n->portalname = $7;
 					n->direction = FETCH_BACKWARD;
 					n->howMany = $2;
 					n->howLarge = $4;
+					n->tupoverhead = $5;
 					$$ = (Node *)n;
 				}
 			| BACKWARD ALL opt_from_in cursor_name
@@ -6122,13 +6129,14 @@ fetch_args:	cursor_name
 					n->howMany = FETCH_ALL;
 					$$ = (Node *)n;
 				}
-			| BACKWARD ALL LIMIT Iconst opt_from_in cursor_name
+			| BACKWARD ALL LIMIT Iconst  opt_overhead opt_from_in cursor_name
 				{
 					FetchStmt *n = makeNode(FetchStmt);
-					n->portalname = $6;
+					n->portalname = $7;
 					n->direction = FETCH_BACKWARD;
 					n->howMany = FETCH_ALL;
 					n->howLarge = $4;
+					n->tupoverhead = $5;
 					$$ = (Node *)n;
 				}
 		;
@@ -6141,6 +6149,9 @@ opt_from_in:	from_in								{}
 			| /* EMPTY */							{}
 		;
 
+opt_overhead:	'(' Iconst ')'						{ $$ = $2;}
+			| /* EMPTY */							{ $$ = 0; }
+		;
 
 /*****************************************************************************
  *
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 55f062b..5261197 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -1043,7 +1043,7 @@ exec_simple_query(const char *query_string)
 		 */
 		(void) PortalRun(portal,
 						 FETCH_ALL,
-						 0,
+						 0L, 0,
 						 isTopLevel,
 						 receiver,
 						 receiver,
@@ -1929,7 +1929,7 @@ exec_execute_message(const char *portal_name, long max_rows)
 
 	completed = PortalRun(portal,
 						  max_rows,
-						  0,
+						  0L, 0,
 						  true, /* always top level */
 						  receiver,
 						  receiver,
diff --git a/src/backend/tcop/pquery.c b/src/backend/tcop/pquery.c
index 1456c5a..6628b19 100644
--- a/src/backend/tcop/pquery.c
+++ b/src/backend/tcop/pquery.c
@@ -41,9 +41,10 @@ static void ProcessQuery(PlannedStmt *plan,
 			 char *completionTag);
 static void FillPortalStore(Portal portal, bool isTopLevel);
 static uint32 RunFromStore(Portal portal, ScanDirection direction,
-		     long count, long size, bool *stoppedbysize,
+		     long count, long size, int tupoverhead, bool *stoppedbysize,
 			 DestReceiver *dest);
-static long PortalRunSelect(Portal portal, bool forward, long count, long size,
+static long PortalRunSelect(Portal portal, bool forward,
+				long count, long size, int tupoverhead,
 				DestReceiver *dest);
 static void PortalRunUtility(Portal portal, Node *utilityStmt, bool isTopLevel,
 				 DestReceiver *dest, char *completionTag);
@@ -54,6 +55,7 @@ static long DoPortalRunFetch(Portal portal,
 				 FetchDirection fdirection,
 				 long count,
 				 long size,
+				 int tupoverehad,
 				 DestReceiver *dest);
 static void DoPortalRewind(Portal portal);
 
@@ -185,7 +187,7 @@ ProcessQuery(PlannedStmt *plan,
 	/*
 	 * Run the plan to completion.
 	 */
-	ExecutorRun(queryDesc, ForwardScanDirection, 0L, 0L);
+	ExecutorRun(queryDesc, ForwardScanDirection, 0L, 0L, 0);
 
 	/*
 	 * Build command completion status string, if caller wants one.
@@ -706,8 +708,8 @@ PortalSetResultFormat(Portal portal, int nFormats, int16 *formats)
  * suspended due to exhaustion of the count parameter.
  */
 bool
-PortalRun(Portal portal, long count, long size, bool isTopLevel,
-		  DestReceiver *dest, DestReceiver *altdest,
+PortalRun(Portal portal, long count, long size, int tupoverhead,
+		  bool isTopLevel, DestReceiver *dest, DestReceiver *altdest,
 		  char *completionTag)
 {
 	bool		result;
@@ -790,7 +792,8 @@ PortalRun(Portal portal, long count, long size, bool isTopLevel,
 				/*
 				 * Now fetch desired portion of results.
 				 */
-				nprocessed = PortalRunSelect(portal, true, count, size, dest);
+				nprocessed = PortalRunSelect(portal, true,
+											 count, size, tupoverhead, dest);
 
 				/*
 				 * If the portal result contains a command tag and the caller
@@ -896,6 +899,7 @@ PortalRunSelect(Portal portal,
 				bool forward,
 				long count,
 				long size,
+				int  tupoverhead,
 				DestReceiver *dest)
 {
 	QueryDesc  *queryDesc;
@@ -944,12 +948,13 @@ PortalRunSelect(Portal portal,
 			count = 0;
 
 		if (portal->holdStore)
-			nprocessed = RunFromStore(portal, direction, count,
-									  size, &stoppedbysize, dest);
+			nprocessed = RunFromStore(portal, direction,
+									  count, size, tupoverhead,
+									  &stoppedbysize, dest);
 		else
 		{
 			PushActiveSnapshot(queryDesc->snapshot);
-			ExecutorRun(queryDesc, direction, count, size);
+			ExecutorRun(queryDesc, direction, count, size, tupoverhead);
 			nprocessed = queryDesc->estate->es_processed;
 			stoppedbysize = queryDesc->estate->es_stoppedbysize;
 			PopActiveSnapshot();
@@ -990,12 +995,13 @@ PortalRunSelect(Portal portal,
 			count = 0;
 
 		if (portal->holdStore)
-			nprocessed = RunFromStore(portal, direction, count,
-									  size, &stoppedbysize, dest);
+			nprocessed = RunFromStore(portal, direction,
+									  count, size, tupoverhead,
+									  &stoppedbysize, dest);
 		else
 		{
 			PushActiveSnapshot(queryDesc->snapshot);
-			ExecutorRun(queryDesc, direction, count, size);
+			ExecutorRun(queryDesc, direction, count, size, tupoverhead);
 			nprocessed = queryDesc->estate->es_processed;
 			stoppedbysize = queryDesc->estate->es_stoppedbysize;
 			PopActiveSnapshot();
@@ -1099,8 +1105,9 @@ FillPortalStore(Portal portal, bool isTopLevel)
  * out for memory leaks.
  */
 static uint32
-RunFromStore(Portal portal, ScanDirection direction, long count,
-			 long size_limit, bool *stoppedbysize, DestReceiver *dest)
+RunFromStore(Portal portal, ScanDirection direction,
+			 long count, long size_limit, int tupoverhead,
+			 bool *stoppedbysize, DestReceiver *dest)
 {
 	long		current_tuple_count = 0;
 	TupleTableSlot *slot;
@@ -1138,7 +1145,7 @@ RunFromStore(Portal portal, ScanDirection direction, long count,
 			(*dest->receiveSlot) (slot, dest);
 
 			/* Count the size of tuples we've sent */
-			sent_size += slot_compute_attr_size(slot);
+			sent_size += slot_compute_attr_size(slot) + tupoverhead;
 
 			ExecClearTuple(slot);
 
@@ -1411,6 +1418,7 @@ PortalRunFetch(Portal portal,
 			   FetchDirection fdirection,
 			   long count,
 			   long size,
+			   int  tupoverhead,
 			   DestReceiver *dest)
 {
 	long		result;
@@ -1448,7 +1456,8 @@ PortalRunFetch(Portal portal,
 		switch (portal->strategy)
 		{
 			case PORTAL_ONE_SELECT:
-				result = DoPortalRunFetch(portal, fdirection, count, size, dest);
+				result = DoPortalRunFetch(portal, fdirection,
+										  count, size, tupoverhead, dest);
 				break;
 
 			case PORTAL_ONE_RETURNING:
@@ -1465,7 +1474,8 @@ PortalRunFetch(Portal portal,
 				/*
 				 * Now fetch desired portion of results.
 				 */
-				result = DoPortalRunFetch(portal, fdirection, count, size, dest);
+				result = DoPortalRunFetch(portal, fdirection,
+										  count, size, tupoverhead, dest);
 				break;
 
 			default:
@@ -1511,6 +1521,7 @@ DoPortalRunFetch(Portal portal,
 				 FetchDirection fdirection,
 				 long count,
 				 long size,
+				 int  tupoverhead,
 				 DestReceiver *dest)
 {
 	bool		forward;
@@ -1553,7 +1564,7 @@ DoPortalRunFetch(Portal portal,
 				{
 					DoPortalRewind(portal);
 					if (count > 1)
-						PortalRunSelect(portal, true, count - 1, 0L,
+						PortalRunSelect(portal, true, count - 1, 0L, 0,
 										None_Receiver);
 				}
 				else
@@ -1563,13 +1574,15 @@ DoPortalRunFetch(Portal portal,
 					if (portal->atEnd)
 						pos++;	/* need one extra fetch if off end */
 					if (count <= pos)
-						PortalRunSelect(portal, false, pos - count + 1, 0L,
+						PortalRunSelect(portal, false,
+										pos - count + 1, 0L, 0,
 										None_Receiver);
 					else if (count > pos + 1)
-						PortalRunSelect(portal, true, count - pos - 1, 0L,
+						PortalRunSelect(portal, true,
+										count - pos - 1, 0L, 0,
 										None_Receiver);
 				}
-				return PortalRunSelect(portal, true, 1L, 0L, dest);
+				return PortalRunSelect(portal, true, 1L, 0L, 0, dest);
 			}
 			else if (count < 0)
 			{
@@ -1580,17 +1593,19 @@ DoPortalRunFetch(Portal portal,
 				 * (Is it worth considering case where count > half of size of
 				 * query?  We could rewind once we know the size ...)
 				 */
-				PortalRunSelect(portal, true, FETCH_ALL, 0L, None_Receiver);
+				PortalRunSelect(portal, true,
+								FETCH_ALL, 0L, 0, None_Receiver);
 				if (count < -1)
-					PortalRunSelect(portal, false, -count - 1, 0, None_Receiver);
-				return PortalRunSelect(portal, false, 1L, 0L, dest);
+					PortalRunSelect(portal, false,
+									-count - 1, 0, 0, None_Receiver);
+				return PortalRunSelect(portal, false, 1L, 0L, 0, dest);
 			}
 			else
 			{
 				/* count == 0 */
 				/* Rewind to start, return zero rows */
 				DoPortalRewind(portal);
-				return PortalRunSelect(portal, true, 0L, 0L, dest);
+				return PortalRunSelect(portal, true, 0L, 0L, 0, dest);
 			}
 			break;
 		case FETCH_RELATIVE:
@@ -1600,8 +1615,9 @@ DoPortalRunFetch(Portal portal,
 				 * Definition: advance count-1 rows, return next row (if any).
 				 */
 				if (count > 1)
-					PortalRunSelect(portal, true, count - 1, 0L, None_Receiver);
-				return PortalRunSelect(portal, true, 1L, 0L, dest);
+					PortalRunSelect(portal, true,
+									count - 1, 0L, 0, None_Receiver);
+				return PortalRunSelect(portal, true, 1L, 0L, 0, dest);
 			}
 			else if (count < 0)
 			{
@@ -1610,8 +1626,9 @@ DoPortalRunFetch(Portal portal,
 				 * any).
 				 */
 				if (count < -1)
-					PortalRunSelect(portal, false, -count - 1, 0L, None_Receiver);
-				return PortalRunSelect(portal, false, 1L, 0L, dest);
+					PortalRunSelect(portal, false,
+									-count - 1, 0L, 0, None_Receiver);
+				return PortalRunSelect(portal, false, 1L, 0L, 0, dest);
 			}
 			else
 			{
@@ -1657,7 +1674,7 @@ DoPortalRunFetch(Portal portal,
 			 */
 			if (on_row)
 			{
-				PortalRunSelect(portal, false, 1L, 0L, None_Receiver);
+				PortalRunSelect(portal, false, 1L, 0L, 0, None_Receiver);
 				/* Set up to fetch one row forward */
 				count = 1;
 				forward = true;
@@ -1679,7 +1696,7 @@ DoPortalRunFetch(Portal portal,
 		return result;
 	}
 
-	return PortalRunSelect(portal, forward, count, size, dest);
+	return PortalRunSelect(portal, forward, count, size, tupoverhead, dest);
 }
 
 /*
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index 64a02c3..316568f 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -81,7 +81,7 @@ extern PGDLLIMPORT ExecutorStart_hook_type ExecutorStart_hook;
 /* Hook for plugins to get control in ExecutorRun() */
 typedef void (*ExecutorRun_hook_type) (QueryDesc *queryDesc,
 									   ScanDirection direction,
-									   long count, long size);
+									   long count, long size, int tupoverhead);
 extern PGDLLIMPORT ExecutorRun_hook_type ExecutorRun_hook;
 
 /* Hook for plugins to get control in ExecutorFinish() */
@@ -176,9 +176,9 @@ extern TupleTableSlot *ExecFilterJunk(JunkFilter *junkfilter,
 extern void ExecutorStart(QueryDesc *queryDesc, int eflags);
 extern void standard_ExecutorStart(QueryDesc *queryDesc, int eflags);
 extern void ExecutorRun(QueryDesc *queryDesc,
-			ScanDirection direction, long count, long size);
+		ScanDirection direction, long count, long size, int tupoverhead);
 extern void standard_ExecutorRun(QueryDesc *queryDesc,
-		    ScanDirection direction, long count, long size);
+		 ScanDirection direction, long count, long size, int tupoverhead);
 extern void ExecutorFinish(QueryDesc *queryDesc);
 extern void standard_ExecutorFinish(QueryDesc *queryDesc);
 extern void ExecutorEnd(QueryDesc *queryDesc);
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index 9e18331..f86694b 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -2224,6 +2224,7 @@ typedef struct FetchStmt
 	FetchDirection direction;	/* see above */
 	long		howMany;		/* number of rows, or position argument */
 	long		howLarge;		/* total bytes of rows */
+	int			tupoverhead;	/* declared overhead per tuple in client */
 	char	   *portalname;		/* name of portal (cursor) */
 	bool		ismove;			/* TRUE if MOVE */
 } FetchStmt;
diff --git a/src/include/tcop/pquery.h b/src/include/tcop/pquery.h
index afffe86..021532c 100644
--- a/src/include/tcop/pquery.h
+++ b/src/include/tcop/pquery.h
@@ -17,7 +17,6 @@
 #include "nodes/parsenodes.h"
 #include "utils/portal.h"
 
-
 extern PGDLLIMPORT Portal ActivePortal;
 
 
@@ -33,14 +32,15 @@ extern void PortalStart(Portal portal, ParamListInfo params,
 extern void PortalSetResultFormat(Portal portal, int nFormats,
 					  int16 *formats);
 
-extern bool PortalRun(Portal portal, long count, long size, bool isTopLevel,
-		  DestReceiver *dest, DestReceiver *altdest,
+extern bool PortalRun(Portal portal, long count, long size, int tupoverhead,
+		  bool isTopLevel, DestReceiver *dest, DestReceiver *altdest,
 		  char *completionTag);
 
 extern long PortalRunFetch(Portal portal,
 			   FetchDirection fdirection,
 			   long count,
 			   long size,
+			   int tupoverhead,
 			   DestReceiver *dest);
 
 #endif   /* PQUERY_H */
diff --git a/src/interfaces/ecpg/preproc/Makefile b/src/interfaces/ecpg/preproc/Makefile
index b492fa7..1ecc405 100644
--- a/src/interfaces/ecpg/preproc/Makefile
+++ b/src/interfaces/ecpg/preproc/Makefile
@@ -48,7 +48,7 @@ ecpg: $(OBJS) | submake-libpgport
 preproc.o: pgc.c
 
 preproc.h: preproc.c ;
-preproc.c: BISONFLAGS += -r all -d
+preproc.c: BISONFLAGS += -d
 
 preproc.y: ../../../backend/parser/gram.y parse.pl ecpg.addons ecpg.header ecpg.tokens ecpg.trailer ecpg.type
 	$(PERL) $(srcdir)/parse.pl $(srcdir) < $< > $@
diff --git a/src/interfaces/ecpg/preproc/ecpg.addons b/src/interfaces/ecpg/preproc/ecpg.addons
index bdccb68..424f412 100644
--- a/src/interfaces/ecpg/preproc/ecpg.addons
+++ b/src/interfaces/ecpg/preproc/ecpg.addons
@@ -235,31 +235,41 @@ ECPG: fetch_argsBACKWARDopt_from_incursor_name addon
 			free($3);
 			$3 = mm_strdup("$0");
 		}
-ECPG: fetch_argsALLLIMITIconstopt_from_incursor_name addon
-		add_additional_variables($5, false);
-		if ($5[0] == ':')
+ECPG: fetch_argsALLLIMITIconstopt_overheadopt_from_incursor_name addon
+		add_additional_variables($6, false);
+		if ($6[0] == ':')
 		{
-			free($5);
-			$5 = mm_strdup("$0");
+			free($6);
+			$6 = mm_strdup("$0");
 		}
 		if ($3[0] == '$')
 		{
 			free($3);
 			$3 = mm_strdup("$0");
 		}
-ECPG: fetch_argsFORWARDALLLIMITIconstopt_from_incursor_name addon
-ECPG: fetch_argsBACKWARDALLLIMITIconstopt_from_incursor_name addon
-		add_additional_variables($6, false);
-		if ($6[0] == ':')
+		if ($4[0] == '$')
 		{
-			free($6);
-			$6 = mm_strdup("$0");
+			free($4);
+			$4 = mm_strdup("$0");
+		}
+ECPG: fetch_argsFORWARDALLLIMITIconstopt_overheadopt_from_incursor_name addon
+ECPG: fetch_argsBACKWARDALLLIMITIconstopt_overheadopt_from_incursor_name addon
+		add_additional_variables($7, false);
+		if ($7[0] == ':')
+		{
+			free($7);
+			$7 = mm_strdup("$0");
 		}
 		if ($4[0] == '$')
 		{
 			free($4);
 			$4 = mm_strdup("$0");
 		}
+		if ($5[0] == '$')
+		{
+			free($5);
+			$5 = mm_strdup("$0");
+		}
 ECPG: fetch_argsSignedIconstopt_from_incursor_name addon
 		add_additional_variables($3, false);
 		if ($3[0] == ':')
@@ -267,12 +277,12 @@ ECPG: fetch_argsSignedIconstopt_from_incursor_name addon
 			free($3);
 			$3 = mm_strdup("$0");
 		}
-ECPG: fetch_argsSignedIconstLIMITIconstopt_from_incursor_name addon
-		add_additional_variables($5, false);
-		if ($5[0] == ':')
+ECPG: fetch_argsSignedIconstLIMITIconstopt_overheadopt_from_incursor_name addon
+		add_additional_variables($6, false);
+		if ($6[0] == ':')
 		{
-			free($5);
-			$5 = mm_strdup("$0");
+			free($6);
+			$6 = mm_strdup("$0");
 		}
 		if ($1[0] == '$')
 		{
@@ -284,13 +294,18 @@ ECPG: fetch_argsSignedIconstLIMITIconstopt_from_incursor_name addon
 			free($3);
 			$3 = mm_strdup("$0");
 		}
-ECPG: fetch_argsFORWARDSignedIconstLIMITIconstopt_from_incursor_name addon
-ECPG: fetch_argsBACKWARDSignedIconstLIMITIconstopt_from_incursor_name addon
-		add_additional_variables($6, false);
-		if ($6[0] == ':')
+		if ($4[0] == '$')
 		{
-			free($6);
-			$6 = mm_strdup("$0");
+			free($4);
+			$4 = mm_strdup("$0");
+		}
+ECPG: fetch_argsFORWARDSignedIconstLIMITIconstopt_overheadopt_from_incursor_name addon
+ECPG: fetch_argsBACKWARDSignedIconstLIMITIconstopt_overheadopt_from_incursor_name addon
+		add_additional_variables($7, false);
+		if ($7[0] == ':')
+		{
+			free($7);
+			$7 = mm_strdup("$0");
 		}
 		if ($2[0] == '$')
 		{
@@ -302,6 +317,11 @@ ECPG: fetch_argsBACKWARDSignedIconstLIMITIconstopt_from_incursor_name addon
 			free($4);
 			$4 = mm_strdup("$0");
 		}
+		if ($5[0] == '$')
+		{
+			free($5);
+			$5 = mm_strdup("$0");
+		}
 ECPG: fetch_argsFORWARDALLopt_from_incursor_name addon
 ECPG: fetch_argsBACKWARDALLopt_from_incursor_name addon
 		add_additional_variables($4, false);
-- 
2.1.0.GIT

Corey Huinker

corey.huinker@gmail.com

almost 11 years ago

In reply to: Kyotaro HORIGUCHI (#3)

1 attachment(s)

Re: [POC] FETCH limited by bytes.

Last year I was working on a patch to postgres_fdw where the fetch_size
could be set at the table level and the server level.

I was able to get the settings parsed and they would show up in
pg_foreign_table
and pg_foreign_servers. Unfortunately, I'm not very familiar with how
foreign data wrappers work, so I wasn't able to figure out how to get these
custom values passed from the PgFdwRelationInfo struct into the
query's PgFdwScanState
struct.

I bring this up only because it might be a simpler solution, in that the
table designer could set the fetch size very high for narrow tables, and
lower or default for wider tables. It's also a very clean syntax, just
another option on the table and/or server creation.

My incomplete patch is attached.

On Tue, Jan 27, 2015 at 4:24 AM, Kyotaro HORIGUCHI <
horiguchi.kyotaro@lab.ntt.co.jp> wrote:

Show quoted text

Thank you for the comment.

The automatic way to determin the fetch_size looks become too
much for the purpose. An example of non-automatic way is a new
foreign table option like 'fetch_size' but this exposes the
inside too much... Which do you think is preferable?

Thu, 22 Jan 2015 11:17:52 -0500, Tom Lane <tgl@sss.pgh.pa.us> wrote in <
24503.1421943472@sss.pgh.pa.us>

Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> writes:

Hello, as the discuttion on async fetching on postgres_fdw, FETCH
with data-size limitation would be useful to get memory usage
stability of postgres_fdw.

Is such a feature and syntax could be allowed to be added?

This seems like a lot of work, and frankly an incredibly ugly API,
for a benefit that is entirely hypothetical. Have you got numbers
showing any actual performance win for postgres_fdw?

The API is a rush work to make the path for the new parameter
(but, yes, I did too much for the purpose that use from
postgres_fdw..) and it can be any saner syntax but it's not the
time to do so yet.

The data-size limitation, any size to limit, would give
significant gain especially for small sized rows.

This patch began from the fact that it runs about twice faster
when fetch size = 10000 than 100.

/messages/by-id/20150116.171849.109146500.horiguchi.kyotaro@lab.ntt.co.jp

I took exec times to get 1M rows from localhost via postgres_fdw
and it showed the following numbers.

=# SELECT a from ft1;
fetch_size, avg row size(*1), time, alloced_mem/fetch(Mbytes)(*1)
(local) 0.75s
100 60 6.2s 6000 (0.006)
10000 60 2.7s 600000 (0.6 )
33333 60 2.2s 1999980 (2.0 )
66666 60 2.4s 3999960 (4.0 )

=# SELECT a, b, c from ft1;
fetch_size, avg row size(*1), time, alloced_mem/fetch(Mbytes)(*1)
(local) 0.8s
100 204 12 s 20400 (0.02 )
1000 204 10 s 204000 (0.2 )
10000 204 5.8s 2040000 (2 )
20000 204 5.9s 4080000 (4 )

=# SELECT a, b, d from ft1;
fetch_size, avg row size(*1), time, alloced_mem/fetch(Mbytes)(*1)
(local) 0.8s
100 1356 17 s 135600 (0.136)
1000 1356 15 s 1356000 (1.356)
1475 1356 13 s 2000100 (2.0 )
2950 1356 13 s 4000200 (4.0 )

The definitions of the environment are the following.

CREATE SERVER sv1 FOREIGN DATA WRAPPER postgres_fdw OPTIONS (host
'localhost', dbname 'postgres');
CREATE USER MAPPING FOR PUBLIC SERVER sv1;
CREATE TABLE lt1 (a int, b timestamp, c text, d text);
CREATE FOREIGN TABLE ft1 (a int, b timestamp, c text, d text) SERVER sv1
OPTIONS (table_name 'lt1');
INSERT INTO lt1 (SELECT a, now(), repeat('x', 128), repeat('x', 1280) FROM
generate_series(0, 999999) a);

The "avg row size" is alloced_mem/fetch_size and the alloced_mem
is the sum of HeapTuple[fetch_size] and (HEAPTUPLESIZE +
tup->t_len) for all stored tuples in the receiver side,
fetch_more_data() in postgres_fdw.

They are about 50% gain for the smaller tuple size and 25% for
the larger. They looks to be optimal at where alloced_mem is
around 2MB by the reason unknown to me. Anyway the difference
seems to be significant.

Even if we wanted to do something like this, I strongly object to
measuring size by heap_compute_data_size. That's not a number that users
would normally have any direct knowledge of; nor does it have anything
at all to do with the claimed use-case, where what you'd really need to
measure is bytes transmitted down the wire. (The difference is not

small:

for instance, toasted values would likely still be toasted at the point
where you're measuring.)

Sure. Finally, the attached patch #1 which does the following
things.

- Sender limits the number of tuples using the sum of the net
length of the column values to be sent, not including protocol
overhead. It is calculated in the added function
slot_compute_attr_size(), using raw length for compressed
values.

- postgres_fdw calculates fetch limit bytes by the following
formula,

MAX_FETCH_MEM - MAX_FETCH_SIZE * (estimated overhead per tuple);

The result of the patch is as follows. MAX_FETCH_MEM = 2MiB and
MAX_FETCH_SIZE = 30000.

fetch_size, avg row size(*1), time, max alloced_mem/fetch(Mbytes)
(auto) 60 2.4s 1080000 ( 1.08)
(auto) 204 7.3s 536400 ( 0.54)
(auto) 1356 15 s 430236 ( 0.43)

This is meaningfully fast but the patch looks too big and the
meaning of the new parameter is hard to understand..:(

On the other hand the cause of the displacements of alloced_mem
shown above is per-tuple overhead, the sum of which is unknown
before execution. The second patch makes FETCH accept the tuple
overhead bytes. The result seems pretty good, but I think this
might be too spcialized to this usage.

MAX_FETCH_SIZE = 30000 and MAX_FETCH_MEM = 2MiB,
max_fetch_size, avg row size(*1), time, max
alloced_mem/fetch(MiBytes)
30000 60 2.3s 1080000 ( 1.0)
9932 204 5.7s 1787760 ( 1.7)
1376 1356 13 s 1847484 ( 1.8)

MAX_FETCH_SIZE = 25000 and MAX_FETCH_MEM = 1MiB,
max_fetch_size, avg row size(*1), time, max
alloced_mem/fetch(MiBytes)
25000 60 2.4s 900000 ( 0.86)
4358 204 6.6s 816840 ( 0.78)
634 1356 16 s 844488 ( 0.81)

MAX_FETCH_SIZE = 10000 and MAX_FETCH_MEM = 0.5MiB,
max_fetch_size, avg row size(*1), time, max
alloced_mem/fetch(MiBytes)
10000 60 2.8s 360000 ( 0.35)
2376 204 7.8s 427680 ( 0.41)
332 1356 17 s 442224 ( 0.42)

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachments:

diff_so_far.difftext/plain; charset=US-ASCII; name=diff_so_far.diffDownload

diff --git a/contrib/postgres_fdw/option.c b/contrib/postgres_fdw/option.c
index e1d4c47..0d3426f 100644
--- a/contrib/postgres_fdw/option.c
+++ b/contrib/postgres_fdw/option.c
@@ -113,6 +113,7 @@ postgres_fdw_validator(PG_FUNCTION_ARGS)
 			(void) defGetBoolean(def);
 		}
 		else if (strcmp(def->defname, "fdw_startup_cost") == 0 ||
+				 strcmp(def->defname, "fetch_size") == 0 ||
 				 strcmp(def->defname, "fdw_tuple_cost") == 0)
 		{
 			/* these must have a non-negative numeric value */
@@ -155,6 +156,9 @@ InitPgFdwOptions(void)
 		/* updatable is available on both server and table */
 		{"updatable", ForeignServerRelationId, false},
 		{"updatable", ForeignTableRelationId, false},
+		/* fetch_size is available on both server and table */
+		{"fetch_size", ForeignServerRelationId, false},
+		{"fetch_size", ForeignTableRelationId, false},
 		{NULL, InvalidOid, false}
 	};
 
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 5de1835..2729ba3 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -46,6 +46,9 @@ PG_MODULE_MAGIC;
 /* Default CPU cost to process 1 row (above and beyond cpu_tuple_cost). */
 #define DEFAULT_FDW_TUPLE_COST		0.01
 
+/* default fetch size */
+#define DEFAULT_FETCH_SIZE			100
+
 /*
  * FDW-specific planner information kept in RelOptInfo.fdw_private for a
  * foreign table.  This information is collected by postgresGetForeignRelSize.
@@ -73,6 +76,7 @@ typedef struct PgFdwRelationInfo
 	bool		use_remote_estimate;
 	Cost		fdw_startup_cost;
 	Cost		fdw_tuple_cost;
+	int			fetch_size;
 
 	/* Cached catalog information. */
 	ForeignTable *table;
@@ -156,6 +160,9 @@ typedef struct PgFdwScanState
 	/* working memory contexts */
 	MemoryContext batch_cxt;	/* context holding current batch of tuples */
 	MemoryContext temp_cxt;		/* context for per-tuple temporary data */
+
+	/* fetch size */
+	int			fetch_size;		/* how many rows to get per fetch */
 } PgFdwScanState;
 
 /*
@@ -395,12 +402,13 @@ postgresGetForeignRelSize(PlannerInfo *root,
 	fpinfo->server = GetForeignServer(fpinfo->table->serverid);
 
 	/*
-	 * Extract user-settable option values.  Note that per-table setting of
+	 * Extract user-settable option values.  Note that per-table settings of
 	 * use_remote_estimate overrides per-server setting.
 	 */
 	fpinfo->use_remote_estimate = false;
 	fpinfo->fdw_startup_cost = DEFAULT_FDW_STARTUP_COST;
 	fpinfo->fdw_tuple_cost = DEFAULT_FDW_TUPLE_COST;
+	fpinfo->fetch_size = DEFAULT_FETCH_SIZE;
 
 	foreach(lc, fpinfo->server->options)
 	{
@@ -412,16 +420,17 @@ postgresGetForeignRelSize(PlannerInfo *root,
 			fpinfo->fdw_startup_cost = strtod(defGetString(def), NULL);
 		else if (strcmp(def->defname, "fdw_tuple_cost") == 0)
 			fpinfo->fdw_tuple_cost = strtod(defGetString(def), NULL);
+		else if (strcmp(def->defname, "fetch_size") == 0)
+			fpinfo->fetch_size = strtod(defGetString(def), NULL);
 	}
 	foreach(lc, fpinfo->table->options)
 	{
 		DefElem    *def = (DefElem *) lfirst(lc);
 
 		if (strcmp(def->defname, "use_remote_estimate") == 0)
-		{
 			fpinfo->use_remote_estimate = defGetBoolean(def);
-			break;				/* only need the one value */
-		}
+		else if (strcmp(def->defname, "fetch_size") == 0)
+			fpinfo->fetch_size = strtod(defGetString(def), NULL);
 	}
 
 	/*
@@ -979,6 +988,14 @@ postgresBeginForeignScan(ForeignScanState *node, int eflags)
 		fsstate->param_values = (const char **) palloc0(numParams * sizeof(char *));
 	else
 		fsstate->param_values = NULL;
+
+
+	
+	ForeignTable *table;
+	ForeignServer *server;
+
+	fsstate->fetch_size = 
+	fpinfo->fetch_size = DEFAULT_FETCH_SIZE;
 }
 
 /*

Kyotaro HORIGUCHI

horiguchi.kyotaro@lab.ntt.co.jp

almost 11 years ago

In reply to: Corey Huinker (#4)

1 attachment(s)

Re: [POC] FETCH limited by bytes.

Hmm, somehow I removed some recipients, especially the
list. Sorry for the duplicate.

-----
Sorry, I've been back. Thank you for the comment.

Do you have any insight into where I would pass the custom row fetches from
the table struct to the scan struct?

Yeah it's one simple way to tune it, if the user knows the
appropreate value.

Last year I was working on a patch to postgres_fdw where the fetch_size
could be set at the table level and the server level.

I was able to get the settings parsed and they would show up in
pg_foreign_table
and pg_foreign_servers. Unfortunately, I'm not very familiar with how
foreign data wrappers work, so I wasn't able to figure out how to get these
custom values passed from the PgFdwRelationInfo struct into the
query's PgFdwScanState
struct.

Directly answering, the states needed to be shared among several
stages are holded within fdw_private. Your new variable
fpinfo->fetch_size can be read in postgresGetForeignPlan. It
newly creates another fdw_private. You can pass your values to
ForeignPlan making it hold the value there. Finally, you will get
it in postgresBeginForeginScan and can set it into
PgFdwScanState.

I bring this up only because it might be a simpler solution, in that the
table designer could set the fetch size very high for narrow tables, and
lower or default for wider tables. It's also a very clean syntax, just
another option on the table and/or server creation.

My incomplete patch is attached.

However, the fetch_size is not needed by planner (so far), so we
can simply read the options in postgresBeginForeignScan() and set
into PgFdwScanState. This runs once per exection.

Finally, the attached patch will work as you intended.

What do you think about this?

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

Show quoted text

On Tue, Jan 27, 2015 at 4:24 AM, Kyotaro HORIGUCHI <
horiguchi.kyotaro@lab.ntt.co.jp> wrote:

Thank you for the comment.

The automatic way to determin the fetch_size looks become too
much for the purpose. An example of non-automatic way is a new
foreign table option like 'fetch_size' but this exposes the
inside too much... Which do you think is preferable?

Thu, 22 Jan 2015 11:17:52 -0500, Tom Lane <tgl@sss.pgh.pa.us> wrote in <
24503.1421943472@sss.pgh.pa.us>

Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> writes:

Hello, as the discuttion on async fetching on postgres_fdw, FETCH
with data-size limitation would be useful to get memory usage
stability of postgres_fdw.

Is such a feature and syntax could be allowed to be added?

This seems like a lot of work, and frankly an incredibly ugly API,
for a benefit that is entirely hypothetical. Have you got numbers
showing any actual performance win for postgres_fdw?

The API is a rush work to make the path for the new parameter
(but, yes, I did too much for the purpose that use from
postgres_fdw..) and it can be any saner syntax but it's not the
time to do so yet.

The data-size limitation, any size to limit, would give
significant gain especially for small sized rows.

This patch began from the fact that it runs about twice faster
when fetch size = 10000 than 100.

/messages/by-id/20150116.171849.109146500.horiguchi.kyotaro@lab.ntt.co.jp

I took exec times to get 1M rows from localhost via postgres_fdw
and it showed the following numbers.

=# SELECT a from ft1;
fetch_size, avg row size(*1), time, alloced_mem/fetch(Mbytes)(*1)
(local) 0.75s
100 60 6.2s 6000 (0.006)
10000 60 2.7s 600000 (0.6 )
33333 60 2.2s 1999980 (2.0 )
66666 60 2.4s 3999960 (4.0 )

=# SELECT a, b, c from ft1;
fetch_size, avg row size(*1), time, alloced_mem/fetch(Mbytes)(*1)
(local) 0.8s
100 204 12 s 20400 (0.02 )
1000 204 10 s 204000 (0.2 )
10000 204 5.8s 2040000 (2 )
20000 204 5.9s 4080000 (4 )

=# SELECT a, b, d from ft1;
fetch_size, avg row size(*1), time, alloced_mem/fetch(Mbytes)(*1)
(local) 0.8s
100 1356 17 s 135600 (0.136)
1000 1356 15 s 1356000 (1.356)
1475 1356 13 s 2000100 (2.0 )
2950 1356 13 s 4000200 (4.0 )

The definitions of the environment are the following.

CREATE SERVER sv1 FOREIGN DATA WRAPPER postgres_fdw OPTIONS (host
'localhost', dbname 'postgres');
CREATE USER MAPPING FOR PUBLIC SERVER sv1;
CREATE TABLE lt1 (a int, b timestamp, c text, d text);
CREATE FOREIGN TABLE ft1 (a int, b timestamp, c text, d text) SERVER sv1
OPTIONS (table_name 'lt1');
INSERT INTO lt1 (SELECT a, now(), repeat('x', 128), repeat('x', 1280) FROM
generate_series(0, 999999) a);

The "avg row size" is alloced_mem/fetch_size and the alloced_mem
is the sum of HeapTuple[fetch_size] and (HEAPTUPLESIZE +
tup->t_len) for all stored tuples in the receiver side,
fetch_more_data() in postgres_fdw.

They are about 50% gain for the smaller tuple size and 25% for
the larger. They looks to be optimal at where alloced_mem is
around 2MB by the reason unknown to me. Anyway the difference
seems to be significant.

Even if we wanted to do something like this, I strongly object to
measuring size by heap_compute_data_size. That's not a number that users
would normally have any direct knowledge of; nor does it have anything
at all to do with the claimed use-case, where what you'd really need to
measure is bytes transmitted down the wire. (The difference is not

small:

for instance, toasted values would likely still be toasted at the point
where you're measuring.)

Sure. Finally, the attached patch #1 which does the following
things.

- Sender limits the number of tuples using the sum of the net
length of the column values to be sent, not including protocol
overhead. It is calculated in the added function
slot_compute_attr_size(), using raw length for compressed
values.

- postgres_fdw calculates fetch limit bytes by the following
formula,

MAX_FETCH_MEM - MAX_FETCH_SIZE * (estimated overhead per tuple);

The result of the patch is as follows. MAX_FETCH_MEM = 2MiB and
MAX_FETCH_SIZE = 30000.

fetch_size, avg row size(*1), time, max alloced_mem/fetch(Mbytes)
(auto) 60 2.4s 1080000 ( 1.08)
(auto) 204 7.3s 536400 ( 0.54)
(auto) 1356 15 s 430236 ( 0.43)

This is meaningfully fast but the patch looks too big and the
meaning of the new parameter is hard to understand..:(

On the other hand the cause of the displacements of alloced_mem
shown above is per-tuple overhead, the sum of which is unknown
before execution. The second patch makes FETCH accept the tuple
overhead bytes. The result seems pretty good, but I think this
might be too spcialized to this usage.

MAX_FETCH_SIZE = 30000 and MAX_FETCH_MEM = 2MiB,
max_fetch_size, avg row size(*1), time, max
alloced_mem/fetch(MiBytes)
30000 60 2.3s 1080000 ( 1.0)
9932 204 5.7s 1787760 ( 1.7)
1376 1356 13 s 1847484 ( 1.8)

MAX_FETCH_SIZE = 25000 and MAX_FETCH_MEM = 1MiB,
max_fetch_size, avg row size(*1), time, max
alloced_mem/fetch(MiBytes)
25000 60 2.4s 900000 ( 0.86)
4358 204 6.6s 816840 ( 0.78)
634 1356 16 s 844488 ( 0.81)

MAX_FETCH_SIZE = 10000 and MAX_FETCH_MEM = 0.5MiB,
max_fetch_size, avg row size(*1), time, max
alloced_mem/fetch(MiBytes)
10000 60 2.8s 360000 ( 0.35)
2376 204 7.8s 427680 ( 0.41)
332 1356 17 s 442224 ( 0.42)

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachments:

0001-Make-fetch_size-settable-per-foreign-server-and-fore.patchtext/x-patch; charset=us-asciiDownload

>From 4a26e88bed0d6f6e7e51b39c6d0f9caf4dc3fbea Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi <horiguchi.kyotaro@lab.ntt.co.jp>
Date: Mon, 2 Feb 2015 16:05:22 +0900
Subject: [PATCH] Make fetch_size settable per foreign server and foreign table

postgres_fdw fetches tuples by the fixed fetch size, but the suitable
size varies widely by the configuration. This patch enables users to
set it as an foreign server option or an foreign table option
'fetch_size'.
---
 contrib/postgres_fdw/option.c       |  6 ++++++
 contrib/postgres_fdw/postgres_fdw.c | 37 ++++++++++++++++++++++++++++++++++++-
 2 files changed, 42 insertions(+), 1 deletion(-)

diff --git a/contrib/postgres_fdw/option.c b/contrib/postgres_fdw/option.c
index 7547ec2..2a3ab7d 100644
--- a/contrib/postgres_fdw/option.c
+++ b/contrib/postgres_fdw/option.c
@@ -153,6 +153,12 @@ InitPgFdwOptions(void)
 		/* updatable is available on both server and table */
 		{"updatable", ForeignServerRelationId, false},
 		{"updatable", ForeignTableRelationId, false},
+		/*
+		 * fetch_size is available on both server and table, the table setting
+		 * overrides the server setting.
+		 */
+		{"fetch_size", ForeignServerRelationId, false},
+		{"fetch_size", ForeignTableRelationId, false},
 		{NULL, InvalidOid, false}
 	};
 
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index d76e739..ac5e416 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -134,6 +134,7 @@ typedef struct PgFdwScanState
 	/* extracted fdw_private data */
 	char	   *query;			/* text of SELECT command */
 	List	   *retrieved_attrs;	/* list of retrieved attribute numbers */
+	int			fetch_size;		/* number of tuples per fetch */
 
 	/* for remote query execution */
 	PGconn	   *conn;			/* connection for the scan */
@@ -871,6 +872,22 @@ postgresGetForeignPlan(PlannerInfo *root,
 							fdw_private);
 }
 
+static DefElem*
+get_option(List *options, char *optname)
+{
+	ListCell *lc;
+
+	foreach(lc, options)
+	{
+		DefElem	*def = (DefElem *) lfirst(lc);
+
+		if (strcmp(def->defname, optname) == 0)
+			return def;
+	}
+	return NULL;
+}
+
+
 /*
  * postgresBeginForeignScan
  *		Initiate an executor scan of a foreign PostgreSQL table.
@@ -889,6 +906,7 @@ postgresBeginForeignScan(ForeignScanState *node, int eflags)
 	int			numParams;
 	int			i;
 	ListCell   *lc;
+	DefElem	   *def;
 
 	/*
 	 * Do nothing in EXPLAIN (no ANALYZE) case.  node->fdw_state stays NULL.
@@ -915,6 +933,23 @@ postgresBeginForeignScan(ForeignScanState *node, int eflags)
 	server = GetForeignServer(table->serverid);
 	user = GetUserMapping(userid, server->serverid);
 
+	/* Reading table options */
+	fsstate->fetch_size = -1;
+
+	def = get_option(table->options, "fetch_size");
+	if (!def)
+		def = get_option(server->options, "fetch_size");
+
+	if (def)
+	{
+		fsstate->fetch_size = strtod(defGetString(def), NULL);
+		if (fsstate->fetch_size < 0)
+			elog(ERROR, "invalid fetch size for foreign table \"%s\"",
+				 get_rel_name(table->relid));
+	}
+	else
+		fsstate->fetch_size = 100;
+
 	/*
 	 * Get connection to the foreign server.  Connection manager will
 	 * establish new connection if necessary.
@@ -2031,7 +2066,7 @@ fetch_more_data(ForeignScanState *node)
 		int			i;
 
 		/* The fetch size is arbitrary, but shouldn't be enormous. */
-		fetch_size = 100;
+		fetch_size = fsstate->fetch_size;
 
 		snprintf(sql, sizeof(sql), "FETCH %d FROM c%u",
 				 fetch_size, fsstate->cursor_number);
-- 
2.1.0.GIT

Corey Huinker

corey.huinker@gmail.com

almost 11 years ago

In reply to: Kyotaro HORIGUCHI (#5)

Re: [POC] FETCH limited by bytes.

I applied this patch to REL9_4_STABLE, and I was able to connect to a
foreign database (redshift, actually).

the basic outline of the test is below, names changed to protect my
employment.

create extension if not exists postgres_fdw;

create server redshift_server foreign data wrapper postgres_fdw
options ( host 'some.hostname.ext', port '5439', dbname 'redacted',
fetch_size '150' );

create user mapping for public server redshift_server options ( user
'redacted_user', password 'comeonyouarekiddingright' );

create foreign table redshift_tab150 ( <colspecs> )
server redshift_server options (table_name 'redacted_table', schema_name
'redacted_schema' );

create foreign table redshift_tab151 ( <colspecs> )
server redshift_server options (table_name 'redacted_table', schema_name
'redacted_schema', fetch_size '151' );

-- i don't expect the fdw to push the aggregate, this is just a test to see
what query shows up in stv_inflight.
select count(*) from redshift_ccp150;

-- i don't expect the fdw to push the aggregate, this is just a test to see
what query shows up in stv_inflight.
select count(*) from redshift_ccp151;

For those of you that aren't familiar with Redshift, it's a columnar
database that seems to be a fork of postgres 8.cough. You can connect to it
with modern libpq programs (psql, psycopg2, etc).
Redshift has a table, stv_inflight, which serves about the same purpose as
pg_stat_activity. Redshift seems to perform better with very high fetch
sizes (10,000 is a good start), so the default foreign data wrapper didn't
perform so well.

I was able to confirm that the first query showed "FETCH 150 FROM c1" as
the query, which is normal highly unhelpful, but in this case it confirms
that tables created in redshift_server do by default use the fetch_size
option given during server creation.

I was also able to confirm that the second query showed "FETCH 151 FROM c1"
as the query, which shows that per-table overrides also work.

I'd be happy to stop here, but Kyotaro might feel differently. With this
limited patch, one could estimate the number of rows that would fit into
the desired memory block based on the row width and set fetch_size
accordingly.

But we could go further, and have a fetch_max_memory option only at the
table level, and the fdw could do that same memory / estimated_row_size
calculation and derive fetch_size that way *at table creation time*, not
query time.

Thanks to Kyotaro and Bruce Momjian for their help.

On Mon, Feb 2, 2015 at 2:27 AM, Kyotaro HORIGUCHI <
horiguchi.kyotaro@lab.ntt.co.jp> wrote:

Show quoted text

Hmm, somehow I removed some recipients, especially the
list. Sorry for the duplicate.

-----
Sorry, I've been back. Thank you for the comment.

Do you have any insight into where I would pass the custom row fetches

from

the table struct to the scan struct?

Yeah it's one simple way to tune it, if the user knows the
appropreate value.

Last year I was working on a patch to postgres_fdw where the fetch_size
could be set at the table level and the server level.

I was able to get the settings parsed and they would show up in
pg_foreign_table
and pg_foreign_servers. Unfortunately, I'm not very familiar with how
foreign data wrappers work, so I wasn't able to figure out how to get

these

custom values passed from the PgFdwRelationInfo struct into the
query's PgFdwScanState
struct.

Directly answering, the states needed to be shared among several
stages are holded within fdw_private. Your new variable
fpinfo->fetch_size can be read in postgresGetForeignPlan. It
newly creates another fdw_private. You can pass your values to
ForeignPlan making it hold the value there. Finally, you will get
it in postgresBeginForeginScan and can set it into
PgFdwScanState.

I bring this up only because it might be a simpler solution, in that the
table designer could set the fetch size very high for narrow tables, and
lower or default for wider tables. It's also a very clean syntax, just
another option on the table and/or server creation.

My incomplete patch is attached.

However, the fetch_size is not needed by planner (so far), so we
can simply read the options in postgresBeginForeignScan() and set
into PgFdwScanState. This runs once per exection.

Finally, the attached patch will work as you intended.

What do you think about this?

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

On Tue, Jan 27, 2015 at 4:24 AM, Kyotaro HORIGUCHI <
horiguchi.kyotaro@lab.ntt.co.jp> wrote:

Thank you for the comment.

The automatic way to determin the fetch_size looks become too
much for the purpose. An example of non-automatic way is a new
foreign table option like 'fetch_size' but this exposes the
inside too much... Which do you think is preferable?

Thu, 22 Jan 2015 11:17:52 -0500, Tom Lane <tgl@sss.pgh.pa.us> wrote

in <

24503.1421943472@sss.pgh.pa.us>

Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> writes:

Hello, as the discuttion on async fetching on postgres_fdw, FETCH
with data-size limitation would be useful to get memory usage
stability of postgres_fdw.

Is such a feature and syntax could be allowed to be added?

This seems like a lot of work, and frankly an incredibly ugly API,
for a benefit that is entirely hypothetical. Have you got numbers
showing any actual performance win for postgres_fdw?

The API is a rush work to make the path for the new parameter
(but, yes, I did too much for the purpose that use from
postgres_fdw..) and it can be any saner syntax but it's not the
time to do so yet.

The data-size limitation, any size to limit, would give
significant gain especially for small sized rows.

This patch began from the fact that it runs about twice faster
when fetch size = 10000 than 100.

/messages/by-id/20150116.171849.109146500.horiguchi.kyotaro@lab.ntt.co.jp

I took exec times to get 1M rows from localhost via postgres_fdw
and it showed the following numbers.

=# SELECT a from ft1;
fetch_size, avg row size(*1), time, alloced_mem/fetch(Mbytes)(*1)
(local) 0.75s
100 60 6.2s 6000 (0.006)
10000 60 2.7s 600000 (0.6 )
33333 60 2.2s 1999980 (2.0 )
66666 60 2.4s 3999960 (4.0 )

=# SELECT a, b, c from ft1;
fetch_size, avg row size(*1), time, alloced_mem/fetch(Mbytes)(*1)
(local) 0.8s
100 204 12 s 20400 (0.02 )
1000 204 10 s 204000 (0.2 )
10000 204 5.8s 2040000 (2 )
20000 204 5.9s 4080000 (4 )

=# SELECT a, b, d from ft1;
fetch_size, avg row size(*1), time, alloced_mem/fetch(Mbytes)(*1)
(local) 0.8s
100 1356 17 s 135600 (0.136)
1000 1356 15 s 1356000 (1.356)
1475 1356 13 s 2000100 (2.0 )
2950 1356 13 s 4000200 (4.0 )

The definitions of the environment are the following.

CREATE SERVER sv1 FOREIGN DATA WRAPPER postgres_fdw OPTIONS (host
'localhost', dbname 'postgres');
CREATE USER MAPPING FOR PUBLIC SERVER sv1;
CREATE TABLE lt1 (a int, b timestamp, c text, d text);
CREATE FOREIGN TABLE ft1 (a int, b timestamp, c text, d text) SERVER

sv1

OPTIONS (table_name 'lt1');
INSERT INTO lt1 (SELECT a, now(), repeat('x', 128), repeat('x', 1280)

FROM

generate_series(0, 999999) a);

The "avg row size" is alloced_mem/fetch_size and the alloced_mem
is the sum of HeapTuple[fetch_size] and (HEAPTUPLESIZE +
tup->t_len) for all stored tuples in the receiver side,
fetch_more_data() in postgres_fdw.

They are about 50% gain for the smaller tuple size and 25% for
the larger. They looks to be optimal at where alloced_mem is
around 2MB by the reason unknown to me. Anyway the difference
seems to be significant.

Even if we wanted to do something like this, I strongly object to
measuring size by heap_compute_data_size. That's not a number that

users

would normally have any direct knowledge of; nor does it have

anything

at all to do with the claimed use-case, where what you'd really need

to

measure is bytes transmitted down the wire. (The difference is not

small:

for instance, toasted values would likely still be toasted at the

point

where you're measuring.)

Sure. Finally, the attached patch #1 which does the following
things.

- Sender limits the number of tuples using the sum of the net
length of the column values to be sent, not including protocol
overhead. It is calculated in the added function
slot_compute_attr_size(), using raw length for compressed
values.

- postgres_fdw calculates fetch limit bytes by the following
formula,

MAX_FETCH_MEM - MAX_FETCH_SIZE * (estimated overhead per tuple);

The result of the patch is as follows. MAX_FETCH_MEM = 2MiB and
MAX_FETCH_SIZE = 30000.

fetch_size, avg row size(*1), time, max alloced_mem/fetch(Mbytes)
(auto) 60 2.4s 1080000 ( 1.08)
(auto) 204 7.3s 536400 ( 0.54)
(auto) 1356 15 s 430236 ( 0.43)

This is meaningfully fast but the patch looks too big and the
meaning of the new parameter is hard to understand..:(

On the other hand the cause of the displacements of alloced_mem
shown above is per-tuple overhead, the sum of which is unknown
before execution. The second patch makes FETCH accept the tuple
overhead bytes. The result seems pretty good, but I think this
might be too spcialized to this usage.

MAX_FETCH_SIZE = 30000 and MAX_FETCH_MEM = 2MiB,
max_fetch_size, avg row size(*1), time, max
alloced_mem/fetch(MiBytes)
30000 60 2.3s 1080000 ( 1.0)
9932 204 5.7s 1787760 ( 1.7)
1376 1356 13 s 1847484 ( 1.8)

MAX_FETCH_SIZE = 25000 and MAX_FETCH_MEM = 1MiB,
max_fetch_size, avg row size(*1), time, max
alloced_mem/fetch(MiBytes)
25000 60 2.4s 900000 ( 0.86)
4358 204 6.6s 816840 ( 0.78)
634 1356 16 s 844488 ( 0.81)

MAX_FETCH_SIZE = 10000 and MAX_FETCH_MEM = 0.5MiB,
max_fetch_size, avg row size(*1), time, max
alloced_mem/fetch(MiBytes)
10000 60 2.8s 360000 ( 0.35)
2376 204 7.8s 427680 ( 0.41)
332 1356 17 s 442224 ( 0.42)

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Kyotaro HORIGUCHI

horiguchi.kyotaro@lab.ntt.co.jp

almost 11 years ago

In reply to: Corey Huinker (#6)

Re: [POC] FETCH limited by bytes.

Hello,

Redshift has a table, stv_inflight, which serves about the same purpose as
pg_stat_activity. Redshift seems to perform better with very high fetch
sizes (10,000 is a good start), so the default foreign data wrapper didn't
perform so well.

I agree with you.

I was able to confirm that the first query showed "FETCH 150 FROM c1" as
the query, which is normal highly unhelpful, but in this case it confirms
that tables created in redshift_server do by default use the fetch_size
option given during server creation.

I was also able to confirm that the second query showed "FETCH 151 FROM c1"
as the query, which shows that per-table overrides also work.

I'd be happy to stop here, but Kyotaro might feel differently.

This is enough in its own way, of course.

With this
limited patch, one could estimate the number of rows that would fit into
the desired memory block based on the row width and set fetch_size
accordingly.

The users including me will be happy with it when the users know
how to determin the fetch size. Especially the remote tables are
very few or the configuration will be enough stable.

On widely distributed systems, it would be far difficult to tune
fetch size manually for every foreign tables, so finally it would
be left at the default and safe size, it's 100 or so.

This is the similar discussion about max_wal_size on another
thread. Calculating fetch size is far tougher for users than
setting expected memory usage, I think.

But we could go further, and have a fetch_max_memory option only at the
table level, and the fdw could do that same memory / estimated_row_size
calculation and derive fetch_size that way *at table creation time*, not
query time.

We cannot know the real length of the text type data in advance,
other than that, even defined as char(n), the n is the
theoretically(or in-design) maximum size for the field but in the
most cases the mean length of the real data would be far small
than that. For that reason, calculating the ratio at the table
creation time seems to be difficult.

However, I agree to the Tom's suggestion that the changes in
FETCH statement is defenitly ugly, especially the "overhead"
argument is prohibitive even for me:(

Thanks to Kyotaro and Bruce Momjian for their help.

Not at all.

regardes,

At Wed, 4 Feb 2015 18:06:02 -0500, Corey Huinker <corey.huinker@gmail.com> wrote in <CADkLM=eTpKYX5VOfjLr0VvfXhEZbC2UeakN=P6MXMg7S86Cdqw@mail.gmail.com>

I applied this patch to REL9_4_STABLE, and I was able to connect to a
foreign database (redshift, actually).

the basic outline of the test is below, names changed to protect my
employment.

create extension if not exists postgres_fdw;

create server redshift_server foreign data wrapper postgres_fdw
options ( host 'some.hostname.ext', port '5439', dbname 'redacted',
fetch_size '150' );

create user mapping for public server redshift_server options ( user
'redacted_user', password 'comeonyouarekiddingright' );

create foreign table redshift_tab150 ( <colspecs> )
server redshift_server options (table_name 'redacted_table', schema_name
'redacted_schema' );

create foreign table redshift_tab151 ( <colspecs> )
server redshift_server options (table_name 'redacted_table', schema_name
'redacted_schema', fetch_size '151' );

-- i don't expect the fdw to push the aggregate, this is just a test to see
what query shows up in stv_inflight.
select count(*) from redshift_ccp150;

-- i don't expect the fdw to push the aggregate, this is just a test to see
what query shows up in stv_inflight.
select count(*) from redshift_ccp151;

For those of you that aren't familiar with Redshift, it's a columnar
database that seems to be a fork of postgres 8.cough. You can connect to it
with modern libpq programs (psql, psycopg2, etc).
Redshift has a table, stv_inflight, which serves about the same purpose as
pg_stat_activity. Redshift seems to perform better with very high fetch
sizes (10,000 is a good start), so the default foreign data wrapper didn't
perform so well.

I was able to confirm that the first query showed "FETCH 150 FROM c1" as
the query, which is normal highly unhelpful, but in this case it confirms
that tables created in redshift_server do by default use the fetch_size
option given during server creation.

I was also able to confirm that the second query showed "FETCH 151 FROM c1"
as the query, which shows that per-table overrides also work.

I'd be happy to stop here, but Kyotaro might feel differently. With this
limited patch, one could estimate the number of rows that would fit into
the desired memory block based on the row width and set fetch_size
accordingly.

But we could go further, and have a fetch_max_memory option only at the
table level, and the fdw could do that same memory / estimated_row_size
calculation and derive fetch_size that way *at table creation time*, not
query time.

Thanks to Kyotaro and Bruce Momjian for their help.

On Mon, Feb 2, 2015 at 2:27 AM, Kyotaro HORIGUCHI <
horiguchi.kyotaro@lab.ntt.co.jp> wrote:

Hmm, somehow I removed some recipients, especially the
list. Sorry for the duplicate.

-----
Sorry, I've been back. Thank you for the comment.

Do you have any insight into where I would pass the custom row fetches

from

the table struct to the scan struct?

Yeah it's one simple way to tune it, if the user knows the
appropreate value.

Last year I was working on a patch to postgres_fdw where the fetch_size
could be set at the table level and the server level.

I was able to get the settings parsed and they would show up in
pg_foreign_table
and pg_foreign_servers. Unfortunately, I'm not very familiar with how
foreign data wrappers work, so I wasn't able to figure out how to get

these

custom values passed from the PgFdwRelationInfo struct into the
query's PgFdwScanState
struct.

Directly answering, the states needed to be shared among several
stages are holded within fdw_private. Your new variable
fpinfo->fetch_size can be read in postgresGetForeignPlan. It
newly creates another fdw_private. You can pass your values to
ForeignPlan making it hold the value there. Finally, you will get
it in postgresBeginForeginScan and can set it into
PgFdwScanState.

I bring this up only because it might be a simpler solution, in that the
table designer could set the fetch size very high for narrow tables, and
lower or default for wider tables. It's also a very clean syntax, just
another option on the table and/or server creation.

My incomplete patch is attached.

However, the fetch_size is not needed by planner (so far), so we
can simply read the options in postgresBeginForeignScan() and set
into PgFdwScanState. This runs once per exection.

Finally, the attached patch will work as you intended.

What do you think about this?

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

--
Kyotaro Horiguchi
NTT Open Source Software Center

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Corey Huinker

corey.huinker@gmail.com

almost 11 years ago

In reply to: Kyotaro HORIGUCHI (#7)

1 attachment(s)

Re: [POC] FETCH limited by bytes.

Attached is a diff containing the original (working) patch from my
(incomplete) patch, plus regression test changes and documentation changes.

While it's easy to regression-test the persistence of the fetch_size
options, I am confounded as to how we would show that the fetch_size
setting was respected. I've seen it with my own eyes viewing the query log
on redshift, but I see no way to automate that. Suggestions welcome.

On Fri, Feb 6, 2015 at 3:11 AM, Kyotaro HORIGUCHI <
horiguchi.kyotaro@lab.ntt.co.jp> wrote:

Show quoted text

Hello,

Redshift has a table, stv_inflight, which serves about the same purpose

as

pg_stat_activity. Redshift seems to perform better with very high fetch
sizes (10,000 is a good start), so the default foreign data wrapper

didn't

perform so well.

I agree with you.

I was able to confirm that the first query showed "FETCH 150 FROM c1" as
the query, which is normal highly unhelpful, but in this case it confirms
that tables created in redshift_server do by default use the fetch_size
option given during server creation.

I was also able to confirm that the second query showed "FETCH 151 FROM

c1"

as the query, which shows that per-table overrides also work.

I'd be happy to stop here, but Kyotaro might feel differently.

This is enough in its own way, of course.

With this
limited patch, one could estimate the number of rows that would fit into
the desired memory block based on the row width and set fetch_size
accordingly.

The users including me will be happy with it when the users know
how to determin the fetch size. Especially the remote tables are
very few or the configuration will be enough stable.

On widely distributed systems, it would be far difficult to tune
fetch size manually for every foreign tables, so finally it would
be left at the default and safe size, it's 100 or so.

This is the similar discussion about max_wal_size on another
thread. Calculating fetch size is far tougher for users than
setting expected memory usage, I think.

But we could go further, and have a fetch_max_memory option only at the
table level, and the fdw could do that same memory / estimated_row_size
calculation and derive fetch_size that way *at table creation time*, not
query time.

We cannot know the real length of the text type data in advance,
other than that, even defined as char(n), the n is the
theoretically(or in-design) maximum size for the field but in the
most cases the mean length of the real data would be far small
than that. For that reason, calculating the ratio at the table
creation time seems to be difficult.

However, I agree to the Tom's suggestion that the changes in
FETCH statement is defenitly ugly, especially the "overhead"
argument is prohibitive even for me:(

Thanks to Kyotaro and Bruce Momjian for their help.

Not at all.

regardes,

At Wed, 4 Feb 2015 18:06:02 -0500, Corey Huinker <corey.huinker@gmail.com>
wrote in <CADkLM=eTpKYX5VOfjLr0VvfXhEZbC2UeakN=
P6MXMg7S86Cdqw@mail.gmail.com>

I applied this patch to REL9_4_STABLE, and I was able to connect to a
foreign database (redshift, actually).

the basic outline of the test is below, names changed to protect my
employment.

create extension if not exists postgres_fdw;

create server redshift_server foreign data wrapper postgres_fdw
options ( host 'some.hostname.ext', port '5439', dbname 'redacted',
fetch_size '150' );

create user mapping for public server redshift_server options ( user
'redacted_user', password 'comeonyouarekiddingright' );

create foreign table redshift_tab150 ( <colspecs> )
server redshift_server options (table_name 'redacted_table', schema_name
'redacted_schema' );

create foreign table redshift_tab151 ( <colspecs> )
server redshift_server options (table_name 'redacted_table', schema_name
'redacted_schema', fetch_size '151' );

-- i don't expect the fdw to push the aggregate, this is just a test to

see

what query shows up in stv_inflight.
select count(*) from redshift_ccp150;

-- i don't expect the fdw to push the aggregate, this is just a test to

see

what query shows up in stv_inflight.
select count(*) from redshift_ccp151;

For those of you that aren't familiar with Redshift, it's a columnar
database that seems to be a fork of postgres 8.cough. You can connect to

it

with modern libpq programs (psql, psycopg2, etc).
Redshift has a table, stv_inflight, which serves about the same purpose

as

pg_stat_activity. Redshift seems to perform better with very high fetch
sizes (10,000 is a good start), so the default foreign data wrapper

didn't

perform so well.

I was able to confirm that the first query showed "FETCH 150 FROM c1" as
the query, which is normal highly unhelpful, but in this case it confirms
that tables created in redshift_server do by default use the fetch_size
option given during server creation.

I was also able to confirm that the second query showed "FETCH 151 FROM

c1"

as the query, which shows that per-table overrides also work.

I'd be happy to stop here, but Kyotaro might feel differently. With this
limited patch, one could estimate the number of rows that would fit into
the desired memory block based on the row width and set fetch_size
accordingly.

But we could go further, and have a fetch_max_memory option only at the
table level, and the fdw could do that same memory / estimated_row_size
calculation and derive fetch_size that way *at table creation time*, not
query time.

Thanks to Kyotaro and Bruce Momjian for their help.

On Mon, Feb 2, 2015 at 2:27 AM, Kyotaro HORIGUCHI <
horiguchi.kyotaro@lab.ntt.co.jp> wrote:

Hmm, somehow I removed some recipients, especially the
list. Sorry for the duplicate.

-----
Sorry, I've been back. Thank you for the comment.

Do you have any insight into where I would pass the custom row

fetches

from

the table struct to the scan struct?

Yeah it's one simple way to tune it, if the user knows the
appropreate value.

Last year I was working on a patch to postgres_fdw where the

fetch_size

could be set at the table level and the server level.

I was able to get the settings parsed and they would show up in
pg_foreign_table
and pg_foreign_servers. Unfortunately, I'm not very familiar with how
foreign data wrappers work, so I wasn't able to figure out how to get

these

custom values passed from the PgFdwRelationInfo struct into the
query's PgFdwScanState
struct.

Directly answering, the states needed to be shared among several
stages are holded within fdw_private. Your new variable
fpinfo->fetch_size can be read in postgresGetForeignPlan. It
newly creates another fdw_private. You can pass your values to
ForeignPlan making it hold the value there. Finally, you will get
it in postgresBeginForeginScan and can set it into
PgFdwScanState.

I bring this up only because it might be a simpler solution, in that

the

table designer could set the fetch size very high for narrow tables,

and

lower or default for wider tables. It's also a very clean syntax,

just

another option on the table and/or server creation.

My incomplete patch is attached.

However, the fetch_size is not needed by planner (so far), so we
can simply read the options in postgresBeginForeignScan() and set
into PgFdwScanState. This runs once per exection.

Finally, the attached patch will work as you intended.

What do you think about this?

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachments:

postgres_fdw_fetch_size_20150227.difftext/plain; charset=US-ASCII; name=postgres_fdw_fetch_size_20150227.diffDownload

diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 583cce7..84334e6 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -106,7 +106,8 @@ ALTER SERVER testserver1 OPTIONS (
 	sslcert 'value',
 	sslkey 'value',
 	sslrootcert 'value',
-	sslcrl 'value'
+	sslcrl 'value',
+	fetch_size '101'
 	--requirepeer 'value',
 	-- krbsrvname 'value',
 	-- gsslib 'value',
@@ -114,18 +115,34 @@ ALTER SERVER testserver1 OPTIONS (
 );
 ALTER USER MAPPING FOR public SERVER testserver1
 	OPTIONS (DROP user, DROP password);
-ALTER FOREIGN TABLE ft1 OPTIONS (schema_name 'S 1', table_name 'T 1');
+ALTER FOREIGN TABLE ft1 OPTIONS (schema_name 'S 1', table_name 'T 1', fetch_size '102');
 ALTER FOREIGN TABLE ft2 OPTIONS (schema_name 'S 1', table_name 'T 1');
 ALTER FOREIGN TABLE ft1 ALTER COLUMN c1 OPTIONS (column_name 'C 1');
 ALTER FOREIGN TABLE ft2 ALTER COLUMN c1 OPTIONS (column_name 'C 1');
 \det+
-                             List of foreign tables
- Schema | Table |  Server  |              FDW Options              | Description 
---------+-------+----------+---------------------------------------+-------------
- public | ft1   | loopback | (schema_name 'S 1', table_name 'T 1') | 
- public | ft2   | loopback | (schema_name 'S 1', table_name 'T 1') | 
+                                      List of foreign tables
+ Schema | Table |  Server  |                       FDW Options                       | Description 
+--------+-------+----------+---------------------------------------------------------+-------------
+ public | ft1   | loopback | (schema_name 'S 1', table_name 'T 1', fetch_size '102') | 
+ public | ft2   | loopback | (schema_name 'S 1', table_name 'T 1')                   | 
 (2 rows)
 
+-- Test what options made it into pg_foreign_server.
+-- Filter for just the server we created.
+SELECT srvoptions FROM pg_foreign_server WHERE srvname = 'testserver1';
+                                                                                                                                                                                     srvoptions                                                                                                                                                                                     
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+ {use_remote_estimate=false,updatable=true,fdw_startup_cost=123.456,fdw_tuple_cost=0.123,service=value,connect_timeout=value,dbname=value,host=value,hostaddr=value,port=value,application_name=value,keepalives=value,keepalives_idle=value,keepalives_interval=value,sslcompression=value,sslmode=value,sslcert=value,sslkey=value,sslrootcert=value,sslcrl=value,fetch_size=101}
+(1 row)
+
+-- Test what options made it into pg_foreign_table.
+-- Filter this heavily because we cannot specify which foreign server.
+SELECT ftoptions FROM pg_foreign_table WHERE ftoptions @> array['table_name=T 1','fetch_size=102'];
+                      ftoptions                      
+-----------------------------------------------------
+ {"schema_name=S 1","table_name=T 1",fetch_size=102}
+(1 row)
+
 -- Now we should be able to run ANALYZE.
 -- To exercise multiple code paths, we use local stats on ft1
 -- and remote-estimate mode on ft2.
diff --git a/contrib/postgres_fdw/option.c b/contrib/postgres_fdw/option.c
index 7547ec2..2a3ab7d 100644
--- a/contrib/postgres_fdw/option.c
+++ b/contrib/postgres_fdw/option.c
@@ -153,6 +153,12 @@ InitPgFdwOptions(void)
 		/* updatable is available on both server and table */
 		{"updatable", ForeignServerRelationId, false},
 		{"updatable", ForeignTableRelationId, false},
+		/*
+		 * fetch_size is available on both server and table, the table setting
+		 * overrides the server setting.
+		 */
+		{"fetch_size", ForeignServerRelationId, false},
+		{"fetch_size", ForeignTableRelationId, false},
 		{NULL, InvalidOid, false}
 	};
 
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 63f0577..733ffb6 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -134,6 +134,7 @@ typedef struct PgFdwScanState
 	/* extracted fdw_private data */
 	char	   *query;			/* text of SELECT command */
 	List	   *retrieved_attrs;	/* list of retrieved attribute numbers */
+	int			fetch_size;		/* number of tuples per fetch */
 
 	/* for remote query execution */
 	PGconn	   *conn;			/* connection for the scan */
@@ -871,6 +872,22 @@ postgresGetForeignPlan(PlannerInfo *root,
 							fdw_private);
 }
 
+static DefElem*
+get_option(List *options, char *optname)
+{
+	ListCell *lc;
+
+	foreach(lc, options)
+	{
+		DefElem	*def = (DefElem *) lfirst(lc);
+
+		if (strcmp(def->defname, optname) == 0)
+			return def;
+	}
+	return NULL;
+}
+
+
 /*
  * postgresBeginForeignScan
  *		Initiate an executor scan of a foreign PostgreSQL table.
@@ -889,6 +906,7 @@ postgresBeginForeignScan(ForeignScanState *node, int eflags)
 	int			numParams;
 	int			i;
 	ListCell   *lc;
+	DefElem	   *def;
 
 	/*
 	 * Do nothing in EXPLAIN (no ANALYZE) case.  node->fdw_state stays NULL.
@@ -915,6 +933,23 @@ postgresBeginForeignScan(ForeignScanState *node, int eflags)
 	server = GetForeignServer(table->serverid);
 	user = GetUserMapping(userid, server->serverid);
 
+	/* Reading table options */
+	fsstate->fetch_size = -1;
+
+	def = get_option(table->options, "fetch_size");
+	if (!def)
+		def = get_option(server->options, "fetch_size");
+
+	if (def)
+	{
+		fsstate->fetch_size = strtod(defGetString(def), NULL);
+		if (fsstate->fetch_size < 0)
+			elog(ERROR, "invalid fetch size for foreign table \"%s\"",
+				 get_rel_name(table->relid));
+	}
+	else
+		fsstate->fetch_size = 100;
+
 	/*
 	 * Get connection to the foreign server.  Connection manager will
 	 * establish new connection if necessary.
@@ -2031,7 +2066,7 @@ fetch_more_data(ForeignScanState *node)
 		int			i;
 
 		/* The fetch size is arbitrary, but shouldn't be enormous. */
-		fetch_size = 100;
+		fetch_size = fsstate->fetch_size;
 
 		snprintf(sql, sizeof(sql), "FETCH %d FROM c%u",
 				 fetch_size, fsstate->cursor_number);
diff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql b/contrib/postgres_fdw/sql/postgres_fdw.sql
index 83e8fa7..d12925a 100644
--- a/contrib/postgres_fdw/sql/postgres_fdw.sql
+++ b/contrib/postgres_fdw/sql/postgres_fdw.sql
@@ -115,7 +115,8 @@ ALTER SERVER testserver1 OPTIONS (
 	sslcert 'value',
 	sslkey 'value',
 	sslrootcert 'value',
-	sslcrl 'value'
+	sslcrl 'value',
+	fetch_size '101'
 	--requirepeer 'value',
 	-- krbsrvname 'value',
 	-- gsslib 'value',
@@ -123,12 +124,20 @@ ALTER SERVER testserver1 OPTIONS (
 );
 ALTER USER MAPPING FOR public SERVER testserver1
 	OPTIONS (DROP user, DROP password);
-ALTER FOREIGN TABLE ft1 OPTIONS (schema_name 'S 1', table_name 'T 1');
+ALTER FOREIGN TABLE ft1 OPTIONS (schema_name 'S 1', table_name 'T 1', fetch_size '102');
 ALTER FOREIGN TABLE ft2 OPTIONS (schema_name 'S 1', table_name 'T 1');
 ALTER FOREIGN TABLE ft1 ALTER COLUMN c1 OPTIONS (column_name 'C 1');
 ALTER FOREIGN TABLE ft2 ALTER COLUMN c1 OPTIONS (column_name 'C 1');
 \det+
 
+-- Test what options made it into pg_foreign_server.
+-- Filter for just the server we created.
+SELECT srvoptions FROM pg_foreign_server WHERE srvname = 'testserver1';
+
+-- Test what options made it into pg_foreign_table.
+-- Filter this heavily because we cannot specify which foreign server.
+SELECT ftoptions FROM pg_foreign_table WHERE ftoptions @> array['table_name=T 1','fetch_size=102'];
+
 -- Now we should be able to run ANALYZE.
 -- To exercise multiple code paths, we use local stats on ft1
 -- and remote-estimate mode on ft2.
diff --git a/doc/src/sgml/postgres-fdw.sgml b/doc/src/sgml/postgres-fdw.sgml
index 43adb61..02b004d 100644
--- a/doc/src/sgml/postgres-fdw.sgml
+++ b/doc/src/sgml/postgres-fdw.sgml
@@ -294,6 +294,34 @@
   </sect3>
 
   <sect3>
+   <title>Fetch Size Options</title>
+
+   <para>
+    By default, rows are fetched from the remote server 100 at a time.
+    This may be overridden using the following option:
+   </para>
+
+   <variablelist>
+
+    <varlistentry>
+     <term><literal>fetch_size</literal></term>
+     <listitem>
+      <para>
+       This option specifies the number of rows <filename>postgres_fdw</> 
+       should get in each fetch operation. It can be specified for a foreign
+       table or a foreign server. The option specified on a table overrides
+       an option specified for the server.
+       The default is <literal>100</>.
+      </para>
+
+     </listitem>
+    </varlistentry>
+
+   </variablelist>
+  </sect3>
+
+
+  <sect3>
    <title>Importing Options</title>
 
    <para>

Andres Freund

andres@anarazel.de

over 10 years ago

In reply to: Corey Huinker (#8)

Re: [POC] FETCH limited by bytes.

On 2015-02-27 13:50:22 -0500, Corey Huinker wrote:

+static DefElem*
+get_option(List *options, char *optname)
+{
+	ListCell *lc;
+
+	foreach(lc, options)
+	{
+		DefElem	*def = (DefElem *) lfirst(lc);
+
+		if (strcmp(def->defname, optname) == 0)
+			return def;
+	}
+	return NULL;
+}

/*
* Do nothing in EXPLAIN (no ANALYZE) case. node->fdw_state stays NULL.
@@ -915,6 +933,23 @@ postgresBeginForeignScan(ForeignScanState *node, int eflags)
server = GetForeignServer(table->serverid);
user = GetUserMapping(userid, server->serverid);
+	/* Reading table options */
+	fsstate->fetch_size = -1;
+
+	def = get_option(table->options, "fetch_size");
+	if (!def)
+		def = get_option(server->options, "fetch_size");
+
+	if (def)
+	{
+		fsstate->fetch_size = strtod(defGetString(def), NULL);
+		if (fsstate->fetch_size < 0)
+			elog(ERROR, "invalid fetch size for foreign table \"%s\"",
+				 get_rel_name(table->relid));
+	}
+	else
+		fsstate->fetch_size = 100;

I don't think it's a good idea to make such checks at runtime - and
either way it's somethign that should be reported back using an
ereport(), not an elog.

Also, it seems somewhat wrong to determine this at execution
time. Shouldn't this rather be done when creating the foreign scan node?
And be a part of the scan state?

Have you thought about how this option should cooperate with join
pushdown once implemented?

Greetings,

Andres Freund

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#10

Corey Huinker

corey.huinker@gmail.com

over 10 years ago

In reply to: Andres Freund (#9)

Re: [POC] FETCH limited by bytes.

On Wed, Sep 2, 2015 at 9:08 AM, Andres Freund <andres@anarazel.de> wrote:

On 2015-02-27 13:50:22 -0500, Corey Huinker wrote:
+static DefElem*
+get_option(List *options, char *optname)
+{
+     ListCell *lc;
+
+     foreach(lc, options)
+     {
+             DefElem *def = (DefElem *) lfirst(lc);
+
+             if (strcmp(def->defname, optname) == 0)
+                     return def;
+     }
+     return NULL;
+}
/*
* Do nothing in EXPLAIN (no ANALYZE) case. node->fdw_state stays

NULL.

@@ -915,6 +933,23 @@ postgresBeginForeignScan(ForeignScanState *node,

int eflags)
server = GetForeignServer(table->serverid);
user = GetUserMapping(userid, server->serverid);
+     /* Reading table options */
+     fsstate->fetch_size = -1;
+
+     def = get_option(table->options, "fetch_size");
+     if (!def)
+             def = get_option(server->options, "fetch_size");
+
+     if (def)
+     {
+             fsstate->fetch_size = strtod(defGetString(def), NULL);
+             if (fsstate->fetch_size < 0)
+                     elog(ERROR, "invalid fetch size for foreign table
\"%s\"",
+                              get_rel_name(table->relid));
+     }
+     else
+             fsstate->fetch_size = 100;
I don't think it's a good idea to make such checks at runtime - and
either way it's somethign that should be reported back using an
ereport(), not an elog.

Also, it seems somewhat wrong to determine this at execution
time. Shouldn't this rather be done when creating the foreign scan node?
And be a part of the scan state?

I agree, that was my original plan, but I wasn't familiar enough with the
FDW architecture to know where the table struct and the scan struct were
both exposed in the same function.

What I submitted incorporated some of Kyotaro's feedback (and working
patch) to my original incomplete patch.

Have you thought about how this option should cooperate with join
pushdown once implemented?

I hadn't until now, but I think the only sensible thing would be to
disregard table-specific settings once a second foreign table is detected,
and instead consider only the server-level setting.

I suppose one could argue that if ALL the tables in the join had the same
table-level setting, we should go with that, but I think that would be
complicated, expensive, and generally a good argument for changing the
server setting instead.

#11

Kyotaro HORIGUCHI

horiguchi.kyotaro@lab.ntt.co.jp

over 10 years ago

In reply to: Corey Huinker (#10)

Re: [POC] FETCH limited by bytes.

Hello,

@@ -915,6 +933,23 @@ postgresBeginForeignScan(ForeignScanState *node,

int eflags)

...

+ def = get_option(table->options, "fetch_size");

I don't think it's a good idea to make such checks at runtime - and
either way it's somethign that should be reported back using an
ereport(), not an elog.

Also, it seems somewhat wrong to determine this at execution
time. Shouldn't this rather be done when creating the foreign scan node?
And be a part of the scan state?

I agree, that was my original plan, but I wasn't familiar enough with the
FDW architecture to know where the table struct and the scan struct were
both exposed in the same function.

What I submitted incorporated some of Kyotaro's feedback (and working
patch) to my original incomplete patch.

Sorry, it certainly shouldn't be a good place to do such thing. I
easily selected the place in order to avoid adding new similar
member in multiple existing structs (PgFdwRelationInfo and
PgFdwScanState).

Having a new member fetch_size is added in PgFdwRelationInfo and
PgFdwScanState, I think postgresGetForeignRelSize is the best
place to do that, from the point that it collects basic
information needed to calculate scan costs.

# Fetch sizes of foreign join would be the future issue..

typedef struct PgFdwRelationInfo
{

...
+ int fetch_size; /* fetch size for this remote table */

====================

postgreGetForeignRelSize()
{

...

fpinfo->table = GetForeignTable(foreigntableid);
fpinfo->server = GetForeignServer(fpinfo->table->serverid);

+     def = get_option(table->options, "fetch_size");
+     ..
+     fpinfo->fetch_size = strtod(defGetString...

Also it is doable in postgresGetForeignPlan and doing there
removes redundant copy of fetch_size from PgFdwRelation to
PgFdwScanState but theoretical basis would be weak.

regards,

On 2015-02-27 13:50:22 -0500, Corey Huinker wrote:
+static DefElem*
+get_option(List *options, char *optname)
+{
+     ListCell *lc;
+
+     foreach(lc, options)
+     {
+             DefElem *def = (DefElem *) lfirst(lc);
+
+             if (strcmp(def->defname, optname) == 0)
+                     return def;
+     }
+     return NULL;
+}
/*
* Do nothing in EXPLAIN (no ANALYZE) case. node->fdw_state stays

NULL.

@@ -915,6 +933,23 @@ postgresBeginForeignScan(ForeignScanState *node,

int eflags)
server = GetForeignServer(table->serverid);
user = GetUserMapping(userid, server->serverid);
+     /* Reading table options */
+     fsstate->fetch_size = -1;
+
+     def = get_option(table->options, "fetch_size");
+     if (!def)
+             def = get_option(server->options, "fetch_size");
+
+     if (def)
+     {
+             fsstate->fetch_size = strtod(defGetString(def), NULL);
+             if (fsstate->fetch_size < 0)
+                     elog(ERROR, "invalid fetch size for foreign table
\"%s\"",
+                              get_rel_name(table->relid));
+     }
+     else
+             fsstate->fetch_size = 100;
I don't think it's a good idea to make such checks at runtime - and
either way it's somethign that should be reported back using an
ereport(), not an elog.
Also, it seems somewhat wrong to determine this at execution
time. Shouldn't this rather be done when creating the foreign scan node?
And be a part of the scan state?

I agree, that was my original plan, but I wasn't familiar enough with the
FDW architecture to know where the table struct and the scan struct were
both exposed in the same function.

What I submitted incorporated some of Kyotaro's feedback (and working
patch) to my original incomplete patch.

Have you thought about how this option should cooperate with join
pushdown once implemented?

I hadn't until now, but I think the only sensible thing would be to
disregard table-specific settings once a second foreign table is detected,
and instead consider only the server-level setting.

I suppose one could argue that if ALL the tables in the join had the same
table-level setting, we should go with that, but I think that would be
complicated, expensive, and generally a good argument for changing the
server setting instead.

--
Kyotaro Horiguchi
NTT Open Source Software Center

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#12

Corey Huinker

corey.huinker@gmail.com

over 10 years ago

In reply to: Kyotaro HORIGUCHI (#11)

1 attachment(s)

Re: [POC] FETCH limited by bytes.

On Fri, Sep 4, 2015 at 2:45 AM, Kyotaro HORIGUCHI <
horiguchi.kyotaro@lab.ntt.co.jp> wrote:

Hello,

@@ -915,6 +933,23 @@ postgresBeginForeignScan(ForeignScanState *node,

int eflags)

...

+ def = get_option(table->options, "fetch_size");

I don't think it's a good idea to make such checks at runtime - and
either way it's somethign that should be reported back using an
ereport(), not an elog.

Also, it seems somewhat wrong to determine this at execution
time. Shouldn't this rather be done when creating the foreign scan

node?

And be a part of the scan state?

I agree, that was my original plan, but I wasn't familiar enough with the
FDW architecture to know where the table struct and the scan struct were
both exposed in the same function.

What I submitted incorporated some of Kyotaro's feedback (and working
patch) to my original incomplete patch.

Sorry, it certainly shouldn't be a good place to do such thing. I
easily selected the place in order to avoid adding new similar
member in multiple existing structs (PgFdwRelationInfo and
PgFdwScanState).

Having a new member fetch_size is added in PgFdwRelationInfo and
PgFdwScanState, I think postgresGetForeignRelSize is the best
place to do that, from the point that it collects basic
information needed to calculate scan costs.

# Fetch sizes of foreign join would be the future issue..

typedef struct PgFdwRelationInfo
{

...
+ int fetch_size; /* fetch size for this remote table */

====================

postgreGetForeignRelSize()
{

...

fpinfo->table = GetForeignTable(foreigntableid);
fpinfo->server = GetForeignServer(fpinfo->table->serverid);
+     def = get_option(table->options, "fetch_size");
+     ..
+     fpinfo->fetch_size = strtod(defGetString...
Also it is doable in postgresGetForeignPlan and doing there
removes redundant copy of fetch_size from PgFdwRelation to
PgFdwScanState but theoretical basis would be weak.

regards,
On 2015-02-27 13:50:22 -0500, Corey Huinker wrote:
+static DefElem*
+get_option(List *options, char *optname)
+{
+     ListCell *lc;
+
+     foreach(lc, options)
+     {
+             DefElem *def = (DefElem *) lfirst(lc);
+
+             if (strcmp(def->defname, optname) == 0)
+                     return def;
+     }
+     return NULL;
+}
/*
* Do nothing in EXPLAIN (no ANALYZE) case. node->fdw_state
stays
NULL.

@@ -915,6 +933,23 @@ postgresBeginForeignScan(ForeignScanState *node,

int eflags)
server = GetForeignServer(table->serverid);
user = GetUserMapping(userid, server->serverid);
+     /* Reading table options */
+     fsstate->fetch_size = -1;
+
+     def = get_option(table->options, "fetch_size");
+     if (!def)
+             def = get_option(server->options, "fetch_size");
+
+     if (def)
+     {
+             fsstate->fetch_size = strtod(defGetString(def), NULL);
+             if (fsstate->fetch_size < 0)
+                     elog(ERROR, "invalid fetch size for foreign
table
\"%s\"",
+                              get_rel_name(table->relid));
+     }
+     else
+             fsstate->fetch_size = 100;
I don't think it's a good idea to make such checks at runtime - and
either way it's somethign that should be reported back using an
ereport(), not an elog.
Also, it seems somewhat wrong to determine this at execution
time. Shouldn't this rather be done when creating the foreign scan
node?

And be a part of the scan state?

I agree, that was my original plan, but I wasn't familiar enough with the
FDW architecture to know where the table struct and the scan struct were
both exposed in the same function.

What I submitted incorporated some of Kyotaro's feedback (and working
patch) to my original incomplete patch.

Have you thought about how this option should cooperate with join
pushdown once implemented?

I hadn't until now, but I think the only sensible thing would be to
disregard table-specific settings once a second foreign table is

detected,

and instead consider only the server-level setting.

I suppose one could argue that if ALL the tables in the join had the same
table-level setting, we should go with that, but I think that would be
complicated, expensive, and generally a good argument for changing the
server setting instead.

--
Kyotaro Horiguchi
NTT Open Source Software Center

Ok, with some guidance from RhodiumToad (thanks!) I was able to get the
proper RelOptInfo->Plan->Scan handoff.

What I *don't* know how to do is show that the proper fetch sizes are being
used on the remote server with the resources available in the regression
test. *Suggestions welcome.*

This patch works for my original added test-cases, and works for me
connecting to a redshift cluster that we have, the queries show up in the
console like this:
FETCH 101 FROM c1
FETCH 30 FROM c1
FETCH 50 FROM c1

The (redacted) source of that test is as follows:

begin;
create extension if not exists postgres_fdw;

create server redshift foreign data wrapper postgres_fdw
options (host 'REDACTED', port '5439', dbname 'REDACTED', fetch_size '101');

select * from pg_foreign_server;

create user mapping for public server redshift
options ( user 'REDACTED', password 'REDACTED');

select * from pg_user_mappings;

create foreign table test_table ( date date, tval text )
server redshift
options (table_name 'REDACTED');

select count(*) from ( select tval from test_table where date = 'REDACTED'
) x;

alter server redshift options ( set fetch_size '30' );

select count(*) from ( select tval from test_table where date = 'REDACTED'
) x;

alter foreign table test_table options ( fetch_size '50' );

select count(*) from ( select tval from test_table where date = 'REDACTED'
) x;

rollback;

Attached is the patch / diff against current master.

Attachments:

0002-Make-fetch_size-settable-per-foreign-server-and-fore.patchtext/x-patch; charset=US-ASCII; name=0002-Make-fetch_size-settable-per-foreign-server-and-fore.patchDownload

diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 1f417b3..51edc24 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -106,7 +106,8 @@ ALTER SERVER testserver1 OPTIONS (
 	sslcert 'value',
 	sslkey 'value',
 	sslrootcert 'value',
-	sslcrl 'value'
+	sslcrl 'value',
+	fetch_size '101'
 	--requirepeer 'value',
 	-- krbsrvname 'value',
 	-- gsslib 'value',
@@ -114,18 +115,34 @@ ALTER SERVER testserver1 OPTIONS (
 );
 ALTER USER MAPPING FOR public SERVER testserver1
 	OPTIONS (DROP user, DROP password);
-ALTER FOREIGN TABLE ft1 OPTIONS (schema_name 'S 1', table_name 'T 1');
+ALTER FOREIGN TABLE ft1 OPTIONS (schema_name 'S 1', table_name 'T 1', fetch_size '102');
 ALTER FOREIGN TABLE ft2 OPTIONS (schema_name 'S 1', table_name 'T 1');
 ALTER FOREIGN TABLE ft1 ALTER COLUMN c1 OPTIONS (column_name 'C 1');
 ALTER FOREIGN TABLE ft2 ALTER COLUMN c1 OPTIONS (column_name 'C 1');
 \det+
-                             List of foreign tables
- Schema | Table |  Server  |              FDW Options              | Description 
---------+-------+----------+---------------------------------------+-------------
- public | ft1   | loopback | (schema_name 'S 1', table_name 'T 1') | 
- public | ft2   | loopback | (schema_name 'S 1', table_name 'T 1') | 
+                                      List of foreign tables
+ Schema | Table |  Server  |                       FDW Options                       | Description 
+--------+-------+----------+---------------------------------------------------------+-------------
+ public | ft1   | loopback | (schema_name 'S 1', table_name 'T 1', fetch_size '102') | 
+ public | ft2   | loopback | (schema_name 'S 1', table_name 'T 1')                   | 
 (2 rows)
 
+-- Test what options made it into pg_foreign_server.
+-- Filter for just the server we created.
+SELECT srvoptions FROM pg_foreign_server WHERE srvname = 'testserver1';
+                                                                                                                                                                                     srvoptions                                                                                                                                                                                     
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+ {use_remote_estimate=false,updatable=true,fdw_startup_cost=123.456,fdw_tuple_cost=0.123,service=value,connect_timeout=value,dbname=value,host=value,hostaddr=value,port=value,application_name=value,keepalives=value,keepalives_idle=value,keepalives_interval=value,sslcompression=value,sslmode=value,sslcert=value,sslkey=value,sslrootcert=value,sslcrl=value,fetch_size=101}
+(1 row)
+
+-- Test what options made it into pg_foreign_table.
+-- Filter this heavily because we cannot specify which foreign server.
+SELECT ftoptions FROM pg_foreign_table WHERE ftoptions @> array['table_name=T 1','fetch_size=102'];
+                      ftoptions                      
+-----------------------------------------------------
+ {"schema_name=S 1","table_name=T 1",fetch_size=102}
+(1 row)
+
 -- Now we should be able to run ANALYZE.
 -- To exercise multiple code paths, we use local stats on ft1
 -- and remote-estimate mode on ft2.
diff --git a/contrib/postgres_fdw/option.c b/contrib/postgres_fdw/option.c
index 7547ec2..2a3ab7d 100644
--- a/contrib/postgres_fdw/option.c
+++ b/contrib/postgres_fdw/option.c
@@ -153,6 +153,12 @@ InitPgFdwOptions(void)
 		/* updatable is available on both server and table */
 		{"updatable", ForeignServerRelationId, false},
 		{"updatable", ForeignTableRelationId, false},
+		/*
+		 * fetch_size is available on both server and table, the table setting
+		 * overrides the server setting.
+		 */
+		{"fetch_size", ForeignServerRelationId, false},
+		{"fetch_size", ForeignTableRelationId, false},
 		{NULL, InvalidOid, false}
 	};
 
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index e4d799c..11abd23 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -79,6 +79,8 @@ typedef struct PgFdwRelationInfo
 	ForeignTable *table;
 	ForeignServer *server;
 	UserMapping *user;			/* only set in use_remote_estimate mode */
+
+	int			fetch_size      /* fetch size for this remote table */
 } PgFdwRelationInfo;
 
 /*
@@ -99,7 +101,9 @@ enum FdwScanPrivateIndex
 	/* SQL statement to execute remotely (as a String node) */
 	FdwScanPrivateSelectSql,
 	/* Integer list of attribute numbers retrieved by the SELECT */
-	FdwScanPrivateRetrievedAttrs
+	FdwScanPrivateRetrievedAttrs,
+	/* Integer representing the desired fetch_size */
+	FdwScanPrivateFetchSize
 };
 
 /*
@@ -135,6 +139,7 @@ typedef struct PgFdwScanState
 	/* extracted fdw_private data */
 	char	   *query;			/* text of SELECT command */
 	List	   *retrieved_attrs;	/* list of retrieved attribute numbers */
+	int			fetch_size;		/* number of tuples per fetch */
 
 	/* for remote query execution */
 	PGconn	   *conn;			/* connection for the scan */
@@ -329,6 +334,8 @@ static HeapTuple make_tuple_from_result_row(PGresult *res,
 						   List *retrieved_attrs,
 						   MemoryContext temp_context);
 static void conversion_error_callback(void *arg);
+static int get_fetch_size(ForeignTable	*table,
+					ForeignServer *server);
 
 
 /*
@@ -397,6 +404,8 @@ postgresGetForeignRelSize(PlannerInfo *root,
 	/* Look up foreign-table catalog info. */
 	fpinfo->table = GetForeignTable(foreigntableid);
 	fpinfo->server = GetForeignServer(fpinfo->table->serverid);
+	/* Look up any table-specific fetch size */
+	fpinfo->fetch_size = get_fetch_size(fpinfo->table,fpinfo->server);
 
 	/*
 	 * Extract user-settable option values.  Note that per-table setting of
@@ -858,8 +867,9 @@ postgresGetForeignPlan(PlannerInfo *root,
 	 * Build the fdw_private list that will be available to the executor.
 	 * Items in the list must match enum FdwScanPrivateIndex, above.
 	 */
-	fdw_private = list_make2(makeString(sql.data),
-							 retrieved_attrs);
+	fdw_private = list_make3(makeString(sql.data),
+							 retrieved_attrs,
+							 makeInteger(fpinfo->fetch_size));
 
 	/*
 	 * Create the ForeignScan node from target list, local filtering
@@ -877,6 +887,22 @@ postgresGetForeignPlan(PlannerInfo *root,
 							NIL /* no custom tlist */ );
 }
 
+static DefElem*
+get_option(List *options, char *optname)
+{
+	ListCell *lc;
+
+	foreach(lc, options)
+	{
+		DefElem	*def = (DefElem *) lfirst(lc);
+
+		if (strcmp(def->defname, optname) == 0)
+			return def;
+	}
+	return NULL;
+}
+
+
 /*
  * postgresBeginForeignScan
  *		Initiate an executor scan of a foreign PostgreSQL table.
@@ -895,6 +921,7 @@ postgresBeginForeignScan(ForeignScanState *node, int eflags)
 	int			numParams;
 	int			i;
 	ListCell   *lc;
+	DefElem	   *def;
 
 	/*
 	 * Do nothing in EXPLAIN (no ANALYZE) case.  node->fdw_state stays NULL.
@@ -921,6 +948,8 @@ postgresBeginForeignScan(ForeignScanState *node, int eflags)
 	server = GetForeignServer(table->serverid);
 	user = GetUserMapping(userid, server->serverid);
 
+
+
 	/*
 	 * Get connection to the foreign server.  Connection manager will
 	 * establish new connection if necessary.
@@ -936,6 +965,8 @@ postgresBeginForeignScan(ForeignScanState *node, int eflags)
 									 FdwScanPrivateSelectSql));
 	fsstate->retrieved_attrs = (List *) list_nth(fsplan->fdw_private,
 											   FdwScanPrivateRetrievedAttrs);
+	fsstate->fetch_size = intVal(list_nth(fsplan->fdw_private,
+								 FdwScanPrivateFetchSize));
 
 	/* Create contexts for batches of tuples and per-tuple temp workspace. */
 	fsstate->batch_cxt = AllocSetContextCreate(estate->es_query_cxt,
@@ -2045,15 +2076,11 @@ fetch_more_data(ForeignScanState *node)
 	{
 		PGconn	   *conn = fsstate->conn;
 		char		sql[64];
-		int			fetch_size;
 		int			numrows;
 		int			i;
 
-		/* The fetch size is arbitrary, but shouldn't be enormous. */
-		fetch_size = 100;
-
 		snprintf(sql, sizeof(sql), "FETCH %d FROM c%u",
-				 fetch_size, fsstate->cursor_number);
+				 fsstate->fetch_size, fsstate->cursor_number);
 
 		res = PQexec(conn, sql);
 		/* On error, report the original query, not the FETCH. */
@@ -2081,7 +2108,7 @@ fetch_more_data(ForeignScanState *node)
 			fsstate->fetch_ct_2++;
 
 		/* Must be EOF if we didn't get as many tuples as we asked for. */
-		fsstate->eof_reached = (numrows < fetch_size);
+		fsstate->eof_reached = (numrows < fsstate->fetch_size);
 
 		PQclear(res);
 		res = NULL;
@@ -2465,8 +2492,7 @@ postgresAcquireSampleRowsFunc(Relation relation, int elevel,
 			 * then just adjust rowstoskip and samplerows appropriately.
 			 */
 
-			/* The fetch size is arbitrary, but shouldn't be enormous. */
-			fetch_size = 100;
+			fetch_size = get_fetch_size(table,server);
 
 			/* Fetch some rows */
 			snprintf(fetch_sql, sizeof(fetch_sql), "FETCH %d FROM c%u",
@@ -2621,7 +2647,7 @@ postgresImportForeignSchema(ImportForeignSchemaStmt *stmt, Oid serverOid)
 		else
 			ereport(ERROR,
 					(errcode(ERRCODE_FDW_INVALID_OPTION_NAME),
-					 errmsg("invalid option \"%s\"", def->defname)));
+						errmsg("invalid option \"%s\"", def->defname)));
 	}
 
 	/*
@@ -2994,3 +3020,40 @@ conversion_error_callback(void *arg)
 				   NameStr(tupdesc->attrs[errpos->cur_attno - 1]->attname),
 				   RelationGetRelationName(errpos->rel));
 }
+
+/*
+ * Scan the foreign sever and foreign table definitions for any explicit
+ * fetch_size options. Prefer table-specific option to server-wide option.
+ * If none are found, keep the previous default of 100
+ */
+static int
+get_fetch_size(ForeignTable	*table,
+			   ForeignServer *server)
+{
+	DefElem    *def;
+	int			fetch_size;
+
+	def = get_option(table->options, "fetch_size");
+	if (!def)
+	{
+		/*
+		 * In the absence of table-specific fetch size,
+		 * look for a server-specific one
+		 */
+		def = get_option(server->options, "fetch_size");
+	}
+
+	if (def)
+	{
+		fetch_size = strtol(defGetString(def), NULL,10);
+		if (fetch_size < 0)
+			ereport(ERROR,
+                    (errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
+                        errmsg("invalid fetch size for foreign table \"%s\"",
+                                get_rel_name(table->relid)),
+                        errhint("fetch_size must be > 0.")));
+	}
+	else
+		fetch_size = 100;
+	return fetch_size;
+}
diff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql b/contrib/postgres_fdw/sql/postgres_fdw.sql
index fcdd92e..8521fab 100644
--- a/contrib/postgres_fdw/sql/postgres_fdw.sql
+++ b/contrib/postgres_fdw/sql/postgres_fdw.sql
@@ -115,7 +115,8 @@ ALTER SERVER testserver1 OPTIONS (
 	sslcert 'value',
 	sslkey 'value',
 	sslrootcert 'value',
-	sslcrl 'value'
+	sslcrl 'value',
+	fetch_size '101'
 	--requirepeer 'value',
 	-- krbsrvname 'value',
 	-- gsslib 'value',
@@ -123,12 +124,20 @@ ALTER SERVER testserver1 OPTIONS (
 );
 ALTER USER MAPPING FOR public SERVER testserver1
 	OPTIONS (DROP user, DROP password);
-ALTER FOREIGN TABLE ft1 OPTIONS (schema_name 'S 1', table_name 'T 1');
+ALTER FOREIGN TABLE ft1 OPTIONS (schema_name 'S 1', table_name 'T 1', fetch_size '102');
 ALTER FOREIGN TABLE ft2 OPTIONS (schema_name 'S 1', table_name 'T 1');
 ALTER FOREIGN TABLE ft1 ALTER COLUMN c1 OPTIONS (column_name 'C 1');
 ALTER FOREIGN TABLE ft2 ALTER COLUMN c1 OPTIONS (column_name 'C 1');
 \det+
 
+-- Test what options made it into pg_foreign_server.
+-- Filter for just the server we created.
+SELECT srvoptions FROM pg_foreign_server WHERE srvname = 'testserver1';
+
+-- Test what options made it into pg_foreign_table.
+-- Filter this heavily because we cannot specify which foreign server.
+SELECT ftoptions FROM pg_foreign_table WHERE ftoptions @> array['table_name=T 1','fetch_size=102'];
+
 -- Now we should be able to run ANALYZE.
 -- To exercise multiple code paths, we use local stats on ft1
 -- and remote-estimate mode on ft2.
diff --git a/doc/src/sgml/postgres-fdw.sgml b/doc/src/sgml/postgres-fdw.sgml
index 7c92282..a96f9a2 100644
--- a/doc/src/sgml/postgres-fdw.sgml
+++ b/doc/src/sgml/postgres-fdw.sgml
@@ -302,6 +302,34 @@
   </sect3>
 
   <sect3>
+   <title>Fetch Size Options</title>
+
+   <para>
+    By default, rows are fetched from the remote server 100 at a time.
+    This may be overridden using the following option:
+   </para>
+
+   <variablelist>
+
+    <varlistentry>
+     <term><literal>fetch_size</literal></term>
+     <listitem>
+      <para>
+       This option specifies the number of rows <filename>postgres_fdw</> 
+       should get in each fetch operation. It can be specified for a foreign
+       table or a foreign server. The option specified on a table overrides
+       an option specified for the server.
+       The default is <literal>100</>.
+      </para>
+
+     </listitem>
+    </varlistentry>
+
+   </variablelist>
+  </sect3>
+
+
+  <sect3>
    <title>Importing Options</title>
 
    <para>

#13

Michael Paquier

michael.paquier@gmail.com

about 10 years ago

In reply to: Corey Huinker (#12)

Re: [POC] FETCH limited by bytes.

On Mon, Sep 21, 2015 at 1:52 PM, Corey Huinker <corey.huinker@gmail.com> wrote:

Attached is the patch / diff against current master.

I have moved this patch to next CF, there is a new patch, but no reviews.
--
Michael

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#14

Corey Huinker

corey.huinker@gmail.com

about 10 years ago

In reply to: Michael Paquier (#13)

Re: [POC] FETCH limited by bytes.

On Wed, Dec 23, 2015 at 8:43 AM, Michael Paquier <michael.paquier@gmail.com>
wrote:

On Mon, Sep 21, 2015 at 1:52 PM, Corey Huinker <corey.huinker@gmail.com>
wrote:

Attached is the patch / diff against current master.

I have moved this patch to next CF, there is a new patch, but no reviews.
--
Michael

That's fair. I'm still at a loss for how to show that the fetch size was
respected by the remote server, suggestions welcome!

#15

Jim Nasby

Jim.Nasby@BlueTreble.com

about 10 years ago

In reply to: Corey Huinker (#14)

Re: [POC] FETCH limited by bytes.

On 12/23/15 12:15 PM, Corey Huinker wrote:

That's fair. I'm still at a loss for how to show that the fetch size was
respected by the remote server, suggestions welcome!

A combination of repeat() and generate_series()?

I'm guessing it's not that obvious and that I'm missing something; can
you elaborate?
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#16

Corey Huinker

corey.huinker@gmail.com

about 10 years ago

In reply to: Jim Nasby (#15)

Re: [POC] FETCH limited by bytes.

On Wed, Dec 23, 2015 at 3:08 PM, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:

On 12/23/15 12:15 PM, Corey Huinker wrote:

That's fair. I'm still at a loss for how to show that the fetch size was
respected by the remote server, suggestions welcome!

A combination of repeat() and generate_series()?

I'm guessing it's not that obvious and that I'm missing something; can you
elaborate?
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com

I'll try. So the basic test of whether the FDW respected the fetch limit is
this:

1. create foreign server using postgres_fdw, create foreign table.
2. run a query against that table. it works. great.
3. alter server set fetch size option to 101 (or any number different from
100)
4. run same query against the table. the server side should show that the
result set was fetched in 101 row chunks[1].
5. alter table set fetch size option to 102 (or any number different from
100 and the number you picked in #3)
6. run same query against the table. the server side should show that the
result set was fetched in 102 row chunks[1].

The parts marked [1] are the problem...because the way I know it works is
looking at the query console on the remote redshift cluster where the query
column reads "FETCH 101 in c1" or somesuch rather than the query text.
That's great, *I* know it works, but I don't know how capture that
information from a vanilla postgres server, and I don't know if we can do
the regression with a loopback connection, or if we'd need to set up a
second pg instance for the regression test scaffolding.

#17

Kyotaro HORIGUCHI

horiguchi.kyotaro@lab.ntt.co.jp

about 10 years ago

In reply to: Corey Huinker (#16)

Re: [POC] FETCH limited by bytes.

Hello,

At Thu, 24 Dec 2015 18:31:42 -0500, Corey Huinker <corey.huinker@gmail.com> wrote in <CADkLM=fh+ZUEykcCDu8P0PPrOyYwLEp5OBRjKCe5O7swqDF65w@mail.gmail.com>

On Wed, Dec 23, 2015 at 3:08 PM, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:

On 12/23/15 12:15 PM, Corey Huinker wrote:

That's fair. I'm still at a loss for how to show that the fetch size was
respected by the remote server, suggestions welcome!

A combination of repeat() and generate_series()?

I'm guessing it's not that obvious and that I'm missing something; can you
elaborate?
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com

I'll try. So the basic test of whether the FDW respected the fetch limit is
this:

1. create foreign server using postgres_fdw, create foreign table.
2. run a query against that table. it works. great.
3. alter server set fetch size option to 101 (or any number different from
100)
4. run same query against the table. the server side should show that the
result set was fetched in 101 row chunks[1].
5. alter table set fetch size option to 102 (or any number different from
100 and the number you picked in #3)
6. run same query against the table. the server side should show that the
result set was fetched in 102 row chunks[1].

The parts marked [1] are the problem...because the way I know it works is
looking at the query console on the remote redshift cluster where the query
column reads "FETCH 101 in c1" or somesuch rather than the query text.
That's great, *I* know it works, but I don't know how capture that
information from a vanilla postgres server, and I don't know if we can do
the regression with a loopback connection, or if we'd need to set up a
second pg instance for the regression test scaffolding.

I believe it won't be needed as a regression test but DEBUGn
messages could be usable to see "what was sent to the remote
side". It can be shown to the console so easily done in
regression test. See the deocumentation for client_min_messages
for details. (It could receive far many messages then expected,
though, and maybe fragile for changing in other part.)

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#18

Corey Huinker

corey.huinker@gmail.com

about 10 years ago

In reply to: Kyotaro HORIGUCHI (#17)

Re: [POC] FETCH limited by bytes.

On Fri, Dec 25, 2015 at 12:31 AM, Kyotaro HORIGUCHI <
horiguchi.kyotaro@lab.ntt.co.jp> wrote:

Hello,

At Thu, 24 Dec 2015 18:31:42 -0500, Corey Huinker <corey.huinker@gmail.com>
wrote in <CADkLM=
fh+ZUEykcCDu8P0PPrOyYwLEp5OBRjKCe5O7swqDF65w@mail.gmail.com>

On Wed, Dec 23, 2015 at 3:08 PM, Jim Nasby <Jim.Nasby@bluetreble.com>

wrote:

On 12/23/15 12:15 PM, Corey Huinker wrote:

That's fair. I'm still at a loss for how to show that the fetch size

was

respected by the remote server, suggestions welcome!

A combination of repeat() and generate_series()?

I'm guessing it's not that obvious and that I'm missing something; can

you

elaborate?
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com

I'll try. So the basic test of whether the FDW respected the fetch limit

is

this:

1. create foreign server using postgres_fdw, create foreign table.
2. run a query against that table. it works. great.
3. alter server set fetch size option to 101 (or any number different

from

100)
4. run same query against the table. the server side should show that the
result set was fetched in 101 row chunks[1].
5. alter table set fetch size option to 102 (or any number different from
100 and the number you picked in #3)
6. run same query against the table. the server side should show that the
result set was fetched in 102 row chunks[1].

The parts marked [1] are the problem...because the way I know it works is
looking at the query console on the remote redshift cluster where the

query

column reads "FETCH 101 in c1" or somesuch rather than the query text.
That's great, *I* know it works, but I don't know how capture that
information from a vanilla postgres server, and I don't know if we can do
the regression with a loopback connection, or if we'd need to set up a
second pg instance for the regression test scaffolding.

I believe it won't be needed as a regression test but DEBUGn
messages could be usable to see "what was sent to the remote
side". It can be shown to the console so easily done in
regression test. See the deocumentation for client_min_messages
for details. (It could receive far many messages then expected,
though, and maybe fragile for changing in other part.)

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

Ok, I'll see what debug-level messages reveal.

#19

Vladimir Sitnikov

sitnikov.vladimir@gmail.com

about 10 years ago

In reply to: Tom Lane (#2)

Re: [POC] FETCH limited by bytes.

Have you got numbers showing any actual performance win for postgres_fdw?

For JDBC purposes, it would be nice to have an ability of asking
backend "to stop fetching if produced more than X MiB of response
data".
For small table (4 int4 fields), having decent fetchSize (~1000) makes
result processing 7 times faster than with fetchSize of 50 rows (14 ms
-> 2 ms for 2000 rows).
Here are the measurements: [1]https://github.com/pgjdbc/pgjdbc/issues/292#issue-82595473 and [2]https://github.com/pgjdbc/pgjdbc/issues/292#issuecomment-107019387.

Note: it is not required to precisely follow given "max fetch bytes"
limit. It would be enough just to stop after certain amount of data
was sent.
The whole thing of using limited fetch size is to avoid running out of
memory at client side.
I do not think developers care how many rows is fetched at once. It
they do, they should rather use "limit X" SQL syntax.

Do you have a suggestion for such a use case?

For fixed-size data types, JDBC driver can estimate "max sane fetch
size", however:
1) In order to know data types, a roundtrip is required. This means
the first fetch must be conservative, thus small queries would be
penalized.
2) For variable length types there is no way to estimate "sane number
of rows", except of using "average row size of already received data".
This is not reliable, especially if the first rows have nulls, and
subsequent ones contain non-empty strings.

[1]: https://github.com/pgjdbc/pgjdbc/issues/292#issue-82595473
[2]: https://github.com/pgjdbc/pgjdbc/issues/292#issuecomment-107019387

Vladimir

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#20

Corey Huinker

corey.huinker@gmail.com

about 10 years ago

In reply to: Vladimir Sitnikov (#19)

Re: [POC] FETCH limited by bytes.

On Sat, Dec 26, 2015 at 6:16 AM, Vladimir Sitnikov <
sitnikov.vladimir@gmail.com> wrote:

Have you got numbers showing any actual performance win for postgres_fdw?

For JDBC purposes, it would be nice to have an ability of asking
backend "to stop fetching if produced more than X MiB of response
data".
For small table (4 int4 fields), having decent fetchSize (~1000) makes
result processing 7 times faster than with fetchSize of 50 rows (14 ms
-> 2 ms for 2000 rows).
Here are the measurements: [1] and [2].

Note: it is not required to precisely follow given "max fetch bytes"
limit. It would be enough just to stop after certain amount of data
was sent.
The whole thing of using limited fetch size is to avoid running out of
memory at client side.
I do not think developers care how many rows is fetched at once. It
they do, they should rather use "limit X" SQL syntax.

Do you have a suggestion for such a use case?

For fixed-size data types, JDBC driver can estimate "max sane fetch
size", however:
1) In order to know data types, a roundtrip is required. This means
the first fetch must be conservative, thus small queries would be
penalized.
2) For variable length types there is no way to estimate "sane number
of rows", except of using "average row size of already received data".
This is not reliable, especially if the first rows have nulls, and
subsequent ones contain non-empty strings.

[1]: https://github.com/pgjdbc/pgjdbc/issues/292#issue-82595473
[2]: https://github.com/pgjdbc/pgjdbc/issues/292#issuecomment-107019387

Vladimir

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

I believe that Kyotaro proposed something like that, wherein the FDW would
be more adaptive based on the amount of memory available, and fetch a
number of rows that, by its estimation, would fit in the memory available.
I don't know the progress of that patch.

This patch is a far less complicated solution and puts the burden on the
DBA to figure out approximately how many rows would fit in memory based on
the average row size, and set the per-table option accordingly. If it is
later determined that the rows are now too heavy to fit into the space
allotted, the fetch size can be altered for that table as needed.

#21

Vladimir Sitnikov

sitnikov.vladimir@gmail.com

about 10 years ago

In reply to: Corey Huinker (#20)

Re: [POC] FETCH limited by bytes.

and fetch a number of rows that, by its estimation, would fit in the memory available

What's wrong with having size limit in the first place? It seems to
make much more sense.

Vladimir

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#22

Corey Huinker

corey.huinker@gmail.com

about 10 years ago

In reply to: Corey Huinker (#18)

1 attachment(s)

Re: [POC] FETCH limited by bytes.

I believe it won't be needed as a regression test but DEBUGn
messages could be usable to see "what was sent to the remote
side". It can be shown to the console so easily done in
regression test. See the deocumentation for client_min_messages
for details. (It could receive far many messages then expected,
though, and maybe fragile for changing in other part.)

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

Ok, I'll see what debug-level messages reveal.

Here's a re-based patch. Notable changes since the last patch are:

- some changes had to move to postgres_fdw.h
- the regression tests are in their own script fetch_size.sql /
fetch_size.out. that should make things easier for the reviewer(s), even if
we later merge it into postgres_fdw.sql/.out.
- I put braces around a multi-line error ereport(). That might be a style
violation, but the statement seemed confusing without it.
- The fetch sql is reported as a DEBUG1 level ereport(), and confirms that
server level options are respected, and table level options are used in
favor of server-level options.

I left the DEBUG1-level message in this patch so that others can see the
output, but I assume we would omit that for production code, with
corresponding deletions from fetch_size.sql and fetch_size.out.

Attachments:

0001-Make-fetch_size-settable-per-server-and-table.patchtext/x-patch; charset=US-ASCII; name=0001-Make-fetch_size-settable-per-server-and-table.patchDownload

diff --git a/contrib/postgres_fdw/Makefile b/contrib/postgres_fdw/Makefile
index 3543312..5d50b30 100644
--- a/contrib/postgres_fdw/Makefile
+++ b/contrib/postgres_fdw/Makefile
@@ -10,7 +10,7 @@ SHLIB_LINK = $(libpq)
 EXTENSION = postgres_fdw
 DATA = postgres_fdw--1.0.sql
 
-REGRESS = postgres_fdw
+REGRESS = postgres_fdw fetch_size
 
 ifdef USE_PGXS
 PG_CONFIG = pg_config
diff --git a/contrib/postgres_fdw/option.c b/contrib/postgres_fdw/option.c
index 380ac80..f482dcd 100644
--- a/contrib/postgres_fdw/option.c
+++ b/contrib/postgres_fdw/option.c
@@ -162,6 +162,12 @@ InitPgFdwOptions(void)
 		/* updatable is available on both server and table */
 		{"updatable", ForeignServerRelationId, false},
 		{"updatable", ForeignTableRelationId, false},
+		/*
+		 * fetch_size is available on both server and table, the table setting
+		 * overrides the server setting.
+		 */
+		{"fetch_size", ForeignServerRelationId, false},
+		{"fetch_size", ForeignTableRelationId, false},
 		{NULL, InvalidOid, false}
 	};
 
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index f501c6c..d2595f6 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -68,7 +68,9 @@ enum FdwScanPrivateIndex
 	/* SQL statement to execute remotely (as a String node) */
 	FdwScanPrivateSelectSql,
 	/* Integer list of attribute numbers retrieved by the SELECT */
-	FdwScanPrivateRetrievedAttrs
+	FdwScanPrivateRetrievedAttrs,
+	/* Integer representing the desired fetch_size */
+	FdwScanPrivateFetchSize
 };
 
 /*
@@ -126,6 +128,8 @@ typedef struct PgFdwScanState
 	/* working memory contexts */
 	MemoryContext batch_cxt;	/* context holding current batch of tuples */
 	MemoryContext temp_cxt;		/* context for per-tuple temporary data */
+
+	int			fetch_size;		/* number of tuples per fetch */
 } PgFdwScanState;
 
 /*
@@ -303,6 +307,8 @@ static HeapTuple make_tuple_from_result_row(PGresult *res,
 						   List *retrieved_attrs,
 						   MemoryContext temp_context);
 static void conversion_error_callback(void *arg);
+static int get_fetch_size(ForeignTable	*table,
+					ForeignServer *server);
 
 
 /*
@@ -371,6 +377,8 @@ postgresGetForeignRelSize(PlannerInfo *root,
 	/* Look up foreign-table catalog info. */
 	fpinfo->table = GetForeignTable(foreigntableid);
 	fpinfo->server = GetForeignServer(fpinfo->table->serverid);
+	/* Look up any table-specific fetch size */
+	fpinfo->fetch_size = get_fetch_size(fpinfo->table,fpinfo->server);
 
 	/*
 	 * Extract user-settable option values.  Note that per-table setting of
@@ -1069,6 +1077,9 @@ postgresGetForeignPlan(PlannerInfo *root,
 	 */
 	fdw_private = list_make2(makeString(sql.data),
 							 retrieved_attrs);
+	fdw_private = list_make3(makeString(sql.data),
+							 retrieved_attrs,
+							 makeInteger(fpinfo->fetch_size));
 
 	/*
 	 * Create the ForeignScan node from target list, filtering expressions,
@@ -1147,6 +1158,8 @@ postgresBeginForeignScan(ForeignScanState *node, int eflags)
 									 FdwScanPrivateSelectSql));
 	fsstate->retrieved_attrs = (List *) list_nth(fsplan->fdw_private,
 											   FdwScanPrivateRetrievedAttrs);
+	fsstate->fetch_size = intVal(list_nth(fsplan->fdw_private,
+								 FdwScanPrivateFetchSize));
 
 	/* Create contexts for batches of tuples and per-tuple temp workspace. */
 	fsstate->batch_cxt = AllocSetContextCreate(estate->es_query_cxt,
@@ -2275,15 +2288,13 @@ fetch_more_data(ForeignScanState *node)
 	{
 		PGconn	   *conn = fsstate->conn;
 		char		sql[64];
-		int			fetch_size;
 		int			numrows;
 		int			i;
 
-		/* The fetch size is arbitrary, but shouldn't be enormous. */
-		fetch_size = 100;
-
 		snprintf(sql, sizeof(sql), "FETCH %d FROM c%u",
-				 fetch_size, fsstate->cursor_number);
+				 fsstate->fetch_size, fsstate->cursor_number);
+
+		ereport(DEBUG1, (errmsg_internal("Fetch Command: %s", sql)));
 
 		res = PQexec(conn, sql);
 		/* On error, report the original query, not the FETCH. */
@@ -2311,7 +2322,7 @@ fetch_more_data(ForeignScanState *node)
 			fsstate->fetch_ct_2++;
 
 		/* Must be EOF if we didn't get as many tuples as we asked for. */
-		fsstate->eof_reached = (numrows < fetch_size);
+		fsstate->eof_reached = (numrows < fsstate->fetch_size);
 
 		PQclear(res);
 		res = NULL;
@@ -2695,8 +2706,7 @@ postgresAcquireSampleRowsFunc(Relation relation, int elevel,
 			 * then just adjust rowstoskip and samplerows appropriately.
 			 */
 
-			/* The fetch size is arbitrary, but shouldn't be enormous. */
-			fetch_size = 100;
+			fetch_size = get_fetch_size(table,server);
 
 			/* Fetch some rows */
 			snprintf(fetch_sql, sizeof(fetch_sql), "FETCH %d FROM c%u",
@@ -3252,3 +3262,57 @@ find_em_expr_for_rel(EquivalenceClass *ec, RelOptInfo *rel)
 	/* We didn't find any suitable equivalence class expression */
 	return NULL;
 }
+
+static DefElem*
+get_option(List *options, char *optname)
+{
+	ListCell *lc;
+
+	foreach(lc, options)
+	{
+		DefElem	*def = (DefElem *) lfirst(lc);
+
+		if (strcmp(def->defname, optname) == 0)
+			return def;
+	}
+	return NULL;
+}
+
+/*
+ * Scan the foreign sever and foreign table definitions for any explicit
+ * fetch_size options. Prefer table-specific option to server-wide option.
+ * If none are found, keep the previous default of 100
+ */
+static int
+get_fetch_size(ForeignTable	*table,
+			   ForeignServer *server)
+{
+	DefElem    *def;
+	int			fetch_size;
+
+	def = get_option(table->options, "fetch_size");
+	if (!def)
+	{
+		/*
+		 * In the absence of table-specific fetch size,
+		 * look for a server-specific one
+		 */
+		def = get_option(server->options, "fetch_size");
+	}
+
+	if (def)
+	{
+		fetch_size = strtol(defGetString(def), NULL,10);
+		if (fetch_size < 0)
+		{
+			ereport(ERROR,
+					(errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
+						errmsg("invalid fetch size for foreign table \"%s\"",
+								get_rel_name(table->relid)),
+						errhint("fetch_size must be > 0.")));
+		}
+	}
+	else
+		fetch_size = 100;
+	return fetch_size;
+}
diff --git a/contrib/postgres_fdw/postgres_fdw.h b/contrib/postgres_fdw/postgres_fdw.h
index f243de8..fad6ae4 100644
--- a/contrib/postgres_fdw/postgres_fdw.h
+++ b/contrib/postgres_fdw/postgres_fdw.h
@@ -53,6 +53,8 @@ typedef struct PgFdwRelationInfo
 	ForeignTable *table;
 	ForeignServer *server;
 	UserMapping *user;			/* only set in use_remote_estimate mode */
+
+	int			fetch_size;      /* fetch size for this remote table */
 } PgFdwRelationInfo;
 
 /* in postgres_fdw.c */
diff --git a/doc/src/sgml/postgres-fdw.sgml b/doc/src/sgml/postgres-fdw.sgml
index 5a829d5..19350c5 100644
--- a/doc/src/sgml/postgres-fdw.sgml
+++ b/doc/src/sgml/postgres-fdw.sgml
@@ -342,6 +342,34 @@
   </sect3>
 
   <sect3>
+   <title>Fetch Size Options</title>
+
+   <para>
+    By default, rows are fetched from the remote server 100 at a time.
+    This may be overridden using the following option:
+   </para>
+
+   <variablelist>
+
+    <varlistentry>
+     <term><literal>fetch_size</literal></term>
+     <listitem>
+      <para>
+       This option specifies the number of rows <filename>postgres_fdw</> 
+       should get in each fetch operation. It can be specified for a foreign
+       table or a foreign server. The option specified on a table overrides
+       an option specified for the server.
+       The default is <literal>100</>.
+      </para>
+
+     </listitem>
+    </varlistentry>
+
+   </variablelist>
+  </sect3>
+
+
+  <sect3>
    <title>Importing Options</title>
 
    <para>

#23

Robert Haas

robertmhaas@gmail.com

almost 10 years ago

In reply to: Corey Huinker (#22)

Re: [POC] FETCH limited by bytes.

On Thu, Dec 31, 2015 at 1:10 AM, Corey Huinker <corey.huinker@gmail.com> wrote:

Here's a re-based patch. Notable changes since the last patch are:

- some changes had to move to postgres_fdw.h
- the regression tests are in their own script fetch_size.sql /
fetch_size.out. that should make things easier for the reviewer(s), even if
we later merge it into postgres_fdw.sql/.out.
- I put braces around a multi-line error ereport(). That might be a style
violation, but the statement seemed confusing without it.
- The fetch sql is reported as a DEBUG1 level ereport(), and confirms that
server level options are respected, and table level options are used in
favor of server-level options.

I left the DEBUG1-level message in this patch so that others can see the
output, but I assume we would omit that for production code, with
corresponding deletions from fetch_size.sql and fetch_size.out.

Review:

- There is an established idiom in postgres_fdw for how to extract
data from lists of DefElems, and this patch does something else
instead. Please make it look like postgresIsForeignRelUpdatable,
postgresGetForeignRelSize, and postgresImportForeignSchema instead of
inventing something new. (Note that your approach would require
multiple passes over the list if you needed information on multiple
options, whereas the existing approach does not.)

- I think the comment in InitPgFdwOptions() could be reduced to a
one-line comment similar to those already present.

- I would reduce the debug level on the fetch command to something
lower than DEBUG1, or drop it entirely. If you keep it, you need to
fix the formatting so that the line breaks look like other ereport()
calls.

- We could consider folding fetch_size into "Remote Execution
Options", but maybe that's too clever.

I would not worry about trying to get this into the regression tests.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#24

Corey Huinker

corey.huinker@gmail.com

almost 10 years ago

In reply to: Robert Haas (#23)

Re: [POC] FETCH limited by bytes.

Review:

- There is an established idiom in postgres_fdw for how to extract
data from lists of DefElems, and this patch does something else
instead. Please make it look like postgresIsForeignRelUpdatable,
postgresGetForeignRelSize, and postgresImportForeignSchema instead of
inventing something new. (Note that your approach would require
multiple passes over the list if you needed information on multiple
options, whereas the existing approach does not.)

I will look into that. The original patch pre-dates import foreign schema,
so I'm not surprised that I missed the established pattern.

- I think the comment in InitPgFdwOptions() could be reduced to a
one-line comment similar to those already present.

Aye.

- I would reduce the debug level on the fetch command to something
lower than DEBUG1, or drop it entirely. If you keep it, you need to
fix the formatting so that the line breaks look like other ereport()
calls.

As I mentioned, I plan to drop it entirely. It was only there to show a
reviewer that it works as advertised. There's not much to see without it.

- We could consider folding fetch_size into "Remote Execution
Options", but maybe that's too clever.

If you care to explain, I'm listening. Otherwise I'm going forward with the
other suggestions you've made.

I would not worry about trying to get this into the regression tests.

Happy to hear that.

#25

Robert Haas

robertmhaas@gmail.com

almost 10 years ago

In reply to: Corey Huinker (#24)

Re: [POC] FETCH limited by bytes.

On Mon, Jan 25, 2016 at 1:21 PM, Corey Huinker <corey.huinker@gmail.com> wrote:

- We could consider folding fetch_size into "Remote Execution
Options", but maybe that's too clever.

If you care to explain, I'm listening. Otherwise I'm going forward with the
other suggestions you've made.

It's just a little unfortunate to have multiple sections with only a
single option in each. It would be nice to avoid that somehow.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#26

Corey Huinker

corey.huinker@gmail.com

almost 10 years ago

In reply to: Robert Haas (#25)

1 attachment(s)

Re: [POC] FETCH limited by bytes.

On Mon, Jan 25, 2016 at 3:22 PM, Robert Haas <robertmhaas@gmail.com> wrote:

On Mon, Jan 25, 2016 at 1:21 PM, Corey Huinker <corey.huinker@gmail.com>
wrote:

- We could consider folding fetch_size into "Remote Execution
Options", but maybe that's too clever.

If you care to explain, I'm listening. Otherwise I'm going forward with

the

other suggestions you've made.

It's just a little unfortunate to have multiple sections with only a
single option in each. It would be nice to avoid that somehow.

Revised in patch v3:
* get_option() and get_fetch_size() removed, fetch_size searches added to
existing loops.
* Move fetch_size <= 0 tests into postgres_fdw_validator() routine in
option.c
* DEBUG1 message removed, never intended that to live beyond the proof of
concept.
* Missing regression test mentioned in makefile de-mentioned, as there's
nothing to see without the DEBUG1 message.
* Multi-line comment shrunk

(There's a v2 patch that is prior to the change to postgres_fdw_validator()
in option.c, but in retrospect that's not interesting to you).

I'm not too keen on having *no* new regression tests, but defer to your
judgement.

Still not sure what you mean by remote execution options. But it might be
simpler now that the patch is closer to your expectations.

Attachments:

v3.Make-fetch_size-settable-per-server-and-table.patchapplication/octet-stream; name=v3.Make-fetch_size-settable-per-server-and-table.patchDownload

diff --git a/contrib/postgres_fdw/option.c b/contrib/postgres_fdw/option.c
index 86a5789..f89de2f 100644
--- a/contrib/postgres_fdw/option.c
+++ b/contrib/postgres_fdw/option.c
@@ -131,6 +131,17 @@ postgres_fdw_validator(PG_FUNCTION_ARGS)
 			/* check list syntax, warn about uninstalled extensions */
 			(void) ExtractExtensionList(defGetString(def), true);
 		}
+		else if (strcmp(def->defname, "fetch_size") == 0)
+		{
+			int		fetch_size;
+
+			fetch_size = strtol(defGetString(def), NULL,10);
+			if (fetch_size <= 0)
+				ereport(ERROR,
+						(errcode(ERRCODE_SYNTAX_ERROR),
+						 errmsg("%s requires a non-negative integer value",
+								def->defname)));
+		}
 	}
 
 	PG_RETURN_VOID();
@@ -162,6 +173,9 @@ InitPgFdwOptions(void)
 		/* updatable is available on both server and table */
 		{"updatable", ForeignServerRelationId, false},
 		{"updatable", ForeignTableRelationId, false},
+		/* fetch_size is available on both server and table */
+		{"fetch_size", ForeignServerRelationId, false},
+		{"fetch_size", ForeignTableRelationId, false},
 		{NULL, InvalidOid, false}
 	};
 
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 374faf5..0e66073 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -68,7 +68,9 @@ enum FdwScanPrivateIndex
 	/* SQL statement to execute remotely (as a String node) */
 	FdwScanPrivateSelectSql,
 	/* Integer list of attribute numbers retrieved by the SELECT */
-	FdwScanPrivateRetrievedAttrs
+	FdwScanPrivateRetrievedAttrs,
+	/* Integer representing the desired fetch_size */
+	FdwScanPrivateFetchSize
 };
 
 /*
@@ -126,6 +128,8 @@ typedef struct PgFdwScanState
 	/* working memory contexts */
 	MemoryContext batch_cxt;	/* context holding current batch of tuples */
 	MemoryContext temp_cxt;		/* context for per-tuple temporary data */
+
+	int			fetch_size;		/* number of tuples per fetch */
 } PgFdwScanState;
 
 /*
@@ -380,6 +384,7 @@ postgresGetForeignRelSize(PlannerInfo *root,
 	fpinfo->fdw_startup_cost = DEFAULT_FDW_STARTUP_COST;
 	fpinfo->fdw_tuple_cost = DEFAULT_FDW_TUPLE_COST;
 	fpinfo->shippable_extensions = NIL;
+	fpinfo->fetch_size = 100;
 
 	foreach(lc, fpinfo->server->options)
 	{
@@ -394,16 +399,17 @@ postgresGetForeignRelSize(PlannerInfo *root,
 		else if (strcmp(def->defname, "extensions") == 0)
 			fpinfo->shippable_extensions =
 				ExtractExtensionList(defGetString(def), false);
+		else if (strcmp(def->defname, "fetch_size") == 0)
+			fpinfo->fetch_size = strtol(defGetString(def), NULL,10);
 	}
 	foreach(lc, fpinfo->table->options)
 	{
 		DefElem    *def = (DefElem *) lfirst(lc);
 
 		if (strcmp(def->defname, "use_remote_estimate") == 0)
-		{
 			fpinfo->use_remote_estimate = defGetBoolean(def);
-			break;				/* only need the one value */
-		}
+		else if (strcmp(def->defname, "fetch_size") == 0)
+			fpinfo->fetch_size = strtol(defGetString(def), NULL,10);
 	}
 
 	/*
@@ -1069,6 +1075,9 @@ postgresGetForeignPlan(PlannerInfo *root,
 	 */
 	fdw_private = list_make2(makeString(sql.data),
 							 retrieved_attrs);
+	fdw_private = list_make3(makeString(sql.data),
+							 retrieved_attrs,
+							 makeInteger(fpinfo->fetch_size));
 
 	/*
 	 * Create the ForeignScan node from target list, filtering expressions,
@@ -1147,6 +1156,8 @@ postgresBeginForeignScan(ForeignScanState *node, int eflags)
 									 FdwScanPrivateSelectSql));
 	fsstate->retrieved_attrs = (List *) list_nth(fsplan->fdw_private,
 											   FdwScanPrivateRetrievedAttrs);
+	fsstate->fetch_size = intVal(list_nth(fsplan->fdw_private,
+								 FdwScanPrivateFetchSize));
 
 	/* Create contexts for batches of tuples and per-tuple temp workspace. */
 	fsstate->batch_cxt = AllocSetContextCreate(estate->es_query_cxt,
@@ -2275,15 +2286,11 @@ fetch_more_data(ForeignScanState *node)
 	{
 		PGconn	   *conn = fsstate->conn;
 		char		sql[64];
-		int			fetch_size;
 		int			numrows;
 		int			i;
 
-		/* The fetch size is arbitrary, but shouldn't be enormous. */
-		fetch_size = 100;
-
 		snprintf(sql, sizeof(sql), "FETCH %d FROM c%u",
-				 fetch_size, fsstate->cursor_number);
+				 fsstate->fetch_size, fsstate->cursor_number);
 
 		res = PQexec(conn, sql);
 		/* On error, report the original query, not the FETCH. */
@@ -2311,7 +2318,7 @@ fetch_more_data(ForeignScanState *node)
 			fsstate->fetch_ct_2++;
 
 		/* Must be EOF if we didn't get as many tuples as we asked for. */
-		fsstate->eof_reached = (numrows < fetch_size);
+		fsstate->eof_reached = (numrows < fsstate->fetch_size);
 
 		PQclear(res);
 		res = NULL;
@@ -2685,6 +2692,7 @@ postgresAcquireSampleRowsFunc(Relation relation, int elevel,
 			int			fetch_size;
 			int			numrows;
 			int			i;
+			ListCell   *lc;
 
 			/* Allow users to cancel long query */
 			CHECK_FOR_INTERRUPTS();
@@ -2695,8 +2703,27 @@ postgresAcquireSampleRowsFunc(Relation relation, int elevel,
 			 * then just adjust rowstoskip and samplerows appropriately.
 			 */
 
-			/* The fetch size is arbitrary, but shouldn't be enormous. */
 			fetch_size = 100;
+			foreach(lc, server->options)
+			{
+				DefElem    *def = (DefElem *) lfirst(lc);
+
+				if (strcmp(def->defname, "fetch_size") == 0)
+				{
+					fetch_size = strtol(defGetString(def), NULL,10);
+					break;
+				}
+			}
+			foreach(lc, table->options)
+			{
+				DefElem    *def = (DefElem *) lfirst(lc);
+
+				if (strcmp(def->defname, "fetch_size") == 0)
+				{
+					fetch_size = strtol(defGetString(def), NULL,10);
+					break;
+				}
+			}
 
 			/* Fetch some rows */
 			snprintf(fetch_sql, sizeof(fetch_sql), "FETCH %d FROM c%u",
@@ -3256,3 +3283,4 @@ find_em_expr_for_rel(EquivalenceClass *ec, RelOptInfo *rel)
 	/* We didn't find any suitable equivalence class expression */
 	return NULL;
 }
+
diff --git a/contrib/postgres_fdw/postgres_fdw.h b/contrib/postgres_fdw/postgres_fdw.h
index 8553536..4b67c37 100644
--- a/contrib/postgres_fdw/postgres_fdw.h
+++ b/contrib/postgres_fdw/postgres_fdw.h
@@ -53,6 +53,8 @@ typedef struct PgFdwRelationInfo
 	ForeignTable *table;
 	ForeignServer *server;
 	UserMapping *user;			/* only set in use_remote_estimate mode */
+
+	int			fetch_size;      /* fetch size for this remote table */
 } PgFdwRelationInfo;
 
 /* in postgres_fdw.c */

#27

Robert Haas

robertmhaas@gmail.com

almost 10 years ago

In reply to: Corey Huinker (#26)

Re: [POC] FETCH limited by bytes.

On Mon, Jan 25, 2016 at 4:18 PM, Corey Huinker <corey.huinker@gmail.com> wrote:

Revised in patch v3:
* get_option() and get_fetch_size() removed, fetch_size searches added to
existing loops.
* Move fetch_size <= 0 tests into postgres_fdw_validator() routine in
option.c
* DEBUG1 message removed, never intended that to live beyond the proof of
concept.
* Missing regression test mentioned in makefile de-mentioned, as there's
nothing to see without the DEBUG1 message.
* Multi-line comment shrunk

Looks pretty good. You seem to have added a blank line at the end of
postgres_fdw.c, which should be stripped back out.

I'm not too keen on having *no* new regression tests, but defer to your
judgement.

So... I'm not *opposed* to regression tests. I just don't see a
reasonable way to write one. If you've got an idea, I'm all ears.

Still not sure what you mean by remote execution options. But it might be
simpler now that the patch is closer to your expectations.

I'm talking about the documentation portion of the patch, which
regrettably seems to have disappeared between v2 and v3.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#28

Corey Huinker

corey.huinker@gmail.com

almost 10 years ago

In reply to: Robert Haas (#27)

1 attachment(s)

Re: [POC] FETCH limited by bytes.

Looks pretty good. You seem to have added a blank line at the end of
postgres_fdw.c, which should be stripped back out.

Done.

I'm not too keen on having *no* new regression tests, but defer to your
judgement.

So... I'm not *opposed* to regression tests. I just don't see a
reasonable way to write one. If you've got an idea, I'm all ears.

The possible tests might be:
- creating a server/table with fetch_size set
- altering an existing server/table to have fetch_size set
- verifying that the value exists in pg_foreign_server.srvoptions and
pg_foreign_table.ftoptions.
- attempting to set fetch_size to a non-integer value

None of which are very interesting, and none of which actually test what
fetch_size was actually used.

But I'm happy to add any of those that seem worthwhile.

Still not sure what you mean by remote execution options. But it might be
simpler now that the patch is closer to your expectations.

I'm talking about the documentation portion of the patch, which
regrettably seems to have disappeared between v2 and v3.

Ah, didn't realize you were speaking about the documentation, and since
that section was new, I wasn't familiar with the name. Moved.

...and not sure why the doc change didn't make it into the last patch, but
it's in this one.

I also moved the paragraph starting "When using the extensions option, *it
is the user's responsibility* that..." such that it is in the same
varlistentry as "extensions", though maybe that change should be delegated
to the patch that created the extensions option.

Enjoy.

Attachments:

v4.Make-fetch_size-settable-per-server-and-table.patchtext/x-patch; charset=US-ASCII; name=v4.Make-fetch_size-settable-per-server-and-table.patchDownload

diff --git a/contrib/postgres_fdw/option.c b/contrib/postgres_fdw/option.c
index 86a5789..f89de2f 100644
--- a/contrib/postgres_fdw/option.c
+++ b/contrib/postgres_fdw/option.c
@@ -131,6 +131,17 @@ postgres_fdw_validator(PG_FUNCTION_ARGS)
 			/* check list syntax, warn about uninstalled extensions */
 			(void) ExtractExtensionList(defGetString(def), true);
 		}
+		else if (strcmp(def->defname, "fetch_size") == 0)
+		{
+			int		fetch_size;
+
+			fetch_size = strtol(defGetString(def), NULL,10);
+			if (fetch_size <= 0)
+				ereport(ERROR,
+						(errcode(ERRCODE_SYNTAX_ERROR),
+						 errmsg("%s requires a non-negative integer value",
+								def->defname)));
+		}
 	}
 
 	PG_RETURN_VOID();
@@ -162,6 +173,9 @@ InitPgFdwOptions(void)
 		/* updatable is available on both server and table */
 		{"updatable", ForeignServerRelationId, false},
 		{"updatable", ForeignTableRelationId, false},
+		/* fetch_size is available on both server and table */
+		{"fetch_size", ForeignServerRelationId, false},
+		{"fetch_size", ForeignTableRelationId, false},
 		{NULL, InvalidOid, false}
 	};
 
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 374faf5..f71bf61 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -68,7 +68,9 @@ enum FdwScanPrivateIndex
 	/* SQL statement to execute remotely (as a String node) */
 	FdwScanPrivateSelectSql,
 	/* Integer list of attribute numbers retrieved by the SELECT */
-	FdwScanPrivateRetrievedAttrs
+	FdwScanPrivateRetrievedAttrs,
+	/* Integer representing the desired fetch_size */
+	FdwScanPrivateFetchSize
 };
 
 /*
@@ -126,6 +128,8 @@ typedef struct PgFdwScanState
 	/* working memory contexts */
 	MemoryContext batch_cxt;	/* context holding current batch of tuples */
 	MemoryContext temp_cxt;		/* context for per-tuple temporary data */
+
+	int			fetch_size;		/* number of tuples per fetch */
 } PgFdwScanState;
 
 /*
@@ -380,6 +384,7 @@ postgresGetForeignRelSize(PlannerInfo *root,
 	fpinfo->fdw_startup_cost = DEFAULT_FDW_STARTUP_COST;
 	fpinfo->fdw_tuple_cost = DEFAULT_FDW_TUPLE_COST;
 	fpinfo->shippable_extensions = NIL;
+	fpinfo->fetch_size = 100;
 
 	foreach(lc, fpinfo->server->options)
 	{
@@ -394,16 +399,17 @@ postgresGetForeignRelSize(PlannerInfo *root,
 		else if (strcmp(def->defname, "extensions") == 0)
 			fpinfo->shippable_extensions =
 				ExtractExtensionList(defGetString(def), false);
+		else if (strcmp(def->defname, "fetch_size") == 0)
+			fpinfo->fetch_size = strtol(defGetString(def), NULL,10);
 	}
 	foreach(lc, fpinfo->table->options)
 	{
 		DefElem    *def = (DefElem *) lfirst(lc);
 
 		if (strcmp(def->defname, "use_remote_estimate") == 0)
-		{
 			fpinfo->use_remote_estimate = defGetBoolean(def);
-			break;				/* only need the one value */
-		}
+		else if (strcmp(def->defname, "fetch_size") == 0)
+			fpinfo->fetch_size = strtol(defGetString(def), NULL,10);
 	}
 
 	/*
@@ -1069,6 +1075,9 @@ postgresGetForeignPlan(PlannerInfo *root,
 	 */
 	fdw_private = list_make2(makeString(sql.data),
 							 retrieved_attrs);
+	fdw_private = list_make3(makeString(sql.data),
+							 retrieved_attrs,
+							 makeInteger(fpinfo->fetch_size));
 
 	/*
 	 * Create the ForeignScan node from target list, filtering expressions,
@@ -1147,6 +1156,8 @@ postgresBeginForeignScan(ForeignScanState *node, int eflags)
 									 FdwScanPrivateSelectSql));
 	fsstate->retrieved_attrs = (List *) list_nth(fsplan->fdw_private,
 											   FdwScanPrivateRetrievedAttrs);
+	fsstate->fetch_size = intVal(list_nth(fsplan->fdw_private,
+								 FdwScanPrivateFetchSize));
 
 	/* Create contexts for batches of tuples and per-tuple temp workspace. */
 	fsstate->batch_cxt = AllocSetContextCreate(estate->es_query_cxt,
@@ -2275,15 +2286,11 @@ fetch_more_data(ForeignScanState *node)
 	{
 		PGconn	   *conn = fsstate->conn;
 		char		sql[64];
-		int			fetch_size;
 		int			numrows;
 		int			i;
 
-		/* The fetch size is arbitrary, but shouldn't be enormous. */
-		fetch_size = 100;
-
 		snprintf(sql, sizeof(sql), "FETCH %d FROM c%u",
-				 fetch_size, fsstate->cursor_number);
+				 fsstate->fetch_size, fsstate->cursor_number);
 
 		res = PQexec(conn, sql);
 		/* On error, report the original query, not the FETCH. */
@@ -2311,7 +2318,7 @@ fetch_more_data(ForeignScanState *node)
 			fsstate->fetch_ct_2++;
 
 		/* Must be EOF if we didn't get as many tuples as we asked for. */
-		fsstate->eof_reached = (numrows < fetch_size);
+		fsstate->eof_reached = (numrows < fsstate->fetch_size);
 
 		PQclear(res);
 		res = NULL;
@@ -2685,6 +2692,7 @@ postgresAcquireSampleRowsFunc(Relation relation, int elevel,
 			int			fetch_size;
 			int			numrows;
 			int			i;
+			ListCell   *lc;
 
 			/* Allow users to cancel long query */
 			CHECK_FOR_INTERRUPTS();
@@ -2695,8 +2703,27 @@ postgresAcquireSampleRowsFunc(Relation relation, int elevel,
 			 * then just adjust rowstoskip and samplerows appropriately.
 			 */
 
-			/* The fetch size is arbitrary, but shouldn't be enormous. */
 			fetch_size = 100;
+			foreach(lc, server->options)
+			{
+				DefElem    *def = (DefElem *) lfirst(lc);
+
+				if (strcmp(def->defname, "fetch_size") == 0)
+				{
+					fetch_size = strtol(defGetString(def), NULL,10);
+					break;
+				}
+			}
+			foreach(lc, table->options)
+			{
+				DefElem    *def = (DefElem *) lfirst(lc);
+
+				if (strcmp(def->defname, "fetch_size") == 0)
+				{
+					fetch_size = strtol(defGetString(def), NULL,10);
+					break;
+				}
+			}
 
 			/* Fetch some rows */
 			snprintf(fetch_sql, sizeof(fetch_sql), "FETCH %d FROM c%u",
diff --git a/contrib/postgres_fdw/postgres_fdw.h b/contrib/postgres_fdw/postgres_fdw.h
index 8553536..4b67c37 100644
--- a/contrib/postgres_fdw/postgres_fdw.h
+++ b/contrib/postgres_fdw/postgres_fdw.h
@@ -53,6 +53,8 @@ typedef struct PgFdwRelationInfo
 	ForeignTable *table;
 	ForeignServer *server;
 	UserMapping *user;			/* only set in use_remote_estimate mode */
+
+	int			fetch_size;      /* fetch size for this remote table */
 } PgFdwRelationInfo;
 
 /* in postgres_fdw.c */
diff --git a/doc/src/sgml/postgres-fdw.sgml b/doc/src/sgml/postgres-fdw.sgml
index 5a829d5..a90983c 100644
--- a/doc/src/sgml/postgres-fdw.sgml
+++ b/doc/src/sgml/postgres-fdw.sgml
@@ -290,17 +290,30 @@
        be considered shippable to the remote server.
        This option can only be specified for foreign servers, not per-table.
       </para>
+
+      <para>
+       When using the <literal>extensions</literal> option, <emphasis>it is the
+       user's responsibility</> that the listed extensions exist and behave
+       identically on both the local and remote servers.  Otherwise, remote
+       queries may fail or behave unexpectedly.
+      </para>
      </listitem>
     </varlistentry>
 
-   </variablelist>
+    <varlistentry>
+     <term><literal>fetch_size</literal></term>
+     <listitem>
+      <para>
+       This option specifies the number of rows <filename>postgres_fdw</> 
+       should get in each fetch operation. It can be specified for a foreign
+       table or a foreign server. The option specified on a table overrides
+       an option specified for the server.
+       The default is <literal>100</>.
+      </para>
+     </listitem>
+    </varlistentry>
 
-   <para>
-    When using the <literal>extensions</literal> option, <emphasis>it is the
-    user's responsibility</> that the listed extensions exist and behave
-    identically on both the local and remote servers.  Otherwise, remote
-    queries may fail or behave unexpectedly.
-   </para>
+   </variablelist>
 
   </sect3>

#29

Alvaro Herrera

alvherre@2ndquadrant.com

almost 10 years ago

In reply to: Corey Huinker (#28)

Re: [POC] FETCH limited by bytes.

Corey Huinker wrote:

Enjoy.

Didn't actually look into the patch but looks like there's progress
here and this might be heading for commit. Moving to next one.

--
ï¿½lvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#30

Robert Haas

robertmhaas@gmail.com

almost 10 years ago

In reply to: Corey Huinker (#28)

Re: [POC] FETCH limited by bytes.

On Wed, Jan 27, 2016 at 11:19 PM, Corey Huinker <corey.huinker@gmail.com> wrote:

The possible tests might be:
- creating a server/table with fetch_size set
- altering an existing server/table to have fetch_size set
- verifying that the value exists in pg_foreign_server.srvoptions and
pg_foreign_table.ftoptions.
- attempting to set fetch_size to a non-integer value

I'd add a test that does one of the first two and skip the others.
I'm not wedded to that exact thing; that's just a suggestion.

Enjoy.

I'd enjoy it more if, heh heh, it compiled.

postgres_fdw.c:2642:16: error: use of undeclared identifier 'server'
foreach(lc, server->options)
^
../../src/include/nodes/pg_list.h:153:26: note: expanded from macro 'foreach'
for ((cell) = list_head(l); (cell) != NULL; (cell) = lnext(cell))
^
1 error generated.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#31

Corey Huinker

corey.huinker@gmail.com

almost 10 years ago

In reply to: Robert Haas (#30)

Re: [POC] FETCH limited by bytes.

postgres_fdw.c:2642:16: error: use of undeclared identifier 'server'
foreach(lc, server->options)

Odd. Compiled for me. Maybe I messed up the diff. Will get back to you on
that with the tests suggested.

^
../../src/include/nodes/pg_list.h:153:26: note: expanded from macro
'foreach'
for ((cell) = list_head(l); (cell) != NULL; (cell) = lnext(cell))
^

Didn't modify this file on this or any other work of mine. Possible garbage
from a git pull. Will investigate.

#32

Robert Haas

robertmhaas@gmail.com

almost 10 years ago

In reply to: Corey Huinker (#31)

Re: [POC] FETCH limited by bytes.

On Tue, Feb 2, 2016 at 5:57 PM, Corey Huinker <corey.huinker@gmail.com> wrote:

postgres_fdw.c:2642:16: error: use of undeclared identifier 'server'
foreach(lc, server->options)

Odd. Compiled for me. Maybe I messed up the diff. Will get back to you on
that with the tests suggested.

I don't see how. There really is no declaration in there for a
variable called server.

../../src/include/nodes/pg_list.h:153:26: note: expanded from macro
'foreach'
for ((cell) = list_head(l); (cell) != NULL; (cell) = lnext(cell))
^

Didn't modify this file on this or any other work of mine. Possible garbage
from a git pull. Will investigate.

This is context information for the same error, not a separate problem.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#33

Corey Huinker

corey.huinker@gmail.com

almost 10 years ago

In reply to: Robert Haas (#32)

1 attachment(s)

Re: [POC] FETCH limited by bytes.

I don't see how. There really is no declaration in there for a
variable called server.

Absolutely correct. My only guess is that it was from failing to make clean
after a checkout/re-checkout. A good reason to have even boring regression
tests.

Regression tests added.

Attachments:

v5.fetch-size.patchtext/x-patch; charset=US-ASCII; name=v5.fetch-size.patchDownload

diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 2390e61..e28cf77 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -3951,3 +3951,63 @@ QUERY:  CREATE FOREIGN TABLE t5 (
 OPTIONS (schema_name 'import_source', table_name 't5');
 CONTEXT:  importing foreign table "t5"
 ROLLBACK;
+BEGIN;
+CREATE SERVER fetch101 FOREIGN DATA WRAPPER postgres_fdw OPTIONS( fetch_size '101' );
+SELECT count(*)
+FROM pg_foreign_server
+WHERE srvname = 'fetch101'
+AND srvoptions @> array['fetch_size=101'];
+ count 
+-------
+     1
+(1 row)
+
+ALTER SERVER fetch101 OPTIONS( SET fetch_size '202' );
+SELECT count(*)
+FROM pg_foreign_server
+WHERE srvname = 'fetch101'
+AND srvoptions @> array['fetch_size=101'];
+ count 
+-------
+     0
+(1 row)
+
+SELECT count(*)
+FROM pg_foreign_server
+WHERE srvname = 'fetch101'
+AND srvoptions @> array['fetch_size=202'];
+ count 
+-------
+     1
+(1 row)
+
+CREATE FOREIGN TABLE table30000 ( x int ) SERVER fetch101 OPTIONS ( fetch_size '30000' );
+SELECT COUNT(*)
+FROM pg_foreign_table
+WHERE ftrelid = 'table30000'::regclass
+AND ftoptions @> array['fetch_size=30000'];
+ count 
+-------
+     1
+(1 row)
+
+ALTER FOREIGN TABLE table30000 OPTIONS ( SET fetch_size '60000');
+SELECT COUNT(*)
+FROM pg_foreign_table
+WHERE ftrelid = 'table30000'::regclass
+AND ftoptions @> array['fetch_size=30000'];
+ count 
+-------
+     0
+(1 row)
+
+SELECT COUNT(*)
+FROM pg_foreign_table
+WHERE ftrelid = 'table30000'::regclass
+AND ftoptions @> array['fetch_size=60000'];
+ count 
+-------
+     1
+(1 row)
+
+ROLLBACK;
diff --git a/contrib/postgres_fdw/option.c b/contrib/postgres_fdw/option.c
index 86a5789..f89de2f 100644
--- a/contrib/postgres_fdw/option.c
+++ b/contrib/postgres_fdw/option.c
@@ -131,6 +131,17 @@ postgres_fdw_validator(PG_FUNCTION_ARGS)
 			/* check list syntax, warn about uninstalled extensions */
 			(void) ExtractExtensionList(defGetString(def), true);
 		}
+		else if (strcmp(def->defname, "fetch_size") == 0)
+		{
+			int		fetch_size;
+
+			fetch_size = strtol(defGetString(def), NULL,10);
+			if (fetch_size <= 0)
+				ereport(ERROR,
+						(errcode(ERRCODE_SYNTAX_ERROR),
+						 errmsg("%s requires a non-negative integer value",
+								def->defname)));
+		}
 	}
 
 	PG_RETURN_VOID();
@@ -162,6 +173,9 @@ InitPgFdwOptions(void)
 		/* updatable is available on both server and table */
 		{"updatable", ForeignServerRelationId, false},
 		{"updatable", ForeignTableRelationId, false},
+		/* fetch_size is available on both server and table */
+		{"fetch_size", ForeignServerRelationId, false},
+		{"fetch_size", ForeignTableRelationId, false},
 		{NULL, InvalidOid, false}
 	};
 
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 2ab85f6..8f72ff7 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -68,7 +68,9 @@ enum FdwScanPrivateIndex
 	/* SQL statement to execute remotely (as a String node) */
 	FdwScanPrivateSelectSql,
 	/* Integer list of attribute numbers retrieved by the SELECT */
-	FdwScanPrivateRetrievedAttrs
+	FdwScanPrivateRetrievedAttrs,
+	/* Integer representing the desired fetch_size */
+	FdwScanPrivateFetchSize
 };
 
 /*
@@ -126,6 +128,8 @@ typedef struct PgFdwScanState
 	/* working memory contexts */
 	MemoryContext batch_cxt;	/* context holding current batch of tuples */
 	MemoryContext temp_cxt;		/* context for per-tuple temporary data */
+
+	int			fetch_size;		/* number of tuples per fetch */
 } PgFdwScanState;
 
 /*
@@ -380,6 +384,7 @@ postgresGetForeignRelSize(PlannerInfo *root,
 	fpinfo->fdw_startup_cost = DEFAULT_FDW_STARTUP_COST;
 	fpinfo->fdw_tuple_cost = DEFAULT_FDW_TUPLE_COST;
 	fpinfo->shippable_extensions = NIL;
+	fpinfo->fetch_size = 100;
 
 	foreach(lc, fpinfo->server->options)
 	{
@@ -394,16 +399,17 @@ postgresGetForeignRelSize(PlannerInfo *root,
 		else if (strcmp(def->defname, "extensions") == 0)
 			fpinfo->shippable_extensions =
 				ExtractExtensionList(defGetString(def), false);
+		else if (strcmp(def->defname, "fetch_size") == 0)
+			fpinfo->fetch_size = strtol(defGetString(def), NULL,10);
 	}
 	foreach(lc, fpinfo->table->options)
 	{
 		DefElem    *def = (DefElem *) lfirst(lc);
 
 		if (strcmp(def->defname, "use_remote_estimate") == 0)
-		{
 			fpinfo->use_remote_estimate = defGetBoolean(def);
-			break;				/* only need the one value */
-		}
+		else if (strcmp(def->defname, "fetch_size") == 0)
+			fpinfo->fetch_size = strtol(defGetString(def), NULL,10);
 	}
 
 	/*
@@ -1012,6 +1018,9 @@ postgresGetForeignPlan(PlannerInfo *root,
 	 */
 	fdw_private = list_make2(makeString(sql.data),
 							 retrieved_attrs);
+	fdw_private = list_make3(makeString(sql.data),
+							 retrieved_attrs,
+							 makeInteger(fpinfo->fetch_size));
 
 	/*
 	 * Create the ForeignScan node from target list, filtering expressions,
@@ -1088,6 +1097,8 @@ postgresBeginForeignScan(ForeignScanState *node, int eflags)
 									 FdwScanPrivateSelectSql));
 	fsstate->retrieved_attrs = (List *) list_nth(fsplan->fdw_private,
 											   FdwScanPrivateRetrievedAttrs);
+	fsstate->fetch_size = intVal(list_nth(fsplan->fdw_private,
+								 FdwScanPrivateFetchSize));
 
 	/* Create contexts for batches of tuples and per-tuple temp workspace. */
 	fsstate->batch_cxt = AllocSetContextCreate(estate->es_query_cxt,
@@ -2214,15 +2225,11 @@ fetch_more_data(ForeignScanState *node)
 	{
 		PGconn	   *conn = fsstate->conn;
 		char		sql[64];
-		int			fetch_size;
 		int			numrows;
 		int			i;
 
-		/* The fetch size is arbitrary, but shouldn't be enormous. */
-		fetch_size = 100;
-
 		snprintf(sql, sizeof(sql), "FETCH %d FROM c%u",
-				 fetch_size, fsstate->cursor_number);
+				 fsstate->fetch_size, fsstate->cursor_number);
 
 		res = PQexec(conn, sql);
 		/* On error, report the original query, not the FETCH. */
@@ -2250,7 +2257,7 @@ fetch_more_data(ForeignScanState *node)
 			fsstate->fetch_ct_2++;
 
 		/* Must be EOF if we didn't get as many tuples as we asked for. */
-		fsstate->eof_reached = (numrows < fetch_size);
+		fsstate->eof_reached = (numrows < fsstate->fetch_size);
 
 		PQclear(res);
 		res = NULL;
@@ -2563,6 +2570,7 @@ postgresAcquireSampleRowsFunc(Relation relation, int elevel,
 {
 	PgFdwAnalyzeState astate;
 	ForeignTable *table;
+	ForeignServer *server;
 	UserMapping *user;
 	PGconn	   *conn;
 	unsigned int cursor_number;
@@ -2593,6 +2601,7 @@ postgresAcquireSampleRowsFunc(Relation relation, int elevel,
 	 * owner, even if the ANALYZE was started by some other user.
 	 */
 	table = GetForeignTable(RelationGetRelid(relation));
+	server = GetForeignServer(table->serverid);
 	user = GetUserMapping(relation->rd_rel->relowner, table->serverid);
 	conn = GetConnection(user, false);
 
@@ -2620,6 +2629,7 @@ postgresAcquireSampleRowsFunc(Relation relation, int elevel,
 			int			fetch_size;
 			int			numrows;
 			int			i;
+			ListCell   *lc;
 
 			/* Allow users to cancel long query */
 			CHECK_FOR_INTERRUPTS();
@@ -2630,8 +2640,27 @@ postgresAcquireSampleRowsFunc(Relation relation, int elevel,
 			 * then just adjust rowstoskip and samplerows appropriately.
 			 */
 
-			/* The fetch size is arbitrary, but shouldn't be enormous. */
 			fetch_size = 100;
+			foreach(lc, server->options)
+			{
+				DefElem    *def = (DefElem *) lfirst(lc);
+
+				if (strcmp(def->defname, "fetch_size") == 0)
+				{
+					fetch_size = strtol(defGetString(def), NULL,10);
+					break;
+				}
+			}
+			foreach(lc, table->options)
+			{
+				DefElem    *def = (DefElem *) lfirst(lc);
+
+				if (strcmp(def->defname, "fetch_size") == 0)
+				{
+					fetch_size = strtol(defGetString(def), NULL,10);
+					break;
+				}
+			}
 
 			/* Fetch some rows */
 			snprintf(fetch_sql, sizeof(fetch_sql), "FETCH %d FROM c%u",
diff --git a/contrib/postgres_fdw/postgres_fdw.h b/contrib/postgres_fdw/postgres_fdw.h
index bf83c91..2b63281 100644
--- a/contrib/postgres_fdw/postgres_fdw.h
+++ b/contrib/postgres_fdw/postgres_fdw.h
@@ -53,6 +53,8 @@ typedef struct PgFdwRelationInfo
 	ForeignTable *table;
 	ForeignServer *server;
 	UserMapping *user;			/* only set in use_remote_estimate mode */
+
+	int			fetch_size;      /* fetch size for this remote table */
 } PgFdwRelationInfo;
 
 /* in postgres_fdw.c */
diff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql b/contrib/postgres_fdw/sql/postgres_fdw.sql
index 5c6ead1..ec8a30a 100644
--- a/contrib/postgres_fdw/sql/postgres_fdw.sql
+++ b/contrib/postgres_fdw/sql/postgres_fdw.sql
@@ -919,4 +919,48 @@ BEGIN;
 DROP TYPE "Colors" CASCADE;
 IMPORT FOREIGN SCHEMA import_source LIMIT TO (t5)
   FROM SERVER loopback INTO import_dest5;  -- ERROR
+
+ROLLBACK;
+
+BEGIN;
+
+
+CREATE SERVER fetch101 FOREIGN DATA WRAPPER postgres_fdw OPTIONS( fetch_size '101' );
+
+SELECT count(*)
+FROM pg_foreign_server
+WHERE srvname = 'fetch101'
+AND srvoptions @> array['fetch_size=101'];
+
+ALTER SERVER fetch101 OPTIONS( SET fetch_size '202' );
+
+SELECT count(*)
+FROM pg_foreign_server
+WHERE srvname = 'fetch101'
+AND srvoptions @> array['fetch_size=101'];
+
+SELECT count(*)
+FROM pg_foreign_server
+WHERE srvname = 'fetch101'
+AND srvoptions @> array['fetch_size=202'];
+
+CREATE FOREIGN TABLE table30000 ( x int ) SERVER fetch101 OPTIONS ( fetch_size '30000' );
+
+SELECT COUNT(*)
+FROM pg_foreign_table
+WHERE ftrelid = 'table30000'::regclass
+AND ftoptions @> array['fetch_size=30000'];
+
+ALTER FOREIGN TABLE table30000 OPTIONS ( SET fetch_size '60000');
+
+SELECT COUNT(*)
+FROM pg_foreign_table
+WHERE ftrelid = 'table30000'::regclass
+AND ftoptions @> array['fetch_size=30000'];
+
+SELECT COUNT(*)
+FROM pg_foreign_table
+WHERE ftrelid = 'table30000'::regclass
+AND ftoptions @> array['fetch_size=60000'];
+
 ROLLBACK;
diff --git a/doc/src/sgml/postgres-fdw.sgml b/doc/src/sgml/postgres-fdw.sgml
index 5a829d5..a90983c 100644
--- a/doc/src/sgml/postgres-fdw.sgml
+++ b/doc/src/sgml/postgres-fdw.sgml
@@ -290,17 +290,30 @@
        be considered shippable to the remote server.
        This option can only be specified for foreign servers, not per-table.
       </para>
+
+      <para>
+       When using the <literal>extensions</literal> option, <emphasis>it is the
+       user's responsibility</> that the listed extensions exist and behave
+       identically on both the local and remote servers.  Otherwise, remote
+       queries may fail or behave unexpectedly.
+      </para>
      </listitem>
     </varlistentry>
 
-   </variablelist>
+    <varlistentry>
+     <term><literal>fetch_size</literal></term>
+     <listitem>
+      <para>
+       This option specifies the number of rows <filename>postgres_fdw</> 
+       should get in each fetch operation. It can be specified for a foreign
+       table or a foreign server. The option specified on a table overrides
+       an option specified for the server.
+       The default is <literal>100</>.
+      </para>
+     </listitem>
+    </varlistentry>
 
-   <para>
-    When using the <literal>extensions</literal> option, <emphasis>it is the
-    user's responsibility</> that the listed extensions exist and behave
-    identically on both the local and remote servers.  Otherwise, remote
-    queries may fail or behave unexpectedly.
-   </para>
+   </variablelist>
 
   </sect3>

#34

Robert Haas

robertmhaas@gmail.com

almost 10 years ago

In reply to: Corey Huinker (#33)

Re: [POC] FETCH limited by bytes.

On Tue, Feb 2, 2016 at 10:42 PM, Corey Huinker <corey.huinker@gmail.com> wrote:

I don't see how. There really is no declaration in there for a
variable called server.

Absolutely correct. My only guess is that it was from failing to make clean
after a checkout/re-checkout. A good reason to have even boring regression
tests.

Regression tests added.

Committed.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#35

Ashutosh Bapat

ashutosh.bapat@enterprisedb.com

almost 10 years ago

In reply to: Robert Haas (#34)

1 attachment(s)

Re: [POC] FETCH limited by bytes.

There seems to be double successive assignment to fdw_private in recent
commit. Here's patch to remove the first one.

On Wed, Feb 3, 2016 at 7:40 PM, Robert Haas <robertmhaas@gmail.com> wrote:

On Tue, Feb 2, 2016 at 10:42 PM, Corey Huinker <corey.huinker@gmail.com>
wrote:

I don't see how. There really is no declaration in there for a
variable called server.

Absolutely correct. My only guess is that it was from failing to make

clean

after a checkout/re-checkout. A good reason to have even boring

regression

tests.

Regression tests added.

Committed.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Attachments:

pg_double_assignment.patchtext/plain; charset=US-ASCII; name=pg_double_assignment.patchDownload

diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index d5c0383..c95ac05 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -1009,22 +1009,20 @@ postgresGetForeignPlan(PlannerInfo *root,
 	 * expressions to be sent as parameters.
 	 */
 	initStringInfo(&sql);
 	deparseSelectStmtForRel(&sql, root, baserel, remote_conds,
 							best_path->path.pathkeys, &retrieved_attrs,
 							&params_list);
 	/*
 	 * Build the fdw_private list that will be available to the executor.
 	 * Items in the list must match enum FdwScanPrivateIndex, above.
 	 */
-	fdw_private = list_make2(makeString(sql.data),
-							 retrieved_attrs);
 	fdw_private = list_make3(makeString(sql.data),
 							 retrieved_attrs,
 							 makeInteger(fpinfo->fetch_size));
 
 	/*
 	 * Create the ForeignScan node from target list, filtering expressions,
 	 * remote parameter expressions, and FDW private information.
 	 *
 	 * Note that the remote parameter expressions are stored in the fdw_exprs
 	 * field of the finished plan node; we can't keep them in private state

#36

Robert Haas

robertmhaas@gmail.com

almost 10 years ago

In reply to: Ashutosh Bapat (#35)

Re: [POC] FETCH limited by bytes.

On Wed, Feb 3, 2016 at 11:17 AM, Ashutosh Bapat
<ashutosh.bapat@enterprisedb.com> wrote:

There seems to be double successive assignment to fdw_private in recent
commit. Here's patch to remove the first one.

Committed along with a fix for another problem I noted along the way.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers