Let file_fdw access COPY FROM PROGRAM

Started by Corey Huinkerover 9 years ago25 messages
#1Corey Huinker
corey.huinker@gmail.com
1 attachment(s)

A while back, there was a push to make COPY gzip-aware. That didn't happen,
but COPY FROM PROGRAM did, and it scratches the same itch.

I have a similar need, but with file_fdw foreign tables. I have .csv.gz
files downloaded to the server, but those CSVs have 100+ columns in them,
and in this case I only really care about a half dozen of those columns.
I'd like to avoid:
- the overhead of writing the uncompressed file to disk and then
immediately re-reading it
- writing unwanted columns to a temp/work table via COPY, and then
immediately re-reading them
- multicorn fdw because it ends up making a python string out of all data
cells
- a csv parsing tool like csvtool or mlr, because they output another CSV
which must be reparsed from scratch

Since file_fdw leverages COPY, it seemed like it would be easy to add the
FROM PROGRAM feature to file_fdw. I began asking questions on #postgresql
IRC, only to discover that Adam Gomaa ( akgomaa@gmail.com ) had already
written such a thing, but hadn't submitted it. Attached is a small rework
of his patch, along with documentation.

NOTE: The regression test includes unix commands in the program option. I
figured that wouldn't work for win32 systems, so I checked to see what the
regression tests do to test COPY FROM PROGRAM...and I couldn't find any. So
I guess the test exists as a proof of concept that will get excised before
final commit.

Attachments:

file_fdw_program_option-2016-06-02-v2.difftext/plain; charset=US-ASCII; name=file_fdw_program_option-2016-06-02-v2.diffDownload
diff --git a/contrib/file_fdw/file_fdw.c b/contrib/file_fdw/file_fdw.c
index bc4d2d7..97df270 100644
--- a/contrib/file_fdw/file_fdw.c
+++ b/contrib/file_fdw/file_fdw.c
@@ -59,6 +59,7 @@ struct FileFdwOption
 static const struct FileFdwOption valid_options[] = {
 	/* File options */
 	{"filename", ForeignTableRelationId},
+	{"program", ForeignTableRelationId},
 
 	/* Format options */
 	/* oids option is not supported */
@@ -86,9 +87,10 @@ static const struct FileFdwOption valid_options[] = {
 typedef struct FileFdwPlanState
 {
 	char	   *filename;		/* file to read */
-	List	   *options;		/* merged COPY options, excluding filename */
-	BlockNumber pages;			/* estimate of file's physical size */
-	double		ntuples;		/* estimate of number of rows in file */
+	char	   *program;		/* program to read output from */
+	List	   *options;		/* merged COPY options, excluding filename and program */
+	BlockNumber pages;			/* estimate of file or program output's physical size */
+	double		ntuples;		/* estimate of number of rows in file or program output */
 } FileFdwPlanState;
 
 /*
@@ -97,8 +99,9 @@ typedef struct FileFdwPlanState
 typedef struct FileFdwExecutionState
 {
 	char	   *filename;		/* file to read */
-	List	   *options;		/* merged COPY options, excluding filename */
-	CopyState	cstate;			/* state of reading file */
+	char	   *program;		/* program to read output from */
+	List	   *options;		/* merged COPY options, excluding filename and program */
+	CopyState	cstate;			/* state of reading file or program */
 } FileFdwExecutionState;
 
 /*
@@ -139,7 +142,9 @@ static bool fileIsForeignScanParallelSafe(PlannerInfo *root, RelOptInfo *rel,
  */
 static bool is_valid_option(const char *option, Oid context);
 static void fileGetOptions(Oid foreigntableid,
-			   char **filename, List **other_options);
+						   char **filename,
+						   char **program,
+						   List **other_options);
 static List *get_file_fdw_attribute_options(Oid relid);
 static bool check_selective_binary_conversion(RelOptInfo *baserel,
 								  Oid foreigntableid,
@@ -189,6 +194,7 @@ file_fdw_validator(PG_FUNCTION_ARGS)
 	List	   *options_list = untransformRelOptions(PG_GETARG_DATUM(0));
 	Oid			catalog = PG_GETARG_OID(1);
 	char	   *filename = NULL;
+	char	   *program = NULL;
 	DefElem    *force_not_null = NULL;
 	DefElem    *force_null = NULL;
 	List	   *other_options = NIL;
@@ -196,16 +202,17 @@ file_fdw_validator(PG_FUNCTION_ARGS)
 
 	/*
 	 * Only superusers are allowed to set options of a file_fdw foreign table.
-	 * This is because the filename is one of those options, and we don't want
-	 * non-superusers to be able to determine which file gets read.
+	 * This is because the filename or program string are two of those
+	 * options, and we don't want non-superusers to be able to determine which
+	 * file gets read or what command is run.
 	 *
 	 * Putting this sort of permissions check in a validator is a bit of a
 	 * crock, but there doesn't seem to be any other place that can enforce
 	 * the check more cleanly.
 	 *
-	 * Note that the valid_options[] array disallows setting filename at any
-	 * options level other than foreign table --- otherwise there'd still be a
-	 * security hole.
+	 * Note that the valid_options[] array disallows setting filename and
+	 * program at any options level other than foreign table --- otherwise
+	 * there'd still be a security hole.
 	 */
 	if (catalog == ForeignTableRelationId && !superuser())
 		ereport(ERROR,
@@ -247,7 +254,7 @@ file_fdw_validator(PG_FUNCTION_ARGS)
 		}
 
 		/*
-		 * Separate out filename and column-specific options, since
+		 * Separate out filename, program, and column-specific options, since
 		 * ProcessCopyOptions won't accept them.
 		 */
 
@@ -257,9 +264,26 @@ file_fdw_validator(PG_FUNCTION_ARGS)
 				ereport(ERROR,
 						(errcode(ERRCODE_SYNTAX_ERROR),
 						 errmsg("conflicting or redundant options")));
+			if (program)
+				ereport(ERROR,
+						(errcode(ERRCODE_SYNTAX_ERROR),
+						 errmsg("conflicting or redundant options")));
 			filename = defGetString(def);
 		}
 
+		else if (strcmp(def->defname, "program") == 0)
+		{
+			if (filename)
+				ereport(ERROR,
+						(errcode(ERRCODE_SYNTAX_ERROR),
+						 errmsg("conflicting or redundant options")));
+			if (program)
+				ereport(ERROR,
+						(errcode(ERRCODE_SYNTAX_ERROR),
+						 errmsg("conflicting or redundant options")));
+			program = defGetString(def);
+		}
+
 		/*
 		 * force_not_null is a boolean option; after validation we can discard
 		 * it - it will be retrieved later in get_file_fdw_attribute_options()
@@ -296,12 +320,13 @@ file_fdw_validator(PG_FUNCTION_ARGS)
 	ProcessCopyOptions(NULL, true, other_options);
 
 	/*
-	 * Filename option is required for file_fdw foreign tables.
+	 * Either filename or program option is required for file_fdw foreign
+	 * tables.
 	 */
-	if (catalog == ForeignTableRelationId && filename == NULL)
+	if (catalog == ForeignTableRelationId && filename == NULL && program == NULL)
 		ereport(ERROR,
 				(errcode(ERRCODE_FDW_DYNAMIC_PARAMETER_VALUE_NEEDED),
-				 errmsg("filename is required for file_fdw foreign tables")));
+				 errmsg("either filename or program is required for file_fdw foreign tables")));
 
 	PG_RETURN_VOID();
 }
@@ -326,12 +351,13 @@ is_valid_option(const char *option, Oid context)
 /*
  * Fetch the options for a file_fdw foreign table.
  *
- * We have to separate out "filename" from the other options because
- * it must not appear in the options list passed to the core COPY code.
+ * We have to separate out "filename" and "program" from the other options
+ * because it must not appear in the options list passed to the core COPY
+ * code.
  */
 static void
 fileGetOptions(Oid foreigntableid,
-			   char **filename, List **other_options)
+			   char **filename, char **program, List **other_options)
 {
 	ForeignTable *table;
 	ForeignServer *server;
@@ -359,9 +385,10 @@ fileGetOptions(Oid foreigntableid,
 	options = list_concat(options, get_file_fdw_attribute_options(foreigntableid));
 
 	/*
-	 * Separate out the filename.
+	 * Separate out the filename or program.
 	 */
 	*filename = NULL;
+	*program = NULL;
 	prev = NULL;
 	foreach(lc, options)
 	{
@@ -373,6 +400,12 @@ fileGetOptions(Oid foreigntableid,
 			options = list_delete_cell(options, lc, prev);
 			break;
 		}
+		else if (strcmp(def->defname, "program") == 0)
+		{
+			*program = defGetString(def);
+			options = list_delete_cell(options, lc, prev);
+			break;
+		}
 		prev = lc;
 	}
 
@@ -380,8 +413,8 @@ fileGetOptions(Oid foreigntableid,
 	 * The validator should have checked that a filename was included in the
 	 * options, but check again, just in case.
 	 */
-	if (*filename == NULL)
-		elog(ERROR, "filename is required for file_fdw foreign tables");
+	if (*filename == NULL && *program == NULL)
+		elog(ERROR, "either filename or program is required for file_fdw foreign tables");
 
 	*other_options = options;
 }
@@ -475,12 +508,13 @@ fileGetForeignRelSize(PlannerInfo *root,
 	FileFdwPlanState *fdw_private;
 
 	/*
-	 * Fetch options.  We only need filename at this point, but we might as
-	 * well get everything and not need to re-fetch it later in planning.
+	 * Fetch options.  We only need filename (or program) at this point, but
+	 * we might as well get everything and not need to re-fetch it later in
+	 * planning.
 	 */
 	fdw_private = (FileFdwPlanState *) palloc(sizeof(FileFdwPlanState));
 	fileGetOptions(foreigntableid,
-				   &fdw_private->filename, &fdw_private->options);
+				   &fdw_private->filename, &fdw_private->program, &fdw_private->options);
 	baserel->fdw_private = (void *) fdw_private;
 
 	/* Estimate relation size */
@@ -583,20 +617,24 @@ static void
 fileExplainForeignScan(ForeignScanState *node, ExplainState *es)
 {
 	char	   *filename;
+	char	   *program;
 	List	   *options;
 
 	/* Fetch options --- we only need filename at this point */
 	fileGetOptions(RelationGetRelid(node->ss.ss_currentRelation),
-				   &filename, &options);
+				   &filename, &program, &options);
 
-	ExplainPropertyText("Foreign File", filename, es);
+	if(filename)
+		ExplainPropertyText("Foreign File", filename, es);
+	else
+		ExplainPropertyText("Foreign Program", program, es);
 
 	/* Suppress file size if we're not showing cost details */
 	if (es->costs)
 	{
 		struct stat stat_buf;
 
-		if (stat(filename, &stat_buf) == 0)
+		if (filename && (stat(filename, &stat_buf) == 0))
 			ExplainPropertyLong("Foreign File Size", (long) stat_buf.st_size,
 								es);
 	}
@@ -611,6 +649,7 @@ fileBeginForeignScan(ForeignScanState *node, int eflags)
 {
 	ForeignScan *plan = (ForeignScan *) node->ss.ps.plan;
 	char	   *filename;
+	char	   *program;
 	List	   *options;
 	CopyState	cstate;
 	FileFdwExecutionState *festate;
@@ -623,7 +662,7 @@ fileBeginForeignScan(ForeignScanState *node, int eflags)
 
 	/* Fetch options of foreign table */
 	fileGetOptions(RelationGetRelid(node->ss.ss_currentRelation),
-				   &filename, &options);
+				   &filename, &program, &options);
 
 	/* Add any options from the plan (currently only convert_selectively) */
 	options = list_concat(options, plan->fdw_private);
@@ -632,11 +671,18 @@ fileBeginForeignScan(ForeignScanState *node, int eflags)
 	 * Create CopyState from FDW options.  We always acquire all columns, so
 	 * as to match the expected ScanTupleSlot signature.
 	 */
-	cstate = BeginCopyFrom(node->ss.ss_currentRelation,
-						   filename,
-						   false,
-						   NIL,
-						   options);
+	if(filename)
+		cstate = BeginCopyFrom(node->ss.ss_currentRelation,
+							   filename,
+							   false,
+							   NIL,
+							   options);
+	else
+		cstate = BeginCopyFrom(node->ss.ss_currentRelation,
+							   program,
+							   true,
+							   NIL,
+							   options);
 
 	/*
 	 * Save state in node->fdw_state.  We must save enough information to call
@@ -644,6 +690,7 @@ fileBeginForeignScan(ForeignScanState *node, int eflags)
 	 */
 	festate = (FileFdwExecutionState *) palloc(sizeof(FileFdwExecutionState));
 	festate->filename = filename;
+	festate->program = program;
 	festate->options = options;
 	festate->cstate = cstate;
 
@@ -705,11 +752,19 @@ fileReScanForeignScan(ForeignScanState *node)
 
 	EndCopyFrom(festate->cstate);
 
-	festate->cstate = BeginCopyFrom(node->ss.ss_currentRelation,
-									festate->filename,
-									false,
-									NIL,
-									festate->options);
+	if(festate->filename)
+		festate->cstate = BeginCopyFrom(node->ss.ss_currentRelation,
+										festate->filename,
+										false,
+										NIL,
+										festate->options);
+	else
+		festate->cstate = BeginCopyFrom(node->ss.ss_currentRelation,
+										festate->program,
+										true,
+										NIL,
+										festate->options);
+
 }
 
 /*
@@ -736,11 +791,19 @@ fileAnalyzeForeignTable(Relation relation,
 						BlockNumber *totalpages)
 {
 	char	   *filename;
+	char	   *program;
 	List	   *options;
 	struct stat stat_buf;
 
 	/* Fetch options of foreign table */
-	fileGetOptions(RelationGetRelid(relation), &filename, &options);
+	fileGetOptions(RelationGetRelid(relation), &filename, &program, &options);
+
+	/*
+	 * If this is a program instead of a file, just return false to skip
+	 * analyzing the table.
+	 */
+	if (program)
+		return false;
 
 	/*
 	 * Get size of the file.  (XXX if we fail here, would it be better to just
@@ -914,9 +977,11 @@ estimate_size(PlannerInfo *root, RelOptInfo *baserel,
 
 	/*
 	 * Get size of the file.  It might not be there at plan time, though, in
-	 * which case we have to use a default estimate.
+	 * which case we have to use a default estimate.  We also have to fall
+	 * back to the default if using a program as the input.
 	 */
-	if (stat(fdw_private->filename, &stat_buf) < 0)
+	if (fdw_private->filename == NULL ||
+		stat(fdw_private->filename, &stat_buf) < 0)
 		stat_buf.st_size = 10 * BLCKSZ;
 
 	/*
@@ -1034,6 +1099,7 @@ file_acquire_sample_rows(Relation onerel, int elevel,
 	bool	   *nulls;
 	bool		found;
 	char	   *filename;
+	char	   *program;
 	List	   *options;
 	CopyState	cstate;
 	ErrorContextCallback errcallback;
@@ -1048,12 +1114,15 @@ file_acquire_sample_rows(Relation onerel, int elevel,
 	nulls = (bool *) palloc(tupDesc->natts * sizeof(bool));
 
 	/* Fetch options of foreign table */
-	fileGetOptions(RelationGetRelid(onerel), &filename, &options);
+	fileGetOptions(RelationGetRelid(onerel), &filename, &program, &options);
 
 	/*
 	 * Create CopyState from FDW options.
 	 */
-	cstate = BeginCopyFrom(onerel, filename, false, NIL, options);
+	if(filename)
+		cstate = BeginCopyFrom(onerel, filename, false, NIL, options);
+	else
+		cstate = BeginCopyFrom(onerel, program, true, NIL, options);
 
 	/*
 	 * Use per-tuple memory context to prevent leak of memory used to read
diff --git a/contrib/file_fdw/input/file_fdw.source b/contrib/file_fdw/input/file_fdw.source
index 35db4af..207f8e2 100644
--- a/contrib/file_fdw/input/file_fdw.source
+++ b/contrib/file_fdw/input/file_fdw.source
@@ -97,6 +97,17 @@ SELECT * FROM text_csv;
 ALTER FOREIGN TABLE text_csv ALTER COLUMN word1 OPTIONS (force_null 'true');
 ALTER FOREIGN TABLE text_csv ALTER COLUMN word3 OPTIONS (force_not_null 'true');
 
+-- program tests
+CREATE FOREIGN TABLE text_tsv_program (
+    word1 text OPTIONS (force_not_null 'true'),
+    word2 text OPTIONS (force_not_null 'off'),
+    word3 text OPTIONS (force_null 'true'),
+    word4 text OPTIONS (force_null 'off')
+) SERVER file_server
+OPTIONS (format 'csv', program 'sed -e ''s/,/\t/g'' @abs_srcdir@/data/text.csv', null 'NULL', delimiter e'\t');
+\pset null _null_
+SELECT * FROM text_tsv_program;
+
 -- force_not_null is not allowed to be specified at any foreign object level:
 ALTER FOREIGN DATA WRAPPER file_fdw OPTIONS (ADD force_not_null '*'); -- ERROR
 ALTER SERVER file_server OPTIONS (ADD force_not_null '*'); -- ERROR
diff --git a/doc/src/sgml/file-fdw.sgml b/doc/src/sgml/file-fdw.sgml
index d3b39aa..f36a3c5 100644
--- a/doc/src/sgml/file-fdw.sgml
+++ b/doc/src/sgml/file-fdw.sgml
@@ -27,7 +27,26 @@
 
    <listitem>
     <para>
-     Specifies the file to be read.  Required.  Must be an absolute path name.
+     Specifies the file to be read.  Must be an absolute path name.
+     Either <literal>filename</literal> or <literal>program</literal> must be
+     specified.  They are mutually exclusive.
+    </para>
+   </listitem>
+  </varlistentry>
+
+  <varlistentry>
+   <term><literal>program</literal></term>
+
+   <listitem>
+    <para>
+     Specifies the command to executed.
+     Note that the command is invoked by the shell, so if you need to pass any
+     arguments to shell command that come from an untrusted source, you must
+     be careful to strip or escape any special characters that might have a
+     special meaning for the shell. For security reasons, it is best to use a
+     fixed command string, or at least avoid passing any user input in it.
+     Either <literal>program</literal> or <literal>filename</literal> must be
+     specified.  They are mutually exclusive.
     </para>
    </listitem>
   </varlistentry>
#2Craig Ringer
craig@2ndquadrant.com
In reply to: Corey Huinker (#1)
Re: Let file_fdw access COPY FROM PROGRAM

On 3 June 2016 at 04:48, Corey Huinker <corey.huinker@gmail.com> wrote:

A while back, there was a push to make COPY gzip-aware. That didn't
happen, but COPY FROM PROGRAM did, and it scratches the same itch.

- writing unwanted columns to a temp/work table via COPY, and then
immediately re-reading them

Without wanting to take away from the value of letting file FDW access FROM
PROGRAM, I think this really merits a solution that applies to COPY as
well. Variants on "how do I COPY just some columns from a CSV" is a real
FAQ, and it doesn't seem like it'd be excessively hard to support. We'd
just need some way to pass a list of column-ordinals or header-names.

Not asking you to do that work, just pointing out that this particular
issue applies to COPY its self as well.

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#3Corey Huinker
corey.huinker@gmail.com
In reply to: Craig Ringer (#2)
Re: Let file_fdw access COPY FROM PROGRAM

On Fri, Jun 3, 2016 at 1:06 AM, Craig Ringer <craig@2ndquadrant.com> wrote:

On 3 June 2016 at 04:48, Corey Huinker <corey.huinker@gmail.com> wrote:

A while back, there was a push to make COPY gzip-aware. That didn't
happen, but COPY FROM PROGRAM did, and it scratches the same itch.

- writing unwanted columns to a temp/work table via COPY, and then
immediately re-reading them

Without wanting to take away from the value of letting file FDW access
FROM PROGRAM, I think this really merits a solution that applies to COPY as
well. Variants on "how do I COPY just some columns from a CSV" is a real
FAQ, and it doesn't seem like it'd be excessively hard to support. We'd
just need some way to pass a list of column-ordinals or header-names.

Not asking you to do that work, just pointing out that this particular
issue applies to COPY its self as well.

I agree, they are two different but slightly overlapping issues. file_fdw
needs a way to handle compressed/filtered files, and COPY needs the ability
to skip columns. But right now COPY is all about getting the shape of the
input from the shape of the destination table.

I had the bright idea of creating a custom datatype, call it "skip" /
"filler" / "nada" / "devnull" or something like that, that would map any
and all values to NULL, thus allowing COPY to naively "store" the unwanted
values into nothingness.

That idea falls apart when it hits InputFunctionCall() in fmgr.c, which
prevents any text string from coercing into NULL. I didn't think I was up
to the task of making InputFunctionCall respect a special case type (void?)
or somehow promoting the pseudo type void into a legit column type (unknown
side-effects there). I did create a type that coerced all input into the
empty string, but the disk usage of that was still pretty significant.

So maybe if there was a subspecies of COPY that returned SETOF RECORD:

SELECT important_column1, important_column2
FROM copy_srf( program := 'zcat filename', format := 'csv', header := true
) AS t(bogus_col1 text, important_column1 integer, important_column2 date,
bogus_col2 text, ... );

That would allow COPY to keep it's current efficiency and simplicity, while
(hopefully) avoiding the unwanted data from ever hitting disk. It would
also be vaguely reminiscent of SQL*Loader.

#4Robert Haas
robertmhaas@gmail.com
In reply to: Corey Huinker (#1)
Re: Let file_fdw access COPY FROM PROGRAM

On Thu, Jun 2, 2016 at 4:48 PM, Corey Huinker <corey.huinker@gmail.com> wrote:

A while back, there was a push to make COPY gzip-aware. That didn't happen,
but COPY FROM PROGRAM did, and it scratches the same itch.

I have a similar need, but with file_fdw foreign tables. I have .csv.gz
files downloaded to the server, but those CSVs have 100+ columns in them,
and in this case I only really care about a half dozen of those columns. I'd
like to avoid:
- the overhead of writing the uncompressed file to disk and then immediately
re-reading it
- writing unwanted columns to a temp/work table via COPY, and then
immediately re-reading them
- multicorn fdw because it ends up making a python string out of all data
cells
- a csv parsing tool like csvtool or mlr, because they output another CSV
which must be reparsed from scratch

Since file_fdw leverages COPY, it seemed like it would be easy to add the
FROM PROGRAM feature to file_fdw. I began asking questions on #postgresql
IRC, only to discover that Adam Gomaa ( akgomaa@gmail.com ) had already
written such a thing, but hadn't submitted it. Attached is a small rework of
his patch, along with documentation.

His failure to submit that here himself raises the question of whether
he is OK with that code being released under the PostgreSQL license.
If this patch is going to be considered, I think we should have a post
from him clarifying that matter.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#5Adam Gomaa
akgomaa@gmail.com
In reply to: Robert Haas (#4)
Re: Let file_fdw access COPY FROM PROGRAM

I'm fine with the code being released under the PostgreSQL license.

Thanks,
Adam
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#6Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Corey Huinker (#1)
Re: Let file_fdw access COPY FROM PROGRAM

Hi Corey,

Here are some comments and a review of the patch.

On 2016/06/03 5:48, Corey Huinker wrote:

A while back, there was a push to make COPY gzip-aware. That didn't happen,
but COPY FROM PROGRAM did, and it scratches the same itch.

I have a similar need, but with file_fdw foreign tables. I have .csv.gz
files downloaded to the server, but those CSVs have 100+ columns in them,
and in this case I only really care about a half dozen of those columns.
I'd like to avoid:
- the overhead of writing the uncompressed file to disk and then
immediately re-reading it
- writing unwanted columns to a temp/work table via COPY, and then
immediately re-reading them
- multicorn fdw because it ends up making a python string out of all data
cells
- a csv parsing tool like csvtool or mlr, because they output another CSV
which must be reparsed from scratch

This feature seems desirable to me too.

Since file_fdw leverages COPY, it seemed like it would be easy to add the
FROM PROGRAM feature to file_fdw. I began asking questions on #postgresql
IRC, only to discover that Adam Gomaa ( akgomaa@gmail.com ) had already
written such a thing, but hadn't submitted it. Attached is a small rework
of his patch, along with documentation.

Yeah, the fact that file_fdw leverages COPY makes including this feature a
breeze. Various concerns such as security, popen() vs execve() were
addressed when COPY FROM PROGRAM/STDIN/STDOUT feature was proposed [1]/messages/by-id/002101cd9190$081c4140$1854c3c0$@lab.ntt.co.jp, so
we will not have to go through that again (hopefully).

NOTE: The regression test includes unix commands in the program option. I
figured that wouldn't work for win32 systems, so I checked to see what the
regression tests do to test COPY FROM PROGRAM...and I couldn't find any. So
I guess the test exists as a proof of concept that will get excised before
final commit.

I am not familiar with win32 stuff too, so I don't have much to say about
that. Maybe someone else can chime in to help with that.

About the patch:

* applies cleanly, compiles fine
* basic functionality seems to work (have not done any extensive tests though)

-     Specifies the file to be read.  Required.  Must be an absolute path
name.
+     Specifies the file to be read.  Must be an absolute path name.
+     Either <literal>filename</literal> or <literal>program</literal> must be
+     specified.  They are mutually exclusive.
+    </para>
+   </listitem>
+  </varlistentry>

The note about filename and program being mutually exclusive could be
placed at the end of list of options. Or perhaps mention this note as
part of the description of program option, because filename is already
defined by that point.

+   <listitem>
+    <para>
+     Specifies the command to executed.

s/to executed/to be executed/g

+     Either <literal>program</literal> or <literal>filename</literal> must be
+     specified.  They are mutually exclusive.
     </para>

Oh I see this has already been mentioned in program option description.
Did you intend to specify the same in both filename and program descriptions?

@@ -97,8 +99,9 @@ typedef struct FileFdwPlanState
 typedef struct FileFdwExecutionState
 {
     char       *filename;       /* file to read */
-    List       *options;        /* merged COPY options, excluding filename */
-    CopyState   cstate;         /* state of reading file */
+    char       *program;        /* program to read output from */
+    List       *options;        /* merged COPY options, excluding
filename and program */
+    CopyState   cstate;         /* state of reading file or program */

Have you considered a Boolean flag is_program instead of char **program
similar to how copy.c does it? (See a related comment further below)

-     * This is because the filename is one of those options, and we don't
want
-     * non-superusers to be able to determine which file gets read.
+     * This is because the filename or program string are two of those
+     * options, and we don't want non-superusers to be able to determine
which
+     * file gets read or what command is run.

I'm no native English speaker, but I wonder if this should read: filename
or program string *is* one of those options ... OR filename *and* program
are two of those options ... ? Also, "are two of those options" sounds a
bit odd to me because I don't see that used as often as "one of those
whatever". I guess that's quite a bit of nitpicking about a C comment, ;)

@@ -257,9 +264,26 @@ file_fdw_validator(PG_FUNCTION_ARGS)
                 ereport(ERROR,
                         (errcode(ERRCODE_SYNTAX_ERROR),
                          errmsg("conflicting or redundant options")));
+            if (program)
+                ereport(ERROR,
+                        (errcode(ERRCODE_SYNTAX_ERROR),
+                         errmsg("conflicting or redundant options")));
             filename = defGetString(def);
         }
+        else if (strcmp(def->defname, "program") == 0)
+        {
+            if (filename)
+                ereport(ERROR,
+                        (errcode(ERRCODE_SYNTAX_ERROR),
+                         errmsg("conflicting or redundant options")));
+            if (program)
+                ereport(ERROR,
+                        (errcode(ERRCODE_SYNTAX_ERROR),
+                         errmsg("conflicting or redundant options")));
+            program = defGetString(def);
+        }

Why does it check for filename here? Also the other way around above.

@@ -632,11 +671,18 @@ fileBeginForeignScan(ForeignScanState *node, int eflags)
      * Create CopyState from FDW options.  We always acquire all columns, so
      * as to match the expected ScanTupleSlot signature.
      */
-    cstate = BeginCopyFrom(node->ss.ss_currentRelation,
-                           filename,
-                           false,
-                           NIL,
-                           options);
+    if(filename)
+        cstate = BeginCopyFrom(node->ss.ss_currentRelation,
+                               filename,
+                               false,
+                               NIL,
+                               options);
+    else
+        cstate = BeginCopyFrom(node->ss.ss_currentRelation,
+                               program,
+                               true,
+                               NIL,
+                               options)

Like I mentioned above, if there was a is_program Boolean flag instead of
separate filename and program, this would be just:

+    cstate = BeginCopyFrom(node->ss.ss_currentRelation,
+                           filename,
+                           is_program,
+                           NIL,
+                           options);

That would get rid of if-else blocks here and a couple of other places.

diff --git a/contrib/file_fdw/input/file_fdw.source
b/contrib/file_fdw/input/file_fdw.source
index 685561f..eae34a4 100644
--- a/contrib/file_fdw/input/file_fdw.source
+++ b/contrib/file_fdw/input/file_fdw.source

You forgot to include expected output diffs.

Thanks,
Amit

[1]: /messages/by-id/002101cd9190$081c4140$1854c3c0$@lab.ntt.co.jp
/messages/by-id/002101cd9190$081c4140$1854c3c0$@lab.ntt.co.jp

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#7Corey Huinker
corey.huinker@gmail.com
In reply to: Amit Langote (#6)
Re: Let file_fdw access COPY FROM PROGRAM

On Fri, Sep 2, 2016 at 5:07 AM, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp>
wrote:

I am not familiar with win32 stuff too, so I don't have much to say about
that. Maybe someone else can chime in to help with that.

The regressions basically *can't* test this because we'd need a shell
command we know works on any architecture.

About the patch:

* applies cleanly, compiles fine
* basic functionality seems to work (have not done any extensive tests
though)

-     Specifies the file to be read.  Required.  Must be an absolute path
name.
+     Specifies the file to be read.  Must be an absolute path name.
+     Either <literal>filename</literal> or <literal>program</literal>
must be
+     specified.  They are mutually exclusive.
+    </para>
+   </listitem>
+  </varlistentry>

The note about filename and program being mutually exclusive could be
placed at the end of list of options. Or perhaps mention this note as
part of the description of program option, because filename is already
defined by that point.

+   <listitem>
+    <para>
+     Specifies the command to executed.

s/to executed/to be executed/g

Correct. I will fix that when other issues below are resolved.

+     Either <literal>program</literal> or <literal>filename</literal>
must be
+     specified.  They are mutually exclusive.
</para>

Oh I see this has already been mentioned in program option description.
Did you intend to specify the same in both filename and program
descriptions?

It's been a while since I repackaged Adam's code, but generally I'm in
favor of some redundancy if the two mutually exclusive things are
documented far enough apart.

@@ -97,8 +99,9 @@ typedef struct FileFdwPlanState
typedef struct FileFdwExecutionState
{
char       *filename;       /* file to read */
-    List       *options;        /* merged COPY options, excluding
filename */
-    CopyState   cstate;         /* state of reading file */
+    char       *program;        /* program to read output from */
+    List       *options;        /* merged COPY options, excluding
filename and program */
+    CopyState   cstate;         /* state of reading file or program */

Have you considered a Boolean flag is_program instead of char **program
similar to how copy.c does it? (See a related comment further below)

Considered it yes, but decided against it when I started to write my
version. When Adam delivered his version of the patch, I noticed he had
made the same design decision. Only one of them will be initialized, and
the boolean will byte align to 4 bytes, so it's the same storage allocated
either way.

Either we're fine with two variables, or we think file_name is poorly
named. I have only a slight preference for the two variables, and defer to
the group for a preference.

-     * This is because the filename is one of those options, and we don't
want
-     * non-superusers to be able to determine which file gets read.
+     * This is because the filename or program string are two of those
+     * options, and we don't want non-superusers to be able to determine
which
+     * file gets read or what command is run.

I'm no native English speaker, but I wonder if this should read: filename
or program string *is* one of those options ... OR filename *and* program
are two of those options ... ? Also, "are two of those options" sounds a
bit odd to me because I don't see that used as often as "one of those
whatever". I guess that's quite a bit of nitpicking about a C comment, ;)

Given that C comments constitute a large portion of our documentation, I
fully support making them as clear as possible.

I don't remember if this is Adam's comment or mine. Adam - if you're out
there, chime in.

The original paragraph was:

Changing table-level options requires superuser privileges, for security
reasons: only a superuser should be able to determine which file is read.
In principle non-superusers could be allowed to change the other options,
but that's not supported at present.

How does this paragraph sound?:

Changing table-level options requires superuser privileges, for security
reasons: only a superuser should be able to determine which file is read,
or which program is run. In principle non-superusers could be allowed to
change the other options, but that's not supported at present.

@@ -257,9 +264,26 @@ file_fdw_validator(PG_FUNCTION_ARGS)
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("conflicting or redundant options")));
+            if (program)
+                ereport(ERROR,
+                        (errcode(ERRCODE_SYNTAX_ERROR),
+                         errmsg("conflicting or redundant options")));
filename = defGetString(def);
}
+        else if (strcmp(def->defname, "program") == 0)
+        {
+            if (filename)
+                ereport(ERROR,
+                        (errcode(ERRCODE_SYNTAX_ERROR),
+                         errmsg("conflicting or redundant options")));
+            if (program)
+                ereport(ERROR,
+                        (errcode(ERRCODE_SYNTAX_ERROR),
+                         errmsg("conflicting or redundant options")));
+            program = defGetString(def);
+        }

Why does it check for filename here? Also the other way around above.

We don't want them to specify both program and filename, nor do we allow 2
programs, or 2 filenames.

@@ -632,11 +671,18 @@ fileBeginForeignScan(ForeignScanState *node, int
eflags)
* Create CopyState from FDW options.  We always acquire all columns,
so
* as to match the expected ScanTupleSlot signature.
*/
-    cstate = BeginCopyFrom(node->ss.ss_currentRelation,
-                           filename,
-                           false,
-                           NIL,
-                           options);
+    if(filename)
+        cstate = BeginCopyFrom(node->ss.ss_currentRelation,
+                               filename,
+                               false,
+                               NIL,
+                               options);
+    else
+        cstate = BeginCopyFrom(node->ss.ss_currentRelation,
+                               program,
+                               true,
+                               NIL,
+                               options)

Like I mentioned above, if there was a is_program Boolean flag instead of
separate filename and program, this would be just:

+    cstate = BeginCopyFrom(node->ss.ss_currentRelation,
+                           filename,
+                           is_program,
+                           NIL,
+                           options);

That would get rid of if-else blocks here and a couple of other places.

It would, pushing the complexity out to the user. Which could be fine, but
if we do that then "filename" is a misnomer.

diff --git a/contrib/file_fdw/input/file_fdw.source
b/contrib/file_fdw/input/file_fdw.source
index 685561f..eae34a4 100644
--- a/contrib/file_fdw/input/file_fdw.source
+++ b/contrib/file_fdw/input/file_fdw.source

You forgot to include expected output diffs.

Having regression tests for this is extremely problematic, because the
program invoked would need to be an invokable command on any architecture
supported by postgres. I'm pretty sure no such command exists.

#8Craig Ringer
craig.ringer@2ndquadrant.com
In reply to: Corey Huinker (#7)
Re: Let file_fdw access COPY FROM PROGRAM

On 7 Sep. 2016 02:14, "Corey Huinker" <corey.huinker@gmail.com> wrote:

Having regression tests for this is extremely problematic, because the

program invoked would need to be an invokable command on any architecture
supported by postgres. I'm pretty sure no such command exists.

Your best bet will be using the TAP framework. There you can use Perl logic.

I'm not sure where to put such a test though. It doesn't really make sense
in src/test/recovery/ .

#9Michael Paquier
michael.paquier@gmail.com
In reply to: Craig Ringer (#8)
Re: Let file_fdw access COPY FROM PROGRAM

On Wed, Sep 7, 2016 at 7:53 AM, Craig Ringer
<craig.ringer@2ndquadrant.com> wrote:

On 7 Sep. 2016 02:14, "Corey Huinker" <corey.huinker@gmail.com> wrote:

Having regression tests for this is extremely problematic, because the
program invoked would need to be an invokable command on any architecture
supported by postgres. I'm pretty sure no such command exists.

Your best bet will be using the TAP framework. There you can use Perl logic.

I'm not sure where to put such a test though. It doesn't really make sense
in src/test/recovery/ .

There is nothing preventing the addition of a TAP test where there are
normal regression tests, so if you want a test for file_fdw you should
add it there, then change its Makefile to have the following target
rules:
check: prove-check

prove-check:
$(prove_check)
--
Michael

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#10Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Corey Huinker (#7)
Re: Let file_fdw access COPY FROM PROGRAM

On 2016/09/07 3:12, Corey Huinker wrote:

On Fri, Sep 2, 2016 at 5:07 AM, Amit Langote wrote:

I am not familiar with win32 stuff too, so I don't have much to say about
that. Maybe someone else can chime in to help with that.

The regressions basically *can't* test this because we'd need a shell
command we know works on any architecture.

OK.

@@ -97,8 +99,9 @@ typedef struct FileFdwPlanState
typedef struct FileFdwExecutionState
{
char       *filename;       /* file to read */
-    List       *options;        /* merged COPY options, excluding
filename */
-    CopyState   cstate;         /* state of reading file */
+    char       *program;        /* program to read output from */
+    List       *options;        /* merged COPY options, excluding
filename and program */
+    CopyState   cstate;         /* state of reading file or program */

Have you considered a Boolean flag is_program instead of char **program
similar to how copy.c does it? (See a related comment further below)

Considered it yes, but decided against it when I started to write my
version. When Adam delivered his version of the patch, I noticed he had
made the same design decision. Only one of them will be initialized, and
the boolean will byte align to 4 bytes, so it's the same storage allocated
either way.

Either we're fine with two variables, or we think file_name is poorly
named. I have only a slight preference for the two variables, and defer to
the group for a preference.

OK, let's defer it to the committer.

-     * This is because the filename is one of those options, and we don't
want
-     * non-superusers to be able to determine which file gets read.
+     * This is because the filename or program string are two of those
+     * options, and we don't want non-superusers to be able to determine
which
+     * file gets read or what command is run.

I'm no native English speaker, but I wonder if this should read: filename
or program string *is* one of those options ... OR filename *and* program
are two of those options ... ? Also, "are two of those options" sounds a
bit odd to me because I don't see that used as often as "one of those
whatever". I guess that's quite a bit of nitpicking about a C comment, ;)

Given that C comments constitute a large portion of our documentation, I
fully support making them as clear as possible.

I don't remember if this is Adam's comment or mine. Adam - if you're out
there, chime in.

The original paragraph was:

Changing table-level options requires superuser privileges, for security
reasons: only a superuser should be able to determine which file is read.
In principle non-superusers could be allowed to change the other options,
but that's not supported at present.

How does this paragraph sound?:

Changing table-level options requires superuser privileges, for security
reasons: only a superuser should be able to determine which file is read,
or which program is run. In principle non-superusers could be allowed to
change the other options, but that's not supported at present.

Hmm, just a little modification would make it better for me:

... for security reasons. For example, only superuser should be able to
determine which file read or which program is run.

Although that could be just me.

@@ -257,9 +264,26 @@ file_fdw_validator(PG_FUNCTION_ARGS)
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("conflicting or redundant options")));
+            if (program)
+                ereport(ERROR,
+                        (errcode(ERRCODE_SYNTAX_ERROR),
+                         errmsg("conflicting or redundant options")));
filename = defGetString(def);
}
+        else if (strcmp(def->defname, "program") == 0)
+        {
+            if (filename)
+                ereport(ERROR,
+                        (errcode(ERRCODE_SYNTAX_ERROR),
+                         errmsg("conflicting or redundant options")));
+            if (program)
+                ereport(ERROR,
+                        (errcode(ERRCODE_SYNTAX_ERROR),
+                         errmsg("conflicting or redundant options")));
+            program = defGetString(def);
+        }

Why does it check for filename here? Also the other way around above.

We don't want them to specify both program and filename, nor do we allow 2
programs, or 2 filenames.

Ah, I forgot about the mutually exclusive options part.

@@ -632,11 +671,18 @@ fileBeginForeignScan(ForeignScanState *node, int
eflags)
* Create CopyState from FDW options.  We always acquire all columns,
so
* as to match the expected ScanTupleSlot signature.
*/
-    cstate = BeginCopyFrom(node->ss.ss_currentRelation,
-                           filename,
-                           false,
-                           NIL,
-                           options);
+    if(filename)
+        cstate = BeginCopyFrom(node->ss.ss_currentRelation,
+                               filename,
+                               false,
+                               NIL,
+                               options);
+    else
+        cstate = BeginCopyFrom(node->ss.ss_currentRelation,
+                               program,
+                               true,
+                               NIL,
+                               options)

Like I mentioned above, if there was a is_program Boolean flag instead of
separate filename and program, this would be just:

+    cstate = BeginCopyFrom(node->ss.ss_currentRelation,
+                           filename,
+                           is_program,
+                           NIL,
+                           options);

That would get rid of if-else blocks here and a couple of other places.

It would, pushing the complexity out to the user. Which could be fine, but
if we do that then "filename" is a misnomer.

This is an internal state so I'm not sure how this would be pushing
complexity out to the user. Am I perhaps misreading what you said?

What a user sees is that there are two separate foreign table options -
filename and program. That we handle them internally using a string to
identify either file or program and a Boolean flag to show which of the
two is just an internal implementation detail.

COPY does it that way internally and I just saw that psql's \copy does it
the same way too. In the latter's case, following is the options struct
(src/bin/psql/copy.c):

struct copy_options
{
char *before_tofrom; /* COPY string before TO/FROM */
char *after_tofrom; /* COPY string after TO/FROM filename */
char *file; /* NULL = stdin/stdout */
bool program; /* is 'file' a program to popen? */
bool psql_inout; /* true = use psql stdin/stdout */
bool from; /* true = FROM, false = TO */
};

But as you said above, this could be deferred to the committer.

diff --git a/contrib/file_fdw/input/file_fdw.source
b/contrib/file_fdw/input/file_fdw.source
index 685561f..eae34a4 100644
--- a/contrib/file_fdw/input/file_fdw.source
+++ b/contrib/file_fdw/input/file_fdw.source

You forgot to include expected output diffs.

Having regression tests for this is extremely problematic, because the
program invoked would need to be an invokable command on any architecture
supported by postgres. I'm pretty sure no such command exists.

Craig and Michael elsewhere suggested something about adding TAP tests. If
that helps the situation, maybe you could.

I will mark the CF entry status to "Waiting on author" till you post a new
patch.

Thanks,
Amit

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#11Corey Huinker
corey.huinker@gmail.com
In reply to: Craig Ringer (#8)
Re: Let file_fdw access COPY FROM PROGRAM

On Tue, Sep 6, 2016 at 6:53 PM, Craig Ringer <craig.ringer@2ndquadrant.com>
wrote:

On 7 Sep. 2016 02:14, "Corey Huinker" <corey.huinker@gmail.com> wrote:

Having regression tests for this is extremely problematic, because the

program invoked would need to be an invokable command on any architecture
supported by postgres. I'm pretty sure no such command exists.

Your best bet will be using the TAP framework. There you can use Perl
logic.

I'm not sure where to put such a test though. It doesn't really make sense
in src/test/recovery/ .

And the TAP test would detect the operating system and know to create an
FDW that has the PROGRAM value 'cat test_data.csv' on Unix, 'type
test_data.csv' on windows, and 'type test_data.csv;1' on VMS?

#12Craig Ringer
craig.ringer@2ndquadrant.com
In reply to: Corey Huinker (#11)
Re: Let file_fdw access COPY FROM PROGRAM

On 7 September 2016 at 11:21, Corey Huinker <corey.huinker@gmail.com> wrote:

On Tue, Sep 6, 2016 at 6:53 PM, Craig Ringer <craig.ringer@2ndquadrant.com>

And the TAP test would detect the operating system and know to create an FDW
that has the PROGRAM value 'cat test_data.csv' on Unix, 'type test_data.csv'
on windows, and 'type test_data.csv;1' on VMS?

Right. Or just "perl emit_test_data.pl" that works for all of them,
since TAP is perl so you can safely assume you have Perl.

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#13Corey Huinker
corey.huinker@gmail.com
In reply to: Amit Langote (#10)
Re: Let file_fdw access COPY FROM PROGRAM

On Tue, Sep 6, 2016 at 9:46 PM, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp>
wrote:

On 2016/09/07 3:12, Corey Huinker wrote:

On Fri, Sep 2, 2016 at 5:07 AM, Amit Langote wrote:

I am not familiar with win32 stuff too, so I don't have much to say

about

that. Maybe someone else can chime in to help with that.

The regressions basically *can't* test this because we'd need a shell
command we know works on any architecture.

OK.

Well...maybe not, depending on what Craig and other can do to educate me
about the TAP tests.

Changing table-level options requires superuser privileges, for security
reasons: only a superuser should be able to determine which file is read,
or which program is run. In principle non-superusers could be allowed to
change the other options, but that's not supported at present.

Hmm, just a little modification would make it better for me:

... for security reasons. For example, only superuser should be able to
determine which file read or which program is run.

Although that could be just me.

In this case "determine" is unclear whether a non-superuser can set the
program to be run, or is capable of knowing which program is set to be run
by the fdw.

We may want some more opinions on what is the most clear.

@@ -632,11 +671,18 @@ fileBeginForeignScan(ForeignScanState *node, int
eflags)
* Create CopyState from FDW options. We always acquire all

columns,

so
* as to match the expected ScanTupleSlot signature.
*/
-    cstate = BeginCopyFrom(node->ss.ss_currentRelation,
-                           filename,
-                           false,
-                           NIL,
-                           options);
+    if(filename)
+        cstate = BeginCopyFrom(node->ss.ss_currentRelation,
+                               filename,
+                               false,
+                               NIL,
+                               options);
+    else
+        cstate = BeginCopyFrom(node->ss.ss_currentRelation,
+                               program,
+                               true,
+                               NIL,
+                               options)

Like I mentioned above, if there was a is_program Boolean flag instead

of

separate filename and program, this would be just:

+    cstate = BeginCopyFrom(node->ss.ss_currentRelation,
+                           filename,
+                           is_program,
+                           NIL,
+                           options);

That would get rid of if-else blocks here and a couple of other places.

It would, pushing the complexity out to the user. Which could be fine,

but

if we do that then "filename" is a misnomer.

This is an internal state so I'm not sure how this would be pushing
complexity out to the user. Am I perhaps misreading what you said?

Indeed, it is internal state. Maybe we rename the variable file_command or
something.

What a user sees is that there are two separate foreign table options -
filename and program. That we handle them internally using a string to
identify either file or program and a Boolean flag to show which of the
two is just an internal implementation detail.

COPY does it that way internally and I just saw that psql's \copy does it
the same way too. In the latter's case, following is the options struct
(src/bin/psql/copy.c):

struct copy_options
{
char *before_tofrom; /* COPY string before TO/FROM */
char *after_tofrom; /* COPY string after TO/FROM filename */
char *file; /* NULL = stdin/stdout */
bool program; /* is 'file' a program to popen? */
bool psql_inout; /* true = use psql stdin/stdout */
bool from; /* true = FROM, false = TO */
};

But as you said above, this could be deferred to the committer.

Yeah, and that made for zero storage savings: a char pointer which is never
assigned a string takes up as much space as a 4-byte-aligned boolean. And
the result is that "file" really means program, which I found slightly
awkward.

diff --git a/contrib/file_fdw/input/file_fdw.source
b/contrib/file_fdw/input/file_fdw.source
index 685561f..eae34a4 100644
--- a/contrib/file_fdw/input/file_fdw.source
+++ b/contrib/file_fdw/input/file_fdw.source

You forgot to include expected output diffs.

Having regression tests for this is extremely problematic, because the
program invoked would need to be an invokable command on any architecture
supported by postgres. I'm pretty sure no such command exists.

Craig and Michael elsewhere suggested something about adding TAP tests. If
that helps the situation, maybe you could.

Yeah, I need to educate myself about those.

I will mark the CF entry status to "Waiting on author" till you post a new
patch.

Thanks.

Show quoted text
#14Corey Huinker
corey.huinker@gmail.com
In reply to: Craig Ringer (#12)
Re: Let file_fdw access COPY FROM PROGRAM

On Tue, Sep 6, 2016 at 11:24 PM, Craig Ringer <craig.ringer@2ndquadrant.com>
wrote:

On 7 September 2016 at 11:21, Corey Huinker <corey.huinker@gmail.com>
wrote:

On Tue, Sep 6, 2016 at 6:53 PM, Craig Ringer <

craig.ringer@2ndquadrant.com>

And the TAP test would detect the operating system and know to create an

FDW

that has the PROGRAM value 'cat test_data.csv' on Unix, 'type

test_data.csv'

on windows, and 'type test_data.csv;1' on VMS?

Right. Or just "perl emit_test_data.pl" that works for all of them,
since TAP is perl so you can safely assume you have Perl.

Thanks. I was mentally locked in more basic OS commands. Am I right in
thinking perl is about the *only* OS command you can be sure is on every
architecture?

The platforms page says we support S/390 but no mention of VM/MVS/CMS. Did
we do an OS/400 port yet? ::ducks::

#15Craig Ringer
craig.ringer@2ndquadrant.com
In reply to: Corey Huinker (#14)
Re: Let file_fdw access COPY FROM PROGRAM

On 7 September 2016 at 11:37, Corey Huinker <corey.huinker@gmail.com> wrote:

On Tue, Sep 6, 2016 at 11:24 PM, Craig Ringer <craig.ringer@2ndquadrant.com>
wrote:

On 7 September 2016 at 11:21, Corey Huinker <corey.huinker@gmail.com>
wrote:

On Tue, Sep 6, 2016 at 6:53 PM, Craig Ringer
<craig.ringer@2ndquadrant.com>

And the TAP test would detect the operating system and know to create an
FDW
that has the PROGRAM value 'cat test_data.csv' on Unix, 'type
test_data.csv'
on windows, and 'type test_data.csv;1' on VMS?

Right. Or just "perl emit_test_data.pl" that works for all of them,
since TAP is perl so you can safely assume you have Perl.

Thanks. I was mentally locked in more basic OS commands. Am I right in
thinking perl is about the *only* OS command you can be sure is on every
architecture?

Probably, there's a lot of crazy out there.

TAP tests can be conditionally run based on architecture, but
something like this is probably worth testing as widely as possible.

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#16Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Corey Huinker (#13)
Re: Let file_fdw access COPY FROM PROGRAM

On 2016/09/07 12:29, Corey Huinker wrote:

On Tue, Sep 6, 2016 at 9:46 PM, Amit Langote wrote:

OK.

Well...maybe not, depending on what Craig and other can do to educate me
about the TAP tests.

Sure.

Changing table-level options requires superuser privileges, for security
reasons: only a superuser should be able to determine which file is read,
or which program is run. In principle non-superusers could be allowed to
change the other options, but that's not supported at present.

Hmm, just a little modification would make it better for me:

... for security reasons. For example, only superuser should be able to
determine which file read or which program is run.

Although that could be just me.

In this case "determine" is unclear whether a non-superuser can set the
program to be run, or is capable of knowing which program is set to be run
by the fdw.

Hmm, it is indeed unclear.

How about:

... for security reasons. For example, only superuser should be able to
*change* which file is read or which program is run.

I just realized this is not just about a C comment. There is a line in
documentation as well which needs an update. Any conclusion here should
be applied there.

We may want some more opinions on what is the most clear.

Certainly.

But as you said above, this could be deferred to the committer.

Yeah, and that made for zero storage savings: a char pointer which is never
assigned a string takes up as much space as a 4-byte-aligned boolean. And
the result is that "file" really means program, which I found slightly
awkward.

My only intent to push for that approach is to have consistency with other
code implementing a similar feature although it may not be that important.

Thanks,
Amit

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#17Corey Huinker
corey.huinker@gmail.com
In reply to: Craig Ringer (#15)
Re: Let file_fdw access COPY FROM PROGRAM

On Tue, Sep 6, 2016 at 11:44 PM, Craig Ringer <craig.ringer@2ndquadrant.com>
wrote:

On 7 September 2016 at 11:37, Corey Huinker <corey.huinker@gmail.com>
wrote:

On Tue, Sep 6, 2016 at 11:24 PM, Craig Ringer <

craig.ringer@2ndquadrant.com>

wrote:

On 7 September 2016 at 11:21, Corey Huinker <corey.huinker@gmail.com>
wrote:

On Tue, Sep 6, 2016 at 6:53 PM, Craig Ringer
<craig.ringer@2ndquadrant.com>

And the TAP test would detect the operating system and know to create

an

FDW
that has the PROGRAM value 'cat test_data.csv' on Unix, 'type
test_data.csv'
on windows, and 'type test_data.csv;1' on VMS?

Right. Or just "perl emit_test_data.pl" that works for all of them,
since TAP is perl so you can safely assume you have Perl.

Thanks. I was mentally locked in more basic OS commands. Am I right in
thinking perl is about the *only* OS command you can be sure is on every
architecture?

Probably, there's a lot of crazy out there.

TAP tests can be conditionally run based on architecture, but
something like this is probably worth testing as widely as possible.

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Stylistically, would a separate .pl file for the emitter be preferable to
something inline like

perl -e 'print "a\tb\tcc\t4\n"; print "b\tc\tdd\t5\n"'

?

I'm inclined to go inline to cut down on the number of moving parts, but I
can see where perl's readability is a barrier to some, and either way I
want to follow established patterns.

[*] For those who don't perl, the command prints:

a b cc 4
b c dd 5

#18Craig Ringer
craig.ringer@2ndquadrant.com
In reply to: Corey Huinker (#17)
Re: Let file_fdw access COPY FROM PROGRAM

On 9 Sep. 2016 03:45, "Corey Huinker" <corey.huinker@gmail.com> wrote:

Stylistically, would a separate .pl file for the emitter be preferable to

something inline like

perl -e 'print "a\tb\tcc\t4\n"; print "b\tc\tdd\t5\n"'

I'd be fine with that and a suitable comment. Just be careful with
different platforms' shell escaping rules.

#19Corey Huinker
corey.huinker@gmail.com
In reply to: Craig Ringer (#18)
Re: Let file_fdw access COPY FROM PROGRAM

On Thu, Sep 8, 2016 at 6:59 PM, Craig Ringer <craig.ringer@2ndquadrant.com>
wrote:

On 9 Sep. 2016 03:45, "Corey Huinker" <corey.huinker@gmail.com> wrote:

Stylistically, would a separate .pl file for the emitter be preferable

to something inline like

perl -e 'print "a\tb\tcc\t4\n"; print "b\tc\tdd\t5\n"'

I'd be fine with that and a suitable comment. Just be careful with
different platforms' shell escaping rules.

Do perl command switches on windows/VMS use /e instead of -e? If so,
that'd be a great argument doing just "perl filename".

#20Corey Huinker
corey.huinker@gmail.com
In reply to: Craig Ringer (#18)
1 attachment(s)
Re: Let file_fdw access COPY FROM PROGRAM

V2 of this patch:

Changes:
* rebased to most recent master
* removed non-tap test that assumed the existence of Unix sed program
* added non-tap test that assumes the existence of perl
* switched from filename/program to filename/is_program to more closely
follow patterns in copy.c
* slight wording change in C comments

On Thu, Sep 8, 2016 at 6:59 PM, Craig Ringer <craig.ringer@2ndquadrant.com>
wrote:

Show quoted text

On 9 Sep. 2016 03:45, "Corey Huinker" <corey.huinker@gmail.com> wrote:

Stylistically, would a separate .pl file for the emitter be preferable

to something inline like

perl -e 'print "a\tb\tcc\t4\n"; print "b\tc\tdd\t5\n"'

I'd be fine with that and a suitable comment. Just be careful with
different platforms' shell escaping rules.

Attachments:

file_fdw_program_v2.difftext/plain; charset=US-ASCII; name=file_fdw_program_v2.diffDownload
diff --git a/contrib/file_fdw/file_fdw.c b/contrib/file_fdw/file_fdw.c
index b471991..bf9753a 100644
--- a/contrib/file_fdw/file_fdw.c
+++ b/contrib/file_fdw/file_fdw.c
@@ -59,6 +59,7 @@ struct FileFdwOption
 static const struct FileFdwOption valid_options[] = {
 	/* File options */
 	{"filename", ForeignTableRelationId},
+	{"program", ForeignTableRelationId},
 
 	/* Format options */
 	/* oids option is not supported */
@@ -85,10 +86,11 @@ static const struct FileFdwOption valid_options[] = {
  */
 typedef struct FileFdwPlanState
 {
-	char	   *filename;		/* file to read */
-	List	   *options;		/* merged COPY options, excluding filename */
-	BlockNumber pages;			/* estimate of file's physical size */
-	double		ntuples;		/* estimate of number of rows in file */
+	char	   *filename;		/* file/program to read */
+	bool		is_program;		/* true if filename represents an OS command */
+	List	   *options;		/* merged COPY options, excluding filename and program */
+	BlockNumber pages;			/* estimate of file or program output's physical size */
+	double		ntuples;		/* estimate of number of rows in file/program output */
 } FileFdwPlanState;
 
 /*
@@ -96,9 +98,10 @@ typedef struct FileFdwPlanState
  */
 typedef struct FileFdwExecutionState
 {
-	char	   *filename;		/* file to read */
-	List	   *options;		/* merged COPY options, excluding filename */
-	CopyState	cstate;			/* state of reading file */
+	char	   *filename;		/* file/program to read */
+	bool		is_program;		/* true if filename represents an OS command */
+	List	   *options;		/* merged COPY options, excluding filename and is_program */
+	CopyState	cstate;			/* state of reading file or program */
 } FileFdwExecutionState;
 
 /*
@@ -139,7 +142,9 @@ static bool fileIsForeignScanParallelSafe(PlannerInfo *root, RelOptInfo *rel,
  */
 static bool is_valid_option(const char *option, Oid context);
 static void fileGetOptions(Oid foreigntableid,
-			   char **filename, List **other_options);
+						   char **filename,
+						   bool *is_program,	
+						   List **other_options);
 static List *get_file_fdw_attribute_options(Oid relid);
 static bool check_selective_binary_conversion(RelOptInfo *baserel,
 								  Oid foreigntableid,
@@ -196,16 +201,17 @@ file_fdw_validator(PG_FUNCTION_ARGS)
 
 	/*
 	 * Only superusers are allowed to set options of a file_fdw foreign table.
-	 * This is because the filename is one of those options, and we don't want
-	 * non-superusers to be able to determine which file gets read.
+	 * The reason for this is to prevent non-superusers from changing the 
+	 * definition to access an arbitrary file not visible to that user
+	 * or to run programs not accessible to that user.
 	 *
 	 * Putting this sort of permissions check in a validator is a bit of a
 	 * crock, but there doesn't seem to be any other place that can enforce
 	 * the check more cleanly.
 	 *
-	 * Note that the valid_options[] array disallows setting filename at any
-	 * options level other than foreign table --- otherwise there'd still be a
-	 * security hole.
+	 * Note that the valid_options[] array disallows setting filename and
+	 * program at any options level other than foreign table --- otherwise
+	 * there'd still be a security hole.
 	 */
 	if (catalog == ForeignTableRelationId && !superuser())
 		ereport(ERROR,
@@ -247,11 +253,11 @@ file_fdw_validator(PG_FUNCTION_ARGS)
 		}
 
 		/*
-		 * Separate out filename and column-specific options, since
+		 * Separate out filename, program, and column-specific options, since
 		 * ProcessCopyOptions won't accept them.
 		 */
 
-		if (strcmp(def->defname, "filename") == 0)
+		if ((strcmp(def->defname, "filename") == 0) || (strcmp(def->defname, "program") == 0))
 		{
 			if (filename)
 				ereport(ERROR,
@@ -296,12 +302,13 @@ file_fdw_validator(PG_FUNCTION_ARGS)
 	ProcessCopyOptions(NULL, NULL, true, other_options);
 
 	/*
-	 * Filename option is required for file_fdw foreign tables.
+	 * Either filename or program option is required for file_fdw foreign
+	 * tables.
 	 */
 	if (catalog == ForeignTableRelationId && filename == NULL)
 		ereport(ERROR,
 				(errcode(ERRCODE_FDW_DYNAMIC_PARAMETER_VALUE_NEEDED),
-				 errmsg("filename is required for file_fdw foreign tables")));
+				 errmsg("either filename or program is required for file_fdw foreign tables")));
 
 	PG_RETURN_VOID();
 }
@@ -326,12 +333,13 @@ is_valid_option(const char *option, Oid context)
 /*
  * Fetch the options for a file_fdw foreign table.
  *
- * We have to separate out "filename" from the other options because
- * it must not appear in the options list passed to the core COPY code.
+ * We have to separate out "filename" and "program" from the other options
+ * because it must not appear in the options list passed to the core COPY
+ * code.
  */
 static void
 fileGetOptions(Oid foreigntableid,
-			   char **filename, List **other_options)
+			   char **filename, bool *is_program, List **other_options)
 {
 	ForeignTable *table;
 	ForeignServer *server;
@@ -359,9 +367,10 @@ fileGetOptions(Oid foreigntableid,
 	options = list_concat(options, get_file_fdw_attribute_options(foreigntableid));
 
 	/*
-	 * Separate out the filename.
+	 * Separate out the filename and program.
 	 */
 	*filename = NULL;
+	*is_program = false;
 	prev = NULL;
 	foreach(lc, options)
 	{
@@ -373,6 +382,13 @@ fileGetOptions(Oid foreigntableid,
 			options = list_delete_cell(options, lc, prev);
 			break;
 		}
+		else if (strcmp(def->defname, "program") == 0)
+		{
+			*filename = defGetString(def);
+			*is_program = true;
+			options = list_delete_cell(options, lc, prev);
+			break;
+		}
 		prev = lc;
 	}
 
@@ -381,7 +397,7 @@ fileGetOptions(Oid foreigntableid,
 	 * options, but check again, just in case.
 	 */
 	if (*filename == NULL)
-		elog(ERROR, "filename is required for file_fdw foreign tables");
+		elog(ERROR, "either filename or program is required for file_fdw foreign tables");
 
 	*other_options = options;
 }
@@ -475,12 +491,13 @@ fileGetForeignRelSize(PlannerInfo *root,
 	FileFdwPlanState *fdw_private;
 
 	/*
-	 * Fetch options.  We only need filename at this point, but we might as
-	 * well get everything and not need to re-fetch it later in planning.
+	 * Fetch options.  We only need filename (or program) at this point, but
+	 * we might as well get everything and not need to re-fetch it later in
+	 * planning.
 	 */
 	fdw_private = (FileFdwPlanState *) palloc(sizeof(FileFdwPlanState));
 	fileGetOptions(foreigntableid,
-				   &fdw_private->filename, &fdw_private->options);
+				   &fdw_private->filename, &fdw_private->is_program, &fdw_private->options);
 	baserel->fdw_private = (void *) fdw_private;
 
 	/* Estimate relation size */
@@ -583,20 +600,27 @@ static void
 fileExplainForeignScan(ForeignScanState *node, ExplainState *es)
 {
 	char	   *filename;
+	bool		is_program;
 	List	   *options;
 
-	/* Fetch options --- we only need filename at this point */
+	/* Fetch options --- we only need filename and is_program at this point */
 	fileGetOptions(RelationGetRelid(node->ss.ss_currentRelation),
-				   &filename, &options);
+				   &filename, &is_program, &options);
 
-	ExplainPropertyText("Foreign File", filename, es);
+	if(filename)
+	{
+		if (is_program)
+			ExplainPropertyText("Foreign Program", filename, es);
+		else
+			ExplainPropertyText("Foreign File", filename, es);
+	}
 
 	/* Suppress file size if we're not showing cost details */
 	if (es->costs)
 	{
 		struct stat stat_buf;
 
-		if (stat(filename, &stat_buf) == 0)
+		if ((! is_program) && (stat(filename, &stat_buf) == 0))
 			ExplainPropertyLong("Foreign File Size", (long) stat_buf.st_size,
 								es);
 	}
@@ -611,6 +635,7 @@ fileBeginForeignScan(ForeignScanState *node, int eflags)
 {
 	ForeignScan *plan = (ForeignScan *) node->ss.ps.plan;
 	char	   *filename;
+	bool		is_program;
 	List	   *options;
 	CopyState	cstate;
 	FileFdwExecutionState *festate;
@@ -623,7 +648,7 @@ fileBeginForeignScan(ForeignScanState *node, int eflags)
 
 	/* Fetch options of foreign table */
 	fileGetOptions(RelationGetRelid(node->ss.ss_currentRelation),
-				   &filename, &options);
+				   &filename, &is_program, &options);
 
 	/* Add any options from the plan (currently only convert_selectively) */
 	options = list_concat(options, plan->fdw_private);
@@ -633,11 +658,11 @@ fileBeginForeignScan(ForeignScanState *node, int eflags)
 	 * as to match the expected ScanTupleSlot signature.
 	 */
 	cstate = BeginCopyFrom(NULL,
-						   node->ss.ss_currentRelation,
-						   filename,
-						   false,
-						   NIL,
-						   options);
+							node->ss.ss_currentRelation,
+							filename,
+							is_program,
+							NIL,
+							options);
 
 	/*
 	 * Save state in node->fdw_state.  We must save enough information to call
@@ -645,6 +670,7 @@ fileBeginForeignScan(ForeignScanState *node, int eflags)
 	 */
 	festate = (FileFdwExecutionState *) palloc(sizeof(FileFdwExecutionState));
 	festate->filename = filename;
+	festate->is_program = is_program;
 	festate->options = options;
 	festate->cstate = cstate;
 
@@ -709,7 +735,7 @@ fileReScanForeignScan(ForeignScanState *node)
 	festate->cstate = BeginCopyFrom(NULL,
 									node->ss.ss_currentRelation,
 									festate->filename,
-									false,
+									festate->is_program,
 									NIL,
 									festate->options);
 }
@@ -738,11 +764,19 @@ fileAnalyzeForeignTable(Relation relation,
 						BlockNumber *totalpages)
 {
 	char	   *filename;
+	bool		is_program;
 	List	   *options;
 	struct stat stat_buf;
 
 	/* Fetch options of foreign table */
-	fileGetOptions(RelationGetRelid(relation), &filename, &options);
+	fileGetOptions(RelationGetRelid(relation), &filename, &is_program, &options);
+
+	/*
+	 * If this is a program instead of a file, just return false to skip
+	 * analyzing the table.
+	 */
+	if (is_program)
+		return false;
 
 	/*
 	 * Get size of the file.  (XXX if we fail here, would it be better to just
@@ -916,9 +950,10 @@ estimate_size(PlannerInfo *root, RelOptInfo *baserel,
 
 	/*
 	 * Get size of the file.  It might not be there at plan time, though, in
-	 * which case we have to use a default estimate.
+	 * which case we have to use a default estimate.  We also have to fall
+	 * back to the default if using a program as the input.
 	 */
-	if (stat(fdw_private->filename, &stat_buf) < 0)
+	if (fdw_private->is_program || stat(fdw_private->filename, &stat_buf) < 0)
 		stat_buf.st_size = 10 * BLCKSZ;
 
 	/*
@@ -1036,6 +1071,7 @@ file_acquire_sample_rows(Relation onerel, int elevel,
 	bool	   *nulls;
 	bool		found;
 	char	   *filename;
+	bool		is_program;
 	List	   *options;
 	CopyState	cstate;
 	ErrorContextCallback errcallback;
@@ -1050,12 +1086,12 @@ file_acquire_sample_rows(Relation onerel, int elevel,
 	nulls = (bool *) palloc(tupDesc->natts * sizeof(bool));
 
 	/* Fetch options of foreign table */
-	fileGetOptions(RelationGetRelid(onerel), &filename, &options);
+	fileGetOptions(RelationGetRelid(onerel), &filename, &is_program, &options);
 
 	/*
 	 * Create CopyState from FDW options.
 	 */
-	cstate = BeginCopyFrom(NULL, onerel, filename, false, NIL, options);
+	cstate = BeginCopyFrom(NULL, onerel, filename, is_program, NIL, options);
 
 	/*
 	 * Use per-tuple memory context to prevent leak of memory used to read
diff --git a/contrib/file_fdw/input/file_fdw.source b/contrib/file_fdw/input/file_fdw.source
index 685561f..59a983d 100644
--- a/contrib/file_fdw/input/file_fdw.source
+++ b/contrib/file_fdw/input/file_fdw.source
@@ -185,6 +185,17 @@ SET ROLE regress_file_fdw_user;
 ALTER FOREIGN TABLE agg_text OPTIONS (SET format 'text');
 SET ROLE regress_file_fdw_superuser;
 
+-- PROGRAM test
+CREATE FOREIGN TABLE program_test(
+	a text,
+	b integer,
+	c date
+) SERVER file_server
+OPTIONS(program 'perl -e ''print "abc\t123\t1994-09-10\n"''');
+
+SELECT * FROM program_test;
+DROP FOREIGN TABLE program_test;
+
 -- cleanup
 RESET ROLE;
 DROP EXTENSION file_fdw CASCADE;
diff --git a/contrib/file_fdw/output/file_fdw.source b/contrib/file_fdw/output/file_fdw.source
index 6fa5440..7dae388 100644
--- a/contrib/file_fdw/output/file_fdw.source
+++ b/contrib/file_fdw/output/file_fdw.source
@@ -76,7 +76,7 @@ CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'csv', null '
 ');       -- ERROR
 ERROR:  COPY null representation cannot use newline or carriage return
 CREATE FOREIGN TABLE tbl () SERVER file_server;  -- ERROR
-ERROR:  filename is required for file_fdw foreign tables
+ERROR:  either filename or program is required for file_fdw foreign tables
 CREATE FOREIGN TABLE agg_text (
 	a	int2 CHECK (a >= 0),
 	b	float4
@@ -132,7 +132,7 @@ ERROR:  invalid option "force_not_null"
 HINT:  There are no valid options in this context.
 CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (force_not_null '*'); -- ERROR
 ERROR:  invalid option "force_not_null"
-HINT:  Valid options in this context are: filename, format, header, delimiter, quote, escape, null, encoding
+HINT:  Valid options in this context are: filename, program, format, header, delimiter, quote, escape, null, encoding
 -- force_null is not allowed to be specified at any foreign object level:
 ALTER FOREIGN DATA WRAPPER file_fdw OPTIONS (ADD force_null '*'); -- ERROR
 ERROR:  invalid option "force_null"
@@ -145,7 +145,7 @@ ERROR:  invalid option "force_null"
 HINT:  There are no valid options in this context.
 CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (force_null '*'); -- ERROR
 ERROR:  invalid option "force_null"
-HINT:  Valid options in this context are: filename, format, header, delimiter, quote, escape, null, encoding
+HINT:  Valid options in this context are: filename, program, format, header, delimiter, quote, escape, null, encoding
 -- basic query tests
 SELECT * FROM agg_text WHERE b > 10.0 ORDER BY a;
   a  |   b    
@@ -341,6 +341,20 @@ SET ROLE regress_file_fdw_user;
 ALTER FOREIGN TABLE agg_text OPTIONS (SET format 'text');
 ERROR:  only superuser can change options of a file_fdw foreign table
 SET ROLE regress_file_fdw_superuser;
+-- PROGRAM test
+CREATE FOREIGN TABLE program_test(
+	a text,
+	b integer,
+	c date
+) SERVER file_server
+OPTIONS(program 'perl -e ''print "abc\t123\t1994-09-10\n"''');
+SELECT * FROM program_test;
+  a  |  b  |     c      
+-----+-----+------------
+ abc | 123 | 09-10-1994
+(1 row)
+
+DROP FOREIGN TABLE program_test;
 -- cleanup
 RESET ROLE;
 DROP EXTENSION file_fdw CASCADE;
diff --git a/doc/src/sgml/file-fdw.sgml b/doc/src/sgml/file-fdw.sgml
index d3b39aa..f36a3c5 100644
--- a/doc/src/sgml/file-fdw.sgml
+++ b/doc/src/sgml/file-fdw.sgml
@@ -27,7 +27,26 @@
 
    <listitem>
     <para>
-     Specifies the file to be read.  Required.  Must be an absolute path name.
+     Specifies the file to be read.  Must be an absolute path name.
+     Either <literal>filename</literal> or <literal>program</literal> must be
+     specified.  They are mutually exclusive.
+    </para>
+   </listitem>
+  </varlistentry>
+
+  <varlistentry>
+   <term><literal>program</literal></term>
+
+   <listitem>
+    <para>
+     Specifies the command to executed.
+     Note that the command is invoked by the shell, so if you need to pass any
+     arguments to shell command that come from an untrusted source, you must
+     be careful to strip or escape any special characters that might have a
+     special meaning for the shell. For security reasons, it is best to use a
+     fixed command string, or at least avoid passing any user input in it.
+     Either <literal>program</literal> or <literal>filename</literal> must be
+     specified.  They are mutually exclusive.
     </para>
    </listitem>
   </varlistentry>
#21Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Corey Huinker (#20)
Re: Let file_fdw access COPY FROM PROGRAM

On 2016/09/11 8:04, Corey Huinker wrote:

V2 of this patch:

Changes:
* rebased to most recent master
* removed non-tap test that assumed the existence of Unix sed program
* added non-tap test that assumes the existence of perl
* switched from filename/program to filename/is_program to more closely
follow patterns in copy.c
* slight wording change in C comments

This version looks mostly good to me. Except some whitespace noise in
some hunks:

@@ -139,7 +142,9 @@ static bool fileIsForeignScanParallelSafe(PlannerInfo
*root, RelOptInfo *rel,
  */
 static bool is_valid_option(const char *option, Oid context);
 static void fileGetOptions(Oid foreigntableid,
-               char **filename, List **other_options);
+                           char **filename,
+                           bool *is_program,

Space after "is_program,"

@@ -196,16 +201,17 @@ file_fdw_validator(PG_FUNCTION_ARGS)

     /*
      * Only superusers are allowed to set options of a file_fdw foreign
table.
-     * This is because the filename is one of those options, and we don't
want
-     * non-superusers to be able to determine which file gets read.
+     * The reason for this is to prevent non-superusers from changing the

Space after "the"

-        if (stat(filename, &stat_buf) == 0)
+        if ((! is_program) && (stat(filename, &stat_buf) == 0)))

Space between ! and is_program.

-        if (strcmp(def->defname, "filename") == 0)
+        if ((strcmp(def->defname, "filename") == 0) ||
(strcmp(def->defname, "program") == 0))

I think the usual style would be to split the if statement into two lines
as follows to keep within 80 characters per line [1]https://www.postgresql.org/docs/devel/static/source-format.html:

+        if ((strcmp(def->defname, "filename") == 0) ||
+            (strcmp(def->defname, "program") == 0))

And likewise for:

-                   &fdw_private->filename, &fdw_private->options);
+                   &fdw_private->filename, &fdw_private->is_program,
&fdw_private->options);

By the way, doesn't the following paragraph in file-fdw.sgml need an update?

<para>
Changing table-level options requires superuser privileges, for security
reasons: only a superuser should be able to determine which file is read.
In principle non-superusers could be allowed to change the other options,
but that's not supported at present.
</para>

I would like to mark this now as "Ready for Committer".

Thanks,
Amit

[1]: https://www.postgresql.org/docs/devel/static/source-format.html

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#22Corey Huinker
corey.huinker@gmail.com
In reply to: Amit Langote (#21)
1 attachment(s)
Re: Let file_fdw access COPY FROM PROGRAM

Thanks for the review!

I agree with all the code cleanups suggested and have made then in the
attached patch, to save the committer some time.

Also in this patch, I changed sgml para to
<para>
Changing table-level options requires superuser privileges, for security
reasons: only a superuser should be able to determine which file is read
or which program is run. In principle non-superusers could be allowed to
change the other options, but that's not supported at present.
</para>

"Determine" is an odd word in this context. I understand it to mean
"set/change", but I can see where a less familiar reader would take the
meaning to be "has permission to see the value already set". Either way, it
now mentions program as an option in addition to filename.

On Mon, Sep 12, 2016 at 1:59 AM, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp

Show quoted text

wrote:

On 2016/09/11 8:04, Corey Huinker wrote:

V2 of this patch:

Changes:
* rebased to most recent master
* removed non-tap test that assumed the existence of Unix sed program
* added non-tap test that assumes the existence of perl
* switched from filename/program to filename/is_program to more closely
follow patterns in copy.c
* slight wording change in C comments

This version looks mostly good to me. Except some whitespace noise in
some hunks:

@@ -139,7 +142,9 @@ static bool fileIsForeignScanParallelSafe(PlannerInfo
*root, RelOptInfo *rel,
*/
static bool is_valid_option(const char *option, Oid context);
static void fileGetOptions(Oid foreigntableid,
-               char **filename, List **other_options);
+                           char **filename,
+                           bool *is_program,

Space after "is_program,"

@@ -196,16 +201,17 @@ file_fdw_validator(PG_FUNCTION_ARGS)

/*
* Only superusers are allowed to set options of a file_fdw foreign
table.
-     * This is because the filename is one of those options, and we don't
want
-     * non-superusers to be able to determine which file gets read.
+     * The reason for this is to prevent non-superusers from changing the

Space after "the"

-        if (stat(filename, &stat_buf) == 0)
+        if ((! is_program) && (stat(filename, &stat_buf) == 0)))

Space between ! and is_program.

-        if (strcmp(def->defname, "filename") == 0)
+        if ((strcmp(def->defname, "filename") == 0) ||
(strcmp(def->defname, "program") == 0))

I think the usual style would be to split the if statement into two lines
as follows to keep within 80 characters per line [1]:

+        if ((strcmp(def->defname, "filename") == 0) ||
+            (strcmp(def->defname, "program") == 0))

And likewise for:

-                   &fdw_private->filename, &fdw_private->options);
+                   &fdw_private->filename, &fdw_private->is_program,
&fdw_private->options);

By the way, doesn't the following paragraph in file-fdw.sgml need an
update?

<para>
Changing table-level options requires superuser privileges, for security
reasons: only a superuser should be able to determine which file is read.
In principle non-superusers could be allowed to change the other options,
but that's not supported at present.
</para>

I would like to mark this now as "Ready for Committer".

Thanks,
Amit

[1] https://www.postgresql.org/docs/devel/static/source-format.html

Attachments:

file_fdw_program_v3.difftext/plain; charset=US-ASCII; name=file_fdw_program_v3.diffDownload
diff --git a/contrib/file_fdw/file_fdw.c b/contrib/file_fdw/file_fdw.c
index b471991..7f534b1 100644
--- a/contrib/file_fdw/file_fdw.c
+++ b/contrib/file_fdw/file_fdw.c
@@ -59,6 +59,7 @@ struct FileFdwOption
 static const struct FileFdwOption valid_options[] = {
 	/* File options */
 	{"filename", ForeignTableRelationId},
+	{"program", ForeignTableRelationId},
 
 	/* Format options */
 	/* oids option is not supported */
@@ -85,10 +86,11 @@ static const struct FileFdwOption valid_options[] = {
  */
 typedef struct FileFdwPlanState
 {
-	char	   *filename;		/* file to read */
-	List	   *options;		/* merged COPY options, excluding filename */
-	BlockNumber pages;			/* estimate of file's physical size */
-	double		ntuples;		/* estimate of number of rows in file */
+	char	   *filename;		/* file/program to read */
+	bool		is_program;		/* true if filename represents an OS command */
+	List	   *options;		/* merged COPY options, excluding filename and program */
+	BlockNumber pages;			/* estimate of file or program output's physical size */
+	double		ntuples;		/* estimate of number of rows in file/program output */
 } FileFdwPlanState;
 
 /*
@@ -96,9 +98,10 @@ typedef struct FileFdwPlanState
  */
 typedef struct FileFdwExecutionState
 {
-	char	   *filename;		/* file to read */
-	List	   *options;		/* merged COPY options, excluding filename */
-	CopyState	cstate;			/* state of reading file */
+	char	   *filename;		/* file/program to read */
+	bool		is_program;		/* true if filename represents an OS command */
+	List	   *options;		/* merged COPY options, excluding filename and is_program */
+	CopyState	cstate;			/* state of reading file or program */
 } FileFdwExecutionState;
 
 /*
@@ -139,7 +142,9 @@ static bool fileIsForeignScanParallelSafe(PlannerInfo *root, RelOptInfo *rel,
  */
 static bool is_valid_option(const char *option, Oid context);
 static void fileGetOptions(Oid foreigntableid,
-			   char **filename, List **other_options);
+						   char **filename,
+						   bool *is_program,
+						   List **other_options);
 static List *get_file_fdw_attribute_options(Oid relid);
 static bool check_selective_binary_conversion(RelOptInfo *baserel,
 								  Oid foreigntableid,
@@ -196,16 +201,17 @@ file_fdw_validator(PG_FUNCTION_ARGS)
 
 	/*
 	 * Only superusers are allowed to set options of a file_fdw foreign table.
-	 * This is because the filename is one of those options, and we don't want
-	 * non-superusers to be able to determine which file gets read.
+	 * The reason for this is to prevent non-superusers from changing the
+	 * definition to access an arbitrary file not visible to that user
+	 * or to run programs not accessible to that user.
 	 *
 	 * Putting this sort of permissions check in a validator is a bit of a
 	 * crock, but there doesn't seem to be any other place that can enforce
 	 * the check more cleanly.
 	 *
-	 * Note that the valid_options[] array disallows setting filename at any
-	 * options level other than foreign table --- otherwise there'd still be a
-	 * security hole.
+	 * Note that the valid_options[] array disallows setting filename and
+	 * program at any options level other than foreign table --- otherwise
+	 * there'd still be a security hole.
 	 */
 	if (catalog == ForeignTableRelationId && !superuser())
 		ereport(ERROR,
@@ -247,11 +253,12 @@ file_fdw_validator(PG_FUNCTION_ARGS)
 		}
 
 		/*
-		 * Separate out filename and column-specific options, since
+		 * Separate out filename, program, and column-specific options, since
 		 * ProcessCopyOptions won't accept them.
 		 */
 
-		if (strcmp(def->defname, "filename") == 0)
+		if ((strcmp(def->defname, "filename") == 0) ||
+			(strcmp(def->defname, "program") == 0))
 		{
 			if (filename)
 				ereport(ERROR,
@@ -296,12 +303,13 @@ file_fdw_validator(PG_FUNCTION_ARGS)
 	ProcessCopyOptions(NULL, NULL, true, other_options);
 
 	/*
-	 * Filename option is required for file_fdw foreign tables.
+	 * Either filename or program option is required for file_fdw foreign
+	 * tables.
 	 */
 	if (catalog == ForeignTableRelationId && filename == NULL)
 		ereport(ERROR,
 				(errcode(ERRCODE_FDW_DYNAMIC_PARAMETER_VALUE_NEEDED),
-				 errmsg("filename is required for file_fdw foreign tables")));
+				 errmsg("either filename or program is required for file_fdw foreign tables")));
 
 	PG_RETURN_VOID();
 }
@@ -326,12 +334,13 @@ is_valid_option(const char *option, Oid context)
 /*
  * Fetch the options for a file_fdw foreign table.
  *
- * We have to separate out "filename" from the other options because
- * it must not appear in the options list passed to the core COPY code.
+ * We have to separate out "filename" and "program" from the other options
+ * because it must not appear in the options list passed to the core COPY
+ * code.
  */
 static void
 fileGetOptions(Oid foreigntableid,
-			   char **filename, List **other_options)
+			   char **filename, bool *is_program, List **other_options)
 {
 	ForeignTable *table;
 	ForeignServer *server;
@@ -359,9 +368,10 @@ fileGetOptions(Oid foreigntableid,
 	options = list_concat(options, get_file_fdw_attribute_options(foreigntableid));
 
 	/*
-	 * Separate out the filename.
+	 * Separate out the filename and program.
 	 */
 	*filename = NULL;
+	*is_program = false;
 	prev = NULL;
 	foreach(lc, options)
 	{
@@ -373,6 +383,13 @@ fileGetOptions(Oid foreigntableid,
 			options = list_delete_cell(options, lc, prev);
 			break;
 		}
+		else if (strcmp(def->defname, "program") == 0)
+		{
+			*filename = defGetString(def);
+			*is_program = true;
+			options = list_delete_cell(options, lc, prev);
+			break;
+		}
 		prev = lc;
 	}
 
@@ -381,7 +398,7 @@ fileGetOptions(Oid foreigntableid,
 	 * options, but check again, just in case.
 	 */
 	if (*filename == NULL)
-		elog(ERROR, "filename is required for file_fdw foreign tables");
+		elog(ERROR, "either filename or program is required for file_fdw foreign tables");
 
 	*other_options = options;
 }
@@ -475,12 +492,15 @@ fileGetForeignRelSize(PlannerInfo *root,
 	FileFdwPlanState *fdw_private;
 
 	/*
-	 * Fetch options.  We only need filename at this point, but we might as
-	 * well get everything and not need to re-fetch it later in planning.
+	 * Fetch options.  We only need filename (or program) at this point, but
+	 * we might as well get everything and not need to re-fetch it later in
+	 * planning.
 	 */
 	fdw_private = (FileFdwPlanState *) palloc(sizeof(FileFdwPlanState));
 	fileGetOptions(foreigntableid,
-				   &fdw_private->filename, &fdw_private->options);
+				   &fdw_private->filename,
+				   &fdw_private->is_program,
+				   &fdw_private->options);
 	baserel->fdw_private = (void *) fdw_private;
 
 	/* Estimate relation size */
@@ -583,20 +603,27 @@ static void
 fileExplainForeignScan(ForeignScanState *node, ExplainState *es)
 {
 	char	   *filename;
+	bool		is_program;
 	List	   *options;
 
-	/* Fetch options --- we only need filename at this point */
+	/* Fetch options --- we only need filename and is_program at this point */
 	fileGetOptions(RelationGetRelid(node->ss.ss_currentRelation),
-				   &filename, &options);
+				   &filename, &is_program, &options);
 
-	ExplainPropertyText("Foreign File", filename, es);
+	if(filename)
+	{
+		if (is_program)
+			ExplainPropertyText("Foreign Program", filename, es);
+		else
+			ExplainPropertyText("Foreign File", filename, es);
+	}
 
 	/* Suppress file size if we're not showing cost details */
 	if (es->costs)
 	{
 		struct stat stat_buf;
 
-		if (stat(filename, &stat_buf) == 0)
+		if ((!is_program) && (stat(filename, &stat_buf) == 0))
 			ExplainPropertyLong("Foreign File Size", (long) stat_buf.st_size,
 								es);
 	}
@@ -611,6 +638,7 @@ fileBeginForeignScan(ForeignScanState *node, int eflags)
 {
 	ForeignScan *plan = (ForeignScan *) node->ss.ps.plan;
 	char	   *filename;
+	bool		is_program;
 	List	   *options;
 	CopyState	cstate;
 	FileFdwExecutionState *festate;
@@ -623,7 +651,7 @@ fileBeginForeignScan(ForeignScanState *node, int eflags)
 
 	/* Fetch options of foreign table */
 	fileGetOptions(RelationGetRelid(node->ss.ss_currentRelation),
-				   &filename, &options);
+				   &filename, &is_program, &options);
 
 	/* Add any options from the plan (currently only convert_selectively) */
 	options = list_concat(options, plan->fdw_private);
@@ -633,11 +661,11 @@ fileBeginForeignScan(ForeignScanState *node, int eflags)
 	 * as to match the expected ScanTupleSlot signature.
 	 */
 	cstate = BeginCopyFrom(NULL,
-						   node->ss.ss_currentRelation,
-						   filename,
-						   false,
-						   NIL,
-						   options);
+							node->ss.ss_currentRelation,
+							filename,
+							is_program,
+							NIL,
+							options);
 
 	/*
 	 * Save state in node->fdw_state.  We must save enough information to call
@@ -645,6 +673,7 @@ fileBeginForeignScan(ForeignScanState *node, int eflags)
 	 */
 	festate = (FileFdwExecutionState *) palloc(sizeof(FileFdwExecutionState));
 	festate->filename = filename;
+	festate->is_program = is_program;
 	festate->options = options;
 	festate->cstate = cstate;
 
@@ -709,7 +738,7 @@ fileReScanForeignScan(ForeignScanState *node)
 	festate->cstate = BeginCopyFrom(NULL,
 									node->ss.ss_currentRelation,
 									festate->filename,
-									false,
+									festate->is_program,
 									NIL,
 									festate->options);
 }
@@ -738,11 +767,19 @@ fileAnalyzeForeignTable(Relation relation,
 						BlockNumber *totalpages)
 {
 	char	   *filename;
+	bool		is_program;
 	List	   *options;
 	struct stat stat_buf;
 
 	/* Fetch options of foreign table */
-	fileGetOptions(RelationGetRelid(relation), &filename, &options);
+	fileGetOptions(RelationGetRelid(relation), &filename, &is_program, &options);
+
+	/*
+	 * If this is a program instead of a file, just return false to skip
+	 * analyzing the table.
+	 */
+	if (is_program)
+		return false;
 
 	/*
 	 * Get size of the file.  (XXX if we fail here, would it be better to just
@@ -916,9 +953,10 @@ estimate_size(PlannerInfo *root, RelOptInfo *baserel,
 
 	/*
 	 * Get size of the file.  It might not be there at plan time, though, in
-	 * which case we have to use a default estimate.
+	 * which case we have to use a default estimate.  We also have to fall
+	 * back to the default if using a program as the input.
 	 */
-	if (stat(fdw_private->filename, &stat_buf) < 0)
+	if (fdw_private->is_program || stat(fdw_private->filename, &stat_buf) < 0)
 		stat_buf.st_size = 10 * BLCKSZ;
 
 	/*
@@ -1036,6 +1074,7 @@ file_acquire_sample_rows(Relation onerel, int elevel,
 	bool	   *nulls;
 	bool		found;
 	char	   *filename;
+	bool		is_program;
 	List	   *options;
 	CopyState	cstate;
 	ErrorContextCallback errcallback;
@@ -1050,12 +1089,12 @@ file_acquire_sample_rows(Relation onerel, int elevel,
 	nulls = (bool *) palloc(tupDesc->natts * sizeof(bool));
 
 	/* Fetch options of foreign table */
-	fileGetOptions(RelationGetRelid(onerel), &filename, &options);
+	fileGetOptions(RelationGetRelid(onerel), &filename, &is_program, &options);
 
 	/*
 	 * Create CopyState from FDW options.
 	 */
-	cstate = BeginCopyFrom(NULL, onerel, filename, false, NIL, options);
+	cstate = BeginCopyFrom(NULL, onerel, filename, is_program, NIL, options);
 
 	/*
 	 * Use per-tuple memory context to prevent leak of memory used to read
diff --git a/contrib/file_fdw/input/file_fdw.source b/contrib/file_fdw/input/file_fdw.source
index 685561f..59a983d 100644
--- a/contrib/file_fdw/input/file_fdw.source
+++ b/contrib/file_fdw/input/file_fdw.source
@@ -185,6 +185,17 @@ SET ROLE regress_file_fdw_user;
 ALTER FOREIGN TABLE agg_text OPTIONS (SET format 'text');
 SET ROLE regress_file_fdw_superuser;
 
+-- PROGRAM test
+CREATE FOREIGN TABLE program_test(
+	a text,
+	b integer,
+	c date
+) SERVER file_server
+OPTIONS(program 'perl -e ''print "abc\t123\t1994-09-10\n"''');
+
+SELECT * FROM program_test;
+DROP FOREIGN TABLE program_test;
+
 -- cleanup
 RESET ROLE;
 DROP EXTENSION file_fdw CASCADE;
diff --git a/contrib/file_fdw/output/file_fdw.source b/contrib/file_fdw/output/file_fdw.source
index 6fa5440..7dae388 100644
--- a/contrib/file_fdw/output/file_fdw.source
+++ b/contrib/file_fdw/output/file_fdw.source
@@ -76,7 +76,7 @@ CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'csv', null '
 ');       -- ERROR
 ERROR:  COPY null representation cannot use newline or carriage return
 CREATE FOREIGN TABLE tbl () SERVER file_server;  -- ERROR
-ERROR:  filename is required for file_fdw foreign tables
+ERROR:  either filename or program is required for file_fdw foreign tables
 CREATE FOREIGN TABLE agg_text (
 	a	int2 CHECK (a >= 0),
 	b	float4
@@ -132,7 +132,7 @@ ERROR:  invalid option "force_not_null"
 HINT:  There are no valid options in this context.
 CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (force_not_null '*'); -- ERROR
 ERROR:  invalid option "force_not_null"
-HINT:  Valid options in this context are: filename, format, header, delimiter, quote, escape, null, encoding
+HINT:  Valid options in this context are: filename, program, format, header, delimiter, quote, escape, null, encoding
 -- force_null is not allowed to be specified at any foreign object level:
 ALTER FOREIGN DATA WRAPPER file_fdw OPTIONS (ADD force_null '*'); -- ERROR
 ERROR:  invalid option "force_null"
@@ -145,7 +145,7 @@ ERROR:  invalid option "force_null"
 HINT:  There are no valid options in this context.
 CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (force_null '*'); -- ERROR
 ERROR:  invalid option "force_null"
-HINT:  Valid options in this context are: filename, format, header, delimiter, quote, escape, null, encoding
+HINT:  Valid options in this context are: filename, program, format, header, delimiter, quote, escape, null, encoding
 -- basic query tests
 SELECT * FROM agg_text WHERE b > 10.0 ORDER BY a;
   a  |   b    
@@ -341,6 +341,20 @@ SET ROLE regress_file_fdw_user;
 ALTER FOREIGN TABLE agg_text OPTIONS (SET format 'text');
 ERROR:  only superuser can change options of a file_fdw foreign table
 SET ROLE regress_file_fdw_superuser;
+-- PROGRAM test
+CREATE FOREIGN TABLE program_test(
+	a text,
+	b integer,
+	c date
+) SERVER file_server
+OPTIONS(program 'perl -e ''print "abc\t123\t1994-09-10\n"''');
+SELECT * FROM program_test;
+  a  |  b  |     c      
+-----+-----+------------
+ abc | 123 | 09-10-1994
+(1 row)
+
+DROP FOREIGN TABLE program_test;
 -- cleanup
 RESET ROLE;
 DROP EXTENSION file_fdw CASCADE;
diff --git a/doc/src/sgml/file-fdw.sgml b/doc/src/sgml/file-fdw.sgml
index d3b39aa..4937be8 100644
--- a/doc/src/sgml/file-fdw.sgml
+++ b/doc/src/sgml/file-fdw.sgml
@@ -27,7 +27,26 @@
 
    <listitem>
     <para>
-     Specifies the file to be read.  Required.  Must be an absolute path name.
+     Specifies the file to be read.  Must be an absolute path name.
+     Either <literal>filename</literal> or <literal>program</literal> must be
+     specified.  They are mutually exclusive.
+    </para>
+   </listitem>
+  </varlistentry>
+
+  <varlistentry>
+   <term><literal>program</literal></term>
+
+   <listitem>
+    <para>
+     Specifies the command to executed.
+     Note that the command is invoked by the shell, so if you need to pass any
+     arguments to shell command that come from an untrusted source, you must
+     be careful to strip or escape any special characters that might have a
+     special meaning for the shell. For security reasons, it is best to use a
+     fixed command string, or at least avoid passing any user input in it.
+     Either <literal>program</literal> or <literal>filename</literal> must be
+     specified.  They are mutually exclusive.
     </para>
    </listitem>
   </varlistentry>
@@ -171,9 +190,9 @@
 
  <para>
   Changing table-level options requires superuser privileges, for security
-  reasons: only a superuser should be able to determine which file is read.
-  In principle non-superusers could be allowed to change the other options,
-  but that's not supported at present.
+  reasons: only a superuser should be able to determine which file is read
+  or which program is run.  In principle non-superusers could be allowed to
+  change the other options, but that's not supported at present.
  </para>
 
  <para>
#23Corey Huinker
corey.huinker@gmail.com
In reply to: Amit Langote (#21)
1 attachment(s)
Re: Let file_fdw access COPY FROM PROGRAM

On Mon, Sep 12, 2016 at 1:59 AM, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp

wrote:

On 2016/09/11 8:04, Corey Huinker wrote:

V2 of this patch:

Changes:
* rebased to most recent master
* removed non-tap test that assumed the existence of Unix sed program
* added non-tap test that assumes the existence of perl
* switched from filename/program to filename/is_program to more closely
follow patterns in copy.c
* slight wording change in C comments

This version looks mostly good to me. Except some whitespace noise in
some hunks:

@@ -139,7 +142,9 @@ static bool fileIsForeignScanParallelSafe(PlannerInfo
*root, RelOptInfo *rel,
*/
static bool is_valid_option(const char *option, Oid context);
static void fileGetOptions(Oid foreigntableid,
-               char **filename, List **other_options);
+                           char **filename,
+                           bool *is_program,

Space after "is_program,"

@@ -196,16 +201,17 @@ file_fdw_validator(PG_FUNCTION_ARGS)

/*
* Only superusers are allowed to set options of a file_fdw foreign
table.
-     * This is because the filename is one of those options, and we don't
want
-     * non-superusers to be able to determine which file gets read.
+     * The reason for this is to prevent non-superusers from changing the

Space after "the"

-        if (stat(filename, &stat_buf) == 0)
+        if ((! is_program) && (stat(filename, &stat_buf) == 0)))

Space between ! and is_program.

-        if (strcmp(def->defname, "filename") == 0)
+        if ((strcmp(def->defname, "filename") == 0) ||
(strcmp(def->defname, "program") == 0))

I think the usual style would be to split the if statement into two lines
as follows to keep within 80 characters per line [1]:

+        if ((strcmp(def->defname, "filename") == 0) ||
+            (strcmp(def->defname, "program") == 0))

And likewise for:

-                   &fdw_private->filename, &fdw_private->options);
+                   &fdw_private->filename, &fdw_private->is_program,
&fdw_private->options);

By the way, doesn't the following paragraph in file-fdw.sgml need an
update?

<para>
Changing table-level options requires superuser privileges, for security
reasons: only a superuser should be able to determine which file is read.
In principle non-superusers could be allowed to change the other options,
but that's not supported at present.
</para>

I would like to mark this now as "Ready for Committer".

Thanks,
Amit

[1] https://www.postgresql.org/docs/devel/static/source-format.html

(reposting non-top-posted...sorry)

Thanks for the review!

I agree with all the code cleanups suggested and have made then in the
attached patch, to save the committer some time.

Also in this patch, I changed sgml para to
<para>
Changing table-level options requires superuser privileges, for security
reasons: only a superuser should be able to determine which file is read
or which program is run. In principle non-superusers could be allowed to
change the other options, but that's not supported at present.
</para>

"Determine" is an odd word in this context. I understand it to mean
"set/change", but I can see where a less familiar reader would take the
meaning to be "has permission to see the value already set". Either way, it
now mentions program as an option in addition to filename.

Attachments:

file_fdw_program_v3.difftext/plain; charset=US-ASCII; name=file_fdw_program_v3.diffDownload
diff --git a/contrib/file_fdw/file_fdw.c b/contrib/file_fdw/file_fdw.c
index b471991..7f534b1 100644
--- a/contrib/file_fdw/file_fdw.c
+++ b/contrib/file_fdw/file_fdw.c
@@ -59,6 +59,7 @@ struct FileFdwOption
 static const struct FileFdwOption valid_options[] = {
 	/* File options */
 	{"filename", ForeignTableRelationId},
+	{"program", ForeignTableRelationId},
 
 	/* Format options */
 	/* oids option is not supported */
@@ -85,10 +86,11 @@ static const struct FileFdwOption valid_options[] = {
  */
 typedef struct FileFdwPlanState
 {
-	char	   *filename;		/* file to read */
-	List	   *options;		/* merged COPY options, excluding filename */
-	BlockNumber pages;			/* estimate of file's physical size */
-	double		ntuples;		/* estimate of number of rows in file */
+	char	   *filename;		/* file/program to read */
+	bool		is_program;		/* true if filename represents an OS command */
+	List	   *options;		/* merged COPY options, excluding filename and program */
+	BlockNumber pages;			/* estimate of file or program output's physical size */
+	double		ntuples;		/* estimate of number of rows in file/program output */
 } FileFdwPlanState;
 
 /*
@@ -96,9 +98,10 @@ typedef struct FileFdwPlanState
  */
 typedef struct FileFdwExecutionState
 {
-	char	   *filename;		/* file to read */
-	List	   *options;		/* merged COPY options, excluding filename */
-	CopyState	cstate;			/* state of reading file */
+	char	   *filename;		/* file/program to read */
+	bool		is_program;		/* true if filename represents an OS command */
+	List	   *options;		/* merged COPY options, excluding filename and is_program */
+	CopyState	cstate;			/* state of reading file or program */
 } FileFdwExecutionState;
 
 /*
@@ -139,7 +142,9 @@ static bool fileIsForeignScanParallelSafe(PlannerInfo *root, RelOptInfo *rel,
  */
 static bool is_valid_option(const char *option, Oid context);
 static void fileGetOptions(Oid foreigntableid,
-			   char **filename, List **other_options);
+						   char **filename,
+						   bool *is_program,
+						   List **other_options);
 static List *get_file_fdw_attribute_options(Oid relid);
 static bool check_selective_binary_conversion(RelOptInfo *baserel,
 								  Oid foreigntableid,
@@ -196,16 +201,17 @@ file_fdw_validator(PG_FUNCTION_ARGS)
 
 	/*
 	 * Only superusers are allowed to set options of a file_fdw foreign table.
-	 * This is because the filename is one of those options, and we don't want
-	 * non-superusers to be able to determine which file gets read.
+	 * The reason for this is to prevent non-superusers from changing the
+	 * definition to access an arbitrary file not visible to that user
+	 * or to run programs not accessible to that user.
 	 *
 	 * Putting this sort of permissions check in a validator is a bit of a
 	 * crock, but there doesn't seem to be any other place that can enforce
 	 * the check more cleanly.
 	 *
-	 * Note that the valid_options[] array disallows setting filename at any
-	 * options level other than foreign table --- otherwise there'd still be a
-	 * security hole.
+	 * Note that the valid_options[] array disallows setting filename and
+	 * program at any options level other than foreign table --- otherwise
+	 * there'd still be a security hole.
 	 */
 	if (catalog == ForeignTableRelationId && !superuser())
 		ereport(ERROR,
@@ -247,11 +253,12 @@ file_fdw_validator(PG_FUNCTION_ARGS)
 		}
 
 		/*
-		 * Separate out filename and column-specific options, since
+		 * Separate out filename, program, and column-specific options, since
 		 * ProcessCopyOptions won't accept them.
 		 */
 
-		if (strcmp(def->defname, "filename") == 0)
+		if ((strcmp(def->defname, "filename") == 0) ||
+			(strcmp(def->defname, "program") == 0))
 		{
 			if (filename)
 				ereport(ERROR,
@@ -296,12 +303,13 @@ file_fdw_validator(PG_FUNCTION_ARGS)
 	ProcessCopyOptions(NULL, NULL, true, other_options);
 
 	/*
-	 * Filename option is required for file_fdw foreign tables.
+	 * Either filename or program option is required for file_fdw foreign
+	 * tables.
 	 */
 	if (catalog == ForeignTableRelationId && filename == NULL)
 		ereport(ERROR,
 				(errcode(ERRCODE_FDW_DYNAMIC_PARAMETER_VALUE_NEEDED),
-				 errmsg("filename is required for file_fdw foreign tables")));
+				 errmsg("either filename or program is required for file_fdw foreign tables")));
 
 	PG_RETURN_VOID();
 }
@@ -326,12 +334,13 @@ is_valid_option(const char *option, Oid context)
 /*
  * Fetch the options for a file_fdw foreign table.
  *
- * We have to separate out "filename" from the other options because
- * it must not appear in the options list passed to the core COPY code.
+ * We have to separate out "filename" and "program" from the other options
+ * because it must not appear in the options list passed to the core COPY
+ * code.
  */
 static void
 fileGetOptions(Oid foreigntableid,
-			   char **filename, List **other_options)
+			   char **filename, bool *is_program, List **other_options)
 {
 	ForeignTable *table;
 	ForeignServer *server;
@@ -359,9 +368,10 @@ fileGetOptions(Oid foreigntableid,
 	options = list_concat(options, get_file_fdw_attribute_options(foreigntableid));
 
 	/*
-	 * Separate out the filename.
+	 * Separate out the filename and program.
 	 */
 	*filename = NULL;
+	*is_program = false;
 	prev = NULL;
 	foreach(lc, options)
 	{
@@ -373,6 +383,13 @@ fileGetOptions(Oid foreigntableid,
 			options = list_delete_cell(options, lc, prev);
 			break;
 		}
+		else if (strcmp(def->defname, "program") == 0)
+		{
+			*filename = defGetString(def);
+			*is_program = true;
+			options = list_delete_cell(options, lc, prev);
+			break;
+		}
 		prev = lc;
 	}
 
@@ -381,7 +398,7 @@ fileGetOptions(Oid foreigntableid,
 	 * options, but check again, just in case.
 	 */
 	if (*filename == NULL)
-		elog(ERROR, "filename is required for file_fdw foreign tables");
+		elog(ERROR, "either filename or program is required for file_fdw foreign tables");
 
 	*other_options = options;
 }
@@ -475,12 +492,15 @@ fileGetForeignRelSize(PlannerInfo *root,
 	FileFdwPlanState *fdw_private;
 
 	/*
-	 * Fetch options.  We only need filename at this point, but we might as
-	 * well get everything and not need to re-fetch it later in planning.
+	 * Fetch options.  We only need filename (or program) at this point, but
+	 * we might as well get everything and not need to re-fetch it later in
+	 * planning.
 	 */
 	fdw_private = (FileFdwPlanState *) palloc(sizeof(FileFdwPlanState));
 	fileGetOptions(foreigntableid,
-				   &fdw_private->filename, &fdw_private->options);
+				   &fdw_private->filename,
+				   &fdw_private->is_program,
+				   &fdw_private->options);
 	baserel->fdw_private = (void *) fdw_private;
 
 	/* Estimate relation size */
@@ -583,20 +603,27 @@ static void
 fileExplainForeignScan(ForeignScanState *node, ExplainState *es)
 {
 	char	   *filename;
+	bool		is_program;
 	List	   *options;
 
-	/* Fetch options --- we only need filename at this point */
+	/* Fetch options --- we only need filename and is_program at this point */
 	fileGetOptions(RelationGetRelid(node->ss.ss_currentRelation),
-				   &filename, &options);
+				   &filename, &is_program, &options);
 
-	ExplainPropertyText("Foreign File", filename, es);
+	if(filename)
+	{
+		if (is_program)
+			ExplainPropertyText("Foreign Program", filename, es);
+		else
+			ExplainPropertyText("Foreign File", filename, es);
+	}
 
 	/* Suppress file size if we're not showing cost details */
 	if (es->costs)
 	{
 		struct stat stat_buf;
 
-		if (stat(filename, &stat_buf) == 0)
+		if ((!is_program) && (stat(filename, &stat_buf) == 0))
 			ExplainPropertyLong("Foreign File Size", (long) stat_buf.st_size,
 								es);
 	}
@@ -611,6 +638,7 @@ fileBeginForeignScan(ForeignScanState *node, int eflags)
 {
 	ForeignScan *plan = (ForeignScan *) node->ss.ps.plan;
 	char	   *filename;
+	bool		is_program;
 	List	   *options;
 	CopyState	cstate;
 	FileFdwExecutionState *festate;
@@ -623,7 +651,7 @@ fileBeginForeignScan(ForeignScanState *node, int eflags)
 
 	/* Fetch options of foreign table */
 	fileGetOptions(RelationGetRelid(node->ss.ss_currentRelation),
-				   &filename, &options);
+				   &filename, &is_program, &options);
 
 	/* Add any options from the plan (currently only convert_selectively) */
 	options = list_concat(options, plan->fdw_private);
@@ -633,11 +661,11 @@ fileBeginForeignScan(ForeignScanState *node, int eflags)
 	 * as to match the expected ScanTupleSlot signature.
 	 */
 	cstate = BeginCopyFrom(NULL,
-						   node->ss.ss_currentRelation,
-						   filename,
-						   false,
-						   NIL,
-						   options);
+							node->ss.ss_currentRelation,
+							filename,
+							is_program,
+							NIL,
+							options);
 
 	/*
 	 * Save state in node->fdw_state.  We must save enough information to call
@@ -645,6 +673,7 @@ fileBeginForeignScan(ForeignScanState *node, int eflags)
 	 */
 	festate = (FileFdwExecutionState *) palloc(sizeof(FileFdwExecutionState));
 	festate->filename = filename;
+	festate->is_program = is_program;
 	festate->options = options;
 	festate->cstate = cstate;
 
@@ -709,7 +738,7 @@ fileReScanForeignScan(ForeignScanState *node)
 	festate->cstate = BeginCopyFrom(NULL,
 									node->ss.ss_currentRelation,
 									festate->filename,
-									false,
+									festate->is_program,
 									NIL,
 									festate->options);
 }
@@ -738,11 +767,19 @@ fileAnalyzeForeignTable(Relation relation,
 						BlockNumber *totalpages)
 {
 	char	   *filename;
+	bool		is_program;
 	List	   *options;
 	struct stat stat_buf;
 
 	/* Fetch options of foreign table */
-	fileGetOptions(RelationGetRelid(relation), &filename, &options);
+	fileGetOptions(RelationGetRelid(relation), &filename, &is_program, &options);
+
+	/*
+	 * If this is a program instead of a file, just return false to skip
+	 * analyzing the table.
+	 */
+	if (is_program)
+		return false;
 
 	/*
 	 * Get size of the file.  (XXX if we fail here, would it be better to just
@@ -916,9 +953,10 @@ estimate_size(PlannerInfo *root, RelOptInfo *baserel,
 
 	/*
 	 * Get size of the file.  It might not be there at plan time, though, in
-	 * which case we have to use a default estimate.
+	 * which case we have to use a default estimate.  We also have to fall
+	 * back to the default if using a program as the input.
 	 */
-	if (stat(fdw_private->filename, &stat_buf) < 0)
+	if (fdw_private->is_program || stat(fdw_private->filename, &stat_buf) < 0)
 		stat_buf.st_size = 10 * BLCKSZ;
 
 	/*
@@ -1036,6 +1074,7 @@ file_acquire_sample_rows(Relation onerel, int elevel,
 	bool	   *nulls;
 	bool		found;
 	char	   *filename;
+	bool		is_program;
 	List	   *options;
 	CopyState	cstate;
 	ErrorContextCallback errcallback;
@@ -1050,12 +1089,12 @@ file_acquire_sample_rows(Relation onerel, int elevel,
 	nulls = (bool *) palloc(tupDesc->natts * sizeof(bool));
 
 	/* Fetch options of foreign table */
-	fileGetOptions(RelationGetRelid(onerel), &filename, &options);
+	fileGetOptions(RelationGetRelid(onerel), &filename, &is_program, &options);
 
 	/*
 	 * Create CopyState from FDW options.
 	 */
-	cstate = BeginCopyFrom(NULL, onerel, filename, false, NIL, options);
+	cstate = BeginCopyFrom(NULL, onerel, filename, is_program, NIL, options);
 
 	/*
 	 * Use per-tuple memory context to prevent leak of memory used to read
diff --git a/contrib/file_fdw/input/file_fdw.source b/contrib/file_fdw/input/file_fdw.source
index 685561f..59a983d 100644
--- a/contrib/file_fdw/input/file_fdw.source
+++ b/contrib/file_fdw/input/file_fdw.source
@@ -185,6 +185,17 @@ SET ROLE regress_file_fdw_user;
 ALTER FOREIGN TABLE agg_text OPTIONS (SET format 'text');
 SET ROLE regress_file_fdw_superuser;
 
+-- PROGRAM test
+CREATE FOREIGN TABLE program_test(
+	a text,
+	b integer,
+	c date
+) SERVER file_server
+OPTIONS(program 'perl -e ''print "abc\t123\t1994-09-10\n"''');
+
+SELECT * FROM program_test;
+DROP FOREIGN TABLE program_test;
+
 -- cleanup
 RESET ROLE;
 DROP EXTENSION file_fdw CASCADE;
diff --git a/contrib/file_fdw/output/file_fdw.source b/contrib/file_fdw/output/file_fdw.source
index 6fa5440..7dae388 100644
--- a/contrib/file_fdw/output/file_fdw.source
+++ b/contrib/file_fdw/output/file_fdw.source
@@ -76,7 +76,7 @@ CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (format 'csv', null '
 ');       -- ERROR
 ERROR:  COPY null representation cannot use newline or carriage return
 CREATE FOREIGN TABLE tbl () SERVER file_server;  -- ERROR
-ERROR:  filename is required for file_fdw foreign tables
+ERROR:  either filename or program is required for file_fdw foreign tables
 CREATE FOREIGN TABLE agg_text (
 	a	int2 CHECK (a >= 0),
 	b	float4
@@ -132,7 +132,7 @@ ERROR:  invalid option "force_not_null"
 HINT:  There are no valid options in this context.
 CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (force_not_null '*'); -- ERROR
 ERROR:  invalid option "force_not_null"
-HINT:  Valid options in this context are: filename, format, header, delimiter, quote, escape, null, encoding
+HINT:  Valid options in this context are: filename, program, format, header, delimiter, quote, escape, null, encoding
 -- force_null is not allowed to be specified at any foreign object level:
 ALTER FOREIGN DATA WRAPPER file_fdw OPTIONS (ADD force_null '*'); -- ERROR
 ERROR:  invalid option "force_null"
@@ -145,7 +145,7 @@ ERROR:  invalid option "force_null"
 HINT:  There are no valid options in this context.
 CREATE FOREIGN TABLE tbl () SERVER file_server OPTIONS (force_null '*'); -- ERROR
 ERROR:  invalid option "force_null"
-HINT:  Valid options in this context are: filename, format, header, delimiter, quote, escape, null, encoding
+HINT:  Valid options in this context are: filename, program, format, header, delimiter, quote, escape, null, encoding
 -- basic query tests
 SELECT * FROM agg_text WHERE b > 10.0 ORDER BY a;
   a  |   b    
@@ -341,6 +341,20 @@ SET ROLE regress_file_fdw_user;
 ALTER FOREIGN TABLE agg_text OPTIONS (SET format 'text');
 ERROR:  only superuser can change options of a file_fdw foreign table
 SET ROLE regress_file_fdw_superuser;
+-- PROGRAM test
+CREATE FOREIGN TABLE program_test(
+	a text,
+	b integer,
+	c date
+) SERVER file_server
+OPTIONS(program 'perl -e ''print "abc\t123\t1994-09-10\n"''');
+SELECT * FROM program_test;
+  a  |  b  |     c      
+-----+-----+------------
+ abc | 123 | 09-10-1994
+(1 row)
+
+DROP FOREIGN TABLE program_test;
 -- cleanup
 RESET ROLE;
 DROP EXTENSION file_fdw CASCADE;
diff --git a/doc/src/sgml/file-fdw.sgml b/doc/src/sgml/file-fdw.sgml
index d3b39aa..4937be8 100644
--- a/doc/src/sgml/file-fdw.sgml
+++ b/doc/src/sgml/file-fdw.sgml
@@ -27,7 +27,26 @@
 
    <listitem>
     <para>
-     Specifies the file to be read.  Required.  Must be an absolute path name.
+     Specifies the file to be read.  Must be an absolute path name.
+     Either <literal>filename</literal> or <literal>program</literal> must be
+     specified.  They are mutually exclusive.
+    </para>
+   </listitem>
+  </varlistentry>
+
+  <varlistentry>
+   <term><literal>program</literal></term>
+
+   <listitem>
+    <para>
+     Specifies the command to executed.
+     Note that the command is invoked by the shell, so if you need to pass any
+     arguments to shell command that come from an untrusted source, you must
+     be careful to strip or escape any special characters that might have a
+     special meaning for the shell. For security reasons, it is best to use a
+     fixed command string, or at least avoid passing any user input in it.
+     Either <literal>program</literal> or <literal>filename</literal> must be
+     specified.  They are mutually exclusive.
     </para>
    </listitem>
   </varlistentry>
@@ -171,9 +190,9 @@
 
  <para>
   Changing table-level options requires superuser privileges, for security
-  reasons: only a superuser should be able to determine which file is read.
-  In principle non-superusers could be allowed to change the other options,
-  but that's not supported at present.
+  reasons: only a superuser should be able to determine which file is read
+  or which program is run.  In principle non-superusers could be allowed to
+  change the other options, but that's not supported at present.
  </para>
 
  <para>
#24Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Corey Huinker (#23)
Re: Let file_fdw access COPY FROM PROGRAM

On 2016/09/13 2:01, Corey Huinker wrote:

Thanks for the review!

I agree with all the code cleanups suggested and have made then in the
attached patch, to save the committer some time.

Thanks. Have already marked the patch as ready for the committer.

Also in this patch, I changed sgml para to
<para>
Changing table-level options requires superuser privileges, for security
reasons: only a superuser should be able to determine which file is read
or which program is run. In principle non-superusers could be allowed to
change the other options, but that's not supported at present.
</para>

"Determine" is an odd word in this context. I understand it to mean
"set/change", but I can see where a less familiar reader would take the
meaning to be "has permission to see the value already set". Either way,
it now mentions program as an option in addition to filename.

Agreed.

Thanks,
Amit

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#25Tom Lane
tgl@sss.pgh.pa.us
In reply to: Corey Huinker (#23)
Re: Let file_fdw access COPY FROM PROGRAM

Corey Huinker <corey.huinker@gmail.com> writes:

[ file_fdw_program_v3.diff ]

Pushed with cosmetic adjustments, mostly more work on the comments and
documentation.

I did not push the proposed test case; it's unportable. The upthread
suggestion to add a TAP test would have been all right, because
--enable-tap-tests requires Perl, but the basic regression tests must
not. I'm a bit dubious that it'd be worth the work to create such a
test anyway, when COPY FROM PROGRAM itself hasn't got one.

What *would* be worth some effort is allowing ANALYZE on a file_fdw
table reading from a program. I concur that that probably can't be
the default behavior, but always falling back to the 10-block default
with no pg_stats stats is a really horrid prospect.

One idea is to invent another table-level FDW option "analyze".
If we could make that default to true for files and false for programs,
it'd preserve the desired default behavior, but it would add a feature
for plain files too: if they're too unstable to be worth analyzing,
you could turn it off.

Another thought is that maybe manual ANALYZE should go through in any
case, and the FDW option would only be needed to control auto-analyze.
Although I'm not sure what to think about scripted cases like
vacuumdb --analyze. Maybe we'd need two flags, one permitting explicit
ANALYZE and one for autoanalyze, which could have different defaults.

Another thing that felt a little unfinished was the cost estimation
behavior. Again, it's not clear how to do any better by default,
but is it worth the trouble to provide an FDW option to let the user
set the cost estimate for reading the table? I'm not sure honestly.
Since there's only one path for the FDW itself, the cost estimate
doesn't matter in simple cases, and I'm not sure how much it matters
even in more complicated ones. It definitely sucks that we don't
have a rows estimate that has anything to do with reality, but allowing
ANALYZE would be enough to handle that.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers