COPY as a set returning function

Started by Corey Huinkerover 9 years ago26 messages

corey.huinker@gmail.com

over 9 years ago

1 attachment(s)

Attached is a _very_ rough patch implementing a proof-of-concept function
copy_srf();

It allows you to do things like this:

# select a,c,e from copy_srf('echo 12,345,67,89,2016-01-01',true) as t(a
integer, b text, c text, d text, e date);
a | c | e
----+----+------------
12 | 67 | 2016-01-01
(1 row)

Uses for this include:
- avoiding the pattern of creating a temp table just to select all the rows
back out and then discard the table (avoidable disk activity, avoidable oid
churn)
- avoiding the pattern of creating a file_fdw table in pg_temp just to drop
it after one select (avoidable oid churn)
- filtering file/program input by the columns that are relevant to the
user's needs.

This experiment arose from my realization that file_fdw just plugs into the
externally visible portions of copy.c to do all of it's work. So why
couldn't we do the same for a set returning function? Of course it wasn't
as simple as that. The existing Begin/NextCopyFrom functions require the
->rel to be a valid Oid...which we won't have in this context, so I had to
bypass that code and use CopyFromRawFields() directly...which isn't
externally visible, hence this being a patch to core rather than an
extension.

Currently the function only accepts two parameters, "filename" and
"is_program". Header is always false and csv mode is always true. Obviously
if we go forward on this, we'll want to add that functionality back in, but
I'm holding off for now to keep the example simple and wait for consensus
on future direction.

As for that future direction, we could either have:
- a robust function named something like copy_srf(), with parameters for
all of the relevant options found in the COPY command
- a function that accepts an options string and parse that
- we could alter the grammar to make COPY RETURNING col1, col3, col5 FROM
'filename' a legit CTE.

Regardless of the path forward, I'm going to need help in getting there,
hence this email. Thank you for your consideration.

Attachments:

copy_as_a_srf.diffapplication/octet-stream; name=copy_as_a_srf.diffDownload

diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 432b0ca..5ba2f1f 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -579,7 +579,6 @@ CopyGetData(CopyState cstate, void *databuf, int minread, int maxread)
 						 errmsg("could not read from COPY file: %m")));
 			break;
 		case COPY_OLD_FE:
-
 			/*
 			 * We cannot read more than minread bytes (which in practice is 1)
 			 * because old protocol doesn't have any clear way of separating
@@ -4540,3 +4539,235 @@ CreateCopyDestReceiver(void)
 
 	return (DestReceiver *) self;
 }
+
+/*
+ * copy_srf starts here
+ */
+
+#include "funcapi.h"
+/*
+#ifdef PG_MODULE_MAGIC
+PG_MODULE_MAGIC;
+#endif
+*/
+
+PG_FUNCTION_INFO_V1(copy_srf);
+
+Datum
+copy_srf(PG_FUNCTION_ARGS)
+{
+	ReturnSetInfo	*rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+	TupleDesc		tupdesc;
+	//AttInMetadata	*attinmeta;
+	Tuplestorestate *tupstore = NULL;
+	MemoryContext	per_query_ctx;
+	MemoryContext	oldcontext;
+	FmgrInfo        *in_functions;
+	Oid             *typioparams;
+	Oid             in_func_oid;
+
+	CopyStateData	copy_state;
+	int			col;
+
+	Datum       *values;
+	bool		*nulls;
+
+	/* check to see if caller supports us returning a tuplestore */
+	if (rsinfo == NULL || !IsA(rsinfo, ReturnSetInfo))
+	{
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("set-valued function called in context that cannot accept a set")));
+	}
+
+	if (!(rsinfo->allowedModes & SFRM_Materialize) || rsinfo->expectedDesc == NULL)
+	{
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("materialize mode required, but it is not allowed in this context")));
+	}
+
+	tupdesc = CreateTupleDescCopy(rsinfo->expectedDesc);
+	//attinmeta = TupleDescGetAttInMetadata(tupdesc);
+	values = (Datum *) palloc(tupdesc->natts * sizeof(Datum));
+	nulls = (bool *) palloc(tupdesc->natts * sizeof(bool));
+	in_functions = (FmgrInfo *) palloc(tupdesc->natts * sizeof(FmgrInfo));
+	typioparams = (Oid *) palloc(tupdesc->natts * sizeof(Oid));
+
+	for (col = 0; col < tupdesc->natts; col++)
+	{
+		getTypeInputInfo(tupdesc->attrs[col]->atttypid,&in_func_oid,&typioparams[col]);
+		fmgr_info(in_func_oid,&in_functions[col]);
+	}
+
+	/* Mock up a copy state */
+	initStringInfo(&copy_state.line_buf);
+	initStringInfo(&copy_state.attribute_buf);
+	copy_state.fe_msgbuf = makeStringInfo();
+
+	//copy_state.copy_dest = COPY_NEW_FE; /* maybe just COPY_FILE? */
+	copy_state.copy_dest = COPY_FILE; /* maybe just COPY_FILE? */
+	copy_state.file_encoding = 0; /* TODO total guess */
+	copy_state.need_transcoding = false; /* TODO a guess */
+	copy_state.encoding_embeds_ascii = false; /* TODO a guess */
+
+	/* parameters from the COPY command */
+	copy_state.rel = NULL;
+	copy_state.queryDesc = NULL;
+	/* TODO */
+	//List	   *attnumlist;		/* integer list of attnums to copy */
+	//copy_state.filename = "/bin/echo 123,abc,456,def,2016-01-04";
+	if (PG_ARGISNULL(0))
+	{
+		copy_state.filename = NULL;
+	}
+	else
+	{
+		text *fn = PG_GETARG_TEXT_P(0);
+		int len = VARSIZE(fn) - VARHDRSZ;
+		copy_state.filename = (char *) palloc(len);
+		memcpy(copy_state.filename,VARDATA(fn),len);
+	}
+	//copy_state.is_program = true;
+	copy_state.is_program = PG_GETARG_BOOL(1);
+	copy_state.binary = false;
+	copy_state.oids = false;
+	copy_state.freeze = false;
+	copy_state.csv_mode = true;
+	copy_state.header_line = false;
+	//char	   *null_print;		/* NULL marker string (server encoding!) */
+	//int			null_print_len; /* length of same */
+	//char	   *null_print_client;		/* same converted to file encoding */
+	copy_state.delim = ",";
+	copy_state.quote = "\"";
+	copy_state.escape = "\\";
+	//List	   *force_quote;	/* list of column names */
+	copy_state.force_quote_all = false;
+	//bool	   *force_quote_flags;		/* per-column CSV FQ flags */
+	//List	   *force_notnull;	/* list of column names */
+	//bool	   *force_notnull_flags;	/* per-column CSV FNN flags */
+	//List	   *force_null;		/* list of column names */
+	//bool	   *force_null_flags;		/* per-column CSV FN flags */
+	//bool		convert_selectively;	/* do selective binary conversion? */
+	//List	   *convert_select; /* list of column names (can be NIL) */
+	//bool	   *convert_select_flags;	/* per-column CSV/TEXT CS flags */
+	copy_state.max_fields = tupdesc->natts;
+	copy_state.raw_fields = (char **) palloc(tupdesc->natts * sizeof(char *));
+
+	/* let the caller know we're sending back a tuplestore */
+	rsinfo->returnMode = SFRM_Materialize;
+	per_query_ctx = fcinfo->flinfo->fn_mcxt;
+	oldcontext = MemoryContextSwitchTo(per_query_ctx);
+
+	tupstore = tuplestore_begin_heap(true,false,work_mem);
+
+	/* open "file" */
+	if (copy_state.is_program)
+	{
+		copy_state.copy_file = OpenPipeStream(copy_state.filename, PG_BINARY_R);
+
+		if (copy_state.copy_file == NULL)
+		{
+			ereport(ERROR,
+					(errcode_for_file_access(),
+					 errmsg("could not execute command \"%s\": %m",
+							copy_state.filename)));
+		}
+	}
+	else
+	{
+		struct stat st;
+
+		copy_state.copy_file = AllocateFile(copy_state.filename, PG_BINARY_R);
+		if (copy_state.copy_file == NULL)
+		{
+			/* copy errno because ereport subfunctions might change it */
+			int         save_errno = errno;
+
+			ereport(ERROR,
+					(errcode_for_file_access(),
+					 errmsg("could not open file \"%s\" for reading: %m",
+							copy_state.filename),
+					 (save_errno == ENOENT || save_errno == EACCES) ?
+					 errhint("copy_srf instructs the PostgreSQL server process to read a file. "
+							 "You may want a client-side facility such as psql's \\copy.") : 0));
+		}
+
+		if (fstat(fileno(copy_state.copy_file), &st))
+			ereport(ERROR,
+					(errcode_for_file_access(),
+					 errmsg("could not stat file \"%s\": %m",
+							copy_state.filename)));
+
+		if (S_ISDIR(st.st_mode))
+			ereport(ERROR,
+					(errcode(ERRCODE_WRONG_OBJECT_TYPE),
+					 errmsg("\"%s\" is a directory", copy_state.filename)));
+	}
+
+	while(1)
+	{
+		char    **field_strings;
+		int     field_strings_count;
+		int     col;
+		HeapTuple tuple;
+
+		if (! NextCopyFromRawFields(&copy_state,&field_strings,&field_strings_count))
+		{
+			break;
+		}
+		if (field_strings_count != tupdesc->natts)
+		{
+			ereport(ERROR,
+					(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+					 errmsg("found %d fields but expected %d on line %d",
+							field_strings_count, tupdesc->natts, copy_state.cur_lineno)));
+		}
+
+		for (col = 0; col < tupdesc->natts; col++)
+		{
+			values[col] = InputFunctionCall(&in_functions[col],
+											field_strings[col],
+											typioparams[col],
+											tupdesc->attrs[col]->atttypmod);
+			nulls[col] = (field_strings[col] == NULL);
+		}
+
+		tuple = heap_form_tuple(tupdesc,values,nulls);
+		//tuple = BuildTupleFromCStrings(attinmeta, field_strings);
+		tuplestore_puttuple(tupstore, tuple);
+	}
+
+	/* close "file" */
+	if (copy_state.is_program)
+	{
+		int         pclose_rc;
+
+		pclose_rc = ClosePipeStream(copy_state.copy_file);
+		if (pclose_rc == -1)
+			ereport(ERROR,
+					(errcode_for_file_access(),
+					 errmsg("could not close pipe to external command: %m")));
+		else if (pclose_rc != 0)
+			ereport(ERROR,
+					(errcode(ERRCODE_EXTERNAL_ROUTINE_EXCEPTION),
+					 errmsg("program \"%s\" failed",
+							copy_state.filename),
+					 errdetail_internal("%s", wait_result_to_str(pclose_rc))));
+	}
+	else
+	{
+		if (copy_state.filename != NULL && FreeFile(copy_state.copy_file))
+			ereport(ERROR,
+					(errcode_for_file_access(),
+					 errmsg("could not close file \"%s\": %m",
+							copy_state.filename)));
+	}
+
+	tuplestore_donestoring(tupstore);
+	rsinfo->setResult = tupstore;
+	rsinfo->setDesc = tupdesc;
+	MemoryContextSwitchTo(oldcontext);
+
+	return (Datum) 0;
+}
diff --git a/src/include/catalog/pg_proc.h b/src/include/catalog/pg_proc.h
index e2d08ba..de3973f 100644
--- a/src/include/catalog/pg_proc.h
+++ b/src/include/catalog/pg_proc.h
@@ -5344,6 +5344,9 @@ DESCR("pg_controldata recovery state information as a function");
 DATA(insert OID = 3444 ( pg_control_init PGNSP PGUID 12 1 0 0 0 f f f f t f v s 0 0 2249 "" "{23,23,23,23,23,23,23,23,23,16,16,16,23}" "{o,o,o,o,o,o,o,o,o,o,o,o,o}" "{max_data_alignment,database_block_size,blocks_per_segment,wal_block_size,bytes_per_wal_segment,max_identifier_length,max_index_columns,max_toast_chunk_size,large_object_chunk_size,bigint_timestamps,float4_pass_by_value,float8_pass_by_value,data_page_checksum_version}" _null_ _null_ pg_control_init _null_ _null_ _null_ ));
 DESCR("pg_controldata init state information as a function");
 
+DATA(insert OID = 3445 (  copy_srf PGNSP PGUID 12 1 1000 0 0 f f f f t t v u 2 0 2249 "25 16" _null_ _null_ _null_ _null_ _null_ copy_srf _null_ _null_ _null_ ));
+DESCR("set-returning COPY proof of concept");
+
 /*
  * Symbolic values for provolatile column: these indicate whether the result
  * of a function is dependent *only* on the values of its explicit arguments,

Tom Lane

tgl@sss.pgh.pa.us

over 9 years ago

In reply to: Corey Huinker (#1)

Re: COPY as a set returning function

Corey Huinker <corey.huinker@gmail.com> writes:

Attached is a _very_ rough patch implementing a proof-of-concept function
copy_srf();
...
As for that future direction, we could either have:
- a robust function named something like copy_srf(), with parameters for
all of the relevant options found in the COPY command
- a function that accepts an options string and parse that
- we could alter the grammar to make COPY RETURNING col1, col3, col5 FROM
'filename' a legit CTE.

I think the last of those suggestions has come up before. It has the
large advantage that you don't have to remember a different syntax for
copy-as-a-function. Once you had the framework for that, other
rows-returning utility commands such as EXPLAIN might plug in as well,
whenever somebody got enough of an itch for it.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Craig Ringer

craig.ringer@2ndquadrant.com

over 9 years ago

In reply to: Tom Lane (#2)

Re: COPY as a set returning function

On 1 Oct. 2016 05:20, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:

Corey Huinker <corey.huinker@gmail.com> writes:

Attached is a _very_ rough patch implementing a proof-of-concept

function

copy_srf();
...
As for that future direction, we could either have:
- a robust function named something like copy_srf(), with parameters for
all of the relevant options found in the COPY command
- a function that accepts an options string and parse that
- we could alter the grammar to make COPY RETURNING col1, col3, col5

FROM

'filename' a legit CTE.

I think the last of those suggestions has come up before. It has the
large advantage that you don't have to remember a different syntax for
copy-as-a-function. Once you had the framework for that, other
rows-returning utility commands such as EXPLAIN might plug in as well,
whenever somebody got enough of an itch for it.

That sounds fantastic. It'd help this copy variant retain festure parity
with normal copy. And it'd bring us closer to being able to FETCH in non
queries.

Show quoted text

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Tom Lane

tgl@sss.pgh.pa.us

over 9 years ago

In reply to: Craig Ringer (#3)

Re: COPY as a set returning function

Craig Ringer <craig.ringer@2ndquadrant.com> writes:

On 1 Oct. 2016 05:20, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:

I think the last of those suggestions has come up before. It has the
large advantage that you don't have to remember a different syntax for
copy-as-a-function.

That sounds fantastic. It'd help this copy variant retain festure parity
with normal copy. And it'd bring us closer to being able to FETCH in non
queries.

On second thought, though, this couldn't exactly duplicate the existing
COPY syntax, because COPY relies heavily on the rowtype of the named
target table to tell it what it's copying. You'd need some new syntax
to provide the list of column names and types, which puts a bit of
a hole in the "syntax we already know" argument. A SRF-returning-record
would have a leg up on that, because we do have existing syntax for
defining the concrete rowtype that any particular call returns.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Corey Huinker

corey.huinker@gmail.com

about 9 years ago

In reply to: Tom Lane (#4)

Re: COPY as a set returning function

That sounds fantastic. It'd help this copy variant retain festure parity
with normal copy. And it'd bring us closer to being able to FETCH in non
queries.

On second thought, though, this couldn't exactly duplicate the existing
COPY syntax, because COPY relies heavily on the rowtype of the named
target table to tell it what it's copying. You'd need some new syntax
to provide the list of column names and types, which puts a bit of
a hole in the "syntax we already know" argument. A SRF-returning-record
would have a leg up on that, because we do have existing syntax for
defining the concrete rowtype that any particular call returns.

regards, tom lane

I would like to make COPY itself a SRF. That's a bit beyond my
capabilities, so if that is the route we want to go, I will need help.

The syntax would probably look like this (new bits in bold):

WITH my_copy AS (
COPY FROM 'example.csv' TO *RESULT SET(c1 text, c2 integer, dummy1
text, dummy2 text, c5 date)* WITH (FORMAT CSV)
*RETURNING c1, c2, c3*
)
SELECT ...
FROM my_copy
LEFT OUTER JOIN ref_table ...

The RESULT SET (colspecs) bit would be the rsinfo currently used by
copy_srf(). It would be nice if the CTE declaration could take types, but
it can't.

The RETURNING clause here doesn't return all the columns made available
from the COPY. That would be nice, but not required because the same
filtration could be done when the CTE is referenced. So if we require RETURNING
* be the only returning option I'd be fine with that.

If we're ok with adding a function like copy_srf() to the core, will we
still be happy with it when COPY does get a RETURNING clause?

Somewhat off-topic: here's some other observations of a n00b who spent a
fair amount of time looking at the copy.c code.

1. BeginCopyFrom and NextCopyFrom pull attlist/tupdesc info from ->rel,
repeatedly. If we were going to try to leverage that code we'd need to
store those things in a separate cstate member so that we add complexity
only in the initialization of the copy state data struct, pulling the
result structure from rsinfo rather than a relation. There's probably a
minor performance gain to be had in keeping that info around. Refactoring
those two procs to allow for a pre-set attlist/tupdesc would help.

2. NextCopyFrom() checks every single time to see if it's row 0 and if it
should skip this header row. I know a single (row_num == 0 && has_header)
isn't much extra processing, but shouldn't we digest and discard headers
before going into the per-row loop?

3. All the code that handles indexes, constraints, buffering, etc, simply
doesn't apply in the SRF context.

4. The code somewhat needlessly mixes code for the COPY FROM and COPY TO
cases. There's probably a good reason for this, but it made for a lot of
clutter in achieving my very narrow goal.

Craig Ringer

craig.ringer@2ndquadrant.com

about 9 years ago

In reply to: Corey Huinker (#1)

Re: COPY as a set returning function

On 15 Oct. 2016 04:56, "Corey Huinker" <corey.huinker@gmail.com> wrote:

I would like to make COPY itself a SRF. That's a bit beyond my

capabilities, so if that is the route we want to go, I will need help.

The syntax would probably look like this (new bits in bold):

WITH my_copy AS (
COPY FROM 'example.csv' TO RESULT SET(c1 text, c2 integer, dummy1

text, dummy2 text, c5 date) WITH (FORMAT CSV)

RETURNING c1, c2, c3
)

Strong -1 from me on this approach. Our CTE implementation materializes
everything so this is no better than COPYing to a temp table.

Not unless you plan to fix that (and figure out the backward compatibility
issues since the bug is documented as a feature) or implement RETURNING in
subqueries... I'd go for the function.

Import Notes

Reply to msg id not found: CAMsr+YHE0Kq63VnxEgNEC2+vefKDbHjpfPbGNDsGPRwyvcx2g@mail.gmail.com

Corey Huinker

corey.huinker@gmail.com

about 9 years ago

In reply to: Craig Ringer (#6)

Re: COPY as a set returning function

On Sun, Oct 16, 2016 at 9:01 AM, Craig Ringer <craig.ringer@2ndquadrant.com>
wrote:

On 15 Oct. 2016 04:56, "Corey Huinker" <corey.huinker@gmail.com> wrote:

I would like to make COPY itself a SRF. That's a bit beyond my

capabilities, so if that is the route we want to go, I will need help.

The syntax would probably look like this (new bits in bold):

WITH my_copy AS (
COPY FROM 'example.csv' TO RESULT SET(c1 text, c2 integer, dummy1

text, dummy2 text, c5 date) WITH (FORMAT CSV)

RETURNING c1, c2, c3
)

Strong -1 from me on this approach. Our CTE implementation materializes
everything so this is no better than COPYing to a temp table.

Not unless you plan to fix that (and figure out the backward compatibility
issues since the bug is documented as a feature) or implement RETURNING in
subqueries... I'd go for the function.

Well, it saves burning the oid and the pg_attribute rows. A few long
running transactions can cause pg_attribute to bloat to 400GB on one of our
systems - hence my wanting something like this function.

If it does stay a function, we only need to implement 8 of the 12 options
as parameters (FREEZE and FORCE* options don't apply). My guess is that
future options added to COPY will be more about handling output or
optimizing table inserts, neither of which mean more options for this
proposed function.

Would the best approach be to build in a core srf-returning function that
might be deprecated once COPY is set-returning AND CTEs don't have to
materialize, or to refactor what's in copy.c such that a contrib module can
easily plug into it, and have copy_srf live there?

Merlin Moncure

mmoncure@gmail.com

about 9 years ago

In reply to: Tom Lane (#4)

Re: COPY as a set returning function

On Fri, Sep 30, 2016 at 9:56 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Craig Ringer <craig.ringer@2ndquadrant.com> writes:

On 1 Oct. 2016 05:20, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:

I think the last of those suggestions has come up before. It has the
large advantage that you don't have to remember a different syntax for
copy-as-a-function.

That sounds fantastic. It'd help this copy variant retain festure parity
with normal copy. And it'd bring us closer to being able to FETCH in non
queries.

On second thought, though, this couldn't exactly duplicate the existing
COPY syntax, because COPY relies heavily on the rowtype of the named
target table to tell it what it's copying. You'd need some new syntax
to provide the list of column names and types, which puts a bit of
a hole in the "syntax we already know" argument. A SRF-returning-record
would have a leg up on that, because we do have existing syntax for
defining the concrete rowtype that any particular call returns.

One big disadvantage of SRF-returning-record syntax is that functions
are basically unwrappable with generic wrappers sans major gymnastics
such as dynamically generating the query and executing it. This is a
major disadvantage relative to the null::type hack we use in the
populate_record style functions and perhaps ought to make this
(SRF-returning-record syntax) style of use discouraged for useful
library functions. If there were a way to handle wrapping I'd
withdraw this minor objection -- this has come up in dblink
discussions a few times).

merlin

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Corey Huinker

corey.huinker@gmail.com

about 9 years ago

In reply to: Merlin Moncure (#8)

1 attachment(s)

Re: COPY as a set returning function

Attached is a patch that implements copy_srf().

The function signature reflects cstate more than it reflects the COPY
options (filename+is_program instead of FILENAME or PROGRAM, etc)

CREATE OR REPLACE FUNCTION copy_srf(
filename text DEFAULT null,
is_program boolean DEFAULT false,
format text DEFAULT 'text', /* accepts text, csv, binary */
delimiter text DEFAULT null,
null_string text DEFAULT E'\\N',
header boolean DEFAULT false,
quote text DEFAULT null,
escape text DEFAULT null,
encoding text DEFAULT null)
RETURNS SETOF RECORD

The major utility for this (at least for me) will be in ETLs that currently
make a lot of use of temp tables, and wish to either reduce I/O or avoid
pg_attribute bloat.

I have not yet implemented STDIN mode, but it's a start.

$ psql test
psql (10devel)
Type "help" for help.

# select * from copy_srf('echo 1,2; echo 4,5',true,'csv') as t(x text, y
text);
x | y
---+---
1 | 2
4 | 5
(2 rows)

Time: 1.456 ms
# select * from copy_srf('/tmp/small_file.txt'::text,false,'text') as
t(x text, y text);
x | y
-----+-----
1 | 2
a | b
cde | fgh
(3 rows)

On Mon, Oct 17, 2016 at 2:33 PM, Merlin Moncure <mmoncure@gmail.com> wrote:

Show quoted text

On Fri, Sep 30, 2016 at 9:56 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Craig Ringer <craig.ringer@2ndquadrant.com> writes:

On 1 Oct. 2016 05:20, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:

I think the last of those suggestions has come up before. It has the
large advantage that you don't have to remember a different syntax for
copy-as-a-function.

That sounds fantastic. It'd help this copy variant retain festure parity
with normal copy. And it'd bring us closer to being able to FETCH in non
queries.

On second thought, though, this couldn't exactly duplicate the existing
COPY syntax, because COPY relies heavily on the rowtype of the named
target table to tell it what it's copying. You'd need some new syntax
to provide the list of column names and types, which puts a bit of
a hole in the "syntax we already know" argument. A SRF-returning-record
would have a leg up on that, because we do have existing syntax for
defining the concrete rowtype that any particular call returns.

One big disadvantage of SRF-returning-record syntax is that functions
are basically unwrappable with generic wrappers sans major gymnastics
such as dynamically generating the query and executing it. This is a
major disadvantage relative to the null::type hack we use in the
populate_record style functions and perhaps ought to make this
(SRF-returning-record syntax) style of use discouraged for useful
library functions. If there were a way to handle wrapping I'd
withdraw this minor objection -- this has come up in dblink
discussions a few times).

merlin

Attachments:

copy_srf_function.difftext/plain; charset=US-ASCII; name=copy_srf_function.diffDownload

diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index ada2142..0876ee1 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1006,6 +1006,21 @@ LANGUAGE INTERNAL
 STRICT IMMUTABLE PARALLEL SAFE
 AS 'jsonb_insert';
 
+CREATE OR REPLACE FUNCTION copy_srf(
+	IN filename text DEFAULT null, 
+	IN is_program boolean DEFAULT false,
+	IN format text DEFAULT 'text',
+	IN delimiter text DEFAULT null,
+	IN null_string text DEFAULT E'\\N',
+	IN header boolean DEFAULT false,
+	IN quote text DEFAULT null,
+	IN escape text DEFAULT null,
+	IN encoding text DEFAULT null)
+RETURNS SETOF RECORD
+LANGUAGE INTERNAL
+VOLATILE ROWS 1000 COST 1000 CALLED ON NULL INPUT
+AS 'copy_srf';
+
 -- The default permissions for functions mean that anyone can execute them.
 -- A number of functions shouldn't be executable by just anyone, but rather
 -- than use explicit 'superuser()' checks in those functions, we use the GRANT
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index b4140eb..90ed2c5 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -30,6 +30,7 @@
 #include "commands/defrem.h"
 #include "commands/trigger.h"
 #include "executor/executor.h"
+#include "funcapi.h"
 #include "libpq/libpq.h"
 #include "libpq/pqformat.h"
 #include "mb/pg_wchar.h"
@@ -555,7 +556,6 @@ CopyGetData(CopyState cstate, void *databuf, int minread, int maxread)
 						 errmsg("could not read from COPY file: %m")));
 			break;
 		case COPY_OLD_FE:
-
 			/*
 			 * We cannot read more than minread bytes (which in practice is 1)
 			 * because old protocol doesn't have any clear way of separating
@@ -4555,3 +4555,377 @@ CreateCopyDestReceiver(void)
 
 	return (DestReceiver *) self;
 }
+
+Datum
+copy_srf(PG_FUNCTION_ARGS)
+{
+	ReturnSetInfo	*rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+	TupleDesc		tupdesc;
+	Tuplestorestate *tupstore = NULL;
+	MemoryContext	per_query_ctx;
+	MemoryContext	oldcontext;
+	FmgrInfo        *in_functions;
+	Oid             *typioparams;
+	Oid             in_func_oid;
+
+	CopyStateData	copy_state;
+	int			col;
+
+	Datum       *values;
+	bool		*nulls;
+
+	/* check to see if caller supports us returning a tuplestore */
+	if (rsinfo == NULL || !IsA(rsinfo, ReturnSetInfo))
+	{
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("set-valued function called in context that cannot accept a set")));
+	}
+
+	if (!(rsinfo->allowedModes & SFRM_Materialize) || rsinfo->expectedDesc == NULL)
+	{
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("materialize mode required, but it is not allowed in this context")));
+	}
+
+	tupdesc = CreateTupleDescCopy(rsinfo->expectedDesc);
+	values = (Datum *) palloc(tupdesc->natts * sizeof(Datum));
+	nulls = (bool *) palloc(tupdesc->natts * sizeof(bool));
+	in_functions = (FmgrInfo *) palloc(tupdesc->natts * sizeof(FmgrInfo));
+	typioparams = (Oid *) palloc(tupdesc->natts * sizeof(Oid));
+
+	for (col = 0; col < tupdesc->natts; col++)
+	{
+		getTypeInputInfo(tupdesc->attrs[col]->atttypid,&in_func_oid,&typioparams[col]);
+		fmgr_info(in_func_oid,&in_functions[col]);
+	}
+
+	/*
+	 * Function signature is:
+	 * copy_srf( filename text default null,
+	 *           is_program boolean default false,
+	 *           format text default 'text',
+	 *           delimiter text default E'\t' in text mode, ',' in csv mode,
+	 *           null_string text default '\N',
+	 *           header boolean default false,
+	 *           quote text default '"' in csv mode only,
+	 *           escape text default null, -- defaults to whatever quote is
+	 *           encoding text default null
+	 */
+
+	/* Mock up a copy state */
+	initStringInfo(&copy_state.line_buf);
+	initStringInfo(&copy_state.attribute_buf);
+	copy_state.fe_msgbuf = makeStringInfo();
+	copy_state.oids = false;
+	copy_state.freeze = false;
+
+	copy_state.need_transcoding = false;
+	copy_state.encoding_embeds_ascii = false;
+	copy_state.rel = NULL;
+	copy_state.queryDesc = NULL;
+
+	/* param 0: filename */
+	if (PG_ARGISNULL(0))
+	{
+		copy_state.copy_dest = COPY_NEW_FE;
+		copy_state.filename = NULL;
+	}
+	else
+	{
+		copy_state.copy_dest = COPY_FILE;
+		copy_state.filename = TextDatumGetCString(PG_GETARG_TEXT_P(0));
+	}
+
+	/* param 1: is_program */
+	if (PG_ARGISNULL(1))
+	{
+		copy_state.is_program = false;
+	}
+	else
+	{
+		copy_state.is_program = PG_GETARG_BOOL(1);
+	}
+
+	/* param 2: format - text, csv, binary */
+	if (PG_ARGISNULL(2))
+	{
+		copy_state.binary = false;
+		copy_state.csv_mode = false;
+	}
+	else
+	{
+		char* format_str = TextDatumGetCString(PG_GETARG_TEXT_P(2));
+		if (strcmp(format_str,"text") == 0)
+		{
+			copy_state.binary = false;
+			copy_state.csv_mode = false;
+		}
+		else if (strcmp(format_str,"csv") == 0)
+		{
+			copy_state.binary = false;
+			copy_state.csv_mode = true;
+		}
+		else if (strcmp(format_str,"binary") == 0)
+		{
+			copy_state.binary = true;
+			copy_state.csv_mode = false;
+		}
+		else
+		{
+			ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("Format must be one of: text csv binary")));
+		}
+	}
+
+	/* param 3: delimiter text default E'\t', */
+	if (PG_ARGISNULL(3))
+	{
+		copy_state.delim = copy_state.csv_mode ? "," : "\t";
+	}
+	else
+	{
+		if (copy_state.binary)
+		{
+			ereport(ERROR,
+					(errcode(ERRCODE_SYNTAX_ERROR),
+					 errmsg("cannot specify DELIMITER in BINARY mode")));
+		}
+		copy_state.delim = TextDatumGetCString(PG_GETARG_TEXT_P(3));
+
+		if (strlen(copy_state.delim) != 1)
+		{
+			ereport(ERROR,
+					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				  errmsg("COPY delimiter must be a single one-byte character")));
+		}
+
+		/* Disallow end-of-line characters */
+		if (strchr(copy_state.delim, '\r') != NULL ||
+			strchr(copy_state.delim, '\n') != NULL)
+		{
+			ereport(ERROR,
+					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+				 errmsg("COPY delimiter cannot be newline or carriage return")));
+		}
+	}
+
+	/* param 4: null_string text default '\N', */
+	if (PG_ARGISNULL(4))
+	{
+		copy_state.null_print = copy_state.csv_mode ? "" : "\\N";
+	}
+	else
+	{
+		copy_state.null_print = TextDatumGetCString(PG_GETARG_TEXT_P(4));
+	}
+	copy_state.null_print_len = strlen(copy_state.null_print);
+	/* NOT SET char	   *null_print_client; */
+
+	/* param 5: header boolean default false, */
+	if (PG_ARGISNULL(5))
+	{
+		copy_state.header_line = false;
+	}
+	else
+	{
+		copy_state.header_line = PG_GETARG_BOOL(5);
+	}
+
+	/* param 6: quote text default '"', */
+	if (PG_ARGISNULL(6))
+	{
+		copy_state.quote = "\"";
+	}
+	else
+	{
+		if (copy_state.csv_mode)
+		{
+			if (strlen(copy_state.quote) != 1)
+			{
+				ereport(ERROR,
+						(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+						 errmsg("COPY quote must be a single one-byte character")));
+			}
+
+			if (copy_state.delim[0] == copy_state.quote[0])
+			{
+				ereport(ERROR,
+						(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+						 errmsg("COPY delimiter and quote must be different")));
+			}
+		}
+		else
+		{
+			ereport(ERROR,
+					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+					 errmsg("COPY quote available only in CSV mode")));
+		}
+		copy_state.quote = TextDatumGetCString(PG_GETARG_TEXT_P(6));
+	}
+
+	/* param 7: escape text default null, -- defaults to whatever quote is */
+	if (PG_ARGISNULL(7))
+	{
+		copy_state.escape = copy_state.quote;
+	}
+	else
+	{
+		if (copy_state.csv_mode)
+		{
+			copy_state.escape = TextDatumGetCString(PG_GETARG_TEXT_P(7));
+			if (strlen(copy_state.escape) != 1)
+			{
+				ereport(ERROR,
+						(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+						 errmsg("COPY escape must be a single one-byte character")));
+			}
+		}
+		else
+		{
+			ereport(ERROR,
+					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+					 errmsg("COPY escape available only in CSV mode")));
+		}
+	}
+
+	/* param 8: encoding text default null */
+	if (PG_ARGISNULL(8))
+	{
+		copy_state.file_encoding = pg_get_client_encoding();
+	}
+	else
+	{
+		char* encoding = TextDatumGetCString(PG_GETARG_TEXT_P(8));
+		copy_state.file_encoding = pg_char_to_encoding(encoding);
+		if (copy_state.file_encoding < 0)
+		{
+			ereport(ERROR,
+					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+					 errmsg("argument to option \"%s\" must be a valid encoding name",
+							encoding)));
+		}
+	}
+
+	copy_state.max_fields = tupdesc->natts;
+	copy_state.raw_fields = (char **) palloc(tupdesc->natts * sizeof(char *));
+
+	/* let the caller know we're sending back a tuplestore */
+	rsinfo->returnMode = SFRM_Materialize;
+	per_query_ctx = fcinfo->flinfo->fn_mcxt;
+	oldcontext = MemoryContextSwitchTo(per_query_ctx);
+
+	tupstore = tuplestore_begin_heap(true,false,work_mem);
+
+	/* open "file" */
+	if (copy_state.is_program)
+	{
+		copy_state.copy_file = OpenPipeStream(copy_state.filename, PG_BINARY_R);
+
+		if (copy_state.copy_file == NULL)
+		{
+			ereport(ERROR,
+					(errcode_for_file_access(),
+					 errmsg("could not execute command \"%s\": %m",
+							copy_state.filename)));
+		}
+	}
+	else
+	{
+		struct stat st;
+
+		copy_state.copy_file = AllocateFile(copy_state.filename, PG_BINARY_R);
+		if (copy_state.copy_file == NULL)
+		{
+			/* copy errno because ereport subfunctions might change it */
+			int         save_errno = errno;
+
+			ereport(ERROR,
+					(errcode_for_file_access(),
+					 errmsg("could not open file \"%s\" for reading: %m",
+							copy_state.filename),
+					 (save_errno == ENOENT || save_errno == EACCES) ?
+					 errhint("copy_srf instructs the PostgreSQL server process to read a file. "
+							 "You may want a client-side facility such as psql's \\copy.") : 0));
+		}
+
+		if (fstat(fileno(copy_state.copy_file), &st))
+			ereport(ERROR,
+					(errcode_for_file_access(),
+					 errmsg("could not stat file \"%s\": %m",
+							copy_state.filename)));
+
+		if (S_ISDIR(st.st_mode))
+			ereport(ERROR,
+					(errcode(ERRCODE_WRONG_OBJECT_TYPE),
+					 errmsg("\"%s\" is a directory", copy_state.filename)));
+	}
+
+	while(1)
+	{
+		char    **field_strings;
+		int     field_strings_count;
+		int     col;
+		HeapTuple tuple;
+
+		if (! NextCopyFromRawFields(&copy_state,&field_strings,&field_strings_count))
+		{
+			break;
+		}
+		if (field_strings_count != tupdesc->natts)
+		{
+			ereport(ERROR,
+					(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+					 errmsg("found %d fields but expected %d on line %d",
+							field_strings_count, tupdesc->natts, copy_state.cur_lineno)));
+		}
+
+		for (col = 0; col < tupdesc->natts; col++)
+		{
+			values[col] = InputFunctionCall(&in_functions[col],
+											field_strings[col],
+											typioparams[col],
+											tupdesc->attrs[col]->atttypmod);
+			nulls[col] = (field_strings[col] == NULL);
+		}
+
+		tuple = heap_form_tuple(tupdesc,values,nulls);
+		//tuple = BuildTupleFromCStrings(attinmeta, field_strings);
+		tuplestore_puttuple(tupstore, tuple);
+	}
+
+	/* close "file" */
+	if (copy_state.is_program)
+	{
+		int         pclose_rc;
+
+		pclose_rc = ClosePipeStream(copy_state.copy_file);
+		if (pclose_rc == -1)
+			ereport(ERROR,
+					(errcode_for_file_access(),
+					 errmsg("could not close pipe to external command: %m")));
+		else if (pclose_rc != 0)
+			ereport(ERROR,
+					(errcode(ERRCODE_EXTERNAL_ROUTINE_EXCEPTION),
+					 errmsg("program \"%s\" failed",
+							copy_state.filename),
+					 errdetail_internal("%s", wait_result_to_str(pclose_rc))));
+	}
+	else
+	{
+		if (copy_state.filename != NULL && FreeFile(copy_state.copy_file))
+			ereport(ERROR,
+					(errcode_for_file_access(),
+					 errmsg("could not close file \"%s\": %m",
+							copy_state.filename)));
+	}
+
+	tuplestore_donestoring(tupstore);
+	rsinfo->setResult = tupstore;
+	rsinfo->setDesc = tupdesc;
+	MemoryContextSwitchTo(oldcontext);
+
+	return (Datum) 0;
+}
diff --git a/src/include/catalog/pg_proc.h b/src/include/catalog/pg_proc.h
index 17ec71d..d8076ee 100644
--- a/src/include/catalog/pg_proc.h
+++ b/src/include/catalog/pg_proc.h
@@ -5341,6 +5341,11 @@ DESCR("pg_controldata recovery state information as a function");
 DATA(insert OID = 3444 ( pg_control_init PGNSP PGUID 12 1 0 0 0 f f f f t f v s 0 0 2249 "" "{23,23,23,23,23,23,23,23,23,16,16,16,23}" "{o,o,o,o,o,o,o,o,o,o,o,o,o}" "{max_data_alignment,database_block_size,blocks_per_segment,wal_block_size,bytes_per_wal_segment,max_identifier_length,max_index_columns,max_toast_chunk_size,large_object_chunk_size,bigint_timestamps,float4_pass_by_value,float8_pass_by_value,data_page_checksum_version}" _null_ _null_ pg_control_init _null_ _null_ _null_ ));
 DESCR("pg_controldata init state information as a function");
 
+
+DATA(insert OID = 3445 (  copy_srf PGNSP PGUID 12 1 0 0 0 f f f f f t v u 9 0 2249 "25 16 25 25 25 16 25 25 25" _null_ _null_ _null_ _null_ _null_ copy_srf _null_ _null_ _null_ ));
+DESCR("set-returning COPY proof of concept");
+
+
 /*
  * Symbolic values for provolatile column: these indicate whether the result
  * of a function is dependent *only* on the values of its explicit arguments,
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index 65eb347..c27baac 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -37,4 +37,7 @@ extern void CopyFromErrorCallback(void *arg);
 
 extern DestReceiver *CreateCopyDestReceiver(void);
 
+extern Datum copy_srf(PG_FUNCTION_ARGS);
+
+
 #endif   /* COPY_H */

#10

Haribabu Kommi

kommi.haribabu@gmail.com

about 9 years ago

In reply to: Corey Huinker (#9)

Re: COPY as a set returning function

On Tue, Nov 1, 2016 at 7:45 AM, Corey Huinker <corey.huinker@gmail.com>
wrote:

Attached is a patch that implements copy_srf().

Moved to next CF with "needs review" status.

Regards,
Hari Babu
Fujitsu Australia

#11

Michael Paquier

michael.paquier@gmail.com

almost 9 years ago

In reply to: Haribabu Kommi (#10)

Re: COPY as a set returning function

On Mon, Dec 5, 2016 at 2:10 PM, Haribabu Kommi <kommi.haribabu@gmail.com> wrote:

On Tue, Nov 1, 2016 at 7:45 AM, Corey Huinker <corey.huinker@gmail.com>
wrote:

Attached is a patch that implements copy_srf().

Moved to next CF with "needs review" status.

This patch is still waiting for review. David, are you planning to
look at it by the end of the CF?
--
Michael

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#12

David Fetter

david@fetter.org

almost 9 years ago

In reply to: Michael Paquier (#11)

Re: COPY as a set returning function

On Wed, Jan 25, 2017 at 02:37:57PM +0900, Michael Paquier wrote:

On Mon, Dec 5, 2016 at 2:10 PM, Haribabu Kommi <kommi.haribabu@gmail.com> wrote:

On Tue, Nov 1, 2016 at 7:45 AM, Corey Huinker <corey.huinker@gmail.com>
wrote:

Attached is a patch that implements copy_srf().

Moved to next CF with "needs review" status.

This patch is still waiting for review. David, are you planning to
look at it by the end of the CF?

I'll be doing this today.

Best,
David.
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
Skype: davidfetter XMPP: david(dot)fetter(at)gmail(dot)com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#13

David Fetter

david@fetter.org

almost 9 years ago

In reply to: Corey Huinker (#9)

Re: COPY as a set returning function

On Mon, Oct 31, 2016 at 04:45:40PM -0400, Corey Huinker wrote:

Attached is a patch that implements copy_srf().

The function signature reflects cstate more than it reflects the COPY
options (filename+is_program instead of FILENAME or PROGRAM, etc)

The patch as it stands needs a rebase. I'll see what I can do today.

Best,
David.
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
Skype: davidfetter XMPP: david(dot)fetter(at)gmail(dot)com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#14

David Fetter

david@fetter.org

almost 9 years ago

In reply to: David Fetter (#13)

1 attachment(s)

Re: COPY as a set returning function

On Wed, Jan 25, 2017 at 06:16:16AM -0800, David Fetter wrote:

On Mon, Oct 31, 2016 at 04:45:40PM -0400, Corey Huinker wrote:

Attached is a patch that implements copy_srf().

The function signature reflects cstate more than it reflects the COPY
options (filename+is_program instead of FILENAME or PROGRAM, etc)

The patch as it stands needs a rebase. I'll see what I can do today.

Please find attached a rebase, which fixes an OID collision that crept in.

- The patch builds against master and passes "make check".
- The patch does not contain user-visible documentation or examples.
- The patch does not contain tests.

I got the following when I did a brief test.

SELECT * FROM copy_srf(filename => 'ls', is_program => true) AS t(l text);
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.

Best,
David.
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
Skype: davidfetter XMPP: david(dot)fetter(at)gmail(dot)com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

Attachments:

copy_srf_function_002.difftext/plain; charset=us-asciiDownload

diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 4dfedf8..ae07cfb 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1065,6 +1065,21 @@ LANGUAGE INTERNAL
 STRICT IMMUTABLE PARALLEL SAFE
 AS 'jsonb_insert';
 
+CREATE OR REPLACE FUNCTION copy_srf(
+	IN filename text DEFAULT null,
+	IN is_program boolean DEFAULT false,
+	IN format text DEFAULT 'text',
+	IN delimiter text DEFAULT null,
+	IN null_string text DEFAULT E'\\N',
+	IN header boolean DEFAULT false,
+	IN quote text DEFAULT null,
+	IN escape text DEFAULT null,
+	IN encoding text DEFAULT null)
+RETURNS SETOF RECORD
+LANGUAGE INTERNAL
+VOLATILE ROWS 1000 COST 1000 CALLED ON NULL INPUT
+AS 'copy_srf';
+
 -- The default permissions for functions mean that anyone can execute them.
 -- A number of functions shouldn't be executable by just anyone, but rather
 -- than use explicit 'superuser()' checks in those functions, we use the GRANT
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index f9362be..8e1bd39 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -30,6 +30,7 @@
 #include "commands/defrem.h"
 #include "commands/trigger.h"
 #include "executor/executor.h"
+#include "funcapi.h"
 #include "libpq/libpq.h"
 #include "libpq/pqformat.h"
 #include "mb/pg_wchar.h"
@@ -562,7 +563,6 @@ CopyGetData(CopyState cstate, void *databuf, int minread, int maxread)
 						 errmsg("could not read from COPY file: %m")));
 			break;
 		case COPY_OLD_FE:
-
 			/*
 			 * We cannot read more than minread bytes (which in practice is 1)
 			 * because old protocol doesn't have any clear way of separating
@@ -4740,3 +4740,377 @@ CreateCopyDestReceiver(void)
 
 	return (DestReceiver *) self;
 }
+
+Datum
+copy_srf(PG_FUNCTION_ARGS)
+{
+	ReturnSetInfo	*rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+	TupleDesc		tupdesc;
+	Tuplestorestate *tupstore = NULL;
+	MemoryContext	per_query_ctx;
+	MemoryContext	oldcontext;
+	FmgrInfo        *in_functions;
+	Oid             *typioparams;
+	Oid             in_func_oid;
+
+	CopyStateData	copy_state;
+	int			col;
+
+	Datum       *values;
+	bool		*nulls;
+
+	/* check to see if caller supports us returning a tuplestore */
+	if (rsinfo == NULL || !IsA(rsinfo, ReturnSetInfo))
+	{
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("set-valued function called in context that cannot accept a set")));
+	}
+
+	if (!(rsinfo->allowedModes & SFRM_Materialize) || rsinfo->expectedDesc == NULL)
+	{
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("materialize mode required, but it is not allowed in this context")));
+	}
+
+	tupdesc = CreateTupleDescCopy(rsinfo->expectedDesc);
+	values = (Datum *) palloc(tupdesc->natts * sizeof(Datum));
+	nulls = (bool *) palloc(tupdesc->natts * sizeof(bool));
+	in_functions = (FmgrInfo *) palloc(tupdesc->natts * sizeof(FmgrInfo));
+	typioparams = (Oid *) palloc(tupdesc->natts * sizeof(Oid));
+
+	for (col = 0; col < tupdesc->natts; col++)
+	{
+		getTypeInputInfo(tupdesc->attrs[col]->atttypid,&in_func_oid,&typioparams[col]);
+		fmgr_info(in_func_oid,&in_functions[col]);
+	}
+
+	/*
+	 * Function signature is:
+	 * copy_srf( filename text default null,
+	 *           is_program boolean default false,
+	 *           format text default 'text',
+	 *           delimiter text default E'\t' in text mode, ',' in csv mode,
+	 *           null_string text default '\N',
+	 *           header boolean default false,
+	 *           quote text default '"' in csv mode only,
+	 *           escape text default null, -- defaults to whatever quote is
+	 *           encoding text default null
+	 */
+
+	/* Mock up a copy state */
+	initStringInfo(&copy_state.line_buf);
+	initStringInfo(&copy_state.attribute_buf);
+	copy_state.fe_msgbuf = makeStringInfo();
+	copy_state.oids = false;
+	copy_state.freeze = false;
+
+	copy_state.need_transcoding = false;
+	copy_state.encoding_embeds_ascii = false;
+	copy_state.rel = NULL;
+	copy_state.queryDesc = NULL;
+
+	/* param 0: filename */
+	if (PG_ARGISNULL(0))
+	{
+		copy_state.copy_dest = COPY_NEW_FE;
+		copy_state.filename = NULL;
+	}
+	else
+	{
+		copy_state.copy_dest = COPY_FILE;
+		copy_state.filename = TextDatumGetCString(PG_GETARG_TEXT_P(0));
+	}
+
+	/* param 1: is_program */
+	if (PG_ARGISNULL(1))
+	{
+		copy_state.is_program = false;
+	}
+	else
+	{
+		copy_state.is_program = PG_GETARG_BOOL(1);
+	}
+
+	/* param 2: format - text, csv, binary */
+	if (PG_ARGISNULL(2))
+	{
+		copy_state.binary = false;
+		copy_state.csv_mode = false;
+	}
+	else
+	{
+		char* format_str = TextDatumGetCString(PG_GETARG_TEXT_P(2));
+		if (strcmp(format_str,"text") == 0)
+		{
+			copy_state.binary = false;
+			copy_state.csv_mode = false;
+		}
+		else if (strcmp(format_str,"csv") == 0)
+		{
+			copy_state.binary = false;
+			copy_state.csv_mode = true;
+		}
+		else if (strcmp(format_str,"binary") == 0)
+		{
+			copy_state.binary = true;
+			copy_state.csv_mode = false;
+		}
+		else
+		{
+			ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("Format must be one of: text csv binary")));
+		}
+	}
+
+	/* param 3: delimiter text default E'\t', */
+	if (PG_ARGISNULL(3))
+	{
+		copy_state.delim = copy_state.csv_mode ? "," : "\t";
+	}
+	else
+	{
+		if (copy_state.binary)
+		{
+			ereport(ERROR,
+					(errcode(ERRCODE_SYNTAX_ERROR),
+					 errmsg("cannot specify DELIMITER in BINARY mode")));
+		}
+		copy_state.delim = TextDatumGetCString(PG_GETARG_TEXT_P(3));
+
+		if (strlen(copy_state.delim) != 1)
+		{
+			ereport(ERROR,
+					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				  errmsg("COPY delimiter must be a single one-byte character")));
+		}
+
+		/* Disallow end-of-line characters */
+		if (strchr(copy_state.delim, '\r') != NULL ||
+			strchr(copy_state.delim, '\n') != NULL)
+		{
+			ereport(ERROR,
+					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+				 errmsg("COPY delimiter cannot be newline or carriage return")));
+		}
+	}
+
+	/* param 4: null_string text default '\N', */
+	if (PG_ARGISNULL(4))
+	{
+		copy_state.null_print = copy_state.csv_mode ? "" : "\\N";
+	}
+	else
+	{
+		copy_state.null_print = TextDatumGetCString(PG_GETARG_TEXT_P(4));
+	}
+	copy_state.null_print_len = strlen(copy_state.null_print);
+	/* NOT SET char	   *null_print_client; */
+
+	/* param 5: header boolean default false, */
+	if (PG_ARGISNULL(5))
+	{
+		copy_state.header_line = false;
+	}
+	else
+	{
+		copy_state.header_line = PG_GETARG_BOOL(5);
+	}
+
+	/* param 6: quote text default '"', */
+	if (PG_ARGISNULL(6))
+	{
+		copy_state.quote = "\"";
+	}
+	else
+	{
+		if (copy_state.csv_mode)
+		{
+			if (strlen(copy_state.quote) != 1)
+			{
+				ereport(ERROR,
+						(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+						 errmsg("COPY quote must be a single one-byte character")));
+			}
+
+			if (copy_state.delim[0] == copy_state.quote[0])
+			{
+				ereport(ERROR,
+						(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+						 errmsg("COPY delimiter and quote must be different")));
+			}
+		}
+		else
+		{
+			ereport(ERROR,
+					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+					 errmsg("COPY quote available only in CSV mode")));
+		}
+		copy_state.quote = TextDatumGetCString(PG_GETARG_TEXT_P(6));
+	}
+
+	/* param 7: escape text default null, -- defaults to whatever quote is */
+	if (PG_ARGISNULL(7))
+	{
+		copy_state.escape = copy_state.quote;
+	}
+	else
+	{
+		if (copy_state.csv_mode)
+		{
+			copy_state.escape = TextDatumGetCString(PG_GETARG_TEXT_P(7));
+			if (strlen(copy_state.escape) != 1)
+			{
+				ereport(ERROR,
+						(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+						 errmsg("COPY escape must be a single one-byte character")));
+			}
+		}
+		else
+		{
+			ereport(ERROR,
+					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+					 errmsg("COPY escape available only in CSV mode")));
+		}
+	}
+
+	/* param 8: encoding text default null */
+	if (PG_ARGISNULL(8))
+	{
+		copy_state.file_encoding = pg_get_client_encoding();
+	}
+	else
+	{
+		char* encoding = TextDatumGetCString(PG_GETARG_TEXT_P(8));
+		copy_state.file_encoding = pg_char_to_encoding(encoding);
+		if (copy_state.file_encoding < 0)
+		{
+			ereport(ERROR,
+					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+					 errmsg("argument to option \"%s\" must be a valid encoding name",
+							encoding)));
+		}
+	}
+
+	copy_state.max_fields = tupdesc->natts;
+	copy_state.raw_fields = (char **) palloc(tupdesc->natts * sizeof(char *));
+
+	/* let the caller know we're sending back a tuplestore */
+	rsinfo->returnMode = SFRM_Materialize;
+	per_query_ctx = fcinfo->flinfo->fn_mcxt;
+	oldcontext = MemoryContextSwitchTo(per_query_ctx);
+
+	tupstore = tuplestore_begin_heap(true,false,work_mem);
+
+	/* open "file" */
+	if (copy_state.is_program)
+	{
+		copy_state.copy_file = OpenPipeStream(copy_state.filename, PG_BINARY_R);
+
+		if (copy_state.copy_file == NULL)
+		{
+			ereport(ERROR,
+					(errcode_for_file_access(),
+					 errmsg("could not execute command \"%s\": %m",
+							copy_state.filename)));
+		}
+	}
+	else
+	{
+		struct stat st;
+
+		copy_state.copy_file = AllocateFile(copy_state.filename, PG_BINARY_R);
+		if (copy_state.copy_file == NULL)
+		{
+			/* copy errno because ereport subfunctions might change it */
+			int         save_errno = errno;
+
+			ereport(ERROR,
+					(errcode_for_file_access(),
+					 errmsg("could not open file \"%s\" for reading: %m",
+							copy_state.filename),
+					 (save_errno == ENOENT || save_errno == EACCES) ?
+					 errhint("copy_srf instructs the PostgreSQL server process to read a file. "
+							 "You may want a client-side facility such as psql's \\copy.") : 0));
+		}
+
+		if (fstat(fileno(copy_state.copy_file), &st))
+			ereport(ERROR,
+					(errcode_for_file_access(),
+					 errmsg("could not stat file \"%s\": %m",
+							copy_state.filename)));
+
+		if (S_ISDIR(st.st_mode))
+			ereport(ERROR,
+					(errcode(ERRCODE_WRONG_OBJECT_TYPE),
+					 errmsg("\"%s\" is a directory", copy_state.filename)));
+	}
+
+	while(1)
+	{
+		char    **field_strings;
+		int     field_strings_count;
+		int     col;
+		HeapTuple tuple;
+
+		if (! NextCopyFromRawFields(&copy_state,&field_strings,&field_strings_count))
+		{
+			break;
+		}
+		if (field_strings_count != tupdesc->natts)
+		{
+			ereport(ERROR,
+					(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+					 errmsg("found %d fields but expected %d on line %d",
+							field_strings_count, tupdesc->natts, copy_state.cur_lineno)));
+		}
+
+		for (col = 0; col < tupdesc->natts; col++)
+		{
+			values[col] = InputFunctionCall(&in_functions[col],
+											field_strings[col],
+											typioparams[col],
+											tupdesc->attrs[col]->atttypmod);
+			nulls[col] = (field_strings[col] == NULL);
+		}
+
+		tuple = heap_form_tuple(tupdesc,values,nulls);
+		//tuple = BuildTupleFromCStrings(attinmeta, field_strings);
+		tuplestore_puttuple(tupstore, tuple);
+	}
+
+	/* close "file" */
+	if (copy_state.is_program)
+	{
+		int         pclose_rc;
+
+		pclose_rc = ClosePipeStream(copy_state.copy_file);
+		if (pclose_rc == -1)
+			ereport(ERROR,
+					(errcode_for_file_access(),
+					 errmsg("could not close pipe to external command: %m")));
+		else if (pclose_rc != 0)
+			ereport(ERROR,
+					(errcode(ERRCODE_EXTERNAL_ROUTINE_EXCEPTION),
+					 errmsg("program \"%s\" failed",
+							copy_state.filename),
+					 errdetail_internal("%s", wait_result_to_str(pclose_rc))));
+	}
+	else
+	{
+		if (copy_state.filename != NULL && FreeFile(copy_state.copy_file))
+			ereport(ERROR,
+					(errcode_for_file_access(),
+					 errmsg("could not close file \"%s\": %m",
+							copy_state.filename)));
+	}
+
+	tuplestore_donestoring(tupstore);
+	rsinfo->setResult = tupstore;
+	rsinfo->setDesc = tupdesc;
+	MemoryContextSwitchTo(oldcontext);
+
+	return (Datum) 0;
+}
diff --git a/src/include/catalog/pg_proc.h b/src/include/catalog/pg_proc.h
index c1f492b..9fb2ff7 100644
--- a/src/include/catalog/pg_proc.h
+++ b/src/include/catalog/pg_proc.h
@@ -5359,6 +5359,10 @@ DESCR("pg_controldata init state information as a function");
 DATA(insert OID = 3445 ( pg_import_system_collations PGNSP PGUID 12 100 0 0 0 f f f f t f v r 2 0 2278 "16 4089" _null_ _null_ "{if_not_exists,schema}" _null_ _null_ pg_import_system_collations _null_ _null_ _null_ ));
 DESCR("import collations from operating system");
 
+DATA(insert OID = 3353 (  copy_srf PGNSP PGUID 12 1 0 0 0 f f f f f t v u 9 0 2249 "25 16 25 25 25 16 25 25 25" _null_ _null_ _null_ _null_ _null_ copy_srf _null_ _null_ _null_ ));
+DESCR("set-returning COPY proof of concept");
+
+
 /*
  * Symbolic values for provolatile column: these indicate whether the result
  * of a function is dependent *only* on the values of its explicit arguments,
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index d63ca0f..de09841 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -38,4 +38,7 @@ extern void CopyFromErrorCallback(void *arg);
 
 extern DestReceiver *CreateCopyDestReceiver(void);
 
+extern Datum copy_srf(PG_FUNCTION_ARGS);
+
+
 #endif   /* COPY_H */

#15

Alvaro Herrera

alvherre@2ndquadrant.com

almost 9 years ago

In reply to: David Fetter (#14)

Re: COPY as a set returning function

David Fetter wrote:

@@ -562,7 +563,6 @@ CopyGetData(CopyState cstate, void *databuf, int minread, int maxread)
errmsg("could not read from COPY file: %m")));
break;
case COPY_OLD_FE:
-
/*
* We cannot read more than minread bytes (which in practice is 1)
* because old protocol doesn't have any clear way of separating

This change is pointless as it'd be undone by pgindent.

+	/*
+	 * Function signature is:
+	 * copy_srf( filename text default null,
+	 *           is_program boolean default false,
+	 *           format text default 'text',
+	 *           delimiter text default E'\t' in text mode, ',' in csv mode,
+	 *           null_string text default '\N',
+	 *           header boolean default false,
+	 *           quote text default '"' in csv mode only,
+	 *           escape text default null, -- defaults to whatever quote is
+	 *           encoding text default null
+	 */

This comment would be mangled by pgindent -- please add an /*--- marker
to prevent it.

+	/* param 7: escape text default null, -- defaults to whatever quote is */
+	if (PG_ARGISNULL(7))
+	{
+		copy_state.escape = copy_state.quote;
+	}
+	else
+	{
+		if (copy_state.csv_mode)
+		{
+			copy_state.escape = TextDatumGetCString(PG_GETARG_TEXT_P(7));
+			if (strlen(copy_state.escape) != 1)
+			{
+				ereport(ERROR,
+						(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+						 errmsg("COPY escape must be a single one-byte character")));
+			}
+		}
+		else
+		{
+			ereport(ERROR,
+					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+					 errmsg("COPY escape available only in CSV mode")));
+		}
+	}

I don't understand why do we have all these checks. Can't we just pass
the values to COPY and let it apply the checks? That way, when COPY is
updated to support multibyte escape chars (for example) we don't need to
touch this code. Together with removing the unneeded braces that would
make these stanzas about six lines long instead of fifteen.

+		tuple = heap_form_tuple(tupdesc,values,nulls);
+		//tuple = BuildTupleFromCStrings(attinmeta, field_strings);
+		tuplestore_puttuple(tupstore, tuple);

No need to form a tuple; use tuplestore_putvalues here.

+	}
+
+	/* close "file" */
+	if (copy_state.is_program)
+	{
+		int         pclose_rc;
+
+		pclose_rc = ClosePipeStream(copy_state.copy_file);
+		if (pclose_rc == -1)
+			ereport(ERROR,
+					(errcode_for_file_access(),
+					 errmsg("could not close pipe to external command: %m")));
+		else if (pclose_rc != 0)
+			ereport(ERROR,
+					(errcode(ERRCODE_EXTERNAL_ROUTINE_EXCEPTION),
+					 errmsg("program \"%s\" failed",
+							copy_state.filename),
+					 errdetail_internal("%s", wait_result_to_str(pclose_rc))));
+	}
+	else
+	{
+		if (copy_state.filename != NULL && FreeFile(copy_state.copy_file))
+			ereport(ERROR,
+					(errcode_for_file_access(),
+					 errmsg("could not close file \"%s\": %m",
+							copy_state.filename)));
+	}

I wonder if these should be an auxiliary function in copy.c to do this.
Surely copy.c itself does pretty much the same thing ...

+DATA(insert OID = 3353 (  copy_srf PGNSP PGUID 12 1 0 0 0 f f f f f t v u 9 0 2249 "25 16 25 25 25 16 25 25 25" _null_ _null_ _null_ _null_ _null_ copy_srf _null_ _null_ _null_ ));
+DESCR("set-returning COPY proof of concept");

Why is this marked "proof of concept"? If this is a PoC only, why are
you submitting it as a patch?

--
ï¿½lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#16

Corey Huinker

corey.huinker@gmail.com

almost 9 years ago

In reply to: Alvaro Herrera (#15)

Re: COPY as a set returning function

On Wed, Jan 25, 2017 at 11:57 AM, Alvaro Herrera <alvherre@2ndquadrant.com>
wrote:

David Fetter wrote:

@@ -562,7 +563,6 @@ CopyGetData(CopyState cstate, void *databuf, int

minread, int maxread)

errmsg("could not read

from COPY file: %m")));

break;
case COPY_OLD_FE:
-
/*
* We cannot read more than minread bytes (which

in practice is 1)

* because old protocol doesn't have any clear way

of separating

This change is pointless as it'd be undone by pgindent.
+     /*
+      * Function signature is:
+      * copy_srf( filename text default null,
+      *           is_program boolean default false,
+      *           format text default 'text',
+      *           delimiter text default E'\t' in text mode, ',' in csv
mode,
+      *           null_string text default '\N',
+      *           header boolean default false,
+      *           quote text default '"' in csv mode only,
+      *           escape text default null, -- defaults to whatever
quote is
+      *           encoding text default null
+      */
This comment would be mangled by pgindent -- please add an /*--- marker
to prevent it.

+ /* param 7: escape text default null, -- defaults to whatever

quote is */
+     if (PG_ARGISNULL(7))
+     {
+             copy_state.escape = copy_state.quote;
+     }
+     else
+     {
+             if (copy_state.csv_mode)
+             {
+                     copy_state.escape = TextDatumGetCString(PG_GETARG_
TEXT_P(7));
+                     if (strlen(copy_state.escape) != 1)
+                     {
+                             ereport(ERROR,
+
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),

+ errmsg("COPY escape must

be a single one-byte character")));
+                     }
+             }
+             else
+             {
+                     ereport(ERROR,
+                                     (errcode(ERRCODE_FEATURE_NOT_
SUPPORTED),

+ errmsg("COPY escape available

only in CSV mode")));

+ }
+ }

I don't understand why do we have all these checks. Can't we just pass
the values to COPY and let it apply the checks? That way, when COPY is
updated to support multibyte escape chars (for example) we don't need to
touch this code. Together with removing the unneeded braces that would
make these stanzas about six lines long instead of fifteen.
+             tuple = heap_form_tuple(tupdesc,values,nulls);
+             //tuple = BuildTupleFromCStrings(attinmeta,
field_strings);

+ tuplestore_puttuple(tupstore, tuple);

No need to form a tuple; use tuplestore_putvalues here.
+     }
+
+     /* close "file" */
+     if (copy_state.is_program)
+     {
+             int         pclose_rc;
+
+             pclose_rc = ClosePipeStream(copy_state.copy_file);
+             if (pclose_rc == -1)
+                     ereport(ERROR,
+                                     (errcode_for_file_access(),
+                                      errmsg("could not close pipe to
external command: %m")));
+             else if (pclose_rc != 0)
+                     ereport(ERROR,
+                                     (errcode(ERRCODE_EXTERNAL_
ROUTINE_EXCEPTION),
+                                      errmsg("program \"%s\" failed",
+
copy_state.filename),

+ errdetail_internal("%s",

wait_result_to_str(pclose_rc))));
+     }
+     else
+     {
+             if (copy_state.filename != NULL &&
FreeFile(copy_state.copy_file))
+                     ereport(ERROR,
+                                     (errcode_for_file_access(),
+                                      errmsg("could not close file
\"%s\": %m",

+

copy_state.filename)));

+ }

I wonder if these should be an auxiliary function in copy.c to do this.
Surely copy.c itself does pretty much the same thing ...

+DATA(insert OID = 3353 ( copy_srf PGNSP PGUID 12 1 0 0 0 f f f f f t v

u 9 0 2249 "25 16 25 25 25 16 25 25 25" _null_ _null_ _null_ _null_ _null_
copy_srf _null_ _null_ _null_ ));

+DESCR("set-returning COPY proof of concept");

Why is this marked "proof of concept"? If this is a PoC only, why are
you submitting it as a patch?

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Feel free to mark it returned as feedback. The concept didn't generate as
much enthusiasm as I had hoped, so I think the right thing to do now is
scale it back to a patch that makes CopyFromRawFields() externally visible
so that extensions can use them.

#17

David Fetter

david@fetter.org

almost 9 years ago

In reply to: Corey Huinker (#16)

Re: COPY as a set returning function

On Wed, Jan 25, 2017 at 12:23:23PM -0500, Corey Huinker wrote:

Feel free to mark it returned as feedback. The concept didn't
generate as much enthusiasm as I had hoped, so I think the right
thing to do now is scale it back to a patch that makes
CopyFromRawFields() externally visible so that extensions can use
them.

You're getting enthusiasm in the form of these reviews and suggestions
for improvement. That it doesn't always happen immediately is a
byproduct of the scarcity of developer time and the sheer volume of
things to which people need to pay attention.

Best,
David.
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
Skype: davidfetter XMPP: david(dot)fetter(at)gmail(dot)com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#18

Corey Huinker

corey.huinker@gmail.com

almost 9 years ago

In reply to: David Fetter (#17)

Re: COPY as a set returning function

On Wed, Jan 25, 2017 at 1:10 PM, David Fetter <david@fetter.org> wrote:

On Wed, Jan 25, 2017 at 12:23:23PM -0500, Corey Huinker wrote:

Feel free to mark it returned as feedback. The concept didn't
generate as much enthusiasm as I had hoped, so I think the right
thing to do now is scale it back to a patch that makes
CopyFromRawFields() externally visible so that extensions can use
them.

You're getting enthusiasm in the form of these reviews and suggestions
for improvement. That it doesn't always happen immediately is a
byproduct of the scarcity of developer time and the sheer volume of
things to which people need to pay attention.

True about that. I was referring to "ooh! I need that!"-type interest. I'll
proceed, keeping in mind that there's a fallback position of just making
some of the guts of COPY available to be called by extensions like was done
for file_fdw.

#19

Corey Huinker

corey.huinker@gmail.com

almost 9 years ago

In reply to: Alvaro Herrera (#15)

Re: COPY as a set returning function

+ /* param 7: escape text default null, -- defaults to whatever

quote is */
+     if (PG_ARGISNULL(7))
+     {
+             copy_state.escape = copy_state.quote;
+     }
+     else
+     {
+             if (copy_state.csv_mode)
+             {
+                     copy_state.escape = TextDatumGetCString(PG_GETARG_
TEXT_P(7));
+                     if (strlen(copy_state.escape) != 1)
+                     {
+                             ereport(ERROR,
+
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),

+ errmsg("COPY escape must

be a single one-byte character")));
+                     }
+             }
+             else
+             {
+                     ereport(ERROR,
+                                     (errcode(ERRCODE_FEATURE_NOT_
SUPPORTED),

+ errmsg("COPY escape available

only in CSV mode")));

+ }
+ }

I don't understand why do we have all these checks. Can't we just pass
the values to COPY and let it apply the checks? That way, when COPY is
updated to support multibyte escape chars (for example) we don't need to
touch this code. Together with removing the unneeded braces that would
make these stanzas about six lines long instead of fifteen.

If I understand you correctly, COPY (via BeginCopyFrom) itself relies on
having a relation in pg_class to reference for attributes.
In this case, there is no such relation. So I'd have to fake a relcache
entry, or refactor BeginCopyFrom() to extract a ReturnSetInfo from the
Relation and pass that along to a new function BeginCopyFromReturnSet. I'm
happy to go that route if you think it's a good idea.

+             tuple = heap_form_tuple(tupdesc,values,nulls);
+             //tuple = BuildTupleFromCStrings(attinmeta,
field_strings);

+ tuplestore_puttuple(tupstore, tuple);

No need to form a tuple; use tuplestore_putvalues here.

Good to know!

+     }
+
+     /* close "file" */
+     if (copy_state.is_program)
+     {
+             int         pclose_rc;
+
+             pclose_rc = ClosePipeStream(copy_state.copy_file);
+             if (pclose_rc == -1)
+                     ereport(ERROR,
+                                     (errcode_for_file_access(),
+                                      errmsg("could not close pipe to

external command: %m")));

+             else if (pclose_rc != 0)
+                     ereport(ERROR,
+                                     (errcode(ERRCODE_EXTERNAL_

ROUTINE_EXCEPTION),

+                                      errmsg("program \"%s\" failed",
+

copy_state.filename),

+ errdetail_internal("%s",

wait_result_to_str(pclose_rc))));

+     }
+     else
+     {
+             if (copy_state.filename != NULL &&

FreeFile(copy_state.copy_file))

+                     ereport(ERROR,
+                                     (errcode_for_file_access(),
+                                      errmsg("could not close file

\"%s\": %m",

+

copy_state.filename)));

+ }

I wonder if these should be an auxiliary function in copy.c to do this.
Surely copy.c itself does pretty much the same thing ...

Yes. This got started as a patch to core because not all of the parts of
COPY are externally callable, and aren't broken down in a way that allowed
for use in a SRF.

I'll get to work on these suggestions.

#20

Corey Huinker

corey.huinker@gmail.com

almost 9 years ago

In reply to: Corey Huinker (#19)

1 attachment(s)

Re: COPY as a set returning function

I don't understand why do we have all these checks. Can't we just pass
the values to COPY and let it apply the checks? That way, when COPY is
updated to support multibyte escape chars (for example) we don't need to
touch this code. Together with removing the unneeded braces that would
make these stanzas about six lines long instead of fifteen.

If I understand you correctly, COPY (via BeginCopyFrom) itself relies on
having a relation in pg_class to reference for attributes.
In this case, there is no such relation. So I'd have to fake a relcache
entry, or refactor BeginCopyFrom() to extract a ReturnSetInfo from the
Relation and pass that along to a new function BeginCopyFromReturnSet. I'm
happy to go that route if you think it's a good idea.
+             tuple = heap_form_tuple(tupdesc,values,nulls);
+             //tuple = BuildTupleFromCStrings(attinmeta,
field_strings);

+ tuplestore_puttuple(tupstore, tuple);

No need to form a tuple; use tuplestore_putvalues here.
Good to know!

I wonder if these should be an auxiliary function in copy.c to do this.
Surely copy.c itself does pretty much the same thing ...

Yes. This got started as a patch to core because not all of the parts of
COPY are externally callable, and aren't broken down in a way that allowed
for use in a SRF.

I'll get to work on these suggestions.

I've put in some more work on this patch, mostly just taking Alvaro's
suggestions, which resulted in big code savings.

I had to add a TupleDesc parameter to BeginCopy() and BeginCopyFrom(). This
seemed the easiest way to leverage the existing tested code (and indeed, it
worked nearly out-of-the-box). The only drawback is that a minor change
will have to be made to the BeginCopyFrom() call in file_fdw.c, and any
other extensions that leverage COPY. We could make compatibility functions
that take the original signature and pass it along to the corresponding
function with rsTupDesc set to NULL.

Some issues:
- I'm still not sure if the direction we want to go is a set returning
function, or a change in grammar that lets us use COPY as a CTE or similar.
- This function will have the same difficulties as adding the program
option did to file_fdw: there's very little we can reference that isn't
os/environment specific
- Inline (STDIN) prompts the user for input, but gives the error: server
sent data ("D" message) without prior row description ("T" message). I
looked for a place where the Relation was consulted for the row
description, but I'm not finding it.

I can continue to flesh this out with documentation and test cases if there
is consensus that this is the way to go.

# select * from copy_srf('echo "x\ty"',true) as t(x text, y text);
x | y
---+---
x | y
(1 row)

Time: 1.074 ms
# select * from copy_srf('echo "x\t4"',true) as t(x text, y integer);
x | y
---+---
x | 4
(1 row)

Time: 1.095 ms
# select * from copy_srf(null) as t(x text, y integer);
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself.

a 4
b 5
\.

server sent data ("D" message) without prior row description ("T" message)

Attachments:

copy_srf_function_003.difftext/plain; charset=US-ASCII; name=copy_srf_function_003.diffDownload

diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 4dfedf8..26f81f3 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1065,6 +1065,21 @@ LANGUAGE INTERNAL
 STRICT IMMUTABLE PARALLEL SAFE
 AS 'jsonb_insert';
 
+CREATE OR REPLACE FUNCTION copy_srf(
+	IN filename text,
+	IN is_program boolean DEFAULT false,
+	IN format text DEFAULT null,
+	IN delimiter text DEFAULT null,
+	IN null_string text DEFAULT null,
+	IN header boolean DEFAULT null,
+	IN quote text DEFAULT null,
+	IN escape text DEFAULT null,
+	IN encoding text DEFAULT null)
+RETURNS SETOF RECORD
+LANGUAGE INTERNAL
+VOLATILE ROWS 1000 COST 1000 CALLED ON NULL INPUT
+AS 'copy_srf';
+
 -- The default permissions for functions mean that anyone can execute them.
 -- A number of functions shouldn't be executable by just anyone, but rather
 -- than use explicit 'superuser()' checks in those functions, we use the GRANT
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index f9362be..4e6a32c 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -30,6 +30,7 @@
 #include "commands/defrem.h"
 #include "commands/trigger.h"
 #include "executor/executor.h"
+#include "funcapi.h"
 #include "libpq/libpq.h"
 #include "libpq/pqformat.h"
 #include "mb/pg_wchar.h"
@@ -286,7 +287,8 @@ static const char BinarySignature[11] = "PGCOPY\n\377\r\n\0";
 
 
 /* non-export function prototypes */
-static CopyState BeginCopy(ParseState *pstate, bool is_from, Relation rel,
+static CopyState BeginCopy(ParseState *pstate, bool is_from, Relation rel, 
+		  TupleDesc rsTupDesc,
 		  RawStmt *raw_query, Oid queryRelId, List *attnamelist,
 		  List *options);
 static void EndCopy(CopyState cstate);
@@ -562,7 +564,6 @@ CopyGetData(CopyState cstate, void *databuf, int minread, int maxread)
 						 errmsg("could not read from COPY file: %m")));
 			break;
 		case COPY_OLD_FE:
-
 			/*
 			 * We cannot read more than minread bytes (which in practice is 1)
 			 * because old protocol doesn't have any clear way of separating
@@ -967,7 +968,7 @@ DoCopy(ParseState *pstate, const CopyStmt *stmt,
 			PreventCommandIfReadOnly("COPY FROM");
 		PreventCommandIfParallelMode("COPY FROM");
 
-		cstate = BeginCopyFrom(pstate, rel, stmt->filename, stmt->is_program,
+		cstate = BeginCopyFrom(pstate, rel, NULL, stmt->filename, stmt->is_program,
 							   stmt->attlist, stmt->options);
 		cstate->range_table = range_table;
 		*processed = CopyFrom(cstate);	/* copy from file to database */
@@ -1370,6 +1371,7 @@ static CopyState
 BeginCopy(ParseState *pstate,
 		  bool is_from,
 		  Relation rel,
+		  TupleDesc rsTupDesc,
 		  RawStmt *raw_query,
 		  Oid queryRelId,
 		  List *attnamelist,
@@ -1396,6 +1398,9 @@ BeginCopy(ParseState *pstate,
 	/* Extract options from the statement node tree */
 	ProcessCopyOptions(pstate, cstate, is_from, options);
 
+	/* Cannot have both a relation a result set */
+	Assert(rel == NULL || rsTupDesc == NULL);
+
 	/* Process the source/target relation or query */
 	if (rel)
 	{
@@ -1436,6 +1441,14 @@ BeginCopy(ParseState *pstate,
 			cstate->partition_tuple_slot = partition_tuple_slot;
 		}
 	}
+	else if (rsTupDesc)
+	{
+		/* COPY FROM file/program to result set */
+		Assert(!raw_query);
+		Assert(is_from);
+		Assert(attnamelist == NIL);
+		tupDesc = rsTupDesc;
+	}
 	else
 	{
 		List	   *rewritten;
@@ -1802,7 +1815,7 @@ BeginCopyTo(ParseState *pstate,
 							RelationGetRelationName(rel))));
 	}
 
-	cstate = BeginCopy(pstate, false, rel, query, queryRelId, attnamelist,
+	cstate = BeginCopy(pstate, false, rel, NULL, query, queryRelId, attnamelist,
 					   options);
 	oldcontext = MemoryContextSwitchTo(cstate->copycontext);
 
@@ -2875,6 +2888,7 @@ CopyFromInsertBatch(CopyState cstate, EState *estate, CommandId mycid,
 CopyState
 BeginCopyFrom(ParseState *pstate,
 			  Relation rel,
+			  TupleDesc rsTupDesc,
 			  const char *filename,
 			  bool is_program,
 			  List *attnamelist,
@@ -2895,13 +2909,14 @@ BeginCopyFrom(ParseState *pstate,
 	MemoryContext oldcontext;
 	bool		volatile_defexprs;
 
-	cstate = BeginCopy(pstate, true, rel, NULL, InvalidOid, attnamelist, options);
+	cstate = BeginCopy(pstate, true, rel, rsTupDesc, NULL, InvalidOid, attnamelist, options);
 	oldcontext = MemoryContextSwitchTo(cstate->copycontext);
 
 	/* Initialize state variables */
 	cstate->fe_eof = false;
 	cstate->eol_type = EOL_UNKNOWN;
-	cstate->cur_relname = RelationGetRelationName(cstate->rel);
+	if (cstate->rel)
+		cstate->cur_relname = RelationGetRelationName(cstate->rel);
 	cstate->cur_lineno = 0;
 	cstate->cur_attname = NULL;
 	cstate->cur_attval = NULL;
@@ -2913,7 +2928,10 @@ BeginCopyFrom(ParseState *pstate,
 	cstate->raw_buf = (char *) palloc(RAW_BUF_SIZE + 1);
 	cstate->raw_buf_index = cstate->raw_buf_len = 0;
 
-	tupDesc = RelationGetDescr(cstate->rel);
+	if (cstate->rel)
+		tupDesc = RelationGetDescr(cstate->rel);
+	else
+		tupDesc = rsTupDesc;
 	attr = tupDesc->attrs;
 	num_phys_attrs = tupDesc->natts;
 	num_defaults = 0;
@@ -2950,8 +2968,10 @@ BeginCopyFrom(ParseState *pstate,
 		{
 			/* attribute is NOT to be copied from input */
 			/* use default value if one exists */
-			Expr	   *defexpr = (Expr *) build_column_default(cstate->rel,
-																attnum);
+			Expr	   *defexpr = NULL;
+
+			if (rel)
+				defexpr = (Expr *) build_column_default(cstate->rel, attnum);
 
 			if (defexpr != NULL)
 			{
@@ -4740,3 +4760,167 @@ CreateCopyDestReceiver(void)
 
 	return (DestReceiver *) self;
 }
+
+Datum
+copy_srf(PG_FUNCTION_ARGS)
+{
+	ParseState		*pstate = NULL;
+	ReturnSetInfo	*rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+	TupleDesc		tupdesc;
+	Tuplestorestate *tupstore = NULL;
+	MemoryContext	per_query_ctx;
+	MemoryContext	oldcontext;
+	FmgrInfo        *in_functions;
+	Oid             *typioparams;
+	Oid             in_func_oid;
+
+	CopyState	cstate;
+	int			col;
+
+	Datum       *values;
+	bool		*nulls;
+
+	char		*filename = NULL;
+	bool		is_program = false;
+	List		*options = NIL;
+
+	/* check to see if caller supports us returning a tuplestore */
+	if (rsinfo == NULL || !IsA(rsinfo, ReturnSetInfo))
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("set-valued function called in context that cannot accept a set")));
+
+	if (!(rsinfo->allowedModes & SFRM_Materialize) || rsinfo->expectedDesc == NULL)
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("materialize mode required, but it is not allowed in this context")));
+
+	tupdesc = CreateTupleDescCopy(rsinfo->expectedDesc);
+	values = (Datum *) palloc(tupdesc->natts * sizeof(Datum));
+	nulls = (bool *) palloc(tupdesc->natts * sizeof(bool));
+	in_functions = (FmgrInfo *) palloc(tupdesc->natts * sizeof(FmgrInfo));
+	typioparams = (Oid *) palloc(tupdesc->natts * sizeof(Oid));
+
+	for (col = 0; col < tupdesc->natts; col++)
+	{
+		getTypeInputInfo(tupdesc->attrs[col]->atttypid,&in_func_oid,&typioparams[col]);
+		fmgr_info(in_func_oid,&in_functions[col]);
+	}
+
+	/*----------
+	 * Function signature is:
+	 * copy_srf( filename text,
+	 *           is_program boolean default false,
+	 *           format text default null,
+	 *           delimiter text default null,
+	 *           null_string text default null,
+	 *           header boolean default null,
+	 *           quote text default null,
+	 *           escape text default null,
+	 *           encoding text default null
+	 *----------
+	 */
+	/* param 0: filename */
+	if (! PG_ARGISNULL(0))
+		filename = TextDatumGetCString(PG_GETARG_TEXT_P(0));
+
+	/* param 1: is_program */
+	if (! PG_ARGISNULL(1))
+		is_program = PG_GETARG_BOOL(1);
+
+	if (filename == NULL && is_program)
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("is_program cannot be true in pipe mode")));
+
+
+	/* param 2: format - text, csv, binary */
+	if (! PG_ARGISNULL(2))
+		options = lcons(makeDefElem("format",
+									(Node *) makeString( TextDatumGetCString(PG_GETARG_TEXT_P(2))),-1),
+						options);
+
+	/* param 3: delimiter text default E'\t', */
+	if (! PG_ARGISNULL(3))
+		options = lcons(makeDefElem("delimiter",
+									(Node *) makeString( TextDatumGetCString(PG_GETARG_TEXT_P(3))),-1),
+						options);
+
+	/* param 4: null_string text default '\N', */
+	if (! PG_ARGISNULL(4))
+		options = lcons(makeDefElem("null",
+									(Node *) makeString( TextDatumGetCString(PG_GETARG_TEXT_P(4))),-1),
+						options);
+
+	/* param 5: header boolean default false, */
+	if (! PG_ARGISNULL(5))
+		if (PG_GETARG_BOOL(5))
+			options = lcons(makeDefElem("header", (Node *) makeString( "true"),-1),options);
+
+	/* param 6: quote text default '"', */
+	if (! PG_ARGISNULL(6))
+		options = lcons(makeDefElem("quote",
+									(Node *) makeString( TextDatumGetCString(PG_GETARG_TEXT_P(6))),-1),
+						options);
+
+	/* param 7: escape text default null, -- defaults to whatever quote is */
+	if (! PG_ARGISNULL(7))
+		options = lcons(makeDefElem("escape",
+									(Node *) makeString( TextDatumGetCString(PG_GETARG_TEXT_P(6))),-1),
+						options);
+
+	/* param 8: encoding text default null */
+	if (! PG_ARGISNULL(8))
+		options = lcons(makeDefElem("encoding",
+									(Node *) makeString( TextDatumGetCString(PG_GETARG_TEXT_P(8))),-1),
+						options);
+
+	/* let the caller know we're sending back a tuplestore */
+	rsinfo->returnMode = SFRM_Materialize;
+	per_query_ctx = fcinfo->flinfo->fn_mcxt;
+	oldcontext = MemoryContextSwitchTo(per_query_ctx);
+
+	tupstore = tuplestore_begin_heap(true,false,work_mem);
+
+	cstate = BeginCopyFrom(pstate, NULL, tupdesc, filename, is_program, NIL, options);
+
+	while(1)
+	{
+		char    **field_strings;
+		int     field_strings_count;
+		int     col;
+
+		if (! NextCopyFromRawFields(cstate,&field_strings,&field_strings_count))
+		{
+			break;
+		}
+		if (field_strings_count != tupdesc->natts)
+		{
+			ereport(ERROR,
+					(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+					 errmsg("found %d fields but expected %d on line %d",
+							field_strings_count, tupdesc->natts, cstate->cur_lineno)));
+		}
+
+		for (col = 0; col < tupdesc->natts; col++)
+		{
+			values[col] = InputFunctionCall(&in_functions[col],
+											field_strings[col],
+											typioparams[col],
+											tupdesc->attrs[col]->atttypmod);
+			nulls[col] = (field_strings[col] == NULL);
+		}
+
+		tuplestore_putvalues(tupstore, tupdesc, values, nulls);
+	}
+
+	tuplestore_donestoring(tupstore);
+	rsinfo->setResult = tupstore;
+	rsinfo->setDesc = tupdesc;
+
+	EndCopy(cstate);
+
+	MemoryContextSwitchTo(oldcontext);
+
+	return (Datum) 0;
+}
diff --git a/src/include/catalog/pg_proc.h b/src/include/catalog/pg_proc.h
index c1f492b..9fb2ff7 100644
--- a/src/include/catalog/pg_proc.h
+++ b/src/include/catalog/pg_proc.h
@@ -5359,6 +5359,10 @@ DESCR("pg_controldata init state information as a function");
 DATA(insert OID = 3445 ( pg_import_system_collations PGNSP PGUID 12 100 0 0 0 f f f f t f v r 2 0 2278 "16 4089" _null_ _null_ "{if_not_exists,schema}" _null_ _null_ pg_import_system_collations _null_ _null_ _null_ ));
 DESCR("import collations from operating system");
 
+DATA(insert OID = 3353 (  copy_srf PGNSP PGUID 12 1 0 0 0 f f f f f t v u 9 0 2249 "25 16 25 25 25 16 25 25 25" _null_ _null_ _null_ _null_ _null_ copy_srf _null_ _null_ _null_ ));
+DESCR("set-returning COPY proof of concept");
+
+
 /*
  * Symbolic values for provolatile column: these indicate whether the result
  * of a function is dependent *only* on the values of its explicit arguments,
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index d63ca0f..6364604 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -27,8 +27,8 @@ extern void DoCopy(ParseState *state, const CopyStmt *stmt,
 	   uint64 *processed);
 
 extern void ProcessCopyOptions(ParseState *pstate, CopyState cstate, bool is_from, List *options);
-extern CopyState BeginCopyFrom(ParseState *pstate, Relation rel, const char *filename,
-			  bool is_program, List *attnamelist, List *options);
+extern CopyState BeginCopyFrom(ParseState *pstate, Relation rel, TupleDesc rsTupDesc,
+					const char *filename, bool is_program, List *attnamelist, List *options);
 extern void EndCopyFrom(CopyState cstate);
 extern bool NextCopyFrom(CopyState cstate, ExprContext *econtext,
 			 Datum *values, bool *nulls, Oid *tupleOid);
@@ -38,4 +38,7 @@ extern void CopyFromErrorCallback(void *arg);
 
 extern DestReceiver *CreateCopyDestReceiver(void);
 
+extern Datum copy_srf(PG_FUNCTION_ARGS);
+
+
 #endif   /* COPY_H */

#21

David Fetter

david@fetter.org

almost 9 years ago

In reply to: Corey Huinker (#20)

Re: COPY as a set returning function

On Wed, Jan 25, 2017 at 08:51:38PM -0500, Corey Huinker wrote:

I've put in some more work on this patch, mostly just taking Alvaro's
suggestions, which resulted in big code savings.

The patch applies atop master.

The test (ls) which previously crashed the backend now doesn't, and
works in a reasonable way.

The description of the function still talks about its being a proof of
concept.

There are still neither regression tests nor SGML documentation.

Are we at a point where we should add these things?

Best,
David.
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
Skype: davidfetter XMPP: david(dot)fetter(at)gmail(dot)com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#22

Peter Eisentraut

peter.eisentraut@2ndquadrant.com

almost 9 years ago

In reply to: Corey Huinker (#20)

Re: COPY as a set returning function

On 1/25/17 8:51 PM, Corey Huinker wrote:

# select * from copy_srf('echo "x\ty"',true) as t(x text, y text);

I find these parameters weird. Just looking at this, one has no idea
what the "true" means. Why not have a "filename" and a "program"
parameter and make them mutually exclusive?

--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#23

Peter Eisentraut

peter.eisentraut@2ndquadrant.com

almost 9 years ago

In reply to: David Fetter (#21)

Re: COPY as a set returning function

On 1/27/17 8:07 PM, David Fetter wrote:

There are still neither regression tests nor SGML documentation.

Are we at a point where we should add these things?

One could argue about the documentation at this point, since the
function is somewhat self-documenting for the time being. But surely
there should be tests.

--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#24

Corey Huinker

corey.huinker@gmail.com

almost 9 years ago

In reply to: Peter Eisentraut (#22)

Re: COPY as a set returning function

On Fri, Jan 27, 2017 at 9:42 PM, Peter Eisentraut <
peter.eisentraut@2ndquadrant.com> wrote:

On 1/25/17 8:51 PM, Corey Huinker wrote:

# select * from copy_srf('echo "x\ty"',true) as t(x text, y text);

I find these parameters weird. Just looking at this, one has no idea
what the "true" means. Why not have a "filename" and a "program"
parameter and make them mutually exclusive?

It was done that way to match the parameters of BeginCopyFrom() and it
could easily be changed.

I suppose I could have written it as:

select * from copy_srf(filename => 'echo "x\ty"', is_program => true) as
t(x text, y text);

But this goes to the core of this patch/poc: I don't know where we want to
take it next.

Options at this point are:
1. Continue with a SRF, in which case discussions about parameters are
completely valid.
2. Add a RETURNING clause to COPY. This would dispense with the parameters
problem, but potentially create others.
3. Add the TupDesc parameter to BeginCopyFrom, make sure all data
structures are visible to an extension, and call it a day. If someone wants
to write an extension that makes their own copy_srf(), they can.
4. Someone else's better idea.

#25

Michael Paquier

michael.paquier@gmail.com

almost 9 years ago

In reply to: Corey Huinker (#24)

Re: COPY as a set returning function

On Tue, Jan 31, 2017 at 3:05 AM, Corey Huinker <corey.huinker@gmail.com> wrote:

On Fri, Jan 27, 2017 at 9:42 PM, Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:

On 1/25/17 8:51 PM, Corey Huinker wrote:

# select * from copy_srf('echo "x\ty"',true) as t(x text, y text);

I find these parameters weird. Just looking at this, one has no idea
what the "true" means. Why not have a "filename" and a "program"
parameter and make them mutually exclusive?

It was done that way to match the parameters of BeginCopyFrom() and it could
easily be changed.

I suppose I could have written it as:

select * from copy_srf(filename => 'echo "x\ty"', is_program => true) as t(x
text, y text);

But this goes to the core of this patch/poc: I don't know where we want to
take it next.

Options at this point are:
1. Continue with a SRF, in which case discussions about parameters are
completely valid.
2. Add a RETURNING clause to COPY. This would dispense with the parameters
problem, but potentially create others.
3. Add the TupDesc parameter to BeginCopyFrom, make sure all data structures
are visible to an extension, and call it a day. If someone wants to write an
extension that makes their own copy_srf(), they can.
4. Someone else's better idea.

Here is a 4: Refactoring BeginCopyFrom so as instead of a Relation are
used a TupleDesc, a default expression list, and a relation name. You
could as well make NextCopyFrom() smarter so as it does nothing if no
expression contexts are given by the caller, which is the case of your
function here. To be honest, I find a bit weird to use
NextCopyFromRawFields when there is already a bunch of sanity checks
happening in NextCopyFrom(), where you surely want to have them
checked even for your function.

Looking at the last patch proposed, I find the current patch proposed
as immature, as rsTupDesc basically overlaps with the relation a
caller can define in BeginCopyFrom(), so marking this patch as
"returned with feedback".
--
Michael

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#26

Corey Huinker

corey.huinker@gmail.com

almost 9 years ago

In reply to: Michael Paquier (#25)

Re: COPY as a set returning function

Here is a 4: Refactoring BeginCopyFrom so as instead of a Relation are
used a TupleDesc, a default expression list, and a relation name. You
could as well make NextCopyFrom() smarter so as it does nothing if no
expression contexts are given by the caller, which is the case of your
function here. To be honest, I find a bit weird to use
NextCopyFromRawFields when there is already a bunch of sanity checks
happening in NextCopyFrom(), where you surely want to have them
checked even for your function.

Looking at the last patch proposed, I find the current patch proposed
as immature, as rsTupDesc basically overlaps with the relation a
caller can define in BeginCopyFrom(), so marking this patch as
"returned with feedback".
--
Michael

I like that suggestion and will move forward on it. I wasn't sure there
would be support for such a refactoring.

As for the copy from stdin case, Andrew Gierth laid out why that's nearly
impossible to unwind in the network protocol (the act of starting the copy
causes the query result to end before any rows were returned), and that
might make STDIN input a dead issue.