COPY as a set returning function

Started by Corey Huinkerover 9 years ago26 messageshackers
Jump to latest
#1Corey Huinker
corey.huinker@gmail.com

Attached is a _very_ rough patch implementing a proof-of-concept function
copy_srf();

It allows you to do things like this:

# select a,c,e from copy_srf('echo 12,345,67,89,2016-01-01',true) as t(a
integer, b text, c text, d text, e date);
a | c | e
----+----+------------
12 | 67 | 2016-01-01
(1 row)

Uses for this include:
- avoiding the pattern of creating a temp table just to select all the rows
back out and then discard the table (avoidable disk activity, avoidable oid
churn)
- avoiding the pattern of creating a file_fdw table in pg_temp just to drop
it after one select (avoidable oid churn)
- filtering file/program input by the columns that are relevant to the
user's needs.

This experiment arose from my realization that file_fdw just plugs into the
externally visible portions of copy.c to do all of it's work. So why
couldn't we do the same for a set returning function? Of course it wasn't
as simple as that. The existing Begin/NextCopyFrom functions require the
->rel to be a valid Oid...which we won't have in this context, so I had to
bypass that code and use CopyFromRawFields() directly...which isn't
externally visible, hence this being a patch to core rather than an
extension.

Currently the function only accepts two parameters, "filename" and
"is_program". Header is always false and csv mode is always true. Obviously
if we go forward on this, we'll want to add that functionality back in, but
I'm holding off for now to keep the example simple and wait for consensus
on future direction.

As for that future direction, we could either have:
- a robust function named something like copy_srf(), with parameters for
all of the relevant options found in the COPY command
- a function that accepts an options string and parse that
- we could alter the grammar to make COPY RETURNING col1, col3, col5 FROM
'filename' a legit CTE.

Regardless of the path forward, I'm going to need help in getting there,
hence this email. Thank you for your consideration.

Attachments:

copy_as_a_srf.diffapplication/octet-stream; name=copy_as_a_srf.diffDownload+235-1
#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Corey Huinker (#1)
Re: COPY as a set returning function

Corey Huinker <corey.huinker@gmail.com> writes:

Attached is a _very_ rough patch implementing a proof-of-concept function
copy_srf();
...
As for that future direction, we could either have:
- a robust function named something like copy_srf(), with parameters for
all of the relevant options found in the COPY command
- a function that accepts an options string and parse that
- we could alter the grammar to make COPY RETURNING col1, col3, col5 FROM
'filename' a legit CTE.

I think the last of those suggestions has come up before. It has the
large advantage that you don't have to remember a different syntax for
copy-as-a-function. Once you had the framework for that, other
rows-returning utility commands such as EXPLAIN might plug in as well,
whenever somebody got enough of an itch for it.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#3Craig Ringer
craig@2ndquadrant.com
In reply to: Tom Lane (#2)
Re: COPY as a set returning function

On 1 Oct. 2016 05:20, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:

Corey Huinker <corey.huinker@gmail.com> writes:

Attached is a _very_ rough patch implementing a proof-of-concept

function

copy_srf();
...
As for that future direction, we could either have:
- a robust function named something like copy_srf(), with parameters for
all of the relevant options found in the COPY command
- a function that accepts an options string and parse that
- we could alter the grammar to make COPY RETURNING col1, col3, col5

FROM

'filename' a legit CTE.

I think the last of those suggestions has come up before. It has the
large advantage that you don't have to remember a different syntax for
copy-as-a-function. Once you had the framework for that, other
rows-returning utility commands such as EXPLAIN might plug in as well,
whenever somebody got enough of an itch for it.

That sounds fantastic. It'd help this copy variant retain festure parity
with normal copy. And it'd bring us closer to being able to FETCH in non
queries.

Show quoted text

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#4Tom Lane
tgl@sss.pgh.pa.us
In reply to: Craig Ringer (#3)
Re: COPY as a set returning function

Craig Ringer <craig.ringer@2ndquadrant.com> writes:

On 1 Oct. 2016 05:20, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:

I think the last of those suggestions has come up before. It has the
large advantage that you don't have to remember a different syntax for
copy-as-a-function.

That sounds fantastic. It'd help this copy variant retain festure parity
with normal copy. And it'd bring us closer to being able to FETCH in non
queries.

On second thought, though, this couldn't exactly duplicate the existing
COPY syntax, because COPY relies heavily on the rowtype of the named
target table to tell it what it's copying. You'd need some new syntax
to provide the list of column names and types, which puts a bit of
a hole in the "syntax we already know" argument. A SRF-returning-record
would have a leg up on that, because we do have existing syntax for
defining the concrete rowtype that any particular call returns.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#5Corey Huinker
corey.huinker@gmail.com
In reply to: Tom Lane (#4)
Re: COPY as a set returning function

That sounds fantastic. It'd help this copy variant retain festure parity
with normal copy. And it'd bring us closer to being able to FETCH in non
queries.

On second thought, though, this couldn't exactly duplicate the existing
COPY syntax, because COPY relies heavily on the rowtype of the named
target table to tell it what it's copying. You'd need some new syntax
to provide the list of column names and types, which puts a bit of
a hole in the "syntax we already know" argument. A SRF-returning-record
would have a leg up on that, because we do have existing syntax for
defining the concrete rowtype that any particular call returns.

regards, tom lane

I would like to make COPY itself a SRF. That's a bit beyond my
capabilities, so if that is the route we want to go, I will need help.

The syntax would probably look like this (new bits in bold):

WITH my_copy AS (
COPY FROM 'example.csv' TO *RESULT SET(c1 text, c2 integer, dummy1
text, dummy2 text, c5 date)* WITH (FORMAT CSV)
*RETURNING c1, c2, c3*
)
SELECT ...
FROM my_copy
LEFT OUTER JOIN ref_table ...

The RESULT SET (colspecs) bit would be the rsinfo currently used by
copy_srf(). It would be nice if the CTE declaration could take types, but
it can't.

The RETURNING clause here doesn't return all the columns made available
from the COPY. That would be nice, but not required because the same
filtration could be done when the CTE is referenced. So if we require RETURNING
* be the only returning option I'd be fine with that.

If we're ok with adding a function like copy_srf() to the core, will we
still be happy with it when COPY does get a RETURNING clause?

Somewhat off-topic: here's some other observations of a n00b who spent a
fair amount of time looking at the copy.c code.

1. BeginCopyFrom and NextCopyFrom pull attlist/tupdesc info from ->rel,
repeatedly. If we were going to try to leverage that code we'd need to
store those things in a separate cstate member so that we add complexity
only in the initialization of the copy state data struct, pulling the
result structure from rsinfo rather than a relation. There's probably a
minor performance gain to be had in keeping that info around. Refactoring
those two procs to allow for a pre-set attlist/tupdesc would help.

2. NextCopyFrom() checks every single time to see if it's row 0 and if it
should skip this header row. I know a single (row_num == 0 && has_header)
isn't much extra processing, but shouldn't we digest and discard headers
before going into the per-row loop?

3. All the code that handles indexes, constraints, buffering, etc, simply
doesn't apply in the SRF context.

4. The code somewhat needlessly mixes code for the COPY FROM and COPY TO
cases. There's probably a good reason for this, but it made for a lot of
clutter in achieving my very narrow goal.

#6Craig Ringer
craig@2ndquadrant.com
In reply to: Corey Huinker (#1)
Re: COPY as a set returning function

On 15 Oct. 2016 04:56, "Corey Huinker" <corey.huinker@gmail.com> wrote:

I would like to make COPY itself a SRF. That's a bit beyond my

capabilities, so if that is the route we want to go, I will need help.

The syntax would probably look like this (new bits in bold):

WITH my_copy AS (
COPY FROM 'example.csv' TO RESULT SET(c1 text, c2 integer, dummy1

text, dummy2 text, c5 date) WITH (FORMAT CSV)

RETURNING c1, c2, c3
)

Strong -1 from me on this approach. Our CTE implementation materializes
everything so this is no better than COPYing to a temp table.

Not unless you plan to fix that (and figure out the backward compatibility
issues since the bug is documented as a feature) or implement RETURNING in
subqueries... I'd go for the function.

#7Corey Huinker
corey.huinker@gmail.com
In reply to: Craig Ringer (#6)
Re: COPY as a set returning function

On Sun, Oct 16, 2016 at 9:01 AM, Craig Ringer <craig.ringer@2ndquadrant.com>
wrote:

On 15 Oct. 2016 04:56, "Corey Huinker" <corey.huinker@gmail.com> wrote:

I would like to make COPY itself a SRF. That's a bit beyond my

capabilities, so if that is the route we want to go, I will need help.

The syntax would probably look like this (new bits in bold):

WITH my_copy AS (
COPY FROM 'example.csv' TO RESULT SET(c1 text, c2 integer, dummy1

text, dummy2 text, c5 date) WITH (FORMAT CSV)

RETURNING c1, c2, c3
)

Strong -1 from me on this approach. Our CTE implementation materializes
everything so this is no better than COPYing to a temp table.

Not unless you plan to fix that (and figure out the backward compatibility
issues since the bug is documented as a feature) or implement RETURNING in
subqueries... I'd go for the function.

Well, it saves burning the oid and the pg_attribute rows. A few long
running transactions can cause pg_attribute to bloat to 400GB on one of our
systems - hence my wanting something like this function.

If it does stay a function, we only need to implement 8 of the 12 options
as parameters (FREEZE and FORCE* options don't apply). My guess is that
future options added to COPY will be more about handling output or
optimizing table inserts, neither of which mean more options for this
proposed function.

Would the best approach be to build in a core srf-returning function that
might be deprecated once COPY is set-returning AND CTEs don't have to
materialize, or to refactor what's in copy.c such that a contrib module can
easily plug into it, and have copy_srf live there?

#8Merlin Moncure
mmoncure@gmail.com
In reply to: Tom Lane (#4)
Re: COPY as a set returning function

On Fri, Sep 30, 2016 at 9:56 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Craig Ringer <craig.ringer@2ndquadrant.com> writes:

On 1 Oct. 2016 05:20, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:

I think the last of those suggestions has come up before. It has the
large advantage that you don't have to remember a different syntax for
copy-as-a-function.

That sounds fantastic. It'd help this copy variant retain festure parity
with normal copy. And it'd bring us closer to being able to FETCH in non
queries.

On second thought, though, this couldn't exactly duplicate the existing
COPY syntax, because COPY relies heavily on the rowtype of the named
target table to tell it what it's copying. You'd need some new syntax
to provide the list of column names and types, which puts a bit of
a hole in the "syntax we already know" argument. A SRF-returning-record
would have a leg up on that, because we do have existing syntax for
defining the concrete rowtype that any particular call returns.

One big disadvantage of SRF-returning-record syntax is that functions
are basically unwrappable with generic wrappers sans major gymnastics
such as dynamically generating the query and executing it. This is a
major disadvantage relative to the null::type hack we use in the
populate_record style functions and perhaps ought to make this
(SRF-returning-record syntax) style of use discouraged for useful
library functions. If there were a way to handle wrapping I'd
withdraw this minor objection -- this has come up in dblink
discussions a few times).

merlin

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#9Corey Huinker
corey.huinker@gmail.com
In reply to: Merlin Moncure (#8)
Re: COPY as a set returning function

Attached is a patch that implements copy_srf().

The function signature reflects cstate more than it reflects the COPY
options (filename+is_program instead of FILENAME or PROGRAM, etc)

CREATE OR REPLACE FUNCTION copy_srf(
filename text DEFAULT null,
is_program boolean DEFAULT false,
format text DEFAULT 'text', /* accepts text, csv, binary */
delimiter text DEFAULT null,
null_string text DEFAULT E'\\N',
header boolean DEFAULT false,
quote text DEFAULT null,
escape text DEFAULT null,
encoding text DEFAULT null)
RETURNS SETOF RECORD

The major utility for this (at least for me) will be in ETLs that currently
make a lot of use of temp tables, and wish to either reduce I/O or avoid
pg_attribute bloat.

I have not yet implemented STDIN mode, but it's a start.

$ psql test
psql (10devel)
Type "help" for help.

# select * from copy_srf('echo 1,2; echo 4,5',true,'csv') as t(x text, y
text);
x | y
---+---
1 | 2
4 | 5
(2 rows)

Time: 1.456 ms
# select * from copy_srf('/tmp/small_file.txt'::text,false,'text') as
t(x text, y text);
x | y
-----+-----
1 | 2
a | b
cde | fgh
(3 rows)

On Mon, Oct 17, 2016 at 2:33 PM, Merlin Moncure <mmoncure@gmail.com> wrote:

Show quoted text

On Fri, Sep 30, 2016 at 9:56 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Craig Ringer <craig.ringer@2ndquadrant.com> writes:

On 1 Oct. 2016 05:20, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:

I think the last of those suggestions has come up before. It has the
large advantage that you don't have to remember a different syntax for
copy-as-a-function.

That sounds fantastic. It'd help this copy variant retain festure parity
with normal copy. And it'd bring us closer to being able to FETCH in non
queries.

On second thought, though, this couldn't exactly duplicate the existing
COPY syntax, because COPY relies heavily on the rowtype of the named
target table to tell it what it's copying. You'd need some new syntax
to provide the list of column names and types, which puts a bit of
a hole in the "syntax we already know" argument. A SRF-returning-record
would have a leg up on that, because we do have existing syntax for
defining the concrete rowtype that any particular call returns.

One big disadvantage of SRF-returning-record syntax is that functions
are basically unwrappable with generic wrappers sans major gymnastics
such as dynamically generating the query and executing it. This is a
major disadvantage relative to the null::type hack we use in the
populate_record style functions and perhaps ought to make this
(SRF-returning-record syntax) style of use discouraged for useful
library functions. If there were a way to handle wrapping I'd
withdraw this minor objection -- this has come up in dblink
discussions a few times).

merlin

Attachments:

copy_srf_function.difftext/plain; charset=US-ASCII; name=copy_srf_function.diffDownload+398-1
#10Haribabu Kommi
kommi.haribabu@gmail.com
In reply to: Corey Huinker (#9)
Re: COPY as a set returning function

On Tue, Nov 1, 2016 at 7:45 AM, Corey Huinker <corey.huinker@gmail.com>
wrote:

Attached is a patch that implements copy_srf().

Moved to next CF with "needs review" status.

Regards,
Hari Babu
Fujitsu Australia

#11Michael Paquier
michael@paquier.xyz
In reply to: Haribabu Kommi (#10)
Re: COPY as a set returning function

On Mon, Dec 5, 2016 at 2:10 PM, Haribabu Kommi <kommi.haribabu@gmail.com> wrote:

On Tue, Nov 1, 2016 at 7:45 AM, Corey Huinker <corey.huinker@gmail.com>
wrote:

Attached is a patch that implements copy_srf().

Moved to next CF with "needs review" status.

This patch is still waiting for review. David, are you planning to
look at it by the end of the CF?
--
Michael

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#12David Fetter
david@fetter.org
In reply to: Michael Paquier (#11)
Re: COPY as a set returning function

On Wed, Jan 25, 2017 at 02:37:57PM +0900, Michael Paquier wrote:

On Mon, Dec 5, 2016 at 2:10 PM, Haribabu Kommi <kommi.haribabu@gmail.com> wrote:

On Tue, Nov 1, 2016 at 7:45 AM, Corey Huinker <corey.huinker@gmail.com>
wrote:

Attached is a patch that implements copy_srf().

Moved to next CF with "needs review" status.

This patch is still waiting for review. David, are you planning to
look at it by the end of the CF?

I'll be doing this today.

Best,
David.
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
Skype: davidfetter XMPP: david(dot)fetter(at)gmail(dot)com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#13David Fetter
david@fetter.org
In reply to: Corey Huinker (#9)
Re: COPY as a set returning function

On Mon, Oct 31, 2016 at 04:45:40PM -0400, Corey Huinker wrote:

Attached is a patch that implements copy_srf().

The function signature reflects cstate more than it reflects the COPY
options (filename+is_program instead of FILENAME or PROGRAM, etc)

The patch as it stands needs a rebase. I'll see what I can do today.

Best,
David.
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
Skype: davidfetter XMPP: david(dot)fetter(at)gmail(dot)com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#14David Fetter
david@fetter.org
In reply to: David Fetter (#13)
Re: COPY as a set returning function

On Wed, Jan 25, 2017 at 06:16:16AM -0800, David Fetter wrote:

On Mon, Oct 31, 2016 at 04:45:40PM -0400, Corey Huinker wrote:

Attached is a patch that implements copy_srf().

The function signature reflects cstate more than it reflects the COPY
options (filename+is_program instead of FILENAME or PROGRAM, etc)

The patch as it stands needs a rebase. I'll see what I can do today.

Please find attached a rebase, which fixes an OID collision that crept in.

- The patch builds against master and passes "make check".
- The patch does not contain user-visible documentation or examples.
- The patch does not contain tests.

I got the following when I did a brief test.

SELECT * FROM copy_srf(filename => 'ls', is_program => true) AS t(l text);
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.

Best,
David.
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
Skype: davidfetter XMPP: david(dot)fetter(at)gmail(dot)com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

Attachments:

copy_srf_function_002.difftext/plain; charset=us-asciiDownload+397-1
#15Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: David Fetter (#14)
Re: COPY as a set returning function

David Fetter wrote:

@@ -562,7 +563,6 @@ CopyGetData(CopyState cstate, void *databuf, int minread, int maxread)
errmsg("could not read from COPY file: %m")));
break;
case COPY_OLD_FE:
-
/*
* We cannot read more than minread bytes (which in practice is 1)
* because old protocol doesn't have any clear way of separating

This change is pointless as it'd be undone by pgindent.

+	/*
+	 * Function signature is:
+	 * copy_srf( filename text default null,
+	 *           is_program boolean default false,
+	 *           format text default 'text',
+	 *           delimiter text default E'\t' in text mode, ',' in csv mode,
+	 *           null_string text default '\N',
+	 *           header boolean default false,
+	 *           quote text default '"' in csv mode only,
+	 *           escape text default null, -- defaults to whatever quote is
+	 *           encoding text default null
+	 */

This comment would be mangled by pgindent -- please add an /*--- marker
to prevent it.

+	/* param 7: escape text default null, -- defaults to whatever quote is */
+	if (PG_ARGISNULL(7))
+	{
+		copy_state.escape = copy_state.quote;
+	}
+	else
+	{
+		if (copy_state.csv_mode)
+		{
+			copy_state.escape = TextDatumGetCString(PG_GETARG_TEXT_P(7));
+			if (strlen(copy_state.escape) != 1)
+			{
+				ereport(ERROR,
+						(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+						 errmsg("COPY escape must be a single one-byte character")));
+			}
+		}
+		else
+		{
+			ereport(ERROR,
+					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+					 errmsg("COPY escape available only in CSV mode")));
+		}
+	}

I don't understand why do we have all these checks. Can't we just pass
the values to COPY and let it apply the checks? That way, when COPY is
updated to support multibyte escape chars (for example) we don't need to
touch this code. Together with removing the unneeded braces that would
make these stanzas about six lines long instead of fifteen.

+		tuple = heap_form_tuple(tupdesc,values,nulls);
+		//tuple = BuildTupleFromCStrings(attinmeta, field_strings);
+		tuplestore_puttuple(tupstore, tuple);

No need to form a tuple; use tuplestore_putvalues here.

+	}
+
+	/* close "file" */
+	if (copy_state.is_program)
+	{
+		int         pclose_rc;
+
+		pclose_rc = ClosePipeStream(copy_state.copy_file);
+		if (pclose_rc == -1)
+			ereport(ERROR,
+					(errcode_for_file_access(),
+					 errmsg("could not close pipe to external command: %m")));
+		else if (pclose_rc != 0)
+			ereport(ERROR,
+					(errcode(ERRCODE_EXTERNAL_ROUTINE_EXCEPTION),
+					 errmsg("program \"%s\" failed",
+							copy_state.filename),
+					 errdetail_internal("%s", wait_result_to_str(pclose_rc))));
+	}
+	else
+	{
+		if (copy_state.filename != NULL && FreeFile(copy_state.copy_file))
+			ereport(ERROR,
+					(errcode_for_file_access(),
+					 errmsg("could not close file \"%s\": %m",
+							copy_state.filename)));
+	}

I wonder if these should be an auxiliary function in copy.c to do this.
Surely copy.c itself does pretty much the same thing ...

+DATA(insert OID = 3353 (  copy_srf PGNSP PGUID 12 1 0 0 0 f f f f f t v u 9 0 2249 "25 16 25 25 25 16 25 25 25" _null_ _null_ _null_ _null_ _null_ copy_srf _null_ _null_ _null_ ));
+DESCR("set-returning COPY proof of concept");

Why is this marked "proof of concept"? If this is a PoC only, why are
you submitting it as a patch?

--
�lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#16Corey Huinker
corey.huinker@gmail.com
In reply to: Alvaro Herrera (#15)
Re: COPY as a set returning function

On Wed, Jan 25, 2017 at 11:57 AM, Alvaro Herrera <alvherre@2ndquadrant.com>
wrote:

David Fetter wrote:

@@ -562,7 +563,6 @@ CopyGetData(CopyState cstate, void *databuf, int

minread, int maxread)

errmsg("could not read

from COPY file: %m")));

break;
case COPY_OLD_FE:
-
/*
* We cannot read more than minread bytes (which

in practice is 1)

* because old protocol doesn't have any clear way

of separating

This change is pointless as it'd be undone by pgindent.

+     /*
+      * Function signature is:
+      * copy_srf( filename text default null,
+      *           is_program boolean default false,
+      *           format text default 'text',
+      *           delimiter text default E'\t' in text mode, ',' in csv

mode,

+      *           null_string text default '\N',
+      *           header boolean default false,
+      *           quote text default '"' in csv mode only,
+      *           escape text default null, -- defaults to whatever

quote is

+      *           encoding text default null
+      */

This comment would be mangled by pgindent -- please add an /*--- marker
to prevent it.

+ /* param 7: escape text default null, -- defaults to whatever

quote is */

+     if (PG_ARGISNULL(7))
+     {
+             copy_state.escape = copy_state.quote;
+     }
+     else
+     {
+             if (copy_state.csv_mode)
+             {
+                     copy_state.escape = TextDatumGetCString(PG_GETARG_

TEXT_P(7));

+                     if (strlen(copy_state.escape) != 1)
+                     {
+                             ereport(ERROR,
+

(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),

+ errmsg("COPY escape must

be a single one-byte character")));

+                     }
+             }
+             else
+             {
+                     ereport(ERROR,
+                                     (errcode(ERRCODE_FEATURE_NOT_

SUPPORTED),

+ errmsg("COPY escape available

only in CSV mode")));

+ }
+ }

I don't understand why do we have all these checks. Can't we just pass
the values to COPY and let it apply the checks? That way, when COPY is
updated to support multibyte escape chars (for example) we don't need to
touch this code. Together with removing the unneeded braces that would
make these stanzas about six lines long instead of fifteen.

+             tuple = heap_form_tuple(tupdesc,values,nulls);
+             //tuple = BuildTupleFromCStrings(attinmeta,

field_strings);

+ tuplestore_puttuple(tupstore, tuple);

No need to form a tuple; use tuplestore_putvalues here.

+     }
+
+     /* close "file" */
+     if (copy_state.is_program)
+     {
+             int         pclose_rc;
+
+             pclose_rc = ClosePipeStream(copy_state.copy_file);
+             if (pclose_rc == -1)
+                     ereport(ERROR,
+                                     (errcode_for_file_access(),
+                                      errmsg("could not close pipe to

external command: %m")));

+             else if (pclose_rc != 0)
+                     ereport(ERROR,
+                                     (errcode(ERRCODE_EXTERNAL_

ROUTINE_EXCEPTION),

+                                      errmsg("program \"%s\" failed",
+

copy_state.filename),

+ errdetail_internal("%s",

wait_result_to_str(pclose_rc))));

+     }
+     else
+     {
+             if (copy_state.filename != NULL &&

FreeFile(copy_state.copy_file))

+                     ereport(ERROR,
+                                     (errcode_for_file_access(),
+                                      errmsg("could not close file

\"%s\": %m",

+

copy_state.filename)));

+ }

I wonder if these should be an auxiliary function in copy.c to do this.
Surely copy.c itself does pretty much the same thing ...

+DATA(insert OID = 3353 ( copy_srf PGNSP PGUID 12 1 0 0 0 f f f f f t v

u 9 0 2249 "25 16 25 25 25 16 25 25 25" _null_ _null_ _null_ _null_ _null_
copy_srf _null_ _null_ _null_ ));

+DESCR("set-returning COPY proof of concept");

Why is this marked "proof of concept"? If this is a PoC only, why are
you submitting it as a patch?

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Feel free to mark it returned as feedback. The concept didn't generate as
much enthusiasm as I had hoped, so I think the right thing to do now is
scale it back to a patch that makes CopyFromRawFields() externally visible
so that extensions can use them.

#17David Fetter
david@fetter.org
In reply to: Corey Huinker (#16)
Re: COPY as a set returning function

On Wed, Jan 25, 2017 at 12:23:23PM -0500, Corey Huinker wrote:

Feel free to mark it returned as feedback. The concept didn't
generate as much enthusiasm as I had hoped, so I think the right
thing to do now is scale it back to a patch that makes
CopyFromRawFields() externally visible so that extensions can use
them.

You're getting enthusiasm in the form of these reviews and suggestions
for improvement. That it doesn't always happen immediately is a
byproduct of the scarcity of developer time and the sheer volume of
things to which people need to pay attention.

Best,
David.
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
Skype: davidfetter XMPP: david(dot)fetter(at)gmail(dot)com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#18Corey Huinker
corey.huinker@gmail.com
In reply to: David Fetter (#17)
Re: COPY as a set returning function

On Wed, Jan 25, 2017 at 1:10 PM, David Fetter <david@fetter.org> wrote:

On Wed, Jan 25, 2017 at 12:23:23PM -0500, Corey Huinker wrote:

Feel free to mark it returned as feedback. The concept didn't
generate as much enthusiasm as I had hoped, so I think the right
thing to do now is scale it back to a patch that makes
CopyFromRawFields() externally visible so that extensions can use
them.

You're getting enthusiasm in the form of these reviews and suggestions
for improvement. That it doesn't always happen immediately is a
byproduct of the scarcity of developer time and the sheer volume of
things to which people need to pay attention.

True about that. I was referring to "ooh! I need that!"-type interest. I'll
proceed, keeping in mind that there's a fallback position of just making
some of the guts of COPY available to be called by extensions like was done
for file_fdw.

#19Corey Huinker
corey.huinker@gmail.com
In reply to: Alvaro Herrera (#15)
Re: COPY as a set returning function

+ /* param 7: escape text default null, -- defaults to whatever

quote is */

+     if (PG_ARGISNULL(7))
+     {
+             copy_state.escape = copy_state.quote;
+     }
+     else
+     {
+             if (copy_state.csv_mode)
+             {
+                     copy_state.escape = TextDatumGetCString(PG_GETARG_

TEXT_P(7));

+                     if (strlen(copy_state.escape) != 1)
+                     {
+                             ereport(ERROR,
+

(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),

+ errmsg("COPY escape must

be a single one-byte character")));

+                     }
+             }
+             else
+             {
+                     ereport(ERROR,
+                                     (errcode(ERRCODE_FEATURE_NOT_

SUPPORTED),

+ errmsg("COPY escape available

only in CSV mode")));

+ }
+ }

I don't understand why do we have all these checks. Can't we just pass
the values to COPY and let it apply the checks? That way, when COPY is
updated to support multibyte escape chars (for example) we don't need to
touch this code. Together with removing the unneeded braces that would
make these stanzas about six lines long instead of fifteen.

If I understand you correctly, COPY (via BeginCopyFrom) itself relies on
having a relation in pg_class to reference for attributes.
In this case, there is no such relation. So I'd have to fake a relcache
entry, or refactor BeginCopyFrom() to extract a ReturnSetInfo from the
Relation and pass that along to a new function BeginCopyFromReturnSet. I'm
happy to go that route if you think it's a good idea.

+             tuple = heap_form_tuple(tupdesc,values,nulls);
+             //tuple = BuildTupleFromCStrings(attinmeta,

field_strings);

+ tuplestore_puttuple(tupstore, tuple);

No need to form a tuple; use tuplestore_putvalues here.

Good to know!

+     }
+
+     /* close "file" */
+     if (copy_state.is_program)
+     {
+             int         pclose_rc;
+
+             pclose_rc = ClosePipeStream(copy_state.copy_file);
+             if (pclose_rc == -1)
+                     ereport(ERROR,
+                                     (errcode_for_file_access(),
+                                      errmsg("could not close pipe to

external command: %m")));

+             else if (pclose_rc != 0)
+                     ereport(ERROR,
+                                     (errcode(ERRCODE_EXTERNAL_

ROUTINE_EXCEPTION),

+                                      errmsg("program \"%s\" failed",
+

copy_state.filename),

+ errdetail_internal("%s",

wait_result_to_str(pclose_rc))));

+     }
+     else
+     {
+             if (copy_state.filename != NULL &&

FreeFile(copy_state.copy_file))

+                     ereport(ERROR,
+                                     (errcode_for_file_access(),
+                                      errmsg("could not close file

\"%s\": %m",

+

copy_state.filename)));

+ }

I wonder if these should be an auxiliary function in copy.c to do this.
Surely copy.c itself does pretty much the same thing ...

Yes. This got started as a patch to core because not all of the parts of
COPY are externally callable, and aren't broken down in a way that allowed
for use in a SRF.

I'll get to work on these suggestions.

#20Corey Huinker
corey.huinker@gmail.com
In reply to: Corey Huinker (#19)
Re: COPY as a set returning function

I don't understand why do we have all these checks. Can't we just pass
the values to COPY and let it apply the checks? That way, when COPY is
updated to support multibyte escape chars (for example) we don't need to
touch this code. Together with removing the unneeded braces that would
make these stanzas about six lines long instead of fifteen.

If I understand you correctly, COPY (via BeginCopyFrom) itself relies on
having a relation in pg_class to reference for attributes.
In this case, there is no such relation. So I'd have to fake a relcache
entry, or refactor BeginCopyFrom() to extract a ReturnSetInfo from the
Relation and pass that along to a new function BeginCopyFromReturnSet. I'm
happy to go that route if you think it's a good idea.

+             tuple = heap_form_tuple(tupdesc,values,nulls);
+             //tuple = BuildTupleFromCStrings(attinmeta,

field_strings);

+ tuplestore_puttuple(tupstore, tuple);

No need to form a tuple; use tuplestore_putvalues here.

Good to know!

I wonder if these should be an auxiliary function in copy.c to do this.
Surely copy.c itself does pretty much the same thing ...

Yes. This got started as a patch to core because not all of the parts of
COPY are externally callable, and aren't broken down in a way that allowed
for use in a SRF.

I'll get to work on these suggestions.

I've put in some more work on this patch, mostly just taking Alvaro's
suggestions, which resulted in big code savings.

I had to add a TupleDesc parameter to BeginCopy() and BeginCopyFrom(). This
seemed the easiest way to leverage the existing tested code (and indeed, it
worked nearly out-of-the-box). The only drawback is that a minor change
will have to be made to the BeginCopyFrom() call in file_fdw.c, and any
other extensions that leverage COPY. We could make compatibility functions
that take the original signature and pass it along to the corresponding
function with rsTupDesc set to NULL.

Some issues:
- I'm still not sure if the direction we want to go is a set returning
function, or a change in grammar that lets us use COPY as a CTE or similar.
- This function will have the same difficulties as adding the program
option did to file_fdw: there's very little we can reference that isn't
os/environment specific
- Inline (STDIN) prompts the user for input, but gives the error: server
sent data ("D" message) without prior row description ("T" message). I
looked for a place where the Relation was consulted for the row
description, but I'm not finding it.

I can continue to flesh this out with documentation and test cases if there
is consensus that this is the way to go.

# select * from copy_srf('echo "x\ty"',true) as t(x text, y text);
x | y
---+---
x | y
(1 row)

Time: 1.074 ms
# select * from copy_srf('echo "x\t4"',true) as t(x text, y integer);
x | y
---+---
x | 4
(1 row)

Time: 1.095 ms
# select * from copy_srf(null) as t(x text, y integer);
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself.

a 4
b 5
\.

server sent data ("D" message) without prior row description ("T" message)

Attachments:

copy_srf_function_003.difftext/plain; charset=US-ASCII; name=copy_srf_function_003.diffDownload+217-11
#21David Fetter
david@fetter.org
In reply to: Corey Huinker (#20)
#22Peter Eisentraut
peter_e@gmx.net
In reply to: Corey Huinker (#20)
#23Peter Eisentraut
peter_e@gmx.net
In reply to: David Fetter (#21)
#24Corey Huinker
corey.huinker@gmail.com
In reply to: Peter Eisentraut (#22)
#25Michael Paquier
michael@paquier.xyz
In reply to: Corey Huinker (#24)
#26Corey Huinker
corey.huinker@gmail.com
In reply to: Michael Paquier (#25)