COPY IN as SELECT target

Started by Andrew Dunstanabout 16 years ago16 messages
#1Andrew Dunstan
andrew@dunslane.net

Recently there was discussion about allowing a COPY statement to be a
SELECT target, returning a text array, although the syntax wasn't really
nailed down that I recall. I was thinking that we might have

COPY RETURNING ARRAY FROM ...

instead of

COPY tablename opt_column_list FROM ...

the we possibly could do things like:

SELECT t[5] as a, 3*(t[3]::numeric) as b FROM (COPY RETURNING ARRAY
FROM STDIN CSV) as t;

Thoughts?

cheers

andrew

#2Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Andrew Dunstan (#1)
Re: COPY IN as SELECT target

Andrew Dunstan wrote:

Recently there was discussion about allowing a COPY statement to be a
SELECT target, returning a text array, although the syntax wasn't really
nailed down that I recall. I was thinking that we might have

COPY RETURNING ARRAY FROM ...

instead of

COPY tablename opt_column_list FROM ...

It's not really returning an array, is it? It's returning a bag of rows
like a (sub)query.

the we possibly could do things like:

SELECT t[5] as a, 3*(t[3]::numeric) as b FROM (COPY RETURNING ARRAY
FROM STDIN CSV) as t;

How about just COPY FROM? As in

SELECT t[5] as a, 3*(t[3]::numeric) as b FROM (COPY FROM STDIN CSV) as t

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

#3Robert Haas
robertmhaas@gmail.com
In reply to: Heikki Linnakangas (#2)
Re: COPY IN as SELECT target

On Thu, Dec 17, 2009 at 12:23 PM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:

How about just COPY FROM? As in

SELECT t[5] as a, 3*(t[3]::numeric) as b FROM (COPY FROM STDIN CSV) as t

I had the same thought. Though it would also be nice to allow something like:

COPY (type1, type2, type3, type4) FROM STDIN CSV

...which is obviously going to create a horrible parser problem if you
actually tried to use that syntax.

...Robert

#4David Fetter
david@fetter.org
In reply to: Robert Haas (#3)
Re: COPY IN as SELECT target

On Thu, Dec 17, 2009 at 12:28:50PM -0500, Robert Haas wrote:

On Thu, Dec 17, 2009 at 12:23 PM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:

How about just COPY FROM? As in

SELECT t[5] as a, 3*(t[3]::numeric) as b FROM (COPY FROM STDIN
CSV) as t

I had the same thought. Though it would also be nice to allow
something like:

COPY (type1, type2, type3, type4) FROM STDIN CSV

...which is obviously going to create a horrible parser problem if
you actually tried to use that syntax.

How about using the CTE syntax?

Cheers,
David.
--
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
Skype: davidfetter XMPP: david.fetter@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

#5Robert Haas
robertmhaas@gmail.com
In reply to: David Fetter (#4)
Re: COPY IN as SELECT target

On Thu, Dec 17, 2009 at 12:38 PM, David Fetter <david@fetter.org> wrote:

On Thu, Dec 17, 2009 at 12:28:50PM -0500, Robert Haas wrote:

On Thu, Dec 17, 2009 at 12:23 PM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:

How about just COPY FROM? As in

SELECT t[5] as a, 3*(t[3]::numeric) as b FROM (COPY FROM STDIN
CSV) as t

I had the same thought.  Though it would also be nice to allow
something like:

COPY (type1, type2, type3, type4) FROM STDIN CSV

...which is obviously going to create a horrible parser problem if
you actually tried to use that syntax.

How about using the CTE syntax?

I'm not sure what you're suggesting exactly, but the problem with the
syntax I suggested is that COPY (...) TO <whatever> expects the "..."
part to be a subselect. You can't make COPY (...) FROM have something
in there other than a subselect, because the parser can't fast-forward
and look at the word FROM and then go back and decide how to parse the
parenthesized stuff. That's almost magic in the general case. You'd
have to stick a keyword in there before the opening parentheses.

...Robert

#6Tom Lane
tgl@sss.pgh.pa.us
In reply to: Heikki Linnakangas (#2)
Re: COPY IN as SELECT target

Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:

Andrew Dunstan wrote:

COPY RETURNING ARRAY FROM ...

It's not really returning an array, is it? It's returning a bag of rows
like a (sub)query.

How about just COPY FROM?

The problem with COPY FROM is that it hard-wires a decision that there
is one and only one possible result format, which I think we pretty
much proved already is the wrong thing. I'm not thrilled with "RETURNING
ARRAY" either, but we need to leave ourselves wiggle room to have more
than one result format from the same source file.

regards, tom lane

#7Andrew Dunstan
andrew@dunslane.net
In reply to: Tom Lane (#6)
Re: COPY IN as SELECT target

Tom Lane wrote:

Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:

Andrew Dunstan wrote:

COPY RETURNING ARRAY FROM ...

It's not really returning an array, is it? It's returning a bag of rows
like a (sub)query.

How about just COPY FROM?

The problem with COPY FROM is that it hard-wires a decision that there
is one and only one possible result format, which I think we pretty
much proved already is the wrong thing. I'm not thrilled with "RETURNING
ARRAY" either, but we need to leave ourselves wiggle room to have more
than one result format from the same source file.

Well, we could have "RETURNING type-expression" with "text[]" supported
for the first iteration.

In answer to Heiki's argument, what I wanted was exactly to return an
array of text for each row. Whatever we have needs to be able to handle
to possibility of ragged input (see previous discussion) so we can't tie
it down too tightly.

cheers

andrew

#8Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andrew Dunstan (#7)
Re: COPY IN as SELECT target

Andrew Dunstan <andrew@dunslane.net> writes:

Tom Lane wrote:

The problem with COPY FROM is that it hard-wires a decision that there
is one and only one possible result format, which I think we pretty
much proved already is the wrong thing. I'm not thrilled with "RETURNING
ARRAY" either, but we need to leave ourselves wiggle room to have more
than one result format from the same source file.

Well, we could have "RETURNING type-expression" with "text[]" supported
for the first iteration.

In answer to Heiki's argument, what I wanted was exactly to return an
array of text for each row. Whatever we have needs to be able to handle
to possibility of ragged input (see previous discussion) so we can't tie
it down too tightly.

I think that there are two likely possibilities for the result format:

* "Raw" data after just the de-escaping and column separation steps.
Array of text is probably the right thing here, at least for a text COPY
(doesn't seem to cover the binary case though).

* The data converted to some specified row type.

"RETURNING type-expression" is probably not good since it looks more
like the second case than the first --- and in fact it could be outright
ambiguous, what if your data actually is one column that is a text
array?

If we're willing to assume these are the *only* possibilities then we
could use "COPY FROM ..." for the first and "COPY RETURNING type-list
FROM ..." for the second. I'm a bit uncomfortable with that assumption
though; it seems likely that we'll want to shoehorn in some more
alternatives later. (Like, what about the binary case?)

regards, tom lane

#9Josh Berkus
josh@agliodbs.com
In reply to: Andrew Dunstan (#7)
Re: COPY IN as SELECT target

In answer to Heiki's argument, what I wanted was exactly to return an
array of text for each row. Whatever we have needs to be able to handle
to possibility of ragged input (see previous discussion) so we can't tie
it down too tightly.

I would have *lots* of use for this feature.

Mind you, returning (arbitrary expression) would be even better, but if
we can get returning TEXT[] for 8.5, I think it's worth doing on its own.

--Josh Berkus

#10Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#8)
Re: COPY IN as SELECT target

On Thu, Dec 17, 2009 at 1:37 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Andrew Dunstan <andrew@dunslane.net> writes:

Tom Lane wrote:

The problem with COPY FROM is that it hard-wires a decision that there
is one and only one possible result format, which I think we pretty
much proved already is the wrong thing.  I'm not thrilled with "RETURNING
ARRAY" either, but we need to leave ourselves wiggle room to have more
than one result format from the same source file.

Well, we could have "RETURNING type-expression" with  "text[]" supported
for the first iteration.

In answer to Heiki's argument, what I wanted was exactly to return an
array of text for each row. Whatever we have needs to be able to handle
to possibility of ragged input (see previous discussion) so we can't tie
it down too tightly.

I think that there are two likely possibilities for the result format:

* "Raw" data after just the de-escaping and column separation steps.
Array of text is probably the right thing here, at least for a text COPY
(doesn't seem to cover the binary case though).

* The data converted to some specified row type.

Agreed.

"RETURNING type-expression" is probably not good since it looks more
like the second case than the first --- and in fact it could be outright
ambiguous, what if your data actually is one column that is a text
array?

If we're willing to assume these are the *only* possibilities then we
could use "COPY FROM ..." for the first and "COPY RETURNING type-list
FROM ..." for the second.  I'm a bit uncomfortable with that assumption
though; it seems likely that we'll want to shoehorn in some more
alternatives later.  (Like, what about the binary case?)

You might want to specify column names as well as well as types, in
this second case.

...Robert

#11Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#10)
Re: COPY IN as SELECT target

Robert Haas <robertmhaas@gmail.com> writes:

You might want to specify column names as well as well as types, in
this second case.

Well, we could do it like VALUES: arbitrarily name the columns column1
... columnN and tell people to use an alias if they want other names.
If it's convenient to fit column names into the syntax, good, but we
don't absolutely have to.

[ thinks... ] Although actually the obvious SQL-ish syntax for a rowtype
specification is

( colname typename [ , ... ] )

so that's probably what we'd want to do in the processed-data case.
Not sure about the raw-data case --- maybe a predetermined name is
okay there.

regards, tom lane

#12Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#11)
Re: COPY IN as SELECT target

On Thu, Dec 17, 2009 at 1:50 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Robert Haas <robertmhaas@gmail.com> writes:

You might want to specify column names as well as well as types, in
this second case.

Well, we could do it like VALUES: arbitrarily name the columns column1
... columnN and tell people to use an alias if they want other names.
If it's convenient to fit column names into the syntax, good, but we
don't absolutely have to.

[ thinks... ] Although actually the obvious SQL-ish syntax for a rowtype
specification is

       ( colname typename [ , ... ] )

so that's probably what we'd want to do in the processed-data case.

Yeah, I think that's good.

Not sure about the raw-data case --- maybe a predetermined name is
okay there.

I would expect so.

...Robert

#13Dimitri Fontaine
dfontaine@hi-media.com
In reply to: Josh Berkus (#9)
Re: COPY IN as SELECT target

Hi,

Le 17 déc. 2009 à 19:39, Josh Berkus a écrit :

Mind you, returning (arbitrary expression) would be even better, but if
we can get returning TEXT[] for 8.5, I think it's worth doing on its own.

Well, you already have it as soon as you have text[]:

INSERT INTO destination
SELECT row[0], row[1], myfunction(row[0], row[1]), row[2]::int + 1
FROM (COPY RETURNING text[] FROM '/path/to/file.cvs' CVS HEADER) as file(row);

Of course as Andrew said already what it needs that the syntax here does not cover is ragged file processing, that is accepting file content when all the rows will not have the same number of columns.

But if you have ragged input reading and COPY as a relation in a query, then you're able to apply any expression you want to in the query itself. Such as transforming the input slightly in order to conform to PostgreSQL datatype input syntaxes, e.g.

Regards,
--
dim

Let's deprecate pgloader.

#14Pavel Stehule
pavel.stehule@gmail.com
In reply to: Andrew Dunstan (#1)
Re: COPY IN as SELECT target

2009/12/17 Andrew Dunstan <andrew@dunslane.net>:

Recently there was discussion about allowing a COPY statement to be a SELECT
target, returning a text array, although the syntax wasn't really nailed
down that I recall. I was thinking that  we might have

  COPY RETURNING ARRAY FROM ...

instead of

  COPY tablename opt_column_list FROM ...

the we possibly could do things like:

  SELECT t[5] as a, 3*(t[3]::numeric) as b FROM (COPY RETURNING ARRAY FROM
STDIN CSV) as t;

Thoughts?

In this case copy doesn't return array - so RETURNING ARRAY is little
bit strange.

what

SELECT .. FROM (COPY VALUES [(colums)] FROM ....)

Regards
Pavel

Show quoted text

cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#15Andrew Dunstan
andrew@dunslane.net
In reply to: Pavel Stehule (#14)
Re: COPY IN as SELECT target

Pavel Stehule wrote:

2009/12/17 Andrew Dunstan <andrew@dunslane.net>:

Recently there was discussion about allowing a COPY statement to be a SELECT
target, returning a text array, although the syntax wasn't really nailed
down that I recall. I was thinking that we might have

COPY RETURNING ARRAY FROM ...

instead of

COPY tablename opt_column_list FROM ...

the we possibly could do things like:

SELECT t[5] as a, 3*(t[3]::numeric) as b FROM (COPY RETURNING ARRAY FROM
STDIN CSV) as t;

Thoughts?

In this case copy doesn't return array - so RETURNING ARRAY is little
bit strange.

what

SELECT .. FROM (COPY VALUES [(colums)] FROM ....)

You are misunderstanding what I want to provide, which is that it *does*
return an array of text for each line. That was what the previous
discussion arrived at, and is illustrated in the example I showed above.

cheers

andrew

#16Andrew Dunstan
andrew@dunslane.net
In reply to: Tom Lane (#8)
Re: COPY IN as SELECT target

Tom Lane wrote:

I think that there are two likely possibilities for the result format:

* "Raw" data after just the de-escaping and column separation steps.
Array of text is probably the right thing here, at least for a text COPY
(doesn't seem to cover the binary case though).

* The data converted to some specified row type.

"RETURNING type-expression" is probably not good since it looks more
like the second case than the first --- and in fact it could be outright
ambiguous, what if your data actually is one column that is a text
array?

If we're willing to assume these are the *only* possibilities then we
could use "COPY FROM ..." for the first and "COPY RETURNING type-list
FROM ..." for the second. I'm a bit uncomfortable with that assumption
though; it seems likely that we'll want to shoehorn in some more
alternatives later. (Like, what about the binary case?)

Yeah. I think we need an explicit marker. The first of these cases is
the one I'm particularly interested in. I think you could actually get
the second from the first with a little more work anyway, but the "raw"
input as an array lets me get the things I can't easily get another way.

I think we're going to need some marker such as parentheses to
distinguish the second case. In that case, "RETURNING text[]" could be
the first case and "RETURNING (text[])" could be the second, but maybe
that's a bit too subtle. How about "RETURNING TYPE (type_list)" for the
second case?

cheers

andrew