COPY FROM WHEN condition

Started by Surafel Temesgenover 7 years ago90 messageshackers
Jump to latest
#1Surafel Temesgen
surafel3000@gmail.com

Hello,

Currently we can not moves data from a file to a table based on some
condition on a certain column but I think there are many use case for it
that worth supporting. Attache is a patch for escaping a row that does not
satisfy WHEN condition from inserting into a table and its work on the top
of commit b68ff3ea672c06

and the syntax is

COPY table_name [ ( column_name [, ...] ) ]

FROM { 'filename' | PROGRAM 'command' | STDIN }

[ [ WITH ] ( option [, ...] ) ]

[ WHEN condition ]

comment ?

Regards

Surafel

Attachments:

copy_from_when_con_v1.patchtext/x-patch; charset=US-ASCII; name=copy_from_when_con_v1.patchDownload+66-2
#2Christoph Moench-Tegeder
cmt@burggraben.net
In reply to: Surafel Temesgen (#1)
Re: COPY FROM WHEN condition

## Surafel Temesgen (surafel3000@gmail.com):

Currently we can not moves data from a file to a table based on some
condition on a certain column

You can:
COPY ( query ) TO 'filename';

There's even an example in the documentation:
https://www.postgresql.org/docs/10/static/sql-copy.html
"To copy into a file just the countries whose names start with 'A':
COPY (SELECT * FROM country WHERE country_name LIKE 'A%') TO '/usr1/proj/bray/sql/a_list_countries.copy';"

Regards,
Christoph

--
Spare Space.

#3Surafel Temesgen
surafel3000@gmail.com
In reply to: Christoph Moench-Tegeder (#2)
Re: COPY FROM WHEN condition

On Thu, Oct 11, 2018 at 12:00 PM Christoph Moench-Tegeder <
cmt@burggraben.net> wrote:

You can:
COPY ( query ) TO 'filename';

it is for COPY FROM

regards
Surafel

#4Corey Huinker
corey.huinker@gmail.com
In reply to: Surafel Temesgen (#3)
Re: COPY FROM WHEN condition

On Thu, Oct 11, 2018 at 5:04 AM Surafel Temesgen <surafel3000@gmail.com>
wrote:

On Thu, Oct 11, 2018 at 12:00 PM Christoph Moench-Tegeder <
cmt@burggraben.net> wrote:

You can:
COPY ( query ) TO 'filename';

it is for COPY FROM

regards
Surafel

It didn't get far, but you may want to take a look at a rejected patch for
copy_srf() (set returning function)
/messages/by-id/CADkLM=doeiWQX4AGtDNG4PsWfSXz3ai7kY=PZm3sUhsUeev9Bg@mail.gmail.com
https://commitfest.postgresql.org/12/869/

Having a set returning function gives you the full expressiveness of SQL,
at the cost of an extra materialization step.

#5David Fetter
david@fetter.org
In reply to: Corey Huinker (#4)
Re: COPY FROM WHEN condition

On Thu, Oct 11, 2018 at 05:12:48AM -0400, Corey Huinker wrote:

On Thu, Oct 11, 2018 at 5:04 AM Surafel Temesgen <surafel3000@gmail.com>
wrote:

On Thu, Oct 11, 2018 at 12:00 PM Christoph Moench-Tegeder <
cmt@burggraben.net> wrote:

You can:
COPY ( query ) TO 'filename';

it is for COPY FROM

regards
Surafel

It didn't get far, but you may want to take a look at a rejected patch for
copy_srf() (set returning function)
/messages/by-id/CADkLM=doeiWQX4AGtDNG4PsWfSXz3ai7kY=PZm3sUhsUeev9Bg@mail.gmail.com
https://commitfest.postgresql.org/12/869/

Having a set returning function gives you the full expressiveness of SQL,
at the cost of an extra materialization step.

I wonder whether something JIT-like could elide this. A very
interesting subset of such WHEN clauses could be pretty
straight-forward to implement in a pretty efficient way.

Best,
David.
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

#6myungkyu.lim
myungkyu.lim@samsung.com
In reply to: David Fetter (#5)
RE: COPY FROM WHEN condition

Hello,

I want test this patch, but can't apply it.

(master)$ git apply copy_from_when_con_v1.patch
error: patch failed: src/backend/commands/copy.c:849
error: src/backend/commands/copy.c: patch does not apply

fix or another way, let me know.

Best regards,
Myungkyu, Lim

#7Surafel Temesgen
surafel3000@gmail.com
In reply to: myungkyu.lim (#6)
Re: COPY FROM WHEN condition

On Fri, Oct 26, 2018 at 9:09 AM 임명규 <myungkyu.lim@samsung.com> wrote:

Hello,

I want test this patch, but can't apply it.

(master)$ git apply copy_from_when_con_v1.patch

i use patch and it work

try (master)$ patch -p1 < copy_from_when_con_v1.patch

regards

Surafel

#8Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Surafel Temesgen (#1)
Re: COPY FROM WHEN condition

Hi,

I've taken a quick look at this on the way back from pgconf.eu, and it
seems like a nice COPY improvement in a fairly good shape.

Firstly, I think it's a valuable because it allows efficiently importing
a subset of data. Currently, we either have to create an intermediate
table, copy all the data into that, do the filtering, and then insert
the subset into the actual target table. Another common approach that I
see in practice is using file_fdw to do this, but it's not particularly
straight-forward (having to create the FDW servers etc. first) not
efficient (particularly for large amounts of data). This feature allows
skipping this extra step (at least in simpler ETL cases).

So, I like the idea and I think it makes sense.

A couple of comments regarding the code/docs:

1) I think this deserves at least some regression tests. Plenty of tests
already use COPY, but there's no coverage for the new piece. So let's
add a new test suite, or maybe add a couple of tests into copy2.sql.

2) In copy.sqml, the new item is defined like this

<term><literal>WHEN Clause</literal></term>

I suggest we use just <term><literal>WHEN</literal></term>, that's what
the other items do (see ENCODING).

The other thing is that this does not say what expressions are allowed
in the WHEN clause. It seems pretty close to WHEN clause for triggers,
which e.g. mentions that subselects are not allowed. I'm pretty sure
that's true here too, because it fails like this (118 = T_SubLink):

test=# copy t(a,b,c) from '/tmp/t.data' when ((select 1) < 10);
ERROR: unrecognized node type: 118

So, the patch needs to detect this, produce a reasonable error message
and document the limitations in copy.sqml, just like we do for CREATE
TRIGGER.

3) For COPY TO, the WHEN clause is accepted but ignored, leading to
confusing cases like this:

test=# copy t(a,b,c) to '/tmp/t.data' when ((select 100) < 10);
COPY 151690

So, it contains subselect, but unlike COPY FROM it does not fail
(because we never execute it). The fun part is that the expression is
logically false, so a user might expect it to filter rows, yet we copy
everything.

IMHO we need to either error-out in these cases, complaining about WHEN
not being supported for COPY TO, or make it work (effectively treating
it as a simpler alternative to COPY (subselect) TO).

4) There are some minor code style issues in copy.c - the variable is
misnamed as when_cluase, there are no spaces after 'if' etc. See the
attached patch fixing this.

AFAICS we could just get rid of the extra when_cluase variable and mess
with the cstate->whenClause directly, depending on how (3) gets fixed.

5) As I mentioned, the CREATE TRIGGER already has WHEN clause, but it
requires it to be 'WHEN (expr)'. I suggest we do the same thing here,
requiring the parentheses.

6) The skip logic in CopyFrom() seems to be slightly wrong. It does
work, but the next_record label is defined after CHECK_FOR_INTERRUPTS()
so a COPY will not respond to Ctrl-C unless it finds a row matching the
WHEN condition. If you have a highly selective condition, that's a bit
inconvenient.

It also skips

MemoryContextSwitchTo(GetPerTupleMemoryContext(estate));

so I wonder what the heap_form_tuple() right after the next_record label
will use for tuples right after a skipped one. I'd bet it'll use the
oldcontext (essentially the long-lived context), essentially making it
a memory leak.

So I suggest to get rid of the next_record label, and use 'continue'
instead of the 'goto next_record'.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachments:

0002-review.patchtext/x-patch; name=0002-review.patchDownload+14-15
0001-rebased-patch.patchtext/x-patch; name=0001-rebased-patch.patchDownload+66-3
#9myungkyu.lim
myungkyu.lim@samsung.com
In reply to: Tomas Vondra (#8)
RE: COPY FROM WHEN condition

Hello,

Basically, this patch worked very well in my tests.

3) For COPY TO, the WHEN clause is accepted but ignored, leading to confusing cases like this:

I found same issue.

postgres=# copy t1 to '/home/lmk/t1.data' when c1 < 1;

In the 'COPY TO' statement, 'WHEN clause' does not do any extra work.
(include error or hint)

4) There are some minor code style issues in copy.c - the variable is misnamed as when_cluase, there are no spaces after 'if' etc. See the attached patch fixing this.

It is recommended to use 'pg tool' when you finish development.

src/tools/pgindent/pgindent

pgindent is used to fix the source code style to conform to pg standards.

Best regards,
Myungkyu, Lim

#10Surafel Temesgen
surafel3000@gmail.com
In reply to: Tomas Vondra (#8)
Re: COPY FROM WHEN condition

Hi,
Thank you for looking at it .
On Sun, Oct 28, 2018 at 7:19 PM Tomas Vondra <tomas.vondra@2ndquadrant.com>
wrote:

1) I think this deserves at least some regression tests. Plenty of tests
already use COPY, but there's no coverage for the new piece. So let's
add a new test suite, or maybe add a couple of tests into copy2.sql.

2) In copy.sqml, the new item is defined like this

<term><literal>WHEN Clause</literal></term>

I suggest we use just <term><literal>WHEN</literal></term>, that's what
the other items do (see ENCODING).

The other thing is that this does not say what expressions are allowed
in the WHEN clause. It seems pretty close to WHEN clause for triggers,
which e.g. mentions that subselects are not allowed. I'm pretty sure
that's true here too, because it fails like this (118 = T_SubLink):

test=# copy t(a,b,c) from '/tmp/t.data' when ((select 1) < 10);
ERROR: unrecognized node type: 118

So, the patch needs to detect this, produce a reasonable error message
and document the limitations in copy.sqml, just like we do for CREATE
TRIGGER.

fixed

3) For COPY TO, the WHEN clause is accepted but ignored, leading to
confusing cases like this:

test=# copy t(a,b,c) to '/tmp/t.data' when ((select 100) < 10);
COPY 151690

So, it contains subselect, but unlike COPY FROM it does not fail
(because we never execute it). The fun part is that the expression is
logically false, so a user might expect it to filter rows, yet we copy
everything.

IMHO we need to either error-out in these cases, complaining about WHEN
not being supported for COPY TO, or make it work (effectively treating
it as a simpler alternative to COPY (subselect) TO).

English is not my first language but I chose error-out because WHEN
condition for COPY TO seems to me semantically incorrect

AFAICS we could just get rid of the extra when_cluase variable and mess
with the cstate->whenClause directly, depending on how (3) gets fixed.

I did it this way because CopyState structure memory allocate and
initialize in BeginCopyFrom but the analysis done before it

Show quoted text

5) As I mentioned, the CREATE TRIGGER already has WHEN clause, but it
requires it to be 'WHEN (expr)'. I suggest we do the same thing here,
requiring the parentheses.

6) The skip logic in CopyFrom() seems to be slightly wrong. It does
work, but the next_record label is defined after CHECK_FOR_INTERRUPTS()
so a COPY will not respond to Ctrl-C unless it finds a row matching the
WHEN condition. If you have a highly selective condition, that's a bit
inconvenient.

It also skips

MemoryContextSwitchTo(GetPerTupleMemoryContext(estate));

so I wonder what the heap_form_tuple() right after the next_record label
will use for tuples right after a skipped one. I'd bet it'll use the
oldcontext (essentially the long-lived context), essentially making it
a memory leak.

So I suggest to get rid of the next_record label, and use 'continue'
instead of the 'goto next_record'.

fixed

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachments:

copy_from_when_con_v2.patchtext/x-patch; charset=US-ASCII; name=copy_from_when_con_v2.patchDownload+126-4
#11Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Surafel Temesgen (#10)
Re: COPY FROM WHEN condition

On Tue, Oct 30, 2018 at 11:47 PM Surafel Temesgen <surafel3000@gmail.com> wrote:

Hi,
Thank you for looking at it .
On Sun, Oct 28, 2018 at 7:19 PM Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:

1) I think this deserves at least some regression tests. Plenty of tests
already use COPY, but there's no coverage for the new piece. So let's
add a new test suite, or maybe add a couple of tests into copy2.sql.

2) In copy.sqml, the new item is defined like this

<term><literal>WHEN Clause</literal></term>

I suggest we use just <term><literal>WHEN</literal></term>, that's what
the other items do (see ENCODING).

The other thing is that this does not say what expressions are allowed
in the WHEN clause. It seems pretty close to WHEN clause for triggers,
which e.g. mentions that subselects are not allowed. I'm pretty sure
that's true here too, because it fails like this (118 = T_SubLink):

test=# copy t(a,b,c) from '/tmp/t.data' when ((select 1) < 10);
ERROR: unrecognized node type: 118

So, the patch needs to detect this, produce a reasonable error message
and document the limitations in copy.sqml, just like we do for CREATE
TRIGGER.

fixed

3) For COPY TO, the WHEN clause is accepted but ignored, leading to
confusing cases like this:

test=# copy t(a,b,c) to '/tmp/t.data' when ((select 100) < 10);
COPY 151690

So, it contains subselect, but unlike COPY FROM it does not fail
(because we never execute it). The fun part is that the expression is
logically false, so a user might expect it to filter rows, yet we copy
everything.

IMHO we need to either error-out in these cases, complaining about WHEN
not being supported for COPY TO, or make it work (effectively treating
it as a simpler alternative to COPY (subselect) TO).

English is not my first language but I chose error-out because WHEN condition for COPY TO seems to me semantically incorrect

AFAICS we could just get rid of the extra when_cluase variable and mess
with the cstate->whenClause directly, depending on how (3) gets fixed.

I did it this way because CopyState structure memory allocate and initialize in BeginCopyFrom but the analysis done before it

5) As I mentioned, the CREATE TRIGGER already has WHEN clause, but it
requires it to be 'WHEN (expr)'. I suggest we do the same thing here,
requiring the parentheses.

6) The skip logic in CopyFrom() seems to be slightly wrong. It does
work, but the next_record label is defined after CHECK_FOR_INTERRUPTS()
so a COPY will not respond to Ctrl-C unless it finds a row matching the
WHEN condition. If you have a highly selective condition, that's a bit
inconvenient.

It also skips

MemoryContextSwitchTo(GetPerTupleMemoryContext(estate));

so I wonder what the heap_form_tuple() right after the next_record label
will use for tuples right after a skipped one. I'd bet it'll use the
oldcontext (essentially the long-lived context), essentially making it
a memory leak.

So I suggest to get rid of the next_record label, and use 'continue'
instead of the 'goto next_record'.

fixed

I've looked at this patch and tested.

When I use a function returning string in WHEN clause I got the following error:

=# copy test from '/tmp/aaa.csv' (format 'csv') when (lower(t) = 'hello');
ERROR: could not determine which collation to use for lower() function
HINT: Use the COLLATE clause to set the collation explicitly.
CONTEXT: COPY hoge, line 1: "1,hoge,2018-01-01"

And then although I specified COLLATE I got an another error (127 =
T_CollateExpr):

=# copy test from '/tmp/aaa.csv' (format 'csv') when (lower(t) collate
"en_US" = 'hello');
ERROR: unrecognized node type: 127

This error doesn't happen if I put the similar condition in WHEN
clause for triggers. I think the patch needs to produce a reasonable
error message.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

#12Jim Nasby
Jim.Nasby@BlueTreble.com
In reply to: David Fetter (#5)
Re: COPY FROM WHEN condition

On Oct 11, 2018, at 10:35 AM, David Fetter <david@fetter.org> wrote:

It didn't get far, but you may want to take a look at a rejected patch for
copy_srf() (set returning function)
/messages/by-id/CADkLM=doeiWQX4AGtDNG4PsWfSXz3ai7kY=PZm3sUhsUeev9Bg@mail.gmail.com
https://commitfest.postgresql.org/12/869/

Having a set returning function gives you the full expressiveness of SQL,
at the cost of an extra materialization step.

I wonder whether something JIT-like could elide this. A very
interesting subset of such WHEN clauses could be pretty
straight-forward to implement in a pretty efficient way.

Are you thinking something like having a COPY command that provides results in such a way that they could be referenced in a FROM clause (perhaps a COPY that defines a cursor…)?

#13David Fetter
david@fetter.org
In reply to: Jim Nasby (#12)
Re: COPY FROM WHEN condition

On Wed, Oct 31, 2018 at 11:21:33PM +0000, Nasby, Jim wrote:

On Oct 11, 2018, at 10:35 AM, David Fetter <david@fetter.org> wrote:

It didn't get far, but you may want to take a look at a rejected patch for
copy_srf() (set returning function)
/messages/by-id/CADkLM=doeiWQX4AGtDNG4PsWfSXz3ai7kY=PZm3sUhsUeev9Bg@mail.gmail.com
https://commitfest.postgresql.org/12/869/

Having a set returning function gives you the full expressiveness of SQL,
at the cost of an extra materialization step.

I wonder whether something JIT-like could elide this. A very
interesting subset of such WHEN clauses could be pretty
straight-forward to implement in a pretty efficient way.

Are you thinking something like having a COPY command that provides
results in such a way that they could be referenced in a FROM clause
(perhaps a COPY that defines a cursor…)?

That would also be nice, but what I was thinking of was that some
highly restricted subset of cases of SQL in general could lend
themselves to levels of optimization that would be impractical in
other contexts.

Best,
David.
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

#14Corey Huinker
corey.huinker@gmail.com
In reply to: David Fetter (#13)
Re: COPY FROM WHEN condition

Are you thinking something like having a COPY command that provides
results in such a way that they could be referenced in a FROM clause
(perhaps a COPY that defines a cursor…)?

That would also be nice, but what I was thinking of was that some
highly restricted subset of cases of SQL in general could lend
themselves to levels of optimization that would be impractical in
other contexts.

If COPY (or a syntactical equivalent) can return a result set, then the
whole of SQL is available to filter and aggregate the results and we don't
have to invent new syntax, or endure confusion whenCOPY-WHEN syntax behaves
subtly different from a similar FROM-WHERE.

Also, what would we be saving computationally? The whole file (or program
output) has to be consumed no matter what, the columns have to be parsed no
matter what. At least some of the columns have to be converted to their
assigned datatypes enough to know whether or not to filter the row, but we
might be able push that logic inside a copy. I'm thinking of something like
this:

SELECT x.a, sum(x.b)
FROM ( COPY INLINE '/path/to/foo.txt' FORMAT CSV ) as x( a integer, b
numeric, c text, d date, e json) )
WHERE x.d >= '2018-11-01'

In this case, there is the *opportunity* to see the following optimizations:
- columns c and e are never referenced, and need never be turned into a
datum (though we might do so just to confirm that they conform to the data
type)
- if column d is converted first, we can filter on it and avoid converting
columns a,b
- whatever optimizations we can infer from knowing that the two surviving
columns will go directly into an aggregate

If we go this route, we can train the planner to notice other optimizations
and add those mechanisms at that time, and then existing code gets faster.

If we go the COPY-WHEN route, then we have to make up new syntax for every
possible future optimization.

#15Pavel Stehule
pavel.stehule@gmail.com
In reply to: Corey Huinker (#14)
Re: COPY FROM WHEN condition

pá 2. 11. 2018 v 3:57 odesílatel Corey Huinker <corey.huinker@gmail.com>
napsal:

Are you thinking something like having a COPY command that provides

results in such a way that they could be referenced in a FROM clause
(perhaps a COPY that defines a cursor…)?

That would also be nice, but what I was thinking of was that some
highly restricted subset of cases of SQL in general could lend
themselves to levels of optimization that would be impractical in
other contexts.

If COPY (or a syntactical equivalent) can return a result set, then the
whole of SQL is available to filter and aggregate the results and we don't
have to invent new syntax, or endure confusion whenCOPY-WHEN syntax behaves
subtly different from a similar FROM-WHERE.

Also, what would we be saving computationally? The whole file (or program
output) has to be consumed no matter what, the columns have to be parsed no
matter what. At least some of the columns have to be converted to their
assigned datatypes enough to know whether or not to filter the row, but we
might be able push that logic inside a copy. I'm thinking of something like
this:

SELECT x.a, sum(x.b)
FROM ( COPY INLINE '/path/to/foo.txt' FORMAT CSV ) as x( a integer, b
numeric, c text, d date, e json) )
WHERE x.d >= '2018-11-01'

Without some special feature this example is not extra useful. It is based
on copy on server that can use only super user with full rights.

What should be benefit of this feature?

Regards

Pavel

Show quoted text

In this case, there is the *opportunity* to see the following
optimizations:
- columns c and e are never referenced, and need never be turned into a
datum (though we might do so just to confirm that they conform to the data
type)
- if column d is converted first, we can filter on it and avoid converting
columns a,b
- whatever optimizations we can infer from knowing that the two surviving
columns will go directly into an aggregate

If we go this route, we can train the planner to notice other
optimizations and add those mechanisms at that time, and then existing code
gets faster.

If we go the COPY-WHEN route, then we have to make up new syntax for every
possible future optimization.

#16Surafel Temesgen
surafel3000@gmail.com
In reply to: Masahiko Sawada (#11)
Re: COPY FROM WHEN condition

hi,

On Wed, Oct 31, 2018 at 10:54 AM Masahiko Sawada <sawada.mshk@gmail.com>
wrote:

On Tue, Oct 30, 2018 at 11:47 PM Surafel Temesgen <surafel3000@gmail.com>
wrote:

I've looked at this patch and tested.

When I use a function returning string in WHEN clause I got the following
error:

=# copy test from '/tmp/aaa.csv' (format 'csv') when (lower(t) = 'hello');
ERROR: could not determine which collation to use for lower() function
HINT: Use the COLLATE clause to set the collation explicitly.
CONTEXT: COPY hoge, line 1: "1,hoge,2018-01-01"

And then although I specified COLLATE I got an another error (127 =
T_CollateExpr):

=# copy test from '/tmp/aaa.csv' (format 'csv') when (lower(t) collate
"en_US" = 'hello');
ERROR: unrecognized node type: 127

This error doesn't happen if I put the similar condition in WHEN
clause for triggers. I think the patch needs to produce a reasonable
error message.

The attached patch include a fix for it .can you confirm it

regards
Surafel

Attachments:

copy_from_when_con_v3.patchtext/x-patch; charset=US-ASCII; name=copy_from_when_con_v3.patchDownload+131-4
#17Daniel Verite
daniel@manitou-mail.org
In reply to: Pavel Stehule (#15)
Re: COPY FROM WHEN condition

Pavel Stehule wrote:

SELECT x.a, sum(x.b)
FROM ( COPY INLINE '/path/to/foo.txt' FORMAT CSV ) as x( a integer, b
numeric, c text, d date, e json) )
WHERE x.d >= '2018-11-01'

Without some special feature this example is not extra useful. It is based
on copy on server that can use only super user with full rights.

And if superuser, one might use the file data wrapper [1]https://www.postgresql.org/docs/current/static/file-fdw.html to get
the same results without the need for the explicit COPY in the query.

BTW, this brings into question whether the [WHEN condition] clause
should be included in the options of file_fdw, as it supports pretty
much all COPY options.

Also, psql's \copy should gain the option too?

[1]: https://www.postgresql.org/docs/current/static/file-fdw.html

Best regards,
--
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite

#18Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Corey Huinker (#14)
Re: COPY FROM WHEN condition

On 11/02/2018 03:57 AM, Corey Huinker wrote:

Are you thinking something like having a COPY command that provides
results in such a way that they could be referenced in a FROM clause
(perhaps a COPY that defines a cursor…)?

That would also be nice, but what I was thinking of was that some
highly restricted subset of cases of SQL in general could lend
themselves to levels of optimization that would be impractical in
other contexts.

If COPY (or a syntactical equivalent) can return a result set, then the
whole of SQL is available to filter and aggregate the results and we
don't have to invent new syntax, or endure confusion whenCOPY-WHEN
syntax behaves subtly different from a similar FROM-WHERE.

Also, what would we be saving computationally? The whole file (or
program output) has to be consumed no matter what, the columns have to
be parsed no matter what. At least some of the columns have to be
converted to their assigned datatypes enough to know whether or not to
filter the row, but we might be able push that logic inside a copy. I'm
thinking of something like this:

SELECT x.a, sum(x.b)
FROM ( COPY INLINE '/path/to/foo.txt' FORMAT CSV ) as x( a integer,
b numeric, c text, d date, e json) )
WHERE x.d >= '2018-11-01'

In this case, there is the /opportunity/ to see the following optimizations:
- columns c and e are never referenced, and need never be turned into a
datum (though we might do so just to confirm that they conform to the
data type)
- if column d is converted first, we can filter on it and avoid
converting columns a,b
- whatever optimizations we can infer from knowing that the two
surviving columns will go directly into an aggregate

If we go this route, we can train the planner to notice other
optimizations and add those mechanisms at that time, and then existing
code gets faster.

If we go the COPY-WHEN route, then we have to make up new syntax for
every possible future optimization.

IMHO those two things address vastly different use-cases. The COPY WHEN
case deals with filtering data while importing them into a database,
while what you're describing seems to be more about querying data stored
in a CSV file. But we already do have a solution for that - FDW, and I'd
say it's working pretty well. And AFAIK it does give you tools to
implement most of what you're asking for. I don't see why should we bolt
this on top of COPY, or how is it an alternative to COPY WHEN.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#19David Fetter
david@fetter.org
In reply to: Corey Huinker (#14)
Re: COPY FROM WHEN condition

On Thu, Nov 01, 2018 at 10:57:25PM -0400, Corey Huinker wrote:

Are you thinking something like having a COPY command that provides
results in such a way that they could be referenced in a FROM clause
(perhaps a COPY that defines a cursor…)?

That would also be nice, but what I was thinking of was that some
highly restricted subset of cases of SQL in general could lend
themselves to levels of optimization that would be impractical in
other contexts.

If COPY (or a syntactical equivalent) can return a result set, then the
whole of SQL is available to filter and aggregate the results and we don't
have to invent new syntax, or endure confusion whenCOPY-WHEN syntax behaves
subtly different from a similar FROM-WHERE.

That's an excellent point.

Also, what would we be saving computationally? The whole file (or program
output) has to be consumed no matter what, the columns have to be parsed no
matter what. At least some of the columns have to be converted to their
assigned datatypes enough to know whether or not to filter the row, but we
might be able push that logic inside a copy. I'm thinking of something like
this:

SELECT x.a, sum(x.b)
FROM ( COPY INLINE '/path/to/foo.txt' FORMAT CSV ) as x( a integer, b numeric, c text, d date, e json) )

Apologies for bike-shedding, but wouldn't the following be a better
fit with the current COPY?

COPY t(a integer, b numeric, c text, d date, e json) FROM '/path/to/foo.txt' WITH (FORMAT CSV, INLINE)

WHERE x.d >= '2018-11-01'

In this case, there is the *opportunity* to see the following optimizations:
- columns c and e are never referenced, and need never be turned into a
datum (though we might do so just to confirm that they conform to the data
type)

That sounds like something that could go inside the WITH extension
I'm proposing above.

[STRICT_TYPE boolean DEFAULT true]?

This might not be something that has to be in version 1.

Best,
David.
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

#20David Fetter
david@fetter.org
In reply to: Daniel Verite (#17)
Re: COPY FROM WHEN condition

On Fri, Nov 02, 2018 at 12:58:12PM +0100, Daniel Verite wrote:

Pavel Stehule wrote:

SELECT x.a, sum(x.b)
FROM ( COPY INLINE '/path/to/foo.txt' FORMAT CSV ) as x( a integer, b
numeric, c text, d date, e json) )
WHERE x.d >= '2018-11-01'

Without some special feature this example is not extra useful. It is based
on copy on server that can use only super user with full rights.

And if superuser, one might use the file data wrapper [1] to get
the same results without the need for the explicit COPY in the query.

BTW, this brings into question whether the [WHEN condition] clause
should be included in the options of file_fdw, as it supports pretty
much all COPY options.

Also, psql's \copy should gain the option too?

tl;dr: \copy support is a very large can of worms.

psql's \copy is something which should probably be handled separately
from COPY, as it's both a way to access the filesystem without
superuser permission and an interface to the COPY part of the
protocol. It seems like poor design to add grammar to support a
single client, so we'd need to think about this in terms of what we
want to support on the client side independent of specific clients. It
also seems like a violation of separation of concerns to couple FEBE
to grammar, so there'd need to be some way to do those things
separately, too.

Best,
David.
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

#21Corey Huinker
corey.huinker@gmail.com
In reply to: David Fetter (#19)
#22Daniel Verite
daniel@manitou-mail.org
In reply to: David Fetter (#20)
#23Adam Berlin
berlin.ab@gmail.com
In reply to: Daniel Verite (#22)
#24Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Adam Berlin (#23)
#25myungkyu.lim
myungkyu.lim@samsung.com
In reply to: Tomas Vondra (#24)
#26Surafel Temesgen
surafel3000@gmail.com
In reply to: Tomas Vondra (#24)
#27Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Surafel Temesgen (#26)
#28Dean Rasheed
dean.a.rasheed@gmail.com
In reply to: Tomas Vondra (#27)
#29Surafel Temesgen
surafel3000@gmail.com
In reply to: Dean Rasheed (#28)
#30Surafel Temesgen
surafel3000@gmail.com
In reply to: Tomas Vondra (#27)
#31Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Surafel Temesgen (#30)
#32Surafel Temesgen
surafel3000@gmail.com
In reply to: Tomas Vondra (#31)
#33Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Surafel Temesgen (#32)
#34Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Tomas Vondra (#33)
#35Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Alvaro Herrera (#34)
#36Robert Haas
robertmhaas@gmail.com
In reply to: Tomas Vondra (#31)
#37Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Robert Haas (#36)
#38Surafel Temesgen
surafel3000@gmail.com
In reply to: Alvaro Herrera (#34)
#39Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Surafel Temesgen (#38)
#40Surafel Temesgen
surafel3000@gmail.com
In reply to: Tomas Vondra (#39)
#41Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Surafel Temesgen (#40)
#42Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Tomas Vondra (#41)
#43Surafel Temesgen
surafel3000@gmail.com
In reply to: Tomas Vondra (#42)
#44Andres Freund
andres@anarazel.de
In reply to: Tomas Vondra (#42)
#45Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Andres Freund (#44)
#46Andres Freund
andres@anarazel.de
In reply to: Tomas Vondra (#45)
#47Andres Freund
andres@anarazel.de
In reply to: Andres Freund (#46)
#48Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Andres Freund (#47)
#49Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Andres Freund (#47)
#50Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Tomas Vondra (#49)
#51Andres Freund
andres@anarazel.de
In reply to: Tomas Vondra (#50)
#52Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Andres Freund (#51)
#53Surafel Temesgen
surafel3000@gmail.com
In reply to: Tomas Vondra (#50)
#54Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Tomas Vondra (#52)
#55Andres Freund
andres@anarazel.de
In reply to: Tomas Vondra (#54)
#56Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Surafel Temesgen (#53)
#57David Rowley
dgrowleyml@gmail.com
In reply to: Tomas Vondra (#54)
#58Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: David Rowley (#57)
#59Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: David Rowley (#57)
#60Andres Freund
andres@anarazel.de
In reply to: Tomas Vondra (#59)
#61David Rowley
dgrowleyml@gmail.com
In reply to: Tomas Vondra (#59)
#62David Rowley
dgrowleyml@gmail.com
In reply to: Andres Freund (#60)
#63David Rowley
dgrowleyml@gmail.com
In reply to: Tomas Vondra (#58)
#64Andres Freund
andres@anarazel.de
In reply to: David Rowley (#62)
#65David Rowley
dgrowleyml@gmail.com
In reply to: Andres Freund (#64)
#66Andres Freund
andres@anarazel.de
In reply to: David Rowley (#65)
#67Andres Freund
andres@anarazel.de
In reply to: Andres Freund (#60)
#68David Rowley
dgrowleyml@gmail.com
In reply to: Andres Freund (#67)
#69Andres Freund
andres@anarazel.de
In reply to: David Rowley (#68)
#70David Rowley
dgrowleyml@gmail.com
In reply to: Andres Freund (#69)
#71Andres Freund
andres@anarazel.de
In reply to: David Rowley (#70)
#72David Rowley
dgrowleyml@gmail.com
In reply to: Andres Freund (#71)
#73Andres Freund
andres@anarazel.de
In reply to: David Rowley (#72)
#74David Rowley
dgrowleyml@gmail.com
In reply to: Andres Freund (#73)
#75Andres Freund
andres@anarazel.de
In reply to: David Rowley (#74)
#76David Rowley
dgrowleyml@gmail.com
In reply to: Andres Freund (#75)
#77Andres Freund
andres@anarazel.de
In reply to: David Rowley (#76)
#78David Rowley
dgrowleyml@gmail.com
In reply to: Andres Freund (#77)
#79Andres Freund
andres@anarazel.de
In reply to: David Rowley (#78)
#80Andres Freund
andres@anarazel.de
In reply to: David Rowley (#78)
#81Andres Freund
andres@anarazel.de
In reply to: Andres Freund (#79)
#82David Rowley
dgrowleyml@gmail.com
In reply to: Andres Freund (#80)
#83Andres Freund
andres@anarazel.de
In reply to: David Rowley (#82)
#84David Rowley
dgrowleyml@gmail.com
In reply to: Andres Freund (#83)
#85Andres Freund
andres@anarazel.de
In reply to: David Rowley (#84)
#86David Rowley
dgrowleyml@gmail.com
In reply to: Andres Freund (#85)
#87Andres Freund
andres@anarazel.de
In reply to: Andres Freund (#85)
#88Andres Freund
andres@anarazel.de
In reply to: Andres Freund (#87)
#89David Rowley
dgrowleyml@gmail.com
In reply to: Andres Freund (#88)
#90Andres Freund
andres@anarazel.de
In reply to: David Rowley (#89)