About COPY command (and probably file fdw too)

Started by Stefan Stefanovalmost 11 years ago9 messagesgeneral
Jump to latest
#1Stefan Stefanov
stefanov.sm@abv.bg

Hi,

I have been using COPY .. FROM a lot these days for reading in tabular data and it does a very good job. Still there is an inconvenience when a (large) text file contains more columns than the target table or the columns’ order differs. I can imagine three ways round and none is really nice -
- mount the file as a foreign table with all the text file’s columns then insert into the target table a select from the foreign table;
- create an intermediate table with all the text file’s columns, copy into it from the file then insert into the target table and finally drop the intermediate table when no more files are expected;
- remove the unneeded columns from the file with a text editor prior to COPY-ing.
I think that this is happening often in real life and therefore have a suggestion to add this option “[SKIP] COLUMNS <columnslist>” to the WITH clause of COPY .. FROM. It may be very useful in file fdw too.
To be able to re-arrange columns’ order would come as a free bonus for users.

Sincerely,
Stefan Stefanov

#2Nicolas Paris
niparisco@gmail.com
In reply to: Stefan Stefanov (#1)
Re: About COPY command (and probably file fdw too)

2015-05-20 22:16 GMT+02:00 Stefan Stefanov <stefanov.sm@abv.bg>:

Hi,

I have been using COPY .. FROM a lot these days for reading in tabular
data and it does a very good job. Still there is an inconvenience when a
(large) text file contains more columns than the target table or the
columns’ order differs. I can imagine three ways round and none is really
nice -
- mount the file as a foreign table with all the text file’s columns then
insert into the target table a select from the foreign table;
- create an intermediate table with all the text file’s columns, copy into
it from the file then insert into the target table and finally drop the
intermediate table when no more files are expected;
- remove the unneeded columns from the file with a text editor prior to
COPY-ing.
I think that this is happening often in real life and therefore have a
suggestion to add this option “[SKIP] COLUMNS <columnslist>” to the WITH
clause of COPY .. FROM. It may be very useful in file fdw too.
To be able to re-arrange columns’ order would come as a free bonus for
users.

Sincerely,
Stefan Stefanov

​Hi,

I guess it already does (from documentation):

COPY table_name [ ( column_name [, ...] ) ]
FROM { 'filename' | STDIN }
[ [ WITH ] ( option [, ...] ) ]

Then you can order the column_name as the source file has.​

#3Stefan Stefanov
stefanov.sm@abv.bg
In reply to: Nicolas Paris (#2)
Re: About COPY command (and probably file fdw too)

Hi,

Maybe I need to clarify a little.
The suggested option “[SKIP] COLUMNS ”
would contain columns' positions in the file so that only some of the columns in a text file would be read into a table.
Example: copy the first, second and seventh columns form myfile.txt into table "stafflist". myfile.txt has many columns.
COPY stafflist (userid, username, staffid)
FROM 'myfile.txt'
WITH (FORMAT text, DELIMITER E'\t', COLUMNS (1, 2, 7), ENCODING 'windows-1250')

BR,

Stefan

-------- Оригинално писмо --------

От: Nicolas Paris niparisco@gmail.com

Относно: Re: [GENERAL] About COPY command (and probably file fdw too)

До: Stefan Stefanov

Изпратено на: 20.05.2015 23:21

2015-05-20 22:16 GMT+02:00 Stefan Stefanov
stefanov.sm@abv.bg > :

Hi,

I have been using COPY .. FROM a lot these days for reading in tabular data and it does a very good job.
Still there is an inconvenience when a (large) text file contains more columns than the target table or the columns’ order differs. I can imagine three ways round and none is really nice -

- mount the file as a foreign table with all the text file’s columns then insert into the target table a select from the foreign table;

- create an intermediate table with all the text file’s columns, copy into it from the file then insert into the target table and finally drop the intermediate table when no more files are expected;

- remove the unneeded columns from the file with a text editor prior to COPY-ing.

I think that this is happening often in real life and therefore have a suggestion to add this option “[SKIP] COLUMNS ”
to the WITH clause of COPY .. FROM. It may be very useful in file fdw too.

To be able to re-arrange columns’ order would come as a free bonus for users.

Sincerely,

Stefan Stefanov

​Hi,

I guess it already does (from documentation):

COPY table_name [ ( column_name [, ...] ) ]
FROM { ' filename ' | STDIN }
[ [ WITH ] ( option [, ...] ) ]

Then you can order the column_name as the source file has.​

#4Nicolas Paris
niparisco@gmail.com
In reply to: Stefan Stefanov (#3)
Re: About COPY command (and probably file fdw too)

Hi,

To me this would be great. Why not the ability to restrict lines too
COPY stafflist (userid, username, staffid)
FROM 'myfile.txt'
WITH (FORMAT text, DELIMITER E'\t', COLUMNS (1, 2, 7),
LINES(2:1000,2000:3000), ENCODING 'windows-1250')
=> subset of full data.

2015-05-21 22:25 GMT+02:00 Stefan Stefanov <stefanov.sm@abv.bg>:

Show quoted text

Hi,

Maybe I need to clarify a little.
The suggested option “[SKIP] COLUMNS <columnslist>” would contain
columns' positions in the file so that only some of the columns in a text
file would be read into a table.
Example: copy the first, second and seventh columns form myfile.txt into
table "stafflist". myfile.txt has many columns.
COPY stafflist (userid, username, staffid)
FROM 'myfile.txt'
WITH (FORMAT text, DELIMITER E'\t', COLUMNS (1, 2, 7), ENCODING
'windows-1250')

BR, Stefan

-------- Оригинално писмо --------
От: Nicolas Paris niparisco@gmail.com
Относно: Re: [GENERAL] About COPY command (and probably file fdw too)
До: Stefan Stefanov <stefanov.sm@abv.bg>
Изпратено на: 20.05.2015 23:21

2015-05-20 22:16 GMT+02:00 Stefan Stefanov <stefanov.sm@abv.bg>:

Hi,

I have been using COPY .. FROM a lot these days for reading in tabular
data and it does a very good job. Still there is an inconvenience when a
(large) text file contains more columns than the target table or the
columns’ order differs. I can imagine three ways round and none is really
nice -
- mount the file as a foreign table with all the text file’s columns then
insert into the target table a select from the foreign table;
- create an intermediate table with all the text file’s columns, copy
into it from the file then insert into the target table and finally drop
the intermediate table when no more files are expected;
- remove the unneeded columns from the file with a text editor prior to
COPY-ing.
I think that this is happening often in real life and therefore have a
suggestion to add this option “[SKIP] COLUMNS <columnslist>” to the WITH
clause of COPY .. FROM. It may be very useful in file fdw too.
To be able to re-arrange columns’ order would come as a free bonus for
users.

Sincerely,
Stefan Stefanov

​Hi,

I guess it already does (from documentation):

COPY table_name [ ( column_name [, ...] ) ]
FROM { 'filename' | STDIN }
[ [ WITH ] ( option [, ...] ) ]

Then you can order the column_name as the source file has.​

#5Brent Wood
Brent.Wood@niwa.co.nz
In reply to: Nicolas Paris (#4)
Re: About COPY command (and probably file fdw too)

You can already do that, natively in Linux/Mac & by adding some simple tools to try & make Windows useful:

cat <FILE> | grep <filter> | psql -d <DB> -c "copy ....;"

between grep, sed, tr, awk you can do almost any in-line filtering or text manipulation you are likely to need. Or a bit of Perl/Python...

Brent Wood

Programme leader: Environmental Information Delivery
NIWA
DDI: +64 (4) 3860529

Brent Wood
Principal Technician - GIS and Spatial Data Management
Programme Leader - Environmental Information Delivery
+64-4-386-0529 | 301 Evans Bay Parade, Greta Point, Wellington | www.niwa.co.nz<http://www.niwa.co.nz&gt;
[NIWA]<http://www.niwa.co.nz&gt;
To ensure compliance with legal requirements and to maintain cyber security standards, NIWA's IT systems are subject to ongoing monitoring, activity logging and auditing. This monitoring and auditing service may be provided by third parties. Such third parties can access information transmitted to, processed by and stored on NIWA's IT systems.
________________________________
From: pgsql-general-owner@postgresql.org <pgsql-general-owner@postgresql.org> on behalf of Nicolas Paris <niparisco@gmail.com>
Sent: Friday, May 22, 2015 8:33 AM
To: Stefan Stefanov
Cc: Forums postgresql
Subject: Re: [GENERAL] About COPY command (and probably file fdw too)

Hi,

To me this would be great. Why not the ability to restrict lines too
COPY stafflist (userid, username, staffid)
FROM 'myfile.txt'
WITH (FORMAT text, DELIMITER E'\t', COLUMNS (1, 2, 7), LINES(2:1000,2000:3000), ENCODING 'windows-1250')
=> subset of full data.

2015-05-21 22:25 GMT+02:00 Stefan Stefanov <stefanov.sm@abv.bg<mailto:stefanov.sm@abv.bg>>:
Hi,

Maybe I need to clarify a little.
The suggested option “[SKIP] COLUMNS <columnslist>” would contain columns' positions in the file so that only some of the columns in a text file would be read into a table.
Example: copy the first, second and seventh columns form myfile.txt into table "stafflist". myfile.txt has many columns.
COPY stafflist (userid, username, staffid)
FROM 'myfile.txt'
WITH (FORMAT text, DELIMITER E'\t', COLUMNS (1, 2, 7), ENCODING 'windows-1250')

BR, Stefan

-------- Оригинално писмо --------
От: Nicolas Paris niparisco@gmail.com<mailto:niparisco@gmail.com>
Относно: Re: [GENERAL] About COPY command (and probably file fdw too)
До: Stefan Stefanov <stefanov.sm@abv.bg<mailto:stefanov.sm@abv.bg>>
Изпратено на: 20.05.2015 23:21

2015-05-20 22:16 GMT+02:00 Stefan Stefanov <stefanov.sm@abv.bg>:
Hi,

I have been using COPY .. FROM a lot these days for reading in tabular data and it does a very good job. Still there is an inconvenience when a (large) text file contains more columns than the target table or the columns’ order differs. I can imagine three ways round and none is really nice -
- mount the file as a foreign table with all the text file’s columns then insert into the target table a select from the foreign table;
- create an intermediate table with all the text file’s columns, copy into it from the file then insert into the target table and finally drop the intermediate table when no more files are expected;
- remove the unneeded columns from the file with a text editor prior to COPY-ing.
I think that this is happening often in real life and therefore have a suggestion to add this option “[SKIP] COLUMNS <columnslist>” to the WITH clause of COPY .. FROM. It may be very useful in file fdw too.
To be able to re-arrange columns’ order would come as a free bonus for users.

Sincerely,
Stefan Stefanov

​Hi,

I guess it already does (from documentation):

COPY table_name [ ( column_name [, ...] ) ]
FROM { 'filename' | STDIN }
[ [ WITH ] ( option [, ...] ) ]

Then you can order the column_name as the source file has.​

Attachments:

image56c4fa.JPGimage/jpeg; name=image56c4fa.JPGDownload
#6Melvin Davidson
melvin6925@gmail.com
In reply to: Nicolas Paris (#4)
Re: About COPY command (and probably file fdw too)

I understand what you want with regards to skipping columns in input, but
rather than wait to see if that feature is added to a future version of
PostgreSQL, probably the best work around is to

1. CREATE an intermediate table with all columns in the input text file.

2. COPY into the intermediate table.
3. INSERT into your table
SELECT cola, col2, coln from intermediate table.
4. TRUNCATE intermediate table and repeat steps 2 > 4 as needed.

On Thu, May 21, 2015 at 4:33 PM, Nicolas Paris <niparisco@gmail.com> wrote:

Hi,

To me this would be great. Why not the ability to restrict lines too
COPY stafflist (userid, username, staffid)
FROM 'myfile.txt'
WITH (FORMAT text, DELIMITER E'\t', COLUMNS (1, 2, 7),
LINES(2:1000,2000:3000), ENCODING 'windows-1250')
=> subset of full data.

2015-05-21 22:25 GMT+02:00 Stefan Stefanov <stefanov.sm@abv.bg>:

Hi,

Maybe I need to clarify a little.
The suggested option “[SKIP] COLUMNS <columnslist>” would contain
columns' positions in the file so that only some of the columns in a text
file would be read into a table.
Example: copy the first, second and seventh columns form myfile.txt into
table "stafflist". myfile.txt has many columns.
COPY stafflist (userid, username, staffid)
FROM 'myfile.txt'
WITH (FORMAT text, DELIMITER E'\t', COLUMNS (1, 2, 7), ENCODING
'windows-1250')

BR, Stefan

-------- Оригинално писмо --------
От: Nicolas Paris niparisco@gmail.com
Относно: Re: [GENERAL] About COPY command (and probably file fdw too)
До: Stefan Stefanov <stefanov.sm@abv.bg>
Изпратено на: 20.05.2015 23:21

2015-05-20 22:16 GMT+02:00 Stefan Stefanov <stefanov.sm@abv.bg>:

Hi,

I have been using COPY .. FROM a lot these days for reading in tabular
data and it does a very good job. Still there is an inconvenience when a
(large) text file contains more columns than the target table or the
columns’ order differs. I can imagine three ways round and none is really
nice -
- mount the file as a foreign table with all the text file’s columns
then insert into the target table a select from the foreign table;
- create an intermediate table with all the text file’s columns, copy
into it from the file then insert into the target table and finally drop
the intermediate table when no more files are expected;
- remove the unneeded columns from the file with a text editor prior to
COPY-ing.
I think that this is happening often in real life and therefore have a
suggestion to add this option “[SKIP] COLUMNS <columnslist>” to the WITH
clause of COPY .. FROM. It may be very useful in file fdw too.
To be able to re-arrange columns’ order would come as a free bonus for
users.

Sincerely,
Stefan Stefanov

​Hi,

I guess it already does (from documentation):

COPY table_name [ ( column_name [, ...] ) ]
FROM { 'filename' | STDIN }
[ [ WITH ] ( option [, ...] ) ]

Then you can order the column_name as the source file has.​

--
*Melvin Davidson*
I reserve the right to fantasize. Whether or not you
wish to share my fantasy is entirely up to you.

#7David G. Johnston
david.g.johnston@gmail.com
In reply to: Nicolas Paris (#4)
Re: About COPY command (and probably file fdw too)

On Thu, May 21, 2015 at 1:33 PM, Nicolas Paris <niparisco@gmail.com> wrote:

Hi,

To me this would be great. Why not the ability to restrict lines too
COPY stafflist (userid, username, staffid)
FROM 'myfile.txt'
WITH (FORMAT text, DELIMITER E'\t', COLUMNS (1, 2, 7),
LINES(2:1000,2000:3000), ENCODING 'windows-1250')
=> subset of full data.

​At some level of complexity it is questionable whether a feature belongs
in core that can exist outside of it.

While I have not yet personally used pgloader it seems to accomplish much
of what is being requested.

http://pgloader.io/index.html

COPY (and \copy) serves its purpose extremely well​

​but expects the user to deal with any customization needed either before
or after it has done its thing. I believe this is for the best since such
customizations and tools have no need to operate on the same release cycle
as the core PostgreSQL project.

David J.

#8Stefan Stefanov
stefanov.sm@abv.bg
In reply to: David G. Johnston (#7)
Re: About COPY command (and probably file fdw too)

Hi,
I agree, pgloader seems to be right. And yes, it’s a matter of complexity and usability estimation.
Stefan

From: David G. Johnston
Sent: Friday, May 22, 2015 12:19 AM
To: Nicolas Paris
Cc: Stefan Stefanov ; Forums postgresql
Subject: Re: [GENERAL] About COPY command (and probably file fdw too)

On Thu, May 21, 2015 at 1:33 PM, Nicolas Paris <niparisco@gmail.com> wrote:

Hi,

To me this would be great. Why not the ability to restrict lines too
COPY stafflist (userid, username, staffid)
FROM 'myfile.txt'
WITH (FORMAT text, DELIMITER E'\t', COLUMNS (1, 2, 7), LINES(2:1000,2000:3000), ENCODING 'windows-1250')

=> subset of full data.

​At some level of complexity it is questionable whether a feature belongs in core that can exist outside of it.
While I have not yet personally used pgloader it seems to accomplish much of what is being requested.
http://pgloader.io/index.html
COPY (and \copy) serves its purpose extremely well​ ​but expects the user to deal with any customization needed either before or after it has done its thing. I believe this is for the best since such customizations and tools have no need to operate on the same release cycle as the core PostgreSQL project.
David J.

#9Stefan Stefanov
stefanov.sm@abv.bg
In reply to: Melvin Davidson (#6)
Re: About COPY command (and probably file fdw too)

What you suggest is exactly the second option in the first message below but that’s a real lot of overhead.

From: Melvin Davidson
Sent: Thursday, May 21, 2015 11:48 PM
To: Nicolas Paris
Cc: Stefan Stefanov ; Forums postgresql
Subject: Re: [GENERAL] About COPY command (and probably file fdw too)

I understand what you want with regards to skipping columns in input, but rather than wait to see if that feature is added to a future version of PostgreSQL, probably the best work around is to

1. CREATE an intermediate table with all columns in the input text file.

2. COPY into the intermediate table.

3. INSERT into your table

SELECT cola, col2, coln from intermediate table.

4. TRUNCATE intermediate table and repeat steps 2 > 4 as needed.

On Thu, May 21, 2015 at 4:33 PM, Nicolas Paris <niparisco@gmail.com> wrote:

Hi,

To me this would be great. Why not the ability to restrict lines too
COPY stafflist (userid, username, staffid)
FROM 'myfile.txt'
WITH (FORMAT text, DELIMITER E'\t', COLUMNS (1, 2, 7), LINES(2:1000,2000:3000), ENCODING 'windows-1250')

=> subset of full data.

2015-05-21 22:25 GMT+02:00 Stefan Stefanov <stefanov.sm@abv.bg>:

Hi,

Maybe I need to clarify a little.
The suggested option “[SKIP] COLUMNS <columnslist>” would contain columns' positions in the file so that only some of the columns in a text file would be read into a table.
Example: copy the first, second and seventh columns form myfile.txt into table "stafflist". myfile.txt has many columns.
COPY stafflist (userid, username, staffid)
FROM 'myfile.txt'
WITH (FORMAT text, DELIMITER E'\t', COLUMNS (1, 2, 7), ENCODING 'windows-1250')

BR, Stefan

-------- Оригинално писмо --------
От: Nicolas Paris niparisco@gmail.com
Относно: Re: [GENERAL] About COPY command (and probably file fdw too)
До: Stefan Stefanov <stefanov.sm@abv.bg>
Изпратено на: 20.05.2015 23:21

2015-05-20 22:16 GMT+02:00 Stefan Stefanov <stefanov.sm@abv.bg>:

Hi,
I have been using COPY .. FROM a lot these days for reading in tabular data and it does a very good job. Still there is an inconvenience when a (large) text file contains more columns than the target table or the columns’ order differs. I can imagine three ways round and none is really nice -
- mount the file as a foreign table with all the text file’s columns then insert into the target table a select from the foreign table;
- create an intermediate table with all the text file’s columns, copy into it from the file then insert into the target table and finally drop the intermediate table when no more files are expected;
- remove the unneeded columns from the file with a text editor prior to COPY-ing.
I think that this is happening often in real life and therefore have a suggestion to add this option “[SKIP] COLUMNS <columnslist>” to the WITH clause of COPY .. FROM. It may be very useful in file fdw too.
To be able to re-arrange columns’ order would come as a free bonus for users.
Sincerely,
Stefan Stefanov

​Hi,

I guess it already does (from documentation):
COPY table_name [ ( column_name [, ...] ) ]
FROM { 'filename' | STDIN }
[ [ WITH ] ( option [, ...] ) ]Then you can order the column_name as the source file has.​

--

Melvin Davidson
I reserve the right to fantasize. Whether or not you
wish to share my fantasy is entirely up to you.