COPY with column headings
Someone just asked about a COPY capability to supply the column headings
as the first line of the copy statement. Do we want to support
something like that? Does anyone else want such functionality?
--
Bruce Momjian | http://candle.pha.pa.us
pgman@candle.pha.pa.us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073
Bruce Momjian <pgman@candle.pha.pa.us> writes:
Someone just asked about a COPY capability to supply the column headings
as the first line of the copy statement. Do we want to support
something like that?
No. Tell him to use SELECT.
(Such a capability would be a permanent hazard, because any time you
tried to import with the opposite setting from what you'd exported,
you'd either get a failure, or bad data, or silent loss of the first
line of real data.)
regards, tom lane
On Mon, Aug 16, 2004 at 10:53:36AM -0400, Bruce Momjian wrote:
Someone just asked about a COPY capability to supply the column headings
as the first line of the copy statement. Do we want to support
something like that? Does anyone else want such functionality?
Wouldn't it be more logical, and more compatible, to keep this information
as regular column information in the PGresult returned by the COPY? Or
would that clash with the difference between "command" and "query" result
handling?
Jeroen
Bruce Momjian wrote:
Someone just asked about a COPY capability to supply the column headings
as the first line of the copy statement. Do we want to support
something like that? Does anyone else want such functionality?
I had it in mind all along as a possible option for CSV mode, but given
the heat that was generated by the minimal implementation we did, I
thought I'd let it rest.
I certainly think it's worth considering, although I see Tom has
objected :-) . Say we have an option called FIRSTLINELABELS, then on
copy out it would write the headings on the first line, and on copy in
it could just ignore the first line (so it could work symmetrically).
cheers
andrew
Andrew Dunstan wrote:
Bruce Momjian wrote:
Someone just asked about a COPY capability to supply the column headings
as the first line of the copy statement. Do we want to support
something like that? Does anyone else want such functionality?I had it in mind all along as a possible option for CSV mode, but given
the heat that was generated by the minimal implementation we did, I
thought I'd let it rest.I certainly think it's worth considering, although I see Tom has
objected :-) . Say we have an option called FIRSTLINELABELS, then on
copy out it would write the headings on the first line, and on copy in
it could just ignore the first line (so it could work symmetrically).
Yes, that is what I was thinking. Is this a TODO? Sure it can be
misused but most copy options can be similarly misused.
--
Bruce Momjian | http://candle.pha.pa.us
pgman@candle.pha.pa.us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073
"Jeroen T. Vermeulen" <jtv@xs4all.nl> writes:
Wouldn't it be more logical, and more compatible, to keep this information
as regular column information in the PGresult returned by the COPY?
That would work but would require a protocol change, which this is
surely not worth by itself. (We could put it on the to-do list for the
next protocol rev though.)
The bigger question is whether this would do anything to satisfy the
requestor. He probably wants the headings to appear in the file
resulting from COPY TO file (or the psql equivalent), which this would
not do. Providing the info in the COPY PGresult would change nothing
except the behavior of client applications specially rewritten to use it.
regards, tom lane
On Mon, Aug 16, 2004 at 11:30:49AM -0400, Tom Lane wrote:
The bigger question is whether this would do anything to satisfy the
requestor. He probably wants the headings to appear in the file
resulting from COPY TO file (or the psql equivalent), which this would
not do. Providing the info in the COPY PGresult would change nothing
except the behavior of client applications specially rewritten to use it.
True, and from a compatibility standpoint that would probably be good...
Output post-processing (translation to CSV or XML output, XSLT transforms,
what have you) IMHO should probably go on top of COPY anyway, rather than
into it. Are people really having problems with
select * from table where 1=0 ; copy table to stdout
and first printing the column names found in the first PGresult, then
dumping the COPY lines to the same fd? I think that should do the trick for
most uses, although obviously I'm no expert.
Jeroen
Bruce Momjian <pgman@candle.pha.pa.us> writes:
I certainly think it's worth considering, although I see Tom has
objected :-) . Say we have an option called FIRSTLINELABELS, then on
copy out it would write the headings on the first line, and on copy in
it could just ignore the first line (so it could work symmetrically).
Yes, that is what I was thinking. Is this a TODO? Sure it can be
misused but most copy options can be similarly misused.
Most copy options will result in obvious failures if misused. I dislike
this one because of the high risk of silent corruption of your data
(loss of a row or insertion of a row that shouldn't be there).
Considering no one has even made a positive case why we should add it
(for instance, a use-case where it's necessary), I don't think it
belongs in the to-do list.
regards, tom lane
Tom Lane wrote:
Bruce Momjian <pgman@candle.pha.pa.us> writes:
I certainly think it's worth considering, although I see Tom has
objected :-) . Say we have an option called FIRSTLINELABELS, then on
copy out it would write the headings on the first line, and on copy in
it could just ignore the first line (so it could work symmetrically).Yes, that is what I was thinking. Is this a TODO? Sure it can be
misused but most copy options can be similarly misused.Most copy options will result in obvious failures if misused. I dislike
this one because of the high risk of silent corruption of your data
(loss of a row or insertion of a row that shouldn't be there).
Considering no one has even made a positive case why we should add it
(for instance, a use-case where it's necessary), I don't think it
belongs in the to-do list.
The use case is that it fits in with the way spreadsheets usually do
data tables, and many will only allow you to export a whole worksheet
(including the heading row) to CSV, not a part of one. Conversely,
working with imported data tables will be harder if they lack headings.
cheers
andrew
Andrew Dunstan <andrew@dunslane.net> writes:
The use case is that it fits in with the way spreadsheets usually do
data tables, and many will only allow you to export a whole worksheet
(including the heading row) to CSV, not a part of one. Conversely,
working with imported data tables will be harder if they lack headings.
Well, if we wanted to *require* a heading row in CSV mode, it would be
relatively foolproof. What I don't like is the proposal for an
independent option; you've got a 50-50 chance of getting it wrong on
import, and that's too high odds for me.
Next question: are you imagining that the header row will actually have
any semantic significance on input? Will we check the column names?
Will we be willing to rearrange the columns if the header row claims the
column order is different than the COPY command says?
regards, tom lane
Tom Lane wrote:
Andrew Dunstan <andrew@dunslane.net> writes:
The use case is that it fits in with the way spreadsheets usually do
data tables, and many will only allow you to export a whole worksheet
(including the heading row) to CSV, not a part of one. Conversely,
working with imported data tables will be harder if they lack headings.Well, if we wanted to *require* a heading row in CSV mode, it would be
relatively foolproof. What I don't like is the proposal for an
independent option; you've got a 50-50 chance of getting it wrong on
import, and that's too high odds for me.
No, we can't require it. Not all will have headers, and then we sure
would skip a wanted row. And while my use case referred to spreadsheets,
they are not the only sources/recipients of CSVs.
Your 50-50 chance assumes the user knows nothing about the data and uses
the switch at random. I think we're entitled to assume the user knows
something about their data and uses the switch according to what they
have/want.
Right now, if you have a CSV with a header line you have to remove the
line with something like 'sed 1d' before importing to PostgreSQL, and
when importing from PostgreSQL you have to insert the headers manually
after importing the CSV to your spreadsheet. That's got to be annoying
and at least as error-prone as providing a switch to accomodate header
lines.
Next question: are you imagining that the header row will actually have
any semantic significance on input? Will we check the column names?
Will we be willing to rearrange the columns if the header row claims the
column order is different than the COPY command says?
I'm leaning towards not trying to interpret the line on copy in.
Certainly that is the minimalist way to approach this.
cheers
andrew
Might be worthwhile to look at SQL*Server BCP format files:
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/adminsq
l/ad_impt_bcp_9yat.asp
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/tsqlref
/ts_ba-bz_4fec.asp
A TCL/TK front end would be especially nice. Of course, being a lazy
slug, I am not volunteering to write any of this. Just a thought in
case someone else is interested in doing it.
-----Original Message-----
From: pgsql-hackers-owner@postgresql.org
[mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Tom Lane
Sent: Monday, August 16, 2004 12:04 PM
To: Andrew Dunstan
Cc: Bruce Momjian; PostgreSQL-development
Subject: Re: [HACKERS] COPY with column headingsAndrew Dunstan <andrew@dunslane.net> writes:
The use case is that it fits in with the way spreadsheets usually do
data tables, and many will only allow you to export a wholeworksheet
(including the heading row) to CSV, not a part of one. Conversely,
working with imported data tables will be harder if theylack headings.
Well, if we wanted to *require* a heading row in CSV mode, it
would be relatively foolproof. What I don't like is the
proposal for an independent option; you've got a 50-50 chance
of getting it wrong on import, and that's too high odds for me.Next question: are you imagining that the header row will
actually have any semantic significance on input? Will we
check the column names? Will we be willing to rearrange the
columns if the header row claims the column order is
different than the COPY command says?regards, tom lane
---------------------------(end of
broadcast)---------------------------
TIP 6: Have you searched our list archives?
Import Notes
Resolved by subject fallback
Tom Lane wrote:
Andrew Dunstan <andrew@dunslane.net> writes:
The use case is that it fits in with the way spreadsheets usually do
data tables, and many will only allow you to export a whole worksheet
(including the heading row) to CSV, not a part of one. Conversely,
working with imported data tables will be harder if they lack headings.Well, if we wanted to *require* a heading row in CSV mode, it would be
relatively foolproof. What I don't like is the proposal for an
independent option; you've got a 50-50 chance of getting it wrong on
import, and that's too high odds for me.
But we have other CVS options with even lower odds of success on a
random guess. People have to know if their input file has headings,
period. Telling people they can't have the option because they might
get it wrong seems strange to me.
Next question: are you imagining that the header row will actually have
any semantic significance on input? Will we check the column names?
Will we be willing to rearrange the columns if the header row claims the
column order is different than the COPY command says?
I assume no semantic significance to header row values on input.
--
Bruce Momjian | http://candle.pha.pa.us
pgman@candle.pha.pa.us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073
That'd likely be useful. Either specify the column
names with COPY or \copy, or put them in the data file
and pass an option to the command to look for them.
The only time this could be a problem is if you forget
to tell COPY to look for the field names in the file
(AND specify the field names to COPY), and the values
for the field names are such that they don't cause
data errors (probably pretty slim chance).
I'd also really like to see the ability to specify
default values for fields not included in the data
file (field defaults in the table definition is
limited and doesn't fill the order when you need to
specify various foreign key ids, specific times,
etc.).
CSN
Someone just asked about a COPY capability to supply
the column headings
as the first line of the copy statement. Do we want
to support
something like that? Does anyone else want such functionality?
__________________________________
Do you Yahoo!?
New and Improved Yahoo! Mail - Send 10MB messages!
http://promotions.yahoo.com/new_mail
Import Notes
Resolved by subject fallback
Added to TODO:
o Allow COPY to optionally include column headings as the first
I know Tom didn't like it but there were others who did.
---------------------------------------------------------------------------
Andrew Dunstan wrote:
Tom Lane wrote:
Bruce Momjian <pgman@candle.pha.pa.us> writes:
I certainly think it's worth considering, although I see Tom has
objected :-) . Say we have an option called FIRSTLINELABELS, then on
copy out it would write the headings on the first line, and on copy in
it could just ignore the first line (so it could work symmetrically).Yes, that is what I was thinking. Is this a TODO? Sure it can be
misused but most copy options can be similarly misused.Most copy options will result in obvious failures if misused. I dislike
this one because of the high risk of silent corruption of your data
(loss of a row or insertion of a row that shouldn't be there).
Considering no one has even made a positive case why we should add it
(for instance, a use-case where it's necessary), I don't think it
belongs in the to-do list.The use case is that it fits in with the way spreadsheets usually do
data tables, and many will only allow you to export a whole worksheet
(including the heading row) to CSV, not a part of one. Conversely,
working with imported data tables will be harder if they lack headings.cheers
andrew
--
Bruce Momjian | http://candle.pha.pa.us
pgman@candle.pha.pa.us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073