CSV mode option for pg_dump
Something someone said on IRC just now triggered a little memory ... I
think we should provide an option to have pg_dump work in CSV mode
rather than text mode. This probably doesn't have much importance in the
case of text dumps, but in custom or tar dumps where you might want to
get at individual data members, having an option for CSVs that you want
to load into some other product might be nice.
This should be a pretty low cost item, I expect (good newbie project?)
thoughts?
cheers
andrew
Andrew Dunstan wrote:
Something someone said on IRC just now triggered a little memory ... I
think we should provide an option to have pg_dump work in CSV mode
rather than text mode. This probably doesn't have much importance in the
case of text dumps, but in custom or tar dumps where you might want to
get at individual data members, having an option for CSVs that you want
to load into some other product might be nice.This should be a pretty low cost item, I expect (good newbie project?)
Uhh... just about any application that can import CSV can import our
dumps. It just tell it the delimiter is a tab.
Joshua D. Drake
thoughts?
cheers
andrew
---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
choose an index scan if your joining column's datatypes do not
match
--
=== The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive PostgreSQL solutions since 1997
http://www.commandprompt.com/
Joshua D. Drake wrote:
Andrew Dunstan wrote:
Something someone said on IRC just now triggered a little memory ...
I think we should provide an option to have pg_dump work in CSV mode
rather than text mode. This probably doesn't have much importance in
the case of text dumps, but in custom or tar dumps where you might
want to get at individual data members, having an option for CSVs
that you want to load into some other product might be nice.This should be a pretty low cost item, I expect (good newbie project?)
Uhh... just about any application that can import CSV can import our
dumps. It just tell it the delimiter is a tab.
No it won't, not if there are tabs in the data.
Why do you think we did CSV in the first place? Precisely because our
dump does *not* work as a general export mechanism for arbitrary data.
cheers
andrew
Josh,
This should be a pretty low cost item, I expect (good newbie project?)
Uhh... just about any application that can import CSV can import our
dumps. It just tell it the delimiter is a tab.
Actually, there was an Summer of Code applcation to do this, but with all
the other nifty stuff it wasn't accepted -- partly because we weren't sure
that we wanted a CSV mode for dumps, partly because this should be a
weekend fix, not a 3-month project.
--
--Josh
Josh Berkus
PostgreSQL @ Sun
San Francisco
Andrew Dunstan <andrew@dunslane.net> writes:
Something someone said on IRC just now triggered a little memory ... I
think we should provide an option to have pg_dump work in CSV mode
rather than text mode. This probably doesn't have much importance in the
case of text dumps, but in custom or tar dumps where you might want to
get at individual data members, having an option for CSVs that you want
to load into some other product might be nice.
This is silly. You'd just COPY the particular table you want, not use
pg_dump. pg_dump's already got an unreasonably large number of options
without adding ones that have essentially zero use. Also, I think there
are sufficient grounds to worry about whether a CSV dump would always
reload correctly --- we already know that that's a poorly thought out
"standard".
regards, tom lane
No it won't, not if there are tabs in the data.
<snipping noise>
Hmmm then would just double quoting the data work? At least in OOCalc
(and IIRC Excel) there is the ability to select a text delimiter.
Joshua D. Drake
--
=== The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive PostgreSQL solutions since 1997
http://www.commandprompt.com/
Tom Lane wrote:
Andrew Dunstan <andrew@dunslane.net> writes:
Something someone said on IRC just now triggered a little memory ... I
think we should provide an option to have pg_dump work in CSV mode
rather than text mode. This probably doesn't have much importance in the
case of text dumps, but in custom or tar dumps where you might want to
get at individual data members, having an option for CSVs that you want
to load into some other product might be nice.This is silly. You'd just COPY the particular table you want, not use
pg_dump. pg_dump's already got an unreasonably large number of options
without adding ones that have essentially zero use. Also, I think there
are sufficient grounds to worry about whether a CSV dump would always
reload correctly --- we already know that that's a poorly thought out
"standard".
Well, if you have dozens or hundreds of tables it might well be more
convenient.
As for not reloading - I went to some trouble to make sure that we could
reload what we dumped, exactly, unless the force options are used. I
might have made a bug in that, but it isn't dependent on the particular
CSV format used.
Naturally you won't have a use for it, but I suspect others might (in
which case they had better speak up ;-) )
I suppose the alternative would be to write a little tool in perl or
whatever to do the same thing for you. Maybe a good pgfoundry project.
cheers
andrew
Here's me speaking up -- I'd definitely use it! As a quick way to pull
data into Excel to do basic reports or analysis, a CSV format would be
great. Some of our users currently pull data into Excel for quickie
analysis, but creating fixed-width data via psql requires them to parse
the data and dumping anything via pg_dump with any delimiter (tabs,
etc.) usually doesn't work due to the delimiters being embedded in the
real data.
- Bill
Show quoted text
-----Original Message-----
From: pgsql-hackers-owner@postgresql.org
[mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of
Andrew Dunstan
Sent: Monday, June 12, 2006 2:16 PM
To: Tom Lane
Cc: PG Hackers
Subject: Re: [HACKERS] CSV mode option for pg_dumpTom Lane wrote:
Andrew Dunstan <andrew@dunslane.net> writes:
Something someone said on IRC just now triggered a little
memory ...
I
think we should provide an option to have pg_dump work in CSV mode
rather than text mode. This probably doesn't have muchimportance in the
case of text dumps, but in custom or tar dumps where you
might want to
get at individual data members, having an option for CSVs
that you want
to load into some other product might be nice.
This is silly. You'd just COPY the particular table you
want, not use
pg_dump. pg_dump's already got an unreasonably large number of
options without adding ones that have essentially zero use.Also, I
think there are sufficient grounds to worry about whether a
CSV dump
would always reload correctly --- we already know that
that's a poorly
thought out "standard".
Well, if you have dozens or hundreds of tables it might well be more
convenient.As for not reloading - I went to some trouble to make sure
that we could
reload what we dumped, exactly, unless the force options are used. I
might have made a bug in that, but it isn't dependent on the
particular
CSV format used.Naturally you won't have a use for it, but I suspect others might (in
which case they had better speak up ;-) )I suppose the alternative would be to write a little tool in perl or
whatever to do the same thing for you. Maybe a good pgfoundry project.cheers
andrew
---------------------------(end of
broadcast)---------------------------
TIP 5: don't forget to increase your free space map settings
On Mon, Jun 12, 2006 at 02:15:59PM -0400, Andrew Dunstan wrote:
Naturally you won't have a use for it, but I suspect others might (in
which case they had better speak up ;-) )
I'd bet that those who would find this useful are far more likely to be
on -general and not in here.
--
Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461
Bill Bartlett wrote:
Here's me speaking up -- I'd definitely use it! As a quick way to pull
data into Excel to do basic reports or analysis, a CSV format would be
great.
Why not just use ODBC?
Joshua D. Drake
--
=== The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive PostgreSQL solutions since 1997
http://www.commandprompt.com/
Can't -- the main production database is over at a CoLo site with access
only available via SSH, and tightly-restricted SSH at that. Generally
one of the developers will SSH over to the server, pull out whatever
data is needed into a text file via psql or pg_dump, scp the file(s)
back here and send them to the user.
We're working on implementing a full-blown reporting system to eliminate
some of this need, but it's not quite there yet.
However, CSV is a very handy format to use to send data to other people
(especially "less technical" users), so even in the future the ability
to export to CSV would be handy.
- Bill
Show quoted text
-----Original Message-----
From: Joshua D. Drake [mailto:jd@commandprompt.com]
Sent: Monday, June 12, 2006 4:15 PM
To: Bill Bartlett
Cc: 'Andrew Dunstan'; 'Tom Lane'; 'PG Hackers'
Subject: Re: [HACKERS] CSV mode option for pg_dumpBill Bartlett wrote:
Here's me speaking up -- I'd definitely use it! As a
quick way to pull
data into Excel to do basic reports or analysis, a CSV
format would be
great.
Why not just use ODBC?
Joshua D. Drake
--=== The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency:
+1.800.492.2240
Providing the most comprehensive PostgreSQL solutions
since 1997
http://www.commandprompt.com/
Already on TODO:
pg_dump:
o %Add CSV output format
---------------------------------------------------------------------------
Tom Lane wrote:
Andrew Dunstan <andrew@dunslane.net> writes:
Something someone said on IRC just now triggered a little memory ... I
think we should provide an option to have pg_dump work in CSV mode
rather than text mode. This probably doesn't have much importance in the
case of text dumps, but in custom or tar dumps where you might want to
get at individual data members, having an option for CSVs that you want
to load into some other product might be nice.This is silly. You'd just COPY the particular table you want, not use
pg_dump. pg_dump's already got an unreasonably large number of options
without adding ones that have essentially zero use. Also, I think there
are sufficient grounds to worry about whether a CSV dump would always
reload correctly --- we already know that that's a poorly thought out
"standard".regards, tom lane
---------------------------(end of broadcast)---------------------------
TIP 1: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to majordomo@postgresql.org so that your
message can get through to the mailing list cleanly
--
Bruce Momjian http://candle.pha.pa.us
EnterpriseDB http://www.enterprisedb.com
+ If your life is a hard drive, Christ can be your backup. +
Bill Bartlett wrote:
Can't -- the main production database is over at a CoLo site with access
only available via SSH, and tightly-restricted SSH at that. Generally
one of the developers will SSH over to the server, pull out whatever
data is needed into a text file via psql or pg_dump, scp the file(s)
back here and send them to the user.
ODBC over an SSH tunnnel?
Matthew T. OConnor said:
Bill Bartlett wrote:
Can't -- the main production database is over at a CoLo site with
access only available via SSH, and tightly-restricted SSH at that.
Generally one of the developers will SSH over to the server, pull out
whatever data is needed into a text file via psql or pg_dump, scp the
file(s) back here and send them to the user.ODBC over an SSH tunnnel?
I wish I could understand why people are so keen to make other people turn
handsprings in order to avoid a feature which, as Bruce points out, is
already on the TODO list, and which, by my 10 minute analysis, would involve
almost trivial code impact and risk. If this involved major impact I might
understand, but it really doesn't.
I know many people work in a Postgres only world. I wish everybody did, and
then we could just forget about things like CSV. They don't, so we can't.
I think I have said this before, but I'll say it again. From time to time
people thank me for things I have done for Postgres. The two things that
stand out BY FAR on the list of these are CSV import/export and dollar
quoting. This is a widely used feature.
cheers
andrew
Andrew Dunstan wrote:
Matthew T. OConnor said:
Bill Bartlett wrote:
Can't -- the main production database is over at a CoLo site with
access only available via SSH, and tightly-restricted SSH at that.
Generally one of the developers will SSH over to the server, pull out
whatever data is needed into a text file via psql or pg_dump, scp the
file(s) back here and send them to the user.ODBC over an SSH tunnnel?
I wish I could understand why people are so keen to make other people turn
handsprings in order to avoid a feature which, as Bruce points out, is
already on the TODO list, and which, by my 10 minute analysis, would involve
almost trivial code impact and risk. If this involved major impact I might
understand, but it really doesn't.I know many people work in a Postgres only world. I wish everybody did, and
then we could just forget about things like CSV. They don't, so we can't.I think I have said this before, but I'll say it again. From time to time
people thank me for things I have done for Postgres. The two things that
stand out BY FAR on the list of these are CSV import/export and dollar
quoting. This is a widely used feature.
I think the bottom line is that "ease of use" isn't as high enough on
the project's priority list as you (and others) think it should be.
I personally feel as you do that we should value "ease of use" more.
--
Bruce Momjian http://candle.pha.pa.us
EnterpriseDB http://www.enterprisedb.com
+ If your life is a hard drive, Christ can be your backup. +
"Andrew Dunstan" <andrew@dunslane.net> writes:
I wish I could understand why people are so keen to make other people turn
handsprings in order to avoid a feature which, as Bruce points out, is
already on the TODO list, and which, by my 10 minute analysis, would involve
almost trivial code impact and risk. If this involved major impact I might
understand, but it really doesn't.
Supporting all of the CSV options in pg_dump would involve major bloat
in its option set, and it already has far too many options. If it were
just a matter of adding a "--csv" switch I wouldn't be complaining, but
there are half a dozen more sub-options, and it seems like every time we
turn around someone is finding a reason for another one. Propagating
all that cruft through pg_dump would be a PITA, and updating it to track
future additions would be too.
Furthermore, the entire rationale for the feature is predicated on the
claim that programs other than pg_restore might find it useful. But
this conveniently ignores the fact that if there are any such programs
in existence, what this will really do is BREAK them, because they won't
be able to cope with all the variants that pass for CSV.
My opinions would be less negative if I thought that CSV were a
well-defined format that would never change. I don't believe that it
has either property, however, and so I'm against letting it get into our
dump file format. I think we'll just live to regret it if we do.
regards, tom lane
Good point. The number of CSV options would be hard to support for
pg_dump. Any thoughts from anyone on how to do that cleanly? Could we
just support the default behavior?
---------------------------------------------------------------------------
Tom Lane wrote:
"Andrew Dunstan" <andrew@dunslane.net> writes:
I wish I could understand why people are so keen to make other people turn
handsprings in order to avoid a feature which, as Bruce points out, is
already on the TODO list, and which, by my 10 minute analysis, would involve
almost trivial code impact and risk. If this involved major impact I might
understand, but it really doesn't.Supporting all of the CSV options in pg_dump would involve major bloat
in its option set, and it already has far too many options. If it were
just a matter of adding a "--csv" switch I wouldn't be complaining, but
there are half a dozen more sub-options, and it seems like every time we
turn around someone is finding a reason for another one. Propagating
all that cruft through pg_dump would be a PITA, and updating it to track
future additions would be too.Furthermore, the entire rationale for the feature is predicated on the
claim that programs other than pg_restore might find it useful. But
this conveniently ignores the fact that if there are any such programs
in existence, what this will really do is BREAK them, because they won't
be able to cope with all the variants that pass for CSV.My opinions would be less negative if I thought that CSV were a
well-defined format that would never change. I don't believe that it
has either property, however, and so I'm against letting it get into our
dump file format. I think we'll just live to regret it if we do.regards, tom lane
---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
choose an index scan if your joining column's datatypes do not
match
--
Bruce Momjian http://candle.pha.pa.us
EnterpriseDB http://www.enterprisedb.com
+ If your life is a hard drive, Christ can be your backup. +
On Tue, Jun 13, 2006 at 10:20:53AM -0400, Bruce Momjian wrote:
Good point. The number of CSV options would be hard to support for
pg_dump. Any thoughts from anyone on how to do that cleanly? Could we
just support the default behavior?
What this tells me is that we need a tool somewhere between psql and
pg_dump, say, pgquery. It's sole purpose in life is to generate output
from various queries. Because it's a seperate tool there's no question
of psql or pg_dump being able to parse them.
While you're at it, you could add modules to support many different
output styles, like CSV, XML, Excel format, HTML, etc.
This I beleive would take the load off psql to provide many different
output styles, as well as the load off pg_dump to produce
parsable-by-third-party output.
Thoughts?
Side note: In my experience Excel happily slurps up tab delimited
output, so I'm not sure why all of this is an issue in the first place.
Have a ncie day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
Show quoted text
From each according to his ability. To each according to his ability to litigate.
On Jun 13 10:20, Bruce Momjian wrote:
Good point. The number of CSV options would be hard to support for
pg_dump. Any thoughts from anyone on how to do that cleanly? Could we
just support the default behavior?
IMHO, it might be better if we'd support a syntax like
pg_dump --csv=opt0,para0:opt2,opt3
This can save us from the pg_dump parameter pollution a little bit.
Furthermore, I think CSV format for the dump files can be maintained
better under an external project. (pgFoundry?) By this way, main
developers will be able to cope with their own core problems while
other users/developers can contribute on the CSV code easily. And if
any user will ever want to get CSV functionality in the pg_dump,
he/she will just issue a --csv parameter (with the above syntax) and
pg_dump will make a suitable dlopen() call for the related (CSV)
module. Anyway, this is just an idea for modularity; but the main
thing I try to underline is to give pg_dump a module functionality for
similar problems.
Regards.
Tom Lane wrote:
there are half a dozen more sub-options, and it seems like every time we
turn around someone is finding a reason for another one.
This is a bit unfair. The feature was introduced in 8.0, and slightly
enhanced in 8.1. There have not been any additional CSV features this
release cycle unless my memory is worse than I thought, and I at least
have said previously that I will be resistant to the addition of further
CSV options.
My thoughts regarding options for pg_dump was actually to provide a much
smaller set than the full set available with COPY, specifically to
provide for using a single rather than a double quote char, and optional
header lines - no alternate escape or delimiter, and no FORCE QUOTE
(FORCE NOT NULL isn't relevant as it is only useful for non-postgres
derived data). At least that would be a reasonable starting point, and
would I believe cater for the vast majority of uses.
cheers
andrew