CSV mode option for pg_dump

Started by Andrew Dunstanalmost 20 years ago48 messageshackers
Jump to latest
#1Andrew Dunstan
andrew@dunslane.net

Something someone said on IRC just now triggered a little memory ... I
think we should provide an option to have pg_dump work in CSV mode
rather than text mode. This probably doesn't have much importance in the
case of text dumps, but in custom or tar dumps where you might want to
get at individual data members, having an option for CSVs that you want
to load into some other product might be nice.

This should be a pretty low cost item, I expect (good newbie project?)

thoughts?

cheers

andrew

#2Joshua D. Drake
jd@commandprompt.com
In reply to: Andrew Dunstan (#1)
Re: CSV mode option for pg_dump

Andrew Dunstan wrote:

Something someone said on IRC just now triggered a little memory ... I
think we should provide an option to have pg_dump work in CSV mode
rather than text mode. This probably doesn't have much importance in the
case of text dumps, but in custom or tar dumps where you might want to
get at individual data members, having an option for CSVs that you want
to load into some other product might be nice.

This should be a pretty low cost item, I expect (good newbie project?)

Uhh... just about any application that can import CSV can import our
dumps. It just tell it the delimiter is a tab.

Joshua D. Drake

thoughts?

cheers

andrew

---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
choose an index scan if your joining column's datatypes do not
match

--

=== The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive PostgreSQL solutions since 1997
http://www.commandprompt.com/

#3Andrew Dunstan
andrew@dunslane.net
In reply to: Joshua D. Drake (#2)
Re: CSV mode option for pg_dump

Joshua D. Drake wrote:

Andrew Dunstan wrote:

Something someone said on IRC just now triggered a little memory ...
I think we should provide an option to have pg_dump work in CSV mode
rather than text mode. This probably doesn't have much importance in
the case of text dumps, but in custom or tar dumps where you might
want to get at individual data members, having an option for CSVs
that you want to load into some other product might be nice.

This should be a pretty low cost item, I expect (good newbie project?)

Uhh... just about any application that can import CSV can import our
dumps. It just tell it the delimiter is a tab.

No it won't, not if there are tabs in the data.

Why do you think we did CSV in the first place? Precisely because our
dump does *not* work as a general export mechanism for arbitrary data.

cheers

andrew

#4Josh Berkus
josh@agliodbs.com
In reply to: Joshua D. Drake (#2)
Re: CSV mode option for pg_dump

Josh,

This should be a pretty low cost item, I expect (good newbie project?)

Uhh... just about any application that can import CSV can import our
dumps. It just tell it the delimiter is a tab.

Actually, there was an Summer of Code applcation to do this, but with all
the other nifty stuff it wasn't accepted -- partly because we weren't sure
that we wanted a CSV mode for dumps, partly because this should be a
weekend fix, not a 3-month project.

--
--Josh

Josh Berkus
PostgreSQL @ Sun
San Francisco

#5Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andrew Dunstan (#1)
Re: CSV mode option for pg_dump

Andrew Dunstan <andrew@dunslane.net> writes:

Something someone said on IRC just now triggered a little memory ... I
think we should provide an option to have pg_dump work in CSV mode
rather than text mode. This probably doesn't have much importance in the
case of text dumps, but in custom or tar dumps where you might want to
get at individual data members, having an option for CSVs that you want
to load into some other product might be nice.

This is silly. You'd just COPY the particular table you want, not use
pg_dump. pg_dump's already got an unreasonably large number of options
without adding ones that have essentially zero use. Also, I think there
are sufficient grounds to worry about whether a CSV dump would always
reload correctly --- we already know that that's a poorly thought out
"standard".

regards, tom lane

#6Joshua D. Drake
jd@commandprompt.com
In reply to: Andrew Dunstan (#3)
Re: CSV mode option for pg_dump

No it won't, not if there are tabs in the data.

<snipping noise>

Hmmm then would just double quoting the data work? At least in OOCalc
(and IIRC Excel) there is the ability to select a text delimiter.

Joshua D. Drake

--

=== The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive PostgreSQL solutions since 1997
http://www.commandprompt.com/

#7Andrew Dunstan
andrew@dunslane.net
In reply to: Tom Lane (#5)
Re: CSV mode option for pg_dump

Tom Lane wrote:

Andrew Dunstan <andrew@dunslane.net> writes:

Something someone said on IRC just now triggered a little memory ... I
think we should provide an option to have pg_dump work in CSV mode
rather than text mode. This probably doesn't have much importance in the
case of text dumps, but in custom or tar dumps where you might want to
get at individual data members, having an option for CSVs that you want
to load into some other product might be nice.

This is silly. You'd just COPY the particular table you want, not use
pg_dump. pg_dump's already got an unreasonably large number of options
without adding ones that have essentially zero use. Also, I think there
are sufficient grounds to worry about whether a CSV dump would always
reload correctly --- we already know that that's a poorly thought out
"standard".

Well, if you have dozens or hundreds of tables it might well be more
convenient.

As for not reloading - I went to some trouble to make sure that we could
reload what we dumped, exactly, unless the force options are used. I
might have made a bug in that, but it isn't dependent on the particular
CSV format used.

Naturally you won't have a use for it, but I suspect others might (in
which case they had better speak up ;-) )

I suppose the alternative would be to write a little tool in perl or
whatever to do the same thing for you. Maybe a good pgfoundry project.

cheers

andrew

#8Bill Bartlett
bbartlett@softwareanalytics.com
In reply to: Andrew Dunstan (#7)
Re: CSV mode option for pg_dump

Here's me speaking up -- I'd definitely use it! As a quick way to pull
data into Excel to do basic reports or analysis, a CSV format would be
great. Some of our users currently pull data into Excel for quickie
analysis, but creating fixed-width data via psql requires them to parse
the data and dumping anything via pg_dump with any delimiter (tabs,
etc.) usually doesn't work due to the delimiters being embedded in the
real data.

- Bill

Show quoted text

-----Original Message-----
From: pgsql-hackers-owner@postgresql.org
[mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of
Andrew Dunstan
Sent: Monday, June 12, 2006 2:16 PM
To: Tom Lane
Cc: PG Hackers
Subject: Re: [HACKERS] CSV mode option for pg_dump

Tom Lane wrote:

Andrew Dunstan <andrew@dunslane.net> writes:

Something someone said on IRC just now triggered a little

memory ...

I
think we should provide an option to have pg_dump work in CSV mode
rather than text mode. This probably doesn't have much

importance in the

case of text dumps, but in custom or tar dumps where you

might want to

get at individual data members, having an option for CSVs

that you want

to load into some other product might be nice.

This is silly. You'd just COPY the particular table you

want, not use

pg_dump. pg_dump's already got an unreasonably large number of
options without adding ones that have essentially zero use.

Also, I

think there are sufficient grounds to worry about whether a

CSV dump

would always reload correctly --- we already know that

that's a poorly

thought out "standard".

Well, if you have dozens or hundreds of tables it might well be more
convenient.

As for not reloading - I went to some trouble to make sure
that we could
reload what we dumped, exactly, unless the force options are used. I
might have made a bug in that, but it isn't dependent on the
particular
CSV format used.

Naturally you won't have a use for it, but I suspect others might (in
which case they had better speak up ;-) )

I suppose the alternative would be to write a little tool in perl or
whatever to do the same thing for you. Maybe a good pgfoundry project.

cheers

andrew

---------------------------(end of
broadcast)---------------------------
TIP 5: don't forget to increase your free space map settings

#9Jim Nasby
Jim.Nasby@BlueTreble.com
In reply to: Andrew Dunstan (#7)
Re: CSV mode option for pg_dump

On Mon, Jun 12, 2006 at 02:15:59PM -0400, Andrew Dunstan wrote:

Naturally you won't have a use for it, but I suspect others might (in
which case they had better speak up ;-) )

I'd bet that those who would find this useful are far more likely to be
on -general and not in here.
--
Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461

#10Joshua D. Drake
jd@commandprompt.com
In reply to: Bill Bartlett (#8)
Re: CSV mode option for pg_dump

Bill Bartlett wrote:

Here's me speaking up -- I'd definitely use it! As a quick way to pull
data into Excel to do basic reports or analysis, a CSV format would be
great.

Why not just use ODBC?

Joshua D. Drake
--

=== The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive PostgreSQL solutions since 1997
http://www.commandprompt.com/

#11Bill Bartlett
bbartlett@softwareanalytics.com
In reply to: Joshua D. Drake (#10)
Re: CSV mode option for pg_dump

Can't -- the main production database is over at a CoLo site with access
only available via SSH, and tightly-restricted SSH at that. Generally
one of the developers will SSH over to the server, pull out whatever
data is needed into a text file via psql or pg_dump, scp the file(s)
back here and send them to the user.

We're working on implementing a full-blown reporting system to eliminate
some of this need, but it's not quite there yet.

However, CSV is a very handy format to use to send data to other people
(especially "less technical" users), so even in the future the ability
to export to CSV would be handy.

- Bill

Show quoted text

-----Original Message-----
From: Joshua D. Drake [mailto:jd@commandprompt.com]
Sent: Monday, June 12, 2006 4:15 PM
To: Bill Bartlett
Cc: 'Andrew Dunstan'; 'Tom Lane'; 'PG Hackers'
Subject: Re: [HACKERS] CSV mode option for pg_dump

Bill Bartlett wrote:

Here's me speaking up -- I'd definitely use it! As a

quick way to pull

data into Excel to do basic reports or analysis, a CSV

format would be

great.

Why not just use ODBC?

Joshua D. Drake
--

=== The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency:
+1.800.492.2240
Providing the most comprehensive PostgreSQL solutions
since 1997
http://www.commandprompt.com/

#12Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#5)
Re: CSV mode option for pg_dump

Already on TODO:

pg_dump:

o %Add CSV output format

---------------------------------------------------------------------------

Tom Lane wrote:

Andrew Dunstan <andrew@dunslane.net> writes:

Something someone said on IRC just now triggered a little memory ... I
think we should provide an option to have pg_dump work in CSV mode
rather than text mode. This probably doesn't have much importance in the
case of text dumps, but in custom or tar dumps where you might want to
get at individual data members, having an option for CSVs that you want
to load into some other product might be nice.

This is silly. You'd just COPY the particular table you want, not use
pg_dump. pg_dump's already got an unreasonably large number of options
without adding ones that have essentially zero use. Also, I think there
are sufficient grounds to worry about whether a CSV dump would always
reload correctly --- we already know that that's a poorly thought out
"standard".

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 1: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to majordomo@postgresql.org so that your
message can get through to the mailing list cleanly

--
Bruce Momjian http://candle.pha.pa.us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

#13Matthew T. O'Connor
matthew@zeut.net
In reply to: Bill Bartlett (#11)
Re: CSV mode option for pg_dump

Bill Bartlett wrote:

Can't -- the main production database is over at a CoLo site with access
only available via SSH, and tightly-restricted SSH at that. Generally
one of the developers will SSH over to the server, pull out whatever
data is needed into a text file via psql or pg_dump, scp the file(s)
back here and send them to the user.

ODBC over an SSH tunnnel?

#14Andrew Dunstan
andrew@dunslane.net
In reply to: Matthew T. O'Connor (#13)
Re: CSV mode option for pg_dump

Matthew T. OConnor said:

Bill Bartlett wrote:

Can't -- the main production database is over at a CoLo site with
access only available via SSH, and tightly-restricted SSH at that.
Generally one of the developers will SSH over to the server, pull out
whatever data is needed into a text file via psql or pg_dump, scp the
file(s) back here and send them to the user.

ODBC over an SSH tunnnel?

I wish I could understand why people are so keen to make other people turn
handsprings in order to avoid a feature which, as Bruce points out, is
already on the TODO list, and which, by my 10 minute analysis, would involve
almost trivial code impact and risk. If this involved major impact I might
understand, but it really doesn't.

I know many people work in a Postgres only world. I wish everybody did, and
then we could just forget about things like CSV. They don't, so we can't.

I think I have said this before, but I'll say it again. From time to time
people thank me for things I have done for Postgres. The two things that
stand out BY FAR on the list of these are CSV import/export and dollar
quoting. This is a widely used feature.

cheers

andrew

#15Bruce Momjian
bruce@momjian.us
In reply to: Andrew Dunstan (#14)
Re: CSV mode option for pg_dump

Andrew Dunstan wrote:

Matthew T. OConnor said:

Bill Bartlett wrote:

Can't -- the main production database is over at a CoLo site with
access only available via SSH, and tightly-restricted SSH at that.
Generally one of the developers will SSH over to the server, pull out
whatever data is needed into a text file via psql or pg_dump, scp the
file(s) back here and send them to the user.

ODBC over an SSH tunnnel?

I wish I could understand why people are so keen to make other people turn
handsprings in order to avoid a feature which, as Bruce points out, is
already on the TODO list, and which, by my 10 minute analysis, would involve
almost trivial code impact and risk. If this involved major impact I might
understand, but it really doesn't.

I know many people work in a Postgres only world. I wish everybody did, and
then we could just forget about things like CSV. They don't, so we can't.

I think I have said this before, but I'll say it again. From time to time
people thank me for things I have done for Postgres. The two things that
stand out BY FAR on the list of these are CSV import/export and dollar
quoting. This is a widely used feature.

I think the bottom line is that "ease of use" isn't as high enough on
the project's priority list as you (and others) think it should be.

I personally feel as you do that we should value "ease of use" more.

--
Bruce Momjian http://candle.pha.pa.us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

#16Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andrew Dunstan (#14)
Re: CSV mode option for pg_dump

"Andrew Dunstan" <andrew@dunslane.net> writes:

I wish I could understand why people are so keen to make other people turn
handsprings in order to avoid a feature which, as Bruce points out, is
already on the TODO list, and which, by my 10 minute analysis, would involve
almost trivial code impact and risk. If this involved major impact I might
understand, but it really doesn't.

Supporting all of the CSV options in pg_dump would involve major bloat
in its option set, and it already has far too many options. If it were
just a matter of adding a "--csv" switch I wouldn't be complaining, but
there are half a dozen more sub-options, and it seems like every time we
turn around someone is finding a reason for another one. Propagating
all that cruft through pg_dump would be a PITA, and updating it to track
future additions would be too.

Furthermore, the entire rationale for the feature is predicated on the
claim that programs other than pg_restore might find it useful. But
this conveniently ignores the fact that if there are any such programs
in existence, what this will really do is BREAK them, because they won't
be able to cope with all the variants that pass for CSV.

My opinions would be less negative if I thought that CSV were a
well-defined format that would never change. I don't believe that it
has either property, however, and so I'm against letting it get into our
dump file format. I think we'll just live to regret it if we do.

regards, tom lane

#17Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#16)
Re: CSV mode option for pg_dump

Good point. The number of CSV options would be hard to support for
pg_dump. Any thoughts from anyone on how to do that cleanly? Could we
just support the default behavior?

---------------------------------------------------------------------------

Tom Lane wrote:

"Andrew Dunstan" <andrew@dunslane.net> writes:

I wish I could understand why people are so keen to make other people turn
handsprings in order to avoid a feature which, as Bruce points out, is
already on the TODO list, and which, by my 10 minute analysis, would involve
almost trivial code impact and risk. If this involved major impact I might
understand, but it really doesn't.

Supporting all of the CSV options in pg_dump would involve major bloat
in its option set, and it already has far too many options. If it were
just a matter of adding a "--csv" switch I wouldn't be complaining, but
there are half a dozen more sub-options, and it seems like every time we
turn around someone is finding a reason for another one. Propagating
all that cruft through pg_dump would be a PITA, and updating it to track
future additions would be too.

Furthermore, the entire rationale for the feature is predicated on the
claim that programs other than pg_restore might find it useful. But
this conveniently ignores the fact that if there are any such programs
in existence, what this will really do is BREAK them, because they won't
be able to cope with all the variants that pass for CSV.

My opinions would be less negative if I thought that CSV were a
well-defined format that would never change. I don't believe that it
has either property, however, and so I'm against letting it get into our
dump file format. I think we'll just live to regret it if we do.

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
choose an index scan if your joining column's datatypes do not
match

--
Bruce Momjian http://candle.pha.pa.us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

#18Martijn van Oosterhout
kleptog@svana.org
In reply to: Bruce Momjian (#17)
Re: CSV mode option for pg_dump

On Tue, Jun 13, 2006 at 10:20:53AM -0400, Bruce Momjian wrote:

Good point. The number of CSV options would be hard to support for
pg_dump. Any thoughts from anyone on how to do that cleanly? Could we
just support the default behavior?

What this tells me is that we need a tool somewhere between psql and
pg_dump, say, pgquery. It's sole purpose in life is to generate output
from various queries. Because it's a seperate tool there's no question
of psql or pg_dump being able to parse them.

While you're at it, you could add modules to support many different
output styles, like CSV, XML, Excel format, HTML, etc.

This I beleive would take the load off psql to provide many different
output styles, as well as the load off pg_dump to produce
parsable-by-third-party output.

Thoughts?

Side note: In my experience Excel happily slurps up tab delimited
output, so I'm not sure why all of this is an issue in the first place.

Have a ncie day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/

Show quoted text

From each according to his ability. To each according to his ability to litigate.

#19Volkan YAZICI
yazicivo@ttnet.net.tr
In reply to: Bruce Momjian (#17)
Re: CSV mode option for pg_dump

On Jun 13 10:20, Bruce Momjian wrote:

Good point. The number of CSV options would be hard to support for
pg_dump. Any thoughts from anyone on how to do that cleanly? Could we
just support the default behavior?

IMHO, it might be better if we'd support a syntax like

pg_dump --csv=opt0,para0:opt2,opt3

This can save us from the pg_dump parameter pollution a little bit.

Furthermore, I think CSV format for the dump files can be maintained
better under an external project. (pgFoundry?) By this way, main
developers will be able to cope with their own core problems while
other users/developers can contribute on the CSV code easily. And if
any user will ever want to get CSV functionality in the pg_dump,
he/she will just issue a --csv parameter (with the above syntax) and
pg_dump will make a suitable dlopen() call for the related (CSV)
module. Anyway, this is just an idea for modularity; but the main
thing I try to underline is to give pg_dump a module functionality for
similar problems.

Regards.

#20Andrew Dunstan
andrew@dunslane.net
In reply to: Tom Lane (#16)
Re: CSV mode option for pg_dump

Tom Lane wrote:

there are half a dozen more sub-options, and it seems like every time we
turn around someone is finding a reason for another one.

This is a bit unfair. The feature was introduced in 8.0, and slightly
enhanced in 8.1. There have not been any additional CSV features this
release cycle unless my memory is worse than I thought, and I at least
have said previously that I will be resistant to the addition of further
CSV options.

My thoughts regarding options for pg_dump was actually to provide a much
smaller set than the full set available with COPY, specifically to
provide for using a single rather than a double quote char, and optional
header lines - no alternate escape or delimiter, and no FORCE QUOTE
(FORCE NOT NULL isn't relevant as it is only useful for non-postgres
derived data). At least that would be a reasonable starting point, and
would I believe cater for the vast majority of uses.

cheers

andrew

#21Joshua D. Drake
jd@commandprompt.com
In reply to: Bruce Momjian (#17)
#22Andrew Dunstan
andrew@dunslane.net
In reply to: Volkan YAZICI (#19)
#23Andrew Dunstan
andrew@dunslane.net
In reply to: Martijn van Oosterhout (#18)
#24Rod Taylor
rbt@rbt.ca
In reply to: Bill Bartlett (#11)
#25Steve Atkins
steve@blighty.com
In reply to: Martijn van Oosterhout (#18)
#26Joshua D. Drake
jd@commandprompt.com
In reply to: Rod Taylor (#24)
#27PFC
lists@peufeu.com
In reply to: Volkan YAZICI (#19)
#28Andrew Dunstan
andrew@dunslane.net
In reply to: PFC (#27)
#29Joshua D. Drake
jd@commandprompt.com
In reply to: Steve Atkins (#25)
#30Bill Bartlett
bbartlett@softwareanalytics.com
In reply to: Rod Taylor (#24)
#31Steve Atkins
steve@blighty.com
In reply to: Joshua D. Drake (#29)
#32Andrew Dunstan
andrew@dunslane.net
In reply to: Steve Atkins (#25)
#33Bruce Momjian
bruce@momjian.us
In reply to: Joshua D. Drake (#29)
#34Andrew Dunstan
andrew@dunslane.net
In reply to: Bruce Momjian (#33)
#35Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andrew Dunstan (#28)
#36Andrew Dunstan
andrew@dunslane.net
In reply to: Tom Lane (#35)
#37Bruce Momjian
bruce@momjian.us
In reply to: Andrew Dunstan (#36)
#38Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#37)
#39Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#38)
#40Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#39)
#41Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#40)
#42Martijn van Oosterhout
kleptog@svana.org
In reply to: Joshua D. Drake (#26)
#43Martijn van Oosterhout
kleptog@svana.org
In reply to: Bruce Momjian (#41)
#44Bruce Momjian
bruce@momjian.us
In reply to: Andrew Dunstan (#36)
#45Andrew Dunstan
andrew@dunslane.net
In reply to: Bruce Momjian (#39)
#46Bruce Momjian
bruce@momjian.us
In reply to: Andrew Dunstan (#45)
#47Andrew Dunstan
andrew@dunslane.net
In reply to: Bruce Momjian (#44)
#48Andrew Dunstan
andrew@dunslane.net
In reply to: Andrew Dunstan (#45)