Add encoding support to COPY

Started by David Blewettover 16 years ago11 messages
#1David Blewett
david@dawninglight.net

Today on IRC, someone was wondering what the preferred method of
exporting data in a specific encoding via COPY was. They reply was
wrapping the COPY command in "set client_encoding='foo';", which made
me wonder how hard it would be to add an additional WITH parameter to
the actual COPY statement to specify the encoding, a la:
[ [ WITH ]
[ BINARY ]
[ OIDS ]
[ DELIMITER [ AS ] 'delimiter' ]
[ ENCODING [ AS ] 'charset' ]
[ NULL [ AS ] 'null string' ]
[ CSV [ HEADER ]
[ QUOTE [ AS ] 'quote' ]
[ ESCAPE [ AS ] 'escape' ]
[ FORCE QUOTE column [, ...] ]

Any objections? It seems like a cleaner solution client side than
issuing multiple calls to set the client_encoding. If there are no
objections, I can attempt to prepare a patch for the next commitfest.

David Blewett

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: David Blewett (#1)
Re: Add encoding support to COPY

David Blewett <david@dawninglight.net> writes:

Today on IRC, someone was wondering what the preferred method of
exporting data in a specific encoding via COPY was. They reply was
wrapping the COPY command in "set client_encoding='foo';", which made
me wonder how hard it would be to add an additional WITH parameter to
the actual COPY statement to specify the encoding, a la:

What is the point? You'd generally have client_encoding set correctly
for your usage anyway, and if you did not, the data could confuse your
client-side code terribly. Offering an option to let the backend send
data in the "wrong" encoding does NOT seem like a good idea to me.

regards, tom lane

#3David Blewett
david@dawninglight.net
In reply to: Tom Lane (#2)
Re: Add encoding support to COPY

On Wed, Jul 15, 2009 at 12:04 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote:

What is the point?  You'd generally have client_encoding set correctly
for your usage anyway, and if you did not, the data could confuse your
client-side code terribly.  Offering an option to let the backend send
data in the "wrong" encoding does NOT seem like a good idea to me.

The use case was that the client connection was using one encoding,
but needed to output the file in a different encoding. So they would
have to do the "set client_encoding" dance each time they wanted to
export the file. I don't see how it's "wrong", especially considering
there is already a method to do this, albeit cumbersome. I consider it
simply syntactic sugar over existing functionality.

David Blewett

#4Tom Lane
tgl@sss.pgh.pa.us
In reply to: David Blewett (#3)
Re: Add encoding support to COPY

David Blewett <david@dawninglight.net> writes:

On Wed, Jul 15, 2009 at 12:04 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote:

What is the point? �You'd generally have client_encoding set correctly
for your usage anyway, and if you did not, the data could confuse your
client-side code terribly. �Offering an option to let the backend send
data in the "wrong" encoding does NOT seem like a good idea to me.

The use case was that the client connection was using one encoding,
but needed to output the file in a different encoding. So they would
have to do the "set client_encoding" dance each time they wanted to
export the file.

Well, it might make sense to allow an ENCODING option attached to a COPY
with a file source/destination. I remain of the opinion that overriding
client_encoding on a transfer to/from the client is a bad idea.

regards, tom lane

#5Nagy Karoly Gabriel
nagy.karoly@expert-erp.net
In reply to: David Blewett (#1)
Re: Add encoding support to COPY

David Blewett wrote:

Today on IRC, someone was wondering what the preferred method of
exporting data in a specific encoding via COPY was. They reply was
wrapping the COPY command in "set client_encoding='foo';", which made
me wonder how hard it would be to add an additional WITH parameter to
the actual COPY statement to specify the encoding, a la:
[ [ WITH ]
[ BINARY ]
[ OIDS ]
[ DELIMITER [ AS ] 'delimiter' ]
[ ENCODING [ AS ] 'charset' ]
[ NULL [ AS ] 'null string' ]
[ CSV [ HEADER ]
[ QUOTE [ AS ] 'quote' ]
[ ESCAPE [ AS ] 'escape' ]
[ FORCE QUOTE column [, ...] ]

Any objections? It seems like a cleaner solution client side than
issuing multiple calls to set the client_encoding. If there are no
objections, I can attempt to prepare a patch for the next commitfest.

David Blewett

I think that I was the one who wondered about that. Our use case is
related to moving data between different servers which have different
encodings. Ofcourse the encoding should be an option only when COPY
involves files.

--
Nagy Karoly Gabriel
Expert Software Group SRL

(o__ 417495 Sanmartin nr. 205
//\' Bihor, Romania
V_/_ Tel./Fax: +4 0259 317 142, +4 0259 317 143

#6David Blewett
david@dawninglight.net
In reply to: Tom Lane (#4)
Re: Add encoding support to COPY

Apologies to Tom for the duplicate...

On Wed, Jul 15, 2009 at 12:17 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote:

Well, it might make sense to allow an ENCODING option attached to a COPY
with a file source/destination.  I remain of the opinion that overriding
client_encoding on a transfer to/from the client is a bad idea.

I really don't see how it is any different from manually flipping the
client_encoding before/after the transfer. We could of course put a
warning sign in the docs, but it seems to me it's more error prone for
clients to set the client_encoding manually rather than include an
option for a single command. What happens if an exception is thrown
during the COPY process and the client doesn't handle things
correctly? The rest of their session could be in an unexpected
encoding, whereas with this method we know to return to the original
client_encoding before doing anything else. By including the encoding
option, their explicitly saying how they want to handle the data.

I could see a use case for remote client code to do a COPY to STDOUT,
that is actually being redirected to a file. If the consensus is for
local file-based operations only, however, I can structure the patch
that way.

David

#7Bernd Helmle
mailings@oopsware.de
In reply to: Nagy Karoly Gabriel (#5)
Re: Add encoding support to COPY

--On 15. Juli 2009 19:59:56 +0300 Nagy Karoly Gabriel
<nagy.karoly@expert-erp.net> wrote:

I think that I was the one who wondered about that. Our use case is
related to moving data between different servers which have different
encodings. Ofcourse the encoding should be an option only when COPY
involves files.

I find this rather confusing: can't you just tell via client_encoding the
correct encoding your file contains during restore?

--
Thanks

Bernd

#8Alvaro Herrera
alvherre@commandprompt.com
In reply to: David Blewett (#6)
Re: Add encoding support to COPY

David Blewett wrote:

On Wed, Jul 15, 2009 at 12:17 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote:

Well, it might make sense to allow an ENCODING option attached to a COPY
with a file source/destination. �I remain of the opinion that overriding
client_encoding on a transfer to/from the client is a bad idea.

I could see a use case for remote client code to do a COPY to STDOUT,
that is actually being redirected to a file. If the consensus is for
local file-based operations only, however, I can structure the patch
that way.

Yeah, the problem is that reading to/from files is only allowed to
superusers ...

(I'm not sure how this affects \copy in psql; probably something you
should investigate)

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

#9Tom Lane
tgl@sss.pgh.pa.us
In reply to: David Blewett (#6)
Re: Add encoding support to COPY

David Blewett <david@dawninglight.net> writes:

On Wed, Jul 15, 2009 at 12:17 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote:

Well, it might make sense to allow an ENCODING option attached to a COPY
with a file source/destination. �I remain of the opinion that overriding
client_encoding on a transfer to/from the client is a bad idea.

I really don't see how it is any different from manually flipping the
client_encoding before/after the transfer.

The difference is that the client-side code gets told that the encoding
changed if you do the latter.

regards, tom lane

#10David Blewett
david@dawninglight.net
In reply to: Tom Lane (#9)
Re: Add encoding support to COPY

On Wed, Jul 15, 2009 at 4:07 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote:

David Blewett <david@dawninglight.net> writes:

On Wed, Jul 15, 2009 at 12:17 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote:

Well, it might make sense to allow an ENCODING option attached to a COPY
with a file source/destination.  I remain of the opinion that overriding
client_encoding on a transfer to/from the client is a bad idea.

I really don't see how it is any different from manually flipping the
client_encoding before/after the transfer.

The difference is that the client-side code gets told that the encoding
changed if you do the latter.

Do you mean at the protocol level?

All I was planning on having the patch do is the equivalent of the set
client_encoding dance. Wouldn't that be sufficent to notify the client
of the encoding change?

David Blewett

#11Andrew Dunstan
andrew@dunslane.net
In reply to: Nagy Karoly Gabriel (#5)
Re: Add encoding support to COPY

Nagy Karoly Gabriel wrote:

David Blewett wrote:

Today on IRC, someone was wondering what the preferred method of
exporting data in a specific encoding via COPY was. They reply was
wrapping the COPY command in "set client_encoding='foo';", which made
me wonder how hard it would be to add an additional WITH parameter to
the actual COPY statement to specify the encoding, a la:

I think that I was the one who wondered about that. Our use case is
related to moving data between different servers which have different
encodings. Ofcourse the encoding should be an option only when COPY
involves files.

Well, that is the case that there seems to be consensus about, and it's
also the case that can't be done via client encoding. We tend to have a
bias against providing lots of ways to do the same thing, so let's go
with this case (i.e. do it for cases other than STDIN/STDOUT).

cheers

andrew