Proposed TODO: --encoding option for pg_dump

Started by Josh Berkus over 20 years ago, 10 messages
#1 Josh Berkus
josh@agliodbs.com

Folks,

There's no time to do this for 8.1, but I'd like to get it on the books for
8.2:

The Problem: Occasionally a DBA needs to dump a database to a new
encoding. In instances where the current encoding (or lack of an
encoding, like SQL_ASCII) is poorly supported on the target database
server, it can be useful to dump into a particular encoding. But,
currently the only way to set the encoding of a pg_dump file is to change
client_encoding in postgresql.conf and restart postmaster. This is more
than a little awkward for production systems.

The TODO: add an --encoding=[encoding name] option to pg_dump. This would
set client_encoding for pg_dump's session(s).

--
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco

#2 Peter Eisentraut
peter_e@gmx.net
In reply to: Josh Berkus (#1)
Re: Proposed TODO: --encoding option for pg_dump

Josh Berkus wrote:

currently the only way to set the encoding of a pg_dump file is to
change client_encoding in postgresql.conf and restart postmaster.

Another way is to set the environment variable PGCLIENTENCODING.
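
For a wrapper program that launches pg_dump, the environment-variable route can be sketched in C as below. This is only an illustration: the helper name export_client_encoding is hypothetical, and the sketch relies on libpq reading PGCLIENTENCODING when it opens a connection.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper (not part of PostgreSQL): export PGCLIENTENCODING
 * for this process and any child it starts afterwards, such as pg_dump.
 * Returns 0 on success, -1 if the name is empty or setenv() fails.
 */
int
export_client_encoding(const char *encoding)
{
	assert(encoding != NULL);

	if (strlen(encoding) == 0)
		return -1;

	/* The final argument 1 means "overwrite an existing value". */
	return setenv("PGCLIENTENCODING", encoding, 1);
}
```

After a successful call, a pg_dump started from the same process (for example via one of the exec functions) would inherit the variable. No server restart is involved.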

--
Peter Eisentraut
http://developer.postgresql.org/~petere/

#3 Kris Jurka
books@ejurka.com
In reply to: Josh Berkus (#1)
Re: Proposed TODO: --encoding option for pg_dump

On Tue, 28 Jun 2005, Josh Berkus wrote:

The TODO: add an --encoding=[encoding name] option to pg_dump. This would
set client_encoding for pg_dump's session(s).

What about just using the PGCLIENTENCODING environment variable?

Kris Jurka

#4 Magnus Hagander
mha@sollentuna.net
In reply to: Kris Jurka (#3)
2 attachment(s)
Re: [HACKERS] Proposed TODO: --encoding option for pg_dump

There's no time to do this for 8.1, but I'd like to get it on the books for
8.2:

The Problem: Occasionally a DBA needs to dump a database to a new
encoding. In instances where the current encoding (or lack of an
encoding, like SQL_ASCII) is poorly supported on the target database
server, it can be useful to dump into a particular encoding. But,
currently the only way to set the encoding of a pg_dump file is to change
client_encoding in postgresql.conf and restart postmaster. This is more
than a little awkward for production systems.

The TODO: add an --encoding=[encoding name] option to pg_dump. This would
set client_encoding for pg_dump's session(s).

I *think* that's easy enough to do in time for 8.1. Trivial patch
attached. I hope it's enough :-) It passed my very quick testing...

(Yup, I read the mails about PGCLIENTENCODING, but an option to pg_dump
is certainly easier)
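
The core of the attached patch is building a SET client_encoding command from the option value. A standalone sketch of that step, with a bounded buffer and a check on the encoding name, might look like the following. The helper name build_set_encoding_cmd and the allowed-character list are assumptions made for illustration, not part of the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: write "SET client_encoding='<name>'" into buf.
 * Returns 0 on success, or -1 if the name is empty, contains a character
 * outside the (assumed) safe set, or the command does not fit in buf.
 */
int
build_set_encoding_cmd(char *buf, size_t buflen, const char *encoding)
{
	/* Assumption: encoding names need only letters, digits, '_' and '-'. */
	static const char allowed[] =
		"ABCDEFGHIJKLMNOPQRSTUVWXYZ"
		"abcdefghijklmnopqrstuvwxyz"
		"0123456789_-";
	int			n;

	assert(buf != NULL && encoding != NULL);

	/* Reject empty names and names with a quote or other stray character. */
	if (encoding[0] == '\0' || encoding[strspn(encoding, allowed)] != '\0')
		return -1;

	/* snprintf never writes past buflen; a large return value means truncation. */
	n = snprintf(buf, buflen, "SET client_encoding='%s'", encoding);
	if (n < 0 || (size_t) n >= buflen)
		return -1;

	return 0;
}
```

On success the resulting string could be passed to do_sql_command(), as the patch does with its malloc'ed buffer. Rejecting names that contain a quote keeps a malformed option value from changing the statement.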

//Magnus

Attachments:

pg_dump.diff (application/octet-stream)
*** pg_dump.c.orig	2005-06-28 20:49:10.000000000 +0100
--- pg_dump.c	2005-06-28 21:16:21.000000000 +0100
***************
*** 183,188 ****
--- 183,189 ----
  	const char *pghost = NULL;
  	const char *pgport = NULL;
  	const char *username = NULL;
+ 	const char *dumpencoding = NULL;
  	bool		oids = false;
  	TableInfo  *tblinfo;
  	int			numTables;
***************
*** 229,234 ****
--- 230,236 ----
  		{"no-privileges", no_argument, NULL, 'x'},
  		{"no-acl", no_argument, NULL, 'x'},
  		{"compress", required_argument, NULL, 'Z'},
+ 		{"encoding", required_argument, NULL, 'E'},
  		{"help", no_argument, NULL, '?'},
  		{"version", no_argument, NULL, 'V'},
  
***************
*** 277,283 ****
  		}
  	}
  
! 	while ((c = getopt_long(argc, argv, "abcCdDf:F:h:in:oOp:RsS:t:uU:vWxX:Z:",
  							long_options, &optindex)) != -1)
  	{
  		switch (c)
--- 279,285 ----
  		}
  	}
  
! 	while ((c = getopt_long(argc, argv, "abcCdDE:f:F:h:in:oOp:RsS:t:uU:vWxX:Z:",
  							long_options, &optindex)) != -1)
  	{
  		switch (c)
***************
*** 309,314 ****
--- 311,320 ----
  				attrNames = true;
  				break;
  
+ 			case 'E':			/* Dump encoding */
+ 				dumpencoding = optarg;
+ 				break;
+ 
  			case 'f':
  				filename = optarg;
  				break;
***************
*** 533,538 ****
--- 539,553 ----
  	/* Set the datestyle to ISO to ensure the dump's portability */
  	do_sql_command(g_conn, "SET DATESTYLE = ISO");
  
+ 	/* Set the client encoding */
+ 	if (dumpencoding)
+ 	{
+ 		char *cmd = malloc(strlen(dumpencoding) + 32);
+ 		sprintf(cmd,"SET client_encoding='%s'", dumpencoding);
+ 		do_sql_command(g_conn, cmd);
+ 		free(cmd);
+ 	}
+ 
  	/*
  	 * If supported, set extra_float_digits so that we can dump float data
  	 * exactly (given correctly implemented float I/O code, anyway)
***************
*** 675,680 ****
--- 690,696 ----
  	printf(_("  -C, --create             include commands to create database in dump\n"));
  	printf(_("  -d, --inserts            dump data as INSERT, rather than COPY, commands\n"));
  	printf(_("  -D, --column-inserts     dump data as INSERT commands with column names\n"));
+ 	printf(_("  -E, --encoding=ENCODING  dump the data in encoding ENCODING\n"));
  	printf(_("  -n, --schema=SCHEMA      dump the named schema only\n"));
  	printf(_("  -o, --oids               include OIDs in dump\n"));
  	printf(_("  -O, --no-owner           skip restoration of object ownership\n"

pg_dump.sgml.diff (application/octet-stream)
*** pg_dump.sgml.orig	2005-06-28 21:19:28.000000000 +0100
--- pg_dump.sgml	2005-06-28 21:21:27.000000000 +0100
***************
*** 206,211 ****
--- 206,222 ----
        </listitem>
       </varlistentry>
  
+ 	 <varlistentry>
+ 	  <term><option>-E <replaceable class="parameter">encoding</replaceable></option></term>
+ 	  <listitem>
+ 	   <para>
+ 	    Create the dump in the specified encoding. By default, the dump is
+ 		created in the database encoding.
+ 	   </para>
+ 	  </listitem>
+      </varlistentry>
+ 
+ 
       <varlistentry>
        <term><option>-f <replaceable class="parameter">file</replaceable></option></term>
        <term><option>--file=<replaceable class="parameter">file</replaceable></option></term>

#5 Alvaro Herrera
alvherre@surnet.cl
In reply to: Magnus Hagander (#4)
Re: [HACKERS] Proposed TODO: --encoding option for pg_dump

On Tue, Jun 28, 2005 at 10:24:19PM +0200, Magnus Hagander wrote:

I *think* that's easy enough to do in time for 8.1. Trivial patch
attached. I hope it's enough :-) It passed my very quick testing...

(Yup, I read the mails about PGCLIENTENCODING, but an option to pg_dump
is certainly easier)

You forgot to document the long option, I think.

--
Alvaro Herrera (<alvherre[a]surnet.cl>)
"No necesitamos banderas
No reconocemos fronteras" (Jorge González)

#6 Magnus Hagander
mha@sollentuna.net
In reply to: Alvaro Herrera (#5)
1 attachment(s)
Re: [HACKERS] Proposed TODO: --encoding option for pg_dump

I *think* that's easy enough to do in time for 8.1. Trivial patch
attached. I hope it's enough :-) It passed my very quick testing...

(Yup, I read the mails about PGCLIENTENCODING, but an option to
pg_dump is certainly easier)

You forgot to document the long option, I think.

Oops. Fixed. Thanks.

//Magnus

Attachments:

pg_dump.sgml.diff (application/octet-stream)
*** pg_dump.sgml.orig	2005-06-28 21:19:28.000000000 +0100
--- pg_dump.sgml	2005-06-28 21:41:40.000000000 +0100
***************
*** 206,211 ****
--- 206,223 ----
        </listitem>
       </varlistentry>
  
+ 	 <varlistentry>
+ 	  <term><option>-E <replaceable class="parameter">encoding</replaceable></option></term>
+ 	  <term><option>--encoding=<replaceable class="parameter">encoding</replaceable></option></term>
+ 	  <listitem>
+ 	   <para>
+ 	    Create the dump in the specified encoding. By default, the dump is
+ 		created in the database encoding.
+ 	   </para>
+ 	  </listitem>
+      </varlistentry>
+ 
+ 
       <varlistentry>
        <term><option>-f <replaceable class="parameter">file</replaceable></option></term>
        <term><option>--file=<replaceable class="parameter">file</replaceable></option></term>

#7 Michael Paesold
mpaesold@gmx.at
In reply to: Magnus Hagander (#4)
Re: [HACKERS] Proposed TODO: --encoding option for pg_dump

Alvaro Herrera wrote:

On Tue, Jun 28, 2005 at 10:24:19PM +0200, Magnus Hagander wrote:

I *think* that's easy enough to do in time for 8.1. Trivial patch
attached. I hope it's enough :-) It passed my very quick testing...

(Yup, I read the mails about PGCLIENTENCODING, but an option to pg_dump
is certainly easier)

You forgot to document the long option, I think.

Are the man pages generated from the sgml docs? Have never had a look at
that.

Best Regards,
Michael Paesold

#8 Magnus Hagander
mha@sollentuna.net
In reply to: Michael Paesold (#7)
Re: [HACKERS] Proposed TODO: --encoding option for pg_dump

I *think* that's easy enough to do in time for 8.1. Trivial patch
attached. I hope it's enough :-) It passed my very quick testing...

(Yup, I read the mails about PGCLIENTENCODING, but an option to
pg_dump is certainly easier)

You forgot to document the long option, I think.

Are the man pages generated from the sgml docs? Have never
had a look at that.

Yes - using docbook2man.

//Magnus

#9 laser
laser@toping.com.cn
In reply to: Magnus Hagander (#4)
Re: Proposed TODO: --encoding option for pg_dump

I support adding the option. I've seen too many of our clients get
'bad' dumps just because they didn't set PGCLIENTENCODING correctly.
Mostly this happens because they use UTF8 as the database encoding
but some other encoding, like GBK, as the client encoding; some
characters break the automatic conversion in the present version, and
setting PGCLIENTENCODING to UTF8 fixes the problem.
Adding such a switch would remind DBAs that some encoding conversion
takes place. In fact, I even think that we should dump data in the
database encoding regardless of the PGCLIENTENCODING setting (unless
the user sets the --encoding switch explicitly), but I think
that might break someone's application somewhere. :(
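
The precedence discussed in this thread can be sketched as a small C function. This is an illustration only: the name effective_dump_encoding is hypothetical, and it assumes the server does not set its own client_encoding default in postgresql.conf, so that the fallback is the database encoding.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: pick the encoding a dump would end up in.
 *   1. an explicit --encoding value, if given;
 *   2. otherwise PGCLIENTENCODING, if set;
 *   3. otherwise the database encoding.
 * NULL or empty strings count as "not set" for the first two arguments.
 */
const char *
effective_dump_encoding(const char *cli_encoding,
						const char *env_encoding,
						const char *db_encoding)
{
	assert(db_encoding != NULL);

	if (cli_encoding != NULL && strlen(cli_encoding) > 0)
		return cli_encoding;
	if (env_encoding != NULL && strlen(env_encoding) > 0)
		return env_encoding;
	return db_encoding;
}
```

Under the stricter behaviour suggested above (always use the database encoding unless --encoding is given), the second step would simply be dropped.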

regards laser

#10 Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Magnus Hagander (#6)
Re: [HACKERS] Proposed TODO: --encoding option for pg_dump

Add pg_dump --encoding.

Patch applied. Thanks.

---------------------------------------------------------------------------

Magnus Hagander wrote:

I *think* that's easy enough to do in time for 8.1. Trivial patch
attached. I hope it's enough :-) It passed my very quick testing...

(Yup, I read the mails about PGCLIENTENCODING, but an option to
pg_dump is certainly easier)

You forgot to document the long option, I think.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073