client_encoding in dump file

Started by Pavel Stehule over 22 years ago | 6 messages | hackers
#1 Pavel Stehule
pavel.stehule@gmail.com

Hello

I am sending my first patch for PostgreSQL; it may be an ugly one. The patch
writes a line at the top of the dump file that sets the current client
encoding. This is useful for languages like Czech, which have more than one
widely used encoding. We need information about the encoding that was used.

Regards
Pavel Stehule

Attachments:

pg_backup_archiver.diff (text/plain; charset=US-ASCII), +41 -0
#2 Bruce Momjian
bruce@momjian.us
In reply to: Pavel Stehule (#1)
Re: [PATCHES] client_encoding in dump file

Pavel Stehule wrote:

Hello

I am sending my first patch for PostgreSQL; it may be an ugly one. The patch
writes a line at the top of the dump file that sets the current client
encoding. This is useful for languages like Czech, which have more than one
widely used encoding. We need information about the encoding that was used.

I found this patch interesting. How do we deal with restoring a database
with a different encoding from the one dumped? Does having SET at the
top help? (Also, we use diff -c.)

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
#3 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#2)
Re: [PATCHES] client_encoding in dump file

Bruce Momjian <pgman@candle.pha.pa.us> writes:

Pavel Stehule wrote:

I am sending my first patch for PostgreSQL; it may be an ugly one. The patch
writes a line at the top of the dump file that sets the current client
encoding. This is useful for languages like Czech, which have more than one
widely used encoding. We need information about the encoding that was used.

I found this patch interesting. How do we deal with restoring a database
with a different encoding from the one dumped? Does having SET at the
top help? (Also, we use diff -c.)

Yes, the SET should help; it will result in character encoding
translation to whatever the database encoding is. This has been
discussed before, IIRC.
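
To make the motivation concrete, here is a minimal sketch in C of why a
restore needs to know which encoding the dump was written in. It is not
PostgreSQL code: `decode_byte` is a hypothetical helper, the labels "LATIN2"
and "WIN1250" are used only as names for ISO-8859-2 and Windows-1250, and the
tables cover just a handful of bytes. The point is that the same byte stands
for different characters under the two encodings, so translation to the
database encoding can only be done correctly when the source encoding is
stated.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical helper, for illustration only: return the Unicode code
 * point that a single byte represents under the named encoding.
 * Only a few bytes relevant to Czech text are covered; 0 means
 * "not in this sketch's table".
 */
unsigned int
decode_byte(const char *encoding, unsigned char byte)
{
	assert(encoding != NULL);

	/* Both encodings agree with ASCII in the low half */
	if (byte < 0x80)
		return byte;

	if (strcmp(encoding, "LATIN2") == 0)	/* ISO-8859-2 */
	{
		switch (byte)
		{
			case 0xB9: return 0x0161;	/* s with caron */
			case 0xBE: return 0x017E;	/* z with caron */
		}
	}
	else if (strcmp(encoding, "WIN1250") == 0)	/* Windows-1250 */
	{
		switch (byte)
		{
			case 0x9A: return 0x0161;	/* s with caron */
			case 0x9E: return 0x017E;	/* z with caron */
			case 0xB9: return 0x0105;	/* a with ogonek */
			case 0xBE: return 0x013E;	/* l with caron */
		}
	}
	return 0;
}
```

Calling `decode_byte("LATIN2", 0xB9)` and `decode_byte("WIN1250", 0xB9)`
gives two different code points for the same stored byte, which is the
ambiguity a `SET client_encoding` line at the top of the dump removes.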

I was planning to commit this patch (perhaps after cleanup, haven't
looked at it yet) but it's not got to the top of the todo queue...

regards, tom lane

#4 Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#3)
Re: [PATCHES] client_encoding in dump file

Tom Lane wrote:

Bruce Momjian <pgman@candle.pha.pa.us> writes:

Pavel Stehule wrote:

I am sending my first patch for PostgreSQL; it may be an ugly one. The patch
writes a line at the top of the dump file that sets the current client
encoding. This is useful for languages like Czech, which have more than one
widely used encoding. We need information about the encoding that was used.

I found this patch interesting. How do we deal with restoring a database
with a different encoding from the one dumped? Does having SET at the
top help? (Also, we use diff -c.)

Yes, the SET should help; it will result in character encoding
translation to whatever the database encoding is. This has been
discussed before, IIRC.

I was planning to commit this patch (perhaps after cleanup, haven't
looked at it yet) but it's not got to the top of the todo queue...

Thanks. I will just keep it in my mailbox until it is dealt with.

#5 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Pavel Stehule (#1)
Re: client_encoding in dump file

Pavel Stehule <stehule@kix.fsv.cvut.cz> writes:

I am sending my first patch for PostgreSQL; it may be an ugly one. The patch
writes a line at the top of the dump file that sets the current client
encoding. This is useful for languages like Czech, which have more than one
widely used encoding. We need information about the encoding that was used.

This patch wouldn't work in the pg_dump/pg_restore case, because it
assumes that the original connection is still accessible when the output
script is being generated. You have to make an ArchiveEntry that can
be recorded in non-text archives. I've applied the attached patch
instead, which I think does things correctly.
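
The idea can be sketched in isolation, as a simplified illustration rather
than the pg_dump code itself: the encoding is stored as an ordinary entry in
the archive's table of contents at dump time, and the restore side looks it
up and emits it before anything else, with no live connection needed. The
`SketchTocEntry` struct and both function names below are hypothetical
stand-ins; only the "ENCODING" tag, the circular list with a dummy head, and
the `check_function_bodies` line are taken from the patch.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Simplified stand-in for a TOC entry; not the real pg_dump struct. */
typedef struct SketchTocEntry
{
	const char *desc;			/* entry type, e.g. "ENCODING" or "TABLE" */
	const char *defn;			/* SQL text stored in the archive */
	struct SketchTocEntry *next;	/* circular list; head is a dummy node */
} SketchTocEntry;

/* Return the stored SQL of the first "ENCODING" entry, or NULL if none. */
const char *
find_encoding_defn(const SketchTocEntry *head)
{
	const SketchTocEntry *te;

	assert(head != NULL);
	for (te = head->next; te != head; te = te->next)
	{
		if (strcmp(te->desc, "ENCODING") == 0)
			return te->defn;
	}
	return NULL;
}

/*
 * Write the fixed session settings into out: the encoding entry if the
 * archive has one, then the check_function_bodies setting and a blank
 * line. Returns what snprintf returns.
 */
int
write_fixed_state(const SketchTocEntry *head, char *out, size_t outlen)
{
	const char *enc = find_encoding_defn(head);

	return snprintf(out, outlen,
					"%sSET check_function_bodies = false;\n\n",
					enc ? enc : "");
}
```

Because the setting lives in the table of contents, it survives in any
archive format, and an archive written without such an entry simply produces
no encoding line.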

regards, tom lane

PS: please send future patches in "diff -c" format. With plain diff
output it's impossible to tell where insertions really go ... especially
since you did not specify what version you made the diff against ...

Index: pg_backup_archiver.c
===================================================================
RCS file: /cvsroot/pgsql-server/src/bin/pg_dump/pg_backup_archiver.c,v
retrieving revision 1.79.2.1
diff -c -r1.79.2.1 pg_backup_archiver.c
*** pg_backup_archiver.c	4 Jan 2004 04:02:22 -0000	1.79.2.1
--- pg_backup_archiver.c	24 Feb 2004 03:27:04 -0000
***************
*** 48,53 ****
--- 48,54 ----
  		 const int compression, ArchiveMode mode);
  static int	_printTocEntry(ArchiveHandle *AH, TocEntry *te, RestoreOptions *ropt, bool isData);
+ static void _doSetFixedOutputState(ArchiveHandle *AH);
  static void _doSetSessionAuth(ArchiveHandle *AH, const char *user);
  static void _reconnectToDB(ArchiveHandle *AH, const char *dbname, const char *user);
  static void _becomeUser(ArchiveHandle *AH, const char *user);
***************
*** 205,210 ****
--- 206,216 ----
  	ahprintf(AH, "--\n-- PostgreSQL database dump\n--\n\n");
  	/*
+ 	 * Establish important parameter values right away.
+ 	 */
+ 	_doSetFixedOutputState(AH);
+ 
+ 	/*
  	 * Drop the items at the start, in reverse order
  	 */
  	if (ropt->dropSchema)
***************
*** 1703,1709 ****
  	AH->currUser = strdup("");	/* So it's valid, but we can free() it
  								 * later if necessary */
  	AH->currSchema = strdup("");	/* ditto */
- 	AH->chk_fn_bodies = true;	/* assumed default state */
  	AH->toc = (TocEntry *) calloc(1, sizeof(TocEntry));
  	if (!AH->toc)
--- 1709,1714 ----
***************
*** 1935,1940 ****
--- 1940,1949 ----
  {
  	teReqs		res = 3;		/* Schema = 1, Data = 2, Both = 3 */
+ 	/* ENCODING objects are dumped specially, so always reject here */
+ 	if (strcmp(te->desc, "ENCODING") == 0)
+ 		return 0;
+ 
  	/* If it's an ACL, maybe ignore it */
  	if (ropt->aclsSkip && strcmp(te->desc, "ACL") == 0)
  		return 0;
***************
*** 2020,2025 ****
--- 2029,2061 ----
  }
  /*
+  * Issue SET commands for parameters that we want to have set the same way
+  * at all times during execution of a restore script.
+  */
+ static void
+ _doSetFixedOutputState(ArchiveHandle *AH)
+ {
+ 	TocEntry   *te;
+ 
+ 	/* If we have an encoding setting, emit that */
+ 	te = AH->toc->next;
+ 	while (te != AH->toc)
+ 	{
+ 		if (strcmp(te->desc, "ENCODING") == 0)
+ 		{
+ 			ahprintf(AH, "%s", te->defn);
+ 			break;
+ 		}
+ 		te = te->next;
+ 	}
+ 
+ 	/* Make sure function checking is disabled */
+ 	ahprintf(AH, "SET check_function_bodies = false;\n");
+ 
+ 	ahprintf(AH, "\n");
+ }
+ 
+ /*
   * Issue a SET SESSION AUTHORIZATION command.  Caller is responsible
   * for updating state if appropriate.  If user is NULL or an empty string,
   * the specification DEFAULT will be used.
***************
*** 2100,2106 ****
  		free(AH->currSchema);
  	AH->currSchema = strdup("");

! 	AH->chk_fn_bodies = true;	/* assumed default state */
  }

  /*
--- 2136,2143 ----
  		free(AH->currSchema);
  	AH->currSchema = strdup("");

! 	/* re-establish fixed state */
! 	_doSetFixedOutputState(AH);
  }

  /*
***************
*** 2195,2207 ****
  	/* Select owner and schema as necessary */
  	_becomeOwner(AH, te);
  	_selectOutputSchema(AH, te->namespace);
- 
- 	/* If it's a function, make sure function checking is disabled */
- 	if (AH->chk_fn_bodies && strcmp(te->desc, "FUNCTION") == 0)
- 	{
- 		ahprintf(AH, "SET check_function_bodies = false;\n\n");
- 		AH->chk_fn_bodies = false;
- 	}

  	if (isData)
  		pfx = "Data for ";
--- 2232,2237 ----
Index: pg_backup_archiver.h
===================================================================
RCS file: /cvsroot/pgsql-server/src/bin/pg_dump/pg_backup_archiver.h,v
retrieving revision 1.52
diff -c -r1.52 pg_backup_archiver.h
*** pg_backup_archiver.h	3 Oct 2003 20:10:59 -0000	1.52
--- pg_backup_archiver.h	24 Feb 2004 03:27:04 -0000
***************
*** 241,247 ****
  	/* these vars track state to avoid sending redundant SET commands */
  	char	   *currUser;		/* current username */
  	char	   *currSchema;		/* current schema */
- 	bool		chk_fn_bodies;	/* current state of check_function_bodies */
  	void	   *lo_buf;
  	size_t		lo_buf_used;
--- 241,246 ----
Index: pg_dump.c
===================================================================
RCS file: /cvsroot/pgsql-server/src/bin/pg_dump/pg_dump.c,v
retrieving revision 1.355.2.2
diff -c -r1.355.2.2 pg_dump.c
*** pg_dump.c	22 Jan 2004 19:09:48 -0000	1.355.2.2
--- pg_dump.c	24 Feb 2004 03:27:05 -0000
***************
*** 115,120 ****
--- 115,121 ----
  static const char *fmtQualifiedId(const char *schema, const char *id);
  static int	dumpBlobs(Archive *AH, char *, void *);
  static int	dumpDatabase(Archive *AH);
+ static void dumpEncoding(Archive *AH);
  static const char *getAttrName(int attrnum, TableInfo *tblInfo);
  static const char *fmtCopyColumnList(const TableInfo *ti);
***************
*** 547,552 ****
--- 548,556 ----
  			write_msg(NULL, "last built-in OID is %u\n", g_last_builtin_oid);
  	}
+ 	/* First the special encoding entry. */
+ 	dumpEncoding(g_fout);
+ 
  	/* Dump the database definition */
  	if (!dataOnly)
  		dumpDatabase(g_fout);
***************
*** 1241,1246 ****
--- 1245,1305 ----
  	destroyPQExpBuffer(creaQry);
  	return 1;
+ }
+ 
+ 
+ /*
+  * dumpEncoding: put the correct encoding into the archive
+  */
+ static void
+ dumpEncoding(Archive *AH)
+ {
+ 	PQExpBuffer qry;
+ 	PGresult   *res;
+ 
+ 	/* Can't read the encoding from pre-7.3 servers (SHOW isn't a query) */
+ 	if (AH->remoteVersion < 70300)
+ 		return;
+ 
+ 	if (g_verbose)
+ 		write_msg(NULL, "saving encoding\n");
+ 
+ 	qry = createPQExpBuffer();
+ 
+ 	appendPQExpBuffer(qry, "SHOW client_encoding");
+ 
+ 	res = PQexec(g_conn, qry->data);
+ 	if (!res ||
+ 		PQresultStatus(res) != PGRES_TUPLES_OK ||
+ 		PQntuples(res) != 1)
+ 	{
+ 		write_msg(NULL, "SQL command failed\n");
+ 		write_msg(NULL, "Error message from server: %s", PQerrorMessage(g_conn));
+ 		write_msg(NULL, "The command was: %s\n", qry->data);
+ 		exit_nicely();
+ 	}
+ 
+ 	resetPQExpBuffer(qry);
+ 
+ 	appendPQExpBuffer(qry, "SET client_encoding = ");
+ 	appendStringLiteral(qry, PQgetvalue(res, 0, 0), true);
+ 	appendPQExpBuffer(qry, ";\n");
+ 
+ 	ArchiveEntry(AH, "0",		/* OID */
+ 				 "ENCODING",	/* Name */
+ 				 NULL,			/* Namespace */
+ 				 "",			/* Owner */
+ 				 "ENCODING",	/* Desc */
+ 				 NULL,			/* Deps */
+ 				 qry->data,		/* Create */
+ 				 "",			/* Del */
+ 				 NULL,			/* Copy */
+ 				 NULL,			/* Dumper */
+ 				 NULL);			/* Dumper Arg */
+ 
+ 	PQclear(res);
+ 
+ 	destroyPQExpBuffer(qry);
  }
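
For readers without the pg_dump sources at hand, the statement that
dumpEncoding stores can be approximated by the following stand-alone sketch.
It is an assumption-laden simplification: `build_set_encoding` is a
hypothetical function, it writes into a caller-supplied buffer instead of a
PQExpBuffer, and its quoting only doubles single quotes and backslashes,
which is less than the real appendStringLiteral may do.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: build "SET client_encoding = '<name>';\n" in out,
 * doubling single quotes and backslashes inside the literal.
 * Returns 0 on success, -1 if the buffer is too small.
 */
int
build_set_encoding(const char *name, char *out, size_t outlen)
{
	static const char prefix[] = "SET client_encoding = '";
	static const char suffix[] = "';\n";
	const size_t plen = sizeof(prefix) - 1;
	const size_t slen = sizeof(suffix) - 1;
	size_t		pos;
	const char *p;

	assert(name != NULL && out != NULL);

	/* Room for prefix, suffix and the terminating NUL at minimum */
	if (outlen < plen + slen + 1)
		return -1;

	memcpy(out, prefix, plen);
	pos = plen;

	for (p = name; *p != '\0'; p++)
	{
		size_t		need = (*p == '\'' || *p == '\\') ? 2 : 1;

		/* Always keep space for the suffix and the NUL */
		if (pos + need + slen + 1 > outlen)
			return -1;
		if (need == 2)
			out[pos++] = *p;	/* emit the character twice to escape it */
		out[pos++] = *p;
	}

	memcpy(out + pos, suffix, slen + 1);	/* copies the NUL as well */
	return 0;
}
```

With a 64-byte buffer, `build_set_encoding("LATIN2", buf, sizeof buf)`
yields the kind of line that ends up at the top of the restore script.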
#6 Bruce Momjian
bruce@momjian.us
In reply to: Pavel Stehule (#1)
Re: client_encoding in dump file

Pavel Stehule wrote:

Hello

I am sending my first patch for PostgreSQL; it may be an ugly one. The patch
writes a line at the top of the dump file that sets the current client
encoding. This is useful for languages like Czech, which have more than one
widely used encoding. We need information about the encoding that was used.

FYI, this was fixed in 7.4.2.
