dollar quoting and pg_dump

Started by Andrew Dunstanalmost 22 years ago11 messages
#1Andrew Dunstan
andrew@dunslane.net

I had a brief look at this today. Basically, I thought of adding a new
routine to dumputils.c thus:

void appendStringLiteralDQ(PQExpBuffer buf, const char *str, const
char *dqprefix)

and using it in dumping function bodies and comments on all objects,
with a prefix argument of "function" and "comment" respectively. There
might be other places where we want to use dollar quoting, but this
would be a good start, ISTM.

Basically, this routine would start with $ (+ dqprefix if not null) and
then keep adding characters (in turn "_1234567890") until that string
was not found in str, then appending "$" and using that as the delimiter.

Thoughts?

cheers

andrew

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andrew Dunstan (#1)
Re: dollar quoting and pg_dump

Andrew Dunstan <andrew@dunslane.net> writes:

... using it in dumping function bodies and comments on all objects,
with a prefix argument of "function" and "comment" respectively. There
might be other places where we want to use dollar quoting, but this
would be a good start, ISTM.

Do we really need to be that verbose? Why not start with the minimal $$
and extend only if needed? On the KISS principle, trying "$$", "$X$",
"$XX$", "$XXX$", etc seems sufficient.

For that matter, I'm not convinced we should use $$ for comments. They
don't have nearly the problem that functions do with embedded quotes.

A thought: maybe just put this logic into the regular
appendStringLiteral routine, and trigger it when the string contains any
quotes or backslashes; if it has none, you can just use quotes ...

BTW, I've been holding off making this change myself, realizing that it
will completely break backwards compatibility of pg_dump output to 7.4
and earlier. Not sure if anyone is trying to use CVS tip pg_dump with
older releases, but it seems possible given that the dump ordering issue
is finally solved. Might be a good idea to make it disablable with a
fallback to regular quoting.

regards, tom lane

#3Andrew Dunstan
andrew@dunslane.net
In reply to: Tom Lane (#2)
Re: dollar quoting and pg_dump

Tom Lane wrote:

Andrew Dunstan <andrew@dunslane.net> writes:

... using it in dumping function bodies and comments on all objects,
with a prefix argument of "function" and "comment" respectively. There
might be other places where we want to use dollar quoting, but this
would be a good start, ISTM.

Do we really need to be that verbose? Why not start with the minimal $$
and extend only if needed? On the KISS principle, trying "$$", "$X$",
"$XX$", "$XXX$", etc seems sufficient.

It's a matter of taste, I guess. I'm certainly not dogmatic about it.
The function design in my head is flexible enough for either.

For that matter, I'm not convinced we should use $$ for comments. They
don't have nearly the problem that functions do with embedded quotes.

Well, I keep a master schema file and I like to decorate it with fairly
verbose comments, so users can see what the object is for and how it
works. I've been caught a few times with forgetting to double quotes
inside the comments. But again, Maybe you are right, and we should at
least start with just the obvious case.

A thought: maybe just put this logic into the regular
appendStringLiteral routine, and trigger it when the string contains any
quotes or backslashes; if it has none, you can just use quotes ...

I did think of fallback, but rejected it on the KISS principle :-) I
also prefer consistency in style - I want all my functions dollar quoted
even if they don't currently contain characters in need of escape.

BTW, I've been holding off making this change myself, realizing that it
will completely break backwards compatibility of pg_dump output to 7.4
and earlier. Not sure if anyone is trying to use CVS tip pg_dump with
older releases, but it seems possible given that the dump ordering issue
is finally solved. Might be a good idea to make it disablable with a
fallback to regular quoting.

Makes sense. "-X disable-dollar-quoting"? Or we could have it turned off
by default and require it to be specifically turned on - that might
conform to the principle of least surprise, at least for now.

For that matter, we could also have a "verbose-dollar-quoting" feature,
and/or a "dollar-quote-objects=functions,comments,......" feature

But let's walk before we start to run ;-)

cheers

andrew

#4Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andrew Dunstan (#3)
Re: dollar quoting and pg_dump

Andrew Dunstan <andrew@dunslane.net> writes:

Tom Lane wrote:

A thought: maybe just put this logic into the regular
appendStringLiteral routine, and trigger it when the string contains any
quotes or backslashes; if it has none, you can just use quotes ...

I did think of fallback, but rejected it on the KISS principle :-) I
also prefer consistency in style - I want all my functions dollar quoted
even if they don't currently contain characters in need of escape.

Good point. Never mind that idea then.

Might be a good idea to make it disablable with a
fallback to regular quoting.

Makes sense. "-X disable-dollar-quoting"? Or we could have it turned off
by default and require it to be specifically turned on - that might
conform to the principle of least surprise, at least for now.

I don't mind if it's on by default; just thinking that some people might
appreciate a way to turn it off. "-X disable-dollar-quoting" sounds
fine.

regards, tom lane

#5Andrew Dunstan
andrew@dunslane.net
In reply to: Tom Lane (#4)
1 attachment(s)

Here's my attempt, as discussed earlier today. As always, comments
welcome. I did provide (and use) a fallback mechanism after all, for
the case of a function with a non-empty probin. A few examples from a
dump of the regression db:

--
-- Name: tg_hub_adjustslots(character, integer, integer); Type:
FUNCTION; Schema
: public; Owner: andrew
--

CREATE FUNCTION tg_hub_adjustslots(character, integer, integer) RETURNS
integer
AS $_$
declare
hname alias for $1;
oldnslots alias for $2;
newnslots alias for $3;
begin
if newnslots = oldnslots then
return 0;
end if;
if newnslots < oldnslots then
delete from HSlot where hubname = hname and slotno > newnslots;
return 0;
end if;
for i in oldnslots + 1 .. newnslots loop
insert into HSlot (slotname, hubname, slotno, slotlink)
values ('HS.dummy', hname, i, '');
end loop;
return 0;
end;
$_$
LANGUAGE plpgsql;

--
-- Name: ttdummy(); Type: FUNCTION; Schema: public; Owner: andrew
--

CREATE FUNCTION ttdummy() RETURNS "trigger"
AS '/home/andrew/pgwork/tip/pgsql/src/test/regress/regress.so',
'ttdummy'
LANGUAGE c;

--
-- Name: user_relns(); Type: FUNCTION; Schema: public; Owner: andrew
--

CREATE FUNCTION user_relns() RETURNS SETOF name
AS $$select relname
from pg_class c, pg_namespace n
where relnamespace = n.oid and
(nspname !~ 'pg_.*' and nspname <> 'information_schema') and
relkind <> 'i' $$
LANGUAGE sql;

cheers

andrew

Attachments:

dumpdq.patchtext/plain; name=dumpdq.patchDownload
Index: doc/src/sgml/ref/pg_dump.sgml
===================================================================
RCS file: /projects/cvsroot/pgsql-server/doc/src/sgml/ref/pg_dump.sgml,v
retrieving revision 1.68
diff -c -r1.68 pg_dump.sgml
*** doc/src/sgml/ref/pg_dump.sgml	1 Dec 2003 22:07:58 -0000	1.68
--- doc/src/sgml/ref/pg_dump.sgml	23 Mar 2004 19:30:59 -0000
***************
*** 461,466 ****
--- 461,477 ----
       </varlistentry>
  
       <varlistentry>
+       <term><option>-X disable-dollar-quoting</></term>
+       <term><option>--disable-dollar-quoting</></term>
+       <listitem>
+        <para>
+         This option disables the use of dollar quoting for function bodies,
+         and forces them to be quoted using SQL standard strings.
+        </para>
+      </listitem>
+     </varlistentry>
+ 
+      <varlistentry>
        <term><option>-Z <replaceable class="parameter">0..9</replaceable></option></term>
        <term><option>--compress=<replaceable class="parameter">0..9</replaceable></option></term>
        <listitem>
Index: src/bin/pg_dump/dumputils.c
===================================================================
RCS file: /projects/cvsroot/pgsql-server/src/bin/pg_dump/dumputils.c,v
retrieving revision 1.11
diff -c -r1.11 dumputils.c
*** src/bin/pg_dump/dumputils.c	7 Jan 2004 00:44:21 -0000	1.11
--- src/bin/pg_dump/dumputils.c	23 Mar 2004 19:30:59 -0000
***************
*** 143,148 ****
--- 143,205 ----
  
  
  /*
+  * Convert a string value to a dollar quoted literal and append it to
+  * the given buffer. If the dqprefix parameter is not NULL then the 
+  * dollar quote delimiter will begin with that (after the opening $).
+  *
+  * No escaping is done at all on str, in compliance with the rules
+  * for parsing dollar quoted strings.
+  */
+ void
+ appendStringLiteralDQ(PQExpBuffer buf, const char *str, const char *dqprefix)
+ {
+ 	static char suffixes[] = "_XXXXXXXX";
+ 	int nextchar = 0;
+ 	PQExpBuffer delimBuf = createPQExpBuffer();
+ 
+ 	/* start with $ + dqprefix if not NULL */
+ 	appendPQExpBufferChar(delimBuf, '$');
+ 	if (dqprefix)
+ 		appendPQExpBuffer(delimBuf, dqprefix);
+ 
+ 	/* 
+ 	 * make sure we have a delimiter which (without the trailing $)
+ 	 * is not preent in the string being quoted. We don't check with the
+ 	 * trailing $ so that a string ending in $foo is not quoted with
+ 	 * $foo$.
+ 	 */
+ 	while (strstr(str, delimBuf->data) != NULL)
+ 	{
+ 		appendPQExpBufferChar(delimBuf, suffixes[nextchar++]);
+ 		nextchar %= sizeof(suffixes)-1;
+ 	}
+ 
+ 	/* add trailing $ */
+ 	appendPQExpBufferChar(delimBuf, '$');
+ 
+ 	/* quote it and we are all done */
+ 	appendPQExpBuffer(buf,"%s%s%s",delimBuf->data,str,delimBuf->data);
+ 	destroyPQExpBuffer(delimBuf);
+ 	
+ }
+ 
+ /*
+  * use dollar quoting if the string to be quoted contains ' or \,
+  * otherwise use standard quoting.
+  */
+ 
+ 
+ void appendStringLiteralDQFallback(PQExpBuffer buf, const char *str, 
+ 				bool escapeAll, const char *dqprefix)
+ {
+ 	if (strchr(str,'\'') == NULL && strchr(str,'\\') == NULL)
+ 		appendStringLiteral(buf,str,escapeAll);
+ 	else
+ 		appendStringLiteralDQ(buf,str,dqprefix);
+ }
+ 
+ 
+ /*
   * Convert backend's version string into a number.
   */
  int
Index: src/bin/pg_dump/dumputils.h
===================================================================
RCS file: /projects/cvsroot/pgsql-server/src/bin/pg_dump/dumputils.h,v
retrieving revision 1.10
diff -c -r1.10 dumputils.h
*** src/bin/pg_dump/dumputils.h	7 Jan 2004 00:44:21 -0000	1.10
--- src/bin/pg_dump/dumputils.h	23 Mar 2004 19:30:59 -0000
***************
*** 21,26 ****
--- 21,30 ----
  extern const char *fmtId(const char *identifier);
  extern void appendStringLiteral(PQExpBuffer buf, const char *str,
  					bool escapeAll);
+ extern void appendStringLiteralDQ(PQExpBuffer buf, const char *str, 
+ 				const char *dqprefix);
+ extern void appendStringLiteralDQFallback(PQExpBuffer buf, const char *str, 
+ 				bool escapeAll, const char *dqprefix);
  extern int	parse_version(const char *versionString);
  extern bool parsePGArray(const char *atext, char ***itemarray, int *nitems);
  extern bool buildACLCommands(const char *name, const char *type,
Index: src/bin/pg_dump/pg_dump.c
===================================================================
RCS file: /projects/cvsroot/pgsql-server/src/bin/pg_dump/pg_dump.c,v
retrieving revision 1.368
diff -c -r1.368 pg_dump.c
*** src/bin/pg_dump/pg_dump.c	20 Mar 2004 20:09:45 -0000	1.368
--- src/bin/pg_dump/pg_dump.c	23 Mar 2004 19:31:02 -0000
***************
*** 107,112 ****
--- 107,115 ----
  static NamespaceInfo *g_namespaces;
  static int	g_numNamespaces;
  
+ /* flag to turn on/off dollar quoting */
+ static int     disable_dollar_quoting = 0;
+ 
  
  static void help(const char *progname);
  static NamespaceInfo *findNamespace(Oid nsoid, Oid objoid);
***************
*** 233,238 ****
--- 236,242 ----
  		 */
  		{"use-set-session-authorization", no_argument, &use_setsessauth, 1},
  		{"disable-triggers", no_argument, &disable_triggers, 1},
+ 		{"disable-dollar-quoting", no_argument, &disable_dollar_quoting, 1},
  
  		{NULL, 0, NULL, 0}
  	};
***************
*** 389,394 ****
--- 393,400 ----
  					/* no-op, still allowed for compatibility */ ;
  				else if (strcmp(optarg, "disable-triggers") == 0)
  					disable_triggers = 1;
+ 				else if (strcmp(optarg,"disable-dollar-quoting") == 0)
+ 					disable_dollar_quoting = true;
  				else
  				{
  					fprintf(stderr,
***************
*** 681,686 ****
--- 687,694 ----
  	printf(_("  -x, --no-privileges      do not dump privileges (grant/revoke)\n"));
  	printf(_("  -X disable-triggers, --disable-triggers\n"
  			 "                           disable triggers during data-only restore\n"));
+ 	printf(_("  -X disable-dollar-quoting, --disable-dollar-quoting\n"
+ 			 "                           disable dollar quoting, use SQL standard quoting\n"));
  
  	printf(_("\nConnection options:\n"));
  	printf(_("  -h, --host=HOSTNAME      database server host or socket directory\n"));
***************
*** 5076,5082 ****
  		if (strcmp(prosrc, "-") != 0)
  		{
  			appendPQExpBuffer(asPart, ", ");
! 			appendStringLiteral(asPart, prosrc, false);
  		}
  	}
  	else
--- 5084,5097 ----
  		if (strcmp(prosrc, "-") != 0)
  		{
  			appendPQExpBuffer(asPart, ", ");
! 			/* 
! 			 * where we have bin, use dollar quoting if allowed 
! 			 * conditionally on src 
! 			 */
! 			if (disable_dollar_quoting)
! 				appendStringLiteral(asPart, prosrc, false);
! 			else
! 				appendStringLiteralDQFallback(asPart, prosrc, false, NULL);
  		}
  	}
  	else
***************
*** 5084,5090 ****
  		if (strcmp(prosrc, "-") != 0)
  		{
  			appendPQExpBuffer(asPart, "AS ");
! 			appendStringLiteral(asPart, prosrc, false);
  		}
  	}
  
--- 5099,5109 ----
  		if (strcmp(prosrc, "-") != 0)
  		{
  			appendPQExpBuffer(asPart, "AS ");
! 			/* with no bin, dollar quote src unconditionally if allowed */
! 			if (disable_dollar_quoting)
! 				appendStringLiteral(asPart, prosrc, false);
! 			else
! 				appendStringLiteralDQ(asPart, prosrc, NULL);
  		}
  	}
  
#6Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andrew Dunstan (#5)
Re: dollar quoting and pg_dump

Andrew Dunstan <andrew@dunslane.net> writes:

Here's my attempt, as discussed earlier today. As always, comments
welcome. I did provide (and use) a fallback mechanism after all, for
the case of a function with a non-empty probin.

Applied with light editorializing ("fallback" seemed a weird description
of what that function was doing, for instance).

At this point I think we are done with the coding part of this project,
and it's time to get to work on the documentation ...

regards, tom lane

#7Andrew Dunstan
andrew@dunslane.net
In reply to: Tom Lane (#6)
Re: dollar quoting and pg_dump

Tom Lane wrote:

Andrew Dunstan <andrew@dunslane.net> writes:

Here's my attempt, as discussed earlier today. As always, comments
welcome. I did provide (and use) a fallback mechanism after all, for
the case of a function with a non-empty probin.

Applied with light editorializing ("fallback" seemed a weird description
of what that function was doing, for instance).

At this point I think we are done with the coding part of this project,
and it's time to get to work on the documentation ...

It is being worked on.

cheers

andrew

#8Christopher Kings-Lynne
chriskl@familyhealth.com.au
In reply to: Tom Lane (#4)
Re: dollar quoting and pg_dump

I don't mind if it's on by default; just thinking that some people might
appreciate a way to turn it off. "-X disable-dollar-quoting" sounds
fine.

Does it _have_ to be dollars? Other languages call this feature
'heretext' IIRC.

Chris

#9Andrew Dunstan
andrew@dunslane.net
In reply to: Christopher Kings-Lynne (#8)
Re: dollar quoting and pg_dump

Christopher Kings-Lynne wrote:

I don't mind if it's on by default; just thinking that some people might
appreciate a way to turn it off. "-X disable-dollar-quoting" sounds
fine.

Does it _have_ to be dollars? Other languages call this feature
'heretext' IIRC.

This is not heretext, and calling it such would be quite misleading.
Nobody has yet come up with a better name for it.

cheers

andrew

#10Jon Jensen
jon@endpoint.com
In reply to: Christopher Kings-Lynne (#8)
Re: dollar quoting and pg_dump

On Wed, 24 Mar 2004, Christopher Kings-Lynne wrote:

I don't mind if it's on by default; just thinking that some people might
appreciate a way to turn it off. "-X disable-dollar-quoting" sounds
fine.

Does it _have_ to be dollars? Other languages call this feature
'heretext' IIRC.

No, but I think the rough consensus was that the behavior of this feature
is different enough from shell and Perl here documents that it deserved a
different name. In particular the $ is part of both the opening and
closing token, and newlines aren't relevant at either the beginning or
end.

Jon

#11Tom Lane
tgl@sss.pgh.pa.us
In reply to: Christopher Kings-Lynne (#8)
Re: dollar quoting and pg_dump

Christopher Kings-Lynne <chriskl@familyhealth.com.au> writes:

Does it _have_ to be dollars? Other languages call this feature
'heretext' IIRC.

I'm not in love with the name "dollar quoting" either ... but "here
text" would be quite misleading. See the archives for the discussions
that led us to develop our definition. The features that go by that
name usually have dependencies on formatting (ie newline boundaries)
that we specifically rejected.

Right now would be a fine time to invent a better name if you can think
of one ...

regards, tom lane