patches for items from TODO list

Started by Sergey Tenover 20 years ago64 messages
#1Sergey Ten
sergey@sourcelabs.com

Hello all,

We would like to contribute to the Postgresql community by implementing
the following items from the TODO list
(http://developer.postgresql.org/todo.php):
. Allow COPY to understand \x as a hex byte . Allow COPY to optionally
include column headings in the first line . Add XML output to COPY

The changes are straightforward and include implementation of the
features as well as modification of the regression tests and documentation.

Before sending a diff file with the changes, we would like to know if
these features have been already implemented.

Best regards,
Jason Lucas and Sergey Ten
SourceLabs

Dependable Open Source Systems

#2Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Sergey Ten (#1)
Re: patches for items from TODO list

Sergey Ten wrote:

Hello all,

We would like to contribute to the Postgresql community by implementing
the following items from the TODO list
(http://developer.postgresql.org/todo.php):
. Allow COPY to understand \x as a hex byte . Allow COPY to optionally
include column headings in the first line . Add XML output to COPY

The changes are straightforward and include implementation of the
features as well as modification of the regression tests and documentation.

Before sending a diff file with the changes, we would like to know if
these features have been already implemented.

Please check the web site version. Someone has already implemented
"Allow COPY to optionally include column headings in the first line".

As far as XML, there has been discussion on where that should be done?
In the backend, libpq, or psql. It will need discussion on hackers. I
assume you have read the developer's FAQ too.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
#3Christopher Kings-Lynne
chriskl@familyhealth.com.au
In reply to: Bruce Momjian (#2)
Re: patches for items from TODO list

Please check the web site version. Someone has already implemented
"Allow COPY to optionally include column headings in the first line".

As far as XML, there has been discussion on where that should be done?
In the backend, libpq, or psql. It will need discussion on hackers. I
assume you have read the developer's FAQ too.

The other issue is 'what XML format'? Find me a standard data dump XML
DTD and I'll change phpPgAdmin to use it as well.

Otherwise, phpPgAdmin's is quite simple.

Chris

#4Sergey Ten
sergey@sourcelabs.com
In reply to: Christopher Kings-Lynne (#3)
Re: patches for items from TODO list

Thank you to all who replied for your time.

We have checked http://momjian.postgresql.org/cgi-bin/pgpatches and
http://momjian.postgresql.org/cgi-bin/pgpatches2, and could not find patches
with words "copy" or "xml" in the subject. Could you please clarify what the
website version is and where it can be found? Is it sources available at
ftp://ftp.postgresql.org?

Best regards,
Jason, Sergey

Show quoted text

-----Original Message-----
From: Christopher Kings-Lynne [mailto:chriskl@familyhealth.com.au]
Sent: Wednesday, May 11, 2005 7:36 PM
To: Bruce Momjian
Cc: Sergey Ten; pgsql-hackers@postgresql.org; jason@sourcelabs.com
Subject: Re: [HACKERS] patches for items from TODO list

Please check the web site version. Someone has already implemented
"Allow COPY to optionally include column headings in the first line".

As far as XML, there has been discussion on where that should be done?
In the backend, libpq, or psql. It will need discussion on hackers. I
assume you have read the developer's FAQ too.

The other issue is 'what XML format'? Find me a standard data dump XML
DTD and I'll change phpPgAdmin to use it as well.

Otherwise, phpPgAdmin's is quite simple.

Chris

#5Michael Paesold
mpaesold@gmx.at
In reply to: Sergey Ten (#4)
Re: patches for items from TODO list

Sergey Ten wrote:

Thank you to all who replied for your time.

We have checked http://momjian.postgresql.org/cgi-bin/pgpatches and
http://momjian.postgresql.org/cgi-bin/pgpatches2, and could not find
patches
with words "copy" or "xml" in the subject. Could you please clarify what
the
website version is and where it can be found? Is it sources available at
ftp://ftp.postgresql.org?

I think the website version of the TODO is that one on www.postgresql.org
(see http://www.postgresql.org/docs/faqs.TODO.html, from
Developers->Roadmap).
"-Allow COPY to optionally include column headings in the first line" is
already completed (-). That is what Bruce referenced (below).

The patch has already been commited and is therefore not on the patches list
anymore:
http://archives.postgresql.org/pgsql-committers/2005-05/msg00075.php

(I searched for "copy" in the mailing list archives of pgsql-committers.)

Bruce Momjian wrote:

Please check the web site version. Someone has already implemented
"Allow COPY to optionally include column headings in the first line".

Hope that clarifies the situation a little bit.

Best Regards,
Michael Paesold

#6Sergey Ten
sergey@sourcelabs.com
In reply to: Bruce Momjian (#2)
1 attachment(s)
Re: patches for items from TODO list

Hello all,

Thank you to all who replied for suggestions and help. Enclosed please find
code changes for the following items:
- Allow COPY to understand \x as a hex byte, and
- Add XML output to COPY
The changes include implementation of the features as well as modification
of the copy regression test.

After a careful consideration we decided to
- put XML implementation in the backend and
- use XML format described below, with justification of our decision.

The XML schema used by the COPY TO command was designed for ease of use and
to avoid the problem of column names appearing in XML element names.
XML doesn't allow spaces and punctuation in element names but Postgres does
allow these characters in column names; therefore, a direct mapping would be
problematic.

The solution selected places the column names into attribute fields where
any special characters they contain can be properly escaped using XML
entities. An additional attribute is used to distinguish null fields from
empty ones.

The example below is taken from the test suite. It demonstrates some basic
XML escaping in row 2. Row 3 demonstrates the difference between an empty
string (in col2) and a null string (in col3). If a field is null it will
always be empty but a field which is empty may or may not be null.
Always check the value of the 'null' attribute to be sure when a field is
truly null.

<?xml version='1.0'?>
<table>
<row>
<col name='col1' null='n'>Jackson, Sam</col>
<col name='col2' null='n'>\h</col>
</row>
<row>
<col name='col1' null='n'>It is &quot;perfect&quot;.</col>
<col name='col2' null='n'>&#09;</col>
</row>
<row>
<col name='col1' null='n'></col>
<col name='col2' null='y'></col>
</row>
</table>

Please let us know if about any concerns, objections the proposed change may
cause.

Best regards,
Jason Lucas, Sergey Ten
SourceLabs

Show quoted text

-----Original Message-----
From: Bruce Momjian [mailto:pgman@candle.pha.pa.us]
Sent: Wednesday, May 11, 2005 7:11 PM
To: Sergey Ten
Cc: pgsql-hackers@postgresql.org; jason@sourcelabs.com
Subject: Re: [HACKERS] patches for items from TODO list

Sergey Ten wrote:

Hello all,

We would like to contribute to the Postgresql community by implementing
the following items from the TODO list
(http://developer.postgresql.org/todo.php):
. Allow COPY to understand \x as a hex byte . Allow COPY to optionally
include column headings in the first line . Add XML output to COPY

The changes are straightforward and include implementation of the
features as well as modification of the regression tests and

documentation.

Before sending a diff file with the changes, we would like to know if
these features have been already implemented.

Please check the web site version. Someone has already implemented
"Allow COPY to optionally include column headings in the first line".

As far as XML, there has been discussion on where that should be done?
In the backend, libpq, or psql. It will need discussion on hackers. I
assume you have read the developer's FAQ too.

--
Bruce Momjian                        |  http://candle.pha.pa.us
pgman@candle.pha.pa.us               |  (610) 359-1001
+  If your life is a hard drive,     |  13 Roberts Road
+  Christ can be your backup.        |  Newtown Square, Pennsylvania
19073

Attachments:

diff.txttext/plain; name=diff.txtDownload
Index: src/backend/commands/copy.c
===================================================================
RCS file: /projects/cvsroot/pgsql/src/backend/commands/copy.c,v
retrieving revision 1.244
diff -u -r1.244 copy.c
--- src/backend/commands/copy.c	7 May 2005 02:22:46 -0000	1.244
+++ src/backend/commands/copy.c	13 May 2005 22:21:00 -0000
@@ -84,6 +84,16 @@
 	EOL_CRNL
 } EolType;

+/*
+ *	Represents the format of the file to be read or written
+ */
+typedef enum CopyFmt
+{
+	FMT_TXT,
+	FMT_BIN,
+	FMT_CSV,
+	FMT_XML
+} CopyFmt;
 
 static const char BinarySignature[11] = "PGCOPY\n\377\r\n\0";
 
@@ -129,14 +139,14 @@
 static bool line_buf_converted;
 
 /* non-export function prototypes */
-static void DoCopyTo(Relation rel, List *attnumlist, bool binary, bool oids,
-		 char *delim, char *null_print, bool csv_mode, char *quote,
+static void DoCopyTo(Relation rel, List *attnumlist, CopyFmt fmt, bool oids,
+		 char *delim, char *null_print, char *quote,
 		 char *escape, List *force_quote_atts, bool header_line, bool fe_copy);
-static void CopyTo(Relation rel, List *attnumlist, bool binary, bool oids,
- char *delim, char *null_print, bool csv_mode, char *quote, char *escape,
+static void CopyTo(Relation rel, List *attnumlist, CopyFmt fmt, bool oids,
+ char *delim, char *null_print, char *quote, char *escape,
 	   List *force_quote_atts, bool header_line);
-static void CopyFrom(Relation rel, List *attnumlist, bool binary, bool oids,
- char *delim, char *null_print, bool csv_mode, char *quote, char *escape,
+static void CopyFrom(Relation rel, List *attnumlist, CopyFmt fmt, bool oids,
+ char *delim, char *null_print, char *quote, char *escape,
 		 List *force_notnull_atts, bool header_line);
 static bool CopyReadLine(char * quote, char * escape);
 static char *CopyReadAttribute(const char *delim, const char *null_print,
@@ -171,6 +181,11 @@
 static void CopySendInt16(int16 val);
 static int16 CopyGetInt16(void);
 
+static int GetDecimalFromHex(char hex);
+
+static void CopyAttributeOutXML (char *colname, char *string);
+static void CopySendStringXML(char *string);
+static char *CopyGetXMLEntity(char c, char *buf);
 
 /*
  * Send copy start/stop messages for frontend copies.  These have changed
@@ -692,10 +707,9 @@
 	List	   *attnamelist = stmt->attlist;
 	List	   *attnumlist;
 	bool		fe_copy = false;
-	bool		binary = false;
 	bool		oids = false;
-	bool		csv_mode = false;
-	bool        header_line = false;
+	bool		header_line = false;
+	CopyFmt		fmt = FMT_TXT;
 	char	   *delim = NULL;
 	char	   *quote = NULL;
 	char	   *escape = NULL;
@@ -715,11 +729,11 @@
 
 		if (strcmp(defel->defname, "binary") == 0)
 		{
-			if (binary)
+			if (fmt != FMT_TXT)
 				ereport(ERROR,
 						(errcode(ERRCODE_SYNTAX_ERROR),
 						 errmsg("conflicting or redundant options")));
-			binary = intVal(defel->arg);
+			fmt = FMT_BIN;
 		}
 		else if (strcmp(defel->defname, "oids") == 0)
 		{
@@ -747,11 +761,19 @@
 		}
 		else if (strcmp(defel->defname, "csv") == 0)
 		{
-			if (csv_mode)
+			if (fmt != FMT_TXT)
 				ereport(ERROR,
 						(errcode(ERRCODE_SYNTAX_ERROR),
 						 errmsg("conflicting or redundant options")));
-			csv_mode = intVal(defel->arg);
+			fmt = FMT_CSV;
+		}
+		else if (strcmp(defel->defname, "xml") == 0)
+		{
+			if (fmt != FMT_TXT)
+				ereport(ERROR,
+						(errcode(ERRCODE_SYNTAX_ERROR),
+						 errmsg("conflicting or redundant options")));
+			fmt = FMT_XML;
 		}
 		else if (strcmp(defel->defname, "header") == 0)
 		{
@@ -798,29 +820,39 @@
 				 defel->defname);
 	}
 
-	if (binary && delim)
+	if (fmt == FMT_BIN && delim)
 		ereport(ERROR,
 				(errcode(ERRCODE_SYNTAX_ERROR),
 				 errmsg("cannot specify DELIMITER in BINARY mode")));
 
-	if (binary && csv_mode)
+	if (fmt == FMT_BIN && null_print)
 		ereport(ERROR,
 				(errcode(ERRCODE_SYNTAX_ERROR),
-				 errmsg("cannot specify CSV in BINARY mode")));
+				 errmsg("cannot specify NULL in BINARY mode")));
 
-	if (binary && null_print)
+	if (fmt == FMT_XML && is_from)
 		ereport(ERROR,
 				(errcode(ERRCODE_SYNTAX_ERROR),
-				 errmsg("cannot specify NULL in BINARY mode")));
+				 errmsg("XML mode is not available in COPY FROM")));
+
+	if (fmt == FMT_XML && delim)
+		ereport(ERROR,
+				(errcode(ERRCODE_SYNTAX_ERROR),
+				 errmsg("cannot specify DELIMITER in XML mode")));
+
+	if (fmt == FMT_XML && null_print)
+		ereport(ERROR,
+				(errcode(ERRCODE_SYNTAX_ERROR),
+				 errmsg("cannot specify NULL in XML mode")));
 
 	/* Set defaults */
 	if (!delim)
-		delim = csv_mode ? "," : "\t";
+		delim = (fmt == FMT_CSV) ? "," : "\t";
 
 	if (!null_print)
-		null_print = csv_mode ? "" : "\\N";
+		null_print = (fmt == FMT_CSV) ? "" : "\\N";
 
-	if (csv_mode)
+	if (fmt == FMT_CSV)
 	{
 		if (!quote)
 			quote = "\"";
@@ -835,35 +867,35 @@
 				 errmsg("COPY delimiter must be a single character")));
 
   	/* Check header */
-	if (!csv_mode && header_line)
+	if (fmt != FMT_CSV && header_line)
 		ereport(ERROR,
 				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 				 errmsg("COPY HEADER available only in CSV mode")));
 
 	/* Check quote */
-	if (!csv_mode && quote != NULL)
+	if (fmt != FMT_CSV && quote != NULL)
 		ereport(ERROR,
 				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 				 errmsg("COPY quote available only in CSV mode")));
 
-	if (csv_mode && strlen(quote) != 1)
+	if (fmt == FMT_CSV && strlen(quote) != 1)
 		ereport(ERROR,
 				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 				 errmsg("COPY quote must be a single character")));
 
 	/* Check escape */
-	if (!csv_mode && escape != NULL)
+	if (fmt != FMT_CSV && escape != NULL)
 		ereport(ERROR,
 				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 				 errmsg("COPY escape available only in CSV mode")));
 
-	if (csv_mode && strlen(escape) != 1)
+	if (fmt == FMT_CSV && strlen(escape) != 1)
 		ereport(ERROR,
 				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 				 errmsg("COPY escape must be a single character")));
 
 	/* Check force_quote */
-	if (!csv_mode && force_quote != NIL)
+	if (fmt != FMT_CSV && force_quote != NIL)
 		ereport(ERROR,
 				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 				 errmsg("COPY force quote available only in CSV mode")));
@@ -873,7 +905,7 @@
 			   errmsg("COPY force quote only available using COPY TO")));
 
 	/* Check force_notnull */
-	if (!csv_mode && force_notnull != NIL)
+	if (fmt != FMT_CSV && force_notnull != NIL)
 		ereport(ERROR,
 				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 			  errmsg("COPY force not null available only in CSV mode")));
@@ -889,7 +921,7 @@
 				 errmsg("COPY delimiter must not appear in the NULL specification")));
 
 	/* Don't allow the csv quote char to appear in the null string. */
-	if (csv_mode && strchr(null_print, quote[0]) != NULL)
+	if (fmt == FMT_CSV && strchr(null_print, quote[0]) != NULL)
 		ereport(ERROR,
 				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 				 errmsg("CSV quote character must not appear in the NULL specification")));
@@ -1004,7 +1036,7 @@
 		if (pipe)
 		{
 			if (whereToSendOutput == Remote)
-				ReceiveCopyBegin(binary, list_length(attnumlist));
+				ReceiveCopyBegin(fmt == FMT_BIN, list_length(attnumlist));
 			else
 				copy_file = stdin;
 		}
@@ -1029,7 +1061,7 @@
 						 errmsg("\"%s\" is a directory", filename)));
 			}
 		}
-		CopyFrom(rel, attnumlist, binary, oids, delim, null_print, csv_mode,
+		CopyFrom(rel, attnumlist, fmt, oids, delim, null_print,
 				 quote, escape, force_notnull_atts, header_line);
 	}
 	else
@@ -1093,7 +1125,7 @@
 			}
 		}
 
-		DoCopyTo(rel, attnumlist, binary, oids, delim, null_print, csv_mode,
+		DoCopyTo(rel, attnumlist, fmt, oids, delim, null_print,
 				 quote, escape, force_quote_atts, header_line, fe_copy);
 	}
 
@@ -1124,20 +1156,20 @@
  * so we don't need to plaster a lot of variables with "volatile".
  */
 static void
-DoCopyTo(Relation rel, List *attnumlist, bool binary, bool oids,
-		 char *delim, char *null_print, bool csv_mode, char *quote,
+DoCopyTo(Relation rel, List *attnumlist, CopyFmt fmt, bool oids,
+		 char *delim, char *null_print, char *quote,
 		 char *escape, List *force_quote_atts, bool header_line, bool fe_copy)
 {
 	PG_TRY();
 	{
 		if (fe_copy)
-			SendCopyBegin(binary, list_length(attnumlist));
+			SendCopyBegin(fmt == FMT_BIN, list_length(attnumlist));
 
-		CopyTo(rel, attnumlist, binary, oids, delim, null_print, csv_mode,
+		CopyTo(rel, attnumlist, fmt, oids, delim, null_print,
 			   quote, escape, force_quote_atts, header_line);
 
 		if (fe_copy)
-			SendCopyEnd(binary);
+			SendCopyEnd(fmt == FMT_BIN);
 	}
 	PG_CATCH();
 	{
@@ -1156,8 +1188,8 @@
  * Copy from relation TO file.
  */
 static void
-CopyTo(Relation rel, List *attnumlist, bool binary, bool oids,
-	   char *delim, char *null_print, bool csv_mode, char *quote,
+CopyTo(Relation rel, List *attnumlist, CopyFmt fmt, bool oids,
+	   char *delim, char *null_print, char *quote,
 	   char *escape, List *force_quote_atts, bool header_line)
 {
 	HeapTuple	tuple;
@@ -1187,7 +1219,7 @@
 		Oid			out_func_oid;
 		bool		isvarlena;
 
-		if (binary)
+		if (fmt == FMT_BIN)
 			getTypeBinaryOutputInfo(attr[attnum - 1]->atttypid,
 									&out_func_oid,
 									&isvarlena);
@@ -1215,7 +1247,7 @@
 									  ALLOCSET_DEFAULT_INITSIZE,
 									  ALLOCSET_DEFAULT_MAXSIZE);
 
-	if (binary)
+	if (fmt == FMT_BIN)
 	{
 		/* Generate header for a binary copy */
 		int32		tmp;
@@ -1233,6 +1265,14 @@
 	}
 	else
 	{
+		if (fmt == FMT_XML)
+		{
+			CopySendString("<?xml version='1.0'?>");
+			CopySendEndOfRow(false);
+			CopySendString("<table>");
+			CopySendEndOfRow(false);
+		}
+
 		/*
 		 * For non-binary copy, we need to convert null_print to client
 		 * encoding, because it will be sent directly with CopySendString.
@@ -1262,7 +1302,7 @@
 									strcmp(colname, null_print) == 0);
 			}
 
-			CopySendEndOfRow(binary);
+			CopySendEndOfRow(fmt == FMT_BIN);
 
 		}
 	}
@@ -1278,7 +1318,7 @@
 		MemoryContextReset(mycontext);
 		oldcontext = MemoryContextSwitchTo(mycontext);
 
-		if (binary)
+		if (fmt == FMT_BIN)
 		{
 			/* Binary per-tuple header */
 			CopySendInt16(attr_count);
@@ -1294,25 +1334,34 @@
 		}
 		else
 		{
+			if (fmt == FMT_XML)
+				CopySendString("<row>");
+
 			/* Text format has no per-tuple header, but send OID if wanted */
 			if (oids)
 			{
 				string = DatumGetCString(DirectFunctionCall1(oidout,
 							  ObjectIdGetDatum(HeapTupleGetOid(tuple))));
-				CopySendString(string);
+
+				if (fmt == FMT_XML)
+					CopyAttributeOutXML("oid", string);
+				else
+					CopySendString(string);
+
 				need_delim = true;
 			}
 		}
 
 		foreach(cur, attnumlist)
 		{
-			int			attnum = lfirst_int(cur);
+			int		attnum = lfirst_int(cur);
 			Datum		value;
 			bool		isnull;
+			char		*colname = NameStr(attr[attnum - 1]->attname);
 
 			value = heap_getattr(tuple, attnum, tupDesc, &isnull);
 
-			if (!binary)
+			if (fmt == FMT_TXT || fmt == FMT_CSV)
 			{
 				if (need_delim)
 					CopySendChar(delim[0]);
@@ -1321,53 +1370,71 @@
 
 			if (isnull)
 			{
-				if (!binary)
-					CopySendString(null_print); /* null indicator */
-				else
-					CopySendInt32(-1);	/* null marker */
+				switch (fmt)
+				{
+					case FMT_BIN:
+						CopySendInt32(-1);	/* null marker */
+						break;
+					case FMT_XML:
+						CopyAttributeOutXML(colname, NULL); /* null entity */
+						break;
+					default:
+						CopySendString(null_print); /* null indicator */
+						break;
+				}
+
 			}
 			else
 			{
-				if (!binary)
+				if (fmt == FMT_BIN)
 				{
-					string = DatumGetCString(FunctionCall1(&out_functions[attnum - 1],
-														   value));
-					if (csv_mode)
-					{
-						CopyAttributeOutCSV(string, delim, quote, escape,
-									  (strcmp(string, null_print) == 0 ||
-									   force_quote[attnum - 1]));
-					}
-					else
-						CopyAttributeOut(string, delim);
-
+					bytea	*outputbytes =
+						DatumGetByteaP(FunctionCall1(&out_functions[attnum - 1], value));
+					/* We assume the result will not have been toasted */
+					CopySendInt32(VARSIZE(outputbytes) - VARHDRSZ);
+					CopySendData(VARDATA(outputbytes), VARSIZE(outputbytes) - VARHDRSZ);
 				}
 				else
 				{
-					bytea	   *outputbytes;
-
-					outputbytes = DatumGetByteaP(FunctionCall1(&out_functions[attnum - 1],
-															   value));
-					/* We assume the result will not have been toasted */
-					CopySendInt32(VARSIZE(outputbytes) - VARHDRSZ);
-					CopySendData(VARDATA(outputbytes),
-								 VARSIZE(outputbytes) - VARHDRSZ);
+					string = DatumGetCString(FunctionCall1(&out_functions[attnum - 1], value));
+					switch (fmt)
+					{
+						case FMT_CSV:
+							CopyAttributeOutCSV(string, delim, quote, escape,
+								(strcmp(string, null_print) == 0
+								|| force_quote[attnum - 1]));
+							break;
+						case FMT_XML:
+							CopyAttributeOutXML(colname, string);
+							break;
+						default:
+							CopyAttributeOut(string, delim);
+							break;
+					}
 				}
 			}
 		}
 
-		CopySendEndOfRow(binary);
+		if (fmt == FMT_XML)
+			CopySendString("</row>");
+
+		CopySendEndOfRow(fmt == FMT_BIN);
 
 		MemoryContextSwitchTo(oldcontext);
 	}
 
 	heap_endscan(scandesc);
 
-	if (binary)
+	if (fmt == FMT_BIN)
 	{
 		/* Generate trailer for a binary copy */
 		CopySendInt16(-1);
 	}
+	else if (fmt == FMT_XML)
+	{
+		CopySendString("</table>");
+		CopySendEndOfRow(false);
+	}
 
 	MemoryContextDelete(mycontext);
 
@@ -1464,8 +1531,8 @@
  * Copy FROM file to relation.
  */
 static void
-CopyFrom(Relation rel, List *attnumlist, bool binary, bool oids,
-		 char *delim, char *null_print, bool csv_mode, char *quote,
+CopyFrom(Relation rel, List *attnumlist, CopyFmt fmt, bool oids,
+		 char *delim, char *null_print, char *quote,
 		 char *escape, List *force_notnull_atts, bool header_line)
 {
 	HeapTuple	tuple;
@@ -1549,7 +1616,7 @@
 			continue;
 
 		/* Fetch the input function and typioparam info */
-		if (binary)
+		if (fmt == FMT_BIN)
 			getTypeBinaryInputInfo(attr[attnum - 1]->atttypid,
 								 &in_func_oid, &typioparams[attnum - 1]);
 		else
@@ -1620,7 +1687,7 @@
 	 */
 	ExecBSInsertTriggers(estate, resultRelInfo);
 
-	if (!binary)
+	if (fmt != FMT_BIN)
 		file_has_oids = oids;	/* must rely on user to tell us this... */
 	else
 	{
@@ -1663,7 +1730,7 @@
 		}
 	}
 
-	if (file_has_oids && binary)
+	if (file_has_oids && fmt == FMT_BIN)
 	{
 		getTypeBinaryInputInfo(OIDOID,
 							   &in_func_oid, &oid_typioparam);
@@ -1681,7 +1748,7 @@
 	/* Initialize static variables */
 	fe_eof = false;
 	eol_type = EOL_UNKNOWN;
-	copy_binary = binary;
+	copy_binary = (fmt == FMT_BIN);
 	copy_relname = RelationGetRelationName(rel);
 	copy_lineno = 0;
 	copy_attname = NULL;
@@ -1718,14 +1785,14 @@
 		MemSet(values, 0, num_phys_attrs * sizeof(Datum));
 		MemSet(nulls, 'n', num_phys_attrs * sizeof(char));
 
-		if (!binary)
+		if (fmt != FMT_BIN)
 		{
 			CopyReadResult result = NORMAL_ATTR;
 			char	   *string;
 			ListCell   *cur;
 
 			/* Actually read the line into memory here */
-			done = csv_mode ? 
+			done = (fmt == FMT_CSV) ? 
 				CopyReadLine(quote, escape) : CopyReadLine(NULL, NULL);
 
 			/*
@@ -1776,7 +1843,7 @@
 							 errmsg("missing data for column \"%s\"",
 									NameStr(attr[m]->attname))));
 
-				if (csv_mode)
+				if (fmt == FMT_CSV)
 				{
 					string = CopyReadAttributeCSV(delim, null_print, quote,
 											   escape, &result, &isnull);
@@ -1789,7 +1856,7 @@
 					string = CopyReadAttribute(delim, null_print,
 											   &result, &isnull);
 
-				if (csv_mode && isnull && force_notnull[m])
+				if (fmt == FMT_CSV && isnull && force_notnull[m])
 				{
 					string = null_print;		/* set to NULL string */
 					isnull = false;
@@ -2275,6 +2342,27 @@
 }
 
 /*----------
+ * Returns decimal value for a hexadecimal digit.
+*----------
+ */
+static int GetDecimalFromHex(char hex)
+{
+	if (isdigit(hex))
+	{
+		// If it is a digit
+		return hex - '0';
+	}
+	if (hex < 'a')
+	{
+		return hex - 'A' + 10;
+	}
+	else
+	{
+		return hex - 'a' + 10;
+	}
+}
+
+/*----------
  * Read the value of a single attribute, performing de-escaping as needed.
  *
  * delim is the column delimiter string (must be just one byte for now).
@@ -2378,6 +2466,29 @@
 				case 'v':
 					c = '\v';
 					break;
+				case 'x':
+				case 'X':
+					if (line_buf.cursor < line_buf.len)
+					{
+						char hexchar = line_buf.data[line_buf.cursor];
+						if (isxdigit(hexchar))
+						{
+							int val = GetDecimalFromHex(hexchar);
+							line_buf.cursor++;
+							if (line_buf.cursor < line_buf.len)
+							{
+								hexchar = line_buf.data[line_buf.cursor];
+								if (isxdigit(hexchar))
+								{
+									line_buf.cursor++;
+									val = (val << 4) + GetDecimalFromHex(hexchar);
+								}
+							}
+
+							c = val & 0xff;
+						}
+					}
+					break;
 
 					/*
 					 * in all other cases, take the char after '\'
@@ -2760,3 +2871,84 @@
 
 	return attnums;
 }
+
+/*
+ * Send XML representation of one attribute, with element tagging, null
+ * marking, and entity escaping.
+ */
+
+static void
+CopyAttributeOutXML (char *colname, char *string)
+{
+	CopySendString("<col name='");
+	CopySendStringXML(colname);
+	CopySendString("' null='");
+	CopySendChar((string == NULL) ? 'y' : 'n');
+	CopySendString("'>");
+
+	if (string != NULL)
+		CopySendStringXML(string);
+
+	CopySendString("</col>");
+}
+
+/*
+ * Sends a string with entity escaping.
+ */
+
+static void
+CopySendStringXML (char *string)
+{
+	char	*csr;
+	for (csr = string; *csr; ++csr)
+	{
+		char buf[10];
+		char *entity = CopyGetXMLEntity(*csr, buf);
+		if (entity)
+			CopySendString(entity);
+		else
+			CopySendChar(*csr);
+	}
+}
+
+/*
+ * Locates or creates an XML entity for the given character.
+ * If that character doesn't require an entity, then the
+ * function returns NULL.
+ */
+
+static char *
+CopyGetXMLEntity (char c, char *buf)
+{
+	char *entity;
+
+	switch (c)
+	{
+		case '<':
+			entity = "&lt;";
+			break;
+		case '>':
+			entity = "&gt;";
+			break;
+		case '&':
+			entity = "&amp;";
+			break;
+		case '\'':
+			entity = "&apos;";
+			break;
+		case '"':
+			entity = "&quot;";
+			break;
+		default:
+			if (!isgraph(c) && c != ' ')
+			{
+				sprintf(buf, "&#%02x;", (unsigned char)c);
+				entity = buf;
+			}
+			else
+				entity = NULL;
+			break;
+	}
+
+	return entity;
+}
Index: src/backend/parser/gram.y
===================================================================
RCS file: /projects/cvsroot/pgsql/src/backend/parser/gram.y,v
retrieving revision 2.491
diff -u -r2.491 gram.y
--- src/backend/parser/gram.y	7 May 2005 02:22:46 -0000	2.491
+++ src/backend/parser/gram.y	13 May 2005 22:21:01 -0000
@@ -413,6 +413,8 @@

 	WHEN WHERE WITH WITHOUT WORK WRITE
 
+	XML
+
 	YEAR_P
 
 	ZONE
@@ -1448,6 +1450,10 @@
 				{
 					$$ = makeDefElem("header", (Node *)makeInteger(TRUE));
 				}
+			| XML
+				{
+					$$ = makeDefElem("xml", (Node *)makeInteger(TRUE));
+				}
 			| QUOTE opt_as Sconst
 				{
 					$$ = makeDefElem("quote", (Node *)makeString($3));
Index: src/backend/parser/keywords.c
===================================================================
RCS file: /projects/cvsroot/pgsql/src/backend/parser/keywords.c,v
retrieving revision 1.155
diff -u -r1.155 keywords.c
--- src/backend/parser/keywords.c	7 May 2005 02:22:47 -0000	1.155
+++ src/backend/parser/keywords.c	13 May 2005 22:21:01 -0000
@@ -342,6 +342,7 @@
 	{"without", WITHOUT},
 	{"work", WORK},
 	{"write", WRITE},
+	{"xml", XML},
 	{"year", YEAR_P},
 	{"zone", ZONE},
 };
Index: src/test/regress/expected/copy2.out
===================================================================
RCS file: /projects/cvsroot/pgsql/src/test/regress/expected/copy2.out,v
retrieving revision 1.21
diff -u -r1.21 copy2.out
--- src/test/regress/expected/copy2.out	13 May 2005 06:33:40 -0000	1.21
+++ src/test/regress/expected/copy2.out	13 May 2005 22:21:01 -0000
@@ -194,6 +194,28 @@
 --test that we read consecutive LFs properly
 CREATE TEMP TABLE testnl (a int, b text, c int);
 COPY testnl FROM stdin CSV;
-DROP TABLE x, y;
+CREATE TABLE z (
+	col1 text,
+	col2 text
+);
+COPY z from stdin;
+COPY z TO stdout;
+Jackson, Sam	\\h
+ABC	\\\\\t
+It is "perfect".	\t
+	NULL
+COPY z TO stdout WITH CSV;
+"Jackson, Sam",\h
+ABC,\\	
+"It is ""perfect"".",	
+"",NULL
+COPY y TO stdout WITH XML;
+<?xml version='1.0'?>
+<table>
+<row><col name='col1' null='n'>Jackson, Sam</col><col name='col2' null='n'>\h</col></row>
+<row><col name='col1' null='n'>It is &quot;perfect&quot;.</col><col name='col2' null='n'>&#09;</col></row>
+<row><col name='col1' null='n'></col><col name='col2' null='y'></col></row>
+</table>
+DROP TABLE x, y, z;
 DROP FUNCTION fn_x_before();
 DROP FUNCTION fn_x_after();
Index: src/test/regress/sql/copy2.sql
===================================================================
RCS file: /projects/cvsroot/pgsql/src/test/regress/sql/copy2.sql,v
retrieving revision 1.12
diff -u -r1.12 copy2.sql
--- src/test/regress/sql/copy2.sql	13 May 2005 06:33:40 -0000	1.12
+++ src/test/regress/sql/copy2.sql	13 May 2005 22:21:01 -0000
@@ -139,7 +139,22 @@
 inside",2
 \.
 
+CREATE TABLE z (
+	col1 text,
+	col2 text
+);
 
-DROP TABLE x, y;
+COPY z from stdin;
+Jackson, Sam	\\h
+\x41\x42\x43\xa0\x1	\x5c\x5c\x9
+It is "perfect".	\t
+	NULL
+\.
+
+COPY z TO stdout;
+COPY z TO stdout WITH CSV;
+COPY y TO stdout WITH XML;
+
+DROP TABLE x, y, z;
 DROP FUNCTION fn_x_before();
 DROP FUNCTION fn_x_after();
#7David Fetter
david@fetter.org
In reply to: Sergey Ten (#1)
Re: [Fwd: Re: SQL99 Hierarchical queries]

On Sun, May 15, 2005 at 04:44:57PM +0800, Christopher Kings-Lynne wrote:

Looks like hierarchical queries are now officially stalled :(

Anyone want to take this up for 8.1?

Sergei and Jason,

Feel like taking SQL:1999 WITH RECURSIVE? It would be a giant help to
the PostgreSQL community. :)

http://gppl.moonbone.ru/index.shtml

has part of it, and

http://candle.pha.pa.us/mhonarc/patches2/msg00175.html

has others.

There's also MERGE, which is covered starting on page 47 of
http://wiscorp.com/sql/SQL2003Features.pdf

also pp 839-845 of 5WD-02-Foundation-2003-09.pdf which is part of
this:
http://wiscorp.com/sql/sql_2003_standard.zip

and an overview here:
http://www.varlena.com/varlena/GeneralBits/73.php

Cheers,
D

Hi,

I haven't done any significant progress on that way because of lack of
free time.
Beside this, I'm recently changed my job and now I'm woking for MySQL.
I think it's not possible for me to continue work on PostgreSQL.
Feel free to take the patch and develop it further as long as you
mention me as author of initial version.

Regards, Evgen.

On 5/5/05, Christopher Kings-Lynne <chriskl@familyhealth.com.au> wrote:

Hi Evgen,

I just keep pinging this patch thread every once in a while to make sure
it doesn't get forgotten :)

How is the syncing with 8.1 CVS coming along?

Chris

Evgen Potemkin wrote:

Hi hackers!

I have done initial implementation of SQL99 WITH clause (attached).
It's now only for v7.3.4 and haven't a lot of checks and restrictions.
It can execute only simple WITH queries like
WITH tree AS (SELECT id,pnt,name,1::int AS level FROM t WHERE id=0
UNION SELECT t.id,t.pnt,t.name,tree.level+1 FROM t JOIN tree ON
tree.id=t.pnt ) SELECT * FROM tree;
It would be great if someone with knowledge of Pg internals can make a
kind of code review and make some advices how to better implement all
this.

Regards, Evgen.

------------------------------------------------------------------------

---------------------------(end of broadcast)---------------------------
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to majordomo@postgresql.org so that your
message can get through to the mailing list cleanly

---------------------------(end of broadcast)---------------------------
TIP 7: don't forget to increase your free space map settings

On Wed, May 11, 2005 at 07:01:50PM -0700, Sergey Ten wrote:

Hello all,

We would like to contribute to the Postgresql community by implementing
the following items from the TODO list
(http://developer.postgresql.org/todo.php):
. Allow COPY to understand \x as a hex byte . Allow COPY to optionally
include column headings in the first line . Add XML output to COPY

The changes are straightforward and include implementation of the
features as well as modification of the regression tests and documentation.

Before sending a diff file with the changes, we would like to know if
these features have been already implemented.

Best regards,
Jason Lucas and Sergey Ten
SourceLabs

Dependable Open Source Systems

---------------------------(end of broadcast)---------------------------
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to majordomo@postgresql.org)

--
David Fetter david@fetter.org http://fetter.org/
phone: +1 510 893 6100 mobile: +1 415 235 3778

Remember to vote!

#8Noname
jason@sourcelabs.com
In reply to: David Fetter (#7)
SQL99 hierarchical queries stalled

David--

My boss has given me approval to put up to 8 hours per week of SourceLabs'
time in on the SQL99 hierarchical query implementation. (I'm free, of
course, to supplement this with whatever of my own time I can spare.) I'm
willing to take on the work. What's the next step?

--Jason

#9David Fetter
david@fetter.org
In reply to: Noname (#8)
Re: SQL99 hierarchical queries stalled

On Mon, May 16, 2005 at 03:09:18PM -0700, jason@sourcelabs.com wrote:

David--

My boss has given me approval to put up to 8 hours per week of
SourceLabs' time in on the SQL99 hierarchical query implementation.

That's great! :)

(I'm free, of course, to supplement this with whatever of my own
time I can spare.) I'm willing to take on the work. What's the
next step?

I suppose the first thing would be to look over the patches I
mentioned and the SQL:2003 specification, then put together a
preliminary patch and send it to -hackers.

Cheers,
David.
--
David Fetter david@fetter.org http://fetter.org/
phone: +1 510 893 6100 mobile: +1 415 235 3778

Remember to vote!

#10Tom Lane
tgl@sss.pgh.pa.us
In reply to: David Fetter (#9)
Re: SQL99 hierarchical queries stalled

David Fetter <david@fetter.org> writes:

What's the next step?

I suppose the first thing would be to look over the patches I
mentioned and the SQL:2003 specification, then put together a
preliminary patch and send it to -hackers.

You can get useful feedback long before having anything that looks
like a patch --- and I'd encourage you to do so. Send us design
notes, rough data structures, etc.

Quite honestly, hierarchical queries aren't the easiest thing in the
world and wouldn't be my recommendation of the first ... or even the
second ... backend hacking project for a new posthacker to tackle.
If that's where you feel you must start, OK, but try to get as much
feedback as soon as you can, sooner not later.

I seem to recall some discussion of how to do this in the past;
have you trolled the pghackers archives?

regards, tom lane

#11Hannu Krosing
hannu@skype.net
In reply to: Tom Lane (#10)
Re: SQL99 hierarchical queries stalled

On T, 2005-05-17 at 00:42 -0400, Tom Lane wrote:

David Fetter <david@fetter.org> writes:

What's the next step?

I suppose the first thing would be to look over the patches I
mentioned and the SQL:2003 specification, then put together a
preliminary patch and send it to -hackers.

...

I seem to recall some discussion of how to do this in the past;
have you trolled the pghackers archives?

I think that Jasons inspiration for doing it came from the the fact that
there are already now abandoned patches for doing it.

So studying/understanding the current patch, and describing and getting
feedback from pgsql-hackers should be quite a good way of gaining
insight in trickier parts of postgres.

So it will not be jumping at new problem and writing a patch, but rather
trying to get an existing patch into good shape for being accepted in
the backend.

--
Hannu Krosing <hannu@skype.net>

#12Tom Lane
tgl@sss.pgh.pa.us
In reply to: Hannu Krosing (#11)
Re: SQL99 hierarchical queries stalled

Hannu Krosing <hannu@skype.net> writes:

On T, 2005-05-17 at 00:42 -0400, Tom Lane wrote:

I seem to recall some discussion of how to do this in the past;
have you trolled the pghackers archives?

I think that Jasons inspiration for doing it came from the the fact that
there are already now abandoned patches for doing it.

Having looked over the latest patch, my advice would be to ignore it :-(
It's almost completely devoid of documentation, except for comments
that he copied-and-pasted from elsewhere without modification. Wrong
comments are even worse than none.

What I'd like to see before anyone writes a line of code is a text
document explaining how this is going to work: what's the plan tree
structure, what happens at execution time, how much of the SQL99 spec
is going to get implemented. If you don't have that understanding
first, you're going to get buried in irrelevant details.

regards, tom lane

#13Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Tom Lane (#12)
Re: SQL99 hierarchical queries stalled

Tom Lane wrote:

Hannu Krosing <hannu@skype.net> writes:

On T, 2005-05-17 at 00:42 -0400, Tom Lane wrote:

I seem to recall some discussion of how to do this in the past;
have you trolled the pghackers archives?

I think that Jasons inspiration for doing it came from the the fact that
there are already now abandoned patches for doing it.

Having looked over the latest patch, my advice would be to ignore it :-(
It's almost completely devoid of documentation, except for comments
that he copied-and-pasted from elsewhere without modification. Wrong
comments are even worse than none.

What I'd like to see before anyone writes a line of code is a text
document explaining how this is going to work: what's the plan tree
structure, what happens at execution time, how much of the SQL99 spec
is going to get implemented. If you don't have that understanding
first, you're going to get buried in irrelevant details.

I have updated the developer's FAQ to cover these suggestions on how to
start a patch:

1.4) What do I do after choosing an item to work on?

Send an email to pgsql-hackers with a proposal for what you want to do
(assuming your contribution is not trivial). Working in isolation is not
advisable because others might be working on the same TODO item, or you
might have misunderstood the TODO item. In the email, discuss both the
internal implementation method you plan to use, and any user-visible
changes (new syntax, etc). For complex patches, it is important to get
community feeback on your proposal before starting work. Failure to do
so might mean your patch is rejected.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
#14Hannu Krosing
hannu@skype.net
In reply to: Tom Lane (#12)
Re: SQL99 hierarchical queries stalled

On T, 2005-05-17 at 11:22 -0400, Tom Lane wrote:

Hannu Krosing <hannu@skype.net> writes:

On T, 2005-05-17 at 00:42 -0400, Tom Lane wrote:

I seem to recall some discussion of how to do this in the past;
have you trolled the pghackers archives?

I think that Jasons inspiration for doing it came from the the fact that
there are already now abandoned patches for doing it.

Having looked over the latest patch, my advice would be to ignore it :-(
It's almost completely devoid of documentation, except for comments
that he copied-and-pasted from elsewhere without modification. Wrong
comments are even worse than none.

ANd even worse - this is from the README on the website:

----8<---------8<---------8<---------8<---------8<---------8<---------8<-----
WHAT'S THIS

This is a patch which allows PgSQL to make hierarchical queries a la
Oracle do.

(c) Evgen Potemkin 2003,2004, < gppl at inbox dot ru >, entirely based
on PostgreSQL (http://www.postgresql.org)
Patch itself distributed under GPL. No warranty of any kind is given,
use it at your own risk.
----8<---------8<---------8<---------8<---------8<---------8<---------8<-----

So the license is also incompatible :(

--
Hannu Krosing <hannu@skype.net>

#15Markus Bertheau
twanger@bluetwanger.de
In reply to: Sergey Ten (#6)
Re: patches for items from TODO list

Dnia 13-05-2005, pią o godzinie 16:01 -0700, Sergey Ten napisał(a):

<?xml version='1.0'?>
<table>
<row>
<col name='col1' null='n'>Jackson, Sam</col>
<col name='col2' null='n'>\h</col>
</row>
<row>
<col name='col1' null='n'>It is &quot;perfect&quot;.</col>
<col name='col2' null='n'>&#09;</col>
</row>
<row>
<col name='col1' null='n'></col>
<col name='col2' null='y'></col>
</row>
</table>

Why didn't you do something to the effect of

<?xml version='1.0'?>
<table>
<cols>
<col name='col1'/>
<col name='col2'/>
</cols>
<row>
<col null='n'>Jackson, Sam</col>
<col null='n'>\h</col>
</row>
<row>
<col null='n'>It is &quot;perfect&quot;.</col>
<col null='n'>&#09;</col>
</row>
<row>
<col null='n'></col>
<col null='y'></col>
</row>
</table>

This avoids repeating the column names in every row, which don't change
over the rows anyway. By reducing redundant information it also makes
structurally invalid XML less likely (whether that is relevant depends
on what people do with the XML data).

Also you could encode the XML output as UTF-8, which would make the
files more readable for humans if the text data is not ASCII.

Markus

--
Markus Bertheau <twanger@bluetwanger.de>

#16Neil Conway
neilc@samurai.com
In reply to: Sergey Ten (#6)
Re: patches for items from TODO list

Sergey Ten wrote:

After a careful consideration we decided to
- put XML implementation in the backend

What advantage does putting the XML output mode in the backend provide?

-Neil

#17David Fetter
david@fetter.org
In reply to: Hannu Krosing (#14)
Re: SQL99 hierarchical queries stalled

On Tue, May 17, 2005 at 11:24:19PM +0300, Hannu Krosing wrote:

On T, 2005-05-17 at 11:22 -0400, Tom Lane wrote:

Hannu Krosing <hannu@skype.net> writes:

On T, 2005-05-17 at 00:42 -0400, Tom Lane wrote:

I seem to recall some discussion of how to do this in the past;
have you trolled the pghackers archives?

I think that Jasons inspiration for doing it came from the the fact that
there are already now abandoned patches for doing it.

Having looked over the latest patch, my advice would be to ignore it :-(
It's almost completely devoid of documentation, except for comments
that he copied-and-pasted from elsewhere without modification. Wrong
comments are even worse than none.

ANd even worse - this is from the README on the website:

----8<---------8<---------8<---------8<---------8<---------8<---------8<-----
WHAT'S THIS

This is a patch which allows PgSQL to make hierarchical queries a la
Oracle do.

(c) Evgen Potemkin 2003,2004, < gppl at inbox dot ru >, entirely based
on PostgreSQL (http://www.postgresql.org)
Patch itself distributed under GPL. No warranty of any kind is given,
use it at your own risk.

It's the notes on the CONNECT BY patch, not the WITH patch. The WITH
patch, as far as I can tell, is in the public domain.

Cheers,
D
--
David Fetter david@fetter.org http://fetter.org/
phone: +1 510 893 6100 mobile: +1 415 235 3778

Remember to vote!

#18Sergey Ten
sergey@sourcelabs.com
In reply to: Markus Bertheau (#15)
Re: patches for items from TODO list

Markus,

Thank you for your reply.
We considered embedding of an XML schema first followed by data. We decided
to stick to our current data format to make sure stateless XML parsers can
process it as well. Would it be better to add an option to the COPY command,
to allow embedding an XML schema, so more advanced XML parsers can take
advantage of it?

Thanks,
Jason, Sergey

Show quoted text

-----Original Message-----
From: Markus Bertheau [mailto:twanger@bluetwanger.de]
Sent: Tuesday, May 17, 2005 6:00 PM
To: Sergey Ten
Cc: 'Bruce Momjian'; 'Christopher Kings-Lynne'; pgsql-
hackers@postgresql.org; jason@sourcelabs.com
Subject: Re: [HACKERS] patches for items from TODO list

Dnia 13-05-2005, pią o godzinie 16:01 -0700, Sergey Ten napisał(a):

<?xml version='1.0'?>
<table>
<row>
<col name='col1' null='n'>Jackson, Sam</col>
<col name='col2' null='n'>\h</col>
</row>
<row>
<col name='col1' null='n'>It is &quot;perfect&quot;.</col>
<col name='col2' null='n'>&#09;</col>
</row>
<row>
<col name='col1' null='n'></col>
<col name='col2' null='y'></col>
</row>
</table>

Why didn't you do something to the effect of

<?xml version='1.0'?>
<table>
<cols>
<col name='col1'/>
<col name='col2'/>
</cols>
<row>
<col null='n'>Jackson, Sam</col>
<col null='n'>\h</col>
</row>
<row>
<col null='n'>It is &quot;perfect&quot;.</col>
<col null='n'>&#09;</col>
</row>
<row>
<col null='n'></col>
<col null='y'></col>
</row>
</table>

This avoids repeating the column names in every row, which don't change
over the rows anyway. By reducing redundant information it also makes
structurally invalid XML less likely (whether that is relevant depends
on what people do with the XML data).

Also you could encode the XML output as UTF-8, which would make the
files more readable for humans if the text data is not ASCII.

Markus

--
Markus Bertheau <twanger@bluetwanger.de>

#19Sergey Ten
sergey@sourcelabs.com
In reply to: Neil Conway (#16)
Re: patches for items from TODO list

Neil,

We think that putting it in the backend will make access from other
components easier.

Thank you,
Sergey

Show quoted text

-----Original Message-----
From: Neil Conway [mailto:neilc@samurai.com]
Sent: Tuesday, May 17, 2005 7:11 PM
To: Sergey Ten
Cc: 'Bruce Momjian'; 'Christopher Kings-Lynne'; pgsql-
hackers@postgresql.org; jason@sourcelabs.com
Subject: Re: [HACKERS] patches for items from TODO list

Sergey Ten wrote:

After a careful consideration we decided to
- put XML implementation in the backend

What advantage does putting the XML output mode in the backend provide?

-Neil

#20Neil Conway
neilc@samurai.com
In reply to: Sergey Ten (#19)
Re: patches for items from TODO list

Sergey Ten wrote:

We think that putting it in the backend will make access from other
components easier.

In what way?

It seems to me that this can be done just as easily in a client
application / library, without cluttering the backend with yet another
COPY output format. It would also avoid the need to mandate a single XML
schema -- different clients will likely have different requirements.
Since I am skeptical of the value of this feature in the first place, I
think it would do less damage if implemented outside the backend.

-Neil

#21Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Sergey Ten (#18)
Re: patches for items from TODO list

Sergey Ten wrote:

Markus,

Thank you for your reply.
We considered embedding of an XML schema first followed by data. We decided
to stick to our current data format to make sure stateless XML parsers can
process it as well. Would it be better to add an option to the COPY command,
to allow embedding an XML schema, so more advanced XML parsers can take
advantage of it?

Current CVS has a WITH CSV HEADER option. I wonder if we should add a
HEADER option to XML to output the schema --- seems to make sense.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
#22Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Neil Conway (#20)
Re: patches for items from TODO list

Neil Conway wrote:

Sergey Ten wrote:

We think that putting it in the backend will make access from other
components easier.

In what way?

It seems to me that this can be done just as easily in a client
application / library, without cluttering the backend with yet another
COPY output format. It would also avoid the need to mandate a single XML
schema -- different clients will likely have different requirements.
Since I am skeptical of the value of this feature in the first place, I
think it would do less damage if implemented outside the backend.

We considered putting XML in psql or libpq in the past, but the problem
is that interfaces like jdbc couldn't take advantage of it. I do think
it needs to be in the backend, and I think that is the agreement we had
in the past.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
#23Andrew Dunstan
andrew@dunslane.net
In reply to: Sergey Ten (#6)
Re: patches for items from TODO list

I've been reviewing this patch and some of the following discussion.

First, postgresql patches are usually sent as context diffs. I don't
object to unidiffs myself, but you should do what everybody else does.

Second, it's best not to combine features in one patch. The \x escape
piece should be broken out.

I'm also a rather worried about COPY producing output which it can't
itself parse. We can read our own binary, text and CSV formats, and I
think that's a useful validation tool. I know the TODO item only
mentions output, but I believe we should rethink that. In any case, if
it's valid for us to hand XML to other programs why shouldn't we accept
it too. This is all about playing nicely in the playground.

One advantage of XML is that, being hierarchical, it can easily express
nested composites (records, arrays) in a way that our present text and
CSV formats really can't. But unless I missed something this patch
doesn't in fact do anything to break out nested composites.

Finally, I don't know if there is a standard on this, or even a
convention. What do other DBs do? I'm not keen on us just inventing our
own XML dialect for something that should after all be most useful in
data exchange.

Bottom line, much as I would like to see XML input/output, I think this
needs lots more thought and discussion.

cheers

andrew

Sergey Ten wrote:

Show quoted text

Hello all,

Thank you to all who replied for suggestions and help. Enclosed please find
code changes for the following items:
- Allow COPY to understand \x as a hex byte, and
- Add XML output to COPY
The changes include implementation of the features as well as modification
of the copy regression test.

After a careful consideration we decided to
- put XML implementation in the backend and
- use XML format described below, with justification of our decision.

The XML schema used by the COPY TO command was designed for ease of use and
to avoid the problem of column names appearing in XML element names.
XML doesn't allow spaces and punctuation in element names but Postgres does
allow these characters in column names; therefore, a direct mapping would be
problematic.

The solution selected places the column names into attribute fields where
any special characters they contain can be properly escaped using XML
entities. An additional attribute is used to distinguish null fields from
empty ones.

The example below is taken from the test suite. It demonstrates some basic
XML escaping in row 2. Row 3 demonstrates the difference between an empty
string (in col2) and a null string (in col3). If a field is null it will
always be empty but a field which is empty may or may not be null.
Always check the value of the 'null' attribute to be sure when a field is
truly null.

<?xml version='1.0'?>
<table>
<row>
<col name='col1' null='n'>Jackson, Sam</col>
<col name='col2' null='n'>\h</col>
</row>
<row>
<col name='col1' null='n'>It is &quot;perfect&quot;.</col>
<col name='col2' null='n'>&#09;</col>
</row>
<row>
<col name='col1' null='n'></col>
<col name='col2' null='y'></col>
</row>
</table>

Please let us know if about any concerns, objections the proposed change may
cause.

Best regards,
Jason Lucas, Sergey Ten
SourceLabs

-----Original Message-----
From: Bruce Momjian [mailto:pgman@candle.pha.pa.us]
Sent: Wednesday, May 11, 2005 7:11 PM
To: Sergey Ten
Cc: pgsql-hackers@postgresql.org; jason@sourcelabs.com
Subject: Re: [HACKERS] patches for items from TODO list

Sergey Ten wrote:

Hello all,

We would like to contribute to the Postgresql community by implementing
the following items from the TODO list
(http://developer.postgresql.org/todo.php):
. Allow COPY to understand \x as a hex byte . Allow COPY to optionally
include column headings in the first line . Add XML output to COPY

The changes are straightforward and include implementation of the
features as well as modification of the regression tests and

documentation.

Before sending a diff file with the changes, we would like to know if
these features have been already implemented.

Please check the web site version. Someone has already implemented
"Allow COPY to optionally include column headings in the first line".

As far as XML, there has been discussion on where that should be done?
In the backend, libpq, or psql. It will need discussion on hackers. I
assume you have read the developer's FAQ too.

--
Bruce Momjian                        |  http://candle.pha.pa.us
pgman@candle.pha.pa.us               |  (610) 359-1001
+  If your life is a hard drive,     |  13 Roberts Road
+  Christ can be your backup.        |  Newtown Square, Pennsylvania
19073

------------------------------------------------------------------------

Index: src/backend/commands/copy.c
===================================================================
RCS file: /projects/cvsroot/pgsql/src/backend/commands/copy.c,v
retrieving revision 1.244
diff -u -r1.244 copy.c
--- src/backend/commands/copy.c	7 May 2005 02:22:46 -0000	1.244
+++ src/backend/commands/copy.c	13 May 2005 22:21:00 -0000
@@ -84,6 +84,16 @@
EOL_CRNL
} EolType;
+/*
+ *	Represents the format of the file to be read or written
+ */
+typedef enum CopyFmt
+{
+	FMT_TXT,
+	FMT_BIN,
+	FMT_CSV,
+	FMT_XML
+} CopyFmt;

static const char BinarySignature[11] = "PGCOPY\n\377\r\n\0";

@@ -129,14 +139,14 @@
static bool line_buf_converted;

/* non-export function prototypes */
-static void DoCopyTo(Relation rel, List *attnumlist, bool binary, bool oids,
-		 char *delim, char *null_print, bool csv_mode, char *quote,
+static void DoCopyTo(Relation rel, List *attnumlist, CopyFmt fmt, bool oids,
+		 char *delim, char *null_print, char *quote,
char *escape, List *force_quote_atts, bool header_line, bool fe_copy);
-static void CopyTo(Relation rel, List *attnumlist, bool binary, bool oids,
- char *delim, char *null_print, bool csv_mode, char *quote, char *escape,
+static void CopyTo(Relation rel, List *attnumlist, CopyFmt fmt, bool oids,
+ char *delim, char *null_print, char *quote, char *escape,
List *force_quote_atts, bool header_line);
-static void CopyFrom(Relation rel, List *attnumlist, bool binary, bool oids,
- char *delim, char *null_print, bool csv_mode, char *quote, char *escape,
+static void CopyFrom(Relation rel, List *attnumlist, CopyFmt fmt, bool oids,
+ char *delim, char *null_print, char *quote, char *escape,
List *force_notnull_atts, bool header_line);
static bool CopyReadLine(char * quote, char * escape);
static char *CopyReadAttribute(const char *delim, const char *null_print,
@@ -171,6 +181,11 @@
static void CopySendInt16(int16 val);
static int16 CopyGetInt16(void);
+static int GetDecimalFromHex(char hex);
+
+static void CopyAttributeOutXML (char *colname, char *string);
+static void CopySendStringXML(char *string);
+static char *CopyGetXMLEntity(char c, char *buf);

/*
* Send copy start/stop messages for frontend copies. These have changed
@@ -692,10 +707,9 @@
List *attnamelist = stmt->attlist;
List *attnumlist;
bool fe_copy = false;
- bool binary = false;
bool oids = false;
- bool csv_mode = false;
- bool header_line = false;
+ bool header_line = false;
+ CopyFmt fmt = FMT_TXT;
char *delim = NULL;
char *quote = NULL;
char *escape = NULL;
@@ -715,11 +729,11 @@

if (strcmp(defel->defname, "binary") == 0)
{
-			if (binary)
+			if (fmt != FMT_TXT)
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("conflicting or redundant options")));
-			binary = intVal(defel->arg);
+			fmt = FMT_BIN;
}
else if (strcmp(defel->defname, "oids") == 0)
{
@@ -747,11 +761,19 @@
}
else if (strcmp(defel->defname, "csv") == 0)
{
-			if (csv_mode)
+			if (fmt != FMT_TXT)
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("conflicting or redundant options")));
-			csv_mode = intVal(defel->arg);
+			fmt = FMT_CSV;
+		}
+		else if (strcmp(defel->defname, "xml") == 0)
+		{
+			if (fmt != FMT_TXT)
+				ereport(ERROR,
+						(errcode(ERRCODE_SYNTAX_ERROR),
+						 errmsg("conflicting or redundant options")));
+			fmt = FMT_XML;
}
else if (strcmp(defel->defname, "header") == 0)
{
@@ -798,29 +820,39 @@
defel->defname);
}

- if (binary && delim)
+ if (fmt == FMT_BIN && delim)
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
errmsg("cannot specify DELIMITER in BINARY mode")));

-	if (binary && csv_mode)
+	if (fmt == FMT_BIN && null_print)
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
-				 errmsg("cannot specify CSV in BINARY mode")));
+				 errmsg("cannot specify NULL in BINARY mode")));
-	if (binary && null_print)
+	if (fmt == FMT_XML && is_from)
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
-				 errmsg("cannot specify NULL in BINARY mode")));
+				 errmsg("XML mode is not available in COPY FROM")));
+
+	if (fmt == FMT_XML && delim)
+		ereport(ERROR,
+				(errcode(ERRCODE_SYNTAX_ERROR),
+				 errmsg("cannot specify DELIMITER in XML mode")));
+
+	if (fmt == FMT_XML && null_print)
+		ereport(ERROR,
+				(errcode(ERRCODE_SYNTAX_ERROR),
+				 errmsg("cannot specify NULL in XML mode")));
/* Set defaults */
if (!delim)
-		delim = csv_mode ? "," : "\t";
+		delim = (fmt == FMT_CSV) ? "," : "\t";
if (!null_print)
-		null_print = csv_mode ? "" : "\\N";
+		null_print = (fmt == FMT_CSV) ? "" : "\\N";

- if (csv_mode)
+ if (fmt == FMT_CSV)
{
if (!quote)
quote = "\"";
@@ -835,35 +867,35 @@
errmsg("COPY delimiter must be a single character")));

/* Check header */
- if (!csv_mode && header_line)
+ if (fmt != FMT_CSV && header_line)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("COPY HEADER available only in CSV mode")));

/* Check quote */
- if (!csv_mode && quote != NULL)
+ if (fmt != FMT_CSV && quote != NULL)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("COPY quote available only in CSV mode")));

-	if (csv_mode && strlen(quote) != 1)
+	if (fmt == FMT_CSV && strlen(quote) != 1)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("COPY quote must be a single character")));

/* Check escape */
- if (!csv_mode && escape != NULL)
+ if (fmt != FMT_CSV && escape != NULL)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("COPY escape available only in CSV mode")));

-	if (csv_mode && strlen(escape) != 1)
+	if (fmt == FMT_CSV && strlen(escape) != 1)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("COPY escape must be a single character")));
/* Check force_quote */
-	if (!csv_mode && force_quote != NIL)
+	if (fmt != FMT_CSV && force_quote != NIL)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("COPY force quote available only in CSV mode")));
@@ -873,7 +905,7 @@
errmsg("COPY force quote only available using COPY TO")));
/* Check force_notnull */
-	if (!csv_mode && force_notnull != NIL)
+	if (fmt != FMT_CSV && force_notnull != NIL)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("COPY force not null available only in CSV mode")));
@@ -889,7 +921,7 @@
errmsg("COPY delimiter must not appear in the NULL specification")));
/* Don't allow the csv quote char to appear in the null string. */
-	if (csv_mode && strchr(null_print, quote[0]) != NULL)
+	if (fmt == FMT_CSV && strchr(null_print, quote[0]) != NULL)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("CSV quote character must not appear in the NULL specification")));
@@ -1004,7 +1036,7 @@
if (pipe)
{
if (whereToSendOutput == Remote)
-				ReceiveCopyBegin(binary, list_length(attnumlist));
+				ReceiveCopyBegin(fmt == FMT_BIN, list_length(attnumlist));
else
copy_file = stdin;
}
@@ -1029,7 +1061,7 @@
errmsg("\"%s\" is a directory", filename)));
}
}
-		CopyFrom(rel, attnumlist, binary, oids, delim, null_print, csv_mode,
+		CopyFrom(rel, attnumlist, fmt, oids, delim, null_print,
quote, escape, force_notnull_atts, header_line);
}
else
@@ -1093,7 +1125,7 @@
}
}
-		DoCopyTo(rel, attnumlist, binary, oids, delim, null_print, csv_mode,
+		DoCopyTo(rel, attnumlist, fmt, oids, delim, null_print,
quote, escape, force_quote_atts, header_line, fe_copy);
}
@@ -1124,20 +1156,20 @@
* so we don't need to plaster a lot of variables with "volatile".
*/
static void
-DoCopyTo(Relation rel, List *attnumlist, bool binary, bool oids,
-		 char *delim, char *null_print, bool csv_mode, char *quote,
+DoCopyTo(Relation rel, List *attnumlist, CopyFmt fmt, bool oids,
+		 char *delim, char *null_print, char *quote,
char *escape, List *force_quote_atts, bool header_line, bool fe_copy)
{
PG_TRY();
{
if (fe_copy)
-			SendCopyBegin(binary, list_length(attnumlist));
+			SendCopyBegin(fmt == FMT_BIN, list_length(attnumlist));
-		CopyTo(rel, attnumlist, binary, oids, delim, null_print, csv_mode,
+		CopyTo(rel, attnumlist, fmt, oids, delim, null_print,
quote, escape, force_quote_atts, header_line);
if (fe_copy)
-			SendCopyEnd(binary);
+			SendCopyEnd(fmt == FMT_BIN);
}
PG_CATCH();
{
@@ -1156,8 +1188,8 @@
* Copy from relation TO file.
*/
static void
-CopyTo(Relation rel, List *attnumlist, bool binary, bool oids,
-	   char *delim, char *null_print, bool csv_mode, char *quote,
+CopyTo(Relation rel, List *attnumlist, CopyFmt fmt, bool oids,
+	   char *delim, char *null_print, char *quote,
char *escape, List *force_quote_atts, bool header_line)
{
HeapTuple	tuple;
@@ -1187,7 +1219,7 @@
Oid			out_func_oid;
bool		isvarlena;

- if (binary)
+ if (fmt == FMT_BIN)
getTypeBinaryOutputInfo(attr[attnum - 1]->atttypid,
&out_func_oid,
&isvarlena);
@@ -1215,7 +1247,7 @@
ALLOCSET_DEFAULT_INITSIZE,
ALLOCSET_DEFAULT_MAXSIZE);

-	if (binary)
+	if (fmt == FMT_BIN)
{
/* Generate header for a binary copy */
int32		tmp;
@@ -1233,6 +1265,14 @@
}
else
{
+		if (fmt == FMT_XML)
+		{
+			CopySendString("<?xml version='1.0'?>");
+			CopySendEndOfRow(false);
+			CopySendString("<table>");
+			CopySendEndOfRow(false);
+		}
+
/*
* For non-binary copy, we need to convert null_print to client
* encoding, because it will be sent directly with CopySendString.
@@ -1262,7 +1302,7 @@
strcmp(colname, null_print) == 0);
}
-			CopySendEndOfRow(binary);
+			CopySendEndOfRow(fmt == FMT_BIN);

}
}
@@ -1278,7 +1318,7 @@
MemoryContextReset(mycontext);
oldcontext = MemoryContextSwitchTo(mycontext);

-		if (binary)
+		if (fmt == FMT_BIN)
{
/* Binary per-tuple header */
CopySendInt16(attr_count);
@@ -1294,25 +1334,34 @@
}
else
{
+			if (fmt == FMT_XML)
+				CopySendString("<row>");
+
/* Text format has no per-tuple header, but send OID if wanted */
if (oids)
{
string = DatumGetCString(DirectFunctionCall1(oidout,
ObjectIdGetDatum(HeapTupleGetOid(tuple))));
-				CopySendString(string);
+
+				if (fmt == FMT_XML)
+					CopyAttributeOutXML("oid", string);
+				else
+					CopySendString(string);
+
need_delim = true;
}
}
foreach(cur, attnumlist)
{
-			int			attnum = lfirst_int(cur);
+			int		attnum = lfirst_int(cur);
Datum		value;
bool		isnull;
+			char		*colname = NameStr(attr[attnum - 1]->attname);

value = heap_getattr(tuple, attnum, tupDesc, &isnull);

-			if (!binary)
+			if (fmt == FMT_TXT || fmt == FMT_CSV)
{
if (need_delim)
CopySendChar(delim[0]);
@@ -1321,53 +1370,71 @@
if (isnull)
{
-				if (!binary)
-					CopySendString(null_print); /* null indicator */
-				else
-					CopySendInt32(-1);	/* null marker */
+				switch (fmt)
+				{
+					case FMT_BIN:
+						CopySendInt32(-1);	/* null marker */
+						break;
+					case FMT_XML:
+						CopyAttributeOutXML(colname, NULL); /* null entity */
+						break;
+					default:
+						CopySendString(null_print); /* null indicator */
+						break;
+				}
+
}
else
{
-				if (!binary)
+				if (fmt == FMT_BIN)
{
-					string = DatumGetCString(FunctionCall1(&out_functions[attnum - 1],
-														   value));
-					if (csv_mode)
-					{
-						CopyAttributeOutCSV(string, delim, quote, escape,
-									  (strcmp(string, null_print) == 0 ||
-									   force_quote[attnum - 1]));
-					}
-					else
-						CopyAttributeOut(string, delim);
-
+					bytea	*outputbytes =
+						DatumGetByteaP(FunctionCall1(&out_functions[attnum - 1], value));
+					/* We assume the result will not have been toasted */
+					CopySendInt32(VARSIZE(outputbytes) - VARHDRSZ);
+					CopySendData(VARDATA(outputbytes), VARSIZE(outputbytes) - VARHDRSZ);
}
else
{
-					bytea	   *outputbytes;
-
-					outputbytes = DatumGetByteaP(FunctionCall1(&out_functions[attnum - 1],
-															   value));
-					/* We assume the result will not have been toasted */
-					CopySendInt32(VARSIZE(outputbytes) - VARHDRSZ);
-					CopySendData(VARDATA(outputbytes),
-								 VARSIZE(outputbytes) - VARHDRSZ);
+					string = DatumGetCString(FunctionCall1(&out_functions[attnum - 1], value));
+					switch (fmt)
+					{
+						case FMT_CSV:
+							CopyAttributeOutCSV(string, delim, quote, escape,
+								(strcmp(string, null_print) == 0
+								|| force_quote[attnum - 1]));
+							break;
+						case FMT_XML:
+							CopyAttributeOutXML(colname, string);
+							break;
+						default:
+							CopyAttributeOut(string, delim);
+							break;
+					}
}
}
}
-		CopySendEndOfRow(binary);
+		if (fmt == FMT_XML)
+			CopySendString("</row>");
+
+		CopySendEndOfRow(fmt == FMT_BIN);

MemoryContextSwitchTo(oldcontext);
}

heap_endscan(scandesc);

-	if (binary)
+	if (fmt == FMT_BIN)
{
/* Generate trailer for a binary copy */
CopySendInt16(-1);
}
+	else if (fmt == FMT_XML)
+	{
+		CopySendString("</table>");
+		CopySendEndOfRow(false);
+	}

MemoryContextDelete(mycontext);

@@ -1464,8 +1531,8 @@
* Copy FROM file to relation.
*/
static void
-CopyFrom(Relation rel, List *attnumlist, bool binary, bool oids,
-		 char *delim, char *null_print, bool csv_mode, char *quote,
+CopyFrom(Relation rel, List *attnumlist, CopyFmt fmt, bool oids,
+		 char *delim, char *null_print, char *quote,
char *escape, List *force_notnull_atts, bool header_line)
{
HeapTuple	tuple;
@@ -1549,7 +1616,7 @@
continue;

/* Fetch the input function and typioparam info */
- if (binary)
+ if (fmt == FMT_BIN)
getTypeBinaryInputInfo(attr[attnum - 1]->atttypid,
&in_func_oid, &typioparams[attnum - 1]);
else
@@ -1620,7 +1687,7 @@
*/
ExecBSInsertTriggers(estate, resultRelInfo);

- if (!binary)
+ if (fmt != FMT_BIN)
file_has_oids = oids; /* must rely on user to tell us this... */
else
{
@@ -1663,7 +1730,7 @@
}
}

- if (file_has_oids && binary)
+ if (file_has_oids && fmt == FMT_BIN)
{
getTypeBinaryInputInfo(OIDOID,
&in_func_oid, &oid_typioparam);
@@ -1681,7 +1748,7 @@
/* Initialize static variables */
fe_eof = false;
eol_type = EOL_UNKNOWN;
- copy_binary = binary;
+ copy_binary = (fmt == FMT_BIN);
copy_relname = RelationGetRelationName(rel);
copy_lineno = 0;
copy_attname = NULL;
@@ -1718,14 +1785,14 @@
MemSet(values, 0, num_phys_attrs * sizeof(Datum));
MemSet(nulls, 'n', num_phys_attrs * sizeof(char));

- if (!binary)
+ if (fmt != FMT_BIN)
{
CopyReadResult result = NORMAL_ATTR;
char *string;
ListCell *cur;

/* Actually read the line into memory here */
-			done = csv_mode ? 
+			done = (fmt == FMT_CSV) ? 
CopyReadLine(quote, escape) : CopyReadLine(NULL, NULL);

/*
@@ -1776,7 +1843,7 @@
errmsg("missing data for column \"%s\"",
NameStr(attr[m]->attname))));

- if (csv_mode)
+ if (fmt == FMT_CSV)
{
string = CopyReadAttributeCSV(delim, null_print, quote,
escape, &result, &isnull);
@@ -1789,7 +1856,7 @@
string = CopyReadAttribute(delim, null_print,
&result, &isnull);

-				if (csv_mode && isnull && force_notnull[m])
+				if (fmt == FMT_CSV && isnull && force_notnull[m])
{
string = null_print;		/* set to NULL string */
isnull = false;
@@ -2275,6 +2342,27 @@
}
/*----------
+ * Returns decimal value for a hexadecimal digit.
+*----------
+ */
+static int GetDecimalFromHex(char hex)
+{
+	if (isdigit(hex))
+	{
+		// If it is a digit
+		return hex - '0';
+	}
+	if (hex < 'a')
+	{
+		return hex - 'A' + 10;
+	}
+	else
+	{
+		return hex - 'a' + 10;
+	}
+}
+
+/*----------
* Read the value of a single attribute, performing de-escaping as needed.
*
* delim is the column delimiter string (must be just one byte for now).
@@ -2378,6 +2466,29 @@
case 'v':
c = '\v';
break;
+				case 'x':
+				case 'X':
+					if (line_buf.cursor < line_buf.len)
+					{
+						char hexchar = line_buf.data[line_buf.cursor];
+						if (isxdigit(hexchar))
+						{
+							int val = GetDecimalFromHex(hexchar);
+							line_buf.cursor++;
+							if (line_buf.cursor < line_buf.len)
+							{
+								hexchar = line_buf.data[line_buf.cursor];
+								if (isxdigit(hexchar))
+								{
+									line_buf.cursor++;
+									val = (val << 4) + GetDecimalFromHex(hexchar);
+								}
+							}
+
+							c = val & 0xff;
+						}
+					}
+					break;

/*
* in all other cases, take the char after '\'
@@ -2760,3 +2871,84 @@

return attnums;
}
+
+/*
+ * Send XML representation of one attribute, with element tagging, null
+ * marking, and entity escaping.
+ */
+
+static void
+CopyAttributeOutXML (char *colname, char *string)
+{
+	CopySendString("<col name='");
+	CopySendStringXML(colname);
+	CopySendString("' null='");
+	CopySendChar((string == NULL) ? 'y' : 'n');
+	CopySendString("'>");
+
+	if (string != NULL)
+		CopySendStringXML(string);
+
+	CopySendString("</col>");
+}
+
+/*
+ * Sends a string with entity escaping.
+ */
+
+static void
+CopySendStringXML (char *string)
+{
+	char	*csr;
+	for (csr = string; *csr; ++csr)
+	{
+		char buf[10];
+		char *entity = CopyGetXMLEntity(*csr, buf);
+		if (entity)
+			CopySendString(entity);
+		else
+			CopySendChar(*csr);
+	}
+}
+
+/*
+ * Locates or creates an XML entity for the given character.
+ * If that character doesn't require an entity, then the
+ * function returns NULL.
+ */
+
+static char *
+CopyGetXMLEntity (char c, char *buf)
+{
+	char *entity;
+
+	switch (c)
+	{
+		case '<':
+			entity = "&lt;";
+			break;
+		case '>':
+			entity = "&gt;";
+			break;
+		case '&':
+			entity = "&amp;";
+			break;
+		case '\'':
+			entity = "&apos;";
+			break;
+		case '"':
+			entity = "&quot;";
+			break;
+		default:
+			if (!isgraph(c) && c != ' ')
+			{
+				sprintf(buf, "&#%02x;", (unsigned char)c);
+				entity = buf;
+			}
+			else
+				entity = NULL;
+			break;
+	}
+
+	return entity;
+}
Index: src/backend/parser/gram.y
===================================================================
RCS file: /projects/cvsroot/pgsql/src/backend/parser/gram.y,v
retrieving revision 2.491
diff -u -r2.491 gram.y
--- src/backend/parser/gram.y	7 May 2005 02:22:46 -0000	2.491
+++ src/backend/parser/gram.y	13 May 2005 22:21:01 -0000
@@ -413,6 +413,8 @@

WHEN WHERE WITH WITHOUT WORK WRITE

+	XML
+
YEAR_P
ZONE
@@ -1448,6 +1450,10 @@
{
$$ = makeDefElem("header", (Node *)makeInteger(TRUE));
}
+			| XML
+				{
+					$$ = makeDefElem("xml", (Node *)makeInteger(TRUE));
+				}
| QUOTE opt_as Sconst
{
$$ = makeDefElem("quote", (Node *)makeString($3));
Index: src/backend/parser/keywords.c
===================================================================
RCS file: /projects/cvsroot/pgsql/src/backend/parser/keywords.c,v
retrieving revision 1.155
diff -u -r1.155 keywords.c
--- src/backend/parser/keywords.c	7 May 2005 02:22:47 -0000	1.155
+++ src/backend/parser/keywords.c	13 May 2005 22:21:01 -0000
@@ -342,6 +342,7 @@
{"without", WITHOUT},
{"work", WORK},
{"write", WRITE},
+	{"xml", XML},
{"year", YEAR_P},
{"zone", ZONE},
};
Index: src/test/regress/expected/copy2.out
===================================================================
RCS file: /projects/cvsroot/pgsql/src/test/regress/expected/copy2.out,v
retrieving revision 1.21
diff -u -r1.21 copy2.out
--- src/test/regress/expected/copy2.out	13 May 2005 06:33:40 -0000	1.21
+++ src/test/regress/expected/copy2.out	13 May 2005 22:21:01 -0000
@@ -194,6 +194,28 @@
--test that we read consecutive LFs properly
CREATE TEMP TABLE testnl (a int, b text, c int);
COPY testnl FROM stdin CSV;
-DROP TABLE x, y;
+CREATE TABLE z (
+	col1 text,
+	col2 text
+);
+COPY z from stdin;
+COPY z TO stdout;
+Jackson, Sam	\\h
+ABC	\\\\\t
+It is "perfect".	\t
+	NULL
+COPY z TO stdout WITH CSV;
+"Jackson, Sam",\h
+ABC,\\	
+"It is ""perfect"".",	
+"",NULL
+COPY y TO stdout WITH XML;
+<?xml version='1.0'?>
+<table>
+<row><col name='col1' null='n'>Jackson, Sam</col><col name='col2' null='n'>\h</col></row>
+<row><col name='col1' null='n'>It is &quot;perfect&quot;.</col><col name='col2' null='n'>&#09;</col></row>
+<row><col name='col1' null='n'></col><col name='col2' null='y'></col></row>
+</table>
+DROP TABLE x, y, z;
DROP FUNCTION fn_x_before();
DROP FUNCTION fn_x_after();
Index: src/test/regress/sql/copy2.sql
===================================================================
RCS file: /projects/cvsroot/pgsql/src/test/regress/sql/copy2.sql,v
retrieving revision 1.12
diff -u -r1.12 copy2.sql
--- src/test/regress/sql/copy2.sql	13 May 2005 06:33:40 -0000	1.12
+++ src/test/regress/sql/copy2.sql	13 May 2005 22:21:01 -0000
@@ -139,7 +139,22 @@
inside",2
\.
+CREATE TABLE z (
+	col1 text,
+	col2 text
+);
-DROP TABLE x, y;
+COPY z from stdin;
+Jackson, Sam	\\h
+\x41\x42\x43\xa0\x1	\x5c\x5c\x9
+It is "perfect".	\t
+	NULL
+\.
+
+COPY z TO stdout;
+COPY z TO stdout WITH CSV;
+COPY y TO stdout WITH XML;
+
+DROP TABLE x, y, z;
DROP FUNCTION fn_x_before();
DROP FUNCTION fn_x_after();

------------------------------------------------------------------------

---------------------------(end of broadcast)---------------------------
TIP 6: Have you searched our list archives?

http://archives.postgresql.org

#24Neil Conway
neilc@samurai.com
In reply to: Bruce Momjian (#22)
Re: patches for items from TODO list

Bruce Momjian wrote:

We considered putting XML in psql or libpq in the past, but the problem
is that interfaces like jdbc couldn't take advantage of it.

Well, you could implement it as a C UDF and use SPI. Or write it as a C
client library, and use JNI. Or just provide a Java implementation --
it's not like the COPY -> XML transformation is very complex.

To restate the case:

- I don't see how this feature is useful. Perhaps I'm mistaken, but I
don't think there's a lot of user demand for it (feel free to
demonstrate the contrary)

- The COPY -> XML transformation is trivial -- it would be easy for
clients to roll their own. At the same time, there is no standard or
canonical XML representation for COPY output, and I can easily imagine
different clients needing different representations. So there is limited
value in providing a single, inflexible backend implementation.

- There's no need for it to be in the backend, anyway. Perhaps if there
were overwhelming demand for this functionality, we would need to
provide it for all client libraries and the easiest solution would be to
put it in the backend, but I don't think that's the case.

If people really think XML COPY output mode is useful, why not implement
it client-side first and host it on pgfoundry? If lots of people are
using the client-side code, we can consider moving it into the core
distribution or the backend itself at that point.

-Neil

#25Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Neil Conway (#24)
Re: patches for items from TODO list

Neil Conway wrote:

Bruce Momjian wrote:

We considered putting XML in psql or libpq in the past, but the problem
is that interfaces like jdbc couldn't take advantage of it.

Well, you could implement it as a C UDF and use SPI. Or write it as a C
client library, and use JNI. Or just provide a Java implementation --
it's not like the COPY -> XML transformation is very complex.

To restate the case:

- I don't see how this feature is useful. Perhaps I'm mistaken, but I
don't think there's a lot of user demand for it (feel free to
demonstrate the contrary)

- The COPY -> XML transformation is trivial -- it would be easy for
clients to roll their own. At the same time, there is no standard or
canonical XML representation for COPY output, and I can easily imagine
different clients needing different representations. So there is limited
value in providing a single, inflexible backend implementation.

- There's no need for it to be in the backend, anyway. Perhaps if there
were overwhelming demand for this functionality, we would need to
provide it for all client libraries and the easiest solution would be to
put it in the backend, but I don't think that's the case.

If people really think XML COPY output mode is useful, why not implement
it client-side first and host it on pgfoundry? If lots of people are
using the client-side code, we can consider moving it into the core
distribution or the backend itself at that point.

All I can say is that we rejected an XML in the client patch a long time
ago and the discussion was that it belongs in the backend so everyone
can use it. I don't use XML myself so I have no opinion. We need
people who need XML output to comment in this thread.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
#26Josh Berkus
josh@agliodbs.com
In reply to: Neil Conway (#24)
Re: patches for items from TODO list

Folks,

- The COPY -> XML transformation is trivial -- it would be easy for
clients to roll their own. At the same time, there is no standard or
canonical XML representation for COPY output, and I can easily imagine
different clients needing different representations. So there is limited
value in providing a single, inflexible backend implementation.

I'm going to second Neil here. This feature becomes useful *only* when there
is a certified or de-facto universal standard XML representation for database
data. Then I could see a case for it. But there isn't.

Feel free to throw it on pgFoundry, though.

--
Josh Berkus
Aglio Database Solutions
San Francisco

#27Tom Lane
tgl@sss.pgh.pa.us
In reply to: Josh Berkus (#26)
Re: patches for items from TODO list

Josh Berkus <josh@agliodbs.com> writes:

I'm going to second Neil here.

I think the same --- given the points about lack of standardization,
it seems premature to put this into the backend. I'd be for it if
there were a clear standard, but ...

regards, tom lane

#28Christopher Kings-Lynne
chriskl@familyhealth.com.au
In reply to: Josh Berkus (#26)
Re: patches for items from TODO list

I'm going to second Neil here. This feature becomes useful *only* when there
is a certified or de-facto universal standard XML representation for database
data. Then I could see a case for it. But there isn't.

We've done it in phpPgAdmin (we made up our own standard), and a couple
of people use it. I also do not think that it should be in the backend
until there is a standard. Here is what phpPgAdmin produces (note NULL
handling):

<?xml version="1.0" encoding="UTF-8" ?>
<data>
<header>
<column name="feature_id" type="varchar" />
<column name="feature_name" type="varchar" />
<column name="is_supported" type="varchar" />
<column name="is_verified_by" type="varchar" />
<column name="comments" type="varchar" />
</header>
<records>
<row>
<column name="feature_id">PKG000</column>
<column name="feature_name">Core</column>
<column name="is_supported">NO</column>
<column name="is_verified_by" null="null"></column>
<column name="comments" null="null"></column>
</row>
<row>
<column name="feature_id">PKG001</column>
<column name="feature_name">Enhanced datetime facilities</column>
<column name="is_supported">YES</column>
<column name="is_verified_by" null="null"></column>
<column name="comments" null="null"></column>
</row>
<row>
<column name="feature_id">PKG002</column>
<column name="feature_name">Enhanced integrity management</column>
<column name="is_supported">NO</column>
<column name="is_verified_by" null="null"></column>
<column name="comments" null="null"></column>
</row>
<row>
<column name="feature_id">PKG003</column>
<column name="feature_name">OLAP facilities</column>
<column name="is_supported">NO</column>
<column name="is_verified_by" null="null"></column>
<column name="comments" null="null"></column>
</row>
<row>
<column name="feature_id">PKG004</column>
<column name="feature_name">PSM</column>
<column name="is_supported">NO</column>
<column name="is_verified_by" null="null"></column>
<column name="comments">PL/pgSQL is similar.</column>
</row>
<row>
<column name="feature_id">PKG005</column>
<column name="feature_name">CLI</column>
<column name="is_supported">NO</column>
<column name="is_verified_by" null="null"></column>
<column name="comments">ODBC is similar.</column>
</row>
<row>
<column name="feature_id">PKG006</column>
<column name="feature_name">Basic object support</column>
<column name="is_supported">NO</column>
<column name="is_verified_by" null="null"></column>
<column name="comments" null="null"></column>
</row>
<row>
<column name="feature_id">PKG007</column>
<column name="feature_name">Enhanced object support</column>
<column name="is_supported">NO</column>
<column name="is_verified_by" null="null"></column>
<column name="comments" null="null"></column>
</row>
<row>
<column name="feature_id">PKG008</column>
<column name="feature_name">Active database</column>
<column name="is_supported">NO</column>
<column name="is_verified_by" null="null"></column>
<column name="comments" null="null"></column>
</row>
<row>
<column name="feature_id">PKG010</column>
<column name="feature_name">OLAP</column>
<column name="is_supported">NO</column>
<column name="is_verified_by" null="null"></column>
<column name="comments">NO</column>
</row>
</records>
</data>

#29Andrew Dunstan
andrew@dunslane.net
In reply to: Christopher Kings-Lynne (#28)
Re: patches for items from TODO list

minor nit: the null attribute should take XMLSchema boolean type values:
{true, false, 1, 0}

More importantly, how do you handle array or record type fields? If they
are just opaque text I don't think that's what I at least would want
from XML output routines.

cheers

andrew

Christopher Kings-Lynne wrote:

Show quoted text

I'm going to second Neil here. This feature becomes useful *only*
when there is a certified or de-facto universal standard XML
representation for database data. Then I could see a case for it.
But there isn't.

We've done it in phpPgAdmin (we made up our own standard), and a
couple of people use it. I also do not think that it should be in the
backend until there is a standard. Here is what phpPgAdmin produces
(note NULL handling):

<?xml version="1.0" encoding="UTF-8" ?>
<data>
<header>
<column name="feature_id" type="varchar" />
<column name="feature_name" type="varchar" />
<column name="is_supported" type="varchar" />
<column name="is_verified_by" type="varchar" />
<column name="comments" type="varchar" />
</header>
<records>
<row>
<column name="feature_id">PKG000</column>
<column name="feature_name">Core</column>
<column name="is_supported">NO</column>
<column name="is_verified_by" null="null"></column>
<column name="comments" null="null"></column>
</row>
<row>
<column name="feature_id">PKG001</column>
<column name="feature_name">Enhanced datetime
facilities</column>
<column name="is_supported">YES</column>
<column name="is_verified_by" null="null"></column>
<column name="comments" null="null"></column>
</row>
<row>
<column name="feature_id">PKG002</column>
<column name="feature_name">Enhanced integrity
management</column>
<column name="is_supported">NO</column>
<column name="is_verified_by" null="null"></column>
<column name="comments" null="null"></column>
</row>
<row>
<column name="feature_id">PKG003</column>
<column name="feature_name">OLAP facilities</column>
<column name="is_supported">NO</column>
<column name="is_verified_by" null="null"></column>
<column name="comments" null="null"></column>
</row>
<row>
<column name="feature_id">PKG004</column>
<column name="feature_name">PSM</column>
<column name="is_supported">NO</column>
<column name="is_verified_by" null="null"></column>
<column name="comments">PL/pgSQL is similar.</column>
</row>
<row>
<column name="feature_id">PKG005</column>
<column name="feature_name">CLI</column>
<column name="is_supported">NO</column>
<column name="is_verified_by" null="null"></column>
<column name="comments">ODBC is similar.</column>
</row>
<row>
<column name="feature_id">PKG006</column>
<column name="feature_name">Basic object support</column>
<column name="is_supported">NO</column>
<column name="is_verified_by" null="null"></column>
<column name="comments" null="null"></column>
</row>
<row>
<column name="feature_id">PKG007</column>
<column name="feature_name">Enhanced object support</column>
<column name="is_supported">NO</column>
<column name="is_verified_by" null="null"></column>
<column name="comments" null="null"></column>
</row>
<row>
<column name="feature_id">PKG008</column>
<column name="feature_name">Active database</column>
<column name="is_supported">NO</column>
<column name="is_verified_by" null="null"></column>
<column name="comments" null="null"></column>
</row>
<row>
<column name="feature_id">PKG010</column>
<column name="feature_name">OLAP</column>
<column name="is_supported">NO</column>
<column name="is_verified_by" null="null"></column>
<column name="comments">NO</column>
</row>
</records>
</data>

---------------------------(end of broadcast)---------------------------
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faq

#30Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Sergey Ten (#6)
1 attachment(s)
Re: [HACKERS] patches for items from TODO list

I have extracted the COPY \x part of your patch, and have attached it
for later application to CVS for 8.1.

---------------------------------------------------------------------------

Sergey Ten wrote:

Hello all,

Thank you to all who replied for suggestions and help. Enclosed please find
code changes for the following items:
- Allow COPY to understand \x as a hex byte, and
- Add XML output to COPY
The changes include implementation of the features as well as modification
of the copy regression test.

After a careful consideration we decided to
- put XML implementation in the backend and
- use XML format described below, with justification of our decision.

The XML schema used by the COPY TO command was designed for ease of use and
to avoid the problem of column names appearing in XML element names.
XML doesn't allow spaces and punctuation in element names but Postgres does
allow these characters in column names; therefore, a direct mapping would be
problematic.

The solution selected places the column names into attribute fields where
any special characters they contain can be properly escaped using XML
entities. An additional attribute is used to distinguish null fields from
empty ones.

The example below is taken from the test suite. It demonstrates some basic
XML escaping in row 2. Row 3 demonstrates the difference between an empty
string (in col2) and a null string (in col3). If a field is null it will
always be empty but a field which is empty may or may not be null.
Always check the value of the 'null' attribute to be sure when a field is
truly null.

<?xml version='1.0'?>
<table>
<row>
<col name='col1' null='n'>Jackson, Sam</col>
<col name='col2' null='n'>\h</col>
</row>
<row>
<col name='col1' null='n'>It is &quot;perfect&quot;.</col>
<col name='col2' null='n'>&#09;</col>
</row>
<row>
<col name='col1' null='n'></col>
<col name='col2' null='y'></col>
</row>
</table>

Please let us know if about any concerns, objections the proposed change may
cause.

Best regards,
Jason Lucas, Sergey Ten
SourceLabs

-----Original Message-----
From: Bruce Momjian [mailto:pgman@candle.pha.pa.us]
Sent: Wednesday, May 11, 2005 7:11 PM
To: Sergey Ten
Cc: pgsql-hackers@postgresql.org; jason@sourcelabs.com
Subject: Re: [HACKERS] patches for items from TODO list

Sergey Ten wrote:

Hello all,

We would like to contribute to the Postgresql community by implementing
the following items from the TODO list
(http://developer.postgresql.org/todo.php):
. Allow COPY to understand \x as a hex byte . Allow COPY to optionally
include column headings in the first line . Add XML output to COPY

The changes are straightforward and include implementation of the
features as well as modification of the regression tests and

documentation.

Before sending a diff file with the changes, we would like to know if
these features have been already implemented.

Please check the web site version. Someone has already implemented
"Allow COPY to optionally include column headings in the first line".

As far as XML, there has been discussion on where that should be done?
In the backend, libpq, or psql. It will need discussion on hackers. I
assume you have read the developer's FAQ too.

--
Bruce Momjian                        |  http://candle.pha.pa.us
pgman@candle.pha.pa.us               |  (610) 359-1001
+  If your life is a hard drive,     |  13 Roberts Road
+  Christ can be your backup.        |  Newtown Square, Pennsylvania
19073

[ Attachment, skipping... ]

---------------------------(end of broadcast)---------------------------
TIP 6: Have you searched our list archives?

http://archives.postgresql.org

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

Attachments:

/pgpatches/hextext/plainDownload
Index: doc/src/sgml/ref/copy.sgml
===================================================================
RCS file: /cvsroot/pgsql/doc/src/sgml/ref/copy.sgml,v
retrieving revision 1.65
diff -c -c -r1.65 copy.sgml
*** doc/src/sgml/ref/copy.sgml	7 May 2005 02:22:45 -0000	1.65
--- doc/src/sgml/ref/copy.sgml	28 May 2005 03:49:10 -0000
***************
*** 424,436 ****
         <entry>Backslash followed by one to three octal digits specifies
         the character with that numeric code</entry>
        </row>
       </tbody>
      </tgroup>
     </informaltable>
  
!     Presently, <command>COPY TO</command> will never emit an octal-digits
!     backslash sequence, but it does use the other sequences listed above
!     for those control characters.
     </para>
  
     <para>
--- 424,441 ----
         <entry>Backslash followed by one to three octal digits specifies
         the character with that numeric code</entry>
        </row>
+       <row>
+        <entry><literal>\x</><replaceable>digits</></entry>
+        <entry>Backslash <literal>x</> followed by one or two hex digits specifies
+        the character with that numeric code</entry>
+       </row>
       </tbody>
      </tgroup>
     </informaltable>
  
!     Presently, <command>COPY TO</command> will never emit an octal or 
!     hex-digits backslash sequence, but it does use the other sequences
!     listed above for those control characters.
     </para>
  
     <para>
Index: src/backend/commands/copy.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/commands/copy.c,v
retrieving revision 1.244
diff -c -c -r1.244 copy.c
*** src/backend/commands/copy.c	7 May 2005 02:22:46 -0000	1.244
--- src/backend/commands/copy.c	28 May 2005 03:49:12 -0000
***************
*** 2274,2279 ****
--- 2274,2294 ----
  	return result;
  }
  
+ /*
+  *	Return decimal value for a hexadecimal digit
+  */
+ static
+ int GetDecimalFromHex(char hex)
+ {
+ 	if (isdigit(hex))
+ 		return hex - '0';
+ 	else
+ 	{
+ 		hex = tolower(hex);
+ 		return hex - 'a' + 10;
+ 	}
+ }
+ 
  /*----------
   * Read the value of a single attribute, performing de-escaping as needed.
   *
***************
*** 2335,2340 ****
--- 2350,2356 ----
  				case '5':
  				case '6':
  				case '7':
+ 					/* handle \013 */
  					{
  						int			val;
  
***************
*** 2360,2365 ****
--- 2376,2405 ----
  						c = val & 0377;
  					}
  					break;
+ 				case 'x':
+ 					/* Handle \x3F */
+ 					if (line_buf.cursor < line_buf.len)
+ 					{
+ 						char hexchar = line_buf.data[line_buf.cursor];
+ 
+ 						if (isxdigit(hexchar))
+ 						{
+ 							int val = GetDecimalFromHex(hexchar);
+ 
+ 							line_buf.cursor++;
+ 							if (line_buf.cursor < line_buf.len)
+ 							{
+ 								hexchar = line_buf.data[line_buf.cursor];
+ 								if (isxdigit(hexchar))
+ 								{
+ 									line_buf.cursor++;
+ 									val = (val << 4) + GetDecimalFromHex(hexchar);
+ 								}
+ 							}
+ 							c = val & 0xff;
+ 						}
+ 					}
+ 					break;
  				case 'b':
  					c = '\b';
  					break;
#31Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Josh Berkus (#26)
Re: patches for items from TODO list

I have removed the XML TODO item:

* Add XML output to pg_dump and COPY

We already allow XML to be stored in the database, and XPath queries
can be used on that data using /contrib/xml2. It also supports XSLT
transformations.

---------------------------------------------------------------------------

Josh Berkus wrote:

Folks,

- The COPY -> XML transformation is trivial -- it would be easy for
clients to roll their own. At the same time, there is no standard or
canonical XML representation for COPY output, and I can easily imagine
different clients needing different representations. So there is limited
value in providing a single, inflexible backend implementation.

I'm going to second Neil here. This feature becomes useful *only* when there
is a certified or de-facto universal standard XML representation for database
data. Then I could see a case for it. But there isn't.

Feel free to throw it on pgFoundry, though.

--
Josh Berkus
Aglio Database Solutions
San Francisco

---------------------------(end of broadcast)---------------------------
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to majordomo@postgresql.org)

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
#32Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Sergey Ten (#6)
3 attachment(s)
Re: [HACKERS] patches for items from TODO list

Here is an updated version of the COPY \x patch. It is the first patch
attached.

Also, I realized that if we support \x in COPY, we should also support
\x in strings to the backend. This is the second patch.

Third, I found out that psql has some unusual handling of escaped
numbers. Instead of using \ddd as octal, it has \ddd is decimal, \0ddd
is octal, and \0xddd is decimal. It is basically following the strtol()
rules for an escaped value. This seems confusing and contradicts how
the rest of our system works. I looked at 'bash' and found that it
supports the \000 and \x00 just like C, so I am confused why we have
the current behavior. This patch makes psql consistent with the rest of
our system for back slashes. It does break backward compatibility. It
wouldn't be a big deal to fix, except we document this in the psql
manual page, and that adds confusion.

FYI, here is the current psql behavior:

test=> \set x '\42'
test=> \echo :x
*
test=> \set x '\042'
test=> \echo :x
"
test=> \set x '\0x42'
test=> \echo :x
B

The new behavior is:

test=> \set x '\42'
test=> \echo :x
"
test=> \set x '\042'
test=> \echo :x
"
test=> \set x '\x42'
test=> \echo :x
B

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

Attachments:

/pgpatches/copyhextext/plainDownload
Index: doc/src/sgml/ref/copy.sgml
===================================================================
RCS file: /cvsroot/pgsql/doc/src/sgml/ref/copy.sgml,v
retrieving revision 1.65
diff -c -c -r1.65 copy.sgml
*** doc/src/sgml/ref/copy.sgml	7 May 2005 02:22:45 -0000	1.65
--- doc/src/sgml/ref/copy.sgml	28 May 2005 14:02:59 -0000
***************
*** 424,436 ****
         <entry>Backslash followed by one to three octal digits specifies
         the character with that numeric code</entry>
        </row>
       </tbody>
      </tgroup>
     </informaltable>
  
!     Presently, <command>COPY TO</command> will never emit an octal-digits
!     backslash sequence, but it does use the other sequences listed above
!     for those control characters.
     </para>
  
     <para>
--- 424,441 ----
         <entry>Backslash followed by one to three octal digits specifies
         the character with that numeric code</entry>
        </row>
+       <row>
+        <entry><literal>\x</><replaceable>digits</></entry>
+        <entry>Backslash <literal>x</> followed by one or two hex digits specifies
+        the character with that numeric code</entry>
+       </row>
       </tbody>
      </tgroup>
     </informaltable>
  
!     Presently, <command>COPY TO</command> will never emit an octal or 
!     hex-digits backslash sequence, but it does use the other sequences
!     listed above for those control characters.
     </para>
  
     <para>
Index: src/backend/commands/copy.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/commands/copy.c,v
retrieving revision 1.244
diff -c -c -r1.244 copy.c
*** src/backend/commands/copy.c	7 May 2005 02:22:46 -0000	1.244
--- src/backend/commands/copy.c	28 May 2005 14:03:07 -0000
***************
*** 2274,2279 ****
--- 2274,2291 ----
  	return result;
  }
  
+ /*
+  *	Return decimal value for a hexadecimal digit
+  */
+ static
+ int GetDecimalFromHex(char hex)
+ {
+ 	if (isdigit(hex))
+ 		return hex - '0';
+ 	else
+ 		return tolower(hex) - 'a' + 10;
+ }
+ 
  /*----------
   * Read the value of a single attribute, performing de-escaping as needed.
   *
***************
*** 2335,2340 ****
--- 2347,2353 ----
  				case '5':
  				case '6':
  				case '7':
+ 					/* handle \013 */
  					{
  						int			val;
  
***************
*** 2360,2365 ****
--- 2373,2402 ----
  						c = val & 0377;
  					}
  					break;
+ 				case 'x':
+ 					/* Handle \x3F */
+ 					if (line_buf.cursor < line_buf.len)
+ 					{
+ 						char hexchar = line_buf.data[line_buf.cursor];
+ 
+ 						if (isxdigit(hexchar))
+ 						{
+ 							int val = GetDecimalFromHex(hexchar);
+ 
+ 							line_buf.cursor++;
+ 							if (line_buf.cursor < line_buf.len)
+ 							{
+ 								hexchar = line_buf.data[line_buf.cursor];
+ 								if (isxdigit(hexchar))
+ 								{
+ 									line_buf.cursor++;
+ 									val = (val << 4) + GetDecimalFromHex(hexchar);
+ 								}
+ 							}
+ 							c = val & 0xff;
+ 						}
+ 					}
+ 					break;
  				case 'b':
  					c = '\b';
  					break;

/pgpatches/strhextext/plainDownload
Index: doc/src/sgml/datatype.sgml
===================================================================
RCS file: /cvsroot/pgsql/doc/src/sgml/datatype.sgml,v
retrieving revision 1.157
diff -c -c -r1.157 datatype.sgml
*** doc/src/sgml/datatype.sgml	1 May 2005 15:54:46 -0000	1.157
--- doc/src/sgml/datatype.sgml	28 May 2005 14:02:40 -0000
***************
*** 1118,1124 ****
     <para>
      When entering <type>bytea</type> values, octets of certain values
      <emphasis>must</emphasis> be escaped (but all octet values
!     <emphasis>may</emphasis> be escaped) when used as part of a string
      literal in an <acronym>SQL</acronym> statement. In general, to
      escape an octet, it is converted into the three-digit octal number
      equivalent of its decimal octet value, and preceded by two
--- 1118,1124 ----
     <para>
      When entering <type>bytea</type> values, octets of certain values
      <emphasis>must</emphasis> be escaped (but all octet values
!     <emphasis>can</emphasis> be escaped) when used as part of a string
      literal in an <acronym>SQL</acronym> statement. In general, to
      escape an octet, it is converted into the three-digit octal number
      equivalent of its decimal octet value, and preceded by two
Index: doc/src/sgml/libpq.sgml
===================================================================
RCS file: /cvsroot/pgsql/doc/src/sgml/libpq.sgml,v
retrieving revision 1.180
diff -c -c -r1.180 libpq.sgml
*** doc/src/sgml/libpq.sgml	26 Feb 2005 18:39:04 -0000	1.180
--- doc/src/sgml/libpq.sgml	28 May 2005 14:02:46 -0000
***************
*** 2229,2235 ****
  
  <para>
     Certain byte values <emphasis>must</emphasis> be escaped (but all
!    byte values <emphasis>may</emphasis> be escaped) when used as part
     of a <type>bytea</type> literal in an <acronym>SQL</acronym>
     statement. In general, to escape a byte, it is converted into the
     three digit octal number equal to the octet value, and preceded by
--- 2229,2235 ----
  
  <para>
     Certain byte values <emphasis>must</emphasis> be escaped (but all
!    byte values <emphasis>can</emphasis> be escaped) when used as part
     of a <type>bytea</type> literal in an <acronym>SQL</acronym>
     statement. In general, to escape a byte, it is converted into the
     three digit octal number equal to the octet value, and preceded by
Index: doc/src/sgml/syntax.sgml
===================================================================
RCS file: /cvsroot/pgsql/doc/src/sgml/syntax.sgml,v
retrieving revision 1.99
diff -c -c -r1.99 syntax.sgml
*** doc/src/sgml/syntax.sgml	23 Dec 2004 05:37:40 -0000	1.99
--- doc/src/sgml/syntax.sgml	28 May 2005 14:02:58 -0000
***************
*** 254,270 ****
  
      <para>
       Another <productname>PostgreSQL</productname> extension is that
!      C-style backslash escapes are available:
!      <literal>\b</literal> is a backspace, <literal>\f</literal> is a
!      form feed, <literal>\n</literal> is a newline,
!      <literal>\r</literal> is a carriage return, <literal>\t</literal>
!      is a tab, and <literal>\<replaceable>xxx</replaceable></literal>,
!      where <replaceable>xxx</replaceable> is an octal number, is a
!      byte with the corresponding code.  (It is your responsibility
!      that the byte sequences you create are valid characters in the
!      server character set encoding.)  Any other character following a
!      backslash is taken literally.  Thus, to include a backslash in a
!      string constant, write two backslashes.
      </para>
  
      <para>
--- 254,271 ----
  
      <para>
       Another <productname>PostgreSQL</productname> extension is that
!      C-style backslash escapes are available: <literal>\b</literal> is a
!      backspace, <literal>\f</literal> is a form feed,
!      <literal>\n</literal> is a newline, <literal>\r</literal> is a
!      carriage return, <literal>\t</literal> is a tab. Also supported is
!      <literal>\<replaceable>digits</replaceable></literal>, where
!      <replaceable>ddd</replaceable> represents an octal byte value, and
!      <literal>\x<replaceable>hexdigits</replaceable></literal>, where
!      <replaceable>hexdigits</replaceable> represents a hexadecimal byte value.
!      (It is your responsibility that the byte sequences you create are
!      valid characters in the server character set encoding.) Any other
!      character following a backslash is taken literally. Thus, to
!      include a backslash in a string constant, write two backslashes.
      </para>
  
      <para>
Index: src/backend/parser/scan.l
===================================================================
RCS file: /cvsroot/pgsql/src/backend/parser/scan.l,v
retrieving revision 1.122
diff -c -c -r1.122 scan.l
*** src/backend/parser/scan.l	26 May 2005 01:24:29 -0000	1.122
--- src/backend/parser/scan.l	28 May 2005 14:03:10 -0000
***************
*** 193,200 ****
  xqstart			{quote}
  xqdouble		{quote}{quote}
  xqinside		[^\\']+
! xqescape		[\\][^0-7]
  xqoctesc		[\\][0-7]{1,3}
  
  /* $foo$ style quotes ("dollar quoting")
   * The quoted string starts with $foo$ where "foo" is an optional string
--- 193,201 ----
  xqstart			{quote}
  xqdouble		{quote}{quote}
  xqinside		[^\\']+
! xqescape		[\\][^0-7x]
  xqoctesc		[\\][0-7]{1,3}
+ xqhexesc		[\\]x[0-9A-Fa-f]{1,2}
  
  /* $foo$ style quotes ("dollar quoting")
   * The quoted string starts with $foo$ where "foo" is an optional string
***************
*** 435,440 ****
--- 436,445 ----
  					unsigned char c = strtoul(yytext+1, NULL, 8);
  					addlitchar(c);
  				}
+ <xq>{xqhexesc}  {
+ 					unsigned char c = strtoul(yytext+2, NULL, 16);
+ 					addlitchar(c);
+ 				}
  <xq>{quotecontinue} {
  					/* ignore */
  				}
/pgpatches/psqlhextext/plainDownload
Index: doc/src/sgml/ref/psql-ref.sgml
===================================================================
RCS file: /cvsroot/pgsql/doc/src/sgml/ref/psql-ref.sgml,v
retrieving revision 1.135
diff -c -c -r1.135 psql-ref.sgml
*** doc/src/sgml/ref/psql-ref.sgml	28 Apr 2005 13:09:59 -0000	1.135
--- doc/src/sgml/ref/psql-ref.sgml	28 May 2005 16:23:01 -0000
***************
*** 590,599 ****
      precede it by a backslash. Anything contained in single quotes is
      furthermore subject to C-like substitutions for
      <literal>\n</literal> (new line), <literal>\t</literal> (tab),
!     <literal>\</literal><replaceable>digits</replaceable>,
!     <literal>\0</literal><replaceable>digits</replaceable>, and
!     <literal>\0x</literal><replaceable>digits</replaceable> (the
!     character with the given decimal, octal, or hexadecimal code).
      </para>
  
      <para>
--- 590,598 ----
      precede it by a backslash. Anything contained in single quotes is
      furthermore subject to C-like substitutions for
      <literal>\n</literal> (new line), <literal>\t</literal> (tab),
!     <literal>\</literal><replaceable>digits</replaceable>, and
!     <literal>\x</literal><replaceable>digits</replaceable> (the
!     character with the given octal or hexadecimal code).
      </para>
  
      <para>
Index: src/bin/psql/psqlscan.l
===================================================================
RCS file: /cvsroot/pgsql/src/bin/psql/psqlscan.l,v
retrieving revision 1.10
diff -c -c -r1.10 psqlscan.l
*** src/bin/psql/psqlscan.l	26 May 2005 01:24:29 -0000	1.10
--- src/bin/psql/psqlscan.l	28 May 2005 16:23:10 -0000
***************
*** 849,877 ****
  "\\r"			{ appendPQExpBufferChar(output_buf, '\r'); }
  "\\f"			{ appendPQExpBufferChar(output_buf, '\f'); }
  
! "\\"[1-9][0-9]*	{
! 					/* decimal case */
! 					appendPQExpBufferChar(output_buf,
! 										  (char) strtol(yytext + 1, NULL, 0));
! 				}
! 
! "\\"0[0-7]*		{
  					/* octal case */
  					appendPQExpBufferChar(output_buf,
! 										  (char) strtol(yytext + 1, NULL, 0));
  				}
  
! "\\"0[xX][0-9A-Fa-f]+	{
  					/* hex case */
  					appendPQExpBufferChar(output_buf,
! 										  (char) strtol(yytext + 1, NULL, 0));
! 				}
! 
! "\\"0[xX]	{
! 					/* failed hex case */
! 					yyless(2);
! 					appendPQExpBufferChar(output_buf,
! 										  (char) strtol(yytext + 1, NULL, 0));
  				}
  
  "\\".			{ emit(yytext + 1, 1); }
--- 849,864 ----
  "\\r"			{ appendPQExpBufferChar(output_buf, '\r'); }
  "\\f"			{ appendPQExpBufferChar(output_buf, '\f'); }
  
! "\\"[0-7]{1,3}	{
  					/* octal case */
  					appendPQExpBufferChar(output_buf,
! 										  (char) strtol(yytext + 1, NULL, 8));
  				}
  
! "\\"x[0-9A-Fa-f]{1,2}	{
  					/* hex case */
  					appendPQExpBufferChar(output_buf,
! 										  (char) strtol(yytext + 2, NULL, 16));
  				}
  
  "\\".			{ emit(yytext + 1, 1); }
#33Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#32)
Re: [HACKERS] patches for items from TODO list

Bruce Momjian <pgman@candle.pha.pa.us> writes:

Here is an updated version of the COPY \x patch. It is the first patch
attached.
Also, I realized that if we support \x in COPY, we should also support
\x in strings to the backend. This is the second patch.

Do we really want to do any of these things? We've been getting beaten
up recently about the fact that we have non-SQL-spec string escapes
(ie, all the backslash stuff) so I'm a bit dubious about adding more,
especially when there's little if any demand for it.

I don't object too much to the COPY addition, since that's outside any
spec anyway, but I do think we ought to think twice about adding this
to SQL literal handling.

Third, I found out that psql has some unusual handling of escaped
numbers. Instead of using \ddd as octal, it has \ddd is decimal, \0ddd
is octal, and \0xddd is decimal. It is basically following the strtol()
rules for an escaped value. This seems confusing and contradicts how
the rest of our system works.

I agree, that's just going to confuse people.

! xqescape [\\][^0-7x]

If you are going to insist on this, at least make it case-insensitive.

regards, tom lane

#34Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Tom Lane (#33)
Escape handling in COPY, strings, psql

Tom Lane wrote:

Bruce Momjian <pgman@candle.pha.pa.us> writes:

Here is an updated version of the COPY \x patch. It is the first patch
attached.
Also, I realized that if we support \x in COPY, we should also support
\x in strings to the backend. This is the second patch.

Do we really want to do any of these things? We've been getting beaten
up recently about the fact that we have non-SQL-spec string escapes
(ie, all the backslash stuff) so I'm a bit dubious about adding more,
especially when there's little if any demand for it.

I thought about that, but adding additional escape letters isn't our
problem --- it is the escape mechanism itself that is the issue.

I have wanted to post on this issue so now is a good time. I think we
have been validly beaten up in that we pride ourselves on standards
compliance but have escape requirement on all strings. Our string
escapes are a major problem --- not the number of them but the
requirement to double backslashes on input, like 'C:\\tmp'. I am
thinking the only clean solution is to add a special keyword like ESCAPE
before strings that contain escape information. I think a GUC is too
general. You know if the string is a constant if it contains escapes
just by looking at it, and if it is a variable, hopefully you know if it
has escapes.

Basically, I think we have to deal with this somehow. I think it could
be implemented by looking for the ESCAPE keyword in parser/scan.l and
handling it all in there by ignoring backslash escapes if ESCAPE
preceeds the string. By the time you are in gram.y, it is too late.

I don't object too much to the COPY addition, since that's outside any
spec anyway, but I do think we ought to think twice about adding this
to SQL literal handling.

Third, I found out that psql has some unusual handling of escaped
numbers. Instead of using \ddd as octal, it has \ddd is decimal, \0ddd
is octal, and \0xddd is decimal. It is basically following the strtol()
rules for an escaped value. This seems confusing and contradicts how
the rest of our system works.

I agree, that's just going to confuse people.

! xqescape [\\][^0-7x]

If you are going to insist on this, at least make it case-insensitive.

The submitted COPY patch also was case-insensitive, \x and \X, but I
changed that because we are case-sensitive for all backslashes in COPY,
and C is the same (\n and \N are different too, so we actually use the
case-sensitivity). Should we allow \X just so it is case-insensitive
like the SQL specification X'4f'? That is the only logic I can think of
for it to be case-insensitive, but we have to then do that at all
levels, and I am not sure it makes sense.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
#35Michael Paesold
mpaesold@gmx.at
In reply to: Tom Lane (#33)
Re: [HACKERS] patches for items from TODO list

Tom Lane wrote:

Bruce Momjian <pgman@candle.pha.pa.us> writes:

Here is an updated version of the COPY \x patch. It is the first patch
attached.
Also, I realized that if we support \x in COPY, we should also support
\x in strings to the backend. This is the second patch.

Do we really want to do any of these things? We've been getting beaten
up recently about the fact that we have non-SQL-spec string escapes
(ie, all the backslash stuff) so I'm a bit dubious about adding more,
especially when there's little if any demand for it.

I don't object too much to the COPY addition, since that's outside any
spec anyway, but I do think we ought to think twice about adding this
to SQL literal handling.

+1 from me on this for Tom -- please don't break spec compliance!

Best Regards,
Michael Paesold

#36Peter Eisentraut
peter_e@gmx.net
In reply to: Bruce Momjian (#34)
Re: Escape handling in COPY, strings, psql

Bruce Momjian wrote:

I thought about that, but adding additional escape letters isn't our
problem --- it is the escape mechanism itself that is the issue.

In a random-encoding environment, the option to specify byte values
directly -- at any level -- is of limited value anyway and is a
potential source of errors. So let's stay away from that.

I did not find the original posts that your quotations came from, but it
has to be considered that COPY is also subject to encoding processing.
Overall, I find this proposal to be a dangerous option.

--
Peter Eisentraut
http://developer.postgresql.org/~petere/

#37Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Bruce Momjian (#34)
Re: Escape handling in COPY, strings, psql

Bruce Momjian wrote:

Tom Lane wrote:

Bruce Momjian <pgman@candle.pha.pa.us> writes:

Here is an updated version of the COPY \x patch. It is the first patch
attached.
Also, I realized that if we support \x in COPY, we should also support
\x in strings to the backend. This is the second patch.

Do we really want to do any of these things? We've been getting beaten
up recently about the fact that we have non-SQL-spec string escapes
(ie, all the backslash stuff) so I'm a bit dubious about adding more,
especially when there's little if any demand for it.

I thought about that, but adding additional escape letters isn't our
problem --- it is the escape mechanism itself that is the issue.

I have wanted to post on this issue so now is a good time. I think we
have been validly beaten up in that we pride ourselves on standards
compliance but have escape requirement on all strings. Our string
escapes are a major problem --- not the number of them but the
requirement to double backslashes on input, like 'C:\\tmp'. I am
thinking the only clean solution is to add a special keyword like ESCAPE
before strings that contain escape information. I think a GUC is too
general. You know if the string is a constant if it contains escapes
just by looking at it, and if it is a variable, hopefully you know if it
has escapes.

Basically, I think we have to deal with this somehow. I think it could
be implemented by looking for the ESCAPE keyword in parser/scan.l and
handling it all in there by ignoring backslash escapes if ESCAPE
preceeds the string. By the time you are in gram.y, it is too late.

One other idea would be to remove escape processing for single-quoted
strings but keep it for our $$ strings, becuase they are not ANSI
standard.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
#38Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#37)
Re: Escape handling in COPY, strings, psql

Bruce Momjian <pgman@candle.pha.pa.us> writes:

One other idea would be to remove escape processing for single-quoted
strings but keep it for our $$ strings, becuase they are not ANSI
standard.

There is *no* escape processing within $$, and never will be, because
that would destroy the entire point. You'd be right back having to
double backslashes.

regards, tom lane

#39Peter Eisentraut
peter_e@gmx.net
In reply to: Bruce Momjian (#37)
Re: Escape handling in COPY, strings, psql

Bruce Momjian wrote:

I am thinking the only clean solution is to add a special keyword
like ESCAPE before strings that contain escape information. I
think a GUC is too general. You know if the string is a constant
if it contains escapes just by looking at it, and if it is a
variable, hopefully you know if it has escapes.

I do support gradually phasing out backslash escapes in standard string
literals in the interest of portability. Most of the current escape
sequences are of limited value anyway. Let's think about ways to get
there:

Enabling escape sequences in string literals controls the formatting of
input (and output?) data, so it is akin to, say, the client encoding
and the date style, so a GUC variable isn't out of the question in my
mind. It makes most sense, though, if we want to eventually make users
switch it off all the time, that is, as a transition aid. But before
that can happen, we need to come up with an alternative mechanism to
enter weird characters.

One such way may be to provide functions (say, chr(), tab(), etc.) to
give access to unprintable characters, but that will result in terrible
performance for long strings and it also won't help with COPY or places
where only literals are allowed.

Another way would be to allow escape sequences only in specially marked
strings. The proposal above doing 'foo' ESCAPE 'x' seems fairly
elegant for SQL linguists but would be pretty weird to implement in the
lexer. It won't help with COPY either, but that is really the case for
all solutions.

A more compact represenation may be using a prefix letter, like E'foo'.
This fits the SQL syntax, is familiar with Python programmers (although
in the other direction), and can be implemented efficiently in the
lexer. I like that the best, personally.

For COPY, we would probably have to use a flag in the COPY command
itself either way (like already done for NULL AS).

Comments? Other ideas? Keep the escapes?

--
Peter Eisentraut
http://developer.postgresql.org/~petere/

#40Tom Lane
tgl@sss.pgh.pa.us
In reply to: Peter Eisentraut (#39)
Re: Escape handling in COPY, strings, psql

Peter Eisentraut <peter_e@gmx.net> writes:

I do support gradually phasing out backslash escapes in standard string
literals in the interest of portability. Most of the current escape
sequences are of limited value anyway. Let's think about ways to get
there:

I really don't think there is any way to get there without creating
gaping security holes in all kinds of client code :-(. If we change
the escaping rules, then a client that is expecting some other rule
than happens to be in force will be subject to trivial SQL-injection
attacks. This will make the autocommit fiasco pale by comparison ...

For COPY, we would probably have to use a flag in the COPY command
itself either way (like already done for NULL AS).

The spec-compatibility argument for removing escapes does not apply to
COPY at all, so I see no need to fool with the COPY definition in any
case.

regards, tom lane

#41Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Peter Eisentraut (#39)
Re: Escape handling in COPY, strings, psql

Peter Eisentraut wrote:

Bruce Momjian wrote:

I am thinking the only clean solution is to add a special keyword
like ESCAPE before strings that contain escape information. I
think a GUC is too general. You know if the string is a constant
if it contains escapes just by looking at it, and if it is a
variable, hopefully you know if it has escapes.

I do support gradually phasing out backslash escapes in standard string
literals in the interest of portability. Most of the current escape
sequences are of limited value anyway. Let's think about ways to get
there:

Enabling escape sequences in string literals controls the formatting of
input (and output?) data, so it is akin to, say, the client encoding
and the date style, so a GUC variable isn't out of the question in my
mind. It makes most sense, though, if we want to eventually make users
switch it off all the time, that is, as a transition aid. But before
that can happen, we need to come up with an alternative mechanism to
enter weird characters.

One such way may be to provide functions (say, chr(), tab(), etc.) to
give access to unprintable characters, but that will result in terrible
performance for long strings and it also won't help with COPY or places
where only literals are allowed.

Another way would be to allow escape sequences only in specially marked
strings. The proposal above doing 'foo' ESCAPE 'x' seems fairly
elegant for SQL linguists but would be pretty weird to implement in the
lexer. It won't help with COPY either, but that is really the case for
all solutions.

I was suggesting ESCAPE 'string' or ESC 'string'. The marker has to be
before the string so scan.l can alter its processing of the string ---
after the string is too late --- there is no way to undo any escaping
that has happened, and it might already be used by gram.y.

I could probably hack up a sample implementation if people are
interested.

A more compact representation may be using a prefix letter, like E'foo'.
This fits the SQL syntax, is familiar with Python programmers (although
in the other direction), and can be implemented efficiently in the
lexer. I like that the best, personally.

For COPY, we would probably have to use a flag in the COPY command
itself either way (like already done for NULL AS).

I agree with Tom that COPY has to be left unchanged. The fundamental
problem is the representation of NULL values, that I don't think we can
do without some escape mechanism. Single-quote escapes works by
doubling them, but once you need to represent something more like
null's, I can't think of a solution without escapes.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
#42Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Tom Lane (#40)
Re: Escape handling in COPY, strings, psql

Tom Lane wrote:

Peter Eisentraut <peter_e@gmx.net> writes:

I do support gradually phasing out backslash escapes in standard string
literals in the interest of portability. Most of the current escape
sequences are of limited value anyway. Let's think about ways to get
there:

I really don't think there is any way to get there without creating
gaping security holes in all kinds of client code :-(. If we change
the escaping rules, then a client that is expecting some other rule
than happens to be in force will be subject to trivial SQL-injection
attacks. This will make the autocommit fiasco pale by comparison ...

I looked at PQescapeString() and fortunately it escapes single quotes
by doing double-single quotes, not by using a backslash. This was
probably chosen for standards compliance.

Basically, I think our current behavior is not sustainable. I think we
are going to need to do something, and I think we should consider a
solution now rather than later. I don't think we can be as serious a
contender for portability without some kind of solution.

I am thinking we should first tell people in 8.1 that they should start
using only double-single quotes, and perhaps support the ESCAPE phrase
as a no-op, and then consider some kind of solution in 8.2 or later.

I don't think fixing this is going to be a huge security problem, but it
might be a small one. The good thing is that double-single quotes work,
so if people use only that for quote escaping, if you forget the ESCAPE
clause, you just get literal backslashes, not a security problem.

I ran the following test:

test=> select $$\$$;
?column?
----------
\
(1 row)

test=> create table test (x TEXT);
CREATE TABLE
test=> INSERT INTO test VALUES ($$\$$);
INSERT 0 1
test=> SELECT * FROM test;
x
---
\
(1 row)

and the good news is that output of backslashes is fine --- it is just
input that is the issue, and the security problem is only using \',
which we would have to tell people to avoid and start using only ''.

I think we can tell people in 8.1 that they should modify their
applications to only use '', and that \' might be a security problem in
the future. If we get to that then using ESC or not only affects input
of values and literal backslashes being entered, and my guess is that
90% of the backslash entries that want escaping are literal in the
application and not supplied by program variables. In fact, if we
disable backslash by default then strings coming in only have to deal
with single quotes (like other databases) and the system is more secure
because there is no special backslash handling by default.

For COPY, we would probably have to use a flag in the COPY command
itself either way (like already done for NULL AS).

The spec-compatibility argument for removing escapes does not apply to
COPY at all, so I see no need to fool with the COPY definition in any
case.

Agreed.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
#43Christopher Kings-Lynne
chriskl@familyhealth.com.au
In reply to: Bruce Momjian (#42)
Re: Escape handling in COPY, strings, psql

I think we can tell people in 8.1 that they should modify their
applications to only use '', and that \' might be a security problem in
the future. If we get to that then using ESC or not only affects input
of values and literal backslashes being entered, and my guess is that
90% of the backslash entries that want escaping are literal in the
application and not supplied by program variables. In fact, if we
disable backslash by default then strings coming in only have to deal
with single quotes (like other databases) and the system is more secure
because there is no special backslash handling by default.

I can tell you right now this will be a problem :) There are loads of
PHP ppl who use addslashes() instead of pg_escape_string() to escape data.

Chris

#44Peter Eisentraut
peter_e@gmx.net
In reply to: Bruce Momjian (#41)
Re: Escape handling in COPY, strings, psql

Bruce Momjian wrote:

I was suggesting ESCAPE 'string' or ESC 'string'.  The marker has to
be before the string so scan.l can alter its processing of the string
--- after the string is too late --- there is no way to undo any
escaping that has happened, and it might already be used by gram.y.

That pretty much corresponds to my E'string' proposal. Both are
probably equally trivial to implement.

I agree with Tom that COPY has to be left unchanged. The fundamental
problem is the representation of NULL values, that I don't think we
can do without some escape mechanism. Single-quote escapes works by
doubling them, but once you need to represent something more like
null's, I can't think of a solution without escapes.

Yes, I now realize that COPY has a whole set of different rules anyway,
so we can leave that out of this discussion.

--
Peter Eisentraut
http://developer.postgresql.org/~petere/

#45Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Christopher Kings-Lynne (#43)
Re: Escape handling in COPY, strings, psql

Christopher Kings-Lynne wrote:

I think we can tell people in 8.1 that they should modify their
applications to only use '', and that \' might be a security problem in
the future. If we get to that then using ESC or not only affects input
of values and literal backslashes being entered, and my guess is that
90% of the backslash entries that want escaping are literal in the
application and not supplied by program variables. In fact, if we
disable backslash by default then strings coming in only have to deal
with single quotes (like other databases) and the system is more secure
because there is no special backslash handling by default.

I can tell you right now this will be a problem :) There are loads of
PHP ppl who use addslashes() instead of pg_escape_string() to escape data.

I read the PHP addslashes() manual page:

http://us3.php.net/addslashes

First, I see what people mean about PHP having most of the complex
content in comments, rather than in the actual manual text, and this
tendency is certainly something we want to avoid --- you end up having
to digest all the comments to find the details that should be in the
manual already.

On to the case at hand, the comments mention that addslashes() isn't
safe for all databases, and in fact isn't the prefered method. I do
think it could be a problem we have to have people avoid. One idea for
8.1 is to throw a warning if \' appears in a string, thereby helping
people find the places they are using the incorrect non-standard
escaping.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
#46Christopher Kings-Lynne
chriskl@familyhealth.com.au
In reply to: Bruce Momjian (#45)
Re: Escape handling in COPY, strings, psql

I read the PHP addslashes() manual page:

http://us3.php.net/addslashes

First, I see what people mean about PHP having most of the complex
content in comments, rather than in the actual manual text, and this
tendency is certainly something we want to avoid --- you end up having
to digest all the comments to find the details that should be in the
manual already.

Actually, all the comments are posted on the php-doc list, with
automatic urls in them for 'fixed in cvs', 'rejected', etc.

Each comment is supposed to be acted upon (ie. fixed in source), then
deleted.

There's still a lot of old comment around that hasn't had that treatment
though...

Chris

#47Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Christopher Kings-Lynne (#46)
Re: Escape handling in COPY, strings, psql

Christopher Kings-Lynne wrote:

I read the PHP addslashes() manual page:

http://us3.php.net/addslashes

First, I see what people mean about PHP having most of the complex
content in comments, rather than in the actual manual text, and this
tendency is certainly something we want to avoid --- you end up having
to digest all the comments to find the details that should be in the
manual already.

Actually, all the comments are posted on the php-doc list, with
automatic urls in them for 'fixed in cvs', 'rejected', etc.

Each comment is supposed to be acted upon (ie. fixed in source), then
deleted.

There's still a lot of old comment around that hasn't had that treatment
though...

Right, they are more _usage_ comments, but still I think they could be
consolidated into manual text.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
#48Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Peter Eisentraut (#44)
Re: Escape handling in COPY, strings, psql

Peter Eisentraut wrote:

Bruce Momjian wrote:

I was suggesting ESCAPE 'string' or ESC 'string'.  The marker has to
be before the string so scan.l can alter its processing of the string
--- after the string is too late --- there is no way to undo any
escaping that has happened, and it might already be used by gram.y.

That pretty much corresponds to my E'string' proposal. Both are
probably equally trivial to implement.

Right. I think your E'' idea has the benefit of fitting with our
existing X'' and B'' modifiers. It is also simpler and cleaner to do in
scan.l, so I think your idea is best.

I agree with Tom that COPY has to be left unchanged. The fundamental
problem is the representation of NULL values, that I don't think we
can do without some escape mechanism. Single-quote escapes works by
doubling them, but once you need to represent something more like
null's, I can't think of a solution without escapes.

Yes, I now realize that COPY has a whole set of different rules anyway,
so we can leave that out of this discussion.

Cool.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
#49Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Tom Lane (#33)
1 attachment(s)
Re: [HACKERS] patches for items from TODO list

Tom Lane wrote:

Third, I found out that psql has some unusual handling of escaped
numbers. Instead of using \ddd as octal, it has \ddd is decimal, \0ddd
is octal, and \0xddd is decimal. It is basically following the strtol()
rules for an escaped value. This seems confusing and contradicts how
the rest of our system works.

I agree, that's just going to confuse people.

Fixed and applied, with no hex addition.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

Attachments:

/bjm/difftext/plainDownload
Index: src/bin/psql/psqlscan.l
===================================================================
RCS file: /cvsroot/pgsql/src/bin/psql/psqlscan.l,v
retrieving revision 1.10
diff -c -c -r1.10 psqlscan.l
*** src/bin/psql/psqlscan.l	26 May 2005 01:24:29 -0000	1.10
--- src/bin/psql/psqlscan.l	30 May 2005 14:35:43 -0000
***************
*** 849,877 ****
  "\\r"			{ appendPQExpBufferChar(output_buf, '\r'); }
  "\\f"			{ appendPQExpBufferChar(output_buf, '\f'); }
  
! "\\"[1-9][0-9]*	{
! 					/* decimal case */
! 					appendPQExpBufferChar(output_buf,
! 										  (char) strtol(yytext + 1, NULL, 0));
! 				}
! 
! "\\"0[0-7]*		{
  					/* octal case */
  					appendPQExpBufferChar(output_buf,
! 										  (char) strtol(yytext + 1, NULL, 0));
! 				}
! 
! "\\"0[xX][0-9A-Fa-f]+	{
! 					/* hex case */
! 					appendPQExpBufferChar(output_buf,
! 										  (char) strtol(yytext + 1, NULL, 0));
! 				}
! 
! "\\"0[xX]	{
! 					/* failed hex case */
! 					yyless(2);
! 					appendPQExpBufferChar(output_buf,
! 										  (char) strtol(yytext + 1, NULL, 0));
  				}
  
  "\\".			{ emit(yytext + 1, 1); }
--- 849,858 ----
  "\\r"			{ appendPQExpBufferChar(output_buf, '\r'); }
  "\\f"			{ appendPQExpBufferChar(output_buf, '\f'); }
  
! "\\"[0-7]{1,3}	{
  					/* octal case */
  					appendPQExpBufferChar(output_buf,
! 										  (char) strtol(yytext + 1, NULL, 8));
  				}
  
  "\\".			{ emit(yytext + 1, 1); }
#50Greg Stark
gsstark@mit.edu
In reply to: Bruce Momjian (#47)
Re: Escape handling in COPY, strings, psql

Bruce Momjian <pgman@candle.pha.pa.us> writes:

Christopher Kings-Lynne wrote:

Each comment is supposed to be acted upon (ie. fixed in source), then
deleted.

Right, they are more _usage_ comments, but still I think they could be
consolidated into manual text.

If that's "supposed" to happen it certainly hasn't been the de facto
procedure.

I think they have things partly right here though. A lot of those comments
aren't actually the kinds of things that belong in the canonical reference.
They include things like "watch out for this common error" or "here's a handy
use for this function". Often the "common error" or "handy use" are pretty
bogus but every now and then there's a genuinely useful one.

These kinds of things would just clutter up a reference. A reference should
just state unambiguously this function does XYZ and give examples that help
explain XYZ.

The PHP Docs do have a bit of a problem in that often the comments include
things like "In case X, what happens is Y" which really ought to be covered by
the canonical reference. That's a problem.

--
greg

#51Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Bruce Momjian (#32)
3 attachment(s)
Adding \x escape processing to COPY, psql, backend

Bruce Momjian wrote:

Here is an updated version of the COPY \x patch. It is the first patch
attached.

Also, I realized that if we support \x in COPY, we should also support
\x in strings to the backend. This is the second patch.

Here is a new version of the three \x hex support patches. I have added
\x for psql variables, which is the last patch.

I have IM'ed with Peter and he is now OK with the idea of supporting \x,
with the underestanding that it doesn't take us any farther away from
compatibility than we are now.

I have already fixed the psql octal/decimal/hex behavior in another
patch.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

Attachments:

/pgpatches/copyhextext/plainDownload
Index: doc/src/sgml/ref/copy.sgml
===================================================================
RCS file: /cvsroot/pgsql/doc/src/sgml/ref/copy.sgml,v
retrieving revision 1.65
diff -c -c -r1.65 copy.sgml
*** doc/src/sgml/ref/copy.sgml	7 May 2005 02:22:45 -0000	1.65
--- doc/src/sgml/ref/copy.sgml	28 May 2005 14:02:59 -0000
***************
*** 424,436 ****
         <entry>Backslash followed by one to three octal digits specifies
         the character with that numeric code</entry>
        </row>
       </tbody>
      </tgroup>
     </informaltable>
  
!     Presently, <command>COPY TO</command> will never emit an octal-digits
!     backslash sequence, but it does use the other sequences listed above
!     for those control characters.
     </para>
  
     <para>
--- 424,441 ----
         <entry>Backslash followed by one to three octal digits specifies
         the character with that numeric code</entry>
        </row>
+       <row>
+        <entry><literal>\x</><replaceable>digits</></entry>
+        <entry>Backslash <literal>x</> followed by one or two hex digits specifies
+        the character with that numeric code</entry>
+       </row>
       </tbody>
      </tgroup>
     </informaltable>
  
!     Presently, <command>COPY TO</command> will never emit an octal or 
!     hex-digits backslash sequence, but it does use the other sequences
!     listed above for those control characters.
     </para>
  
     <para>
Index: src/backend/commands/copy.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/commands/copy.c,v
retrieving revision 1.244
diff -c -c -r1.244 copy.c
*** src/backend/commands/copy.c	7 May 2005 02:22:46 -0000	1.244
--- src/backend/commands/copy.c	28 May 2005 14:03:07 -0000
***************
*** 2274,2279 ****
--- 2274,2291 ----
  	return result;
  }
  
+ /*
+  *	Return decimal value for a hexadecimal digit
+  */
+ static
+ int GetDecimalFromHex(char hex)
+ {
+ 	if (isdigit(hex))
+ 		return hex - '0';
+ 	else
+ 		return tolower(hex) - 'a' + 10;
+ }
+ 
  /*----------
   * Read the value of a single attribute, performing de-escaping as needed.
   *
***************
*** 2335,2340 ****
--- 2347,2353 ----
  				case '5':
  				case '6':
  				case '7':
+ 					/* handle \013 */
  					{
  						int			val;
  
***************
*** 2360,2365 ****
--- 2373,2402 ----
  						c = val & 0377;
  					}
  					break;
+ 				case 'x':
+ 					/* Handle \x3F */
+ 					if (line_buf.cursor < line_buf.len)
+ 					{
+ 						char hexchar = line_buf.data[line_buf.cursor];
+ 
+ 						if (isxdigit(hexchar))
+ 						{
+ 							int val = GetDecimalFromHex(hexchar);
+ 
+ 							line_buf.cursor++;
+ 							if (line_buf.cursor < line_buf.len)
+ 							{
+ 								hexchar = line_buf.data[line_buf.cursor];
+ 								if (isxdigit(hexchar))
+ 								{
+ 									line_buf.cursor++;
+ 									val = (val << 4) + GetDecimalFromHex(hexchar);
+ 								}
+ 							}
+ 							c = val & 0xff;
+ 						}
+ 					}
+ 					break;
  				case 'b':
  					c = '\b';
  					break;

/pgpatches/strhextext/plainDownload
Index: doc/src/sgml/syntax.sgml
===================================================================
RCS file: /cvsroot/pgsql/doc/src/sgml/syntax.sgml,v
retrieving revision 1.99
diff -c -c -r1.99 syntax.sgml
*** doc/src/sgml/syntax.sgml	23 Dec 2004 05:37:40 -0000	1.99
--- doc/src/sgml/syntax.sgml	28 May 2005 14:02:58 -0000
***************
*** 254,270 ****
  
      <para>
       Another <productname>PostgreSQL</productname> extension is that
!      C-style backslash escapes are available:
!      <literal>\b</literal> is a backspace, <literal>\f</literal> is a
!      form feed, <literal>\n</literal> is a newline,
!      <literal>\r</literal> is a carriage return, <literal>\t</literal>
!      is a tab, and <literal>\<replaceable>xxx</replaceable></literal>,
!      where <replaceable>xxx</replaceable> is an octal number, is a
!      byte with the corresponding code.  (It is your responsibility
!      that the byte sequences you create are valid characters in the
!      server character set encoding.)  Any other character following a
!      backslash is taken literally.  Thus, to include a backslash in a
!      string constant, write two backslashes.
      </para>
  
      <para>
--- 254,271 ----
  
      <para>
       Another <productname>PostgreSQL</productname> extension is that
!      C-style backslash escapes are available: <literal>\b</literal> is a
!      backspace, <literal>\f</literal> is a form feed,
!      <literal>\n</literal> is a newline, <literal>\r</literal> is a
!      carriage return, <literal>\t</literal> is a tab. Also supported is
!      <literal>\<replaceable>digits</replaceable></literal>, where
!      <replaceable>ddd</replaceable> represents an octal byte value, and
!      <literal>\x<replaceable>hexdigits</replaceable></literal>, where
!      <replaceable>hexdigits</replaceable> represents a hexadecimal byte value.
!      (It is your responsibility that the byte sequences you create are
!      valid characters in the server character set encoding.) Any other
!      character following a backslash is taken literally. Thus, to
!      include a backslash in a string constant, write two backslashes.
      </para>
  
      <para>
Index: src/backend/parser/scan.l
===================================================================
RCS file: /cvsroot/pgsql/src/backend/parser/scan.l,v
retrieving revision 1.122
diff -c -c -r1.122 scan.l
*** src/backend/parser/scan.l	26 May 2005 01:24:29 -0000	1.122
--- src/backend/parser/scan.l	28 May 2005 14:03:10 -0000
***************
*** 193,200 ****
  xqstart			{quote}
  xqdouble		{quote}{quote}
  xqinside		[^\\']+
! xqescape		[\\][^0-7]
  xqoctesc		[\\][0-7]{1,3}
  
  /* $foo$ style quotes ("dollar quoting")
   * The quoted string starts with $foo$ where "foo" is an optional string
--- 193,201 ----
  xqstart			{quote}
  xqdouble		{quote}{quote}
  xqinside		[^\\']+
! xqescape		[\\][^0-7x]
  xqoctesc		[\\][0-7]{1,3}
+ xqhexesc		[\\]x[0-9A-Fa-f]{1,2}
  
  /* $foo$ style quotes ("dollar quoting")
   * The quoted string starts with $foo$ where "foo" is an optional string
***************
*** 435,440 ****
--- 436,445 ----
  					unsigned char c = strtoul(yytext+1, NULL, 8);
  					addlitchar(c);
  				}
+ <xq>{xqhexesc}  {
+ 					unsigned char c = strtoul(yytext+2, NULL, 16);
+ 					addlitchar(c);
+ 				}
  <xq>{quotecontinue} {
  					/* ignore */
  				}
/pgpatches/psqlhextext/plainDownload
Index: doc/src/sgml/ref/psql-ref.sgml
===================================================================
RCS file: /cvsroot/pgsql/doc/src/sgml/ref/psql-ref.sgml,v
retrieving revision 1.137
diff -c -c -r1.137 psql-ref.sgml
*** doc/src/sgml/ref/psql-ref.sgml	30 May 2005 15:24:23 -0000	1.137
--- doc/src/sgml/ref/psql-ref.sgml	30 May 2005 19:07:20 -0000
***************
*** 589,596 ****
      single quote. To include a single quote into such an argument,
      precede it by a backslash. Anything contained in single quotes is
      furthermore subject to C-like substitutions for
!     <literal>\n</literal> (new line), <literal>\t</literal> (tab), and
!     <literal>\</literal><replaceable>digits</replaceable> (octal).
      </para>
  
      <para>
--- 589,597 ----
      single quote. To include a single quote into such an argument,
      precede it by a backslash. Anything contained in single quotes is
      furthermore subject to C-like substitutions for
!     <literal>\n</literal> (new line), <literal>\t</literal> (tab),
!     <literal>\</literal><replaceable>digits</replaceable> (octal),
!     <literal>\x</literal><replaceable>digits</replaceable> (hexadecimal).
      </para>
  
      <para>
Index: src/bin/psql/psqlscan.l
===================================================================
RCS file: /cvsroot/pgsql/src/bin/psql/psqlscan.l,v
retrieving revision 1.12
diff -c -c -r1.12 psqlscan.l
*** src/bin/psql/psqlscan.l	30 May 2005 16:48:47 -0000	1.12
--- src/bin/psql/psqlscan.l	30 May 2005 19:07:25 -0000
***************
*** 250,257 ****
  xqstart			{quote}
  xqdouble		{quote}{quote}
  xqinside		[^\\']+
! xqescape		[\\][^0-7]
  xqoctesc		[\\][0-7]{1,3}
  
  /* $foo$ style quotes ("dollar quoting")
   * The quoted string starts with $foo$ where "foo" is an optional string
--- 250,258 ----
  xqstart			{quote}
  xqdouble		{quote}{quote}
  xqinside		[^\\']+
! xqescape		[\\][^0-7x]
  xqoctesc		[\\][0-7]{1,3}
+ xqhexesc		[\\]x[0-9A-Fa-f]{1,2}
  
  /* $foo$ style quotes ("dollar quoting")
   * The quoted string starts with $foo$ where "foo" is an optional string
***************
*** 467,472 ****
--- 468,476 ----
  <xq>{xqoctesc}  {
  					ECHO;
  				}
+ <xq>{xqhexesc}  {
+ 					ECHO;
+ 				}
  <xq>{quotecontinue} {
  					ECHO;
  				}
***************
*** 855,860 ****
--- 859,870 ----
  										  (char) strtol(yytext + 1, NULL, 8));
  				}
  
+ {xqhexesc}		{
+ 					/* hex case */
+ 					appendPQExpBufferChar(output_buf,
+ 										  (char) strtol(yytext + 2, NULL, 16));
+ 				}
+ 
  "\\".			{ emit(yytext + 1, 1); }
  
  {other}|\n		{ ECHO; }
#52Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#51)
Re: Adding \x escape processing to COPY, psql, backend

Bruce Momjian <pgman@candle.pha.pa.us> writes:

Here is a new version of the three \x hex support patches. I have added
\x for psql variables, which is the last patch.

I have IM'ed with Peter and he is now OK with the idea of supporting \x,
with the underestanding that it doesn't take us any farther away from
compatibility than we are now.

Peter may be OK with it, but I object strongly to adding this to SQL
literals. This is exactly *not* the direction we want to be going in.

I don't really see the point for COPY and psql, either.

regards, tom lane

#53Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Bruce Momjian (#48)
Backslash handling in strings

Bruce Momjian wrote:

Peter Eisentraut wrote:

Bruce Momjian wrote:

I was suggesting ESCAPE 'string' or ESC 'string'.  The marker has to
be before the string so scan.l can alter its processing of the string
--- after the string is too late --- there is no way to undo any
escaping that has happened, and it might already be used by gram.y.

That pretty much corresponds to my E'string' proposal. Both are
probably equally trivial to implement.

Right. I think your E'' idea has the benefit of fitting with our
existing X'' and B'' modifiers. It is also simpler and cleaner to do in
scan.l, so I think your idea is best.

[ CC list trimmed.]

OK, I talked to Tom and Peter and I have come up with a tentative plan.

The goal, at some point, is that we would have two types of strings, ''
strings and E'' strings. '' strings don't have any special backslash
handling for compatibility with with the ANSI spec and all other
databases except MySQL (and in MySQL it is now optional). E'' strings
behave just like our strings do now, with backslash handling.

In 8.0.X, we add support for E'' strings, but it is a noop. This is
done just for portability with future releases. We also state that
users should stop using \' to escape quotes in strings, and instead use
'', and that we will throw a warning in 8.1 if we see a \' in a non-E
string. (We could probably throw a warning for E'' use of \' too, but I
want to give users the ability to avoid the warning if they can't change
from using \' to ''.)

In 8.1, we start issuing the warning for \' in non-E strings, plus we
tell users who want escape processing that they will have to use E''
strings for this starting in release 8.2, and they should start
migrating their escaped strings over to E''.

Tom also suggested a readonly GUC variable that is sent to clients that
indicates if simple strings have backslash handling, for use by
applications that are doing escapes themselves, perhaps
'escape_all_strings'.

PQescapeString() and PQescapeBytea() can still be used, but only with
E'' strings in 8.2. We could create PQquoteString() for 8.1 and later
to allow for just single-quote doubling for non-E strings.

Tom asked about how to handle pg_dump contents that have strings, like
function bodies. We could start using E'' for those in 8.0 but it does
break backward movement of dumps, and someone upgrading from 7.1 to 8.2
would be in trouble. :-( Perhaps we will have another round of
subrelease fixes and we can bundle this into that and tell people they
have to upgrade to the newest subrelease before going to 8.2. I think
we have had that requirement in the past when we had broken pg_dump
processing.

The good news is that once everyone uses only '' to quote string, we
will not have any data security issues with this change. The only
potential problem is the mishandling of backslash characters if there is
a mismatch between what the client expects and the server uses. By
backpatching E'' perhaps even to 7.4 and earlier (as a noop), we could
minimize this problem.

Is this whole thing ugly?  Yes.  Can we just close our eyes and hope we
can continue with our current behavior while growing a larger userbase
--- probabably not.
-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
#54Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Tom Lane (#52)
Re: [PATCHES] Adding \x escape processing to COPY, psql, backend

Tom Lane wrote:

Bruce Momjian <pgman@candle.pha.pa.us> writes:

Here is a new version of the three \x hex support patches. I have added
\x for psql variables, which is the last patch.

I have IM'ed with Peter and he is now OK with the idea of supporting \x,
with the underestanding that it doesn't take us any farther away from
compatibility than we are now.

Peter may be OK with it, but I object strongly to adding this to SQL
literals. This is exactly *not* the direction we want to be going in.

I don't really see the point for COPY and psql, either.

We already support \n, \r, \t, and \octal. I don't see any problem with
improving it. It does not take us any closer or farther away from spec
compliance.

COPY \x has been requested by several people, and there are actually two
patches that have been submitted in the past year for this.

As you know, escapes already provide a useful mechanism on COPY and SQL
strings, and there is a plan I just posted to deal with standards
issues, but I don't see \x taking us closer or farther from this.

Please explain why this takes us in the wrong direction.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
#55Greg Stark
gsstark@mit.edu
In reply to: Bruce Momjian (#53)
Re: Backslash handling in strings

Bruce Momjian <pgman@candle.pha.pa.us> writes:

The goal, at some point, is that we would have two types of strings, ''
strings and E'' strings. '' strings don't have any special backslash
handling for compatibility with with the ANSI spec and all other
databases except MySQL (and in MySQL it is now optional). E'' strings
behave just like our strings do now, with backslash handling.

The only thing I'm not clear on is what exactly is the use case for E''
strings. That is, who do you expect to actually use them?

Any new applications are recommended to be using '' strings. And any existing
applications obviously won't be using them since they don't currently exist.

The only potential candidates are existing applications being ported forward.
And that only makes sense if they're currently using some function like
addslash. Is it really easier to change all the SQL queries to use E'' (and
still have a bug) than it is to replace addslash with PQquoteString() ?

Also, I'm really confused why you would make PQescapeString require E''
strings and introduce a new function. That means existing non-buggy
applications would suddenly be buggy? And it would be impossible to write a
properly functioning application that interpolates a constant into a query
that would be portable to 8.2 and 8.0?

--
greg

#56Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Greg Stark (#55)
Re: Backslash handling in strings

Greg Stark wrote:

Bruce Momjian <pgman@candle.pha.pa.us> writes:

The goal, at some point, is that we would have two types of strings, ''
strings and E'' strings. '' strings don't have any special backslash
handling for compatibility with with the ANSI spec and all other
databases except MySQL (and in MySQL it is now optional). E'' strings
behave just like our strings do now, with backslash handling.

The only thing I'm not clear on is what exactly is the use case for E''
strings. That is, who do you expect to actually use them?

Any new applications are recommended to be using '' strings. And any existing
applications obviously won't be using them since they don't currently exist.

We are saying to use '' to escape single quotes in all strings. E'' is
still useful if you want to use backslash escapes in your strings.

Does that answer your questions?

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
#57Tom Lane
tgl@sss.pgh.pa.us
In reply to: Greg Stark (#55)
Re: Backslash handling in strings

Greg Stark <gsstark@mit.edu> writes:

The only thing I'm not clear on is what exactly is the use case for E''
strings. That is, who do you expect to actually use them?

The case that convinced me we need to keep some sort of backslash
capability is this: suppose you want to put a string including a tab
into your database. Try to do it with psql:
t=> insert into foo values ('<TAB>
Guess what: you won't get anywhere, at least not unless you disable
readline. So it's nice to be able to use \t.

There are related issues involving \r and \n depending on your platform.
And this doesn't even scratch the surface of encoding-related funnies.

So there's definitely a use-case for keeping the existing backslash
behavior, and E'string' seems like a reasonable proposal for doing that
without conflicting with the SQL spec.

What I do not see at the moment is how we get there from here (ie,
dropping backslashing in regular literals) without incurring tremendous
pain --- including breaking all existing pg_dump files, introducing
security holes and/or data corruption into many existing apps that are
not presently broken, and probably some other ways of ruining your day.
I'm quite unconvinced that this particular letter of the SQL spec is
worth complying with ...

regards, tom lane

#58Dennis Bjorklund
db@zigo.dhs.org
In reply to: Tom Lane (#57)
Re: Backslash handling in strings

On Tue, 31 May 2005, Tom Lane wrote:

The case that convinced me we need to keep some sort of backslash
capability is this: suppose you want to put a string including a tab
into your database. Try to do it with psql:
t=> insert into foo values ('<TAB>
Guess what: you won't get anywhere, at least not unless you disable
readline. So it's nice to be able to use \t.

To insert a tab using readline you can press ESC followed by TAB. This
works as least in readline as it is setup in redhat/fedora (and readline
can be setup in 1000 different ways so who knows how portable this is).

--
/Dennis Bj�rklund

#59Tom Ivar Helbekkmo
tih@eunetnorge.no
In reply to: Dennis Bjorklund (#58)
Re: Backslash handling in strings

Dennis Bjorklund <db@zigo.dhs.org> writes:

To insert a tab using readline you can press ESC followed by TAB.

...or ^V followed by TAB, as per age-old tradition. :-)

-tih
--
Don't ascribe to stupidity what can be adequately explained by ignorance.

#60Dennis Bjorklund
db@zigo.dhs.org
In reply to: Tom Ivar Helbekkmo (#59)
Re: Backslash handling in strings

On Tue, 31 May 2005, Tom Ivar Helbekkmo wrote:

...or ^V followed by TAB, as per age-old tradition. :-)

Right, I forgot about that one. One can also do other control characters
instead of TAB by pressing CTRL-J and similar.

Well, I just wanted to point out that it's possible. The main problem is
still to make sure that old dumps work and can be imported. I don't see
how that can work without a GUC variable in addition to the E'foo' stuff
(but that's not so bad as it can be phased in to support old pg_dumps and
phased out again in pg 10 or something).

--
/Dennis Bj�rklund

#61Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Tom Lane (#57)
Re: Backslash handling in strings

Tom Lane wrote:

Greg Stark <gsstark@mit.edu> writes:

The only thing I'm not clear on is what exactly is the use case for E''
strings. That is, who do you expect to actually use them?

The case that convinced me we need to keep some sort of backslash
capability is this: suppose you want to put a string including a tab
into your database. Try to do it with psql:
t=> insert into foo values ('<TAB>
Guess what: you won't get anywhere, at least not unless you disable
readline. So it's nice to be able to use \t.

There are related issues involving \r and \n depending on your platform.
And this doesn't even scratch the surface of encoding-related funnies.

So there's definitely a use-case for keeping the existing backslash
behavior, and E'string' seems like a reasonable proposal for doing that
without conflicting with the SQL spec.

What I do not see at the moment is how we get there from here (ie,
dropping backslashing in regular literals) without incurring tremendous
pain --- including breaking all existing pg_dump files, introducing
security holes and/or data corruption into many existing apps that are
not presently broken, and probably some other ways of ruining your day.
I'm quite unconvinced that this particular letter of the SQL spec is
worth complying with ...

I think this is going to be like the Win32 port, where there is little
excitement from our existing users, but it is needed to grow our user
base.

I think the E'' is useful becuase it gives people a migration path for
the escapes they are already using, and the escape mechanism itself it
something useful to keep.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
#62Bruno Wolff III
bruno@wolff.to
In reply to: Dennis Bjorklund (#58)
Re: Backslash handling in strings

On Tue, May 31, 2005 at 11:49:20 +0200,
Dennis Bjorklund <db@zigo.dhs.org> wrote:

On Tue, 31 May 2005, Tom Lane wrote:

The case that convinced me we need to keep some sort of backslash
capability is this: suppose you want to put a string including a tab
into your database. Try to do it with psql:
t=> insert into foo values ('<TAB>
Guess what: you won't get anywhere, at least not unless you disable
readline. So it's nice to be able to use \t.

To insert a tab using readline you can press ESC followed by TAB. This
works as least in readline as it is setup in redhat/fedora (and readline
can be setup in 1000 different ways so who knows how portable this is).

There are still advantages to having printable backslashed escaped characters
in strings that are saved to files. It makes it easier to see what is really
in the string and they are less likely to get accidentally munged when
editing the file or moving it between systems with different line termination
conventions.

#63Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Bruce Momjian (#54)
Re: [PATCHES] Adding \x escape processing to COPY, psql, backend

Patch applied. Thanks for the COPY \x patch.

---------------------------------------------------------------------------

Bruce Momjian wrote:

Tom Lane wrote:

Bruce Momjian <pgman@candle.pha.pa.us> writes:

Here is a new version of the three \x hex support patches. I have added
\x for psql variables, which is the last patch.

I have IM'ed with Peter and he is now OK with the idea of supporting \x,
with the underestanding that it doesn't take us any farther away from
compatibility than we are now.

Peter may be OK with it, but I object strongly to adding this to SQL
literals. This is exactly *not* the direction we want to be going in.

I don't really see the point for COPY and psql, either.

We already support \n, \r, \t, and \octal. I don't see any problem with
improving it. It does not take us any closer or farther away from spec
compliance.

COPY \x has been requested by several people, and there are actually two
patches that have been submitted in the past year for this.

As you know, escapes already provide a useful mechanism on COPY and SQL
strings, and there is a plan I just posted to deal with standards
issues, but I don't see \x taking us closer or farther from this.

Please explain why this takes us in the wrong direction.

-- 
Bruce Momjian                        |  http://candle.pha.pa.us
pgman@candle.pha.pa.us               |  (610) 359-1001
+  If your life is a hard drive,     |  13 Roberts Road
+  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

---------------------------(end of broadcast)---------------------------
TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
#64Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Bruno Wolff III (#62)
Re: Backslash handling in strings

Here is a summary of the issues with moving to no escapes for non-E
strings:

http://candle.pha.pa.us/cgi-bin/pgescape

---------------------------------------------------------------------------

Bruno Wolff III wrote:

On Tue, May 31, 2005 at 11:49:20 +0200,
Dennis Bjorklund <db@zigo.dhs.org> wrote:

On Tue, 31 May 2005, Tom Lane wrote:

The case that convinced me we need to keep some sort of backslash
capability is this: suppose you want to put a string including a tab
into your database. Try to do it with psql:
t=> insert into foo values ('<TAB>
Guess what: you won't get anywhere, at least not unless you disable
readline. So it's nice to be able to use \t.

To insert a tab using readline you can press ESC followed by TAB. This
works as least in readline as it is setup in redhat/fedora (and readline
can be setup in 1000 different ways so who knows how portable this is).

There are still advantages to having printable backslashed escaped characters
in strings that are saved to files. It makes it easier to see what is really
in the string and they are less likely to get accidentally munged when
editing the file or moving it between systems with different line termination
conventions.

---------------------------(end of broadcast)---------------------------
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to majordomo@postgresql.org)

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073