\xDD patch for 7.5devel

Started by Jason Goddenover 22 years ago11 messageshackers

jasongodden@optushome.com.au

over 22 years ago

Hi all,

This is my first patch for PostgreSQL against the 7.5devel cvs (please advise if this is the wrong place to post patches). This patch simply enables the \xDD (or \XDD) hexadecimal import in the copy command (im starting with the simple stuff first). I did notice that there may be a need to issue an error if an invalid octal or hex character is found following a \ or \x. No errors are currently flagged by the octal (or this hex) import.

Rgds,

Jason

cvs server: Diffing .
Index: copy.c
===================================================================
RCS file: /projects/cvsroot/pgsql-server/src/backend/commands/copy.c,v
retrieving revision 1.213
diff -u -r1.213 copy.c
--- copy.c      6 Oct 2003 02:38:53 -0000       1.213
+++ copy.c      5 Nov 2003 11:19:33 -0000
@@ -48,9 +48,10 @@
 #include "utils/lsyscache.h"
 #include "utils/syscache.h"

-
 #define ISOCTAL(c) (((c) >= '0') && ((c) <= '7'))
 #define OCTVALUE(c) ((c) - '0')
+#define ISHEX(c) ((((c)>='0') && ((c)<='9')) || (((c)>='A') && ((c)<='F')) || (((c)>='a') && ((c)<='f')))
+#define HEXVALUE(c) (((c)>='a') ? ((c)-87) : (((c)>='A') ? ((c)-55) : ((c)-'0')))

 /*
  * Represents the different source/dest cases we need to worry about at
@@ -1947,6 +1948,33 @@
                        c = line_buf.data[line_buf.cursor++];
                        switch (c)
                        {
+                               case 'x':
+                               case 'X':
+                                       {
+                                               if (line_buf.cursor < line_buf.len)
+                                               {
+                                                       int hexval = 0;
+
+                                                       c = line_buf.data[line_buf.cursor];
+                                                       if (ISHEX(c))
+                                                       {
+                                                               line_buf.cursor++;
+                                                               hexval = HEXVALUE(c);
+                                                               if (line_buf.cursor < line_buf.len)
+                                                               {
+                                                                       c = line_buf.data[line_buf.cursor];
+                                                                       line_buf.cursor++;
+                                                                       if (ISHEX(c))
+                                                                       {
+                                                                               line_buf.cursor++;
+                                                                               hexval = (hexval << 4) + HEXVALUE(c);
+                                                                       }
+                                                               }
+                                                       }
+                                                       c = hexval;
+                                               }
+                                       }
+                                       break;
                                case '0':
                                case '1':
                                case '2':

Peter Eisentraut

peter_e@gmx.net

over 22 years ago

In reply to: Jason Godden (#1)

Re: \xDD patch for 7.5devel

Jason Godden writes:

This is my first patch for PostgreSQL against the 7.5devel cvs (please advise if this is the wrong place to post patches). This patch simply enables the \xDD (or \XDD) hexadecimal import in the copy command (im starting with the simple stuff first). I did notice that there may be a need to issue an error if an invalid octal or hex character is found following a \ or \x. No errors are currently flagged by the octal (or this hex) import.

I think this belongs into the string literal parser (at least in addition
to COPY).

--
Peter Eisentraut peter_e@gmx.net

Christopher Kings-Lynne

chriskl@familyhealth.com.au

over 22 years ago

In reply to: Peter Eisentraut (#2)

Re: \xDD patch for 7.5devel

This is my first patch for PostgreSQL against the 7.5devel cvs (please advise if this is the wrong place to post patches). This patch simply enables the \xDD (or \XDD) hexadecimal import in the copy command (im starting with the simple stuff first). I did notice that there may be a need to issue an error if an invalid octal or hex character is found following a \ or \x. No errors are currently flagged by the octal (or this hex) import.

I think this belongs into the string literal parser (at least in addition
to COPY).

That's what always happens when I start working on something - someone
points out something that makes it 100 times harder :P

Chris

Tom Lane

tgl@sss.pgh.pa.us

over 22 years ago

In reply to: Jason Godden (#1)

Re: \xDD patch for 7.5devel

Jason Godden <jasongodden@optushome.com.au> writes:

This is my first patch for PostgreSQL against the 7.5devel cvs (please
advise if this is the wrong place to post patches).

pgsql-patches in future, please.

+#define HEXVALUE(c) (((c)>='a') ? ((c)-87) : (((c)>='A') ? ((c)-55) : ((c)-'0')))

This seems excessively dependent on the assumption that the character
set is ASCII. Why have you hard-coded numeric equivalents into this
macro?

BTW, the patch is incomplete because it is lacking documentation.

regards, tom lane

Markus Bertheau

twanger@bluetwanger.de

over 22 years ago

In reply to: Tom Lane (#4)

Re: \xDD patch for 7.5devel

В Срд, 05.11.2003, в 16:25, Tom Lane пишет:

+#define HEXVALUE(c) (((c)>='a') ? ((c)-87) : (((c)>='A') ? ((c)-55) : ((c)-'0')))

This seems excessively dependent on the assumption that the character
set is ASCII. Why have you hard-coded numeric equivalents into this
macro?

What not ASCII compatible character sets are out there in use still
today?

--
Markus Bertheau <twanger@bluetwanger.de>

Jason Godden

jasongodden@optushome.com.au

over 22 years ago

In reply to: Markus Bertheau (#5)

Re: \xDD patch for 7.5devel

On Thu, 6 Nov 2003 06:25 am, Markus Bertheau wrote:

В Срд, 05.11.2003, в 16:25, Tom Lane пишет:

+#define HEXVALUE(c) (((c)>='a') ? ((c)-87) : (((c)>='A') ? ((c)-55) :
((c)-'0')))

This seems excessively dependent on the assumption that the character
set is ASCII. Why have you hard-coded numeric equivalents into this
macro?

What not ASCII compatible character sets are out there in use still
today?

Ah, yes - didn't even think about the character sets. If thats the case then
octal needs attention as well because it makes a similar assumption. Peter
Eisentraut commented that this should be in the string literal parser.
Should this be the case? and if so should i migrate both octal and hex to
this parser?

Rgds,

Jason

Larry Rosenman

ler@lerctr.org

over 22 years ago

In reply to: Jason Godden (#6)

Re: \xDD patch for 7.5devel

--On Thursday, November 06, 2003 07:43:07 +1100 Jason Godden
<jasongodden@optushome.com.au> wrote:

On Thu, 6 Nov 2003 06:25 am, Markus Bertheau wrote:

? ???, 05.11.2003, ? 16:25, Tom Lane ?????:

+#define HEXVALUE(c) (((c)>='a') ? ((c)-87) : (((c)>='A') ? ((c)-55)
: ((c)-'0')))

This seems excessively dependent on the assumption that the character
set is ASCII. Why have you hard-coded numeric equivalents into this
macro?

What not ASCII compatible character sets are out there in use still
today?

EBCDIC as far as I know is still the default on IBM Mainframes (been 5+
years but...).

--
Larry Rosenman http://www.lerctr.org/~ler
Phone: +1 972-414-9812 E-Mail: ler@lerctr.org
US Mail: 1905 Steamboat Springs Drive, Garland, TX 75044-6749

Kurt Roeckx

Q@ping.be

over 22 years ago

In reply to: Larry Rosenman (#7)

Re: \xDD patch for 7.5devel

On Wed, Nov 05, 2003 at 02:47:17PM -0600, Larry Rosenman wrote:

--On Thursday, November 06, 2003 07:43:07 +1100 Jason Godden
<jasongodden@optushome.com.au> wrote:

On Thu, 6 Nov 2003 06:25 am, Markus Bertheau wrote:

? ???, 05.11.2003, ? 16:25, Tom Lane ?????:

+#define HEXVALUE(c) (((c)>='a') ? ((c)-87) : (((c)>='A') ? ((c)-55)
: ((c)-'0')))

This seems excessively dependent on the assumption that the character
set is ASCII. Why have you hard-coded numeric equivalents into this
macro?

What not ASCII compatible character sets are out there in use still
today?

EBCDIC as far as I know is still the default on IBM Mainframes (been 5+
years but...).

Linux on the s390, s390x runs in ASCII mode. MVS, OS/390, z/OS
all use EBCDIC though.

But I don't think it has anything to do with which OS/hardware
you use but rather what charset is used during the communication.
It's probably about the charset that is used to send the "\xDD".
I guess question is that you can assume that that string is
encoded in ASCII.

If this is broken, I'd say that the octal encoding and other
quotes are broken too.

Kurt

Stephan Szabo

sszabo@megazone23.bigpanda.com

over 22 years ago

In reply to: Jason Godden (#6)

Re: \xDD patch for 7.5devel

On Thu, 6 Nov 2003, Jason Godden wrote:

On Thu, 6 Nov 2003 06:25 am, Markus Bertheau wrote:

В Срд, 05.11.2003, в 16:25, Tom Lane пишет:

+#define HEXVALUE(c) (((c)>='a') ? ((c)-87) : (((c)>='A') ? ((c)-55) :
((c)-'0')))

This seems excessively dependent on the assumption that the character
set is ASCII. Why have you hard-coded numeric equivalents into this
macro?

What not ASCII compatible character sets are out there in use still
today?

Ah, yes - didn't even think about the character sets. If thats the case then
octal needs attention as well because it makes a similar assumption. Peter

I haven't looked at the code in question, but assuming the digits are
contiguous and in order is safe, the C spec mandates that. Assuming that
the letters are in order and contiguous is not safe.

#10

Tom Lane

tgl@sss.pgh.pa.us

over 22 years ago

In reply to: Stephan Szabo (#9)

Re: \xDD patch for 7.5devel

Stephan Szabo <sszabo@megazone.bigpanda.com> writes:

On Thu, 6 Nov 2003, Jason Godden wrote:

On Thu, 6 Nov 2003 06:25 am, Markus Bertheau wrote:

+#define HEXVALUE(c) (((c)>='a') ? ((c)-87) : (((c)>='A') ? ((c)-55) :

((c)-'0')))

I haven't looked at the code in question, but assuming the digits are
contiguous and in order is safe, the C spec mandates that. Assuming that
the letters are in order and contiguous is not safe.

I believe that's a true statement with respect to the character sets
used in the field; I dunno whether the C spec actually says that though.

My original concern about this macro had several different levels:

1. I can't see any reason why the subtractions are coded as "-55" and
not "-'A' + 10". The existing coding is less understandable than doing
it right, and won't save anything in runtime (since the compiler will fold
the constants anyway), on top of being a character set dependency.

2. I don't much care for the assumption that lower case letters are
greater than upper case are greater than digits. This could be avoided
at a fairly small runtime penalty by making range tests:

#define HEXVALUE(c) \
(((c) >= '0' && (c) <= '9') ? ((c) - '0') : \
(((c) >= 'A' && (c) <= 'F') ? ((c) - 'A' + 10) : \
((c) - 'a' + 10)))

3. The third level would be to get rid of the assumption that letters
are contiguous, which would probably require making a lookup table to
map from char code to hex value.

I'm not sure level 3 is really worth doing, since AFAIK no one tries to
run Postgres on any EBCDIC platform. (It's likely that there are other
places that depend on the letters-are-contiguous assumption anyway.)
I do think level 1 and probably level 2 are appropriate changes.

regards, tom lane

#11

Jason Godden

jasongodden@optushome.com.au

over 22 years ago

In reply to: Tom Lane (#10)

Re: \xDD patch for 7.5devel

#define HEXVALUE(c) \
(((c) >= '0' && (c) <= '9') ? ((c) - '0') : \
(((c) >= 'A' && (c) <= 'F') ? ((c) - 'A' + 10) : \
((c) - 'a' + 10)))

3. The third level would be to get rid of the assumption that letters
are contiguous, which would probably require making a lookup table to
map from char code to hex value.

I'm not sure level 3 is really worth doing, since AFAIK no one tries to
run Postgres on any EBCDIC platform. (It's likely that there are other
places that depend on the letters-are-contiguous assumption anyway.)
I do think level 1 and probably level 2 are appropriate changes.

regards, tom lane

Hi Guys,

Thanks for the feedback. Tom I'll make the changes proposed here to the macro
and repost the patch to pgsql-patches (and do some reading on Unicode!). I
guess at this stage I would like to offer any of my time to any janitorial
work that might be needed as until im more knowledgeable about the pg source
I think any large scale stuff is off the cards.

Rgds,

Jason