COPY FROM is not 8bit clean

Started by Darcy Buskermolen, about 24 years ago, 8 messages (hackers, bugs)
#1 Darcy Buskermolen
darcy@ok-connect.com
hackers, bugs

ACK!!!!! Must remember which MTA I'm using...
When using COPY FROM 'file' DELIMITER '\254', COPY FROM reads past the
delimiter and ends up with parse errors when trying to do the insert.

What the ?? Why didn't that go through with the body of the text.. *sigh*
I'll resend in the AM..

#2 Tatsuo Ishii
t-ishii@sra.co.jp
In reply to: Darcy Buskermolen (#1)
hackers, bugs
Re: COPY FROM is not 8bit clean

> When using COPY FROM 'file' DELIMITER '\254', COPY FROM reads past the
> delimiter and ends up with parse errors when trying to do the insert.
>
> What the ?? Why didn't that go through with the body of the text.. *sigh*
> I'll resend in the AM..

Good catch. It's definitely a bug in the COPY command. Please try the
following patch (this is against 7.2).

*** src/backend/commands/copy.c.orig	Tue Feb 26 21:11:05 2002
--- src/backend/commands/copy.c	Tue Feb 26 21:11:35 2002
***************
*** 1024,1030 ****
  CopyReadAttribute(FILE *fp, bool *isnull, char *delim, int *newline, char *null_print)
  {
  	int			c;
! 	int			delimc = delim[0];
  #ifdef MULTIBYTE
  	int			mblen;
--- 1024,1030 ----
  CopyReadAttribute(FILE *fp, bool *isnull, char *delim, int *newline, char *null_print)
  {
  	int			c;
! 	int			delimc = (unsigned char)delim[0];
  #ifdef MULTIBYTE
  	int			mblen;

#3 Darcy Buskermolen
darcy@ok-connect.com
In reply to: Tatsuo Ishii (#2)
bugs
Re: COPY FROM is not 8bit clean

Postgres was not compiled with multibyte support. If I replace the
if (delimc == c) with if (strchr(delim, c)), it works as expected.
According to the CVS log, the switch to the delimc comparison was made
for performance reasons.

At 11:57 PM 2/25/02 -0500, Tom Lane wrote:

> Darcy Buskermolen <darcy@ok-connect.com> writes:
> > When using COPY FROM 'file' DELIMITER '\254', COPY FROM reads past the
> > delimiter and ends up with parse errors when trying to do the insert.
>
> Are you perhaps operating in a multibyte encoding in which \254 is
> just the first byte of a multibyte character?
>
> I'm not sure what we do in such a case, and even less sure what we
> should do ... but I am entirely prepared to believe that we don't
> do the Right Thing ...
>
> regards, tom lane
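
Why a string-search lookup hides the problem while the direct int
comparison exposes it can be sketched as below. This is only an
illustration, not code from copy.c: it assumes the lookup meant is the
standard strchr (which takes an int and converts it to char before
comparing; strstr takes two strings and cannot accept c), and the helper
names is_delim_strchr and is_delim_direct are made up for the example.
The cast to signed char is there to reproduce the behavior of a
signed-char platform everywhere; its result for values above 127 is
implementation-defined, -84 on the usual two's-complement compilers.

```c
#include <string.h>

/* Hypothetical helper: delimiter test via strchr.
 * c is a value as returned by getc (0..255). strchr converts c to char
 * before comparing, so both sides have the same signedness and a
 * high-bit byte still matches. The '\0' guard is needed because strchr
 * would otherwise match the string terminator. */
static int is_delim_strchr(const char *delim, int c)
{
    return c != '\0' && strchr(delim, c) != NULL;
}

/* Hypothetical helper: the direct comparison, with the delimiter
 * widened from a signed char. For "\254" (octal, byte value 172) the
 * widened value is negative, so it never equals the 172 from getc. */
static int is_delim_direct(const char *delim, int c)
{
    int delimc = (signed char) delim[0];
    return c == delimc;
}
```

With a delimiter string "\254" and c equal to 172, is_delim_strchr
reports a match and is_delim_direct does not; for a 7-bit delimiter such
as a tab, both agree.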

#4 Darcy Buskermolen
darcy@ok-connect.com
In reply to: Tatsuo Ishii (#2)
hackers, bugs
Re: COPY FROM is not 8bit clean

This patch solves the problem.

#5 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Darcy Buskermolen (#3)
bugs
Re: COPY FROM is not 8bit clean

Darcy Buskermolen <darcy@ok-connect.com> writes:

> Postgres was not compiled with multibyte support. If I replace the
> if (delimc == c) with if (strchr(delim, c)), it works as expected.
> According to the CVS log, the switch to the delimc comparison was made
> for performance reasons.

Yeah, my error :-(. See Tatsuo's reply for the correct fix.

regards, tom lane

#6 Bruce Momjian
bruce@momjian.us
In reply to: Darcy Buskermolen (#4)
hackers, bugs
Re: COPY FROM is not 8bit clean

Can someone explain why this fixes the problem? I thought it was safe
to assign a char to an int and do a compare. The compare I see is:

	if (c == delimc)
		break;

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
#7 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#6)
hackers, bugs
Re: COPY FROM is not 8bit clean

Bruce Momjian <pgman@candle.pha.pa.us> writes:

> Can someone explain why this fixes the problem?

Think about a machine where char is signed by default. Extracting \254
(octal, 172 decimal) into an int will produce -84, which will not equal
the 172 returned by getc.

regards, tom lane
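
The mismatch described above can be reproduced with a few lines of C.
This is a minimal sketch, not the copy.c code: the helper names are made
up for the example, and the signed char cast stands in for "a machine
where char is signed by default" so the result is the same everywhere
(the conversion is implementation-defined, -84 on the usual
two's-complement compilers). Note that \254 is an octal escape, so the
byte is 172 decimal and the sign-extended value is -84.

```c
/* Hypothetical helper: the pre-patch initialization,
 *     int delimc = delim[0];
 * as it behaves where plain char is signed. */
static int delim_signed(const char *delim)
{
    return (signed char) delim[0];
}

/* Hypothetical helper: the patched initialization,
 *     int delimc = (unsigned char) delim[0]; */
static int delim_unsigned(const char *delim)
{
    return (unsigned char) delim[0];
}

/* The value getc hands back for the same byte: the character is read
 * as an unsigned char and converted to int, so it is always 0..255
 * (or EOF at end of input). */
static int as_getc_would_return(const char *bytes)
{
    return (unsigned char) bytes[0];
}
```

For the string "\254", delim_signed gives -84 while
as_getc_would_return gives 172, so c == delimc is never true;
delim_unsigned gives 172 and the comparison works again. Delimiters
below 128 are unaffected, which is presumably why the bug only showed up
with an 8-bit delimiter.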

#8 Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#7)
hackers, bugs
Re: COPY FROM is not 8bit clean

Tom Lane wrote:

> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > Can someone explain why this fixes the problem?
>
> Think about a machine where char is signed by default. Extracting \254
> (octal, 172 decimal) into an int will produce -84, which will not equal
> the 172 returned by getc.

Oh, I thought that the int returned by getc already had that sign
extension, but now I remember it doesn't. In fact, it specifically
returns an int so -1 can be identified. Got it. Seems I am forgetting
some of my C.
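
The point about getc returning an int can be illustrated with a small
sketch. The helper names are hypothetical, and the example assumes EOF
is -1, which holds on common platforms although the C standard only
requires a negative int; the narrowing cast is implementation-defined
and yields -1 for 0xFF on the usual two's-complement compilers.

```c
#include <stdio.h>

/* Hypothetical helper: what a caller sees if it stores the result of
 * getc in a signed char before testing for end of input. The data byte
 * 0xFF narrows to -1 and becomes indistinguishable from EOF. */
static int looks_like_eof_when_narrowed(int getc_result)
{
    signed char narrowed = (signed char) getc_result;
    return narrowed == EOF;
}

/* Hypothetical helper: the correct test on the full int. Data bytes
 * are 0..255, so only a real end of input compares equal to EOF. */
static int is_real_eof(int getc_result)
{
    return getc_result == EOF;
}
```

Passing 0xFF (a legitimate data byte) makes the narrowed test report end
of input while the int test does not, which is the reason the return
value of getc should be kept in an int until after the EOF check.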
