bug in COPY

Started by Nonameover 23 years ago12 messages
#1Noname
nconway@klamath.dyndns.org

This behavior doesn't look right:

nconway=# create table foo (a int default 50, b int default 100);
CREATE TABLE
nconway=# copy foo from stdin;
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself.

\.

nconway=# select * from foo;
a | b
---+---
0 |
(1 row)

(The first line of the COPY input is blank: i.e. just a newline)

The problem appears to be in both 7.2.1 and current CVS.

Cheers,

Neil

--
Neil Conway <neilconway@rogers.com>
PGP Key ID: DB3C29FC

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Noname (#1)
Re: bug in COPY

nconway@klamath.dyndns.org (Neil Conway) writes:

This behavior doesn't look right:

It's not, but I believe the correct point of view is that the input
data is defective and should be rejected. See past discussions
leading up to the TODO item that mentions rejecting COPY input rows
with the wrong number of fields (rather than silently filling with
NULLs as we do now).

A subsidiary point here is that pg_atoi() explicitly takes a zero-length
string as valid input of value 0. I think this is quite bogus myself,
but I don't know why that behavior was put in or whether we'd be breaking
anything if we tightened it up.

regards, tom lane

#3Noname
nconway@klamath.dyndns.org
In reply to: Tom Lane (#2)
Re: bug in COPY

On Wed, Jul 24, 2002 at 04:23:56PM -0400, Tom Lane wrote:

nconway@klamath.dyndns.org (Neil Conway) writes:

This behavior doesn't look right:

It's not, but I believe the correct point of view is that the input
data is defective and should be rejected. See past discussions
leading up to the TODO item that mentions rejecting COPY input rows
with the wrong number of fields (rather than silently filling with
NULLs as we do now).

Yeah, I was thinking that too. Now that we have column lists in
COPY, there is no need to keep this functionality around: if the
user wants to load data that is missing a column, they can just
omit the column from the column list and have the column default
inserted (which is a lot more sensible than inserting NULL).

Unfortunately, I think that removing this properly will require
refactoring some of the COPY code. I'll take a look at implementing
this...

Cheers,

Neil

--
Neil Conway <neilconway@rogers.com>
PGP Key ID: DB3C29FC

#4Tom Lane
tgl@sss.pgh.pa.us
In reply to: Noname (#3)
Re: bug in COPY

nconway@klamath.dyndns.org (Neil Conway) writes:

leading up to the TODO item that mentions rejecting COPY input rows
with the wrong number of fields (rather than silently filling with
NULLs as we do now).

Yeah, I was thinking that too. Now that we have column lists in
COPY, there is no need to keep this functionality around: if the
user wants to load data that is missing a column, they can just
omit the column from the column list and have the column default
inserted (which is a lot more sensible than inserting NULL).

Right, that was the thinking exactly.

Unfortunately, I think that removing this properly will require
refactoring some of the COPY code. I'll take a look at implementing
this...

I thought it could be hacked in pretty easily, but if you want to
clean up the structure while you're in there, go for it.

regards, tom lane

#5Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Tom Lane (#2)
Re: bug in COPY

Tom Lane wrote:

nconway@klamath.dyndns.org (Neil Conway) writes:

This behavior doesn't look right:

It's not, but I believe the correct point of view is that the input
data is defective and should be rejected. See past discussions
leading up to the TODO item that mentions rejecting COPY input rows
with the wrong number of fields (rather than silently filling with
NULLs as we do now).

A subsidiary point here is that pg_atoi() explicitly takes a zero-length
string as valid input of value 0. I think this is quite bogus myself,
but I don't know why that behavior was put in or whether we'd be breaking
anything if we tightened it up.

Yea, I found the atoi zero-length behavior when I tried to clean up the
use of strtol() recently. If you remove that behavior, the regression
tests fail.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
#6Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Tom Lane (#2)
2 attachment(s)
Re: bug in COPY

Tom Lane wrote:

A subsidiary point here is that pg_atoi() explicitly takes a zero-length
string as valid input of value 0. I think this is quite bogus myself,
but I don't know why that behavior was put in or whether we'd be breaking
anything if we tightened it up.

I have attached a patch the throws an error if pg_atoi() is passed a
zero-length string, and have included regression diffs showing the
effects of the patch.

Seems the new code catches a few places that were bad, like defineing {}
for an array of 0,0. The copy2.out change is because pg_atoi catches
the problem before COPY does.

The tightening up of pg_atoi seems safe and makes sense to me.

If no adverse comments, I will apply and fix up the regression results.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

Attachments:

/bjm/difftext/plainDownload
Index: src/backend/utils/adt/numutils.c
===================================================================
RCS file: /cvsroot/pgsql-server/src/backend/utils/adt/numutils.c,v
retrieving revision 1.51
diff -c -c -r1.51 numutils.c
*** src/backend/utils/adt/numutils.c	16 Jul 2002 18:34:16 -0000	1.51
--- src/backend/utils/adt/numutils.c	27 Aug 2002 17:51:45 -0000
***************
*** 60,66 ****
  	if (s == (char *) NULL)
  		elog(ERROR, "pg_atoi: NULL pointer!");
  	else if (*s == 0)
! 		l = (long) 0;
  	else
  		l = strtol(s, &badp, 10);
  
--- 60,66 ----
  	if (s == (char *) NULL)
  		elog(ERROR, "pg_atoi: NULL pointer!");
  	else if (*s == 0)
! 		elog(ERROR, "pg_atoi: zero-length string!");
  	else
  		l = strtol(s, &badp, 10);
  
/pg/test/regress/regression.diffstext/plainDownload
*** ./expected/arrays.out	Thu Nov 29 16:02:41 2001
--- ./results/arrays.out	Tue Aug 27 14:14:47 2002
***************
*** 16,21 ****
--- 16,22 ----
  --
  INSERT INTO arrtest (a[5], b[2][1][2], c, d, f, g)
     VALUES ('{1,2,3,4,5}', '{{{},{1,2}}}', '{}', '{}', '{}', '{}');
+ ERROR:  pg_atoi: zero-length string!
  UPDATE arrtest SET e[0] = '1.1';
  UPDATE arrtest SET e[1] = '2.2';
  INSERT INTO arrtest (f)
***************
*** 29,39 ****
     VALUES ('{}', '{3,4}', '{foo,bar}', '{bar,foo}');
  SELECT * FROM arrtest;
        a      |        b        |     c     |       d       |     e     |        f        |      g      
! -------------+-----------------+-----------+---------------+-----------+-----------------+-------------
!  {1,2,3,4,5} | {{{0,0},{1,2}}} | {}        | {}            |           | {}              | {}
   {11,12,23}  | {{3,4},{4,5}}   | {foobar}  | {{elt1,elt2}} | {3.4,6.7} | {"abc  ",abcde} | {abc,abcde}
   {}          | {3,4}           | {foo,bar} | {bar,foo}     |           |                 | 
! (3 rows)
  
  SELECT arrtest.a[1],
            arrtest.b[1][1][1],
--- 30,39 ----
     VALUES ('{}', '{3,4}', '{foo,bar}', '{bar,foo}');
  SELECT * FROM arrtest;
       a      |       b       |     c     |       d       |     e     |        f        |      g      
! ------------+---------------+-----------+---------------+-----------+-----------------+-------------
   {11,12,23} | {{3,4},{4,5}} | {foobar}  | {{elt1,elt2}} | {3.4,6.7} | {"abc  ",abcde} | {abc,abcde}
   {}         | {3,4}         | {foo,bar} | {bar,foo}     |           |                 | 
! (2 rows)
  
  SELECT arrtest.a[1],
            arrtest.b[1][1][1],
***************
*** 43,61 ****
     FROM arrtest;
   a  | b |   c    |  d   | e 
  ----+---+--------+------+---
-   1 | 0 |        |      |  
   11 |   | foobar | elt1 |  
      |   | foo    |      |  
! (3 rows)
  
  SELECT a[1], b[1][1][1], c[1], d[1][1], e[0]
     FROM arrtest;
   a  | b |   c    |  d   | e 
  ----+---+--------+------+---
-   1 | 0 |        |      |  
   11 |   | foobar | elt1 |  
      |   | foo    |      |  
! (3 rows)
  
  SELECT a[1:3],
            b[1:1][1:2][1:2],
--- 43,59 ----
     FROM arrtest;
   a  | b |   c    |  d   | e 
  ----+---+--------+------+---
   11 |   | foobar | elt1 |  
      |   | foo    |      |  
! (2 rows)
  
  SELECT a[1], b[1][1][1], c[1], d[1][1], e[0]
     FROM arrtest;
   a  | b |   c    |  d   | e 
  ----+---+--------+------+---
   11 |   | foobar | elt1 |  
      |   | foo    |      |  
! (2 rows)
  
  SELECT a[1:3],
            b[1:1][1:2][1:2],
***************
*** 63,82 ****
            d[1:1][1:2]
     FROM arrtest;
       a      |        b        |     c     |       d       
! ------------+-----------------+-----------+---------------
!  {1,2,3}    | {{{0,0},{1,2}}} |           | 
   {11,12,23} |                 | {foobar}  | {{elt1,elt2}}
              |                 | {foo,bar} | 
! (3 rows)
  
  SELECT array_dims(a) AS a,array_dims(b) AS b,array_dims(c) AS c
     FROM arrtest;
     a   |        b        |   c   
! -------+-----------------+-------
!  [1:5] | [1:1][1:2][1:2] | 
   [1:3] | [1:2][1:2]      | [1:1]
         | [1:2]           | [1:2]
! (3 rows)
  
  -- returns nothing 
  SELECT *
--- 61,78 ----
            d[1:1][1:2]
     FROM arrtest;
       a      | b |     c     |       d       
! ------------+---+-----------+---------------
   {11,12,23} |   | {foobar}  | {{elt1,elt2}}
              |   | {foo,bar} | 
! (2 rows)
  
  SELECT array_dims(a) AS a,array_dims(b) AS b,array_dims(c) AS c
     FROM arrtest;
     a   |     b      |   c   
! -------+------------+-------
   [1:3] | [1:2][1:2] | [1:1]
         | [1:2]      | [1:2]
! (2 rows)
  
  -- returns nothing 
  SELECT *
***************
*** 99,109 ****
    WHERE array_dims(c) is not null;
  SELECT a,b,c FROM arrtest;
         a       |           b           |         c         
! ---------------+-----------------------+-------------------
!  {16,25,3,4,5} | {{{113,142},{1,147}}} | {}
   {}            | {3,4}                 | {foo,new_word}
   {16,25,23}    | {{3,4},{4,5}}         | {foobar,new_word}
! (3 rows)
  
  SELECT a[1:3],
            b[1:1][1:2][1:2],
--- 95,104 ----
    WHERE array_dims(c) is not null;
  SELECT a,b,c FROM arrtest;
       a      |       b       |         c         
! ------------+---------------+-------------------
   {}         | {3,4}         | {foo,new_word}
   {16,25,23} | {{3,4},{4,5}} | {foobar,new_word}
! (2 rows)
  
  SELECT a[1:3],
            b[1:1][1:2][1:2],
***************
*** 111,119 ****
            d[1:1][2:2]
     FROM arrtest;
       a      |           b           |         c         |    d     
! ------------+-----------------------+-------------------+----------
!  {16,25,3}  | {{{113,142},{1,147}}} |                   | 
              |                       | {foo,new_word}    | 
   {16,25,23} |                       | {foobar,new_word} | {{elt2}}
! (3 rows)
  
--- 106,113 ----
            d[1:1][2:2]
     FROM arrtest;
       a      | b |         c         |    d     
! ------------+---+-------------------+----------
              |   | {foo,new_word}    | 
   {16,25,23} |   | {foobar,new_word} | {{elt2}}
! (2 rows)
  

======================================================================

*** ./expected/copy2.out	Wed Aug 21 22:25:28 2002
--- ./results/copy2.out	Tue Aug 27 14:15:50 2002
***************
*** 34,40 ****
  ERROR:  Attribute "d" specified more than once
  -- missing data: should fail
  COPY x from stdin;
! ERROR:  copy: line 1, Missing data for column "b"
  lost synchronization with server, resetting connection
  COPY x from stdin;
  ERROR:  copy: line 1, Missing data for column "e"
--- 34,40 ----
  ERROR:  Attribute "d" specified more than once
  -- missing data: should fail
  COPY x from stdin;
! ERROR:  copy: line 1, pg_atoi: zero-length string!
  lost synchronization with server, resetting connection
  COPY x from stdin;
  ERROR:  copy: line 1, Missing data for column "e"

======================================================================

#7Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#6)
Re: bug in COPY

Bruce Momjian <pgman@candle.pha.pa.us> writes:

else if (*s == 0)
! elog(ERROR, "pg_atoi: zero-length string!");

The exclamation point seems inappropriate. Perhaps "zero-length input"
would be better than "string" also.

Otherwise this seems like a reasonable thing to do.

regards, tom lane

#8Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Tom Lane (#7)
Re: bug in COPY

Tom Lane wrote:

Bruce Momjian <pgman@candle.pha.pa.us> writes:

else if (*s == 0)
! elog(ERROR, "pg_atoi: zero-length string!");

The exclamation point seems inappropriate. Perhaps "zero-length input"
would be better than "string" also.

I copied the other test case:

if (s == (char *) NULL)
elog(ERROR, "pg_atoi: NULL pointer!");

I removed them both '!'.

Otherwise this seems like a reasonable thing to do.

OK, will apply now.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
#9Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#8)
Re: bug in COPY

Bruce Momjian <pgman@candle.pha.pa.us> writes:

Tom Lane wrote:

The exclamation point seems inappropriate. Perhaps "zero-length input"
would be better than "string" also.

I copied the other test case:

if (s == (char *) NULL)
elog(ERROR, "pg_atoi: NULL pointer!");

Well, the NULL-pointer test might equally well be coded as an Assert:
it's to catch backend coding errors, not cases of incorrect user input.
So the exclamation point there didn't bother me.

I removed them both '!'.

If you like. But the two conditions are not comparable.

regards, tom lane

#10Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Tom Lane (#9)
Re: bug in COPY

Tom Lane wrote:

Bruce Momjian <pgman@candle.pha.pa.us> writes:

Tom Lane wrote:

The exclamation point seems inappropriate. Perhaps "zero-length input"
would be better than "string" also.

I copied the other test case:

if (s == (char *) NULL)
elog(ERROR, "pg_atoi: NULL pointer!");

Well, the NULL-pointer test might equally well be coded as an Assert:
it's to catch backend coding errors, not cases of incorrect user input.
So the exclamation point there didn't bother me.

You really think it should be Assert? I don't mind opening the chance
to throw an error for bad input, but are you sure that a NULL should
never get there?

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
#11Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Tom Lane (#9)
1 attachment(s)
Re: bug in COPY

Patch applied.

---------------------------------------------------------------------------

Tom Lane wrote:

Bruce Momjian <pgman@candle.pha.pa.us> writes:

Tom Lane wrote:

The exclamation point seems inappropriate. Perhaps "zero-length input"
would be better than "string" also.

I copied the other test case:

if (s == (char *) NULL)
elog(ERROR, "pg_atoi: NULL pointer!");

Well, the NULL-pointer test might equally well be coded as an Assert:
it's to catch backend coding errors, not cases of incorrect user input.
So the exclamation point there didn't bother me.

I removed them both '!'.

If you like. But the two conditions are not comparable.

regards, tom lane

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

Attachments:

/bjm/difftext/plainDownload
Index: src/backend/utils/adt/numutils.c
===================================================================
RCS file: /cvsroot/pgsql-server/src/backend/utils/adt/numutils.c,v
retrieving revision 1.51
diff -c -c -r1.51 numutils.c
*** src/backend/utils/adt/numutils.c	16 Jul 2002 18:34:16 -0000	1.51
--- src/backend/utils/adt/numutils.c	27 Aug 2002 20:28:41 -0000
***************
*** 58,66 ****
  	 */
  
  	if (s == (char *) NULL)
! 		elog(ERROR, "pg_atoi: NULL pointer!");
  	else if (*s == 0)
! 		l = (long) 0;
  	else
  		l = strtol(s, &badp, 10);
  
--- 58,66 ----
  	 */
  
  	if (s == (char *) NULL)
! 		elog(ERROR, "pg_atoi: NULL pointer");
  	else if (*s == 0)
! 		elog(ERROR, "pg_atoi: zero-length string");
  	else
  		l = strtol(s, &badp, 10);
  
Index: src/test/regress/expected/arrays.out
===================================================================
RCS file: /cvsroot/pgsql-server/src/test/regress/expected/arrays.out,v
retrieving revision 1.9
diff -c -c -r1.9 arrays.out
*** src/test/regress/expected/arrays.out	29 Nov 2001 21:02:41 -0000	1.9
--- src/test/regress/expected/arrays.out	27 Aug 2002 20:28:45 -0000
***************
*** 15,21 ****
  -- 'e' is also a large object.
  --
  INSERT INTO arrtest (a[5], b[2][1][2], c, d, f, g)
!    VALUES ('{1,2,3,4,5}', '{{{},{1,2}}}', '{}', '{}', '{}', '{}');
  UPDATE arrtest SET e[0] = '1.1';
  UPDATE arrtest SET e[1] = '2.2';
  INSERT INTO arrtest (f)
--- 15,21 ----
  -- 'e' is also a large object.
  --
  INSERT INTO arrtest (a[5], b[2][1][2], c, d, f, g)
!    VALUES ('{1,2,3,4,5}', '{{{0,0},{1,2}}}', '{}', '{}', '{}', '{}');
  UPDATE arrtest SET e[0] = '1.1';
  UPDATE arrtest SET e[1] = '2.2';
  INSERT INTO arrtest (f)
Index: src/test/regress/expected/copy2.out
===================================================================
RCS file: /cvsroot/pgsql-server/src/test/regress/expected/copy2.out,v
retrieving revision 1.6
diff -c -c -r1.6 copy2.out
*** src/test/regress/expected/copy2.out	22 Aug 2002 00:01:51 -0000	1.6
--- src/test/regress/expected/copy2.out	27 Aug 2002 20:28:45 -0000
***************
*** 34,40 ****
  ERROR:  Attribute "d" specified more than once
  -- missing data: should fail
  COPY x from stdin;
! ERROR:  copy: line 1, Missing data for column "b"
  lost synchronization with server, resetting connection
  COPY x from stdin;
  ERROR:  copy: line 1, Missing data for column "e"
--- 34,40 ----
  ERROR:  Attribute "d" specified more than once
  -- missing data: should fail
  COPY x from stdin;
! ERROR:  copy: line 1, pg_atoi: zero-length string
  lost synchronization with server, resetting connection
  COPY x from stdin;
  ERROR:  copy: line 1, Missing data for column "e"
Index: src/test/regress/sql/arrays.sql
===================================================================
RCS file: /cvsroot/pgsql-server/src/test/regress/sql/arrays.sql,v
retrieving revision 1.8
diff -c -c -r1.8 arrays.sql
*** src/test/regress/sql/arrays.sql	21 May 2001 16:54:46 -0000	1.8
--- src/test/regress/sql/arrays.sql	27 Aug 2002 20:28:45 -0000
***************
*** 18,24 ****
  --
  
  INSERT INTO arrtest (a[5], b[2][1][2], c, d, f, g)
!    VALUES ('{1,2,3,4,5}', '{{{},{1,2}}}', '{}', '{}', '{}', '{}');
  
  UPDATE arrtest SET e[0] = '1.1';
  
--- 18,24 ----
  --
  
  INSERT INTO arrtest (a[5], b[2][1][2], c, d, f, g)
!    VALUES ('{1,2,3,4,5}', '{{{0,0},{1,2}}}', '{}', '{}', '{}', '{}');
  
  UPDATE arrtest SET e[0] = '1.1';
  
#12Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#10)
Re: bug in COPY

Bruce Momjian <pgman@candle.pha.pa.us> writes:

Tom Lane wrote:

Well, the NULL-pointer test might equally well be coded as an Assert:

You really think it should be Assert?

I don't feel a need to change it, no. I was just pointing out that
there shouldn't be any way for a user to cause that condition to occur.

regards, tom lane