COPY incorrectly uses null instead of an empty string in last field

Started by Oliver Elphickalmost 24 years ago8 messages
#1Oliver Elphick
olly@lfix.co.uk

Release 7.2:

Where the last field of a line contains an empty string, COPY
incorrectly inserts a NULL. This will cause data to be loaded
incorrectly from pg_dump. This happens because, when a newline is seen,
COPY needs to know whether there was a preceding delimiter and use an
empty string in that case; there is a difference between a last field
that is an empty string and one or more last fields that are completely
unspecified.

Changing this should not affect COPY TO or pg_dump, which already put
out \N in the last field if it really is NULL.

For example:

--
-- Selected TOC Entries:
--
\connect - "olly"

--
-- TOC Entry ID 2 (OID 1522144)
--
-- Name: schau Type: TABLE Owner: olly
--

CREATE TABLE "schau" (
"feld1" text NOT NULL,
"feld2" character varying(10) NOT NULL,
Constraint "pk_schau" Primary Key ("feld1", "feld2")
);

--
-- Data for TOC Entry ID 3 (OID 1522144)
--
-- Name: schau Type: TABLE DATA Owner: olly
--

COPY "schau" FROM stdin;
Helge
Arne
Alle Keiner
\.

I tried to make a fix, but managed to break it completely :-(

--
Oliver Elphick Oliver.Elphick@lfix.co.uk
Isle of Wight http://www.lfix.co.uk/oliver
GPG: 1024D/3E1D0C1C: CA12 09E0 E8D5 8870 5839 932A 614D 4C34 3E1D 0C1C

"But God commendeth his love toward us, in that, while
we were yet sinners, Christ died for us."
Romans 5:8

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Oliver Elphick (#1)
Re: COPY incorrectly uses null instead of an empty string in last field

Oliver Elphick <olly@lfix.co.uk> writes:

Where the last field of a line contains an empty string, COPY
incorrectly inserts a NULL.

Certainly not. Using your example table definition, I did:

regression=# insert into schau values ('test','foo');
INSERT 146293 1
regression=# insert into schau values ('test2','');
INSERT 146294 1
regression=# insert into schau values ('test2',null);
ERROR: ExecAppend: Fail to add null value in not null attribute feld2
regression=# copy schau to '/tmp/schau.out';
COPY
regression=# delete from schau;
DELETE 2
regression=# copy schau from '/tmp/schau.out';
COPY
regression=# select * from schau ;
feld1 | feld2
-------+-------
test | foo
test2 |
(2 rows)

regression=# select * from schau where feld2 is null;
feld1 | feld2
-------+-------
(0 rows)

The contents of /tmp/schau.out are:

test foo
test2

(there's a tab after test2, in case you can't see it ;-))

I don't see any problem here.

regards, tom lane

#3Oliver Elphick
olly@lfix.co.uk
In reply to: Oliver Elphick (#1)
Re: COPY incorrectly uses null instead of an empty string

On Tue, 2002-02-26 at 19:02, Tom Lane wrote:

I did it this morning on getting the Debian bug report (against 7.1.3)
and it failed. I repeated it just now to check after reading your reply
and it succeeded, using exactly the same input file (in which there is
indeed a tab after the first field before the empty string.) And I was
certainly running 7.2 both times.

What the heck? Something is weird there. Maybe there is some
additional condition needed to cause a problem. Can you send me the bug
report?

I will if I can make it happen again.

The only thing I can think of at the moment is that this morning I may
have used cut-and-paste to write the input file or to read in the
commands, either of which would have lost the tabs.

The original reporter's mail had no tabs at all, but it was not an
attachment and it had had lost all its tabs along the way. His input
script came from 7.0 pg_dump. I am following up with him whether there
are actually tabs before the empty fields in that pg_dump output.

--
Oliver Elphick Oliver.Elphick@lfix.co.uk
Isle of Wight http://www.lfix.co.uk/oliver
GPG: 1024D/3E1D0C1C: CA12 09E0 E8D5 8870 5839 932A 614D 4C34 3E1D 0C1C

"But God commendeth his love toward us, in that, while
we were yet sinners, Christ died for us."
Romans 5:8

#4Tom Lane
tgl@sss.pgh.pa.us
In reply to: Oliver Elphick (#3)
Re: COPY incorrectly uses null instead of an empty string in last field

The original reporter's mail had no tabs at all, but it was not an
attachment and it had had lost all its tabs along the way. His input
script came from 7.0 pg_dump. I am following up with him whether there
are actually tabs before the empty fields in that pg_dump output.

If a trailing tab got lost in a dump file, then COPY IN would silently
assume that fields after the tab position should be NULL. I suspect
that that is what happened here. Probably some "helpful" program
stripped trailing whitespace from the file.

One of the things we've agreed to do in 7.3 is change COPY IN to remove
that assumption --- a line with too few fields (too few tabs) will draw
an error report instead of silently doing what's likely the wrong thing.

regards, tom lane

#5Zeugswetter Andreas SB SD
ZeugswetterA@spardat.at
In reply to: Tom Lane (#4)
Re: COPY incorrectly uses null instead of an empty string in last field

One of the things we've agreed to do in 7.3 is change COPY IN to remove
that assumption --- a line with too few fields (too few tabs) will draw
an error report instead of silently doing what's likely the wrong thing.

But there will be new syntax for COPY, that allows missing trailing columns.
I hope.

Andreas

#6Tom Lane
tgl@sss.pgh.pa.us
In reply to: Zeugswetter Andreas SB SD (#5)
Re: COPY incorrectly uses null instead of an empty string in last field

"Zeugswetter Andreas SB SD" <ZeugswetterA@spardat.at> writes:

One of the things we've agreed to do in 7.3 is change COPY IN to remove
that assumption --- a line with too few fields (too few tabs) will draw
an error report instead of silently doing what's likely the wrong thing.

But there will be new syntax for COPY, that allows missing trailing columns.
I hope.

Why?

regards, tom lane

#7Zeugswetter Andreas SB SD
ZeugswetterA@spardat.at
In reply to: Tom Lane (#6)
Re: COPY incorrectly uses null instead of an empty string in last field

One of the things we've agreed to do in 7.3 is change COPY IN to remove
that assumption --- a line with too few fields (too few tabs) will draw
an error report instead of silently doing what's likely the wrong thing.

But there will be new syntax for COPY, that allows missing trailing columns.
I hope.

Why?

Well, good question. For one for backwards compatibility.
I guess I would prefer COPY syntax that allows you to specify columns
as has been previously discussed. Having that, that would be sufficient
and safer than only a switch.

COPY 'afile' to atab (a1, a3, a5, a2)

Andreas

#8Tom Lane
tgl@sss.pgh.pa.us
In reply to: Zeugswetter Andreas SB SD (#7)
Re: COPY incorrectly uses null instead of an empty string in last field

"Zeugswetter Andreas SB SD" <ZeugswetterA@spardat.at> writes:

But there will be new syntax for COPY, that allows missing trailing columns.
I hope.

Why?

Well, good question. For one for backwards compatibility.

It's an undocumented feature. How many people are likely to be using it?

I guess I would prefer COPY syntax that allows you to specify columns
as has been previously discussed.

Yes, that's on the to-do list as well. But no matter what the expected
set of columns is, COPY ought to complain if a line is missing some
fields.

regards, tom lane