COPY Error Message is Confusing
I just spent the morning chasing down a small data corruption. It showed up
when loading a database from the output of a dump. The error message was:
copy: line 8590351, Bad date external representation '04-0| '
I suggest this be changed to:
copy: input tuple 8590351, Bad date external representation '04-0| '
After investigating this it turns out the number reported is a 1-based input
record number. Referring to it as a line number is very confusing because
records may span line boundaries. The following other interpretations are
credible:
A line number in the dump file
A line number relative to the start of the COPY.
It would also be useful to report the name of the table being copied to. It
would be really useful if it would output the content of the offending input
line(s), though that might have security-related issues.
---------
Bryan White, ArcaMax.com, VP of Technology
This email represents the consensus opinion
of the many voices in my head.
--- Bryan White <bryan@arcamax.com> wrote:
I suggest this be changed to:
copy: input tuple 8590351, Bad date external representation '04-0| '
It's not strictly a "tuple" until it's been loaded.
After investigating this it turns out the number reported is a 1-based input
record number. Referring to it as a line number is very confusing because
records may span line boundaries.
Not so with COPY. The record separator is hard-coded
to be a newline: the field separator can be set at
runtime, but the record separator cannot. That would
be a nice feature to have, though.
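The asymmetry described here (a configurable field delimiter but a fixed newline record separator) can be illustrated with a minimal Python sketch. It is not PostgreSQL's parser: the function `split_copy_text` is hypothetical, it assumes a single-character delimiter, and it decodes only the `\n`, `\t` and `\r` escapes, passing any other backslash-escaped character (the delimiter, a backslash, or a literal newline) through as data. `\N` (NULL) and the `\.` end-of-data marker are not handled.

```python
# Escape sequences decoded by this sketch; any other escaped character
# (including the delimiter, a backslash, or a literal newline) is kept as-is.
_ESCAPES = {"n": "\n", "t": "\t", "r": "\r"}


def split_copy_text(data: str, delimiter: str = "\t") -> list[list[str]]:
    """Split COPY-style text into records (fixed newline) and fields (configurable)."""
    if len(delimiter) != 1:
        raise ValueError("delimiter must be a single character")
    records: list[list[str]] = []
    fields: list[str] = []
    current: list[str] = []
    escaped = False
    for ch in data:
        if escaped:
            # The character after a backslash is data, never a separator.
            current.append(_ESCAPES.get(ch, ch))
            escaped = False
        elif ch == "\\":
            escaped = True
        elif ch == delimiter:
            fields.append("".join(current))
            current = []
        elif ch == "\n":
            # The record separator is always a newline, whatever the delimiter.
            fields.append("".join(current))
            records.append(fields)
            fields, current = [], []
        else:
            current.append(ch)
    # Final record if the input lacks a trailing newline.
    if fields or current:
        fields.append("".join(current))
        records.append(fields)
    return records


print(split_copy_text("1|a\\|b\n2|x\\\ny\n", delimiter="|"))
# [['1', 'a|b'], ['2', 'x\ny']]
```

Changing `delimiter` changes how fields are split, but only an unescaped newline ever ends a record, which is the limitation being discussed.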
It would also be useful to report the name of the table being copied to. It
would be really useful if it would output the content of the offending input
line(s), though that might have security-related issues.
Various people have wished for an import application
with more intelligence than COPY now has. No doubt
much of this could be achieved simply by building
extra features into COPY.
With about three more years of study, I might have the
competency to attempt that myself. In the meantime,
is anyone else volunteering? :-)
It's not strictly a "tuple" until it's been loaded.
I guess that depends on your definition of 'tuple'. Are the rows returned
by a select statement tuples if the select is a join of multiple tables? I
tend to think of a tuple as an ordered set of values, but maybe I have it
wrong. In any event, any one of 'tuple', 'record', or 'row' would be less
confusing than 'line'.
Not so with COPY. The record separator is hard-coded
to be a newline: the field separator can be set at
runtime, but the record separator cannot. That would
be a nice feature to have, though.
The record separator is hard-coded, but it may occur in the data. If it
does, it will be escaped, but this fact eludes my text editor.
The fact that the current error message refers to a line number is
confusing. I can find the offending record by line or by tuple/record/row
number; it would just help if the error message were clear about what it
meant.
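A minimal sketch of why a record number and a line number diverge. It assumes the COPY text format, in which a backslash followed by a literal newline is a data newline rather than the end of a record, and assumes `\n` line endings. The helper `record_start_lines` is hypothetical, not code from PostgreSQL; it returns the physical line on which each 1-based record starts.

```python
def record_start_lines(data: str) -> list[int]:
    """Return the 1-based line number on which each COPY record starts.

    Assumes COPY text format with "\\n" line endings: an unescaped newline
    ends a record, and a backslash escapes the next character, so a
    backslash followed by a literal newline continues the same record.
    """
    starts: list[int] = []
    line = 1
    at_record_start = True
    escaped = False
    for ch in data:
        if at_record_start:
            starts.append(line)  # first character of a new record
            at_record_start = False
        if escaped:
            escaped = False
            if ch == "\n":
                line += 1  # escaped newline: new physical line, same record
        elif ch == "\\":
            escaped = True
        elif ch == "\n":
            line += 1
            at_record_start = True  # unescaped newline: the record ends here


    return starts


sample = "1\tfoo\n2\tline one\\\nline two\n3\tbar\n"
starts = record_start_lines(sample)
print(starts)  # [1, 2, 4]: record 3 begins on line 4
```

In this sample the third record starts on line 4 of the file, so a message that says "line 3" while meaning "record 3" sends a text editor to the wrong place; `starts[n - 1]` gives the line to look at for record `n`.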
Various people have wished for an import application
with more intelligence than COPY now has. No doubt
much of this could be achieved simply by building
extra features into COPY.
This existing functionality serves my needs. I just find the message
confusing and think a minor change in verbiage would make it less so.
"Bryan White" <bryan@arcamax.com> writes:
It's not strictly a "tuple" until it's been loaded.
I guess that depends on your definition of 'tuple'. Are the rows returned
by a select statement tuples if the select is a join of multiple tables? I
tend to think of a tuple as an ordered set of values, but maybe I have it
wrong. In any event, any one of 'tuple', 'record', or 'row' would be less
confusing than 'line'.
I agree that 'line' seems confusing in the presence of escaped newlines.
I prefer 'row' or possibly 'record' to 'tuple', however. 'tuple'
strikes me as unnecessarily jargon-ish in this context.
regards, tom lane