BUG #7611: \copy (and COPY?) incorrectly parses nul character for windows-1252

Started by Noname over 13 years ago, 2 messages, bugs
#1 Noname
sams.james+postgres@gmail.com

The following bug has been logged on the website:

Bug reference: 7611
Logged by: James
Email address: sams.james+postgres@gmail.com
PostgreSQL version: 9.1.6
Operating system: Ubuntu Linux 12.04
Description:

I have a file with several nul characters in it. The file itself appears to
be encoded as windows-1252, though I am not 100% certain of that. I do know
that other software (e.g. Python) can decode the data as windows-1252
without issue. Postgres's \copy, however, chokes on the nul byte:

ERROR: unterminated CSV quoted field
CONTEXT: COPY promo_nonactive_load_fake, line 239900

Note that the error is wrong: the field is quoted, but Postgres seems to jump
forward in the file when it encounters the nul bytes.

Further, the line number is wrong. 239900 is the length of the file (in
lines), not the line on which the error occurs, which is several hundred
lines earlier.

Deleting the nul bytes allowed the copy to proceed normally. I
experienced similar issues with psycopg2's copy_expert() using COPY FROM
STDIN on this file.
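The workaround of deleting the nul bytes can be scripted. What follows is a minimal C sketch, not part of the original report. It assumes a single-byte encoding such as windows-1252, where a 0x00 byte can only be NUL and a 0x0A byte can only be a newline. The function copies a file while dropping every 0x00 byte and records the line of the first one, which also helps find the real offending line, since the line number in the error message was misleading. The name `strip_nul_bytes` is hypothetical. Dropping bytes changes the data, so this only makes sense if the nuls are junk.

```c
#include <stdio.h>

/* Hypothetical helper: copy `in` to `out`, dropping every 0x00 byte.
 * Returns the number of bytes dropped. If first_line is non-NULL, it
 * receives the 1-based line number of the first NUL byte (0 if none).
 * Lines are counted by '\n', which is safe for single-byte encodings
 * such as windows-1252 (assumption: the input is not UTF-16 etc.). */
long strip_nul_bytes(FILE *in, FILE *out, long *first_line)
{
    long dropped = 0;
    long line = 1;
    int c;

    if (first_line != NULL)
        *first_line = 0;
    while ((c = getc(in)) != EOF) {
        if (c == '\0') {
            if (dropped == 0 && first_line != NULL)
                *first_line = line;   /* remember where the first NUL was */
            dropped++;
            continue;                 /* skip the NUL byte itself */
        }
        if (c == '\n')
            line++;
        putc(c, out);
    }
    return dropped;
}
```

Called from a small `main` as `strip_nul_bytes(stdin, stdout, &first)`, it acts as a filter whose output could then be fed to `\copy`.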

#2 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Noname (#1)
Re: BUG #7611: \copy (and COPY?) incorrectly parses nul character for windows-1252

sams.james+postgres@gmail.com writes:

> I have a file with several nul characters in it. The file itself appears to
> be encoded as windows-1252, though I am not 100% certain of that. I do know
> that other software (e.g. Python) can decode the data as windows-1252
> without issue. Postgres's \copy, however, chokes on the nul byte:
>
> ERROR: unterminated CSV quoted field
> CONTEXT: COPY promo_nonactive_load_fake, line 239900

Postgres doesn't support nul characters in data, so the best you could
hope for here is an error message anyway. It looks to me like the
immediate cause of this is that \copy reads the file with fgets(),
which will effectively ignore the rest of the line after a nul byte.
But there are probably more issues downstream.

regards, tom lane
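The fgets() behaviour described above can be reproduced in a few lines of C. This is a minimal sketch and not psql's actual code. It assumes the caller measures each buffer with strlen(), which is one common way to end up ignoring the rest of the line after a nul byte. The name `read_lines_strlen` is hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical illustration, not psql's code: read `fp` line by line with
 * fgets() and append each chunk to `out`, measuring it with strlen().
 * fgets() itself stores any NUL bytes it reads, but strlen() stops at the
 * first one, so everything after a NUL on that line (including the
 * trailing '\n') is silently lost. Returns the number of bytes kept. */
size_t read_lines_strlen(FILE *fp, char *out, size_t outsz)
{
    char buf[256];
    size_t total = 0;

    if (outsz == 0)
        return 0;
    out[0] = '\0';
    while (fgets(buf, sizeof buf, fp) != NULL) {
        size_t n = strlen(buf);      /* stops at the first NUL byte */
        if (total + n >= outsz)
            break;                   /* no room left; keep what we have */
        memcpy(out + total, buf, n);
        total += n;
        out[total] = '\0';
    }
    return total;
}
```

Consider an input whose first line is `a,"x<NUL>y",b` and whose second line is `c,d,e`. The function returns 10, and `out` holds `a,"xc,d,e` followed by a newline. The closing quote and the first line's newline both vanish, so a CSV parser would see a quoted field that never ends. This is one plausible reading of why the report shows "unterminated CSV quoted field" far past the real problem line.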