\copy produces CSV output that cannot be read by \copy
Description: If the end-of-data marker \. occurs on a line of its own in a
multiline string value, \copy outputs it to CSV without any quoting.
Subsequently reading that CSV file using \copy fails.
Version:
$ psql
psql (9.5.7)
Type "help" for help.
test=# select version();
version
---------------------------------------------------------------------------------------------------
PostgreSQL 9.5.7 on x86_64-pc-linux-gnu, compiled by gcc (Debian 6.3.0-16)
6.3.0 20170425, 64-bit
(1 row)
How to reproduce:
CREATE TABLE test (t varchar);
INSERT INTO test (t) VALUES ('
\.
');
\copy (SELECT t FROM test) TO 'test.csv' WITH CSV;
\copy test FROM 'test.csv' WITH (FORMAT CSV);
This results in:
ERROR: unterminated CSV quoted field
CONTEXT: COPY test, line 1: ""
"
Expected result: \copy can read the files it produced itself.
Analysis: I think that neither the writing nor the reading is incompatible
with the documentation. Rather, there doesn’t seem to be a way to write
such values to CSV that will be read back correctly (or at least I couldn‘t
find such a way), as the reader always stops when encountering \. on a line
by itself, and because there doesn’t seem to be a way to escape such a
value in a way that wouldn’t produce a \. on a line by itself.
Nicolas
On Fri, Aug 4, 2017 at 2:38 PM, Nicolas Barbier
<nicolas.barbier@gmail.com> wrote:
Analysis: I think that neither the writing nor the reading is incompatible
with the documentation. Rather, there doesn’t seem to be a way to write such
values to CSV that will be read back correctly (or at least I couldn‘t find
such a way), as the reader always stops when encountering \. on a line by
itself, and because there doesn’t seem to be a way to escape such a value in
a way that wouldn’t produce a \. on a line by itself.
The format produced by COPY OUT looks fine to me, and can be reloaded
with a plain COPY (not \copy). And you may be interested in this bit
from src/bin/psql/copy.c:
/*
* This code erroneously assumes '\.' on a line alone
* inside a quoted CSV string terminates the \copy.
*
/messages/by-id/E1TdNVQ-0001ju-GO@wrigleys.postgresql.org
*/
if (strcmp(buf, "\\.\n") == 0 ||
strcmp(buf, "\\.\r\n") == 0)
{
copydone = true;
break;
}
postgresql.org is offline now, and I don't have this thread at hand,
but I guess that the answer is there...
--
Michael
--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs
Michael Paquier <michael.paquier@gmail.com> writes:
The format produced by COPY OUT looks fine to me, and can be reloaded
with a plain COPY (not \copy). And you may be interested in this bit
from src/bin/psql/copy.c:
/*
* This code erroneously assumes '\.' on a line alone
* inside a quoted CSV string terminates the \copy.
*
/messages/by-id/E1TdNVQ-0001ju-GO@wrigleys.postgresql.org
*/
I wonder if it would improve matters to check for "\." only when
copystream == pset.cur_cmd_source, that is, only when the copy data
is inlined into the SQL stream. That would create an inconsistency
between inline and out-of-line data, but it might be a reasonable
thing to do anyway.
regards, tom lane
--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs
On Sat, Aug 5, 2017 at 6:52 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Michael Paquier <michael.paquier@gmail.com> writes:
The format produced by COPY OUT looks fine to me, and can be reloaded
with a plain COPY (not \copy). And you may be interested in this bit
from src/bin/psql/copy.c:
/*
* This code erroneously assumes '\.' on a line alone
* inside a quoted CSV string terminates the \copy.
*
/messages/by-id/E1TdNVQ-0001ju-GO@wrigleys.postgresql.org
*/I wonder if it would improve matters to check for "\." only when
copystream == pset.cur_cmd_source, that is, only when the copy data
is inlined into the SQL stream. That would create an inconsistency
between inline and out-of-line data, but it might be a reasonable
thing to do anyway.
A complete solution would be to look for the quote option provided by
the user and track if the string being passed to the backend is within
a quoted area or not, no? If that's a quoted area, the check for "\."
could be bypassed. Now, as parse_slash_copy() has its own way to parse
the command options given by the user, perhaps all this extra
engineering is not worth fixing an edge case.
In short, I agree that what you propose here has value to fix the case
proposed here, as even COPY FROM stdin (not only \copy) fails now.
--
Michael
--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs
On Thu, Aug 17, 2017 at 01:53:05PM +0900, Michael Paquier wrote:
On Sat, Aug 5, 2017 at 6:52 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Michael Paquier <michael.paquier@gmail.com> writes:
The format produced by COPY OUT looks fine to me, and can be reloaded
with a plain COPY (not \copy). And you may be interested in this bit
from src/bin/psql/copy.c:
/*
* This code erroneously assumes '\.' on a line alone
* inside a quoted CSV string terminates the \copy.
*
/messages/by-id/E1TdNVQ-0001ju-GO@wrigleys.postgresql.org
*/I wonder if it would improve matters to check for "\." only when
copystream == pset.cur_cmd_source, that is, only when the copy data
is inlined into the SQL stream. That would create an inconsistency
between inline and out-of-line data, but it might be a reasonable
thing to do anyway.A complete solution would be to look for the quote option provided by
the user and track if the string being passed to the backend is within
a quoted area or not, no? If that's a quoted area, the check for "\."
could be bypassed. Now, as parse_slash_copy() has its own way to parse
the command options given by the user, perhaps all this extra
engineering is not worth fixing an edge case.In short, I agree that what you propose here has value to fix the case
proposed here, as even COPY FROM stdin (not only \copy) fails now.
This is a six-year-old thread, but I have applied the following doc
patch to at least document this behavior.
---------------------------------------------------------------------------
commit 42d3125ada
Author: Bruce Momjian <bruce@momjian.us>
Date: Fri Nov 3 13:57:59 2023 -0400
doc: \copy can get data values \. and end-of-input confused
Reported-by: Svante Richter
Discussion: /messages/by-id/fcd57e4-8f23-4c3e-a5db-2571d09208e2@beta.fastmail.com
Backpatch-through: 11
diff --git a/doc/src/sgml/ref/psql-ref.sgml b/doc/src/sgml/ref/psql-ref.sgml
index d94e3cacfc..cc7d797159 100644
--- a/doc/src/sgml/ref/psql-ref.sgml
+++ b/doc/src/sgml/ref/psql-ref.sgml
@@ -1119,6 +1119,10 @@ INSERT INTO tbl1 VALUES ($1, $2) \bind 'first value' 'second value' \g
destination, because all data must pass through the client/server
connection. For large amounts of data the <acronym>SQL</acronym>
command might be preferable.
+ Also, because of this pass-through method, <literal>\copy
+ ... from</literal> in <acronym>CSV</acronym> mode will erroneously
+ treat a <literal>\.</literal> data value alone on a line as an
+ end-of-input marker.
</para>
</tip>
diff --git a/src/bin/psql/copy.c b/src/bin/psql/copy.c
index b3cc3d9a29..dbbbdb8898 100644
--- a/src/bin/psql/copy.c
+++ b/src/bin/psql/copy.c
@@ -627,6 +627,8 @@ handleCopyIn(PGconn *conn, FILE *copystream, bool isbinary, PGresult **res)
* This code erroneously assumes '\.' on a line alone
* inside a quoted CSV string terminates the \copy.
* https://www.postgresql.org/message-id/E1TdNVQ-0001ju-GO@wrigleys.postgresql.org
+ *
+ * https://www.postgresql.org/message-id/bfcd57e4-8f23-4c3e-a5db-2571d09208e2@beta.fastmail.com
*/
if ((linelen == 3 && memcmp(fgresult, "\\.\n", 3) == 0) ||
(linelen == 4 && memcmp(fgresult, "\\.\r\n", 4) == 0))
--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com
Only you can decide what is important to you.