pgsql: Add regression tests for CSV and \., and add automatic quoting of
Log Message:
-----------
Add regression tests for CSV and \., and add automatic quoting of a
single column dump that has a \. value, so the load works properly. I
also added documentation describing this issue.
Modified Files:
--------------
pgsql/doc/src/sgml/ref:
copy.sgml (r1.70 -> r1.71)
(http://developer.postgresql.org/cvsweb.cgi/pgsql/doc/src/sgml/ref/copy.sgml.diff?r1=1.70&r2=1.71)
pgsql/src/backend/commands:
copy.c (r1.256 -> r1.257)
(http://developer.postgresql.org/cvsweb.cgi/pgsql/src/backend/commands/copy.c.diff?r1=1.256&r2=1.257)
pgsql/src/test/regress/expected:
copy2.out (r1.22 -> r1.23)
(http://developer.postgresql.org/cvsweb.cgi/pgsql/src/test/regress/expected/copy2.out.diff?r1=1.22&r2=1.23)
pgsql/src/test/regress/sql:
copy2.sql (r1.13 -> r1.14)
(http://developer.postgresql.org/cvsweb.cgi/pgsql/src/test/regress/sql/copy2.sql.diff?r1=1.13&r2=1.14)
Bruce Momjian said:
Log Message:
-----------
Add regression tests for CSV and \., and add automatic quoting of a
single column dump that has a \. value, so the load works properly. I
also added documentation describing this issue.
This seems unnecessarily elaborate, in code that is already byzantine. I
think we can safely quote *any* field that has \. regardless of whether or
not it is a singleton. There's no need to make a single column a special
case - if it's valid for a singleton it's valid for any, and vice versa.
cheers
andrew
Andrew Dunstan said:
Bruce Momjian said:
Log Message:
-----------
Add regression tests for CSV and \., and add automatic quoting of a
single column dump that has a \. value, so the load works properly. I also
added documentation describing this issue.
This seems unnecessarily elaborate, in code that is already byzantine. I
think we can safely quote *any* field that has \. regardless of whether or
not it is a singleton. There's no need to make a single column a special
case - if it's valid for a singleton it's valid for any, and vice versa.
Now that I've woken up properly I realise that it's also just wrong - it
will miss the case we need to catch of the first column of a multi-column
line beginning with \. - just treat them all the same and all will be well.
Also, this test is suspicious:
strcmp(string, "\\.") == 0
Don't we also want to quote it if the field reads \.x ?
strncmp(string, "\\.",2) == 0
seems like it would be a better test.
cheers
andrew
Import Notes
Resolved by subject fallback
Andrew Dunstan wrote:
Bruce Momjian said:
Log Message:
-----------
Add regression tests for CSV and \., and add automatic quoting of a
single column dump that has a \. value, so the load works properly. I
also added documentation describing this issue.This seems unnecessarily elaborate, in code that is already byzantine. I
think we can safely quote *any* field that has \. regardless of whether or
not it is a singleton. There's no need to make a single column a special
case - if it's valid for a singleton it's valid for any, and vice versa.
Only \. as a single column will be interpreted as an end-of-data, so I
want to be accurate in what we do, rather than sloppy. If we quote any
\. data value, we are going to get questions from people as why _one_
value is quote and the others are not, and we are going to have to
explain that it is quoted in column 4, but in reality it is only it
being alone that needs quoting. I don't think we want to be
inconsistent because that leads to confusion.
--
Bruce Momjian | http://candle.pha.pa.us
pgman@candle.pha.pa.us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073
Andrew Dunstan wrote:
Andrew Dunstan said:
Bruce Momjian said:
Log Message:
-----------
Add regression tests for CSV and \., and add automatic quoting of asingle column dump that has a \. value, so the load works properly. I also
added documentation describing this issue.This seems unnecessarily elaborate, in code that is already byzantine. I
think we can safely quote *any* field that has \. regardless of whether or
not it is a singleton. There's no need to make a single column a special
case - if it's valid for a singleton it's valid for any, and vice versa.Now that I've woken up properly I realise that it's also just wrong - it
will miss the case we need to catch of the first column of a multi-column
line beginning with \. - just treat them all the same and all will be well.Also, this test is suspicious:
strcmp(string, "\\.") == 0
Don't we also want to quote it if the field reads \.x ?
strncmp(string, "\\.",2) == 0
seems like it would be a better test.
Have you looked at the regression tests I added? \.x will no longer be
interpreted as an end-of-data marker:
COPY testeoc FROM stdin CSV;
a\.
\.b
c\.d
"\."
\.
COPY testeoc TO stdout CSV;
a\.
\.b
c\.d
"\."
Our documentation says \. must appear alone on a line. With non-CSV, we
allow \. to appear on the end of a line too because it can not be a data
value, but for CSV, we have to enforce that.
--
Bruce Momjian | http://candle.pha.pa.us
pgman@candle.pha.pa.us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073
Andrew Dunstan wrote:
Bruce Momjian said:
Now that I've woken up properly I realise that it's also just wrong -
it will miss the case we need to catch of the first column of a
multi-column line beginning with \. - just treat them all the same and
all will be well.Also, this test is suspicious:
strcmp(string, "\\.") == 0
Don't we also want to quote it if the field reads \.x ?
strncmp(string, "\\.",2) == 0
seems like it would be a better test.Have you looked at the regression tests I added? \.x will no longer be
interpreted as an end-of-data marker:Oh. I didn't realise that part of the behaviour had changed. Does that alter
our robustness?
It improves our robustness, for sure.
I still think it seems odd to quote the field depending on it being the only
field. That feels more inconsistent than quoting it whenever it is \.But I guess it's a corner case that is hardly likely to happen in real life.
That is why I did it only for the single-column case, so the quoting
would happen only when required, and it would limit confusion.
--
Bruce Momjian | http://candle.pha.pa.us
pgman@candle.pha.pa.us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073
Import Notes
Reply to msg id not found: 43446.68.143.134.146.1135784755.squirrel@www.dunslane.net | Resolved by subject fallback
Bruce Momjian said:
Now that I've woken up properly I realise that it's also just wrong -
it will miss the case we need to catch of the first column of a
multi-column line beginning with \. - just treat them all the same and
all will be well.Also, this test is suspicious:
strcmp(string, "\\.") == 0
Don't we also want to quote it if the field reads \.x ?
strncmp(string, "\\.",2) == 0
seems like it would be a better test.Have you looked at the regression tests I added? \.x will no longer be
interpreted as an end-of-data marker:
Oh. I didn't realise that part of the behaviour had changed. Does that alter
our robustness?
I still think it seems odd to quote the field depending on it being the only
field. That feels more inconsistent than quoting it whenever it is \.
But I guess it's a corner case that is hardly likely to happen in real life.
cheers
andrew