pgsql: Add regression tests for CSV and \., and add automatic quoting of

Started by Bruce Momjianover 20 years ago7 messagescomitters
Jump to latest
#1Bruce Momjian
bruce@momjian.us

Log Message:
-----------
Add regression tests for CSV and \., and add automatic quoting of a
single column dump that has a \. value, so the load works properly. I
also added documentation describing this issue.

Modified Files:
--------------
pgsql/doc/src/sgml/ref:
copy.sgml (r1.70 -> r1.71)
(http://developer.postgresql.org/cvsweb.cgi/pgsql/doc/src/sgml/ref/copy.sgml.diff?r1=1.70&r2=1.71)
pgsql/src/backend/commands:
copy.c (r1.256 -> r1.257)
(http://developer.postgresql.org/cvsweb.cgi/pgsql/src/backend/commands/copy.c.diff?r1=1.256&r2=1.257)
pgsql/src/test/regress/expected:
copy2.out (r1.22 -> r1.23)
(http://developer.postgresql.org/cvsweb.cgi/pgsql/src/test/regress/expected/copy2.out.diff?r1=1.22&r2=1.23)
pgsql/src/test/regress/sql:
copy2.sql (r1.13 -> r1.14)
(http://developer.postgresql.org/cvsweb.cgi/pgsql/src/test/regress/sql/copy2.sql.diff?r1=1.13&r2=1.14)

#2Andrew Dunstan
andrew@dunslane.net
In reply to: Bruce Momjian (#1)
Re: pgsql: Add regression tests for CSV and \., and add automatic quoting of

Bruce Momjian said:

Log Message:
-----------
Add regression tests for CSV and \., and add automatic quoting of a
single column dump that has a \. value, so the load works properly. I
also added documentation describing this issue.

This seems unnecessarily elaborate, in code that is already byzantine. I
think we can safely quote *any* field that has \. regardless of whether or
not it is a singleton. There's no need to make a single column a special
case - if it's valid for a singleton it's valid for any, and vice versa.

cheers

andrew

#3Andrew Dunstan
andrew@dunslane.net
In reply to: Andrew Dunstan (#2)
Re: pgsql: Add regression tests for CSV and \., and add automatic quoting of

Andrew Dunstan said:

Bruce Momjian said:

Log Message:
-----------
Add regression tests for CSV and \., and add automatic quoting of a

single column dump that has a \. value, so the load works properly. I also
added documentation describing this issue.

This seems unnecessarily elaborate, in code that is already byzantine. I

think we can safely quote *any* field that has \. regardless of whether or
not it is a singleton. There's no need to make a single column a special
case - if it's valid for a singleton it's valid for any, and vice versa.

Now that I've woken up properly I realise that it's also just wrong - it
will miss the case we need to catch of the first column of a multi-column
line beginning with \. - just treat them all the same and all will be well.

Also, this test is suspicious:

strcmp(string, "\\.") == 0

Don't we also want to quote it if the field reads \.x ?
strncmp(string, "\\.",2) == 0
seems like it would be a better test.

cheers

andrew

#4Bruce Momjian
bruce@momjian.us
In reply to: Andrew Dunstan (#2)
Re: pgsql: Add regression tests for CSV and \., and add

Andrew Dunstan wrote:

Bruce Momjian said:

Log Message:
-----------
Add regression tests for CSV and \., and add automatic quoting of a
single column dump that has a \. value, so the load works properly. I
also added documentation describing this issue.

This seems unnecessarily elaborate, in code that is already byzantine. I
think we can safely quote *any* field that has \. regardless of whether or
not it is a singleton. There's no need to make a single column a special
case - if it's valid for a singleton it's valid for any, and vice versa.

Only \. as a single column will be interpreted as an end-of-data, so I
want to be accurate in what we do, rather than sloppy. If we quote any
\. data value, we are going to get questions from people as why _one_
value is quote and the others are not, and we are going to have to
explain that it is quoted in column 4, but in reality it is only it
being alone that needs quoting. I don't think we want to be
inconsistent because that leads to confusion.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
#5Bruce Momjian
bruce@momjian.us
In reply to: Andrew Dunstan (#3)
Re: pgsql: Add regression tests for CSV and \., and add

Andrew Dunstan wrote:

Andrew Dunstan said:

Bruce Momjian said:

Log Message:
-----------
Add regression tests for CSV and \., and add automatic quoting of a

single column dump that has a \. value, so the load works properly. I also
added documentation describing this issue.

This seems unnecessarily elaborate, in code that is already byzantine. I

think we can safely quote *any* field that has \. regardless of whether or
not it is a singleton. There's no need to make a single column a special
case - if it's valid for a singleton it's valid for any, and vice versa.

Now that I've woken up properly I realise that it's also just wrong - it
will miss the case we need to catch of the first column of a multi-column
line beginning with \. - just treat them all the same and all will be well.

Also, this test is suspicious:

strcmp(string, "\\.") == 0

Don't we also want to quote it if the field reads \.x ?
strncmp(string, "\\.",2) == 0
seems like it would be a better test.

Have you looked at the regression tests I added? \.x will no longer be
interpreted as an end-of-data marker:

COPY testeoc FROM stdin CSV;
a\.
\.b
c\.d
"\."
\.

COPY testeoc TO stdout CSV;
a\.
\.b
c\.d
"\."

Our documentation says \. must appear alone on a line. With non-CSV, we
allow \. to appear on the end of a line too because it can not be a data
value, but for CSV, we have to enforce that.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
#6Bruce Momjian
bruce@momjian.us
In reply to: Bruce Momjian (#5)
Re: pgsql: Add regression tests for CSV and \., and add

Andrew Dunstan wrote:

Bruce Momjian said:

Now that I've woken up properly I realise that it's also just wrong -
it will miss the case we need to catch of the first column of a
multi-column line beginning with \. - just treat them all the same and
all will be well.

Also, this test is suspicious:

strcmp(string, "\\.") == 0

Don't we also want to quote it if the field reads \.x ?
strncmp(string, "\\.",2) == 0
seems like it would be a better test.

Have you looked at the regression tests I added? \.x will no longer be
interpreted as an end-of-data marker:

Oh. I didn't realise that part of the behaviour had changed. Does that alter
our robustness?

It improves our robustness, for sure.

I still think it seems odd to quote the field depending on it being the only
field. That feels more inconsistent than quoting it whenever it is \.

But I guess it's a corner case that is hardly likely to happen in real life.

That is why I did it only for the single-column case, so the quoting
would happen only when required, and it would limit confusion.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
#7Andrew Dunstan
andrew@dunslane.net
In reply to: Bruce Momjian (#5)
Re: pgsql: Add regression tests for CSV and \., and add

Bruce Momjian said:

Now that I've woken up properly I realise that it's also just wrong -
it will miss the case we need to catch of the first column of a
multi-column line beginning with \. - just treat them all the same and
all will be well.

Also, this test is suspicious:

strcmp(string, "\\.") == 0

Don't we also want to quote it if the field reads \.x ?
strncmp(string, "\\.",2) == 0
seems like it would be a better test.

Have you looked at the regression tests I added? \.x will no longer be
interpreted as an end-of-data marker:

Oh. I didn't realise that part of the behaviour had changed. Does that alter
our robustness?

I still think it seems odd to quote the field depending on it being the only
field. That feels more inconsistent than quoting it whenever it is \.

But I guess it's a corner case that is hardly likely to happen in real life.

cheers

andrew