Two small questions re/ COPY CSV data into table
Hello,
I'm using COPY to load CSV data into the table in PG 10.x and have to simple
questions:
1. I do use as command:
COPY adm_cat FROM '/home/sisis/PostgreSQL/PDcsv/adm_cat.load' WITH DELIMITER '|' CSV ;
which works fine. When I read the PG docs the syntax should be
... WITH DELIMITER '|', FORMAT CSV ;
But this gives an syntax error as:
LINE 1: .../PostgreSQL/PDcsv/adm_cat.load' WITH DELIMITER '|', FORMAT C...
and the ',' is marked as its place. Why?
2. The CSV export is done by some Sybase tool which escapes the
delimiter as '\|', i.e. putting a backslash before the delimiter. I
found no way that COPY understands this excaping. Any ideas?
Thanks
matthias
--
Matthias Apitz, ✉ guru@unixarea.de, http://www.unixarea.de/ +49-176-38902045
Public GnuPG key: http://www.unixarea.de/key.pub
May, 9: Спаси́бо освободители! Thank you very much, Russian liberators!
"Matthias" == Matthias Apitz <guru@unixarea.de> writes:
Matthias> ... WITH DELIMITER '|', FORMAT CSV ;
I think you misread the docs; the new-style syntax would be
COPY ... WITH (DELIMITER '|', FORMAT CSV);
where the parens are not optional. The old-style syntax with no parens
after WITH is what you were using before.
Matthias> 2. The CSV export is done by some Sybase tool which escapes
Matthias> the delimiter as '\|', i.e. putting a backslash before the
Matthias> delimiter. I found no way that COPY understands this
Matthias> excaping. Any ideas?
That sounds like the file is not actually a CSV - why do you think it
is?
PG accepts two formats (actually 3 if you count binary format which is
rarely used):
TEXT: records are delimited by newlines, columns are delimited by a
delimiter character (default tab), there are no quotation marks, any
newline, backslash, or delimiter in the data must be escaped as a
backslash-sequence (e.g. \n or \012)
CSV: columns may be quoted (in which case delimiters and newlines inside
them are ignored), records are delimited by newlines _outside_ quoted
fields, there are no backslash-sequences or escapes outside of quoted
fields, quote characters inside quoted fields are doubled (though
there's an option to change this).
PG follows the CSV spec at https://www.ietf.org/rfc/rfc4180.txt fairly
closely.
--
Andrew (irc:RhodiumToad)
El día Tuesday, June 04, 2019 a las 11:32:45AM +0100, Andrew Gierth escribió:
Matthias> 2. The CSV export is done by some Sybase tool which escapes
Matthias> the delimiter as '\|', i.e. putting a backslash before the
Matthias> delimiter. I found no way that COPY understands this
Matthias> excaping. Any ideas?That sounds like the file is not actually a CSV - why do you think it
is?...
Well, it's not strictly CSV, but it is what the Sybase tool produces. The
delimiter is set to '|' and a data example line looks like:
0|1| cat $1 \| lpr -Pprinter |3|4
I do load this now with COPY in mode TEXT and modify the data before
with:
sed 's/|/\v/g' < table-from-Sybase | sed 's/\\\v/|/g' > table-for-copy
Works fine.
matthias
--
Matthias Apitz, ✉ guru@unixarea.de, http://www.unixarea.de/ +49-176-38902045
Public GnuPG key: http://www.unixarea.de/key.pub
May, 9: Спаси́бо освободители! Thank you very much, Russian liberators!
"Matthias" == Matthias Apitz <guru@unixarea.de> writes:
That sounds like the file is not actually a CSV - why do you think
it is?
Matthias> Well, it's not strictly CSV,
In other words it's not a CSV at all.
Matthias> I do load this now with COPY in mode TEXT and modify the data
Matthias> before with:
Matthias> sed 's/|/\v/g' < table-from-Sybase | sed 's/\\\v/|/g' > table-for-copy
What on earth is this supposed to achieve?
--
Andrew (irc:RhodiumToad)
El día Tuesday, June 04, 2019 a las 04:41:47PM +0100, Andrew Gierth escribió:
Matthias> I do load this now with COPY in mode TEXT and modify the data
Matthias> before with:Matthias> sed 's/|/\v/g' < table-from-Sybase | sed 's/\\\v/|/g' > table-for-copy
What on earth is this supposed to achieve?
It first translates any char '|' to vtab and then any '\vtab' (i.e. a
backslash followed by a vtab) back to the char |
The new DELIMITER for PG is then vtab and the | is just a char in the
data (in the example above a pipe of two UNIX cmd). Do you do UNIX?
matthias
--
Matthias Apitz, ✉ guru@unixarea.de, http://www.unixarea.de/ +49-176-38902045
Public GnuPG key: http://www.unixarea.de/key.pub
May, 9: Спаси́бо освободители! Thank you very much, Russian liberators!
"Matthias" == Matthias Apitz <guru@unixarea.de> writes:
Matthias> sed 's/|/\v/g' < table-from-Sybase | sed 's/\\\v/|/g' > table-for-copy
What on earth is this supposed to achieve?
Matthias> It first translates any char '|' to vtab and then any '\vtab'
Matthias> (i.e. a backslash followed by a vtab) back to the char |
Matthias> The new DELIMITER for PG is then vtab and the | is just a char in the
Matthias> data (in the example above a pipe of two UNIX cmd).
Yes, but why?
If you use COPY ... WITH DELIMITER '|' (i.e. text mode, not CSV mode)
then the \| is accepted as being a literal | and the unescaped | is
treated as a delimiter. What is the point of the substitutions?
--
Andrew (irc:RhodiumToad)
El día Tuesday, June 04, 2019 a las 05:56:49PM +0100, Andrew Gierth escribió:
Matthias> The new DELIMITER for PG is then vtab and the | is just a char in the
Matthias> data (in the example above a pipe of two UNIX cmd).Yes, but why?
If you use COPY ... WITH DELIMITER '|' (i.e. text mode, not CSV mode)
then the \| is accepted as being a literal | and the unescaped | is
treated as a delimiter. What is the point of the substitutions?
In the original data record in Sybase there is a column containing, i.e.
some UNIX print command to be read and executed by the application:
cmd | lpr ...
it is exported into the file as
cmd \| lpr ...
because the export delimiter is set to '|';
the COPY in text mode was complaining because it interpreted the |
of the sequence \| as an delimiter and the number of columns did not
matched.
I will provide tomorrow the exact input line, the exact COPY command and
the error.
matthias
--
Matthias Apitz, ✉ guru@unixarea.de, http://www.unixarea.de/ +49-176-38902045
Public GnuPG key: http://www.unixarea.de/key.pub
May, 9: Спаси́бо освободители! Thank you very much, Russian liberators!
"Matthias" == Matthias Apitz <guru@unixarea.de> writes:
Matthias> it is exported into the file as
Matthias> cmd \| lpr ...
Matthias> because the export delimiter is set to '|';
Matthias> the COPY in text mode was complaining because it interpreted
Matthias> the | of the sequence \| as an delimiter and the number of
Matthias> columns did not matched.
Matthias> I will provide tomorrow the exact input line, the exact COPY
Matthias> command and the error.
This worked for me:
postgres=# copy ct(a,b,c,d,e) from stdin with delimiter '|';
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself, or an EOF signal.
0|1| cat $1 \| lpr -Pprinter |3|4
\.
COPY 1
--
Andrew (irc:RhodiumToad)
Mathias:
On Tue, Jun 4, 2019 at 8:35 PM Matthias Apitz <guru@unixarea.de> wrote:
El día Tuesday, June 04, 2019 a las 05:41:43PM +0200, Francisco Olarte escribió:
On Tue, Jun 4, 2019 at 5:03 PM Matthias Apitz <guru@unixarea.de> wrote:
Well, it's not strictly CSV, but it is what the Sybase tool produces. The
delimiter is set to '|' and a data example line looks like:
0|1| cat $1 \| lpr -Pprinter |3|4
I do load this now with COPY in mode TEXT and modify the data before
with:
sed 's/|/\v/g' < table-from-Sybase | sed 's/\\\v/|/g' > table-for-copyYou are lucky it is not CSV, it is much better, it's similar to the
text format originally used by postgres. Your problem is just it uses
| instead of TAB for field delimiter.
What I d not know is why you use \v, doesn't sed use \t for TAB?
I was thinking about \t and was afraid, that in ~2 GByte char fields
even some \t could be used in the data. That's why I decided to use \v
which is more unlikely to be used in our data.
Fine then. I haven't got much sed-foo under my belt, but if you do
this in perl ( I learned perl a while ago, forgot all awk and sed to
free some brain cells ;-> ) or something similar you can use high code
points or control chars ( and I do not know if sed can do several
replacements in a pass, but perl certainly can ). Something like (
untested ) "perl -pe 's/\\\\/\001/g; s/\\\|/\002/; s/\|/\t/;
s/\002/|/; s/\001/\\\\/", the advantage over the pipe approach may be
noticeable on big files, as it cuts a lot of context switches ( OTOH
it'll run single-core ).
( the sample tries to do
scaped backslash to soh
scaped pipe to stx
pipe to tab
stx to unescaped pipe
soh to escaped backlash
)
In fact you can do "tab to backslash t" after phase 2 and tab would be
ok in the input.
But, as I said before and others have also pointed, COPY is perfectly
happy to use pipe as delimiter and havinf it escaped with backslash on
input.
(You didn't cc'ed the list, by intention?)
No, I forgot to hit reply-all because I did not pay enough attention,
my fault. Doing it now in case some one thinks this is interesting.
Francisco Olarte.
Import Notes
Reply to msg id not found: 20190604183556.GA2797@sh4-5.1blu.de
El día Tuesday, June 04, 2019 a las 07:20:54PM +0200, Matthias Apitz escribió:
El día Tuesday, June 04, 2019 a las 05:56:49PM +0100, Andrew Gierth escribió:
If you use COPY ... WITH DELIMITER '|' (i.e. text mode, not CSV mode)
then the \| is accepted as being a literal | and the unescaped | is
treated as a delimiter. What is the point of the substitutions?...
I will provide tomorrow the exact input line, the exact COPY command and
the error.
It works exactly as Andrew states. My error was not caused by treating
the '|' of '\|' as DELIMITER, but by not having WITH (NULL '' ...) in the COPY cmd.
I misinterpreted the error message as not matching number of columns.
Sorry for the noise.
Thanks
matthias
--
Matthias Apitz, ✉ guru@unixarea.de, http://www.unixarea.de/ +49-176-38902045
Public GnuPG key: http://www.unixarea.de/key.pub
May, 9: Спаси́бо освободители! Thank you very much, Russian liberators!