Copy From suggestion
Hello all,
Firstly, I apologise if this is not the correct list for this subject.
Lately, I've been working on a data conversion, importing into Postgres
using COPY FROM. The text file I'm copying from comes from an ancient
program that exports either a tab- or semicolon-delimited file. One file
contains about 1.8M rows and has a 'comments' column. The exporting program,
which I am forced to use, does not surround this column with quotes, and the
column contains CR/LF characters, which I must deal with (and have dealt
with) before I can import the file via COPY. Hence my suggestion:
I was envisioning a parameter DELIMITER_COUNT which, if one were 100%
confident that all columns are accounted for in the input file, could be
used to alleviate the need to deal with CR/LFs in varchar and text columns.
That is, if COPY read a line with fewer delimiters than DELIMITER_COUNT, the
next line of the text file would be read and the assignment of columns would
continue for the current row and column.
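To make the idea concrete, here is a rough sketch of that joining logic as a standalone pre-processing filter in Python (COLUMN_COUNT and the delimiter are hypothetical placeholders for my file; the suggestion is that COPY itself would do this internally):

```python
import sys

DELIMITER = "\t"       # or ";", depending on the export
COLUMN_COUNT = 12      # hypothetical; the real number of columns in the table
DELIM_COUNT = COLUMN_COUNT - 1   # delimiters expected per complete row

def join_rows(lines, delim_count=DELIM_COUNT, delimiter=DELIMITER):
    """Merge physical lines until each logical row holds the expected
    number of delimiters; embedded line breaks are re-written as the
    COPY text-format escape \\n so the output loads with plain COPY FROM."""
    buf = ""
    for line in lines:
        piece = line.rstrip("\r\n")
        buf = buf + "\\n" + piece if buf else piece
        if buf.count(delimiter) >= delim_count:
            yield buf
            buf = ""
    if buf:                # trailing partial row, if any
        yield buf

if __name__ == "__main__":
    for row in join_rows(sys.stdin):
        sys.stdout.write(row + "\n")
```

Of course this only works under the stated assumption that no field ever contains the delimiter itself, which is exactly the "100% confident" condition above.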
Just curious as to the thoughts out there.
Thanks to all for this excellent product, and a merry Christmas/holiday
period to all.
Mark Watson
On Friday 17 December 2010 7:46:12 am Mark Watson wrote:
A suggestion: give pgloader a look:
http://pgloader.projects.postgresql.org/
If I am following you, it may already have a solution to the multi-line
problem. In particular, read the History section of the docs.
Thanks,
--
Adrian Klaver
adrian.klaver@gmail.com
Thanks, Adrian,
I'll try a Windows compile of pgloader sometime during the holidays. It's
true that I already have a solution (export in chunks of <= 65,000 rows, import
into Excel, and re-export, since Excel puts quotes around the text columns),
but something faster and more efficient would really help in this case.
-Mark
_____
From: pgsql-general-owner@postgresql.org
[mailto:pgsql-general-owner@postgresql.org] On behalf of Adrian Klaver
Sent: 18 December 2010 18:05
To: pgsql-general@postgresql.org
Cc: Mark Watson
Subject: Re: [GENERAL] Copy From suggestion
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
With OpenOffice.org that 65K limit goes away as well...
I don't know why it is still like that in MS Office today... It is almost
2011 and they still think 64K rows are enough? :-)
--
Jorge Godoy <jgodoy@gmail.com>
On Mon, Dec 20, 2010 at 11:49, Mark Watson <mark.watson@jurisconcept.ca> wrote:
On Monday 20. December 2010 15.24.58 Jorge Godoy wrote:
Maybe there's an uncrippled «Professional» or «Enterprise» version
costing an arm and a leg? ;)
regards,
Leif B. Kristensen
On Monday 20 December 2010 7:09:23 am Leif Biberg Kristensen wrote:
FYI, with Office 2007 that limit went up to a little over 1 million rows.
--
Adrian Klaver
adrian.klaver@gmail.com