Copy From suggestion

Started by Mark Watson over 15 years ago · 6 messages · general
#1Mark Watson
mark.watson@jurisconcept.ca

Hello all,
Firstly, I apologise if this is not the correct list for this subject.
Lately, I've been working on a data conversion, importing into Postgres
using COPY FROM. The text file I'm copying from is produced by an ancient
program, which emits either a tab- or semicolon-delimited file. One file
contains about 1.8M rows and has a 'comments' column. The exporting program,
which I am forced to use, does not surround this column with quotes, and the
column contains CR/LF characters, which I must deal with (and have dealt
with) before I can import the file via COPY. Hence my suggestion:
I was envisioning a parameter DELIMITER_COUNT which, if one was 100%
confident that all columns are accounted for in the input file, could be
used to alleviate the need to deal with CR/LFs in varchar and text columns.
That is, if COPY read a line containing fewer delimiters than DELIMITER_COUNT,
the next line of the text file would be read and the assignment of columns
would continue for the current row/column.
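[Editor's note: the merging behaviour described above can be emulated today with a small preprocessing script. A minimal sketch, assuming a semicolon delimiter and a four-column table; the column count, delimiter, and sample rows are illustrative assumptions, and it relies on the same caveat Mark states, namely that the delimiter never appears inside the unquoted text column:]

```python
EXPECTED_COLUMNS = 4           # assumed column count of the target table
DELIMITER = ";"
NEEDED = EXPECTED_COLUMNS - 1  # delimiters per complete logical row

def merge_rows(lines):
    """Yield logical rows, joining physical lines that were split by
    embedded CR/LF in the unquoted comments column. The embedded break
    is flattened to a space; emit the COPY escape '\\n' instead if the
    break must be preserved."""
    buffer = ""
    for line in lines:
        line = line.rstrip("\r\n")
        buffer = line if not buffer else buffer + " " + line
        if buffer.count(DELIMITER) >= NEEDED:
            yield buffer
            buffer = ""
    if buffer:
        yield buffer  # trailing partial row, if any

# Illustrative input: the first record's comment spans two physical lines.
raw = [
    "1;John;note starts here",
    "and continues;2010-12-17",
    "2;Jane;single-line note;2010-12-18",
]
print(list(merge_rows(raw)))
```

The merged output can then be fed to COPY in a single pass, which is essentially what the proposed DELIMITER_COUNT parameter would do server-side.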
Just curious as to the thoughts out there.
Thanks to all for this excellent product, and a merry Christmas/holiday
period to all.

Mark Watson

#2Adrian Klaver
adrian.klaver@aklaver.com
In reply to: Mark Watson (#1)
Re: Copy From suggestion

On Friday 17 December 2010 7:46:12 am Mark Watson wrote:

A suggestion: give pgloader a look:
http://pgloader.projects.postgresql.org/

If I am following you, it may already have a solution to the multi-line
problem. In particular, read the History section of the docs.

Thanks,
--
Adrian Klaver
adrian.klaver@gmail.com

#3Mark Watson
mark.watson@jurisconcept.ca
In reply to: Adrian Klaver (#2)
Re: Copy From suggestion

Thanks, Adrian,

I’ll try a Windows compile of pgloader sometime during the holidays. It’s
true that I already have a solution (export in chunks of <= 65,000 rows, import
into Excel, then export from Excel, which puts quotes around the text columns),
but something faster and more efficient would really help in this case.

-Mark
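[Editor's note: the Excel round-trip above exists only to add quotes around the text columns. A faster alternative is a short script using Python's csv module to re-emit the file as quoted CSV directly; the file contents and semicolon delimiter below are illustrative assumptions:]

```python
import csv
import io

def requote(infile, outfile, delimiter=";"):
    """Re-emit a delimited file with every field quoted, so that
    COPY ... WITH CSV accepts embedded newlines in text columns."""
    reader = csv.reader(infile, delimiter=delimiter)
    writer = csv.writer(outfile, delimiter=delimiter,
                        quoting=csv.QUOTE_ALL, lineterminator="\n")
    for row in reader:
        writer.writerow(row)

# Illustrative run on an in-memory sample row.
src = io.StringIO("1;John;a comment;2010-12-17\n")
dst = io.StringIO()
requote(src, dst)
print(dst.getvalue())  # "1";"John";"a comment";"2010-12-17"
```

The resulting file loads in one pass with COPY in CSV mode (semicolon delimiter, double-quote quoting), with no 65,000-row chunking needed.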


#4Jorge Godoy
jgodoy@gmail.com
In reply to: Mark Watson (#3)
Re: Copy From suggestion

With OpenOffice.org that 65K limit goes away as well...

I don't know why it is still like that today for MS Office... It is almost
2011 and they still think 64K is enough? :-)

--
Jorge Godoy <jgodoy@gmail.com>


#5Leif B. Kristensen
leif@solumslekt.org
In reply to: Jorge Godoy (#4)
Re: Copy From suggestion


Maybe there's an uncrippled «Professional» or «Enterprise» version
costing an arm and a leg? ;)

regards,
Leif B. Kristensen

#6Adrian Klaver
adrian.klaver@aklaver.com
In reply to: Leif B. Kristensen (#5)
Re: Copy From suggestion


FYI, with Office 2007 that limit went up to a little over 1 million rows.

--
Adrian Klaver
adrian.klaver@gmail.com