Off topic? - Solution for a delimiter problem

Started by Andrew Gouldover 25 years ago5 messagesgeneral
Jump to latest
#1Andrew Gould
andrewgould@yahoo.com

I receive space delimited files that have spaces
within double quoted text fields.

Since PostgreSQL's COPY doesn't deal with multiple
delimiters, I've written a crude Python script to
convert the files to a tab delimited format without
the double quotes and without disturbing the spaces
within text fields. I'm sure someone could make it
more efficient. (Probably by rewriting it in Perl.)

Does anyone else need the script? Is the script
something I can/should post?

Andrew Gould

__________________________________________________
Do You Yahoo!?
Yahoo! Shopping - Thousands of Stores. Millions of Products.
http://shopping.yahoo.com/

#2Poul L. Christiansen
poulc@cs.auc.dk
In reply to: Andrew Gould (#1)
Re: Off topic? - Solution for a delimiter problem

Maybe you can specify the delimiter to be space.

radius=# \h copy
Command: COPY
Description: Copies data between files and tables
Syntax:
COPY [ BINARY ] table [ WITH OIDS ]
FROM { 'filename' | stdin }
[ [USING] DELIMITERS 'delimiter' ]
[ WITH NULL AS 'null string' ]
COPY [ BINARY ] table [ WITH OIDS ]
TO { 'filename' | stdout }
[ [USING] DELIMITERS 'delimiter' ]
[ WITH NULL AS 'null string' ]

That would probably be COPY table FROM filename USING DELIMITERS ' ';

HTH,
Poul L. Christiansen

Andrew Gould wrote:

Show quoted text

I receive space delimited files that have spaces
within double quoted text fields.

Since PostgreSQL's COPY doesn't deal with multiple
delimiters, I've written a crude Python script to
convert the files to a tab delimited format without
the double quotes and without disturbing the spaces
within text fields. I'm sure someone could make it
more efficient. (Probably by rewriting it in Perl.)

Does anyone else need the script? Is the script
something I can/should post?

Andrew Gould

__________________________________________________
Do You Yahoo!?
Yahoo! Shopping - Thousands of Stores. Millions of Products.
http://shopping.yahoo.com/

#3Andrew Gould
andrewgould@yahoo.com
In reply to: Poul L. Christiansen (#2)
Re: Off topic? - Solution for a delimiter problem

If space was used as the delimiter without accounting
for the quotation marks, text fields might be broken
apart whenever a space occurred. For example, the
street address "123 Sesame Street" might be split into
3 fields containing '"123', 'Sesame' and 'Street"'
respectively. In the past, I imported the file into
MySQL, then exported the data into a tab delimited
text file for importing into PostgreSQL.

The Python script saves many steps; so I was offering
it up for use and improvement by others. I didn't
know if I should post the script to the mailing list
or send it directly to people who are interested.
(It's only 1k.)

Andrew Gould

--- "Poul L. Christiansen" <poulc@cs.auc.dk> wrote:

Maybe you can specify the delimiter to be space.

radius=# \h copy
Command: COPY
Description: Copies data between files and tables
Syntax:
COPY [ BINARY ] table [ WITH OIDS ]
FROM { 'filename' | stdin }
[ [USING] DELIMITERS 'delimiter' ]
[ WITH NULL AS 'null string' ]
COPY [ BINARY ] table [ WITH OIDS ]
TO { 'filename' | stdout }
[ [USING] DELIMITERS 'delimiter' ]
[ WITH NULL AS 'null string' ]

That would probably be COPY table FROM filename
USING DELIMITERS ' ';

HTH,
Poul L. Christiansen

Andrew Gould wrote:

I receive space delimited files that have spaces
within double quoted text fields.

Since PostgreSQL's COPY doesn't deal with multiple
delimiters, I've written a crude Python script to
convert the files to a tab delimited format

without

the double quotes and without disturbing the

spaces

within text fields. I'm sure someone could make

it

more efficient. (Probably by rewriting it in

Perl.)

Does anyone else need the script? Is the script
something I can/should post?

Andrew Gould

__________________________________________________
Do You Yahoo!?
Yahoo! Shopping - Thousands of Stores. Millions of Products.
http://shopping.yahoo.com/

#4Jeff Eckermann
jeckermann@verio.net
In reply to: Andrew Gould (#3)
RE: Off topic? - Solution for a delimiter problem

Isn't this overkill?
Something like:
sed -e 's/\" \"/^I/g' -e 's/\"//g' <file>
would do the job nicely. I'm sure people on this list can suggest even
simpler ways (note: the "^I", or tab character, in the example is created by
typing Ctrl-V then Ctrl-I).
Sed has been (and is) my friend for many such cases.

Show quoted text

-----Original Message-----
From: Andrew Gould [SMTP:andrewgould@yahoo.com]
Sent: Wednesday, December 20, 2000 5:58 PM
To: Postgres Mailing List
Subject: Re: [GENERAL] Off topic? - Solution for a delimiter problem

If space was used as the delimiter without accounting
for the quotation marks, text fields might be broken
apart whenever a space occurred. For example, the
street address "123 Sesame Street" might be split into
3 fields containing '"123', 'Sesame' and 'Street"'
respectively. In the past, I imported the file into
MySQL, then exported the data into a tab delimited
text file for importing into PostgreSQL.

The Python script saves many steps; so I was offering
it up for use and improvement by others. I didn't
know if I should post the script to the mailing list
or send it directly to people who are interested.
(It's only 1k.)

Andrew Gould

--- "Poul L. Christiansen" <poulc@cs.auc.dk> wrote:

Maybe you can specify the delimiter to be space.

radius=# \h copy
Command: COPY
Description: Copies data between files and tables
Syntax:
COPY [ BINARY ] table [ WITH OIDS ]
FROM { 'filename' | stdin }
[ [USING] DELIMITERS 'delimiter' ]
[ WITH NULL AS 'null string' ]
COPY [ BINARY ] table [ WITH OIDS ]
TO { 'filename' | stdout }
[ [USING] DELIMITERS 'delimiter' ]
[ WITH NULL AS 'null string' ]

That would probably be COPY table FROM filename
USING DELIMITERS ' ';

HTH,
Poul L. Christiansen

Andrew Gould wrote:

I receive space delimited files that have spaces
within double quoted text fields.

Since PostgreSQL's COPY doesn't deal with multiple
delimiters, I've written a crude Python script to
convert the files to a tab delimited format

without

the double quotes and without disturbing the

spaces

within text fields. I'm sure someone could make

it

more efficient. (Probably by rewriting it in

Perl.)

Does anyone else need the script? Is the script
something I can/should post?

Andrew Gould

__________________________________________________
Do You Yahoo!?
Yahoo! Shopping - Thousands of Stores. Millions of Products.
http://shopping.yahoo.com/

#5Andrew Gould
andrewgould@yahoo.com
In reply to: Jeff Eckermann (#4)
RE: Off topic? - Solution for a delimiter problem

I think (I could be miss-reading the sed line) you're
assuming that every field is a text field. I receive
many files with both text and numeric fields. The
numberic fields are not double quoted; so automating
the pairing of double quotes as you have done becomes
more complicated.

The algorithm I used reads the file one character at a
time. The detection of a double quotation mark
changes the value of a variable by multiplying the
variable's value by -1. The variable's value acts as
an on/off switch that determines whether or not a
space will be replaced by a tab. This algorithm
allows for any combination of text and numeric fields.

Thank you for the suggestion though. I am trying to
learn more about regular expressions.

Andrew Gould

--- Jeff Eckermann <jeckermann@verio.net> wrote:

Isn't this overkill?
Something like:
sed -e 's/\" \"/^I/g' -e 's/\"//g' <file>
would do the job nicely. I'm sure people on this
list can suggest even
simpler ways (note: the "^I", or tab character, in
the example is created by
typing Ctrl-V then Ctrl-I).
Sed has been (and is) my friend for many such cases.

-----Original Message-----
From: Andrew Gould [SMTP:andrewgould@yahoo.com]
Sent: Wednesday, December 20, 2000 5:58 PM
To: Postgres Mailing List
Subject: Re: [GENERAL] Off topic? - Solution for a

delimiter problem

If space was used as the delimiter without

accounting

for the quotation marks, text fields might be

broken

apart whenever a space occurred. For example, the
street address "123 Sesame Street" might be split

into

3 fields containing '"123', 'Sesame' and 'Street"'
respectively. In the past, I imported the file

into

MySQL, then exported the data into a tab delimited
text file for importing into PostgreSQL.

The Python script saves many steps; so I was

offering

it up for use and improvement by others. I didn't
know if I should post the script to the mailing

list

or send it directly to people who are interested.
(It's only 1k.)

Andrew Gould

--- "Poul L. Christiansen" <poulc@cs.auc.dk>

wrote:

Maybe you can specify the delimiter to be space.

radius=# \h copy
Command: COPY
Description: Copies data between files and

tables

Syntax:
COPY [ BINARY ] table [ WITH OIDS ]
FROM { 'filename' | stdin }
[ [USING] DELIMITERS 'delimiter' ]
[ WITH NULL AS 'null string' ]
COPY [ BINARY ] table [ WITH OIDS ]
TO { 'filename' | stdout }
[ [USING] DELIMITERS 'delimiter' ]
[ WITH NULL AS 'null string' ]

That would probably be COPY table FROM filename
USING DELIMITERS ' ';

HTH,
Poul L. Christiansen

Andrew Gould wrote:

I receive space delimited files that have

spaces

within double quoted text fields.

Since PostgreSQL's COPY doesn't deal with

multiple

delimiters, I've written a crude Python script

to

convert the files to a tab delimited format

without

the double quotes and without disturbing the

spaces

within text fields. I'm sure someone could

make

it

more efficient. (Probably by rewriting it in

Perl.)

Does anyone else need the script? Is the

script

something I can/should post?

Andrew Gould

__________________________________________________
Do You Yahoo!?
Yahoo! Shopping - Thousands of Stores. Millions of

Products.

http://shopping.yahoo.com/

__________________________________________________
Do You Yahoo!?
Yahoo! Shopping - Thousands of Stores. Millions of Products.
http://shopping.yahoo.com/