Need help for import of text file

Started by Andreasover 13 years ago9 messagesgeneral
Jump to latest
#1Andreas
maps.on@gmx.net

Hi,

I need to import textfiles that have 5 columns but there is just blanks
as delimitors.
I could use COPY to read them but there is a time column that shows
times as " h:mm.ss,ms" in the morning and "hh:mm.ss,ms" in the afternoon.

Problem here is in the morning the first digit of the hour is shown as a
blank so there are 2 blanks before the time so COPY misstakes this as an
empty column and gets confused.

Can someone point me in the direction of an COPY option I'm not aware
of, or alternativly to some console tool that I can put in the batch
before the import step and replace the 2 blanks with 1 blank.

I use an OpenSuse server so some linux tool would do.

regards
Andreas

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#2Sheraz Sharif
sheraz@under-new-management.com
In reply to: Andreas (#1)
Re: Need help for import of text file

On Dec 15, 2012, at 1:06 PM, Andreas wrote:

Hi,

I need to import textfiles that have 5 columns but there is just blanks as delimitors.
I could use COPY to read them but there is a time column that shows times as " h:mm.ss,ms" in the morning and "hh:mm.ss,ms" in the afternoon.

Problem here is in the morning the first digit of the hour is shown as a blank so there are 2 blanks before the time so COPY misstakes this as an empty column and gets confused.

Can someone point me in the direction of an COPY option I'm not aware of, or alternativly to some console tool that I can put in the batch before the import step and replace the 2 blanks with 1 blank.

I use an OpenSuse server so some linux tool would do.

regards
Andreas

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

sed and awk are your friends.

You might consider some text processing prior to import. I do this a lot because I work with external datasets that require all kinds of massaging.

For example:

sed -e 's/^\s{2}/ /g' filename | psql DATABASE -c 'COPY table_name from STDIN'

the above will replace 2 spaces appearing at the front of the file with one space, then pipe the result to psql copy command that expects input from STDIN.

Hope that is helpful

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#3Adrian Klaver
adrian.klaver@aklaver.com
In reply to: Andreas (#1)
Re: Need help for import of text file

On 12/15/2012 11:06 AM, Andreas wrote:

Hi,

I need to import textfiles that have 5 columns but there is just blanks
as delimitors.
I could use COPY to read them but there is a time column that shows
times as " h:mm.ss,ms" in the morning and "hh:mm.ss,ms" in the afternoon.

Problem here is in the morning the first digit of the hour is shown as a
blank so there are 2 blanks before the time so COPY misstakes this as an
empty column and gets confused.

Can someone point me in the direction of an COPY option I'm not aware
of, or alternativly to some console tool that I can put in the batch
before the import step and replace the 2 blanks with 1 blank.

I use an OpenSuse server so some linux tool would do.

How big a file are we talking about?
I found using the OO/LibreOffice spreadsheet good for this, assuming a
reasonable file size.

If you use the CSV import you can make the columns where you want them
and then save the file with another delimiter(I tend to use the pipe
symbol |). Then use that file with COPY.

regards
Andreas

--
Adrian Klaver
adrian.klaver@gmail.com

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#4Peter Bex
Peter.Bex@xs4all.nl
In reply to: Andreas (#1)
Re: Need help for import of text file

On Sat, Dec 15, 2012 at 08:06:44PM +0100, Andreas wrote:

Hi,

Problem here is in the morning the first digit of the hour is shown as a
blank so there are 2 blanks before the time so COPY misstakes this as an
empty column and gets confused.

Can someone point me in the direction of an COPY option I'm not aware
of, or alternativly to some console tool that I can put in the batch
before the import step and replace the 2 blanks with 1 blank.

I use an OpenSuse server so some linux tool would do.

A simple sed(1) expression should do the trick:

sed -E 's/ +/ /g' old-file > new-file

GNU sed also allows in-place editing using -i, so you can avoid
writing it to a second file. Some seds accept a different flag
to enable extended regexps.

Cheers,
Peter
--
http://sjamaan.ath.cx
--
"The process of preparing programs for a digital computer
is especially attractive, not only because it can be economically
and scientifically rewarding, but also because it can be an aesthetic
experience much like composing poetry or music."
-- Donald Knuth

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#5Peter Bex
Peter.Bex@xs4all.nl
In reply to: Peter Bex (#4)
Re: Need help for import of text file

On Sat, Dec 15, 2012 at 10:16:55PM +0100, Peter Bex wrote:

A simple sed(1) expression should do the trick:

sed -E 's/ +/ /g' old-file > new-file

I just remembered where I could check, and the GNU sed equivalent is:

sed -r 's/ +/ /g' old-file > new-file

Sorry for the confusion.

Cheers,
Peter
--
http://sjamaan.ath.cx
--
"The process of preparing programs for a digital computer
is especially attractive, not only because it can be economically
and scientifically rewarding, but also because it can be an aesthetic
experience much like composing poetry or music."
-- Donald Knuth

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#6Andreas
maps.on@gmx.net
In reply to: Peter Bex (#5)
Re: Need help for import of text file

Am 15.12.2012 22:22, schrieb Peter Bex:

On Sat, Dec 15, 2012 at 10:16:55PM +0100, Peter Bex wrote:

A simple sed(1) expression should do the trick:

sed -E 's/ +/ /g' old-file > new-file

I just remembered where I could check, and the GNU sed equivalent is:

sed -r 's/ +/ /g' old-file > new-file

Sorry for the confusion.

With sed as startingpoint I figured it out.
Those 3 steps make the input files consumable for COPY

1. dos2unix
2. sed -i 's/[ \t]*$//'
3. sed -i 's/ / /g'

The input files get created by a simple windows batch where I can't
change anything.
It uses echo to attach a line of 4 parameters to those textfiles.

How would you manage if one or more of those parameters contained blanks
in some cases?
This doesn't appear, yet. But I consider this as luck. :}

The real column formats are ( TEXT, TEXT, DATE, TIME ).

#7Peter Bex
Peter.Bex@xs4all.nl
In reply to: Andreas (#6)
Re: Need help for import of text file

On Sun, Dec 16, 2012 at 06:48:35PM +0100, Andreas wrote:

With sed as startingpoint I figured it out.
Those 3 steps make the input files consumable for COPY

1. dos2unix
2. sed -i 's/[ \t]*$//'
3. sed -i 's/ / /g'

You can reduce this to one invocation by separating the commands
by a semicolon (or by passing multiple -e flags)

sed -i 's/[ \t]*$//;s/ / /g'

The input files get created by a simple windows batch where I can't
change anything.
It uses echo to attach a line of 4 parameters to those textfiles.

How would you manage if one or more of those parameters contained blanks
in some cases?
This doesn't appear, yet. But I consider this as luck. :}

The real column formats are ( TEXT, TEXT, DATE, TIME ).

Well, that's a bit trickier and my sed skills are rather rusty.
I'd probably use awk for these more complex tasks:

awk '/\(.*\)/ { gsub(/ +/, " "); } { print $0 }'

The "gsub" command acts like sed's "s" command with the "g" modifier.
By prefixing the block with the gsub command with a regex, it only
acts on that regex. The regex in this example only looks for an opening
and a closing paren anywhere on the line; you might need to tweak it
to more closely match your case. Alternatively, you could implement
a counter that skips the four lines (which can be done with both sed
and awk).

If it gets more complex than this, you can always write a proper
program in a "real" language to do it. This can be easier to maintain.

Cheers,
Peter
--
http://sjamaan.ath.cx
--
"The process of preparing programs for a digital computer
is especially attractive, not only because it can be economically
and scientifically rewarding, but also because it can be an aesthetic
experience much like composing poetry or music."
-- Donald Knuth

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#8Steve Clark
sclark@netwolves.com
In reply to: Peter Bex (#7)
Re: Need help for import of text file

On 12/16/2012 01:12 PM, Peter Bex wrote:

On Sun, Dec 16, 2012 at 06:48:35PM +0100, Andreas wrote:

With sed as startingpoint I figured it out.
Those 3 steps make the input files consumable for COPY

1. dos2unix
2. sed -i 's/[ \t]*$//'
3. sed -i 's/ / /g'

You can reduce this to one invocation by separating the commands
by a semicolon (or by passing multiple -e flags)

sed -i 's/[ \t]*$//;s/ / /g'

The input files get created by a simple windows batch where I can't
change anything.
It uses echo to attach a line of 4 parameters to those textfiles.

How would you manage if one or more of those parameters contained blanks
in some cases?
This doesn't appear, yet. But I consider this as luck. :}

The real column formats are ( TEXT, TEXT, DATE, TIME ).

Well, that's a bit trickier and my sed skills are rather rusty.
I'd probably use awk for these more complex tasks:

awk '/\(.*\)/ { gsub(/ +/, " "); } { print $0 }'

The "gsub" command acts like sed's "s" command with the "g" modifier.
By prefixing the block with the gsub command with a regex, it only
acts on that regex. The regex in this example only looks for an opening
and a closing paren anywhere on the line; you might need to tweak it
to more closely match your case. Alternatively, you could implement
a counter that skips the four lines (which can be done with both sed
and awk).

If it gets more complex than this, you can always write a proper
program in a "real" language to do it. This can be easier to maintain.

Cheers,
Peter

why not use the squeeze option of tr.

tr -s " "

--
Stephen Clark
*NetWolves*
Director of Technology
Phone: 813-579-3200
Fax: 813-882-0209
Email: steve.clark@netwolves.com
http://www.netwolves.com

#9Peter Bex
Peter.Bex@xs4all.nl
In reply to: Steve Clark (#8)
Re: Need help for import of text file

On Sun, Dec 16, 2012 at 06:30:24PM -0500, Steve Clark wrote:

why not use the squeeze option of tr.

tr -s " "

I wasn't aware of that one, it's even simpler and more elegant.
Thanks!

For this particular case, tr(1) won't do for the same reason
the simple sed(1) expression I gave won't do: it should only
be applied to the data, not the extra 4 lines of meta-data.

Cheers,
Peter
--
http://sjamaan.ath.cx
--
"The process of preparing programs for a digital computer
is especially attractive, not only because it can be economically
and scientifically rewarding, but also because it can be an aesthetic
experience much like composing poetry or music."
-- Donald Knuth

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general