COPY and file_fdw with fixed column widths

Started by Bruce Momjianover 10 years ago4 messages
#1Bruce Momjian
bruce@momjian.us

I know COPY doesn't support importing files with fixed column widths,
i.e. we can't say field1 is the first 30 characters, and field2 is the
rest of the line. We need a unique delimiter at column 31. (Commercial
Ingres does support this ability.)

I know we tell most people to use sed, Perl, or an ETL tool to convert
files into a format COPY understands, and I think that is a reasonable
answer. However, the file_fdw also reads our COPY format, and in that
case, the data file might be updated regularly and running an ETL
process on it every time it is read is inconvenient.

I asking because I am playing around with file_fdw and I have some log
files with a Unix 'date' prefix, a colon, then some text that might also
contain a colon. It would be ideal if I could read in first 28
characters into field1, then the rest of the line into field2. The only
idea I have is to create a single-column foreign table that reads the
entire line as one field, then a view on top of that the parses the line
into fields, but that seems kind of complex.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ Everyone has their own god. +

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#1)
Re: COPY and file_fdw with fixed column widths

Bruce Momjian <bruce@momjian.us> writes:

I know COPY doesn't support importing files with fixed column widths,
i.e. we can't say field1 is the first 30 characters, and field2 is the
rest of the line. We need a unique delimiter at column 31. (Commercial
Ingres does support this ability.)

I know we tell most people to use sed, Perl, or an ETL tool to convert
files into a format COPY understands, and I think that is a reasonable
answer. However, the file_fdw also reads our COPY format, and in that
case, the data file might be updated regularly and running an ETL
process on it every time it is read is inconvenient.

COPY is, and has always been intended to be, as fast as possible; loading
format transformation abilities onto it seems like a fundamental mistake.
Therefore, if you wish file_fdw were more flexible, I think the answer is
to create a variant of file_fdw that doesn't use COPY but some other
mechanism.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#3Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#2)
Re: COPY and file_fdw with fixed column widths

On Tue, Apr 28, 2015 at 12:46:22PM -0700, Tom Lane wrote:

Bruce Momjian <bruce@momjian.us> writes:

I know COPY doesn't support importing files with fixed column widths,
i.e. we can't say field1 is the first 30 characters, and field2 is the
rest of the line. We need a unique delimiter at column 31. (Commercial
Ingres does support this ability.)

I know we tell most people to use sed, Perl, or an ETL tool to convert
files into a format COPY understands, and I think that is a reasonable
answer. However, the file_fdw also reads our COPY format, and in that
case, the data file might be updated regularly and running an ETL
process on it every time it is read is inconvenient.

COPY is, and has always been intended to be, as fast as possible; loading
format transformation abilities onto it seems like a fundamental mistake.
Therefore, if you wish file_fdw were more flexible, I think the answer is
to create a variant of file_fdw that doesn't use COPY but some other
mechanism.

Yes, I think this is a missing feature. While we can tell people to do
ETL for loading, we are really not doing that for file_fdw.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ Everyone has their own god. +

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#4Andrew Dunstan
andrew@dunslane.net
In reply to: Bruce Momjian (#3)
Re: COPY and file_fdw with fixed column widths

On 04/28/2015 03:50 PM, Bruce Momjian wrote:

On Tue, Apr 28, 2015 at 12:46:22PM -0700, Tom Lane wrote:

Bruce Momjian <bruce@momjian.us> writes:

I know COPY doesn't support importing files with fixed column widths,
i.e. we can't say field1 is the first 30 characters, and field2 is the
rest of the line. We need a unique delimiter at column 31. (Commercial
Ingres does support this ability.)
I know we tell most people to use sed, Perl, or an ETL tool to convert
files into a format COPY understands, and I think that is a reasonable
answer. However, the file_fdw also reads our COPY format, and in that
case, the data file might be updated regularly and running an ETL
process on it every time it is read is inconvenient.

COPY is, and has always been intended to be, as fast as possible; loading
format transformation abilities onto it seems like a fundamental mistake.
Therefore, if you wish file_fdw were more flexible, I think the answer is
to create a variant of file_fdw that doesn't use COPY but some other
mechanism.

Yes, I think this is a missing feature. While we can tell people to do
ETL for loading, we are really not doing that for file_fdw.

This needs some love, but it's probably along the lines you need.

<https://github.com/adunstan/file_fixed_length_record_fdw&gt;

cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers