COPY table_name (single_column) FROM 'unknown.txt' DELIMITER E'\n'

Started by Joel Jacobsonover 4 years ago22 messages

joel@compiler.org

over 4 years ago

Hi,

From time to time, I need to deal with bizarrely delimited text files,
having to use tools such as awk/sed/perl to reformat the files
so that they can be copied into PostgreSQL.

If such files could be imported to a table with a single text column,
we could then use PostgreSQL's now really fast regex-engine
to do data cleaning/reformatting, instead of having to rely on external awk-ward tools.

Furthermore, sometimes you don't want to clean/reformat the data at all,
but simply import the text lines "as is" without modifications,
such as when wanting to import unformatted log files,
where the log lines can contain any characters.

Could it be an idea to exploit the fact that DELIMITER E'\n' is currently an error?

ERROR: COPY delimiter cannot be newline or carriage return

That is, to change E'\n' to be a valid delimiter, which would simply read each line
delimited by newlines, as a single column.

The hack I'm currently abusing is to find some one-byte character that is not present anywhere in the text file,
and then to use that character as a delimiter. This doesn't work when needing to deal with a text file
which content is unknown at the time when writing the code though, so it's mostly useful for throwaway one-off queries.

Thoughts?

/Joel

MySQL seems to already support using \n as a delimiter (I haven't verified it myself though) [1]https://stackoverflow.com/questions/18394620/postgres-import-file-that-has-columns-separated-by-new-lines

[1]: https://stackoverflow.com/questions/18394620/postgres-import-file-that-has-columns-separated-by-new-lines

David G. Johnston

david.g.johnston@gmail.com

over 4 years ago

In reply to: Joel Jacobson (#1)

Re: COPY table_name (single_column) FROM 'unknown.txt' DELIMITER E'\n'

On Wed, May 5, 2021 at 8:31 AM Joel Jacobson <joel@compiler.org> wrote:

Could it be an idea to exploit the fact that DELIMITER E'\n' is currently
an error?

Why not just allow: "DELIMITER NONE" to be valid syntax meaning exactly
what it says and does exactly what you desire?

David J.

Chapman Flack

chap@anastigmatix.net

over 4 years ago

In reply to: David G. Johnston (#2)

Re: COPY table_name (single_column) FROM 'unknown.txt' DELIMITER E'\n'

On 05/05/21 13:02, David G. Johnston wrote:

Why not just allow: "DELIMITER NONE" to be valid syntax meaning exactly
what it says and does exactly what you desire?

What would it mean? That you get one column, multiple rows of text
corresponding to "lines" delimited by something, or that you get one
column, one row of text for the entire content of the file?

Regards,
-Chap

Isaac Morland

isaac.morland@gmail.com

over 4 years ago

In reply to: Chapman Flack (#3)

Re: COPY table_name (single_column) FROM 'unknown.txt' DELIMITER E'\n'

On Wed, 5 May 2021 at 13:23, Chapman Flack <chap@anastigmatix.net> wrote:

On 05/05/21 13:02, David G. Johnston wrote:

Why not just allow: "DELIMITER NONE" to be valid syntax meaning exactly
what it says and does exactly what you desire?

What would it mean? That you get one column, multiple rows of text
corresponding to "lines" delimited by something, or that you get one
column, one row of text for the entire content of the file?

It means no column delimiter. In other words, there is no character
which marks the end of a data value, so the entire line is a single data
value.

Would DELIMITER NULL make sense? The existing values are literal strings so
NULL fits with that. Do we already have NONE as a keyword somewhere? It's
listed in the keyword appendix to the documentation but I can't think of
where it is used off the top of my head.

David G. Johnston

david.g.johnston@gmail.com

over 4 years ago

In reply to: Isaac Morland (#4)

Re: COPY table_name (single_column) FROM 'unknown.txt' DELIMITER E'\n'

On Wed, May 5, 2021 at 10:34 AM Isaac Morland <isaac.morland@gmail.com>
wrote:

On Wed, 5 May 2021 at 13:23, Chapman Flack <chap@anastigmatix.net> wrote:

On 05/05/21 13:02, David G. Johnston wrote:

Why not just allow: "DELIMITER NONE" to be valid syntax meaning exactly
what it says and does exactly what you desire?

What would it mean? That you get one column, multiple rows of text
corresponding to "lines" delimited by something, or that you get one
column, one row of text for the entire content of the file?

It means no column delimiter. In other words, there is no character
which marks the end of a data value, so the entire line is a single data
value.

This. When dealing with COPY it's expected that each line becomes its own
row. On the server you can do pg_read_file() if you need the entire file
to be considered a single value. psql (\I and variables) is a bit more
hackey, but I'd rather see that improved directly anyway if the goal is to
try and make getting the "whole document" easier - copy isn't the right API
for that IMO.

David J.

Joel Jacobson

joel@compiler.org

over 4 years ago

In reply to: David G. Johnston (#5)

Re: COPY table_name (single_column) FROM 'unknown.txt' DELIMITER E'\n'

On Wed, May 5, 2021, at 19:58, David G. Johnston wrote:

On Wed, May 5, 2021 at 10:34 AM Isaac Morland <isaac.morland@gmail.com> wrote:

On Wed, 5 May 2021 at 13:23, Chapman Flack <chap@anastigmatix.net> wrote:

On 05/05/21 13:02, David G. Johnston wrote:

Why not just allow: "DELIMITER NONE" to be valid syntax meaning exactly
what it says and does exactly what you desire?

What would it mean? That you get one column, multiple rows of text
corresponding to "lines" delimited by something, or that you get one
column, one row of text for the entire content of the file?

It means no column delimiter. In other words, there is no character which marks the end of a data value, so the entire line is a single data value.

This. When dealing with COPY it's expected that each line becomes its own row. On the server you can do pg_read_file() if you need the entire file to be considered a single value. psql (\I and variables) is a bit more hackey, but I'd rather see that improved directly anyway if the goal is to try and make getting the "whole document" easier - copy isn't the right API for that IMO.

I think you misunderstood the problem.
I don't want the entire file to be considered a single value.
I want each line to become its own row, just a row with a single column.

So I actually think COPY seems like a perfect match for the job,
since it does precisely that, except there is no delimiter in this case.

I'm currently using the pg_read_file()-hack in a project,
and even though it can read files up to 1GB,
using e.g. regexp_split_to_table() to split on E'\n'
seems to need 4x as much memory, so it only
works with files less than ~256MB.

SELECT COUNT(*) FROM regexp_split_to_table(repeat(E'\n',1000000000),E'\n');
ERROR: invalid memory alloc request size 4000000004
Time: 4151.374 ms (00:04.151)

/Joel

Joel Jacobson

joel@compiler.org

over 4 years ago

In reply to: Isaac Morland (#4)

Re: COPY table_name (single_column) FROM 'unknown.txt' DELIMITER E'\n'

On Wed, May 5, 2021, at 19:34, Isaac Morland wrote:

Would DELIMITER NULL make sense? The existing values are literal strings so NULL fits with that. Do we already have NONE as a keyword somewhere? It's listed in the keyword appendix to the documentation but I can't think of where it is used off the top of my head.

+1 to using some keyword. NULL or NONE seems fine to me. Or maybe WITHOUT DELIMITER?

/Joel

Tom Lane

tgl@sss.pgh.pa.us

over 4 years ago

In reply to: Joel Jacobson (#6)

Re: Re: COPY table_name (single_column) FROM 'unknown.txt' DELIMITER E'\n'

"Joel Jacobson" <joel@compiler.org> writes:

I think you misunderstood the problem.
I don't want the entire file to be considered a single value.
I want each line to become its own row, just a row with a single column.

So I actually think COPY seems like a perfect match for the job,
since it does precisely that, except there is no delimiter in this case.

Well, there's more to it than just the column delimiter.

* What about \N being converted to NULL?
* What about \. being treated as EOF?
* Do you want to turn off the special behavior of backslash (ESCAPE)
altogether?
* What about newline conversions (\r\n being seen as just \n, etc)?

I'm inclined to think that "use pg_read_file and then split at newlines"
might be a saner answer than delving into all these fine points.
Not least because people yell when you add cycles to the COPY
inner loops.

I'm currently using the pg_read_file()-hack in a project,
and even though it can read files up to 1GB,
using e.g. regexp_split_to_table() to split on E'\n'
seems to need 4x as much memory, so it only
works with files less than ~256MB.

Yeah, that's because of the conversion to "chr". But a regexp
is overkill for that anyway. Don't we have something that will
split on simple substring matches?

regards, tom lane

Andrew Dunstan

andrew@dunslane.net

over 4 years ago

In reply to: Tom Lane (#8)

Re: COPY table_name (single_column) FROM 'unknown.txt' DELIMITER E'\n'

On 5/5/21 2:45 PM, Tom Lane wrote:

"Joel Jacobson" <joel@compiler.org> writes:

I think you misunderstood the problem.
I don't want the entire file to be considered a single value.
I want each line to become its own row, just a row with a single column.
So I actually think COPY seems like a perfect match for the job,
since it does precisely that, except there is no delimiter in this case.

Well, there's more to it than just the column delimiter.

* What about \N being converted to NULL?
* What about \. being treated as EOF?
* Do you want to turn off the special behavior of backslash (ESCAPE)
altogether?
* What about newline conversions (\r\n being seen as just \n, etc)?

I'm inclined to think that "use pg_read_file and then split at newlines"
might be a saner answer than delving into all these fine points.
Not least because people yell when you add cycles to the COPY
inner loops.

Also we have generally been resistant to supporting odd formats. FDWs
can help here (e.g. file_text_array), but they can't use STDIN IIRC.

I'm currently using the pg_read_file()-hack in a project,
and even though it can read files up to 1GB,
using e.g. regexp_split_to_table() to split on E'\n'
seems to need 4x as much memory, so it only
works with files less than ~256MB.

Yeah, that's because of the conversion to "chr". But a regexp
is overkill for that anyway. Don't we have something that will
split on simple substring matches?

Not that I know of. There is split_part but I don't think that's fit for
purpose here. Do we need one, or have I missed something?

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com

#10

Justin Pryzby

pryzby@telsasoft.com

over 4 years ago

In reply to: Tom Lane (#8)

Re: Re: COPY table_name (single_column) FROM 'unknown.txt' DELIMITER E'\n'

On Wed, May 05, 2021 at 02:45:41PM -0400, Tom Lane wrote:

I'm currently using the pg_read_file()-hack in a project,
and even though it can read files up to 1GB,
using e.g. regexp_split_to_table() to split on E'\n'
seems to need 4x as much memory, so it only
works with files less than ~256MB.

Yeah, that's because of the conversion to "chr". But a regexp
is overkill for that anyway. Don't we have something that will
split on simple substring matches?

For v14

commit 66f163068030b5c5fe792a0daee27822dac43791
Author: Tom Lane <tgl@sss.pgh.pa.us>
Date: Wed Sep 2 18:23:56 2020 -0400

Add string_to_table() function.

--
Justin

#11

Andrew Dunstan

andrew@dunslane.net

over 4 years ago

In reply to: Justin Pryzby (#10)

Re: COPY table_name (single_column) FROM 'unknown.txt' DELIMITER E'\n'

On 5/5/21 3:36 PM, Justin Pryzby wrote:

On Wed, May 05, 2021 at 02:45:41PM -0400, Tom Lane wrote:

I'm currently using the pg_read_file()-hack in a project,
and even though it can read files up to 1GB,
using e.g. regexp_split_to_table() to split on E'\n'
seems to need 4x as much memory, so it only
works with files less than ~256MB.

Yeah, that's because of the conversion to "chr". But a regexp
is overkill for that anyway. Don't we have something that will
split on simple substring matches?

For v14

commit 66f163068030b5c5fe792a0daee27822dac43791
Author: Tom Lane <tgl@sss.pgh.pa.us>
Date: Wed Sep 2 18:23:56 2020 -0400

Add string_to_table() function.

Ha! just in time :-)

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com

#12

Tom Lane

tgl@sss.pgh.pa.us

over 4 years ago

In reply to: Andrew Dunstan (#9)

Re: COPY table_name (single_column) FROM 'unknown.txt' DELIMITER E'\n'

Andrew Dunstan <andrew@dunslane.net> writes:

On 5/5/21 2:45 PM, Tom Lane wrote:

Yeah, that's because of the conversion to "chr". But a regexp
is overkill for that anyway. Don't we have something that will
split on simple substring matches?

Not that I know of. There is split_part but I don't think that's fit for
purpose here. Do we need one, or have I missed something?

[ checks manual ... ]

string_to_array or string_to_table would do, I think.

regards, tom lane

#13

Joel Jacobson

joel@compiler.org

over 4 years ago

In reply to: Tom Lane (#12)

Re: COPY table_name (single_column) FROM 'unknown.txt' DELIMITER E'\n'

On Wed, May 5, 2021, at 21:51, Tom Lane wrote:

Andrew Dunstan <andrew@dunslane.net <mailto:andrew%40dunslane.net>> writes:

On 5/5/21 2:45 PM, Tom Lane wrote:

Yeah, that's because of the conversion to "chr". But a regexp
is overkill for that anyway. Don't we have something that will
split on simple substring matches?

Not that I know of. There is split_part but I don't think that's fit for
purpose here. Do we need one, or have I missed something?

[ checks manual ... ]

string_to_array or string_to_table would do, I think.

regards, tom lane

Thanks for these new functions, they seem really useful for a lot of cases.

However, I see two problems with using string_to_table() for this particular use-case.

- Doesn't work with files larger than 1GB, due to pg_read_file()'s limit.
- 68% slower than using the COPY-hack.

% ls -lah foo.txt
-rw-r--r-- 1 joel staff 623M May 6 07:31 foo.txt

% wc -l foo.txt
6771864 foo.txt

# COPY txt (line) FROM 'foo.txt' DELIMITER '"';
COPY 6771864
Time: 9829.707 ms (00:09.830)
Time: 9552.286 ms (00:09.552)
Time: 9483.115 ms (00:09.483)

# TRUNCATE txt;
TRUNCATE TABLE

# INSERT INTO txt (line) SELECT string_to_table(pg_read_file('foo.txt'),E'\n');
INSERT 0 6771865
Time: 16556.078 ms (00:16.556)
Time: 14720.343 ms (00:14.720)
Time: 17266.088 ms (00:17.266)

/Joel

#14

Thomas Kellerer

shammat@gmx.net

over 4 years ago

In reply to: Joel Jacobson (#1)

Re: COPY table_name (single_column) FROM 'unknown.txt' DELIMITER E'\n'

Joel Jacobson schrieb am 05.05.2021 um 17:30:

Could it be an idea to exploit the fact that DELIMITER E'\n' is currently an error?

ERROR: COPY delimiter cannot be newline or carriage return

That is, to change E'\n' to be a valid delimiter, which would simply read each line
delimited by newlines, as a single column.

The hack I'm currently abusing is to find some one-byte character that is not present anywhere in the text file,
and then to use that character as a delimiter. This doesn't work when needing to deal with a text file
which content is unknown at the time when writing the code though, so it's mostly useful for throwaway one-off queries.

What about

delimiter E'\1'

The probability that a file contains the ASCII "character" 1 seems rather low.

Thomas

#15

Joel Jacobson

joel@compiler.org

over 4 years ago

In reply to: Tom Lane (#8)

Re: COPY table_name (single_column) FROM 'unknown.txt' DELIMITER E'\n'

On Wed, May 5, 2021, at 20:45, Tom Lane wrote:

"Joel Jacobson" <joel@compiler.org <mailto:joel%40compiler.org>> writes:

I think you misunderstood the problem.
I don't want the entire file to be considered a single value.
I want each line to become its own row, just a row with a single column.

So I actually think COPY seems like a perfect match for the job,
since it does precisely that, except there is no delimiter in this case.

Well, there's more to it than just the column delimiter.

* What about \N being converted to NULL?
* What about \. being treated as EOF?
* Do you want to turn off the special behavior of backslash (ESCAPE)
altogether?
* What about newline conversions (\r\n being seen as just \n, etc)?

I'm inclined to think that "use pg_read_file and then split at newlines"
might be a saner answer than delving into all these fine points.
Not least because people yell when you add cycles to the COPY
inner loops.

Thanks for providing strong arguments why the COPY approach is a dead-end, I agree.

However, as demonstrated in my previous email, using

string_to_table(pg_read_file( filename ), E'\n')

has its performance as well as max size issues.

Maybe these two problems could be solved by combining the two functions into one?

file_to_table ( filename text, delimiter text [, null_string text ] ) → setof text

I'm thinking thanks to returning "setof text", such a function could read a stream,
and return a line as soon as a delimiter is encountered, not having to keep
the entire file in memory at any time.

/Joel

#16

Darafei "Komяpa" Praliaskouski

me@komzpa.net

over 4 years ago

In reply to: Joel Jacobson (#15)

Re: COPY table_name (single_column) FROM 'unknown.txt' DELIMITER E'\n'

I have similar problems and what is really needed is a way to get a file
from client side into a server side object that can be dealt with later.
The most popular way is COPY and it is built into the psql tool. In general
it supports \copy wrapper, and there is COPY FROM STDIN. However, it is not
available to the files that are not following the csv-like structure. I had
to use it for XML and huge JSON files before, and it's always `sed` before
the import and a replace() after.

pg_read_file does not help on cloud and managed installs of postgres here.

What I would prefer is some new COPY mode like RAW that will just push
whatever it gets on the stdin/input into the cell on the server side. This
way it can be proxied by psql, utilize existing infra for passing streams
and be used in shell scripting.

On Thu, May 6, 2021 at 9:14 AM Joel Jacobson <joel@compiler.org> wrote:

On Wed, May 5, 2021, at 20:45, Tom Lane wrote:

"Joel Jacobson" <joel@compiler.org> writes:

I think you misunderstood the problem.
I don't want the entire file to be considered a single value.
I want each line to become its own row, just a row with a single column.

So I actually think COPY seems like a perfect match for the job,
since it does precisely that, except there is no delimiter in this case.

Well, there's more to it than just the column delimiter.

* What about \N being converted to NULL?
* What about \. being treated as EOF?
* Do you want to turn off the special behavior of backslash (ESCAPE)
altogether?
* What about newline conversions (\r\n being seen as just \n, etc)?

I'm inclined to think that "use pg_read_file and then split at newlines"
might be a saner answer than delving into all these fine points.
Not least because people yell when you add cycles to the COPY
inner loops.

Thanks for providing strong arguments why the COPY approach is a dead-end,
I agree.

However, as demonstrated in my previous email, using

string_to_table(pg_read_file( filename ), E'\n')

has its performance as well as max size issues.

Maybe these two problems could be solved by combining the two functions
into one?

file_to_table ( filename text, delimiter text [, null_string text ] ) →
setof text

I'm thinking thanks to returning "setof text", such a function could read
a stream,
and return a line as soon as a delimiter is encountered, not having to keep
the entire file in memory at any time.

/Joel

--
Darafei "Komяpa" Praliaskouski
OSM BY Team - http://openstreetmap.by/

#17

Isaac Morland

isaac.morland@gmail.com

over 4 years ago

In reply to: Darafei "Komяpa" Praliaskouski (#16)

Re: COPY table_name (single_column) FROM 'unknown.txt' DELIMITER E'\n'

On Thu, 6 May 2021 at 02:21, Darafei "Komяpa" Praliaskouski <me@komzpa.net>
wrote:

What I would prefer is some new COPY mode like RAW that will just push
whatever it gets on the stdin/input into the cell on the server side. This
way it can be proxied by psql, utilize existing infra for passing streams
and be used in shell scripting.

Yes! A significant missing feature is “take this arbitrary bucket of bits
and move it to/from the database from/to this file without modification of
any kind”. There are all sorts of tutorials on the Web about how to almost
do it or fake it or convert to hex or whatever but I’ve never seen a nice
simple explanation of “here is the command:”.

Of course there is a lot to think about. Coming out of the database, the
query result must be exactly one row containing exactly one column; how do
we handle other database results? Coming into the database, only one data
value can come from a single file; so how do we populate the other columns
of whatever table we copy to?

#18

Joel Jacobson

joel@compiler.org

over 4 years ago

In reply to: Isaac Morland (#17)

Re: COPY table_name (single_column) FROM 'unknown.txt' DELIMITER E'\n'

On Thu, May 6, 2021, at 13:41, Isaac Morland wrote:

On Thu, 6 May 2021 at 02:21, Darafei "Komяpa" Praliaskouski <me@komzpa.net> wrote:

What I would prefer is some new COPY mode like RAW that will just push whatever it gets on the stdin/input into the cell on the server side. This way it can be proxied by psql, utilize existing infra for passing streams and be used in shell scripting.

Yes! A significant missing feature is “take this arbitrary bucket of bits and move it to/from the database from/to this file without modification of any kind”. There are all sorts of tutorials on the Web about how to almost do it or fake it or convert to hex or whatever but I’ve never seen a nice simple explanation of “here is the command:”.

Of course there is a lot to think about. Coming out of the database, the query result must be exactly one row containing exactly one column; how do we handle other database results? Coming into the database, only one data value can come from a single file; so how do we populate the other columns of whatever table we copy to?

If the file is on the server, you can use pg_read_binary_file() for that purpose.

/Joel

#19

Isaac Morland

isaac.morland@gmail.com

over 4 years ago

In reply to: Joel Jacobson (#18)

Re: COPY table_name (single_column) FROM 'unknown.txt' DELIMITER E'\n'

On Thu, 6 May 2021 at 12:02, Joel Jacobson <joel@compiler.org> wrote:

On Thu, May 6, 2021, at 13:41, Isaac Morland wrote:

Yes! A significant missing feature is “take this arbitrary bucket of bits
and move it to/from the database from/to this file without modification of
any kind”. There are all sorts of tutorials on the Web about how to almost
do it or fake it or convert to hex or whatever but I’ve never seen a nice
simple explanation of “here is the command:”.

Of course there is a lot to think about. Coming out of the database, the
query result must be exactly one row containing exactly one column; how do
we handle other database results? Coming into the database, only one data
value can come from a single file; so how do we populate the other columns
of whatever table we copy to?

If the file is on the server, you can use pg_read_binary_file() for that
purpose.

Yes, sorry, I should have explicitly said “on the client” somewhere up
there. Getting files from the DB server into the DB is no problem.

#20

Andrew Dunstan

andrew@dunslane.net

over 4 years ago

In reply to: Isaac Morland (#17)

Re: COPY table_name (single_column) FROM 'unknown.txt' DELIMITER E'\n'

On 5/6/21 7:41 AM, Isaac Morland wrote:

On Thu, 6 May 2021 at 02:21, Darafei "Komяpa" Praliaskouski
<me@komzpa.net <mailto:me@komzpa.net>> wrote:

What I would prefer is some new COPY mode like RAW that will just
push whatever it gets on the stdin/input into the cell on the
server side. This way it can be proxied by psql, utilize existing
infra for passing streams and be used in shell scripting.

Yes! A significant missing feature is “take this arbitrary bucket of
bits and move it to/from the database from/to this file without
modification of any kind”. There are all sorts of tutorials on the Web
about how to almost do it or fake it or convert to hex or whatever but
I’ve never seen a nice simple explanation of “here is the command:”.

Of course there is a lot to think about. Coming out of the database,
the query result must be exactly one row containing exactly one
column; how do we handle other database results? Coming into the
database, only one data value can come from a single file; so how do
we populate the other columns of whatever table we copy to?

What I'd like is something a bit more general. We could extend the FDW
API to allow for a CopyStdin handler or some such. The input could be
sent unfiltered to the handler, which would do whatever it liked with
it. That way FDWs like file_fdw and file_text_array_fdw could read from
stdin, for example. I'm pretty sure it would handle the OP's use case.
The downside is you'd need to write an FDW handler, but that's not too
hard, and there are lots of examples.

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com

#21

Daniel Verite

daniel@manitou-mail.org

over 4 years ago

In reply to: Darafei "Komяpa" Praliaskouski (#16)

Re: COPY table_name (single_column) FROM 'unknown.txt' DELIMITER E'\n'

Darafei "Komяpa" Praliaskouski wrote:

What I would prefer is some new COPY mode like RAW that will just push
whatever it gets on the stdin/input into the cell on the server side. This
way it can be proxied by psql, utilize existing infra for passing streams
and be used in shell scripting.

COPY RAW has been proposed and discussed quite a bit previously:
https://commitfest.postgresql.org/12/676/

Best regards,
--
Daniel Vérité
PostgreSQL-powered mailer: https://www.manitou-mail.org
Twitter: @DanielVerite

#22

Joel Jacobson

joel@compiler.org

over 4 years ago

In reply to: Daniel Verite (#21)

Re: COPY table_name (single_column) FROM 'unknown.txt' DELIMITER E'\n'

What about:

COPY ... FROM ... WITH PATTERN 'regexp_pattern'

Where the columns would be matched with the capture groups.

This could handle the quite common case of varying white-space as column separators:

COPY log (col1, col2, col3) FROM 'log.txt' WITH PATTERN '^(\S+)\s+(\S+)\s+(\S+)$'

This could also handle $SUBJECT:

COPY table_name (single_column) FROM 'unknown.txt' WITH PATTERN '^(.*)$';

And lots of other more complex use-cases.

/Joel