Additional options for COPY from

Started by Bryan Sayer7 months ago5 messagesgeneral
Jump to latest
#1Bryan Sayer
brysayer@gmail.com

Hi,

I'm not sure if this is the best list to ask this (and I am very new to
PostgreSQL) but I think more options are needed in the COPY from
command, in order to better deal with exceptions. By exceptions I mean
data not consistent with the format, empty rows, extra delimiters at the
end of rows, etc.

Is there someone or a party that deals with the details of the COPY
command that I could discuss this with?

Just for context, I am dealing with delimited data of millions of rows
and perhaps 60 columns or so that I wish to read into tables in a
Postgres 17 database under Windows 11, with everything on local drives.

--
*Bryan Sayer*
Retired Demographer/Statistician
/In a world in which you can be anything, *be kind*/

#2Ron
ronljohnsonjr@gmail.com
In reply to: Bryan Sayer (#1)
Re: Additional options for COPY from

On Thu, Sep 25, 2025 at 3:49 PM Bryan Sayer <brysayer@gmail.com> wrote:

Hi,

I'm not sure if this is the best list to ask this (and I am very new to
PostgreSQL) but I think more options are needed in the COPY from command,
in order to better deal with exceptions. By exceptions I mean data not
consistent with the format, empty rows, extra delimiters at the end of
rows, etc.

PG 17 COPY has ON_ERROR.

Is there someone or a party that deals with the details of the COPY
command that I could discuss this with?

Just for context, I am dealing with delimited data of millions of rows and
perhaps 60 columns or so that I wish to read into tables in a Postgres 17
database under Windows 11, with everything on local drives.

COPY is pretty simplistic. 3rd party CSV loaders like pgloader and
pg_bulkload exist for that reason. And there's always Python & Perl for
when your data is really dirty.

--
Death to <Redacted>, and butter sauce.
Don't boil me, I'm still alive.
<Redacted> lobster!

#3Brent Wood
pcreso@yahoo.com
In reply to: Bryan Sayer (#1)
Re: Additional options for COPY from

As I understand it, COPY is intended as a simple & fast way to get clean data into Postgres.
It is not intended to groom data during the process. There are a variety of ETL tools that can apply filters & rules to data streams to do what you are asking, so no need to invent another Postgres specific wheel.
That said, there are some ways to do this without a formal ETL application.
If the data rules you want to apply can be provided in a script, an approach I use is to pipe the data file through a set of checks & filters into a COPY command reading from STDIN. Very effective on Linux, Windows doesn't really do this very well.
eg: to change any instance of Grent in your data to great, use:
cat <file> | sed 's/Grent/great/' | psql <db> -c "copy <table> from STDIN with delimiter ',' null '';" 
You can chain as many awk, sed, tr, grep, etc commands as you need together to transform the data as required on the way to COPY
A Postgres only approach that I've also used is to create a loading table with columns to match the incoming data, but each column is an unconstrained text or varchar. Then modify these data to remove any errors using SQL, finishing up with a select to insert the cleaned data into the final table. If the final table has appropriate constraints to validate the data, only valid data will get there. As you find errors, you fix them & rerun the insert until it works.
HTH,
Brent Wood

On Friday, September 26, 2025 at 07:49:09 AM GMT+12, Bryan Sayer <brysayer@gmail.com> wrote:

Hi,

I'm not sure if this is the best list to ask this (and I am very new to PostgreSQL) but I think more options are needed in the COPY from command, in order to better deal with exceptions. By exceptions I mean data not consistent with the format, empty rows, extra delimiters at the end of rows, etc.

Is there someone or a party that deals with the details of the COPY command that I could discuss this with?

Just for context, I am dealing with delimited data of millions of rows and perhaps 60 columns or so that I wish to read into tables in a Postgres 17 database under Windows 11, with everything on local drives.

--
Bryan Sayer
Retired Demographer/Statistician
In a world in which you can be anything, be kind

#4David G. Johnston
david.g.johnston@gmail.com
In reply to: Bryan Sayer (#1)
Re: Additional options for COPY from

On Thursday, September 25, 2025, Bryan Sayer <brysayer@gmail.com> wrote:

Is there someone or a party that deals with the details of the COPY
command that I could discuss this with?

This mailing list is the primary medium that reaches a large number of
people some of whom may wish to join in your discussion. Just start the
discussion and see what happens. We tend to discourage addressing
individuals unless you are reporting an actual bug is some code they’ve
written or committed.

David J.

#5Adrian Klaver
adrian.klaver@aklaver.com
In reply to: Bryan Sayer (#1)
Re: Additional options for COPY from

On 9/25/25 11:49, Bryan Sayer wrote:

Hi,

I'm not sure if this is the best list to ask this (and I am very new to
PostgreSQL) but I think more options are needed in the COPY from
command, in order to better deal with exceptions. By exceptions I mean
data not consistent with the format, empty rows, extra delimiters at the
end of rows, etc.

What do you mean by '... deal with exceptions.'?

*Bryan Sayer*
Retired Demographer/Statistician
/In a world in which you can be anything, *be kind*/

--
Adrian Klaver
adrian.klaver@aklaver.com